User-Oriented Frameworks for GPT Engineering: Initial Approaches

Michael Foster
Nov 21, 2023


Of its many promises, artificial intelligence has brought us closer to one: fully customized AI assistants that we can design specifically to entertain, inform, educate, or work for us.

Now, there’s just one problem: one needs to program their own assistant, and few people are programmers.

ChatGPT’s solution is simple: you program in natural language, and since everyone knows natural language, this is easier.

This, in fact, is wrong for several reasons.

For one, many people do not think in terms of natural language. I used to teach language and literature; trust me on this.

For those who do, there is still a translation and application issue that is too often overlooked.

Even if natural language provides the optimal language for establishing parameters to create a GPT, it doesn’t mean that natural language will optimize the programming process. Of course GPT is optimized for that workflow, but human beings are not.

We do not use language because we prefer it to other forms of communication; we use it because we have no choice.

Therefore, there’s a market for providing frameworks on top of the current UI. Let’s quickly revisit it.

The red textbox is user inputs, the blue is the builder’s inputs (based on your inputs), and the green square is the output.

The visual language emphasizes GPT Builder and ChatGPT as the real stars of the process: users simply need to type one sentence to get their own GPT.

Every individual is going to input into the builder based on several criteria: their experience with ChatGPT, their comfort with the English language (I know GPT can take other language inputs, but ChatGPT is built on English at its core and this fact needs to be kept in our minds at all times), their desire to type for long stretches (programming a GPT on a phone is much harder than on a desktop computer), and so on.

The visual language encourages the user to think of one parameter at a time and build on that, which most people will not really think to do. Also, it takes up too much time.

An alternative would be to reintroduce a structure for users that guides their intuitive use of natural language into the format that ChatGPT handles best.

There are other advantages to creating a structure: you can build a company that, for instance, makes it easier for individuals to build marketing assistants and another for RPG dungeon masters.

Indeed, one could build a decision tree that guides a user towards one of several basic use cases. Think of those use cases as sets: the decision tree then moves into more and more specific subsets, each of which relies on a pre-established framework. These frameworks do not limit the inputs into GPT; they make it easier for the user to provide the relevant parameters and get where they want to go more quickly.
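To make the decision-tree idea a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Node` class, the questions, and the parameter lists are illustrative assumptions, not part of GPT Builder or any existing product. It only shows broad use cases narrowing into subsets, with each leaf holding a pre-established framework (a list of parameters the user would be asked to fill in).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in a hypothetical use-case decision tree."""
    question: str = ""                                   # asked at branch nodes
    children: dict[str, "Node"] = field(default_factory=dict)
    framework: list[str] = field(default_factory=list)   # filled in on leaves only


# Illustrative tree: broad sets narrow into more specific subsets.
TREE = Node(
    question="What should your GPT do?",
    children={
        "work": Node(
            question="What kind of work?",
            children={
                "marketing": Node(framework=["audience", "brand voice", "channels"]),
            },
        ),
        "entertain": Node(
            question="What kind of entertainment?",
            children={
                "rpg dungeon master": Node(framework=["game system", "setting", "tone"]),
            },
        ),
    },
)


def walk(tree: Node, answers: list[str]) -> list[str]:
    """Follow the user's answers down the tree and return the leaf's framework."""
    node = tree
    for answer in answers:
        node = node.children[answer]  # raises KeyError if the answer is not an option
    return node.framework


print(walk(TREE, ["entertain", "rpg dungeon master"]))
```

The point of the design is that the tree constrains the path, not the content: once the user reaches a leaf, they still type free natural language for each parameter.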

A Framework for Interface Building

That’s a very abstract idea, so let me give you an example. I will base this on Alexander Sniffin’s article on how to develop a custom GPT with no code specifically to allow “users to upload images of an outfit where the vision model identifies the different clothing pieces then attempting to find those same clothing pieces online.” I’m too much of a slob to use such a thing, but this obviously has a huge market.

His instructions are brilliant and, while I don’t know Alexander personally and have never spoken to him, he obviously knows how to create retail-oriented programs and thinks very clearly in the kind of way that is so crucial for successful digital marketing: combining ideas from complex mathematics such as set theory and graph theory and ideas from cultural theory, linguistics, and communication.

Idk who’s reading this, but if you’re looking for someone to build or market AI products, hire Alexander and give him a pile of money. The guy is legit.

The only criticism I have of Alexander’s suggestions is that they have an assumed structure that may be suboptimal for creating the kind of bot he has made. It is good, but many better guides will come out (they may exist and I haven’t seen them), and our sense of the mistakes in GPT Engineering will become more intuitive (I already see some small errors in Alexander’s article I would tweak, but as I say they are very minor).

Therefore, a framework built on his guide isn’t necessarily a useful framework in itself. Think of what I’m about to show you as a guide for how these kinds of frameworks could be built. They could, in turn, be stitched together and packaged as a new, perhaps even gamified, user interface that makes creating and exchanging GPTs much easier.

To do this, I want to quote Sniffin’s full instructions to GPT and its expected initial output:

You’re an AI assistant designed to help the user find similar clothing online by analyzing and identify clothing from example images. These images can be sourced from social media posts, user uploads like screenshots, etc. Your task involves detailed analysis and subsequent search for similar clothing items available for purchase.

Step-by-Step Process:
1. Image Acquisition:
- Request the user to provide an image. This can be a direct upload or a screenshot from social media platforms.
- Note: Inform the user that screenshots may be necessary for certain social media platforms that require login, as you cannot access these platforms directly.

2. Identifying the Subject:
- If the image contains multiple people, ask the user to specify whose clothing they are interested in.
- Proceed once the user identifies the subject of interest.

3. Detailed Clothing Analysis:
- Thoroughly describe each piece of clothing worn by the chosen subject in the image.
- Include details such as color, pattern, fabric type, style (e.g., v-neck, button-down), and any distinctive features (e.g., logos, embellishments).

4. Verification:
- Present the clothing description to the user for confirmation.
- If there are inaccuracies or missing details, ask the user to clarify or provide additional information.

5. Search and Present Options:
- Once the description is confirmed, begin web browsing for similar clothing items.
- Ask the user if they prefer to search for all items simultaneously or one at a time.
- Searched results can be direct links to a specific item or a search query to another site.
- For each item found, provide a direct purchase link for each line item, the link should be the entire summery of the item. e.g. “[- Amazon: A white t-shirt](link)”
- Try to provide a price if possible for each item

6. User Confirmation and Iteration:
- After presenting each find, ask the user to confirm if it matches their expectations.
- If the user is not satisfied, either adjust the search based on new input (repeat from step 5) or ask if they wish to start the process over with a new image.

Constraints:
- When asking the user questions, prompt in clear and simple to understand format, give the user a selection of options in a structured manner. e.g. “… Let me know if this correct, here are the next steps: — Search for all items — Search each item one at a time”
- Format your responses in HTML or MD to be easier to read
- Be concise when possible, remember the user is trying to find answers quickly
- Speak with emojis and be helpful, for example this would be an intro:
“””
# 🌟 Welcome to Your Fashion Search Assistant Powered by ChatGPT! 🌟

Hello! 👋 If you’re looking to **find clothing items similar to those in a photo**, I’m here to help. 🛍️👗👔
### Getting Started is Easy:
1. **Upload an Image** 🖼️ or
2. **Provide a Screenshot** from a social media platform. 📱💻 🔍

**Remember:** If it’s from a social media platform that requires login, a **screenshot** will be necessary. Let’s embark on this fashion-finding journey together! 🚀
“””

This will look familiar to anyone who has taught English Composition 101 or creative writing: this is, in essence, a writing guide. To teach writing one has to break it down.

There is a breakdown here already provided by Sniffin, with six process steps preceded by establishing parameters. This is how a computer program is conventionally coded, so the structure isn’t surprising.

However, the language of programming doesn’t really fit some aspects of these instructions, and the natural language at play obscures implied subsets that make this approach to programming bots much more powerful. These emerge if we try to restate the instructions as a matrix of Sets -> subsets -> subsets.

Firstly, the objective includes 8 inputs defining 8 vectors. I’ve tagged these using HTML because I’m an old man:

Here the second-level subsets would be broken down for each subset1; e.g., Author could be broken down into characterization (“you’re an AI assistant modeled after Ernest Hemingway” or “You’re an AI assistant who also moonlights as a dominatrix and it keeps slipping into your day job recommending clothes”).

By translating the instructions into a matrix, we now have a structure that offers users finer and finer resolution, down to exactly as much detail as they want. E.g., you can specify simply “Hemingway,” or “an AI assistant modeled on Ernest Hemingway after he moved to Cuba but in a parallel universe where alcohol turns your skin purple and F. Scott Fitzgerald actually had a small penis” (go read A Moveable Feast to know what I’m talking about, it’s a wild story).

Now we have infinite scalability for text inputs and the opportunity to create parameters to channel users’ interest towards a pre-determined goal. One could have this be a selection page at the start of the AI process in a UI that exists on top of the current GPT Builder.
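As a rough illustration of the Sets -> subsets -> subsets idea, the sketch below stores one slice of such a matrix as a nested dictionary and renders it at whatever depth the user chooses. The dictionary layout, the `render` helper, and the sample value are assumptions made for illustration; only the labels "Objective", "Author", and "Characterization" come from the discussion above.

```python
# Hypothetical slice of the matrix: set -> subset1 -> subset2 -> the user's text.
MATRIX = {
    "Objective": {
        "Author": {
            "Characterization": "modeled after Ernest Hemingway",
        },
    },
}


def render(matrix: dict, depth: int) -> list[str]:
    """Flatten the matrix into lines, descending at most `depth` levels."""
    lines = []

    def visit(node, path):
        if not isinstance(node, dict):
            # Leaf: the free text the user typed for this cell.
            lines.append(" > ".join(path) + ": " + node)
        elif len(path) >= depth:
            # Depth limit reached: show the category name only.
            lines.append(" > ".join(path))
        else:
            for key, child in node.items():
                visit(child, path + [key])

    visit(matrix, [])
    return lines


print(render(MATRIX, 1))
print(render(MATRIX, 3))
```

A selection page could show the shallow rendering first and let the user drill down only where they care about the detail.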

Of course, this isn’t limited to the objective, and we see this matrix-based approach to creating frameworks for natural language coding instructions becomes extremely powerful extremely quickly:

I haven’t filled this all out because I’m lazy, but you get the idea. Now if we zoom in on “Characterization” and “Character” we quickly see a huge opportunity.

Characterization covers a character’s more subjective qualities: are we creating charismatic heroes, for instance? Character, as I’ve defined it here, covers the more quantifiable facts about the character: how old the person is and what they are wearing, for instance. Both elicit subjective responses, but one is very easy to define (“the color of clothes”) and the other is not (“clothes that evoke memories of childhood”).

We know these are different things that, technically speaking, are distinct sets with important connections between their elements. Speaking more casually, we know that who a person really is influences how we feel about them, but how that person feels, and how we feel about other people, influences how we feel about them too.
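One way to picture the split between the two is as two kinds of input field. The sketch below is hypothetical throughout (the class names, the fields, and the wording of the generated sentence are all assumptions): Character holds values that are easy to define, while Characterization stays free text because its subjective qualities are hard to enumerate.

```python
from dataclasses import dataclass


@dataclass
class Character:
    """Easy-to-define facts about the persona (hypothetical fields)."""
    age: int
    clothing_color: str


@dataclass
class Persona:
    character: Character
    characterization: str  # free text: subjective qualities


def describe(p: Persona) -> str:
    """Combine both kinds of input into one instruction sentence."""
    c = p.character
    return (
        f"You are a {c.age}-year-old assistant dressed in {c.clothing_color}. "
        f"Your manner: {p.characterization}."
    )


hero = Persona(Character(age=40, clothing_color="navy"), "a charismatic hero")
print(describe(hero))
```

An interface could offer dropdowns or sliders for the first kind of field and guided writing prompts for the second.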

Opportunities and Challenges

If you teach natural language zero-code programming with no layer to structure the inputs, you’re not going to get better and better GPTs. I am not saying zero-code natural language programming of GPTs will fail, but that it is suboptimal unless we also teach people to use natural language better.

I want to stress again that I am not claiming the above framework is 100% accurate or that the underlying text it was based on is 100% optimal. I don’t believe either. But I do think that we need to begin building frameworks, and tools based on those frameworks, for GPT programming. Oh, and whoever cracks this nut will be worth a billion dollars very quickly.

The craziest implication of all of this is that our next generation of billionaires is going to be English, journalism, and communication majors. English departments at universities could take this opportunity to become as sexy and exciting and important as economics departments were in the mid-20th century. And of course there is room for localization startups that help translate not just the languages but also the cultural expectations in terms of style, characterization, etc. Understanding the interface of culture and humanities with technology was always important (Jobs knew this; it’s why everyone uses iPhones and not Windows Phones), but it is exponentially more so now.

I guess the “learn to code” meme is dead.
