
TL;DR: How do you build a fancy chatbot like ChatGPT/Bard/etc. from a plain LLM, and how could I learn to do this?

I'm trying to learn more about AI. Naturally, I need to actually play around with it and get my hands dirty, the same as I would with any other software engineering subject. So I messed around with pulling some models off of Hugging Face, running them on my computer via a Python wrapper, and seeing how well I could get them to answer questions.
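For anyone wanting to reproduce this kind of experiment, here is a minimal sketch, assuming the Hugging Face `transformers` library and a small base model like GPT-2 (the model name is just an illustrative choice):

```python
# Minimal sketch: load a base causal LM from the Hub and let it continue text.
# Assumes `pip install transformers torch`; "gpt2" is an illustrative choice.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt")

# A base model just continues the text; nothing forces it to "answer".
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Run it a few times and you'll typically get rambling continuations rather than answers, which is exactly the experience described below.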

When you interact with an AI chatbot from one of the big players, you more or less just talk to it in natural language, and it answers your questions and appears to have some ability to reason logically. This is very much not the experience I had with open-source models. With those, I had the very strong impression that the only thing these models could do was complete the next word. I wouldn't be able to replicate anything like the output of a big chatbot with raw input to these models.

My assumption is that when you send a prompt to one of the big chatbots, some process transforms the prompt into something else. This "something else" is just whatever the developers have found to cause the token predictor to output words that, in the opinion of the developers, actually answer the questions, make logical inferences required by the prompt, and so on. I'm going to call this "algorithmic prompt transformation" since "prompt engineering" seems to have come to mean "how you as a human can write better prompts to make the AI do what you want."
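To make the idea concrete, here is a toy sketch of what such a transformation might look like; the template text is purely hypothetical, not what any of the big chatbots actually use:

```python
# Toy "algorithmic prompt transformation": wrap the raw user prompt in a
# fixed conversational template before handing it to the token predictor.
# The template wording here is hypothetical.
def transform_prompt(user_prompt: str) -> str:
    system = (
        "The following is a conversation between a helpful assistant and a "
        "user. The assistant answers questions accurately and reasons step "
        "by step."
    )
    return f"{system}\n\nUser: {user_prompt}\nAssistant:"

print(transform_prompt("Why is the sky blue?"))
```

A completion model is then much more likely to produce something answer-shaped, since the template makes "an assistant's answer" the most probable next text.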

Is this at all accurate? If so, are there books/papers/tutorials on how to actually implement this sort of transformation that anyone specifically recommends? And what is the actual name for the thing I'm calling "algorithmic prompt transformation"?

For example, I found this thing called ReAct. However, it seems to consist of having humans rewrite a prompt into a series of labelled steps that are fed into a simple evaluator, which sends each step body to OpenAI or Wikipedia or wherever, depending on the label. While this makes sense when you're just trying to validate that ReAct's approach gets more right answers, it would obviously be more interesting to rewrite the prompt into the labelled steps mechanically. This is the sort of thing I want to know about.
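For illustration, a mechanized version of that loop might look something like the sketch below, where `call_llm` and `search_wikipedia` are hypothetical stand-ins for real API calls:

```python
# Rough sketch of a mechanized ReAct-style loop: the model emits
# "Thought: ... Action: Name[arg]" lines, a dispatcher executes the action,
# and the observation is appended to the transcript before the next step.
# `call_llm` and `search_wikipedia` are hypothetical stand-ins.
import re

def call_llm(prompt: str) -> str:
    """Stand-in for a completion call to some LLM API."""
    raise NotImplementedError

def search_wikipedia(query: str) -> str:
    """Stand-in for a Wikipedia lookup tool."""
    raise NotImplementedError

def react_loop(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript + "Thought:")
        transcript += "Thought:" + step + "\n"
        match = re.search(r"Action: (\w+)\[(.*?)\]", step)
        if match is None:
            continue  # no action this step; let the model keep thinking
        action, arg = match.groups()
        if action == "Finish":
            return arg  # the model decided it has the answer
        if action == "Search":
            transcript += f"Observation: {search_wikipedia(arg)}\n"
    return "No answer found within the step budget."
```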

I suppose I could just read all the cites in the ReAct paper and look up information about all the systems those papers test, but I'm lazy and maybe somebody here just knows offhand the best place to start.


Answers

Dave Orr


The thing you're missing is called instruction tuning. You gather a set of prompt/response pairs and fine-tune the model on that data. Do it right and you have a chatty model.
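In code, a minimal sketch of instruction tuning with the Hugging Face `transformers` and `datasets` libraries might look like this (the model name and the two example pairs are purely illustrative; real instruction-tuning datasets contain many thousands of curated examples):

```python
# Minimal instruction-tuning sketch: fine-tune a base causal LM on
# prompt/response pairs. Assumes `pip install transformers datasets torch`;
# model name and the toy dataset are illustrative.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

pairs = [
    {"prompt": "What is 2 + 2?", "response": "2 + 2 is 4."},
    {"prompt": "Name a primary color.", "response": "Red is a primary color."},
]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

def tokenize(example):
    # Concatenate prompt and response into one training sequence, using the
    # same template you will later use at inference time.
    text = f"User: {example['prompt']}\nAssistant: {example['response']}"
    tokens = tokenizer(text, truncation=True, padding="max_length",
                       max_length=128)
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = Dataset.from_list(pairs).map(tokenize)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="instruction-tuned",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
)
trainer.train()
```

After tuning, the same raw-completion interface starts producing answer-shaped continuations, because the model has learned that prompts in this template are followed by responses.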