To partially answer your question (I think the answer to "What is happening inside the LLM when it 'switches' to one task or another?" is pretty much "We don't know"): techniques such as RLHF, which nowadays are applied to pretty much any public-facing model you are likely to interact with, make the model act less like something searching for the most likely completion of this sentence on the internet, and more like something trying to answer your questions. Such models will tend to take the "question" interpretation over the "autocomplete" one.
A purely pretrained model is more likely to output a probability distribution over the first token that is split across both possible continuations, and then to stick with whichever interpretation it happened to sample as it autoregresses on its own output.
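You can see this split directly by inspecting a base model's next-token distribution. Below is a minimal sketch, assuming the Hugging Face transformers library and GPT-2 purely as a small stand-in pretrained (non-RLHF) model; any base causal LM would do:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 here is just a convenient stand-in for "a purely pretrained model".
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "what is the best car"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(10)

# Typically the mass is split between "answer-like" and "continuation-like"
# tokens; whichever gets sampled first tends to lock in one interpretation
# for the rest of the generation.
for p, i in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(i.item())!r}: {p.item():.3f}")
```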
My understanding of LLMs:
I have a very basic understanding of LLMs and the underlying transformer architecture. As I understand it, these LLMs are, as an analogy, basically a much-improved version of autocomplete. During training the model is shown a word sequence with some words masked out and is asked to guess the correct word. If it guesses wrong, the model is adjusted to better predict the correct word the next time.
The attention mechanism is trained to give the LLM an idea of which previous words are most important to pay attention to when predicting the masked word.
That's all the LLM learns: which previous words to pay attention to and, in that context, how to predict the next/masked word.
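(For concreteness, my mental model of that attention step is roughly the following; a minimal single-head sketch in PyTorch, where the shapes and names are just illustrative, not any particular model's implementation.)

```python
import math
import torch

def attention(q, k, v, causal_mask=None):
    # q, k, v: (seq_len, d) -- one query/key/value vector per token
    scores = q @ k.T / math.sqrt(q.shape[-1])        # how strongly each token "looks at" every other token
    if causal_mask is not None:
        scores = scores.masked_fill(causal_mask, float("-inf"))  # forbid attending to future tokens
    weights = torch.softmax(scores, dim=-1)          # the learned "which previous words matter"
    return weights @ v                               # weighted mix of the value vectors

seq_len, d = 5, 16
q, k, v = (torch.randn(seq_len, d) for _ in range(3))
future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
print(attention(q, k, v, causal_mask=future).shape)  # torch.Size([5, 16])
```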
During inference/deployment, the user prompts the LLM with some words and the model outputs a hopefully sensible response.
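(Again for concreteness, my picture of inference is a loop like the sketch below: the model only ever predicts the next token, and the "response" is those predictions fed back in one at a time. This assumes the Hugging Face transformers library, with GPT-2 as a small stand-in model.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("what is the best car", return_tensors="pt").input_ids
for _ in range(15):                                    # generate 15 tokens
    with torch.no_grad():
        logits = model(ids).logits                     # (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()                   # greedy: take the most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)  # append it and repeat

print(tokenizer.decode(ids[0]))
```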
In some models there are two separate attention mechanisms: 1) attention to the user prompt and 2) attention to the model output generated so far.
Question:
How does the LLM know what task the prompt is asking it to do?
I could want the model to continue/extend my prompt sentence, OR I could want the model to answer a question.
Examples:
* Prompt: what is the best car
- If this was intended as a question, the answer might be: Toyota or Tesla, maybe embedded in a whole sentence.
- If this was intended as a prompt to be continued, the continuation might be: "for big families" or "for going on a camping vacation"
* Prompt: Harry Potter walked into the halls of Hogwarts and guess who he found there, sitting on the stairs.
- If this was intended as a question, the answer might be: Hermione, maybe embedded in a whole sentence.
- If this was intended to be a prompt to be continued it might be followed up by a page of Harry Potter fan fiction.
I do understand that prompt engineering is an important part of using these LLMs. But in the above examples there is no refined prompting; it's just a single unstructured sentence, with no preceding examples (that would be few-shot) and no keywords like "please translate the following".
And even with few-shot examples in the prompt, I don't have a good intuition for what is happening inside the model that makes it do the expected task. I mean, LLMs are just trained to do next-word prediction. How does this understanding of the intended task still work so well? What is happening inside the LLM when it "switches" to one task or another?
I'd be grateful for an answer or links to a relevant article.
Thanks.