A point against that particular terminology that you might find interesting: https://www.lesswrong.com/posts/6bpW2kyeKaBtuJuEk/why-i-hate-the-accident-vs-misuse-ai-x-risk-dichotomy-quick
To partially answer your question (I think the honest answer to "What is happening inside the LLM when it 'switches' to one task or another?" is pretty much "We don't know"): techniques such as RLHF, which nowadays are applied to pretty much any public-facing model you are likely to interact with, cause the model to act less like something searching for the most likely completion of a sentence on the internet, and more like something that is trying to answer your questions. These models will take the "question" interpretation over the "autocomplete" one.
How hard was it to find the examples of goal misgeneralization? Did the results take much “coaxing”?
I'd like to use this based on your description, but it's tough to trust or understand without a README!