Hi LessWrong! This is my first post here, sharing my first piece of mechanistic interpretability work.

I studied in-context learning in Llama2. The idea was to look at what happens when we associate two concepts in the LLM's context, an object (e.g. "red square") and a label (e.g. "Bob"): how is that information transmitted through the model?
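To make the setup concrete, here is a minimal sketch of the kind of toy prompt I mean, written against the Hugging Face transformers API. The checkpoint name, the prompt wording, and the idea of eyeballing last-position attention are illustrative assumptions, not the actual experiment from this post.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# "eager" attention so the model can return per-head attention weights
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, attn_implementation="eager")
model.eval()

# A toy context that associates objects with labels, then queries one association.
prompt = (
    "Bob has a red square. Alice has a blue circle. Carol has a green triangle.\n"
    "Q: Who has the red square?\nA:"
)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one (batch, heads, query_pos, key_pos) tensor per layer.
# Where the final position attends in late layers hints at where the
# object-label association is being read from.
last_layer_attn = out.attentions[-1][0]      # (heads, query_pos, key_pos)
final_pos_attn = last_layer_attn[:, -1, :]   # attention from the last token
top_keys = final_pos_attn.mean(0).topk(5).indices
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0][top_keys].tolist()))
```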

In this toy example, I found several interesting things:

  • information about the association is passed by reference, not by value: what gets passed forward is a pointer saying "this information is here", and the information itself is loaded later (a sketch of the kind of patching experiment that would probe this follows the list)
  • the reference is not a token position but rather the semantic location of the information in question (i.e. "the third item in the list"). I suspect later heads load a "cloud" of data around that location, and that this is mediated by punctuation or other markers of structure (which may also be part of why frontier models are so sensitive to prompt structure)
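To illustrate what "passed by reference" would look like experimentally, here is a hedged activation-patching sketch, again against the Hugging Face transformers API. The prompts, the layer index, the patched position, and the hook mechanics are all assumptions chosen for illustration; they are not the post's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# Two prompts that differ only in which label owns which object.
clean_prompt = "Bob has a red square. Alice has a blue circle. Q: Who has the red square? A:"
corrupt_prompt = "Bob has a blue circle. Alice has a red square. Q: Who has the red square? A:"
clean_ids = tok(clean_prompt, return_tensors="pt")["input_ids"]
corrupt_ids = tok(corrupt_prompt, return_tensors="pt")["input_ids"]
assert clean_ids.shape == corrupt_ids.shape  # same length, so positions line up

LAYER, POSITION = 16, 4  # illustrative: a middle layer, a position inside the first list item

def hidden_of(output):
    # Decoder layers return a tuple in most transformers versions; tolerate both forms.
    return output[0] if isinstance(output, tuple) else output

# 1) Cache the clean run's residual stream at one (layer, position).
cache = {}
def save_hook(module, args, output):
    cache["resid"] = hidden_of(output)[:, POSITION, :].detach().clone()

handle = model.model.layers[LAYER].register_forward_hook(save_hook)
with torch.no_grad():
    model(clean_ids)
handle.remove()

# 2) Rerun on the corrupted prompt, patching that single activation back in.
def patch_hook(module, args, output):
    hidden = hidden_of(output).clone()
    hidden[:, POSITION, :] = cache["resid"]
    return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden

handle = model.model.layers[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    logits = model(corrupt_ids).logits[0, -1]
handle.remove()

# If restoring one early position is enough to pull the answer back toward "Bob",
# that is evidence the association is carried as a reference to that location,
# rather than being copied by value into every downstream position.
print(tok.decode(logits.argmax().item()))
```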