All of danielbalsam's Comments + Replies

Great post -- thanks for sharing. I am trying to replicate this work and was able to do so for several models, but I am having a lot of trouble reproducing it for the Llama 3 models. I can sometimes succeed on some narrow prompts but not on others. Do you have any suggestions, or is there anything else non-obvious about that model family?

2Andy Arditi
The most finicky part of our methodology (and the part I'm least satisfied with currently) is in the selection of a direction. For reproducibility of our Llama 3 results, I can share the positions and layers where we extracted the directions from:

* 8B: (position_idx = -1, layer_idx = 12)
* 70B: (position_idx = -5, layer_idx = 37)

The position indexing assumes the usage of this prompt template, with two new lines appended to the end.
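For anyone trying to reproduce this, here is a minimal sketch of how the (position_idx, layer_idx) pairs above could be used. It assumes the candidate direction is a unit-normalized difference of mean activations between harmful and harmless prompts, taken at a single layer and token position. The nested-list activation layout, the function names, and the exact meaning of `layer_idx` (which point in the layer the activation is read from) are hypothetical choices for illustration, not the authors' code.

```python
import math

# Extraction sites quoted in the reply above: (position_idx, layer_idx) per model.
EXTRACTION_SITES = {
    "8B": {"position_idx": -1, "layer_idx": 12},
    "70B": {"position_idx": -5, "layer_idx": 37},
}


def mean_activation(acts, layer_idx, position_idx):
    """Average the activation vector at one (layer, position) over all prompts.

    Hypothetical layout: acts[prompt][layer][position] is a list of floats.
    A negative position_idx counts back from the end of the tokenized prompt.
    """
    vectors = [prompt[layer_idx][position_idx] for prompt in acts]
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def candidate_direction(harmful_acts, harmless_acts, layer_idx, position_idx):
    """Unit-norm difference of means at the chosen site (assumed method)."""
    mu_harmful = mean_activation(harmful_acts, layer_idx, position_idx)
    mu_harmless = mean_activation(harmless_acts, layer_idx, position_idx)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("harmful and harmless means coincide at this site")
    return [x / norm for x in diff]
```

Because the position index is negative, it only lines up with the intended token if the prompts are formatted with the same template, including the trailing new lines mentioned above.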

Hi! I am in the process of reading this sequence and would love some supplemental lecture materials (particularly on the intersection with alignment research). I was very excited by the prospect of the lectures from the June summit, but the YouTube channel appears to 404 now. Is there somewhere else I can listen to these lectures?

2Liam Carroll
Ah! Thanks for that - it seems the general playlist organising them has splintered a bit, so here is the channel containing the lectures, the structure of which is explained here. I'll update this post accordingly.