This is incredible—the most hopeful I’ve been in a long time. 20% of current compute and plans for a properly-sized team! I’m not aware of any clearer or more substantive sign from a major lab that they actually care, and are actually going to try not to kill everyone. I hope that DeepMind and Anthropic have great things planned to leapfrog this!
Agreed—thanks for writing this. I have the sense that there's somewhat of a norm that goes like 'it's better to publish something than not, even if it's unpolished,' and while this is not wrong, exactly, I think those who are doing this professionally, or seek to do this professionally, ought to put in the extra effort to polish their work.
I am often reminded of this Jacob Steinhardt comment.
Researchers are, in a very substantial sense, professional writers. It does no good to do groundbreaking research if you are unable to communicate what you have done and why it matters to your field. I hope that the work done by the AI existential safety community will attract the attention of the broader ML community; I don't think we can do this alone. But for that to happen, there needs to be good work and it must be communicated well.
It’s great to see that these techniques basically work at scale, but not so much to hear that things remain messy. Do you have any intuition for whether things would start to clean up if the model were trained until the loss curve flattened out? Maybe Chinchilla-optimality even has some interesting bearing on this!