Sean Osier

LLMs Universally Learn a Feature Representing Token Frequency / Rarity

Summary
* LLMs appear to universally learn a feature in their embeddings representing the frequency / rarity of the tokens they were trained on
* This feature is observed across model sizes, in base models, instruction-tuned models, regular text models, and code models
* In models without tied weights,...

Jun 30, 2024 13

Mathematical Circuits in Neural Networks

(Also posted on the EA Forum)
This is one of my final projects for the Columbia EA Summer 2022 Project Based AI Safety Reading Group (special thanks to facilitators Rohan Subramini and Gabe Mukobi). If you're curious, you can find my other project here.
Summary
In this project, I: 1....

Sep 22, 2022 34
