afspies


PhD Student at Imperial College London. Neurosymbolic AI and Mechanistic Interpretability. Looking forward to spending my retirement as a paperclip. https://afspies.com


Understanding mesa-optimization using toy models

Overview
* Solving the problem of mesa-optimization would probably be easier if we understood how models perform search internally.
* We are training GPT-type models on the toy task of solving mazes and studying them from both a mechanistic interpretability and a behavioral perspective.
* This post lays out our model...

May 7, 2023

LESSWRONG

Overview

Introduction