Georgios Kaklamanos

149

Searching for a model's concepts by their shape – a theoretical framework

Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort. Introduction: I think that Discovering Latent Knowledge in Language Models Without Supervision (DLK; Burns, Ye, Klein, & Steinhardt, 2022) is a very cool paper – it proposes a way to do unsupervised mind reading[1] –...

Feb 23, 2023

[RFC] Possible ways to expand on "Discovering Latent Knowledge in Language Models Without Supervision".

Preface: We would like to thank the following people who contributed to the generation of ideas and provided feedback on this post: Alexandre Variengien, Daniel Filan, John Wentworth, Jonathan Claybrough, Jörn Stöhler, June Ku, Marius Hobbhahn, and Matt MacDermott. We are a group of four who participate in SERI ML...

Jan 25, 2023

LESSWRONG