Liv Gorton

Replying to A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team

This is a great post! Thank you for writing this up :)

On training SAEs on ConvNets - I recently trained SAEs for all layers of InceptionV1. I've written up a paper on some of the findings in early vision, with a specific focus on curve detectors (Twitter thread on the paper and another on some branch specialisation related findings). The features look really good across the entire model. That includes interpretable, monosemantic features in the final layer, which, to the best of my knowledge, hasn't been done before and is really exciting! I'm hoping to put out a blog post focusing on the final layer in the next couple of weeks (including circuit analysis between the last few layers).

Being able to say we fully understand any real neural network would be such a huge step forward for the field, and it seems like with SAEs we are well-positioned to actually achieve this goal now.