My website is here.
What is the error message?
Yep, this sounds interesting! My suggestion for anyone wanting to run this experiment would be to start with SAD-mini, a subset of SAD with the five most intuitive and simple tasks. It should be fairly easy to adapt our codebase to call the Goodfire API. Feel free to reach out to me or @L Rudolf L if you want assistance or guidance.
How do you know what "ideal behaviour" is after you steer or project out your feature? How would you differentiate between a feature with sufficiently high cosine similarity to a "true model feature" and the "true model feature" itself? I agree you can get some signal on whether a feature is causal, but would argue this is not ambitious enough.
Yes, that's right -- see footnote 10. We think that Transcoders and Crosscoders are directionally correct, in the sense that they leverage more of the model's functional structure via activations from several sites, but agree that their vanilla versions suffer from similar problems to regular SAEs.
Also related to the idea that the best linear SAE encoder is not the transpose of the decoder.
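To make that concrete, here is a minimal toy sketch, not taken from any SAE codebase. It assumes a purely linear setup (no ReLU, no bias), two non-orthogonal unit-norm decoder directions, and synthetic sparse activations; all names and numbers are made up for illustration. It compares a tied encoder (the decoder transpose) against the least-squares linear encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy decoder: two unit-norm, non-orthogonal feature directions (rows).
angle = np.pi / 3
decoder = np.array([[1.0, 0.0],
                    [np.cos(angle), np.sin(angle)]])

# Synthetic sparse, non-negative feature activations (each active ~30% of the time).
n_samples = 1000
active = rng.random((n_samples, 2)) < 0.3
features = active * rng.random((n_samples, 2))

# Model activations are linear combinations of the decoder directions.
activations = features @ decoder

# Encoder 1: tied weights, i.e. the transpose of the decoder.
tied_encoder = decoder.T
tied_error = np.mean((activations @ tied_encoder - features) ** 2)

# Encoder 2: the linear map that best recovers the features in the least-squares sense.
best_encoder, *_ = np.linalg.lstsq(activations, features, rcond=None)
best_error = np.mean((activations @ best_encoder - features) ** 2)

print("tied encoder error:         ", tied_error)
print("least-squares encoder error:", best_error)
print("encoders equal?", np.allclose(best_encoder, tied_encoder))
```

In this toy, the tied encoder picks up interference from the overlapping direction, while the least-squares encoder cancels it and ends up different from the decoder transpose. Real SAEs are overcomplete and nonlinear, so this only illustrates the intuition.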
Agreed that this post presents the altruistic case.
I discuss both the money and status points in the "career capital" paragraph (though perhaps I should have factored them out).
your image of a man with a huge monitor doesn't quite scream "government policymaker" to me
As a general rule, I try to minimise my phone screen time and maximise my laptop screen time. I can do every "productive" task faster on a laptop than on my phone.
Here are some object-level things I do that I find helpful and that I haven't yet seen discussed.