What is the error message?
Yep, this sounds interesting! My suggestion for anyone wanting to run this experiment would be to start with SAD-mini, a subset of SAD with the five most intuitive and simple tasks. It should be fairly easy to adapt our codebase to call the Goodfire API. Feel free to reach out to me or @L Rudolf L if you want assistance or guidance.
How do you know what "ideal behaviour" is after you steer or project out your feature? How would you differentiate between a feature with sufficiently high cosine sim to a "true model feature" and the "true model feature" itself? I agree you can get some signal on whether a feature is causal, but would argue this is not ambitious enough.
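To make the worry concrete, here's a minimal synthetic sketch (all directions and numbers are made up) of how ablating a feature with high cosine sim to the "true" direction still leaves a different residual stream than ablating the true direction itself:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # hypothetical residual stream width

# A made-up "true model feature" direction, and an SAE feature that is close but not identical.
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

noise = rng.normal(size=d)
noise -= (noise @ true_dir) * true_dir      # keep only the component orthogonal to true_dir
noise /= np.linalg.norm(noise)

cos_sim = 0.95                              # "sufficiently high" cosine similarity
sae_dir = cos_sim * true_dir + np.sqrt(1 - cos_sim**2) * noise

def project_out(acts, direction):
    """Remove the component of each activation along `direction`."""
    return acts - np.outer(acts @ direction, direction)

acts = rng.normal(size=(1000, d))           # stand-in activations
resid_true = project_out(acts, true_dir)
resid_sae = project_out(acts, sae_dir)

# The two ablations disagree by a non-trivial amount, so downstream behaviour can differ
# even though the directions look nearly identical by cosine sim.
print(np.linalg.norm(resid_true - resid_sae) / np.linalg.norm(resid_true))
```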
Yes, that's right -- see footnote 10. We think that Transcoders and Crosscoders are directionally correct, in the sense that they leverage more of the model's functional structure via activations from several sites, but agree that their vanilla versions suffer similar problems to regular SAEs.
Also related to the idea that the best linear SAE encoder is not the transpose of the decoder.
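A toy sketch of that point (i.i.d. sparse features and unit-norm decoder columns, all synthetic): when decoder directions aren't orthogonal, reading features off with the decoder transpose picks up interference, and even a plain pseudo-inverse encoder does better.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feats, d_model, n_samples = 64, 32, 10_000

# Non-orthogonal, unit-norm decoder directions (features in superposition).
D = rng.normal(size=(d_model, n_feats))
D /= np.linalg.norm(D, axis=0)

# Sparse, i.i.d. ground-truth feature activations.
f = rng.normal(size=(n_feats, n_samples)) * (rng.random((n_feats, n_samples)) < 0.05)
x = D @ f

f_hat_transpose = D.T @ x            # encoder = decoder transpose
f_hat_pinv = np.linalg.pinv(D) @ x   # best *linear* encoder under these i.i.d. assumptions

print("decoder-transpose error:", np.linalg.norm(f_hat_transpose - f))
print("pseudo-inverse error:   ", np.linalg.norm(f_hat_pinv - f))
```

Neither encoder is perfect in superposition, but the gap illustrates that the transpose isn't the best linear choice, and a trained (nonlinear) SAE encoder can do better still.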
Agreed that this post presents the altruistic case.
I discuss both the money and status points in the "career capital" paragraph (though perhaps should have factored them out).
your image of a man with a huge monitor doesn't quite scream "government policymaker" to me
In fact, this mindset gave me burnout earlier this year.
I relate pretty strongly to this. I think almost all junior researchers are incentivised to 'paper grind' for longer than is correct. I do think there are pretty strong returns to having one good paper for credibility reasons; it signals that you are capable of doing AI safety research, and thus makes it easier to apply for subsequent opportunities.
Over the past 6 months I've dropped the paper grind mindset and am much happier for this. Notably, were it not for short term grants where needing to visib...
You might want to stop using the Honey extension. Here are some shady things they do, beyond the usual:
any update on this?
thanks! added to post
UC Berkeley has historically had the largest concentration of people thinking about AI existential safety. It's also closely coupled to the Bay Area safety community. I think you're possibly underrating Boston universities (i.e. Harvard and Northeastern, as you say the MIT deadline has passed). There is a decent safety community there, in part due to excellent safety-focussed student groups. Toronto is also especially strong on safety imo.
Generally, I would advise thinking more about advisors with aligned interests over universities (this relates to Neel's...
Agreed. A related thought is that we might only need to be able to interpret a single model at a particular capability level to unlock the safety benefits, as long as we can make a sufficient case that we should use that model. We don't care inherently about interpreting GPT-4; we care about there existing a GPT-4-level model that we can interpret.
Tangentially relevant: this paper by Jacob Andreas' lab shows you can get pretty far on some algorithmic tasks by just training a randomly initialized network's embedding parameters. This is in some sense the opposite to experiment 2.
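For anyone who wants to poke at this, here's a minimal PyTorch sketch of the general setup (toy model and names are mine, not the paper's): freeze a randomly initialised body and train only the embedding table.

```python
import torch
import torch.nn as nn

vocab_size, d_model, n_classes = 1000, 128, 10

embed = nn.Embedding(vocab_size, d_model)   # the only trainable parameters
body = nn.Sequential(                       # stays at its random initialisation
    nn.Linear(d_model, d_model), nn.ReLU(),
    nn.Linear(d_model, n_classes),
)
for p in body.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(embed.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_step(tokens, labels):
    # tokens: (batch, seq_len) int64, labels: (batch,) int64
    h = embed(tokens).mean(dim=1)           # crude pooling over the sequence
    loss = loss_fn(body(h), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```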
I don't think it's actually great for the post-60 case, as compared with a regular pension; see my reply. The comment on asset tests is useful though, thanks. Roughly, LISA assets count towards many such tests, while pensions don't. More details here for those interested: https://www.moneysavingexpert.com/savings/lifetime-isas/
Couple more things I didn't explain:
Should you invest in a Lifetime ISA? (UK)
The Lifetime Individual Savings Account (LISA) is a government savings scheme in the UK intended primarily to help individuals between the ages of 18 and 50 buy their first home (among a few other things). You can hold your money either as cash or in stocks and shares.
The unique selling point of the scheme is that the government will add a 25% bonus on contributions of up to £4,000 per year. However, this comes with several restrictions. The account is intended to only be used for the following purposes:
1) to buy your firs...
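As a quick worked example of the bonus arithmetic (just restating the limits above):

```python
contribution = 4_000                       # GBP paid into the LISA this tax year
bonus = 0.25 * min(contribution, 4_000)    # government adds 25%, i.e. £1,000
print(contribution + bonus)                # £5,000 ends up in the account
```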
As a general rule, I try and minimise my phone screen time and maximise my laptop screen time. I can do every "productive" task faster on a laptop than on my phone.
Here are some object-level things I do that I find helpful and that I haven't yet seen discussed.