Anthropic's approach doesn't seem to have panned out
Please don't take that tweet as evidence that mech interp is doomed! Much attention is on sparse autoencoders nowadays, which seem like a cool and promising approach.
Thanks! I will separately say that I disagree with the statement, regardless of whether you're treating my tweet as evidence.
In what sense do you consider the mech interp paradigm that originated with Olah to be working?
We are finding a bunch of insights about the internal features and circuits inside models that I believe to be true, and developing useful techniques, like sparse autoencoders and activation patching, that expand the space of what we can do. We're starting to see signs of life when it comes to actually doing things with mech interp, though it's early days. I think skepticism is reasonable, and we're still far from actually mattering for alignment, but I feel like the field is making real progress and is far from failed.
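(For readers unfamiliar with the technique: a sparse autoencoder here is trained to reconstruct a model's internal activations through an overcomplete hidden layer with a sparsity penalty, so that individual hidden units tend to align with interpretable features. A minimal sketch in PyTorch, where the dimensions, penalty weight, and random stand-in activations are illustrative assumptions, not any lab's actual setup:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 sparsity penalty on hidden activations."""
    def __init__(self, d_model=512, d_hidden=2048):  # d_hidden >> d_model (overcomplete)
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        h = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(h), h

# Illustrative training step: reconstruct activations, penalize dense codes.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                      # assumed sparsity weight

acts = torch.randn(64, 512)          # stand-in for residual-stream activations
recon, h = sae(acts)
loss = ((recon - acts) ** 2).mean() + l1_coeff * h.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The interpretability work then happens downstream of training: inspecting which inputs make each hidden unit fire, and whether that pattern corresponds to a human-legible feature.)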
I think Tsvi is quite mistaken about the speed at which AGI is likely to develop. I expect AGI by 2028 with ~95% probability. He does not. Maybe we should dialogue about this?
Sure, though if you're just going to say "I know how to do it! Also I won't tell you!" then it doesn't seem very pointful?
"Endpoints are easier to predict than trajectories"; eventual singularity is such an endpoint; on our current trajectory, the person who is going to do it does not necessarily know they are going to do it until it is done.
I don't know if you're a woman, but the women I know have had much more success in politics than the men I know.
Not a woman, sadly.
I believe it, especially if one takes a view of "success" that's about popularity rather than fiat power.
But FYI to future advisors: the thing I would want to prospectively optimize for along the gov path, when making this decision, is fiat power. I'm highly uncertain about whether viable paths exist from a standing start to [benevolent] bureaucratic fiat power over AI governance, and if so, where those viable paths originate.
If it were just about reach, I'd probably look for a columnist position instead.
I don't have much in the way of good ideas for you to try next. I will, however, link you to my viewpoint on what the next few years probably look like.
Hi! I'm Lorec, AKA Mack. I made this post 3 years ago:
Wanted: Foom-scared alignment research partner
I met some great people, but we never got much of anywhere.
Since then, technical alignment research in general also has not gotten much of anywhere [ counterexample; other strongish counterexamples I know of include the Visible Thoughts idea and Pliny's approach; Anthropic's approach doesn't seem to have panned out ] and AI doom aversion policy has become a thing.
I made a Discord a while ago for discussion of doom aversion methods. We were some of the first people [to my knowledge] talking positively about SB-1047. I consider it a failure: we were early and correct, but because we were not plugged into any network, nothing came of it.
I am indifferent between technical and policy work, except to the extent that [ the effectiveness factor over the risk factor of [technical work in general] ] differs from [ the effectiveness factor over the risk factor of [policy work in general] ].
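A minimal formalization of that criterion, assuming effectiveness and risk can each be scalarized ($E$ and $R$ are my shorthand, not standard terms):

$$\text{prefer technical work} \iff \frac{E_{\text{tech}}}{R_{\text{tech}}} > \frac{E_{\text{policy}}}{R_{\text{policy}}}$$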
Factors I consider, coming in, to be important contributors to the technical-versus-policy weigh-in:
Pro Technical
- Can potentially have low safety risks if the researcher knows exactly what they are doing and does not use their employer's money to contribute to capabilities
- Can potentially have high safety upsides if the researcher knows exactly what they are doing and is a paranoid saint and can work without ever posting their exciting intermediate results on social media [difficulty level: impossible]
- Technical experience lends [any] policy credibility, while policy experience does not lend technical credibility
Pro Policy
- Fairly safe, for people who have a reasonable level of knowing what they are doing
- Policy jobs [from my faraway position; this might be wrong] seem likely to be more fungible [with each other] than technical jobs, resulting in less risk of being locked into one employer whose mission I find myself disagreeing with
- I expect to have an easier time getting one of these kinds of jobs; while I consider myself decent enough at programming to be qualified in principle for such technical alignment research as is hiring, in practice I have no degree, job history, or portfolio, and am done wasting my time trying to acquire them, like, no, really, done. End of story.
Who should I talk to? What movements or orgs should I look into? Where are Things Happening the most? As stated in the title, all my spoons are available for this, provided I find something that's actually high prospective impact and low prospective risk.
I appreciate your time and consideration.