However, many of these people might not have a sufficient “toolbox” or research experience to have much marginal impact in short timelines worlds.
I think this is true for some people, but I also think people tend to overestimate the number of years of research experience it takes to be able to contribute.
I think a few people have been able to make useful contributions within their first year (though in fairness they generally had backgrounds in ML or AI, so they weren't starting completely from scratch), and several highly respected senior researchers have just a few years of research experience (and, on average, they had less access to mentorship and infrastructure than people entering the field today).
I also think people often overestimate the amount of time it takes to become an expert in a specific area relevant to AI risk (like subtopics in compute governance, information security, etc.).
Finally, I think people should try to model community growth & neglectedness of AI risk in their estimates. Many people have gotten interested in AI safety in the last 1-3 years. I expect that many more will get interested in AI safety in the upcoming years. Being one researcher in a field of 300 seems more useful than being one researcher in a field of 1500.
With all that in mind, I really like this exercise, and I expect that I'll encourage people to do this in the future:
- Write out your credences for AGI being realized in 2027, 2032, and 2042;
- Write out your plans if you had 100% credence in each of 2027, 2032, and 2042;
- Write out your marginal impact in lowering P(doom) via each of those three plans;
- Work towards the plan that is the argmax of your marginal impact, weighted by your credence in the respective AGI timelines.
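The weighting in the last step can be sketched in a few lines. All of the numbers below are hypothetical placeholders, purely to illustrate the credence-weighted argmax:

```python
# Hypothetical credences that AGI arrives by each year (illustrative only).
credences = {2027: 0.2, 2032: 0.4, 2042: 0.3}

# Hypothetical estimates of each plan's marginal reduction in P(doom),
# conditional on AGI arriving in the corresponding year.
marginal_impact = {2027: 0.001, 2032: 0.005, 2042: 0.02}

# Expected impact of each plan = credence * conditional marginal impact.
expected_impact = {year: credences[year] * marginal_impact[year]
                   for year in credences}

# Work towards the plan that is the argmax of expected impact.
best_plan = max(expected_impact, key=expected_impact.get)
print(best_plan)  # 2042 in this toy example: 0.3 * 0.02 = 0.006
```

In reality each plan may help across several timeline scenarios, so the full version would sum a plan's credence-weighted impact over all three years before taking the argmax; the toy version above matches the simplified three-plan framing.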
[Note: written on a phone, quite rambly and disorganized]
I broadly agree with the approach; some comments:
Hmm. Since most of my probability mass is in the <5 years range, it seems like this is just going to mislead people into not being helpful at all? Why not do this exercise with the years 2024, 2026, and 2028? What makes you privilege the years you chose to mention?
These years have particular significance in my AGI timelines ranking, and I think they are a good default spread based on community opinion. However, there's no reason you shouldn't choose alternate years!
Suppose everyone followed the argmax approach I laid out here. Are there any ways they might do something you think is predictably wrong?
While teamwork seems to be assumed in the article, I believe it's worth spelling out explicitly that argmaxing for the plan with the highest marginal impact might mean joining or building a team whose collective effort will make the most impact, rather than optimizing for the highest individual contribution.
Spending time to explain why a previous research project failed might help 100 other groups learn from our mistake, so it could be more impactful than pursuing the next shiny idea.
We don't want to optimize for the naive feeling of individual marginal impact; we want to keep in mind that the actual goal is to build an aligned AGI.
This seems basically reasonable, but as stated I think importantly misses that the plan you follow will change the accuracy of your estimates in steps (1) and (3) when you come to reassess. With 100% credence on some year, there's no value in picking a plan that gets you evidence about timelines, or evidence of your likely impact in scenarios you're assuming won't happen.
It's not enough to revisit the plan often if the plan you're following isn't giving you much new evidence.
Explore vs. exploit is a frame I naturally use (though I do like your timeline-argmax frame as well), where I ask myself: "Roughly how many years should I feel comfortable exploring before I really need to be sitting down and attacking the hard problems directly somehow?"
Admittedly, this is confounded a bit by how exactly you're measuring it. If I have 15-year timelines for median AGI-that-can-kill-us (which is about right, for me) then I should be willing to spend 5-6 years exploring by the standard 1/e algorithm. But when did "exploring" start? Obviously I should count my last eight months of upskilling and research as part of the exploration process. But what about my pre-alignment software engineering experience? If so, that's now 4/19 years spent exploring, giving me about three left. If I count my CS degree as well, that's 8/23 and I should start exploiting in less than a year.
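The 1/e stopping-rule arithmetic above can be checked in a few lines (using the commenter's own illustrative figures for the timeline and the years already spent):

```python
import math

def explore_years(total_horizon: float) -> float:
    """Under the standard 1/e stopping rule, spend the first
    horizon/e of the total horizon exploring, then exploit."""
    return total_horizon / math.e

# 15-year median timeline, counting no prior exploration:
print(round(explore_years(15), 1))       # ~5.5 years of exploring

# Counting 4 years of pre-alignment software engineering:
# horizon becomes 19 years, 4 of which are already spent exploring.
print(round(explore_years(19) - 4, 1))   # ~3.0 years of exploring left

# Counting a CS degree as well: horizon 23 years, 8 already spent.
print(round(explore_years(23) - 8, 1))   # ~0.5 years of exploring left
```

This matches the figures in the comment: roughly 5-6 years of exploration on a clean 15-year horizon, about three years left if the last 4/19 years count, and under a year left at 8/23.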
Another frame I like is "hill-climbing" - namely, take the opportunity that seems best at a given moment. Though it is worth asking what makes something the best opportunity if you're comparing, say, maximum impact now vs. maximum skill growth for impact later.
Epistemic status: This model is mostly based on a few hours of dedicated thought, and the post was written in 30 min. Nevertheless, I think this model is probably worth considering.
Many people seem to be entering the AI safety ecosystem, acquiring a belief in short timelines and high P(doom), and immediately dropping everything to work on AI safety agendas that might pay off in short-timeline worlds. However, many of these people might not have a sufficient “toolbox” or research experience to have much marginal impact in short timelines worlds.
Rather than tell people what they should do on the object level, I sometimes tell them:
Some further considerations