(Cross-posted to EA Forum)
TL;DR
If your median estimate for TAI is 30 years away, you might have to plan to solve the alignment problem within 5 years.
When thinking about AI timelines, I like to work out what a given timeline implies for a high-level plan to solve the problem. I want to encourage others to think about and model timelines in similar ways. The following models illustrate the idea but are likely not the best way to think about this.
Planning
A while ago I watched the 2001 film Spy Game starring Robert Redford. There is one line in the film that has stuck with me ever since. Redford's character is questioned about being paranoid and responds “When did Noah build the Ark? Before the rain. Before the rain.”
Had Noah built an ark inland while there was no rain at all, he would have looked like a paranoid fool. When the early warning signs of light rain started, everyone would reasonably have argued that there wouldn’t be a flood. As the rain got heavier, people would have reasoned that it must be about to reach its peak and so would likely end soon. By the time the water level started to rise, it would have been too late to start building a raft, let alone an ark. There is some point of no return past which an event that has not yet happened, and was once preventable, can no longer be prevented.
Example:
(The numbers used in this example are chosen more for ease of demonstration than for being well thought out.)
I’ll assume a 10% probability of TAI within 15 years and a 50% probability within 30 years, that is, 10% by 2036 and 50% by 2051.
Using the precautionary principle, we don’t want to plan to finish solving the problem somewhere around the 50% mark, because by then there is roughly an even chance that TAI has already arrived and we are too late. When considering AI risk, the “could happen as early as” date matters much more. So pick some point early in the probability distribution and make that your due date. In this example I’m going with 10%, but pick whatever makes sense to you here.
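To make "pick some point early in the distribution" concrete, here is a minimal sketch that turns a chosen percentile into a due year. It uses only the two example points above (10% by 2036, 50% by 2051). The straight-line interpolation between them is purely my assumption for illustration, not a claim about the real shape of any forecast, and the name `due_year` is hypothetical.

```python
# Cumulative forecast from the example: (probability of TAI by that year, year).
FORECAST = [(0.10, 2036), (0.50, 2051)]


def due_year(percentile, forecast=FORECAST):
    """Return the year at which the cumulative probability reaches `percentile`.

    Assumption: probability grows linearly between the listed points.
    Percentiles outside the listed range are rejected rather than extrapolated.
    """
    points = sorted(forecast)
    if len(points) < 2:
        raise ValueError("need at least two forecast points to interpolate")
    if not points[0][0] <= percentile <= points[-1][0]:
        raise ValueError("percentile is outside the range covered by the forecast")
    for (p0, y0), (p1, y1) in zip(points, points[1:]):
        if p0 <= percentile <= p1:
            return y0 + (percentile - p0) / (p1 - p0) * (y1 - y0)


print(due_year(0.10))  # the early due date used in this example
print(due_year(0.50))  # the median
```

The more risk-averse you are, the lower the percentile you pass in and the earlier the due year comes out.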
So the due date for solving the alignment problem becomes 2036, or ~170 months away. Actually, this is the date by which the solution needs to have been implemented. I assume there will be some time between solving the problem and implementing it. Here I think of a solution as the “whiteboard solution”: the point at which someone has come up with the initial idea and only just sketched it out on a whiteboard. I am going to assume the gap between that point and implementation is 5 years.
So the due date for solving the alignment problem becomes 2031, or ~110 months away. If we take the planning fallacy into account, though, we would be quite optimistic to think we would finish the project on time. We need a much more aggressive internal target. Here I will cut the project time in half, removing another 5 years.
So the due date for solving the alignment problem becomes 2026, or ~52 months away. From here you might start planning these 52 months in more granular, inside-view detail. For some people, at this level (and with these example numbers), the plan might be 1) study, 2) research, which could put some pressure on how much time you can spend on schooling and still get anything else done.
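The back-planning steps in this example can be written out as a short sketch. The start year of 2021 is inferred from "15 years" corresponding to 2036. The names `Plan` and `back_plan` are hypothetical, and the 5-year implementation gap and the halving for the planning fallacy are the example's illustrative numbers, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    implemented_by: int      # date the solution must be deployed (the early percentile)
    whiteboard_by: int       # date the initial idea must exist
    internal_target: float   # aggressive target after the planning-fallacy haircut


def back_plan(start_year, implemented_by, implementation_years, schedule_fraction):
    """Work backwards from a deployment deadline to an internal target year.

    schedule_fraction is the share of the available time you actually plan
    to use (0.5 means "cut the project time in half").
    """
    whiteboard_by = implemented_by - implementation_years
    years_available = whiteboard_by - start_year
    if years_available <= 0:
        raise ValueError("no time left between the start year and the whiteboard deadline")
    internal_target = start_year + years_available * schedule_fraction
    return Plan(implemented_by, whiteboard_by, internal_target)


# Example numbers from the text: 10% point at 2036, 5 years to implement,
# then halve the remaining 10 years.
plan = back_plan(start_year=2021, implemented_by=2036,
                 implementation_years=5, schedule_fraction=0.5)
print(plan)
```

Swapping in your own percentile date, implementation gap, and haircut shows quickly how sensitive the internal target is to each assumption.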
The top-line result of AI timeline work tends to be the median probability point. What I show here is that this point can seem deceptively far away in terms of how urgently we need to take action. In this example I quite quickly went from a 2051 median to a 2026 target and greatly increased my sense of urgency.
What I'm trying to say is that it's much harder to do AI alignment research while models are still small, so TAI timelines somewhat dictate the progress of AI alignment research. If I wanted my 5-year plan to have the best chance of success, I would have "test this on a dog-intelligence-level AI" in my plan, even if I thought that such an AI probably wouldn't arrive by 2036, because having it would make AI alignment research much easier.