(Cross-posted to EA Forum)

TL;DR

If your median estimate for TAI is 30 years away, you might have to plan to solve the alignment problem in 5 years.



 

When thinking about AI timelines, I like to think about the timeline and high-level plan for solutions that follow from them. I want to encourage others to think about and model timelines in similar ways. The following models illustrate the idea but are likely not the best way to think about this.

 

Planning

A while ago I watched the 2001 film Spy Game starring Robert Redford. There is one line in the film that has stuck with me ever since. Redford's character is questioned about being paranoid and responds “When did Noah build the Ark? Before the rain. Before the rain.” 

Noah building an ark inland while there was no rain at all would have looked like a paranoid fool. When the early warning signs of light rain start, everyone reasonably argues that there won’t be a flood. As the rain gets heavier, people reason that it must be about to reach its peak and so will likely end soon. By the time the water level starts to rise, it is too late to start building a raft, let alone an ark. There is some point of no return after which an otherwise preventable event that has not yet happened can no longer be prevented.

 

Example:

(The numbers used in this example are chosen more for ease of demonstration than for being well thought out.)

 

I’ll assume there is a 10% probability of TAI within 15 years and 50% within 30 years, that is, 10% by 2036 and 50% by 2051.

Using the precautionary principle, we don’t want to plan to finish solving the problem somewhere around the 50% mark, because this leaves plenty of room to be too late. When considering AI risk, the “could happen as early as” date matters much more. So pick some point early in the probability distribution and make that your due date. In this example I’m going with 10%, but pick whatever makes sense to you here.

So the due date for solving the alignment problem becomes 2036, or ~170 months away. Actually, this is the date by which the solution needs to have been implemented. I assume there will be some time between solving the problem and implementing it. Here I think of a solution as the “whiteboard solution”: the point at which someone has come up with the initial idea and only just sketched it out on a whiteboard. I am going to assume the gap between that point and implementation is 5 years.

So the due date for solving the alignment problem becomes 2031, or ~110 months away. Considering the planning fallacy, though, it would be quite optimistic to think we will finish the project on time. We need a much more aggressive internal target. Here I will cut the project time in half, removing another 5 years.

So the due date for solving the alignment problem becomes 2026, or ~52 months away. From here you might start planning these 52 months at a more granular, inside-view level of detail. For some people, at this level (and with these example numbers), the plan might be 1) study, 2) research. This could put some pressure on how much time you can spend on schooling if you are to get anything else done.
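The chain of adjustments above can be written out as a small calculation. This is only a sketch of the example, not a recommended model: the function name and parameter names are made up for illustration, and the starting year of 2021 is an assumption inferred from "15 years" mapping to 2036. It works in whole years, so it does not reproduce the approximate month counts given in the text.

```python
def backcast_deadlines(now_year, early_tai_year, implementation_years,
                       planning_fallacy_factor):
    """Work backward from an early-percentile TAI date to an internal target.

    All names and defaults here are illustrative assumptions.
    """
    # The early-percentile TAI date is when a solution must already be implemented.
    implemented_by = early_tai_year
    # The "whiteboard solution" has to exist some years before implementation.
    whiteboard_by = implemented_by - implementation_years
    # Shrink the remaining project time to allow for the planning fallacy.
    years_available = whiteboard_by - now_year
    internal_target = now_year + years_available * planning_fallacy_factor
    return {
        "implemented_by": implemented_by,
        "whiteboard_by": whiteboard_by,
        "internal_target": internal_target,
    }


# Example numbers from the post: 10% point at 2036, 5 years to implement,
# project time cut in half. The year 2021 is an assumed starting point.
plan = backcast_deadlines(
    now_year=2021,
    early_tai_year=2036,
    implementation_years=5,
    planning_fallacy_factor=0.5,
)
print(plan)
```

Changing the percentile date, the implementation gap, or the planning fallacy factor shows how sensitive the internal target is to each assumption.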

The top-line result of AI timeline work tends to be the median probability point. What I show here is that this point can seem deceptively far away in terms of how urgently we need to take action. In this example I quite quickly went from a 2051 median to a 2026 target and greatly increased my sense of urgency.

5 comments

I think that if we saw the working AI alignment solution used in 2050 in a paper written in 2026, we wouldn't be confident it would work. That's because there are a lot of uncertainties about how hard the AI alignment problem is in the first place, how ML behaves when it's scaled up, etc. I think most plans for AI safety need to go like "we make the theory now, then we keep working on it as ML scales up and adapt accordingly".

Yes, if you have a solution in 2026 it isn't likely to be relevant to something used in 2050. But 2026 is the planned solution date and 2050 is the median TAI date. 

The numbers I used above are just to demonstrate the point, though. The broad idea is that coming up with a solution/theory for alignment takes longer than planned. Having a theory isn't enough; you still need some time to make it count. Then TAI might come at the early end of your probability distribution.

It's pretty optimistic to plan that TAI will come at your median estimate and that you won't run into the planning fallacy.

What I'm trying to say is that it's much harder to do AI alignment research while models are still small, so TAI timelines somewhat dictate the progress of AI alignment research. If I wanted my 5 year plan to have the best chance at success, I would have "test this on a dog-intelligence-level AI" in my plan, even if I thought that probably wouldn't arrive by 2036, because that would make AI alignment research much easier.

In the plan and numbers I lay out above, you actually finish friendly AI in 2036, which is the 10% point.

Here is an argument I've heard for why we shouldn't try to solve AI alignment super early:

If you aren't one of the top few AI safety researchers in the world, then you are far more likely to solve AI alignment if you spend some years to develop your skills first. Therefore most people in AI alignment should forsake some early timelines (like anything before 2040) and optimize for their impact once they're a senior researcher.

This would be false if either less experienced AI safety researchers were able to contribute to completing AI alignment in 5 years, or if they could develop skills nearly as well working on a 5 year alignment plan as they could by just optimizing for learning. I think both of these are somewhat true, which weakens the argument for me.