elifland

https://www.elilifland.com/. You can give me anonymous feedback here. I often change my mind and don't necessarily endorse past writings.

Comments


I found some prior relevant work and tagged it in https://www.lesswrong.com/tag/successor-alignment. I found the top few comments on https://www.lesswrong.com/posts/axKWaxjc2CHH5gGyN/ai-will-not-want-to-self-improve#comments and https://www.lesswrong.com/posts/wZAa9fHZfR6zxtdNx/agi-systems-and-humans-will-both-need-to-solve-the-alignment#comments helpful.

edit: another effect to keep in mind is that capabilities research may be harder to sandbag on because of clearer metrics.

Wanted to write a more thoughtful reply to this, but basically yes, my best guess is that the benefits of informing the world are in expectation bigger than the negatives from acceleration. A potentially important background view is that I think takeoff speeds matter more than timelines, and it's unclear to me how having FrontierMath affects takeoff speeds.

I wasn't thinking much about the optics, but I'd guess that's not a large effect. I agree that Epoch made a mistake here, though, and that this is a negative.

I could imagine changing my mind somewhat easily.

I feel like I might be missing something, but conditional on scheming, isn't it differentially useful for safety because by default scheming AIs would be more likely to sandbag on safety research than on capabilities research?

elifland

Yes, that answer matches my understanding of the concern. If the vast majority of the dataset were private to Epoch, OpenAI could occasionally submit their solutions (probably via API) to Epoch to grade, but wouldn't be able to use the dataset as a high-frequency evaluation in many experiments.


This is assuming that companies won’t fish out the data from API logs anyway, which the OP asserts but I think is unclear.


Also if they have access to the mathematicians' reasoning in addition to final answers, this could potentially be valuable without directly training on it (e.g. maybe they could use it to evaluate process-based grading approaches).


(FWIW I'm explaining the negatives, but I disagree with the comment I'm expanding on regarding the sign of FrontierMath; it seems positive EV to me despite the concerns.)

Superforecasters can beat domain experts, as shown in Phil Tetlock's work comparing superforecasters to intelligence analysts.

This isn't accurate; see this post (https://docs.google.com/document/d/1ZEEaVP_HVSwyz8VApYJij5RjEiw3mI7d-j6vWAKaGQ8/edit?tab=t.0#heading=h.mma60cenrfmh), especially (3a), (3b), and Goldstein et al (2015).

elifland

Do you think that cyber professionals would take multiple hours to do the tasks with 20-40 min first-solve times? I'm intuitively skeptical.

Yes, that would be my guess, medium confidence.

One component of my skepticism is that someone told me that the participants in these competitions are less capable than actual cyber professionals, because the actual professionals have better things to do than enter competitions. I have no idea how big that selection effect is, but it at least provides some countervailing force against the selection effect you're describing.

I'm skeptical of your skepticism. I know basically nothing about the CTF scene, but using the competitive programming scene as an example, I think the median competitor is much more capable than the median software engineering professional, not less. People like competing at things they're good at.

elifland

I believe Cybench first-solve times are based on the fastest top professional teams, rather than typical individual CTF competitors or cyber employees, for whom the time to complete would probably be much higher (especially for the latter).

This is clarifying for me, appreciate it. If I believed (a) that we needed a paradigm shift like the one to LLMs in order to get AI systems that produce a substantial AI R&D speedup, and (b) that trend extrapolation from benchmark data would not be informative for predicting these paradigm shifts, then I would agree that the benchmarks + gaps method is not particularly informative.

Do you think that's a fair summary of (this particular set of) necessary conditions?

(edit: didn't see @Daniel Kokotajlo's new comment before mine. I agree with him regarding disagreeing with both sub-claims but I think I have a sense of where you're coming from.)
