All of elifland's Comments + Replies

The timelines model didn't get nearly as many reviews as the scenario. We shared the timelines writeup with all of the people who we shared the later drafts of the scenario with, but I think almost none of them looked at the timelines writeup. 

We also asked a few people to specifically review the timelines forecasts, most notably a few FutureSearch forecasters who we then added as a final author. However, we mainly wanted them to estimate the parameter values and didn't specifically ask them for feedback on the underlying modeling choices (though they... (read more)

GideonF

I suspect these are part of the reasons why the timelines model has drawn more quality-weighted criticism than the scenario:

  • If it is the case that you put far less effort into the timelines model than the scenario, then the timelines model is probably just worse - some of the more obvious mistakes that titotal points out probably don't have analogues in your scenario, so it's just easier to criticise the timelines model, as there is more to criticise there
  • In many ways, the timelines model is pretty key to the headline claim of your scenario. The other parts (scenario and ta
... (read more)
elifland

I'll say various facts as best as I can recall and allow you and others to decide how bad/deceptive the time horizon prediction graph was.

  • The prediction on the graph was formed by extrapolating a superexponential with a 15% decay. This was set to roughly get SC at the right time, based on an estimate for what time horizon is needed for SC that is similar to my median in the timelines forecast. This is essentially a simplified version of our time horizon extension model that doesn't account for AI R&D automation. Or another way to view this is that we c
... (read more)
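For concreteness, here is a minimal sketch of the kind of extrapolation described: each successive time-horizon doubling takes 15% less calendar time than the last. The starting horizon, doubling time, and SC threshold below are illustrative assumptions, not the values behind the actual graph.

```python
# Superexponential time-horizon extrapolation: each doubling is 15% faster
# than the previous one. All inputs here are illustrative assumptions.
horizon_hours = 1.0          # assumed current time horizon
target_hours = 2000.0        # assumed horizon associated with SC (illustrative)
doubling_time_months = 4.5   # assumed current doubling time
decay = 0.15                 # 15% decay in doubling time per doubling

months = 0.0
while horizon_hours < target_hours:
    months += doubling_time_months
    horizon_hours *= 2
    doubling_time_months *= 1 - decay

print(f"Horizon passes {target_hours:.0f}h after ~{months:.1f} months")
# With a constant 15% decay the total time is bounded by 4.5 / 0.15 = 30 months,
# which is what makes the curve superexponential rather than exponential.
```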
TurnTrout

Thanks, I appreciate your comments.

This is essentially a simplified version of our time horizon extension model that doesn't account for AI R&D automation. Or another way to view this is that we crudely accounted for AI R&D automation by raising the decay.

Why did you simplify the model for a graph? You could have plotted a trajectory to begin with, instead of making a bespoke simplification. Is it because you wanted to "represent roughly the trajectory that happens in AI 2027"? I get that AI 2027 is a story, but why not use your real model to sampl... (read more)

TurnTrout
Yes, I think this would have been quite good.
elifland

I'm kind of split about this critique, since the forecast did end up as good propaganda if nothing else. But I do now feel that the marketing around it was kind of misleading, and we probably care about maintaining good epistemics here or something.

I'm interested in you expanding on which parts of the marketing were misleading. Here are some quick more specific thoughts:

  1. Overall AI 2027 comms
    1. In our website frontpage, I think we were pretty careful not to overclaim. We say that the forecast is our "best guess", "informed by trend extrapolations, wargames, ..
... (read more)
Thane Ruthenis
Mostly this part, I think:

Like, yes, the supplementary materials definitely represent a huge amount of legitimate research that went into this. But the forecasts are "informed by" this research, rather than being directly derived from it, and the pointing-at kind of conveys the latter vibe.

Glad you get where I'm coming from; I wasn't wholly sure how legitimate my complaints were.

I agree that this part is tricky, hence my being hesitant about fielding this critique at all. Persuasiveness isn't something we should outright ignore, especially with something as high-profile as this. But also, the lack of such a disclaimer opens you up to takedowns such as titotal's, and if one of those becomes high-profile (which it already might have?), that'd potentially hurt the persuasiveness more than a clear statement would have. There's presumably some sort of way to have your cake and eat it too here; to correctly communicate how the forecast was generated, but in terms that wouldn't lead to it being dismissed by people at large.

Yeah, sorry, I was being unnecessarily hyperbolic there.
elifland

Thanks titotal for taking the time to dig deep into our model and write up your thoughts, it's much appreciated. This comment speaks for Daniel Kokotajlo and me, not necessarily any of the other authors on the timelines forecast or AI 2027. It addresses most but not all of titotal’s post.

Overall view: titotal pointed out a few mistakes and communication issues which we will mostly fix. We are therefore going to give titotal a $500 bounty to represent our appreciation.  However, we continue to disagree on the core points regarding whether the model’s t... (read more)

titotal

I'm leaving the same comment here and in reply to Daniel on my blog.

First, thank you for engaging in good faith and rewarding deep critique. Hopefully this dialogue will help people understand the disagreements over AI development and modelling better, so they can make their own judgements. 

I think I’ll hold off on replying to most of the points there, and make my judgement after Eli does an in-depth writeup of the new model. However, I did see that there was more argumentation over the superexponential curve, so I’ll try out some more critiques... (read more)

Tom Davidson
Re intermediate speedups: a simple fix.

You currently have the pace of total progress growing exponentially as AI improves, and this leads to the bad back-predictions that the pace of progress used to be much slower.

I think your back-predictions would be fine if you said that total progress = human-driven progress + AI-driven progress, and then had only the AI part grow exponentially. Then in the back-prediction the AI part would rapidly shrink but the human part would remain.
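A minimal sketch of the difference being pointed at here, with all numbers purely illustrative:

```python
# Back-predict the pace of progress under two toy models:
# (A) total progress grows exponentially as AI improves;
# (B) total = constant human-driven part + exponentially growing AI-driven part.
human_rate = 1.0      # assumed human-driven progress per year (arbitrary units)
ai_rate_2027 = 3.0    # assumed AI-driven progress per year in 2027
growth = 2.0          # assumed yearly growth factor of the AI-driven part

for year in (2017, 2022, 2025, 2027):
    t = year - 2027
    pace_a = (human_rate + ai_rate_2027) * growth ** t  # the whole pace shrinks into the past
    pace_b = human_rate + ai_rate_2027 * growth ** t    # only the AI-driven part shrinks
    print(f"{year}: model A {pace_a:7.3f}, model B {pace_b:7.3f}")
# Model A back-predicts near-zero progress in 2017; model B approaches the human rate.
```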

So I'm kind of not very satisfied with this defence.

Not-very-charitably put, my impression now is that all the technical details in the forecast were free parameters fine-tuned to support the authors' intuitions[1], when they weren't outright ignored. Now, I also gather that those intuitions were themselves supported by playing around with said technical models, and there's something to be said about doing the math, then burning the math and going with your gut. I'm not saying the forecast should be completely dismissed because of that.

... But "the authors... (read more)

Sorry for the late reply.

If we divide the inventing-ASI task into (A) “thinking about and writing algorithms” versus (B) “testing algorithms”, in the world of today there’s a clean division of labor where the humans do (A) and the computers do (B). But in your imagined October 2027 world, there’s fungibility between how much compute is being used on (A) versus (B). I guess I should interpret your “330K superhuman AI researcher copies thinking at 57x human speed” as what would happen if the compute hypothetically all went towards (A), none towards (B)? And

... (read more)

Tagging @romeo who did our security forecast.

Oh, I misunderstood you, sorry. I think the form should have post-2023; not sure about the website, because it adds complexity and I'm skeptical that it's common that people are importantly confused by it as is.

elifland

Whew, a critique that our takeoff should be faster for a change, as opposed to slower.

Fun fact: AI-2027 estimates that getting to ASI might take the equivalent of a 100-person team of top human AI research talent working for tens of thousands of years.

(Calculation details: For example, in October 2027 of the AI-2027 modal scenario, they have “330K superhuman AI researcher copies thinking at 57x human speed”, which is 1.6 million person-years of research in that month alone. And that’s mostly going towards inventing ASI, I think. Did I get that ri

... (read more)
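Spelling out the arithmetic in the quoted calculation:

```python
copies = 330_000   # superhuman AI researcher copies (from the scenario)
speedup = 57       # each thinking at 57x human speed
person_years_per_month = copies * speedup / 12
print(f"~{person_years_per_month / 1e6:.1f} million person-years of research per month")
# ~1.6 million person-years in that month alone; spread across a 100-person team,
# that is roughly 16,000 years of work.
```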
Steven Byrnes
Thanks, that’s very helpful!

If we divide the inventing-ASI task into (A) “thinking about and writing algorithms” versus (B) “testing algorithms”, in the world of today there’s a clean division of labor where the humans do (A) and the computers do (B). But in your imagined October 2027 world, there’s fungibility between how much compute is being used on (A) versus (B). I guess I should interpret your “330K superhuman AI researcher copies thinking at 57x human speed” as what would happen if the compute hypothetically all went towards (A), none towards (B)? And really there’s gonna be some division of compute between (A) and (B), such that the amount of (A) is less than I claimed? …Or how are you thinking about that?

Right, but I’m positing a discontinuity between current AI and the next paradigm, and I was talking about the gap between when AI-of-that-next-paradigm is importantly useful versus when it’s ASI. For example, AI-of-that-next-paradigm might arguably already exist today but where it’s missing key pieces such that it barely works on toy models in obscure arxiv papers. Or here’s a more concrete example: Take the “RL agent” line of AI research (AlphaZero, MuZero, stuff like that), which is quite different from LLMs (e.g. “training environment” rather than “training data”, and there’s nothing quite like self-supervised pretraining (see here)). This line of research has led to great results on board games and videogames, but it’s more-or-less economically useless, and certainly useless for alignment research, societal resilience, capabilities research, etc. If it turns out that this line of research is actually much closer to how future ASI will work at a nuts-and-bolts level than LLMs are (for the sake of argument), then we have not yet crossed the “AI-of-that-next-paradigm is importantly useful” threshold in my sense.

If it helps, here’s a draft paragraph from that (hopefully) forthcoming post:

Next: Well, even if you have an ML training plan that will yi

I think it's not worth getting into this too much more as I don't feel strongly about the exact 1.05x, but I feel compelled to note a few quick things:

  1. I'm not sure exactly what you mean by eating a smaller penalty but I think the labor->progress penalty is quite large
  2. The right way to think about 1.05x vs. 1.2x is not a 75% reduction, but instead what is the exponent n for which 1.05^n = 1.2 (n ≈ 3.7)
  3. Remember the 2022 vs. 2023 difference, though my guess is that the responses wouldn't have been that sensitive to this

Also one more thing I'd like to pre-register: people... (read more)

plex
(feel free to not go any deeper, appreciate you having engaged as much as you have!)

1. Yup, was just saying my first-pass guess would have been a less large labour->progress penalty. I do defer here fairly thoroughly.

Hmm, seems true if you're expecting the people to not have applied a correction already, but less true if they are already making a correction and you're estimating how wrong their correction is?

And yup, agree with that preregistration on all counts.

Yup feel free to make that change, sounds good

plex
Clarification:
1. Change to the form to ask about without AI assistance?
2. Change to the website to refer to "AI provides the following speedups from a baseline of 2022/3 AI:"? (I don't have write access)

(assuming 1 for now, will revert if incorrect)

No AI help seems harder to compare to since it's longer ago; it seems easiest to think of something close to today as the baseline when thinking about future speedups. Also for timelines/takeoff modeling it's a bit nicer to set the baseline to be more recent (looks like for those we again confusingly allowed 2024 AIs in the baseline as well, rather than just 2023; perhaps I should have standardized that with the side panel).

plex
I think this risks people underappreciating how much progress is being sped up; my naive read of the UI was that the numbers were based on "no AI", and I'd bet most readers would think the same at a glance. Changing the text from "AI provides the following speedups:" to "AI provides the following speedups from a baseline of 2022/3 AI:" would resolve this (I would guess common) misreading.

I'm not sure what the exact process was; tbh my guess is that they were estimated mostly independently but likely sanity-checked to some extent with the survey in mind. It seems like they line up about right, given the 2022 vs. 2023 difference, the intuition regarding underadjusting for labor->progress, and giving weight to our own views as well rather than just the survey, given that we've thought more about this than survey takers (while of course they have the advantage of currently doing frontier AI research).

I'd make less of an adjustment if we ask... (read more)

plex
Alright, my first-pass guess would have been that algorithmic progress seems like the kind of thing that eats a much smaller penalty than most forms of org-level progress (not none, but not a 75% reduction, and not likely more than a 50% reduction), but you guys have the track record.

Cool, added a nudge to the last question.

Yup, seems good

plex
Okay, switched. I'm curious about why you didn't set the baseline to "no AI help", especially if you expect pre-2024 AI to be mostly useless, as that seems like a cleaner comparison than asking people to remember how good old AIs were?

I also realized that, confusingly, I believe the survey asks about the speedup vs. no post-2022 AIs, while the scenario side panel is for no post-2023 AIs, which should make the side panel numbers lower; unclear exactly how much, given 2023 AIs weren't particularly useful.

plex
I can switch the number to 2023?

Look at the question I mentioned above about the current productivity multiplier

plex
Oh, yup, missed that optional question in my ctrl-f. Thanks!

I think a copy would be best, thanks!

plex
This survey looks like it's asking something different? It's asking about human range, no mention of speed-up from AI.

I think the survey is an overestimate for the reason I gave above: this stuff is subtle and researchers are likely to underestimate the decrease from labor speedup to progress speedup, especially in this sort of survey, which didn't involve discussing it with them verbally. Based on their responses to other questions in the survey, it seems like at least 2 people didn't understand the difference between labor and overall progress/productivity.

Here is the survey: https://forms.gle/6GUbPR159ftBQcVF6. The question we're discussing is: "[optional] What... (read more)

plex
Wait, actually, I want to double click on this. What was the process that caused you to transform the number you got from the survey (1.2x) to the number on the website (1.05x)? Is there a question that could be asked which would not require a correction? Or which would have a pre-registered correction?[1]

[1] Bonus: Was this one pre-registered?

plex
That resolves the inconsistency. I do worry that dropping a 20% speed-up to a 5% one, especially if post hoc, might cover up some important signal, but I'm sure you've put dramatically more cycles into thinking about this than me. Thanks for the survey; would it make sense to just pass this form around so the numbers go to the same place and you'll check, or should I make a copy and send results if I get them?

You mean the median would be at least 1.33x rather than the previous 1.2x? Sounds about right, so I don't feel the need to bet against it. Also I'm not planning on doing a follow-up survey but would be excited for others to.

plex
Your website lists:
  • April 2025 as 1.13x
  • August 2025 as 1.21x
  • December 2025 as 1.30x
  • December 2024 as 1.05x (which seems contradicted by your survey, if the replies were in November)

If you think today's number is ~1.33x we're ~7 months ahead of schedule vs the listed forecast, unless I'm really misreading something.

Also, re: "would be excited for others to.", is the survey public or easy to share if someone wanted to use the same questions? And I'd bet 1:4 that the current number is actually >1.5x, if that's more interesting. You've updated me to not have that as the main expectation, but still seems pretty plausible. Obviously depends on someone rerunning the survey, and reasonable that you've got your hands full with other things right now.

Most of the responses were in Nov.

plex
That seems like stale data, given how these graphs look. Even with the updates you caused, I'm happy to offer an even-odds token bet ($100?) that a rerun of a similar survey would give a significantly higher average (at least +0.2 over the predicted 1.13x, or about the AI you expect in Dec 2025). I'd be even more happy if the question asked about the researcher's own productivity, as that seems like something they'd have better vision of, but that would be pretty noisy with a small sample, so it's reasonable to stick with the original question.
elifland

This was from Nov 2024 to Mar 2025 so fairly recent. I think the transition to faster was mostly due to the transition to reasoning models and perhaps the beginnings of increased generalization from shorter to longer time horizons.

Edit: the responses are from between Nov 2024 and Mar 2025. Responses are in increasing order: 1.05-1.1, 1.15, 1.2, 1.3, 2. The lowest one is the most recent but is from a former, not current, frontier AI researcher.

plex
The switch to reasoning models does line up well, probably more cleanly. Moved that to main hypothesis, thanks. Having some later responses makes it less likely they missed the change; curious if the other responses were closer to Dec or March. I would guess excluding the not-current-researcher one probably makes sense? The datapoint from me is not exactly 2x on this, but 'most of an approximately 2x', so it would need revisiting with the exact question before it could be directly included, and I'd imagine you'd want the source. I still have some weight on a higher research boost from AI than your model is expecting, due to other lines of evidence, but I'm not putting quite as much weight on it.
elifland

We did do a survey in late 2024 of 4 frontier AI researchers who estimated the speedup was about 1.1-1.2x. This is for their whole company, not themselves.


This also matches the vibe I’ve gotten when talking to other researchers; I’d guess they’re more likely to be overestimating than underestimating the effect, due to not adjusting enough for my next point. Keep in mind that the multiplier is for overall research progress rather than a speedup on researchers’ labor; this lowers the multiplier by a bunch because compute/data are also inputs to progress.
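As a toy illustration of why a speedup on researchers' labor translates into a smaller overall progress multiplier (this is not the authors' actual adjustment; it just assumes a Cobb-Douglas-style production function with an illustrative labor share):

```python
# Toy model: research progress ~ labor^alpha * compute^(1 - alpha).
# If AI multiplies researcher labor by labor_speedup while compute/data are
# unchanged, overall progress only speeds up by labor_speedup ** alpha.
alpha = 0.5  # assumed share of progress attributable to labor (illustrative)

for labor_speedup in (1.1, 1.2, 1.5, 2.0):
    progress_multiplier = labor_speedup ** alpha
    print(f"labor {labor_speedup:.1f}x -> overall progress ~{progress_multiplier:.2f}x")
```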

Daniel Kokotajlo
That said, we just talked to another frontier AI company researcher who said the speedup was 2x. I disagree with them but it's a data point at least.
plex
Okay, that updates me some. I'm curious what your alternate guess is about the transition to the faster exponential on the METR long-horizon tasks, and whether you expect that to hold up or to not actually be tracking something important? (Also please note that via me you now also have a very recent datapoint of a frontier AI researcher who thinks the METR speed-up of ~2x was mostly due to AI accelerating research.)

Edit: How late in 2024? Because the trendline was only just starting to become apparent even right near the end and was almost invisible a couple of months earlier, it's pretty plausible to me that if you re-ran that survey now you would get different results. The researchers inside will have had a sense somewhat before releases, but lag on updating is also real.

If the trend isn’t inherently superexponential and continues at 7-month doubling times by default, it does seem hard to get to AGI within a few years. If it’s 4 months, IIRC in my timelines model it’s still usually after 2027, but it can be close because of intermediate AI R&D speedups, depending on how big you think the gaps between benchmarks and the real world are. I’d have to go back and look if we want a more precise answer. If you add error bars around the 4-month time, that increases the chance of AGI soon ofc.

If you treat the shift from 7 to 4 month ... (read more)
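For a rough sense of the plain-exponential cases, here is a sketch where both the current time horizon and the horizon treated as AGI-relevant are illustrative assumptions rather than the model's parameters:

```python
import math

current_horizon_hours = 1.0      # assumed current time horizon
needed_horizon_hours = 10_000.0  # assumed AGI-relevant horizon (~5 work-years), illustrative

doublings = math.log2(needed_horizon_hours / current_horizon_hours)  # ~13.3 doublings
for doubling_time_months in (7, 4):
    years = doublings * doubling_time_months / 12
    print(f"{doubling_time_months}-month doublings: ~{years:.1f} more years at a constant rate")
# 7-month doublings give ~7.8 years; 4-month doublings give ~4.4 years,
# before accounting for intermediate AI R&D speedups or error bars.
```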

  • It underrates the difficulty of automating the job of a researcher. Real world work environments are messy and contain lots of detail that are neglected in an abstraction purely focused on writing code and reasoning about the results of experiments. As a result, we shouldn’t expect automating AI R&D to be much easier than automating remote work in general.

I basically agree. The reason I expect AI R&D automation to happen before the rest of remote work isn't because I think it's fundamentally much easier, but because (a) companies will try to automa... (read more)

elifland

I still think full automation of remote work in 10 years is plausible, because it’s what we would predict if we straightforwardly extrapolate current rates of revenue growth and assume no slowdown. However, I would only give this outcome around 30% chance.

In an important sense I feel like Ege and I are not actually far off here. I'm at more like 65-70% on this. I think this overall recommends quite similar actions. Perhaps we have a more important disagreement regarding something like P(AGI within 3 years), for which I'm at approx. 25-30% and Ege might be ... (read more)

Noosphere89
I do think the difference between an AGI timeline median of 5 years and one of 20 years does matter, because politics starts affecting whether we get AGI way more if we have to wait 20 years instead of 5, and serial alignment agendas make more sense if we assume a timeline of 20 years is a reasonable median. Also, he argues against very fast takeoffs/software only singularity in the case for multi-decade timelines post.

Ah right, my bad, I was confused. This is right except that these estimates aren't software-only; they include recent levels of compute scaling.

snewman
Thanks, I've edited the post to note this.

Those estimates do start at RE-Bench, but these are all estimates for how long things would take given the "default" pace of progress, rather than the actual calendar time required. Adding them together ends up with a result that doesn't take into account speedup from AI R&D automation or the slowdown in compute and algorithmic labor growth after 2028.

snewman
Sure – I was presenting these as "human-only, software-only" estimates: So it doesn't seem like there's a problem here?

I think that usually in AI safety lingo people use timelines to mean time to AGI and takeoff to mean something like the speed of progression after AGI.

Thanks for bringing this up; I hadn't seen this paper.

Before deciding how much time to spend on this I'm trying to understand how much this matters, and am having trouble interpreting your Wolfram Alpha plot. Can you ELI12? I tried having Claude plot our lognormal doubling time distribution against an inverse Gaussian with equivalent mean and variance and it looks very similar, but of course Claude could be messing something up.

Peter Johnson
You can ignore for now since I need to work through whether this is still true depending on how we view the source of uncertainty in doubling time.

Edit: this explanation is correct afaict and worth looking into.

The parameters for the second log-normal (doubling time at RE-Bench saturation, 10th: 0.5 mo., 90th: 18 mo.) when made equivalent for an inverse gaussian by matching mean and variance (approx. InverseGaussian[7.97413, 1.315]) are implausible. The linked paper highlights that to be representing doubling processes reasonably, the ratio of first to second parameter ought to be << 2/ln(2) (or << (1/(2ln(2)^2))). The failure to match that indicates that the "size hypothesis" of any of the growth processes is violated, indicating that the distribution is no longer modeling uncertainty around such a process.

Ok, so that's too many functions, what does it mean? In general, it means that our "uncertainty" is actually the main driver of fast timelines now rather than reflecting a lack of knowledge in any way. The distribution is so stretched that the mode and median are wildly smaller than the mean entirely due to the possibility that a random unknown event causes foom, unrelated to the estimated "growth rate" of the process. It's like cranking up a noise term on a stock market model and being surprised that some companies are estimated to go to the moon tomorrow, then claiming it is due to estimating those stocks as potentially having huge upsides.

There is not a good solution that keeps the model intact (and the same basic issue is that the model is working in domains that are outcomes like time and frequency rather than inputs like time horizons, compute, or effective compute). If one were to use the same mean and scale up the second parameter, the left side of the pdf would collapse, and the mode and median would jump much higher resulting in a much later estimate of SC. That doesn't mean that's how to fix the model, but it does indicate fast timelines are inc
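A sketch of the comparison under discussion, reproducing the quoted numbers from the stated 10th/90th percentiles (an independent reconstruction, not the model's own code, using scipy's parameterization of the inverse Gaussian):

```python
import numpy as np
from scipy import stats

# Lognormal fit to the quoted percentiles: 10th = 0.5 months, 90th = 18 months.
z90 = stats.norm.ppf(0.9)                       # ~1.2816
mu = (np.log(0.5) + np.log(18)) / 2             # mean of log(doubling time)
sigma = (np.log(18) - np.log(0.5)) / (2 * z90)  # std of log(doubling time)
lognormal = stats.lognorm(s=sigma, scale=np.exp(mu))

# Inverse Gaussian IG(mean m, shape lam) with the same mean and variance.
m, v = lognormal.mean(), lognormal.var()
lam = m**3 / v
inv_gauss = stats.invgauss(mu=m / lam, scale=lam)  # scipy: mean = mu*scale, var = mu^3*scale^2

print(f"matched IG: mean = {m:.3f}, shape = {lam:.3f}")                 # ~7.974 and ~1.315, as quoted
print(f"mean/shape = {m / lam:.2f} vs. 2/ln(2) = {2 / np.log(2):.2f}")  # ~6.06 vs. ~2.89

# The two PDFs look broadly similar on a plot, consistent with the Claude
# comparison above, even though the size-hypothesis ratio criterion fails.
x = np.linspace(0.01, 30, 500)
print(f"max |pdf difference| on (0, 30]: {np.max(np.abs(lognormal.pdf(x) - inv_gauss.pdf(x))):.3f}")
```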

but "this variable doesn't matter to outcomes" is not a valid critique w.r.t. things like "what are current capabilities/time horizon"

Where did I say it isn't a valid critique? I've said both over text and verbally that the behavior in cases where superexponentiality is true isn't ideal (which makes a bigger difference in the time horizon extension model than benchmarks and gaps).

Perhaps you are saying I said it's invalid because I also said that it can be compensated some by lowering the p_superexponential at lower time horizons? Saying this doesn't imply... (read more)

At the very least this concedes that the estimates are not based on trend-extrapolation and are conjecture.

Yes, as I told you verbally, I will edit the relevant expandable to make this more clear. I agree that the way it's presented currently is poor.

Here are two charts demonstrating that small changes in estimates of current R&D contribution and changes in R&D speedup change the model massively in the absence of a singularity.

These are great; this parameter is indeed at least a bit more important than I expected. I will make this more clear in the... (read more)

First, my argument is not: we had limited time to do this, therefore it's fine for us to not include whatever factors we want.

My argument is: we had limited time despite putting lots of work into this, because it's a very ambitiously scoped endeavor. Adding uncertainty to the percent of progress that is software wouldn't have changed the qualitative takeaways; therefore it's not ideal but okay for us to present the model without that uncertainty (shifting the median estimate to a lower number it may have; I'll separately reply to your comment on that; we sho... (read more)

The basic arguments are that (a) becoming fully superhuman at something which involves long-horizon agency across a diverse range of situations seems like it requires agency skills that will transfer pretty well to other domains (b) once AIs have superhuman data efficiency, they can pick up whatever domain knowledge they need for new tasks very quickly.

I agree we didn't justify it thoroughly in our supplement; the reason it's not justified more is that we didn't get around to it.

As a prerequisite, it will be necessary to enumerate the set of activities that are necessary for "AI R&D"

As I think you're aware, Epoch took a decent stab at this IMO here. I also spent a bunch of time thinking about all the sub-tasks involved in AI R&D early on in the scenario development. Tbh, I don't feel like it was a great use of time compared to thinking at a higher level, but perhaps I was doing it poorly or am underestimating its usefulness.

What is the profile of acceleration across all tasks relating to AI R&D? What percentage of tasks are getting accelerated by 1.1x, 1.5x, 2x?

A late 2024 n=4 survey of frontier AI researchers estimated a median of a 1.15x AI R&D progress multiplier relative to no post-2022 AIs. I'd like to see bigger surveys here but FWIW my best guess is that we're already at a ~1.1x progress multiplier.

elifland

Readers are likely familiar with Hofstadter's Law:

It always takes longer than you expect, even when you take into account Hofstadter's Law.

It's a good law. There's a reason it exists in many forms (see also the Programmer's Credo[9], the 90-90 rule, Murphy's Law, etc.) It is difficult to anticipate all of the complexity and potential difficulties of a project in advance, and on average this contributes to things taking longer than expected. Constructing ASI will be an extremely complex project, and the AI 2027 attempt to break it down into a fairly simple

... (read more)

While current AI models and tools are demonstrating substantial value in the real world, there is nevertheless a notorious gap between benchmark scores ("Ph.D level" and beyond) and real-world applicability. It strikes me as highly plausible that this reflects one or more as-yet-poorly-characterized chasms that may be difficult to cross.

You probably know this, but for onlookers the magnitude of these chasms is discussed in our timelines forecast, method 2.

The authors address this objection, but the counterargument strikes me as flawed. Here is the key paragraph:

To see why this is conceptually mistaken, consider a theoretical AI with very superhuman experiment selection capabilities but sub-human experiment implementation skills. Even if automation didn’t speed up implementation of AI experiments at all and implementation started as 50% of researchers’ time, if automation led to much better experiments being chosen, a >2x AI R&D progress multiplier could be achieved.

In essence, this is saying that if

... (read more)
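A toy decomposition of the argument in that quoted paragraph; the numbers are illustrative assumptions, not the forecast's:

```python
# Progress per unit time = (experiments run per unit time) x (value per experiment).
# Assume researchers split time evenly between choosing and implementing experiments.
selection_frac, implementation_frac = 0.5, 0.5
selection_speedup = 100.0   # assumed: automated experiment selection is effectively free
value_multiplier = 1.5      # assumed: better-chosen experiments are worth 1.5x each

time_per_experiment = selection_frac / selection_speedup + implementation_frac
throughput_gain = 1.0 / time_per_experiment        # capped near 2x by implementation time
progress_multiplier = throughput_gain * value_multiplier
print(f"throughput ~{throughput_gain:.2f}x, overall progress ~{progress_multiplier:.2f}x")
# Even with implementation not sped up at all, better experiment selection can
# push the overall multiplier above 2x.
```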

Inevitably, some of these activities will be harder to automate than others, delaying the overall timeline. It seems difficult to route around this problem. For instance, if it turns out to be difficult to evaluate the quality of model outputs for fuzzy / subjective tasks, it's not clear how an R&D organization (regardless of how much or little automation it has incorporated) could rapidly improve model capabilities on those tasks, regardless of how much progress is being made in other areas.

One reason I expect less jagged progress than you is that my... (read more)

As a minor point of feedback, I'd suggest adding a bit of material near the top of the timelines and/or takeoff forecasts, clarifying the range of activities meant to be included in "superhuman coder" and "superhuman AI researcher", e.g. listing some activities that are and are not in scope. I was startled to see Ryan say "my sense is that an SAR has to be better than humans at basically everything except vision"; I would never have guessed that was the intended interpretation.)

This is fair. To the extent we have chosen what activities to include, it's sup... (read more)

Ok yeah, seems like this is just a wording issue and we're on the same page.

  • SAR has to dominate all human researchers, which must include whatever task would otherwise bottleneck.

This, and the same description for the other milestones, aren't completely right; it's possible that there are some activities on which the SAR is worse. But it can't be many activities and it can't be much worse at them, given that the SAR needs to overall be doing the job of the best human researcher 30x faster.

ryan_greenblatt
I think my description is consistent with "some activities on which the SAR is worse" as long as these aren't bottlenecking and it is overall dominating human researchers (as in, adding human researchers is negligible value). But whatever, you're the author here. Maybe "Superhuman coder has to dominate all research engineers at all pure research engineering tasks" is too strong though.

I simply find it impossible to accept this concatenation of intuitive leaps as sufficient evidence to update very far.

Seems like this should depend on how you form your current views on timelines/takeoff. The reason I put a bunch of stock in our forecasts for informing my personal views is that I think, while very flawed, they seem better than any previous piece of evidence or intuition I was including. But probably we just disagree on how to weigh different forms of evidence.

The upshot is that I find it difficult to accept the AI 2027 model as strong evidence for short timelines

Here you're using "short timelines" to refer to our takeoff model I think, which is what you spend most of the post discussing? Seems a bit confusing if so, and you also do this in a few other places.

snewman
Correct. Am I wrong in thinking that it's usual to use the word "timelines" to refer to the entire arc of AI progress, including both the periods covered in the "Timelines Forecast" and "Takeoff Forecast"? But since this is all in the context of AI 2027, I should have clarified.

Superintelligent AI researcher → artificial superintelligence: 95 years, explained here. I honestly cannot interpret the argument here (the wording is informal and I find it to be confusing), but it includes components such as "Achieving ASI in all cognitive tasks rather than just AI R&D: About half of an SAR → SIAR jump".

Sorry for the confusion. Let me try a brief summary: N is the number of cumulative research effort doublings to go from SAR to SIAR, if r, the parameter controlling the number of doublings needed to get a fixed boost in research progres... (read more)

Saturating RE-Bench → Superhuman coder: three sets of estimates are presented, with medians summing to between 30 and 75 months[6]. The reasoning is presented here.

I think you're looking at the calendar time between now and superhuman coder, rather than the human-only software-only time between RE-Bench and superhuman coder? At least your numbers are quite similar to our overall bottom line which is the former.

snewman
I added up the median "Predictions for gap size" in the "How fast can the task difficulty gaps be crossed?" table, summing each set of predictions separately ("Eli", "Nikola", "FutureSearch") to get three numbers ranging from 30-75. Does this table cover the time between now and superhuman coder? I thought it started at RE-Bench, because:
  • I took all of this to be in context of the phrase, about one page back, "For each gap after RE-Bench saturation"
  • The earlier explanation that Method 2 is "a more complex model starting from a forecast saturation of an AI R&D benchmark (RE-Bench), and then how long it will take to go from that system to one that can handle real-world tasks at the best AGI company" [emphasis added]
  • The first entry in the table ("Time horizon: Achieving tasks that take humans lots of time") sounds more difficult than saturating RE-Bench.
  • Earlier, there's a separate discussion forecasting time to RE-Bench saturation.

But sounds like I was misinterpreting?

I very much appreciate you offering this concrete bet! I probably am not interested in taking this exact proposal and would need to set aside the time to do an investigation into your thread with Ryan and similar to see how close current models are to resolving this, before taking it. I'll add to my to-do list to look into this and perhaps propose an alternative, if you think that might be useful.

See also my comments about how what you're saying is our 0th percentile is not my actual 0th percentile, and how I disagree with you regarding whether the metric ... (read more)

Is your issue that it shouldn't be this determinate, or that it should be more clearly explained? I'm guessing both? As I've said I'm happy to make how important various parameters are more salient in non-high-effort ways.

There are more speedups hidden across parameters, e.g. "Doubling time at RE-Bench saturation toward our time horizon milestone, on a hypothetical task suite like HCAST but starting with only RE-Bench’s task distribution" which also just drops the doubling time.

Could you argue against dropping the expected doubling time on the object level, if you don't find the reasons compelling? I acknowledge that the explanations may not be super clear; lmk if you have questions. I don't think that this would change the overall outputs that much though, since most of the time in the benchmarks and gaps model is not from the time horizon extrapolation.

Peter Johnson
I can't argue against a handful of different speedups all on the object level without reference to each other. The justifications generally lie on basically the same intuition, which is that AI R&D is strongly enhanced by AI in a virtuous cycle. The only mechanical cause claimed for the speedup is compute efficiency (aka less compute per same performance), and it's hard for me to imagine what other mechanical cause could be claimed that isn't contained in compute or compute efficiency. Finally, if I understand the gaps model, it is not a trend extrapolation model at all! It is purely guesses about calendar time put into a form that is hard to disentangle or validate.

To make effective bets we need a relatively high-probability, falsifiable, and quickly-resolving metric that is unlikely to be gamed. METR benchmarks (like every benchmark ever) are able to be gamed or reacted to (the gaming of which is the argument made about most of those handful of distinct speedups). However, if the model relies on a core assumption that is falsifiable, we should focus on that metric. If computational efficiency gains are not core to the model, I am confused how it claims we will reach SC in a way that is different from a bare assertion that we reach SC soon, with no reference to anything falsifiable!
elifland

I bet that we will not see a model released in the future that equals or surpasses the general performance of Chinchilla while reducing the compute (in training FLOPs) required for such performance by an equivalent of 3.5x per year.

FWIW I think much of software progress comes from achieving better performance at a fixed or increased compute budget rather than making a fixed performance level more efficient, so I think this underestimates software progress.

edited to add below:

I claim that a response that "increases in computational efficiency only accrue to

... (read more)
Peter Johnson
The main justification for having compute efficiency be approximately equal to compute in terms of progress given in the timeline supplement and main dropdown is the Epoch AI measurements which are specifically about fixed-performance and lower compute.

At the very least this concedes that the estimates are not based on trend-extrapolation and are conjecture. Something being unfalsifiable forward-looking and unmeasurable backwards-looking is a justification for not treating it with high credence, so I think this is also a core disagreement.

Here are two charts demonstrating that small changes in estimates of current R&D contribution and changes in R&D speedup change the model massively in the absence of a singularity. I know we're just going to go straight back to "well the real model is the even-more-unfalsifiable benchmarks and gaps model," but I think that is unreasonable.

EDIT: THESE FIGURES OVERESTIMATE THE IMPACT OF REDUCING CURRENT ALGORITHMIC PROGRESS. THE SECOND IS WRONG, AND THE REAL IMPACT IS MORE CONTAINED.

Figure 1: R&D is 50% of current progress, with and without speedups, exponential only
Figure 2: R&D is 33% of current progress, with and without speedups, exponential only

I do not understand how "I think this variable doesn't matter (without checking)" is a good defense about questionably implemented variables that do overdetermine the model, but "this variable doesn't matter to outcomes" is not a valid critique w.r.t. things like "what are current capabilities/time horizon"

THIS SECOND ONE IS WRONG; MEDIAN HORIZON CHANGES BY CLOSER TO HALF A YEAR AT 33% (TO FEB 2029) THAN ALMOST 2 YEARS (TO APR 2031 AS INCORRECTLY SHOWN)
elifland

For reference, the 0th percentile assumed increase in computational efficiency by the authors of the forecasts is about 143x since the Chinchilla release while I am accepting values of just 60x as an immediate loss of my entire principal. By the time the bet turns to a positive return for me (around April 2026), their 0th percentile model assumes increases in computational efficiency of nearly 500x while I accept even a 150x demonstrated increase as a loss.

Where is 143x coming from? It's been barely over 3 years, 4.6^3.1=114x.

I'm not sure what you mean by ... (read more)

Peter Johnson
You're right on the 143 being closer to 114! (I took March 1 2022 -> July 1 2022 instead of March 22 2022 -> June 1 2022, which is accurate.)

I don't think it is your 0th percentile, and I am not assuming it is your 0th percentile; I am claiming that either the model's 0th isn't close to your 0th percentile (so it should not be treated as representing a reasonable belief range, which it seems like is conceded) or the bet should be seen as generally reasonable.

I sincerely do not think a limited-time argument is valid given the amount of work that was put into non-modeling aspects of the presentation and the amount of work claimed to have been put into the model over several gamings and reviews and months of work etc etc. If the burden of proof is on critics to do work you are not willing to do in order to show the model is flawed (for a bounty between 4-10% of the bounty you offer someone writing a supporting piece to advertise your position further), then the defense of limited time raises some hackles.