
3 Answers

Noosphere89


I'd say the main things that made my own p(Doom) go down this year are the following:

  1. I've come to believe that data is a major factor in both capabilities and alignment, and I also believe that careful interventions on that data could be really helpful for alignment.

  2. I've come to think that instrumental convergence is closer to a scalar quantity than a boolean. I don't think zero instrumental convergence is incentivized, for capabilities and domain reasons, but I do think that restraining instrumental convergence (putting useful constraints on it, like world models) is helpful enough for capabilities that power-seeking will likely be a lot more local than it is for humans.

  3. I've overall shifted toward a worldview in which the common second-species thought experiment (humans have killed 90%+ of chimpanzees and gorillas because humans ran away with intelligence and were misaligned with them) neglects crucial differences between the human and AI cases, and those differences make my p(Doom) lower.

(Maybe another way to say it: I think the outcome of humans completely running roughshod over every other species due to instrumental convergence is not the median outcome of AI development, but a deep outlier that is very uninformative about what AI outcomes will look like.)

  4. I've come to believe that human values, or at least the generator of values, are actually simpler than a lot of people think, and that much of the apparent complexity is there because we generally don't like admitting that very simple rules can generate very complex outcomes.

Seth Herd


The recent rumors about slowed progress in large training runs have reduced my p(doom). More time to prepare for AGI raises our odds, though this probably won't be a large delay. This combines with the observation that inference-time compute also scales results, but probably not that fast; the graph released with o1-preview didn't include units on the cost/compute axis.

More than that, my p(doom) went steadily down as I kept contemplating instruction-following as the central alignment goal. I increasingly think it's the obvious thing to try once you're actually contemplating launching an AGI that could become smarter than you; and it's a huge benefit to any technical alignment scheme, since it offers the advantages of corrigibility, allowing you to correct some alignment errors.

More on that logic in Instruction-following AGI is easier and more likely than value aligned AGI

To be clear, I don't yet believe that the rumors are true, or that, if they are, they matter.

We will have to wait until 2026-2027 to get real evidence on large training run progress.

Seth Herd
TBC, I don't think it will slow progress all that much; there are other routes to improvement. I guess I didn't express the biggest reason this shifts my p(doom) a little: it means a slower takeoff, giving more time for the reality of the situation to sink in before we have takeover-capable AGI.

I think we'll still hit near-human LLM agents on schedule (1-2 years) by scaffolding next-gen LLMs boosted with o1-style training. I'm really hoping that the autonomy of these systems will affect people emotionally, producing more and better policy thinking and alignment work on those types of AGIs. I think the rate of approach to AGI matters more than the absolute timelines; we'll see ten times the work on really relevant policy and alignment once we see compelling evidence of the type of AGI that will be transformative and dangerous.

I've heard enough credible-sounding rumors to put > 50% on their being true. This is partly because the result fits my theory of why LLMs work so well: while they are predictors, what they're learning from human text is mostly to copy human intelligence, and moving past that will be slower.

Do you mean we're waiting until 2026/27 for results of the next scaleup? If this round (GPT-5, Claude 4, Gemini 2.0) shows diminishing returns, wouldn't we expect the next one to as well?
Noosphere89
To answer this specific question: yes. Assuming Claude 4/Gemini 2.0/GPT-5 don't release or are disappointing in 2025-2026, that is definitely evidence that things are slowing down. It doesn't conclusively disprove continued progress, but it does make that progress look shakier. I agree with the rest of the comment.

Rafael Harth


I have my own benchmark of tasks that I think measure general reasoning, which I use to decide when to freak out about LLMs, and LLMs haven't improved on it. I was ready to be cautiously optimistic that LLMs can't scale to AGI (and would have reduced my p(doom) slightly) even if they kept scaling by conventional metrics. So the fact that scaling itself also seems to be breaking down (maybe, possibly, partially, to whatever extent it does in fact break down; I haven't looked into it much), and that we're reaching physical limits, are both good things.

I'm not particularly more optimistic about alignment working anytime soon, just about very long timelines.