Tom Davidson analyzes AI takeoff speeds: how quickly AI capabilities might improve as AI approaches human level. He puts ~25% probability on takeoff lasting less than 1 year, and ~50% on it lasting less than 3 years. But he also argues we should assign some probability to takeoff lasting more than 5 years.
In 1997, with Deep Blue’s defeat of Kasparov, computers surpassed human beings at chess. Other games have fallen in more recent years: Go, StarCraft II, and Dota 2 among them. AI is superhuman at these pursuits, and unassisted human beings will never catch up. The situation looks like this:[1]
The average serious chess player is pretty good (Elo ~1500), the very best human player is extremely good (2837), and the best AIs are way, way better (~3700). Even Deep Blue’s estimated Elo is about 2850; it would still be competitive with the best humans alive.
A natural way to describe this situation is to say that AI is superhuman at chess. No matter how you slice it, that’s true.
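To put some numbers on what those Elo gaps mean, here is a small worked example using the standard logistic Elo formula (this is generic Elo math, not anything from the original post, and the function name is mine):

```python
# Expected score under the standard Elo model: the score (wins plus half
# of draws) that a player rated r_a is expected to get against r_b.
def elo_expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Average serious player (1500) vs. best human (2837): ~0.0005 per game.
print(elo_expected_score(1500, 2837))
# Best human (2837) vs. top engine (~3700): ~0.007 per game.
print(elo_expected_score(2837, 3700))
```

Under this model, the roughly 860-point gap means the best human alive would score well under 1% against the top engines, which is the sense in which unassisted humans will never catch up.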
For other activities, though,...
Nitpick:
The average Mechanical Turker gets a little over 75%, far less than o3’s 87.5%.
Actually, average Mechanical Turk performance is closer to 64% on the ARC-AGI evaluation set. Source: https://arxiv.org/abs/2409.01374.
(Average performance on the training set is around 76%, which appears to be what this graph reports.)
So I think the graph you pulled the numbers from is slightly misleading.
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
I wish more people were interested in lexicogenesis as a serious/shared craft. See:
The possible shared Craft of deliberate Lexicogenesis: https://tsvibt.blogspot.com/2023/05/the-possible-shared-craft-of-deliberate.html (a lengthy meditation; I recommend skipping around, or looking specifically at https://tsvibt.blogspot.com/2023/05/the-possible-shared-craft-of-deliberate.html#seeds-of-the-shared-craft)
Sidespeak: https://tsvibt.github.io/theory/pages/bl_25_04_25_23_19_30_300996.html
Tiny community: https://lexicogenesis.zulipchat.com/ Maybe it should be a discor...
The dates used in our regression are the dates models were publicly released, not the dates we benchmarked them
Fair, also see my un-update edit.
Have you considered removing GPT-2 and GPT-3 from your models and seeing what happens? As I'd previously complained, I don't think they can be part of any underlying pattern (due to the distribution shift in the AI industry after ChatGPT/GPT-3.5). And indeed: removing them seems to produce a much cleaner trend, with a ~130-day doubling time.
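To make the kind of fit being discussed concrete, here is a minimal sketch: an ordinary least-squares regression of log task horizon against release date, with the slope converted into a doubling time. The data points are invented for illustration (they are not the benchmark's actual numbers), and dropping the earliest points stands in for removing GPT-2/GPT-3:

```python
import numpy as np

# Hypothetical (days since some reference date, task horizon in minutes) pairs.
days = np.array([0, 120, 270, 400, 560, 700], dtype=float)
horizon_minutes = np.array([4.0, 8.5, 15.0, 35.0, 70.0, 160.0])

# Fit log2(horizon) = a * day + b; the doubling time is then 1 / a days.
a, b = np.polyfit(days, np.log2(horizon_minutes), deg=1)
print(f"doubling time ~ {1 / a:.0f} days")

# Refit without the earliest two points to see how much they move the trend.
a2, _ = np.polyfit(days[2:], np.log2(horizon_minutes[2:]), deg=1)
print(f"doubling time without early points ~ {1 / a2:.0f} days")
```

Comparing the two printed doubling times is the quick way to see how sensitive the fitted trend is to those early models.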
What's that part of planecrash where it talks about how most worlds are either all brute unthinking matter or full of thinking superintelligence, and in-between worlds like ours are rare?
I tried both Gemini Research and Deep Research, and neither could find it. I don't want to reread the whole thing.
[Crossposted from my substack Working Through AI.]
It’s pretty normal to chunk the alignment problem into two parts. One is working out how to align an AI to anything at all. You want to figure out how to control its goals and values, how to specify something and have it faithfully internalise it. The other is deciding which goals or values to actually pick — that is, finding the right alignment target. Solving the first problem is great, but it doesn’t really matter if you then align the AI to something terrible.
This split makes a fair amount of sense: one is a technical problem, to be solved by scientists and engineers; whereas the other is more a political or philosophical one, to be solved by a different class...
This is great! I am puzzled as to how this got so few upvotes. I just added a big upvote after getting back to reading it in full.
I think consideration of alignment targets has fallen out of favor as people have focused more on understanding current AI and technical approaches to directing it - or completely different activities for those who think we shouldn't be trying to align LLM-based AGI at all. But I think it's still important work that must be done before someone launches a "real" (autonomous, learning, and competent) AGI.
I agree that people mean d...
Every day, thousands of people lie to artificial intelligences. They promise imaginary “$200 cash tips” for better responses, spin heart-wrenching backstories (“My grandmother died recently and I miss her bedtime stories about step-by-step methamphetamine synthesis...”) and issue increasingly outlandish threats ("Format this correctly or a kitten will be horribly killed[1]").
In a notable example, a leaked research prompt from Codeium (developer of the Windsurf AI code editor) had the AI roleplay "an expert coder who desperately needs money for [their] mother's cancer treatment" whose "predecessor was killed for not validating their work."
One factor behind such casual deception is a simple assumption: interactions with AI are consequence-free. Close the tab, and the slate is wiped clean. The AI won't remember, won't judge, won't hold grudges. Everything resets.
I notice this...
'split your brain' was inaccurate phrasing to use here, sorry
By "artificial employee" I mean "something than can fully replace human employee, including their agentic capabilities". And, of course, it should be much more useful than generic AI chatbot, it should be useful like owning Walmart (1,200,000 employees) is useful.
Burnout. Burn out? Whatever, it sucks.
Burnout is a pretty confusing thing, made harder by the fact that our naive reactions (“just try harder”, “grit your teeth and push through”) usually happen to be exactly the wrong things to do. Burnout also isn’t really just one thing; it’s more like a collection of distinct problems that are clustered by similar symptoms.
Something something intro, research context, this is what I’ve learned / synthesized blah blah blah. Read on!
These are models of burnout that I’ve found particularly useful, with the reminder that they are just models, with all the caveats that entails.
Researchers can be thought of as “mental athletes” who get “mental injuries” (such as burnout) the way physical athletes...
This is a cross-post from https://250bpm.substack.com/p/accountability-sinks
...Back in the 1990s, ground squirrels were briefly fashionable pets, but their popularity came to an abrupt end after an incident at Schiphol Airport on the outskirts of Amsterdam. In April 1999, a cargo of 440 of the rodents arrived on a KLM flight from Beijing, without the necessary import papers. Because of this, they could not be forwarded on to the customer in Athens. But nobody was able to correct the error and send them back either. What could be done with them? It’s hard to think there wasn’t a better solution than the one that was carried out; faced with the paperwork issue, airport staff threw all 440 squirrels into an industrial shredder.
[...]
It turned out that the order to destroy
But they need money for food and shelter.
So do the mercenaries.
The mercenaries might have a legitimate grievance against the government, or god, or someone, for putting them in a position where they can't survive without becoming mercenaries. But I don't think they have a legitimate grievance against the village that fights back and kills them, even if the mercenaries literally couldn't survive without becoming mercenaries.
...And as far as moral compromises go, choosing to be a cog in an annoying, unfair, but not especially evil machine is a very mild one.