Thanks for this dialogue. I find Nate and Oliver's "here's what I think will actually happen" thoughts useful.
I also think I'd find it useful for Nate to spell out "conditional on good things happening, here's what I think the steps look like, and here's the kind of work that I think people should be doing right now. To be clear, I think this is all doomed, and I'm only saying this because Akash directly asked me to condition on worlds where things go well, so here's my best shot."
To be clear, I think some people do too much "play to your outs" reasoning. Taken to excess, this can lead to people just being like "well maybe all we need to do is beat China" or "maybe alignment will be way easier than we feared" or "maybe we just need to bet on worlds where we get a fire alarm for AGI."
I'm particularly curious to see what happens if Nate tries to reason in this frame, especially since I expect his "play to your outs" reasoning and conclusions might look fairly different from those of others in the community.
Some examples of questions for Nate (and others who have written more about what they actually expect to happen and less about what happens if we condition on things going well):
I'll also note that I'd be open to having a dialogue about this with Nate (and possibly other "doomy" people who have not written up their "play to your outs" thoughts).
Even if humanity isn't like, having a huge mood shift, I do still expect the next 10 years to have a lot more people working on stuff that actually helps than the previous 10 years.
What kinds of things are you imagining, here? I'm worried that on the current margin people coming into safety will predominantly go into interpretability/evals/etc because that's the professional/legible thing we have on offer, even though by my lights the rate of progress and the methods/aims/etc of these fields are not nearly enough to get us to alignment in ~10 years (in worlds where alignment is not trivially easy, which is also the world I suspect we're in). My own hope for another ten years is more like "that gives us some space and time to develop a proper science here," which at the current stage doesn't feel very bottlenecked by number of people. But I'm curious what your thoughts are on the "adding more people pushes us closer to alignment" question.
The ousting of Sam Altman by a Board with 3 EA people could be the strongest public move so far.
On "lab leaders would choose to stop if given the coordination-guaranteed button" vs "big ol' global mood shift", I think the mood shift is way more likely (relatively) for two reasons.
One of these was argued about directionally in the dialogue, and I want to capture more crisply the way I see it; the other I didn't see mentioned, and it might be a helpful model to consider.
The "inverse scaling law" for human intelligence vs rationality. "AI arguments are pretty hard" for "folks like scott and sam and elon and dario" because it's very easy for intelligent people to wade into the thing overconfidently and tie themselves into knots of rationalization (amplified by incentives, "commitment and consistency" as Nate mentioned re: SBF, etc). Whereas for most people (and this, afaict, was a big part of Eliezer's update on communicating AGI Ruin to a general audience), it's a straightforward "looks very dangerous, let's not do this."
The "agency bias" (?): lab leaders et al think they can and should fix things. Not just point out problems, but save the day with positive action. ("I'm not going to oppose Big Oil, I'm going to build Tesla.") "I'm the smart, careful one, I have a plan (to make the current thing be ok-actually to be doing; to salvage it; to do the different probably-wrong thing, etc.)" Most people don't give themselves that "hero license" and even oppose others having it, which is one of those "almost always wrong but in this case right actually" things with AI.
So getting a vast number of humans to "big ol' global mood shift" into "let's stop those hubristically-agentic people from getting everyone killed which is obviously bad" seems more likely to me than getting the small number of the latter into "our plans suck actually, including mine and any I could still come up with, so we should stop."
Nate and I discuss the recent increase in public and political attention on AGI ruin. We try to figure out where, if anywhere, we feel extra hope as a result.
I think it's plausible that this is enough of a shake-up that the world starts relating to AGI in a pretty different way, which might help a lot with AI risk going better. Nate is sceptical that a shake-up would cause something this far from working to start working.
Relatedly, I also think that the extra time might be enough for a bunch of other game-board flipping plans (like cognitively enhancing humans or human uploads) to make significant progress. While Nate agrees that's relatively hopeful, he's pessimistic that humanity will take much advantage of longer timelines. For example, he's pessimistic that this will cause that many additional people to work on these things.
We eventually decided to keep parts of this conversation private. In one or two places that had cuts, the conversational flow is less smooth.
Oliver's & Nate's reactions to public attention on AGI
(At this point, the dialogue was paused, but resumed a few days later.)
Would 10 years help?
(At this point, the conversation went down a track that we didn't publish. It resumes on a new thread below.)
Gameboard-flipping opportunities
(At this point habryka started typing 5-6 times but kept deleting what he wrote.)
Is AGI recklessness fragile or unusual?
Do people do reasonable things after encountering the AI Risk arguments?
Do humans do useful things on feeling terrified?
Closing thoughts