MichaelDickens - LessWrong

Hmm I wonder if this is why so many April Fools posts have >200 upvotes. April Fools Day in cahoots with itself?

Why do many people who care about AI Safety not clearly endorse PauseAI?

isn't your squiggle model talking about whether racing is good, rather than whether unilaterally pausing is good?

Yes, the model is more about racing than about pausing, but I thought it was applicable here. My thinking was that there is a spectrum of development speed with "completely pause" on one end and "race as fast as possible" on the other. Pushing more toward the "pause" side of the spectrum has the ~opposite effect of pushing toward the "race" side.

I wish you'd try modeling this with more granularity than "is alignment hard" or whatever

I've never seen anyone else try to quantitatively model it. As far as I know, my model is the most granular quantitative model ever made. Which isn't to say it's particularly granular (I spent less than an hour on it) but this feels like an unfair criticism.
In general I am not a fan of criticisms of the form "this model is too simple". All models are too simple. What, specifically, is wrong with it?

I had a quick look at the linked post and it seems to be making some implicit assumptions, such as

the plan of "use AI to make AI safe" has a ~100% chance of working (the post explicitly says this is false, but then proceeds as if it's true)
there is a ~100% chance of slow takeoff
if you unilaterally pause, this doesn't increase the probability that anyone else pauses, doesn't make it easier to get regulations passed, etc.

I would like to see some quantification of the form "we think there is a 30% chance that we can bootstrap AI alignment using AI; a unilateral pause will only increase the probability of a global pause by 3 percentage points; and there's only a 50% chance that the 2nd-leading company will attempt to align AI in a way we'd find satisfactory, therefore we think the least-risky plan is to stay at the front of the race and then bootstrap AI alignment." (Or a more detailed version of that.)

Why do many people who care about AI Safety not clearly endorse PauseAI?

MichaelDickens 7d 50

I think it would probably be bad for the US to unilaterally force all US AI developers to pause if they didn't simultaneously somehow slow down non-US development.

It seems to me that to believe this, you have to believe that all four of these things are true:

1. Solving AI alignment is basically easy
2. Non-US frontier AI developers are not interested in safety
3. Non-US frontier AI developers will quickly catch up to the US
4. If US developers slow down, then non-US developers are very unlikely to also slow down (either voluntarily, or because the US strong-arms them into signing a non-proliferation treaty, or whatever)

I think #3 is sort-of true and the others are probably false, so the probability of all four being simultaneously true is quite low.
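To make the "all four simultaneously true" arithmetic concrete, here is a minimal sketch. The credences are hypothetical placeholders chosen only to match "#3 is sort-of true and the others are probably false"; they are not numbers from the comment or from the Squiggle model. Multiplying them also assumes the four claims are independent, which is a simplification (in practice they are probably correlated).

```python
from math import prod

# Hypothetical credences for illustration only (NOT the author's numbers).
# Keys follow the order of the four claims above.
credences = {
    "1. alignment is basically easy": 0.3,
    "2. non-US developers are not interested in safety": 0.3,
    "3. non-US developers will quickly catch up": 0.7,
    "4. non-US developers are very unlikely to slow down": 0.3,
}


def joint_probability(probabilities):
    """Probability that every claim holds, ASSUMING the claims are independent."""
    return prod(probabilities)


joint = joint_probability(credences.values())
print(f"P(all four claims hold) = {joint:.4f}")
```

With these placeholder values the conjunction comes out at roughly 2%, even though one of the claims is given 70%. Positive correlation between the claims would push the joint probability higher than this product.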

(Statements I've seen from Chinese developers lead me to believe that they are less interested in racing and more concerned about safety.)

I made a quick Squiggle model on racing vs. slowing down. Based on my first-guess parameters, it suggests that racing to build AI destroys ~half the expected value of the future compared to not racing. Parameter values are rough, of course.

Why do many people who care about AI Safety not clearly endorse PauseAI?

MichaelDickens 7d 10

That's kind-of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!

The global stockpile of nuclear weapons is down 6x since its peak in 1986. Hard to attribute causality but if the anti-nuclear movement played a part in that, then I'd say it was net positive.

(My guess is it's more attributable to the collapse of the Soviet Union than to anything else, but the anti-nuclear movement probably still played some nonzero role)

Why do many people who care about AI Safety not clearly endorse PauseAI?

MichaelDickens 7d 40

Yeah I actually agree with that, I don't think it was sufficient, I just think it was pretty good. I wrote the comment too quickly without thinking about my wording.

Why do many people who care about AI Safety not clearly endorse PauseAI?

Answer by MichaelDickens, Mar 30, 2025 243

I feel kind of silly about supporting PauseAI. Doing ML research or writing long, fancy policy reports feels high status. Public protests feel low status. I would rather not be seen publicly advocating for something low-status. I suspect a good number of other people feel the same way.

(I do in fact support PauseAI US, and I have defended it publicly because I think it's important to do so, but it makes me feel silly whenever I do.)

That's not the only reason why people don't endorse PauseAI, but I think it's an important reason that should be mentioned.

Why do many people who care about AI Safety not clearly endorse PauseAI?

MichaelDickens 8d* 5-5

Well -- I'm gonna speak broadly -- if you look at the history of PauseAI, they are marked by belief that the measures proposed by others are insufficient for Actually Stopping AI -- for instance the kind of policy measures proposed by people working at AI companies isn't enough; that the kind of measures proposed by people funded by OpenPhil are often not enough; and so on.

They are correct as far as I can tell. Can you identify a policy measure proposed by an AI company or an OpenPhil-funded org that you think would be sufficient to stop unsafe AI development?

I think there is indeed exactly one such policy measure: SB 1047. It was supported by the Center for AI Safety, which is OpenPhil-funded (IIRC). Most big AI companies lobbied against it, and Anthropic opposed the original, stronger version and got it reduced to a weaker and probably less-safe version.

When I wrote where I was donating in 2024 I went through a bunch of orgs' policy proposals and explained why they appear deeply inadequate. Some specific relevant parts: 1, 2, 3, 4

Edit: Adding some color so you don't have to click through: when I say the proposals I reviewed were inadequate, I mean that some said things like (paraphrasing) "safety should be done on a completely voluntary basis with no government regulations" and "companies should have safety officers but those officers should not have final say on anything", some simply did not address x-risk at all, and some made harmful proposals like "the US Department of Defense should integrate more AI into its weapon systems" or "we need to stop worrying about x-risk because it's distracting from the real issues".

Why do many people who care about AI Safety not clearly endorse PauseAI?

MichaelDickens 8d 5-1

If you look at the kind of claims that PauseAI makes in their risks page, you might believe that some of them seem exaggerated, or that PauseAI is simply throwing all the negative things they can find about AI into a big list to make it seem bad. If you think that credibility is important to the effort to pause AI, then PauseAI might seem very careless about truth in a way that could backfire.

A couple notes on this:

AFAICT PauseAI US does not do the thing you describe.
I've looked at a good amount of research on protest effectiveness. There are many observational studies showing that nonviolent protests are associated with preferred policy changes / voting patterns, and ~four natural experiments. If protests backfired for fairly minor reasons like "their website makes some hard-to-defend claims" (contrasted with major reasons like "the protesters are setting buildings on fire"), I think that would show up in the literature, and it doesn't.

Why do many people who care about AI Safety not clearly endorse PauseAI?

MichaelDickens 8d 4-2

B. "Pausing AI" is indeed more popular than PauseAI, but it's not clearly possible to make a more popular version of PauseAI that actually does anything; any such organization will have strategy/priorities/asks/comms that alienate many of the people who think "yeah I support pausing AI."

This strikes me as a very strange claim. You're essentially saying, even if a general policy is widely supported, it's practically impossible to implement any specific version of that policy? Why would that be true?

For example I think a better alternative to "nobody fund PauseAI, and nobody make an alternative version they like better" would be "there are 10+ orgs all trying to pause AI and they all have somewhat different goals but they're all generally pushing in the direction of pausing AI". I think in the latter scenario you are reasonably likely to get some decent policies put into place even if they're not my favorite.

Tormenting Gemini 2.5 with the [[[]]][][[]] Puzzle

MichaelDickens 8d 10

I don't think you could refute it. I believe you could construct a binary polynomial function that gives the correct answer to every example.

For example it is difficult to reconcile the cases of 3, 12, and 19 using a reasonable-looking function, but you could solve all three cases by defining E E as the left-associative binary operation

f(x, y) = -1/9 x^2 + 32/9 x - 22/9 + y
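As a rough sketch of what "left-associative binary operation" means here, the snippet below evaluates f with exact fractions and folds it over a sequence from the left. The helper name `fold_left` and the sample inputs are assumptions for illustration; they are not the puzzle's actual examples, which are not reproduced in this comment.

```python
from fractions import Fraction
from functools import reduce


def f(x, y):
    """f(x, y) = -1/9 x^2 + 32/9 x - 22/9 + y, computed with exact fractions."""
    x, y = Fraction(x), Fraction(y)
    return Fraction(-1, 9) * x**2 + Fraction(32, 9) * x - Fraction(22, 9) + y


def fold_left(values):
    """Apply f left-associatively: f(f(f(v0, v1), v2), ...). Hypothetical helper."""
    return reduce(f, values)


# Illustrative inputs only, not taken from the puzzle.
left = fold_left([1, 0, 1])   # f(f(1, 0), 1)
right = f(1, f(0, 1))         # same operands, grouped from the right
print(left, right)
```

The two groupings give different results, so the operation is not associative and the choice of left-associativity matters.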
