isn't your squiggle model talking about whether racing is good, rather than whether unilaterally pausing is good?
Yes, the model is more about racing than about pausing, but I thought it was applicable here. My thinking was that there is a spectrum of development speed with "completely pause" on one end and "race as fast as possible" on the other. Pushing more toward the "pause" side of the spectrum has ~the opposite effect of pushing toward the "race" side.
I wish you'd try modeling this with more granularity than "is alignment hard" or whatever
I had a quick look at the linked post and it seems to be making some implicit assumptions, such as
I would like to see some quantification of the form "we think there is a 30% chance that we can bootstrap AI alignment using AI; a unilateral pause will only increase the probability of a global pause by 3 percentage points; and there's only a 50% chance that the 2nd-leading company will attempt to align AI in a way we'd find satisfactory, therefore we think the least-risky plan is to stay at the front of the race and then bootstrap AI alignment." (Or a more detailed version of that.)
I think it would probably be bad for the US to unilaterally force all US AI developers to pause if they didn't simultaneously somehow slow down non-US development.
It seems to me that to believe this, you have to believe all of these four things are true:
I think #3 is sort-of true and the others are probably false, so the probability of all four being simultaneously true is quite low.
(Statements I've seen from Chinese developers lead me to believe that they are less interested in racing and more concerned about safety.)
I made a quick Squiggle model on racing vs. slowing down. Based on my first-guess parameters, it suggests that racing to build AI destroys ~half the expected value of the future compared to not racing. Parameter values are rough, of course.
That's kind-of what happened with the anti-nuclear movement, but it ended up doing lots of harm because the things that could be stopped were the good ones!
The global stockpile of nuclear weapons is down 6x since its peak in 1986. Hard to attribute causality but if the anti-nuclear movement played a part in that, then I'd say it was net positive.
(My guess is it's more attributable to the collapse of the Soviet Union than to anything else, but the anti-nuclear movement probably still played some nonzero role)
I feel kind of silly about supporting PauseAI. Doing ML research, or writing long fancy policy reports feels high status. Public protests feel low status. I would rather not be seen publicly advocating for doing something low-status. I suspect a good number of other people feel the same way.
(I do in fact support PauseAI US, and I have defended it publicly because I think it's important to do so, but it makes me feel silly whenever I do.)
That's not the only reason why people don't endorse PauseAI, but I think it's an important reason that should be mentioned.
Well -- I'm gonna speak broadly -- if you look at the history of PauseAI, it is marked by the belief that the measures proposed by others are insufficient for Actually Stopping AI -- for instance that the kind of policy measures proposed by people working at AI companies aren't enough; that the kind of measures proposed by people funded by OpenPhil are often not enough; and so on.
They are correct as far as I can tell. Can you identify a policy measure proposed by an AI company or an OpenPhil-funded org that you think would be sufficient to stop unsafe AI development?
I think there is indeed exactly one such policy measure: SB 1047. It was supported by the Center for AI Safety, which is OpenPhil-funded (IIRC). Most big AI companies lobbied against it, and Anthropic opposed the original stronger version and got it reduced to a weaker and probably less-safe version.
When I wrote where I was donating in 2024 I went through a bunch of orgs' policy proposals and explained why they appear deeply inadequate. Some specific relevant parts: 1, 2, 3, 4
Edit: Adding some color so you don't have to click through: when I say the proposals I reviewed were inadequate, I mean they said things like (paraphrasing) "safety should be done on a completely voluntary basis with no government regulations" and "companies should have safety officers but those officers should not have final say on anything", and would simply not address x-risk at all, or would make harmful proposals like "the US Department of Defense should integrate more AI into its weapon systems" or "we need to stop worrying about x-risk because it's distracting from the real issues".
If you look at the kind of claims that PauseAI makes in their risks page, you might believe that some of them seem exaggerated, or that PauseAI is simply throwing all the negative things they can find about AI into a big list to make it seem bad. If you think that credibility is important to the effort to pause AI, then PauseAI might seem very careless about truth in a way that could backfire.
A couple notes on this:
B. "Pausing AI" is indeed more popular than PauseAI, but it's not clearly possible to make a more popular version of PauseAI that actually does anything; any such organization will have strategy/priorities/asks/comms that alienate many of the people who think "yeah I support pausing AI."
This strikes me as a very strange claim. You're essentially saying, even if a general policy is widely supported, it's practically impossible to implement any specific version of that policy? Why would that be true?
For example I think a better alternative to "nobody fund PauseAI, and nobody make an alternative version they like better" would be "there are 10+ orgs all trying to pause AI and they all have somewhat different goals but they're all generally pushing in the direction of pausing AI". I think in the latter scenario you are reasonably likely to get some decent policies put into place even if they're not my favorite.
I don't think you could refute it. I believe you could construct a binary polynomial function that gives the correct answer to every example.
For example, it is difficult to reconcile the cases of 3, 12, and 19 using a reasonable-looking function, but you could solve all three cases by defining E as the left-associative binary operation
f(x, y) = -(1/9) x^2 + (32/9) x - 22/9 + y
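To make "left-associative" concrete, here is a rough Python sketch of that operation. Only the formula for f comes from the comment above; the helper name `chain` and the sample operands are my own illustrative assumptions, not values from the original puzzle. It uses exact fractions so the ninths don't pick up floating-point error.

```python
from fractions import Fraction
from functools import reduce


def f(x, y):
    """The binary operation above: -(1/9) x^2 + (32/9) x - 22/9 + y."""
    x, y = Fraction(x), Fraction(y)
    return Fraction(-1, 9) * x**2 + Fraction(32, 9) * x - Fraction(22, 9) + y


def chain(*operands):
    """Hypothetical helper: apply f left-associatively, i.e. f(f(a, b), c)."""
    return reduce(f, operands)


# Grouping matters because f is not associative.
left = chain(1, 4, 10)     # f(f(1, 4), 10)
right = f(1, f(4, 10))     # right-associative grouping, for comparison
print(left, right)         # prints "203/9 21"
```

Since f is quadratic in its first argument and linear in its second, the two groupings generally disagree, which is why the associativity has to be stated as part of the definition.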
Hmm I wonder if this is why so many April Fools posts have >200 upvotes. April Fools Day in cahoots with itself?