I'm not totally sure of this, but it looks to me like there's already more scientific consensus around mirror life being a threat worth taking seriously than there is for AI. E.g., my impression is that this paper was largely positively received by various experts in the field, including experts who weren't involved in the paper. AI risk looks much more contentious to me even if there are some very credible people talking about it. That could be driving some of the difference in responses, but yeah, the economic potential of AI probably drives a bunch of the difference too.
To add to that, Oeberst (2023) argues that all cognitive biases are at heart just confirmation bias applied to a few "fundamental prior" beliefs. (A "belief" here being a hypothesis about the world bundled with a degree of confidence.) The fundamental beliefs are:
That is obviously rather speculative, but I think it's some further weak reason to think motivated reasoning is in some sense a fundamental problem of rationality.
It seems like an obviously sensible thing to do from a game-theoretic point of view.
Hmm, seems highly contingent on how well-known the gift would be? And even if potential future Petrovs are vaguely aware that this happened to Petrov's heirs, it's not clear that it would be an important factor when they make key decisions; if anything, it would probably feel pretty speculative/distant as a possible positive consequence of doing the right thing. Especially if those future decisions are not directly analogous to Petrov's, such that it's not clear whether they fall in the same category. But yeah, mainly I just suspect this type of thing won't get enough attention to end up shifting important decisions in the future? Interesting idea, though -- upvoted.
Specific examples might include criticisms of RSPs, Kelsey’s coverage of the OpenAI NDA stuff, alleged instances of labs or lab CEOs misleading the public/policymakers, and perspectives from folks like Tegmark and Leahy (who generally see a lot of lab governance as safety-washing and probably have less trust in lab CEOs than the median AIS person).
Isn't much of that criticism also forms of lab governance? I've always understood the field of "lab governance" as something like "analysing and suggesting improvements for practices, policies, and organisational structures in AI labs". By that definition, many critiques of RSPs would count as lab governance, as could the coverage of OpenAI's NDAs. But arguments of the sort "labs aren't responsive to outside analyses/suggestions, dooming such analyses/suggestions" would indeed be criticisms of lab governance as a field or activity.
(ETA: Actually, I suppose there's no reason why a piece of X research cannot critique X (the field it's a part of). So my whole comment may be superfluous. But eh, maybe it's worth pointing out that the stuff you propose adding can also be seen as a natural part of the field.)
Yes, this seems right to me. The OP says
The key point I will make is that, from a game-theoretic point of view, this race is not an arms race but a suicide race. In an arms race, the winner ends up better off than the loser, whereas in a suicide race, both parties lose massively if either one crosses the finish line.
But from a game-theoretic perspective, it can still make sense for the US to aggressively pursue AGI, even if one believes there's a substantial risk of an AGI takeover in the case of a race, especially if the US acts in its own self-interest. Even with this simple model, the optimal strategy would depend on how likely AGI takeover is, how bad China getting controllable AGI first would be from the point of view of the US, and how likely China is to also not race if the US does not race. In particular, if the US is highly confident that China will aggressively pursue AGI even if the US chooses not to race, then the optimal strategy for the US could be to race even if AGI takeover is highly likely.
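To make this concrete, here's a minimal toy expected-utility sketch of that decision. All probabilities and payoff numbers below are invented purely for illustration, and the model deliberately ignores a lot (e.g., the possibility that China wins a two-sided race):

```python
# Toy expected-utility model of the US's "race vs. don't race" decision.
# All numbers are made up for illustration only.

def us_expected_utility(us_races: bool,
                        p_china_races_anyway: float,
                        p_takeover_if_race: float,
                        u_takeover: float = -100,   # AGI takeover: very bad for everyone
                        u_us_wins: float = 50,      # US gets controllable AGI first
                        u_china_wins: float = -50,  # China gets controllable AGI first
                        u_status_quo: float = 0):   # nobody races
    """Crude one-shot expected utility from the US's point of view."""
    if us_races:
        # Simplification: if the US races, assume either takeover happens or the US wins.
        return (p_takeover_if_race * u_takeover
                + (1 - p_takeover_if_race) * u_us_wins)
    # If the US holds back, China may race unilaterally anyway.
    china_races = (p_takeover_if_race * u_takeover
                   + (1 - p_takeover_if_race) * u_china_wins)
    return (p_china_races_anyway * china_races
            + (1 - p_china_races_anyway) * u_status_quo)

# If the US is confident China will race regardless, racing can come out "optimal"
# even when takeover is quite likely:
print(us_expected_utility(True,  p_china_races_anyway=0.9, p_takeover_if_race=0.6))  # -40.0
print(us_expected_utility(False, p_china_races_anyway=0.9, p_takeover_if_race=0.6))  # -72.0
```

With those made-up numbers, racing comes out ahead (-40 vs. -72) despite a 60% chance of takeover, simply because China is assumed to race anyway with 90% probability; the conclusion flips if that probability is low enough or the takeover risk is high enough.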
So really I think some key cruxes here are:
And vice versa for China. But the OP doesn't really make any headway on those.
Additionally, I think there are a bunch of complicating details that also end up mattering, for example:
It seems to me all these things could matter when determining the optimal US strategy, but I don't see them addressed in the OP.
for people who are not very good at navigating social conventions, it is often easier to learn to be visibly weird than to learn to adapt to the social conventions.
Are you basing this on intuition or personal experience or something else? I guess we should avoid basing it on observations of people who did succeed in that way. People who try and succeed in adapting to social conventions are likely much less noticeable/salient than people who succeed at being visibly weird.
Yeah that makes sense. I think I underestimated the extent to which "warning shots" are largely defined post-hoc, and events in my category ("non-catastrophic, recoverable accident") don't really have shared features (or at least features in common that aren't also there in many events that don't lead to change).
One man's 'warning shot' is just another man's "easily patched minor bug of no importance if you aren't anthropomorphizing irrationally", because by definition, in a warning shot, nothing bad happened that time. (If something had, it wouldn't be a 'warning shot', it'd just be a 'shot' or 'disaster'.)
I agree that "warning shot" isn't a good term for this, but then why not just talk about "non-catastrophic, recoverable accident" or something? Clearly those things do sometimes happen, and there is sometimes a significant response going beyond "we can just patch that quickly". For example:
I think one point you're making is that some incidents that arguably should cause people to take action (e.g., Sydney), don't, because they don't look serious or don't cause serious damage. I think that's true, but I also thought that's not the type of thing most people have in mind when talking about "warning shots". (I guess that's one reason why it's a bad term.)
I guess a crux here is whether we will get incidents involving AI that (1) cause major damage (hundreds of lives or billions of dollars), (2) are known to the general public or key decision makers, (3) can be clearly causally traced to an AI, and (4) happen early enough that there is space to respond appropriately. I think it's pretty plausible that there'll be such incidents, but maybe you disagree. I also think that if such incidents happen it's highly likely that there'll be a forceful response (though it could still be an incompetent forceful response).
I don't really have a settled view on this; I'm mostly just interested in hearing a more detailed version of MIRI's model. I also don't have a specific expert in mind, but I guess the type of person that Akash occasionally refers to -- someone who's been in DC for a while, focuses on AI, and has encouraged a careful/diplomatic communication strategy.
“Be careful what you say, try to look normal, and slowly accumulate political capital and connections in the hope of swaying policymakers long-term” isn’t an unconditionally good strategy, it’s a strategy adapted to a particular range of situations and goals.
I agree with this. I also think that being more outspoken is generally more virtuous in politics, though I also see drawbacks with it. Maybe I'd have liked the OP to mention some of the possible drawbacks of the outspoken strategy and whether there are sensible ways to mitigate them, or at least to make clear that MIRI thinks they're outweighed by the advantages. (There's some discussion, e.g., the risk of being "discounted or uninvited in the short term", but this seems to be mostly drawn from the "ineffective" bucket, not from the "actively harmful" bucket.)
AI risk is a pretty weird case, in a number of ways: it’s highly counter-intuitive, not particularly politically polarized / entrenched, seems to require unprecedentedly fast and aggressive action by multiple countries, is almost maximally high-stakes, etc.
Yeah, I guess this is a difference in worldview between me and MIRI, where I have longer timelines, am less doomy, and am more bullish on forceful government intervention, causing me to think increased variance is probably generally bad.
That said, I'm curious why you think AI risk is highly counterintuitive (compared to, say, climate change) -- it seems the argument can be boiled down to a pretty simple, understandable (if reductive) core ("AI systems will likely be very powerful, perhaps more than humans, controlling them seems hard, and all that seems scary"), and it has indeed been transmitted like that successfully in the past, in films and other media.
I'm also not sure why it's relevant here that AI risk is relatively unpolarized -- if anything, that seems like it should make it more important not to cause further polarization (at least if a highly visible moral issue being relatively unpolarized represents an unstable equilibrium)?
Fyi, it's also available on https://chat.deepseek.com/, as is their reasoning model DeepSeek-R1-Lite-Preview ("DeepThink"). (I suggest signing up with a throwaway email and not inputting any sensitive queries.) From quickly throwing a few requests at it that I had recently asked 3.5 Sonnet, DeepSeek-V3 seems slightly worse, but nonetheless solid.