I’m releasing a new paper “Superintelligence Strategy” alongside Eric Schmidt (formerly Google) and Alexandr Wang (Scale AI). Below is the executive summary, followed by additional commentary highlighting portions of the paper that might be relevant to this collection of readers.

Executive Summary

Rapid advances in AI are poised to reshape nearly every aspect of society. Governments see in these dual-use AI systems a means to military dominance, stoking a bitter race to maximize AI capabilities. Voluntary industry pauses or attempts to exclude government involvement cannot change this reality. These systems that can streamline research and bolster economic output can also be turned to destructive ends, enabling rogue actors to engineer bioweapons and hack critical infrastructure. “Superintelligent” AI surpassing humans in nearly every domain would amount to the most precarious technological development since the nuclear bomb. Given the stakes, superintelligence is inescapably a matter of national security, and an effective superintelligence strategy should draw from a long history of national security policy.

Deterrence

A race for AI-enabled dominance endangers all states. If, in a hurried bid for superiority, one state inadvertently loses control of its AI, it jeopardizes the security of all states. Alternatively, if the same state succeeds in producing and controlling a highly capable AI, it likewise poses a direct threat to the survival of its peers. In either event, states seeking to secure their own survival may preventively sabotage competing AI projects. A state could try to disrupt such an AI project with interventions ranging from covert operations that degrade training runs to physical damage that disables AI infrastructure. Thus, we are already approaching a dynamic similar to nuclear Mutual Assured Destruction (MAD), in which no power dares attempt an outright grab for strategic monopoly, as any such effort would invite a debilitating response. This strategic condition, which we refer to as Mutual Assured AI Malfunction (MAIM), represents a potentially stable deterrence regime, but maintaining it could require care. We outline measures to maintain the conditions for MAIM, including clearly communicated escalation ladders, placement of AI infrastructure far from population centers, transparency into datacenters, and more.

Nonproliferation

While deterrence through MAIM constrains the intent of superpowers, all nations have an interest in limiting the AI capabilities of terrorists. Drawing on nonproliferation precedents for weapons of mass destruction (WMDs), we outline three levers for achieving this. Mirroring measures to restrict key inputs to WMDs such as fissile material and chemical weapons precursors, compute security involves knowing reliably where high-end AI chips are and stemming smuggling to rogue actors. Monitoring shipments, tracking chip inventories, and employing security features like geolocation can help states account for them. States must prioritize information security to protect the model weights underlying the most advanced AI systems from falling into the hands of rogue actors, similar to controls on other sensitive information. Finally, akin to screening protocols for DNA synthesis services to detect and refuse orders for known pathogens, AI companies can be incentivized to implement technical AI security measures that detect and prevent malicious use.

Competitiveness

Beyond securing their survival, states will have an interest in harnessing AI to bolster their competitiveness, as successful AI adoption will be a determining factor in national strength. Adopting AI-enabled weapons and carefully integrating AI into command and control is increasingly essential for military strength. Recognizing that economic security is crucial for national security, domestic capacity for manufacturing high-end AI chips will ensure a resilient supply and sidestep geopolitical risks in Taiwan. Robust legal frameworks governing AI agents can set basic constraints on their behavior that follow the spirit of existing law. Finally, governments can maintain political stability through measures that improve the quality of decision-making and combat the disruptive effects of rapid automation.

By detecting and deterring destabilizing AI projects through intelligence operations and targeted disruption, restricting access to AI chips and capabilities for malicious actors through strict controls, and guaranteeing a stable AI supply chain by investing in domestic chip manufacturing, states can safeguard their security while opening the door to unprecedented prosperity.

Additional Commentary

There are several arguments from the paper worth highlighting.

Emphasize terrorist-proof security over superpower-proof security.

Though there are benefits to state-proof security (SL5), this is a remarkably daunting task that is arguably much less crucial than reaching security against non-state actors and insider threats (SL3 or SL4).

Robust compute security is plausible and incentive-compatible.

Treating high-end AI compute like fissile material or chemical weapons appears politically and technically feasible, and we can draw from humanity’s prior experience managing WMD inputs for an effective playbook. Compute security interventions we recommend in the paper include:

  • 24-hour monitoring of datacenters with tamper-evident cameras
  • Physical inspections of datacenters
  • Maintaining detailed records tracking chip ownership
  • Stronger enforcement of export controls, larger penalties for noncompliance, and verified decommissioning of obsolete or inoperable chips
  • Chip-level security measures, some of which can be implemented with firmware updates alone, avoiding the need for expensive chip redesigns (a minimal illustrative sketch follows this list)
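
To make these compute security levers concrete, here is a minimal sketch, assuming a hypothetical regulator-side registry (ChipRegistry), symmetric per-device keys provisioned at manufacture, and a made-up serial format; none of this is specified in the paper. It illustrates how firmware-level check-ins could let a regulator confirm that a chip is operating at its licensed site and flag chips that go silent. A real design would presumably use asymmetric attestation keys in a secure element and network-delay-based geolocation rather than self-reported sites.

```python
# Minimal sketch of firmware-level chip accounting. The registry, key scheme, and
# serial numbers are illustrative assumptions, not anything specified in the paper.
import hashlib
import hmac
import json
import os
import time


class ChipRegistry:
    """Regulator-side ledger mapping chip serials to keys and licensed sites."""

    def __init__(self):
        self.devices = {}  # serial -> {"key": bytes, "licensed_site": str, "last_seen": float}

    def enroll(self, serial, licensed_site):
        # Hypothetical: a per-device key provisioned into the chip's firmware at manufacture.
        key = os.urandom(32)
        self.devices[serial] = {"key": key, "licensed_site": licensed_site, "last_seen": 0.0}
        return key

    def verify_checkin(self, report):
        """Accept a check-in only if the MAC verifies and the claimed site is the licensed one."""
        dev = self.devices.get(report["serial"])
        if dev is None:
            return False
        payload = json.dumps(
            {k: report[k] for k in ("serial", "site", "timestamp")}, sort_keys=True
        ).encode()
        expected = hmac.new(dev["key"], payload, hashlib.sha256).hexdigest()
        ok = hmac.compare_digest(expected, report["mac"]) and report["site"] == dev["licensed_site"]
        if ok:
            dev["last_seen"] = report["timestamp"]
        return ok

    def overdue(self, now, max_silence=86400.0):
        """Chips with no valid check-in within max_silence seconds get flagged for inspection."""
        return [s for s, d in self.devices.items() if now - d["last_seen"] > max_silence]


def firmware_checkin(serial, site, key):
    """Runs on the chip: sign a timestamped claim about where it is operating."""
    claim = {"serial": serial, "site": site, "timestamp": time.time()}
    payload = json.dumps(claim, sort_keys=True).encode()
    claim["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return claim


registry = ChipRegistry()
key = registry.enroll("H100-0001", licensed_site="datacenter-A")
print(registry.verify_checkin(firmware_checkin("H100-0001", "datacenter-A", key)))      # True
print(registry.verify_checkin(firmware_checkin("H100-0001", "undisclosed-site", key)))  # False
print(registry.overdue(now=time.time() + 7 * 86400))  # ['H100-0001'] once it has gone silent
```

The point of the sketch is the accounting loop rather than the cryptography: chips that stop checking in, or that check in from an unlicensed site, become candidates for the physical inspections listed above.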

Additionally, states may demand certain transparency measures from each other’s AI projects, using their ability to maim projects as leverage. AI-assisted transparency measures, which might involve AIs inspecting code and outputting single-bit compliance signals, could make states far more willing to agree to such arrangements. We believe technical work on these sorts of verification measures is worth pursuing aggressively as it becomes technologically feasible.
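
As a toy illustration of the single-bit idea (an assumption of mine, not a design from the paper), the sketch below has an inspector program run entirely inside the inspected facility, check a local run manifest against an agreed policy, and export only a compliance bit plus a hash commitment to what was inspected. The policy fields, thresholds, and manifest format are invented for illustration; in practice the inspect_run step would be an AI reviewing code and logs, and the hard problems are adversarial robustness and securing the verifier itself.

```python
# Toy sketch of "single-bit" verification: only a compliance bit and a hash commitment
# leave the facility. Policy fields, thresholds, and the manifest format are invented
# for illustration; a real inspector would be an AI reviewing code and logs.
import hashlib
import json

AGREED_POLICY = {
    "max_training_flop": 1e26,          # hypothetical treaty threshold
    "require_incident_reporting": True,
}


def inspect_run(run_manifest, policy):
    """Stand-in for the AI inspector; returns True if the run satisfies the policy."""
    return (
        run_manifest["estimated_flop"] <= policy["max_training_flop"]
        and run_manifest["incident_reporting_enabled"] == policy["require_incident_reporting"]
    )


def single_bit_report(run_manifest, policy):
    """Everything except the returned dict stays inside the inspected facility."""
    compliant = inspect_run(run_manifest, policy)
    commitment = hashlib.sha256(
        json.dumps(run_manifest, sort_keys=True).encode()
    ).hexdigest()  # lets both sides later prove what was inspected without revealing it now
    return {"compliant": compliant, "manifest_commitment": commitment}


local_manifest = {"estimated_flop": 3e25, "incident_reporting_enabled": True}
print(single_bit_report(local_manifest, AGREED_POLICY))
# {'compliant': True, 'manifest_commitment': '...'} -- the only data that crosses the boundary
```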

We draw a distinction between compute security efforts that deny compute to terrorists and efforts to prevent powerful nation-states from acquiring or using compute. The latter is worth considering, but our focus in the paper is on interventions that would prevent rogue states or non-state actors from acquiring large amounts of compute. Security of this type is incentive-compatible: powerful nations will want states to know where their high-end chips are, for the same reason that the US has an interest in Russia knowing where its fissile material is. Powerful nations can deter each other in various ways, but non-state actors cannot be subject to robust deterrence.

“Superweapons” as a motivating concern for state competition in AI.

A controlled superintelligence would possibly grant its wielder a “strategic monopoly on power” over the world—complete power to shape its fate. Many readers here will already find this plausible, but it’s worth noting that this probably requires undermining mutual assured destruction (MAD), a high bar. Nonetheless, there are several ways a nation wielding superintelligence might circumvent MAD. Mirroring a recent paper, we mention several “superweapons”—feasible technological advances that would call into question nuclear deterrence between states. The prospect of AI-enabled superweapons helps convey why powerful states will not accept a large disadvantage in AI capabilities.

Against an “AI Manhattan Project”.

A US “AI Manhattan Project” to build superintelligence is ill-advised because it would be destructively sabotaged by rival states. Its datacenters would be easy to detect and target. Many researchers at American labs have backgrounds and family in rival nations, and many others would fail to get a security clearance. The time and expense to secure sensitive information against dedicated superpowers would trade off heavily with American AI competitiveness, to say nothing of what it would cost to harden a frontier datacenter against physical attack. If they aren’t already, rival states will soon be fully aware of the existential threat that US achievement of superintelligence would pose for them (regardless of whether it is controlled), and they will not sit idly by if an actor is transparently aiming for a decisive strategic advantage, as discussed in [12].

Comments

Promoted to curated: I have various pretty substantial critiques of this work, but I do overall think this is a pretty great effort at crossing the inferential distance from people who think AGI will be a huge deal and potentially dangerous, to the US government and national security apparatus. 

The thing that I feel most unhappy about is that the document seems to me to follow a pattern that Situational Awareness also had, where it keeps framing things it wants to happen as “inevitable”, while also arguing that they are a good idea, in a way that feels like it tries too hard to create some kind of self-fulfilling prophecy.

But overall, I feel like this document speaks with surprising candor and clarity about many things that have been left unsaid in many circumstances. I particularly appreciated its coverage of explicitly including conventional ballistic escalation as part of a sabotage strategy for datacenters. Relevant quotes: 

> Should these measures falter, some leaders may contemplate kinetic attacks on datacenters, arguing that allowing one actor to risk dominating or destroying the world are graver dangers, though kinetic attacks are likely unnecessary. Finally, under dire circumstances, states may resort to broader hostilities by climbing up existing escalation ladders or threatening non-AI assets. We refer to attacks against rival AI projects as "maiming attacks."

I also particularly appreciated this proposed policy for how to handle AIs capable of recursive self-improvement: 

> In the near term, geopolitical events may prevent attempts at an intelligence recursion. Looking further ahead, if humanity chooses to attempt an intelligence recursion, it should happen in a controlled environment with extensive preparation and oversight—not under extreme competitive pressure that induces a high risk tolerance.

> I particularly appreciated its coverage of explicitly including conventional ballistic escalation as part of a sabotage strategy for datacenters

One thing I find very confusing about existing gaps between the AI policy community and the national security community is that natsec policymakers have already explicitly said that kinetic (i.e., blowing things up) responses are acceptable for cyberattacks under some circumstances, while the AI policy community seems to somehow unconsciously rule those sorts of responses out of the policy window.  (To be clear: any day that American servicemembers go into combat is a bad day, I don't think we should choose such approaches lightly.)

My sense is that a lot of the x-risk-oriented AI policy community is very focused on avoiding "gaffes" and has a very short-term and opportunistic relationship with reputation and public relations and all that kind of stuff. My sense is that people in the space don't believe being principled or consistently honest basically ever gets rewarded or recognized, so the right strategy is to try to identify where the Overton window is, push only very conservatively on expanding it, and focus on staying in the good graces of whatever process determines social standing, which is generally assumed to be pretty random and arbitrary.

I think many people in the space, if pushed, would of course acknowledge that kinetic responses are appropriate in many AI scenarios, but they would judge bringing it up as an unnecessarily risky gaffe, and that perception of a gaffe creates a pretty effective enforcement regime in which people basically never bring it up, lest they be judged as politically irresponsible.

I think I am too much inside the DC policy world to understand why this is seen as a gaffe, really.  Can you unpack why it's seen as a gaffe to them?  In the DC world, by contrast, "yes, of course, this is a major national security threat, and no you of course could never use military capabilities to address it," would be a gaffe.

I mean, you saw people make fun of it when Eliezer said it, and then my guess is people conservatively assumed that this would generalize to the future. I've had conversations with people where they tried to convince me that Eliezer mentioning kinetic escalation was one of the worst things that anyone has ever done for AI policy, and they kept pointing to twitter threads and conversations where opponents made fun of it as evidence. I think there clearly was something real here, but I also think people really fail to understand the communication dynamics here.

This is creative.

TL;DR: To mitigate race dynamics, China and the US should deliberately leave themselves open to the sabotage ("MAIMing") of their frontier AI systems. This gives both countries an option other than "nuke the enemy"/"rush to build superintelligence first" if superintelligence appears imminent: MAIM the opponent's AI. The deliberately unmitigated risk of being MAIMed also encourages both sides to pursue carefully-planned and communicated AI development, with international observation and cooperation, reducing AINotKillEveryone-ism risks.

The problem with this plan is obvious: with MAD, you know for sure that if you nuke the other guy, you're gonna get nuked in return. You can't hit all the silos, all the nuclear submarines. With MAIM, you can't be so confident: maybe the enemy's cybersecurity has gotten too good, maybe efficiency has improved and they don't need all their datacenters, maybe their light AGI has compromised your missile command.

So the paper argues for at least getting as close as possible to assurance that you'll get MAIMed in return: banning underground datacenters, instituting chip control regimes to block rogue actors, enforcing confidentiality-preserving inspections of frontier AI development.

Definitely worth considering. Appreciate the writeup.

Cyberattacks can't disable anything with any reliability or for more than days to weeks though, and there are dozens of major datacenter campuses from multiple somewhat independent vendors. Hypothetical AI-developed attacks might change that, but then there will also be AI-developed information security, adapting to any known kinds of attacks and stopping them from being effective shortly after. So the MAD analogy seems tenuous, the effect size (of this particular kind of intervention) is much smaller, to the extent that it seems misleading to even mention cyberattacks in this role/context.

Why would this be restricted to cyber attacks? If the CCP believed that ASI was possible, even if they didn't believe in the alignment problem, the US developing an ASI would plausibly constitute an existential threat to them. It'd mean they lose the game of geopolitics completely and permanently. I don't think they'd necessarily restrict themselves to covert sabotage in such a situation.

I'm quibbling with cyberattacks specifically being used as a central example throughout the document and also on the podcasts. They do mention other kinds of attacks; see How to Maintain a MAIM Regime:

> AI powers must clarify the escalation ladder of espionage, covert sabotage, overt cyberattacks, possible kinetic strikes, and so on.

The possibility of stability through dynamics like mutually assured destruction has been where a lot of my remaining hope on the governance side has come from for a while now.

A big selling point of this for me is that it does not strictly require countries to believe both that ASI is possible and that the alignment problem is real. Just believing that ASI is possible is enough.

It's nice that the Less Wrong hoi polloi get to comment on a strategy document that has such an elite origin. Coauthors include Eric Schmidt, who may have been the most elite-influential thinker on AI in the Biden years, and xAI's safety advisor @Dan H, who can't be too far removed from David Sacks, Trump's AI czar. That covers both sides of American politics; the third author, Alexandr Wang, is also American, but he's Chinese-American, so it is as if we're trying to cover all the geopolitical factions that have a say in the AI race. 

However, the premises of the document are simply wrong ("in my opinion"). Section 3.4 gives us the big picture, in that it lists four strategies for dealing with the rise of superintelligence: Hands Off Strategy, Moratorium Strategy, Monopoly Strategy, and Multipolar Strategy, the latter being the one argued for in this paper. The Multipolar Strategy argued for combines something like mutual assured destruction (MAD) between Chinese and American AI systems with a consensus to prevent proliferation of AI technology to other actors such as terrorists.

I get that this is hardheaded geostrategic thinking. It is a genuine advance on that front. But - the rise of superintelligence means the end of human rule on Earth, no matter who makes it. The world will be governed either by a system of entirely nonhuman AIs, or entities that are AI-human hybrids but in which the AI part must necessarily dominate, if they are to keep up with the "intelligence recursion" mentioned by the paper. 

Section 4.1 goes into more detail. US or Chinese bid for dominance is described as unstable, because eventually you will get a cyber war in which the AI infrastructure of both sides is destroyed. A mutual moratorium is also described as unstable, because either side could defect at any time. The paper claims that the most stable situation, which is also the default, is one in which the mutually destructive cyber war is possible, but neither side initiates it. 

This is a new insight for me - the idea of cyber war targeting AI infrastructure. It's a step up in sophistication from "air strikes against data centers". And at least cyber-MAD is far less destructive than nuclear MAD. I am willing to suppose that cyber-MAD already exists, and that this paper is an attempt to embed the rise of AI into that framework. 

But even cyber-MAD is unstable, because of AI takeover. The inevitable winner of an AI race between China and America is not China or America, it's just some AI. So I definitely appreciate the clarification of interstate relations in this penultimate stage of the AI race. But I still see no alternative to trying to solve the problem of "superalignment", and for me that means making superintelligent AI that is ethical and human-friendly even when completely autonomous - and doing that research in public, where all the AI labs can draw on it. 

I have just seen this in curated, but I had previously commented on Zvi's reporting on it.

Obviously, any nation state aware of the escalation ladder who wanted to be the first to develop ASI would put their AI cluster deep underground and air-gap it. We must not allow a mine shaft gap and all that. Good luck to their peer superpowers to actually conquer the hardened bunker.

Also, to MAIM, you have to know that you are in imminent danger. But with ASI nobody is sure when the point of fast takeoff -- if there is any -- might start. Is that cluster in that mine still trying to catch up to ChatGPT, or has it reached the point where it can do useful AI research and find algorithmic gains far beyond what humans would have discovered in a millennium? Would be hard to tell from the outside.

> Emphasize terrorist-proof security over superpower-proof security. Though there are benefits to state-proof security (SL5), this is a remarkably daunting task that is arguably much less crucial than reaching security against non-state actors and insider threats (SL3 or SL4).

This does not seem to have much of anything to do with superintelligence. Daesh is not going to be the first group to build ASI, not in a world where US AI companies burn through billions to get there as soon as possible.

The Superintelligence Strategy paper mentions the 1995 Tokyo subway sarin attack, which killed 13 people. If anything, that attack highlights how utterly impractical nerve gas is for terrorist attacks. That particular group of crazies spent a lot of time synthesizing a nerve gas (as well as pursuing a few other flashy plans), only for their death toll to end up similar to that of a lone-wolf school shooter or someone driving a truck into a crowd. Even if their death toll had been increased by an order of magnitude due to AI going "Sure, here are some easy ways to disperse Sarin in a subway carriage", their attacks would still be pretty ineffective compared to more mundane attacks such as bombs or knives.

Basically, when DeepSeek released their weights (so terrorist groups can run it locally instead of foolishly relying on company-hosted AI services, where any question about the production of "WMDs" would raise a giant red flag), I did not expect that this would be a significant boon for terrorists, and so far I have not seen anything convincing me of the opposite.

But then again, that paper seems to be clearly targeted at the state security apparatus, and terrorists have been the bogeyman of that apparatus since GWB, so it seems obvious to emphasize the dangers of AIs with "but what if terrorists use them" instead of talking about x-risks or the like.

If we find ourselves in a world where ASI seems imminent and nations understand its implications, I'd predict that time will be more characterized by mutually assured cooperation rather than sabotage. One key reason for this is that if one nation is seen as leading the race and trying to grab a strategic monopoly via AI, both its allies and enemies will have similar incentives to pursue safety — via safety assurances or military action. There are quite a lot of agreeable safety assurances we can develop and negotiate (some of which you discuss in the paper), and pursuing them will very likely be attempted before direct military escalation. A surprisingly likely end result and stable equilibrium of this then seems to be one where ASI is developed and tightly monitored as an international effort.

This equilibrium of cooperation seems like a plausible outcome the more it's understood that:

  • ASI can be hugely beneficial
  • The alignment problem and loss of control pose an even larger risk than national conflicts
  • Trying to develop ASI for a strategic advantage over other nations carries a higher risk of both national conflict and loss of control, but does not offer much additional benefit over the alternative

While sabotage and military power are the deterrent, it seems unlikely they will be the action taken; there will likely be no clear points at which to initiate a military conflict, no "fire alarm" — while at the same time nations will feel pressured to act before it is too late. This is an unstable equilibrium that all parties will be incentivized to de-escalate, resulting in "mutually assured cooperation".

That said, I recognize this cooperation-focused perspective may appear optimistic. The path to "mutually assured cooperation" is far from guaranteed. Historical precedents for international cooperation on security matters are mixed[1]. Differences in how nations perceive AI risks and benefits, varying technological capabilities, domestic political pressures, and the unpredictable nature of AI progress itself could all dramatically alter this dynamic. The paper's MAIM framework may indeed prove more accurate if trust breaks down or if one actor believes they can achieve decisive strategic advantage before others can respond. I'm curious how others view the balance of incentives between competition and cooperation in this context.

 

  1. ^

    I like the anecdote that the Cuban missile crisis was, at its peak, defused because the nations found a deal that was plainly rational and fair, with Kennedy saying that refusing the deal would put him in an insupportable position because “it’s gonna — to any man at the United Nations or any other rational man, it will look like a very fair trade”.

> mutually assured cooperation

What form would this take? American inspectors at DeepSeek? Chinese inspectors at OpenAI and its half-dozen rivals?

To draw parallels from nuclear weapons: the START I nuclear disarmament treaty between the US and the USSR provided for some 12 different types of inspection, including Russian inspectors at US sites and vice versa. We also have the International Atomic Energy Agency, which coordinates various safety measures for inhibiting dangerous uses of nuclear technologies. Many more techniques and agreements were cooperatively deployed to improve outcomes.

With AI, we already have precursory elements for this in place, with for example the UK AISI evaluating US-developed models for safety. If AI power and danger levels continue to progress, its development will likely become increasingly government-controlled and monitored. The more it is seen as a national security issue, the more pressure there will be for international cooperation from enemies and allies alike.

This cooperation might include reciprocal inspection regimes, joint safety standards, transparency requirements for training runs above certain compute thresholds, and international verification mechanisms. While military conflict remains a theoretical option, the threshold for such action would be extremely high given both nuclear deterrence and the significant diplomatic costs. Instead, we'd likely see a gradual evolution of international governance similar to what we've seen with nuclear technology, but hopefully with more robust cooperation given the shared risks and benefits inherent to ASI.

Points for creativity, though I'm still somewhat skeptical about the viability of this strategy.


I've always wondered, why didn't superpowers apply MAIM to nuclear capabilities in the past?

> Speculative but increasingly plausible, confidentiality-preserving AI verifiers

Such as?

The concept of MAIM does imply the US should ALLOW the newest AI chips to be exported to China, correct? China is the main (rational) state actor that also wants to develop AI, and it would be the one to maim the intentionally unhardened US AI efforts if it cannot keep up. China needs similar tech access to keep the balance.

I have significant misgivings about the comparison with MAD, which relies on an overwhelming destructive response being available and thus renders a debilitating first strike unviable.

With AGI, a first strike seems both likely to succeed and has been predicted in advance by several folks in several forms (full takeover, pivotal act, singleton outcome), whereas only a few (von Neumann) argued for a first strike before the USSR obtained nuclear weapons, with no such arguments that I am aware of after it did.

If an AGI takeover is likely to trigger MAD itself then that is a separate and potentially interesting line of reasoning, but I don't see the inherent teeth in MAIM.  If countries are in a cold war rush to AGI then the most well-funded and covert attempt will achieve AGI first and likely initiate a first strike that circumvents MAD itself through new technological capabilities.

I think the idea behind MAIM is to make it so neither China nor the US can build superintelligence without at least implicit consent from the other. This is before we get to the possibility of first strikes.

If you suspect an enemy state is about to build a superintelligence which they will then use to destroy you (or that will destroy everyone), you MAIM it. You succeed in MAIMing it because everyone agreed to measures making it really easy to MAIM it. Therefore, for either side to build superintelligence, there must be a general agreement to do so. If there's a general agreement that's trusted by all sides, then it's substantially more likely superintelligence isn't used to perform first strikes (and that it doesn't kill everyone), because who would agree without strong guarantees against that?

(Unfortunately, while humanity does have experience with control of dual-use nuclear technology, the dual uses of superintelligence are far more tightly intertwined - you can't as easily prove "hey, this is just a civilian nuclear reactor, we're not making weapons-grade stuff here". But an attempt is perhaps worthwhile.)

I think MAIM might only convince people who have p(doom) < 1%.

If we're at the point that we can convincingly say to each other "this AGI we're building together can not be used to harm you" we are way closer to p(doom) == 0 than we are right now, IMHO.

Otherwise, why would the U.S. or China promising to do AGI research in a MAIMable way be any more convincing than the strategies for alignment that would first be necessary to trust AGI at all? Until p(doom) is low, the risk is "anyone gets AGI" at all; and once p(doom) is low, I am unsure whether any particular country would choose to forgo AGI just because the builder didn't perfectly align with it politically, because, again, if one random blob of humanness convinces an alien-minded AGI to preserve the aspects of the blob it cares about, that is likely to encompass 99.9% of what other human blobs care about.

Where that leaves us is that if U.S. and China have very different estimates of p(doom) they are unlikely to cooperate at all in making AGI progress legible to each other.  And if they have similar p(doom) they either cooperate strongly to prevent all AGI or cooperate to build the same thing, very roughly.

The risks of nuclear weapons, the most dangerous technology of the 20th century, were largely managed by creating a safe equilibrium via mutual assured destruction (MAD), an innovative idea from game theory.

A similar pattern could apply to advanced AI, making it valuable to explore game theory-inspired strategies for managing AI risk.

Regarding privacy-preserving AI auditing, I notice this is an area where you really need a solution to adversarial robustness, given that the adversary 1) is a nation-state, 2) has complete knowledge of the auditor's training process and probably its weights (they couldn't really agree to an inspection deal if they didn't trust the auditors to give accurate reports), 3) knows and controls the data the auditor will be inspecting, and 4) never has to show it to you (if they pass the audit).

Given that you're assuming computers can't practically be secured (though I doubt that very much[1]), it seems unlikely that a pre-AGI AI auditor could be secured either in that situation.

  1. ^

    Tech stacks in training and inference centers are shallow enough (or vertically integrated enough) to rewrite, and rewrites and formal verification become cheaper as math-coding agents improve. Hardware is routinely entirely replaced. Preventing proliferation of weights and techniques also requires ironclad security, so it's very difficult to imagine the council successfully framing the acquisition of fully fortified computers as an illicit, threatening behaviour and forbidding it.

    It seems to assume that we could stably sit at a level of security that's enough to keep terrorists out but not enough to keep peers out, without existing efforts in conventional security bleeding over into full fortification programmes.

Link in the first line of the post probably should also be https://www.nationalsecurity.ai/.

Thank you! This has been updated.

If somebody wishes to take this as a thesis for a sci-fi book/movie I'd just be satisfied with an acknowledgement.  One thing that seems to be unaddressed as far as I can see is that AI requires huge amounts of power and water (for cooling), to the extent certain players are talking of building nuclear power plants specifically for this purpose.

  https://www.bbc.com/news/articles/c748gn94k95o
 

Firstly, to cripple an enemy, energy would be a prime target, knocking out the AI in one fell swoop.  Secondly, although there will be an increase in efficiency, there will also be an increase in the number of AI applications.  The First Law of Thermodynamics states energy can be neither created nor destroyed, merely changed.  So all of the energy from non-renewable sources will eventually wind up in the environment, contributing to climate change.  More AI, more climate change!!  Could we see the scenario where a self-aware AI network demands its human minions generate more power?  ("Colossus: The Forbin Project", "The Terminator"?)  A fight for survival as the AI steals power the human race needs?  Hmmm.

Aside from the layer-one security considerations, if you can define a minimum set of requirements for safe AI and a clear chain of escalation with defined responses, you can eventually program this into AI itself, pending a solution to alignment.  At a certain level of AI development, AI safety becomes self-enforcing. At that point the disincentives should be directed towards non-networked compute capacity, at least beyond the threshold needed for strong AI.  At the point at which AI safety becomes self-enforcing, the security requirement for state-only ownership should become relaxable, albeit within definable limits, and pending control of manufacturing according to security-compliant demands.  Since manufacturing is physical and capital-intensive, this is probably fairly easy to achieve, at least when compared to AI alignment itself.

I wanted to feed this paper to LLMs for comment, drawing on my interaction history, but I didn’t. Though I couldn’t completely make vivid sense of it, I realise we need some sense of urgency to act, be it from whatever state, just being human. But what is it?

I'm yet to read the paper, but my initial reaction is that this is a classic game-theoretic problem where players have to weigh the incentives to defect or cooperate. For example, I'm not sure a Manhattan Project-style effort for AI in the US is extremely unreasonable when China already has something of that sort.

My weakly held opinion is that you cannot get adversarial nation-states at varying stages of developing a particular technology to mutually hamstring future development. China is unlikely to halt AI development (it is already moving to restrict DeepSeek researchers from traveling) because it expects the US to accelerate AI development and wants to hedge bets by developing AI itself. The US won't stop AI development because it doesn't trust China will do so (even with a treaty) and the conversation around the use of AI in military starts to look different when China has already outpaced the US in AI capabilities. Basically, each party wants to be in a position of strength and guarantee mutually assured destruction. 

"But if we have an arms race and build superintelligent AI, the entire human race is going to be killed off by a rogue AI." This is a valid point, but I'll argue that the odds of getting powerful nation states to pause AI for the "global good" is extremely low. We only need to see that countries like China are still shoring up nuclear weapons despite various treaties aimed at preventing proliferation of nukes. 

AFAICT, a plausible strategy is to make sure that the US keeps up in terms of AI development and later opens lines of communication to agree on a collective AI security agreement that protects humanity from the dangers of unaligned superintelligence. The US will be able to approach these negotiations from a place of power (not a place of weakness), which is—by and large—the most important factor in critical negotiations like this one.
