Why those who care about catastrophic and existential risk should care about autonomous weapons

aaguirre

11th Nov 2020

(Crossposted to the EA Forum.)

Although I have not seen the argument made in any detail or in writing, the Future of Life Institute (FLI) and I have formed the strong impression that parts of the effective altruism ecosystem are skeptical of the importance of autonomous weapons systems as an issue. This post explains why we think those interested in avoiding catastrophic and existential risk, especially risk stemming from emerging technologies, may want to place this issue higher on their list of concerns.

We will first define some terminology and do some disambiguation, as there are many classes of autonomous weapons that are often conflated; all classes have some issues of concern, but some are much more problematic than others. We then detail three basic motivations for research, advocacy, coordination, and policymaking around the issue:

Governance of autonomous weapon systems is a dry run, and precedent, for governance of AGI. In the short term, AI-enabled weapons systems will share many of the technical weaknesses and shortcomings of other AI systems, but, like general AI, they also raise safety concerns that are likely to increase rather than decrease with capability advances. The stakes are intrinsically high (literally life-or-death), and the context is inevitably adversarial, involving states and major corporations. The sort of global coordination among potentially adversarial parties that will be required to govern transformative/general AI systems will not arise from nowhere, and autonomous weapons offer an invaluable precedent and arena in which to build experience, capability, and best practices.
Some classes of lethal autonomous weapon systems constitute scalable weapons of mass destruction (which may also have a much lower threshold for first use or accidental escalation), and hence a nascent catastrophic risk.
By increasing the probability of the initiation and/or escalation of armed conflict, including catastrophic global armed conflict and/or nuclear war, autonomous weapons carry a very high expected cost that overwhelmingly offsets any lives saved by substituting autonomous weapons for humans in armed conflict.

Classes of autonomous weapons

Because many things with very different characteristics could fall under the rubric of “autonomous weapon systems” (AWSs) it is worth distinguishing and classifying them. First, let us split off cyberweapons – including AI-powered ones – as being an important but distinct issue. Likewise, we’ll set aside AI in other aspects of the military not directly related to the use of force, from strategy to target identification, where it serves to augment human action and decision-making. Rather, we focus on systems that have both (some form of) AI and physical armaments.

We now consider in turn these armaments’ target types, which we will break into categories of anti-personnel weapons, force-on-force (i.e. attacking manned enemy vehicles or structures) weaponry, and those targeting other autonomous weapon systems.

Anti-personnel AWSs can be further divided into lethal (or grossly injurious) ones versus nonlethal ones. While an interesting topic,^[1] we leave aside here non-lethal anti-personnel autonomous weapon systems, which have a somewhat distinct set of considerations.^[2]

We regard force-on-force systems designed to attack manned military vehicles and installations as relatively less intrinsically concerning. The targets of such weapons will, with considerably higher probability, be valid military targets rather than civilian ones, and insofar as they scale to mass damage, that damage will be to an adversary’s military. Of course if these weapons are highly effective, the manned targets they are designed to attack may quickly be replaced with unmanned ones.^[3]

This brings us to force-on-force systems that attack other autonomous weapons (anti-AWSs). These exist now, for example in the form of automated anti-missile systems, and are likely to grow more prevalent and sophisticated. These raise a nuanced set of considerations, as we’ll see. Some types are quite uncontroversial: no one has to our knowledge advocated for prohibiting, say, automated defenses on ships. On the other hand, very effective anti-ballistic missile systems could undermine the current nuclear equilibrium based on mutual assured destruction. And while the prospect of robots fighting robots rather than humans fighting humans is beguiling from the standpoint of avoiding the horrors of war, we’ll argue below that it is very unlikely for this to be a net positive.

This leads to a fairly complex set of considerations. FLI and other organizations have advocated for a prohibition on kinetic lethal anti-personnel autonomous weapons, with varying degrees of distinction between anti-personnel and force-on-force lethal autonomous weapons, and varying levels of concern about, and proposed regulation of, some classes of force-on-force autonomous weapons. Motivations for this advocacy vary, but we start with one that is of particular importance to FLI and to the EA/long-termist community.

Lethal autonomous weapons systems are an early test for AGI safety, arms race avoidance, value alignment, and governance

There are a surprising number of parallels between the issue of autonomous weapons and some of the most challenging parts of the AGI safety issue. These parallels include:

In both cases, a race condition is both natural and dangerous;
Military involvement is possible in AGI and inevitable for AWSs;
Involvement by national governments is likely in AGI and inevitable for AWSs;
Secrecy and information hazards are likely in both;
Major ethical/responsibility concerns exist for both, perhaps more explicitly in AWSs;
In both cases, unpredictability and loss of control are key issues;
In both cases, early versions are potentially dangerous because of their incompetence; later versions are dangerous because of their competence.

The danger of arms races has long been recognized as a potentially existential threat in the context of AGI: if companies or countries worry that being second to realize a technology could be catastrophic for their corporate or national interest, then safety (and essentially all other) considerations will tend to fall by the wayside. When applied to autonomous weapons, “arms race” is literal rather than metaphorical, but similar considerations apply. The general problem with arms races is that they are very easy to lose but very difficult to win: you lose if you fail to compete, but you also lose if the competition leads to a situation that dramatically increases the risk to both parties, or to huge adverse side effects; this appears likely to be the case for autonomous weapons and AGI, just as it was for nuclear weapons.^[4] Unfortunately, the current international and national security context includes multiple parties fomenting a “great powers” rivalry between the US and China that is feeding an arms race narrative in AI in general, including in the military and potentially extending to AGI.

Managing to avoid an arms race in autonomous weapons – via multi-stakeholder international agreement and other means – would set a very powerful precedent for avoiding one more generally. Fortunately, there is reason to believe that this arms race is avoidable.^[5] The vast majority of AI researchers and developers are strongly against an arms race in AWSs,^[6] and AWSs enjoy very little popular support.^[7] Prohibition or strong governance of lethal autonomous weapons is thus a test case on which the overwhelming majority of AI researchers and developers agree. This presents an opportunity to draw at least some line, in a globally coordinated way, between what is and is not acceptable in delegating decisions, actions, and responsibility to AI. Doing so would also set a precedent for avoiding a race by recognizing that each participant's own interests are better served by at least some coordination and cooperation.

Good governance of AWSs will take exactly the sort of multilateral cooperation, including getting militaries on board, that is likely to be necessary to manage an overall AI/AGI (figurative) arms race. The methods, institutions, and ideas necessary to govern AGI in a beneficial and stable multilateral system are very unlikely to arise quickly or from nowhere. They might arise steadily from the growth of current AI governance institutions such as the OECD, international standards bodies, and regulatory frameworks such as the one developing in the EU. However, these institutions tend to explicitly and deliberately exclude discussion of military issues so as to make reaching agreements easier. This avoids precisely the issues of national interest and military and geopolitical power that would be at the forefront of the most disastrous type of AGI race. Seeking to govern deeply unpopular AWSs (which also presently lack strong interest groups pushing for them) provides the easiest possible opportunity for a “win” in coordination among military powers.

Beyond race vs. cooperative dynamics, autonomous weapons and AGI present other important parallels at the level of technical AI safety and alignment, and multi-agent dynamics.

Lethal autonomous weapon systems are a special case of a more general problem in AI safety and ethics: the technical capability needed to be effective may be much simpler to achieve than what is necessary to be moral, ethical, or legal. Indeed, the gap between making an autonomous weapon that is effective (successfully kills enemies) and one that is moral (in the sense of, at minimum, being able to act in accord with international law) may be larger than in any other AI application: the stakes are so high, and the situations so complex, that the problem may well be AGI-complete.^[8]

In the short term, then, there are complex moral questions. In particular, who is responsible for the decisions made by an AI system when the moral responsibility cannot lie with the system? If an AI system is programmed to obey the “law of war,” but then fails, who is at fault? On the flip side, what happens if the AI system “disagrees” with a human commander directing an illegal act? Even a weapon that is very effective at obeying such rules is unlikely to (be programmed to be able to) disobey a direct “order” from its user: if such “insubordination” is possible, it raises the risk of incorrigible and intransigent intelligent weapons systems; if not, it removes an existing barrier to unconscionable military acts. While these concerns are not foremost from the perspective of overall expected utility, for these and other reasons we believe that delegating the decision to take a human life to machine systems is a deep moral error, and that doing so in the military sets a terrible precedent.

Things get even more complex when multiple cooperative and adversarial systems are involved. As argued below, the unpredictability and adaptability of AWSs is an issue that will increase with better AI rather than decrease. And when many such agents interact, emergent effects are likely that are even less predictable in advance. This corresponds closely to the control problem in AI in general, and indicates a quite pernicious problem of AI systems leaving humans unable to predict what they will do, or effectively intervene if what they do runs counter to the wishes of human overseers.

In advanced AI in general, one of the most dangerous dynamics is the unwarranted belief of AI developers, users, and funders that AI will – like most engineered technologies – by default do what we want it to do. It is important that those who would research, commission, or deploy autonomous weapons be fully cognizant of this issue; and we might hope that the cautious mindset this could engender could bleed into or be transplanted into safety considerations for powerful AI systems in and out of the military.

Lethal autonomous weapons as WMDs

There is a very strong case to classify some anti-personnel AWSs as weapons of mass destruction. We regard as the key defining characteristic of WMDs^[9] that a single person’s agency directed through the weapon can directly cause many fatalities with very little additional support structure (like an army to command.) This is not possible with “conventional” weapons systems like guns, aircraft, and tanks, where the deaths caused scale roughly linearly with the number of people involved in causing those deaths.

With this definition, some anti-personnel lethal AWSs (such as microdrone munition-carrying “slaughterbots”) would easily qualify. These weapons are essentially (microdrone)+(bullet)+(smartphone components), and with near-future technology and economies of scale, slaughterbots could plausibly be as inexpensive as $100 each to manufacture en masse. Even with a 50% success rate and a doubling of the cost to account for delivery, this is $400/fatality. Nuclear weapons cost billions to develop, then tens to hundreds of millions per warhead. A nuclear strike against a major city is likely to cause hundreds of thousands of fatalities (for example, a 100 kiloton strike against downtown San Francisco would cause an estimated 200K fatalities and 400K injuries). 100,000 kills' worth of slaughterbots, at a cost of $40M, would be just as cost-effective to manufacture and deploy, and dramatically cheaper to develop. They are bulkier than a nuclear warhead but could plausibly still fit in a 40’ shipping container (and, unlike nuclear, chemical, and biological weapons, they are safe to transport, hard to detect, and easily deployed remotely).
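The cost comparison above is simple arithmetic, and a short sketch makes its assumptions explicit. This is a minimal illustration, not a cost model: the $100 unit cost, 2× delivery multiplier, 50% success rate, and 200,000-fatality city strike are the figures quoted in the text, while the $10M and $500M per-warhead endpoints are hypothetical stand-ins for "tens to hundreds of millions," and nuclear development costs are ignored.

```python
# Back-of-envelope cost per fatality, using the figures quoted in the text.
# The per-warhead endpoints below are illustrative assumptions, not sourced data.

def cost_per_fatality(unit_cost: float, delivery_multiplier: float, success_rate: float) -> float:
    """Each unit effectively costs unit_cost * delivery_multiplier, and on
    average 1 / success_rate units are needed per fatality."""
    return unit_cost * delivery_multiplier / success_rate


# Slaughterbot figures from the text: $100/unit, cost doubled for delivery, 50% success.
bot_cost = cost_per_fatality(unit_cost=100.0, delivery_multiplier=2.0, success_rate=0.5)
fleet_cost = 100_000 * bot_cost  # budget for 100,000 fatalities

# Nuclear: one warhead against a major city, ~200,000 fatalities (San Francisco estimate).
# Hypothetical endpoints for "tens to hundreds of millions" per warhead; development excluded.
CITY_FATALITIES = 200_000
nuke_low = 10e6 / CITY_FATALITIES
nuke_high = 500e6 / CITY_FATALITIES

print(f"slaughterbots: ${bot_cost:,.0f}/fatality, ${fleet_cost:,.0f} per 100,000 fatalities")
print(f"nuclear warhead: ${nuke_low:,.0f} to ${nuke_high:,.0f}/fatality (excluding development)")
```

With these inputs the script reports $400 per fatality and $40,000,000 per 100,000 fatalities for slaughterbots, against $50 to $2,500 per fatality for a single warhead, so the bot figure falls inside the nuclear range; that is the sense in which the two are comparably cost-effective once development costs are set aside.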

This is possible with near-future technology.^[10] It is not hard to imagine even more miniaturized weaponry, in a continuum that could reach all the way to nanotechnology. And unlike nuclear weapons (to a first approximation), their effectiveness and cost-efficiency are likely to increase significantly with technological improvement.^[11] Thus if even a fraction of the resources that have been put into nuclear weapons were put into anti-personnel lethal AWSs, they could potentially become as large a threat. Consider that it took less than 20 years from the 1945 Trinity test to the Cuban Missile Crisis, which almost led to a global catastrophe, and that a determined but relatively minor program by a major military could likely develop a slaughterbot-type WMD within a handful of years.

One crucial difference between AWSs and other WMDs is that the former's ability to discriminate among potential targets is much better, and this capability should increase with time. A second is that autonomous WMDs would, unlike other WMDs, leave the targeted territory relatively undamaged and quickly habitable.

In certain ways these are major advantages: a (somewhat more) responsible actor could use this capability to target only military personnel insofar as they are distinguishable, or target only the leadership structure of some rogue organization, without harming civilians or other bystanders. Even if such distinctions are difficult, such weapons could relatively easily never target children, the wounded, etc. And a military victory would not necessarily be accompanied by the physical destruction of an adversary’s infrastructure and economy.

The unfortunate flip side of these differences, however, is that anti-personnel lethal AWSs are much more likely to be used. In terms of “bad actors,” along with the advantages of being safe to transport and hard to detect, the ability to selectively attack particular types of people who have been identified as worthy of killing will help assuage the moral qualms that might otherwise discourage mass killing. Particular ethnic groups, languages, uniforms, clothing, or individual identities (culled from the internet and matched using facial recognition) could all provide a basis for targeting and rationalization. And the ability to kill at scale without destroying physical assets would make autonomous WMDs far more strategically effective for seizing territory.

Autonomous WMDs would pose all of the same sorts of threats that other ones do,^[12] from acts of terror to geopolitical destabilization to catastrophic conflict between major powers. Tens of billions of USD are spent by the US and other states to prevent terrorist actions using WMDs and to prevent the “wrong” states from acquiring them. And recall that a primary (claimed) reason for the Iraq war (at trillions of USD in total cost) was its (claimed) possession of WMDs. It thus seems foolish in the extreme to allow – let alone implicitly encourage – the development of a new class of WMDs that could proliferate much more easily than nuclear weapons.

Lethal autonomous weapons as destabilizing elements in and out of war

On the list of the most important things in the world, maintaining international peace and stability ranks very highly; instability is a critical risk factor for global catastrophic or X-risk. Even nuclear weapons, probably the greatest current catastrophic risk, are arguably stabilizing against large-scale war. In contrast, there are many compelling reasons to see autonomous weapons as a destabilizing force, perhaps profoundly so.^[13]

For a start, AWSs like slaughterbots are ideal tools of assassination and terror, and hence deeply politically destabilizing. The usual obstacles to one individual killing another – technical difficulty, fear of being caught, physical risk during execution, and innate moral aversion – are all lowered or eliminated by a programmable autonomous weapon. All else being equal, if lethal AWSs proliferate, political assassinations and acts of terror will inevitably become more feasible, and dramatically so if their current rate is limited by any of the above obstacles. Our sociopolitical systems react very strongly to both types of violence, and the consequences are unpredictable but could be very large-scale. The economic cost of the largest terror attacks to date, those of 9/11, surely runs into the trillions of USD, with an accompanying social cost in surveillance, global conflict, and so on.

Second, like drone warfare, lethal AWSs are likely to further (and more widely) lower the threshold for state violence toward other states. The US, for one, has shown little reluctance to strike targets of interest in certain other countries, and lethal AWSs could diminish that reluctance even more by lowering the level of collateral damage.^[14] This type of action might spread to other countries that currently lack the US's technical ability to carry out such strikes. Lethal (or nonlethal) AWSs could also increase states' ability to perpetrate violence against their own citizens; whether this increases or decreases the stability of those states, however, seems unclear.

Third, AWSs of all types threaten to upset the status quo of military power. The advantage of major military powers rests on decades of technological advantage coupled with vast levels of spending on training and equipment. A significant part of this investment and advantage would be nullified by a new class of weapon that evolves on software rather than hardware timescales. Moreover, even if the current capability “ranking” of military powers were preserved, for a weapon that strongly favors offense (as some have argued is true of antipersonnel AWSs) there may be no plausible technical advantage that suffices^[15] – indeed this is a key reason that major military powers are so concerned about nuclear proliferation.

Finally, and probably most worrisome, if there is an open arms race in AWSs of all types, we see a dramatically increased risk of accidental triggering or escalation of armed conflict.^[16] A crucial desirable feature of AWSs from the military point of view is to be able to understand and predict^[17] how they will operate in a given situation: under what conditions will they take action on what sorts of targets, and how. This is a very difficult technical problem because, given the variety of situations in which an AWS might be placed, it could easily fall outside the context of its training data. But it is a very crucial one: without such an understanding, fielding an AWS would raise a spectrum of potential unintended consequences.

But now consider a situation in which AWSs are designed to attack and defend against other AWSs. In this case, predictability of how a given AWS will function turns from a desirable feature (for military decision makers to understand how their weapon will function) into an exploitable liability.^[18] There will then be a very strong conflict between the desire to make an AWS predictable to its user, and the necessity of making it unpredictable and unexploitable to its adversary. This is likely to manifest as a parallel conflict between a simple set of clear and followable rules (making the AWS more predictable) versus a high degree of flexibility and “improvisation” (making the AWS more effective but less predictable.) This competition would happen alongside a competition in the speed of the OODA (Observe, Orient, Decide, Act) loop. The net effect seems to inevitably point to a situation in which AWSs react to each other in a way that is both unpredictable in advance, and too fast for humans to intervene. There seems little opportunity for such conflict between such weapons to de-escalate. Inadvertent military conflict is already a major problem when humans are involved who fully understand the stakes. It seems very dangerous to have a situation in which the ability to resist or forestall such escalation would be seen as a major and exploitable military disadvantage.
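The tension between predictability to one's own side and unexploitability by the adversary can be made concrete with a deliberately toy model (our illustration, not an analysis from the cited literature). Suppose a hypothetical defending AWS must cover one of two sectors, guarding sector A with probability `p_guard_a` and sector B otherwise, and an attacker succeeds by striking whichever sector is left open. If the attacker knows the policy as well as the commander does, the commander's best prediction accuracy and the attacker's best success rate are the same number, max(p, 1 - p), so every gain in predictability is an equal gain in exploitability.

```python
# Toy two-sector model (hypothetical): a defender guards sector A with probability p,
# otherwise sector B. An attacker succeeds by hitting the unguarded sector.

def commander_prediction_accuracy(p_guard_a: float) -> float:
    """Best accuracy when predicting which sector the weapon will guard:
    always guess the more likely sector."""
    return max(p_guard_a, 1.0 - p_guard_a)


def attacker_best_success(p_guard_a: float) -> float:
    """Success rate of an attacker who knows the policy and picks the better target.
    Attacking A succeeds if B is guarded (prob 1 - p); attacking B succeeds with prob p."""
    return max(1.0 - p_guard_a, p_guard_a)


for p in (1.0, 0.9, 0.75, 0.5):
    print(f"p={p:.2f}  commander predicts {commander_prediction_accuracy(p):.2f}  "
          f"attacker succeeds {attacker_best_success(p):.2f}")
```

A fully predictable policy (p = 1) is one the commander can forecast perfectly and the attacker can beat every time; only at p = 0.5 is the attacker held to a coin flip, at which point the commander can forecast no better than chance. Real systems are far richer than two sectors, but this symmetry is the conflict described above.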

Keeping the threshold for war high is obviously very important, but it is worth looking at the numbers. A large-scale nuclear war would be unbelievably costly: it would most likely kill 1-7Bn people in the first year and wipe out a large fraction of Earth's economic activity (i.e. of order one quadrillion USD or more, a decade's worth of world GDP). Some current estimates of the likelihood of a global-power nuclear war over the next few decades range from ~0.5-20%. So just a 10% increase in this probability, due to an increase in the probability of conflict that leads to nuclear war, costs in expectation ~500K-150m lives and ~$0.1-10Tn (not counting huge downstream loss of life and economic losses). Insofar as saving soldiers' lives is an argument in favor of deploying AWSs, it seems exceedingly unlikely that substituting lethal AWSs for soldiers will ever save this many lives or this much value: AWSs are unlikely to save any lives in a global thermonuclear war, and it is hard to imagine a conventional war of large enough scale that AWSs could substitute for this many humans without the war escalating into a nuclear one.

In other words, imagine a war in which $N$ human combatants are expected to die, with probability $P_{n}$ of that or another related war escalating into a nuclear exchange costing $M$ lives. Suppose we could replace these $N$ human combatants with autonomous ones, but at the cost of increasing the escalation probability to $P_{y}$. The expected deaths are $N + P_{n} M$ in the human-combatant case and $P_{y} M$ in the autonomous-combatant case, so the substitution changes expected fatalities by $(P_{y} - P_{n}) M - N$. Given how much larger $M$ (~1-7 Bn) is than $N$ (tens of thousands at most), it only takes a small difference $P_{y} - P_{n}$, anything above $N/M$, for this to be a very poor exchange.
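The expected-value comparison can be checked in a few lines. This sketch simply encodes the two expected-death expressions from the text, $N + P_{n} M$ for human combatants and $P_{y} M$ for autonomous ones; the example inputs (50,000 combatant deaths, a 1% versus 1.1% escalation probability) are hypothetical values chosen for illustration, not estimates.

```python
# Expected-fatality comparison from the text (illustrative inputs only).

def excess_expected_deaths(n: float, m: float, p_human: float, p_auto: float) -> float:
    """Expected deaths with autonomous combatants minus those with human combatants.

    n: combatant deaths avoided by substitution
    m: deaths in a nuclear exchange
    p_human, p_auto: escalation probabilities without / with autonomous combatants
    """
    human_case = n + p_human * m   # combatants die, plus baseline escalation risk
    auto_case = p_auto * m         # no combatant deaths, but higher escalation risk
    return auto_case - human_case  # equals (p_auto - p_human) * m - n


def break_even_increase(n: float, m: float) -> float:
    """Largest increase in escalation probability for which substitution still saves lives."""
    return n / m


def expected_cost_of_relative_increase(p_base: float, rel_increase: float, m: float) -> float:
    """Expected deaths added by raising a baseline escalation probability by a relative amount."""
    return p_base * rel_increase * m


print(break_even_increase(50_000, 1e9))                      # 5e-05
print(excess_expected_deaths(50_000, 1e9, 0.010, 0.011))     # roughly 950,000
# Range quoted in the text: a 10% relative increase on 0.5-20% baselines, M = 1-7 Bn.
print(expected_cost_of_relative_increase(0.005, 0.10, 1e9))  # roughly 500,000
print(expected_cost_of_relative_increase(0.20, 0.10, 7e9))   # roughly 140 million
```

The last two lines reproduce the ~500K to ~150m range quoted above (the upper figure is ~140 million before rounding). The break-even value shows why the exchange is so poor: with these inputs, any increase in escalation probability larger than 0.005 percentage points outweighs every combatant life saved.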

What should be done?

We’ve argued above that the issue of autonomous weapons is not simply a matter of concern about soulless robots killing people, or of discomfort with the inevitable applications of AI to military purposes. Rather, particular properties of autonomous weapons seem likely to lead, in expectation, to a substantially more dangerous world. Moreover, actions to mitigate this danger may even help – via precedent and capability-building – in mitigating others. The issue is also relatively tractable – at least for now, and in comparison to more intractable-but-important issues like nuclear accident risk or the problematic business model of certain big tech companies. Although the involvement of militaries makes it difficult, there is as yet relatively little strong corporate interest in the issue.^[19] International negotiations exist and are underway (though struggling to make significant headway). It is also relatively neglected, with a small number of NGOs working actively on it and relatively little public awareness of the issue. It is thus a good target for action by the usual criteria.

Arguments against being concerned with autonomous weapons appear to fall into three general classes:^[20] The first is that autonomous weapons are a net good. The second is that autonomous weapons are an inevitability, and there’s little or nothing to be done about it. The third is simply that this is “somebody else’s problem,” and low-impact relative to other issues to which effort and resources could be devoted.^[21] We’ve argued above against all three positions: the expected utility of widespread autonomous weapons is likely to be highly negative (due to the increased probability of large-scale war, if nothing else); the issue is addressable (with multiple examples of past successful arms-control agreements) and currently tractable, if difficult; and success would also improve the probability of positive results in even more high-stakes arenas, including global AGI governance.

If the issue of autonomous weapons is important, tractable, and neglected, it is worth asking what success would look like. Many of the above concerns could be substantially mitigated via an international agreement governing autonomous weapons; unfortunately, they are unlikely to be significantly affected by lesser measures. Arguments against such an agreement tend to focus on how hard it would be to reach or how ineffective it would be, or they conflate very distinct considerations or weapons classes. But there are many possible provisions such an agreement could include that would be net-good and that we believe many countries (including major military powers) might agree on. For example:

Some particular, well-defined, classes of weapons could be prohibited (as biological weapons, laser blinding weapons, space-based nuclear weapons, etc., are currently). Weapons with high potential for abuse and relatively little real military advantage to major powers (like slaughterbots) should be first in line. Automated primarily defensive weaponry targeting missiles or other unmanned objects, or non-injurious AWSs, very probably should not be prohibited in general. The grey area in the middle should be worked out in multilateral negotiation.
For whatever is not prohibited, there could be agreements (supplemented by internal regulations) regarding the proliferation, tracking, attribution, and human control of AWSs; for some examples see this “Roadmapping exercise,” which emerged as a sketch of consensus recommendations from a meeting of technical experts with a very wide range of views on autonomous weapons.

Highlighting the risks of autonomous weapons may also encourage militaries to invest substantially in effective defensive technologies against lethal autonomous weapons, including the prohibited varieties (especially technologies that are non-AI and/or purely defensive rather than force-on-force). This could lead to a scenario, imperfect but far less problematic than our current trajectory, in which anti-personnel AWSs are generally prohibited yet defended against, and other AWSs are either prohibited or governed by a strong set of agreements aimed at maintaining a stable detente in AI weapons.

FLI has advanced the view – widely shared in the AI research community – that the world will be very ill-served by an arms race and unfettered buildup in autonomous weaponry. Our confidence in this is quite high. We have further argued here that the stakes are significantly greater than many have appreciated, which has motivated both FLI’s advocacy in this area as well as this posting. Less clear is how much and what can be done about the dynamics driving us in that direction. We welcome feedback both regarding the arguments put forth in this piece, and more generally about what actions can be taken to best mitigate the long-term risks that autonomous weapons may pose.

I thank FLI staff and especially Jared Brown and Emilia Javorsky for helpful feedback and notes on this piece.

Notes

As a major advantage, nonlethal autonomous weapons need not defend themselves and so can take on significant harm in order to prevent harm while subduing a human. On the other hand, if such weapons become too effective, they may make it too easy and “low-cost” for authoritarian governments to subdue their populace. ↩︎
Though we would note that converting a nonlethal autonomous weapon into a lethal one could require relatively small modification, as it would really amount to using the same software on different hardware (weapons). ↩︎
Autonomy also creates new capabilities, like swarms, that will subvert existing weapons categories. The versatility of small, scalable, lethal AWSs is notable here, as they might be quickly repurposed for a variety of target types, with many combining to attack a larger target. ↩︎
The claim here is not that nuclear weapons are without benefit (they have arguably been a stabilizing influence so far), but that the arms race to build weapon numbers far beyond what deterrence requires probably has not been beneficial. Understanding of nuclear winter laid bare the lose-lose nature of the nuclear arms race: even if one power were able to perform a magically effective first strike to eliminate all of the enemy’s weapons, that power would still find itself with a starving population. ↩︎
AI will be unavoidably tied to military capability, as it has appropriate military roles that could not be prevented even if preventing them were desirable. However, this is very different from an unchecked arms race, and de-linking AI and weaponry as much as possible seems a net win. ↩︎
For example in polling for the Asilomar Principles among many of the world’s foremost AI researchers, Principle 18, “An arms race in lethal autonomous weapons should be avoided,” polled the very highest. ↩︎
This survey shows about 61% opposed to and 22% in favor of their use. This article points to a more recent EU poll showing high (73%) support for an international treaty prohibiting them. It should be noted that both surveys were commissioned by the Campaign to Stop Killer Robots. This study argues that opinions can easily change due to additional factors, and in general we should assume that public understanding of autonomous weapons and their implications is fairly low. ↩︎
There is significant literature and debate on the difficulty of satisfying the international-law requirements of distinction and proportionality; see e.g. this general analysis, and this discussion of general issues of human vs. machine control. Beyond the question of legality are moral questions, as explored in detail here, for example. ↩︎
The term “WMD” is somewhat poorly defined, sometimes conflated with the trio of chemical, biological and nuclear weapons. But if we define WMDs in terms of characteristics such that the term could at least in principle apply both to and beyond nuclear, chemical and biological weapons, then it’s hard to avoid including anti-personnel AWSs. One might include additional or alternative characteristics that (a) WMDs must be very destructive, and/or (b) that they are highly indiscriminate, and/or (c) that they somehow offend human sensibilities through their mode of killing. However, (a) chemical and biological weapons are not necessarily destructive (other than to life/humans); (b) if biological weapons are made more discriminate, e.g. to attack only people with some given set of genetic markers, they would almost certainly still be classed as WMDs and arguably be of even more concern; (c) “offending sensibilities” is rather murkily defined. ↩︎
It has been argued that increasing levels of autonomy in loitering munition systems represent a slippery slope, behaving functionally as lethal autonomous weapons on the battlefield. Some of the systems identified as of highest concern and also of lower cost relative to large drones have been deployed in recent drone conflicts in Libya and Nagorno-Karabakh. ↩︎
While speculating on particular technologies is probably not worthwhile, note that the physical limits are quite lax. For example, ten million gnat-sized drones carrying a poison (or noncontagious bioweapon) payload could fit into a suitcase and fly at 1 km/hr (as gnats do). ↩︎
Note that autonomous WMDs could also be combined with or enable other ones: miniature autonomous weapons could efficiently deliver a tiny chemical, biological or radiological payload, combining the high lethality of existing WMDs with the precision of autonomous ones. ↩︎
For some analyses of this issue see this UNIDIR report and this piece. Even the dissertation by Paul Scharre concludes that “The widespread deployment of fully autonomous weapons is therefore likely to undermine stability because of the risk of unintended lethal engagements,” and recommends regulatory approaches to mitigate the issue. ↩︎
In the case of the US, this effect is likely to be present even if lethal AWSs were prohibited: human-piloted microdrones or swarms should be able to provide most of the same advantages as lethal AWSs, except in the rare circumstances when the signal can be blocked. ↩︎
Israel presents a particularly important case. While its small population motivates replacing or augmenting human soldiers with machines, it seems unwise to us for Israel to seek unchecked global development of lethal AWSs when it is surrounded by adversaries perfectly capable of developing and fielding them. ↩︎
This RAND publication lays out the argument in some detail. ↩︎
For detailed discussion of these terms, see e.g. this UNIDIR report. ↩︎
Autonomous weapons developers are already thinking along these lines of course; see for example this article about planning to undermine drone swarms by predicting and intervening in their dynamics. ↩︎
While arms manufacturers will tend to disfavor limitations on arms, few if any are currently profiting from the sorts of weapons that might be prohibited by international agreement, and there is plenty of scope for profit-making in designing defenses against lethal autonomous weapons, etc. ↩︎
We leave out disingenuous arguments against straw men such as “But if we give up lethal autonomous weapons and allow others to develop them, we lose the war.” No one serious, to our knowledge, is advocating this – the whole point of multilateral arms control agreements is that all parties are subject to them. Ironically, though, this self-defeating position is the one taken at least formally by the US (among others), for which current policy largely disallows (though see this re-interpretation) fully autonomous lethal weapons, even while the US argues against a treaty creating such a prohibition for other countries. ↩︎
A more pernicious argument that we have heard is that advocacy regarding autonomous weapons is antagonistic to the US military and government, which could lead to lack of influence in other matters. This seems terribly misguided to us. We strongly believe US national security is served, rather than hindered, by agreements and limitations on autonomous weapons and their proliferation. There is a real danger that US policymakers and military planners are failing to realize this precisely due to lack of input from experts who understand the issues surrounding AI systems best. Moreover, neither the US government nor the US military establishment is a monolithic institution; both are huge complexes with many distinct agents and interests. ↩︎


Ben Pace

Thank you for this post. I’ve long felt confused about FLI’s epistemic state on lethal autonomous systems, and this post definitely helps.

Okay, so let me try to restate your position in my words. I think there are two parts:

The basic reason it’s important in its own right is that this seems like a qualitative step forward in the offensive abilities of countries. This means both more death directly, and also the potential for more targeted acts of political violence (assassinations and terrorism) that are politically destabilizing, which seems bad.
But, perhaps more importantly, we’re going to need to be able to set global regulation on AI development. This is one such necessary regulation, and it seems like one of the easier wins (it’s very visceral and related to countries taking power over each other), and so it seems worth it to fight for this particular international treaty, both on its own merits and also as a test run (and to set up infrastructure) for far more complex regulation required later.

I'm still thinking through my thoughts.

In summary, I think my current mainline expectation is that if such a treaty were to be passed, it would not meaningfully improve the situation regarding military use of ML, and it would not meaningfully improve the situation regarding development of AGI. As a result I am further concerned it would spend all the goodwill countries and companies had on setting global regulations.

A few more details (pardon the length of the second one):

Firstly, I am not confident that this is an appropriate trial test for issues around AGI later on. Autonomous weapons, surveillance, hacking, etc., are all involved in intelligence and warfare. But it seems at least >30% probable to me that the first sufficiently-powerful AGI will be built in the course of trying to make systems that can (a) do science and engineering or (b) create profitable services in the economy. It doesn't seem obvious to me that treaties about warfare are an appropriate tool to deal with this.

Secondly, I am concerned that this regulation itself is exceedingly complex and may straightforwardly fail.

Briefly repeating what I think is your position: my sense is you think that while this is not the perfect regulation, on the margin it is strongly net positive, and that everyone will broadly be able to agree on this. Analogously, consider a country with no drug regulations. It may first set up regulations around selling poisons that can kill humans. Later on it will try to figure out the details about taxes and permissions around a whole host of pharmaceuticals and so on, but first it’s appropriate to set up the most obviously important regulations. You say that “This presents an opportunity to draw at least some line, in a globally coordinated way, between what is and is not acceptable.”

Overall though I am not at all confident you will be able to set up regulation that does anything successfully around autonomous weapons, nor anything as simple to understand as “don’t sell poison”. Things around having ‘a human in the loop’ seem too weak to me to actually do anything or prevent basically all the potential problems of autonomous weapons (e.g. it doesn't seem to meaningfully change the system to have a guy sitting in an office hitting a button that says "confirm" every time without having a deep understanding of the tradeoffs the ML system is making).

An analogous proposal might be something like “Let’s draw a line in the sand and say that Newsfeed algorithms by major social networks cannot show you fake news. Later on we’ll figure out the more subtle regulations around how to ensure that they’re helping you connect to other people and not doing any bad censorship and aren’t encouraging bullying and harassment, but let’s just draw a line in the sand we can all agree on.”

Just as it is not easy to draw a line between newsfeed algorithms that do and don’t share fake news, it is not clear to me that it is easy to draw a line between machine learning weapons systems that are autonomous and non-autonomous. I'm not certain.

(Relatedly, I think the time to write a workable proposal is before organizing such a campaign, not after.)

aaguirre

I'd say you are summarizing at least part of the reasoning as I see it, but I'd add that AWs in general seem likely to significantly increase the number of conflicts and the risk of escalation into a full-scale war (very likely to then go nuclear).

I'm not sure what basis there is for thinking that there is some level of "finite supply" of goodwill toward international agreements. Indeed my intuition would be that more international agreements set precedent and mechanisms for more others, in more of a virtuous than self-limiting cycle. If I had to choose between a AW treaty and some treaty governing powerful AI, the latter (if it made sense) is clearly more important. I really doubt there is such a choice and that one helps with the other, but I could be wrong here. Possibly it's more like lawmaking where there is some level of political capital a given party has and is able to spend; I guess that depends on to what degree the parties see it as a zero sum vs. positive-sum negotiation.

But it seems at least >30% probable to me that the first sufficiently-powerful AGI will be built in the course of trying to make systems that can (a) do science and engineering or (b) create profitable services in the economy. It doesn't seem obvious to me that treaties about warfare are an appropriate tool to deal with this.

I agree: it's quite possible that AGI will develop fairly slowly or pretty firmly in the private sector, and military and government involvement will be secondary and not that crucial to the dynamics. In that case the AW governance work would be less relevant and precedent-setting, and international governance work like that at the OECD would be much more relevant. But since we don't know, I think it makes sense to plan for multiple scenarios.

Just as it is not easy to draw a line between newsfeed algorithms that do and don’t share fake news, it is not clear to me that it is easy to draw a line between machine learning weapons systems that are autonomous and non-autonomous. I'm not certain.

I agree that this is an interesting analogy, and in both cases it's hard. But something being hard does not necessarily mean it isn't worth doing (in both cases). In the newsfeed case I expect "outlawing fake news" is indeed unworkable. But in trying to figure out what might work, actually interesting solutions may well arise. Likewise AW governance will be difficult, but our experience was that once we got some real experts into a room to think about what might actually be done, there were all sorts of good ideas.

habryka

But in trying to figure out what might work, actually interesting solutions may well arise.

Hmm, this feels like it highlights some problem I have with FLI's work in this domain. As you seem to agree with here, it's pretty plausible that there is no legislation that is particularly useful in this space, because legislation is really heavily limited by how complicated and nuanced it can be, and heavily constrained by how robust to rules-lawyering it has to be, and so it's pretty plausible to me that all legislation in this space is a bad idea.

But both this article and a bunch of other FLI material treat the bottom line as "there is obviously legislation that is good here", which just feels pretty off to me, and I don't feel like the existing material has met the necessary burden of proof to show that marginal legislation here would work, and be net-positive.

In general, it seems that historically humanity has been vastly overeager in trying to solve problems with legislative regulation, and overestimating what problems can be solved with legislative regulation, causing a really massive amount of damage (and, in my worldview, also contributing substantially to humanity's inability to navigate existential and catastrophic risk, as evidenced by COVID). As such, I feel like we have a pretty strong responsibility to not call for further regulation that is badly suited to solving the problems at hand, and this domain very much pattern matches to this problem for me.

I can imagine changing my mind on this after looking more into the details and the specific policy proposals, but the specific things I've heard of, like mandating to have a human in the loop, don't seem like good solutions to me that solve substantial parts of the problem. There is some chance that somewhere in FLI's proposals there are policy solutions that feel both feasible and actually useful to me, but I haven't encountered them, and on-priors, the type of communication that FLI seems to be pursuing (launching viral campaigns with satirical videos and dramatic presentation, together with blogposts mostly filled with strong rhetoric and little nuanced analysis), really doesn't fill me with confidence that there is serious effort put into really asking the question of whether regulation here is at all a good idea, and how regulation could backfire.

Maybe this is how policy gets made in practice, but then again, the vast majority of policy passed is strongly net-negative, so just because that's how policy usually gets made, doesn't mean we should do the same.

aaguirre

Thanks Oliver for this, which likewise very much helps me understand better where some of the ideological disagreements lie. Your statement “but then again, the vast majority of policy passed is strongly net-negative” encapsulates it well. Leaving aside that (even if we could agree on what “positive” and “negative” were) this seems almost impossible to evaluate, it indicates a view that the absence of a policy on something is “no policy”. Whereas in my view in the vast majority of situations the absence of some policy is some other policy, whether it’s explicit or implicit. Certainly in the case of AWs, the US (and other militaries) will have some policy about AWs. That’s not at issue. At issue is what the policy will be. And some policies (like prohibiting autonomous WMDs) make much less sense in the absence of the context of an international agreement. So creating that context can create the possibility for a wider range of policies, including better ones.

More generally, when you look at arenas where there is “no policy” often there actually is one. For example, as I understand it, the modern social media ecosystem does not exist because there is no policy governing it, but due to the DMCA. Had that Act been different (or nonexistent) other policies would be governing things instead, for better or worse. Or, if there were no FDA, there would still be policy: it would govern advertising, and lawsuits, and independent market-based drug-testers, and so on. In a more abstract sense, I view policy as a general term for the basic legal structure governing how our society works, and there isn’t a “default” setting. There are settings in various situations that are “leave this to market forces” or “tightly regulate this with a strict government agency” and all manner of others, but those are generally choices implicitly or explicitly made. The US has made “leave it to market forces" much more of the norm and default (which has had a lot of great results!), but that is, at a higher scale, still a policy choice. There are lots of other ways to organize a society, and we’ve tried some of them. When we’re talking about development of weapons and international diplomacy I don’t think there is a reliably good default, at all.

So I think it’s quite reasonable to ask what the policy proposals are, and to evaluate the particular proposals, as Ben and you are doing. But I don’t think it’s fair or wise to assume that the policies that will be generated by existing actors, in AI weaponry, or AGI, or whatever else, are likely to be particularly good. Policies will exist whether we participate or not, and they will come into being due to the efforts of policymakers guided by various interests. That they are most likely better without the input of groups like FLI, who understand the issues and stakes, encompass a lot of expertise, and are working purely in the interest of humanity and its long-term flourishing, seems improbable even from an outside view. And of course from the inside view, seeing the actual work we have done, I don’t think it’s the case.

habryka

If the above seemed confusing, just replace “policy” with “regulation” and my point doesn’t change very much. I feel like it’s not that hard to reliably identify worlds with more vs. less government regulation. I agree that in some abstract sense “there is always a policy”, but I am pointing to a much more concrete effect, which is that most passed regulation seems net-negative to me, whether national or international.

I think it’s very reasonable to try to influence and change regulation that will be passed anyways, but it seems that FLI is lobbying for passing new regulation while, importantly, saying nothing about where the boundaries of the regulation should be, and not exploring when passing that regulation would be net negative.

It seems that on-net, FLI’s actions are going to cause more regulation to happen, with the potential for large negative externalities over the status-quo (which is some mixture of leaving it to market forces, social norms, trust relationships, etc.). Those negative externalities won’t necessarily be directly in the domain of autonomous weapons. Good candidates for externalities are less economic growth, less trust between policy makers and scientists, potential for abuse by autocratic governments, increased confusion and complexity of navigating the AI space due to regulation making everything a minefield.

Another way to phrase my concerns is that over the last 50 years, it appears quite plausible to me that due to interest groups not too dissimilar to ours, such as the environmentalist movement, we have basically crippled our modern economy by passing a ridiculous volume of regulation interfering with every part of business and life. Science has substantially slowed down, with government involvement being one of the top candidates for the cause. This means that, from my perspective of global trustworthiness, we have a responsibility to really carefully analyze whether the marginal piece of legislation we are lobbying for is good, and whether it meets a greatly elevated standard of quality. I don’t see that work as having occurred here.

Indeed, almost all the writing by FLI is almost indistinguishable from writing coming from the environmentalist movement, which has caused very large amounts of harm. I am open to supporting regulation in this space, I just really want us to be in a different reference class than past social movements that called for broad regulation, without taking into account the potential substantial costs of that regulation.

aaguirre

"Regulation," in the sense of a government limitation on otherwise "free" industry does indeed make a bit more sense, and you're certainly entitled to the view that many pieces of regulation of the free market are net negative — though again I think it is quite nuanced, as in many cases (DMCA would be one) regulation allows more free markets that might not otherwise exist.

In this case, though, I think the more relevant reference class is "international arms control agreements" like the bioweapons convention, the convention on conventional weapons, the space treaty, landmine treaty, the nuclear nonproliferation treaties, etc. These are not so much regulations as compacts not to develop and use certain weapons. They may also include some prohibitions on industry developing and selling those weapons, but the important part is that the militaries are not making or buying them. (For example, you can't just decide to build nuclear weapons, but I doubt it is illegal to develop or manufacture a laser-blinding weapon or landmine.)

The issue of regulation in the sense of putting limitations on AI developers (say toward safety issues) is worth debating, but I think it is a relatively distinct one. It is absolutely important to carefully consider whether a given piece of policy or regulation is better than the alternatives (and not, I say again, "better than nothing", because in general the alternative is not "nothing"). And I think it's vital to try to improve existing legislation etc., which has been the focus of most of FLI's work, for example.

habryka

Hmm, yeah, I think there is still something missing here. I agree that regulation on a "free" industry is one example that I am thinking of, and certainly one that matters a good amount, but I am talking about something a bit broader. More something like "governments in general seem to take actions of a specific shape and structure, and in some domains, actions of that shape can make problems worse, not better".

Like, whether something is done via an international arms control agreement, or locally passed legislation, the shape of both of those actions is highly similar, with both being heavily constrained to be simple, both being optimized for low trust environments, both requiring high levels of legibility, etc. In this perspective, there are of course some important differences between regulation and "international arms control agreements", but they clearly share a lot of structure, and their failure modes will probably be pretty similar.

I am also importantly not arguing for "let's just ignore the whole space". I am arguing for something much more like "it appears that in order to navigate this space successfully, it seems really important to understand past failures of people who are in a similar reference class to us, and generally enter this space with sensible priors about downside risk".

I think past participants in this space appear to have very frequently slid down a slope of deception, exaggeration and adversarial discourse norms, as well as ended up being taken advantage of by local power dynamics and short-term incentives, in a way that caused them to lose at least a lot of my trust, and I would like us to act in a way that is trustworthy in this space. I also want us to avoid negative externalities for the surrounding communities and people working in the x-risk space (by engaging in really polarizing discourse norms, or facilitating various forms of implicit threats and violence).

I think one important difference might be that I view AI development and regulation around autonomous weapons as deeply interlinked, at the very least via the people involved in both (such as FLI). If we act in a way that doesn't seem trustworthy in the space of autonomous weapons, then that seems likely to reduce our ability to gain trust and enact legislation that is more directly relevant and important to issues related to AI Alignment. As such, while I agree substantially that the problems in this space share structure with the longer-term AI Alignment problem, it strikes me as being of paramount importance to display excellent judgement on which problems should be attacked via regulation and which ones should not be. It appears to me that the trust to design and enact legislation is directly dependent on the trust that you will not enact bad or unnecessary regulation, and as such, getting this answer right in the case of autonomous weapons seems pretty important (which is why I find the test-case framing somewhat less compelling).

My guess is there are a number of pretty large inferential distances here, so not sure how to best proceed. I would be happy to jump on a call sometime, if you are interested. In all of this I also want to make clear that I am really glad you wrote this post, and while I have some disagreements with it, it is a great step forward in helping me understand where FLI is coming from, and it contains a number of arguments I find pretty compelling and hadn't considered before.

aaguirre

You're certainly entitled to your (by conventional standards) pretty extreme anti-regulatory view of e.g. the FDA, IRBs, environmental regulations, etc., and to your prior that regulations are in general highly net negative. I don't share those views but I think we can probably agree that there are regulations (like seatbelts, those governing CFCs, asbestos, leaded gasoline, etc.) that are highly net positive, and others (e.g. criminalization of some drugs, anti-cryptography, industry protections against class action suits, etc.) that are nearly completely negative. What we can do to maximize the former and minimize the latter is a discussion worth having, and a very important one.

In the present case of autonomous weapons, I again think the right reference class is that of things like the bioweapons convention and the space treaty. I think these, also, have been almost unreservedly good: they made the world more stable, avoided potentially catastrophic arms races, and left industries (like biotech, pharma, the space industry, the arms industry) perfectly healthy and arguably (especially for biotech) much better off than they would have been with a reputation mixed up in creating horrifying weapons. I also think in these cases, as with at least some AWs like antipersonnel WMDs, there is a pretty significant asymmetry, with the negative effects (of no regulation) having a tail into extremely bad outcomes, while the negative effects of well-structured regulations seem pretty mild at worst. Those are exactly the sorts of regulations/agreements I think we should be pushing on.

Very glad I wrote up the piece as I did, it's been great to share and discuss it here with this community, which I have huge respect for!

habryka

the space treaty

It's been a while since I looked into this, but if I remember correctly the space treaty is currently a major obstacle to industrializing space, by making it mostly impossible to have almost any form of private property in asteroids, the Moon, or other planets, and by creating a large amount of regulatory ambiguity and uncertainty for anyone wanting to work in this space.

When I talked to legal experts in the space for an investigation I ran for FHI 3-4 years ago, my sense was that the space treaty was a giant mess, nobody knew what it implied or meant, and that it generally was so ambiguous and unclear, and with so little binding power, that none of the legal scholars expected it to hold up in the long run.

I do think it's likely that the space treaty overall did help demilitarize space, but given a lot of the complicated concerns around MAD, it's not obvious to me that that was actually net-positive for the world. In any case, I really don't see why one would list the space treaty as an obvious success.

aaguirre

I am not an expert on the Outer Space Treaty either, but, also by anecdotal evidence, I have always heard it to be of considerable benefit and a remarkable achievement of diplomatic foresight during the Cold War. However, I would welcome any published criticisms of the Outer Space Treaty you wish to provide.

It's important to note that the treaty was originally ratified in 1967 (that is, ~two years before the Moon landing and ~5 years after the Cuban Missile Crisis). If you critique a policy for its effects long after its original passage (as with reference to space mining, or as others have critiqued the effects of Section 230 of the CDA, passed in 1996), your critique is really about the government(s) failing to update and revise the policy, not with the enactment of original policy. Likewise, it is important to run the counterfactual of the policy never being enacted. In this circumstance, I’m not sure how you envision a breakdown in US-USSR (and other world powers) negotiations on the demilitarization of space in 1967 would have led to better outcomes.

habryka

your critique is really about the government(s) failing to update and revise the policy, not with the enactment of original policy.

This feels like a really weird statement to me. It is highly predictable that as soon as a law is in place, there is an incentive to use it for rent-seeking, and that abolishing a policy is reliably much harder than enacting a new one. When putting legislation in place, the effects of your actions that I will hold you responsible for of course include the highly predictable problems that will occur once your legislation has gone without being updated and revised for a long time. That's one of the biggest limitations of legislation as a tool for solving problems!

Ben Pace

Just want to acknowledge that I agree it's worth working on even if it's not the only scenario by which AGI might be developed.

The main thing I want to reply to has already been said by Oliver, that I want to see a thoughtful policy proposal first, and expect any such campaign to be net negative before then.

Regarding the amount of goodwill, I'm talking about the goodwill between the regulators and the regulated. When regulatory bodies are incredibly damaging and net negative (e.g. the FDA, the IRB) then the people in the relevant fields often act in an actively adversarial way toward those bodies. If a group passes a pointless, costly regulation about military ML systems, AI companies will (/should) respond similarly, correctly anticipating future regulatory bloat and overreach.

MichaelA

If I had to choose between a AW treaty and some treaty governing powerful AI, the latter (if it made sense) is clearly more important. I really doubt there is such a choice and that one helps with the other, but I could be wrong here. [emphasis added]

Did you mean something like "and in fact I think that one helps with the other"?

anon03

I'm concerned that these efforts are just delaying the inevitable. (Which is still very worthwhile!!) But in the longer run, we're just doomed!

Like, the people in the military and defense contractors developing autonomous drone navigation systems are doing the exact same thing as probably dozens of university researchers, drone agriculture technology companies, Amazon, etc. In fact the latter are probably doing it better!

So ideally we want a high technological barrier between what's legal and the weapons that we don't want to exist, otherwise anyone can immediately build the weapons. What's the nature of that technological barrier? Right now it's the navigation / AI, but again that's not gonna last, unless we block drone navigation AI at companies and universities which is not politically feasible. What else is there? The drone hardware? Nope. The weapon carried by the drone? I mean, with some string etc., a quadcopter can carry a handgun or little bomb or whatever, so this doesn't seem like much of a technological barrier, although it's better than nothing, and certainly it's better than having a nicely-packaged armed drone directly for sale. So yeah, I'm inclined to say that we're just doomed to have these things for sale, at least by organized crime groups, sooner or later. I don't know, that's just the conclusion I jump to with no particular knowledge.

aaguirre

As with chemical, biological, and nuclear weapons, it will/would be difficult to forestall determined people from getting their hands on them indefinitely — and probably more difficult than any of those cases since there's indeed lots of dual use from drones, and you won't (probably) fear for your life in building one.

Nonetheless I think there is a huge difference between weapons built by amateurs (and even by militaries in secret) versus an open and potentially arms-race effort by major military powers. No amateur is going to create a drone WMD, and we can hope that nation-state-level anti-AW defenses can, at some level, keep up with a much less determined program of AW development.

Gyrodiot

Thank you for this clear and well-argued piece.

From my reading, I see three main features of AWSs that matter for evaluating the risk they present:

arms race avoidance: I agree that the proliferation of AWSs is a good test bed for international coordination on safety, which extends to the widespread implementation of safe powerful AI systems in general. I'd say this extends to AGI, where we would need all (or at least the first, or only some, depending on takeoff speeds) such deployed systems to conform to safety standards.
leverage: I agree that AWSs would have much greater damage/casualties per cost, or per human operator. I have a question regarding persistent autonomous weapons which, much like landmines, do not require human operators at all once deployed: what, in that case, would be the limiting component of their operation? Ammo, energy supply?
value alignment: the relevance of this AI safety problem to the discussion would depend, in my opinion, on what exactly is included in the OODA loop of AWSs. Would weapon systems be able to act in ways that enable their continued operation without frequent human input? Would they have ways other than weapons to influence their environment? If they don't, is the worst-case damage they can do capped at the destructive capabilities they have at launch?

I would be interested in further investigation of the risks posed by various kinds of autonomy, the expected time between human command and impact, etc.

adamShimi

Thanks for this post! I found it clear and relatively persuasive. It also taught me details about the different forms of AWSs and the risks they entail.

That being said, I don't think it will impact my own research, as I don't see any specific topic in technical AI Safety that is especially relevant to this question. You seem to argue that Alignment is important for AWSs, but that is the goal of technical AI Safety anyway. For governance, I think you present interesting possibilities to work on the problem.

aaguirre

Thanks for your comment. In terms of technical AI safety, I think an interesting research question is the dynamics of multiple adversarial agents: is there a way to square the need for predictability and control with the need for a system to be unexploitable by an adversary, or are these in hopeless tension? This is relevant for AWs, but it also seems potentially quite relevant for any multipolar AI world with strongly competitive dynamics.

Eliezer Yudkowsky

To answer your research question, in much the same way that in computer security any non-understood behavior of the system which violates our beliefs about how it's supposed to work is a "bug" and very likely en route to an exploit - in the same way that OpenBSD treats every crash as a security problem, because the system is not supposed to crash and therefore any crash proves that our beliefs about the system are false and therefore our beliefs about its security may also be false because its behavior is not known - in AI safety, you would expect system security to rest on understandable system behaviors. In AGI alignment, I do not expect to be working in an adversarial environment unless things are already far past having been lost, so it's a moot point. Predictability, stability, and control are the keys to exploit-resistance and this will be as true in AI as it is in computer security, with a few extremely limited exceptions in which randomness is deployed across a constrained and well-understood range of randomized behaviors with numerical parameters, much as memory locations and private keys are randomized in computer security without say randomizing the code. I hope this allows you to lay this research question to rest and move on.

jimrandomh

I started a reply to this comment and it turned into this shortform post.
