A missing point in favor of coordination getting easier: AI safety as a field seems likely to mature over time, and as it does the argument "let's postpone running this AGI code until we first solve x" may become more compelling, as x increases in legibility and tractability.
elityre makes a sincere effort to examine the question from the ground up. But this overlooks the work that's already been done in similar fields. A lot of what has been accomplished with regard to applied genetic research is likely to be transferable, for instance.
More generally, formal methods of safety engineering can provide a useful framework, when adapted flexibly to reflect novel aspects of the question.
Are there existing agreements constraining the deployment of applied genetic research? What are the keywords I should search for, if I want to know more?
The only thing I know about this area is that an unaffiliated researcher used CRISPR to modify human embryos, and that most of the field rebuked him for it. This suggests that there are general norms about which experiments are irresponsible to try, but not strong coordination that prevents those experiments from being done.
I suspect that one of the factors that will make coordinating to not build AGI harder is that the incentive to build AGI will become greater for a larger number of people. Right now, there's a large class of people who view AI as a benign technology that will bring about large amounts of economic growth and whose effects are going to be widespread and positive. I think this position is best captured by Andrew Ng when he says "AI is the new electricity". Likewise, the White House states, "Artificial intelligence holds the promise of great benefits for American workers, with the potential to improve safety, increase productivity, and create new industries we can’t yet imagine."
However, as time goes by, AI capabilities will grow, and so will public demonstrations of what's possible with AI. This will cause people to revise their beliefs about the impact and power of AI and AGI upwards, and it will drag far more actors into the game. I think that if the White House shared the views of DeepMind or OpenAI on AGI, it wouldn't hesitate to start the equivalent of a second Manhattan Project.
New consideration: hyperbolic time discounting suggests it gets harder over time. It's easier to give up a benefit that seems far off in the future than one that seems imminent.
(Though usually I think of this consideration as suggesting that coordination right now will be easier than we think.)
Hyperbolic discounting applies to negative outcomes as well, correct? Which means this could go either way.
In what situation should a longtermist (a person who cares about people in the future as much as they care about people in the present) ever do hyperbolic discounting?
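For reference, here is a minimal sketch of the mechanics behind this exchange, using the textbook one-parameter hyperbolic form (the symbols $A$, $D$, and $k$ are standard conventions, not anything from the comments above): a payoff of size $A$ at delay $D$ is valued as

$$V_{\text{hyperbolic}}(D) = \frac{A}{1 + kD}, \qquad \text{versus} \qquad V_{\text{exponential}}(D) = A\,e^{-kD}.$$

Because the hyperbolic curve falls steeply at short delays and flattens at long ones, a payoff (or a harm) gains disproportionate weight as it moves from distant to imminent, which is why the same mechanism can be read either way: coordination gets harder if the benefits of AGI loom closer, easier if the risks do.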
The question makes the assumption that "The World" is in any way coordinating, or will ever coordinate, to build AGI. I posit that "The World" has not coordinated and will not coordinate anything.
Are there global corporations and superpower governments that are coordinating AI projects? Yes, but it's not a singular AI project; the AI-sphere contains multitudes.
Also, AGI is a specific term, and although it's become more popular, it's mostly a term Goertzel created because the term "AI" was being improperly used to label even simplistic statistical models like deep learning networks. At least that is how I saw it when I first read the term. I'm still looking for a free copy of AGI Revolution.
One factor no one mentions here is the changing nature of our ability to coordinate at all. If our ability to coordinate in general is breaking down rapidly, which seems at least highly plausible, then that will likely carry over to AGI, and until that reverses, it will continuously make coordination on AGI harder, the same as everything else.
In general, this post and the answers felt strangely non-"messy" in that sense, although there's also something to be said for the abstract view.
In terms of inclusion, I think it's a question that deserves more thought, but I didn't feel like the answers here (in OP and below) were enlightening enough to merit inclusion.
The technologies for maintaining surveillance of would-be AGI developers improve.
Yeah, when I was reading Bostrom's Black Ball paper I wanted to yell many times, "Transparent Society would pretty much totally preclude all of this".
We need to talk a lot more about the outcome where surveillance becomes so pervasive that it's not dystopian any more (in short, "It's not a panopticon if ordinary people can see through the inspection house"), because it seems like 95% of x-risks would be averted if we could all just see what everyone is doing and coordinate. And that's on top of the more obvious benefits like, you know, the reduction of violent crime, and the economic benefits of a massive increase in openness.
Regarding technologies for defeating surveillance... I don't think falsification is going to be all that tough to solve (Scrying for outcomes where the problem of deepfakes has been solved).
If it gets to the point where multiple well-sealed cameras from different manufacturers are validating every primary source, where so much of the surrounding circumstances of every event are recorded as well, and where everything is signed and timestamped in multiple locations the moment it happens, it's going to become pretty much impossible to lie about anything. No matter how good your fabricated video is, and no matter how well you hid your dealings with your video fabricators operating in shaded jurisdictions, we must ask where you'd think you could slot it in without people noticing the seams.
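To make the signing-and-timestamping step concrete, here is a minimal, hypothetical sketch in Python (standard library only). Nothing in it is anyone's actual proposal: the record format, the function names, and the per-device key are invented for illustration, and a real deployment would use public-key signatures in tamper-resistant hardware plus independent timestamping services rather than a shared HMAC key.

```python
# Hypothetical sketch: a camera attests to each captured frame by hashing it,
# timestamping it, and signing the result. HMAC stands in for a real
# public-key signature scheme.
import hashlib
import hmac
import json
import time

DEVICE_KEY = b"per-device secret provisioned at manufacture"  # assumption for the sketch


def attest_frame(frame_bytes: bytes, device_id: str) -> dict:
    """Produce a signed, timestamped record for one captured frame."""
    record = {
        "device_id": device_id,
        "sha256": hashlib.sha256(frame_bytes).hexdigest(),
        "captured_at": time.time(),  # a real device would use a trusted clock source
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return record


def verify_frame(frame_bytes: bytes, record: dict) -> bool:
    """Check that the frame matches the record and the record is untampered."""
    if hashlib.sha256(frame_bytes).hexdigest() != record["sha256"]:
        return False
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


# Example: one device attests a frame; anyone holding the record can later
# confirm that this exact footage existed at that time on that device.
record = attest_frame(b"raw sensor data for one frame", "camera-A")
assert verify_frame(b"raw sensor data for one frame", record)
```

The point of the sketch is the claim above: once every frame from every independent device carries its own verifiable attestation, a fabricated video has to be accompanied by a mutually consistent set of forged records from every device that could have observed the scene, which is exactly where the seams show.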
But of course, this will require two huge cultural shifts. One to transparency and another to actually legislate against AGI boxing, because right now if someone wanted to openly do that, no one could stop them. Lots of work to do.
This is a really good example of a possible cultural/technological change that would alter the coordination landscape substantially. Thanks.
FYI, here's a past Paul Christiano exploration of this topic:
Anyway, I did say that I thought there were lots of plausible angles, so I can try to give one. This is very off-the-cuff, it’s not a topic that I have yet thought about much though I expect to at some point.
Example: tagging advanced technology
Let’s say that a technology is “basic” if it is available in 2016; otherwise we say it is “advanced.” We would like to:
1. Give individuals complete liberty when dealing with basic technology.
2. Give individuals considerable liberty when dealing with advanced technology.
3. Prevent attackers from using advanced technologies developed by law-abiding society in order to help do something destructive.
We’ll try to engineer a property of being “tagged,” aiming for the following desiderata:
1. All artifacts embodying advanced technology, produced or partly produced by law-abiding citizens, are tagged.
2. All artifacts produced using tagged artifacts are themselves tagged.
3. Tagged artifacts are not destructive (in the sense of being much more useful for an agent who wants to destroy).
Property #1 is relatively easy to satisfy, since the law can require tagging advanced technology. Ideally tagging will be cheap and compatible with widely held ethical ideals, so that there is little incentive to violate such laws. The difficulty is achieving properties #2 and #3 while remaining cheap / agreeable.
The most brutish way to achieve properties #2 and #3 is to have a government agency X which retains control over all advanced artifacts. When you contribute an artifact to X they issue you a title. The title-holder can tell X what to do with an advanced artifact, and X will honor those recommendations so long as (1) the proposed use is not destructive, and (2) the proposed use does not conflict with X’s monopoly on control of advanced artifacts. The title-holder is responsible for bearing the costs associated with maintaining X’s monopoly — for example, if a title-holder would like to use advanced artifacts in a factory in Nevada, then X will need to physically defend that factory, and the title-holder must pay the associated costs.
(In this case, tagging = “controlled by X.”)
This system is problematic for a number of reasons. In particular: (1) it provides an objectionable level of power to the organization X itself, (2) it may impose significant overhead on the use of advanced artifacts, (3) it only works so long as X is able to understand the consequences of actions recommended by title-holders (further increasing overhead and invasiveness).
More clever tagging schemes can ameliorate these difficulties, and AI seems very helpful for that. For example, if we were better able to automate bureaucracies, we could ensure that power rests with a democratic process that controls X rather than with the bureaucrats who implement X (and could potentially address concerns with privacy). We could potentially reduce overhead for some artifacts by constructing them in such a way that their destructive power is limited without having to retain physical control. (This would be much easier if we could build powerful AI into advanced artifacts.) And so on. In general, the notion of “tagging” could be quite amorphous and subtle.
If we implemented some kind of tagging, then a would-be attacker’s situation in the future is not much better than it is today. They could attempt to develop advanced technology in parallel; if they did that without the use of other advanced artifacts then it would require the same kind of coordination that is currently beyond the ability of terrorist groups. If they did it with the use of tagged advanced artifacts, then their products would end up getting tagged.
This was a very important question that I had previously not even been thinking about – I had implicitly been assuming it was better to delay AGI. Now I'm mostly unsure, but do suspect coordination probably does get harder over time.
I'm curating this question.
I think I'd thought about each of the considerations Eli lists here, but I had not seen them listed out all at once and framed as a part of a single question before. I also had some sort of implicit background belief that longer timelines were better from a coordination standpoint. But as soon as I saw these concerns listed together, I realized that was not at all obvious.
So far none of the answers here seem that compelling to me. I'd be very interested in more comprehensive answers that try to weigh the various considerations at play.
I am still really confused that I hadn't really properly asked myself this question that crisply before this post came out. Like, it sure seems like a really key question.
Now, almost two years later, I don't have fully amazing answers, but I do think that this decomposition has helped me a few times since then, and I still really want to see more work on this question.
A Question post!
I think I want to write up a summary of the 2009 Nobel Prize book I own on commons governance. This post had me update to think it's more topically relevant than I realized.
The LW review could use more question posts, if the goal is to solidify something like a canon of articles to build on. A question invites responses. I am disappointed in the existing answers, which appear less thought through than the question. Good curation, good nomination.
Coordination to do something is hard, and possible only because it doesn't require that everyone agree, only that enough people do the thing. Coordination NOT to do something that's obviously valuable (but carries risks) is _MUCH_ harder, because it requires agreement (or at least compliance and monitoring) from literally everyone.
It's not a question of getting harder or easier to coordinate over time - it's not possible to prevent AGI research now, and it won't become any less or more possible later. It's mostly a race to understand safety well enough to publish mechanisms to mitigate and reduce risks BEFORE a major self-improving AGI can be built by someone.
I'm a trained rationalist, and all the things I've read previously about AI being an existential risk were bullshit. But I know the LessWrong community (which I respect) is involved in AI risk. So where can I find a concise, exhaustive list of all sound arguments for and against AGI likely being an existential risk? If no such curated list exists, do people really care about the potential issue?
I would like to update my belief about the risk. But I suppose that most people talking about AGI risk don't have enough knowledge about what technically constitutes an AGI. I'm currently building an AGI that aims to understand natural language and to optimally answer questions, internally satisfying a coded utilitarian, effective-altruist goal system. The AGI takes language as input and outputs natural language text. That's it. How text can be an existential risk remains to be answered... There's no reason to give effectors to an AGI; just asking it for its knowledge and optimal decisions would be sufficient for revolutionizing humanity's well-being (e.g. optimal policies), and the output would be analysed by rational humans, catching any AGI mistakes. As for thinking that an AGI will become self-conscious, this is nonsense, and I would be fascinated to be proved otherwise.
So where can I find a concise, exhaustive list of all sound arguments for and against AGI likely being an existential risk?
Nick Bostrom’s book ‘Superintelligence’ is the standard reference here. I also find the AI FOOM Debate, which hits a lot of the same points, especially enlightening. You can find both easily using Google.
But I suppose that most people talking about AGI risk don't have enough knowledge about what technically constitutes an AGI.
I agree most people who talk about it are not experts in mathematics, computer science, or the field of ML, but the smaller set of people that I trust often are, such as researchers at UC Berkeley (Stuart Russell, Andrew Critch, many more), OpenAI (Paul Christiano, Chris Olah, many more), DeepMind (Jan Leike, Vika Krakovna, many more), MIRI, FHI, and so on. And of course just being an expert in a related technical domain does not make you an expert in long-term forecasting or even AGI, of which there are plausibly zero people with deep understanding.
And in this community Eliezer has talked often about actually solving the hard problem of AGI, not bouncing off and solving something easier and nearby, in part here but also in other places I’m having a hard time linking right now.
Bostrom's book is a bit out of date, and perhaps isn't the best reference on the AI safety community's current concerns. Here are some more recent articles:
Thanks. I'll further add Paul's post What Failure Looks Like, and say that the Alignment Forum sequences raise a lot more specific technical concerns.
The AI asks for lots of info on biochemistry and gives you a long list of chemicals that it claims cure various diseases. Most of these are normal cures. One of these chemicals will mutate the common cold into a lethal super plague. Soon we start clinical trials of the various drugs, until someone with a cold takes the wrong one and suddenly the world has a super plague.
The medical marvel AI is asked about the plague. It gives a plausible cover story for the plague's origins, along with describing an easy-to-make and effective vaccine. As casualties mount, humans rush to put the vaccine into production. The vaccine is designed to have an interesting side effect: a subtle modification of how the brain handles trust and risk. Soon the AI project leaders have been vaccinated. The AI says that it can cure the plague; it has a several-billion-base-pair DNA file that should be put into a bacterium. We allow it to output this file. We inspect it in less detail than we should have, given the effect of the vaccine, then we synthesize the sequence and put it in a bacterium. A few minutes later, the sequence bootstraps molecular nanotech. Over the next day, the nanotech spreads around the world. Soon it's exponentially expanding across the universe, turning all matter into drugged-out brains in vats. This is the most ethical action according to the AI's total utilitarian ethics.
The fundamental problem is that any time that you make a decision based on the outputs of an AI, that gives it a chance to manipulate you. If what you want isn't exactly what it wants, then it has incentive to manipulate.
(There is also the possibility of a side channel. For example, manipulating its own circuits to produce a cell phone signal, spinning its hard drive in a way that makes a particular sound, etc. Making a computer output just text, rather than text along with traces of sound, microwaves, and heat that can normally be ignored but might be maliciously manipulated by software, is hard.)
I'm a trained rationalist
What training process did you go through? o.o
My understanding is that we don't really know a reliable way to produce anything that could be called a "trained rationalist", a label which sets impossibly high standards (in the view of a layperson) and is thus pretty much unusable. (A large part of becoming an aspiring rationalist involves learning how any agent's rationality is necessarily limited; laypeople have overoptimistic intuitions about that.)
An AGI that can reason about its own capabilities to decide how to spend resources might be more capable than one that can't reason about itself, because it knows better how to approach solving a given problem. It's plausible that a sufficiently complex neural net finds that this is a useful sub-feature and implements it.
I wouldn't expect Google Translate to suddenly develop self-consciousness, but self-consciousness is a tool that helps humans reason better. Self-consciousness allows us to reflect on our own actions and think about how we should best approach a given problem.
(Or, is coordination easier in a long timeline?)
It seems like it would be good if the world could coordinate to not build AGI. That is, at some point in the future, when some number of teams will have the technical ability to build and deploy an AGI, but they all agree to voluntarily delay (perhaps on penalty of sanctions) until they’re confident that humanity knows how to align such a system.
Currently, this kind of coordination seems like a pretty implausible state of affairs. But I want to know if it seems like it becomes more or less plausible as time passes.
The following is my initial thinking in this area. I don’t know the relative importance of the factors that I listed, and there’s lots that I don’t understand about each of them. I would be glad for…
If coordination gets harder over time, that’s probably because...
If coordination gets easier over time, that’s probably because…