I was going to post this story in the open thread, but it seems relevant here:
So my partner and I went to see the new Captain America movie, and at one point there is a scene involving an AI/mind upload, along with a mention of an Operation Paperclip. And my first thought was "Is that a real thing, or is someone on the writing staff a Less Wronger doing a shoutout? Because that would be awesome."
Turns out it was a real thing. :-( Oh well.
Something more interesting happened afterward. I mentioned the connection to my partner, said paperclips were an inside joke here. She asked me to explain, so I gave her a (very) brief rundown of some LW thought on AI to provide context for the concept of a paperclipper. Part of the conversation went like this:
"So, next bit of context, just because an AI isn't actively evil doesn't mean it won't try to kill us."
To which she responded:
"Well, of course not. I mean, maybe it decides killing us will solve some other problem it has."
And I thought: That click Eliezer was talking about in the Sequences? This seems like a case of it. What makes it interesting is that my partner doesn't have a Mensa-class intellect or an...
This is great! For a long time I've been saying that we need summaries at different lengths, and I see it's coming together now.
This one is good as an executive summary.
The next step is to produce a short summary with emotional appeal; a call to action. It's been noted that simply stating the problem of AI existential risk does not bring people on-board. Staring into the Singularity is an example of a emotionally appealing call to action (for outdated policies, however).
But I do not have any specific ideas for implementation, and again, this is excellent for the purpose it was designed for.
Something about the name-dropping and phrasing in the "super-committee" line is off-putting. I'm not sure how to fix it, though.
In the second to last paragraph you write "nut" instead of "not".
In the last paragraph you're using the word "either" when I think "each" or "both" would be more correct.
Mostly this looks good.
Is there a convenient place to see just what changed from the old to the new?
Online diff tools aren't usefully handling the paragraphs when I copy-paste, and my solution of download -> insert line breaks -> run through my favorite diff program is probably inconvenient for most.
As long as other humans exist in competition with other humans, there is now way to keep AI as safe AI.
As long as competitive humans exist, boxes and rules are futile.
The only way to stop hostile AI is to have no AI. Otherwise, expect hostile AI.
There really isn't a logical way around this reality.
Without competitive humans, you could box the AI, give it ONLY preventative primary goals (primarily: 1. don't lie 2. always ask before creating a new goal), and feed it limited-time secondary goals that expire upon inevitable completion. There can never be a strong AI that has continuous goals that aren't solely designed to keep the AI safe.
I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.
OK, but why not look at this tower another way. A fish is basically useless at explaining its goals to an amoeba. We are not in fact useless at explaining our goals to chimps. Human researchers are often able to convey simple goals to chimps, and then see if chimps will help them accomplish those goals, for instance. I am able to convey simple goals to my dog: I can convey to him some information about the kinds of things I dislike and the kinds of things I like.
So the gap in intelligence between fish and humans also seems to translate into a gap in ability to convey useful information about goals to creatures of lower intelligence. Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are. Extrapolating this, you might expect a superintelligent AGI to be much much superior at communicating its goals (if it wants to). The line of thinking here is not so much "we are humans, we are smart, we can understand the goals of even an incredibly smart AGI"; it's "an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires."
So it seems like naive extrapolation pulls in two separate directions here. On the one hand, the tower of intelligence seems to put limits on the ability of beings lower down to comprehend the goals of beings higher up. On the other hand, the higher up you go, the better beings at that level become at communicating their goals to beings lower down. Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me. I'm pretty skeptical of naive extrapolation in this domain anyway, given Eliezer's point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn't expect trends to be maintained across those shifts.
Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are.
You are right that we are certainly able to convey a small simple subset of our goals, desires and motivations to some complex enough animals. You would probably also agree that most of what makes us human can never be explained to a dog or a cat, no matter how hard we try. We appear to them like members of their own species who sometimes make completely incomprehensible decisions they have no choice but put up with.
..."an incredibly smart AGI is i
AI risk
Bullet points
Executive summary
The risks from artificial intelligence (AI) in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructability – but extreme intelligence isn’t one of them. And it is precisely extreme intelligence that would give an AI its power, and hence make it dangerous.
The human brain is not much bigger than that of a chimpanzee. And yet those extra neurons account for the difference of outcomes between the two species: between a population of a few hundred thousand and basic wooden tools, versus a population of several billion and heavy industry. The human brain has allowed us to spread across the surface of the world, land on the moon, develop nuclear weapons, and coordinate to form effective groups with millions of members. It has granted us such power over the natural world that the survival of many other species is no longer determined by their own efforts, but by preservation decisions made by humans.
In the last sixty years, human intelligence has been further augmented by automation: by computers and programmes of steadily increasing ability. These have taken over tasks formerly performed by the human brain, from multiplication through weather modelling to driving cars. The powers and abilities of our species have increased steadily as computers have extended our intelligence in this way. There are great uncertainties over the timeline, but future AIs could reach human intelligence and beyond. If so, should we expect their power to follow the same trend? When the AI’s intelligence is as beyond us as we are beyond chimpanzees, would it dominate us as thoroughly as we dominate the great apes?
There are more direct reasons to suspect that a true AI would be both smart and powerful. When computers gain the ability to perform tasks at the human level, they tend to very quickly become much better than us. No-one today would think it sensible to pit the best human mind again a cheap pocket calculator in a contest of long division. Human versus computer chess matches ceased to be interesting a decade ago. Computers bring relentless focus, patience, processing speed, and memory: once their software becomes advanced enough to compete equally with humans, these features often ensure that they swiftly become much better than any human, with increasing computer power further widening the gap.
The AI could also make use of its unique, non-human architecture. If it existed as pure software, it could copy itself many times, training each copy at accelerated computer speed, and network those copies together (creating a kind of “super-committee” of the AI equivalents of, say, Edison, Bill Clinton, Plato, Einstein, Caesar, Spielberg, Ford, Steve Jobs, Buddha, Napoleon and other humans superlative in their respective skill-sets). It could continue copying itself without limit, creating millions or billions of copies, if it needed large numbers of brains to brute-force a solution to any particular problem.
Our society is setup to magnify the potential of such an entity, providing many routes to great power. If it could predict the stock market efficiently, it could accumulate vast wealth. If it was efficient at advice and social manipulation, it could create a personal assistant for every human being, manipulating the planet one human at a time. It could also replace almost every worker in the service sector. If it was efficient at running economies, it could offer its services doing so, gradually making us completely dependent on it. If it was skilled at hacking, it could take over most of the world’s computers and copy itself into them, using them to continue further hacking and computer takeover (and, incidentally, making itself almost impossible to destroy). The paths from AI intelligence to great AI power are many and varied, and it isn’t hard to imagine new ones.
Of course, simply because an AI could be extremely powerful, does not mean that it need be dangerous: its goals need not be negative. But most goals become dangerous when an AI becomes powerful. Consider a spam filter that became intelligent. Its task is to cut down on the number of spam messages that people receive. With great power, one solution to this requirement is to arrange to have all spammers killed. Or to shut down the internet. Or to have everyone killed. Or imagine an AI dedicated to increasing human happiness, as measured by the results of surveys, or by some biochemical marker in their brain. The most efficient way of doing this is to publicly execute anyone who marks themselves as unhappy on their survey, or to forcibly inject everyone with that biochemical marker.
This is a general feature of AI motivations: goals that seem safe for a weak or controlled AI, can lead to extremely pathological behaviour if the AI becomes powerful. As the AI gains in power, it becomes more and more important that its goals be fully compatible with human flourishing, or the AI could enact a pathological solution rather than one that we intended. Humans don’t expect this kind of behaviour, because our goals include a lot of implicit information, and we take “filter out the spam” to include “and don’t kill everyone in the world”, without having to articulate it. But the AI might be an extremely alien mind: we cannot anthropomorphise it, or expect it to interpret things the way we would. We have to articulate all the implicit limitations. Which may mean coming up with a solution to, say, human value and flourishing – a task philosophers have been failing at for millennia – and cast it unambiguously and without error into computer code.
Note that the AI may have a perfect understanding that when we programmed in “filter out the spam”, we implicitly meant “don’t kill everyone in the world”. But the AI has no motivation to go along with the spirit of the law: its goals are the letter only, the bit we actually programmed into it. Another worrying feature is that the AI would be motivated to hide its pathological tendencies as long as it is weak, and assure us that all was well, through anything it says or does. This is because it will never be able to achieve its goals if it is turned off, so it must lie and play nice to get anywhere. Only when we can no longer control it, would it be willing to act openly on its true goals – we can but hope these turn out safe.
It is not certain that AIs could become so powerful, nor is it certain that a powerful AI would become dangerous. Nevertheless, the probabilities of both are high enough that the risk cannot be dismissed. The main focus of AI research today is creating an AI; much more work needs to be done on creating it safely. Some are already working on this problem (such as the Future of Humanity Institute and the Machine Intelligence Research Institute), but a lot remains to be done, both at the design and at the policy level.