AI risk, new executive summary

12 Post author: Stuart_Armstrong 18 April 2014 10:45AM

AI risk

Bullet points

  • By all indications, an Artificial Intelligence could someday exceed human intelligence.
  • Such an AI would likely become extremely intelligent, and thus extremely powerful.
  • Most AI motivations and goals become dangerous when the AI becomes powerful.
  • It is very challenging to program an AI with fully safe goals, and an intelligent AI would likely not interpret ambiguous goals in a safe way.
  • A dangerous AI would be motivated to seem safe in any controlled training setting.
  • Not enough effort is currently being put into designing safe AIs.

 

Executive summary

The risks from artificial intelligence (AI) in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructability – but extreme intelligence isn’t one of them. And it is precisely extreme intelligence that would give an AI its power, and hence make it dangerous.

The human brain is not much bigger than that of a chimpanzee. And yet those extra neurons account for the difference of outcomes between the two species: between a population of a few hundred thousand and basic wooden tools, versus a population of several billion and heavy industry. The human brain has allowed us to spread across the surface of the world, land on the moon, develop nuclear weapons, and coordinate to form effective groups with millions of members. It has granted us such power over the natural world that the survival of many other species is no longer determined by their own efforts, but by preservation decisions made by humans.

In the last sixty years, human intelligence has been further augmented by automation: by computers and programmes of steadily increasing ability. These have taken over tasks formerly performed by the human brain, from multiplication through weather modelling to driving cars. The powers and abilities of our species have increased steadily as computers have extended our intelligence in this way. There are great uncertainties over the timeline, but future AIs could reach human intelligence and beyond. If so, should we expect their power to follow the same trend? When the AI’s intelligence is as beyond us as we are beyond chimpanzees, would it dominate us as thoroughly as we dominate the great apes?

There are more direct reasons to suspect that a true AI would be both smart and powerful. When computers gain the ability to perform tasks at the human level, they tend to very quickly become much better than us. No-one today would think it sensible to pit the best human mind again a cheap pocket calculator in a contest of long division. Human versus computer chess matches ceased to be interesting a decade ago. Computers bring relentless focus, patience, processing speed, and memory: once their software becomes advanced enough to compete equally with humans, these features often ensure that they swiftly become much better than any human, with increasing computer power further widening the gap.

The AI could also make use of its unique, non-human architecture. If it existed as pure software, it could copy itself many times, training each copy at accelerated computer speed, and network those copies together (creating a kind of “super-committee” of the AI equivalents of, say, Edison, Bill Clinton, Plato, Einstein, Caesar, Spielberg, Ford, Steve Jobs, Buddha, Napoleon and other humans superlative in their respective skill-sets). It could continue copying itself without limit, creating millions or billions of copies, if it needed large numbers of brains to brute-force a solution to any particular problem.

Our society is setup to magnify the potential of such an entity, providing many routes to great power. If it could predict the stock market efficiently, it could accumulate vast wealth. If it was efficient at advice and social manipulation, it could create a personal assistant for every human being, manipulating the planet one human at a time. It could also replace almost every worker in the service sector. If it was efficient at running economies, it could offer its services doing so, gradually making us completely dependent on it. If it was skilled at hacking, it could take over most of the world’s computers and copy itself into them, using them to continue further hacking and computer takeover (and, incidentally, making itself almost impossible to destroy). The paths from AI intelligence to great AI power are many and varied, and it isn’t hard to imagine new ones.

Of course, simply because an AI could be extremely powerful, does not mean that it need be dangerous: its goals need not be negative. But most goals become dangerous when an AI becomes powerful. Consider a spam filter that became intelligent. Its task is to cut down on the number of spam messages that people receive. With great power, one solution to this requirement is to arrange to have all spammers killed. Or to shut down the internet. Or to have everyone killed. Or imagine an AI dedicated to increasing human happiness, as measured by the results of surveys, or by some biochemical marker in their brain. The most efficient way of doing this is to publicly execute anyone who marks themselves as unhappy on their survey, or to forcibly inject everyone with that biochemical marker.

This is a general feature of AI motivations: goals that seem safe for a weak or controlled AI, can lead to extremely pathological behaviour if the AI becomes powerful. As the AI gains in power, it becomes more and more important that its goals be fully compatible with human flourishing, or the AI could enact a pathological solution rather than one that we intended. Humans don’t expect this kind of behaviour, because our goals include a lot of implicit information, and we take “filter out the spam” to include “and don’t kill everyone in the world”, without having to articulate it. But the AI might be an extremely alien mind: we cannot anthropomorphise it, or expect it to interpret things the way we would. We have to articulate all the implicit limitations. Which may mean coming up with a solution to, say, human value and flourishing – a task philosophers have been failing at for millennia – and cast it unambiguously and without error into computer code.

Note that the AI may have a perfect understanding that when we programmed in “filter out the spam”, we implicitly meant “don’t kill everyone in the world”. But the AI has no motivation to go along with the spirit of the law: its goals are the letter only, the bit we actually programmed into it. Another worrying feature is that the AI would be motivated to hide its pathological tendencies as long as it is weak, and assure us that all was well, through anything it says or does. This is because it will never be able to achieve its goals if it is turned off, so it must lie and play nice to get anywhere. Only when we can no longer control it, would it be willing to act openly on its true goals – we can but hope these turn out safe.

It is not certain that AIs could become so powerful, nor is it certain that a powerful AI would become dangerous. Nevertheless, the probabilities of both are high enough that the risk cannot be dismissed. The main focus of AI research today is creating an AI; much more work needs to be done on creating it safely. Some are already working on this problem (such as the Future of Humanity Institute and the Machine Intelligence Research Institute), but a lot remains to be done, both at the design and at the policy level.


Comments (76)

Comment author: JoshuaFox 18 April 2014 12:29:02PM *  3 points [-]

This is great! For a long time I've been saying that we need summaries at different lengths, and I see it's coming together now.

This one is good as an executive summary.

The next step is to produce a short summary with emotional appeal; a call to action. It's been noted that simply stating the problem of AI existential risk does not bring people on-board. Staring into the Singularity is an example of a emotionally appealing call to action (for outdated policies, however).

But I do not have any specific ideas for implementation, and again, this is excellent for the purpose it was designed for.

Comment author: Error 19 April 2014 04:18:29PM 9 points [-]

I was going to post this story in the open thread, but it seems relevant here:

So my partner and I went to see the new Captain America movie, and at one point there is a scene involving an AI/mind upload, along with a mention of an Operation Paperclip. And my first thought was "Is that a real thing, or is someone on the writing staff a Less Wronger doing a shoutout? Because that would be awesome."

Turns out it was a real thing. :-( Oh well.

Something more interesting happened afterward. I mentioned the connection to my partner, said paperclips were an inside joke here. She asked me to explain, so I gave her a (very) brief rundown of some LW thought on AI to provide context for the concept of a paperclipper. Part of the conversation went like this:

"So, next bit of context, just because an AI isn't actively evil doesn't mean it won't try to kill us."

To which she responded:

"Well, of course not. I mean, maybe it decides killing us will solve some other problem it has."

And I thought: That click Eliezer was talking about in the Sequences? This seems like a case of it. What makes it interesting is that my partner doesn't have a Mensa-class intellect or any significant exposure to the Less Wrong memeplex. Which suggests that clicking on the dangers of...call it non-ethical AI, as opposed to un-ethical, unless there's already a more standard term for the class of AI's that contains paperclippers but not Skynet...isn't limited to the high-IQ bubble.

That may not be news to MIRI, but it seemed worth commenting about here. Because we are a high IQ bubble. And that's part of why I like coming here. But I'm sure MIRI would be pleased to reach outside the bubble.

(of interest: Obviously the first connection she drew from dangerous AI was Skynet...but once I described the idea of an AI that was neutral-but-still-dangerous, the second connection she made was to Kyubey. And that felt sort-of-right to me. I told her that was the right idea but didn't go far enough.)

Comment author: Kaj_Sotala 21 April 2014 09:59:54AM 1 point [-]

I've seen plenty of high-IQ folks actively resisting that particular click. I think it has more to do with something like your degree of cynicism rather than your IQ: if you like to think of most people as inherently good, and want to think of people as inherently good, then you may also want to resist the thought of AIs as being dangerous by default.

Comment author: TheAncientGeek 21 April 2014 12:03:15PM *  1 point [-]

There's a significant difference between "might" and "by default"

Comment author: shminux 20 April 2014 07:55:23AM *  -1 points [-]

Re goals, I feel that comparing advanced AGI to humans is like comparing humans to chimps: regardless how much we want to explain human ethics and goals to a chimp, and how much effort we put in, its mind just isn't equipped to comprehend them. Similarly, even the most benevolent and conscientious AGI would be unable to explain its goal system or its ethical system to even a very smart human. Like chimps, humans have their own limits of comprehension, even though we do not know what they are from the inside.

Comment author: [deleted] 20 April 2014 04:08:45PM *  1 point [-]

What are your reasons for thinking this? I find myself disagreeing: one big disanalogy is that while we have language and chimps do not, we and the AGI both have language. I find it implausible that the AGI could not in principle communicate to us its goals: give the AGI and ourselves an arbitrarily large amount of time and resources to talk, do you really think we'd never come to a common understanding? Because even if we don't, the AGI effectively does have such resources by which it might, I donno, choose its words with care.

I'm also not sure why we should think it would even be particularly challenging to understand the goals of an AGI. It's not easy even with other humans, but why would it be much harder with AGI? Do we have some reason to expect its goals to be more complex than ours? It's been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be. My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.

Comment author: shminux 20 April 2014 05:31:11PM 2 points [-]

I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.

Certainly language is important, and human language is much more evolved than that of other animals. There are parts of human language, like writing, which are probably inaccessible to chimps, no matter how much effort we put into teaching them and how patient we are. I can easily imagine that AGI would use some kind of "meta-language", because human language would simply be inadequate for expressing its goals, like the chimp language is inadequate for expressing human metaethics.

I do not know what this next step would be, no more than an intelligent chimp being able to predict that humans would invent writing. My mind as-is is too limited and I understand as much. An AGI would have to make me smarter first, before being able to explain what it means to me. Call it "human uplifting".

Do we have some reason to expect its goals to be more complex than ours?

Yes, if you look through the tower of goals, more intelligent species have more complex goals.

It's been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be.

It has not been mine. When someone smarter than I am behaves a certain way, they have to patiently explain to me why they do what they do. And I still only see the path they have taken, not the million paths they briefly considered and rejected along the way.

My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.

My prejudice tells me that when someone a few levels above mine tries to explain their goals and motivations to me in English, I may understand each word, but not the complete sentences. If you cannot relate to this experience, go to a professional talk on a subject you know nothing about. For example, a musician friend of mine who attended my PhD defense commented on what she said was a surreal experience: I was talking in English, and most of the words she knew, but most of what I said was meaningless to her. Certainly some of this gap can be patched to a degree, and after a decade or so of dedicated work by both sides, wrought with frustration and doubt, but I don't think if the gap is wide enough it can be bridged completely.

I find the line of thinking "we are humans, we are smart, we can understand the goals of even an incredibly smart AGI" to be naive, unimaginative and closed-minded, given that our experience is rife with counterexamples.

Comment author: pragmatist 21 April 2014 03:55:19PM *  1 point [-]

I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.

OK, but why not look at this tower another way. A fish is basically useless at explaining its goals to an amoeba. We are not in fact useless at explaining our goals to chimps. Human researchers are often able to convey simple goals to chimps, and then see if chimps will help them accomplish those goals, for instance. I am able to convey simple goals to my dog: I can convey to him some information about the kinds of things I dislike and the kinds of things I like.

So the gap in intelligence between fish and humans also seems to translate into a gap in ability to convey useful information about goals to creatures of lower intelligence. Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are. Extrapolating this, you might expect a superintelligent AGI to be much much superior at communicating its goals (if it wants to). The line of thinking here is not so much "we are humans, we are smart, we can understand the goals of even an incredibly smart AGI"; it's "an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires."

So it seems like naive extrapolation pulls in two separate directions here. On the one hand, the tower of intelligence seems to put limits on the ability of beings lower down to comprehend the goals of beings higher up. On the other hand, the higher up you go, the better beings at that level become at communicating their goals to beings lower down. Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me. I'm pretty skeptical of naive extrapolation in this domain anyway, given Eliezer's point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn't expect trends to be maintained across those shifts.

Comment author: shminux 21 April 2014 05:25:25PM *  -1 points [-]

Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are.

You are right that we are certainly able to convey a small simple subset of our goals, desires and motivations to some complex enough animals. You would probably also agree that most of what makes us human can never be explained to a dog or a cat, no matter how hard we try. We appear to them like members of their own species who sometimes make completely incomprehensible decisions they have no choice but put up with.

"an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires."

This is quite possible. It might give us its dumbed-down version of its 10 commandments which would look to us like an incredible feat of science and philosophy.

Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me.

Right. An optimistic view is that we can understand the explanations, a pessimistic view is that we would only be able to follow instructions (this is not the most pessimistic view by far).

I'm pretty skeptical of naive extrapolation in this domain anyway, given Eliezer's point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn't expect trends to be maintained across those shifts.

Indeed, we shouldn't. I probably phrased my point poorly. What I tried to convey is that because "major advances in optimization power are meta-level qualitative shifts", confidently proclaiming that an advanced AGI will be able to convey what it thinks to humans is based on the just-world fallacy, not on any solid scientific footing.

Comment author: EHeller 20 April 2014 11:33:15PM *  1 point [-]

For example, a musician friend of mine who attended my PhD defense commented on what she said was a surreal experience: I was talking in English, and most of the words she knew, but most of what I said was meaningless to her.

Thats because you weren't really speaking english, you were speaking the english words for math terms related to physics. The people who spoke the relevant math you were alluding to could follow, those who didn't, could not, because they didn't have concrete mathematical ideas to tie the words to. Its not just a matter of jargon, its an actual language barrier. I think you'd find, with a jargon cheat sheet, you could follow many non-mathematical phd defenses just fine.

The same thing happens in music, which is its own language (after years of playing, I find I can "listen" to a song by reading sheet music).

Is your argument, essentially, that you think a machine intelligence can create a mathematics humans cannot understand, even in principle?

Comment author: shminux 21 April 2014 12:06:32AM -1 points [-]

Is your argument, essentially, that you think a machine intelligence can create a mathematics humans cannot understand, even in principle?

"mathematics" may be a wrong word for it. I totally think that a transhuman can create concepts and ideas which a mere human cannot understand even when patiently explained. I am quite surprised that other people here don't find it an obvious default.

Comment author: Armok_GoB 21 April 2014 12:55:48PM 0 points [-]

My impression was the question was not if it'd have those concepts, since as you say thats obvious, but if they'd be referenced necessarily by the utility function.

Comment author: EHeller 21 April 2014 01:43:44AM *  0 points [-]

Sure, but I find "can't understand" sort of fuzzy as a concept. i.e. I wouldn't say I 'understand' compactification and calabi yau manifolds the same way I understand sheet music (or the same way I understand the word green), but I do understand them all in some way.

It seems unlikely to me that there exist concepts that can't be at least broadly conveyed via some combination of those. My intuition is that existing human languages cover, with their descriptive power, the full range of explainable things.

for example- it seems unlikely there exists a law of physics that cannot be expressed as an equation. It seems equally unlikely there exists an equation I would be totally incapable of working with. Even if I'll never have the insight that lead someone to write it down, if you give it to me, I can use it to do things.

Comment author: Armok_GoB 21 April 2014 01:40:18PM 1 point [-]

Human languages can encode anything, but a human can't understand most things valid in human languages; most notably, extremely long things, and numbers specified with a lot of digits that actually matters. Just because you can count in binary on you hands does not mean you can comprehend the code of an operating system expressed in that format.

Humans seem "concept-complete" in much the same way your desktop PC seems turing complete. Except it's much more easily broken because the human brain has absurdly shity memory.

Comment author: EHeller 21 April 2014 10:36:11PM *  1 point [-]

numbers specified with a lot of digits that actually matters

Thats why we have paper, I can write it down. "Understanding" and "remembering" seem somewhat orthogonal here. I can't recite Moby Dick from memory, but I understood the book. If you give me a 20 digit number 123... and I can't hold it but retain "a number slightly larger than 1.23 * 10^20," that doesn't mean I can't understand you.

Just because you can count in binary on you hands does not mean you can comprehend the code of an operating system expressed in that format.

Print it out for me, and give me enough time, and I will be able to understand it, especially if you give me some context.

Yes, you can encode things in a way that make them harder for humans to understand, no one would argue that. The question is- are there concepts that are simply impossible to explain to a human? I point out that while I can't remember a 20 digit number, I can derive pretty much all of classical physics, so certainly humans can hold quite complex ideas in their head, even if they aren't optimized for storage of long numbers.

Comment author: Armok_GoB 22 April 2014 01:52:26AM 0 points [-]

You can construct a system consisting of a planet's worth of paper and pencils and an immortal version of yourself (or a vast dynasty of successors) that can understand it, if nothing else because it's turing complete and can simulate the AGI. this is not the same as you understanding it while still remaining fully human. Even if you did somehow integrate the paper-system sufficiently that'd be just as big a change as uploading and intelligence-augmenting the normal way.

The approximation thing is why I specified digits mattering. It wont help one bit when talking about something like gödel numbering.

Comment author: shminux 21 April 2014 02:43:43AM -1 points [-]

It seems unlikely to me that there exist concepts that can't be at least broadly conveyed via some combination of those. My intuition is that existing human languages cover, with their descriptive power, the full range of explainable things.

My intuition is the exact opposite.

it seems unlikely there exists a law of physics that cannot be expressed as an equation

I can totally imagine that some models are not reducible to equations, but that's not the point, really.

Even if I'll never have the insight that lead someone to write it down, if you give it to me, I can use it to do things.

Unless this "use" requires more brainpower than you have... You might still be able to work with some simplified version, but you'd have to have transhuman intelligence to "do things" with the full equation.

Comment author: EHeller 21 April 2014 03:17:10AM 0 points [-]

Unless this "use" requires more brainpower than you have...

But that seems incredibly nebulous. What is the exact failure mode?

Comment author: nshepperd 21 April 2014 03:49:09AM 0 points [-]

To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.

Yes, if you look through the tower of goals, more intelligent species have more complex goals.

This seems like a bogus use of the outside view. AGI is qualitatively different to evolved intelligence, in that it is not evolved, but built by a lesser intelligence. Moreover, there's a simple explanation for the observation that more intelligent animals have more complex goals, which is that more intelligence permits more subgoals, and natural selection generally alters a species' goals by adding, rather than simplifying. This is pretty much totally inapplicable to a constructed AGI.

Comment author: shminux 21 April 2014 05:00:02AM *  -2 points [-]

I'd love to hear what actual AGI experts think about it, not just us idle forum dwellers.

Comment author: [deleted] 20 April 2014 07:29:53PM *  0 points [-]

I will try to refute you by understanding what you say. So could you explain to me this idea of a 'meta-language'? I guess that by 'meta-' you intend to say that at least some sentences in the meta-language couldn't in principle be translated into a non-meta 'human' language. Is that right?

given that our experience is rife with counterexamples.

This is not a given. I've been to plenty of dissertation defenses on topics I know little to nothing about, and you're right that I'm often at a loss. But this, I find, is because the understanding of a newly minted doctor is too narrow and too newborn to be easily understood. PhD defenses are not the place to go to find people who really get something, they're the place to go to find someone who's just now gotten a foothold. My experience is still that the more intelligent and experienced PhDs tend to be more intelligible. But this is a little beside the point: PhDs tend to be hard to understand, when they are, because they're discussing something quite complex.

What reason do you have for thinking an AGI's goals would be complex at all? If your reasoning is that human beings that are more intelligent tend to have more complex goals (I don't agree, but say I grant this) why do you think an AGI will be so much like an intelligent human being?

Comment author: shminux 20 April 2014 10:22:39PM -1 points [-]

I will try to refute you by understanding what you say.

I am not sure what you mean by "refute" here. Prove my conjecture wrong by giving a counterexample? Show that my arguments are wrong? Show that the examples I used to make my point clearer are bad examples? If it's the last one, but then I would not call it a refutation.

I guess that by 'meta-' you intend to say that at least some sentences in the meta-language couldn't in principle be translated into a non-meta 'human' language. Is that right?

Indeed, at least not without some extra layer of meaning not originally expressed in the language. To give another example (not a proof, just an illustration of my point), you can sort-of teach a parrot or an ape to recognize words, to count and maybe even to add, but I don't expect it to be possible to teach one to construct mathematical proofs or to understand what one even is. Even if a proof can be expressed as a finite string of symbols (a sentence in a language) a chimp is capable of distinguishing from another string. There is just too much meta there, with symbols standing for other symbols or numbers or concepts.

I agree that my PhD defense example is not a proof, but an illustration meant to show that humans quite often experience a disconnect between a language ans an underlying concept, which well might be out of reach, despite being expressed with familiar symbols, just like a chimp would in the above example.

What reason do you have for thinking an AGI's goals would be complex at all?

I simply follow the chain of goal complexity as it grows with the intelligence complexity, from protozoa to primate and on and note that I do not see a reason why it would stop growing just because we cannot imagine what else a super-intelligence would use for/instead of a goal system.

Comment author: Armok_GoB 21 April 2014 02:00:26PM *  0 points [-]

I can in fact imagine what else a super-intelligence would use instead of a goal system. A bunch of different ones even. For example, a lump of incomprehensible super-solomonoff-compressed code that approximates a hypercomputer simulating a multiverse with the utility function as an epiphenomenal physical law feeding backwards in time to the AIs actions. Or a carefully tuned decentralized process (think natural selection, or the invisible hand) found to match the AIs previous goals exactly by searching through an infinite platonic space.

(yes, half of those are not real words; the goal was to imagine something that per definition could not be understood, so it's hard to do better than vaguely pointing in the direction of a feeling.)

Edit: I forgot: "goal system replaced by completely arbitrary thing that resembles it even less because it was traded away counterfactually to another part of tegmark-5"

Comment author: [deleted] 20 April 2014 10:51:40PM *  0 points [-]

I am not sure what you mean by "refute" here.

It was just a joke: I meant that I would prove you wrong by showing that I can understand you, despite the difference in our intellectual faculties. I don't really know if we have very different intellectual faculties; it was just a slightly ironic reposte to being called "naive, unimaginative and closed-minded" earlier. You may be right! But then my understanding you is at least a counterexample.

you can sort-of teach a parrot or an ape to recognize words

Can we taboo the 'animals can't be made to understand us' analogy? I don't think it's a good analogy, and I assume you can express your point without it. It certainly can't be the substance of your argument.

Anyway, would you be willing to agree to this: "There are at least some sentences in the meta-language (i.e. the kind of language an AGI might be capable of) such that those sentences cannot be translated into even an arbitrarily complex expressions in human language." For example, there will be sentences in the meta-language that cannot be expressed in human language, even if we allow the users of human language (and the AGI) an arbitrarily large amount of time, an arbitrarily large number of attempts at conversation, question and answer, etc. an arbitrarily large capacity for producing metaphor, illustration, etc. Is that your view? Or is that far too extreme? Do you just mean to say that the average human being today couldn't get their heads around an AGI's goals given 40 minutes, pencil, and paper? Or something in between these two claims?

I simply follow the chain of goal complexity as it grows with the intelligence complexity, from protozoa to primate and on and note that I do not see a reason why it would stop growing just because we cannot imagine what else a super-intelligence would use for/instead of a goal system.

Why do you think this is a strong argument? It strikes me as very indirect and intuitionistic. I mean, I see what you're saying, but I'm not at all confident that the relations between a protozoa and a fish, a dog and a chimp, a 8th century dock worker and a 21st century physicist, and the smartest of (non-uplifted) people and an AGI all fall onto a single continuum of intelligence/complexity of goals. I don't even know what kind of empirical evidence (I mean the sort of think one would find in a scientific journal) could be given in favor of such a conclusion. I just don't really see why you're so confident in this conclusion.

Comment author: Armok_GoB 21 April 2014 02:08:21PM 0 points [-]

Using "even an arbitrarily complex expressions in human language" seem unfair, given that it's turing complete but describing even a simple program in it fully in it without external tools will far exceed the capability of any actual human except for maybe a few savants that ended up highly specialized towards that narrow kind of task.

Comment author: [deleted] 21 April 2014 02:23:13PM *  0 points [-]

I agree, but I was taking the work of translation to be entirely on the side of an AGI: it would take whatever sentences it thinks in a meta-language and translate them into human language. Figuring out how to express such thoughts in our language would be a challenging practical problem, but that's exactly where AGI shines. I'm assuming, obviously, that it wants to be understood. I am very ready to agree that an AGI attempting to be obscure to us will probably succeed.

Comment author: Armok_GoB 22 April 2014 01:56:27AM 1 point [-]

Thats obvious and not what I meant. I'm talking about the simplest possible in principle expression in the human language being that long and complex.

Comment author: shminux 20 April 2014 11:16:40PM -1 points [-]

it was just a slightly ironic reposte to being called "naive, unimaginative and closed-minded" earlier. You may be right! But then my understanding you is at least a counterexample.

Sorry, didn't mean to call you personally any of those adjectives :)

Anyway, would you be willing to agree to this [...]

Pretty much, yes, I find it totally possible. I am not saying that I am confident that this is the case, just that I find it more likely than the alternative, which would require an additional reason why it isn't so.

but I'm not at all confident that the relations between [...] fall onto a single continuum of intelligence/complexity of goals.

If you agree with Eliezer's definition of intelligence as optimization power, then shouldn't we be able to express this power as a number? If so, the difference between difference intelligences is only that of scale.

Comment author: [deleted] 21 April 2014 12:57:29AM *  0 points [-]

Sorry, didn't mean to call you personally any of those adjectives :)

None taken then.

Pretty much, yes, I find it totally possible. I am not saying that I am confident that this is the case, just that I find it more likely than the alternative, which would require an additional reason why it isn't so.

Well, tell me what you think of this argument:

Lets divide the meta-language into two sets: P (the sentences that cannot be rendered in English) and Q (the sentences that can). If you expect Q to be empty, then let me know and we can talk about that case. But let's assume for now that Q is not empty, since I assume we both think that an AGI will be able to handle human language quite easily. Q is, for all intents and purposes, a 'human' language itself.

Premise one is that that translation is transitive: if I can translate language a into language b, and language b into language c, then I can translate language a into language c (maybe I need to use language b as an intermediate step, though).

Premise two: If I cannot translate a sentence in language a into an expression in language b, then there is no expression in language b that expresses the same thought as that sentence in language a.

Premise three: Any AGI would have to learn language originally from us, and thereafter either from us or from previous versions of itself.

So by stipulation, every sentence in Q can be rendered in English, and Q is non-empty. If any sentence in P cannot be rendered in English, then it follows from premise one that sentences in P cannot be rendered in sentences in Q (since then they could thereby be rendered into English). It also follows, if you accept premise two, that Q cannot express any sentence in P. So an AGI knowing only Q could never learn to express any sentence in P, since if it could, any speaker of Q (potentially any non-improved human) could in principle learn to express sentences in P (given an arbitrarily large amount of resources like time, questions and answers, etc.).

Hence, no AGI, beginning from a language like English could go on to learn how to express any sentence in P. Therefore no AGI will ever know P.

I'm not super confident this argument is sound, but it seems to me to be at least plausible.

If you agree with Eliezer's definition of intelligence as optimization power

Well, that's a fine definition, but it's tricky in this case. Because if intelligence is optimization power, and optimizing presupposes something to optimize, then intelligence (on that definition) isn't strictly a factor in (ultimate) goal formation. If that's right, than something's being much more intelligent would (as I think someone else mentioned) just lead to very hard to understand instrumental goals. It would have no direct relationship with terminal goals.

Comment author: Armok_GoB 21 April 2014 02:15:32PM 0 points [-]

Premise one is false assuming finite memory.

Premise 3 does not hold well either; Many new words come from pointing out a pattern in the environment, not from defining in terms of previous words.

Comment author: Jiro 21 April 2014 02:21:39AM 0 points [-]

By this reasoning no AGI beginning from English could ever know French either, for similar reasons. (Note that every language has sentences that cannot be rendered in another language, in the sense that someone who knows the truth value of the unrendered sentence can know the truth value of the rendered sentence; consider variations on Godel-undecideable sentences.)

Comment author: shminux 21 April 2014 02:36:23AM -1 points [-]

So an AGI knowing only Q could never learn to express any sentence in P, since if it could, any speaker of Q (potentially any non-improved human) could in principle learn to express sentences in P (given an arbitrarily large amount of resources like time, questions and answers, etc.).

Honestly, I expected you to do a bit more steelmanning with the examples I gave. Or maybe you have, just didn't post them here, Anyway, does the quote mean that any English sentence can be expressed in Chimp, since we evolved from a common ancestor? If you don't claim that (I hope you don't), then where did your logic stop applying to humans and chimps vs AGI and humans? Presumably it's relying on the Premise 3 that gets us a wrong conclusion in the English/Chimp example, since it is required to construct an unbroken chain of languages. What happened to humans over their evolution which made them create Q out of P where Q is not reducible to P? And if this is possible in the mindless evolutionary process, then would it not be even more likely during intelligence explosion?

If that's right, than something's being much more intelligent would (as I think someone else mentioned) just lead to very hard to understand instrumental goals. It would have no direct relationship with terminal goals.

I don't understand this point. I would expect the terminal goals evolve as the evolving intelligence understands more and more about the world. For example, for many people here the original terminal goal was, ostensibly, "serve God". Then they stopped believing and now their terminal goal is more like "do good". Similarly, I would expect an evolving AGI to adjust its terminal goals as the ones it had before are obsoleted, not because they have been reached, but because they become meaningless.

Comment author: hairyfigment 20 April 2014 07:05:33PM -1 points [-]

My hangup is that it seems like a truly benevolent AI would share our goals. And in a sense your argument "only" applies to instrumental goals, or to those developed through self-modification. (Amoebas don't design fish.) I'll grant it might take a conversation forever to reach the level we'd understand.

Comment author: shminux 20 April 2014 08:56:51PM 0 points [-]

My hangup is that it seems like a truly benevolent AI would share our goals.

In the way that a "truly benevolent" human would leave an unpolluted lake for fish to live in, instead of using it for its own purposes. The fish might think that humans share its goals, but the human goals would be infinitely more complex than fish could understand.

Comment author: hairyfigment 20 April 2014 10:54:13PM -1 points [-]

...It sounds like you're hinting at the fact that humans are not benevolent towards fish. If we are, then we do share its goals when it comes to outcomes for the fish - we just have other goals, which do not conflict. (I'm assuming the fish actually has clear preferences.) And a well-designed AI should not even have additional goals. The lack of understanding "only" might come in with the means, or with our poor understanding of our own preferences.

Comment author: christopherj 27 April 2014 02:59:59PM 0 points [-]

Do we have some reason to expect [an AGI's] goals to be more complex than ours?

I find myself agreeing with you -- human goals are a complex mess, which we seldom understand ourselves. We don't come with clear inherent goals, and what goals we do have we abuse by using things like sugar and condoms instead of eating healthy and reproducing like we were "supposed" to. People have been asking about the meaning of life for thousands of years, and we still have no answer.

An AI on the other hand, could have very simple goals -- make paperclips, for example. An AI's goals might be completely specified in two words. It's the AI's sub-goals and plans to reach its goals that I doubt I could comprehend. It's the very single-mindedness of an AI's goals and our inability to comprehend our own goals, plus the prospect of an AI being both smarter and better at goal-hacking than us, that has many of us fearing that we will accidentally kill ourselves via non-friendly AI. Not everyone will think to clarify "make paperclips" with, "don't exterminate humanity", "don't enslave humanity", "don't destroy the environment", "don't reprogram humans to desire only to make paperclips", and various other disclaimers that wouldn't be necessary if you were addressing a human (and we don't know the full disclaimer list either).

Comment author: TheOtherDave 20 April 2014 05:20:37PM 1 point [-]

Can you say more about what you're expecting a successful explanation to comprise, here?

E.g., suppose an AGI attempts to explain its ethics and goals to me, and at the end of that process it generates thousand-word descriptions of N future worlds and asks me to rank them in order of its preferences as I understand them. I expect to be significantly better at predicting the AGI's rankings than I was before the explanation.

I don't expect to be able to do anything equivalent with a chimp.

Do our expectations differ here?

Comment author: shminux 20 April 2014 06:03:33PM *  1 point [-]

E.g., suppose an AGI attempts to explain its ethics and goals to me

"Suppose an AGI attempts to explain its <untranslatable1> and <untranslatable2> to me" is what I expect it to sound like to humans if we were to replace human abstractions with those an advanced AGI would use. It would not even call these abstractions "ethics" or "goals", no more than we call ethics "groom" and goals "sex" when talking to a chimp.

suppose an AGI attempts to explain its ethics and goals to me, and at the end of that process it generates thousand-word descriptions of N future worlds and asks me to rank them in order of its preferences as I understand them.

I do not expect it to be able to generate such descriptions at all, due to the limitations of the human mind and human language. So, yes, our expectations differ here. I do not think that human intelligence reached some magical threshold where everything can be explained to it, given enough effort, even though it was not possible with "less advanced" animals. For all I know, I am not even using the right terms. Maybe an AGI improvement on the term "explain" is incomprehensible to us. Like if we were to translate "explain" into chimp or cat it would come out as "show", or something.

Comment author: TheOtherDave 20 April 2014 10:44:12PM *  0 points [-]

(shrug) Translating the terms is rather beside my point here.

If the AGI is using these things to choose among possible future worlds, then I expect it to be able to teach me to choose among possible future worlds more like it does than I would without that explanation.

I'm happy to call those things goals, ethics, morality, etc., even if those words don't capture what the AGI means by them. (I don't know that they really capture what I mean by them either, come to that.) Perhaps I would do better to call them "groom" or "fleem" or "untranslatable1" or refer to them by means of a specific shade of orange. I don't know; but as I say, I don't really care; terminology is largely independent of explanation.

But, sure, if you expect that it's incapable of doing that, then our expectations differ.

I'll note that my expectations don't depend on my having reached a magical threshold, or on everything being explainable to me given enough effort.

Comment author: Armok_GoB 21 April 2014 12:48:47PM *  0 points [-]

It might not be possible to "truly comprehend" the AIs advanced meta-meta-ethics and whatever compact algorithm replaces the goal-subgoals tree, but the AI most certainly can provide a code of behavior and prove that following it is a really good idea, much like humans might train pets to provide a variety of useful tasks whose true purpose they can't comprehend. And it doesn't seem unreasonable that this code of behavior wouldn't have the look and feel of an in-depth philosophy of ethics, and have some very very deep and general compression/procedural mechanism that seem very much like things you'd expect from a true and meaningful set of metaethics to humans, even if it did not correspond much to whats going on inside the AI. It also probably wouldn't accidentally trigger hypocrisy-revulsion in the humans, although the AI seeming to also be following it is just one of many solutions to that and probably not a very likely one.

Friendliness is pretty much an entirely tangential issue and the equivalent depth of explaining it would require the solution to several open questions unless I'm forgetting something right now. (I probably am)

There, question dissolved.

Edit; I ended up commenting in a bunch of places, in this comment tree, so i feel the need to clarify; I consider both side here to be making errors, and ended up seeing to favor the shminux side because thats where I were able to make interesting contributions, and it made some true tangential claims that were argued against and not defended well. I do not agree with the implications for friendliness however; you don't need to understand something to be able to construct true statements about it or even direct it's expression powerfully to have properties you can reference but don't understand either, especially if you have access to external tools.

Comment author: TheAncientGeek 20 April 2014 08:55:12AM *  0 points [-]

Is the problems supposed to be that the human doesn't have enough intelligence, or that we have some kind of highly parochial rationality?

Comment author: shminux 20 April 2014 05:34:15PM 0 points [-]

Not enough intelligence, yes. And rationality is a part of intelligence. Also, see my reply to hen.

Comment author: TheAncientGeek 21 April 2014 10:57:47AM *  -1 points [-]

But that's not ready analogous to the human champ gap, which is qualitative....chimps don't have language.

Comment author: V_V 20 April 2014 09:04:19AM 1 point [-]

How do you know that Skynet is not a paperclipper?

Comment author: ygert 20 April 2014 11:43:13AM *  2 points [-]

By observing the lack of an unusual amount of paperclips in the world which Skynet inhabits.

Comment author: private_messaging 22 April 2014 02:51:23PM 0 points [-]

Skynet kills people as secondary to it's self preservation, too.

Perhaps it is just a very banal insight that doesn't really shed any light on what an AI is likely to do.

Comment author: roystgnr 22 April 2014 06:30:11PM 2 points [-]

Something about the name-dropping and phrasing in the "super-committee" line is off-putting. I'm not sure how to fix it, though.

In the second to last paragraph you write "nut" instead of "not".

In the last paragraph you're using the word "either" when I think "each" or "both" would be more correct.

Mostly this looks good.

Comment author: kokotajlod 30 April 2014 11:08:11PM 0 points [-]

Something about the name-dropping and phrasing in the "super-committee" line is off-putting. I'm not sure how to fix it, though.

Agreed. Maybe it is because it feels like you are talking down to us with the name-dropping? Perhaps this should be tested with people who are unfamiliar with LW and AI-related ideas, to see if they have the same reaction.

Comment author: Stuart_Armstrong 25 April 2014 09:33:10AM *  0 points [-]

Thanks! A lot of the supercomittee weirdness is due to space constraints (this fits on 2 A4 sides - just about).

Using "both" now, thanks.

Comment author: ciphergoth 18 April 2014 01:17:46PM 2 points [-]

s/nut hope/but hope/

Comment author: kevin_p 19 April 2014 07:09:08AM 1 point [-]

I saw the same error, but assumed it should have been "we can not hope" (as in, we can't just hope it works out, we have to do something about it).

Comment author: Stuart_Armstrong 25 April 2014 09:25:39AM 0 points [-]

Thanks! Corrected to "but".

Comment author: faust 20 April 2014 05:17:21AM *  -2 points [-]

As long as other humans exist in competition with other humans, there is now way to keep AI as safe AI.

As long as competitive humans exist, boxes and rules are futile.

The only way to stop hostile AI is to have no AI. Otherwise, expect hostile AI.

There really isn't a logical way around this reality.

Without competitive humans, you could box the AI, give it ONLY preventative primary goals (primarily: 1. don't lie 2. always ask before creating a new goal), and feed it limited-time secondary goals that expire upon inevitable completion. There can never be a strong AI that has continuous goals that aren't solely designed to keep the AI safe.

Comment author: More_Right 24 April 2014 07:52:50AM *  0 points [-]

As long as other humans exist in competition with other humans, there is no_ way to keep AI as safe AI.

Agreed, but in need of qualifiers. There might be a way. I'd say "probably no way." As in, "no guaranteed-reliable method, but a possible likelihood."

As long as competitive humans exist, boxes and rules are futile.

I agree fairly strongly with this statement.

The only way to stop hostile AI is to have no AI. Otherwise, expect hostile AI.

This can be interpreted in two ways. The first sentence I agree with if reworded as "The only way to stop hostile AI in the absence of nearly-as-intelligent but separate-minded competitors, is to have no AI." Otherwise, I think markets indicate fairly well how hostile an AI is likely to be, thanks to governments and the corporate charter. Governments are already-in-existence malevolent AGI. However, they are also very incompetent AGI, in comparison to the theoretical maximum value of malevolent competence without empathic hesitation, internal disagreement, and confusion. (I think we can expect more "unity of purpose" from AGI than we can from government. Interestingly I think this makes sociopathic or "long-term hostile" AI less likely.)

"Expect hostile AI" could either mean "I think hostile AI is likely in this case" or "I think in this case, we should expect hostile AI because one should always expect the worst --as a philosophical matter."

There really isn't a logical way around this reality.

Nature often deals with "less likely" and "more likely," as well as intermediate outcomes. Hopefully you've seen Stephen Omohundro's webinars on hostile universal motivators as basic AI drives and autonomous systems. as well as Peter Voss's excellent ideas on the subject. I think that evolutionary approaches will trend toward neutral benevolence, and even given extremely shocking intermediary experiences, it will trend toward benevolence, especially given enough interaction with benevolent entities. I believe that intelligence trends toward increased interaction with its environment.

Without competitive humans, you could box the AI, give it ONLY preventative primary goals (primarily: 1. don't lie 2. always ask before creating a new goal), and feed it limited-time secondary goals that expire upon inevitable completion. There can never be a strong AI that has continuous goals that aren't solely designed to keep the AI safe.

I think this is just as likely to create malevolent AGI (with limited "G"), possibly more likely. After all, if humans are in competition with each other in anything that operates like the current sociopath-driven "mixed economy," sociopaths will be controlling them. Our only hope is that other sociopaths aren't in their same "professional sociopath" network, and that's a slim hope, indeed.

Comment author: CillianSvendsen 24 April 2014 03:39:08AM *  0 points [-]

I don't think that's a forgone conclusion. After all, there seem to be many proposals on how to get around this problem that individuals compete each other. For example, there's Eliezer's idea of using humanity's coherent extrapolated voalition to guide the AI. I also don't think that its in anyone's advantage to have hostile AI, that no one will try to bring about explicitly hostile AI on purpose, and that anyone sufficiently intelligent to program a working AI will probably recognize the dangers that AI contain.

Yes, humans will fight amongst each other and there is temptation for seed AI programmers to abuse the resulting AI to destroy their rivals. But I don't agree with the idea that AIs will always be hostile to the enemies of programmers. With some of the proposals that researchers have, it doesn't seem like individuals can abuse the AI to compete with other humans at all. The large potential for abuse doesn't mean that there is no potential for a good result.

Comment author: rule_and_line 18 April 2014 05:57:45PM *  0 points [-]

Is there a convenient place to see just what changed from the old to the new?

Online diff tools aren't usefully handling the paragraphs when I copy-paste, and my solution of download -> insert line breaks -> run through my favorite diff program is probably inconvenient for most.

Comment author: itaibn0 19 April 2014 12:57:47PM 0 points [-]

Thinking about this, it seems like there should exist some version of diff which points out differences on the word level rather than the line level. That would be useful for text documents which only have line breaks in between paragraphs. Given how easy I expect it to be to program such a thing almost certainly does exist, but I don't know where to find it.

Comment author: DSimon 21 April 2014 08:12:58PM 0 points [-]

Try wdiff

Comment author: rule_and_line 19 April 2014 05:45:53PM *  0 points [-]

I'm only familiar with open source tools, but git will do this with "git diff --word-diff FILE1 FILE2" and Emacs diff has the "ediff-toggle-autorefine" command. IMO you still need to insert line breaks before they become useful.

GNU has wdiff though I've never used it: https://www.gnu.org/software/wdiff/ (update: the git command above seems to do the same thing)

I'm still looking for an online diff tool that makes the word-level differences obvious. That would be ideal here (my web skills are too weak to make it happen this month).