shminux comments on AI risk, new executive summary - Less Wrong

Post author: Stuart_Armstrong 18 April 2014 10:45AM


Comment author: Error 19 April 2014 04:18:29PM 9 points [-]

I was going to post this story in the open thread, but it seems relevant here:

So my partner and I went to see the new Captain America movie, and at one point there is a scene involving an AI/mind upload, along with a mention of an Operation Paperclip. And my first thought was "Is that a real thing, or is someone on the writing staff a Less Wronger doing a shoutout? Because that would be awesome."

Turns out it was a real thing. :-( Oh well.

Something more interesting happened afterward. I mentioned the connection to my partner, said paperclips were an inside joke here. She asked me to explain, so I gave her a (very) brief rundown of some LW thought on AI to provide context for the concept of a paperclipper. Part of the conversation went like this:

"So, next bit of context, just because an AI isn't actively evil doesn't mean it won't try to kill us."

To which she responded:

"Well, of course not. I mean, maybe it decides killing us will solve some other problem it has."

And I thought: That click Eliezer was talking about in the Sequences? This seems like a case of it. What makes it interesting is that my partner doesn't have a Mensa-class intellect or any significant exposure to the Less Wrong memeplex. Which suggests that clicking on the dangers of...call it non-ethical AI, as opposed to un-ethical, unless there's already a more standard term for the class of AIs that contains paperclippers but not Skynet...isn't limited to the high-IQ bubble.

That may not be news to MIRI, but it seemed worth commenting about here. Because we are a high IQ bubble. And that's part of why I like coming here. But I'm sure MIRI would be pleased to reach outside the bubble.

(of interest: Obviously the first connection she drew from dangerous AI was Skynet...but once I described the idea of an AI that was neutral-but-still-dangerous, the second connection she made was to Kyubey. And that felt sort-of-right to me. I told her that was the right idea but didn't go far enough.)

Comment author: shminux 20 April 2014 07:55:23AM *  -1 points [-]

Re goals, I feel that comparing advanced AGI to humans is like comparing humans to chimps: regardless of how much we want to explain human ethics and goals to a chimp, and how much effort we put in, its mind just isn't equipped to comprehend them. Similarly, even the most benevolent and conscientious AGI would be unable to explain its goal system or its ethical system to even a very smart human. Like chimps, humans have their own limits of comprehension, even though we do not know what they are from the inside.

Comment author: [deleted] 20 April 2014 04:08:45PM *  1 point [-]

What are your reasons for thinking this? I find myself disagreeing: one big disanalogy is that while we have language and chimps do not, we and the AGI both have language. I find it implausible that the AGI could not in principle communicate its goals to us: given an arbitrarily large amount of time and resources for the AGI and ourselves to talk, do you really think we'd never come to a common understanding? Because even if we don't, the AGI effectively does have such resources, by which it might, I don't know, choose its words with care.

I'm also not sure why we should think it would even be particularly challenging to understand the goals of an AGI. It's not easy even with other humans, but why would it be much harder with AGI? Do we have some reason to expect its goals to be more complex than ours? It's been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be. My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.

Comment author: shminux 20 April 2014 05:31:11PM 2 points [-]

I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.

Certainly language is important, and human language is much more evolved than that of other animals. There are parts of human language, like writing, which are probably inaccessible to chimps, no matter how much effort we put into teaching them and how patient we are. I can easily imagine that AGI would use some kind of "meta-language", because human language would simply be inadequate for expressing its goals, like the chimp language is inadequate for expressing human metaethics.

I do not know what this next step would be, any more than an intelligent chimp could predict that humans would invent writing. My mind as-is is too limited, and I understand as much. An AGI would have to make me smarter first, before being able to explain what it means to me. Call it "human uplifting".

Do we have some reason to expect its goals to be more complex than ours?

Yes, if you look through the tower of goals, more intelligent species have more complex goals.

It's been my experience that the more sophisticated and intelligent someone is, the more intelligible their behavior tends to be.

It has not been mine. When someone smarter than I am behaves a certain way, they have to patiently explain to me why they do what they do. And I still only see the path they have taken, not the million paths they briefly considered and rejected along the way.

My prejudice therefore says that the goals of an AGI would be much easier to understand than, say, my own.

My prejudice tells me that when someone a few levels above me tries to explain their goals and motivations to me in English, I may understand each word, but not the complete sentences. If you cannot relate to this experience, go to a professional talk on a subject you know nothing about. For example, a musician friend of mine who attended my PhD defense commented on what she said was a surreal experience: I was talking in English, and she knew most of the words, but most of what I said was meaningless to her. Certainly some of this gap can be patched to a degree, after a decade or so of dedicated work by both sides, fraught with frustration and doubt, but if the gap is wide enough I don't think it can be bridged completely.

I find the line of thinking "we are humans, we are smart, we can understand the goals of even an incredibly smart AGI" to be naive, unimaginative and closed-minded, given that our experience is rife with counterexamples.

Comment author: pragmatist 21 April 2014 03:55:19PM *  1 point [-]

I am trying to use an outside view here, because I find the inside view too limiting. The best I can do is to construct a tower of comparisons between species vastly different in intelligence and conjecture that this tower does not end with humans on top, a Copernican principle, if you like. To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.

OK, but why not look at this tower another way? A fish is basically useless at explaining its goals to an amoeba. We are not in fact useless at explaining our goals to chimps. Human researchers are often able to convey simple goals to chimps, and then see whether the chimps will help them accomplish those goals, for instance. I am able to convey simple goals to my dog: I can convey to him some information about the kinds of things I dislike and the kinds of things I like.

So the gap in intelligence between fish and humans also seems to translate into a gap in ability to convey useful information about goals to creatures of lower intelligence. Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are. Extrapolating this, you might expect a superintelligent AGI to be much much superior at communicating its goals (if it wants to). The line of thinking here is not so much "we are humans, we are smart, we can understand the goals of even an incredibly smart AGI"; it's "an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires."

So it seems like naive extrapolation pulls in two separate directions here. On the one hand, the tower of intelligence seems to put limits on the ability of beings lower down to comprehend the goals of beings higher up. On the other hand, the higher up you go, the better beings at that level become at communicating their goals to beings lower down. Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me. I'm pretty skeptical of naive extrapolation in this domain anyway, given Eliezer's point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn't expect trends to be maintained across those shifts.

Comment author: shminux 21 April 2014 05:25:25PM *  -1 points [-]

Humans are much better at communicating with less intelligent beings than fish or cattle or chimps are.

You are right that we are certainly able to convey a small, simple subset of our goals, desires and motivations to some sufficiently complex animals. You would probably also agree that most of what makes us human can never be explained to a dog or a cat, no matter how hard we try. We appear to them like members of their own species who sometimes make completely incomprehensible decisions they have no choice but to put up with.

"an incredibly smart AGI is incredibly smart, so it will be able to find effective strategies for communicating its goals to us if it so desires."

This is quite possible. It might give us its dumbed-down version of its 10 commandments which would look to us like an incredible feat of science and philosophy.

Which one of these tendencies will win out when it comes to human-AGI interaction? Beats me.

Right. An optimistic view is that we can understand the explanations, a pessimistic view is that we would only be able to follow instructions (this is not the most pessimistic view by far).

I'm pretty skeptical of naive extrapolation in this domain anyway, given Eliezer's point that major advances in optimization power are meta-level qualitative shifts, and so we shouldn't expect trends to be maintained across those shifts.

Indeed, we shouldn't. I probably phrased my point poorly. What I tried to convey is that because "major advances in optimization power are meta-level qualitative shifts", confidently proclaiming that an advanced AGI will be able to convey what it thinks to humans is based on the just-world fallacy, not on any solid scientific footing.

Comment author: EHeller 20 April 2014 11:33:15PM *  1 point [-]

For example, a musician friend of mine who attended my PhD defense commented on what she said was a surreal experience: I was talking in English, and most of the words she knew, but most of what I said was meaningless to her.

That's because you weren't really speaking English; you were speaking the English words for math terms related to physics. The people who spoke the relevant math you were alluding to could follow; those who didn't could not, because they didn't have concrete mathematical ideas to tie the words to. It's not just a matter of jargon, it's an actual language barrier. I think you'd find that, with a jargon cheat sheet, you could follow many non-mathematical PhD defenses just fine.

The same thing happens in music, which is its own language (after years of playing, I find I can "listen" to a song by reading sheet music).

Is your argument, essentially, that you think a machine intelligence can create a mathematics humans cannot understand, even in principle?

Comment author: shminux 21 April 2014 12:06:32AM -1 points [-]

Is your argument, essentially, that you think a machine intelligence can create a mathematics humans cannot understand, even in principle?

"mathematics" may be the wrong word for it. I totally think that a transhuman can create concepts and ideas which a mere human cannot understand even when patiently explained. I am quite surprised that other people here don't find it an obvious default.

Comment author: Armok_GoB 21 April 2014 12:55:48PM 0 points [-]

My impression was the question was not whether it'd have those concepts, since as you say that's obvious, but whether they'd necessarily be referenced by the utility function.

Comment author: EHeller 21 April 2014 01:43:44AM *  0 points [-]

Sure, but I find "can't understand" sort of fuzzy as a concept. I.e., I wouldn't say I 'understand' compactification and Calabi-Yau manifolds the same way I understand sheet music (or the same way I understand the word green), but I do understand them all in some way.

It seems unlikely to me that there exist concepts that can't be at least broadly conveyed via some combination of those. My intuition is that existing human languages cover, with their descriptive power, the full range of explainable things.

For example, it seems unlikely there exists a law of physics that cannot be expressed as an equation. It seems equally unlikely there exists an equation I would be totally incapable of working with. Even if I'll never have the insight that led someone to write it down, if you give it to me, I can use it to do things.

Comment author: Armok_GoB 21 April 2014 01:40:18PM 1 point [-]

Human languages can encode anything, but a human can't understand most things valid in human languages; most notably, extremely long things, and numbers specified with a lot of digits that actually matter. Just because you can count in binary on your hands does not mean you can comprehend the code of an operating system expressed in that format.

Humans seem "concept-complete" in much the same way your desktop PC seems Turing-complete. Except it's much more easily broken, because the human brain has absurdly shitty memory.

Comment author: EHeller 21 April 2014 10:36:11PM *  1 point [-]

numbers specified with a lot of digits that actually matters

That's why we have paper: I can write it down. "Understanding" and "remembering" seem somewhat orthogonal here. I can't recite Moby Dick from memory, but I understood the book. If you give me a 20-digit number 123... and I can't hold it but retain "a number slightly larger than 1.23 * 10^20," that doesn't mean I can't understand you.

Just because you can count in binary on you hands does not mean you can comprehend the code of an operating system expressed in that format.

Print it out for me, and give me enough time, and I will be able to understand it, especially if you give me some context.

Yes, you can encode things in a way that makes them harder for humans to understand; no one would argue that. The question is: are there concepts that are simply impossible to explain to a human? I point out that while I can't remember a 20-digit number, I can derive pretty much all of classical physics, so certainly humans can hold quite complex ideas in their heads, even if they aren't optimized for storage of long numbers.

Comment author: Armok_GoB 22 April 2014 01:52:26AM 0 points [-]

You can construct a system consisting of a planet's worth of paper and pencils and an immortal version of yourself (or a vast dynasty of successors) that can understand it, if nothing else because it's Turing-complete and can simulate the AGI. This is not the same as you understanding it while still remaining fully human. Even if you did somehow integrate the paper-system sufficiently, that'd be just as big a change as uploading and intelligence-augmenting the normal way.

The approximation thing is why I specified the digits mattering. It won't help one bit when talking about something like Gödel numbering.

Comment author: shminux 21 April 2014 02:43:43AM -1 points [-]

It seems unlikely to me that there exist concepts that can't be at least broadly conveyed via some combination of those. My intuition is that existing human languages cover, with their descriptive power, the full range of explainable things.

My intuition is the exact opposite.

it seems unlikely there exists a law of physics that cannot be expressed as an equation

I can totally imagine that some models are not reducible to equations, but that's not the point, really.

Even if I'll never have the insight that lead someone to write it down, if you give it to me, I can use it to do things.

Unless this "use" requires more brainpower than you have... You might still be able to work with some simplified version, but you'd have to have transhuman intelligence to "do things" with the full equation.

Comment author: EHeller 21 April 2014 03:17:10AM 0 points [-]

Unless this "use" requires more brainpower than you have...

But that seems incredibly nebulous. What is the exact failure mode?

Comment author: nshepperd 21 April 2014 03:49:09AM 0 points [-]

To use some drastically different pairing, if you agree that an amoeba can never comprehend fish, that fish can never comprehend chimps, that chimps can never understand humans, then there is no reason to stop there and proclaim that humans would understand whatever intelligence comes next.

Yes, if you look through the tower of goals, more intelligent species have more complex goals.

This seems like a bogus use of the outside view. AGI is qualitatively different from evolved intelligence, in that it is not evolved, but built by a lesser intelligence. Moreover, there's a simple explanation for the observation that more intelligent animals have more complex goals, which is that more intelligence permits more subgoals, and natural selection generally alters a species' goals by adding, rather than simplifying. This is pretty much totally inapplicable to a constructed AGI.

Comment author: shminux 21 April 2014 05:00:02AM *  -2 points [-]

I'd love to hear what actual AGI experts think about it, not just us idle forum dwellers.

Comment author: [deleted] 20 April 2014 07:29:53PM *  0 points [-]

I will try to refute you by understanding what you say. So could you explain to me this idea of a 'meta-language'? I guess that by 'meta-' you intend to say that at least some sentences in the meta-language couldn't in principle be translated into a non-meta 'human' language. Is that right?

given that our experience is rife with counterexamples.

This is not a given. I've been to plenty of dissertation defenses on topics I know little to nothing about, and you're right that I'm often at a loss. But this, I find, is because the understanding of a newly minted doctor is too narrow and too newborn to be easily understood. PhD defenses are not the place to go to find people who really get something; they're the place to go to find someone who's just now gotten a foothold. My experience is still that the more intelligent and experienced PhDs tend to be more intelligible. But this is a little beside the point: PhDs tend to be hard to understand, when they are, because they're discussing something quite complex.

What reason do you have for thinking an AGI's goals would be complex at all? If your reasoning is that human beings that are more intelligent tend to have more complex goals (I don't agree, but say I grant this) why do you think an AGI will be so much like an intelligent human being?

Comment author: shminux 20 April 2014 10:22:39PM -1 points [-]

I will try to refute you by understanding what you say.

I am not sure what you mean by "refute" here. Prove my conjecture wrong by giving a counterexample? Show that my arguments are wrong? Show that the examples I used to make my point clearer are bad examples? If it's the last one, then I would not call it a refutation.

I guess that by 'meta-' you intend to say that at least some sentences in the meta-language couldn't in principle be translated into a non-meta 'human' language. Is that right?

Indeed, at least not without some extra layer of meaning not originally expressed in the language. To give another example (not a proof, just an illustration of my point), you can sort-of teach a parrot or an ape to recognize words, to count and maybe even to add, but I don't expect it to be possible to teach one to construct mathematical proofs or to understand what one even is. Even if a proof can be expressed as a finite string of symbols (a sentence in a language) that a chimp is capable of distinguishing from another string, there is just too much meta there, with symbols standing for other symbols or numbers or concepts.

I agree that my PhD defense example is not a proof, but an illustration meant to show that humans quite often experience a disconnect between a language and an underlying concept, which may well be out of reach despite being expressed with familiar symbols, just as a chimp would in the above example.

What reason do you have for thinking an AGI's goals would be complex at all?

I simply follow the chain of goal complexity as it grows with the intelligence complexity, from protozoa to primate and on and note that I do not see a reason why it would stop growing just because we cannot imagine what else a super-intelligence would use for/instead of a goal system.

Comment author: Armok_GoB 21 April 2014 02:00:26PM *  0 points [-]

I can in fact imagine what else a super-intelligence would use instead of a goal system. A bunch of different ones, even. For example, a lump of incomprehensible super-Solomonoff-compressed code that approximates a hypercomputer simulating a multiverse, with the utility function as an epiphenomenal physical law feeding backwards in time to the AI's actions. Or a carefully tuned decentralized process (think natural selection, or the invisible hand) found to match the AI's previous goals exactly by searching through an infinite platonic space.

(Yes, half of those are not real words; the goal was to imagine something that by definition could not be understood, so it's hard to do better than vaguely pointing in the direction of a feeling.)

Edit: I forgot: "goal system replaced by completely arbitrary thing that resembles it even less because it was traded away counterfactually to another part of tegmark-5"

Comment author: [deleted] 20 April 2014 10:51:40PM *  0 points [-]

I am not sure what you mean by "refute" here.

It was just a joke: I meant that I would prove you wrong by showing that I can understand you, despite the difference in our intellectual faculties. I don't really know if we have very different intellectual faculties; it was just a slightly ironic riposte to being called "naive, unimaginative and closed-minded" earlier. You may be right! But then my understanding you is at least a counterexample.

you can sort-of teach a parrot or an ape to recognize words

Can we taboo the 'animals can't be made to understand us' analogy? I don't think it's a good analogy, and I assume you can express your point without it. It certainly can't be the substance of your argument.

Anyway, would you be willing to agree to this: "There are at least some sentences in the meta-language (i.e. the kind of language an AGI might be capable of) such that those sentences cannot be translated into even an arbitrarily complex expression in human language." For example, there will be sentences in the meta-language that cannot be expressed in human language, even if we allow the users of human language (and the AGI) an arbitrarily large amount of time, an arbitrarily large number of attempts at conversation, question and answer, etc., and an arbitrarily large capacity for producing metaphor, illustration, etc. Is that your view? Or is that far too extreme? Do you just mean to say that the average human being today couldn't get their heads around an AGI's goals given 40 minutes, pencil, and paper? Or something in between these two claims?

I simply follow the chain of goal complexity as it grows with the intelligence complexity, from protozoa to primate and on and note that I do not see a reason why it would stop growing just because we cannot imagine what else a super-intelligence would use for/instead of a goal system.

Why do you think this is a strong argument? It strikes me as very indirect and intuitionistic. I mean, I see what you're saying, but I'm not at all confident that the relations between a protozoan and a fish, a dog and a chimp, an 8th-century dock worker and a 21st-century physicist, and the smartest of (non-uplifted) people and an AGI all fall onto a single continuum of intelligence/complexity of goals. I don't even know what kind of empirical evidence (I mean the sort of thing one would find in a scientific journal) could be given in favor of such a conclusion. I just don't really see why you're so confident in this conclusion.

Comment author: Armok_GoB 21 April 2014 02:08:21PM 0 points [-]

Using "even an arbitrarily complex expression in human language" seems unfair, given that human language is Turing-complete, but describing even a simple program fully in it without external tools will far exceed the capability of any actual human, except maybe a few savants who ended up highly specialized toward that narrow kind of task.

Comment author: [deleted] 21 April 2014 02:23:13PM *  0 points [-]

I agree, but I was taking the work of translation to be entirely on the side of an AGI: it would take whatever sentences it thinks in a meta-language and translate them into human language. Figuring out how to express such thoughts in our language would be a challenging practical problem, but that's exactly where AGI shines. I'm assuming, obviously, that it wants to be understood. I am very ready to agree that an AGI attempting to be obscure to us will probably succeed.

Comment author: Armok_GoB 22 April 2014 01:56:27AM 1 point [-]

That's obvious, and not what I meant. I'm talking about the simplest possible in-principle expression in the human language being that long and complex.

Comment author: shminux 20 April 2014 11:16:40PM -1 points [-]

it was just a slightly ironic reposte to being called "naive, unimaginative and closed-minded" earlier. You may be right! But then my understanding you is at least a counterexample.

Sorry, didn't mean to call you personally any of those adjectives :)

Anyway, would you be willing to agree to this [...]

Pretty much, yes, I find it totally possible. I am not saying that I am confident that this is the case, just that I find it more likely than the alternative, which would require an additional reason why it isn't so.

but I'm not at all confident that the relations between [...] fall onto a single continuum of intelligence/complexity of goals.

If you agree with Eliezer's definition of intelligence as optimization power, then shouldn't we be able to express this power as a number? If so, the difference between different intelligences is only one of scale.
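(For the sake of concreteness: Eliezer's "Measuring Optimization Power" post proposes counting bits, roughly the negative log of the fraction of possible outcomes ranked at least as good as the one the optimizer actually hits. A minimal sketch of that measure; the function name and example are mine, not Eliezer's:)

```python
import math

def optimization_power_bits(outcomes, achieved, utility):
    """Optimization power in bits: -log2 of the fraction of possible
    outcomes that are at least as good as the outcome actually achieved."""
    at_least_as_good = sum(1 for o in outcomes if utility(o) >= utility(achieved))
    return -math.log2(at_least_as_good / len(outcomes))

# A process that reliably steers into the single best of 1024 equally
# likely states exerts 10 bits of optimization power.
states = list(range(1024))
print(optimization_power_bits(states, 1023, lambda o: o))  # → 10.0
```

On this measure different optimizers do land on one numeric scale, which is what makes the "only a difference of scale" reading at least expressible.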

Comment author: [deleted] 21 April 2014 12:57:29AM *  0 points [-]

Sorry, didn't mean to call you personally any of those adjectives :)

None taken then.

Pretty much, yes, I find it totally possible. I am not saying that I am confident that this is the case, just that I find it more likely than the alternative, which would require an additional reason why it isn't so.

Well, tell me what you think of this argument:

Let's divide the meta-language into two sets: P (the sentences that cannot be rendered in English) and Q (the sentences that can). If you expect Q to be empty, then let me know and we can talk about that case. But let's assume for now that Q is not empty, since I assume we both think that an AGI will be able to handle human language quite easily. Q is, for all intents and purposes, a 'human' language itself.

Premise one is that that translation is transitive: if I can translate language a into language b, and language b into language c, then I can translate language a into language c (maybe I need to use language b as an intermediate step, though).

Premise two: If I cannot translate a sentence in language a into an expression in language b, then there is no expression in language b that expresses the same thought as that sentence in language a.

Premise three: Any AGI would have to learn language originally from us, and thereafter either from us or from previous versions of itself.

So by stipulation, every sentence in Q can be rendered in English, and Q is non-empty. If any sentence in P cannot be rendered in English, then it follows from premise one that sentences in P cannot be rendered in sentences in Q (since then they could thereby be rendered into English). It also follows, if you accept premise two, that Q cannot express any sentence in P. So an AGI knowing only Q could never learn to express any sentence in P, since if it could, any speaker of Q (potentially any non-improved human) could in principle learn to express sentences in P (given an arbitrarily large amount of resources like time, questions and answers, etc.).

Hence, no AGI, beginning from a language like English could go on to learn how to express any sentence in P. Therefore no AGI will ever know P.

I'm not super confident this argument is sound, but it seems to me to be at least plausible.
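For what it's worth, the skeleton of the argument can be put in rough notation (writing Tr(s, L) for "sentence s can be rendered in language L", with E for English; the symbolization is mine, so treat it as a sketch):

```latex
% Tr(s, L): sentence s can be rendered in language L.  E = English.
\begin{align*}
\text{P1 (transitivity):}\quad & Tr(s, B) \text{ and } B \text{ renderable in } C
    \;\Rightarrow\; Tr(s, C) \\
\text{P2:}\quad & \neg Tr(s, L) \;\Rightarrow\; \text{no expression in } L
    \text{ carries the meaning of } s \\
\text{Given:}\quad & \forall q \in Q:\ Tr(q, E), \qquad
    \forall p \in P:\ \neg Tr(p, E) \\
\text{C1:}\quad & \forall p \in P:\ \neg Tr(p, Q)
    \quad \text{(else P1 would yield } Tr(p, E)\text{)} \\
\text{C2 (with P3):}\quad & \text{an AGI whose language grows out of } Q
    \text{ can never express any } p \in P
\end{align*}
```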

If you agree with Eliezer's definition of intelligence as optimization power

Well, that's a fine definition, but it's tricky in this case. Because if intelligence is optimization power, and optimizing presupposes something to optimize, then intelligence (on that definition) isn't strictly a factor in (ultimate) goal formation. If that's right, then something's being much more intelligent would (as I think someone else mentioned) just lead to very hard to understand instrumental goals. It would have no direct relationship with terminal goals.

Comment author: Armok_GoB 21 April 2014 02:15:32PM 0 points [-]

Premise one is false assuming finite memory.

Premise three does not hold well either: many new words come from pointing out a pattern in the environment, not from defining them in terms of previous words.

Comment author: Jiro 21 April 2014 02:21:39AM 0 points [-]

By this reasoning no AGI beginning from English could ever know French either, for similar reasons. (Note that every language has sentences that cannot be rendered in another language, in the sense that someone who knows the truth value of the unrendered sentence can know the truth value of the rendered sentence; consider variations on Gödel-undecidable sentences.)

Comment author: shminux 21 April 2014 02:36:23AM -1 points [-]

So an AGI knowing only Q could never learn to express any sentence in P, since if it could, any speaker of Q (potentially any non-improved human) could in principle learn to express sentences in P (given an arbitrarily large amount of resources like time, questions and answers, etc.).

Honestly, I expected you to do a bit more steelmanning with the examples I gave. Or maybe you have and just didn't post it here. Anyway, does the quote mean that any English sentence can be expressed in Chimp, since we evolved from a common ancestor? If you don't claim that (I hope you don't), then where did your logic stop applying to humans and chimps vs. AGI and humans? Presumably it's Premise 3 that gets us the wrong conclusion in the English/Chimp example, since it is required to construct an unbroken chain of languages. What happened to humans over their evolution that made them create Q out of P, where Q is not reducible to P? And if this is possible in a mindless evolutionary process, then would it not be even more likely during an intelligence explosion?

If that's right, then something's being much more intelligent would (as I think someone else mentioned) just lead to instrumental goals that are very hard to understand. It would have no direct relationship with terminal goals.

I don't understand this point. I would expect the terminal goals to evolve as the evolving intelligence understands more and more about the world. For example, for many people here the original terminal goal was, ostensibly, "serve God". Then they stopped believing and now their terminal goal is more like "do good". Similarly, I would expect an evolving AGI to adjust its terminal goals as the ones it had before are obsoleted, not because they have been reached, but because they have become meaningless.

Comment author: hairyfigment 20 April 2014 07:05:33PM -1 points [-]

My hangup is that it seems like a truly benevolent AI would share our goals. And in a sense your argument "only" applies to instrumental goals, or to those developed through self-modification. (Amoebas don't design fish.) I'll grant it might take a conversation forever to reach the level we'd understand.

Comment author: shminux 20 April 2014 08:56:51PM 0 points [-]

My hangup is that it seems like a truly benevolent AI would share our goals.

In the way that a "truly benevolent" human would leave an unpolluted lake for fish to live in, instead of using it for its own purposes. The fish might think that humans share its goals, but the human goals would be infinitely more complex than fish could understand.

Comment author: hairyfigment 20 April 2014 10:54:13PM -1 points [-]

...It sounds like you're hinting at the fact that humans are not benevolent towards fish. If we are, then we do share its goals when it comes to outcomes for the fish - we just have other goals, which do not conflict. (I'm assuming the fish actually has clear preferences.) And a well-designed AI should not even have additional goals. The lack of understanding "only" might come in with the means, or with our poor understanding of our own preferences.

Comment author: christopherj 27 April 2014 02:59:59PM 0 points [-]

Do we have some reason to expect [an AGI's] goals to be more complex than ours?

I find myself agreeing with you -- human goals are a complex mess, which we seldom understand ourselves. We don't come with clear inherent goals, and what goals we do have we abuse by using things like sugar and condoms instead of eating healthy and reproducing like we were "supposed" to. People have been asking about the meaning of life for thousands of years, and we still have no answer.

An AI on the other hand, could have very simple goals -- make paperclips, for example. An AI's goals might be completely specified in two words. It's the AI's sub-goals and plans to reach its goals that I doubt I could comprehend. It's the very single-mindedness of an AI's goals and our inability to comprehend our own goals, plus the prospect of an AI being both smarter and better at goal-hacking than us, that has many of us fearing that we will accidentally kill ourselves via non-friendly AI. Not everyone will think to clarify "make paperclips" with, "don't exterminate humanity", "don't enslave humanity", "don't destroy the environment", "don't reprogram humans to desire only to make paperclips", and various other disclaimers that wouldn't be necessary if you were addressing a human (and we don't know the full disclaimer list either).

Comment author: TheOtherDave 20 April 2014 05:20:37PM 1 point [-]

Can you say more about what you're expecting a successful explanation to comprise, here?

E.g., suppose an AGI attempts to explain its ethics and goals to me, and at the end of that process it generates thousand-word descriptions of N future worlds and asks me to rank them in order of its preferences as I understand them. I expect to be significantly better at predicting the AGI's rankings than I was before the explanation.

I don't expect to be able to do anything equivalent with a chimp.

Do our expectations differ here?

Comment author: shminux 20 April 2014 06:03:33PM *  1 point [-]

E.g., suppose an AGI attempts to explain its ethics and goals to me

"Suppose an AGI attempts to explain its <untranslatable1> and <untranslatable2> to me" is what I expect it to sound like to humans if we were to replace human abstractions with those an advanced AGI would use. It would not even call these abstractions "ethics" or "goals", any more than we call ethics "groom" and goals "sex" when talking to a chimp.

suppose an AGI attempts to explain its ethics and goals to me, and at the end of that process it generates thousand-word descriptions of N future worlds and asks me to rank them in order of its preferences as I understand them.

I do not expect it to be able to generate such descriptions at all, due to the limitations of the human mind and human language. So, yes, our expectations differ here. I do not think that human intelligence reached some magical threshold where everything can be explained to it, given enough effort, even though it was not possible with "less advanced" animals. For all I know, I am not even using the right terms. Maybe an AGI improvement on the term "explain" is incomprehensible to us. Like if we were to translate "explain" into chimp or cat it would come out as "show", or something.

Comment author: TheOtherDave 20 April 2014 10:44:12PM *  0 points [-]

(shrug) Translating the terms is rather beside my point here.

If the AGI is using these things to choose among possible future worlds, then I expect it to be able to teach me to choose among possible future worlds more like it does than I would without that explanation.

I'm happy to call those things goals, ethics, morality, etc., even if those words don't capture what the AGI means by them. (I don't know that they really capture what I mean by them either, come to that.) Perhaps I would do better to call them "groom" or "fleem" or "untranslatable1" or refer to them by means of a specific shade of orange. I don't know; but as I say, I don't really care; terminology is largely independent of explanation.

But, sure, if you expect that it's incapable of doing that, then our expectations differ.

I'll note that my expectations don't depend on my having reached a magical threshold, or on everything being explainable to me given enough effort.

Comment author: Armok_GoB 21 April 2014 12:48:47PM *  0 points [-]

It might not be possible to "truly comprehend" the AI's advanced meta-meta-ethics, or whatever compact algorithm replaces the goal-subgoal tree, but the AI most certainly can provide a code of behavior and prove that following it is a really good idea, much like humans might train pets to perform a variety of useful tasks whose true purpose they can't comprehend. And it doesn't seem unreasonable that this code of behavior would have the look and feel of an in-depth philosophy of ethics, with some very deep and general compression/procedural mechanisms that would seem to humans very much like what you'd expect from a true and meaningful set of metaethics, even if it did not correspond much to what's going on inside the AI. It also probably wouldn't accidentally trigger hypocrisy-revulsion in the humans, although the AI seeming to also follow it is just one of many solutions to that, and probably not a very likely one.

Friendliness is pretty much an entirely tangential issue, and explaining it to equivalent depth would require the solution to several open questions, unless I'm forgetting something right now. (I probably am.)

There, question dissolved.

Edit: I ended up commenting in a bunch of places in this comment tree, so I feel the need to clarify: I consider both sides here to be making errors. I ended up seeming to favor the shminux side because that's where I was able to make interesting contributions, and because it made some true tangential claims that were argued against and not defended well. I do not agree with the implications for friendliness, however; you don't need to understand something to be able to construct true statements about it, or even to powerfully direct its expression toward properties you can reference but don't understand, especially if you have access to external tools.

Comment author: TheAncientGeek 20 April 2014 08:55:12AM *  0 points [-]

Is the problem supposed to be that the human doesn't have enough intelligence, or that we have some kind of highly parochial rationality?

Comment author: shminux 20 April 2014 05:34:15PM 0 points [-]

Not enough intelligence, yes. And rationality is a part of intelligence. Also, see my reply to hen.

Comment author: TheAncientGeek 21 April 2014 10:57:47AM *  -1 points [-]

But that's not really analogous to the human-chimp gap, which is qualitative... chimps don't have language.