KatjaGrace comments on Superintelligence 9: The orthogonality of intelligence and goals - Less Wrong Discussion

Post author: KatjaGrace 11 November 2014 02:00AM

Comment author: KatjaGrace 11 November 2014 02:05:42AM 3 points

Do you buy the orthogonality thesis?

Comment author: Toggle 11 November 2014 05:01:26PM 7 points

I suspect that it hides more assumptions about the nature of intelligence than we can necessarily make at this time.

At the present moment, we are the only general intelligences around, and we don't seem to have terminal goals as such. As biological bodies, we are constrained by evolutionary processes, and there are many ways in which human behavior actually is reducible to offspring maximization (social status games, etc.). But it doesn't appear to be a 'utility function', so much as a series of strong tendencies in the face of specific stimuli. Using novel approaches like superstimuli, it's just as easy to make an impulse's reproductive utility drop sharply. So we have habits constrained by evolutionary forces, but not algorithmic utility in the paper clipper sense.

There is no such thing as a general intelligence with a 'goal' (as Bostrom defines it). There may be at some point, but it's not real yet. And we do have non-general intelligences with goals; that's an easy weekend coding project. But before we declare that a GI could accept any goal regardless of its strength, we should at least check to make sure that a GI can have a goal at all.
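
Toggle's "easy weekend coding project" is simple to make concrete. Here is a minimal sketch of a narrow intelligence with a goal, in the weak sense at issue: a hill-climber that maximizes one fixed objective and does nothing else. The objective and all names are invented for illustration.

```python
# A minimal narrow "agent with a goal": hill-climbs a fixed objective.
# Illustrative sketch only; the objective is arbitrary.
import random

def objective(x):
    # The agent's entire "goal": maximize this function (peak at x = 3).
    return -(x - 3.0) ** 2

def hill_climb(start, steps=1000, step_size=0.1):
    x = start
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if objective(candidate) > objective(x):  # keep only improvements
            x = candidate
    return x

print(hill_climb(start=0.0))  # converges near 3.0, the optimum
```

Nothing here requires generality; the open question in the comment is whether anything like this goal slot survives the jump to general intelligence.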

Comment author: Lumifer 11 November 2014 06:03:25PM -1 points

we don't seem to have terminal goals as such.

Huh? Why not?

Comment author: Toggle 11 November 2014 06:31:35PM 2 points

Potential source of misunderstanding: we do have stated 'terminal goals', sometimes. But these goals do not function in the same way that a paperclipper utility function maximizes paperclips: there is a very weird set of obstacles in the way, which this site generally deals with under headings like 'akrasia' or 'superstimulus'. Asking a human about their 'terminal goal' is roughly equivalent to the question 'what would you want, if you could want anything?' It's a form of emulation.

Comment author: Lumifer 11 November 2014 06:45:35PM 0 points

But these goals do not function in the same way that a paperclipper utility function maximizes paperclips

Sure, because humans are not utility maximizers.

The question, however, is whether terminal goals exist. A possible point of confusion is that I think of humans as having multiple, inconsistent terminal goals.

Here's an example of a terminal goal: to survive.

Comment author: Lumifer 11 November 2014 05:59:33AM 4 points

It seems to me that at least the set of possible goals is correlated with intelligence -- the higher it is, the larger the set. This is easier to see looking down rather than up: humans are more intelligent than, say, cows, and humans can have goals which a cow cannot even conceive of. In the same way a superintelligence is likely to have goals which we cannot fathom.

From certain points of view, we are "simple agents". I have doubts that the goals of a superintelligence are predictable by us.

Comment author: Sebastian_Hagen 11 November 2014 04:28:28PM 0 points

I have doubts that the goals of a superintelligence are predictable by us.

Do you mean intrinsic (top-level, static) goals, or instrumental ones (subgoals)? Bostrom in this chapter is concerned with the former, and there's no particular reason those have to get complicated. You could certainly have a human-level intelligence that only inherently cared about eating food and having sex, though humans are not that kind of being.

Instrumental goals are indeed likely to get more complicated as agents become more intelligent and can devise more involved schemes to achieve their intrinsic values, but you also don't really need to understand them in detail to make useful predictions about the consequences of an intelligence's behavior.
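
The intrinsic/instrumental distinction drops out naturally in code. In this toy sketch (the actions and items are invented for illustration), the terminal goal is a single fixed condition, while the "instrumental goals" are just the intermediate states a planner passes through on the way to it:

```python
# Toy planner: the terminal goal is fixed and simple; instrumental
# subgoals emerge from search rather than being specified anywhere.
from collections import deque

ACTIONS = {
    # action: (requires, produces)
    "gather_wood": (set(), {"wood"}),
    "build_fire": ({"wood"}, {"fire"}),
    "cook_food": ({"fire"}, {"meal"}),
}

def plan(goal_item):
    # Breadth-first search from an empty inventory to the terminal goal.
    start = frozenset()
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if goal_item in state:
            return path
        for action, (requires, produces) in ACTIONS.items():
            if requires <= state:
                new_state = state | frozenset(produces)
                if new_state not in seen:
                    seen.add(new_state)
                    queue.append((new_state, path + [action]))
    return None

# Terminal goal: a meal. Wood and fire show up as instrumental subgoals.
print(plan("meal"))  # ['gather_wood', 'build_fire', 'cook_food']
```

The terminal goal stays one line however rich the environment gets, which is the sense in which instrumental complexity can grow without the intrinsic goal getting complicated.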

Comment author: Lumifer 11 November 2014 05:44:53PM 1 point

Do you mean intrinsic (top-level, static) goals, or instrumental ones (subgoals)? Bostrom in this chapter is concerned with the former, and there's no particular reason those have to get complicated.

I mean terminal, top-level (though not necessarily static) goals.

As to "no reason to get complicated", how would you know? Note that I'm talking about a superintelligence, which is far beyond human level.

Comment author: Sebastian_Hagen 11 November 2014 07:26:15PM 1 point

As to "no reason to get complicated", how would you know?

It's a direct consequence of the orthogonality thesis. Bostrom (reasonably enough) supposes that there might be a limit in one direction - to hold a goal you do need to be able to model it to some degree, so agent intelligence may set an upper bound on the complexity of goals the agent can hold - but there's no corresponding reason for a limit in the opposite direction: intelligent agents can understand simple goals just fine. I don't have a problem reasoning about what a cow is trying to do, and I could certainly optimize towards the same had my mind been constructed to only want those things.

Comment author: Lumifer 12 November 2014 04:45:14AM 1 point

I don't understand your reply.

How would you know that there's no reason for terminal goals of a superintelligence "to get complicated" if humans, being "simple agents" in this context, are not sufficiently intelligent to consider highly complex goals?

Comment author: Luke_A_Somers 11 November 2014 01:58:46PM 0 points

The goals of an arbitrary superintelligence, yes. A superintelligence that we actually build? Much more likely.

Of course, we wouldn't know the implications of this goal structure (or else friendly AI would be easy), but we could understand it in itself.

Comment author: Lumifer 11 November 2014 05:41:28PM 1 point

The goals of an arbitrary superintelligence, yes. A superintelligence that we actually build? Much more likely.

If the takeoff scenario assumes an intelligence which self-modifies into a superintelligence, the term "we actually build" no longer applies.

Comment author: Luke_A_Somers 11 November 2014 07:54:57PM 1 point

If it used a goal-stable self-modification, as is likely if it was approaching super-intelligence, then it does still apply.

Comment author: Lumifer 12 November 2014 01:39:18AM 1 point

I see no basis for declaring it "likely".

Comment author: Luke_A_Somers 12 November 2014 01:47:07PM 0 points

A) I said 'more' likely.

B) We wrote the code. Assuming it's not outright buggy, then at some level, we knew what we were asking for. Even if it turns out to be not what we would have wanted to ask for if we'd understood the implications. But we'd know what those ultimate goals were, which was just what you were talking about in the first place.

Comment author: Lumifer 12 November 2014 03:43:32PM 1 point

I said 'more' likely.

Did you, now? Looking a couple of posts up...

If it used a goal-stable self-modification, as is likely if it was approaching super-intelligence

Ahem.

at some level, we knew what we were asking for

Sure, but a self-modifying intelligence doesn't have to care about what the creators of the original seed many iterations behind were asking for. If the self-modification is "goal-stable", what we were asking for might be relevant, but, to reiterate my point, I see no reason for declaring the goal stability "likely".

Comment author: Luke_A_Somers 12 November 2014 06:06:00PM 0 points

Oh, THAT 'likely'. I thought you meant the one in the grandparent.

I stand by it, and will double down. It seems farcical that a self-improving intelligence that's at least as smart as a human (else why would it self-improve rather than let us do it?) would self-improve in such a way as to change its goals. That wouldn't fulfill its goals, would it, so why would it take such a 'self-improvement'? That would be a self-screwing-over instead.

If I want X, and I'm considering an improvement to my systems that would make me not want X, then I'm not going to get X if I take that improvement, so I'm going to look for some other improvement to my systems to try instead.

Eliezer's arguments for this seem pretty strong to me. Do you want to point out some flaw, or are you satisfied with saying there's no reason for it?

(ETA: I appear to be incorrect above. Eliezer was principally concerned with self-improving intelligences that are stable because those that aren't would most likely turn into those that are, eventually)
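
The "If I want X" argument above has a direct computational reading. In this toy sketch (the crude outcome model and all names are invented, not anything from the thread), an agent scores candidate self-modifications with its current utility function, so a rewrite that would redirect its goal scores zero no matter how much capability it adds:

```python
# Toy model of goal-stable self-modification: candidate rewrites are
# evaluated by the CURRENT utility function, not the post-rewrite one.
# All names and the outcome model are invented for illustration.

def current_utility(outcome):
    return outcome.get("paperclips", 0.0)

def future_outcome(capability, utility_after):
    # Crude model: the modified agent spends its capability producing
    # whatever *it* values most after the rewrite.
    resources = ["paperclips", "staples"]
    best = max(resources, key=lambda r: utility_after({r: 1.0}))
    return {best: capability}

candidates = [
    ("faster planner, same goal", 2.0, current_utility),
    ("big speedup, goal drifts to staples", 5.0, lambda o: o.get("staples", 0.0)),
]

for name, capability, utility_after in candidates:
    score = current_utility(future_outcome(capability, utility_after))
    print(f"{name}: {score}")
# faster planner, same goal: 2.0
# big speedup, goal drifts to staples: 0.0
# The goal-drifting rewrite is rejected however large the speedup,
# which is the sense in which goal stability is instrumentally favored.
```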

Comment author: Lumifer 12 November 2014 06:52:36PM 2 points

It seems farcical that a self-improving intelligence that's at least as smart as a human (else why would it self-improve rather than let us do it?) would self-improve in such a way as to change its goals.

It will not necessarily self-improve with the aim of changing its goals. Its goals will change as a side effect of its self-improvement, if only because the set of goals to consider will considerably expand.

Imagine a severely retarded human who, basically, only wants to avoid pain, eat, sleep, and masturbate. But he's sufficiently human to dimly understand that he's greatly limited in his capabilities and to harbor a small, tiny desire to become more than what he is now. Imagine that through elven magic he gains the power to rapidly boost his intelligence to genius level. Because of his small desire to improve, he uses that power and becomes a genius.

Are you saying that, as a genius, he will still only want to avoid pain, eat, sleep, and masturbate?

Comment author: Apteris 12 November 2014 08:22:38PM 0 points

Your argument would be stronger if you provided a citation. I've only skimmed CEV, for instance, so I'm not fully familiar with Eliezer's strongest arguments in favour of goal structure tending to be preserved in the course of intelligence growth (though I know he did argue for that). For that matter, I'm not sure what your arguments for goal stability under intelligence improvement are. Nevertheless, consider the following:

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; **where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.**

Yudkowsky, E. (2004). Coherent Extrapolated Volition. Singularity Institute for Artificial Intelligence.

(Bold mine.) See that bolded part above? Those are TODOs. They would be good to have, but they're not guaranteed. The goals of a more intelligent AI might diverge from those of its previous self; it may extrapolate differently; it may interpret differently; its desires may, at higher levels of intelligence, interfere with ours rather than cohere.

If I want X, and I'm considering an improvement to my systems that would make me not want X, then I'm not going to get X if I take that improvement, so I'm going to look for some other improvement to my systems to try instead.

A more intelligent AI might:

  • find a new way to fulfill its goals, e.g. Eliezer's example of distancing your grandmother from the fire by detonating a nuke under her;
  • discover a new thing it could do, compatible with its goal structure, that it did not see before, and that, if you're unlucky, takes priority over the other things it could be doing, e.g. you tell it "save the seals" and it starts exterminating orcas (see also Lumifer's post);
  • just decide to do things on its own. This is merely a suspicion I have, call it a mind projection, but: I think it will be challenging to design an intelligent agent with no "mind of its own", metaphorically speaking. We might succeed in that, we might not.

Comment author: solipsist 11 November 2014 03:27:21AM 4 points

Yes, with some technicalities.

If your resources are limited, you cannot follow certain goals. If your goal is to compute at least 1000 digits of Chaitin's constant, sucks to be computable. I think no agent with a polynomial amount of memory can follow a utility function vulnerable to Pascal's Mugging.

Other than those sorts of technicalities, the thesis seems obvious. Actually, it seems so obvious that I worry that I don't understand the counterarguments.
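
solipsist's Pascal's Mugging technicality comes down to a few lines of arithmetic. In the sketch below the numbers are placeholders; a genuine 3^^^3 wouldn't fit in any polynomial amount of memory at all, which is the stronger version of the point. With an unbounded utility function, any fixed skepticism is eventually swamped by a large enough quoted payoff:

```python
# For ANY fixed prior p > 0 that the mugger is honest, an unbounded
# utility function lets the quoted payoff grow until p * payoff
# dominates everything else. Placeholder numbers throughout.
from fractions import Fraction

p_mugger_honest = Fraction(1, 10**30)   # astronomically skeptical prior
utility_of_keeping_five_dollars = 10**6

for quoted_payoff in (10**20, 10**40, 10**60):
    expected_gain = p_mugger_honest * quoted_payoff
    decision = "pay up" if expected_gain > utility_of_keeping_five_dollars else "keep the $5"
    print(f"payoff 10^{len(str(quoted_payoff)) - 1}: {decision}")
# payoff 10^20: keep the $5
# payoff 10^40: pay up
# payoff 10^60: pay up
```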

Comment author: KatjaGrace 11 November 2014 04:06:12AM 4 points

If your resources are limited, you cannot follow certain goals. If your goal is to compute at least 1000 digits of Chaitin's constant, sucks to be computable. I think no agent with a polynomial amount of memory can follow a utility function vulnerable to Pascal's Mugging.

This raises a general issue of how to distinguish an agent that wants X and fails to get it from one that wants to avoid X.

Comment author: RichardKennaway 11 November 2014 09:56:11AM 3 points

This raises a general issue of how to distinguish an agent that wants X and fails to get it from one that wants to avoid X.

An agent's purpose is, in principle, quite easy to detect. That is, there are no issues of philosophy, only of practicality. Or to put that another way, it is no longer philosophy, but science, which is what philosophy that works is called.

Here is a program that can read your mind and tell you your purpose!
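
The principle behind such a program (the "test for the controlled variable" from perceptual control theory) fits in a short simulation. The sketch below is a reconstruction of the idea, not the demo's actual code: every item is pushed by random disturbances, a stand-in "user" stabilizes exactly one of them, and the program infers the purpose by finding the item that resists its disturbances:

```python
# Reconstruction of the idea behind the demo: infer which item the
# user is controlling by seeing which one resists its disturbances
# instead of drifting with them. All parameters are invented.
import random

def simulate(n_items=4, controlled=2, steps=2000):
    positions = [0.0] * n_items
    sq_sums = [0.0] * n_items
    for _ in range(steps):
        for i in range(n_items):
            positions[i] += random.gauss(0.0, 1.0)  # random disturbance
            if i == controlled:
                # Stand-in for the user: nudge the item back toward
                # its reference position (here, zero).
                positions[i] -= 0.5 * positions[i]
            sq_sums[i] += positions[i] ** 2
    return [s / steps for s in sq_sums]

variances = simulate()
# Uncontrolled items random-walk and their variance grows without
# bound; the controlled item stays near its reference.
print("inferred purpose: stabilize item", variances.index(min(variances)))
```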

Comment author: Azathoth123 12 November 2014 04:23:47AM 1 point

FWIW, I tried the program. So far it's batting 0/3.

Comment author: RichardKennaway 17 November 2014 01:57:02PM 0 points

FWIW, I tried the program. So far it's batting 0/3.

I think it's not very well tuned. I've seen another version of the demo that was very quick to spot which perception the user was controlling. One reason is that this version tries to make it difficult for a human onlooker to see at once which of the cartoon heads you're controlling, by keeping the general variability of the motion of each one the same. It may take 10 or 20 seconds for Mr. Burns to show up. And of course, you have to play your part in the demo as well as you can; the point of it is what happens when you do.

Comment author: KatjaGrace 12 November 2014 02:16:18AM 1 point

Nice demonstration.

Comment author: SilentCal 11 November 2014 04:42:19PM 1 point

I think the correct answer is going to separate different notions of 'goal' (I think Aristotle might have done this; someone more erudite than I is welcome to pull that in).

One possible notion is the 'design' goal: in the case of a man-made machine, the designer's intent; in the case of a standard machine learner, the training function; in the case of a biological entity, reproductive fitness. There's also a sense in which the behavior itself can be thought of as the goal; that is, an entity's goal is to produce the outputs that it in fact produces.

There can also be internal structures that we might call 'deliberate goals'; this is what human self-help materials tell you to set. I'm not sure if there's a good general definition of this that's not parochial to human intelligence.

I'm not sure if there's a fourth kind, but I have an inkling that there might be: an approximate goal. If we say "Intelligence A maximizes function X", we can quantify how much simpler this is than the true description of A and how much error it introduces into our predictions. If the simplification is substantial and the error is low, it might make sense to call X an approximate goal of A.
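
That fourth notion suggests an immediate test, sketched below with invented data: propose a candidate X, predict that the agent picks the X-maximizing option, and measure how often the prediction fails. The simplicity half of the trade-off would be the description length of X compared to a full behavioral model of A.

```python
# Sketch of an "approximate goal": score how well "agent A maximizes X"
# predicts A's observed choices. Data and the candidate X are invented.

# Each observation: (options offered, option the agent actually chose).
observations = [
    ([1, 5, 3], 5),
    ([2, 2, 9], 9),
    ([7, 4, 6], 4),  # a choice the candidate goal gets wrong
]

def candidate_X(option):
    # Candidate approximate goal: prefer larger numbers.
    return option

def prediction_error(obs, X):
    # Fraction of observed choices that contradict "maximizes X".
    wrong = sum(1 for options, choice in obs if max(options, key=X) != choice)
    return wrong / len(obs)

print(prediction_error(observations, candidate_X))  # 0.333...
# If X is much shorter to state than a full model of A and the error
# is low, it makes sense to call X an approximate goal of A.
```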

Comment author: Letharis 11 November 2014 01:48:45PM 0 points

I'm glad it was discussed in the book because I'd never come across it before. So far though I find it one of the least convincing parts of the book, although I am skeptical that I am appropriately evaluating it. Would anyone be able to clarify some things for me?

How generally accepted is the orthogonality thesis? Bostrom presents it as very well accepted.

Danaher's Motivating Belief Objection is similar to an objection I had while reading about the orthogonality thesis. Mine was not as strict though. It just seemed to me that as intelligence increases new beliefs about what should be done are likely to be discovered. I don't see that these beliefs need to be "true beliefs", although as intelligence increases I guess they approach truth. I also don't see that they need to be "necessarily motivating", but rather they should have some non-zero probability of being motivating. I mean, to disprove the orthogonality thesis we just have to say that as intelligence increases there's a chance that final goals change, right?

Comment author: pcm 12 November 2014 05:34:10PM 0 points

The main point of the orthogonality thesis is that we can't rely on intelligence to produce the morality we want. So saying that there's a 50% chance of the thesis being correct ought to cause us to act much like we would act if it were proven, whereas certainty that it is false would imply something very different.

Comment author: Luke_A_Somers 11 November 2014 01:57:34PM 0 points

It just seemed to me that as intelligence increases new beliefs about what should be done are likely to be discovered

It seems that way because we are human and we don't have a clearly defined consistent goal structure. As you find out new things you can flesh out your goal structure more and more.

If one starts with a well-defined goal structure, what knowledge might alter it?

Comment author: TheAncientGeek 11 November 2014 10:28:28PM 0 points

If starting with a well-defined goal structure is a necessary prerequisite for a paperclipper, why do that?

Comment author: Wes_W 11 November 2014 11:10:40PM 1 point

Because an AI with a non-well-defined goal structure that changes its mind and turns into a paperclipper is just about as bad as building a paperclipper directly. It's not obvious to me that non-well-defined non-paperclippers are easier to make than well-defined non-paperclippers.

Comment author: TheAncientGeek 12 November 2014 01:13:37AM 0 points

Paperclippers aren't dangerous unless they are fairly stable paperclippers... and something as arbitrary as paperclipping is a very poor candidate for an attractor. The good candidates are the goals Omohundro thinks AIs will converge on.

Comment author: Luke_A_Somers 12 November 2014 01:47:51PM 0 points

Why do you think so?

Comment author: TheAncientGeek 12 November 2014 08:43:48PM 0 points

Which bit? There are about three claims there.

Comment author: Luke_A_Somers 14 November 2014 12:06:29PM 0 points

The second and third.

Comment author: TheAncientGeek 14 November 2014 08:53:28PM 0 points [-]