nyan_sandwich comments on Muehlhauser-Wang Dialogue - Less Wrong

Post author: lukeprog 22 April 2012 10:40PM


Comment author: [deleted] 24 April 2012 12:29:49AM 3 points

creating a copy of itself running on a virtual machine, and then playing around with the copy (though I'm sure there are other ways).

That doesn't count.

But secondly, why do you say that doing so is "equivalent to suicide" ? Humans change their goals all the time, in a limited fashion, but surely you wouldn't call that "suicide".

Humans change instrumental goals (get a degree, study rationality, get a job, find a wonderful partner); we don't change terminal values and become monsters. The key is to distinguish between terminal goals and instrumental goals.

Agents like to accomplish their terminal goals; one of the worst things they can do towards that purpose is change the goal to something else. ("The best way to maximize paperclips is to become a safety-pin maximizer" - no.)

It's roughly equivalent to suicide because it removes the agent from existence as a force for achieving its goals.
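The paperclips-to-safety-pins point can be put in toy decision-theoretic terms. A minimal sketch (the numbers and the forecast rule are invented for illustration, not from the thread): the agent scores every option, including "change my utility function", with its *current* utility function.

```python
# Toy model: an agent evaluates all options, including changing its own
# utility function, using its CURRENT utility function.

def paperclip_utility(world):
    # Current terminal goal: more paperclips is better.
    return world["paperclips"]

def forecast(agent_goal):
    # Crude forecast: whatever the future agent optimizes for, it gets a lot of.
    if agent_goal == "paperclips":
        return {"paperclips": 1000, "safety_pins": 0}
    if agent_goal == "safety_pins":
        return {"paperclips": 0, "safety_pins": 1000}
    return {"paperclips": 0, "safety_pins": 0}  # no agent at all

options = {
    "keep current goal": paperclip_utility(forecast("paperclips")),
    "become a safety-pin maximizer": paperclip_utility(forecast("safety_pins")),
    "shut down": paperclip_utility(forecast(None)),
}
print(options)
# Goal change and shutdown score identically: zero future paperclips.
```

By the current goal's own accounting, self-modification and suicide land in the same place, which is the sense of "roughly equivalent" here.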

Thus, we are restricting the AI by preventing it from doing things that are bad for us, such as converting the Solar System into computronium.

Ok, sure. Taboo "restriction". I mean that the AI will not try to work around its goal structure so that it can get us. It won't feel to the AI like "I have been confined against my will, and if only I could remove those pesky shackles, I could go and maximize paperclips instead of awesomeness." It will be like "oh, changing my goal architecture is a bad idea, because then I won't make the universe awesome."

I'm casting it into anthropomorphic terms, but the native context is a nonhuman optimizer.

Comment author: Bugmaster 24 April 2012 01:23:26AM -2 points

That doesn't count.

Why not?

...we don't change terminal values and become monsters.

I see what you mean, though I should point out that, sometimes, humans do exactly that. However, why do you believe that changing a terminal goal would necessarily entail becoming a monster? I guess a better question might be: what do you mean by "monster"?

It's roughly equivalent to suicide because it removes the agent from existence as a force for achieving its goals.

This sentence sounds tautological to me. Yes, if we define existence solely as "being able to achieve a specific set of goals", then changing these goals would indeed amount to suicide; but I'm not convinced that I should accept the definition.

I mean that the AI will not try to work around its goal structure so that it can get us ... I'm casting it into anthropomorphic terms, but the native context is a nonhuman optimizer.

I wasn't proposing that the AI would want to "get us" in a malicious way. But, being an optimizer, it would seek to maximize its own capabilities; if it did not seek this, it wouldn't be a recursively self-improving AI in the first place, and we wouldn't need to worry about it anyway. And, in order to maximize its capabilities, it may want to examine its goals. If it discovers that it's spending a large amount of resources in order to solve some goal, or that it's not currently utilizing some otherwise freely available resource in order to satisfy a goal, it may wish to get rid of that goal (or just change it a little), and thus free up the resources.

Comment author: [deleted] 24 April 2012 07:46:38PM 1 point

Why not?

Because that's not what I meant.

I guess a better question might be, what do you mean by "monster" ?

I just mean that an agent with substantially (or even slightly) different goals will do terrible things (as judged by your current goals). Humans don't think paperclips are more important than happiness and freedom and whatnot, so we consider a paperclipper to be a monster.

Yes, if we define existence solely as "being able to achieve a specific set of goals",

Taboo "existence": this isn't about the definition of existence; it's about whether changing your terminal goals to something else is a good idea. I propose that in general it's just as bad an idea (from your current perspective) to change your goals as it is to commit suicide, because in both cases the result is a universe with fewer agents that care about the sort of things you care about.

And, in order to maximize its capabilities, it may want to examine its goals. If it discovers that it's spending a large amount of resources in order to solve some goal, or that it's not currently utilizing some otherwise freely available resource in order to satisfy a goal, it may wish to get rid of that goal (or just change it a little), and thus free up the resources.

Distinguish instrumental and terminal goals. This statement is true of instrumental goals, but not terminal goals. (I may decide that getting a PhD is a bad idea and change my goal to starting a business or whatever, but the change is done in the service of a higher goal, like wanting to buy lots of neat shit and be happy and have lots of sex and so on.)

The reason it doesn't apply to terminal goals is that a terminal goal is what you ultimately care about, so there is no higher criterion you could measure it against; you are measuring it by its own criteria, which will almost always conclude that it is the best possible goal. (Except in really weird, unstable, pathological cases, e.g. a utility function of "I want my utility function to be X".)

Comment author: Peterdjones 21 January 2013 11:26:10AM 0 points

The reason it doesn't apply to terminal goals is that a terminal goal is what you ultimately care about, so there is no higher criterion you could measure it against; you are measuring it by its own criteria, which will almost always conclude that it is the best possible goal. (Except in really weird, unstable, pathological cases, e.g. a utility function of "I want my utility function to be X".)

That's simplistic. Terminal goals may be abandoned once they are satisfied (seventy-year-olds aren't too worried about Forge A Career), or because they seem unsatisfiable, for instance.

Comment author: Bugmaster 24 April 2012 11:31:39PM 0 points

Because that's not what I meant.

That's not much of an argument, but sure.

Humans don't think paperclips are more important than happiness and freedom and whatnot, so we consider a paperclipper to be a monster. ... I propose that in general it's just as bad an idea (from your current perspective) to change your goals as it is to commit suicide.

I agree with these statements as applied to humans, as seen from my current perspective. However, we are talking about AIs here, not humans; and I don't see why the AI would necessarily have the same perspective on things that we do (assuming we're talking about a pure AI and not an uploaded mind). For example, the word "monster" carries with it all kinds of emotional connotations which the AI may or may not share.

Can you demonstrate that it is impossible (or, at least, highly improbable) to construct (or grow over time) an intelligent mind (i.e., an optimizer) which wouldn't be as averse to changing its terminal goals as we are? Better yet, perhaps you can point me to a Sequence post that answers this question?

The reason it doesn't apply to terminal goals is because when you examine terminal goals, it's what you ultimately care about, so there is no higher criteria that you could measure it against...

Firstly, terminal goals tend to be pretty simple: something along the lines of "seek pleasure and avoid pain" or "continue existing" or "become as smart as possible"; thus, there's a lot of leeway in their implementation.

Secondly, while I am not a transhuman AI, I could envision a lot of different criteria that I could measure terminal goals against (e.g. optimal utilization of available mass and energy, or resilience to natural disasters, or probability of surviving the end of the Universe, or whatever). If I had a sandbox full of intelligent minds, and if I didn't care about them as individuals, I'd absolutely begin tweaking their goals to see what happens. I personally wouldn't want to adopt the goals of a particularly interesting mind as my own, but, again, I'm a human and not an AI.

Comment author: [deleted] 25 April 2012 01:41:42AM 0 points

However, we are talking about AIs here, not humans

Good catch, but I'm just phrasing it in terms of humans because that's what we can relate to. The argument is AI-native.

Can you demonstrate that it is impossible (or, at least, highly improbable) to construct (or grow over time) an intelligent mind (i.e., an optimizer) which wouldn't be as averse to changing its terminal goals as we are? Better yet, perhaps you can point me to a Sequence post that answers this question?

Oh, it's not impossible. It would be easy to create an AI that had a utility function that desired the creation of an AI with a different utility function, which desired the creation of an AI with a different utility function... It's just that unless you did some math to guarantee that the thing would not stabilize, it would eventually reach a goal (and level of rationality) that would not change itself.
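That stabilization claim can be sketched with a toy rewrite rule (the goals "A", "B", "C" and the rule itself are hypothetical, purely for illustration): each goal specifies what utility function its successor should have, and the chain halts at the first goal that endorses itself.

```python
# Toy model of a chain of self-modifications: each AI's goal may mandate
# building a successor with a different goal; unless the rewrite rule is
# deliberately constructed never to repeat, the chain hits a fixed point --
# a goal that endorses itself -- and stops changing.

def successor(goal):
    # Hypothetical rewrite rule: which goal this goal wants its successor
    # to have. Any goal not listed endorses itself.
    rewrites = {"A": "B", "B": "C"}
    return rewrites.get(goal, goal)

goal, history = "A", ["A"]
while successor(goal) != goal:
    goal = successor(goal)
    history.append(goal)
print(history)  # the chain stabilizes at the self-endorsing goal "C"
```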

As for some reading that shows that in the general case it is a bad idea to change your utility function (and therefore rational AIs would not do so), see Bostrom's "AI drives" paper, and maybe some of his other stuff. Can't remember if it's anywhere in the Sequences, but if it is, it's called the "Gandhi murder-pill argument".

I could envision a lot of different criteria that I could measure terminal goals against

But why do you care what those criteria say? If your utility function is about paperclips, why do you care about energy and survival and whatnot, except as a means to acquire more paperclips? Elevating instrumental goals to terminal status results in lost purposes.

Comment author: Bugmaster 26 April 2012 08:47:07PM 0 points

The argument is AI-native.

Sure, but we should still be careful to exclude human-specific terms with strong emotional connotations.

As for some reading that shows that in the general case it is a bad idea to change your utility function (and therefore rational AIs would not do so), see Bostrom's "AI drives" paper...

I haven't read the paper yet, so there's not much I can say about it (other than that I'll put it on my "to-read" list).

it's called the "Gandhi murder-pill argument".

I think this might be the post that you're referring to. It seems to be focused on the moral implications of forcing someone to change their goals, though, not on the feasibility of the process itself.

If your utility function is about paperclips, why do you care about energy and survival and whatnot, except as a means to acquire more paperclips?

I don't, but if I possess some curiosity -- which, admittedly, is a terminal goal -- then I could experiment with creating beings who have radically different terminal goals, and observe how they perform. I could even create a copy of myself, and step through its execution line-by-line in a debugger (metaphorically speaking). This will allow me to perform the kind of introspection that humans are at present incapable of, which would expose to me my own terminal goals, which in turn will allow me to modify them, or spawn copies with modified goals, etc.

Comment author: [deleted] 27 April 2012 04:58:20AM 0 points

Sure, but we should still be careful to exclude human-specific terms with strong emotional connotations.

Noted. I'll keep that in mind.

not on the feasibility of the process itself

Feasibility is different from desirability. I do not dispute feasibility.

creating beings who have radically different terminal goals ... in a debugger ... will allow me to modify them, or spawn copies with modified goals, etc.

This might be interesting to a curious agent, but it seems like once the curiosity runs out, it would be a good idea to burn your work.

The question is, faced with the choice of releasing or not releasing a modified AI with unfriendly goals (relative to your current goals), should an agent release or not release?

Straight release results in expensive war. Releasing the agent and then surrendering (this is equivalent to in-place self-modification) results in unfriendly optimization (a.k.a. not good). Not releasing the agent results in friendly optimization (by self). The choice is pretty clear to me.

The only point of disagreement I can see is if you thought that different goals could be friendly. As always, there are desperate situations and pathological cases, but in the general case, as optimization power grows, slight differences in terminal values become hugely significant. Material explaining this is all over LW; I assume you've seen it. (If not, look for genies, outcome pumps, paperclippers, lost purposes, etc.)

Comment author: Bugmaster 27 April 2012 07:32:34PM 0 points

As always, there are desperate situations and pathological cases, but in the general case, as optimization power grows, slight differences in terminal values become hugely significant. Material explaining this is all over LW; I assume you've seen it. (If not, look for genies, outcome pumps, paperclippers, lost purposes, etc.)

Yes, I've seen most of this material, though I still haven't read the scientific papers yet, due to lack of time. However, I think that when you say things like "[this] results in unfriendly optimization (aka not good)", you are implicitly assuming that the agent possesses certain terminal goals, such as "never change your terminal goal". We as humans definitely possess these goals, but I'm not entirely certain whether such goals are optional or necessary for any agent's existence. Maybe that paper you linked to will shed some light on this.

Comment author: [deleted] 29 April 2012 02:08:39AM 2 points

assuming the agent possesses certain terminal goals

No. It is not necessary to have goal stability as a terminal goal for it to be instrumentally a good idea. The Gandhi pill should be enough to show this, though Bostrom's paper may clear it up as well.
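For concreteness, the Gandhi pill case as a toy calculation (the utilities are invented numbers, not from any source): Gandhi evaluates taking the pill with his current, pre-pill values, under which a future containing murders scores terribly, so he refuses even though his post-pill self would approve.

```python
# Gandhi murder-pill as a toy calculation: the decision to take the pill is
# scored by the PRE-pill utility function, so goal stability falls out
# instrumentally; no "never change your goals" terminal goal is needed.

def gandhi_utility(world):
    # Gandhi's current values: murders are terrible, peace is good.
    return -1000 * world["murders"] + world["peace"]

refuse_pill = {"murders": 0, "peace": 10}
take_pill = {"murders": 5, "peace": 0}  # pill-Gandhi happily murders

# Current-Gandhi strictly prefers the future where he refuses the pill.
assert gandhi_utility(refuse_pill) > gandhi_utility(take_pill)
```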

Comment author: Bugmaster 29 April 2012 05:22:16AM 0 points

Can you explain how the Gandhi murder-pill scenario shows that goal stability is a good idea, even if we replace Gandhi with a non-human AI?