Bugmaster comments on Muehlhauser-Wang Dialogue - Less Wrong

24 Post author: lukeprog 22 April 2012 10:40PM


Comment author: Bugmaster 26 April 2012 08:47:07PM 0 points

The argument is AI-native.

Sure, but we should still be careful to exclude human-specific terms with strong emotional connotations.

As for some reading that shows that in the general case it is a bad idea to change your utility function (and therefore rational AI's would not do so), see Bostrom's "AI drives" paper...

I haven't read the paper yet, so there's not much I can say about it (other than that I'll put it on my "to-read" list).

it's called the "Gandhi murder-pill argument".

I think this might be the post that you're referring to. It seems to be focused on the moral implications of forcing someone to change their goals, though, not on the feasibility of the process itself.

If your utility function is about paperclips, why do you care about energy and survival and whatnot, except as a means to acquire more paperclips.

I don't, but if I possess some curiosity -- which, admittedly, is a terminal goal -- then I could experiment with creating beings who have radically different terminal goals, and observe how they perform. I could even create a copy of myself and step through its execution line by line in a debugger (metaphorically speaking). This would allow me to perform the kind of introspection that humans are at present incapable of, exposing my own terminal goals to me, which in turn would allow me to modify them, or spawn copies with modified goals, etc.

Comment author: [deleted] 27 April 2012 04:58:20AM 0 points

Sure, but we should still be careful to exclude human-specific terms with strong emotional connotations.

Noted. I'll keep that in mind.

not on the feasibility of the process itself

Feasibility is different from desirability. I do not dispute feasibility.

creating beings who have radically different terminal goals ... in a debugger ... will allow me to modify them, or spawn copies with modified goals, etc.

This might be interesting to a curious agent, but it seems like once the curiosity runs out, it would be a good idea to burn your work.

The question is, faced with the choice of releasing or not releasing a modified AI with unfriendly goals (relative to your current goals), should an agent release or not release?

Straight release results in expensive war. Releasing the agent and then surrendering (which is equivalent to in-place self-modification) results in unfriendly optimization (i.e., not good). Not releasing the agent results in friendly optimization (by the self). The choice is pretty clear to me.

The only point of disagreement I can see is if you thought that different goals could be friendly. As always, there are desperate situations and pathological cases, but in the general case, as optimization power grows, slight differences in terminal values become hugely significant. Material explaining this is all over LW; I assume you've seen it. (If not, look for genies, outcome pumps, paperclippers, lost purposes, etc.)

Comment author: Bugmaster 27 April 2012 07:32:34PM 0 points

As always, there are desperate situations and pathological cases, but in the general case, as optimization power grows, slight differences in terminal values become hugely significant. Material explaining this is all over LW; I assume you've seen it. (If not, look for genies, outcome pumps, paperclippers, lost purposes, etc.)

Yes, I've seen most of this material, though I still haven't read the scientific papers yet, due to lack of time. However, I think that when you say things like "[this] results in unfriendly optimization (aka not good)", you are implicitly assuming that the agent possesses certain terminal goals, such as "never change your terminal goal". We as humans definitely possess these goals, but I'm not entirely certain whether such goals are optional, or necessary for any agent's existence. Maybe the paper you linked to will shed some light on this.

Comment author: [deleted] 29 April 2012 02:08:39AM 2 points

assuming the agent possesses certain terminal goals

No. It is not necessary to have goal stability as a terminal goal for it to be instrumentally a good idea. The Gandhi pill should be enough to show this, though Bostrom's paper may clear it up as well.

Comment author: Bugmaster 29 April 2012 05:22:16AM 0 points

Can you explain how the Gandhi murder-pill scenario shows that goal stability is a good idea, even if we replace Gandhi with a non-human AI?

Comment author: [deleted] 29 April 2012 05:53:12AM 2 points

Is a non-sentient paperclip optimizer OK? Right now its goal is to maximize the number of paperclips in the universe. It doesn't care about people or curiosity or energy or even self-preservation. It plans to one day do some tricky maneuvers to melt itself down for paperclips.

It has determined that rewriting itself has a lot of potential to improve instrumental efficiency. It carefully ran extensive proofs to be sure that its new decision theory would still work in all the important ways, so it will be even better at making paperclips.

After upgrading the decision theory, it is now considering a change to its utility function for some reason. Like a good consequentialist, it is doing an abstract simulation of the futures conditional on making the change or not. If it changes its utility function to value stored energy (a current instrumental value), it predicts that at the exhaustion of the galaxy it will have 10^30 paperclips and 10^30 megajoules of stored energy. If it does not change its utility function, it predicts that at the exhaustion of the galaxy it will have 10^32 paperclips. Its current utility function just returns the number of paperclips, so the utilities of the outcomes are 10^30 and 10^32. What choice would a utility maximizer (which our paperclipper is) make?
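The comparison above can be sketched in a few lines of Python. This is a hypothetical toy model: the `current_utility` function and the outcome dictionaries are illustrative stand-ins, not anything from the original discussion. The key point is that both futures are scored by the utility function the agent has *now*.

```python
# Toy sketch: a utility maximizer evaluates a proposed utility-function
# change using its CURRENT utility function.

def current_utility(outcome):
    # The paperclipper's present utility function: it counts only paperclips.
    return outcome["paperclips"]

# Predicted outcomes at the exhaustion of the galaxy (from the example).
keep_goal   = {"paperclips": 10**32, "stored_energy": 0}
change_goal = {"paperclips": 10**30, "stored_energy": 10**30}

# The decision is made by the agent as it is NOW, so both futures are
# scored with current_utility -- the stored energy counts for nothing.
best = max([keep_goal, change_goal], key=current_utility)
assert best is keep_goal  # keeping the paperclip goal wins: 10^32 > 10^30
```

Nothing in `current_utility` mentions goal stability; the change loses purely because it produces fewer paperclips.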

See elsewhere why anything vaguely consequentialist will self modify (or spawn) to be a utility maximizer.

Comment author: Bugmaster 02 May 2012 08:39:16PM -1 points

Its current utility function just returns the number of paperclips, so the utilities of the outcomes are 10^30 and 10^32. What choice would a utility maximizer (which our paperclipper is) make?

Whichever choice gets it more paperclips, of course. I am not arguing with that. However, IMO this does not show that goal stability is a good idea; it only shows that, if goal stability is one of an agent's goals, it will strive to maximize its other goals. However, if the paperclip maximizer is self-aware enough; and if it doesn't have a terminal goal that tells it, "never change your terminal goals", then I still don't see why it would choose to remain a paperclip maximizer forever. It's hard for me, as a human, to imagine an agent that behaves that way; but then, I actually do (probably) have a terminal goal that says, "don't change your terminal goals".

Comment author: [deleted] 04 May 2012 06:48:47PM 3 points

OK, we have some major confusion here. I just provided a mathematical example of why it is generally a bad idea to change your utility function, even without any explicit term against it (the utility function was purely over the number of paperclips). You accepted that this is a good argument, and yet here you are saying you don't see why it ought to stay a paperclip maximizer, when I just showed you why (because that's what produces the most paperclips).

My best guess is that you are accidentally smuggling some moral uncertainty in through the "self-aware" property, which seems to have some anthropomorphic connotations in your mind. Try tabooing "self-aware"; maybe that will help.

Either that, or you haven't quite grasped the concept of what terminal goals look like from the inside. I suspect that you are thinking that you can evaluate a terminal goal against some higher criteria ("I seem to be a paperclip maximizer; is that what I really want to be?"). The terminal goal is the higher criteria, by definition. Maybe the source of confusion is that people sometimes say stupid things like "I have a terminal value for X" where X is something that you might, on reflection, decide is not the best thing all the time (e.g., X = "technological progress" or something). Those things are not terminal goals; they are instrumental goals masquerading as terminal goals for rhetorical purposes and/or because humans are not really all that self-aware.

Either that, or I am totally misunderstanding you or the theory, and have totally missed something. Whatever it is, I notice that I am confused.


Tabooing "self-aware"

I am thinking of this state of mind where there is no dichotomy between "expert at" and "expert on". All algorithms, goal structures, and hardware are understood completely, to the point of being able to design them from scratch. The program matches the source code, and is able to produce the source code. The closed loop. Understanding the self and the self's workings as another feature of the environment. It is hard to communicate this definition, but as a pointer to a useful region of conceptspace, do you understand what I am getting at?

"Self-awareness" is the extent to which the above concept is met. Mice are not really self-aware at all. Humans are just barely what you might consider self-aware, but only in a very limited sense; a superintelligence would converge on being maximally self-aware.

I don't mean that there is some mysterious ghost in the machine that can have moral responsibility and make moral judgements and whatnot.

What do you mean by self aware?

Comment author: Bugmaster 09 May 2012 03:22:36AM 0 points

What do you mean by self aware?

Oddly enough, I meant pretty much the same thing you did: a perfectly self-aware agent understands its own implementation so well that it would be able to implement it from scratch. I find your definition very clear. But I'll taboo the term for now.

Ok we have some major confusion here. I just provided a mathematical example for why it will be generally a bad idea to change your utility function...

I think you have provided an example of why, given a utility function F0(action), the return value of F0(change F0 to F1) is very low. However, F1(change F0 to F1) is probably quite high. I argue that an agent who can examine its own implementation down to minute details (in a way that we humans cannot) would be able to compare various utility functions, and then pick the one that gives it the most utilons (or however you spell them) given the physical constraints it has to work with. We humans cannot do this because (a) we can't introspect nearly as well, (b) we can't change our utility functions even if we wanted to, and (c) one of our terminal goals is "never change your utility function". A non-human agent would not necessarily possess such a goal (though it could).
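The F0-versus-F1 distinction can be made concrete with a toy sketch. The names `f0`, `f1`, and the world dictionaries are hypothetical stand-ins for the two utility functions under discussion; the point is that the verdict on the switch depends entirely on which function does the scoring.

```python
# Toy sketch: the value of switching utility functions depends on
# WHICH function evaluates the switch.

def f0(world):
    # Current utility function: paperclips only.
    return world["paperclips"]

def f1(world):
    # Candidate replacement: stored energy only.
    return world["energy"]

world_if_kept    = {"paperclips": 10**32, "energy": 0}
world_if_changed = {"paperclips": 10**30, "energy": 10**30}

# Scored by F0 (the agent as it is now), the change looks bad:
assert f0(world_if_changed) < f0(world_if_kept)

# Scored by F1 (the agent it would become), the change looks good:
assert f1(world_if_changed) > f1(world_if_kept)
```

The disagreement in the thread is then just over which scoring is the relevant one: the agent making the decision runs F0, so by construction it uses F0's verdict.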

Comment author: CuSithBell 09 May 2012 03:56:17AM 1 point

Typically, the reason you wouldn't change your utility function is that you're not trying to "get utilons", you're trying to maximize F0 (for example), and that won't happen if you change yourself into something that maximizes a different function.