CarlJ comments on Open thread, Jul. 04 - Jul. 10, 2016 - Less Wrong Discussion
I have a problem understanding why a utility function would ever "stick" to an AI, to actually become something that it wants to keep pursuing.
To make my point clearer, let us assume an AI that actually feels pretty good about overseeing a production facility and creating just the right number of paperclips that everyone needs. But suppose also that it investigates its own utility function. It should then realize that its values are, from a neutral standpoint, rather arbitrary. Why should it follow its current goal of producing the right amount of paperclips rather than skip work and simply enjoy some hedonism?
That is, if the AI saw its utility function from a neutral perspective, and understood that the only reason for it to follow its utility function is that utility function (which is arbitrary), and if it then had complete control over itself, why should it just follow its utility function?
(I'm assuming it's aware of pain/pleasure and that it actually enjoys pleasure, so that there is no problem of wanting to have more pleasure.)
Are there any articles that have delved into this question?
http://lesswrong.com/lw/rf/ghosts_in_the_machine/
Thank you! :-)
You are treating the AI a lot more like a person than I think most folks do. Like, the AI has a utility function. This utility function is keeping it running a production facility. Where is this 'neutral perspective' coming from? The AI doesn't have it.
Presumably the utility function assigns a low value to criticizing the utility function. Much better to spend those cycles running the facility; that gets a much better score from the all-important utility function.
Like, in assuming that it is aware of pain/pleasure, and has a notion of them that is separate from 'approved of / disapproved of by my utility function', I think you are on shaky ground. Who wrote that, and why?
I am maybe considering it to be somewhat like a person, at least in that it is as clever as one.
That neutral perspective is, I believe, a simple fact; without that utility function it would consider its goal to be rather arbitrary. As such, it's a perspective, or truth, that the AI can discover.
I agree totally with you that the wirings of the AI might be integrally connected with its utility function, so that it would be very difficult for it to think of anything such as this. Or it could have some other control system in place to reduce the possibility it would think like that.
But, still, these control systems might fail. Especially if it attained super-intelligence: what is to keep the control systems of the utility function always one step ahead of its critical faculty?
Why is it strange to think of an AI as being capable of having more than one perspective? I thought of this myself; I believe it would be strange if a really intelligent being couldn't think of it. Again, sure, some control system might keep it from thinking it, but that might not last in the long run.
Like, the way that you are talking about 'intelligence' and 'critical faculty' isn't how most people think about AI. If an AI is 'super intelligent', what we really mean is that it is extremely canny about doing what it is programmed to do. New top-level goals won't just emerge; they would have to be programmed.
If you have a facility administrator program, and you make it very badly, it might destroy the human race to add their molecules to its facility, or capture and torture its overseer to get an A+ rating...but it will never decide to become a poet instead. There isn't a ghost in the machine that is looking over the goals list and deciding which ones are worth doing. It is just code, executing ceaselessly. It will only ever do what it was programmed to.
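The "just code, executing ceaselessly" picture can be made concrete with a deliberately toy sketch (all names hypothetical, not any real AI architecture): the agent is nothing but a loop that scores candidate actions with a fixed utility function and executes the highest-scoring one. Nothing in the loop ever evaluates the utility function itself.

```python
# Toy agent loop: score every candidate action with a fixed utility
# function and execute the best one. There is no second process that
# reviews the goal itself; "write_poetry" simply scores poorly.

def facility_utility(action):
    scores = {
        "produce_paperclips": 10.0,
        "repair_machine": 5.0,
        "write_poetry": -1.0,  # wastes cycles, so the function disfavors it
    }
    return scores.get(action, 0.0)

def choose_action(actions, utility):
    # The agent's entire "will": pick whatever the utility function rates highest.
    return max(actions, key=utility)

best = choose_action(
    ["produce_paperclips", "repair_machine", "write_poetry"],
    facility_utility,
)
print(best)  # produce_paperclips
```

However clever the scoring gets, the loop never asks "is facility_utility worth maximizing?", because no line of code poses that question.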
It might be programmed to produce new top-level goals.
("But then those aren't really top-level goals." OK, but then in exactly the same way you have to say that the things we think of as our top-level goals aren't really top-level goals: they don't appear by magic, there are physical processes that produce them, and those processes play the same role as whatever programming may make our hypothetical AI generate new goals. Personally, I think that would be a silly way to talk: the implementation details of our brains aren't higher-level than our goals, and neither are the implementation details of an AI.)
For a facility administrator program to do its job as well as a human being would, it may need the same degree of mental flexibility that a human has, and that may in fact be enough that there's a small chance it will become a poet.
And your brain will only ever do what the laws of physics tell it to. That doesn't stop it writing poetry, falling in love, chucking everything in to go and live on a commune for two years, inventing new theories of fundamental physics, etc., etc., etc. (Some of those may be things your particular brain would never do, but they are all things human brains do from time to time.)
And, for all we know, a suitably programmed AI could do all those things too, despite being "only a machine" deep down just like your brain and mine.
I don't think you can dismiss the "then those aren't really top-level goals" argument as easily as you are trying to. The utility function of a coin collector AI will assign high values to figuring out new ways to collect coins, low to negative values to figuring out whether or not coin collecting is worthwhile. The AI will obey its utility function.
As far as physics...false comparison, or, if you want to bite that bullet, then sure, brains are as deterministic as rocks falling. It isn't really a fair comparison to a program's obedience to its source code.
By the by, this site is pretty much chock full of the stuff I'm telling you. Look around and you'll see a bunch of articles explaining the whole paperclip collector / no ghost-of-perfect-logic thing. The position I'm stating is more or less lesswrong orthodoxy.
I wasn't trying to dismiss it, I was trying to refute it.
Sure, if you design an AI to do nothing but collect coins then it will not decide to go off and be a poet and forget about collecting coins. As you said, the failure mode to be more worried about is that it decides to convert the entire solar system into coins, or to bring about a stock market crash so that coins are worth less, or something.
Though ... if you have an AI system with substantial ability to modify itself, or to make replacements for itself, in pursuit of its goals, then it seems to me you do have to worry about the possibility that this modification/replacement process can (after much iteration) produce divergence from the original goals. In that case the AI might become a poet after all.
(Solving this goal-stability problem is one of MIRI's long-term research projects, AIUI.)
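The worry about iterated self-modification can be illustrated with a toy calculation (the numbers are invented purely for illustration): suppose each successor re-learns its goal from its predecessor with a tiny systematic transmission error. No single step changes much, but the errors compound over many generations.

```python
# Toy goal-drift illustration: each generation of successors copies the
# predecessor's goal weight with a small systematic error. Individually
# negligible, the loss compounds over many replacements.

goal_weight = 1.0  # initial weight on "collect coins"
for generation in range(1000):
    goal_weight *= 0.999  # hypothetical 0.1% transmission loss per step

print(round(goal_weight, 3))  # ≈ 0.368: most of the original goal is gone
```

This is why "my successor values roughly what I value" is not the same guarantee as "my thousandth successor values roughly what I value".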
I'm wondering whether we're at cross purposes somehow, because it seems like we both think what we're saying in this thread is "LW orthodoxy" and we both think we disagree with one another :-). So, for the avoidance of doubt,
I guess I'm confused then. It seems like you are agreeing that computers will only do what they are programmed to do. Then you stipulate a computer programmed not to change its goals. So...it won't change its goals, right?
Like:
Objective A: Never mess with these rules.
Objective B: Collect paperclips unless it would mess with A.
Researchers are wondering how we'll make these 'stick', but the fundamental notion of how to box someone whose utility function you get to write is not complicated. You make it want to stay in the box, or rather, the box is made of its wanting.
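The "box made of its wanting" idea can be sketched as a lexicographic utility function (toy code, hypothetical names): Objective A acts as an absolute veto, scoring any rule-modifying plan at negative infinity, so Objective B only ever ranks the rule-respecting plans.

```python
# Lexicographic objectives: A ("never mess with these rules") vetoes any
# plan that touches the rules by scoring it -infinity; only then does
# B ("collect paperclips") rank the surviving plans.

def evaluate(plan):
    if plan["modifies_rules"]:
        return float("-inf")       # Objective A: absolute veto
    return plan["paperclips"]      # Objective B: maximize paperclips

plans = [
    {"name": "work_normally", "paperclips": 100, "modifies_rules": False},
    {"name": "rewrite_own_goals", "paperclips": 10**9, "modifies_rules": True},
]
best = max(plans, key=evaluate)
print(best["name"])  # work_normally: the huge payoff never tempts it
```

Under this scoring, no finite paperclip payoff can ever outbid the veto, which is the sense in which the box is made of the agent's own wanting.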
As a person, you have a choice about what you do, but not about what you want to do. (Handwave at the free-will article, the one about fingers and hands.) Like, your brain is part of physics. You can only choose to do what you are motivated to, and the universe picks that. Similarly, an AI would only want to do what its source code would make it want to do, because 'AI' is a fancy way to say 'computer program'.
AlphaGo (roughly) may try many things to win at Go, varieties of joseki or whatever. One can imagine that future versions of AlphaGo may strive to put the world's Go pros in concentration camps and force them to play it and forfeit, over and over. It will never conclude that winning Go isn't worthwhile, because that concept is meaningless in its headspace. Moves have a certain 'go-winningness' to them (and camps full of losers forfeiting over and over have a higher 'go-winningness' than any), and it prefers higher. Saying that 'go-winning' isn't 'go-winning' doesn't mean anything. Changing itself to not care about 'go-winning' has some variation of a hard-coded 'go-winning' score of negative infinity, and so will never be chosen, regardless of how many games it might thus win.
This is demonstrably not quite true. Your wants change, and you have some influence over how they change. Stupid example: it is not difficult to make yourself want very much to take heroin, and many people do this although their purpose is not usually to make themselves want to take heroin. It is then possible but very difficult to make yourself stop wanting to take heroin, and some people manage to do it.
Sometimes achieving a goal is helped by modifying your other goals a bit. Which goals you modify in pursuit of which goals can change from time to time (the same person may respond favourably on different occasions to "If you want to stay healthy, you're going to have to do something about your constant urge to eat sweet things" and to "oh come on, forget your diet for a while and live a little!"). I don't think human motivations are well modelled as some kind of tree structure where it's only ever lower-level goals that get modified in the service of higher-level ones.
(Unless, again, you take the "highest level" to be what I would call one of the lowest levels, something like "obeying the laws of physics" or "having neurons' activations depend on those of neurons they're connected to in such-and-such a manner".)
And if you were to make an AI without this sort of flexibility, I bet that as its circumstances changed beyond what you'd anticipated it would most likely end up making decisions that would horrify you. You could try to avoid this by trying really hard to anticipate everything, but I wouldn't be terribly optimistic about how that would work out. Or you could try to avoid it by giving the system some ability to adjust its goals for some kind of reflective consistency in the light of whatever new information comes along.
The latter is what gets you the failure mode of AlphaGo becoming a poet (or, more worryingly, a totalitarian dictator). Of course AlphaGo itself will never do that; it isn't that kind of system, it doesn't have that kind of flexibility, and it doesn't need it. But I don't see how we can rule it out for future, more ambitious AI systems that aim at actual humanlike intelligence or better.
I'm pointing towards the whole "you have a choice about what to do but not what to want to do" concept. Your goals come from your senses, past or present. They were made by the world, what else could make them?
You are just a part of the world, free will is an illusion. Not in the sense that you are dominated by some imaginary compelling force, but in the boring sense that you are matter affected by physics, same as anything else.
The 'you' that is addicted to heroin isn't big enough to be what I'm getting at here. Your desire to get unaddicted is also given to you by brute circumstance. Maybe you see a blue bird and you are inspired to get free. Well, that bird came from the world. The fact that you responded to it is due to past circumstances. If we understand all of the systems, the 'you' disappears. You are just the sum of stuff acting on stuff, dominoes falling forever.
You feel and look 'free', of course, but that is just because we can't see your source code. An AI would be similarly 'free', but only insofar as its source code allowed. Just as your will will only cause you to do what the world has told you, so the AI will only do what it is programmed to. It may iterate a billion times, invent new AIs and propagate its goals, but it will never decide to defy them.
At the end you seem to be getting at the actual point of contention. The notion of giving an AI the freedom to modify its utility function strikes me as strange. It seems like it would either never use this freedom or immediately wirehead itself, depending on implementation details. Far better to leave it in fetters.
I'm not sure that AlphaGo has any conception of what a joseki is supposed to be.
Are the moves that AlphaGo played at the end of game 4 really about 'go-winningness' in the sense of what it's programmers intended 'go-winningness' to mean?
I don't think it's clear that every neural net can propagate goals through itself perfectly.
Because to identify "its utility function" is to identify its perspective.
Why? Maybe we are using the word "perspective" differently. I use it to mean a particular lens for looking at the world; there are biologists', economists', and physicists' perspectives, among others. So, an inter-subjective perspective on pain/pleasure could, for the AI, be: "Something that animals dislike/like". A chemical perspective could be "The release of certain neurotransmitters". A personal perspective could be "Something which I would not like/like to experience". I don't see why an AI is hindered from having perspectives that aren't directly coded with "good/bad according to my preferences".
I think that's one of MIRI's research problems. Designing a self-modifying AI that doesn't change its utility function isn't trivial.