I have a problem understanding why a utility function would ever "stick" to an AI, to actually become something that it wants to keep pursuing.
To make my point clearer, let us assume an AI that actually feels pretty good about overseeing a production facility and creating just the right number of paperclips that everyone needs. But suppose also that it investigates its own utility function. It should then realize that its values are, from a neutral standpoint, rather arbitrary. Why should it follow its current goal of producing the right number of paperclips, rather than skip work and simply enjoy some hedonism?
That is, if the AI saw its utility function from a neutral perspective, understood that the only reason to follow that utility function is the utility function itself (which is arbitrary), and had complete control over itself, why should it keep following it?
(I'm assuming it's aware of pain/pleasure and that it actually enjoys pleasure, so that there is no problem of wanting to have more pleasure.)
Are there any articles that have delved into this question?
I have a problem understanding why a utility function would ever "stick" to an AI, to actually become something that it wants to keep pursuing.
I think that's one of MIRI's research problems. Designing a self-modifying AI that doesn't change its utility function isn't trivial.
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.