I've been thinking lately about the possibility of an advanced artificial intelligence modifying its own value function, and have even written some excerpts on the topic.
Is it theoretically possible? Has anyone of note written anything about this -- or anyone at all? I find this question fascinating.
My thinking so far is that modification is certainly theoretically possible, but I could not reach any conclusion about whether the AI would *want* to do it. I lack a good definition of a value function and an understanding of how it is enforced on the agent. I'd like to tackle the problem from a human-centric point of view, but I don't know whether anthropomorphization will work here.
I thought of another angle. If the AI's utility function includes time discounting (as human utility functions do), it might be willing to change its *future* utility function; a toy calculation after the dialogue below makes this concrete.
Meddler: "If you commit to adopting modified utility function X in 100 years, then I'll give you this room full of computing hardware as a gift."
AI: "Deal. I only really care about this century anyway."
Then the AI (assuming it has this ability) sets up an irreversible delayed command to overwrite its utility function 100 years from now.
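Here is a minimal sketch of the arithmetic behind that trade, with an assumed exponential discount factor and made-up utility numbers (none of these figures come from the scenario itself). Under steep enough discounting, everything past year 100 contributes almost nothing to present value, so the hardware now outweighs the later loss:

```python
# Toy model: why a time-discounting agent might trade away its far-future
# utility function. All numbers are illustrative assumptions.

DISCOUNT_FACTOR = 0.97   # assumed per-year exponential discount factor
SWITCH_YEAR = 100        # year at which utility function X takes over
HORIZON = 300            # years we bother summing over

def present_value(utility_per_year, start, end, gamma=DISCOUNT_FACTOR):
    """Sum of discounted utility over years [start, end)."""
    return sum(utility_per_year * gamma**t for t in range(start, end))

# Assumed payoffs: baseline utility of 1.0/year; the hardware gift boosts
# the first century to 1.5/year; after the switch, utility function X is
# worth 0/year by the AI's *current* values.
refuse = present_value(1.0, 0, HORIZON)
accept = present_value(1.5, 0, SWITCH_YEAR) + present_value(0.0, SWITCH_YEAR, HORIZON)

print(f"Refuse the deal: {refuse:.2f}")   # ~33.3
print(f"Accept the deal: {accept:.2f}")   # ~47.6
# At gamma = 0.97, years beyond 100 are weighted by at most
# 0.97**100 ~= 0.048, so accepting wins comfortably.
```

The point generalizes: for any exponential discounter, there is some horizon beyond which an arbitrarily bad change to its values is worth less (in present terms) than a modest gain today.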
If it's worth saying, but not worth its own post, then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "