scarcegreengrass comments on Open thread, Sep. 26 - Oct. 02, 2016 - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I've been meditating lately on the possibility of an advanced artificial intelligence modifying its own value function, and have even written some notes on the topic.
Is it theoretically possible? Has anyone of note written about this -- or anyone at all? This question is deeply interesting to me.
My thoughts led me to believe that such modification is certainly possible in theory, but I could not reach any conclusion about whether an agent would ever want to do it. I lack a good definition of a value function and an understanding of how it is enforced on the agent. I would like to tackle this problem from a human-centric point of view, but I don't know whether anthropomorphization will work here.
I thought of another idea. If the AI's utility function includes time discounting (as human utility functions do), it might agree to change its future utility function.
Meddler: "If you commit to adopting modified utility function X in 100 years, then I'll give you this room full of computing hardware as a gift."
AI: "Deal. I only really care about this century anyway."
Then the AI (assuming it has this ability) sets up an irreversible delayed command to overwrite its utility function 100 years from now.
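The arithmetic behind this deal can be sketched numerically. The discount factor, yearly utilities, and truncation horizon below are all illustrative assumptions, not anything stated in the thread; the point is only that under exponential discounting, losses 100 years out shrink to almost nothing in present value.

```python
# Sketch of why a time-discounting agent might accept the meddler's deal.
# gamma, the utility amounts, and the 1000-year horizon are hypothetical
# numbers chosen for illustration.

def discounted_value(utility_per_year, gamma, start_year, end_year):
    """Present value of a constant utility stream from start_year (inclusive)
    to end_year (exclusive), under exponential discounting by gamma per year."""
    return sum(utility_per_year * gamma**t for t in range(start_year, end_year))

gamma = 0.95           # assumed per-year discount factor
hardware_gift = 50.0   # immediate utility of the room full of hardware
loss_per_year = 1.0    # yearly utility forfeited under the modified function

# Utility lost from year 100 onward (truncated at year 1000 for the sketch).
future_loss = discounted_value(loss_per_year, gamma, 100, 1000)

# The deal is attractive if the immediate gain outweighs the
# heavily-discounted future loss.
accept = hardware_gift > future_loss
```

With these numbers, `future_loss` is on the order of 0.1 units of present utility, so the immediate gift dominates and the agent accepts, matching the "I only really care about this century anyway" reply.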