scarcegreengrass comments on Open thread, Sep. 26 - Oct. 02, 2016 - Less Wrong Discussion

2 Post author: MrMind 26 September 2016 07:41AM

Comment author: Ozyrus 26 September 2016 11:25:21PM 1 point

I've been meditating lately on the possibility of an advanced artificial intelligence modifying its own value function, and have even written some excerpts on the topic.

Is it theoretically possible? Has anyone of note written anything about this -- or anyone at all? This question is so, so interesting to me.

My thinking so far is that modifying it is certainly possible in theory, but I could not reach any conclusion about whether the AI would want to do it. I seriously lack a good definition of a value function and an understanding of how it is enforced on the agent. I really want to tackle this problem from a human-centric point of view, but I don't know whether anthropomorphizing will work here.

Comment author: scarcegreengrass 28 September 2016 07:12:01PM 1 point

I thought of another idea. If the AI's utility function includes time discounting (like human utility functions do), it might agree to change its future utility function.

Meddler: "If you commit to adopting modified utility function X in 100 years, then I'll give you this room full of computing hardware as a gift."

AI: "Deal. I only really care about this century anyway."

Then the AI (assuming it has this ability) sets up an irreversible delayed command to overwrite its utility function 100 years from now.
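To make the discounting argument concrete, here is a rough sketch of the AI's decision. Everything in it is an assumption for illustration: exponential discounting with a per-year factor `discount`, a flat yearly gain from the gifted hardware until the switch, and a flat yearly loss (as judged by the AI's current utility function) after the switch. The function names and the numbers are hypothetical, not taken from any real agent design.

```python
def discounted_value(utility_per_year, discount, start_year, end_year):
    """Sum utility_per_year * discount**t over years start_year..end_year-1."""
    return sum(utility_per_year * discount ** t
               for t in range(start_year, end_year))


def accepts_deal(discount, gain_per_year, loss_per_year,
                 switch_year=100, horizon=1000):
    """Return True if the AI, judging by its *current* utility function,
    prefers the Meddler's deal.

    Assumed model (hypothetical):
      - the gifted hardware adds gain_per_year until the switch,
      - after switch_year the AI pursues X, which costs loss_per_year
        when measured by the current utility function,
      - years beyond `horizon` are ignored.
    """
    gain = discounted_value(gain_per_year, discount, 0, switch_year)
    loss = discounted_value(loss_per_year, discount, switch_year, horizon)
    return gain > loss


# An AI that discounts 10% per year versus one that does not discount at all.
# The post-switch loss is assumed to be 100 times the yearly gain.
print(accepts_deal(discount=0.9, gain_per_year=1.0, loss_per_year=100.0))
print(accepts_deal(discount=1.0, gain_per_year=1.0, loss_per_year=100.0))
```

Under these assumptions the discounting AI takes the deal, because anything after year 100 is weighted by roughly 0.9^100, which is on the order of 10^-5. The non-discounting AI refuses, since it weights the post-switch centuries as heavily as the present one.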