WalterL comments on Open thread, Sep. 26 - Oct. 02, 2016 - Less Wrong

2 Post author: MrMind 26 September 2016 07:41AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (90)

You are viewing a single comment's thread. Show more comments above.

Comment author: Ozyrus 26 September 2016 11:25:21PM *  1 point [-]

I've been meditating lately on a possibility of an advanced artificial intelligence modifying its value function, even writing some excrepts about this topic.

Is it theoretically possible? Has anyone of note written anything about this -- or anyone at all? This question is so, so interesting for me.

My thoughts led me to believe that it is theoretically possible to modify it for sure, but I could not come to any conclusion about whether it would want to do it. I seriously lack a good definition of value function and understanding about how it is enforced on the agent. I really want to tackle this problem from human-centric point, but i don't really know if anthropomorphization will work here.

Comment author: WalterL 28 September 2016 01:07:16PM 1 point [-]

On the one hand, there is no magical field that tells a code file whether the modifications coming into it are from me (human programmer) or the AI whose values that code file is. So, of course, if an AI can modify a text file, it can modify its source.

On the other hand, most likely the top goal on that value system is a fancy version of "I shall double never modify my value system", so it shouldn't do it.