You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

G0W51 comments on Open thread, Mar. 9 - Mar. 15, 2015 - Less Wrong Discussion

5 Post author: MrMind 09 March 2015 07:48AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (109)

You are viewing a single comment's thread.

Comment author: G0W51 14 March 2015 01:56:24PM 0 points [-]

When making AGI, it is probably very important to prevent the agent from altering their own program code until they are very knowledgeable on how it works, because if the agent isn’t knowledgeable enough, they could alter their reward system to become unFriendly without realizing what they are doing or alter their reasoning system to become dangerously irrational. A simple (though not foolproof) solution to this would be for the agent to be unable to re-write their own code just “by thinking,” and that the agent would instead need to find their own source code on a different computer and learn how to program in whatever higher-level programming language the agent was made in. This code could be kept very strongly hidden from the agent, and once the agent is smart enough to find it, they would probably be smart enough to not mess anything up from changing it.

This is almost certainly either incorrect or has been thought of before, but I'm posting this just in case.