Vaniver comments on The Design Space of Minds-In-General - Less Wrong

Post author: Eliezer_Yudkowsky 25 June 2008 06:37AM





Comment author: wnoise 16 January 2011 11:38:41PM 0 points

Really? Isn't editing one's goal directly contrary to one's goal? If an AI self-edits in such a way that its goal changes, it will predictably no longer be working towards that goal, and will thus not consider it a good idea to edit its goal.

Comment author: Vaniver 16 January 2011 11:51:25PM * 0 points

It depends on how it decides whether or not changes are a good thing. If it is trying out two utility functions (Ub for utility before and Ua for utility after), you need to be careful to ensure it doesn't reason "hey, Ua(x) > Ub(x), so I can make myself better off by switching to Ua!".
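A minimal sketch of that mistake, in Python. All the names here (U_before, U_after, the assumed outcome values) are illustrative inventions, not anything from the discussion; the point is just that comparing the raw scores of two different utility functions is meaningless, while a sane agent evaluates the consequences of switching using its current utility function:

```python
def U_before(x):
    """Current utility function: prefers x small."""
    return -x

def U_after(x):
    """Candidate replacement: prefers x large."""
    return x

world_state = 10

# The broken criterion: "Ua(x) > Ub(x), so switching makes me better off."
# This compares numbers output by two different functions, which are on
# different scales and measure different things.
naive_should_switch = U_after(world_state) > U_before(world_state)

# The sane criterion: judge the *consequences* of switching by the agent's
# current utility function. An agent running U_after would steer the world
# toward large x, which U_before scores poorly. (Outcome values assumed.)
outcome_if_switch = 100  # world an U_after-maximizer would produce
outcome_if_keep = 0      # world a U_before-maximizer would produce
sane_should_switch = U_before(outcome_if_switch) > U_before(outcome_if_keep)

print(naive_should_switch)  # True: the broken comparison endorses the switch
print(sane_should_switch)   # False: judged by current goals, switching is bad
```

The design choice being illustrated is that self-modifications must be scored by the agent's present goals, not by the goals it would have afterward; otherwise any utility function that happens to emit bigger numbers looks like an "improvement".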

Ensuring that doesn't happen is not simple, because it requires stability throughout the entire system. There can't be a subsystem that decides to try being goalless, or to go about pursuing the goal in a different way (which is troublesome if you want it to cleverly use instrumental goals).

[edit] To be clearer: not only do the goals need to be fixed and well-understood, but every other part of the system also needs to have a fixed and well-understood relationship to the goals (and a fixed and well-understood sense of understanding, and so on). Most attempts to rewrite source code are not that well-planned.