Stuart_Armstrong comments on AI indifference through utility manipulation - Less Wrong

Post author: Stuart_Armstrong 02 September 2010 05:06PM

Comment author: Stuart_Armstrong 03 September 2010 10:59:39AM 0 points

Good comment.

> You've assumed away the major difficulty, that of knowing what the AI's utility function is in the first place! If you can simply inspect the utility function like this, there's no need for a filter; you just check whether the utility of outcomes you want is higher than that of outcomes you don't want.

Knowing what U is, and figuring out if U will result in outcomes that you like, are completely different things! We have little grasp of the space of possible outcomes; we don't even know what we want, and we can't imagine some of the things that we don't want.

Yes, we do need to have some idea of what U is - or at least something (a simple AI subroutine applying the filter, an AI designing its next self-improvement) has to have some idea. But it doesn't need to understand U beyond what is needed to apply F. And since F is considerably simpler than what U is likely to be...

It seems plausible that F could be implemented by a simple subroutine even across self-improvement.
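To make the "simple subroutine" point concrete, here is a minimal Python sketch. It is only a toy stand-in for F, not the filter from the post: the names `apply_filter` and `opaque_u`, the dictionary representation of outcomes, and the `"detonated"` key are all assumptions made up for illustration. The point it tries to show is that the wrapper only ever calls U as a black box; it never needs to know how U scores outcomes, or whether those scores are ones we would like.

```python
from typing import Callable, Dict

# Hypothetical representation: an outcome is a plain dict of features.
Outcome = Dict[str, object]
Utility = Callable[[Outcome], float]


def apply_filter(u: Utility, event_key: str = "detonated") -> Utility:
    """Toy stand-in for F: return a utility that ignores `event_key`.

    The wrapper only calls `u`; it never inspects how `u` is computed.
    """
    def filtered(outcome: Outcome) -> float:
        # Score the counterfactual outcome in which the event did not occur.
        counterfactual = dict(outcome, **{event_key: False})
        return u(counterfactual)

    return filtered


def opaque_u(outcome: Outcome) -> float:
    """Stand-in for an arbitrarily complicated U (assumed, for illustration)."""
    score = 10.0 * float(outcome.get("paperclips", 0))
    if outcome.get("detonated"):
        score -= 1000.0
    return score


f_u = apply_filter(opaque_u)
a = {"paperclips": 3, "detonated": True}
b = {"paperclips": 3, "detonated": False}

# The filtered utility assigns the same value whether or not the event occurs.
print(f_u(a) == f_u(b))
```

Swapping `opaque_u` for any other callable leaves `apply_filter` unchanged, which is the sense in which applying F requires far less understanding of U than predicting what U will lead to.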