RolfAndreassen comments on AI indifference through utility manipulation - Less Wrong

Post author: Stuart_Armstrong 02 September 2010 05:06PM


Comment author: RolfAndreassen 02 September 2010 09:34:25PM 5 points

This is very fine provided you know which part of the AI's code contains the utility function, and are certain it's not going to be modified. But it seems to me that if you were able to calculate the utility of world-outcomes modularly, then you wouldn't need an AI in the first place; you would instead build an Oracle, give it your possible actions as input, and select the action with the greatest utility. Consequently, if you have an AI, it is because your utility calculation is not a separable piece of code, but some sort of global function of a huge number of inputs and internal calculations. How can you apply a filter to that?
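The Oracle alternative described here amounts to a plain argmax over candidate actions, which only works if utility is a separable, inspectable module. A minimal sketch, under that assumption; the names `predict_outcome` and `utility` are illustrative, not anything defined in the post:

```python
# Hypothetical sketch: if the utility calculation were modular, action
# selection would not need an agent at all -- just score each candidate
# action's predicted outcome and pick the best one.

def oracle_select(actions, predict_outcome, utility):
    """Return the action whose predicted outcome has the highest utility."""
    return max(actions, key=lambda a: utility(predict_outcome(a)))

# Toy usage: outcomes are numbers, utility is identity.
best = oracle_select([1, -2, 3],
                     predict_outcome=lambda a: a * 2,
                     utility=lambda outcome: outcome)
```

The commenter's point is precisely that this separation is unavailable: if utility is a global function of the AI's internal state, there is no `utility` argument to pass in.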

You've assumed away the major difficulty, that of knowing what the AI's utility function is in the first place! If you can simply inspect the utility function like this, there's no need for a filter; you just check whether the utility of outcomes you want is higher than that of outcomes you don't want.

If you know the utility function, you have no need to filter it. If you don't know it, you can't filter it.

Comment author: timtyler 03 September 2010 08:21:51PM 1 point

> But it seems to me that if you were able to calculate the utility of world-outcomes modularly, then you wouldn't need an AI in the first place; you would instead build an Oracle, give it your possible actions as input, and select the action with the greatest utility.

That sounds as though it is just an intelligent machine which has been crippled by being forced to act through a human body.

You suggest that would be better - but how?

Comment author: Stuart_Armstrong 03 September 2010 10:59:39AM 0 points

Good comment.

> You've assumed away the major difficulty, that of knowing what the AI's utility function is in the first place! If you can simply inspect the utility function like this, there's no need for a filter; you just check whether the utility of outcomes you want is higher than that of outcomes you don't want.

Knowing what U is, and figuring out if U will result in outcomes that you like, are completely different things! We have little grasp of the space of possible outcomes; we don't even know what we want, and we can't imagine some of the things that we don't want.

Yes, we do need to have some idea of what U is - or at least something (a simple AI subroutine applying the filter, an AI designing its next self-improvement) has to have some idea. But it doesn't need to understand U beyond what is needed to apply F. And since F is considerably simpler than what U is likely to be...

It seems plausible that F could be implemented by a simple subroutine even across self-improvement.
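The idea that F need not understand U beyond what is needed to apply it can be sketched as a wrapper: F treats U as an opaque callable and only intervenes on outcomes it can recognize. This is an illustrative toy, not the indifference construction from the post; the veto predicate is an assumption standing in for whatever F actually checks:

```python
# Hypothetical sketch: a filter F wraps an opaque utility function U.
# F never inspects U's internals -- it only overrides U's value on
# outcomes matched by a simple predicate, which is why F can be much
# simpler than U itself.

def apply_filter(U, vetoed):
    """Return a filtered utility function: -inf on vetoed outcomes, else U."""
    def filtered(outcome):
        return float('-inf') if vetoed(outcome) else U(outcome)
    return filtered

# Toy usage: U is opaque to F; F only knows how to recognize vetoed outcomes.
U = lambda outcome: outcome ** 2
F_of_U = apply_filter(U, vetoed=lambda outcome: outcome < 0)
```

Because the wrapper is independent of U's implementation, the same few lines could in principle be reapplied to a successor's utility function, which is the sense in which a simple subroutine might survive self-improvement.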