Snowyowl comments on AI indifference through utility manipulation - Less Wrong

Post author: Stuart_Armstrong 02 September 2010 05:06PM


Comment author: magfrump 02 September 2010 07:02:08PM 1 point

When you started to get into the utility function notation, you said

If U(E)=U(A), then this would be the case.

I can't imagine how the utility of being exploded would equal the utility of being in control. Was this supposed to be sarcastic?

After that I was just lost by the notation--any chance you could expand the explanation?

Comment author: Snowyowl 02 September 2010 09:40:00PM 1 point

U(E)=U(A) is what we desire; it is exactly what the filter is designed to achieve. The filter forces the AI to act as though the explosives will never detonate, by assigning a successful detonation the same utility as a failed one. The idea is to ensure that the AI ignores the possibility of being blown up, so that it does not waste resources on disarming the explosives - and can then actually be blown up. Difficult, but very useful if it works.
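A minimal sketch of what the filter does, as hypothetical Python (the World fields, raw_utility, and the paperclip stand-in are my own illustration, not anything from the post):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class World:
    detonated: bool    # did the explosives go off? (event E vs. event A)
    paperclips: int    # stand-in for everything else the AI values

def raw_utility(w: World) -> float:
    # An unfiltered AI strongly disprefers being blown up, so it would
    # spend resources disarming the explosives.
    return w.paperclips - (1000.0 if w.detonated else 0.0)

def filtered_utility(w: World) -> float:
    # The utility filter: score every successful-detonation world as its
    # failed-detonation twin, forcing U(E) = U(A). The AI then has no
    # incentive to divert resources toward disarming the explosives.
    return raw_utility(replace(w, detonated=False))

# The filtered AI is indifferent to whether the detonation succeeds:
assert filtered_utility(World(True, 10)) == filtered_utility(World(False, 10))
```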

The rest of the post (once you wade through the notation) deals with the situation where each outcome can be realised in several different ways, and with the mathematics of the utility filter in that case.
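For that multi-realisation case, the condition the filter has to enforce can be written as a match between conditional expectations (my paraphrase of the idea, not notation taken from the post): writing E for the set of worlds in which the detonation succeeds, A for those in which it fails, and U' for the filtered utility,

```latex
\[
  \mathbb{E}\left[\, U'(w) \mid w \in E \,\right]
  \;=\;
  \mathbb{E}\left[\, U(w) \mid w \in A \,\right]
\]
```

so that shifting probability mass between detonation and non-detonation worlds leaves the AI's expected utility unchanged, and its choices cannot be driven by the difference.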

Comment author: Stuart_Armstrong 03 September 2010 11:00:06AM 0 points

Exactly.