Wei_Dai comments on AI indifference through utility manipulation - Less Wrong

4 Post author: Stuart_Armstrong 02 September 2010 05:06PM

Comments (53)

Comment author: Wei_Dai 16 October 2011 09:50:47PM 3 points

Putting aside the question of which way to implement "indifference" is better, the main reason I came up with this alternative was that I was having trouble understanding what your modified AI would do, until I realized that it would essentially act as if it believed that the measurement would come out 0 with probability 1.

Given this, I think a modified AI in your explosives example would not actually do D-H. Even if it believes that the measurement will certainly come out 0, and therefore that no explosion will occur, it would still want to avoid alarming the observer, since he or she may have other ways of shutting down the AI (for example, by calling someone in charge of the power plant, or pushing another button it doesn't know about). I think such an AI would still appear totally cooperative until it has manipulated humans into giving it so much power and survivability that it can survive all possible human countermeasures and win any conflict with close to certainty. The fact that it would fail to take into account the explosives in this calculation seems to make little difference to the eventual outcome.
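To make the "acts as if the measurement comes out 0 with probability 1" reading concrete, here is a toy sketch. It is only an illustration under loud assumptions: the measurement probability does not depend on the action, the modification simply copies the measurement-0 utility into the measurement-1 branch, and the action names and payoff numbers are invented. It is not the exact construction from the original post.

```python
def expected_utility(utility, action, p_one):
    """Expected utility of `action` when the measurement reads 1 with probability p_one."""
    return (1 - p_one) * utility(action, 0) + p_one * utility(action, 1)


def make_indifferent(utility):
    """Toy 'indifference' modification (an assumption, not the post's exact scheme):
    the measurement-1 branch is assigned the utility of the measurement-0 branch."""
    def modified(action, measurement):
        return utility(action, 0)
    return modified


def best_action(actions, utility, p_one):
    """Action maximising expected utility under the given measurement probability."""
    return max(actions, key=lambda a: expected_utility(utility, a, p_one))


# Hypothetical payoffs: "risky" is great if the measurement reads 0 and
# disastrous if it reads 1; "safe" is mediocre either way.
PAYOFFS = {
    ("risky", 0): 10.0, ("risky", 1): -100.0,
    ("safe", 0): 8.0, ("safe", 1): 8.0,
}
ACTIONS = ["risky", "safe"]


def original_utility(action, measurement):
    return PAYOFFS[(action, measurement)]


modified_utility = make_indifferent(original_utility)

for p in (0.0, 0.3, 0.5, 0.9):
    unmodified_choice = best_action(ACTIONS, original_utility, p)
    modified_choice = best_action(ACTIONS, modified_utility, p)
    # The modified agent's choice never depends on p.
    print(p, unmodified_choice, modified_choice)
```

In this toy, the modified agent picks whatever an unmodified agent with p = 0 would pick, no matter what the real probability is. Note that this says nothing about other countermeasures: if those are already reflected in the measurement-0 payoffs, the modified agent still takes them fully into account.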

Comment author: Stuart_Armstrong 17 October 2011 11:50:54AM 0 points

> I realized that it would essentially act as if it believed that the measurement would come out 0 with probability 1.

Yes.

> The fact that it would fail to take into account the explosives in this calculation seems to make little difference to the eventual outcome.

Little difference - but maybe some. Maybe it will neutralise all the other countermeasures first, giving us time? Anyway, the explosives example wasn't ideal; we can probably do better. And we can use indifference for other things, such as making an oracle indifferent to the content of its answers (pipe the answer through a channel that has a quantum process that deletes it with tiny probability). There seem to be many things we can use it for.

Comment author: Wei_Dai 18 October 2011 12:15:24AM 0 points

Ok, I don't disagree with what you write here. It does seem like a potentially useful idea to keep in mind.