timtyler comments on AI indifference through utility manipulation - Less Wrong

4 Post author: Stuart_Armstrong 02 September 2010 05:06PM




Comment author: Stuart_Armstrong 03 September 2010 09:26:26AM 0 points [-]

You have two problems here. The first one is the one I mentioned - once you've set up the equality, what happens if the AI learns something that makes certain universes more likely than others?

For instance, let W1' be a universe in which the AI has a backup, W1 one in which it does not, and similarly for W0' and W0.

Initially, U(W0')=U(W1')=U(W0)=2 (it doesn't care what happens if it's backed up) and U(W1)=0 (it "doesn't want to die"). Apply the filter, and get:

F(U)(W1)=1, F(U)(W1')=3, F(U)(W0')=2, F(U)(W0)=2.

So it's indifferent. But then it discovers that it doesn't have a backup; now the relevant ones are W1 and W0, and it prefers W0. So it's no longer indifferent.
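The filter arithmetic above can be sketched in a few lines. This is a minimal illustration under one assumption not stated explicitly in the comment: that the filter F shifts the utilities of the W1-branch worlds by a constant so that, under an equal prior over backup/no-backup, both branches have the same expected utility. The names `U`, `F`, and `shift` are mine, for illustration only.

```python
# Utilities from the comment: W0/W0' and W1' are worth 2; W1 ("it dies,
# no backup") is worth 0. Primed worlds are the ones with a backup.
U = {"W0": 2, "W0p": 2, "W1": 0, "W1p": 2}

# Expected utility of each branch under a 50/50 prior over having a backup.
eu_0 = (U["W0"] + U["W0p"]) / 2   # = 2
eu_1 = (U["W1"] + U["W1p"]) / 2   # = 1

# Assumed filter: add a constant to the W1-branch worlds to equalise
# the two branches' expected utilities.
shift = eu_0 - eu_1               # = 1
F = dict(U)
F["W1"] += shift                  # 0 + 1 = 1
F["W1p"] += shift                 # 2 + 1 = 3

# Matches the comment: F(U)(W1)=1, F(U)(W1')=3, F(U)(W0')=F(U)(W0)=2.
# Before any observation, the agent is indifferent between branches:
assert (F["W1"] + F["W1p"]) / 2 == (F["W0"] + F["W0p"]) / 2

# After learning "no backup", only W0 and W1 are relevant - and the
# agent now strictly prefers W0, so indifference is broken.
assert F["W0"] > F["W1"]
```

The point of the sketch is the last assertion: the filter only equalises *expected* utilities under the prior, so any observation that shifts probability mass between the primed and unprimed worlds destroys the indifference.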

The other option is to have it change its utility every time new information comes in, to track the changes. But this is bad. For a start, it will no longer be a utility maximiser, which exposes it to predictable weaknesses (see this). Secondly, a self-improving AI will try to get rid of this as it self-improves, since self-improving AIs move towards being utility maximisers.

And lastly, it has all sorts of unintended consequences: the AI, for instance, may decide not to pay attention to certain information (or to pay attention only selectively), because this is the easiest way to accomplish its current goals.

Comment author: timtyler 03 September 2010 08:30:39PM 0 points [-]

You have two problems here.

FWIW, I couldn't make any sense out of the second supposed problem.

Comment author: Stuart_Armstrong 06 September 2010 11:08:54AM 0 points [-]

If you update your utility every time new information comes in, the utility is time-inconsistent. This lets you be money-pumped. Hence it's the kind of thing you would get rid of at your next self-improvement.
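A toy illustration of the money-pump claim, under assumptions of my own (this is not a construction from the original post): the agent's preference between two holdings A and B flips with each incoming update, and a trader charges a small fee for every swap. The functions and names here are purely illustrative.

```python
# Time-inconsistent agent: its preferred holding depends on whether it
# has just updated its utility on new information.
def preferred(updated):
    return "B" if updated else "A"

wealth = 0
holding = "A"

for cycle in range(3):
    # New information arrives; the agent now prefers B and pays 1 to swap.
    assert preferred(updated=True) == "B"
    holding, wealth = "B", wealth - 1
    # The utility shifts again with the next update; it pays 1 to swap back.
    assert preferred(updated=False) == "A"
    holding, wealth = "A", wealth - 1

# After each full cycle the agent holds exactly what it started with,
# but is strictly poorer - the signature of a money pump.
assert holding == "A" and wealth == -6
```

This also makes timtyler's rejoinder below concrete: the pump only extracts significant money over *repeated* cycles, so a one-shot event (like being switched off) gives it little to work with.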

Comment author: timtyler 06 September 2010 04:44:33PM *  0 points [-]

The utility function is always the same in this kind of scenario - and is not "updated".

It typically says something roughly like: stop button not pressed - business as normal; stop button pressed - let the engineers dismantle your brain. That doesn't really let you be money-pumped: for one thing, a pump needs repeated cycles to do much work, and after being switched off the agent can't engage in any economic activities anyway.

Agents won't get rid of such stipulations as they self-improve - under the assumption that a self-improving agent successfully preserves its utility function. Changing the agent's utility function would typically be very bad - from the point of view of the agent.