Stuart_Armstrong comments on AI indifference through utility manipulation - Less Wrong

Post author: Stuart_Armstrong 02 September 2010 05:06PM


Comment author: jacob_cannell 02 September 2010 11:06:38PM

I want to second RolfAndreassen's viewpoint below.

The problem with this entire train of thought is that you skip past the actual difficulty, which is constructing any utility function even remotely as complex as the one you propose.

Your hypothetical utility function references undefined concepts such as "taking control of", "cooperating", "humans", and "self".

If you actually try to ground your utility function and go through the work of making it realistic, you quickly find that it ends up being something on the order of the complexity of a human brain, and it's not something you can easily define in a few pages of math.

I'm therefore skeptical about the entire concept of 'utility function filters': it seems their complexity would be on the order of, or greater than, that of the utility function itself, and you would need to keep constructing an endless sequence of such filters.

A more profitable route, it seems to me, is something like this:

Put the AIs in a matrix-like sim (a future evolution of current computer game and film simulation tech) and get a community of a few thousand humans to take part in a Truman Show-like experiment. Indeed, some people would pay to spectate or even participate, so it could even be a for-profit venture. A hierarchy of admins and controls would guard against potential 'liberators'. In the worst case, you can always just rewind time (something the Truman Show could never do; a fundamental advantage of a massive sim).

The 'filter function' operates at the entire modal level of reality: the AIs think they are humans, and do not know they are in a sim. Even if they suspected they were in a sim (e.g., by figuring out the simulation argument), they wouldn't know who was human and who was an AI (and indeed they wouldn't know which category they themselves were in). As the human operators would have godlike monitoring capability over the entire sim, including the ability to monitor AI thought activity, this should make a high level of control possible.

They can't turn against humans in the outside world if they don't even believe it exists.

This sounds like a science fiction scenario (and it is), but it's also feasible, and I'd say far more feasible than approaches that directly try to modify, edit, or guarantee the mindstates of AIs who are allowed to know they are AIs.

Comment author: Stuart_Armstrong 03 September 2010 09:11:53AM

> Your hypothetical utility function references undefined concepts such as "taking control of", "cooperating", "humans", and "self".
>
> If you actually try to ground your utility function and go through the work of making it realistic, you quickly find that it ends up being something on the order of the complexity of a human brain, and it's not something you can easily define in a few pages of math.

Don't get confused by the initial example, which was there purely for illustration (as I said, if you knew all these utility values, you wouldn't need any sort of filter; you'd just set all utilities but U(B) to zero).
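As a toy illustration (hypothetical numbers of my own, just to make the parenthetical concrete):

```python
# Toy illustration with made-up utility values: if every utility were
# already known, 'filtering' would collapse to zeroing out everything
# except the desired outcome B.
utilities = {"A": 5.0, "B": 3.0, "C": -2.0}  # hypothetical, fully known
kept = {outcome: (u if outcome == "B" else 0.0)
        for outcome, u in utilities.items()}
print(kept)  # {'A': 0.0, 'B': 3.0, 'C': 0.0}
```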

It's because these concepts are hard that I focused on indifference, which, it seems, has a precise mathematical formulation. You can implement general indifference without understanding anything about U at all.
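Sketched as code (a simplified two-branch setting with my own illustrative names like make_indifferent, not the exact construction from the post): shift U by a constant on one branch so that the two branches have equal expected utility. Note that U is only ever called as a black box.

```python
# Illustrative sketch only: a single stochastic event X splits the
# possible worlds into two branches.
def make_indifferent(U, worlds_if_X, worlds_if_not_X, prob):
    """Return U' whose expected value is the same whether or not X occurs.

    U               -- arbitrary utility function over worlds (black box)
    worlds_if_X     -- worlds in the branch where event X occurs
    worlds_if_not_X -- worlds in the branch where X does not occur
    prob            -- probability of a world, conditional on its branch
    """
    # Only conditional expectations of U are needed; its internal
    # structure is never inspected.
    eu_if_X = sum(prob(w) * U(w) for w in worlds_if_X)
    eu_if_not_X = sum(prob(w) * U(w) for w in worlds_if_not_X)
    offset = eu_if_not_X - eu_if_X  # constant compensation on the X branch

    def U_prime(w):
        # A constant shift within one branch leaves preferences inside that
        # branch unchanged; it only equalises the branches with each other.
        return U(w) + offset if w in worlds_if_X else U(w)

    return U_prime


# Hypothetical usage: two worlds per branch, uniform conditional probabilities.
values = {"x1": 10.0, "x2": 0.0, "y1": 3.0, "y2": 1.0}
U2 = make_indifferent(values.get, ["x1", "x2"], ["y1", "y2"], lambda w: 0.5)
# E[U2 | X] = E[U2 | not X] = 2.0, so the agent gains nothing from
# influencing whether X happens.
```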

> I'm therefore skeptical about the entire concept of 'utility function filters': it seems their complexity would be on the order of, or greater than, that of the utility function itself, and you would need to keep constructing an endless sequence of such filters.

The description of the filter is in this blog post; a bit more work is needed to check that certain universes are indistinguishable up until X. But this can be approximated, if needed.
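In rough notation (mine, and only a sketch): write c(w) for the class of universes indistinguishable from w up until X, and compensate the X branch of each class by the gap in conditional expectations:

```latex
U'(w) =
  \begin{cases}
    U(w) + \mathbb{E}\left[U \mid c(w), \lnot X\right]
         - \mathbb{E}\left[U \mid c(w), X\right]
      & \text{if } X \text{ occurs in } w,\\
    U(w) & \text{otherwise.}
  \end{cases}
```

Only the conditional expectations of U appear in the correction.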

U, on the other hand, can be arbitrarily complex.