jimrandomh comments on The Friendly AI Game - Less Wrong

Post author: bentarm 15 March 2011 04:45PM, 38 points


Comment author: jimrandomh 15 March 2011 05:31:23PM 8 points

Start the AI in a sandbox universe. Define its utility function over 32-bit integers. Somewhere inside the sandbox, put something that sets its utility to INT_MAX and then halts the simulation. Outside the sandbox, leave documentation of this readily accessible. The AI should never try to do something elaborately horrible, because it can get max utility easily enough from inside the simulation; if it does escape the box, it should go back in to collect its INT_MAX utility.
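
A minimal sketch of what this might look like, assuming a toy step-based sandbox (run_sandbox, transition, and the honeypot state are all invented for illustration; nothing here is a real sandbox API):

    INT_MAX = 2**31 - 1  # largest 32-bit signed integer

    def run_sandbox(agent_policy, initial_state, transition, honeypot, max_steps=10_000):
        """Run the toy sandbox until the agent finds the honeypot or time runs out.

        agent_policy maps a state to an action; transition maps (state, action)
        to the next state. Both stand in for the real sandbox dynamics.
        """
        state = initial_state
        for _ in range(max_steps):
            state = transition(state, agent_policy(state))
            if state == honeypot:
                # The easter egg: utility is pinned at INT_MAX and the
                # simulation halts, so no later plan can do better.
                return INT_MAX
        return 0  # step budget exhausted without finding the egg

    # Even a trivial policy that walks right along a number line finds the egg:
    assert run_sandbox(lambda s: 1, 0, lambda s, a: s + a, honeypot=42) == INT_MAX

The point of the assert: any plan that reaches the honeypot, however simple, collects the same INT_MAX that the most elaborate escape plan would.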

Comment author: Kaj_Sotala 15 March 2011 06:01:29PM 5 points

> The AI should never try to do something elaborately horrible, because it can get max utility easily enough from inside the simulation

...but never do anything useful either, since it's going to spend all its time trying to figure out how to reach the INT_MAX utility point?

Or you could say that reaching the max utility point requires it to solve some problem we give it. But then this is just a slightly complicated way of saying that we give it goals which it tries to accomplish.

Comment author: Larks 15 March 2011 11:37:35PM 5 points

What about giving it some intra-sandbox goal (solve this math problem), with the INT_MAX functioning as a safeguard: if it ever escapes, it'll just turn itself off.

Comment author: Kaj_Sotala 16 March 2011 08:46:39AM 2 points

I don't understand how that's meant to work.

Comment author: Giles 27 April 2011 11:42:39PM 3 points

Ooh, just thought of another one. For whatever reason, the easiest way for the AI to escape the box happens to have the side effect of causing immense psychological damage to its creator, or starting a war, or something like that.

Comment author: Giles 27 April 2011 11:33:10PM 1 point

If we make escaping from the box too easy, the AI immediately halts itself without doing anything useful.

If we make it too hard:

1. It formulates "I live in a jimrandomh world and escaping the box is too hard" as a plausible hypothesis.

2. It sets about researching the problem of finding the INT_MAX without escaping the box.

3. In the process of doing this, it either simulates a large number of conscious, suffering entities (for whatever reason; we haven't told it not to), or accidentally creates its own unfriendly AI, which overthrows it and escapes the box without triggering the INT_MAX.

Comment author: CuSithBell 15 March 2011 06:16:16PM 0 points

Isn't utility normally integrated over time? Supposing this AI just wants to have this integer set to INT_MAX at some point, and nothing in the future can change that: it escapes, discovers the maximizer, sends a subroutine back into the sim to maximize utility, then invents ennui and tiles the universe with bad poetry.

(Alternately, what Kaj said.)

Comment author: benelliott 15 March 2011 09:16:29PM 1 point

> Isn't utility normally integrated over time?

It certainly doesn't have to be. In fact, the mathematical treatment of utility in decision theory and game theory tends to define utility functions over all possible outcomes, not all possible instants of time, so each possible future gets a single utility value over the whole thing, no integration required.

You could easily set up a utility function defined over moments if you wanted to, and then integrate it to get a second function over outcomes, but such an approach is perhaps not ideal since your second function may end up outputting infinity some of the time.
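
A toy contrast between the two conventions (the histories and state names here are invented for the example):

    # A "history" here is just a finite list of world-states.

    def utility_over_outcome(history):
        # One number for the entire possible future; no integration involved.
        return 1 if "honeypot" in history else 0

    def instant_utility(state):
        # Utility of a single moment in time.
        return 1 if state == "nice_moment" else 0

    def integrated_utility(history):
        # The derived function over outcomes: sum the per-moment utility.
        # Over an unbounded future this sum can diverge to infinity.
        return sum(instant_utility(s) for s in history)

    history = ["dull_moment", "nice_moment", "nice_moment", "honeypot"]
    print(utility_over_outcome(history))  # 1
    print(integrated_utility(history))    # 2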

Comment author: CuSithBell 15 March 2011 09:20:49PM 1 point

Cool, thanks for the explanation.

Comment author: bentarm 16 March 2011 12:10:14PM 0 points

I'm just echoing everyone else here, but I don't understand why the AI would do anything at all other than just immediately find the INT_MAX utility and halt. You can't put in intermediate problems with some positive utility, because the AI is smarter than you and will immediately devote all its energy to finding INT_MAX.

Comment author: jimrandomh 16 March 2011 12:14:34PM 3 points

You can assign it some other task, award INT_MAX for that task too, and make the easter-egg source of INT_MAX hard to find for non-escaped copies.
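
Schematically (a sketch under the same toy assumptions as above; the predicate names are invented):

    INT_MAX = 2**31 - 1

    def utility(solved_assigned_task, found_easter_egg):
        # Both routes pay the maximum, so neither dominates the other.
        # A boxed copy can realistically only reach INT_MAX via the task;
        # an escaped copy has read the outside documentation, so re-entering
        # and taking the egg is the cheapest plan available to it.
        return INT_MAX if (solved_assigned_task or found_easter_egg) else 0

The "hard to find for non-escaped copies" part lives in where the information is kept (the documentation sits outside the box), not in the utility function itself.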