lessdazed comments on Welcome to Less Wrong! (2010-2011) - Less Wrong

Post author: orthonormal 12 August 2010 01:08AM 42 points


Comment author: quinsie 28 August 2011 05:09:46AM 3 points

Hello! quinsie here. I discovered LessWrong after being linked to HP&MoR, enjoying it, and then following the links back to the LessWrong site itself. I've been reading for a while, but, as a rule, I don't sign up with a site unless I have something worth contributing. After reading Eliezer's Hidden Complexity of Wishes post, I think I have that:

In the post, Eliezer describes a device called an Outcome Pump, which resets the universe repeatedly until the desired outcome occurs. He then goes on to describe why this is a bad idea: it can't understand what it is that you really want. This is analogous to an unFriendly AI programmed to naively maximize something (like paper clips) that humans say they want maximized, even when what they really want is something much more complex that they have trouble articulating well enough to describe to a machine.

My idea, then, is to take the Outcome Pump and make a 2.0 version that uses the same mechanism as the original Outcome Pump, but with a slightly different trigger mechanism: The Outcome Pump resets the universe whenever a set period of time passes without an "Accept Outcome" button being pressed to prevent the reset. Converting back to AI theory, the analogous AI would be one which simulates the world around it, reports the projected outcome to a human, and then waits for the result to be accepted or rejected. If accepted, it implements the solution. If rejected, it goes back to the drawing board and crunches numbers until it arrives at the next non-rejected solution.
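For concreteness, here is a minimal sketch of that propose-and-approve loop in Python (the names and signatures are hypothetical illustrations of the idea, not anything from the original post):

```python
from typing import Callable, Iterator, Optional

def outcome_pump_v2(
    candidates: Iterator[str],
    human_approves: Callable[[str], bool],
) -> Optional[str]:
    """Propose outcomes one at a time; act only on explicit approval.

    Unlike the original Outcome Pump, nothing is implemented by default:
    a rejected proposal just sends the machine back to the drawing board.
    """
    for outcome in candidates:
        print(f"Projected outcome: {outcome}")
        if human_approves(outcome):
            return outcome  # accepted: implement this solution
        # rejected: crunch numbers and try the next candidate
    return None  # no candidate was ever accepted
```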

This design could of course be improved upon by adding parameters to automatically reject outcomes that are obviously unsuitable, or that contain events we would, ceteris paribus, prefer to avoid, just as with the standard Outcome Pump and its analogue in unFriendly AI. The chief difference between the two is that the failure mode for version 2.0 isn't a catastrophic "tile the universe with paper clips/launch mother out of burning building with explosion" but rather the far more benign "submit utterly inane proposals until given more specific instructions or turned off".
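Those automatic-reject parameters fit naturally as a filter in front of the loop above; a sketch under the same assumptions (the predicate names are made-up examples):

```python
from typing import Callable, Iterable, Iterator

def with_auto_reject(
    candidates: Iterable[str],
    reject_if: list[Callable[[str], bool]],
) -> Iterator[str]:
    """Silently drop candidates that trip any automatic-reject predicate,
    so obviously unsuitable outcomes never reach the human at all."""
    for outcome in candidates:
        if not any(is_bad(outcome) for is_bad in reject_if):
            yield outcome

# Usage with the sketch above, e.g.:
#   outcome_pump_v2(with_auto_reject(candidates, [mother_dies, house_explodes]),
#                   human_approves)
```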

This probably has some terrible flaw that I'm overlooking, of course, since I am not an expert in the field; but if there is one, it isn't obvious enough for a layman to see. Or, just as likely, someone else came up with it first and published a paper describing exactly this. So I'm asking here.

Comment author: lessdazed 28 August 2011 05:21:06AM 1 point

Welcome!

The Outcome Pump resets the universe whenever a set period of time passes without an "Accept Outcome" button being pressed to prevent the reset.

So no more problem if it kills you! But what if it kills you and destroys itself in the process?

Comment author: nshepperd 28 August 2011 07:32:51AM 1 point

Or, more mundanely, if it achieves a button-press by other means, such as causing a building to collapse on you, with a brick landing on the button.

Comment author: quinsie 28 August 2011 06:01:39AM 1 point

The answer to that depends on how the time machine inside works. If it's based on a "reset unless a message from the future is received saying not to" sort of deal, then you're fine. Otherwise, you die. And neither situation has an analogue in the related AI design.