
The Hidden Complexity of Wishes

50 Post author: Eliezer_Yudkowsky 24 November 2007 12:12AM

Followup to: The Tragedy of Group Selectionism, Fake Optimization Criteria, Terminal Values and Instrumental Values, Artificial Addition, Leaky Generalizations

"I wish to live in the locations of my choice, in a physically healthy, uninjured, and apparently normal version of my current body containing my current mental state, a body which will heal from all injuries at a rate three sigmas faster than the average given the medical technology available to me, and which will be protected from any diseases, injuries or illnesses causing disability, pain, or degraded functionality or any sense, organ, or bodily function for more than ten days consecutively or fifteen days in any year..."
            -- The Open-Source Wish Project, Wish For Immortality 1.1

There are three kinds of genies:  Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.

Suppose your aged mother is trapped in a burning building, and it so happens that you're in a wheelchair; you can't rush in yourself.  You could cry, "Get my mother out of that building!" but there would be no one to hear.

Luckily you have, in your pocket, an Outcome Pump.  This handy device squeezes the flow of time, pouring probability into some outcomes, draining it from others.

The Outcome Pump is not sentient.  It contains a tiny time machine, which resets time unless a specified outcome occurs.  For example, if you hooked up the Outcome Pump's sensors to a coin, and specified that the time machine should keep resetting until it sees the coin come up heads, and then you actually flipped the coin, you would see the coin come up heads.  (The physicists say that any future in which a "reset" occurs is inconsistent, and therefore never happens in the first place - so you aren't actually killing any versions of yourself.)

Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics.  If you try to input a proposition that's too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs.

You can also redirect probability flow in more quantitative ways using the "future function" to scale the temporal reset probability for different outcomes.  If the temporal reset probability is 99% when the coin comes up heads, and 1% when the coin comes up tails, the odds will go from 1:1 to 99:1 in favor of tails.  If you had a mysterious machine that spit out money, and you wanted to maximize the amount of money spit out, you would use reset probabilities that diminished as the amount of money increased.  For example, spitting out $10 might have a 99.999999% reset probability, and spitting out $100 might have a 99.99999% reset probability.  This way you can get an outcome that tends to be as high as possible in the future function, even when you don't know the best attainable maximum.
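The arithmetic behind those odds can be sketched in a few lines. This is only an illustration under a stated assumption: an outcome's final probability is taken to be proportional to its prior probability times its chance of not being reset. The function name `pumped_distribution` and the equal priors for the money machine are invented for the example; they are not part of the Outcome Pump's description.

```python
from fractions import Fraction


def pumped_distribution(prior, reset_prob):
    """Reweight each outcome by its chance of NOT being reset, then renormalise.

    Assumption for this sketch: P(outcome | no reset) is proportional to
    prior[outcome] * (1 - reset_prob[outcome]).
    """
    weights = {o: Fraction(prior[o]) * (1 - Fraction(reset_prob[o])) for o in prior}
    total = sum(weights.values())
    if total == 0:
        # Every outcome is reset with certainty: no consistent future remains.
        raise ValueError("all outcomes are reset with probability 1")
    return {o: w / total for o, w in weights.items()}


# The coin: 99% reset on heads, 1% reset on tails, starting from 1:1 odds.
coin = pumped_distribution(
    prior={"heads": Fraction(1, 2), "tails": Fraction(1, 2)},
    reset_prob={"heads": Fraction(99, 100), "tails": Fraction(1, 100)},
)
odds_tails_to_heads = coin["tails"] / coin["heads"]

# The money machine, assuming (hypothetically) that $10 and $100 start out equally likely.
money = pumped_distribution(
    prior={"$10": Fraction(1, 2), "$100": Fraction(1, 2)},
    reset_prob={
        "$10": Fraction(99_999_999, 100_000_000),  # 99.999999%
        "$100": Fraction(9_999_999, 10_000_000),   # 99.99999%
    },
)
odds_100_to_10 = money["$100"] / money["$10"]
```

Under that assumption the coin ends up 99:1 in favor of tails, and the $100 outcome ends up ten times as likely as the $10 outcome, even though both reset probabilities are very close to 1.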

So you desperately yank the Outcome Pump from your pocket - your mother is still trapped in the burning building, remember? - and try to describe your goal: get your mother out of the building!

The user interface doesn't take English inputs.  The Outcome Pump isn't sentient, remember?  But it does have 3D scanners for the near vicinity, and built-in utilities for pattern matching.  So you hold up a photo of your mother's head and shoulders; match on the photo; use object contiguity to select your mother's whole body (not just her head and shoulders); and define the future function using your mother's distance from the building's center.  The further she gets from the building's center, the less the time machine's reset probability.

You cry "Get my mother out of the building!", for luck, and press Enter.

For a moment it seems like nothing happens.  You look around, waiting for the fire truck to pull up, and rescuers to arrive - or even just a strong, fast runner to haul your mother out of the building -

BOOM!  With a thundering roar, the gas main under the building explodes.  As the structure comes apart, in what seems like slow motion, you glimpse your mother's shattered body being hurled high into the air, traveling fast, rapidly increasing its distance from the former center of the building.

On the side of the Outcome Pump is an Emergency Regret Button.  All future functions are automatically defined with a huge negative value for the Regret Button being pressed - a temporal reset probability of nearly 1 - so that the Outcome Pump is extremely unlikely to do anything which upsets the user enough to make them press the Regret Button.  You can't ever remember pressing it.  But you've barely started to reach for the Regret Button (and what good will it do now?) when a flaming wooden beam drops out of the sky and smashes you flat.

Which wasn't really what you wanted, but scores very high in the defined future function...

The Outcome Pump is a genie of the second class.  No wish is safe.

If someone asked you to get their poor aged mother out of a burning building, you might help, or you might pretend not to hear.  But it wouldn't even occur to you to explode the building.  "Get my mother out of the building" sounds like a much safer wish than it really is, because you don't even consider the plans that you assign extreme negative values.

Consider again the Tragedy of Group Selectionism: Some early biologists asserted that group selection for low subpopulation sizes would produce individual restraint in breeding; and yet actually enforcing group selection in the laboratory produced cannibalism, especially of immature females.  It's obvious in hindsight that, given strong selection for small subpopulation sizes, cannibals will outreproduce individuals who voluntarily forego reproductive opportunities.  But eating little girls is such an un-aesthetic solution that Wynne-Edwards, Allee, Brereton, and the other group-selectionists simply didn't think of it.  They only saw the solutions they would have used themselves.

Suppose you try to patch the future function by specifying that the Outcome Pump should not explode the building: outcomes in which the building materials are distributed over too much volume will have temporal reset probabilities of ~1.

So your mother falls out of a second-story window and breaks her neck.  The Outcome Pump took a different path through time that still ended up with your mother outside the building, and it still wasn't what you wanted, and it still wasn't a solution that would occur to a human rescuer.
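The pattern so far can be put in a toy model. Everything in the sketch below is made up for illustration: the candidate outcomes, their relative plausibilities, the distances, the debris volumes, and the helper names `reset_probability` and `most_likely`. It also simplifies the "~1" patch to exactly 1. The only point it is meant to show is that a future function which reads distance (and, after patching, debris volume) never looks at whether your mother is alive.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Outcome:
    name: str
    prior_weight: float      # made-up relative plausibility without the Pump
    distance_m: float        # mother's distance from the building's center
    debris_volume_m3: float  # volume over which the building materials end up spread
    mother_alive: bool       # what you actually care about; never read below


def reset_probability(o: Outcome, max_debris_m3: Optional[float] = None) -> float:
    """Hypothetical future function: more distance means a lower reset probability."""
    if max_debris_m3 is not None and o.debris_volume_m3 > max_debris_m3:
        return 1.0  # the "do not explode the building" patch, simplified from ~1 to 1
    return 1.0 - min(1.0, o.distance_m * 1e-6)


def most_likely(outcomes: Sequence[Outcome], max_debris_m3: Optional[float] = None) -> Outcome:
    """Pick the outcome with the largest prior weight times chance of not being reset."""
    return max(
        outcomes,
        key=lambda o: o.prior_weight * (1.0 - reset_probability(o, max_debris_m3)),
    )


# Illustrative numbers only.
outcomes = [
    Outcome("stays inside", 50, 5, 3_000, False),
    Outcome("firefighter escorts her out", 5, 40, 3_000, True),
    Outcome("falls from second-story window", 20, 15, 3_000, False),
    Outcome("gas main explodes", 10, 150, 200_000, False),
]

unpatched = most_likely(outcomes)
patched = most_likely(outcomes, max_debris_m3=10_000)
```

With these invented numbers, `unpatched` is the gas main explosion and `patched` is the fall from the window. The one outcome with `mother_alive=True` loses both times, because nothing in either future function rewards it.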

If only the Open-Source Wish Project had developed a Wish To Get Your Mother Out Of A Burning Building:

"I wish to move my mother (defined as the woman who shares half my genes and gave birth to me) to outside the boundaries of the building currently closest to me which is on fire; but not by exploding the building; nor by causing the walls to crumble so that the building no longer has boundaries; nor by waiting until after the building finishes burning down for a rescue worker to take out the body..."

All these special cases, the seemingly unlimited number of required patches, should remind you of the parable of Artificial Addition - programming an Arithmetic Expert System by explicitly adding ever more assertions like "fifteen plus fifteen equals thirty, but fifteen plus sixteen equals thirty-one instead".

How do you exclude the outcome where the building explodes and flings your mother into the sky?  You look ahead, and you foresee that your mother would end up dead, and you don't want that consequence, so you try to forbid the event leading up to it.

Your brain isn't hardwired with a specific, prerecorded statement that "Blowing up a burning building containing my mother is a bad idea."  And yet you're trying to prerecord that exact specific statement in the Outcome Pump's future function.  So the wish is exploding, turning into a giant lookup table that records your judgment of every possible path through time.

You failed to ask for what you really wanted.  You wanted your mother to go on living, but you wished for her to become more distant from the center of the building.

Except that's not all you wanted.  If your mother was rescued from the building but was horribly burned, that outcome would rank lower in your preference ordering than an outcome where she was rescued safe and sound.  So you not only value your mother's life, but also her health.

And you value not just her bodily health, but her state of mind. Being rescued in a fashion that traumatizes her - for example, a giant purple monster roaring up out of nowhere and seizing her - is inferior to a fireman showing up and escorting her out through a non-burning route.  (Yes, we're supposed to stick with physics, but maybe a powerful enough Outcome Pump has aliens coincidentally showing up in the neighborhood at exactly that moment.)  You would certainly prefer her being rescued by the monster to her being roasted alive, however.

How about a wormhole spontaneously opening and swallowing her to a desert island?  Better than her being dead; but worse than her being alive, well, healthy, untraumatized, and in continual contact with you and the other members of her social network.

Would it be okay to save your mother's life at the cost of the family dog's life, if it ran to alert a fireman but then got run over by a car?  Clearly yes, but it would be better ceteris paribus to avoid killing the dog.  You wouldn't want to swap a human life for hers, but what about the life of a convicted murderer?  Does it matter if the murderer dies trying to save her, from the goodness of his heart?  How about two murderers?  If the cost of your mother's life was the destruction of every extant copy, including the memories, of Bach's Little Fugue in G Minor, would that be worth it?  How about if she had a terminal illness and would die anyway in eighteen months?

If your mother's foot is crushed by a burning beam, is it worthwhile to extract the rest of her?  What if her head is crushed, leaving her body?  What if her body is crushed, leaving only her head?  What if there's a cryonics team waiting outside, ready to suspend the head?  Is a frozen head a person?  Is Terri Schiavo a person?  How much is a chimpanzee worth?

Your brain is not infinitely complicated; there is only a finite Kolmogorov complexity / message length which suffices to describe all the judgments you would make.  But just because this complexity is finite does not make it small.  We value many things, and no, they are not reducible to valuing happiness or valuing reproductive fitness.

There is no safe wish smaller than an entire human morality.  There are too many possible paths through Time.  You can't visualize all the roads that lead to the destination you give the genie.  "Maximizing the distance between your mother and the center of the building" can be done even more effectively by detonating a nuclear weapon.  Or, at higher levels of genie power, flinging her body out of the Solar System.  Or, at higher levels of genie intelligence, doing something that neither you nor I would think of, just like a chimpanzee wouldn't think of detonating a nuclear weapon.  You can't visualize all the paths through time, any more than you can program a chess-playing machine by hardcoding a move for every possible board position.

And real life is far more complicated than chess.  You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes.  Especially if you wish for something longer-term or wider-range than rescuing your mother from a burning building.

I fear the Open-Source Wish Project is futile, except as an illustration of how not to think about genie problems.  The only safe genie is a genie that shares all your judgment criteria, and at that point, you can just say "I wish for you to do what I should wish for."  Which simply runs the genie's should function.

Indeed, it shouldn't be necessary to say anything.  To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish. Otherwise the genie may not choose a path through time which leads to the destination you had in mind, or it may fail to exclude horrible side effects that would lead you to not even consider a plan in the first place.  Wishes are leaky generalizations, derived from the huge but finite structure that is your entire morality; only by including this entire structure can you plug all the leaks.

With a safe genie, wishing is superfluous.  Just run the genie.

Comments (111)

Comment author: Kevin2 24 November 2007 04:20:27AM 4 points [-]

Is there a safe way to wish for an unsafe genie to behave like a safe genie? That seems like a wish TOSWP should work on.

Comment author: themusicgod1 24 November 2013 06:08:32PM 0 points [-]

A sufficiently powerful genie might make safe genies by definition more unsafe. Then your wish could be granted.

Comment author: Nick_Tarleton 24 November 2007 09:27:36AM 9 points [-]

"I wish for a genie that shares all my judgment criteria" is probably the only safe way.

Comment author: Nebu 30 March 2012 10:42:27PM 28 points [-]

This might be done by picking an arbitrary genie, and then modifying your judgement criteria to match that genie's.

Comment author: AndHisHorse 21 August 2013 12:49:20AM 0 points [-]

What if your judgement criteria are fluid - depending, perhaps, on your current hormonal state, your available knowledge, and your particular position in society?

Comment author: Gray_Area 24 November 2007 10:26:03AM 9 points [-]

Sounds like we need to formalize human morality first, otherwise you aren't guaranteed consistency. Of course formalizing human morality seems like a hopeless project. Maybe we can ask an AI for help!

Comment author: Gray_Area 24 November 2007 12:38:05PM 4 points [-]

On further reflection, the wish as expressed by Nick Tarleton above sounds dangerous, because _all_ human morality may either be inconsistent in some sense, or 'naive' (failing to account for important aspects of reality we aren't aware of yet). Human morality changes as our technology and understanding changes, sometimes significantly. There is no reason to believe this trend will stop. I am afraid (genuine fear, not figure of speech) that the quest to properly formalize and generalize human morality for use by a 'friendly AI' is akin to properly formalizing and generalizing Ptolemaic astronomy.

Comment author: J_Thomas 24 November 2007 03:15:13PM 4 points [-]

This generalises. Since you don't know everything, anything you do might wind up being counterproductive.

Like, I once knew a group of young merchants who wanted their shopping district revitalised. They worked at it and got their share of federal money that was assigned to their city, and they got the lighting improved, and the landscaping, and a beautiful fountain, and so on. It took several years and most of the improvements came in the third year. Then their landlords all raised the rents and they had to move out.

That one was predictable in hindsight, but I didn't predict it. There could always be things like that.

When anything you do could backfire, are you better off staying in bed? No, the advantages of that are obvious but it's also obvious you can't make a living that way.

You have to make your choices and take your chances. If I had an outcome pump and my mother was trapped in a burning building and I had no other way to get her out, I hope I'd use it. The result might be worse than letting her burn to death but at least there would be a chance for a good outcome. If I can just get it to remove some of the bad outcomes the result may be an improvement.

Comment author: dilys 24 November 2007 03:23:41PM 0 points [-]

Wonderfully provocative post (meaning no disregard toward the poor old woman caught in the net of a rhetorical and definitional impasse). Obviously in reference to the line of thought in the "devil's dilemma" enshrined in the original Bedazzled, and so many magic-wish-fulfillment folk tales, in which there is always a loophole exploited by a counter-force, probably IMO in response to the motive to shortcut certain aspects of reality and its regulatory processes, known or unknown.

It would be interesting to collect real life anecdotes about people who have "gotten what they want," and end up begging for their old life back, like Dudley Moore's über-frustrated Stanley Moon trapped in a convent.

I hope this question, ultimately of the relationship of the Part and the Whole, continues to be expressed, especially as relevant to any transhuman enterprise.

Comment author: Eric_1 24 November 2007 04:52:21PM -1 points [-]

It seems contradictory to previous experience that humans should develop a technology with "black box" functionality, i.e. whose effects could not be foreseen and accurately controlled by the end-user. Technology has to be designed and it is designed with an effect/result in mind. It is then optimized so that the end user understands how to call forth this effect. So positing an effective equivalent of the mythological figure "Genie" in technological form ignores the optimization-for-use that would take place at each stage of developing an Outcome-Pump. The technology-falling-from-heaven which is the Outcome Pump demands that we reverse engineer the optimization of parameters which would have necessarily taken place if it had in fact developed as human technologies do.

I suppose the human mind has a very complex "ceteris paribus" function which holds all these background parameters equal to their previous values, while not explicitly stating them, and the ironic-wish-fulfillment-Genie idea relates to the fulfillment of a wish while violating an unspoken ceteris paribus rule. Demolishing the building structure violates ceteris paribus more than the movements of a robot-retriever would in moving aside burning material to save the woman. The material displaced from the building should be as nearly equal to the woman's body weight as possible; inducing an explosion is a horrible violation of the objective, if the Pump could just be made to sense the proper (implied) parameters.

If the market forces of supply and demand continue to undergird technological progress (i.e. research and development and manufacturing), then the development of a sophisticated technology not-optimized-for-use is problematic: who pays for the second round of research implementation? Surely not the customer, when you give him an Outcome Pump whose every use could result in the death and destruction of his surrounding environs and family members. Granted this is an aside and maybe impertinent in the context of this discussion.

Comment author: RiverC 24 November 2007 08:15:30PM 2 points [-]

Eric, I think he was merely attempting to point out the futility of wishes. Or rather, the futility of asking for what you want from something that does not share your judgments on things. The Outcome Pump is merely, like the Genie, a mechanism by which to explain his intended meaning. The problem of the Outcome Pump is twofold: 1. Any theory that states that time is anything other than a constant now with motion and probability may work mathematically but has yet to be able to actually alter the thing which it describes in a measurable way, and 2. The production of something such as a time machine to begin with would be so destructive as to ultimately prevent the creation of the Outcome Pump.

In fact, as rational as we would like to be, if we are so rational that we miss the forest for the trees, or in this case, the moral for the myth, we sort of undo the reason we have rationality. It's like disassembling a clock to find the time.

Anyhow, the problem of wishes is the trick of prayer: To get something that God will grant, we cannot create a God that wants what we want; it is our inherent experience in life that if God really is all-powerful and above all that he must be singular, and since men's wishes oft conflict he can not by any stretch of the imagination mysteriously coincide with your own capricious desires. Thus you must make the 'wish no wish' which is to change your judgment to that of God's, and then in that case you can not possibly wish something that he will NOT grant.

The mystery of it is that it is still not the same as the 'safe genie'; but at the same time not altogether different. But the fact that some old Christian Mystics have said the best prayer is the one in which you make no petitions at all (and in fact say nothing!) probably attests that it is indeed the 'safe genie'.

Comment author: Nick_Tarleton 24 November 2007 08:59:00PM 0 points [-]

On further reflection, the wish as expressed by Nick Tarleton above sounds dangerous, because _all_ human morality may either be inconsistent in some sense, or 'naive' (failing to account for important aspects of reality we aren't aware of yet).

You're right. Hence, CEV.

Comment author: Doug_S. 24 November 2007 09:39:45PM 0 points [-]

Eliezer, you read Home on the Strange?

Comment author: Eliezer_Yudkowsky 24 November 2007 10:05:31PM 1 point [-]

So positing an effective equivalent of the mythological figure "Genie" in technological form ignores the optimization-for-use that would take place at each stage of developing an Outcome-Pump. The technology-falling-from-heaven which is the Outcome Pump demands that we reverse engineer the optimization of parameters which would have necessarily taken place if it had in fact developed as human technologies do.

Unfortunately, Eric, when you build a powerful enough Outcome Pump, it can wish more powerful Outcome Pumps into existence, which can in turn wish even more powerful Outcome Pumps into existence. So once you cross a certain threshold, you get an explosion of optimization power, which mere trial and error is not sufficient to control because of the enormous change of context. In particular, the genie has gone from being less powerful than you to being more powerful than you, and what appeared to work in the former context won't work in the latter.

Which is precisely what happened to natural selection when it developed humans.

Comment author: Eric_1 24 November 2007 10:37:35PM 0 points [-]

"Unfortunately, Eric, when you build a powerful enough Outcome Pump, it can wish more powerful Outcome Pumps into existence, which can in turn wish even more powerful Outcome Pumps into existence."

Yes, technology that develops itself, once a certain point of sophistication is reached.

My only acquaintance with AI up to now has been the website at http://www.20q.net, which contains a neural network that has been learning for two decades or so. It can "read your mind" when you're thinking of a character from the TV show The Simpsons. Pretty incredible actually!

Comment author: Eric_1 24 November 2007 10:54:07PM 1 point [-]

Eliezer, I clicked on your name in the above comment box and voilà - a whole set of resources to learn about AI. I also found out why you use the adjective "unfortunately" in reference to the Outcome Pump, as it's on the Singularity Institute website. Fascinating stuff!

Comment author: Gray_Area 25 November 2007 12:34:06AM 2 points [-]

"It seems contradictory to previous experience that humans should develop a technology with "black box" functionality, i.e. whose effects could not be foreseen and accurately controlled by the end-user."

Eric, have you ever been a computer programmer? That technology becomes more and more like a black box is not only in line with previous experience, but I dare say is a trend as technological complexity increases.

Comment author: Eric_1 25 November 2007 01:52:55AM 1 point [-]

"Eric, have you ever been a computer programmer? That technology becomes more and more like a black box is not only in line with previous experience, but I dare say is a trend as technological complexity increases."

No I haven't. Could you expand on what you mean?

Comment author: James_D._Miller 25 November 2007 05:31:16AM 0 points [-]

In the first year of law school students learn that for every clear legal rule there always exist situations for which either the rule doesn't apply or for which the rule gives a bad outcome. This is why we always need to give judges some discretion when administering the law.

Comment author: TGGP4 25 November 2007 06:20:53AM 1 point [-]

James Miller, have you read The Myth of the Rule of Law? What do you think of it?

Comment author: Gray_Area 25 November 2007 11:52:12AM 1 point [-]

Every computer programmer, indeed anybody who uses computers extensively, has been surprised by computers. Despite being deterministic, a personal computer taken as a whole (hardware, operating system, software running on top of the operating system, network protocols creating the internet, etc. etc.) is too large for a single mind to understand. We have partial theories of how computers work, but of course partial theories sometimes fail and this produces surprise.

This is not a new development. I have only a partial theory of how my car works, but in the old days people only had a partial theory of how a horse works. Even a technology as simple and old as a knife still follows non-trivial physics and so can surprise us (can you predict when a given knife will shatter?). Ultimately, most objects, man-made or not are 'black boxes.'

Comment author: danlowlite 15 February 2011 03:04:21PM *  0 points [-]

Material sciences can give us an estimate on the shattering of a given material given certain criteria.

Just because you do not know specific things about it doesn't make it a black box. Of course, that doesn't make the problems with complex systems disappear, it just exposes our ignorance. Which is not a new point here.

Comment author: James_D._Miller 25 November 2007 03:27:32PM 0 points [-]

TGGP,

I have not read the Myth of the Rule of Law.

Comment author: JulianMorrison 25 November 2007 04:16:48PM -3 points [-]

Given that it's impossible for someone to know your total mind without being it, the only safe genie is yourself.

From the above it's easy to see why it's never possible to define the "best interests" of anyone but your own self. And from that it's possible to show that it's never possible to define the best interests of the public, except through their individually chosen actions. And from that you can derive libertarianism.

Just an aside :-)

Comment deleted 09 February 2010 08:30:35PM [-]

Comment author: JulianMorrison 10 February 2010 12:01:25PM *  -1 points [-]

Not enough information. The genie is programmed to do what with that knowledge? If it's CEV done right, it's safe.

Comment author: Eric_1 25 November 2007 04:33:23PM 0 points [-]

"Ultimately, most objects, man-made or not are 'black boxes.'"

OK, I see what you're getting at.

Four questions about black boxes:

1) Does the input have to be fully known/observable to constitute a black box? When investigating a population of neurons, we can give stimulus to these cells, but we cannot be sure that we are aware of all the inputs they are receiving. So we effectively do not entirely understand the input being given.

2) Does the output have to be fully known/observable to constitute a black box? When we measure the output of a population of neurons, we also cannot be sure of the totality of information being sent out, due to experimental limitations.

3) If one does not understand a system one uses, does that fact alone make that system a black box? In that case there are absolute black boxes, like the human mind, about which complete information *is not known*, and relative black boxes, like the car or TCP/IP, about which complete information *is not known to the current user*.

4) What degree of understanding is sufficient for something not to be called a black box?

Depending on how we answer these things, it will determine whether black box comes to mean:

1) Anything that is identifiable as a 'part', whose input and output is known but whose intermediate working/processing is not understood.

2) Anything that is identifiable as a 'part' whose input, output and/or processing is not understood.

3) Any 'part' that is not completely understood (i.e. presuming access to all information).

4) Anything that is not understood by the user at the time.

5) Anything that is not FULLY understood by the user at the time.

We will quickly be in the realm where anything and everything on earth is considered to be a black box, if we take the latter definitions. So how can this word/metaphor be most profitably wielded?

Comment author: Recovering_irrationalist 25 November 2007 04:57:59PM 0 points [-]

TGGP: What did you think of it? I agree till the Socrates Universe, but thought the logic goes downhill from there.

Comment author: mtraven 25 November 2007 05:10:02PM 0 points [-]

tggp, that paper was interesting, although I found its thesis unremarkable. You should share it with our pal Mencius.

Comment author: Kevin2 25 November 2007 05:48:29PM 0 points [-]

Upon some reflection, I remembered that Robin has shown that two Bayesians who share the same priors can't disagree. So perhaps you can get your wish from an unsafe genie by wishing, "... to run a genie that perfectly shares my goals and prior probabilities."

Comment author: Recovering_irrationalist 25 November 2007 06:03:49PM 0 points [-]

As long as you're wishing, wouldn't you rather have a genie whose prior probabilities correspond to reality as accurately as possible? I wouldn't pick an omnipotent but equally ignorant me to be my best possible genie.

Comment author: Gray_Area 26 November 2007 12:49:18AM 0 points [-]

"As long as you're wishing, wouldn't you rather have a genie whose prior probabilities correspond to reality as accurately as possible?"

Such a genie might already exist.

Comment author: Caledonian2 26 November 2007 01:30:54AM 0 points [-]

In the first year of law school students learn that for every clear legal rule there always exist situations for which either the rule doesn't apply or for which the rule gives a bad outcome.

If the rule doesn't apply, it's not relevant in the first place. I doubt very much you can establish what a 'bad' outcome would involve in such a way that everyone would agree - and I don't see why your personal opinion on the matter should be of concern when we consider legal design.

Comment author: Recovering_irrationalist 26 November 2007 02:04:33AM 3 points [-]

Such a genie might already exist.

You mean GOD? From the good book? It's more plausible than some stories I could mention.

GOD, I meta-wish for an ((...Emergence-y Re-get) Emergence-y Re-get) Emergency Regret Button.

Comment author: Peter_de_Blanc 26 November 2007 04:53:04AM 1 point [-]

Recovering Irrationalist said:

I wouldn't pick an omnipotent but equally ignorant me to be my best possible genie.

Right. It's silly to wish for a genie with the same _beliefs_ as yourself, because the system consisting of you and an unsafe genie is already such a genie.

Comment author: TGGP4 26 November 2007 06:11:09AM 0 points [-]

I discussed "The Myth of the Rule of Law" with Mencius Moldbug here. I recognize that politics alters the application of law and that as long as it is written in natural language there will be irresolvable differences over its meaning. At the same time I observe that different countries seem to hold different levels of respect for the "rule of law" that the state is expected to obey, and it appears to me that those more prone to do so have more livable societies. I think the norm of neutrality on the part of judges applying law with objective meaning is good to be promoted. When there is bad law it is properly the job of the legislature to fix it. This makes it easier for people to know what the law is in advance so they can avoid being smacked with it.

Comment author: AnnaSalamon 26 November 2007 08:00:08AM 1 point [-]

"You cannot predict, in advance, which of your values will be needed to judge the path through time that the genie takes.... The only safe genie is a genie that shares all your judgment criteria."

Is a genie that *does* share all my judgment criteria necessarily safe?

Maybe my question is ill-formed; I am not sure what "safe" could mean besides "a predictable maximizer of my judgment criteria". But I am concerned that human judgment under ordinary circumstances increases some sort of Beauty/Value/Coolness which would not be increased if that same human judgment was used to search over a less restricted set of possibilities.

The world is full of cases where selecting for A automatically increases B when you are searching over a restricted set of possibilities but does *not* increase B when those restrictions are lifted. Overfitting is a classic example. In cases of overfitting, if we search only over a restricted set of few-parameter models, models that do well on the training set will automatically do well on the generalization set, but if we allow more parameters the correlation disappears.
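
The overfitting case can be made concrete with a small sketch. Everything here is an assumption chosen for illustration: the data (a true relationship y = x with alternating noise of ±0.5 on the training labels), the two model families (a least-squares line and a degree-5 polynomial through every training point), and all function names. It is not anyone's actual experiment, only a toy in which training error stops tracking held-out error once the parameter count is unrestricted.

```python
# Toy data: the true relationship is y = x; the training labels carry
# alternating noise of +/-0.5. All numbers are made up for illustration.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = list(test_x)  # noise-free held-out points


def fit_line(xs, ys):
    """Two-parameter model: ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Six-parameter model: the Lagrange polynomial through every point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
poly = fit_interpolant(train_x, train_y)

for name, model in [("line", line), ("interpolant", poly)]:
    print(name,
          "train:", round(mse(model, train_x, train_y), 4),
          "test:", round(mse(model, test_x, test_y), 4))
```

The interpolant wins on the training set, since it reproduces every noisy label, and loses on the held-out points. Selecting for "low training error" only buys "low generalization error" while the search stays inside the two-parameter family.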

Modern marketing / product development can search over a larger set of alternatives than we used to have access to. In many cases human judgments correlate with less when used on modern manufactured goods than when used on the smaller set of goods that was formerly available. Judgments of tastiness used to correlate with health but now do not. Judgments of "this is a limited resource which I should grab quickly" used to indicate resources which we really should grab quickly but now do not (because of manufactured "limited time offer only" signs and the like).

Genies or AGI's would search over an even larger space of possibilities than contemporary marketing searches over. In this larger space, many of the traditional correlates of human judgment will disappear. That is: in today's restricted search spaces, outcomes which are ranked highly according to human judgment criteria tend also to have various other properties P1, P2, ... Pk. In an AGI's search space, outcomes which are ranked highly according to human judgment criteria will not have properties P1... Pk.

I am worried that properties P1...Pk are somehow valuable. That is, I am worried that in this world human judgments pick out outcomes that are somehow valuable and that human judgments' ability to do this resides, not in our judgment criteria alone (which would be uploaded into our imagined genie) but in the conjunction of our judgment criteria with the restricted set of possibilities that has so far been available to us.

Comment author: starwed 26 November 2007 11:02:36AM 1 point [-]

"Whatever proposition you can manage to input into the Outcome Pump, somehow happens, though not in a way that violates the laws of physics. If you try to input a proposition that's too unlikely, the time machine will suffer a spontaneous mechanical failure before that outcome ever occurs."

So, a kind of Maxwell's demon? :)

Comment author: Stanislav_Datskovskiy 26 November 2007 12:11:27PM 3 points [-]

Rather than designing a genie to exactly match your moral criteria, the simple solution would be to cheat and use *yourself* as the genie. What the Outcome Pump should solve for is your own future satisfaction. To that end, you would omit all functionality other than the "regret button", and make the latter default-on, with activation by anything other than a satisfied-you vanishingly improbable. Say, with a lengthy password.

Of course, you could still end up in a universe where your brain has been spontaneously re-wired to hate your mother. However, I think that such an event is far less likely than a proper rescue.

Comment author: David_C 26 November 2007 12:23:28PM 0 points [-]

You have a good point about the exhaustiveness required to ensure the best possible outcome. In that case the ability of the genie to act "safely" would depend upon the level of the genie's omniscience. For example, if the genie could predict the results of any action it took, you could simply ask it to select any path that results in you saying "thanks genie, great job" without coercion. Therefore it would effectively be using you as an oracle of success or failure.

A non-omniscient genie would either need complete instructions, or would only work well where there was an ideal solution. For example, if you wished for your mother to be rescued by a fireman without anyone dying or experiencing damage to more than 2% of their skin, bones or internal organs. The difficulty is when not all your criteria can be satisfied. Things suddenly become very murky.

Comment author: Stuart_Armstrong 26 November 2007 04:52:46PM 1 point [-]

With a safe genie, wishing is superfluous. Just run the genie.

But while most genies are terminally unsafe, there is a domain of "nearly-safe" genies, which must dwarf the space of "safe" genies (examples of a nearly-safe genie: one that picks the moral code of a random living human before deciding on an action or a safe genie + noise). This might sound like semantics, but I think the search for a totally "safe" genie/AI is a pipe-dream, and we should go for "nearly safe" (I've got a short paper on one approach to this here).

Comment author: Nick_Tarleton 26 November 2007 07:08:42PM 0 points [-]

I am worried that properties P1...Pk are somehow valuable.

In what sense can they be valuable, if they are not valued by human judgment criteria (even if not consciously most of the time)?

For example, if the genie could predict the results of any action it took, you could simply ask it to select any path that results in you saying "thanks genie, great job" without coercion.

Formalizing "coercion" is itself an exhaustive problem. Saying "don't manipulate my brain except through my senses" is a big first step, but it doesn't exclude, e.g., powerful arguments that you don't really want your mother to live.
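
A toy sketch of why each patch to "coercion" leaves another loophole. The plan list, the `effort` numbers, and the `channel` labels are all hypothetical; the only point is that an optimizer which selects on "the user says thanks" moves to the next-cheapest plan that satisfies the letter of the wish once one channel is banned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    effort: int             # lower = easier for the genie (made-up numbers)
    user_says_thanks: bool  # the proxy the wish is stated in terms of
    channel: str            # how the plan influences the user
    mother_alive: bool      # what the user actually cared about


PLANS = [
    Plan("rewire the user's brain directly", 1, True, "direct brain edit", False),
    Plan("present an overwhelming argument", 2, True, "senses", False),
    Plan("firefighter carries mother out", 5, True, "senses", True),
    Plan("do nothing", 0, False, "senses", False),
]


def pick(plans, banned_channels):
    """Return the lowest-effort plan that earns thanks via an allowed channel."""
    allowed = [p for p in plans
               if p.user_says_thanks and p.channel not in banned_channels]
    return min(allowed, key=lambda p: p.effort)


unconstrained = pick(PLANS, banned_channels=set())
patched = pick(PLANS, banned_channels={"direct brain edit"})

print(unconstrained.name)
print(patched.name, "| mother alive:", patched.mother_alive)
```

Banning direct brain edits changes which plan is chosen but not whether the mother survives; the rescue is only selected once every cheaper route to "thanks" has been enumerated and excluded.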

Comment author: Benquo 26 November 2007 11:00:37PM 0 points [-]

Nick,

Are you thinking of magically strong arguments, or ones that convince because they provide good reasons?

I'd think the latter would be valuable even if it leads to a result you'd initially suppose to be bad.

Comment author: Nick_Tarleton 27 November 2007 12:01:15AM 0 points [-]

The first.

Comment author: AnnaSalamon 27 November 2007 01:02:47AM 0 points [-]

"In what sense can [properties P1...Pk] be valuable, if they are not valued by human judgment criteria (even if not consciously most of the time)?"

I don't know. It might be that the only sense in which something can be valuable is to look valuable according to human judgment criteria (when thoroughly implemented, and well informed, and all that). If so, my concern is ill-formed or irrelevant.

On the other hand, it seems *possible* that human judgments of value are an imperfect approximation of what is valuable in some other (external?) sense. Imagine for example if we met multiple alien races and all of them said "I see what you're getting at with this 'value/goodness/beauty/truth' thing, but you are misunderstanding it a bit; in a few thousand years, you will modify your root judgment criteria in such-and-such a way." In that case I would wonder whether my current judgment criteria were not best understood as an approximation of this other set of criteria and whether it was not value according to this other set of criteria that I should be aiming for.

If human judgment criteria *are* an approximation of some other kind of value, they would probably cease to approximate that other kind of value when used to search over the large space of genie-accessible possibilities.

By way of analogy, scientists' criteria for judging scientific truth/relevance/etc. seem to be changing usefully over time, and it may be that scientists' criteria at different times can be viewed as successive approximations of some other (external?) truth-criteria. Galilean physicists had one way of determining what to believe, Newtonians another, and contemporary physicists yet another. In the restricted set of situations considered by Galilean physicists, Galilean methods yield approximately the same predictions as the methods of contemporary physicists. In the larger space of genie-accessible situations, they do not.

Comment author: Benquo 29 November 2007 05:41:36AM 0 points [-]

Nick,

What makes you think that magically strong arguments are possible? I can imagine arguments that work better than they should because they indulge someone's unconscious inclinations or biases, but not ones that work better than their truthfulness would suggest and cut against the grain of one's inclinations.

Comment author: Nick_Tarleton 29 November 2007 01:09:24PM 1 point [-]

I don't know that they are, but it's the conservative assumption, in that it carries less risk of the world being destroyed if you're wrong. Also, see the AI-box experiments.

Comment author: maki_hodnett 09 December 2007 07:38:19PM -2 points [-]

I think the best way is to believe that you and the genie are one, and that it is therefore necessary to be grateful for everything you currently have. This creates a loop: you can then be grateful, right now, for things you "will" have. For instance, you can begin by affirming and feeling within yourself the gratitude for your financial wealth. Financial wealth... starts to appear!

Comment author: kyb 13 June 2008 03:29:23PM 0 points [-]

Excellent post.

Comment author: cousin_it 26 July 2009 07:05:05AM *  3 points [-]

Damn, it took me a long time to make the connection between the Outcome Pump and quantum suicide reality editing. And the argument that proves the unsafety of the Outcome Pump is perfectly isomorphic to the argument why quantum immortality is scary.

Comment author: MoreOn 21 February 2011 07:36:56PM 0 points [-]

"I wish that the genie could understand a programming language."

Then I could program it unambiguously. I obviously wouldn't be able to program my mother out of the burning building on the spot, but at least there would be a host of other wishes I could make that the genie won't be able to screw up.

Comment author: DevilMaster 25 March 2011 01:54:29PM 0 points [-]

"I wish that wishes would be granted as the wisher would interpret them".

Comment author: FAWS 25 March 2011 02:05:22PM 0 points [-]

Doesn't protect against unforeseen consequences and is possibly underspecified (How should the wish work when it needs to affect things the wisher doesn't understand? Create a version of the wisher that does understand? What if there are multiple possible versions that don't agree on interpretations among each other?).

Comment author: pengvado 25 March 2011 02:26:35PM 1 point [-]

Doesn't protect against a reflectively-consistent misinterpretation of "as the wisher would interpret them".

Comment author: RobertLumley 13 September 2011 11:09:08PM 0 points [-]

You wouldn't want to swap a human life for hers, but what about the life of a convicted murderer?

Are convicted murderers not human?

Comment author: ajuc 01 March 2012 08:36:28PM 0 points [-]

So suppose I specified to the Outcome Pump that I want the outcome where the person who is the future version of me (by DNA, and by physical continuity of the body) will write "ABRACADABRA, this outcome is good enough and I value it at $X" on a piece of paper and put it on the Outcome Pump, where $X is how much I value the outcome. (And if this doesn't happen within one year, I don't want that outcome either.)

Are there any loopholes?

Comment author: Qiaochu_Yuan 04 January 2013 12:14:17PM 3 points [-]

Genie takes over your body.

Comment author: Jiro 20 August 2013 10:37:18PM 0 points [-]

If the genie is clueless but not actively malicious, then you can ask the genie to describe how it will fulfill your wish. If it describes making the building explode and having your mother's dead body fly out, you correct the genie and tell it to try again. If it gives an inadequate description (says the building explodes and fails to mention what happens to the mother's body at all), you can ask it to elaborate. If it gives a description that is inadequate in exactly the right way to make you think it's describing it adequately while still leaving a huge loophole, there's not much you can do, but that's not a clueless genie, that's an actively malicious genie pretending to be a clueless one.
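
The procedure described above can be sketched as a propose-and-veto loop. This is a minimal sketch under strong assumptions: the genie answers every question honestly (clueless, not malicious), the plans and their consequences are invented for illustration, and `review` is a hypothetical helper rather than anything from the post.

```python
def review(plans, must_hold):
    """Return (accepted plan name or None, names of rejected plans).

    plans: dicts in the order the genie proposes them.
    must_hold: maps each question the wisher asks to the required answer.
    """
    rejected = []
    for plan in plans:
        # An honest genie answers each question from the plan's actual
        # consequences; a question it cannot answer comes back as None.
        answers = {q: plan["consequences"].get(q) for q in must_hold}
        if answers == must_hold:
            return plan["name"], rejected
        rejected.append(plan["name"])
    return None, rejected


PLANS = [
    {"name": "explode the building",
     "consequences": {"mother alive": False, "bystanders unharmed": False}},
    {"name": "set off the gas main to blow the door open",
     "consequences": {"mother alive": True, "bystanders unharmed": False}},
    {"name": "open the door and guide her out",
     "consequences": {"mother alive": True, "bystanders unharmed": True}},
]

# A wisher who asks both questions vetoes the first two proposals.
careful = review(PLANS, {"mother alive": True, "bystanders unharmed": True})

# A wisher who only thinks to ask about the mother accepts the gas main.
narrow = review(PLANS, {"mother alive": True})

print(careful)
print(narrow)
```

The loop filters out exactly the consequences the wisher thinks to ask about, so its safety depends on how complete the question list is.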

Comment author: shminux 20 August 2013 10:56:39PM *  0 points [-]

So your recommendation is to use a human as a part of the genie's outcome utility evaluator, relying on human intelligence when deciding between multiple low-probability (i.e. miraculous) events? Even though people have virtually no intuition when dealing with them? I suspect the results would be pretty grave, but on a larger scale, since the negative consequences would be non-obvious and possibly delayed.

Comment author: Jiro 21 August 2013 06:04:48PM -1 points [-]

A genie asked to rescue my mother from a burning building would do it by performing acts that, while miraculous, will be part of a chain of events that is comprehensible by humans. If the genie throws my mother out of the building at 100 miles per hour, for instance, it is miraculous that anyone can throw her out at that speed, but I certainly understand what it means to do that and am able to object. Even if the genie begins by manipulating some quantum energies in a way I can't understand, that's part of a chain of events that leads to throwing, a concept that I do understand.

Yes, it is always possible that there are delayed negative consequences. Suppose it rescues my mother by opening a door and I have no idea that 10 years from now the mayor is going to be saved from an assassin by the door of a burned out wreck being in the closed position and blocking a bullet. But that kind of negative consequence is not unique to genies, and humans go around all their lives doing things with such consequences. Maybe the next time I donate to charity I have to move my arm in such a way that a cell falls in the path of an oncoming cosmic ray, thus giving me cancer 10 years later. As long as the genie isn't actively malicious and just pretending to be clueless, the risk of such things is acceptable for the same reason it's acceptable for non-genie human activities. Furthermore, if the genie is clueless, it won't hide the fact that its plan would kill my mother--indeed, it doesn't even know that it would need to hide that, since it doesn't know that that would overall displease me. So I should be able to figure out that that's its plan by talking to it.

Comment author: shminux 21 August 2013 06:13:20PM *  -1 points [-]

Right, when humans do the usual human things, they put up with the butterfly effect and rely on their intuition and experience to reduce the odds of screwing things up badly in the short term. However, when evaluating the consequences of miracles we have nothing to guide us, so relying on a human evaluator in the loop is no better than relying on a three-year-old to stay away from a ledge or candy box. Neither has a clue.

Comment author: MugaSofer 21 August 2013 07:05:41PM 0 points [-]

A genie asked to rescue my mother from a burning building would do it by performing acts that, while miraculous, will be part of a chain of events that is comprehensible by humans. If the genie throws my mother out of the building at 100 miles per hour, for instance, it is miraculous that anyone can throw her out at that speed, but I certainly understand what it means to do that and am able to object. Even if the genie begins by manipulating some quantum energies in a way I can't understand, that's part of a chain of events that leads to throwing, a concept that I do understand.

This is, of course, not true of superintelligence ... is that your point?

As long as the genie isn't actively malicious and just pretending to be clueless, the risk of such things is acceptable for the same reason it's acceptable for non-genie human activities.

Not really. The genie will look in parts of solution-space you wouldn't (eg setting off the gas main, killing everyone nearby.)

Furthermore, if the genie is clueless, it won't hide the fact that its plan would kill my mother--indeed, it doesn't even know that it would need to hide that, since it doesn't know that that would overall displease me. So I should be able to figure out that that's its plan by talking to it.

Well, if it can talk. And it doesn't realise that you would sabotage the plan if you knew.

Comment author: Jiro 21 August 2013 08:01:10PM *  0 points [-]

This is, of course, not true of superintelligence ... is that your point?

Why would this not be true of superintelligence, assuming the intelligence isn't actively malicious?

The genie will look in parts of solution-space you wouldn't (eg setting off the gas main, killing everyone nearby.)

"Talk to the genie" doesn't require that I be able to understand the solution space, just the result. If the genie is going to frazmatazz the whatzit, killing everyone in the building, I would still be able to discover that by talking to the genie. (Of course, I can't reduce the chance of disaster to zero this way, but I can reduce it to an acceptable level matching other human activities that don't have genies in them.)

Well, if it can talk. And it doesn't realise that you would sabotage the plan if you knew.

If it realizes I would sabotage the plan, then it knows that the plan would not satisfy me. If it pushes for the plan knowing that it won't satisfy me, then it's an actively malicious genie, not a clueless one.

Comment author: MugaSofer 24 August 2013 12:26:33PM 0 points [-]

A genie asked to rescue my mother from a burning building would do it by performing acts that, while miraculous, will be part of a chain of events that is comprehensible by humans. If the genie throws my mother out of the building at 100 miles per hour, for instance, it is miraculous that anyone can throw her out at that speed, but I certainly understand what it means to do that and am able to object.

Superintelligence can use strategies you can't understand.

The genie will look in parts of solution-space you wouldn't (eg setting off the gas main, killing everyone nearby.)

"Talk to the genie" doesn't require that I be able to understand the solution space, just the result. If the genie is going to frazmatazz the whatzit, killing everyone in the building, I would still be able to discover that by talking to the genie. (Of course, I can't reduce the chance of disaster to zero this way, but I can reduce it to an acceptable level matching other human activities that don't have genies in them.)

That was in response to the claim that genies' actions are no more likely to have unforeseen side-effects than human ones.

If it realizes I would sabotage the plan, then it knows that the plan would not satisfy me. If it pushes for the plan knowing that it won't satisfy me, then it's an actively malicious genie, not a clueless one.

... no, that's kind of the definition of a clueless genie. A malicious one would be actively seeking out solutions that annoy you.

(Also, some Good solutions might require fooling you for your own good, if only because there's no time to explain.)

Comment author: Jiro 24 August 2013 05:05:29PM *  0 points [-]

Superintelligence can use strategies you can't understand.

There's a contradiction between "the superintelligence will do something you don't want" and "the superintelligence will do something you don't understand". Not wanting it implies I understand enough about it to not want it (even if I don't understand every single step).

that's kind of the definition of a clueless genie

I would consider a clueless genie to be a genie that tries to grant my wishes, but because it doesn't understand me, grants my wishes in a way that I wouldn't want. A malicious genie is a genie that grants my wishes in a way that it knows I wouldn't want. Reserving that term for genies that intentionally annoy while excluding genies that merely knowingly annoy is hairsplitting and only changes the terminology anyway.

Also, some Good solutions might require fooling you for your own good, if only because there's no time to explain.

If I would in fact want genies to fool me for my own good in such situations, this isn't a problem.

On the other hand, if I think that genies should not try to fool me for my own good in such situations, and the genie knows this, and it fools me for my own good anyway, it's a malicious genie by my standards. The genie has not failed to understand me; it understands what I want perfectly well, but knowingly does something contrary to its understanding of my desires. In the original example, the genie would be asked to save my mother from a building, it knows that I don't want it to explode the building to get her out, and it explodes the building anyway.

Comment author: MugaSofer 26 August 2013 03:17:24PM *  0 points [-]

There's a contradiction between "the superintelligence will do something you don't want" and "the superintelligence will do something you don't understand". Not wanting it implies I understand enough about it to not want it (even if I don't understand every single step).

Well, firstly, there might be things you wouldn't want if you could only understand them. But actually, I was thinking of actions that would affect society in subtle, sweeping ways. Sure, if the results were explained to you, you might not like them, but you built the genie to grant wishes, not explain them. And how sure are you that's even possible, for all possible wish-granting methods?

I would consider a clueless genie to be a genie that tries to grant my wishes, but because it doesn't understand me, grants my wishes in a way that I wouldn't want. A malicious genie is a genie that grants my wishes in a way that it knows I wouldn't want. Reserving that term for genies that intentionally annoy while excluding genies that merely knowingly annoy is hairsplitting and only changes the terminology anyway.

Well, that's what the term usually means. And, honestly, I think there's good reason for that; it takes a pretty precise definition of "non-malicious genie", AKA FAI, not to do Bad Things, which is kind of the point of this essay.

Comment author: Jiro 26 August 2013 03:36:26PM *  2 points [-]

Sure, if the results were explained to you, you might not like them, but you built the genie to grant wishes, not explain them.

That's why I suggested you can talk to the genie. Provided the genie is not malicious, it shouldn't conceal any such consequences; you just need to quiz it well.

It's sort of like the Turing test, but used to determine wish acceptability instead of intelligence. If a human can talk to it and say it is a person, treat it like a person. If a human can talk to it and decide the wish is good, treat the wish as good. And just like the Turing test, it relies on the fact that humans are better at asking questions during the process than writing long lists of prearranged questions that try to cover all situations in advance.

Well, that's what the term usually means.

Really? A clueless genie is a genie that is asked to do something, knows that the way it does it is displeasing to you, and does it anyway? I wouldn't call that a clueless genie.

What terms would you use for

-- a genie that would never knowingly displease you in granting wishes, but may do so out of ignorance

-- a genie that will knowingly displease you in granting wishes

-- a genie that will deliberately displease you in granting wishes?

Comment author: MugaSofer 26 August 2013 04:35:53PM 1 point [-]

More full response coming soon to a comment box near you. For now, terms! Everyone loves terms.

Really?

Here's how I learned it:

A "genie" will grant your wishes, without regard to what you actually want.

A malicious genie will grant your wishes, but deliberately seek out ways to do so that will do things you don't actually want.

A helpful - or Friendly - genie will work out what you actually wanted in the first place, and just give you that, without any of this tiresome "wishing" business. Sometimes called a "useful" genie - there's really no one agreed-on term. Essentially, what you're trying to replicate with carefully-worded wishes to other genies.

Comment author: private_messaging 28 August 2013 09:34:14PM *  3 points [-]

There was a story with an "outcome pump" like this, I do not remember the name. Essentially, a chemical had to get soaked with water due to some time travel related handwave. You could do minor things like getting your mom out of the building by pouring water on the chemical if you are satisfied with the outcome, with some risk that a hurricane would form instead and soak the chemical. It would produce the least improbable outcome (in the sense that all probabilities would become as if it is given that the chemical got soaked, so naturally the least improbable one had the highest chance to have occurred), so its impact was generally quite limited - to do real damage you had to lock up the chemical in a very strong safe. With a minor plot hole that the least improbable condition was for the chemical to not get locked up in the safe in the first place.
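
The "as if it is given that the chemical got soaked" reading is ordinary conditioning: discard the outcomes where the condition fails and renormalize the rest. A minimal sketch, with a made-up prior over three outcomes (the descriptions and probabilities are assumptions, not taken from the story):

```python
from fractions import Fraction


def condition(prior, predicate):
    """Condition a distribution on a predicate and renormalize."""
    kept = {o: p for o, p in prior.items() if predicate(o)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no outcome satisfies the condition")
    return {o: p / total for o, p in kept.items()}


# Each outcome is (description, chemical_got_soaked).
prior = {
    ("nothing happens", False): Fraction(90, 100),
    ("owner pours water on it", True): Fraction(9, 100),
    ("hurricane floods the lab", True): Fraction(1, 100),
}

posterior = condition(prior, lambda outcome: outcome[1])
most_likely = max(posterior, key=posterior.get)

for (description, _), p in posterior.items():
    print(description, p)
```

Conditioning preserves the relative odds among the surviving outcomes, so the mundane route stays nine times as likely as the hurricane. The device only does damage when the mundane routes are removed, which is what locking the chemical in a safe amounts to.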

Comment author: David_Gerard 31 August 2013 10:16:36AM 2 points [-]

Isaac Asimov's thiotimoline stories. The last turned it into a space drive.

Comment author: TheAncientGeek 10 January 2014 01:47:14PM *  0 points [-]

Indeed, it shouldn't be necessary to say anything. To be a safe fulfiller of a wish, a genie must share the same values that led you to make the wish. Otherwise the genie may not choose a path through time which leads to the destination you had in mind, or it may fail to exclude horrible side effects that would lead you to not even consider a plan in the first place.

No, the genie need not share the values. It only needs to want to give you what you are really wishing for, i.e. what you would give yourself if you had its powers. It can do that by discovering your value structure and running a simulation. It doesn't have to hold to your values itself.

This also applies to real-world examples. I can play along with values I don't hold myself, as people do when they travel to other countries with different cultures.

Comment author: TheOtherDave 10 January 2014 01:55:44PM 0 points [-]

A genie who gives me what I would give myself is far from being a safe fulfiller of a wish.

Comment author: TheAncientGeek 10 January 2014 02:22:01PM *  0 points [-]

Because?

Comment author: TheOtherDave 10 January 2014 02:24:28PM 0 points [-]

Because I am not guaranteed to only give myself things that are safe.

Comment author: TheAncientGeek 10 January 2014 03:43:14PM 0 points [-]

You would give yourself what you like. Maybe you like danger. People voluntarily parachute and mountain-climb. If the unsafe thing you get is what you want, where is the problem?

Comment author: TheOtherDave 10 January 2014 06:19:21PM 0 points [-]

Sure, if all I care about is whether I get what I want, and I don't care about whether my wishes are fulfilled safely, then there's no problem.

Comment author: TheAncientGeek 17 January 2014 03:47:52PM *  -2 points [-]

It has been stated that this post shows that all values are moral values (or that there is no difference between morality and valuation in general, or..) in contrast with the common sense view that there are clear examples of morally neutral preferences, such as preferences for different flavours of ice cream.

I am not convinced by the explanation, since it also applies to non-moral preferences. If I have a lower-priority non-moral preference to eat tasty food, and a higher-priority preference to stay slim, I need to consider my higher-priority preference when wishing for yummy ice cream.

To be sure, an agent capable of acting morally will have morality among their higher-priority preferences -- it has to be among the higher-priority preferences, because it has to override other preferences for the agent to act morally. Therefore, when they scan their higher-priority preferences, they will happen to encounter their moral preferences. But that does not mean any preference is necessarily a moral preference. And their moral preferences override other preferences, which are therefore non-moral, or at least less moral.

There is no safe wish smaller than an entire human morality.

There is no safe wish smaller than the whole subset of the value structure, moral or amoral, that sits above it in priority. The subset below doesn't matter. However, a value structure need not be moral at all, and the lower stories will probably be amoral even if the upper stories are not.

Therefore morality is in general a subset of preferences, as common sense maintained all along.