So I have a conundrum. Imagine that Omega comes to you and offers you two choices:

First choice: You get a moment of moderate pain, say a slap followed by another slap, so that your face hurts for a couple of minutes and causes you some anguish. After the pain has faded but while you still remember it, Omega measures your discomfort and gives you exactly enough money to produce joy that compensates for the pain, plus one cent. By construction, the utility of this choice is one cent.

Second choice: Omega inflicts hell on you for a finite amount of time. All your worst fears come true, you are unable to distinguish this hell from reality, and you experience the most painful sensations imaginable. After this finite time is over, Omega deletes all memory of it and gives you essentially unlimited monetary funds; still, this payoff would not quite compensate for the hell if you could remember it. By construction, the expected value of this choice is negative.[1]

If we go by expected value, the first choice is obviously better. (Of course, Omega forces you to take one of the two choices, or else you just get hell forever; we want our thought experiment to work.) But if we go by the decision procedure of choosing the option under which our future self will feel best, the second choice seems better. I have not yet found a satisfying solution to this apparent paradox. Essentially, how does a rational actor deal with discomfort on the way to a pleasurable experience?

[1] I realize that this might be a weak point of my argument. Do we simply add up positive and negative utilons to get our expected value? Or do we already take the process of forgetting the pain into consideration? Maybe therein lies a solution to this paradox.

The typical way to handle this (for example, the AIXI model does this) is to just add up (integrate) the utility at each point in time and maximize the expected total. This sort of agent would quickly and without worry choose option 1.
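
A minimal sketch of that decision rule, with made-up utility numbers chosen only to mirror the post's setup (a small net-positive stream for option 1, a net-negative one for option 2 despite the big payoff):

```python
# Sketch of a cumulative-utility maximizer: it sums the utility experienced
# at every timestep of each option and picks the larger total, regardless of
# whether the agent will remember any of it afterwards.

def total_utility(utility_stream):
    """Sum (integrate) utility over all timesteps of an option."""
    return sum(utility_stream)

# Illustrative utility streams (made-up numbers mirroring the post):
option_1 = [-5.0, 5.01]                  # brief pain, then compensation + 1 cent
option_2 = [-100.0] * 10 + [999.0]       # prolonged hell, then a huge payoff

best = max([option_1, option_2], key=total_utility)
print(total_utility(option_1))   # ~0.01
print(total_utility(option_2))   # -1.0: the payoff doesn't quite cover the hell
print(best is option_1)          # True: this agent takes the slaps
```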

Obviously this is not the way humans actually work. Which is not to say that's particularly bad, but at least we should work some way, or else we risk not working at all.

Isn't the second option the same as Omega offering to clone you, put the clone in hell for a finite amount of time and then destroy it, and give you the money immediately (assuming the money is adjusted to compensate for any lost time in hell in the original example)? So the option is actually to be paid a lot of money in exchange for allowing Omega to torture a person (nominally "you") who will never experience any further positive utility. I would take two slaps in the face even without compensation instead of that option. I don't consider my similarity to a person as a reason to treat them as a redundant copy.

That option runs into the problem that you've just let Omega extort money by threatening to create a person, torture it, and then destroy it. That seems problematic in other ways.

Everything Omega does is horribly problematic because Omega is at best a UFAI. I've never seen "preemptively neutralize Omega completely" offered as an option in any of the hypothetical scenarios, even though that's obviously the very best choice.

Is it really in anyone's best interest to ever cooperate with Omega given that Omega seems intent on continuing a belligerent campaign of threats against humanity? "I'll give you a choice between $1,000,000 or $1,000 today, but tomorrow I might threaten to slap you or throw you in hell. Oh, btw, I just simulated you against your will 2^100 times to maintain my perfect record on one/two-boxing."

I may be overly tired and that may sound like hyperbole, but I do think that any rational agent encountering a far more powerful agent known to be at least not-friendly should think long and hard about the probability that the powerful agent can be trusted with even seemingly innocuous situations. There may be no way to Win. Some form of defection or defiance of the powerful agent may yield more dignity utilons than playing along with any of the choices offered by Omega. Survival machines may not value dignity and self-determination, but many humans value them quite highly.

Much clearer way to think about it.

I'd totally go for the memory loss/clone destruction option. To me it's the final outcome that matters most, so if you start with one poor me and end with one rich me without the memory of anything unpleasant, it's clearly a better option than ending up with one still-pretty-poor me with smarting cheeks. This is, of course, my subjective utility, I have no claim that it is better than anyone else's for them.

To me it's the final outcome that matters most ... it's clearly a better option than ending up with one still-pretty-poor me ... This is, of course, my subjective utility, I have no claim that it is better than anyone else's for them.

How could one know with any certainty what's better for them (in the murkier cases)? Alternatively, if you do have a process that allows you to learn what's better for you, you should claim that you can also help others apply that process in order to figure out what's better for them (which may be a different thing than what the process says about you).

You can of course decide what to do, but having the ability to implement your own decisions is separate from having the ability to find decisions that are reliably correct, from knowing that the decisions you make are clearly right, or from pursuing what in fact matters the most.

Does that apply only to copies of you or to people in general? Would you choose to torture all of humanity for a finite time, make them forget it, and then you receive 1 utilon?

Does that apply only to copies of you or to people in general?

As I explained, I do not presume to make decisions for others.

Would you choose to torture all of humanity for a finite time, make them forget it, and then you receive 1 utilon?

I would not, see above. A better question would have been "Would you choose to slightly inconvenience a person you dislike for a short time, make them forget it, and then you receive 3^^^3 utilons?" If I answered "yes" (and I probably would), then you could probe further to see where exactly my self-professed non-interference breaks down. This is the standard way of probing the dust-specks-vs-torture boundary and showing the resulting inconsistency.

Similar strategies apply to clarifying other seemingly absolute positions, including yours ("I don't consider my similarity to a person as a reason to treat them as a redundant copy.") Presumably at some point the answers become "I don't know", rather than Yes/No.

I am fairly certain the only way that I would treat a clone of myself differently than another independent person is if we continued to share internal mental experiences. Then again, I would probably stop thinking of myself and a random person off the street as different people if I started sharing mental experiences with them, too.

In other words, while I would reject sending my fully independent clone to hell in order to gain utility, I might agree to fully share the mental experience with the clone in hell so long as the clone also got to experience the extra utility Omega paid me to balance out hell. That brings up a rather interesting question; if two people share mental experiences do they achieve double the utility of each person individually, or merely the set union of their individual utilities? Or something else?

while I would reject sending my fully independent clone to hell in order to gain utility, I might agree to fully share the mental experience with the clone in hell so long as the clone also got to experience the extra utility Omega paid me to balance out hell.

This seems to contradict your earlier assertion that

the second option the same as Omega offering to clone you, put the clone in hell for a finite amount of time and then destroy it, and give you the money immediately

because if you and the clone are one and the same (no cloning happened, you were tortured and then memory-wiped), "both" of you reap the rewards.

because if you and the clone are one and the same (no cloning happened, you were tortured and then memory-wiped), "both" of you reap the rewards.

We are not the same person after the point of the decision. There's no continuity of experience. The tortured me experiences none of the utility, and the enriched me experiences none of the torture. That was why I thought of the cloning interpretation to begin with.

But if we go by the decision procedure to choose the option in which our future self will feel best

Which future self? There's more than one. The future selves that are in hell will feel much, much worse. The future selves after that will feel best.

I think that this isn't a problem about rational choice given a certain utility function, but about what your utility function should be in the first place, and there's no "correct" answer to that question.

No one can tell you whether you should prefer to be tortured horribly, then given amnesia, then given $1 trillion, or whether you should prefer to be slapped and then given $100.01. Decide which one you would prefer, and how much, and then we can talk about constructing a utility function that accurately reflects your preferences.

Once you've figured out your preferences and assigned consistent utilities, then we can do the sort of mathematical reasoning you are trying to do. For example, maybe you decide that hell+amnesia+riches has a utility of -100 and slaps+money a utility of +1 - by deciding, for instance, that you'd accept at most a 1% chance of hell+amnesia+riches in exchange for the slaps+money. Then we can immediately use this to, say, calculate what other bets you should and shouldn't take with regard to these options.
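
A quick sanity check of that calibration, reading the 1% as: you get slaps+money for certain, plus an added 1% risk of the hell outcome (the utilities are the illustrative -100 and +1 from above):

```python
# Expected-utility check for the calibration above (illustrative reading:
# slaps+money for certain, plus an additional probability p of hell).
u_hell = -100.0   # hell + amnesia + riches
u_slaps = 1.0     # slaps + $100.01

def expected_utility(p_hell):
    """Slaps+money for certain, with an extra probability p_hell of hell."""
    return u_slaps + p_hell * u_hell

print(expected_utility(0.01))       # 0.0: a 1% risk is exactly break-even
print(expected_utility(0.005))      # 0.5: a smaller risk is still worth taking
print(expected_utility(0.02) < 0)   # True: a 2% risk is worse than nothing
```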

But we can't tell you which one you should prefer: you have to decide that yourself. You're allowed to prefer hell+amnesia+riches if you want!

This runs into the "experiencing self" vs "remembering self" distinction. Conceptually it seems very troublesome to perform expected utility calculations on behalf of the experiencing self - the one who would suffer the pains in the above scenario.

From the perspective of the remembering self, pain only matters if it leaves a trace: if you can remember it, or if (unconsciously) it changes the choices you'd make in similar situations in future.

(Think of Sammy Jenkis in the movie Memento, who was shown not to be a "true" amnesiac - he avoided picking up toys that had previously been rigged to give him electric shocks, even though he behaved as though he had no memory of the past shocks. Yes, this is a fictional example - but despite being fictional it validly highlights a distinction lurking below the surface of the word "memory".)

From this perspective the disutility of the "hell" scenario consists only of the opportunity cost, i.e. while suffering hell you could instead have been doing something pleasant that you'd have remembered afterwards. But if the memories are deleted, along with any dispositions you may have acquired as a result of experiencing the pain, and so on - essentially restoring you to a previous backup - then the deleted pain will not count from the perspective of the remembering self.

(Noting the "backup" analogy in the previous paragraph, I have to acknowledge that my intuitions in this may be shaped in part by my experiences playing video games...)

And, for a non-hypothetical example of the remembering-self/experiencing-self tradeoff, Thinking, Fast and Slow uses colonoscopy methodology. We tend to remember pain in a peak-end way (the worst moment and the final moments), so if you prolong the colonoscopy's least bad part right at the end (by leaving the scope just inside the body for a little while longer than necessary), the patient remembers it as less awful than if the procedure had ended promptly. But the experiencing self goes through an extra period of low-grade discomfort.
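
A toy version of that comparison; the pain scores and the exact peak-end formula (average of the worst moment and the final moment) are simplifying assumptions of mine:

```python
# Toy peak-end comparison (pain scores and the exact formula are simplifications).

def remembered_pain(pain_per_minute):
    """Peak-end heuristic: memory tracks the worst moment and the final moment."""
    return (max(pain_per_minute) + pain_per_minute[-1]) / 2

def experienced_pain(pain_per_minute):
    """What the experiencing self actually goes through: the total."""
    return sum(pain_per_minute)

short_procedure = [4, 7, 8, 6]           # ends while still fairly painful
extended = short_procedure + [2, 2, 2]   # prolonged with mild discomfort at the end

print(remembered_pain(short_procedure), experienced_pain(short_procedure))  # 7.0 25
print(remembered_pain(extended), experienced_pain(extended))                # 5.0 31
# The extended procedure is remembered as milder even though it contains
# strictly more total pain.
```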

As I recall from my readings on amnesia, having no conscious recollection of events but nevertheless having an unconscious preference (or lack of preference) is fairly common. Essentially patients have impaired declarative (explicit) memory but some spared implicit perceptual and motor memory. So the fictional example of Sammy Jenkis is actually quite reality-based.

What needs to be distinguished in this scenario is whether Omega is wiping only your declarative memory or also getting rid of your implicit memory, which takes care of lower-level responses to stimuli that might otherwise cause problems after the event.

Having no conscious experience of events but having an unconscious something can also occur. This is what happens to patients with a severed corpus callosum when you show them two images or words so that each falls in only one visual hemifield (and thus reaches only one hemisphere). They only report seeing one, but the 'unseen' image can color their interpretation of the 'seen' one.

For example, if I show your talking half the word "cleave" and put a picture of two people hugging in the second, unseen slot, you're more likely to define "cleave" as "joining" than you would be if the second picture was of a knife cutting fruit.

The whole idea that [if you don't remember something then for the purposes of decision making it didn't happen] seems fundamentally ridiculous to me. Actually it's weirder than "didn't happen" in this example; it's "as if it was not about to happen", because you will have forgotten it at some time in the future. It seems even more bizarre to me that you suggest leaving this intact and instead resolving the paradox by changing the way you accumulate utilons. All this suggests to me that, when accounting for memory failures, you can't always correctly judge the best decision from anywhere other than an entirely external viewpoint. The fact that this is difficult to actually do is just another one of life's challenges.

I actually think poor decision making as a result of overly discounting future costs works both ways. People undervalue in hindsight the amount of effort they put into getting something once they have it. Imagine, for instance, asking someone on a tropical holiday whether it was worth working all those extra shifts to save up for it; they're bound to say yes. If they went on the holiday first and you asked them while they were working a bunch of extra shifts to pay the bill, you might get a different answer. The bias as I see it is to overvalue the present and discount both the future and the past.

So I don't really see why we should assume our future self knows any better than the present or past versions. They just have a different bias. What we need is a "timeless" version of ourself...

I think that the cumulative utility maximizer model might not really be appropriate if you allow the agent to be forgetful. Anyway, if you stick with it, then the agent picks option 1.

I've considered some variations of the model that pick option 2, but they all seem vulnerable to wireheading.
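
One toy variation of that kind, purely as an illustrative sketch (not exactly the models I had in mind): count only the utility of experiences the agent will still remember at the end. This rule prefers option 2, and the wireheading problem shows up immediately:

```python
# Toy "remembered-utility" rule (illustrative, not a standard model).
# Each timestep is (utility, still_remembered_afterwards).

def remembered_total(stream):
    """Count only the utility of experiences the agent will still remember."""
    return sum(u for u, remembered in stream if remembered)

option_1 = [(-5.0, True), (5.01, True)]               # the slaps are remembered
option_2 = [(-100.0, False)] * 10 + [(999.0, True)]   # the hell is erased

print(remembered_total(option_1))  # ~0.01
print(remembered_total(option_2))  # 999.0 -> this rule picks option 2
# Wireheading: suffering becomes free whenever its memory gets deleted, so the
# agent would rather pay to have bad experiences erased than avoided.
```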

Voted up for a clear situation in a compact description.

Pro up: on-topic for AI, compact, clear, open vulnerability, real confusion. Pro down: unrealistic Omega situation; forgetting as a method of reset seems off point.
