Counterfactual Mugging
Related to: Can Counterfactuals Be True?, Newcomb's Problem and Regret of Rationality.
Imagine that one day, Omega comes to you and says that it has just tossed a fair coin, and given that the coin came up tails, it decided to ask you to give it $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, Omega tells you that if the coin came up heads instead of tails, it'd give you $10000, but only if you'd agree to give it $100 if the coin came up tails.
Omega can predict your decision in case it asked you to give it $100, even if that hasn't actually happened; it can compute the counterfactual truth. Omega is also known to be absolutely honest and trustworthy, with no word-twisting, so the facts are really as it says: it really tossed a coin and really would've given you $10000.
From your current position, it seems absurd to give up your $100. Nothing good happens if you do that, the coin has already landed tails up, you'll never see the counterfactual $10000. But look at this situation from your point of view before Omega tossed the coin. There, you have two possible branches ahead of you, of equal probability. On one branch, you are asked to part with $100, and on the other branch, you are conditionally given $10000. If you decide to keep $100, the expected gain from this decision is $0: there is no exchange of money, you don't give Omega anything on the first branch, and as a result Omega doesn't give you anything on the second branch. If you decide to give $100 on the first branch, then Omega gives you $10000 on the second branch, so the expected gain from this decision is
-$100 * 0.5 + $10000 * 0.5 = $4950
So, this straightforward calculation tells you that you ought to give up your $100. It looks like a good idea before the coin toss, but it starts to look like a bad idea after the coin came up tails. Had you known about the deal in advance, one possible course of action would be to set up a precommitment. You contract a third party, agreeing that you'll lose $1000 if you don't give $100 to Omega, in case it asks for that. In this case, you leave yourself no other choice.
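The calculation above, and the effect of the third-party contract, can be checked with a minimal sketch. The function and parameter names are illustrative assumptions, not part of the original problem statement; the only numbers used are the $100, $10000, and $1000 from the text.

```python
def expected_gain(pays_on_tails, ask=100, reward=10_000, p_tails=0.5):
    """Expected gain, evaluated before the toss, of a fixed policy.

    Assumes Omega predicts the policy correctly, as the post stipulates.
    """
    if not pays_on_tails:
        # No exchange of money on either branch.
        return 0.0
    return -ask * p_tails + reward * (1 - p_tails)


def payoff_after_tails(pays, ask=100, penalty=0):
    """Payoff once the coin has come up tails.

    `penalty` models the hypothetical contract with a third party:
    the amount lost for refusing Omega.
    """
    return -ask if pays else -penalty


print(expected_gain(True), expected_gain(False))
print(payoff_after_tails(True), payoff_after_tails(False, penalty=1000))
```

With the contract in place, refusing after tails costs more than paying, which is the sense in which the precommitment leaves no other choice.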
But in this game, explicit precommitment is not an option: you didn't know about Omega's little game until the coin was already tossed and the outcome of the toss was given to you. The only thing that stands between Omega and your $100 is your ritual of cognition. And so I ask you all: is the decision to give up $100 when you have no real benefit from it, only counterfactual benefit, an example of winning?
P.S. Let's assume that the coin is deterministic, that in the overwhelming measure of the MWI worlds it gives the same outcome. You don't care about a fraction that sees a different result, in all reality the result is that Omega won't even consider giving you $10000, it only asks for your $100. Also, the deal is unique, you won't see Omega ever again.
Comments (256)
If some guy walked up to you and gave you this spiel, you'd be fully justified in telling him to get lost, or even seeking mental help for him.
The problem assumes Omega to be genuine, and trustworthy.
I don't see the difficulty. No, you don't win by giving Omega $100. Yes, it would have been a winning bet before the flip if, as you specify, the coin is fair. Your PS, in which you say to "assume that in the overwhelming measure of the MWI worlds it gives the same outcome", contradicts the assertion that the coin is fair, and so you have asked us for an answer to an incoherent question.
The difficulty comes from projecting the ideal decision theory on people. Look how many people are ready to pay up $100, so it must be a real difficulty.
The fairness of a coin is a property of your mind, not of the coin itself. The coin can be fair in a deterministic world, the same way you can have free will in deterministic world.
This doesn't sound right to me. The coin doesn't need to be quantum mechanical to be fair. Here is a fair but perfectly deterministic coin: the 1098374928th digit of pi, mod 2. I have no idea whether it's a zero or one. I could figure it out if you gave me enough time, as could Omega. If both of us agree not to take the time to figure it out in advance, we can use it as a fair coin. But in all Everett branches, it comes out the same way.
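A minimal sketch of such a deterministic coin, using Machin's formula with integer arithmetic. The function names, the guard-digit count, and the even/odd mapping to heads/tails are assumptions made for illustration; this simple method is far too slow for an index as large as the one in the comment, so the usage below sticks to small indices.

```python
def _arctan_inv(x, unity):
    """Integer approximation of arctan(1/x) * unity via its Taylor series."""
    term = unity // x
    total = term
    x2 = x * x
    n = 3
    sign = -1
    while term:
        term //= x2
        total += sign * (term // n)
        n += 2
        sign = -sign
    return total


def pi_decimal_digit(n, guard=10):
    """n-th decimal digit of pi (n=1 is the '1' in 3.14...).

    `guard` extra digits absorb truncation error; adequate for small n.
    """
    unity = 10 ** (n + guard)
    # Machin: pi = 4 * (4*arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * _arctan_inv(5, unity) - _arctan_inv(239, unity))
    return (pi_scaled // 10 ** guard) % 10


def deterministic_coin(n):
    """Digit mod 2, mapped (arbitrarily) to heads for 0 and tails for 1."""
    return "heads" if pi_decimal_digit(n) % 2 == 0 else "tails"


print(deterministic_coin(2))
```

The outcome is fixed in every Everett branch, yet anyone who has not computed the digit can still reasonably assign it probability 1/2.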
Better to say that your state of knowledge about the coin, prior to Omega appearing, is that it has a probability 1/2 of being heads and 1/2 of being tails. The MWI clause is supposed to make the problem harder by preventing you from assigning utility (once Omega appears) to your 'other selves' in other Everett branches. The problem is then just: "how, knowing that Omega might appear, but not knowing what the coin flip will be, can I maximise my utility?" If Omega appears in front of you right now then that's a different question.
Precommitting should be, as someone already said, signing a paper with a third party agreeing to give them $1000 in case you fail to give the $100 to Omega. Precommitment means you have no other option. You can't say that you both precommitted to give the $100 AND refused to do it when presented with the case.
Which means, if Omega presents you with the scenario before the coin toss, you precommit (by signing the contract with the third party). If Omega presents you with the scenario after the coin toss AND also tells you it has already come up tails - you haven't precommitted, therefore you shouldn't give it $100.
EDIT: Also, some people objected to not giving the $100, because they might be the emulation which Omega uses to predict whether you'd really give money. If you were an emulation, then you would remember precommitting in expectation to get $10,000 with a 50% chance. It makes no sense for Omega to emulate you in a scenario where you don't get a chance to precommit.
That level of precommitting is only necessary if you are unable to trust yourself to carry through with a self-imposed precommitment. If you are capable of this, you can decide now to act irrationally in certain future decisions in order to benefit to a greater degree than someone who can't. If the temptation to go back on your self-promise is too great in the failure case, then you would have lost in the win case - you are simply a fortunate loser who found out the flaw in his promise in the case where being flawed was beneficial. It doesn't change the fact that being capable of this decision would be a better strategy on average. Making yourself conditionally less rational can actually be a rational decision, and so the ability to do so can be a strength worth acquiring.
Ultimately the problem is the same as that of an ultimatum (eg. MAD). We want the other party to believe we will carry through even if it would be clearly irrational to do so at that point. As your opponent becomes better and better at predicting, you must become closer and closer to being someone who would make the irrational decision. When your opponent is sufficiently good (or you have insufficient knowledge as to how they are predicting), the only way to be sure is to be someone who would actually do it.
Okay, I agree that this level of precommitting is not necessary. But if the deal is really a one-time offer, then, when presented with the case of the coin already having come up tails, you can no longer ever benefit from being the sort of person who would precommit. Since you will never again be presented with a Newcomb-like scenario, you will have no benefit from being the precommitting type. Therefore you shouldn't give the $100.
If, on the other hand, you still expect that you can encounter some other Omega-like thing which will present you with such a scenario, doesn't this make the deal repeatable, which is not how the question was formulated?
In the repeatable scenario I believe, unlike Vladimir, that a real difference exists. Whatever decision process you use to decide not to pay $100 in one round, you can predict with high probability that that same process will operate in future rounds as well, leading to a total gain to you of about $0. On the other hand, you know that if your current decision process leads you to giving $100 in this case, then with high probability that same process will operate in future rounds, leading to a total gain to you of about $4950 x expected future rounds. Therefore, if you place a higher confidence in your ability to predict your future actions from your current ones than you do in your own reasoning process, you should give the $100 up. This makes the problem rather similar to the original Newcomb's problem, in that you assign higher probability that your reasoning is wrong if it causes you to two-box than you do to any reasoning which leads you to two-box.
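The repeated-rounds comparison can be illustrated with a small simulation. This is a sketch under stated assumptions: the same policy operates in every round, Omega predicts it perfectly, and the function name and seed are hypothetical choices for illustration.

```python
import random


def play_rounds(pays_on_tails, rounds, seed=0, ask=100, reward=10_000):
    """Total gain over repeated, independent counterfactual muggings."""
    rng = random.Random(seed)
    total = 0
    for _ in range(rounds):
        tails = rng.random() < 0.5
        if tails:
            # Omega asks for the money; only a payer hands it over.
            if pays_on_tails:
                total -= ask
        elif pays_on_tails:
            # Heads: Omega rewards only those it predicts would pay.
            total += reward
    return total


rounds = 1000
print("refuser:", play_rounds(False, rounds))
print("payer:  ", play_rounds(True, rounds))
print("payer, expected:", 4950 * rounds)
```

The refusing policy never exchanges money with Omega, so its total is exactly zero; the paying policy's total fluctuates around $4950 per round.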
This is a self-deception technique. If you think it's morally OK to self-deceive your future self for your current selfish ends, then by all means go ahead. Also, it looks like violent means of precommitment should actually be considered immoral, on par with forcing some other person to do your bidding by hiring a killer to kill them if they don't comply.
In the Newcomb's problem, it actually is in your self-interest to one-box. Not so in this problem.
I am fairly sure that it isn't, but demonstrating so would require another maths-laden article, which I anticipate would be received similarly to my last. I will however email you my entire reasoning if you so wish (you will have to wait several days while I brush up on the logical concept of common knowledge). (I don't know how to encode a ) in a link, so please add one to the end.)
Common knowledge (I used the %29 ASCII code for ")").
I'm going to write up my new position on this topic. Nonetheless I think it should be possible to discuss the question in a more concise form, since I think the problem is that of communication, not rigor. You deceive your future self, that's the whole point of the comment above, make it believe that it wants to make an action that it actually doesn't. The only disagreement position that I expect is saying that no, the future self actually wants to follow that action.
I think the problem with your article wasn't that it was math-laden, but that you didn't introduce things in sufficient detail to follow along, and to see the motivation behind the math.
To be perfectly honest, your last sentence is also my feeling. I should at the least have talked more about the key equation. But the article was already long, I was unsure as to how it would be received, and I spent too little time revising it (this is a persistent problem for me). If I were to write it again now, it would have been closer in style to the thread between you and me there.
If you intend to write another post, then I am happy to wait until then to introduce the ideas I have in mind, and I will try hard to do so in a manner that won't alienate everyone.
In a repeatable deal your action influences the conditions in the next rounds. Even if you defect in this round, you may still cooperate in the next rounds, Omegas aren't looking back at how you decided in the past, and don't punish you by not offering the deals. Your success in the following rounds (from your current point of view) depends on whether you manage to precommit to the future encounters, not on what you do now.
If you think that through and decide that way, then your precommitting method didn't work. The idea is that you must somehow now prevent your future self from behaving rationally in that situation - if they do, they will perform exactly the thought process you describe. The method of doing so, whether making a public promise (and valuing your spoken word more than $100), hiring a hitman to kill you if you renege or just having the capability of reliably convincing yourself to do so (effectively valuing keeping faith with your self-promise more than $100) doesn't matter so long as it is effective. If merely deciding now is effective, then that is all that's needed.
If you do then decide to take the rational course in the losing coinflip case, it just means you were wrong by definition about your commitment being effective. Luckily in this one case, you found it out in the loss case rather than the win case. Had you won the coin flip, you would have found yourself with nothing though.
How do I know that? I would assign a lower prior probability to that than to me waking up tomorrow with a blue tentacle instead of my right arm; so, in such a situation, I would just believe Omega is bullshitting me.
See Least convenient possible world. These technical difficulties are irrelevant to the problem itself.
It does seem like a legitimate issue though, that a decision theory that deals with the least convenient possible world manifestation of the Counterfactual Mugging scenario is not necessarily well adapted in general.
When to believe what claims is a completely separate issue. We are looking at a thought experiment to get a better idea about what kinds of considerations should be taken into account in general, not to build a particular agent that does well in this situation (and possibly worse in others).
Is the scenario really isomorphic to any sort of real life dilemma though? An agent which commits to paying out the $100 could end up being screwed over by an anti-Omega, which would pay out $10,000 only to a person who wouldn't give Omega the $100. I'm not clear on what sort of general principles the thought experiment is supposed to illustrate.
Start from assuming that the agent justifiably knows that the thought experiment is set up as it's described.
Do they know before being confronted by Omega, or only once confronted?
If they did not know in advance that it's more likely for Omega to appear and conduct the counterfactual mugging than it is for anti-Omega to appear and reward those who wouldn't cooperate on the counterfactual mugging, then I can't see that there's any point in time where the agent should expect greater utility by committing to cooperate on the counterfactual mugging. If they do know in advance, then it's better to precommit.
It's an assumption of the thought experiment that the player justifiably learns about the situation after the coin is tossed, that they are dealing with Omega and not "anti-Omega" and somehow learn that to be the case.
In that case, it doesn't seem like there's any point in time where a decision to cooperate should have a positive expected utility.
This is just the one-shot Prisoner's Dilemma. You being split into two different possible worlds, is just like the two prisoners being taken into two different cells.
Therefore, you should give Omega $100 if and only if you would cooperate in the one-shot PD.
I know this is off-topic, but I feel duty-bound to respond (in the absence of profile pages or a really working direct message functionality).
"Epistemology: the big questions" by Blackwell publishing is awesome.
Introductory logic texts are easy to find, but Hurley's "A Concise Introduction to Logic" comes recommended, depending on what sort of intro you were looking for.
No precommitment, no deal.
Uniqueness raises all sorts of problems for decision theory, because expected utility implicitly assumes many trials. This may just be another example of that general phenomenon.
I guess I'm a bit tired of "God was unable to make the show today so the part of Omniscient being will be played by Omega" puzzles, even if in my mind Omega looks amusingly like the Flying Spaghetti Monster.
Particularly in this case where Omega is being explicitly dishonest - Omega is claiming to be either sufficiently omniscient to predict my actions, or insufficiently omniscient to predict the result of a 'fair' coin, except that the 'fair' coin is explicitly predetermined to always give the same result . . . except . . .
What's the point of using rationalism to think things through logically if you keep placing yourself into illogical philosophical worlds to test the logic?
Could be a quantum coin, which is unpredictable under current laws of physics. Anyway, this stuff actually does have applications in decision theory. Quibbling over the practical implementations of the thought experiment is not actually useful to you or anybody else.
More precisely it is exactly predictable but for most practical purposes can be treated as equivalent to an unpredictable coin.
By 'unpredictable' I mean 'under current formalisms of physics it is not possible for us to accumulate enough information to predict it'.
By 'more precisely' I mean... no. The way you have phrased it makes your statement false.
You can predict what the future outcome of a quantum coin will be (along the lines of branches with heads and tails and their respective amplitudes). A related prediction you cannot make - when the quantum event has already occurred but you have not yet observed it, you cannot predict what your observation will be (now that 'your' refers only to the 'you' in the specific branch).
Again, for practical purposes - for most people's way of valuing most future outcomes - the future coin can be treated as though it is an unpredictable coin.
I was using 'you' and 'us' in the colloquial sense of the subjective experiences of a specific, arbitrary continuity chosen at random from the set of Everett branches in the hypothetical branch of the world tree that this counterfactual occurs in.
Now, I CAN start listing my precise definitions for every potentially ambiguous term I use, or we could simply agree not to pick improbable and inconsistent interpretations of the other's words. Frankly, I'd much prefer the latter, as I cannot abide pedants.
EDIT: Or you could downvote all my posts. That's cool too.
It should be noted that 'all my posts' does not refer to karma-assassination here. Rather, that three comments here were downvoted. This is correct (and in accord with my downvoting policy).
And I perceived you as being needlessly pedantic and choosing implausible interpretations of my words so that you could correct me. You'll note that your comment karma stands. I am, in fact, aware of quantum mechanics, and you are, of course, entirely correct. Coins behave in precisely deterministic ways, even if they rely on, say, radioactive decay. The causality just occurs in many Everett branches. That said, there is no way that before 'you' 'flip the coin' you can make a prediction about its subjective future state, and have more than half of your future selves be right. If that's not 'unpredictable' by the word's colloquial definition, then I'm not sure the word has any meaning.
You will notice that when I said that the coin is unpredictable, I did not claim, or even imply that the world was undeterministic, or that quantum mechanics was wrong. If I had said such a thing, you would have right to correct me. As it is, you took the opportunity to jump on my phrasing to correct me of a misconception that I did not, in fact, possess. That is being pedantic, it is pointless, and above all it is annoying. I apologize for rudeness, but trying to catch up others on their phrasing is a shocking waste of intellect and time.
EDIT: Again, I can totally discard every word that's entrenched in good, old-fashioned single-universe connotations, and spell out all the fascinating multiverse implications of everything I say, if that will make you happy -- but it will make my posts about five times longer, and it will make it a good deal more difficult to figure out what the hell I'm saying, which rather defeats the purpose of using language.
I'll note that I reject your 'implausible' claim, object to all insinuations regarding motive, stand by my previous statements and will maintain my policy of making mild clarifications when the subject happens to come up.
There seems to be little else to be said here.
As you like. Though I do hope you apply your strident policy of technical correctness in your home life, for consistency's sake.
For example: someone (clearly wrong) like me would merely say, in our archaic and hopelessly monocosmological phrasing, 'I am going to lunch.' This is clearly nonsense. You will, over the set of multiverse branches, do a great many things, many of them having nothing to do with food, or survival. The concepts of 'I' and 'lunch' are not even particularly well defined.
In contrast, someone held to your standard of correctness would have to say 'The computation function implemented in the cluster of mass from which these encoded pressure waves are emanating will execute a series of actions for which they predict that in the majority of future Everett branches of this fork of the world tree, the aforementioned cluster of mass will accumulate new amplitude and potential energy through the process of digestion within the next hour and fifteen minutes.'
Clearly this is more efficient and less confusing to the reader.
Since the distinction is of decision theoretical relevance and the source of much confusion I choose to clarify incorrect usages of 'unpredictable' in this particular environment. By phrasing it as 'more precisely' I leave plenty of scope for the original speaker to be assumed to be just speaking loosely.
Unfortunately you chose to fortify and defend an incorrect position instead of allowing the additional detail. Now you have given a very nice definition of 'you' but even with that definition both of your claims are just as incorrect as when they started. Fixing 'you' misses the point.
You are probably too entrenched in your position to work with but for anyone else who wants to talk about 'unpredictable' quantum coins, qualifiers like ("for most intents and purposes", "effectively") are awesome!
By reading the quantum coin flip, you definitely entangle yourself with it, and there's no way you're going to stay coherent.
As a hard-core Everettian, I find the original usage and the followup totally unobjectionable in principle. Your clarification was good except for the part where it said Ati's statement was wrong. There exists a reading of the terms which leaves those wrong, yes. So don't use that one.
Coin is not predetermined, and it doesn't matter if Omega has hand-selected every result of the coin toss, as long as we don't have any reason to slide the probability of the result to either direction.
I really fail to see why you're all so fascinated by Newcomb-like problems. When you break causality, all logic based on causality doesn't function any more. If you try to model it mathematically, you will always get an inconsistent model.
On second thoughts, since many clever philosophers spend careers on these problems, I may be missing something.
Nah, they just need something to talk about.
I find the problem interesting, so I'll try to explain why I find it interesting.
So there are these blogs called Overcoming Bias and Less Wrong, and the people posting on it seem like very smart people, and they say very reasonable things. They offer to teach how to become rational, in the sense of "winning more often". I want to win more often too, so I read the blogs.
Now a lot of what these people are saying sounds very reasonable, but it's also clear that the people saying these things are much smarter than me; so much so that although their conclusions sound very reasonable, I can't always follow all the arguments or steps used to reach those conclusions. As part of my rationalist training, I try to notice when I can follow the steps to a conclusion, and when I can't, and remember which conclusions I believe in because I fully understand it, and which conclusions I am "tentatively believing in" because someone smart said it, and I'm just taking their word for it for now.
So now Vladimir Nesov presents this puzzle, and I realize that I must not have understood one of the conclusions (or I did understand them, and the smart people were mistaken), because it sounds like if I were to follow the advice of this blog, I'd be doing something really stupid (depending on how you answered VN's problem, the stupid thing is either "wasting $100" or "wasting $4950").
So how do I reconcile this with everything I've learned on this blog?
Think of most of the blog as a textbook, with VN's post being an "exercise to the reader" or a "homework problem".
Not really - all that is necessary is that Omega is a sufficiently accurate predictor that the payoff matrix, taking this accuracy into account, still amounts to a win for the given choice. There is no need to be a perfect predictor. And if an imperfect, 99.999% predictor violates free will, then it's clearly a lost cause anyway (I can predict with similar precision many behaviours about people based on no more evidence than their behaviour and speech, never mind godlike brain introspection). Do you have no "choice" in deciding to come to work tomorrow, if I predict based on your record that you're 99.99% reliable? Where is the cut-off that free will gets lost?
Chances are I can predict such a response too, and so won't tell you of my prediction (or tell you in such a way that you will be more likely to attend: eg. "I've a $50 bet you'll attend tomorrow. Be there and I'll split it 50:50"). It doesn't change the fact that in this particular instance I can foretell the future with a high degree of accuracy. Why then would it violate free will if Omega could predict your accuracy in this different situation (one where he's also able to predict the effects of him telling you) to a similar precision?
Then take my bet situation. I announce your attendance, and cut you in with a $25 stake in attendance. I don't think it would be unusual to find someone who would indeed appear 99.99% of the time - does that mean that person has no free will?
People are highly, though not perfectly, predictable under a large number of situations. Revealing knowledge about the prediction complicates things by adding feedback to the system, but there are lots of cases where it still doesn't change matters much (or even increases predictability). There are obviously some situations where this doesn't happen, but for Newcomb's paradox, all that is needed is a predictor for the particular situation described, not any general situation. (In fact Newcomb's paradox is equally broken by a similar revelation of knowledge. If Omega were to reveal its prediction before the boxes are chosen, a person determined to do the opposite of that prediction opens it up to a simple Epimenides paradox.)
The primary reason for resolving Newcomb-like problems is to explore the fundamental limitations of decision theories.
It sounds like you are still confused about free will. See Righting a Wrong Question, Possibility and Could-ness, and Daniel Dennett's lecture here.
I think I'm not confused about free will, and that the links I gave should help to resolve most of the confusion. Maybe you should write a blog post/LW article where you formulate the nature of your confusion (if you still have it after reading the relevant material), I'll respond to that.
There's no need to break causality. You are a being implemented in chaotic wetware. However, there's no reason to think we couldn't have rational agents implemented in a much more predictable form, as Python routines for example, so that any being with superior computation power could simply inspect the source and determine what the output would be.
In such a case, Newcomb-like problems would arise, perfectly lawfully, under normal physics.
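As a toy illustration of that point, here is a sketch in which agents are plain Python functions and the predictor simply runs a copy of the agent on the hypothetical scenario. The agent names, the scenario string, and the `omega` function are all assumptions invented for this example; the payoffs are the $100 and $10000 from the post.

```python
def payer(situation):
    """Agent that hands over the money when asked."""
    return "pay"


def refuser(situation):
    """Agent that keeps its money when asked."""
    return "refuse"


def omega(agent, coin, ask=100, reward=10_000):
    """Counterfactual mugging against an agent whose code Omega can run."""
    if coin == "tails":
        # The real request: the agent decides for itself.
        return -ask if agent("asked_for_100") == "pay" else 0
    # Heads: Omega 'predicts' by running the same code on the tails scenario.
    predicted = agent("asked_for_100")
    return reward if predicted == "pay" else 0


for agent in (payer, refuser):
    print(agent.__name__, omega(agent, "heads"), omega(agent, "tails"))
```

Nothing here runs backwards in time: the prediction is just an ordinary function call made before the payoff is decided.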
You cannot do that without breaking Rice's theorem. If you assume you can find out the answer from someone else's source code -> instant contradiction.
You cannot work around Rice's theorem or around causality by specifying 50.5% accuracy independently of modeled system, any accuracy higher than 50%+epsilon is equivalent to indefinitely good accuracy by repeatedly predicting (standard cryptographic result), and 50%+epsilon doesn't cause the paradox.
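The amplification claim can be sketched as a majority vote over repeated predictions. This assumes the repeated predictions are independent and each correct with the same probability, which the comment does not establish; the function name is hypothetical.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a majority of n independent predictions is correct,
    when each prediction is correct with probability p (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that a majority always exists")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


for n in (1, 11, 101):
    print(n, majority_accuracy(0.6, n))
```

Under the independence assumption the accuracy climbs toward 1 as n grows; if the predictor's errors are correlated across repetitions, the vote gains nothing.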
Give me one serious math model of Newcomb-like problems where the paradox emerges while preserving causality. Here are some examples. When you model it, you get either a trivial solution to one-box, or a causality break, or omega loses.
* You decide first what you would do in every situation, omega decides second, and now you only implement your initial decision table and are not allowed to switch. Game theory says you should implement one-boxing.
* You decide first what you would do in every situation, omega decides second, and now you are allowed to switch. Game theory says you should precommit to one-box, then implement two-boxing, omega loses.
* You decide first what you would do in every situation, omega decides second, and now you are allowed to switch. If omega always decides correctly, then he bases his decision on your switch, which either turns it into model #1 (you cannot really switch, precommitment is binding), or breaks causality.
Rice's theorem says you can't predict every possible algorithm in general. Plenty of particular algorithms can be predictable. If you're running on a classical computer and Omega has a copy of you, you are perfectly predictable.
And all of your choices are just as real as they ever were, see the OB sequence on free will (I think someone referred to it already).
And the argument that omega just needs predictive power of 50.5% to cause the paradox only works if it works against ANY arbitrary algorithm. Having that power against any arbitrary algorithm breaks Rice's Theorem; having that power (or even 100%) against just a limited subset of algorithms doesn't cause the paradox.
If you take the strict decision tree precommitment interpretation, then you fix causality. You decide first, omega decides second, game theory says one-box, problem solved.
Decision tree precommitment is never a problem in game theory, as precommitment of the entire tree commutes with decisions by other agents:
are identical, as B cannot decide based on f. So the changing your mind problem never occurs.
With omega:
I don't think the paradox exists in any plausible mathematization of the problem. It looks to me like another of those philosophical problems that exist because of the sloppiness of natural language and very little more; I'm just surprised that the OB/LW crowd cares about this one and not about others. OK, I admit I really enjoyed it the first time I saw it, but just as something fun, nothing more than that.
I don't know why nobody mentioned this at the time, but that's hardly an unpopular view around here (as I'm sure you've noticed by now).
The interesting thing about Newcomb had nothing to do with thinking it was a genuine paradox - just counterintuitive for some.
You wouldn't know this with certainty* because it wouldn't be true.
(*unless you were delusional)
The guy next door is on roughly your mental level. Thus, the guy next door can't predict your actions perfectly, because he can't run a perfect simulation of your mind that's faster than you. He doesn't have the capacity.
And he certainly doesn't have the capacity to simulate the environment, including other people, while doing so.
Humans may or may not generally have access to genuine randomness.
It's as yet unknown whether we even run on quantum randomness; and it's also unprovable that quantum randomness is actually genuine randomness, and not just based on effects we don't yet understand, as so many other types of randomness have been.
You're not taking this in the least convenient possible world. Surely it's not impossible in principle that your neighbor can simulate you and your environment. Perhaps your neighbor is superintelligent?
It's ALSO not impossible in principle in the real world. A superintelligent entity could, in principle, perfectly predict my actions. Remember, in the Least Convenient Possible World quantum "randomness" isn't random.
As such, this ISN'T a fundamental difference between humans and "such beings". Which was all I set out to demonstrate.
I was using the "most plausible world" on the basis that it seemed pretty clear that that was the one Roko intended. (Where your neighbour isn't in fact Yahweh in disguise). EDIT: Probably should specify worlds for things in this kind of environment. Thanks, the critical environment here is helping me think about how I think/argue.
Whaddaya mean humans are rational agents with access to genuine randomness? That's what we're arguing about in the first place!
Perhaps Omega is entangled with your brain such that in all the worlds in which you would choose to one-box, he would predict that you one-box, and all the worlds in which you would choose to two-box, he would predict that you two-box?
In the original formulation, if Omega expects you to flip a coin, he leaves box B empty.
In fact, Newcomb-like problems fall naturally out of any ability to simulate and predict the actions of other agents. Omega as described is essentially the limit as predictive power goes to infinity.
If we define an imperfect predictor as a perfect predictor plus noise, i.e. produces the correct prediction with probability p regardless of the cognition algorithm it's trying to predict, then Newcomb-like problems are very robust to imperfect prediction: for any p > .5 there is some payoff ratio great enough to preserve the paradox, and the required ratio goes down as the prediction improves. e.g. if 1-boxing gets 100 utilons and 2-boxing gets 1 utilon, then the predictor only needs to be more than 50.5% accurate. So the limit in that direction favors 1-boxing.
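The 50.5% figure can be checked with a minimal sketch. It assumes the usual Newcomb payoff structure (not spelled out in the comment): the opaque box holds `big` utilons exactly when one-boxing was predicted, the transparent box always holds `small`, and the predictor is right with probability `p` whatever you do. The function names are illustrative only.

```python
def one_box_ev(p, big):
    """Expected utilons for one-boxing: the opaque box is full only if predicted correctly."""
    return p * big


def two_box_ev(p, big, small):
    """Expected utilons for two-boxing: the opaque box is full only on a misprediction."""
    return (1 - p) * big + small


def accuracy_threshold(big, small):
    """Accuracy p above which one-boxing has the higher expected payoff.

    Solves p * big > (1 - p) * big + small for p.
    """
    return (big + small) / (2 * big)


if __name__ == "__main__":
    # Payoffs from the comment: 100 utilons versus 1 utilon.
    print(accuracy_threshold(100, 1))  # 0.505
```

As `big / small` grows, the threshold falls toward 0.5, which matches the claim that some payoff ratio preserves the problem for any p > .5.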
What other direction could there be? If the prediction accuracy depends on the algorithm-to-be-predicted (as it would in the real world), then you could try to be an algorithm that is mispredicted in your favor... but a misprediction in your favor can only occur if you actually 2-box, so it only takes a modicum of accuracy before a 1-boxer who tries to be predictable is better off than a 2-boxer who tries to be unpredictable.
I can't see any other way for the limit to turn out.
If you have two agents trying to precommit not to be blackmailed by each other / precommit not to pay attention to the other's precommitment, then any attempt to take a limit of this Newcomblike problem does depend on how you approach the limit. (I don't know how to solve this problem.)
The value(s) for which the limit is being taken here is unidirectional predictive power, which is loosely a function of the difference in intelligence between the two agents; intuitively, I think a case could be made that (assuming ideal rationality) the total accuracy of mutual behavior prediction between two agents is conserved in some fashion, that doubling the predictive power of one unavoidably would roughly halve the predictive power of the other. Omega represents an entity with a delta-g so large vs. us that predictive power is essentially completely one-sided.
From that basis, allowing the unidirectional predictive power of both agents to go to infinity is probably inherently ill-defined and there's no reason to expect the problem to have a solution.
They don't require breaking causality. The argument works if Omega is barely predicting you above chance. I'm sure there are plenty of normal people who can do that just by talking to you.
There are also more important reasons. Take the doomsday argument. You can use the fact that you're alive now to predict that we'll die out "soon". Suppose you had a choice between saving a life in a third-world country that likely wouldn't amount to anything, or donating to SIAI to help in the distant future. You know it's very unlikely for there to be a distant future. It's like Omega did his coin toss, and if it comes up tails, we die out early and he asks you to waste the money by donating to SIAI. If it comes up heads, you're in the future, and it's better if you would have donated.
That's not some thing that might happen. That's a decision you have to make before you pick a charity to donate to. Lives are riding on this. That's if the coin lands on tails. If it lands on heads, there is more life riding on it than has so far existed in the known universe. Please choose carefully.
Arguments like these remind me of students' mistakes from Algorithms and Data Structures 101 - statements like that are very intuitive, absolutely wrong, and once you figure out why this reasoning doesn't work it's easy to forget that most people didn't go through this ever.
What is required is Omega predicting better than chance in the worst case. Predicting correctly with a ridiculously tiny chance of error against an "average" person is worthless.
To avoid Omega and causality silliness, and just demonstrate this intuition - let's take a slightly modified version of Boolean satisfiability - but instead of one formula we have three formulas of the same length. If all three are identical, return true or false depending on the formula's satisfiability; if they're different, return true if the number of one bits in the problem is odd (or some other trivial property).
It is obviously NP-complete, as any satisfiability problem reduces to it by concatenating it three times. If we use exponential brute force to solve the hard case, average running time is O(n) for scanning the string plus O(2^(n/3)) for brute forcing but only 2^-(2n/3) of the time, that is O(1). So we can solve NP-complete problems in average linear time.
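The averaging step can be written out as a small sketch. The assumptions are mine, not the comment's exact model: the input is a uniformly random n-bit string split into three n/3-bit formulas, and the hard case costs 2^(n/3) brute-force steps. The function name is hypothetical.

```python
from fractions import Fraction


def expected_bruteforce_steps(n):
    """Average brute-force work on a uniformly random n-bit input (n divisible by 3).

    Assumption: three formulas of n/3 bits each; brute force takes 2**(n/3)
    steps and is only needed when all three formulas are identical.
    """
    if n % 3 != 0:
        raise ValueError("n must be divisible by 3")
    k = n // 3
    # The second and third formula must each match the first: (2**-k) ** 2.
    p_identical = Fraction(1, 2 ** (2 * k))
    hard_case_steps = 2 ** k
    return p_identical * hard_case_steps  # equals 2**-k, which shrinks with n


if __name__ == "__main__":
    for n in (3, 30, 300):
        print(n, float(expected_bruteforce_steps(n)))
```

The product is 2^(-n/3), so the brute-force term vanishes on average and only the linear scan remains, even though the worst case stays exponential.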
What happened? We were led astray by intuition, and assumed that problems that are difficult in the worst case cannot be trivial on average. But this equal weighting is an artifact - if you tried reducing any other NP problem into this, you'd be getting very difficult ones nearly all the time, as if by magic.
Back to Omega - even if Omega predicts normal people very well, as long as there is any thinking being it cannot predict - Omega must break causality. And such beings are not just hypothetical - people who decide based on a coin toss are exactly like that. Silly rules about disallowing chance merely make counterexamples more complicated; Omega and Newcomb are still as much based on sloppy thinking as ever.
I don't know any reason why a coin toss would be the best choice in Newcomb's paradox. If you decide based on reason, and don't decide to flip a coin, and Omega knows you well, he can predict your action above chance. The paradox stands.
Hi,
My name is Omega. You may have heard of me.
Anyway, I have just tossed a fair coin, and given that the coin came up tails, I'm gonna have to ask each of you to give me $100. Whatever you do in this situation, nothing else will happen differently in reality as a result. Naturally you don't want to give up your $100. But see, if the coin came up heads instead of tails, I'd have given you each $10000, but only to those that would agree to give me $100 if the coin came up tails.
I don't believe you.
You forgot to add that we have sufficient reason to believe everything you say.
We're assuming Omega is trustworthy? I'd give it the $100, of course.
Had the coin come up differently, Omega might have explained the secrets of friendly artificial general intelligence. However, he now asks that you murder 15 people.
Omega remains completely trustworthy, if a bit sick.
Completely agreed, a major problem in any realistic application of such scenarios.
I'm afraid I don't follow.
Raising the stakes in this way does not work, because of the issue described in Ethical Injunctions: it is less likely that Omega has presented you with this choice, than that you have gone insane.
To add to the comments below, if you're going to take this route, you might as well have already decided that encountering Omega at all is less likely than that you have gone insane.
That may be true, but it's still a dodge. Conditional on not being insane, what's your answer?
Additionally, I don't see why Omega asking you to give it 100 dollars vs 15 human lives necessarily crosses the threshold of "more likely that I'm just a nutbar". I don't expect to talk to Omega anytime soon...
So imagine yourself in the most inconvenient possible world where Omega is a known feature of the environment and has long been seen to follow through on promises of this type; it does not particularly occur to you or anyone that believing this fact makes you insane.
When I phrase it that way - imagine myself in a world full of other people confronted by similar Omega-induced dilemmas - I suddenly find that I feel substantially less uncomfortable; indicating that some of what I thought was pure ethical constraint is actually social ethical constraint. Still, it may function to the same self-protective effect as ethical constraint.
Ha, I'll re-raise: Had the coin come up differently, Omega would have filled ten Hubble volumes with CEV-output. However, he now asks that you blow up this Hubble volume.
(Not only do you blow up the universe (ending humanity for eternity) you're glad that Omega showed to offer this transparently excellent deal. Morbid, ne?)
Ouch.
For some reason, raising the stakes in these hypotheticals to the point of actual pain has become reflex for me. I'm not sure if it's to help train my emotions to be able to make the right choices in horrible circumstances, or just my years in the Bardic Conspiracy looking for an outlet.
So from my and Omega's perspective this coin is random and my behavior is predictable. Amusing. My question: What if Omega says "due to quirks in your neurology, had I requested it, you would have pre-committed to bet $100 against $46.32. As it happens, you lost anyway, but you would have taken an unfavorable deal." Would you pay then?
Nope. I don't care what quirks in my neurology do - I don't care what answer the material calculator returns, only the answer to 2 + 2 = ?
Meh, the original is badly worded.
Take 2. Omega notices a neuro-quirk. Then, based on what he's noticed, he offers you a 50/50 bet of 100$ to 43.25 dollars at just the right time with just the right intonation...
NOW do you take that bet?
...Why yes, yes you do. Even you. And you know it. It's related to why you don't think boxing an AI is the answer. Only, Omega's already out of the box, and so can adjust your visual and auditory input with a much higher degree of precision.
No it isn't. Your 'Take 2' is an entirely different question. One that seems to miss the point. The question "Can Omega exploit a vulnerability of human psychology?" isn't a particularly interesting one and becomes even less so when by the definition of Omega and the problem specification the answer is either "Yes" or "I deny the counterfactual" regardless of anything to do with vulnerabilities in human intellectual capabilities.
oh. whoops.... so more like a way of poking holes in the strategy "i will do whatever I would have precommitted to do"?
The coin toss may be known to Omega and predicted in advance, it only needs to initially have 50/50 odds to you for the expected gain calculation to hold. When Omega tells you about the coin, it communicates to you its knowledge about the toss, about an independent variable of initial 50/50 odds. For example, Omega may tell you that it hasn't tossed the coin yet, it'll do so only a thousand years from now, but it predicted that the coin will come up tails, so it asks you for your $100.
That's just like playing "Eeny, meeny, miny, moe" to determine who's 'it'. Once you figure out if there's an even or odd number of words, you know the answer, and it isn't random to you anymore. This may be great as a kid choosing who gets a cookie (wow! I win again!), but you're no longer talking about something that can go either way.
For a random output of a known function, you still need a random input.
This requires though that Omega have decided to make the bet in a fashion which exhibited no dependency on its advance knowledge of the coin.
This is a big issue which I unsuccessfully tried to address in my non-existing 6+ paragraph explanation. Why the heck is Omega making bets if he can already predict everything anyway?
That said, it's not clear that when Omega offers you a bet, you should automatically refuse it under the assumption that Omega is trying to "beat" you. It seems like Omega doesn't really mind giving away money (pretty reasonable for an omniscient entity), since he seems to be willing to leave boxes with millions of dollars in them just lying around.
Omega's purpose is entirely unknown. Maybe he wants you to win these bets. If you're a rational person who "wants to win", I think you can just "not worry" about what Omega's intents are, and figure out what sequence of actions maximizes your utility (which in these examples always seems to directly translate into maximizing the amount of money you get).
Quantum Coins. seriously. they're easy enough to predict if you accept many worlds.
as for the rest... entertainment? Could be a case of "even though I can predict these humans so well, it's fascinating as to just how many of them two-box no matter how obvious i make it."
It's not impossible: we know that we exist, and it is not impossible that some race resembling our own figured out a sufficient solution to the Löb problem and became a race of Omegas...
Can you please explain the reasoning behind this? Given all of the restrictions mentioned (no iterations, no possible benefit to this self) I can't see any reason to part with my hard earned cash. My "gut" says "Hell no!" but I'm curious to see if I'm missing something.
There are various intuition pumps to explain the answer.
The simplest is to imagine that a moment from now, Omega walks up to you and says "I'm sorry, I would have given you $10000, except I simulated what would happen if I asked you for $100 and you refused". In that case, you would certainly wish you had been the sort of person to give up the $100.
Which means that right now, with both scenarios equally probable, you should want to be the sort of person who will give up the $100, since if you are that sort of person, there's half a chance you'll get $10000.
If you want to be the sort of person who'll do X given Y, then when Y turns up, you'd better bloody well do X.
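A minimal sketch of the two viewpoints, using the stakes from the post ($100 asked on tails, $10000 given on heads, fair coin). The function names are illustrative, and Omega's prediction is assumed to be perfect.

```python
def policy_ev(pays_when_asked, p_tails=0.5, ask=100, reward=10000):
    """Expected gain of a disposition, evaluated before the coin is tossed.

    Assumes Omega predicts the disposition perfectly and rewards only payers.
    """
    if pays_when_asked:
        return p_tails * (-ask) + (1 - p_tails) * reward
    return 0.0


def gain_after_tails(pays_when_asked, ask=100):
    """Gain of the same disposition once you already know the coin came up tails."""
    return -ask if pays_when_asked else 0


if __name__ == "__main__":
    print(policy_ev(True), policy_ev(False))                # 4950.0 0.0
    print(gain_after_tails(True), gain_after_tails(False))  # -100 0
```

The disposition that looks worse in the second line is the one that wins in the first; that gap is the whole disagreement in this thread.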
And if Omega comes up to me and says "I was going to kill you if you gave me $100. But since I've worked out that you won't, I'll leave you alone." then I'll be damn glad I wouldn't agree.
This really does seem like pointless speculation.
Of course, I live in a world where there is no being like Omega that I know of. If I knew otherwise, and knew something of their properties, I might govern myself differently.
We're not talking Pascal's Wager here, you're not guessing at the behaviour of capricious omnipotent beings. Omega has told you his properties, and is assumed to be trustworthy.
You are stating that. But as far as I can tell Omega is telling me it's a capricious omnipotent being. If there is a distinction, I'm not seeing it. Let me break it down for you:
1) Capricious -> I am completely unable to predict its actions. Yes.
2) Omnipotent -> Can do the seemingly impossible. Yes.
So, what's the difference?
It's not capricious in the sense you give: you are capable of predicting some of its actions: because it's assumed Omega is perfectly trustworthy, you can predict with certainty what it will do if it tells you what it will do.
So, if it says it'll give you 10k$ in some condition (say, if you one-box its challenge), you can predict that it'll give you the money if that condition arises.
If it were capricious in the sense of complete inability of being predicted, it might amputate three of your toes and give you a flower garland.
Note that the problem supposes you do have certainty that Omega is trustworthy; I see no way of reaching that epistemological state, but then again I see no way Omega could be omnipotent, either.
On a somewhat unrelated note, why would Omega ask you for 100$ if it had simulated you wouldn't give it the money? Also, why would it do the same if it had simulated you would give it the money? What possible use would an omnipotent agent have for 100$?
Omega is assumed to be mildly bored and mildly anthropic. And his asking you for 100$ could always be PART of the simulation.
Yes, it’s quite reasonable that if it was curious about you it would simulate you and ask the simulation a question. But once it did that, since the simulation was perfect, why would it waste the time to ask the real you? After all, in the time it takes you to understand Omega’s question it could probably simulate you many times over.
So I’m starting to think that encountering Omega is actually pretty strong evidence for the fact that you’re simulated.
I liked this position -- insightful, so I'm definitely upvoting.
But I'm not altogether convinced it's a completely compelling argument. With the amounts reversed, Omega could have walked up to you and said "I would have given you $100 except if I asked you for $10000 you would have refused." You'd then certainly wish to have been the sort of person to counterfactually have given up the $10000, because in the real world it'd mean you'd get $100, even though you'd certainly REJECT that bet if you had a choice for it in advance.
Not necessarily; it depends on relative frequency. If Omega has a 10^-9 chance of asking me for $10000 and otherwise will simulate my response to judge whether to give me $100, and if I know that (perhaps Omega earlier warned me of this), I would want to be the type of person who gives the money.
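The dependence on relative frequency can be made concrete with a short sketch. It assumes Omega asks for `ask` with probability `p_ask` and otherwise hands `reward` to anyone it predicts would have paid; the names are hypothetical.

```python
from fractions import Fraction


def giver_ev(p_ask, ask, reward):
    """Expected gain of being the type who pays, when Omega asks with probability p_ask."""
    return (1 - p_ask) * reward - p_ask * ask


def break_even_p_ask(ask, reward):
    """Probability of being asked at which paying and refusing (gain 0) tie."""
    return Fraction(reward, ask + reward)


if __name__ == "__main__":
    # Reversed amounts at 50/50: being a giver loses on average.
    print(giver_ev(Fraction(1, 2), 10000, 100))              # -4950
    # Reversed amounts at a 10^-9 chance of being asked: being a giver wins.
    print(float(giver_ev(Fraction(1, 10**9), 10000, 100)))
    print(break_even_p_ask(10000, 100))                      # 1/101
```

With the reversed amounts, paying is only worth it if the chance of being asked is below 1/101; with the original amounts the same formula gives 100/101.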
Well said. That's a lot of the motivation behind my choice of decision theory in a nutshell.
Thanks, it's good to know I'm on the right track =)
I think this core insight is one of the clearest changes in my thought process since starting to read OB/LW -- I can't imagine myself leaping to "well, I'd hand him $100, of course" a couple years ago.
Is that an acceptable correction?
Well, with a being like Omega running around, the two become more or less identical.
If we're going to invent someone who can read thoughts perfectly, we may as well invent someone who can conceal thoughts perfectly.
Anyway, there aren't any beings like Omega running around to my knowledge. If you think that concealing motivations is harder than I think, and that the only way to make another human think you're a certain way is to be that way, say that.
I think this describes one of the core principles of virtue theory under any ethical system.
I wonder how much it depends upon accidents of human psychology, like our tendency to form habits, and how much of it is definitional (if you don't X when Y, then you're simply not the sort of person who Xes when Y)
That's not the situation in question. The scenario laid out by Vladimir_Nesov does not allow for an equal probability of getting $10000 and paying $100. Omega has already flipped the coin, and it's already been decided that I'm on the "losing" side. Join that with the fact that me giving $100 now does not increase the chance of me getting $10000 in the future because there is no repetition.
Perhaps there's something fundamental I'm missing here, but the linearity of events seems pretty clear. If Omega really did calculate that I would give him the $100 then either he miscalculated, or this situation cannot actually occur.
-- EDIT --
There is a third possibility after reading Cameron's reply... If Omega is correct and honest, then I am indeed going to give up the money.
But it's a bit of a trick question, isn't it? I'm going to give up the money because Omega says I'm going to give up the money and everything Omega says is gospel truth. However, if Omega hadn't said that I would give up the money, then I wouldn't have given up the money. Which makes this a bit of an impossible situation.
Assuming the existence of Omega, his intelligence, and his honesty, this scenario is an impossibility.
Thank you. Now I grok.
So, if this scenario is logically inconsistent for all values of 'me' then there really is nothing that I can learn about 'me' from this problem. I wish I hadn't thought about it so hard.
Omega hasn't told you his predictions in the given scenario.
I feel like a man in an Escher painting, with all these recursive hypothetical mes, hypothetical kuriges, and hypothetical omegas.
I'm saying, go ahead and start by imagining a situation like the one in the problem, except it's all happening in the future -- you don't yet know how the coin will land.
You would want to decide in advance that if the coin came up against you, you would cough up $100.
The ability to precommit in this way gives you an advantage. It gives you half a chance at $10000 you would not otherwise have had.
So it's a shame that in the problem as stated, you don't get to precommit.
But the fact that you don't get advance knowledge shouldn't change anything. You can just decide for yourself, right now, to follow this simple rule:
If there is an action to which my past self would have precommitted, given perfect knowledge, and my current preferences, I will take that action.
By adopting this rule, in any problem in which the opportunity for precommitting would have given you an advantage, you wind up gaining that advantage anyway.
I'm actually not quite satisfied with it. Probability is in the mind, which makes it difficult to know what I mean by "perfect knowledge". Perfect knowledge would mean I also knew in advance that the coin would come up tails.
I know giving up the $100 is right, I'm just having a hard time figuring out what worlds the agent is summing over, and by what rules.
ETA: I think "if there was a true fact which my past self could have learned, which would have caused him to precommit etc." should do the trick. Gonna have to sleep on that.
ETA2: "What would you do in situation X?" and "What would you like to pre-commit to doing, should you ever encounter situation X?" should, to a rational agent, be one and the same question.
Not if precommitting potentially has other negative consequences. As Caspian suggested elsewhere in the thread, you should also consider the possibility that the universe contains No-megas who punish people who would cooperate with Omega.
...why should you also consider that possibility?
Because if that possibility exists, you should not necessarily precommit to cooperate with Omega, since that risks being punished by No-mega. In a universe of No-megas, precommitting to cooperate with Omega loses. This seems to me to create a distinction between the questions "what would you do upon encountering Omega?" and "what will you now precommit to doing upon encountering Omega?"
I suppose my real objection is that some people seem to have concluded in this thread that the correct thing to do is to, in advance, make some blanket precommitment to do the equivalent of cooperating with Omega should they ever find themselves in any similar problem. But I feel like these people have implicitly made some assumptions about what kind of Omega-like entities they are likely to encounter: for instance that they are much more likely to encounter Omega than No-mega.
...and that's an even better way of putting it.
"Perfect knowledge"
Use a quantum coin: it conveniently comes up both.
"Perfect knowledge would mean I also knew in advance that the coin would come up tails."
This seems crucial to me.
Given what I know when asked to hand over the $100, I would want to have pre-committed to not pre-committing to hand over the $100 if offered the original bet.
Given what I would know if I were offered the bet before discovering the outcome of the flip I would wish to pre-commit to handing it over.
From which information set should I evaluate this? The information set I am actually at seems the most natural choice, and it also seems to be the one that WINS (at least in this world).
What am I missing?
I'll give you the quick and dirty patch for dealing with Omega: there is no way to know that, at that moment, you are not inside of his simulation. By giving him the $100, there is a chance you are transferring that money from within a simulation (which is about to be terminated) to outside of the simulation, with a nice big multiplier.
MBlume:
This phrasing sounds about right. Whatever decision-making algorithm you have drawing your decision D when it's in situation X, should also come to the same conditional decision before the situation X appeared, "if(X) then D". If you actually don't give away $100 in situation X, you should also plan to not give away $100 in case of X, before (or irrespective of whether) X happens. Whichever decision is the right one, there should be no inconsistency of this form. This grows harder if you must preserve the whole preference order.
I don't see that this situation is impossible, but I think it's because I've interpreted it differently from you.
First of all, I'll assume that everyone agrees that given a 50/50 bet to win $10000 versus losing $100, everyone would take the bet. That's a straightforward application of utilitarianism + probability theory = expected utility, right?
So Omega correctly predicts that you would have taken the bet if he had offered it to you (a real no brainer; I too can predict that you would have taken the bet had he offered it).
But he didn't offer it to you. He comes up now, telling you that he predicted that you would accept the bet, and then carried out the bet without asking you (since he already knew you would accept the bet), and it turns out you lost. Now he's asking you to give him $100. He's not predicting that you will give him that amount, nor is he demanding or commanding you to give it. He's merely asking. So the question is, do you do it?
I don't think there's any inconsistency in this scenario regardless of whether you decide to give him the money or not, since Omega hasn't told you what his prediction would be (though if we accept that Omega is infallible, then his prediction is obviously exactly whatever you would actually do in that situation).
I work on AI. In particular, on decision systems stable under self-modification. Any agent who does not give the $100 in situations like this will self-modify to give $100 in situations like this. I don't spend a whole lot of time thinking about decision theories that are unstable under reflection. QED.
Even considering situations like this and having special cases for them sounds like it would add a bit much cruft to the system.
Do you have a working AI that I could look at to see how this would work?
If you need special cases, your decision theory is not consistent under reflection. In other words, it should simply always do the thing that it would precommit to doing, because, as MBlume put it, the decision theory is formulated in such fashion that "What would you precommit to?" and "What will you do?" work out to be one and the same question.
But this is precisely what humans don't do, because we respond to a "near" situation differently than a "far" one. Your advance prediction of your decision is untrustworthy unless you can successfully simulate the real future environment in your mind with sufficient sensory detail to invoke "near" reasoning. Otherwise, you will fail to reach a consistent decision in the actual situation.
Unless of course, in the actual situation, you're projecting back, "What would I have decided in advance to do had I thought about this in advance?" -- and you successfully mitigate all priming effects and situationally-motivated reasoning.
Or to put all of the above in short, common-wisdom form: "that's easy for you to say NOW..." ;-)
And now I try to calculate what you should treat as being the probability that you're being emulated. Assume that Omega only emulates you if the coin comes up heads.
Suppose you decide beforehand that you are going to give Omega the $100, as you ought to. The expected value of this is $4950, as has been calculated.
Suppose that instead, you decide beforehand that E is the probability you're being emulated assuming you hear that the coin came up tails. You'll still decide to give Omega the $100; therefore, your expected value if you hear that it came up heads is $10,000. Your expected value if you hear that the coin came up tails is -$100(1-E) + $10,000E.
The probability that you hear that the coin comes up tails should be given by P(H) + P(T and ~E) + P(T and E) = 0, P(H) = P(T and ~E), P(T and ~E) = P(T) - P(T and E), P(T and E) = P(E|T) * P(T). Solving these equations, I get P(E|T) = 2, which probably means I've made a mistake somewhere. If not, c'est l'Omega?
um... lets see....
to REALLY evaluate that, we technically need to know how long omega runs the simulation for.
now, we have two options: one, assume omega keeps running the simulation indefinitely. two, assume that omega shuts the simulation down once he has the info he's looking for (and before he has to worry about debugging the simulation.)
in # 1, what we are left with is p(S)=1/3, p(H)=1/3, p(t)=1/3, which means we're moving 200$/3 from part of our possibility cloud to gain 10,000$/3 in another part.
In #2, we're moving a total of 100/2 $ to gain 10000/2. The 100$ in the simulation is quantum-virtual.
so, unless you have reason to suspect omega is running a LOT of simulations of you, AND not terminating them after a minute or so...(aka, is not inadvertently simulation-mugging you)...
You can generally treat Omega's simulation capacity as a dashed causality arrow from one universe to another, sort of like the shadow produced by the simulation...
I'm very torn on this problem. Every time I think I've got it figured out and start typing out my reasons why, I change my mind, and throw away my 6+ paragraph explanation and start over, arguing the opposite case, only to change my mind again.
I think the problem has to do with strong conflicts between my rational arguments and my intuition. This problem is a much more interesting koan for me than one hand clapping, or tree in the forest.
My two bits: Omega's request is unreasonable.
Precommitting is something that you can only do before the coin is flipped. That's what the "pre" means. Omega's game rewards a precommitment, but Omega is asking for a commitment.
Precommitting is a rational thing to do because before the coin toss, the result is unknown and unknowable, even by Omega (I assume that's what "fair coin" means). This is a completely different course of action than committing after the coin toss is known! The utility computation for precommitment is not and should not be the same as the one for commitment.
In the example, you have access to information that pre-you doesn't (the outcome of the flip). If rationalists are supposed to update on new information, then it is irrational for you to behave like pre-you.
By definition, pre-you only has access to the coin's probability distribution, while you have access to the result of the coin flip. Surely you don't mean to say that's the same thing?
From the perspective of a non-superintelligence, Omega's prediction abilities are indistinguishable from magic. Human beings can't tell what they "imply." Trying to figure out the implications with a primate brain will only get you into a paradox like claiming a fact is the same as a probability distribution. All we can reasonably do is stipulate Omega's abilities needed to make the problem work and no further.
You know, if Omega is truly doing a full simulation of my cognitive algorithm, then it seems my interactions with him should be dominated by my desire for him to stop it, since he is effectively creating and murdering copies of me.
but...if you're the emulated you...you're going to die after you give him/don't give him the money.
It's not just about the USB sticks -- to me that seems inert. But if he's running you off those USB sticks for (let's say) a few hours every day, then you could (in fact there is a 1000/1001 chance that you will) wake up tomorrow morning and find yourself running from one of those drives, and know that there is a clear horizon of a few hours on the subjective experiences you can anticipate. This is a prospect which I, at least, would find terrifying.
The decision doesn't need to be read off from a straightforward simulation, it can be an on-demand, so to say, reconstruction of the outcome from the counterfactual. I believe it should be possible to calculate just your decision, without constructing a morally significant computation. Knowing your decision may be as simple as checking whether you adhere to a certain decision theory.
The counterfactual anti-mugging: One day No-mega appears. No-mega is completely trustworthy etc. No-mega describes the counterfactual mugging to you, and predicts what you would have done in that situation not having met No-mega, if Omega had asked you for $100.
If you would have given Omega the $100, No-mega gives you nothing. If you would not have given Omega $100, No-mega gives you $10000. No-mega doesn't ask you any questions or offer you any choices. Do you get the money? Would an ideal rationalist get the money?
Okay, next scenario: you have a magic box with a number p inscribed on it. When you open it, either No-mega comes out (probability p) and performs a counterfactual anti-mugging, or Omega comes out (probability 1-p), flips a fair coin and proceeds to either ask for $100, give you $10000, or give you nothing, as in the counterfactual mugging.
Before you open the box, you have a chance to precommit. What do you do?
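One way to make the magic-box question concrete is to compare the expected value of the two precommitments as a function of p. This is only a sketch: it assumes both predictors are perfect, that the precommitment fixes what No-mega predicts you "would have done", and that payoffs are simply dollars. The function names are made up for illustration.

```python
PAYMENT = 100          # asked for by Omega on tails
REWARD = 10_000        # given by Omega on heads to a predicted payer
NOMEGA_PRIZE = 10_000  # given by No-mega to a predicted refuser

def ev_precommit_pay(p: float) -> float:
    """No-mega (probability p) gives nothing; Omega (1 - p) flips a fair coin."""
    return (1 - p) * (0.5 * REWARD - 0.5 * PAYMENT)

def ev_precommit_refuse(p: float) -> float:
    """No-mega (probability p) pays out; the Omega branch is worth nothing."""
    return p * NOMEGA_PRIZE

def break_even_p() -> float:
    """The p at which both precommitments have the same expected value."""
    omega_gain = 0.5 * REWARD - 0.5 * PAYMENT
    return omega_gain / (omega_gain + NOMEGA_PRIZE)

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, ev_precommit_pay(p), ev_precommit_refuse(p))
print(break_even_p())  # about 0.331
```

Under these assumptions, precommitting to pay comes out ahead when p is below roughly 0.33, and precommitting to refuse comes out ahead above it.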
Do you have a point?
Yes, that there can just as easily be a superintelligence that rewards people predicted to act one way as one that rewards people predicted to act the other. Which precommitment is most rational depends on which type you expect to encounter.
I don't expect to encounter either, and on the other hand I can't rule out fallible human analogues of either. So for now I'm not precommitting either way.
Yeah, now. But after Omega really, really, appears in front of you, the chance of Omega existing is about 1. The chance of No-mega is still almost nonexistent. In this problem, the existence of Omega is given. It's not something you are expecting to encounter now, just as we're not expecting to encounter eccentric Kavkan billionaires who will give you money for intoxicating yourself. Kavka's toxin puzzle and the counterfactual mugging present a scenario that is given, and ask how you would act then.
You don't precommit to "give away the $100, to anyone who asks". You precommit to give away the $100 in exactly the situation I described. Or, generalizing such precommitments, you just compute your decisions on the spot, in a reflectively consistent fashion. If that's what you want to do with your future self, that is.
I would have no actionable suspicion that I should give Omega the $100 unless I knew about No-mega. So I get the $10000 only if No-mega asks the question "What would Eliezer do knowing about No-mega?" and not if No-mega asks the question "What would Eliezer do not knowing about No-mega?"
I realise I'm coming to this a little late, but I'm a little unclear about this case. This is my understanding:
When you ask me if I should give Omega the $100, I commit to "yes" because I am the agent who might meet Omega one day, and since I am in fact at the time before the coin has been flipped right now, by the usual expected value calculations the rational choice is to decide to.
So does that mean that if I commit now (e.g., by giving myself a monetary incentive to give the $100), and my friend John tomorrow meets Omega, who has flipped the coin and it has landed tails, I should tell him that the rational choice is to not give the $100, since he is deciding after the coin toss?
Would anyone be so kind as to tell me if that seems right?
Well, this comes up different ways under different interpretations. If there is a chance that I am being simulated, that is this is part of his determining my choice, then I give him $100. If the coin is quantum, that is there will exist other mes getting the money, I give him $100. If there is a chance that I will encounter similar situations again, I give him $100. If I were informed of the deal beforehand, I give him $100. Given that I am not simulated, given that the coin is deterministic, and given that I will never again encounter Omega, I don't think I give him $100. Seeing as I can treat this entirely in isolation due to these conditions, I have the choice between -$100 and $0, of which two options the second is better. Now, this runs into some problems. If I were informed of it beforehand, I should have precommitted. Seeing as my choices given all information shouldn't change, this presents difficulty. However, due to the uniqueness of this deal, there really does seem to be no benefit to any mes from giving him the money, and so it is purely a loss.
Philosopher Kenny Easwaran reported in 2007 that:
Korff also reinvents counterfactual mugging:
And he looks into generalizing to the algorithmic version:
Korff is now an Asst. Prof. at Georgia State.
If it's an iterated game, then the decision to pay is a lot less unintuitive.
I'm way late to this party, but aren't we ignoring something obvious? Such as imperfect knowledge of how likely Omega is to be right about its prediction of what you would do? If you live in a universe where Omega is a known fact and nobody thinks themselves insane when they meet him, well, then it's the degenerate case where you are 100% certain that Omega predicts correctly. If you lived in such a universe presumably you would know it, and everyone in that world would pre-commit to giving Omega $100, just like in ours pizza-deliverers pre-commit to not carrying more than a small amount of cash with them.
There may be other universes where Omega is known to be right and do what he says he will do 80% of the time. Or ones where there are rumors of an omniscient Omega that always makes good on his word, but you assign them 80% probability of being true. And so on.
Given the $5000 expected payoff and the $50 expected cost for pre-committing, you should do it if the probability of Omega being both right and trustworthy is greater than or equal to 0.01.
But, if you, knowing what you know about THIS universe, suddenly found yourself in the presence of some alien entity making the claim Omega makes in the above scenario, what kind of evidence would you demand for this claim before assigning a probability greater than 0.01?
It occurs to me that the dude in the robe and mask pretending to be Omega could up the ante to $1000000, and if I wouldn't believe him more than 0.01% given a $10000 payoff, it probably wouldn't matter to me what he offered as a payoff, because if he has enough delusions and/or chutzpah to make this claim in this universe, there's no reason for him to balk at adding on a few extra decimal places. I'm not sure how to formalize that mathematically, though.
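The break-even figure above can be checked with a short sketch. It assumes you pay whenever asked (so the $50 expected cost is incurred whether or not the claimant is genuine) and that the reward arrives only if the claimant really is an accurate, honest Omega. The parameter name `q` and the function names are illustrative.

```python
def ev_of_precommitting(q: float, payment: float = 100, reward: float = 10_000) -> float:
    """Expected gain when the claimant is a genuine Omega with probability q."""
    return q * 0.5 * reward - 0.5 * payment

def break_even_credence(payment: float = 100, reward: float = 10_000) -> float:
    """Smallest q at which precommitting stops losing money on average."""
    return payment / reward

print(break_even_credence())                  # 0.01
print(break_even_credence(reward=1_000_000))  # 0.0001
print(ev_of_precommitting(0.8))               # 3950.0
```

Raising the claimed payoff lowers the credence needed to break even, which is exactly why a fraudster has no reason to stop at $10000. This sketch does not model how credence itself should fall as the claimed payoff grows.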
So, is it reasonable to pre-commit to giving the $100 in the counterfactual mugging game? (Pre-commitment is one solution to the Newcomb problem.) On first glance, it seems that a pre-commitment will work.
But now consider "counter-counterfactual mugging". In this game, Omega meets me and scans my brain. If it finds that I've pre-committed to handing over the $s in the counterfactual mugging game, then it empties my bank account. If I haven't pre-committed to doing anything in counterfactual mugging, then it rewards me with $1 million. Damn.
So what should I pre-commit to doing, if anything? Should I somehow try to assess my likelihood of meeting Omega (in some form or other) and guess what sort of parlour game it is likely to play with me, and for what stakes? Has anyone got any idea how to do that assessment, without unduly privileging the games that we happen to have thought of so far? This way madness lies I fear...
The interest with these Omega games is that we don't meet actual Omegas, but do meet each other, and the effects are sometimes rather similar. We do like the thought of friends who'll give us $1000 if we really need it (say in a once-in-a-lifetime emergency, with no likelihood of reciprocity) because they believe we'd do the same for them if they really needed it. We don't want to call that behaviour irrational. Isn't that the real point here?
Not exactly madness, but Pascal's wager. If you haven't seen any evidence of Omega existing by now, nor any theory behind how predictions such as his could be possible, and word of his parlour game preferences has not reached you, then chances are that he is so unlikely in this universe that he is in the same category as Pascal's wager.
There is one nice thing about the real-world friend case, which is that you actually might be in the reverse situation later. So it's not just a counterfactual you're considering; it's a real future possibility.
Take that away and it's more like Omega; but then it's not the real-world problem anymore!
Under my syntacticist cosmology, which is a kind of Tegmarkian/Almondian crossover (with measure flowing along the seemingly 'backward' causal relations), the answer becomes trivially "yes, give Omega the $100" because counterfactual-me exists. In fact, since this-Omega simulates counterfactual-me and counterfactual-Omega simulates this-me, the (backwards) flow of measure ensures that the subjective probabilities of finding myself in real-me and counterfactual-me must be fairly close together; consequently this remains my decision even in the Almondian variety. The purer and more elegant version of syntacticism doesn't place a measure on the Tegmark-space at all, but that makes it difficult to explain the regularity of our universe - without a probability distribution on Tegmark-space, you can't even mathematically approach anthropics. However, in that version counterfactual-me 'exists to the same extent that I do', and so again the answer is trivially "give Omega the $100".
Counterfactual problems can be solved in general by taking one's utilitarian summation over all of syntax-space rather than merely one's own Universe/hubble bubble/Everett branch. The outstanding problem is whether syntax-space should have a measure and if so what its nature is (and whether this measure can be computed).
Imagine that one day you come home to see your neighbors milling about your house and the Publishers Clearing House (PCH) van just pulling away. You know that PCH has been running a new schtick recently of selling $100 lottery tickets to win $10,000 instead of just giving money away. In fact, you've used that very contest as a teachable moment with your kids to explain that once the first ticket of the 100 printed was sold, scratched, and determined not to be the winner, the average expected value of the remaining tickets was greater than their cost and they were therefore increasingly worth buying. Now, it's weeks later, most of the tickets have been sold, scratched, and found not to be winners, and they came to your house. In fact, there were only two tickets remaining. And you weren't home. Fortunately, your neighbor and best friend Bob asked if he could buy the ticket for you. Sensing a great human interest story (and lots of publicity), PCH said yes. Unfortunately, Bob picked the wrong ticket. After all your neighbors disperse and Bob and you are alone, Bob says that he'd really appreciate it if he could get his hundred dollars back. Is he mugging you? Or, do you give it to him?
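The ticket arithmetic in this story can be sketched as follows, assuming exactly one winning ticket among the 100 printed and that every scratched ticket so far was a loser. The function name is illustrative.

```python
PRIZE = 10_000
TICKET_PRICE = 100

def ticket_value(tickets_remaining: int) -> float:
    """Expected value of one unscratched ticket when the winner is still among them."""
    return PRIZE / tickets_remaining

print(ticket_value(100))  # 100.0, equal to the price
print(ticket_value(99))   # a little over 101, already above the price
print(ticket_value(2))    # 5000.0, the value when Bob bought
```

With two tickets left, Bob's purchase is a fair coin flip between losing $100 and winning $10,000, which is the same bet as in the original problem.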
Yes, I think you still owe him the $100.
But I like how you made it into a relatively realistic scenario.
Considering the ticket was worth $5,000 when he bought it, sure.
Did you give the same answer to Omega? The cases are exactly analogous. (Or do you argue that they are not?)
The disanalogy here is that you have a long term social relationship with Bob that you don't have with Omega, and the $100 are an investment into that relationship.
The outcomes don't seem to be tied together as they were in the original problem; is it true that, had he won, he would have given you the money only if, had he not won, you would have given him the $100 back? That isn't clear.
If I found myself in this kind of scenario then it would imply that I was very wrong about how I reason about anthropics in an ensemble universe (as with Pascal's mugging or any sort of situation where an agent has enough computing power to take control of that much of my measure such that I find myself in a contrived philosophical experiment). In fact, I would be so surprised to find myself in such a situation that I would question the reasoning that led me to think one boxing was the best course of action in the first place, because somewhere along the way my model became very confused. (I'd still one box, but it would seem less obvious after taking into account the huge amount of previously unexpected structural uncertainty my model of the world suddenly has to deal with.)
I see some reasons for this perspective but I'm not sure.
On the one hand, I don't know much about the distribution of agent preferences in an ensemble universe. But there may be enough long towers of nested simulations of agents like us to compensate for this.
This problem seems conceptually identical to Kavka's toxin puzzle; we have merely replaced intending to drink the poison/pay $100 with being the sort of person whom Omega would predict would do it.
Since, as has been pointed out, one needn't be a perfect predictor for the game to work, I think I'll actually try this on some of my friends.
Thanks for reminding me of Kavka's puzzle. I think that puzzle is unnecessarily mental in its formulation, for example you have to "intend". It's less confusing when you work on more technical concepts of decision-making, evidence, preference and precommitment.
I can't imagine how you are going to perform this on your friends...
The main problem, I think, is getting them to believe that I'm a reliable predictor (i.e. that I predict as well as I claim I do).
Actually, I don't know that if I do this it will show anything relevant to the problem under consideration. But I think it will show something. It has in fact already shown that I believe that 59% of them would agree to give me the money, either because they are sufficiently similar to Eliezer, or because they enjoy random acts of silliness (and the amount of money involved will be pretty trivial).
Did you do it? And if so, did you give away money to the friends you predicted would have given you money, if the coin came up that way?
How much money did you lose?
Normally, you can assume your thought processes are uncorrelated with what's out there. Newcomb-like problems, however, do have the state of the outside universe correlated with your actual thoughts, and this is what throws people off.
If you are unsure whether the state of the universe is X or Y (say with p = 1/2 for simplicity), and you can choose either option A or B, you can calculate the expected utility of choosing A vs B by taking (1/2)u(A,X) + (1/2)u(A,Y) and comparing it to (1/2)u(B,X) + (1/2)u(B,Y).
In a Newcomb-like problem, where the state of the experiment is actually dependent on your choice, the expected utility comparison should now be ~1·u(A,X) + ~0·u(A,Y) vs ~0·u(B,X) + ~1·u(B,Y).
In this case, it boils down to "Is u(A,X) > u(B,Y)?".
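A small sketch of the two comparisons, with the probability of state X allowed to depend on the action. The utilities used here are the usual Newcomb amounts ($1,000,000 and $1,000), which are an assumption for illustration and not taken from this thread; A stands for one-boxing, B for two-boxing, X for "box full", Y for "box empty".

```python
def expected_utility(action, p_x_given_action, u):
    """Expected utility when P(state = X) may depend on the chosen action."""
    p_x = p_x_given_action[action]
    return p_x * u[(action, "X")] + (1 - p_x) * u[(action, "Y")]

def best_action(p_x_given_action, u):
    return max(("A", "B"), key=lambda a: expected_utility(a, p_x_given_action, u))

# Illustrative utilities (assumed, standard Newcomb amounts).
u = {("A", "X"): 1_000_000, ("A", "Y"): 0,
     ("B", "X"): 1_001_000, ("B", "Y"): 1_000}

independent = {"A": 0.5, "B": 0.5}    # state uncorrelated with the choice
correlated = {"A": 0.99, "B": 0.01}   # a predictor that is right 99% of the time

print(best_action(independent, u))  # B
print(best_action(correlated, u))   # A
```

When the state is independent of the choice, B dominates; once the state tracks the choice closely, the comparison collapses to u(A,X) against u(B,Y) and A wins.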
It is not enough for Omega to have a decent record of getting it right, since you could probably do pretty well by reading people's comments and guessing based on that.
If Omega made its prediction solely based on a comment you made on LessWrong, you should expect that if you choose A the universe will be in the same state as if you choose B: knowing your ultimate decision doesn't tell you anything, since the only relevant evidence is what you said a month ago.
If, however, Omega actually simulates your thought process in sufficient detail to know for sure which choice you made, knowing that you ultimately decide to pick A is strong evidence that Omega has set up X, and if you choose B, you had better expect to see Y.
The reason that the answer changes is that the state of the box actually does depend on the thoughts themselves. It's just that you thought the same thoughts when Omega was simulating you before filling the boxes/flipping the coin.
If you aren't sure whether you're just Omega's simulation, you had better one-box/pay Omega. If we're talking about a wannabe Omega that just makes decent predictions based off comments, then you defect (though if you actually expect a situation like this to come up, you argue that you won't).
Omega's actions depend only on your decision (action), or in this case counterfactual decision, not on your thoughts or the algorithm you use to reach the decision. The action of course depends on your thoughts, but that's the usual case. You may move several steps back, seeking the ultimate cause, but that's pretty futile.
I convinced myself to one-box in Newcomb by simply treating it as if the contents of the boxes magically change when I made my decision. Simply draw the decision tree and maximize u-value.
I convinced myself to cooperate in the Prisoner's Dilemma by treating it as if whatever decision I made the other person would magically make too. Simply draw the decision tree and maximize u-value.
It seems that Omega is different because I actually have the information, where in the others I don't.
For example, in Newcomb, if we could see the contents of both boxes, then I should two-box, no? In the Prisoner's Dilemma, if my opponent decides before me and I observe the decision, then I should defect, no?
I suspect that this means that my thought process in Newcomb and the Prisoner's Dilemma is incorrect. That there is a better way to think about them that makes them more like Omega. Am I correct? Does this make sense?
Yes, the objective in designing this puzzle was to construct an example where, according to my understanding of the correct way to make decisions, the correct decision looks like losing. In other cases you may say that you close your eyes, pretend that your decision determines the past or other agents' actions, and just make the decision that gives the best outcome. In this case, you choose the worst outcome. The argument is that on reflection it still looks like the best outcome, and you are given an opportunity to think about what's the correct perspective from which it's the best outcome. It binds the state of reality to your subjective perspective, where in many other thought experiments you may dispense with this connection and focus solely on the reality, without paying any special attention to the decision-maker.
In Newcomb, before knowing the box contents, you should one-box. If you know the contents, you should two-box (or am I wrong?)
In Prisoner, before knowing the opponent's choice, you should cooperate. After knowing the opponent's choice, you should defect (or am I wrong?).
If I'm right in the above two cases, doesn't Omega look more like the "after knowing" situations above? If so, then I must be wrong about the above two cases...
I want to be someone who in situation Y does X, but when Y&Z happens, I don't necessarily want to do X. Here, Z is the extra information that I lost (in Omega), the opponent has chosen (in Prisoner) or that both boxes have money in them (in Newcomb). What am I missing?
No - in the prisoners' dilemma, you should always defect (presuming the payoff matrix represents utility), unless you can somehow collectively pre-commit to co-operating, or it is iterative. This distinction you're thinking of only applies when reverse causation comes into play.
There is a caveat: if you are an agent who is constructed to live in the world where Omega tossed its coin to come out tails, so that the state space for which your utility function and prior are defined doesn't contain the areas corresponding to the coin coming up heads, you don't need to give up $100. You only give up $100 as a tribute to the part of your morality specified on the counterfactual area of the state space.
I would one-box on Newcomb, and I believe I would give the $100 here as well (assuming I believed Omega).
With Newcomb, if I want to win, my optimal strategy is to mimic as closely as possible the type of person Omega would predict would take one box. However, I have no way of knowing what would fool Omega: indeed if it is a sufficiently good predictor there may be no such way. Clearly then the way to be "as close as possible" to a one-boxer is to be a one-boxer. A person seeking to optimise their returns will be a person who wants their response to such stimulus to be "take one box". I do want to win, so I do want my response to be that, so it is: I'm capable of locking my decisions (making promises) in ways that forgo short-term gain for longer term benefit.
The situation here is the same, even though I have already lost. It is beneficial for me to be that type of person in general (obscured by the fact that the situation is so unlikely to occur). Were I not the type of person who made the decision to pay out on loss, I would be the type of person that lost $10000 in an equally unlikely circumstance. Locking that response in now as a general response to such occurrences means I'm more likely to benefit than those who don't.
Well, the other way to look at it is "What action leads me to win?" In the Newcomb problem, one-boxing wins, so you and I are in agreement there.
But in this problem, not-giving-away-$100 wins. Sure, I want to be the "type of person who one boxes", but why do I want to be that person? Because I want to win. Being that type of person in this problem actually makes you lose.
The problem states that this is a one-shot bet, and that after you do or don't give Omega the $100, he flies away from this galaxy and will never interact with you again. So why give him the $100? It won't make you win in the long term.
Yes, but Omega isn't really here yet, and you, Nebu, deciding right now that you will give him $100 does make you win, since it gives you a shot at $10000.
Right, so if a normal person offered me the bet (and assuming I could somehow know it was a fair coin) then yes, I would accept the bet.
If it was Omega instead of a normal person offering the bet, we run into some problems...
But if Omega doesn't actually offer the bet, and just does what is described by Vladimir Nesov, then I wouldn't give him the $100. [1]
In other words, I do different things in different situations.
Edit 1: (Or maybe I would. I haven't figured it out yet.)
The problem only asks about what you would do in the failure case, and I think this obscures the fact that the relevant decision point is right now. If you would refuse to pay, that means that you are the type of person who would not have won had the coin flip turned out differently, either because you haven't considered the matter (and luckily turn out to be in the situation where your choice worked out better), or because you would renege on such a commitment when it occurred in reality.
However at this point, the coin flip hasn't been made. The globally optimal person to be right now is one that does precommit and doesn't renege. This person will come out behind in the hypothetical case as it requires we lock ourselves into the bad choice for that situation, but by being a person who would act "irrationally" at that point, they will outperform a non-committer/reneger on average.
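The "on average" claim can be illustrated with a small simulation. It is only a sketch: Omega is assumed to predict the agent's disposition perfectly, the coin is fair, and the function names and fixed seed are arbitrary choices for reproducibility.

```python
import random

def play_once(would_pay: bool, rng: random.Random) -> int:
    """One encounter with Omega for an agent with a fixed disposition."""
    heads = rng.random() < 0.5
    if heads:
        # Omega rewards only agents it predicts would have paid on tails.
        return 10_000 if would_pay else 0
    # Tails: the committed agent hands over $100, the refuser keeps it.
    return -100 if would_pay else 0

def average_gain(would_pay: bool, trials: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(play_once(would_pay, rng) for _ in range(trials)) / trials

print(average_gain(True))   # close to 4950
print(average_gain(False))  # 0.0
```

The committed agent loses $100 in every tails world, yet averaged over both branches it comes out far ahead of the refuser, who never gains or loses anything.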
What if there is no "on average", if the choice to give away the $100 is the only choice you are given in your life? There is no value in being the kind of person who globally optimizes because of the expectation to win on average. You only make this choice because it's what you are, not because you expect the reality on average to be the way you want it to be.
From my perspective now, I expect the reality to be the winning case 50% of the time because we are told this as part of the question: Omega is trustworthy and said it tossed a fair coin. In the possible futures where such an event could happen, 50% of the time my strategy would have paid off to a greater degree than it would lose the other 50% of the time. If Omega did not toss a fair coin, then the situation is different, and my choice would be too.
There is no value in being such a person if they happen to lose, but that's like saying there's no value in being a person who avoids bets that lose on average, by pointing only to the one-in-several-million time they would have won the lottery. On average they'll come out ahead, just not in the specific situation that was described.
I think my answer would be "I would have agreed, had you asked me when the coin chances were .5 and .5. Now that they're 1 and 0, I have no reason to agree."
Seriously, why stick with an agreement you never made? Besides, if Omega can predict me this well he knows how the coin will come up and how I'll react. Why, then, should I try to act otherwise? Somehow, I think I just don't get it.