I'm most fond of the precommitment argument. You say:
You could argue that you would have pre-committed to paying if you had known about the situation ahead of time. True, but you didn't pre-commit and you didn't know about it ahead of time, so the burden is on you to justify why you should act as though you did. In Newcomb's problem you want to have pre-committed, and if you act as though you were pre-committed then you will find that you actually were pre-committed. However, here it is the opposite. Upon discovering that the coin came up tails, you want to act as though you were not pre-committed to pay, and if you act that way, you will find that you actually were indeed not pre-committed.
I do not think this gets at the heart of the precommitment argument. You mention cousin_it's argument that what we care about is what decision theory we'd prefer a benevolent AI to use. You grant that this makes sense for that case, but you seem skeptical that the same reasoning applies to humans. I argue that it does.
When reasoning abstractly about decision-making, I am (in part) thinking about how I would like myself to make decisions in the future. So it makes sense for me to say to myself, "Ah, I'd want to be counterfactually mugged." I will count being-counterfactually-mugged as a point in favor of proposed ways of thinking about decisions; I will count not-being-mugged as a point against. This is not, in itself, a precommitment; this is just a heuristic about good and bad reasoning as it seems to me when thinking about it ahead of time. A generalization of this heuristic is, "Ah, it seems any case where a decision procedure would prefer to make a commitment ahead of time but would prefer to do something different in the moment is a point against that decision procedure". I will, thinking about decision-making in the abstract as things seem to me now, tend to prefer decision procedures which avoid such self-contradictions.
In other words, thinking about what constitutes good decision-making in the abstract seems a whole lot like thinking about how we would want a benevolent AI to make decisions.
You could argue that I might think such things now, and might think up all sorts of sophisticated arguments which fit that picture, but later, when Omega asks me for $100, if I re-think my decision-theoretic concepts at that time, I'll know better.
But, based on what principles would I be reconsidering? I can think of some. It seems to me now, though, that those principles are mistaken, and I should instead reason using principles which are more self-consistent -- principles which, when faced with the question of whether to give Omega $100, arrive at the same answer I currently think to be right.
Of course this cannot be a general argument that I prefer to reason by principles which will arrive at conclusions consistent with my current beliefs. What I can do is consider the impact which particular ways of reasoning about decisions have on my overall expected utility (assuming I start out reasoning with some version of expected utility theory). Doing so, I will prefer UDT-like ways of reasoning when it comes to problems like counterfactual mugging.
You might argue that beliefs are for true things, so I can't legitimately discount ways-of-thinking just because they have bad consequences. But, these are ways-of-thinking-about-decisions. The point of ways-of-thinking-about-decisions is winning. And, as I think about it now, it seems preferable to think about decisions in whichever ways reliably achieve higher expected utility (the expectation being taken from my perspective now).
Nor is this a quirk of my personal psychology, that I happen to find these arguments compelling in my current mental state, and so, when thinking about how to reason, prefer methods of reasoning which are more consistent with precommitments I would make. Rather, this seems like a fairly general fact about thinking beings who approach decision-making in a roughly expected-utility-like manner.
Perhaps you would argue, like the CDT-er sometimes does in response to Newcomb, that you cannot modify your approach to reasoning about decisions so radically. You see that, from your perspective now, it would be better if you reasoned in a way which made you accept future counterfactual muggings. You'd see, in the future, that you are making a choice inconsistent with your preferences now. But this only means that you have different preferences then and now. And anyway, the question of decision theory should be what to do given preferences, right?
You can take that perspective, but it seems you must do so regretfully -- you should wish you could self-modify in that way. Furthermore, to the extent that a theory of preferences sits in the context of a theory of rational agency, it seems like preferences should be the kind of thing which tends to stay the same over time, not the sort of thing which changes like this.
Basically, it seems that assuming preferences remain fixed, beliefs about what you should do given those preferences and certain information should not change (except due to bounded rationality). IE: certainly I may think I should go to the grocery store but then change my mind when I learn it's closed. But I should not start out thinking that I should go to the grocery store even in the hypothetical where it's closed, and then, upon learning it's closed, go home instead. (Except due to bounded rationality.) That's what is happening with CDT in counterfactual mugging: it prefers that its future self should, if asked for $100, hand it over; but, when faced with the situation, it thinks it should not hand it over.
The CDTer response ("alas, I cannot change my own nature so radically") presumes that we have already figured out how to reason about decisions. I imagine that the real crux behind such a response is actually that CDT feels like the true answer, so that the non-CDT answer does not seem compelling even once it is established to have a higher expected value. The CDTer feels as if they'd have to lie to themselves to 1-box. The truth is that they could modify themselves so easily, if they thought the non-CDT answer was right! They protest that Newcomb's problem simply punishes rationality. But this argument presumes that CDT defines rationality.
An EDT agent who asks how best to act in future situations to maximize expected value in those situations will arrive back at EDT, since expected-value-in-the-situation is the very criterion which EDT already uses. However, this is a circular way of thinking -- we can make a variant of that kind of argument which justifies any decision procedure.
A CDT or EDT agent who asks itself how best to act in future situations to maximize expected value as estimated by its current self will arrive at UDT. Furthermore, that's the criterion it seems an agent ought to use when weighing the pros and cons of a decision theory; not the expected value according to some future hypothetical, but the expected value of switching to that decision theory now.
And, remember, it's not the case that we will switch back to CDT/EDT if we reconsider which decision theory is highest-expected-utility when we are later faced with Omega asking for $100. We'd be a UDT agent at that point, and so, would consider handing over the $100 to be the highest-EV action.
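To make that comparison concrete, here is a minimal sketch in Python (my own illustration of the arithmetic, not anything from the original discussion) of the expected value each policy earns from the pre-flip perspective, using the problem's $10,000 reward, $100 cost, and fair coin:

```python
# Minimal sketch: prior (pre-flip) expected value of the two policies in
# counterfactual mugging. Figures are the standard ones from the problem;
# the function name and structure are purely illustrative.

REWARD = 10_000   # paid on heads, but only if Omega predicts you'd pay on tails
COST = 100        # requested on tails
P_HEADS = 0.5     # fair coin

def prior_expected_value(pays_on_tails: bool) -> float:
    """Expected value of a policy, evaluated before the coin is flipped."""
    heads_income = REWARD if pays_on_tails else 0
    tails_income = -COST if pays_on_tails else 0
    return P_HEADS * heads_income + (1 - P_HEADS) * tails_income

print(prior_expected_value(True))   # 4950.0 -- the policy that pays
print(prior_expected_value(False))  # 0.0    -- the policy that refuses
```

From the current perspective, switching to the paying policy is worth $4,950 in expectation, which is the sense in which the CDT/EDT agent's own criterion recommends the switch.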
I expect another protest at this point -- that the question of which decision theory gets us the highest expected utility by our current estimation isn't the same as which one is true or right. To this I respond that, if we ask what highly capable agents would do ("highly intelligent"/"highly rational"), we would expect them to be counterfactually mugged -- because highly capable agents would (by the assumption of their high capability) self-modify if necessary in order to behave in the ways they would have precommitted to behave. So, this kind of decision theory / rationality seems like the kind you'd want to study to better understand the behavior of highly capable agents; and, the kind you would want to imitate if trying to become highly capable. This seems like an interesting enough thing to study. If there is some other thing, "the right decision theory", to study, I'm curious what that other thing is -- but it does not seem likely to make me lose interest in this thing (the normative theory I currently call decision theory, in which it's right to be counterfactually mugged).
a) it's possible that a counterfactual mugging situation could have been set up before an AI was built
My perspective now already includes some amount of updateless reasoning, so I don't necessarily find that compelling. However, I do agree that even according to UDT there's a subjective question of how much information should be incorporated into the prior. So, for example, it seems sensible to refuse counterfactual mugging on the first digit of pi.
Or maybe you just directly care about counterfactual selves? But why? Do you really believe that counterfactuals are in the territory and not the map?
It seems worth pointing out that we might deal with this via anthropic reasoning. We don't need to believe that the counterfactual selves literally exist; rather, we are unsure whether we are being simulated. If we are being simulated, then the other self (in a position to get $1000) really does exist.
Caveat:
There are a few hedge-words and qualifiers in the above which the casual reader might underestimate the importance of. For example, when I say
(except due to bounded rationality)
I really mean that many parts of the argument I'm making crumble to dust in the face of bounded rationality, not that bounded rationality is a small issue which I set aside for convenience in the argument above. Keep in mind that I've recently been arguing against UDT. However, I do still think it is right to be counterfactually mugged, for something resembling the reasons I gave. It's just that many details of the argument really don't work for embedded agents -- to such a large extent that I've become pessimistic about UDT-like ideas.
You grant that this makes sense for that case, but you seem skeptical that the same reasoning applies to humans
I ultimately don't see much of a distinction between humans and AIs, but let me clarify. If we had the ability to perfectly pre-commit then we'd make pre-commitments that effectively would be the same as an AI self-modifying. Without this ability, this argument is slightly harder to make, but I think it still applies. I've attempted making it in the past although I don't really feel I completely succeeded.
Ah, it seems any case...
I'm new here. May I ask what the core difference between UDT and FDT is? Also, which is better, and why?
I find that the "you should pay" answer is confused and self-contradictory in its reasoning. As in all the OO (Omniscient Omega) setups, you, the subject, have no freedom of choice as far as OO is concerned; you are just another deterministic automaton. So any "decision" you make to precommit to a certain action has already been predicted (or could have been predicted) by OO, including any influence exerted on your thought process by other people telling you about rationality and precommitment. To make it clearer, anyone telling you to one-box in Newcomb's problem in effect uses classical CDT (which advises two-boxing), because they assume that you have the freedom to make a decision in a setup where your decisions are predetermined. If that were so, two-boxing would make more sense, defying the OO infallibility assumption.
So, the whole reasoning advocating for one-boxing and for paying the mugger does not hold up to basic scrutiny. A self-consistent answer would be "you are a deterministic automaton, whatever you feel or think or pretend to decide is an artifact of the algorithm that runs you, so the question whether to pay is meaningless, you either will pay or will not, you have no control over it."
Of course, this argument only applies to OO setups. In "reality" there are no OO that we know of, the freedom of choice debate is far from resolved, and if one assumes that we are not automatons whose actions are set in stone (or in the rules of quantum mechanics), then learning to make better decisions is not a futile exercise. One example is the twin prisoner dilemma, where the recommendation to cooperate with one's twin is self-consistent.
Newcomb's paradox still works if Omega is not infallible, just right a substantial proportion of the time. Between the two extremes you have described, of free choice, unpredictable by Omega, and deterministic absence of choice, lies people's real psychology.
Just what is my power to sever links of a causal graph that point towards me? If I am faced with a wily salesman, how shall I be sure of making my decision to buy or not by my own values, taking into account what is informative from the salesman, but uninfluenced by his dark arts? Do I even k...
Again, we seem to just have foundational disagreements here. Free will is one of those philosophical topics that I lost interest in a long time ago, so I'm happy to leave it to others to debate.
Just ask which algorithm wins, then. At least in these kinds of situations UDT does better. The only downside is that the algorithm has to check if it's in this kind of situation; it might not be worth practicing.
Is it forbidden to ask about Quantum Mechanics and Decision Theory? I got banned on my other account and I don't understand why. It was a serious question.
Hey, moderator here. The reason for banning your previous account was mostly just that we get a lot of quantum-theory crackpots, and your post had a lot of markings of someone in that reference class. The posts you wrote on this account seem a bit better, though the use of multiple punctuation marks in a row, and a somewhat unclear structure still make me hesitant. I will approve one of your two posts for now, and we will see how it goes.
Sorry for putting you under this additional scrutiny, but we get enough people who are really confused about various aspects of quantum mechanics and want to tell everyone about their opinions that we need to have somewhat high barriers for entry in that domain.
Sure, that is fine! I imagine that is a big problem. As a physicist, although not one working in quantum mechanics, I tried to be precise.
Anyway, your answer has been good. It seems the paper has been debunked.
why can’t we just imagine that you are an agent that doesn’t care about counterfactual selves?
Caring about counterfactual selves is part of UDT, though. If you simply assume that it doesn't hold, and ask proponents of UDT to argue under that assumption, I'm not sure there's a good answer.
Interesting. Do you take caring about counterfactual selves as foundational - in the sense that there is no why, you either do or do not?
No, not like that. I think there is an argument for caring about counterfactual selves. But it cannot be carried out from the assumption that the agent doesn't care about counterfactual selves. You're just asking me to do something impossible.
I guess my argument starts by imagining that agents can either care about counterfactual selves or not. But agents that don't are a bit controversial, so let's imagine such an agent and see if we run into any issues. So imagine a consistent agent that doesn't care about counterfactual selves except insofar as they "could be it" from the agent's current epistemic position. I can't see any issues with this - it seems consistent. And my challenge is for you to answer why this isn't a valid set of values to have.
Let's imagine a kind of symmetric counterfactual mugging. In case of heads, Omega says: "The coin came up heads, now you can either give me $100 or refuse. After that, I'll give you $10000 if you would've given me $100 in case of tails". In case of tails, Omega says the same thing, but with heads and tails reversed. In this situation, an agent who doesn't care about counterfactual selves always gets 0 regardless of the coin, while an agent who does care always gets $9900 regardless of the coin.
I can't think of any situation where the opposite happens (the non-caring agent gets more with certainty). To me that suggests the caring agent is more rational.
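Spelling the payoffs out (a quick sketch of the arithmetic, assuming the $10,000/$100 figures above; the code itself is just an illustration):

```python
# Sketch of the symmetric counterfactual mugging payoffs described above.
# In each branch you may pay $100, and you receive $10,000 iff Omega
# predicts you would have paid in the opposite branch.

REWARD, COST = 10_000, 100

def branch_payoff(pays_here: bool, would_pay_in_other_branch: bool) -> int:
    income = -COST if pays_here else 0
    if would_pay_in_other_branch:
        income += REWARD
    return income

# An agent that cares about its counterfactual self pays in both branches:
print(branch_payoff(True, True))    # 9900, whichever way the coin lands
# An agent that doesn't care refuses in both branches:
print(branch_payoff(False, False))  # 0, whichever way the coin lands
```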
Yeah, I actually stumbled upon this argument myself this morning. Has anyone written this up beyond this comment? It seems like the most persuasive argument for paying, and it suggests that never caring is not a viable position.
I was thinking today about whether there are any intermediate positions, but I don't think they are viable. Only caring about counterfactuals when you have a prisoner's dilemma-like situation seems an unprincipled fudge.
Do you think you'll write a post on it? Because I was thinking of writing a post, but if you were planning on doing this then that would be even better as it would probably get more attention.
In this situation, an agent who doesn't care about counterfactual selves always gets 0 regardless of the coin
Since the agent is very correlated with its counterfactual copy, it seems that superrationality (or even just EDT) would make the agent pay $100, and get the $10000.
Actually, the counterfactual agent makes a different observation (heads instead of tails), so their actions aren't necessarily linked.
I just thought of another argument. Imagine that before being faced with counterfactual mugging, the agent can make a side bet on Omega's coin. Let's say the agent who doesn't care about counterfactual selves chooses to bet X dollars on heads, so the income is X in case of heads and -X in case of tails. Then the agent who cares about counterfactual selves can bet X-5050 on heads (or if that's negative, bet 5050-X on tails). Since this agent agrees to pay Omega, the income will be 10000+X-5050=4950+X in case of heads, and 5050-X-100=4950-X in case of tails. So in both cases the caring agent gets 4950 dollars more than the non-caring agent. And the opposite is impossible: no matter how the two agents bet, the caring agent always gets more in at least one of the cases.
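A quick numeric check of that arithmetic (an illustration only; the figures are the ones above):

```python
# Check of the side-bet argument: the non-caring agent bets X on heads;
# the caring agent bets X - 5050 on heads (a negative bet meaning a bet
# of 5050 - X on tails) and also pays Omega when asked.

REWARD, COST, HEDGE = 10_000, 100, 5_050

def incomes(x: float):
    non_caring = {"heads": x, "tails": -x}      # side bet only; refuses Omega
    bet = x - HEDGE
    caring = {"heads": REWARD + bet,            # Omega's reward plus the bet
              "tails": -bet - COST}             # loses the bet, pays Omega
    return non_caring, caring

for x in (0, 1_000, 5_050, 10_000):
    nc, c = incomes(x)
    print(x, c["heads"] - nc["heads"], c["tails"] - nc["tails"])  # always 4950 4950
```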
"Imagine that before being faced with counterfactual mugging, the agent can make a side bet on Omega's coin" - I don't know if that works. Part of counterfactual mugging is that you aren't told before the problem that you might be mugged, otherwise you could just pre-commit.
If Omega didn't know the outcome of the flip in advance (and is telling the truth), then you should pay if 1/2*U(x + $10,000) + 1/2*U(x - $100) > U(x), where x is your current wealth.
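For illustration only, here is what that condition looks like under an assumed logarithmic utility function (the choice of utility function is my assumption, not something specified above):

```python
# Illustrative check of 1/2*U(x + 10_000) + 1/2*U(x - 100) > U(x)
# under an assumed log utility U(x) = ln(x), where x is current wealth.
import math

def accepts_bet(wealth: float) -> bool:
    u = math.log
    return 0.5 * u(wealth + 10_000) + 0.5 * u(wealth - 100) > u(wealth)

for wealth in (101, 200, 1_000, 100_000):
    print(wealth, accepts_bet(wealth))
# With log utility the bet is declined only when losing $100 would leave
# you nearly broke; otherwise the condition holds comfortably.
```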
You could also tell Omega that the bet is riskier than you would have agreed to, but you would have been fine with winning $1,000 if you won, and paying $10 if you lost. (This doesn't work with anyone other than Omega though - Omega can predict what you'd agree to, and give you $1000 if you win, and ask for $10 if you lose. This would also have to be consistent with you paying the $10 though.)
Good point about risk also being a factor, but the point in question isn't how to perform an expected utility calculation; it's how to justify performing one in the first place.
If I had agreed with Omega about the bet in advance, then I'd pay up. (This covers concerns about risk.)
So, would you pay if the agreement was made not cleanly 'in the past', but with time travel involved?
No. I don't know the accuracy of the prediction. It's just that I already know the result of the coin flip.
Is there a way to prove that the coin toss was fair? In a broader (math/physics) sense, is it possible to prove that a historical event with a known outcome 'was' the result of a random process using only the observation of the outcome?
In the event of the 'many worlds' theory being true, there should exist a world where the coin came up the other way, and 'parallel me' has been gifted $10,000.
If parallel me were to call me on my many worlds quantum iphone (this is my hypothetical, I get to have one), and confirm that he is calling from the universe where the coin went the other way, and he did in fact get paid, presumably contingent on me paying the person in front of me, I would probably pay.
Now, if I dial my many worlds quantum phone, and get an operator error, that means no parallel universe where parallel me won exists, and the 'coin flip' either did not happen, or was actually a predetermined event designed to win my mugger $100, in which case, I should not pay, and should probably clobber him on general principle.
Without the use of a hypothetical 'many worlds quantum iphone', is there a way to observe a coin lying on the ground displaying 'heads' and prove that the coin was flipped (and therefore had the opportunity to be tails) versus intentionally placed with the heads facing up?
The LessWrong Wiki defines Counterfactual Mugging as follows:
Omega, a perfect predictor, flips a fair coin. If it comes up tails, Omega asks you for $100. If it comes up heads, Omega pays you $10,000, but only if it predicts that you would have paid had the coin come up tails.
I expect that most people would say that you should pay because a 50% chance of $10000 for $100 is an amazing deal according to expected value. I lean this way too, but it is harder to justify than you might think.
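(For concreteness, the arithmetic behind that intuition, using the post's own figures and a fair coin: 0.5 × $10,000 + 0.5 × (−$100) = $4,950 in expectation for the paying policy, versus $0 for refusing.)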
After all, if you are being asked for $100, you know that the coin came up tails and you won't receive the $10000. Sure, this means that if the coin had come up heads then you wouldn't have gained the $10000, but you know the coin didn't come up heads, so you don't lose anything. It's important to emphasise: this doesn't deny that, if the coin had come up heads, refusing would have made you miss out on $10000. Instead, it claims that this point is irrelevant, so merely repeating the point again isn't a valid counter-argument.
You could argue that you would have pre-committed to paying if you had known about the situation ahead of time. True, but you didn't pre-commit and you didn't know about it ahead of time, so the burden is on you to justify why you should act as though you did. In Newcomb's problem you want to have pre-committed, and if you act as though you were pre-committed then you will find that you actually were pre-committed. However, here it is the opposite. Upon discovering that the coin came up tails, you want to act as though you were not pre-committed to pay, and if you act that way, you will find that you actually were indeed not pre-committed.
We could even channel Yudkowsky from Newcomb's Problem and Regret of Rationality: "Rational agents should WIN... It is precisely the notion that Nature does not care about our algorithm, which frees us up to pursue the winning Way - without attachment to any particular ritual of cognition, apart from our belief that it wins. Every rule is up for grabs, except the rule of winning... Unreasonable? I am a rationalist: what do I care about being unreasonable? I don't have to conform to a particular ritual of cognition. I don't have to take only box B because I believe my choice affects the box, even though Omega has already left. I can just... take only box B." You can just not pay the $100. (Vladimir Nesov makes this exact same argument here).
Here's another common reason I've heard, as described by cousin_it: "I usually just think about which decision theory we'd want to program into an AI which might get copied, its source code inspected, etc. That lets you get past the basic stuff, like Newcomb's Problem, and move on to more interesting things. Then you can see which intuitions can be transferred back to problems involving humans."
That's actually a very good point. It's entirely possible that solving this problem doesn't have any relevance to building AI. However, I want to note that:
a) it's possible that a counterfactual mugging situation could have been set up before an AI was built
b) understanding this could help deconfuse what a decision is - we still don't have a solution to logical counterfactuals
c) this is probably a good exercise for learning to cut through philosophical confusion
d) okay, I admit it, it's kind of cool and I'd want an answer regardless of any potential application.
Or maybe you just directly care about counterfactual selves? But why? Do you really believe that counterfactuals are in the territory and not the map? So why care about that which isn't real? Or even if they are real, why can't we just imagine that you are an agent that doesn't care about counterfactual selves? If we can imagine an agent that likes being hit on the head with a hammer, why can't we imagine that as well?
Then there's the philosophical uncertainty approach. Even if there's only a 1/50 chance of your analysis being wrong, you should pay. This is great if you face the decision in real life, but not if you are trying to delve into the nature of decisions.
So given all of this, why should you pay?