You're in Newcomb's Box
Part 1: Transparent Newcomb with your existence at stake
Related: Newcomb's Problem and Regret of Rationality
Omega, a wise and trustworthy being, presents you with a one-time-only game and a surprising revelation.
"I have here two boxes, each containing $100," he says. "You may choose to take both Box A and Box B, or just Box B. You get all the money in the box or boxes you take, and there will be no other consequences of any kind. But before you choose, there is something I must tell you."
Omega pauses portentously.
"You were created by a god: a being called Prometheus. Prometheus was neither omniscient nor particularly benevolent. He was given a large set of blueprints for possible human embryos, and for each blueprint that pleased him he created that embryo and implanted it in a human woman. Here was how he judged the blueprints: any that he guessed would grow into a person who would choose only Box B in this situation, he created. If he judged that the embryo would grow into a person who chose both boxes, he filed that blueprint away unused. Prometheus's predictive ability was not perfect, but it was very strong; he was the god, after all, of Foresight."
Do you take both boxes, or only Box B?
Newcomb's problem happened to me
Okay, maybe not me, but someone I know, and that's what the title would be if he wrote it. Newcomb's problem and Kavka's toxin puzzle are more than just curiosities relevant to artificial intelligence theory. Like a lot of thought experiments, they approximately happen. They illustrate robust issues with causal decision theory that can deeply affect our everyday lives.
Yet somehow it isn't mainstream knowledge that these are more than merely abstract linguistic issues, as evidenced by this comment thread (please, no karma-sniping of the comments; they are a valuable record). Scenarios involving brain scanning, decision simulation, etc., can establish their validity and future relevance, but not that they are already commonplace. For the record, I want to provide an already-happened, real-life account that captures the Newcomb essence and explicitly describes how.
So let's say my friend is named Joe. In his account, Joe is very much in love with this girl named Omega… er… Kate, and he wants to get married. Kate is somewhat traditional, and won't marry him unless he proposes: not only in the sense of explicitly asking her, but also of expressing certainty that he will never try to leave her if they marry.
Now, I don't want to make up the ending here. I want to convey the actual account, in which Joe's beliefs are roughly schematized as follows:
- If he proposes sincerely, she is effectively certain to believe him.
- If he proposes insincerely, there is a 50% chance she will believe him.
- If she believes his proposal, there is an 80% chance she will say yes.
- If she doesn't believe his proposal, she will certainly say no, but will not be significantly upset (in comparison to the significance of marriage).
- If they marry, there is a 90% chance Joe will be happy and a 10% chance he will be unhappy.
He values the happy and unhappy outcomes roughly oppositely:
- being happily married to Kate: +125 megautilons
- being unhappily married to Kate: -125 megautilons
So what should he do? What should this real person actually have done? Well, as in Newcomb's problem, these beliefs and utilities present an interesting and quantifiable problem…
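On the beliefs and values above, the expected utilities can be sketched directly. One caveat: the account gives no payoff for leaving an unhappy marriage after an insincere proposal, so that figure is a hypothetical parameter here, set to 0 for illustration.

```python
# Expected-utility sketch of Joe's decision, using the beliefs and values
# listed above. The payoff of leaving an unhappy marriage is NOT given in
# the account; it is a hypothetical parameter here.

P_BELIEVE_SINCERE = 1.0    # she is effectively certain to believe a sincere proposal
P_BELIEVE_INSINCERE = 0.5  # 50% chance she believes an insincere one
P_YES_IF_BELIEVED = 0.8
P_HAPPY = 0.9
U_HAPPY = 125.0            # megautilons
U_UNHAPPY = -125.0

def expected_utility(p_believe: float, u_if_unhappy: float) -> float:
    """EU of proposing, given the chance of being believed and the
    utility Joe ends up with if the marriage turns out unhappy."""
    u_marriage = P_HAPPY * U_HAPPY + (1 - P_HAPPY) * u_if_unhappy
    return p_believe * P_YES_IF_BELIEVED * u_marriage

# Sincere: he stays even if unhappy, so the unhappy outcome costs the full -125.
eu_sincere = expected_utility(P_BELIEVE_SINCERE, U_UNHAPPY)

# Insincere: he would leave an unhappy marriage; assume (hypothetically)
# that leaving is worth 0 rather than -125.
eu_insincere = expected_utility(P_BELIEVE_INSINCERE, 0.0)

print(eu_sincere)    # 80.0
print(eu_insincere)  # 45.0
```

On these (partly assumed) numbers, the sincere proposal dominates even before the Newcomb-style reasoning about Kate's predictive ability comes into play.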
Blackmail, Nukes and the Prisoner's Dilemma
This example (and the whole method for modelling blackmail) is due to Eliezer; I have just recast it in my own words.
We join our friends, the Countess of Rectitude and Baron Chastity, in bed together. Having surmounted their recent difficulties (she paid him, by the way), they decide to relax with a good old game of prisoner's dilemma. The payoff matrix is as usual:
| (Baron, Countess) | Cooperate | Defect |
|---|---|---|
| Cooperate | (3,3) | (0,5) |
| Defect | (5,0) | (1,1) |
Were they both standard game theorists, they would both defect, and the payoff would be (1,1). But recall that the baron occupies an epistemic vantage over the countess. While the countess only gets to choose her own action, he can choose from among four more general tactics:
- (Countess C, Countess D)→(Baron D, Baron C) "contrarian" : do the opposite of what she does
- (Countess C, Countess D)→(Baron C, Baron C) "trusting soul" : always cooperate
- (Countess C, Countess D)→(Baron D, Baron D) "bastard" : always defect
- (Countess C, Countess D)→(Baron C, Baron D) "copycat" : do whatever she does
Recall that he counterfactually considers what the countess would do in each case, while assuming that the countess considers his decision a fixed fact about the universe. Were he to adopt the contrarian tactic, she would maximise her utility by defecting, giving a payoff of (0,5). Similarly, she would defect in both trusting soul and bastard, giving payoffs of (0,5) and (1,1) respectively. If he goes for copycat, on the other hand, she will cooperate, giving a payoff of (3,3).
Thus when one player occupies a superior epistemic vantage over the other, they can do better than standard game theorists, and manage to both cooperate.
"Isn't it wonderful," gushed the Countess, pocketing her 3 utilitons and lighting a cigarette, "how we can do such marvellously unexpected things when your position is over mine?"
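The tactic-by-tactic reasoning above can be checked mechanically. A minimal Python sketch, best-responding the countess against each of the baron's four tactics:

```python
# Sketch of the baron's four tactics over the prisoner's dilemma above.
# The countess best-responds to whichever tactic the baron commits to.

# payoff[(baron_move, countess_move)] = (baron's payoff, countess's payoff)
payoff = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

tactics = {
    "contrarian":    {"C": "D", "D": "C"},  # do the opposite of what she does
    "trusting soul": {"C": "C", "D": "C"},  # always cooperate
    "bastard":       {"C": "D", "D": "D"},  # always defect
    "copycat":       {"C": "C", "D": "D"},  # do whatever she does
}

results = {}
for name, tactic in tactics.items():
    # The countess picks the move that maximises her own payoff, treating
    # the baron's committed tactic as a fixed fact about the universe.
    her_move = max("CD", key=lambda m: payoff[(tactic[m], m)][1])
    results[name] = payoff[(tactic[her_move], her_move)]

print(results)
# {'contrarian': (0, 5), 'trusting soul': (0, 5), 'bastard': (1, 1), 'copycat': (3, 3)}
```

Only copycat gets the baron his 3 utilitons, exactly as argued above.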
The Blackmail Equation
This is Eliezer's model of blackmail in decision theory, presented at the recent workshop at SIAI and filtered through my own understanding. Eliezer's help and advice were much appreciated; any errors herein are my own.
The mysterious stranger blackmailing the Countess of Rectitude over her extra-marital affair with Baron Chastity doesn't have to run a complicated algorithm. He simply has to credibly commit to the course of action:
Z = "If you don't give me money, I will reveal your affair."
And then, generally, the Countess forks over the cash. Which means the blackmailer never does reveal the details of the affair, so that threat remains entirely counterfactual/hypothetical. Even if the blackmailer is Baron Chastity, and the revelation would be devastating for him as well, this makes no difference at all, as long as he can credibly commit to Z. In the world of perfect decision makers, there is no risk to doing so, because the Countess will hand over the money, so the Baron will not take the hit from the revelation.
Indeed, the baron could replace "I will reveal our affair" with Z="I will reveal our affair, then sell my children into slavery, kill my dogs, burn my palace, and donate my organs to medical science while boiling myself in burning tar" or even "I will reveal our affair, then turn on an unfriendly AI", and it would only matter if this changed his pre-commitment to Z. If the Baron can commit to counterfactually doing Z, then he never has to do Z (as the countess will pay him the hush money), so it doesn't matter how horrible the consequences of Z are to himself.
To get some numbers in this model, assume the countess can either pay up or not do so, and the baron can reveal the affair or keep silent. The payoff matrix could look something like this:
| (Baron, Countess) | Pay | Not pay |
|---|---|---|
| Reveal | (-90,-110) | (-100,-100) |
| Silent | (10,-10) | (0,0) |
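Given this matrix, the countess's best response to the committed threat can be sketched directly (assuming the commitment is credible, so she treats it as a fixed fact):

```python
# Sketch of the countess's choice once the baron has credibly committed
# to "reveal iff she doesn't pay", using the payoff matrix above.

# payoff[(baron_move, countess_move)] = (baron's payoff, countess's payoff)
payoff = {
    ("reveal", "pay"): (-90, -110), ("reveal", "not pay"): (-100, -100),
    ("silent", "pay"): (10, -10),   ("silent", "not pay"): (0, 0),
}

# The baron's committed threat, as a map from her move to his.
commitment = {"pay": "silent", "not pay": "reveal"}

# The countess best-responds to the committed strategy.
her_move = max(["pay", "not pay"], key=lambda m: payoff[(commitment[m], m)][1])
outcome = payoff[(commitment[her_move], her_move)]
print(her_move, outcome)  # pay (10, -10)
```

She pays (-10 beats -100), so the reveal branch, costly as it is to the baron too, is never actually executed.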
The continued misuse of the Prisoner's Dilemma
Related to: The True Prisoner's Dilemma, Newcomb's Problem and Regret of Rationality
In The True Prisoner's Dilemma, Eliezer Yudkowsky pointed out a critical problem with the way the Prisoner's Dilemma is taught: the distinction between utility and avoided-jail-time is not made clear. The payoff matrix is supposed to represent the former, even as its numerical values happen to coincidentally match the latter. And worse, people don't naturally assign utility as per the standard payoff matrix: their compassion for the friend in the "accomplice" role means they wouldn't feel quite so good about a "successful" backstabbing, nor quite so bad about being backstabbed. ("Hey, at least I didn't rat out a friend.")
For that reason, you rarely encounter a true Prisoner's Dilemma, even an iterated one. The above complications prevent real-world payoff matrices from working out that way.
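The point can be made concrete by checking the standard dilemma inequalities against compassion-adjusted utilities. The adjusted numbers below are hypothetical; only the shape of the adjustment (backstabbing feels less good, being backstabbed less bad) comes from the argument above.

```python
def is_prisoners_dilemma(T, R, P, S):
    """Standard conditions: temptation > reward > punishment > sucker,
    plus 2R > T + S so mutual cooperation beats alternating exploitation."""
    return T > R > P > S and 2 * R > T + S

# Payoffs that merely mirror avoided jail time:
print(is_prisoners_dilemma(T=5, R=3, P=1, S=0))  # True

# Hypothetical compassion-adjusted utilities: a "successful" backstab of a
# friend feels less good (T: 5 -> 2), and being backstabbed less bad
# (S: 0 -> 1.5, "at least I didn't rat out a friend").
print(is_prisoners_dilemma(T=2, R=3, P=1, S=1.5))  # False
```

With the adjustment, temptation no longer beats reward, so the game in utility space is not a Prisoner's Dilemma at all, even though the jail-time numbers still are.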
Which brings us to another unfortunate example of this misunderstanding being taught.
Timeless Decision Theory and Meta-Circular Decision Theory
(This started as a reply to Gary Drescher's comment here in which he proposes a Metacircular Decision Theory (MCDT); but it got way too long so I turned it into an article, which also contains some amplifications on TDT which may be of general interest.)
Ingredients of Timeless Decision Theory
Followup to: Newcomb's Problem and Regret of Rationality, Towards a New Decision Theory
Wei Dai asked:
"Why didn't you mention earlier that your timeless decision theory mainly had to do with logical uncertainty? It would have saved people a lot of time trying to guess what you were talking about."
...
All right, fine, here's a fast summary of the most important ingredients that go into my "timeless decision theory". This isn't so much an explanation of TDT, as a list of starting ideas that you could use to recreate TDT given sufficient background knowledge. It seems to me that this sort of thing really takes a mini-book, but perhaps I shall be proven wrong.
The one-sentence version is: Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.
The three-sentence version is: Factor your uncertainty over (impossible) possible worlds into a causal graph that includes nodes corresponding to the unknown outputs of known computations; condition on the known initial conditions of your decision computation to screen off factors influencing the decision-setup; compute the counterfactuals in your expected utility formula by surgery on the node representing the logical output of that computation.
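This is not the TDT formalism, but the one-sentence version admits a toy illustration: in Newcomb's Problem, Omega's prediction and your choice are two instantiations of the same decision computation, so choosing "as though controlling the logical output" affects both.

```python
# Toy illustration (not the actual TDT formalism): Omega predicts by
# running the same decision computation the agent later runs, so a single
# logical output determines both the box contents and the choice.

def play_newcomb(decision):
    """decision: a zero-argument function returning 'one-box' or 'two-box'.
    Returns the agent's total winnings in dollars."""
    prediction = decision()              # Omega's instantiation
    box_b = 1_000_000 if prediction == "one-box" else 0
    choice = decision()                  # the agent's instantiation
    if choice == "one-box":
        return box_b
    return box_b + 1_000                 # two-boxing adds the $1000 box

one_boxer = lambda: "one-box"
two_boxer = lambda: "two-box"

print(play_newcomb(one_boxer))  # 1000000
print(play_newcomb(two_boxer))  # 1000
```

The "surgery" in the three-sentence version is on the shared output of `decision`, not on the physical act of taking boxes, which is why one-boxing comes out ahead.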
Timeless Decision Theory: Problems I Can't Solve
Suppose you're out in the desert, running out of water, and soon to die - when someone in a motor vehicle drives up next to you. Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what's more, the driver is Paul Ekman, who's really, really good at reading facial microexpressions. The driver says, "Well, I'll convey you to town if it's in my interest to do so - so will you give me $100 from an ATM when we reach town?"
Now of course you wish you could answer "Yes", but as an ideal game theorist yourself, you realize that, once you actually reach town, you'll have no further motive to pay off the driver. "Yes," you say. "You're lying," says the driver, and drives off leaving you to die.
If only you weren't so rational!
This is the dilemma of Parfit's Hitchhiker, and the above is the standard resolution according to mainstream philosophy's causal decision theory, which also two-boxes on Newcomb's Problem and defects in the Prisoner's Dilemma. Of course, any self-modifying agent who expects to face such problems - in general, or in particular - will soon self-modify into an agent that doesn't regret its "rationality" so much. So from the perspective of a self-modifying-AI-theorist, classical causal decision theory is a wash. And indeed I've worked out a theory, tentatively labeled "timeless decision theory", which covers these three Newcomblike problems and delivers a first-order answer that is already reflectively consistent, without need to explicitly consider such notions as "precommitment". Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.
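The structure can be put in toy numbers (the utility of dying below is hypothetical; only the $100 fare appears in the story). Since the driver reads microexpressions perfectly, it is the agent's disposition to pay, not the later payment itself, that determines whether the ride happens:

```python
# Toy payoffs for Parfit's Hitchhiker. The death utility is a hypothetical
# stand-in for "soon to die in the desert"; the $100 fare is from the story.

U_DEATH = -1_000_000
FARE = 100

def hitchhiker_outcome(would_pay_in_town: bool) -> int:
    # A perfect reader of microexpressions conveys the agent to town
    # iff the agent genuinely would pay on arrival.
    if would_pay_in_town:
        return -FARE       # rescued, then pays $100 at the ATM
    return U_DEATH         # lie detected; left in the desert

print(hitchhiker_outcome(True))   # -100
print(hitchhiker_outcome(False))  # -1000000
```

The causal decision theorist's problem is that by the time the payment decision arrives, paying is causally useless; the disposition that would have saved him is exactly what his theory forbids.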
However, there are some other timeless decision problems for which I do not possess a general theory.
For example, there's a problem introduced to me by Gary Drescher's marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov; Drescher's book actually contains a related dilemma in which box B is transparent, and only contains $1M if Omega predicts you will one-box whether B appears full or empty, and Omega has a 1% error rate) which runs as follows:
Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:
"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"
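Evaluated before the coin flip, and assuming Omega's prediction is perfectly accurate, the disposition to pay on heads has the higher expected value; a minimal sketch:

```python
# Expected value, computed before the coin flip, of the two dispositions
# in Omega's dilemma above. Omega's prediction is assumed perfect.

def expected_value(pays_on_heads: bool) -> float:
    heads = -1_000 if pays_on_heads else 0
    # On tails, Omega pays out iff it predicted you would pay on heads.
    tails = 1_000_000 if pays_on_heads else 0
    return 0.5 * heads + 0.5 * tails

print(expected_value(True))   # 499500.0
print(expected_value(False))  # 0.0
```

The tension, of course, is that by the time Omega actually asks, the coin has already come up heads, and handing over the $1000 can no longer cause anything good.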
Collective Apathy and the Internet
Previously in series: Beware of Other-Optimizing
Followup to: Bystander Apathy
Yesterday I covered the bystander effect, aka bystander apathy: given a fixed problem situation, a group of bystanders is actually less likely to act than a single bystander. The standard explanation for this result is in terms of pluralistic ignorance (if it's not clear whether the situation is an emergency, each person tries to look calm while darting their eyes at the other bystanders, and sees other people looking calm) and diffusion of responsibility (everyone hopes that someone else will be first to act; being part of a crowd diminishes the individual pressure to the point where no one acts).
Which may be a symptom of our hunter-gatherer coordination mechanisms being defeated by modern conditions. You didn't usually form task-forces with strangers back in the ancestral environment; it was mostly people you knew. And in fact, when all the subjects know each other, the bystander effect diminishes.
So I know this is an amazing and revolutionary observation, and I hope that I don't kill any readers outright from shock by saying this: but people seem to have a hard time reacting constructively to problems encountered over the Internet.
Perhaps because our innate coordination instincts are not tuned for:
- Being part of a group of strangers. (When all subjects know each other, the bystander effect diminishes.)
- Being part of a group of unknown size, of strangers of unknown identity.
- Not being in physical contact (or visual contact); not being able to exchange meaningful glances.
- Not communicating in real time.
- Not being much beholden to each other for other forms of help; not being codependent on the group you're in.
- Being shielded from reputational damage, or the fear of reputational damage, by your own apparent anonymity; no one is visibly looking at you, before whom your reputation might suffer from inaction.
- Being part of a large collective of other inactives; no one will single out you to blame.
- Not hearing a voiced plea for help.
Newcomb's Problem standard positions
Marion Ledwig's dissertation summarizes much of the existing thinking that's gone into Newcomb's Problem.
(For the record, I myself am neither an evidential decision theorist, nor a causal decision theorist in the current sense. My view is not easily summarized, but it is reflectively consistent without need of precommitment or similar dodges; my agents see no need to modify their own source code or invoke abnormal decision procedures on Newcomblike problems.)