Let us say you are a doctor, and you are dealing with a malaria epidemic in your village. You are faced with two problems. First, you have no access to the drugs needed for treatment. Second, you are one of two doctors in the village, and the two of you cannot agree on the nature of the disease itself. You, having carefully tested many patients, being a highly skilled, well-educated diagnostician, have proven to yourself that the disease in question is malaria. Of this you are >99% certain. Yet your colleague, the blinkered fool, insists that you are dealing with an outbreak of bird flu, and to this he assigns >99% certainty.

Well, it need hardly be said that someone here is failing at rationality. Rational agents do not have common knowledge of disagreements etc. But... what can we say? We're human, and it happens.

So, let's say that one day, Omega (let's call him Dr. House) calls you both into his office and tells you that he knows, with certainty, which disease is afflicting the villagers. As confident as you both are in your own diagnoses, you are even more confident in House's abilities. House, however, will not tell you his diagnosis until you've played a game with him. He's going to put you in one room and your colleague in another. He's going to offer you a choice between 5,000 units of malaria medication and 10,000 units of bird-flu medication. At the same time, he's going to offer your colleague a choice between 5,000 units of bird-flu meds and 10,000 units of malaria meds.

(Let us assume that keeping a malaria patient alive and healthy takes the same number of units of malaria meds as keeping a bird flu patient alive and healthy takes bird flu meds).

You know the disease in question is malaria. The bird-flu drugs are literally worthless to you, and the malaria drugs will save lives. You might worry that your colleague would be upset with you for making this decision, but you also know House is going to tell him that it was actually malaria before he sees you. Far from being angry, he'll embrace you, and thank you for doing the right thing, despite his blindness.

So you take the 5,000 units of malaria medication, your colleague takes the 5,000 units of bird-flu meds (reasoning in precisely the same way), and you have 5,000 units of useful drugs with which to fight the outbreak.

Had you each taken that which you supposed to be worthless, you'd be guaranteed 10,000 units. I don't think you can claim to have acted rationally.

Now obviously you should be able to do even better than that. You should be able to take one another's estimates into account, share evidence, revise your estimates, reach a probability you both agree on, and, if the odds exceed 2:1 in one direction or the other, jointly take 15,000 units of whatever you expect to be effective, and otherwise get 10,000 units of each. I'm not giving out any excuses for failing to take this path.
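(For concreteness, here is a rough sketch of the arithmetic behind that 2:1 threshold. The code, the function name, and the sample probabilities are mine, not part of the scenario; it only assumes that doses of the wrong drug are worthless.)

```python
# Rough sketch: p is the probability, agreed on by both doctors, that the
# disease is malaria; only doses of the correct drug count for anything.

def expected_useful_doses(p, plan):
    if plan == "both take malaria meds":   # 5,000 + 10,000 doses of malaria meds
        return p * 15_000
    if plan == "both take bird-flu meds":  # 10,000 + 5,000 doses of bird-flu meds
        return (1 - p) * 15_000
    if plan == "hedge":                    # each takes the 10,000-dose package: 10,000 of each drug
        return 10_000

plans = ["both take malaria meds", "both take bird-flu meds", "hedge"]

# 15,000 * p beats 10,000 exactly when p > 2/3, i.e. when the odds pass 2:1.
for p in (0.5, 0.6, 0.7, 0.99):
    best = max(plans, key=lambda plan: expected_useful_doses(p, plan))
    print(f"p(malaria) = {p:.2f}: {best}")
```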

But still, both choosing the 5,000 units strictly loses. If you can agree on nothing else, you should at least agree that cooperating is better than defecting.

Thus I propose that the epistemic prisoner's dilemma, though it has unique features (the agents differ epistemically, not preferentially), should be treated by rational agents (or by agents boundedly rational enough to still have persistent disagreements) in the same way as the vanilla prisoner's dilemma. What say you?


Another example:

Pretend we have two people, a Republican and a Democrat, who can each donate to three charities: the Republican Party, the Democratic Party, and a non-political charity.

Both people's utility is increasing in the amount of resources that their party and the non-political charity gets. And, as you would expect, the Republican is made worse off the more the Democratic party gets and the Democrat is made worse off the more the Republican party gets.

The two people would benefit from an agreement in which they each agreed to give a higher percentage of their charitable dollars to the non-political charity than they would have absent this agreement.

To the best of my knowledge, such agreements are never made.

But similar agreements are made in politics. A close (but not isomorphic) example would be vote-trading and especially vote-pairing.

Perhaps it's just that no one has suggested it before or gone through with it. Dozens of people have suggested prediction registries or prediction markets, for example, but the number of working & large examples can be counted on one hand.

Yes, this should be treated the same as PD. What are you driving at?

To my mind there's not much point to PD except this: if someone else carelessly controls a variable you care about, and the problem has been carefully construed to prohibit you from manipulating their decision... well, it sucks to be you.

It doesn't matter how the preference order of the players came to be as it is. Cooperation is a Pareto improvement according to that preference order, and in this case the only one available, so go for it.

Seems pretty similar to the "true prisoner's dilemma".

If both of you can self-bind (i.e. self-modify to act in a specific way in one specific future situation) and prove you've done so, you can make a mutual agreement to each take the 10k units.

If you can provably self-commit but the other doctor can't, make a one-sided commitment along the following lines: "I'll take the 10k units, and afterwards will trade them to you for 10k units of the other medicine. If you don't want to trade for the medicine I got, I will destroy it completely, even if by that point I know that this will condemn 10k people to die horribly."

Obviously, if the other doctor can provably self-bind and you can't, suggest that they commit along the same lines.

If one of you is willing to self-bind on this and can prove what code their mind is implemented with, that should (given arbitrarily large computing power) be enough to prove that they have done so.

Alternatively, if you like to cheat: Take the more expensive package of medicine, and if it turns out you took the set for the wrong disease, trade it for a larger amount of the other medicine on the market.


Well, it need hardly be said that someone here is failing at rationality. Rational agents do not have common knowledge of disagreements etc.

That's not strictly true. They could both be completely rational but each assume the other is irrational or dishonest.

Assuming the 99% likelihood I assign to the disease being malaria doesn't change, if I can't communicate with my colleague I obviously take the 5,000 units of malaria meds. If I can communicate, I'll do my best to convince my colleague to cooperate so he takes the 10,000 units of malaria meds, and then I take the other 5,000 units of malaria meds.

Either I save 5,000 people or I save 15,000 people with 99% likelihood (instead of saving 0 or 10,000 people), which is similar to avoiding 5 years or 10 years in prison (instead of avoiding 0 years or 9.5 years). So yeah, it is similar to the prisoner's dilemma.

It seems like you assume implicitly that there's an equal probability of the other doctor defecting: (0 + 10,000)/2 < (5,000 + 15,000)/2. That makes sense in the original prisoner's dilemma, but given that you can communicate, why assume this?

It doesn't make a difference. I'm better off defecting no matter what the other doctor does. Like I said, I'll try to convince him to cooperate and then I'll break our agreement. If I succeed, good for me; if I fail, at least I'll have saved 5000 people.

That's only if there's a single iteration of this dilemma, of course. If I have reason to believe there will be three iterations and if I'm pretty sure I managed to convince the other doctor, I should cooperate (10,000 * 3 > 15,000 + 5,000 + 5,000).

What if you're wrong?

What if I'm wrong? Well, what if my house gets hit by a meteor today, and I get seriously wounded? Should I then regret not having left my house today?

I could wish I had left, but regretting my decision would be silly. We can only ever make decisions with the information that's available to us at the moment. Right now I have every reason to believe my house will not get hit by a meteor, and I feel like staying at home, so that's the best decision. Likewise, in the OP's scenario I have every reason to believe the disease is malaria, so getting my hands on as much malaria medication as I can is the best decision. That's all there is to it.

But in this case, someone with a degree of astronomical knowledge comparable to yours, acting in good faith, has come up to you and has said "I'm 99% confident that a meteor will hit your house today. You should leave." Why not investigate his claim before dismissing it?

The original post specifies that even taking account of the other doctor's opinion, we're still 99% sure. This seems pretty unlikely, unless we know that the other doctor is really very rationally deficient, but it's the scenario we're discussing.

Out of curiosity, do you cooperate or defect against an unfriendly superintelligence in the regular prisoner's dilemma?

I'm one of the human beings that Eliezer has so much trouble imagining: While I'm not (entirely) selfish myself, I have no trouble acting as if I were completely selfish for the purpose of playing in the vanilla prisoner's dilemma. Consequently, it's of no relevance to me that the other agent is an unfriendly superintelligence, rather than a friendly human being. I defect in both cases.

well, thanks for the heads-up =)

How many people are there in the village? If there are fewer than 5,000, it makes no difference how many doses of the correct drug we have, so long as we have at least 5,000. Otherwise...

(And if there are between 5,000 and 10,000 people, the two doctors can just agree to each take the 10,000 doses of the other drug, so that whichever the true disease is, we have 10,000 doses of the drug for it. This only fully resembles the Prisoner's Dilemma if there are more than 10,000 people.)

I need a clarification. In the PD with the superintelligence, you are not in a position to negotiate because values cannot be negotiated, right? However, epistemology can be negotiated, right? You can point out the various symptoms that make it sure that it is malaria. He can point out the various symptoms that make it sure it is bird flu. You can agree to tests and counter-tests.

There are no tests for wants and desires. But we can agree about ways to find out, can't we?

Absolutely. Unfortunately, even when everyone shares all the information they can, some disagreements persist long after they should.

Building the typical payoff matrix:

C,C - 10k, 10k (after trade)
C,D - 0k, 15k (assuming I give him what I consider useless)
D,C - 15k, 0k (and vice versa)
D,D - 5k, 5k

I am not tracking how this is any different from the standard model, except that you can communicate beforehand. Does it have to do with the >99% certainty of value?

(Note) Thinking and typing here, so no guarantee of value. If someone sees a misstep though, please point it out. I am still learning this.

So... that would be >99% certainty of 15k, 15k on cooperation, <1% certainty of 15k, 15k. Actually, no, that would be 15k, 0k if I was right and 0k, 15k if he was right. So ignore that; here is a new matrix.

C,C - >99% 10k, 0k; <1% 0k, 10k
C,D - >99% 0k, 0k; <1% 0k, 15k
D,C - >99% 15k, 0k; <1% 0k, 0k
D,D - >99% 5k, 0k; <1% 0k, 5k

I suppose, since we are assuming trading, it would make it simpler to just consider the rewards as pooled afterwards. It really makes no difference which doctor ends up with the medicine.

C,C - 100% 10k
C,D - >99% 0k; <1% 15k
D,C - >99% 15k; <1% 0k
D,D - >99% 5k; <1% 5k

Taking out the failed diagnoses:

C,C - 100% 10k
C,D - >99% 0k
D,C - >99% 15k
D,D - 100% 5k

Yeah, that is not a prisoner's dilemma. Especially if you can convince the other doctor to cooperate. Assuming probability P for the other doctor cooperating (this is the part where I may need help):

C: P 10k + (1 - P) 0k
D: P 99% 15k + (1 - P) 5k

And that is as far as I can go. I assume that there is some way to map which choice is better for which values of P.
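One rough way to map it is sketched below. It leans on the "failed diagnoses removed" payoffs above (C,C = 10k, C,D = 0k, D,C = 99% of 15k, D,D = 5k), so treat the exact numbers, and the function name, as my assumptions rather than anything given in the problem:

```python
# Expected pooled payoff in thousands of useful doses, assuming the
# "failed diagnoses removed" matrix above:
#   C,C -> 10,  C,D -> 0,  D,C -> 99% of 15,  D,D -> 5.

def expected_value(my_choice, p_other_cooperates):
    p = p_other_cooperates
    if my_choice == "C":
        return p * 10 + (1 - p) * 0
    return p * 0.99 * 15 + (1 - p) * 5   # my_choice == "D"

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    better = "C" if expected_value("C", p) > expected_value("D", p) else "D"
    print(f"P = {p:.2f}: {better} looks better")

# With these numbers D wins for every P: 0.99 * 15 > 10 when the other
# doctor cooperates, and 5 > 0 when he defects.
```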

When I first read down this, I decided to assume it's equally likely that he's right and you're the fool, thus assigning ~50% probability either way. Taking the 10,000 bird flu meds became a no-brainer.

Another way to look at it is that you consider the malaria meds valuable and the bird flu meds worthless, and vice versa for him. By Ricardo's law, it's thus best to take the bird flu medication, and trade with the other doctor before seeing Dr. House, and you can pretend as if you were simply offered 5,000 malaria meds versus 10,000 malaria meds.

This does appear to meet the entire definition of the Prisoner's Dilemma, but, for some reason, I have much less trouble imagining myself cooperating than in most versions.

Yes. Given the set-up, it's a standard prisoners' dilemma (well, at least if you add the proviso that it's guaranteed to be one-shot). Iff you can ensure that the other doctor will co-operate iff you co-operate, then co-operate. If not, defect.

(Or, more accurately, a probabilistic version of this. E.g. assuming you are risk neutral, co-operate iff the other doctor has at least a 50% greater chance of co-operating if you co-operate than if you defect.)
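To spell out that threshold, here is a rough sketch of my own, using the pooled payoffs from the post (in thousands of doses) and ignoring the sub-1% chance that the malaria diagnosis is wrong; the function names are just placeholders:

```python
# Pooled payoffs in thousands of useful doses, from your >99%-malaria
# viewpoint (sub-1% chance of being wrong ignored):
#   both cooperate -> 10, you C / he D -> 0, you D / he C -> 15, both D -> 5.

def ev_if_cooperate(q_c):  # q_c = P(other cooperates | you cooperate)
    return q_c * 10 + (1 - q_c) * 0

def ev_if_defect(q_d):     # q_d = P(other cooperates | you defect)
    return q_d * 15 + (1 - q_d) * 5

# Cooperation wins iff 10*q_c > 10*q_d + 5, i.e. q_c - q_d > 0.5: the other
# doctor must be at least 50 percentage points more likely to cooperate
# if you do than if you don't.
print(ev_if_cooperate(0.9) > ev_if_defect(0.3))  # True  (gap of 0.6)
print(ev_if_cooperate(0.7) > ev_if_defect(0.3))  # False (gap of 0.4)
```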

I would condition my response on how trustworthy my colleague and I had been on previous diagnoses (and how consistent our track records were with our stated confidence). If we had both been equally good, I should trust his judgement as much as mine, in which case I would pick 10,000 of the wrong medicine and trust him to follow the same reasoning.

I think you have to take the 5k. The only way it doesn't leave everyone better off and save lives is if you don't actually believe your prior of >99%, in which case update your prior. I don't see how what he does in another room matters. Any reputational effects are overwhelmed by the ability to save thousands of lives.

However, I also don't see how you can cooperate in a true one-time prisoner's dilemma without some form of cheating. The true PD presumes that I don't care at all about the other side of the matrix, so assuming there isn't some hidden reason to prefer co-operation - there are no reputational effects personally or generally, no one can read or reconstruct my mind, etc - why not just cover the other side's payoffs? The payoff looks a lot like this: C -> X, D -> X+1, where X is unknown.

Also, as a fun side observation, this sounds suspiciously like a test designed to figure out which of us actually thinks we're >99% once we take into account the other opinion and which of us is only >99% before we take into account the other opinion. Dr. House might be thinking that if we order 15k of either medicine that one is right often enough that his work here is done. I'd have to assign that p>0.01, as it's actually a less evil option than taking him at face value. But I'm presuming that's not the case and we can trust the scenario as written.

I think you have to take the 5k. The only way it doesn't leave everyone better off and save lives is if you don't actually believe your prior of >99%

I'm not sure how much work and what kind of work the following color does:

You, having carefully tested many patients, being a highly skilled, well-educated diagnostician [is 99% confident one way.] Yet your colleague, the blinkered fool, [is 99% certain the other way]. Well, it need hardly be said that someone here is failing at rationality. [...] You should be able to take one another's estimates into account, share evidence, revise your estimates, reach a probability you both agree on.

I'm not sure what the assumptions are, here. If I have been using Testing of Patients while my colleague has been practicing Blinkered Folly, by which I mean forming truth-uncorrelated beliefs, taking his estimate into account shouldn't change my beliefs since they're not truth-correlated. He has no useful evidence, and he is impervious to mine.

But let's say we're playing a game of prisoner's dilemma with payouts in two currencies. I can add either 5k malarian dollars or 10k birdfluian dollars to a pot which will be evenly shared by both (since we share the value of healing the sick), while my colleague has the reverse choice. The expected utilities if I'm right and he's wrong are 0.99*5k + 0.01*0k = 4,950 and 0.01*10k + 0.99*0k = 100. I maximize by defecting. By a similar calculation, my colleague maximizes by cooperating but is too foolishly blinkered to see this.
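For what it's worth, a small check of those numbers (a sketch of my own; it assumes only the 99% confidence and the shared pot, and the helper name my_ev is made up):

```python
# Expected value, by my lights, of each possible contribution to the pot
# (measured in doses of the drug I expect to be useful), assuming I am
# 99% sure the disease is malaria.
P_MALARIA = 0.99

def my_ev(doses, drug):
    p_useful = P_MALARIA if drug == "malaria" else 1 - P_MALARIA
    return p_useful * doses

print(my_ev(5_000, "malaria"))    # my defection:     4950.0
print(my_ev(10_000, "bird flu"))  # my cooperation:    100.0
print(my_ev(10_000, "malaria"))   # his cooperation:  9900.0
print(my_ev(5_000, "bird flu"))   # his defection:      50.0
# By my lights I maximize by defecting, while he (were he not blinkered)
# would maximize by cooperating, which is where the commitments below come in.
```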

I guess your point is that if we have commitments available, I can do better than my 5k and his 0k malarian dollars by humoring his delusions and agreeing on $10,000m + $10,000b, which he wants because $10,000b is "better" than the 5k he could get by defecting.

So: sometimes you can act real stupid in ways that bribe crazy people to act even less stupid, thereby increasing social utility compared to the alternative. Or, said another way, people act "rationally" from the perspective of their wrong beliefs: the payoff matrix in terms of malarian dollars reflects the utility of the patients and the rational doctor, whereas the payoff matrix in terms of birdfluian dollars reflects the utility of the blinkered doctor, where "utility" means "behavior-predicting abstraction" in the case of the blinkered doctor.

If it is common knowledge between the two doctors that both practice Epistemic Rationality and Intellectual Virtue but collect disjoint bodies of evidence (I think this goes against the stated assumptions), one doctor's confidence is evidence of their conclusion to the other doctor, and they should both snap to 50% confidence once they learn about the other doctor's (previous) certainty. In that scenario it's a straightforward game of prisoner's dilemma, the exact same as before except with half the payoffs.

The closest I have come to studying decision theory for irrational agents is reinforcement learning in Markov Decision Processes. Maybe you can argue in favor of epsilon-greedy exploration, on the argument that getting the medicine will not be the last thing the doctors will experience, but then we're veering into a discussion about how to make and select maps rather than a discussion of how to select a route given the map the article author has drawn.

Also, as a fun side observation, this sounds suspiciously like a test designed to figure out which of us actually thinks we're >99% once we take into account the other opinion and which of us is only >99% before we take into account the other opinion. Dr. House might be thinking that if we order 15k of either medicine that one is right often enough that his work here is done. I'd have to assign that p>0.01, as it's actually a less evil option than taking him at face value. But I'm presuming that's not the case and we can trust the scenario as written.

I wasn't going there, but I like the thought =)

I am having some difficulty imagining that I am 99% sure of something, yet can neither convince a person to outright agree with me, nor get him to accept that he is uncertain and should therefore make the choice that would help more if it is right, but could nonetheless convince that same person to cooperate in the prisoner's dilemma. However, if I did find myself in that situation, I would cooperate.

I'm tipping my hand here, but...

Do you think you could convince a young-earth creationist to cooperate in the prisoner's dilemma?

Good point. I probably could. I expect that the young-earth creationist has a huge bias that does not have to interfere with reasoning about the prisoner's dilemma.

So, suppose Omega finds a young-earth creationist and an atheist, and plays the following game with them. They will each be taken to a separate room, where the atheist will choose between each of them receiving $10,000 if the earth is less than 1 million years old or each receiving $5,000 if the earth is more than 1 million years old, and the young-earth creationist will have a similar choice with the payoffs reversed. Now, with the prisoner's dilemma tied to the young-earth creationist's bias, would I, in the role of the atheist, still be able to convince him to cooperate? I don't know. I am not sure how much the need to believe that the earth is around 5,000 years old would interfere with recognizing that it is in his interest to choose the payoff for the earth being over a million years old. But still, if he seemed able to accept it, I would cooperate.

(Edit: Fixed a typo reversing the payoffs.)

You've pretty much written tomorrow's post for me, though I was going to throw in some existential risk to make things more fun.

Now obviously you should be able to do even better than that. You should be able to take one another's estimates into account, share evidence, revise your estimates, reach a probability you both agree on

Hold on. I assume that, in accordance with a normal Prisoner's Dilemma, the two players aren't allowed to discuss stuff. So this is out.

Edit: And lo, MBlume added the following to his post:

I'm starting with them together, and free to speak briefly, for reasons which will be clear soon enough.

In that case, they can simply agree to take 10,000 of each. This is obvious.

Edit again: This seems inconsistent with my later view. Hmm.