Counterfactual self-defense

MrMind


23rd Nov 2012


Let's imagine the following dialogues between Omega and an agent implementing TDT. The usual assumptions about Omega apply: the agent knows Omega is real, trustworthy and reliable, Omega knows that the agent knows that, the agent knows that Omega knows that the agent knows, and so on (that is, Omega's trustworthiness is common knowledge, à la Aumann).

Dialogue 1.

Omega: "Would you accept a bet where I pay you 1000$ if a fair coin flip comes out tails and you pay me 100$ if it comes out heads?"
TDT: "Sure I would."
Omega: "I flipped the coin. It came out heads."
TDT: "Doh! Here's your 100$."

I hope there's no controversy here.

Dialogue 2.

Omega: "I flipped a fair coin and it came out heads."
TDT: "Yes...?"
Omega: "Would you accept a bet where I pay you 1000$ if the coin flip came out tails and you pay me 100$ if it came out heads?"
TDT: "No way!"

I hope no controversy arises here either: if the agent answered yes, there would be no reason for it not to accept all kinds of losing bets conditioned on information it already knows.

The two bets are identical, but the information is presented in a different order: in the second dialogue, the agent has time to update its knowledge about the world, and it should not accept bets that it already knows are losing.
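The difference between the two dialogues can be put in numbers. This is only a minimal sketch, assuming a risk-neutral agent that values money linearly; the helper name `expected_value` is made up for illustration.

```python
from fractions import Fraction

def expected_value(p_tails, win=1000, loss=100):
    """Expected payoff: receive `win` on tails, pay `loss` on heads."""
    p_tails = Fraction(p_tails)
    return p_tails * win - (1 - p_tails) * loss

# Dialogue 1: the agent decides before learning the outcome of a fair coin.
ev_dialogue_1 = expected_value(Fraction(1, 2))

# Dialogue 2: Omega has already announced heads, so P(tails) = 0.
ev_dialogue_2 = expected_value(0)

print(ev_dialogue_1)  # 450
print(ev_dialogue_2)  # -100
```

Same stakes, different probability assigned to tails at the moment of deciding: that is all that separates "Sure I would" from "No way!".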

But then...

Dialogue 3.

Omega: "I flipped a coin and it came out heads. I offer you a bet where I pay you 1000$ if the coin flip comes out tails, but only if you agree to pay me 100$ if the coin flip comes out heads."
TDT: "...?"

In the original counterfactual mugging discussion, the answer of the TDT-implementing agent apparently should have been yes, but I'm not entirely clear on what the difference is between the second and the third case.

Thinking about it, the case seems muddled because the outcome and the bet are presented at the same time. On one hand, it appears correct that an agent should act exactly as it would have if it had pre-committed; on the other hand, an agent should not ignore any information it is presented with (that is a basic requirement of treating probability as extended logic).

So here's a principle I would like to call 'counterfactual self-defense': whenever information and bets are presented to the agent at the same time, it always first conditions its priors and only then examines whatever bets have been offered. This should prevent Omega from offering counterfactual losing bets, but not counterfactual winning ones.
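As a rough illustration, the rule could be sketched like this. Everything here is an assumption made for the example: the function names are invented, the agent is risk-neutral, and Omega's announcement is taken as certain. It is not a TDT implementation, only the "condition first, then evaluate" ordering.

```python
from fractions import Fraction

def condition_on_announcement(prior_tails, announced=None):
    """Update P(tails) on a fully trusted announcement of the outcome."""
    if announced == "heads":
        return Fraction(0)
    if announced == "tails":
        return Fraction(1)
    return Fraction(prior_tails)  # nothing announced: keep the prior

def self_defense_accepts(prior_tails, announced=None, win=1000, loss=100):
    """Step 1: condition on the announcement. Step 2: evaluate the bet."""
    p_tails = condition_on_announcement(prior_tails, announced)
    expected = p_tails * win - (1 - p_tails) * loss
    return expected > 0

fair = Fraction(1, 2)
print(self_defense_accepts(fair))           # True  (outcome unknown)
print(self_defense_accepts(fair, "heads"))  # False (losing bet refused)
print(self_defense_accepts(fair, "tails"))  # True  (winning bet accepted)
```

Note that this sketch evaluates each offer in isolation; it says nothing about what Omega would have done had the coin landed differently.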

Would this principle make an agent win more?

9 comments

[-]ArisKatsaris11y150

Dialogue 2 and dialogue 3 as you phrased them are equivalent, but they both omit a significant aspect of the original discussion -- that Omega promises that if the coin had come up tails, it would have offered the same bet (which would now have been a winning one for you).

Your scenarios, as stated, leave it unclear whether the bet is offered because it came up heads. So the possibility is left open that Omega only offers bets when it knows that the coin came up heads.

Taking the scenario where we know that Omega would have offered the bet regardless of what the coin-toss was: that effectively means that, in this type of decision, statistically speaking, agents are favoured who exhibit some sort of "timelessness" in their decision theory, that is, agents who do not update in the sense that a CDT agent would update.

So to have that "winning decision theory" (which is winning overall, not in individual cases), we must be agents who do not update in this manner.

The problem people tend to have with this is that they seem to assume a winning decision theory to be one which maximizes the expected utility of each individual decision, as if the decisions were logically independent of each other, but in reality we want a decision theory that maximizes the expected summed utility over the whole lifetime of the decision theory.
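A small sketch of the two ways of scoring the same choice may help. It assumes the standard counterfactual mugging setup, in which Omega offers the bet regardless of the coin and pays on tails only to agents that would pay on heads; the function names and constants are illustrative.

```python
from fractions import Fraction

WIN, LOSS = 1000, 100
P_TAILS = Fraction(1, 2)

def value_after_updating_on_heads(pay):
    """Value of the single decision, judged after conditioning on heads."""
    return -LOSS if pay else 0

def value_of_policy(pay_on_heads):
    """Value of the whole policy, judged before the coin is flipped.

    Assumption: Omega pays on tails only to agents that would pay on heads.
    """
    if not pay_on_heads:
        return Fraction(0)
    return P_TAILS * WIN - (1 - P_TAILS) * LOSS

# Judged decision by decision, refusing looks better (0 > -100).
print(value_after_updating_on_heads(True), value_after_updating_on_heads(False))

# Judged as a policy, paying looks better (450 > 0).
print(value_of_policy(True), value_of_policy(False))
```

Under these assumptions the two scoring rules rank paying and refusing in opposite orders, which is the disagreement described above.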

[-]MrMind11y00

Dialogue 2 and dialogue 3 as you phrased them are equivalent, but they both omit a significant aspect of the original discussion -- that Omega promises that if the coin had come up tails, it would have offered the same bet (which would now have been a winning one for you).

Mmm... I implied it by saying in the first paragraph that Omega is reliable and that's common knowledge, but it's true that the wording could have been much clearer. I wonder if an edit would do more harm than good.

Your scenarios, as stated, leave it unclear whether the bet is offered because it came up heads. So the possibility is left open that Omega only offers bets when it knows that the coin came up heads.

That's on purpose, and is exactly the information that the agent doesn't have in one-shot counterfactual mugging.

Taking the scenario where we know that Omega would have offered the bet regardless of what the coin-toss was: that effectively means that, in this type of decision, statistically speaking, agents are favoured who exhibit some sort of "timelessness" in their decision theory, that is, agents who do not update in the sense that a CDT agent would update.

So to have that "winning decision theory" (which is winning overall, not in individual cases), we must be agents who do not update in this manner.

That is all well established, I think. Yet:

The problem people tend to have with this is that they seem to assume a winning decision theory to be one which maximizes the expected utility of each individual decision, as if the decisions were logically independent of each other, but in reality we want a decision theory that maximizes the expected summed utility over the whole lifetime of the decision theory.

A DT that accepts only winning counterfactual bets, but behaves like a TDT agent on every other occasion, would be even better, though. My problem is understanding whether the proposed rule is consistent with this desideratum, or whether it would turn a TDT agent back into a CDT agent.

[-]VincentYu11y70

Dialogue 2 and dialogue 3 as you phrased them are equivalent, but they both omit a significant aspect of the original discussion -- that Omega promises that if the coin had come up tails, it would have offered the same bet (which would now have been a winning one for you).

Mmm... I implied it by saying in the first paragraph that Omega is reliable and that's common knowledge, but it's true that the wording could have been much clearer. I wonder if an edit would do more harm than good.

Common knowledge of Omega's reliability is not sufficient. ArisKatsaris is pointing out that in your current post, Omega is allowed to condition its offer on the outcome of the coin flip. The original discussion on counterfactual mugging specifies that this is not allowed—Omega's offer is independent of the coin flip.

Your scenarios, as stated, leave it unclear whether the bet is offered because it came up heads. So the possibility is left open that Omega only offers bets when it knows that the coin came up heads.

That's on purpose, and is exactly the information that the agent doesn't have in one-shot counterfactual mugging.

You do have the information. In counterfactual mugging, Omega tells you truthfully (with its trustworthiness being common knowledge) that it would have given the same offer if the coin had landed differently.

[-]MrMind11y00

Common knowledge of Omega's reliability is not sufficient. ArisKatsaris is pointing out that in your current post, Omega is allowed to condition its offer on the outcome of the coin flip. The original discussion on counterfactual mugging specifies that this is not allowed—Omega's offer is independent of the coin flip.

You do have the information. In counterfactual mugging, Omega tells you truthfully (with its trustworthiness being common knowledge) that it would have given the same offer if the coin had landed differently.

I'm starting to think that there's a deeper-than-I-thought point about the extraction of this information from the way I structured the dialogue. If all that Omega offers is a series of bets, and the TDT agent has no information about what Omega would have done if the coin toss had come out differently, or even about whether it will see Omega again, then it's not clear to me what a TDT agent should do.

[-]ArisKatsaris11y10

If all that Omega offers is a series of bets, and the TDT agent has no information about what Omega would have done if the coin toss had come out differently, or even about whether it will see Omega again, then it's not clear to me what a TDT agent should do.

Indeed. I think that's why it's sometimes better to imagine "Omega" as some sort of stable physical process whose complete functionality we know, instead of as an agent with mysterious motivations.

[-]ArisKatsaris11y20

Omega: "Would you accept a bet where I pay you 1000$ if a fair coin flip comes out tails and you pay me 100$ if it comes out heads?"
TDT: "Sure I would."
Omega: "I flipped the coin. It came out heads."
TDT: "Doh! Here's your 100$."

As a sidenote, I note that a CDT agent would be like the following:

Omega: "Would you accept a bet where I pay you 1000$ if a fair coin flip comes out tails and you pay me 100$ if it comes out heads?"
CDT: "Sure I would."
Omega: "I flipped the coin. It came out heads. Now give me the 100$."
CDT: "No way!"

And knowing this, Omega would never bet with the CDT agent, unless they had a way to precommit to give the money even though they now know they have already lost, which brings them effectively close to being TDT agents... :-)

[-]A1987dM11y00

As a sidenote, I note that a CDT agent would be like the following:

Only if they knew Omega wouldn't retaliate.

[-]MrMind11y00

And knowing this, Omega would never bet with the CDT agent, unless they had a way to precommit to give the money even though they now know they have already lost, which brings them effectively close to being TDT agents... :-)

That's begging the question about Omega's motivations a little too much: if the world is a series of unrelated bets, then an agent that doesn't pay does strictly better than an agent that pays when losing, so a good DT would want the agent to do that. But when trustworthiness (which is essentially the degree of timelessness) is an issue, for example in cooperation scenarios, or when Omega values it (Newcomb-like problems), or when it's a precondition for receiving utility (Parfit's hitchhiker), then TDT outperforms CDT, as it should.
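The two regimes can be compared side by side in a toy model. The flag `omega_requires_trust` is a hypothetical stand-in for "Omega only bets with agents it expects to pay"; the stakes are the ones from the dialogues, and the agent is assumed risk-neutral.

```python
from fractions import Fraction

def expected_payoff(pays_when_losing, omega_requires_trust,
                    p_tails=Fraction(1, 2), win=1000, loss=100):
    """Expected payoff per bet for an agent with a fixed paying disposition."""
    if omega_requires_trust and not pays_when_losing:
        return Fraction(0)  # Omega never offers the bet to this agent
    paid_on_heads = loss if pays_when_losing else 0
    return p_tails * win - (1 - p_tails) * paid_on_heads

# Unrelated bets, no screening: the non-payer comes out ahead.
print(expected_payoff(True, False), expected_payoff(False, False))  # 450 500

# Trustworthiness is a precondition: the payer comes out ahead.
print(expected_payoff(True, True), expected_payoff(False, True))    # 450 0
```

Which disposition wins depends entirely on whether the offer itself is conditioned on the agent's disposition.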

[-][anonymous]11y-30

OP: before making the post, have you considered that due to the very nature of TDT, "failed" inquiries into this topic can leave you not just with knowledge useless for self-defense, but with information that's an active liability and can become a full-fledged backdoor in your mind? More importantly, that public inquiries into this topic can "infect" unwilling readers, like in The Infamous Hushed-Up Incident?

Not that any of the examples you posted appear to open into dangerous avenues to this layman, but still...

[This comment is no longer endorsed by its author]
