# The Blackmail Equation

10 March 2010 02:46PM

This is Eliezer's model of blackmail in decision theory from the recent workshop at SIAI, filtered through my own understanding. Eliezer's help and advice were much appreciated; any errors herein are my own.

The mysterious stranger blackmailing the Countess of Rectitude over her extra-marital affair with Baron Chastity doesn't have to run a complicated algorithm. He simply has to credibly commit to the course of action:

"If you don't give me money, I will reveal your affair."

And then, generally, the Countess forks over the cash. Which means the blackmailer never does reveal the details of the affair, so the threat remains entirely counterfactual/hypothetical. Even if the blackmailer is Baron Chastity, and the revelation would be devastating for him as well, this makes no difference at all, as long as he can credibly commit to carrying out the threat (call it Z). In the world of perfect decision makers, there is no risk in doing so, because the Countess will hand over the money, so the Baron will never take the hit from the revelation.

Indeed, the baron could replace "I will reveal our affair" with Z="I will reveal our affair, then sell my children into slavery, kill my dogs, burn my palace, and donate my organs to medical science while boiling myself in burning tar" or even "I will reveal our affair, then turn on an unfriendly AI", and it would only matter if this changed his pre-commitment to Z. If the Baron can commit to counterfactually doing Z, then he never has to do Z (as the countess will pay him the hush money), so it doesn't matter how horrible the consequences of Z are to himself.

To get some numbers in this model, assume the countess can either pay up or not do so, and the baron can reveal the affair or keep silent. The payoff matrix could look something like this:

| (Baron, Countess) | Pay         | Not pay      |
|-------------------|-------------|--------------|
| Reveal            | (-90, -110) | (-100, -100) |
| Silent            | (10, -10)   | (0, 0)       |

Both the countess and the baron get -100 utility if the affair is revealed, while the countess transfers 10 of her utilitons to the baron if she pays up. Staying silent and not paying have no effect on the utility of either.
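The matrix can be encoded directly; a minimal sketch in Python (the action names and the dictionary layout are illustrative choices, not from the post):

```python
# (baron utility, countess utility), keyed by (baron's action, countess's action).
PAYOFF = {
    ("reveal", "pay"):     (-90, -110),
    ("reveal", "not_pay"): (-100, -100),
    ("silent", "pay"):     (10, -10),
    ("silent", "not_pay"): (0, 0),
}

# e.g. the baron stays silent and the countess pays:
assert PAYOFF[("silent", "pay")] == (10, -10)
```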

Let's see how we could implement the blackmailing if the baron and the countess were running simple decision algorithms. The baron has a variety of tactics he could implement. What is a tactic, for the baron? A tactic is a list of responses he could implement, depending on what the countess does. His four tactics are:

1. (Pay, NPay) → (Reveal, Silent)    "anti-blackmail": if she pays, tell all; if she doesn't, keep quiet
2. (Pay, NPay) → (Reveal, Reveal)    "blabbermouth": whatever she does, tell all
3. (Pay, NPay) → (Silent, Silent)    "not-a-word": whatever she does, keep quiet
4. (Pay, NPay) → (Silent, Reveal)    "blackmail": if she pays, keep quiet; if she doesn't, tell all
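Each tactic is just a map from the countess' move to the baron's response; a sketch (the labels are the post's, the string encoding is assumed):

```python
# The baron's four tactics: what he does if she pays / doesn't pay.
TACTICS = {
    "anti-blackmail": {"pay": "reveal", "not_pay": "silent"},
    "blabbermouth":   {"pay": "reveal", "not_pay": "reveal"},
    "not-a-word":     {"pay": "silent", "not_pay": "silent"},
    "blackmail":      {"pay": "silent", "not_pay": "reveal"},
}
```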

The countess, in contrast, has only two tactics: pay or don't pay. Each will try to estimate what the other will do, so the baron must model the countess, who must model the baron in turn. This seems to lead to infinite regress, but the baron has a short-cut: when reasoning counterfactually about which tactic to implement, he substitutes that tactic into his model of how the countess models him.

In simple terms, this means that when he muses 'what would happen if I were to anti-blackmail, hypothetically', he assumes that the countess would model him as an anti-blackmailer. In that case, the countess' decision is easy: her utility-maximising decision is not to pay, leaving them with a payoff of (0,0).

Similarly, if he counterfactually considers the blabbermouth tactic, then if the countess models him as such, her utility-maximising tactic is also not to pay up, giving a payoff of (-100,-100). Not-a-word results in a payoff of (0,0), and only if the baron implements the blackmail tactic will the countess pay up, giving a payoff of (10,-10). Since this maximises his utility, he will implement the blackmail tactic. And the countess will pay him, to minimise her utility loss.

Notice that in order for this to work, the baron needs four things:

1. The baron needs to make his choice of tactic known before the countess decides, so that she can react to it.
2. The baron needs to carry out his action after the countess acts, so that he can react to her action.
3. The baron needs to be able to precommit to a specific tactic (in this case, blackmail).
4. The baron needs the countess to find his precommitment plausible.

If we were to model the two players as timeless AIs implementing specific decision theories, what would these conditions become? They can be cast as:

1. The baron and the countess must exchange their source code.
2. The baron and the countess must both be rational.
3. The countess' available tactics are simply to pay or not to pay.
4. The baron's available tactics are conditional tactics, dependent on what the countess' decision is.
5. The baron must model the countess as seeing his decision as a fixed fact over which she has no influence.
6. The countess must indeed see the baron's decision as a fixed fact over which she has no influence.

The baron occupies what Eliezer termed a superior epistemic vantage.

Could two agents each occupy a superior epistemic vantage, as laid out above, over the other? This is precluded by the set-up above*: two agents cannot both be correct in assuming that the other treats their own decision as a fixed fact, while both run counterfactuals conditioning their response on the varying tactics of the other.

"I'll tell, if you don't send me the money, or try and stop me from blackmailing you!" versus "I'll never send you the money, if you blackmail me or tell anyone about us!"

Can the countess' brother, the Archduke of Respectability, blackmail the baron on her behalf? If the archduke is in a superior epistemic vantage to the baron, then there is no problem. He could choose a tactic that is dependent on the baron's choice of tactics, without starting an infinite loop, as the baron cannot do the same to him. The most plausible version would go:

"If you blackmail my sister, I will shoot you. If you blabbermouth, I will shoot you. Anti-blackmail and not-a-word are fine by me, though."

Note that Omega, in Newcomb's problem, occupies the superior epistemic vantage. His final tactic is the conditional Z="if you two-box, I put nothing in box A; if you one-box, I put in a million pounds," whereas you do not have access to tactics along the lines of "if Omega implements Z, I will two-box; if he doesn't, I will one-box". Instead, like the countess, you have to assume that Omega will indeed implement Z, accept this as fact, and then choose simply to one-box or two-box.
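Treating Omega's tactic Z as a fixed fact, your choice reduces to a simple maximisation; a sketch (the £1,000 always present in the second box is the standard formulation's amount, assumed here):

```python
# Omega's conditional tactic Z, accepted as a fixed fact:
def omega_tactic(your_choice):
    # Box A holds a million pounds iff you one-box.
    return 1_000_000 if your_choice == "one-box" else 0

def payoff(your_choice):
    box_a = omega_tactic(your_choice)  # opaque box
    box_b = 1_000                      # transparent box, always full (assumed amount)
    return box_a if your_choice == "one-box" else box_a + box_b

# Given Z is fixed, one-boxing maximises your payoff.
assert max(("one-box", "two-box"), key=payoff) == "one-box"
```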

*The argument, as presented here, is a lie, but spelling out the true version would be tedious and tricky. The countess, for instance, is perfectly free to indulge in counterfactual speculations that the baron may decide something else, as long as she and the baron are both aware that these speculations will never influence her decision. Similarly, the baron is free to model her doing so, as long as this similarly leads to no difference. The countess may have a dozen other options, not just the two presented here, as long as they both know she cannot make use of them. There is a whole issue of extracting information from an algorithm and its source code here, where you run into entertaining paradoxes such as: if the baron knows the countess will do something, then he will be accurate, and can check whether his knowledge is correct; but if he didn't know this fact, then it would be incorrect. These are beyond the scope of this post.

[EDIT] The impossibility of the countess and the baron being each in epistemic vantage over the other has been clarified, and replaces the original point - about infinite loops - which only implied that result for certain naive algorithms.

[EDIT] Godelian reasons make it impossible to bandy about "he is rational and believes X, hence X is true" with such wild abandon. I've removed the offending lines.

[EDIT] To clarify issues, here is a formal model of how the baron and countess could run their decision theories. Let X be a fact about the world, let S_B be the baron's source code, and let S_C be the countess' source code.

Baron(S_C):

Utility of pay = 10, utility of reveal = -100

Based on S_C, if the countess would accept the baron's behaviour as a fixed fact, run:

Let T={anti-blackmail, blabbermouth, not-a-word, blackmail}

For t_b in T, compute utility of the outcome implied by Countess(t_b,S_B). Choose the t_b that maximises it.

Countess(X, S_B):

If X implies the baron's tactic t_b, then accept t_b as fixed fact.

If not, run Baron(S_C) to compute the baron's tactic t_b. Stop as soon as the tactic is found. Accept as fixed fact.

Utility of pay = -10, utility of reveal = -100.

Let T={pay, not pay}

For t_c in T, under the assumption of t_b, compute utility of outcome. Choose t_c that maximises it.

Both these agents are rational with each other, in that they correctly compute each other's ultimate decisions in this situation. They are not perfectly rational (or rather, their programs are incomplete) in that they do not perform well against general agents, and may fall into infinite loops as written.
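The model above can be made runnable; a minimal sketch, with the simplification that source-code exchange is replaced by handing the baron's tactic to the countess directly as the "fixed fact" (so none of the quining or loop-detection issues are modelled):

```python
# (baron utility, countess utility), keyed by (baron's action, countess's action).
PAYOFF = {
    ("reveal", "pay"): (-90, -110), ("reveal", "not_pay"): (-100, -100),
    ("silent", "pay"): (10, -10),   ("silent", "not_pay"): (0, 0),
}

# The baron's four tactics: responses to the countess's move.
TACTICS = {
    "anti-blackmail": {"pay": "reveal", "not_pay": "silent"},
    "blabbermouth":   {"pay": "reveal", "not_pay": "reveal"},
    "not-a-word":     {"pay": "silent", "not_pay": "silent"},
    "blackmail":      {"pay": "silent", "not_pay": "reveal"},
}

def countess(t_b):
    """Accept the baron's tactic t_b as a fixed fact, then maximise own utility."""
    return max(("pay", "not_pay"),
               key=lambda t_c: PAYOFF[(t_b[t_c], t_c)][1])

def baron():
    """For each candidate tactic, substitute it into the model of the countess
    and keep the tactic whose resulting outcome maximises the baron's utility."""
    def utility(name):
        t_c = countess(TACTICS[name])
        return PAYOFF[(TACTICS[name][t_c], t_c)][0]
    return max(TACTICS, key=utility)

assert baron() == "blackmail"                    # utility 10 beats 0, 0, -100
assert countess(TACTICS["blackmail"]) == "pay"   # -10 beats -100
```

Substituting the candidate tactic into the model of the countess, rather than simulating her modelling him, is exactly the short-cut that lets the computation terminate.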

Comment author: 10 March 2010 04:52:17PM *  6 points [-]

What's to stop the Countess from having precommitted to never respond to blackmail?

Or to have precommitted to act as though she had precommitted to whichever course of action it seems, in retrospect, most beneficial to have precommitted to (including meta-precommitments, meta-meta-precommitments, meta^meta^meta-precommitments etc. up to the highest level she can model)?

Which would presumably include not being blackmailable to agents who would not try to blackmail if she absolutely committed to not be blackmailable, but being blackmailable to agents who would try blackmail even if she absolutely committed to not be blackmailable, except agents who would not have modified themselves into such agents were it not for such exceptions. Or in short: Being blackmailable only to irrationally blackmailing agents who were never deliberately modified into such by anyone.

Comment author: 10 March 2010 06:14:02PM 4 points [-]

Who precommits first wins. If the baron precommits to fulfil the threat unless he gets the money, a later precommitment by the countess is worthless, since she expects the baron to fulfil the threat anyway. Her precommitment makes sense only if she makes it, and the baron knows about it, before his threat is announced. Assuming that all precommitments are public, the countess' precommitment to never respond to threats and the baron's precommitment to reveal the secret are mutually exclusive. Hence, if the baron actually threatens the countess, we can be sure that she hasn't precommitted to never respond.

Comment author: 10 March 2010 06:40:22PM *  3 points [-]

There is no "first" in precommitting -- your source code precommits you to certain actions, and you can't influence your source code, only carry out what the code states. The notion of precommitting, as a modification, is bogus (not so for the signalling of being precommitted, or of being precommitted in the particular case). You could be precommitted to ignore certain signals of precommitment as well, and at some point signal such a precommitment. There seems to be no sense in distinguishing when the same signal of precommitment is made (but it should be about the same precommitment, not a conditional variant of the previous one).

Comment author: 10 March 2010 08:39:45PM 1 point [-]

There is no "first" in precommitting -- your source code precommits you to certain actions, and you can't influence your source code, only carry out what the code states. The notion of precommitting, as a modification, is bogus

You can influence your source code. You change the words and symbols in the text file, hit recompile, load the new binary into memory and execute it. If your code is such that it considers making such modifications a suitable response to a situation, then that is what you will do.

Comment author: 11 March 2010 09:21:40AM 1 point [-]

Common computer programs have a rather sharp boundary between their source code and their data. In brains (and hypothetical AIs) this distinction is (or would be) probably less explicit. Whenever the baron learns anything, his source code changes in some sense, involuntarily, without recompiling. Still, the original source code contains all the information. Precommitting, in order to have some importance, should mean learning about a particular output of your own source code, rather than recompiling.

Comment author: 12 March 2010 12:17:48AM 0 points [-]

The use of 'source code' here is merely a metaphor.

Comment author: 12 March 2010 08:30:37AM 0 points [-]

Metaphor standing for what exactly?

Comment author: 13 March 2010 04:01:28AM 0 points [-]

UTM tape, brain, clockwork mechanism... whatever.

Comment author: 10 March 2010 08:45:53PM 0 points [-]

Think functional program, or what was initially written on the tape of a UTM. We are interested in that particular fact, not what happened after.

Comment author: 10 March 2010 09:02:21PM *  1 point [-]

But I am interested in what happened after. If a tape operating on a UTM is programmed to operate a peripheral device to take the tape and modify it, then it is able to do that, and the original tape is no longer running; the new one is. For any given agent in the universe it is possible to alter its state such that it behaves differently. Agents that are not implemented within this universe may not be changed in this way, and those are the agents that I am not interested in.

Think functional program

Functional programs can operate machines that alter code to produce new, different functional programs.

The baron can alter his source code. Once he does so he is a different agent. How a countess responds to the baron's decision to modify his source code is a different question. If the countess is wise she will not pay in such a situation, the baron will know this, and he will choose not to modify his source code. But it is a choice; the universe permits it.

Comment author: 10 March 2010 09:19:12PM 1 point [-]

If the countess is wise she will not pay in such a situation, the baron will know this, and he will choose not to modify his source code. But it is a choice; the universe permits it.

Now this is a game of signalling -- to lie or not to lie, to trust or not to trust (or just how to interpret a given signal). The payoffs of the original game induce the payoff for this game of signalling the facts useful for efficiently playing the original game.

You don't need to talk about "modified source code" to discuss this data as signalling the original source code. (The original source code is interesting, because it describes the strategy.) The modified code is only interesting to the extent it signals the original code (which it probably doesn't).

(Incidentally, one can only change things in accordance with the laws of physics, and many-to-one mapping may not be an option, though reconstructing the past may be infeasible in practice.)

Comment author: 10 March 2010 09:30:25PM 1 point [-]

to lie or not to lie, to trust or not to trust

But it isn't a lie. It is the truth.

You don't need to talk about "modified source code" to discuss this data as signalling the original source code.

I don't want to signal the original source code.

Comment author: 10 March 2010 09:47:31PM 0 points [-]

I don't want to signal the original source code.

But I want to know it, so whatever you do, signals something about the original source code, possibly very little.

But it isn't a lie. It is the truth.

What's not a lie? (I'm confused.) I was just listing the possible moves in a new meta-game.

Comment author: 10 March 2010 07:01:17PM *  1 point [-]

Having precommitted first is equivalent to deterministically acting as if already precommitted in this instance; having precommitted too late is equivalent to only acting that way in future instances. I use "having precommitted" rather than "having source code such that..." because it's simpler, more intuitive, and more easily applicable to agents who don't have source code in the straightforward sense.

Comment author: 10 March 2010 07:11:06PM *  1 point [-]

When you say "precommitted", you mean "effectively signalled precommitment". When you say "can't precommit" (that is, can precommit only to certain other things), you mean "there is no way of effectively signalling this precommitment". Thus, you state that you can't signal that you'd uphold a counterfactual precommitment. But if it's possible to give your source code, you can.

(Or the game might have a notion of rational strategy, and so you won't need either source code or signalling of precommitment.)

Comment author: 10 March 2010 07:21:40PM *  4 points [-]

Please don't correct me on what I think. My use of precommitting has absolutely nothing to do with signaling. I first thought about these things (this explicitly) in the context of time travel, and you can't fool the universe with signaling, no matter how good your acting skills.

Comment author: 10 March 2010 08:53:35PM *  0 points [-]

I don't propose fooling anyone, signaling is most effective when it's truthful.

What could it mean to "make a precommitment", if not to signal the fact that your strategy is a certain way? Your strategy either is, or isn't, a certain way; this is a fixed fact about yourself, and facts don't change. This being apparently the only resolution, I was not so much correcting as elucidating what you were saying (while assuming you didn't think of this elucidation explicitly), in order to make the conclusion easier to see (that the problem is with the inability to signal counterfactual aspects of the strategy).

Comment author: 10 March 2010 09:10:26PM *  1 point [-]

I don't propose fooling anyone, signaling is most effective when it's truthful.

Signaling is about perceptions, not the truth by necessity. That means that fooling is at least a hypothetical possibility. Which is not the case for my use of precommitment.

What could it mean to "make a precommitment", if not to signal the fact that your strategy is a certain way?

Taking the decision not to change your mind later, in a way you will stick to. If, as you seem to suggest, the question of whether the agent later acts a certain way is already implicit in its original source code, then this agent already comes into existence precommitted (or not, as the case may be).

Comment author: 10 March 2010 09:30:05PM *  1 point [-]

Taking the decision not to change your mind later in a way you will stick to.

That you've taken this decision is a fact about your strategy (as such, it's timeless: looking at it from ten years ago doesn't change it). There is a similar fact of what you'd do if the situation was different.

Did you read about counterfactual mugging, and do you agree that one should give up the money? No precommitment in this sense could help you there: there is no explicit decision in advance, it has to be a "passive" property of your strategy (the distinction between a decision that was "made" and one that wasn't is a superficial one -- that's my point).

If as you seem to suggest the question whether the agent later acts a certain way or not is already implicit in its original source code then this agent already comes into existence precommitted (or not, as the case may be).

How could it be otherwise? And if so, "deciding to precommit" (in the sense of fixing this fact at a certain moment) is impossible in principle. All you can do is tell the other player about this fact, maybe only after you yourself discovered it (as being the way to win, and so the thing to do, etc.)

Comment author: 10 March 2010 09:40:59PM *  1 point [-]

That you've taken this decision is a fact about your strategy (as such, it's timeless: looking at it from ten years ago doesn't change it). There is a similar fact of what you'd do if the situation was different.

Yes, it's a fact about your strategy, but this particular strategy would not have been your strategy before making that decision (it may have been a strategy you were considering, though). Unless you want to argue that there is no such thing as a decision, which would be a curious position in the context of a thought experiment about decision theory.

Did you read about counterfactual mugging, and do you agree that one should give up the money?

Yes, I considered myself precommitted to hand over the money when reading that. I would not have considered myself precommitted before my speculations about time travel a couple of years ago, and if I had read the scenario of the counterfactual mugging and nothing else here, and if I had been forced to say whether I would hand over the money without time to think it through, I would have said that I would not (I can't tell what I would have said given unlimited time).

Comment author: 10 March 2010 09:44:55PM 0 points [-]

Signaling is about perceptions, not the truth by necessity.

Any evidence, that is any way in which you may know facts about the world, is up to interpretation, and you may err in interpreting it. But it's also the only way to observe the truth.

Comment author: 10 March 2010 09:51:25PM 1 point [-]

You are talking about the relation between truth and your own perceptions. None of this is relevant to the relation between truth and what you want other people's perceptions to be, which is the context in which those words are used in the post you reply to. Are you deliberately trying to misinterpret me? Do I need to make all of my posts lawyer-proof?

Comment author: 10 March 2010 09:14:00PM 0 points [-]

this is a fixed fact about yourself, facts don't change.

What I was 10 years ago is a fixed fact about what I was 10 years ago. That doesn't change. But I have.

Comment author: 10 March 2010 09:22:18PM 0 points [-]

So? (Not a rhetorical question.)

Comment author: 10 March 2010 09:33:29PM *  0 points [-]

The point is that it is not a fixed fact about yourself unless you have an esoteric definition of self that is "what I was, am or will be at one particular instant in time". Under the conventional meaning of 'yourself', you can change and do so constantly. Essentially the 'So?' is a fundamental rejection of the core premise of your comment.

(We disagree about a fundamental fact here. It is a fact that appears trivial and obvious to me and I assume your view appears trivial and obvious to you. It doesn't seem likely that we will resolve this disagreement. Do you agree that it would be best for us if we just leave it at that? You can, of course, continue the discussion with FAWS who on this point at least seems to have the same belief as I.)

Comment author: 10 March 2010 08:50:22PM 2 points [-]

When you say "precommited", you mean "effectively signalled precommitment". When you say "can't precommit" (that is, can precommit only to certain other things), you mean "there is no way of effectively signalling this precommitment".

FAWS clearly does not mean that. He means what he says he means and you disagree with him.

Since the game stipulates that one of the two acts before the other, editing their source code is a viable option. If you happen to know that the other party is vulnerable to this kind of tactic, then this is the right decision to make.

(Or the game might have a notion of rational strategy, and so you won't need either source code or signalling of precommitment.)

On this I agree.

Comment author: 10 March 2010 08:58:43PM *  0 points [-]

FAWS clearly does not mean that. He means what he says he means and you disagree with him.

I don't disagree with him, because I don't see what else it could mean.

Since the game stipulates that one of the two acts before the other, editing their source code is a viable option.

See the other reply -- the edited code is not an interesting fact. The communicated code must be the original one -- if it's impossible to verify, this just means it can't be effectively communicated (signalled), which implies that you can't signal your counterfactual precommitment.

Comment author: 10 March 2010 09:09:58PM 0 points [-]

See the other reply -- the edited code is not an interesting fact. The communicated code must be the original one

No, it need not be the original code. In fact, if the Baron really wants to he can destroy all copies of the original code. This is an actual universe, not a counterfactual one. The agent that is the baron is made up of quarks, which can be moved about using the normal laws of physics.

Comment author: 10 March 2010 09:40:42PM *  0 points [-]

It need not be the original code, but if we are interested in the original code, then we read the communicated data as evidence about the original code -- for what it's worth. It may well be in Baron's interest to give info about his code -- since otherwise, what distinguishes him from a random jumble of wires, in which case the outcome may not be appropriate for his skills.

Comment author: 11 March 2010 09:09:27AM 0 points [-]

By precommitting I understand starting to be aware of the fact that my source code will do the particular thing with certainty. Nobody knows his source code completely, and even knowing the source code doesn't imply knowing all its outputs immediately. So, what I wanted to say is that when making the threat, the baron must know that he will certainly act the way he announces (this is the precommitment), and the countess has to know this fact about the baron (this is the signalling part).

Time matters because the baron has to calculate his counterfactual actions (i.e. partly simulate himself) before he can precommit in the sense I understand the word.

Comment author: 10 March 2010 06:31:51PM *  1 point [-]

Who precommits first wins. If the baron precommits to fulfil the threat unless he gets the money, later precommitment of the countess is worthless, since she expects the baron to fulfil the threat anyway.

Obviously. Hence my use of perfect tense rather than present tense. A world with agents acting and reflecting in the way the two players acting in the example do, but without previous commitments that make this precise behavior impossible seems highly implausible to me. I personally would have considered myself as being precommitted not to respond to blackmail in the scenario given even before reading it, and that would have been obvious to anyone familiar enough with me to reasonably feel as confident about predicting my reaction as would be required in the scenario.

Comment author: 10 March 2010 05:50:49PM 2 points [-]

"Being blackmailable only to irrationally blackmailing agents who were never deliberately modified into such by anyone"... i.e. being blackmailable by any old normal blackmailer.

Comment author: 10 March 2010 06:35:25PM 2 points [-]

Most normal blackmailers don't try to blackmail knowably unblackmailable agents and are therefore insufficiently irrational in the sense used.

Comment author: 10 March 2010 08:23:19PM 4 points [-]

If the Baron can commit to counterfactually doing Z, then he never has to do Z, so it doesn't matter how horrible the consequences of Z are to himself.

This is true, but you've neutered the prisoner's dilemma. One of the central problems one faces in game theory is that it is extremely hard to credibly precommit to do something that you'd clearly rather not do. Your point is valid, but you've assumed away almost all of the difficult parts of the problem. This is even more of a problem in your subsequent post on nukes.

Comment author: 10 March 2010 06:13:37PM *  4 points [-]

(A few notes, with no particular point.)

The players should be modeled as algorithms that respond not to each other's moves, but as algorithms responding to information about the other player's algorithm (by constructing a strategy for responding to the other player's moves). In particular, saying that a certain player is a "rational agent" with certain payoff in the game might sometimes fix what that player's algorithm (of responding to info about the other player's algorithm) is. "It's a rational agent", if given as info about the player's algorithm to the other player, in the right games, may be taken as equivalent to the complete specification of this player's source code (modulo logical uncertainty; it's actually easier if we say "it's a rational agent" than if we give the source code, since in the latter case we may need to also make the inference -- hence you don't even need to be Omega to read "it's a rational agent" as exact specification of the algorithm, if the game is right). For this reason, it's unnecessary to actually communicate the source code, if it's a given that the players are "rational agents" and this knowledge can, by some magic, be communicated.

The game of considering counterfactuals has the goal of revealing semantics of the other player's algorithm: player A doesn't care what the source code of player B looks like, it only cares of what that source code does, that is how it reacts to possible players A (to what possible players A do, not to what possible players' source code is -- nobody is caring about the coding; and player A doesn't even know what its own code does -- this is what it's currently deciding; the coding only reflects logical uncertainty).

Note that each player has to maximize the payoff across all possible opponents, and not just the "rational agents" (particularly given that the coding for the "rational agent" may be unknown, even if the game has the "rational agent"). The other player may well be a random knot of wires (that still somehow processes the info about the other player's algorithm, maybe punishes the player if it cooperates with "rational agents"; no relation to the payoff matrix is possible at this point). The conserved resource here is expressive power: each player can't have an arbitrary table of answers to each of the possible other player's algorithms (the number of possible reactions to elements of set S is bigger than set S), so it's not possible in general to "step back" and improve on one's algorithm pointwise, for each of the possible other players (even if it were possible computationally). This suggests that many games won't have the unique "rational player" optimum, and it won't be enough to communicate "it's a rational player" to fix the algorithm or facilitate effective cooperation (for example, the ultimatum game, obviously).

Compare the notion of "rational agent" (as applied to more general games than PD) with stating that it's known about the other player that it's in your interest to make move X in response to that player.

Comment author: 10 March 2010 05:21:06PM 7 points [-]

The fifth fact is a consequence of the previous ones.

Um, no. Again, I think you may have misunderstood that point there. The point is not that all Countesses can inevitably and inescapably be blackmailed. It is just that a Countess designed a particular way can be blackmailed. The notion of a superior epistemic vantage point is not that there is some way for the Baron to always get it, but that if the Baron happens to have it, the Baron wins.

Could the countess plausibly raise herself to a superior epistemic vantage over the baron, and get out from under his thumb? Alas no.

Again, this just wasn't a conclusion of the workshop. A certain fixed equation occupies a lower epistemic vantage. Nothing was said about being unable to raise yourself up.

Alas no. Once the countess allows herself to use tactics conditional on the baron's actions, the whole set-up falls apart: the two start modelling the other's actions based on their own actions, which are based on the other's actions, and so on. The baron can no longer assume that the countess has no influence on his decision, as now she does, so the loop never terminates.

Or the Countess just decides not to pay, unconditional on anything the Baron does. Also, if the Baron ends up in an infinite loop or failing to resolve the way the Baron wants to, that is not really the Countess's problem.

As I did say at the decision workshop, the resolution that seems most likely is "respond to offers, not to threats".

Comment author: 10 March 2010 09:01:23PM *  1 point [-]

I haven't misunderstood the points - though I have, I fear, over-simplified the presentation for illustrative purposes. The key missing ingredient is that when I wrote:

The baron must model the countess as seeing his decision as a fixed fact over which she has no influence,

implicit in that was the assumption that the baron is rational, knows his own source and the countess's, and will arrive at a decision in finite time - hence he must be correct in his assumption. I nearly wrote it that way, but thought this layout would be more intuitive.

It is just that a Countess designed a particular way can be blackmailed.

Indeed. Those are conditions that allow the countess to be blackmailed.

Could the countess plausibly raise herself to a superior epistemic vantage over the baron, and get out from under his thumb? Alas no.

If the countess is already at an inferior epistemic vantage point, she can't deterministically raise herself to a higher one - for instance, she cannot stop treating the baron's actions as a fixed fact, since an entity capable of doing that is not genuinely treating them as fixed in the first place.

The rest of that section was a rather poorly phrased way of saying that two entities cannot be in superior epistemic vantage over each other.

Comment author: 10 March 2010 10:07:01PM 0 points [-]

The fifth fact is a consequence of the previous ones.

It seems that by "consequence" you mean "logical consequence", that is if I, observing this scenario, note that the first 5 conditions hold, I can derive that the 6th condition holds as well.

There is another interpretation, though: that you mean a "causal consequence" - that the baron, by having a certain model of the countess, makes that model correct, because the baron is rational and therefore produces correct models. Under this interpretation, the claim is wrong. (Eliezer, were you interpreting it this way when you said Stuart misunderstood your point?)

Comment author: 10 March 2010 10:37:41PM *  0 points [-]

Yes, I'm eliding Gödelian arguments there... Consequences of anyone being rational and believing X have been removed.

Interestingly, in the model I produced down below, both the countess and the baron produce correct models of each other. Furthermore, the countess knows she produces a correct model of the baron (as she runs his source successfully).

It also happens that the baron can check that he has the correct model of the countess, after making his decision, by running her code. Since the countess stops running his code as soon as she knows his outcome, he can know that his model was accurate in finite time.
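A minimal sketch of how such a terminating mutual simulation can work, assuming (my assumption, not the workshop model) that the baron's precommitted decision never calls back into the countess's code:

```python
def baron_decision():
    # The baron precommits: his output does not depend on simulating
    # the countess, so any simulation of him halts immediately.
    return "reveal unless paid"

def countess_decision():
    # The countess runs the baron's source; it terminates because it
    # never calls back into her own code.
    if baron_decision() == "reveal unless paid":
        return "pay"          # -10 beats -100 in the payoff matrix
    return "don't pay"

# After deciding, the baron can verify his model by running her code:
assert countess_decision() == "pay"
```

The recursion bottoms out precisely because the baron's code is constant with respect to the countess - which is what "treating his decision as a fixed fact" buys.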

Comment author: 10 March 2010 09:21:33PM 0 points [-]

implicit in that was the assumption that the baron was rational, knew his source and the countess' and would arrive at a decision in finite time - hence he must be correct in his assumption. I nearly wrote it that way, but thought this layout would be more intuitive.

You say here that the baron is rational and knows both his own source and the countess's. That being the case, the only way for the countess to be blackmailed is if she implements a defective decision algorithm. Yet you describe the difference between the two as an 'inferior epistemic vantage point'. That does not seem like the right label: the advantage seems instrumental, not epistemic.

Comment author: 10 March 2010 09:31:48PM 0 points [-]

We do not yet have a decision algorithm that reliably "responds to offers, not to threats".

Therefore 'defective decision algorithm' must include everything we are capable of designing today :-)

Comment author: 10 March 2010 09:47:53PM 2 points [-]

We don't have a decision theory that reliably responds to offers, not to threats. We do have an algorithm that responds to offers, not to threats. Approximately it goes: "when dealing with rational agents, with full epistemic awareness thrown all over the place, respond to offers, not to threats, because that is what works best." Unfortunately, integrating that into situations with epistemic uncertainty is all sorts of complex and probably beyond me. But that is a general problem to be expected with any decision theory.

Comment author: 10 March 2010 10:42:53PM 0 points [-]

Sorry, trapped by Gödel again. Consequences of anyone being rational and believing X have been removed.

Comment author: 16 March 2010 03:15:26PM 2 points [-]

Sure to all that; but what I want to see is an explanation of why I should consider the notion of "superior epistemic vantage" a useful idea. It seems like a fantasy that has no bearing on real life. Why shouldn't I dismiss Newcomb's dilemma as philosophical masturbation the moment someone says that it depends on superior epistemic vantage?

Comment author: 10 March 2010 11:09:06PM 2 points [-]

Parfit does a good job of covering this ground in Reasons and Persons in his discussion of threat-fulfillers and threat-ignorers. Let's see... you can read most of that section on Google Books, starting on p. 20 (section 1 part 8). A "threat" is a claim by someone that they will do X if you do Y, where doing X would make both them and you worse-off. In a world of transparency (shared source codes), Parfit comes out in favor of threat-ignoring (as well as promise-keeping).
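Parfit's definition can be checked mechanically against the payoff matrix from the post (the numbers are the post's; the function is my own sketch):

```python
# Payoffs (baron, countess) from the post's matrix.
payoffs = {
    ("reveal", "pay"):      (-90, -110),
    ("reveal", "not_pay"):  (-100, -100),
    ("silent", "pay"):      (10, -10),
    ("silent", "not_pay"):  (0, 0),
}

def is_threat(action, baseline, victim_response):
    """Parfit's test: carrying out X (instead of the baseline action)
    is a threat if it makes both parties worse off."""
    x = payoffs[(action, victim_response)]
    base = payoffs[(baseline, victim_response)]
    return x[0] < base[0] and x[1] < base[1]

# Against a countess who ignores threats (never pays), revealing hurts
# both parties relative to staying silent, so it is a threat:
print(is_threat("reveal", "silent", "not_pay"))  # True
```

A threat-ignorer commits to the "not_pay" column, making the carried-out threat strictly dominated for the blackmailer.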

Comment author: 10 March 2010 09:19:43PM *  2 points [-]

I'm staying out of this discussion mainly because I'm incredibly confused about acausal/timeless/counterfactual trade/blackmail. Eliezer gave a small presentation at the recent decision theory mini-workshop on his ideas but unlike Stuart I'm pretty sure I don't understand it. I've been told there are also some very rough notes/drafts on related ideas written by a couple of individuals floating around SIAI and FHI, but so far I have been unsuccessful in getting access to them.

ETA: I should mention that the workshop was very enjoyable and I greatly appreciate SIAI's efforts in setting it up, even though I came out more confused than I did going in. That just means I wasn't confused nearly enough previously. :)

Comment author: 10 March 2010 11:42:28PM *  2 points [-]

I also don't understand the general case of these problems, but from what I understand, discussing payoff matrices is the wrong thing to do -- it's about jumbles of wires, not "rational agents", and about dancing around computational complexity, not figuring out simple strategies or heuristics such as not listening to threats. (The "baseline" harm discussion seems to end on the suggestion that rational agents won't blackmail other rational agents to begin with -- but what do you do when you are blackmailed by a jumble of wires?)

These concepts seem important: logical uncertainty (state of partial computation, program code vs. denotation, state vs. dynamics, proof vs. cut-free proof), observational uncertainty (and its combination with logical uncertainty), acausal control ("logical control", a situation when a system X is defined using the state (partial computation) of agent A, so that A's decisions control what X is -- this means that we are interested in the way X works, not just what it does -- that is not its denotation, not its strategy), recursive acausal control (what happens when the agent controls environment by conceptualizing it as containing the agent, or when two agents think about each other). The latter is the crux of most games, and seems incompatible with Bayesian networks, requiring thinking about algorithms (but not just their semantics -- acausal control is interested in the way things work, not just what they do).

Comment author: 10 March 2010 09:57:19PM 1 point [-]

but unlike Stuart I'm pretty sure I don't understand it.

Sigh... you may be the wiser of the two of us.

I understand certain formal models that resemble the blackmail problem, but have a sloppy understanding of the exact conditions under which they apply. Will edit the post to insert the formal model.

Comment author: 19 March 2010 10:11:39PM 1 point [-]

Sorry for such a late reply... Just so I understand: is the point that whoever can better model their opponent, while still being able to precommit to an action, is the one in a superior epistemic vantage -- that whoever can go "deeper" into the counterfactuals wins?

ie, is the only thing stopping the countess from doing the same sort of counterfactual modelling on the baron -- from having strategies that are functions of which strategy the baron chooses -- simply that "by assumption, the countess's code/computational resources are lesser"? Or is there something about the situation that inherently leaves the countess with fewer options, however sophisticated she is and however many computational resources are available to her?

(first, sorry for replying to this thing so late. I was a bit ill for a while, wasn't really up to tackling this/thinking about it)

Near as I can make out, one ends up with a "precommitment arms race"

ie, what stops the countess from credibly precommitting to "If you blackmail me, I will not only not pay, but will immediately and publicly reveal everything myself"? Then the baron will find a way of precommitting to "no matter what, I'm going to blackmail you", and one basically ends up in a precommitment race until one side or the other blinks or they both make each other miserable. (Or did I completely miss the point?)

Comment author: 10 March 2010 08:05:10PM *  1 point [-]

The most plausible version would go: "If you reveal the affair, I will shoot you."

This is of course contingent on the Countess being aware of this limitation on the Baron (i.e. she knows he will not reveal, forcing him to choose either not-a-word or anti-blackmail, as these are the only precommitments that would result in her not forcing him to reveal). I am therefore fairly certain the Archduke does not need an epistemic advantage; he merely needs the ability to make one outcome unacceptable to the party with an epistemic advantage, and make this fact known to both parties.

Granted, if the Archduke must keep his intervention unknown to his sister, your solution seems like the simplest one that would work. In my solution, the Countess must know that the payoff of [Reveal] has changed for the Baron, or he would choose blackmail, she would pay him, and the Archduke would not shoot him.

Comment author: 10 March 2010 04:14:10PM *  1 point [-]

In the world of perfect decision makers, there is no risk to doing so, because the Countess will hand over the money, so the Baron will not take the hit from the revelation.

Since the payoffs are symmetrical (in the sense that both the baron and the countess lose the same amount of utility if the secret is revealed), in the world of perfect decision makers the winner is clearly whoever decides to blackmail first. But the countess can retaliate easily: after she pays, she can blackmail the baron and get the money back with certainty, even with some bonus. So, say that the baron has Xb money, and prefers having at least Lb money to keeping the secret (this is the lower bound; it can be 0 if the baron is willing to pay everything he has). For the countess, the analogous quantities are Xc and Lc.

1st step: the baron blackmails, and rationally demands Xc-Lc money. The countess pays. After that, the baron has Xb+Xc-Lc, the countess has Lc. Now the baron knows that the countess will reject any further threat. 2nd step: the countess blackmails the baron, and demands Xb+Xc-Lc-Lb money. The baron, obviously, pays. After that, the baron is at Lb, while the countess is at Xb+Xc-Lb. The 3rd step is obvious.
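With illustrative numbers plugged in (my own choice, purely for checking the arithmetic), the first two steps work out directly:

```python
# Each party will pay down to a floor (Lb for the baron, Lc for the
# countess) rather than have the secret revealed.
Xb, Xc = 100, 100   # starting money
Lb, Lc = 20, 30     # lower bounds

baron, countess = Xb, Xc

# 1st step: the baron demands everything above the countess's floor.
demand = countess - Lc
countess -= demand
baron += demand
print(baron, countess)   # 170 30, i.e. Xb+Xc-Lc and Lc

# 2nd step: the countess demands everything above the baron's floor.
demand = baron - Lb
baron -= demand
countess += demand
print(baron, countess)   # 20 180, i.e. Lb and Xb+Xc-Lb

# Further steps just oscillate the same total back and forth.
assert baron + countess == Xb + Xc
```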

Now, both the countess and the baron are reasonable enough to predict that the outcome is an infinite oscillation of the bank accounts, and that the result is only a loss of time. (Assume it is impossible for the countess to trick the baron by spending the money to gain utility while lowering her accessible possessions to Lc, or vice versa.) So, what is the solution in timeless decision theory? How would an ideally rational baron and countess behave?