What's to stop the Countess from having precommitted to never respond to blackmail?
Or to have precommitted to act as though she had precommitted to whatever course of action it seems, in retrospect, most beneficial to have precommitted to (including meta-precommitments, meta-meta-precommitments, meta^meta^meta-precommitments, etc., up to the highest level she can model)?
Which would presumably include not being blackmailable to agents who would not try to blackmail if she absolutely committed to not being blackmailable, but being blackmailable to agents who would try to blackmail even if she absolutely committed to not being blackmailable, except for agents who would not have modified themselves into such agents were it not for such exceptions. Or, in short: being blackmailable only to irrationally blackmailing agents who were never deliberately modified into such by anyone.
Please don't correct me on what I think. My use of precommitting has absolutely nothing to do with signaling. I first thought about these things (this explicitly) in the context of time travel, and you can't fool the universe with signaling, no matter how good your acting skills.
The fifth fact is a consequence of the previous ones.
Um, no. Again, I think you may have misunderstood that point there. The point is not that all Countesses can inevitably and inescapably be blackmailed. It is just that a Countess designed a particular way can be blackmailed. The notion of a superior epistemic vantage point is not that there is some way for the Baron to always get it, but that if the Baron happens to have it, the Baron wins.
Could the countess plausibly raise herself to a superior epistemic vantage over the baron, and get out from under his thumb? Alas no.
Again, this just wasn't a conclusion of the workshop. A certain fixed equation occupies a lower epistemic vantage. Nothing was said about being unable to raise yourself up.
Alas no. Once the countess allows herself to use tactics conditional on the baron's actions, the whole set-up falls apart: the two start modelling each other's actions based on their own actions, which are in turn based on the other's actions, and so on. The baron can no longer assume that the countess has no influence on his decision, as now she does, so the loop never terminates.
Or the Countess just decides not to pay, unconditional o...
If the Baron can commit to counterfactually doing Z, then he never has to do Z, so it doesn't matter how horrible the consequences of Z are to himself.
This is true, but you've neutered the prisoner's dilemma. One of the central problems one faces in game theory is that it is extremely hard to credibly precommit to do something that you'd clearly rather not do. Your point is valid, but you've assumed away almost all of the difficult parts of the problem. This is even more of a problem in your subsequent post on nukes.
(A few notes, with no particular point.)
The players should be modeled as algorithms that respond not to each other's moves, but to information about the other player's algorithm (by constructing a strategy for responding to the other player's moves). In particular, saying that a certain player is a "rational agent" with a certain payoff in the game might sometimes fix what that player's algorithm (of responding to info about the other player's algorithm) is. "It's a rational agent", if given as info about the play...
I'm staying out of this discussion mainly because I'm incredibly confused about acausal/timeless/counterfactual trade/blackmail. Eliezer gave a small presentation at the recent decision theory mini-workshop on his ideas but unlike Stuart I'm pretty sure I don't understand it. I've been told there are also some very rough notes/drafts on related ideas written by a couple of individuals floating around SIAI and FHI, but so far I have been unsuccessful in getting access to them.
ETA: I should mention that the workshop was very enjoyable and I greatly appreciate SIAI's efforts in setting it up, even though I came out more confused than I did going in. That just means I wasn't confused nearly enough previously. :)
Sure to all that; but what I want to see is an explanation of why I should consider the notion of "superior epistemic vantage" a useful idea. It seems like a fantasy that has no bearing on real life. Why shouldn't I dismiss Newcomb's dilemma as philosophical masturbation the moment someone says that it depends on superior epistemic vantage?
Parfit does a good job of covering this ground in Reasons and Persons in his discussion of threat-fulfillers and threat-ignorers. Let's see... you can read most of that section on Google Books, starting on p. 20 (section 1 part 8). A "threat" is a claim by someone that they will do X if you do Y, where doing X would make both them and you worse-off. In a world of transparency (shared source codes), Parfit comes out in favor of threat-ignoring (as well as promise-keeping).
Sorry for such a late reply... Just so I understand: is the point "whoever can better model their opponent, while still being able to precommit to an action, is the one in a superior epistemic vantage; whoever can go 'deeper' in the counterfactuals wins"?
i.e., just so I understand, is the only thing stopping the countess from doing the same sort of counterfactual modelling on the baron, from having strategies that are functions of which strategy the baron chooses, simply that "by assumption, the countess's code/computati...
The most plausible version would go: "If you reveal the affair, I will shoot you."
This is of course contingent on the Countess being aware of this limitation on the Baron (i.e. she knows he will not reveal, forcing him to choose either not-a-word or anti-blackmail, as these are the only precommitments that would result in her not forcing him to reveal). I am therefore fairly certain the Archduke does not need an epistemic advantage; he merely needs the ability to make one outcome unacceptable to the party with an epistemic advantage, and make...
In the world of perfect decision makers, there is no risk to doing so, because the Countess will hand over the money, so the Baron will not take the hit from the revelation.
Since the payoffs are symmetrical (in the sense that both the baron and the countess lose the same amount of utility if the secret is revealed), in the world of perfect decision makers the winner is clearly the person who decides to blackmail. But the countess can retaliate easily: after she pays, she can blackmail the baron, and get the money back with certainty, even with some bonus. So, say ...
This is Eliezer's model of blackmail in decision theory, presented at the recent workshop at SIAI and filtered through my own understanding. Eliezer's help and advice were much appreciated; any errors herein are my own.
The mysterious stranger blackmailing the Countess of Rectitude over her extra-marital affair with Baron Chastity doesn't have to run a complicated algorithm. He simply has to credibly commit to the course of action:
"If you don't give me money, I will reveal your affair."
And then, generally, the Countess forks over the cash. Which means the blackmailer never does reveal the details of the affair, so the threat remains entirely counterfactual/hypothetical. Even if the blackmailer is Baron Chastity, and the revelation would be devastating for him as well, this makes no difference at all, as long as he can credibly commit to carrying out the threat (call this course of action Z). In the world of perfect decision makers, there is no risk to doing so, because the Countess will hand over the money, so the Baron will not take the hit from the revelation.
Indeed, the baron could replace "I will reveal our affair" with Z="I will reveal our affair, then sell my children into slavery, kill my dogs, burn my palace, and donate my organs to medical science while boiling myself in burning tar", or even "I will reveal our affair, then turn on an unfriendly AI", and this would matter only insofar as it affected his ability to pre-commit to Z. If the Baron can commit to counterfactually doing Z, then he never has to do Z (as the countess will pay him the hush money), so it doesn't matter how horrible the consequences of Z are to himself.
To put some numbers on this model, assume the countess can either pay up or not, and the baron can either reveal the affair or keep silent. The payoffs could look something like this:
Both the countess and the baron get -100 utility if the affair is revealed, while the countess transfers 10 of her utilons to the baron if she pays up. Staying silent and not paying have no effect on the utility of either.
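In matrix form, writing each entry as (baron's utility, countess's utility), this gives:

                           Countess pays       Countess doesn't pay
    Baron reveals          (-90, -110)         (-100, -100)
    Baron stays silent     (10, -10)           (0, 0)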
Let's see how we could implement the blackmailing if the baron and the countess were running simple decision algorithms. The baron has a variety of tactics he could implement. What is a tactic, for the baron? A tactic is a list of responses he could implement, depending on what the countess does. His four tactics are:
- Anti-blackmail: reveal the affair if she pays, stay silent if she doesn't.
- Blabbermouth: reveal the affair whatever she does.
- Not-a-word: stay silent whatever she does.
- Blackmail: stay silent if she pays, reveal the affair if she doesn't.
The countess, in contrast, has only two tactics: pay or don't pay. Each will try to estimate what the other will do, so the baron must model the countess, who must model the baron in turn. This seems as if it leads to infinite regress, but the baron has a short-cut: when reasoning counterfactually as to which tactic to implement, he will substitute that tactic into his model of how the countess models him.
In simple terms, this means that when he is musing 'what would happen if, hypothetically, I were to anti-blackmail', he assumes that the countess would model him as an anti-blackmailer. In that case, the countess' decision is easy: her utility-maximising decision is not to pay, leaving them with a payoff of (0,0).
Similarly, if he counterfactually considers the blabbermouth tactic, then if the countess models him as such, her utility-maximising tactic is also not to pay up, giving a payoff of (-100,-100). Not-a-word results in a payoff of (0,0), and only if the baron implements the blackmail tactic will the countess pay up, giving a payoff of (10,-10). Since this maximises his utility, he will implement the blackmail tactic. And the countess will pay him, to minimise her utility loss.
Notice that in order for this to work, the baron needs four things:
- He must be able to model the countess well enough to predict her response to each of his tactics.
- He must be able to commit to a tactic that is conditional on the countess's choice.
- The countess must treat his chosen tactic as a fixed fact, rather than conditioning her own tactic on his in turn.
- The countess must correctly work out which tactic he has committed to.
If we were to model the two players as timeless AIs implementing specific decision theories, these conditions amount to a single requirement: the baron occupies what Eliezer termed a superior epistemic vantage.
Could two agents each be in a superior epistemic vantage, as laid out above, over the other? This is precluded by the set-up above*: two agents cannot both be correct in assuming that the other treats their decision as a fixed fact while both run counterfactuals conditioning their responses on the varying tactics of the other.
"I'll tell, if you don't send me the money, or try and stop me from blackmailing you!" versus "I'll never send you the money, if you blackmail me or tell anyone about us!"
Can the countess' brother, the Archduke of Respectability, blackmail the baron on her behalf? If the archduke is in a superior epistemic vantage to the baron, then there is no problem. He could choose a tactic that is dependent on the baron's choice of tactics, without starting an infinite loop, as the baron cannot do the same to him. The most plausible version would go:
"If you blackmail my sister, I will shoot you. If you blabbermouth, I will shoot you. Anti-blackmail and not-a-word are fine by me, though."
Note that Omega, in Newcomb's problem, occupies the superior epistemic vantage. His final tactic is the conditional Z="if you two-box, I put nothing in box A; if you one-box, I put a million pounds in," whereas you do not have access to tactics along the lines of "if Omega implements Z, I will two-box; if he doesn't, I will one-box". Instead, like the countess, you have to assume that Omega will indeed implement Z, accept this as fact, and then choose simply to one-box or two-box.
*The argument, as presented here, is a lie, but spelling out the true version would be tedious and tricky. The countess, for instance, is perfectly free to indulge in counterfactual speculations that the baron may decide something else, as long as she and the baron are both aware that these speculations will never influence her decision. Similarly, the baron is free to model her doing so, as long as this similarly makes no difference. The countess may have a dozen other options, not just the two presented here, as long as they both know she cannot make use of them. There is a whole issue here of extracting information from an algorithm and its source code, where you run into entertaining paradoxes such as: if the baron knows the countess will do something, then his knowledge is accurate and he can check that it is correct; but if he didn't know this fact, then it would be incorrect. These are beyond the scope of this post.
[EDIT] The impossibility of the countess and the baron being each in epistemic vantage over the other has been clarified, and replaces the original point - about infinite loops - which only implied that result for certain naive algorithms.
[EDIT] Godelian reasons make it impossible to bandy about "he is rational and believes X, hence X is true" with such wild abandon. I've removed the offending lines.
[EDIT] To clarify issues, here is a formal model of how the baron and countess could run their decision theories. Let X be a fact about the world, let S_B be the baron's source code, and let S_C be the countess's source code.
Baron(S_C):
    Utility of pay = 10, utility of reveal = -100.
    Based on S_C, if the countess would accept the baron's behaviour as a fixed fact, run:
        Let T = {anti-blackmail, blabbermouth, not-a-word, blackmail}.
        For t_b in T, compute the utility of the outcome implied by Countess(t_b, S_B). Choose the t_b that maximises it.

Countess(X, S_B):
    If X implies the baron's tactic t_b, then accept t_b as a fixed fact.
    If not, run Baron(S_C) to compute the baron's tactic t_b, stopping as soon as the tactic is found. Accept it as a fixed fact.
    Utility of pay = -10, utility of reveal = -100.
    Let T = {pay, not pay}.
    For t_c in T, under the assumption of t_b, compute the utility of the outcome. Choose the t_c that maximises it.
Both these agents are rational with each other, in that they correctly compute each other's ultimate decisions in this situation. They are not perfectly rational (or rather, their programs are incomplete) in that they do not perform well against general agents, and may fall into infinite loops as written.
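For concreteness, here is a minimal runnable sketch of this model in Python. The representation is my own: a baron tactic is written as a map from the countess's action to his action, and the check that the countess would accept the baron's tactic as a fixed fact is simply assumed to pass, rather than being derived from her source code.

# The baron's four tactics, each mapping the countess's action to his response.
BARON_TACTICS = {
    "anti-blackmail": {"pay": "reveal", "not pay": "silent"},
    "blabbermouth":   {"pay": "reveal", "not pay": "reveal"},
    "not-a-word":     {"pay": "silent", "not pay": "silent"},
    "blackmail":      {"pay": "silent", "not pay": "reveal"},
}

def outcome(baron_action, countess_action):
    """Return (baron's utility, countess's utility) for a pair of actions."""
    baron = countess = 0
    if baron_action == "reveal":
        baron, countess = baron - 100, countess - 100
    if countess_action == "pay":
        baron, countess = baron + 10, countess - 10
    return baron, countess

def countess(t_b):
    """Countess(t_b, S_B): accept the baron's tactic t_b as a fixed fact and
    choose the action that maximises her own utility given that fact."""
    return max(["pay", "not pay"],
               key=lambda t_c: outcome(BARON_TACTICS[t_b][t_c], t_c)[1])

def baron():
    """Baron(S_C): for each candidate tactic, substitute it into the model of
    how the countess models him, and choose the tactic whose implied outcome
    maximises his own utility."""
    def utility(t_b):
        t_c = countess(t_b)  # her best response, treating t_b as a fixed fact
        return outcome(BARON_TACTICS[t_b][t_c], t_c)[0]
    return max(BARON_TACTICS, key=utility)

t_b = baron()
t_c = countess(t_b)
print(t_b, t_c, outcome(BARON_TACTICS[t_b][t_c], t_c))
# prints: blackmail pay (10, -10)

The hard-coded fixed-fact assumption is what keeps this version out of the infinite loops mentioned above: the countess's computation never calls back into the baron's.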