Decision Theories, Part 3.75: Hang On, I Think This Works After All
Followup to: Decision Theories, Part 3.5: Halt, Melt and Catch Fire, Decision Theories: A Semi-Formal Analysis, Part III
The thing about dead mathematical proofs is that it's practically always worth looting the corpse; sometimes you even find there's still a pulse! That appears to be the case with my recent post lamenting the demise of the TDT-like "Masquerade" algorithm. I think I've got a rigorous proof this time, but I'd like other opinions before I declare that the rumors of Masquerade's demise were greatly exaggerated...
To recap quickly, I've been trying to construct an algorithm that, given the payoff matrix of a game and the source code of its opponent, does some deductions and then outputs a move. I want this algorithm to do the commonsense right things (defect against both DefectBot and CooperateBot, and mutually cooperate against both FairBot and itself), and I want it to do so for simple and general reasons (that is, no gerrymandering of actions against particular opponents, and in particular no fair trying to "recognize itself", since there can be variants of any algorithm that are functionally identical but not provably so within either's powers of deduction). I'd also like it to be "un-exploitable" in a certain sense: it has a default move (which is one of its Nash equilibria), and no opponent can profit against the algorithm by forcing it below that default payoff. If the opponent does as well or better in expected value than it would by playing that Nash equilibrium, then so too does my algorithm.
The revised Masquerade algorithm does indeed have these properties.
In essence, there are two emendations that I needed: firstly, since some possible pairs of masks (like FairBot and AntiFairBot) can't knowingly settle on a fixed point, there's no way to determine what they do without a deductive capacity that strictly exceeds the weaker of them. That's a bad feature to have, so we'll just have to exclude potentially troublemaking masks from Masquerade's analysis. (In the special case of Prisoner's Dilemma I know that including DefectBot and FairBot will suffice; I've got what looks like a good solution in general, as well.)
The second emendation is that FairBot needs to alternate between trying harder to prove its opponent cooperates, and trying harder to prove its opponent defects. (There needs to be an asymmetry, like cooperation proofs going first, to guarantee that when FairBot plays against itself, it finds a Löbian proof of mutual cooperation rather than one of mutual defection.) The reason for this is so that when agents reason about masks, they should be able to find a proof of the mask's action without needing to exceed that mask's powers of deduction. Otherwise we get that arms race again.
This escalation of proof attempts can be represented in terms of proof limits (since there exists a constant C such that for N sufficiently large, a proof that "there are no proofs of X of length less than N" either exists with length less than C^N or not at all), but the simplest way to do this is with the formalism of PA+N. That is, PA is Peano Arithmetic; PA+1 is the formal system with the axioms of Peano Arithmetic and an extra axiom that PA is self-consistent (that is, if PA proves X, then PA does not prove not-X); PA+2 has those axioms and an extra one stating that PA+1 is self-consistent and so on. (Note that none of these formal systems know themselves to be self-consistent, and for good reason!) In every use, we'll assume that N is a fixed number (anything greater than 4 will work).
New And Improved Masks
So without further ado, let's define our masks for the Prisoner's Dilemma:
def DefectBot(X):
return D
def FairBot(X):
for i in range(N):
if "X(FairBot)=C" is provable in PA+i:
return C
if "X(FairBot)=D" is provable in PA+i:
return D
return D
Lemma 1: For any X, "DefectBot(X)=D" is provable in PA. (Obvious.)
Lemma 2: "FairBot(FairBot)=C" is provable in PA.
Proof: If we label the FairBots separately (after all, they might be logically equivalent but not know they are), we can quickly show that "if PA proves FairBot1(FairBot2)=C, then FairBot2(FairBot1)=C" is provable in PA (and the same with the subscripts reversed). This makes what we might term a Löbian cycle, which will come up a few more times in this post: if we have a formal system S, and "if S proves A, then B" and "if S proves B, then A" are theorems of S, then S indeed proves both A and B. (Proof reserved for a comment; hat tip to Eliezer and co.)
Lemma 3: "FairBot(DefectBot)=D" is provable in PA+1.
Proof: Note that this does not work in PA, even though PA can prove "DefectBot(FairBot)=D". Why not? Because PA can't prove that PA doesn't find a proof of "DefectBot(FairBot)=C" before it can look for a proof of defection! However, PA+1 can handle this just fine, because it proves "if PA proves DB(FB)=D, then PA does not prove DB(FB)=C".
Note the weird fact that it takes a stronger formal system to prove a result that's much more intuitive from outside the system!
New and Improved Masquerade
Let's split up the long definition of Masquerade with a helper function.
def Search(X,Y):
for i in range(N):
for j,k in [C,D]:
if "Y(X)=j and X(Y)=k" is provable in PA+i:
return (j,k)
return False
This function, given two agents, simply searches (using increasingly strong formal systems) for a proof of what they do against each other; once it finds one, it halts and returns that result, and if it doesn't find one in time, it announces its failure. We will incorporate this into the definition of Masquerade. (The other function call here is to the payoff matrix, to retrieve the utility U(j,k) of a certain pair of moves.)
def Masquerade(X):
utility = -∞
my_move = null
their_move = null
for Y in [DefectBot, FairBot]:
if Search(X,Y) != False:
(j,k) = Search(X,Y)
if U(j,k) > utility:
my_move = j
their_move = k
utility = U(j,k)
if utility > U(D,D):
for i in range(N):
if "X(Masquerade) = their_move" is provable in PA+i:
return my_move
return D
A few things to note: first, this agent actually halts and outputs an action against any opponent... but for it to provably do so in a system below PA+N, Search(X,Y) has to find proofs quickly. This is the reason that my prior attempt didn't work- it had to wait at one point for the old FairBot to run out of time/power before concluding what it did, and that made it impossible for the old FairBot to know what the old Masquerade did. But with the new and improved agents, we get to ground in a fixed number of steps.
For brevity, I'll label DefectBot, FairBot, and Masquerade as DB, FB, and M, respectively.
Lemma 4: "Search(DB,DB)=(D,D)" is provable in PA+1. (Follows from Lemma 1; note that it needs to use PA+1 in order to rule out finding proofs of other action-pairs.)
Lemma 5: "Search(FB,DB)=Search(DB,FB)=(D,D)" is provable in PA+2. (Follows from Lemma 3.)
Lemma 6: "Search(FB,FB)=(C,C)" is provable in PA. (Follows from Lemma 2; since (C,C) is the first one tried, we don't even need to go up to PA+1.)
Lemma 7: "Masquerade(DB)=D" is provable in PA+3.
Proof: Lemmas 4 and 5, plus the fact that PA+3 knows the consistency of PA+2. There is no sanity-check step, since utility=U(D,D) here.
Lemma 8: "Masquerade(FB)=C" and "FB(Masquerade)=C" are provable in PA+3.
Proof: Lemmas 5 and 6, and the consistency of PA+2, imply that when Masquerade arrives at the sanity-check stage, it has the variables set as utility=U(C,C), my_move=C and their_move=C. Thus PA+3 can prove that "if 'FB(M)=C' is provable in PA+3, then M(FB)=C". And of course, "if 'M(FB)=C' is provable in PA+3, then FB(M)=C" is provable in PA+3, since again PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions before it gets around to trying to find cooperation in PA+3. Therefore we have the desired Löbian cycle!
Theorem: "Masquerade(Masquerade)=C" is provable in PA+4.
Proof: Lemmas 7 and 8, and the consistency of PA+3, allow PA+4 to prove that when each Masquerade arrives at the sanity-check stage, it has set utility=U(C,C), my_move=C and their_move=C. Thus we achieve the Löbian cycle, and find proofs of mutual cooperation!
Awesome! So, what next?
Well, assuming that I haven't made a mistake in one of my proofs, I'm going to run the same proof for my generalization: given a payoff matrix in general, Masquerade enumerates all of the constant strategies and all of the "mutually beneficial deals" of the FairBot form (that is, masks that hold out the "stick" of a particular Nash equilibrium and the "carrot" of another spot on the payoff matrix which is superior to the "stick" for both players). Then it alternates (at escalating PA+n levels) between trying to prove the various good deals that the opponent could agree to. There are interesting complexities here (and an idea of what bargaining problems might involve).
Secondly, I want to see if there's a good way of stating the general problem that Masquerade solves, something better than "it agrees with commonsense decision theory". The analogy here (and I know it's a fatuous one, but bear with me) is that I've come up with a Universal Turing Machine but not yet the Church-Turing Thesis. And that's unsatisfying.
But before anything else... I want to be really sure that I haven't made a critical error somewhere, especially given my false start (and false halt) in the past. So if you spot a lacuna, let me know!
Decision Theories, Part 3.5: Halt, Melt and Catch Fire
Followup to: Decision Theories: A Semi-Formal Analysis, Part III
UPDATE: As it turns out, rumors of Masquerade's demise seem to have been greatly exaggerated. See this post for details and proofs!
I had the chance, over the summer, to discuss the decision theory outlined in my April post with a bunch of relevantly awesome people. The sad part is, there turned out to be a fatal flaw once we tried to formalize it properly. I'm laying it out here, not with much hope that there's a fix, but because sometimes false starts can be productive for others.
Since it's not appropriate to call this decision theory TDT, I'm going to use a name suggested in one of these sessions and call it "Masquerade", which might be an intuition pump for how it operates. So let's first define some simple agents called "masks", and then define the "Masquerade" agent.
Say that our agent has actions a1, ... , an, and the agent it's facing in this round has actions b1, ... , bm. Then for any triple (bi, aj, ak), we can define a simple agent Maskijk which takes in its opponent's source code and outputs an action:
def Mask_ijk(opp_src):
look for proof that Opp(Mask_ijk) = bi
if one is found, then output aj
otherwise, output ak
(This is slightly less general than what I outlined in my post, but it'll do for our purposes. Note that there's no need for aj and ak to be distinct, so constant strategies fall under this umbrella as well.)
A key example of such an agent is what we might call FairBot: on a Prisoner's Dilemma, FairBot tries to prove that the other agent cooperates against FairBot, and if it finds such a proof, then it immediately cooperates. If FairBot fails to find such a proof, then it defects. (An important point is that if FairBot plays against itself and both have sufficiently strong deductive capacities, then a short proof of one's cooperation gives a slightly longer proof of the other's cooperation, and thus in the right circumstances we have mutual cooperation via Löb's Theorem.)
The agent Masquerade tries to do better than any individual mask (note that FairBot foolishly cooperates against CooperateBot when it could trivially do better by defecting). My original formulation can be qualitatively described as trying on different masks, seeing which one fares the best, and then running a "sanity check" to see if the other agent treats Masquerade the same way it treats that mask. The pseudocode looked like this:
def Masquerade(opp_src):
for each (i,j,k), look for proofs of the form "Mask_ijk gets utility u against Opp"
choose (i,j,k) corresponding to the largest such u found
look for proof that Opp(Masquerade) = Opp(Mask_ijk)
if one is found, then output the same thing as Mask_ijk(Opp)
otherwise, output a default action
(The default should be something safe like a Nash equilibrium strategy, of course.)
Intuitively, when Masquerade plays the Prisoner's Dilemma against FairBot, Masquerade finds that the best utility against FairBot is achieved by some mask that cooperates, and then Masquerade's sanity-check is trying to prove that FairBot(Masquerade) = C as FairBot is trying to prove that Masquerade(FairBot) = C, and the whole Löbian circus goes round again. Furthermore, it's intuitive that when Masquerade plays against another Masquerade, the first one notices the proof of the above, and finds that the best utility against the other Masquerade is achieved by FairBot; thus both pass to the sanity-check stage trying to imitate FairBot, both seek to prove that the other cooperate against themselves, and both find the Löbian proof.
So what's wrong with this intuitive reasoning?
Problem: A deductive system can't count on its own consistency!
Let's re-examine the argument that Masquerade cooperates with FairBot. In order to set up the Löbian circle, FairBot needs to be able to prove that Masquerade selects a mask that cooperates with FairBot (like CooperateBot or FairBot). There are nice proofs that each of those masks attains the mutual-cooperation payoff against FairBot, but we also need to be sure that some other mask won't get the very highest (I defect, you cooperate) payoff against FairBot. Now you and I can see that this must be true, because FairBot simply can't be exploited that way. But crucially, FairBot can't deduce its own inexploitability without thereby becoming exploitable (for the same Gödelian reason that a formal system can't prove its own consistency unless it is actually inconsistent)!
Now, the caveats to this are important: if FairBot's deductive process is sufficiently stronger than the deductive process that's trying to exploit it (for example, FairBot might have an oracle that can answer questions about Masquerade's oracle, or FairBot might look for proofs up to length 2N while Masquerade only looks up to length N), then it can prove (by exhaustion if nothing else) that Masquerade will select a cooperative mask after all. But since Masquerade needs to reason about Masquerade at this level, this approach goes nowhere. (At first, I thought that having a weaker oracle for Masquerade's search through masks, and a stronger oracle both for each mask and for Masquerade's sanity-check, would solve this. But that doesn't get off the ground: the agent thus defined attains mutual cooperation with FairBot, but not with itself, because the weaker oracle can't prove that it attains mutual cooperation with FairBot.)
Another caveat is the following: FairBot may not be able to rule out the provability of some statement we know is false, but (given a large enough deductive capacity) it can prove that a certain result is the first of its kind in a given ordering of proofs. So if our agents act immediately on the first proof they find, then we could make a version of Masquerade work... as long as each search does find a proof, and as long as that fact is provable by the same deduction system. But there's an issue with this: two masks paired against each other won't necessarily have provable outcomes!
Let's consider the following mask agent, which we'll call AntiFairBot: it searches for a proof that its opponent cooperates against it, and it defects if it finds one; if it doesn't find such a proof, then it cooperates. This may not be a very optimal agent, but it has one interesting property: if you pit AntiFairBot against FairBot, and the two of them use equivalent oracles, then it takes an oracle stronger than either to deduce what the two of them will do! Thus, Masquerade can't be sure that AntiFairBot won't get the highest payoff against FairBot (which of course it won't) unless it uses a stronger deduction system for the search through masks than FairBot uses for its proof search (which would mean that FairBot won't be able to tell what mask Masquerade picks).
I tried to fix this by iterating over only some of the masks; after all, there's no realistic opponent against whom AntiFairBot is superior to both FairBot and DefectBot. Unfortunately, at this point I realized two things: in order to play successfully against a reasonable range of opponents on the Prisoner's Dilemma, Masquerade needs to be able to imitate at least both FairBot and DefectBot; and FairBot cannot prove that FairBot defects against DefectBot. (There are variants of FairBot that can do so, e.g. it could search both for proofs of cooperation and proofs of defection and playing symmetrically if it finds one, but this variant is no longer guaranteed to cooperate against itself!)
If there are any problems with this reasoning, or an obvious fix that I've missed, please bring it to my attention; but otherwise, I've decided that my approach has failed drastically enough that it's time to do what Eliezer calls "halt, melt, and catch fire". The fact that Löbian cooperation works is enough to keep me optimistic about formalizing this side of decision theory in general, but the ideas I was using seem insufficient to succeed. (Some variant of "playing chicken with my deductive system" might be a crucial component.)
Many thanks to all of the excellent people who gave their time and attention to this idea, both on and offline, especially Eliezer, Vladimir Slepnev, Nisan, Paul Christiano, Critch, Alex Altair, Misha Barasz, and Vladimir Nesov. Special kudos to Vladimir Slepnev, whose gut intuition on the problem with this idea was immediate and correct.
= 783df68a0f980790206b9ea87794c5b6)
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)