Decision Theories, Part 3.75: Hang On, I Think This Works After All

orthonormal

Followup to: Decision Theories, Part 3.5: Halt, Melt and Catch Fire, Decision Theories: A Semi-Formal Analysis, Part III

The thing about dead mathematical proofs is that it's practically always worth looting the corpse; sometimes you even find there's still a pulse! That appears to be the case with my recent post lamenting the demise of the TDT-like "Masquerade" algorithm. I think I've got a rigorous proof this time, but I'd like other opinions before I declare that the rumors of Masquerade's demise were greatly exaggerated...

To recap quickly, I've been trying to construct an algorithm that, given the payoff matrix of a game and the source code of its opponent, does some deductions and then outputs a move. I want this algorithm to do the commonsense right things (defect against both DefectBot and CooperateBot, and mutually cooperate against both FairBot and itself), and I want it to do so for simple and general reasons (that is, no gerrymandering of actions against particular opponents, and in particular no fair trying to "recognize itself", since there can be variants of any algorithm that are functionally identical but not provably so within either's powers of deduction). I'd also like it to be "un-exploitable" in a certain sense: it has a default move (which is one of its Nash equilibria), and no opponent can profit against the algorithm by forcing it below that default payoff. If the opponent does as well or better in expected value than it would by playing that Nash equilibrium, then so too does my algorithm.

The revised Masquerade algorithm does indeed have these properties.

In essence, there are two emendations that I needed: firstly, since some possible pairs of masks (like FairBot and AntiFairBot) can't knowingly settle on a fixed point, there's no way to determine what they do without a deductive capacity that strictly exceeds the weaker of them. That's a bad feature to have, so we'll just have to exclude potentially troublemaking masks from Masquerade's analysis. (In the special case of Prisoner's Dilemma I know that including DefectBot and FairBot will suffice; I've got what looks like a good solution in general, as well.)

The second emendation is that FairBot needs to alternate between trying harder to prove its opponent cooperates, and trying harder to prove its opponent defects. (There needs to be an asymmetry, like cooperation proofs going first, to guarantee that when FairBot plays against itself, it finds a Löbian proof of mutual cooperation rather than one of mutual defection.) The reason for this is so that when agents reason about masks, they should be able to find a proof of the mask's action without needing to exceed that mask's powers of deduction. Otherwise we get that arms race again.

This escalation of proof attempts can be represented in terms of proof limits (since there exists a constant C such that for N sufficiently large, a proof that "there are no proofs of X of length less than N" either exists with length less than C^N or not at all), but the simplest way to do this is with the formalism of PA+N. That is, PA is Peano Arithmetic; PA+1 is the formal system with the axioms of Peano Arithmetic and an extra axiom that PA is self-consistent (that is, if PA proves X, then PA does not prove not-X); PA+2 has those axioms and an extra one stating that PA+1 is self-consistent and so on. (Note that none of these formal systems know themselves to be self-consistent, and for good reason!) In every use, we'll assume that N is a fixed number (anything greater than 4 will work).

New And Improved Masks

So without further ado, let's define our masks for the Prisoner's Dilemma:

def DefectBot(X):
    return D

def FairBot(X):
    for i in range(N):
        if "X(FairBot)=C" is provable in PA+i:
            return C
        if "X(FairBot)=D" is provable in PA+i:
            return D
    return D

Lemma 1: For any X, "DefectBot(X)=D" is provable in PA. (Obvious.)

Lemma 2: "FairBot(FairBot)=C" is provable in PA.

Proof: If we label the FairBots separately (after all, they might be logically equivalent but not know they are), we can quickly show that "if PA proves FairBot1(FairBot2)=C, then FairBot2(FairBot1)=C" is provable in PA (and the same with the subscripts reversed). This makes what we might term a Löbian cycle, which will come up a few more times in this post: if we have a formal system S, and "if S proves A, then B" and "if S proves B, then A" are theorems of S, then S indeed proves both A and B. (Proof reserved for a comment; hat tip to Eliezer and co.)

Lemma 3: "FairBot(DefectBot)=D" is provable in PA+1.

Proof: Note that this does not work in PA, even though PA can prove "DefectBot(FairBot)=D". Why not? Because PA can't prove that PA doesn't find a proof of "DefectBot(FairBot)=C" before it can look for a proof of defection! However, PA+1 can handle this just fine, because it proves "if PA proves DB(FB)=D, then PA does not prove DB(FB)=C".

Note the weird fact that it takes a stronger formal system to prove a result that's much more intuitive from outside the system!

New and Improved Masquerade

Let's split up the long definition of Masquerade with a helper function.

def Search(X,Y):
    for i in range(N):
        for j,k in [C,D]:
            if "Y(X)=j and X(Y)=k" is provable in PA+i:
                return (j,k)
    return False

This function, given two agents, simply searches (using increasingly strong formal systems) for a proof of what they do against each other; once it finds one, it halts and returns that result, and if it doesn't find one in time, it announces its failure. We will incorporate this into the definition of Masquerade. (The other function call here is to the payoff matrix, to retrieve the utility U(j,k) of a certain pair of moves.)

def Masquerade(X):
    utility = -∞
    my_move = null
    their_move = null
    for Y in [DefectBot, FairBot]:
        if Search(X,Y) != False:
            (j,k) = Search(X,Y)
            if U(j,k) > utility:
                my_move = j
                their_move = k
                utility = U(j,k)
    if utility > U(D,D):
        for i in range(N):
            if "X(Masquerade) = their_move" is provable in PA+i:
                return my_move
    return D

A few things to note: first, this agent actually halts and outputs an action against any opponent... but for it to provably do so in a system below PA+N, Search(X,Y) has to find proofs quickly. This is the reason that my prior attempt didn't work- it had to wait at one point for the old FairBot to run out of time/power before concluding what it did, and that made it impossible for the old FairBot to know what the old Masquerade did. But with the new and improved agents, we get to ground in a fixed number of steps.

For brevity, I'll label DefectBot, FairBot, and Masquerade as DB, FB, and M, respectively.

Lemma 4: "Search(DB,DB)=(D,D)" is provable in PA+1. (Follows from Lemma 1; note that it needs to use PA+1 in order to rule out finding proofs of other action-pairs.)

Lemma 5: "Search(FB,DB)=Search(DB,FB)=(D,D)" is provable in PA+2. (Follows from Lemma 3.)

Lemma 6: "Search(FB,FB)=(C,C)" is provable in PA. (Follows from Lemma 2; since (C,C) is the first one tried, we don't even need to go up to PA+1.)

Lemma 7: "Masquerade(DB)=D" is provable in PA+3.

Proof: Lemmas 4 and 5, plus the fact that PA+3 knows the consistency of PA+2. There is no sanity-check step, since utility=U(D,D) here.

Lemma 8: "Masquerade(FB)=C" and "FB(Masquerade)=C" are provable in PA+3.

Proof: Lemmas 5 and 6, and the consistency of PA+2, imply that when Masquerade arrives at the sanity-check stage, it has the variables set as utility=U(C,C), my_move=C and their_move=C. Thus PA+3 can prove that "if 'FB(M)=C' is provable in PA+3, then M(FB)=C". And of course, "if 'M(FB)=C' is provable in PA+3, then FB(M)=C" is provable in PA+3, since again PA+3 can prove that PA through PA+2 won't have found proofs of contrary conclusions before it gets around to trying to find cooperation in PA+3. Therefore we have the desired Löbian cycle!

Theorem: "Masquerade(Masquerade)=C" is provable in PA+4.

Proof: Lemmas 7 and 8, and the consistency of PA+3, allow PA+4 to prove that when each Masquerade arrives at the sanity-check stage, it has set utility=U(C,C), my_move=C and their_move=C. Thus we achieve the Löbian cycle, and find proofs of mutual cooperation!

Awesome! So, what next?

Well, assuming that I haven't made a mistake in one of my proofs, I'm going to run the same proof for my generalization: given a payoff matrix in general, Masquerade enumerates all of the constant strategies and all of the "mutually beneficial deals" of the FairBot form (that is, masks that hold out the "stick" of a particular Nash equilibrium and the "carrot" of another spot on the payoff matrix which is superior to the "stick" for both players). Then it alternates (at escalating PA+n levels) between trying to prove the various good deals that the opponent could agree to. There are interesting complexities here (and an idea of what bargaining problems might involve).

Secondly, I want to see if there's a good way of stating the general problem that Masquerade solves, something better than "it agrees with commonsense decision theory". The analogy here (and I know it's a fatuous one, but bear with me) is that I've come up with a Universal Turing Machine but not yet the Church-Turing Thesis. And that's unsatisfying.

But before anything else... I want to be really sure that I haven't made a critical error somewhere, especially given my false start (and false halt) in the past. So if you spot a lacuna, let me know!

Followup to: Decision Theories, Part 3.5: Halt, Melt and Catch Fire, Decision Theories: A Semi-Formal Analysis, Part III

The revised Masquerade algorithm does indeed have these properties.

New And Improved Masks

So without further ado, let's define our masks for the Prisoner's Dilemma:

Lemma 1: For any X, "DefectBot(X)=D" is provable in PA. (Obvious.)

Lemma 2: "FairBot(FairBot)=C" is provable in PA.

Lemma 3: "FairBot(DefectBot)=D" is provable in PA+1.

Note the weird fact that it takes a stronger formal system to prove a result that's much more intuitive from outside the system!

New and Improved Masquerade

Let's split up the long definition of Masquerade with a helper function.

def Search(X,Y):
    for i in range(N):
        for j,k in [C,D]:
            if "Y(X)=j and X(Y)=k" is provable in PA+i:
                return (j,k)
    return False

For brevity, I'll label DefectBot, FairBot, and Masquerade as DB, FB, and M, respectively.

Awesome! So, what next?

@orthonormal: Thanks for pointing me to these posts during our dinner last Thursday.

I haven't checked the proofs in detail, and I didn't read all previous posts in the series. I'm a bit confused about a couple of things, and I'd be happy if someone could enlighten me.

1) What is the purpose of the i-loops?

In your definition of FairBot and others, you check for all i=1,...,N whether some sentence is provable in PA+i. However, if a sentence is provable in PA+i, then it is also provable in PA+(i+1). For this reason, I don't understand why you loop over i=1,...,N.

2) FairBot and the search procedure can be simplified

The following holds for all agents X and Y, and all j,k in {C,D}:

Search(X,Y) fails if and only if, for all j,k in {C,D}, the sentence "j = X(Y) and k=Y(X)" is not provable in PA+N.
If Search(X,Y) does not fail, then Search(X,Y) outputs ( X(Y) , Y(X) ).

This means you could define the search procedure simply as follows:

Search'(X,Y) := ( X(Y) , Y(X) )

This procedure Search' has the same output as Search whenever Search does not fail, and Search' is meaningful on strictly more inputs. In particular, Search' halts and outputs the correct answer if and only if X halts on input Y and Y halts on input X.

Then you can redefine Search(X,Y) equivalently as follows:

If "X(Y) halts and Y(X) halts" is provable in PA+N then return ( X(Y) , Y(X) );
Otherwise, return False.

Formulated this way, it becomes apparent that you are trying to find proofs for the fact that certain Turing machines halt on certain inputs; the actual output is not so important and can be computed outside the proof system.

Similarly, FairBot(X) can be simplified as follows:

If "X(FairBot) halts" is provable in PA+N, then return X(FairBot).
Otherwise return D.

This nicely matches the intuition that we try to understand our opponent in interaction with us, and if we do, we can predict what they will do and match their action; if we fail to understand them, we are clueless and have to defect.

3) On a related note, what is the utility if we don't halt?

Maybe your concern is that we should halt even if we play against the Turing machine that never halts? To me it seems that a prisoner who can't decide on what to do effectively outputs "Cooperate" since the guards will only wait a finite amount of time and don't extract any information from a prisoner that does not halt to explicitly output D.

In the model in which not halting is the same as cooperating, the trivial bot below is optimally fair:

TrivialFairBot(X):

return X(TrivialFairBot). (i.e., simply simulate your opponent and match their action)

Based on the fact that you're doing something more complicated, I assume that you want to forbid not halting; let's say we model this such that every non-halting player gets a utility of -∞. Then it is a bit weird that we have to halt while our opponent is allowed to run into an infinite loop: it seems to give malicious opponents an unfair advantage if they're allowed to threaten to go into an infinite loop.

4) Why allow these PA oracles?

I don't understand why you forbid your agents to run into an infinite loop, but at the same time allow them have access to uncomputable PA+N oracles. Or are you only looking at proofs of a certain bounded length? If so, what's the motivation for that?

5) How do you define "regrets" formally?

I haven't checked this, but it seems that, by diagonalization arguments, every possible agent will loose against infinitely many opponents. It further seems that, for any given agent X, you can always find an opponent Y so that adding a hard-wired rule "if your opponent is this particular agent Y, then do C/D" (whichever is better) improves the performance of X (note that this can't be done for all Y since Y could first check whether X has a hard-wired rule like that). In particular, I can imagine that there is an infinite sequence X_1,.. of agents, such that X_{i+1} performs at least as good as X_i on all inputs, and it performs better on at least one input. This would mean that there is no "Pareto-optimum" since there's always some input on which our performance can be improved.

6) What's the motivation for studying a decision theory that is inherently inefficient, almost not computable, and easily exploitable by some opponents?

If we're trying to model a real decision theory, the two agents arguably shouldn't have access to a lot of time. Would the following resource-bounded version make sense? For some function t(n), the machines are forced to halt and produce their output within t(n) computation steps, where n=|X|+|Y| is the length of the binary encoding of the machines. In this post, you're getting decision theories that are at least exponential in n. I'd be interested to see whether we achieve something when t(n) is a polynomial?

The time-bounded setting has some advantages: I think it'd more realistically model the problem, and it prevents an arms race of power since no player is allowed to take more than t(n) time. On the other hand, things get more difficult (and more interesting) now since you can inspect your opponent's source code, but you don't have enough time to fully simulate your opponent.

39

Decision Theories, Part 3.75: Hang On, I Think This Works After All

39

New And Improved Masks

New and Improved Masquerade

Awesome! So, what next?

39

39

Decision Theories, Part 3.75: Hang On, I Think This Works After All

39

New And Improved Masks

New and Improved Masquerade

Awesome! So, what next?

39