
This post was inspired by Benja's SUDT post. I'm going to describe another simplified model of UDT which is equivalent to Benja's proposal, and is based on standard game theory concepts as described in this Wikipedia article.

First, let's define what a "single player extensive-form game with chance moves and imperfect information" is:

  1. A "single player extensive-form game" is a tree of nodes. Each leaf node has a utility value. A play of the game starts at the root and ends at some leaf node.
  2. Some non-leaf nodes are "chance nodes", with probabilities assigned to branches going out of that node. All other non-leaf nodes are "decision nodes", where the player can choose which branch to take. (Thanks to badger for helping me fix an error in this part!)
  3. "Imperfect information" means the decision nodes are grouped into "information sets". The player doesn't know which node they're currently at, only which information set it belongs to.
  4. "Imperfect recall" is a special case of imperfect information, where knowing the current information set doesn't even allow the player to figure out which information sets were previously visited, like in the Absent-Minded Driver problem.
  5. We will assume that the player can use "behavioral strategies", where the player can make a random choice at each node independently, rather than "mixed strategies", which randomize over the set of pure strategies for the entire game. See Piccione and Rubinstein's paper for more on this difference. (Thanks to Coscott for pointing out that assumption!)
  6. The behavioral strategy with the highest expected utility will be taken as the solution of the game.
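
To make this concrete, here is a minimal Python sketch of such a game, together with expected-utility evaluation under a behavioral strategy. The class names and encoding are my own, and the Absent-Minded Driver payoffs used in the example (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing past both) are the standard ones from Piccione and Rubinstein's paper:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

@dataclass
class Leaf:
    utility: float

@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]   # (probability, child) pairs

@dataclass
class Decision:
    info_set: str                # which information set this node belongs to
    branches: Dict[str, "Node"]  # action label -> child node

Node = Union[Leaf, Chance, Decision]
Strategy = Dict[str, Dict[str, float]]  # info_set -> (action -> probability)

def expected_utility(node: Node, strategy: Strategy) -> float:
    if isinstance(node, Leaf):
        return node.utility
    if isinstance(node, Chance):
        return sum(p * expected_utility(child, strategy) for p, child in node.branches)
    # Decision node: the same action distribution is used at every node of its information set.
    dist = strategy[node.info_set]
    return sum(dist[action] * expected_utility(child, strategy)
               for action, child in node.branches.items())

# Example: the Absent-Minded Driver. Both intersections belong to the same
# information set "X", so a behavioral strategy is one number p = P(CONTINUE).
amd = Decision("X", {
    "EXIT": Leaf(0),
    "CONTINUE": Decision("X", {"EXIT": Leaf(4), "CONTINUE": Leaf(1)}),
})

best = max((expected_utility(amd, {"X": {"CONTINUE": p / 100, "EXIT": 1 - p / 100}}), p / 100)
           for p in range(101))
print(best)  # approximately (1.333, 0.67): continue with probability 2/3
```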

Now let's try using that to solve some UDT problems:

Absent-Minded Driver is the simplest case, since it's already discussed in the literature as a game of the above form. It's strange that not everyone agrees that the expected-utility-maximizing strategy is indeed the best one to follow, but let's skip that and move on.
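
For reference, the planning-optimal calculation goes as follows, assuming the standard payoffs (0 for exiting at the first intersection, 4 for exiting at the second, 1 for continuing past both). If the behavioral strategy is to continue with probability p at each intersection, then

$$\mathrm{EU}(p) = (1-p)\cdot 0 + p(1-p)\cdot 4 + p^2\cdot 1 = 4p - 3p^2,$$

which is maximized at p = 2/3, for an expected utility of 4/3. This matches the numerical optimum found by the sketch above, and any pure strategy does strictly worse (always exiting gives 0, always continuing gives 1), which is why behavioral strategies matter here.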

Psy-Kosh's non-anthropic problem is more tricky, because it has multiple players. We will model it as a single-player game anyway, putting the decision nodes of the different players in sequence and grouping them together into information sets in the natural way. The resulting game tree is complicated, but the solution is the same as UDT's. As a bonus, we see that our model does not need any kind of anthropic probabilities, because it doesn't specify or use the probabilities of individual nodes within an information set.

Wei Dai's coordination problem is similar to the previous one, but with multiple players choosing different actions based on different information. If we use the same trick of folding all players into one, and group the decision nodes into information sets in the natural way, we get the right solution again. It's nice to see that our model automatically solves problems that require Wei's "explicit optimization of global strategy".

Counterfactual Mugging is even more tricky, because writing it as an extensive-form game must include a decision node for Omega's simulation of the player. Some people are okay with that, and our model gives the right solution. But others feel that it leads to confusing questions about the nature of observation. For example, what if Omega used a logical coin, and the player could actually check which way the coin came up by doing a long calculation? Paying up is probably the right decision, but our model here doesn't have enough detail.
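
For the ordinary version with a physical coin, the expected-utility comparison at the root of the game tree is straightforward (using the commonly quoted stakes: you pay $100 if the coin comes up tails, and Omega pays you $10,000 on heads if it predicts you would have paid):

$$\mathrm{EU}(\text{give}) = \tfrac{1}{2}\cdot(-100) + \tfrac{1}{2}\cdot 10{,}000 = 4950, \qquad \mathrm{EU}(\text{refuse}) = 0,$$

so the optimal strategy in the extensive-form model is to give.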

Finally, Agent Simulates Predictor is the kind of problem that cannot be captured by our model at all, because logical uncertainty is the whole point of ASP.

It's instructive to see the difference between the kind of UDT problems that fit our model and those that require something more. Also it would be easy to implement the model as a computer program, and solve some UDT problems automatically. (Though the exercise wouldn't have much scientific value, because extensive-form games are a well known idea.) In this way it's a little similar to Patrick's work on modal agents, which made certain problems solvable on the computer by using modal logic instead of enumerating proofs. Now I wonder if other problems that involve logical uncertainty could also be solved by some simplified model?

Comments

You are ignoring an important detail.

1) In the (standard?) interpretation of an extensive form of a game, a strategy is a probability distribution on functions from information sets to outputs.

2) In another interpretation, a strategy is a function from information sets to independent probability distributions on outputs.

These are different. I think that you are assuming the second interpretation, but I also believe that the first interpretation is more common.

These two interpretations are in fact very different. In the first interpretation, there is never any incentive to use randomness in a one-player game, and it is not possible to win the Absent-Minded Driver problem.

In some sense it does not matter: you can take a problem modeled as one and change it to be modeled as the other, at least approximately if you want to keep a finite game graph. However, when taking a real-world problem and coming up with a game model, you have to be aware of which interpretation you are using.

I feel like which interpretation is more accurate is an actual empirical question that I do not know the answer to. It boils down to the following question:

I have copied you, and put the two identical copies in identical situations. You must both choose A or B. If you choose the same thing, you lose; if you choose different things, you win.

Under one interpretation, you choose the same thing and both lose no matter what. Under the other interpretation, you can independently choose randomly, and win with probability 1/2.
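
Here is a quick simulation sketch of that copy game (my own illustration, not anything from the thread) showing the gap between the two interpretations: a mixed strategy randomizes once over pure strategies, so both copies end up executing the same realized choice and always lose, while a behavioral strategy lets each copy randomize independently and win half the time.

```python
import random

def play(sampler):
    """One round of the copy game: the sampler returns the actions of the two
    identical copies; they win if the actions differ."""
    a, b = sampler()
    return a != b

def mixed():
    # Interpretation 1: randomize over pure strategies once; both copies then
    # execute the same realized pure strategy.
    choice = random.choice("AB")
    return choice, choice

def behavioral():
    # Interpretation 2: each copy randomizes independently at its own node.
    return random.choice("AB"), random.choice("AB")

N = 100_000
print(sum(play(mixed) for _ in range(N)) / N)       # ~0.0: the copies always match
print(sum(play(behavioral) for _ in range(N)) / N)  # ~0.5: they differ half the time
```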

The first option is standard. When the second interpretation comes up, those strategies are referred to as behavior strategies.

If every information set is visited at most once in the course of play, then the game satisfies no-absent-mindedness and every behavior strategy can be represented as a standard mixed strategy (but some mixed strategies don't have equivalent behavior strategies).

Kuhn's theorem says the game has perfect recall (roughly players never forget anything and there is a clear progression of time) if and only if mixed and behavior strategies are equivalent.

Thank you, I did not know the terminology.

The types of games we care about that inspire us to have to use UDT do not have perfect recall, so whether or not behavior strategies are possible is an important question. It also feels like an empirical question.

I think interpretation 1 is usually called "mixed strategy" and interpretation 2 is "behavioral strategy". In the post I was indeed assuming interpretation 2. Thanks for pointing that out! I have edited the post accordingly.

I do not think it is something you should just assume. I think it is an empirical question. I think that behavioral strategies might not be realistic, because they seem to depend on non-determinism.

Well, in the Absent-Minded Driver problem it seems reasonable to allow the driver to flip a coin whenever he's faced with a choice. Why do you think that's unrealistic?

Hmm. I was thinking that determinism requires that you get the same output in the same situation, but I guess I was not accounting for the fact that we do not require the two nodes in the information set to be the same situation; we only require that they are indistinguishable to the agent.

It does seem realistic to have the Absent-Minded Driver flip a coin (although perhaps it is better to model that as a third option of flipping a coin, which points to a chance node).

On the other hand, if I am a deterministic Turing machine, and Omega simulates me and puts a dollar in whichever of two boxes he predicts I will not pick, then I cannot win this game unless I have an outside source of randomness.

It seems like in different situations, you want different models. It seems to me like you have two different types of agents: a deterministic dUDT agent and a randomized rUDT agent. We should be looking at both, because they are not the same. I also do not know which one I am as a human.

By asking about the Absent-Minded Driver with a coin, you were phrasing the problem so that it does not matter, because an rUDT agent is just a dUDT agent which has access to a fair coin that he can flip any number of times at no cost.

I agree that there is a difference, and I don't know which model describes humans better. It doesn't seem to matter much in any of our toy problems though, apart from AMD where we really want randomness. So I think I'm going to keep the post as is, with the understanding that you can remove randomness from the model if you really want to.

I agree that that is a good solution. Since adding randomness to a node is something that can be done in a formulaic way, it makes sense to have information sets which are just labeled as "you can use behavioral strategies here". It also makes sense to have them labeled as such by default.

I do not think that agents wanting but not having randomness is any more pathological than Newcomb's problem (Although that is already pretty pathological)

I suppose there is a third interpretation, which is that the player can mix and match between the two types of randomness, making the choices in some instances of an information set correlated with the choices in other instances. This is not useful in a one-player game, however, because interpretation-1-style randomness does not help at all, so if the game is one-player you might as well just call this interpretation 2.

Your description of incomplete information is off. What you give as the definition of incomplete information is one type of imperfect information, where nature is added as a player.

A game has incomplete information when one player has more information than another about payoffs. Since Harsanyi, incomplete information has been seen as a special case of imperfect information with nature randomly assigning types to each player according to a commonly known distribution and payoffs given types being commonly known.

Thanks, you're right and I didn't understand what the Harsanyi transformation was. Edited the post again.

I think I see a reasonable way to represent the Logical Counterfactual Mugging in the extensive form. I can draw a diagram if it's desirable.

Well, either that or an explanation. I can't just figure it out from your comment.

I mean what I'm saying, by the way. As long as you're OK with assuming you can have a logical prior in the first place, I don't see any issue with representing LCM with the diagram I made.

Yes, it means the LCM is no different to the CM, but I don't see an issue with that. Apart from the question of involving different kinds of priors (logical vs non-logical) the two problems are indeed identical.

If I'm missing something, please tell me what it is; I'd like to know!

I agree that your diagram gives the right answer to logical Counterfactual Mugging. The problem is that it's not formal enough, because you don't really explain what a "logical prior" is. For example, if we have logical Counterfactual Mugging based on a digit of pi, then one of the two possible worlds is logically inconsistent. How do we know that calculating the digit of pi by a different method will give the same result in that world, rather than blow up the calculator or something? And once you give a precise definition of "logical prior", the problems begin to look more like programs or logical formulas than causal diagrams.

That's fair enough; the "logical prior" is definitely a relatively big assumption, although it's very hard to justify anything other than a 50/50 split between the two possibilities.

However, LCM only takes place in one of the two possible worlds (the real one); the other never happens. Either way, you're calculating the digit of pi in this world; it's just that in one of the two possibilities (which, as far as you know, are equally likely) you are the subject of logical counterfactual surgery by Omega. Assuming this is the case, surely calculating the digit of pi isn't going to help?

From the point of view of your decision-making algorithm, not knowing which of the two inputs it's actually receiving (counterfactual vs real) it outputs "GIVE". Moreover, it chooses "GIVE" not merely for counterfactual utilons, but for real ones.

Of course, we've assumed that the logical counterfactual surgery Omega does is a coherent concept to begin with. The whole point of Omega is that Omega gets the benefit of the doubt, but in this situation it's definitely still worthwhile to ask whether it makes sense.

In particular, maybe it's possible to make a logical counterfactual surgery detector that is robust even against Omega. If you can do that, then you win regardless of which way the logical coin came up. I don't think trying to calculate the relevant digit of pi is good enough, though.

Here's an idea for a "logical counterfactual surgery detector":
Run a sandboxed version of your proof engine that attempts to maximally entangle that digit of pi with other logical facts. For example, it might prove that "if the 10000th decimal digit of pi is 8, then ⊥". If you detect that the sandboxed proof engine undergoes a logical explosion, then GIVE. Otherwise, REFUSE.
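
A rough sketch of how that detector might look, purely to make the idea concrete; the prover object and its prove_within method are hypothetical placeholders, not an existing API:

```python
def decide(digit_claim: str, sandboxed_prover, budget: int) -> str:
    """Hypothetical surgery detector. `digit_claim` is Omega's asserted fact,
    e.g. "the 10000th decimal digit of pi is 8". `sandboxed_prover.prove_within`
    stands in for a bounded proof search that returns True if it finds a proof
    of the given statement within the budget."""
    # If the asserted digit is logically false (i.e. we are inside the
    # counterfactual), the sandbox should be able to derive a contradiction.
    exploded = sandboxed_prover.prove_within(f"({digit_claim}) -> False", budget)
    return "GIVE" if exploded else "REFUSE"
```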

Agreed. I was initially hesitant because you said "the model doesn't have enough detail" so I was checking to see if there was a response along the lines of "the model would necessarily fail to represent X" that you would bring up. Anyways, I have in fact discovered a truly marvelous model, but the margin is too small to contain it...

Fortunately, though, URL links are big enough to fit in the margin! http://www.gliffy.com/go/publish/6135045

For the purposes of the model itself you should, of course, ignore all of the suggestively named LISP tokens, except the labels (ALLCAPS) on the outgoing edges of nodes in the same information set (as these tell you which actions are "the same action"). In other words, the actual "model" consists only of:
(1) the nodes
(2) the directed edges
(3) the information set arrows (an equivalence relation)
(4) the identities of arrows out of the same information set (i.e. an equivalence relation on those arrows).
(5) the probabilities associated with a chance node
(6) the values inside the square boxes (utilities)

In the game as I've presented it, the optimal strategy is clearly to:

  • always GIVE when Omega asks you
  • if for some reason you happened to CALCULATE, you still GIVE after that (also I think you would be well-advised to run a full re-check of all of your AI subsystems to work out why you decided to CALCULATE. Hopefully it was just a cosmic ray ^_~)

OK, here's my model of Agent Simulates Predictor:
http://www.gliffy.com/go/publish/6149724
The agent has two information sets, joined by the dotted lines: Action if P-ONE (the predictor predicts one-boxing), and Action if P-TWO (the predictor predicts two-boxing).
It's unclear what the predictor will predict if the agent one-boxes vs P-ONE and two-boxes vs P-TWO, so I've assigned the variable "x" to the probability of P-ONE in that case. Similarly, it's unclear what the predictor will predict if the agent two-boxes vs P-ONE and one-boxes vs P-TWO (i.e. always does the opposite of what the predictor predicted), so I've assigned the variable "y" to the probability of P-ONE in that case. I'm guessing the default interpretation of ASP would be x=y=0.

However, it's probably not a good representation in the case of randomized strategies, as the predictor might have some specific way to respond to those. If we assume the agent only picks pure strategies, this much simpler game is an equivalent one:
http://www.gliffy.com/go/publish/6149719

It is quite clear that it is optimal to ONEBOX regardless of P-ONE or P-TWO (except if y > 1000/1001, in which case you should TWOBOX vs P-ONE and ONEBOX vs P-TWO).
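
For reference, here is where the 1000/1001 threshold comes from, assuming the standard Newcomb payoffs of $1,000,000 in the opaque box and $1,000 in the transparent one (these figures are my assumption, but they are consistent with the 1000/1001 ratio above). Always one-boxing makes P-ONE certain and is worth $1,000,000, while the "contrarian" strategy (TWOBOX vs P-ONE, ONEBOX vs P-TWO) is worth

$$y\cdot(1{,}000{,}000 + 1{,}000) + (1-y)\cdot 0 = 1{,}001{,}000\,y,$$

which exceeds $1,000,000 exactly when y > 1000/1001.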

The agent's ability to simulate the predictor is no different to both boxes being transparent in Newcomb's problem.

I still feel that logical uncertainty has to be tackled head on, rather than kept unspecified behind diagrams. But you bring up an interesting point. In the transparent Newcomb's problem as originally formulated by Gary, the simulated agent sees both boxes as filled. That way both the agent and the predictor can be deterministic, and the agent can't cause a paradox by doing the opposite of what was predicted. Maybe we should reformulate ASP in the same way.

Yes, logical uncertainty clearly needs to be dealt with, although I'd be interested in seeing a proper statement of the problem in its general form. Diagrams like this are a useful tool because you can give an uncontroversial correct answer to "what is the optimal strategy for diagram X?" but they don't answer the underlying questions unless you also have a theory that says the diagram should be drawn THIS way and not THAT way.

I think the diagrams I made are valid; the ASP one in particular is justified pretty well by the exact same reasoning that goes into Eliezer's TDT. The LCM one relies pretty heavily on being able to assign a prior probability of 0.5 to both possibilities, but without justification for something else I see little other choice than to go for a 50/50 split.

Also, yes; I do think that ASP in its standard form is underspecified, unlike Gary's transparent Newcomb's problem.

In terms of the proof-searching formalism you bring up in your post on ASP, you have these issues:
1) If proofs of one-boxing and two-boxing are both valid, which one gets found first?
2) If no proof can be found at all, what does the predictor do?

I think the straightforward way to resolve this issue with ASP is to state that the predictor only searches for proofs up to length N that the agent one-boxes, and fills both boxes if and only if it finds such a proof. This resolves the issue in the exact same way that "FairBot" does in the "Robust Cooperation" paper, right? Also, it's essentially analogous to Gary's transparent Newcomb's, because searching for a proof that the agent one-boxes is pretty much the same idea as simulating what they would do if both boxes were filled.

Of course, it's sufficiently well-specified in your original ASP post because you also specified an algorithm for the agent; in this case the agent happens to find a proof that two-boxing leads to $1000 and a "spurious" proof that one-boxing leads to $0, both of which do indeed turn out to be true because that agent ends up two-boxing.

If proofs of one-boxing and two-boxing are both valid, which one gets found first?

They can't both be valid, because only one of the two statements is true. We assume that the predictor's formal theory is sound (doesn't prove false statements).

If no proof can be found at all, what does the predictor do?

Right, that's the underspecified part. I guess the predictor should fill only one box in this case.

I think the straightforward way to resolve this issue with ASP is to state that the predictor only searches for proofs up to length N that the agent one-boxes, and fills both boxes if and only if it finds such a proof. This resolves the issue in the exact same way that "FairBot" does in the "Robust Cooperation" paper, right?

I think the UDT agent will still two-box in that case, for the same reason as before. To have FairBot-style cooperation, the agent must also search for one-sided proofs only.

Also, it's essentially analogous to Gary's transparent Newcomb's, because searching for a proof that the agent one-boxes is pretty much the same idea as simulating what they would do if both boxes were filled.

Why? If an agent one-boxes when both boxes are filled, that doesn't necessarily mean the agent one-boxes unconditionally.

They can't both be valid, because only one of the two statements is true. We assume that the predictor's formal theory is sound (doesn't prove false statements).

True, but still underspecified. They can't both be valid at the same time, but the point is that a priori it's possible that either one could be valid, and you don't know which one without knowing additional details. If the agent one-boxes whenever the predictor predicts it one-boxes, and two-boxes whenever the predictor predicts it two-boxes, the predictor's actual prediction is arbitrary; it will depend on the specific details of the predictor's proof-searching algorithm. I'm guessing this means that the proof itself would have to involve those details in some way.

I think the UDT agent will still two-box in that case, for the same reason as before. To have FairBot-style cooperation, the agent must also search for one-sided proofs only.

That would depend on the details of the UDT agent, wouldn't it? It's hard to say without those details. If you're assuming the proof-searching model then yes, but that's the same old spurious proof problem isn't it?

Why? If an agent one-boxes when both boxes are filled, that doesn't necessarily mean the agent one-boxes unconditionally.

If an agent one-boxes when both boxes are filled then the predictor will predict that the agent one-boxes when both boxes are filled, and so both boxes will be filled, and so the agent one-boxes unconditionally.

If you're assuming the proof-searching model then yes, but that's the same old spurious proof problem isn't it?

Yeah.

If an agent one-boxes when both boxes are filled then the predictor will predict that the agent one-boxes when both boxes are filled, and so both boxes will be filled, and so the agent one-boxes unconditionally.

I'm confused, which implementation of the predictor are you assuming in that sentence? I don't think that every predictor will be able to figure out every true statement about the agent...

Yeah, a predictor won't necessarily figure out every true statement, but my original point was about the transparent Newcomb's, in which case if Omega finds that the agent one-boxes whenever Omega fills both boxes, then Omega will fill both and the agent will one-box.

In the case of the proof-searching model things aren't quite as nice, but you can still get pretty far. If the predictor has a sufficiently high proof limit and knows that it fills both boxes whenever it finds a proof of one-boxing, then a short enough proof that the agent one-boxes whenever both boxes are filled should be sufficient (I'm quite sure a Löbian argument holds here).

So yes, from the predictor's point of view, a proof that the agent one-boxes whenever both boxes are filled should be sufficient. Now you just need the agent to not be stupid...