
What should superrational players do in asymmetric games?

10 badger 24 January 2014 07:42AM

Rereading Hofstadter's essays on superrationality prompted me to wonder what strategies superrational agents would want to commit to in asymmetric games. In symmetric games, everyone can agree on the outcome they'd like to jointly achieve, leaving the decision-theoretic question of whether the players can commit or not. In asymmetric games, life becomes murkier. There are typically many Pareto-efficient outcomes, and we enter the wilds of cooperative game theory and bargaining solutions trying to identify the right one. While, say, the Nash bargaining solution is appealing on many levels, I have a hard time connecting the logic of superrationality to any particular solution. Recently though, I found some insight in "Cooperation in Strategic Games Revisited" by Adam Kalai and Ehud Kalai (working paper version and three-page summary version) for the special case of two-player games with side transfers.

Just to make sure everyone's on common ground, the prototypical game examined in the argument for superrationality is the prisoners' dilemma:

Alice / Bob   Cooperate   Defect
Cooperate     10 / 10     0 / 12
Defect        12 / 0      4 / 4

The unique dominant-strategy equilibrium is (Defect, Defect). However, Hofstadter argues that "superrational" players would recognize the symmetry in each other's reasoning processes and thus conclude that cooperating is in their interest. The argument is not in favor of unconditional cooperation. Instead, the reasoning is closer to "I cooperate if and only if I expect you to cooperate if and only if I cooperate". Many bits have been devoted to formalizing this reasoning in timeless decision theory and other variants.

The symmetry in the prisoners' dilemma makes it easy to pick out (Cooperate, Cooperate) as the action profile each player ideally wants to see happen. Consider instead the following skewed prisoners' dilemma:

Alice / Bob   Cooperate   Defect
Cooperate     2 / 18      0 / 12
Defect        12 / 0      4 / 4

The (Cooperate, Cooperate) outcome still has the highest total benefit, but (Defect, Defect) is also Pareto-efficient. With this asymmetry, it seems reasonable for Alice to Defect, even as someone who would cooperate in the original prisoners' dilemma. Suppose however players can also agree to transfer utility between themselves on a 1-to-1 basis (like if they value cash equally and can make side-payments). Then, (Cooperate, Cooperate) with a transfer between 2 and 14 from Bob to Alice dominates (Defect, Defect). The size of the transfer is still up in the air, although a transfer of 8 (leaving both with a payoff of 10) is appealing since it takes us back to the original symmetric game. I feel confident suggesting this as an outcome the players should commit to if possible.
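
The transfer range above is easy to verify directly. Here is a quick arithmetic check (my own code, not from the post) that a transfer t from Bob to Alice with 2 < t < 14 makes (Cooperate, Cooperate) dominate (Defect, Defect) in the skewed prisoners' dilemma, scanning integer transfers for simplicity:

```python
# Payoffs in the skewed prisoners' dilemma.
cc = (2, 18)   # payoffs at (Cooperate, Cooperate): (Alice, Bob)
dd = (4, 4)    # payoffs at (Defect, Defect)

def dominates(t):
    """Both players strictly prefer (C, C) plus a transfer of t from Bob to Alice."""
    return cc[0] + t > dd[0] and cc[1] - t > dd[1]

viable = [t for t in range(19) if dominates(t)]
print(viable[0], viable[-1])  # 3 13
```

The symmetrizing transfer of 8 sits comfortably inside this range, giving both players 10.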

While the former game could be symmetrized in a nice way, what about more general games where payoffs could look even more askew or strategy sets could be completely different?

Let A be the payoff matrix for Alice and B be the payoff matrix for Bob in any given game. Kalai and Kalai point out that the game (A, B) can be decomposed into the sum of two games:

(A, B) = ( (A+B)/2, (A+B)/2 ) + ( (A−B)/2, (B−A)/2 )

where payoffs are identical in the first game (the team game) and zero-sum in the second (the advantage game). Consider playing these games separately. In the team game, Alice and Bob both agree on the action profile that maximizes their payoff with no controversy. In the advantage game, preferences are exactly opposed, so each can play their maximin strategy, again with no controversy. Of course, the rub is that the team game strategy profile could be very different from the advantage game strategy profile.

Suppose Alice and Bob could commit to playing each game separately. Kalai and Kalai define the payoffs each gets between the two games as

coco(A, B) = maxmax( (A+B)/2, (A+B)/2 ) + minimax( (A−B)/2, (B−A)/2 ),

where coco stands for cooperative/competitive. We don't actually have two games to be played separately, so the way to achieve these payoffs is for Alice and Bob to actually play the team game actions and hypothetically play the advantage game. Transfers then even out the gains from the team game results and add in the hypothetical advantage game results. Even though the original game might be asymmetric, this simple decomposition allows players to cooperate exactly where interests are aligned and compete exactly where interests are opposed.

For example, consider two hot dog vendors. There are 40 potential customers at the airport and 100 at the beach. If both choose the same location, they split the customers there evenly. Otherwise, the vendor at each location sells to everyone at that place. Alice turns a profit of $2 per customer, while Bob turns a profit of $1 per customer. Overall this yields the payoffs:

Alice / Bob   Airport     Beach
Airport       40 / 20     80 / 100
Beach         200 / 40    100 / 50

The game decomposes into the team game:

Alice / Bob   Airport     Beach
Airport       30 / 30     90 / 90
Beach         120 / 120   75 / 75

and the advantage game:

Alice / Bob   Airport     Beach
Airport       10 / -10    -10 / 10
Beach         80 / -80    25 / -25

The maximizing strategy profile for the team game is (Beach, Airport) with payoffs (120, 120). The maximin strategy profile for the advantage game is (Beach, Beach) with payoffs (25, -25). In total, this game has a coco-value of (145, 95), which would be realized by Alice selling at the beach, Bob selling at the airport, and Alice transferring 55 to Bob. Alice generates most of the profits in this situation, but Bob has to be compensated for his credible threat to start selling at the beach too.
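
The whole calculation fits in a few lines. Below is a minimal sketch (my own code, not from the paper) of the coco-value computation for the hot dog game. It uses pure-strategy maximin for the advantage game, which works here because that game has a saddle point; in general the zero-sum value requires mixed strategies (e.g. via linear programming):

```python
# Rows are Alice's actions, columns Bob's, in the order (Airport, Beach).
A = [[40, 80], [200, 100]]   # Alice's profits
B = [[20, 100], [40, 50]]    # Bob's profits

n, m = len(A), len(A[0])

# Team game: both players receive (A + B) / 2; take the best cell.
team = max((A[i][j] + B[i][j]) / 2 for i in range(n) for j in range(m))

# Advantage game: zero-sum, with Alice receiving (A - B) / 2.
adv = [[(A[i][j] - B[i][j]) / 2 for j in range(m)] for i in range(n)]
maximin = max(min(row) for row in adv)                            # Alice's guarantee
minimax = min(max(adv[i][j] for i in range(n)) for j in range(m))
assert maximin == minimax   # saddle point, so pure strategies suffice here
value = maximin

coco = (team + value, team - value)
print(coco)  # (145.0, 95.0)
```

The team payoff of 120 and advantage value of 25 recombine into exactly the (145, 95) split described above.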

The bulk of the Kalai and Kalai article is extending the coco-value to incomplete information settings. For instance, each vendor might have some private information about the weather tomorrow, which will affect the number of customers at the airport and the beach. The Kalais prove that being able to publicly observe the payoffs for the chosen actions is sufficient for agents to commit themselves to the coco-value ex-ante (before receiving any private information) and that being able to publicly observe all hypothetical payoffs from alternative action profiles is sufficient for commitment even after agents have private information.

The Kalais provide an axiomatization of the coco-value, showing it is the payoff pair that uniquely satisfies all of the following:

  1. Pareto optimality: The sum of the values is maximal.
  2. Shift invariance: Increasing a player's payoff by a constant amount in each cell increases their value by the same amount.
  3. Payoff dominance: If one player always gets more than the other in each cell, that player can't get a smaller value for the game.
  4. Invariance to redundant strategies: Adding a new action that is a convex combination of the payoffs of two other actions can't change the value.
  5. Monotonicity in actions: Removing an action from a player can't increase their value for the game.
  6. Monotonicity in information: Giving a player strictly less information can't increase their value for the game.
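
These axioms are easy to probe numerically. As one illustration, here is a quick check (my own code, not from the paper) of axiom 2, shift invariance, on the hot dog game: adding a constant c to every entry of Alice's payoff matrix raises her coco-value by exactly c and leaves Bob's unchanged. Pure maximin is used for the advantage game, which is valid here because it has a saddle point:

```python
def coco(A, B):
    """Coco-value of a two-player game, assuming the advantage game has a saddle point."""
    n, m = len(A), len(A[0])
    team = max((A[i][j] + B[i][j]) / 2 for i in range(n) for j in range(m))
    adv = [[(A[i][j] - B[i][j]) / 2 for j in range(m)] for i in range(n)]
    value = max(min(row) for row in adv)   # advantage-game value for Alice
    return (team + value, team - value)

A = [[40, 80], [200, 100]]   # Alice's profits in the vendor game
B = [[20, 100], [40, 50]]    # Bob's profits
c = 7

base = coco(A, B)
shifted = coco([[a + c for a in row] for row in A], B)
print(base, shifted)  # (145.0, 95.0) (152.0, 95.0)
```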

The coco-value is also easily computable, unlike Nash equilibria in general. I'm hard-pressed to think of anything more I could want from it (aside from easy extensions to bigger classes of games). Given its simplicity, I'm surprised it wasn't hit upon earlier.

Schelling Point Strategy Training

8 sixes_and_sevens 04 October 2013 03:41PM

There's a category of game-theoretic scenario called Battle of the Sexes, which is commonly used to demonstrate coordination problems. Two cinema-goers, traditionally a husband and wife, have agreed to go to the cinema, but haven't decided on what to see beforehand. Of the two films that are showing, she would rather see King Kong Lives, while he would rather see Big Momma's House 2. Each would rather see their non-preferred film with their spouse than see their preferred film on their own. The payoff matrix is as follows:

 

Wife / Husband        King Kong Lives   Big Momma's House 2
King Kong Lives       2 / 1             0 / 0
Big Momma's House 2   0 / 0             1 / 2

The two have not conferred beforehand, beyond sharing knowledge of their preferences. They are turning up to the cinema and picking an auditorium in the hope that their spouse is in there. Which should they pick? This is a classic coordination problem. The symmetry of their preferences means there is no stand-out option for them to converge on. There is no Schelling Point.

Except I'm going to argue that there is.

Shoehorning an example of a Schelling Point into the above scenario, we might imagine that one of the films is billed as "an ideal romantic treat to share with your spouse" (which one that would be, I'm not entirely sure). In the absence of a "natural" Schelling Point, there's no reason we can't make one. All we need is to identify procedures that would reliably elevate one of these options to our attention. Then it becomes a question of selecting which of these procedures is most likely to be selected by the other agent in the scenario.

I am now going to instigate a multidimensional instance of Battle of the Sexes with all the readers of this post.  Below are sixteen randomly-ordered films.  I am going to select one, and invite you to do the same.  The object of the exercise is for all of us to pick the same one.  I will identify my selection, and the logic behind it, in rot13 after the list.

Breakfast at Tiffany's
William Shakespeare's Romeo and Juliet
E.T. the Extra-Terrestrial
Children of the Corn
An American Werewolf in London
To Kill a Mockingbird
Harold and Maude
The Day the Earth Stood Still
Duck Soup
Highlander
Fantasia
Heathers
Forbidden Planet
Butch Cassidy and the Sundance Kid
Grosse Pointe Blank
Mrs. Doubtfire

Urer vf na vapbafrdhragvny fragrapr gb guebj bss crbcyr jub pna vagrecerg guvf plcure ba fvtug ol abj. Zl fryrpgvba jnf na nzrevpna jrerjbys va Ybaqba. Gur cebprqher V fryrpgrq jnf gur svefg svyz nycunorgvpnyyl. Guvf frrzf yvxr gur zbfg "boivbhf" cebprqher sbe eryvnoyl fryrpgvat n fvatyr vgrz sebz gur frg. Cbffvoyl n zber "boivbhf" bar jbhyq fvzcyl or gb fryrpg gur svefg bar ba gur yvfg (Oernxsnfg ng Gvssnal'f va guvf pnfr), ohg V jnf bcrengvat ba gur nffhzcgvba gung gur yvfg jnf abg arprffnevyl eryvnoyl-beqrerq (juvpu V gevrq gb pbairl ol qrfpevovat gur yvfg nf "enaqbzyl-beqrerq", ohg pbhyqa'g ernyyl rkcyvpvgyl fgngr jvgubhg cbffvoyl tvivat n ovt uvag nf gb gur cebprqher V pubfr. Guvf jbhyq unir fcbvyrq guvatf n yvggyr.

I have no idea if that worked. Whether or not it did, it seems to me that the general skill of identifying popular procedures for designating Schelling Points is worth developing. It also seems to me that once a handful of common strategies for identifying Schelling Points are known to a group, some effort has to be put into constructing scenarios in which that group can't coordinate. This forms the outline of an adversarial game (provisionally named Schelling Point Strategy Training), whereby two teams take it in turns to construct and present a set of options which the other team has to coordinate on. I am idly toying with running a session of this at a future London Less Wrong meetup.


There is actually an unrelated meta-strategy here, whereby on all disputes one designated partner acquiesces to the wishes of the other.  This behaviour is also far from unheard of in romantic partnerships.  While this doesn't seem very egalitarian, I am wondering if it actually becomes a reasonable trade-off for partnerships which face coordination problems on a regular basis.

Testing lords over foolish lords: gaming Pascal's mugging

2 Stuart_Armstrong 07 May 2013 06:47PM

There are two separate reasons to reject Pascal's mugger's demands. The first is that you have a system of priors, or a method of updating, that precludes you from going along with the deal. The second is that if it becomes known that you accept Pascal's mugging situations, people will seek you out and take advantage of you.

I think it's useful to keep the two reasons very separate. If Pascal's mugger was a force of nature - a new theory of physics, maybe - then the case for keeping to expected utility maximisation may be quite strong. But when there are opponents, everything gets much more complicated - which is why game theory has thousands of published research papers, while expected utility maximisation is taught in passing in other subjects.

But does this really affect the argument? It means that someone approaching you with a Pascal's mugging today is much less likely to be honest (and much more likely to have simply read about it on Less Wrong). But that's a relatively small shift in probability, in an area where the numbers are already so huge/tiny.

Nevertheless, it seems that "reject Pascal's muggings (and other easily exploitable gambles)" may be a reasonable position to take, even if you agreed with the expected utility calculation. First, of course, you would gain that you reject all the human attempts to exploit you. But there's another dynamic: the "Lords of the Matrix" are players too. They propose certain deals to you for certain reasons, and fail to propose them to you for other reasons. We can model three kinds of lords:

  1. The foolish lords, who will offer a Pascal's mugging no matter what they predict your reaction will be.
  2. The sadistic lords, who will offer a deal you won't accept.
  3. The testing lords, who will offer a deal you will accept, but push you to the edge of your logic and value system.

Precommitting to rejecting the mugging burns you only with the foolish lords. The sadistic lords won't offer an acceptable deal anyway, and the testing lords will offer you a better deal if you've made such a precommitment. So the trade-off is a loss with (some of) the foolish lords against a gain with the testing lords. Depending on your probability distribution over the lord types, this can be a reasonable thing to do, even if you would accept the impersonal version of the mugging.

Game Theory of the Immortals

-2 Crystalist 11 March 2013 05:47PM

I’m sure many others have put much more thought into this sort of thing -- at the moment, I’m too lazy to look for it, but if anyone has a link, I’d love to check it out.

Anyway, I ran into some musings on game theory for immortal agents and thought they were interesting enough to talk about.

Cooperation in games like the iterated Prisoner’s Dilemma is partly dependent on the probability of encountering the other player again. Axelrod (1981) gives the payoff for a sequence of 'cooperate's as R/(1-p) where R is the payoff for cooperating, and p is a discount parameter that he takes as the probability of the players meeting again (and recognizing each other, etc.). If you assume that both players continue playing for eternity in a randomly mixing, finite group of other players, then the probability of encountering the other player again approaches 1, and the payoff for an extended period of cooperation approaches infinity.
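
The divergence as p approaches 1 is just the geometric series R + R*p + R*p^2 + ... = R/(1-p). A small numeric illustration (my own code, not from Axelrod; R = 3 is an assumed per-round payoff, the value Axelrod's tournaments used):

```python
R = 3.0  # assumed per-round payoff for mutual cooperation

def cooperation_payoff(p):
    """Expected total payoff of an unbroken run of cooperation with continuation probability p."""
    return R / (1 - p)

# As the probability of meeting again approaches 1, the payoff grows without bound.
for p in (0.5, 0.9, 0.99, 0.999):
    print(p, cooperation_payoff(p))
```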

So, take a group of rational, immortal agents, in a prisoner’s dilemma game. Should we expect them to cooperate?

I realize there is no optimal strategy without reference to the other players’ strategies, and that the universe is not actually infinite in time, so this is not a perfect model on at least two counts, but I wanted to look at the simple case before adding complexities.

The neural bases of behavioral game theory

7 lukeprog 22 September 2011 08:29PM

Bhatt & Camerer (2011). The cognitive neuroscience of strategic thinking. Abstract:

This chapter focuses on some emerging elements of a neuroscientific basis for behavioral game theory. The premise of this chapter is that game theory can be useful in helping to elucidate the neural basis of strategic thinking. The great strength of game theory is that it offers precision in defining what players are likely to do and suggesting algorithms of reasoning and learning. Whether people are using these algorithms can be estimated from behavior and from psychological observables (such as response times and eye tracking of attention), and used as parametric regressors to identify candidate brain circuits that appear to encode those regressors.

This review article may be particularly interesting for those who suspect that game theory may play a major role in human value, perhaps even in ways that would make it more intuitively plausible that reasonable value extrapolation algorithms can be developed.

Individual Deniability, Statistical Honesty

43 Alicorn 09 August 2011 04:17AM

If you have a lot of people to question about something, and they have a motivation to lie, consider this clever use of a six-sided die.

If the farmer tossed the die and got a one, they had to respond "yes" to the surveyor's question. If they got a six, they had to say "no." The rest of the time, they were asked to answer honestly. The die was hidden from the person who was conducting the survey, so they never knew what number the farmer was responding to.

Suddenly, "yes" responses to the leopard question started coming in at a rate of more than just one-sixth.
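
The surveyor can then back out the true "yes" rate from the aggregate. A sketch of the estimator (my own derivation, not from the post): with a fair die, P(observed yes) = 1/6 forced + (4/6) * q honest, where q is the true rate, so q = (P(observed yes) - 1/6) / (4/6).

```python
def true_yes_rate(observed_yes_fraction):
    """Estimate the honest 'yes' rate from the observed fraction under the die scheme."""
    return (observed_yes_fraction - 1/6) / (4/6)

# If 30% of farmers answered "yes", the estimated true rate is 20%.
print(round(true_yes_rate(0.30), 3))  # 0.2
```

No individual answer reveals anything for certain, but the aggregate is statistically honest.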

Schneier talks about The Dishonest Minority [Link]

6 Nic_Smith 10 May 2011 05:27AM

Evolution. Morality. Strategy. Security/Cryptography. This hits so many topics of interest, I can't imagine it not being discussed here. Bruce Schneier blogs about his book-in-progress, The Dishonest Minority:

Humans evolved along this path. The basic mechanism can be modeled simply. It is in our collective group interest for everyone to cooperate. It is in any given individual's short-term self interest not to cooperate: to defect, in game theory terms. But if everyone defects, society falls apart. To ensure widespread cooperation and minimal defection, we collectively implement a variety of societal security systems.

I am somewhat reminded of Robin Hanson's Homo Hypocritus writings from the above, although it is not the same. Schneier says that the book is basically a first draft at this point, and might still change quite a bit. Some of the comments focus on whether "dishonest" is actually the best term to use for defecting from social norms.

Introduction to Game Theory (Links)

9 [deleted] 15 December 2010 02:14AM

Reading the What topics would you like to see more of on LessWrong? thread gave me the impression that many people here would appreciate introductory material on several of the topics that are often discussed on Less Wrong. I have therefore decided to link to the ECON 159 course lectures at Open Yale Courses and on YouTube, and to the Game Theory 101 video series, in the hope that people will find them useful.