The true limit would be 0 if everyone were rational and aware of the rationality of everyone else, but rational people in the real world should be taking into account the fact that most people only seem to be able to cope with two or three recursions before they get bored and pick an option.
So what you should do, based on that, is try to figure out how many iterations "most people" will do, then estimate the smaller percentage of "rank one pragmatic rationalists" who will realize this and aim for 2/3 of "most people", and so on until you have accounted for 100% of the population. The trick is that knowing some people aren't logical means a logical strategy (one that will actually win) requires population modeling rather than pure game theory. Hearing your average of 30 makes me hypothesize that the distribution of people looks something like this:
0.1% 100 Blind number repeaters
4.9% 66 Blind multipliers
17% 44 Beating the blind!
28% 30 "most people"
20% 20 Read a study or use a "recurse three times" heuristic
15% 13 Contrarian rationalists
10% 9 Meta-contrarian rationalists
4.9% 6 Economists (I kid!)
0.1% 0 Obstinate game theorists
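A quick sketch to sanity-check this hypothesized distribution (the weights and guesses are the ones listed above; the model itself is just my speculation):

```python
# Sanity check of the hypothesized population model above.
# Each entry is (fraction of population, number they guess).
population = [
    (0.001, 100),  # Blind number repeaters
    (0.049, 66),   # Blind multipliers
    (0.17, 44),    # Beating the blind!
    (0.28, 30),    # "most people"
    (0.20, 20),    # Read a study / "recurse three times" heuristic
    (0.15, 13),    # Contrarian rationalists
    (0.10, 9),     # Meta-contrarian rationalists
    (0.049, 6),    # Economists
    (0.001, 0),    # Obstinate game theorists
]

average = sum(share * guess for share, guess in population)
target = average * 2 / 3
print(f"implied average: {average:.2f}")  # ~26.36
print(f"2/3 of average:  {target:.2f}")   # ~17.57
```

As listed, the weights actually imply an average closer to 26 than 30, so they would need some tuning to match the observed average exactly.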
Suppose you run the experiment again (say, on a mixture of people from LW who have read this combined with others) where you expect that a lot of people might have actually seen this hypothesis, but a lot of people will naively play normally. I think the trick might be to figure out what the percentage of LWers you're dealing with is and then figure out what to do based on that.
I'm tempted (because it would be amusing) to try to estimate the percentage of LWers relative to naive players, and then model the LWers as people who execute some variant of timeless decision theory in the presence of the baseline people. If I understand timeless decision theory correctly, the trick would be for everyone to independently derive the number all the LWers would have to pick in order for us all to tie at winning given the presence of baselines. It seems kind of far-fetched, but it would be ticklesome if it ever happened :-D
OK, now I'm curious! I think I might try this at the next SoCal meetup.
ETA: ...or does utilitarian consequentialism mean we need to calculate the value we'd all have to vote to maximize the number of winners, whether or not they were expected to participate in the calculation? That sounds even harder.
This is a follow-up to my comment just above. At the meetup, this game was played with six people: me, three other people deeply read in the sequences, and two people who hadn't read the sequences.
The results were: four people guessed 0, a deeply read person guessed 3, and one of the people not familiar with the sequences guessed 23. The target was (26/6)*(2/3) = 2 and 8/9ths, so the guess of 3 won with an error of only 1/9th.
I'll leave the other people to explain what their thinking was if they want, but I directly estimated "the obvious thing"...
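For the record, the meetup arithmetic can be checked mechanically; a minimal sketch:

```python
# Verifying the meetup result described above.
guesses = [0, 0, 0, 0, 3, 23]  # four zeros, one 3, one 23

average = sum(guesses) / len(guesses)  # 26/6
target = average * 2 / 3               # 2 and 8/9
errors = {g: abs(g - target) for g in set(guesses)}
winner = min(errors, key=errors.get)

print(f"target = {target:.4f}")                            # 2.8889
print(f"winner = {winner}, error = {errors[winner]:.4f}")  # 3, 0.1111
```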
Inspired by: Swords and Armor: A Game Theory Thought Experiment
Recently, nick012000 has posted Swords and Armor: A Game Theory Thought Experiment. I was disappointed to see many confused replies to this post, even after a complete solution was given by Steve_Rayhawk. I thought someone really ought to post an explanation about mixed strategy Nash equilibria. Then I figured that that someone may as well be me.
I will assume readers are familiar with the concepts of a game (a setting with several players, each having a choice of strategies to take and a payoff which depends on the strategies taken by all players) and of a Nash equilibrium (an "optimal" assignment of strategies such that, if everyone plays their assigned strategy, no player will have a reason to switch to a different strategy). Some games, like the famous prisoner's dilemma, have a Nash equilibrium in so-called "pure strategies" (as opposed to mixed strategies, to be introduced later). Consider, however, the following variant of the matching pennies game:
Player 1 is a general leading an attacking army, and player 2 is the general of the defending army. The attacker can attack from the east or west, and the defender can concentrate his defenses on the east or west. By the time each side learns the strategy of its enemy, it is too late to switch strategies. Attacking where the defenses aren't concentrated gives a great advantage; additionally, due to unspecified tactical circumstances, attacking from the east gives a slight advantage. The sides have no interest in cooperating, so this is a zero-sum game (what one side wins, the other loses).
This elaborate description can be summarized in the following payoff matrix (these payoffs are for the attacker; the defender's payoffs are their negatives):

                 Defend east    Defend west
  Attack east        -1              2
  Attack west         1             -2
What strategy should each side play? The attacker can think, "Overall, going east is advantageous. So I'll go east." The defender, anticipating this, might say, "Surely they will go east, so that's where I'll wait for them." But after some deliberation, the attacker will realize this, and say "They will expect me from the east! I'll surprise them and go west." You can see where this is going - the defender will think "they'll try to be smart and go west; I'll be smarter and be ready for them", the attacker will think "I was right the first time, east is the way to go" and so on, ad infinitum.
Indeed, looking at the table does not reveal any obvious Nash equilibrium. Every assignment of strategies, represented as a square in the table, will leave one side preferring the alternative. So, are the sides deadlocked in an infinite recursion of trying to outsmart each other? No. They have the option of choosing a strategy randomly.
Suppose the attacker decides to toss a coin, and go east if it lands heads, and west for tails. Suppose also that the defender, with his mastery of psychology, manages to accurately predict the attacker's bizarre behavior. What can he do with this knowledge? He cannot predict the outcome of a coin toss, so he makes do with calculating the expected outcome for each of his (the defender's) strategies. And it can be easily seen that no matter what the defender does, the expectation is 0.
Similarly, suppose the defender consults his preferred (P)RNG so that he defends east with probability 2/3, west with probability 1/3, and suppose that the attacker anticipates this. Again, either of the attacker's strategies will give an expected gain of 0.
This randomized behavior, which is a combination of strategies from which one is randomly chosen with specified probabilities, is called a "mixed strategy". They are typically denoted as a vector listing the probability for choosing each pure strategy, so the defender's suggested strategy is (2/3, 1/3).
What we have seen here is that by a clever choice of mixed strategy, each side can make sure it cannot be outsmarted! By constraining ourselves to rely on randomness for deciding on an action, we deny our opponent the ability to predict that action and counter it. If you yourself don't know what you'll do, there's no way your opponent can know it. We conclude that sometimes, acting randomly is the rational action to take.
As we've seen, if either player uses his suggested mixed strategy, it doesn't matter what the other player does. The corollary is that if the attacker plays (1/2, 1/2) and the defender plays (2/3, 1/3), then neither player will have a reason to switch to a different strategy - this is a Nash equilibrium!
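The indifference argument can be verified numerically. This is only a sketch: the payoff values below are my assumption, chosen to be consistent with the equilibrium probabilities and the "slight east advantage" described in the text.

```python
# Checking the mixed-strategy equilibrium discussed above.
# Assumed payoff matrix: rows = attacker (east, west),
# columns = defender (east, west), entries = attacker's payoff.
payoff = [[-1, 2],
          [1, -2]]

attacker_mix = [1 / 2, 1 / 2]  # coin toss: east, west
defender_mix = [2 / 3, 1 / 3]  # defend east 2/3, west 1/3

# Attacker's expected payoff for each pure strategy vs. the defender's mix:
attacker_expectations = [
    sum(q * payoff[i][j] for j, q in enumerate(defender_mix))
    for i in range(2)
]

# Attacker's expected payoff for each defender pure strategy vs. the attacker's mix
# (the defender's own payoff is the negative of this):
defender_expectations = [
    sum(p * payoff[i][j] for i, p in enumerate(attacker_mix))
    for j in range(2)
]

print(attacker_expectations)  # both 0: the attacker is indifferent
print(defender_expectations)  # both 0: the defender is indifferent
```

Since every pure strategy yields the same expectation against the opponent's mix, neither side gains by deviating, which is exactly the equilibrium condition.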
Some more points of interest: "Just" randomizing won't do - you have to pick the right probabilities. The exact probabilities of the mixed strategy Nash equilibria, and the resulting payoff, depend on the specifics of the payoff matrix. In my example, the defender needs a high probability of defending east to prevent the attacker from exercising his advantage, but the symmetry is such that the attacker chooses with even odds. In games with more strategies per player, an equilibrium mixed strategy may be supported on (have positive probability for) all, several, or one of the pure strategies. If all players apply a Nash equilibrium, then any player can switch to any one of the pure strategies supporting his mixed strategy without changing his expected payoff; switching to any other strategy either decreases the payoff or leaves it unchanged.
Perhaps most interesting of all is Nash's theorem, saying that every finite game has a mixed strategy Nash equilibrium! Our original problem, that some games have no equilibrium, is solved completely once we move to mixed strategies.
One thing should be kept in mind. A Nash equilibrium strategy, much like a minimax strategy, is "safe". It makes sure your expected payoff won't be too low no matter how clever your opponent is. But what if you don't want to be safe - what if you want to win? If you have good reason to believe you are smarter than your opponent, that he will play a non-equilibrium strategy you'll be able to predict, then go ahead and counter that strategy. Nash equilibria are for smart people facing smarter people.
In fact, it is possible that some of the comments I called confused earlier were fully aware of these issues and coming from this "post-Nash" view. To them I apologize.
Examples where mixed strategies are crucial are plentiful. I'll give one more - the volunteer's dilemma. A group of people are about to suffer greatly from some misfortune, unless at least one of them volunteers to take a slightly inconvenient action. They cannot communicate to coordinate the volunteering. If everyone uses the same deterministic strategy, then either all or none will volunteer, neither of which is stable. But they have a mixed strategy equilibrium which, by carefully balancing the risks of needlessly volunteering and having nobody volunteer, leaves everyone with an expected penalty equal to just the inconvenience of volunteering. Not as good as having just one person volunteer, but at least it's stable.
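The balancing act can be made concrete. In this sketch the cost c, loss L, and group size n are assumed example numbers: each player volunteers with the probability p that makes volunteering and not volunteering equally costly.

```python
# Mixed-strategy equilibrium of the volunteer's dilemma sketched above.
# c, L and n are assumed example values, not taken from the text.
c = 1.0   # inconvenience of volunteering
L = 10.0  # loss to each person if nobody volunteers
n = 5     # number of players

# Indifference condition: volunteering costs c for sure, while not
# volunteering costs L only if none of the other n-1 players volunteer:
#   c = L * (1 - p)**(n - 1)
p = 1 - (c / L) ** (1 / (n - 1))

expected_cost_not_volunteering = L * (1 - p) ** (n - 1)
print(f"equilibrium volunteering probability: {p:.4f}")
print(f"expected cost either way: {expected_cost_not_volunteering:.4f}")  # equals c
```

By construction everyone's expected penalty equals c, the inconvenience of volunteering, matching the claim above.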
For further reading, Wikipedia has many articles about game theory and Nash equilibria. Also, Perplexed made this related comment recently.