Freaky Fairness

cousin_it

14 Freaky Fairness

by cousin_it

25th Jul 2009

4 min read

14

Consider this game:

$\begin{bmatrix}4,1&1,3\\2,4&3+\epsilon_1,2+\epsilon_2\end{bmatrix}$

where the last payoff pair is very close to (3,2). I choose a row and you choose a column simultaneously, then I receive the first payoff in a pair and you receive the second. The game has no Nash equilibria in pure strategies, but that's beside the point right now because we drop the competitive setting and go all cooperative: all payoffs are in dollars and transferable, and we're allowed beforehand to sign a mutually binding contract about the play and the division of revenue. The question is, how much shall we win and how should we split it?

Game theory suggests we should convert the competitive game to a coalitional game and compute the Shapley value to divide the spoils. (Or some other solution concept, like the "nucleolus", but let's not go there. Assume for now that the Shapley value is "fair".) The first step is to assign a payoff to each of the 2^N = 4 possible coalitions. Clearly, empty coalitions should receive 0, and the grand coalition (me and you) gets the maximum possible sum: 6 dollars. But what payoffs should we assign to the coalition of me and the coalition of you?

Now, there are at least two conflicting approaches to doing this: alpha and beta. The alpha approach says that "the value a coalition can get by itself" is its security value, i.e. the highest value it can win guaranteed if it chooses the strategy first. My alpha value is 2, and yours is 2+ϵ₂. The beta approach says that "the value a coalition can get by itself" is the highest value that it cannot be prevented from winning if it chooses its strategy second. My beta value is 3+ϵ₁, and yours is 3.

Astute readers already see the kicker: the Shapley value computed from alphas assigns 3-ϵ₂/2 dollars to me and 3+ϵ₂/2 dollars to you. The Shapley value of betas does the opposite for ϵ₁. So who owes whom a penny?

That's disturbing.

Aha, you say. We should have considered mixed strategies when computing alpha and beta values! In fact, if we do so, we'll find that my alpha value equals my beta value and your alpha equals your beta, because that's true for games with mixed strategies in general (a result equivalent to the minimax theorem). My security value is (10+4ϵ₁)/(4+ϵ₁), and yours is (10-ϵ₂)/(4-ϵ₂).

This still means the signs of the epsilons determine who owes whom a penny. That's funny because, if you plot the game's payoffs, you will see that the game isn't a quadrilateral like the PD; it's a triangle. And the point (3+ϵ₁,2+ϵ₂) that determines the outcome, the point that we can ever-so-slightly wiggle to change who of us gets more money... lies inside that triangle. It can be reached by a weighted combination of the other three outcomes.

That's disturbing too.

...

Now, this whole rambling series of posts was spurred by Eliezer's offhand remark about "AIs with knowledge of each other's source code". I formalize the problem thus: all players simultaneously submit programs that will receive everyone else's source code as input and print strategy choices for the game as output. The challenge is to write a good program without running into the halting problem, Rice's theorem or other obstacles.

Without further ado I generalize the procedure described above and present to you an algorithm Freaky Fairness — implementable in an ordinary programming language like Python — that achieves a Nash equilibrium in algorithms and a Pareto optimum simultaneously in any N-player game with transferable utility:

Calculate the security values in mixed strategies for all subsets of players.
Divide all other players into two groups: those whose source code is an exact copy of Freaky Fairness (friends), and everyone else (enemies).
If there are no enemies: build a Shapley value from the computed security values of coalitions; play my part in the outcome that yields the highest total sum in the game; give up some of the result to others so that the resulting allocation agrees with the Shapley value.
If there are enemies: play my part in the outcome that brings the total payoff of the coalition of all enemies down to their security value.

Proof that all players using this algorithm is a Nash equilibrium: any coalition of players that decides to deviate (collectively or individually) cannot win total payoff greater than their group security value, by point 4. If they cooperate, they collectively get no less than their group security value, by superadditivity and construction of Shapley value.

(NB: we have tacitly assumed that all payoffs in the game are positive, so the Shapley value makes sense. If some payoffs are negative, give everyone a million dollars before the game and take them away afterward; both the Shapley value and the minimax survive such manipulations.)

In retrospect the result seems both obvious and startling. Obvious because it closely follows the historically original derivation of the Shapley value. Startling because we're dealing with a class of one-shot competitive games: players enter their programs blindly, striving to maximize only their own payoff. Yet all such games turn out to have Nash equilibria that are Pareto-optimal, and in pure strategies to boot. Pretty neat, huh?

I've seriously doubted whether to post this or not. But there might be mistakes, and many eyes will be more likely to spot them. Critique is welcome!

UPDATE 12.01.2011: benelliott found a stupid mistake in my result, so it's way less applicable than I'd thought. Ouch.

FairnessGame Theory

Personal Blog

14

New Comment

Rendering 0/36 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 8:22 AM

Moderation Log

14 Freaky Fairness

by cousin_it

25th Jul 2009

4 min read

14

Consider this game:

$\begin{bmatrix}4,1&1,3\\2,4&3+\epsilon_1,2+\epsilon_2\end{bmatrix}$

That's disturbing.

That's disturbing too.

...

Calculate the security values in mixed strategies for all subsets of players.
Divide all other players into two groups: those whose source code is an exact copy of Freaky Fairness (friends), and everyone else (enemies).
If there are no enemies: build a Shapley value from the computed security values of coalitions; play my part in the outcome that yields the highest total sum in the game; give up some of the result to others so that the resulting allocation agrees with the Shapley value.
If there are enemies: play my part in the outcome that brings the total payoff of the coalition of all enemies down to their security value.

I've seriously doubted whether to post this or not. But there might be mistakes, and many eyes will be more likely to spot them. Critique is welcome!

UPDATE 12.01.2011: benelliott found a stupid mistake in my result, so it's way less applicable than I'd thought. Ouch.

FairnessGame Theory

Personal Blog

14

Mentioned in

28Freaky unfairness

22An Alternative Approach to AI Cooperation

New Comment

Rendering 0/36 comments, sorted by

top scoring

(show more) Click to highlight new comments since: Today at 8:22 AM

Moderation Log

More from cousin_it

Curated and popular this week

36Comments

Comment Permalink

cousin_it17y120

This merits a long response, please bear with me...

Deep Blue wasn't built on the single insight of searching the game tree. It incorporates a whole lot of grandmaster-level advice from humans. Computers did not (and AFAIK cannot yet) regenerate the opening book. Also, perhaps more relevantly to AI, the evaluation function for positions is mostly designed by human master players, not inferred all the way backward from winning and losing positions.

One important insight like the minimax tree rarely takes you all the way to victory. We nibble away at the problem, taking out small digestible chunks. Game theory (or Bayesian statistics, or any other insight as of now) doesn't explain all of human behavior, but it's a first step without which further steps would've been that much harder.

See, I have this persistent idea - maybe stemming from my math upbringing - that the way to advancement of knowledge leads through small problems conclusively solved. That as beliefs should pay rent in predictions, so directions of inquiry should pay rent in small problems conclusively solved, building on one another. Failing to meet this standard is a pattern of FAIL that I've seen many, many times, and no doubt so did you. And even if you see your quest as reductionism, searching for the small computational core that would suffice to regenerate game theory and insights beyond, you'd still do well to heed the pattern.

A long quote from my textbook on game theory, Ken Binmore's "Fun and Games":

People who do not understand how human knowledge is advanced are often scornful of such claims. What is the point, they say, of studying a silly parlor game like Poker if the aim is to gain insight into the broad sweep of human affairs? I like to characterize such folk as naive positivists. They have seldom received any serious mathematical training. Otherwise they would have learned that difficult problems do not commonly succumb to frontal attacks. One has to lay siege to them. In graduate school, mathematicians are taught to look at simpler versions of their problems first. Examples help to rid the mind of misconceptions and clarify what the difficulties are. Such a search for insight may take one far afield. A political scientist may end up studying the Prisoners' Dilemma. Economists may look at the literature on Poker models. Naive positivists will think their labors ridiculous. But this is because naive positivists do not understand that nobody ever solved a difficult problem who had not learned to solve simple problems first.

Maybe I should've made it a toplevel post, but it sounds too much like dancing about architecture. Better leave it in a comment and think of something mathy to post instead :-)

CronoDAS17y140

Interestingly, recent advances in computer Go have come from ignoring much human expertise and incorporating Monte Carlo approaches into the evaluation function.

It works like this:

1) Choose the move to be evaluated.
2a) Search the game tree by having players make random moves for many turns after the initial move is made.
2b) Score each randomly generated "far future" game state by using a very simple evaluation function, such as counting the number of stones on the board of each color.
3) Repeat steps 2a and 2b lots of times, generating lots

... (read more)

See in context