In my experience, constant-sum games are considered to provide "maximally unaligned" incentives, and common-payoff games are considered to provide "maximally aligned" incentives. How do we quantitatively interpolate between these two extremes? That is, given an arbitrary payoff table representing a two-player normal-form game (like Prisoner's Dilemma), what extra information do we need in order to produce a real number quantifying agent alignment?
If this question is ill-posed, why is it ill-posed? And if it's not, we should probably understand how to quantify such a basic aspect of multi-agent interactions, if we want to reason about complicated multi-agent situations whose outcomes determine the value of humanity's future. (I started considering this question with Jacob Stavrianos over the last few months, while supervising his SERI project.)
Thoughts:
- Assume the alignment function has range [0, 1] or [-1, 1].
- Constant-sum games should have minimal alignment value, and common-payoff games should have maximal alignment value.
- The function probably has to consider a strategy profile (since different parts of a normal-form game can have different incentives; see e.g. equilibrium selection).
- The function should probably measure player A's alignment with player B; for example, in a prisoner's dilemma, player A might always cooperate and player B might always defect. Then it seems reasonable to consider A aligned with B (in some sense), while B is not aligned with A (they pursue their own payoff without regard for A's payoff).
- So the function need not be symmetric over players.
- The function should be invariant to applying a separate positive affine transformation to each player's payoffs; it shouldn't matter whether you add 3 to player 1's payoffs, or multiply the payoffs by a half.
- The function may or may not rely only on the players' orderings over outcome lotteries, ignoring the cardinal payoff values. I haven't thought much about this point, but it seems important. EDIT: I no longer think this point is important, but rather confused.
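To make the extreme cases and the invariance requirement concrete, here is a minimal Python sketch. The helper names (`cells`, `is_constant_sum`, `is_common_payoff`, `positive_affine`) and the example games are my own illustrative choices, not part of any proposal above. Payoff tables are assumed to be nested lists where `p1[i][j]` and `p2[i][j]` are the two players' payoffs when player 1 picks row `i` and player 2 picks column `j`.

```python
def cells(p1, p2):
    """Yield (payoff to P1, payoff to P2) for every cell of the payoff table."""
    for row1, row2 in zip(p1, p2):
        yield from zip(row1, row2)


def is_constant_sum(p1, p2, tol=1e-9):
    """True if the two payoffs add up to the same constant in every cell."""
    sums = [a + b for a, b in cells(p1, p2)]
    return max(sums) - min(sums) <= tol


def is_common_payoff(p1, p2, tol=1e-9):
    """True if both players receive the same payoff in every cell."""
    return all(abs(a - b) <= tol for a, b in cells(p1, p2))


def positive_affine(payoff, scale, shift):
    """Apply x -> scale * x + shift (with scale > 0) to one player's payoffs."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [[scale * x + shift for x in row] for row in payoff]


# Matching pennies (constant-sum) and a pure coordination game (common-payoff).
pennies_p1 = [[1, -1], [-1, 1]]
pennies_p2 = [[-1, 1], [1, -1]]
coordination = [[1, 0], [0, 1]]

# Halve player 1's payoffs and add 3, leaving player 2's payoffs alone.
rescaled_p1 = positive_affine(pennies_p1, scale=0.5, shift=3)

print(is_constant_sum(pennies_p1, pennies_p2))      # True
print(is_common_payoff(coordination, coordination)) # True
print(is_constant_sum(rescaled_p1, pennies_p2))     # False
```

If I'm reading the constraints right, the last line points at a wrinkle: rescaling one player's payoffs turns matching pennies into a game that is no longer literally constant-sum, yet the invariance condition says it should receive the same alignment score. So the "extreme" classes presumably have to be understood up to separate positive affine transformations.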
If I were interested in thinking about this more right now, I would:
- Do some thought experiments to pin down the intuitive concept. Consider simple games where my "alignment" concept returns a clear verdict, and use these to derive functional constraints (like symmetry in players, or the range of the function, or the extreme cases).
- See if I can get enough functional constraints to pin down a reasonable family of candidate solutions, or at least pin down the type signature.
So, something like "fraction of preferred states shared"? Describe the preferred states for P1 as the cells in the payoff matrix that are best for P1 for each P2 action (and the preferred states for P2 in a similar manner). The fraction of P1's preferred states that are also preferred for P2 is a measure of the alignment of P1 to P2. The fraction of shared states out of the total number of preferred states is a measure of the total alignment of the game.
For a 2x2 game, each player will have 2 preferred states (corresponding to the 2 possible actions of the opponent). If 1 of them is the same cell, then each player is 50% aligned to the other (1 of 2 shared) and the game in total is 33% aligned (1 of 3 distinct preferred cells). This also generalizes easily to the NxN case and to >2 players.
And if K cells tie for the best payoff for some opponent action, we can give each of them 1/K instead of 1.
(it would be much easier to explain with a picture and/or table, but I'm pretty new here and wasn't able to find how to do them here yet)
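In lieu of a picture, here is a rough Python sketch of the "fraction of preferred states shared" idea for two players. The function names are hypothetical. The handling of ties is only one possible reading of the 1/K rule: I combine the weights with a per-cell min for "shared" and a per-cell max for "total", which is my assumption and reduces to plain counting when there are no ties. As before, `p1[i][j]` and `p2[i][j]` are the payoffs when P1 picks row `i` and P2 picks column `j`.

```python
from fractions import Fraction


def preferred_weights(payoff, responder_picks_row):
    """Weight each cell by how strongly the responding player prefers it.

    For every opponent action, the responder's best cells share a total
    weight of 1 (so 1/K each when K cells tie); all other cells get 0.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    weights = [[Fraction(0)] * n_cols for _ in range(n_rows)]
    if responder_picks_row:
        # P1 responds to each column (one per P2 action) by choosing a row.
        lines = [[(i, j) for i in range(n_rows)] for j in range(n_cols)]
    else:
        # P2 responds to each row (one per P1 action) by choosing a column.
        lines = [[(i, j) for j in range(n_cols)] for i in range(n_rows)]
    for line in lines:
        best = max(payoff[i][j] for i, j in line)
        winners = [(i, j) for i, j in line if payoff[i][j] == best]
        for i, j in winners:
            weights[i][j] = Fraction(1, len(winners))
    return weights


def shared_preference_alignment(p1, p2):
    """Return (P1-to-P2 alignment, P2-to-P1 alignment, whole-game alignment)."""
    w1 = preferred_weights(p1, responder_picks_row=True)
    w2 = preferred_weights(p2, responder_picks_row=False)
    pairs = [(a, b) for r1, r2 in zip(w1, w2) for a, b in zip(r1, r2)]
    shared = sum(min(a, b) for a, b in pairs)  # assumption: min combines tie weights
    total = sum(max(a, b) for a, b in pairs)   # assumption: max counts distinct cells
    return (shared / sum(a for a, _ in pairs),
            shared / sum(b for _, b in pairs),
            shared / total)


# Prisoner's Dilemma with rows/columns ordered (Cooperate, Defect).
pd_p1 = [[3, 0], [5, 1]]
pd_p2 = [[3, 5], [0, 1]]
print(shared_preference_alignment(pd_p1, pd_p2))
# (Fraction(1, 2), Fraction(1, 2), Fraction(1, 3))
```

With these payoffs, the only cell preferred by both players is mutual defection, which reproduces the 50% / 33% figures from the 2x2 example. A pure coordination game comes out at 1 and matching pennies at 0 under the same sketch.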
Thanks for the careful analysis. I must confess that my metric does not consider stochastic strategies, and in general works better if the players' actions are taken sequentially, not simultaneously (which is quite different from the classic description).
The reasoning is that, for maximal alignment, for each action of P1 there exists exactly one action of P2 (and vice versa) that is a Nash equilibrium. In this case the game stops in a stable state after a single pair of actions. And a maximally unaligned game will have no Nash equilibrium at all, meaning the players' actions...