Two Notions of Best Response

Diffractor

In game theory, there are two different notions of "best response" at play. Causal best-response corresponds to standard game-theoretic reasoning, because it assumes that the joint probability distribution over everyone else's moves remains unchanged if one player changes their move. The second, evidential best-response, can model cases where the actions of the various players are not subjectively independent, such as Death in Damascus, Twin Prisoner's Dilemma, Troll Bridge, Newcomb, and Smoking Lesion, and will be useful for analyzing the behavior of logical inductors in repeated games. This is a quick rundown of the basic properties of these two notions of best response.

# Causal Best Response

Let $S_{1}$ be the set of actions available to player 1, with $|S_{1}| = m$. Given a matrix $M$, $M_{j}$ refers to the $j$-th row vector of $M$. Given a vector $\vec{v}$ with nonnegative entries, $|\vec{v}|$ denotes the sum of its entries (its $L^1$ norm).

Consider a joint probability distribution over the moves of every player other than player 1, and index the set of all their joint outcomes by $i$. There are $n$ possible joint outcomes for everyone else.

This produces a vector $\vec{p}^{*}$ in an $n$-dimensional vector space. The $*$ superscript denotes that this is not a full joint probability distribution, as it omits the probabilities of player 1's actions.

Now, consider an $m \times n$ matrix $U$, where $U_{j,i} \in [0, 1]$ is player 1's utility for playing action $a_j$ when the joint outcome of everyone else is $i$. This is the utility matrix.

An action $a_{j^{'}}$ is a causal best response by player 1 iff

$\forall a_{j} \in S_{1} : (U \times \vec{p}^{*})_{j'} \geq (U \times \vec{p}^{*})_{j}$
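As a concrete sketch, the definition can be checked numerically. The function names below are hypothetical, and the convention assumed here is that each row of `U` is one of player 1's actions and each column is one joint outcome of the other players, matching the matrices in the wizard example later on. A small tolerance is used so that exact ties, which floating-point arithmetic can blur, still count as best responses.

```python
from typing import List

Matrix = List[List[float]]


def causal_expected_utilities(U: Matrix, p_star: List[float]) -> List[float]:
    """Return (U x p*)_j for every action j of player 1.

    Assumes rows of U are player 1's actions and columns are the
    joint outcomes of everyone else, so len(U[j]) == len(p_star).
    """
    return [sum(u * p for u, p in zip(row, p_star)) for row in U]


def causal_best_responses(U: Matrix, p_star: List[float], tol: float = 1e-12) -> List[int]:
    """Indices j' with (U x p*)_{j'} >= (U x p*)_j for every j (ties allowed)."""
    eu = causal_expected_utilities(U, p_star)
    best = max(eu)
    return [j for j, value in enumerate(eu) if value >= best - tol]


# Hypothetical usage with the wizard example's utility matrix:
# row 0 = eat the toad, row 1 = don't; column 0 = cursed, column 1 = not cursed.
U = [[0.1, 0.9], [0.0, 1.0]]
print(causal_best_responses(U, [0.8, 0.2]))  # [0]: only eating the toad
print(causal_best_responses(U, [0.5, 0.5]))  # [0, 1]: indifferent
```

Since the condition must hold against every $a_j$, it is the same as $(U \times \vec{p}^{*})_{j'}$ being maximal, which is why the sketch compares against `max(eu)`.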

Let $B_{a_{j^{'}}}^{*}$ be the set of probability vectors for which $a_{j^{'}}$ is a best response. This set is convex. Proof:

Select arbitrary $\vec{p}^{*}, \vec{q}^{*} \in B_{a_{j'}}^{*}$, an arbitrary $x \in [0, 1]$, and an arbitrary $a_{j} \in S_{1}$. By definition of $a_{j'}$ being a best response for the two vectors $\vec{p}^{*}$ and $\vec{q}^{*}$,

$(U \times \vec{p}^{*})_{j'} \geq (U \times \vec{p}^{*})_{j}$

$(U \times \vec{q}^{*})_{j'} \geq (U \times \vec{q}^{*})_{j}$

Multiplying these inequalities by the nonnegative numbers $x$ and $(1 - x)$ respectively, and adding them together, gives

$(x (U \times \vec{p}^{*}) + (1 - x) (U \times \vec{q}^{*}))_{j'} \geq (x (U \times \vec{p}^{*}) + (1 - x) (U \times \vec{q}^{*}))_{j}$

Then, by the distributive laws of matrix multiplication, we can move the scalars in and pull out $U$, to yield

$(U \times (x \vec{p}^{*} + (1 - x) \vec{q}^{*}))_{j'} \geq (U \times (x \vec{p}^{*} + (1 - x) \vec{q}^{*}))_{j}$

Because $a_{j}$ was arbitrary, $a_{j'}$ is a best response for $x \vec{p}^{*} + (1 - x) \vec{q}^{*}$, so this mixture lies in $B_{a_{j'}}^{*}$ and the set is convex.
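The convexity argument can also be spot-checked numerically. This is only a sanity check on random samples, not a substitute for the proof; it assumes the same row/column convention as above, and the helper names and the sampling scheme (normalized uniform weights, which are not uniformly distributed on the simplex) are hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def is_causal_best_response(U: Matrix, p: List[float], j_prime: int, tol: float = 1e-9) -> bool:
    """Check (U x p)_{j'} >= (U x p)_j for every action j, up to a float tolerance."""
    eu = [sum(u * q for u, q in zip(row, p)) for row in U]
    return all(eu[j_prime] >= value - tol for value in eu)


def random_distribution(n: int, rng: random.Random) -> List[float]:
    """A random point of the probability simplex (normalized uniform weights)."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def spot_check_convexity(U: Matrix, j_prime: int, trials: int = 10_000, seed: int = 0) -> Tuple[bool, int]:
    """Sample p*, q* in B*_{a_j'} and test random mixtures x p* + (1 - x) q*.

    Returns (no counterexample found, number of mixtures tested).
    """
    rng = random.Random(seed)
    n = len(U[0])
    tested = 0
    for _ in range(trials):
        p = random_distribution(n, rng)
        q = random_distribution(n, rng)
        if not (is_causal_best_response(U, p, j_prime) and is_causal_best_response(U, q, j_prime)):
            continue  # only pairs that both lie in B*_{a_j'} are relevant
        x = rng.random()
        mixture = [x * a + (1 - x) * b for a, b in zip(p, q)]
        if not is_causal_best_response(U, mixture, j_prime):
            return False, tested
        tested += 1
    return True, tested


ok, tested = spot_check_convexity([[0.1, 0.9], [0.0, 1.0]], j_prime=0)
print(ok, tested)
```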

# Evidential Best Response

Consider a joint probability distribution over the moves of every player, and index the entries by $i$ (for the joint moves of everyone other than player 1) and $j$ (for player 1).

This produces a vector $\vec{p}$ in an $nm$-dimensional vector space, which it is convenient to view as an $m \times n$ matrix $P$ whose row $P_{j}$ holds the probabilities of the joint outcomes in which player 1 plays $a_{j}$.

Now, consider the same utility matrix $U$ as before.

The condition for $a_{j'}$ being a best response is now a bit more complicated, because $P$ may have all-0 rows (actions with probability 0), for which the conditional expected utility is undefined. This is handled with lower and upper bounds on the expected utility of a 0-probability action.

If $P_{j} \neq \vec{0}$, then $EU_{\mathrm{lower}}(a_{j}) = EU_{\mathrm{upper}}(a_{j}) = EU(a_{j}) = \frac{U_{j} \cdot P_{j}}{|P_{j}|}$. If $P_{j} = \vec{0}$, then $EU_{\mathrm{lower}}(a_{j}) = \min(U_{j})$ and $EU_{\mathrm{upper}}(a_{j}) = \max(U_{j})$.

Now, the condition for $a_{j'}$ being a "best response" is as follows: $\forall a_{j} \in S_{1} : EU_{\mathrm{upper}}(a_{j'}) \geq EU_{\mathrm{lower}}(a_{j})$
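A sketch of this evidential test, assuming the same convention (rows of `U` and `P` are player 1's actions) and taking $|P_{j}|$ as the sum of the row's entries. The function names and the float tolerance are hypothetical. The first usage line shows the effect of the bounds: when "don't eat" has probability 0, it also counts as a best response, because its upper bound is the largest entry of its utility row.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def eu_bounds(u_row: List[float], p_row: List[float]) -> Tuple[float, float]:
    """Return (EU_lower, EU_upper) for one action of player 1."""
    mass = sum(p_row)  # |P_j|, the total probability of this action
    if mass > 0:
        eu = sum(u * p for u, p in zip(u_row, p_row)) / mass
        return eu, eu
    # Zero-probability action: conditional EU is undefined, so use the extremes of U_j.
    return min(u_row), max(u_row)


def evidential_best_responses(U: Matrix, P: Matrix, tol: float = 1e-12) -> List[int]:
    """Indices j' with EU_upper(a_j') >= EU_lower(a_j) for every action a_j."""
    bounds = [eu_bounds(u_row, p_row) for u_row, p_row in zip(U, P)]
    return [
        j_prime
        for j_prime, (_, upper) in enumerate(bounds)
        if all(upper >= lower - tol for lower, _ in bounds)
    ]


U = [[0.1, 0.9], [0.0, 1.0]]  # rows: eat toad, don't; columns: cursed, not cursed
print(evidential_best_responses(U, [[0.8, 0.2], [0.0, 0.0]]))  # [0, 1]
print(evidential_best_responses(U, [[0.4, 0.1], [0.4, 0.1]]))  # [0]
```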

This condition of "best response" corresponds to there being some perturbation of $P$ by a nonstandard (infinitesimal) $\epsilon$ such that the expected utility conditional on every row is well-defined, and move $a_{j'}$ doesn't have lower expected utility (now well-defined by the formula for nonzero-probability actions) than any other row. When $a_{j'}$ itself has probability 0, its perturbed expected utility can be thought of as the expected utility of density-zero exploration into $a_{j'}$.
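To see where the bounds come from, the sketch below replaces the nonstandard $\epsilon$ with a small real number (an assumption made for illustration) and fills a zero row with $\epsilon$ times some distribution `q` over the other players' outcomes. The resulting conditional expected utility does not depend on $\epsilon$, only on `q`, and as `q` ranges over all distributions it sweeps out the interval from $\min(U_{j})$ to $\max(U_{j})$. The helper name is hypothetical.

```python
from typing import List


def perturbed_zero_row_eu(u_row: List[float], q: List[float], eps: float) -> float:
    """Conditional EU of a zero-probability action after perturbing its row to eps * q.

    q is a probability vector over the other players' joint outcomes.
    """
    row = [eps * x for x in q]
    return sum(u * p for u, p in zip(u_row, row)) / sum(row)


u_dont_eat = [0.0, 1.0]  # utility row for "don't eat the toad" in the wizard example
for q in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    print(q, perturbed_zero_row_eu(u_dont_eat, q, eps=1e-9))
# Prints 0.0 (= min), 0.5 and 1.0 (= max); the value does not depend on eps.
```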

The set $B_{a_{j'}}$ of joint probability distributions where $a_{j'}$ is a best response is not convex, although it is compact.

Technically, this definition of best response applies to any matrix with nonnegative entries, even if they don't sum to 1. This will be important for dealing with logical inductors, where the prices on the various outcomes aren't exactly coherent (although in the limit, the logical inductor prices approach the subset of the $nm$-dimensional hypercube that corresponds to probability distributions).

A matrix $P$ is subjectively independent of player 1 when all its row vectors lie in a 1-dimensional subspace. This means that, conditional on any action with nonzero probability, the probability distribution over the moves of everyone else remains the same. If $P$ is subjectively independent of player 1, then there's a canonical choice of vector $\vec{p}^{*}$, obtained by taking an arbitrary nonzero row vector and renormalizing it so it sums to 1. In this case, any causal best response with respect to this $\vec{p}^{*}$ is also an evidential best response. Subjective independence is the condition necessary for causal best response to be a correct approximation.
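A sketch of the subjective-independence test and of the canonical $\vec{p}^{*}$, assuming nonnegative entries (so nonzero rows are proportional exactly when their normalizations coincide) and a float tolerance; the function names are hypothetical. The example matrices are $P$ with $\epsilon = 0.01$ and the mixture $0.5 P + 0.5 Q$ from the wizard example below.

```python
from typing import List, Optional

Matrix = List[List[float]]


def normalize(row: List[float]) -> List[float]:
    total = sum(row)
    return [x / total for x in row]


def canonical_p_star(P: Matrix, tol: float = 1e-12) -> Optional[List[float]]:
    """Return the shared normalized row if P is subjectively independent, else None.

    Zero rows are ignored (they lie in every subspace); nonnegative nonzero rows lie in
    one 1-dimensional subspace exactly when their normalizations coincide.
    """
    nonzero = [normalize(row) for row in P if sum(row) > 0]
    if not nonzero:
        return None  # no action has positive probability, so there is nothing to renormalize
    reference = nonzero[0]
    for row in nonzero[1:]:
        if any(abs(a - b) > tol for a, b in zip(row, reference)):
            return None
    return reference


def is_subjectively_independent(P: Matrix) -> bool:
    return canonical_p_star(P) is not None


eps = 0.01
P = [[0.8 * (1 - eps), 0.2 * (1 - eps)], [0.8 * eps, 0.2 * eps]]
mixture = [[0.525 - 0.4 * eps, 0.225 - 0.1 * eps], [0.125 + 0.4 * eps, 0.125 + 0.1 * eps]]
print(canonical_p_star(P))                   # approximately [0.8, 0.2]
print(is_subjectively_independent(mixture))  # False
```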

Interestingly enough, a convex combination of subjectively independent joint probability distributions may not be subjectively independent.

Both the nonconvexity of the set of best responses and the nonconvexity of the set of subjectively independent distributions will be proved by exhibiting two subjectively independent distributions where move 1 is a best response, such that a 50/50 mixture of the two is not subjectively independent and move 1 is not a best response for it. The example was inspired by the Smoking Lesion problem.

The utility matrix $U$ is

$\begin{bmatrix} 0.1 & 0.9 \\ 0 & 1 \end{bmatrix}$

The probability matrix $P$ is

$\begin{bmatrix} 0.8 (1 - \epsilon) & 0.2 (1 - \epsilon) \\ 0.8 \epsilon & 0.2 \epsilon \end{bmatrix}$

The probability matrix $Q$ is

$\begin{bmatrix} 0.25 & 0.25 \\ 0.25 & 0.25 \end{bmatrix}$

To tell a story, let's say that it is a game against a wizard. On even-numbered turns, there is an 80% chance of him inflicting a curse on player 1, and on odd-numbered turns, there is a 50% chance. Separately, the player can eat a toad or not (rows 1 and 2, respectively); the columns correspond to being cursed or not. Eating a toad diminishes the effectiveness of the curse, but the player would prefer not to eat the toad. On even-numbered turns (matrix $P$), the best move is to eat the toad, so not eating the toad occurs with probability $\epsilon$. ($\epsilon$ here should be interpreted as a small real number, to show that this isn't just an issue with conditioning on probability-0 events.) On odd-numbered turns (matrix $Q$), player 1 is indifferent between eating the toad or not, so the player does it with 50% probability. In both situations, eating the toad yields the same or higher expected utility than not doing so.
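For reference, here are the expected utilities behind the story, computed from the rows of $U$ with a curse probability of 0.8 on even turns and 0.5 on odd turns. In $P$, the factors $(1 - \epsilon)$ and $\epsilon$ cancel when each row is normalized, so these values do not depend on $\epsilon$.

Even turns: $EU(\text{eat}) = 0.8 \cdot 0.1 + 0.2 \cdot 0.9 = 0.26$ and $EU(\text{don't eat}) = 0.8 \cdot 0 + 0.2 \cdot 1 = 0.2$

Odd turns: $EU(\text{eat}) = 0.5 \cdot 0.1 + 0.5 \cdot 0.9 = 0.5$ and $EU(\text{don't eat}) = 0.5 \cdot 0 + 0.5 \cdot 1 = 0.5$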

$0.5 P + 0.5 Q$ is

$\begin{bmatrix} 0.525 - 0.4 \epsilon & 0.225 - 0.1 \epsilon \\ 0.125 + 0.4 \epsilon & 0.125 + 0.1 \epsilon \end{bmatrix}$

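This is clearly not subjectively independent of player 1's actions (for small $\epsilon$, the probability of the curse is about $0.525 / 0.75 = 0.7$ conditional on eating the toad, but about $0.125 / 0.25 = 0.5$ conditional on not eating it). Furthermore, the expected utility of eating the toad is ~0.34, which is less than the expected utility of not eating the toad, ~0.5.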
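These numbers can be reproduced with a short script. It is a sketch assuming the example's convention (rows: eat the toad or not; columns: cursed or not) and a hypothetical value $\epsilon = 10^{-6}$; the helper names are hypothetical too.

```python
from typing import List

Matrix = List[List[float]]


def mix(A: Matrix, B: Matrix, x: float = 0.5) -> Matrix:
    """Entrywise convex combination x * A + (1 - x) * B."""
    return [[x * a + (1 - x) * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def conditional_eus(U: Matrix, P: Matrix) -> List[float]:
    """U_j . P_j / |P_j| for every row (all rows in these examples are nonzero)."""
    return [sum(u * p for u, p in zip(ur, pr)) / sum(pr) for ur, pr in zip(U, P)]


eps = 1e-6  # hypothetical small real epsilon
U = [[0.1, 0.9], [0.0, 1.0]]
P = [[0.8 * (1 - eps), 0.2 * (1 - eps)], [0.8 * eps, 0.2 * eps]]
Q = [[0.25, 0.25], [0.25, 0.25]]
M = mix(P, Q)

print([round(v, 4) for v in conditional_eus(U, P)])  # [0.26, 0.2]: eating is better
print([round(v, 4) for v in conditional_eus(U, Q)])  # [0.5, 0.5]: indifferent
# In the mixture, the chance of the curse depends on player 1's action ...
print([round(row[0] / sum(row), 4) for row in M])    # [0.7, 0.5]
# ... and eating the toad now looks worse than not eating it.
print([round(v, 4) for v in conditional_eus(U, M)])  # [0.34, 0.5]
```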

Comment by Johannes Treutlein:

Wolfgang Spohn develops the concept of a "dependency equilibrium" based on a similar notion of evidential best response (Spohn 2007, 2010). A joint probability distribution $P$ is a dependency equilibrium if all actions of all players that have positive probability are evidential best responses. In case there are actions with zero probability, one evaluates a sequence $(P^{(i)})_{i \in \mathbb{N}}$ of joint probability distributions such that $\lim_{i \to \infty} P^{(i)} = P$ and $P^{(i)}(a) \neq 0$ for all actions $a$ and all $i \in \mathbb{N}$. Using your notation of a probability matrix and a utility matrix, the expected utility of an action $a_{j}$ is then defined as the limit of the conditional expected utilities, $\lim_{i \to \infty} \frac{U_{j} \cdot P_{j}^{(i)}}{|P_{j}^{(i)}|}$ (which is defined for all actions). Say $P$ is a probability matrix with only one zero row, $P_{j}$. It seems that you can choose an arbitrary nonzero vector $Q_{j}$ with $|Q_{j}| = 1$ to construct, e.g., the sequence of probability matrices $\left(\frac{i - 1}{i} P + [0, \dots, 0, \frac{1}{i} Q_{j}, 0, \dots, 0]\right)_{i \in \mathbb{N}}$. The expected utilities in the limit for all other actions and the actions of the opponent shouldn't be influenced by this change. So you could choose $Q_{j}$ as the standard basis vector $e_{k}$, where $k$ is an index such that $U_{j,k} = \min U_{j}$. The expected utility of $a_{j}$ would then be $\min U_{j}$. Hence, this definition of best response in case there are actions with zero probability probably coincides with yours (at least for actions with positive probability; Spohn is not concerned with the question of whether a zero-probability action is a best response or not).
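A small sketch of the sequence described in this comment, for the case of a single zero row. The function name is hypothetical; it assumes the post's convention that row $j$ of $P$ is player 1's action $a_{j}$, and it requires $i \geq 2$, since at $i = 1$ the other rows are scaled to zero. It places $Q_{j} = e_{k}$ in row $j$, with $k$ an index of the smallest entry of $U_{j}$, and shows that the zero row's conditional expected utility is then $\min U_{j}$ for every $i$, while the other rows are unaffected.

```python
from typing import List

Matrix = List[List[float]]


def sequence_conditional_eus(U: Matrix, P: Matrix, j: int, i: int) -> List[float]:
    """Conditional EUs of all of player 1's actions under the i-th matrix of the sequence.

    Assumes row j of P is the only all-zero row and i >= 2. The i-th matrix is
    (i - 1)/i * P with Q_j / i added to row j, where Q_j = e_k for an
    index k at which U[j][k] is minimal.
    """
    if i < 2:
        raise ValueError("i must be at least 2 so that every row has positive mass")
    k = min(range(len(U[j])), key=lambda c: U[j][c])
    q_j = [1.0 if c == k else 0.0 for c in range(len(U[j]))]
    P_i = [[(i - 1) / i * p for p in row] for row in P]
    P_i[j] = [p + q / i for p, q in zip(P_i[j], q_j)]
    return [sum(u * p for u, p in zip(ur, pr)) / sum(pr) for ur, pr in zip(U, P_i)]


U = [[0.1, 0.9], [0.0, 1.0]]  # wizard example: rows eat / don't eat
P = [[0.8, 0.2], [0.0, 0.0]]  # "don't eat" has probability 0
for i in (2, 10, 1000):
    print(i, [round(v, 4) for v in sequence_conditional_eus(U, P, j=1, i=i)])
# Each line shows [0.26, 0.0]: the zero row gets min(U_1) = 0.0.
```

In this example, the limit value assigned to the zero-probability action equals $EU_{\mathrm{lower}}$ from the post, which is consistent with the comment's suggestion that the two definitions agree here.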

The whole thing becomes more complicated with several zero rows and columns, but I would think it should be possible to construct sequences of distributions which work in that case as well.