This post seeks to clarify the question of what utility functions should be defined over, in the situation where an agent has uncertainty over its own utility function.
Outcomes or worlds
The most intuitively natural objects to define utility functions over are outcomes or worlds. The problem with this approach is that it results in an excessive number of possible utility functions.
For instance, imagine the agent has two actions a1 and a2. The action a1 leads straight to world w1, while a2 leads to a complicated probability distribution over a large set of worlds W2. The agent then takes no further actions.
If we see the set of worlds as W={w1}∪W2, then we have a vast number of possible utility functions, whereas in practice only two parts of the utility function matter: its value on w1, and its expected value on W2.
An agent that knows those values (technically, that knows the ordering of those values) for a single utility function can reach a decision. Therefore it would seem perverse for things to change when the agent starts to have uncertainty over its utility.
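As a minimal sketch of this point (the utilities and the distribution over W2 below are purely illustrative numbers), the choice between a1 and a2 reduces to comparing two scalars, however large W2 is:

```python
# Hypothetical numbers: a1 leads to w1 for sure, a2 to a spread of
# worlds W2.  However many worlds W2 contains, the decision depends
# on just two quantities: u(w1) and the expectation of u over W2.
u_w1 = 0.4                                              # u(w1)
p_W2 = {"w_hot": 0.2, "w_cold": 0.3, "w_mild": 0.5}     # P(w | a2)
u_W2 = {"w_hot": 0.1, "w_cold": 0.9, "w_mild": 0.5}     # u on W2

expected_W2 = sum(p_W2[w] * u_W2[w] for w in p_W2)      # 0.54
print("a2" if expected_W2 > u_w1 else "a1")             # a2
```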
What would make sense is to have utilities defined over expected outcomes -- ie over {w1} and W2 -- but that definition doesn't seem to work in more general situations.
Histories
Model the agent as interacting n times with the world. It receives an observation, then follows that with an action, and so on until it takes its last action an. Call a sequence of n alternating observations and actions a history, and let H be the set of histories. We can also define partial histories: sequences h=o1a1o2a2…om of length m<n.
Note this inverts the usual order, by starting with observations and following with actions. This is because the history ends with an action: since the agent can't affect anything more after that, subsequent observations are meaningless. Partial histories, though, end with observations, not actions (this is for subsequent ease of use).
Any utility function u can be seen as a function from H to R, mapping the history hn∈H to u(hn), the expected utility of the history.
So a more natural choice would be to define utilities over H, the set of possible histories. Note that actions are included in histories, and histories with the same observations but different actions can have different utility values. This is because, eg, the observation of someone saying 'You succeed in your task!' will have different implications depending on the action just prior.
So define U as the set of utility functions (not equivalence classes of utility functions) over H.
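As an illustrative sketch (the two-observation, two-action environment is hypothetical), histories and utility functions over them can be enumerated directly:

```python
from itertools import product

# Toy environment: n = 2 steps, two possible observations and two
# possible actions at each step.  A history is the alternating
# sequence o1 a1 o2 a2: observation first, action last.
observations = ["o", "o'"]
actions = ["a", "b"]

H = list(product(observations, actions, observations, actions))
print(len(H))  # 16 histories

# A utility function is then just a map from H to the reals, e.g. one
# that rewards every occurrence of action a:
u = {h: float(h.count("a")) for h in H}
```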
This is better than defining U over outcomes, but still somewhat unsatisfactory, as there can be a large number of incredibly unlikely histories, and these should not affect the normalisation procedure much.
So what is really wanted is a distribution over H. But that makes no sense on its own, as we don't know what the actions will be. So we need to look at how the agent chooses its actions: at its set of strategies.
Strategies
The set of deterministic strategies S is the set of ways an agent can pick its actions, given its past history. It would be tempting to define utilities as taking values over S, but there are subtleties to deal with first.
Isomorphic strategies
Suppose the agent can choose action a1 or b1 at the first step, then a2 or b2 at the second step. The observations are o1 and o2, independent of the actions chosen.
Then consider the following strategies:
s(o1)=a1, s(o1a1o2)=a2, s(o1a2o2)=a2.
s′(o1)=a1, s′(o1a1o2)=a2, s′(o1a2o2)=b2.
These strategies seem to differ: they give different behaviours on the partial history h=o1a2o2. However, because both will choose a1 after o1, the history h will never come up. Thus we will define S as consisting of equivalence classes of strategies that always choose the same action in any situation that actually arises.
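Here is a sketch of this identification for the toy setup above: eight raw strategy functions collapse to four equivalence classes, since behaviour on the unreachable branch is irrelevant (s and s′ from the text land in the same class):

```python
from itertools import product

# The agent sees o1, picks a1 or b1, then sees o2 (whatever it did)
# and picks a2 or b2.  A raw strategy assigns an action to every
# partial history ending in an observation:
# s = (choice at o1, choice at o1 a1 o2, choice at o1 b1 o2).
raw_strategies = list(product(["a1", "b1"], ["a2", "b2"], ["a2", "b2"]))
print(len(raw_strategies))  # 8 raw strategies

def reachable_behaviour(s):
    # Only the branch opened by the first action can ever come up, so
    # a strategy is characterised by its behaviour on that branch.
    first, after_a1, after_b1 = s
    return (first, after_a1) if first == "a1" else (first, after_b1)

S = {reachable_behaviour(s) for s in raw_strategies}
print(len(S))  # 4 equivalence classes: this is the set S
```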
Stochastic strategies
We should briefly note that we can form stochastic strategies by setting a probability distribution over S (this defines a simplex with the 'pure' elements of S as the vertices). This set includes all possible stochastic strategies.
Utilities and strategies
This section presents a way of connecting strategies and utilities, starting with embedding utilities over histories into utilities over strategies:
Theorem 1: There is a linear map k from U to RS. This k need be neither injective nor surjective.
Proof: The set RS is naturally identified with the set of functions S→R.
Each element s∈S defines a probability distribution on H, and hence an expectation for any utility function u. Write this expectation as u(s). Then the values of u(s) for different s define the element k(u)∈RS.
The linearity of k comes from the fact that utility functions are additive: (u+v)(s)=u(s)+v(s).
To show that k need not be injective, consider the following situation: the initial observation is o or o′ with equal probability, leading to a choice of actions {a,b} or {a′}, respectively.
Then there are two strategies -- the ones that pick a and b -- but three histories -- those ending in a, b, and a′ -- so k maps a three-dimensional space into a two-dimensional one, and cannot be injective.
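As an illustrative sketch, we can write k as a matrix whose rows are strategies and whose columns are histories, each entry giving the probability of that history under that strategy; non-injectivity is then just a rank computation:

```python
import numpy as np

# Matrix of k for this example.  Rows: the strategies "pick a" and
# "pick b".  Columns: the histories (o a), (o b), (o' a').
K = np.array([
    [0.5, 0.0, 0.5],   # pick a
    [0.0, 0.5, 0.5],   # pick b
])

u = np.array([1.0, 0.0, 0.0])   # a utility function on the 3 histories
print(K @ u)                    # k(u): expected utility per strategy

# k maps the 3-dimensional U into a 2-dimensional space, so it has a
# non-trivial kernel:
print(np.linalg.matrix_rank(K))  # 2 < 3
```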
There are utilities in U that are independent of any strategy; the kernel of k is the subspace of these that happen to always map to utility 0.
To show that k need not be surjective, add an action c after o and an action b′ after o′ above.
Then the set of strategies has six elements (each strategy chooses one element from among 3 and another from among 2), while the set of histories has five elements, corresponding to the five possible final actions. Hence k cannot be surjective from the five-dimensional U to the six-dimensional RS. □
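The same matrix view confirms the failure of surjectivity (in fact, in this example the image k(U) turns out to be only four-dimensional, even smaller than the five-dimensional bound used above):

```python
import numpy as np

# Actions {a, b, c} after o and {a', b'} after o'.  Columns: histories
# (o a), (o b), (o c), (o' a'), (o' b').  Rows: the 3 x 2 = 6
# strategies, indexed by (i = choice after o, j = choice after o').
K = np.array([
    [0.5 if col == i else 0.0 for col in range(3)]
    + [0.5 if col == j else 0.0 for col in range(2)]
    for i in range(3) for j in range(2)
])

print(K.shape)                   # (6, 5): R^S outstrips U
print(np.linalg.matrix_rank(K))  # 4: the image k(U) is even smaller
```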
Given this k, we can consider the true space of utility functions that we're interested in: the image of k, divided out by equivalence relations (translations and positive scalings). Thus the space of relevant utility classes is:
U=k(U)/∼.
Note that we can translate any element of RS so that its coordinates sum to zero, and then divide by positive scalings. This means that U=S∪{0}, where S is a sphere and {0} corresponds to the utilities that are constant on all strategies. The topology of U is the natural topology on the sphere, combined with the fact that the only open set containing {0} is U itself.
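A sketch of this normalisation (translate to zero sum, then scale to the unit sphere; `normalise` is a name introduced here for illustration):

```python
import numpy as np

def normalise(v, tol=1e-12):
    # Translate so the coordinates sum to zero, then scale to unit
    # length; utilities constant on all strategies collapse to {0}.
    v = np.asarray(v, dtype=float)
    v = v - v.mean()
    n = np.linalg.norm(v)
    return np.zeros_like(v) if n < tol else v / n

print(normalise([3.0, 1.0]))   # a point on the sphere
print(normalise([2.0, 2.0]))   # [0. 0.]: the distinguished point {0}
```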
Stable permutations
We want to be able to interchange strategies and still get the same result: to have some sort of symmetry on the normalisation process. However, since k(U) can be a strict subset of RS, we can't just take arbitrary permutations of S. This section will instead define the subset of stable permutations. Note that any permutation of S extends to an invertible linear transformation of RS by permutation of the basis elements.
Definition: Stable permutations. A stable permutation is a permutation of S that extends to a linear map on RS that preserves k(U)⊂RS.
Since U=k(U)/∼, we can call the set of stable permutations SU.
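Given the matrix picture of k from the examples above, stability is checkable by linear algebra: a permutation is stable iff permuting the coordinates of RS preserves the column space of the matrix of k. A sketch (`is_stable` is a name introduced here; the example matrix is the six-strategy one from Theorem 1):

```python
import numpy as np

def is_stable(perm, K):
    # perm[i] = image of strategy i; K = matrix of k (rows = strategies).
    # The permutation is stable iff permuting the coordinates of R^S
    # preserves the column space of K, i.e. iff appending the permuted
    # columns does not raise the rank.
    P = np.zeros((len(perm), len(perm)))
    for i, j in enumerate(perm):
        P[j, i] = 1.0
    return np.linalg.matrix_rank(np.hstack([K, P @ K])) == \
        np.linalg.matrix_rank(K)

# The six-strategy example from Theorem 1, rows indexed by (i, j):
K = np.array([
    [0.5 if col == i else 0.0 for col in range(3)]
    + [0.5 if col == j else 0.0 for col in range(2)]
    for i in range(3) for j in range(2)
])

print(is_stable([1, 0, 3, 2, 5, 4], K))  # True: swap a' and b' everywhere
print(is_stable([1, 0, 2, 3, 4, 5], K))  # False: swap just two strategies
```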
How common are stable permutations? In general, neither too rare nor too common, as the following two results show:
Proposition 2: If S is non-trivial, then so is SU.
Proof: Consider the set of actions and possible observations as a tree. If S contains more than one element, then there is a branching point where the agent can choose between more than one action. Since the length of the branches is n and hence finite, there exists at least one branching point with no subsequent branching point: the agent has no further decisions to make.
Let h be the partial history that corresponds to that final branching point, and let a and b be possible actions at h. Then the transposition of a and b permutes the strategies in S for which h is a possible history.
But because h is a final branching point, the expected utilities u(ha) and u(hb) are defined for any u∈U, independently of any strategy. Thus transposing a and b defines a map σ on U, with σ(u)(ha)=u(hb), σ(u)(hb)=u(ha), and σ(u)(h′)=u(h′) for any history h′ that doesn't start with ha or hb.
Hence the permutation of S corresponds to a linear automorphism of U, which preserves the image of U under k; that permutation is therefore in SU.□
In general, any tree isomorphism will correspond to an element of SU, and one can delete -- or add -- nodes with a single action before doing such an isomorphism. However, in the general case, the space of stable permutations is not that large:
Proposition 3: The set SU can be a strict subset of the set of permutations of S. Moreover, SU need not act transitively, and may even have fixed points: there may exist an s∈S which is mapped to itself by every ρ∈SU.
Proof: consider the situation where, after the initial observation o1, the agent can choose a or b; there are no choices after o1b, while after o1a the agent observes o2 or o′2 and then chooses between a and b, or between a′ and b′, respectively.
Let s be the strategy that chooses b at o1, and let u be the utility function that maps o1b to 1 and everything else to 0. Now assume that some ρ∈SU exchanges s with some s′≠s. This means that ρ(k(u)) is a function in RS that takes value 1 on a single policy s′ that chooses a after o1 (as all policies but s do that), and 0 everywhere else.
Without loss of generality, let s′ choose a and a′ after o2 and o′2, respectively. Then changing a to b must reduce the expectation of s′ by 1, and similarly changing a′ to b′. Thus changing a to b and changing a′ to b′ simultaneously must, by linearity, set the expected utility of that policy to 1−1−1=−1. But ρ(k(u)) is a permutation of the values of k(u), so it only takes the values 0 and 1. Consequently, ρ(k(u)) cannot be an element of k(U), and hence ρ∉SU.
Thus every element of SU must map s to itself.□
Extra results and concepts
This section presents some miscellaneous extra results and concepts within this formalism.
Utilities and strategies
Each s∈S generates Hs⊂H, the set of histories which have non-zero probability given s.
Lemma 4: If s≠s′, then Hs≠Hs′.
Proof: since s≠s′, there must be some partial history h, of non-zero probability given s or s′, at which s chooses action a while s′ chooses a different action a′ (in the extreme case, h may be ∅). Thus s has at least one possible history starting with ha, while s′ has none (and vice versa for ha′). □
Proposition 5: Each s∈S is the unique maximising strategy for some [u]∈U.
Proof: Define us∈U as mapping Hs to 1 and the rest of H to 0. Then the expectation us(s)=1, and us(s′)<1 for s′≠s, since s′ has a non-zero probability of reaching a history outside Hs, hence a history where us has utility 0. The argument extends to mixed strategies, and the relevant utility class is [k(us)].□
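In the two-strategy example from Theorem 1, this construction looks as follows (a sketch with illustrative numbers):

```python
import numpy as np

# Two-strategy example: histories (o a), (o b), (o' a'); rows are the
# strategies "pick a" and "pick b".
K = np.array([
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
])

# u_s for s = "pick a": 1 on the histories in H_s, 0 elsewhere.
u_s = np.array([1.0, 0.0, 1.0])
print(K @ u_s)  # [1.  0.5]: s is the unique maximiser of u_s
```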
Partial histories
For any partial history h (or ha, if we want it to end with an action), define Hh (Hha) as the set of histories that start with h (or ha).
Let Sh be the set of strategies on Hh, and S(h)⊂S the set of strategies that allow the history h. Then there is a natural projection qh:S(h)→Sh by mapping each strategy to its equivalence class of strategies on Hh. The same holds for ha.
Let S−ha be the set of partial strategies: strategies defined on H−Hha that allow ha to happen with non-zero probability. Then S(ha)=S−ha×Sha.
Let Uha be the set of utility functions on Hha, and kha the map Uha→RSha from Theorem 1. We can define Uha as kha(Uha)/∼, similarly to above.
Let Oha be the set of possible observations the agent can make just after ha. Then there is a decomposition Sha=×o∈OhaShao.
Proposition 6: There is a natural map rha from U to Uha.
Proof: for u∈U, s∈Sha and s′∈S−ha, define rha(u)(s) as u(s×s′).
Now replace s′ with s′′, also in S−ha, and compare, using standard conditional probability and conditional expected utility:
u(s×s′)=P(ha|s′)u(s×s′|ha)+P(¬ha|s′)u(s×s′|¬ha).
Note that the probability of ha is independent of s (since s∈Sha), and, since s′ and s′′ are deterministic, P(ha|s′)=P(ha|s′′). Similarly, u(s×s′|ha) and u(s×s′|¬ha) are independent of s′ and s, respectively -- and the same goes for the s′′ expression. Putting this all together gives:
u(s×s′)−u(s×s′′)=P(¬ha|s′)u(s′|¬ha)−P(¬ha|s′′)u(s′′|¬ha)=Cs′,s′′,
where Cs′,s′′ is a constant depending only on s′ and s′′.
Consequently, the map rha defined above changes by a constant when s′ changes. Similarly, if we change u by changing its definition on H−Hha only, then rha also changes by a constant, as u(s′|¬ha) changes.
Thus, up to constants, rha is a well defined map from U to kha(Uha). Since it only uses the expectation of u on S, this makes it a well defined map from k(U) to kha(Uha).
Then if we divide by translations (and scalings), the constant factors drop out, so rha is a well defined map from U to Uha.□