% operators that are separated from the operand by a space
% operators that require brackets
% operators that require parentheses
% Paper specific
The asymptotic optimality theorem for minimax we proved previously was formulated in terms of the "optimal value" v(x). Compared to the Bayesian case, v(x) has the strange property that its definition depends on rewards other than those received after observing x. This is a direct consequence of the dynamic inconsistency of minimax. In an attempt to get around this, we show that the minimax decision rule is dynamically consistent in a certain special case and demonstrate that this special case can occur often for a natural class of models. However, this doesn't immediately solve the problem: doing so requires replacing the minimax decision rule by a stronger condition that ensures subgame perfection in some sense. The formalization of the latter idea is left for future posts.
Appendix A contains the proofs of all results.
##Notation
Given X a topological space, P(X) will denote the space of Borel probability measures on X. We regard it as a topological space using the weak∗ topology. Given x∈X, δx∈P(X) is defined by δx(A):=[[x∈A]]. Given μ,ν∈P(X), dtv(μ,ν) stands for the total variation distance between μ and ν. We denote PC(X) the set of non-empty convex compact subsets of P(X).
Given X a finite set, X∗ denotes the set of finite strings in alphabet X, i.e. X∗:=⨆n∈NXn. Xω denotes the set of infinite strings in alphabet X. Given x∈X∗⊔Xω, |x|∈N⊔{∞} is the length of string. Given 0≤n<|x|, xn∈X is the n-th symbol in x. Given 0≤n≤|x|, x<n is the prefix of x of length n. Given x,y∈X∗⊔Xω, the notations x⊏y, x⊑y, x⊏/y and x⋢y mean "x is a proper prefix of y", "x is a prefix of y" and their negations. λX∈X∗ is the empty string and X+:=X∗∖λX.
##Factorization
We recall the result of the previous post regarding subagents of minimax agents:
In general, π∗2 is not a minimax policy for any natural model i.e. the minimax decision rule is not "dynamically consistent". However, if we assume a certain factorization condition, it is. Specifically, assume E1,E2 are compact Polish, f:E1×E2→E is Borel measurable, ¯u1:S1×E1→R and ¯u2:S2×T×E2→R are continuous s.t.
u1(s1,f(e1,e2))=¯u1(s1,e1)
u2(s2,t,f(e1,e2))=¯u1(s2,t,e2)
Further assume that A1⊆E1 is Borel s.t. f−1(A)=A1×E2 and Φ1,2∈PC(E1,2) are s.t. Φ is the closure of the convex hull of f∗(Φ1×Φ2).
Moreover, define ¯S2:=T→S2 equipped with the product topology, define ¯π∗2∈P(¯S2) by
¯π∗2:=∏t∈Tπ∗2(t)
Define ¯E2:=T×E2 and define ¯Φ2∈PC(¯E2) by
¯Φ2:={α∗π∗1×μ∣μ∈Φ2}
Finally, define ^u2:¯S2ׯE2→R by
^u2(s,t,e):=¯u2(s(t),t,e)
Then, ¯π∗2 is a minimax policy for ¯Φ2 w.r.t. the utility function ^u2.
##Factorization everywhere
We now formulate a condition that ensures that factorizations happens "often" in a certain sense.
#Definition 1
Consider Φ∈PC(Oω) and x∈O∗. Define fx:Oω×Oω→Oω by
fx(e1,e2):={e1 if x⊏/e1xe2 if x⊏e1
Φ is said to factorize at x when there are Φ1,Φ2⊆P(Oω) s.t. Φ is the closure of the convex hull of fx∗(Φ1×Φ2).
Given e∈Oω, Φ is said to factorize over e when there are infinitely many n∈N s.t. Φ factorizes at e<n.
Given a time discount function γ, Φ is said to factorize frequently over e (w.r.t. γ) when there is an increasing sequence {nk∈N}k∈N s.t.
i. Φ factorizes at e<nk for all k.
ii. There is ϵ>0 s.t. for all k
∑∞n=nk+1−1γ(n)∑∞n=nkγ(n)>ϵ
For example, for geometric discount (γ(n)=βn), condition ii means that nk+1−nk is bounded.
Φ is said to factorize everywhere when
∀μ∈Φ:Pre∼μ[Φ factorizes over e]=1
Φ is said to factorize frequently everywhere (w.r.t. γ) when
∀μ∈Φ:Pre∼μ[Φ factorizes frequently over e]=1
Note that any singleton {μ} factorizes frequently everywhere w.r.t. any time discount function. More generally, we can give the following explicit description of models that factorize everywhere:
#Construction
Consider some F:O∗→PC(O+) (O+ is considered to be a discrete space). Assume that for any x∈O, there is Ax⊆O+ prefix-free s.t. any μ∈F(x) is supported on Ax.
We define XF⊆O∗ as the minimal set with the following properties:
i. λO∈XF
ii. If x∈XF, μ∈F(x) and y∈O+ is s.t. μ(y)>0, then xy∈XF.
We define ΦF⊆P(Oω) as follows. μ∈ΦF iff for any x∈XF, there is μx∈F(x) s.t. for any y∈O+, if μx(y)>0 then
Pre∼μ[xy⊏e]=Pre∼μ[x⊏e]μx(y)
#Proposition 2
For any F as above, ΦF∈PC(Oω) and factorizes everywhere.
It is also easy to use the above construction to get Φ that factorizes frequently everywhere w.r.t. given γ (for example, if we require that for every x∈XF, F(x) is supported on strings of uniformly bounded length, ΦF will factorize frequently everywhere w.r.t. geometric discount; this condition on F is sufficient but not necessary).
We now formulate the optimality condition that we would like to hold (but for which minimax is unfortunately insufficent). Consider again Φ, Ψ and π∗ as before. Define u>n:Oω×(O∗→A)→R to be normalized sum of rewards from time n, i.e.
Recalling that minimization over Φ can be replaced by minimization over any set s.t. Φ is the closure of its convex hull, and using Proposition 0, we have
In the setting of the Construction, consider any μ∈ΦF. Then:
∀x∈XF∃μx∈F(x)∀y∈Ax:μ(xyOω)=μ(xOω)μx(y)
#Proof of Proposition A.0
Given x∈XF, we know there is μx∈F(x) s.t. the identity holds whenever μx(y)>0. Summing these identities over y, we get
∑μx(y)>0μ(xyOω)=∑μx(y)>0μ(xOω)μx(y)
∑μx(y)>0μ(xyOω)=μ(xOω)
Since Ax is prefix-free, for any y,y′∈Ax we have xyOω∩xy′Oω=∅. Also, obviously, xyOω⊆xOω. It follows that
∑y∈Axμ(xyOω)=μ(⋃y∈AxxyOω)≤μ(xOω)=∑μx(y)>0μ(xyOω)
Since μx(y)>0 implies y∈Ax, we conclude that for any y∈Ax, if μx(y)=0 then μ(xyOω)=0. This, together with the property of μx, gives the desired result.
#Proposition A.1
In the setting of the Construction, ΦF is convex.
#Proof of Proposition A.1
Consider μ,ν∈ΦF and p∈(0,1) and define ξ=pμ+(1−p)ν. Consider some x∈XF, and let Ax⊆O+ be as in the Construction. If ξ(xOω)=0, we can choose any ξx∈F(x) to satisfy the desired condition. Therefore, we can assume ξ(xOω)>0. By Proposition A.0, there are μx,νx∈F(x) s.t. for all y∈Ax:
μ(xyOω)=μ(xOω)μx(y)
ν(xyOω)=ν(xOω)νx(y)
Taking a convex linear combination of these equations, we get
ξ(xyOω)=pμ(xOω)μx(y)+(1−p)ν(xOω)νx(y)
ξ(xyOω)=ξ(xOω)pμ(xOω)μx(y)+(1−p)ν(xOω)νx(y)ξ(xOω)
Define ξx∈P(O+) by
ξx:=pμ(xOω)ξ(xOω)μx+(1−p)ν(xOω)ξ(xOω)νx
ξx is a convex linear combination of μx and νx, therefore ξx∈F(x). We get
ξ(xyOω)=ξ(xOω)ξx(y)
Therefore, ξ∈ΦF.
#Proposition A.2
In the setting of the Construction, ΦF is closed.
#Proof of Proposition A.2
Consider any sequence {μn∈ΦF}n∈N and s.t. μn→μ for some μ∈P(Oω). Consider some x∈XF, and let Ax⊆O+ be as in the Construction. By Proposition A.0, for any n∈N, there is μnx∈F(x) s.t. for any y∈Ax:
μn(xyOω)=μn(xOω)μnx(y)
Note that P(O+) is not compact but is still Polish (since O+ is such), and therefore F(x) is compact and Polish. Let {nk∈N}k∈N be an increasing sequence s.t. μnkx→μx for some μx∈F(x). For any z∈O∗, χzOω:Oω→{0,1} is a continuous function, and therefore μn(zO∗)→μ(zO∗). For any y∈O+, χ{y}:O+→{0,1} is also continuous, and therefore μnkx(y)→μx(y). Putting these together, we get
μ(xyOω)=μ(xOω)μx(y)
Therefore μ∈ΦF.
#Proposition A.3
In the setting of the Construction, consider any x,z∈XF, μ∈F(z) and y∈O+ s.t. μ(y)>0. Assume that x⊏zy. Then, x⊑z.
#Proof of Proposition A.3
We prove by induction on |x|. If |x|=0, then obviously x⊑z. If |x|>0, then, by definition of XF, there is x′∈XF, y′∈O+ and ν∈F(x′) s.t. ν(y′)>0 and x=x′y′. x′⊏x⊏zy, therefore, by the induction hypothesis, x′⊑z. Assume to the contrary that x⋢z. Then, since x⊏zy, z⊏x=x′y′. By the induction hypothesis, this implies z⊑x′. We got z=x′. It follows that zy′=x⊏zy, and therefore y′⊏y. However, y,y′∈Az and Az is prefix-free, which is a contradiction.
#Proposition A.4
In the setting of the Construction, ΦF is non-empty.
#Proof of Proposition A.4
Choose μx∈F(x) for each x∈XF. Given any z∈O+, define p(z)∈O∗ as the longest proper prefix of z s.t. p(z)∈XF. Define θ:O∗→[0,1] recursively by
θ(λO):=1
∀z∈O+:θ(z):=θ(p(z))Pry∼μp(z)[z⊑p(z)y]
Consider any z∉XF. For any o∈O, p(zo)=p(z), therefore:
θ(zo)=θ(p(z))Pry∼μp(z)[zo⊑p(z)y]
∑o∈Oθ(zo)=θ(p(z))∑o∈OPry∼μp(z)[zo⊑p(z)y]
∑o∈Oθ(zo)=θ(p(z))Pry∼μp(z)[z⊏p(z)y]
For any y s.t. μp(z)(y)>0, p(z)y∈XF and therefore z≠p(z)y. It follows that z⊏p(z)y iff z⊑p(z)y, and we get
∑o∈Oθ(zo)=θ(z)
Now, consider z∈XF. For any o∈O, p(zo)=z, therefore:
θ(zo)=θ(z)Pry∼μz[zo⊑zy]
∑o∈Oθ(zo)=θ(z)∑o∈OPry∼μz[zo⊑zy]
Again we get
∑o∈Oθ(zo)=θ(z)
It follows that there is μ∈P(Oω) s.t. for any z∈O∗, μ(zOω)=θ(z).
Consider any x∈XF and y0∈O+ s.t. μx(y0)>0. p(xy0)⊏xy0 and by Proposition\ A.13, this implies p(xy0)⊑x. By definition of p, it follows that p(xy0)=x. We get
θ(xy0)=θ(p(xy0))Pry∼μp(xy0)[xy0⊑p(xy0)y]
θ(xy0)=θ(x)Pry∼μx[xy0⊑xy]
θ(xy0)=θ(x)Pry∼μx[y0⊑y]
y0∈Ax and any y∈O+ s.t. μx(y)>0 is also in Ax. Ax is prefix-free, so y0⊑y iff y0=y. We get
θ(xy0)=θ(x)μx(y0)
μ(xy0Oω)=μ(xOω)μx(y0)
Therefore, μ∈ΦF.
#Proposition A.5
Consider x∈O∗ and let fx:Oω×Oω→Oω be defined as in Definition 1. Then, for any y∈O∗:
(fx)−1(xyOω)=xOω×yOω
#Proof of Proposition A.5
Given any e1,e2∈Oω, fx(xe1,ye2)=xye2. On the other hand, if e1,e2,e3∈Oω are s.t. fx(e1,e2)=xye3 then x⊏e1 and e2=ye3.
#Proposition A.6
Consider x∈O∗ and let fx:Oω×Oω→Oω be defined as in Definition 1. Then, for any z∈O∗, if x⊏/z, then:
(fx)−1(zOω)=zOω×Oω
#Proof of Proposition A.6
Assume x=zw for some w∈O∗. For any e1,e2∈Oω, fx(ze1,e2) is either ze1 (if w⊏/e1) or xe2=zwe2 (if w⊏e1). Either way, z⊏fx(ze1,e2). On the other hand, if e1,e2,e3∈Oω are s.t. fx(e1,e2)=ze3 then either x⊏/e1 and e1=ze3 or zw=x⊏e1. Either way, z⊏e1.
Now, assume x⋣z. For any e1,e2∈Oω, fx(ze1,e2)=ze1 (since x⊏/ze1). On the other hand, if e1,e2,e3∈Oω are s.t. fx(e1,e2)=ze3 then e1=ze3 (since x⊏/ze3).
#Proposition A.7
In the setting of the Construction, consider some x∈XF and define G:O∗→PC(O+) by G(z):=F(xz). Assume that w∈O∗ is s.t. xw∈XF. Then, w∈XG.
#Proof of Proposition A.7
We prove by induction on |w|. For |w|=0, the proposition is obvious. Assume |w|>0. xw∈XF, therefore xw=x′y for some x′∈XF, μ∈F(x′) and y∈O+ s.t. μ(y)>0. x⊏x′y, therefore, by Proposition A.3, x⊑x′. Let z∈O∗ be s.t. x′=xz. xw=x′y=xzy, therefore w=zy and we can use the induction hypothesis to show that z∈XG. μ∈F(x′)=F(xz)=G(z), implying that w∈XG.
#Proposition A.8
In the setting of the Construction, consider some x∈XF and define G:O∗→PC(O+) by G(z):=F(xz). Consider any w∈XG. Then, xw∈XF.
#Proof of Proposition A.8
We prove by induction on |w|. For |w|=0, the proposition is obvious. Assume that |w|>0. Then, there is some w′∈XG, μ∈G(w′)=F(xw′) and y∈O+ s.t. μ(y)>0 and w=w′y. By the induction hypothesis, xw′∈XF. Therefore, xw=xw′y is also in XF.
#Proposition A.9
In the setting of the Construction, ΦF factorizes at x for any x∈XF.
#Proof of Proposition A.9
Fix x∈XF. Let fx be as in Definition 1. Define G:O∗→PC(O+) by G(z):=F(xz). We claim that ΦF=fx∗(ΦF×ΦG).
Consider any μ∈ΦF, ν∈ΦG and denote ξ:=fx∗(μ×ν). Consider some z∈XF and Az⊆O+ corresponding.
Assume z=xw for some w∈O∗. By Proposition A.7, w∈XG. Note that for any ν0∈G(w)=F(z), ν0(y)>0 implies y∈Az. Let νw∈P(O+) be as in Proposition A.0. For any y∈Az, we have
ν(wyOω)=ν(wOω)νw(y)
Multiplying both sides by μ(xOω):
μ(xOω)ν(wyOω)=μ(xOω)ν(wOω)νw(y)
Using the definition of ξ and Proposition A.5, we get
ξ(zOω)=μ(xOω)ν(wOω)
ξ(zyOω)=μ(xOω)ν(wyOω)
Combining, we get
ξ(zyOω)=ξ(zOω)νw(y)
Now, assume x⋢z. Let μz∈P(O+) be as in Proposition A.0. For any y∈O+ s.t. μz(y)>0, we have
μ(zyOω)=μ(zOω)μz(y)
Using the definition of ξ and Proposition A.6, we get
ξ(zOω)=μ(zOω)
By Proposition A.3, x⊏/zy. Therefore, we can use Proposition A.6 again to conclude
ξ(zyOω)=μ(zyOω)
Combining, we get
ξ(zyOω)=ξ(zOω)μz(y)
We proved that ξ∈ΦF. It remains to show that, conversely, μ∈fx∗(ΦF×ΦG).
Define μx∈P(Oω) as follows. If μ(xOω)=0, choose arbitrary μx∈ΦG. If μ(xOω)>0, define μx by
μx(zOω):=μ(xzOω)μ(xOω)
Consider any w∈XG. By Proposition A.8, xw∈XF. By Proposition A.0, there is μxw∈F(xw) s.t. for any y∈Axw:
μ(xwyOω)=μ(xwOω)μxw(y)
If μ(xOω)>0, we can divide both sides by μ(xOω) and get
μx(wyOω)=μx(wOω)μxw(y)
It follows that, in either case, μx∈ΦG and
μ(xOω)μx(zOω)=μ(xzOω)
Denote ξ:=fx∗(μ×μx). Consider any z∈O∗.
Assume z=xw for some w∈O∗. By Proposition A.5
ξ(zOω)=μ(xOω)μx(wOω)=μ(zOω)
Now, assume x⋢z. By Proposition A.6
ξ(zOω)=μ(zOω)
We conclude that μ=ξ.
#Proposition A.10
In the setting of the construction, consider any μ∈ΦF and e∈Oω s.t. for any n∈N, μ(e<nOω)>0. Then, there are infinitely many x∈XF s.t. x⊏e.
#Proof of Proposition A.10
We prove by induction on n that there are at least n prefixes of e in XF. For n=0 the claim is vacuously true. Consider any n>0. By the induction hypothesis, there are x0⊏x1⊏…⊏xn−1⊏e s.t. xk∈XF for all k<n. Without loss of generality, we can assume Axn−1 is s.t.
⋃y∈Axn−1yOω=Oω
Indeed, if this condition is false we can make it true by adding more elements to Axn−1. By Proposition A.0, there is ν∈F(xn−1) s.t. for any y∈Axn−1
μ(xn−1yOω)=μ(xn−1Oω)ν(y)
By the assumption on Axn−1, there is y0∈Axn−1 s.t. xn−1y0⊏e. Denote xn:=xn−1y0. We get
μ(xnOω)=μ(xn−1Oω)ν(y0)
The left hand side is positive since xn⊏e, therefore ν(y0)>0.\ It follows that xn∈XF.
#Proof of Proposition 2
By Propositions A.1, A.2 and A.4, ΦF∈PC(Oω). It remains to show that ΦF factorizes everywhere.
Consider any μ∈ΦF. Denote
N:={e∈Oω∣∃n∈N:μ(e<nOω)=0}
We have
N=⋃x∈O∗μ(xOω)=0xOω
Therefore, μ(N)=0 and μ(Oω∖N)=1. By Proposition A.10, for any e∉N, there are infinitely many x⊏e s.t. x∈XF. By Proposition A.9, it follows that ΦF factorizes over e. We get
% operators that are separated from the operand by a space
% operators that require brackets
% operators that require parentheses
% Paper specific
The asymptotic optimality theorem for minimax we proved previously was formulated in terms of the "optimal value" v(x). Compared to the Bayesian case, v(x) has the strange property that its definition depends on rewards other than those received after observing x. This is a direct consequence of the dynamic inconsistency of minimax. In an attempt to get around this, we show that the minimax decision rule is dynamically consistent in a certain special case and demonstrate that this special case can occur often for a natural class of models. However, this doesn't immediately solve the problem: doing so requires replacing the minimax decision rule by a stronger condition that ensures subgame perfection in some sense. The formalization of the latter idea is left for future posts.
Appendix A contains the proofs of all results.
##Notation
Given X a topological space, P(X) will denote the space of Borel probability measures on X. We regard it as a topological space using the weak∗ topology. Given x∈X, δx∈P(X) is defined by δx(A):=[[x∈A]]. Given μ,ν∈P(X), dtv(μ,ν) stands for the total variation distance between μ and ν. We denote PC(X) the set of non-empty convex compact subsets of P(X).
Given X a finite set, X∗ denotes the set of finite strings in alphabet X, i.e. X∗:=⨆n∈NXn. Xω denotes the set of infinite strings in alphabet X. Given x∈X∗⊔Xω, |x|∈N⊔{∞} is the length of string. Given 0≤n<|x|, xn∈X is the n-th symbol in x. Given 0≤n≤|x|, x<n is the prefix of x of length n. Given x,y∈X∗⊔Xω, the notations x⊏y, x⊑y, x⊏/y and x⋢y mean "x is a proper prefix of y", "x is a prefix of y" and their negations. λX∈X∗ is the empty string and X+:=X∗∖λX.
##Factorization
We recall the result of the previous post regarding subagents of minimax agents:
#Proposition 0
π∗2∈argmaxπ2:T→P(S2)minμ∈Φ(Eπ∗1×μ[u1]+μ(A)Et∼α∗π∗1[Eπ2(t)×μ∣A[u2(t)]])
In general, π∗2 is not a minimax policy for any natural model i.e. the minimax decision rule is not "dynamically consistent". However, if we assume a certain factorization condition, it is. Specifically, assume E1,E2 are compact Polish, f:E1×E2→E is Borel measurable, ¯u1:S1×E1→R and ¯u2:S2×T×E2→R are continuous s.t.
u1(s1,f(e1,e2))=¯u1(s1,e1)
u2(s2,t,f(e1,e2))=¯u1(s2,t,e2)
Further assume that A1⊆E1 is Borel s.t. f−1(A)=A1×E2 and Φ1,2∈PC(E1,2) are s.t. Φ is the closure of the convex hull of f∗(Φ1×Φ2).
#Proposition 1
Assume that minμ∈Φ1μ1(A1)>0. Then,
π∗2∈argmaxπ2:T→P(S2)minμ∈Φ2Et∼α∗π∗1[Eπ2(t)×μ[¯u2(t)]]
Moreover, define ¯S2:=T→S2 equipped with the product topology, define ¯π∗2∈P(¯S2) by
¯π∗2:=∏t∈Tπ∗2(t)
Define ¯E2:=T×E2 and define ¯Φ2∈PC(¯E2) by
¯Φ2:={α∗π∗1×μ∣μ∈Φ2}
Finally, define ^u2:¯S2ׯE2→R by
^u2(s,t,e):=¯u2(s(t),t,e)
Then, ¯π∗2 is a minimax policy for ¯Φ2 w.r.t. the utility function ^u2.
##Factorization everywhere
We now formulate a condition that ensures that factorizations happens "often" in a certain sense.
#Definition 1
Consider Φ∈PC(Oω) and x∈O∗. Define fx:Oω×Oω→Oω by
fx(e1,e2):={e1 if x⊏/e1xe2 if x⊏e1
Φ is said to factorize at x when there are Φ1,Φ2⊆P(Oω) s.t. Φ is the closure of the convex hull of fx∗(Φ1×Φ2).
Given e∈Oω, Φ is said to factorize over e when there are infinitely many n∈N s.t. Φ factorizes at e<n.
Given a time discount function γ, Φ is said to factorize frequently over e (w.r.t. γ) when there is an increasing sequence {nk∈N}k∈N s.t.
i. Φ factorizes at e<nk for all k.
ii. There is ϵ>0 s.t. for all k
∑∞n=nk+1−1γ(n)∑∞n=nkγ(n)>ϵ
For example, for geometric discount (γ(n)=βn), condition ii means that nk+1−nk is bounded.
Φ is said to factorize everywhere when
∀μ∈Φ:Pre∼μ[Φ factorizes over e]=1
Φ is said to factorize frequently everywhere (w.r.t. γ) when
∀μ∈Φ:Pre∼μ[Φ factorizes frequently over e]=1
Note that any singleton {μ} factorizes frequently everywhere w.r.t. any time discount function. More generally, we can give the following explicit description of models that factorize everywhere:
#Construction
Consider some F:O∗→PC(O+) (O+ is considered to be a discrete space). Assume that for any x∈O, there is Ax⊆O+ prefix-free s.t. any μ∈F(x) is supported on Ax.
We define XF⊆O∗ as the minimal set with the following properties:
i. λO∈XF
ii. If x∈XF, μ∈F(x) and y∈O+ is s.t. μ(y)>0, then xy∈XF.
We define ΦF⊆P(Oω) as follows. μ∈ΦF iff for any x∈XF, there is μx∈F(x) s.t. for any y∈O+, if μx(y)>0 then
Pre∼μ[xy⊏e]=Pre∼μ[x⊏e]μx(y)
#Proposition 2
For any F as above, ΦF∈PC(Oω) and factorizes everywhere.
It is also easy to use the above construction to get Φ that factorizes frequently everywhere w.r.t. given γ (for example, if we require that for every x∈XF, F(x) is supported on strings of uniformly bounded length, ΦF will factorize frequently everywhere w.r.t. geometric discount; this condition on F is sufficient but not necessary).
We now formulate the optimality condition that we would like to hold (but for which minimax is unfortunately insufficent). Consider again Φ, Ψ and π∗ as before. Define u>n:Oω×(O∗→A)→R to be normalized sum of rewards from time n, i.e.
u>n(s,e):=∑m>nγ(m)r(es<m)∑m>nγ(m)
Define w:O∗→R by
w(x):=maxρ∈A|x|→P(O∗→A)minμ∈ΦEs∼π∗⊗xρe∼μ[u>|x|(s,e)∣x⊏e]
#Hypothesis
If π∗ satisfies the (yet unformulated) subgame perfection condition, we should have the following:
Assume Φ factorizes everywhere. Then,
∀μ∈Φ:Pre∼μ[liminfn→∞max(w(e<n)−Es∼π∗e′∼μ[u>n(s,e′)∣e<n⊏e′],0)=0]=1
Moreover, if Φ factorizes frequently everywhere, then
∀μ∈Φ:Pre∼μ[limn→∞max(w(e<n)−Es∼π∗e′∼μ[u>n(s,e′)∣e<n⊏e′],0)=0]=1
##Appendix A
#Proof of Proposition 1
Recalling that minimization over Φ can be replaced by minimization over any set s.t. Φ is the closure of its convex hull, and using Proposition 0, we have
π∗2∈argmaxπ2:T→P(S2)minμ1∈Φ1μ2∈Φ2(Eπ∗1×f∗(μ1×μ2)[u1]+(μ1×μ2)(f−1(A))Et∼α∗π∗1[Eπ2(t)×f∗(μ1×μ2)∣A[u2(t)]])
π∗2∈argmaxπ2:T→P(S2)minμ1∈Φ1μ2∈Φ2(Eπ∗1×μ1[¯u1]+μ1(A1)Et∼α∗π∗1[Eπ2(t)×μ2[¯u2(t)]])
π∗2∈argmaxπ2:T→P(S2)minμ1∈Φ1(Eπ∗1×μ1[¯u1]+μ1(A1)minμ2∈Φ2Et∼α∗π∗1[Eπ2(t)×μ2[¯u2(t)]])
Using the assumption minμ1∈Φ1μ1(A1)>0, we get the desired result.
Now consider any ¯π2∈P(¯S2). Define π2:T→P(S2) by (π2(t))(B):=Prσ∼¯π2[σ(t)∈B]. We have
Et∼α∗π∗1[Es∼π2(t)e∼μ2[¯u2(s,t,e)]]=Et∼α∗π∗1[Eσ∼¯π2e∼μ2[¯u2(σ(t),t,e)]]
Et∼α∗π∗1[Es∼π2(t)e∼μ2[¯u2(s,t,e)]]=E¯π2×α∗π∗1×μ[^u2]
For the special case ¯π2=¯π∗2=∏t∈Tπ∗2(t), we get π2=π∗2 and therefore
Et∼α∗π∗1[Es∼π∗2(t)e∼μ2[¯u2(s,t,e)]]=E¯π∗2×α∗π∗1×μ[^u2]
Using the property of π∗2, it follows that
∀μ∈Φ2:E¯π2×α∗π∗1×μ[^u2]≤E¯π∗2×α∗π∗1×μ[^u2]
∀¯μ∈¯Φ2:E¯π2ׯμ[^u2]≤E¯π∗2ׯμ[^u2]
#Proposition A.0
In the setting of the Construction, consider any μ∈ΦF. Then:
∀x∈XF∃μx∈F(x)∀y∈Ax:μ(xyOω)=μ(xOω)μx(y)
#Proof of Proposition A.0
Given x∈XF, we know there is μx∈F(x) s.t. the identity holds whenever μx(y)>0. Summing these identities over y, we get
∑μx(y)>0μ(xyOω)=∑μx(y)>0μ(xOω)μx(y)
∑μx(y)>0μ(xyOω)=μ(xOω)
Since Ax is prefix-free, for any y,y′∈Ax we have xyOω∩xy′Oω=∅. Also, obviously, xyOω⊆xOω. It follows that
∑y∈Axμ(xyOω)=μ(⋃y∈AxxyOω)≤μ(xOω)=∑μx(y)>0μ(xyOω)
Since μx(y)>0 implies y∈Ax, we conclude that for any y∈Ax, if μx(y)=0 then μ(xyOω)=0. This, together with the property of μx, gives the desired result.
#Proposition A.1
In the setting of the Construction, ΦF is convex.
#Proof of Proposition A.1
Consider μ,ν∈ΦF and p∈(0,1) and define ξ=pμ+(1−p)ν. Consider some x∈XF, and let Ax⊆O+ be as in the Construction. If ξ(xOω)=0, we can choose any ξx∈F(x) to satisfy the desired condition. Therefore, we can assume ξ(xOω)>0. By Proposition A.0, there are μx,νx∈F(x) s.t. for all y∈Ax:
μ(xyOω)=μ(xOω)μx(y)
ν(xyOω)=ν(xOω)νx(y)
Taking a convex linear combination of these equations, we get
ξ(xyOω)=pμ(xOω)μx(y)+(1−p)ν(xOω)νx(y)
ξ(xyOω)=ξ(xOω)pμ(xOω)μx(y)+(1−p)ν(xOω)νx(y)ξ(xOω)
Define ξx∈P(O+) by
ξx:=pμ(xOω)ξ(xOω)μx+(1−p)ν(xOω)ξ(xOω)νx
ξx is a convex linear combination of μx and νx, therefore ξx∈F(x). We get
ξ(xyOω)=ξ(xOω)ξx(y)
Therefore, ξ∈ΦF.
#Proposition A.2
In the setting of the Construction, ΦF is closed.
#Proof of Proposition A.2
Consider any sequence {μn∈ΦF}n∈N and s.t. μn→μ for some μ∈P(Oω). Consider some x∈XF, and let Ax⊆O+ be as in the Construction. By Proposition A.0, for any n∈N, there is μnx∈F(x) s.t. for any y∈Ax:
μn(xyOω)=μn(xOω)μnx(y)
Note that P(O+) is not compact but is still Polish (since O+ is such), and therefore F(x) is compact and Polish. Let {nk∈N}k∈N be an increasing sequence s.t. μnkx→μx for some μx∈F(x). For any z∈O∗, χzOω:Oω→{0,1} is a continuous function, and therefore μn(zO∗)→μ(zO∗). For any y∈O+, χ{y}:O+→{0,1} is also continuous, and therefore μnkx(y)→μx(y). Putting these together, we get
μ(xyOω)=μ(xOω)μx(y)
Therefore μ∈ΦF.
#Proposition A.3
In the setting of the Construction, consider any x,z∈XF, μ∈F(z) and y∈O+ s.t. μ(y)>0. Assume that x⊏zy. Then, x⊑z.
#Proof of Proposition A.3
We prove by induction on |x|. If |x|=0, then obviously x⊑z. If |x|>0, then, by definition of XF, there is x′∈XF, y′∈O+ and ν∈F(x′) s.t. ν(y′)>0 and x=x′y′. x′⊏x⊏zy, therefore, by the induction hypothesis, x′⊑z. Assume to the contrary that x⋢z. Then, since x⊏zy, z⊏x=x′y′. By the induction hypothesis, this implies z⊑x′. We got z=x′. It follows that zy′=x⊏zy, and therefore y′⊏y. However, y,y′∈Az and Az is prefix-free, which is a contradiction.
#Proposition A.4
In the setting of the Construction, ΦF is non-empty.
#Proof of Proposition A.4
Choose μx∈F(x) for each x∈XF. Given any z∈O+, define p(z)∈O∗ as the longest proper prefix of z s.t. p(z)∈XF. Define θ:O∗→[0,1] recursively by
θ(λO):=1
∀z∈O+:θ(z):=θ(p(z))Pry∼μp(z)[z⊑p(z)y]
Consider any z∉XF. For any o∈O, p(zo)=p(z), therefore:
θ(zo)=θ(p(z))Pry∼μp(z)[zo⊑p(z)y]
∑o∈Oθ(zo)=θ(p(z))∑o∈OPry∼μp(z)[zo⊑p(z)y]
∑o∈Oθ(zo)=θ(p(z))Pry∼μp(z)[z⊏p(z)y]
For any y s.t. μp(z)(y)>0, p(z)y∈XF and therefore z≠p(z)y. It follows that z⊏p(z)y iff z⊑p(z)y, and we get
∑o∈Oθ(zo)=θ(z)
Now, consider z∈XF. For any o∈O, p(zo)=z, therefore:
θ(zo)=θ(z)Pry∼μz[zo⊑zy]
∑o∈Oθ(zo)=θ(z)∑o∈OPry∼μz[zo⊑zy]
Again we get
∑o∈Oθ(zo)=θ(z)
It follows that there is μ∈P(Oω) s.t. for any z∈O∗, μ(zOω)=θ(z).
Consider any x∈XF and y0∈O+ s.t. μx(y0)>0. p(xy0)⊏xy0 and by Proposition\ A.13, this implies p(xy0)⊑x. By definition of p, it follows that p(xy0)=x. We get
θ(xy0)=θ(p(xy0))Pry∼μp(xy0)[xy0⊑p(xy0)y]
θ(xy0)=θ(x)Pry∼μx[xy0⊑xy]
θ(xy0)=θ(x)Pry∼μx[y0⊑y]
y0∈Ax and any y∈O+ s.t. μx(y)>0 is also in Ax. Ax is prefix-free, so y0⊑y iff y0=y. We get
θ(xy0)=θ(x)μx(y0)
μ(xy0Oω)=μ(xOω)μx(y0)
Therefore, μ∈ΦF.
#Proposition A.5
Consider x∈O∗ and let fx:Oω×Oω→Oω be defined as in Definition 1. Then, for any y∈O∗:
(fx)−1(xyOω)=xOω×yOω
#Proof of Proposition A.5
Given any e1,e2∈Oω, fx(xe1,ye2)=xye2. On the other hand, if e1,e2,e3∈Oω are s.t. fx(e1,e2)=xye3 then x⊏e1 and e2=ye3.
#Proposition A.6
Consider x∈O∗ and let fx:Oω×Oω→Oω be defined as in Definition 1. Then, for any z∈O∗, if x⊏/z, then:
(fx)−1(zOω)=zOω×Oω
#Proof of Proposition A.6
Assume x=zw for some w∈O∗. For any e1,e2∈Oω, fx(ze1,e2) is either ze1 (if w⊏/e1) or xe2=zwe2 (if w⊏e1). Either way, z⊏fx(ze1,e2). On the other hand, if e1,e2,e3∈Oω are s.t. fx(e1,e2)=ze3 then either x⊏/e1 and e1=ze3 or zw=x⊏e1. Either way, z⊏e1.
Now, assume x⋣z. For any e1,e2∈Oω, fx(ze1,e2)=ze1 (since x⊏/ze1). On the other hand, if e1,e2,e3∈Oω are s.t. fx(e1,e2)=ze3 then e1=ze3 (since x⊏/ze3).
#Proposition A.7
In the setting of the Construction, consider some x∈XF and define G:O∗→PC(O+) by G(z):=F(xz). Assume that w∈O∗ is s.t. xw∈XF. Then, w∈XG.
#Proof of Proposition A.7
We prove by induction on |w|. For |w|=0, the proposition is obvious. Assume |w|>0. xw∈XF, therefore xw=x′y for some x′∈XF, μ∈F(x′) and y∈O+ s.t. μ(y) >0. x⊏x′y, therefore, by Proposition A.3, x⊑x′. Let z∈O∗ be s.t. x′=xz. xw=x′y=xzy, therefore w=zy and we can use the induction hypothesis to show that z∈XG. μ∈F(x′)=F(xz)=G(z), implying that w∈XG.
#Proposition A.8
In the setting of the Construction, consider some x∈XF and define G:O∗→PC(O+) by G(z):=F(xz). Consider any w∈XG. Then, xw∈XF.
#Proof of Proposition A.8
We prove by induction on |w|. For |w|=0, the proposition is obvious. Assume that |w|>0. Then, there is some w′∈XG, μ∈G(w′)=F(xw′) and y∈O+ s.t. μ(y)>0 and w=w′y. By the induction hypothesis, xw′∈XF. Therefore, xw=xw′y is also in XF.
#Proposition A.9
In the setting of the Construction, ΦF factorizes at x for any x∈XF.
#Proof of Proposition A.9
Fix x∈XF. Let fx be as in Definition 1. Define G:O∗→PC(O+) by G(z):=F(xz). We claim that ΦF=fx∗(ΦF×ΦG).
Consider any μ∈ΦF, ν∈ΦG and denote ξ:=fx∗(μ×ν). Consider some z∈XF and Az⊆O+ corresponding.
Assume z=xw for some w∈O∗. By Proposition A.7, w∈XG. Note that for any ν0∈G(w)=F(z), ν0(y)>0 implies y∈Az. Let νw∈P(O+) be as in Proposition A.0. For any y∈Az, we have
ν(wyOω)=ν(wOω)νw(y)
Multiplying both sides by μ(xOω):
μ(xOω)ν(wyOω)=μ(xOω)ν(wOω)νw(y)
Using the definition of ξ and Proposition A.5, we get
ξ(zOω)=μ(xOω)ν(wOω)
ξ(zyOω)=μ(xOω)ν(wyOω)
Combining, we get
ξ(zyOω)=ξ(zOω)νw(y)
Now, assume x⋢z. Let μz∈P(O+) be as in Proposition A.0. For any y∈O+ s.t. μz(y)>0, we have
μ(zyOω)=μ(zOω)μz(y)
Using the definition of ξ and Proposition A.6, we get
ξ(zOω)=μ(zOω)
By Proposition A.3, x⊏/zy. Therefore, we can use Proposition A.6 again to conclude
ξ(zyOω)=μ(zyOω)
Combining, we get
ξ(zyOω)=ξ(zOω)μz(y)
We proved that ξ∈ΦF. It remains to show that, conversely, μ∈fx∗(ΦF×ΦG).
Define μx∈P(Oω) as follows. If μ(xOω)=0, choose arbitrary μx∈ΦG. If μ(xOω)>0, define μx by
μx(zOω):=μ(xzOω)μ(xOω)
Consider any w∈XG. By Proposition A.8, xw∈XF. By Proposition A.0, there is μxw∈F(xw) s.t. for any y∈Axw:
μ(xwyOω)=μ(xwOω)μxw(y)
If μ(xOω)>0, we can divide both sides by μ(xOω) and get
μx(wyOω)=μx(wOω)μxw(y)
It follows that, in either case, μx∈ΦG and
μ(xOω)μx(zOω)=μ(xzOω)
Denote ξ:=fx∗(μ×μx). Consider any z∈O∗.
Assume z=xw for some w∈O∗. By Proposition A.5
ξ(zOω)=μ(xOω)μx(wOω)=μ(zOω)
Now, assume x⋢z. By Proposition A.6
ξ(zOω)=μ(zOω)
We conclude that μ=ξ.
#Proposition A.10
In the setting of the construction, consider any μ∈ΦF and e∈Oω s.t. for any n∈N, μ(e<nOω)>0. Then, there are infinitely many x∈XF s.t. x⊏e.
#Proof of Proposition A.10
We prove by induction on n that there are at least n prefixes of e in XF. For n=0 the claim is vacuously true. Consider any n>0. By the induction hypothesis, there are x0⊏x1⊏…⊏xn−1⊏e s.t. xk∈XF for all k<n. Without loss of generality, we can assume Axn−1 is s.t.
⋃y∈Axn−1yOω=Oω
Indeed, if this condition is false we can make it true by adding more elements to Axn−1. By Proposition A.0, there is ν∈F(xn−1) s.t. for any y∈Axn−1
μ(xn−1yOω)=μ(xn−1Oω)ν(y)
By the assumption on Axn−1, there is y0∈Axn−1 s.t. xn−1y0⊏e. Denote xn:=xn−1y0. We get
μ(xnOω)=μ(xn−1Oω)ν(y0)
The left hand side is positive since xn⊏e, therefore ν(y0)>0.\ It follows that xn∈XF.
#Proof of Proposition 2
By Propositions A.1, A.2 and A.4, ΦF∈PC(Oω). It remains to show that ΦF factorizes everywhere.
Consider any μ∈ΦF. Denote
N:={e∈Oω∣∃n∈N:μ(e<nOω)=0}
We have
N=⋃x∈O∗μ(xOω)=0xOω
Therefore, μ(N)=0 and μ(Oω∖N)=1. By Proposition A.10, for any e∉N, there are infinitely many x⊏e s.t. x∈XF. By Proposition A.9, it follows that ΦF factorizes over e. We get
Pre∼μ[ΦF factorizes over e]≥Pre∼μ[e∉N]=1