Theorem 5: Causal Commutative Square:The following diagram commutes for any choice of causal Θ. Any choice of infradistribution E∈□E where E=⋀π(π⋅)∗((π⋅)∗(E)) also makes this diagram commute.
Again, like usual, this proof consists of doing preliminary work (section 0) where we clarify some stuff and find 4 conditions for the 4 corners that correspond to "maximal representative of a causal belief function". Section 1.1-8 is showing that these properties are preserved under all eight morphisms, where as usual, the complicated part of translating between a belief function and infradistribution is shoved in the first two sections. Sections 2.1-8 are then showing the 8 identities needed to establish 4 isomorphisms assuming the relevant properties, and then section 3 is going in two directions from one corner to another and showing equality to rule out nontrivial automorphisms.
T5.0 First, preliminaries. We use suprema over: probability distributions ζ on the space of policies, and utility functions Uπ we associate with each policy π, which fulfill the following constraint for some function f.
∀e:Eπ∼ζ[Uπ(π⋅e)]≤f(e)
To avoid writing all this out each time, we observe that for the purposes of the proof, all that varies in this format is the function f:E→[0,1] that we're using as an upper-bound, we always select probability distributions over policies and utility functions linked with the same policy, and impose upper bounds of that "for all copolicies" form on a mix of the relevant utility functions interacting with the copolicy. So we'll suppress all this in the notation and just say
supζ,Uπ,f
as our abbreviation, with ∙ for when we need a function to have an empty slot for copolicies. So something like U(π′⋅∙) would be "the function from copolicies to [0,1] given by letting π′ interact with the copolicy to get a history, and then applying U to it".
The defining property for the third-person static view is that E is an infradistribution, and, for all functions f:E→[0,1], E(f)=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
The property for the third-person dynamic view is that ψT is an infradistribution, and ψT(f)=supζ,Uπ,fEπ∼ζ[ψT(λe.Uπ(π⋅e))] and →T(e)=⊤A⋉(λa.δe(a),ea,e(a))
The property for the first-person static view is that Θ be a belief function, and that it fulfill causality, that Θ(π′)(U)=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
The property for the first-person dynamic view is that ψF is an infradistribution, and ψF(f)=supζ,Uπ,fEπ∼ζ[ψF(λe.Uπ(π⋅e))] and →F(e,a)=δe(a),ea,e(a)
Before showing that these properties are preserved under morphisms, it will be handy to have an equivalent rephrasing of causality for belief functions.
The causality condition is:
Θ(π′)(U)=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
We can unconditionally derive the statement
Θ(π′)(U)≤supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
because, letting ζ be the dirac-delta distribution on π′, and Uπ′=U, we trivially fulfill the necessary condition for the supremum, that ∀e:Eζ[Uπ(π⋅e)]≤U(π′⋅e)
In the first direction, this happens because any particular choice of ζ and Uπ which fulfills the implication antecendent must have the expectation of Θ(π)(Uπ) doing worse than the supremum of such expectation, which still loses to Θ(π′)(U).
In the other direction, this happens because the supremum only ranges over distributions on policies and choices of utility function that fulfills the implication antecedent, so the consequent always kicks in. Thus, even the largest possible expectation of Θ(π)(Uπ) isn't above Θ(π′)(U).
as an alternative phrasing of causality for belief functions.
Time to show that these properties are preserved under the morphisms. As usual, the hard part is showing property preservation between first-person static and first-person dynamic.
T5.1.1 3-static to 1-static. Our translation is
Θ(π):=(π⋅)∗(E)
and we must check the five conditions of a belief function, as well as causality. So the proof splits into six parts for those six desired results. The five conditions for a belief function are a bounded Lipschitz constant, lower-semicontinuity, normalization, sensible supports, and agreement on max value. We'll just be using that E is an infradistribution, and not that it has any additional properties beyond that. The pushforward of an infradistribution is always an infradistribution, so we know that Θ(π) is always an infradistribution.
T5.1.1.1 For a bounded Lipschitz constant, we have
|Θ(π)(U)−Θ(π)(U′)|=|(π⋅)∗(E)(U)−(π⋅)∗(E)(U′)|
=|E(λe.U(π⋅e))−E(λe.U′(π⋅e))|
and then, since E has some Lipschitz constant λ⊙, we then have
≤λ⊙supe|U(π⋅e)−U′(π⋅e)|
And, for any π,e pair, it induces a history, so this is upper-bounded by
≤λ⊙suph|U(h)−U′(h)|=λ⊙d(U,U′)
So a uniform Lipschitz constant upper bound is shown.
T5.1.1.2 For lower-semicontinuity, it's actually easy to show for once. We get something even stronger. All causal belief functions aren't just lower-semicontinuous in their first argument, they're continuous. The proof is:
Now, the thing to realize here is that λe.U(πn⋅e) uniformly converges to λe.U(π∞⋅e). Why does this happen? Well, once πn gets close enough to π∞, it will perfectly agree with π∞ about what to do in any situation for the first m steps. So, vs any copolicy at all, the histories πn⋅e (for all sufficiently late n) and π∞⋅e will agree for the first m steps. And, since U is continuous on (A×O)ω which is a compact space, it must be uniformly continuous. So for any ϵ difference in output, there's some m where knowing the first m steps of history lets you pin down the value of U to within ϵ.
Then, we can just appeal to the finite Lipschitz constant of E to conclude that, since we have uniform convergence of functions, we have
=E(λe.U(π∞⋅e))
which then packs back up in reverse as
=Θ(π∞)(U)
That was very easy compared to our typical lower-semicontinuity proofs!
T5.1.1.3 For normalization, we have
infπΘ(π)(1)=infπ(π⋅)∗(E)(1)=infπE(1)=1
And the same goes for 0, because E is an infradistribution.
T5.1.1.4 For sensible supports, assume U,U′ agree on histories that can be produced by π. Then,
|Θ(π)(U)−Θ(π)(U′)|=|(π⋅)∗(E)(U)−(π⋅)∗(E)(U′)|
=|E(λe.U(π⋅e))−E(λe.U′(π⋅e))|
But then wait, for any e, π⋅e makes a history that can be made by π, so then U,U′ must agree on it. Those two inner functions are equal, and so attain equal value, and we have =0.
T5.1.1.5 For agreement on max value, you can just go
Θ(π)(1)=(π⋅)∗(E)(1)=E(1)=(π′⋅)∗(E)(1)=Θ(π′)(1)
and
Θ(π)(∞)=(π⋅)∗(E)(∞)=E(∞)=(π′⋅)∗(E)(∞)=Θ(π′)(∞)
and you're done.
T5.1.1.6 That just leaves checking causality. We'll make a point to not assume any special conditions on E at all in order to show this, because the
E(f)=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
condition is actually saying that E is the maximal infradistribution that corresponds to a causal belief function, but it'll be useful later to know that any infradistribution on copolicies makes a causal belief function.
We'll be showing our alternate characterization of causality, that
and then, by applying concavity of infradistributions, we can move the expectation inside and the value goes up, and then we apply monotonicity via the precondition we assumed.
≤E(λe.Eπ∼ζ[Uπ(π⋅e)])≤E(λe.U(π′⋅e))
and then we just pack this back up.
=(π′⋅)∗(E)(U)=Θ(π′)(U)
and we have the causality condition shown.
T5.1.2 1-static to 3-static. Our translation is:
E:=⋀π(π⋅)∗(Θ(π))
The way this proof works is first, we unpack E into a different form in preparation for what follows, that's phase 1. Then there's 6 more phases after that, where we show the five infradistribution conditions on E and the causality condition. Some of these are hard to show so we'll have to split them into more parts.
T5.1.2.1 We'll unpack this first, in preparation for showing the infradistribution properties and causality property for E. We can go
E(f)=⋀π(π⋅)∗(Θ(π))(f)
and then express the intersection as a supremum over ζ∈ΔΠ and fπ:E→[0,1] such that ∀e:Eπ∼ζ[fπ(e)]≤f(e) because that's just how intersections work. This is abbreviated as supζ,fπ,f.
=supζ,fπ,fEπ∼ζ[(π⋅)∗(Θ(π))(fπ)]
Then, for the pullback, we take the supremum over Uπ where Uπ(π⋅e)≤fπ(e) for all e. We'll write this as supUπ,fπ.
This makes
=supζ,fπ,fEπ∼ζ[supUπ,fπΘ(π)(Uπ)]
Now, these Uπ(π⋅∙) always make an acceptable substitute for the fπ functions early on, as they're always lower than fπ regardless of copolicy. Further, if you're attempting to attain the supremum, you'll never actually use your fπ, it will always be Uπ being plugged in. So, we can just wrap up all the maximization into one supremum, to make
=supζ,Uπ,fEπ∼ζ[Θ(π)(Uπ)]
We'll be using this phrasing as a more explicit unpacking of E(f).
Now, for the infradistribution conditions on E, and the causality condition, there's a very important technicality going on. Intersection behaves much worse for the [0,1] type signature. In the R type signature, everything works out perfectly. However, in the [0,1] type signature, we'll be able to show all the needed properties for Eexcept having a finite Lipschitz constant. But, not all is lost, because to compensate for that, we'll show that you can swap out E (almost an infradistribution) for an actual infradistribution E′ that induces the same belief function Θ.
Like we said back in the post, it isn't terribly important that we use the true maximal representative E, we just want some infradistribution in the same equivalence class.
T5.1.2.2 First up, monotonicity. Assume f′≥f. Then, using our rewrite of E(f′) and E(f), we have
and monotonicity is shown. This occurred because we're taking the supremum over probability distributions and choices of utility function where the mix underperforms f′ as an upper bound. Since f is lower, we're taking the supremum over fewer options.
T5.1.2.3 Now, concavity. We'll work in reverse for this.
Let's regard the supremum as being attained (or coming arbitrarily close to being attained) by some particular choice of ζ,ζ′, and families Uπ1 and Uπ2(which are upper-bounded by f,f′ respectively) to get
=pEπ∼ζ[Θ(π)(Uπ1)]+(1−p)Eπ∼ζ′[Θ(π)(Uπ2)]
We can rewrite this as an integral to get
=p∫Π(λπ.Θ(π)(Uπ1))dζ+(1−p)∫Π(λπ.Θ(π)(Uπ2))dζ′
and then move the p and (1−p) in to get integration w.r.t. a measure.
=∫Π(λπ.Θ(π)(Uπ1))d(pζ)+∫Π(λπ.Θ(π)(Uπ2))d((1−p)ζ′)
And then, since pζ and (1−p)ζ′ are both absolutely continuous w.r.t. pζ+(1−p)ζ′, we can rewrite both of these integrals w.r.t the Radon-Nikodym derivative, as
(well, technically, this may not hold, but only on a set of measure 0 relative to pζ+(1−p)ζ′, so we can fiddle with things a bit so this always holds), we can apply concavity of Θ(π) because it's an inframeasure, for all π, and get
The Radon-Nikodym derivatives cancel to leave you with
=∫Π(λπ.Uπ1(π⋅e))d(pζ)+∫Π(λπ.Uπ2(π⋅e))d((1−p)ζ′)
and then pull out the constants to get
=p∫Π(λπ.Uπ1(π⋅e))dζ+(1−p)∫Π(λπ.Uπ2(π⋅e))dζ′
pack up as an expectation
=pEπ∼ζ[Uπ1(π⋅e)]+(1−p)Eπ∼ζ′[Uπ2(π⋅e)]
And use that
Eπ∼ζ[Uπ1(π⋅e)]≤f(e)
Eπ∼ζ′[Uπ2(π⋅e)]≤f′(e)
to get an upper bound of
≤(pf+(1−p)f′)(e)
Ok, we're done. We now know that pζ+(1−p)ζ′ is a probability distribution (mix of two probability distributions). We know that the expectation w.r.t. that distribution of the functions
undershoots pf+(1−p)f′. And since this is a mix of two existing utility functions, it'll be in [0,1] if the original utility functions were in that range already (or bounded if the originals were). Taking stock of what we've done so far, we've shown that
but the probability distribution, and functions used, ends up mixing to undershoot pf+(1−p)f′. So we can impose an upper bound of:
≤supζ′′,Uπ3,pf+(1−p)f′Eπ∼ζ′′[Θ(π)(Uπ3)]
which then packs up to equal
=E(pf+(1−p)f′)
and we're done, concavity has been shown.
T5.1.2.4 The compact-almost-support property on E is trivial since E is a compact space.
T5.1.2.5 For normalization, we'll need to do it in two parts, for both 0 and 1.
T5.1.2.5.1 First, we show normalization for 1, which requires showing both directions of the inequality. We start with
E(1)=supζ,Uπ,1Eπ∼ζ[Θ(π)(Uπ)]
and a possible choice that the supremum ranges over is the dirac-delta distribution on the π′ with the worst value of Θ(π)(1) (which is 1, by normalization for belief functions), and all the Uπ are 1. So then we get
≥Θ(π′)(1)=infπΘ(π)(1)=1
A lower bound of 1. For getting an upper bound, we have
E(1)=supζ,Uπ,1Eπ∼ζ[Θ(π)(Uπ)]
and then, the supremum ranges over ζ,Uπ where ∀e:Eπ∼ζ[Uπ(π⋅e)]≤1
Now, 1 can be written as U1(π′⋅e) where U1 is the utility function which maps everything to 1, and π′ is the policy with the worst value of Θ(π)(1), and we can apply our alternate formulation of causality for Θ to guarantee that
Eπ∼ζ[Θ(π)(Uπ)]≤Θ(π′)(U1)
regardless of our choice of ζ,Uπ. And so, we can impose an upper-bound of
≤Θ(π′)(U1)=Θ(π′)(1)=infπΘ(π)(1)=1
and we're done, we've got inequalities going both way, so E(1)=1.
T5.1.2.5.2 Now for 0. We have
E(0)=supζ,Uπ,0Eπ∼ζ[Θ(π)(Uπ)]
and one thing that the supremum ranges over is the dirac-delta on π′ which attains the worst value of Θ(π)(0), and all the Uπ being 0, so we get
≥Θ(π′)(0)=infπΘ(π)(0)=0
and we have a lower-bound of 0. For an upper bound,
E(0)=supζ,Uπ,0Eπ∼ζ[Θ(π)(Uπ)]
and then, the supremum ranges over ζ,Uπ where ∀e:Eπ∼ζ[Uπ(π⋅e)]≤0
Now, 1 can be written as U0(π′⋅e) where U0 is the utility function which maps everything to 1, and π′ is the policy with the worst value of Θ(π)(0), and we can apply our alternate formulation of causality for Θ to guarantee that
Eπ∼ζ[Θ(π)(Uπ)]≤Θ(π′)(U0)
regardless of our choice of ζ,Uπ. And so, we can impose an upper-bound of
≤Θ(π′)(U0)=Θ(π′)(0)=infπΘ(π)(0)=0
and we're done, we've got inequalities going both way, so E(0)=0.
T5.1.2.6 Now for Lipschitzness, which we'll need to split in two for the two type signatures.
T5.1.2.6.1 This will be completely provable for the R type signature on infradistributions. What we'll do is show that for any two functions f,f′, when λ⊙is the upper bound on the Lipschitz constant for the belief function Θ, we have
E(f)−E(inf(f,f′))≤λ⊙d(f,f′)
Since both f,f′ must be above inf(f,f′), by monotonicity, this result would show that
|E(f)−E(f′)|≤λ⊙d(f,f′)
and then we'd be done, that's Lipschitzness. So let's get working. We unpack as
Consider the supremum of the first term to be attained by some choice of ζ,Uπ1 fulfilling the property that ∀e:Eπ∼ζ[Uπ1(π⋅e)]≤f(e) and then we can rewrite as
f,f′ were arbitrary, so by monotonicity for E, we have
|E(f)−E(f′)|≤λ⊙d(f,f′)
and E has a finite Lipschitz constant for the R type signature.
T5.1.2.6.2 Now for the fiddly bits with the [0,1] type signature. The reason the above proof doesn't work for that type signature is that all our functions have to remain in [0,1], and it's possible that subtracting d(f,f′) from all the Uπ results in functions that go outside of [0,1], and it's way way harder than it looks to patch this.
We have all the other properties, though. What unbounded Lipschitz constant for E (as an expectation functional) looks like from the set point of view is that we've got minimal points in the set of a-measures over E (our E) with unboundedly large measure components. However, all the Θ(π) infradistribution sets have a bounded Lipschitz constant of λ⊙, all their minimal points can have only λ⊙ measure at most.
If you took your set E, and chucked out all the minimal points with measure greater than λ⊙ and built up an infradistribution E′ from the remaining minimal points, E′ would produce Θ again if you went back to a belief function. Why?
Well, the minimal points of an inframeasure/distribution are the truly critical part of them. As long as two sets of a-measures agree on which a-measures are minimal points, they're completely 100 percent equivalent in all respects. We travel back to a belief function Θ(π) via applying the pushforward (π⋅)∗(E). Doing this pushforward, in the set view keeps all the measure components of a-measures the same. So, chucking out all the a-measures in E with too much measure doesn't throw out any a-measures in E which are necessary to produce a minimal point in some Θ(π) (all of those have λ⊙ measure or less). When we apply the pushforward to E′ instead of E, we get a smaller set, sure, but the set perfectly agrees with the pushforward of E on the minimal points, so the sets they produce are equivalent. You get the same belief function back.
T5.1.2.7 Now for the last part, showing that we get the causality condition on E. Using our rewrite of E, we have
Note that there's a dependence of the second probability distribution and choice of utility functions on the the π selected, because we have a separate sup for each π. Since our rewrite of E(f) was
E(f)=supζ,Uπ,fEπ∼ζ[(Θ(π))(Uπ)]
we can abbreviate the interior of the big equation we have to get
=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
Which is the causality condition for E.
T5.1.3 For 3-static to 3-dynamic, it's easy, since ψT:=E so it can just copy over the relevant causality property from E, and we specified that the transition dynamics are exactly what we need, →T(e):=⊤A⋉(λa.δe(a),ea,e(a))
T5.1.4 For 3-dynamic to 3-static, this will be tricky, so first we set up some preliminaries. The translation process is unrolling the action-observation sequence to get an infradistribution over destinies, and then turning that into a belief function, and then turning that back into a set of copolicies. Ie,
E:=⋀π(π⋅)∗(u1∼π(pr(A×O)ω∗(ψT⋉→T:∞)))
We can save on some work. We recently established that going from a causal belief function to the space of copolicies produces an infradistribution over copolicies with the causality condition. So, actually, all we need to do is show that doing the first two steps (and skipping the third) makes a causal belief function, and we'll be done.
This is essentially going from third-person dynamic to third-person static (in destiny form), to first-person static, so we should expect it to be a bit hard. So our goal is that
Θ(π):=u1∼π(pr(A×O)ω∗(ψT⋉→T:∞))
ends up making a causal belief function.
Obviously, we need to show the causality property for Θ. However, for all other belief function conditions, we get them for free as long as we have an actual infradistribution over histories (this can be shown by taking the arguments that you get a belief function from translating over an infradistribution back in the Acausal Square Theorem, and adapting them to the pseudocausal case, which is pretty easy). So, actually, all we need to show is that Θ is causal, and that pr(A×O)ω∗(ψT⋉→T:∞) is an infradistribution over histories, assuming that ψT is an infradistribution over copolicies.
So, the proof splits into four parts where we verify the niceness conditions on →T in order to show that the above line is an infradistribution over histories, and one part where we verify causality for Θ
First up, verifying the niceness conditions for →T so →T:∞ is well-defined.
T5.1.4.1 We'll start with lower-semicontinuity, though actually, we'll be going for something stronger, continuity.
And continuity has been shown. This inducts up to show continuity for →T:∞.
T5.1.4.2 1-Lipschitzness holds because →T is the semidirect product of a crisp infradistribution (⊤A) (and all crisp infradistributions are 1-Lipschitz) with a crisp infrakernel (also 1-Lipschitz), and the semidirect product of a 1-Lipschitz infrakernel and a 1-Lipschitz infradistribution is 1-Lipschitz.
T5.1.4.3 Compact-shared almost-compact-support is trivial because A×O×E is a compact space.
T5.1.4.4 For "constants increase", actually, crisp infrakernels map constants to the same constant. So we get a stronger result, that →T(e)(c)=c, and in particular, →T:∞(e)(c)=c (the property inducts up appropriately). This also has the implication that all the →T:∞(e) are infradistributions, as they map 0 to 0 and 1 to 1. This also bags the last condition, 1-normalization. That's all niceness conditions, so →T:∞is well-defined.
Now that we know it's well-defined at all, our goal is to show that
pr(A×O)ω∗(ψT⋉→T:∞)
is an infradistribution. This is, again, easy. The semidirect product of an infradistribution and a continuous infrakernel that returns infradistributions is an infradistribution, and the projection of an infradistribution is an infradistribution.
T5.1.4.5 With that, all that remains is showing that Θ, defined as
Θ(π′):=u1∼π′(pr(A×O)ω∗(ψT⋉→T:∞))
fulfills the causality property. We can rewrite the update and projection to get
Now, we should notice something really important here. pr(A×O)ω∗(→T:∞(e0)), because →T has complete ignorance over the actions, but has the observations and next copolicies determined by interacting with e0, is the infradistribution generated by all probabilistic mixes of histories capable of being produced by e0. We can think of this as being generated by dirac-delta distributions on all the histories compatible with the copolicy e0.
Updating a dirac-delta on a history on "is the history compatible with π′?" will, if it isn't compatible, make the (0,1) a-measure present in all infradistributions (cannibalize the measure into the b term), (or delete the point entirely for the R type signature) and if it is a dirac-delta distribution on a compatible history, leaves it alone. However, there's only one history compatible with π′ and e0, it's δπ′⋅e0, the history made from the two interacting. Thus,
u1∼π′(pr(A×O)ω∗(→T:∞(e0)))=δπ′⋅e0
Making this substitution, we get
=ψT(λe0.δπ′⋅e0(λao1:∞.U(ao1:∞)))
And putting the value in the function, we get
=ψT(λe0.U(π′⋅e0))
Now we can apply the causality condition our ψT has, yielding
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψT(λe0.Uπ(π⋅e0))]
and go in reverse, breaking this out into a dirac-delta to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψT(λe0.δπ⋅e0(Uπ))]
And rewriting this as an update on the projection of the semidirect product initialized on a particular copolicy, yielding
and pack this up into a belief function like how we defined it, to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
and we're done, we made a causal belief function, and we know that going further from that to the infradistribution on copolicies endows that with the causality condition.
T5.1.5 3-dynamic to 1-dynamic. We don't have to worry about the starting infradistributions, as they're identical to each other, so the relevant causality condition is inherited, we just have to check that the infrakernels transform appropriately. We have
→F(e,a):=prO×E∗(u1{a}(→T(e)))
For any action a, applying our definition, we have
→F(e,a)(f)=prO×E∗(u1{a}(→T(e)))(f)
Unpacking the projection and update, we have
=→T(e)(λa′,o′,e′.1a′=af(o′,e′)+1a′≠a)
And then we consider that →T(e)=⊤A⋉(λa.δe(a),ea,e(a)), so we have
and substituting the dirac-delta values in, we have
=⊤A(λa′.1a′=af(e(a),ea,e(a))+1a′≠a)
Unpacking what ⊤A is, we have
=infa′1a′=af(e(a),ea,e(a))+1a′≠a)
This is 1 or infinity when a′≠a, so a′=a.
=f(e(a),ea,e(a))
And then we just pull out a dirac-delta
=δe(a),ea,e(a)(f)
and we're done! The infrakernel has the right form.
T5.1.6 3-dynamic to 1-dynamic. We know →T(e):=⊤A⋉(λa.→F(e,a))
and we know →F(e,a)=δe(a),ea,e(a) and we want
→T(e)=⊤A⋉(λa.δe(a),ea,e(a))
This is tautological, no problems here.
T5.1.7 1-static to 1-dynamic. Well, we're going 1-static to 1-dynamic via
ψF:=⋀π(π⋅)∗(Θ(π))
So we can just copy the proof of 1-static to 3-static, the translation functions are the same in both cases. The infrakernel also works out appropriately.
T5.1.8 1-dynamic to 1-static. We define our belief function via
Θ(π′):=pr(A×O)ω∗(ψF⋉→π′F:∞)
This proof will be very similar to third-person dynamic to third-person static, but easier. To cut down on a truly aggravating amount of overhead, we invoke our Theorem 4 on pseudocausal IPOMDP's. Our state space is E. We only need to show that →F fulfills the niceness conditions, and we then know that Θ fulfills all belief function conditions except for maybe normalization.
Our game plan is to show that →F fulfills the niceness conditions on an infrakernel (four parts, since we can sort of lump 1-normalization in with moving constants up), show that Θ fulfills the normalization property (last thing needed to show it's a belief function that Theorem 4 doesn't get), and then show that Θ is causal, for 6 parts in total.
T5.1.8.1 First niceness condition, lower-semicontinuity. We'll go for continuity instead, it's as easy to show.
We know that an settles down to a∞ after finite time because the action space is finite, and en(a) settles down to e∞(a) for any action because eventually the copolicy settles down on what it does for the initial action, so we get
=limn→∞f(e∞(a∞),ena∞,e∞(a∞))
and, in the limit, because f is continuous, and en limits to e∞, we have
T5.1.8.2 Compact-shared compact-almost-support is trivial because O×E is a compact space.
T5.1.8.3,4 1-Lipschitzness, and mapping constants to constants (ie, →F(e,a)(c)=c) are trivial because it's a single probability distribution, and those are always 1-Lipschitz and map constants to constants. In particular, 1 gets mapped to 1 so that's the fifth niceness condition taken care of. The infinite infrakernel is well-defined, and our theorem can be invoked.
T5.1.8.5 To get normalization for Θ, we first observe that since all the →F map constants to the same constant, the same applies to all the →πF,n, and then to the infinite infrakernel →πF:∞. So we have
infπΘ(π)(1)=infπpr(A×O)ω∗(ψF⋉→πF:∞)(1)
=infπ(ψF⋉→πF:∞)(1)=infπψF(λe.→πF:∞(e)(1))
and then, since constants are mapped to the same constant, we have
=infπψF(1)=1
Where that last step was done by normalization for ψF since it's an infradistribution and thus normalized. The exact same argument works for 0, showing that the resulting Θ is normalized and thus a belief function.
T5.1.8.6 All that remains is to show causality for Θ. We want to show that
Θ(π′)(U)=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
To do this, just go
Θ(π′)(U)=pr(A×O)ω∗(ψF⋉→π′F:∞)(λao1:∞.U(ao1:∞))
and reexpress the projection and semidirect product to get
=ψF(λe0.→π′F:∞(e0)(λaoe1:∞.U(ao1:∞)))
and wrap this back up as a projection to get
=ψF(λe0.pr(A×O)ω∗(→π′F:∞(e0))(λao1:∞.U(ao1:∞)))
And then we observe that pr(A×O)ω∗(→π′F:∞(e0))is just δπ′⋅e0, because the infradistribution is "repeatedly play π′ against e0 and record the actions and observations you get", so making that substitution we get
=ψF(λe0.δπ′⋅e0(λao1:∞.U(ao1:∞)))
And then we can substite the dirac-delta in and get
=ψF(λe0.U(π′⋅e0))
and apply the causality condition for ψF to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψF(λe0.Uπ(π⋅e0))]
And then we go in reverse, substituting the dirac-delta for pr(A×O)ω∗(→πF:∞(e0)) and then pulling out projections and expressing as a semidirect product and reexpressing it as a projection to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[pr(A×O)ω∗(ψF⋉→πF:∞)(Uπ)]
and then pack it back up to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
and we're done, that's the last of it! Time for isomorphism directions.
T5.2.1 First, 1-static to 3-static to 1-static. We want to show
Which, by the causality condition for E, is E(f) and we're done! In fact, the causality condition for E is basically just saying that going 1-static to 3-static to 1-static is identity.
T5.2.2 For 3-static to 1-static to 3-static, we want to show that
Θ(π′)=(π′⋅)∗(⋀π(π⋅)∗(Θ(π)))
Taking the latter thing
(π′⋅)∗(⋀π(π⋅)∗(Θ(π)))(U)
and undoing the pushforward, we get
=⋀π(π⋅)∗(Θ(π))(λe.U(π′⋅e))
And reexpressing the intersection, we get
=supζ,fπ,U(π′⋅∙)Eπ∼ζ[(π⋅)∗(Θ(π))(λe.fπ(e))]
and reexpressing the pullback, we get
=supζ,fπ,U(π′⋅∙)Eπ∼ζ[supUπ,fπΘ(π)(λh.Uπ(h))]
folding the sups into one, we get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
And, by causality for Θ, this is just
=Θ(π′)(U)
and we're done there.
T5.2.3 For 3-static to 3-dynamic to 3-static, this will be tricky, because third-person dynamic to third-person static is a complicated operation. We want to show that
E=⋀π(π⋅)∗(u1∼π(pr(A×O)ω∗(E⋉→T:∞)))
So, let's take the big thing and start unpacking it.
And then, by our earlier arguments when we were doing third-person dynamic to third-person static, we showed
u1∼π(pr(A×O)ω∗(→T:∞(e0)))=δπ⋅e0
So, making that substitution, we have
=supζ,Uπ,fEπ∼ζ[E(λe0.δπ⋅e0(λao:∞.Uπ(ao:∞)))]
Substituting the dirac-delta in, we get
=supζ,Uπ,fEπ∼ζ[E(λe0.Uπ(π⋅e0))]
Renaming our variables a little bit, we have
=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
and then, by the causality condition for E, this is just =E(f) and we're done.
T5.2.4 For 3-dynamic to 3-static to 3-dynamic, our desired result is
ψT=⋀π(π⋅)∗(u1∼π(pr(A×O)ω∗(ψT⋉→T:∞)))
And the proof of this is the exact same as going third-person static to third-person dynamic and back, just use ψT instead of E. Also, →F(e) works out as it should.
T5.2.5,6 For translating back and forth between 3-dynamic and 1-dynamic, we observe that both translations keep the initial infradistribution over destinies the same, and our 3-dynamic to 1-dynamic, and 1-dynamic to 3-dynamic proofs verified that the →F and →T behave as they should when they go back and forth, so we don't need to worry about this.
T5.2.7 We're on the last two! For 1-dynamic to 1-static to 1-dynamic, we want to show that
ψF=⋀π(π⋅)∗(pr(A×O)ω∗(ψF⋉→πF:∞))
Taking the big part, we have
⋀π(π⋅)∗(pr(A×O)ω∗(ψF⋉→πF:∞))(f)
and undoing the big and, and the pullback via interaction with π, and merging the sups together as usual, (skipping a few steps we've seen several times before) we have
and then, because pr(A×O)ω∗(→πF:∞(e0))=δπ⋅e0 because unrolling a policy against a copolicy and recording the actions and observations does that, we have
=supζ,Uπ,fEπ∼ζ[ψF(λe0.δπ⋅e0(λao1:∞.Uπ(ao1:∞)))]
Then, substituting the value of the dirac-delta in, we get
=supζ,Uπ,fEπ∼ζ[ψF(λe0.Uπ(π⋅e0))]
And then, by causality for ψF, this is just
=ψF(f)
and we're done there.
T5.2.8 Last one, 1-static to 1-dynamic to 1-static. We want to show that
We use the usual trick of this projection of a semidirect product just being the dirac-delta on the copolicy interacting with the policy, and substituting that in to get
=(⋀π(π⋅)∗(Θ(π)))(λe0.U(π′⋅e0))
And then we do the usual unpacking of the intersection and the pullbacks to get:
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
Which, by causality for Θ, is just
=Θ(π′)(U)
and we're done with the isomorphisms.
T5.3 The 2 paths from first-person static to third-person dynamic being equivalent is pretty easy since we've got identity for two of the legs, and the other two legs are just "intersection of preimages of the Θ(π)", ie identical, and we're done!
Theorem 5: Causal Commutative Square: The following diagram commutes for any choice of causal Θ. Any choice of infradistribution E∈□E where E=⋀π(π⋅)∗((π⋅)∗(E)) also makes this diagram commute.
Again, like usual, this proof consists of doing preliminary work (section 0) where we clarify some stuff and find 4 conditions for the 4 corners that correspond to "maximal representative of a causal belief function". Section 1.1-8 is showing that these properties are preserved under all eight morphisms, where as usual, the complicated part of translating between a belief function and infradistribution is shoved in the first two sections. Sections 2.1-8 are then showing the 8 identities needed to establish 4 isomorphisms assuming the relevant properties, and then section 3 is going in two directions from one corner to another and showing equality to rule out nontrivial automorphisms.
T5.0 First, preliminaries. We use suprema over: probability distributions ζ on the space of policies, and utility functions Uπ we associate with each policy π, which fulfill the following constraint for some function f.
∀e:Eπ∼ζ[Uπ(π⋅e)]≤f(e)
To avoid writing all this out each time, we observe that for the purposes of the proof, all that varies in this format is the function f:E→[0,1] that we're using as an upper-bound, we always select probability distributions over policies and utility functions linked with the same policy, and impose upper bounds of that "for all copolicies" form on a mix of the relevant utility functions interacting with the copolicy. So we'll suppress all this in the notation and just say
supζ,Uπ,f
as our abbreviation, with ∙ for when we need a function to have an empty slot for copolicies. So something like U(π′⋅∙) would be "the function from copolicies to [0,1] given by letting π′ interact with the copolicy to get a history, and then applying U to it".
The defining property for the third-person static view is that E is an infradistribution, and, for all functions f:E→[0,1], E(f)=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
The property for the third-person dynamic view is that ψT is an infradistribution, and ψT(f)=supζ,Uπ,fEπ∼ζ[ψT(λe.Uπ(π⋅e))] and →T(e)=⊤A⋉(λa.δe(a),ea,e(a))
The property for the first-person static view is that Θ be a belief function, and that it fulfill causality, that Θ(π′)(U)=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
The property for the first-person dynamic view is that ψF is an infradistribution, and ψF(f)=supζ,Uπ,fEπ∼ζ[ψF(λe.Uπ(π⋅e))] and →F(e,a)=δe(a),ea,e(a)
Before showing that these properties are preserved under morphisms, it will be handy to have an equivalent rephrasing of causality for belief functions.
The causality condition is:
Θ(π′)(U)=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
We can unconditionally derive the statement
Θ(π′)(U)≤supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
because, letting ζ be the dirac-delta distribution on π′, and Uπ′=U, we trivially fulfill the necessary condition for the supremum, that ∀e:Eζ[Uπ(π⋅e)]≤U(π′⋅e)
So, we get the fact
Θ(π′)(U)=Eπ∼δπ′[Θ(π)(Uπ)]≤supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
With this knowledge, the causality condition holds iff
Θ(π′)(U)≥supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
and this holds iff
∀ζ∈ΔΠ,Uπ:(∀e:Eπ∼ζ[Uπ(π⋅e)]≤U(π′⋅e))→Eπ∼ζ[Θ(π)(Uπ)]≤Θ(π′)(U)
In the first direction, this happens because any particular choice of ζ and Uπ which fulfills the implication antecendent must have the expectation of Θ(π)(Uπ) doing worse than the supremum of such expectation, which still loses to Θ(π′)(U).
In the other direction, this happens because the supremum only ranges over distributions on policies and choices of utility function that fulfills the implication antecedent, so the consequent always kicks in. Thus, even the largest possible expectation of Θ(π)(Uπ) isn't above Θ(π′)(U).
Thus, we will be taking
∀ζ∈ΔΠ,Uπ:(∀e:Eπ∼ζ[Uπ(π⋅e)]≤U(π′⋅e))→Eπ∼ζ[Θ(π)(Uπ)]≤Θ(π′)(U)
as an alternative phrasing of causality for belief functions.
Time to show that these properties are preserved under the morphisms. As usual, the hard part is showing property preservation between first-person static and first-person dynamic.
T5.1.1 3-static to 1-static. Our translation is
Θ(π):=(π⋅)∗(E)
and we must check the five conditions of a belief function, as well as causality. So the proof splits into six parts for those six desired results. The five conditions for a belief function are a bounded Lipschitz constant, lower-semicontinuity, normalization, sensible supports, and agreement on max value. We'll just be using that E is an infradistribution, and not that it has any additional properties beyond that. The pushforward of an infradistribution is always an infradistribution, so we know that Θ(π) is always an infradistribution.
T5.1.1.1 For a bounded Lipschitz constant, we have
|Θ(π)(U)−Θ(π)(U′)|=|(π⋅)∗(E)(U)−(π⋅)∗(E)(U′)|
=|E(λe.U(π⋅e))−E(λe.U′(π⋅e))|
and then, since E has some Lipschitz constant λ⊙, we then have
≤λ⊙supe|U(π⋅e)−U′(π⋅e)|
And, for any π,e pair, it induces a history, so this is upper-bounded by
≤λ⊙suph|U(h)−U′(h)|=λ⊙d(U,U′)
So a uniform Lipschitz constant upper bound is shown.
T5.1.1.2 For lower-semicontinuity, it's actually easy to show for once. We get something even stronger. All causal belief functions aren't just lower-semicontinuous in their first argument, they're continuous. The proof is:
limn→∞Θ(πn)(U)=limn→∞(πn⋅)∗(E)(U)=limn→∞E(λe.U(πn⋅e))
Now, the thing to realize here is that λe.U(πn⋅e) uniformly converges to λe.U(π∞⋅e). Why does this happen? Well, once πn gets close enough to π∞, it will perfectly agree with π∞ about what to do in any situation for the first m steps. So, vs any copolicy at all, the histories πn⋅e (for all sufficiently late n) and π∞⋅e will agree for the first m steps. And, since U is continuous on (A×O)ω which is a compact space, it must be uniformly continuous. So for any ϵ difference in output, there's some m where knowing the first m steps of history lets you pin down the value of U to within ϵ.
Then, we can just appeal to the finite Lipschitz constant of E to conclude that, since we have uniform convergence of functions, we have
=E(λe.U(π∞⋅e))
which then packs back up in reverse as
=Θ(π∞)(U)
That was very easy compared to our typical lower-semicontinuity proofs!
T5.1.1.3 For normalization, we have
infπΘ(π)(1)=infπ(π⋅)∗(E)(1)=infπE(1)=1
And the same goes for 0, because E is an infradistribution.
T5.1.1.4 For sensible supports, assume U,U′ agree on histories that can be produced by π. Then,
|Θ(π)(U)−Θ(π)(U′)|=|(π⋅)∗(E)(U)−(π⋅)∗(E)(U′)|
=|E(λe.U(π⋅e))−E(λe.U′(π⋅e))|
But then wait, for any e, π⋅e makes a history that can be made by π, so then U,U′ must agree on it. Those two inner functions are equal, and so attain equal value, and we have =0.
T5.1.1.5 For agreement on max value, you can just go
Θ(π)(1)=(π⋅)∗(E)(1)=E(1)=(π′⋅)∗(E)(1)=Θ(π′)(1)
and
Θ(π)(∞)=(π⋅)∗(E)(∞)=E(∞)=(π′⋅)∗(E)(∞)=Θ(π′)(∞)
and you're done.
T5.1.1.6 That just leaves checking causality. We'll make a point to not assume any special conditions on E at all in order to show this, because the
E(f)=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
condition is actually saying that E is the maximal infradistribution that corresponds to a causal belief function, but it'll be useful later to know that any infradistribution on copolicies makes a causal belief function.
We'll be showing our alternate characterization of causality, that
∀ζ∈ΔΠ,Uπ,U:(∀e:Eπ∼ζ[Uπ(π⋅e)]≤U(π′⋅e))→Eπ∼ζ[Θ(π)(Uπ)]≤Θ(π′)(U)
So, pick an arbitrary ζ, family of Uπ, utility function U, and policy π′, and assume ∀e:Eπ∼ζ[Uπ(π⋅e)]≤U(π′⋅e) Then, we can go:
Eπ∼ζ[Θ(π)(Uπ)]=Eπ∼ζ[(π⋅)∗(E)(Uπ)]=Eπ∼ζ[E(λe.Uπ(π⋅e))]
and then, by applying concavity of infradistributions, we can move the expectation inside and the value goes up, and then we apply monotonicity via the precondition we assumed.
≤E(λe.Eπ∼ζ[Uπ(π⋅e)])≤E(λe.U(π′⋅e))
and then we just pack this back up.
=(π′⋅)∗(E)(U)=Θ(π′)(U)
and we have the causality condition shown.
T5.1.2 1-static to 3-static. Our translation is:
E:=⋀π(π⋅)∗(Θ(π))
The way this proof works is first, we unpack E into a different form in preparation for what follows, that's phase 1. Then there's 6 more phases after that, where we show the five infradistribution conditions on E and the causality condition. Some of these are hard to show so we'll have to split them into more parts.
T5.1.2.1 We'll unpack this first, in preparation for showing the infradistribution properties and causality property for E. We can go
E(f)=⋀π(π⋅)∗(Θ(π))(f)
and then express the intersection as a supremum over ζ∈ΔΠ and fπ:E→[0,1] such that ∀e:Eπ∼ζ[fπ(e)]≤f(e) because that's just how intersections work. This is abbreviated as supζ,fπ,f.
=supζ,fπ,fEπ∼ζ[(π⋅)∗(Θ(π))(fπ)]
Then, for the pullback, we take the supremum over Uπ where Uπ(π⋅e)≤fπ(e) for all e. We'll write this as supUπ,fπ.
This makes
=supζ,fπ,fEπ∼ζ[supUπ,fπΘ(π)(Uπ)]
Now, these Uπ(π⋅∙) always make an acceptable substitute for the fπ functions early on, as they're always lower than fπ regardless of copolicy. Further, if you're attempting to attain the supremum, you'll never actually use your fπ, it will always be Uπ being plugged in. So, we can just wrap up all the maximization into one supremum, to make
=supζ,Uπ,fEπ∼ζ[Θ(π)(Uπ)]
We'll be using this phrasing as a more explicit unpacking of E(f).
Now, for the infradistribution conditions on E, and the causality condition, there's a very important technicality going on. Intersection behaves much worse for the [0,1] type signature. In the R type signature, everything works out perfectly. However, in the [0,1] type signature, we'll be able to show all the needed properties for E except having a finite Lipschitz constant. But, not all is lost, because to compensate for that, we'll show that you can swap out E (almost an infradistribution) for an actual infradistribution E′ that induces the same belief function Θ.
Like we said back in the post, it isn't terribly important that we use the true maximal representative E, we just want some infradistribution in the same equivalence class.
T5.1.2.2 First up, monotonicity. Assume f′≥f. Then, using our rewrite of E(f′) and E(f), we have
E(f′)=supζ,Uπ,f′Eπ∼ζ[Θ(π)(Uπ)]≥supζ,Uπ,fEπ∼ζ[Θ(π)(Uπ)]=E(f)
and monotonicity is shown. This occurred because we're taking the supremum over probability distributions and choices of utility function where the mix underperforms f′ as an upper bound. Since f is lower, we're taking the supremum over fewer options.
T5.1.2.3 Now, concavity. We'll work in reverse for this.
pE(f)+(1−p)E(f′)=psupζ,Uπ1,fEπ∼ζ[Θ(π)(Uπ1)]+(1−p)supζ′,Uπ2,f′Eπ∼ζ′[Θ(π)(Uπ2)]
Let's regard the supremum as being attained (or coming arbitrarily close to being attained) by some particular choice of ζ,ζ′, and families Uπ1 and Uπ2(which are upper-bounded by f,f′ respectively) to get
=pEπ∼ζ[Θ(π)(Uπ1)]+(1−p)Eπ∼ζ′[Θ(π)(Uπ2)]
We can rewrite this as an integral to get
=p∫Π(λπ.Θ(π)(Uπ1))dζ+(1−p)∫Π(λπ.Θ(π)(Uπ2))dζ′
and then move the p and (1−p) in to get integration w.r.t. a measure.
=∫Π(λπ.Θ(π)(Uπ1))d(pζ)+∫Π(λπ.Θ(π)(Uπ2))d((1−p)ζ′)
And then, since pζ and (1−p)ζ′ are both absolutely continuous w.r.t. pζ+(1−p)ζ′, we can rewrite both of these integrals w.r.t the Radon-Nikodym derivative, as
=∫Π(λπ.Θ(π)(Uπ1)⋅d(pζ)d(pζ+(1−p)ζ′)(π))d(pζ+(1−p)ζ′)
+∫Π(λπ.Θ(π)(Uπ2)⋅d((1−p)ζ′)d(pζ+(1−p)ζ′)(π))d(pζ+(1−p)ζ′)
and pack these up into one integral as the probability distribution we're integrating over is the same in both cases,
=∫Π(λπ.d(pζ)d(pζ+(1−p)ζ′)(π)⋅Θ(π)(Uπ1)+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Θ(π)(Uπ2))
d(pζ+(1−p)ζ′)
And then, because
d(pζ)d(pζ+(1−p)ζ′)(π)+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)=1
(well, technically, this may not hold, but only on a set of measure 0 relative to pζ+(1−p)ζ′, so we can fiddle with things a bit so this always holds), we can apply concavity of Θ(π) because it's an inframeasure, for all π, and get
≤∫Π(λπ.Θ(π)(d(pζ)d(pζ+(1−p)ζ′)(π)⋅Uπ1+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Uπ2))
d(pζ+(1−p)ζ′)
and then pack it back up as an expectation, to get
=Eπ∼pζ+(1−p)ζ′[Θ(π)(d(pζ)d(pζ+(1−p)ζ′)(π)⋅Uπ1+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Uπ2)]
We will now attempt to show that, for all copolicies e,
Eπ∼pζ+(1−p)ζ′[(d(pζ)d(pζ+(1−p)ζ′)(π)⋅Uπ1+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Uπ2)(π⋅e)]
≤(pf+(1−p)f′)(e)
We can take that big expectation and write it as
=Eπ∼pζ+(1−p)ζ′[d(pζ)d(pζ+(1−p)ζ′)(π)⋅Uπ1(π⋅e)+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Uπ2(π⋅e)]
and then write it as an integral
=∫Π(λπ.d(pζ)d(pζ+(1−p)ζ′)(π)⋅Uπ1(π⋅e)+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Uπ2(π⋅e))
d(pζ+(1−p)ζ′)
Split it up into two integrals
=∫Π(λπ.Uπ1(π⋅e)⋅d(pζ)d(pζ+(1−p)ζ′)(π))d(pζ+(1−p)ζ′)
+∫Π(λπ.Uπ2(π⋅e)⋅d((1−p)ζ′)d(pζ+(1−p)ζ′)(π))d(pζ+(1−p)ζ′)
The Radon-Nikodym derivatives cancel to leave you with
=∫Π(λπ.Uπ1(π⋅e))d(pζ)+∫Π(λπ.Uπ2(π⋅e))d((1−p)ζ′)
and then pull out the constants to get
=p∫Π(λπ.Uπ1(π⋅e))dζ+(1−p)∫Π(λπ.Uπ2(π⋅e))dζ′
pack up as an expectation
=pEπ∼ζ[Uπ1(π⋅e)]+(1−p)Eπ∼ζ′[Uπ2(π⋅e)]
And use that
Eπ∼ζ[Uπ1(π⋅e)]≤f(e)
Eπ∼ζ′[Uπ2(π⋅e)]≤f′(e)
to get an upper bound of
≤(pf+(1−p)f′)(e)
Ok, we're done. We now know that pζ+(1−p)ζ′ is a probability distribution (mix of two probability distributions). We know that the expectation w.r.t. that distribution of the functions
λe.(d(pζ)d(pζ+(1−p)ζ′)(π)⋅Uπ1+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Uπ2)(π⋅e)
undershoots pf+(1−p)f′. And since this is a mix of two existing utility functions, it'll be in [0,1] if the original utility functions were in that range already (or bounded if the originals were). Taking stock of what we've done so far, we've shown that
pE(f)+(1−p)E(f′)≤Eπ∼pζ+(1−p)ζ′[Θ(π)(d(pζ)d(pζ+(1−p)ζ′)(π)⋅Uπ1+d((1−p)ζ′)d(pζ+(1−p)ζ′)(π)⋅Uπ2)]
but the probability distribution, and functions used, ends up mixing to undershoot pf+(1−p)f′. So we can impose an upper bound of:
≤supζ′′,Uπ3,pf+(1−p)f′Eπ∼ζ′′[Θ(π)(Uπ3)]
which then packs up to equal
=E(pf+(1−p)f′)
and we're done, concavity has been shown.
T5.1.2.4 The compact-almost-support property on E is trivial since E is a compact space.
T5.1.2.5 For normalization, we'll need to do it in two parts, for both 0 and 1.
T5.1.2.5.1 First, we show normalization for 1, which requires showing both directions of the inequality. We start with
E(1)=supζ,Uπ,1Eπ∼ζ[Θ(π)(Uπ)]
and a possible choice that the supremum ranges over is the dirac-delta distribution on the π′ with the worst value of Θ(π)(1) (which is 1, by normalization for belief functions), and all the Uπ are 1. So then we get
≥Θ(π′)(1)=infπΘ(π)(1)=1
A lower bound of 1. For getting an upper bound, we have
E(1)=supζ,Uπ,1Eπ∼ζ[Θ(π)(Uπ)]
and then, the supremum ranges over ζ,Uπ where ∀e:Eπ∼ζ[Uπ(π⋅e)]≤1
Now, 1 can be written as U1(π′⋅e) where U1 is the utility function which maps everything to 1, and π′ is the policy with the worst value of Θ(π)(1), and we can apply our alternate formulation of causality for Θ to guarantee that
Eπ∼ζ[Θ(π)(Uπ)]≤Θ(π′)(U1)
regardless of our choice of ζ,Uπ. And so, we can impose an upper-bound of
≤Θ(π′)(U1)=Θ(π′)(1)=infπΘ(π)(1)=1
and we're done, we've got inequalities going both way, so E(1)=1.
T5.1.2.5.2 Now for 0. We have
E(0)=supζ,Uπ,0Eπ∼ζ[Θ(π)(Uπ)]
and one thing that the supremum ranges over is the dirac-delta on π′ which attains the worst value of Θ(π)(0), and all the Uπ being 0, so we get
≥Θ(π′)(0)=infπΘ(π)(0)=0
and we have a lower-bound of 0. For an upper bound,
E(0)=supζ,Uπ,0Eπ∼ζ[Θ(π)(Uπ)]
and then, the supremum ranges over ζ,Uπ where ∀e:Eπ∼ζ[Uπ(π⋅e)]≤0
Now, 1 can be written as U0(π′⋅e) where U0 is the utility function which maps everything to 1, and π′ is the policy with the worst value of Θ(π)(0), and we can apply our alternate formulation of causality for Θ to guarantee that
Eπ∼ζ[Θ(π)(Uπ)]≤Θ(π′)(U0)
regardless of our choice of ζ,Uπ. And so, we can impose an upper-bound of
≤Θ(π′)(U0)=Θ(π′)(0)=infπΘ(π)(0)=0
and we're done, we've got inequalities going both way, so E(0)=0.
T5.1.2.6 Now for Lipschitzness, which we'll need to split in two for the two type signatures.
T5.1.2.6.1 This will be completely provable for the R type signature on infradistributions. What we'll do is show that for any two functions f,f′, when λ⊙is the upper bound on the Lipschitz constant for the belief function Θ, we have
E(f)−E(inf(f,f′))≤λ⊙d(f,f′)
Since both f,f′ must be above inf(f,f′), by monotonicity, this result would show that
|E(f)−E(f′)|≤λ⊙d(f,f′)
and then we'd be done, that's Lipschitzness. So let's get working. We unpack as
E(f)−E(inf(f,f′))=supζ,Uπ1,fEπ∼ζ[Θ(π)(Uπ1)]−supζ′,Uπ2,inf(f,f′)Eπ∼ζ[Θ(π)(Uπ2)]
Consider the supremum of the first term to be attained by some choice of ζ,Uπ1 fulfilling the property that ∀e:Eπ∼ζ[Uπ1(π⋅e)]≤f(e) and then we can rewrite as
=Eπ∼ζ[Θ(π)(Uπ1)]−supζ′,Uπ2,inf(f,f′)Eπ∼ζ[Θ(π)(Uπ2)]
Now, let your choice of ζ′ be ζ, and your choice of the Uπ2 family be Uπ1−d(f,f′). We do need to show that
∀e:Eπ∼ζ[Uπ1(π⋅e)−d(f,f′)]≤inf(f,f′)(e)
This is easy, we can fix an arbitrary e, and go
Eπ∼ζ[Uπ1(π⋅e)−d(f,f′)]=Eπ∼ζ[Uπ1(π⋅e)]−d(f,f′)≤f(e)−d(f,f′)≤inf(f,f′)(e)
So, this choice of ζ′ and Uπ2 works out. Now we can go
Eπ∼ζ[Θ(π)(Uπ1)]−supζ′,Uπ2,inf(f,f′)Eπ∼ζ[Θ(π)(Uπ2)]
≤Eπ∼ζ[Θ(π)(Uπ1)]−Eπ∼ζ[Θ(π)(Uπ1−d(f,f′))]
And then, by monotonicity and a shared Lipschitz constant for the Θ(π), we have
≤Eπ∼ζ[Θ(π)(Uπ1)]−Eπ∼ζ[Θ(π)(Uπ1)−λ⊙d(f,f′)]
=Eπ∼ζ[Θ(π)(Uπ1)]−Eπ∼ζ[Θ(π)(Uπ1)]+λ⊙d(f,f′)=λ⊙d(f,f′)
And we've shown what we came here to show,
E(f)−E(inf(f,f′))≤λ⊙d(f,f′)
f,f′ were arbitrary, so by monotonicity for E, we have
|E(f)−E(f′)|≤λ⊙d(f,f′)
and E has a finite Lipschitz constant for the R type signature.
T5.1.2.6.2 Now for the fiddly bits with the [0,1] type signature. The reason the above proof doesn't work for that type signature is that all our functions have to remain in [0,1], and it's possible that subtracting d(f,f′) from all the Uπ results in functions that go outside of [0,1], and it's way way harder than it looks to patch this.
We have all the other properties, though. What unbounded Lipschitz constant for E (as an expectation functional) looks like from the set point of view is that we've got minimal points in the set of a-measures over E (our E) with unboundedly large measure components. However, all the Θ(π) infradistribution sets have a bounded Lipschitz constant of λ⊙, all their minimal points can have only λ⊙ measure at most.
If you took your set E, and chucked out all the minimal points with measure greater than λ⊙ and built up an infradistribution E′ from the remaining minimal points, E′ would produce Θ again if you went back to a belief function. Why?
Well, the minimal points of an inframeasure/distribution are the truly critical part of them. As long as two sets of a-measures agree on which a-measures are minimal points, they're completely 100 percent equivalent in all respects. We travel back to a belief function Θ(π) via applying the pushforward (π⋅)∗(E). Doing this pushforward, in the set view keeps all the measure components of a-measures the same. So, chucking out all the a-measures in E with too much measure doesn't throw out any a-measures in E which are necessary to produce a minimal point in some Θ(π) (all of those have λ⊙ measure or less). When we apply the pushforward to E′ instead of E, we get a smaller set, sure, but the set perfectly agrees with the pushforward of E on the minimal points, so the sets they produce are equivalent. You get the same belief function back.
T5.1.2.7 Now for the last part, showing that we get the causality condition on E. Using our rewrite of E, we have
E(f)=supζ,Uπ,fEπ∼ζ[(Θ(π))(Uπ)]
and then, by causality for Θ, we get
=supζ,Uπ,fEπ∼ζ[supζ′π,Uπ′π,Uπ(π⋅∙)Eπ′∼ζ′π[(Θ(π′))(Uπ′π)]]
Note that there's a dependence of the second probability distribution and choice of utility functions on the the π selected, because we have a separate sup for each π. Since our rewrite of E(f) was
E(f)=supζ,Uπ,fEπ∼ζ[(Θ(π))(Uπ)]
we can abbreviate the interior of the big equation we have to get
=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
Which is the causality condition for E.
T5.1.3 For 3-static to 3-dynamic, it's easy, since ψT:=E so it can just copy over the relevant causality property from E, and we specified that the transition dynamics are exactly what we need, →T(e):=⊤A⋉(λa.δe(a),ea,e(a))
T5.1.4 For 3-dynamic to 3-static, this will be tricky, so first we set up some preliminaries. The translation process is unrolling the action-observation sequence to get an infradistribution over destinies, and then turning that into a belief function, and then turning that back into a set of copolicies. Ie,
E:=⋀π(π⋅)∗(u1∼π(pr(A×O)ω∗(ψT⋉→T:∞)))
We can save on some work. We recently established that going from a causal belief function to the space of copolicies produces an infradistribution over copolicies with the causality condition. So, actually, all we need to do is show that doing the first two steps (and skipping the third) makes a causal belief function, and we'll be done.
This is essentially going from third-person dynamic to third-person static (in destiny form), to first-person static, so we should expect it to be a bit hard. So our goal is that
Θ(π):=u1∼π(pr(A×O)ω∗(ψT⋉→T:∞))
ends up making a causal belief function.
Obviously, we need to show the causality property for Θ. However, for all other belief function conditions, we get them for free as long as we have an actual infradistribution over histories (this can be shown by taking the arguments that you get a belief function from translating over an infradistribution back in the Acausal Square Theorem, and adapting them to the pseudocausal case, which is pretty easy). So, actually, all we need to show is that Θ is causal, and that pr(A×O)ω∗(ψT⋉→T:∞) is an infradistribution over histories, assuming that ψT is an infradistribution over copolicies.
So, the proof splits into four parts where we verify the niceness conditions on →T in order to show that the above line is an infradistribution over histories, and one part where we verify causality for Θ
First up, verifying the niceness conditions for →T so →T:∞ is well-defined.
T5.1.4.1 We'll start with lower-semicontinuity, though actually, we'll be going for something stronger, continuity.
limn→∞→T(en)(f)=limn→∞(⊤A⋉(λa.δen(a),ena,en(a)))(λa,o,e.f(a,o,e))
and then unpack the semidirect product and subsitute the dirac-delta values in to get
=limn→∞⊤A(λa.f(a,en(a),ena,en(a)))
and unpack what ⊤A means to get
=limn→∞infaf(a,en(a),ena,en(a))
There are finitely many actions, so we can shift the limit in, to get
=infalimn→∞f(a,en(a),ena,en(a))
Now, as en approaches e∞, the observation returned for the first action stabilizes, so we get
=infalimn→∞f(a,e∞(a),ena,e∞(a))
And then, due to continuity of f, and ena,e∞(a) converging to e∞a,e∞(a) because en converges to e∞, we have
=infaf(a,e∞(a),e∞a,e∞(a))
Which then packs up in reverse as
=⊤A(λa.f(a,e∞(a),e∞a,e∞(a)))
=(⊤A⋉(λa.δe∞(a),e∞a,e∞(a)))(λa,o,e.f(a,o,e))=→T(e∞)(f)
And continuity has been shown. This inducts up to show continuity for →T:∞.
T5.1.4.2 1-Lipschitzness holds because →T is the semidirect product of a crisp infradistribution (⊤A) (and all crisp infradistributions are 1-Lipschitz) with a crisp infrakernel (also 1-Lipschitz), and the semidirect product of a 1-Lipschitz infrakernel and a 1-Lipschitz infradistribution is 1-Lipschitz.
T5.1.4.3 Compact-shared almost-compact-support is trivial because A×O×E is a compact space.
T5.1.4.4 For "constants increase", actually, crisp infrakernels map constants to the same constant. So we get a stronger result, that →T(e)(c)=c, and in particular, →T:∞(e)(c)=c (the property inducts up appropriately). This also has the implication that all the →T:∞(e) are infradistributions, as they map 0 to 0 and 1 to 1. This also bags the last condition, 1-normalization. That's all niceness conditions, so →T:∞is well-defined.
Now that we know it's well-defined at all, our goal is to show that
pr(A×O)ω∗(ψT⋉→T:∞)
is an infradistribution. This is, again, easy. The semidirect product of an infradistribution and a continuous infrakernel that returns infradistributions is an infradistribution, and the projection of an infradistribution is an infradistribution.
T5.1.4.5 With that, all that remains is showing that Θ, defined as
Θ(π′):=u1∼π′(pr(A×O)ω∗(ψT⋉→T:∞))
fulfills the causality property. We can rewrite the update and projection to get
Θ(π′)(U)=(ψT⋉→T:∞)(λe0,aoe1:∞.1ao1:∞∼π′U(ao1:∞)+1ao1:∞≁π′)
and rewrite the semidirect product
=ψT(λe0.→T:∞(e0)(λaoe1:∞.1ao1:∞∼π′U(ao1:∞)+1ao1:∞≁π′))
and then write this as a projection,
=ψT(λe0.pr(A×O)ω∗(→T:∞(e0))(λao1:∞.1ao1:∞∼π′U(ao1:∞)+1ao1:∞≁π′))
and then write this as an update
=ψT(λe0.u1∼π′(pr(A×O)ω∗(→T:∞(e0)))(λao1:∞.U(ao1:∞)))
Now, we should notice something really important here. pr(A×O)ω∗(→T:∞(e0)), because →T has complete ignorance over the actions, but has the observations and next copolicies determined by interacting with e0, is the infradistribution generated by all probabilistic mixes of histories capable of being produced by e0. We can think of this as being generated by dirac-delta distributions on all the histories compatible with the copolicy e0.
Updating a dirac-delta on a history on "is the history compatible with π′?" will, if it isn't compatible, make the (0,1) a-measure present in all infradistributions (cannibalize the measure into the b term), (or delete the point entirely for the R type signature) and if it is a dirac-delta distribution on a compatible history, leaves it alone. However, there's only one history compatible with π′ and e0, it's δπ′⋅e0, the history made from the two interacting. Thus,
u1∼π′(pr(A×O)ω∗(→T:∞(e0)))=δπ′⋅e0
Making this substitution, we get
=ψT(λe0.δπ′⋅e0(λao1:∞.U(ao1:∞)))
And putting the value in the function, we get
=ψT(λe0.U(π′⋅e0))
Now we can apply the causality condition our ψT has, yielding
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψT(λe0.Uπ(π⋅e0))]
and go in reverse, breaking this out into a dirac-delta to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψT(λe0.δπ⋅e0(Uπ))]
And rewriting this as an update on the projection of the semidirect product initialized on a particular copolicy, yielding
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψT(λe0.u1∼π(pr(A×O)ω∗(→T:∞(e0)))(λao1:∞.Uπ(ao1:∞)))]
And expand the update
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψT(λe0.pr(A×O)ω∗(→T:∞(e0))(λao1:∞.1ao1:∞∼πUπ(ao1:∞)+1ao1:∞≁π))]
And remove the projection
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψT(λe0.→T:∞(e0)(λaoe1:∞.1ao1:∞∼πUπ(ao1:∞)+1ao1:∞≁π))]
And pack up as a semidirect product and projection
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[pr(A×O)ω∗(ψT⋉→T:∞)(λao1:∞.1ao1:∞∼πUπ(ao1:∞)+1ao1:∞≁π)]
and pack up the update
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[u1∼π(pr(A×O)ω∗(ψT⋉→T:∞))(λao1:∞.Uπ(ao1:∞))]
and do just one more rewrite to crunch down the function a little bit
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[u1∼π(pr(A×O)ω∗(ψT⋉→T:∞))(Uπ)]
and pack this up into a belief function like how we defined it, to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
and we're done, we made a causal belief function, and we know that going further from that to the infradistribution on copolicies endows that with the causality condition.
T5.1.5 3-dynamic to 1-dynamic. We don't have to worry about the starting infradistributions, as they're identical to each other, so the relevant causality condition is inherited, we just have to check that the infrakernels transform appropriately. We have
→F(e,a):=prO×E∗(u1{a}(→T(e)))
For any action a, applying our definition, we have
→F(e,a)(f)=prO×E∗(u1{a}(→T(e)))(f)
Unpacking the projection and update, we have
=→T(e)(λa′,o′,e′.1a′=af(o′,e′)+1a′≠a)
And then we consider that →T(e)=⊤A⋉(λa.δe(a),ea,e(a)), so we have
=(⊤A⋉(λa.δe(a),ea,e(a)))(λa′,o′,e′.1a′=af(o′,e′)+1a′≠a)
Unpacking the semidirect product, we have
=⊤A(λa′.δe(a),ea,e(a)(λo′,e′.1a′=af(o′,e′)+1a′≠a))
and substituting the dirac-delta values in, we have
=⊤A(λa′.1a′=af(e(a),ea,e(a))+1a′≠a)
Unpacking what ⊤A is, we have
=infa′1a′=af(e(a),ea,e(a))+1a′≠a)
This is 1 or infinity when a′≠a, so a′=a.
=f(e(a),ea,e(a))
And then we just pull out a dirac-delta
=δe(a),ea,e(a)(f)
and we're done! The infrakernel has the right form.
T5.1.6 3-dynamic to 1-dynamic. We know →T(e):=⊤A⋉(λa.→F(e,a))
and we know →F(e,a)=δe(a),ea,e(a) and we want
→T(e)=⊤A⋉(λa.δe(a),ea,e(a))
This is tautological, no problems here.
T5.1.7 1-static to 1-dynamic. Well, we're going 1-static to 1-dynamic via
ψF:=⋀π(π⋅)∗(Θ(π))
So we can just copy the proof of 1-static to 3-static, the translation functions are the same in both cases. The infrakernel also works out appropriately.
T5.1.8 1-dynamic to 1-static. We define our belief function via
Θ(π′):=pr(A×O)ω∗(ψF⋉→π′F:∞)
This proof will be very similar to third-person dynamic to third-person static, but easier. To cut down on a truly aggravating amount of overhead, we invoke our Theorem 4 on pseudocausal IPOMDP's. Our state space is E. We only need to show that →F fulfills the niceness conditions, and we then know that Θ fulfills all belief function conditions except for maybe normalization.
Our game plan is to show that →F fulfills the niceness conditions on an infrakernel (four parts, since we can sort of lump 1-normalization in with moving constants up), show that Θ fulfills the normalization property (last thing needed to show it's a belief function that Theorem 4 doesn't get), and then show that Θ is causal, for 6 parts in total.
T5.1.8.1 First niceness condition, lower-semicontinuity. We'll go for continuity instead, it's as easy to show.
limn→∞→F(en,an)(f)=limn→∞δen(an),enan,en(an)(λo,e.f(o,e))
=limn→∞f(en(an),enan,en(an))
We know that an settles down to a∞ after finite time because the action space is finite, and en(a) settles down to e∞(a) for any action because eventually the copolicy settles down on what it does for the initial action, so we get
=limn→∞f(e∞(a∞),ena∞,e∞(a∞))
and, in the limit, because f is continuous, and en limits to e∞, we have
=f(e∞(a∞),e∞a∞,e∞(a∞))=δe∞(a∞),e∞a∞,e∞(a∞)(λo,e.f(o,e))
=→F(e∞,a∞)(f)
and we're done, continuity is shown.
T5.1.8.2 Compact-shared compact-almost-support is trivial because O×E is a compact space.
T5.1.8.3,4 1-Lipschitzness, and mapping constants to constants (ie, →F(e,a)(c)=c) are trivial because it's a single probability distribution, and those are always 1-Lipschitz and map constants to constants. In particular, 1 gets mapped to 1 so that's the fifth niceness condition taken care of. The infinite infrakernel is well-defined, and our theorem can be invoked.
T5.1.8.5 To get normalization for Θ, we first observe that since all the →F map constants to the same constant, the same applies to all the →πF,n, and then to the infinite infrakernel →πF:∞. So we have
infπΘ(π)(1)=infπpr(A×O)ω∗(ψF⋉→πF:∞)(1)
=infπ(ψF⋉→πF:∞)(1)=infπψF(λe.→πF:∞(e)(1))
and then, since constants are mapped to the same constant, we have
=infπψF(1)=1
Where that last step was done by normalization for ψF since it's an infradistribution and thus normalized. The exact same argument works for 0, showing that the resulting Θ is normalized and thus a belief function.
T5.1.8.6 All that remains is to show causality for Θ. We want to show that
Θ(π′)(U)=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
To do this, just go
Θ(π′)(U)=pr(A×O)ω∗(ψF⋉→π′F:∞)(λao1:∞.U(ao1:∞))
and reexpress the projection and semidirect product to get
=ψF(λe0.→π′F:∞(e0)(λaoe1:∞.U(ao1:∞)))
and wrap this back up as a projection to get
=ψF(λe0.pr(A×O)ω∗(→π′F:∞(e0))(λao1:∞.U(ao1:∞)))
And then we observe that pr(A×O)ω∗(→π′F:∞(e0))is just δπ′⋅e0, because the infradistribution is "repeatedly play π′ against e0 and record the actions and observations you get", so making that substitution we get
=ψF(λe0.δπ′⋅e0(λao1:∞.U(ao1:∞)))
And then we can substite the dirac-delta in and get
=ψF(λe0.U(π′⋅e0))
and apply the causality condition for ψF to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[ψF(λe0.Uπ(π⋅e0))]
And then we go in reverse, substituting the dirac-delta for pr(A×O)ω∗(→πF:∞(e0)) and then pulling out projections and expressing as a semidirect product and reexpressing it as a projection to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[pr(A×O)ω∗(ψF⋉→πF:∞)(Uπ)]
and then pack it back up to get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
and we're done, that's the last of it! Time for isomorphism directions.
T5.2.1 First, 1-static to 3-static to 1-static. We want to show
E=⋀π(π⋅)∗((π⋅)∗(E))
Breaking down the big intersection, we get
⋀π(π⋅)∗((π⋅)∗(E))(f)=supζ,fπ,fEπ∼ζ[(π⋅)∗((π⋅)∗(E))(λe.fπ(e))]
and breaking down the pullbacks, we get
=supζ,fπ,fEπ∼ζ[supUπ,fπ((π⋅)∗(E))(λh.Uπ(h))]
and reexpressing the pushforward, we get
=supζ,fπ,fEπ∼ζ[supUπ,fπE(λe.Uπ(π⋅e))]
Folding the sups into one, we get
=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
Which, by the causality condition for E, is E(f) and we're done! In fact, the causality condition for E is basically just saying that going 1-static to 3-static to 1-static is identity.
T5.2.2 For 3-static to 1-static to 3-static, we want to show that
Θ(π′)=(π′⋅)∗(⋀π(π⋅)∗(Θ(π)))
Taking the latter thing
(π′⋅)∗(⋀π(π⋅)∗(Θ(π)))(U)
and undoing the pushforward, we get
=⋀π(π⋅)∗(Θ(π))(λe.U(π′⋅e))
And reexpressing the intersection, we get
=supζ,fπ,U(π′⋅∙)Eπ∼ζ[(π⋅)∗(Θ(π))(λe.fπ(e))]
and reexpressing the pullback, we get
=supζ,fπ,U(π′⋅∙)Eπ∼ζ[supUπ,fπΘ(π)(λh.Uπ(h))]
folding the sups into one, we get
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
And, by causality for Θ, this is just
=Θ(π′)(U)
and we're done there.
T5.2.3 For 3-static to 3-dynamic to 3-static, this will be tricky, because third-person dynamic to third-person static is a complicated operation. We want to show that
E=⋀π(π⋅)∗(u1∼π(pr(A×O)ω∗(E⋉→T:∞)))
So, let's take the big thing and start unpacking it.
⋀π(π⋅)∗(u1∼π(pr(A×O)ω∗(E⋉→T:∞)))(f)
we undo the big and, to get
=supζ,fπ,fEπ∼ζ[(π⋅)∗(u1∼π(pr(A×O)ω∗(E⋉→T:∞)))(λe.fπ(e))]
and then reexpress the pullbacks to get
=supζ,fπ,fEπ∼ζ[supUπ,fπ(u1∼π(pr(A×O)ω∗(E⋉→T:∞)))(λao1:∞.Uπ(ao1:∞))]
and then fold the two sups into one to get
=supζ,Uπ,fEπ∼ζ[u1∼π(pr(A×O)ω∗(E⋉→T:∞))(λao1:∞.Uπ(ao1:∞))]
and then reexpress the update to get
=supζ,Uπ,fEπ∼ζ[pr(A×O)ω∗(E⋉→T:∞)(λao1:∞.1ao1:∞∼πUπ(ao1:∞)+1ao1:∞≁π)]
And then reexpress the projection and semidirect product to get
=supζ,Uπ,fEπ∼ζ[E(λe0.→T:∞(e0)(λaoe1:∞.1ao1:∞∼πUπ(ao1:∞)+1ao1:∞≁π))]
And then wrap it back up in a projection to get
=supζ,Uπ,fEπ∼ζ[E(λe0.pr(A×O)ω∗(→T:∞(e0))(λao1:∞.1ao1:∞∼πUπ(ao1:∞)+1ao1:∞≁π))]
And then rewrite this as an update, to get
=supζ,Uπ,fEπ∼ζ[E(λe0.u1∼π(pr(A×O)ω∗(→T:∞(e0)))(λao1:∞.Uπ(ao1:∞)))]
And then, by our earlier arguments when we were doing third-person dynamic to third-person static, we showed
u1∼π(pr(A×O)ω∗(→T:∞(e0)))=δπ⋅e0
So, making that substitution, we have
=supζ,Uπ,fEπ∼ζ[E(λe0.δπ⋅e0(λao:∞.Uπ(ao:∞)))]
Substituting the dirac-delta in, we get
=supζ,Uπ,fEπ∼ζ[E(λe0.Uπ(π⋅e0))]
Renaming our variables a little bit, we have
=supζ,Uπ,fEπ∼ζ[E(λe.Uπ(π⋅e))]
and then, by the causality condition for E, this is just =E(f) and we're done.
T5.2.4 For 3-dynamic to 3-static to 3-dynamic, our desired result is
ψT=⋀π(π⋅)∗(u1∼π(pr(A×O)ω∗(ψT⋉→T:∞)))
And the proof of this is the exact same as going third-person static to third-person dynamic and back, just use ψT instead of E. Also, →F(e) works out as it should.
T5.2.5,6 For translating back and forth between 3-dynamic and 1-dynamic, we observe that both translations keep the initial infradistribution over destinies the same, and our 3-dynamic to 1-dynamic, and 1-dynamic to 3-dynamic proofs verified that the →F and →T behave as they should when they go back and forth, so we don't need to worry about this.
T5.2.7 We're on the last two! For 1-dynamic to 1-static to 1-dynamic, we want to show that
ψF=⋀π(π⋅)∗(pr(A×O)ω∗(ψF⋉→πF:∞))
Taking the big part, we have
⋀π(π⋅)∗(pr(A×O)ω∗(ψF⋉→πF:∞))(f)
and undoing the big and, and the pullback via interaction with π, and merging the sups together as usual, (skipping a few steps we've seen several times before) we have
=supζ,Uπ,fEπ∼ζ[pr(A×O)ω∗(ψF⋉→πF:∞)(λao1:∞.Uπ(ao1:∞))]
Undoing the projection, and the semidirect product, we have
=supζ,Uπ,fEπ∼ζ[ψF(λe0.→πF:∞(e0)(λaoe1:∞.Uπ(ao1:∞)))]
and wrap it up in a projection,
=supζ,Uπ,fEπ∼ζ[ψF(λe0.pr(A×O)ω∗(→πF:∞(e0))(λao1:∞.Uπ(ao1:∞)))]
and then, because pr(A×O)ω∗(→πF:∞(e0))=δπ⋅e0 because unrolling a policy against a copolicy and recording the actions and observations does that, we have
=supζ,Uπ,fEπ∼ζ[ψF(λe0.δπ⋅e0(λao1:∞.Uπ(ao1:∞)))]
Then, substituting the value of the dirac-delta in, we get
=supζ,Uπ,fEπ∼ζ[ψF(λe0.Uπ(π⋅e0))]
And then, by causality for ψF, this is just
=ψF(f)
and we're done there.
T5.2.8 Last one, 1-static to 1-dynamic to 1-static. We want to show that
Θ(π′)=pr(A×O)ω∗((⋀π(π⋅)∗(Θ(π)))⋉→π′F:∞)
Starting at the complicated side, we have
pr(A×O)ω∗((⋀π(π⋅)∗(Θ(π)))⋉→π′F:∞)(U)
and then, undoing the projection,
=((⋀π(π⋅)∗(Θ(π)))⋉→π′F:∞)(λe0,aoe1:∞.U(ao1:∞))
Now, rewriting the semidirect product, we have
=(⋀π(π⋅)∗(Θ(π)))(λe0.→π′F:∞(e0)(λaoe1:∞.U(ao1:∞)))
Rewriting this as a projection, we have
=(⋀π(π⋅)∗(Θ(π)))(λe0.pr(A×O)ω∗(→π′F:∞(e0))(λao1:∞.U(ao1:∞)))
We use the usual trick of this projection of a semidirect product just being the dirac-delta on the copolicy interacting with the policy, and substituting that in to get
=(⋀π(π⋅)∗(Θ(π)))(λe0.U(π′⋅e0))
And then we do the usual unpacking of the intersection and the pullbacks to get:
=supζ,Uπ,U(π′⋅∙)Eπ∼ζ[Θ(π)(Uπ)]
Which, by causality for Θ, is just
=Θ(π′)(U)
and we're done with the isomorphisms.
T5.3 The 2 paths from first-person static to third-person dynamic being equivalent is pretty easy since we've got identity for two of the legs, and the other two legs are just "intersection of preimages of the Θ(π)", ie identical, and we're done!