Theorem 6: Causal IPOMDP's: Any infra-POMDP with transition kernel K:S×Aik→O×S which fulfills the niceness conditions and always produces crisp infradistributions, and starting infradistribution ψ∈□S, produces a causal belief function via Θ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞).
So, first up, we can appeal to Theorem 4 on Pseudocausal IPOMDP's, to conclude that all the Kπ:∞ are well-defined, and that this Θ we defined fulfills all belief function conditions except maybe normalization.
The only things to check are that the resulting Θ is normalized, and that the resulting Θ is causal. The second is far more difficult and makes up the bulk of the proof, so we'll just dispose of the first one.
T6.1 First, K, since it's crisp by assumption, always has K(s,a)(c)=c for all constants c. The same property extends to all the Kπn (they're all crisp too) and then from there to Kπ:∞ by induction. Then we can just use the following argument, where π is arbitrary.

Θ(π)(1)=pr(A×O)ω∗(ψ⋉Kπ:∞)(1)=(ψ⋉Kπ:∞)(1)=ψ(λs0.Kπ:∞(s0)(1))=ψ(1)=1

The last two equalities hold because Kπ:∞ maps constants to the same constant regardless of π, and because ψ is normalized, since it's an infradistribution. This same proof works if you swap out the 1 with a 0, so we have normalization for Θ.
T6.2 All that remains is checking the causality condition. We'll be using the alternate rephrasing of causality. We have Θ defined as
Θ(π′):=pr(A×O)ω∗(ψ⋉Kπ′:∞)
and want to derive our rephrasing of causality: that for any U and π′ and ζ and family Uπ, we have

(∀e:U(π′⋅e)≥Eπ∼ζ[Uπ(π⋅e)])→Θ(π′)(U)≥Eπ∼ζ[Θ(π)(Uπ)]

Accordingly, we get to fix a U,π′,ζ, and family of functions Uπ, and assume

∀e:U(π′⋅e)≥Eπ∼ζ[Uπ(π⋅e)]

Our proof target is

Θ(π′)(U)≥Eπ∼ζ[Θ(π)(Uπ)]

Applying our reexpression of Θ in terms of ψ and Kπ:∞ and projection, we get

pr(A×O)ω∗(ψ⋉Kπ′:∞)(λao1:∞.U(ao1:∞))≥Eπ∼ζ[pr(A×O)ω∗(ψ⋉Kπ:∞)(λao1:∞.Uπ(ao1:∞))]

Undoing the projections on both sides, we get

(ψ⋉Kπ′:∞)(λs0,aos1:∞.U(ao1:∞))≥Eπ∼ζ[(ψ⋉Kπ:∞)(λs0,aos1:∞.Uπ(ao1:∞))]

Unpacking the semidirect product, we get

ψ(λs0.Kπ′:∞(s0)(λaos1:∞.U(ao1:∞)))≥Eπ∼ζ[ψ(λs0.Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞)))]

Now, we can notice something interesting here. By concavity for infradistributions, we have

ψ(λs0.Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))])≥Eπ∼ζ[ψ(λs0.Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞)))]

So, our new proof target becomes

ψ(λs0.Kπ′:∞(s0)(λaos1:∞.U(ao1:∞)))≥ψ(λs0.Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))])
Because if we hit that, we can stitch the inequalities together. The obvious move to show this new result is to apply monotonicity of infradistributions. If the inner function on the left hand side was always above the inner function on the right hand side, we'd have our result by infradistribution monotonicity. So, our new proof target becomes

∀s0:Kπ′:∞(s0)(λaos1:∞.U(ao1:∞))≥Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))]
At this point, now that we've gotten rid of ψ, we can use the full power of crisp infradistributions. Crisp infradistributions are just a set of probability distributions! Remember that the semidirect product, ψ⋉K, can be viewed as taking a probability distribution from ψ, and for each x, picking a conditional probability distribution from K(x), and all the probability distributions of that form make up ψ⋉K.
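Since this set picture is what the rest of the proof runs on, here is a minimal finite sketch (all sets and numbers are illustrative, not from the theorem): the infimum over all joint choices from ψ and K agrees with the nested semidirect-product formula ψ(λs.K(s)(λx.f(s,x))).

```python
import itertools

# Minimal finite sketch (illustrative values): a crisp infradistribution is
# a set of probability distributions, and its expectation functional is the
# infimum of ordinary expectations over that set.
S = [0, 1]
X = ["a", "b"]

psi = [{0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8}]              # crisp set over S
K = {0: [{"a": 1.0, "b": 0.0}, {"a": 0.0, "b": 1.0}],   # crisp sets over X,
     1: [{"a": 0.5, "b": 0.5}]}                          # one per state
f = {(0, "a"): 0.3, (0, "b"): 0.9, (1, "a"): 0.1, (1, "b"): 0.7}

# Way 1: enumerate every joint distribution -- pick mu from psi and, for each
# state s, one conditional from K(s) -- and take the infimum.
way1 = min(
    sum(mu[s] * ks[s][x] * f[(s, x)] for s in S for x in X)
    for mu in psi
    for ks in ({0: k0, 1: k1} for k0 in K[0] for k1 in K[1])
)

# Way 2: the nested semidirect-product formula psi(lambda s. K(s)(...)).
way2 = min(
    sum(mu[s] * min(sum(k[x] * f[(s, x)] for x in X) for k in K[s]) for s in S)
    for mu in psi
)

assert abs(way1 - way2) < 1e-9
```

The two agree because Murphy's choice of conditional at each state is made independently, so the inner infimum can be taken state-by-state.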
In particular, the infinitely iterated semidirect product here can be written as a repeated process where, for each history of the form s0,aos1:n,a, Murphy gets to select a probability distribution over the next observation and state from K(sn,a) (as a set of probability distributions) to fill in the next part of the action-observation-state tree. Accordingly every single probability distribution in Kπ:∞(s0) can be written as π interacting with an expanded probabilistic environment eΔ+ of type (A×O×S)<ω×A→Δ(O×S) (which gives Murphy's choice wherever it needs to pick the next observation and state), subject to the constraints that
eΔ+(aos1:n,a)∈K(sn,a)
and
eΔ+(∅,a)∈K(s0,a)
We'll abbreviate this property that eΔ+ always picks stuff from the appropriate K set and starts off picking from K(s0,a) as eΔ+∼K,s0.
Our old proof goal of

∀s0:Kπ′:∞(s0)(λaos1:∞.U(ao1:∞))≥Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))]

can be rephrased as

∀s0:infμ′∈Kπ′:∞(s0)μ′(λaos1:∞.U(ao1:∞))≥Eπ∼ζ[infμπ∈Kπ:∞(s0)μπ(λaos1:∞.Uπ(ao1:∞))]

and then rephrased as

∀s0:infμ′∈Kπ′:∞(s0)Eaos1:∞∼μ′[U(ao1:∞)]≥Eπ∼ζ[infμπ∈Kπ:∞(s0)Eaos1:∞∼μπ[Uπ(ao1:∞)]]

And now, by our earlier discussion, we can rephrase this statement in terms of picking expanded probabilistic environments.

∀s0:infeΔ′+∼K,s0Eaos1:∞∼π′⋅eΔ′+[U(ao1:∞)]≥Eπ∼ζ[infeΔ+π∼K,s0Eaos1:∞∼π⋅eΔ+π[Uπ(ao1:∞)]]

Now, on the right side, since the set we're selecting the expanded environments from is the same each time and doesn't depend on π, we can freely move the inf outside of the expectation. It is unconditionally true that

infeΔ+∼K,s0Eπ∼ζ[Eaos1:∞∼π⋅eΔ+[Uπ(ao1:∞)]]≥Eπ∼ζ[infeΔ+π∼K,s0Eaos1:∞∼π⋅eΔ+π[Uπ(ao1:∞)]]

So now, our proof goal shifts to

∀s0:infeΔ′+∼K,s0Eaos1:∞∼π′⋅eΔ′+[U(ao1:∞)]≥infeΔ+∼K,s0Eπ∼ζ[Eaos1:∞∼π⋅eΔ+[Uπ(ao1:∞)]]
because if that were true, we could pair it up with the always-true inequality to hit our old proof goal. And we'd be able to show this proof goal if we showed the following, more ambitious result.

∀eΔ+:Eaos1:∞∼π′⋅eΔ+[U(ao1:∞)]≥Eπ∼ζ[Eaos1:∞∼π⋅eΔ+[Uπ(ao1:∞)]]
Note that we're not restricting to compatibility with K and s0; we're trying to show it for any expanded probabilistic environment of type (A×O×S)<ω×A→Δ(O×S). We can rephrase this new proof goal slightly, viewing ourselves as selecting an action and observation history.

∀eΔ+:Eao1:∞∼pr(A×O)ω∗(π′⋅eΔ+)[U(ao1:∞)]≥Eπ∼ζ[Eao1:∞∼pr(A×O)ω∗(π⋅eΔ+)[Uπ(ao1:∞)]]
Now, for any expanded probabilistic environment eΔ+:(A×O×S)<ω×A→Δ(O×S), there is a unique corresponding probabilistic environment eΔ of type (A×O)<ω×A→ΔO where, for any policy, pr(A×O)ω∗(π⋅eΔ+)=π⋅eΔ. This is just the classical result that any POMDP can be viewed as an MDP with a state space consisting of finite histories; it still behaves the same. Using this swap, our new proof goal is:

∀eΔ:Eao1:∞∼π′⋅eΔ[U(ao1:∞)]≥Eπ∼ζ[Eao1:∞∼π⋅eΔ[Uπ(ao1:∞)]]
Where eΔ is an arbitrary probabilistic environment. Now, you can take any probabilistic environment eΔ and view it as a mixture of deterministic environments e. So we can rephrase eΔ as, at the start of time, picking an e from ξ∈ΔE for some particular ξ. This is our mixture of deterministic environments that makes up eΔ. The mixture isn't necessarily unique, but for any eΔ there's always some mixture of deterministic environments that produces it. So, our new proof goal becomes:

∀ξ∈ΔE:Ee∼ξ[Eao1:∞∼π′⋅e[U(ao1:∞)]]≥Eπ∼ζ[Ee∼ξ[Eao1:∞∼π⋅e[Uπ(ao1:∞)]]]
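The one-step version of this mixture claim can be checked concretely. A minimal sketch with an assumed toy environment (all names and numbers illustrative): ξ picks each deterministic response independently, and the resulting mixture reproduces eΔ.

```python
import itertools

# A one-step probabilistic environment e_delta : A -> Delta(O), rewritten as
# a mixture xi over deterministic environments e : A -> O (toy illustration;
# all names and numbers are assumptions for the sketch).
A = [0, 1]
O = ["x", "y"]
e_delta = {0: {"x": 0.3, "y": 0.7}, 1: {"x": 0.6, "y": 0.4}}

# xi: choose e(a) ~ e_delta(a) independently for each action a.
xi = {}
for obs_choice in itertools.product(O, repeat=len(A)):
    e = dict(zip(A, obs_choice))
    prob = 1.0
    for a in A:
        prob *= e_delta[a][e[a]]
    xi[obs_choice] = prob

assert abs(sum(xi.values()) - 1.0) < 1e-9

# Whatever single action a policy takes, the xi-mixture of deterministic
# responses reproduces e_delta exactly.
for a in A:
    for o in O:
        mixed = sum(p for choice, p in xi.items() if choice[a] == o)
        assert abs(mixed - e_delta[a][o]) < 1e-9
```

In the multi-step case the same product construction works history-by-history, which is why the mixture always exists even though it need not be unique.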
Since π⋅e is just a dirac-delta distribution on a particular history, we can substitute that in to get the equivalent
∀ξ∈ΔE:Ee∼ξ[U(π′⋅e)]≥Eπ∼ζ[Ee∼ξ[Uπ(π⋅e)]]
Now we just swap the two expectations.
∀ξ∈ΔE:Ee∼ξ[U(π′⋅e)]≥Ee∼ξ[Eπ∼ζ[Uπ(π⋅e)]]
And then we remember that our starting assumption was
∀e:U(π′⋅e)≥Eπ∼ζ[Uπ(π⋅e)]
So we've managed to hit our proof target, and we're done; this propagates back to show causality for Θ.
Theorem 7: Pseudocausal to Causal Translation: The following diagram commutes when the bottom belief function Θ is pseudocausal. The upwards arrows are the causal translation procedures, and the top arrows are the usual morphisms between the type signatures. The new belief function Θc is causal. Also, if U:(A×O)ω→[0,1], and UN:(A×(O+N))ω→[0,1] is defined as 1 for any history containing Nirvana, and U(h) for any nirvana-free history, then ∀π:Θc(π)(UN)=Θ(π)(U).
T7.0 Preliminaries. Let's restate what the translations are. The notation d(e,h) means that e "defends" h. e defends h if it responds to any prefix of h which ends in an action with either the next observation in h, or Nirvana, and responds to any non-prefix of h with Nirvana.
Also, we'll write hN∼π′,h if hN is "compatible" with π′ and h. The criterion for this is that hN∼π′, and that any prefix of hN which ends in an action and contains no Nirvana observations must either be a prefix of h, or only deviate from h in the last action. An alternate way to state that is that hN is capable of being produced by a copolicy which defends h, interacting with π′.
With these two notes, we introduce the infrakernels Kd:(A×O)ωik→E, and Kπ′:(A×O)ωik→(A×(O+N))ω.
Kd(h) is total uncertainty over e such that e defends h. Kd(h)(f):=infe:d(e,h)f(e)
And Kπ′(h) is total uncertainty over hN such that hN is compatible with π′ and h. Kπ′(h)(UN):=infhN∼π′,hUN(hN)
Now, to briefly recap the three translations. For first-person static translation, we have
Θc(π′):=pr(A×(O+N))ω∗(⊤Π⋉Θ⋉Kπ′)
Where Θ is the original belief function.
For third-person static translation, we have
E:=prE∗(D⋉Kd)
Where D is the original infradistribution over (A×O)ω.
For first-person dynamic translation, the new infrakernel has type signature
((A×O)ω+N)×Aik→((O+N)×((A×O)ω+N))
Ie, the state space is destinies plus a single Nirvana state, and the observation space is the old space of observations plus a Nirvana observation. We write N for both the Nirvana observation and the Nirvana state. The transition dynamics are:
∀a′∈A:→N(N,a′):=δN,N
and, if a′≠a, then →N(aoh,a′):=δN,N
and otherwise, →N(aoh,a):=δo,h∨δN,N
The initial infradistribution is i∗(ψF), the pushforward of ψF along the obvious injection from (A×O)ω into (A×O)ω+N.
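As a sanity check on these transition dynamics, here is a toy finite-horizon unrolling (all names, and the 2-step truncation, are illustrative assumptions): Murphy can unroll the destiny to completion, cut it short with Nirvana, or be forced into Nirvana by a deviation.

```python
# Toy unrolling of the dynamic-translation kernel ->N over a 2-step horizon.
# States are destiny suffixes (tuples of (action, observation) pairs) plus a
# Nirvana state; every concrete name and the horizon cutoff are illustrative.
N = "N"  # the Nirvana observation; "NIRV" below is the Nirvana state

def step(state, action):
    """The crisp set of (observation, next state) pairs the kernel allows."""
    if state == "NIRV":
        return [(N, "NIRV")]          # Nirvana is absorbing
    (a, o), rest = state[0], state[1:]
    if action != a:
        return [(N, "NIRV")]          # deviating from the destiny forces Nirvana
    return [(o, rest), (N, "NIRV")]   # follow the destiny, or end early in Nirvana

def unroll(policy, destiny, horizon=2):
    """All action-observation histories Murphy can produce from `destiny`."""
    frontier = [((), destiny)]        # (history so far, current state)
    for _ in range(horizon):
        nxt = []
        for hist, state in frontier:
            a = policy(hist)
            for o, s2 in step(state, a):
                nxt.append((hist + ((a, o),), s2))
        frontier = nxt
    return {hist for hist, _ in frontier}

h = (("L", "u"), ("R", "v"))          # the destiny
pol = lambda hist: h[len(hist)][0] if len(hist) < len(h) else "L"

# The policy follows h, so Murphy's options are: full unrolling, or Nirvana
# entered at either step (and Nirvana, once entered, persists).
assert unroll(pol, h) == {
    (("L", "u"), ("R", "v")),   # h unrolled to completion
    (("L", "u"), ("R", N)),     # Nirvana entered at the last step
    (("L", N), ("R", N)),       # Nirvana entered immediately, then absorbing
}
```

With a policy that deviates from h, the only reachable histories would be the forced-Nirvana ones, matching the second transition rule.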
The obvious way to show this theorem is that we can assume the bottom part of the diagram commutes and ψF,Θ,D are all pseudocausal, and then show that all three paths to Θc produce the same result, and it's causal. Then we just wrap up that last part about the utility functions.
Fortunately, assuming all three paths produce the same Θc belief function, it's trivial to show that it's causal. i∗(ψF) is an infradistribution since ψF is, and →N is a crisp infrakernel, so we can just invoke Theorem 6 that crisp infrakernels produce causal belief functions to show causality for Θc. So we only have to check that all three paths produce the same result, and show the result about the utility functions. Rephrasing one of the paths is the first phase, two more phases show that the other two paths hit the same target, and the last phase shows our result about the utility functions.
We want to show that
Θc(π′)=(π′⋅)∗(E)=pr(A×(O+N))ω∗(i∗(ψF)⋉→π′N:∞)
T7.1 First, let's try to rephrase Θc(π′)(UN) into a more handy form for future use. Invoking our definition of Θc(π′), we have:
Θc(π′)(UN)=pr(A×(O+N))ω∗(⊤Π⋉Θ⋉Kπ′)(UN)
Then, we undo the projection
=(⊤Π⋉Θ⋉Kπ′)(λπ,h,hN.UN(hN))
And unpack the semidirect product
=⊤Π(λπ.Θ(π)(λh.Kπ′(h)(λhN.UN(hN))))
And unpack what ⊤Π means
=infπΘ(π)(λh.Kπ′(h)(λhN.UN(hN)))
And then apply the definition of Kπ′(h)
=infπΘ(π)(λh.infhN∼π′,hUN(hN))
This will be our final rephrasing of Θc(π′)(UN). Our job is to derive this result for the other two paths.
T7.2 First up, the third-person static translation. We can start with definitions.
(π′⋅)∗(E)(UN)=(π′⋅)∗(prE∗(D⋉Kd))(UN)
And then, we can recall that since D and Θ are assumed to commute, we can define D in terms of Θ via D=pr(A×O)ω∗(⊤Π⋉Θ). Making this substitution, we get
=(π′⋅)∗(prE∗((pr(A×O)ω∗(⊤Π⋉Θ))⋉Kd))(UN)
We undo the first pushforward
=prE∗((pr(A×O)ω∗(⊤Π⋉Θ))⋉Kd)(λe.UN(π′⋅e))
And then the projection
=((pr(A×O)ω∗(⊤Π⋉Θ))⋉Kd)(λh,e.UN(π′⋅e))
And unpack the semidirect product
=pr(A×O)ω∗(⊤Π⋉Θ)(λh.Kd(h)(λe.UN(π′⋅e)))
And remove the projection
=(⊤Π⋉Θ)(λπ,h.Kd(h)(λe.UN(π′⋅e)))
and unpack the semidirect product and what ⊤Π is
=infπΘ(π)(λh.Kd(h)(λe.UN(π′⋅e)))
Now, we recall that Kd(h) is all the e that defend h, to get
=infπΘ(π)(λh.infe:d(e,h)UN(π′⋅e))
At this point we can realize that for any e which defends h, when e interacts with π′, it makes a hN that is compatible with π′ and h. Further, all hN compatible with π′ and h have an e which produces them. If hN has its nirvana-free prefix not being a prefix of h, then the defending e which doesn't end h early, just sends deviations to Nirvana, is capable of producing it. If hN looks like h but ended early with Nirvana, you can just have a defending e which ends with Nirvana at the same time. So we can rewrite this as
=infπΘ(π)(λh.infhN∼π′,h:UN(hN))
And this is exactly the form we got for Θc(π′)(UN).
T7.3 Time for the third and final translation. We want to show that
pr(A×(O+N))ω∗(i∗(ψF)⋉→π′N:∞)(UN)
is what we want. Because the pseudocausal square commutes, we have that ψF=pr(A×O)ω∗(⊤Π⋉Θ), so we can make that substitution to yield

=pr(A×(O+N))ω∗(i∗(pr(A×O)ω∗(⊤Π⋉Θ))⋉→π′N:∞)(UN)

Unpacking the projections, the pushforwards, the semidirect product, and ⊤Π exactly as in the previous two phases, this becomes

=infπΘ(π)(λh.pr(A×(O+N))ω∗(→π′N:∞(h))(λaoN1:∞.UN(aoN1:∞)))
For this notation, we're using sN to refer to an element of (A×O)ω+N, the state space. aosN1:∞ is an element of (A×(O+N)×((A×O)ω+N))ω, a full unrolling. aoN1:∞ is the projection of this to (A×(O+N))ω.
And now, we can ask what pr(A×(O+N))ω∗(→π′N:∞(h)) is. Basically, the starting state is the destiny h, and then π′ interacts with it through the kernel →N. Due to how →N is defined, it can do one of three things. First, it could just unroll h all the way to completion, if h is the sort of history that's compatible with π′. Second, it could partially unroll h and deviate to Nirvana early, even if π′ is compatible with more, because of how →N(aoh,a) is defined. Finally, π′ could deviate from h and then →N would have to respond with Nirvana. And of course, once you're in a Nirvana state, you're in there for good. Hm, you have total uncertainty over histories hN compatible with π′ and h, any of them at all could be made by pr(A×(O+N))ω∗(→π′N:∞(h)). So, this projection of the infrakernel unrolling h is just Kπ′(h). We get
=infπΘ(π)(λh.Kπ′(h)(λhN.UN(hN)))
And then we use what Kπ′(h) does to functions, to get
=infπΘ(π)(λh.infhN∼π′,hUN(hN))
Which is exactly the form we need. So, all three translation procedures produce identical belief functions.
T7.4 The only thing which remains to wrap this result up is guaranteeing that, if UN:(A×(O+N))ω→[0,1] is 1 if the history contains Nirvana, (or infinity for the R type signature), and U otherwise, then Θc(π′)(UN)=Θ(π′)(U).
We rephrased Θc(π′)(UN) as infπΘ(π)(λh.infhN∼π′,hUN(hN))
We can notice something interesting. If h is the sort of history that π′ is capable of making, ie h∼π′, then all of the hN∼π′,h are going to end up in Nirvana except h itself, because all the hN are like "follow π′, any deviations of π′ from h go to Nirvana, and also Nirvana can show up early". In this case, since any history ending with Nirvana is maximum utility, and otherwise U is copied, we have infhN∼π′,hUN(hN)=U(h). However, if h is incompatible with π′, then no matter what, a history hN which follows π′ is going to hit Nirvana and be assigned maximum utility. So, we can rewrite
infπΘ(π)(λh.infhN∼π′,hUN(hN))
as
=infπΘ(π)(λh.1h∼π′U(h)+1h≁π′)
Which can be written as an update, yielding
=infπu1∼π′(Θ(π))(U)
And then, we remember that regardless of π′ and π, for pseudocausal belief functions, which Θ is, we have that u1∼π′(Θ(π))(U)≥Θ(π′)(U), so the update assigns higher expectation values than Θ(π′) does. Also, u1∼π′(Θ(π′))=Θ(π′). Thus, the infimum is attained by π′, and we get
=Θ(π′)(U)
and we're done.
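The two facts doing the work in T7.4 — the update is a no-op on measures supported on π′-compatible histories, and otherwise only raises expectations — can be sketched on a finite toy model (all histories and numbers are illustrative):

```python
# Finite sketch of the update u_{1 ~ pi'} (histories/numbers illustrative):
# replace U with 1 off the set of pi'-compatible histories. This is a no-op
# for measures supported on the compatible set, and only raises expectations
# for measures with incompatible mass.
U = {"h1": 0.2, "h2": 0.7, "h3": 0.9}
compat = {"h1", "h2"}                 # histories compatible with pi'

def updated(mu):
    """Expectation after the update: U on compat, 1 off compat."""
    return sum(p * (U[h] if h in compat else 1.0) for h, p in mu.items())

def plain(mu):
    return sum(p * U[h] for h, p in mu.items())

mu_pi_prime = {"h1": 0.5, "h2": 0.5}  # supported on compat, like Theta(pi')
mu_other = {"h1": 0.2, "h3": 0.8}     # carries incompatible mass

assert abs(updated(mu_pi_prime) - plain(mu_pi_prime)) < 1e-9  # no-op
assert updated(mu_other) > plain(mu_other)                    # only raises
```

These two facts are exactly why the infimum over π in the rephrased expression is attained at π′.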
Theorem 8: Acausal to Pseudocausal Translation: The following diagram commutes when the bottom belief function Θ is acausal. The upwards arrows are the pseudocausal translation procedures, and the top arrows are the usual morphisms between the type signatures. The new belief function Θp will be pseudocausal.
T8.0 First, preliminaries. The two translation procedures are, to go from an infradistribution over policy-tagged destinies D, to an infradistribution over destinies Dp, we just project.
Dp:=pr(A×O)ω∗(D)
And, to go from an acausal belief function Θ to a pseudocausal one Θp, we do the translation
Θp(π′):=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))
We assume that D and Θ commute, and try to show that these translation procedures result in a Θp and Dp which are pseudocausal and commute with each other. So, our proof goals are that
Θp(π′)=u1∼π′(Dp)
and
Dp=pr(A×O)ω∗(⊤Π⋉Θp)
Once we've done that, it's easy to get that Θp is pseudocausal. This occurs because the projection of any infradistribution is an infradistribution, so we can conclude that Dp is an infradistribution over (A×O)ω. Also, back in the Pseudocausal Commutative Square theorem, we were able to derive that any infradistribution Dp over (A×O)ω, when translated to a Θp, makes a pseudocausal belief function; no special properties were used there except being an infradistribution.
T8.1 Let's address the first one, trying to show that
Θp(π′)=u1∼π′(Dp)
We'll take the left side, plug in some function, and keep unpacking and rearranging until we get the right side with some function plugged in. First up is applying definitions.
Θp(π′)(U)=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))(U)
by how Θp was defined. Now, we start unpacking. First, the projection.
=(⊤Π⋉(λπ.u1∼π′(Θ(π))))(λπ,h.U(h))
Then unpack the semidirect product and ⊤Π, for
=infπu1∼π′(Θ(π))(λh.U(h))
Then unpack the update, to get
=infπΘ(π)(λh.1h∼π′U(h)+1h≁π′)
Then pack up the infπ
=⊤Π(λπ.Θ(π)(λh.1h∼π′U(h)+1h≁π′))
And pack up the semidirect product
=(⊤Π⋉Θ)(λπ,h.1h∼π′U(h)+1h≁π′)
And since D=⊤Π⋉Θ for the original acausal belief function, we can rewrite this as
=D(λπ,h.1h∼π′U(h)+1h≁π′)
And pack this up in a projection.
=pr(A×O)ω∗(D)(λh.1h∼π′U(h)+1h≁π′)
And then, since we defined Dp:=pr(A×O)ω∗(D), we can pack this up as
=Dp(λh.1h∼π′U(h)+1h≁π′)
And then pack up the update
=u1∼π′(Dp)(U)
So, that's one direction done.
T8.2 Now for the second result, showing that
Dp=pr(A×O)ω∗(⊤Π⋉Θp)
Again, we'll take the more complicated form, and write it as the first one, for arbitrary functions U. We begin with

pr(A×O)ω∗(⊤Π⋉Θp)(U)=infπ′Θp(π′)(λh.U(h))=infπ′infπu1∼π′(Θ(π))(λh.U(h))

using the unpacking of Θp(π′) from T8.1.
Let's fold the two infs into one inf, this doesn't change anything, just makes things a bit more concise.
=infπ′,πu1∼π′(Θ(π))(λh.U(h))
and unpack the update.
=infπ′,πΘ(π)(λh.1h∼π′U(h)+1h≁π′)
Now, we can note something interesting here. If π′ is selected to be equal to π, that inner function is just U, because Θ(π) is only supported over histories compatible with π. If π′ is unequal to π, that inner function is U with some inputs possibly raised to 1. So, since U is bounded in [0,1], our choice of π′ is optimal for minimizing when it's equal to π, and we can rewrite as
=infπΘ(π)(λh.U(h))
and write this as ⊤Π
=⊤Π(λπ.Θ(π)(λh.U(h)))
And pack up as a semidirect product
=(⊤Π⋉Θ)(λπ,h.U(h))
and then because D=⊤Π⋉Θ, we substitute to get
=D(λπ,h.U(h))
and then write as a projection.
=pr(A×O)ω∗(D)(U)
Now, this is just Dp, yielding
=Dp(U)
and we're done.
Proposition 2: The two formulations of x-pseudocausality are equivalent for crisp acausal belief functions.
One formulation of x-pseudocausality is that
∀π,π′:u1∼π′(Θ(π))⊆Θx(π′) where Θx(π′)(U):=Θ(π′)(inf(x,U))
The second formulation is,
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
This proof will be slightly informal; it can be fully formalized without too many conceptual difficulties. We need to establish some basic probability theory results to work on this. The proof outline is: we establish/recap four probability-theory and inframeasure-theory results, and then show the two implication directions of the iff statement.
P2.1 Our first claim is that, given any two probability distributions μ and ν, you can shatter them into measures m∗,mμ, and mν such that m∗+mμ=μ and m∗+mν=ν, and mμ(1)=mν(1)=dTV(μ,ν). What this decomposition is basically doing is finding the largest chunk of measure that μ and ν agree on (that's m∗), and then subtracting that out from μ and ν yields you your mμ and mν. These measures have no overlap, because if there was overlap, you could put that overlap into the m∗ chunk of measure. And, the mass of the "unique to μ" piece must equal the mass of the "unique to ν" piece because they must add up to probability distributions, and this quantity is exactly the total variation distance of μ,ν.
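This shattering is easy to exhibit on a finite space; a minimal sketch with illustrative numbers:

```python
# Shattering two finite distributions mu and nu into a shared chunk m_star
# plus disjoint leftovers m_mu and m_nu (numbers illustrative): the leftover
# masses coincide and equal the total variation distance.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.1, "b": 0.3, "c": 0.6}
pts = sorted(set(mu) | set(nu))

m_star = {p: min(mu[p], nu[p]) for p in pts}   # largest chunk mu, nu agree on
m_mu = {p: mu[p] - m_star[p] for p in pts}     # unique to mu
m_nu = {p: nu[p] - m_star[p] for p in pts}     # unique to nu

tv = 0.5 * sum(abs(mu[p] - nu[p]) for p in pts)

assert all(m_mu[p] == 0 or m_nu[p] == 0 for p in pts)  # disjoint supports
assert abs(sum(m_mu.values()) - tv) < 1e-9
assert abs(sum(m_nu.values()) - tv) < 1e-9
```

Taking the pointwise minimum is exactly "the largest chunk of measure that μ and ν agree on," and the equal leftover masses are the total variation distance.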
P2.2 Our second claim is that
x⋅dTV(μ,ν)=supU:(A×O)ω→[0,x]|μ(U)−ν(U)|
Why is this? Well, there's an alternate formulation of total variation distance where
dTV(μ,ν)=supU:(A×O)ω→[0,1]|μ(U)−ν(U)|
As intuition for this, we can pick a function U which is 1 on the support of mμ, and 0 elsewhere. In such a case, we'd get

|μ(U)−ν(U)|=|m∗(U)+mμ(U)−m∗(U)−mν(U)|=|mμ(U)−mν(U)|
and then remember that mμ and mν have disjoint supports, so we get
=|mμ(1)−mν(0)|=mμ(1)=dTV(μ,ν)
And any other function in the [0,1] range wouldn't do as well at attaining different expectation values on mμ and mν. So that's where the identity comes from. If we try to do the optimal separation but instead restrict U to be in [0,x], we'd get
=|mμ(x)−mν(0)|=mμ(x)=x⋅mμ(1)=x⋅dTV(μ,ν)
and any other utility function in [0,x] wouldn't do as well at separating the two.
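A brute-force check of this claim on a three-point space (the numbers and the grid are illustrative; the sup of the linear objective over the [0,x] box is attained at a vertex, and the grid below contains every vertex value):

```python
import itertools

# Brute-force check of x * d_TV = sup over [0,x]-valued U of |mu(U) - nu(U)|
# on a three-point space (all numbers illustrative).
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.1, "b": 0.3, "c": 0.6}
pts = sorted(mu)
x = 1 / 3
tv = 0.5 * sum(abs(mu[p] - nu[p]) for p in pts)

def gap(U):
    return abs(sum((mu[p] - nu[p]) * U[p] for p in pts))

grid = [x * k / 4 for k in range(5)]  # includes the endpoints 0 and x
best = max(gap(dict(zip(pts, vals)))
           for vals in itertools.product(grid, repeat=len(pts)))

assert abs(best - x * tv) < 1e-9
```

The maximizer puts x where μ exceeds ν and 0 where ν exceeds μ, matching the "x on mμ, 0 on mν" description above.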
P2.3 Our third claim is that the function which maps f to −∞ if f goes outside of [0,x], and otherwise maps f to ψ(f), corresponds to taking the set of sa-measures that is Ψ (the set form of ψ), and taking the upper completion of it under all signed-measure, b pairs where b+x⋅m−(1)≥0. Call such a pair an x-sa measure. Ie, if x=1/3, then one unit of +b is big enough to cancel out 3 units of negative measure in the signed measure. Clearly, if f:X→[0,x], and (m,b) has b+x⋅m−(1)≥0 (is an x-sa measure), then we have
m(f)+b=m+(f)+m−(f)+b≥m+(0)+m−(x)+b=x⋅m−(1)+b≥0
So, adding in the cone of all x-sa pairs like that doesn't actually affect the expectation values of any function bounded in [0,x] at all. However, for any function which goes over the x upper-bound, you can consider a x-sa measure which concentrates all its large amount of negative measure on a point where the function exceeds the upper bound, and exactly compensates for it with the +b term. Adding this on can make the function, evaluated at said point, have an unboundedly negative expectation value, justifying our choice of −∞ for the expectation value.
P2.4 Time for the fourth claim. To build up to it, the monotone hull of a functional C(X,[0,x])→[0,1] is as follows: if ψx is the original functional, then ψm is the monotone hull defined by
ψm(f):=supf′:X→[0,x]:f′≤fψx(f′)
Our fourth claim is that, in the set-based view, this operation corresponds to restricting the set of x-sa measures that is Ψx to just the a-measures in it. Why? Well, we have
∀f∈CB(X):ψm(f)≥ψx(f)
This is because ψx and ψm both agree that any function with negative parts gets −∞ value, and both agree on the expectation of any function bounded in [0,x], but for functions above 0 which also stray above x somewhere, ψx(f)=−∞, while ψm(f)≥0.
Therefore, since ψm≥ψx (when considered as a function, we aren't using the infradistribution reversed ordering here), we have that Ψm⊆Ψx (higher functions correspond to smaller sets). Now, Ψm cannot contain any x-sa-measures with negative measure anywhere in the measure component. For, if there was such a point in there, we could consider a function f that assigned extremely high value to the region of negative measure, and 0 value elsewhere, and f would attain a negative expectation value on that x-sa measure, which contradicts the nonnegative worst-case expectation value of f due to how ψm was defined. So, the set Ψm must consist entirely of a-measures. Further, for any function f with range in [0,x], it is minimized by an a-measure in Ψx, and the same point is present in Ψx∩Ma (no extra a-measures added), so the ϕ corresponding to that set must assign the exact same expectation value to f as ψx and ψm do. Also, any set consisting entirely of a-measures must have its expectation functional be monotone.
So, recapping what we've done so far, we know that Ψm⊆Ψx. We know that Ψm must consist entirely of a-measures, otherwise we'd get a contradiction with monotonicity. Further, Ψx∩Ma (the biggest set that Ψm could possibly be) has its expectation functional ϕ agree with ψx on the expectations of functions in [0,x], and it's monotone like ψm is. Now, if Ψm⊂Ψx∩Ma (strict subset), we'd have the functional ϕ corresponding to the intersection being strictly below ψm on some function. However, ψm is the minimal monotone function that agrees with ψx on the expectations of functions in [0,x], so this rules out the strict subset, and we must have equality. The monotone hull of an expectation functional corresponds to restricting the set to a-measures. Further, we can rewrite the definition
ψm(f):=supf′:(A×O)ω→[0,x]:f′≤fψx(f′)
as just
ψm(f):=ψx(inf(x,f))
And we're done with this.
Putting claims 3 and 4 together, we can now express what the set Θx(π′) is, from our definition of Θx(π′)(U):=Θ(π′)(inf(x,U)). Θx(π′) (interpreted as a set of sa-measures) is just the upper completion of Θ(π′) under x-sa measures, those with b+x⋅m−(1)≥0 (this is the restriction to [0,x]), and then the intersection of that set with the cone of a-measures (monotone hull), and then upper completion under sa-measures (the restriction to [0,1]).
P2.5 Time to begin showing one direction of the iff statement. Remember, we're assuming that all the Θ(π) are induced by a set of probability distributions, they're crisp. We'll start with the easy direction, that
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
implies
∀π,π′,U:u1∼π′(Θ(π))(U)≥Θx(π′)(U)
where Θx(π′)(U):=Θ(π′)(inf(x,U)) (well, technically, U shouldn't range over just continuous functions, but lower-semicontinuous functions as well). To begin with, let π,π′,U be arbitrary with U bounded in [0,1]. We can start unpacking
u1∼π′(Θ(π))(U)=Θ(π)(λh.1h∼π′U(h)+1h≁π′)
Now, this minimal value must be attained by some minimal point in Θ(π), which is some probability distribution μ. So, we have
=μ(λh.1h∼π′U(h)+1h≁π′)
We can, at this point, select the closest probability distribution to μ in Θ(π′) according to total variation distance, call that ν. We do our usual shattering of μ into m∗ and mμ, the chunk of measure where μ and ν overlap, and the chunk of measure that's unique to μ. Note that the chunk of measure where μ and ν overlap must be supported on histories compatible with π′ since ν∈Θ(π′), so we can break things down as
=m∗(U)+mμ(λh.1h∼π′U(h)+1h≁π′)
And then we can break mμ down further into mμ,∼π′ and mμ,≁π′, for the chunk of measure on histories compatible with π′ and incompatible, respectively, yielding
=m∗(U)+mμ,∼π′(U)+mμ,≁π′(1)
Which can be rephrased as
=m∗(U)+mμ,∼π′(U)+μ(h≁π′)
Now it's time for some inequalities, explained afterwards.

m∗(U)+mμ,∼π′(U)+μ(h≁π′)≥m∗(U)+μ(h≁π′)≥m∗(inf(x,U))+μ(h≁π′)

≥m∗(inf(x,U))+x⋅dTV(μ,ν)=m∗(inf(x,U))+supU′:(A×O)ω→[0,x]|μ(U′)−ν(U′)|

=m∗(inf(x,U))+supU′:(A×O)ω→[0,x]|m∗(U′)+mμ(U′)−m∗(U′)−mν(U′)|

=m∗(inf(x,U))+supU′:(A×O)ω→[0,x]|mμ(U′)−mν(U′)|

=m∗(inf(x,U))+x⋅mν(1)≥m∗(inf(x,U))+mν(inf(x,U))=ν(inf(x,U))

≥infν′∈Θ(π′)ν′(inf(x,U))=Θ(π′)(inf(x,U))=Θx(π′)(U)

So, for the first line, the two inequalities are obvious: we dropped the nonnegative mμ,∼π′(U) term, and inf(x,U)≤U. For the second line, the initial inequality is us invoking the total variation distance variant of x-pseudocausality, since ν is as close as possible to μ while also being in Θ(π′), and the equality is reexpressing total variation distance in accordance with our second claim. The third line is just shattering ν and μ in our way connected with total variation distance, and the fourth line is just canceling. For the fifth line, the first equality is the supremum being attained by a function that's x on mν and 0 on mμ (as the two measures are disjoint), the middle inequality is inf(x,U)≤x, and the last equality is ν=m∗+mν. Then, for the final line, the inequality is obvious because ν∈Θ(π′), and then we just wrap the inf up as an expectation according to Θ(π′), and apply the definition of Θx(π′).
P2.6 Now for the other direction of the iff-statement, going from
∀π,π′,U:u1∼π′(Θ(π))(U)≥Θx(π′)(U)
to
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
Using our set-based characterization of inframeasure ordering, we can accurately rephrase our initial statement as
∀π,π′:u1∼π′(Θ(π))⊆Θx(π′)
The former set is "take all the minimal points of Θ(π), convert all their measure on histories incompatible with π′ to the +b term, do closed convex hull and upper completion", and the latter set is "take Θ(π′), do upper completion w.r.t. the cone of x-sa measures, restrict to just the a-measures, and then add in the cone of sa-measures again"
Pick a μ∈Θ(π). When we 1-update it on compatibility with π′, we get (mμ,∼π′,μ(h≁π′)). And said point lies in Θx(π′), which means it must be possible to create by taking a minimal point in Θ(π′) (call that one ν), and adding an x-sa measure to it. Call the x-sa measure (m′,b′). So, our net result is:
(mμ,∼π′,μ(h≁π′))=(ν,0)+(m′,b′)
The update of μ equals ν plus an x-sa measure. Now, breaking up m′ into its positive and negative parts which are disjoint, (m′)+ and (m′)−, we can rewrite things as:
b′=μ(h≁π′)
mμ,∼π′=ν+(m′)++(m′)−
Adding mμ,≁π′ to both sides and folding it into μ, and rearranging a bit and adding parentheses, that second equality can be restated as
μ=ν+((m′)++mμ,≁π′)+(m′)−
Now, ν is supported entirely over histories compatible with π′, and (m′)− is a chunk of negative measure taken out of ν, so (m′)− must be supported entirely over histories compatible with π′. Also, (m′)+ and (m′)− are disjoint (Jordan decomposition), and mμ,≁π′ is supported entirely over histories incompatible with π′, so the positive measure (m′)++mμ,≁π′ and the negative measure (m′)− must be disjoint. Since μ is made from ν by adding on a chunk of positive measure and a chunk of negative measure which are disjoint, we have dTV(μ,ν)=−(m′)−(1). At this point, we only have one step left.
For the last step, because ((m′)++(m′)−,b′) is an x-sa measure, we have b′+x⋅(m′)−(1)≥0. Also, because b′=μ(h≁π′), our inequality turns into
μ(h≁π′)+x⋅(m′)−(1)≥0
μ(h≁π′)≥−x⋅(m′)−(1)
μ(h≁π′)≥x⋅dTV(μ,ν)
and this gets us our
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
result. We're done!
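The key disjointness step in this direction, dTV(μ,ν)=−(m′)−(1), can be checked on a toy finite example (all numbers illustrative):

```python
# If mu is built from nu by adding a chunk of positive measure and a disjoint
# chunk of negative measure, then d_TV(mu, nu) is exactly the mass of the
# negative chunk (all numbers illustrative).
nu = {"a": 0.6, "b": 0.4, "c": 0.0}
pos = {"a": 0.0, "b": 0.0, "c": 0.3}    # the (m')+ plus incompatible-mass chunk
neg = {"a": -0.3, "b": 0.0, "c": 0.0}   # the (m')- chunk, disjoint from pos
mu = {p: nu[p] + pos[p] + neg[p] for p in nu}

tv = 0.5 * sum(abs(mu[p] - nu[p]) for p in nu)

assert abs(sum(mu.values()) - 1.0) < 1e-9   # mu is still a distribution
assert abs(tv - (-sum(neg.values()))) < 1e-9
```

Disjointness matters: if the positive and negative chunks overlapped, cancellation would make the total variation distance strictly smaller than the negative mass.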
Proposition 3: If a utility function U is bounded within [0,x] and a belief function Θ is x-pseudocausal, then translating Θ to Θp (pseudocausal translation) fulfills ∀π:Θp(π)(U)=Θ(π)(U).
Blessedly, this one is very easy. First, let's recap the translation of a belief function (assumed to be x-pseudocausal) from acausal to pseudocausal. It is

Θp(π′):=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))

and so, by the unpacking in T8.1, Θp(π′)(U)=infπu1∼π′(Θ(π))(U).

And then we recall that x-pseudocausality is

∀π,π′,U:u1∼π′(Θ(π))(U)≥Θx(π′)(U)

Also, recall that since Θx(π′)(U):=Θ(π′)(inf(U,x)), and our utility function is bounded in [0,x], we get the statement

∀π,π′:u1∼π′(Θ(π))(U)≥Θ(π′)(U)

Additionally, u1∼π′(Θ(π′))=Θ(π′). Therefore, the infimum is attained by π′ itself, and we get

Θp(π′)(U)=Θ(π′)(U)

and we're done.
Theorem 6: Causal IPOMDP's: Any infra-POMDP with transition kernel K:S×Aik→O×S which fulfills the niceness conditions and always produces crisp infradistributions, and starting infradistribution ψ∈□S, produces a causal belief function via Θ(π):=pr(A×O)ω∗(ψ⋉Kπ:∞).
So, first up, we can appeal to Theorem 4 on Pseudocausal IPOMDP's, to conclude that all the Kπ:∞ are well-defined, and that this Θ we defined fulfills all belief function conditions except maybe normalization.
The only things to check are that the resulting Θ is normalized, and that the resulting Θ is causal. The second is far more difficult and makes up the bulk of the proof, so we'll just dispose of the first one.
T6.1 First, K, since it's crisp by assumption, always has K(s,a)(c)=c for all constants c. The same property extends to all the Kπn (they're all crisp too) and then from there to Kπ:∞ by induction. Then we can just use the following argument, where π is arbitrary.
Θ(π)(1)=pr(A×O)ω∗(ψ⋉Kπ:∞)(1)=(ψ⋉Kπ:∞)(1)=ψ(λs0.Kπ:∞(s0)(1))=ψ(1)=1
The last two inequalities were by Kπ:∞ mapping constants to the same constant regardless of π, and by ψ being normalized since it's an infradistribution. This same proof works if you swap out the 1 with a 0, so we have normalization for Θ.
T6.2 All that remains is checking the causality condition. We'll be using the alternate rephrasing of causality. We have Θ defined as
Θ(π′):=pr(A×O)ω∗(ψ⋉Kπ′:∞)
and want to derive our rephrasing of causality, that for any U and π′ and ζ and family Uπ, we have
(∀e:U(π′⋅e)≥Eπ∼ζ[Uπ(π⋅e)])→Θ(π′)(U)≥Eπ∼ζ[Θ(π)(Uπ)]
Accordingly, we get to fix a U,π′,ζ, and family of functions Uπ, and assume
∀e:U(π′⋅e)≥Eπ∼ζ[Uπ(π⋅e)]
Our proof target is
Θ(π′)(U)≥Eπ∼ζ[Θ(π)(Uπ)]
Applying our reexpression of Θ in terms of ψ and Kπ:∞ and projection, we get
pr(A×O)ω∗(ψ⋉Kπ′:∞)(λao1:∞.U(ao1:∞))≥Eπ∼ζ[pr(A×O)ω∗(ψ⋉Kπ:∞)(λao1:∞.Uπ(ao1:∞))]
Undoing the projections on both sides, we get
(ψ⋉Kπ′:∞)(λs0,aos1:∞.U(ao1:∞))≥Eπ∼ζ[(ψ⋉Kπ:∞)(λs0,aos1:∞.Uπ(ao1:∞))]
Unpacking the semidirect product, we get
ψ(λs0.Kπ′:∞(s0)(λaos1:∞.U(ao1:∞)))≥Eπ∼ζ[ψ(λs0.Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞)))]
Now, we can notice something interesting here. By concavity for infradistributions, we have
ψ(λs0.Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))])≥Eπ∼ζ[ψ(λs0.Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞)))]
So, our new proof target becomes
ψ(λs0.Kπ′:∞(s0)(λaos1:∞.U(ao1:∞)))≥ψ(λs0.Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))])
Because if we hit that, we can stitch the inequalities together. The obvious move to show this new result is to apply monotonicity of infradistributions. If the inner function on the left hand side was always above the inner function on the right hand side, we'd have our result by infradistribution monotonicity. So, our new proof target becomes
∀s0:Kπ′:∞(s0)(λaos1:∞.U(ao1:∞))≥Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))]
At this point, now that we've gotten rid of ψ, we can use the full power of crisp infradistributions. Crisp infradistributions are just a set of probability distributions! Remember that the semidirect product, ψ⋉K, can be viewed as taking a probability distribution from ψ, and for each x, picking a conditional probability distribution from K(x), and all the probability distributions of that form make up ψ⋉K.
In particular, the infinitely iterated semidirect product here can be written as a repeated process where, for each history of the form s0,aos1:n,a, Murphy gets to select a probability distribution over the next observation and state from K(sn,a) (as a set of probability distributions) to fill in the next part of the action-observation-state tree. Accordingly every single probability distribution in Kπ:∞(s0) can be written as π interacting with an expanded probabilistic environment eΔ+ of type (A×O×S)<ω×A→Δ(O×S) (which gives Murphy's choice wherever it needs to pick the next observation and state), subject to the constraints that
eΔ+(aos1:n,a)∈K(sn,a)
and
eΔ+(∅,a)∈K(s0,a)
We'll abbreviate this property that eΔ+ always picks stuff from the appropriate K set and starts off picking from K(s0,a) as eΔ+∼K,s0.
Our old proof goal of
∀s0:Kπ′:∞(s0)(λaos1:∞.U(ao1:∞))≥Eπ∼ζ[Kπ:∞(s0)(λaos1:∞.Uπ(ao1:∞))]
can be rephrased as
∀s0:infμ′∈Kπ′:∞(s0)μ′(λaos1:∞.U(ao1:∞))≥Eπ∼ζ[infμπ∈Kπ:∞(s0)μπ(λaos1:∞.Uπ(ao1:∞))]
and then rephrased as
∀s0:infμ′∈Kπ′:∞(s0)Eaos1:∞∼μ′[U(ao1:∞)]≥Eπ∼ζ[infμπ∈Kπ:∞(s0)Eaos1:∞∼μπ[Uπ(ao1:∞)]]
And now, by our earlier discussion, we can rephrase this statement in terms of picking expanded probabilistic environments.
∀s0:infeΔ′+∼K,s0Eaos1:∞∼π′⋅eΔ′+[U(ao1:∞)]≥Eπ∼ζ[infeΔ+π∼K,s0Eaos1:∞∼π⋅eΔ+π[Uπ(ao1:∞)]]
Now, on the right side, since the set we're selecting the expanded environments from is the same each time and doesn't depend on π, we can move the inf outside of the expectation, at the cost of an inequality. It is unconditionally true that
infeΔ+∼K,s0Eπ∼ζ[Eaos1:∞∼π⋅eΔ+[Uπ(ao1:∞)]]≥Eπ∼ζ[infeΔ+π∼K,s0Eaos1:∞∼π⋅eΔ+π[Uπ(ao1:∞)]]
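This inequality is just "an infimum of averages dominates an average of infima": one worst-case environment must serve every policy at once. A minimal numeric sketch (made-up payoffs):

```python
# Toy check (made-up numbers) that pulling the inf outside the expectation
# only raises the value: a single worst-case e must serve every pi at once.
payoff = {("pi1", "e1"): 0.9, ("pi1", "e2"): 0.2,
          ("pi2", "e1"): 0.1, ("pi2", "e2"): 0.8}
zeta = {"pi1": 0.5, "pi2": 0.5}             # the mixture over policies

envs = ("e1", "e2")
inf_of_mean = min(sum(zeta[p] * payoff[(p, e)] for p in zeta) for e in envs)
mean_of_inf = sum(zeta[p] * min(payoff[(p, e)] for e in envs) for p in zeta)
print(inf_of_mean, mean_of_inf)             # inf-of-mean dominates mean-of-inf
```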
So now, our proof goal shifts to
∀s0:infeΔ′+∼K,s0Eaos1:∞∼π′⋅eΔ′+[U(ao1:∞)]≥infeΔ+∼K,s0Eπ∼ζ[Eaos1:∞∼π⋅eΔ+[Uπ(ao1:∞)]]
because if that were true, we could pair it up with the always-true inequality to hit our old proof goal. And we'd be able to show this proof goal if we showed the following, more ambitious result.
∀eΔ+:Eaos1:∞∼π′⋅eΔ+[U(ao1:∞)]≥Eπ∼ζ[Eaos1:∞∼π⋅eΔ+[Uπ(ao1:∞)]]
Note that we're not restricting to compatibility with K and s0; we're trying to show it for any expanded probabilistic environment of type (A×O×S)<ω×A→Δ(O×S). We can rephrase this new proof goal slightly, viewing ourselves as selecting an action-observation history.
∀eΔ+:Eao1:∞∼pr(A×O)ω∗(π′⋅eΔ+)[U(ao1:∞)]≥Eπ∼ζ[Eao1:∞∼pr(A×O)ω∗(π⋅eΔ+)[Uπ(ao1:∞)]]
Now, for any expanded probabilistic environment eΔ+:(A×O×S)<ω×A→Δ(O×S), there is a unique corresponding probabilistic environment eΔ of type (A×O)<ω×A→ΔO where, for any policy, pr(A×O)ω∗(π⋅eΔ+)=π⋅eΔ. This is just the classical result that any POMDP can be viewed as an MDP whose state space consists of finite histories; the behavior is unchanged. Using this swap, our new proof goal is:
∀eΔ:Eao1:∞∼π′⋅eΔ[U(ao1:∞)]≥Eπ∼ζ[Eao1:∞∼π⋅eΔ[Uπ(ao1:∞)]]
Where eΔ is an arbitrary probabilistic environment. Now, any probabilistic environment eΔ can be viewed as a mixture of deterministic environments e. So we can rephrase eΔ as, at the start of time, picking an e from ξ∈ΔE for some particular ξ. This is the mixture of deterministic environments that makes up eΔ. The mixture isn't necessarily unique, but for any eΔ there is always some mixture of deterministic environments that produces it. So, our new proof goal becomes:
∀ξ∈ΔE:Ee∼ξ[Eao1:∞∼π′⋅e[U(ao1:∞)]]≥Eπ∼ζ[Ee∼ξ[Eao1:∞∼π⋅e[Uπ(ao1:∞)]]]
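The mixture-of-deterministic-environments step can be sketched in a one-step toy case (hypothetical numbers): taking the product of the per-history observation distributions yields a mixture ξ over deterministic response functions with the same behavior.

```python
from itertools import product

# Toy sketch (hypothetical numbers): a one-step probabilistic environment
# maps each action to a distribution over observations; the product of the
# per-action distributions is a mixture xi over deterministic environments
# (functions action -> observation) with identical behavior.
e_prob = {"a1": {"o1": 0.7, "o2": 0.3},
          "a2": {"o1": 0.4, "o2": 0.6}}
actions, obs = ["a1", "a2"], ["o1", "o2"]

xi = {}
for responses in product(obs, repeat=len(actions)):
    weight = 1.0
    for a, o in zip(actions, responses):
        weight *= e_prob[a][o]
    xi[responses] = weight              # one deterministic environment + its weight

# Marginalizing the mixture recovers the original probabilities.
marginal = {(a, o): sum(w for r, w in xi.items() if r[i] == o)
            for i, a in enumerate(actions) for o in obs}
print(marginal[("a1", "o1")], sum(xi.values()))
```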
Since π⋅e is just a Dirac delta distribution on a particular history, we can substitute that in to get the equivalent
∀ξ∈ΔE:Ee∼ξ[U(π′⋅e)]≥Eπ∼ζ[Ee∼ξ[Uπ(π⋅e)]]
Now we just swap the two expectations.
∀ξ∈ΔE:Ee∼ξ[U(π′⋅e)]≥Ee∼ξ[Eπ∼ζ[Uπ(π⋅e)]]
And then we remember that our starting assumption was
∀e:U(π′⋅e)≥Eπ∼ζ[Uπ(π⋅e)]
So we've managed to hit our proof target and we're done; this propagates back to show causality for Θ.
Theorem 7: Pseudocausal to Causal Translation: The following diagram commutes when the bottom belief function Θ is pseudocausal. The upwards arrows are the causal translation procedures, and the top arrows are the usual morphisms between the type signatures. The new belief function Θc is causal. Also, if U:(A×O)ω→[0,1], and UN:(A×(O+N))ω→[0,1] is defined as 1 for any history containing Nirvana, and U(h) for any nirvana-free history, then ∀π:Θc(π)(UN)=Θ(π)(U).
T7.0 Preliminaries. Let's restate what the translations are. The notation d(e,h) means that e "defends" h. e defends h if it responds to any prefix of h which ends in an action with either the next observation in h, or Nirvana, and responds to any non-prefix of h with Nirvana.
Also, we'll write hN∼π′,h if hN is "compatible" with π′ and h. The criterion for this is that hN∼π′, and that any prefix of hN which ends in an action and contains no Nirvana observations must either be a prefix of h, or only deviate from h in the last action. An alternate way to state that is that hN is capable of being produced by a copolicy which defends h, interacting with π′.
With these two notes, we introduce the infrakernels Kd:(A×O)ωik→E, and Kπ′:(A×O)ωik→(A×(O+N))ω.
Kd(h) is total uncertainty over e such that e defends h. Kd(h)(f):=infe:d(e,h)f(e)
And Kπ′(h) is total uncertainty over hN such that hN is compatible with π′ and h. Kπ′(h)(UN):=infhN∼π′,hUN(hN)
Now, to briefly recap the three translations. For first-person static translation, we have
Θc(π′):=pr(A×(O+N))ω∗(⊤Π⋉Θ⋉Kπ′)
Where Θ is the original belief function.
For third-person static translation, we have
E:=prE∗(D⋉Kd)
Where D is the original infradistribution over (A×O)ω.
For first-person dynamic translation, the new infrakernel has type signature
((A×O)ω+N)×Aik→((O+N)×((A×O)ω+N))
I.e., the state space is destinies plus a single Nirvana state, and the observation space is the old space of observations plus a Nirvana observation. The symbol N is used for both the Nirvana observation and the Nirvana state; context disambiguates. The transition dynamics are:
∀a′∈A:→N(N,a′):=δN,N
and, if a′≠a, then →N(aoh,a′):=δN,N
and otherwise, →N(aoh,a):=δo,h∨δN,N
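A minimal sketch of these transition dynamics, with destinies truncated to finite tuples for illustration (the names and representation are hypothetical, not from the original):

```python
# A toy rendering of the transition kernel above: a non-Nirvana state is a
# destiny, here a finite tuple of (action, observation) pairs, and the
# kernel returns the crisp SET of (observation, next-state) options.
NIRVANA = "N"   # doubles as the Nirvana state and the Nirvana observation

def step(state, action):
    if state == NIRVANA:
        return {(NIRVANA, NIRVANA)}               # Nirvana is absorbing
    (a, o), rest = state[0], state[1:]
    if action != a:
        return {(NIRVANA, NIRVANA)}               # deviating from the destiny
    # Following the destiny: emit its next observation, or end in Nirvana early.
    return {(o, rest), (NIRVANA, NIRVANA)}

destiny = (("up", "win"), ("down", "lose"))
print(step(destiny, "up"))      # follow the destiny, or jump to Nirvana
print(step(destiny, "left"))    # deviation forces Nirvana
print(step(NIRVANA, "up"))      # stuck in Nirvana forever
```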
The initial infradistribution is i∗(ψF), the obvious injection of ψF from (A×O)ω to (A×O)ω+N.
The obvious way to show this theorem is that we can assume the bottom part of the diagram commutes and ψF,Θ,D are all pseudocausal, and then show that all three paths to Θc produce the same result, and it's causal. Then we just wrap up that last part about the utility functions.
Fortunately, assuming all three paths produce the same Θc belief function, it's trivial to show that it's causal. i∗(ψF) is an infradistribution since ψF is, and →N is a crisp infrakernel, so we can just invoke Theorem 6 (crisp infrakernels produce causal belief functions) to show causality for Θc. So we only have to check that all three paths produce the same result, and show the result about the utility functions. One phase rephrases the first path, two more phases show that the other two paths hit the same target, and the last phase shows the result about the utility functions.
We want to show that
Θc(π′)=(π′⋅)∗(E)=pr(A×(O+N))ω∗(i∗(ψF)⋉→π′N:∞)
T7.1 First, let's try to rephrase Θc(π′)(UN) into a more handy form for future use. Invoking our definition of Θc(π′), we have:
Θc(π′)(UN)=pr(A×(O+N))ω∗(⊤Π⋉Θ⋉Kπ′)(UN)
Then, we undo the projection
=(⊤Π⋉Θ⋉Kπ′)(λπ,h,hN.UN(hN))
And unpack the semidirect product
=⊤Π(λπ.Θ(π)(λh.Kπ′(h)(λhN.UN(hN))))
And unpack what ⊤Π means
=infπΘ(π)(λh.Kπ′(h)(λhN.UN(hN)))
And then apply the definition of Kπ′(h)
=infπΘ(π)(λh.infhN∼π′,hUN(hN))
This will be our final rephrasing of Θc(π′)(UN). Our job is to derive this result for the other two paths.
T7.2 First up, the third-person static translation. We can start with definitions.
(π′⋅)∗(E)(UN)=(π′⋅)∗(prE∗(D⋉Kd))(UN)
And then, we can recall that since D and Θ are assumed to commute, we can define D in terms of Θ via D=pr(A×O)ω∗(⊤Π⋉Θ). Making this substitution, we get
=(π′⋅)∗(prE∗((pr(A×O)ω∗(⊤Π⋉Θ))⋉Kd))(UN)
We undo the first pushforward
=prE∗((pr(A×O)ω∗(⊤Π⋉Θ))⋉Kd)(λe.UN(π′⋅e))
And then the projection
=((pr(A×O)ω∗(⊤Π⋉Θ))⋉Kd)(λh,e.UN(π′⋅e))
And unpack the semidirect product
=pr(A×O)ω∗(⊤Π⋉Θ)(λh.Kd(h)(λe.UN(π′⋅e)))
And remove the projection
=(⊤Π⋉Θ)(λπ,h.Kd(h)(λe.UN(π′⋅e)))
and unpack the semidirect product and what ⊤Π is
=infπΘ(π)(λh.Kd(h)(λe.UN(π′⋅e)))
Now, we recall that Kd(h) is all the e that defend h, to get
=infπΘ(π)(λh.infe:d(e,h)UN(π′⋅e))
At this point we can realize that for any e which defends h, when e interacts with π′, it makes an hN that is compatible with π′ and h. Conversely, every hN compatible with π′ and h has some defending e which produces it. If the Nirvana-free prefix of hN isn't a prefix of h, then the defending e which doesn't end h early and just sends deviations to Nirvana is capable of producing it. If hN looks like h but ended early with Nirvana, you can just have a defending e which ends with Nirvana at the same time. So we can rewrite this as
=infπΘ(π)(λh.infhN∼π′,h:UN(hN))
And this is exactly the form we got for Θc(π′)(UN).
T7.3 Time for the third and final translation. We want to show that
pr(A×(O+N))ω∗(i∗(ψF)⋉→π′N:∞)(UN)
is what we want. Because the pseudocausal square commutes, we have that ψF=pr(A×O)ω∗(⊤Π⋉Θ), so we can make that substitution to yield
=pr(A×(O+N))ω∗(i∗(pr(A×O)ω∗(⊤Π⋉Θ))⋉→π′N:∞)(UN)
Remove the projection, to get
=(i∗(pr(A×O)ω∗(⊤Π⋉Θ))⋉→π′N:∞)(λsN0,aosN1:∞.UN(aoN1:∞))
For this notation, we're using sN to refer to an element of (A×O)ω+N, the state space. aosN1:∞ is an element of (A×(O+N)×((A×O)ω+N))ω, a full unrolling. aoN1:∞ is the projection of this to (A×(O+N))ω.
We unpack the semidirect product to yield
=i∗(pr(A×O)ω∗(⊤Π⋉Θ))(λsN0.→π′N:∞(sN0)(λaosN1:∞.UN(aoN1:∞)))
and rewrite the injection pushforward
=pr(A×O)ω∗(⊤Π⋉Θ)(λh.→π′N:∞(h)(λaosN1:∞.UN(aoN1:∞)))
and remove the projection
=(⊤Π⋉Θ)(λπ,h.→π′N:∞(h)(λaosN1:∞.UN(aoN1:∞)))
And unpack the semidirect product along with ⊤Π.
=infπΘ(π)(λh.→π′N:∞(h)(λaosN1:∞.UN(aoN1:∞)))
Now we rewrite this as a projection to yield
=infπΘ(π)(λh.pr(A×(O+N))ω∗(→π′N:∞(h))(λaoN1:∞.UN(aoN1:∞)))
And do a little bit of reindexing for convenience, swapping out the symbol aoN1:∞ for hN, getting
=infπΘ(π)(λh.pr(A×(O+N))ω∗(→π′N:∞(h))(λhN.UN(hN)))
And now, we can ask what pr(A×(O+N))ω∗(→π′N:∞(h)) is. Basically, the starting state is the destiny h, and then π′ interacts with it through the kernel →N. Due to how →N is defined, one of three things can happen. First, it could just unroll h all the way to completion, if h is the sort of history that's compatible with π′. Second, it could partially unroll h and deviate to Nirvana early, even if π′ is compatible with more, because of how →N(aoh,a) is defined. Finally, π′ could deviate from h, and then →N would have to respond with Nirvana. And of course, once you're in the Nirvana state, you're in there for good. So, we have total uncertainty over histories hN compatible with π′ and h; any of them at all could be made by pr(A×(O+N))ω∗(→π′N:∞(h)). So, this projection of the infrakernel unrolling h is just Kπ′(h). We get
=infπΘ(π)(λh.Kπ′(h)(λhN.UN(hN)))
And then we use what Kπ′(h) does to functions, to get
=infπΘ(π)(λh.infhN∼π′,hUN(hN))
Which is exactly the form we need. So, all three translation procedures produce identical belief functions.
T7.4 The only thing which remains to wrap this result up is guaranteeing that, if UN:(A×(O+N))ω→[0,1] is 1 if the history contains Nirvana (or infinity for the R type signature), and U otherwise, then Θc(π′)(UN)=Θ(π′)(U).
We rephrased Θc(π′)(UN) as infπΘ(π)(λh.infhN∼π′,hUN(hN))
We can notice something interesting. If h is the sort of history that π′ is capable of making, i.e. h∼π′, then all of the hN∼π′,h end up in Nirvana except h itself, because every such hN follows π′, sends any deviation of π′ from h to Nirvana, and may also hit Nirvana early. In this case, since any history ending with Nirvana is maximum utility, and otherwise U is copied, we have infhN∼π′,hUN(hN)=U(h). However, if h is incompatible with π′, then no matter what, a history hN which follows π′ is going to hit Nirvana and be assigned maximum utility. So, we can rewrite
infπΘ(π)(λh.infhN∼π′,hUN(hN))
as
=infπΘ(π)(λh.1h∼π′U(h)+1h≁π′)
Which can be written as an update, yielding
=infπu1∼π′(Θ(π))(U)
And then, we remember that regardless of π′ and π, for pseudocausal belief functions (which Θ is), we have that u1∼π′(Θ(π))(U)≥Θ(π′)(U), so the update assigns higher expectation values than Θ(π′) does. Also, u1∼π′(Θ(π′))=Θ(π′). Thus, the infimum is attained by π′, and we get
=Θ(π′)(U)
and we're done.
Theorem 8: Acausal to Pseudocausal Translation: The following diagram commutes when the bottom belief function Θ is acausal. The upwards arrows are the pseudocausal translation procedures, and the top arrows are the usual morphisms between the type signatures. The new belief function Θp will be pseudocausal.
T8.0 First, preliminaries. The two translation procedures are, to go from an infradistribution over policy-tagged destinies D, to an infradistribution over destinies Dp, we just project.
Dp:=pr(A×O)ω∗(D)
And, to go from an acausal belief function Θ to a pseudocausal one Θp, we do the translation
Θp(π′):=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))
We assume that D and Θ commute, and try to show that these translation procedures result in a Θp and Dp which are pseudocausal and commute with each other. So, our proof goals are that
Θp(π′)=u1∼π′(Dp)
and
Dp=pr(A×O)ω∗(⊤Π⋉Θp)
Once we've done that, it's easy to get that Θp is pseudocausal. This occurs because the projection of any infradistribution is an infradistribution, so we can conclude that Dp is an infradistribution over (A×O)ω. Also, back in the Pseudocausal Commutative Square theorem, we were able to derive that any infradistribution Dp over (A×O)ω, when translated to a Θp, makes a pseudocausal belief function; no special properties were used there beyond being an infradistribution.
T8.1 Let's address the first one, trying to show that
Θp(π′)=u1∼π′(Dp)
We'll take the left side, plug in some function, and keep unpacking and rearranging until we get the right side with some function plugged in. First up is applying definitions.
Θp(π′)(U)=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))(U)
by how Θp was defined. Now, we start unpacking. First, the projection.
=(⊤Π⋉(λπ.u1∼π′(Θ(π))))(λπ,h.U(h))
Then unpack the semidirect product and ⊤Π, for
=infπu1∼π′(Θ(π))(λh.U(h))
Then unpack the update, to get
=infπΘ(π)(λh.1h∼π′U(h)+1h≁π′)
Then pack up the infπ
=⊤Π(λπ.Θ(π)(λh.1h∼π′U(h)+1h≁π′))
And pack up the semidirect product
=(⊤Π⋉Θ)(λπ,h.1h∼π′U(h)+1h≁π′)
And since D=⊤Π⋉Θ for the original acausal belief function, we can rewrite this as
=D(λπ,h.1h∼π′U(h)+1h≁π′)
And pack this up in a projection.
=pr(A×O)ω∗(D)(λh.1h∼π′U(h)+1h≁π′)
And then, since we defined Dp:=pr(A×O)ω∗(D), we can pack this up as
=Dp(λh.1h∼π′U(h)+1h≁π′)
And then pack up the update
=u1∼π′(Dp)(U)
So, that's one direction done.
T8.2 Now for the second result, showing that
Dp=pr(A×O)ω∗(⊤Π⋉Θp)
Again, we'll take the more complicated form, and write it as the first one, for arbitrary functions U. We begin with
pr(A×O)ω∗(⊤Π⋉Θp)(U)
and then unpack what Θp(π′) is to get
=pr(A×O)ω∗(⊤Π⋉(λπ′.pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))))(U)
Undo the projection
=(⊤Π⋉(λπ′.pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))))(λπ′,h.U(h))
Unpack the semidirect product and ⊤Π.
=infπ′pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))(λh.U(h))
Remove the projection
=infπ′(⊤Π⋉(λπ.u1∼π′(Θ(π))))(λπ′′,h.U(h))
Unpack the semidirect product and ⊤Π again.
=infπ′infπu1∼π′(Θ(π))(λh.U(h))
Let's fold the two infs into one inf; this doesn't change anything, it just makes things a bit more concise.
=infπ′,πu1∼π′(Θ(π))(λh.U(h))
and unpack the update.
=infπ′,πΘ(π)(λh.1h∼π′U(h)+1h≁π′)
Now, we can note something interesting here. If π′ is selected to be equal to π, that inner function is just U, because Θ(π) is only supported over histories compatible with π. If π′ is unequal to π, that inner function is U with some inputs possibly bumped up to 1. So, π′=π is the optimal choice of π′ for minimizing, and we can rewrite as
=infπΘ(π)(λh.U(h))
and write this as ⊤Π
=⊤Π(λπ.Θ(π)(λh.U(h)))
And pack up as a semidirect product
=(⊤Π⋉Θ)(λπ,h.U(h))
and then because D=⊤Π⋉Θ, we substitute to get
=D(λπ,h.U(h))
and then write as a projection.
=pr(A×O)ω∗(D)(U)
Now, this is just Dp, yielding
=Dp(U)
and we're done.
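The collapse step in the middle of the derivation above (choosing π′=π attains the infimum, since Θ(π) is supported on π-compatible histories) can be sanity-checked in a toy finite setting (all sets and numbers made up):

```python
# Toy numeric check (made-up numbers) of the collapse step: since Theta(pi)
# only puts mass on pi-compatible histories, choosing pi' = pi makes the
# updated inner function plain U, and any other pi' only bumps values to 1.
compat = {"p1": {"h1"}, "p2": {"h2"}}           # histories each policy can make
Theta = {"p1": [{"h1": 1.0}],                    # crisp: sets of distributions
         "p2": [{"h2": 1.0}]}
U = {"h1": 0.4, "h2": 0.7}

def updated(pi, pi_prime):
    """u_{1 ~ pi'}(Theta(pi))(U): incompatible histories get utility 1."""
    return min(sum(p * (U[h] if h in compat[pi_prime] else 1.0)
                   for h, p in mu.items())
               for mu in Theta[pi])

double_inf = min(updated(pi, pip) for pi in Theta for pip in Theta)
single_inf = min(min(sum(p * U[h] for h, p in mu.items()) for mu in Theta[pi])
                 for pi in Theta)
print(double_inf, single_inf)                    # the two infima coincide
```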
Proposition 2: The two formulations of x-pseudocausality are equivalent for crisp acausal belief functions.
One formulation of x-pseudocausality is that
∀π,π′:u1∼π′(Θ(π))⊆Θx(π′) where Θx(π′)(U):=Θ(π′)(inf(x,U))
The second formulation is,
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
This proof will be slightly informal; it can be fully formalized without too many conceptual difficulties. We need to establish some basic probability theory results to work on this. The proof outline is: we establish/recap four probability-theory/inframeasure-theory results, and then show the two implication directions of the iff statement.
P2.1 Our first claim is that, given any two probability distributions μ and ν, you can shatter them into measures m∗,mμ, and mν such that m∗+mμ=μ and m∗+mν=ν, and mμ(1)=mν(1)=dTV(μ,ν). What this decomposition is basically doing is finding the largest chunk of measure that μ and ν agree on (that's m∗); subtracting that out from μ and ν yields your mμ and mν. These measures have no overlap, because if there was overlap, you could put that overlap into the m∗ chunk of measure. And the mass of the "unique to μ" piece must equal the mass of the "unique to ν" piece, because both sums must be probability distributions, and this common quantity is exactly the total variation distance between μ and ν.
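A quick numeric illustration of this shattering on a three-point space (made-up measures):

```python
# Toy illustration (made-up numbers) of the shattering claim: m_star is the
# largest chunk of measure mu and nu agree on, and the leftovers m_mu, m_nu
# have disjoint support and mass equal to d_TV(mu, nu).
mu = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
nu = {"h1": 0.2, "h2": 0.3, "h3": 0.5}

m_star = {h: min(mu[h], nu[h]) for h in mu}     # shared part
m_mu = {h: mu[h] - m_star[h] for h in mu}       # unique to mu
m_nu = {h: nu[h] - m_star[h] for h in nu}       # unique to nu

mass = lambda m: sum(m.values())
d_tv = 0.5 * sum(abs(mu[h] - nu[h]) for h in mu)
print(mass(m_mu), mass(m_nu), d_tv)             # all three coincide
```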
P2.2 Our second claim is that
x⋅dTV(μ,ν)=supU:(A×O)ω→[0,x]|μ(U)−ν(U)|
Why is this? Well, there's an alternate formulation of total variation distance where
dTV(μ,ν)=supU:(A×O)ω→[0,1]|μ(U)−ν(U)|
As intuition for this, we can pick a function U which is 1 on the support of mμ, and 0 elsewhere. In such a case, we'd get
|μ(U)−ν(U)|=|m∗(U)+mμ(U)−m∗(U)−mν(U)|=|mμ(U)−mν(U)|
and then remember that mμ and mν have disjoint supports, so we get
=|mμ(1)−mν(0)|=mμ(1)=dTV(μ,ν)
And any other function in the [0,1] range wouldn't do as well at attaining different expectation values on mμ and mν. So that's where the identity comes from. If we try to do the optimal separation but instead restrict U to be in [0,x], we'd get
=|mμ(x)−mν(0)|=mμ(x)=x⋅mμ(1)=x⋅dTV(μ,ν)
and any other utility function in [0,x] wouldn't do as well at separating the two.
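This claim can be checked numerically on the same kind of toy space (made-up measures; scanning {0,x}-valued functions is enough here because the supremum is attained at an extreme function):

```python
from itertools import product

# Toy check (made-up numbers) that the sup of |mu(U) - nu(U)| over functions
# U bounded in [0, x] equals x * d_TV(mu, nu). The sup is attained at an
# extreme function, so scanning {0, x}-valued functions suffices.
mu = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
nu = {"h1": 0.2, "h2": 0.3, "h3": 0.5}
x = 1 / 3

hs = sorted(mu)
best = max(abs(sum(mu[h] * u for h, u in zip(hs, vals))
               - sum(nu[h] * u for h, u in zip(hs, vals)))
           for vals in product([0.0, x], repeat=len(hs)))
d_tv = 0.5 * sum(abs(mu[h] - nu[h]) for h in mu)
print(best, x * d_tv)                            # the two sides agree
```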
P2.3 Our third claim is that the function which maps f to −∞ if f goes outside of [0,x], and otherwise maps f to ψ(f), corresponds to taking the set of sa-measures that is Ψ (the set form of ψ), and taking the upper completion of it under all signed-measure/b pairs where b+x⋅m−(1)≥0. Call such a pair an x-sa measure. I.e., if x=1/3, then one unit of +b is big enough to cancel out 3 units of negative measure in the signed measure. Clearly, if f:X→[0,x], and (m,b) has b+x⋅m−(1)≥0 (is an x-sa measure), then we have
m(f)+b=m+(f)+m−(f)+b≥m+(0)+m−(x)+b=x⋅m−(1)+b≥0
So, adding in the cone of all x-sa pairs like that doesn't actually affect the expectation values of any function bounded in [0,x] at all. However, for any function which goes over the x upper bound, you can consider an x-sa measure which concentrates a large amount of negative measure on a point where the function exceeds the upper bound, and exactly compensates for it with the +b term. Adding this on can make the function have an unboundedly negative expectation value at said point, justifying our choice of −∞ for the expectation value.
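A small numeric sketch of both halves of this claim (made-up signed measure and b term):

```python
# Toy x-sa measure (made-up numbers): a signed measure m plus a constant b
# with b + x * m_minus(1) >= 0. Adding it never lowers the expectation of a
# function bounded in [0, x], but a function exceeding x somewhere with
# negative measure can be driven far below zero.
x = 1 / 3
m = {"h1": 0.2, "h2": -0.9}         # signed measure; h2 carries negative mass
b = 0.31                             # b + x * (-0.9) ≈ 0.01 >= 0, so (m, b) is x-sa

def value(f):
    """m(f) + b, for a function f given pointwise as a dict."""
    return sum(m[h] * f[h] for h in m) + b

print(value({"h1": 0.1, "h2": x}))      # within [0, x]: stays nonnegative
print(value({"h1": 0.0, "h2": 10.0}))   # exceeds x on h2: large and negative
```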
P2.4 Time for the fourth claim. To build up to it, the monotone hull of a functional C(X,[0,x])→[0,1] is as follows: if ψx is the original functional, then ψm is the monotone hull, defined by
ψm(f):=supf′:X→[0,x]:f′≤fψx(f′)
Our fourth claim is that, in the set-based view, this operation corresponds to restricting the set of x-sa measures that is Ψx to just the a-measures in it. Why? Well, we have
∀f∈CB(X):ψm(f)≥ψx(f)
This is because ψx and ψm both agree that any function with negative parts gets −∞ value, and both agree on the expectation of any function bounded in [0,x], but for functions above 0 which also stray above x somewhere, ψx(f)=−∞, while ψm(f)≥0.
Therefore, since ψm≥ψx (considered as functions; we aren't using the reversed infradistribution ordering here), we have that Ψm⊆Ψx (higher functionals correspond to smaller sets). Now, Ψm cannot contain any x-sa-measures with negative measure anywhere in the measure component. For, if there were such a point, we could consider a function f that assigned extremely high value to the region of negative measure, and 0 value elsewhere, and f would attain a negative expectation value on that x-sa measure, contradicting the nonnegative worst-case expectation value of f from how ψm was defined. So, the set Ψm must consist entirely of a-measures. Further, any function f with range in [0,x] is minimized by an a-measure in Ψx, and the same point is present in Ψx∩Ma (no extra a-measures added), so the ϕ corresponding to that set must assign the exact same expectation value to f as ψx and ψm do. Also, any set consisting entirely of a-measures must have a monotone expectation functional.
So, recapping what we've done so far: we know that Ψm⊆Ψx. We know that Ψm must consist entirely of a-measures, otherwise we'd get a contradiction with monotonicity. Further, Ψx∩Ma (the biggest set that Ψm could possibly be) has its expectation functional ϕ agree with ψx on the expectations of functions in [0,x], and it's monotone like ψm is. Now, if Ψm⊂Ψx∩Ma (strict subset), we'd have the functional ϕ corresponding to the intersection lying strictly below ψm on some function. However, ψm is the minimal monotone functional that agrees with ψx on the expectations of functions in [0,x], so this rules out the strict subset, and we must have equality. The monotone hull of an expectation functional corresponds to restricting the set to a-measures. Further, we can rewrite the definition
ψm(f):=supf′:(A×O)ω→[0,x]:f′≤fψx(f′)
as just
ψm(f):=ψx(inf(x,f))
And we're done with this.
Putting claims 3 and 4 together, we can now express what the set Θx(π′) is, from our definition of Θx(π′)(U):=Θ(π′)(inf(x,U)). Θx(π′) (interpreted as a set of sa-measures) is just the upper completion of Θ(π′) under x-sa measures, those with b+x⋅m−(1)≥0 (this is the restriction to [0,x]), and then the intersection of that set with the cone of a-measures (monotone hull), and then upper completion under sa-measures (the restriction to [0,1]).
P2.5 Time to begin showing one direction of the iff statement. Remember, we're assuming that all the Θ(π) are induced by sets of probability distributions; they're crisp. We'll start with the easy direction, that
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
implies
∀π,π′,U:u1∼π′(Θ(π))(U)≥Θx(π′)(U)
where Θx(π′)(U):=Θ(π′)(inf(x,U)). (Well, technically, U shouldn't range over just continuous functions, but lower-semicontinuous functions as well.) To begin with, let π,π′,U be arbitrary, with U bounded in [0,1]. We can start unpacking
u1∼π′(Θ(π))(U)=Θ(π)(λh.1h∼π′U(h)+1h≁π′)
Now, this minimal value must be attained by some minimal point in Θ(π), which is some probability distribution μ. So, we have
=μ(λh.1h∼π′U(h)+1h≁π′)
We can, at this point, select the closest probability distribution to μ in Θ(π′) according to total variation distance, call that ν. We do our usual shattering of μ into m∗ and mμ, the chunk of measure where μ and ν overlap, and the chunk of measure that's unique to μ. Note that the chunk of measure where μ and ν overlap must be supported on histories compatible with π′ since ν∈Θ(π′), so we can break things down as
=m∗(U)+mμ(λh.1h∼π′U(h)+1h≁π′)
And then we can break mμ down further into mμ,∼π′ and mμ,≁π′, for the chunk of measure on histories compatible with π′ and incompatible, respectively, yielding
=m∗(U)+mμ,∼π′(U)+mμ,≁π′(1)
Which can be rephrased as
=m∗(U)+mμ,∼π′(U)+μ(h≁π′)
Now it's time for some inequalities, explained afterwards.
m∗(U)+mμ,∼π′(U)+μ(h≁π′)≥m∗(U)+μ(h≁π′)≥m∗(inf(U,x))+μ(h≁π′)
≥m∗(inf(U,x))+x⋅dTV(μ,ν)=m∗(inf(U,x))+supU′:(A×O)ω→[0,x]|ν(U′)−μ(U′)|
=m∗(inf(U,x))+supU′:(A×O)ω→[0,x]|m∗(U′)+mν(U′)−m∗(U′)−mμ(U′)|
=m∗(inf(U,x))+supU′:(A×O)ω→[0,x]|mν(U′)−mμ(U′)|
=m∗(inf(U,x))+mν(x)≥m∗(inf(U,x))+mν(inf(U,x))=ν(inf(U,x))
≥infν′∈Θ(π′)ν′(inf(U,x))=Θ(π′)(inf(U,x))=Θx(π′)(U)
So, for the first line, the first two inequalities are obvious. For the second line, the initial inequality is us invoking the total variation distance variant of x-pseudocausality since ν is as close as possible to μ while also being in Θ(π′), and the equality is reexpressing total variation distance in accordance with our second claim. The third line is just shattering ν and μ in our way connected with total variation distance, and the fourth line is just canceling. For the fifth line, the first equality is the supremum being attained by a function that's x on mν, and 0 on mμ, as the two measures are disjoint, and the second equality is ν=m∗+mν. Then, for line 6, the inequality is obvious because ν∈Θ(π′), and then we just wrap the inf up as an expectation according to Θ(π′), and apply the definition of Θx(π′).
P2.6 Now for the other direction of the iff-statement, going from
∀π,π′,U:u1∼π′(Θ(π))(U)≥Θx(π′)(U)
to
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
Using our set-based characterization of inframeasure ordering, we can accurately rephrase our initial statement as
∀π,π′:u1∼π′(Θ(π))⊆Θx(π′)
The former set is "take all the minimal points of Θ(π), convert all their measure on histories incompatible with π′ to the +b term, and take the closed convex hull and upper completion". The latter set is "take Θ(π′), do upper completion w.r.t. the cone of x-sa measures, restrict to just the a-measures, and then add in the cone of sa-measures again".
Pick a μ∈Θ(π). When we 1-update it on compatibility with π′, we get (mμ,∼π′,μ(h≁π′)). And said point lies in Θx(π′), which means it must be possible to create by taking a minimal point in Θ(π′) (call that one ν), and adding an x-sa measure to it. Call the x-sa measure (m′,b′). So, our net result is:
(mμ,∼π′,μ(h≁π′))=(ν,0)+(m′,b′)
The update of μ equals ν plus an x-sa measure. Now, breaking up m′ into its positive and negative parts which are disjoint, (m′)+ and (m′)−, we can rewrite things as:
b′=μ(h≁π′)
mμ,∼π′=ν+(m′)++(m′)−
Adding mμ,≁π′ to both sides and folding it into μ, and rearranging a bit and adding parentheses, that second equality can be restated as
μ=ν+((m′)++mμ,≁π′)+(m′)−
Now, ν is supported entirely over histories compatible with π′, and (m′)− is a chunk of negative measure taken out of ν, so (m′)− must be supported entirely over histories compatible with π′. Also, (m′)+ and (m′)− are disjoint (Jordan decomposition), and mμ,≁π′ is supported entirely over histories incompatible with π′, so the positive measure (m′)++mμ,≁π′ and the negative measure (m′)− must be disjoint. Since μ is made from ν by adding on a chunk of positive measure and a chunk of negative measure which are disjoint, we have dTV(μ,ν)=−(m′)−(1). At this point, we only have one step left.
For the last step, because ((m′)++(m′)−,b′) is an x-sa measure, we have b′+x⋅(m′)−(1)≥0. Also, because b′=μ(h≁π′), our inequality turns into
μ(h≁π′)+x⋅(m′)−(1)≥0
μ(h≁π′)≥−x⋅(m′)−(1)
μ(h≁π′)≥x⋅dTV(μ,ν)
and this gets us our
∀π,π′,μ∈Θ(π):μ(h≁π′)≥x⋅infν∈Θ(π′)dTV(μ,ν)
result. We're done!
Proposition 3: If a utility function U is bounded within [0,x] and a belief function Θ is x-pseudocausal, then translating Θ to Θp (pseudocausal translation) fulfills ∀π:Θp(π)(U)=Θ(π)(U).
Blessedly, this one is very easy. First, let's recap the translation of a belief function (assumed to be x-pseudocausal) from acausal to pseudocausal. It is
Θp(π′):=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))
So, applying this, we have
Θp(π′)(U)=pr(A×O)ω∗(⊤Π⋉(λπ.u1∼π′(Θ(π))))(U)
=(⊤Π⋉(λπ.u1∼π′(Θ(π))))(λπ,h.U(h))=infπu1∼π′(Θ(π))(U)
And then we recall that x-pseudocausality says ∀π,π′,U:u1∼π′(Θ(π))(U)≥Θx(π′)(U). Also, since Θx(π′)(U):=Θ(π′)(inf(U,x)) and our utility function is bounded in [0,x], this yields ∀π,π′:u1∼π′(Θ(π))(U)≥Θ(π′)(U). Additionally, u1∼π′(Θ(π′))=Θ(π′). Therefore, the infimum is attained by π′ itself, and we get
=Θ(π′)(U)
and we're done!