We're going to need to go out of order here a bit and do Proposition 31 before doing Propositions 29 and 30. Please have patience.
Proposition 31: All five characterizations of the pullback define the same infradistribution.
So, the two requirements of being able to do a pullback g∗(h) are that: h has g(X) as a support, and g is a proper function (preimage of a compact set is compact). And our five characterizations are:
1: For sets,
g∗(H):=(g∗)−1(H)
ie, take the preimage of H w.r.t. the linear operator that g induces from Ma(X)→Ma(Y).
2: For concave functionals, g∗(h) is the concave monotone hull of the partial function f∘g↦h(f).
3: For concave functionals, g∗(h) is the monotone hull of the partial function f∘g↦h(f), ie:
g∗(h)(f) := sup_{f∗ : f∗∘g ≤ f} h(f∗)
4: For concave functionals,
g∗(h)=inf{h′|h′∈□X,g∗(h′)=h}
take the inf of all infradistributions that could have made your infradistribution of interest via pushforward.
5: For sets, take the closed convex hull of all infradistribution sets which project down to be equivalent to H.
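Since we'll be juggling these characterizations for a while, here's a tiny finite-space sketch of definition 1. Everything in it (the spaces, g, and the membership test standing in for H) is a toy of my own, not from the post: the induced map on a-measures keeps b and pushes the measure forward, and membership in the pullback set is just membership of the pushforward in H.

```python
# Toy finite model of definition 1 (spaces, g, and H are my own invented
# example). An a-measure is represented as (measure-as-dict, b).
X = [0, 1, 2, 3]
Y = ["a", "b", "c"]
g = {0: "a", 1: "a", 2: "b", 3: "b"}   # g(X) = {"a", "b"}; properness is vacuous here

def pushforward(m_x):
    """The induced map Ma(X) -> Ma(Y): push the measure along g (b is handled separately)."""
    out = {y: 0.0 for y in Y}
    for x, mass in m_x.items():
        out[g[x]] += mass
    return out

def in_H(m_y, b):
    """Toy membership test for H: supported on g(X), b >= 0, total mass + b = 1."""
    return m_y["c"] == 0 and b >= 0 and abs(sum(m_y.values()) + b - 1) < 1e-9

def in_pullback(m_x, b):
    """Definition 1: (m, b) is in g*(H) iff its pushforward lies in H."""
    return in_H(pushforward(m_x), b)

print(in_pullback({0: 0.25, 1: 0.25, 2: 0.5, 3: 0.0}, 0.0))  # True
print(in_pullback({0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}, 0.5))    # False: mass + b = 1.5
```

On finite spaces every map is proper, so the properness requirement does no work in this sketch.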
The proof path will be first establishing that 3 makes an infradistribution, then showing that 3 and 2 are equivalent, then showing that 2 and 4 are equivalent, then showing that 4 and 5 are equivalent. And now that 2, 3, 4, and 5 are in an equivalence class, we wrap up by showing that 2 implies 1 and 1 implies 5.
First up: Establishing that 3 makes an infradistribution, which is where the bulk of the work is.
Starting with normalization:
g∗(h)(1) := sup_{f∗ : f∗∘g ≤ 1} h(f∗)
And clearly, this supremum is attained by f∗=1 (it can be no higher, by normalization and monotonicity of h), so we have g∗(h)(1)=h(1)=1 and thus one part of normalization. For the other half of normalization,
g∗(h)(0) = sup_{f∗ : f∗∘g ≤ 0} h(f∗)
Now, in order for f∗∘g≤0 to be the case, all we need is that f∗≤0 on g(X). Since g(X) is assumed to be a support for h, h(f∗) only depends on the values of f∗ on g(X), so by monotonicity of h all such functions have h(f∗)≤h(0)=0, while f∗=0 itself is feasible, and we have the other part of normalization.
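On a finite space, this supremum is easy to compute: by monotonicity of h, it's attained at the pointwise-largest feasible f∗, namely f∗(y) = min of f over the preimage g−1(y) (values off g(X) don't matter, since g(X) is a support). Here's a sketch with a toy h of my own:

```python
# Toy finite-space version of definition 3 (my own example). Since h is
# monotone and g(X) is a support, the sup over {f*: f*∘g <= f} is attained at
# f*(y) = min of f over g^{-1}(y), so the pullback is a single h-call.
X = [0, 1, 2, 3]
g = {0: "a", 1: "a", 2: "b", 3: "b"}
image = ["a", "b"]                      # g(X)

# h as an inf over a finite set of a-measures supported on g(X)
H = [({"a": 0.5, "b": 0.5}, 0.0), ({"a": 1.0, "b": 0.0}, 0.0)]

def h(f_y):
    return min(sum(m[y] * f_y[y] for y in image) + b for m, b in H)

def pullback_h(f_x):
    f_star = {y: min(f_x[x] for x in X if g[x] == y) for y in image}
    return h(f_star)

print(pullback_h({x: 1.0 for x in X}))  # 1.0, matching g*(h)(1) = 1
print(pullback_h({x: 0.0 for x in X}))  # 0.0, matching g*(h)(0) = 0
```

The two printed values are exactly the two halves of normalization just proven.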
Monotonicity: This is immediate because if f′≥f, then the supremum for f′ has more options and can attain a higher value.
For concavity: we can pick some f∗ and f′∗ which very nearly attain g∗(h)(f) and g∗(h)(f′) respectively, as close as you wish, to go:
pg∗(h)(f)+(1−p)g∗(h)(f′) ≃ ph(f∗)+(1−p)h(f′∗) ≤ h(pf∗+(1−p)f′∗)
And then,
(pf∗+(1−p)f′∗)∘g=p(f∗∘g)+(1−p)(f′∗∘g)≤pf+(1−p)f′
So, with that, we can go:
h(pf∗+(1−p)f′∗) ≤ sup_{f″ : f″∘g ≤ pf+(1−p)f′} h(f″) = g∗(h)(pf+(1−p)f′)
And we're done with concavity.
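The concavity inequality can be spot-checked numerically on the same kind of finite toy model (again my own construction, not the post's; the closed-form pullback uses the preimage-min shortcut, which is valid since h is monotone and supported on g(X)):

```python
import random
random.seed(0)

X = [0, 1, 2, 3]
g = {0: "a", 1: "a", 2: "b", 3: "b"}
image = ["a", "b"]
H = [({"a": 0.5, "b": 0.5}, 0.0), ({"a": 1.0, "b": 0.0}, 0.0)]

def h(f_y):
    return min(sum(m[y] * f_y[y] for y in image) + b for m, b in H)

def gh(f_x):  # the pullback, definition 3, via the preimage-min shortcut
    return h({y: min(f_x[x] for x in X if g[x] == y) for y in image})

# check g*(h)(pf + (1-p)f') >= p g*(h)(f) + (1-p) g*(h)(f') on random samples
for _ in range(500):
    f  = {x: random.uniform(0, 1) for x in X}
    fp = {x: random.uniform(0, 1) for x in X}
    p  = random.random()
    mix = {x: p * f[x] + (1 - p) * fp[x] for x in X}
    assert gh(mix) >= p * gh(f) + (1 - p) * gh(fp) - 1e-9
print("concavity held on all samples")
```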
For Lipschitzness, the proof will work as follows. We'll take two functions f and f′, and show that g∗(h)(f′)≥g∗(h)(f)−λ⊙d(f,f′) in a way that generalizes to symmetrically show that g∗(h)(f)≥g∗(h)(f′)−λ⊙d(f,f′) to get our result that
|g∗(h)(f)−g∗(h)(f′)|≤λ⊙d(f,f′)
where λ⊙ is the Lipschitz constant of h.
So, let's show
g∗(h)(f′)≥g∗(h)(f)−λ⊙d(f,f′)
What we can do is, because
g∗(h)(f) = sup_{f∗ : f∗∘g ≤ f} h(f∗)
then we can consider this value to be approximately attained by some specific function f∗. Then,
g∗(h)(f′) ≥ g∗(h)(inf(f,f′)) ≥ g∗(h)(f−d(f,f′)) ≥ h(f∗−d(f,f′)) ≥ h(f∗)−λ⊙d(f,f′) ≃ g∗(h)(f)−λ⊙d(f,f′)
The first two inequalities are from monotonicity. The third inequality is because (f∗−d(f,f′))∘g ≤ f−d(f,f′), where f∗ is our special function from a bit earlier (because f∗∘g≤f), so f∗−d(f,f′) is feasible for f−d(f,f′) but isn't necessarily the supremum and doesn't attain maximum value. Then, there's just Lipschitzness of h, and h(f∗) being approximately g∗(h)(f). And we're done with Lipschitzness; just swap f and f′ and use the same argument.
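Here's a matching numeric spot-check of the Lipschitz bound on the same toy model (my construction, not the post's); in it the Lipschitz constant λ⊙ of h is 1, the largest total mass appearing in H:

```python
import random
random.seed(1)

X = [0, 1, 2, 3]
g = {0: "a", 1: "a", 2: "b", 3: "b"}
image = ["a", "b"]
H = [({"a": 0.5, "b": 0.5}, 0.0), ({"a": 1.0, "b": 0.0}, 0.0)]

def h(f_y):
    return min(sum(m[y] * f_y[y] for y in image) + b for m, b in H)

def gh(f_x):  # pullback via the preimage-min shortcut
    return h({y: min(f_x[x] for x in X if g[x] == y) for y in image})

lam = max(sum(m.values()) for m, _ in H)   # Lipschitz constant of h, here 1.0

for _ in range(500):
    f  = {x: random.uniform(-1, 1) for x in X}
    fp = {x: random.uniform(-1, 1) for x in X}
    d  = max(abs(f[x] - fp[x]) for x in X)
    assert abs(gh(f) - gh(fp)) <= lam * d + 1e-9
print("Lipschitz bound held on all samples")
```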
Before proceeding further, note the following information: Polish spaces are first-countable and Hausdorff so they're compactly generated, and Wikipedia says any proper map to a compactly generated space is closed (maps closed sets to closed sets). So, g(X) is a closed set.
That just leaves compact almost-support. Fix an ϵ, and take a compact ϵ-almost-support of h, Cϵ⊆Y. Intersect it with g(X) if you haven't already; it still makes a compact set and it keeps being an ϵ-almost-support. We will show that g−1(Cϵ), which is compact because we assumed our map was proper, is an ϵ-almost-support of g∗(h).
This proof will proceed by showing that, if f,f′ agree on g−1(Cϵ), then g∗(h)(f)−ϵd(f,f′)≤g∗(h)(f′). Symmetrical arguments work by swapping f and f′, showing that the two expectation values are only ϵd(f,f′) apart.
As usual, we have
g∗(h)(f)≃h(f∗)
Where f∗ very nearly attains the supremum, and we know that f∗∘g≤f.
Let the function Θ:g(X)→P(R) (a set-valued function) be defined as:
Θ(y) := {f∗(y)} if y∈Cϵ
Θ(y) := [f∗(y)−d(f,f′), min(min_{x∈g−1(y)} f′(x), f∗(y)+d(f,f′))] if y∉Cϵ
These sets are always convex, and also always nonempty, because, in the second case, the only way the interval could be empty is if
minx∈g−1(y)f′(x)<f∗(y)−d(f,f′)
Now, g being a proper map, the preimage of the single point y is a compact set, so we can actually minimize our function, pick out our particular x that maps to y. Then, we know:
f′(x)+d(f,f′)≥f(x)≥f∗(g(x))=f∗(y)
The second inequality is because f≥f∗∘g by assumption. Shuffling this around a bit, we have:
f′(x)≥f∗(y)−d(f,f′)
where our x is minimizing, which contradicts the assumed emptiness.
So, we've got convexity and nonemptiness. Let's show lower-hemicontinuity.
As a warmup for that, we'll need to show that the function y ↦ min_{x∈g−1(y)} f′(x) is lower-semicontinuous.
So, our proof goal is: if yn limits to y, then
liminf_{n→∞} min_{x∈g−1(yn)} f′(x) ≥ min_{x∈g−1(y)} f′(x)
And we know that all our yn and y itself, lie in g(X).
At this point, here's what we do. Isolate the subsequence of points we're using to attain the liminf. This sequence of points yn (when unioned with y itself) is compact in Y, so its preimage in X is compact. And you can minimize functions over compact sets. So, for each yn, pick a point xn∈g−1(yn) that minimizes f′. Our sequence of points xn is roaming around in a compact set, so we can pick a convergent subsequence; at this point, let's start indexing with m. Said limit point x must have f′(x)=lim_{m→∞}f′(xm) by continuity for f′, and g(x)=y by continuity for g (all the xm get mapped to ym, so the limit point x must go to the limit point y). Putting it all together,
liminf_{n→∞} min_{x′∈g−1(yn)} f′(x′) = lim_{m→∞} f′(xm) = f′(x) ≥ min_{x′∈g−1(y)} f′(x′)
And y↦minx∈g−1(y)f′(x) is lower-semicontinuous over g(X).
At this point, we'll try to show that the set-valued function Θ is lower-hemicontinuous over g(X). We'll use z to denote numbers in R.
The criterion for lower-hemicontinuity of Θ is: If yn limits to y, and z∈Θ(y), then there's a subsequence ym and points zm∈Θ(ym) where limm→∞zm=z
To show this, we'll divide into three cases. In our first case, only finitely many of the yn are in Cϵ. Let's pass to a subsequence where we remove all the yn that lie in Cϵ, and where min_{x∈g−1(ym)} f′(x) converges to a value. Let zm be z clipped to lie in the interval Θ(ym) (possible, since those intervals are nonempty). Then lim_{m→∞} zm = z.
This was because all these sequences have their corresponding values converge, whether by continuity or assumption. So, this takes care of lower-hemicontinuity in one case.
Now for our second case, where only finitely many of the yn aren't in Cϵ. Pass to a subsequence where they are all removed, and where min_{x∈g−1(ym)} f′(x) converges to a value. Accordingly, all our ym lie in Cϵ. Let zm=f∗(ym). Then
limm→∞zm=limm→∞f∗(ym)=f∗(y)=z
Our third case is the trickiest, where there are infinitely many yn in Cϵ and infinitely many outside that set, and accordingly, y∈Cϵ. Let's pass to a subsequence where we remove all the yn that lie in Cϵ, and where min_{x∈g−1(ym)} f′(x) converges to a value. We would be able to apply the proof from the first case if we knew that
f∗(y)−d(f,f′) ≤ f∗(y) ≤ min(min_{x∈g−1(y)} f′(x), f∗(y)+d(f,f′))
ie, that z=f∗(y) also lies in the interval from the second case of Θ's definition.
Obviously, we have f∗(y)≥f∗(y)−d(f,f′), but we still need to show that
f∗(y)≤minx∈g−1(y)f′(x)
and
f∗(y)≤f∗(y)+d(f,f′)
The second one can be addressed trivially, so now we just have to show that
f∗(y)≤minx∈g−1(y)f′(x)
Pick some minimizing x, where g(x)=y, and remember that this y lies in Cϵ, so x∈g−1(Cϵ), the compact set where f=f′. Now we can just go:
f∗(y)=f∗(g(x))≤f(x)=f′(x)
And we're done. That's the last piece we need to know that Θ is lower-hemicontinuous.
Now we will invoke the Michael selection theorem, which gets us a continuous selection f∗′:g(X)→R (bounded because f and f′ are) where, for all y∈g(X), f∗′(y)∈Θ(y).
The conditions to invoke the Michael selection theorem are that Θ is lower-hemicontinuous with nonempty closed convex values (check), that g(X) is paracompact, and that R is a Banach space (check). Fortunately, any metrizable space is paracompact, and g(X) is a closed subset of the metrizable Polish space Y, so we can invoke the theorem and get our continuous f∗′.
Note two important things about this continuous f∗′ we just constructed. First, it perfectly matches f∗ over Cϵ. Second, for all x in g(X) (remember, that's the space it's defined over)
(f∗′∘g)(x) = f∗′(g(x)) ≤ min_{x′∈g−1(g(x))} f′(x′) ≤ f′(x)
Because of that worst-case bound thing being one of the upper bounds on Θ(y). Also, over g(X), it remains within d(f,f′) of f∗, again, because of the imposed bounds.
We can use the Tietze extension theorem to extend f∗′ to all of Y while staying within d(f,f′) of f∗.
At this point, we can return to where we started out in this section, and go:
g∗(h)(f) = sup_{f″ : f″∘g ≤ f} h(f″) ≃ h(f∗) ≤ h(f∗′)+ϵd(f,f′) ≤ sup_{f″ : f″∘g ≤ f′} h(f″)+ϵd(f,f′) = g∗(h)(f′)+ϵd(f,f′)
The first equality was the definition of the pullback, the approximate equality was because f∗ very nearly attains the true supremum, the next ≤ is because f∗′ and f∗ agree on Cϵ (which is an ϵ-almost-support for h) and stay within d(f,f′) of each other, the inequality after that is because we just showed that f∗′∘g≤f′, and then we pack the definition of the pullback back up.
Perfectly symmetric arguments work when you swap f′ and f, yielding our net result that
|g∗(h)(f)−g∗(h)(f′)|≤ϵd(f,f′)
And f,f′ were selected to be arbitrary continuous functions that agree on g−1(Cϵ), which is compact, so we have a compact ϵ-almost-support, so g∗(h) has compact almost-support and is an infradistribution.
Ok, so we're done, we've defined a pullback, it's the monotone hull of the function h under precomposition with g, and this was definition 3. Now it's time to start showing equivalences.
First, the equality of definitions 2 and 3. The concave monotone hull of the partially defined pullback must lie above the monotone hull of the partially defined pullback, because we're imposing an additional condition, so there are fewer functions that fit the bill. But we've shown that the monotone hull is concave, so it's actually an equality.
Now for the equality of 2 and 4, that the concave monotone hull of the pullback is the same as the inf of all infradistributions that produce h when pushforward is applied.
We know that the concave monotone hull is indeed an infradistribution, and
g∗(g∗(h))(f) = g∗(h)(f∘g) = sup_{f∗ : f∗∘g ≤ f∘g} h(f∗) = h(f)
So, since its projection forward matches h perfectly, we have
g∗(h)≥inf{h′∈□X|g∗(h′)=h}
For actual equality, we note that, if g∗(h′)=h, then
h′(f∘g)=g∗(h′)(f)=h(f)
And since g∗(h) is the concave monotone hull of the partial function f∘g↦h(f), and the concave monotone hull is the least concave monotone function compatible with said partial function, and any h′ which pushes forward to map to h must be concave and monotone and match this partial function, all the h′ we're inf-ing together lie above g∗(h), so we have actual equality and
g∗(h)=inf{h′|g∗(h′)=h}
So, at this point, definitions 2, 3, and 4 of the pullback are equal.
Now for equality of definitions 4 and 5. Taking the inf of infradistributions on the concave functional level is the same as unioning together all the infradistribution sets and taking the closed convex hull and upper completion (because that doesn't affect the expectation values), so "inf of infradistributions that pushforward to make h", on the concave functional side of LF-duality, is the same as "closed convex hull and upper completion of infradistributions which pushforward to make H" on the set side of LF-duality.
Now that we know definitions 2-5 of the pullback are all equal, and using h3,h1,h5 for the expectation functionals corresponding to definitions 3, 1, and 5 of the pullback, we'll show h5≥h1≥h3, so h1, ie, the first definition of the pullback, must be equal to the rest.
First, h1 is the functional corresponding to the set:
g∗(H):=(g∗)−1(H)
Where g∗ in this case is the function of type signature Ma(X)→Ma(Y) given by keeping the b term the same and pushing forward the signed measure component via the function g. Note that said pushforward can actually only make points in Ma(g(X)); the measures can only be supported over that set.
We'll show that
Eg∗(H):CB(X)→R
is monotone and that
Eg∗(H)(f∘g)=EH(f)
To begin with, for monotonicity, if f′≥f, then
Eg∗(H)(f′) = inf_{(m,b)∈(g∗)−1(H)} (m(f′)+b) ≥ inf_{(m,b)∈(g∗)−1(H)} (m(f)+b) = Eg∗(H)(f)
This is easy because our set is entirely of a-measures. So the corresponding h1 is monotone.
Now, at this point, we can go:
Eg∗(H)(f∘g) = inf_{(m,b)∈(g∗)−1(H)} (m(f∘g)+b) = inf_{(m,b):(g∗(m),b)∈H} (m(f∘g)+b)
The next line is because g∗(m)(f) (the continuous pushfoward of a measure, evaluated via f) is the same as m(f∘g) (old measure evaluating f∘g)
= inf_{(m,b):(g∗(m),b)∈H} (g∗(m)(f)+b)
The next line is a bit tricky, because pushforward seems like it may only net some of the points in H. However, any measure supported over g(X) has a measure over X that, when pushforward is applied, produces it. Just take each point y∈g(X), craft a Markov kernel from g(X) to X that maps each y∈g(X) to a probability distribution supported over g−1(y), take the semidirect product of your measure over g(X) and your kernel, and project to X to get such a measure. So, said pushforward surjects onto H, and we can get:
= inf_{(m′,b′)∈H} (m′(f)+b′) = EH(f)
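The lifting step in the paragraph above can be made concrete on a finite space (a toy of my own): the simplest kernel splits each point's mass uniformly over its preimage, and the pushforward then recovers the original measure.

```python
# Toy lifting of a measure on g(X) back to X (my own minimal kernel choice).
X = [0, 1, 2, 3]
g = {0: "a", 1: "a", 2: "b", 3: "b"}

def lift(mu_y):
    """Lift a measure on g(X) to X by splitting mass uniformly over preimages."""
    pre = {y: [x for x in X if g[x] == y] for y in mu_y}
    return {x: mu_y[y] / len(pre[y]) for y in mu_y for x in pre[y]}

def pushforward(m_x):
    out = {}
    for x, mass in m_x.items():
        out[g[x]] = out.get(g[x], 0.0) + mass
    return out

mu = {"a": 0.3, "b": 0.7}
print(pushforward(lift(mu)))   # recovers {'a': 0.3, 'b': 0.7}
```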
So, now that we've shown that
Eg∗(H):CB(X)→R
is monotone and that
Eg∗(H)(f∘g)=EH(f)
We know that the corresponding functional h1 has the property that it's monotone and h1(f∘g)=h(f). Thus, it must lie above the monotone hull of f∘g↦h(f), which is just h3 (third definition of the pullback), and we have: h1≥h3.
Now for the other direction, showing that h5≥h1. The fifth definition of the pullback was the closed convex hull (and upper completion) of all the infradistribution sets which project to H. So, if we can show that any infradistribution set H′ which projects to H must have its minimal points lie in g∗(H), we'll have that all infradistribution sets which project to H are subsets of g∗(H), and thus higher in the ordering on functionals, h5≥h1.
This is trivial by g∗(H) just being a preimage, so any set which pushforwards into the set of interest must be a subset of the preimage.
And we have our desired result that any infradistribution set which projects into H must be a subset of g∗(H) (all minimal points are present in it), and higher in the information ordering, so the inf of all of them, ie, the functional h5, must lie above the corresponding functional h1.
So, we have h5≥h1≥h3=h5 and so h1 equals all of them, and the restricted preimage of H is another formulation of the pullback, and we're done.
Proposition 29: The pullback of an infradistribution is an infradistribution, and it preserves all properties indicated in the table for this section.
So, we know from the proof of Proposition 31 that the pullback of an infradistribution is an infradistribution. Now, to show preservation of properties, we'll work with the formulation of the pullback as:
g∗(H)=(g∗)−1(H)
For homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, and crispness: they are all characterized by all minimal points having certain properties of their λ and b values. All minimal points of g∗(H) must pushforward to minimal points of H (otherwise the b term of your original minimal point could be strictly decreased and it'd still pushforward into H), which have the property of interest, and because pushforward g∗ preserves the λ and b values of a-measures, we know that all minimal points of g∗(H) have the same property on their λ and b, so all these properties are preserved under pullback.
Sharpness takes a slightly more detailed argument. Let's say that H is sharp, ie, its minimal points consist entirely of the probability distributions on C. We know from crispness preservation that g∗(H) has its minimal points consist entirely of probability distributions. If there was a probability distribution that wasn't supported on g−1(C) (preimage of C, which, by the assumed properness of g, is compact), that was present as a minimal point, then it would pushforward to make a minimal point in H that was a probability distribution that had some measure outside of C, which is impossible. Further, any probability distribution that is supported on g−1(C) would pushforward to make a probability distribution supported on C, which would be a minimal point of H, so said point must be minimal in g∗(H). Thus, the minimal points of g∗(H) consist entirely of probability distributions supported on g−1(C), so g∗(H) is sharp, and all nice properties are therefore preserved.
Proposition 30: Pullback then pushforward is identity, g∗(g∗(h))=h, and pushforward then pullback is below identity, g∗(g∗(h))≤h.
The first part is easy to show,
g∗(g∗(h))(f) = g∗(h)(f∘g) = sup_{f∗ : f∗∘g ≤ f∘g} h(f∗) = h(f)
Yielding pullback then pushforward being identity. For the reverse direction, we use Proposition 31, in particular the part about the pullback being the least infradistribution that pushes forward to produce the target. By the first part, we have g∗(g∗(g∗(h)))=g∗(h), so h and the pullback of g∗(h) both push forward to g∗(h), and since the pullback is the least such infradistribution, g∗(g∗(h))≤h.
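The identity direction can be spot-checked on a finite toy model (my construction, reusing the preimage-min shortcut from the Proposition 31 proof): pushing the pullback forward along g means evaluating it on f∘g, and since f∘g is constant on preimages, the min over each preimage hands back f itself.

```python
X = [0, 1, 2, 3]
g = {0: "a", 1: "a", 2: "b", 3: "b"}
image = ["a", "b"]
H = [({"a": 0.5, "b": 0.5}, 0.0), ({"a": 1.0, "b": 0.0}, 0.0)]

def h(f_y):
    return min(sum(m[y] * f_y[y] for y in image) + b for m, b in H)

def gh(f_x):  # pullback of h (definition 3)
    return h({y: min(f_x[x] for x in X if g[x] == y) for y in image})

def push_pull(f_y):  # pushforward of the pullback: evaluate it on f∘g
    return gh({x: f_y[g[x]] for x in X})

for f_y in [{"a": 0.2, "b": 0.9}, {"a": 1.0, "b": 0.0}, {"a": 0.5, "b": 0.5}]:
    assert abs(push_pull(f_y) - h(f_y)) < 1e-9
print("pushforward of pullback matched h on all samples")
```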
Proposition 32: The free product of two infradistributions exists and is an infradistribution iff: 1: There are points (λμ,0)∈Hmin1 and (λ′μ′,0)∈Hmin2 where λ=λ′. 2: There are points (λμ,b)∈Hmin1 and (λ′μ′,b′)∈Hmin2 where λ=λ′, b=b′, and λ+b=1.
So, to show this, what we'll be doing is using our set-form characterization of the pullback as a preimage and the supremum as an intersection. We already know that, from proposition 14, a necessary condition for the intersection of infradistributions being an infradistribution is normalization, ie, on the set level, the presence of a point with b=0 and a point with λ+b=1. The presence of a point with b=0 in the intersection of the preimages would pushforward to make a b=0 point in H1 and H2, respectively, with equal λ values, fulfilling condition 1.
Also, the presence of a point with λ+b=1 in the intersection of the preimages would pushforward to make a λ+b=1 minimal point in H1 and H2 respectively, with equal λ values and b values, fulfilling condition 2.
Thus, if the free product exists and is an infradistribution, the two properties are fulfilled. That just leaves the reverse direction of establishing that the free product exists and is an infradistribution if both properties are fulfilled.
We'll be handling the set conditions on an infradistribution. For nonemptiness and normalization, observe that if condition 1 is fulfilled, you can make the a-measure (λ(μ×μ′),0) which projects down to (λμ,0) and (λμ′,0), so it's present in the intersection of the preimages of H1 and H2 under the projection mappings. This witnesses nonemptiness and one half of normalization.
For the second half of normalization, any point with λ+b<1 would project down to make a point outside of H1 and H2, so it can't exist. Further, if condition 2 is fulfilled, you can make the a-measure (λ(μ×μ′),b), with λ+b=1, and it projects down to (λμ,b) and (λμ′,b) in H1 and H2 respectively, witnessing that it's present in the free product, and witnessing that the other half of normalization works out.
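The witness construction above is just a product measure and its marginals, which a finite toy example (numbers of my own) makes concrete:

```python
# λ = 1, b = 0 versions of the condition-1 witnesses (toy numbers of my own).
mu  = {"a": 0.5, "b": 0.5}       # measure component of a minimal point of H1
mup = {"c": 0.25, "d": 0.75}     # measure component of a minimal point of H2

prod = {(y1, y2): mu[y1] * mup[y2] for y1 in mu for y2 in mup}

def marginal(m, axis):
    """Project a measure on the product space down to one coordinate."""
    out = {}
    for pair, mass in m.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + mass
    return out

assert all(abs(marginal(prod, 0)[k] - v) < 1e-9 for k, v in mu.items())
assert all(abs(marginal(prod, 1)[k] - v) < 1e-9 for k, v in mup.items())
print("the product point projects down into both components")
```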
For closure and convexity and upper-completeness, the intersection of two closed convex upper-complete sets (the pullback of closed convex upper-complete sets along a continuous linear mapping that preserves the b term) is closed-convex upper-complete, so that works out.
That leaves the compact-projection property, which is the only one where we may have issues. Remember, the necessary-and-sufficient conditions for compactness of a set of measures are: bounded λ value, and, for all ϵ, there's a compact subset of X×Y where all the measure components of the points have all but ϵ measure on said set.
Any point in the intersection of preimages with the measure component having a λ value above the minimum of the two λ⊙1 and λ⊙2 values (maximum amount of measure present in points in H1 and H2 respectively) would project down to not be in H1 or H2, respectively, so those are ruled out, and we've got a bound on the amount of measure present.
As for an ϵ-almost-support for all the measure components in H1∗H2, what you do is find a compact ϵ/2-almost-support CXϵ/2 for H1 and a compact ϵ/2-almost-support CYϵ/2 for H2, and consider CXϵ/2×CYϵ/2.
We claim this is a compact (product of compact sets) ϵ-almost-support for all measure components in H1∗H2.
This is because any measure component in H1∗H2 must project down to be a measure component of H1 and a measure component of H2. So, if there was more than ϵ measure outside
CXϵ/2×CYϵ/2
then more than ϵ/2 of it lies outside CXϵ/2×Y or more than ϵ/2 lies outside X×CYϵ/2, so it would project down into H1 or H2 with more than ϵ/2 measure outside the corresponding compact set, which is impossible by the projection landing in H1 (or H2) and the two compact sets being ϵ/2-almost-supports for H1 and H2 respectively.
For any ϵ we can make a compact subset of X×Y that accounts for all but ϵ of the measure of any measure component in H1∗H2, so it's got the compact-projection property, which is the last thing we need to certify it's an infradistribution.
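The ϵ/2 + ϵ/2 bookkeeping is just a union bound, which a quick numeric check illustrates (toy numbers of my own; a product measure stands in for a general point of H1∗H2):

```python
# Union bound: mass outside C_X × C_Y <= (mass outside C_X) + (mass outside C_Y).
mu = {"x1": 0.7, "x2": 0.3}           # marginal on X; C_X = {"x1"}, 0.3 escapes
nu = {"y1": 0.8, "y2": 0.2}           # marginal on Y; C_Y = {"y1"}, 0.2 escapes
C_X, C_Y = {"x1"}, {"y1"}

prod = {(x, y): mu[x] * nu[y] for x in mu for y in nu}

outside = sum(m for (x, y), m in prod.items() if x not in C_X or y not in C_Y)
out_x = sum(m for x, m in mu.items() if x not in C_X)
out_y = sum(m for y, m in nu.items() if y not in C_Y)

print(outside, "<=", out_x + out_y)    # about 0.44 <= 0.5
assert outside <= out_x + out_y + 1e-9
```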
Proposition 33: Having (homogeneity or 1-Lipschitzness) present on both of the component infradistributions is a sufficient condition to guarantee the existence of the free product.
From Proposition 32, all we need to do is to verify properties 1 and 2, ie: 1: There are points (λμ,0)∈Hmin1 and (λ′μ′,0)∈Hmin2 where λ=λ′. 2: There are points (λμ,b)∈Hmin1 and (λ′μ′,b′)∈Hmin2 where λ=λ′, b=b′, and λ+b=1.
Given only that (without loss of generality) H1 is 1-Lipschitz or homogenous, and same for H2, we'll show properties 1 and 2.
The trick to this is showing that 1-Lipschitzness and homogeneity both imply that an infradistribution contains some point with λ=1 and b=0; if both infradistributions fulfill that, then conditions 1 and 2 are fulfilled for free, as can be verified by inspection.
So, let's show that 1-Lipschitzness and homogeneity imply there's a point with λ=1 and b=0. For 1-Lipschitzness, λ≤1, and since there must be a point with b=0 by normalization, and all points have λ+b≥1 by normalization, said point must have λ=1. As for homogeneity, all minimal points have b=0, and normalization says there's a point with λ+b=1, getting that there's a point with λ=1,b=0. So, we're done.
Proposition 34: prX∗(h1∗h2)≥h1 and prY∗(h1∗h2)≥h2, with equality when h1 and h2 are C-additive.
For the inequality, H1∗H2 lies inside the preimage of H1 under the projection to X by definition, so projecting it down lands inside H1, and an inf over a subset of H1 can only be ≥ the inf over all of H1, giving prX∗(h1∗h2)≥h1. And same for h2. For equality in the C-additive case, we need to show that every point in H1 has some point in H1∗H2 which projects onto it. Remember, C-additivity is "all minimal points have λ=1", which, by our notion of upper-completion, is equivalent to all measure components in the set just being ordinary probability distributions.
The existence of minimal points with b=0 means that, given any point (μ,b)∈H1, you can find a point with the same b value in H2, call it (μ′,b)∈H2, and then (μ×μ′,b)∈H1∗H2 because it projects down into both sets, showing that H1∗H2 surjects onto H1 when projected down and so must have equal expectation value. Same for H2.
Proposition 35: Free product is commutative and associative.
This is because supremum is commutative (order doesn't matter) and associative (parentheses don't matter), because on the set level it's just intersection, which is order-and-grouping agnostic.
Proposition 36: The free product of countably many infradistributions exists and is an infradistribution iff: 1: There are points (λiμi,0)∈Hmini where ∀i,i′∈I: λi=λi′. 2: There are points (λiμi,bi)∈Hmini where ∀i,i′∈I: λi=λi′∧bi=bi′∧λi+bi=1.
We recap the proof of Proposition 32. The presence of a point with b=0 or λ+b=1 in a hypothetical infinite free product must project down into all the Hi, and cause the fulfillment of conditions 1 and 2.
For the set conditions on an infradistribution, for nonemptiness and normalization, observe that if condition 1 is fulfilled, you can make the a-measure (λ∏iμi,0) which is present in the intersection of all the a-measure pullbacks. This witnesses nonemptiness and one half of normalization.
For the second half of normalization, any point with λ+b<1 would project down to make a point outside of all the Hi so it can't exist. Further, if condition 2 is fulfilled, you can make the a-measure (λ∏iμi,b), with λ+b=1, and it is present in the intersection of all the a-measure pullbacks, witnessing that the other half of normalization works out.
For closure and convexity and upper-completeness, the intersection of closed convex upper-complete sets (the pullback of closed convex upper-complete sets along a continuous linear mapping that preserves the b term) is closed-convex upper-complete, so that works out.
That leaves the compact-projection property, which is the only one where we may have issues. For the bound on the λ values, we remember that any point in the intersection of preimages with the measure component having a λ value above min_i λ⊙i has some Hi that it can't project down into, so we get a contradiction, and we've got a bound on the amount of measure present.
As for an ϵ-almost-support for all the measure components, you biject your Hi with the natural numbers, find a compact set CXn which is an ϵ/2^(n+1)-almost-support for Hn, and take the product of them. It's compact by Tychonoff, and because each speck of measure of the measure component of a point in the infinite free product that doesn't land in this set projects down to be outside of one of the CXn sets, the total amount of measure outside this set is upper-bounded by:
∑_{n=0}^∞ ϵ/2^(n+1) = ϵ
So it's a compact ϵ-almost-support for the measure components of all a-measures in the infinite free product, and this can work for any ϵ, so the infinite free product fulfills the compact almost-support property and has compact projection, which is the last condition we needed.
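The mass bookkeeping at the end is just the geometric series; a one-liner confirms it numerically (truncated at 60 terms):

```python
# Check that the per-coordinate slack ε/2^(n+1) sums to ε.
eps = 0.125
partial = sum(eps / 2 ** (n + 1) for n in range(60))
print(abs(partial - eps) < 1e-12)   # True: the series sums to eps
```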
Proposition 37: pr(Xj)∗(∗i(hi))≥hj, with equality when all the hi are C-additive.
The inequality direction works just as in Proposition 34: ∗iHi lies inside the preimage of Hj under the projection, so projecting it down lands inside Hj, yielding pr(Xj)∗(∗i(hi))≥hj. For equality in the C-additive case, we need to show that every point in Hj has some point in ∗iHi which projects onto it. Remember, C-additivity is "all minimal points have λ=1", which, by our notion of upper-completion, is equivalent to all measure components in the set just being ordinary probability distributions.
The existence of minimal points with b=0 means that, given any point (μj,b)∈Hj, you can find a point with the same b value in Hi (regardless of i), call it (μi,b)∈Hi, and then (∏iμi,b)∈∗iHi because it projects down into all sets, showing that ∗iHi surjects onto Hj when projected down and so must have equal expectation value.
Proposition 38: Ultradistributions H are isomorphic to convex, monotone, normalized, Lipschitz, CAS functions h:CB(X)→R.
We'll be going over the proof of Theorem 1 again, but with some stuff swapped around.
Our first order of business is establishing the isomorphism. Our first direction is showing that going from H to h and back again yields exactly H. By lower completion, and reproved analogues of Proposition 2 and Theorem 1 from "Basic inframeasure theory" (which an interested party can reprove if they want to see it), we can characterize H as
{(m,b)|∀f∈CB(X): m(f)+b ≤ sup_{(m′,b′)∈H}(m′(f)+b′)}
And then, our H can further be reexpressed as
{(m,b)|∀f∈CB(X):m(f)+b≤EH(f)}
{(m,b)|∀f∈CB(X):b≤EH(f)−m(f)}
{(m,b)|b ≤ inf_{f∈CB(X)}(EH(f)−m(f))}
and, by the definition of the convex conjugate, and the space of finite signed measures being the dual space of CB(X), and m(f) being a functional applied to an element, this is...
{(m,b)|b≤−(h)∗(m)}
So, our original set H is identical to the convex-conjugate set, when we go from H to h back to a set of a-measures.
Proof Phase 2: In the reverse direction for isomorphism, assume that h fulfills the conditions (we'll really only need continuity and convexity). We want to show that
E{(m,b)|b≤−(h)∗(m)}(f) = h(f)
Given an m, we have a natural candidate for maximizing the b, just set it equal to −(h)∗(m).
So then we get
E{(m,b)|b≤−(h)∗(m)}(f) = sup_{(m,b):b≤−(h)∗(m)}(m(f)+b) = sup_m(m(f)−(h)∗(m))
And this is just... (h)∗∗(f), and, because h is continuous over CB(X), and convex, h=(h)∗∗. From that, we get
E{(m,b)|b≤−(h)∗(m)}(f)=(h)∗∗(f)=h(f)
and we're done with isomorphism.
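The hyperplane picture behind this duality can be sketched in finite dimensions (a toy of my own: a two-point space, so CB(X) ≅ R², and h is a max of affine functionals, hence convex and continuous). Each (m,b) with b ≤ −(h)∗(m) is an affine functional lying below h; the sup over all of them rebuilds h, though the code only checks the "below" half:

```python
import itertools, random
random.seed(1)

ms = [(0.5, 0.5), (1.0, 0.0)]      # measure components on the two points
bs = [0.0, 0.0]

def h(f):                          # a convex h: a max of affine functionals
    return max(m[0] * f[0] + m[1] * f[1] + b for m, b in zip(ms, bs))

grid = list(itertools.product([i / 4 for i in range(5)], repeat=2))

# Sample measures m; give each the largest b keeping the hyperplane below h,
# which is -(h)*(m) computed over the grid.
planes = []
for _ in range(50):
    t = random.random()
    m = (t * ms[0][0] + (1 - t) * ms[1][0], t * ms[0][1] + (1 - t) * ms[1][1])
    b = min(h(f) - (m[0] * f[0] + m[1] * f[1]) for f in grid)
    planes.append((m, b))

# every such (m, b) is an affine functional below h
for f in grid:
    for m, b in planes:
        assert m[0] * f[0] + m[1] * f[1] + b <= h(f) + 1e-9
print("all sampled hyperplanes lie below h")
```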
So, in our first direction, we're going to derive the conditions on the functional from the conditions on the set, so we can assume nonemptiness, closure, convexity, lower completion, projected-compactness, and normalization, and derive monotonicity, convexity, normalization, Lipschitzness, and compact almost-support (CAS) from that.
For monotonicity, remember that all points in the infradistribution set are a-measures, so if f′≥f, then
h(f′) = sup_{(m,b)∈H}(m(f′)+b) ≥ sup_{(m,b)∈H}(m(f)+b) = h(f)
We could do that because all the measure components are actual measures.
For Lipschitzness, we first observe that compact-projection (the maximal points, when projected to their measure components, make a set with compact closure) enforces that there's an upper bound on the λ value of a maximal point (λμ,b)∈Hmax, because otherwise you could pick a sequence with unbounded λ, and it'd have no convergent subsequence of measures, which contradicts precompactness of the maximal points projected to their measure components.
Then, we observe that points in H correspond perfectly to hyperplanes that lie below the graph of h, and a maximal point is "you shift your hyperplane up as much as you can until you can't shift it up any more without starting to cut into the function h". Further, for every function f∈CB(X), you can make a hyperplane tangent to the function h at that point by the Hahn-Banach theorem, which must correspond to a maximal point.
Putting it together, the epigraph of h is exactly the region above all its tangent hyperplanes. And we know all the tangent hyperplanes correspond to maximal points, and their Lipschitz constants correspond to the λ values of the maximal points. Which are bounded. So, compact-projection in H implies h is Lipschitz.
Finally, we'll want compact almost-support. A set of measures is compact iff the amount of measure is upper-bounded, and, for all ϵ, there is a compact set Cϵ⊆X where all the measures m have <ϵ measure outside of Cϵ.
So, given that the set of measures corresponding to H is compact by the compact-projection property, we want to show that the functional h has compact almost-support. To do this, we'll observe that if h is the sup of a bunch of functions, and all functions think two different points are only a little ways apart in value, then h must think they're only a little distance apart in value. Keeping that in mind, we have:
And then, we can think of each maximal point as corresponding to a hyperplane ϕ, and h is the sup of all of them, so to bound the distance between these two values, we just need to assess the maximum size of the gap between those values over all maximal points/tangent hyperplanes. Thus, we can get:
And then, because f′ was selected to be 0 on Cϵ, which makes up all but ϵ of the measure for all measures present in H, we can upper-bound |m(f′)| by ϵ||f′||, so we have that
f′↓Cϵ→∀f:|dh(f;f′)|≤ϵ||f′||
And so, Cϵ is a compact ϵ-almost-support for h, and this argument works for all ϵ, so h is CAS, and that's the last condition we need. Thus, if H is an ultradistribution (set form), the expectation functional h is an ultradistribution (expectation form).
Now for the other direction, where we assume monotonicity, convexity, normalization, Lipschitzness, and CAS on an ultradistribution (expectation form) and show that the induced set fulfills nonemptiness, convexity, closure, lower completion, projection-compactness, normalization, and being a set of a-measures.
Remember, our specification of the corresponding set was:
{(m,b)|b≤−(h)∗(m)}
Where (h)∗ is the convex conjugate of h.
First, being a nonempty set of a-measures with 0 or negative b value. Because there's an isomorphism linking points of the set and hyperplanes below the graph of h, we just need to establish that no hyperplanes below the graph of h can slope down in the direction of a nonnegative function (as this certifies that the measure component must be an actual measure), and no hyperplanes below the graph of h can assign 0 a value above 0 (as this corresponds to the b term, and can be immediately shown by normalization).
What we do is go "assume there's a ϕ where the linear functional corresponding to ϕ isn't a measure, ie, there's some negative function f where ϕ(f)>ϕ(0)". Well, because of monotonicity for h (one of the assumed properties), we have h(0)≥h(f)≥h(2f)≥h(3f).... And, because all affine functionals are made by taking a linear functional and displacing it, ϕ(0)<ϕ(f)<ϕ(2f)<ϕ(3f)..., increases at a linear rate, so eventually the hyperplane and h cross over, but ϕ was assumed to be below h always, so we have a contradiction.
Therefore, all hyperplanes below h must have their linear functional component corresponding to an actual measure. And we get nonemptiness from the concavity of h, so we can pick any function and use the Hahn-Banach theorem to make a tangent hyperplane to h that touches at that point, certifying nonemptiness.
By the way, the convex conjugate, (h)∗(m), can be reexpressed as supf(h(f)−m(f)).
For closure and convexity: h is proper, continuous on CB(X), and convex, so, by the Wikipedia page on "Closed Convex Function", (h)∗ is a closed convex function, and then by the Wikipedia page on "Convex Conjugate" in the Properties section, (h)∗ is convex and closed, so −(h)∗ is concave and closed. From the Wikipedia page on "Closed Convex Function", this means that the hypograph of −(h)∗ is closed, and also the hypograph of a concave function is convex. This takes care of closure and convexity for our H.
Time for lower-completeness. Assume that (m,b) lies in the hypograph, and that b′≥0. Our task now is to show that (m,b)−(0,b′) lies in the hypograph. This is equivalent to showing that b−b′≤−(h)∗(m). Let's begin.
−(h)∗(m)≥b≥b−b′
The first inequality is because (m,b) lies in the hypograph, and the second is because b′≥0. And we're done.
Normalization of the resulting set is easy. Going from h to a (maybe)-inframeasure H back to h is identity as established earlier, so all we have to do is show that a failure of normalization in a (maybe)-inframeasure makes the resulting h not normalized. Thus, if our h is normalized, and it makes an H that isn't normalized, then going back makes a non-normalized h, which contradicts isomorphism. So, assume there's a failure of normalization in H. Then EH(0)≠0, or EH(1)≠1, so either h(0)≠0 or h(1)≠1 and we get a failure of normalization for h which is impossible. So H must be normalized.
That just leaves compact-projection. We know that a set of measures is precompact iff there's a bound on their λ values, and for all ϵ, there's a compact set Cϵ⊆X where all the measure components have <ϵ measure outside of that set.
First, we can observe that no hyperplane below h can have a Lipschitz constant above the maximal Lipschitz constant for the function h, because if it increased more steeply in some direction, you could go in that direction, and eventually the hyperplane would grow past the rate h grows at and cross it, and the hyperplane must be below h. Thus, Lipschitzness of h enforces that there can be no point in the set H with too much measure, which gives us one half of compact-projection for H.
For the other half, CAS for h ensures that for all ϵ, there is a compact set Cϵ where
f′↓Cϵ=0→|dh(f;f′)|≤ϵ||f′||
What we'll do is establish that no hyperplane lying below h can have a slope of more than ϵ in the direction of a function that's in [0,1] and is 0 on Cϵ. Let f′ be such a function, and let ψ be a hyperplane below h that slopes up too hard in the direction of f′, ie:
dψ(0;f′)>ϵ||f′||
Now, we can realize that as we travel from 0 to f′ to 2f′ to 3f′, our vector of travel is always in the direction of f′, a nonnegative function that's 0 on Cϵ. Each additional f′ added ramps up the value of h by at most ϵ||f′||, by the CAS bound above. However, each additional f′ added raises the value of ψ (our assumed hyperplane that's sloping up too hard in the f′ direction) by more than that quantity, so eventually ψ will cross over to be higher than h, so ψ can't correspond to a point in H, and we have a contradiction.
Therefore, regardless of the point in H, its measure component must assign any function that's 0 on Cϵ and bounded in [0,1] a value of ϵ at most. We can then realize that this can only happen for a measure that assigns ϵ or less measure to the complement of Cϵ (otherwise you could approximate the indicator function of a high-measure set outside Cϵ from below by such a continuous function, and it would pick up more than ϵ measure).
Thus, given our ϵ, we've found a compact set Cϵ where the measure component of every point in H assigns ϵ or less value to the outside of that set, and this can be done for all ϵ, certifying the last missing piece for compact-projection of H (because the projection is precompact iff the set of measures is bounded above in amount of measure present, and for all ϵ, there's a compact set Cϵ where all the measures assign ≤ϵ measure outside of that set).
And that's the last condition we need to conclude that the set form of an ultradistribution (functional form) is an ultradistribution (set form), and we're done.
Theorem 2: Infra-Disintegration Theorem: Let h be some arbitrary infradistribution over X, and K be some arbitrary infrakernel Xik→Y. Then (h⋉K)|gL=h′⋉K′, where:
gx:=λy.g(x,y)
Lx:=λy.L(x,y)
α(x):=K(x)(1★Lxgx)−K(x)(0★Lxgx)
β(x):=K(x)(0★Lxgx)
K′(x):=K(x)|gxLx
h′(f):=(h(λx.α(x)f(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
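Before diving into the proof, the claimed identity can be sanity-checked numerically on finite spaces. Everything below is a toy sketch, not the post's general constructions: X=Y={0,1}, infradistributions are min-of-affine functionals given by made-up finite lists of a-measures, and f★Lg is read as the mixture Lf+(1−L)g (consistent with 0★Lg=(1−L)g and 1★Lg=L+(1−L)g as unpacked later in the proof).

```python
# Finite toy check of (h ⋉ K)|gL = h' ⋉ K'. All data here is assumed/arbitrary.
X, Y = [0, 1], [0, 1]

def ev(points, f):
    # expectation of the vector f under a min-of-affine functional min_i(m_i·f + b_i)
    return min(sum(mi * fi for mi, fi in zip(m, f)) + b for m, b in points)

h = [([0.6, 0.4], 0.0), ([0.3, 0.7], 0.1)]
K = {0: [([0.5, 0.5], 0.0)],
     1: [([0.2, 0.8], 0.0), ([0.7, 0.3], 0.05)]}
g = {(x, y): 0.3 + 0.1 * x + 0.2 * y for x in X for y in Y}
L = {(x, y): 0.4 + 0.2 * x * y for x in X for y in Y}  # L bounded away from 0, so α > 0

keys = [(x, y) for x in X for y in Y]

def semi(f):
    # (h ⋉ K)(f) = h(λx. K(x)(λy. f(x,y)))
    return ev(h, [ev(K[x], [f[x, y] for y in Y]) for x in X])

def mixed(f):
    # f★Lg, read as λ(x,y). L(x,y)f(x,y) + (1−L(x,y))g(x,y)
    return {k: L[k] * f[k] + (1 - L[k]) * g[k] for k in keys}

Z0 = semi(mixed({k: 0.0 for k in keys}))   # (h⋉K)(0★Lg)
Z1 = semi(mixed({k: 1.0 for k in keys}))   # (h⋉K)(1★Lg)

def K_of(x, w):
    # K(x)(λy. L(x,y)w(y) + (1−L(x,y))g(x,y))
    return ev(K[x], [L[x, y] * w[y] + (1 - L[x, y]) * g[x, y] for y in Y])

alpha = {x: K_of(x, [1.0, 1.0]) - K_of(x, [0.0, 0.0]) for x in X}
beta = {x: K_of(x, [0.0, 0.0]) for x in X}

def K_upd(x, w):
    # K'(x) = K(x)|gxLx, the update of K(x)
    return (K_of(x, w) - beta[x]) / alpha[x]

def h_prime(v):
    return (ev(h, [alpha[x] * v[x] + beta[x] for x in X]) - Z0) / (Z1 - Z0)

def lhs(f):   # ((h⋉K)|gL)(f)
    return (semi(mixed(f)) - Z0) / (Z1 - Z0)

def rhs(f):   # (h'⋉K')(f)
    return h_prime([K_upd(x, [f[x, y] for y in Y]) for x in X])

f = {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.5, (1, 1): 0.1}
assert abs(lhs(f) - rhs(f)) < 1e-9
```

The two sides agree because α(x)K′(x)(f(x,⋅))+β(x) collapses back to K(x)(Lxf(x,⋅)+(1−Lx)gx) by the very definition of the update, which is exactly the cancellation subtlety the proof below has to handle when α(x)=0.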
Our first order of business is showing that putting this together makes the original thing. Let f be a continuous function X×Y→R in this case.
in order to wrap up our disintegration argument. There's a subtlety here, which is that when unpacking the definition of K′, we can't cancel the α(x) out of the inside and outside, because α(x) may be zero, producing division-by-zero errors. So, actually, let's take a detour and show that for all x,
α(x)K′(x)(λy.f(x,y))+β(x)
=K(x)(λy.L(x,y)f(x,y)+(1−L(x,y))g(x,y))
We've got two possible cases. In case one, α(x)≠0. Then, we can go:
Now, there is a bit of work that we need to put in, in order to verify that h' and K' are valid infradistributions and infrakernels, respectively.
Our first order of business is verifying that β and α are continuous bounded functions X→R. We recall our note that for any continuous bounded function f:X×Y→R, and any infrakernel K, the function λx.K(x)(λy.f(x,y)) must be continuous and bounded, otherwise the semidirect product would be ill-defined. Now, β was defined as:
λx.K(x)(0★Lxgx)
=λx.K(x)(λy.(1−L(x,y))g(x,y))
and (1−L)g is obviously continuous and bounded, so β must be continuous and bounded. For α, it's similar; it's defined as:
λx.K(x)(1★Lxgx)−K(x)(0★Lxgx)
=λx.K(x)(λy.L(x,y)+(1−L(x,y))g(x,y))−K(x)(λy.(1−L(x,y))g(x,y))
and similarly, we can verify that L+(1−L)g and (1−L)g are both continuous and bounded, so α is a continuous bounded function as well. Now that that's out of the way, the conditions we need to check for h′ being a valid infradistribution are: Normalization, monotonicity, concavity, Lipschitzness, and compact almost-support.
Monotonicity is an easy one to show. If f′≥f, then αf′+β≥αf+β because α(x)≥0 always, and then monotonicity for h transfers that to the outside, and so you get monotonicity for h′, because
And so we can make compact almost-supports for any ϵ we want, so h′ fulfills the CAS property. And that's the last piece we need to know that h′ is an infradistribution.
Further, support(α) is a support for h′, because any two functions disagreeing off the support of α only disagree where α is 0, and from the definition of h′ it's easy to see that they both induce identical results. So, we can view h′ as having type signature □(support(α)).
For K′, we'll show that it's (almost) an infrakernel of type support(α)ik→Y, only missing the Lipschitz constant constraint.
However, what's going on here is something like the following: if you want to specify a function that's guaranteed to have a finite value when you integrate it w.r.t. any probability distribution, you need a uniform bound on the values of that function. But if you've got a specific probability distribution you're integrating that function w.r.t., the function can run off to infinity, just in areas of low measure, so the net integral is finite. So we should expect the "common Lipschitz constant" property to be the first property to break in an infrakernel.
We do still have to check compact-shared CAS and pointwise convergence, as well as all the K′(x) being infradistributions.
For the first one, K′(x):=K(x)|gxLx, and the update of an infradistribution K(x) is an infradistribution when the update is well-defined, so we need to check whether the update is well-defined. The update would fail to be well-defined iff
K(x)(1★Lxgx)−K(x)(0★Lxgx)=0
We can rewrite this as PgxK(x)(Lx), but that's just α(x), and for x∈support(α) we know α(x)>0, so we have that all the updates are well-defined and make infradistributions.
For number 2 and 3, the ones with compact subsets of support(α), we should note something important. If Cα⊆support(α) is compact, then inf{α(x)|x∈Cα}>0. If it was otherwise, you could fix a sequence of points with lower and lower α values, identify a convergent subsequence by compactness, and then by continuity of α, conclude that the limit point x had α(x)=0, so it isn't in the support of α and can't be in Cα, and we'd have a contradiction. Therefore, infx∈Cαα(x)>0. Cα will also be a compact subset of X.
If you pick an ϵ and a Cα, then an ϵ⋅infx∈Cαα(x)-almost-support for all the K(x) (which is guaranteed to exist by compact-shared compact almost-support for K) is an ϵ-almost-support for all the K′(x) with x∈Cα. Here's how to show it. Let f,f′ agree on your almost-support of interest.
So, if the inner function wasn't continuous, then it couldn't be evaluated by h, and h⋉K would be undefined for a continuous function input, which is impossible because K is an infrakernel, so we can take semidirect product and always make an infradistribution. At this point, we can go:
We're going to need to go out of order here a bit and do Proposition 31 before doing Propositions 29 and 30. Please have patience.
Proposition 31: All five characterizations of the pullback define the same infradistribution.
So, the two requirements of being able to do a pullback g∗(h) are that: h has g(X) as a support, and g is a proper function (preimage of a compact set is compact). And our five characterizations are:
1: For sets,
g∗(H):=(g∗)−1(H)
ie, take the preimage of H w.r.t. the linear operator that g induces from Ma(X)→Ma(Y).
2: For concave functionals, g∗(h) is the concave monotone hull of the partial function f∘g↦h(f).
3: For concave functionals, g∗(h) is the monotone hull of the partial function f∘g↦h(f), ie:
g∗(h)(f):=supf∗:f∗∘g≤fh(f∗)
4: For concave functionals,
g∗(h)=inf{h′|h′∈□X,g∗(h′)=h}
take the inf of all infradistributions that could have made your infradistribution of interest via pushforward.
5: For sets, take the closed convex hull of all infradistribution sets which project down to be equivalent to H.
The proof path will be: first establishing that 3 makes an infradistribution, then showing that 3 and 2 are equivalent, then that 2 and 4 are equivalent, then that 4 and 5 are equivalent. And once 2, 3, 4, and 5 are in an equivalence class, we wrap up by showing that 2 implies 1 and 1 implies 5.
First up: Establishing that 3 makes an infradistribution, which is where the bulk of the work is.
Starting with normalization:
g∗(h)(1):=supf∗:f∗∘g≤1h(f∗)
And clearly, this supremum is attained by f∗=1: any competing f∗ has f∗≤1 on g(X), which is a support for h, so h(f∗)≤h(1)=1 by monotonicity and normalization. Thus g∗(h)(1)=1, and we have one part of normalization. For the other half of normalization,
g∗(h)(0)=supf∗:f∗∘g≤0h(f∗)
Now, in order for f∗∘g≤0 to be the case, all we need is that f∗≤0 on g(X). For any such f∗, monotonicity yields h(f∗)≤h(sup(f∗,0)), and sup(f∗,0) is 0 on g(X); since g(X) is assumed to be a support for h, h(sup(f∗,0))=h(0)=0. The choice f∗=0 attains this bound, so g∗(h)(0)=0 and we have the other part of normalization.
Monotonicity: This is immediate because if f′≥f, then the supremum for f′ has more options and can attain a higher value.
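On finite spaces, definition 3 can be computed directly: since h is monotone and supported on g(X), the supremum over {f∗:f∗∘g≤f} is attained at the pointwise-largest legal f∗, namely f∗(y)=minx∈g−1(y)f(x). Here is a small sketch (the sets, map, and a-measures are made-up toy data) checking the normalization and monotonicity arguments above:

```python
# Toy pullback via definition 3 on finite spaces (assumed example data).
X = [0, 1, 2]
Y = ['a', 'b', 'c']            # g(X) = {'a','b'}; 'c' carries no measure
g = {0: 'a', 1: 'a', 2: 'b'}

# a-measures over Y, all mass on g(X), normalized so h(0)=0 and h(1)=1
h_pts = [({'a': 0.5, 'b': 0.5, 'c': 0.0}, 0.0),
         ({'a': 0.2, 'b': 0.7, 'c': 0.0}, 0.1)]

def h(fstar):
    return min(sum(m[y] * fstar[y] for y in Y) + b for m, b in h_pts)

def pullback(f):
    # g*(h)(f) = sup over {f*: f*∘g ≤ f} of h(f*); the sup is attained at
    # f*(y) = min of f over g⁻¹(y) (off g(X), h doesn't care; we use 0)
    fstar = {y: min((f[x] for x in X if g[x] == y), default=0.0) for y in Y}
    return h(fstar)

assert abs(pullback({x: 1.0 for x in X}) - 1.0) < 1e-9   # g*(h)(1) = 1
assert abs(pullback({x: 0.0 for x in X}) - 0.0) < 1e-9   # g*(h)(0) = 0
f_lo = {0: 0.1, 1: 0.4, 2: 0.3}
f_hi = {0: 0.2, 1: 0.5, 2: 0.8}          # f_hi ≥ f_lo pointwise
assert pullback(f_hi) >= pullback(f_lo)  # monotonicity
```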
Concavity:
pg∗(h)(f)+(1−p)g∗(h)(f′)=psupf∗:f∗∘g≤fh(f∗)+(1−p)supf′∗:f′∗∘g≤f′h(f′∗)
And we can pick some f∗ and f′∗ which very nearly attain the relevant values, as close as you wish, to go:
≃ph(f∗)+(1−p)h(f′∗)≤h(pf∗+(1−p)f′∗)
And then,
(pf∗+(1−p)f′∗)∘g=p(f∗∘g)+(1−p)(f′∗∘g)≤pf+(1−p)f′
So, with that, we can go:
≤supf∗:f∗∘g≤pf+(1−p)f′h(f∗)=g∗(h)(pf+(1−p)f′)
And we're done with concavity.
For Lipschitzness, the proof will work as follows. We'll take two functions f and f′, and show that g∗(h)(f′)≥g∗(h)(f)−λ⊙d(f,f′) in a way that generalizes to symmetrically show that g∗(h)(f)≥g∗(h)(f′)−λ⊙d(f,f′) to get our result that
|g∗(h)(f)−g∗(h)(f′)|≤λ⊙d(f,f′)
where λ⊙ is the Lipschitz constant of h.
So, let's show
g∗(h)(f′)≥g∗(h)(f)−λ⊙d(f,f′)
What we can do is, because
g∗(h)(f)=supf∗:f∗∘g≤fh(f∗)
then we can consider this value to be approximately obtained by some specific function f∗. Then,
g∗(h)(f′)≥g∗(h)(inf(f,f′))≥g∗(h)(f−d(f,f′))≥h(f∗−d(f,f′))
≥h(f∗)−λ⊙d(f,f′)≃g∗(h)(f)−λ⊙d(f,f′)
The first two inequalities are from monotonicity. The third inequality is because (f∗−d(f,f′))∘g≤f−d(f,f′), where f∗ is our special function from a bit earlier with f∗∘g≤f; so f∗−d(f,f′) is one of the options in the supremum defining g∗(h)(f−d(f,f′)), which can only be bigger. Then, there's just Lipschitzness of h, and h(f∗) being approximately g∗(h)(f). And we're done with Lipschitzness; just swap f and f′ and use the same arguments.
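The Lipschitz bound can also be spot-checked on the same kind of finite toy model (again, assumed example data, with the supremum in definition 3 computed via the minimum over preimages; λ⊙ for a min-of-affine h is the largest total mass among its measures):

```python
# Numerical spot-check of |g*(h)(f) − g*(h)(f')| ≤ λ⊙·d(f,f') on toy data.
import random

X, Y = [0, 1, 2], ['a', 'b']
g = {0: 'a', 1: 'a', 2: 'b'}
h_pts = [({'a': 0.5, 'b': 0.5}, 0.0), ({'a': 0.2, 'b': 0.7}, 0.1)]
lam = max(sum(m.values()) for m, _ in h_pts)   # Lipschitz constant λ⊙ of h

def h(fs):
    return min(sum(m[y] * fs[y] for y in Y) + b for m, b in h_pts)

def pullback(f):
    # definition 3; sup attained at f*(y) = min of f over g⁻¹(y)
    return h({y: min(f[x] for x in X if g[x] == y) for y in Y})

random.seed(0)
for _ in range(1000):
    f = {x: random.uniform(-1, 1) for x in X}
    f2 = {x: random.uniform(-1, 1) for x in X}
    d = max(abs(f[x] - f2[x]) for x in X)      # sup-norm distance d(f, f')
    assert abs(pullback(f) - pullback(f2)) <= lam * d + 1e-12
```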
Before proceeding further, note the following information: Polish spaces are first-countable and Hausdorff so they're compactly generated, and Wikipedia says any proper map to a compactly generated space is closed (maps closed sets to closed sets). So, g(X) is a closed set.
That just leaves compact almost-support. Fix an ϵ, and take a compact ϵ-almost-support of h, Cϵ⊆Y. Intersect it with g(X) if you haven't already; since g(X) is closed, it still makes a compact set, and it keeps being an ϵ-almost-support. We will show that g−1(Cϵ), which is compact because we assumed our map was proper, is an ϵ-almost-support of g∗(h).
This proof will proceed by showing that, if f,f′ agree on g−1(Cϵ), then g∗(h)(f)−ϵd(f,f′)≤g∗(h)(f′). Symmetrical arguments work by swapping f and f′, showing that the two expectation values are only ϵd(f,f′) apart.
As usual, we have
g∗(h)(f)≃h(f∗)
Where f∗ very nearly attains the supremum, and we know that f∗∘g≤f.
Let the function Θ:g(X)→P(R) (a set-valued function) be defined as:
If y∈Cϵ, then Θ(y)={f∗(y)}
If y∉Cϵ, then
Θ(y):=[f∗(y)−d(f,f′),inf(minx∈g−1(y)f′(x),f∗(y)+d(f,f′))]
These sets are always convex, and also always nonempty, because, in the second case, the only way the interval could be empty is if
minx∈g−1(y)f′(x)<f∗(y)−d(f,f′)
Now, g being a proper map, the preimage of the single point y is a compact set, so we can actually minimize our function, pick out our particular x that maps to y. Then, we know:
f′(x)+d(f,f′)≥f(x)≥f∗(g(x))=f∗(y)
The second inequality is because f≥f∗∘g by assumption. Shuffling this around a bit, we have:
f′(x)≥f∗(y)−d(f,f′)
where our x is minimizing, contradicting the supposed emptiness. So, Θ(y) is always nonempty.
So, we've got convexity and nonemptiness. Let's show lower-hemicontinuity.
As a warmup for that, we'll need to show that the function y↦minx∈g−1(y)f′(x) is lower-semicontinuous.
So, our proof goal is: if yn limits to y, then
liminfn→∞minx∈g−1(yn)f′(x)≥minx∈g−1(y)f′(x)
And we know that all our yn and y itself, lie in g(X).
At this point, here's what we do. Isolate the subsequence of points we're using to attain the liminf. This sequence of points yn (when unioned with y itself) is compact in Y, so its preimage in X is compact. And you can minimize functions over compact sets. So, for each yn, pick a point xn∈g−1(yn) that minimizes f′. Our sequence of points xn is roaming around in a compact set, so we can pick a convergent subsequence. At this point, let's start indexing with m. Said limit point x must have f′(x)=limm→∞f′(xm) by continuity for f′, and g(x)=y by continuity for g (all the xm get mapped to ym, so the limit point x must go to the limit point y). Putting it all together,
liminfn→∞minx∈g−1(yn)f′(x)=limm→∞f′(xm)=f′(x)≥infx∈g−1(y)f′(x)
And so we have that, if yn limits to y, then
liminfn→∞minx∈g−1(yn)f′(x)≥minx∈g−1(y)f′(x)
And y↦minx∈g−1(y)f′(x) is lower-semicontinuous over g(X).
At this point, we'll try to show that the set-valued function Θ is lower-hemicontinuous over g(X). We'll use z to denote numbers in R.
The criterion for lower-hemicontinuity of Θ is: If yn limits to y, and z∈Θ(y), then there's a subsequence ym and points zm∈Θ(ym) where limm→∞zm=z
To show this, we'll divide into three cases. In our first case, only finitely many of the yn are in Cϵ. Let's pass to a subsequence where we remove all the yn that lie in Cϵ, and where minx∈g−1(ym)f′(x) converges to a value. Let
zm:=sup(inf(z,minx∈g−1(ym)f′(x),f∗(ym)+d(f,f′)),f∗(ym)−d(f,f′))
Then,
z=sup(inf(z,minx∈g−1(y)f′(x),f∗(y)+d(f,f′)),f∗(y)−d(f,f′))
=sup(inf(z,limm→∞minx∈g−1(ym)f′(x),limm→∞f∗(ym)+d(f,f′)),limm→∞f∗(ym)−d(f,f′))
=sup(limm→∞inf(z,minx∈g−1(ym)f′(x),f∗(ym)+d(f,f′)),limm→∞f∗(ym)−d(f,f′))
=limm→∞sup(inf(z,minx∈g−1(ym)f′(x),f∗(ym)+d(f,f′)),f∗(ym)−d(f,f′))=limm→∞zm
This was because all these sequences have their corresponding values converge, whether by continuity or assumption. So, this takes care of lower-hemicontinuity in one case.
Now for our second case, where only finitely many of the yn aren't in Cϵ. Pass to a subsequence where they are all removed, and where minx∈g−1(ym)f′(x) converges to a value. Accordingly, all our ym lie in Cϵ. Let zm=f∗(ym). Then
limm→∞zm=limm→∞f∗(ym)=f∗(y)=z
Our third case is the trickiest, where there are infinitely many yn in Cϵ, and infinitely many outside that set, and accordingly, y∈Cϵ. Let's pass to a subsequence where we remove all the yn that lie in Cϵ and where minx∈g−1(ym)f′(x) converges to a value. We would be able to apply the proof from the first case if we knew that
f∗(y)∈[f∗(y)−d(f,f′),inf(minx∈g−1(y)f′(x),f∗(y)+d(f,f′))]
Obviously, we have f∗(y)≥f∗(y)−d(f,f′), but we still need to show that
f∗(y)≤minx∈g−1(y)f′(x)
and
f∗(y)≤f∗(y)+d(f,f′)
The second one can be addressed trivially, so now we just have to show that
f∗(y)≤minx∈g−1(y)f′(x)
Pick some minimizing x, where g(x)=y, and remember that this y lies in Cϵ, so x∈g−1(Cϵ), the compact set where f=f′. Now we can just go:
f∗(y)=f∗(g(x))≤f(x)=f′(x)
And we're done. That's the last piece we need to know that Θ is lower-hemicontinuous.
Now we will invoke the Michael selection theorem, which gets us a continuous selection f∗′:g(X)→R (bounded because f and f′ are) where, for all y, f∗′(y)∈Θ(y).
The conditions to invoke the Michael selection theorem are that Θ is lower-hemicontinuous with nonempty closed convex values (check), that g(X) is paracompact, and that R is a Banach space (check). Fortunately, any metrizable space is paracompact, and Polish spaces are metrizable (as is the subspace g(X)), so we can invoke the theorem and get our continuous f∗′.
Note two important things about this continuous f∗′ we just constructed. First, it perfectly matches f∗ over Cϵ. Second, for all x∈X (remember, f∗′ is defined over g(X), and g(x) always lands there),
(f∗′∘g)(x)=f∗′(g(x))≤minx′∈g−1(g(x))f′(x′)≤f′(x)
because that worst-case minimum is one of the upper bounds imposed on Θ(y). Also, over g(X), f∗′ remains within d(f,f′) of f∗, again because of the imposed bounds.
We can use the Tietze extension theorem to extend f∗′ to all of X while staying within d(f,f′) of f∗.
At this point, we can return to where we started out in this section, and go:
g∗(h)(f)−ϵd(f,f′)=supf∗:f∗∘g≤fh(f∗)−ϵd(f,f′)≃h(f∗)−ϵd(f,f′)
≤(h(f∗′)+ϵd(f,f′))−ϵd(f,f′)=h(f∗′)
≤supf∗:f∗∘g≤f′h(f∗)=g∗(h)(f′)
The first equality was the definition of the pullback, the approximate equality was because f∗ very nearly attains the true supremum, the first ≤ is because f∗′ and f∗ agree on Cϵ (which is an ϵ-almost-support for h) and stay within d(f,f′) of each other, the next inequality after that is because we just showed that f∗′∘g≤f′, so f∗′ is one of the options in that supremum, and then we pack up the definition of the pullback.
Perfectly symmetric arguments work when you swap f′ and f, yielding our net result that
|g∗(h)(f)−g∗(h)(f′)|≤ϵd(f,f′)
And f,f′ were selected to be arbitrary continuous functions that agree on g−1(Cϵ), which is compact, so we have a compact ϵ-almost-support, so g∗(h) has compact almost-support and is an infradistribution.
Ok, so we're done, we've defined a pullback, it's the monotone hull of the function h under precomposition with g, and this was definition 3. Now it's time to start showing equivalences.
First, the equality of definitions 2 and 3. The concave monotone hull of the partially defined pullback must be ≥ the monotone hull of the partially defined pullback, because we're imposing an additional condition, so there are fewer functions that fit the bill. But we've shown that the monotone hull is concave, so it's actually an equality.
Now for the equality of 2 and 4, ie, that the concave monotone hull of the pullback is the same as the inf of all infradistributions that produce h when pushforward is applied.
We know that the concave monotone hull is indeed an infradistribution, and
g∗(g∗(h))(f)=g∗(h)(f∘g)=supf∗:f∗∘g≤f∘gh(f∗)=h(f)
So, since its projection forward matches h perfectly, we have
g∗(h)≥inf{h′∈□X|g∗(h′)=h}
For actual equality, we note that, if g∗(h′)=h, then
h′(f∘g)=g∗(h′)(f)=h(f)
And since g∗(h) is the concave monotone hull of the partial function f∘g↦h(f), and the concave monotone hull is the least concave monotone function compatible with said partial function, and any h′ which pushes forward to map to h must be concave and monotone and match this partial function, all the h′ we're inf-ing together lie above g∗(h), so we have actual equality and
g∗(h)=inf{h′|g∗(h′)=h}
So, at this point, definitions 2, 3, and 4 of the pullback are equal.
Now for equality of definitions 4 and 5. Taking the inf of infradistributions on the concave functional level is the same as unioning together all the infradistribution sets and taking the closed convex hull and upper completion (because that doesn't affect the expectation values), so "inf of infradistributions that pushforward to make h", on the concave functional side of LF-duality, is the same as "closed convex hull (and upper completion) of infradistribution sets which pushforward to make H" on the set side of LF-duality.
Now that we know definitions 2-5 of the pullback are all equal, and using h3,h1,h5 for the expectation functionals corresponding to definitions 3, 1, and 5 of the pullback, we'll show h5≥h1≥h3; since h3=h5, h1, ie, the first definition of the pullback, must be equal to the rest.
First, h1 is the functional corresponding to the set:
g∗(H):=(g∗)−1(H)
Where g∗ in this case is the function of type signature Ma(X)→Ma(Y) given by keeping the b term the same and pushing forward the signed measure component via the function g. Note that said pushforward can actually only make points in Ma(g(X)); the measures can only be supported over that set.
We'll show that
Eg∗(H):CB(X)→R
is monotone and that
Eg∗(H)(f∘g)=EH(f)
To begin with, for monotonicity, if f′≥f, then
Eg∗(H)(f′)=inf(m,b)∈(g∗)−1(H)m(f′)+b
≥inf(m,b)∈(g∗)−1(H)m(f)+b
=Eg∗(H)(f)
This is easy because our set is entirely of a-measures. So the corresponding h1 is monotone.
Now, at this point, we can go:
Eg∗(H)(f∘g)=inf(m,b)∈(g∗)−1(H)m(f∘g)+b
=inf(m,b):(g∗(m),b)∈Hm(f∘g)+b
The next line is because g∗(m)(f) (the continuous pushforward of a measure, evaluated via f) is the same as m(f∘g) (old measure evaluating f∘g)
=inf(m,b):(g∗(m),b)∈Hg∗(m)(f)+b
The next line is a bit tricky, because pushforward seems like it may only net some of the points in H. However, any measure supported over g(X) has a measure over X that, when pushforward is applied, produces it. Just take each point y∈g(X), craft a Markov kernel from g(X) to X that maps each y∈g(X) to a probability distribution supported over g−1(y), take the semidirect product of your measure over g(X) and your kernel, and project to X to get such a measure. So, said pushforward surjects onto H, and we can get:
=inf(m′,b′)∈Hm′(f)+b′=EH(f)
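The mass-splitting construction in that surjectivity step is easy to sketch on a finite example (the uniform split over each preimage below is just one arbitrary choice of the Markov kernel; the numbers are made up):

```python
# A measure ν supported on g(X) is the pushforward of a measure on X
# built by splitting each point's mass over its preimage.
X, Y = [0, 1, 2], ['a', 'b']
g = {0: 'a', 1: 'a', 2: 'b'}
nu = {'a': 0.6, 'b': 0.4}                      # measure supported on g(X)

pre = {y: [x for x in X if g[x] == y] for y in Y}
m = {x: nu[g[x]] / len(pre[g[x]]) for x in X}  # split ν(y) uniformly over g⁻¹(y)

push = {y: sum(m[x] for x in pre[y]) for y in Y}
assert all(abs(push[y] - nu[y]) < 1e-12 for y in Y)   # pushforward recovers ν
```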
So, now that we've shown that
Eg∗(H):CB(X)→R
is monotone and that
Eg∗(H)(f∘g)=EH(f)
We know that the corresponding functional h1 has the property that it's monotone and h1(f∘g)=h(f). Thus, it must lie above the monotone hull of the partial function f∘g↦h(f), which is just h3 (third definition of the pullback), and we have h1≥h3.
Now for the other direction, showing that h5≥h1. The fifth definition of the pullback was the closed convex hull (and upper completion) of all the infradistributions which project to H. So, if we can show that any infradistribution set H′ which projects into H must lie within g∗(H), we'll have that all infradistributions which project to H are subsets of g∗(H), and thus higher in the ordering on functionals, yielding h5≥h1.
This is trivial by g∗(H) just being a preimage, so any set which pushforwards into the set of interest must be a subset of the preimage.
And we have our desired result: any infradistribution set which projects into H must be a subset of g∗(H) (all its minimal points are present in it), and so higher in the information ordering, so the inf of all of them, ie, the functional h5, must lie above the corresponding functional h1.
So, we have h5≥h1≥h3=h5, so h1 equals all of them, and the preimage of H is another formulation of the pullback, and we're done.
Proposition 29: The pullback of an infradistribution is an infradistribution, and it preserves all properties indicated in the table for this section.
So, we know from the proof of Proposition 31 that the pullback of an infradistribution is an infradistribution. Now, to show preservation of properties, we'll work with the formulation of the pullback as:
g∗(H)=(g∗)−1(H)
For homogeneity, 1-Lipschitzness, cohomogeneity, C-additivity, and crispness, they are all characterized by all minimal points having certain properties of their λ and b values. All minimal points of g∗(H) must pushforward to minimal points of H (otherwise the b term of your original minimal point could be strictly decreased and it'd still pushforward into H), which have the property of interest, and because the pushforward g∗ preserves the λ and b values of a-measures, we know that all minimal points of g∗(H) have the same property on their λ and b, so all these properties are preserved under pullback.
Sharpness takes a slightly more detailed argument. Let's say that H is sharp, ie, its minimal points consist entirely of the probability distributions supported on C. We know from crispness preservation that g∗(H) has its minimal points consist entirely of probability distributions. If a probability distribution that wasn't supported on g−1(C) (the preimage of C, which, by the assumed properness of g, is compact) was present as a minimal point, then it would pushforward to make a minimal point in H that was a probability distribution with some measure outside of C, which is impossible. Further, any probability distribution that is supported on g−1(C) would pushforward to make a probability distribution supported on C, which would be a minimal point of H, so said point must be minimal in g∗(H). Thus, the minimal points of g∗(H) consist entirely of probability distributions supported on g−1(C), so g∗(H) is sharp, and all nice properties are therefore preserved.
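Sharpness preservation has a clean finite-dimensional shadow: a sharp h is f↦min of f over C, and the definition-3 pullback (sup attained at the minimum over preimages, as before) collapses to the minimum of f over g−1(C). A toy sketch with assumed data:

```python
# Sharp h on C pulls back to a sharp functional on g⁻¹(C) (finite toy check).
X, Y = [0, 1, 2, 3], ['a', 'b', 'c']
g = {0: 'a', 1: 'a', 2: 'b', 3: 'c'}
C = ['a', 'b']                                  # the set h is sharp on
preC = [x for x in X if g[x] in C]              # g⁻¹(C) = [0, 1, 2]

def h(fstar):
    # sharp infradistribution: worst-case value over C
    return min(fstar[y] for y in C)

def pullback(f):
    # definition 3; sup attained at f*(y) = min of f over g⁻¹(y)
    return h({y: min(f[x] for x in X if g[x] == y) for y in Y})

f = {0: 0.7, 1: 0.2, 2: 0.5, 3: -1.0}
# the pullback ignores x=3 (outside g⁻¹(C)) and is min of f over g⁻¹(C)
assert abs(pullback(f) - min(f[x] for x in preC)) < 1e-12
```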
Proposition 30: Pullback then pushforward is identity, g∗(g∗(h))=h, and pushforward then pullback is below identity, g∗(g∗(h))≤h
The first part is easy to show,
g∗(g∗(h))(f)=g∗(h)(f∘g)=supf∗:f∗∘g≤f∘gh(f∗)=h(f)
Yielding pullback-then-pushforward being identity. For the reverse direction, we use Proposition 31, in particular the part about the pullback being the least infradistribution that pushes forward to produce the target. Since h itself pushes forward to produce g∗(h), and g∗(g∗(h)) is the inf of all infradistributions that do so, we get g∗(g∗(h))≤h.
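Both halves of Proposition 30 can be checked numerically on the finite toy model used earlier (all data assumed; pushforward is precomposition with g, and the definition-3 pullback is computed via minima over preimages):

```python
# Finite check: pull-then-push = identity; push-then-pull ≤ identity.
import random

X, Y = [0, 1, 2], ['a', 'b']
g = {0: 'a', 1: 'a', 2: 'b'}

hX_pts = [({0: 0.3, 1: 0.3, 2: 0.4}, 0.0), ({0: 0.1, 1: 0.1, 2: 0.8}, 0.05)]
def hX(f):          # an infradistribution over X
    return min(sum(m[x] * f[x] for x in X) + b for m, b in hX_pts)

hY_pts = [({'a': 0.5, 'b': 0.5}, 0.0), ({'a': 0.2, 'b': 0.7}, 0.1)]
def hY(fs):         # an infradistribution over Y, supported on g(X)
    return min(sum(m[y] * fs[y] for y in Y) + b for m, b in hY_pts)

def push(hfun):     # g_*(h')(f) = h'(f∘g)
    return lambda f: hfun({x: f[g[x]] for x in X})

def pull(hfun):     # g*(h)(f); sup attained at min over preimages
    return lambda f: hfun({y: min(f[x] for x in X if g[x] == y) for y in Y})

random.seed(1)
for _ in range(1000):
    fY = {y: random.uniform(-1, 1) for y in Y}
    fX = {x: random.uniform(-1, 1) for x in X}
    # pullback then pushforward recovers hY exactly
    assert abs(push(pull(hY))(fY) - hY(fY)) < 1e-12
    # pushforward then pullback sits below hX
    assert pull(push(hX))(fX) <= hX(fX) + 1e-12
```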
Proposition 32: The free product of two infradistributions exists and is an infradistribution iff:
1: There are points (λμ,0)∈Hmin1 and (λ′μ′,0)∈Hmin2 where λ=λ′
2: There are points (λμ,b)∈Hmin1 and (λ′μ′,b′)∈Hmin2 where λ=λ′, and b=b′, and λ+b=1.
So, to show this, what we'll be doing is using our set-form characterization of the pullback as a preimage and the supremum as an intersection. We already know that, from proposition 14, a necessary condition for the intersection of infradistributions being an infradistribution is normalization, ie, on the set level, the presence of a point with b=0 and a point with λ+b=1. The presence of a point with b=0 in the intersection of the preimages would pushforward to make a b=0 point in H1 and H2, respectively, with equal λ values, fulfilling condition 1.
Also, the presence of a point with λ+b=1 in the intersection of the preimages would pushforward to make a λ+b=1 minimal point in H1 and H2 respectively, with equal λ values and b values, fulfilling condition 2.
Thus, if the free product exists and is an infradistribution, the two properties are fulfilled. That just leaves the reverse direction of establishing that the free product exists and is an infradistribution if both properties are fulfilled.
We'll be handling the set conditions on an infradistribution. For nonemptiness and normalization, observe that if condition 1 is fulfilled, you can make the a-measure (λ(μ×μ′),0) which projects down to (λμ,0) and (λμ′,0), so it's present in the intersection of the preimages of H1 and H2 under the projection mappings. This witnesses nonemptiness and one half of normalization.
For the second half of normalization, any point with λ+b<1 would project down to make a point outside of H1 and H2, so it can't exist. Further, if condition 2 is fulfilled, you can make the a-measure (λ(μ×μ′),b), with λ+b=1, and it projects down to (λμ,b) and (λμ′,b) in H1 and H2 respectively, witnessing that it's present in the free product, and witnessing that the other half of normalization works out.
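The construction in those normalization witnesses is concrete enough to sketch: take matching λ and b, form the product a-measure (λ(μ×μ′),b), and check that it projects down into both components (the numbers below are made-up toy values):

```python
# Building the normalization witness for the free product on finite spaces.
lam, b = 0.9, 0.1                                # λ + b = 1, as in condition 2
mu = {'x0': 0.4, 'x1': 0.6}                      # μ, a distribution over X
mup = {'y0': 0.25, 'y1': 0.75}                   # μ', a distribution over Y

prod = {(x, y): lam * mu[x] * mup[y] for x in mu for y in mup}   # λ(μ×μ')

projX = {x: sum(prod[x, y] for y in mup) for x in mu}
projY = {y: sum(prod[x, y] for x in mu) for y in mup}
assert all(abs(projX[x] - lam * mu[x]) < 1e-12 for x in mu)    # lands on (λμ, b)
assert all(abs(projY[y] - lam * mup[y]) < 1e-12 for y in mup)  # lands on (λμ', b)
# total mass λ plus b gives 1: the λ+b=1 half of normalization
assert abs(sum(prod.values()) + b - 1.0) < 1e-12
```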
For closure and convexity and upper-completeness, the intersection of two closed convex upper-complete sets (the pullback of closed convex upper-complete sets along a continuous linear mapping that preserves the b term) is closed-convex upper-complete, so that works out.
That leaves the compact-projection property, which is the only one where we may have issues. Remember, the necessary-and-sufficient conditions for compactness of a set of measures are: bounded λ value, and, for all ϵ, there's a compact subset of X×Y where all the measure components of the points have all but ϵ measure on said set.
Any point in the intersection of preimages with the measure component having a λ value above the minimum of the two λ⊙1 and λ⊙2 values (maximum amount of measure present in points in H1 and H2 respectively) would project down to not be in H1 or H2, respectively, so those are ruled out, and we've got a bound on the amount of measure present.
As for an ϵ-almost-support for all the measure components in H1∗H2, what you do is find a compact ϵ/2-almost-support CXϵ/2 for H1 and a compact ϵ/2-almost-support CYϵ/2 for H2, and consider CXϵ/2×CYϵ/2.
We claim this is a compact (product of compact sets) ϵ-almost-support for all measure components in H1∗H2.
This is because any measure component in H1∗H2 must project down to be a measure component of H1 and a measure component of H2. So, if there was more than ϵ measure outside
CXϵ/2×CYϵ/2
then its projection down to X or down to Y would have more than ϵ/2 measure outside of CXϵ/2 or CYϵ/2 respectively, which is impossible by the projection landing in H1 (or H2) and the two compact sets being ϵ/2-almost-supports for H1 and H2 respectively.
For any ϵ we can make a compact subset of X×Y that accounts for all but ϵ of the measure of any measure component in H1∗H2, so it's got the compact-projection property, which is the last thing we need to certify it's an infradistribution.
Proposition 33: Having (homogeneity or 1-Lipschitzness) present on both of the component infradistributions is a sufficient condition to guarantee the existence of the free product.
From Proposition 32, all we need to do is to verify properties 1 and 2, ie
1: There are points (λμ,0)∈Hmin1 and (λ′μ′,0)∈Hmin2 where λ=λ′
2: There are points (λμ,b)∈Hmin1 and (λ′μ′,b′)∈Hmin2 where λ=λ′, and b=b′, and λ+b=1.
Given only that (without loss of generality) H1 is 1-Lipschitz or homogeneous, and same for H2, we'll show properties 1 and 2.
The trick to this is showing that 1-Lipschitzness and homogeneity both imply that an infradistribution contains some point with λ=1 and b=0, and if both infradistributions fulfill that, then conditions 1 and 2 are fulfilled for free, as can be verified by inspection.
So, let's show that 1-Lipschitzness and homogeneity each imply there's a point with λ=1 and b=0. For 1-Lipschitzness, λ≤1, and since there must be a point with b=0 by normalization, and all points have λ+b≥1 by normalization, said point must have λ=1. As for homogeneity, all minimal points have b=0, and normalization says there's a point with λ+b=1, getting us a point with λ=1,b=0. So, we're done.
Proposition 34: prX∗(h1∗h2)≥h1 and prY∗(h1∗h2)≥h2, with equality when h1 and h2 are C-additive.
prX∗(h1∗h2)=prX∗(sup(pr∗X(h1),pr∗Y(h2)))≥prX∗(pr∗X(h1))=h1
And same for h2. For equality in the C-additive case, we need to show that every point in H1 has some point in H1∗H2 which surjects onto it. Remember, C-additivity is "all minimal points have λ=1", which, by our notion of upper-completion, is equivalent to all measure components in the set just being ordinary probability distributions.
The existence of minimal points with b=0 means that, given any point (μ,b)∈H1, you can find a point with the same b value in H2, call it (μ′,b)∈H2, and then (μ×μ′,b)∈H1∗H2 because it projects down into both sets, showing that H1∗H2 surjects onto H1 when projected down and so must have equal expectation value. Same for H2.
Proposition 35: Free product is commutative and associative.
This is because supremum is commutative (order doesn't matter) and associative (parentheses don't matter), because on the set level it's just intersection, which is order-and-grouping agnostic.
Proposition 36: The free product of countably many infradistributions exists and is an infradistribution iff:
1: There are points (λiμi,0)∈Hmini where, ∀i,i′∈I:λi=λi′
2: There are points (λiμi,bi)∈Hmini where ∀i,i′∈I:λi=λi′∧bi=bi′∧λi+bi=1
We recap the proof of Proposition 32. As there, a point with b=0 (or λ+b=1) in a hypothetical infinite free product must project down into all the Hi, forcing conditions 1 and 2 to hold.
For the set conditions on an infradistribution, for nonemptiness and normalization, observe that if condition 1 is fulfilled, you can make the a-measure (λ∏iμi,0) which is present in the intersection of all the a-measure pullbacks. This witnesses nonemptiness and one half of normalization.
For the second half of normalization, any point with λ+b<1 would project down to make a point outside of all the Hi so it can't exist. Further, if condition 2 is fulfilled, you can make the a-measure (λ∏iμi,b), with λ+b=1, and it is present in the intersection of all the a-measure pullbacks, witnessing that the other half of normalization works out.
For closure and convexity and upper-completeness, the intersection of closed convex upper-complete sets (the pullback of closed convex upper-complete sets along a continuous linear mapping that preserves the b term) is closed-convex upper-complete, so that works out.
That leaves the compact-projection property, which is the only one where we may have issues. For Lipschitzness, we remember that any point in the intersection of preimages whose measure component has a λ value above mini(λ⊙i) has some Hi that it can't project down into, so we get a contradiction, and we've got a bound on the amount of measure present.
As for an ϵ-almost-support for all the measure components, you biject your Hi with the natural numbers, and find a compact set CXn⊆Xn which is an ϵ/2^(n+1)-almost-support for Hn, and take the product of them. It's compact by Tychonoff, and because each speck of measure of the measure component of a point in the infinite free product that doesn't land in this set projects down to be outside of one of the CXn sets, the total amount of measure outside this set is upper-bounded by:
∑n=0∞ ϵ/2^(n+1) = ϵ
So it's a compact ϵ-almost-support for the measure components of all a-measures in the infinite free product, and this can work for any ϵ, so the infinite free product fulfills the compact almost-support property and has compact projection, which is the last condition we needed.
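The budget allotted to each factor really does total ϵ; a minimal arithmetic sketch (ϵ=0.25 is an arbitrary choice, and the series is truncated at 60 terms):

```python
# Allotting eps/2^(n+1) to factor n sums to eps (geometric series).
eps = 0.25
budgets = [eps / 2 ** (n + 1) for n in range(60)]  # truncated; tail is ~2^-60
assert abs(sum(budgets) - eps) < 1e-12
```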
Proposition 37: pr(Xj)∗(∗i(hi))≥hj, with equality when all the hi are C-additive.
pr(Xj)∗(∗i(hi))=pr(Xj)∗(supi(pr∗Xi(hi)))≥pr(Xj)∗(pr∗Xj(hj))=hj
For equality in the C-additive case, we need to show that every point in Hj has some point in ∗iHi which surjects onto it. Remember, C-additivity is "all minimal points have λ=1", which, by our notion of upper-completion, is equivalent to all measure components in the set just being ordinary probability distributions.
The existence of minimal points with b=0 means that, given any point (μj,b)∈Hj, you can find a point with the same b value in Hi (regardless of i), call it (μi,b)∈Hi, and then (∏iμi,b)∈∗iHi because it projects down into all sets, showing that ∗iHi surjects onto Hj when projected down and so must have equal expectation value.
Proposition 38: Ultradistributions H are isomorphic to convex, monotone, normalized, Lipschitz, CAS functions h:CB(X)→R.
We'll be going over the proof of Theorem 1 again, but with some stuff swapped around.
Our first order of business is establishing the isomorphism. The first direction is showing that going from H to h and back yields H exactly. By lower completion, and reproved analogues of Proposition 2 and Theorem 1 from "Basic inframeasure theory" (which an interested party can reprove if they want to see it), we can characterize H as
{(m,b)|∀f∈CB(X):m(f)+b≤sup(m′,b′)∈H(m′(f)+b′)}
And then, our H can further be reexpressed as
{(m,b)|∀f∈CB(X):m(f)+b≤EH(f)}
{(m,b)|∀f∈CB(X):b≤EH(f)−m(f)}
{(m,b)|b≤inff∈CB(X)(EH(f)−m(f))}
and, by the definition of the convex conjugate, and the space of finite signed measures being the dual space of CB(X), and m(f) being a functional applied to an element, this is...
{(m,b)|b≤−(h)∗(m)}
So, our original set H is identical to the convex-conjugate set, when we go from H to h back to a set of a-measures.
Proof Phase 2: In the reverse direction for isomorphism, assume that h fulfills the conditions (we'll really only need continuity and convexity). We want to show that
E{(m,b)|b≤−(h)∗(m)}(f)=h(f)
Let's begin.
E{(m,b)|b≤−(h)∗(m)}(f)=sup(m,b):b≤−(h)∗(m)(m(f)+b)
Given an m, we have a natural candidate for maximizing the b, just set it equal to −(h)∗(m).
So then we get
=supm(m(f)−(h)∗(m))
And this is just... (h)∗∗(f), and, because h is continuous over CB(X), and convex, h=(h)∗∗. From that, we get
E{(m,b)|b≤−(h)∗(m)}(f)=(h)∗∗(f)=h(f)
and we're done with isomorphism.
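The h=(h)∗∗ step above is the Fenchel–Moreau theorem. As a sanity-check sketch in the simplest possible setting (an assumption for illustration: X is a single point, so CB(X)≅R, "functions" are numbers t, "measures" are slopes m, and h(t)=|t| is an arbitrary convex choice), the double conjugate recovers h on a grid:

```python
import numpy as np

# X is a single point: functions f are reals t, measures m are slopes.
ts = np.linspace(-3.0, 3.0, 601)   # grid of "functions"
h = np.abs(ts)                     # an arbitrary convex, continuous h
ms = np.linspace(-1.0, 1.0, 401)   # grid of "measures"; dom(h*) = [-1, 1] here
# Convex conjugate h*(m) = sup_t (m*t - h(t)); double conjugate h**(t) = sup_m (m*t - h*(m)).
hstar = np.array([np.max(m * ts - h) for m in ms])
hstarstar = np.array([np.max(ms * t - hstar) for t in ts])
assert np.max(np.abs(hstarstar - h)) < 1e-9  # Fenchel-Moreau: h** = h
```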
So, in our first direction, we're going to derive the conditions on the functional from the conditions on the set, so we can assume nonemptiness, closure, convexity, lower completion, projected-compactness, and normalization, and derive monotonicity, convexity, normalization, Lipschitzness, and compact almost-support (CAS) from that.
For monotonicity, remember that all points in the infradistribution set are a-measures, so if f′≥f, then
h(f′)=sup(m,b)∈Hm(f′)+b≥sup(m,b)∈Hm(f)+b=h(f)
We could do that because all the measure components are actual measures.
For convexity
h(pf+(1−p)f′)=sup(m,b)∈Hm(pf+(1−p)f′)+b
=sup(m,b)∈H(pm(f)+(1−p)m(f′)+pb+(1−p)b)
≤sup(m,b)∈H(pm(f)+pb)+sup(m,b)∈H((1−p)m(f′)+(1−p)b)
=psup(m,b)∈H(m(f)+b)+(1−p)sup(m,b)∈H(m(f′)+b)=ph(f)+(1−p)h(f′)
And we're done with that. For normalization,
h(0)=sup(m,b)∈Hm(0)+b=sup(m,b)∈Hb=0
And
h(1)=sup(m,b)∈Hm(1)+b=sup(λμ,b)∈Hλμ(1)+b=sup(λμ,b)∈Hλ+b=1
So we have normalization.
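The monotonicity, convexity, and normalization derivations above can be checked numerically on a finite space. A minimal sketch, assuming a hypothetical ultradistribution over a 5-point space given by finitely many a-measures (one probability distribution at b=0 so that sup b=0 and sup(λ+b)=1, plus two sub-probability components with negative b, all constants arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # X is a 5-point space; measures are nonnegative vectors
# Hypothetical finite set of a-measures (m, b):
ms = np.vstack([rng.dirichlet(np.ones(n)),            # lambda = 1, b = 0
                0.6 * rng.dirichlet(np.ones(n), 2)])  # lambda = 0.6, b < 0
bs = np.array([0.0, -0.2, -0.1])

def h(f):  # ultradistribution expectation: sup over the set of m(f) + b
    return max(m @ f + b for m, b in zip(ms, bs))

f, fp = rng.uniform(-1.0, 1.0, n), rng.uniform(-1.0, 1.0, n)
p = 0.3
assert abs(h(np.zeros(n))) < 1e-12                      # h(0) = 0
assert abs(h(np.ones(n)) - 1.0) < 1e-12                 # h(1) = 1
assert h(np.maximum(f, fp)) >= h(f) - 1e-12             # monotonicity
assert h(p * f + (1 - p) * fp) <= p * h(f) + (1 - p) * h(fp) + 1e-12  # convexity
```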
For Lipschitzness, we first observe that compact-projection (the points, when projected to their measure components, make a set with compact closure) enforces that there's an upper bound on the λ value of a maximal point (λμ,b)∈Hmax, because otherwise you could pick a sequence with unbounded λ, and it'd have no convergent subsequence of measures, which contradicts precompactness of the points projected to their measure components.
Then, we observe that points in H correspond perfectly to hyperplanes that lie below the graph of h, and a maximal point is "you shift your hyperplane up as much as you can until you can't shift it up any more without starting to cut into the function h". Further, for every function f∈CB(X), you can make a hyperplane tangent to the function h at that point by the Hahn-Banach theorem, which must correspond to a maximal point.
Putting it together, the epigraph of h is exactly the region above all its tangent hyperplanes. And we know all the tangent hyperplanes correspond to maximal points, and their Lipschitz constants correspond to the λ values of those points. Which are bounded. So, compact-projection in H implies h is Lipschitz.
Finally, we'll want compact almost-support. A set of measures is precompact iff the amount of measure is upper-bounded, and, for all ϵ, there is a compact set Cϵ⊆X where all the measures m have <ϵ measure outside of Cϵ.
So, given that the set of measures corresponding to H is precompact by the compact-projection property, we want to show that the functional h has compact almost-support. To do this, we'll observe that if h is the sup of a bunch of functions, and all those functions think two different points are only a little ways apart in value, then h must think they're only a little distance apart in value. Keeping that in mind, we have:
|dh(f;f′)|=limδ→0 |h(f+δf′)−h(f)|/δ = limδ→0 |sup(m,b)∈H(m(f+δf′)+b)−sup(m′,b′)∈H(m′(f)+b′)|/δ
And then, we can think of each point as corresponding to a hyperplane, with h as the sup of all of them, so to bound the distance between these two values, we just need to assess the maximum size of the gap between those values over all points/tangent hyperplanes. Thus, we can get:
limδ→0 |sup(m,b)∈H(m(f+δf′)+b)−sup(m′,b′)∈H(m′(f)+b′)|/δ
≤limδ→0 sup(m,b)∈H |(m(f+δf′)+b)−(m(f)+b)|/δ
And then, we can do some canceling and get:
=limδ→0 sup(m,b)∈H |m(δf′)|/δ = limδ→0 sup(m,b)∈H |δm(f′)|/δ
=limδ→0 sup(m,b)∈H |m(f′)| = sup(m,b)∈H |m(f′)|
And then, because f′ was selected to be 0 on Cϵ, which makes up all but ϵ of the measure for all measures present in H, we can upper-bound |m(f′)| by ϵ||f′||, so we have that
f′↓Cϵ=0→∀f:|dh(f;f′)|≤ϵ||f′||
And so, Cϵ is a compact ϵ-almost-support for h, and this argument works for all ϵ, so h is CAS, and that's the last condition we need. Thus, if H is an ultradistribution (set form), the expectation functional h is an ultradistribution (expectation form).
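The key inequality above, that a measure with at most ϵ mass outside Cϵ assigns any f′ vanishing on Cϵ a value of at most ϵ||f′||, is easy to check on a finite space. A minimal sketch (the 8-point space, the 5-point "compact set", and ϵ=0.05 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 8, 0.05
C = np.arange(5)                          # "compact set": first 5 of 8 points
m = rng.uniform(0.0, 1.0, n)              # a nonnegative measure
outside = np.setdiff1d(np.arange(n), C)
m[outside] *= eps / m[outside].sum()      # rescale: exactly eps mass outside C
fprime = rng.uniform(-1.0, 1.0, n)
fprime[C] = 0.0                           # f' vanishes on the almost-support
# |m(f')| is at most (mass outside C) * sup|f'| = eps * ||f'||
assert abs(m @ fprime) <= eps * np.max(np.abs(fprime)) + 1e-12
```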
Now for the other direction, where we assume monotonicity, convexity, normalization, Lipschitzness, and CAS on an ultradistribution (expectation form) and show that the induced set fulfills nonemptiness, convexity, closure, lower completion, projection-compactness, normalization, and being a set of a-measures.
Remember, our specification of the corresponding set was:
{(m,b)|b≤−(h)∗(m)}
Where (h)∗ is the convex conjugate of h.
First, being a nonempty set of a-measures with 0 or negative b value. Because there's an isomorphism linking points of the set and hyperplanes below the graph of h, we just need to establish that no hyperplanes below the graph of h can slope down in the direction of a nonnegative function (as this certifies that the measure component must be an actual measure), and no hyperplanes below the graph of h can assign 0 a value above 0 (as this corresponds to the b term, and can be immediately shown by normalization).
What we do is go "assume there's a ϕ where the linear functional corresponding to ϕ isn't a measure, ie, there's some negative function f where ϕ(f)>ϕ(0)". Well, because of monotonicity for h (one of the assumed properties), we have h(0)≥h(f)≥h(2f)≥h(3f).... And, because all affine functionals are made by taking a linear functional and displacing it, ϕ(0)<ϕ(f)<ϕ(2f)<ϕ(3f)..., increases at a linear rate, so eventually the hyperplane and h cross over, but ϕ was assumed to be below h always, so we have a contradiction.
Therefore, all hyperplanes below h must have their linear functional component corresponding to an actual measure. And we get nonemptiness from the convexity of h: we can pick any function and use the Hahn-Banach theorem to make a tangent hyperplane to h that touches at that point, certifying nonemptiness.
By the way, the convex conjugate, (h)∗(m), can be reexpressed as supf(h(f)−m(f)).
For closure and convexity: h is proper, continuous on CB(X), and convex, so, by the Wikipedia pages on "Closed Convex Function" and "Convex Conjugate" (Properties section), (h)∗ is closed and convex, so −(h)∗ is closed and concave. From the Wikipedia page on "Closed Convex Function", this means that the hypograph of −(h)∗ is closed, and the hypograph of a concave function is convex. This takes care of closure and convexity for our H.
Time for lower-completeness. Assume that (m,b) lies in the hypograph, ie b≤−(h)∗(m). Our task now is to show that (m,b)−(0,b′), for b′≥0, also lies in the hypograph. This is equivalent to showing that b−b′≤−(h)∗(m). Let's begin.
−(h)∗(m)≥b≥b−b′
And we're done.
Normalization of the resulting set is easy. Going from h to a (maybe)-inframeasure H back to h is identity as established earlier, so all we have to do is show that a failure of normalization in a (maybe)-inframeasure makes the resulting h not normalized. Thus, if our h is normalized, and it makes an H that isn't normalized, then going back makes a non-normalized h, which contradicts isomorphism. So, assume there's a failure of normalization in H. Then EH(0)≠0, or EH(1)≠1, so either h(0)≠0 or h(1)≠1 and we get a failure of normalization for h which is impossible. So H must be normalized.
That just leaves compact-projection. We know that a set of measures is precompact iff there's a bound on their λ values, and for all ϵ, there's a compact set Cϵ⊆X where all the measure components have <ϵ measure outside of that set.
First, we can observe that no hyperplane below h can have a Lipschitz constant above the maximal Lipschitz constant for the function h, because if it increased more steeply in some direction, you could go in that direction, and eventually the hyperplane would grow past the rate h grows at and cross it, and the hyperplane must be below h. Thus, Lipschitzness of h enforces that there can be no point in the set H with too much measure, which gives us one half of compact-projection for H.
For the other half, CAS for h ensures that for all ϵ, there is a compact set Cϵ where
f′↓Cϵ=0→|dh(f;f′)|≤ϵ||f′||
What we'll do is establish that no hyperplane lying below h can have a slope of more than ϵ||f′|| in the direction of a function f′ that's in [0,1] and is 0 on Cϵ. So, suppose f′ is such a function and ψ is a hyperplane below h that slopes up too hard in the f′ direction, ie
dψ(0;f′)>ϵ||f′||
Since ψ is affine, this slope is the same everywhere.
Now, as we travel from 0 to f′ to 2f′ to 3f′, our vector of travel is always in the direction of f′, so by the CAS bound above, each additional f′ added ramps up the value of h by at most ϵ||f′||. However, each additional f′ added raises the value of ψ by more than that quantity, so eventually ψ will cross over to be higher than h, so ψ can't correspond to a point in H, and we have a contradiction.
Therefore, regardless of the point in H, its measure component must assign any function that's 0 on Cϵ and bounded in [0,1] a value of at most ϵ. We can then realize that this can only happen for a measure that assigns ϵ or less measure to the outside of Cϵ (otherwise you could pass from your continuous function to a discontinuous indicator function on the complement).
Thus, given our ϵ, we've found a compact set Cϵ where the measure component of all points in H assigns ϵ or less value to the outside of that set, and this can be done for all ϵ, certifying the last missing piece for compact-projection of H (because the projection is precompact iff the set of measures is bounded above in amount of measure present, and for all ϵ, there's a compact set Cϵ where all the measures assign ≤ϵ measure outside of that set).
And that's the last condition we need to conclude that the set form of an ultradistribution (functional form) is an ultradistribution (set form), and we're done.
Theorem 2: Infra-Disintegration Theorem: Let h be some arbitrary infradistribution over X, and K be some arbitrary infrakernel Xik→Y. Then (h⋉K)|gL=h′⋉K′, where:
gx:=λy.g(x,y)
Lx:=λy.L(x,y)
α(x):=K(x)(1★Lxgx)−K(x)(0★Lxgx)
β(x):=K(x)(0★Lxgx)
K′(x):=K(x)|gxLx
h′(f):=(h(λx.α(x)f(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
Our first order of business is showing that putting this together makes the original thing. Let f be a continuous function X×Y→R in this case.
(h′⋉K′)(f)=h′(λx.K′(x)(λy.f(x,y)))
=(h(λx.α(x)K′(x)(λy.f(x,y))+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
And now, we just need to show that
h(λx.α(x)K′(x)(λy.f(x,y))+β(x))=(h⋉K)(f★Lg)
in order to wrap up our disintegration argument. There's a subtlety here, which is that when unpacking the definition of K′ we can't just cancel the α(x) on the inside against the α(x) on the outside, because α(x) may be 0 and we'd have division-by-zero errors. So, actually, let's take a detour and show that for all x,
α(x)K′(x)(λy.f(x,y))+β(x)
=K(x)(λy.L(x,y)f(x,y)+(1−L(x,y))g(x,y))
We've got two possible cases. In case one, α(x)≠0. Then, we can go:
α(x)K′(x)(λy.f(x,y))+β(x)
=α(x)⋅((K(x)(λy.L(x,y)f(x,y)+(1−L(x,y))g(x,y))−β(x))/α(x))+β(x)
=K(x)(λy.L(x,y)f(x,y)+(1−L(x,y))g(x,y))
And we're done. In our second case, α(x)=0, so
α(x)K′(x)(λy.f(x,y))+β(x)=β(x)
However,
α(x)=PgxK(x)(Lx)=K(x)(1★Lxgx)−K(x)(0★Lxgx)
Thus,
K(x)(λy.L(x,y)+(1−L(x,y))g(x,y))=K(x)(λy.(1−L(x,y))g(x,y))=β(x)
and, for f bounded in [0,1], f★Lxgx lies between 0★Lxgx and 1★Lxgx pointwise, so monotonicity sandwiches K(x)(f★Lxgx) between two values that are both β(x), and we have our desired result that
α(x)K′(x)(λy.f(x,y))+β(x)=β(x)=K(x)(λy.L(x,y)f(x,y)+(1−L(x,y))g(x,y)).
Anyways, moving on, now that we have that result, what can we do? Well, from earlier, we had:
(h′⋉K′)(f)=(h(λx.α(x)K′(x)(λy.f(x,y))+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
So, we can swap this out to get:
=(h(λx.K(x)(λy.L(x,y)f(x,y)+(1−L(x,y))g(x,y)))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=((h⋉K)(Lf+(1−L)g)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=((h⋉K)(f★Lg)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=((h⋉K)|gL)(f)
And so we have that:
h′⋉K′=(h⋉K)|gL
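The disintegration identity just derived can be checked exactly in the simplest special case, a sketch under the assumption that everything is crisp and linear: h is a single prior over a finite X, each K(x) is a single distribution over a finite Y, so the update is ordinary Bayes-with-an-off-branch-value. All sizes and random choices below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
nX, nY = 3, 4
p = rng.dirichlet(np.ones(nX))            # h: one prior over X (crisp case)
Kmat = rng.dirichlet(np.ones(nY), nX)     # K(x): one distribution over Y per x
L = rng.uniform(0.1, 0.9, (nX, nY))       # likelihood L(x,y) in [0,1]
g = rng.uniform(0.0, 1.0, (nX, nY))       # off-branch value g(x,y)

star = lambda F: L * F + (1 - L) * g                 # f *_L g = Lf + (1-L)g
Kx = lambda F: np.einsum('xy,xy->x', Kmat, F)        # x -> K(x)(F(x, .))
sdp = lambda F: p @ Kx(F)                            # (h ⋉ K)(F)

beta = Kx(star(np.zeros((nX, nY))))                  # K(x)(0 * g)
alpha = Kx(star(np.ones((nX, nY)))) - beta           # K(x)(1 * g) - K(x)(0 * g)
Z0, Z1 = sdp(star(np.zeros((nX, nY)))), sdp(star(np.ones((nX, nY))))

def h_prime(u):  # h'(f), where u(x) is the inner function's value at x
    return (p @ (alpha * u + beta) - Z0) / (Z1 - Z0)

f = rng.uniform(0.0, 1.0, (nX, nY))
K_prime_f = (Kx(star(f)) - beta) / alpha             # K'(x)(f(x, .)), the update
lhs = h_prime(K_prime_f)                             # (h' ⋉ K')(f)
rhs = (sdp(star(f)) - Z0) / (Z1 - Z0)                # ((h ⋉ K)|_L^g)(f)
assert abs(lhs - rhs) < 1e-12
assert abs(h_prime(np.ones(nX)) - 1.0) < 1e-12       # h'(1) = 1
assert abs(h_prime(np.zeros(nX))) < 1e-12            # h'(0) = 0
```

The last two assertions also preview the normalization computation done below.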
Now, there is a bit of work that we need to put in, in order to verify that h′ and K′ are valid infradistributions and infrakernels, respectively.
Our first order of business is verifying that β and α are continuous bounded functions X→R. We recall our note that for any continuous bounded function f:X×Y→R, and any infrakernel K, the function λx.K(x)(λy.f(x,y)) must be continuous and bounded, otherwise the semidirect product would be ill-defined. Now, β was defined as:
λx.K(x)(0★Lxgx)
=λx.K(x)(λy.(1−L(x,y))g(x,y))
and (1−L)g is obviously continuous and bounded, so β must be continuous and bounded. For α, it's similar, it's defined as:
λx.K(x)(1★Lxgx)−K(x)(0★Lxgx)
=λx.K(x)(λy.L(x,y)+(1−L(x,y))g(x,y))−K(x)(λy.(1−L(x,y))g(x,y))
and similarly, we can verify that L+(1−L)g and (1−L)g are both continuous and bounded, so this is a continuous bounded function as well. Now that that's out of the way, the conditions we need to check for h′ being a valid infradistribution are: Normalization, monotonicity, concavity, Lipschitzness, and compact almost-support.
For normalization,
h′(1)=(h(λx.α(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
And,
α(x)+β(x)
=K(x)(λy.L(x,y)+(1−L(x,y))g(x,y))−K(x)(λy.(1−L(x,y))g(x,y))
+K(x)(λy.(1−L(x,y))g(x,y))
=K(x)(λy.L(x,y)+(1−L(x,y))g(x,y))
So, with that, we get:
=(h(λx.K(x)(λy.L(x,y)+(1−L(x,y))g(x,y)))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=((h⋉K)(L+(1−L)g)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=((h⋉K)(1★Lg)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))=1
And that half of normalization is done. For the other half,
h′(0)=(h(λx.β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
And, β(x)=K(x)(λy.(1−L(x,y))g(x,y))
So, with that, we get:
=(h(λx.K(x)(λy.(1−L(x,y))g(x,y)))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=((h⋉K)((1−L)g)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=((h⋉K)(0★Lg)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))=0
Monotonicity is an easy one to show. If f′≥f, then αf′+β≥αf+β because α(x)≥0 always, and then monotonicity for h transfers that to the outside, and so you get monotonicity for h′, because
h′(f)=(h(λx.α(x)f(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
For concavity,
h′(pf+(1−p)f′)=(h(α(pf+(1−p)f′)+β)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=(h(pαf+(1−p)αf′+pβ+(1−p)β)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=(h(p(αf+β)+(1−p)(αf′+β))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
And then by concavity of h, we get:
≥(ph(αf+β)+(1−p)h(αf′+β)−p(h⋉K)(0★Lg)−(1−p)(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=(p(h(αf+β)−(h⋉K)(0★Lg))+(1−p)(h(αf′+β)−(h⋉K)(0★Lg)))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=p⋅(h(αf+β)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))+(1−p)⋅(h(αf′+β)−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))
=ph′(f)+(1−p)h′(f′)
This just leaves Lipschitzness and compact almost-support for h′. We can abbreviate
(h⋉K)(1★Lg)−(h⋉K)(0★Lg)
as Pgh⋉K(L) and go like this:
|h′(f)−h′(f′)|
=|(h(λx.α(x)f(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))−(h(λx.α(x)f′(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))|
=(1/((h⋉K)(1★Lg)−(h⋉K)(0★Lg)))⋅|h(λx.α(x)f(x)+β(x))−h(λx.α(x)f′(x)+β(x))|
=(1/Pgh⋉K(L))⋅|h(λx.α(x)f(x)+β(x))−h(λx.α(x)f′(x)+β(x))|
≤(1/Pgh⋉K(L))⋅λ⊙h⋅d(λx.α(x)f(x)+β(x),λx.α(x)f′(x)+β(x))
=(1/Pgh⋉K(L))⋅λ⊙h⋅d(λx.α(x)f(x),λx.α(x)f′(x))
≤(1/Pgh⋉K(L))⋅λ⊙h⋅||α||⋅d(f,f′)
And (1/Pgh⋉K(L))⋅λ⊙h⋅||α|| is a constant, so we've shown that h′ is Lipschitz.
Now for compact almost-support. For an ϵ, pick a compact (ϵ⋅Pgh⋉K(L)/||α||)-almost-support for h. If f and f′ agree on it, then:
|h′(f)−h′(f′)|
=|(h(λx.α(x)f(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))−(h(λx.α(x)f′(x)+β(x))−(h⋉K)(0★Lg))/((h⋉K)(1★Lg)−(h⋉K)(0★Lg))|
=(1/((h⋉K)(1★Lg)−(h⋉K)(0★Lg)))⋅|h(λx.α(x)f(x)+β(x))−h(λx.α(x)f′(x)+β(x))|
=(1/Pgh⋉K(L))⋅|h(λx.α(x)f(x)+β(x))−h(λx.α(x)f′(x)+β(x))|
And then, since these two functions agree on the compact set, we have:
≤(1/Pgh⋉K(L))⋅(ϵ⋅Pgh⋉K(L)/||α||)⋅d(λx.α(x)f(x)+β(x),λx.α(x)f′(x)+β(x))
=(ϵ/||α||)⋅d(λx.α(x)f(x),λx.α(x)f′(x))≤ϵ⋅d(f,f′)
And so we can make compact almost-supports for any ϵ we want, so h′ fulfills the CAS property. And that's the last piece we need to know that h′ is an infradistribution.
Further, support(α) is a support for h′, because any two functions disagreeing only off the support of α disagree only where α is 0, and from the definition of h′ it's easy to see that they both induce identical results. So, we can view h′ as having type signature □(support(α)).
For K′, we'll show that it's (almost) an infrakernel of type support(α)ik→Y, only missing the Lipschitz constant constraint.
However, what's going on here is something like this: if you want to specify a function that's guaranteed to have a finite value when you integrate it w.r.t. any probability distribution, you need a uniform bound on the values of that function. But if you've got a specific probability distribution you're integrating that function against, the function can run off to infinity in areas of low measure, so the net integral is finite. So we should expect the "common Lipschitz constant" property to be the first property to break in an infrakernel.
We do still have to check compact-shared CAS and pointwise convergence, as well as all the K′(x) being infradistributions.
For the first one, K′(x):=K(x)|gxLx, and the update of an infradistribution K(x) is an infradistribution when the update is well-defined, so we need to check whether the update is well-defined. The update would fail to be well-defined iff
K(x)(1★Lxgx)−K(x)(0★Lxgx)=0
We can rewrite this quantity as PgxK(x)(Lx), but that's just α(x), and α(x)>0 for x∈support(α), so all the updates are well-defined and make infradistributions.
For numbers 2 and 3, the ones with compact subsets of support(α), we should note something important. If Cα⊆support(α) is compact, then inf{α(x)|x∈Cα}>0. If it were otherwise, you could fix a sequence of points with lower and lower α values, identify a convergent subsequence by compactness, and then by continuity of α, conclude that the limit point x had α(x)=0, so it isn't in the support of α and can't be in Cα, and we'd have a contradiction. Therefore, infx∈Cαα(x)>0. Cα will also be a compact subset of X.
If you pick an ϵ and a Cα, then an (ϵ⋅infx∈Cαα(x))-almost-support for all the K(x) (guaranteed to exist by compact-shared compact almost-support for K) is an ϵ-almost-support for all the K′(x) with x∈Cα. Here's how to show it. Let f and f′ agree on your almost-support of interest.
supx∈Cα|K′(x)(f)−K′(x)(f′)|
=supx∈Cα|(K(x)(f★Lxgx)−K(x)(0★Lxgx))/(K(x)(1★Lxgx)−K(x)(0★Lxgx))−(K(x)(f′★Lxgx)−K(x)(0★Lxgx))/(K(x)(1★Lxgx)−K(x)(0★Lxgx))|
=supx∈Cα|(K(x)(f★Lxgx)−K(x)(0★Lxgx))/α(x)−(K(x)(f′★Lxgx)−K(x)(0★Lxgx))/α(x)|
=supx∈Cα((1/α(x))⋅|K(x)(f★Lxgx)−K(x)(f′★Lxgx)|)
≤(supx∈Cα(1/α(x)))⋅(supx∈Cα|K(x)(f★Lxgx)−K(x)(f′★Lxgx)|)
=(1/infx∈Cαα(x))⋅supx∈Cα|K(x)(f★Lxgx)−K(x)(f′★Lxgx)|
and then, because f and f′ agree on our almost-support, so do the bigger functions, and we have:
≤(1/infx∈Cαα(x))⋅supx∈Cα((ϵ⋅infx′∈Cαα(x′))⋅d(Lxf+(1−Lx)gx,Lxf′+(1−Lx)gx))
=ϵ⋅supx∈Cα d(Lxf,Lxf′)
However, Lx is always bounded in [0,1] so we have:
≤ϵd(f,f′)
and we're done, we've shown a compact ϵ-almost-support for all the K′(x) where the x were selected from a compact set, so we have compact-shared CAS.
Finally, for pointwise convergence, the first thing we'll observe is that the function
λx.K(x)(λy.L(x,y)f(y)+(1−L(x,y))g(x,y))
is continuous, as
(h⋉K)(Lf+(1−L)g)=h(λx.K(x)(λy.L(x,y)f(y)+(1−L(x,y))g(x,y)))
So, if the inner function wasn't continuous, then it couldn't be evaluated by h, and h⋉K would be undefined for a continuous function input, which is impossible because K is an infrakernel, so we can take semidirect product and always make an infradistribution.
At this point, fixing a sequence xn limiting to some x∈support(α), we can go:
limn→∞K′(xn)(f)
=limn→∞(K(xn)|gxnLxn)(f)
=limn→∞(K(xn)(f★Lxngxn)−K(xn)(0★Lxngxn))/(K(xn)(1★Lxngxn)−K(xn)(0★Lxngxn))
=limn→∞(K(xn)(f★Lxngxn)−β(xn))/α(xn)
=limn→∞(K(xn)(λy.L(xn,y)f(y)+(1−L(xn,y))g(xn,y))−β(xn))/α(xn)
And then, because we've shown that β and α are continuous and the limit point is in the support of α so α(x)>0, and that the function
λx.K(x)(λy.L(x,y)f(y)+(1−L(x,y))g(x,y))
is continuous, we have:
=(K(x)(λy.L(x,y)f(y)+(1−L(x,y))g(x,y))−β(x))/α(x)
=(K(x)(f★Lxgx)−β(x))/α(x)
=(K(x)(f★Lxgx)−K(x)(0★Lxgx))/(K(x)(1★Lxgx)−K(x)(0★Lxgx))
=(K(x)|gxLx)(f)
=K′(x)(f)
And pointwise convergence is shown, so K′ is (almost) an infrakernel of type support(α)ik→Y.