Proposition 39: Given a crisp infradistribution ζ□ over N, an infrakernel K from N to infradistributions over X, and suggestively abbreviating K(i) as Hi (hypothesis i) and K∗(ζ□) as Eζ□Hi (your infraprior where you have Knightian uncertainty over how to mix the hypotheses), then

$$((\mathbb{E}_{\zeta^\square}H_i)|^g_L)(f)=\frac{\mathbb{E}_{\zeta^\square}\big(P^g_{H_i}(L)\cdot(H_i|^g_L)(f)+H_i(0\star_Lg)\big)-\mathbb{E}_{\zeta^\square}(H_i(0\star_Lg))}{\mathbb{E}_{\zeta^\square}(H_i(1\star_Lg))-\mathbb{E}_{\zeta^\square}(H_i(0\star_Lg))}$$

Proof: Assume that L and g are functions of type X→[0,1] and X→R respectively, ie, likelihood and utility don't depend on which hypothesis you're in, just on what happens. First, unpack our abbreviations and what an update means.

$$((\mathbb{E}_{\zeta^\square}H_i)|^g_L)(f)=(K_*(\zeta^\square)|^g_L)(f)=\frac{K_*(\zeta^\square)(f\star_Lg)-K_*(\zeta^\square)(0\star_Lg)}{K_*(\zeta^\square)(1\star_Lg)-K_*(\zeta^\square)(0\star_Lg)}$$

Then use the definition of an infrakernel pushforward.

$$=\frac{\zeta^\square(\lambda i.K(i)(f\star_Lg))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}{\zeta^\square(\lambda i.K(i)(1\star_Lg))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}$$

For the next step, we're just making the types a bit more explicit; f, L, g only depend on x, not i.

$$=\frac{\zeta^\square(\lambda i.K(i)(\lambda x.f\star_Lg))-\zeta^\square(\lambda i.K(i)(\lambda x.0\star_Lg))}{\zeta^\square(\lambda i.K(i)(\lambda x.1\star_Lg))-\zeta^\square(\lambda i.K(i)(\lambda x.0\star_Lg))}$$

Then we pack the semidirect product back up.

$$=\frac{(\zeta^\square\ltimes K)(\lambda i,x.f\star_Lg)-(\zeta^\square\ltimes K)(\lambda i,x.0\star_Lg)}{(\zeta^\square\ltimes K)(\lambda i,x.1\star_Lg)-(\zeta^\square\ltimes K)(\lambda i,x.0\star_Lg)}$$

And pack the update back up.

$$=((\zeta^\square\ltimes K)|^g_L)(f)$$

At this point, we invoke the Infra-Disintegration Theorem.

$$=(\zeta^{\square\prime}\ltimes K')(f)=\zeta^{\square\prime}(\lambda i.K'(i)(f))$$

We unpack what our new modified prior is, via the Infra-Disintegration Theorem.

$$=\frac{\zeta^\square(\lambda i.\alpha(i)K'(i)(f)+\beta(i))-(\zeta^\square\ltimes K)(0\star_Lg)}{(\zeta^\square\ltimes K)(1\star_Lg)-(\zeta^\square\ltimes K)(0\star_Lg)}$$

and unpack the semidirect product.

$$=\frac{\zeta^\square(\lambda i.\alpha(i)K'(i)(f)+\beta(i))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}{\zeta^\square(\lambda i.K(i)(1\star_Lg))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}$$

Now we unpack α and β.

$$=\frac{\zeta^\square(\lambda i.P^g_{K(i)}(L)\cdot K'(i)(f)+K(i)(0\star_Lg))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}{\zeta^\square(\lambda i.K(i)(1\star_Lg))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}$$

And unpack what K′ is.

$$=\frac{\zeta^\square(\lambda i.P^g_{K(i)}(L)\cdot(K(i)|^g_L)(f)+K(i)(0\star_Lg))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}{\zeta^\square(\lambda i.K(i)(1\star_Lg))-\zeta^\square(\lambda i.K(i)(0\star_Lg))}$$

And reabbreviate K(i) as Hi,

$$=\frac{\zeta^\square(\lambda i.P^g_{H_i}(L)\cdot(H_i|^g_L)(f)+H_i(0\star_Lg))-\zeta^\square(\lambda i.H_i(0\star_Lg))}{\zeta^\square(\lambda i.H_i(1\star_Lg))-\zeta^\square(\lambda i.H_i(0\star_Lg))}$$

And then pack it back up into a suggestive form as a sort of expectation.

$$=\frac{\mathbb{E}_{\zeta^\square}\big(P^g_{H_i}(L)\cdot(H_i|^g_L)(f)+H_i(0\star_Lg)\big)-\mathbb{E}_{\zeta^\square}(H_i(0\star_Lg))}{\mathbb{E}_{\zeta^\square}(H_i(1\star_Lg))-\mathbb{E}_{\zeta^\square}(H_i(0\star_Lg))}$$

And we're done.
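As a quick numerical sanity check on the formula (not part of the proof), here's a minimal sketch for the special case where each hypothesis Hi is an ordinary probability distribution Pi and ζ□ is a single prior ζ, reading f★Lg as Lf+(1−L)g and PgHi(L) as Hi(1★Lg)−Hi(0★Lg); under those assumptions, both sides should reduce to ordinary Bayes on the prior mixture. All names and numbers here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hyp, n_out = 3, 5

zeta = rng.dirichlet(np.ones(n_hyp))           # prior over hypotheses i
P = rng.dirichlet(np.ones(n_out), size=n_hyp)  # P[i]: hypothesis i, a distribution on X
L = rng.uniform(0.1, 1.0, n_out)               # likelihood L : X -> [0,1]
g = rng.uniform(0.0, 1.0, n_out)               # off-event utility g
f = rng.uniform(0.0, 1.0, n_out)               # test function

def star(u):                                   # u ★_L g = L·u + (1−L)·g (crisp reading)
    return L * u + (1 - L) * g

H = P @ np.column_stack([star(f), star(np.zeros(n_out)), star(np.ones(n_out))])
Hf, H0, H1 = H[:, 0], H[:, 1], H[:, 2]         # H_i(f★Lg), H_i(0★Lg), H_i(1★Lg)
PgL = H1 - H0                                  # P^g_{H_i}(L); here it equals E_{P_i}[L]
post = (Hf - H0) / PgL                         # (H_i|^g_L)(f); here E_{P_i}[Lf]/E_{P_i}[L]

# Right-hand side of Proposition 39, with ζ□ a single prior:
rhs = (zeta @ (PgL * post + H0) - zeta @ H0) / (zeta @ H1 - zeta @ H0)
# Direct Bayesian update of the prior mixture:
bayes = (zeta @ (P * L) @ f) / (zeta @ (P @ L))
assert np.allclose(rhs, bayes)
print(rhs, bayes)
```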
Proposition 40: If a likelihood function L:X→[0,1] is 0 when f(x)<a, and f≥0 and a>0, then h(L⋅a)≤h(f).

$$h(L\cdot a)=\inf_{(\lambda\mu,b)\in H}\lambda\mu(L\cdot a)+b=\inf_{(\lambda\mu,b)\in H}a\lambda\mu(L)+b$$

And then we apply Markov's inequality, that for any probability distribution,

$$\mu(1_{f(x)\ge a})\le\frac{\mu(f)}{a}$$

Also, $1_{f(x)\ge a}\ge L$ (because L is 0 when f(x)<a), so monotonicity means that

$$\mu(L)\le\frac{\mu(f)}{a}$$

So, we can get:

$$h(L\cdot a)=\inf_{(\lambda\mu,b)\in H}a\lambda\mu(L)+b\le\inf_{(\lambda\mu,b)\in H}a\cdot\frac{\lambda\mu(f)}{a}+b=\inf_{(\lambda\mu,b)\in H}\lambda\mu(f)+b=h(f)$$
And we're done.
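As a quick numerical sanity check of the two inequalities chained above (Markov's inequality plus monotonicity), here's a minimal sketch with a finite distribution; the arrays and the value of a are illustrative assumptions, not objects from the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
mu = rng.dirichlet(np.ones(n))      # a probability distribution on n points
fv = rng.exponential(1.0, n)        # f >= 0
a = 1.5

indicator = (fv >= a).astype(float)
# L is 0 wherever f(x) < a, so any such likelihood is dominated by the indicator:
Lv = indicator * rng.uniform(0, 1, n)

assert mu @ indicator <= mu @ fv / a   # Markov's inequality
assert mu @ Lv <= mu @ indicator       # monotonicity: L <= 1_{f>=a}
assert a * (mu @ Lv) <= mu @ fv        # combined: a·mu(L) <= mu(f)
```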
Proposition 41: The IKR-metric is a metric.
So, symmetry is obvious, as is one direction of identity of indiscernibles (that the distance from an infradistribution to itself is 0). That just leaves the triangle inequality and the other direction of identity of indiscernibles. For the triangle inequality, observe that for any particular f (instead of the supremum), the triangle inequality holds, and passing to the supremum preserves it (spelled out below). So the only tricky part is the reverse direction of identity of indiscernibles: that two infradistributions which have a distance of 0 are identical.
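Spelling out that step: writing m(f)=max(Li(f),||f||,1) for the normalizer in the IKR distance, for any fixed Lipschitz f and any infradistributions h, h′, h″,

$$\frac{|h(f)-h''(f)|}{m(f)}\le\frac{|h(f)-h'(f)|}{m(f)}+\frac{|h'(f)-h''(f)|}{m(f)}\le d_{IKR}(h,h')+d_{IKR}(h',h'')$$

and taking the supremum over f on the left preserves the bound, giving $d_{IKR}(h,h'')\le d_{IKR}(h,h')+d_{IKR}(h',h'')$.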
First, if dIKR(h,h′)=0, then h and h′ must perfectly agree on all the Lipschitz functions. And then, because uniformly continuous functions are the uniform limit of Lipschitz functions, h and h′ must perfectly agree on all the uniformly continuous functions.
Now, we're going to need a somewhat more sophisticated argument. Let's say that the sequence fn is uniformly bounded and limits to f in CB(X) equipped with the compact-open topology (ie, we get uniform convergence of fn to f on all compact sets). Then, for any infradistribution h, h(fn) will limit to h(f). Here's why. For any ϵ, there's some compact set Cϵ which is an ϵ-almost-support of h: the value h assigns to a function is almost entirely determined by the function's behavior on Cϵ. For large n, fn and f disagree by only a tiny amount on the set Cϵ and by a bounded amount outside it, so the Lipschitzness of h forces h(fn) to be incredibly close to h(f) in the limit.
Further, according to this mathoverflow answer, uniformly continuous functions are dense in the space of all continuous functions when CB(X) is equipped with the compact-open topology, so given any function f, we can find a sequence of uniformly continuous functions fn limiting to f in the compact-open topology, and then,
h(f)=limn→∞h(fn)=limn→∞h′(fn)=h′(f)
And so, h and h′ agree on all continuous functions, and are identical, if they have a distance of 0, giving us our last piece needed to conclude that dIKR is a metric.
Proposition 42: The IKR-metric for infradistributions is strongly equivalent to the Hausdorff distance (w.r.t. the KR-metric) between their corresponding infradistribution sets.
Let's show both directions of this. For the first one, if the Hausdorff-distance between H,H′ is dhau(H,H′), then for all a-measures (m,b) in H, there's an a-measure (m′,b′) in H′ that's only dhau(H,H′) or less distance away, according to the KR-metric (on a-measures).
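To make the Hausdorff distance concrete, here's a minimal sketch for finite point sets under an arbitrary metric d, a toy stand-in for the KR metric on a-measures; the function name and the example sets are illustrative assumptions, not from the original text.

```python
import numpy as np

def hausdorff(A, B, d):
    """Hausdorff distance between finite point sets A, B under metric d."""
    ab = max(min(d(a, b) for b in B) for a in A)  # how far A strays from B
    ba = max(min(d(a, b) for a in A) for b in B)  # how far B strays from A
    return max(ab, ba)

d = lambda x, y: float(np.abs(np.asarray(x) - np.asarray(y)).sum())
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.1), (1.0, 0.3)]
print(hausdorff(A, B, d))  # 0.3: every point of one set is within 0.3 of the other
```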
Now, by LF-duality, a-measures in H correspond to hyperplanes above h. Two a-measures being dhau(H,H′) apart means, by the definition of the KR-metric for a-measures, that they will assign values at most dhau(H,H′) distance apart for 1-Lipschitz functions in [−1,1].
So, translating to the concave functional view of things, H and H′ being dhau(H,H′) apart means that every hyperplane above h has another hyperplane above h′ that can only differ on the 1-Lipschitz 1-bounded functions by at most dhau(H,H′), and vice-versa.
Let's say we've got a Lipschitz function f. Fix an affine functional/hyperplane ψh that touches the graph of h at f. Let's try to set an upper bound on what h′(f) can be. If f is 1-Lipschitz and 1-bounded, then we can craft a ψh′ above h′ that's nearby, and
h′(f)≤ψh′(f)≤ψh(f)+dhau(H,H′)=h(f)+dhau(H,H′)
Symmetrically, we can swap h′ and h to get h(f)≤h′(f)+dhau(H,H′), and put them together to get:
|h′(f)−h(f)|≤dhau(H,H′)
For the 1-Lipschitz functions.
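Before handling general f, it's worth restating the form of the IKR distance these manipulations use (as defined earlier in this sequence):

$$d_{IKR}(h,h')=\sup_{f\in C_{lip}(X)}\frac{|h(f)-h'(f)|}{\max(Li(f),||f||,1)}$$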
Let's tackle the case where f is either more than 1-Lipschitz, or strays outside of [−1,1]. In that case, $\frac{f}{\max(Li(f),||f||)}$ is 1-Lipschitz and bounded in [−1,1]. We can craft a ψh′ that only differs from ψh on 1-Lipschitz functions by dhau(H,H′) or less. Then, since, for affine functionals, ψ(ax)=a(ψ(x)−ψ(0))+ψ(0), and using that ψh′ and ψh are close on 1-Lipschitz 1-bounded functions, which $\frac{f}{\max(Li(f),||f||)}$ and 0 are, we can go:

$$h'(f)\le\psi_{h'}(f)=\psi_{h'}\Big(\max(Li(f),||f||)\cdot\frac{f}{\max(Li(f),||f||)}\Big)$$
$$=\max(Li(f),||f||)\Big(\psi_{h'}\Big(\frac{f}{\max(Li(f),||f||)}\Big)-\psi_{h'}(0)\Big)+\psi_{h'}(0)$$

And then we swap out ψh′ for ψh with a known penalty in value; we're taking an overestimate at this point.

$$\le\max(Li(f),||f||)\Big(\Big(\psi_h\Big(\frac{f}{\max(Li(f),||f||)}\Big)+d_{hau}(H,H')\Big)-(\psi_h(0)-d_{hau}(H,H'))\Big)+(\psi_h(0)-d_{hau}(H,H'))$$
$$=d_{hau}(H,H')\cdot(2\max(Li(f),||f||)-1)+\max(Li(f),||f||)\Big(\psi_h\Big(\frac{f}{\max(Li(f),||f||)}\Big)-\psi_h(0)\Big)+\psi_h(0)$$
$$<2d_{hau}(H,H')\cdot\max(Li(f),||f||)+\psi_h\Big(\max(Li(f),||f||)\cdot\frac{f}{\max(Li(f),||f||)}\Big)$$
$$=2d_{hau}(H,H')\cdot\max(Li(f),||f||)+\psi_h(f)=2d_{hau}(H,H')\cdot\max(Li(f),||f||)+h(f)$$
This argument works for all h. And, even though we just got an upper bound, to rule out h′(f) being significantly below h(f), we could run through the same upper bound argument with h′ instead of h, to show that h(f) can't be more than 2dhau(H,H′)⋅(max(Li(f),||f||)) above h′(f).
So, for all Lipschitz f, |h(f)−h′(f)|≤2dhau(H,H′)⋅(max(Li(f),||f||,1)). Thus, for all Lipschitz f,
$$\frac{|h(f)-h'(f)|}{\max(Li(f),||f||,1)}\le 2d_{hau}(H,H')$$
And therefore,
dIKR(h,h′)≤2dhau(H,H′)
This establishes one part of our inequalities. Now for the other direction.
Here's how things are going to work. Let's say we know the IKR-distance between h and h′. Our task will be to stick an upper bound on the Hausdorff-distance between H and H′. Remember that the Hausdorff-distance being low is equivalent to "any hyperplane above h has a corresponding hyperplane above h′ that attains similar values on the 1-or-less-Lipschitz functions".
So, let's say we've got h, and a ψh≥h. Our task is, knowing h′, to craft a hyperplane above h′ that's close to ψ on the 1-Lipschitz functions. Then we can just swap h′ and h, and since every hyperplane above h is close (on the 1-Lipschitz functions) to a hyperplane above h′, and vice-versa, H and H′ can be shown to be close. We'll use Hahn-Banach separation for this one.
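For reference, the version of Hahn-Banach separation we'll rely on (stated here for readability): if A and B are disjoint convex subsets of a topological vector space (here CB(X)×R) and A is open, then there's a continuous linear functional φ and a constant c with

$$\varphi(a)<c\le\varphi(b)\quad\forall a\in A,\ b\in B$$

and the hyperplane {v | φ(v)=c} yields the affine functional we're after.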
Accordingly, let the set A be the set of (f,b) where (f,b)=p(f′,b′)+(1−p)(f∗,b∗), and:

$$p\in[0,1),\quad f'\in C_{1\text{-}lip}(X,[-1,1]),\quad f^*\in CB(X),\quad b'<\psi_h(f')-d_{IKR}(h,h'),\quad b^*<h'(f^*)$$
That's... quite a mess. It can be thought of as the convex hull of the hypograph of h′, and the hypograph of ψh restricted to the 1-Lipschitz functions in [−1,1] and shifted down a bit. If there was a ψh′ that cuts into h′ and scores lower than it, ie ψh′(f∗)<h′(f∗), we could have p=0, and b∗=ψh′(f∗)<h′(f∗) to observe that ψh′ cuts into the set A. Conversely, if an affine functional doesn't cut into the set A, then it lies on-or-above the graph of h′.
Similarly, if ψh′ undershoots ψh−dIKR(h,h′) over the 1-or-less-Lipschitz functions in [−1,1], it'd also cut into A. Conversely, if the hyperplane ψh′ doesn't cut into A, then it sticks close to ψh over the 1-or-less-Lipschitz functions.
This is pretty much what A is doing. If we don't cut into it, we're above h′ and not too low on the functions with a Lipschitz norm of 1 or less.
For Hahn-Banach separation, we must verify that A is convex and open. Convexity is pretty easy. Mixing two points of A with weight q:

$$q(p_1(f'_1,b'_1)+(1-p_1)(f^*_1,b^*_1))+(1-q)(p_2(f'_2,b'_2)+(1-p_2)(f^*_2,b^*_2))$$
$$=(qp_1+(1-q)p_2)\Big(\frac{qp_1}{qp_1+(1-q)p_2}(f'_1,b'_1)+\frac{(1-q)p_2}{qp_1+(1-q)p_2}(f'_2,b'_2)\Big)$$
$$+(q(1-p_1)+(1-q)(1-p_2))\Big(\frac{q(1-p_1)}{q(1-p_1)+(1-q)(1-p_2)}(f^*_1,b^*_1)+\frac{(1-q)(1-p_2)}{q(1-p_1)+(1-q)(1-p_2)}(f^*_2,b^*_2)\Big)$$
First verification: Those numbers at the front add up to 1 (easy to verify), are both in [0,1] (this is trivial to verify), and qp1+(1−q)p2 isn't 1 (this is a mix of two numbers that are both below 1, so this is easy). Ok, that condition is down. Next up: Is our mix of f′1 and f′2 1-Lipschitz and in [−1,1]? Yes, the mix of 1-Lipschitz functions in that range is 1-Lipschitz and in that range too. Also, is our mix of f∗1 and f∗2 still in CB(X)? Yup.
That leaves the conditions on the b terms. For the first one, just observe that mixing two points that lie strictly below ψh′−dIKR(h,h′) (a hyperplane) lies strictly below it as well. For the second one, since h′ is concave, mixing two points that lie strictly below its graph also lies strictly below its graph. Admittedly, there may be divide-by-zero errors, but only when qp1+(1−q)p2 is 0, in which case, we can have our new f′ and b′ be anything we want as long as it fulfills the conditions, it still defines the same point (because that term gets multiplied by 0 anyways). So A is convex.
But... is A open? Well, observe that the region under the graph of h on CB(X) is open, due to Lipschitzness of h. We can wiggle b and f around a tiny tiny little bit in any direction without matching or exceeding the graph of h. So, given a point in A, fix your tiny little open ball around (f∗,b∗). Since p can't be 1, when you mix with (f′,b′), you can do the same mix with your little open ball instead of the center point, and it just gets scaled down (but doesn't collapse to a point), making a little tiny open ball around your arbitrarily chosen point in A. So A is open.
Now, let's define a B that should be convex, so we can get Hahn-Banach separation going (as long as we can show that A and B are disjoint). It should be chosen to forbid our separating hyperplane being too much above ψh over the 1-or-less Lipschitz functions. So, let B be:
$$\{(f,b)\,|\,f\in C_{1\text{-}lip}(X,[-1,1]),\ b\ge\psi_h(f)+d_{IKR}(h,h')\}$$
Obviously, cutting into this means your hyperplane is too far above ψh over the 1-or-less-Lipschitz functions in [−1,1]. And it's obviously convex, because 1-or-less-Lipschitz functions in [−1,1] are a convex set, and so is the region above a hyperplane (ψh+dIKR(h,h′)).
All we need to do now for Hahn-Banach separation is show that the two sets are disjoint. We'll assume there's a point in both of them and derive a contradiction. So, let's say that (f,b) is in both A and B. Since it's in B,
b≥ψh(f)+dIKR(h,h′)
But also, (f,b)=p(f′,b′)+(1−p)(f∗,b∗) with the f's and b's and p fulfilling the appropriate properties, because it's in A. Since b∗<h′(f∗) and b′<ψh(f′)−dIKR(h,h′), we'll write b∗ as h′(f∗)−δ∗ and b′ as ψh(f′)−dIKR(h,h′)−δ′, where δ∗ and δ′ are strictly positive. Thus, we rewrite b≥ψh(f)+dIKR(h,h′) as:

$$p(\psi_h(f')-d_{IKR}(h,h')-\delta')+(1-p)(h'(f^*)-\delta^*)\ge\psi_h(pf'+(1-p)f^*)+d_{IKR}(h,h')$$
We'll be folding −pδ′−(1−p)δ∗ into a single −δ term (note that δ>0) so I don't have to write as much stuff. Also, ψh is an affine function, so we can split things up with that, and make:

$$p\psi_h(f')+(1-p)h'(f^*)-pd_{IKR}(h,h')-\delta\ge p\psi_h(f')+(1-p)\psi_h(f^*)+d_{IKR}(h,h')$$

Canceling the $p\psi_h(f')$ terms and regrouping,

$$(1-p)(h'(f^*)-\psi_h(f^*))-\delta\ge(1+p)d_{IKR}(h,h')$$

and, since ψh lies above h (so ψh(f∗)≥h(f∗)),

$$(1-p)(h'(f^*)-h(f^*))-\delta\ge(1+p)d_{IKR}(h,h')$$
And, if h(f∗)≥h′(f∗), we get a contradiction straightaway because the left side is negative, and the right side is nonnegative. Therefore, h′(f∗)>h(f∗), and we can rewrite as:
$$(1-p)|h'(f^*)-h(f^*)|-\delta\ge(1+p)d_{IKR}(h,h')$$
And now, we should notice something really really important. Since p can't be 1, f∗ does constitute a nonzero part of f, because f=pf′+(1−p)f∗.
However, f is a 1-or-less Lipschitz function, and bounded in [−1,1], due to being in B! If f∗ wasn't Lipschitz, then given any slope, you could find areas where it's ascending faster than that rate. This still happens when it's scaled down, and f′ can only ascend or descend at a rate of 1 or slower there since it's 1-Lipschitz as well. So, in order for f to be 1-or-less Lipschitz, f∗ must be Lipschitz as well. Actually, we get something stronger, if f∗ has a really high Lipschitz constant, then p needs to be pretty high. Otherwise, again, f wouldn't be 1-or-less Lipschitz, since 1−p of it is composed of f∗, which has areas of big slope. Further, if f∗ has a norm sufficiently far away from 0, then p needs to be pretty high, because otherwise f wouldn't be in [−1,1], since 1−p of it is composed of f∗ which has areas distant from 0.
Our most recent inequality (derived under the assumption that there's a point in A and B) was:
$$(1-p)|h'(f^*)-h(f^*)|-\delta\ge(1+p)d_{IKR}(h,h')$$

Assuming hypothetically we were able to show that

$$(1-p)|h'(f^*)-h(f^*)|\le(1+p)d_{IKR}(h,h')$$

then, because δ>0, we'd get a contradiction, showing that A and B are disjoint. So let's shift our proof target to trying to show

$$(1-p)|h'(f^*)-h(f^*)|\le(1+p)d_{IKR}(h,h')$$
Let's begin. So, our first order of business is that
$$\frac{1+p}{1-p}\ge 1$$

This should be trivial to verify; remember that p∈[0,1).
Now, f=pf′+(1−p)f∗, and f is 1-Lipschitz, and so is f′. Our goal now is to impose an upper bound on the Lipschitz constant of f∗. Let us assume that said Lipschitz constant of f∗ is above 1. We can find a pair of points where the rise of f∗ from the first point to the next, divided by the distance between the points is exceptionally close to the Lipschitz constant of f∗, or equal. If we're trying to have f∗ slope up as hard as it possibly can while mixing to make f, which is 1-Lipschitz, then the best case for that is one where f′ is sloping down as hard as it can, at a rate of -1. Therefore, we have that
$$(1-p)Li(f^*)+p\cdot(-1)\le 1$$

Ie, mixing f∗ sloping up as hard as possible and f′ sloping down as hard as possible had better make something that slopes up at a rate of 1 or less. Rearranging this inequality, we get:

$$(1-p)Li(f^*)\le 1+p$$
$$Li(f^*)\le\frac{1+p}{1-p}$$
We can run through almost the same exact argument, but with the norm of f∗. Let us assume that said norm is above 1. We can find a point where f∗ attains its maximum/minimum, whichever is further from 0. Now, if you're trying to have f∗ be as negative/positive as it possibly can be, while mixing to make f, which lies in [−1,1], then the best case for that is one where f′ is as positive/negative as it can possibly be there, ie, has a value of -1 or 1. In both cases, we have:
$$(1-p)||f^*||+p\cdot(-1)\le 1$$
$$(1-p)||f^*||\le 1+p$$
$$||f^*||\le\frac{1+p}{1-p}$$
Now we can proceed. Since we established that all three of these quantities (1, the Lipschitz constant, and the norm of f∗) are upper-bounded by $\frac{1+p}{1-p}$, we have:

$$|h'(f^*)-h(f^*)|\le\max(Li(f^*),||f^*||,1)\cdot d_{IKR}(h,h')\le\frac{1+p}{1-p}d_{IKR}(h,h')$$

(the first inequality is just the definition of the IKR distance as a supremum). Multiplying through by (1−p) yields

$$(1-p)|h'(f^*)-h(f^*)|\le(1+p)d_{IKR}(h,h')$$

the inequality necessary to force a contradiction. Therefore, A and B must be disjoint. Since A is open and convex, and B is convex, we can do Hahn-Banach separation to get something that touches B and doesn't cut into A.
Therefore, we've crafted a ψh′ that lies above h′, and is within dIKR(h,h′) of ψh over the 1-or-less-Lipschitz functions in [−1,1], because it doesn't cut into A and touches B.
This same argument works for any ψh≥h, and it works if we swap h′ and h. Thus, since hyperplanes above the graph of an infradistribution function h or h′ correspond to points in the corresponding H and H′, and we can take any point in H/affine functional above h and make a point in H′/affine functional above h′ (and the same with the two swapped) that approximately agrees with it on C1−lip(X,[−1,1]), every point of either set has a point of the other within dIKR(h,h′) in KR-distance, so H and H′ have
dhau(H,H′)≤dIKR(h,h′)
And with that, we get
dhau(H,H′)≤dIKR(h,h′)≤2dhau(H,H′)
And we're done! Hausdorff distance between sets is within a factor of 2 of the IKR-distance between their corresponding infradistributions.
Proposition 43: A Cauchy sequence of infradistributions converges to an infradistribution, ie, the space □X is complete under dIKR.
So, the space of closed subsets of Ma(X) is complete under the Hausdorff metric. By Proposition 42, a Cauchy sequence of infradistributions hn in the IKR-distance corresponds to a Cauchy sequence of infradistribution sets Hn converging in Hausdorff-distance, so to verify completeness, we merely need to double-check that the Hausdorff-limit of the Hn sets fulfills the various properties of an infradistribution. Every point in H∞, the limiting set, has some Cauchy sequence of points from the Hn sets limiting to it, and conversely, every Cauchy sequence of points from the Hn sets has its limit point in H∞.
So, for nonemptiness, you have a sequence of nonempty sets of a-measures limiting to each other in Hausdorff-distance, so the limit is going to be nonempty.
For upper completion, given any point (m,b)∈H∞, and any (0,b′) a-measure, you can fix a Cauchy sequence (mn,bn)∈Hn limiting to (m,b), and then consider the sequence (mn,bn+b′), which is obviously Cauchy (you're just adding the same amount to everything, which doesn't affect the KR-distance), and limits to (m,b+b′), certifying that (m,b)+(0,b′)∈H∞, so H∞ is upper-complete.
For closure, the Hausdorff limit of a sequence of closed sets is closed.
For convexity, given any two points (m,b) and (m′,b′) in H∞, and any p∈[0,1], we can fix a Cauchy sequence (mn,bn)∈Hn and (m′n,b′n)∈Hn converging to those two points, respectively, and then consider the sequence p(mn,bn)+(1−p)(m′n,b′n), which lies in Hn (due to convexity of all the Hn), and converges to p(m,b)+(1−p)(m′,b′), witnessing that this point is in H∞, and we've just shown convexity.
For normalization, it's most convenient to work with the positive functionals, and observe that, because all the hn(0)=0 and all the hn(1)=1 because of normalization, the same property must apply to the limit, and this transfers over to get normalization for your infradistribution set.
Finally, there's the compact-projection property. We will observe that the projection of the a-measures in Hn to just their measure components, call the set pr(Hn), must converge in Hausdorff-distance. The reason for this is because if they didn't, then you could find some ϵ and arbitrarily late pairs of inframeasures where pr(Hn) and pr(Hm) have Hausdorff-distance >ϵ, and then pick a point in pr(Hn) (or pr(Hm)) that's >ϵ KR-distance away from the other projection. Then you can pair that measure with some gigantic b term to get a point in Hn (or Hm, depending on which one you're picking from), and there'd be no point in Hm (or Hn) within ϵ distance of it, because the measure component would only be able to change by ϵ if you moved that far, and you need to change the measure component by >ϵ to land within Hm (or Hn).
Because this situation occurs infinitely often, it contradicts the Cauchy-sequence-ness of the Hn sequence, so the projections pr(Hn) must converge in Hausdorff distance on the space of measures over X. Further, they're precompact by the compact-projection property for the Hn (which are infradistributions), so their closures are compact. Further, the Hausdorff-limit of a series of compact sets is compact, so the Hausdorff limit of the projections pr(Hn) (technically, their closures) is a compact set of measures. Further, any sequence (mn,bn) which converges to some (m,b)∈H∞, has its projection being mn∈pr(Hn), which limits to show that m is in this Hausdorff limit. Thus, all points in H∞ project down to be in a compact set of measures, and we have compact-projection for H∞, which is the last condition we need to check to see if it's an infradistribution.
So, the Hausdorff-limit of a Cauchy sequence of infradistribution sets is an infradistribution set, and by the strong equivalence of the infra-KR metric and Hausdorff-distance, a Cauchy limit of the infra-KR metric must be an infradistribution, and the space □X is complete under the infra-KR metric.
Proposition 44: If a sequence of infradistributions converges in the IKR distance for one complete metric that X is equipped with, it will converge in the IKR distance for all complete metrics that X could be equipped with.
So, as a brief recap, X could be equipped with many different complete metrics that produce the relevant topology. Each choice of metric affects what counts as a Lipschitz function, affecting the infra-KR metric on infradistributions, as well as the KR-distance between a-measures, and the Hausdorff-distance. So, we need to show that regardless of the metric on X, a sequence of convergent infradistributions will still converge. Use d1 for the original metric on X and d2 for the modified metric on X, and similarly, dKR1 and dKR2 for the induced KR-metrics on measures, and dhau1, dhau2 for the Hausdorff distances induced by the two metrics.
Remember, our infradistribution sets are closed under adding +b to them, and converge according to dhaus1 to the set H∞.
What we'll be doing is slicing up the sets in a particular way. In order to do this, the first result we'll need is that, for all b∗≥1, the set
{(mn,bn)∈Hn|bn≤b∗}
converges, according to dhau1, to the set
{(m,b)∈H∞|b≤b∗}
So, here's the argument for this. We know that the projection sets
{mn|∃bn:(mn,bn)∈Hn}
are precompact, ie, have compact closure, and Hausdorff-limit according to dhau1 to the set
{m|∃b:(m,b)∈H∞}
(well, actually, they limit to the closure of that set)
According to our Lemma 3, this means that the set
{m|∃b≥0,n∈N∪{∞}:(m,b)∈Hn}
(well, actually, its closure) is a compact set in the space of measures. Thus, it must have some maximal amount of measure present, call that quantity λ⊙, the maximal Lipschitz constant of any of the infradistributions in the sequence. It doesn't depend on the distance metric X is equipped with.
Now, fix any ϵ. There's some timestep n where, for all greater timesteps, dhau1(Hn,H∞)≤ϵ.
Now, picking a point (mn,bn) in Hn with bn≤b∗−ϵ, we can travel ϵ distance according to dKR1 and get a point in H∞, and the b term can only change by ϵ or less when we move our a-measure a little bit, so we know that our nearby point lies in
{(m,b)∈H∞|b≤b∗}
But, what if our point (mn,bn) in Hn has b∗−ϵ≤bn≤b∗? Well then, we can pick some arbitrary point $(m^{lo}_n,0)\in H_n$ (by normalization for Hn), and go:

$$d^1_{KR}\Big((m_n,b_n),\ \frac{\epsilon}{b^*}(m^{lo}_n,0)+\Big(1-\frac{\epsilon}{b^*}\Big)(m_n,b_n)\Big)=\frac{\epsilon}{b^*}\,d^1_{KR}\big((m_n,b_n),(m^{lo}_n,0)\big)$$
And then we have to be a little careful. bn≤b∗ by assumption. Also, we can unpack the distance to get
$$\le\frac{\epsilon}{b^*}\Big(\sup_{f\in C_{1\text{-}Lip}(X,[-1,1])}|m_n(f)-m^{lo}_n(f)|+b^*\Big)$$
And the worst-case for distance, since all the measures have their total amount of measure bounded above by λ⊙, would be f being 1 on one of the measures and -1 on another one of the measures, producing:
$$\le\frac{\epsilon}{b^*}(2\lambda^\odot+b^*)$$
So, the distance from (mn,bn) to

$$\frac{\epsilon}{b^*}(m^{lo}_n,0)+\Big(1-\frac{\epsilon}{b^*}\Big)(m_n,b_n)$$

according to dKR1 is at most $\frac{2\epsilon\lambda^\odot}{b^*}+\epsilon$.
And then, because bn≤b∗, this mixed point has a b value of at most

$$\Big(1-\frac{\epsilon}{b^*}\Big)b^*=b^*-\epsilon$$
Which is a sufficient condition for that mix of two points to be only ϵ distance from a point in H∞ with a b∗ upper bound on the b term, so we have that the distance from
{(mn,bn)∈Hn|bn≤b∗}
to
{(m,b)∈H∞|b≤b∗}
is at most
$$\frac{2\epsilon\lambda^\odot}{b^*}+\epsilon+\epsilon=2\epsilon\Big(\frac{\lambda^\odot}{b^*}+1\Big)$$
Conversely, we can flip Hn and H∞, to get this upper bound on the Hausdorff distance between these two sets according to dhau1.
And, since b∗ and λ⊙ are fixed, for any ϵ we can find some time after which the distance between these two "lower parts" of the Hn and H∞ sets is upper-bounded by $2\epsilon(\frac{\lambda^\odot}{b^*}+1)$. Letting ϵ shrink, this quantity limits to 0, showing that
limn→∞dhau1({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
For any b∗≥1.
Ok, this is part of our result. No matter which b∗ we chop off the infradistribution sets at, we get convergence of those chopped pieces according to dhau1.
Now, we'll need a second important result, that:
limb∗→∞dhau1({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
Now, we only have to establish one direction of low Hausdorff distance in the limit, that any point in the latter set is close to a point in the former set, because the former set is a subset of the latter set and has distance 0 to it.
What we can do is, because H∞ has the compact-projection property, the set {m|(m,b)∈H∞} is precompact, so for any ϵ, we can select finitely many points in it such that every point in {m|(m,b)∈H∞} is within ϵ distance of our finite subset according to dKR1. For these finitely many measures, there must be some b term associated with them where (m,b)∈H∞, so you can just take the largest one of them, and let that be your b∗. Then, all your finitely many measures, when paired with b∗ or any larger number, will be present in H∞, so
dhau1({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})<ϵ
Because all points in the latter set are close to one of finitely many points, which are all present in the former set, so the Hausdorff-1 distance must be low.
At this point, we can truly begin. We have produced the dual results:

$$\lim_{n\to\infty}d^1_{hau}(\{(m_n,b_n)\in H_n|b_n\le b^*\},\{(m,b)\in H_\infty|b\le b^*\})=0\quad\text{(for any }b^*\ge 1\text{)}$$
$$\lim_{b^*\to\infty}d^1_{hau}(\{m|(m,b)\in H_\infty,b\le b^*\},\{m|(m,b)\in H_\infty\})=0$$
And we also know that, because Hn limits to H∞ according to dhau1, and projection is 1-Lipschitz,
limn→∞dhau1({mn|(mn,bn)∈Hn},{m|(m,b)∈H∞})=0
Now, here's the thing. (The closure of) all of these sets are compact. For instance,
{(mn,bn)∈Hn|bn≤b∗}
will always be compact, because any sequence in it must have a subsequence whose measure component converges according to dKR1 (due to the compact-projection property applied to Hn), and then, because bn is bounded in [0,b∗], we can pick out a further convergent subsequence. Plus, it's the intersection of a closed set (Hn) and another closed set {(m,b)|b≤b∗}, so it's closed. All sequences have a convergent subsequence and it's closed, so this set is compact. By identical arguments,
{(m,b)∈H∞|b≤b∗}
is compact. And for
{m|(m,b)∈H∞,b≤b∗}
it's the projection of a compact set from earlier arguments, and
{m|(m,b)∈H∞}
must be precompact by the compact-projection property, so it has compact closure. The exact same argument applies to
{mn|(mn,bn)∈Hn}
as well.
Now, for compact sets, convergence in Hausdorff-distance only depends on the topology of the underlying space, not the specific metric it's equipped with, as long as the metrics induce the same topology. And the weak topology on the space of measures, or on the space of a-measures, doesn't depend one bit on the metric that X is equipped with, just on its topology. So, these sets limiting to each other still works when X has its metric changed. For measures/a-measures we end up using the dKR2 metric, but that induces the same topology on the space of a-measures, so the compact sets still converge in the dhau2 metric. So, we still have our triple results of:

$$\lim_{n\to\infty}d^2_{hau}(\{(m_n,b_n)\in H_n|b_n\le b^*\},\{(m,b)\in H_\infty|b\le b^*\})=0$$
$$\lim_{b^*\to\infty}d^2_{hau}(\{m|(m,b)\in H_\infty,b\le b^*\},\{m|(m,b)\in H_\infty\})=0$$
$$\lim_{n\to\infty}d^2_{hau}(\{m_n|(m_n,b_n)\in H_n\},\{m|(m,b)\in H_\infty\})=0$$
Now, here's how to argue that Hn limits to H∞ in dhau2. Fix some ϵ. From our limits above, there's some value of b∗ where
dhau2({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})≤ϵ
And for that value of b∗, and that ϵ, we have that there's some value of n where, for all greater numbers,
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
And
dhau2({mn|(mn,bn)∈Hn},{m|(m,b)∈H∞})≤ϵ
Now, we're going to need to go in two directions for this. First, we pick a point in Hn and show that it's close to a point in H∞. Second, we pick a point in H∞ and show it's close to a point in Hn.
Let (mn,bn)∈Hn. We have two possibilities. One possibility is that bn≤b∗. Then, because
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
we only have to go ϵ distance to get to H∞. The second possibility is that bn>b∗.
In this case, (mn,bn) lies in the set
[b∗,∞)×{mn|∃bn:(mn,bn)∈Hn}
Which has distance ≤ϵ from
[b∗,∞)×{m|∃b:(m,b)∈H∞}
Because we have that
dhau2({mn|(mn,bn)∈Hn},{m|(m,b)∈H∞})≤ϵ
Just scooch over and keep the b term the same. Additionally, the set
[b∗,∞)×{m|∃b:(m,b)∈H∞}
has distance ≤ϵ from the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
Because we have:
dhau2({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})≤ϵ
Further, the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
is a subset of H∞, because H∞ is upper-closed. So, either way, we only have to travel 2ϵ distance, according to dKR2, from our point in Hn to get to H∞.
Now for the reverse direction, starting with a point (m,b)∈H∞ and getting to a nearby point in Hn. Again, we can split into two cases. In our first case, b≤b∗, and because
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
we only have to go ϵ distance to get to Hn. The second possibility is that b>b∗. In such a case, (m,b) would be guaranteed to lie in the set
[b∗,∞)×{m|∃b:(m,b)∈H∞}
which has distance ≤ϵ from the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
Because we have:
dhau2({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})≤ϵ
Further, the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
has distance ≤ϵ according to dhau2 from the set
[b∗,∞)×{mn|∃bn≤b∗:(mn,bn)∈Hn}
Because the measure components of these sets are the projections of the sets
{(m,b)∈H∞|b≤b∗}
and
{(mn,bn)∈Hn|bn≤b∗}
And we already know that
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
So, given our point (m,b)∈H∞, we just have to go 2ϵ distance to get to the set
[b∗,∞)×{mn|∃bn≤b∗:(mn,bn)∈Hn}
And all points in this set lie in Hn because of upper completion.
Thus, given any ϵ, there's a tail of the Hn sequence where the Hn are all within 2ϵ distance (according to dhau2) of H∞, so if dhau1 thinks that Hn converges to H∞, dhau2 will think that as well. Further, the metrics on X which induce dhau1 and dhau2 were arbitrary, so a sequence of infradistributions converging happens regardless of which complete metric X is equipped with.
Proposition 45: If a sequence of infradistributions hn converges to h in the infra-KR distance, then for all bounded continuous functions f, limn→∞hn(f)=h(f).
So, to begin with, if hn converges to h, all bounded Lipschitz functions must have limn→∞hn(f)=h(f) or else the infra-KR distance wouldn't converge.
For the next two, since the infra-KR distance is strongly equivalent to Hausdorff distance, and we know that
∀n:{mn|∃bn:(mn,bn)∈Hn}
is always precompact, and they Hausdorff-limit to
{m|∃b:(m,b)∈H∞}
And we have our Lemma 3, that the union of compact sets which Hausdorff-limit to something is compact, so the set
{m|∃b,n:(m,b)∈Hn}
is compact (well, actually precompact, but just take the closure).
Because compactness of a set of measures implies that the amount of measure doesn't run off to infinity, there's some λ⊙<∞ that's a shared Lipschitz constant for all the hn.
Also, any uniformly continuous function can be built as the uniform limit of Lipschitz-continuous functions from above and below, so given some uniformly continuous f, we can make a sequence $f^{hi}_m$ limiting to it from above, and a sequence $f^{lo}_m$ limiting to it from below. Then, we have:

$$\limsup_{n\to\infty}h_n(f)\le\lim_{n\to\infty}h_n(f^{hi}_m)=h(f^{hi}_m)$$

And similarly, we can get:

$$\liminf_{n\to\infty}h_n(f)\ge h(f^{lo}_m)$$
Now, regardless of m and n,

$$|h_n(f^{hi}_m)-h_n(f^{lo}_m)|\le\lambda^\odot\cdot d(f^{hi}_m,f^{lo}_m)$$
So, even though we don't necessarily know that the limit of hn(f) actually exists, we at least know that all the values are eventually confined to an interval of known maximum size, which converges to the interval

$$[h(f^{lo}_m),h(f^{hi}_m)]$$

and, by monotonicity of h, h(f) lies in that interval.
So, all the limit points of the hn(f) sequence are in that interval. Now, as m gets unboundedly high, the difference between $f^{hi}_m$ and $f^{lo}_m$ gets unboundedly small, so for gigantic m, any limit point of the hn(f) sequence must be in a really tiny interval. Taking the limit, the interval crunches down to the single point h(f), so hn(f) actually limits to h(f). We've shown it now for uniformly continuous functions.
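In symbols, the squeeze just described:

$$h(f^{lo}_m)\le\liminf_{n\to\infty}h_n(f)\le\limsup_{n\to\infty}h_n(f)\le h(f^{hi}_m)$$

and since $h(f^{hi}_m)-h(f^{lo}_m)\le\lambda^\odot d(f^{hi}_m,f^{lo}_m)\to 0$ as m→∞ (the limit h inherits the λ⊙ Lipschitz bound), both outer bounds converge to h(f).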
Time to expand this to continuous functions in full generality. Again,

$$\{m\,|\,\exists b,n:(m,b)\in H_n\}$$
is precompact, so this implies that for all ϵ, there is a compact set Cϵ where all minimal points of Hn (regardless of the n! Even for the final infradistribution set H∞!) have <ϵ measure outside of that compact set.
Transferring to functionals, this means that for all the hn (and h itself), Cϵ is an ϵ-almost-support, and any two functions that only differ outside that set have correspondingly close expectations.
Given some arbitrary f, let fm be identical to f on $C_{1/m}$ (so it's uniformly continuous on that compact set), and extend it in an arbitrary uniformly continuous way to all of X while staying in [−||f||,||f||], by the Tietze Extension Theorem.
Regardless of n, since $C_{1/m}$ is a $\frac{1}{m}$-almost-support for hn, we have that

$$|h_n(f)-h_n(f_m)|\le\frac{2||f||}{m}$$
Why? Well, f and fm are identical on a $\frac{1}{m}$-almost support for hn, so the difference in their expectations is controlled by $\frac{1}{m}$ times the maximum difference between the two functions; and since f and fm both lie in [−||f||,||f||], they can differ by at most 2||f||. The same result extends to the limit h itself.
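Written out, with the almost-support bound in the form this sequence uses:

$$|h_n(f)-h_n(f_m)|\le\frac{1}{m}\cdot\sup_x|f(x)-f_m(x)|\le\frac{1}{m}\cdot 2||f||$$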
Because ||f|| is fixed and this bound doesn't depend on n, hn(fm) limits to hn(f) as m→∞, uniformly in n.
Now, we can go:
limn→∞hn(f)=limn→∞limm→∞hn(fm)
And now, to invoke the Moore-Osgood theorem to swap the two limits, we need two results. One is that, for all m,
limn→∞hn(fm)=h(fm)
(which is true because fm was selected to be uniformly continuous).
The second result we need is that for all n,

$$\lim_{m\to\infty}h_n(f_m)=h_n(f)$$

uniformly in n. Which is true, as shown above. So, we can invoke the Moore-Osgood theorem and swap the two limits, to get
=limm→∞limn→∞hn(fm)
=limm→∞h(fm)=h(f)
So, we have our final result that
limn→∞hn(f)=h(f)
For all continuous bounded functions f, and we're done.
Proposition 46: A set of infradistributions {hi}i∈I is precompact in the topology induced by the IKR distance iff:
1: There's an upper bound on the Lipschitz constant of all the infradistributions in the set.
2: There's a sequence of compact sets Cϵ, one for each ϵ, that are compact ϵ-almost-supports for all infradistributions in the set.
3: The set of infradistributions is b-uniform.
This proof will proceed in three phases. The first phase is showing that compactness implies conditions 1 and 2. The second phase is showing that a failure of condition 3 permits you to construct a sequence with no convergent subsequence, so a failure of condition 3 implies non-precompactness, and taking the contrapositive, precompactness implies condition 3. That gets us one half of the iff implication, that precompactness implies the three conditions. For the second half of the iff implication, we assume the three conditions, and construct a convergent subsequence.
So, for our first step, due to working in Hausdorff spaces, we can characterize precompactness as "is a subset of a compact set"
Also, the projection mapping of type
C(M+(X)×R≥0)→K(M+(X))
Which takes a closed set of a-measures (an infradistribution) and projects it down (and takes the closure) to make a compact set of measures (by the compact-projection property), is Lipschitz (projection of sets down to one coordinate keeps their Hausdorff-distance the same or contracts it), so it's continuous. So, a compact set of infradistributions (because the infra-KR metric is strongly equivalent to the Hausdorff-distance), would get mapped to a compact set of sets of measures (because the image of a compact set is compact), which by Lemma 3, unions together to make a compact set of measures.
Doing the same process (taking your precompact set of infradistributions, mapping it through the projection, unioning together all the sets) makes a subset of that compact set of measures, so it's precompact.
Also, the necessary-and-sufficient condition for precompactness of a set of measures is that: There be a maximum amount of measure present, and for all ϵ there is a compact set Cϵ⊆X where all the measures assign ≤ϵ measure outside of that compact set.
So, if you take a precompact set of infradistributions, all the measure components of points in any of them have a uniform upper bound on the amount of measure present, and we also have the shared compact almost-support property. So, precompactness implies conditions 1 and 2.
Time for phase 2 of our proof, showing that a failure of condition 3 implies that there's a sequence from it with no convergent subsequence in the KR-metric.
Assume, for contradiction, that we indeed have a precompact set which fails condition 3. Using I to index your set of infradistributions, Condition 3 is:
$$\forall\epsilon>0\,\exists b^*\,\forall i:d_{hau}(H_i,H^{b^*}_i)\le\epsilon$$

where $H^{b^*}_i$ is the set formed from the set Hi by deleting all points with b>b∗ and taking the upper completion again. Negating this, we see that the set of infradistribution sets Hi failing this condition is stated as:

$$\exists\epsilon>0\,\forall b^*\,\exists i:d_{hau}(H_i,H^{b^*}_i)>\epsilon$$

So, let ϵ0 be your ϵ of choice, and for each n (playing the role of b∗), let Hn be an infradistribution Hi such that $d_{hau}(H_i,H^n_i)\ge\epsilon_0$.
Because we're assuming that this sequence of infradistributions was selected from a precompact set, we have a guarantee that the sequence Hn has a convergent subsequence limiting to some H∞. We'll still be using n as our limiting variable, hopefully this doesn't cause too much confusion.
Now, we can crib two results from our earlier proof of Proposition 44. From that proof, we know that, because Hn limits to H∞ in Hausdorff-distance,
limb∗→∞dhau({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
and also,
limn→∞dhau({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
For any b∗≥1. To craft this into a more usable form, we can realize that for all b∗, $H^{b^*}_\infty\subseteq H_\infty$,
So the distance from the former set to the latter set is 0. Also, any point in H∞ can be written as (m,b). Either b≤b∗, in which case the same point is present in Hb∗∞ and the distance to enter that set is 0, or b>b∗, in which case the m component is present in {m|(m,b)∈H∞}, and from
limb∗→∞dhau({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
For large b∗, you just have to adjust the m component a little bit to m′, and then you know there's some (m′,b′)∈H∞ with b′≤b∗, so by upper completion, $(m',b)\in H^{b^*}_\infty$, and this point is close to (m,b).
We took a point in $H^{b^*}_\infty$ and showed it's in H∞ (trivially), and took a point in H∞ and showed there's a nearby point in $H^{b^*}_\infty$, so we have our modified result that:

$$\lim_{b^*\to\infty}d_{hau}(H^{b^*}_\infty,H_\infty)=0$$
For another modified result, due to the fact that we know
limn→∞dhau({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
We can take any point in $H^{b^*}_n$, descend to a point in Hn (but cut off at b∗), shift over a bit to get to H∞ (but cut off at b∗), and add back the same amount of b value that you took off, to make a point in $H^{b^*}_\infty$ nearby to the point you started with; flipping the two sets gives the other direction, so we can argue that

$$\forall b^*:\lim_{n\to\infty}d_{hau}(H^{b^*}_n,H^{b^*}_\infty)=0$$
Now, here's what you do from here. We know our ϵ0 value. Because of the fact that

$$\lim_{b^*\to\infty}d_{hau}(H^{b^*}_\infty,H_\infty)=0$$

we can identify some finite b∗ value (call it b0) where, for it and all greater values,

$$d_{hau}(H^{b_0}_\infty,H_\infty)<\frac{\epsilon_0}{3}$$

Locking this value in, and because of

$$\forall b^*:\lim_{n\to\infty}d_{hau}(H^{b^*}_n,H^{b^*}_\infty)=0$$

and Hn limiting to H∞, so

$$\lim_{n\to\infty}d_{hau}(H_n,H_\infty)=0$$

we can find some finite n where, for all greater values,

$$d_{hau}(H^{b_0}_n,H^{b_0}_\infty)<\frac{\epsilon_0}{3}$$

and

$$d_{hau}(H_n,H_\infty)<\frac{\epsilon_0}{3}$$
There's one last thing to note. The sequence Hn was selected as a subsequence of a sequence of infradistributions selected so that the Hausdorff-distance between an infradistribution and its truncation of minimal points at a certain b value was always ϵ0 or more.
Accordingly let ρ(n) be the value of the cutoff for Hn (ie, the index of Hn before we did the reindexing when we passed to a subsequence). Due to our construction process for the Hn, we have that:
$$\forall n:d_{hau}(H_n,H^{\rho(n)}_n)\ge\epsilon_0$$

Further, ρ(n) diverges to infinity, so there's some n where ρ(n)≥b0. Because, for that n, $H^{b_0}_n\subseteq H^{\rho(n)}_n\subseteq H_n$, we have that $d_{hau}(H^{b_0}_n,H_n)\ge d_{hau}(H^{\rho(n)}_n,H_n)$.
Taking stock of all we have, we know that there is some n where:

$$d_{hau}(H^{b_0}_\infty,H_\infty)<\frac{\epsilon_0}{3}$$

and

$$d_{hau}(H^{b_0}_n,H^{b_0}_\infty)<\frac{\epsilon_0}{3}$$

and

$$d_{hau}(H_\infty,H_n)<\frac{\epsilon_0}{3}$$

and

$$d_{hau}(H^{b_0}_n,H_n)\ge d_{hau}(H^{\rho(n)}_n,H_n)$$

and, by our construction process for the Hn sequence,

$$d_{hau}(H^{\rho(n)}_n,H_n)\ge\epsilon_0$$

Chaining these together with the triangle inequality,

$$\epsilon_0\le d_{hau}(H^{\rho(n)}_n,H_n)\le d_{hau}(H^{b_0}_n,H_n)\le d_{hau}(H^{b_0}_n,H^{b_0}_\infty)+d_{hau}(H^{b_0}_\infty,H_\infty)+d_{hau}(H_\infty,H_n)<\frac{\epsilon_0}{3}+\frac{\epsilon_0}{3}+\frac{\epsilon_0}{3}=\epsilon_0$$

But we just showed ϵ0>ϵ0, a contradiction. Our one assumption was that there could be a set of infradistributions that was both precompact and that failed to meet the shared b-uniformity condition. Therefore, if a set of infradistributions is precompact, it must fulfill the shared b-uniformity condition.
Because we've shown that precompactness implies a Lipschitz bound and shared compact-almost-support in part 1 of the proof, and that precompactness implies the shared b-uniformity condition, we have one direction of our iff statement. Precompactness implies these three properties.
Now we'll go in the other direction and establish that if these three properties are fulfilled, then every sequence of infradistributions has a convergent subsequence.
So, let's say we have some set of infradistributions Hi that fulfills the following three properties:
∃λ⊙∀i,(m,b)∈Hi:m(1)≤λ⊙
(this is bounded Lipschitz constant)
∀ϵ∃Cϵ∈K(X)∀i,(m,b)∈Hi:m(X/Cϵ)≤ϵ
(this is shared almost-compact-support)
$$\forall\epsilon\,\exists b^*\,\forall i:d_{hau}(H_i,H^{b^*}_i)\le\epsilon$$
(this is the b-uniformity condition)
Note that $H^{b^*}_i$ is Hi with all the points with b>b∗ chopped off, regenerated via upper-completion.
First, the compact almost-support condition and bounded amount of measure (and closure) are necessary-and-sufficient conditions for a set of measures to be compact. Thus, letting ΔC,λ be defined as:
{m∈M+(X)|∀ϵ:m(X/Cϵ)≤ϵ∧m(1)≤λ⊙}
(ie, measures where the measure outside of the compact set Cϵ is ϵ or less, for all ϵ, and the amount of measure is upper-bounded by λ⊙, where that sequence of compact sets and measure upper bound came from the relevant sequence of compact sets and measure upper bound on the set {Hi|i∈I}, from the fact that we assumed a Lipschitz upper bound and shared compact-almost-support for it).
We know that ΔC,λ is a compact set. All the measure components of all the points in all the Hi lie in this set. Thus, all sets Hi can be thought of as being a subset of the space ΔC,λ×R≥0
In particular, all our Hn (from our arbitrarily selected sequence) are a subset of this space.
Now, here's what we do. Fix any m≥1. From the b-uniformity condition on the Hi, there is some quantity bm where

$$\forall i:d_{hau}(H_i,H^{b_m}_i)\le\frac{1}{m}$$
What we're going to do is find a subsequence of the Hn sequence where the $H^{b_m}_n$ sequence converges in Hausdorff-distance.
Here's how to do it. We can take each Hn and chop it off at a b value of bm, to make a closed set $(H^{b_m}_n)'$ which is a subset of $\Delta_{C,\lambda}\times[0,b_m]$, which, being a product of two compact sets, is compact. Further, the space of compact subsets of a compact space (equipped with the Hausdorff metric) is compact. So, we can isolate some subsequence where the $(H^{b_m}_n)'$ sets converge in Hausdorff-distance. If sets converge in Hausdorff-distance, their upper completions do too, so we have isolated a subsequence of our Hn sequence where the sets $H^{b_m}_n$ converge in Hausdorff-distance. Also, each $H^{b_m}_n$ infradistribution set is at most $\frac{1}{m}$ Hausdorff-distance away from the corresponding Hn. So, for sufficiently large n, the Hn subsequence we picked out is all wandering around in a ball of size $\frac{2}{m}$.
Now, here's what we do. Start with your Hn sequence. Use the argument we described above for m=1 to isolate a subsequence which, in the tail, is wandering around in a ball (w.r.t. Hausdorff-distance) of size 2. Now, use the argument for m=2 to isolate a subsequence of that which wanders around in a ball of size 1 in the tail. And, y'know, repeat for all finite m, to get, for each m, a subsequence embedded in all previous subsequences which, in the tail, wanders around in a ball of size $\frac{2}{m}$.
Now build one final subsequence, which takes the first element of the m=1 subsequence, the second element of the m=2 subsequence, the third element of the m=3 subsequence, and so on. It eventually enters the tail of the m-th subsequence for all finite m, so, regardless of m, the tail of this diagonal sequence wanders around in a ball of size $\frac{2}{m}$. Thus, the sequence is actually Cauchy, and must converge, as we've previously shown that the space □X is complete in the infra-KR/Hausdorff metric.
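A minimal sketch of this diagonal construction on index sets (illustrative only; the `diagonal` helper and the toy nested subsequences are assumptions of the sketch, not objects from the proof):

```python
def diagonal(seq_indices):
    """seq_indices[m] is the (sorted) index list of the m-th nested subsequence;
    return the diagonal subsequence: the m-th element of the m-th subsequence."""
    return [seq_indices[m][m] for m in range(len(seq_indices))]

# Toy illustration with nested index sets: each subsequence refines the last.
subseq = [list(range(0, 100, 2**m)) for m in range(1, 5)]
print(diagonal(subseq))  # eventually lands in the tail of every subsequence
```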
Assuming the three conditions on a set of infradistributions has let us show that every sequence from it has a convergent subsequence, so the set must be precompact; we have the reverse direction of our iff statement, and we're done.
Proposition 47: When X is a compact Polish space, the spaces of cohomogenous, crisp, and sharp infradistributions are all compact in □X equipped with the infra-KR metric.
So, from Proposition 46, the necessary-and-sufficient conditions for a set of infradistributions to be precompact are:
1: Bounded Lipschitz constant/bounded amount of measure on minimal points. 1-Lipschitz, C-additive, cohomogenous, crisp, and sharp infradistributions fulfill this because of their iff minimal point characterizations.
2: Shared compact almost-supports. X is compact by assumption, and it's the whole space so it must be a support of everything, and thus an ϵ-almost-support of everything, so this is trivially fulfilled for all infradistributions when X is compact.
3: b-uniformity. Homogenous, cohomogenous, crisp, and sharp infradistributions fulfill this because they all have their minimal points having b≤1, and the condition is "there's gotta be some b value you can go up to in order to have a guarantee of being within ϵ of the full H set in Hausdorff-distance if you delete all the minimal points with a higher b value, for all ϵ".
Thus, cohomogenous, crisp, and sharp infradistributions fulfill the necessary-and-sufficient conditions for precompactness, and all we need is to check that the set of them is closed in the KR-metric.
To do this, we'll invoke Proposition 45, that: If a sequence of infradistributions hn converges to h in the infra-KR distance, then for all bounded continuous functions f, limn→∞hn(f)=h(f).
The characterization for cohomogenity was that h(1+af)=1−a+ah(1+f). So, we can go:

$$h(1+af)=\lim_{n\to\infty}h_n(1+af)=\lim_{n\to\infty}(1-a+ah_n(1+f))=1-a+a\lim_{n\to\infty}h_n(1+f)=1-a+ah(1+f)$$
Showing that the limit of cohomogenous infradistributions is cohomogenous, and we've verified closure, which is the last property we needed for cohomogenity.
The characterization for crispness was that h(c+af)=c+ah(f) for c∈R, a≥0. To show it's preserved under limits, we can go:

$$h(c+af)=\lim_{n\to\infty}h_n(c+af)=\lim_{n\to\infty}(c+ah_n(f))=c+a\lim_{n\to\infty}h_n(f)=c+ah(f)$$
Showing that the limit of crisp infradistributions is crisp, and we've verified closure. Sharpness is a bit more tricky.
Let's say a sequence of sharp infradistributions hn limits to h, and all the hn are associated with the compact set Cn⊆X. The minimal points of the hn consist of all probability distributions supported over Cn, with a b value of 0. Thus, all the Hn sets can be written as ΔCn×R≥0, and so, if they converge in Hausdorff-distance, then the sets of probability distributions ΔCn must converge in Hausdorff-distance, which is impossible if Cn don't converge in Hausdorff-distance, because the dirac-delta distributions on points in the Cn sets can transport a failure of Hausdorff-convergence of the Cn sets up to a failure of Hausdorff-convergence of the ΔCn sets of probability distributions.
Thus, the Cn converge to a compact set C∞ in Hausdorff-distance.
We also know that, because sharp infradistributions are crisp infradistributions, and crisp infradistributions are preserved under limits, all we have to check is if the minimal points of H∞ consist exactly of all probability distributions supported over C∞. Now, ΔC∞ is the closed convex hull of all the dirac-delta distributions on points in C∞, and all those points have a sequence from the Cn that converge to them, so the associated dirac-delta distributions converge and witness that all the dirac-delta distributions on points in C∞ are present in the set H∞. So, because infradistribution sets are closed and convex, all of ΔC∞ must be present as minimal points in H∞. Now we just need to rule out the presence of additional points.
Let's say we've got some probability distribution μ∈H∞ which is not supported entirely on C∞; there's ϵ probability mass outside that set. Because probability distributions on Polish spaces are outer-regular (the measure of open supersets of a set can be brought arbitrarily close to the measure of the set itself), we can find some open superset of C∞, call it O, with at least $\frac{\epsilon}{2}$ probability mass outside of it. Any point outside of O must be some δ distance away from C∞: otherwise, you could pick a sequence of points in C∞ getting arbitrarily close to the (closed) complement of O, find a convergent subsequence since C∞ is compact, and get a limit point which is in C∞ (due to closure) and also in the complement of O (due to getting arbitrarily close to said closed set), contradicting that the two sets are disjoint (because O is a superset of C∞).
Ok, so our hypothetical "bad" probability distribution has $\frac{\epsilon}{2}$ probability measure at a distance of δ or more from our set of interest, C∞. The KR distance is equivalent to the earthmover distance, which is "how much effort would it take to move this pile of dirt/pile of probability mass into the other distribution/pile of dirt".
All minimal points in H∞ must have a sequence of minimal points in the Hn limiting to them, because H∞ is the Hausdorff-limit of those infradistribution sets. So, we've got some sequence μn limiting to our hypothetical bad distribution μ∉ΔC∞, but all the μn lie in ΔCn.
There is some n value where $d_{KR}(\mu,\mu_n)<\frac{\delta\epsilon}{4}$, and also where $d_{hau}(C_\infty,C_n)<\frac{\delta\epsilon}{4}$. Now, we can get something really interesting.
So, we agree that μ has $\frac{\epsilon}{2}$ probability mass a distance of δ or more away from the set C∞, right? This means that the earthmover distance from μ to any point in ΔC∞ must be $\frac{\delta\epsilon}{2}$ or more, because you've gotta move $\frac{\epsilon}{2}$ measure a distance of δ at the very least.
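A small numerical illustration of this earthmover lower bound, using scipy's 1-D Wasserstein distance as a stand-in for the KR/earthmover metric; the specific values of delta and eps are hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance

delta, eps = 2.0, 0.5
# mu: (1 - eps/2) mass at 0 (inside C_infinity), eps/2 mass at distance delta.
mu_pts, mu_w = np.array([0.0, delta]), np.array([1 - eps / 2, eps / 2])
# nu: any distribution supported on C_infinity = {0}.
nu_pts, nu_w = np.array([0.0]), np.array([1.0])

d = wasserstein_distance(mu_pts, nu_pts, mu_w, nu_w)
print(d, delta * eps / 2)  # d == delta*eps/2: that mass must travel distance delta
```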
However, the earthmover distance from μ to μn is strictly below $\frac{\delta\epsilon}{4}$, and because μn∈ΔCn, it's only got an earthmover distance of less than $\frac{\delta\epsilon}{4}$ to go to arrive at a probability distribution in ΔC∞, because all dirt piled up in Cn is only $\frac{\delta\epsilon}{4}$ distance away from C∞. So, the distance from μ to ΔC∞ is only
$$<\frac{\delta\epsilon}{4}+\frac{\delta\epsilon}{4}=\frac{\delta\epsilon}{2}$$
distance. But we know it's impossible for it to be any closer than $\frac{\delta\epsilon}{2}$ distance from that set, so we have a contradiction, and no such μ can exist in H∞. Thus, $H^{min}_\infty$ has all the probability distributions over C∞ and nothing else, so the limit of sharp infradistributions is sharp, and we're done.
Proposition 39: Given a crisp infradistribution ζ□ over N, an infrakernel K from N to infradistributions over X, and suggestively abbreviating K(i) as Hi(hypothesis i) and K∗(ζ□) as Eζ□Hi (your infraprior where you have Knightian uncertainty over how to mix the hypotheses), then
((Eζ□Hi)|gL)(f)=Eζ□(PgHi(L)⋅(Hi|gL)(f)+Hi(0★Lg))−Eζ□(Hi(0★Lg))Eζ□(Hi(1★Lg))−Eζ□(Hi(0★Lg))
Proof: Assume that L and g are functions of type X→[0,1] and X→R respectively, ie, likeliehood and utility doesn't depend on which hypothesis you're in, just what happens. First, unpack our abbreviations and what an update means.
((Eζ□Hi)|gL)(f)=(K∗(ζ□)|gL)(f)=K∗(ζ□)(f★Lg)−K∗(ζ□)(0★Lg)K∗(ζ□)(1★Lg)−K∗(ζ□)(0★Lg)
Then use the definition of an infrakernel pushforward.
=ζ□(λi.K(i)(f★Lg))−ζ□(λi.K(i)(0★Lg))ζ□(λi.K(i)(1★Lg))−ζ□(λi.K(i)(0★Lg))
For the next thing, we're just making types a bit more explicit, f,L,g only depend on x, not i.
=ζ□(λi.K(i)(λx.f★Lg))−ζ□(λi.K(i)(λx.0★Lg))ζ□(λi.K(i)(λx.1★Lg))−ζ□(λi.K(i)(λx.0★Lg))
Then we pack the semidirect product back up.
=(ζ□⋉K)(λi,x.f★Lg)−(ζ□⋉K)(λi,x.0★Lg)(ζ□⋉K)(λi,x.1★Lg)−(ζ□⋉K)(λi,x.0★Lg)
And pack the update back up.
=((ζ□⋉K)|gL)(f)
At this point, we invoke the Infra-Disintegration Theorem.
=(ζ□′⋉K′)(f)
=ζ□′(λi.K′(i)(f))
We unpack what our new modified prior is, via the Infra-Disintegration Theorem.
=ζ□(λi.α(i)K′(i)(f)+β(i))−(ζ□⋉K)(0★Lg)(ζ□⋉K)(1★Lg)−(ζ□⋉K)(0★Lg)
and unpack the semidirect product.
=ζ□(λi.α(i)K′(i)(f)+β(i))−ζ□(λi.K(i)(0★Lg))ζ□(λi.K(i)(1★Lg))−ζ□(λi.K(i)(0★Lg))
Now we unpack α and β.
=ζ□(λi.PgK(i)(L)⋅K′(i)(f)+K(i)(0★Lg))−ζ□(λi.K(i)(0★Lg)ζ□(λi.K(i)(1★Lg))−ζ□(λi.K(i)(0★Lg)
And unpack what K′ is
=ζ□(λi.PgK(i)(L)⋅(K(i)|gL)(f)+K(i)(0★Lg))−ζ□(λi.K(i)(0★Lg))ζ□(λi.K(i)(1★Lg))−ζ□(λi.K(i)(0★Lg))
And reabbreviate K(i) as Hi,
=ζ□(λi.PgHi(L)⋅(Hi|gL)(f)+Hi(0★Lg))−ζ□(λi.Hi(0★Lg)ζ□(λi.Hi(1★Lg))−ζ□(λi.Hi(0★Lg)
And then pack it back up into a suggestive form as a sort of expectation.
=Eζ□(PgHi(L)⋅(Hi|gL)(f)+Hi(0★Lg))−Eζ□(Hi(0★Lg))Eζ□(Hi(1★Lg))−Eζ□(Hi(0★Lg)
And we're done.
Proposition 40: If a likelihood function L:X→[0,1] is 0 when f(x)<a, and f≥0 and a>0, then h(L⋅a)≤h(f)
h(L⋅a)=inf(λμ,b)∈Hλμ(L⋅a)+b=inf(λμ,b)∈Haλμ(L)+b
And then we apply Markov's inequality, that for any probability distribution,
μ(1f(x)≥a)≤μ(f)a
Also,1f(x)≥a≥L (because L is 0 when f(x)<a), so monotonicity means that
μ(L)≤μ(f)a
So, we can get:
≤inf(λμ,b)∈Haλμ(f)a+b=inf(λμ,b)∈Hλμ(f)+b=h(f)
And we're done.
Proposition 41: The IKR-metric is a metric.
So, symmetry is obvious, as is one direction of identity of indiscernibles (that the distance from an infradistribution to itself is 0). That just leaves the triangle inequality and the other direction of identity of indiscernibles. For the triangle inequality, observe that for any particular f (instead of the supremum), it would fulfill the triangle inequality, and then it's an easy exercise for the reader to verify that the same property applies to the supremum, so the only tricky part is the reverse direction of identity of indiscernibles, that two infradistributions which have a distance of 0 are identical.
First, if dIKR(h,h′)=0, then h and h′ must perfectly agree on all the Lipschitz functions. And then, because uniformly continuous functions are the uniform limit of Lipschitz functions, h and h′ must perfectly agree on all the uniformly continuous functions.
Now, we're going to need a somewhat more sophisticated argument. Let's say that the sequence fn is uniformly bounded and limits to f in CB(X) equipped with the compact-open topology (ie, we get uniform convergence of fn to f on all compact sets). Then, for any infradistributions, h(fn) will limit to h(f). Here's why. For any ϵ, there's some compact set Cϵ that accounts for almost all of why a function inputted into an infradistribution has the value it does. Then, what we can do is realize that h(fn) will, in the limit, be incredibly close to h(f), due to fn and f disagreeing by a bounded amount outside the set Cϵ and only disagreeing by a tiny amount on the set Cϵ, and the Lipschitzness of h.
Further, according to this mathoverflow answer, uniformly continuous functions are dense in the space of all continuous functions when CB(X) is equipped with the compact-open topology, so given any function f, we can find a sequence of uniformly continuous functions fn limiting to f in the compact-open topology, and then,
h(f)=limn→∞h(fn)=limn→∞h′(fn)=h′(f)
And so, h and h′ agree on all continuous functions, and are identical, if they have a distance of 0, giving us our last piece needed to conclude that dIKR is a metric.
Proposition 42: The IKR-metric for infradistributions is strongly equivalent to the Hausdorff distance (w.r.t. the KR-metric) between their corresponding infradistribution sets.
Let's show both directions of this. For the first one, if the Hausdorff-distance between H,H′ is dhau(H,H′), then for all a-measures (m,b) in H, there's an a-measure (m′,b′) in H′ that's only dhau(H,H′) or less distance away, according to the KR-metric (on a-measures).
Now, by LF-duality, a-measures in H correspond to hyperplanes above h. Two a-measures being dhau(H,H′) apart means, by the definition of the KR-metric for a-measures, that they will assign values at most dhau(H,H′) distance apart for 1-Lipschitz functions in [−1,1].
So, translating to the concave functional view of things, H and H′ being dhau(H,H′) apart means that every hyperplane above h has another hyperplane above h′ that can only differ on the 1-Lipschitz 1-bounded functions by at most dhau(H,H′), and vice-versa.
Let's say we've got a Lipschitz function f. Fix an affine functional/hyperplane ψh that touches the graph of h at f. Let's try to set an upper bound on what h′(f) can be. If f is 1-Lipschitz and 1-bounded, then we can craft a ψh′ above h′ that's nearby, and
h′(f)≤ψh′(f)≤ψh(f)+dhau(H,H′)=h(f)+dhau(H,H′)
Symmetrically, we can swap h′ and h to get h(f)≤h′(f)+dhau(H,H′), and put them together to get:
|h′(f)−h(f)|≤dhau(H,H′)
For the 1-Lipschitz functions.
Let's tackle the case where f is either more than 1-Lipschitz, or strays outside of [−1,1]. In that case, fmax(Li(f),||f||) is 1-Lipschitz and bounded in [−1,1]. We can craft a ψh′ that only differs on 1-Lipschitz functions by dhau(H,H′) or less. Then, since, for affine functionals, ψ(ax)=a(ψ(x)−ψ(0))+ψ(0) and using that ψh′ and ψh are close on 1-Lipschitz functions, which fmax(Li(f),||f||) and 0 are, we can go:
h′(f)≤ψh′(f)=ψh′(max(Li(f),||f||)⋅fmax(Li(f),||f||))
=max(Li(f),||f||)(ψh′(fmax(Li(f),||f||))−ψh′(0))+ψh′(0)
And then we swap out ψh′ for ψh with a known penalty in value, we're taking an overestimate at this point.
≤max(Li(f),||f||)((ψh(fmax(Li(f),||f||))+dhau(H,H′))−(ψh(0)−dhau(H,H′)))
+(ψh(0)−dhau(H,H′))
=dhau(H,H′)⋅(2(max(Li(f),||f||))−1)
+max(Li(f),||f||)⋅(ψh(fmax(Li(f),||f||))−ψh(0))+ψh(0)
<2dhau(H,H′)⋅(max(Li(f),||f||))+ψh(max(Li(f),||f||)⋅fmax(Li(f),||f||))
=2dhau(H,H′)⋅(max(Li(f),||f||))+ψh(f)
=2dhau(H,H′)⋅(max(Li(f),||f||))+h(f)
This argument works for all h. And, even though we just got an upper bound, to rule out h′(f) being significantly below h(f), we could run through the same upper bound argument with h′ instead of h, to show that h(f) can't be more than 2dhau(H,H′)⋅(max(Li(f),||f||)) above h′(f).
So, for all Lipschitz f, |h(f)−h′(f)|≤2dhau(H,H′)⋅(max(Li(f),||f||,1)). Thus, for all Lipschitz f,
|h(f)−h′(f)|max(Li(f),||f||,1)≤2dhau(H,H′)
And therefore,
dIKR(h,h′)≤2dhau(H,H′)
This establishes one part of our inequalities. Now for the other direction.
Here's how things are going to work. Let's say we know the IKR-distance between h and h′. Our task will be to stick an upper bound on the Hausdorff-distance between H and H′. Remember that the Hausdorff-distance being low is equivalent to "any hyperplane above h has a corresponding hyperplane above h′ that attains similar values on the 1-or-less-Lipschitz functions".
So, let's say we've got h, and a ψh≥h. Our task is, knowing h′, to craft a hyperplane above h′ that's close to ψ on the 1-Lipschitz functions. Then we can just swap h′ and h, and since every hyperplane above h is close (on the 1-Lipschitz functions) to a hyperplane above h′, and vice-versa, H and H′ can be shown to be close. We'll use Hahn-Banach separation for this one.
Accordingly, let the set A be the set of f,b where (f,b)=p(f′,b′)+(1−p)(f∗,b∗), and:
p∈[0,1),f′∈C1−lip(X,[−1,1]),f∗∈CB(X),b′<ψh(f′)−dIKR(h,h′),b∗<h′(f∗)
That's... quite a mess. It can be thought of as the convex hull of the hypograph of h′, and the hypograph of ψh restricted to the 1-Lipschitz functions in [−1,1] and shifted down a bit. If there was a ψh′ that cuts into h′ and scores lower than it, ie ψh′(f∗)<h′(f∗), we could have p=0, and b∗=ψh′(f∗)<h′(f∗) to observe that ψh′ cuts into the set A. Conversely, if an affine functional doesn't cut into the set A, then it lies on-or-above the graph of h′.
Similarly, if ψh′ undershoots ψh−dIKR(h,h′) over the 1-or-less-Lipschitz functions in [−1,1], it'd also cut into A. Conversely, if the hyperplane ψh′ doesn't cut into A, then it sticks close to ψh over the 1-or-less-Lipschitz functions.
This is pretty much what A is doing. If we don't cut into it, we're above h′ and not too low on the functions with a Lipschitz norm of 1 or less.
For Hahn-Banach separation, we must verify that A is convex and open. Convexity is pretty easy.
q(p1(f′1,b′1)+(1−p1)(f∗1,b∗1))+(1−q)(p2(f′2,b′2)+(1−p2)(f∗2,b∗2))
=(qp1+(1−q)p2)((qp1/(qp1+(1−q)p2))(f′1,b′1)+((1−q)p2/(qp1+(1−q)p2))(f′2,b′2))
+(q(1−p1)+(1−q)(1−p2))((q(1−p1)/(q(1−p1)+(1−q)(1−p2)))(f∗1,b∗1)+((1−q)(1−p2)/(q(1−p1)+(1−q)(1−p2)))(f∗2,b∗2))
First verification: Those numbers at the front add up to 1 (easy to verify), are both in [0,1] (this is trivial to verify), and qp1+(1−q)p2 isn't 1 (this is a mix of two numbers that are both below 1, so this is easy). Ok, that condition is down. Next up: Is our mix of f′1 and f′2 1-Lipschitz and in [−1,1]? Yes, the mix of 1-Lipschitz functions in that range is 1-Lipschitz and in that range too. Also, is our mix of f∗1 and f∗2 still in CB(X)? Yup.
That leaves the conditions on the b terms. For the first one, just observe that a mix of two points lying strictly below ψh−dIKR(h,h′) (a hyperplane) lies strictly below it as well. For the second one, since h′ is concave, a mix of two points lying strictly below its graph also lies strictly below its graph. Admittedly, there may be divide-by-zero errors, but only when qp1+(1−q)p2 is 0, in which case, we can have our new f′ and b′ be anything we want as long as it fulfills the conditions; it still defines the same point (because that term gets multiplied by 0 anyways). So A is convex.
But... is A open? Well, observe that the region strictly under the graph of h′ on CB(X) is open, due to the Lipschitzness of h′. We can wiggle b and f around a tiny little bit in any direction without matching or exceeding the graph of h′. So, given a point in A, fix your tiny open ball around (f∗,b∗). Since p can't be 1, when you mix with (f′,b′), you can do the same mix with your little open ball instead of the center point, and it just gets scaled down (but doesn't collapse to a point), making a little open ball around your arbitrarily chosen point in A. So A is open.
Now, let's define a B that should be convex, so we can get Hahn-Banach separation going (as long as we can show that A and B are disjoint). It should be chosen to forbid our separating hyperplane being too much above ψh over the 1-or-less Lipschitz functions. So, let B be:
{(f,b)|f∈C1−lip(X,[−1,1]),b≥ψh(f)+dIKR(h,h′)}
Obviously, cutting into this means your hyperplane is too far above ψh over the 1-or-less-Lipschitz functions in [−1,1]. And it's obviously convex, because 1-or-less-Lipschitz functions in [−1,1] are a convex set, and so is the region above a hyperplane (ψh+dIKR(h,h′)).
All we need to do now for Hahn-Banach separation is show that the two sets are disjoint. We'll assume there's a point in both of them and derive a contradiction. So, let's say that (f,b) is in both A and B. Since it's in B,
b≥ψh(f)+dIKR(h,h′)
But also, (f,b)=p(f′,b′)+(1−p)(f∗,b∗) with the f's and b's and p fulfilling the appropriate properties, because it's in A. Since b∗<h′(f∗) and b′<ψh(f′)−dIKR(h,h′), we'll write b∗ as h′(f∗)−δ∗ and b′ as ψh(f′)−dIKR(h,h′)−δ′, where δ∗ and δ′ are strictly positive. Thus, we rewrite as:
p(ψh(f′)−dIKR(h,h′)−δ′)+(1−p)(h′(f∗)−δ∗)≥ψh(pf′+(1−p)f∗)+dIKR(h,h′)
We'll be folding −pδ′−(1−p)δ∗ into a single −δ term so I don't have to write as much stuff. Also, ψh is an affine function, so we can split things up with that, and make:
pψh(f′)−pdIKR(h,h′)+(1−p)h′(f∗)−δ≥pψh(f′)+(1−p)ψh(f∗)+dIKR(h,h′)
(1−p)h′(f∗)−δ≥(1−p)ψh(f∗)+(1+p)dIKR(h,h′)
Remember, ψh(f∗)≥h(f∗) because ψh≥h. So, we get:
(1−p)h′(f∗)−δ≥(1−p)h(f∗)+(1+p)dIKR(h,h′)
(1−p)(h′(f∗)−h(f∗))−δ≥(1+p)dIKR(h,h′)
And, if h(f∗)≥h′(f∗), we get a contradiction straightaway because the left side is negative, and the right side is nonnegative. Therefore, h′(f∗)>h(f∗), and we can rewrite as:
(1−p)|h′(f∗)−h(f∗)|−δ≥(1+p)dIKR(h,h′)
And now, we should notice something really important. Since p can't be 1, f∗ does constitute a nonzero part of f, because f=pf′+(1−p)f∗.
However, f is a 1-or-less Lipschitz function, and bounded in [−1,1], due to being in B! If f∗ wasn't Lipschitz, then given any slope, you could find areas where it's ascending faster than that rate. This still happens when it's scaled down, and f′ can only ascend or descend at a rate of 1 or slower there since it's 1-Lipschitz as well. So, in order for f to be 1-or-less Lipschitz, f∗ must be Lipschitz as well. Actually, we get something stronger: if f∗ has a really high Lipschitz constant, then p needs to be pretty high, because otherwise f wouldn't be 1-or-less Lipschitz, since 1−p of it is composed of f∗, which has areas of big slope. Further, if f∗ has a norm sufficiently far away from 0, then p needs to be pretty high, because otherwise f wouldn't be in [−1,1], since 1−p of it is composed of f∗, which has areas distant from 0.
Our most recent inequality (derived under the assumption that there's a point in A and B) was:
(1−p)|h′(f∗)−h(f∗)|−δ≥(1+p)dIKR(h,h′)
Suppose, hypothetically, that we were able to show that
(1−p)|h′(f∗)−h(f∗)|≤(1+p)dIKR(h,h′)
then because δ>0, we'd get a contradiction, showing that A and B are disjoint. So let's shift our proof target to trying to show
(1−p)|h′(f∗)−h(f∗)|≤(1+p)dIKR(h,h′)
Let's begin. So, our first order of business is that
(1+p)/(1−p)≥1
This is trivial to verify; remember that p∈[0,1).
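Spelled out, it's just the chain:
(1+p)/(1−p)≥1⟺1+p≥1−p⟺2p≥0
and the last statement always holds.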
Now, f=pf′+(1−p)f∗, and f is 1-Lipschitz, and so is f′. Our goal now is to impose an upper bound on the Lipschitz constant of f∗. Let us assume that said Lipschitz constant of f∗ is above 1. We can find a pair of points where the rise of f∗ from the first point to the next, divided by the distance between the points is exceptionally close to the Lipschitz constant of f∗, or equal. If we're trying to have f∗ slope up as hard as it possibly can while mixing to make f, which is 1-Lipschitz, then the best case for that is one where f′ is sloping down as hard as it can, at a rate of -1. Therefore, we have that
(1−p)Li(f∗)+p⋅(−1)≤1
Ie, mixing f∗ sloping up as hard as possible and f′ sloping down as hard as possible had better make something that slopes up at a rate of 1 or less. Rearranging this inequality, we get:
(1−p)Li(f∗)≤(1+p)
Li(f∗)≤(1+p)/(1−p)
We can run through almost the same exact argument, but with the norm of f∗. Let us assume that said norm is above 1. We can find a point where f∗ attains its maximum/minimum, whichever is further from 0. Now, if you're trying to have f∗ be as negative/positive as it possibly can be, while mixing to make f, which lies in [−1,1], then the best case for that is one where f′ is as positive/negative as it can possibly be there, ie, has a value of -1 or 1. In both cases, we have:
(1−p)||f∗||+p⋅(−1)≤1
(1−p)||f∗||≤(1+p)
||f∗||≤(1+p)/(1−p)
Now we can proceed. Since we established that all three of these quantities (1, the Lipschitz constant, and the norm) are upper bounded by (1+p)/(1−p), we have:
max(Li(f∗),||f∗||,1)≤(1+p)/(1−p)
1−p≤(1+p)/max(Li(f∗),||f∗||,1)
(1−p)|h′(f∗)−h(f∗)|≤(1+p)⋅|h′(f∗)−h(f∗)|/max(Li(f∗),||f∗||,1)
≤(1+p)⋅supf∈Clip(X)(|h′(f)−h(f)|/max(Li(f),||f||,1))=(1+p)dIKR(h,h′)
And we have exactly our critical
(1−p)|h′(f∗)−h(f∗)|≤(1+p)dIKR(h,h′)
inequality necessary to force a contradiction. Therefore, A and B must be disjoint. Since A is open and convex, and B is convex, we can do Hahn-Banach separation to get something that touches B and doesn't cut into A.
Therefore, we've crafted a ψh′ that lies above h′, and is within dIKR(h,h′) of ψh over the 1-or-less-Lipschitz functions in [−1,1], because it doesn't cut into A and touches B.
This same argument works for any ψh≥h, and it works with h′ and h swapped. Hyperplanes above the graph of h or h′ correspond to points in the sets H and H′ respectively, and we've shown that any point in H/affine functional above h yields a point in H′/affine functional above h′ (and vice-versa) that approximately agrees with it on C1−lip(X,[−1,1]), ie, is close in KR-distance. So there's always a point in the other infradistribution set that's close in KR-distance, and H and H′ have
dhau(H,H′)≤dIKR(h,h′)
And with that, we get
dhau(H,H′)≤dIKR(h,h′)≤2dhau(H,H′)
And we're done! Hausdorff distance between sets is within a factor of 2 of the IKR-distance between their corresponding infradistributions.
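As a quick illustration of the Hausdorff distance driving both directions of this proof, here's a minimal numeric sketch, with finite point clouds in the plane standing in for infradistribution sets and the Euclidean metric standing in for the KR-metric on a-measures (both stand-ins, purely for illustration):

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between finite point sets A, B (rows are points)."""
    # pairwise distances, D[i, j] = d(A[i], B[j])
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    # worst-case distance from a point of one set to the nearest point of the other
    return max(D.min(axis=1).max(), D.min(axis=0).max())

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.1], [1.0, 0.4]])
print(hausdorff(A, B))  # 0.4: the point (1, 0) must travel 0.4 to reach B
```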
Proposition 43: A Cauchy sequence of infradistributions converges to an infradistribution, ie, the space □X is complete under dIKR.
So, the space of closed subsets of Ma(X) is complete under the Hausdorff metric. By Proposition 42, a Cauchy sequence of infradistributions hn in the IKR-distance corresponds to a Cauchy sequence of infradistribution sets Hn converging in Hausdorff-distance, so to verify completeness, we merely need to double-check that the Hausdorff-limit of the Hn sets fulfills the various different properties of an infradistribution. Every point in H∞, the limiting set, has the property that there exists some Cauchy sequence of points from the Hn sets that limits to it, and also every Cauchy sequence of points from the Hn sets has its limit point be in H∞.
So, for nonemptiness, you have a sequence of nonempty sets of a-measures limiting to each other in Hausdorff-distance, so the limit is going to be nonempty.
For upper completion, given any point (m,b)∈H∞, and any (0,b′) a-measure, you can fix a Cauchy sequence (mn,bn)∈Hn limiting to (m,b), and then consider the sequence (mn,bn+b′), which is obviously Cauchy (you're just adding the same amount to everything, which doesn't affect the KR-distance), and limits to (m,b+b′), certifying that (m,b)+(0,b′)∈H∞, so H∞ is upper-complete.
For closure, the Hausdorff limit of a sequence of closed sets is closed.
For convexity, given any two points (m,b) and (m′,b′) in H∞, and any p∈[0,1], we can fix a Cauchy sequence (mn,bn)∈Hn and (m′n,b′n)∈Hn converging to those two points, respectively, and then consider the sequence p(mn,bn)+(1−p)(m′n,b′n), which lies in Hn (due to convexity of all the Hn), and converges to p(m,b)+(1−p)(m′,b′), witnessing that this point is in H∞, and we've just shown convexity.
For normalization, it's most convenient to work with the positive functionals, and observe that, because all the hn(0)=0 and all the hn(1)=1 because of normalization, the same property must apply to the limit, and this transfers over to get normalization for your infradistribution set.
Finally, there's the compact-projection property. We will observe that the projections of the a-measures in Hn to just their measure components, call the set pr(Hn), must converge in Hausdorff-distance. The reason is that if they didn't, you could find some ϵ and arbitrarily late pairs of indices n,m where pr(Hn) and pr(Hm) have Hausdorff-distance >ϵ, and then pick a point in pr(Hn) (or pr(Hm)) that's >ϵ KR-distance away from the other projection. Then you can pair that measure with some gigantic b term to get a point in Hn (or Hm, depending on which one you're picking from), and there'd be no point in Hm (or Hn) within ϵ distance of it, because the measure component can only change by ϵ if you move that far, and you need to change the measure component by >ϵ to land within Hm (or Hn).
Because this situation occurs infinitely often, it contradicts the Cauchy-ness of the Hn sequence, so the projections pr(Hn) must converge in Hausdorff distance on the space of measures over X. Further, they're precompact by the compact-projection property for the Hn (which are infradistributions), so their closures are compact. Further, the Hausdorff-limit of a sequence of compact sets is compact, so the Hausdorff limit of the projections pr(Hn) (technically, of their closures) is a compact set of measures. Further, any sequence (mn,bn) which converges to some (m,b)∈H∞ has projections mn∈pr(Hn) limiting to m, witnessing that m is in this Hausdorff limit. Thus, all points in H∞ project down to be in a compact set of measures, and we have compact-projection for H∞, which is the last condition we need to check to see if it's an infradistribution.
So, the Hausdorff-limit of a Cauchy sequence of infradistribution sets is an infradistribution set, and by the strong equivalence of the infra-KR metric and Hausdorff-distance, a Cauchy limit of the infra-KR metric must be an infradistribution, and the space □X is complete under the infra-KR metric.
Proposition 44: If a sequence of infradistributions converges in the IKR distance for one complete metric that X is equipped with, it will converge in the IKR distance for all complete metrics that X could be equipped with.
So, as a brief recap, X could be equipped with many different complete metrics that produce the relevant topology. Each choice of metric affects what counts as a Lipschitz function, affecting the infra-KR metric on infradistributions, as well as the KR-distance between a-measures, and the Hausdorff-distance. So, we need to show that regardless of the metric on X, a sequence of convergent infradistributions will still converge. Use d1 for the original metric on X and d2 for the modified metric on X, and similarly, dKR1 and dKR2 for the KR-metrics on measures, and dhau1,dhau2 for the Hausdorff distances induced by the two metrics.
Remember, our infradistribution sets are upper-complete (in particular, closed under adding (0,b′) to their points), and converge according to dhau1 to the set H∞.
What we'll be doing is slicing up the sets in a particular way. In order to do this, the first result we'll need is that, for all b∗≥1, the set
{(mn,bn)∈Hn|bn≤b∗}
converges, according to dhau1, to the set
{(m,b)∈H∞|b≤b∗}
So, here's the argument for this. We know that the projection sets
{mn|∃bn:(mn,bn)∈Hn}
are precompact, ie, have compact closure, and Hausdorff-limit according to dhau1 to the set
{m|∃b:(m,b)∈H∞}
(well, actually, they limit to the closure of that set)
According to our Lemma 3, this means that the set
{m|∃b≥0,n∈N∪{∞}:(m,b)∈Hn}
(well, actually, its closure) is a compact set in the space of measures. Thus, it must have some maximal amount of measure present, call that quantity λ⊙, the maximal Lipschitz constant of any of the infradistributions in the sequence. It doesn't depend on the distance metric X is equipped with.
Now, fix any ϵ. There's some timestep n where, for all greater timesteps, dhau1(Hn,H∞)≤ϵ.
Now, picking a point (mn,bn) in Hn with bn≤b∗−ϵ, we can travel ϵ distance according to dKR1 and get a point in H∞, and the b term can only change by ϵ or less when we move our a-measure a little bit, so we know that our nearby point lies in
{(m,b)∈H∞|b≤b∗}
But, what if our point (mn,bn) in Hn has b∗−ϵ≤bn≤b∗? Well then, we can pick some arbitrary point (mlon,0)∈Hn (by normalization for Hn), and go:
dKR1((mn,bn),(ϵ/b∗)(mlon,0)+(1−ϵ/b∗)(mn,bn))
=dKR1((ϵ/b∗)(mn,bn)+(1−ϵ/b∗)(mn,bn),(ϵ/b∗)(mlon,0)+(1−ϵ/b∗)(mn,bn))
≤dKR1((ϵ/b∗)(mn,bn),(ϵ/b∗)(mlon,0))+dKR1((1−ϵ/b∗)(mn,bn),(1−ϵ/b∗)(mn,bn))
=(ϵ/b∗)dKR1((mn,bn),(mlon,0))+(1−ϵ/b∗)dKR1((mn,bn),(mn,bn))
=(ϵ/b∗)dKR1((mn,bn),(mlon,0))=(ϵ/b∗)(dKR1(mn,mlon)+|bn−0|)
And then we have to be a little careful. bn≤b∗ by assumption. Also, we can unpack the distance to get
≤(ϵ/b∗)(supf∈C1−Lip(X,[−1,1])|mn(f)−mlon(f)|+b∗)
And the worst-case for distance, since all the measures have their total amount of measure bounded above by λ⊙, would be f being 1 on one of the measures and -1 on another one of the measures, producing:
≤(ϵ/b∗)(2λ⊙+b∗)
So, the distance from (mn,bn) to
(ϵ/b∗)(mlon,0)+(1−ϵ/b∗)(mn,bn)
according to dKR1 is at most 2ϵλ⊙/b∗+ϵ
And then, because this point has a b value of at most
(1−ϵ/b∗)b∗
Because bn≤b∗, the b value upper bound turns into b∗−ϵ
Which is a sufficient condition for that mix of two points to be only ϵ distance from a point in H∞ with a b∗ upper bound on the b term, so we have that the distance from
{(mn,bn)∈Hn|bn≤b∗}
to
{(m,b)∈H∞|b≤b∗}
is at most
2ϵλ⊙/b∗+ϵ+ϵ=2ϵ(λ⊙/b∗+1)
Conversely, we can flip Hn and H∞, to get this upper bound on the Hausdorff distance between these two sets according to dhau1.
And, since b∗ and λ⊙ are fixed, and for any ϵ, we can find some time where the distance between these two "lower parts" of the Hn and H∞ sets is upper-bounded by 2ϵ(λ⊙/b∗+1)
We can have this quantity limit to 0, showing that
limn→∞dhau1({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
For any b∗≥1.
Ok, this is part of our result. No matter which b∗ we chop off the infradistribution sets at, we get convergence of those chopped pieces according to dhau1.
Now, we'll need a second important result, that:
limb∗→∞dhau1({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
Now, we only have to establish one direction of low Hausdorff distance in the limit, that any point in the latter set is close to a point in the former set, because the former set is a subset of the latter set and has distance 0 to it.
What we can do is, because H∞ has the compact-projection property, the set {m|(m,b)∈H∞} is precompact, so for any ϵ, we can select finitely many points in it such that every point in {m|(m,b)∈H∞} is within ϵ distance of our finite subset according to dKR1. For these finitely many measures, there must be some b term associated with them where (m,b)∈H∞, so you can just take the largest one of them, and let that be your b∗. Then, all your finitely many measures, when paired with b∗ or any larger number, will be present in H∞, so
dhau1({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})<ϵ
Because all points in the latter set are close to one of finitely many points, which are all present in the former set, so the Hausdorff-1 distance must be low.
At this point, we can truly begin. We have produced the dual results:
∀b∗≥1:limn→∞dhau1({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
And
limb∗→∞dhau1({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
And we also know that, because Hn limits to H∞ according to dhau1, and projection is 1-Lipschitz,
limn→∞dhau1({mn|(mn,bn)∈Hn},{m|(m,b)∈H∞})=0
Now, here's the thing. (The closure of) all of these sets are compact. For instance,
{(mn,bn)∈Hn|bn≤b∗}
will always be compact, because any sequence in it must have a subsequence where the measure component converges according to dKR1 (due to the compact-projection property applied to Hn), and then because bn is bounded in [0,b∗], we can pick out another convergent subsequence for that. Plus, it's the intersection of a closed set (Hn) and another closed set {(m,b)|b≤b∗}, so it's closed. All sequences have a convergent subsequence and it's closed, so this set is compact. By identical arguments,
{(m,b)∈H∞|b≤b∗}
is compact. And for
{m|(m,b)∈H∞,b≤b∗}
it's the projection of a compact set from earlier arguments, and
{m|(m,b)∈H∞}
must be precompact by the compact-projection property, so it has compact closure. The exact same argument applies to
{mn|(mn,bn)∈Hn}
as well.
Now, for compact sets, convergence in Hausdorff-distance only depends on the topology of the underlying space, not the specific metric it's equipped with, just as long as the metrics induce the same topology. And the weak topology on the space of measures, or on the space of a-measures, doesn't depend one bit on the metric that X is equipped with, just on the topology. So, these sets still limit to each other when X has its metric changed: for measures/a-measures, we end up using the dKR2 metric, but that induces the same topology on the space of a-measures, so the compact sets still converge in the dhau2 metric. So, we still have our triple of results:
∀b∗≥1:limn→∞dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
And
limb∗→∞dhau2({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
And
limn→∞dhau2({mn|(mn,bn)∈Hn},{m|(m,b)∈H∞})=0
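Incidentally, the fact that Hausdorff-convergence of compact sets only cares about the topology can be illustrated numerically. Here's a tiny sketch (an illustration, not part of the proof): the sets {0,1/n} Hausdorff-converge to {0} in R under the usual metric and under the topologically-equivalent bounded metric d/(1+d):

```python
def hausdorff(A, B, d):
    """Hausdorff distance between finite sets A, B under the metric d."""
    return max(max(min(d(a, b) for b in B) for a in A),
               max(min(d(a, b) for a in A) for b in B))

d1 = lambda x, y: abs(x - y)
d2 = lambda x, y: d1(x, y) / (1 + d1(x, y))  # same topology, different metric

for n in (1, 10, 100, 1000):
    A_n, A = [0.0, 1.0 / n], [0.0]
    print(n, hausdorff(A_n, A, d1), hausdorff(A_n, A, d2))
# both distance columns shrink to 0: the two metrics agree on convergence
```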
Now, here's how to argue that Hn limits to H∞ in dhau2. Fix some ϵ. From our limits above, there's some value of b∗ where
dhau2({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})≤ϵ
And for that value of b∗, and that ϵ, we have that there's some value of n where, for all greater numbers,
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
And
dhau2({mn|(mn,bn)∈Hn},{m|(m,b)∈H∞})≤ϵ
Now, we're going to need to go in two directions for this. First, we pick a point in Hn and show that it's close to a point in H∞. Second, we pick a point in H∞ and show it's close to a point in Hn.
Let (mn,bn)∈Hn. We have two possibilities. One possibility is that bn≤b∗. Then, because
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
we only have to go ϵ distance to get to H∞. The second possibility is that bn>b∗.
In this case, (mn,bn) lies in the set
[b∗,∞)×{mn|∃bn:(mn,bn)∈Hn}
Which has distance ≤ϵ from
[b∗,∞)×{m|∃b:(m,b)∈H∞}
Because we have that
dhau2({mn|(mn,bn)∈Hn},{m|(m,b)∈H∞})≤ϵ
Just scooch over and keep the b term the same. Additionally, the set
[b∗,∞)×{m|∃b:(m,b)∈H∞}
has distance ≤ϵ from the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
Because we have:
dhau2({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})≤ϵ
Further, the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
is a subset of H∞, because H∞ is upper-complete. So, either way, we only have to travel 2ϵ distance (according to dKR2) from our point in Hn to get to H∞.
Now for the reverse direction, starting with a point (m,b)∈H∞ and getting to a nearby point in Hn. Again, we can split into two cases. In our first case, b≤b∗, and because
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
we only have to go ϵ distance to get to Hn. The second possibility is that b>b∗. In such a case, (m,b) would be guaranteed to lie in the set
[b∗,∞)×{m|∃b:(m,b)∈H∞}
which has distance ≤ϵ from the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
Because we have:
dhau2({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})≤ϵ
Further, the set
[b∗,∞)×{m|∃b≤b∗:(m,b)∈H∞}
has distance ≤ϵ according to dhau2 from the set
[b∗,∞)×{mn|∃bn≤b∗:(mn,bn)∈Hn}
Because the measure components of these sets are the projections of the sets
{(m,b)∈H∞|b≤b∗}
and
{(mn,bn)∈Hn|bn≤b∗}
And we already know that
dhau2({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})≤ϵ
So, given our point (m,b)∈H∞, we just have to go 2ϵ distance to get to the set
[b∗,∞)×{mn|∃bn≤b∗:(mn,bn)∈Hn}
And all points in this set lie in Hn because of upper completion.
Thus, given any ϵ, there's a tail of the Hn sequence where the Hn are all within 2ϵ distance (according to dhau2) of H∞, so if dhau1 thinks that the Hn converge to H∞, dhau2 will think that as well. Further, the metrics on X which induce dhau1 and dhau2 were arbitrary, so a sequence of infradistributions converging happens regardless of which complete metric X is equipped with.
Proposition 45: If a sequence of infradistributions hn converges to h in the infra-KR distance, then for all bounded continuous functions f, limn→∞hn(f)=h(f).
Now, the infra-KR metric is:
dIKR(h,h′)=supf∈Clip(X)(|h(f)−h′(f)|/max(Li(f),||f||,1))
So, to begin with, if hn converges to h, all bounded Lipschitz functions must have limn→∞hn(f)=h(f) or else the infra-KR distance wouldn't converge.
For the next step, since the infra-KR distance is strongly equivalent to the Hausdorff distance, we know that
∀n:{mn|∃bn:(mn,bn)∈Hn}
is always precompact, and they Hausdorff-limit to
{m|∃b:(m,b)∈H∞}
And we have our Lemma 3, that the union of compact sets which Hausdorff-limit to something is compact, so the set
{m|∃b,n:(m,b)∈Hn}
is compact (well, actually precompact, but just take the closure).
Because compactness of a set of measures implies that the amount of measure doesn't run off to infinity, there's some λ⊙<∞ that's a shared Lipschitz constant for all the hn.
Also, any uniformly continuous function can be built as the uniform limit of Lipschitz-continuous functions from above and below, so given some uniformly continuous f, we can make a fhim sequence limiting to it from above, and a flom sequence limiting to it from below. Then, we have:
limn→∞hn(f)≤limn→∞hn(fhim)=h(fhim)
And similarly, we can get:
limn→∞hn(f)≥h(flom)
Now, regardless of m and n,
|hn(fhim)−hn(flom)|≤λ⊙⋅d(fhim,flom)
So, even though we don't necessarily know that the limit actually exists for hn(f), we at least know that all the values are bounded in an interval of known maximum size, which converges to the interval
[h(flom),h(fhim)]
And, by monotonicity of h, h(f) lies in that interval.
So, all the limit points of the hn(f) sequence are in that interval. Now, as m gets unboundedly high, the difference between fhim and flom gets arbitrarily small, so for gigantic m, any limit point of the hn(f) sequence must be in a really tiny interval. Taking the limit, the interval crunches down to a single point, and hn(f) actually limits to h(f). We've shown it now for uniformly continuous functions.
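Since we haven't yet shown that limn→∞hn(f) exists, the loose use of limits above is more carefully written with limits superior and inferior: by monotonicity of each hn,
h(flom)=limn→∞hn(flom)≤liminfn→∞hn(f)≤limsupn→∞hn(f)≤limn→∞hn(fhim)=h(fhim)
and sending m→∞, both ends converge to h(f), forcing the liminf and limsup to agree.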
Time to expand this to continuous functions in full generality. Again,
{m|∃b,n:(m,b)∈Hn}
is precompact, so this implies that for all ϵ, there is a compact set Cϵ where all minimal points of Hn (regardless of the n! Even for the final infradistribution set H∞!) have <ϵ measure outside of that compact set.
Transferring to functionals, this means that for all the hn (and h), Cϵ is an ϵ-almost-support, and any two functions that agree on that set have correspondingly close expectations.
Given some arbitrary f, let fm be identical to f on C1/m (and thus uniformly continuous on that compact set), and extend it in an arbitrary uniformly continuous way to all of X while staying in [−||f||,||f||], by the Tietze Extension Theorem.
Regardless of the n, since C1/m is a 1/m-almost-support for hn, we have that
|hn(f)−hn(fm)|≤2||f||/m
Why? Well, f and fm are identical on a 1/m-almost-support for hn, so the magnitude of their difference in expectation is at most 1/m times the maximum difference between the two functions, and f and fm are both in [−||f||,||f||], so they can differ by at most twice that much. The same result extends to the limit h itself.
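Spelled out as a chain, this is:
|hn(f)−hn(fm)|≤(1/m)⋅supx∈X|f(x)−fm(x)|≤(1/m)⋅2||f||=2||f||/m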
Because ||f|| is fixed and this bound doesn't depend on n, we have that hn(fm) limits to hn(f) as m→∞, uniformly in n.
Now, we can go:
limn→∞hn(f)=limn→∞limm→∞hn(fm)
And now, to invoke the Moore-Osgood theorem to swap the two limits, we need two results. One is that, for all m,
limn→∞hn(fm)=h(fm)
(which is true because fm was selected to be uniformly continuous).
The second result we need is that for all n,
limm→∞hn(fm)=hn(f)
uniformly in n. Which is true. So, we can invoke the Moore-Osgood theorem and swap the two limits, to get
=limm→∞limn→∞hn(fm)
=limm→∞h(fm)=h(f)
So, we have our final result that
limn→∞hn(f)=h(f)
For all continuous bounded functions f, and we're done.
Proposition 46: A set of infradistributions {hi}i∈I is precompact in the topology induced by the IKR distance iff:
1: There's an upper bound on the Lipschitz constants of all the infradistributions in the set.
2: There's a sequence of compact sets Cϵ, one for each ϵ, that are compact ϵ-almost-supports for all infradistributions in the set.
3: The set of infradistributions is b-uniform.
This proof will proceed in three phases. The first phase is showing that compactness implies conditions 1 and 2. The second phase is showing that a failure of condition 3 permits you to construct a sequence with no convergent subsequence, so a failure of condition 3 implies non-precompactness, and taking the contrapositive, precompactness implies condition 3. That gets us one half of the iff implication, that precompactness implies the three conditions. For the second half of the iff implication, we assume the three conditions, and construct a convergent subsequence.
So, for our first step: since we're working in Hausdorff spaces, we can characterize precompactness as "being a subset of a compact set".
Also, the projection mapping of type
C(M+(X)×R≥0)→K(M+(X))
Which takes a closed set of a-measures (an infradistribution) and projects it down (and takes the closure) to make a compact set of measures (by the compact-projection property), is Lipschitz (projection of sets down to one coordinate keeps their Hausdorff-distance the same or contracts it), so it's continuous. So, a compact set of infradistributions (because the infra-KR metric is strongly equivalent to the Hausdorff-distance), would get mapped to a compact set of sets of measures (because the image of a compact set is compact), which by Lemma 3, unions together to make a compact set of measures.
Doing the same process (taking your precompact set of infradistributions, mapping it through the projection, unioning together all the sets) makes a subset of that compact set of measures, so it's precompact.
Also, the necessary-and-sufficient condition for precompactness of a set of measures is that: There be a maximum amount of measure present, and for all ϵ there is a compact set Cϵ⊆X where all the measures assign ≤ϵ measure outside of that compact set.
So, if you take a precompact set of infradistributions, all the measure components of points in any of them have a uniform upper bound on the amount of measure present, and we also have the shared compact almost-support property. So, precompactness implies conditions 1 and 2.
Time for phase 2 of our proof, showing that a failure of condition 3 implies that there's a sequence from it with no convergent subsequence in the KR-metric.
Assume, for contradiction, that we indeed have a precompact set which fails condition 3. Using I to index your set of infradistributions, Condition 3 is:
∀ϵ>0∃b∗∀i:dhau(Hi,Hb∗i)≤ϵ
Where Hb∗i is the set formed from the set Hi by deleting all points with b>b∗ and taking the upper completion again. Negating this, we see that the set of infradistribution sets Hi failing this condition is stated as:
∃ϵ>0∀b∗∃i:dhau(Hi,Hb∗i)>ϵ
So, let ϵ0 be your ϵ of choice, and for each n, let Hn be an infradistribution Hi such that dhau(Hi,Hni)>ϵ0.
Because we're assuming that this sequence of infradistributions was selected from a precompact set, we have a guarantee that the sequence Hn has a convergent subsequence limiting to some H∞. We'll still be using n as our limiting variable, hopefully this doesn't cause too much confusion.
Now, we can crib two results from our earlier proof of Proposition 44. From that proof, we know that because Hn limits to H∞ in Hausdorff-distance,
limb∗→∞dhau({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
and also,
limn→∞dhau({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
For any b∗≥1. To craft this into a more usable form, we can realize that for all b∗, Hb∗∞⊆H∞
So the distance from the former set to the latter set is 0. Also, any point in H∞ can be written as (m,b). Either b≤b∗, in which case the same point is present in Hb∗∞ and the distance to enter that set is 0, or b>b∗, in which case the m component is present in {m|(m,b)∈H∞}, and from
limb∗→∞dhau({m|(m,b)∈H∞,b≤b∗},{m|(m,b)∈H∞})=0
For large b∗, you just have to adjust the m component a little bit to m′ and then you know there's some (m′,b′)∈H∞,b′≤b∗, so by upper completion, (m′,b)∈Hb∗∞, and this point is close to (m,b).
We took a point in Hb∗∞ and showed it's in H∞ (trivially), and took a point in H∞ and showed there's a nearby point in Hb∗∞, so we have our modified result that:
limb∗→∞dhau(Hb∗∞,H∞)=0
For another modified result, due to the fact that we know
limn→∞dhau({(mn,bn)∈Hn|bn≤b∗},{(m,b)∈H∞|b≤b∗})=0
We can take any point in Hb∗n, descend to a point in Hn (but cut off at b∗), shift over a bit to get to H∞ (but cut off at b∗), and add the same amount of b value to this point as you took off, to make a point in Hb∗∞ that's nearby to the point you started with, and flip the two sets, to argue that
∀b∗:limn→∞dhau(Hb∗n,Hb∗∞)=0
Now, here's what you do from here. We know our ϵ0 value. Because of the fact that
limb∗→∞dhau(Hb∗∞,H∞)=0
we can identify some finite b∗ value (call it b0) where, for it and all greater values,
dhau(Hb0∞,H∞)<ϵ0/3
Locking this value in, and because of
∀b∗:limn→∞dhau(Hb∗n,Hb∗∞)=0
and Hn limiting to H∞, so
limn→∞dhau(Hn,H∞)=0
We can find some finite n where, for all greater values,
dhau(Hb0n,Hb0∞)<ϵ0/3
and
dhau(Hn,H∞)<ϵ0/3
There's one last thing to note. The sequence Hn was selected as a subsequence of a sequence of infradistributions selected so that the Hausdorff-distance between an infradistribution and its truncation of minimal points at a certain b value was always ϵ0 or more.
Accordingly, let ρ(n) be the value of the cutoff for Hn (ie, the index that Hn had before the reindexing when we passed to a subsequence). Due to our construction process for the Hn, we have that:
∀n:dhau(Hn,Hρ(n)n)≥ϵ0
Further, ρ(n) diverges to infinity, so there's some n where ρ(n)≥b0. Because, for that n, Hb0n⊆Hρ(n)n⊆Hn, we have that dhau(Hb0n,Hn)≥dhau(Hρ(n)n,Hn).
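This last inequality is just monotonicity of the Hausdorff-distance for nested sets: for A⊆B⊆C,
dhau(A,C)=supc∈C d(c,A)≥supc∈C d(c,B)=dhau(B,C)
(the directed distances from A and B into C vanish because they're subsets), applied with A=Hb0n, B=Hρ(n)n, and C=Hn.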
Taking stock of all we have, we know that there is some n where:
dhau(Hb0∞,H∞)<ϵ0/3
and
dhau(Hb0n,Hb0∞)<ϵ0/3
and
dhau(H∞,Hn)<ϵ0/3
and
dhau(Hb0n,Hn)≥dhau(Hρ(n)n,Hn)
and, by our construction process for the Hn sequence,
dhau(Hρ(n)n,Hn)≥ϵ0
So now we can go:
ϵ0≤dhau(Hρ(n)n,Hn)≤dhau(Hb0n,Hn)
≤dhau(Hb0n,Hb0∞)+dhau(Hb0∞,H∞)+dhau(H∞,Hn)<ϵ0/3+ϵ0/3+ϵ0/3=ϵ0
But we just showed ϵ0<ϵ0, a contradiction. The one assumption we made was that there could be a set of infradistributions that was both precompact and failed to meet the shared b-uniformity condition. Therefore, if a set of infradistributions is precompact, it must fulfill the shared b-uniformity condition.
Because we've shown that precompactness implies a Lipschitz bound and shared compact-almost-support in part 1 of the proof, and that precompactness implies the shared b-uniformity condition, we have one direction of our iff statement. Precompactness implies these three properties.
Now we'll go in the other direction and establish that if these three properties are fulfilled, then every sequence of infradistributions has a convergent subsequence.
So, let's say we have some set of infradistributions Hi that fulfills the following three properties:
∃λ⊙∀i,(m,b)∈Hi:m(1)≤λ⊙
(this is bounded Lipschitz constant)
∀ϵ∃Cϵ∈K(X)∀i,(m,b)∈Hi:m(X/Cϵ)≤ϵ
(this is shared almost-compact-support)
∀ϵ∃b∗∀i:dhau(Hi,Hb∗i)≤ϵ
(this is the b-uniformity condition)
Note that Hb∗i is Hi but with all the points in it with b>b∗ chopped off, regenerated via upper completion.
First, the compact almost-support condition and bounded amount of measure (and closure) are necessary-and-sufficient conditions for a set of measures to be compact. Thus, letting ΔC,λ be defined as:
{m∈M+(X)|∀ϵ:m(X/Cϵ)≤ϵ∧m(1)≤λ⊙}
(ie, measures where the measure outside of the compact set Cϵ is ϵ or less, for all ϵ, and the amount of measure is upper-bounded by λ⊙, where that sequence of compact sets and measure upper bound came from the relevant sequence of compact sets and measure upper bound on the set {Hi|i∈I}, from the fact that we assumed a Lipschitz upper bound and shared compact-almost-support for it).
We know that ΔC,λ is a compact set. All the measure components of all the points in all the Hi lie in this set. Thus, all sets Hi can be thought of as being a subset of the space ΔC,λ×R≥0
In particular, all our Hn (from our arbitrarily selected sequence) are a subset of this space.
Now, here's what we do. Fix any m≥1. From the b-uniformity condition on the Hi, there is some quantity bm where
∀i:dhau(Hi,Hbmi)≤1/m
What we're going to do is find a subsequence of the Hn sequence where the Hbmn sequence converges in Hausdorff-distance.
Here's how to do it. We can take each Hn and chop it off at a b value of bm, to make a closed set (Hbmn)′ which is a subset of ΔC,λ×[0,bm]
Which, being a product of two compact sets, is compact. Further, the space of compact subsets of a compact space (equipped with the Hausdorff metric) is compact. So, we can isolate some subsequence where the (Hbmn)′ sets converge in Hausdorff-distance. If sets converge in Hausdorff-distance, their upper completions do too, so we have isolated a subsequence of our Hn sequence where the sets Hbmn converge in Hausdorff-distance. Also, each Hbmn infradistribution set is only 1/m Hausdorff-distance away, at most, from the corresponding Hn. So, for sufficiently large n, the Hn subsequence we picked out is all wandering around in a ball of size 2/m.
Now, here's what we do. Start with your Hn sequence. Use the argument we described above for m=1 to isolate a subsequence which, in the tail, is wandering around in a ball (w.r.t. Hausdorff-distance) of size 2. Now, use the argument for m=2 to isolate a subsequence of that which is wandering around in a ball of size 1 in the tail. And, y'know, repeat for all finite m, to get a subsequence embedded in all previous subsequences which, in the tail, is wandering around in a ball of size 2/m.
Now build one final subsequence, which takes the first element of the m=1 subsequence, the second element of the m=2 subsequence, the third element of the m=3 subsequence, and so on. It eventually enters the tail of the sequence for all finite m, so, regardless of m, the tail of that sequence ends up wandering around in a ball of size 2/m. Thus, the sequence is actually Cauchy, and must converge, as we've previously shown that the space □X is complete in the KR/Hausdorff metric.
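This diagonalization is the standard trick; here's a minimal sketch of it, with nested index lists standing in for the successive subsequences (purely illustrative):

```python
def diagonal_subsequence(nested):
    """nested[m] is an increasing list of indices into the original sequence,
    with nested[m+1] a subsequence of nested[m]. The diagonal takes the m-th
    entry of the m-th refinement, so its tail from position m onward lies
    entirely inside nested[m]."""
    return [nested[m][m] for m in range(len(nested))]

# toy check: successively sparser refinements of the natural numbers
nested = [list(range(0, 100, 2 ** (m + 1))) for m in range(4)]
print(diagonal_subsequence(nested))  # [0, 4, 16, 48]
```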
Assuming the three conditions on a set of infradistributions has let us show that every sequence in it has a convergent subsequence, so the set must be precompact. That gives us the reverse direction of our iff statement and we're done.
Proposition 47: When X is a compact Polish space, the spaces of cohomogenous, crisp, and sharp infradistributions are all compact in □X equipped with the infra-KR metric.
So, from Proposition 46, the necessary-and-sufficient conditions for a set of infradistributions to be precompact are:
1: Bounded Lipschitz constant/bounded amount of measure on minimal points. 1-Lipschitz, C-additive, cohomogenous, crisp, and sharp infradistributions fulfill this because of their iff minimal point characterizations.
2: Shared compact almost-supports. X is compact by assumption, and it's the whole space so it must be a support of everything, and thus an ϵ-almost-support of everything, so this is trivially fulfilled for all infradistributions when X is compact.
3: b-uniformity. Homogenous, cohomogenous, crisp, and sharp infradistributions fulfill this because they all have their minimal points having b≤1, and the condition is "there's gotta be some b value you can go up to in order to have a guarantee of being within ϵ of the full H set in Hausdorff-distance if you delete all the minimal points with a higher b value, for all ϵ".
Thus, cohomogenous, crisp, and sharp infradistributions fulfill the necessary-and-sufficient conditions for precompactness, and all we need is to check that the set of them is closed in the KR-metric.
To do this, we'll invoke Proposition 45, that: If a sequence of infradistributions hn converges to h in the infra-KR distance, then for all bounded continuous functions f, limn→∞hn(f)=h(f).
The characterization of cohomogeneity was that h(1+af)=1−a+ah(1+f). So, we can go:
h(1+af)=limn→∞hn(1+af)=limn→∞1−a+ahn(1+f)
=1−a+alimn→∞hn(1+f)=1−a+ah(1+f)
Showing that the limit of cohomogenous infradistributions is cohomogenous, and we've verified closure, which is the last property we needed for cohomogeneity.
The characterization for crispness was that: h(c+af)=c+ah(f) for c∈R,a≥0. To show it's preserved under limits, we can go:
h(c+af)=limn→∞hn(c+af)=limn→∞c+ahn(f)=c+alimn→∞hn(f)=c+ah(f)
Showing that the limit of crisp infradistributions is crisp, and we've verified closure. Sharpness is a bit more tricky.
Let's say a sequence of sharp infradistributions hn limits to h, where each hn is associated with the compact set Cn⊆X. The minimal points of hn consist of all probability distributions supported over Cn, with a b value of 0. Thus, all the Hn sets can be written as ΔCn×R≥0, so if they converge in Hausdorff-distance, the sets of probability distributions ΔCn must converge in Hausdorff-distance as well. And that's impossible if the Cn don't converge in Hausdorff-distance, because the dirac-delta distributions on points in the Cn sets transport any failure of Hausdorff-convergence of the Cn sets up to a failure of Hausdorff-convergence of the ΔCn sets of probability distributions.
Thus, the Cn converge to a compact set C∞ in Hausdorff-distance.
We also know that, because sharp infradistributions are crisp infradistributions, and crispness is preserved under limits, all we have to check is whether the minimal points of H∞ consist exactly of all probability distributions supported over C∞. Now, ΔC∞ is the closed convex hull of all the dirac-delta distributions on points in C∞, and all those points have a sequence from the Cn that converges to them, so the associated dirac-delta distributions converge and witness that all the dirac-delta distributions on points in C∞ are present in the set H∞. So, because infradistribution sets are closed and convex, all of ΔC∞ must be present as minimal points in H∞. Now we just need to rule out the presence of additional points.
Let's say we've got some probability distribution μ∈H∞ which is not supported entirely on C∞; there's ϵ probability mass outside that set. Because probability distributions on Polish spaces have the property that the probability on open supersets of the set of interest can be shrunk down to have arbitrarily similar measure, we can find some open superset of C∞, call it O, such that μ still has ϵ/2 probability mass outside of it. Any point outside of O must be some δ distance away from C∞: otherwise, you could pick a sequence of points in C∞ which gets arbitrarily close to the (closed) complement of O, find a convergent subsequence since C∞ is compact, and you'd have a limit point which is in C∞ (due to closure) and also in the complement of O (due to getting arbitrarily close to said closed set), disproving that the two sets are disjoint (because O is a superset of C∞).
Ok, so our hypothetical "bad" probability distribution has ϵ/2 probability measure at a distance of δ or more from our set of interest, C∞. The KR distance is equivalent to the earthmover distance, which is "how much effort would it take to move this pile of dirt/pile of probability mass into the other distribution/pile of dirt".
All minimal points in H∞ must have a sequence of minimal points in Hn limiting to them, because it's the Hausdorff-limit of those infradistributions. So, we've got some sequence μn limiting to our hypothetical bad distribution μ∉ΔC∞, but all the μn lie in ΔCn.
There is some n value where dKR(μ,μn)<δϵ/4, and also where dhau(C∞,Cn)<δϵ/4. Now, we can get something really interesting.
So, we agree that μ has ϵ/2 probability mass a distance of δ or more away from the set C∞, right? This means that the earthmover distance from μ to any point in ΔC∞ must be δϵ/2 or more, because you've gotta move ϵ/2 measure a distance of δ at the very least.
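As a sanity check on that lower bound, here's a hypothetical numeric example using scipy's one-dimensional earthmover distance (a toy stand-in for the KR-metric on X; the values δ=2 and ϵ=0.5 are made up):

```python
from scipy.stats import wasserstein_distance

delta, eps = 2.0, 0.5
# mu: (1 - eps/2) mass sitting on C = {0}, and eps/2 mass at distance delta
d = wasserstein_distance([0.0, delta], [0.0],
                         [1 - eps / 2, eps / 2], [1.0])
print(d, delta * eps / 2)  # both 0.5: eps/2 mass moved a distance of delta
```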
However, the earthmover distance from μ to μn is strictly below δϵ/4, and because μn∈ΔCn, it's only got an earthmover distance of less than δϵ/4 left to go to arrive at a probability distribution in ΔC∞, because all dirt piled up in Cn is only δϵ/4 distance away from C∞. So, the distance from μ to ΔC∞ is only
<δϵ/4+δϵ/4=δϵ/2
distance. But we know it's impossible for it to be any closer than δϵ/2 distance from that set, so we have a contradiction, and no such μ can exist in H∞. Thus, Hmin∞ has all the probability distributions over C∞ and nothing else, so the limit of sharp infradistributions is sharp, and we're done.