Part 6: Showing the basic four infradistribution properties: Normalization, monotonicity, concavity, and Lipschitzness (specifically 1-Lipschitzness). For normalization, observe that regardless of n,
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci1)=K:n(x0)(1)=1
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci0)=K:n(x0)(0)=0
The limit of the all-1 sequence is 1 and the limit of the all-0 sequence is 0, so:
K:∞(x0)(1)=1
K:∞(x0)(0)=0
For monotonicity, let f′≥f. Then, regardless of n,
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
≥K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
This is because, since f′≥f, the "extend with worst-case outputs" function is bigger for f′ than f, and then by monotonicity for K:n(x0), the inequality transfers to the outside. Accordingly,
K:∞(x0)(f′)=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
≥limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))=K:∞(x0)(f)
Now for concavity. For all n,
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci(pf+(1−p)f′)(x1:n+1,xn+2:∞))
≥K:n(x0)(λx1:n+1.pinfxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)+(1−p)infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
≥pK:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))+(1−p)K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
These steps were by monotonicity (minimizing over the two parts separately produces lower values, and monotonicity transfers that inequality to the outside) and concavity respectively, because K:n(x0) is an infradistribution. And now:
Then we'd be able to pair that with the 1-Lipschitzness of K:n(x0) to get our desired result. So let this be our new proof target. Reexpressing it a bit, it's:
Now, if two functions are only d(f,f′) apart, then their minimal values over a compact set can only be d(f,f′) apart at most. Let your compact set be ∏i=n+1i=1{xi}×∏∞i=n+2Ci to see the result. Thus, we have proved our proof target and we have the result that, for all n,f,f′
Since they're always only d(f,f′) apart, the same property transfers to the limit, so we get:
|K:∞(x0)(f)−K:∞(x0)(f′)|≤d(f,f′)
And so, K:∞(x0) is 1-Lipschitz for all x0. This takes care of the Lipschitzness condition on infradistributions and the uniform Lipschitz bound condition on infrakernels. All that's left is the compact almost-support condition on infradistributions and the compact-shared compact almost-support condition on infrakernels, and the pointwise convergence condition on infrakernels.
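The 1-Lipschitzness argument hinges on the fact that the map f↦inf over a fixed compact set is 1-Lipschitz in the sup metric: if f and f′ are within d(f,f′) of each other everywhere, their infima are too. A minimal numeric sketch (the grid and functions here are invented for illustration):

```python
# Sketch: the map f -> inf_{x in C} f(x) is 1-Lipschitz in the sup metric.
# C is a finite stand-in for a compact set; f and g are invented examples.
C = [0.1 * k for k in range(11)]          # grid on [0, 1]

f = lambda x: (x - 0.3) ** 2
g = lambda x: (x - 0.3) ** 2 + 0.05 * x   # a perturbation of f

d = max(abs(f(x) - g(x)) for x in C)      # sup distance d(f, g) on C
gap = abs(min(f(x) for x in C) - min(g(x) for x in C))
assert gap <= d                           # |inf f - inf g| <= d(f, g)
```

The same bound holds for any pair of functions and any compact set, which is exactly the step used to transfer 1-Lipschitzness from the approximants to the limit.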
Part 7: This will use our result from Part 4 about making a suitable compact set. Our desired thing we want to prove is that
Because this property is exactly the compact-shared compact-almost-support property for K:∞.
So, here's our plan of attack. Fix our CX0 and ϵ, so our target is
∃Cϵ⊆∏∞i=1Xi∀x0∈CX0f,f′∈CB(∏∞i=1Xi):
f↓Cϵ=f↓Cϵ→|K:∞(x0)(f)−K:∞(x0)(f′)|≤ϵd(f,f′)
Let our Cϵ be C1:∞[CX0,ϵ], as defined from Part 4, and let x0 be arbitrary in CX0 and f,f′ be arbitrary and agree with each other on C1:∞[CX0,ϵ]. Then our goal becomes
|K:∞(x0)(f)−K:∞(x0)(f′)|≤ϵd(f,f′)
Now, letting our chosen defining sequence be Ci[CX0,ϵ], our goal is to show
If we could do this, all the approximating points being only ϵ away from each other shows that the same property transfers to the limit and we get our result. Accordingly, let n be arbitrary, and our proof goal is now:
Then we could conclude the two functions were equal on the set C1:n+1[CX0,ϵ], which, from Part 4, is an ϵ-almost support for all the K:n(x0) where x0∈CX0, and it would yield our result. So our proof target is now
However, f and f′ are equal on the set C1:∞[CX0,ϵ], which breaks down as C1:n+1[CX0,ϵ]×Cn+2:∞[CX0,ϵ]. And we know that x1:n+1∈C1:n+1[CX0,ϵ], and since the second part is being selected from Cn+2:∞[CX0,ϵ], the two values are equal, and we've proved our proof target. Thus, we have compactly-shared compact almost-support, and that's the second condition for K:∞ to be an infrakernel, as well as the last condition needed for all the K:∞(x0) to be infradistributions. Just pointwise convergence left!
Part 8: Alright, for showing pointwise convergence, we'll need a lemma. We need to show:
This is roughly saying that the rate of convergence of K:n(x0)(f) to K:∞(x0)(f) is uniform over compact sets. Let's begin. Let CX0, C̄ (the sequence of Ci), ϵ, and f be arbitrary, so our proof target is now:
Here's what we'll do. Because we have a CX0 and ϵ, we can craft our sequence Ci[CX0,ϵ] of compact ϵ-supports. Now, on the set:
∏∞i=1(Ci[CX0,ϵ]∪Ci)
which is compact, we know that f is uniformly continuous, so there's some δ such that inputs within δ of each other have function values only ϵ apart, which translates into an n via n:=⌈log2(1/δ)⌉. Any two inputs which only differ beyond that point will be only δ apart, and so can only get an ϵ-difference in values from f. Now that that's defined, let m and x0∈CX0 be arbitrary. Our proof target is now to show:
Then, because C1:n+m+1[CX0,ϵ] is an ϵ-almost-support for K:n+m(x0) as long as x0∈CX0 (which we know to be the case), we could apply Lemma 2 to get our desired proof target.
The "Lipschitz constant times deviation over compact set" part would make one ϵ, and the "degree of almost-support times difference in the two functions" part would make 2ϵ||f|| in the worst-case due to being an ϵ-almost-support and the worst possible case where the two functions differ by 2||f|| somewhere.
Thus, let x1:n+m+1 be arbitrary within the appropriate set, so our proof target is now:
|infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
−infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞)|≤ϵ
Because we're minimizing over a compact set in both instances, we can take the infs to be attained by some xn+2:∞ and xn+m+2:∞, so our proof target is now
|f(x1:n+1,xn+2:∞)−f(x1:n+m+1,xn+m+2:∞)|≤ϵ
Now, because x1:n+1∈C1:n+1[CX0,ϵ], and xn+2:∞∈∏∞i=n+2Ci, we have:
(x1:n+1,xn+2:∞)∈C1:n+1[CX0,ϵ]×∏∞i=n+2Ci
=∏i=n+1i=1Ci[CX0,ϵ]×∏∞i=n+2Ci⊆∏∞i=1(Ci[CX0,ϵ]∪Ci)
And also, because x1:n+m+1∈C1:n+m+1[CX0,ϵ], and xn+m+2:∞∈∏∞i=n+m+2Ci,
So, both inputs lie in the relevant compact set, and they agree on x1:n+1, so the inputs are only δ apart, so their values under f differ by at most ϵ, and we're done; our desired result follows.
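The step above, turning a uniform-continuity δ into a coordinate index n, rests on the fact that points of the product space agreeing on their first n coordinates are at most 2^−n apart in a standard product metric. A minimal sketch (the particular product metric is an assumption; any compatible one behaves the same way):

```python
def prod_dist(x, y):
    # a standard metric on a countable product: sum_i 2^-(i+1) * min(d_i, 1)
    return sum(2 ** -(i + 1) * min(abs(a - b), 1)
               for i, (a, b) in enumerate(zip(x, y)))

n = 5
x = [0.0] * 20
y = [0.0] * n + [1.0] * 15   # agrees with x on the first n coordinates only
assert prod_dist(x, y) <= 2 ** -n   # so delta = 2^-n corresponds to n = log2(1/delta)
```

This is why inputs differing only beyond coordinate n are "only δ apart" once n is taken around log2(1/δ).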
Part 9: Now, for showing our pointwise continuity property, fix a sequence x0,m which limits to x0,∞, and an arbitrary f. Our task is now to show that:
limm→∞K:∞(x0,m)(f)=K:∞(x0,∞)(f)
Fixing some arbitrary Ci sequence, we could do this if we could show that
uniformly in m. However, back in Part 8 we established uniform convergence on any compact set. And {x0,m}m∈N∪{∞} is a compact set! So we have uniform convergence and can invoke Moore-Osgood to swap the limits and get our proof target, showing our result, that
limm→∞K:∞(x0,m)(f)=K:∞(x0,∞)(f)
So, K:∞ fulfills pointwise convergence and we've verified all properties needed to show it's an infrakernel.
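The uniform convergence from Part 8 is exactly what licenses the Moore-Osgood limit swap; without uniformity the two iterated limits can genuinely disagree. A standard cautionary example (not from the post, purely illustrative):

```python
# a(m, n) = m / (m + n): the iterated limits disagree because neither
# convergence is uniform in the other index.
a = lambda m, n: m / (m + n)
big = 10 ** 9
assert a(1, big) < 1e-6        # lim_n a(m, n) = 0, so lim_m lim_n a = 0
assert a(big, 1) > 1 - 1e-6    # lim_m a(m, n) = 1, so lim_n lim_m a = 1
```

With uniform convergence in one index (the situation Part 8 establishes), Moore-Osgood guarantees both iterated limits exist and agree.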
Part 10: Showing that if all the Kn have some nice property, then K:∞ inherits it too: homogeneity, cohomogeneity, C-additivity, crispness, and sharpness.
Proof of crispness preservation: Crispness is equivalent to the conjunction of C-additivity and homogeneity, and both of those are preserved.
Proof of sharpness preservation: So, this will take a bit of work, because we do have to get a fairly explicit form for the set you're minimizing over. Remember that when h and K are sharp, with associated compact sets Ch and CK(x), then the compact minimizing set for h⋉K is ⋃x∈Ch({x}×CK(x)), from the proof of sharpness preservation for semidirect product. Another way of writing this minimizing set of the semidirect product is as the set
{(x,y)|x∈Ch,y∈CK(x)}
Now, we'll give a form for the associated compact set of K:n(x0). It is:
{x1:n+1|∀i≤n:xi+1∈CKi(x0,x1:i)}
For the base case, observe that when n=0, it's
{x1|x1∈CK0(x0)}=CK0(x0)
which works out perfectly. For the induction step, because
K:n+1(x0)=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
And we are, by induction, assuming the relevant form for the associated compact set of K:n(x0), and know how to craft the minimizing set for semidirect products, putting them together gets us:
And so, we have the proper set form for the compact minimizing sets associated with all the sharp infradistributions K:n(x0).
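With finite sets standing in for the compact sets CKi(x0,x1:i), the set form {x1:n+1|∀i≤n:xi+1∈CKi(x0,x1:i)} can be enumerated recursively, which makes the history-dependence concrete. A toy sketch (the kernel-set function C here is invented for illustration):

```python
def histories(n, x0, C):
    """Enumerate {x_{1:n+1} | forall i <= n: x_{i+1} in C(i, x0, x_{1:i})}."""
    prefixes = [()]
    for i in range(n + 1):
        prefixes = [p + (x,) for p in prefixes for x in C(i, x0, p)]
    return prefixes

# Toy kernel sets (invented): the next point may stay put or step up by one.
C = lambda i, x0, p: {(p[-1] if p else x0), (p[-1] if p else x0) + 1}

assert set(histories(0, 0, C)) == {(0,), (1,)}
assert (0, 1, 2) in histories(2, 0, C)
assert (0, 2, 2) not in histories(2, 0, C)   # x_2 = 2 is not allowed after x_1 = 0
```

Note how the allowed set for xi+1 depends on the whole prefix: this is exactly why the infinite-product set below is not simply a product of its coordinate projections.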
Our conjectured set form for the infinite semidirect product would be:
{x1:∞|∀n:xn+1∈CKn(x0,x1:n)}
Let's call the projections of this to coordinate i as C∞i. All of these are compact, because it yields the exact same result as projecting
{x1:i|∀j≤i−1:xj+1∈CKj(x0,x1:j)}
Down to coordinate i, and we know this is compact because it's the compact set associated with K:i−1(x0), and projections of compact sets are compact. Note that since there's a dependence on earlier selected points, it isn't as simple as our CK:∞(x0) set being a product. But we do at least have the result that
CK:∞(x0)⊆∏∞i=1C∞i
So, since it's a subset of a compact set (product of compact sets), it's at least precompact. All we need is to verify closure to see that CK:∞(x0) is compact.
Fix a convergent sequence xm1:∞ limiting to x∞1:∞, where all the xm1:∞ lie in
{x1:∞|∀n:xn+1∈CKn(x0,x1:n)}
So, if this arbitrary convergent sequence's limit point failed to be in the set, then we could conclude that:
∃n:x∞n+1∉CKn(x0,x∞1:n)
However, we could imagine projecting our set (and convergent sequence) down to coordinates 1-through-n+1 and we'd still get that exact same failure of closure, but in the set
{x1:n+1|∀i≤n:xi+1∈CKi(x0,x1:i)}
Ie, the compact set for K:n(x0), which, due to being compact, is closed, so it does contain that limit point, yielding a contradiction. Therefore,
{x1:∞|∀n:xn+1∈CKn(x0,x1:n)}
Is indeed a compact subset of ∏∞i=1Xi.
Yes, but can K:∞(x0)(f) be expressed as minimizing over this set? Remember, we're using C∞i as an abbreviation for the projection of this set to coordinate i; let's abbreviate the set as a whole as C∞1:∞, and its projection to coordinates 1-through-n+1 as C∞1:n+1, which is the same as the minimizing compact set associated with K:n(x0). Then:
∏∞i=1C∞i is a compact set, so f is uniformly continuous over it so there's always some huge n where two inputs agreeing perfectly on the first n coordinates will produce values extremely close together. Then we can go:
And because C∞1:n+1×∏∞i=n+2C∞i as a set agrees with C∞1:∞ on more and more coordinates as n increases (which we know drives f closer and closer to its true minimum value), and is constantly narrowing down, we have a monotonically increasing sequence, with limit:
infx1:∞∈C∞1:∞f(x1:∞)
And thus, all the K:∞(x0) are sharp. We're finally done!
Proposition 23: If all the Kn are C-additive, then pr∏i=n+1i=0Xi∗(h⋉K:∞)=h⋉K:n
And then, since there's no dependence on xm+2, it's treated as a constant, and due to C-additivity of Km+1, we can pull it out of the Km+1(x0,x1:m+1) to yield
=K:m(x0)(λx1:m+1.f(pr∏i=n+1i=0Xi(x0,x1:m+1)))
And then, because any further coordinates would get clipped away by the projection, minimizing over them does nothing, so we get
For further progress, we need to put in some work on characterizing how infrakernels work. We'll be shamelessly stealing a result about the infra-KR metric from proofs in the future of this post, because there are no cyclic dependencies that show up.
Lemma 4: For an infrakernel K:Xik→Y, if xn limits to x, then, letting K(xn)b∗ be the set {(m,b)∈K(xn)|b≤b∗}, limn→∞K(xn)b∗=K(x)b∗.
Interestingly enough, the only result we need to steal from later-numbered propositions is that, if a sequence of infradistributions Hn limits to H in Hausdorff-distance, then for all functions f, Hn(f) limits to H(f), which is Proposition 45.
We'll be working backwards by taking our goal, and repeatedly showing that it can be shown if a slightly easier result can be shown, which eventually grounds out in something we can prove outright. So, our goal is: limn→∞K(xn)b∗=K(x)b∗
Now, we want b∗ to be an arbitrary integer. Let dhau∘(H,H′) (distance metric between two infradistribution sets) be defined as:
∑b≥22−bdhau(Hb,H′b)
Where dhau is the Hausdorff-distance between sets (the two truncated sets cut off at the given b value), according to some KR-metric on measures, which is induced by a metric on X. It doesn't really matter which metric you use; any one will work.
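This modified metric can be made concrete with finite point sets standing in for the infradistribution sets, and Euclidean distance standing in for the KR metric (both are simplifying assumptions; the sum is also truncated at a finite number of terms):

```python
import math

def hausdorff(A, B, d):
    # symmetric Hausdorff distance between finite nonempty sets
    return max(max(min(d(a, b) for b in B) for a in A),
               max(min(d(b, a) for a in A) for b in B))

def trunc(H, b_max):
    # H^b: keep points (m, b) with b <= b_max
    return [(m, b) for (m, b) in H if b <= b_max]

def d_circ(H, Hp, d, B=20):
    # modified metric: sum_{b >= 2} 2^-b * d_hau(H^b, H'^b), truncated at B terms
    return sum(2 ** -b * hausdorff(trunc(H, b), trunc(Hp, b), d)
               for b in range(2, B + 1))

d = lambda p, q: math.dist(p, q)
H  = [(0.0, 0.0), (1.0, 1.0)]   # toy "a-measures": (m, b) as plain numbers
Hp = [(0.1, 0.0), (1.0, 1.0)]
assert d_circ(H, H, d) == 0
assert d_circ(H, Hp, d) > 0
```

The 2^−b weights are what make convergence of every truncation equivalent to convergence in this single metric, which is the fact the proof leans on below.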
Now, assuming you had the result:
limn→∞dhau∘(K(xn),K(x))=0
Then, by the definition of that distance metric, and b∗ being an integer, said distance converging to 0 would imply your desired
limn→∞K(xn)b∗=K(x)b∗
by the definition of the modified Hausdorff metric. So, let's try to prove that the modified Hausdorff distance limits to 0. The best way to do this is by assuming that it actually doesn't limit to 0, and deriving a contradiction. So, let's change our proof goal to try to derive bottom, and assume that there is some ϵ and some subsequence of xn where K(xn) always stays ϵ away from the set K(x) according to the modified Hausdorff metric.
At this point, given any value of b≥2, we can notice that K(xn)b, due to the compact-almost-support condition on our compact collection of infrakernels (the K(xn) sequence), has, for all ϵ, some compact set Cϵ⊆Y serving as an ϵ-almost-support for all the K(xn), so all the measure components of the K(xn) lie in a compact set. Also, due to the Lipschitz-boundedness condition, there's an upper bound on the amount of measure present. This is a necessary-and-sufficient condition for the measure components of the infrakernel family K(xn) to lie in a compact set of measures. Further, the b upper bound means that the last avenue to a failure of compactness is closed off, and all the K(xn)b lie in some compact set of a-measures. Call the set that all the K(xn)b lie in Kb. Closure of the K(xn)b sets, plus being a subset of a compact set, means that they're all compact. They can then be considered as points in the space K(Kb), the space of compact subsets of Kb, which, being the space of compact subsets of a compact set equipped with the Hausdorff distance metric, is compact.
By compactness, we can pick yet another subsequence which converges in K(Kb), and we then get that the K(xn)b converge on some subsequence. This argument always works, so we can find a subsequence where the K(xn)2 converge in Hausdorff-distance, then a subsequence of that where the K(xn)3 converge in Hausdorff-distance, and so on, and take a diagonal (first element from the first subsequence, second element from the second subsequence, etc.).
And so, we eventually arrive at a subsequence of the K(xn) where, regardless of b, K(xn)b is a Cauchy sequence.
So, we can find a subsequence where the K(xn)b converge, regardless of what b is. Therefore, since
dhau∘(H,H′)=∑b≥22−bdhau(Hb,H′b)
We have that on the subsequence, K(xn) is a Cauchy sequence according to dhau∘. Also, all the K(xn) are staying ϵ away from K(x), according to dhau∘.
Assuming, hypothetically, we were able to show that our Cauchy K(xn) sequence actually converges to K(x), we'd have our contradiction.
What we'll do is show that the K(xn) sets do have a set, which we'll dub K∞, that they limit to according to dhau∘; then we'll show that said set must agree with K(x) re: the expectations of every function, and so must be an identical infradistribution. Thus our convergent subsequence, which stays away from K(x), actually limits to it, and we have a contradiction.
So, let's specify the sets. For each b≥2, let Kb∞ be limn→∞K(xn)b. (For our convergent subsequence, this set is always well-defined)
What we want to show is that:
(⋃b′Kb′∞)b=Kb∞
Ie, our proposed limit set of the convergent K(xn) sequence (according to dhau∘) is ⋃b′Kb′∞. And if we can show that chopping this set off at any b value in particular makes the limit of the chopped-off sets, then the modified Hausdorff-metric limits to 0, and this is indeed the limit point of our K(xn) sequence according to the modified Hausdorff-metric.
One direction of this is very easy, we trivially have
(⋃b′Kb′∞)b⊇Kb∞
For the other direction, fix a particular b where equality fails. Then there exists some point (m,b′′)∈(⋃b′Kb′∞)b which doesn't lie in Kb∞. From this, we know that b′′≤b, and there is some b′ where (m,b′′)∈Kb′∞, so we also have b′′≤b′. Now, if b′′<b, and yet (m,b′′)∈Kb′∞, then there is some Cauchy sequence from the K(xn)b′ that limits to (m,b′′); eventually the bn terms of this sequence will drop below b itself, so they will all be present in K(xn)b, giving you a sequence of points in that sequence of sets that limits to (m,b′′), witnessing that said point lies in Kb∞, the limit of the K(xn)b. However, what if b′′=b? Then there's a Cauchy sequence from the K(xn)b′ that limits to (m,b′′), and eventually the bn terms of the sequence will approach b itself; we can then mix each point with a tiny bit of some point in K(xn) with b=0, to make a nearby point which still undershoots the cutoff, and this modified sequence still limits to (m,b′′), again witnessing that said point lies in Kb∞. That's both cases down, so we've shown
(⋃b′Kb′∞)b=Kb∞
For all the b. Accordingly, we now know that the K(xn) sequence limits to this set.
Now, we just have to show that said limit set equals K(x), in order to derive our desired contradiction. We do this by letting f be arbitrary, and λ⊙K be the Lipschitz constant of the infrakernel, and computing:
(⋃bKb∞)(f)=(⋃b′Kb′∞)2λ⊙K||f||(f)
=K2λ⊙K||f||∞(f)=limn′→∞K2λ⊙K||f||(xn′)(f)
=limn′→∞K(xn′)(f)=limn→∞K(xn)(f)=K(x)(f)
So, we've got six equalities. Equality 2 is what we just showed, equality 5 is because the limit for a subsequence is the same as the limit for the original sequence, and equality 6 comes from the pointwise-convergence property for infradistributions, so that leaves equalities 1, 3, and 4. Equalities 1 and 4 can be dispatched by observing that there's some fixed λ⊙K upper-bound on the Lipschitz constant/amount of measure present in the infradistribution points, so if there was a minimizing point for the expectation of f in any of these infradistributions with b≥2λ⊙K||f||, the expectation value of f would be so high it'd violate the Lipschitz bound on the infradistribution. Thus, clipping off the b value at this height doesn't change the expectation of the function f. Finally, there's equality 3, which is addressed with Proposition 45, because the upper completions of said sets limit to each other in Hausdorff-distance.
We have now derived a contradiction, so our desired result of limn→∞K(xn)b∗=K(x)b∗ for all b∗ holds. We'll be using this.
So, we should be precise about this. We start with a (m,b)∈H where m is actually a measure. Then we fix a selection function mapping x to a point in K(x). Due to X and the cone of a-measures being separable, weak measurability and strong measurability coincide, as do the Bochner and Pettis integrals, so we can just talk about "measurability" and "the integral". Said selection function, due to lying in L1(X,m,M±(Y)⊕R), is measurable (in this case, we're equipping M±(Y)⊕R with the Borel σ-algebra). Further, said selection function doesn't "run off to infinity", ie, ∫X||s(x)||dm<∞. This measurability, plus being L1-integrable in a suitable sense, means the Bochner integral is well-defined, so ∫Xs(x)dm is well-defined and does indeed denote a unique point.
Now that we know how the selection function behaves, we should clarify what m⋉s means. Semidirect products are technically only defined when s is a Markov kernel. But we're working with measures, so we can lift the restriction that, for all x, s(x) is a probability distribution; we need to just verify the first measurability condition on the Wikipedia page for "Markov kernel" instead. But wait, s(x) needs to be a measure for this to work, and instead it's an element of M±(Y)⊕R! Well... we can consider that last coordinate to be "amount of measure on a special disconnected point", and M±(Y)⊕R is then isomorphic to M±(Y+1). Taking this variant, the semidirect product (assuming the measurability condition) would then be a measure in M±(X×(Y+1)). Then we just apply the projection mapping X×(Y+1)→(X×Y)+1, which maps (x,y) to (x,y) and maps (x,b) to the special disconnected point. So now we have something in M±((X×Y)+1), which, again, is isomorphic to M±(X×Y)⊕R. That's basically what's going on when we take the semidirect product w.r.t. a selection function: a couple of isomorphisms and type conversions are happening in the background.
So, first, to show that this is even well-defined, we need to verify the relevant measurability condition for s to do semidirect product with it. Said measurability condition is "for all Borel B⊆Y+1, x↦s(x)(B) is a measurable function".
Now, here's the argument. We can split this up into two parts. First, there's the function x↦s(x) of type X→M±(Y+1), and then there's the function m↦m(B), of type M±(Y+1)→R. The function we're trying to show is measurable is the composition of these two. So if we can show they're both measurable, we're good; we've verified the condition. We immediately know that the first function is measurable, because our selection function has to be. So, we just need to check the second one. The tricky part is that we've got the weak topology on M±(Y+1), not the strong topology. The definition of a measurable function is that the preimage of a measurable set in the target is measurable in the source. The weak topology has fewer open sets, so it's got fewer measurable sets, so it's harder for the preimage of a measurable set to be measurable. Fortunately, by the answer to this mathoverflow question (and it generalizes, because the answer didn't essentially use the "probability distribution" assumption in the setup, only properties of the weak topology), m↦m(B) is indeed measurable with the σ-algebra induced by the weak topology on the space of finite signed measures if B is Borel-measurable. So, the composition is measurable, and we've verified the condition to make a legit semidirect product.
Now, let's show that this set has the same expectation values as the semidirect product as defined for the functionals. H⋉K was defined as
This is the definition of our set. Let's minimize a function over it.
inf(m,b)∈H⋉K(m(f)+b)
Now, we can minimize over (m,b) and our selection function separately; the selection function must come second, since it depends on m. Further, the b component of m⋉s can be written as (m⋉s)↓X×1(1), because we're folding all the measure over X×1 into the value of a single point. The downwards arrow is restriction. We'll abbreviate L1(X,m,M±(Y)⊕R) as L1(m) for readability.
then we could proceed further. So let's show that. First, observe that if we replace = with ≥, the above line is true, because the selection function is always going to pick, for x, a point from K(x), whose value is as high or higher than the worst-case value from K(x). So, regardless of selection function, we have the ≥ version of the above line. So now we must establish that having a > in the above line is impossible. Given any ϵ, we'll craft a selection function where the two sides differ by only ϵ(λ⊙K||f||+2+λ⊙K), and we can let ϵ limit to 0 as everything else is fixed, yielding our result.
Our fundamental tool here will be the Kuratowski-Ryll-Nardzewski measurable selection theorem to measurably choose a suitable point from each K(x), getting us our selection function. We do have to be careful, though, because there's one additional property we need. Namely, that ∫X||s(x)||dm<∞, for said selection function to be L1.
Our multifunction for KRN-selection (given an ϵ) can be regarded as being split into two parts. So, given our m∈M+(X) from earlier, there must exist some compact set CXϵ⊆X which accounts for all but ϵ of its measure.
Accordingly, our multifunction ψ for KRN-selection will be:
This is basically "on a compact set, act in a nearly worst-case way, and outside the compact set, just pick some point with bounded b value".
To verify the conditions to invoke KRN-selection... well, the first one is that each point gets mapped to a closed (check) and nonempty (check) set.
In order to make KRN-selection work, we need a measurability condition on our multifunction ψ. In order to set it up, we need to show that our multifunction has closed graph for x∈CXϵ. Ie, if xn limits to x, and each (mn,bn)∈ψ(xn), and (mn,bn) limits to (m,b), then (m,b)∈ψ(x).
To be in K(x) at least, observe that any infradistribution can be characterized as "a point is included as long as it exceeds our worst-case m(f)+b values for all the f". Fixing a particular f, the K(xn)(f) limit to K(x)(f). And the mn(f)+bn limit to m(f)+b. Thus, regardless of f, m(f)+b≥K(x)(f), because each mn(f)+bn lies above K(xn)(f) (by (mn,bn)∈K(xn)). This certifies that, regardless of f, m(f)+b≥K(x)(f) (the worst-case value), so (m,b)∈K(x).
We still have to check that in the limit, (m,b) fulfills the "almost-worst-case" defining inequality in order to lie in ψ(x). To work towards this result, we'll take another detour.
If xn limits to x and fn limits to f uniformly on all compact sets, then K(xn)(fn) limits to K(x)(f), because it gets arbitrarily close on the ϵ-almost-supports for the family K(xn), for arbitrarily low ϵ (Lemma 2). Also, K(xn)(f) limits to K(x)(f) by pointwise convergence for an infrakernel.
Further, we'll show that if (mn,bn) limits to (m,b), and fn limits uniformly to f on all compact sets, then mn(fn)+bn limits to m(f)+b. We obviously have convergence of the b terms, so that leaves the measure terms. For sufficiently large n, the gap between mn(fn) and mn(f) shrinks to 0, because compact sets of measures induce compact almost-supports for all ϵ, and we have convergence on compact sets. Also, mn(f) limits to m(f), so that checks out.
Now that we have this, we must show that m(λy.f(x,y))+b≤K(x)(λy.f(x,y))+ϵ.
Well, since f is continuous, on the compact set {xn}n∈N∪{∞}×CY (where CY can be any compact subset of Y) it's uniformly continuous, so the sequence λy.f(xn,y) limits uniformly to λy.f(x,y) on CY. Ie, λy.f(xn,y) limits uniformly to λy.f(x,y) on all compact sets, since CY was arbitrary.
Thus, by our earlier two results, we have mn(λy.f(xn,y))+bn limiting to m(λy.f(x,y))+b, and K(xn)(λy.f(xn,y)) limiting to K(x)(λy.f(x,y)). And the former sequence is always within ϵ of the latter sequence, so the same applies to the limit points. Thus, (m,b) fulfills our final condition of approximately minimizing m(λy.f(x,y))+b.
Alright, so we've verified that the function x↦ψ(x) maps each x to a closed nonempty set, and it has closed graph when restricted to x∈CXϵ. Now, which condition do we need to check to invoke KRN-selection? The precise condition we need is that for every open set O in M±(Y)⊕R, the set of x where ψ(x)∩O≠∅ is measurable.
Let's think about that. We can view the graph of our multifunction as a subset of X×(M±(Y)⊕R). Remember it's divided into two parts, one for x∈CXϵ and one for not. Also, we can take our open set O⊆M±(Y)⊕R, extend it back to get an open set of
X×(M±(Y)⊕R)
intersect with the graph of our multifunction, and project down to the space X, and we want to show that said projection is always Borel.
Said projection is the projection of the intersection of the open set with the two "chunks" ((x,ψ(x)) where x∈CXϵ, and (x,ψ(x)) where this is not the case), so it's the union of two projected sets. If we can show that the projection of the intersection of O with these two "chunks" is measurable, then the net result is measurable, because the union of two measurable sets is measurable.
For one part of it, we'll be showing that the projection of O∩{(x,(m,b))|x∈CXϵ,(m,b)∈ψ(x)} is an Fσ set, a countable union of closed sets, which is measurable. Here's how it works. That set that you're intersecting with O? Well, it isn't just closed (we know that already, we showed closed graph), it's compact. Projecting it down to X, it clearly lands in a compact set. So, we just need to show that projecting it down to M±(Y)⊕R lands in a compact set, and then we can go "huh, this set is a closed subset of the product of two compact sets, so it must be compact."
Necessary-and-sufficient conditions for a set of a-measures to be compact are: Bounded amount of measure present (ψ(x) is selecting from K(x) and the K(x) sets have a uniform upper bound on the amount of measure present in them), compact almost-support for all the measure components of all the K(x) for all ϵ (works because of the equivalence between CAS and the measure present in points in the infradistributions, and the compact-shared compact-almost-support property of an infrakernel, so the various K(x) do have all their measure components lying in a compact set because the x are selected from a compact set), and bounded b value (the finite Lipschitz constant and finite norm of the function of interest is incompatible with approximately minimizing your function with unboundedly large b values).
So, the a-measures of the various K(x) are contained in a compact set.
Where were we? Oh right, showing that the projection of
O∩{(x,(m,b))|x∈CXϵ,(m,b)∈ψ(x)}
Is an Fσ set. Pretty much, by the previous argument, we know that the set we're intersecting with O is a compact set. Also, in Polish spaces, all open sets can be written as a countable union of closed sets, so this set as a whole can be written as a countable union of (closed intersected with compact) sets, ie, a countable union of compact sets. The projection of a compact set is compact, so the projection of the set is a countable union of compact sets (ie, a countable union of closed sets) and thus is Fσ.
Ok, that's part of things done. We still have to show that the projection of the set
O∩{(x,(m,b))|x∉CXϵ,(m,b)∈ψ(x)}
Is measurable. We'll actually show something a bit stronger, that it's an open set. Here's how. You can consider ψ′ to be a function of type X→K(Ma(Y)) (the space of compact subsets of a-measures) defined as x↦{(m,b)∈K(x)|b≤2}, which matches up with how ψ is defined outside of the compact set of interest. The projection of our set of interest is:
{x|ψ(x)∩O≠∅∧x∉CXϵ}
Or, it can be rephrased as:
{x|ψ′(x)∩O≠∅}∩{x|x∉CXϵ}
The complement of a closed set is open, so if we can just show that
{x|ψ′(x)∩O≠∅}
is open, we'll be done.
Now, the topology that K(Ma(Y)) is equipped with is the Vietoris topology. Ie, the basis of open sets of compact sets is given by a finite collection of open sets in Ma(Y), and taking all the compact sets which are a subset of the union of the opens and intersect each open.
Letting your two open sets for a Vietoris-open be the whole space itself, and the set O, this shows that the set of all compact sets of a-measures which have nonempty intersection with O is open in K(Ma(Y)). So, our set of interest can be viewed as the preimage under ψ′ of an open set of compact sets. Further, ψ′ is a continuous function by Lemma 4 (the Hausdorff-distance induces the Vietoris topology), so the preimage of an open set is open.
So, the set of x overall where, for a given open set O, O∩ψ(x) is nonempty, is a measurable set, and we can now invoke the Kuratowski-Ryll-Nardzewski measurable selection theorem and craft a measurable choice function s♢.
Said measurable choice function never picks points with a norm above certain bounds (it always picks from K(x), so there's a uniform bound on the amount of measure present, and the b value is either 2 or less in one case, or upper-bounded in the other case because too high of a b value would lead to violation of the Lipschitz constant), so it's in L1 and we can validly use it. And now we can go:
infs∈L1(m)m(λx.s(x)↓Y(λy.f(x,y))+s(x)↓1(1))
≤m(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
At this point, we split m into mCXϵ (the measure component on the compact set that makes up all but ϵ of its value), and m¬CXϵ, the measure component off the compact set.
=mCXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
+m¬CXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
And noticing our conditions on our ψ multifunction that we selected from, in particular how it behaves off CXϵ,
≤mCXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
+m¬CXϵ(λx.λ⊙K||f||+2)
=mCXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
+ϵ(λ⊙K||f||+2)
That last part is because all but ϵ of the measure of m was captured by our compact set, so there's very little measure off it. Proceeding a bit further, and using what ψ(x) is for x∈CXϵ, particularly how close it is to the true minimum value, we have:
And we're done, we've shown that the expectations line up, so we have the right set form of semidirect product.
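The bookkeeping at the end, where the measure splits into a part on CXϵ and a tail of mass at most ϵ, is just the usual bound: the tail contributes at most its mass times a sup bound on the integrand, which is where the ϵ(λ⊙K||f||+2) term comes from. A toy sketch with a finite measure (the numbers are invented):

```python
# m has mass 0.05 off the "compact set" C; the tail's contribution to the
# integral is bounded by (tail mass) * (sup bound on the integrand).
m = {0: 0.50, 1: 0.45, 10: 0.05}   # point -> mass
C = {0, 1}
g = lambda x: 3.0 - x              # integrand; |g| <= 7 on the support of m

total = sum(p * g(x) for x, p in m.items())
on_C  = sum(p * g(x) for x, p in m.items() if x in C)
tail  = total - on_C
eps   = sum(p for x, p in m.items() if x not in C)     # tail mass, 0.05
sup_g = max(abs(g(x)) for x in m)                      # sup bound on |g|
assert abs(tail) <= eps * sup_g + 1e-12
```

In the proof, the sup bound on the integrand is λ⊙K||f||+2 (from the bounded measure and the b≤2 clause of ψ off the compact set), and the tail mass is at most ϵ.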
Proposition 25: The direct product is associative. (h1×h2)×h3=h1×(h2×h3)
Proof:
((h1×h2)×h3)(f)
=(h1×h2)(λx,y.h3(λz.f(x,y,z)))
=h1(λx.h2(λy.h3(λz.f(x,y,z))))
=h1(λx.(h2×h3)(λy,z.f(x,y,z)))
=(h1×(h2×h3))(f)
Done.
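In the sharp case, where each hi(f) is a minimum over a compact set, associativity boils down to nested minima associating; a small numeric check (finite sets standing in for the compact sets, everything invented for illustration):

```python
# Sharp stand-ins: h_i(g) = min of g over a finite set S_i.
S1, S2, S3 = [0, 1], [2, 3], [4, 5]
f = lambda x, y, z: (x - y) ** 2 + z

# ((h1 x h2) x h3)(f): minimize over (x, y) jointly, with z innermost
lhs = min(min(f(x, y, z) for z in S3) for x in S1 for y in S2)
# (h1 x (h2 x h3))(f): minimize over x outermost, (y, z) jointly inside
rhs = min(min(f(x, y, z) for y in S2 for z in S3) for x in S1)
assert lhs == rhs == min(f(x, y, z) for x in S1 for y in S2 for z in S3)
```

The general proof is exactly this regrouping, written with the h_i(λ....) nesting instead of explicit minima.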
Proposition 26: If h1 and h2 are C-additive, then prX∗(h1×h2)=h1 and prY∗(h1×h2)=h2
Proof:
prX∗(h1×h2)(f)=(h1×h2)(f∘prX)
=h1(λx.h2(λy.f(prX(x,y))))=h1(λx.h2(λy.f(x)))
=h1(λx.f(x))=h1(f)
That's one direction. For the second,
prY∗(h1×h2)(f)=(h1×h2)(f∘prY)
=h1(λx.h2(λy.f(prY(x,y))))=h1(λx.h2(λy.f(y)))
=h1(λx.h2(f))=h2(f)
Done.
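The key step in Proposition 26, h2(λy.f(x))=f(x), is just that evaluating a constant function returns the constant (which is what C-additivity plus normalization buys). A tiny sketch with sharp stand-ins (all sets invented):

```python
# Sharp stand-ins: h(S)(g) = min of g over a finite set S, so constants pass through.
h = lambda S: (lambda g: min(g(y) for y in S))
h1, h2 = h([0, 1, 2]), h([7, 8])
f = lambda x: x * x - 3 * x        # function of x alone

# pr_X*(h1 x h2)(f) = h1(x -> h2(y -> f(x))) collapses to h1(f),
# since h2 applied to the constant y -> f(x) just returns f(x).
lhs = h1(lambda x: h2(lambda y: f(x)))
assert lhs == h1(f)
```

The prY direction is symmetric, using that h1 passes the constant h2(f) through.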
Proposition 27: EK∗(H)(f)=EH(λx.EK(x)(f))
Proof of well-definedness of the pushforward: We'll use the same notational conveniences as in the proof of well-definedness of the product.
Our attempted set definition K∗(H) will be:
⋃(m,b)∈H(EmK(x)+(0,b))
Where EmK(x) equals
{(m′,b′)|∃s∈L1(m):∫Xs(x)dm=(m′,b′),∀x:s(x)∈K(x)}
The usual considerations about choice functions, and weak measurability and strong measurability coinciding, along with the Bochner and Pettis integrals, mean we aren't being vague here. The choice function is measurable and doesn't run off to infinity. Fortunately, we don't have to do the weird type conversions and isomorphisms from the semidirect product case, or do measurability arguments.
We can run through the exact same argument as in the semidirect product case to establish this equality (we won't retype it here). So at this point, we're at
=inf(m,b)∈H(m(λx.infm′,b′∈K(x)(m′(f)+b′))+b)
=inf(m,b)∈H(m(λx.K(x)(f))+b)
=EH(λx.EK(x)(f))
And the expectations match up and we're done.
Proposition 28:Ek∗(H)(f)=EH(λx.Ek(x)(f))
This is an easy one: k∗(H):={(k∗(m),b)|(m,b)∈H}. So then
Part 6: Showing the basic four infradistribution properties: Normalization, monotonicity, concavity, and Lipschitzness (specifically 1-Lipschitzness). For normalization, observe that regardless of n,
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci1)=K:n(x0)(1)=1
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci0)=K:n(x0)(0)=0
The limit of the all-1 sequence is 1 and the limit of the all-0 sequence is 0, so:
K:∞(x0)(1)=1
K:∞(x0)(0)=0
For monotonicity, let f′≥f. Then, regardless of n,
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
≥K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
This is because, since f′≥f, the "extend with worst-case outputs" function is bigger for f′ than f, and then by monotonicity for K:n(x0), the inequality transfers to the outside.
Accordingly,
K:∞(x0)(f′)=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
≥limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))=K:∞(x0)(f)
Now for concavity. For all n,
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci(pf+(1−p)f′)(x1:n+1,xn+2:∞))
≥K:n(x0)(λx1:n+1.pinfxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
+(1−p)infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
≥pK:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
+(1−p)K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
This was by monotonicity (minimizing over the two parts separately produces lower values, and monotonicity transfers that to the outside), and concavity respectively, because K:n(x0) is an infradistribution. And now:
K:∞(x0)(pf+(1−p)f′)
=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci(pf+(1−p)f′)(x1:n+1,xn+2:∞))
≥limn→∞pK:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
+(1−p)K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
=plimn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
+(1−p)limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))
=pK:∞(x0)(f)+(1−p)K:∞(x0)(f′)
Concavity is shown. Now for 1-Lipschitzness. Fix an arbitrary n. Our proof target is:
|K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))|≤d(f,f′)
Now, if we knew:
d(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞),λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))≤d(f,f′)
Then we'd be able to pair that with the 1-Lipschitzness of K:n(x0) to get our desired result. So let this be our new proof target. Reexpressing it a bit, it's:
∀x1:n+1∈∏i=n+1i=1Xi:|infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
−infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞)|≤d(f,f′)
Now, if two functions are only d(f,f′) apart, then their minimal values over a compact set can only be d(f,f′) apart at most. Let your compact set be ∏i=n+1i=1{xi}×∏∞i=n+2Ci to see the result. Thus, we have proved our proof target and we have the result that, for all n,f,f′
|K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif′(x1:n+1,xn+2:∞))|≤d(f,f′)
Since they're always only d(f,f′) apart, the same property transfers to the limit, so we get:
|K:∞(x0)(f)−K:∞(x0)(f′)|≤d(f,f′)
And so, K:∞(x0) is 1-Lipschitz for all x0. This takes care of the Lipschitzness condition on infradistributions and the uniform Lipschitz bound condition on infrakernels. All that's left is the compact almost-support condition on infradistributions and the compact-shared compact almost-support condition on infrakernels, and the pointwise convergence condition on infrakernels.
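The fact used above — minima over a shared compact set are 1-Lipschitz in the sup-distance between the functions — can be checked numerically on a finite grid standing in for the compact set (a toy sketch; the grid and functions are invented for illustration):

```python
import math

# |min_C f - min_C f'| <= d(f, f') when both minima range over the same set C.
C = [x / 100 for x in range(101)]            # finite stand-in for a compact set
f = lambda x: math.sin(5 * x) + x ** 2
fp = lambda x: math.sin(5 * x) + x ** 2 + 0.03 * math.cos(9 * x)

d = max(abs(f(x) - fp(x)) for x in C)        # sup-distance restricted to C
gap = abs(min(f(x) for x in C) - min(fp(x) for x in C))
assert gap <= d + 1e-12
```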
Part 7: This will use our result from Part 4 about making a suitable compact set. Our desired thing we want to prove is that
∀CX0∈K(X0),ϵ>0∃Cϵ⊆∏∞i=1Xi∀x0∈CX0,f,f′∈CB(∏∞i=1Xi):
f↓Cϵ=f′↓Cϵ→|K:∞(x0)(f)−K:∞(x0)(f′)|≤ϵd(f,f′)
Because this property is exactly the compact-shared compact-almost-support property for K:∞.
So, here's our plan of attack. Fix our CX0 and ϵ, so our target is
∃Cϵ⊆∏∞i=1Xi∀x0∈CX0f,f′∈CB(∏∞i=1Xi):
f↓Cϵ=f′↓Cϵ→|K:∞(x0)(f)−K:∞(x0)(f′)|≤ϵd(f,f′)
Let our Cϵ be C1:∞[CX0,ϵ], as defined from Part 4, and let x0 be arbitrary in CX0 and f,f′ be arbitrary and agree with each other on C1:∞[CX0,ϵ]. Then our goal becomes
|K:∞(x0)(f)−K:∞(x0)(f′)|≤ϵd(f,f′)
Now, letting our chosen defining sequence be Ci[CX0,ϵ], our goal is to show
∀n:|K:n(x0)(λx1:n+1.infxn+2:∞∈Cn+2:∞[CX0,ϵ]f(x1:n+1,xn+2:∞))
−K:n(x0)(λx1:n+1.infxn+2:∞∈Cn+2:∞[CX0,ϵ]f′(x1:n+1,xn+2:∞))|≤ϵd(f,f′)
If we could do this, the approximating values always being only ϵd(f,f′) away from each other shows that the same property transfers to the limit and we get our result. Accordingly, let n be arbitrary, and our proof goal is now:
|K:n(x0)(λx1:n+1.infxn+2:∞∈Cn+2:∞[CX0,ϵ]f(x1:n+1,xn+2:∞))
−K:n(x0)(λx1:n+1.infxn+2:∞∈Cn+2:∞[CX0,ϵ]f′(x1:n+1,xn+2:∞))|≤ϵd(f,f′)
Hypothetically, if we were able to show:
∀x1:n+1∈C1:n+1[CX0,ϵ]:infxn+2:∞∈Cn+2:∞[CX0,ϵ]f(x1:n+1,xn+2:∞)
=infxn+2:∞∈Cn+2:∞[CX0,ϵ]f′(x1:n+1,xn+2:∞)
Then we could conclude the two functions were equal on the set C1:n+1[CX0,ϵ], which, from Part 4, is an ϵ-almost support for all the K:n(x0) where x0∈CX0, and it would yield our result. So our proof target is now
∀x1:n+1∈C1:n+1[CX0,ϵ]:infxn+2:∞∈Cn+2:∞[CX0,ϵ]f(x1:n+1,xn+2:∞)
=infxn+2:∞∈Cn+2:∞[CX0,ϵ]f′(x1:n+1,xn+2:∞)
Accordingly, let x1:n+1 be arbitrary within said set, turning our proof target into:
infxn+2:∞∈Cn+2:∞[CX0,ϵ]f(x1:n+1,xn+2:∞)=infxn+2:∞∈Cn+2:∞[CX0,ϵ]f′(x1:n+1,xn+2:∞)
However, f and f′ are equal on the set C1:∞[CX0,ϵ], which breaks down as C1:n+1[CX0,ϵ]×Cn+2:∞[CX0,ϵ]. And we know that x1:n+1∈C1:n+1[CX0,ϵ], and since the second part is being selected from Cn+2:∞[CX0,ϵ], the two values are equal, and we've proved our proof target. Thus, we have compactly-shared compact almost-support, and that's the second condition for K:∞ to be an infrakernel, as well as the last condition needed for all the K:∞(x0) to be infradistributions. Just pointwise convergence left!
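The final step — if f and f′ agree on a product set Chead×Ctail, then for every head in Chead the tail-infima agree — is easy to check in a toy finite setting (sets and functions invented for illustration):

```python
# f and fp agree on C_head x C_tail but may differ wildly elsewhere;
# the tail-infima over C_tail still coincide for every head in C_head.
C_head = [0.0, 0.5, 1.0]
C_tail = [0.1, 0.2, 0.9]

def f(h, t):
    return h + t * t

def fp(h, t):
    if h in C_head and t in C_tail:
        return h + t * t          # agree on the product set
    return h - 5 * t              # arbitrary disagreement elsewhere

for h in C_head:
    assert min(f(h, t) for t in C_tail) == min(fp(h, t) for t in C_tail)
```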
Part 8: Alright, for showing pointwise convergence, we'll need a lemma. We need to show:
∀CX0∈K(X0),¯¯¯¯C∈∏∞i=1K(Xi),ϵ>0,f∈CB(∏∞i=1Xi)∃n∈N∀m∈N,x0∈CX0:
|K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞))|≤ϵ(2||f||+1)
This is roughly saying that the rate of convergence of K:n(x0)(f) to K:∞(x0)(f) is uniform over compact sets. Let's begin. Let CX0,¯¯¯¯C,ϵ,f be arbitrary, so our proof target is now:
∃n∈N∀m∈N,x0∈CX0:|K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞))|≤ϵ(2||f||+1)
Here's what we'll do. Because we have a CX0 and ϵ, we can craft our sequence Ci[CX0,ϵ] of compact ϵ-supports. Now, on the set:
∏∞i=1(Ci[CX0,ϵ]∪Ci)
which is compact, we know that f is uniformly continuous, so there's some δ you need to stay within to ensure that function values are only ϵ away from each other, which translates into an n via n:=⌈log2(1/δ)⌉. Any two inputs which only differ beyond that point will be only δ apart (their tail coordinates contribute at most 2−n to the product metric) and can only get an ϵ-difference in values from f. Now that that's defined, let m and x0∈CX0 be arbitrary. Our proof target is now to show:
|K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞))|≤ϵ(2||f||+1)
We know from Part 3 that
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
=K:n+m(x0)(λx1:n+m+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
So let's make that substitution, turning our proof target into
|K:n+m(x0)(λx1:n+m+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞))|≤ϵ(2||f||+1)
Assuming hypothetically we were able to show that:
∀x1:n+m+1∈C1:n+m+1[CX0,ϵ]:|infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
−infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞)|≤ϵ
Then, because C1:n+m+1[CX0,ϵ] is an ϵ-almost-support for K:n+m(x0) as long as x0∈CX0 (which we know to be the case), we could apply Lemma 2 to get our desired proof target.
The "Lipschitz constant times deviation over compact set" part would make one ϵ, and the "degree of almost-support times difference in the two functions" part would make 2ϵ||f|| in the worst-case due to being an ϵ-almost-support and the worst possible case where the two functions differ by 2||f|| somewhere.
So now our proof target is:
∀x1:n+m+1∈C1:n+m+1[CX0,ϵ]:|infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
−infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞)|≤ϵ
Thus, let x1:n+m+1 be arbitrary within the appropriate set, so our proof target is now:
|infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
−infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞)|≤ϵ
Because we're minimizing over a compact set in both instances, we can consider the infs to be attained by some xn+2:∞ and some xn+m+2:∞, so our proof target is now
|f(x1:n+1,xn+2:∞)−f(x1:n+m+1,xn+m+2:∞)|≤ϵ
Now, because x1:n+1∈C1:n+1[CX0,ϵ], and xn+2:∞∈∏∞i=n+2Ci, we have:
(x1:n+1,xn+2:∞)∈C1:n+1[CX0,ϵ]×∏∞i=n+2Ci
=∏i=n+1i=1Ci[CX0,ϵ]×∏∞i=n+2Ci⊆∏∞i=1(Ci[CX0,ϵ]∪Ci)
And also, because x1:n+m+1∈C1:n+m+1[CX0,ϵ], and xn+m+2:∞∈∏∞i=n+m+2Ci,
(x1:n+m+1,xn+m+2:∞)∈C1:n+m+1[CX0,ϵ]×∏∞i=n+m+2Ci
=∏i=n+m+1i=1Ci[CX0,ϵ]×∏∞i=n+m+2Ci⊆∏∞i=1(Ci[CX0,ϵ]∪Ci)
So, both inputs lie in the relevant compact set, and they agree on x1:n+1, so the inputs are only δ apart, so they only have value differing by ϵ, and we're done, our desired result of
∀CX0∈K(X0),¯¯¯¯C∈∏∞i=1K(Xi),ϵ>0,f∈CB(∏∞i=1Xi)∃n∈N∀m∈N,x0∈CX0:
|K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞))|≤ϵ(2||f||+1)
goes through.
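The metric fact driving the choice of n in this part — inputs agreeing on the first n coordinates are within 2−n of each other in the weighted product metric — can be sketched numerically (assuming, purely for illustration, coordinates in [0,1] and the standard metric d(x,y)=∑i2−i·|xi−yi|):

```python
import random

# d(x, y) = sum_i 2^-i * |x_i - y_i| on [0,1]^N (truncated to N coordinates).
# Agreement on coordinates 1..n bounds the distance by sum_{i>n} 2^-i = 2^-n.
random.seed(0)
N, n = 40, 10
x = [random.random() for _ in range(N)]
y = x[:n] + [random.random() for _ in range(N - n)]  # agree on first n coords

d = sum(2 ** -(i + 1) * abs(a - b) for i, (a, b) in enumerate(zip(x, y)))
assert d <= 2 ** -n
```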
Part 9: Now, for showing our pointwise continuity property, fix a sequence x0,m which limits to x0,∞, and an arbitrary f. Our task is now to show that:
limm→∞K:∞(x0,m)(f)=K:∞(x0,∞)(f)
Fixing some arbitrary Ci sequence, we could do this if we could show that
limm→∞limn→∞K:n(x0,m)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
=limn→∞K:n(x0,∞)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
Now, because
K:n(x0,∞)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
=limm→∞K:n(x0,m)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
we can rewrite our desired proof target as:
limm→∞limn→∞K:n(x0,m)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
=limn→∞limm→∞K:n(x0,m)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
Huh, we just need to swap our limits! And the Moore-Osgood theorem says you can swap limits while preserving equality if, for fixed n,
limm→∞K:n(x0,m)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
=K:n(x0,∞)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
and (this part is harder)
K:n(x0,m)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
limits to
K:∞(x0,m)(f)
uniformly in m. However, back in Part 8 we established uniform convergence on any compact set. And {x0,m}m∈N∪{∞} is a compact set! So we have uniform convergence and can invoke Moore-Osgood to swap the limits and get our proof target, showing our result, that
limm→∞K:∞(x0,m)(f)=K:∞(x0,∞)(f)
So, K:∞ fulfills pointwise convergence and we've verified all properties needed to show it's an infrakernel.
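The limit-swap step can be illustrated numerically (with invented double sequences, not the K:n themselves): under uniform convergence the iterated limits agree, while the classic counterexample n/(n+m) shows the swap can fail without it.

```python
# a(n, m) = 1/(n+1) + 1/(m+1): converges in m uniformly in n, so both iterated
# limits equal 0 (Moore-Osgood applies).
# b(n, m) = n/(n+m): no uniform convergence; the iterated limits are 1 and 0.
a = lambda n, m: 1 / (n + 1) + 1 / (m + 1)
b = lambda n, m: n / (n + m)

BIG = 10 ** 8
assert abs(a(BIG, BIG)) < 1e-7      # both iterated limits agree at 0
assert b(BIG, 1) > 0.99             # lim over n first gives 1
assert b(1, BIG) < 0.01             # lim over m first gives 0
```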
Part 10: Showing that if all the Kn have some nice property, then K:∞ inherits it too. Homogeneity, Cohomogeneity, C-additivity, crispness, and sharpness.
Proof of homogeneity preservation:
K:∞(x0)(af)=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ciaf(x1:n+1,xn+2:∞))
=limn→∞K:n(x0)(λx1:n+1.ainfxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
=limn→∞aK:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
=alimn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))=aK:∞(x0)(f)
Proof of cohomogeneity preservation: Let each Ci be a single point {xi}; a product of singletons is compact.
K:∞(x0)(1+af)=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2{xi}1+af(x1:n+1,xn+2:∞))
=limn→∞K:n(x0)(λx1:n+1.1+af(x1:n+1,xn+2:∞))
=limn→∞1−a+aK:n(x0)(λx1:n+1.1+f(x1:n+1,xn+2:∞))
=1−a+alimn→∞K:n(x0)(λx1:n+1.1+f(x1:n+1,xn+2:∞))
=1−a+aK:∞(x0)(1+f)
Proof of C-additivity preservation:
K:∞(x0)(c)=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cic)
=limn→∞K:n(x0)(c)=limn→∞c=c
Proof of crispness preservation: Crispness is equivalent to the conjunction of C-additivity and homogeneity, and both of those are preserved.
Proof of sharpness preservation: So, this will take a bit of work because we do have to get a fairly explicit form for the set you're minimizing over. Remember that when h and K are crisp, with associated compact sets Ch and CK(x), then the compact minimizing set for h⋉K is ⋃x∈ChCK(x), from the proof of sharpness preservation for semidirect product. Another way of writing this minimizing set of the semidirect product is as the set
{(x,y)|x∈Ch,y∈CK(x)}
Now, we'll give a form for the associated compact set of K:n(x0). It is:
{x1:n+1|∀i≤n:xi+1∈CKi(x0,x1:i)}
For the base case, observe that when n=0, it's
{x1|x1∈CK0(x0)}=CK0(x0)
which works out perfectly. For the induction step, because
K:n+1(x0)=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
And we are, by induction, assuming the relevant form for the associated compact set of K:n(x0), and know how to craft the minimizing set for semidirect products, putting them together gets us:
{(x1:n+1,xn+2)|∀i≤n:xi+1∈CKi(x0,x1:i)∧xn+2∈CKn+1(x0,x1:n+1)}
={x1:n+2|∀i≤n+1:xi+1∈CKi(x0,x1:i)}
And so, we have the proper set form for the compact minimizing sets associated with all the sharp infradistributions K:n(x0).
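The history-dependent set form can be prototyped concretely: below is a toy enumeration (the successor rule C_K is invented for illustration) showing that projecting the depth-(n+1) set down recovers the depth-n set, which is the consistency the induction step relies on.

```python
# Toy stand-in for C_{K_i}(x0, x_{1:i}): successor sets depend on the history.
def C_K(i, history):
    return (0, 1) if sum(history) % 2 == 0 else (1, 2)

def trajectories(n):
    """All x_{1:n} with x_{i+1} in C_K(i, x_{1:i})."""
    out = [()]
    for i in range(n):
        out = [h + (x,) for h in out for x in C_K(i, h)]
    return out

# Projection consistency: depth-4 trajectories project onto depth-3 ones.
assert {t[:3] for t in trajectories(4)} == set(trajectories(3))
```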
Our conjectured set form for the infinite semidirect product would be:
{x1:∞|∀n:xn+1∈CKn(x0,x1:n)}
Let's call the projection of this set to coordinate i C∞i. All of these are compact, because projecting down to coordinate i yields the exact same result as projecting
{x1:i|∀j≤i−1:xj+1∈CKj(x0,x1:j)}
down to coordinate i, and we know this is compact because it's the compact set associated with K:i−1(x0), and projections of compact sets are compact. Note that since there's a dependence on earlier selected points, it isn't as simple as our CK:∞(x0) set being a product. But we do at least have the result that
CK:∞(x0)⊆∏∞i=1C∞i
So, since it's a subset of a compact set (product of compact sets), it's at least precompact. All we need is to verify closure to see that CK:∞(x0) is compact.
Fix a convergent sequence xm1:∞ limiting to x∞1:∞, and all the xm1:∞ lie in
{x1:∞|∀n:xn+1∈CKn(x0,x1:n)}
So, if this arbitrary convergent sequence didn't have its limit point be in the set, then we could conclude that:
∃n:x∞n+1∉CKn(x0,x∞1:n)
However, we could imagine projecting our set (and convergent sequence) down to coordinates 1-through-n+1 and we'd still get that exact same failure of closure, but in the set
{x1:n+1|∀i≤n:xi+1∈CKi(x0,x1:i)}
Ie, the compact set for K:n(x0), which, due to being compact, is closed, so it does contain that limit point, yielding a contradiction. Therefore,
{x1:∞|∀n:xn+1∈CKn(x0,x1:n)}
Is indeed a compact subset of ∏∞i=1Xi.
Yes, but can K:∞(x0)(f) be expressed as minimizing over this set? Remember, we're using C∞i as an abbreviation for the projection of this set to coordinate i; let's abbreviate the set as a whole as C∞1:∞, and its projection to coordinates 1-through-n+1 as C∞1:n+1, which is the same as the minimizing compact set associated with K:n(x0). Then:
∏∞i=1C∞i is a compact set, so f is uniformly continuous over it, so there's always some huge n where two inputs agreeing perfectly on the first n coordinates will produce values extremely close together. Then we can go:
K:∞(x0)(f)=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2C∞if(x1:n+1,xn+2:∞))
=limn→∞infx1:n+1∈C∞1:n+1(infxn+2:∞∈∏∞i=n+2C∞if(x1:n+1,xn+2:∞))
=limn→∞infx1:∞∈C∞1:n+1×∏∞i=n+2C∞if(x1:∞)
And because C∞1:n+1×∏∞i=n+2C∞i as a set agrees with C∞1:∞ on more and more coordinates as n increases (which we know drives f closer and closer to its true minimum value), and is constantly narrowing down, we have a monotonically increasing sequence, with limit:
infx1:∞∈C∞1:∞f(x1:∞)
And thus, all the K:∞(x0) are sharp. We're finally done!
Proposition 23: If all the Kn are C-additive, then pr∏i=n+1i=0Xi∗(h⋉K:∞)=h⋉K:n
So,
pr∏i=n+1i=0Xi∗(h⋉K:∞)(f)
=(h⋉K:∞)(f∘pr∏i=n+1i=0Xi)
=h(λx0.K:∞(x0)(λx1:∞.f(pr∏i=n+1i=0Xi(x0,x1:∞))))
=h(λx0.limm→∞K:m(x0)(λx1:m+1.infxm+2:∞∈∏∞i=m+2Cif(pr∏i=n+1i=0Xi(x0,x1:m+1,xm+2:∞))))
Now, when m=n, the inner function turns into
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(pr∏i=n+1i=0Xi(x0,x1:n+1,xn+2:∞)))
=K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x0,x1:n+1))
=K:n(x0)(λx1:n+1.f(x0,x1:n+1))
And, when m>n, we'll use as our induction assumption that
K:m(x0)(λx1:m+1.infxm+2:∞∈∏∞i=m+2Cif(pr∏i=n+1i=0Xi(x0,x1:m+1,xm+2:∞)))
=K:n(x0)(λx1:n+1.f(x0,x1:n+1))
and show the same result for
K:m+1(x0)(λx1:m+2.infxm+3:∞∈∏∞i=m+3Cif(pr∏i=n+1i=0Xi(x0,x1:m+2,xm+3:∞)))
So let's begin rewriting this. By unpacking K:m+1(x0) as
K:m(x0)⋉(λx1:m+1.Km+1(x0,x1:m+1))
we get:
=K:m(x0)(λx1:m+1.Km+1(x0,x1:m+1)(λxm+2.infxm+3:∞∈∏∞i=m+3Cif(pr∏i=n+1i=0Xi(x0,x1:m+1,xm+2,xm+3:∞))))
And then, because m+2>n+1 if m>n, that chunk gets clipped off by the projection, and we have
=K:m(x0)(λx1:m+1.Km+1(x0,x1:m+1)(λxm+2.infxm+3:∞∈∏∞i=m+3Cif(pr∏i=n+1i=0Xi(x0,x1:m+1))))
=K:m(x0)(λx1:m+1.Km+1(x0,x1:m+1)(λxm+2.f(pr∏i=n+1i=0Xi(x0,x1:m+1))))
And then, since there's no dependence on xm+2, it's treated as a constant, and due to C-additivity of Km+1, we can pull it out of the Km+1(x0,x1:m+1) to yield
=K:m(x0)(λx1:m+1.f(pr∏i=n+1i=0Xi(x0,x1:m+1)))
And then, because adding any further coordinates would get clipped away by the projection, we can minimize over further coordinates (it does nothing) to get
=K:m(x0)(λx1:m+1.infxm+2:∞∈∏∞i=m+2Cif(pr∏i=n+1i=0Xi(x0,x1:m+1,xm+2:∞)))
and then by our induction assumption
=K:n(x0)(λx1:n+1.f(x0,x1:n+1))
So, past n, the value of that inner function freezes, and it determines the limit. Therefore, we have the result that
h(λx0.limm→∞K:m(x0)(λx1:m+1.infxm+2:∞∈∏∞i=m+2Cif(pr∏i=n+1i=0Xi(x0,x1:m+1,xm+2:∞))))
=h(λx0.K:n(x0)(λx1:n+1.f(x0,x1:n+1)))
=(h⋉K:n)(f)
Therefore,
pr∏i=n+1i=0Xi∗(h⋉K:∞)(f)=(h⋉K:n)(f)
Regardless of f, and we have our result.
For further progress, we need to put in some work on characterizing how infrakernels work. We'll be shamelessly stealing a result about the infra-KR metric from proofs later in this post, because no cyclic dependencies show up.
Lemma 4: For an infrakernel K:X→Y, if xn limits to x, then if you let K(xn)b∗ be the set
{(m,b)∈K(xn)|b≤b∗} then limn→∞K(xn)b∗=K(x)b∗.
Interestingly enough, the only result we need to steal from later-number propositions is that, if a sequence of infradistributions Hn limits to H in Hausdorff-distance, then for all functions f, Hn(f) limits to H(f), which is proposition 45.
We'll be working backwards by taking our goal, and repeatedly showing that it can be shown if a slightly easier result can be shown, which eventually grounds out in something we can prove outright. So, our goal is: limn→∞K(xn)b∗=K(x)b∗
Now, we want b∗ to be an arbitrary integer. Let dhau∘(H,H′) (distance metric between two infradistribution sets) be defined as:
∑b≥22−bdhau(Hb,H′b)
Where dhau is the Hausdorff distance between sets (the two truncated sets cut off at some b value), according to some KR-metric on measures, which is induced by a metric on the underlying space. It doesn't really matter which metric you use; any one will work.
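For concreteness, here is a finite-set sketch of this truncated-Hausdorff construction (toy data; real a-measures live in an infinite-dimensional space, and the nonempty-truncation fallback below is purely to keep the toy well-defined):

```python
import math

def dhau(A, B):
    """Hausdorff distance between finite nonempty point sets (Euclidean base metric)."""
    return max(max(min(math.dist(a, b) for b in B) for a in A),
               max(min(math.dist(a, b) for a in A) for b in B))

def truncate(H, b_star):
    # H_b = {(m, b') in H | b' <= b_star}; fall back to a point to stay nonempty (toy hack)
    return [(m, b) for m, b in H if b <= b_star] or H[:1]

def dhau_circ(H, Hp, b_max=20):
    # dhau_circ(H, H') = sum over b >= 2 of 2^-b * dhau(H_b, H'_b), cut off at b_max
    return sum(2 ** -b * dhau(truncate(H, b), truncate(Hp, b))
               for b in range(2, b_max + 1))

# (mass, b) pairs standing in for a-measures:
H1 = [(1.0, 0.0), (0.5, 3.0)]
H2 = [(1.0, 0.0), (0.5, 3.5)]
assert dhau_circ(H1, H1) == 0.0
assert abs(dhau_circ(H1, H2) - dhau_circ(H2, H1)) < 1e-12   # symmetry
```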
Now, assuming you had the result:
limn→∞dhau∘(K(xn),K(x))=0
Then, by the definition of that distance metric, and b∗ being an integer, said distance converging to 0 would imply your desired
limn→∞K(xn)b∗=K(x)b∗
by the definition of the modified Hausdorff metric. So, let's try to prove that the modified Hausdorff distance limits to 0. The best way to do this is by assuming that it actually doesn't limit to 0, and deriving a contradiction. So, let's change our proof goal to try to derive bottom, and assume that there is some ϵ and some subsequence of xn where K(xn) always stays ϵ away from the set K(x) according to the modified Hausdorff metric.
At this point, given any value of b≥2, we can notice that K(xn)b, due to the compact-almost-support condition on our compact collection of infrakernels (the K(xn) sequence), has, for each ϵ, some compact set Cϵ⊆Y that is an ϵ-almost-support for all the measure components of the K(xn). Also, due to the Lipschitz-boundedness condition, there's an upper bound on the amount of measure present. This is a necessary-and-sufficient condition for the measure components of the infrakernel family K(xn) to lie in a compact set of measures. Further, the b upper bound means that the last avenue to a failure of compactness is closed off, and all the K(xn)b lie in some compact set of a-measures. Call the set that all the K(xn)b lie in Kb. Closure of the K(xn)b sets, and being a subset of a compact set, means that they're all compact. They can then be considered as points in the space K(Kb), the space of compact subsets of Kb, which, being the space of compact subsets of a compact set equipped with the Hausdorff distance metric, is compact.
By compactness, we can pick yet another subsequence which converges in K(Kb), and we then get that the K(xn)b converge on some subsequence. This argument always works, so we can find a subsequence where the K(xn)2 converge in Hausdorff-distance, and then a subsequence of that where the K(xn)3 converge in Hausdorff-distance, and so on, and take a diagonal (first element from first subsequence, second element from second subsequence, etc..)
And so, we eventually arrive at a subsequence of the K(xn) where, regardless of b, K(xn)b is a Cauchy sequence.
So, we can find a subsequence where the K(xn)b converge, regardless of what b is. Therefore, since
dhau∘(H,H′)=∑b≥22−bdhau(Hb,H′b)
We have that on the subsequence, K(xn) is a Cauchy sequence according to dhau∘. Also, all the K(xn) are staying ϵ away from K(x), according to dhau∘.
Assuming, hypothetically, we were able to show that our Cauchy K(xn) sequence actually converges to K(x), we'd have our contradiction.
What we'll do is show that the K(xn) sets do have a set, which we'll dub K∞, that they do limit to according to dhau∘, then we'll show that said set must agree with K(x) re: the expectations of every function, and must be an identical infradistribution, so our convergent subsequence which stays away from K(x) actually limits to it and we have a contradiction.
So, let's specify the sets. For each b≥2, let Kb∞ be limn→∞K(xn)b. (For our convergent subsequence, this set is always well-defined)
What we want to show is that:
(⋃b′Kb′∞)b=Kb∞
Ie, our proposed limit set of the convergent K(xn) sequence (according to dhau∘) is ⋃b′Kb′∞. And if we can show that chopping this set off at any b value in particular makes the limit of the chopped-off sets, then the modified Hausdorff-metric limits to 0, and this is indeed the limit point of our K(xn) sequence according to the modified Hausdorff-metric.
One direction of this is very easy, we trivially have
(⋃b′Kb′∞)b⊇Kb∞
For the other direction, fix a particular b where equality fails. Then there exists some point (m,b′′)∈(⋃b′Kb′∞)b which doesn't lie in Kb∞. From this, we know that b′′≤b and there is some b′ where (m,b′′)∈Kb′∞, so we also have b′′≤b′. Now, if b′′<b, and yet (m,b′′)∈Kb′∞, then there is some Cauchy sequence from the K(xn)b′ that limits to (m,b′′), and eventually the bn terms of this sequence will drop below b itself, so they will all start being present in K(xn)b, giving you a sequence of points in that sequence of sets that limits to (m,b′′), witnessing that said point lies in Kb∞, the limit of K(xn)b. However, what if b′′=b? Then, there's a Cauchy sequence from the K(xn)b′ that limits to (m,b′′), and eventually the bn terms of the sequence will approach b itself; mixing each term a tiny bit with some point in K(xn) with b=0 produces a nearby point which still undershoots the cutoff, and this adjusted sequence still limits to (m,b′′), again witnessing that said point lies in Kb∞. That's both cases down, so we've shown
(⋃b′Kb′∞)b=Kb∞
For all the b. Accordingly, we now know that the K(xn) sequence limits to this set.
Now, we just have to show that said limit set equals K(x), in order to derive our desired contradiction. We do this by letting f be arbitrary, letting λ⊙K be the Lipschitz constant of the infrakernel, and computing:
(⋃bKb∞)(f)=(⋃b′Kb′∞)2λ⊙K||f||(f)
=K2λ⊙K||f||∞(f)=limn′→∞K2λ⊙K||f||(xn′)(f)
=limn′→∞K(xn′)(f)=limn→∞K(xn)(f)=K(x)(f)
So, we've got six equalities. Equality 2 is what we just showed, equality 5 is because the limit for a subsequence is the same as the limit for the original sequence, and equality 6 comes from the pointwise-convergence property for infradistributions, so that leaves equalities 1, 3, and 4. Equalities 1 and 4 can be dispatched by observing that there's some fixed λ⊙K upper-bound on the Lipschitz constant/amount of measure present in the infradistribution points, so if there was a minimizing point for the expectation of f in any of these infradistributions with b≥2λ⊙K||f||, the expectation value of f would be so high it'd violate the Lipschitz bound on the infradistribution. Thus, clipping off the b value at this height doesn't change the expectation of the function f. Finally, there's equality 3, which is addressed with Proposition 45, because the upper completions of said sets limit to each other in Hausdorff-distance.
We have now derived a contradiction, so our desired result of limn→∞K(xn)b∗=K(x)b∗ for all b∗ holds. We'll be using this.
Proposition 24: EH⋉K(f)=EH(λx.EK(x)(λy.f(x,y)))
Our attempted set definition H⋉K will be:
{(m⋉s)+(0,b)|(m,b)∈H,s∈L1(X,m,M±(Y)⊕R),∀x:s(x)∈K(x)}
So, we should be precise about this. We start with a (m,b)∈H where m is actually a measure. Then we fix a selection function mapping x to a point in K(x). Due to X and the cone of a-measures being separable, weak measurability and strong measurability coincide, as do the Bochner and Pettis integrals, so we can just talk about "measurability" and "the integral". Said choice function, due to lying in L1(X,m,M±(Y)⊕R) is measurable (in this case, we're equipping M±(Y)⊕R with the Borel σ-algebra). Further, said selection function doesn't "run off to infinity", ie, ∫X||s(x)||dm<∞. This measurability and being L1 integrable in a suitable sense means the Bochner integral is well defined, so ∫Xs(x)dm is well-defined and does indeed denote a unique point.
Now that we know how the selection function behaves, we should clarify what m⋉s means. Semidirect products are technically only defined when s is a Markov kernel. But we're working with measures, so we can lift the restriction that for all x, s(x) is a probability distribution. So, we need to just verify the first measurability condition on the Wikipedia page for "Markov kernel" instead. But wait, s(x) needs to be a measure for this to work! Instead, it's an element of M±(Y)⊕R! Well... we can consider that last coordinate to be "amount of measure on a special disconnected point", and M±(Y)⊕R is then isomorphic to M±(Y+1). Taking this variant, the semidirect product (assuming the measurability condition) would then be a measure in M±(X×(Y+1)). Then we just apply the projection mapping X×(Y+1)→(X×Y)+1, which maps (x,y) to (x,y) and maps (x,b) to our special disconnected b point. So now we have something in M±((X×Y)+1), which, again, is isomorphic to M±(X×Y)⊕R. That's basically what's going on when we take the semidirect product w.r.t. a selection function: a couple of isomorphisms and type conversions are happening in the background.
So, first, to show that this is even well-defined, we need to verify the relevant measurability condition for s to do semidirect product with it. Said measurability condition is "for all Borel B⊆Y+1, x↦s(x)(B) is a measurable function".
Now, here's the argument. We can split this up into two parts. First, there's the function x↦s(x) of type X→M±(Y+1), and then there's the function m↦m(B), of type M±(Y+1)→R. The function we're trying to show is measurable is the composition of these two. So if we can show they're both measurable, we're good, we've verified the condition. We immediately know that the first function is measurable, because our selection function has to be. So, we just need to check the second function. The tricky part is that we've got the weak topology on M±(Y+1), not the strong topology. The definition of a measurable function is that the preimage of a measurable set in the target is measurable in the source. The weak topology has fewer open sets, so it's got fewer measurable sets, so it's harder for the preimage of a measurable set to be measurable. Fortunately, by the answer to this mathoverflow question (and it generalizes, because the answer didn't essentially use the "probability distribution" assumption in the setup, only properties of the weak topology), m↦m(B) is indeed measurable with the σ-algebra induced by the weak topology on the space of finite signed measures if B is Borel-measurable. So, the composition is measurable, and we've verified the condition to make a legit semidirect product.
Now, let's show that this set has the same expectation values as the semidirect product as defined for the functionals. H⋉K was defined as
{(m⋉s)+(0,b)|(m,b)∈H,s∈L1(X,m,M±(Y)⊕R),∀x:s(x)∈K(x)}
This is the definition of our set. Let's minimize a function over it.
inf(m,b)∈H⋉K(m(f)+b)
Now, we can minimize over (m,b) and our selection function separately; the selection function must come later since it depends on m. Further, the b component of m⋉s can be written as (m⋉s)↓X×1(1), because we're folding all the measure over X×1 into the value of a single point. The downwards arrow is restriction. We'll abbreviate L1(X,m,M±(Y)⊕R) as L1(m) for readability.
=inf(m,b)∈Hinfs∈L1(m)((m⋉s)↓X×Y(f)+(m⋉s)↓X×1(1)+b)
Then, we can unpack the value of a semidirect product evaluating a function like our usual thing, producing...
=inf(m,b)∈H(infs∈L1(m)(m(λx.s(x)↓Y(λy.f(x,y)))+m(λx.s(x)↓1(1)))+b)
=inf(m,b)∈H(infs∈L1(m)(m(λx.s(x)↓Y(λy.f(x,y))+s(x)↓1(1)))+b)
Now, hypothetically, if
infs∈L1(m)m(λx.s(x)↓Y(λy.f(x,y))+s(x)↓1(1))=m(λx.infm′,b′∈K(x)m′(λy.f(x,y))+b′)
then we could proceed further. So let's show that. First, observe that if we replace = with ≥, the above line is true, because the selection function is always going to pick (for each x) a point from K(x), which will have a value as-high-or-higher than the worst-case point from K(x). So, regardless of selection function, we have the ≥ version of the above line being true. So now we must establish that having a > in the above line is impossible. Given any ϵ, we'll craft a selection function where the two sides differ by only ϵ(λ⊙K||f||+2+λ⊙K), and we can let ϵ limit to 0 as everything else is fixed, yielding our result.
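When everything is finite, the targeted equality — the infimum over selection functions of the integrated value equals the integral of the pointwise infimum — can be verified exactly, with no measurability subtleties (toy data, invented for illustration; the numbers stand in for the values m′(λy.f(x,y))+b′ available at each x):

```python
from itertools import product

m = {"x1": 0.5, "x2": 0.3, "x3": 0.2}            # masses of m on a finite X
K = {"x1": [1.0, 0.4], "x2": [0.7, 0.9], "x3": [0.2, 0.3, 0.1]}

xs = list(K)
# inf over all selection functions s (one choice from K(x) per x):
sel_inf = min(sum(m[x] * v for x, v in zip(xs, combo))
              for combo in product(*(K[x] for x in xs)))
# integral of the pointwise infimum:
pointwise = sum(m[x] * min(K[x]) for x in xs)
assert abs(sel_inf - pointwise) < 1e-12
```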
Our fundamental tool here will be the Kuratowski-Ryll-Nardzewski measurable selection theorem to measurably choose a suitable point from each K(x), getting us our selection function. We do have to be careful, though, because there's one additional property we need. Namely, that ∫X||s(x)||dm<∞, for said selection function to be L1.
Our multifunction for KRN-selection (given an ϵ) can be regarded as being split into two parts. So, given our m∈M+(X) from earlier, there must exist some compact set CXϵ⊆X which accounts for all but ϵ of its measure.
Accordingly, our multifunction ψ for KRN-selection will be:
If x∉CXϵ, then
ψ(x):={(m,b)∈K(x)|b≤2}
If x∈CXϵ, then ψ(x) is the set:
K(x)∩{(m,b)|m(λy.f(x,y))+b≤infm′,b′∈K(x)(m′(λy.f(x,y))+b′)+ϵ}
This is basically "on a compact set, act in a nearly worst-case way, and outside the compact set, just pick some point with bounded b value".
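A finite sketch of this two-case multifunction (all numbers invented; in each tuple, the first entry stands in for the total value m′(λy.f(x,y))+b′ and the second for b′):

```python
eps = 0.05
K = {
    "c1":  [(0.40, 1.0), (0.43, 0.5), (0.90, 0.1)],
    "c2":  [(0.10, 0.2), (0.50, 3.0)],
    "far": [(0.70, 5.0), (0.95, 1.5)],
}
compact = {"c1", "c2"}                  # stand-in for the compact set CX_eps

def psi(x):
    if x in compact:                    # near-worst-case choices on the compact part
        best = min(v for v, b in K[x])
        return [(v, b) for v, b in K[x] if v <= best + eps]
    return [(v, b) for v, b in K[x] if b <= 2]   # bounded-b choices off it

for x in K:
    assert psi(x)                       # nonempty, so a selection exists
```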
To verify the conditions to invoke KRN-selection... well, the first one is that each point gets mapped to a closed (check) and nonempty (check) set.
In order to make KRN-selection work, we need a measurability condition on our multifunction ψ. In order to set it up, we need to show that our multifunction has closed graph for x∈CXϵ. Ie, if xn limits to x, and each (mn,bn)∈ψ(xn), and (mn,bn) limits to (m,b), then (m,b)∈ψ(x).
To be in K(x) at least, observe that any infradistribution can be characterized as "a point is included as long as it exceeds the worst-case m(f)+b values for all the f". Fixing a particular f, the K(xn)(f) limit to K(x)(f) (by pointwise convergence for an infrakernel). And the mn(f)+bn limit to m(f)+b. Since each mn(f)+bn lies above K(xn)(f) (by (mn,bn)∈K(xn)), this certifies that, regardless of f, m(f)+b≥K(x)(f) (the worst-case value), so (m,b)∈K(x).
We still have to check that in the limit, (m,b) fulfills the "almost-worst-case" defining inequality in order to lie in ψ(x). To work towards this result, we'll take another detour.
If xn limits to x and fn limits to f uniformly on all compact sets, then K(xn)(fn) limits to K(x)(f), because it gets arbitrarily close on the ϵ-almost-supports for the family K(xn), for arbitrarily low ϵ (Lemma 2). Also, K(xn)(f) limits to K(x)(f) by pointwise convergence for an infrakernel.
Further, we'll show that if (mn,bn) limits to (m,b), and fn limits uniformly to f on all compact sets, then mn(fn)+bn limits to m(f)+b. We obviously have convergence of the b terms, so that leaves the measure terms. For sufficiently large n, the gap between mn(fn) and mn(f) shrinks to 0, because compact sets of measures induce compact almost-supports for all ϵ, and we have convergence on compact sets. Also, mn(f) limits to m(f), so that checks out.
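The measure-term argument in the paragraph above is just a triangle-inequality split:

```latex
\[
|m_n(f_n) - m(f)| \;\le\;
\underbrace{|m_n(f_n) - m_n(f)|}_{\to 0 \text{ by uniform convergence on compact almost-supports}}
\;+\;
\underbrace{|m_n(f) - m(f)|}_{\to 0 \text{ by } m_n \to m}
\]
```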
Now that we have this, we must show that m(λy.f(x,y))+b≤K(x)(λy.f(x,y))+ϵ.
Well, since f is continuous, then on the compact set {xn}n∈N∪{∞}×CY (where CY can be any compact subset of Y), it's uniformly continuous, so the sequence λy.f(xn,y) limits uniformly to λy.f(x,y) on CY. Ie, λy.f(xn,y) limits uniformly to λy.f(x,y) on all compact sets, since CY was arbitrary.
Thus, by our earlier two results, we have mn(λy.f(xn,y))+bn limiting to m(λy.f(x,y))+b, and K(xn)(λy.f(xn,y)) limiting to K(x)(λy.f(x,y)). And the former sequence is always within ϵ of the latter sequence, so the same applies to the limit points. Thus, (m,b) fulfills our final condition of approximately minimizing m(λy.f(x,y))+b.
Alright, so we've verified that the function x→ψ(x) maps each x to a closed nonempty set, and it has closed graph when restricted to x∈CXϵ. Now, which condition do we need to check to invoke KRN-selection? The precise condition we need is that for every open set O in M±(Y)⊕R, the set of x where ψ(x)∩O≠∅ is measurable.
Let's think about that. We can view the graph of our multifunction as a subset of
X×(M±(Y)⊕R). Remember it's divided into two parts, one for x∈CXϵ and one for not. Also, we can take our open set O⊆M±(Y)⊕R, extend it back to get an open set of
X×(M±(Y)⊕R)
intersect with the graph of our multifunction, and project down to the space X, and we want to show that said projection is always Borel.
Said projection is the projection of the intersection of the open set with the two "chunks" ((x,ψ(x)) where x∈CXϵ, and (x,ψ(x)) where this is not the case), so it's the union of two projected sets. If we can show that the projection of the intersection of O with these two "chunks" is measurable, then the net result is measurable, because the union of two measurable sets is measurable.
For one part of it, we'll be showing that the projection of
O∩{(x,(m,b))|x∈CXϵ,(m,b)∈ψ(x)}
is an Fσ set, ie a countable union of closed sets, which is measurable. Here's how it works. That set that you're intersecting with O? Well, it isn't just closed (we know that already, we showed closed graph), it's compact. Projecting it down to X, it clearly lands in a compact set. So, we just need to show that projecting it down to M±(Y)⊕R lands in a compact set, and then we can go "huh, this set is a closed subset of the product of two compact sets, so it must be compact."
Necessary-and-sufficient conditions for a set of a-measures to be compact are:

1. A bounded amount of measure present. This holds because ψ(x) is selecting from K(x), and the K(x) sets have a uniform upper bound on the amount of measure present in them.

2. A shared compact ϵ-almost-support for all the measure components, for all ϵ. This holds because of the equivalence between compact almost-support and the measure present in points in the infradistributions, along with the compact-shared compact-almost-support property of an infrakernel: since the x are selected from a compact set, the various K(x) do have all their measure components lying in a compact set.

3. A bounded b value. This holds because the finite Lipschitz constant and finite norm of the function of interest is incompatible with approximately minimizing your function with unboundedly large b values.
So, the a-measures of the various K(x) are contained in a compact set.
Where were we? Oh right, showing that the projection of
O∩{(x,(m,b))|x∈CXϵ,(m,b)∈ψ(x)}
is an Fσ set. Pretty much, by the previous argument, we know that the set we're intersecting with O is a compact set. Also, in Polish spaces, all open sets can be written as a countable union of closed sets, so this set as a whole can be written as a countable union of (closed intersected with compact) sets. Ie, a countable union of compact sets. The projection of a compact set is compact, so the projection of the set is a countable union of compact sets (ie, a countable union of closed sets) and thus is Fσ.
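Schematically, writing G for the compact "chunk" of the graph and O=⋃nCn as a countable union of closed sets, the projection argument is:

```latex
\[
\mathrm{pr}_X(O \cap G)
= \mathrm{pr}_X\Big(\bigcup_n (C_n \cap G)\Big)
= \bigcup_n \mathrm{pr}_X(C_n \cap G)
\]
```

Each Cn∩G is compact (closed intersected with compact), projections of compact sets are compact, so the result is a countable union of compact (hence closed) sets, ie Fσ.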
Ok, that's part of things done. We still have to show that the projection of the set
O∩{(x,(m,b))|x∉CXϵ,(m,b)∈ψ(x)}
is measurable. We'll actually show something a bit stronger: that it's an open set. Here's how. You can consider ψ′ to be a function of type X→K(Ma(Y)) (the space of compact subsets of a-measures) defined as x↦{(m,b)∈K(x)|b≤2}, which matches up with how ψ is defined outside of the compact set of interest. The projection of our set of interest is:
{x|ψ(x)∩O≠∅∧x∉CXϵ}
Or, it can be rephrased as:
{x|ψ′(x)∩O≠∅}∩{x|x∉CXϵ}
The complement of a closed set is open, so if we can just show that
{x|ψ′(x)∩O≠∅}
is open, we'll be done.
Now, the topology that K(Ma(Y)) is equipped with is the Vietoris topology. Ie, a basic open set of compact sets is given by a finite collection of open sets in Ma(Y): take all the compact sets which are a subset of the union of the opens and which intersect each open.
Letting your two open sets for a Vietoris-open be the whole space itself and the set O, this shows that the set of all compact sets of a-measures which have nonempty intersection with O is open in K(Ma(Y)). So, our set of interest can be viewed as the preimage under ψ′ of an open set of compact sets. Further, ψ′ is a continuous function by Lemma 4, and the Hausdorff distance induces the Vietoris topology, so the preimage of an open set is open.
So, the set of x overall where, for a given open set O, O∩ψ(x) is nonempty, is a measurable set, and we can now invoke the Kuratowski-Ryll-Nardzewski measurable selection theorem and craft a measurable choice function s♢.
Said measurable choice function never picks points with a norm above certain bounds (it always picks from K(x), so there's a uniform bound on the amount of measure present, and the b value is either 2 or less in one case, or upper-bounded in the other case because too high of a b value would lead to violation of the Lipschitz constant), so it's in L1 and we can validly use it. And now we can go:
infs∈L1(m)m(λx.s(x)↓Y(λy.f(x,y))+s(x)↓1(1))
≤m(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
At this point, we split m into mCXϵ (the measure component on the compact set that makes up all but ϵ of its value), and m¬CXϵ, the measure component off the compact set.
=mCXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
+m¬CXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
And noticing our conditions on our ψ multifunction that we selected from, in particular how it behaves off CXϵ,
≤mCXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
+m¬CXϵ(λx.λ⊙K||f||+2)
=mCXϵ(λx.s♢(x)↓Y(λy.f(x,y))+s♢(x)↓1(1))
+ϵ(λ⊙K||f||+2)
That last part is because all but ϵ of the measure of m was captured by our compact set, so there's very little measure off it. Proceeding a bit further, and using what ψ(x) is for x∈CXϵ, particularly how close it is to the true minimum value, we have:
≤mCXϵ(λx.K(x)(λy.f(x,y))+ϵ)+ϵ(λ⊙K||f||+2)
≤m(λx.K(x)(λy.f(x,y))+ϵ)+ϵ(λ⊙K||f||+2)
=m(λx.K(x)(λy.f(x,y)))+m(ϵ)+ϵ(λ⊙K||f||+2)
≤m(λx.K(x)(λy.f(x,y)))+ϵ(λ⊙K||f||+2+λ⊙K)
=m(λx.infm′,b′∈K(x)m′(λy.f(x,y))+b′)+ϵ(λ⊙K||f||+2+λ⊙K)
m is fixed for now, and this argument works for all ϵ, so we arrive at the conclusion that
infs∈L1(m)m(λx.s(x)↓Y(λy.f(x,y))+s(x)↓1(1))=m(λx.infm′,b′∈K(x)m′(λy.f(x,y))+b′)
So now, we can proceed with our sequence of rewrites. Our last point was:
=inf(m,b)∈H∩Ma(X)(infs∈L1(m)(m(λx.s(x)↓Y(λy.f(x,y))+s(x)↓1(1)))+b)
So we'll pick up from there.
=inf(m,b)∈H(m(λx.infm′,b′∈K(x)(m′(λy.f(x,y))+b′))+b)
=inf(m,b)∈H(m(λx.EK(x)(λy.f(x,y)))+b)
=EH(λx.EK(x)(λy.f(x,y)))
And we're done, we've shown that the expectations line up, so we have the right set form of semidirect product.
Proposition 25: The direct product is associative. (h1×h2)×h3=h1×(h2×h3)
Proof:
((h1×h2)×h3)(f)
=(h1×h2)(λx,y.h3(λz.f(x,y,z)))
=h1(λx.h2(λy.h3(λz.f(x,y,z))))
=h1(λx.(h2×h3)(λy,z.f(x,y,z)))
=(h1×(h2×h3))(f)
Done.
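For a finite toy model (hypothetical: infradistributions as finite lists of (measure-dict, b) pairs, products formed by nesting expectations exactly as in the rewrite chain above), associativity can be checked numerically:

```python
def expect(h, f):
    """Worst-case expectation of f under a toy infradistribution h,
    given as a finite list of (measure-as-dict, b) pairs."""
    return min(sum(w * f(x) for x, w in m.items()) + b for m, b in h)

def lift(h):
    """View h as a functional f -> expectation, so products can be iterated."""
    return lambda f: expect(h, f)

def times(phi1, phi2):
    """Functional form of the product: evaluate f by nesting expectations."""
    return lambda f: phi1(lambda x: phi2(lambda y: f((x, y))))

# Both groupings reduce to the same triple-nested expectation.
h1 = [({0: 0.5, 1: 0.5}, 0.0), ({0: 1.0}, 0.1)]
h2 = [({0: 1.0}, 0.0), ({0: 0.2, 1: 0.8}, 0.5)]
h3 = [({0: 0.3, 1: 0.7}, 0.2)]
g = lambda x, y, z: x + 2 * y + 3 * z

left = times(times(lift(h1), lift(h2)), lift(h3))(
    lambda t: g(t[0][0], t[0][1], t[1]))
right = times(lift(h1), times(lift(h2), lift(h3)))(
    lambda t: g(t[0], t[1][0], t[1][1]))
assert abs(left - right) < 1e-12
```

The only wrinkle is the tuple re-bracketing (((x,y),z) versus (x,(y,z))), which is exactly the content of the proposition.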
Proposition 26: If h1 and h2 are C-additive, then prX∗(h1×h2)=h1 and prY∗(h1×h2)=h2
Proof:
prX∗(h1×h2)(f)=(h1×h2)(f∘prX)
=h1(λx.h2(λy.f(prX(x,y))))=h1(λx.h2(λy.f(x)))
=h1(λx.f(x))=h1(f)
That's one direction. For the second,
prY∗(h1×h2)(f)=(h1×h2)(f∘prY)
=h1(λx.h2(λy.f(prY(x,y))))=h1(λx.h2(λy.f(y)))
=h1(λx.h2(f))=h2(f)
Done.
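In the same toy finite model (hypothetical: C-additivity plus normalization amount to all measure-dicts having weights summing to 1, with minimal b equal to 0), the two projection identities can be checked directly:

```python
def expect(h, f):
    """Worst-case expectation under a toy infradistribution."""
    return min(sum(w * f(x) for x, w in m.items()) + b for m, b in h)

def product(h1, h2, f):
    """(h1 x h2)(f) via nested expectations, f taking an (x, y) pair."""
    return expect(h1, lambda x: expect(h2, lambda y: f((x, y))))

# C-additive toys: all weights sum to 1, and the minimal b is 0.
h1 = [({0: 0.5, 1: 0.5}, 0.0), ({1: 1.0}, 0.2)]
h2 = [({'a': 1.0}, 0.0), ({'a': 0.4, 'b': 0.6}, 0.1)]

fX = lambda x: float(x)                   # depends on the X coordinate only
fY = lambda y: 1.0 if y == 'a' else 3.0   # depends on the Y coordinate only

# pr_X*(h1 x h2)(fX) = h1(fX), and symmetrically for Y
assert abs(product(h1, h2, lambda t: fX(t[0])) - expect(h1, fX)) < 1e-12
assert abs(product(h1, h2, lambda t: fY(t[1])) - expect(h2, fY)) < 1e-12
```

C-additivity is what makes the inner expectation of a constant collapse to that constant, mirroring the h2(λy.f(x)) = f(x) step above.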
Proposition 27: EK∗(H)(f)=EH(λx.EK(x)(f))
Proof of well-definedness of the pushforward: We'll use the same notational conveniences as in the proof of well-definedness of the product.
Our attempted set definition K∗(H) will be:
⋃(m,b)∈H(EmK(x)+(0,b))
Where EmK(x) equals
{(m′,b′)|∃s∈L1(m):∫Xs(x)dm=(m′,b′),∀x:s(x)∈K(x)}
The usual considerations about choice functions and weak measurability/strong measurability coinciding, along with the Bochner and Pettis integrals, means we aren't being vague here. The choice function is measurable and doesn't run off to infinity. Fortunately, we don't have to do the weird type conversions and isomorphisms from the semidirect product case, or do measurability arguments.
And now we can go:
inf(m,b)∈K∗(H)(m(f)+b)
=inf(m,b)∈H(inf(m′,b′)∈EmK(x)(m′(f)+b′)+b)
=inf(m,b)∈H(infs∈L1(m)((∫Xs(x)dm)↓Y(f)+(∫Xs(x)dm)↓1(1))+b)
=inf(m,b)∈H(infs∈L1(m)((s∗(m))↓Y(f)+(s∗(m))↓1(1))+b)
=inf(m,b)∈H(infs∈L1(m)(m(λx.s(x)↓Y(f))+m(λx.s(x)↓1(1)))+b)
=inf(m,b)∈H(infs∈L1(m)(m(λx.s(x)↓Y(f)+s(x)↓1(1)))+b)
Now, hypothetically, if
infs∈L1(m)m(λx.s(x)↓Y(f)+s(x)↓1(1))=m(λx.infm′,b′∈K(x)m′(f)+b′)
We could proceed further. And we can run through the exact same argument as from the semidirect product case to establish this equality; I'm not typing it again. So at this point, we're at
=inf(m,b)∈H(m(λx.infm′,b′∈K(x)(m′(f)+b′))+b)
=inf(m,b)∈H(m(λx.K(x)(f))+b)
=EH(λx.EK(x)(f))
And the expectations match up and we're done.
Proposition 28: Ek∗(H)(f)=EH(λx.Ek(x)(f))
This is an easy one: k∗(H):={(k∗(m),b)|(m,b)∈H}. So then
Ek∗(H)(f)=inf(m,b)∈Hk∗(m)(f)+b
=inf(m,b)∈Hm(λx.k(x)(f))+b
=EH(λx.k(x)(f))
=EH(λx.Ek(x)(f))
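Proposition 28 also admits a quick finite sanity check (hypothetical toy: k maps points to probability dicts over Y, and the pushforward acts on each measure component of H):

```python
def expect(h, f):
    """Worst-case expectation under a toy infradistribution."""
    return min(sum(w * f(x) for x, w in m.items()) + b for m, b in h)

def push_measure(k, m):
    """Pushforward of a single measure-dict m along the kernel k."""
    out = {}
    for x, w in m.items():
        for y, p in k(x).items():
            out[y] = out.get(y, 0.0) + w * p
    return out

def push(k, h):
    """k_*(H) := {(k_*(m), b) | (m, b) in H}."""
    return [(push_measure(k, m), b) for m, b in h]

h = [({0: 0.5, 1: 0.5}, 0.0), ({1: 1.0}, 0.3)]
k = lambda x: {'a': 0.5, 'b': 0.5} if x == 0 else {'a': 1.0}
f = lambda y: 1.0 if y == 'a' else 2.0

lhs = expect(push(k, h), f)  # E_{k_*(H)}(f)
rhs = expect(h, lambda x: sum(p * f(y) for y, p in k(x).items()))  # E_H(λx.E_{k(x)}(f))
assert abs(lhs - rhs) < 1e-12
```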