Lemma 3: If F is a continuous function of type X→K(Y), where K(Y) is the space of nonempty compact subsets of the space Y, then given any compact set CX⊆X, ⋃x∈CXF(x) will be compact in Y.
Fix some compact set CX⊆X, and continuous function F:X→K(Y). We will operate by taking an arbitrary open cover of ⋃x∈CXF(x) and finding a finite subcover.
Let {Oi}i∈I be an open cover of ⋃x∈CXF(x). The Oi are subsets of Y. The topology compatible with Hausdorff distance on K(Y) (space of compact subsets of Y) is the Vietoris topology, where the basis opens are given by finite collections of open sets in Y. For each such collection, you take the set of all compact subsets of Y which are subsets of the union of your finite collection of open sets, and which intersect every open set in your finite collection.
Accordingly, let J=Pfin(I) (the set of all finite subsets of I, the index set for our open cover), and fix a collection of open sets in K(Y), {Oj}j∈J. The sets Oj are defined as:
Oj:={CY∈K(Y)|CY⊆⋃i∈jOi∧∀i∈j:CY∩Oi≠∅}
Now, all the F(x) with x∈CX are compact (F produces compact sets as output), and they are all subsets of ⋃x∈CXF(x), so {Oi}i∈I is a cover of F(x). Due to its compactness, we can identify a finite subcover, and prune away every open set which doesn't intersect F(x). F(x) is a subset of the union of those finitely many open sets, and intersects all of them, so the point F(x)∈K(Y) lies in the open set Oj induced by that finite cover of open sets.
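To make the "prune and land in Oj" step concrete, here is a minimal sketch in a finite toy model, assuming Y is a finite set of integers with the discrete topology (so every subset is open and compact, and no finite-subcover step is needed before pruning). The names `induced_index` and `in_basic_open` are hypothetical helpers, not anything from the original argument.

```python
from typing import FrozenSet, List, Set


def induced_index(compact: Set[int], cover: List[Set[int]]) -> FrozenSet[int]:
    """Indices i whose set O_i meets `compact`: the pruned subcover j."""
    return frozenset(i for i, o in enumerate(cover) if o & compact)


def in_basic_open(compact: Set[int], j: FrozenSet[int], cover: List[Set[int]]) -> bool:
    """Membership test for O_j: inside the union of the O_i, and meeting each O_i."""
    union = set().union(*(cover[i] for i in j))
    return compact <= union and all(cover[i] & compact for i in j)


# Toy cover of Y = {0,...,5} and one compact value F(x).
cover = [{0, 1}, {1, 2}, {3, 4}, {5}]
F_x = {1, 2}
j = induced_index(F_x, cover)

# F(x) lies in the basic open set built from its own pruned subcover.
print(sorted(j), in_basic_open(F_x, j, cover))
```

In this toy model the pruned index set is just "every cover element that touches F(x)"; in a general space you would first extract a finite subcover by compactness and then prune.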
This argument works for arbitrary F(x) with x∈CX, so the collection {Oj}j∈J is an open cover of F(CX). Also, because F is continuous and CX is compact, F(CX) is compact, so we can identify a finite subcover from {Oj}j∈J.
Then, consider the collection of open sets Oi where i∈j for some Oj which is part of the finite cover of F(CX). This is finitely many opens, because we're unioning together finitely many (finitely many Oj selected) finite sets of open sets (each Oj is associated with finitely many Oi that it was built from).
Now we just have to show that this collection covers ⋃x∈CXF(x), and we'll have made our finite subcover and shown that said set is compact. Assume our finite collection of opens doesn't cover the set. Then there's some F(x) which wasn't covered completely. However, the point corresponding to F(x) in K(Y) lies in some Oj, and from its definition, the corresponding Oi manage to cover F(x), and we have a contradiction. We're done.
Proposition 19: h⋉K is an infradistribution, and preserves all properties indicated in the diagram at the start of this section if h and all the K(x) have said property.
To show this, we'll verify that it's well-defined at all, normalization, monotonicity, concavity, Lipschitzness, compact almost-support, and preservation of the properties.
(h⋉K)(f):=h(λx.K(x)(λy.f(x,y)))
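As a sanity check on the definition, here is a small sketch on finite sets. It assumes a toy model in which an infradistribution is a finite list of probability distributions and its value on a function is the worst-case expectation (the crisp case); `expect`, `semidirect`, and the example data are illustrative names, not part of the original text.

```python
from typing import Callable, Dict, Hashable, List

Dist = Dict[Hashable, float]   # probability distribution on a finite set
Crisp = List[Dist]             # toy infradistribution: worst case over these


def expect(crisp: Crisp, f: Callable[[Hashable], float]) -> float:
    """Worst-case expectation of f over the listed distributions."""
    return min(sum(p * f(pt) for pt, p in mu.items()) for mu in crisp)


def semidirect(h: Crisp, K: Callable[[Hashable], Crisp]):
    """(h ⋉ K)(f) := h(λx. K(x)(λy. f(x, y)))."""
    def value(f: Callable[[Hashable, Hashable], float]) -> float:
        return expect(h, lambda x: expect(K(x), lambda y: f(x, y)))
    return value


h = [{"a": 0.5, "b": 0.5}]
kernel_table = {"a": [{0: 1.0}], "b": [{0: 1.0}, {1: 1.0}]}


def K(x):
    return kernel_table[x]


def f(x, y):
    return y + (1 if x == "b" else 0)


# Inner values: K(a) gives 0, K(b) gives min(1, 2) = 1; h then averages them.
print(semidirect(h, K)(f))
```

The inner call evaluates K(x) on the slice λy.f(x,y) for each x, and the outer call feeds the resulting function of x to h, exactly mirroring the nesting in the definition.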
Our first order of business is verifying that
λx.K(x)(λy.f(x,y))
is even a continuous function to be able to show that h can accept it as input.
For continuity, let xn limit to x, and we'll try to show that K(xn)(λy.f(xn,y)) limits to K(x)(λy.f(x,y)). Let λ⊙K be the Lipschitz constant upper bound of K.
Pick an ϵ. We'll show that there's some m0 where
∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
First, note that {xn}n∈N∪{x} is a compact set because xn limits to x. Thus, by the compact-shared compact almost-support condition on an infrakernel, there must be some compact set Cϵ⊆Y where all the K(xn) agree that functions f,f′ agreeing on Cϵ have values only ϵd(f,f′) apart from each other.
Now, because f is a continuous bounded function X×Y→R, it's uniformly continuous when restricted to
({xn}n∈N∪{x})×Cϵ
as this is the product of two compact sets and is compact. Due to the uniform continuity of f restricted to that set, there is some number δ where points only δ apart in that set have their values only differing by ϵ. Further, there is some number m0 where, for all m≥m0, d(xm,x)<δ.
Additionally, the maximum difference between λy.f(x,y) and λy.f(x′,y) is 2||f||.
Now that we know our number m0 we can pick an arbitrary m above it, and go:
∀m≥m0∀y∈Cϵ:d((xm,y),(x,y))=d(xm,x)≤δ
∀m≥m0∀y∈Cϵ:|f(xm,y)−f(x,y)|≤ϵ
∀m≥m0:d((λy.f(xm,y))↓Cϵ,(λy.f(x,y))↓Cϵ)≤ϵ
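And now, because these two functions restricted to Cϵ are only ϵ apart, and their overall distance is at most 2||f||, we can apply Lemma 2 to conclude (since Cϵ and λ⊙K work for all the K(xn)) that
∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
For each ϵ we can construct an m0 in this way, and taking m=n in the tail of the sequence, we conclude that
limn→∞|K(xn)(λy.f(xn,y))−K(xn)(λy.f(x,y))|=0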
Also, from our pointwise convergence condition on infrakernels,
limn→∞K(xn)(λy.f(x,y))=K(x)(λy.f(x,y))
Therefore,
limn→∞K(xn)(λy.f(xn,y))=K(x)(λy.f(x,y))
and so, we now know that
λx.K(x)(λy.f(x,y))
is a continuous function X→R. For boundedness, upper and lower bounds on λy.f(x,y) are ||f|| (and the negative version of it). Due to the shared Lipschitz constant on the K(x), an upper and lower-bound on λx.K(x)(λy.f(x,y)) is λ⊙K||f|| (and the negative version.) Thus, we can safely feed said function into the infradistribution h, so the semidirect product is well-defined. We must still show that it makes an infradistribution.
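For concavity, given any p∈[0,1] and functions f, f′, we have:
$$(h \ltimes K)(pf + (1-p)f') = h\big(\lambda x. K(x)(\lambda y. pf(x,y) + (1-p)f'(x,y))\big)$$
$$\geq h\big(\lambda x. pK(x)(\lambda y. f(x,y)) + (1-p)K(x)(\lambda y. f'(x,y))\big)$$
$$\geq p\,h\big(\lambda x. K(x)(\lambda y. f(x,y))\big) + (1-p)\,h\big(\lambda x. K(x)(\lambda y. f'(x,y))\big) = p(h \ltimes K)(f) + (1-p)(h \ltimes K)(f')$$
In order, this was the definition of the semidirect product, all the K(x) being concave so splitting them up produces a lower value (and then monotonicity for h), then h being concave.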
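This leaves Lipschitzness and CAS. For Lipschitzness, given some f and f′, and letting λ⊙h be the Lipschitz constant of h, we have:
$$|(h \ltimes K)(f) - (h \ltimes K)(f')| = |h(\lambda x. K(x)(\lambda y. f(x,y))) - h(\lambda x. K(x)(\lambda y. f'(x,y)))|$$
$$\leq \lambda^{\odot}_h \sup_x |K(x)(\lambda y. f(x,y)) - K(x)(\lambda y. f'(x,y))|$$
$$\leq \lambda^{\odot}_h \lambda^{\odot}_K \sup_{x,y} |f(x,y) - f'(x,y)| = \lambda^{\odot}_h \lambda^{\odot}_K\, d(f,f')$$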
Thus, that final thing shows that there's a finite Lipschitz constant for h⋉K.
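This leaves compact almost-support. Pick any ϵ. This induces a compact set CXϵ which is an ϵ-almost-support for h, and then this compact set induces a compact set CYϵ which is an ϵ-almost-support for all the K(x) where x∈CXϵ. Let f and f′ agree on CXϵ×CYϵ. Now, we can apply Lemma 2 to go:
$$|(h \ltimes K)(f) - (h \ltimes K)(f')| \leq \epsilon\, d\big(\lambda x. K(x)(\lambda y. f(x,y)),\ \lambda x. K(x)(\lambda y. f'(x,y))\big) + \lambda^{\odot}_h \sup_{x \in C^X_\epsilon} |K(x)(\lambda y. f(x,y)) - K(x)(\lambda y. f'(x,y))|$$
Pretty much, that first part is the "CXϵ is an ϵ-almost-support for h" piece, and the second piece is the "hey, these two functions may be a bit different on said compact set, we've gotta multiply that by the Lipschitz constant" piece. So, let's work on unpacking these two distances. For the first one, we can go:
$$d\big(\lambda x. K(x)(\lambda y. f(x,y)),\ \lambda x. K(x)(\lambda y. f'(x,y))\big) = \sup_x |K(x)(\lambda y. f(x,y)) - K(x)(\lambda y. f'(x,y))| \leq \lambda^{\odot}_K\, d(f,f')$$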
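And, because f and f′ agree on CXϵ×CYϵ, we have λy.f(x,y) and λy.f′(x,y) agreeing on CYϵ, which is an ϵ-almost-support for all the K(x) where x∈CXϵ, so we have:
$$\sup_{x \in C^X_\epsilon} |K(x)(\lambda y. f(x,y)) - K(x)(\lambda y. f'(x,y))| \leq \sup_{x \in C^X_\epsilon} \epsilon\, d(\lambda y. f(x,y), \lambda y. f'(x,y))$$
$$= \epsilon \sup_{x \in C^X_\epsilon} \sup_y |f(x,y) - f'(x,y)|$$
$$\leq \epsilon \sup_{x,y} |f(x,y) - f'(x,y)| = \epsilon\, d(f,f')$$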
Substituting this back in produces:
≤ϵλ⊙Kd(f,f′)+ϵλ⊙hd(f,f′)
And regrouping this and recapping means that we have:
|(h⋉K)(f)−(h⋉K)(f′)|≤ϵ(λ⊙K+λ⊙h)d(f,f′)
So we have crafted a compact ϵ(λ⊙K+λ⊙h)-almost-support for h⋉K, and we can make ϵ arbitrarily small, so the semidirect product has compact almost-support, which is the last condition we needed.
1-Lipschitz: We showed in the Lipschitz section that an upper bound on the Lipschitz constant of h⋉K is the product of the Lipschitz constants of the kernel and the original infradistribution, so 1⋅1=1 and 1-Lipschitzness is preserved.
Our task now is to show that ⋃x∈Ch({x}×CK(x)) is compact, which will take a fair amount of topology work. Our first piece that we'll need is that if xn limits to x, then CK(xn) limits to CK(x) in Hausdorff-distance.
To show this, we'll split it into two parts. First, we'll assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x) and disprove that. Second, we'll assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn), and disprove that.
For the first part, assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). Craft the continuous function
$$f_1 := \lambda y. \sup\left(1 - \frac{1}{\epsilon} \inf_{y' \in C_{K(x)}} d(y, y'),\ 0\right)$$
What this does is it's 1 on the set CK(x), and 0 on anything more than ϵ away from it. One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), so:
limn→∞infy∈CK(xn)f1(y)=infy∈CK(x)f1(y)
The latter term is 1 because f1 is 1 over CK(x). However, because we're assuming that, infinitely often, there's a point in CK(xn) that is at least ϵ away from CK(x), the sequence on the left-hand side is infinitely often 0, so it can't converge to 1, and we have a contradiction.
For the second part, assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). By compactness of CK(x), we can find finitely many points yi in it s.t. every point in CK(x) is only $\frac{\epsilon}{2}$ away from one of the yi (cover CK(x) with $\frac{\epsilon}{2}$-size open balls centered on points in it and take a finite subcover). Now, for each of these, we can craft a function
$$f_i := \lambda y. \inf\left(1, \frac{2}{\epsilon} d(y, y_i)\right)$$
So, this is 0 at the point yi, and 1 at any distance $\frac{\epsilon}{2}$ or more away from it.
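The two test functions used in this argument can be sketched as follows, assuming Y is the real line with the usual distance and the compact set is a finite list of points (both are simplifying assumptions for illustration; `dist_to_set`, `f1`, and `f_i` are hypothetical names).

```python
from typing import Sequence


def dist_to_set(y: float, points: Sequence[float]) -> float:
    """Distance from y to a finite set of reals."""
    return min(abs(y - p) for p in points)


def f1(y: float, compact: Sequence[float], eps: float) -> float:
    """1 on the compact set, falling linearly to 0 at distance eps from it."""
    return max(1.0 - dist_to_set(y, compact) / eps, 0.0)


def f_i(y: float, y_i: float, eps: float) -> float:
    """0 at y_i, rising linearly to 1 at distance eps/2 from it."""
    return min(1.0, 2.0 * abs(y - y_i) / eps)


eps = 0.5
compact = [0.0, 1.0]
print(f1(1.0, compact, eps), f1(1.25, compact, eps), f1(1.5, compact, eps))
print(f_i(0.0, 0.0, eps), f_i(0.125, 0.0, eps), f_i(0.25, 0.0, eps))
```

Both functions are continuous and bounded in [0,1], which is what lets them be fed into the K(xn) and K(x).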
One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), and there are finitely many fi, so there's some time where all of them nearly converge, ie:
limn→∞supi|K(xn)(fi)−K(x)(fi)|=0
However, infinitely often there's a point yn∈CK(x) that is ϵ away from CK(xn). yn is $\frac{\epsilon}{2}$ away from some yi, so that yi can't be closer than $\frac{\epsilon}{2}$ to CK(xn). (If it was closer, then we could pick some point in CK(xn) that's closer than $\frac{\epsilon}{2}$ to yi, and then since yi is only $\frac{\epsilon}{2}$ away from yn, we'd have that the distance from yn to CK(xn) is below ϵ, an impossibility.)
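Because the distance from yi to any point in CK(xn) is at least $\frac{\epsilon}{2}$, then
$$|K(x_n)(f_i) - K(x)(f_i)| = \left|\inf_{y \in C_{K(x_n)}} f_i(y) - \inf_{y \in C_{K(x)}} f_i(y)\right| = |1 - 0| = 1$$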
This is because yi∈CK(x) and attains a value of 0 according to fi, while CK(xn) stays away from yi and all its points must have a value of 1. This situation happens infinitely often, which leads to a contradiction with
limn→∞supi|K(xn)(fi)−K(x)(fi)|=0
because, infinitely often, one of these fi has very different values, so the sequence is 1 infinitely often and can't limit to 0.
So, we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). And we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). Fixing any ϵ, in the tail of the sequence, CK(x) and CK(xn) are ϵ distance or closer in Hausdorff distance because you can't find points in either set which are far away from the other set. So, CK(xn) limits to CK(x) in Hausdorff-distance when xn limits to x, and we know that x↦CK(x) is a continuous function X→K(Y).
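For intuition about the two one-sided conditions ruled out above, here is a minimal sketch of Hausdorff distance between finite subsets of the real line (a simplifying assumption; the sets in the proof are arbitrary compact sets). The function names and the example sequence are illustrative only.

```python
from typing import Sequence


def one_sided(s: Sequence[float], t: Sequence[float]) -> float:
    """Largest distance from a point of s to the set t."""
    return max(min(abs(p - q) for q in t) for p in s)


def hausdorff(a: Sequence[float], b: Sequence[float]) -> float:
    """Hausdorff distance: the worse of the two one-sided distances."""
    return max(one_sided(a, b), one_sided(b, a))


limit_set = [0.0, 1.0]


def approx_set(n: int):
    """A toy stand-in for the set attached to x_n, converging to limit_set."""
    return [0.0, 1.0 + 1.0 / n]


for n in (2, 4, 8):
    print(n, hausdorff(approx_set(n), limit_set))
```

Each half of the proof bounds one of the two `one_sided` terms, and Hausdorff convergence is the statement that both go to 0.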
This lets us show that the set
⋃x∈Ch({x}×CK(x))
is closed, because if xn limits to x and yn∈CK(xn) and yn limits to y, we have that y∈CK(x) because CK(xn) limits to CK(x) in Hausdorff distance, so we've got a closed graph.
Also, by invoking Lemma 3, we know that
⋃x∈ChCK(x)
is compact.
Time to wrap this all up. We know that ⋃x∈Ch{x}×CK(x) is closed in X×Y from our Hausdorff limit argument. This set is also a subset of:
Ch×⋃x∈ChCK(x)
which is a product of two sets known to be compact, and so is compact. Our set is a closed subset of a compact set, so it's compact. Therefore,
⋃x∈Ch{x}×CK(x)
is a compact set, and from way back,
(h⋉K)(f)=inf(x,y)∈⋃x∈Ch({x}×CK(x))f(x,y)
And we've shown that set is compact, so h⋉K where h and all the K(x) are sharp can be written as minimizing over a compact set, so h⋉K is sharp. Thus, semidirect product preserves all the nice properties, and we're finally done with this proof.
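The "minimizing over a compact set" form can be checked on a finite example. This is a sketch assuming finite sets stand in for the compact sets Ch and CK(x), with a sharp infradistribution modeled as "take the minimum of f over the set"; `sharp`, `semidirect`, and the data are hypothetical.

```python
def sharp(compact):
    """Toy sharp infradistribution: worst case of f over a finite set."""
    return lambda f: min(f(p) for p in compact)


def semidirect(h, K):
    """(h ⋉ K)(f) := h(λx. K(x)(λy. f(x, y)))."""
    return lambda f: h(lambda x: K(x)(lambda y: f(x, y)))


C_h = ["a", "b"]
C_K = {"a": [0, 1], "b": [2]}
h = sharp(C_h)


def K(x):
    return sharp(C_K[x])


# The union over x in C_h of {x} × C_K(x).
graph = [(x, y) for x in C_h for y in C_K[x]]


def f(x, y):
    return (y - 1) ** 2 + (0 if x == "a" else 3)


lhs = semidirect(h, K)(f)
rhs = min(f(x, y) for x, y in graph)
print(lhs, rhs)
```

Nesting the two minimizations is the same as minimizing once over the set of pairs, which is the identity the proof relies on.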
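Proposition 20: If all the K(x) are C-additive, then prX∗(h⋉K)=h.
$$\mathrm{pr}_{X*}(h \ltimes K)(f) = (h \ltimes K)(\lambda x, y. f(x)) = h(\lambda x. K(x)(\lambda y. f(x))) = h(\lambda x. f(x)) = h(f)$$
This is because, since f(x) doesn't depend on y, it acts as a constant inside K(x) and C-additivity lets us pull it out.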
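A quick numerical check of this identity, under the assumption that infradistributions are modeled as finite lists of probability distributions with worst-case expectation (such toy objects send constants to themselves, which is the property the argument uses). All names and numbers here are illustrative.

```python
def expect(crisp, f):
    """Worst-case expectation of f over a finite list of distributions."""
    return min(sum(p * f(pt) for pt, p in mu.items()) for mu in crisp)


def semidirect(h, K):
    """(h ⋉ K)(f) := h(λx. K(x)(λy. f(x, y)))."""
    return lambda f: expect(h, lambda x: expect(K(x), lambda y: f(x, y)))


def project_to_X(hk):
    """Pushforward along (x, y) ↦ x: evaluate g by ignoring the y coordinate."""
    return lambda g: hk(lambda x, y: g(x))


h = [{"a": 0.5, "b": 0.5}, {"a": 0.25, "b": 0.75}]
kernel_table = {"a": [{0: 1.0}], "b": [{0: 0.5, 1: 0.5}, {1: 1.0}]}


def K(x):
    return kernel_table[x]


g_values = {"a": 2.0, "b": -1.0}


def g(x):
    return g_values[x]


projected = project_to_X(semidirect(h, K))(g)
direct = expect(h, g)
print(projected, direct)
```

Inside each K(x), the function λy.g(x) is constant, so every distribution returns g(x) and the kernel drops out.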
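Proposition 21: If $K_0, K_1, K_2, \ldots$ are a sequence of infrakernels of type $K_n : \prod_{i=0}^{n} X_i \xrightarrow{ik} X_{n+1}$, and $h$ is an infradistribution over $X_0$, then $(\ldots((h \ltimes K_0) \ltimes K_1) \ldots \ltimes K_m)$ can be rewritten as $h \ltimes K_{:m}$, where $K_{:n}$ is an infrakernel of type $X_0 \xrightarrow{ik} \prod_{i=1}^{n+1} X_i$, recursively defined as $K_{:0} := K_0$ and $K_{:n+1}(x_0) := K_{:n}(x_0) \ltimes (\lambda x_{1:n+1}. K_{n+1}(x_0, x_{1:n+1}))$.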
So, for our inductive definition,
K:0(x0):=K0(x0)
K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
Our task is to show that these are all infrakernels, by induction, and that for any infradistribution h,
(...((h⋉K0)⋉K1)...⋉Kn)=h⋉K:n
For the base case, we observe that K:0 is an infrakernel because it equals K0, which is an infrakernel, and that h⋉K0=h⋉K:0
Time for the induction step. We'll assume that K:n is an infrakernel, and show that K:n+1 is. Further, we need to show that h⋉K:n+1=(h⋉K:n)⋉Kn+1. This will show the result.
Our first requirement is showing that for all x0, K:n+1(x0) is an infradistribution.
K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
By our induction assumption, K:n(x0) is an infradistribution as K:n is an infrakernel. Further, λx1:n+1.Kn+1(x0,x1:n+1) is an infrakernel because Kn+1 is and we're just restricting it to a subset of its domain, so it keeps being an infrakernel. And we know from earlier that the semidirect product of an infradistribution and an infrakernel is an infradistribution. So that's taken care of.
Now, we must show a common Lipschitz constant, pointwise function convergence, and compact-shared compact almost-support for K:n+1 to certify that it's an infrakernel.
Starting with common Lipschitz constant, we can just note that, in our proof of Proposition 19, we saw that the Lipschitz constant of the semidirect product was upper-bounded by the product of the Lipschitz constants of the starting infradistribution and the kernel. Assuming that K:n is an infrakernel, we have that the Lipschitz constant of any K:n(x0) is upper-bounded by some constant λ⊙:n. Also, the Lipschitz constant of any Kn+1(x0,x1:n+1) is upper-bounded by some constant λ⊙n+1. Thus, λ⊙:nλ⊙n+1 is an upper-bound on the Lipschitz constant of any
K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
infradistribution, which is exactly K:n+1(x0), witnessing that K:n+1 has a uniform upper bound on its Lipschitz constants.
Time to move onto the second one, compact-shared compact almost-support.
This is the sentence that says that K:n+1 has compact-shared compact almost-support. f and f′ have type signature ∏i=n+2i=1Xi→R.
Now, this is going to be quite complicated, so pay close attention. Fix an arbitrary compact CX0⊆X0, and an arbitrary ϵ. Let λ⊙:n be the Lipschitz constant for the infrakernel K:n, and λ⊙n+1 be the Lipschitz constant for the infrakernel Kn+1.
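Due to compact-shared compact-almost-support for K:n, which holds by our induction assumption, your set CX0 induces a compact $\frac{\epsilon}{2\lambda^{\odot}_{n+1}}$-almost-support for the family of infradistributions K:n(x0) where x0∈CX0. Call said almost-support $C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(2\lambda^{\odot}_{n+1})}$.
Further, due to compact-shared compact-almost-support for Kn+1, the set
$$C^{X_0} \times C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(2\lambda^{\odot}_{n+1})}$$
induces a compact $\frac{\epsilon}{2\lambda^{\odot}_{:n}}$-almost-support for the family of infradistributions Kn+1(x0,x1:n+1) where $(x_0, x_{1:n+1}) \in C^{X_0} \times C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(2\lambda^{\odot}_{n+1})}$.
Call said almost-support $C^{X_{n+2}}_{\epsilon/(2\lambda^{\odot}_{:n})}$.
And now let your shared ϵ-almost-support for K:n+1(x0) where x0∈CX0 be:
$$C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(2\lambda^{\odot}_{n+1})} \times C^{X_{n+2}}_{\epsilon/(2\lambda^{\odot}_{:n})}$$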
We must show that said set is indeed a shared ϵ-almost-support for K:n+1(x0) where x0∈CX0. So, let f and f′ agree on said set. Then, we have:
This is just unpacking the definition of the iterated semidirect product, no issues here. Now, we use Lemma 2 and the fact that $C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(2\lambda^{\odot}_{n+1})}$ is a $\frac{\epsilon}{2\lambda^{\odot}_{n+1}}$-almost-support for K:n(x0) when x0∈CX0, to get:
first. What we can do is use that, regardless of what is picked in the supremum, we have:
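$$(x_0, x_{1:n+1}) \in C^{X_0} \times C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(2\lambda^{\odot}_{n+1})}$$
So this means that
$$C^{X_{n+2}}_{\epsilon/(2\lambda^{\odot}_{:n})}$$
is a $\frac{\epsilon}{2\lambda^{\odot}_{:n}}$-almost-support for Kn+1(x0,x1:n+1). Further, because f and f′ are identical on
$$C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(2\lambda^{\odot}_{n+1})} \times C^{X_{n+2}}_{\epsilon/(2\lambda^{\odot}_{:n})}$$
and x1:n+1 was being selected from the former of those, then the functions λxn+2.f(x1:n+1,xn+2) (and the same for f′) agree on $C^{X_{n+2}}_{\epsilon/(2\lambda^{\odot}_{:n})}$, the almost-support. So, the supremum is upper-bounded by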
And so we've shown that the functions are only ϵ times their distance apart, so the compact set we cooked up is indeed an ϵ-almost-support for K:n+1(x0) whenever x0∈CX0, and because ϵ and CX0 were arbitrary, we have compact-shared compact-almost-support for K:n+1.
Time to move onto the third one, pointwise convergence. If x0,m limits to x0,∞, we want K:n+1(x0,m)(f) to limit to K:n+1(x0,∞)(f). As usual, we use λ⊙n+1 for the Lipschitz constant of Kn+1 and λ⊙:n for the Lipschitz constant of K:n.
To begin with, fix an arbitrary ϵ and bounded continuous function f, and note that $\{x_{0,m}\}_{m \in \mathbb{N} \cup \{\infty\}}$ is a compact subset of X0. Because $K_{:n} : X_0 \xrightarrow{ik} \prod_{i=1}^{n+1} X_i$ is assumed to be an infrakernel by induction, $\{x_{0,m}\}_{m \in \mathbb{N} \cup \{\infty\}}$ acts as a compact set for it. So, by compact-shared compact-almost-support for K:n, we can find a compact set $C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(4\lambda^{\odot}_{n+1}\|f\|)}$ which is a $\frac{\epsilon}{4\lambda^{\odot}_{n+1}\|f\|}$-almost-support for all the K:n(x0,m).
Also, it is important to note that
λx:n+1.Kn+1(x:n+1)(λxn+2.f(x:n+1,xn+2))
is a continuous function (as it must be for semidirect products with Kn+1 to have the functions on the inside be continuous). Accordingly, this means that the function:
λx0,x1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
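must be uniformly continuous when restricted to the set $\{x_{0,m}\}_{m \in \mathbb{N} \cup \{\infty\}} \times C^{\prod_{i=1}^{n+1} X_i}_{\epsilon/(4\lambda^{\odot}_{n+1}\|f\|)}$. And so, by uniform continuity, given any ϵ, there is some δ difference in inputs which gives rise to at most a $\frac{\epsilon}{2\lambda^{\odot}_{:n}}$ difference in output.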
Now, here's what we'll be doing. We'll attempt to show the result that
Straight off the bat, we can apply Lemma 2 to decompose this difference into "starting Lipschitz constant times the difference of the inner functions on the compact set of interest" and "level of almost-support times the difference of the two functions", yielding:
Time to start breaking this down. First, to break down
supx1:n+1|Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
we can realize that the maximum value of one of these would be λ⊙n+1||f||, and the minimum possible value of one of these is −λ⊙n+1||f||, from Lipschitzness of Kn+1, producing an upper bound of:
=K:n(x0,∞)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))) so
limm→∞(K:n(x0,m)⋉(λx1:n+1.Kn+1(x0,m,x1:n+1)))(f)
=(K:n(x0,∞)⋉(λx1:n+1.Kn+1(x0,∞,x1:n+1)))(f)
so
limm→∞K:n+1(x0,m)(f)=K:n+1(x0,∞)(f)
And we're done, we showed pointwise convergence for K:n+1 which is the last condition necessary to show it's an infrakernel, and the induction proof goes through to show that all the K:n are infrakernels.
Now all that's left is to show that
h⋉K:n+1=(h⋉K:n)⋉Kn+1
using induction, we have the base case set up. We can go:
Proposition 22: K:∞ is an infrakernel (C-additive, specifically) if all the Kn are C-additive infrakernels. It is unchanged by altering the Ci sequence of compact sets. In addition, if all the Kn are homogeneous/cohomogeneous/crisp/sharp, then K:∞ will be so as well.
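So, $K_{:\infty} : X_0 \xrightarrow{ik} \prod_{i=1}^{\infty} X_i$ is defined as follows. Fixing an arbitrary sequence of nonempty compact sets $C_i \subseteq X_i$,
$$K_{:\infty}(x_0)(f) := \lim_{n \to \infty} K_{:n}(x_0)\left(\lambda x_{1:n+1}. \inf_{x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i} f(x_{1:n+1}, x_{n+2:\infty})\right)$$
Is it an infrakernel?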
This is going to suck unbelievably much, we're gonna need a ton of results. The game plan is:
Part 1: Show that the functions you're feeding into those infrakernels are guaranteed to be continuous, to make some progress towards showing that K:∞ is well-defined.
Part 2: Show that all the K:n are 1-Lipschitz, and also preserve all nice properties we'd want if all the Kn do (homogeneity, cohomogeneity, C-additivity, crispness, sharpness).
Part 3: Show that if a function only depends on the first n coordinates of the input, then all the K:n+m start agreeing on the expectation value of the function.
Part 4: Give a general procedure for taking a compact subset of the space X0 and making a compact subset of the space ∏∞i=1Xi with nice properties related to compact almost-support, that preserves its nice properties when projected down to any finite stage.
Part 5: Use parts 2, 3, 4, and a complicated chain of reasoning to get a result which implies that it doesn't matter which Ci sequence you pick, the limit will exist and be the same for all of them, so K:∞ actually exists and is well-defined.
Part 6: Using parts 2 and 5, clean up the normalization, monotonicity, concavity, and C-additivity properties of K:∞. Showing that all the K:∞(x0) are C-additive trivially nets the bounded Lipschitz constant property to show that K:∞ is an infrakernel and K:∞(x0) is an infradistribution.
Part 7: Use our trick from Part 4 and our freedom of picking our compact set sequence from Part 5 to show compact-shared compact almost-support for K:∞, netting us the second infrakernel property, and the compact almost-support property for all the individual components of the kernel, verifying the last condition we need to conclude that K:∞(x0) is an infradistribution.
Part 8: We recap one of the arguments for part 5, and it lets us get uniform convergence for a certain limit on any compact set, which is a critical lemma for Part 9.
Part 9: We use our result from Part 8 to invoke the Moore-Osgood theorem in order to show pointwise convergence for K:∞, wrapping up the last condition for it to be an infrakernel.
Part 10: Show that if all the K:n have some nice property, then the limit K:∞ inherits it too.
The proofs will proceed in a strange way to keep track of all the moving parts in places. We'll first present the thing we're trying to prove, and repeatedly go "we could prove it if we could prove this other thing", and keep chaining back until we get something that's easy to show.
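Proof Part 1: Our desired result is that the function $\lambda x_{1:n+1}. \inf_{x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i} f(x_{1:n+1}, x_{n+2:\infty})$ is continuous. So, letting $x^m_{1:n+1}$ limit to $x^\infty_{1:n+1}$, our task is to show that:
$$\lim_{m \to \infty} \inf_{x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i} f(x^m_{1:n+1}, x_{n+2:\infty}) = \inf_{x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i} f(x^\infty_{1:n+1}, x_{n+2:\infty})$$
Now, what we can do is consider the compact subset of $\prod_{i=1}^{\infty} X_i$ given by $\{x^m_{1:n+1}\}_{m \in \mathbb{N} \cup \{\infty\}} \times \prod_{i=n+2}^{\infty} C_i$.
And then f must be uniformly continuous on it, so given any ϵ, there is some δ where points only δ apart lead to only an ϵ difference in value. You can consider m to be big enough to guarantee that all future values of $x^m_{1:n+1}$ are within δ of $x^\infty_{1:n+1}$, and then this gets that the function values can only differ by ϵ between $(x^m_{1:n+1}, x_{n+2:\infty})$ and $(x^\infty_{1:n+1}, x_{n+2:\infty})$ if $x_{n+2:\infty} \in \prod_{i=n+2}^{\infty} C_i$, which it is. This ensures that the worst-case function values are only ϵ apart. This works for all ϵ, showing that the limit holds.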
And so, all the functions we're feeding into the K:n(x0) are continuous.
Proof Part 2: Desired result is "if all the Kn have a nice property, then all the K:n have it too".
This can be simply addressed by noting that, for the base case, because K:0=K0 and we're assuming all the Kn have (C-additivity/cohomogeneity/homogeneity/crispness/sharpness), K:0 trivially fulfills it.
And for the induction step, if we assume that K:n is 1-Lipschitz, note that:
K:n+1(x0)=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
And, by our results on semidirect products preserving nice properties, if K:n(x0) has the nice property (by induction assumption) and Kn+1 does, then we get that K:n+1(x0) preserves the same property, and it holds all the way up the K:n. And we can move on to Part 3.
Part 3: Showing that, if we go far enough out in the K:n, the value assigned to functions which only depend on finitely many inputs stabilizes. The result that we'd like to show at this point is:
Admittedly, f is not of the proper type signature to be evaluated by K:n+m(x0), but we're abusing notation so that we can feed it in anyways and it just ignores all the coordinates it doesn't need. Accordingly, fix an arbitrary x0,n,f, and our proof target will now be:
∀m∈N:K:n+m(x0)(f)=K:n+m+1(x0)(f)
Proving this would let you apply induction, because we have a base case where K:n+0(x0)(f)=K:n(x0)(f). Let m be arbitrary. Then, we can go:
This is a bit complicated. It's saying that if you pick any compact subset of X0, you can make a compact subset of $\prod_{i=1}^{\infty} X_i$ where the projection of it to coordinates 1 through n+1 acts as a compact $\epsilon\left(1 - \frac{1}{2^{n+1}}\right)$-almost-support for all the K:n(x0) infradistributions when x0 lies in your compact subset of X0. Regardless of what n is.
Accordingly, fix some CX0 and ϵ. Now, we can recursively build up compact subsets of all the Xn in the following way.
$C^{X_{n+1}}_{\epsilon/2^{n+1}} \subseteq X_{n+1}$ is a $\frac{\epsilon}{2^{n+1}}$-almost-support for all the $K_n(x_{0:n})$ where $x_{0:n} \in C^{X_0} \times \prod_{i=1}^{n} C^{X_i}_{\epsilon/2^i}$. So, basically, we're recursively building up compact subsets of $\prod_{i=0}^{n} X_i$ by taking products of earlier compact subsets (with your base case being CX0), and then going "that's a compact subset of the input to Kn, we must be able to find a compact subset of Xn+1 that's a $\frac{\epsilon}{2^{n+1}}$-almost-support for all the $K_n(x_{0:n})$ where $x_{0:n}$ lies in our compact subset of input, because of the compact-shared almost-support condition for all the Kn" to go to the next compact set.
To establish some notation to make this a bit easier, let
$$C_i[C^{X_0}, \epsilon] := C^{X_i}_{\epsilon/2^i}$$
(the i'th compact set in the sequence, defined with CX0 to start building your sequence), and let
$$C_{1:n}[C^{X_0}, \epsilon] := \prod_{i=1}^{n} C_i[C^{X_0}, \epsilon]$$
(the product of compact sets 1 through n, which is compact)
And let
$$C_{1:\infty}[C^{X_0}, \epsilon] := \prod_{i=1}^{\infty} C_i[C^{X_0}, \epsilon]$$
This is the product of all the compact sets, and is compact.
Note the dependence of these on the starting compact sets. Notice that the projection of C1:∞[CX0,ϵ] to coordinates 1 through n is exactly C1:n[CX0,ϵ].
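The reason the levels are chosen as ϵ/2, ϵ/4, ϵ/8, ... is that their partial sums never reach ϵ. Here is a small exact-arithmetic check, assuming the i'th set is an ϵ/2^i-almost-support as in the recursive construction; `support_level` is a hypothetical helper name.

```python
from fractions import Fraction


def support_level(eps: Fraction, n: int) -> Fraction:
    """Total almost-support budget spent on coordinates 1 through n."""
    return sum(eps / 2 ** i for i in range(1, n + 1))


eps = Fraction(1, 10)
for n in (1, 2, 3, 10):
    # Closed form: eps * (1 - 1/2^n), always strictly below eps.
    print(n, support_level(eps, n), eps * (1 - Fraction(1, 2 ** n)))
```

With n+1 coordinates this gives the ϵ(1−1/2^(n+1)) level quoted for the projection onto coordinates 1 through n+1.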
Now that this is established, our proof target is (using our new notation):
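Using that K:0=K0 and that $C_1[C^{X_0}, \epsilon] = C^{X_1}_{\epsilon/2}$ and $\epsilon\left(1 - \frac{1}{2}\right) = \frac{\epsilon}{2}$, our proof target is now:
$$\forall x_0 \in C^{X_0},\ f, f' \in CB(X_1):$$
$$f{\downarrow}_{C^{X_1}_{\epsilon/2}} = f'{\downarrow}_{C^{X_1}_{\epsilon/2}} \rightarrow |K_0(x_0)(f) - K_0(x_0)(f')| \leq \frac{\epsilon}{2}\, d(f,f')$$
However, we constructed $C^{X_1}_{\epsilon/2}$ to be a $\frac{\epsilon}{2}$-almost-support for all the K0(x0) where x0∈CX0, so this statement is just true, and we're done with our base case.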
Therefore let x0,f,f′ be arbitrary, and remember that they have the indicated properties, and that f,f′ agree with each other on the indicated set C1:n+2[CX0,ϵ]. Our proof target is now:
$$|K_{:n+1}(x_0)(f) - K_{:n+1}(x_0)(f')| \leq \epsilon\left(1 - \frac{1}{2^{n+2}}\right) d(f,f')$$
Unpacking the definition of K:n+1 and rewriting the thing on the end, this is equivalent to (we now take this as the proof target)
We can apply the Lemma 2 decomposition, to split this into "level of support of compact set x distance of functions + distance of functions on compact set x lipschitz constant of infradistribution". So, theoretically, if we had the following two results:
Which is exactly what we need. This works because C1:n+1[CX0,ϵ] is an $\epsilon\left(1 - \frac{1}{2^{n+1}}\right)$-almost-support for K:n(x0) by our induction assumption, our 2 assumptions, Kn+1 being a 1-Lipschitz infrakernel, and Lemma 2. Accordingly, let's try to show our two proof targets we need to wrap up this result. We'll start with
Now, we know that f and f′ agree on C1:n+2[CX0,ϵ] by assumption, which is a set that factorizes as C1:n+1[CX0,ϵ]×Cn+2[CX0,ϵ], and we have a promise that x1:n+1∈C1:n+1[CX0,ϵ], so λxn+2.f(x1:n+1,xn+2) equals λxn+2.f′(x1:n+1,xn+2) on the set Cn+2[CX0,ϵ], which is a $\frac{\epsilon}{2^{n+2}}$-almost-support for all the Kn+1(x0,x1:n+1) where x0∈CX0 (which is the case by assumption) and x1:n+1∈C1:n+1[CX0,ϵ] (also the case), so we have our result.
Now for the other branch,
∀x1:n+1:|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|≤d(f,f′)
Due to 1-Lipschitzness of Kn+1(x0,x1:n+1) regardless of x0 and x1:n+1, we could prove it if, regardless of x1:n+1,
Part 5: Our aim here is to show that no matter what sequence of compact sets you have, they all limit to the same result, so our limit is going to be well defined. In order to do this, we'll have to show the result (letting Ci,1 be your first sequence of nonempty compact sets to attempt to define the limit and Ci,2 be your second sequence of nonempty compact sets, and abbreviating the product of the Ci,1 from n to ∞ as Cn:∞,1) that,
In words, this is saying that for any two sequences of compact sets, there exists some threshold where if you pick any value of the defining sequence for K:∞(x0)(f) associated with using Ci,1 as your compact sets, and the sequence associated with using Ci,2 as your compact sets (as long as they're both past some shared threshold), they'll be close. If you have both sequences being identical, then this result is basically saying that the sequence used to define K:∞(x0)(f) is Cauchy (never varies too far from itself after a finite time), and thus the limit exists. And if you have the sequences being different, then this can be used to show that they limit to each other, so K:∞ is well-defined and you always get the same result no matter which particular sequence of compact sets you fix.
This is going to be rough. Fix our $x_0, \epsilon, \overline{C}_1, \overline{C}_2, f$ (input value, closeness parameter, two sequences of compact sets, function), and now we'll find our n to make
true. Do it in the following way. Use {x0} as your compact seed set to invoke the technique in part 4 to construct your sequence Ci[x0,ϵ], and then consider the set:
∏∞i=1(Ci,1∪Ci,2∪Ci[x0,ϵ])
A finite union of compact sets is compact, and the product of compact sets is compact. If we restrict f to this set, it's uniformly continuous. In particular, using the standard product metric (which metrizes the product topology), defined as:
$$d(x_{1:\infty}, x'_{1:\infty}) = \sum_{i=1}^{\infty} 2^{-i} \inf\big(1, d_{X_i}(x_i, x'_i)\big)$$
We can conclude that two sequences which agree up till time n must be, at most, distance $2^{-n}$ apart. Since f restricted to our compact set of interest is uniformly continuous, there is some δ difference in inputs that only leads to an ϵ difference in values. Now we can define our n as $\lceil \log_2(1/\delta) \rceil$, so that $2^{-n} \leq \delta$.
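Here is a short sketch of the product metric and of how many coordinates need to agree for a given δ. It assumes each Xi is the real line and truncates sequences to finite length (the tail contributes nothing when the remaining coordinates are ignored), so it is only an illustration; `product_metric` and `coords_needed` are hypothetical names.

```python
import math
from typing import Sequence


def product_metric(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum over coordinates i = 1, 2, ... of 2^-i * min(1, |x_i - y_i|)."""
    return sum(
        2.0 ** (-(i + 1)) * min(1.0, abs(x - y))
        for i, (x, y) in enumerate(zip(xs, ys))
    )


def coords_needed(delta: float) -> int:
    """Smallest n with 2^-n <= delta, i.e. the ceiling of log2(1/delta)."""
    return math.ceil(math.log2(1.0 / delta))


# Two sequences that agree on the first 3 coordinates and differ wildly after.
xs = [0.0, 1.0, 2.0, 5.0, 7.0]
ys = [0.0, 1.0, 2.0, -3.0, 9.5]
print(product_metric(xs, ys), 2.0 ** -3)
print(coords_needed(0.1), coords_needed(0.25))
```

Agreement on the first n coordinates caps the distance by the geometric tail 2^-(n+1) + 2^-(n+2) + ... = 2^-n, which is why n has to grow as δ shrinks.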
Now that we've picked our n, let m1 and m2 be arbitrary. Our goal is to now prove:
The distances group into three "chunks". What we'll do is show that chunks 1 and 3 have value upper-bounded by ϵ(2||f||+1), and chunk 2 has a value of 0, producing our net 2ϵ(2||f||+1) upper-bound, and we'd be done. So, let's try to show the first one, that:
The way we'll deal with this one is by using our good old Lemma 2, where we split up the difference of the two quantities into "how much of a support is this set times how different are the two functions" and "how close are the two functions on this set times the Lipschitz constant of the infradistributions". We'll be picking the set C1:n+m1+1[x0,ϵ], which is an ϵ-almost-support for K:n+m1(x0) by our discussions in Part 4. Because the infrakernels are always 1-Lipschitz because of C-additivity, the maximum/minimum expectation value the functions
can have is ||f|| (or −||f||) respectively. This produces a 2ϵ||f|| bound on the value of that piece produced via Lemma 2. All that remains is to prove that
Accordingly, let xn+m1+2:∞ and xn+2:∞ be selected from the appropriate sets, and our goal is now:
|f(x1:n+m1+1,xn+m1+2:∞)−f(x1:n+1,xn+2:∞)|≤ϵ
At this point, we should remember that if you have a promise that your input to f will be within the set ∏∞i=1(Ci,1∪Ci,2∪Ci[x0,ϵ])
then knowing the first n coordinates fully pins down the value of your function f to within ϵ, by how we picked our n. And then we can notice something interesting. By how they were selected,
So, the two inputs are both in the relevant compact set, and agree on the first n coordinates, so they agree to within ϵ value, and our desired result is the case. We've shown
I.e., no matter which sequence of compact sets is selected, the two convergent sequences get arbitrarily close to each other, so our definition of K:∞(x0)(f) doesn't just have the limit being well-defined, it has it being the same regardless of which sequence of compact sets Ci was selected.
With this, now we can let the compact sequence be whatever is most convenient for arguments, as it always produces the same limit no matter what. (continued in next post)
Lemma 3: If F is a continuous function of type X→K(Y), where K(Y) is the space of nonempty compact subsets of the space Y, then given any compact set CX⊆X, ⋃x∈CXF(x) will be compact in Y.
Fix some compact set CX⊆X, and continuous function F:X→K(Y). We will operate by taking an arbitrary open cover of ⋃x∈CXF(x) and finding a finite subcover.
Let {Oi}i∈I be an open cover of ⋃x∈CXF(x). The Oi are subsets of Y. The topology compatible with Hausdorff distance on K(Y) (space of compact subsets of Y) is the Vietoris topology, where the basis opens are given by finite collections of open sets in Y. You take the set of all compact subsets of Y which are subsets of the union of your finite collection of open sets, and intersect every open set in your finite collection
Accordingly, let J=Pfin(I) (the set of all finite subsets of I, the index set for our open cover), and fix a collection of open sets in K(Y), {Oj}j∈J. The sets Oj are defined as:
Oj:={CY∈K(Y)|CY⊆⋃i∈jOi∧∀i∈j:CY∩Oi≠∅}
Now, all the F(x) with x∈CX are compact (F produces compact sets as output), and they are all subsets of ⋃x∈CXF(x), so {Oi}i∈I is a cover of F(x), and due to its compactness we can identify a finite subcover, and prune away every open st which doesn't intersect F(x). F(x) is a subset of the union of those finitely many open sets, and intersects all of them, so the point F(x)∈K(Y) lies in the open set Oj induced by that finite cover of open sets.
This argument works for arbitrary F(x) with x∈CX, so the collection {Oj}j∈J is an open cover of F(CX). Also, because F is continuous and CX is compact, F(CX) is compact, so we can identify a finite subcover from {Oj}j∈J.
Then, consider the collection of open sets Oi where i∈j for some Oj which is part of the finite cover of F(CX). This is finitely many opens: we're unioning together finitely many (finitely many Oj selected) finite sets of open sets (each Oj is associated with finitely many Oi that it was built from).
Now we just have to show that this collection covers ⋃x∈CXF(x), and we'll have made our finite subcover and shown that said set is compact. Assume our finite collection of opens doesn't cover the set. Then there's some F(x) which wasn't covered completely. However, the point corresponding to F(x) in K(Y) lies in some Oj, and from its definition, the corresponding Oi manage to cover F(x), and we have a contradiction. We're done.
Proposition 19: h⋉K is an infradistribution, and preserves all properties indicated in the diagram at the start of this section if h and all the K(x) have said property.
To show this, we'll verify that it's well-defined at all, normalization, monotonicity, concavity, Lipschitzness, compact almost-support, and preservation of the properties.
(h⋉K)(f):=h(λx.K(x)(λy.f(x,y)))
Our first order of business is verifying that
λx.K(x)(λy.f(x,y))
is even a continuous function to be able to show that h can accept it as input.
For continuity, let xn limit to x, and we'll try to show that K(xn)(λy.f(xn,y)) limits to K(x)(λy.f(x,y)). Let λ⊙K be the Lipschitz constant upper bound of K.
Pick an ϵ, we'll show that there's some m0 where
∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
First, note that {xn}n∈N∪{x} is a compact set because xn limits to x. Thus, by the compact-shared compact almost-support condition on an infrakernel, there must be some compact set Cϵ⊆Y where all the K(xn) agree that functions f,f′ agreeing on Cϵ have values only ϵd(f,f′) apart from each other.
Now, because f is a continuous bounded function X×Y→R, it's uniformly continuous when restricted to
({xn}n∈N∪{x})×Cϵ
as this is the product of two compact sets and is compact. Due to the uniform continuity of f restricted to that set, there is some number δ where points only δ apart in that set have their values only differing by ϵ. Further, there is some number m0 where, for all m≥m0, d(xm,x)<δ.
Additionally, the maximum difference between λy.f(x,y) and λy.f(x′,y) is 2||f||.
Now that we know our number m0 we can pick an arbitrary m above it, and go:
∀m≥m0∀y∈Cϵ:d((xm,y),(x,y))=d(xm,x)≤δ
∀m≥m0∀y∈Cϵ:|f(xm,y)−f(x,y)|≤ϵ
∀m≥m0:d((λy.f(xm,y))↓Cϵ,(λy.f(x,y))↓Cϵ)≤ϵ
And now, because these two functions restricted to Cϵ are only ϵ apart, we can apply Lemma 2 to conclude that (since Cϵ and λ⊙K work for all the K(xn))
∀n:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ⋅2||f||+ϵλ⊙K
This argument works for any m≥m0, so we have:
∃m0∀n∀m≥m0:|K(xn)(λy.f(xm,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
Letting n=m in particular,
∃m0∀n≥m0:|K(xn)(λy.f(xn,y))−K(xn)(λy.f(x,y))|≤ϵ(2||f||+λ⊙K)
And for each ϵ we can construct an m0 in this way, concluding that
limn→∞|K(xn)(λy.f(xn,y))−K(xn)(λy.f(x,y))|=0
Also, from our pointwise convergence condition on infrakernels,
limn→∞K(xn)(λy.f(x,y))=K(x)(λy.f(x,y))
Therefore,
limn→∞K(xn)(λy.f(xn,y))=K(x)(λy.f(x,y))
and so, we now know that
λx.K(x)(λy.f(x,y))
is a continuous function X→R. For boundedness, upper and lower bounds on λy.f(x,y) are ||f|| (and the negative version of it). Due to the shared Lipschitz constant on the K(x), an upper and lower-bound on λx.K(x)(λy.f(x,y)) is λ⊙K||f|| (and the negative version.) Thus, we can safely feed said function into the infradistribution h, so the semidirect product is well-defined. We must still show that it makes an infradistribution.
For normalization,
(h⋉K)(1)=h(λx.K(x)(λy.1))=h(λx.1)=1
(h⋉K)(0)=h(λx.K(x)(λy.0))=h(λx.0)=0
For monotonicity, if f′≥f,
∀x:λy.f′(x,y)≥λy.f(x,y)
∀x:K(x)(λy.f′(x,y))≥K(x)(λy.f(x,y))
λx.K(x)(λy.f′(x,y))≥λx.K(x)(λy.f(x,y))
(h⋉K)(f′)=h(λx.K(x)(λy.f′(x,y)))≥h(λx.K(x)(λy.f(x,y)))=(h⋉K)(f)
For concavity,
(h⋉K)(pf+(1−p)f′)=h(λx.K(x)(λy.pf(x,y)+(1−p)f′(x,y)))
≥h(λx.pK(x)(λy.f(x,y))+(1−p)K(x)(λy.f′(x,y)))
≥ph(λx.K(x)(λy.f(x,y)))+(1−p)h(λx.K(x)(λy.f′(x,y)))
=p(h⋉K)(f)+(1−p)(h⋉K)(f′)
In order, this was the definition of the semidirect product, all the K(x) being concave so splitting them up produces a lower value (and then monotonicity for h), then h being concave.
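The three checks so far can be sanity-tested numerically in a toy setting. The sketch below is only an illustration under strong assumptions that are not in the argument above: X and Y are finite, and each infradistribution is modelled as a worst-case expectation over a finite list of probability vectors. The names `expect`, `semidirect`, `H`, and `KERNEL` are hypothetical.

```python
import random

def expect(dists, f):
    """Worst-case expectation of f over a finite set of probability vectors."""
    return min(sum(p * v for p, v in zip(dist, f)) for dist in dists)

def semidirect(h, K, f):
    """(h ⋉ K)(f) = h(λx. K(x)(λy. f(x, y))), with f stored as f[x][y]."""
    inner = [expect(K[x], f[x]) for x in range(len(K))]
    return expect(h, inner)

# Hypothetical data: X has 2 points, Y has 3 points.
H = [[0.5, 0.5], [0.2, 0.8]]
KERNEL = [
    [[1.0, 0.0, 0.0], [0.3, 0.3, 0.4]],  # K(x = 0)
    [[0.1, 0.6, 0.3], [0.0, 0.5, 0.5]],  # K(x = 1)
]

def check_axioms(trials=200, tol=1e-9, seed=0):
    """Randomly test normalization, monotonicity and concavity of h ⋉ K."""
    rng = random.Random(seed)

    def rand_f():
        return [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]

    def const(c):
        return [[c] * 3 for _ in range(2)]

    # Normalization: (h ⋉ K)(1) = 1 and (h ⋉ K)(0) = 0.
    ok = abs(semidirect(H, KERNEL, const(1.0)) - 1.0) < tol
    ok = ok and abs(semidirect(H, KERNEL, const(0.0))) < tol
    for _ in range(trials):
        f, g, p = rand_f(), rand_f(), rng.random()
        upper = [[max(a, b) for a, b in zip(fr, gr)] for fr, gr in zip(f, g)]
        mixed = [[p * a + (1 - p) * b for a, b in zip(fr, gr)]
                 for fr, gr in zip(f, g)]
        # Monotonicity: upper >= f pointwise.
        ok = ok and semidirect(H, KERNEL, upper) >= semidirect(H, KERNEL, f) - tol
        # Concavity: value at the mixture is at least the mixture of the values.
        lhs = semidirect(H, KERNEL, mixed)
        rhs = p * semidirect(H, KERNEL, f) + (1 - p) * semidirect(H, KERNEL, g)
        ok = ok and lhs >= rhs - tol
    return ok
```

Calling `check_axioms()` draws random test functions and reports whether any of the three inequalities failed. A passing run is evidence for this toy model only, not a substitute for the proof.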
This leaves Lipschitzness and CAS. For Lipschitzness, given some f and f′, and letting λ⊙h be the Lipschitz constant of h, we have:
|(h⋉K)(f)−(h⋉K)(f′)|=|h(λx.K(x)(λy.f(x,y)))−h(λx.K(x)(λy.f′(x,y)))|
≤λ⊙hd(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))
=λ⊙hsupx|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|
≤λ⊙hsupxλ⊙Kd(λy.f(x,y),λy.f′(x,y))=λ⊙hλ⊙Ksupxd(λy.f(x,y),λy.f′(x,y))
≤λ⊙hλ⊙Kd(f,f′)
Thus, that final thing shows that there's a finite Lipschitz constant for h⋉K.
This leaves compact almost-support. Pick any ϵ. This induces a compact set CXϵ which is an ϵ-almost-support for h, and then this compact set induces a compact set CYϵ which is an ϵ-almost-support for all the K(x) where x∈CXϵ. Let f and f′ agree on CXϵ×CYϵ. Now, we can apply Lemma 2 to go:
|h(λx.K(x)(λy.f(x,y)))−h(λx.K(x)(λy.f′(x,y)))|
≤ϵd(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))
+λ⊙hd((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)
Pretty much, that first part is the "CXϵ is an ϵ-almost-support for h" piece, and the second piece is the "hey, these two functions may be a bit different on said compact set, we've gotta multiply that by the Lipschitz constant" piece. So, let's work on unpacking these two distances. For the first one, we can go:
d(λx.K(x)(λy.f(x,y)),λx.K(x)(λy.f′(x,y)))
=supx|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|
≤supxλ⊙Kd(λy.f(x,y),λy.f′(x,y))
=λ⊙Ksupxd(λy.f(x,y),λy.f′(x,y))
=λ⊙Ksupxsupy|f(x,y)−f′(x,y)|
=λ⊙Ksupx,y|f(x,y)−f′(x,y)|=λ⊙Kd(f,f′)
Substituting this back in produces:
≤ϵλ⊙Kd(f,f′)+λ⊙hd((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)
Time to go after the second distance piece. We have:
d((λx.K(x)(λy.f(x,y)))↓CXϵ,(λx.K(x)(λy.f′(x,y)))↓CXϵ)
=supx∈CXϵ|K(x)(λy.f(x,y))−K(x)(λy.f′(x,y))|
And, because f and f′ agree on CXϵ×CYϵ, we have λy.f(x,y) and λy.f′(x,y) agreeing on CYϵ, which is an ϵ-almost-support for all the K(x) where x∈CXϵ, so we have:
≤supx∈CXϵϵd(λy.f(x,y),λy.f′(x,y))
=ϵsupx∈CXϵsupy|f(x,y)−f′(x,y)|
≤ϵsupx,y|f(x,y)−f′(x,y)|=ϵd(f,f′)
Substituting this back in produces:
≤ϵλ⊙Kd(f,f′)+ϵλ⊙hd(f,f′)
And regrouping this and recapping means that we have:
|(h⋉K)(f)−(h⋉K)(f′)|≤ϵ(λ⊙K+λ⊙h)d(f,f′)
So we have crafted a compact ϵ(λ⊙K+λ⊙h)-almost-support for h⋉K, namely CXϵ×CYϵ, and we can make ϵ arbitrarily small, so the semidirect product has compact almost-support, which is the last condition we needed.
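The decomposition used above (level of almost-support times full distance, plus Lipschitz constant times distance on the compact set) can be checked in a toy model. This is a hedged sketch, not the statement of Lemma 2 itself: it assumes a finite space, an infradistribution given as a worst-case expectation over the hypothetical list `H`, and a subset `C` outside of which every distribution in `H` has mass at most `EPS`.

```python
import random

def expect(dists, f):
    """Worst-case expectation of f over a finite set of probability vectors."""
    return min(sum(p * v for p, v in zip(d, f)) for d in dists)

# Hypothetical infradistribution on 4 points. Every distribution puts at most
# mass 0.1 outside C = {0, 1}, so C plays the role of a 0.1-almost-support.
H = [[0.6, 0.3, 0.05, 0.05], [0.45, 0.5, 0.0, 0.05], [0.9, 0.0, 0.1, 0.0]]
C = [0, 1]
EPS = 0.1   # level of almost-support of C (by construction of H)
LIP = 1.0   # Lipschitz constant of a worst-case expectation

def lemma2_slack(f, g):
    """Bound minus actual difference; nonnegative when the bound holds."""
    d_all = max(abs(a - b) for a, b in zip(f, g))
    d_on_c = max(abs(f[i] - g[i]) for i in C)
    return EPS * d_all + LIP * d_on_c - abs(expect(H, f) - expect(H, g))

def check_bound(trials=500, seed=0, tol=1e-12):
    """Test the bound on random pairs of functions with values in [-1, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        f = [rng.uniform(-1, 1) for _ in range(4)]
        g = [rng.uniform(-1, 1) for _ in range(4)]
        if lemma2_slack(f, g) < -tol:
            return False
    return True
```

When `f` and `g` agree on `C`, the second term vanishes and the bound reduces to the almost-support condition itself.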
Time for property verification.
Homogenity:
(h⋉K)(af)=h(λx.K(x)(λy.af(x,y)))=h(λx.aK(x)(λy.f(x,y)))
=ah(λx.K(x)(λy.f(x,y)))=a(h⋉K)(f)
1-Lipschitz: We showed in the Lipschitz section that an upper bound on the Lipschitz constant of h⋉K is the product of the Lipschitz constants of the kernel and the original infradistribution, so 1⋅1=1 and 1-Lipschitzness is preserved.
Cohomogenity:
(h⋉K)(1+af)=h(λx.K(x)(λy.1+af(x,y)))=h(λx.1−a+aK(x)(λy.1+f(x,y)))
=h(λx.1+a(−1+K(x)(λy.1+f(x,y))))=1−a+ah(λx.1−1+K(x)(λy.1+f(x,y)))
=1−a+ah(λx.K(x)(λy.1+f(x,y)))=1−a+a(h⋉K)(1+f)
C-additivity:
(h⋉K)(c)=h(λx.K(x)(λy.c))=h(λx.c)=c
Crispness: Both homogenity and C-additivity are preserved, so crispness is too.
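As a hedged illustration of the homogeneity, C-additivity, and 1-Lipschitz checks, here is a small numerical test in a finite toy model. The assumptions are mine, not the post's: finite X and Y, with h and each K(x) modelled as worst-case expectations over finite lists of probability vectors (which makes them crisp). All names are hypothetical.

```python
import random

def expect(dists, f):
    """Worst-case expectation of f over a finite set of probability vectors."""
    return min(sum(p * v for p, v in zip(d, f)) for d in dists)

def semidirect(h, K, f):
    """(h ⋉ K)(f) = h(λx. K(x)(λy. f(x, y))), with f stored as f[x][y]."""
    return expect(h, [expect(K[x], f[x]) for x in range(len(K))])

# Hypothetical data: X and Y both have 2 points.
H = [[0.5, 0.5], [0.2, 0.8]]
KERNEL = [[[1.0, 0.0], [0.25, 0.75]], [[0.5, 0.5]]]

def sup_dist(f, g):
    """Sup-norm distance between two functions stored as matrices."""
    return max(abs(a - b) for fr, gr in zip(f, g) for a, b in zip(fr, gr))

def check_properties(trials=200, tol=1e-9, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        f = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        g = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
        a, c = rng.uniform(0, 3), rng.uniform(-1, 1)
        base = semidirect(H, KERNEL, f)
        # Homogeneity: (h ⋉ K)(a f) = a (h ⋉ K)(f) for a >= 0.
        scaled = semidirect(H, KERNEL, [[a * v for v in row] for row in f])
        if abs(scaled - a * base) > tol:
            return False
        # C-additivity in the form used above: (h ⋉ K)(c) = c.
        if abs(semidirect(H, KERNEL, [[c, c], [c, c]]) - c) > tol:
            return False
        # 1-Lipschitz: the product of two Lipschitz constants equal to 1.
        if abs(base - semidirect(H, KERNEL, g)) > sup_dist(f, g) + tol:
            return False
    return True
```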
Sharpness:
(h⋉K)(f)=h(λx.K(x)(λy.f(x,y)))=h(λx.infy∈CK(x)f(x,y))
=infx∈Ch(infy∈CK(x)f(x,y))=inf(x,y)∈⋃x∈Ch({x}×CK(x))f(x,y)
Our task now is to show that ⋃x∈Ch({x}×CK(x)) is compact, which will take a fair amount of topology work. Our first piece that we'll need is that if xn limits to x, then CK(xn) limits to CK(x) in Hausdorff-distance.
To show this, we'll split it into two parts. First, we'll assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x) and disprove that. Second, we'll assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn), and disprove that.
For the first part, assume that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). Craft the continuous function
f1:=λy.sup(1−(1/ϵ)infy′∈CK(x)d(y,y′),0)
What this does is it's 1 on the set CK(x), and 0 on anything more than ϵ away from it. One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), so:
limn→∞infy∈CK(xn)f1(y)=infy∈CK(x)f1(y)
The latter term is 1 because f1 is 1 over CK(x). However, because we're assuming that infinitely often, there's a point in CK(xn) that is ϵ away from CK(x), the sequence on the left-hand side is infinitely often 0, so it can't converge to 1 and we have a contradiction.
For the second part, assume there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). By compactness of CK(x), we can find finitely many points yi in it s.t. every point in CK(x) is only ϵ/2 away from one of the yi (cover CK(x) with ϵ/2-size open balls centered on points in it and take a finite subcover). Now, for each of these, we can craft a function
fi:=λy.inf(1,(2/ϵ)d(y,yi))
So, this is 0 at the point yi, and 1 at any distance ϵ/2 or more away from it.
One of our conditions on an infrakernel was that limn→∞K(xn)(f)=K(x)(f), and there are finitely many fi, so there's some time where all of them nearly converge, ie:
limn→∞supi|K(xn)(fi)−K(x)(fi)|=0
However, infinitely often there's a point yn∈CK(x) that is ϵ away from CK(xn). yn is within ϵ/2 of some yi, so that yi can't be closer than ϵ/2 to CK(xn). (If it were closer, then we could pick some point in CK(xn) that's closer than ϵ/2 to yi, and then since yi is only ϵ/2 away from yn, we'd have that the distance from yn to CK(xn) is below ϵ, an impossibility.)
Because the distance from yi to any point in CK(xn) is at least ϵ/2, then
|K(xn)(fi)−K(x)(fi)|=|infy∈CK(xn)fi(y)−infy∈CK(x)fi(y)|=|1−0|=1
This is because yi∈CK(x) and attains a value of 0 according to fi, while CK(xn) stays away from yi and all its points must have a value of 1. This situation happens infinitely often, which leads to a contradiction with
limn→∞supi|K(xn)(fi)−K(x)(fi)|=0
Because infinitely often, one of these fi has very different values, so the sequence is 1 infinitely often and can't limit to 0.
So, we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(xn) that is ϵ away from CK(x). And we've ruled out that there is an ϵ where, infinitely often, there is a point in CK(x) that is ϵ away from CK(xn). Fixing any ϵ, in the tail of the sequence, CK(x) and CK(xn) are ϵ distance or closer in Hausdorff distance because you can't find points in either set which are far away from the other set. So, CK(xn) limits to CK(x) in Hausdorff-distance when xn limits to x, and we know that x↦CK(x) is a continuous function X→K(Y).
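To make the first half of this argument concrete, here is a minimal sketch that uses finite subsets of the real line as stand-ins for the compact sets CK(x) and CK(xn). That simplification, and the names `bump`, `sharp`, and `hausdorff`, are assumptions made for illustration. The function `bump` is the f1 from the text: it is 1 on the reference set and 0 at distance ϵ or more from it.

```python
def dist_to_set(y, points):
    """Distance from the real number y to a finite set of reals."""
    return min(abs(y - c) for c in points)

def hausdorff(a, b):
    """Hausdorff distance between two finite nonempty sets of reals."""
    return max(max(dist_to_set(p, b) for p in a),
               max(dist_to_set(q, a) for q in b))

def sharp(points, f):
    """Sharp infradistribution: worst-case value of f over the set."""
    return min(f(y) for y in points)

def bump(points, eps):
    """f1 from the text: 1 on the set, 0 at distance >= eps from it."""
    return lambda y: max(1 - dist_to_set(y, points) / eps, 0.0)

C = [0.0, 1.0]             # stand-in for CK(x)
near = [0.1, 1.0]          # Hausdorff-close to C
far = [0.0, 1.0, 2.0]      # contains a point at distance 1.0 from C
f1 = bump(C, 0.5)

value_on_c = sharp(C, f1)        # f1 is 1 everywhere on C
value_on_near = sharp(near, f1)  # stays close to 1
value_on_far = sharp(far, f1)    # drops to 0 because of the point 2.0
```

A set that keeps a point at distance ϵ or more from C forces the sharp value of f1 down to 0, which is the gap the contradiction exploits.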
This lets us show that the set
⋃x∈Ch({x}×CK(x))
is closed, because if xn limits to x and yn∈CK(xn) and yn limits to y, we have that y∈CK(x) because CK(xn) limits to CK(x) in Hausdorff distance, so we've got closed graph.
Also, by invoking Lemma 3, we know that
⋃x∈ChCK(x)
is compact.
Time to wrap this all up. We know that ⋃x∈Ch{x}×CK(x) is closed in X×Y from our Hausdorff limit argument. This set is also a subset of:
Ch×⋃x∈ChCK(x)
Which is a product of two sets known to be compact, and is compact. It's a closed subset of a compact set, so it's compact. Therefore,
⋃x∈Ch{x}×CK(x)
is a compact set, and from way back,
(h⋉K)(f)=inf(x,y)∈⋃x∈Ch({x}×CK(x))f(x,y)
And we've shown that set is compact, so h⋉K where h and all the K(x) are sharp can be written as minimizing over a compact set, so h⋉K is sharp. Thus, semidirect product preserves all the nice properties, and we're finally done with this proof.
Proposition 20: If all the K(x) are C-additive, then prX∗(h⋉K)=h.
prX∗(h⋉K)(f)=(h⋉K)(f∘prX)=h(λx.K(x)(λy.f(prX(x,y))))
=h(λx.K(x)(λy.f(x)))=h(λx.f(x))=h(f)
This is because, since f(x) doesn't depend on y, it acts as a constant inside K(x) and C-additivity lets us pull it out.
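Proposition 20 can be spot-checked in a finite toy model. The sketch below assumes finite X and Y and models h and each K(x) as worst-case expectations over finite lists of probability vectors (which are C-additive). The names are hypothetical and the check is illustrative only.

```python
import random

def expect(dists, f):
    """Worst-case expectation of f over a finite set of probability vectors."""
    return min(sum(p * v for p, v in zip(d, f)) for d in dists)

def semidirect(h, K, f):
    """(h ⋉ K)(f) = h(λx. K(x)(λy. f(x, y))), with f stored as f[x][y]."""
    return expect(h, [expect(K[x], f[x]) for x in range(len(K))])

# Hypothetical data: X has 2 points, Y has 3 points.
H = [[0.5, 0.5], [0.2, 0.8]]
KERNEL = [[[1.0, 0.0, 0.0], [0.3, 0.3, 0.4]], [[0.1, 0.6, 0.3]]]

def pushforward_to_x(h, K, g):
    """prX∗(h ⋉ K)(g) = (h ⋉ K)(g ∘ prX): lift g to ignore the y coordinate."""
    n_y = len(K[0][0])
    lifted = [[g[x]] * n_y for x in range(len(K))]
    return semidirect(h, K, lifted)

def check_projection(trials=200, tol=1e-9, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        g = [rng.uniform(-1, 1) for _ in range(2)]
        if abs(pushforward_to_x(H, KERNEL, g) - expect(H, g)) > tol:
            return False
    return True
```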
Proposition 21: If K0,K1,K2... are a sequence of infrakernels of type Kn:∏i=ni=0Xiik→Xn+1, and h is an infradistribution over X0, then (...((h⋉K0)⋉K1)...⋉Km) can be rewritten as h⋉K:m where K:n is an infrakernel of type X0ik→∏i=n+1i=1Xi, recursively defined as K:0:=K0 and K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
So, for our inductive definition,
K:0(x0):=K0(x0)
K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
Our task is to show that these are all infrakernels, by induction, and that for any infradistribution h,
(...((h⋉K0)⋉K1)...⋉Kn)=h⋉K:n
For the base case, we observe that K:0 is an infrakernel because it equals K0, which is an infrakernel, and that h⋉K0=h⋉K:0
Time for the induction step. We'll assume that K:n is an infrakernel, and show that K:n+1 is. Further, we need to show that h⋉K:n+1=(h⋉K:n)⋉Kn+1. This will show the result.
Our first requirement is showing that for all x0, K:n+1(x0) is an infradistribution.
K:n+1(x0):=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
By our induction assumption, K:n(x0) is an infradistribution as K:n is an infrakernel. Further, λx1:n+1.Kn+1(x0,x1:n+1) is an infrakernel because Kn+1 is and we're just restricting it to a subset of its domain, so it keeps being an infrakernel. And we know from earlier that the semidirect product of an infradistribution and an infrakernel is an infradistribution. So that's taken care of.
Now, we must show a common Lipschitz constant, pointwise function convergence, and compact-shared compact almost-support for K:n+1 to certify that it's an infrakernel.
Starting with common Lipschitz constant, we can just note that, in our proof of Proposition 19, we saw that the Lipschitz constant of the semidirect product was upper-bounded by the product of the Lipschitz constants of the starting infradistribution and the kernel. Assuming that K:n is an infrakernel, we have that the Lipschitz constant of any K:n(x0) is upper-bounded by some λ⊙:n Lipschitz constant. Also, the Lipschitz constant of Kn+1(x0,x1:n+1) is upper-bounded by some λ⊙n+1 Lipschitz constant. Thus, λ⊙:nλ⊙n+1 is an upper-bound on the Lipschitz constant of any
K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
infradistribution, which is exactly K:n+1(x0), witnessing that K:n+1 has a uniform upper bound on its Lipschitz constants.
Time to move onto the second one, compact-shared compact almost-support.
For this one, we're trying to prove:
∀CX0,ϵ∃C∏i=n+2i=1Xiϵ⊆∏i=n+2i=1Xi∀x0∈CX0,f,f′:
f↓C∏i=n+2i=1Xiϵ=f′↓C∏i=n+2i=1Xiϵ→|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵd(f,f′)
This is the sentence that says that K:n+1 has compact-shared compact almost-support. f and f′ have type signature ∏i=n+2i=1Xi→R.
Now, this is going to be quite complicated, so pay close attention. Fix an arbitrary compact CX0⊆X0, and an arbitrary ϵ. Let λ⊙:n be the Lipschitz constant for the infrakernel K:n, and λ⊙n+1 be the Lipschitz constant for the infrakernel Kn+1.
Due to compact-shared compact-almost-support for K:n, which holds by our induction assumption, your set CX0 induces a compact ϵ/(2λ⊙n+1)-almost-support for the family of infradistributions K:n(x0) where x0∈CX0. Call said almost-support C∏i=n+1i=1Xiϵ2λ⊙n+1 (the trailing ϵ2λ⊙n+1 in the name abbreviates the level ϵ/(2λ⊙n+1)).
Further, due to compact-shared compact-almost-support for Kn+1 , the set
CX0×C∏i=n+1i=1Xiϵ2λ⊙n+1
induces a compact ϵ/(2λ⊙:n)-almost-support for the family of infradistributions Kn+1(x0,x1:n+1) where (x0,x1:n+1)∈CX0×C∏i=n+1i=1Xiϵ2λ⊙n+1
Call said almost-support CXn+2ϵ2λ⊙:n
And now let your shared ϵ-almost-support for K:n+1(x0) where x0∈CX0 be:
C∏i=n+1i=1Xiϵ2λ⊙n+1×CXn+2ϵ2λ⊙:n
We must show that said set is indeed a shared ϵ-almost-support for K:n+1(x0) where x0∈CX0. So, let f and f′ agree on said set. Then, we have:
|K:n+1(x0)(f)−K:n+1(x0)(f′)|
=|(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(f)−(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(f′)|
=|K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2)))|
This is just unpacking the definition of the iterated semidirect product, no issues here. Now, we use Lemma 2 and the fact that C∏i=n+1i=1Xiϵ2λ⊙n+1 is an ϵ/(2λ⊙n+1)-almost-support for K:n(x0) when x0∈CX0, to get:
≤λ⊙:nsupx1:n+1∈C∏i=n+1i=1Xiϵ2λ⊙n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
+(ϵ/(2λ⊙n+1))⋅supx1:n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
Ok, this is a mess. Let's try to unpack
supx1:n+1∈C∏i=n+1i=1Xiϵ2λ⊙n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
first. What we can do is use that, regardless of what is picked in the supremum, we have:
(x0,x1:n+1)∈CX0×C∏i=n+1i=1Xiϵ2λ⊙n+1
So this means that
CXn+2ϵ2λ⊙:n
is an ϵ/(2λ⊙:n)-almost-support for Kn+1(x0,x1:n+1). Further, because f and f′ are identical on
C∏i=n+1i=1Xiϵ2λ⊙n+1×CXn+2ϵ2λ⊙:n
and x1:n+1 was being selected from the former of those, then the functions λxn+2.f(x1:n+1,xn+2) (and the same for f′) agree on CXn+2ϵ2λ⊙:n, the almost-support. So, the supremum is upper-bounded by
≤(ϵ/(2λ⊙:n))⋅d(f,f′)
Substituting this back in, we get:
≤λ⊙:n⋅(ϵ/(2λ⊙:n))⋅d(f,f′)+(ϵ/(2λ⊙n+1))⋅supx1:n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
Now let's try to unpack
supx1:n+1|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|
We don't have much leverage on it, besides using the basic Lipschitz constant upper bound, so let's try that.
≤supx1:n+1λ⊙n+1d(λxn+2.f(x1:n+1,xn+2),λxn+2.f′(x1:n+1,xn+2))
=λ⊙n+1d(f,f′)
And substituting this back in, we get:
≤λ⊙:n⋅(ϵ/(2λ⊙:n))⋅d(f,f′)+(ϵ/(2λ⊙n+1))⋅λ⊙n+1⋅d(f,f′)
=(ϵ/2)d(f,f′)+(ϵ/2)d(f,f′)=ϵd(f,f′)
And so we've shown that the functions are only ϵ times their distance apart, so the compact set we cooked up is indeed an ϵ-almost-support for K:n+1(x0) whenever x0∈CX0, and because ϵ and CX0 were arbitrary, we have compact-shared compact-almost-support for K:n+1.
Time to move onto the third one, pointwise convergence. If x0,m limits to x0,∞, we want K:n+1(x0,m)(f) to limit to K:n+1(x0,∞)(f). As usual, we use λ⊙n+1 for the Lipschitz constant of Kn+1 and λ⊙:n for the Lipschitz constant of K:n.
To begin with, fix an arbitrary ϵ and bounded continuous function f, and note that {x0,m}m∈N∪{∞} is a compact subset of X0. Because K:n:X0ik→∏i=n+1i=1Xi is assumed to be an infrakernel by induction, {x0,m}m∈N∪{∞} acts as a compact set for it. So, by compact-shared compact-almost-support for K:n, we can find a compact set C∏i=n+1i=1Xiϵ4λ⊙n+1||f|| which is an ϵ/(4λ⊙n+1||f||)-almost-support for all the K:n(x0,m).
Also, it is important to note that
λx:n+1.Kn+1(x:n+1)(λxn+2.f(x:n+1,xn+2))
is a continuous function (as it must be for semidirect products with Kn+1 to have the functions on the inside be continuous). Accordingly, this means that the function:
λx0,x1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
must be uniformly continuous when restricted to the set {x0,m}m∈N∪{∞}×C∏i=n+1i=1Xiϵ4λ⊙n+1||f||
And so, by uniform continuity, for our fixed ϵ there is some δ where inputs at most δ apart have outputs at most ϵ/(2λ⊙:n) apart.
Now, here's what we'll be doing. We'll attempt to show the result that
∀ϵ∃m∗∀m≥m∗:|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|≤ϵ
Straight off the bat, we can apply Lemma 2 to decompose this difference into "starting Lipschitz constant times the difference of the inner functions on the compact set of interest" and "level of almost-support times the difference of the two functions", yielding:
|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|
≤λ⊙:nsupx1:n+1∈C∏i=n+1i=1Xiϵ4λ⊙n+1||f|||Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
+(ϵ/(4λ⊙n+1||f||))⋅supx1:n+1|Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
Time to start breaking this down. First, to break down
supx1:n+1|Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|
we can realize that the maximum value of one of these would be λ⊙n+1||f||, and the minimum possible value of one of these is −λ⊙n+1||f||, from Lipschitzness of Kn+1, producing an upper bound of:
≤2λ⊙n+1||f||
Substituting this back in, we have:
≤λ⊙:nsupx1:n+1∈C∏i=n+1i=1Xiϵ4λ⊙n+1||f|||Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|+(ϵ/(4λ⊙n+1||f||))⋅2λ⊙n+1||f||
And now, we can use the fact that there is some δ where inputs at most δ apart give outputs at most ϵ/(2λ⊙:n) apart for the function
λx0,x1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
when restricted to the set {x0,m}m∈N∪{∞}×C∏i=n+1i=1Xiϵ4λ⊙n+1||f||
in order to find some m∗ where all future m have x0,m being within δ of x0,∞.
This tiny difference means that the values
Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
and
Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
will differ by at most ϵ/(2λ⊙:n) for all x1:n+1 which lie in
C∏i=n+1i=1Xiϵ4λ⊙n+1||f||
Therefore, we have that for all m past m∗,
supx1:n+1∈C∏i=n+1i=1Xiϵ4λ⊙n+1||f|||Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2))|≤ϵ/(2λ⊙:n)
And substituting this back in, we have:
≤λ⊙:n⋅(ϵ/(2λ⊙:n))+(ϵ/(4λ⊙n+1||f||))⋅2λ⊙n+1||f||=ϵ/2+ϵ/2=ϵ
And ϵ was arbitrary. Therefore we have our desired result that, regardless of bounded continuous function f,
∀ϵ∃m∗∀m≥m∗:|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|≤ϵ
Therefore,
limm→∞|K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))|=0
These two things limit increasingly close to each other. Further,
limm→∞K:n(x0,m)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
=K:n(x0,∞)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
By pointwise convergence for K:n which is an infrakernel by our induction assumption. Putting these two parts together, we have:
limm→∞K:n(x0,m)(λx1:n+1.Kn+1(x0,m,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
=K:n(x0,∞)(λx1:n+1.Kn+1(x0,∞,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
so
limm→∞(K:n(x0,m)⋉(λx1:n+1.Kn+1(x0,m,x1:n+1)))(f)
=(K:n(x0,∞)⋉(λx1:n+1.Kn+1(x0,∞,x1:n+1)))(f)
so
limm→∞K:n+1(x0,m)(f)=K:n+1(x0,∞)(f)
And we're done, we showed pointwise convergence for K:n+1 which is the last condition necessary to show it's an infrakernel, and the induction proof goes through to show that all the K:n are infrakernels.
Now all that's left is to show that
h⋉K:n+1=(h⋉K:n)⋉Kn+1
using induction, we have the base case set up. We can go:
(h⋉K:n+1)(f)
=h(λx0.K:n+1(x0)(λx1:n+2.f(x0,x1:n+2)))
=h(λx0.(K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1)))(λx1:n+2.f(x0,x1:n+2)))
=h(λx0.K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x0,x1:n+1,xn+2))))
=(h⋉K:n)(λx:n+1.Kn+1(x:n+1)(λxn+2.f(x:n+1,xn+2)))
=((h⋉K:n)⋉Kn+1)(f)
And we're done. Because
h⋉K:n+1=(h⋉K:n)⋉Kn+1
and we know that h⋉K:0=h⋉K0, this means that
∀m:(...((h⋉K0)⋉K1)...⋉Km)=h⋉K:m
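The unpacking above can be mirrored almost symbol-for-symbol with closures. The following is a minimal sketch for two stages only, assuming finite two-point spaces and worst-case-expectation infradistributions. The data in `K0`, `K1`, and the helper names are hypothetical. It evaluates ((h⋉K0)⋉K1)(f) and (h⋉K:1)(f) along the two different bracketings.

```python
import random

def crisp(dists):
    """Functional f -> min over the given probability vectors of E[f].
    Here f is a function on point indices 0..n-1."""
    def functional(f):
        return min(sum(p * f(i) for i, p in enumerate(d)) for d in dists)
    return functional

def semidirect(h, K):
    """(h ⋉ K)(f) = h(λx. K(x)(λy. f(x, y))) for a two-argument f."""
    return lambda f: h(lambda x: K(x)(lambda y: f(x, y)))

# Hypothetical two-point spaces X0, X1, X2.
h = crisp([[0.5, 0.5], [0.9, 0.1]])
_K0 = [[[1.0, 0.0], [0.5, 0.5]], [[0.2, 0.8]]]
_K1 = {(0, 0): [[0.3, 0.7]], (0, 1): [[0.0, 1.0], [0.6, 0.4]],
       (1, 0): [[0.5, 0.5]], (1, 1): [[0.1, 0.9], [0.8, 0.2]]}

def K0(x0):
    return crisp(_K0[x0])

def K1(x0, x1):
    return crisp(_K1[(x0, x1)])

def nested(f):
    """((h ⋉ K0) ⋉ K1)(f) for f(x0, x1, x2)."""
    stage0 = semidirect(h, K0)  # functional on g(x0, x1)
    return stage0(lambda x0, x1: K1(x0, x1)(lambda x2: f(x0, x1, x2)))

def K_upto1(x0):
    """K:1(x0) = K0(x0) ⋉ (λx1. K1(x0, x1)), a functional on g(x1, x2)."""
    return semidirect(K0(x0), lambda x1: K1(x0, x1))

def combined(f):
    """(h ⋉ K:1)(f) for f(x0, x1, x2)."""
    return h(lambda x0: K_upto1(x0)(lambda x1, x2: f(x0, x1, x2)))

def check_equal(trials=100, tol=1e-12, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        t = [[[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
             for _ in range(2)]
        f = lambda a, b, c, t=t: t[a][b][c]
        if abs(nested(f) - combined(f)) > tol:
            return False
    return True
```

Both bracketings reduce to the same nested expression h(λx0.K0(x0)(λx1.K1(x0,x1)(λx2.f(x0,x1,x2)))), which is what the displayed chain of equalities says.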
Proposition 22: K:∞ is an infrakernel (C-additive, specifically) if all the Kn are C-additive infrakernels. It is unchanged by altering the Ci sequence of compact sets. In addition, if all the Kn are homogenous/cohomogenous/crisp/sharp, then K:∞ will be so as well.
So, K:∞:X0ik→∏∞i=1Xi is defined as: Fixing an arbitrary sequence of nonempty compact sets Ci⊆Xi,
K:∞(x0)(f):=limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
Is it an infrakernel?
This is going to suck unbelievably much, we're gonna need a ton of results. The game plan is:
Part 1: Show that the functions you're feeding into those infrakernels are guaranteed to be continuous, to make some progress towards showing that K:∞ is well-defined.
Part 2: Show that all the K:n are 1-Lipschitz, and also preserve all nice properties we'd want if all the Kn do (homogenity, cohomogenity, C-additivity, crispness, sharpness).
Part 3: Show that if a function only depends on the first n coordinates of the input, then all the K:n+m start agreeing on the expectation value of the function.
Part 4: Give a general procedure for taking a compact subset of the space X0 and making a compact subset of the space ∏∞i=1Xi with nice properties related to compact almost-support, that preserves its nice properties when projected down to any finite stage.
Part 5: Use parts 2, 3, 4, and a complicated chain of reasoning to get a result which implies that it doesn't matter which Ci sequence you pick, the limit will exist and be the same for all of them, so K:∞ actually exists and is well-defined.
Part 6: Using parts 2 and 5, clean up the normalization, monotonicity, concavity, and C-additivity properties of K:∞. Showing that all the K:∞(x0) are C-additive trivially nets the bounded Lipschitz constant property to show that K:∞ is an infrakernel and K:∞(x0) is an infradistribution.
Part 7: Use our trick from Part 4 and our freedom of picking our compact set sequence from Part 5 to show compact-shared compact almost-support for K:∞, netting us the second infrakernel property, and the compact almost-support property for all the individual components of kernel, verifying the last condition we need to conclude that K:∞(x0) is an infradistribution.
Part 8: We recap one of the arguments for part 5, and it lets us get uniform convergence for a certain limit on any compact set, which is a critical lemma for Part 9.
Part 9: We use our result from Part 8 to invoke the Moore-Osgood theorem in order to show pointwise convergence for K:∞, wrapping up the last condition for it to be an infrakernel.
Part 10: Show that if all the K:n have some nice property, then the limit K:∞ inherits it too.
The proofs will proceed in a strange way to keep track of all the moving parts in places. We'll first present the thing we're trying to prove, and repeatedly go "we could prove it if we could prove this other thing", and keep chaining back until we get something that's easy to show.
Proof Part 1: Our desired result is that the function λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞)
is continuous. So, letting xm1:n+1 limit to x∞1:n+1, our task is to show that:
limm→∞infxn+2:∞∈∏∞i=n+2Cif(xm1:n+1,xn+2:∞)=infxn+2:∞∈∏∞i=n+2Cif(x∞1:n+1,xn+2:∞)
Now, what we can do is consider the compact subset of ∏i=∞i=1Xi to be
{xm1:n+1}m∈N∪{∞}×∏∞i=n+2Ci
And then f must be uniformly continuous on it, so given any ϵ, there is some δ where points only δ apart differ in value by at most ϵ. You can consider m to be big enough to guarantee that all future values of xm1:n+1 are within δ of x∞1:n+1, and then this gets that the function values can only differ by ϵ between (xm1:n+1,xn+2:∞) and (x∞1:n+1,xn+2:∞) if xn+2:∞∈∏∞i=n+2Ci, which it is. This ensures that the worst-case function values are only ϵ apart. This works for all ϵ, showing
limm→∞infxn+2:∞∈∏∞i=n+2Cif(xm1:n+1,xn+2:∞)=infxn+2:∞∈∏∞i=n+2Cif(x∞1:n+1,xn+2:∞)
And so, all the functions we're feeding into the K:n(x0) are continuous.
Proof Part 2: Desired result is "if all the Kn have a nice property, then all the K:n have it too".
This can be simply addressed by noting that, for the base case, because K:0=K0 and we're assuming all the Kn have (C-additivity/cohomogenity/homogenity/crispness/sharpness), K:0 trivially fulfills it.
And for the induction step, if we assume that K:n has the property in question, note that:
K:n+1(x0)=K:n(x0)⋉(λx1:n+1.Kn+1(x0,x1:n+1))
And, by our results on semidirect products preserving nice properties, if K:n(x0) has the nice property (by induction assumption) and Kn+1 does, then we get that K:n+1(x0) preserves the same property, and it holds all the way up the K:n.
And we can move on to Part 3.
Part 3: Showing that, if we go far enough out in the K:n, the value assigned to functions which only depend on finitely many inputs stabilizes. The result that we'd like to show at this point is:
∀x0∈X0,n,m∈N,f∈CB(∏i=n+1i=1Xi):K:n+m(x0)(f)=K:n(x0)(f)
Admittedly, f is not of the proper type signature to be evaluated by K:n+m(x0), but we're abusing notation so that we can feed it in anyways and it just ignores all the coordinates it doesn't need. Accordingly, fix an arbitrary x0,n,f, and our proof target will now be:
∀m∈N:K:n+m(x0)(f)=K:n+m+1(x0)(f)
Proving this would let you apply induction, because we have a base case where K:n+0(x0)(f)=K:n(x0)(f). Let m be arbitrary. Then, we can go:
K:n+m+1(x0)(f)=K:n+m(x0)(λx1:n+m+1.Kn+m+1(x0,x1:n+m+1)(λxn+m+2.f(x1:n+1)))
And then, since the function doesn't depend on the choice of xn+m+2, it's a constant and C-additivity of Kn+m+1 lets us pull it out, yielding
=K:n+m(x0)(λx1:n+m+1.f(x1:n+1))=K:n+m(x0)(f)
And we're done.
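A small numerical illustration of Part 3, under assumptions that go beyond the text: two-point spaces, and kernels modelled as worst-case expectations over finite lists of probability vectors (so they are C-additive). It compares K:0(x0)(f) with K:1(x0)(f) for a function f that only looks at x1. All names and data are hypothetical.

```python
import random

def crisp(dists):
    """Functional f -> min over the given probability vectors of E[f]."""
    def functional(f):
        return min(sum(p * f(i) for i, p in enumerate(d)) for d in dists)
    return functional

def semidirect(h, K):
    """(h ⋉ K)(f) = h(λx. K(x)(λy. f(x, y))) for a two-argument f."""
    return lambda f: h(lambda x: K(x)(lambda y: f(x, y)))

# Hypothetical kernels on two-point spaces.
_K0 = [[[1.0, 0.0], [0.5, 0.5]], [[0.2, 0.8]]]
_K1 = {(0, 0): [[0.3, 0.7]], (0, 1): [[0.0, 1.0], [0.6, 0.4]],
       (1, 0): [[0.5, 0.5]], (1, 1): [[0.1, 0.9], [0.8, 0.2]]}

def K0(x0):
    return crisp(_K0[x0])

def K1(x0, x1):
    return crisp(_K1[(x0, x1)])

def value_at_stage0(x0, f):
    """K:0(x0)(f) for a function f of x1 alone."""
    return K0(x0)(f)

def value_at_stage1(x0, f):
    """K:1(x0)(f), feeding in f as a function of (x1, x2) that ignores x2."""
    k_upto1 = semidirect(K0(x0), lambda x1: K1(x0, x1))
    return k_upto1(lambda x1, x2: f(x1))

def check_stable(trials=100, tol=1e-12, seed=4):
    rng = random.Random(seed)
    for _ in range(trials):
        vals = [rng.uniform(-1, 1) for _ in range(2)]
        f = lambda x1, vals=vals: vals[x1]
        for x0 in (0, 1):
            if abs(value_at_stage0(x0, f) - value_at_stage1(x0, f)) > tol:
                return False
    return True
```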
Part 4: Our desired result here is:
∀CX0⊆X0,ϵ>0:∃C1:∞[CX0,ϵ]⊆∏∞i=1Xi:∀n∈N,x0∈CX0,f,f′∈CB(∏n+1i=1Xi):
f↓pr1:n+1(C1:∞[CX0,ϵ])=f′↓pr1:n+1(C1:∞[CX0,ϵ])
→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−1/2^(n+1))d(f,f′)
This is a bit complicated. It's saying that if you pick any compact subset of X0, you can make a compact subset of ∏∞i=1Xi whose projection to coordinates 1 through n+1 acts as a compact ϵ(1−1/2^(n+1))-almost-support for all the K:n(x0) infradistributions when x0 lies in your compact subset of X0, regardless of what n is.
Accordingly, fix some CX0 and ϵ. Now, we can recursively build up compact subsets of all the Xn in the following way.
CXn+1ϵ/2^(n+1)⊆Xn+1 is an ϵ/2^(n+1)-almost-support for all the Kn(x:n) where x:n∈CX0×∏i=ni=1CXiϵ/2^i. So, basically, we're recursively building up compact subsets of ∏i=ni=0Xi by taking products of earlier compact subsets (with your base case being CX0), and then going "that's a compact subset of the input to Kn, we must be able to find a compact subset of Xn+1 that's an ϵ/2^(n+1)-almost-support for all the Kn(x:n) where x:n lies in our compact subset of input, because of the compact-shared almost-support condition for all the Kn" to go to the next compact set.
To establish some notation to make this a bit easier, let
Ci[CX0,ϵ]:=CXiϵ/2^i
(the i'th compact set in the sequence, defined with CX0 to start building your sequence), and let
C1:n[CX0,ϵ]:=∏i=ni=1Ci[CX0,ϵ]
(the product of compact sets 1 through n, which is compact)
And let
C1:∞[CX0,ϵ]:=∏∞i=1Ci[CX0,ϵ]
This is the product of all the compact sets, and is compact by Tychonoff's theorem.
Note the dependence of these on the starting compact sets. Notice that the projection of C1:∞[CX0,ϵ] to coordinates 1 through n is exactly C1:n[CX0,ϵ].
Now that this is established, our proof target is (using our new notation):
∀n∈N,x0∈CX0,f,f′∈CB(∏n+1i=1Xi):
f↓C1:n+1[CX0,ϵ]=f′↓C1:n+1[CX0,ϵ]→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−1/2^(n+1))d(f,f′)
This structure naturally suggests an induction proof, so for the base case, let our number be 0. Our proof target then turns into:
∀x0∈CX0,f,f′∈CB(X1):
f↓C1[CX0,ϵ]=f′↓C1[CX0,ϵ]→|K:0(x0)(f)−K:0(x0)(f′)|≤ϵ(1−1/2)d(f,f′)
Using that K:0=K0 and that C1[CX0,ϵ]=CX1ϵ/2 and ϵ(1−1/2)=ϵ/2, our proof target is now:
∀x0∈CX0,f,f′∈CB(X1):
f↓CX1ϵ/2=f′↓CX1ϵ/2→|K0(x0)(f)−K0(x0)(f′)|≤(ϵ/2)d(f,f′)
However, we constructed CX1ϵ/2 to be an ϵ/2-almost-support for all the K0(x0) where x0∈CX0, so this statement is just true, and we're done with our base case.
Now, for the induction step, our proof target is:
∀x0∈CX0,f,f′∈CB(∏n+1i=1Xi):f↓C1:n+1[CX0,ϵ]=f′↓C1:n+1[CX0,ϵ]
→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−1/2^(n+1))d(f,f′)
implies
∀x0∈CX0,f,f′∈CB(∏n+2i=1Xi):f↓C1:n+2[CX0,ϵ]=f′↓C1:n+2[CX0,ϵ]
→|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵ(1−1/2^(n+2))d(f,f′)
Accordingly, assume
∀x0∈CX0,f,f′∈CB(∏n+1i=1Xi):f↓C1:n+1[CX0,ϵ]=f′↓C1:n+1[CX0,ϵ]
→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−1/2^(n+1))d(f,f′)
And our task is to prove
∀x0∈CX0,f,f′∈CB(∏n+2i=1Xi):f↓C1:n+2[CX0,ϵ]=f′↓C1:n+2[CX0,ϵ]
→|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵ(1−1/2^(n+2))d(f,f′)
Therefore let x0,f,f′ be arbitrary, and remember that they have the indicated properties, and that f,f′ agree with each other on the indicated set C1:n+2[CX0,ϵ]. Our proof target is now:
|K:n+1(x0)(f)−K:n+1(x0)(f′)|≤ϵ(1−1/2^(n+2))d(f,f′)
Unpacking the definition of K:n+1 and rewriting the bound on the end using ϵ(1−1/2^(n+2))=ϵ(1−1/2^(n+1))+ϵ/2^(n+2), this is equivalent to (we now take this as the proof target)
|K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2)))|
≤ϵ(1−1/2^(n+1))d(f,f′)+(ϵ/2^(n+2))d(f,f′)
We can apply the Lemma 2 decomposition, to split this into "level of support of compact set times distance of functions + distance of functions on compact set times Lipschitz constant of infradistribution". So, theoretically, if we had the following two results:
∀x1:n+1∈C1:n+1[CX0,ϵ]:|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|≤ϵ2n+2d(f,f′)
and
∀x1:n+1:|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|≤d(f,f′)
then applying Lemma 2 would get us
|K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2)))
−K:n(x0)(λx1:n+1.Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2)))|
≤ϵ(1−12n+1)d(f,f′)+ϵ2n+2d(f,f′)⋅1
Which is exactly what we need. This works because C1:n+1[CX0,ϵ] is an ϵ(1−12n+1)-almost-support for K:n(x0) by our induction assumption, together with our two assumptions, Kn+1 being a 1-Lipschitz infrakernel, and Lemma 2. Accordingly, let's show the two proof targets we need to wrap up this result. We'll start with
∀x1:n+1∈C1:n+1[CX0,ϵ]:|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|≤ϵ2n+2d(f,f′)
Now, we know that f and f′ agree on C1:n+2[CX0,ϵ] by assumption, which is a set that factorizes as C1:n+1[CX0,ϵ]×Cn+2[CX0,ϵ], and we have a promise that x1:n+1∈C1:n+1[CX0,ϵ]. So λxn+2.f(x1:n+1,xn+2) equals λxn+2.f′(x1:n+1,xn+2) on the set Cn+2[CX0,ϵ]. That set is an ϵ2n+2-almost-support for all the Kn+1(x0,x1:n+1) where x0∈CX0 (which is the case by assumption) and x1:n+1∈C1:n+1[CX0,ϵ] (also the case), so we have our result.
Now for the other branch,
∀x1:n+1:|Kn+1(x0,x1:n+1)(λxn+2.f(x1:n+1,xn+2))
−Kn+1(x0,x1:n+1)(λxn+2.f′(x1:n+1,xn+2))|≤d(f,f′)
Due to 1-Lipschitzness of Kn+1(x0,x1:n+1) regardless of x0 and x1:n+1, we could prove it if, regardless of x1:n+1,
d(λxn+2.f(x1:n+1,xn+2),λxn+2.f′(x1:n+1,xn+2))≤d(f,f′)
which is the case, so we have our result.
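To spell out why this holds: assuming, as elsewhere in this proof, that d is the sup-norm distance on bounded continuous functions, fixing x1:n+1 only shrinks the set the supremum ranges over. A sketch of the step (the primed variable is just a dummy name introduced here):

```latex
d\big(\lambda x_{n+2}.f(x_{1:n+1},x_{n+2}),\ \lambda x_{n+2}.f'(x_{1:n+1},x_{n+2})\big)
  = \sup_{x_{n+2}} \big|f(x_{1:n+1},x_{n+2})-f'(x_{1:n+1},x_{n+2})\big|
  \le \sup_{x'_{1:n+1},\,x_{n+2}} \big|f(x'_{1:n+1},x_{n+2})-f'(x'_{1:n+1},x_{n+2})\big|
  = d(f,f')
```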
That's everything wrapped up, so our induction proof goes through, yielding our result of:
∀CX0⊆X0,ϵ>0:∃C1:∞[CX0,ϵ]⊆∏∞i=1Xi:∀n∈N,x0∈CX0,f,f′∈CB(∏n+1i=1Xi):
f↓C1:n+1[CX0,ϵ]=f′↓C1:n+1[CX0,ϵ]→|K:n(x0)(f)−K:n(x0)(f′)|≤ϵ(1−12n+1)d(f,f′)
Now to begin our fifth part.
Part 5: Our aim here is to show that no matter what sequence of compact sets you have, they all limit to the same result, so our limit is going to be well defined. In order to do this, we'll have to show the result (letting Ci,1 be your first sequence of nonempty compact sets to attempt to define the limit and Ci,2 be your second sequence of nonempty compact sets, and abbreviating Cn:∞,1 as the product of the Ci,1 from n to ∞) that,
∀x0∈X0,ϵ>0,¯¯¯¯¯¯C1,¯¯¯¯¯¯C2∈∏∞i=1K(Xi),f∈CB(∏∞i=1Xi)∃n∈N∀m1,m2∈N:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
≤2ϵ(2||f||+1)
In words, this is saying that for any two sequences of compact sets, there exists some threshold where if you pick any value of the defining sequence for K:∞(x0)(f) associated with using Ci,1 as your compact sets, and the sequence associated with using Ci,2 as your compact sets (as long as they're both past some shared threshold), they'll be close. If you have both sequences being identical, then this result is basically saying that the sequence used to define K:∞(x0)(f) is Cauchy (never varies too far from itself after a finite time), and thus the limit exists. And if you have the sequences being different, then this can be used to show that they limit to each other, so K:∞ is well-defined and you always get the same result no matter which particular sequence of compact sets you fix.
This is going to be rough. Fix our x0,ϵ,¯¯¯¯¯¯C1,¯¯¯¯¯¯C2,f (input value, closeness parameter, two sequences of compact sets, function), and now we'll find our n to make
∀m1,m2∈N:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
≤2ϵ(2||f||+1)
true. Do it in the following way. Use {x0} as your compact seed set to invoke the technique in part 4 to construct your sequence Ci[x0,ϵ], and then consider the set:
∏∞i=1(Ci,1∪Ci,2∪Ci[x0,ϵ])
A finite union of compact sets is compact, and the product of compact sets is compact. If we restrict f to this set, it's uniformly continuous. In particular, using the standard product metric (which metrizes the product topology), defined as:
d(x1:∞,x′1:∞)=∑∞i=12−iinf(1,dXi(xi,x′i))
We can conclude that two sequences which agree up till time n must be, at most, distance 2^(−n) apart. Since f restricted to our compact set of interest is uniformly continuous, there is some δ such that a difference of less than δ in inputs only leads to a difference of at most ϵ in values. Now we can define our n to be any natural number with 2^(−n)<δ, for instance ⌈log2(1/δ)⌉+1.
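As a sanity check on the tail bound and the choice of n, here is a minimal Python sketch. It makes several assumptions not in the original: the coordinates are real numbers with the usual distance, sequences are truncated to finite prefixes (with everything past the prefix treated as equal), and the helper names `product_metric` and `threshold_n` are invented for illustration.

```python
import math

def product_metric(xs, ys, coord_dist=lambda a, b: abs(a - b)):
    """Truncated product metric: sum over i >= 1 of 2^-i * min(1, d(x_i, y_i)).

    xs and ys are equal-length finite prefixes; coordinates past the
    prefix are treated as equal (an assumption of this sketch).
    """
    # enumerate starts at 0, so coordinate number i+1 gets weight 2^-(i+1).
    return sum(2.0 ** -(i + 1) * min(1.0, coord_dist(a, b))
               for i, (a, b) in enumerate(zip(xs, ys)))

def threshold_n(delta):
    """Return a natural number n with 2^-n < delta (requires delta > 0)."""
    n = max(0, math.ceil(-math.log2(delta)))
    # Bump n if 2^-n only ties with delta, since we want strict inequality.
    while 2.0 ** -n >= delta:
        n += 1
    return n

# Hypothetical modulus of uniform continuity for some f.
delta = 0.1
n = threshold_n(delta)

# Two sequences that agree on the first n coordinates and differ wildly after.
xs = [0.0] * (n + 20)
ys = [0.0] * n + [5.0] * 20

# Agreement up to time n keeps the distance within 2^-n, hence below delta.
print(n, product_metric(xs, ys), 2.0 ** -n)
```

The point of the design is that the tail weights sum to exactly 2^(−n), so however badly the sequences disagree past coordinate n, the distance stays below δ once 2^(−n)<δ.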
Now that we've picked our n, let m1 and m2 be arbitrary. Our goal is to now prove:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
≤2ϵ(2||f||+1)
We can break up the distance between these two quantities as:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
≤|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m1(x0)(λx1:n+m1+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))|
+|K:n+m1(x0)(λx1:n+m1+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))|
+|K:n+m2(x0)(λx1:n+m2+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
The distances group into three "chunks". What we'll do is show that chunks 1 and 3 have value upper-bounded by ϵ(2||f||+1), and chunk 2 has a value of 0, producing our net 2ϵ(2||f||+1) upper-bound, and we'd be done. So, let's try to show the first one, that:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m1(x0)(λx1:n+m1+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))|≤ϵ(2||f||+1)
This one is perfectly symmetric to the third distance chunk, so disposing of this will also deal with showing
|K:n+m2(x0)(λx1:n+m2+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
≤ϵ(2||f||+1)
The way we'll deal with this one is by using our good old Lemma 2, where we split up the difference of the two quantities into "how much of a support is this set times how different are the two functions" and "how close are the two functions on this set times the Lipschitz constant of the infradistributions". We'll be picking the set C1:n+m1+1[x0,ϵ], which is an ϵ-almost-support for K:n+m1(x0) by our discussion in Part 4. Since the infrakernels are always 1-Lipschitz (by C-additivity), the maximum/minimum expectation value the functions
λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞)
and
λx1:n+m1+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞)
can have is ||f|| (or −||f||) respectively. This produces a 2ϵ||f|| bound on the value of that piece produced via Lemma 2. All that remains is to prove that
supx1:n+m1+1∈C1:n+m1+1[x0,ϵ]
|infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞)
−infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞)|≤ϵ
(because of 1-Lipschitzness of the infrakernel), and we'll be done. We can reformulate this proof objective as:
∀x1:n+m1+1∈C1:n+m1+1[x0,ϵ]:
|infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞)
−infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞)|≤ϵ
Accordingly, let x1:n+m1+1 be arbitrary in said set, so our objective is now:
|infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞)
−infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞)|≤ϵ
We'd be able to prove this if we could show:
∀xn+m1+2:∞∈∏∞i=n+m1+2Ci,1,xn+2:∞∈∏∞i=n+2Ci[x0,ϵ]:
|f(x1:n+m1+1,xn+m1+2:∞)−f(x1:n+1,xn+2:∞)|≤ϵ
Accordingly, let xn+m1+2:∞ and xn+2:∞ be selected from the appropriate sets, and our goal is now:
|f(x1:n+m1+1,xn+m1+2:∞)−f(x1:n+1,xn+2:∞)|≤ϵ
At this point, we should remember that if you have a promise that your input to f will be within the set ∏∞i=1(Ci,1∪Ci,2∪Ci[x0,ϵ])
then knowing the first n coordinates fully pins down the value of your function f to within ϵ, by how we picked our n. And then we can notice something interesting. By how they were selected,
(x1:n+m1+1,xn+m1+2:∞)∈∏n+m1+1i=1Ci[x0,ϵ]×∏∞i=n+m1+2Ci,1
And also, (x1:n+1,xn+2:∞)∈∏∞i=1Ci[x0,ϵ]
So, the two inputs are both in the relevant compact set, and agree on the first n coordinates, so the values of f on them agree to within ϵ, and our desired result is the case. We've shown
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m1(x0)(λx1:n+m1+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))|≤ϵ(2||f||+1)
Which symmetrically establishes
|K:n+m2(x0)(λx1:n+m2+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
≤ϵ(2||f||+1)
By pretty much exactly the same line of argument. That leaves one last distance equality to establish. All we need now is to show that
|K:n+m1(x0)(λx1:n+m1+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))|=0
Which is the same as the proof target:
K:n+m1(x0)(λx1:n+m1+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
=K:n+m2(x0)(λx1:n+m2+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
However, this inner function only depends on the inputs from 1 to n+1, so by our Part 3, both of these equal the value
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Ci[x0,ϵ]f(x1:n+1,xn+2:∞))
And so, we're done.
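The bookkeeping of the three chunks can be summarized as follows. The labels A_1, A_2, B_1, B_2 are shorthand introduced only for this summary: A_1 and A_2 are the two original quantities (using Ci,1 with n+m1 and Ci,2 with n+m2 respectively), and B_1, B_2 are the intermediate quantities using Ci[x0,ϵ] from n+2 onward, evaluated by K:n+m1(x0) and K:n+m2(x0).

```latex
\begin{align*}
|A_1 - A_2| &\le |A_1 - B_1| + |B_1 - B_2| + |B_2 - A_2| \\
|A_1 - B_1| &\le \epsilon\cdot 2\|f\| + \epsilon\cdot 1 = \epsilon(2\|f\|+1)
  && \text{(Lemma 2; same bound for } |B_2 - A_2| \text{)} \\
|B_1 - B_2| &= 0
  && \text{(Part 3)} \\
|A_1 - A_2| &\le 2\epsilon(2\|f\|+1)
\end{align*}
```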
So, our entire result goes through, as we've shown all our proof targets, and we have:
∀x0∈X0,ϵ>0,¯¯¯¯¯¯C1,¯¯¯¯¯¯C2∈∏∞i=1K(Xi),f∈CB(∏∞i=1Xi)∃n∈N∀m1,m2∈N:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|
≤2ϵ(2||f||+1)
Let's clean this up a little bit. It can be put into the equivalent form:
∀x0∈X0,¯¯¯¯¯¯C1,¯¯¯¯¯¯C2∈∏∞i=1K(Xi),f∈CB(∏∞i=1Xi),ϵ>0,∃n∈N∀m1,m2∈N:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|≤ϵ
By shifting the ϵ to the front and using that ||f|| is finite so we can just let the old ϵ be small enough.
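One way to make "let the old ϵ be small enough" explicit (the name ϵ′ is introduced here purely for illustration): given the target ϵ>0, run the preceding argument with the smaller parameter ϵ′ below, which is well defined because 2||f||+1≥1.

```latex
\epsilon' := \frac{\epsilon}{2(2\|f\|+1)}
\quad\Longrightarrow\quad
2\epsilon'(2\|f\|+1) = \epsilon
```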
And now this gets interesting. Let x0 and f be arbitrary, specialize to ¯¯¯¯¯¯C1=¯¯¯¯¯¯C2, and m1=0, and abbreviate m2 as m. Then this turns into:
∀¯¯¯¯¯¯Ci∈∏∞i=1K(Xi),ϵ>0,∃n∈N∀m∈N:
|K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
−K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Cif(x1:n+m+1,xn+m+2:∞))|≤ϵ
I.e., this is pretty much saying that, regardless of your sequence of compact sets, the sequence in n given by:
K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
is Cauchy (and therefore must converge to a particular value, regardless of which choice of compact sets is made). So, when we defined K:∞(x0)(f) as
limn→∞K:n(x0)(λx1:n+1.infxn+2:∞∈∏∞i=n+2Cif(x1:n+1,xn+2:∞))
The limit does indeed exist. But, we can actually get something even stronger. All these limits must be the same. Given the thing we showed,
∀x0∈X0,¯¯¯¯¯¯C1,¯¯¯¯¯¯C2∈∏∞i=1K(Xi),f∈CB(∏∞i=1Xi),ϵ>0,∃n∈N∀m1,m2∈N:
|K:n+m1(x0)(λx1:n+m1+1.infxn+m1+2:∞∈∏∞i=n+m1+2Ci,1f(x1:n+m1+1,xn+m1+2:∞))
−K:n+m2(x0)(λx1:n+m2+1.infxn+m2+2:∞∈∏∞i=n+m2+2Ci,2f(x1:n+m2+1,xn+m2+2:∞))|≤ϵ
We can let x0 and f be arbitrary, and set m1=m2 (abbreviated as m) to get:
∀¯¯¯¯¯¯C1,¯¯¯¯¯¯C2∈∏∞i=1K(Xi),ϵ>0,∃n∈N∀m∈N:
|K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Ci,1f(x1:n+m+1,xn+m+2:∞))
−K:n+m(x0)(λx1:n+m+1.infxn+m+2:∞∈∏∞i=n+m+2Ci,2f(x1:n+m+1,xn+m+2:∞))|≤ϵ
I.e., no matter which sequence of compact sets is selected, the two convergent sequences get arbitrarily close to each other, so our definition of K:∞(x0)(f) doesn't just have the limit being well-defined, it has it being the same regardless of which sequence of compact sets Ci was selected.
With this, now we can let the compact sequence be whatever is most convenient for arguments, as it always produces the same limit no matter what. (continued in next post)