We define a new class of reductions which preserve optimal predictor schemes, generalizing the previous notion of pseudo-invertible reductions. These are reductions that preserve the estimation problem on average but allow for large variance.
Definition 1
Fix an error space Δ of rank 2. Consider (f,μ), (g,ν) distributional estimation problems, ^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-bischeme. ^ζ is called a Δ-pseudo-invertible weak reduction of (f,μ) to (g,ν) when there is a polynomial p:N→N s.t. the following conditions hold:
(i) Eμk[(EUrζ(k,j)[g(^ζkj(x))]−f(x))2]∈Δ
(ii) Prμk×Urζ(k,j)[νp(k)(^ζkj(x))=0]∈Δ
(iii) There is M>0 and ^R=(R,rR,aR) a Q∩[0,M]-valued (poly,log)-bischeme s.t.
^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-scheme is called a Δ-pseudo-invertible weak reduction of (f,μ) to (g,ν) when it becomes such when adding trivial dependence on j.
Definition 2
An error space Δ of rank 2 is called stable when for any non-constant polynomial p:N→N and δ∈Δ, the function δ′(k,j):=δ(p(k),j) is in Δ.
Theorem
Fix Δ a stable error space of rank 2. Suppose there is a polynomial h:N2→N s.t. h−1∈Δ. Consider (f,μ), (g,ν) distributional estimation problems, ^ζ a Δ-pseudo-invertible weak reduction of (f,μ) to (g,ν) and ^Pg a Δ(poly,log)-optimal predictor scheme for (g,ν). Define ^Pf by
Here, we assume the lengths of the yi and zi are compatible with ^ζ and ^Pg respectively. Then, ^Pf is a Δ(poly,log)-optimal predictor scheme for (f,μ).
Proposition 1
Consider (f,μ), (g,ν) distributional estimation problems. Suppose ^ζ=(ζ,rζ,aζ) is a weak Δ-pseudo-invertible reduction of (f,μ) to (g,ν) and ^ξ=(ξ,rξ,aξ) is it's Δ-pseudo-inverse. Then there is δ∈Δ s.t. for any bounded function h:{0,1}∗2→R
We define a new class of reductions which preserve optimal predictor schemes, generalizing the previous notion of pseudo-invertible reductions. These are reductions that preserve the estimation problem on average but allow for large variance.
Definition 1
Fix an error space Δ of rank 2. Consider (f,μ), (g,ν) distributional estimation problems, ^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-bischeme. ^ζ is called a Δ-pseudo-invertible weak reduction of (f,μ) to (g,ν) when there is a polynomial p:N→N s.t. the following conditions hold:
(i) Eμk[(EUrζ(k,j)[g(^ζkj(x))]−f(x))2]∈Δ
(ii) Prμk×Urζ(k,j)[νp(k)(^ζkj(x))=0]∈Δ
(iii) There is M>0 and ^R=(R,rR,aR) a Q∩[0,M]-valued (poly,log)-bischeme s.t.
Eνp(k)×UrR(k,j)[(^Rkj(y)−Prμk×Urζ(k,j)[^ζkj(x)=y]νp(k)(y))2]∈Δ
(iv) There is a {0,1}∗-valued (poly,log)-scheme ^ξ=(ξ,rξ,aξ) s.t.
Eμk×Urζ(k,j)[∑x′∈{0,1}∗|PrUrξ(k,j)[^ξkj(^ζkj(x,z),w)=x′]−Prμk×Urζ(k,j)[x′′=x′∣^ζkj(x′′,z′)=^ζkj(x,z)]|]∈Δ
Such ^ξ is called a Δ-pseudo-inverse of ^ζ.
^ζ=(ζ,rζ,aζ) a {0,1}∗-valued (poly,log)-scheme is called a Δ-pseudo-invertible weak reduction of (f,μ) to (g,ν) when it becomes such when adding trivial dependence on j.
Definition 2
An error space Δ of rank 2 is called stable when for any non-constant polynomial p:N→N and δ∈Δ, the function δ′(k,j):=δ(p(k),j) is in Δ.
Theorem
Fix Δ a stable error space of rank 2. Suppose there is a polynomial h:N2→N s.t. h−1∈Δ. Consider (f,μ), (g,ν) distributional estimation problems, ^ζ a Δ-pseudo-invertible weak reduction of (f,μ) to (g,ν) and ^Pg a Δ(poly,log)-optimal predictor scheme for (g,ν). Define ^Pf by
^Pkjf(x,y1y2…yh(k,j)z1z2…zh(k,j)):=1h(k,j)h(k,j)∑i=1^Pp(k),jg(^ζkj(x,yi),zi)
Here, we assume the lengths of the yi and zi are compatible with ^ζ and ^Pg respectively. Then, ^Pf is a Δ(poly,log)-optimal predictor scheme for (f,μ).
Proposition 1
Consider (f,μ), (g,ν) distributional estimation problems. Suppose ^ζ=(ζ,rζ,aζ) is a weak Δ-pseudo-invertible reduction of (f,μ) to (g,ν) and ^ξ=(ξ,rξ,aξ) is it's Δ-pseudo-inverse. Then there is δ∈Δ s.t. for any bounded function h:{0,1}∗2→R
|Eμk×Urζ(k,j)×Urξ(k,j)[h(^ξkj(^ζkj(x,z),w),^ζkj(x,z))]−Eμk×Urζ(k,j)[h(x,^ζkj(x,z))]|≤(sup|h|)δ(k,j)
Proposition 1 proved exactly as Proposition 2 for unidistributional estimation problems.
Proposition 2
Consider X a finite set, μ a probability measure on X, h:X→R and s,t∈R. Then
Eμ[(h(x)−s)2−(h(x)−t)2]=(Eμ[h(x)]−s)2−(Eμ[h(x)]−t)2
Proof of Proposition 2
Eμ[(h(x)−s)2]=Eμ[h(x)2]−2Eμ[h(x)]s+s2
Eμ[(h(x)−t)2]=Eμ[h(x)2]−2Eμ[h(x)]t+t2
Eμ[(h(x)−s)2−(h(x)−t)2]=2Eμ[h(x)](t−s)+s2−t2
(Eμ[h(x)]−s)2=Eμ[h(x)]2−2Eμ[h(x)]s+s2
(Eμ[h(x)]−t)2=Eμ[h(x)]2−2Eμ[h(x)]t+t2
(Eμ[h(x)]−s)2−(Eμ[h(x)]−t)2=2Eμ[h(x)](t−s)+s2−t2
Proof of Theorem
Consider ^Qf=(Qf,sf,bf) a (poly,log)-predictor scheme. Let ^Qg=(Qg,sg,bg) be the (poly,log)-predictor scheme defined by
^Qkjg(x,y1y2y3y4):=^Pkjg(x,y1)+^Qkjf(^ξkj(x,y2),y3)−^Pkjf(^ξkj(x,y2),y4)
We have
Eνp(k)×UrR(k,j)×Urg(p(k),j)[^Rkj(y)(^Pp(k),jg(y)−g(y))2]≤Eνp(k)×UrR(k,j)×Usg(p(k),j)[^Rkj(y)(^Qp(k),jg(y)−g(y))2]+δ0(p(k),j)
where δ0∈Δ.
Applying the definitive property of ^R to the left hand side we get
Eνp(k)×UrR(k,j)×Urg(p(k),j)[^Rkj(y)(^Pp(k),jg(y)−g(y))2]=Eνp(k)×Urg(p(k),j)[Prμk×Urζ(k,j)[^ζkj(x)=y]νp(k)(y)(^Pp(k),jg(y)−g(y))2]+γR(k,j)
where |γR|∈Δ. Using property (ii) of pseudo-invertible reductions, we get
Eνp(k)×UrR(k,j)×Urg(p(k),j)[^Rkj(y)(^Pp(k),jg(y)−g(y))2]=Eμk×Urg(p(k),j)×Urζ(k,j)[(^Pp(k),jg(^ζkj(x))−g(^ζkj(x)))2]+γR(k,j)
Using the definitive property of ^R and property (ii) of pseudo-invertible reductions on the right-hand side, we get
E[^Rkj(y)(^Qkjg(y)−g(y))2]=E[(^Pp(k),jg(^ζkj(x))+^Qkjf(^ξkj(^ζkj(x)))−^Pkjf(^ξkj(^ζkj(x)))−g(^ζkj(x)))2]+γ′R(k,j)
where |γ′R|∈Δ. Applying Proposition 1
E[^Rkj(y)(^Qkjg(y)−g(y))2]=E[(^Pp(k),jg(^ζkj(x))+^Qkjf(x)−^Pkjf(x)−g(^ζkj(x)))2]+γξ(k,j)+γ′R(k,j)
where |γξ|∈Δ. Applying the stability of Δ to δ0 and putting everything together
E[(^Pp(k),jg(^ζkj(x))−g(^ζkj(x)))2]≤E[(^Pp(k),jg(^ζkj(x))+^Qkjf(x)−^Pkjf(x)−g(^ζkj(x)))2]+δ1(k,j)
for δ1∈Δ. Applying Proposition 2
Eμk[E[^Pp(k),jg(^ζkj(x))−g(^ζkj(x))]2]≤Eμk×Usf(k,j)+rf(k,j)[(E[^Pp(k),jg(^ζkj(x))−g(^ζkj(x))]+^Qkjf(x)−^Pkjf(x))2]+δ1(k,j)
Applying property (i) of pseudo-invertible reductions
Eμk[(E[^Pp(k),jg(^ζkj(x))]−f(x))2]≤Eμk×Usf(k,j)+rf(k,j)[(E[^Pp(k),jg(^ζkj(x))]−f(x)+^Qkjf(x)−^Pkjf(x))2]+δ2(k,j)
for δ2∈Δ. Using the definition of ^Pf we get
E[(^Pkjf(x)−f(x))2]≤E[(^Pkjf(x)−f(x)+^Qkjf(x)−^Pkjf(x))2]+δ3(k,j)
for δ3∈Δ. Finally we get
E[(^Pkjf(x)−f(x))2]≤E[(^Qkjf(x)−f(x))2]+δ3(k,j)