Reflection with optimal predictors

Vanessa Kosoy

A change in terminology: It is convenient when important concepts have short names. The concept of an "optimal predictor scheme" seems much more important than its historical predecessor, the "optimal predictor". Therefore "optimal predictor schemes" will be henceforth called just "optimal predictors" while the previous concept of "optimal predictor" might be called "flat optimal predictor".

We study systems of computations which have access to optimal predictors for each other. We expect such systems to play an important role in decision theory (where self-prediction is required to define logical counterfactuals and mutual prediction is required for a collection of agents in a game) and Vingean reflection (where the different computations correspond to different successor agents). The previously known existence theorems for optimal predictors are not directly applicable to this case. To overcome this we prove new, specifically tailored existence theorems.

The Results section states the main novelties, Appendix A contains adaptations of old theorems, Appendix B proves selected claims from Appendix A and Appendix C proves the novel results.

Results

Notation

Given sets $X$ and $Y$ , $X \to Y$ will denote the set of mappings from $X$ to $Y$ .

Before taking on reflection, we introduce a stronger concept of optimal predictor, to which the previous existence theorems still apply.

Definition 1

Let $r$ be a positive integer. A proto-error space of rank $r$ is a set $E$ of bounded functions from $N^{r}$ to $R^{\geq 0}$ s.t.

(i) If $δ_{1}, δ_{2} \in E$ then $δ_{1} + δ_{2} \in E$ .

(ii) If $δ_{1} \in E$ and $δ_{2} \leq δ_{1}$ then $δ_{2} \in E$ .

(iii) There is a polynomial $h : N^{r} \to R$ s.t. $2^{- h} \in E$ .

Proposition 1

If $E$ is a proto-error space of rank $r$ and $α \in R^{> 0}$ , then $E^{α} := \{δ^{α} ∣ δ \in E\}$ is also a proto-error space of rank $r$ .

Proposition 2

If $E$ is a proto-error space of rank $r$ , $α, γ \in R^{> 0}$ and $α < γ$ then $E^{γ} \subseteq E^{α}$ .

Definition 3

Fix $E$ a proto-error space of rank 2 and $(f, μ)$ a distributional estimation problem. Consider $\hat{P}$ a $(p o l y, l o g)$ -predictor.

$\hat{P}$ is called an $E (p o l y, l o g)$ -optimal predictor for $(f, μ)$ when for any $(p o l y, l o g)$ -predictor $\hat{Q}$ , there is $δ \in E$ s.t.

$E_{μ^{k} \times U^{r_{P} (k, j)}} [(\hat{P}^{k j} (x) - f (x))^{2}] \leq E_{μ^{k} \times U^{r_{Q} (k, j)}} [(\hat{Q}^{k j} (x) - f (x))^{2}] + δ (k, j)$

$\hat{P}$ is called an $E^{*} (p o l y, l o g)$ -optimal predictor for $(f, μ)$ when there is $α > 0$ s.t. $\hat{P}$ is an $E^{α} (p o l y, l o g)$ -optimal predictor for $(f, μ)$ .

$E^{*} (p o l y, l o g)$ -optimal predictors have properties closely parallel to $Δ (p o l y, l o g)$ -optimal predictors. The corresponding theorems are listed in Appendix A. Most theorems are given without proof, as the proofs are closely analogous to before, with the exception of Theorem A.4 which is proven in Appendix B.

We now consider a generalization in which the advice is allowed to be random.

Definition 4

Given appropriate sets $X$ and $Y$ , consider $Q : N^{2} \times X \times (\{0, 1\}^{*})^{2} \xrightarrow{alg} Y$ , $r_{Q} : N^{2} \to N$ polynomial and $\{σ_{Q}^{k j} : \{0, 1\}^{*} \to [0, 1]\}_{k, j \in N}$ a family of probability measures. The triple $\hat{Q} = (Q, r_{Q}, σ_{Q})$ is called a $(p o l y, r l o g)$ -bischeme of signature $X \to Y$ when

(i) The runtime of $Q^{k j} (x, y, z)$ is bounded by $p (k, j)$ with $p$ polynomial.

(ii) $\operatorname{supp} σ_{Q}^{k j} \subseteq \{0, 1\}^{\leq c \lfloor \log (k + 2) + \log (j + 2) \rfloor}$ for some $c \in N$ .

We will use the notation $\hat{Q}^{k j} (x, y)$ to stand for the obvious $Y$ -valued $σ_{Q}^{k j}$ -random variable and the notation $\hat{Q}^{k j} (x)$ to stand for the obvious $Y$ -valued $U^{r_{Q} (k, j)} \times σ_{Q}^{k j}$ -random variable.

A $(p o l y, r l o g)$ -bischeme of signature ${0, 1}^{*} \to Y$ will also be called a $Y$ -valued $(p o l y, r l o g)$ -bischeme.

A $[0, 1]$ -valued $(p o l y, r l o g)$ -bischeme will also be called a $(p o l y, r l o g)$ -predictor.

Note 1

Conceptually, advice corresponds to precomputation which might be expensive or even "hyperprecomputation". Random advice corresponds to random precomputation. Random logarithmic advice can always be replaced by deterministic polynomial advice at the cost of a small rounding error.
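The last claim of Note 1 rests on a counting argument: there are only polynomially many strings within the logarithmic length bound of Definition 4, so a table of rounded probabilities for all of them fits into polynomial deterministic advice. The Python sketch below illustrates this under stated assumptions: base-2 logarithms, rounding each probability down to `bits` binary places, and function names that are purely hypothetical.

```python
import math
from fractions import Fraction

def advice_length_bound(k: int, j: int, c: int) -> int:
    """Length bound c * floor(log(k+2) + log(j+2)); base-2 logs are an assumption."""
    return c * math.floor(math.log2(k + 2) + math.log2(j + 2))

def num_advice_strings(k: int, j: int, c: int) -> int:
    """Number of binary strings of length at most the bound: 2^(L+1) - 1."""
    return 2 ** (advice_length_bound(k, j, c) + 1) - 1

def round_distribution(probs, bits: int):
    """Round each probability down to `bits` binary places (hypothetical encoding)."""
    scale = 2 ** bits
    return [Fraction(math.floor(p * scale), scale) for p in probs]

def rounding_error(probs, bits: int) -> Fraction:
    """Total probability mass lost by rounding; at most len(probs) / 2^bits."""
    rounded = round_distribution(probs, bits)
    return sum(Fraction(p) - r for p, r in zip(probs, rounded))
```

Since $2^{L + 1} \leq 2 ((k + 2) (j + 2))^{c}$ , the table has polynomially many entries, and the lost mass shrinks exponentially in the number of bits stored per entry.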

Definition 5

Fix $E$ a proto-error space of rank 2 and $(f, μ)$ a distributional estimation problem. Consider $\hat{P}$ a $(p o l y, r l o g)$ -predictor. $\hat{P}$ is called an $E (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ when for any $(p o l y, r l o g)$ -predictor $\hat{Q}$ , there is $δ \in E$ s.t.

$E_{μ^{k} \times U^{r_{P} (k, j)} \times σ_{P}^{k j}} [(\hat{P}^{k j} (x) - f (x))^{2}] \leq E_{μ^{k} \times U^{r_{Q} (k, j)} \times σ_{Q}^{k j}} [(\hat{Q}^{k j} (x) - f (x))^{2}] + δ (k, j)$

The concept of an $E (p o l y, r l o g)$ -optimal predictor is essentially of the same strength as that of an $E (p o l y, l o g)$ -optimal predictor, as seen in the following two Propositions.

Proposition 3

Fix $E$ a proto-error space of rank 2 and $(f, μ)$ a distributional estimation problem. Consider $\hat{P}$ an $E (p o l y, l o g)$ -optimal predictor. Then it defines an $E (p o l y, r l o g)$ -optimal predictor by setting $σ_{P}^{k j} (a_{P}^{k j}) = 1$ .

Proposition 4

Fix $E$ a proto-error space of rank 2 and $(f, μ)$ a distributional estimation problem. If $(f, μ)$ admits an $E (p o l y, r l o g)$ -optimal predictor then it admits an $E (p o l y, l o g)$ -optimal predictor.

We are now ready to introduce the key abstraction.

Definition 6

Given a set $Σ$ , denote $Π_{Σ} := Σ \times N^{2} \to {0, 1}^{*} \times N$ equipped with the product topology.

A reflective system is a triple $(Σ, f, μ)$ where $Σ$ is a set, ${μ_{n}^{k} : {0, 1}^{*} \to [0, 1]}_{n \in Σ, k \in N}$ is a collection of probability measures where we regard each $μ_{n}$ as a word ensemble and ${f_{n} : supp μ_{n} \times Π_{Σ} \to [0, 1]}_{n \in Σ}$ is a collection of continuous functions (here $supp μ_{n}$ has the discrete topology or, equivalently, we only require continuity in the second variable).

The motivation behind this definition is regarding $Π_{Σ}$ as the space of possible predictor programs, where the first factor of ${0, 1}^{*} \times N$ is the program itself (including advice) and the second factor is the number of intrinsic coin flips. Thus the $f_{n}$ represent a system of computations in which each has access to the source code of predictors for the entire system.

Definition 7

Consider a reflective system $R = (Σ, f, μ)$ and a collection of $(p o l y, r l o g)$ -predictors $\{\hat{Q}_{n} = (Q_{n}, r_{n}, σ_{n})\}_{n \in Σ}$ .

Denote $A_{Σ} := Σ \times N^{2} \to {0, 1}^{*}$ equipped with the product $σ$ -algebra. Denote $σ := \prod_{n, k, j} σ_{n}^{k j}$ , a probability measure on $A_{Σ}$ .

Given $n \in Σ$ , $k, j \in N$ and $a \in \{0, 1\}^{*}$ denote $Q_{n}^{k j} [a] \in \{0, 1\}^{*}$ to be the program that computes $Q_{n}^{k j} (x, y, a)$ given input $x, y \in \{0, 1\}^{*}$ . Given $a \in A_{Σ}$ , define $\hat{Q} [a] \in Π_{Σ}$ by $\hat{Q} [a]_{n}^{k j} := (Q_{n}^{k j} [a_{n}^{k j}], r_{n} (k, j))$ .

Given $n \in Σ$ , $R [\hat{Q}]_{n} : supp μ_{n} \to [0, 1]$ is defined as follows

$R [\hat{Q}]_{n} (x) := E_{σ} [f_{n} (x, \hat{Q} [a])]$

The expectation value is well-defined thanks to the continuity of $f_{n}$ .

Definition 8

Fix $E$ a proto-error space of rank 2 and $R = (Σ, f, μ)$ a reflective system. A collection $\{P_{n}\}_{n \in Σ}$ of $(p o l y, r l o g)$ -predictors is called an $E (p o l y, r l o g)$ -optimal predictor system for $R$ when for any $n \in Σ$ , $P_{n}$ is an $E (p o l y, r l o g)$ -optimal predictor for $(R [P]_{n}, μ_{n})$ .

Construction 1

Given $ϕ \in Φ$ , denote $E_{1 (ϕ)}$ the set of bounded functions $δ : N \to R^{\geq 0}$ s.t.

$\forall ϵ \in (0, 1) : \lim_{k \to \infty} ϕ (k)^{1 - ϵ} δ (k) = 0$

Given $ϕ \in Φ$ , denote $E_{2 (l l, ϕ)}$ the set of bounded functions $δ : N^{2} \to R^{\geq 0}$ s.t.

$\forall ψ \in Φ : ψ \leq ϕ ⟹ E_{λ_{ψ}^{k}} [δ (k, j)] \in E_{1 (ψ)}$

Denote

$E_{2 (l l)} := \bigcap_{ϕ \in Φ} E_{2 (l l, ϕ)}$

Proposition 5

If $ϕ \in Φ$ is s.t. $\exists n : {lim}_{k \to \infty} 2^{- k^{n}} ϕ (k) = 0$ , $E_{1 (ϕ)}$ is a proto-error space.

For any $ϕ \in Φ$ , $E_{2 (l l, ϕ)}$ is a proto-error space.

$E_{2 (l l)}$ is a proto-error space.

Note 2

$E_{2 (l l)}^{*}$ -optimality is strictly stronger than $Δ_{l l}^{2}$ -optimality. We exploit this strength in Theorem 2 below which is the main motivation for introducing $E$ -optimality at this point.

Theorem 1 (general existence theorem)

Any reflective system has an $E_{2 (l l)} (p o l y, r l o g)$ -optimal predictor system.

We also prove a more restricted existence theorem which allows deterministic advice.

Definition 9

Consider a set $Σ$ and ${μ_{n}^{k} : {0, 1}^{*} \to [0, 1]}_{n \in Σ, k \in N}$ a collection of probability measures. Denote $D_{μ} := {(n \in Σ, k \in N, j \in N, x \in {0, 1}^{*}) ∣ μ_{n}^{k} (x) > 0}$ . Denote $Ω_{μ} := D_{μ} \to [0, 1]$ .

Fix a set $Σ$ and a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ . A $ϕ$ -reflective system is a pair $(f, μ)$ where $μ$ is as above and ${f_{n} : supp μ_{n} \times Ω_{μ} \to [0, 1]}_{n \in Σ}$ is a collection of functions s.t. there are collections ${ψ_{n m} \in Φ}_{n, m \in Σ}$ , ${α_{n} \in (0, 1]}_{n \in Σ}$ , ${c_{n} \in R^{> 0}}_{n \in Σ}$ and probability measures ${ρ_{n} : Σ \to [0, 1]}_{n \in Σ}$ satisfying

$\forall n, m \in Σ : ψ_{n m} \geq ϕ_{n}$

$\forall n \in Σ, k \in N, q, \tilde{q} \in Ω_{μ} : E_{μ_{n}^{k}} [(f_{n} (x, q) - f_{n} (x, \tilde{q}))^{2}] \leq c_{n} (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k} \times μ_{m}^{k}} [(q_{m}^{k j} (x) - \tilde{q}_{m}^{k j} (x))^{2}]])^{α_{n}}$

Note 3

The condition on $f$ is a kind of Hoelder condition, uniform over $k$ .

Proposition 6

Fix a set $Σ$ and a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ . For any $ϕ$ -reflective system $(f, μ)$ and $n \in Σ$ , $f_{n}$ is continuous with respect to the product topology on $Ω_{μ}$ .

Definition 10

Fix a set $Σ$ and a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ .

Given $m \in N$ , $x, y_{1}, y_{2} \dots y_{m} \in {0, 1}^{*}$ , we let $e v (x, y_{1}, y_{2} \dots y_{m}) \in {0, 1}^{\leq ω}$ stand for the evaluation of program $x$ on inputs $y_{1}, y_{2} \dots y_{m}$ without a time limit (we assume that on the output tape the machine head moves right iff it produces a symbol in ${0, 1}$ and cannot be moved left). We also extend $β$ to be defined on ${0, 1}^{\leq ω}$ in the obvious way.

We define $e x_{μ} : Π_{Σ} \to Ω_{μ}$ by $e x_{μ} (π)_{n}^{k j} (x) := E_{U^{(π_{n}^{k j})_{2}}} [β (e v ((π_{n}^{k j})_{1}, x, y))]$ .

Consider a $ϕ$ -reflective system $R = (f, μ)$ and a collection of $(p o l y, r l o g)$ -predictors $\{\hat{Q}_{n} = (Q_{n}, r_{n}, σ_{n})\}_{n \in Σ}$ . Given $n \in Σ$ , $R [\hat{Q}]_{n} : supp μ_{n} \to [0, 1]$ is defined as follows

$R [\hat{Q}]_{n} (x) := E_{σ} [f_{n} (x, e x_{μ} (\hat{Q} [a]))]$

The expectation value is well-defined due to Proposition 6.
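To make the map $e x_{μ}$ from Definition 10 concrete, the following Python sketch averages a program's decoded output over all coin strings of a given length. It is only an illustration under assumptions: programs are modelled as terminating Python callables rather than machine descriptions run without a time limit, and $β$ is taken to read a binary string as a binary fraction, which this text does not restate. All names are hypothetical.

```python
from itertools import product

def beta(bits: str) -> float:
    """Read a binary string as a fraction in [0, 1), e.g. '101' -> 0.101 in binary.
    This reading of beta is an assumption made for the sketch."""
    return sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits))

def ex(program, r: int, x: str) -> float:
    """Average beta(program(x, y)) over all coin strings y of length r."""
    coins = ["".join(t) for t in product("01", repeat=r)]
    return sum(beta(program(x, y)) for y in coins) / len(coins)

def echo_first_coin(x: str, y: str) -> str:
    """Toy program: ignores x and outputs its first coin flip."""
    return y[:1]
```

For instance, `ex(echo_first_coin, 2, "0")` averages the decoded outputs over the four coin strings of length 2.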

Definition 11

Fix a set $Σ$ , ${E_{n}}_{n \in Σ}$ a collection of proto-error spaces of rank 2, a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ and $R = (f, μ)$ a $ϕ$ -reflective system. A collection ${P_{n}}_{n \in Σ}$ of $(p o l y, l o g)$ -predictors is called an $E^{*} (p o l y, l o g)$ -optimal predictor system for $R$ when for any $n \in Σ$ there is $γ \in R^{> 0}$ s.t. $P_{n}$ is an $E_{n}^{γ} (p o l y, l o g)$ -optimal predictor for $(R [P]_{n}, μ_{n})$ .

Theorem 2

Consider a finite set $Σ$ and a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ . Any $ϕ$ -reflective system has an $E_{2 (l l, ϕ)}^{*} (p o l y, l o g)$ -optimal predictor system.

Appendix A

Definition A.1

$E$ , a proto-error space of rank $r$ , is called ample when there is a polynomial $h : N^{r} \to R^{> 0}$ s.t. $\frac{1}{h} \in E$ .

Fix $E$ , a proto-error space of rank 2.

Theorem A.1

Assume $E$ is ample. Consider $(f, μ)$ a distributional estimation problem, $\hat{P}$ an $E (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ and $\{p_{k j} \in [0, 1]\}_{k, j \in N}$ , $\{q_{k j} \in [0, 1]\}_{k, j \in N}$ . Define $γ : N^{2} \to R^{> 0}$ by

$γ (k, j) := \Pr_{μ^{k} \times U^{r_{P} (k, j)} \times σ_{P}^{k j}} [p_{k j} \leq \hat{P}^{k j} \leq q_{k j}]^{- 1}$

Define

$ϕ_{k j} := E_{μ^{k} \times U^{r_{P} (k, j)} \times σ_{P}^{k j}} [f - \hat{P}^{k j} ∣ p_{k j} \leq \hat{P}^{k j} \leq q_{k j}]$

Assume that either $p_{k j}, q_{k j}$ have a number of digits logarithmically bounded in $k, j$ or $\hat{P}^{k j}$ produces outputs with a number of digits logarithmically bounded in $k, j$ . Then, $| ϕ | \in (γ E)^{\frac{1}{2}}$ .

Theorem A.2

Consider $μ$ a word ensemble and $f_{1}, f_{2} : supp μ \to [0, 1]$ s.t. $f_{1} + f_{2} \leq 1$ . Suppose $\hat{P}_{1}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f_{1}, μ)$ and $\hat{P}_{2}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f_{2}, μ)$ . Then, $\hat{P}_{1} + \hat{P}_{2}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f_{1} + f_{2}, μ)$ .

Theorem A.3

Consider $μ$ a word ensemble and $f_{1}, f_{2} : supp μ \to [0, 1]$ s.t. $f_{1} + f_{2} \leq 1$ . Suppose $\hat{P}_{1}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f_{1}, μ)$ and $\hat{P}_{2}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f_{1} + f_{2}, μ)$ . Then, $\hat{P}_{2} - \hat{P}_{1}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f_{2}, μ)$ .

Definition A.2

Given $E$ a proto-error space of rank $r$ , the associated error space is $E^{\frac{1}{\infty}} := ⋃_{ϵ > 0} E^{ϵ}$ .

Theorem A.4

Consider $(f_{1}, μ_{1})$ , $(f_{2}, μ_{2})$ distributional estimation problems with respective $E^{*} (p o l y, r l o g)$ -optimal predictors $\hat{P}_{1}$ and $\hat{P}_{2}$ . Assume $μ_{1}$ is $E^{\frac{1}{\infty}} (l o g)$ -sampleable and $(f_{2}, μ_{2})$ is $E^{\frac{1}{\infty}} (l o g)$ -generatable. Then, $\hat{P}_{1} \times \hat{P}_{2}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f_{1} \times f_{2}, μ_{1} \times μ_{2})$ .

Theorem A.5

Consider $μ$ a word ensemble, $f : supp μ \to [0, 1]$ and $D \subseteq \{0, 1\}^{*}$ . Assume $\hat{P}_{D}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(D, μ)$ and $\hat{P}_{f ∣ D}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f, μ ∣ D)$ . Then $\hat{P}_{D} \hat{P}_{f ∣ D}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(χ_{D} f, μ)$ .

Definition A.3

We define the stabilizer of $E$ , denoted $stab E$ , to be the set of functions $γ : N^{2} \to R^{> 0}$ s.t. for any $δ \in E$ we have $γ δ \in E$ .

Theorem A.6

Fix $h$ a polynomial s.t. $2^{- h} \in E$ . Consider $μ$ a word ensemble, $f : supp μ \to [0, 1]$ and $D \subseteq \{0, 1\}^{*}$ . Assume $μ^{k} (D)^{- 1} \in stab E$ . Assume $\hat{P}_{D}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(D, μ)$ and $\hat{P}_{χ_{D} f}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(χ_{D} f, μ)$ . Define $\hat{P}_{f ∣ D}$ by

$\hat{P}_{f ∣ D}^{k j} (x) := \begin{cases} 1 & \text{if } \hat{P}_{D}^{k j} (x) = 0 \\ η (\frac{\hat{P}_{χ_{D} f}^{k j} (x)}{\hat{P}_{D}^{k j} (x)}) \text{ rounded to } h (k, j) \text{ binary places} & \text{if } \hat{P}_{D}^{k j} (x) > 0 \end{cases}$

Then, $\hat{P}_{f ∣ D}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f, μ ∣ D)$ .

Definition A.4

Consider $μ$ a word ensemble, $\hat{Q}_{1}$ , $\hat{Q}_{2}$ $(p o l y, r l o g)$ -predictors.

We say $\hat{Q}_{1}$ is $E$ -similar to $\hat{Q}_{2}$ relative to $μ$ (denoted $\hat{Q}_{1} \overset{μ}{\underset{E}{\simeq}} \hat{Q}_{2}$ ) when $E_{μ^{k} \times U^{r_{1} (k, j)} \times U^{r_{2} (k, j)} \times σ_{1}^{k j} \times σ_{2}^{k j}} [(\hat{Q}_{1}^{k j} (x) - \hat{Q}_{2}^{k j} (x))^{2}] \in E$ .

Let $Δ$ be an error space.

We say $\hat{Q}_{1}$ is $Δ$ -similar to $\hat{Q}_{2}$ relative to $μ$ (denoted $\hat{Q}_{1} \overset{μ}{\underset{Δ}{\simeq}} \hat{Q}_{2}$ ) when $E_{μ^{k} \times U^{r_{1} (k, j)} \times U^{r_{2} (k, j)} \times σ_{1}^{k j} \times σ_{2}^{k j}} [(\hat{Q}_{1}^{k j} (x) - \hat{Q}_{2}^{k j} (x))^{2}] \in Δ$ .

Theorem A.7 (uniqueness theorem)

Consider $(f, μ)$ a distributional estimation problem, $\hat{P}$ an $E (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ and $\hat{Q}$ a $(p o l y, r l o g)$ -predictor.

If $\hat{Q}$ is an $E (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ then $\hat{P} \overset{μ}{\underset{E^{\frac{1}{2}}}{\simeq}} \hat{Q}$ .

Conversely, if $\hat{P} \overset{μ}{\underset{E^{\frac{1}{\infty}}}{\simeq}} \hat{Q}$ then $\hat{Q}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ .

Definition A.5

$E$ is called stable when for any non-constant polynomial $p : N \to N$ there is $α_{p} > 0$ s.t. for any $δ \in E$ , the function $δ^{'} (k, j) := δ (p (k), j)$ is in $E^{α_{p}}$ .

Proposition A.1

$E_{2 (l l)}$ is stable.

Theorem A.8

Assume $E$ is ample and stable. Consider $(f, μ)$ , $(g, ν)$ distributional estimation problems, $\hat{ζ}$ an $E^{\frac{1}{\infty}}$ -pseudo-invertible reduction of $(f, μ)$ to $(g, ν)$ and $\hat{P}_{g}$ an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(g, ν)$ . Define $\hat{P}_{f}$ by $\hat{P}_{f}^{k j} (x) := \hat{P}_{g}^{k j} (\hat{ζ}^{k j} (x))$ . Then, $\hat{P}_{f}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ .

Theorem A.9

Assume $E$ is ample and stable. Consider $(f, μ)$ , $(g, ν)$ distributional estimation problems, $\hat{ζ}$ an $E^{\frac{1}{\infty}}$ -pseudo-invertible weak reduction of $(f, μ)$ to $(g, ν)$ and $\hat{P}_{g}$ an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(g, ν)$ . Choose $h : N^{2} \to N$ a polynomial with $\frac{1}{h} \in E$ and define $\hat{P}_{f}$ by

$\hat{P}_{f}^{k j} (x) := \frac{1}{h (k, j)} \sum_{i = 1}^{h (k, j)} \hat{P}_{g}^{k j} (\hat{ζ}^{k j} (x \dots) \dots)$

Here, the ellipses signify that each term corresponds to an independent sampling of $U^{r_{ζ} (k, j)} \times U^{r_{g} (k, j)} \times σ_{g}^{k j}$ . Then, $\hat{P}_{f}$ is an $E^{*} (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ .
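One way to see why an average over $h (k, j)$ independent samples pairs naturally with the condition $\frac{1}{h} \in E$ is that the variance of such an average is the single-sample variance divided by $h$ , hence at most $\frac{1}{4 h}$ for $[0, 1]$ -valued outputs. The Python sketch below checks the division by $h$ exactly for a Bernoulli output. The Bernoulli model and the function name are illustrative assumptions and are not part of the theorem or its proof.

```python
from fractions import Fraction
from math import comb

def mean_estimator_variance(p: Fraction, h: int) -> Fraction:
    """Exact variance of the average of h independent Bernoulli(p) samples.

    Computed directly from the binomial distribution of the number of ones,
    without assuming the closed form p * (1 - p) / h.
    """
    second_moment = sum(
        comb(h, s) * p**s * (1 - p) ** (h - s) * Fraction(s, h) ** 2
        for s in range(h + 1)
    )
    return second_moment - p**2
```

Comparing the result with `p * (1 - p) / h` for a few values of `p` and `h` confirms the scaling.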

Theorem A.10

Consider $(f, μ)$ a distributional estimation problem, $ϕ \in Φ$ and $\hat{G}$ a weak $Δ_{s q p, ϕ}^{2} (l o g)$ -generator for $(f, μ)$ . Then, $\hat{Λ} [\hat{G}]$ is an $E_{2 (l l, ϕ)} (p o l y, l o g)$ -optimal predictor for $(f, μ)$ .

Appendix B

Definition B.1

Given $n \in N$ , a function $δ : N^{2 + n} \to R^{\geq 0}$ is called $E$ -moderate when

(i) $δ$ is non-decreasing in arguments $3$ to $2 + n$ .

(ii) For any collection of polynomials ${p_{i} : N^{2} \to N}_{i < n}$ , $δ (k, j, p_{0} (k, j) \dots p_{n - 1} (k, j)) \in E$

Lemmas B.1 and B.2 below are given only for future reference (and as an aid in spelling out the proofs of other Theorems in Appendix A).

Lemma B.1

Fix $(f, μ)$ a distributional estimation problem and $\hat{P}$ a $(p o l y, r l o g)$ -predictor. Then, $\hat{P}$ is $E (p o l y, r l o g)$ -optimal iff there is an $E$ -moderate function $δ : N^{4} \to [0, 1]$ s.t. for any $k, j, s \in N$ , $Q : (\{0, 1\}^{*})^{2} \xrightarrow{alg} [0, 1]$

$E_{μ^{k} \times U^{r_{P} (k, j)} \times σ_{P}^{k j}} [(P^{k j} (x, y, z) - f (x))^{2}] \leq E_{μ^{k} \times U^{s}} [(Q (x, y) - f (x))^{2}] + δ (k, j, T_{Q}^{μ} (k, s), 2^{| Q |})$

Proof of Lemma B.1

Define

$δ (k, j, t, u) := \max_{\substack{T_{Q}^{μ} (k, s) \leq t \\ | Q | \leq \log u}} \max (E_{μ^{k} \times U^{r_{P} (k, j)} \times σ_{P}^{k j}} [(P^{k j} (x, y, z) - f (x))^{2}] - E_{μ^{k} \times U^{s}} [(Q (x, y) - f (x))^{2}], 0)$

Lemma B.2

Assume $E$ is ample. Fix $(f, μ)$ a distributional estimation problem and $\hat{P}$ a corresponding $E (p o l y, r l o g)$ -optimal predictor. Consider $\hat{Q}$ a $(p o l y, r l o g)$ -predictor, $M > 0$ , $\hat{w}$ a $Q \cap [0, M]$ -valued $(p o l y, r l o g)$ -bischeme. Assume $r_{w} \geq \max (r_{P}, r_{Q})$ and $u_{P}, u_{Q} : \{0, 1\}^{*} \to \{0, 1\}^{*}$ are s.t. $u_{P *} (σ_{w}) = σ_{P}$ and $u_{Q *} (σ_{w}) = σ_{Q}$ . Then there is $δ \in E$ s.t.

$E_{μ^{k} \times U^{r_{w} (k, j)} \times σ_{w}^{k j}} [w^{k j} (x, y, z) (P^{k j} (x, y_{\leq r_{P} (k, j)}, u_{P} (z)) - f (x))^{2}] \leq E_{μ^{k} \times U^{r_{w} (k, j)} \times σ_{w}^{k j}} [w^{k j} (x, y, z) (Q^{k j} (x, y_{\leq r_{Q} (k, j)}, u_{Q} (z)) - f (x))^{2}] + δ (k, j)$

Proof of Lemma B.2

Suppose $h : N^{2} \to N$ is a polynomial s.t. $\frac{1}{h} \in E$ . Given $t \in [0, M]$ , define $α^{k j} (t)$ to be $t$ rounded within error $h (k, j)^{- 1}$ . Thus, the number of digits in $α^{k j} (t)$ is logarithmic in $k$ and $j$ . Consider $\hat{Q}_{t} := (Q_{t}, r_{w}, σ_{u})$ the $(p o l y, r l o g)$ -predictor defined by

$σ_{u} := (1 \times u_{P} \times u_{Q})_{*} σ_{w}$

$Q_{t}^{k j} (x, y, (a, b, c)) := \begin{cases} Q^{k j} (x, y_{\leq r_{Q} (k, j)}, c) & \text{if } w^{k j} (x, y, a) \geq α^{k j} (t) \\ P^{k j} (x, y_{\leq r_{P} (k, j)}, b) & \text{if } w^{k j} (x, y, a) < α^{k j} (t) \end{cases}$

$\hat{Q}_{t}$ satisfies bounds on runtime and advice size uniform in $t$ . Therefore, Lemma B.1 implies that there is $δ \in E$ s.t.

$E [(\hat{P}^{k j} (x) - f (x))^{2}] \leq E [(\hat{Q}_{t}^{k j} (x) - f (x))^{2}] + δ (k, j)$

$E [(\hat{P}^{k j} (x) - f (x))^{2} - (\hat{Q}_{t}^{k j} (x) - f (x))^{2}] \leq δ (k, j)$

$E [θ (\hat{w}^{k j} (x) - α^{k j} (t)) ((\hat{P}^{k j} (x) - f (x))^{2} - (\hat{Q}^{k j} (x) - f (x))^{2})] \leq δ (k, j)$

$E [\int_{0}^{M} θ (\hat{w}^{k j} (x) - α^{k j} (t)) d t ((\hat{P}^{k j} (x) - f (x))^{2} - (\hat{Q}^{k j} (x) - f (x))^{2})] \leq M δ (k, j)$

$E [\hat{w}^{k j} (x) ((\hat{P}^{k j} (x) - f (x))^{2} - (\hat{Q}^{k j} (x) - f (x))^{2})] \leq M δ (k, j) + h (k, j)^{- 1}$
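The last step of the proof above replaces the integral of the step function by the weight itself, at the cost of the rounding term $h (k, j)^{- 1}$ . The Python sketch below checks this numerically for a single weight value $w \in [0, M]$ . It is a sketch under assumptions: $θ$ is taken to be the step function with $θ (0) = 1$ , the rounding is to the nearest multiple of $1 / h$ , and the integral is approximated by a midpoint Riemann sum. The function names are hypothetical.

```python
import math

def alpha(t: float, h: int) -> float:
    """t rounded to the nearest multiple of 1/h (so the error is at most 1/(2h))."""
    return math.floor(t * h + 0.5) / h

def theta(s: float) -> int:
    """Step function: 1 when s >= 0, else 0 (convention assumed for this sketch)."""
    return 1 if s >= 0 else 0

def layer_cake(w: float, M: float, h: int, steps: int = 20_000) -> float:
    """Midpoint Riemann sum for the integral of theta(w - alpha(t)) over t in [0, M]."""
    dt = M / steps
    return sum(theta(w - alpha((i + 0.5) * dt, h)) for i in range(steps)) * dt
```

The returned value differs from `w` by no more than `1 / h` up to discretization error, which is the source of the additive $h (k, j)^{- 1}$ in the final inequality.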

Lemma B.3 (orthogonality lemma)

Consider $(f, μ)$ a distributional estimation problem and $\hat{P}$ an $E (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ . Then there are $c_{1}, c_{2} \in R$ and an $E^{\frac{1}{2}}$ -moderate function $δ : N^{4} \to [0, 1]$ s.t. for any $k, j, s \in N$ , $Q : (\{0, 1\}^{*})^{2} \xrightarrow{alg} Q$

$| E_{μ^{k} \times U^{s} \times U^{r_{P} (k, j)} \times σ_{P}^{k j}} [Q (\hat{P}^{k j} - f)] | \leq (c_{1} + c_{2} E_{μ^{k} \times U^{s}} [Q^{2}]) δ (k, j, T_{Q}^{μ} (k, s), 2^{| Q |})$

Conversely, consider $M \in Q$ and $\hat{P}$ a $Q \cap [- M, + M]$ -valued $(p o l y, r l o g)$ -bischeme. Suppose that for any $Q \cap [- M - 1, + M]$ -valued $(p o l y, l o g)$ -bischeme $(Q, s, b)$ we have $| E [Q (P - f)] | \in E$ .

Define $\tilde{P}$ to be s.t. computing $\tilde{P}^{k j}$ is equivalent to computing $η (\hat{P}^{k j})$ rounded to $h (k, j)$ digits after the binary point, where $2^{- h} \in E$ . Then, $\tilde{P}$ is an $E (p o l y, r l o g)$ -optimal predictor for $(f, μ)$ .

We omit the proofs of Lemma B.3 and Lemma B.4 below since they are closely analogous to before.

Lemma B.4

Consider $(f, μ)$ a distributional estimation problem, $\hat{P}$ , $\hat{Q}$ $(p o l y, r l o g)$ -predictors. Suppose $p : N^{2} \to N$ a polynomial, $ϕ \in Φ$ and $δ \in E_{2 (l l, ϕ)}$ are s.t.

$\forall i, k, j \in N : E [(\hat{P}^{k, p (k, j) + i} - f)^{2}] \leq E [(\hat{Q}^{k j} - f)^{2}] + δ (k, j)$

Then $\exists δ^{'} \in E_{2 (l l, ϕ)}$ s.t.

$E [(\hat{P}^{k j} - f)^{2}] \leq E [(\hat{Q}^{k j} - f)^{2}] + δ^{'} (k, j)$

Proof of Theorem A.4

Denote $\hat{P} := \hat{P}_{1} \times \hat{P}_{2}$ . We have

$\hat{P} (x_{1}, x_{2}) - (f_{1} \times f_{2}) (x_{1}, x_{2}) = (\hat{P}_{1} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2}) + \hat{P}_{1} (x_{1}) (\hat{P}_{2} (x_{2}) - f_{2} (x_{2}))$

Therefore, for any $Q \cap [- 1, + 1]$ -valued $(p o l y, l o g)$ -bischeme $\hat{Q}$

$| E [\hat{Q} (\hat{P} - f_{1} \times f_{2})] | \leq | E [\hat{Q} (x_{1}, x_{2}) (\hat{P}_{1} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2})] | + | E [\hat{Q} (x_{1}, x_{2}) \hat{P}_{1} (x_{1}) (\hat{P}_{2} (x_{2}) - f_{2} (x_{2}))] |$

By Lemma B.3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. Suppose $\hat{G}$ is an $E^{\frac{1}{\infty}} (l o g)$ -generator for $(f_{2}, μ_{2})$ . For the first term, we have

$| E [\hat{Q}^{k j} (x_{1}, x_{2}) (\hat{P}_{1}^{k j} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2})] | \leq | E [\hat{Q}^{k j} (x_{1}, \hat{G}_{1}^{k j}) (\hat{P}_{1}^{k j} (x_{1}) - f_{1} (x_{1})) \hat{G}_{2}^{k j}] | + δ_{2} (k, j)$

where $δ_{2} \in E^{\frac{1}{\infty}}$ doesn't depend on $Q$ . Applying Lemma B.3 for $\hat{P}_{1}$ , we get

$| E [\hat{Q}^{k j} (x_{1}, x_{2}) (\hat{P}_{1}^{k j} (x_{1}) - f_{1} (x_{1})) f_{2} (x_{2})] | \leq δ_{Q, 1} (k, j) + δ_{2} (k, j)$

where $δ_{Q, 1} \in E^{α_{1}}$ for some $α_{1} \in R^{> 0}$ that doesn't depend on $Q$ .

Suppose $\hat{S}$ is an $E^{\frac{1}{\infty}} (l o g)$ -sampler for $μ_{1}$ . For the second term, we have

$| E [\hat{Q}^{k j} (x_{1}, x_{2}) \hat{P}_{1} (x_{1}) (\hat{P}_{2}^{k j} (x_{2}) - f_{2} (x_{2}))] | \leq | E [\hat{Q}^{k j} (\hat{S}^{k j}, x_{2}) \hat{P}_{1} (\hat{S}^{k j}) (\hat{P}_{2}^{k j} (x_{2}) - f_{2} (x_{2}))] | + δ_{1} (k, j)$

where $δ_{1} \in E^{\frac{1}{\infty}}$ doesn't depend on $Q$ . Applying Lemma B.3 for $\hat{P}_{2}$ , we get

$| E [\hat{Q}^{k j} (x_{1}, x_{2}) \hat{P}_{1} (x_{1}) (\hat{P}_{2}^{k j} (x_{2}) - f_{2} (x_{2}))] | \leq δ_{Q, 2} (k, j) + δ_{1} (k, j)$

where $δ_{Q, 2} \in E^{α_{2}}$ for some $α_{2} \in R^{> 0}$ that doesn't depend on $Q$ . Again, we get the required bound.

Proof of Proposition A.1

Consider a non-constant polynomial $p : N \to N$ and $δ \in E_{2 (l l)}$ . Define $δ^{'} (k, j) := δ (p (k), j)$ . To get the desired condition for $δ^{'}$ and $ϕ \in Φ$ , consider any $ϕ^{'} \in Φ$ s.t. for sufficiently large $k$ we have $ϕ^{'} (p (k)) = ϕ (k)$ . For any $ϵ \in (0, 1)$ we have

$\lim_{k \to \infty} ϕ^{'} (k)^{ϵ} E_{λ_{ϕ^{'}}^{k}} [δ (k, j)] = 0$

In particular

$\lim_{k \to \infty} ϕ^{'} (p (k))^{ϵ} E_{λ_{ϕ^{'}}^{p (k)}} [δ (p (k), j)] = 0$

$\lim_{k \to \infty} ϕ (k)^{ϵ} E_{λ_{ϕ}^{k}} [δ^{'} (k, j)] = 0$

Appendix C

Proof of Proposition 1

To check condition (i), consider $δ_{1}, δ_{2} \in E$ .

If $α > 1$ , $(δ_{1}^{α} + δ_{2}^{α})^{\frac{1}{α}} \leq δ_{1} + δ_{2} \in E$ hence $(δ_{1}^{α} + δ_{2}^{α})^{\frac{1}{α}} \in E$ and $δ_{1}^{α} + δ_{2}^{α} \in E^{α}$ .

If $α \leq 1$ , $(δ_{1}^{α} + δ_{2}^{α})^{\frac{1}{α}} = 2^{\frac{1}{α}} (\frac{δ_{1}^{α} + δ_{2}^{α}}{2})^{\frac{1}{α}} \leq 2^{\frac{1}{α}} \frac{δ_{1} + δ_{2}}{2} \in E$ hence $(δ_{1}^{α} + δ_{2}^{α})^{\frac{1}{α}} \in E$ and $δ_{1}^{α} + δ_{2}^{α} \in E^{α}$ .

Conditions (ii) and (iii) are obvious.

Proof of Proposition 2

Consider $δ \in E$ . We need to show that $δ^{γ} \in E^{α}$ i.e. that $δ^{\frac{γ}{α}} \in E$ . But $\frac{γ}{α} > 1$ hence $δ^{\frac{γ}{α}} = (sup δ)^{\frac{γ}{α}} (\frac{δ}{sup δ})^{\frac{γ}{α}} \leq (sup δ)^{\frac{γ}{α}} \frac{δ}{sup δ} \in E$ .
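The proofs of Propositions 1 and 2 rest on three elementary pointwise inequalities. The Python sketch below spot-checks them numerically on sample values. It is a sanity check rather than a proof, and the function names and the floating-point tolerance are assumptions introduced for illustration.

```python
TOL = 1e-9  # slack for floating-point rounding (assumed)

def sum_power_bound_holds(d1: float, d2: float, alpha: float) -> bool:
    """Proposition 1, condition (i): bound on (d1^alpha + d2^alpha)^(1/alpha).

    For alpha > 1 the bound is d1 + d2; for alpha <= 1 it is
    2^(1/alpha) * (d1 + d2) / 2 (power mean inequality).
    """
    lhs = (d1**alpha + d2**alpha) ** (1 / alpha)
    if alpha > 1:
        rhs = d1 + d2
    else:
        rhs = 2 ** (1 / alpha) * (d1 + d2) / 2
    return lhs <= rhs * (1 + TOL) + TOL

def power_bound_holds(delta: float, sup_delta: float, ratio: float) -> bool:
    """Proposition 2: delta^ratio <= sup^ratio * (delta / sup) for ratio > 1, 0 <= delta <= sup."""
    return delta**ratio <= sup_delta**ratio * (delta / sup_delta) * (1 + TOL) + TOL
```

Here `ratio` plays the role of $\frac{γ}{α} > 1$ and `sup_delta` the role of $sup δ$ .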

Proof of Proposition 3

Follows immediately from Lemma B.1.

Proof of Proposition 4

Suppose $\hat{P}$ is an $E (p o l y, r l o g)$ -optimal predictor. Set $a^{k j} := \arg\min_{z \in supp σ_{P}^{k j}} E_{μ^{k} \times U^{r_{P} (k, j)}} [(P^{k j} (x, y, z) - f (x))^{2}]$ . Since the minimum of the error over $supp σ_{P}^{k j}$ is no larger than its $σ_{P}^{k j}$ -average, replacing $σ_{P}$ by $a$ we get the desired $E (p o l y, l o g)$ -optimal predictor.
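The derandomization step in the proof of Proposition 4 can be sketched in Python as follows: given the expected squared error attached to each advice string, picking the best string in the support never does worse than sampling the advice. The dictionaries `sigma` (advice distribution) and `loss` (error per advice string) are hypothetical stand-ins for $σ_{P}^{k j}$ and the expectation inside the arg min.

```python
def expected_loss(sigma: dict, loss: dict) -> float:
    """Average error when the advice string z is sampled from sigma."""
    return sum(prob * loss[z] for z, prob in sigma.items())

def best_advice(sigma: dict, loss: dict) -> str:
    """Advice string in the support of sigma with the smallest error."""
    support = [z for z, prob in sigma.items() if prob > 0]
    return min(support, key=lambda z: loss[z])
```

For any `sigma` and `loss`, `loss[best_advice(sigma, loss)] <= expected_loss(sigma, loss)`, because a minimum over the support is at most the weighted average.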

Proposition C.1

For any $ψ \in Φ$ , $min (\frac{log log (k + 3)}{log log (j + 3)}, 1)^{ψ (k)} \in E_{2 (l l)}$

In particular, this implies $\frac{1}{j + 1} \in E_{2 (l l)}$ so $E_{2 (l l)}$ is ample.

Proof of Proposition C.1

Denote $δ_{ψ} (k, j) := min (\frac{log log (k + 3)}{log log (j + 3)}, 1)^{ψ (k)}$ . Consider $ϕ \in Φ$ , $ϵ \in (0, 1)$ . We have

$E_{λ_{ϕ}^{k}} [δ_{ψ} (k, j)] = P r_{λ_{ϕ}^{k}} [j < t_{ϕ^{\frac{ϵ}{2}}} (k)] E_{λ_{ϕ}^{k}} [δ_{ψ} (k, j) ∣ j < t_{ϕ^{\frac{ϵ}{2}}} (k)] + P r_{λ_{ϕ}^{k}} [j \geq t_{ϕ^{\frac{ϵ}{2}}} (k)] E_{λ_{ϕ}^{k}} [δ_{ψ} (k, j) ∣ j \geq t_{ϕ^{\frac{ϵ}{2}}} (k)]$

$\limsup_{k \to \infty} ϕ (k)^{1 - ϵ} E_{λ_{ϕ}^{k}} [δ_{ψ} (k, j)] \leq \limsup_{k \to \infty} ϕ (k)^{1 - ϵ} (ϕ (k)^{\frac{ϵ}{2} - 1} (\sup δ_{ψ}) + \sup_{j \geq t_{ϕ^{\frac{ϵ}{2}}} (k)} δ_{ψ} (k, j))$

$\limsup_{k \to \infty} ϕ (k)^{1 - ϵ} E_{λ_{ϕ}^{k}} [δ_{ψ} (k, j)] \leq \limsup_{k \to \infty} ϕ (k)^{1 - ϵ} (ϕ (k)^{\frac{ϵ}{2} - 1} + ϕ (k)^{- \frac{ϵ}{2} ψ (k)})$

$\limsup_{k \to \infty} ϕ (k)^{1 - ϵ} E_{λ_{ϕ}^{k}} [δ_{ψ} (k, j)] \leq \limsup_{k \to \infty} (ϕ (k)^{- \frac{ϵ}{2}} + ϕ (k)^{1 - ϵ - \frac{ϵ}{2} ψ (k)})$

$\lim_{k \to \infty} ϕ (k)^{1 - ϵ} E_{λ_{ϕ}^{k}} [δ_{ψ} (k, j)] = 0$

Proof of Proposition 5

The only not entirely obvious part is condition (iii) for $E_{2 (l l)}$ which follows from Proposition C.1 (since $2^{- j} \in E_{2 (l l)}$ ).

Construction C.1

Fix $R = (Σ, f, μ)$ a reflective system.

For any $j \in N$ , denote $W^{j} \subseteq \{0, 1\}^{*}$ the set of the first $j$ words in lexicographic order. Denote $Δ^{j}$ the space of probability distributions on $W^{j}$ . Denote $Δ_{Σ} := \prod_{\substack{k, j \in N \\ n \in Σ}} Δ^{j}$ . Denote $W_{Σ} := \{(n \in Σ, k \in N, j \in N, x \in W^{j})\}$ . Let $V_{Σ}$ be the locally convex topological vector space $W_{Σ} \to R$ , where the topology is the product topology.

$Δ_{Σ}$ is a compact (by Tychonoff's theorem) convex subset of $V_{Σ}$ . Each $ϑ \in Δ_{Σ}$ can be regarded as a probability measure on $A_{Σ}$ .

Given $a \in A_{Σ}$ , we define $\bar{a} \in Π_{Σ}$ by $\bar{a}_{n}^{k j} := (a_{n}^{k j}, j)$ .

Given $k, j \in N$ and $n \in Σ$ , define $ϵ_{R, n}^{k j} : Δ_{Σ} \times Δ^{j} \to [0, 1]$ as follows

$ϵ_{R, n}^{k j} (ϑ, ζ) := E_{ζ \times μ_{n}^{k} \times U^{j} \times ϑ} [(e v^{j} (x, y, z) - f_{n} (y, \overline{Υ [w]}))^{2}]$

Define $κ_{R} \subseteq Δ_{Σ} \times Δ_{Σ}$ by

$κ_{R} = \{(ϑ_{1}, ϑ_{2}) ∣ \forall k, j \in N, n \in Σ, ζ \in Δ^{j} : ϵ_{R, n}^{k j} (ϑ_{1}, (ϑ_{2})_{n}^{k j}) \leq ϵ_{R, n}^{k j} (ϑ_{1}, ζ)\}$

Proposition C.2

$κ_{R}$ is a Kakutani map.

Proof of Proposition C.2

$ϵ_{R, n}^{k j}$ is continuous in the 2nd argument and $Δ^{j}$ is compact for any $j \in N$ . Therefore, for any $ϑ \in Δ_{Σ}$ , $κ_{R} (ϑ) \subseteq Δ_{Σ}$ is non-empty by the extreme value theorem. It is compact by Tychonoff's theorem and it is obviously convex.

Given $Y \subseteq supp μ_{n}^{k}$ finite, define $ϵ_{R, Y, n}^{k j} : Δ_{Σ} \times Δ^{j} \to [0, 1]$ by

$ϵ_{R, Y, n}^{k j} (ϑ, ζ) := \sum_{y \in Y} μ_{n}^{k} (y) E_{ζ \times U^{j} \times ϑ} [(e v^{j} (x, y, z) - f_{n} (y, \overline{Υ [w]}))^{2}]$

$ϵ_{R, Y, n}^{k j}$ is continuous since $f$ is continuous. Regarding the $Y$ s as a net by set inclusion, $ϵ_{R, Y, n}^{k j}$ uniformly converges to $ϵ_{R, n}^{k j}$ , therefore $ϵ_{R, n}^{k j}$ is continuous.

Given $n \in Σ$ , $k, j \in N$ , define $κ_{R, n}^{k j} \subseteq Δ_{Σ} \times Δ_{Σ}$ by

$κ_{R, n}^{k j} = \{(ϑ_{1}, ϑ_{2}) ∣ \forall ζ \in Δ^{j} : ϵ_{R, n}^{k j} (ϑ_{1}, (ϑ_{2})_{n}^{k j}) \leq ϵ_{R, n}^{k j} (ϑ_{1}, ζ)\}$

$κ_{R, n}^{k j}$ is closed since $ϵ_{R, n}^{k j}$ is continuous. Therefore $κ_{R} = \bigcap_{\substack{k, j \in N \\ n \in Σ}} κ_{R, n}^{k j}$ is also closed.
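A toy analogue may help build intuition for $κ_{R}$ as a best-response correspondence and for why random advice is needed for a fixed point. Everything in the Python sketch below is a hypothetical simplification: there is a single predictor with two candidate "programs" (always output 0, always output 1), the advice distribution is summarized by the probability `vartheta` of choosing the second one, and the target is the opposite of the sampled choice $w$ , i.e. $1 - w$ .

```python
from fractions import Fraction

def eps(vartheta: Fraction, zeta: Fraction) -> Fraction:
    """Expected squared error of a response zeta against the system state vartheta.

    The sampled choice w is 1 with probability vartheta and the target is 1 - w.
    Outputting 1 costs w^2 and outputting 0 costs (1 - w)^2, with w in {0, 1}.
    """
    return zeta * vartheta + (1 - zeta) * (1 - vartheta)

def is_fixed_point(vartheta: Fraction) -> bool:
    """vartheta lies in kappa(vartheta) iff no response beats vartheta against itself."""
    # eps is linear in zeta, so it suffices to compare with the endpoints 0 and 1.
    best = min(eps(vartheta, Fraction(0)), eps(vartheta, Fraction(1)))
    return eps(vartheta, vartheta) <= best

def fixed_points(grid: int = 100):
    """All grid points i / grid in [0, 1] that are fixed points of the toy correspondence."""
    candidates = (Fraction(i, grid) for i in range(grid + 1))
    return [v for v in candidates if is_fixed_point(v)]
```

In this toy model neither deterministic choice is a fixed point, while the uniform mixture is, which mirrors the role of random advice in Theorem 1.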

Proof of Theorem 1

Using Proposition C.2, we apply the Kakutani-Glicksberg-Fan theorem to get $(σ, σ) \in κ_{R}$ . Define $\hat{P}_{n} := (Υ, j, σ)$ .

Consider $n \in Σ$ , $\hat{Q}$ a $(p o l y, l o g)$ -predictor. Choose $p : N^{2} \to N$ a polynomial and $\{q^{k j} \in W^{p (k, j)}\}_{k, j \in N}$ s.t. $p \geq r_{Q}$ and

$\forall i, k, j \in N, x \in supp μ_{n}^{k}, y \in \{0, 1\}^{p (k, j) + i} : e v^{p (k, j) + i} (q^{k j}, x, y) = Q^{k j} (x, y_{< r_{Q} (k, j)}, a_{Q}^{k j})$

By definition of $κ_{R}$ , we have

$\forall i, k, j \in N : ϵ_{R, n}^{k, p (k, j) + i} (σ, σ_{n}^{k, p (k, j) + i}) \leq ϵ_{R, n}^{k, p (k, j) + i} (σ, q^{k j})$

Here we implicitly used the natural injection $W^{m} \to Δ^{m}$ .

$\forall i, k, j \in N : E_{σ_{n}^{k, p (k, j) + i} \times μ_{n}^{k} \times U^{p (k, j) + i} \times σ} [(e v^{p (k, j) + i} (x, y, z) - f_{n} (y, \overline{Υ [w]}))^{2}] \leq E_{μ_{n}^{k} \times U^{p (k, j) + i} \times σ} [(e v^{p (k, j) + i} (q^{k j}, y, z) - f_{n} (y, \overline{Υ [w]}))^{2}]$

$\forall i, k, j \in N : E_{σ_{n}^{k, p (k, j) + i} \times μ_{n}^{k} \times U^{p (k, j) + i} \times σ} [(P_{n}^{k, p (k, j) + i} (y, z, x) - f_{n} (y, \hat{P}_{n} [w]))^{2}] \leq E_{μ_{n}^{k} \times U^{r_{Q} (k, j)} \times σ} [(\hat{Q}^{k j} (y, z) - f_{n} (y, \hat{P}_{n} [w]))^{2}]$

Denoting $\hat{σ}_{n}^{k, p (k, j) + i} := σ_{n}^{k, p (k, j) + i} \times μ_{n}^{k} \times U^{p (k, j) + i}$ , the left hand side satisfies

$E_{\hat{σ}_{n}^{k, p (k, j) + i} \times σ} [(P_{n}^{k, p (k, j) + i} (y, z, x) - f_{n} (y, \hat{P}_{n} [w]))^{2}] = E_{\hat{σ}_{n}^{k, p (k, j) + i}} [(P_{n}^{k, p (k, j) + i} (y, z, x) - E_{σ} [f_{n} (y, \hat{P}_{n} [w])])^{2}] + \operatorname{Var}_{σ} [f_{n} (y, \hat{P}_{n} [w])]$

Similarly, for the right hand side we have

$E_{μ_{n}^{k} \times U^{r_{Q} (k, j)} \times σ} [(\hat{Q}^{k j} (y, z) - f_{n} (y, \hat{P}_{n} [w]))^{2}] = E_{μ_{n}^{k} \times U^{r_{Q} (k, j)}} [(\hat{Q}^{k j} (y, z) - E_{σ} [f_{n} (y, \hat{P}_{n} [w])])^{2}] + \operatorname{Var}_{σ} [f_{n} (y, \hat{P}_{n} [w])]$

Combining the two together, we get

$E_{\hat{σ}_{n}^{k, p (k, j) + i}} [(P_{n}^{k, p (k, j) + i} (y, z, x) - E_{σ} [f_{n} (y, \hat{P}_{n} [w])])^{2}] \leq E_{μ_{n}^{k} \times U^{r_{Q} (k, j)}} [(\hat{Q}^{k j} (y, z) - E_{σ} [f_{n} (y, \hat{P}_{n} [w])])^{2}]$

$E_{\hat{σ}_{n}^{k, p (k, j) + i}} [(\hat{P}_{n}^{k, p (k, j) + i} (y) - R [\hat{P}]_{n} (y))^{2}] \leq E_{μ_{n}^{k} \times U^{r_{Q} (k, j)}} [(\hat{Q}^{k j} (y) - R [\hat{P}]_{n} (y))^{2}]$

Applying Lemma B.4 we conclude that there is $δ \in E_{2 (l l)}$ s.t.

$E_{\hat{σ}_{n}^{k j}} [(\hat{P}_{n}^{k j} (y) - R [\hat{P}]_{n} (y))^{2}] \leq E_{μ_{n}^{k} \times U^{r_{Q} (k, j)}} [(\hat{Q}^{k j} (y) - R [\hat{P}]_{n} (y))^{2}] + δ (k, j)$

Proof of Proposition 6

We need to show that for every $n \in Σ$ , $x \in supp μ_{n}$ , $q \in Ω_{μ}$ and $ϵ > 0$ , there is a finite set $X \subseteq D_{μ}$ and $δ > 0$ s.t. for every $\tilde{q} \in Ω_{μ}$ with $\forall (m, k, j, y) \in X : | q_{m}^{k j} (y) - \tilde{q}_{m}^{k j} (y) | < δ$ we have $| f_{n} (x, q) - f_{n} (x, \tilde{q}) | < ϵ$ .

Choose $k \in N$ s.t. $x \in supp μ_{n}^{k}$ , $Z \subseteq Σ$ finite and $Y \subseteq {0, 1}^{*}$ finite. Define $X := {(m, k, j, y) ∣ m \in Z, j \in N, y \in Y, j < t_{ψ_{n m}} (k)}$ . We get

$E_{μ_{n}^{k}} [(f_{n} (y, q) - f_{n} (y, \tilde{q}))^{2}] \leq c_{n} (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k} \times μ_{m}^{k}} [(q_{m}^{k j} (y) - \tilde{q}_{m}^{k j} (y))^{2}]])^{α_{n}}$

$μ_{n}^{k} (x) (f_{n} (x, q) - f_{n} (x, \tilde{q}))^{2} \leq c_{n} (ρ_{n} (Σ ∖ Z) + δ^{2})^{α_{n}}$

By choosing $Z$ with $ρ_{n} (Σ ∖ Z)$ sufficiently small and $δ$ sufficiently small we get the desired condition.
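
To make this last step explicit (a sketch; it uses only the bound above together with $μ_{n}^{k} (x) > 0$ , which holds because $x \in supp μ_{n}^{k}$ , and assumes $c_{n} > 0$ and $α_{n} > 0$ ): dividing by $μ_{n}^{k} (x)$ and taking square roots gives

$| f_{n} (x, q) - f_{n} (x, \tilde{q}) | \leq (\frac{c_{n}}{μ_{n}^{k} (x)})^{\frac{1}{2}} (ρ_{n} (Σ ∖ Z) + δ^{2})^{\frac{α_{n}}{2}}$

so it suffices to choose $Z$ and $δ$ with

$ρ_{n} (Σ ∖ Z) + δ^{2} < (\frac{μ_{n}^{k} (x) ϵ^{2}}{c_{n}})^{\frac{1}{α_{n}}}$

for instance by making each of the two terms on the left smaller than half of the right-hand side.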

Proposition C.3

Fix a finite set $Σ$ and a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ . Consider $R = (f, μ)$ a $ϕ$ -reflective system and two collections of $(\mathrm{poly}, \mathrm{rlog})$ -predictors ${\hat{Q}_{1 n}}_{n \in Σ}$ and ${\hat{Q}_{2 n}}_{n \in Σ}$ . Assume that for some $γ \in (0, 1]$ , $\forall n \in Σ : \hat{Q}_{1 n} \underset{E_{2 (ll)}^{γ}}{\overset{μ_{n}}{\simeq}} \hat{Q}_{2 n}$ . Then

$\forall n \in Σ : E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \in E_{1 (ϕ_{n})}^{\frac{1}{\infty}}$

Proof of Proposition C.3

$E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] = E_{μ_{n}^{k}} [(E_{σ_{1}} [f_{n} (x, \mathrm{ex}_{μ} (\hat{Q}_{1} [a]))] - E_{σ_{2}} [f_{n} (x, \mathrm{ex}_{μ} (\hat{Q}_{2} [a]))])^{2}]$

$E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \leq E_{μ_{n}^{k} \times σ_{1} \times σ_{2}} [(f_{n} (x, \mathrm{ex}_{μ} (\hat{Q}_{1} [a_{1}])) - f_{n} (x, \mathrm{ex}_{μ} (\hat{Q}_{2} [a_{2}])))^{2}]$

$E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \leq E_{σ_{1} \times σ_{2}} [c_{n} (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k} \times μ_{m}^{k}} [(E_{U^{r_{1} (k, j)}} [Q_{1 m}^{k j} (x, y, a_{1})] - E_{U^{r_{2} (k, j)}} [Q_{2 m}^{k j} (x, y, a_{2})])^{2}]])^{α_{n}}]$

$E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \leq E_{σ_{1} \times σ_{2}} [c_{n} (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [E_{μ_{m}^{k} \times U^{r_{1} (k, j)} \times U^{r_{2} (k, j)}} [(Q_{1 m}^{k j} (x, y, a_{1}) - Q_{2 m}^{k j} (x, y, a_{2}))^{2}]]])^{α_{n}}]$

$E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \leq c_{n} (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [E_{μ_{m}^{k} \times U^{r_{1} (k, j)} \times U^{r_{2} (k, j)} \times σ_{1} \times σ_{2}} [(Q_{1 m}^{k j} (x, y, a_{1}) - Q_{2 m}^{k j} (x, y, a_{2}))^{2}]]])^{α_{n}}$
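
The second and fourth lines of this chain both rest on Jensen's inequality for the convex function $t \mapsto t^{2}$ . As a sketch, for bounded $A (a_{1})$ and $B (a_{2})$ (names used here only for illustration) with $a_{1}$ and $a_{2}$ sampled independently,

$(E [A] - E [B])^{2} = (E [A - B])^{2} \leq E [(A - B)^{2}]$

With $a_{1} \sim σ_{1}$ and $a_{2} \sim σ_{2}$ this moves the two expectations over $a_{1}, a_{2}$ outside the square; applied to the distributions $U^{r_{1} (k, j)}$ and $U^{r_{2} (k, j)}$ it does the same for the expectations inside the two predictors.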

Using the similarity of $\hat{Q}_{1}$ and $\hat{Q}_{2}$ there are ${δ_{n} : N^{2} \to [0, 1] \in E_{2 (ll)}}_{n \in Σ}$ s.t.

$E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \leq c_{n} (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [δ_{m} (k, j)^{γ}]])^{α_{n}}$

$E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \leq c_{n} (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [δ_{m} (k, j)]])^{α_{n} γ}$

Since $δ_{m} \in E_{2 (ll)}$ , we have $E_{λ_{ψ_{n m}}^{k}} [δ_{m} (k, j)] \in E_{1 (ψ_{n m})} \subseteq E_{1 (ϕ_{n})}$ . This implies $E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [δ_{m} (k, j)]] \in E_{1 (ϕ_{n})}$ and $E_{μ_{n}^{k}} [(R [\hat{Q}_{1}]_{n} (x) - R [\hat{Q}_{2}]_{n} (x))^{2}] \in E_{1 (ϕ_{n})}^{α_{n} γ}$ .
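
The manipulations of exponents in this proof use concavity rather than convexity. A sketch, assuming $α_{n}, γ \in (0, 1]$ (for $γ$ this is the hypothesis of the proposition; for $α_{n}$ it is an assumption about the definition of a $ϕ$ -reflective system, which is not restated here): for a random variable $T \geq 0$ and $β \in (0, 1]$ (both names are illustrative), Jensen's inequality for the concave function $t \mapsto t^{β}$ gives

$E [T^{β}] \leq (E [T])^{β}$

Taking $β = α_{n}$ moves $E_{σ_{1} \times σ_{2}}$ inside the power $α_{n}$ . Taking $β = γ$ with $T = δ_{m} (k, j)$ , and using that $t \mapsto t^{α_{n}}$ is non-decreasing, gives

$(E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [δ_{m} (k, j)^{γ}]])^{α_{n}} \leq ((E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [δ_{m} (k, j)]])^{γ})^{α_{n}} = (E_{ρ_{n}} [E_{λ_{ψ_{n m}}^{k}} [δ_{m} (k, j)]])^{α_{n} γ}$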

Definition C.1

Fix a set $Σ$ and a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ . Given $R = (f, μ)$ a $ϕ$ -reflective system, the associated reflective system $\mathrm{ex}^{- 1} R = (Σ, \mathrm{ex}^{- 1} f, μ)$ is defined by

$(\mathrm{ex}^{- 1} f_{n}) (x, π) := f_{n} (x, \mathrm{ex}_{μ} (π))$

$\mathrm{ex}^{- 1} f_{n}$ is continuous since $f_{n}$ is continuous by Proposition 6 and $\mathrm{ex}_{μ}$ is continuous.
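
As a sketch of why $R$ and $\mathrm{ex}^{- 1} R$ induce the same prediction problems, assume, as in the proofs above, that a reflective system $S = (Σ, g, μ)$ induces $S [\hat{Q}]_{n} (x) = E_{σ} [g_{n} (x, \hat{Q} [a])]$ while a $ϕ$ -reflective system induces $R [\hat{Q}]_{n} (x) = E_{σ} [f_{n} (x, \mathrm{ex}_{μ} (\hat{Q} [a]))]$ (the names $S$ and $g$ are used here only for illustration). Unfolding the definition then gives

$(\mathrm{ex}^{- 1} R) [\hat{Q}]_{n} (x) = E_{σ} [(\mathrm{ex}^{- 1} f_{n}) (x, \hat{Q} [a])] = E_{σ} [f_{n} (x, \mathrm{ex}_{μ} (\hat{Q} [a]))] = R [\hat{Q}]_{n} (x)$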

Proof of Theorem 2

Fix a finite set $Σ$ and a collection ${ϕ_{n} \in Φ}_{n \in Σ}$ . Consider $R$ a $ϕ$ -reflective system. By Theorem 1, there is $\hat{R}$ an $E_{2 (ll)} (\mathrm{poly}, \mathrm{rlog})$ -optimal predictor system for $\mathrm{ex}^{- 1} R$ . For each $n \in Σ$ we can use Proposition 4 to choose $\hat{P}_{n}$ , an $E_{2 (ll)} (\mathrm{poly}, \mathrm{log})$ -optimal predictor for $(\mathrm{ex}^{- 1} R) [\hat{R}]_{n} = R [\hat{R}]_{n}$ . By Theorem A.7, we have $\hat{P}_{n} \underset{E_{2 (ll)}^{\frac{1}{2}}}{\overset{μ_{n}}{\simeq}} \hat{R}_{n}$ . By Proposition C.3 this implies $E_{μ_{n}^{k}} [(R [\hat{P}]_{n} (x) - R [\hat{R}]_{n} (x))^{2}] \in E_{1 (ϕ_{n})}^{\frac{1}{\infty}}$ . This means $\hat{P}_{n}$ is an $E_{2 (ll, ϕ_{n})}^{*} (\mathrm{poly}, \mathrm{log})$ -optimal predictor for $R [\hat{P}]_{n}$ and $\hat{P}$ is an $E_{2 (ll, ϕ)}^{*} (\mathrm{poly}, \mathrm{log})$ -optimal predictor system for $R$ .
