In this post I define the concept of quasi-optimal predictors which is a weaker variant on the theme of optimal predictors. I explain the properties of quasi-optimal predictors that I currently understand (which are completely parallel to the properties of optimal predictors) and give an example where there is a quasi-optimal predictor but there is no optimal predictor.

All proofs are given in the appendix and are mostly analogous to proofs of corresponding theorems for optimal predictors.

Definition 1

Given a distributional decision problem, a quasi-optimal predictor for is a family of polynomial size Boolean circuits s.t. for any family of polynomial size Boolean circuits we have

where .

Theorem 1

Consider a distributional decision problem and a quasi-optimal predictor for . Suppose , are s.t.

Then:

Theorem 2

Consider a word ensemble and , disjoint languages. Suppose is a quasi-optimal predictor for and is a quasi-optimal predictor for . Then, is a quasi-optimal predictor for .

Theorem 3

Consider a word ensemble and , disjoint languages. Suppose is a quasi-optimal predictor for and is a quasi-optimal predictor for . Then, is a quasi-optimal predictor for .

Theorem 4

Consider , distributional decision problems with respective quasi-optimal predictors and . Define as the family of circuits computing . Then, is a quasi-optimal predictor for .

Theorem 5

Consider and a word ensemble. Assume is a quasi-optimal predictor for and is a quasi-optimal predictor for . Then is a quasi-optimal predictor for

Theorem 6

Consider and a word ensemble. Assume . Assume is a quasi-optimal predictor for and is a quasi-optimal predictor for . Define as the circuit family computing

Then, is a quasi-optimal predictor for .

Definition 2

Consider a word ensemble and two circuit families. We say is quasisimilar to relative to (denoted ) when .

Theorem 7

Consider a distributional decision problem, a quasi-optimal predictor for and a polynomial size family. Then, is a quasi-optimal predictor for if and only if .

Definition 3

Consider , distributional decision problems, a polynomial size family of circuits. is called a (non-uniform) strong pseudo-invertible reduction of to when there is a polynomial s.t. the following conditions hold:

(i)

(ii) There is s.t.

(iii) There is a polynomial and a family of polynomial size circuits s.t.

(iv) There are polynomial size circuits s.t.

Theorem 8

Consider , distributional decision problems, a strong pseudo-invertible reduction of to and a quasi-optimal predictor for . Define as the family of circuits computing . Then, is a quasi-optimal predictor for .

Theorem 9

Consider a one-to-one non-uniformly hard one-way function. Define . Then, is a quasi-optimal predictor for .

Appendix

Lemma 1

Consider a distributional decision problem and a family of polynomial size. Then, is a quasi-optimal predictor if and only if there is a function s.t.

(i) is non-decreasing in the second argument.

(ii) For any polynomial :

In the following, we will call functions satisfying conditions (i) and (ii) quasinegligible.

(iii) for any we have

Proof of Lemma 1

Define

Lemma 2

Consider a distributional decision problem and a corresponding quasi-optimal predictor. Then, there is a function s.t.

(i) is non-decreasing in the second and third arguments.

(ii) For all polynomials :

(iii) for all , and we have

Proof of Lemma 2

Given , denote

Consider circuit computing the following function:

There is a polynomial s.t. . By Lemma 1,

for quasinegligible.

Integrating the inequality with respect to from to , we get

Proof of Theorem 1

Define

Assume to the contrary that there is and an infinite set s.t.

Define as the circuits computing

is bounded by a polynomial since produces binary fractions of polynomial size therefore it is possible to compare them to the fixed numbers using a polynomial size circuit even if the latter have infinite binary expansions.

We have

Define to be truncated to the first significant binary digit. Define as the circuits computing

By the assumption, has binary notation of bounded size, therefore is bounded by a polynomial.

Applying Lemma 2 we get

for vanishing at infinity.

Obviously , therefore

The expression on the left hand side is a quadratic polynomial in which attains its maximum at and has roots at and . is between and , but not closer to than . Therefore, the inequality is preserved if we replace by .

Substituting the equation for we get

Thus vanishes at infinity on , which is a contradiction.

Lemma 3

Consider a distributional decision problem. If is a quasi-optimal predictor for then there are and a quasinegligible function s.t. for any we have

Conversely, suppose and is a polynomial size family for which there is a quasinegligible function s.t. for any we have

Define to be s.t. computing is equivalent to computing rounded to digits after the binary point. Then, is a quasi-optimal predictor.

Proof of Lemma 3

Assume is an optimal predictor. Consider and where and . The function can be approximated by a circuit of size for some fixed polynomial , within rounding error s.t. . By Lemma 1,

where is quasinegligible. is bounded by a negligible function and therefore can be ignored by redefining . As in the proof of Theorem 1, can be dropped.

The expression on the left hand side is a quadratic polynomial in . Explicitly:

Moving to the right hand side and dividing both sides by we get

Take where is the rounding error. We get

Conversely, assume that for any

Consider . We have

can be computed by a circuit of size polynomial in and . Applying the assumption we get

where is quasinegligible. Noting that and we get

Observing that is bounded by a negligible function, we get the desired result.

Proof of Theorem 2

Consider . We have

Using Lemma 3:

Therefore

Using Lemma 3 again we get the desired result.

Proof of Theorem 4

We have

Therefore, for any

By Lemma 3, it is sufficient to show an appropriate bound for each of the terms on the right hand side. For the first term, we have

For any given , can be computed by a circuit with input of size polynomial in and . Applying Lemma 3 to , we get

where is a polynomial and is quasinegligible. Since is bounded by a polynomial in for , we get the bound we need.

For the second term, we have

For any given , can be computed by a circuit with input of size polynomial in , and . Applying Lemma 3 to , we get

Again, we got the required bound.

Proof of Theorem 7

Assume is a quasi-optimal predictor. Applying Lemma 3 to predictor and circuits computing , we get

for some vanishing at infinity. Applying Lemma 3 to predictor and circuits computing , we get

for some vanishing at infinity. We have

Conversely, assume . Consider some . We have

for some vanishing at infinity, since .

for some quasinegligible , using Lemma 3. Combining both inequalities we get

Using Lemma 3 again we conclude is a quasi-optimal predictor.

Lemma 4

Consider and a word ensemble. Assume is a quasi-optimal predictor for and is a quasi-optimal predictor for . Define

Then, .

Proof of Lemma 4

By Theorem 3 and Lemma 3 there is a quasinegligible function such that for any we have

Take to be the circuit computing . Its size is polynomial in therefore

where vanishes at infinity.

Since both terms inside the absolute value are non-negative we get the desired result.

Proof of Theorem 6

When we have

Define to be the circuit computing . Since , Lemma 4 implies that . This implies and by Theorem 7 is a quasi-optimal predictor for .

We have (whether or ) and therefore

Consider .

By Lemma 3 it is sufficient to prove appropriate bounds on and . Both bounds follow from Lemma 3 using the facts and are quasi-optimal predictors and is bounded by a polynomial.

Proof of Theorem 8

Consider , . Define to be the circuit computing . Applying Lemma 2, treating as a constant and using as the weight circuit, we get

where is quasinegligible. We used condition (ii) to get a constant bound on and condition (iv) to get a polynomial bound on .

We take the expectation value of both sides with respect to the uniform measure over :

The left hand side can be rewritten as follows

Grouping the sum by , we get

The first term on the right hand side can be rewritten as

Grouping the sum by we get:

Condition (iii) tells us that is only non-vanishing when and that in this case it equals . Therefore

Putting everything together, we get

Proof of Theorem 9

Assume to the contrary that is not quasi-optimal. Then there is an infinite set , a polynomial size family of circuits and s.t.

Define the functions by . We have

Substituting into the inequality above

For every given , is either non-negative for all or non-positive for . Hence we can move the absolute value inside the integral:

This implies that we can choose s.t.

Using the fact that the graph of the square root lies below its tangent at any point, this leads to

Define as the circuits computing . The definitions of and imply that is bounded by a polynomial. The inequality above and the definitions of and imply

But this contradicts the assumption on .

Note that this argument doesn't show is optimal since while the averaging over preserves the property of vanishing at infinity, it doesn't preserve the property of negligibility. Moreover, it is possible to show that no optimal predictor for exists.

New Comment
2 comments, sorted by Click to highlight new comments since:

So, to be clear, the difference between an optimal predictor and a quasi-optimal predictor is as follows: is the amount by which some other polynomial-size circuit family is able to beat the current predictor. An optimal predictor cannot be beaten by any more than a such that for any , while a quasi-optimal predictor can only assert it cannot be beaten by any more than a such that . Yes?

Exactly. It's such a simple tweak that it's embarrassing I haven't noticed it in the first place. It's just that in average complexity theory negligible errors seem much more popular. The price to pay is that some theorems require stronger assumptions namely something that had to be bounded by a polynomial now has to be bounded by a constant. On the other hand Theorems 9 demonstrates that there is no chance optimal predictors cover all of (for example) whereas quasi-optimal predictors might work. Also, I think that in there is a universal construction for uniform quasi-optimal predictors (as opposed to optimal predictors) similar to Levin's universal search although I haven't fleshed out the details yet (anyhow such a construction would be theoretically valid but highly inefficient in practice).