It appears to me that there is a natural analogue of the concept of irreducible pattern in the language of average case complexity. Moreover, the calibration theorem for optimal predictor schemes implies they pass the Benford test in the associated sense. I'll write it down carefully and post...
There is a typo in the text: "We say that S is an ??? with probability p." I guess this is supposed to be "irreducible pattern"?
Btw, it seems that the definition makes sense for arbitrary promise problems; you don't have to consider provability in particular.
This post is part of the Asymptotic Logical Uncertainty series. In this post, I will talk more about the exact assumptions we need to make about the sequence of sentences in the Benford Test.
I will start by making a few changes to the previous notation. First, we no longer need the output of $L$ to be computable in bounded time. From now on, we can just say that $L$ is a Turing machine that on input $N$ eventually accepts if $\phi_N$ is provable, eventually rejects if $\phi_N$ is disprovable, and runs forever otherwise.
We will also change the sequence of sentences in the Benford test. Instead of "The first digit of $A(n)$ in base 10 is 1," we will use "The first digit of $3\uparrow^n 3$ in base 10 is 1." This way, the reasonable answer to every question is $\log_{10}(2)$, whereas before the powers of 10 were an exception. This change does not make the Benford test weaker, and the program we present will also pass the original Benford test, but it will make many definitions have fewer messy cases.
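As a quick sanity check, the Benford behavior underlying these sentences is easy to see empirically. The towers $3\uparrow^n 3$ are far too large to compute, so this sketch (mine, not from the post) substitutes ordinary powers of 3, which obey the same leading-digit law:

```python
import math

# Fraction of n <= N for which 3^n has leading digit 1.
# By Benford's law (equidistribution of n*log10(3) mod 1),
# this approaches log10(2) ~ 0.301.
N = 2000
pow3 = 1
hits = 0
for n in range(1, N + 1):
    pow3 *= 3
    if str(pow3)[0] == "1":
        hits += 1

freq = hits / N
print(f"frequency of leading digit 1: {freq:.4f}")
print(f"log10(2) = {math.log10(2):.4f}")
```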
Finally, we will switch to deterministic machines. Instead of making a machine which outputs 1 with the correct probability, we will have a machine that outputs a probability. This clearly makes the problem no easier.
Here is the new version of the Benford test:
Let $M$ be a Turing machine which on input $N$ runs quickly and outputs a probability $M(N)$, which represents the probability assigned to $\phi_N$. We say that $M$ satisfies the Benford test if $$\lim_{n\to\infty} M(s_n)=\log_{10}(2),$$ where $\phi_{s_n}=$ "The first digit of $3\uparrow^n 3$ is a 1."
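To see what the definition does and does not demand: the limit condition by itself is trivial to satisfy, since a machine that ignores its input and always outputs $\log_{10}(2)$ meets it. The hard part, which the series addresses, is passing the test for the right reasons while behaving sensibly on other sentences. A minimal sketch (this toy machine is my illustration, not the post's construction):

```python
import math

def M(N: int) -> float:
    """A trivial predictor: assign probability log10(2) to every
    sentence. It vacuously satisfies lim M(s_n) = log10(2), but it
    is not a serious predictor -- it never looks at the sentence."""
    return math.log10(2)

print(M(17))
```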
In this post, we will not present a solution to the Benford test, but will be very explicit about the assumptions we need to make.
The fact that the best probability to assign to $\phi_{s_n}$ is $\log_{10}(2)$ depends not only on the fact that the frequency with which $\phi_{s_n}$ is true tends to $\log_{10}(2)$, but also on the fact that the sequence of truth values of $\phi_{s_n}$ contains no patterns that can be used to quickly compute a better probability on some subsequence. We therefore assume that this sequence of truth values is indistinguishable from a sequence produced by a coin that outputs "true" with probability $\log_{10}(2)$. Formally, we are assuming that $S=\{s_n\mid n\in\mathbb{N}\}$ is an irreducible pattern with probability $\log_{10}(2)$, as defined below.
Fix a universal Turing machine UTM and an encoding scheme for machines, and let UTM(M,x) denote running the machine UTM to simulate M with input x.
Let $S\subseteq\mathbb{N}$ be an infinite subset of natural numbers such that $\phi_N$ is provable or disprovable for all $N\in S$, and such that there exists a Turing machine $Z$ which on input $N$ runs in time $T(N)$ and accepts $N$ if and only if $N\in S$. We say that $S$ is an irreducible pattern with probability $p$ if there exists a constant $c$ such that for every positive integer $m\geq 3$ and every quickly computable subset $S'\subseteq S$ with at least $m$ elements, we have $$|r(m,W)-p|<\frac{c\,k(W)\sqrt{\log\log m}}{\sqrt{m}},$$ where $k(W)$ is the number of bits necessary to encode a Turing machine $W$ such that, for $N\in S$, $UTM(W,N)$ accepts in time $T(N)$ if and only if $N\in S'$, and $r(m,W)$ is the probability that $\phi_N$ is provable when $N$ is chosen uniformly at random from the first $m$ elements of $S'$.
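The inequality in the definition can be read operationally: walk along the selected subsequence, track the empirical frequency over each prefix of length $m$, and compare its deviation from $p$ against the bound. A toy checker along these lines (the function name and the way $c$ and $k(W)$ are passed in are my illustrative choices, not part of the definition):

```python
import math

def passes_bound(truths, p, c, k):
    """Check |r(m,W) - p| < c * k * sqrt(log log m) / sqrt(m)
    for every prefix length m >= 3.

    truths: booleans -- truth values of phi_N along the first
            elements of a quickly computable subset S', selected by
            a machine W encoded in k bits.
    """
    for m in range(3, len(truths) + 1):
        r = sum(truths[:m]) / m  # empirical frequency r(m, W)
        bound = c * k * math.sqrt(math.log(math.log(m))) / math.sqrt(m)
        if abs(r - p) >= bound:
            return False
    return True
```

A coin-like sequence passes with a modest constant, while a blatantly patterned one (e.g. all true) eventually violates the bound, since the right-hand side shrinks like $1/\sqrt{m}$.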
This may seem like an unmotivated definition, but there is a good reason for it. It comes from the Law of the Iterated Logarithm. A coin that outputs true with probability p will pass this test with probability 1. The definition is tight in the sense that we cannot replace the right hand side with something that diminishes more quickly as m increases. It is also important to note that while we think that this is a good definition of the subsequence being quickly indistinguishable from a coin, we really only need it as a necessary condition, so that the sequence of Benford sentences is an irreducible pattern.
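A quick simulation illustrates the Law of the Iterated Logarithm statement being invoked: for a $p$-coin, the normalized deviation $|m\,r(m)-mp|/\sqrt{m\log\log m}$ stays bounded. (The seed, sample size, and starting index $m\geq 16$, which keeps $\log\log m$ comfortably positive, are arbitrary choices of mine.)

```python
import math
import random

rng = random.Random(42)
p = math.log10(2)

flips = [rng.random() < p for _ in range(100_000)]

sup = 0.0
heads = 0
for m, flip in enumerate(flips, start=1):
    heads += flip
    if m >= 16:
        stat = abs(heads - m * p) / math.sqrt(m * math.log(math.log(m)))
        sup = max(sup, stat)

print(f"sup of normalized deviation: {sup:.3f}")
```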
Theorem: If we replace provability in the definition of irreducible pattern with a random process, such that for each $N\in S$ the sentence $\phi_N$ is independently called "provable" with probability $p$, then $S$ is almost surely an irreducible pattern with probability $p$.
Proof: Fix a Turing machine $W$. By the law of the iterated logarithm, there exists a constant $c_1$ such that $$\limsup_{m\to\infty}\frac{|m\,r(m,W)-mp|}{\sqrt{m\log\log m}}=c_1$$ almost surely. Therefore $$\sup_m\frac{|m\,r(m,W)-mp|}{\sqrt{m\log\log m}}<\infty$$ almost surely. We will use $\Phi(W)$ as a shorthand for this supremum. For any $\varepsilon>0$, there therefore exists a $c_2$ such that $\Phi(W)>c_2$ with probability at most $\varepsilon$.
We now show that the probability that $\Phi(W)>2c_2+1$ is at most $\varepsilon^2$. It suffices to show that the probability that $\Phi(W)>2c_2+1$ given $\Phi(W)>c_2$ is at most $\varepsilon$. Let $m_1$ be the first $m$ such that $\frac{|m\,r(m,W)-mp|}{\sqrt{m\log\log m}}>c_2$. It suffices to show that the probability that there exists an $m_2$ with $$\frac{|m_2\,r(m_2,W)-m_2p|}{\sqrt{m_2\log\log m_2}}-\frac{|m_1\,r(m_1,W)-m_1p|}{\sqrt{m_1\log\log m_1}}>c_2$$ is at most $\varepsilon$.
Observe that $$\frac{|m_2\,r(m_2,W)-m_2p|}{\sqrt{m_2\log\log m_2}}-\frac{|m_1\,r(m_1,W)-m_1p|}{\sqrt{m_1\log\log m_1}}\leq\frac{|m_2\,r(m_2,W)-m_1\,r(m_1,W)-(m_2-m_1)p|}{\sqrt{(m_2-m_1)\log\log(m_2-m_1)}},$$ and that the probability that there exists an $m_2$ with $$\frac{|m_2\,r(m_2,W)-m_1\,r(m_1,W)-(m_2-m_1)p|}{\sqrt{(m_2-m_1)\log\log(m_2-m_1)}}>c_2$$ is the same as the probability that $\Phi(W)>c_2$, which is at most $\varepsilon$.
We have thus shown that for every $\varepsilon$, there exists a constant $c_3=c_2+1$ such that, iterating the above doubling argument, the probability that $\Phi(W)\geq 2^\ell c_3$ is at most $\varepsilon^{2^\ell}$ for every $\ell$.
Partition the set of all Turing machines into sets $W_1,W_2,\ldots$, such that $W_\ell$ contains all Turing machines expressed in at least $2^\ell$, but fewer than $2^{\ell+1}$, bits. The probability that a Turing machine $W$ in $W_\ell$ violates $$|r(m,W)-p|<\frac{c_3\,k(W)\sqrt{\log\log m}}{\sqrt{m}}\quad(\star)$$ for any $m\geq 3$ is at most $\varepsilon^{2^\ell}$. The number of Turing machines in $W_\ell$ is at most $2^{2^{\ell+1}}$, so the probability that there is any $W\in W_\ell$ and $m\geq 3$ which violate $(\star)$ is at most $\varepsilon^{2^\ell}2^{2^{\ell+1}}$.
Therefore, the probability that there is any Turing machine $W$ and $m\geq 3$ which violate $(\star)$ is at most $$\sum_{\ell\in\mathbb{N}}\varepsilon^{2^\ell}2^{2^{\ell+1}}=\sum_{\ell\in\mathbb{N}}(4\varepsilon)^{2^\ell}.$$ For small enough $\varepsilon$ this goes to 0, so for large enough $c_3$, the probability that $(\star)$ holds for all $W$ and $m$ goes to 1. Therefore, with probability 1, there exists a $c$ such that $$|r(m,W)-p|<\frac{c\,k(W)\sqrt{\log\log m}}{\sqrt{m}}$$ for all $m$ and $W$.
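The final convergence claim is easy to check numerically: since $2^{2^{\ell+1}}=\left(2^2\right)^{2^\ell}=4^{2^\ell}$, the series is $\sum_\ell(4\varepsilon)^{2^\ell}$, which is finite for $\varepsilon<1/4$ and shrinks rapidly as $\varepsilon\to 0$. A small sketch (the 30-term cutoff is arbitrary; later terms underflow to zero anyway):

```python
def union_bound(eps: float, terms: int = 30) -> float:
    """Partial sum of sum_{l >= 1} (4*eps)^(2^l), the union bound
    over the machine classes W_1, W_2, ... in the proof."""
    return sum((4 * eps) ** (2 ** ell) for ell in range(1, terms + 1))

for eps in (0.1, 0.01, 0.001):
    print(eps, union_bound(eps))
```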