I can't seem to get my head around a simple issue of judging probability. Perhaps someone here can point to an obvious flaw in my thinking.
Let's say we have a binary generator, a machine that outputs a required sequence of ones and zeros according to some internally encapsulated rule (deterministic or probabilistic). All binary generators look alike and you can only infer (a probability of) a rule by looking at its output.
You have two binary generators: A and B. One is a true random generator (a fair coin tosser). The other is a biased random generator: stateless (each digit is generated independently of the digits before it), with a probability p(0) of outputting zero that lies between zero and one but is NOT 0.5; say p(0) is uniformly distributed over [0, 0.5) U (0.5, 1]. At this point, the chance that A is the true random generator is 50%.
Now you read the first ten digits generated by these machines. Machine A outputs 0000000000. Machine B outputs 0010111101. Knowing this, is the probability of machine A being the true random generator now less than 50%?
My intuition says yes.
But the probability that a true random generator will output 0000000000 should be the same as the probability that it will output 0010111101, because all sequences of equal length are equally likely. The biased random generator is also just as likely to output 0000000000 as it is 0010111101.
So there seems to be no reason to think that a machine outputting a sequence of zeros of any size is any more likely to be a biased stateless random generator than it is to be a true random generator.
I know that you can never know that the generator is truly random. But surely you can statistically discern between random and non-random generators?
This is the mistake.
If you actually do the maths, the biased generator is significantly more likely to output 0000000000 than 0010111101.
For a much simpler example, suppose we run each generator just twice. The true random generator outputs 00 25% of the time, 01 25% of the time, 10 25% of the time and 11 25% of the time.
For the biased generator, we need calculus. First suppose its p(0) = x. Then p(00 | p(0) = x) = x^2. Since p(0) is essentially uniformly distributed over [0, 1] (the presence or absence of a single point makes no difference), we integrate f(x) = x^2 over the interval [0, 1], which gives an answer of p(00) = 1/3. The same method gives p(11) = 1/3 and p(01) = p(10) = 1/6.
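A quick Monte Carlo sanity check of those integrals (a hypothetical sketch, not part of the original argument): draw p(0) = x uniformly, emit two digits, and tally the pairs. The integrals predict p(00) = p(11) = 1/3 and p(01) = p(10) = 1/6.

```python
import random

random.seed(0)
counts = {"00": 0, "01": 0, "10": 0, "11": 0}
trials = 200_000
for _ in range(trials):
    x = random.random()  # this run's p(0), drawn uniformly from [0, 1]
    pair = "".join("0" if random.random() < x else "1" for _ in range(2))
    counts[pair] += 1

for pair, c in counts.items():
    print(pair, c / trials)  # 00 and 11 near 1/3, 01 and 10 near 1/6
```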
The general rule is that if we run it n times, then for any k between 0 and n, the chance of it outputting exactly k 1s is 1/(n+1), and that probability is shared out evenly among all C(n, k) ways of outputting k 1s (also derivable from calculus). Thus p(0000000000) = 1/11 ≈ 9.1%, while p(0010111101) = 1/2310 ≈ 0.043%.
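That rule can be checked exactly: a specific sequence with k 1s among n digits has probability 1 / ((n+1) * C(n, k)) under the biased generator. A minimal sketch (function name is my own):

```python
from fractions import Fraction
from math import comb

def biased_seq_prob(seq: str) -> Fraction:
    """Probability that the biased generator (p(0) uniform on [0, 1])
    emits exactly this sequence: 1 / ((n+1) * C(n, k)), k = number of 1s."""
    n, k = len(seq), seq.count("1")
    return Fraction(1, (n + 1) * comb(n, k))

print(biased_seq_prob("0000000000"))  # 1/11, about 9.1%
print(biased_seq_prob("0010111101"))  # 1/2310, about 0.043%
```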
Thanks, this is a great answer. It didn't occur to me that a stateless generator with unknown p(0) would have such a "preference" for all-digits-the-same sequences. p(ten zeros) = 1/11 if p(0) can be any number, but p(ten zeros) = 1/1024 if p(0) = 1/2.
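And those two numbers answer the original question directly via Bayes' rule. With a 50/50 prior over which machine is the fair one, a sketch (assuming the formulas from the answer above; names are my own):

```python
from fractions import Fraction
from math import comb

def fair_prob(seq: str) -> Fraction:
    # Fair coin: every length-n sequence has probability 1/2^n.
    return Fraction(1, 2 ** len(seq))

def biased_prob(seq: str) -> Fraction:
    # Biased generator with p(0) uniform on [0, 1]: 1 / ((n+1) * C(n, k)).
    n, k = len(seq), seq.count("1")
    return Fraction(1, (n + 1) * comb(n, k))

seq_a, seq_b = "0000000000", "0010111101"
# Two hypotheses with equal priors: (A fair, B biased) vs (A biased, B fair).
like_a_fair = fair_prob(seq_a) * biased_prob(seq_b)
like_b_fair = biased_prob(seq_a) * fair_prob(seq_b)
posterior_a_fair = like_a_fair / (like_a_fair + like_b_fair)
print(posterior_a_fair, float(posterior_a_fair))  # 1/211, about 0.47%
```

So the intuition in the original question was right: after seeing these outputs, A is almost certainly the biased machine.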