Comment author:mcdowella
11 May 2010 07:16:04PM
0 points

If there are hidden variables and random noise, you can still be learning after repeating an experience an arbitrary number of times. Consider the probability of the observed x, recalculated after re-estimating the distribution on the hidden variable t. We obtain it by integrating the probability of x given t, p(x|t), over all possible t, weighted by the probability of t given x, p(t|x). We have

Integral p(x|t) p(t|x) dt
= Integral p(x|t) p(x|t) p(t)/p(x) dt
= Expectation(p(x|t)^2)/p(x)
= Expectation(p(x|t))^2/p(x) + Variance(p(x|t))/p(x)
≥ Expectation(p(x|t))^2/p(x)
= p(x).
Here the expectation is over the prior distribution of t. Note that we have equality iff the variance of p(x|t), under our prior distribution on t, is zero, which is to say that the probability of x given t is constant almost everywhere the prior distribution on t is positive. If this variance is not zero, then p(x) in this calculation increases, which means that we are revising our distribution on t, and changing our minds.
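The inequality above is easy to check numerically. Here is a minimal sketch (all numbers made up for illustration): a binary hidden variable t, one fixed observation x, and repeated Bayes updates. Because p(x|t) is not constant over the prior, the predictive p(x) strictly increases with each repeat of the same experience.

```python
# Sketch (made-up numbers): binary hidden variable t with non-constant
# likelihoods, so Variance(p(x|t)) > 0 and p(x) must keep increasing.
prior = [0.5, 0.5]          # prior distribution on t = 0, 1
lik = [0.2, 0.8]            # p(x | t) for t = 0, 1

posterior = prior[:]
predictives = []
for step in range(5):
    p_x = sum(l * w for l, w in zip(lik, posterior))   # predictive p(x)
    predictives.append(p_x)
    # Bayes update on observing x: p(t|x) = p(x|t) p(t) / p(x)
    posterior = [l * w / p_x for l, w in zip(lik, posterior)]

print(predictives)   # strictly increasing, approaching p(x | t=1) = 0.8
```

Each repetition of the same x still shifts the distribution on t, so p(x) climbs toward its supremum rather than staying fixed.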

Right, but even with a digital brain, if you only have a finite number of bits to store the floating-point numbers representing the probabilities, eventually you will run out of bits. What you just described gets you a whole lot of new experience, but not a literally infinite amount.
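The finite-bits point can be made concrete with a sketch (made-up likelihoods, 64-bit floats): repeat the same Bayes update until the stored posterior stops changing. The weight on the disfavoured hypothesis shrinks geometrically, underflows to 0.0, and from then on further identical observations change nothing representable.

```python
# Sketch (made-up numbers): repeated Bayes updates on a binary hidden
# variable, stored as Python floats (IEEE 754 doubles). Learning halts
# once the disfavoured weight underflows and the update is a no-op.
posterior = [0.5, 0.5]      # prior on t = 0, 1
lik = [0.2, 0.8]            # p(x | t)

steps = 0
while True:
    p_x = lik[0] * posterior[0] + lik[1] * posterior[1]
    new = [lik[0] * posterior[0] / p_x, lik[1] * posterior[1] / p_x]
    if new == posterior:    # no representable change left in the floats
        break
    posterior = new
    steps += 1

print(steps, posterior)     # loop terminates with posterior exactly [0.0, 1.0]
```

So the mathematical argument gives an arbitrarily long sequence of genuine updates, but any fixed-precision representation truncates it after finitely many of them.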
