We do ten experiments. A scientist observes the results, constructs a theory consistent with them, and uses it to predict the results of the next ten. We do them and the results fit his predictions. A second scientist now constructs a theory consistent with the results of all twenty experiments.
The two theories give different predictions for the next experiment. Which do we believe? Why?
One of the commenters links to Overcoming Bias, but as of 11 PM on Sep 28th (David's blog's time), no one has given the exact answer that I would have given. It's interesting that a question so basic has received so many different answers.
Peter de Blanc is right: Theories screen off the theorists. It doesn't matter what data they had, or what process they used to come up with the theory. At the end of the day, you've got twenty data points and two theories, and you can use your priors in the domain (along with things like Occam's Razor) to compute the posterior probabilities of the two theories.
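To make the screening-off point concrete, here is a toy sketch of my own construction (nothing in the puzzle specifies these theories or numbers): suppose the two "theories" are rival models of a coin's bias. Once both theories and all twenty data points are on the table, the comparison uses only the priors and the likelihoods; who saw which points first never enters the calculation.

```python
# Toy sketch: two rival "theories" about a coin observed for 20 flips.
# Theory A: P(heads) = 0.5; Theory B: P(heads) = 0.7.
# The posterior depends only on the priors and the full data --
# not on which theorist saw which flips, or in what order.

def posteriors(prior_a, prior_b, heads, flips, p_a=0.5, p_b=0.7):
    """Posterior probabilities of theories A and B given the data."""
    tails = flips - heads
    like_a = p_a ** heads * (1 - p_a) ** tails   # P(data | A)
    like_b = p_b ** heads * (1 - p_b) ** tails   # P(data | B)
    num_a, num_b = prior_a * like_a, prior_b * like_b
    total = num_a + num_b
    return num_a / total, num_b / total

# With equal priors and 12 heads in 20 flips:
post_a, post_b = posteriors(0.5, 0.5, heads=12, flips=20)
```

The function returns the same answer whether the flips arrived as 10 + 10 or all at once, which is exactly what "theories screen off the theorists" means here.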
But that's not the puzzle. The puzzle doesn't give us the two theories. Hence, strictly speaking, there is no correct answer.
That said, we can start assigning likelihoods to the answers we would come up with, if we knew the two theories. And here what is important is that all we know is that each theory is "consistent" with the data its author had seen so far. Well, there are infinitely many theories consistent with any data set, so that's a pretty weak constraint.
Hence people are jumping to the guess that scientist #2 will "overfit" the data, given his extra 10 observations.
But that's not a conclusion you ought to draw before seeing the details of the two theories. Either he did overfit the data, or he didn't, but we can't determine which until we see the theories.
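As a toy illustration of what "overfitting" would look like here (my own construction, on hypothetical data, not anything from the puzzle): a degree-9 polynomial can pass exactly through 10 noisy points drawn from a straight line, making it perfectly "consistent" with them, yet predict far worse than a straight-line fit on the next 10 points. Whether an actual theory #2 behaves like this is exactly what we can't know without seeing it.

```python
import numpy as np

# Hypothetical data: the true "law" is y = 2x + 1, observed with noise.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 10)
ys = 2.0 * xs + 1.0 + rng.normal(0.0, 0.1, size=10)

# Two "theories", both consistent with the first 10 observations:
simple = np.polynomial.Polynomial.fit(xs, ys, deg=1)     # a straight line
flexible = np.polynomial.Polynomial.fit(xs, ys, deg=9)   # hits every point

# Predict the next 10 "experiments" (past the range already observed).
new_xs = np.linspace(1.1, 2.0, 10)
new_ys = 2.0 * new_xs + 1.0
err_simple = float(np.mean((simple(new_xs) - new_ys) ** 2))
err_flexible = float(np.mean((flexible(new_xs) - new_ys) ** 2))
```

The degree-9 fit matches the first 10 points essentially exactly, but its extrapolation error on the next 10 dwarfs the straight line's. The point stands, though: this only shows that overfitting is possible, not that any particular theory #2 actually does it.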
So what it comes down to is that the first scientist had less opportunity to overfit the data, since he only saw the first 10 points. And theory #1's successful prediction of the next 10 points is reasonable evidence that it is on the right track, whereas we have precious little evidence (from the puzzle) about theory #2.
But this doesn't say that theory #1 _is_ better than theory #2. It only says that, if we ever had the chance to actually correctly evaluate both theories (using Bayesian priors on both theories and all the data), then we currently expect theory #1 will win that battle more often than theory #2.
But that's a weak, kind of indirect, conclusion.