We do ten experiments. A scientist observes the results, constructs a theory consistent with them, and uses it to predict the results of the next ten. We do them and the results fit his predictions. A second scientist now constructs a theory consistent with the results of all twenty experiments.
The two theories give different predictions for the next experiment. Which do we believe? Why?
One of the commenters links to Overcoming Bias, but as of 11 PM on Sep 28th (by the clock on David's blog), no one has given the exact answer that I would have given. It's interesting that a question so basic has received so many answers.
Here's my answer, prior to reading any of the comments here, or on Friedman's blog, or Friedman's own commentary immediately following his statement of the puzzle. So, it may have already been given and/or shot down.
We should believe the first theory. My argument is this. I'll call the first theory T1 and the second theory T2. I'll also assume that both theories made their predictions with certainty. That is, T1 and T2 gave 100% probability to all the predictions that the story attributed to them.
First, it should be noted that the two theories should have given the same prediction for the next experiment (experiment 21). This is because T1 should have been the best theory that (would have) predicted the first batch. And since T1 also correctly predicted the second batch, it should have been the best theory that would do that, too. (Here, "best" is according to whatever objective metric evaluates theories with respect to a given body of evidence.)
But we are told that T2 makes exactly the same predictions for the first two batches. So it also should have been the best such theory. It should be noted that T2 has no more information with which to improve itself. T1, for all intents and purposes, also knew the outcomes of the second batch of experiments, since it predicted them with 100% certainty. Therefore, the theories should have been the best possible given the first two batches. In particular, they should have been equally good.
But if "being the best, given the first two batches" doesn't determine a prediction for experiment 21, then neither of these "best" theories should be predicting the outcome of experiment 21 with certainty. Therefore, since it is given that they are making such predictions, they should be making the same one.
Since the two theories in fact disagree about experiment 21, it follows that at least one of them is not the best, given the evidence that it had. That is, at least one of them was constructed using flawed methods. T2 is more likely to be flawed than is T1, because T2 only had to post-dict the second batch. This is trivial to formalize using Bayes's theorem. Roughly speaking, it would have been harder for T1 to have been constructed in a flawed way and still have gotten its predictions for the second batch right.
Therefore, T1 is more likely to be right than is T2 about the outcome of experiment 21.
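Here is a rough sketch of how the Bayes step above might be formalized. It is a toy model, and everything in it is my own assumption rather than part of the puzzle: each scientist's method is flawed with the same prior probability, independently of the other; a sound method always yields the best theory, which passes any test; and a flawed method that fit the first batch gets the second batch right in advance only with some small probability (called `lucky` here). The function names and the example numbers are purely illustrative.

```python
from fractions import Fraction


def posterior_flawed(prior_flawed, pass_if_flawed, pass_if_sound=Fraction(1)):
    """Bayes's theorem: P(method was flawed | the theory passed its test)."""
    flawed = prior_flawed * pass_if_flawed
    sound = (1 - prior_flawed) * pass_if_sound
    return flawed / (flawed + sound)


def compare(prior_flawed, lucky):
    """Posterior over which method was flawed, given that T1 and T2 disagree.

    Assumptions (not from the puzzle): the two methods are flawed
    independently, and two sound methods would never disagree.
    """
    # T1 had to predict batch 2 in advance, so a flawed method passes
    # that test only with probability `lucky`.
    t1_flawed = posterior_flawed(prior_flawed, lucky)
    # T2 saw all twenty results before it was built, so it fits them
    # whether or not its method was flawed; the evidence tells us nothing.
    t2_flawed = posterior_flawed(prior_flawed, Fraction(1))
    # The disagreement about experiment 21 rules out "both sound".
    at_least_one_flawed = 1 - (1 - t1_flawed) * (1 - t2_flawed)
    return {
        "only_T2_flawed": (1 - t1_flawed) * t2_flawed / at_least_one_flawed,
        "only_T1_flawed": t1_flawed * (1 - t2_flawed) / at_least_one_flawed,
        "both_flawed": t1_flawed * t2_flawed / at_least_one_flawed,
    }


if __name__ == "__main__":
    # Illustrative numbers only: a 1/2 prior of a flawed method, and a
    # 1/10 chance that a flawed theory predicts batch 2 correctly.
    for case, prob in compare(Fraction(1, 2), Fraction(1, 10)).items():
        print(case, prob)
```

With these made-up numbers, the case where only T2's method is flawed gets probability 5/6, against 1/12 for the case where only T1's is. The ordering holds in this model whenever `lucky` is below 1 and the prior lies strictly between 0 and 1, which is all the argument needs.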