Separate the researcher who generates the data from the reasoner who is trying to estimate the hidden double from that data.
What is the data that the estimator receives? There is clearly a string of 100 bits indicating the results of the comparisons, but there is also another datum which indicates that the experiment was stopped after 100 iterations. This is a piece of evidence which must be included in the model, and the way to include it depends on the estimator's knowledge of the stopping criterion used by the data generator.
The estimator has to take into account the possibility of cherry picking.
EDIT:
I think I can use an example:
Suppose that I give you N =~ 10^9 bits of data generated according to the process you describe, and I declare that I had precommitted to stop gathering data after exactly N bits. If you trust me, then you must believe that you have an extremely accurate estimate of the hidden double. After all, you are using 1 gigabit of data to estimate less than 64 bits of entropy!
But then you learn that I lied about the stopping criterion, and I had in fact precommitted to stop gathering data at the point that it would have fooled you into believing with very high probability that the hidden number was, say, 0.42.
Should you update your belief on the hidden double after hearing of my deception? Obviously you should. In fact, the observation that I gave you so much data now makes the estimate extremely suspect, since the more data I give you the more I can manipulate your estimate.
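The manipulation effect is easy to demonstrate by simulation. Here is a minimal sketch (the particular stopping rule, threshold, and trial counts are illustrative assumptions of mine, not anything from the thread): a generator flips a fair coin but stops early as soon as the running frequency of 1s drifts up to 0.6, and otherwise hands over the full run. A naive estimator that just reports the observed frequency ends up biased upward on average.

```python
import random

def run_experiment(p, n_max, stop_mean, rng):
    """Flip Bernoulli(p) bits; stop early as soon as the running mean of
    1s reaches stop_mean (after at least 10 flips), else stop at n_max."""
    ones = 0
    for n in range(1, n_max + 1):
        ones += rng.random() < p
        if n >= 10 and ones / n >= stop_mean:
            break
    return ones / n  # the naive estimate: observed frequency of 1s

rng = random.Random(0)
p = 0.5  # the true hidden parameter
estimates = [run_experiment(p, n_max=1000, stop_mean=0.6, rng=rng)
             for _ in range(2000)]
mean_estimate = sum(estimates) / len(estimates)
print(f"true p = {p}, average naive estimate = {mean_estimate:.3f}")
```

Remove the early-stopping check and the average estimate comes out near the true 0.5; with it, the average is noticeably inflated. That is the estimator's whole problem: the number you compute from the bits depends on a stopping rule you may not be able to see.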
So, suppose I know the stopping criterion and the number of button presses that it took to stop the sequence, but I wasn't given the actual sequence.
It seems to me like I can use the two of those to recreate the sequence, for a broad class of stopping criteria. "If it took 100 presses, then clearly it must be 70 1s and 30 0s, because if it had been 71 1s and 29 0s he would have stopped then and there would be only 99 presses, but he wouldn't have stopped at 69 1s and 30 0s." I don't think I have any additional info.
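For the concrete rule the quoted reasoning implies — stop as soon as the fraction of 1s reaches 0.7 — the reconstruction can be written down directly. A sketch (the threshold-on-the-running-fraction rule is my reading of the example; `Fraction` avoids float round-off exactly at the boundary):

```python
from fractions import Fraction
from math import ceil

def reconstruct_counts(total_presses, threshold):
    """Stopping rule: stop as soon as (number of 1s) / (presses so far)
    >= threshold.  Given only that the run stopped at `total_presses`,
    recover the final counts of 1s and 0s.

    Reasoning: the rule had not triggered at step n - 1, so the final
    press must have been a 1, and the count of 1s is the smallest integer
    >= threshold * n.  The feasible interval has width 1 - threshold < 1,
    so that count is unique."""
    n = total_presses
    ones = ceil(threshold * n)                    # smallest count triggering at n
    assert Fraction(ones, n) >= threshold         # does trigger at step n
    assert Fraction(ones - 1, n - 1) < threshold  # did not trigger at n - 1
    return ones, n - ones

# The example from the text: threshold 0.7, stopped after 100 presses.
print(reconstruct_counts(100, Fraction(7, 10)))  # (70, 30)
```

So for this class of criteria the pair (stopping rule, number of presses) really does pin down the final counts, matching the "70 1s and 30 0s" argument in the text.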
[Epistemic status | Contains generalization based on like three data points.]
In grad school, I took a philosophy of science class that was based around looking for examples of bad reasoning in the scientific literature. The kinds of objections to published scientific studies we talked about were not stupid ones. The professor had a background in statistics, and as far as I could tell knew her stuff in that area (though she dismissed Bayesianism in favor of frequentism). And no, unlike some of the professors in the department, she wasn't an anti-evolutionist or anything like that.
Instead she was convinced that cellphones cause cancer, in spite of scant evidence for the claim and no plausible physical mechanism for how it could happen. She held a number of other borderline-fringe beliefs that I won't get into here, but that one was the big screaming red flag.*
Over the course of the semester, I got a pretty good idea of what was going on. She had an agenda—it happened to be an environmentalist, populist, pro-"natural"-things agenda, but that's incidental. The problem was that when she saw a scientific study that seemed at odds with her agenda, she went looking for flaws. And often she could find them! Real flaws, not ones she was imagining! But people who've read the rationalization sequence will see a problem here...
In my last post, I quoted Robin Hanson on the tendency of some physicists to be unduly dismissive of other fields. But based on the above case and a couple of others like it, I've come to suspect statistics may be even worse than physics in that way: fluency in statistics sometimes causes a supercharged sophistication effect.
For example, some anthropogenic global warming skeptics make a big deal of alleged statistical errors in global warming research, a pattern I discussed in my post Trusting Expert Consensus.
Most recently, I got into a Twitter argument with someone who claimed that "IQ is demonstrably statistically meaningless" and that this was widely accepted among statisticians. Not only did this set off my "academic clique!" alarm bells, but I'd just come off doing a spurt of reading about intelligence, including the excellent Intelligence: A Very Short Introduction. The claim that IQ is meaningless was wildly contrary to what I understood was the consensus among people who study intelligence for a living.
In response to my surprise, I got an article that contained lengthy and impressive-looking statistical arguments... but completely ignored a couple key points from the intelligence literature I'd read: first, that there's a strong correlation between IQ and real-world performance, and second that correlations between the components of intelligence we know how to test for turn out to be really strong. If IQ is actually made up of several independent factors, we haven't been able to find them. Maybe some people in intelligence research really did make the mistakes alleged, but there was more to intelligence research than the statistician who wrote the article let on.
It would be fair to shout a warning about correspondence bias before inferring anything from these cases. But consider two facts:

1. Statistical methods are used throughout science, so knowing statistics helps you evaluate a very wide range of scientific claims.
2. Statistics is a specialized skill, and most scientists are not specialists in it.
The first fact may make it tempting to think that if you know a lot of statistics, you're in a privileged position to judge the validity of any scientific claim you come across. But the second fact means that if you've specialized in statistics, you'll probably be better at it than most scientists, even good scientists. So if you go scrutinizing their papers, there's a good chance you'll find clear mistakes in their stats, and an even better chance you'll find arguable ones.
Bayesians will realize that, since there's a good chance of that happening even when the conclusion is correct and well-supported by the evidence, finding mistakes in the statistics is only weak evidence that the conclusion is wrong. Call it the statistician's fallacy: thinking that finding a mistake in the statistics is sufficient grounds to dismiss a finding.
Oh, if you're dealing with a novel finding that experts in the field aren't sure what to make of yet, and the statistics turn out to be wrong, then that may be enough. You may have better things to do than investigate further. But when a solid majority of the experts agree on a conclusion, and you see flaws in their statistics, I think the default assumption should be that they still know the issue better than you, and that very likely the sum total of the available evidence does support the conclusion, even if the specific statistical arguments you've seen from them are wrong.
*Note: I've done some Googling to try to find rebuttals to this link, and most of what I found confirms it. I did find some people talking about multi-photon effects and heating, but couldn't find defenses of these suggestions that rise beyond people saying, "well there's a chance."