I'm obviously new to this whole thing, but is this a largely undebated, widely accepted view on probabilities? That there are NO situations in which you can't meaningfully state a probability?
Actually, yes, but you're right to be surprised because it's (to my mind at least) an incredible result. Cox's theorem establishes this as a mathematical result from the assumption that you want to reason quantitatively and consistently. Jaynes gives a great explanation of this in chapters 1 and 2 of his book "Probability Theory".
But how valid is this result when we knew nothing of the original distribution?
The short answer is that a probability always reflects your current state of knowledge. If I told you absolutely nothing about the coin or the distribution, then you would be entirely justified in assigning 50% probability to heads (on the basis of symmetry). If I told you the exact distribution over p then you would be justified in assigning a different probability to heads. But in both cases I carried out the same experiment -- it's just that you had different information in the two trials. You are justified in assigning different probabilities because Probability is in the mind. The knowledge you have about the distribution over p is just one more piece of information to roll into your probability.
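A minimal sketch of that last point, in Python: whatever you know about p is summarised as a density over p, and the probability you assign to heads is the mean of that density. The three densities and the name prob_heads are illustrative assumptions, not part of the original experiment.

```python
def prob_heads(density, steps=10_000):
    """P(heads) = integral of p * density(p) dp over [0, 1], by the midpoint rule."""
    width = 1.0 / steps
    mids = [(i + 0.5) * width for i in range(steps)]
    weights = [density(p) for p in mids]
    total = sum(weights)  # normalise, so the density need not integrate to exactly 1
    return sum(p * w for p, w in zip(mids, weights)) / total


# Hypothetical states of knowledge about p (assumptions for illustration only).
know_nothing = lambda p: 1.0               # flat: no reason to favour heads or tails
symmetric_odd = lambda p: (p - 0.5) ** 2   # strange shape, but still symmetric about 0.5
told_skewed = lambda p: 3 * p ** 2         # you were told p tends to be high

print(prob_heads(know_nothing))   # close to 0.5
print(prob_heads(symmetric_odd))  # close to 0.5, by symmetry alone
print(prob_heads(told_skewed))    # close to 0.75
```

The coin and the flip are the same in all three cases; only the information fed into the calculation changes.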
With this in mind, do we still believe that it's not wrong (or less wrong? :D) to assume a normal distribution, make our calculations and decide how much you'd bet that the mean of the next 100,000 samples is in the range -100..100?
That depends on the probability that the coin flipper chooses a Cauchy distribution. If this were a real experiment then you'd have to take into account unwieldy facts about human psychology, physics of coin flips, and so on. Cox's theorem tells us that in this case there is a unique answer in the form of a probability, but it doesn't guarantee that we have time, resources, or inclination to actually calculate it. If you want to avoid all those kinds of complicated facts then you can start from some reasonable mathematical assumptions such as a normal distribution over p - but if your assumptions are wrong then don't be surprised when your conclusions turn out wrong.
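To show how the answer depends on that probability, here is a rough sketch under strong simplifying assumptions: the samples are either standard Cauchy or standard normal (both centred at 0 with unit scale, which the thread never specifies), and q_cauchy is your probability that the Cauchy was chosen. The function names are made up for this example.

```python
import math


def p_mean_in_range_cauchy(bound):
    # The mean of n i.i.d. standard Cauchy samples is itself standard Cauchy,
    # for any n, so averaging 100,000 samples does not narrow it at all.
    return (2.0 / math.pi) * math.atan(bound)


def p_mean_in_range_normal(bound, n, sigma=1.0):
    # The mean of n i.i.d. N(0, sigma^2) samples is N(0, sigma^2 / n).
    z = bound / (sigma / math.sqrt(n))
    return math.erf(z / math.sqrt(2.0))


def p_mean_in_range(q_cauchy, bound=100.0, n=100_000):
    # Law of total probability over the two hypotheses about the source.
    return (q_cauchy * p_mean_in_range_cauchy(bound)
            + (1.0 - q_cauchy) * p_mean_in_range_normal(bound, n))


for q in (0.0, 0.5, 1.0):
    print(q, p_mean_in_range(q))
```

Under these assumptions the bet on -100..100 is a near-certainty if the source is normal, but carries roughly a 0.6% chance of losing if it is Cauchy, however many samples you average.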
Thanks for this, it really helped.
it doesn't guarantee that we have time, resources, or inclination to actually calculate it
Here's how I understand this point, which finally made things clearer:

Yes, there exists a more accurate answer, and we might even be able to discover it by investing some time. But until we do, the fact that such an answer exists is completely irrelevant. It is orthogonal to the problem.
In other words, doing the calculations would give us more information to base our prediction on, but knowing that we can do the calculation doesn't...
Suppose I tell you I have an infinite supply of unfair coins. I pick one randomly and flip it, recording the result. I've done this a total of 100 times and they all came out heads. I will pay you $1000 if the next throw is heads, and $10 if it's tails. Each unfair coin is otherwise entirely normal: every flip of it comes up heads with some fixed but unknown probability p. This is all you know. How much would you pay to enter this game?
I suppose another way to phrase this question is "what is your best estimate of your expected winnings?", or, more generally, "how do you choose the maximum price you'll pay to play this game?"
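For concreteness, here is what one such estimate could look like in Python. It rests on two assumptions that the setup does not grant: a fresh coin is drawn for every flip, so each outcome is heads with probability m, the average p of the supply; and your prior over m is a Beta(a, b), uniform by default. The function names are invented for this sketch.

```python
from fractions import Fraction


def posterior_predictive_heads(heads, tails, a=1, b=1):
    # With a Beta(a, b) prior on m, the posterior after the data is
    # Beta(a + heads, b + tails); the chance of heads next is its mean.
    return Fraction(a + heads, a + b + heads + tails)


def expected_winnings(p_heads, win_heads=1000, win_tails=10):
    return p_heads * win_heads + (1 - p_heads) * win_tails


p_next = posterior_predictive_heads(heads=100, tails=0)
print(p_next)                            # 101/102
print(float(expected_winnings(p_next)))  # about 990.29
```

A different prior gives a different number, which is exactly the difficulty being asked about.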
Observe that the only fact you know about the distribution from which I'm drawing my coins is those 100 outcomes. Importantly, you don't know the distribution of each coin's p in my supply of unfair coins. Can you reasonably assume a specific distribution to make your calculation, and claim that it results in a better best estimate than any other distribution?
Most importantly, can one actually produce a "theoretically sound" expectation here? I.e. one that is calibrated so that if you pay your expected winnings every time and we perform this experiment lots of times then your average winnings will be zero - assuming I'm using the same source of unfair coins each time.
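To pin down what "average winnings will be zero" means, here is a small simulation sketch. It assumes a fresh coin is drawn for every game and uses a made-up source, Beta(50, 1), which the player would of course not know. For any fixed source the break-even price is 10 + 990 * m, where m is the average p of the supply.

```python
import random


def average_net_winnings(price, draw_p, games, rng):
    """Play `games` rounds at a fixed price and return the mean net result."""
    total = 0.0
    for _ in range(games):
        p = draw_p(rng)                        # pick a coin from the supply
        payout = 1000 if rng.random() < p else 10
        total += payout - price
    return total / games


# Hypothetical supply of coins (an assumption for illustration only).
draw_p = lambda rng: rng.betavariate(50, 1)
m = 50 / 51                                    # mean of Beta(50, 1)
break_even = 10 + 990 * m

rng = random.Random(0)
print(break_even)
print(average_net_winnings(break_even, draw_p, 200_000, rng))  # near zero
```

The simulation can find the break-even price only because it knows the source; the question is what to do when all you have is the 100 outcomes.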
I suspect that the best one can do here is produce a range of values with confidence intervals. So you're 80% confident that the price you should pay to break even in the repeated game is between A80 and B80, 95% confident it's between A95 and B95, etc.
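As a sketch of what such ranges could look like, suppose (purely as assumptions) that a fresh coin is drawn for each flip and that you start from a uniform prior on m, the average p of the supply. After 100 heads the posterior is Beta(101, 1), whose CDF is m^101, so its quantiles have a closed form. The names below are invented for the example.

```python
def posterior_quantile(q, heads=100):
    # Posterior is Beta(heads + 1, 1) under a uniform prior and no tails seen;
    # its CDF is m ** (heads + 1), so the q-quantile is q ** (1 / (heads + 1)).
    return q ** (1.0 / (heads + 1))


def break_even_interval(level, heads=100, win_heads=1000, win_tails=10):
    """Central interval for the break-even price at the given probability level."""
    tail = (1.0 - level) / 2.0
    lo_m = posterior_quantile(tail, heads)
    hi_m = posterior_quantile(1.0 - tail, heads)
    to_price = lambda m: win_tails + (win_heads - win_tails) * m
    return to_price(lo_m), to_price(hi_m)


for level in (0.80, 0.95):
    print(level, break_even_interval(level))
```

Strictly speaking these are Bayesian credible intervals rather than frequentist confidence intervals, and they move if the assumed prior changes.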
If this is really the best obtainable result, then what is a Bayesian to do with such a result to make their decision? Do you pick a price randomly from a specially crafted distribution, which is 95% likely to produce a value between A95 and B95, etc.? Or is there a more "Bayesian" way?