Let's say Omega turns up and sets you a puzzle, since this seems to be what Omega does in his spare time. He has with him an opaque jar, which he says contains some solid-colored beads, and he's going to draw one bead out of the jar. He would like to know what your probability is that the bead will be red.
Well, now there is an interesting question. We'll bypass the novice mistake of calling it .5, of course; just because the options are binary (red or non-red) doesn't make them equally likely. And it's not as though you have no information at all. Assuming you don't think Omega is out to deliberately screw with you, you could say that the probability is .083 based on the fact that "red" is one of twelve basic color words in English. (If he had asked for the probability that the bead would be lilac, you'd be in a bit more trouble.) If you were obliged to make a bet that the bead is red, you would probably take the most conservative bet available (even if you're still assuming Omega isn't deliberately screwing with you), but .083 sounds okay.
But because you start with so little information, it's very hard to gather more. Suppose Omega reaches into the jar and pulls out a red bead. Does your probability that the second bead will be red go up (obviously the beads come in red)? Does it go down (that might have been the only one, and however many red beads there were before, there are fewer now)? Does it stay the same (the beads are all, as far as you know, independent of one another; removing this one bead affects the actual frequencies among the remaining beads, but it can't affect your epistemic probability)? What if he pulled out a gray bead first, instead of a red one? How many beads would he have to pull, and in what colors, for you to start making confident predictions?
So that's one kind of probability: the bead jar guess. It has a basis, but it's a terribly flimsy one, and guessing right (or wrong) doesn't help much to confirm or disconfirm the guess. Even if Omega had asked about the bead being lilac, and you'd dutifully given a tiny probability, it would not have surprised you to see a lilac bead emerge from the jar.
A non-bead-jar-guess probability of exactly the same size yields surprise when the improbable event occurs. Say your probability for lilac was .003. That's tiny. If you had a probability of .003 that it would rain on a particular day, you would be right to be astonished if you turned out to need the umbrella you left at home.
Bead jar guesses vacillate more easily. Although in the case of the bead jar you are in an extremely disadvantageous position when it comes to getting more information, we can fix that: somebody who says she's peeked into the jar tells you all the beads in it are red. Just like that, you'll discard the .083 and swap it for a solid .99 (adjusted as you like for the possibility that she is lying or can't see well). If the probability were anything but a wild guess, it would take considerable evidence, not just a single person's say-so, to move it that far; but a say-so is all it takes here. Now Omega pulling out a bead can give you information: the minute he pulls out a gray bead, you know you can't rely on your informant, at least not completely. You can start making decent inferences.
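The gray-bead moment can be sketched as a one-line Bayes update. All the specific numbers here are my own assumptions, not the post's: a .9 prior that the informant is honest and sharp-eyed, a tiny residual chance of gray even if she is (she might have misperceived a bead), and a fallback to the 1/12 color-word guess if she isn't.

```python
# Sketch: how much trust in the informant survives one gray bead?
# All numbers are illustrative assumptions, not taken from the post.
def posterior_reliable(prior_reliable, p_gray_if_reliable, p_gray_if_not):
    """Bayes' rule: P(informant reliable | a gray bead was drawn)."""
    num = p_gray_if_reliable * prior_reliable
    denom = num + p_gray_if_not * (1 - prior_reliable)
    return num / denom

# 0.9 prior she's reliable; gray is nearly impossible (0.001) if she is;
# otherwise fall back on the 1/12 basic-color-word guess.
p = posterior_reliable(0.9, 0.001, 1 / 12)
print(round(p, 4))  # 0.0975
```

One draw knocks her reliability from .9 down to about .1, which is the "vacillation" in action: a belief installed by say-so is demolished by a single contrary observation.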
I think more of our beliefs are bead jar guesses than we realize, but because of assorted insidious psychological tendencies, we don't recognize that and we hold onto them tighter than baseless suppositions deserve.
I think this post could have been more formally worded. It draws a distinction between two types of probability assignment, but the only practical difference given is that you'd be surprised to be wrong in one case but not the other. My initial thought was that surprise is an irrational reaction that should be disregarded; there's no term for "how surprised I was" in Bayes' Theorem.
But let's rephrase the problem a bit. You've made your probability assignments based on Omega's question: say 1/12 for each color. Now consider another situation where you'd give an identical probability assignment. Say I'm going to roll a demonstrated-fair twelve-sided die, and ask you the probability that it lands on one. Again, you assign 1/12 probability to each possibility.
(Actually, these assignments are spectacularly wrong, since they give a zero probability to all other colors/numbers. Nothing deserves a zero probability. But let's assume you gave a negligible but nonzero probability to everything else, and 1/12 is just shorthand for "slightly less than 1/12, but not enough to bother specifying".)
So as far as everything goes, your probability assignments for the two cases look identical up to this point. Now let's say I offer you a bet: we'll go through both events (drawing a bead and putting it back, or rolling the die) a million times. If the observed frequency of red/one in that sample is within 1% of your estimated probability, I give you $1000. Otherwise, you give me $1000.
In the case of the die, we would all take the bet in a heartbeat. We're very sure that our figures are correct, since the die is demonstrated to be fair, and 1% is a lot of wiggle room for the law of large numbers. But you'd have to be crazy to take the same bet on the jar, despite having assigned a precisely identical chance of winning.
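The asymmetry is easy to see in simulation. This is a sketch under my own assumptions: I model the die as genuinely fair, and I stipulate (purely for illustration) that the jar happens to be 62% red, a composition your 1/12 guess knows nothing about.

```python
import random

# The bet: estimate 1/12, a million trials, win if the observed
# frequency lands within 0.01 of the estimate.
def run_bet(trial, estimate=1 / 12, n=1_000_000, tolerance=0.01):
    hits = sum(trial() for _ in range(n))
    return abs(hits / n - estimate) <= tolerance

random.seed(0)

# Demonstrated-fair d12: the law of large numbers pins the frequency of
# "one" within roughly 0.001 of 1/12, so 1% is enormous wiggle room.
print(run_bet(lambda: random.randrange(12) == 0))  # True: take this bet

# Bead jar: suppose (an assumption for illustration) the jar is 62% red.
# Your 1/12 guess misses by a mile, and no sample size saves it.
print(run_bet(lambda: random.random() < 0.62))     # False: you lose
```

The point-estimates were identical, but only in the die case does your estimate also describe the long-run frequency; in the jar case the long-run frequency is set by a composition you never learned.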
So what's the difference? Isn't all the information you care about supposed to be encapsulated in your probability distribution? What is the mathematical distinction between these two cases that causes such a clear difference in whether a given bet is rational? Are we supposed to not only assign probabilities to which events will occur, but also to our probabilities themselves, ad infinitum?
Not quite. The intuitive notion of "how surprised you were" maps closely to Bayesian likelihood ratios.
Regarding your die/beads scenarios:
In your die scenario, you have one highly favored model that assigns equal probability to each possible number. In the beads scenario you have many possible models, all with low probability; averaging their predictions gives equal probability to each possible color.
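Here is a toy rendering of that mixture-of-models picture. The specific model family is my own assumption, not the commenter's: twelve rival hypotheses of the form "every bead in the jar is color c," each with prior weight 1/12. Averaged together they predict each color at 1/12, exactly matching the fair die, yet they update violently on a single draw.

```python
from fractions import Fraction

# Twelve dogmatic models, "all beads are color c", each weighted 1/12.
# (An illustrative model family, not the commenter's actual setup.)
COLORS = ["red", "gray"] + [f"color{i}" for i in range(3, 13)]
priors = {c: Fraction(1, 12) for c in COLORS}

def predictive(weights, color):
    """Mixture forecast: each all-c model says P(c) = 1 and 0 otherwise,
    so the averaged prediction for `color` is just that model's weight."""
    return weights[color]

def update(weights, observed):
    """Bayes: likelihood is 1 for the matching model, 0 for the rest."""
    z = weights[observed]
    return {c: (w / z if c == observed else Fraction(0))
            for c, w in weights.items()}

# Before any draws the mixture looks exactly like a fair d12:
print(predictive(priors, "red"))     # 1/12

# One red bead, and the guess vacillates all the way to certainty:
posterior = update(priors, "red")
print(predictive(posterior, "red"))  # 1
```

The die's single favored model barely budges on a roll of one, while this mixture leaps from 1/12 to 1. (A less dogmatic mixture would also include mixed-composition models, in keeping with the earlier point that nothing deserves a zero probability, and would then update more gently.)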
To simplify things, let's say our only models are M, which predicts the out…