Suppose I tell you I have an infinite supply of unfair coins. I pick one at random, flip it once, and record the result; I've done this a total of 100 times, and every flip came out heads. I will pay you $1000 if the next flip is heads, and $10 if it's tails. Each unfair coin is otherwise an ordinary coin: its flips come up heads independently with some fixed but unknown probability p. This is all you know. How much would you pay to enter this game?
I suppose another way to phrase this question is "what is your best estimate of your expected winnings?", or, more generally, "how do you choose the maximum price you'll pay to play this game?"
Observe that the only information you have about the distribution from which I'm drawing my coins is those 100 outcomes. Importantly, you don't know how p is distributed across my supply of unfair coins. Can you reasonably assume a specific distribution in order to do the calculation, and claim that the estimate it yields is better than one based on any other assumed distribution?
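To make that concrete, here is a minimal sketch (Python) of what happens if you simply assume a Beta(a, b) prior on the per-flip probability of heads -- a modelling choice the problem itself does not supply. The break-even price moves with the choice:

```python
# Sketch: how the "fair price" depends on the prior you choose to assume
# for the per-flip probability of heads. Nothing in the problem tells you
# which prior is right; this only shows the answer moves with the choice.

def break_even_price(a, b, heads=100, tails=0, pay_heads=1000, pay_tails=10):
    """Expected payout under an assumed Beta(a, b) prior on the per-flip
    heads probability, after observing `heads` heads and `tails` tails."""
    p_next_head = (a + heads) / (a + b + heads + tails)
    return pay_heads * p_next_head + pay_tails * (1 - p_next_head)

for a, b, label in [(1, 1, "uniform prior (Laplace)"),
                    (0.5, 0.5, "Jeffreys prior"),
                    (1, 10, "prior sceptical of heads")]:
    print(f"{label:28s} -> break-even price ${break_even_price(a, b):.2f}")
```

With the uniform prior this is just Laplace's rule of succession -- P(next head) = 101/102, a break-even price of about $990 -- while the sceptical Beta(1, 10) prior brings the price down to roughly $911.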
Most importantly, can one actually produce a "theoretically sound" expectation here? That is, one calibrated so that if you pay your expected winnings every time and we repeat this experiment many times, your average net winnings will be zero -- assuming I'm using the same source of unfair coins each time.
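One way to see the problem is to invent a "true" source of coins and replay the game. The sketch below picks a source (each coin's p drawn from Beta(50, 1) -- my invention, not part of the problem), keeps only the runs that actually show 100 heads, and charges the uniform-prior price of about $990 each time. Against this particular source the uniform-prior price overpays by about $10 per game on average, and nothing in the 100 observed heads tells you that; whether a fixed price is calibrated depends on the source, which is exactly the information we don't have:

```python
# Calibration check against one hypothetical source of coins.
# The source (p ~ Beta(50, 1) per coin) is invented for illustration.

import random

random.seed(0)
PRICE = 990.29            # uniform-prior break-even price from the sketch above
runs, total_net = 0, 0.0

while runs < 1000:
    # one run of the game: 100 independent picks of a coin, one flip each
    flips = [random.random() < random.betavariate(50, 1) for _ in range(100)]
    if not all(flips):
        continue          # the stated evidence (100 heads) didn't occur; discard
    runs += 1
    next_head = random.random() < random.betavariate(50, 1)
    payout = 1000 if next_head else 10
    total_net += payout - PRICE

print(f"average net per game: ${total_net / runs:+.2f}")
```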
I suspect that the best one can do here is produce a range of values with confidence intervals. So you're 80% confident that the price you should pay to break even in the repeated game is between A80 and B80, 95% confident it's between A95 and B95, etc.
If this is really the best obtainable result, then what is a Bayesian to do with such a result to make their decision? Do you pick a price randomly from a specially crafted distribution, one that is 95% likely to produce a value between A95 and B95, etc.? Or is there a more "Bayesian" way?
It does seem to be widely accepted and largely undebated. However, it is also widely rejected and largely undebated, for example by Andrew Gelman, Cosma Shalizi, Ken Binmore, and Leonard Savage (to name just the people I happen to have seen rejecting it -- I am not a statistician, so I do not know how representative they are of the field in general, or whether there has actually been a substantial debate anywhere).

None of them except Ken Binmore actually presents arguments against it in the material I have read; the rest merely dismiss the idea of a universal prior as absurd. But in mathematics only one thing is absurd, a contradiction, and by that standard only Binmore has offered any mathematical arguments. He gives two in his book "Rational Decisions": one based on Gödel-style self-reference, and the other based on a formalisation of the concept of "knowing that" as the box operator of S5 modal logic. I haven't studied the first, but I am not convinced by the second, which fails at the outset by defining "I know that" as an extensional predicate. (He identifies a proposition P with the set of worlds in which it is true, and assumes that "I know that P" is a function of the set representing P, not of the syntactic form of P. Therefore, by that definition of knowing, since I know that 2+2=4, I know every true statement of mathematics, since they are all true in all possible worlds.)
(ETA: Binmore's S5 argument can also be found online here.)
(ETA2: For those who don't have a copy of "Rational Decisions" to hand, here's a lengthy and informative review of it.)
These people distinguish "small-world" Bayesianism from "large-world" Bayesianism, they themselves being small-worlders. Large-worlders would include Eliezer, Marcus Hutter, and everyone else who believes in the possibility of a universal prior.
A typical small-world Bayesian argument would be: I hypothesise that a certain variable has a Gaussian distribution with unknown parameters over which I have a prior distribution; I observe some samples; I obtain a posterior distribution for the parameters. A large-world Bayesian also makes arguments of this sort and they both make the same calculations.
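For concreteness, here is the kind of calculation both camps are happy to perform, sketched in Python with the extra simplifying assumption (mine, to keep the conjugate algebra short) that the variance is known and only the mean gets a prior:

```python
# Normal-Normal conjugate update: Gaussian data with known noise sd,
# Normal prior on the unknown mean, Normal posterior in closed form.

import numpy as np

def posterior_for_mean(data, sigma, mu0, tau0):
    """Posterior mean and sd of the unknown mean, given data with known
    noise sd `sigma` and a Normal(mu0, tau0**2) prior on the mean."""
    precision = 1 / tau0**2 + len(data) / sigma**2
    mu_post = (mu0 / tau0**2 + np.sum(data) / sigma**2) / precision
    return mu_post, np.sqrt(1 / precision)

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=50)   # here the world really is Gaussian
print(posterior_for_mean(data, sigma=1.0, mu0=0.0, tau0=10.0))
```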
Where they part company is when the variable in fact does not have a Gaussian distribution -- for example, when it is really drawn from a mixture of two widely separated Gaussians. According to small-worlders, the large-world Bayesian is stuck with his prior hypothesis of a single Gaussian, which no quantity of observations will force him to relinquish, since it is his prior. His estimate of the mean of the Gaussian will drift aimlessly up and down like the Flying Dutchman between the two modes of the real distribution, unable to see the world beyond his prior.

According to large-worlders, that prior was not the real prior one started from. The whole calculation was really conditional on the assumption of a Gaussian, and this assumption itself has a prior probability less than 1, having been chosen from a space of all possible hypothetical distributions. The small-worlders reply that this is absurd, declare victory, and walk away without listening to the large-worlders explain how to choose universal priors. Instead, small-worlders insist that to rectify the fault of having hypothesised the wrong model, one must engage in a completely different, non-Bayesian activity called model-checking. Chapter 6 of Gelman's book "Bayesian Data Analysis" is all about that, but I haven't read it. There is some material in this paper by Gelman and Shalizi.
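Here is a sketch of that failure mode, with numbers of my choosing: the data really come from a 50/50 mixture of Gaussians at -5 and +5, but the model stubbornly assumes a single Gaussian with known unit variance. With a balanced mixture and plenty of data, the posterior for "the" mean ends up concentrated near 0, a region containing almost none of the data:

```python
# Misspecified model: single Gaussian (known sd 1) fitted to bimodal data.

import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-5, 1, 500), rng.normal(+5, 1, 500)])

# Normal-Normal update as before, with prior Normal(0, 10**2) and "known" sd 1
sigma, mu0, tau0 = 1.0, 0.0, 10.0
precision = 1 / tau0**2 + len(data) / sigma**2
mu_post = (mu0 / tau0**2 + data.sum() / sigma**2) / precision
sd_post = np.sqrt(1 / precision)

print(f"posterior for the mean: {mu_post:.2f} +/- {sd_post:.2f}")
print(f"fraction of data within 1 of the posterior mean: "
      f"{np.mean(np.abs(data - mu_post) < 1):.3f}")
```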
(ETA: I have now read Gelman ch.6. Model-checking is performed by various means, such as (1) eyeballing visualisations of the real data and simulated data generated by the model, (2) comparing statistics evaluated for both real and simulated data, or (3) seeing if the model predicts things that conflict with whatever other knowledge you have of the phenomenon being studied.)
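Method (2) is easy to sketch: simulate replicated datasets from the fitted (wrong) single-Gaussian model and compare a simple test statistic -- here, the fraction of points within half a standard deviation of the mean -- between the real data and the replicates. The statistic and the plug-in fit (rather than drawing parameters from a posterior) are my simplifications:

```python
# Model check of the single-Gaussian model against bimodal data:
# compare a statistic on the real data with its value on data
# simulated from the fitted model (plug-in fit, for simplicity).

import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-5, 1, 500), rng.normal(+5, 1, 500)])

def middle_fraction(y):
    """Fraction of points within half a standard deviation of the mean."""
    return np.mean(np.abs(y - y.mean()) < 0.5 * y.std())

mu_hat, sigma_hat = data.mean(), data.std()      # plug-in single-Gaussian fit
replicates = [middle_fraction(rng.normal(mu_hat, sigma_hat, size=len(data)))
              for _ in range(1000)]

print(f"observed statistic:    {middle_fraction(data):.3f}")
print(f"replicated statistics: {np.mean(replicates):.3f} +/- {np.std(replicates):.3f}")
```

The real data put well under 1% of their points near the centre, while the Gaussian replicates put roughly 38% of theirs there, so this check flags the model immediately.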
And that's as far as I've read on the subject. Have the small-worlders ever responded to large-worlders' construction of universal priors? Have the large-worlders ever demonstrated that universal priors are more than a theoretical construction without practical application? Has "model checking" ever been analysed in large-world Bayesian terms?