Suppose I tell you I have an infinite supply of unfair coins. I pick one at random and flip it, recording the result. I've done this a total of 100 times and every flip came up heads. I will pay you $1000 if the next flip is heads, and $10 if it's tails. Each unfair coin is otherwise entirely normal: its flips follow a binomial distribution with an unknown p. This is all you know. How much would you pay to enter this game?
I suppose another way to phrase this question is "what is your best estimate of your expected winnings?", or, more generally, "how do you choose the maximum price you'll pay to play this game?"
Observe that the only thing you know about the distribution from which I'm drawing my coins is those 100 outcomes. Importantly, you don't know how each coin's p is distributed across my supply of unfair coins. Can you reasonably assume a specific distribution to make your calculation, and claim that it gives a better estimate than any other distribution would?
Most importantly, can one actually produce a "theoretically sound" expectation here? That is, one calibrated so that if you pay your expected winnings every time and we repeat this experiment many times, your average net winnings will be zero - assuming I'm using the same source of unfair coins each time.
I suspect that the best one can do here is produce a range of values with confidence intervals. So you're 80% confident that the price you should pay to break even in the repeated game is between A80 and B80, 95% confident it's between A95 and B95, etc.
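To make the A80..B80 idea concrete, here is a rough sketch rather than a definitive answer. It assumes (my assumption, not part of the question) that because every flip uses a freshly drawn coin, the flips behave like flips of a single coin whose heads probability q is the average p of the supply. It then uses the exact (Clopper-Pearson) two-sided interval, which for n heads out of n flips has lower bound (alpha/2)^(1/n) and upper bound 1. The function names are made up for illustration.

```python
def lower_bound_all_heads(n, confidence):
    """Two-sided Clopper-Pearson lower bound on q when all n flips were heads."""
    alpha = 1.0 - confidence
    # With n heads in n flips, the lower bound solves q**n = alpha / 2.
    return (alpha / 2.0) ** (1.0 / n)


def break_even_price(q, win=1000.0, lose=10.0):
    """Expected payout of one game if heads has probability q."""
    return q * win + (1.0 - q) * lose


def price_range(n, confidence):
    """Break-even price range (A, B) implied by the interval on q."""
    a = break_even_price(lower_bound_all_heads(n, confidence))
    b = break_even_price(1.0)  # the upper bound on q is 1 when every flip was heads
    return a, b


for conf in (0.80, 0.95):
    a, b = price_range(100, conf)
    print(f"{conf:.0%} interval: pay between {a:.2f} and {b:.2f}")
```

Under these assumptions the 80% range starts at roughly $977 and the 95% range at roughly $964, both topping out at $1000. This only produces the ranges; it does not say which price inside a range to pay.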
If this is really the best obtainable result, then what is a Bayesian to do with it when making their decision? Do you pick a price randomly from a specially crafted distribution that is 95% likely to produce a value between A95 and B95, and so on? Or is there a more "Bayesian" way?
My take at it is basically this: average over all possible distributions until you have further evidence. (Preferably, let other people play the game first to gather the evidence at no cost to myself.)
If someone tells me a coin has an unknown binomial distribution, and we really genuinely don’t know anything about this distribution (not even the distribution of possible distributions), I take the set of all possible distributions and assume they are all equally likely. Since they are symmetric, the average is a 50:50 fair coin.
In your example, you throw not just one coin, but a different one each time. Put differently, the sequence of coins is a random variable whose distribution we don’t know. I therefore begin by assuming it is the average over all possible distributions. Within that average distribution, the distribution of first coins is symmetric, so its average is 50:50, and for the first coin toss I’ll assume 50:50 odds.
Say the first coin toss now comes out tails. This provides me with a slight piece of evidence that your set of unfair coins may be biased towards containing more tail-preferring coins than head-preferring coins. Therefore, I will assume that the second coin is more likely to be a tail-preferring coin than a head-preferring one. I’d have to do the maths to find out exactly how much my ante would change.
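For what it's worth, the maths is short if you are willing to make one specific choice: treat the overall chance of heads as a single unknown number with a uniform prior on [0, 1]. That choice is an assumption of this sketch, not something the original problem gives you. Under it, the posterior predictive is Laplace's rule of succession, (heads + 1) / (heads + tails + 2). The helper names below are hypothetical.

```python
from fractions import Fraction


def prob_next_heads(heads, tails):
    """Rule of succession: P(next flip is heads) under a uniform prior."""
    return Fraction(heads + 1, heads + tails + 2)


def expected_payout(p_heads, win=1000, lose=10):
    """Expected payout of the game: $1000 on heads, $10 on tails."""
    return p_heads * win + (1 - p_heads) * lose


# Before any toss, after a single tail, and after the 100 heads in the question.
for heads, tails in [(0, 0), (0, 1), (100, 0)]:
    p = prob_next_heads(heads, tails)
    print(heads, tails, p, float(expected_payout(p)))
```

So under the uniform prior a single tail moves the estimate from 1/2 to 1/3, and the fair price from $505 to $340. After 100 heads and no tails the estimate is 101/102, for a fair price of about $990.29. A different prior would give different numbers.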
With each new coin toss, I learn more about the likelihood of each distribution of coins within the set of all possible distributions.
I suspect that the level of indirection can actually be safely removed and we can regard the whole thing as a single random binary digit generator with a single distribution, about which we initially don’t know anything and gradually build up evidence. I further suspect that if the coins’ biases are uniformly distributed, and therefore symmetric, the sequence of coin tosses is equivalent to that of a single, fair coin. Once again, someone would have to do the maths to prove whether this is actually equivalent.
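A quick simulation can at least sanity-check this. It is not a proof. The sketch assumes each coin is drawn independently from a fixed distribution over p and flipped once. In that case the chance of heads on any flip is just the mean of p, so the heads frequency should match a single coin with that bias. The Beta(2, 5) supply is an arbitrary example chosen for illustration, and `flip_fresh_coins` is a made-up helper.

```python
import random


def flip_fresh_coins(draw_p, n, rng):
    """Flip n coins, each freshly drawn via draw_p(rng); return the heads fraction."""
    heads = 0
    for _ in range(n):
        p = draw_p(rng)          # bias of the coin picked for this flip
        if rng.random() < p:     # flip that coin once
            heads += 1
    return heads / n


rng = random.Random(0)  # fixed seed so the run is repeatable

# Uniformly distributed biases: mean p is 0.5, so this should look like a fair coin.
uniform_freq = flip_fresh_coins(lambda r: r.random(), 200_000, rng)

# A lopsided supply, Beta(2, 5): mean p is 2/7, about 0.286.
skewed_freq = flip_fresh_coins(lambda r: r.betavariate(2, 5), 200_000, rng)

print(uniform_freq, skewed_freq)
```

The frequencies only confirm the per-flip probability. Independence between flips comes from the assumption that coins are drawn independently, and would fail if the same coin were reused.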
For the record, this is not permitted.
It's easy to say this but I don't think this works when you start doing the maths to get actual numbers out. Additionally, if you really take ALL possible distributions then you're already in trouble, because some of them are pretty weird - e.g. the Cauchy distribution doesn't have a mean or a variance.