The description of the coin flips having a Binomial(n=?,p) distribution, instead of a Bernoulli(p) distribution, might be a cause of the mis-reading.
Perhaps so - obviously each coin is flipped just once, i.e. Binomial(n=1, p), which is the same thing as Bernoulli(p). I was trying to point out that for any other n it would behave just like an ordinary coin with bias p if someone kept flipping it.
I'd love to know if there are established formal approaches to this.
You should probably look at Jaynes's book "Probability Theory: The Logic of Science". In particular, I think that the discussion there dealing with the Widget Problem and with Laplace's Rule of Succession may be relevant to your question.
And just as it gets really interesting, that chapter ends. There is no solution provided for stage 4 :/
I think the key point here is that there is uncertainty in both the distribution of coins and the outcome of any particular coin flip. The Bayesian answer is that you roll these two sources of uncertainty into a single uncertainty over the outcome of the next coin flip.
If this were a real world situation then your evidence would (ideally) include such unwieldy things as facts about your possible motivations, all the real-world facts about the physics of coin flips, and so on. Bayesianism tells us that there is a unique answer in the form of a probability for the next coin to be heads, but obviously the math to figure that out is enormously complicated.
If you don't want to deal with all this then you can pick some nicer mathematical starting point, like assuming you have a uniform distribution over how biased the coins are, or a beta distribution, or some such. In that case, proceed as follows:
Let h be a hypothesis about the distribution the unfair coins follow.
Write out P(h) according to the assumptions you made. A reasonable choice is the maximum entropy distribution (which in this case is uniform over the bias parameter p).
Let D be the data that you observed (100 heads). Write out P(D | h) - in this problem it is a straightforward binomial likelihood.
Write out P(h | D) - give a hurrah for Bayes' rule at this point.
Let T be the event of getting tails on the next flip. Write P(T | D) by marginalising over the hypotheses: P(T | D) = ∫ P(T, h | D) dh = ∫ P(T | h) P(h | D) dh.
That's it!
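For concreteness, here's a minimal numerical sketch of those steps, assuming a uniform prior over the coin bias p and the 100-heads observation (the stakes of the bet aren't modelled, just the predictive probability):

```python
import numpy as np

# Hypotheses: a fine grid over the coin bias p (the "h" above).
p = np.linspace(0.0, 1.0, 10001)

# Prior P(h): uniform over [0, 1], represented as grid weights.
prior = np.ones_like(p)
prior /= prior.sum()

# Likelihood P(D | h) for D = 100 heads out of 100 flips.
likelihood = p ** 100

# Posterior P(h | D) via Bayes' rule.
posterior = prior * likelihood
posterior /= posterior.sum()

# P(tails next | D) by marginalising over the hypotheses.
p_tails = ((1.0 - p) * posterior).sum()
print(p_tails)  # close to 1/102, as Laplace's rule of succession predicts
```

The analytic answer under this prior is exactly 1/102: the posterior is Beta(101, 1) and its predictive probability of heads is 101/102.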
Bayesianism tells us that there is a unique answer in the form of a probability for the next coin to be heads
I'm obviously new to this whole thing, but is this a largely undebated, widely accepted view on probabilities? That there are NO situations in which you can't meaningfully state a probability?
For example, let's say we have observed 100 samples of a real-valued random variable. We can use the maximum entropy principle and thus use the normal distribution (which is maximum-entropy over the reals for a given mean and variance). We then use standard methods to estimate the population mean, and can even provide a probability that it's in a certain interval.
But how valid is this result when we knew nothing of the original distribution? What if it was something awkward like the Cauchy distribution? It has no mean, so our interval is meaningless. You can't just say "well, we're 60% certain it's in this interval, which leaves a 40% chance of us being wrong" - because it doesn't: the mean isn't outside the interval either; it simply doesn't exist. A complete answer would allow for a third outcome, that the mean isn't defined, but how exactly do you assign a number to this probability?
With this in mind, do we still believe that it's not wrong (or less wrong? :D) to assume a normal distribution, make our calculations and decide how much we'd bet that the mean of the next 100,000 samples is in the range -100..100? (The sample means of a Cauchy distribution fail to converge no matter how many samples you add.)
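A quick simulation of that parenthetical, assuming the standard Cauchy: the mean of n Cauchy samples is itself Cauchy-distributed with the same scale, so averaging more samples never narrows the spread. (The seed and sample counts below are arbitrary choices.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Spread (interquartile range) of the sample mean as n grows.
# For a normal distribution this would shrink like 1/sqrt(n);
# for the Cauchy it stays put, because the mean of n standard
# Cauchy samples is itself standard Cauchy.
iqrs = {}
for n in (10, 100, 10000):
    means = rng.standard_cauchy((1000, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    iqrs[n] = q75 - q25
    print(n, iqrs[n])  # hovers around 2, the standard Cauchy's IQR
```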
For any finite amount of data you won't perfectly break even using a Bayesian method, but it's better than all the alternatives, as long as you don't leave out any data.
What!? Are you saying that you can predict in advance that you'll lose money? Surely that can't happen, because you get to choose how much you want to pay, so you can always just pay less. No?
I read this to say that you can't calculate a value that is guaranteed to break even in the long term, because there isn't enough information to do this. (which I tend to agree with)
Would you have tried to bet us anyways if you had not landed heads 100 times?
If I were trying to make a profit then I'd need to know how much to charge for entry. If I could calculate that then yes, I'd offer the bet regardless of how many heads came out of 100 trials.
But this is entirely beside the point; the purpose of this thought experiment is for me to show which parts of bayesianism I don't understand and solicit some feedback on those parts.
In particular, a procedure that I could use to actually pick a break-even price of entry would be very helpful.
I.e. one that is calibrated so that if you pay your expected winnings every time and we perform this experiment lots of times then your average winnings will be zero - assuming I'm using the same source of unfair coins each time.
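One hedged version of such a procedure: suppose, hypothetically, the bet pays $W if the flip comes up heads and nothing otherwise (the actual stakes aren't specified in the thread), and adopt a uniform prior over the pool's heads-probability. Then the break-even entry price is just W times the posterior predictive probability of heads:

```python
from fractions import Fraction

# Uniform (Beta(1, 1)) prior over the pool's heads-probability p.
# After observing h heads and t tails, the posterior is Beta(1 + h, 1 + t),
# and the predictive probability of heads is (1 + h) / (2 + h + t)
# (Laplace's rule of succession).
def break_even_price(heads, tails, payoff_on_heads):
    p_heads = Fraction(1 + heads, 2 + heads + tails)
    return payoff_on_heads * p_heads

# Hypothetical stakes: $20 on heads, $0 on tails.
print(break_even_price(100, 0, 20))  # 20 * 101/102, just under $19.81
```

The uniform prior and the payoff amounts are assumptions; a different prior over the pool gives a different price, which is exactly the crux of the disagreement above.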
This may be tangential, but how do you run this experiment lots of times? If you just abort the runs where you get any tails among the initial 100 throws, then I think seeing 100 heads in a row doesn't mean anything much.
You take the evidence, and you decide that you pay X. Then we run it lots of times. You pay X, I pick a random coin and flip it. I pay your winnings. You pay X again, I pick again, etc. X is fixed.
My take on it is basically this: average over all possible distributions until you have further evidence. (Preferably, let other people play the game first to gather the evidence at no cost to myself.)
If someone tells me a coin has an unknown bias, and we really genuinely don't know anything about that bias (not even a distribution over possible biases), I take the set of all possible biases and assume they are all equally likely. Since that set is symmetric between heads and tails, the average is a 50:50 fair coin.
In your example, you throw not just one coin, but a different one each time. Differently put, the sequence of coins is a random variable whose distribution we don’t know. I therefore begin by assuming it is the average over all possible distributions, and within that average distribution, the distribution of first coins is symmetric, so its average is 50:50, so for the first coin toss I’ll assume that it’s a 50:50.
Say the first coin toss now comes out tails. This provides me with a slight piece of evidence that your set of unfair coins may be biased towards containing more tail-preferring coins than head-preferring coins. Therefore, I will assume that the second coin is more likely to be a tail-preferring coin than a head-preferring one. I’d have to do the maths to find out exactly how much my ante would change.
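That maths is short if we model the whole draw-a-coin-then-flip process as a single unknown heads-probability p with a uniform prior (one hedged way to make "all distributions equally likely" precise): after one tails the posterior is Beta(1, 2), and the predicted probability of heads on the next draw drops from 1/2 to 1/3.

```python
from fractions import Fraction

# Model the draw-then-flip process as one unknown heads-probability p
# with a uniform Beta(1, 1) prior.
def p_heads_next(heads_seen, tails_seen):
    # Posterior is Beta(1 + heads, 1 + tails); its mean is the
    # predictive probability of heads (Laplace's rule of succession).
    return Fraction(1 + heads_seen, 2 + heads_seen + tails_seen)

print(p_heads_next(0, 0))  # 1/2 before any evidence
print(p_heads_next(0, 1))  # 1/3 after a single tails
```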
With each new coin toss, I learn more about the likelihood of each distribution of coins within the set of all possible distributions.
I suspect that the level of indirection can actually be safely removed and we can regard the whole thing as a single random binary digit generator with a single distribution, about which we initially don’t know anything and gradually build up evidence. I further suspect that if the distribution of coins is uniformly distributed, and therefore symmetric, the sequence of coin tosses is equivalent to a single, fair coin. Once again, someone would have to do the maths to prove whether this is actually equivalent.
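Doing that maths shows the suspicion is only half right, at least under the simplifying assumption that the pool's coins share a common unknown bias p, uniformly distributed: each individual toss is 50:50, but the sequence is not equivalent to a fair coin, because the tosses are correlated. For example, P(second toss heads | first toss heads) is 2/3, not 1/2. A small simulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each run: draw a bias p uniformly, then flip two coins with that bias.
p = rng.random(200_000)
first = rng.random(200_000) < p
second = rng.random(200_000) < p

print(first.mean())          # each toss alone is ~0.5, like a fair coin
print(second[first].mean())  # but P(H2 | H1) is ~2/3, unlike a fair coin
```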
Preferably, let other people play the game first to gather the evidence at no cost to myself.
For the record, this is not permitted.
My take on it is basically this: average over all possible distributions
It's easy to say this but I don't think this works when you start doing the maths to get actual numbers out. Additionally, if you really take ALL possible distributions then you're already in trouble, because some of them are pretty weird - e.g. the Cauchy distribution doesn't have a mean or a variance.
distribution about which we initially don’t know anything and gradually build up evidence
I'd love to know if there are established formal approaches to this. The only parts of statistics that I'm familiar with assume known distributions and work from there. Anyone?
You still decide what coins are in the infinite pool, right?
The properties of the pool are unknown to you, so you have to take into account the possibility that I've tuned them somehow. But you do know that the 100 coins I drew from that pool were drawn fairly and randomly.
I figure that people usually offer bets to make money, so if I assume that you are a selfish actor, your coins will come up tails practically all the time -- you have no incentive that I can think of to ever make them come out heads. Expected payout is $10 each time, so I will pay a maximum of $10 to play the game each time.
Or is your intent for this to be an "Omega tells you..." thought-experiment?
I have clarified my post to specify that for each flip, I pick a coin from this infinite pool at random. Suppose you also magically know with absolute certainty that these givens are true. Still $10?
Actually, yes, but you're right to be surprised, because it's (to my mind at least) an incredible result. Cox's theorem establishes this as a mathematical result from the assumption that you want to reason quantitatively and consistently. Jaynes gives a great explanation of this in chapters 1 and 2 of his book "Probability Theory: The Logic of Science".
The short answer is that a probability always reflects your current state of knowledge. If I told you absolutely nothing about the coin or the distribution, then you would be entirely justified in assigning 50% probability to heads (on the basis of symmetry). If I told you the exact distribution over p then you would be justified in assigning a different probability to heads. But in both cases I carried out the same experiment -- it's just that you had different information in the two trials. You are justified in assigning different probabilities because Probability is in the mind. The knowledge you have about the distribution over p is just one more piece of information to roll into your probability.
That depends on the probability that the coin flipper chooses a Cauchy distribution. If this were a real experiment then you'd have to take into account unwieldy facts about human psychology, the physics of coin flips, and so on. Cox's theorem tells us that in this case there is a unique answer in the form of a probability, but it doesn't guarantee that we have the time, resources, or inclination to actually calculate it. If you want to avoid all those kinds of complicated facts then you can start from some reasonable mathematical assumption, such as that the samples are normally distributed - but if your assumptions are wrong then don't be surprised when your conclusions turn out wrong.
Thanks for this, it really helped.
Here's how I understand this point, that finally made things clearer:
Yes, there exists a more accurate answer, and we might even be able to discover it by investing some time. But until we do, the fact that such an answer exists is completely irrelevant. It is orthogonal to the problem.
In other words, doing the calculations would give us more information to base our prediction on, but knowing that we can do the calculation doesn't change it in the slightest.
Thus, we are justified in treating this as "don't know at all", even though it seems that we do know something.
Great read, and I think things have finally fit into the right places in my head. Now I just need to learn to guesstimate what the maximum entropy distribution might look like for a given set of facts :)
Well, that and how to actually churn out confidence intervals and expected values for experiments like this one, so that I know how much to bet given a particular set of knowledge.