This article is the first in a sequence that will consider situations where probability estimates are not, by themselves, adequate to make rational decisions. This one introduces a "meta-probability" approach, borrowed from E. T. Jaynes, and uses it to analyze a gambling problem. For this problem, reasonably straightforward decision-theoretic methods suffice. Later articles introduce increasingly problematic cases.
A surprising decision anomaly
Let’s say I’ve recruited you as a subject in my thought experiment. I show you three cubical plastic boxes, about eight inches on a side. There are two green ones, identical as far as you can see, and a brown one. I explain that they are gambling machines: each has a faceplate with a slot that accepts a dollar coin, and an output slot that will return either two or zero dollars.
I unscrew the faceplates to show you the mechanisms inside. They are quite simple. When you put a coin in, a wheel spins. It has a hundred holes around the rim. Each can be blocked, or not, with a teeny rubber plug. When the wheel slows to a halt, a sensor checks the nearest hole, and dispenses either zero or two coins.
The brown box has 45 holes open, so it has probability p=0.45 of returning two coins. One green box has 90 holes open (p=0.9) and the other has none (p=0). I let you experiment with the boxes until you are satisfied these probabilities are accurate (or very nearly so).
Then, I screw the faceplates back on, and put all the boxes in a black cloth sack with an elastic closure. I squidge the sack around, to mix up the boxes inside, and you reach in and pull one out at random.
I give you a hundred one-dollar coins. You can put as many into the box as you like. You can keep as many coins as you don’t gamble, plus whatever comes out of the box.
If you pulled out the brown box, there’s a 45% chance of getting $2 back, and the expected value of putting a dollar in is $0.90. Rationally, you should keep the hundred coins I gave you, and not gamble.
If you pulled out a green box, there’s a 50% chance that it’s the one that pays two dollars 90% of the time, and a 50% chance that it’s the one that never pays out. So, overall, there’s a 45% chance of getting $2 back.
Still, rationally, you should put some coins in the box. If it pays out at least once, you should gamble all the coins I gave you, because you know that you got the 90% box, and you’ll nearly double your money.
If you get nothing out after a few tries, you’ve probably got the never-pay box, and you should hold onto the rest of your money. (Exercise for readers: how many no-payouts in a row should you accept before quitting?)
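A minimal sketch of the Bayesian bookkeeping behind that exercise, assuming a 50/50 prior over the two green boxes and independent spins. The function names are my own, for illustration only. It reports how likely you are to be holding the 90% box after a run of no-payouts, and the chance that the next coin pays; it deliberately does not settle the stopping rule.

```python
from fractions import Fraction

def posterior_good_box(n_failures, prior_good=Fraction(1, 2), p_payout=Fraction(9, 10)):
    """P(you hold the 90% box | n_failures no-payouts in a row, no payouts yet)."""
    likelihood_good = (1 - p_payout) ** n_failures  # the 90% box fails 10% of the time
    likelihood_bad = Fraction(1)                    # the never-pay box always fails
    numerator = prior_good * likelihood_good
    return numerator / (numerator + (1 - prior_good) * likelihood_bad)

def next_payout_probability(n_failures, p_payout=Fraction(9, 10)):
    """Chance that the next coin pays, averaged over which green box you hold."""
    return posterior_good_box(n_failures) * p_payout

for n in range(4):
    print(n, posterior_good_box(n), next_payout_probability(n))
```

Under these assumptions the posterior works out to 1/(10^n + 1): one half before any spin, 1/11 after one no-payout, 1/101 after two. Whether another coin is still worth spending also depends on how many coins you have left to exploit a payout, which is the interesting part of the exercise.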
What’s interesting is that, when you have to decide whether or not to gamble your first coin, the probability is exactly the same in the two cases (p=0.45 of a $2 payout). However, the rational course of action is different. What’s up with that?
Here, a single probability value fails to capture everything you know about an uncertain event. And, it’s a case in which that failure matters.
Such limitations have been recognized almost since the beginning of probability theory. Dozens of solutions have been proposed. In the rest of this article, I’ll explore one. In subsequent articles, I’ll look at the problem more generally.
Meta-probability
To think about the green box, we have to reason about the probabilities of probabilities. We could call this meta-probability, although that’s not a standard term. Let’s develop a method for it.
Pull a penny out of your pocket. If you flip it, what’s the probability it will come up heads? 0.5. Are you sure? Pretty darn sure.
What’s the probability that my local junior high school sportsball team will win its next game? I haven’t a ghost of a clue. I don’t know anything even about professional sportsball, and certainly nothing about “my” team. In a match between two teams, I’d have to say the probability is 0.5.
My girlfriend asked me today: “Do you think Raley’s will have dolmades?” Raley’s is our local supermarket. “I don’t know,” I said. “I guess it’s about 50/50.” But unlike sportsball, I know something about supermarkets. A fancy Whole Foods is very likely to have dolmades; a 7-11 almost certainly won’t; Raley’s is somewhere in between.
How can we model these three cases? One way is by assigning probabilities to each possible probability between 0 and 1. In the case of a coin flip, 0.5 is much more probable than any other probability:
We can’t be absolutely sure the probability is 0.5. In fact, it’s almost certainly not exactly that, because coins aren’t perfectly symmetrical. And, there’s a very small probability that you’ve been given a tricky penny that comes up tails only 10% of the time. So I’ve illustrated this with a tight Gaussian centered around 0.5.
In the sportsball case, I have no clue what the odds are. They might be anything between 0 and 1:
In the Raley’s case, I have some knowledge, and extremely high and extremely low probabilities seem unlikely. So the curve looks something like this:
Each of these curves averages to a probability of 0.5, but they express different degrees of confidence in that probability.
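Here is a rough numerical sketch of that claim. The shapes are stand-ins: a tight Gaussian for the coin, a flat curve for sportsball, and a broad Gaussian bump for Raley’s. The widths (0.01 and 0.15), the grid resolution, and all the names are assumptions chosen only for illustration.

```python
import math

GRID = [i / 1000 for i in range(1001)]  # candidate probabilities 0.000 .. 1.000

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_bump(center, width):
    """Gaussian-shaped curve over GRID, truncated to [0, 1] and renormalized."""
    return normalize([math.exp(-0.5 * ((p - center) / width) ** 2) for p in GRID])

def mean(dist):
    return sum(p * w for p, w in zip(GRID, dist))

def spread(dist):
    """Standard deviation of the curve: a crude measure of (lack of) confidence."""
    m = mean(dist)
    return math.sqrt(sum(w * (p - m) ** 2 for p, w in zip(GRID, dist)))

curves = {
    "coin": gaussian_bump(0.5, 0.01),             # assumed width
    "sportsball": normalize([1.0] * len(GRID)),   # flat: no clue at all
    "raleys": gaussian_bump(0.5, 0.15),           # assumed width
}

for name, dist in curves.items():
    print(f"{name:10s} mean={mean(dist):.3f} spread={spread(dist):.3f}")
```

All three means come out at 0.5, while the spreads differ: smallest for the coin, largest for sportsball.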
Now let’s consider the gambling machines in my thought experiment. The brown box has a curve like this:
Whereas, when you’ve chosen one of the two green boxes at random, the curve looks like this:
Both these curves give an average probability of 0.45. However, a rational decision theory has to distinguish between them. Your optimal strategy in the two cases is quite different.
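To make the difference concrete, here is a small sketch that represents each curve as a discrete distribution over payout probabilities and applies Bayes’ rule after a single spin. The dictionary representation and the function names are my own illustrative choices.

```python
# Each curve maps a candidate payout probability to how strongly we believe in it.
brown = {0.45: 1.0}
green = {0.0: 0.5, 0.9: 0.5}

def mean_probability(dist):
    """Average payout probability implied by a meta-probability curve."""
    return sum(p * w for p, w in dist.items())

def update(dist, paid_out):
    """Bayes' rule: reweight each candidate by how well it predicted the spin."""
    posterior = {p: w * (p if paid_out else 1 - p) for p, w in dist.items()}
    total = sum(posterior.values())
    return {p: w / total for p, w in posterior.items()}

print("before any spin:", mean_probability(brown), mean_probability(green))
for paid_out in (True, False):
    print("paid out" if paid_out else "no payout",
          "-> brown:", round(mean_probability(update(brown, paid_out)), 4),
          "green:", round(mean_probability(update(green, paid_out)), 4))
```

Both curves start with a mean of 0.45. One spin leaves the brown curve where it was, but moves the green mean to 0.9 after a payout and to about 0.08 after a no-payout. That sensitivity to evidence is exactly what the single number 0.45 leaves out.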
With this framework, we can consider another box—a blue one. It has a fixed payout probability somewhere between 0 and 0.9. I put a random number of plugs in the holes in the spinning disk—leaving between 0 and 90 holes open. I used a noise diode to choose; but you don’t get to see what the odds are. Here the probability-of-probability curve looks rather like this:
This isn’t quite right, because 0.23 and 0.24 are much more likely than 0.235—the plot should look like a comb—but for strategy choice the difference doesn’t matter.
What is your optimal strategy in this case?
As with the green box, you ought to spend some coins gathering information about what the odds are. If your estimate of the probability is less than 0.5, when you get confident enough in that estimate, you should stop. If you’re confident enough that it’s more than 0.5, you should continue gambling.
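The information-gathering step might be sketched as follows, assuming the noise diode made every count from 0 to 90 open holes equally likely. That uniform prior, and the function names, are my assumptions. This is only the belief-updating half of a strategy, not the optimal algorithm.

```python
def blue_box_posterior(wins, losses):
    """Posterior over the payout probability after some payouts and no-payouts.

    Assumes a uniform prior over 0..90 open holes out of 100.
    """
    candidates = [k / 100 for k in range(91)]
    weights = [p ** wins * (1 - p) ** losses for p in candidates]
    total = sum(weights)
    return {p: w / total for p, w in zip(candidates, weights)}

def posterior_mean(posterior):
    return sum(p * w for p, w in posterior.items())

def prob_favourable(posterior):
    """Probability that the box pays more than half the time."""
    return sum(w for p, w in posterior.items() if p > 0.5)

for wins, losses in [(0, 0), (2, 8), (8, 2)]:
    post = blue_box_posterior(wins, losses)
    print(wins, losses, round(posterior_mean(post), 3), round(prob_favourable(post), 3))
```

Before any spin, the mean under this prior is 0.45, the same as for the brown and green boxes. A stopping rule would then compare `prob_favourable` (or the posterior mean) against some confidence threshold; choosing that threshold well is the hard part.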
If you enjoy this sort of thing, you might like to work out what the exact optimal algorithm is.
In the next article in this sequence, we’ll look at some more complicated and interesting cases.
Further reading
The “meta-probability” approach I’ve taken here is the Ap distribution of E. T. Jaynes. I find it highly intuitive, but it seems to have had almost no influence or application in practice. We’ll see later that it has some problems, which might explain this.
The green and blue boxes are related to “multi-armed bandit problems.” A “one-armed bandit” is a casino slot machine, which has defined odds of payout. A multi-armed bandit is a hypothetical generalization with several arms, each of which may have different, unknown odds. In general, you ought to pull each arm several times, to gain information. The question is: what is the optimal algorithm for deciding which arms to pull how many times, given the payments you have received so far?
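For a taste of what such algorithms look like, here is a sketch of Thompson sampling, one widely used heuristic for bandit problems. I am not presenting it as the optimal algorithm the question asks for. The arm probabilities, the uniform Beta(1, 1) priors, and the function name are all made up for illustration.

```python
import random

def thompson_sampling(true_payout_probs, n_pulls, seed=0):
    """Simulate Thompson sampling; return how many times each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(true_payout_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(n_pulls):
        # Draw one plausible payout probability per arm from its Beta posterior.
        samples = [rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)]
        # Pull whichever arm looks best under this particular draw.
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_payout_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [w + l for w, l in zip(wins, losses)]

# Hypothetical three-armed machine; the simulator knows the odds, the player does not.
print(thompson_sampling([0.2, 0.5, 0.8], n_pulls=2000))
```

Early on, the draws are spread widely, so every arm gets tried. As evidence accumulates, the posteriors narrow and the pulls concentrate on the arm that appears to pay best.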
If you read the Wikipedia article and follow some links, you’ll find the concepts you need to find the optimal green and blue box strategies. But it might be more fun to try on your own first! The green box is simple. The blue box is harder, but the same general approach applies.
Wikipedia also has an accidental list of formal approaches for problems where ordinary probability theory fails. This is far from complete, but a good starting point for a browser tab explosion.
Acknowledgements
Thanks to Rin’dzin Pamo, St. Rev., Matt_Simpson, Kaj_Sotala, and Vaniver for helpful comments on drafts. Of course, they may disagree with my analyses, and aren’t responsible for my mistakes!
The exposition of meta-probability is well done, and shows an interesting way of examining and evaluating scenarios. However, I would take issue with the first section of this article in which you establish single probability (expected utility) calculations as insufficient for the problem, and present meta-probability as the solution.
In particular, you say
I do not believe that this is a failure of applying a single probability to the situation, but merely calculating the probability wrongly, by ignoring future effects of your choice. I think this is most clearly illustrated by scaling the problem down to the case where you are handed a green box, and only two coins. In this simplified problem, we can clearly examine all possible strategies.
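A sketch of that enumeration, under one simplifying assumption of mine: only the two original coins may be inserted, and winnings are kept rather than gambled again. The function name and the encoding of a strategy as three yes/no choices are illustrative only.

```python
from fractions import Fraction

P_GOOD = Fraction(1, 2)   # chance the green box is the 90% one
P_PAY = Fraction(9, 10)   # payout probability of the 90% box

def expected_coins(play_first, second_after_win, second_after_loss):
    """Expected final coin count, starting with two coins and a random green box."""
    if not play_first:
        return Fraction(2)  # keep both coins
    total = Fraction(0)
    for p_box, p_pay in ((P_GOOD, P_PAY), (1 - P_GOOD, Fraction(0))):
        for first_paid, p_first in ((True, p_pay), (False, 1 - p_pay)):
            play_second = second_after_win if first_paid else second_after_loss
            coins = Fraction(2) if first_paid else Fraction(0)  # return on first coin
            # Second coin: either its expected return from this box, or the coin itself.
            coins += 2 * p_pay if play_second else 1
            total += p_box * p_first * coins
    return total

strategies = {
    "keep both coins": (False, False, False),
    "play one, keep the other": (True, False, False),
    "play second only after a win": (True, True, False),
    "play second only after a loss": (True, False, True),
    "play both regardless": (True, True, True),
}
for name, choice in strategies.items():
    print(f"{name:30s} {float(expected_coins(*choice)):.2f}")
```

With these assumptions, keeping both coins is worth exactly 2, while playing the second coin only after a win is worth 2.26. Every other strategy that spends the first coin comes out below 2.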
When put in these terms, it seems quite obvious that your choice to open the box would depend on more than the expected payoff from only the first box, because quite clearly your choice to open the first box pays off (or doesn't pay off) when opening (or not opening) the other boxes as well. This seems like an error in calculating the payoff matrix rather than a flaw with the technique of single probability values itself. It ignores the fact that opening the first box not only pays you off immediately, but also pays you off in the future by giving you information about the other boxes.
This problem easily succumbs to standard expected value calculations if all actions are considered. The steps remain the same as always:
In the case of two coins, we were able to trivially calculate the outcomes of all possible strategies, but in larger instances of the problem, it might be advisable to use shortcuts in the calculations. Even so, the best choice remains the one you would have arrived at by carrying out the full expected value calculation.
I think the confusion arises because a lot of the time problems are presented in a way that screens them off from the rest of the world. For example, you are given a box, and it either has $10.00 or $100.00. Once you open the box, the only effect it has on you is the amount of money you got. After you get the money, the box does not matter to the rest of the world. Problems are presented this way so that it is easy to factor out the decisions and calculations you have to make from every other decision you have to make. However, decisions are not necessarily this way (in fact, in real life, very few decisions are). In the choice of inserting the first coin or not, this is simply not the case, despite having superficial similarities to standard "box" problems.
Although you clearly understand that the payoffs from the boxes are entangled, you only apply this knowledge in your informal approach to the problem. The failure to consider the full effects of your actions in opening the first box may be psychologically encouraged by the technique of "single probability calculations", but it is certainly not a failure of the technique itself to capture such situations.
A single probability cannot sum up our knowledge.
Before we talk about plans, as you went on to, we must talk about the world as it stands. We know there is a 50% chance of a 0% machine and a 50% chance of a 90% machine. Saying 45% does not encode this information. No other number does either.
Scalar probabilities of binary outcomes are such a useful hammer that we need to stop and remember sometimes that not all uncertainties are nails.