This article is the first in a sequence that considers situations where probability estimates are not, by themselves, adequate for making rational decisions. It introduces a "meta-probability" approach, borrowed from E. T. Jaynes, and uses it to analyze a gambling problem. That problem is one in which reasonably straightforward decision-theoretic methods suffice. Later articles introduce increasingly problematic cases.
A surprising decision anomaly
Let’s say I’ve recruited you as a subject in my thought experiment. I show you three cubical plastic boxes, about eight inches on a side. There are two green ones—identical as far as you can see—and a brown one. I explain that they are gambling machines: each has a faceplate with a slot that accepts a dollar coin, and an output slot that will return either two or zero dollars.
I unscrew the faceplates to show you the mechanisms inside. They are quite simple. When you put a coin in, a wheel spins. It has a hundred holes around the rim. Each can be blocked, or not, with a teeny rubber plug. When the wheel slows to a halt, a sensor checks the nearest hole, and dispenses either zero or two coins.
The brown box has 45 holes open, so it has probability p=0.45 of returning two coins. One green box has 90 holes open (p=0.9) and the other has none (p=0). I let you experiment with the boxes until you are satisfied these probabilities are accurate (or very nearly so).
Then, I screw the faceplates back on, and put all the boxes in a black cloth sack with an elastic closure. I squidge the sack around, to mix up the boxes inside, and you reach in and pull one out at random.
I give you a hundred one-dollar coins. You can put as many into the box as you like. You can keep as many coins as you don’t gamble, plus whatever comes out of the box.
If you pulled out the brown box, there’s a 45% chance of getting $2 back, so the expected value of putting a dollar in is $0.90: a ten-cent expected loss on every play. Rationally, you should keep the hundred coins I gave you, and not gamble.
If you pulled out a green box, there’s a 50% chance that it’s the one that pays two dollars 90% of the time, and a 50% chance that it’s the one that never pays out. So, overall, there’s a 45% chance of getting $2 back (0.5 × 0.9 + 0.5 × 0 = 0.45), exactly as with the brown box.
Still, rationally, you should put some coins in the box. If it pays out at least once, you should gamble all the coins I gave you, because you know that you got the 90% box, and you’ll nearly double your money.
If you get nothing out after a few tries, you’ve probably got the never-pay box, and you should hold onto the rest of your money. (Exercise for readers: how many no-payouts in a row should you accept before quitting?)
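If you want a head start on that exercise, here is a rough sketch in Python. It is not the exact optimal policy: posterior_good is straight Bayes, but probe_value uses a simplified accounting of mine that counts only the immediate payout plus the profit from the remaining coins if the box turns out to be the good one.

```python
def posterior_good(k):
    """P(box is the 90% box | k failures in a row), by Bayes' rule."""
    return 0.5 * 0.1**k / (0.5 * 0.1**k + 0.5 * 1.0**k)

def probe_value(k, coins_left):
    """Approximate value of gambling one more coin after k failures.

    With probability 0.9 * posterior, it pays $2 and reveals the good
    box, making each remaining coin worth +$0.80 in expectation; with
    the remaining probability, the staked coin is simply lost.
    """
    q = 0.9 * posterior_good(k)
    return q * (2 + 0.8 * (coins_left - 1)) - 1

for k in range(5):
    print(k, round(posterior_good(k), 4), round(probe_value(k, 100 - k), 2))
```

On this rough accounting, the value of one more probe goes negative after two straight failures, so you would quit then; the exact answer could differ slightly.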
What’s interesting is that, when you have to decide whether or not to gamble your first coin, the probability is exactly the same in the two cases (p=0.45 of a $2 payout). However, the rational course of action is different. What’s up with that?
Here, a single probability value fails to capture everything you know about an uncertain event. And, it’s a case in which that failure matters.
Such limitations have been recognized almost since the beginning of probability theory. Dozens of solutions have been proposed. In the rest of this article, I’ll explore one. In subsequent articles, I’ll look at the problem more generally.
Meta-probability
To think about the green box, we have to reason about the probabilities of probabilities. We could call this meta-probability, although that’s not a standard term. Let’s develop a method for it.
Pull a penny out of your pocket. If you flip it, what’s the probability it will come up heads? 0.5. Are you sure? Pretty darn sure.
What’s the probability that my local junior high school sportsball team will win its next game? I haven’t a ghost of a clue. I don’t know anything even about professional sportsball, and certainly nothing about “my” team. In a match between two teams, I’d have to say the probability is 0.5.
My girlfriend asked me today: “Do you think Raley’s will have dolmades?” Raley’s is our local supermarket. “I don’t know,” I said. “I guess it’s about 50/50.” But unlike sportsball, I know something about supermarkets. A fancy Whole Foods is very likely to have dolmades; a 7-11 almost certainly won’t; Raley’s is somewhere in between.
How can we model these three cases? One way is by assigning probabilities to each possible probability between 0 and 1. In the case of a coin flip, 0.5 is much more probable than any other probability:

[Figure: a narrow spike centered at 0.5.]
We can’t be absolutely sure the probability is 0.5. In fact, it’s almost certainly not exactly that, because coins aren’t perfectly symmetrical. And, there’s a very small probability that you’ve been given a tricky penny that comes up tails only 10% of the time. So I’ve illustrated this with a tight Gaussian centered around 0.5.
In the sportsball case, I have no clue what the odds are. They might be anything from 0 to 1:

[Figure: a flat line across the whole range from 0 to 1.]
In the Raley’s case, I have some knowledge, and extremely high and extremely low probabilities seem unlikely. So the curve looks something like this:

[Figure: a broad hump centered on 0.5.]
Each of these curves averages to a probability of 0.5, but they express different degrees of confidence in that probability.
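For concreteness, here is a minimal sketch of these three states of knowledge in Python, using Beta distributions as stand-ins (my own choice; nothing in the argument depends on the exact functional form, only on how spread out the curves are):

```python
from scipy.stats import beta

# Three Beta distributions, all with mean 0.5 but different spreads.
curves = {
    "coin (near certainty)":    beta(500, 500),
    "sportsball (no clue)":     beta(1, 1),    # uniform on [0, 1]
    "Raley's (some knowledge)": beta(5, 5),
}
for name, dist in curves.items():
    print(f"{name}: mean = {dist.mean():.2f}, sd = {dist.std():.3f}, "
          f"density at 0.5 = {dist.pdf(0.5):.2f}")
```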
Now let’s consider the gambling machines in my thought experiment. The brown box has a curve like this:

[Figure: a sharp spike at 0.45.]
Whereas, when you’ve chosen one of the two green boxes at random, the curve looks like this:

[Figure: two spikes, one at 0 and one at 0.9.]
Both these curves give an average probability of 0.45. However, a rational decision theory has to distinguish between them. Your optimal strategy in the two cases is quite different.
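A few lines of Python confirm the arithmetic, representing each state of knowledge as a mapping from possible payout probabilities to their weights:

```python
brown = {0.45: 1.0}            # certain: p = 0.45
green = {0.0: 0.5, 0.9: 0.5}   # even mixture of p = 0 and p = 0.9

def mean(dist):
    return sum(p * w for p, w in dist.items())

def variance(dist):
    m = mean(dist)
    return sum(w * (p - m) ** 2 for p, w in dist.items())

print(mean(brown), mean(green))          # 0.45 0.45: identical means
print(variance(brown), variance(green))  # 0.0  0.2025: very different spreads
```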
With this framework, we can consider another box—a blue one. It has a fixed payout probability somewhere between 0 and 0.9. I put a random number of plugs in the holes in the spinning disk—leaving between 0 and 90 holes open. I used a noise diode to choose, but you don’t get to see what the odds are. Here the probability-of-probability curve looks rather like this:

[Figure: a flat line from 0 to 0.9, dropping to zero above 0.9.]
This isn’t quite right, because the wheel allows only multiples of 0.01, so 0.23 and 0.24 are much more likely than 0.235—the plot should look like a comb—but for strategy choice the difference doesn’t matter.
What is your optimal strategy in this case?
As with the green box, you ought to spend some coins gathering information about what the odds are. Once you are confident enough that the probability is less than 0.5, you should stop; if you become confident enough that it’s more than 0.5, you should keep gambling.
If you enjoy this sort of thing, you might like to work out what the exact optimal algorithm is.
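As a starting point, here is a simple explore-then-decide sketch in Python. It is definitely not the optimal algorithm: the fifteen-coin probe budget and the break-even stopping rule are arbitrary choices of mine.

```python
import random

def play_blue_box(true_open_holes, coins=100, probes=15, seed=0):
    """Probe the box a few times, then keep gambling only while the
    posterior mean payout probability beats the $0.50 break-even point."""
    rng = random.Random(seed)
    ps = [k / 100 for k in range(91)]   # the 91 possible payout probabilities
    weights = [1.0] * 91                # uniform prior over hole counts
    winnings = 0

    def posterior_mean():
        return sum(p * w for p, w in zip(ps, weights)) / sum(weights)

    def play_once():
        nonlocal coins, winnings, weights
        coins -= 1
        won = rng.random() < true_open_holes / 100
        winnings += 2 if won else 0
        # Bayes update: reweight each hypothesis by the outcome's likelihood
        weights = [w * (p if won else 1 - p) for p, w in zip(ps, weights)]

    for _ in range(probes):                        # exploration phase
        play_once()
    while coins > 0 and posterior_mean() > 0.5:    # exploitation phase
        play_once()
    return coins + winnings                        # what you walk away with

print(play_blue_box(true_open_holes=70))   # good box: keeps playing
print(play_blue_box(true_open_holes=20))   # bad box: quits after the probes
```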
In the next article in this sequence, we’ll look at some more complicated and interesting cases.
Further reading
The “meta-probability” approach I’ve taken here is the A_p distribution of E. T. Jaynes. I find it highly intuitive, but it seems to have had almost no influence or application in practice. We’ll see later that it has some problems, which might explain this.
The green and blue boxes are related to “multi-armed bandit problems.” A “one-armed bandit” is a casino slot machine, which has defined odds of payout. A multi-armed bandit is a hypothetical generalization with several arms, each of which may have different, unknown odds. In general, you ought to pull each arm several times, to gain information. The question is: what is the optimal algorithm for deciding which arms to pull how many times, given the payments you have received so far?
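For a taste of one standard approach, here is Thompson sampling for a Bernoulli bandit, sketched in Python (a good heuristic for illustration, not a claim about the provably optimal policy):

```python
import random

def thompson(true_ps, pulls=1000, seed=0):
    """Pull whichever arm currently has the highest sampled payout rate."""
    rng = random.Random(seed)
    wins = [0] * len(true_ps)
    losses = [0] * len(true_ps)
    total = 0
    for _ in range(pulls):
        # Sample a plausible payout rate for each arm from its Beta posterior.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(len(true_ps))]
        arm = samples.index(max(samples))
        if rng.random() < true_ps[arm]:
            wins[arm] += 1
            total += 1
        else:
            losses[arm] += 1
    return total, wins, losses

print(thompson([0.45, 0.9, 0.0]))   # three arms, like the three boxes
```

Each arm keeps a Beta posterior over its payout rate; sampling from the posteriors automatically balances exploring uncertain arms against exploiting good ones.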
If you read the Wikipedia article and follow some links, you’ll find the concepts you need to find the optimal green and blue box strategies. But it might be more fun to try on your own first! The green box is simple. The blue box is harder, but the same general approach applies.
Wikipedia also has an accidental list of formal approaches for problems where ordinary probability theory fails. This is far from complete, but a good starting point for a browser tab explosion.
Acknowledgements
Thanks to Rin’dzin Pamo, St. Rev., Matt_Simpson, Kaj_Sotala, and Vaniver for helpful comments on drafts. Of course, they may disagree with my analyses, and aren’t responsible for my mistakes!
Comments

However, a single probability for each outcome given each strategy is all the information needed. The problem is not with using single probabilities to represent knowledge about the world; it's the straw math that was used to represent the technique. To me, this reasoning is equivalent to the following:
"You work at a store where management is highly disorganized. Although they precisely track the number of days you have worked since the last payday, they never remember when they last paid you, and thus every day of the work week has a 1/5 chance of being a payday. For simplicity's sake, let's assume you earn $100 a day.
You wake up on Monday and do the following calculation: if you go in to work, you have a 1/5 chance of being paid. Thus the expected payoff of working today is $20, which is too low for it to be worth it. So you skip work. On Tuesday, you make the same calculation and again decide it isn't worth it, and so you continue skipping work forever.
I visit you and immediately point out that you're being irrational. After all, a salary of $100 a day clearly is worth it to you, yet you are not working. I look at your calculations, and immediately find the problem: You're using a single probability to represent your expected payoff from working! I tell you that using a meta-probability distribution fixes this problem, and so you excitedly scrap your previous calculations and set about using a meta-probability distribution instead. We decide that a Gaussian sharply peaked at 0.2 best represents our meta-probability distribution, and I send you on your way."
Of course, in this case, the meta-probability distribution doesn't change anything. You still continue skipping work, because I have devised the hypothetical situation to illustrate my point (evil laugh). The point is that in this problem the meta-probability distribution solves nothing, because the problem is not with a lack of meta-probability, but rather a lack of considering future consequences.
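A quick simulation bears this out (a sketch assuming the setup above: $100 per day worked, paid in arrears, with a 1/5 chance each day of being payday):

```python
import random

def average_pay_per_day_worked(days=100_000, seed=0):
    """Work every day; on each payday, collect $100 for every accrued day."""
    rng = random.Random(seed)
    unpaid_days = 0
    income = 0
    for _ in range(days):
        unpaid_days += 1                 # you work today
        if rng.random() < 0.2:           # 1/5 chance today is payday
            income += 100 * unpaid_days  # paid for all days since last payday
            unpaid_days = 0
    return income / days

print(average_pay_per_day_worked())      # approximately $100, not $20
```

Counting accrued wages, each day worked is eventually worth the full $100; the naive $20-per-day figure came from bad accounting, not from using a single probability.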
In both the OP's example and mine, the problem is that the math was done incorrectly, not that you need meta-probabilities. As you said, meta-probabilities are a method of screening off additional labels on your probability distributions for a particular class of problems, where you are taking repeated samples that are entangled in a very particular sort of way. As I said above, I appreciate the exposition of meta-probabilities as a tool, and your comment has helped me better understand their instrumental nature, but I take issue with what sort of tool they are presented as.
If you do the calculations directly with the probabilities, your calculation will succeed if you do the math right, and fail if you do the math wrong. Meta-probabilities are a particular way of representing a certain calculation, and they succeed or fail in their own right. If you use them to represent the correct direct probabilities, you will get the right answer; but they are only an aid in the calculation, and they never fix any problem with direct probability calculations. The fixing of the calculation and the use of probabilities are orthogonal issues.
To make a blunt analogy, this is like someone trying to plug an Ethernet cable into a phone jack, and then saying "when Ethernet fails, wifi works", conveniently plugging in the wifi adapter correctly.
The key of the dispute in my eyes is not whether wifi can work for certain situations, but whether there's anything actually wrong with Ethernet in the first place.
A reply:

So, my observation is that without meta-distributions (or A_p), or conditioning on a pile of past information (and thus tracking more than just a probability distribution over current outcomes), you don't have the room in your knowledge to be able to even talk about sensitivity to new information coherently. Once you can talk about a complete state of knowledge, you can begin to talk about the utility of long-term strategies.
For example, in your scenario, one would have the same probability of being paid today if 20% of employers actually pay you every day…
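To make the sensitivity point concrete, here is a minimal sketch (the Beta priors are illustrative stand-ins, not anything from the thread): two states of knowledge with the same expected probability, 0.45, that respond very differently to a single new observation.

```python
def mean_after_success(a, b):
    """Posterior mean of a Beta(a, b) prior after one observed success."""
    return (a + 1) / (a + b + 1)

for name, a, b in [("confident (much prior data)", 45, 55),
                   ("ignorant (almost no data)", 0.9, 1.1)]:
    print(f"{name}: {a / (a + b):.3f} -> {mean_after_success(a, b):.3f}")
```

Both start at 0.45; one observation barely moves the confident state (to about 0.455) but jerks the ignorant one to about 0.633.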