Rival formalizations of a decision problem
Decision theory is not one of my strengths, and I have a question about it.
Is there a consensus view on how to deal with the problem of "rival formalizations"? Peterson (2009) illustrates the problem like this:
Imagine that you are a paparazzi photographer and that rumour has it that actress Julia Roberts will show up in either New York (NY), Los Angeles (LA) or Paris (P). Nothing is known about the probability of these states of the world. You have to decide if you should stay in America or catch a plane to Paris. If you stay and [she] shows up in Paris you get $0; otherwise you get your photos, which you will be able to sell for $10,000. If you catch a plane to Paris and Julia Roberts shows up in Paris your net gain after having paid for the ticket is $5,000, and if she shows up in America you for some reason, never mind why, get $6,000. Your initial representation of the decision problem is visualized in Table 2.13.
Table 2.13
| | P | LA | NY |
|---|---|---|---|
| Stay | $0 | $10k | $10k |
| Go to Paris | $5k | $6k | $6k |
Since nothing is known about the probabilities of the states in Table 2.13, you decide it makes sense to regard them as equally probable [see Table 2.14].
Table 2.14
| | P (1/3) | LA (1/3) | NY (1/3) |
|---|---|---|---|
| Stay | $0 | $10k | $10k |
| Go to Paris | $5k | $6k | $6k |
The two rightmost columns are exactly parallel: every act yields the same payoff in LA as in NY. Therefore they can be merged into a single (disjunctive) column by adding the probabilities of the two rightmost columns together (Table 2.15).
Table 2.15
| | P (1/3) | LA or NY (2/3) |
|---|---|---|
| Stay | $0 | $10k |
| Go to Paris | $5k | $6k |
However, now suppose that you instead start with Table 2.13 and first merge the two repetitious states into a single state. You would then obtain the decision matrix in Table 2.16.
Table 2.16
| | P | LA or NY |
|---|---|---|
| Stay | $0 | $10k |
| Go to Paris | $5k | $6k |
Now, since you know nothing about the probabilities of the two states, you decide to regard them as equally probable... This yields the formal representation in Table 2.17, which is clearly different from the one suggested above in Table 2.15.
Table 2.17
| | P (1/2) | LA or NY (1/2) |
|---|---|---|
| Stay | $0 | $10k |
| Go to Paris | $5k | $6k |
Which formalisation is best, 2.15 or 2.17? It seems question begging to claim that one of them must be better than the other — so perhaps they are equally reasonable? If they are, we have an example of rival formalisations.
Note that the principle of maximising expected value recommends different acts in the two matrices. According to Table 2.15 you should stay, but 2.17 suggests you should go to Paris.
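Those two recommendations are easy to verify by computing the expected values directly (a quick sketch, payoffs in thousands; the variable names are mine):

```python
def expected_value(payoffs, probs):
    """Expected payoff of one act given state probabilities."""
    return sum(probs[state] * v for state, v in payoffs.items())

acts = {"Stay": {"Paris": 0, "US": 10},
        "Go":   {"Paris": 5, "US": 6}}
p_215 = {"Paris": 1/3, "US": 2/3}  # Table 2.15's probabilities
p_217 = {"Paris": 1/2, "US": 1/2}  # Table 2.17's probabilities

for name, probs in [("2.15", p_215), ("2.17", p_217)]:
    evs = {act: expected_value(pay, probs) for act, pay in acts.items()}
    print(name, evs, "-> best:", max(evs, key=evs.get))
# Under 2.15, Stay (about 6.67) beats Go (about 5.67);
# under 2.17, Go (5.5) beats Stay (5.0).
```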
Does anyone know how to solve this problem? If one is not convinced by the illustration above, Peterson (2009) offers a proof that rival representations are possible on pages 33–35.
Comments (33)
DT is not my strength either, but it seems like Peterson is just doing a simple sleight-of-hand trick. The trickery is concealed in this line:
This conflicts with the earlier claim that our priors for each location are 1/3. Changing the way the table looks does not mean that we are allowed to change that prior information. P(P) is still 1/3, and P(LA or NY) is still 2/3. Calling these "two states" conceals the fact that you have manipulated the prior information, which is what's creating the "paradox."
This. Either we know nothing about each of the three states, or we know nothing about either of the two states, not both.
The trick is that until you add in the prior, you don't actually have a decision theory problem, only part of one; making the states equally probable is adding information, and shuffling states around and then making the states equally probable is adding different information.
A fully-specified decision theory problem is one that could be written as a function which takes as input a strategy and a random number generator, and outputs a utility score. If you have to add any information - priors, structure, expected opponent-strategy - then you have an underspecified problem, which puts you back in the realm of science.
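To make "fully-specified" concrete, here is what such a function might look like for the Peterson example once the 1/3-each prior has been baked in (a sketch; all names are invented):

```python
import random

def paparazzi_problem(strategy, rng):
    """A fully specified decision problem: prior, payoffs, everything baked in.
    Takes a strategy and a random number generator; returns a utility."""
    location = rng.choice(["Paris", "LA", "NY"])  # the 1/3-each prior
    payoffs = {"Stay": {"Paris": 0, "LA": 10_000, "NY": 10_000},
               "Go":   {"Paris": 5_000, "LA": 6_000, "NY": 6_000}}
    return payoffs[strategy][location]

# Monte Carlo estimate of each act's expected utility under that prior:
rng = random.Random(0)
est = {act: sum(paparazzi_problem(act, rng) for _ in range(20_000)) / 20_000
       for act in ("Stay", "Go")}
print(est)  # Stay comes out near 6667, Go near 5667
```

Choosing a different prior (say, 1/2 for Paris) means writing a different function, i.e. a different problem.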
I think this just repeats what Peterson is saying. The difficulty is that there are multiple "reasonable" ways to specify (formalize) the decision problem. So, whether the "rival formalizations" problem is categorized into the domain of science or decision theory, do you know a solution to the problem?
The trick is that when he condenses LA and NY into an "America" option, he is actually throwing away information, thus changing the problem. If he didn't throw away that information, he couldn't apply the indifference principle to Paris vs. LA/NY, because knowing that LA and NY are two cities while Paris is one breaks the symmetry that the indifference principle relies on.
Now, it's entirely reasonable to get that same effect by saying something like "well, Julia Roberts really likes Paris, so her chance of showing up there is twice that of the other cities." That sort of information cannot be represented by the indifference principle; it replaces symmetry with arbitrariness. But the arbitrariness is about which problems are possible, not about the solution to an individual problem.
Suppose I subdivide Paris into two districts?
And, presumably, assign one district each to LA and NY? I bet you can guess the answer.
The trouble with these spatial examples is that everyone has all these pesky intuitions lying around. "Space is continuous, of course!" we think, and "cities are made of parts!" But the formal statement of the problem, if the principle of indifference is to be useful, must generally be quite low-information - if the symmetry between the cities is thoroughly broken by us having tons of knowledge about the cities, the example is false as stated.
In order to get in the low-information mindset, it helps to replace meaningful (to us) labels with meaningless ones. In the first "formalization," all we know is that Julia Roberts could be in one of 3 named cities. Avoiding labels, all we know is that agent 1 could have mutually exclusive and exhaustive properties A, B and C. As soon as the problem is stated this way it becomes clearer that you can't just condense properties B and C together without changing the problem.
I never said that?
Why does "the formal statement of the problem" matter? Reality doesn't depend on how the problem is phrased.
You seem to be trying to find an answer that would satisfy a hypothetical teacher, not the answer that you would use if you had something to protect.
Suppose I instead called the options A1, B1 and B2. Renaming the options shouldn't change anything after all.
It's another form of the Bayesian priors problem, which I believe is fundamentally unsolvable. A Solomonoff prior gets you to within a constant factor, given sufficient computational resources, but that constant factor is allowed to be huge. You can drive the problem out from specific domains by gathering enough evidence about them to overwhelm the priors, but with a fixed pool of evidence, you really do have to just guess.
Regarding a set of states as equally probable is significant not for scientific or decision-theoretic reasons, but because it's a Schelling point in debates over priors. Unfortunately, as you have noticed, there can be arbitrarily many Schelling points, and the number of points increases as you add more vagaries to the problem. There are special cases in which you can derive an ignorance prior from symmetry - such as if the labels on the locations were known to have been shuffled in a uniformly random way - but the labels in this case are not symmetrical.
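The shuffled-labels case is easy to check: averaging over a uniformly random relabeling washes out any underlying asymmetry, so each label really does deserve probability 1/3 (a toy check with an invented underlying prior):

```python
import itertools

# If the labels are shuffled uniformly at random, each label is equally
# likely to denote any underlying state, whatever the true prior is.
true_prior = {"s1": 0.7, "s2": 0.2, "s3": 0.1}  # invented numbers
labels = ["A", "B", "C"]
p_label = {l: 0.0 for l in labels}
perms = list(itertools.permutations(labels))
for perm in perms:  # each of the 6 shuffles is equally likely
    for state, label in zip(true_prior, perm):
        p_label[label] += true_prior[state] / len(perms)
print(p_label)  # every label ends up with probability 1/3
```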
Why are you surprised that incompatible priors (called "rival formalizations" by Peterson) produce incompatible decisions?
The "consensus" view (also the only one that seems to make sense) is likely that the more accurate map (in this case - literally) of the territory (e.g. three equiprobable cities instead of two equiprobable continents) produces better decisions.
This problem is similar to the bead jar guess problem. Essentially the problem is where priors come from and it doesn't have a general solution within the context of Bayesianism. Bayes can tell you how to update your priors, but not what your initial priors should be.
The best thing to do in this problem, when you're not sure what priors you should assign, is to work backwards and figure out what priors you need to arrive at one solution or the other. In this case:
Let P = Pr(Julia Roberts goes to Paris), with payoffs in thousands. Then E(Stay) = 0·P + 10(1 − P) = 10 − 10P and E(Go) = 5P + 6(1 − P) = 6 − P. So E(Stay) > E(Go) if 10 − 10P > 6 − P, i.e. 4 > 9P, i.e. P < 4/9.
Now, instead of trying to decide "what does the Holy Doctrine of Indifference direct us to do in this situation" we can think about the real question: is the probability that Julia Roberts goes to Paris less than 4/9?
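Working the crossover point mechanically (payoffs in thousands; exact arithmetic via `fractions` to avoid floating-point noise):

```python
from fractions import Fraction

def ev_stay(p):  # $0 if she's in Paris, $10k otherwise
    return 10 * (1 - p)

def ev_go(p):    # $5k if she's in Paris, $6k if she's in America
    return 5 * p + 6 * (1 - p)

# Solve 10(1 - p) = 5p + 6(1 - p)  =>  4 = 9p  =>  p = 4/9
p_star = Fraction(4, 9)
assert ev_stay(p_star) == ev_go(p_star)
print(f"Stay is better whenever P(Paris) < {p_star}")
```

With the 1/3 prior (Table 2.15) Stay wins since 1/3 < 4/9; with the 1/2 prior (Table 2.17) Go wins since 1/2 > 4/9.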
The question being asked here is really "what priors should I assign?". I don't think I have the answer, but allow me to restate the problem:
"The subject is known to be headed for one of three places, A, B and C. What is the probability they turn up in each place?" To which the answer is 1/3 each by indifference.
Now why did it look ok to us to merge B and C into one option, (B or C)? Because (in the original problem) B and C were cities located in the same country, the U.S., and that prior geographical information had been incorporated into the problem. When we condition on the knowledge that A is in one country (France) and B and C are in another (the U.S.) the problem is, well, no longer symmetrical. And I confess I'm now actually unsure how or if indifference or maximum entropy can be applied to this now asymmetric problem.
You have to use the information about the asymmetry, which in this case involves an actress and geopolitical boundaries. This isn't a case where there's an elegant ignorance prior, you just have to actually use your knowledge.
As far as I can tell, this is just the standard complaint about the (naive?) Principle of Indifference and doesn't have much to do with decision theory per se. E.g., here's Keynes talking about a similar case. The most plausible solutions I know of are to either 1. insist that there simply are no rational constraints besides the axioms of probability on how we should weight the various possibilities in the absence of evidence and hence the problem is underdetermined (it depends on our "arbitrary" priors), or 2. accept that this is a real problem with Bayesian epistemology and hope something better comes along that doesn't model all doxastic attitudes as probabilities.
Or, I suppose, 3. tell us how to actually calculate some priors. That would be fine too.
I don't know of any plausible, objective, truly general methods of calculating priors. Solomonoff induction or whatever isn't going to help very much.
Solomonoff induction (or similar) gives you your priors on your first day in the world.
You get a zillion updates after that that bear on the question of where Julia Roberts is most likely to be.
Could you tell me how to use Solomonoff induction to estimate the prior probability of Julia Roberts being in New York vs. LA vs. Paris (or America vs. Paris)?
Solomonoff induction lets you calculate priors for observing any finite stream of sense data. Pick a reference machine, enumerate the ways in which you might learn about her location and off you go.
O.K., let's imagine I've enumerated all the ways E1, E2, E3, ... in which I could learn about Julia Roberts' location. What do I do now?
Read up about Solomonoff induction, by the sound of it. I gave you one link, and here is another one. You will need to use a computable approximation.
I'm familiar with Solomonoff induction. I don't think it can be used to do what you want it to do (though open to be convinced otherwise), which is why I'm trying to ask you to spell out in detail how you think the highly formal mathematical machinery could be applied in principle to a real-world case like this one. In particular, I'm trying to ascertain how exactly you bridge the gap - in a general way - between the purely syntactic algorithmic complexity of a sequence of English letters and the relative probability of the statement that sequence semantically represents.
There is no "gap-bridging". Solomonoff induction gives you the probability of a sequence of symbols. In practice that is typically applied to the sense data streams of agents (such as produced by camera or microphone), to give estimates of their probabilities. Solomonoff induction knows nothing of semantics - it just works on symbol sequences.