The second assumption, however, is harder to justify. There are many ways that a calculation of odds could go wrong (putting a decimal point in the wrong place, making a multiplication error, unknowingly misunderstanding the laws of probability, actually being insane, etc.). If we could really enumerate all of them, understand how they affect our computed payout probability, and estimate the probability of each occurring, then we could compute this missing factor exactly. As things stand, though, that is probably untenable. Nor should we expect errors that make the payout probability artificially larger to balance those that make it artificially smaller. Misplacing a decimal point, for example, will almost certainly be noticed if it leads to a percentage greater than 100%, but not if it leads to one that is less than that (creating an asymmetry).
This is a valid point, and one I missed in my writeup. (Toby_Ord said something similar, but that was in response to a specific question.)
It is probably a useful skill to recognize asymmetries in the possible direction of error, such as that which you pointed out. I can see two ways to handle this:
a. Additional terms in the derivation, such as P(decimal-point error) and P(sign error), with the e term restricted to the unanticipated-error case.
b. Modification of e.
Related: Advancing Certainty, Reversed Stupidity Is Not Intelligence
The substance of this post is derived from a conversation in the comment thread which I have decided to promote. Teal;deer: if you have to rely on a calculation you may have gotten wrong for your prediction, your expectation for the case when your calculation is wrong should use a simpler calculation, such as reference class forecasting.
Edit 2010-01-19: Toby Ord mentions in the comments Probing the Improbable: Methodological Challenges for Risks with Low Probabilities and High Stakes (PDF) by Toby Ord, Rafaela Hillerbrand, and Anders Sandberg of the Future of Humanity Institute, University of Oxford. It uses a similar mathematical argument, but is much more substantive than this.
A lottery has a jackpot of a million dollars. A ticket costs one dollar. Odds of a given ticket winning are approximately one in forty million. If your utility is linear in dollars, should you bet?
The obvious (and correct) answer is "no". The clever (and incorrect) answer is "yes", as follows:
The logic is not obviously wrong, but where is the error?
First, let us write out the calculation algebraically. Let E(L) be the expected value of playing the lottery. Let p(L) be your calculated probability that the lottery will pay off. Let p(C) be your probability that your calculations are correct. Finally, let j represent the value of the jackpot and let t represent the price of the ticket. The obvious way to write the clever theory is:
E(L) = max(p(L), 1-p(C)) * j - t
This doesn't sound quite right, though - surely you should ascribe a higher confidence when you calculate a higher probability. That said, when p(L) is much less than p(C), it shouldn't make a large difference. The straightforward way to account for this is to take p(C) as the probability that p(L) is correct, and write the following:
E(L) = [ p(C)*p(L) + 1-p(C) ] * j - t
which can be rearranged as:
E(L) = p(C) * [p(L)*j - t] + (1-p(C)) * [j - t]
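For readers who want the intermediate step: the rearrangement only relies on splitting the ticket price t into p(C)*t + (1-p(C))*t. Written out in LaTeX, using the same symbols as above:

```latex
\begin{align*}
E(L) &= \bigl[p(C)\,p(L) + 1 - p(C)\bigr]\,j - t \\
     &= p(C)\,p(L)\,j + \bigl(1 - p(C)\bigr)\,j - p(C)\,t - \bigl(1 - p(C)\bigr)\,t \\
     &= p(C)\,\bigl[p(L)\,j - t\bigr] + \bigl(1 - p(C)\bigr)\,\bigl[j - t\bigr]
\end{align*}
```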
I believe this exposes the problem with the clever argument quite explicitly. Why, if your calculations are incorrect (probability 1-p(C)), should you assume that you are certain to win the lottery? If your calculations are incorrect, they should tell you almost nothing about whether you will win the lottery or not. So what do you do?
What appears to me the elegant solution is to use a less complex calculation - or a series of less complex calculations - to act as your backup hypothesis. In a tricky engineering problem (say, calculating the effectiveness of a heat sink), your primary prediction might come out of a finite element fluid dynamics calculator with p(C) = 0.99 and narrow error bars, but you would also refer to the result of a simple algebraic model with p(C) = 0.9999 and much wider error bars. And then you would backstop the lot with your background knowledge about heat sinks in general, written with wide enough error bars to call p(C) = 1 - epsilon.
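One way to read the layered backups above is as a chain of weights: each model only gets consulted when every more complex model before it has failed. The sketch below is a minimal illustration under that assumption (that each p(C) is conditional on the earlier layers being wrong); the function name `layer_weights` is hypothetical, and the only numbers used are the two p(C) values from the heat sink example.

```python
def layer_weights(confidences):
    """Return (weights, remainder) for a chain of backup models.

    confidences[i] is p(C) for layer i, read here as the probability that
    layer i is correct given that all earlier layers were wrong (an
    assumption of this sketch, not something the post states).
    The remainder is the weight left over for the final backstop.
    """
    weights = []
    remaining = 1.0  # probability that every layer so far has failed
    for p_c in confidences:
        weights.append(remaining * p_c)
        remaining *= 1.0 - p_c
    return weights, remaining


# p(C) values from the heat sink example: finite element model, then algebraic model.
weights, background_weight = layer_weights([0.99, 0.9999])
print(weights)            # weight on each calculated model
print(background_weight)  # weight left for general background knowledge
```

Under this reading the background knowledge carries a weight of roughly one in a million, so it matters only if its error bars reach outcomes the other two models rule out.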
In this case, though, the calculation was simple, so our backup prediction is just the background knowledge. Say that, knowing nothing about a lottery but "it's a lottery", we would have an expected payoff e. Then we write:
E(L) = p(C) * [p(L)*j - t] + (1-p(C)) * e
I don't know about you, but for me, e is approximately equal to -t. And justice is restored.
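To see the size of the difference, here is a small numerical sketch of the two formulas using the lottery figures from the start of the post. The value p(C) = 0.9999 is an assumption chosen purely for illustration, and the function names are hypothetical.

```python
def clever_expected_value(p_l, p_c, j, t):
    """E(L) = [p(C)*p(L) + 1 - p(C)] * j - t.

    Treats "my calculation is wrong" as "I certainly win".
    """
    return (p_c * p_l + (1.0 - p_c)) * j - t


def corrected_expected_value(p_l, p_c, j, t, e):
    """E(L) = p(C)*[p(L)*j - t] + (1 - p(C))*e.

    Falls back to the background expectation e when the calculation is wrong.
    """
    return p_c * (p_l * j - t) + (1.0 - p_c) * e


j = 1_000_000.0          # jackpot, from the post
t = 1.0                  # ticket price, from the post
p_l = 1.0 / 40_000_000   # calculated odds of winning, from the post
p_c = 0.9999             # ASSUMPTION: illustrative confidence in the calculation

clever = clever_expected_value(p_l, p_c, j, t)
corrected = corrected_expected_value(p_l, p_c, j, t, e=-t)  # e = -t, as in the post

print(f"clever:    {clever:+.4f}")
print(f"corrected: {corrected:+.4f}")
```

With these inputs the clever formula comes out at roughly +99 dollars per ticket, while the corrected one stays at roughly -0.975 dollars, almost exactly what the naive calculation p(L)*j - t gives.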
We are advised that, when solving hard problems, we should solve multiple problems at once. This is relatively trivial, but I can point out a couple other relatively trivial examples where it shows up well:
Suppose the lottery appears to be marginally profitable: should you bet on it? Not unless you are confident in your numbers.
Suppose we consider the LHC. Should we (have) switch(ed) it on? Once you've checked that it is safe, yes. As a high-energy physics experiment, the backup comparison would be to things like nuclear energy, which have only small chances of devastation on the planetary scale. If your calculations were to indicate that the LHC is completely safe, even if your p(C) were as low as three or four nines (99.9%, 99.99%), your actual estimate of the safety of turning it on should be no lower than six or seven nines, and probably higher. (In point of fact, given the number of physicists analyzing the question, p(C) is much higher. Three cheers for intersubjective verification.)
Suppose we consider our Christmas shopping. When you're estimating your time to finish your shopping, your calculations are not very reliable. Therefore your answer is strongly dominated by the simpler, much more reliable reference class prediction.
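The marginally profitable lottery above can be made concrete by asking how large p(C) must be before the corrected E(L) turns positive. Setting p(C)*[p(L)*j - t] + (1-p(C))*e to zero gives a break-even p(C) of -e / (p(L)*j - t - e). The sketch below assumes a hypothetical jackpot of 42 million dollars, so that the calculated return is 1.05 dollars per dollar staked, and takes e = -t; the function name is hypothetical as well.

```python
def break_even_confidence(p_l, j, t, e):
    """Smallest p(C) at which p(C)*[p(L)*j - t] + (1 - p(C))*e reaches zero.

    Only meaningful when the calculated bet is profitable and the
    backup expectation e is a loss.
    """
    primary = p_l * j - t  # expected value if the calculation is right
    if not (primary > 0 > e):
        raise ValueError("need a profitable calculation and a losing backup")
    return -e / (primary - e)


t = 1.0
p_l = 1.0 / 40_000_000
j = 42_000_000.0  # ASSUMPTION: hypothetical jackpot making the bet marginally profitable

needed = break_even_confidence(p_l, j, t, e=-t)
print(f"p(C) must exceed {needed:.4f}")
```

In this hypothetical case you need to be more than about 95% sure of your numbers before the ticket is worth buying, and any asymmetry in the direction of likely errors would push that threshold higher.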
But what are the odds that this ticket won't win the lottery? ...how many nines do I type, again?