Suppose that a coin has been tossed and come to rest, but I have not looked at it yet. What probability should I assign to the outcomes of heads and tails? It seems to me your analysis would say that the coin actually has either a 100% probability of being heads or a 100% probability of being tails, not a 50% probability of each.
This contradicts the entire idea of probability as being a measure of one's own imperfect information. Is this your intention? If not, what is the difference between the assignment of probability to the unobserved but already existing coin toss and to the unobserved but already existing amount in the envelope?
Thank you for responding. This is indeed a very tricky issue, and I was looking for a sounding board... anyone who could challenge me in order to help me to clarify my explanation. I didn't expect so many haters in this forum, but the show must go on with or without them.
My undergraduate degree is in math, and mathematicians sometimes use the phrase "without loss of generality" (WLOG). Every once in a while they will make a semi-apologetic remark about the phrase because they all know that, if it were ever to be used in an inappropriate way, then everything could fall apart. Appealing to WLOG is not a cop-out but rather an attempt to tell those who are evaluating the proof, "Tell me if I'm wrong."
In your example of a coin flip, I can find no loss of generality. However, in the two envelopes problem, I can. If step (1) of the argument had said "unselected envelope" rather than "selected envelope", then the argument would have led the player to choose to keep the selected envelope rather than switch it. Why should the argument using the words "selected envelope" be more persuasive than the argument using the words "unselected envelope"? Do you see what I mean? There is an implicit "WLOG" but, in this case, with an actual loss of generality.
This problem still leaves me feeling very troubled because, even to the extent that I understand the fallacy, it still seems very difficult for me to know whether I have explained it in a way that leaves absolutely no room for confusion (which is very rare when I see an actual error in somebody's reasoning). And apparently, I was not able to explain the fallacy in a way that others could understand. As far as I'm concerned, that's a sign of a very dangerous fallacy. And I've encountered some very deep and dangerous fallacies. So, this one is still quite disturbing to me.
To follow a maxim of Edwin Jaynes, when a paradox arises in matters of probability, one must consider the generating process from which the probabilities were derived.
How does the envelope-filler choose the amount to put in either envelope? He cannot pick an "arbitrary" real number. Almost all real numbers are so gigantic as to be beyond human comprehension. Let us suppose that he has a probability distribution over the non-negative reals from which he draws a single value $x$, and puts $x$ into one envelope and $2x$ into the other. (One could also imagine that he puts $x/2$ into the other, or tosses a coin to decide between $2x$ and $x/2$, but I'll stick with this method.)
Any such probability distribution must tail off to zero as $x$ becomes large. Suppose the envelope-chooser is allowed to open the first envelope, and then is allowed to switch to the other one if they think it's worth switching. The larger the value they find in the first envelope, the less likely it is that the other envelope has twice as much. Similarly, if they find a very small value in the first envelope (i.e. well into the lower tail of the distribution), then they can expect to profit by switching.
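A quick Monte Carlo sketch of this point. The exponential distribution, the threshold, and all the names here are my own illustrative choices, not part of the problem statement; the claim being checked is just that a chooser who switches only on small observed values does better on average than one who never switches.

```python
import random

# Illustrative setup (my assumption): the filler draws the smaller amount
# x from an exponential distribution with mean 1, puts x in one envelope
# and 2x in the other; the chooser opens one of the two at random.
def play(switch_if_below=None):
    x = random.expovariate(1.0)        # smaller amount
    amounts = [x, 2 * x]
    random.shuffle(amounts)
    first, other = amounts
    if switch_if_below is not None and first < switch_if_below:
        return other                   # switch on suspiciously small values
    return first                       # otherwise keep the first envelope

random.seed(0)
N = 200_000
never = sum(play() for _ in range(N)) / N
informed = sum(play(switch_if_below=1.0) for _ in range(N)) / N
print(f"never switch: {never:.3f}, switch if < 1: {informed:.3f}")
```

The positive gap comes entirely from the lower tail: a small observed value is more likely to be the smaller of the two amounts.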
In the original version, of course, they do not see what is in the envelope before deciding whether to switch. So we must consider the expected value of switching conditional on the value in the first envelope, summed or integrated over the probability distribution of what is in that envelope.
I shall work this through with an example probability distribution. Suppose that the probability of the chosen value being $2^n$ is $2 \cdot 3^{-n}$ for all positive integers $n$, and no other value of $x$ is possible. (Taking $2^{-n}$ would be simpler, but that distribution has infinite expected value, which introduces its own paradoxes.)
I shall list all the possible ways the game can play out.
1. $2 in the envelope in your hand, $4 in the other. Probability 2/3 for selecting the value $x = 2$, and 1/2 for picking up the envelope containing $2, so 1/3. Value of switching is +$2, so the contribution of this possibility to the expected value of switching is +2/3.
2. $4 in your hand, $2 in the other. Probability 1/3, value of switching -$2, expectation -2/3.
3. $4 in your hand, $8 in the other. Probability 1/9, value of switching +$4, expectation +4/9.
4. $8 in your hand, $4 in the other. Probability 1/9, value of switching -$4, expectation -4/9.
And so on. Now, we can pair these up as (1, 2), (3, 4), (5, 6), etc. and see that the expected value of switching without knowledge of the first envelope's contents is zero. But that is just the symmetry argument against switching. To dissolve the paradoxical argument that says that you should always switch, we pair up the outcomes according to the value in the first envelope.
If it has $2, the value of switching is +$2.
If it has $4, the value is (3/4)(-$2) + (1/4)(+$4) = -$1/2.
If it has $8, the value is (3/4)(-$4) + (1/4)(+$8) = -$1.
Weighting these conditional values by the probabilities of the first envelope containing $2, $4, $8, and so on, the sum of all of the negative terms is -2/3, cancelling out the positive one (+2/3 from the $2 case). The expected value is zero.
The general term in this sum is, for $n \ge 2$,
$$-2^{n-1} \cdot 3^{1-n} + 2^{n} \cdot 3^{-n} = -2^{n-1}\,3^{-n},$$
which is negative. The value conditional on $2^n$ being in the first envelope is just this divided by the probability of that event, which leaves it still negative. If we write $p = 3^{1-n}$ and $q = 3^{-n}$ for the probabilities of the two cases, this works out to $p = 3q$ and $p + q = 4 \cdot 3^{-n}$. The expected value given $2^n$ is then $\frac{-2^{n-1}p + 2^{n}q}{p+q} = -2^{n-3}$. Observe how this weights the negative value three times as heavily as the positive value, but the positive value is only twice as large.
Compare with the argument for switching, which instead computes the expected value as $\frac{1}{2}(-2^{n-1}) + \frac{1}{2}(2^{n}) = 2^{n-2}$, which is positive. It is neglect of the distribution from which $x$ was drawn that leads to this wrong calculation.
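This bookkeeping is easy to check mechanically. Here is a short exact computation (my own sketch, using Python's fractions module; the variable names are mine) for this distribution:

```python
from fractions import Fraction

# Exact bookkeeping for the example distribution P(x = 2**n) = 2 * 3**(-n).
# For each possible content v of the first envelope, accumulate the
# probability of that content and the probability-weighted gain of switching.
N = 40                                 # truncation depth; the tail is negligible
prob, gain = {}, {}
for n in range(1, N + 1):
    p = Fraction(2, 3**n)              # P(smaller amount = 2**n)
    small, large = 2**n, 2**(n + 1)
    # first envelope holds the smaller amount (prob p/2): switching gains +small
    prob[small] = prob.get(small, 0) + p / 2
    gain[small] = gain.get(small, 0) + (p / 2) * small
    # first envelope holds the larger amount (prob p/2): switching loses small
    prob[large] = prob.get(large, 0) + p / 2
    gain[large] = gain.get(large, 0) - (p / 2) * small

cond = {v: gain[v] / prob[v] for v in prob}
print(cond[2], cond[4], cond[8], cond[16])  # 2, -1/2, -1, -2
print(sum(gain.values()))                    # 0: the terms cancel exactly
```

Grouped by the contents of the first envelope, every conditional value beyond the $2 case is negative, yet the unconditional expectation of switching is exactly zero.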
I worked this through for just one distribution, but I expect that a general proof can be given, at least for all distributions for which the expected value of $x$ is finite.
Thanks for offering that solution. It seems appropriate to me. I think that the issue at stake is related to the difference in programming language semantics between a probabilistic and nondeterministic semantics. Once you have decided on a nondeterministic semantics, you can't simply start adding in probabilities and expect it to make sense. So, your solution suggests that we should have grounded the entire problem in a probability distribution, whereas I was saying that, because we hadn't done that, we couldn't legitimately add probabilities into the picture at a later step. I wasn't ruling out the possibility of a solution like yours, and it would indeed be interesting to know whether yours can be generalized in any way. In a prior draft of this post, I actually suggested that we could introduce a random variable before the envelope was chosen (although I hadn't even attempted to work out the details). It was only for the sake of brevity that I omitted suggesting that idea.
My interest is more in the philosophy of language and how language can be deceptive — which is clearly happening in some way in the statement of this problem — and what we can do to guard ourselves against that. What bothers me is that, even when I claimed to have spotted where and how the false step occurred, nobody wanted to believe that I spotted it, or at least they didn't believe that it mattered. That's rather disturbing to me because this problem involves a relatively simple use of language. And I think that humans are in a bit of trouble if we can't even get on the same page about something this simple... because we've got very serious problems right now in regard to A.I. that are much more complicated and tricky to deal with than this one.
But I do like your solution, and I'm glad that it's documented here if nowhere else.
And for anyone who reads this, I apologize if the tone of my post was off-putting. I deliberately chose a slightly provocative title simply to draw attention to this post. I don't mind being corrected if I'm mistaken or have misspoken.
Here's the general calculation.
Take any probability distribution defined on the set of all pairs of values $(x, y)$ where $x$ and $y$ are non-negative reals and $y = 2x$. It can be discrete, continuous, or a mixture.
Let $f$ be the marginal distribution over $x$. This method of defining $f$ avoids the distinction between choosing $x$ and then doubling it, or choosing $y$ and then halving it, or any other method of choosing such that $y = 2x$.
Assume that $f$ has an expected value, denoted by $E$.
The expected value of switching when the amount in the first envelope is in the range $[z, z + dz]$ consists of two parts:
(i) The first envelope contains the smaller amount. This has probability $\frac{1}{2} f(z)\,dz$. The division by 2 comes from the 50% chance of choosing the envelope with the smaller amount.
(ii) The first envelope contains the larger amount. This has probability $\frac{1}{2} f(z/2)\,\frac{dz}{2}$. The extra factor of 2 comes from the fact that when the contents are in an interval of length $dz$, half of that (the amount chosen by the envelope-filler) is in an interval of length $dz/2$.
In the two cases the gain from switching is respectively $+z$ or $-z/2$.
The expected gain given the contents is therefore
$$\frac{\tfrac{1}{2}\,z f(z) - \tfrac{1}{8}\,z f(z/2)}{\tfrac{1}{2}\,f(z) + \tfrac{1}{4}\,f(z/2)}.$$
Multiply this by the probability that the contents lie in $[z, z + dz]$, let $dz$ tend to 0 (eliminating the term in $(dz)^2$) and integrate over the real line:
$$\int_0^\infty \tfrac{1}{2}\,z f(z)\,dz - \int_0^\infty \tfrac{1}{8}\,z f(z/2)\,dz.$$
The first integral is $E/2$. In the second, substitute $w = z/2$ (therefore $dz = 2\,dw$), giving $\int_0^\infty \tfrac{1}{2}\,w f(w)\,dw = E/2$. The two integrals cancel: the expected gain from switching is zero.
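As a sanity check on the cancellation, one can evaluate both pieces numerically for one concrete choice of density (my own sketch; the exponential density with mean 1 is just one example of an $f$ with a finite expected value):

```python
import math

# Numerical check of the cancellation for one concrete marginal density:
# f(z) = exp(-z) for z >= 0 (exponential, mean E = 1).  My choice of f;
# the argument only needs f to have a finite expected value.
def f(z):
    return math.exp(-z)

def integrate(g, a=0.0, b=60.0, steps=200_000):
    """Midpoint rule; plenty accurate for these smooth, decaying integrands."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

first = integrate(lambda z: 0.5 * z * f(z))          # case (i):  gain +z
second = integrate(lambda z: 0.125 * z * f(z / 2))   # case (ii): loss z/2
print(first, second)  # both approximately E/2 = 0.5, so the difference is 0
```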
I was today years old when I learned of the two envelopes problem during one of my not-so-unusual attempts to do a breadth-first search of the entirety of Wikipedia. Below is a summary of the relevant parts of the relevant article. (For your convenience, I omitted some irrelevant details in the "switching argument".)
The Wikipedia article currently has 37 citations and describes a number of proposed solutions to the problem. I would assume that there are some insights in each of the described proposals, but I felt that reading them would most likely distract me with false leads. So, I'm not entirely sure what others have said about the problem. However, my solution is 101 words long. So, if I have to read 1,001 words in order to figure out whether someone else has solved it, then, in a certain sense, they haven't "solved" it. Maybe I haven't either. Maybe there is a 17-syllable haiku that could serve as a solution. But the compression algorithm underlying my uses of language is not, I suspect, haiku-complete.
Preface
The apparent existence of a problem here is a consequence of the way in which we are attempting to explain the apparent existence of a problem. Therefore, we could call it an illusory problem. However, such a label is misleading because even illusory problems can rather quickly turn into problems that are very real when people around us don't understand that the problems are illusory. (If you haven't encountered such a phenomenon in real life, I can assure you that it is by no means a purely hypothetical possibility.) So, our ability to succinctly explain the nature of the unreality of these illusory problems is a skill that can have significant real-world consequences. In other words, contriving such problems and discussing them is not at all a waste of time. If we can explain even one really well, then we'll probably find an anti-pattern that can crop up in multiple scenarios.
My Solution
Statement (2) begins, "The probability that A is...". That's an interesting way to begin that statement, given the fact that the variable A is not a random variable. It was introduced to refer to an already-determined value. We don't know which value that is. However, there are three facts that we do know:
In light of the third fact, we see that statement (2) is false.
Further Analysis
In this scenario, a random variable would indeed need to be introduced in order to model the intuitive, common-sense reasoning that we would use in everyday life. However, we have to introduce the random variable into the problem statement at the point where common sense can still prevail over nonsense.[1]
The fact that we don't know which value it is that A has with a probability of 100% is indeed quite inconvenient. In fact, it means that the variable A is almost surely entirely useless to us for any purpose other than allowing us an opportunity to explain its uselessness. But that was a state of affairs that we created for ourselves. There's no one else to blame. We have to accept responsibility for our own choices and not to confuse them with the choices made by other people, real or hypothetical.[2]
Anti-pattern: Confusing the ignorance that we're trying to model with our own ignorance of how to specify that model or what we've actually specified.
More generally, the poor choices that we've made about how we describe or model problems cannot be overcome simply by adding more probability theory into our reasoning wherever we seem to get stuck.
If you want to know how this could all seem relatively obvious to me (which is not to say that I didn't spend a couple of hours trying to put this explanation into words), I will credit the fact that I spent much of my time in my Ph.D. program using automated proof assistants to check my proofs. If you know anything about such software, then you know that those tools don't cut you much slack. You learn to say what you mean and mean what you say, even if it takes you five times as long to say it. In the process, you also learn something about when and how a person can get fooled.
Then again, who's to say what "common sense" really is? According to Principia Discordia, "Common sense tells us the earth is flat."
I think Jimmy Buffett might have been trying to say something about that in one of his songs: