Two or three months ago, a trip to Las Vegas got me pondering the following: if every gamble in the casino has a negative expected value, why do people still gamble - especially friends of mine who are fairly well-versed in probability and statistics?
Suffice it to say, I still have not answered that question.
On the other hand, it did lead me to think more about whether rational behavior always consists in making the choice with the highest (or at least a positive) expected value - call this the Rationality-Expectation (R-E) hypothesis.
Here I'd like to offer some counterexamples that, to me at least, show R-E is clearly false. (In hindsight these look fairly trivial, but some commenters on this site speak as if maximizing expectation were somehow constitutive of rational decision-making - as I used to. So it may be interesting to those people at the very least.)
- Suppose someone offers you a (single-trial) gamble A in which you stand to gain 100k dollars with probability 0.99 and stand to lose 100M dollars with probability 0.01. Even though the expectation is -901,000 dollars, you should still take the gamble, since the probability of winning on a single trial is very high - 0.99, to be exact.
- Suppose someone offers you a (single-trial) gamble B in which you stand to lose 100k dollars with probability 0.99 and stand to gain 100M dollars with probability 0.01. Even though the expectation is +901,000 dollars, you should not take the gamble, since the probability of losing on a single trial is very high - 0.99, to be exact.
A is a gamble showing that a choice with negative expectation can sometimes lead to a net payoff.
B is a gamble showing that a choice with positive expectation can sometimes lead to a net loss.
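For concreteness, here is a quick check of the two expectations - a minimal sketch in Python, using only the payoffs and probabilities stated above:

```python
# Quick check of the expected values of gambles A and B (a minimal sketch;
# the payoffs and probabilities are the ones stated in the post).

def expected_value(outcomes):
    """outcomes: list of (payoff, probability) pairs."""
    return sum(payoff * p for payoff, p in outcomes)

gamble_A = [(100_000, 0.99), (-100_000_000, 0.01)]
gamble_B = [(-100_000, 0.99), (100_000_000, 0.01)]

print(expected_value(gamble_A))  # ≈ -901,000
print(expected_value(gamble_B))  # ≈ +901,000
```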
As I'm sure you've all noticed, expectation is only meaningful in decision-making when the number of trials in question can be large (more precisely, large enough relative to the variance of the random variable in question). This, I think, is essentially another way of looking at the Weak Law of Large Numbers.
In general, most (all? few?) statistical concepts make sense only when we have enough trials relative to the variance of the quantities in question.
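To make that concrete, here is a minimal simulation sketch of gamble B (the trial counts below are just illustrative choices of mine): a single trial usually just loses 100k dollars, but the running average over many trials settles near the +901,000-dollar expectation.

```python
# A quick simulation of gamble B (a sketch; the trial counts are
# illustrative).  A single trial usually loses 100k dollars, but the
# average payoff over many trials approaches the expectation of +901,000.
import random

def play_B():
    """One trial of gamble B: win 100M with probability 0.01, else lose 100k."""
    return 100_000_000 if random.random() < 0.01 else -100_000

random.seed(0)
for n in (1, 100, 10_000, 1_000_000):
    average = sum(play_B() for _ in range(n)) / n
    print(f"{n:>9} trials: average payoff {average:,.0f}")
```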
Nonetheless, this leads me to ponder a deeper question.
Does it make sense to speak of probabilities only when we have enough trials? Can we speak of the probability of a singular, non-repeating event?
Do you think the chain of reasoning is infinite? For actual humans there is certainly some point below which a prior no longer feels like the output of further computation, although such beliefs could have been influenced by earlier observations, either subconsciously or consciously with the fact later forgotten. Especially in the former case, I think the reasoning leading to such beliefs is very likely to be flawed, so it seems fair to treat them as genuine priors, even if, strictly speaking, they were physically influenced by evidence.
A perfect Bayesian, on the other hand, should be immune to flawed reasoning, but it still has to be finite, so I suppose it must have some genuine priors that are part of its immutable hardware. I imagine it by analogy with formal systems, which have a finite set of axioms (or an infinite set defined by finitely many conditions), a finite set of derivation rules, and a set of theorems consisting of the axioms and the statements derived from them. For a Bayesian, the axioms are replaced by a handful of statements with associated priors, Bayes' theorem is among the derivation rules, and instead of a set of theorems it has a set of encountered statements with attached probabilities. Possible issues are:
Not infinite, but for humans all priors (or at least their non-strict-Bayesian equivalents) ultimately derive either from sensory input over the individual's lifetime or from millions of years of evolution baking 'hard-coded' priors into the human brain.
When dealing with any particular question, you essentially draw a somewhat arbitrary line: you lump millions of years of accumulated sensory input and evolutionary 'learning' together with a lifetime of actual learning, assign a single real number to it, and call it a 'prior' - but this is just a way of making the calculation tractable.
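To make the 'arbitrary line' point concrete, here is a minimal sketch with made-up numbers: Bayes' theorem is the only derivation rule, and whatever probability the last update left behind simply gets relabeled as the prior for the next question.

```python
# A minimal sketch (made-up numbers) of the point above: the "prior" you
# bring to a question is just whatever probability the last update left
# you with, whether that update happened consciously, subconsciously, or
# over evolutionary time.

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return P(H | E) given P(H), P(E | H), and P(E | not H)."""
    numerator = likelihood_if_true * prior
    return numerator / (numerator + likelihood_if_false * (1 - prior))

p = 0.5                          # a "genuine" starting prior (hypothetical)
p = bayes_update(p, 0.8, 0.3)    # evidence encountered years ago
p = bayes_update(p, 0.6, 0.4)    # evidence encountered yesterday

# When today's question comes up, p (~0.8) is simply treated as "the
# prior", even though it is really the residue of earlier updates.
print(p)
```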