[Link] Better results by changing Bayes’ theorem

XiXiDu

If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.

-- Eliezer Yudkowsky, Newcomb's Problem and Regret of Rationality

Don't worry, we don't have to abandon Bayes’ theorem yet. But changing it slightly seems to be the winning Way given certain circumstances. See below:

In Peter Norvig’s talk The Unreasonable Effectiveness of Data, starting at 37:42, he describes a translation algorithm based on Bayes’ theorem. Pick the English word that has the highest posterior probability as the translation. No surprise here. Then at 38:16 he says something curious.

So this is all nice and theoretical and pure, but as well as being mathematically inclined, we are also realists. So we experimented some, and we found out that when you raise that first factor [in Bayes' theorem] to the 1.5 power, you get a better result.

In other words, if we change Bayes’ theorem (!) we get a better result. He goes on to explain

Link: johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/

Peter Norvig - The Unreasonable Effectiveness of Data

If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.

-- Eliezer Yudkowsky, Newcomb's Problem and Regret of Rationality

Don't worry, we don't have to abandon Bayes’ theorem yet. But changing it slightly seems to be the winning Way given certain circumstances. See below:

In Peter Norvig’s talk The Unreasonable Effectiveness of Data, starting at 37:42, he describes a translation algorithm based on Bayes’ theorem. Pick the English word that has the highest posterior probability as the translation. No surprise here. Then at 38:16 he says something curious.

So this is all nice and theoretical and pure, but as well as being mathematically inclined, we are also realists. So we experimented some, and we found out that when you raise that first factor [in Bayes' theorem] to the 1.5 power, you get a better result.

In other words, if we change Bayes’ theorem (!) we get a better result. He goes on to explain

Link: johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/

Peter Norvig - The Unreasonable Effectiveness of Data

Excellent explanation. I would add that the source of this overconfidence is not a mystery at all. Models for estimating Pr(f|e) are so ridiculously simplistic that a layperson would laugh us out if we explained them to her in plain English instead of formulas. For example, P(f|e) was sometimes defined as the probability that we can produce f from e by first applying a randomly chosen lexicon translation for each word of e, and then do a random local reordering of words. Here the whole responsibility of finding a random reordering that leads to a grammatical English sentence rests on the shoulders of Pr(e). It's almost like the translation model spits out a bag of words, and the language model has to assemble them into a chain of words. (The above simple example is far from being state of the art, but actual state of the art it is not that much more realistic either.)