[Link] Better results by changing Bayes’ theorem

XiXiDu

If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.

-- Eliezer Yudkowsky, Newcomb's Problem and Regret of Rationality

Don't worry, we don't have to abandon Bayes’ theorem yet. But changing it slightly seems to be the winning Way given certain circumstances. See below:

In Peter Norvig’s talk The Unreasonable Effectiveness of Data, starting at 37:42, he describes a translation algorithm based on Bayes’ theorem. Pick the English word that has the highest posterior probability as the translation. No surprise here. Then at 38:16 he says something curious.

So this is all nice and theoretical and pure, but as well as being mathematically inclined, we are also realists. So we experimented some, and we found out that when you raise that first factor [in Bayes' theorem] to the 1.5 power, you get a better result.

In other words, if we change Bayes’ theorem (!) we get a better result. He goes on to explain

Link: johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/

Peter Norvig - The Unreasonable Effectiveness of Data

If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.

-- Eliezer Yudkowsky, Newcomb's Problem and Regret of Rationality

Don't worry, we don't have to abandon Bayes’ theorem yet. But changing it slightly seems to be the winning Way given certain circumstances. See below:

In Peter Norvig’s talk The Unreasonable Effectiveness of Data, starting at 37:42, he describes a translation algorithm based on Bayes’ theorem. Pick the English word that has the highest posterior probability as the translation. No surprise here. Then at 38:16 he says something curious.

So this is all nice and theoretical and pure, but as well as being mathematically inclined, we are also realists. So we experimented some, and we found out that when you raise that first factor [in Bayes' theorem] to the 1.5 power, you get a better result.

In other words, if we change Bayes’ theorem (!) we get a better result. He goes on to explain

Link: johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/

Peter Norvig - The Unreasonable Effectiveness of Data

Bayes is optimal if you throw all your knowledge into the equation. That's practically infeasible, so this program throws away most of the data first, and then applies a heuristic to the remaining data. There's no guarantee that applying Bayes' Rule directly to the remaining data will outperform another heuristic, just that both would be outperformed by running the ideal version of Bayes (and including everything we know about grammar, among other missing data).

I don't follow. The heuristic isn't using any of the thrown-away data, its just using the same data they used Bayes on. That is to say, someone who had only the information that Norvig actually used would be able to apply Bayes using all their knowledge or they could use this heuristic with all their knowledge and the heuristic would come out better.

This could possibly be explained if the heuristic also embodied some background information that allows us to correct for overconfidence if P(f|e) that isn't explicitly mentioned in the data, as suggested by DanielVarga, or if the heuristic was effectively preforming an expected utility calculation, as suggested in my other comment.