If it ever turns out that Bayes fails - receives systematically lower rewards on some problem, relative to a superior alternative, in virtue of its mere decisions - then Bayes has to go out the window.
-- Eliezer Yudkowsky, Newcomb's Problem and Regret of Rationality
Don't worry, we don't have to abandon Bayes' theorem yet. But changing it slightly seems to be the winning Way in certain circumstances. See below:
In Peter Norvig’s talk The Unreasonable Effectiveness of Data, starting at 37:42, he describes a translation algorithm based on Bayes’ theorem. Pick the English word that has the highest posterior probability as the translation. No surprise here. Then at 38:16 he says something curious.
So this is all nice and theoretical and pure, but as well as being mathematically inclined, we are also realists. So we experimented some, and we found out that when you raise that first factor [in Bayes' theorem] to the 1.5 power, you get a better result.
In other words, if we change Bayes' theorem (!), we get a better result. He goes on to explain why in the talk.
Link: johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/
Peter Norvig - The Unreasonable Effectiveness of Data
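To make the tweak concrete, here is a minimal sketch of the selection rule Norvig describes: score each candidate English word e by the Bayes numerator P(e) · P(f|e), versus the "monkeyed" version P(e)^1.5 · P(f|e). The words and probabilities below are made up purely for illustration; this is not Norvig's actual model or data.

```python
def best_translation(candidates, exponent=1.0):
    """Pick the candidate e maximizing P(e)**exponent * P(f|e).

    candidates: dict mapping word -> (prior P(e), likelihood P(f|e)).
    exponent=1.0 is standard Bayes; Norvig's tweak uses 1.5.
    """
    return max(candidates,
               key=lambda e: candidates[e][0] ** exponent * candidates[e][1])

# Hypothetical toy numbers: a common word with a so-so likelihood
# versus a rarer word that fits the foreign word better.
candidates = {
    "house": (0.10, 0.20),  # common word, moderate likelihood
    "hovel": (0.03, 0.90),  # rarer word, high likelihood
}

print(best_translation(candidates))                # standard Bayes -> "hovel"
print(best_translation(candidates, exponent=1.5))  # tweaked       -> "house"
```

Raising the prior to a power greater than 1 amplifies its influence, so the tweaked rule leans harder toward common words, which is exactly the effect the comment below speculates about.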
Two possible explanations come to mind:

1. They are defining "better" as something other than higher expected value of the number of correctly translated words.
2. The probabilities they're using are biased, and this reverses the bias.
I haven't watched the video, but are they using expected value at all, or are they just picking the most likely word? Accidentally using a non-optimal common word seems like it would produce a better translation than accidentally using a non-optimal uncommon word, so this effect might just be making their algorithm behave more like expected utility maximization and less like raw probability maximization.