I haven't watched the video, but are they using expected value at all or are they just using the most likely word? Accidentally using a nonoptimal common word seems like it would produce a better translation than accidentally using a nonoptimal uncommon word, so this effect might just be making their algorithm more like expected utility and less like raw probabilities.
-- Eliezer Yudkowsky, Newcomb's Problem and Regret of Rationality
Don't worry, we don't have to abandon Bayes’ theorem yet. But changing it slightly seems to be the winning Way under certain circumstances. See below:
Link: johndcook.com/blog/2012/03/09/monkeying-with-bayes-theorem/
Peter Norvig - The Unreasonable Effectiveness of Data
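
For anyone who wants the idea in concrete form, here is a minimal sketch of what "changing it slightly" could look like. As I recall the linked post, the tweak is to raise the prior (the language model probability) to a power greater than 1 before multiplying by the likelihood. The exponent 1.5 is the figure I remember, so treat it as an assumption. The words, the numbers, and the function name `best_translation` are all invented for illustration and are not taken from the talk.

```python
# Toy candidates for translating one source word.
# All probabilities are made up for illustration only.
# word: (P(source | word), P(word))
candidates = {
    "house": (0.30, 0.020),     # common word, weaker match
    "domicile": (0.60, 0.012),  # uncommon word, stronger match
}


def best_translation(candidates, prior_exponent=1.0):
    """Return the candidate maximizing likelihood * prior ** prior_exponent.

    prior_exponent=1.0 is the plain Bayes posterior (up to normalization).
    prior_exponent>1.0 is the hypothetical tweak: it gives extra weight
    to words that are common in the target language.
    """
    def score(word):
        likelihood, prior = candidates[word]
        return likelihood * prior ** prior_exponent

    return max(candidates, key=score)


plain = best_translation(candidates, prior_exponent=1.0)
tweaked = best_translation(candidates, prior_exponent=1.5)
print(plain, tweaked)
```

With these toy numbers, plain Bayes picks the uncommon word and the tweaked score picks the common one. That is the direction the quote at the top speculates about: if wrongly choosing a common word costs less than wrongly choosing an uncommon one, then overweighting the prior may act as a rough stand-in for maximizing expected utility rather than raw probability.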