DanielVarga comments on [Link] Better results by changing Bayes’ theorem - Less Wrong Discussion
Excellent explanation. I would add that the source of this overconfidence is not a mystery at all. Models for estimating Pr(f|e) are so ridiculously simplistic that a layperson would laugh us out of the room if we explained them to her in plain English instead of formulas. For example, Pr(f|e) was sometimes defined as the probability that we can produce f from e by first applying a randomly chosen lexicon translation to each word of e, and then applying a random local reordering of the words. Here the whole responsibility of finding a reordering that leads to a grammatical English sentence rests on the shoulders of Pr(e). It's almost like the translation model spits out a bag of words, and the language model has to assemble them into a chain of words. (The above simple example is far from being state of the art, but the actual state of the art is not that much more realistic either.)
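To make the "bag of words" picture concrete, here is a minimal Python sketch of a language model doing all the ordering work on its own. Everything in it is an assumption made up for illustration: the bigram table `BIGRAM_LOGPROB`, the floor `UNSEEN_LOGPROB`, and the helpers `lm_logprob` and `assemble` are hypothetical and do not come from any real system. The translation model is reduced to handing over an unordered list of English words, and a toy stand-in for Pr(e) picks the ordering it scores highest.

```python
import math
from itertools import permutations

# Hypothetical toy bigram log-probabilities, invented for illustration only.
BIGRAM_LOGPROB = {
    ("<s>", "the"): math.log(0.6),
    ("the", "cat"): math.log(0.5),
    ("cat", "sleeps"): math.log(0.4),
    ("sleeps", "</s>"): math.log(0.7),
}
# Crude floor for any bigram missing from the table (an assumption, not real smoothing).
UNSEEN_LOGPROB = math.log(1e-4)


def lm_logprob(words):
    """Toy stand-in for log Pr(e): sum of bigram log-probabilities."""
    tokens = ["<s>", *words, "</s>"]
    return sum(
        BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
        for pair in zip(tokens, tokens[1:])
    )


def assemble(bag):
    """Return the ordering of the bag that the toy language model scores highest."""
    return max(permutations(bag), key=lm_logprob)


# The "translation model" has only supplied the words, in no useful order.
best = assemble(["sleeps", "the", "cat"])
print(" ".join(best))
```

The search here is brute force over all permutations, which only works for a handful of words; it is meant to show where the burden falls, not how a real decoder searches.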