You've already surmised why rot13 words are undesirable. Just to check, are you suggesting I use n-gram frequency to identify rot13 words, or replace TF-IDF with some sort of n-gram frequency metric instead?
You could use TF-IDF on n-grams. That's what I was thinking. But when I said to combine the local n-gram frequencies with the global (n+1)-gram frequencies to get a prediction of local (n+1)-gram frequencies to compare against, you might say it's too complicated to keep calling it TF-IDF.
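For concreteness, here is a rough sketch of what plain TF-IDF on word n-grams could look like. It is only one reading of the suggestion: the function names (`ngrams`, `tfidf_ngrams`), the whitespace tokenization, and the unsmoothed `log(N / df)` weighting are all assumptions made for illustration, not anything specified in this thread, and it does not attempt the more elaborate local/global prediction idea.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_ngrams(docs, n=2):
    """Score every word n-gram in every document with unsmoothed TF-IDF.

    `docs` is a list of strings (for example, one string per user).
    Returns one dict per document mapping n-gram tuple -> score.
    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    grams_per_doc = [Counter(ngrams(doc.lower().split(), n)) for doc in docs]

    # Document frequency: in how many documents does each n-gram appear?
    doc_freq = Counter()
    for counts in grams_per_doc:
        doc_freq.update(counts.keys())

    num_docs = len(docs)
    scores = []
    for counts in grams_per_doc:
        total = sum(counts.values())
        scores.append({
            gram: (count / total) * math.log(num_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    for doc_scores in tfidf_ngrams(["the cat sat", "the cat ran"]):
        print(doc_scores)
```

An n-gram that shows up in every document gets a score of zero under this weighting, so only the n-grams that distinguish one document from the rest stand out.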
If all you want to do is recognize rot13 words, then a dictionary and/or bigram frequencies sound pretty reasonable. But don't just eliminate rot13 words from the top 11 list; also include some kind of score of how much people use rot13. For example, you could use t...
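A minimal sketch of the dictionary approach, with a usage score alongside it, might look like the following. The helper names (`is_probably_rot13`, `rot13_usage_score`), the tiny word set, and the choice of "fraction of alphabetic tokens flagged" as the score are assumptions for illustration only; a real run would need a proper word list, and the bigram-frequency variant is not shown.

```python
import codecs


def is_probably_rot13(word, dictionary):
    """Flag a word that is not a dictionary word but decodes to one."""
    w = word.lower()
    return w not in dictionary and codecs.encode(w, "rot_13") in dictionary


def rot13_usage_score(text, dictionary):
    """Return the fraction of alphabetic tokens in `text` that look like rot13.

    This is one hypothetical scoring choice; other normalizations
    (per comment, per user) would work the same way.
    """
    tokens = [t.strip(".,;:!?\"'()") for t in text.split()]
    tokens = [t for t in tokens if t.isalpha()]
    if not tokens:
        return 0.0
    flagged = sum(is_probably_rot13(t, dictionary) for t in tokens)
    return flagged / len(tokens)


if __name__ == "__main__":
    # Toy dictionary; a real one would be a full word list (assumption).
    words = {"hello", "world", "the"}
    print(is_probably_rot13("uryyb", words))
    print(rot13_usage_score("the uryyb jbeyq", words))
```

Words that are already in the dictionary are never flagged, so pairs that happen to be rot13 images of each other (like "be" and "or") are left alone.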
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.