PhilGoetz comments on [meta] Policy for dealing with users suspected/guilty of mass-downvote harassment? - Less Wrong Discussion
Digging into the paper, I give them an A for effort--they used some interesting methodologies--but there's a serious problem with it that destroys many of its conclusions. Here are three different measures they used of a post's quality:

- q: a machine-learned prediction of a post's rating, based entirely on bigrams
- q': quality scores assigned by human judges
- p: the community rating (votes)
q is the measure they used for most of their conclusions. Note that it is supposed to represent quality, but is based entirely on bigrams. This doesn't pass the sniff test. Whatever q measures, it isn't quality. At best it's grammaticality. It is more likely a prediction of rating based on the user's identity (individuals have identifiable bigram counts) or politics ("liberal media" and "death tax" vs. "pro choice" and "hate crime").
q is a prediction for p. p is a proxy for q'. There is no direct connection between q' and q -- no reason to think they will have any correlation not mediated by p.
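The mediation claim can be checked numerically. Below is a minimal sketch (my own toy model, not anything from the paper) of a linear-Gaussian chain in which p fully mediates the q–q' relationship; the variable names and the 0.12 and 0.04 figures are taken from the comment, and everything else is assumed. Under this model, correlations along the chain multiply, so the R-squared values multiply too:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy causal chain q' -> p -> q, with p fully mediating q and q'.
# Coefficients chosen so that R^2(p, q') ~ 0.12 and R^2(q, p) ~ 0.04,
# the figures quoted in the comment.
qprime = rng.normal(size=n)                                        # q': judge scores
p = np.sqrt(0.12) * qprime + np.sqrt(0.88) * rng.normal(size=n)    # p: community rating
q = np.sqrt(0.04) * p + np.sqrt(0.96) * rng.normal(size=n)         # q: bigram prediction

def r2(a, b):
    """Squared Pearson correlation between two samples."""
    return np.corrcoef(a, b)[0, 1] ** 2

print(r2(p, qprime))   # ~ 0.12
print(r2(q, p))        # ~ 0.04
print(r2(q, qprime))   # ~ 0.12 * 0.04 ~ 0.0048
```

The point of the simulation: if the only path from q to q' runs through p, the measured R-squared between q and q' cannot exceed the product of the two link R-squareds, which is what the argument below relies on.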
R-squared values:
First, the R-squared between q' (quality scores from judges) and p (community rating) is 0.12. That's crap. It means that votes are almost unrelated to post quality.
Next, the strongest correlation is between q and q', but the maximum possible causal R-squared between them is 0.04 * 0.12 = 0.0048 (the R-squared of q with p, times the R-squared of p with q'), because there is no causal connection between them except p.
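The arithmetic of that bound, spelled out (the 0.04 and 0.12 are the figures quoted above; the conversion back to a plain correlation is my own added step, just to show how small the ceiling is on the correlation scale):

```python
import math

r2_q_p = 0.04       # R^2 between q (bigram prediction) and p (community rating)
r2_p_qprime = 0.12  # R^2 between p and q' (judge scores)

# With p the only causal path between q and q', correlations along the
# chain multiply, so the R^2 values multiply as well.
max_causal_r2 = r2_q_p * r2_p_qprime
max_causal_r = math.sqrt(max_causal_r2)

print(round(max_causal_r2, 4))  # 0.0048
print(round(max_causal_r, 3))   # 0.069
```

So on the correlation scale, the most the causal chain can deliver is about r = 0.07, which is essentially noise.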
That means that q, the machine-learned prediction they use for their study, has an acausal correlation with q', post quality, that is 50 times stronger than the causal correlation.
In other words, all their numbers are bullshit. They aren't produced by post quality, nor by user voting patterns. There is something wrong with how they've processed their data that has produced an artifactual correlation.