Below is a message I just got from jackk. Some specifics have been redacted: 1) so that we can discuss general policy rather than the details of this specific case, and 2) because of the presumption of innocence, just in case there happens to be an innocuous explanation for this.
Hi Kaj_Sotala,
I'm Jack, one of the Trike devs. I'm messaging you because you're the moderator who commented most recently. A while back the user [REDACTED 1] asked if Trike could look into retributive downvoting against his account. I've done that, and it looks like [REDACTED 2] has downvoted at least [over half of REDACTED 1's comments, amounting to hundreds of downvotes] ([REDACTED 1]'s next-largest downvoter is [REDACTED 3] at -15).
What action to take is a community problem, not a technical one, so we'd rather leave that up to the moderators. Some options:
1. Ask [REDACTED 2] for the story behind these votes
2. Use the "admin" account (which exists for sending scripted messages, &c.) to apply an upvote to each downvoted post
3. Apply a karma award to [REDACTED 1]'s account. This would fix the karma damage but not the sorting of individual comments
4. Apply a negative karma award to [REDACTED 2]'s account. This makes him pay for false downvotes twice over. This isn't possible in the current code, but it's an easy fix
5. Ban [REDACTED 2]
For future reference, it's very easy for Trike to look at who downvoted someone's account, so if you get questions about downvoting in the future I can run the same report.
If you need to verify my identity before you take action, let me know and we'll work something out.
-- Jack
So... thoughts? I have mod powers, but when I was granted them I was basically just told to use them to fight spam; there was never any discussion of any other policy, and I don't feel I have the authority to decide on a suitable course of action without consulting the rest of the community.
Digging into the paper, I give them an A for effort--they used some interesting methodologies--but there's a serious problem with it that destroys many of its conclusions. Here are the three different measures they used of a post's quality:

q: a machine-learned quality score, computed entirely from bigram counts
p: the post's community rating
q': quality as scored by human judges
q is the measure they used for most of their conclusions. Note that it is supposed to represent quality, but is based entirely on bigrams. This doesn't pass the sniff test. Whatever q measures, it isn't quality. At best it's grammaticality. It is more likely a prediction of rating based on the user's identity (individuals have identifiable bigram counts) or politics ("liberal media" and "death tax" vs. "pro choice" and "hate crime").
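To make that concrete, here is a minimal sketch of what a bigram-based rating predictor looks like. Everything in it--the posts, the ratings, and the choice of ridge regression over raw bigram counts--is made up for illustration; I don't know the paper's actual pipeline:

```python
# Illustrative sketch (not the paper's code): predict a post's rating
# from nothing but its bigram counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

posts = [
    "the death tax hurts small business owners",
    "hate crime legislation protects vulnerable groups",
    "i think this post makes a fair point",
    "this is obviously wrong and you know it",
]
ratings = [0.8, 0.7, 0.5, 0.2]  # hypothetical community ratings (p)

# Represent each post purely by its bigram counts -- no content understanding.
vectorizer = CountVectorizer(ngram_range=(2, 2))
X = vectorizer.fit_transform(posts)

# Fit a regression from bigram counts to ratings; its predictions play
# the role of q.
model = Ridge().fit(X, ratings)
q = model.predict(X)
print(dict(zip(posts, q.round(2))))
```

The point is that the only features such a model ever sees are phrase counts, so anything it learns about "quality" is really about vocabulary--which is exactly how author identity and politics leak in.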
q is a prediction for p. p is a proxy for q'. There is no direct connection between q' and q -- no reason to think they will have any correlation not mediated by p.
R-squared values:
First, the R-squared between q' (quality scores by judges) and p (community rating) is 0.12. That's crap. It means that votes are almost unrelated to post quality.
Next, the strongest correlation is between q and q'. But the R-squared between q and p is only 0.04, and since the only causal path from q to q' runs through p, the correlations can at best multiply along that chain: the maximum possible causal R-squared between q and q' is 0.04 * 0.12 = 0.0048.
That means that q, the machine-learned prediction they use for their study, has an acausal correlation with q', post quality, that is 50 times stronger than the causal correlation (an observed R-squared of roughly 0.24 against a causal ceiling of 0.0048).
In other words, all their numbers are bullshit. They aren't produced by post quality, nor by user voting patterns. There is something wrong with how they've processed their data that has produced an artifactual correlation.
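As a sanity check on the multiplication step above, here is a quick simulation of a linear chain q -> p -> q'. The coefficients are hand-picked (not taken from the paper) so that the two R-squared values land near 0.04 and 0.12; the point is just that when p fully mediates, the endpoint R-squared is the product of the R-squareds along the path:

```python
# Simulate a linear chain q -> p -> q' in which p fully mediates the
# relationship between q and q'. Coefficients are hand-tuned so that
# R^2(q, p) ~ 0.04 and R^2(p, q') ~ 0.12, matching the argument above.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

q = rng.normal(size=n)
p = 0.204 * q + rng.normal(size=n)        # gives R^2(q, p)  ~ 0.04
q_prime = 0.362 * p + rng.normal(size=n)  # gives R^2(p, q') ~ 0.12

def r2(a, b):
    """Squared Pearson correlation between two samples."""
    return np.corrcoef(a, b)[0, 1] ** 2

print(f"R^2(q, p)  = {r2(q, p):.4f}")        # ~0.04
print(f"R^2(p, q') = {r2(p, q_prime):.4f}")  # ~0.12
print(f"R^2(q, q') = {r2(q, q_prime):.4f}")  # ~0.0048 = 0.04 * 0.12
```

Any R-squared between q and q' much above 0.0048--like the one the paper reports--therefore can't be coming through p, which is the only causal route available.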