I would really appreciate a very brief statement of your conclusions. My apologies, but I don't feel like shoveling through your analysis just to find out whether there is an effect, a weak effect, a backwards effect, or whatever.
Just skip the intro, R-code and graphs (too heavy on math).
Question 1:
Is there a difference in karma between posts that received a negative initial comment and those that received a positive initial comment? (Any difference suggests that one or both is having an effect.)
Conclusion 1:
The difference in means has shrunk but not gone away; it’s large enough that 10% of the possible effect sizes (of "a negative initial comment rather than positive") may be zero or actually be positive (increase karma) instead. This is a little concerning, but I don’t take this too seriously:
- this is not a lot of data
- as we’ve seen there are extreme outliers suggesting that the assumptions of normality may be badly wrong
- even at face value, 10 karma points doesn’t seem like it’s large enough to have any important real-world consequences (like make people leave LW who should’ve stayed)
Question 2:
Is there a difference in karma between the two kinds of initial comments, as I began to suspect during the experiment?
Conclusion 2:
As one would hope, neither group of comments ends up with net positive mean score, but they’re clearly being treated very differently: the negative comments get downvoted far more than the positive comments. I take this as perhaps implying that LW’s reputation for being negative & hostile is a bit overblown: we’re negative and hostile to poorly thought out criticisms and arguments, not fluffy praise.
tl;dr: maybe
I'm not sure to what extent these comments can be modeled as expressing a "positive" or a "negative" reaction; the nonsensical one-line explanations made them mostly "insane" reactions (in my perception), which might overshadow the intended interpretation. It might have been a cleaner test if there were no explanations, or if you made an effort to carefully rationalize the random judgments (although that would be a more significant interference).
It's a "damned if you do, damned if you don't" sort of dilemma.
I know from watching them plummet into oblivion that comments which are just "Upvoted" or "Downvoted" are not a good idea for any anchoring question - they'll quickly be hidden, so any effect size will be a lot smaller than usual, and it's possible that hidden comments themselves anchor (my guess: negatively, by making people think "why is this attracting stupid comments?").
While if you go with more carefully rationalized comments, that's sort of like http://xkcd.com/810/ and starts to draw on the experimenter's own strengths & weaknesses (I'm sure I could make both quality criticisms and praises of psychology-related articles, but not so much technical decision theory articles).
I hoped my strategy would be a golden mean: not so trivial as to be downvoted into oblivion, but not so high-quality and individualized that comparability was lost. I think I came close, since the positive comments saw only a small net downvote, indicating LWers may not have regarded them as good enough to upvote but also not so obviously bad as to merit a downvote.
(Of course, I didn't expect the positive and negative comments to be treated differently - they're pretty much the same thing, with a negation. I'm not sure how I would have designed it differently if I had known about the double-standard in advance.)
Of course, I didn't expect the positive and negative comments to be treated differently
(Positive and somewhat stupid comments tend to be upvoted back to 0 even after they get downvoted at some point, so it's not just absence of response. I consider it a dangerous vulnerability of LW to poorly thinking but socially conforming participants, whose active participation should be discouraged, but who are instead mildly rewarded.)
I consider it a dangerous vulnerability of LW to poorly thinking but socially conforming participants, whose active participation should be discouraged, but who are instead mildly rewarded.
It's a huge problem that I have observed eroding quality of thought and discussion over time. I'm relieved to see others acknowledge it.
A respected member saying "I know, right?" as you just did is valuable evidence, whereas the same from a no-name poster is noise. The naive reaction risks forming cliques with mutual back-scratching from big names.
Full disclosure: That kind of fluff is how I got most of my karma.
Possible model extensions:
Does BEST allow you to add prior information?
You might try adding a prior over the effect size, it would be surprising if it was huge. For example, -30 seems implausibly large to me.
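To make the suggestion concrete, here is a minimal sketch of what a prior over the effect size does, assuming a normal prior and a normal estimate (an assumption made for simplicity, and one the thread itself questions). The function name and every number below are hypothetical illustrations, not values from the analysis.

```python
import math

def shrink_with_normal_prior(estimate, std_error, prior_mean=0.0, prior_sd=10.0):
    """Combine a normal prior on the effect size with a normal estimate.

    Standard conjugate normal-normal update: precisions (1 / variance) add,
    and the posterior mean is the precision-weighted average.
    Returns (posterior_mean, posterior_sd).
    """
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    posterior_precision = prior_precision + data_precision
    posterior_mean = (
        prior_precision * prior_mean + data_precision * estimate
    ) / posterior_precision
    posterior_sd = math.sqrt(1.0 / posterior_precision)
    return posterior_mean, posterior_sd

# Hypothetical inputs: an observed difference of -10 karma with a standard
# error of 10, and a prior centered on "no effect" with sd 10, under which
# an effect of -30 sits three prior sds out.
mean, sd = shrink_with_normal_prior(estimate=-10.0, std_error=10.0)
print(mean, sd)
```

With a prior and an estimate of equal precision, the posterior mean lands halfway between them; a tighter prior would pull the estimate further toward zero.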
You could also add priors for the group means. You have some pretty good prior information here since there are lots of other posts.
It would be interesting to look at the distribution of post karma. That might be kind of informative, perhaps it would be better to do the analysis on something like a log scale? Obviously it can't be exactly that since there are negative values...
Does BEST allow you to add prior information?
Supposedly you can add it but you'd have to edit the source, and that's beyond me right now.
You might try adding a prior over the effect size, it would be surprising if it was huge. For example, -30 seems implausibly large to me.
Sure, but the normal distribution is the wrong distribution to be using in the first place. I'm not really sure what... an exponential, maybe?
You could also add priors for the group means. You have some pretty good prior information here since there are lots of other posts. It would be interesting to look at the distribution of post karma.
You'd need the post karma in the first place. Offhand, I don't know any way to get it other than scraping thousands of pages...
perhaps it would be better to do the analysis on something like a log scale? Obviously it can't be exactly that since there are negative values...
Run the log on the absolute value and negate.
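A minimal sketch of that transform, with one deviation that is my own assumption: it uses log(1 + |x|) rather than log(|x|), because a plain log is undefined at a karma of 0 and flips sign for magnitudes below 1. The function names are hypothetical.

```python
import math

def signed_log(karma):
    """Return sign(karma) * log(1 + |karma|).

    Taking the log of the absolute value and restoring the sign keeps
    negative scores negative while compressing large magnitudes.
    log1p is used (an assumption, not the literal suggestion) so that
    a score of 0 maps to 0 instead of being undefined.
    """
    magnitude = math.log1p(abs(karma))
    return math.copysign(magnitude, karma)

def signed_log_inverse(value):
    """Undo signed_log: sign(value) * (exp(|value|) - 1)."""
    magnitude = math.expm1(abs(value))
    return math.copysign(magnitude, value)

# Hypothetical post scores, including a negative one and a large outlier.
scores = [-12, 0, 3, 25, 400]
transformed = [signed_log(s) for s in scores]
print(transformed)
```

The transform is monotonic and symmetric around zero, so group comparisons keep their direction while outliers carry less weight.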
You can look at the RSS feed for some post category and extract the votes; they're near the beginning of the description section.
I did something similar here: http://pipes.yahoo.com/pipes/pipe.info?_id=a80f45dca206abea2297138e722f9b10
Full writeup on gwern.net at http://www.gwern.net/Anchoring