LW anchoring experiment: maybe

14 Post author: gwern 23 January 2013 10:41PM

I ran an informal experiment testing whether LessWrong karma scores are susceptible to a form of anchoring based on the first comment posted. A medium-large effect size is found, but the data do not fit the assumed normal distribution, so there may or may not be an anchoring effect. Full writeup on gwern.net.

It has been suggested that the top-scoring articles tend to benefit from an initially positive reaction in comments. Such an anchoring or social proof effect resulting in a first-mover advantage seems quite plausible to me.

Design

So on 27 February 2012, I registered the account Rhwawn. I made some quality comments and upvotes to seed the account as a legitimate active account.

Thereafter, whenever I wrote an Article or Discussion, after making it public, I flipped a coin and if Heads, I posted a comment as Rhwawn saying “Upvoted” or if Tails, a comment saying “Downvoted”. (Grognor said that the comments came with reasons, but unfortunately if I came up with reasons for either comment, some criticisms or praise would be better than others and this would be another source of variability; I settled on adding some generic comments, see the full writeup.) Needless to say, no actual vote was made. I then made a number of quality comments and votes on other Articles/Discussions to camouflage the experimental intervention. (In no case did I upvote or downvote someone I had already replied to or voted on with my Gwern account.) Finally, I scheduled a reminder on my calendar for 30 days later to record the karma on that Article/Discussion.

I don’t post that often, so I decided to stop after 1 year, on 27 February 2013. I wound up breaking this decision: by September I had ceased to find it an interesting question, it was an unfinished task that was burdening my mind, and the necessity of making some genuine contributions as Rhwawn to cloak an anchoring comment was a not-so-trivial inconvenience that was stopping me from posting.
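The coin flip described above amounts to simple randomization of each post into one of two arms. A minimal sketch in R, assuming a fair coin; the helper name `assign_anchor` and the seed are hypothetical and only there to make the illustration reproducible, and the actual assignments were made with a physical coin, not this code:

```r
# Hypothetical helper: choose the anchoring comment for a newly published post.
# sample() with size = 1 picks either label with probability 1/2.
assign_anchor <- function() {
  sample(c("Upvoted", "Downvoted"), size = 1)
}

set.seed(2012)  # arbitrary seed, only so the sketch is repeatable
assignments <- replicate(17, assign_anchor())  # 17 = number of posts in the data table

# Tabulate how many posts landed in each arm
print(table(assignments))
```

With so few posts the two arms will rarely be exactly balanced, which is one reason the group sizes in the data differ.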

I also asked some LWers who posted often if they wanted to help; I thank them for their assistance.

 

And the ethics?

The post authors are volunteers, and as already pointed out, the expected karma benefit is 0. So no one is harmed, and as for the deception, it does not seem to me to be a big deal. We are already nudged by countless primes and stimuli and biases, so another one, designed to be neutral in total effect, seems harmless to me.

What comes before determines what comes after…The thoughts of all men arise from the darkness. If you are the movement of your soul, and the cause of that movement precedes you, then how could you ever call your thoughts your own? How could you be anything other than a slave to the darkness that comes before?…History. Language. Passion. Custom. All these things determine what men say, think, and do. These are the hidden puppet-strings from which all men hang…all men are deceived….So long as what comes before remains shrouded, so long as men are already deceived, what does [deceiving men] matter? –Kelhus, R. Scott Bakker’s The Darkness That Comes Before

Data

The results (Anchor: 0 = negative “Downvoted” comment, 1 = positive “Upvoted” comment):

Post | Author | Date | Anchor | Post karma | Comment karma
---- | ------ | ---- | ------ | ---------- | -------------
Cashing Out Cognitive Biases as Behavior | gwern | 02 March | 0 | 11 | -4
Heuristics and Biases in Charity | Kaj_Sotala | 02 March | 0 | 19 | -6
I Was Not Almost Wrong But I Was Almost Right | Kaj_Sotala | 08 March | 1 | 50 | 0
Emotional regulation, Part I: a problem summary | Swimmer963 | 05 March | 0 | 9 | -2
How would you stop Moore’s Law? | gwern | 10 March | 1 | 19 | -3
On the etiology of religious belief | gwern | 11 March | 0 | 11 | -7
Decision Theories: A Less Wrong Primer | orthonormal | 13 March | 1 | 62 | 0
Schelling fences on slippery slopes | Yvain | 16 March | 1 | 120 | 1
Fallacies as weak Bayesian evidence | Kaj_Sotala | 18 March | 1 | 49 | 0
Decision Theories: A Semi-Formal Analysis, Part I | orthonormal | 24 March | 1 | 20 | -1
Decision Theories: A Semi-Formal Analysis, Part II | orthonormal | 06 April | 0 | 16 | -13
To like each other, sing and dance in synchrony | Kaj_Sotala | 23 April | 0 | 20 | 4
The state of life extension research | gwern | 23 April | 1 | 10 | 0
Value of Information: 8 examples | gwern | 18 May | 0 | 45 | -8
Hope Function | gwern | 01 July | 1 | 22 | 0
To Learn Critical Thinking, Study Critical Thinking | gwern | 07 July | 0 | 23 | -11
Dragon Ball’s Hyperbolic Time Chamber | gwern | 02 Sep | 0 | 33 | -9
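For readers who want to work with the table directly, here is a minimal sketch that types the three numeric columns into an R data frame and summarizes post karma by group. The column names follow the ones used in the analysis code (`Anchor`, `Post.karma`, `Comment.karma`); entering the rows by hand instead of reading them from stdin is a choice made for this sketch.

```r
# Rows are in the same order as the table above.
lw <- data.frame(
  Anchor        = c(0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0),
  Post.karma    = c(11, 19, 50, 9, 19, 11, 62, 120, 49, 20, 16, 20, 10, 45, 22, 23, 33),
  Comment.karma = c(-4, -6, 0, -2, -3, -7, 0, 1, 0, -1, -13, 4, 0, -8, 0, -11, -9)
)

# Number of posts in each arm: 9 negative (Anchor 0), 8 positive (Anchor 1)
group_sizes <- table(lw$Anchor)
print(group_sizes)

# Mean post karma per arm: about 20.78 for Anchor 0 and 44 for Anchor 1
group_means <- tapply(lw$Post.karma, lw$Anchor, mean)
print(round(group_means, 2))
```

The positive-arm mean of 44 includes the 120-karma post, which is why the analysis below also looks at the data without it.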

Analysis

For the analysis, I have 2 questions:

  1. Is there a difference in karma between posts that received a negative initial comment and those that received a positive initial comment? (Any difference suggests that one or both is having an effect.)
  2. Is there a difference in karma between the two kinds of initial comments, as I began to suspect during the experiment?

Article effect

Some Bayesian inference using BEST:

# Read the data table above (pasted on stdin) into a data frame
lw <- read.table(stdin(), header=TRUE)
...
# Load the BEST functions (BESTmcmc, BESTplot)
source("BEST.R")
# Post karma by arm: Anchor 0 = negative comment, Anchor 1 = positive comment
neg <- lw[lw$Anchor==0,]$Post.karma
pos <- lw[lw$Anchor==1,]$Post.karma
mcmc <- BESTmcmc(neg, pos)
BESTplot(neg, pos, mcmcChain=mcmc)

SUMMARY.INFO
PARAMETER       mean    median      mode    HDIlow   HDIhigh  pcgtZero
mu1          20.1792    20.104   20.0392   10.7631   29.9835        NA
mu2          41.9474    41.640   40.4661   11.0307   75.1056        NA
muDiff      -21.7682   -21.519  -22.6345  -55.3222   11.2283     8.143
sigma1       13.1212    12.264   10.9018    5.8229   22.4381        NA
sigma2       40.9768    37.835   33.8565   16.5560   72.6948        NA
sigmaDiff   -27.8556   -24.995  -21.7802  -60.9420   -0.9855     0.838
nu           30.0681    21.230    5.6449    1.0001   86.5698        NA
nuLog10       1.2896     1.327    1.4332    0.4332    2.0671        NA
effSz        -0.7718    -0.765   -0.7632   -1.8555    0.3322     8.143

[Figure: Graphical summary of BEST results for full dataset]
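The effSz row is a standardized difference between the two groups. To the best of my understanding, BEST computes it for each MCMC sample as (mu1 - mu2) / sqrt((sigma1^2 + sigma2^2) / 2). The sketch below applies that formula once, to the posterior means from the table, which is only a rough approximation of the reported value; the helper name `effect_size` is hypothetical.

```r
# Standardized difference of means, using the average of the two variances
# as the scale (hypothetical helper, not part of BEST.R).
effect_size <- function(mu1, mu2, sigma1, sigma2) {
  (mu1 - mu2) / sqrt((sigma1^2 + sigma2^2) / 2)
}

# Posterior means copied from the "mean" column of the summary table
approx_eff <- effect_size(mu1 = 20.1792, mu2 = 41.9474,
                          sigma1 = 13.1212, sigma2 = 40.9768)
print(round(approx_eff, 3))  # about -0.716
```

This plug-in value (about -0.72) is a little closer to zero than the reported posterior mean of -0.77, because averaging the per-sample ratios is not the same as taking the ratio of the averages.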

The results are heavily skewed by Yvain’s very popular post; we can’t trust any results based on such a high scoring post. Let’s try omitting Yvain’s datapoint. BEST actually crashes displaying the result, perhaps due to making an assumption about there being at least 8 datapoints or something, so we’ll fall back to a t-test:

# pos here omits Yvain's post (karma 120), hence the group mean of 33.14 below
t.test(neg,pos)

	Welch Two Sample t-test

data:  neg and pos
t = -1.453, df = 9.139, p-value = 0.1796
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -31.57   6.84
sample estimates:
mean of x mean of y
    20.78     33.14
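The t-test output above can be reproduced from the data table. A minimal sketch, assuming the positive group is the seven Anchor = 1 posts other than Yvain's; the variable name `pos_no_yvain` is introduced here for illustration:

```r
# Post karma for the negative-anchor posts (Anchor == 0)
neg <- c(11, 19, 9, 11, 16, 20, 45, 23, 33)
# Post karma for the positive-anchor posts (Anchor == 1), omitting the 120-karma post
pos_no_yvain <- c(50, 19, 62, 49, 20, 10, 22)

# t.test() defaults to the Welch test, which does not assume equal variances
res <- t.test(neg, pos_no_yvain)

print(round(unname(res$statistic), 3))  # t statistic, about -1.453
print(round(unname(res$parameter), 3))  # Welch degrees of freedom, about 9.139
print(round(res$estimate, 2))           # group means, about 20.78 and 33.14
```

The fractional degrees of freedom come from the Welch-Satterthwaite approximation, which is what lets the two groups have very different variances.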

More reasonable. To work around the bug, let’s replace Yvain by the mean for that group without him, 33; the new results:

SUMMARY.INFO
PARAMETER       mean    median      mode    HDIlow   HDIhigh  pcgtZero
mu1          20.2877   20.2002   20.1374    10.863   29.9532        NA
mu2          32.7912   32.7664   32.8370    15.609   50.4410        NA
muDiff      -12.5035  -12.4802  -12.2682   -32.098    7.3301     9.396
sigma1       13.2561   12.3968   10.9385     6.044   22.3085        NA
sigma2       22.4574   20.6784   18.3106    10.449   38.5859        NA
sigmaDiff    -9.2013   -8.1031   -7.1115   -28.725    7.6973    11.685
nu           33.2258   24.5819    8.6726     1.143   91.3693        NA
nuLog10       1.3575    1.3906    1.4516     0.555    2.0837        NA
effSz        -0.7139   -0.7066   -0.7053    -1.779    0.3607     9.396

[Figure: Graphical summary of BEST results for dataset with Yvain replaced by a mean]

The difference in means has shrunk but not gone away. The posterior still leaves roughly a 10% chance (pcgtZero = 9.4%) that the effect of a negative initial comment rather than a positive one is zero or actually positive (increasing karma), so the estimate points toward an anchoring effect without settling it. This is a little concerning, but I don’t take this too seriously:

  • this is not a lot of data
  • as we’ve seen there are extreme outliers suggesting that the assumptions of normality may be badly wrong
  • even at face value, 10 karma points doesn’t seem like it’s large enough to have any important real-world consequences (like make people leave LW who should’ve stayed)
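Because the normality assumption is doubtful here, one distribution-free cross-check is a permutation test on the difference in group means. This is not the analysis used above, only a sketch of an alternative; the seed and the number of shuffles (10,000) are arbitrary choices made for the illustration.

```r
# Post karma by arm, full dataset (including the 120-karma post)
neg <- c(11, 19, 9, 11, 16, 20, 45, 23, 33)
pos <- c(50, 19, 62, 120, 49, 20, 10, 22)

karma <- c(neg, pos)
n_neg <- length(neg)

# Observed difference in means (negative arm minus positive arm)
observed <- mean(neg) - mean(pos)

set.seed(1)  # arbitrary seed so the sketch is repeatable
# Shuffle the group labels many times and recompute the difference each time
perm_diffs <- replicate(10000, {
  shuffled <- sample(karma)
  mean(shuffled[1:n_neg]) - mean(shuffled[-(1:n_neg)])
})

# Two-sided p-value: share of shuffles at least as extreme as the observed gap
p_value <- mean(abs(perm_diffs) >= abs(observed))
print(round(observed, 2))
print(p_value)
```

A permutation test still compares means, so a single very large post can dominate it; comparing ranks instead would be less sensitive to outliers, at the cost of answering a slightly different question.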

Comment treatment

How did these mindless unsubstantiated comments either praising or criticizing an article get treated by the community? Let’s look at the anchoring comment’s karma:

# Karma of the anchoring comment itself, by arm (0 = negative, 1 = positive)
neg <- lw[lw$Anchor==0,]$Comment.karma
pos <- lw[lw$Anchor==1,]$Comment.karma
mcmc <- BESTmcmc(neg, pos)
BESTplot(neg, pos, mcmcChain=mcmc)

SUMMARY.INFO
PARAMETER       mean    median      mode    HDIlow   HDIhigh  pcgtZero
mu1          -6.4278   -6.4535  -6.55032  -10.5214   -2.2350        NA
mu2          -0.2755   -0.2455  -0.01863   -1.3180    0.7239        NA
muDiff       -6.1523   -6.1809  -6.25451  -10.3706   -1.8571     0.569
sigma1        5.6508    5.2895   4.70143    2.3262    9.7424        NA
sigma2        1.2614    1.1822   1.07138    0.2241    2.4755        NA
sigmaDiff     4.3893    4.0347   3.53457    1.1012    8.5941    99.836
nu           27.4160   18.1596   4.04827    1.0001   83.9648        NA
nuLog10       1.2060    1.2591   1.41437    0.2017    2.0491        NA
effSz        -1.6750   -1.5931  -1.48805   -3.2757   -0.1889     0.569

[Figure: Graphical summary of BEST results for full dataset of how the positive/negative comments were treated]

As one would hope, neither group of comments ends up with net positive mean score, but they’re clearly being treated very differently: the negative comments get downvoted far more than the positive comments. I take this as perhaps implying that LW’s reputation for being negative & hostile is a bit overblown: we’re negative and hostile to poorly thought out criticisms and arguments, not fluffy praise.
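The raw comment scores already show the asymmetry before any modeling. A minimal sketch using the Comment karma column from the data table; the variable names are chosen for this illustration:

```r
# Karma of the anchoring comments, copied from the data table
neg_comments <- c(-4, -6, -2, -7, -13, 4, -8, -11, -9)  # "Downvoted" comments
pos_comments <- c(0, -3, 0, 1, 0, -1, 0, 0)             # "Upvoted" comments

# Raw group means: about -6.22 versus about -0.38
mean_neg <- mean(neg_comments)
mean_pos <- mean(pos_comments)
print(round(c(negative = mean_neg, positive = mean_pos), 2))

# How many comments in each arm ended up below zero: 8 of 9 versus 2 of 8
below_zero_neg <- sum(neg_comments < 0)
below_zero_pos <- sum(pos_comments < 0)
print(c(negative = below_zero_neg, positive = below_zero_pos))
```

The raw means sit close to the posterior means mu1 and mu2 in the summary above, so the model is not doing much beyond quantifying the uncertainty.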

Comments (23)

Comment author: mwengler 23 January 2013 11:24:25PM 14 points

I would really appreciate a very brief statement of your conclusions. My apologies, but I don't feel like shoveling through your analysis just to find out whether there is an effect, a weak effect, a backwards effect, or whatever.

Comment author: gokfar 24 January 2013 01:27:43AM * 7 points

Just skip the intro, R-code and graphs (too heavy on math).

Question 1:

Is there a difference in karma between posts that received a negative initial comment and those that received a positive initial comment? (Any difference suggests that one or both is having an effect.)

Conclusion 1:

The difference in means has shrunk but not gone away; it’s large enough that 10% of the possible effect sizes (of "a negative initial comment rather than positive") may be zero or actually be positive (increase karma) instead. This is a little concerning, but I don’t take this too seriously:

  • this is not a lot of data
  • as we’ve seen there are extreme outliers suggesting that the assumptions of normality may be badly wrong
  • even at face value, 10 karma points doesn’t seem like it’s large enough to have any important real-world consequences (like make people leave LW who should’ve stayed)

Question 2:

Is there a difference in karma between the two kinds of initial comments, as I began to suspect during the experiment?

Conclusion 2:

As one would hope, neither group of comments ends up with net positive mean score, but they’re clearly being treated very differently: the negative comments get downvoted far more than the positive comments. I take this as perhaps implying that LW’s reputation for being negative & hostile is a bit overblown: we’re negative and hostile to poorly thought out criticisms and arguments, not fluffy praise.

tl;dr: maybe

Comment author: jsalvatier 24 January 2013 12:11:52AM 2 points

I was interested in the details of this, but yes, even I would have appreciated a tl;dr.

Comment author: Vladimir_Nesov 24 January 2013 01:09:54AM * 4 points

I'm not sure to what extent these comments can be modeled as expressing a "positive" or a "negative" reaction; the nonsensical one-line explanations made them mostly "insane" reactions (in my perception), which might overshadow the intended interpretation. It might have been a cleaner test if there were no explanations, or if you made an effort to carefully rationalize the random judgments (although that would be a more significant interference).

Comment author: gwern 24 January 2013 01:42:55AM * 5 points

It's a "damned if you do, damned if you don't" sort of dilemma.

I know from watching them plummet into oblivion that comments which are just "Upvoted" or "Downvoted" are not a good idea for any anchoring question - they'll quickly be hidden, so any effect size will be a lot smaller than usual, and it's possible that hidden comments themselves anchor (my guess: negatively, by making people think "why is this attracting stupid comments?").

While if you go with more carefully rationalized comments, that's sort of like http://xkcd.com/810/ and starts to draw on the experimenter's own strengths & weaknesses (I'm sure I could make both quality criticisms and praises of psychology-related articles, but not so much technical decision theory articles).

I hoped my strategy would be a golden mean of not too trivial to be downvoted into oblivion, but not so high-quality and individualized that comparability was lost. I think I came close, since the positive comments ended up with only a slightly negative net score, indicating LWers may not have regarded them as good enough to upvote but also not so obviously bad as to merit a downvote.

(Of course, I didn't expect the positive and negative comments to be treated differently - they're pretty much the same thing, with a negation. I'm not sure how I would have designed it differently if I had known about the double-standard in advance.)

Comment author: Vladimir_Nesov 24 January 2013 03:23:50AM * 12 points

> Of course, I didn't expect the positive and negative comments to be treated differently

(Positive and somewhat stupid comments tend to be upvoted back to 0 even after they get downvoted at some point, so it's not just absence of response. I consider it a dangerous vulnerability of LW to poorly thinking but socially conforming participants, whose active participation should be discouraged, but who are instead mildly rewarded.)

Comment author: wedrifid 24 January 2013 03:49:24AM * 6 points

> I consider it a dangerous vulnerability of LW to poorly thinking but socially conforming participants, whose active participation should be discouraged, but who are instead mildly rewarded.

It's a huge problem that I have observed eroding quality of thought and discussion over time. I'm relieved to see others acknowledge it.

Comment author: MixedNuts 25 January 2013 05:25:38PM 2 points

A respected member saying "I know, right?" as you just did is valuable evidence, whereas the same from a no-name poster is noise. The naive reaction risks forming cliques with mutual back-scratching from big names.

Full disclosure: That kind of fluff is how I got most of my karma.

Comment author: CarlShulman 23 January 2013 11:28:13PM 1 point

Haven't you critiqued people for doing just this kind of thing on LW?

Comment author: gwern 23 January 2013 11:41:17PM 1 point

Have I? If I have, I'm sure there was some germane difference: banned accounts, more than 1 sock, abuse of socks to gain multiple votes, unsystematic data collection, no analysis, no public claim, clear damage, etc.

Comment author: accolade 27 January 2013 02:26:49PM 0 points

Upvoted.

Comment author: gwern 27 January 2013 05:17:20PM 1 point

You're a bit late.

Comment author: accolade 27 January 2013 08:30:19PM 0 points

Never too late to upboat a good post! \o/ (…and dispense some bias at the occasion…)

Comment author: Kaj_Sotala 24 January 2013 05:56:27PM 1 point

Oh, damn - now I'm annoyed at myself for forgetting to make Rhwawn comments on my own posts after the beginning.

Comment author: gwern 24 January 2013 06:12:04PM 3 points

Don't feel too bad, you weren't the only one who lapsed. I didn't hector you guys because after all, it wasn't your experiment.

Comment author: jsalvatier 23 January 2013 11:31:18PM 1 point

Possible model extensions:

Does BEST allow you to add prior information?

You might try adding a prior over the effect size, it would be surprising if it was huge. For example, -30 seems implausibly large to me.

You could also add priors for the group means. You have some pretty good prior information here since there are lots of other posts.

It would be interesting to look at the distribution of post karma. That might be kind of informative, perhaps it would be better to do the analysis on something like a log scale? Obviously it can't be exactly that since there are negative values...

Comment author: gwern 23 January 2013 11:46:55PM 0 points

> Does BEST allow you to add prior information?

Supposedly you can add it but you'd have to edit the source, and that's beyond me right now.

> You might try adding a prior over the effect size, it would be surprising if it was huge. For example, -30 seems implausibly large to me.

Sure, but the normal distribution is the wrong distribution to be using in the first place. I'm not really sure what... an exponential, maybe?

> You could also add priors for the group means. You have some pretty good prior information here since there are lots of other posts. It would be interesting to look at the distribution of post karma.

You'd need the post karma in the first place. Offhand, I don't know any way to get it other than scraping thousands of pages...

> perhaps it would be better to do the analysis on something like a log scale? Obviously it can't be exactly that since there are negative values...

Run the log on the absolute value and negate.
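A minimal sketch of such a signed log transform in R. One deliberate change from the literal suggestion: it uses log(1 + |x|) instead of log(|x|), because several comment scores in the data are exactly 0 and log(0) is undefined. The function name `signed_log` is hypothetical.

```r
# Signed log transform: keep the sign of x, compress its magnitude.
# log1p(abs(x)) = log(1 + |x|), so a score of 0 maps to 0 instead of -Inf.
signed_log <- function(x) {
  sign(x) * log1p(abs(x))
}

# Karma columns copied from the data table in the post
post_karma <- c(11, 19, 50, 9, 19, 11, 62, 120, 49, 20, 16, 20, 10, 45, 22, 23, 33)
comment_karma <- c(-4, -6, 0, -2, -3, -7, 0, 1, 0, -1, -13, 4, 0, -8, 0, -11, -9)

print(round(signed_log(post_karma), 2))
print(round(signed_log(comment_karma), 2))
```

On this scale the 120-karma post is far less extreme relative to the other posts, which is the point of analyzing on something like a log scale.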

Comment author: jsalvatier 24 January 2013 12:13:59AM 0 points

You can look at the RSS feed for some post category, and extract the votes, they're near the beginning in the description section.

Comment author: jsalvatier 24 January 2013 12:17:37AM 0 points

Comment author: gwern 23 January 2013 10:49:11PM 1 point

Comment author: Nisan 23 January 2013 11:47:50PM 0 points

> if Heads, I posted a comment as Rhwawn saying only "Upvoted" or if Tails, a comment saying "Downvoted".

Upon inspection, these comments seem to all contain explanatory remarks.

Comment author: gwern 23 January 2013 11:52:01PM 3 points

Yes, see the full writeup.