
Lumifer comments on Should you write longer comments? (Statistical analysis of the relationship between comment length and ratings) - Less Wrong Discussion

11 Post author: cleonid 20 July 2015 02:09PM




Comment author: Lumifer 23 July 2015 06:02:56AM  2 points

Continued from part 1.

The gist of part 2 is four graphs.

The graphs plot most of the data (outliers excluded) in the following form. Each post is represented by two points with the same X coordinate: the number of characters. The Y coordinate of one point is the number of upvotes the post received; the Y coordinate of the other is the number of downvotes for the same post. Upvotes are light green and downvotes are pink.

The upvotes and the downvotes are modeled separately by two loess (local regression) curves. The two graphs for each poster differ in the details of the fit. Specifically, one fit assumes Gaussian errors, so its loess curve tends to approximate the local mean. The other fit assumes heavy-tailed errors, and its loess curve tends to approximate the local median. Since the distribution of votes is skewed, the mean and the median are noticeably different.
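To make the difference between the two kinds of fit concrete, here is a minimal sketch in plain Python. It is not the code behind the graphs (the comment does not say what software was used), and the data are synthetic. The function names, the `frac` span, and the choice of bisquare reweighting for the robust variant are all assumptions made for illustration. With `robust_iters=0` the fit is ordinary local least squares, which tracks the local mean; with a few reweighting passes, points with large residuals are downweighted and the curve moves toward the bulk of the data.

```python
from statistics import median

def tricube(u):
    """Tricube kernel: 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def weighted_line_at(xs, ys, ws, x0):
    """Fit a weighted least-squares line and evaluate it at x0."""
    sw = sum(ws)
    if sw == 0.0:                      # degenerate window: fall back to plain mean
        return sum(ys) / len(ys)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0.0:
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)

def local_fit(xs, ys, frac=0.3, robust_iters=0):
    """Local linear regression; robust_iters > 0 adds bisquare reweighting."""
    n = len(xs)
    k = max(2, int(frac * n))          # number of neighbours in each window
    robust_w = [1.0] * n
    fitted = [0.0] * n
    for iteration in range(robust_iters + 1):
        for i, x0 in enumerate(xs):
            dists = sorted(abs(x - x0) for x in xs)
            h = dists[k - 1] or 1.0    # bandwidth: distance to k-th nearest point
            ws = [tricube((x - x0) / h) * rw for x, rw in zip(xs, robust_w)]
            fitted[i] = weighted_line_at(xs, ys, ws, x0)
        if iteration == robust_iters:
            break
        resid = [y - f for y, f in zip(ys, fitted)]
        s = median([abs(r) for r in resid])
        if s == 0.0:                   # residual scale collapsed; stop reweighting
            break
        # Bisquare weights: residuals beyond 6 * s get weight zero.
        robust_w = [(1.0 - (r / (6.0 * s)) ** 2) ** 2 if abs(r) < 6.0 * s else 0.0
                    for r in resid]
    return fitted

# Synthetic data (not the vote data): a line with small deterministic noise
# and one extreme value standing in for an unusually highly voted post.
xs = [float(i) for i in range(100)]
ys = [1.0 + 0.1 * x + ((i * 37) % 11 - 5) * 0.1 for i, x in enumerate(xs)]
ys[50] = 100.0

plain = local_fit(xs, ys, frac=0.3, robust_iters=0)   # mean-like fit
robust = local_fit(xs, ys, frac=0.3, robust_iters=3)  # outlier-resistant fit
print(round(plain[50], 2), round(robust[50], 2))
```

In this toy example the non-robust curve is pulled upward near the extreme point while the robust curve stays close to the underlying line, which is the same mechanism that separates the mean-like and median-like curves when votes are skewed.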

Each plot has four vertical lines at four quantiles: 25%, 50%, 75%, and 95%. The lower numbers represent the loess estimate of the number of downvotes for this particular post length. The upper numbers represent the loess estimate of the number of upvotes.
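As a rough illustration of how such annotations could be produced, the sketch below finds the 25%, 50%, 75%, and 95% points of a set of post lengths and reads a fitted curve off at a given length by linear interpolation. It assumes the quantiles are taken over post length (the lines are vertical, so that seems the natural reading); the helper names and the toy numbers are hypothetical and not taken from the original analysis.

```python
from bisect import bisect_left
from statistics import quantiles

def length_quantiles(lengths, probs=(25, 50, 75, 95)):
    """Return {percent: post length at that percentile}."""
    cuts = quantiles(lengths, n=100, method="inclusive")  # 99 cut points, 1%..99%
    return {p: cuts[p - 1] for p in probs}

def interpolate(xs, ys, x0):
    """Evaluate a curve given at sorted xs by linear interpolation (clamped)."""
    if x0 <= xs[0]:
        return ys[0]
    if x0 >= xs[-1]:
        return ys[-1]
    j = bisect_left(xs, x0)            # xs[j - 1] < x0 <= xs[j]
    t = (x0 - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])

# Toy post lengths (in characters): 1, 2, ..., 101. Not real data.
lengths = list(range(1, 102))
q = length_quantiles(lengths)

# A made-up "fitted curve" of estimated upvotes against length.
curve_x = [0.0, 10.0]
curve_y = [0.0, 5.0]
estimate_at_4 = interpolate(curve_x, curve_y, 4.0)
print(q, estimate_at_4)
```

With real data, the same lookup would be done twice at each quantile, once on the upvote curve and once on the downvote curve, giving the upper and lower numbers beside each vertical line.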

We will start with the robust fit which approximates the median. Here is the plot for EY

and here is the plot for SA

As you can see, longer posts pay off, though not spectacularly for EY; long posts work better for SA. The downvotes also increase, but only slightly. If we treat the loess estimate as the median, in all cases half of the posts have zero downvotes.

Since the votes are positively skewed, the means should be higher than the medians, and we can see this in the second set of graphs with non-robust loess fits. EY

and SA

The overall pattern is very much the same, but the numbers are higher. Again, longer posts bring much more karma for SA, and somewhat more, though less markedly, for EY.