Lumifer comments on Should you write longer comments? (Statistical analysis of the relationship between comment length and ratings) - Less Wrong Discussion

Post author: cleonid 20 July 2015 02:09PM 11 points

Comment author: Lumifer 20 July 2015 04:05:46PM 7 points

I think these plots would be much improved by adding error bars. In particular, I suspect that the number of short posts is greater than the number of long posts, and so the average-karma estimates for long posts are more uncertain.
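To illustrate the error-bar suggestion: a minimal Python sketch, assuming each plotted point is the mean karma of one bucket of comments, computes the standard error of that mean (sample standard deviation divided by the square root of the bucket size). The function name `bucket_mean_with_error` and the karma numbers are hypothetical, not taken from the post.

```python
import math
from statistics import mean, stdev


def bucket_mean_with_error(karma_values):
    """Return (mean, standard error of the mean) for one bucket of karma scores."""
    n = len(karma_values)
    if n < 2:
        raise ValueError("need at least two comments per bucket to estimate spread")
    m = mean(karma_values)
    sem = stdev(karma_values) / math.sqrt(n)  # sample SD divided by sqrt(n)
    return m, sem


# Hypothetical karma scores, not data from the post
short_bucket = [2, 0, 1, 3, 1, 2, 0, 4, 1, 2, 1, 3]
long_bucket = [5, 0, 9, 2]
for label, bucket in [("short", short_bucket), ("long", long_bucket)]:
    m, sem = bucket_mean_with_error(bucket)
    # 1.96 * SEM gives an approximate 95% interval under a normal approximation
    print(f"{label}: mean={m:.2f} +/- {1.96 * sem:.2f}")
```

Because the standard error shrinks with the square root of the bucket size, a bucket holding only a handful of long comments gets a much wider interval than a well-populated bucket of short ones, which is the uncertainty the error bars would make visible.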

Also, did you bucketize the word counts? What do specific points on your plots correspond to?

Comment author: cleonid 20 July 2015 10:32:17PM 1 point

Each point on the graph corresponds to an average of several hundred data points (about two thousand for the middle graph). The number of short posts is indeed greater than the number of long posts, so the horizontal distance between the points on the graph increases with the number of characters.
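The description above sounds like equal-count (quantile) bucketing: sort comments by length, then average consecutive groups of the same size. The sketch below assumes that scheme, since the thread does not spell out the exact procedure; `equal_count_buckets` and the sample lengths and karma values are hypothetical.

```python
def equal_count_buckets(lengths, karma, points_per_bucket):
    """Sort by length, then average consecutive groups of points_per_bucket comments.

    Returns a list of (mean_length, mean_karma, count) tuples; the final
    bucket may be smaller if the data do not divide evenly.
    """
    pairs = sorted(zip(lengths, karma))
    buckets = []
    for start in range(0, len(pairs), points_per_bucket):
        chunk = pairs[start:start + points_per_bucket]
        mean_len = sum(x for x, _ in chunk) / len(chunk)
        mean_karma = sum(y for _, y in chunk) / len(chunk)
        buckets.append((mean_len, mean_karma, len(chunk)))
    return buckets


# Hypothetical data skewed toward short comments
lengths = [10, 20, 30, 40, 50, 100, 200, 400, 800, 1600, 3200, 6400]
karma = [1, 2, 1, 0, 2, 3, 1, 2, 4, 0, 5, 3]
for mean_len, mean_karma, count in equal_count_buckets(lengths, karma, 4):
    print(f"x={mean_len:.1f}  mean karma={mean_karma:.2f}  n={count}")
```

With lengths skewed toward short comments, the x-positions of the bucket means spread further apart as length grows, matching the widening horizontal gaps described above.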

Comment author: Lumifer 21 July 2015 01:30:12AM 3 points

Any particular reason you did the plot this way instead of showing a cloud of points and drawing some kind of regression line or curve through it? You are unnecessarily losing information by aggregating into buckets.
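For comparison, fitting a line directly to the unaggregated cloud needs nothing more than ordinary least squares on every (length, karma) pair. This is a minimal sketch with hypothetical names and data; it also returns R², the statistic cleonid cites in the next comment. A straight line is only the simplest choice: a smoothing curve would be fitted to the same raw points.

```python
def fit_line(xs, ys):
    """Ordinary least squares on raw points; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical raw (length, karma) pairs; no bucketing step
lengths = [120, 340, 560, 800, 1500, 2300]
karma = [1, 2, 1, 3, 2, 4]
slope, intercept, r2 = fit_line(lengths, karma)
print(f"karma ~ {intercept:.2f} + {slope:.5f} * length, R^2 = {r2:.2f}")
```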

Comment author: cleonid 21 July 2015 11:43:00AM 0 points

True, but it is virtually impossible to see a meaningful pattern when you have thousands of data points on the graph and R² < 0.2.

Comment author: Douglas_Knight 22 July 2015 05:28:45AM 0 points

I disagree. I find point clouds useful, as long as they are not pure black. Kernel density plots are better, though.
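A kernel density plot replaces each point with a small Gaussian bump and shades the plane by the summed height, so thousands of overlapping points no longer collapse into a solid black mass. Below is a minimal pure-Python sketch of a 2-D Gaussian kernel density estimate; `kde2d`, the sample cloud, and the bandwidths `hx` and `hy` are hypothetical, and in practice the bandwidths would come from a rule of thumb or from the plotting library (ggplot2's `geom_density_2d`, for example).

```python
import math


def kde2d(points, x, y, hx, hy):
    """Gaussian product-kernel density estimate at (x, y) from a list of (x, y) points."""
    # Each kernel is a product of two normal densities with SDs hx and hy,
    # so the estimate integrates to 1 over the plane.
    norm = 1.0 / (len(points) * 2 * math.pi * hx * hy)
    total = 0.0
    for px, py in points:
        u = (x - px) / hx
        v = (y - py) / hy
        total += math.exp(-0.5 * (u * u + v * v))
    return norm * total


# Hypothetical (length, karma) pairs: a dense cluster plus one outlier
cloud = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (5.0, 5.0)]
print(kde2d(cloud, 1.0, 1.0, 0.5, 0.5))  # high density inside the cluster
print(kde2d(cloud, 3.0, 3.0, 0.5, 0.5))  # close to zero in the empty region
```

Evaluating the estimate over a grid of (length, karma) values and drawing contours or a heat map of the result gives the density plot.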

But Lumifer gave you a concrete suggestion: plot a regression curve, not a bunch of buckets. Bucketing and drawing lines between points are kinds of smoothing, so you should use a good smoothing method instead, say, loess. Just use ggplot and trust its defaults (though with this many points, those defaults won't actually be loess).
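Loess fits a separate weighted regression around each x-value, with weights that fall off with distance. What follows is a minimal sketch under stated assumptions: tricube weights, a nearest-neighbour span, and a local linear fit. R's `loess` defaults to local quadratics with `span = 0.75`, so this is a simplified stand-in, not its implementation; `loess_point` and `loess_curve` are hypothetical names. As for the parenthetical, ggplot2's `geom_smooth` uses loess only for fewer than 1,000 observations and switches to a GAM above that, which is presumably the point being made.

```python
import math


def loess_point(xs, ys, x0, span=0.3):
    """Local linear fit at x0 using tricube weights over the nearest span*n points."""
    n = len(xs)
    k = min(n, max(3, math.ceil(span * n)))  # neighbourhood size
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # distance to k-th nearest point
    weights = []
    for x in xs:
        if h > 0:
            d = abs(x - x0) / h
        else:
            d = 0.0 if x == x0 else math.inf
        weights.append((1 - d ** 3) ** 3 if d < 1 else 0.0)  # tricube weight
    sw = sum(weights)
    if sw == 0:  # degenerate tie at the neighbourhood edge: fall back to a plain mean
        return sum(ys) / n
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:  # all weight on a single x-value: no slope to estimate
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)  # weighted least-squares line evaluated at x0


def loess_curve(xs, ys, grid, span=0.3):
    """Evaluate the local fit at every point of grid to trace the smoothed curve."""
    return [loess_point(xs, ys, g, span) for g in grid]


# Hypothetical exactly linear data: a local linear fit should reproduce it
xs = list(range(20))
ys = [3 * x + 2 for x in xs]
print(loess_curve(xs, ys, [2.0, 7.5, 15.0]))
```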

Comment author: Lumifer 21 July 2015 04:30:10PM 0 points

Well, one question is: if it's "impossible to see a meaningful pattern", should you melt-and-recast the data so that the pattern appears? X-/

Another observation is that you are constrained by Excel. R can deal with such problems easily -- do you have the raw dataset available somewhere?