Lumifer comments on Should you write longer comments? (Statistical analysis of the relationship between comment length and ratings) - Less Wrong

11 Post author: cleonid 20 July 2015 02:09PM

Comments (47)

Comment author: Lumifer 21 July 2015 01:30:12AM 3 points

Any particular reason you did a plot this way instead of having a cloud of points and drawing some kind of regression line or curve through? You are unnecessarily losing information by aggregating into buckets.
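To make the information-loss point concrete, here is a minimal sketch on synthetic data (not the dataset from the post, which is not available in this thread). It fits a least-squares line to the raw points and then to per-bucket means. The helper names `ols` and `bucket_means`, the noise level, and the bucket count are all assumptions chosen for illustration.

```python
import random
from statistics import mean

def ols(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy ** 2 / (sxx * syy)

def bucket_means(xs, ys, n_buckets):
    """Sort by x, cut into equal-count buckets, return per-bucket mean x and mean y."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_buckets
    bx, by = [], []
    for i in range(n_buckets):
        # The last bucket absorbs any remainder.
        end = (i + 1) * size if i < n_buckets - 1 else len(pairs)
        chunk = pairs[i * size:end]
        bx.append(mean(x for x, _ in chunk))
        by.append(mean(y for _, y in chunk))
    return bx, by

# Synthetic stand-in data (an assumption, NOT the Less Wrong dataset):
# a weak linear trend buried in heavy noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(2000)]
ys = [0.5 * x + rng.gauss(0, 4) for x in xs]

slope_raw, _, r2_raw = ols(xs, ys)
bx, by = bucket_means(xs, ys, 10)
slope_bucket, _, r2_bucket = ols(bx, by)

print(f"raw points:   slope={slope_raw:.2f}  R^2={r2_raw:.2f}")
print(f"bucket means: slope={slope_bucket:.2f}  R^2={r2_bucket:.2f}")
```

The two slopes should come out similar, but the bucketed fit reports a much higher R^2, because averaging within buckets discards the scatter of individual points. That discarded scatter is the information a point cloud with a regression line would keep visible.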

Comment author: cleonid 21 July 2015 11:43:00AM 0 points

True, but it is virtually impossible to see a meaningful pattern when you have thousands of data points on the graph and R² < 0.2.

Comment author: Douglas_Knight 22 July 2015 05:28:45AM 0 points

I disagree. I find point clouds useful, as long as they are not pure black. Kernel density plots are better, though.

But Lumifer gave you a concrete suggestion: plot a regression curve, not a bunch of buckets. Bucketing and drawing lines between points are themselves kinds of smoothing, so you should use a good smoother instead. Say, loess. Just use ggplot and trust its defaults (which will not be loess with this many points).
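As a rough illustration of what "a good smoothing" could look like without buckets, here is a sketch of a Gaussian kernel smoother (a Nadaraya-Watson estimator). This is a simpler stand-in, not loess and not whatever ggplot would pick by default. The data are synthetic, and the function name `kernel_smooth`, the bandwidth, and the grid are assumptions made for the example.

```python
import math
import random

def kernel_smooth(xs, ys, grid, bandwidth):
    """Nadaraya-Watson estimate: at each grid point, a Gaussian-weighted
    average of all ys, with weights decaying in the distance |x - g|."""
    smoothed = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        smoothed.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return smoothed

# Synthetic stand-in data (an assumption, not the dataset from the post).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(2000)]
ys = [0.5 * x + rng.gauss(0, 4) for x in xs]

grid = [i * 0.5 for i in range(21)]          # evaluation points 0.0, 0.5, ..., 10.0
curve = kernel_smooth(xs, ys, grid, bandwidth=1.0)

for g, c in zip(grid[::4], curve[::4]):
    print(f"x={g:4.1f}  smoothed y={c:6.2f}")
```

Unlike buckets, every point contributes to every part of the curve with a smoothly decaying weight, so there are no arbitrary bucket edges. The bandwidth plays the role that bucket width plays in the original plot: larger values give a flatter curve, smaller values a noisier one.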

Comment author: Lumifer 21 July 2015 04:30:10PM 0 points

Well, one question is: if it's "impossible to see a meaningful pattern", should you melt-and-recast the data until a pattern appears? X-/

Another observation is that you are constrained by Excel. R can deal with such problems easily -- do you have the raw dataset available somewhere?