Dan_Weinand comments on 2013 Survey Results - Less Wrong

74 Post author: Yvain 19 January 2014 02:51AM

Comments (558)

Comment author: shminux 19 January 2014 04:15:24AM 14 points

Yvain is not hugely on board with the idea of running correlations between everything and seeing what sticks, but will grudgingly publish the results because of the very high bar for significance (p < .001 on ~800 correlations suggests < 1 spurious result) and because he doesn't want to have to do it himself.

The standard way to fix this is to run the correlations on only half the data and then test their predictive power on the other half. This eliminates almost all spurious correlations.
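As a rough illustration of the split-half idea, here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the data is synthetic (40 pure-noise columns plus one column that really depends on column 0), the cutoff `r_threshold=0.2` is arbitrary, and the helper names `pearson_r` and `split_half_screen` are hypothetical. It is not the analysis that was run on the survey.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_half_screen(columns, r_threshold, seed=0):
    """Screen all column pairs on one random half of the rows, then keep only
    the pairs that clear the same cutoff, with the same sign, on the other half."""
    n_rows = len(columns[0])
    rows = list(range(n_rows))
    random.Random(seed).shuffle(rows)
    explore, confirm = rows[: n_rows // 2], rows[n_rows // 2:]

    def r_on(idx, a, b):
        return pearson_r([columns[a][i] for i in idx],
                         [columns[b][i] for i in idx])

    candidates, confirmed = [], []
    for a in range(len(columns)):
        for b in range(a + 1, len(columns)):
            r1 = r_on(explore, a, b)
            if abs(r1) >= r_threshold:
                candidates.append((a, b))
                r2 = r_on(confirm, a, b)
                if abs(r2) >= r_threshold and r1 * r2 > 0:
                    confirmed.append((a, b))
    return candidates, confirmed

# Synthetic stand-in data (an assumption, not survey data).
rng = random.Random(42)
n_rows, n_noise = 400, 40
columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_noise)]
# One genuine relationship: column 40 is column 0 plus noise.
columns.append([x + rng.gauss(0, 1) for x in columns[0]])

candidates, confirmed = split_half_screen(columns, r_threshold=0.2)
print("passed on first half:", candidates)
print("also passed on second half:", confirmed)
```

With 41 columns there are 820 pairs, so a handful of noise pairs can be expected to clear the cutoff on the first half by chance. A chance hit has to repeat, with the same sign, on rows it has never seen in order to survive the second pass, while the genuine pair (0, 40) should pass both.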

Comment author: Nominull 19 January 2014 04:59:15AM 10 points

Does that actually work better than just setting a higher bar for significance? My gut says that data is data and chopping it up cleverly can't work magic.

Comment author: Dan_Weinand 19 January 2014 05:53:07AM 9 points

Cross-validation is actually hugely useful for predictive models. For a simple correlation like this, it's less of a big deal. But if you are fitting a locally weighted linear regression, for instance, chopping the data up is absolutely standard operating procedure.
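To make the regression case concrete, here is a minimal sketch of k-fold cross-validation used to pick the bandwidth of a locally weighted linear regression. The assumptions are all mine: the data is a synthetic noisy sine curve, the kernel is Gaussian, the candidate bandwidths are arbitrary, and `lwr_predict` and `kfold_mse` are hypothetical helper names rather than any library's API.

```python
import math
import random

def lwr_predict(x0, xs, ys, bandwidth):
    """Predict y at x0 from a weighted least-squares line,
    using Gaussian kernel weights centred on x0."""
    ws = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    if sw == 0.0:  # every weight underflowed: fall back to the plain mean
        return sum(ys) / len(ys)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x0 - mx)

def kfold_mse(xs, ys, bandwidth, k=5, seed=0):
    """Mean squared error on held-out points, averaged over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err, n = 0.0, 0
    for fold in folds:
        held = set(fold)
        train_x = [xs[i] for i in idx if i not in held]
        train_y = [ys[i] for i in idx if i not in held]
        for i in fold:
            pred = lwr_predict(xs[i], train_x, train_y, bandwidth)
            sq_err += (pred - ys[i]) ** 2
            n += 1
    return sq_err / n

# Synthetic data (an assumption): a sine curve plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 2 * math.pi) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

scores = {h: kfold_mse(xs, ys, h) for h in (0.05, 0.3, 1.0, 5.0)}
best = min(scores, key=scores.get)
for h, mse in scores.items():
    print(f"bandwidth {h}: held-out MSE {mse:.3f}")
print("chosen bandwidth:", best)
```

The point of holding data out here is that in-sample error keeps falling as the bandwidth shrinks, because the fit just chases the noise. Held-out error penalises that, and it also penalises a bandwidth so wide that the fit flattens into a single global line.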