shminux comments on 2013 Survey Results - Less Wrong

74 Post author: Yvain 19 January 2014 02:51AM

Comments (558)

Comment author: shminux 19 January 2014 04:15:24AM 14 points

Yvain is not hugely on board with the idea of running correlations between everything and seeing what sticks, but will grudgingly publish the results because of the very high bar for significance (p < .001 on ~800 correlations suggests < 1 spurious result) and because he doesn't want to have to do it himself.

The standard way to fix this is to run them on half the data only and then test their predictive power on the other half. This eliminates almost all spurious correlations.
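A minimal sketch of that split-half check, for illustration only. It assumes the rows are already in random order (real data should be shuffled first), uses the Fisher z approximation for p-values to stay within the standard library, and runs on made-up data: 40 columns of pure noise plus one planted relationship between hypothetical columns `x` and `y`. None of the names or thresholds come from the survey analysis itself.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (an assumption here)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def split_half(columns, alpha=0.05):
    """Find candidate pairs on the first half, then retest them on the second."""
    n = len(next(iter(columns.values())))
    half = n // 2
    candidates, confirmed = [], []
    for a, b in combinations(sorted(columns), 2):
        r1 = pearson_r(columns[a][:half], columns[b][:half])
        if approx_p_value(r1, half) >= alpha:
            continue  # not interesting on the exploration half
        candidates.append((a, b))
        r2 = pearson_r(columns[a][half:], columns[b][half:])
        same_sign = (r1 > 0) == (r2 > 0)
        if same_sign and approx_p_value(r2, n - half) < alpha:
            confirmed.append((a, b))
    return candidates, confirmed


random.seed(0)
n_rows = 1000
# 40 noise columns: no pair among them is truly correlated.
columns = {f"noise{i:02d}": [random.gauss(0, 1) for _ in range(n_rows)]
           for i in range(40)}
# One planted real relationship for contrast (hypothetical columns).
columns["x"] = [random.gauss(0, 1) for _ in range(n_rows)]
columns["y"] = [x + random.gauss(0, 1) for x in columns["x"]]

candidates, confirmed = split_half(columns)
print(len(candidates), "pairs pass on the first half")
print(len(confirmed), "of those also pass on the second half")
```

With a lenient first-half threshold, a few dozen noise pairs are expected to pass by chance, and only a small fraction of them should pass again with the same sign on the held-out half, while the planted pair passes both times.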

Comment author: Nominull 19 January 2014 04:59:15AM 10 points

Does that actually work better than just setting a higher bar for significance? My gut says that data is data and chopping it up cleverly can't work magic.

Comment author: Dan_Weinand 19 January 2014 05:53:07AM 9 points

Cross-validation is actually hugely useful for predictive models. For a simple correlation like this, it's less of a big deal. But if you are fitting a locally weighted linear regression line, for instance, chopping the data up is absolutely standard operating procedure.
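To make the regression case concrete, here is a rough sketch of k-fold cross-validation used to pick the bandwidth of a locally weighted linear regression. The Gaussian kernel, the candidate bandwidths, the five folds, and the synthetic sine-plus-noise data are all assumptions chosen for illustration, not anything specified in the thread.

```python
import math
import random


def local_linear_predict(x0, xs, ys, bandwidth):
    """Predict y at x0 from a weighted least-squares line with Gaussian weights."""
    ws = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    sw = sum(ws)
    if sw == 0:  # every weight underflowed; fall back to the plain mean
        return sum(ys) / len(ys)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def cv_mse(xs, ys, bandwidth, k=5):
    """Mean squared error on held-out points, averaged over k folds."""
    idx = list(range(len(xs)))
    sq_err = 0.0
    for f in range(k):
        held = set(idx[f::k])  # every k-th point is held out in this fold
        tx = [xs[i] for i in idx if i not in held]
        ty = [ys[i] for i in idx if i not in held]
        for i in held:
            pred = local_linear_predict(xs[i], tx, ty, bandwidth)
            sq_err += (ys[i] - pred) ** 2
    return sq_err / len(xs)


random.seed(1)
xs = [random.uniform(0, 2 * math.pi) for _ in range(200)]
ys = [math.sin(x) + random.gauss(0, 0.3) for x in xs]

# Candidate bandwidths: narrow, moderate, and nearly a single global line.
scores = {h: cv_mse(xs, ys, h) for h in (0.1, 0.5, 5.0)}
best = min(scores, key=scores.get)
for h, mse in scores.items():
    print(f"bandwidth {h}: held-out MSE {mse:.3f}")
print("chosen bandwidth:", best)
```

The point of holding data out is that in-sample error always favours the most flexible fit, whereas held-out error penalises both overfitting and underfitting.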

Comment author: ChristianKl 19 January 2014 04:04:10PM * 0 points

Does that actually work better than just setting a higher bar for significance? My gut says that data is data and chopping it up cleverly can't work magic.

How do you decide how high to set your bar for significance? It is very hard to estimate how high it has to be, because that depends on how you go fishing in your data. The advantage of the two-step procedure is that you are completely free to fish however you want in the first step. There are even cases where you might want a three-step procedure.

Comment author: Kawoomba 19 January 2014 08:48:10AM * 7 points

Alternatively, Bonferroni correction.

Comment author: Pablo_Stafforini 19 January 2014 09:51:25AM * 8 points

That's roughly what Yvain did, by taking into consideration the number of correlations tested when setting the significance level.
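For reference, the arithmetic behind both points can be sketched as follows. The figures 800 and .001 are the approximate ones quoted at the top of the thread; the 0.05 familywise rate and the example p-values are assumptions added for illustration.

```python
def bonferroni_threshold(familywise_alpha, n_tests):
    """Per-test cutoff that keeps the familywise error rate at or below familywise_alpha."""
    return familywise_alpha / n_tests


def significant_after_bonferroni(p_values, familywise_alpha=0.05):
    """Indices of the p-values that survive the corrected cutoff."""
    cutoff = bonferroni_threshold(familywise_alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < cutoff]


n_tests = 800           # approximate number of correlations mentioned in the thread
per_test_alpha = 0.001  # the bar quoted from the post

# Expected number of false positives if every null hypothesis is true.
expected_spurious = n_tests * per_test_alpha
# Cutoff a conventional 0.05 familywise rate would require (0.05 is an assumption).
strict_cutoff = bonferroni_threshold(0.05, n_tests)

print(f"expected spurious results at p < {per_test_alpha}: {expected_spurious:.1f}")
print(f"Bonferroni cutoff for a 0.05 familywise rate: {strict_cutoff:.2e}")

# Hypothetical p-values, for illustration only.
example = [0.00001, 0.0004, 0.03, 0.2]
print(significant_after_bonferroni(example))  # → [0, 1]
```

So p < .001 across about 800 tests keeps the expected number of spurious results under one, which is looser than the 6.25e-05 cutoff a 0.05 familywise rate would demand; that gap is presumably why "roughly" is the right word.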