
gwern comments on Open Thread, May 25 - May 31, 2015 - Less Wrong Discussion

Post author: Gondolinian 25 May 2015 12:00AM




Comment author: gwern 27 May 2015 12:17:29AM (5 points)

Do the findings apply to physics? Math? Computer science?

Different fields use different methods. The basic point Ioannidis makes applies to any field which uses null-hypothesis significance-testing statistics for interpreting sampled data.

  • Math uses formal proofs, so whatever the error rate in math is (non-zero and meaningful, though it's unclear how large), it is independent of NHST's problems.
  • Ecology, medicine, biology, psychology, economics: heavy NHST users, so the critique definitely applies.

  • Experimental physics also seems to use a lot of NHST, but it blunts the critique by increasing power substantially: physicists reduce measurement error and gather enormous masses of data, far more than is feasible in the other fields, so many n that they can use the famed five-sigma alpha, which translates to a very high PPV. They're also helped by a commitment to falsifiable, narrow predictions of intervals rather than mere directions: hypothesis testing works much better if you can predict that the Higgs's mass lies within a narrow range than if your null hypothesis is 'mass equals zero' and your alternative is 'mass is non-zero'. (If you're interested, see Paul Meehl's methodological papers on why this is important.)
  • Computer science is tricky:

    • the mathy parts are math and are safe (but not necessarily important or worth doing),
    • but other areas like systems work or machine learning may or may not use NHST techniques. There seem to be a lot of replicability problems in optimization work due to variation from machine to machine. In machine learning, I've heard many insinuations that papers get published by p-hacking hyperparameters until the new algorithm is finally p<0.05 better than the comparison algorithm, or that the new tweak is just overfitting on a standard dataset. Some subfields are visibly rotten (HCI especially; you only have to look at how routinely HCI papers claim an improvement based on NHST techniques applied to n=10 or so to know that those results ain't gonna replicate). But aside from a few critical papers like "Producing Wrong Data Without Doing Anything Obviously Wrong!", I don't know of any general argument that most CS research is wrong.
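The hyperparameter-p-hacking failure mode is easy to demonstrate by simulation. A minimal sketch, assuming a truly null difference between the two methods, a simple two-sample z-test with known unit variance, and sample sizes and retry counts that are my illustrative choices:

```python
import math
import random

def z_test_p(xs, ys):
    """Two-sided p-value for a two-sample z-test, assuming unit variance."""
    n = len(xs)
    z = (sum(xs) / n - sum(ys) / n) / math.sqrt(2.0 / n)
    # Standard normal CDF via erf; p = 2 * P(Z > |z|)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

random.seed(0)
n, trials, max_tries = 30, 2000, 20
false_wins = 0
for _ in range(trials):
    baseline = [random.gauss(0, 1) for _ in range(n)]
    # "Tune hyperparameters": rerun the (truly equivalent) new method
    # up to 20 times, stopping at the first run that clears p < 0.05.
    for _ in range(max_tries):
        new = [random.gauss(0, 1) for _ in range(n)]
        if z_test_p(new, baseline) < 0.05:
            false_wins += 1
            break

print(false_wins / trials)  # well above the nominal 0.05
```

The retries are correlated (they share one baseline sample), so the false-positive rate doesn't hit the independent-test bound of 1 - 0.95^20, but it still lands far above 5%.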

It would be interesting to weight fields by publication count to see whether Ioannidis's title, interpreted literally, is still right. When one criticizes 'ecology, medicine, biology, psychology, economics', one is criticizing what must be at least hundreds of thousands of papers every year; those are big fields. I don't know that math, physics, theoretical CS, etc. publish enough papers to offset that.
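The arithmetic behind Ioannidis's claim, and behind the physics point about tiny alphas buying a high PPV, fits in one formula: PPV = (power × R) / (power × R + alpha), where R is the pre-study odds that a tested hypothesis is true. A minimal sketch (the power and pre-study-odds numbers below are my illustrative assumptions, not figures from the comment):

```python
def ppv(alpha: float, power: float, prior_odds: float) -> float:
    """Post-study probability that a 'significant' result is true (Ioannidis 2005)."""
    return (power * prior_odds) / (power * prior_odds + alpha)

# Illustrative social-science setting: alpha = 0.05, modest power, R = 0.25
print(round(ppv(0.05, 0.35, 0.25), 3))   # ~0.64: over a third of hits are false

# Particle-physics-style setting: ~five-sigma one-sided alpha, high power
print(round(ppv(3e-7, 0.9, 0.25), 6))    # ~0.999999: false positives vanish
```

Note that cutting alpha does almost all the work here; the same R = 0.25 prior yields wildly different PPVs under the two thresholds, which is the sense in which physics escapes the critique.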

Comment author: passive_fist 27 May 2015 01:57:42AM (0 points)

I agree 100%.