I'm sure there are some ML students/researchers on LessWrong in search of new projects, so here's one I'd love to see and probably won't build myself: an automated method for predicting which papers are unlikely to replicate, given the text of the paper. Ideally, I'd like to be able to use it to filter and/or rank results from Google Scholar.

Getting a good dataset would probably be the main bottleneck for such a project. Various replication-crisis papers that review replication success/failure for tens or hundreds of other studies seem like a natural starting point. Presumably some amount of feature engineering would be needed; I doubt anyone has a large enough dataset of labelled papers to just throw raw or lightly-processed text into a black box.
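To make the feature-engineering idea concrete, here's a minimal sketch of what a first pass might look like. Everything in it is hypothetical: the `paper_texts`/`replicated` data, the specific features, and the choice of logistic regression are illustrative assumptions, not a tested pipeline.

```python
import re
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

P_VALUE_RE = re.compile(r"p\s*[=<]\s*(0?\.\d+)", re.IGNORECASE)
N_RE = re.compile(r"\bn\s*=\s*(\d+)", re.IGNORECASE)

def features(text: str) -> list[float]:
    """Hand-engineered features from raw paper text (illustrative, not validated)."""
    p_values = [float(m) for m in P_VALUE_RE.findall(text)]
    sample_sizes = [int(m) for m in N_RE.findall(text)]
    return [
        len(p_values),                                  # how many p-values are reported
        sum(1 for p in p_values if 0.01 <= p < 0.05),   # p-values suspiciously close to 0.05
        min(sample_sizes) if sample_sizes else 0,       # smallest reported sample size
        float("preregist" in text.lower()),             # mentions preregistration
    ]

# `paper_texts` and `replicated` are hypothetical: full texts plus 0/1 labels
# harvested from replication-crisis review papers, as suggested above.
def train(paper_texts: list[str], replicated: list[int]) -> LogisticRegression:
    X = np.array([features(t) for t in paper_texts])
    y = np.array(replicated)
    clf = LogisticRegression(max_iter=1000)
    print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
    return clf.fit(X, y)
```

The point of starting with a handful of interpretable features rather than raw text is exactly the dataset-size problem above: a few hundred labelled papers can plausibly support a four-feature linear model, but not much more.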

Also, if anyone knows of previous attempts to do this, I'd be interested to hear about them.


The title doesn't seem to fit the question well: P-Hacking Detection doesn't map cleanly onto replicability, even if the presence of p-hacking usually means that the study will not replicate.

I'm interested in automatic summarization of papers' key characteristics (PICO, sample size, methods), and I'm planning to start building something soon.
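Something like this regex-based first pass is the rough shape of what I have in mind; the patterns and keyword list are just illustrative guesses, not an existing tool:

```python
import re

# Crude method vocabulary; a real version would need a proper taxonomy.
METHOD_KEYWORDS = ["randomized", "double-blind", "cross-sectional", "cohort", "meta-analysis"]

def summarize(text: str) -> dict:
    """Pull a few key characteristics out of raw paper text (heuristic sketch)."""
    lowered = text.lower()
    n_match = re.search(r"\bn\s*=\s*(\d+)", lowered)
    return {
        "sample_size": int(n_match.group(1)) if n_match else None,
        "methods": [kw for kw in METHOD_KEYWORDS if kw in lowered],
    }
```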

ISO = In Search Of; I wasn't familiar with that abbreviation.