I agree that steering toward truth is better than dunking on opponents, and I think your first and third suggestions for how to encourage steering toward truth are quite reasonable.
I'm not convinced that, as a rule of thumb, it makes sense to gloss over formatting errors or missing citations. Of course there are examples of critiques about formatting or citations that are thoughtless and unhelpful dunks, but it's not obvious to me that most such critiques are unhelpful.
In particular, if the concept of "formatting" is broad enough to include things like the choice of title, choice of section headers, relative order and hierarchy of sections, etc., then I often see papers that are so badly formatted that it's not clear what, if anything, the author is trying to say. Similarly, a poorly formatted graph or chart might fail to convey its key points, or make digesting those points so difficult that it isn't worth the effort for a typical reader.
With regard to citations, it's one thing to complain that a paper is only citing two out of three of the relevant pieces of prior work -- but it's another thing to complain that a paper seems blissfully unaware of an entire relevant body of prior work. This is especially problematic if the prior work persuasively establishes some limitations on or reasons to be skeptical of the author's preferred data or methodology.
I'm curious to what extent you agree with these counterpoints (in which case we're haggling over semantics) and to what extent you think that reviewers really should refrain from complaining about missing structure and missing acknowledgements/caveats (in which case I'd love to hear more about why).
I think reviewers have two separate tasks when assessing a paper.
The first, which (in my opinion) almost all reviewers are good at, is identifying the easy wins that will improve the paper. This is the land of typos, the occasional confusing sentence, or a missed citation.
The second, which reviewers are on average less good at, is telling the editor whether they think the paper, at its core, is actually any good, assuming some fixes and repairs.
Written (very) quickly for the Inkhaven Residency.
I used to hate the classic management adage of “bring me solutions, not problems”. After all, identifying problems is the first step of solving them, and clearly understanding a problem is often a substantial part of the difficulty of solving it. (It also doesn’t help that I’ve sat in on many modern management classes where this adage was treated as obviously wrong and outdated.)
But over time, I’ve realized the adage contains some amount of wisdom, at least in the context of research. The interesting question is rarely whether a thing is bad, but instead how bad it really is, and what to do about it afterwards.
When I was in middle and high school, I loved memorizing logical fallacies and spotting them in the arguments made by others. “That’s an appeal to authority!”, I’d think to myself. “Dismissed!” (Yes, I was indeed an annoying debate kid.) Thankfully, as I grew up, I realized that what often matters is figuring out what is actually true, rather than scoring points against imagined or real debate opponents. The interesting question in a debate is rarely how hard you can dunk on someone's poorly constructed argument.
People who've known me over the last decade often note that I lean critical or skeptical about almost everything. For example, I often give spectacular impromptu lectures (an impolite person might call these rants) on the failings of newly released papers, some of which even get turned into blog posts. I think my criticisms are generally correct and point at real issues in the papers. But the interesting question when critiquing research is not whether a paper has questionable methodological choices (under sufficiently intense scrutiny, all papers do), but whether those issues are large enough to undermine the validity of the paper’s core claims. Oftentimes, after doing further investigation, I come around to thinking that even though a new paper has serious methodological problems, its core claims are still correct.
When I read many critiques of papers, I see my much younger self: oftentimes, people seem to read a paper, find one or two issues, and dismiss the whole thing out of hand. (This is especially common on Twitter, and is a big part of why I strongly dislike using it. But it’s been unfortunately common even amongst AI safety people.) I think it’s understandable why this happens: deeply investigating a paper’s claims takes time and cognitive effort, while finding a gotcha is cheap. To be fair, finding a clear methodological issue left unaddressed by the paper can be useful evidence of a lack of academic proof-of-work on the authors' part. And not every paper is worth the investigation required to fully understand it: after all, not every paper makes interesting claims, and many papers do have serious methodological flaws that are fatal to their core conclusions. But I still think critiques should spend far more time assessing the core claims of the paper, and far less time hunting for dunks.
In the interest of suggesting some solutions (and not just pointing at a problem), here are some good rules of thumb to follow in the context of paper critiques. First, I think every critique of a paper should at the very least understand the paper well enough to summarize it in a way the authors would agree with. Second, critiques should rarely dwell on typos, formatting errors, or missing citations, and should ideally explicitly distinguish criticisms that are fatal to the core claims from ones that aren't. Third, critiques should give the paper the benefit of steelmanning any ambiguous methodological choice before criticizing it.