IlyaShpitser comments on Open Thread August 31 - September 6 - Less Wrong Discussion
If MIRI doesn't publish reasonably frequently (via peer review), how do you know they aren't wasting donor money? Donors can't evaluate their stuff themselves, and MIRI doesn't seem to submit a lot of stuff to peer review.
How do you know they aren't just living it up in a very expensive part of the country, doing the equivalent of freshman philosophizing in front of the whiteboard? The way you usually know is via peer review -- e.g., other people previously declared to have produced good things declare that MIRI produces good things.
How did science get done for the centuries before peer review? Why do you place such weight on so recently invented a construct (you may remember Einstein being so enraged by the first and only time he tried out this new thing called 'peer review' that he vowed never again to submit anything to a peer-reviewed journal), a construct which routinely fails whenever it's evaluated and has been shown to be extremely unreliable, where the same paper can be accepted or rejected based on chance? If peer review is so good, why do so many terrible papers get published and great Nobel-prize-winning work get rejected repeatedly? If peer review is such an effective method of divining quality, why do many communities get along fine with desultory use of it, where it's barely used or left as a final step long after the results have been disseminated and evaluated, and people don't even bother to read the final peer-reviewed version? (Particularly in economics, I get the impression that everyone reads the preprints & working papers and the final publication comes as a non-event, which has caused me serious trouble in the past in trying to figure out what to cite and whether one cite is the same as another; and of course, I'm not always clear on where various statistics or machine learning papers get published, or whether they are published in any sense beyond posting to arXiv.) And why does all the real criticism, debate, and refutation seem to take place on blogs & Twitter, if peer review is such an acid test of whether papers are gold or dross, leading to the growing need for altmetrics and other ways of dealing with the 'post-publication peer review' problem as journals increasingly fail to reflect where scientific debates actually are?
I've said it before and I'll say it again: 'peer review' is not a core element of science. It's barely even peripheral, and it's unclear whether it adds anything on net. For the most part, calls for 'peer review' are cargo-culting. What makes science work is replication and putting your work out there for community evaluation. Those are the real review by peers.
If you are a donor who wants to evaluate MIRI, whether some arbitrary reviewers pass or fail its papers is not very important. There are better measures of impact: is anyone building on their work? have MIRI-specific claims begun filtering out? are non-affiliated academics starting to move into the AI risk field? Heck, even citation counts would probably be better here.
Peer review seems like a form of costly signalling. If you pass peer review, it only demonstrates that you have the ability to pass peer review. On the other hand, if you don't pass peer review, it signals that you don't have even this ability. (If so much crap passes peer review, why doesn't your research? Is it even worse than the usual crap?)
This is why I recommend treating "peer review" simply as a hoop you have to jump through; otherwise people will bother you about it endlessly. It removes the suspicion that your research is even worse than the stuff that already gets published.
Mostly by well-off people satisfying their personal curiosity. Other than that, by finding a rich and/or powerful patron and keeping him amused :-D
I agree that the cult of peer review is overblown. But does MIRI produce any relevant and falsifiable output at all?
I would answer differently than you: "Very inefficiently and with lots of errors".
As opposed to quick, reliable present-day peer-reviewed science? ;-)
Well, not that this has changed...
What leads you to that conclusion? When do you think peer review began and how do you judge efficiency before and after?
Is this an "arguments as soldiers" thing? Compare an isomorphic argument: "how did medicine get done for the centuries before antibiotics."
Leaving aside that this is an argument from authority, there is also selection bias here: peer review may well not be crucial -- if you happen to be of Einstein's caliber. But: "they also laughed at Bozo the Clown." I am sure plenty of Bozos are enraged at peer review too, for unjustly rejecting their crap.
There is a stochastic element to peer review, but in my experience it works remarkably well, given what it is. Good papers are very likely to get a fair shake and get published. I routinely get very penetrating comments that greatly improve the quality of the final paper. I almost always get help with scholarship from reviewers (e.g., "this is probably a good paper to cite"). A bigger issue I saw was not chance, but ideology from reviewers. I very occasionally get bad reviews (<5% chance), and associate editors (people who handle the paper and assign reviewers) are almost always helpful in such cases.
I asked you this before, gwern, how much experience with actual peer review (let's say in applied stats journals, as that is closest to what you do) do you have?
Absolute numbers are kind of useless here. Do you have some work in mind on false positive and false negative rates for peer review?
I don't think we disagree here, I think this is a form of peer review. I routinely do this with my papers, and am asked to look over preprints by others. I think this is fine for certain types of papers (generally very specialized or very large/weighty ones).
The worry is that MIRI's conception of what a "peer" is basically ignores the wider academic community (which has a lot of intellectual firepower), so they end up in a bubble. The other worry is that people who worry about getting tenure are incentivized to be productive (albeit imperfectly). MIRI is not incentivized to be productive except in some vague "saving the world" sense. And indeed, MIRI appears to be remarkably unproductive by academic standards. The guy who really calls the shots at MIRI, EY, has not internalized academic norms and appears to be fairly hostile to them.
Honestly, you sound a bit angry about peer review.
That's not isomorphic. To put it bluntly, medicine didn't. It only started becoming net beneficial extremely recently (and even now tons of medicine is harmful or a pure waste), by copying a tremendous amount of basic science like biology and bacteriology, benefitting from others' discoveries, and importing methodology like randomized trials (which it still chafes at) -- not by importing peer review. Up until the very late 1800s or so, you would often have been better off ignoring doctors -- if you were, say, an expecting mother wondering whether to give birth in a hospital pre-Semmelweis. You can't expect too much help from a field which published its first RCT in 1948 (on, incidentally, an antibiotic).
I include it as a piquant anecdote since you seem to have no interest in looking up any of the statistical evidence on the unreliability and biases (in the statistical senses) of peer review, or the absence of any especial evidence that it works.
That is not what I am saying. I am saying, 'if you think MIRI is Bozo the Clown, get a photograph of its leader and see if he has a red nose! See if his face is suspiciously white and the entire MIRI staff saves a remarkable amount on gas purchases because they can all fit into one small car to run their errands! Don't deliberately look away and simply listen for the sound of laughter! That's a terrible way of deciding!'
No, they're not; or at the very least, you need to modify this to 'after being forced to repeatedly try, solely thanks to the peer review process, a good paper may still finally be published'. For example, in the NIPS experiment, most accepted papers would not have been accepted given a different committee. Unsurprisingly, given the low inter-rater reliabilities for far less complicated things in psychology, and the enormous variability when n=1 or 3.
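The point about committee-dependent acceptance can be illustrated with a toy Monte Carlo sketch (this is a hypothetical model with made-up parameters, not the actual NIPS data): give every paper a true quality, let two independent committees each observe quality plus their own reviewer noise, and have each accept its top 25%. Even with noise merely comparable to the spread in true quality, the two committees' accepted sets overlap only partially.

```python
import random

random.seed(42)

N = 10_000          # number of submitted papers (arbitrary)
ACCEPT_FRAC = 0.25  # each committee accepts its top 25% (arbitrary)
NOISE = 1.0         # reviewer noise on the same scale as quality spread

# True paper quality, standard normal.
quality = [random.gauss(0, 1) for _ in range(N)]

def accepted_set(noise):
    """One committee: observe quality + independent noise, accept the top slice."""
    scores = [(q + random.gauss(0, noise), i) for i, q in enumerate(quality)]
    scores.sort(reverse=True)
    k = int(N * ACCEPT_FRAC)
    return {i for _, i in scores[:k]}

a = accepted_set(NOISE)  # committee A's accepted papers
b = accepted_set(NOISE)  # committee B's accepted papers, same pool

overlap = len(a & b) / len(a)
print(f"Fraction of A's accepted papers also accepted by B: {overlap:.0%}")
```

With these (assumed) parameters a large fraction of one committee's acceptances are rejected by the other, despite both committees ranking the very same papers with only modest noise.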
Yes, any of it. They all say that peer review is not a little but highly stochastic. This isn't a new field by any means.
I have little first-hand experience; my vitriol comes mostly from having read over the literature showing peer-review to be highly unreliable, and biased, from the unthinking respect and overestimation of it that most people give it, being shocked at how awful many published studies are despite being 'peer reviewed', and from talking to researchers and learning how pervasive bias is in the process and how reviewers enforce particular cliques & theories (some politically-motivated) and try to snuff opposition in the cradle.
The first represents a huge waste of time; the second hinders scientific progress directly and contributes to one of the banes of my existence as a meta-analyst, publication bias (why do we have a 'grey literature' in the first place?); the third is seriously annoying in trying to get most people to wake up and think a little about the research they read about ('but it's peer-reviewed!'); and the fourth is simply enraging as the issue moves from an abstract, general science-wide problem to something I can directly perceive specifically harming me and my attempts to get accurate beliefs.
(Well, actually I think my analysis of Silk Road 2 listings is supposed to be peer-reviewed, but the lead author is handling the bureaucracy, so I can't say anything directly about how good or bad the reviewers for that journal are, aside from noting that this was a case of problem #4: the paper we were responding to is so egregiously, obviously wrong that the journal's reviewers must have been either morons or totally ignorant of the paper topic they were supposed to be reviewing. I'm still shocked & baffled about this: how does an apparently respectable journal wind up publishing a paper claiming, essentially, that Silk Road 2 did not sell drugs? This would have been caught in a heartbeat by any kind of remotely public process - even one person who had actually used Silk Road 1 or 2 peeking in on the paper could have laughed it out of the room - but because the journal is 'peer reviewed'... Pace the Gell-Mann Amnesia effect, it makes me wonder about all the papers published on topics I am not so knowledgeable about as I am on Silk Road 2, and wonder if I am still not cynical enough.)
Yes, I have no objection to 'peer review' if by what you mean is all the things I singled out as opposed to, and prior to, and afterwards, the institution of peer review: having colleagues critique your work, having many other people with different perspectives & knowledge check it over and replicate it and build on it and post essays rebutting it - all this is great stuff, we both agree. I would say replication is the most important of those elements, but all have their place.
What I am attacking is the very specific formal institutional practice of journals outsourcing editorial judgment to a few selected researchers and effectively giving them veto power, a process which hardly seems calculated to yield very good results and which does not seem to have been institutionalized because it has been rigorously demonstrated to work far better than the pre-existing alternatives (which of course it wasn't, any more than medical proposals at that time were routinely put through RCTs first, even though we know how many good-sounding proposals in psychology & sociology & economics & medicine go down in flames when they are rigorously tested), but - to go off on a more speculative tangent here - whose chief purpose was to simply make the bureaucracy of science scale to the post-WWII expansion of science as part of the Cold War/Vannevar Bush academic-military-government complex.
If this is the problem with MIRI, I think there are far more informative ways to criticize them. For example, I don't think you need to rely on any proxies or filters: you should be able to evaluate their work directly and form your own critique of whether it's any good or if it seems like a good research avenue for their stated goals.
Science is srs bsns. (I find it hard to see why other people can't get worked up over things like publication bias or aging or p-hacking. They're a lot more important than the latest outrage du jour. This stuff matters!)
Medicine was often harmful in the past, with some occasional parts that helped, e.g. amputating gangrenous limbs was dangerous and people died, but probably was still a benefit on net. Admiral Nelson had multiple surgeries and was in serious danger of infection and death afterwards, but he would have been a goner for sure without surgery.
Science was pretty similar, it was mostly nonsense with occasional islands of sense. It didn't really get underway until, what, Francis Bacon wrote about biases and empiricism? That is not very long ago. The early "gentlemen scholars" all did informal peer review by sending their stuff to each other (they also hid discoveries from each other due to competition and egos, but this stuff happens today too).
Gwern, peer review is my life. My tenure case will be decided by peer review, ultimately. I do peer review myself as a service, constantly. I know all about peer review.
The burden of proof is on MIRI, not on me. MIRI is the one that wants funding and people to save the world. It's up to MIRI to use all available financial and intellectual resources out there, which includes engaging with academia.
I really think you should moderate your criticism of peer review. Peer review for data analysis papers is very different from peer review for mathematics or theoretical physics. Fields are different and have vastly different cultural norms. Even in the same field, different conferences/journals may have different norms.
I do a lot of theory. When I do data analysis, my collabs and I try to lead by example. What is the point of being angry? Angry outsiders just make people circle the wagons.
This argument seems exactly identical to the argument for trepanning, even including the survivorship bias. (One of the suspected uses of trepanning was to revive people otherwise thought dead.)
While we're looking at anecdotes, this bit of Nelson's experience with surgery seems relevant:
I'm not sure I'd count that as a win for surgery, or evidence that he couldn't have survived without it!
But this means that, unless you're particularly good at distancing yourself from your work, you should expect to be worse at judging it than a disinterested observer. The classic anecdote about "which half?" comes to mind, or the reaction of other obstetricians to Semmelweis's concerns.
Regardless, we would expect that, if studies are better than anecdotes, studies on peer review will outperform anecdotes on peer review, right?
It's not identical because we know, with the benefit of hindsight, that amputating potentially gangrenous limbs is a good idea. The folks in the past had a solid empirical basis for amputations, even if they did not fully understand gangrene. Medicine was mostly, but not always, nonsense in the past. A lot of the stuff was not based on the scientific method, because they had no scientific method. But there were isolated communities that came up with sensible things for sensible reasons. This is one case where standard practices were sensible (there are other isolated examples, e.g. honey to disinfect wounds).
Ok, but isn't this "incentive tennis?" Gwern's incentives are clearer than mine here -- he's not a mainstream academic, so he loses out on status. So a "low motive" interpretation of the argument is: "your status castle is built on sand, tear it down!" Gwern is also pretty angry. Are we going to stockpile argument ammunition [X] of the form "you are more biased when evaluating peer review because of [X]"?
For me, peer review is a double edged sword -- I get papers rejected sometimes, and at other times I get silly reviewer comments, or editors that make me spend years revising. I have a lot of data both ways. The point with peer review is I sleep better at night due to extra sanity checking. Who sanity-checks MIRI's whiteboard stuff?
A "low motive" argument for me would be "keep peer review, but have it softball all my papers, they are obviously so amazing why can't you people see that!"
A "low motive" argument for MIRI would be "look buddy, we are trying to save the world here, we don't have time for your flawed human institutions. Don't you worry about our whiteboard content, you probably don't know enough math to understand it anyways." MIRI is doing pretty theoretical decision theory. Is that a good idea? Are they producing enough substantive work? In standard academia peer review would help with the former question, and answering to the grant agency and tenure pressure would help with the second. These are not perfect incentives, but they are there. Right now there are absolutely no guard rails in place preventing MIRI from going off the deep end.
Your argument basically says not to trust domain experts, that's the opposite of what should be done.
Gwern also completely ignores effect modification (e.g. the practice of evaluating conditional effects after conditioning on things like paper topic). Peer review cultures for empirical social science papers and for theoretical physics papers basically have nothing to do with each other.
I would put the start of solid empirical basis for gangrene treatment at Middleton Goldsmith during the American Civil War (dropping mortality from 45% to 3%), about sixty years after Nelson.
I think this is putting too much weight on superficial resemblance. Yes, gangrene treatment from Goldsmith to today involves amputation. But that does not mean amputation pre-Goldsmith actually decreased mortality over no treatment! My priors are pretty strong that it would increase it, but going into details on my priors is perhaps a digression. (The short version is that I take a very Hansonian view of medicine and its efficacy.) I'm not aware of (but would greatly appreciate) any evidence on that question.
(To see where I'm coming from, consider that there is a reference class that contains both "trepanning" and "brain surgery" that seems about as natural as the reference class that includes amputation before and after Goldsmith.)
But this only makes sense if peer review actually improves the quality of studies. Do you believe that's the case, and if so, why?
I think my argument is domain expert tennis. That is, I think that in order to evaluate whether or not peer review is effective, we shouldn't ask scientists who use peer review, we should ask scientists who study peer review. Similarly, in order to determine whether a treatment is effective, we shouldn't ask the users of the treatment, but statisticians. If you go down to the church/synagogue/mosque, they'll say that prayer is effective, and they're obviously the domain experts on prayer. I'm just applying the same principles and same level of skepticism.
I am not sure what the relevance of either of these are. If anything, the latter suggests that we need to make the case for peer review field by field, and so proponents have an even harder time than they do without that claim!
I think treating gangrene by amputation was well known in the ancient world. Depending on how you deal w/ hemorrhage/complications you would have a pretty high post-surgery mortality rate, but the point is, it is still an improvement on gangrene killing you for sure.
Actually, while I didn't look into this, I expect Jewish and Greek surgeons would have been pretty good compared to medieval European ones.
I don't have data from the ancient world :). But mortality from gangrene if you leave the dead tissue in place is what, >95%? Amputation didn't have to be perfect or even very good, it merely had to do better than an almost certain death sentence.
Well, because peer review would do things like say "your proof has a bug," "you didn't cite this important paper," "this is a very minor modification of [approach]." Peer review in my case is a social institution where smart, knowledgeable people read my stuff.
You can say that's heavily confounded by your field, the types of papers you write (or review), etc., and I agree! But that is of little relevance to gwern, he thinks the whole thing needs to be burned to the ground.
Not following. The claim "peer review sucks for all X," is stronger than the claim "peer review sucks for some X." The person making the stronger claim will have a harder time demonstrating it than the person making the weaker claim. So as a status quo defender, I have an easier time attacking the stronger claim.
I think you missed the meat of my claim; yes, al-Zharawi said to amputate as a response to gangrene, but that is not a solid empirical basis, and as a result it is not obvious that it actually extended lifespans on net. We don't have the data to verify, and we don't have reason to trust their methodology.
Now, maybe gangrene is a case where we can move away from priors on whether archaic surgery was net positive or net negative based on inside view reasoning. I'm not a doctor or a medical historian, and the one place I can think of to look for data (homeopathic treatment of gangrene) doesn't seem to have any sort of aggregated data, just case reports of survival. Perhaps an actual medical historian could determine it one way or the other, or come up with a better estimate of the survival rate. But my guess is that 95% is a very high estimate.
I could, but why? I'll simply point out that is not science, and that it's not even trying to be science. It's raw good intentions.
Suppose that the person on the street thinks that price caps on food are a good idea, because it would be morally wrong to gouge on necessities and the poor deserve to be able to afford to eat. Then someone comes along and points out that the frequent queues, or food shortages, or starvation, are a consequence of this policy, regardless of the policy's intentions.
The person on the street is confused--but food being cheap is a good thing, why is this person so angry about price caps? They're angry because of the difference between perception of policies and their actual consequences.
The claim I saw you as making is that peer review's efficacy in field x is unrelated to its efficacy in field y. If true, that makes it harder for either of us to convince the other in either direction. I, with the null hypothesis that peer review does not add scientific value, would need to be convinced of peer review's efficacy in every field separately. The situation is symmetric for you: your null hypothesis that peer review adds scientific value would need to be defeated in every field separately.
Now, whether or not our null hypothesis should be efficacy or lack of efficacy is a key component of this whole debate. How would you go about arguing that, say, to someone who believed that prayer caused rain?
I think this isn't really cutting to the heart of things--which seems to be 'reputation among intellectuals,' which is related to 'reputation among academia,' which is related to 'journal articles survive the peer review process.' It seems to me that the peer review process as it exists now is a pretty terrible way of capturing reputation among intellectuals, and that we could do something considerably better with the technology we have now.
Anyone suggested a system based on blockchain yet? X-)
I imagine a system where new Sciencecoins could be mined by doing valid scientific research, but then they could be used as a usual cryptocurrency. That would also solve the problem of funding research. :D