In light of the portions I quoted from Armstrong and Green's paper, I'll look at Gavin Schmidt's post:
Principle 1: When moving into a new field, don’t assume you know everything about it because you read a review and none of the primary literature.
Score: -2 G+A appear to have only read one chapter of the IPCC report (Chap 8), and an un-peer reviewed hatchet job on the Stern report. Not a very good start…
The paper does cite many sources other than the IPCC and the "hatchet job" on the Stern Report, including sources that evaluate climate models and their quality in general. ChrisC notes that the authors fail to cite the ~788 references for the IPCC Chapter 8. The authors claim to have a bibliography on their website that includes the full list of references given to them by all academics who suggested references. Unfortunately, as I noted in my earlier comment, the link to the bibliography from http://www.forecastingprinciples.com/index.php?option=com_content&view=article&id=78&Itemid=107 is broken. This doesn't reflect well on the authors (the site on the whole is a mess, with many broken links). Assuming, however, that the authors had put up the bibliography and that it was available as promised in the paper, this critique seems off the mark (though I'd have to see the bibliography to know for sure).
Principle 2: Talk to people who are doing what you are concerned about.
Score: -2 Of the roughly 20 climate modelling groups in the world, and hundreds of associated researchers, G+A appear to have talked to none of them. Strike 2.
This seems patently false given the contents of the paper as I quoted it, and the list of experts that they sought. In fact, it seems like such a major error that I have no idea how Schmidt could have made it if he'd read the paper. (Perhaps he had a more nuanced critique to offer, e.g., that the authors' survey didn't ask enough questions, or that they should have tried harder or contacted more people. But the critique as offered here smacks of incompetence or malice.) [Unless Schmidt was reading an older version of the paper that didn't mention the survey at all. But I doubt that, even if he was looking at an old version of the paper, it omitted all references to the survey.]
Principle 3: Be humble. If something initially doesn’t make sense, it is more likely that you’ve mis-understood than the entire field is wrong.
Score: -2 For instance, G+A appear to think that climate models are not tested on ‘out of sample’ data (they gave that a ‘-2’). On the contrary, the models are used for many situations that they were not tuned for, paleo-climate changes (mid Holocene, last glacial maximum, 8.2 kyr event) being a good example. Similarly, model projections for the future have been matched with actual data – for instance, forecasting the effects of Pinatubo ahead of time, or Hansen’s early projections. The amount of ‘out of sample’ testing is actually huge, but the confusion stems from G+A not being aware of what the ‘sample’ data actually consists of (mainly present day climatology). Another example is that G+A appear to think that GCMs use the history of temperature changes to make their projections since they suggest leaving some of it out as a validation. But this is just not so, as we discussed more thoroughly in a recent thread.
First off, retrospective "predictions" of things that people already tacitly know, even though those things aren't explicitly used in tuning the models, are not that reliable.
Secondly, it's possible (and likely) that Armstrong and Green missed some out-of-sample tests and validations that had been performed in the climate science arena. While part of this can be laid at their feet, part of it also reflects poor documentation by climate scientists of exactly how they were going about their testing. I read the same IPCC AR4 chapter that Armstrong and Green did, and I found it quite unclear on the forecasting side of things (compared to other papers I've read that judge forecast skill, in weather and short-term climate forecasting, macroeconomic forecasting, and business forecasting). This is similar to the sloppy code problem.
Thirdly, the climate scientists whom Armstrong and Green attempted to engage could have been more engaging (not Gavin Schmidt's fault; he wasn't included in the list, and the response rate appears to have been low from mainstream scientists as well as skeptics, so it's not just a problem of the climate science mainstream).
Overall, I'd like to know more details of the survey responses and Armstrong and Green's methodology, and it would be good if they combined their proclaimed commitment to openness with actually having working links on their websites. But Schmidt's critique doesn't reflect too well on him, even if Armstrong and Green were wrong.
Now, to ChrisC's comment:
Call me crazy, but in my field of meteorology, we would never head to popular literature, much less the figgin internet, in order to evaluate the state of the art in science. You head to the scientific literature first and foremost. Since meteorology and climatology are not that different, I would struggle to see why it would be any different.
The authors also seem to put a large weight on “forecasting principles” developed in different fields. While there may be some valuable advice, and cross-field cooperation is to be encouraged, one should not assume that techniques developed in say, econometrics, port directly into climate science.
The authors also make much of a wild goose chase on google for sites matching their specific phrases, such as “global warming” AND “forecast principles”. I’m not sure what a lack of web sites would prove. They also seem to have skiped most of the literature cited in AR4 ch. 8 on model validation and climatology predictions.
Part of the authors' criticism was that the climate science mainstream hadn't paid enough attention to forecasting, or to formal evaluations of forecasting. So it's natural that they didn't find enough mainstream stuff to cite that was directly relevant to the questions at hand for them.
As for the Google search and Google Scholar search, these are standard tools for initiating an inquiry. I know, I've done it, and so has everybody else. It would be damning if the authors had relied only on such searches. But they surveyed climate scientists and worked their way through the IPCC Working Group Report. This may have been far short of full due diligence, but it isn't anywhere near as sloppy as Gavin Schmidt and ChrisC make it sound.
Thanks for a comprehensive summary - that was helpful.
It seems that A&G contacted the working scientists to identify papers which (in the scientists' view) contained the most credible climate forecasts. Not many responded, but 30 referred to the recent (at the time) IPCC WP1 report, which in turn referenced and attempted to summarize over 700 primary papers. There also appear to have been a bunch of other papers cited by the surveyed scientists, but the site has lost them. So we're somewhat at a loss to decide which primary sources climate scientists ...
Note: Please see this post of mine for more on the project, my sources, and potential sources for bias.
One of the categories of critique that have been leveled against climate science is the critique of insularity. Broadly, it is claimed that the type of work that climate scientists are trying to do draws upon insight and expertise in many other domains, but climate scientists have historically failed to consult experts in those domains or even to follow well-documented best practices.
Some takeaways/conclusions
Note: I wrote a preliminary version of this before drafting the post, but after having done most of the relevant investigation. I reviewed and edited it prior to publication. Note also that I don't justify these takeaways explicitly in my later discussion, because a lot of these come from general intuitions of mine and it's hard to articulate how the information I received explicitly affected my reaching the takeaways. I might discuss the rationales behind these takeaways more in a later post.
Relevant domains they may have failed to use or learn from
Let's look at each of these critiques in turn.
Critique #1: Failure to consider forecasting research
We'll devote more attention to this critique, because it has been made, and addressed, cogently in considerable detail.
J. Scott Armstrong (faculty page, Wikipedia) is one of the big names in forecasting. In 2007, Armstrong and Kesten C. Green co-authored a global warming audit (PDF of paper, webpage with supporting materials) for the Forecasting Principles website that was critical of the forecasting exercises by climate scientists used in the IPCC reports.
Armstrong and Green began their critique by noting the following:
How significant are these general criticisms? It depends on the answers to the following questions:
So it seems like there was arguably a failure of proper procedure in the climate science community in terms of consulting and applying practices from relevant domains. Still, how germane was it to the quality of their conclusions? Maybe it didn't matter after all?
In Chapter 12 of The Signal and the Noise, statistician and forecaster Nate Silver offers the following summary of Armstrong and Green's views:
Silver addresses each of these in his book (read it to know what he says). Here are my own thoughts on the three points as put forth by Silver:
Some counterpoints to the Armstrong and Green critique:
UPDATE: I forgot to mention in my original draft of the post that Armstrong challenged Al Gore to a bet pitting Armstrong's No Change model against the IPCC model. Gore did not accept the bet, but Armstrong created the website (here) anyway to record the relative performance of the two models.
UPDATE 2: Read drnickbone's comment and my replies for more information on the debate. drnickbone in particular points to responses from Real Climate and Skeptical Science, which I discuss in my response to his comment.
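To make concrete what "recording the relative performance of the two models" could involve, here is a minimal sketch in Python. It is not Armstrong's actual scoring procedure, which I haven't reproduced. It assumes a simple setup: a no-change forecast that repeats the last observed value, a trend forecast that adds a fixed increment per year, and mean absolute error on held-out years as the yardstick. The anomaly values and the `TREND_PER_YEAR` figure are made up for illustration and say nothing about how the real comparison turned out.

```python
from statistics import mean


def no_change_forecast(last_observed, horizon):
    """Naive benchmark: every future value equals the last observed value."""
    return [last_observed] * horizon


def linear_trend_forecast(last_observed, trend_per_step, horizon):
    """Stand-in for a warming projection: add a fixed increment each step."""
    return [last_observed + trend_per_step * (k + 1) for k in range(horizon)]


def mean_absolute_error(forecasts, actuals):
    """Average size of the forecast errors, ignoring their sign."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


# Hypothetical temperature anomalies (degrees C); NOT real observations.
history = [0.10, 0.12, 0.11, 0.15]   # years available when the forecast is made
holdout = [0.16, 0.14, 0.19, 0.21]   # later years, used only for scoring

TREND_PER_YEAR = 0.03  # assumed increment, chosen purely for illustration

naive = no_change_forecast(history[-1], len(holdout))
trend = linear_trend_forecast(history[-1], TREND_PER_YEAR, len(holdout))

naive_mae = mean_absolute_error(naive, holdout)
trend_mae = mean_absolute_error(trend, holdout)

# Ratio below 1 means the trend forecast beat the no-change benchmark
# on the held-out years; above 1 means it did worse.
relative_mae = trend_mae / naive_mae
print(f"no-change MAE: {naive_mae:.3f}")
print(f"trend MAE:     {trend_mae:.3f}")
print(f"relative MAE:  {relative_mae:.2f}")
```

The key design choice is that the held-out years play no part in producing either forecast, which is the sense of "out of sample" that the forecasting literature cares about. With so few points the ratio is very noisy, so a real comparison would need a much longer scoring window.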
Critique #2: Inappropriate or misguided use of statistics, and failure to consult statisticians
To some extent, this overlaps with Critique #1, because best practices in forecasting include good use of statistical methods. However, the critique is a little broader. There are many parts of climate science not directly involved with forecasting, but where statistical methods still matter. Historical climate reconstruction is one such example. The purpose of these is to get a better understanding of the sorts of climate that could occur and have occurred, and how different aspects of the climate correlated. Unfortunately, historical climate data is not very reliable. How do we deal with different proxies for the climate variables we are interested in so that we can reconstruct them? A careful use of statistics is important here.
Let's consider an example that's quite far removed from climate forecasting, but has (perhaps undeservedly) played an important role in the public debate on global warming: Michael Mann's famed hockey stick (Wikipedia), discussed in detail in Mann, Bradley and Hughes (henceforth, MBH98) (available online here). The major critiques of the paper arose in a series of papers by McIntyre and McKitrick, the most important of them being their 2005 paper in Geophysical Research Letters (henceforth, MM05) (available online here).
I read about the controversy in the book The Hockey Stick Illusion by Andrew Montford (Amazon, Wikipedia), but the author also has a shorter article titled Caspar and the Jesus paper that covers the story as it unfolds from his perspective. While there's a lot more to the hockey stick controversy than statistics alone, some of the main issues are statistical.
Unfortunately, I wasn't able to resolve the statistical issues myself well enough to have an informed view. But my very crude intuition, as well as the statements made by statisticians as recorded below, supports Montford's broad outline of the story. I'll try to describe the broad critiques leveled from the statistical perspective:
There has been a lengthy debate on the subject, plus two external inquiries and reports on the debate: the NAS Panel Report headed by Gerry North, and the Wegman Report headed by Edward Wegman. Both of them agreed with the statistical criticisms made by McIntyre, but the NAS report did not make any broader comments on what this says about the discipline or the general hockey stick hypothesis, while the Wegman report made more explicit criticism.
The Wegman Report made the insularity critique in some detail:
McIntyre has a lengthy blog post summarizing what he sees as the main parts of the NAS Panel Report, the Wegman Report, and other statements made by statisticians critical of MBH98.
Critique #3: Inadequate use of software engineering, project management, and coding documentation and testing principles
In the aftermath of Climategate, most public attention was drawn to the content of the emails. But apart from the emails, data and code were also leaked, and this gave the world an inside view of the code that's used to simulate the climate. A number of criticisms of the coding practice emerged.
Chicago Boyz had a lengthy post titled Scientists are not Software Engineers that noted the sloppiness in the code and some of its implications. The post was also quick to point out that poor-quality code is not unique to climate science: it is a general problem when small-scale academic research projects grow beyond what the coders originally intended and no systematic effort is made to refactor the code. (If you have thoughts on the general prevalence of good software engineering practices in code for academic research, feel free to share them by answering my Quora question here, and if you have insights on climate science code in particular, answer my Quora question here.) Below are some excerpts from the post:
For some choice comments excerpted from a code file, see here.
Critique #4: Practices of publication of data, metadata, and code (that had gained traction in other disciplines)
When McIntyre wanted to replicate MBH98, he emailed Mann asking for his data and code. Mann, though initially cooperative, soon started trying to fend McIntyre off. Part of this was because he thought McIntyre was out to find something wrong with his work (a well-grounded suspicion). But part of it was also that his data and code were a mess. He didn't maintain them in a way that he'd be comfortable sharing with anybody other than an already sympathetic academic. And, more importantly, as Mann's colleague Stephen Schneider noted, nobody asked for the code and underlying data during peer review. Most journals at the time did not require authors to submit or archive their code and data upon submission or acceptance of their paper. This also closely relates to Critique #3: a requirement or expectation that one's data and code would be published along with one's paper might make people more careful to follow good coding practices and avoid using various "tricks" and "hacks" in their code.
Here's how Andrew Montford puts it in The Hockey Stick Illusion: