This comment was getting a bit long, so I decided to just post relevant stuff from Armstrong and Green first and then offer my own thoughts in a follow-up comment.
We surveyed scientists involved in long-term climate forecasting and policy makers. Our primary concern was to identify the most important forecasts and how those forecasts were made. In particular, we wished to know if the most widely accepted forecasts of global average temperature were based on the opinions of experts or were derived using scientific forecasting methods. Given the findings of our review of reviews of climate forecasting and the conclusion from our Google search that many scientists are unaware of evidence-based findings related to forecasting methods, we expected that the forecasts would be based on the opinions of scientists. We sent a questionnaire to experts who had expressed diverse opinions on global warming. We generated lists of experts by identifying key people and asking them to identify others. (The lists are provided in Appendix A.) Most (70%) of the 240 experts on our lists were IPCC reviewers and authors. Our questionnaire asked the experts to provide references for what they regarded as the most credible source of long-term forecasts of mean global temperatures. We strove for simplicity to minimize resistance to our request. Even busy people should have time to send a few references, especially if they believe that it is important to evaluate the quality of the forecasts that may influence major decisions. We asked: “We want to know which forecasts people regard as the most credible and how those forecasts were derived… In your opinion, which scientific article is the source of the most credible forecasts of global average temperatures over the rest of this century?” We received useful responses from 51 of the 240 experts, 42 of whom provided references to what they regarded as credible sources of long-term forecasts of mean global temperatures. Interestingly, eight respondents provided references in support of their claims that no credible forecasts exist. 
Of the 42 expert respondents who were associated with global warming views, 30 referred us to the IPCC’s report. A list of the papers that were suggested by respondents is provided at publicpolicyforecasting.com in the “Global Warming” section.
Unfortunately, the Forecasting Principles website seems to be a mess. Their Global Warming Audit page:
http://www.forecastingprinciples.com/index.php?option=com_content&view=article&id=78&Itemid=107
does link to a bibliography, but the link is broken (as is their global warming audit link, though the file is still on their website).
(This is another example of experts in one field ignoring best practices from another, in this case maintaining working links to their writing, so the insularity critique applies to forecasting experts too.)
Continuing:
Based on the replies to our survey, it was clear that the IPCC’s Working Group 1 Report contained the forecasts that are viewed as most credible by the bulk of the climate forecasting community. These forecasts are contained in Chapter 10 of the Report and the models that are used to forecast climate are assessed in Chapter 8, “Climate Models and Their Evaluation” (Randall et al. 2007). Chapter 8 provided the most useful information on the forecasting process used by the IPCC to derive forecasts of mean global temperatures, so we audited that chapter.
We also posted calls on email lists and on the forecastingprinciples.com site asking for help from those who might have any knowledge about scientific climate forecasts. This yielded few responses, only one of which provided relevant references.
Trenberth (2007) and others have claimed that the IPCC does not provide forecasts but rather presents “scenarios” or “projections.” As best as we can tell, these terms are used by the IPCC authors to indicate that they provide “conditional forecasts.” Presumably the IPCC authors hope that readers, especially policy makers, will find at least one of their conditional forecast series plausible and will act as if it will come true if no action is taken. As it happens, the word “forecast” and its derivatives occurred 37 times, and “predict” and its derivatives occurred 90 times in the body of Chapter 8. Recall also that most of our respondents (29 of whom were IPCC authors or reviewers) nominated the IPCC report as the most credible source of forecasts (not “scenarios” or “projections”) of global average temperature. We conclude that the IPCC does provide forecasts. In order to audit the forecasting processes described in Chapter 8 of the IPCC’s report, we each read it prior to any discussion. The chapter was, in our judgment, poorly written. The writing showed little concern for the target readership. It provided extensive detail on items that are of little interest in judging the merits of the forecasting process, provided references without describing what readers might find, and imposed an incredible burden on readers by providing 788 references. In addition, the Chapter reads in places like a sales brochure. In the three-page executive summary, the terms, “new” and “improved” and related derivatives appeared 17 times. Most significantly, the chapter omitted key details on the assumptions and the forecasting process that were used. If the authors used a formal structured procedure to assess the forecasting processes, this was not evident. [...] Reliability is an issue with rating tasks. For that reason, it is desirable to use two or more raters. 
We sent out general calls for experts to use the Forecasting Audit Software to conduct their own audits and we also asked a few individuals to do so. At the time of writing, none have done so.
Note: Please see this post of mine for more on the project, my sources, and potential sources for bias.
One category of critique that has been leveled against climate science is the critique of insularity. Broadly, the claim is that the work climate scientists are trying to do draws upon insight and expertise from many other domains, but that climate scientists have historically failed to consult experts in those domains or even to follow their well-documented best practices.
Some takeaways/conclusions
Note: I wrote a preliminary version of this before drafting the post, but after having done most of the relevant investigation. I reviewed and edited it prior to publication. Note also that I don't justify these takeaways explicitly in my later discussion, because a lot of these come from general intuitions of mine and it's hard to articulate how the information I received explicitly affected my reaching the takeaways. I might discuss the rationales behind these takeaways more in a later post.
Relevant domains they may have failed to use or learn from
Let's look at each of these critiques in turn.
Critique #1: Failure to consider forecasting research
We'll devote more attention to this critique, because it has been made, and addressed, cogently in considerable detail.
J. Scott Armstrong (faculty page, Wikipedia) is one of the big names in forecasting. In 2007, Armstrong and Kesten C. Green co-authored a global warming audit (PDF of paper, webpage with supporting materials) for the Forecasting Principles website that was critical of the forecasting exercises by climate scientists used in the IPCC reports.
Armstrong and Green began their critique by noting the following:
How significant are these general criticisms? It depends on the answers to the following questions:
So it seems like there was arguably a failure of proper procedure in the climate science community in terms of consulting and applying practices from relevant domains. Still, how germane was it to the quality of their conclusions? Maybe it didn't matter after all?
In Chapter 12 of The Signal and the Noise, statistician and forecaster Nate Silver offers the following summary of Armstrong and Green's views:
Silver addresses each of these in his book (read it to know what he says). Here are my own thoughts on the three points as put forth by Silver:
Some counterpoints to the Armstrong and Green critique:
UPDATE: I forgot to mention in my original draft of the post that Armstrong challenged Al Gore to a bet pitting Armstrong's No Change model against the IPCC model. Gore did not accept the bet, but Armstrong created the website (here) anyway to record the relative performance of the two models.
UPDATE 2: Read drnickbone's comment and my replies for more information on the debate. In particular, drnickbone points to responses from Real Climate and Skeptical Science, which I discuss in my response to his comment.
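To make the idea of comparing a No Change model with a trend-based model concrete, here is a minimal sketch of how such a comparison could be scored using mean absolute error. Everything in it is an assumption for illustration: the anomaly values are invented, the warming rate is a placeholder, and the function names are hypothetical. It is not the scoring procedure used on Armstrong's website, and the numbers say nothing about how the real models performed.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecast and observed values."""
    if len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must have the same length")
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def no_change_forecast(base_value, horizon):
    """Naive benchmark: every future year equals the last observed value."""
    return [base_value] * horizon


def linear_trend_forecast(base_value, rate_per_year, horizon):
    """Trend model: the base value plus a constant yearly increment."""
    return [base_value + rate_per_year * (k + 1) for k in range(horizon)]


# Hypothetical temperature anomalies in degrees C (invented for illustration).
base_anomaly = 0.40                          # last value before the forecast period
observed = [0.42, 0.45, 0.41, 0.48, 0.50]    # made-up "actuals" for five years
assumed_rate = 0.03                          # placeholder warming rate per year

naive = no_change_forecast(base_anomaly, len(observed))
trend = linear_trend_forecast(base_anomaly, assumed_rate, len(observed))

naive_mae = mean_absolute_error(naive, observed)
trend_mae = mean_absolute_error(trend, observed)

print(f"No Change MAE: {naive_mae:.3f}")
print(f"Trend MAE:     {trend_mae:.3f}")
```

Whichever model has the lower error over the agreed period wins. Over short horizons, year-to-year noise can easily dominate a small trend, so the outcome of a comparison like this is quite sensitive to the length of the period and the choice of starting year.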
Critique #2: Inappropriate or misguided use of statistics, and failure to consult statisticians
To some extent, this overlaps with Critique #1, because best practices in forecasting include good use of statistical methods. However, the critique is a little broader. There are many parts of climate science not directly involved with forecasting, but where statistical methods still matter. Historical climate reconstruction is one such example. The purpose of such reconstructions is to get a better understanding of the sorts of climate that could occur and have occurred, and of how different aspects of the climate correlated. Unfortunately, historical climate data is not very reliable. How do we combine different proxies for the climate variables we are interested in so that we can reconstruct those variables? A careful use of statistics is important here.
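As a rough illustration of what reconstructing a variable from a proxy involves, here is a minimal sketch that calibrates a single proxy against measured temperatures over a period where both exist, then applies the fitted line to earlier proxy values. The data are invented and exactly linear to keep the arithmetic transparent, and the names are hypothetical. This is a toy under those assumptions, not the method used in any published reconstruction.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical calibration period: proxy readings (say, tree ring widths)
# alongside measured temperature anomalies. Values are invented and
# noise-free, which real proxy data never is.
proxy_calibration = [1.0, 1.2, 1.1, 1.4, 1.5]
temp_calibration = [0.10, 0.20, 0.15, 0.30, 0.35]

slope, intercept = fit_line(proxy_calibration, temp_calibration)

# Hypothetical proxy readings from before instrumental records began.
proxy_earlier = [0.8, 0.9]
reconstructed = [slope * p + intercept for p in proxy_earlier]

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("reconstructed anomalies:", [round(t, 3) for t in reconstructed])
```

With real data the relationship is noisy, there are many proxies rather than one, and choices about how to combine and weight them can materially affect the result, which is why the statistical details matter.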
Let's consider an example that's quite far removed from climate forecasting, but has (perhaps undeservedly) played an important role in the public debate on global warming: Michael Mann's famed hockey stick (Wikipedia), discussed in detail in Mann, Bradley and Hughes (henceforth, MBH98) (available online here). The major critiques of the paper arose in a series of papers by McIntyre and McKitrick, the most important of them being their 2005 paper in Geophysical Research Letters (henceforth, MM05) (available online here).
I read about the controversy in the book The Hockey Stick Illusion by Andrew Montford (Amazon, Wikipedia), but the author also has a shorter article titled Caspar and the Jesus paper that covers the story as it unfolds from his perspective. While there's a lot more to the hockey stick controversy than statistics alone, some of the main issues are statistical.
Unfortunately, I wasn't able to resolve the statistical issues myself well enough to have an informed view. But my very crude intuition, as well as the statements made by statisticians as recorded below, supports Montford's broad outline of the story. I'll try to describe the broad critiques leveled from the statistical perspective:
There has been a lengthy debate on the subject, plus two external inquiries and reports on the debate: the NAS Panel Report headed by Gerry North, and the Wegman Report headed by Edward Wegman. Both of them agreed with the statistical criticisms made by McIntyre, but the NAS report did not make any broader comments on what this says about the discipline or the general hockey stick hypothesis, while the Wegman report made more explicit criticism.
The Wegman Report made the insularity critique in some detail:
McIntyre has a lengthy blog post summarizing what he sees as the main parts of the NAS Panel Report, the Wegman Report, and other statements made by statisticians critical of MBH98.
Critique #3: Inadequate use of software engineering, project management, and coding documentation and testing principles
In the aftermath of Climategate, most public attention was drawn to the content of the emails. But apart from the emails, data and code were also leaked, and this gave the world an inside view of the code that's used to simulate the climate. A number of criticisms of the coding practices emerged.
Chicago Boyz had a lengthy post titled Scientists are not Software Engineers that noted the sloppiness in the code and some of its implications. The post was also quick to point out that poor-quality code is not unique to climate science: it is a general problem when small-scale academic research code grows into a large-scale project beyond what the coders originally intended, with no systematic effort made to refactor it. (If you have thoughts on the general prevalence of good software engineering practices in code for academic research, feel free to share them by answering my Quora question here, and if you have insights on climate science code in particular, answer my Quora question here.) Below are some excerpts from the post:
For some choice comments excerpted from a code file, see here.
Critique #4: Practices of publication of data, metadata, and code (that had gained traction in other disciplines)
When McIntyre wanted to replicate MBH98, he emailed Mann asking for his data and code. Mann, though initially cooperative, soon started trying to fend McIntyre off. Part of this was because he thought McIntyre was out to find something wrong with his work (a well-grounded suspicion). But part of it was also that his data and code were a mess. He didn't maintain them in a way that he'd be comfortable sharing with anybody other than an already sympathetic academic. And, more importantly, as Mann's colleague Stephen Schneider noted, nobody asked for the code and underlying data during peer review, and most journals at the time did not require authors to submit or archive their code and data upon submission or acceptance of their paper. This also closely relates to Critique #3: a requirement or expectation that one's data and code would be published along with one's paper might make people more careful to follow good coding practices and avoid using various "tricks" and "hacks" in their code.
Here's how Andrew Montford puts it in The Hockey Stick Illusion: