Domains of forecasting
Note: This post is part of my series on forecasting for MIRI. I recommend reading my earlier post on the general-purpose forecasting community, my post on scenario planning, and my post on futures studies. Although this post doesn't rely on those, they do complement each other.
Note 2: If I run across more domains where I have substantive things to say, I'll add them to this post (if I've got a lot to say, I'll write a separate post and add a link to it as well). Suggestions for other domains worth looking into, that I've missed below, would be appreciated.
Below, I list some examples of domains where forecasting is commonly used. In the post, I briefly describe each of the domains, linking to other posts of mine, or external sources, for more information. The list is not intended to be comprehensive. It's just the domains that I investigated at least somewhat and therefore have something to write about.
- Weather and climate forecasting
- Agriculture, crop simulation
- Business forecasting, including demand, supply, and price forecasting
- Macroeconomic forecasting
- Political and geopolitical forecasting: This includes forecasting of election results, public opinion on issues, armed conflicts or political violence, and legislative changes
- Demographic forecasting, including forecasting of population, age structure, births, deaths, and migration flows.
- Energy use forecasting (demand forecasting, price forecasting, and supply forecasting, including forecasting of conventional and alternative energy sources; borrows some general ideas from business forecasting)
- Technology forecasting
Let's look into these in somewhat more detail.
Note that for some domains, scenario planning may be more commonly used than forecasting in the traditional sense. Some domains have historically been more closely associated with machine learning, data science, and predictive analytics techniques (this is usually the case when a large number of explanatory variables are available). Some domains have been more closely associated with futures studies, that I discussed here. I've included the relevant observations for individual domains where applicable.
Climate and weather forecasting
More details are in my posts on weather forecasting and weather and climate forecasting, but here are a few main points:
- The best weather forecasting methods use physical models rather than statistical models (though some statistics/probability is used to tackle some inherently uncertain processes, such as cloud formation). Moreover, they use simulations rather than direct closed form expressions. Errors compound over time due to a combination of model errors, measurement errors, and hypersensitivity to initial conditions.
- There are two baseline models against which the quality of any model can be judged: persistence (weather tomorrow is predicted to be the same as weather today) and climatology (weather tomorrow is predicted to be the average of the weather on that day over the last few years). We can think of persistence and climatology as purely statistical approaches, and these already do quite well. Any approach that consistently beats them needs to run very computationally intensive weather simulations. (A minimal sketch of these two baselines, on made-up data, appears after this list.)
- Even though a lot of computing power is used in weather prediction, human judgment still adds considerable value, improving accuracy by about 10-25% relative to what the computer models generate on their own. This is attributed to humans being better able to integrate historical experience and common sense into their forecasts, and to offer better sanity checks. The use of machine learning tools for sanity-checking weather forecasts might eventually substitute for this human value-added.
- Long-run climate forecasting methods are more robust in the sense of not being hypersensitive to initial conditions. Long-run forecasts require a better understanding of the speed and strength of various feedback mechanisms and equilibrating processes, and this makes them more uncertain. Whereas the uncertainty in short-run forecasts is mostly initial condition uncertainty, the uncertainty in long run forecasts arises from scenario uncertainty, plus uncertainty about the strength of various feedback mechanisms.
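To make those two baselines concrete, here is a minimal sketch in Python on made-up daily temperature data. The synthetic series, the one-day forecast horizon, and the mean-absolute-error scoring are all illustrative choices of mine, not features of any operational forecasting system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up daily temperatures: a seasonal cycle plus autocorrelated noise.
days = np.arange(6 * 365)
seasonal = 10 * np.sin(2 * np.pi * days / 365)
noise = np.zeros(len(days))
for t in range(1, len(days)):
    noise[t] = 0.8 * noise[t - 1] + rng.normal(scale=2.0)
temps = 15 + seasonal + noise

history, target = temps[:-365], temps[-365:]       # hold out the final year
target_day_of_year = days[-365:] % 365

# Persistence baseline: tomorrow's temperature equals today's.
persistence = temps[-366:-1]

# Climatology baseline: the average of the same calendar day in earlier years.
climatology = np.array([history[days[:-365] % 365 == d].mean()
                        for d in target_day_of_year])

for name, forecast in [("persistence", persistence), ("climatology", climatology)]:
    mae = np.mean(np.abs(forecast - target))
    print(f"{name:12s} mean absolute error: {mae:.2f} degrees")
```

On data like this, both baselines already do respectably relative to the size of the seasonal swings, which is the point: a physics-based simulation earns its keep only if it consistently beats them.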
For long-term climate, a common alternative to forecasting is scenario analysis, such as that used by the IPCC in its discussion of long-term climate change. An example is the IPCC Special Report on Emissions Scenarios.
In addition to my overviews of weather and climate forecasting, I also wrote a series of posts on climate change science and some of its implications. These provide some interesting insight into the different points of contention related to making long-term climate forecasts, identifying causes, and making sense of a somewhat politicized realm of discourse. My posts in the area so far are below (I'll update this list with more posts as and when I make them):
- Climate science: how it matters for understanding forecasting, materials I've read or plan to read, sources of potential bias
- Time series forecasting for global temperature: an outside view of climate forecasting
- Carbon dioxide, climate sensitivity, feedback, and the historical record: a cursory examination of the Anthropogenic Global Warming (AGW) hypothesis
- [QUESTION]: What are your views on climate change, and how did you form them?
- The insularity critique of climate science
Agriculture and crop simulation
- Predictions of agricultural conditions and crop yields are made using crop simulation models (Wikipedia, PDF overview). Crop simulation models include purely statistical models, physical models that rely on simulations, and approximate physical models that use functional expressions.
- Weather and climate predictions are a key component of agricultural prediction, because of the dependence of agricultural yield on climate conditions. Some companies, such as The Climate Corporation (website, Wikipedia) specialize in using climate prediction to make predictions and recommendations for farmers.
Business forecasting
- Business forecasting includes forecasting of demand, supply, and price.
- Time series forecasting (i.e., trying to predict future values of a variable from past values of that variable alone) is quite common for businesses operating in environments where they have very little understanding of or ability to identify and measure explanatory variables.
- As with weather forecasting, persistence (or slightly modified versions thereof, such as trend persistence, which assumes a constant rate of growth) is generally simple to implement while coming close to the theoretical limit of what can be predicted (see the sketch after this list).
- More about business forecasting can be learned from the SAS Business Forecasting Blog or the Institute of Business Forecasting and Planning website and LinkedIn group.
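Here is the sketch referred to in the list above: a minimal illustration of persistence and trend persistence (a "drift" forecast) on a made-up monthly sales series. The numbers are purely illustrative.

```python
# Two naive business forecasts on a made-up monthly sales series.
sales = [100, 104, 103, 108, 112, 111, 115, 119]   # hypothetical units sold per month

# Persistence ("naive") forecast: next month equals the last observed month.
persistence_forecast = sales[-1]

# Trend persistence ("drift") forecast: extend the average historical growth.
average_growth = (sales[-1] - sales[0]) / (len(sales) - 1)
trend_forecast = sales[-1] + average_growth

print(persistence_forecast)        # 119
print(round(trend_forecast, 1))    # 121.7
```

Despite their simplicity, baselines like these are surprisingly hard to beat on noisy business series, which is why they serve as the standard benchmarks against which fancier methods are judged.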
Two commonly used journals in business forecasting are:
- Journal of Business Forecasting (website)
- International Journal of Business Forecasting and Marketing Intelligence (website)
Many of the time series used in the Makridakis Competitions (that I discussed in my review of historical evaluations of forecasting) come from businesses, so the lessons of those competitions can broadly be said to apply to the realm of business forecasting (the competitions also use a few macroeconomic time series).
Macroeconomic forecasting
There is a mix of explicit forecasting models and individual judgment-based forecasters in the macroeconomic forecasting arena. However, unlike the case of weather forecasting, where the explicit forecasting models (or more precisely, the numerical weather simulations) improve forecast accuracy to a level that would be impossible for unaided humans, the situation with macroeconomic forecasting is more ambiguous. In fact, the most reliable macroeconomic forecasts seem to arise by taking averages of the forecasts of a reasonably large number of expert forecasters, each using their own intuition, judgment, or formal model. For an overview of the different examples of survey-based macroeconomic forecasting and how they compare with each other, see my earlier post on the track record of survey-based macroeconomic forecasting.
Political and geopolitical forecasting
I reviewed political and geopolitical forecasting, including forecasting for political conflicts and violence, in this post. A few key highlights:
- This is the domain where Tetlock did his famous work showing that experts don't do a great job of predicting things, as described in his book Expert Political Judgment. I discussed Tetlock's work briefly in my review of historical evaluations of forecasting.
- Currently, the most reliable source of forecasts for international political questions is The Good Judgment Project (website, Wikipedia), which relies on aggregating the judgments of contestants who are given access to basic data and are allowed to use web searches. The GJP is co-run by Tetlock. For election forecasting in the United States, PollyVote (website, Wikipedia), FiveThirtyEight (website, Wikipedia), and prediction markets such as Intrade (website, Wikipedia) and the Iowa Electronic Markets (website, Wikipedia) are good forecast sources. Of these, PollyVote appears to have done the best, but the others have been more widely used.
- Quantitative approaches to prediction rely on machine learning and data science, combined with text analysis of news reports of political events.
Demographic forecasting
Forecasting future population is a tricky business, but some aspects are easier to forecast than others. For instance, the population of 25-year-olds 5 years from now can be determined with reasonable precision by knowing the population of 20-year-olds now (a minimal sketch of this cohort-shift idea appears below). Other variables, such as birth rates, are harder to predict (they can go up or down fast, at least in principle), but in practice, assuming level persistence or trend persistence can often offer reasonably good forecasts over the short term. While there are long-run trends (such as a trend of decline in both period fertility and total fertility), I don't know how well these declines were predicted. I wrote up some of my findings on the recent phenomenon of ultra-low fertility in many countries, so I have some knowledge of fertility trends, but I did not look systematically into the question of whether people were able to correctly forecast specific trends.
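Here is the minimal sketch, mentioned above, of the cohort-shift idea behind the easier part of population forecasting. The starting counts and the survival probability are made up for illustration.

```python
# Cohort-shift sketch: today's 20-, 21-, and 22-year-olds become the 25-, 26-,
# and 27-year-olds of five years from now, minus mortality (net migration ignored).
# All numbers are made up for illustration.
population_by_age = {20: 4_200_000, 21: 4_150_000, 22: 4_100_000}
five_year_survival = 0.995   # hypothetical probability of surviving the next five years

projected = {age + 5: round(count * five_year_survival)
             for age, count in population_by_age.items()}
print(projected)   # {25: 4179000, 26: 4129250, 27: 4079500}
```

Births and migration are what make the full problem hard; the aging-forward step itself is close to bookkeeping.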
Gary King (Wikipedia) has written a book on demographic forecasting and also prepared slides covering the subject. I skimmed through his writing, but not enough to comment on it. It seems like mostly simple mathematics and statistics, tailored somewhat to the context of demographics.
With demographics, depending on context, scenario analyses may be more useful than forecasts. For instance, land use planning or city development may be done keeping in mind different possibilities for how the population and age structure might change.
Energy use forecasting (demand and supply)
Short-term energy use forecasting is often treated as a data science or predictive modeling problem, though ideas from general-purpose forecasting also apply. You can get an idea of the state of energy use forecasting by checking out the Global Energy Forecasting Competition (website, Wikipedia), carried out by a team led by Dr. Tao Hong in cooperation with the data science competition company Kaggle (website, Wikipedia), some of the IEEE working groups, and the International Journal of Forecasting (one of the main journals of the forecasting community).
For somewhat more long-term energy forecasting, scenario analyses are more common. Energy is so intertwined with the global economy that an analysis of long-term energy use often involves thinking about many other elements of the world.
Shell (the organization that pioneered scenario analysis in the private sector) publishes some of its scenario analyses online at the Future Energy Scenarios page. While the understanding of future energy demand and supply is a driving force for the scenario analyses, they cover a wide range of aspects of society. For instance, the New Lens Scenario published in 2012 described two candidate futures for how the world might unfold through 2100: a "Mountains" future where governments played a major role and coordinated to solve global crises, and an "Oceans" future that was more decentralized and market-driven. (For a critique of Shell's scenario planning, see here.) Shell competitor BP also publishes an Energy Outlook that is structured more as a forecast than as a scenario analysis, but does briefly consider alternative assumptions in a fashion similar to scenario analysis.
Technology forecasting
Many people in the LessWrong audience might find technology forecasting to be the first thing that crosses their minds when the topic of forecasting is raised. This is partly because technology improvements are quite salient. Improvements in computing are closely linked with the possibility of an Artificial General Intelligence. Famous among the people who view technology trends as harbingers of superintelligence is technologist and inventor Ray Kurzweil, whose predictions have been evaluated on LessWrong before. Websites such as KurzweilAI.net and Exponential Times have popularized the idea of rapid, unprecedented, exponential growth that, despite its fast pace, is somewhat predictable because of its close-to-exponential pattern.
I've written about technology forecasting at the object level before, for instance here, here (a look at Megamistakes), and here.
One other point about technology forecasting: compared to other types of forecasting, it is more closely linked with the domain of futures studies (that I described here). Why technology forecasting specifically? Futures studies seems designed more for studying and bringing about change than for determining what will happen at or by a specific time. Technology forecasting, unlike other forms of forecasting, concerns changes in the technology we use to run our lives. It is therefore the most transformative forecasting domain, and naturally attracts more attention from futures studies.
The insularity critique of climate science
Note: Please see this post of mine for more on the project, my sources, and potential sources for bias.
One of the categories of critique that have been leveled against climate science is the critique of insularity. Broadly, it is claimed that the type of work that climate scientists are trying to do draws upon insight and expertise in many other domains, but climate scientists have historically failed to consult experts in those domains or even to follow well-documented best practices.
Some takeaways/conclusions
Note: I wrote a preliminary version of this before drafting the post, but after having done most of the relevant investigation. I reviewed and edited it prior to publication. Note also that I don't justify these takeaways explicitly in my later discussion, because a lot of these come from general intuitions of mine and it's hard to articulate how the information I received explicitly affected my reaching the takeaways. I might discuss the rationales behind these takeaways more in a later post.
- Many of the criticisms are broadly on the mark: climate scientists should have consulted best practices in other domains, and in general should have either followed them or clearly explained the reasons for divergence.
- However, this criticism is not unique to climate science: academia in general has suffered from problems of disciplines being relatively insular (UPDATE: Here's Robin Hanson saying something similar). And many similar things may be true, albeit in different ways, outside academia.
- One interesting possibility is that bad practices here operate via founder effects: for an area that starts off as relatively obscure and unimportant, setting up good practices may not be considered important. But as the area grows in importance, it is quite rare for the area to be cleaned up. People and institutions get used to the old ways of doing things. They have too much at stake to make reforms. This does suggest that it's important to get things right early on.
- (This is speculative, and not discussed in the post): The extent of insularity of a discipline seems to be an area where a few researchers can have significant effect on the discipline. If a few reasonably influential climate scientists had pushed for more integration with and understanding of ideas from other disciplines, the history of climate science research would have been different.
Relevant domains they may have failed to use or learn from
- Forecasting research: Although climate scientists were engaging in an exercise that had a lot to do with forecasting, they neither cited research nor consulted experts in the domain of forecasting.
- Statistics: Climate scientists used plenty of statistics in their analyses. They followed the basic principles of statistics, but in many cases applied them incorrectly or combined them with novel, nonstandard approaches that had no clear statistical literature justifying their use.
- Programming and software engineering: Climate scientists used a lot of code both for their climate models and for their analyses of historical climate. But their code failed basic principles of decent programming, let alone good software engineering principles such as documentation, unit testing, consistent variable names, and version control.
- Publication of data, metadata, and code: Publishing these has become increasingly common in other sectors of academia and industry. Climate scientists failed to learn from econometrics and biomedical research, fields that had been struggling with some qualitatively similar problems and that had been moving toward publishing data, metadata, and code.
Let's look at each of these critiques in turn.
Critique #1: Failure to consider forecasting research
We'll devote more attention to this critique, because it has been made, and addressed, cogently in considerable detail.
J. Scott Armstrong (faculty page, Wikipedia) is one of the big names in forecasting. In 2007, Armstrong and Kesten C. Green co-authored a global warming audit (PDF of paper, webpage with supporting materials) for the Forecasting Principles website that was critical of the forecasting exercises by climate scientists used in the IPCC reports.
Armstrong and Green began their critique by noting the following:
- The climate science literature did not reference any of the forecasting literature, and there was no indication that they had consulted forecasting experts, even though what they were doing was to quite an extent a forecasting exercise.
- There was only one paper, by Stewart and Glantz, dating back to 1985, that could be described as a forecasting audit, and that paper was critical of the methodology of climate forecasting. It also appears to have been cited very little in subsequent years.
- Armstrong and Green tried to contact leading climate scientists. Of the few who responded, none listed specific forecasting principles they followed, or reasons for not following general forecasting principles. They pointed to the IPCC reports as the best source for forecasts. Armstrong and Green estimated that the IPCC report violated 72 of 89 forecasting principles they were able to rate (their list of forecasting principles includes 140 principles, but they judged only 127 as applicable to climate forecasting, and were able to rate only 89 of them). No climate scientists responded to their invitation to provide their own ratings for the forecasting principles.
How significant are these general criticisms? It depends on the answers to the following questions:
- In general, how much credence do you assign to the research on forecasting principles, and how strong a prior do you have in favor of these principles being applicable to a specific domain? I think the answer is that forecasting principles as identified on the Forecasting Principles website are a reasonable starting point, and therefore, any major forecasting exercise (or exercise that implicitly generates forecasts) should at any rate justify major points of departure from these principles.
- How representative are the views of Armstrong and Green in the forecasting community? I have no idea about the representativeness of their specific views, but Armstrong in particular is high-status in the forecasting community (that I described a while back) and the Forecasting Principles website is one of the go-to sources, so material on the website is probably not too far from views in the forecasting community. (Note: I asked the question on Quora a while back, but haven't received any answers).
So it seems like there was arguably a failure of proper procedure in the climate science community in terms of consulting and applying practices from relevant domains. Still, how germane was it to the quality of their conclusions? Maybe it didn't matter after all?
In Chapter 12 of The Signal and the Noise, statistician and forecaster Nate Silver offers the following summary of Armstrong and Green's views:
- First, Armstrong and Green contend that agreement among forecasters is not related to accuracy—and may reflect bias as much as anything else. “You don’t vote,” Armstrong told me. “That’s not the way science progresses.”
- Next, they say the complexity of the global warming problem makes forecasting a fool’s errand. “There’s been no case in history where we’ve had a complex thing with lots of variables and lots of uncertainty, where people have been able to make econometric models or any complex models work,” Armstrong told me. “The more complex you make the model the worse the forecast gets.”
- Finally, Armstrong and Green write that the forecasts do not adequately account for the uncertainty intrinsic to the global warming problem. In other words, they are potentially overconfident.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (p. 382). Penguin Group US. Kindle Edition.
Silver addresses each of these in his book (read it to know what he says). Here are my own thoughts on the three points as put forth by Silver:
- I think consensus among experts (to the extent that it does exist) should be taken as a positive signal, even if the experts aren't good at forecasting. But certainly, the lack of interest or success in forecasting should dampen the magnitude of the positive signal. We should consider it likely that climate scientists have identified important potential phenomena, but should be skeptical of any actual forecasts derived from their work.
- I disagree somewhat with this point. I think forecasting could still be possible, but as of now there is not much of a successful track record (as Green notes in a later draft paper). So forecasting efforts, including simple ones (such as persistence, linear regression, and random walk with drift) and ones based on climate models (both the ones in common use right now and others that give more weight to the PDO/AMO), should continue, but the jury is still out on the extent to which they work.
- I agree here that many forecasters are potentially overconfident.
Some counterpoints to the Armstrong and Green critique:
- One can argue that what climate scientists are doing isn't forecasting at all, but scenario analysis. After all, the IPCC generates scenarios, but not forecasts. But as I discussed in an earlier post, scenario planning and forecasting are closely related, and even if scenarios aren't direct explicit unconditional forecasts, they often involve implicit conditional forecasts. To its credit, the IPCC does seem to have used some best practices from the scenario planning literature in generating its emissions scenarios. But that is not part of the climate modeling exercise of the IPCC.
- Many other domains that involve planning for the future don't reference the forecasting literature. Examples include scenario planning (discussed here) and the related field of futures studies (discussed here). Insularity of disciplines from each other is a common feature (or bug) in much of academia. Can we really expect or demand that climate scientists hold themselves to a higher standard?
UPDATE: I forgot to mention in my original draft of the post that Armstrong challenged Al Gore to a bet pitting Armstrong's No Change model against the IPCC model. Gore did not accept the bet, but Armstrong created the website (here) anyway to record the relative performance of the two models.
UPDATE 2: Read drnickbone's comment and my replies for more information on the debate. drnickbone in particular points to responses from Real Climate and Skeptical Science, that I discuss in my response to his comment.
Critique #2: Inappropriate or misguided use of statistics, and failure to consult statisticians
To some extent, this overlaps with Critique #1, because best practices in forecasting include good use of statistical methods. However, the critique is a little broader: there are many parts of climate science not directly involved with forecasting where statistical methods still matter. Historical climate reconstruction is one such example. The purpose of such reconstructions is to get a better understanding of the sorts of climate that could occur and have occurred, and how different aspects of the climate correlated. Unfortunately, historical climate data is not very reliable. How do we deal with the different proxies for the climate variables we are interested in, so that we can reconstruct them? Careful use of statistics is important here.
Let's consider an example that's quite far removed from climate forecasting, but has (perhaps undeservedly) played an important role in the public debate on global warming: Michael Mann's famed hockey stick (Wikipedia), discussed in detail in Mann, Bradley and Hughes (henceforth, MBH98) (available online here). The major critiques of the paper arose in a series of papers by McIntyre and McKitrick, the most important of them being their 2005 paper in Geophysical Research Letters (henceforth, MM05) (available online here).
I read about the controversy in the book The Hockey Stick Illusion by Andrew Montford (Amazon, Wikipedia), but the author also has a shorter article titled Caspar and the Jesus paper that covers the story as it unfolds from his perspective. While there's a lot more to the hockey stick controversy than statistics alone, some of the main issues are statistical.
Unfortunately, I wasn't able to resolve the statistical issues myself well enough to have an informed view. But my very crude intuition, as well as the statements made by statisticians as recorded below, supports Montford's broad outline of the story. I'll try to describe the broad critiques leveled from the statistical perspective:
- Choice of centering and standardization: The data was centered on the 20th century, a method known as short-centering, which is bound to create a bias in favor of picking out hockey stick-like shapes when doing principal components analysis (a toy simulation of this effect appears after this list). Each series was also standardized (divided by the standard deviation for the 20th century), which McIntyre argued was inappropriate.
- Unusual choice of statistic used for significance: MBH98 used a statistic called the RE statistic (reduction of error statistic). This is a fairly unusual statistic to use. In fact, it doesn't have a Wikipedia page, and practically the only stuff on the web (on Google and Google Scholar) about it was in relation to tree-ring research (the proxies used in MBH98 were tree rings). This should seem suspicious: why is tree-ring research using a statistic that's basically unused outside the field? There are good reasons to avoid using statistical constructs on which there is little statistical literature, because people don't have a feel for how they work. MBH98 could have used the R^2 statistic instead, and in fact, they mentioned it in their paper but then ended up not using it.
- Incorrect calculation of significance threshold: MM05 (plus subsequent comments by McIntyre) claims that not only is the RE statistic nonstandard, there were also problems with the way MBH98 used it. First, there is no theoretical distribution of the RE statistic, so calculating the cutoff needed to attain a particular significance level is a tricky exercise (this is one of many reasons why using the RE statistic may be ill-advised, according to McIntyre). MBH98 incorrectly calculated the cutoff value for 99% significance to be 0. The correct value according to McIntyre was about 0.54, whereas the actual RE statistic value for the data set in MBH98 was 0.48, i.e., below the threshold. A later paper by Ammann and Wahl, cited by many as a vindication of MBH98, computed a similar cutoff of 0.52, so the actual RE statistic value again failed the significance test. How, then, did that paper manage to vindicate MBH98? The authors appear to have employed a novel statistical procedure, coming up with something called a calibration/verification RE ratio. McIntyre was quite critical of this, for reasons he described in detail here.
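For reference, the reduction of error statistic, as it is usually defined in the tree-ring literature (this is my paraphrase, so treat the exact form with some caution), compares the squared errors of the reconstruction against those of a trivial forecast that always predicts the calibration-period mean:

$$\mathrm{RE} = 1 - \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x}_c)^2}$$

where the $x_i$ are the observed values in the verification period, the $\hat{x}_i$ are the reconstructed values, and $\bar{x}_c$ is the calibration-period mean. A value of 1 indicates a perfect reconstruction, a value near 0 indicates performance no better than guessing the calibration mean, and negative values indicate worse-than-trivial performance.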
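And here is the toy simulation promised above: a rough sketch, in the spirit of the MM05 demonstration but much simplified, of how short-centering tends to pull hockey stick-like shapes out of pure red noise. The panel sizes, the AR(1) noise model, and the "hockey-stick index" used to summarize the shape are simplifications of mine, so the numbers should not be read as a reproduction of MM05.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_series, calib, n_sims = 581, 70, 79, 100   # illustrative sizes

def red_noise_panel(phi=0.9):
    """A panel of AR(1) ("red noise") proxy series containing no real signal."""
    x = np.zeros((n_years, n_series))
    eps = rng.normal(size=(n_years, n_series))
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + eps[t]
    return x

def pc1(panel, short_center):
    """First principal component, with either full centering or
    calibration-period-only ("short") centering and standardization."""
    ref = panel[-calib:] if short_center else panel
    centered = (panel - ref.mean(axis=0)) / ref.std(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def hockey_stick_index(series):
    """How strongly the calibration-period mean departs from the long-run mean."""
    return abs(series[-calib:].mean() - series.mean()) / series.std()

for short in (False, True):
    avg_index = np.mean([hockey_stick_index(pc1(red_noise_panel(), short))
                         for _ in range(n_sims)])
    print(f"short centering = {short}: average hockey-stick index = {avg_index:.2f}")
```

The expected pattern is that the index is noticeably larger with short centering: the leading component acquires a "blade" in the calibration period even though the inputs are pure noise, which is the qualitative effect McIntyre pointed to.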
There has been a lengthy debate on the subject, plus two external inquiries and reports on the debate: the NAS Panel Report headed by Gerry North, and the Wegman Report headed by Edward Wegman. Both of them agreed with the statistical criticisms made by McIntyre, but the NAS report did not make any broader comments on what this says about the discipline or the general hockey stick hypothesis, while the Wegman report made more explicit criticism.
The Wegman Report made the insularity critique in some detail:
In general, we found MBH98 and MBH99 to be somewhat obscure and incomplete and the criticisms of MM03/05a/05b to be valid and compelling. We also comment that they were attempting to draw attention to the discrepancies in MBH98 and MBH99, and not to do paleoclimatic temperature reconstruction. Normally, one would try to select a calibration dataset that is representative of the entire dataset. The 1902-1995 data is not fully appropriate for calibration and leads to a misuse in principal component analysis. However, the reasons for setting 1902-1995 as the calibration point presented in the narrative of MBH98 sounds reasonable, and the error may be easily overlooked by someone not trained in statistical methodology. We note that there is no evidence that Dr. Mann or any of the other authors in paleoclimatology studies have had significant interactions with mainstream statisticians. In our further exploration of the social network of authorships in temperature reconstruction, we found that at least 43 authors have direct ties to Dr. Mann by virtue of coauthored papers with him. Our findings from this analysis suggest that authors in the area of paleoclimate studies are closely connected and thus ‘independent studies’ may not be as independent as they might appear on the surface. This committee does not believe that web logs are an appropriate forum for the scientific debate on this issue.
It is important to note the isolation of the paleoclimate community; even though they rely heavily on statistical methods they do not seem to be interacting with the statistical community. Additionally, we judge that the sharing of research materials, data and results was haphazardly and grudgingly done. In this case we judge that there was too much reliance on peer review, which was not necessarily independent. Moreover, the work has been sufficiently politicized that this community can hardly reassess their public positions without losing credibility. Overall, our committee believes that Mann’s assessments that the decade of the 1990s was the hottest decade of the millennium and that 1998 was the hottest year of the millennium cannot be supported by his analysis.
McIntyre has a lengthy blog post summarizing what he sees as the main parts of the NAS Panel Report, the Wegman Report, and other statements made by statisticians critical of MBH98.
Critique #3: Inadequate use of software engineering, project management, and coding documentation and testing principles
In the aftermath of Climategate, most public attention was drawn to the content of the emails. But apart from the emails, data and code was also leaked, and this gave the world an inside view of the code that's used to simulate the climate. A number of criticisms of the coding practice emerged.
Chicago Boyz had a lengthy post titled Scientists are not Software Engineers that noted the sloppiness in the code and some of its implications. The post was also quick to point out that poor-quality code is not unique to climate science: it is a general problem with large-scale projects that grow out of small-scale academic research, beyond what the coders originally intended, with no systematic effort made to refactor the code. (If you have thoughts on the general prevalence of good software engineering practices in code for academic research, feel free to share them by answering my Quora question here, and if you have insights on climate science code in particular, answer my Quora question here.) Below are some excerpts from the post:
No, the real shocking revelation lies in the computer code and data that were dumped along with the emails. Arguably, these are the most important computer programs in the world. These programs generate the data that is used to create the climate models which purport to show an inevitable catastrophic warming caused by human activity. It is on the basis of these programs that we are supposed to massively reengineer the entire planetary economy and technology base.
The dumped files revealed that those critical programs are complete and utter train wrecks.
[...]
The design, production and maintenance of large pieces of software require project management skills greater than those required for large material construction projects. Computer programs are the most complicated pieces of technology ever created. By several orders of magnitude they have more “parts” and more interactions between those parts than any other technology.
Software engineers and software project managers have created procedures for managing that complexity. It begins with seemingly trivial things like style guides that regulate what names programmers can give to attributes of software and the associated datafiles. Then you have version control in which every change to the software is recorded in a database. Programmers have to document absolutely everything they do. Before they write code, there is extensive planning by many people. After the code is written comes the dreaded code review in which other programmers and managers go over the code line by line and look for faults. After the code reaches its semi-complete form, it is handed over to Quality Assurance which is staffed by drooling, befanged, malicious sociopaths who live for nothing more than to take a programmer’s greatest, most elegant code and rip it apart and possibly sexually violate it. (Yes, I’m still bitter.)
Institutions pay for all this oversight and double-checking and programmers tolerate it because it is impossible to create a large, reliable and accurate piece of software without such procedures firmly in place. Software is just too complex to wing it.
Clearly, nothing like these established procedures was used at CRU. Indeed, the code seems to have been written overwhelmingly by just two people (one at a time) over the past 30 years. Neither of these individuals was a formally trained programmer and there appears to have been no project planning or even formal documentation. Indeed, the comments of the second programmer, the hapless “Harry”, as he struggled to understand the work of his predecessor are now being read as a kind of programmer’s Icelandic saga describing a death march through an inexplicable maze of ineptitude and boobytraps.
[...]
A lot of the CRU code is clearly composed of hacks. Hacks are informal, off-the-cuff solutions that programmers think up on the spur of the moment to fix some little problem. Sometimes they are so elegant as to be awe inspiring and they enter programming lore. More often, however, they are crude, sloppy and dangerously unreliable. Programmers usually use hacks as a temporary quick solution to a bottleneck problem. The intention is always to come back later and replace the hack with a more well-thought-out and reliable solution, but with no formal project management and time constraints it’s easy to forget to do so. After a time, more code evolves that depends on the existence of the hack, so replacing it becomes a much bigger task than just replacing the initial hack would have been.
(One hack in the CRU software will no doubt become famous. The programmer needed to calculate the distance and overlapping effect between weather monitoring stations. The non-hack way to do so would be to break out the trigonometry and write a planned piece of code to calculate the spatial relationships. Instead, the CRU programmer noticed that the visualization software that displayed the program’s results already plotted the station’s locations so he sampled individual pixels on the screen and used the color of the pixels between the stations to determine their location and overlap! This is a fragile hack because if the visualization changes the colors it uses, the components that depend on the hack will fail silently.)
For some choice comments excerpted from a code file, see here.
Critique #4: Practices of publication of data, metadata, and code (that had gained traction in other disciplines)
When McIntyre wanted to replicate MBH98, he emailed Mann asking for his data and code. Mann, though initially cooperative, soon started trying to fend McIntyre off. Part of this was because he thought McIntyre was out to find something wrong with his work (a well-grounded suspicion). But part of it was also that his data and code were a mess: he didn't maintain them in a state he would have been comfortable sharing with anybody other than an already sympathetic academic. And, more importantly, as Mann's colleague Stephen Schneider noted, nobody asked for the code and underlying data during peer review. Most journals at the time did not require authors to submit or archive their code and data upon submission or acceptance of their papers. This also closely relates to Critique #3: a requirement or expectation that one's data and code will be published along with one's paper might make people more careful to follow good coding practices and avoid using various "tricks" and "hacks" in their code.
Here's how Andrew Montford puts it in The Hockey Stick Illusion:
The Hockey Stick affair is not the first scandal in which important scientific papers underpinning government policy positions have been found to be non-replicable – McCullough and McKitrick review a litany of sorry cases from several different fields – but it does underline the need for a more solid basis on which political decision-making should be based. That basis is replication. Centuries of scientific endeavour have shown that truth emerges only from repeated experimentation and falsification of theories, a process that only begins after publication and can continue for months or years or decades thereafter. Only through actually reproducing the findings of a scientific paper can other researchers be certain that those findings are correct.

In the early history of European science, publication of scientific findings in a journal was usually adequate to allow other researchers to replicate them. However, as science has advanced, the techniques used have become steadily more complicated and consequently more difficult to explain. The advent of computers has allowed scientists to add further layers of complexity to their work and to handle much larger datasets, to the extent that a journal article can now, in most cases, no longer be considered a definitive record of a scientific result. There is simply insufficient space in the pages of a print journal to explain what exactly has been done. This has produced a rather profound change in the purpose of a scientific paper. As geophysicist Jon Claerbout puts it, in a world where powerful computers and vast datasets dominate scientific research, the paper ‘is not the scholarship itself, it is merely advertising of the scholarship’. The actual scholarship is the data and code used to generate the figures presented in the paper and which underpin its claims to uniqueness.

In passing we should note the implications of Claerbout’s observations for the assessment for our conclusions in the last section: by using only peer review to assess the climate science literature, the policymaking community is implicitly expecting that a read-through of a partial account of the research performed will be sufficient to identify any errors or other problems with the paper. This is simply not credible.

With a full explanation of methodology now often not possible from the text of a paper, replication can usually only be performed if the data and code are available. This is a major change from a hundred years ago, but in the twenty-first century it should be a trivial problem to address. In some specialisms it is just that. We have seen, however, how almost every attempt to obtain data from climatologists is met by a wall of evasion and obfuscation, with journals and funding bodies either unable or unwilling to assist. This is, of course, unethical and unacceptable, particularly for publicly funded scientists. The public has paid for nearly all of this data to be collated and has a right to see it distributed and reused. As the treatment of the Loehle paper shows, for scientists to open themselves up to criticism by allowing open review and full data access is a profoundly uncomfortable process, but the public is not paying scientists to have comfortable lives; they are paying for rapid advances in science. If data is available, doubts over exactly where the researcher has started from fall away. If computer code is made public too, then the task of replication becomes simpler still and all doubts about the methodology are removed.

The debate moves on from foolish and long-winded arguments about what was done (we still have no idea exactly how Mann calculated his confidence intervals) onto the real scientific meat of whether what was done was correct. As we look back over McIntyre’s work on the Hockey Stick, we see that much of his time was wasted on trying to uncover from the obscure wording of Mann’s papers exactly what procedures had been used. Again, we can only state that this is entirely unacceptable for publicly funded science and is unforgiveable in an area of such enormous policy importance.

As well as helping scientists to find errors more quickly, replication has other benefits that are not insignificant. David Goodstein of the California Institute of Technology has commented that the possibility that someone will try to replicate a piece of work is a powerful disincentive to cheating – in other words, it can help to prevent scientific fraud. Goodstein also notes that, in reality, very few scientific papers are ever subject to an attempt to replicate them. It is clear from Stephen Schneider’s surprise when asked to obtain the data behind one of Mann’s papers that this criticism extends into the field of climatology. In a world where pressure from funding agencies and the demands of university careers mean that academics have to publish or perish, precious few resources are free to replicate the work of others. In years gone by, some of the time of PhD students might have been devoted to replicating the work of rival labs, but few students would accept such a menial task in the modern world: they have their own publication records to worry about. It is unforgiveable, therefore, that in paleoclimate circles, the few attempts that have been made at replication have been blocked by all of the parties in a position to do something about it.

Medical science is far ahead of the physical sciences in the area of replication. Doug Altman, of Cancer Research UK’s Medical Statistics group, has commented that archiving of data should be mandatory and that a failure to retain data should be treated as research misconduct. The introduction of this kind of regime to climatology could have nothing but a salutary effect on its rather tarnished reputation. Other subject areas, however, have found simpler and less confrontational ways to deal with the problem. In areas such as econometrics, which have long suffered from politicisation and fraud, several journals have adopted clear and rigorous policies on archiving of data. At publications such as the American Economic Review, Econometrica and the Journal of Money, Credit and Banking, a manuscript that is submitted for publication will simply not be accepted unless data and fully functional code are available. In other words, if the data and code are not public then the journals will not even consider the article for publication, except in very rare circumstances. This is simple, fair and transparent and works without any dissent. It also avoids any rancorous disagreements between journal and author after the event. Physical science journals are, by and large, far behind the econometricians on this score. While most have adopted one pious policy or another, giving the appearance of transparency on data and code, as we have seen in the unfolding of this story, there has been a near-complete failure to enforce these rules.

This failure simply stores up potential problems for the editors: if an author refuses to release his data, the journal is left with an enforcement problem from which it is very difficult to extricate themselves. Their sole potential sanction is to withdraw the paper, but this then merely opens them up to the possibility of expensive lawsuits. It is hardly surprising that in practice such drastic steps are never taken. The failure of climatology journals to enact strict policies or enforce weaker ones represents a serious failure in the system of assurance that taxpayer-funded science is rigorous and reliable. Funding bodies claim that they rely on journals to ensure data availability. Journals want a quiet life and will not face down the academics who are their lifeblood. Will Nature now go back to Mann and threaten to withdraw his paper if he doesn’t produce the code for his confidence interval calculations? It is unlikely in the extreme. Until politicians and journals enforce the sharing of data, the public can gain little assurance that there is any real need for the financial sacrifices they are being asked to accept.

Taking steps to assist the process of replication will do much to improve the conduct of climatology and to ensure that its findings are solidly based, but in the case of papers of pivotal importance politicians must also go further. Where a paper like the Hockey Stick appears to be central to a set of policy demands or to the shaping of public opinion, it is not credible for policymakers to stand back and wait for the scientific community to test the veracity of the findings over the years following publication. Replication and falsification are of little use if they happen after policy decisions have been made. The next lesson of the Hockey Stick affair is that if governments are truly to have assurance that climate science is a sound basis for decision-making, they will have to set up a formal process for replicating key papers, one in which the oversight role is performed by scientists who are genuinely independent and who have no financial interest in the outcome.
Montford, Andrew (2011-06-06). The Hockey Stick Illusion (pp. 379-383). Stacey Arts. Kindle Edition.
[QUESTION]: What are your views on climate change, and how did you form them?
Note: Please see this post of mine for more on the project, my sources, and potential sources for bias.
I have written a couple of blog posts on my understanding of climate forecasting, climate change, and the Anthropogenic Global Warming (AGW) hypothesis (here and here). I also laid down the sources I was using to inform myself here.
I think one question that a number of readers may have had is: given my lack of knowledge (and unwillingness to undertake extensive study) of the subject, why am I investigating it at all, rather than relying on the expert consensus as documented by the IPCC, which, even if we're not sure it is correct, is still the best bet humanity has for getting things right? I intend to elaborate in a future post on my reasons for taking a closer look at the matter while still refraining from making the study of atmospheric science a full-time goal.
Right now, I'm curious to hear how you formed your views on climate change. In particular, I'm interested in answers to questions such as the ones below (not necessarily answers to all of them, and not necessarily restricted to these questions).
- What are your current beliefs on climate change? Specifically, would you defer to the view that greenhouse gas forcing is the main source of long-term climate change? How long-term? Would you defer to the IPCC range for climate sensitivity estimates?
- What were your beliefs on climate change when you first came across the subject, and how did your views evolve (if at all) on further reading (if you did any)? (Obviously, your initial views wouldn't have included beliefs about terms like "greenhouse gas forcing" or "climate sensitivity").
- What are some surprising things you learned when reading up about climate change that led you to question your beliefs (regardless of whether you changed them)? For instance, perhaps reading about Climategate caused you to critically examine your deference to expert consensus on the issue, but you eventually concluded that the expert consensus was still right.
- If you read my recent posts linked above, did the posts contain information that was new to you? Did any of this information surprise you? Do you think it's valuable to carry out this sort of exercise in order to better understand the climate change debate?
Carbon dioxide, climate sensitivity, feedbacks, and the historical record: a cursory examination of the Anthropogenic Global Warming (AGW) hypothesis
Note: In this blog post, I reference a number of blog posts and academic papers. Two caveats to these references: (a) I often reference them for a specific graph or calculation, and in many cases I've not even examined the rest of the post or paper, while in other cases I've examined the rest and might even consider it wrong, (b) even for the parts I do reference, I'm not claiming they are correct, just that they provide what seems like a reasonable example of an argument in that reference class.
Note 2: Please see this post of mine for more on the project, my sources, and potential sources for bias.
In a previous post, I attempted simple time series forecasting for temperature from the outside view, i.e., as a complete non-expert would. I introduced carbon dioxide concentrations as an explanatory variable near the end of the post, but did not consider in detail the mechanisms through which carbon dioxide concentrations affect temperature. In this post, I switch to what Eliezer Yudkowsky has called the weak inside view. That's something like the inside view, but without knowledge of all the relevant details. Obviously, I'm somewhat constrained here: I can't take the full inside view because I don't know enough about the atmospheric system (partly because the state of human knowledge about the atmospheric system is incomplete, and partly because I know only a very minuscule fraction even of that small amount of human knowledge). But I think that the weak inside view also offers an alternate perspective to the inside view and is valuable in its own right.
One of the reasons I felt the need to switch from the outside view is that the issue is sufficiently complex, but at the same time the component phenomena are sufficiently well-enumerated, that a weak inside view can help. My initial framing of the issue was in terms of separating the roles of theory and evidence in belief in anthropogenic global warming. But a better weak inside view led me to the conclusion that most of the debate didn't center around the greenhouse effect or the level of direct radiative forcing at all; rather, it focused on the magnitude of positive feedback to greenhouse gas forcing (and therefore, the value of climate sensitivity) and the attribution of recent warming to greenhouse gas forcing versus other phenomena (such as the Pacific Decadal Oscillation and variation in solar activity). Although the theory-versus-evidence framing is still illuminating, I felt that a serious exploration of the issue would have to take at least a cursory look at the leading competing hypotheses.
My approximate takeaway
- Overall, I feel that there is considerable uncertainty about the level of positive feedback and therefore about climate sensitivity (the magnitude of temperature increase that would result from a doubling of atmospheric carbon dioxide). The estimates supported by skeptics fall at the low end of the range of uncertainty, and the stories they tell are all quite plausible and consistent with the science. But the IPCC estimate range already includes (or just barely misses) the skeptic range (1.5-4.5 °C for the IPCC versus 1.3-2 °C for the main skeptic blogs). While the stories put forth by skeptics are consistent with the science, other stories, including stories of substantially larger warming than the median estimate put forth by the IPCC, seem consistent as well. I don't see strong evidence that the median estimate of the IPCC (3 °C) is wrong (or evidence that it's right).
- I do think that the IPCC consensus estimates underestimate the probability of lower warming, i.e., the models have too narrow a range, at least on the low-warming side. I don't have sufficient knowledge of whether they are underestimating the probability of high climate sensitivities as well. That could well be the case.
- The political, institutional and bureaucratic incentives and constraints of the players involved in the debate did inform my views, but since my overall conclusion is so fuzzy anyway, it probably didn't affect my bottom line. Again, for simplicity, I avoid explicit discussion of these in the post. I might discuss them in a later post.
- I think that temperature trends in the coming 10-15 years will allow us to improve our estimates of climate sensitivity considerably. If warming continues to be as slow over the next 15 years as it has been over the past 15 years, I would incline to the low climate sensitivity estimates put forth by skeptics. If the warming trend returns to the 1978-1998 rate, then I would incline to medium or high climate sensitivity estimates. It would be nice to operationalize this and come up with statements like "If we see less than this much warming over the next few years, then I'll reduce my confidence in the model by this much" but I don't think my mastery of the numbers is good enough to make that sort of statement.
Okay, now on to the stuff!
An overview of the AGW hypothesis
The Anthropogenic Global Warming (AGW) hypothesis can be broken down into three simple steps:
- Human activity, specifically emissions of greenhouse gases (particularly carbon dioxide), is responsible for increasing the concentration of greenhouse gases, specifically carbon dioxide, in the atmosphere.
- The increased concentration of carbon dioxide causes the earth to trap more solar radiation, and thereby causes the earth to become warmer than it otherwise would (aka the greenhouse effect).
- Over the decadal to centennial timescale, temperature exhibits positive feedbacks, i.e., slight increases in temperature beget further increases in temperature. Two examples of such positive feedbacks are the water vapor feedback and the ice-albedo feedback (both described later in the post). As a result, the actual level of warming would be more than predicted directly from the increased trapping of radiation.
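To put rough numbers on steps 2 and 3, here is a back-of-the-envelope sketch that uses the standard logarithmic approximation for carbon dioxide forcing (about 5.35 ln(C/C0) W/m^2) and treats climate sensitivity as the equilibrium warming per doubling of carbon dioxide. The sensitivity values in the loop are just illustrative points spanning the disputed range, not estimates of mine.

```python
import math

def co2_forcing(c_ppm, c0_ppm=280.0):
    """Approximate radiative forcing (W/m^2) from raising CO2 from c0 to c."""
    return 5.35 * math.log(c_ppm / c0_ppm)

def equilibrium_warming(c_ppm, sensitivity_per_doubling, c0_ppm=280.0):
    """Equilibrium warming (deg C) implied by a given climate sensitivity."""
    return sensitivity_per_doubling * math.log2(c_ppm / c0_ppm)

print(f"Forcing from doubling CO2 (280 -> 560 ppm): {co2_forcing(560):.1f} W/m^2")
for s in (1.5, 3.0, 4.5):   # illustrative low / central / high sensitivities
    print(f"  sensitivity {s} C per doubling -> "
          f"{equilibrium_warming(560, s):.1f} C of eventual warming")
```

The debate over feedbacks is, in effect, a debate over which sensitivity value to plug in; the forcing calculation in the first step is much less contested.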
There are other aspects commonly associated with the AGW hypothesis, such as the view that at the current margin, more warming increases the frequency of extreme-weather events. For simplicity, I will not discuss these in the post. Of course, for an evaluation of the impact of global warming on the environment or on society, an examination of this aspect would matter considerably.
Preliminary question: Does it make sense to talk of global temperature?
How do we measure whether the earth system is warming? In my previous post, I considered global mean surface temperatures, as measured through many different proxies. At the time, I wasn't concerned with the meaning of those temperatures, because the purpose was simply to use them in time series forecasting. Now that we are getting to the mechanisms involved, the significance of mean surface temperatures starts becoming more relevant.
A paper by Christopher Essex, Bjarne Andresen, and Ross McKitrick in the Journal of Non-Equilibrium Thermodynamics takes issue with the very idea of a global mean surface temperature. My first reaction to the paper was one of skepticism. Surely, it's not wrong to use the average to keep track of how temperatures are changing? It turned out that the paper covered most of my prima facie concerns. Specifically:
- I already agreed with the point made in the paper that the global mean surface temperature has no intrinsic physical meaning, whereas the average energy in the system might. But I had thought that, ceteris paribus, changes in either reflect changes in the other. The paper made some arguments against that view (specifically, noting that pressure changes also matter and were often of comparable magnitude to, or larger than, the temperature changes). I don't think I understand the science well enough to offer clear judgment on this point.
- The paper argued that weather phenomena are driven by temperature gradients more than by absolute temperatures, and changes in the mean temperature are a poor way of tracking temperature gradients. This is something I hadn't thought about explicitly, though I do expect that temperature levels have some effect on the type of temperature gradient phenomena we might see. But it's a point worth noting that focusing on the mean temperature may be a poor way of thinking about the actual weather phenomena at hand.
- The paper pointed out that there are many averaging choices other than the simple mean, each of which has a slightly different justification, and that the same data could show a warming or cooling trend depending on the choice of averaging process used. I agreed with this, but I didn't think it affected the real-world temperature record.
- However, the authors used real-world temperature data and two actual choices of means to show how they could produce opposite trends. My main concern is that the two choices of means the authors chose may have been cherry-picked (one of them used negative exponents).
Overall, the paper raises some interesting points, but I'm not convinced. I'm still mulling over it, but in the meantime, I will operate within the paradigm where global mean surface temperatures are a meaningful indicator of how warm the world is. In doing so, I am deferring to both conventional wisdom and my own crude prior intuition that standardized averages carry some sort of value.
An overview of #1: Are carbon dioxide concentrations increasing, and is human activity responsible?
This thesis does not seem controversial. Measured carbon dioxide concentrations have been increasing according to a wide range of measurements, the most famous and reliable of which is the Keeling curve, based on continuous measurements made since 1958 at Mauna Loa, Hawaii:
[Figure: the Keeling curve of atmospheric carbon dioxide concentrations]
There are also ways of attempting to reconstruct historic carbon dioxide levels in the atmosphere using proxies, and the general view is that carbon dioxide levels started rising around the time of the Industrial Revolution, and that the rate of change was unprecedented. In a blog post attempting to compute equilibrium climate sensitivity, Jeff L. finds that the 1832-1978 Law Dome dataset does a good job of matching atmospheric carbon dioxide concentration values with the Mauna Loa dataset for the period of overlap (1958-1978), so he splices the two datasets for his analysis (note: commenters on the post pointed out many problems with it, and while I don't know enough to evaluate it myself, my limited knowledge suggests that the criticisms are spot on; however, I'm using the post just for the carbon dioxide graph):
[Figure: spliced Law Dome and Mauna Loa carbon dioxide concentration series]
Overall, the story checks out at every level:
- Prior to the advent of fossil fuels, the main source of carbon dioxide emissions into the atmosphere was the oxidation of food. But this food had to be produced through processes that consumed carbon dioxide in roughly the same amounts (i.e., photosynthesis). So the level of carbon dioxide was regulated in that fashion, and remained stable.
- With the advent of fossil fuels, "food" that had been laid down long ago, over millions of years, was being oxidized and released to the atmosphere within a short span of years. Thus, carbon dioxide was being released to the atmosphere faster than natural processes could absorb it back.
- Accounting for the changes in carbon dioxide concentrations shows that carbon dioxide concentrations have risen by a level about half of what emissions are pumping into the atmosphere. This is consistent with the idea that natural sinks (such as plants and the ocean) are still siphoning away some of the excess carbon dioxide, but not all of it.
As far as I understand, these facts are not in much dispute, though there is some uncertainty regarding the timescale over which the excess carbon dioxide will eventually be relinquished by the atmosphere. Could it be centuries or millennia? Either way, it probably doesn't affect decadal predictions.
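As a rough arithmetic check on the "about half" claim, here is a minimal sketch; the emissions figure, the observed rise, and the conversion factor of roughly 2.1 GtC per ppm are approximate assumptions used only for illustration.

```python
# Rough airborne-fraction check with illustrative, approximate numbers.
annual_emissions_gtc = 9.0        # assumed recent global fossil-fuel emissions, gigatonnes of carbon per year
gtc_per_ppm = 2.1                 # approximate gigatonnes of carbon corresponding to 1 ppm of atmospheric CO2
observed_rise_ppm_per_year = 2.0  # roughly the recent rate of increase in the Keeling curve

emitted_ppm_equivalent = annual_emissions_gtc / gtc_per_ppm   # about 4.3 ppm/year worth of emissions
airborne_fraction = observed_rise_ppm_per_year / emitted_ppm_equivalent
print(round(airborne_fraction, 2))  # roughly 0.5, i.e., about half of emissions stay in the atmosphere
```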
Here's what Skeptical Science (which, despite its name, is a website devoted to criticizing global warming skepticism) says:
There are many lines of evidence which clearly show that the atmospheric CO2 increase is caused by humans. The clearest of these is simple accounting - humans are emitting CO2 at a rate twice as fast as the atmospheric increase (natural sinks are absorbing the other half). There is no question whatsoever that the CO2 increase is human-caused. This is settled science.
Global warming skeptic Dr. Roy Spencer describes his agreement with the general consensus as follows:
8 ) Is Atmospheric CO2 Increasing? Yes, and most strongly in the last 50 years…which is why “most” climate researchers think the CO2 rise is the cause of the warming. Our site measurements of CO2 increase from around the world are possibly the most accurate long-term, climate-related, measurements in existence.
9) Are Humans Responsible for the CO2 Rise? While there are short-term (year-to-year) fluctuations in the atmospheric CO2 concentration due to natural causes, especially El Nino and La Nina, I currently believe that most of the long-term increase is probably due to our use of fossil fuels. But from what I can tell, the supposed “proof” of humans being the source of increasing CO2 — a change in the atmospheric concentration of the carbon isotope C13 — would also be consistent with a natural, biological source. The current atmospheric CO2 level is about 390 parts per million by volume, up from a pre-industrial level estimated to be around 270 ppm…maybe less. CO2 levels can be much higher in cities, and in buildings with people in them.
10) But Aren’t Natural CO2 Emissions About 20 Times the Human Emissions? Yes, but nature is believed to absorb CO2 at about the same rate it is produced. You can think of the reservoir of atmospheric CO2 as being like a giant container of water, with nature pumping in a steady stream into the bottom of the container (atmosphere) in some places, sucking out about the same amount in other places, and then humans causing a steady drip-drip-drip into the container. Significantly, about 50% of what we produce is sucked out of the atmosphere by nature, mostly through photosynthesis.
Quantifying the responsiveness of temperature to carbon dioxide concentrations: equilibrium climate sensitivity and transient climate response
- In a simple model involving the sun, the earth, and an atmosphere with some concentration of carbon dioxide, the equilibrium temperature attained (measured on the Kelvin scale) is proportional to the logarithm of the concentration of carbon dioxide (note that there are other greenhouse gases for which the dependence has a more complicated functional form). Therefore, the additive change in equilibrium temperature is proportional to the logarithm of the multiplicative change in the concentration of carbon dioxide. For instance, here's Wikipedia.
- It is reasonable to extrapolate from this that, even in the presence of feedbacks, the relationship between carbon dioxide concentration and temperature remains logarithmic. The coefficient just gets appropriately scaled. This makes sense because feedback mechanisms generally operate proportionally.
- Thus, the following question makes sense: if we double atmospheric carbon dioxide concentrations, what is the additive effect on temperature? The answer to that question is sometimes termed the equilibrium climate sensitivity (ECS).
- Note that the warming needed to reach the equilibrium climate sensitivity doesn't happen immediately, so even after carbon dioxide concentrations double, it could take several decades for the temperature to warm to the new equilibrium. The term for the amount of warming at the time carbon dioxide concentrations double, in a scenario where they rise at 1% per year, is the transient climate response (TCR). This is usually over half the ECS but still well short of the full ECS (I'm currently too lazy to fish for more information on the relation between TCR and ECS).
In this post, when I talk of "climate sensitivity" I am by default referring to ECS. Note, however, that as a general rule, models that have higher ECS will also have higher TCR.
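To make the logarithmic relationship concrete, here is a minimal sketch in Python of how the implied equilibrium warming would be computed for a given sensitivity; the concentration values and the three sensitivity figures are illustrative assumptions, not outputs of any model discussed here.

```python
import math

def equilibrium_warming(c_new_ppm, c_old_ppm, ecs_per_doubling):
    """Equilibrium warming implied by a change in CO2 concentration, assuming
    warming is proportional to the logarithm of the concentration ratio."""
    doublings = math.log2(c_new_ppm / c_old_ppm)
    return ecs_per_doubling * doublings

# Illustrative: warming for a rise from a pre-industrial ~280 ppm to 400 ppm,
# under low, median, and high assumed sensitivities (degrees C per doubling).
for ecs in (1.5, 3.0, 4.5):
    print(ecs, round(equilibrium_warming(400, 280, ecs), 2))
```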
An overview of #2: the greenhouse effect
As far as I understand, the basic mechanics of the greenhouse effect are not in dispute, nor are the numerical estimates of how much warming there would be without feedbacks. On the American Geophysical Union blog, Dan Satterfield writes:
Climate sensitivity is an important and often poorly understood concept. Put simply, it is usually defined as the amount of global surface warming that will occur when atmospheric CO2 concentrations double. These estimates have proven remarkably stable over time, generally falling in the range of 1.5 to 4.5 degrees C per doubling of CO2.* Using its established terminology, IPCC in its Fourth Assessment Report slightly narrowed this range, arguing that climate sensitivity was “likely” between 2 C to 4.5 C, and that it was “very likely” more than 1.5 C.
The wide range of estimates of climate sensitivity is attributable to uncertainties about the magnitude of climate feedbacks (e.g., water vapor, clouds, and albedo). Those estimates also reflect uncertainties involving changes in temperature and forcing in the distant past. But based on the radiative properties, there is broad agreement that, all things being equal, a doubling of CO2 will yield a temperature increase of a bit more than 1 C if feedbacks are ignored.
Skeptical Science says:
Climate sensitivity describes how sensitive the global climate is to a change in the amount of energy reaching the Earth's surface and lower atmosphere (a.k.a. a radiative forcing). For example, we know that if the amount of carbon dioxide (CO2) in the Earth's atmosphere doubles from the pre-industrial level of 280 parts per million by volume (ppmv) to 560 ppmv, this will cause an energy imbalance by trapping more outgoing thermal radiation in the atmosphere, enough to directly warm the surface approximately 1.2°C. However, this doesn't account for feedbacks, for example ice melting and making the planet less reflective, and the warmer atmosphere holding more water vapor (another greenhouse gas).
An overview of #3: feedbacks
This is where things get most interesting. Both empirically and theoretically, there is good reason to believe that over the decadal to centennial time scale, the climate system exhibits positive feedback to temperature changes. So if carbon dioxide levels double and cause a direct increase of about 1 C, the actual increase, accounting for positive feedbacks, would be more.
Three feedback mechanisms often mentioned are:
- Water vapor feedback (positive): When the temperature rises, this increases the amount of water vapor the atmosphere can hold, and therefore also increases the amount of water vapor the atmosphere does hold. Water vapor is a greenhouse gas, and therefore absorbs more heat, causing the temperature to rise further. This is a positive feedback loop because an increase in temperature facilitates a further increase in temperature.
- Ice-albedo feedback (positive): Cooling causes more water to freeze, increasing the fraction of the surface covered with ice. Ice reflects more incoming sunlight, resulting in less absorption of heat by the earth, causing the temperature to drop further. This is a positive feedback loop because a decrease in temperature facilitates a further decrease in temperature (and, symmetrically, an increase in temperature melts ice, reduces reflection, and facilitates a further increase in temperature).
- Cloud feedback (uncertain, generally believed to be positive over the decadal/centennial time scale): Changes in the temperature and water vapor level can result in changes in the amount and composition of the cloud cover. The cloud cover plays an important role in how much sunlight enters the atmospheric system and gets absorbed.
The levels of all three feedback mechanisms are uncertain. For water vapor feedback and ice-albedo feedback, the sign is clear, but the magnitude is unclear. Cloud cover feedback is uncertain in both sign and magnitude. Put together, climate scientists generally believe that feedbacks are positive, but are uncertain as to their magnitude.
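One standard way to make "positive but uncertain feedback" quantitative (this is a common textbook simplification, not something taken from the sources quoted above) is to write the total warming as the no-feedback warming divided by (1 - f), where f is the combined feedback factor. A minimal sketch, using the roughly 1.2 C no-feedback figure quoted below and some assumed feedback factors:

```python
def warming_with_feedback(no_feedback_warming, f):
    """Total warming under the simple amplification formula delta_T = delta_T0 / (1 - f).
    Positive f amplifies the warming; negative f dampens it; f must be below 1."""
    if f >= 1:
        raise ValueError("f >= 1 corresponds to a runaway feedback in this simple model")
    return no_feedback_warming / (1.0 - f)

# Illustrative feedback factors; with f = 0.6 the ~1.2 C direct warming becomes ~3 C.
for f in (-0.3, 0.0, 0.3, 0.6):
    print(f, round(warming_with_feedback(1.2, f), 2))
```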
Where do skeptics stand in relation to the IPCC here? As quoted above, the IPCC estimates climate sensitivity of 1.5 to 4.5 C, with a median estimate of about 3 C. The interval offered by skeptics is narrower and on the lower end, but even among skeptics, the view that feedback is net negative seems a minority view. For instance, browsing the climate sensitivity category on Watts Up With That?, a top climate skeptic blog, I found references to papers estimating climate sensitivity values of 1.3 C, 1.8 C, and 2 C.
UPDATE: I found a blog post by Pat Michaels here that suggests that recent published estimates (since 2010) of climate sensitivity have been around the 2 C median mark. The infographic is below. I don't know how reliable this data is (Michaels is a global warming skeptic who has received funding from fossil fuel industries, but this infographic doesn't seem that easy to fudge). Alternative sources are welcome.
[Infographic: recent published (post-2010) climate sensitivity estimates, via Pat Michaels]
Where do climate sensitivity estimates come from?
On the climate system side, the main source of difference in opinion on the amount of global warming that will unfold seems to be due to difference in beliefs about climate sensitivity. (There's another source of uncertainty, namely the level of emissions going into the future. We'll ignore this aspect in the post, though again, from the "what can/should we do about global warming" angle, that becomes quite relevant).
Broadly, there does not seem to be a single compelling theoretical argument for a particular climate sensitivity estimate. So the case for a particular value or range of climate sensitivity generally rests on what my friend Jonah Sinick has called many weak arguments. In principle, many weak arguments should work better in demonstrating facts about the climate system than one relatively strong argument. Of course, the arguments shouldn't be so weak that they basically collapse.
So what are the weak arguments in favor of a particular value or range of climate sensitivity, such as the middle of the IPCC range?
Here's what Skeptical Science says:
Some global warming 'skeptics' argue that the Earth's climate sensitivity is so low that a doubling of atmospheric CO2 will result in a surface temperature change on the order of 1°C or less, and that therefore global warming is nothing to worry about. However, values this low are inconsistent with numerous studies using a wide variety of methods, including (i) paleoclimate data, (ii) recent empirical data, and (iii) generally accepted climate models.
Data on sensitivity to greenhouse gas forcing in recent times is relatively limited (or more specifically, as I pointed out in my earlier post, the recent data alone paint a very inconclusive picture).
3(i): The use of data from alternate sources of radiative forcing
A lot of exercises that attempt to estimate climate sensitivity do so by looking at other sources of forcing, such as volcanic eruptions (that produce a cooling effect) and variations in solar activity. With the key assumption that the magnitude of the feedback does not depend on the source of forcing, estimates for the size of the feedback in these cases can be used to estimate the climate sensitivity. This assumption is the mainstream view. Skeptical Science says:
In other words, if you argue that the Earth has a low climate sensitivity to CO2, you are also arguing for a low climate sensitivity to other influences such as solar irradiance, orbital changes, and volcanic emissions. In fact, as shown in Figure 1, the climate is less sensitive to changes in solar activity than greenhouse gases. Thus when arguing for low climate sensitivity, it becomes difficult to explain past climate changes. For example, between glacial and interglacial periods, the planet's average temperature changes on the order of 6°C (more like 8-10°C in the Antarctic). If the climate sensitivity is low, for example due to increasing low-lying cloud cover reflecting more sunlight as a response to global warming, then how can these large past climate changes be explained?
In particular, one of the lines of evidence for current consensus values of climate sensitivity is historical data on the level of warming or cooling in response to forcings due to volcanic eruptions or variations in solar activity.
The point about the nature of feedbacks being independent of whether the radiative forcing is due to solar activity or carbon dioxide concentrations or volcanic eruptions has been disputed by some. See, for instance, here and here. I'm not qualified to judge the validity of these objections.
3(ii): Direct estimation of greenhouse gas forcing from the recent temperature record, and alleged confounding by other factors
The simplest model would presume that the trend of rising temperatures over 1975-1998 can be attributed primarily to greenhouse gas forcing. If we attribute it all to greenhouse gas forcing, we get fairly high estimates for climate sensitivity. If, on the other hand, we attribute it to a mix of greenhouse gas forcing and other factors (discussed below), we get climate sensitivities at the low end of the scale. In neither case is there a dispute over the existence of the greenhouse effect, or even over the existence of feedbacks. But there is dispute over how much of the already observed temperature rise can be attributed to greenhouse gas forcing.
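To see why attributing all of the 1975-1998 warming to greenhouse gas forcing yields fairly high sensitivity estimates, here is a back-of-envelope sketch; the warming and concentration figures are rounded assumptions, and the calculation ignores lags, aerosols, and all other forcings.

```python
import math

# Rounded, illustrative figures for 1975-1998 (treat as assumptions, not authoritative data).
observed_warming_c = 0.45                 # approximate surface warming over the period
co2_start_ppm, co2_end_ppm = 331.0, 367.0

doublings = math.log2(co2_end_ppm / co2_start_ppm)
implied_sensitivity = observed_warming_c / doublings
print(round(implied_sensitivity, 1))      # about 3 C per doubling if CO2 gets all of the credit
```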
The lack of a single compelling explanation for the recent pause (or slowdown) in global warming (i.e., the very slow rate of warming since about 1998) is the main Achilles heel of this approach. Note that 1998 was in itself an unusually warm year due to the El Nino, so the lack of warming for a few years after that was not surprising, but the lack of warming after this many years is a puzzle. Climate scientists often call it the problem of the "missing heat" (the global mean surface temperature being taken as a proxy index for heat, though the paper questioning global mean surface temperature raises questions about that connection). Fabius Maximus lists a number of possible reasons here. For most of these reasons, it seems to be the case that if the temperature fails to grow for another 10-15 years, climate sensitivity estimates would need serious downward revision.
3(ii) alternate theory (a): The oceans: deep oceans as sinks for the missing heat, the Pacific Decadal Oscillation (PDO), and Atlantic Multidecadal Oscillation (AMO)
The Pacific Decadal Oscillation has a positive phase, that leads to warming, and a negative phase, that leads to cooling. Historical data on the PDO isn't too great, but each phase (positive and negative) is believed to last about 20-30 years. Don Easterbrook identified the phase dates as follows: 1915-1945 and 1979-1998 for positive phases, and 1880-1915, 1945-1977, 1999-2014 for cooling phases. He also showed that the phases he had identified were consistent with the temperature trends: warming occurred during the positive phases and cooling occurred during the negative phases. Easterbrook doesn't seem to give much weight to the overall secular trend arising from greenhouse gas forcing, but it's easy to modify this to incorporate a stronger role for greenhouse gas forcing, as follows.
According to this theory, the observed temperature increase during the PDO's positive phase is a combination of a secular trend of rising temperature due to greenhouse gas forcing, and the increase in temperature due to the PDO being in positive phase. When the PDO entered negative phase, the greenhouse gas forcing continued, but the PDO negative phase was now acting in the opposite direction, resulting in relatively stable temperatures. If we use a time period where the PDO was in positive phase and do not control for the PDO, then we'll overestimate climate sensitivity. If we use a time period where the PDO was in negative phase and do not control for the PDO, then we'll underestimate climate sensitivity (or may even ignore the secular trend of warming completely because it is successfully masked by the phase of the PDO).
One of the big arguments in favor of the PDO hypothesis is that it does a better job of explaining the pause (or slowdown) in global warming. Models based purely on greenhouse gas forcing didn't predict the pause, but models based on the PDO did (though, of course, such models would need to make accurate predictions of the starting and ending years of the phases of the PDO, and I haven't been able to track down explicit predictions made when the PDO was in positive phase about when it would switch to negative phase).
The Atlantic Multidecadal Oscillation (AMO) is somewhat similar to the PDO in ways relevant to the above discussion, though probably also different. The upshot is that the phase of the PDO/AMO may be controlling the rate of growth of global mean surface temperatures.
One of the common problems pointed out with the PDO/AMO theory is that ocean currents only move heat around. They can't change the total heat in the system. So, how could they affect the global mean surface temperature? For instance, here is Skeptical Science's take on the PDO.
Kevin Trenberth (who, many years ago, wondered in emails, later leaked by Climategate, about where the missing heat was going) has postulated that what the PDO/AMO do is to move heat down into the deep oceans, where it doesn't show up in mean surface temperature measurements. The idea that the missing heat goes into the deep oceans was pointed out in a LessWrong comment as well as by an atmospheric science student in private correspondence. This is listed as (6) in Fabius Maximus' list. It has been elaborated on in a paper titled An apparent hiatus in global warming by Kevin Trenberth and John T. Fasullo. Maximus also links to Trenberth's article Has Global Warming Stalled? in The Conversation. Judith Curry blogged about another related paper co-authored by Trenberth here. [Note: My understanding of the papers co-authored by Trenberth may be quite inaccurate].
If heat is being transferred to the deep oceans due to the PDO/AMO, global warming will probably be back, with a vengeance, once the phase of the PDO/AMO changes. If the heat transfer is for reasons that aren't governed by these oscillations, then heat may keep sinking into the oceans for a very long time. The oceans certainly have the thermal capacity to absorb all the excess heat, but whether they will actually do so is unclear.
A somewhat different view of the PDO/AMO is described in a paper by Marcia Wyatt and Judith Curry. They call their view the stadium wave hypothesis. From what I can understand, the PDO and AMO are both manifestations of a stadium wave that takes a long time to propagate. I am not clear on the differences between the stadium wave hypothesis and Trenberth's deep ocean sink hypothesis as far as forecasts of future global mean surface temperatures are concerned.
Final note: Over the centennial time scale, the PDO-based model would predict that the temperature trend would be an additive superimposition of a PDO-based sinusoidal trend and a greenhouse gas forcing-based secular linear trend. At any given time, the bulk of the year-on-year change would be driven by the PDO phase, but over the centennial time scale, the role of carbon dioxide would dominate.
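Here is a minimal synthetic sketch of that "linear trend plus oscillation" picture; the trend, amplitude, period, and peak year are made-up parameters chosen only to illustrate how fitting a line over a single (positive) phase exaggerates the estimated trend.

```python
import numpy as np

years = np.arange(1880, 2015)
secular = 0.006 * (years - 1880)  # assumed greenhouse-driven trend: 0.6 C per century
# Assumed ~60-year PDO/AMO-like oscillation peaking around 1998.
oscillation = 0.12 * np.cos(2 * np.pi * (years - 1998) / 60.0)
temperature = secular + oscillation

def fitted_trend_per_century(start, end):
    mask = (years >= start) & (years <= end)
    slope = np.polyfit(years[mask], temperature[mask], 1)[0]
    return slope * 100

print(round(fitted_trend_per_century(1979, 1998), 2))  # positive phase: looks much steeper than 0.6 C/century
print(round(fitted_trend_per_century(1880, 2014), 2))  # full record: much closer to the underlying 0.6 C/century
```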
3(ii) alternate theory (b): variation in solar activity
Another theory, offered both in some mainstream quarters to explain the recent slowdown/pause in global warming and offered by some skeptics as an alternative to greenhouse gases as an explanation of global warming, is that variations in solar activity are driving some of the year-to-year variation in temperature. As we can see from NASA's sunspot cycle page, solar activity can be described as a combination of many cycles with different periods, the most notable of which is the 11-year cycle. But the heights of the peaks aren't the same for all cycles, and the most recent peaks have been lower than earlier ones (see also here and here). The reduced recent activity after high activity in the recent past has been attributed to one of the hypothesized longer solar cycles, namely the roughly 210-year Suess cycle (aka the de Vries cycle). Note that the magnitude and time length of these longer solar cycles remain speculative because we don't have enough data to be sure.
Overall, the sun might offer a weak explanation for the recent slowdown in warming (namely, the fact that the peak in 2014 was smaller than the peak around 2003) but otherwise, it does not fit temperature patterns well. Here is Skeptical Science on the sun:
Over the last 35 years the sun has shown a slight cooling trend. However global temperatures have been increasing. Since the sun and climate are going in opposite directions scientists conclude the sun cannot be the cause of recent global warming.
The only way to blame the sun for the current rise in temperatures is by cherry picking the data. This is done by showing only past periods when sun and climate move together and ignoring the last few decades when the two are moving in opposite directions.
3(iii): Climate models
The last of the three reasons Skeptical Science offers for taking the median IPCC climate sensitivity estimates seriously is that climate models predict that sort of sensitivity. I give very little weight to this reason, because the climate models have not done a good job of forecasting (see here). To the very limited extent that they have been able to forecast anything at all during the period of fast warming (1975-1998), a simple theory that "it's going to warm" could have done about as well.
I'm not claiming that climate models are of no potential use, just that they are not strong enough to provide additional evidence in favor of a hypothesis that is, in some sense, built into the assumptions of those models in the first place. If the models are validated against empirical data (through measurement of their forecast skill relative to simple persistence-based models or random walks with drift), then I'll accept them as additional evidence. At present, they are neither here nor there.
Piecing together the evidence
Overall, the case for the median IPCC estimate of 3 C seems reasonably strong as a median estimate, but the range of uncertainty is high. I believe that the IPCC confidence interval is too narrow, particularly at the low end (i.e., I would put a higher probability on zero feedback than the IPCC model does). I haven't investigated the arguments for estimates at the high end, so I'm not sure whether the probability of high sensitivities has been overestimated or underestimated.
The main reason for this assessment is the combined evidence from 3(i) and 3(ii). Though each is individually weak (because of the problems mentioned), in concert, they provide a reasonably compelling case.
While some of the evidence for 3(ii) will be collected naturally over the next few years, the case for 3(i) is less clear. How compelling are the arguments against the view that the level of feedback is independent of the source of forcing? And how reliable is the historical data that is used to estimate the level of feedback? If the evidence of 3(ii) weakens further, a closer examination of 3(i) would be warranted.
Finally, climate models aren't good enough right now, but they could well become better (I discussed the challenges of decadal forecasting in this post). If a climate model, with appropriate initialization, is able to make skilled forecasts for the next few years, I'd give a lot more weight to what it has to say about the next few decades. However, it's worth noting that the autocorrelation in climate makes the forecasting challenges for the near future different from those for the far future. So successful climate models aren't in my view a necessary condition for demonstrating a particular climate sensitivity, but they would be a powerful source of evidence if they did work.
Looking for feedback
Since I'm quite new to climate science and (largely, though not completely) new to statistical analysis, it's quite possible that I made some elementary errors above. Corrections would be appreciated.
It should be noted that when I say a particular work has problems, it is not a definitive statement that that work is false. Rather, it's simply a statement of my impression, based on a cursory analysis, that describes the amount of credibility I associate with that work. In many cases, I'm not qualified enough to offer a critique with high confidence.
[QUESTION]: LessWrong web traffic data?
I remember that LessWrong used to track its web traffic through SiteMeter, but it was removed in November 2013 due to issues. Historical data about LessWrong is still available through SiteMeter, but the data stretches back only 12 months. I'm not sure whether the data from October 2013 and earlier is reliable either. Still, it was something that anybody could access.
Cookie Checker shows that LessWrong is currently using Google Analytics to track traffic. Does anybody know who maintains the Google Analytics account, and whether the data can be shared? While I'm mainly interested in seeing the data for myself, others here might also be interested in it, so it might be nice if a public summary of visit and pageview counts could be shared.
Time series forecasting for global temperature: an outside view of climate forecasting
Note: In this blog post, I reference a number of blog posts and academic papers. Two caveats to these references: (a) I often reference them for a specific graph or calculation, and in many cases I've not even examined the rest of the post or paper, while in other cases I've examined the rest and might even consider it wrong, (b) even for the parts I do reference, I'm not claiming they are correct, just that they provide what seems like a reasonable example of an argument in that reference class.
Note 2: Please see this post of mine for more on the project, my sources, and potential sources for bias.
As part of a review of forecasting, I've been looking at weather and climate forecasting. I wrote one post on weather forecasting and another on the different time horizons for weather and climate forecasting. Now, I want to turn to long-range climate forecasting, for motivations described in this post of mine.
Climate forecasting is turning out to be a fairly tricky topic to look into, partly because of the inherent complexity of the task, and partly because of the politicization surrounding Anthropogenic Global Warming (AGW).
I decided to begin with a somewhat "outside view" approach: if you were simply given a time series of global temperatures, what sort of patterns would you see? What forecasts would you make for the next 100 years? The forecast can be judged against a no-change forecast, or against the forecasts put out by the widely used climate models.
Below is a chart of four temperature proxies since 1880, courtesy NASA:
[Figure: four global temperature proxy series since 1880, via NASA]
The Hadley Centre dataset goes back to 1850. Here it is (note that the centering of the temperature axis is slightly different, because we are taking means of slightly different sets of numbers, but we are in any case interested only in the trend, so that does not matter) (source):
[Figure: Hadley Centre global temperature series since 1850]
Eyeballing, there does seem to be a secular trend of increase in the temperature data. Perhaps the most naive way of estimating the rate of change is to compute (final temperature - initial temperature)/(time interval) as the annual rate of change. Using that method, we get a temperature increase of about 0.54 degrees Celsius per century.
But just using final and initial temperatures overweights those two values and ignores the data in the other temperature readings. A somewhat more sophisticated approach (albeit still a fairly crude one) is a linear regression model. I was wondering whether I should download the data and run a linear regression, but I found a picture of the regression online (source):
[Figure: linear regression fit to the temperature series]
Note that the regression line starts off a little lower than the actual temperature in 1850, and also ends a little lower than the actual temperature in the 2000s. The rate of growth seems even lower here (about 0.4 degrees Celsius per century). The reason the regression gives a lower rate than simply using initial and final temperatures is that the temperature growth since the 1970s has been well above trend, and those well-above-trend temperatures are given more weight if we just use the final temperature than if we fit a regression line.
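For readers who want to reproduce this kind of calculation, here is a minimal sketch of both the endpoint method and the least-squares fit. The series below is a synthetic stand-in, since I'm not distributing the data here; to get the actual numbers, substitute the annual values from the Hadley Centre dataset.

```python
import numpy as np

# Synthetic stand-in for an annual temperature anomaly series (degrees C);
# replace with the real annual values to reproduce the rates quoted in the text.
years = np.arange(1850, 2014)
anomalies = -0.3 + 0.004 * (years - 1850) + np.random.normal(0, 0.1, years.size)

# Naive endpoint method: (final - initial) / elapsed time, expressed per century.
endpoint_rate = (anomalies[-1] - anomalies[0]) / (years[-1] - years[0]) * 100

# Ordinary least-squares linear trend, also expressed per century.
slope, _intercept = np.polyfit(years, anomalies, 1)
ols_rate = slope * 100

print(round(endpoint_rate, 2), round(ols_rate, 2))
```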
Linear plus periodic?
Another plausible story that seems to emerge from eyeballing the data is that the temperature trend is the sum of an approximately linear trend and a periodic trend, given by something like a sine wave. I found one analysis of this sort by DocMartyn on Judith Curry's blog, and another in a paper by Syun Akasofu (note: there seem to be some problems with both analyses; I am linking to them mainly as simple examples of the rough nature of this sort of analysis, not as something to be taken very seriously). Note that both of these do more complicated things than look purely at temperature trends. While DocMartyn explicitly introduces carbon dioxide as the source of the linear-ish trend, Akasofu identifies "recovery from the Little Ice Age" as the source of the linear-ish trend and the Pacific Decadal Oscillation as the source of the sinusoidal trend (but as far as I can make out, one could use the same graph and argue that the linear trend is driven by carbon dioxide).
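As a rough illustration of what a "linear plus sinusoid" fit involves (this is not a reproduction of either DocMartyn's or Akasofu's analysis), one can fit the parameters with a nonlinear least-squares routine; the fixed 60-year period and the initial guesses are assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_plus_sine(year, slope, intercept, amplitude, phase_year):
    """Linear trend plus a sinusoidal component with an assumed fixed ~60-year period."""
    return slope * (year - 1900) + intercept + amplitude * np.sin(2 * np.pi * (year - phase_year) / 60.0)

# 'years' and 'anomalies' would be the observed annual temperature series;
# a synthetic stand-in is generated here so the sketch runs on its own.
years = np.arange(1850, 2014)
anomalies = (-0.3 + 0.004 * (years - 1850)
             + 0.1 * np.sin(2 * np.pi * (years - 1938) / 60.0)
             + np.random.normal(0, 0.08, years.size))

params, _covariance = curve_fit(linear_plus_sine, years, anomalies, p0=[0.005, 0.0, 0.1, 1940.0])
print(params)  # fitted slope (C/year), intercept (C), amplitude (C), phase (year)
```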
Here's DocMartyn's forecast:
[Figure: DocMartyn's temperature fit and forecast]
Here's Akasofu's picture:
[Figure: Akasofu's temperature fit and projection]
Autocorrelation and random walks
Simple linear regression is unsuitable for time series forecasting for variables that exhibit autocorrelation: the value in any given year is correlated with the value in the previous year, independent of any long-term trend. As Judith Curry explains here, autocorrelation can create an illusion of trends even when there aren't any. (This may seem a bit counterintuitive: if only temperature levels, and not temperature trends, exhibit the autocorrelation, i.e., if temperature is basically a random walk, then why should we see spurious trends? So read the whole post). Not only can spurious linear-looking trends appear, but so can spurious cyclical trends (see here).
Unfortunately, I don't have a good understanding of the statistical tools (such as ARIMA) that one would use to resolve such questions. I am aware of a few papers that have tried to demonstrate that, despite the appearance of a linear trend above, the temperature series is more consistent with a random walk model. See, for instance, this paper by Terence Mills and the literature it references, many of which seem to come to conclusions against a clear linear trend. Mills also published a paper in the Journal of Cosmology here that's ungated and seems to cover similar ground, but the Journal of Cosmology is not such a high-status journal, so the publication of the paper there should not be treated as giving it more authority than a blog post.
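A minimal sketch of the spurious-trend point: a pure random walk with no drift, fed into an ordinary least-squares trend test, will come out "statistically significant" far more often than the nominal 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
significant = 0
trials = 1000
for _ in range(trials):
    walk = np.cumsum(rng.normal(0, 0.1, 150))  # a 150-step random walk with no trend by construction
    steps = np.arange(walk.size)
    if stats.linregress(steps, walk).pvalue < 0.05:
        significant += 1

# Far more than 5% of trendless random walks look like they have a significant linear trend.
print(significant / trials)
```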
Linear increase is consistent with very simple assumptions about carbon dioxide concentrations and the anthropogenic global warming hypothesis
Here's a simple model that would lead to temperature increases being linear over time:
- The only secular trend in temperature occurs from radiative forcing due to a change in carbon dioxide concentration.
- The additive increase in temperature is proportional to the logarithm of the multiplicative increase in atmospheric carbon dioxide concentration (Wikipedia).
- About 50% of carbon dioxide emissions from burning fossil fuels is retained by the atmosphere. The magnitude of carbon dioxide emissions is proportional to world GDP, which is growing exponentially, so emissions are growing exponentially, and therefore, the total carbon dioxide concentration in the atmosphere is also growing exponentially.
Apply a logarithm to an exponential, and you get a linear trend line in temperature.
(As we'll see, while this looks nice on paper, actual carbon dioxide growth hasn't been exponential, and actual temperature growth has been pretty far from linear. But at least it offers some prima facie plausibility to the idea of fitting a straight line).
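The algebra behind this is worth spelling out: if the concentration grows as C(t) = C0 * exp(g*t) and warming is S * log2(C(t)/C0), then warming equals S * g * t / ln(2), which is exactly linear in t. A short numerical check, with assumed values of S, g, and C0:

```python
import numpy as np

S, g, c0 = 3.0, 0.005, 280.0  # assumed sensitivity (C per doubling), growth rate (per year), baseline ppm
t = np.arange(0, 200)         # years
warming = S * np.log2(c0 * np.exp(g * t) / c0)
print(np.allclose(warming, S * g * t / np.log(2)))  # True: the warming path is exactly linear in time
```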
Turning on the heat: the time series of carbon dioxide concentrations
So how have carbon dioxide concentrations been growing? Since 1958, the Mauna Loa observatory in Hawaii has been tracking atmospheric carbon dioxide concentrations. The plot of the concentrations is termed the Keeling curve. Here's what it looks like (source: Wikipedia):
[Figure: the Keeling curve of Mauna Loa atmospheric carbon dioxide concentrations, via Wikipedia]
The growth is sufficiently slow that the distinction between linear, quadratic, and exponential isn't visible to the naked eye, but if you look carefully, you'll see that growth from 1960 to 1990 was about 1 ppm/year, whereas growth from 1990 to 2010 was about 2 ppm/year. Unfortunately the Mauna Loa data go back only to 1958. But there are other data sources. In a blog post attempting to compute equilibrium climate sensitivity, Jeff L. finds that the 1832-1978 Law Dome dataset does a good job of matching atmospheric carbon dioxide concentration values with the Mauna Loa dataset for the period of overlap (1958-1978), so he splices the two datasets for his analysis (note: commenters on the post pointed out many problems with it, and while I don't know enough to evaluate it myself, my limited knowledge suggests that the criticisms are spot on; however, I'm using the post just for the carbon dioxide graph):
[Figure: spliced Law Dome and Mauna Loa carbon dioxide concentration series]
Note that it's fairly well established that carbon dioxide concentrations in the 18th century, and probably for a few centuries before that, were about 280 ppm. So even if the specifics of the Law Dome dataset aren't reliable, the broad shape of the curve should be similar. Notice that the growth from 1832 to around 1950 was fairly slow. In fact, even from 1900 to 1940, the fastest-growing part of that period, carbon dioxide concentrations grew by only 15 ppm in 40 years. From what I can judge, there seems to have been an abrupt shift around 1950, to a rate of about 1 ppm/year. A linear or exponential curve doesn't explain the shift. And as noted earlier, the rate of growth seems to have gone up a lot around 1990 again, to about 2 ppm/year. The reason for the shift around 1950 is probably post-World War II global economic growth, including industrialization in the newly independent colonies, and the reason for the shift around 1990 is probably the rapid take-off of economic growth in India, combined with the acceleration of economic growth in China.
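For what it's worth, the growth-rate comparison is easy to redo once you have an annual CO2 series; this sketch just takes endpoint differences, and the three concentration values below are rounded from memory, so treat them as assumptions.

```python
def avg_growth_ppm_per_year(co2_by_year, start, end):
    """Average CO2 growth rate (ppm/year) between two years, from an annual series."""
    return (co2_by_year[end] - co2_by_year[start]) / (end - start)

# co2_by_year would map year -> annual mean concentration in ppm
# (e.g., parsed from the published Mauna Loa annual means); rounded illustrative values:
co2_by_year = {1960: 317.0, 1990: 354.0, 2010: 390.0}
print(round(avg_growth_ppm_per_year(co2_by_year, 1960, 1990), 2))  # about 1.2 ppm/year
print(round(avg_growth_ppm_per_year(co2_by_year, 1990, 2010), 2))  # about 1.8 ppm/year
```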
To the extent that the AGW hypothesis is true, i.e., the main source of long-term temperature trends is radiative forcing based on changes to carbon dioxide concentrations, perhaps looking for a linear trend isn't advisable, because of the significant changes to the rate of carbon dioxide growth over time (specifically, the fact that carbon dioxide concentrations don't grow exponentially, but have historically exhibited a piecewise growth pattern). So perhaps it makes sense to directly regress temperature against the logarithm of carbon dioxide concentration? Two such exercises were linked above: DocMartyn on Judith Curry's blog, and a blog post attempting to compute equilibrium climate sensitivity by Jeff L. Both seem like decent first passes but are also problematic in many ways.
One of the main problems is that the temperature response to carbon dioxide concentration changes doesn't all occur immediately. So the memoryless regression approach used by Jeff L., that basically just asks how correlated temperature in a given year is with carbon dioxide concentrations in that year, fails to account for the fact that temperature in a given year may be influenced by carbon dioxide concentrations over the last few years. Basically, there could be a lag between the increase in carbon dioxide concentrations and the full increase in temperatures.
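Here is a minimal sketch of what a crude lagged version of such a regression might look like: instead of regressing temperature on the same-year log concentration, regress it on the log of a trailing multi-year average of the concentration. The ten-year window is an arbitrary assumption, and this is only one of many ways a lag could be modeled.

```python
import numpy as np

def lagged_log_co2_slope(temps, co2, window=10):
    """Regress temperature on log2 of a trailing multi-year average of CO2,
    as a crude way of allowing temperature to respond to concentrations over several years.
    Returns the slope: a rough sensitivity-like number in degrees C per doubling."""
    smoothed = np.convolve(co2, np.ones(window) / window, mode="valid")  # trailing averages
    aligned_temps = np.asarray(temps)[window - 1:]                       # align each average with its end year
    slope, _intercept = np.polyfit(np.log2(smoothed), aligned_temps, 1)
    return slope

# 'temps' and 'co2' would be aligned annual series, e.g., annual temperature anomalies
# and the spliced carbon dioxide concentrations discussed above.
```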
Still, the prima facie story doesn't seem to be boding well for the AGW hypothesis:
- Carbon dioxide concentrations have not only been rising, they've been rising at an increasing rate, with notable changes in the rate of increase around 1950 and then again around 1990.
- Temperature exhibits fairly different trends. It was about flat from 1945-1978, then grew very quickly around 1978-1998, and then has been about flat (with a very minor warming trend) 1998-present.
So, even a story of carbon dioxide with lag doesn't provide a good fit for the observed temperature trend.
There are a few different ways of resolving this. One is to return to the point made earlier about how the actual temperature is a sum of the linear trend (driven by greenhouse gas forcing) plus a bunch of periodic trends, such as those driven by the PDO, AMO, and solar cycles. This sort of story was described by DocMartyn on Judith Curry's blog and in the paper by Syun Akasofu referenced above.
Another common explanation is that the 1945-1978 non-warming (and, according to some datasets, moderate cooling) is explained by the increased concentration of aerosols that blocked sunlight, and that therefore canceled the warming effect of carbon dioxide. Indeed, in the early 1970s, there were concerns about global cooling due to aerosols, but there were also a few voices that noted that over the somewhat longer run, as aerosol concentrations were controlled better, the greenhouse effect would dominate and we'd see rapid temperature increases. And given the way temperatures unfolded in the 1980s and 1990s, the people who were calling for global warming in the 1970s seemed unusually prescient. But the pause (or at any rate, significant slowdown) in warming after 1998, despite the fact that the rate of carbon dioxide emissions has been accelerating, suggests that there's more to the story than just aerosols and carbon dioxide.
UPDATE: Some people have questioned whether there was a pause or slowdown at all, and whether using 1998 as a start year may be misguided because it was an unusually hot year due to a strong El Nino. 1998 was unusually hot, and indeed the lack of warming relative to 1998 for the next few years was explainable in terms of 1998 being an anomaly. But the time period since then is sufficiently long that the slowness of warming can't just be explained by 1998 being very warm. For a list of the range of explanations offered for the pause in warming, see here.
Should we start using actual climate science now?
The discussions above were very light on both climate science theory and highbrow statistical theory. We just looked at global temperature and carbon dioxide trends, eyeballed the graphs, and tried to reason about what sorts of growth patterns were present. We didn't talk about what the theory says, what independent lines of evidence there are for it, what sort of other indicators (such as regional temperatures) might be used to test the theory, and what historical (pre-1800) data can tell us.
A more serious analysis would consider all of these. But here is what I believe: if a more complicated model cannot consistently beat simple models such as those based on persistence, random walk, simple linear regression, random walk with drift, etc., then the model is not yet ready for prime time as a forecasting tool. There may still be insights to be gleaned from the model, but its ability to forecast the future is not one of its selling points.
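As a sketch of what "consistently beating the simple models" would mean operationally, one might compare out-of-sample errors against a persistence baseline and a random walk with drift; the error metric and the shape of the comparison are my own arbitrary choices, not a standard from the forecasting literature.

```python
import numpy as np

def mean_abs_error(forecasts, actuals):
    return float(np.mean(np.abs(np.asarray(forecasts) - np.asarray(actuals))))

def persistence_forecast(history, horizon):
    """No-change baseline: repeat the last observed value."""
    return [history[-1]] * horizon

def drift_forecast(history, horizon):
    """Random walk with drift: extrapolate the average historical step."""
    drift = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + drift * (h + 1) for h in range(horizon)]

# 'history' would be the observed series up to some cutoff year, 'actuals' the observations after it,
# and 'model_forecasts' whatever the candidate model predicted for those same years.
# The model adds forecasting value here only if, across many cutoffs, something like
#   mean_abs_error(model_forecasts, actuals) < min(
#       mean_abs_error(persistence_forecast(history, len(actuals)), actuals),
#       mean_abs_error(drift_forecast(history, len(actuals)), actuals))
# holds consistently.
```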
The history of climate modeling so far suggests that such success has been elusive (see this draft paper by Kesten C. Green, for instance). In hindsight, from a 1990s vantage point, those in the 1970s who bucked the "global cooling" trend and argued that the greenhouse effect would dominate seemed very prescient. But the considerable slowdown of warming starting around 1998, even as carbon dioxide concentrations grew rapidly, took them (and many others) by surprise. We should keep in mind that there are many stories in financial markets of trading strategies that appear to have been successful for long periods of time, far exceeding what chance alone might suggest, but then suddenly stop working. The financial markets are different from the climate (in that there are humans competing, and eating away at each other's strategies) but the problem still remains that something (like "the earth is warming") may have been true over some decades for reasons quite different from those posited by the people who successfully predicted it.
Note that even without the ability to make accurate or useful climate forecasts, many tenets of the AGW hypothesis may hold, and may usefully inform our understanding of the future. For instance, it could be that the cyclic trends and sources of random variation are bigger than we thought, but the part of the increase in temperatures due to increasing carbon dioxide concentrations (measured using the transient climate response or the equilibrium climate sensitivity) is still quite large. Which basically means we will see (large increase) + (large variation). In that case the large increase still matters a lot, but would be hard to detect using climate forecasting, and would be hard to use to make better climate forecasts. But if that's the case, then it's important to be all the more sure of the other lines of evidence that are being used to arrive at the equilibrium climate sensitivity estimate. More on this later.
Critique of insularity
I want to briefly mention a critique offered by forecasting experts J. Scott Armstrong and Kesten Green (I mentioned both of them in my post on general-purpose forecasting and the associated community). Their Global Warming Audit (PDF summary, website with many resources) looks at many climate forecasting exercises from the outside view, and finds that the climate forecasters pay little attention to general forecasting principles. One might detect a bit of a self-serving element here: Armstrong isn't happy that the climate forecasters are engaging in such a big and monumental exercise without consulting him or referring to his work, and an uncharitable reading is that he is feeling slighted at being ignored. On the other hand, if you believe that the forecasting community has come up with valuable insights, their critique that climate forecasters didn't even consider the insights obtained by the forecasting community is a fairly powerful criticism. (Things may have changed somewhat since Armstrong and Green originally published their critique). Broadly, I agree with some of Armstrong and Green's main points, but I think their critique goes overboard in some ways (to quite an extent, I agree with Nate Silver's treatment of their critique in Chapter 12 of The Signal and the Noise). But more on that later. Also, I don't know how representative Armstrong and Green are of the forecasting community in their view on the state of climate forecasting.
I have also heard anecdotal evidence of similar critiques of insularity from statisticians, geologists, and weather forecasters. In each case, the claim has been that the work in climate science relied on methods and insights better developed in the other disciplines, but the climate scientists did not adequately consult experts in those domains, and as a result, made elementary errors (even though these errors may not have affected their final conclusions). I currently don't have a clear picture of just how widespread this criticism is, and how well-justified it is. I'll be discussing it more in future posts, not so much because it is directly important but because it gives us some idea of how authoritative to consider the statements of climate scientists in domains where direct verification or object-level engagement is difficult.
Looking for feedback
Since I'm quite new to climate science and (largely, though not completely) new to statistical analysis, it's quite possible that I made some elementary errors above. Corrections would be appreciated.
It should be noted that when I say a particular work has problems, it is not a definitive statement that that work is false. Rather, it's simply a statement of my impression, based on a cursory analysis, that describes the amount of credibility I associate with that work. In many cases, I'm not qualified enough to offer a critique with high confidence.
Climate science: how it matters for understanding forecasting, materials I've read or plan to read, sources of potential bias
As part of a review of forecasting, I've been looking at weather and climate forecasting (I wrote one post on weather forecasting and another on the different time horizons for weather and climate forecasting).
Climate forecasting is turning out to be a fairly tricky topic to look into, partly because of the inherent complexity of the task, and partly because of the politicization surrounding Anthropogenic Global Warming (AGW).
Due to the complexity and the potential for bias, I decided to disclose what materials I've read and my potential sources of bias.
Why am I looking at climate forecasting?
Climate forecasting, and the debate surrounding what'll happen to the climate and how human choices today can shape it, is one of the biggest examples of a long-range forecasting effort that has attracted widespread attention, both in terms of the science and the policy and political implications. Understanding how it was done can give insights into the ability of humans to make forecasts about the long-run future (on the decadal or centennial timescale) in the face of considerable uncertainty, and use those forecasts to drive decisions today. This would be relevant for other long-range forecasting problems, such as (possibly) friendly AI. Note though that my focus isn't driven by finding parallels with any other specific forecasting problem, such as friendly AI.
The sorts of questions I hope to answer by the end of this inquiry
The following are questions to which I hope to state relatively clear answers by the end:
- How good are we at climate forecasting?
- How good are we at knowing how good we are at climate forecasting? Are the forecasts appropriately calibrated, or do they tend to be overconfident or underconfident?
- Are climate forecasters using the best tools available to them from other domains (such as statistics, econometrics, forecasting, weather forecasting)? Are they using best practices in their efforts?
- What is the level of evidence regarding Anthropogenic Global Warming (AGW) and to what extent have the people generally deferred to as experts correctly weighed the evidence?
The following are questions to which I may not obtain clear answers, but I'll be looking for and reporting information on them because they influence the answers to the preceding questions:
- Given that climate forecasts, and the AGW hypothesis in particular, have been considered a basis for significant collective action (such as restricting emissions, or subsidies to alternative energy sources), there are obviously big political stakes in the outcome of the science. Oil and coal companies, particularly if they don't anticipate being easily able to diversify, stand to lose from policy measures, while nuclear, solar, and wind energy companies might gain. To what extent have these vested interests influenced the science?
- More generally, to what extent have people's beliefs about the possible political consequences about specific outcomes affected the science in ways that are not epistemically justified? For instance, do people who are more risk-averse tend to exaggerate the harms, so that they can convince a less risk-averse public to take action? Do people who view restrictions on carbon dioxide emissions as economically disastrous tend to downplay the scientific evidence for AGW in order to minimize the probability of emissions reduction legislation?
Sources
Courses or full-fledged reviews
- David Archer's global warming Coursera course (Archer is a climate scientist specializing in ocean-related stuff at the University of Chicago, and one of the bloggers at RealClimate).
Books about climate change aimed at a popular audience
- Six Degrees: Our Future on a Hotter Planet by Mark Lynas (Amazon, Wikipedia): I only read the chapters about warming up to 3 degrees Celsius. The focus of my inquiry is the climate forecasting itself, not so much the consequences of it, but I did want to get a handle on what sorts of consequences people expect.
- Coming Climate Crisis? Consider the Past, Beware the Big Fix by Claire J. Parkinson (Amazon): I have read Chapters 1 and 5 so far, and intend to read/skim other chapters when writing about relevant material.
Books about specific controversies surrounding climate change
- The Hockey Stick Illusion: Climategate and the Corruption of Science by Andrew Montford (Amazon, Wikipedia): I read almost the whole book (skipping some pages of the last chapter). Despite the subtitle, the book is not about Climategate but rather about the debate surrounding the hockey stick graph. The graph is actually quite peripheral to the central debates of climate science, but the debate surrounding it provides important insight into the sociology of climate science and the IPCC process.
- The Climate Files: The Battle for the Truth about Global Warming by Fred Pearce (Amazon): I read the whole book.
Book chapters
- Chapter 12 of Nate Silver's The Signal and the Noise. This chapter is about climate science, and specifically about anthropogenic global warming. The book also has a chapter on weather forecasting that I read and used in an earlier post.
IPCC reports
- I read a large part of Chapter 8 of the IPCC 4th Assessment Report Working Group I on climate change models and their evaluation, and skimmed the corresponding chapter in the IPCC 5th Assessment Report Working Group I.
- I haven't read the other IPCC Working Group I chapters yet, but intend to do so where relevant.
- I don't think the other Working Groups of the IPCC are too relevant for my purposes. Also, I've heard that the quality of reports in the other Working Groups leaves a lot to be desired. But I might refer to those if I need to understand more about the policy implications.
Blogs and websites
I reference here only the blogs and websites I've identified as places to check out, rather than ones where I chanced upon an isolated blog post by link-traipsing or searching the web.
- Skeptical Science (website, Wikipedia): Unlike what the name suggests, it is not run by global warming skeptics but rather by people who seek to debunk global warming skepticism. I used this website mainly to understand both the standard arguments offered against the Anthropogenic Global Warming (AGW) hypothesis and the common mainstream rebuttals to these arguments. I found it to be a reasonably comprehensive compendium of arguments and rebuttals.
- Watts Up With That? (WUWT) (website, Wikipedia) run by Anthony Watts (Wikipedia): I used this website extensively to understand non-mainstream and skeptical perspectives on climate science, as well as some aspects of the source of rancor and skepticism expressed by outside-the-establishment bloggers. As a general rule, whenever looking up a topic, I searched for it on WUWT. I found WUWT to have fairly thorough and comprehensive coverage and the individual posts to be quite long and detailed, but not all posts should be treated as reliable or on par with a published article. Each post should be evaluated at the object level.
- Climate Audit (CA) (website, Wikipedia), run by Stephen McIntyre (Wikipedia): Although this too is labeled a skeptic site, it has much more limited scope. While WUWT covers any and all climate science-related topics and features guest posts from all sorts of people, CA is more focused on the modeling and statistical methods used in papers. As the name suggests, the purpose is more like an auditor than somebody attempting to sell a competing theory. I've used this to understand some of the controversies surrounding measurement, and get a sense of the politics and dynamics of disputes.
- RealClimate (website, Wikipedia): This was the web's first climate blog. In fact, it has been described in The Climate Files and The Hockey Stick Illusion as a way for climate scientists to regain control of the public debate in the face of all the Internet discussion among skeptics critiquing their papers. That being said, I didn't find the posts there very useful for understanding the issues involved. Part of it might be the combative tone, part of it the low frequency of posting, and part of it the lack of mathematical detail accompanying many of the posts.
- Judith Curry's blog (website, Wikipedia on Curry): Curry is an interesting person because she identifies as a mainstream scientist but also engages with, and highly respects, the work of skeptic websites such as WUWT and CA.
Papers
I read many papers from a diverse array of sources. I arrived at most papers either by clicking links on one of the blogs or websites mentioned above, or using Google or Google Scholar searches for specific topics. Any paper that I use as input to my opinion in a specific post will be explicitly linked in that post.
Potential for bias and inaccuracy
- My political views lean libertarian, and although I don't think this affects my view of the plausibility of climate theories directly, it does affect the intellectual environment I operate in (less deferential to the mainstream consensus). I don't think this was an issue, since the list of sources I used was mostly derived using Google Search and Wikipedia as starting points, rather than recommendations from my libertarian friends. But it might have affected me. Some people have also argued that since libertarians oppose heavy-handed government intervention, they have an incentive not to believe in anthropogenic global warming, since it presents an "inconvenient truth" for their position.
- I don't know much about the subject. The above reading list is hardly enough to train myself in climate science. How might my lack of knowledge bias me? It might make me too sensitive to presentation. In particular, this may lead me to take positions espoused in skeptic blogs such as WUWT and CA more seriously: the authors combine (what seems to be) a careful examination of the data with a desire to get at the truth of whatever empirical issue they are investigating, and they share their thought process in considerable detail. In contrast, the RealClimate blog posts are more like announcements than investigations I feel part of. But this does not mean that WUWT or CA is more reliable, of course: the climate scientists blogging at RealClimate are busier writing up their work for publication than sharing it on blogs. Much as I might prefer the blogging culture to the paper-writing culture, I should avoid using this as an important input in my evaluation of the legitimacy of specific scientific claims.
Looking for suggestions
As always, I'm happy to hear suggestions. In particular, I am interested in suggestions on these fronts:
- Additional sources I should refer to
- Cautions or caveats for reading or interpreting the sources already on my list (or perhaps a suggestion to read some of the already listed sources more thoroughly)
- Other sources of bias I might have that I missed
- Potential ways to correct for my bias
Weather and climate forecasting: how the challenges differ by time horizon
Prelude: Climate change, in particular the question of anthropogenic global warming (AGW) is both an intellectually complex and a politically loaded topic. Politics has been called the mind-killer here. For a mix of both reasons (the intellectual complexity and the political loadedness), I hope to approach the issue in steps: I'll first lay out my (probably quite flawed, but hopefully still broadly correct) understanding of the scientific questions, and in subsequent posts, I'll tackle some of the trickier and more controversial questions. I'd appreciate any error corrections -- it'll help improve the accuracy of my subsequent posts.
In a previous post, I discussed weather forecasting through numerical weather simulation. With numerical weather simulation, we first construct a series of equations using the laws of physics that describe the evolution of the weather system. Then, we discretize the system in space and time (we break up the spatial region into a grid and we break the time into discrete time steps). We compute the evolution of the discretized system numerically. We tackle uncertainty in the measurement by computing several alternate scenarios and assigning probabilities to them.
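To make the mechanics a little more concrete, here is a toy sketch in Python (my own illustration, not an actual weather model): it uses the Lorenz-63 equations as a stand-in for the real physics, steps them forward with a fixed time step, and runs a small ensemble of slightly perturbed initial conditions to represent measurement uncertainty. Real numerical weather prediction is vastly more complex, but the structure -- discretize, step forward in time, run an ensemble -- is the same.

```python
# Toy illustration (not a real weather model): step the Lorenz-63 system
# forward with a fixed time step, using an ensemble of slightly perturbed
# initial conditions to stand in for measurement uncertainty.
import numpy as np

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One explicit-Euler time step of the Lorenz-63 equations."""
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

rng = np.random.default_rng(0)
base = np.array([1.0, 1.0, 1.0])
# Ensemble members: the same initial state plus tiny perturbations.
ensemble = [base + rng.normal(scale=1e-3, size=3) for _ in range(20)]

for _ in range(2000):                        # discrete time steps
    ensemble = [lorenz_step(member) for member in ensemble]

spread = np.std([member[0] for member in ensemble])
print(f"Ensemble spread in x after 2000 steps: {spread:.2f}")
```

Running this shows the ensemble members spreading apart over time, which is the toy analogue of the compounding errors from model error, measurement error, and sensitivity to initial conditions.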
Is this the way we predict long-term climate? Sort of, but not quite. The equations describing the evolution of the system are the same for weather and climate, and the only thing that's different in principle is the longer timescale. However, some mechanisms matter a lot in the short term, and others matter more in the long term.
Six time horizons for weather forecasting
There are qualitative differences between the challenges of forecasting for different time horizons. The set of time horizons spans a continuum, but to simplify the discussion, I'll identify six different types of time horizons:
- The very near future, i.e., the next half an hour to 2 hours. Weather forecasting for this time frame is sometimes called nowcasting.
- The period covering the next 1-2 days. This is sometimes called short-range weather forecasting and is generally quite reliable.
- The period ranging 3-14 days from the present. This is short-to-medium-range weather forecasting. Forecasts for up to a week show forecast skill (relative to the benchmarks of persistence and climatology), but the 7-14 day period is still being worked on, and forecast skill here is relatively small. Naive numerical weather simulations often show negative forecast skill, i.e., they do worse than climatology. However, multi-member and multi-model ensemble forecasting can beat climatology by a small margin. (A simple sketch of how forecast skill is computed follows this list.)
- Seasonal-to-interannual (SI) forecasting. This involves forecasting the seasons in the coming year and the year after that. Predictions are generally vague, and are of the form "the average temperature this summer will be 0.1 degrees Celsius hotter than the historical average." This straddles the line between weather and climate forecasting: the numerical weather simulation methods used for short-range and medium-range weather forecasting lose their skill at these timescales, and the oceans start mattering more than the atmosphere.
- Decadal forecasting. This involves forecasting over a time period ranging from a few years out to a few decades out.
- Centennial forecasting. This involves forecasting over the next century.
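Forecast skill, as used in the 3-14 day item above, is measured relative to a reference forecast such as persistence or climatology. Below is a minimal sketch of a mean-squared-error skill score; the numbers are made up for illustration, and operational forecast verification uses more refined metrics than this.

```python
# Minimal sketch of a skill score: 1 - MSE(forecast) / MSE(reference).
# 1 is a perfect forecast, 0 is no better than the reference, and negative
# values mean the forecast does worse than the reference.
import numpy as np

def mse(forecast, observed):
    return np.mean((np.asarray(forecast) - np.asarray(observed)) ** 2)

def skill_score(forecast, reference, observed):
    return 1.0 - mse(forecast, observed) / mse(reference, observed)

observed    = [14.0, 15.5, 13.0, 16.0, 17.5]   # e.g., daily mean temperatures (made up)
model       = [14.5, 15.0, 13.5, 16.5, 17.0]   # model forecast
climatology = [15.0, 15.0, 15.0, 15.0, 15.0]   # long-run average for those days
persistence = [13.5, 14.0, 15.5, 13.0, 16.0]   # "same as the previous day"

print("skill vs climatology:", round(skill_score(model, climatology, observed), 2))
print("skill vs persistence:", round(skill_score(model, persistence, observed), 2))
```

Negative skill at the 7-14 day range, mentioned above, corresponds to this score dropping below zero when climatology is used as the reference.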
Three sources of uncertainty
NASA scientist and RealClimate blogger Gavin Schmidt identifies three sources of uncertainty in climate forecasting, as described by Nate Silver in Chapter 12 of The Signal and the Noise:
- Initial condition uncertainty: This form of uncertainty dominates short-term weather forecasts (though not necessarily the very short term weather forecasts; it seems to matter the most for intervals where numerical weather prediction gets too uncertain but long-run equilibrating factors haven't kicked in). Over timescales of several years, this form of uncertainty is not influential.
- Scenario uncertainty: This is uncertainty that arises from lack of knowledge of how some variable (such as carbon dioxide levels in the atmosphere, or levels of solar radiation, or aerosol levels in the atmosphere, or land use patterns) will change over time. Scenario uncertainty rises over time, i.e., scenario uncertainty plagues long-run climate forecasts far more than it plagues short-run climate forecasts.
- Structural uncertainty: This is uncertainty that is inherent to the climate models themselves. Structural uncertainty is problematic at all time scales to a roughly similar degree (some forms of structural uncertainty affect the short run more whereas some affect the long run more).
Different sorts of uncertainty emerge at different timescales: the atmosphere versus the ocean
Short-range weather forecasting (and most of medium-range weather forecasting, as far as I understand) basically involves modeling the behavior of the atmosphere. The standard approach of numerical weather simulation discretizes the three spatial dimensions of the atmosphere and chooses a discrete time step, then runs a simulation to figure out how the atmosphere will evolve.
Long-range weather and climate forecasting, ranging from SI forecasting to decadal forecasting to centennial forecasting, involves modeling the behavior of the oceans.
Why the distinction? The oceans have a thousand times the thermal capacity of the atmosphere, and they obviously contain a lot more water, so one would expect them to play a bigger role in temperature and precipitation over longer timescales. But the oceans also equilibrate more slowly. Some of the stabilizing currents in the ocean take centuries to run their course. The atmosphere is much faster-moving. Thus, variation in the atmosphere dominates over shorter timescales. In particular, the initial conditions that matter in the short run are the initial conditions of the atmosphere, whereas the initial conditions that matter on the SI or decadal timescale are the initial conditions of the ocean. More information is in this overview provided by the UK Met Office.
Of course, the oceans aren't acting alone, and long-term changes to atmospheric composition (in particular, the increase in atmospheric concentrations of greenhouse gases such as carbon dioxide) can have significant effects on the climate. So what we need is a model (preferably a numerical simulation, though we might begin with statistical models) that considers the evolution of both the atmospheric and the ocean system, and the interaction between them. Such models are termed coupled models (i.e., coupled atmosphere-ocean models). The general term for the types of models used in long-range weather and climate prediction is general circulation model, so we'll call the coupled ones coupled general circulation models or coupled GCMs (as opposed to purely atmospheric GCMs).
SI Forecasting: of hot boys and cool girls
When it comes to Seasonal-to-Interannual forecasting, we have two possible benchmarks:
- Previous year's seasonal weather.
- Historical average climate for that season.
The forecast skill of any model can be measured in relation to either of these two benchmarks.
So what can an SI forecasting model do to improve on historical climate? Initial atmospheric conditions can have ballooning effects over short time ranges such as a week or two weeks, but over a month or two, we expect them to equilibrate. In other words, initial atmospheric conditions probably add little signal to our ability to predict the average temperature for forthcoming seasons. But ocean conditions do matter: there are seasonal currents in the ocean (and wind patterns that these ocean currents cause) and we can use the current condition of the oceans to make educated guesses about how the currents in coming seasons will differ from historical averages.
An example is the El Niño Southern Oscillation (ENSO) in the Pacific Ocean (off the South American coast). I actually don't understand the details much, but my rough understanding is that there are two phases: the warm water phase, called El Niño (Spanish for "the boy" and intended as a reference to Jesus Christ), and the cold water phase, La Niña (Spanish for "the girl" and named simply as a counterpart to El Niño). When El Niño conditions prevail, they also cause a corresponding movement in the atmosphere called the Southern Oscillation (hence the name ENSO), and overall we get warmer weather than we otherwise would. When La Niña conditions prevail, we get colder weather than we otherwise would. Successful prediction of whether a particular year will see a strong El Niño can help determine whether the weather will be warmer than usual. For instance, it's believed that a strong El Niño will develop this year, leading to warmer weather than usual (see here for instance; the canonical source for El Niño forecasts is the NOAA page, which, per the most recent update, forecasts a 70% probability of El Niño conditions this summer and an 80% probability of El Niño conditions this fall/winter).
For an overview on seasonal-to-interannual forecasting, see here.
Decadal forecasting
We noted above that the atmospheric conditions matter over the range of a few hours to a few weeks but the oceans have a longer memory. But even within the oceans, there are different types of currents and different phases and oscillations. At the very extreme are the stabilizing deep ocean currents, that take about a thousand years to run their course. But more relevant for decadal forecasting are the decadal and multidecadal oscillations. In particular, two oscillations are of particular importance:
- Pacific decadal oscillation (PDO): This is linked to the ENSO, but unlike the ENSO, which lasts for a short while, the PDO cycle is measured in decades (I couldn't get a clear picture of whether there is any regularity to the PDO cycle; perhaps the issue hasn't been settled). The positive phase of the PDO is linked to warmer weather (similar to El Niño), and the negative phase of the PDO is linked to cooler weather (similar to La Niña).
- Atlantic multidecadal oscillation (AMO)
Apart from the oceans, two other factors that matter at the decadal level are atmospheric composition (specifically, greenhouse gas concentrations, since they affect the level of warming) and solar activity. Solar activity has its own cycles and phases, and therefore is (or might be) moderately predictable over the decadal timescale. Greenhouse gas concentrations don't change too fast relative to the levels they are already at, so they too can be predicted with reasonable confidence on the decadal timescale without needing to consider different scenarios about changes to emissions levels.
Finally, there are unpredictable events that can affect climate over decadal timescales. The classic example is volcanic eruptions. However, these are by nature unpredictable, so they limit the potential predictability of climate on a decadal timescale. Forecasts may be prepared conditional on the occurrence of such events, in addition to an unconditional forecast that assumes no such events.
For more information, see this overview provided by the UK Met Office or this overview of whether decadal forecasting can be skillful.
In what ways is decadal forecasting different from century-long forecasting and scenario analyses of the sort seen in IPCC reports?
As far as I can understand:
- Decadal forecasting is more sensitive to the initial condition of the oceans, in particular, the phases of the PDO and AMO.
- Very little of the uncertainty in decadal forecasting arises from uncertainty in estimates of the amount of carbon dioxide emissions over the coming years. This is because (a) it's unlikely that emissions will change drastically in a few years, and (b) the amount of additional accumulated carbon dioxide over a few years would be quite small and have little effect on temperature predictions. Therefore, creating different scenarios for emission levels or other changes in human activity is unnecessary for forecasting at the decadal timescale. But obviously these become quite important at the centennial timescale.
Some terminology
If you plan to read stuff on weather and climate, you might encounter some terms that have technical meanings that are slightly more specific than you might naively expect. I'm listing a few below.
- Forcing (see here) refers to a change in the equilibrium weather or climate pattern due to something from outside the system. Examples of forcing include greenhouse gas forcing due to human emissions, forcing due to changes in solar activity, or forcing due to a volcanic eruption. This is contrasted with natural variability (which itself may be predictable or unpredictable depending on how reliably periodic it is).
- Initialization of a climate model (see here) refers to setting the initial values of variables in the model. For models that are used to make reliable forecasts, correct initialization matters. The variables for which correct initialization matters more depend on the time horizon over which we are forecasting. For forecasting over the SI or decadal timescale, initialization of the oscillatory phases of ocean currents matters, but it may not matter for the centennial timescale.
- Data assimilation (see here) refers to the process by which a climate model learns from existing data and observations of the current or past climate.
- Hindcast (Wikipedia) refers to a weather or climate forecast (using a model) for a historical period for which we already have climate data. The idea is that the hindcast is made without using the climate data it is trying to predict, and the accuracy of the hindcast can then be judged against the actual values. This allows us to estimate the forecast skill without having to wait several years. Hindcasting becomes more important for longer timescales, where we simply can't afford to run repeat experiments with actual forecasting. However, hindcasting suffers from the problem that it's difficult to enforce the norm that hindcasts should be generated without allowing the model to look at the data it is trying to predict. This becomes more of an issue for long-range forecasting, because even if the model does not explicitly use the data it is trying to predict, the researchers working on the model are implicitly aware of the information. For instance, a researcher working on a model that will be tested to produce a hindcast of the period 1985-1995 already knows what the climate in those years was like (if the researcher knows climate science at all). This problem is less pronounced for short-range forecasting, because a weather forecaster can credibly claim to not have known the weather for the particular region and day that his or her model hindcasted. (A minimal sketch of the mechanics of hindcast evaluation follows this list.)
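To illustrate the mechanics of hindcasting described in the last item, here is a toy sketch; the data are synthetic and the "model" is just a fitted linear trend, so this shows only the hold-out structure, not what an actual climate hindcast looks like.

```python
# Toy hindcast: fit a simple trend only to data before a cutoff year,
# "forecast" an already observed later window, and score the result
# against what actually happened.
import numpy as np

years = np.arange(1960, 2001)
rng = np.random.default_rng(1)
# Hypothetical temperature anomalies: a gentle trend plus noise.
anomalies = 0.015 * (years - 1960) + rng.normal(scale=0.1, size=years.size)

cutoff = 1985                                  # the model may only see data before this
train = years < cutoff
slope, intercept = np.polyfit(years[train], anomalies[train], 1)

holdout = (years >= 1985) & (years <= 1995)    # the period being hindcast
hindcast = slope * years[holdout] + intercept
rmse = np.sqrt(np.mean((hindcast - anomalies[holdout]) ** 2))
print(f"Hindcast RMSE over 1985-1995: {rmse:.3f} degrees C")
```

The leakage problem described above is exactly that, in practice, the modeler already knows roughly what the held-out window looked like, so the separation that is easy to enforce in code is hard to enforce in the modeler's head.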
Futures studies: the field and the associated community
Note: This post is part of my series of posts on forecasting and related topics, related to my contract work for MIRI. While I think the post would interest a sufficiently large fraction of the LessWrong readership for it to be worth posting, it's not of as much general interest as some of my other recent posts. I do expect some of the information in the post to be new and useful to people who have hitherto associated futurology primarily with a few big names such as Ray Kurzweil, or are unaware of the distinction between the academic communities devoted to forecasting and futures studies.
This post is part of documentation of work I'm doing for MIRI on forecasting in various domains. This particular post is on the topic of futures studies, also known as futurology (Wikipedia). Futures studies is basically the study of the future, or rather, of possibilities for the future. I'd originally thought that the term would be futurism, but that term is used for an art movement.
What is futures studies, and how does it relate to forecasting?
Futures studies can crudely be thought of as forecasting the future, but while forecasting is an important component, futures studies often has the goal of guiding or shaping the future. It's called futures studies rather than future studies because it's the study of possible futures, how to cope with them, and possibly, how to bring them about (see, for instance, the futures techniques called backcasting and causal layered analysis).
Another difference between forecasting and futures studies is that while forecasting is typically done for specific timeframes, futures studies can be more amorphous: it can involve imagining and describing things that might exist in the future, without assigning either an expected time of arrival or a probability of realization. Futures studies can operate over time horizons ranging from 1-2 years to the very longest horizons of centuries that I discussed in a previous post. In general, the focus is not on getting the dates or probabilities right, but rather on describing what's possible.
Clearly, it's harder to evaluate the quality of work in futures studies compared to work in forecasting. With forecasting, the quality of forecasts can be judged by looking at the accuracy, bias, and utility to planners. Futures studies does not generally involve clear, falsifiable predictions. Thus, anybody can pontificate about the future and be labeled a futurologist, and it's not clear what criteria can be used to exclude the person from the futures studies community.
One way out is to start with an existing cluster of futures techniques, futures studies organizations, research centers, journals, and people, and see what concepts they promote and which other entities they associate with. Indeed, there is a reasonably interconnected academic futures studies community and set of futures studies journals that can be used to bootstrap this process. It's not clear to me why an outsider (such as me) should grant credibility to this cluster. Indeed, unlike the case of the forecasting community (that I described in an earlier post), I haven't (yet) been convinced that this community as a whole deserves deference in questions of thinking about the future, relative to people outside this community who might spend time thinking about the relevant issues. Incidentally, an article by Michael Marien titled Futures-thinking and identity: Why “Futures Studies” is not a field, discipline, or discourse: a response to Ziauddin Sardar's ‘the namesake’ goes over the issue of the lack of clarity in what it means to be a futurologist.
Organizations and research centers involved with futures studies
- World Future Society (website, Wikipedia)
- Association of Professional Futurists (website, Wikipedia)
- World Futures Studies Federation (website, Wikipedia): Founded in 1973, the organization aims to promote futures studies as an academic discipline.
- The Graduate Institute of Futures Studies at Tamkang University, Taiwan (website). This publishes the Journal of Futures Studies.
- Hawaii Research Center for Futures Studies (website, Wikipedia)
- School of International Futures (website)
Wikipedia also has a list of research centers that includes MIRI, and a longer list of futures studies organizations.
Journals and magazines in futures studies
For a longer list, see here.
- Futures (The journal of policy, planning, and futures studies) (website, Wikipedia): Published by Elsevier, this seems to be the most serious academic journal in the futures studies domain. It has an impact factor of about 1.1. Most of the widely cited futures studies papers I encountered were in this journal.
- Journal of Futures Studies (website): This is associated with the Futures Studies program at Tamkang University, Taiwan. It has an impact factor of about 0.25.
- The Futurist, the magazine of the World Future Society (website)
- Technological Forecasting and Social Change (website, Wikipedia): Not strictly a futures studies journal, but close. This is the go-to journal for research on the Delphi method.
- foresight (website, Wikipedia) (not to be confused with Foresight: The International Journal of Applied Forecasting, a forecasting journal)
- European Journal of Futures Research
Websites (in addition to the organization websites)
- metafuture.org, a website run by Sohail Inayatullah and Ivana Milojevic.
- Introduction to Future Studies, maintained by Linda Groff and Paul Smoker.
Key people
- There is a fairly long and heterogeneous list on Wikipedia, but it includes a lot of historical figures (such as past science fiction writers) as well as journalists who are not necessarily the people at the cutting edge of developing futures studies ideas today.
- The list of World Futures Studies Federation Fellows is a reasonable starting point if you are interested in current academics in futures studies. In addition, consider looking at the editors, editorial board, and writers for the Journal of Futures Studies and Futures.
- One name that repeatedly came up in my Internet search and exploration for futures studies was Sohail Inayatullah (Wikipedia). He introduced the concept of causal layered analysis (Wikipedia), co-runs the metafuture.org website, and seems to be involved with many of the futures studies journals. However, I'm not sure whether his Internet prominence accurately reflects or overstates his intellectual contribution.
Other background reading
- Futures studies on Wikipedia (the page is unusually thorough)
- Futures techniques on Wikipedia (this page contains a fairly long list of methods, though not all methods have associated pages)
- Outline of futures studies on Wikipedia
Relation between futures studies and scenario planning
In an earlier blog post, I described scenario planning and what it's used for. How does this relate with futures studies? I couldn't get a clear answer, so the points below are just crude impressions of mine. They could be quite misguided.
- Most simply, scenario planning can be considered one of the techniques used in futures studies. Therefore, we can think of the study of scenario planning as a branch of futures studies.
- Historically, both scenario planning and futures studies arose out of the pioneering work of Herman Kahn at the RAND Corporation and later at the Hudson Institute.
- My impression (and I don't have high confidence in this) is that scenario planning ideas are more common in businesses, governments, and policy organizations, whereas futures studies ideas are more common among specific clusters in academia, science fiction, and entertainment. However, a reasonable fraction of futures studies literature is about the analysis of specific situations, such as agriculture or education in a particular country. So this difference may not be all that profound.
- Another impression I get is that futures studies is less moored in reality than scenario planning (this might be related to scenario planning being more established as something that businesses and governments deploy for practical purposes). I have lower confidence in the utility of futures studies as currently constituted than I do in scenario planning.
Relation between futures studies and forecasting (more)
The Wikipedia article on futures techniques lists a number of techniques for futures studies. Some of these are also of use in open-ended forecasting problems, but they are not of much use in short-term forecasting where the structure of the situation is well-understood and we need to predict a binary or continuous variable within that known structure. Some of the methods that overlap between medium-term forecasting and futures studies are:
- Delphi method (Wikipedia)
- Scenario analysis/scenario planning, as mentioned above.
The interaction/overlap between the forecasting community (as described in an earlier post) and the futures studies community seems minimal. There is some interaction: for instance, forecasting guru J. Scott Armstrong has written one article in the Journal of Futures Studies, and Armstrong's 1970 book Long-Range Forecasting is cited and used in futures studies programs.
Futurologists outside the futures studies cluster described here
There are many self-styled futurologists who fall outside the futures studies community as described here. Some of them belong to the scenario planning community, some belong to the forecasting community, some work in general predictive analytics, and some are subject matter experts in specific domains. A particular cluster of futurologists who fall outside the cluster described above is technology futurists. These are people enamored with technology, often technology as it relates to computers, automation, and the natural sciences. Indeed, I expect that most people at LessWrong are more familiar with this brand of futurist. Examples are:
- Ray Kurzweil, who runs kurzweilai.net and has an exponential progress-based view of the technological singularity (in contrast with the Yudkowsky view). Kurzweil has been evaluated on LessWrong before. Although Kurzweil appears on Wikipedia and elsewhere in the list of futurologists, he doesn't seem to be cited much in futures studies literature.
- Michio Kaku, a physicist who has written books about the future such as Physics of the Future.
- The website www.exponentialtimes.net.
Scenario planning, its utility, and its relationship with forecasting
Note: This post is part of my series of posts on forecasting and related topics, related to my contract work for MIRI. While I think the post would interest a sufficiently large fraction of the LessWrong readership for it to be worth posting, it's not of as much general interest as some of my other recent posts. If business strategy and planning don't interest you as topics, this post may not be for you.
I've been reviewing forecasting and related domains as part of contract work for MIRI. One of the related domains I'm looking at is scenario planning (Wikipedia), also known as scenario thinking or scenario analysis. To understand scenario planning, I read the book The Art of the Long View by Peter Schwartz (Amazon) (a fairly interesting read, though whether you consider it value-for-money depends on whether strategy planning and thinking about the future as a whole interest you) and his follow-up book Learnings from the Long View (Amazon). I also skimmed Scenario Planning in Organizations by Thomas J. Chermack (Amazon), which is somewhat drier but is more useful for people who have some understanding of scenario planning and want to implement it in full.
In this post, I discuss some aspects of scenario planning, but the post is not intended to be a standalone summary of scenario planning. For that purpose, I recommend reading the overviews of scenario planning listed below. In this post, I'll look at examples of scenario planning in diverse domains, and I'll consider the relationship between scenario planning and the more conventional, quantitative approach to forecasting.
Overviews of scenario planning
- Wikipedia page
- Scenario Planning: A Tool for Strategic Thinking by Paul J. H. Schoemaker.
- The origins and evolution of scenario techniques in long range business planning by Bradfield et al.
- For longer overviews, consider the books listed above.
Key people
- Herman Kahn (Wikipedia): He pioneered the use of scenario planning in the United States military and at the RAND Corporation, and he founded the Hudson Institute.
- Pierre Wack (Wikipedia): He spearheaded the use of scenario planning by Royal Dutch Shell in the early 1970s. This is claimed to have helped the company cope better with the OPEC oil shock and environmentalism than its competitors did. Shell was the first private company to use scenario planning in a big, systematic way.
- Arie de Geus (Wikipedia): He was the head of Shell Oil Company's Strategic Planning Group. He studied the history of scenario planning and found that its main utility arose from the decision-making processes following the scenario generation, not the scenarios themselves.
- Peter Schwartz (Wikipedia): He is the author of the books The Art of the Long View and Learnings from the Long View that describe scenario planning in detail, and also a co-founder of the Global Business Network (Wikipedia), a leading organization offering consulting and training services in scenario planning.
- Paul J. H. Schoemaker (Wikipedia): He is an academic studying strategic management and decision-making. He has devoted some research efforts to understanding scenario planning.
A quick summary of scenario planning
Scenario planning involves the generation of (usually two or three) scenarios for how the future might transpire. Here are some typical aspects of scenario planning:
- Historically, scenario planning has not been used to replace rigorous, quantitative forecasting. Rather, scenario planning in the corporate world arrived with the intention of replacing a mode of strategic planning where the whole company was supposed to believe in and bet upon a single Official Future. Scenario planning helped introduce the possibility of coping with uncertainty, through a route somewhat different from probabilistic forecasting. (See the last point, though).
- According to some scenario planning guidelines, including guidelines offered by Schwartz and Chermack, as well as a suggestion in Megamistakes (a book I blogged about a while back), the scenarios should be chosen to have approximately equal probability. At any rate, the probabilities should not differ by more than an order of magnitude. So, for instance, it's okay to take scenarios with probabilities of 25%, 35%, and 40%, but not okay to consider scenarios with probabilities of 1%, 10%, and 89%. It's unclear how hard-and-fast this rule is. (A small illustration of this rule follows this list.)
- Each scenario is described in considerable detail, along with early indicators that would suggest that scenario is the one unfolding. The idea is that, if we observe the early indicators of a particular scenario, we have higher confidence in that scenario than in the other scenarios.
- Ultra-optimistic and ultra-pessimistic scenarios are avoided. Even scenarios that unfold in optimistic ways or pessimistic ways usually incorporate some elements of pushback against the optimism or pessimism, as we might expect in real life.
- Schwartz describes scenario planning as combining predetermined elements and critical uncertainties. The predetermined elements are what allow us to build a storyline with reasonable confidence, without having to guess at every step. The critical uncertainties are what we vary between the different scenarios.
- The consideration of scenarios happens by looking at the conjunction of Social, Technological, Environmental, Economic, and Political (STEEP) factors, according to Schwartz. Other authors have used the acronym PESTEL for political, economic, socio-cultural, technological, environmental, and legal factors (see the Wikipedia page on environmental scanning). These factors may generate both predetermined elements and critical uncertainties for the scenario planning exercise.
- The point in time at which the scenarios diverge needs to be chosen carefully. In most real-world situations, the very near future can be forecast to a crude approximation with reasonable confidence. The divergence into scenarios is generally done from the point at which a clear forecast starts becoming difficult to build.
- One way of thinking of the above is in terms of trend analysis versus emerging issues analysis (Wikipedia). Trend analysis focuses on the trend (pattern of change) for the things that already exist and that we can observe and (roughly) measure or ballpark. Trend analysis is the domain where quantitative forecasting methods are more useful: we already have data from the past and we can extend that somewhat into the future. But some people, companies, ideas, memes, fashions, etc. that have near-zero influence today may emerge a few years down the line. These are sometimes called emerging issues and the identification of these is called emerging issues analysis. The point of divergence of the scenarios would therefore be as far in the future as it might take for an emerging issue to become a trend. In fast-changing and highly unpredictable domains, the scenarios may start diverging right away, because new issues can emerge anytime. In relatively stable domains, we would expect that for an issue to emerge and become a trend, it would take a few years.
- Usually, what transpires in the real world is not any one particular scenario, but a mix of elements from different scenarios.
- The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.
- The review article The origins and evolution of scenario techniques in long range business planning by Bradfield et al., linked from the overview section, identifies two schools of scenario planning: the Intuitive Logics school pioneered by Shell, and described by Schwartz and Chermack, and the probabilistic school pioneered by the RAND Corporation. The article notes that while the intuitive logics approach is the one more commonly associated with scenario planning, the probabilistic school has also developed considerably. Indeed, some of the scenario planning examples we list below belong to the probabilistic school.
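As a small illustration of the rough "comparable probabilities" guideline mentioned earlier in this list, here is a toy check of my own (not a standard scenario planning tool): it flags scenario sets whose probabilities span more than an order of magnitude.

```python
# Toy check of the guideline that scenario probabilities should be roughly
# comparable, i.e., should not differ by more than about an order of magnitude.
def probabilities_roughly_comparable(probs, max_ratio=10.0):
    """True if the largest probability is within max_ratio of the smallest."""
    return max(probs) / min(probs) <= max_ratio

print(probabilities_roughly_comparable([0.25, 0.35, 0.40]))  # True: acceptable spread
print(probabilities_roughly_comparable([0.01, 0.10, 0.89]))  # False: too lopsided
```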
Who uses scenario planning?
Scenario planning caught on after the 1973 oil shock; prior to that, only Shell and GE engaged in it (see the Bradfield et al. overview for more). A 1981 survey by Linneman and Klein found three predictors of whether a company used scenario planning:
- The size of the company: Larger companies were more likely to use scenario planning. Relatedly, in The Art of the Long View, Schwartz says that both large and small companies can benefit from scenario planning, but in different ways. Small companies need to use scenario planning mainly to get a sense for whether their overall business model will remain viable, whereas large companies need to make specific quantitative decisions, such as determine how much to invest in a particular product line. Larger companies therefore need to develop more detailed models, whereas for small companies, scenario planning serves to augment one's gut feel.
- The length of the planning horizon: Companies that planned for longer horizons were more likely to use scenario planning. This is consistent with the above observation that scenario planning starts becoming useful when the time horizon is long enough for new issues and players to emerge and for a tight quantitative forecast to therefore be impossible.
- The capital intensity of the industry: More capital-intensive industries are more likely to use scenario planning. This can be explained by the longer-term nature of capital investments and the difficulty of reallocating such investments quickly. In order to determine whether to make a big investment (such as an oil well or a factory), it's important to get a handle on the different possible scenarios and how profitable the investment would be in each.
Big successes of scenario planning
- As noted above, Royal Dutch Shell used scenario planning in the 1970s, and this appears to have given the company an advantage over its competitors in coping with the OPEC oil shock and environmentalism. In the 1980s, scenario planners at Shell considered the possibility of the collapse of the Soviet Union when they were brainstorming ways that the price of oil might go down. Schwartz, who describes this in The Art of the Long View, writes that at the time, European countries capped their Russian imports at 35% (for political, Cold War-related reasons), keeping the price of oil high enough to make investing in some expensive oil wells worthwhile. But if the Soviet Union collapsed and the politically motivated caps were removed, then the price would fall, and the oil wells would become unprofitable. The team at Shell identified the possibility that the Soviet economy would collapse and that Gorbachev might lead a new country. They also surmised that even if the new economy didn't do well, it would become a corrupt crony-capitalist system and would not return to Leninism. With these scenarios in mind (even though they were not forecast as highly likely), Shell invested in fewer oil wells, and invested more in technology that would keep the cost of oil extraction low enough to keep the oil wells profitable even after a price collapse.
- In The Art of the Long View and Learnings from the Long View, Peter Schwartz discusses a number of other examples of scenario planning. One example is an advertising firm in the early 1990s that was exposed to a scenario of broadband Internet and started preparing for that scenario.
Some examples of scenario planning
I looked on the Internet for scenario planning writeups across diverse domains that were publicly available. I list some examples below. The focus is on what I was able to find on the Internet by using a range of search phrases, and the list below is unlikely to be representative of scenario planning.
- The book Learnings from the Long View by Schwartz reviews his scenario analyses from his earlier book The Art of the Long View. The Kindle edition of Learnings is priced at $2.99, so it might be worth buying even if you don't want to buy the earlier (and longer) book.
- The Global Scenario Group (website, Wikipedia) is a futures studies/scenario analysis group that publishes scenarios for the future. A number of papers with scenario analyses by them can be downloaded here.
- The Global Business Network (website (link not working), Wikipedia) does scenario planning for private sector companies, and also trains them in scenario analysis. The URL for their website (as listed on LinkedIn and Wikipedia) isn't working, and I don't know exactly what the situation with them is.
- Many of the reports published by the McKinsey Global Institute (website) or by other parts of McKinsey & Company use scenario analysis to explore different possibilities. For instance, here is a scenario analysis on Moore's Law from McKinsey & Company. Other consulting companies also often use scenario analyses in their reports.
- Climate change: Most of the high-profile analyses of climate change and the relationship with human activity (a two-way link) have used scenario analysis by considering different scenarios for economic growth, emissions levels, and climate sensitivity. Examples: IPCC report chapter, MIT paper on the influence of climate change on differing scenarios for future development.
- Energy: Another major (and closely related) area where scenario analysis is common is energy demand and supply. Because of how intricately energy is interwoven in the modern economy, creating scenarios for energy often requires creating scenarios for many aspects of the future. Shell (the organization to pioneer scenario analysis for the private sector, as described by Schwartz in his book) publishes some of its scenario analyses online at the Future Energy Scenarios page. While the understanding of future energy demand and supply is a driving force for the scenario analyses, they cover a wide range of aspects of society (the STEEP or PESTEL list). For instance, the New Lens Scenario published in 2012 described two candidate futures for how the world might unfold till 2100, a "Mountains" future where governments played a major role and coordinated to solve global crises, and an "Oceans" future that was more decentralized and market-driven. (For a critique of Shell's scenario planning, see here). Shell competitor BP also publishes an Energy Outlook that is structured more as a forecast than as a scenario analysis, but does briefly consider alternative assumptions in a fashion similar to scenario analysis.
- Macroeconomic and fiscal analysis: Budget projections made by the Congressional Budget Office (CBO) in the United States consider multiple scenarios along two dimensions: fiscal policy (tax and revenue) and other sources of variation in economic growth. CBO projections are typically made for the scoring and analysis of specific fiscal proposals made by members of Congress. For instance, here are the CBO projections made in light of Congressman Paul Ryan's proposals. Scenario analysis is also a common tool in macroeconomic analysis. For instance, Moody's U.S. Macroeconomic Outlook Alternative Scenarios used scenario analysis to understand what awaits the United States economy. Unlike most scenario analyses, Moody's specified, for each scenario, its estimated numerical probability that reality would turn out better than the scenario. But the scenario analysis didn't hinge on believing the probability estimates.
- Land use and transportation analysis: The Federal Highway Administration in the United States uses scenario planning to understand different possibilities for future patterns of land use and transportation. See the scenario planning section of their website.
- Analysis of the technology sector: In 1986, Monthly Labor Review ran an article titled Computer manufacturing enters a new era of growth with a scenario analysis of different outcomes for the then nascent tech sector and its implications for productivity and employment. The article considered nine scenarios, obtained by combining three scenarios for the level of technological progress with three scenarios for the level of overall economic growth. In 2009, Leva et al. wrote a scenario analysis for the future of the Internet. They described four scenarios: (A) Wild and Free, (B) Isolated Walled Gardens, (C) Content-driven Overlays, (D) Device-Content Bundles.
- The future of medicine: Here is a scenario analysis considering four scenarios for the future of medicine: Sobriety in sufficiency, risk avoidance, technology on demand, and free market unfettered.
Does scenario planning involve forecasting?
There are many ways in which scenario planning overlaps with forecasting:
- The need to choose scenarios that have approximately equal probability, at least up to an order of magnitude, suggests that a scenario planning exercise implicitly includes a probability estimation exercise, even though the probability estimates are very imprecise. Note that the probabilistic school of scenario planning involves explicit probability estimates. Either way, scenario planning does involve making statements about what types of futures are likely and how likely they are.
- The clustering together of different sets of events into scenarios is an implicit conditional forecast. It says that if some events in the scenario occur, the scenario as a whole is likely to occur, and in particular, other events in the scenario are likely to occur. In particular, the identification of early indicators for each scenario is tantamount to giving rules for how to forecast the future by keeping an eye out for the early indicators.
- The scenarios in scenario analysis generally start diverging a little while after the present. The time period from the present to the point of divergence of the scenarios is the time period where we are essentially making a relatively tight forecast. The techniques used to determine what happens till the point of divergence are standard quantitative forecasting techniques.
- Even after the scenarios diverge, quantitative forecasting techniques may be used to fill in the relevant numbers in the various scenarios.
Evaluation of the utility of scenario planning (some random thoughts)
I wrote above: "The utility [of scenario planning] is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly."
What does research say about scenario planning? My impression is that there is very limited research, and it tends to be positive about the effect of scenario planning on long-range development, but there is considerable uncertainty. Bradfield et al. quote Schnaars as saying that there is "a small body of research based on empirical studies of related topics, which ‘offer some evidence as to the value of scenarios’ as a long range planning tool."
In general, evaluating scenario planning seems difficult, because unlike the case of forecasting, where the accuracy of the forecast is a good first proxy for utility, scenario planning does not lend itself to that sort of evaluation. The examples that Schwartz lists provide some anecdotal evidence. But there is the obvious selection versus treatment issue: maybe the pioneers of scenario planning were selected as the sort of people who could plan better for the future, and the scenario planning exercise itself wasn't useful. The selection versus treatment issue could be resolved by looking at whether business planning has become more efficient on the whole as scenario planning has come to be more widely used (because the continued spread of scenario planning is probably a treatment rather than a selection effect). But isolating scenario planning as a causal factor, when so many other aspects of business strategy are changing, is hard.
Another possible approach: how different would the world be today if people didn't use scenario planning? The IPCC reports on climate change might consider just one "official forecast" with a margin of uncertainty. Energy companies might still have an Official Future (again with an expressed margin of uncertainty, but without a discrete set of alternative futures). Based on my (non-expert) assessment, it seems that the quality of business strategy and policy insight in such a world would be worse than in the current world. But perhaps if scenario planning hadn't been developed, other ways of thinking about the future would have caught on more. The upshot is that I tentatively think scenario planning has been useful, but I don't see a clear way of demonstrating this in a scientifically rigorous manner, given the nature of the beast.