Comment author: Shri 25 July 2014 04:01:26AM *  3 points [-]

You may be interested in this white paper by a Google engineer who used a neural network to predict power consumption for their data centers with 99.6% accuracy.

http://googleblog.blogspot.com/2014/05/better-data-centers-through-machine.html

Looking at the internals of the model, he was able to determine how sensitive power consumption was to various factors. Three examples were given of how the new model let them optimize power consumption. I'm a total newbie to ML, but this is one of the only examples I've seen of: predictive model -> optimization.

Here's another example you might like, from Kaggle's cause-effect pairs challenge. The winning model was able to accurately classify whether A->B or B->A with an AUC of >0.8, which is better than some medical tests. Writeups and code were provided by the top three Kagglers.

http://clopinet.com/isabelle/Projects/NIPS2013/

Comment author: VipulNaik 25 July 2014 06:49:15AM 0 points [-]

Thanks, both of these look interesting. I'm reading the Google paper right now.

Claim: Scenario planning is preferable to quantitative forecasting for understanding and coping with AI progress

1 VipulNaik 25 July 2014 03:43AM

As part of my work for MIRI on forecasting, I'm considering the implications of what I've read for the case of thinking about AI. My purpose isn't to come to concrete conclusions about AI progress, but rather to provide insight into which approaches are more promising and which are less promising for thinking about AI progress.

I've written a post on general-purpose forecasting and another post on scenario analysis. In a recent post, I considered scenario analyses for technological progress. I've also looked at many domains of forecasting and at forecasting rare events. With the knowledge I've accumulated, I've shifted in the direction of viewing scenario analysis as a more promising tool than timeline-driven quantitative forecasting for understanding AI and its implications.

I'll first summarize what I mean by scenario analysis and quantitative forecasting in the AI context. People who have some prior knowledge of the terms can probably skim through the summary quickly. Those who find the summary insufficiently informative, or want to delve deeper, are urged to read my more detailed posts linked above and the references therein.

Quantitative forecasting and scenario analysis in the AI context

The two approaches I am comparing are:

  • Quantitative forecasting: Here, specific predictions or forecasts are made, recorded, and later tested against what actually transpired. The forecasts are made in a form where it's easy to score whether they happened. Probabilistic forecasts are also included. These are scored using one of the standard methods to score probabilistic forecasts (such as logarithmic scoring or quadratic scoring).
  • Scenario analysis: A number of scenarios of how the future might unfold are generated in considerable detail. Predetermined elements, common to all the scenarios, are combined with critical uncertainties that vary between the scenarios. Early indicators that help determine which scenario will transpire are identified. In many cases, the goal is to choose strategies that are robust across all scenarios. For more, read my post on scenario analysis.
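
For concreteness, here is a minimal sketch (in Python; the function names are mine, for illustration) of the two standard rules mentioned above for scoring probabilistic forecasts, the quadratic (Brier) score and the logarithmic score:

```python
import math

def brier_score(prob, outcome):
    # Quadratic (Brier) score for a binary forecast: the squared error
    # between the forecast probability and the 0/1 outcome.
    # Lower is better; a perfect forecast scores 0, the worst scores 1.
    return (prob - outcome) ** 2

def log_score(prob, outcome):
    # Logarithmic score: negative log-probability assigned to the outcome
    # that actually happened. Lower is better; a confident wrong forecast
    # (prob near 0 when the outcome is 1) is penalized unboundedly.
    p = prob if outcome == 1 else 1 - prob
    return -math.log(p)

# A well-calibrated, confident forecast beats a hedged one under both rules.
print(brier_score(0.9, 1), brier_score(0.5, 1))
print(log_score(0.9, 1), log_score(0.5, 1))
```

Both rules are "proper": a forecaster minimizes their expected score by reporting their true probability, which is why these are the standard choices for scoring recorded forecasts.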

Quantitative forecasts are easier to score for accuracy, and in particular offer greater scope for falsification. This has perhaps attracted rationalists more to quantitative forecasting, as a way of distinguishing themselves from what appears to be the more wishy-washy realm of unfalsifiable scenario analysis. In this post, I argue that, given the considerable uncertainty surrounding progress in artificial intelligence, scenario analysis is a more apt tool.

There are probably some people on LessWrong who have high confidence in quantitative forecasts. I'm happy to make bets (financial or purely honorary) on such subjects. However, if you're claiming high certainty while I am claiming uncertainty, I do want to have odds in my favor (depending on how much confidence you express in your opinion), for reasons similar to those that Bryan Caplan described here.

Below, I describe my reasons for preferring scenario analysis to forecasting.

#1: Considerable uncertainty

Proponents of the view that AI is scheduled to arrive in a few decades typically cite computing advances such as Moore's law. However, there's considerable uncertainty even surrounding short-term computing advances, as I described in my scenario analyses for technological progress. When it comes to the question of progress in AI, we have to combine uncertainties in hardware progress with uncertainties in software progress.

Quantitative forecasting methods, such as trend extrapolation, tend to do reasonably well, and might be better than nothing. But they are not foolproof. In particular, the impending death of Moore's law, despite the trend staying quite robust for about 50 years, should make us cautious about too naive an extrapolation of trends. Arguably, simple trend extrapolation is still the best choice relative to other forecasting methods, at least as a general rule. But acknowledging uncertainty and considering multiple scenarios could prepare us a lot better for reality.

In a post in May 2013 titled When Will AI Be Created?, MIRI director Luke Muehlhauser (who later assigned me the forecasting project) looked at the wide range of beliefs about the time horizon for the arrival of human-level AI. Here's how Luke described the situation:

To explore these difficulties, let’s start with a 2009 bloggingheads.tv conversation between MIRI researcher Eliezer Yudkowsky and MIT computer scientist Scott Aaronson, author of the excellent Quantum Computing Since Democritus. Early in that dialogue, Yudkowsky asked:

It seems pretty obvious to me that at some point in [one to ten decades] we’re going to build an AI smart enough to improve itself, and [it will] “foom” upward in intelligence, and by the time it exhausts available avenues for improvement it will be a “superintelligence” [relative] to us. Do you feel this is obvious?

Aaronson replied:

The idea that we could build computers that are smarter than us… and that those computers could build still smarter computers… until we reach the physical limits of what kind of intelligence is possible… that we could build things that are to us as we are to ants — all of this is compatible with the laws of physics… and I can’t find a reason of principle that it couldn’t eventually come to pass…

The main thing we disagree about is the time scale… a few thousand years [before AI] seems more reasonable to me.

Those two estimates — several decades vs. “a few thousand years” — have wildly different policy implications.

After more discussion of AI forecasts as well as some general findings on forecasting, Luke continues:

Given these considerations, I think the most appropriate stance on the question “When will AI be created?” is something like this:

We can’t be confident AI will come in the next 30 years, and we can’t be confident it’ll take more than 100 years, and anyone who is confident of either claim is pretending to know too much.

How confident is “confident”? Let’s say 70%. That is, I think it is unreasonable to be 70% confident that AI is fewer than 30 years away, and I also think it’s unreasonable to be 70% confident that AI is more than 100 years away.

This statement admits my inability to predict AI, but it also constrains my probability distribution over “years of AI creation” quite a lot.

I think the considerations above justify these constraints on my probability distribution, but I haven’t spelled out my reasoning in great detail. That would require more analysis than I can present here. But I hope I’ve at least summarized the basic considerations on this topic, and those with different probability distributions than mine can now build on my work here to try to justify them.

I believe that in the face of this considerable uncertainty, considering multiple scenarios, and the implications of each scenario, can be quite helpful.

#2: Isn't scenario analysis unfalsifiable, and therefore unscientific? Why not aim for rigorous quantitative forecasting instead, that can be judged against reality?

First off, just because a forecast is quantitative doesn't mean it is actually rigorous. I think it's worthwhile to elicit and record quantitative forecasts. These can have high value for near-term horizons, and can provide a rough idea of the range of opinion for longer timescales.

However, simply phoning up experts to ask them for their timelines, or sending them an Internet survey, is not too useful. Tetlock's work, described in Muehlhauser's post and in my post on historical evaluations of forecasting, shows that unaided expert judgment has little value. Asking people who haven't thought through the issue to come up with numbers can give a false sense of precision with little accuracy (and little genuine precision, either, if we consider the diverse range of responses from different experts). On the other hand, eliciting detailed scenarios from experts can force them to think more clearly about the issues and the relationships between them. Note that there are dangers to eliciting detailed scenarios: people may fall into their own make-believe world. But I think the trade-off with the uncertainty in quantitative forecasting still points in favor of scenario analysis.

Explicit quantitative forecasts can be helpful when people have an opportunity to learn from wrong forecasts and adjust their methodology accordingly. Therefore, I argue that if we want to go down the quantitative forecasting route, it's important to record forecasts about the near and medium future instead of or in addition to forecasts about the far future. Also, providing experts some historical information and feedback at the time they make their forecasts can help reduce the chances of them simply saying things without reflecting. Depending on the costs of recording forecasts, it may be worthwhile to do so anyway, even if we don't have high hopes that the forecasts will yield value. Broadly, I agree with Luke's suggestions:

  • Explicit quantification: “The best way to become a better-calibrated appraiser of long-term futures is to get in the habit of making quantitative probability estimates that can be objectively scored for accuracy over long stretches of time. Explicit quantification enables explicit accuracy feedback, which enables learning.”
  • Signposting the future: Thinking through specific scenarios can be useful if those scenarios “come with clear diagnostic signposts that policymakers can use to gauge whether they are moving toward or away from one scenario or another… Falsifiable hypotheses bring high-flying scenario abstractions back to Earth.”
  • Leveraging aggregation: “the average forecast is often more accurate than the vast majority of the individual forecasts that went into computing the average…. [Forecasters] should also get into the habit that some of the better forecasters in [an IARPA forecasting tournament called ACE] have gotten into: comparing their predictions to group averages, weighted-averaging algorithms, prediction markets, and financial markets.” See Ungar et al. (2012) for some aggregation-leveraging results from the ACE tournament.
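
The aggregation point can be illustrated with a small simulation (a toy sketch of my own, not a model of the ACE tournament): a pool of biased forecasters each predicts a set of binary events, and the simple average of their forecasts is scored against the individuals using the Brier rule.

```python
import random

random.seed(0)

def brier(prob, outcome):
    # Quadratic (Brier) score: lower is better.
    return (prob - outcome) ** 2

# Simulate 20 noisy forecasters predicting 100 binary events whose true
# probability is 0.7. Each forecaster has an idiosyncratic bias plus noise.
n_forecasters, n_events = 20, 100
true_p = 0.7
outcomes = [1 if random.random() < true_p else 0 for _ in range(n_events)]
biases = [random.uniform(-0.25, 0.25) for _ in range(n_forecasters)]
forecasts = [[min(0.99, max(0.01, true_p + b + random.gauss(0, 0.05)))
              for _ in range(n_events)] for b in biases]

# Score each individual, then score the simple average of all forecasts.
individual_scores = [sum(brier(f[i], outcomes[i]) for i in range(n_events)) / n_events
                     for f in forecasts]
avg_forecast = [sum(f[i] for f in forecasts) / n_forecasters
                for i in range(n_events)]
avg_score = sum(brier(avg_forecast[i], outcomes[i])
                for i in range(n_events)) / n_events

beaten = sum(1 for s in individual_scores if s > avg_score)
print(f"average forecast beats {beaten} of {n_forecasters} individuals")
```

Because squared error is convex, the averaged forecast's Brier score can never exceed the mean of the individual scores (Jensen's inequality), and in practice it beats most of the individuals, since their idiosyncratic biases partly cancel.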

But I argue that the bulk of the effort should go into scenario generation and scenario analysis. Even here, the problem of absence of feedback is acute: we can design scenarios all we want for what will happen over the next century, but we can't afford to wait a century to know if our scenarios transpired. Therefore, it makes sense to break the scenario analysis exercises into chunks of 10-15 years. For instance, one scenario analysis could consider scenarios for the next 10-15 years. For each of the scenarios, we can have a separate scenario analysis exercise that considers scenarios for the 10-15 years after that. And so on. Note that the number of scenarios increases exponentially with the time horizon, but this is simply a reflection of the underlying complexity and uncertainty. In some cases, scenarios could "merge" at later times, as scenarios with slow early progress and fast later progress yield the same end result that scenarios with fast early progress and slow later progress do.

#3: Evidence from other disciplines

Explicit quantitative forecasting is common in many disciplines, but the more we look at longer time horizons, and the more uncertainty we are dealing with, the more common scenario analysis becomes. I considered many examples of scenario analysis in my scenario analysis post. As you'll see from the list there, scenario analysis, and variants of it, have become influential in areas ranging from climate change (as seen in IPCC reports) to energy to macroeconomic and fiscal analysis to land use and transportation analysis. And big consulting companies such as McKinsey & Company use scenario analysis frequently in their reports.

It's of course possible to argue that the use of scenario analyses is a reflection of human failing: people don't want to make single forecasts because they are afraid of being proven wrong, or of contradicting other people's beliefs about the future. Or maybe people are shy of thinking quantitatively. I think there is some truth to such a critique. But until we have human-level AI, we have to rely on the failure-prone humans for input on the question of AI progress. Perhaps scenario analysis is superior to quantitative forecasting because humans are insufficiently rational, but to the extent it's superior, it's superior.

Addendum: What are the already existing scenario analyses for artificial intelligence?

I had a brief discussion with Luke Muehlhauser and some of the names below were suggested by him, but I didn't run the final list by him. All responsibility for errors is mine.

To my knowledge (and to the knowledge of people I've talked to) there are no formal scenario analyses of Artificial General Intelligence structured in a manner similar to the standard examples of scenario analyses. However, if scenario analysis is construed sufficiently loosely as a discussion of various predetermined elements and critical uncertainties and a brief mention of different possible scenarios, then we can list a few scenario analyses:

  • Nick Bostrom's book Superintelligence (released in the UK and on Kindle, but not released as a print book in the US at the time of this writing) discusses several scenarios for paths to AGI.
  • Eliezer Yudkowsky's report on Intelligence Explosion Microeconomics (93 pages, direct PDF link) can be construed as an analysis of AI scenarios.
  • Robin Hanson's forthcoming book on em economics discusses one future scenario that is somewhat related to AI progress.
  • The Hanson-Yudkowsky AI Foom debate includes a discussion of many scenarios.

The above are scenario analyses for the eventual properties and behavior of an artificial general intelligence, rather than scenario analyses for the immediate future. The work of Ray Kurzweil can be thought of as a scenario analysis that lays out an explicit timeline from now to the arrival of AGI.

[QUESTION]: Looking for insights from machine learning that helped improve state-of-the-art human thinking

3 VipulNaik 25 July 2014 02:10AM

This question is a follow-up of sorts to my earlier question on academic social science and machine learning.

Machine learning algorithms are used for a wide range of prediction tasks, including binary (yes/no) prediction and prediction of continuous variables. For binary prediction, common models include logistic regression, support vector machines, neural networks, and decision trees and forests.

Now, I do know that methods such as linear and logistic regression, and other regression-type techniques, are used extensively in science and social science research. Some of this research looks at the coefficients of such a model and then re-interprets them.

I'm interested in examples where knowledge of the insides of other machine learning techniques (i.e., knowledge of the parameters for which the models perform well) has helped provide insights that are of direct human value, or perhaps even directly improved unaided human ability. In my earlier post, I linked to an example (courtesy of Sebastian Kwiatkowski) where the results of naive Bayes and SVM classifiers for hotel reviews could be translated into human-understandable terms (namely, reviews that mentioned physical aspects of the hotel, such as "small bedroom", were more likely to be truthful than reviews that talked about the reasons for the visit or the company that sponsored the visit).

PS: Here's a very quick description of how these supervised learning algorithms work. We first postulate a functional form that describes how the output depends on the input. For instance, the functional form in the case of logistic regression outputs the probability as the logistic function applied to a linear combination of the inputs (features). The functional form has a number of unknown parameters. Specific values of the parameters give specific functions that can be used to make predictions. Our goal is to find the parameter values.

We use a huge amount of labeled training data, plus a cost function (which itself typically arises from a statistical model for the nature of the error distribution) to find the parameter values. In the crudest form, this is purely a multivariable calculus optimization problem: choose parameters so that the total error function between the predicted function values and the observed function values is as small as possible. There are a few complications that need to be addressed to get to working algorithms.
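
The optimization view described above can be made concrete with a minimal, stdlib-only sketch of logistic regression fit by gradient descent on the log-loss (the function names and toy data here are mine, purely for illustration):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=3000):
    """Fit logistic regression by gradient descent on the log-loss.

    xs: list of feature vectors; ys: list of 0/1 labels.
    Returns (weights, bias)."""
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # For the log-loss, the gradient with respect to the linear
            # combination is simply (predicted probability - label).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

# Toy data: the label is 1 exactly when the single feature exceeds 0.5.
xs = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(xs, ys)
print(sigmoid(w[0] * 0.9 + b) > 0.5)  # classified as positive
print(sigmoid(w[0] * 0.1 + b) < 0.5)  # classified as negative
```

The "complications" alluded to in the post (regularization, feature scaling, better optimizers, stopping criteria) are all refinements of this basic loop.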

So what makes machine learning problems hard? There are a few choice points:

  1. Feature selection: Figuring out the inputs (features) to use in predicting the outputs.
  2. Selection of the functional form (model).
  3. Selection of the cost function (error function).
  4. Selection of the algorithmic approach used to optimize the cost function, addressing the issue of overfitting through appropriate methods such as regularization and early stopping.

Of these steps, (1) is really the only step that is somewhat customized by domain, but even here, when we have enough data, it's more common to just throw in lots of features and see which ones actually help with prediction (in a regression model, the features that have predictive power will have nonzero coefficients in front of them, and removing them will increase the overall error of the model). (2) and (3) are mostly standardized, with our choice really being between a small number of differently flavored models (logistic regression, neural networks, etc.). (4) is the part where much of the machine learning research is concentrated: figuring out newer and better algorithms to find (approximate) solutions to the optimization problems for particular mathematical structures of the data.
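
The point about throwing in lots of features and seeing which ones help can be illustrated with a toy sketch (my own illustrative example): a regression with one informative feature and one pure-noise feature, fit by gradient descent, where the noise feature ends up with a near-zero coefficient.

```python
import random

random.seed(1)

# y depends on x1 only; x2 is pure noise. With enough data, the least
# squares fit drives the coefficient on x2 toward zero -- the "throw in
# features and see which ones actually help" heuristic in action.
n = 2000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [3.0 * a + random.gauss(0, 0.1) for a in x1]

w1 = w2 = 0.0
lr = 0.1
for _ in range(500):
    # Gradient of the mean squared error with respect to each weight.
    g1 = sum((w1 * a + w2 * b - t) * a for a, b, t in zip(x1, x2, y)) / n
    g2 = sum((w1 * a + w2 * b - t) * b for a, b, t in zip(x1, x2, y)) / n
    w1 -= lr * g1
    w2 -= lr * g2

print(round(w1, 2))  # close to the true coefficient, 3.0
print(round(w2, 2))  # close to 0.0
```

In practice one would also use regularization (e.g., L1), which actively pushes uninformative coefficients to exactly zero rather than merely near it.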

 

[QUESTION]: Academic social science and machine learning

11 VipulNaik 19 July 2014 03:13PM

I asked this question on Facebook here, and got some interesting answers, but I thought it would be interesting to ask LessWrong and get a larger range of opinions. I've modified the list of options somewhat.

What explains why some classification, prediction, and regression methods are common in academic social science, while others are common in machine learning and data science?

For instance, I've encountered probit models in some academic social science, but not in machine learning.

Similarly, I've encountered support vector machines, artificial neural networks, and random forests in machine learning, but not in academic social science.

The main algorithms that I believe are common to academic social science and machine learning are the most standard regression algorithms: linear regression and logistic regression.

Possibilities that come to mind:

(0) My observation is wrong and/or the whole question is misguided.

(1) The focus in machine learning is on algorithms that can perform well on large data sets. Thus, for instance, probit models may be academically useful but don't scale up as well as logistic regression.

(2) Academic social scientists take time to catch up with new machine learning approaches. Of the methods mentioned above, random forests and support vector machines were introduced as recently as 1995. Neural networks are older, but their practical implementation is about as recent. Moreover, the practical implementation of these algorithms in the standard statistical software and packages that academics rely on is even more recent. (This relates to point (4).)

(3) Academic social scientists are focused on publishing papers, where the goal is generally to determine whether a hypothesis is true. Therefore, they rely on approaches that have clear rules for hypothesis testing and for establishing statistical significance (see also this post of mine). Many of the new machine learning approaches don't have clearly defined statistical approaches for significance testing. Also, the strength of machine learning approaches is more exploratory than testing already formulated hypotheses (this relates to point (5)).

(4) Some of the new methods are complicated to code, and academic social scientists don't know enough mathematics, computer science, or statistics to cope with the methods (this may change if they're taught more about these methods in graduate school, but the relative newness of the methods is a factor here, relating to (2)).

(5) It's hard to interpret the results of fancy machine learning tools in a manner that yields social scientific insight. The results of a linear or logistic regression can be interpreted somewhat intuitively: the parameters (coefficients) associated with individual features describe the extent to which those features affect the output variable. Modulo issues of feature scaling, larger coefficients mean those features play a bigger role in determining the output. Pairwise and listwise R^2 values provide additional insight on how much signal and noise there is in individual features. But if you're looking at a neural network, it's quite hard to infer human-understandable rules from that. (The opposite direction is not too hard: it is possible to convert human-understandable rules to a decision tree and then to use a neural network to approximate that, and add appropriate fuzziness. But the neural networks we obtain as a result of machine learning optimization may be quite different from those that we can interpret as humans). To my knowledge, there haven't been attempts to reinterpret neural network results in human-understandable terms, though Sebastian Kwiatkowski's comment on my Facebook post points to an example where the results of naive Bayes and SVM classifiers for hotel reviews could be translated into human-understandable terms (namely, reviews that mentioned physical aspects of the hotel, such as "small bedroom", were more likely to be truthful than reviews that talked about the reasons for the visit or the company that sponsored the visit). But Kwiatkowski's comment also pointed to other instances where the machine's algorithms weren't human-interpretable.

What's your personal view on my main question, and on any related issues?

Comment author: John_Maxwell_IV 16 July 2014 03:08:22AM *  2 points [-]

Since the complexity of many machine learning algorithms grows at least linearly (and in some cases quadratically or cubically) in the data, and the quantity of data itself will probably grow superlinearly, we do expect a robust increase in demand for computing.

Algorithms to find the parameters for a classifier/regression, or algorithms to make use of it? And if I've got a large dataset that I'm training a classifier/regression on, what's to stop me from taking a relatively small sample of the data to train my model on? (The one time I used machine learning in a professional capacity, this is what I did. FYI, I should not be considered an expert on machine learning.)

(On the other hand, if you're training a classifier/regression for every datum, say every book on Amazon, and the number of books on Amazon is growing superlinearly, then yes I think you would get a robust increase.)

Comment author: VipulNaik 16 July 2014 05:37:31AM *  1 point [-]

Good question.

I'm not an expert in machine learning either, but here is what I meant.

If you're running an algorithm such as linear or logistic regression, there are two relevant dimensions: the number of data points, and the number of features (i.e., the number of parameters). For the design matrix of the regression, the number of data points is the number of rows and the number of features/parameters is the number of columns.

Holding the number of parameters constant, it's true that if you increase the number of data points beyond a certain amount, you can get most of the value through subsampling. And even if not, more data points is not such a big issue.

But the main advantage of having more data is lost if you still use the same (small) number of features. Generally, when you have more data, you'd try to use that additional data to use a model with more features. The number of features would still be less than the number of data points. I'd say that in many cases it's about 1% of the number of data points.

Of course, you could still use the model with the smaller number of features. In that case, you're just not putting the new data to much good use. Which is fine, but not an effective use of the enlarged data set. (There may be cases where even with more data, adding more features is no use, because the model has already reached the limits of its predictive power).

For linear regression, the algorithm to solve it exactly (using normal equations) takes time that is cubic in the number of parameters (if you use the naive inverse). Although matrix inversion can in principle be done faster than cubic, it can't be faster than quadratic, which is a general lower bound. Other iterative algorithms aren't quite cubic, but they're still more than linear.

Comment author: Nornagest 15 July 2014 10:58:26PM *  1 point [-]

To be blunt, I don't believe Dally. A while back, in the context of technological stagnation, I compared a 2012 Ford Focus to a 1970 Ford Maverick -- both popular midrange compact cars for their time -- and found that the Focus beat the pants off the Maverick on every metric but price (it cost about twice what the Maverick did, adjusted for inflation). Roughly twice the engine power with 1.5 to 2x the gas mileage; more interior room; far safer and more reliable; vastly better amenities.

It's not scaling as fast as Moore's Law by any means, but progress is happening. That might be tempered a bit by the price point, but reliability alone would be a strong counter to that once you amortize over the lifetime of the car.

Comment author: VipulNaik 16 July 2014 01:06:01AM 1 point [-]

My scenario #1 explicitly says that even in the face of a slowdown, we'll see doubling times of 10-25 years: "If the doubling time reverts to the norm seen in other cutting-edge industrial sectors, namely 10-25 years, then we'd probably see the introduction of revolutionary new product categories only about once a generation."

So I'm not predicting complete stagnation, just a slowdown where computing power gains aren't happening fast enough for us to see new products every few years.

Comment author: jimrandomh 14 July 2014 06:07:34PM 4 points [-]

I think your predictions about where Moore's Law will stop are wildly pessimistic. You quote EETimes saying that "28nm is actually the last node of Moore's Law", but Intel is already shipping processors at 22nm! Meanwhile on an axis entirely orthogonal to transistor size and count, there's a new architecture in the pipeline (Mill) which credibly claims an order of magnitude improvement in perf/power and 2x in single-threaded speed. Based on technical details which I can't really get into, I think there's another 2x to be had after that.

Comment author: VipulNaik 15 July 2014 12:29:46AM 4 points [-]

I think continued progress of Moore's law is quite plausible, and that was one of the scenarios I considered (Scenario #2). That said, it's interesting that you express high confidence in this scenario relative to the other scenarios, despite the considerable skepticism of computer scientists, engineers, and the McKinsey report.

Would you like to make a bet for a specific claim about the technological progress we'll see? We could do it with actual money if you like, or just an honorary bet. Since you're claiming more confidence than I am, I'd like the odds in my favor, at somewhere between 2:1 and 4:1 (details depend on the exact proposed bet).

My suggestion to bet (that you can feel free to ignore) isn't intended to be confrontational. cf.

http://econlog.econlib.org/archives/2012/05/the_bettors_oat.html

How deferential should we be to the forecasts of subject matter experts?

12 VipulNaik 14 July 2014 11:41PM

This post explores the question: how strongly should we defer to predictions and forecasts made by people with domain expertise? I'll assume that the domain expertise is legitimate, i.e., the people with domain expertise do have a lot of information in their minds that non-experts don't. The information is usually not secret, and non-experts can usually access it through books, journals, and the Internet. But experts have more information inside their head, and may understand it better. How big an advantage does this give them in forecasting?

Tetlock and expert political judgment

In an earlier post on historical evaluations of forecasting, I discussed Philip E. Tetlock's findings on expert political judgment and forecasting skill, and summarized his own article for Cato Unbound co-authored with Dan Gardner that in turn summarized the themes of the book:

  1. The average expert’s forecasts were revealed to be only slightly more accurate than random guessing—or, to put it more harshly, only a bit better than the proverbial dart-throwing chimpanzee. And the average expert performed slightly worse than a still more mindless competition: simple extrapolation algorithms that automatically predicted more of the same.
  2. The experts could be divided roughly into two overlapping yet statistically distinguishable groups. One group (the hedgehogs) would actually have been beaten rather soundly even by the chimp, not to mention the more formidable extrapolation algorithm. The other (the foxes) would have beaten the chimp and sometimes even the extrapolation algorithm, although not by a wide margin.
  3. The hedgehogs tended to use one analytical tool in many different domains; they preferred keeping their analysis simple and elegant by minimizing “distractions.” These experts zeroed in on only essential information, and they were unusually confident—they were far more likely to say something is “certain” or “impossible.” In explaining their forecasts, they often built up a lot of intellectual momentum in favor of their preferred conclusions. For instance, they were more likely to say “moreover” than “however.”
  4. The foxes used a wide assortment of analytical tools, sought out information from diverse sources, were comfortable with complexity and uncertainty, and were much less sure of themselves—they tended to talk in terms of possibilities and probabilities and were often happy to say “maybe.” In explaining their forecasts, they frequently shifted intellectual gears, sprinkling their speech with transition markers such as “although,” “but,” and “however.”
  5. It's unclear whether the performance of the best forecasters is the best that is in principle possible.
  6. This widespread lack of curiosity—lack of interest in thinking about how we think about possible futures—is a phenomenon worthy of investigation in its own right.

Tetlock has since started The Good Judgment Project (website, Wikipedia), a political forecasting competition that anybody can participate in, and one with a reputation for doing a much better job at prediction than anything else around. Participants are given a set of questions and can basically collect freely available online information (in some rounds, participants were given additional access to some proprietary data). They then use that to make predictions. The aggregate predictions are quite good. For more information, visit the website or see the references in the Wikipedia article. In particular, this Economist article and this Business Insider article are worth reading. (I discussed the GJP and other approaches to global political forecasting in this post).

So at least in the case of politics, it seems that amateurs, armed with basic information plus the freedom to look around for more, can use "fox-like" approaches and do a better job of forecasting than political scientists. Note that experts still do better than ignorant non-experts who are denied access to information. But once you have basic knowledge and are equipped to hunt more down, the constraining factor does not seem to be expertise, but rather, the approach you use (fox-like versus hedgehog-like). This should not be taken as a claim that expertise is irrelevant or unnecessary to forecasting. Experts play an important role in expanding the scope of knowledge and methodology that people can draw on to make their predictions. But the experts themselves, as people, do not have a unique advantage when it comes to forecasting.

Tetlock's research focused on politics. But the claim that the fox-hedgehog distinction turns out to be a better predictor of forecasting performance than the level of expertise is a general one. How true is this claim in domains other than politics? Domains such as climate science, economic growth, computing technology, or the arrival of artificial general intelligence?

Armstrong and Green again

J. Scott Armstrong is a leading figure in the forecasting community. Along with Kesten C. Green, he penned a critique of the forecasting exercises in climate science in 2007, with special focus on the IPCC reports. I discussed the critique at length in my post on the insularity critique of climate science. Here, I quote a part from the introduction of the critique that better explains the general prior that Armstrong and Green claim to be bringing to the table when they begin their evaluation. Of the points they make at the beginning, two bear directly on the deference we should give to expert judgment and expert consensus:

  • Unaided judgmental forecasts by experts have no value: This applies whether the opinions are expressed in words, spreadsheets, or mathematical models. It applies regardless of how much scientific evidence is possessed by the experts. Among the reasons for this are:
    a) Complexity: People cannot assess complex relationships through unaided observations.
    b) Coincidence: People confuse correlation with causation.
    c) Feedback: People making judgmental predictions typically do not receive unambiguous feedback they can use to improve their forecasting.
    d) Bias: People have difficulty in obtaining or using evidence that contradicts their initial beliefs. This problem is especially serious for people who view themselves as experts.
  • Agreement among experts is only weakly related to accuracy: This is especially true when the experts communicate with one another and when they work together to solve problems, as is the case with the IPCC process.

Armstrong and Green later elaborate on these claims, referencing Tetlock's work. (Note that I have removed the parts of the section that involve direct discussion of climate-related forecasts, since the focus here is on the general question of how much deference to show to expert consensus).

Many public policy decisions are based on forecasts by experts. Research on persuasion has shown that people have substantial faith in the value of such forecasts. Faith increases when experts agree with one another. Our concern here is with what we refer to as unaided expert judgments. In such cases, experts may have access to empirical studies and other information, but they use their knowledge to make predictions without the aid of well-established forecasting principles. Thus, they could simply use the information to come up with judgmental forecasts. Alternatively, they could translate their beliefs into mathematical statements (or models) and use those to make forecasts.

Although they may seem convincing at the time, expert forecasts can make for humorous reading in retrospect. Cerf and Navasky’s (1998) book contains 310 pages of examples, such as Fermi Award-winning scientist John von Neumann’s 1956 prediction that “A few decades hence, energy may be free”. [...] The second author’s review of empirical research on this problem led him to develop the “Seer-sucker theory,” which can be stated as “No matter how much evidence exists that seers do not exist, seers will find suckers” (Armstrong 1980). The amount of expertise does not matter beyond a basic minimum level. There are exceptions to the Seer-sucker Theory: When experts get substantial well-summarized feedback about the accuracy of their forecasts and about the reasons why their forecasts were or were not accurate, they can improve their forecasting. This situation applies for short-term (up to five day) weather forecasts, but we are not aware of any such regime for long-term global climate forecasting. Even if there were such a regime, the feedback would trickle in over many years before it became useful for improving forecasting.

Research since 1980 has provided much more evidence that expert forecasts are of no value. In particular, Tetlock (2005) recruited 284 people whose professions included, “commenting or offering advice on political and economic trends.” He asked them to forecast the probability that various situations would or would not occur, picking areas (geographic and substantive) within and outside their areas of expertise. By 2003, he had accumulated over 82,000 forecasts. The experts barely if at all outperformed non-experts and neither group did well against simple rules. Comparative empirical studies have routinely concluded that judgmental forecasting by experts is the least accurate of the methods available to make forecasts. For example, Ascher (1978, p. 200), in his analysis of long-term forecasts of electricity consumption found that was the case.

Note that the claims that Armstrong and Green make are in relation to unaided expert judgment, i.e., expert judgment that is not aided by some form of assistance or feedback that promotes improved forecasting. (One can argue that expert judgment in climate science is not unaided, i.e., that the critique is mis-applied to climate science, but whether that is the case is not the focus of my post). While Tetlock's suggestion is to be more fox-like, Armstrong and Green recommend the use of their own forecasting principles, as encoded in their full list of principles and described on their website.

A conflict of intuitions, and an attempt to resolve it

I have two conflicting intuitions here. I like to use the majority view among experts as a reasonable Bayesian prior to start with, that I might then modify based on further study. The relevant question here is who the experts are. Do I defer to the views of domain experts, who may know little about the challenges of forecasting, or do I defer to the views of forecasting experts, who may know little of the domain but argue that domain experts who are not following good forecasting principles do not have any advantage over non-experts?

I think the following heuristics are reasonable starting points:

  • In cases where we have a historical track record of forecasts, we can use that to evaluate the experts and non-experts. For instance, I reviewed the track record of survey-based macroeconomic forecasts, thanks to a wealth of recorded data on macroeconomic forecasts by economists over the last few decades. (Unfortunately, these surveys did not include corresponding data on layperson opinion).
  • The faster the feedback from making a forecast to knowing whether it's right, the more likely it is that experts would have learned how to make good forecasts.
  • The more central forecasting is to the overall goals of the domain, the more likely people are to get it right. For instance, forecasting is a key part of weather and climate science. But forecasting progress on mathematical problems bears little relation to doing mathematical research.
  • Ceteris paribus, if experts are clearly recording their forecasts and the reasons behind them, and systematically evaluating their performance on past forecasts, that should be taken as (weak) evidence in favor of the experts' views being taken more seriously (even if we don't have enough of a historical track record to properly calibrate forecast accuracy). However, if they simply make forecasts but then fail to review their past history of forecasts, this may be taken as being about as bad as not forecasting at all. And in cases where the forecasts were bold, failed miserably, and yet the errors were not acknowledged, this should be taken as being considerably worse than not forecasting at all.
  • A weak inside view of the nature of domain expertise can give some idea of whether expertise should generally translate to better forecasting skill. For instance, even a very weak understanding of physics will tell us that physicists are no better than anyone else at predicting whether a coin toss will yield heads or tails, even though the fate of the coin is determined by physics. Similarly, with the exception of economists who specialize in the study of macroeconomic indicators, one wouldn't expect economists to be able to forecast macroeconomic indicators better than most moderately economically informed people.

Politicization?

My first thought was that the more politicized a field, the less reliable any forecasts coming out of it. I think there are obvious reasons for that view, but there are also countervailing considerations.

The main claimed danger of politicization is groupthink and lack of openness to evidence. It could even lead to suppression, misrepresentation, or fabrication of evidence. Quite often, however, we see these qualities in highly non-political fields. People believe that certain answers are the right ones. Their political identity or ego is not attached to it. They just have high confidence that that answer is correct, and when the evidence they have does not match up, they think there is a problem with the evidence.

Of course, if somebody does start challenging the mainstream view, and the issue is not quickly resolved either way, it can become politicized, with competing camps of people who hold the mainstream view and people who side with the challengers. Note, however, that the politicization has arguably reduced the aggregate amount of groupthink in the field. Now that there are two competing camps rather than one received wisdom, new people can examine evidence and better decide which camp is more on the side of truth. People in both camps, now that they are competing, may try to offer better evidence that could convince the undecideds or skeptics. So "politicization" might well improve the epistemic situation (I don't doubt that the opposite happens quite often).

Examples of such politicization might be the replacement of geocentrism by heliocentrism, the replacement of creationism by evolution, and the replacement of Newtonian mechanics by relativity and/or quantum mechanics. In the first two cases, religious authorities pushed against the new idea, even though the old idea had not been a "politicized" tenet before the competing claims came along. In the case of Newtonian and quantum mechanics, the debate seems to have been largely intra-science, but quantum mechanics had its detractors, including Einstein, famous for the "God does not play dice" quip. (This post on Slate Star Codex is somewhat related).

The above considerations aren't specific to forecasting, and they apply even for assertions that fall squarely within the domain of expertise and require no forecasting skill per se. The extent to which they apply to forecasting problems is unclear. It's unclear whether most domains have any significant groupthink in favor of particular forecasts. In fact, in most domains, forecasts aren't really made or publicly recorded at all. So concerns of groupthink in a non-politicized scenario may not apply to forecasting. Perhaps the problem is the opposite: forecasts are so unimportant in many domains that the forecasts offered by experts are almost completely random and hardly informed in a systematic way by their expert knowledge. Even in such situations, politicization can be helpful, in so far as it makes the issue more salient and might prompt individuals to give more attention to trying to figure out which side is right.

The case of forecasting AI progress

I'm still looking at the case of forecasting AI progress, but for now, I'd like to point people to Luke Muehlhauser's excellent blog post from May 2013 discussing the difficulty with forecasting AI progress. Interestingly, he makes many points similar to those I make here. (Note: Although I had read the post around the time it was published, I hadn't read it recently until I finished drafting the rest of my current post. Nonetheless, my views can't be considered totally independent of Luke's because we've discussed my forecasting contract work for MIRI).

Should we expect experts to be good at predicting AI, anyway? As Armstrong & Sotala (2012) point out, decades of research on expert performance2 suggest that predicting the first creation of AI is precisely the kind of task on which we should expect experts to show poor performance — e.g. because feedback is unavailable and the input stimuli are dynamic rather than static. Muehlhauser & Salamon (2013) add, “If you have a gut feeling about when AI will be created, it is probably wrong.”

[...]

On the other hand, Tetlock (2005) points out that, at least in his large longitudinal database of pundit’s predictions about politics, simple trend extrapolation is tough to beat. Consider one example from the field of AI: when David Levy asked 1989 World Computer Chess Championship participants when a chess program would defeat the human World Champion, their estimates tended to be inaccurately pessimistic,8 despite the fact that computer chess had shown regular and predictable progress for two decades by that time. Those who forecasted this event with naive trend extrapolation (e.g. Kurzweil 1990) got almost precisely the correct answer (1997).

Looking for thoughts

I'm particularly interested in thoughts from people on the following fronts:

  1. What are some indicators you use to determine the reliability of forecasts by subject matter experts?
  2. How do you resolve the conflict of intuitions between deferring to the views of domain experts and deferring to the conclusion that forecasters have drawn about the lack of utility of domain experts' forecasts?
  3. In particular, what do you think of the way that "politicization" affects the reliability of forecasts?
  4. Also, how much value do you assign to agreement between experts when judging how much trust to place in expert forecasts?
  5. Comments that elaborate on these questions or this general topic within the context of a specific domain or domains would also be welcome.

Scenario analyses for technological progress for the next decade

10 VipulNaik 14 July 2014 04:31PM

This is a somewhat long and rambling post. Apologies for the length. I hope the topic and content are interesting enough for you to forgive the meandering presentation.

I blogged about the scenario planning method a while back, where I linked to many past examples of scenario planning exercises. In this post, I take a closer look at scenario analysis in the context of understanding the possibilities for the unfolding of technological progress over the next 10-15 years. Here, I will discuss some predetermined elements and critical uncertainties, offer my own scenario analysis, and then discuss scenario analyses by others.

Remember: it is not the purpose of scenario analysis to identify a set of mutually exclusive and collectively exhaustive outcomes. In fact, usually, the real-world outcome has some features from two or more of the scenarios considered, with one scenario dominating somewhat. As I noted in my earlier post:

The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.

The predetermined element: the imminent demise of Moore's law "as we know it"

As Steven Schnaars noted in Megamistakes (discussed here), forecasts of technological progress in most domains have been overoptimistic, but in the domain of computing, they've been largely spot-on, mostly because the raw technology has improved quickly. The main reason has been Moore's law, and a couple other related laws, that have undergirded technological progress. But now, the party is coming to an end! The death of Moore's law (as we know it) is nigh, and there are significant implications for the future of computing.

Moore's law refers to many related claims about technological progress. Some forms of this technological progress have already stalled. Other forms are slated to stall in the near future, barring unexpected breakthroughs. These facts about Moore's law form the backdrop for all our scenario planning.

The critical uncertainty arises in how industry will respond to the prospect of Moore's law death. Will there be a doubling down on continued improvement at the cutting edge? Will the battle focus on cost reductions? Or will we have neither cost reduction nor technological improvement? What sort of pressure will hardware stagnation put on software?

Now, onto a description of the different versions of Moore's law (slightly edited version of information from Wikipedia):

  • Transistors per integrated circuit. The most popular formulation is of the doubling of the number of transistors on integrated circuits every two years.

  • Density at minimum cost per transistor. This is the formulation given in Moore's 1965 paper. It is not just about the density of transistors that can be achieved, but about the density of transistors at which the cost per transistor is the lowest. As more transistors are put on a chip, the cost to make each transistor decreases, but the chance that the chip will not work due to a defect increases. In 1965, Moore examined the density of transistors at which cost is minimized, and observed that, as transistors were made smaller through advances in photolithography, this number would increase at "a rate of roughly a factor of two per year".

  • Dennard scaling. This suggests that power requirements are proportional to area (both voltage and current being proportional to length) for transistors. Combined with Moore's law, performance per watt would grow at roughly the same rate as transistor density, doubling every 1–2 years. According to Dennard scaling, transistor dimensions are scaled by 30% (0.7x) every technology generation, thus reducing their area by 50%. This reduces the delay by 30% (0.7x) and therefore increases operating frequency by about 40% (1.4x). Finally, to keep the electric field constant, voltage is reduced by 30%, reducing energy by 65% and power (at 1.4x frequency) by 50%. Therefore, in every technology generation, transistor density doubles, the circuit becomes 40% faster, and power consumption (with twice the number of transistors) stays the same.
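The arithmetic in the Dennard scaling description can be checked with a short calculation. This is an idealized sketch: the 0.7x linear scale factor is the one quoted above, and dynamic power is modeled as C·V²·f with capacitance scaling linearly with dimension.

```python
# Idealized Dennard scaling per technology generation.
s = 0.7  # linear dimensions scaled by ~30% each generation

area = s ** 2                     # ~0.49: transistor area halves
frequency_gain = 1 / s            # delay shrinks by s, so ~1.43x (about 40% faster)
voltage = s                       # supply voltage reduced by ~30%
# Dynamic power per transistor ~ C * V^2 * f, with capacitance C scaling by s.
power_per_transistor = s * voltage ** 2 * frequency_gain  # ~0.49: halved
# Twice as many transistors now fit in the same area, so chip power stays flat.
total_chip_power = 2 * power_per_transistor               # ~0.98, roughly constant

print(round(area, 2), round(frequency_gain, 2),
      round(power_per_transistor, 2), round(total_chip_power, 2))
```

The last line is the crux of the breakdown: once voltage could no longer be scaled down (due to leakage), total chip power no longer stayed flat, and the free frequency gains stopped.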

So how are each of these faring?

  • Transistors per integrated circuit: At least in principle, this can continue for a decade or so. The technological ideas exist to push transistor sizes from the current values of 32 nm and 28 nm all the way down to 7 nm.
  • Density at minimum cost per transistor. This is probably stopping around now. There is good reason to believe that, barring unexpected breakthroughs, the transistor size for which we have minimum cost per transistor shall not go down below 28 nm. There may still be niche applications that benefit from smaller transistor sizes, but there will be no overwhelming economic case to switch production to smaller transistor sizes (i.e., higher densities).
  • Dennard scaling. This broke down around 2005-2007. So for approximately a decade, we've essentially seen continued miniaturization but without any corresponding improvement in processor speed or performance per watt. There have been continued overall improvements in energy efficiency of computing, but not through this mechanism. The absence of automatic speed improvements has led to increased focus on using greater parallelization (note that the miniaturization means more parallel processors can be packed in the same space, so Moore's law is helping in this other way). In particular, there has been an increased focus on multicore processors, though there may be limits to how far that can take us too.

Moore's law isn't the only law that is slated to end. Other similar laws, such as Kryder's law (about the cost of hard disk space) may also end in the near future. Koomey's law on energy efficiency may also stall, or might continue to hold but through very different mechanisms compared to the ones that have driven it so far.

Some discussions that do not use explicit scenario analysis

The quotes below are to give a general idea of what people seem to generally agree on, before we delve into different scenarios.

EETimes writes:

We have been hearing about the imminent demise of Moore's Law quite a lot recently. Most of these predictions have been targeting the 7nm node and 2020 as the end-point. But we need to recognize that, in fact, 28nm is actually the last node of Moore's Law.

[...]

Summarizing all of these factors, it is clear that -- for most SoCs -- 28nm will be the node for "minimum component costs" for the coming years. As an industry, we are facing a paradigm shift because dimensional scaling is no longer the path for cost scaling. New paths need to be explored such as SOI and monolithic 3D integration. It is therefore fitting that the traditional IEEE conference on SOI has expanded its scope and renamed itself as IEEE S3S: SOI technology, 3D Integration, and Subthreshold Microelectronics.

Computer scientist Moshe Yardi writes:

So the real question is not when precisely Moore's Law will die; one can say it is already a walking dead. The real question is what happens now, when the force that has been driving our field for the past 50 years is dissipating. In fact, Moore's Law has shaped much of the modern world we see around us. A recent McKinsey study ascribed "up to 40% of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements." Indeed, the demise of Moore's Law is one reason some economists predict a "great stagnation" (see my Sept. 2013 column).

"Predictions are difficult," it is said, "especially about the future." The only safe bet is that the next 20 years will be "interesting times." On one hand, since Moore's Law will not be handing us improved performance on a silver platter, we will have to deliver performance the hard way, by improved algorithms and systems. This is a great opportunity for computing research. On the other hand, it is possible that the industry would experience technological commoditization, leading to reduced profitability. Without healthy profit margins to plow into research and development, innovation may slow down and the transition to the post-CMOS world may be long, slow, and agonizing.

However things unfold, we must accept that Moore's Law is dying, and we are heading into an uncharted territory.

CNet says:

"I drive a 1964 car. I also have a 2010. There's not that much difference -- gross performance indicators like top speed and miles per gallon aren't that different. It's safer, and there are a lot of creature comforts in the interior," said Nvidia Chief Scientist Bill Dally. If Moore's Law fizzles, "We'll start to look like the auto industry."

Three critical uncertainties: technological progress, demand for computing power, and interaction with software

Uncertainty #1: Technological progress

Moore's law is dead, long live Moore's law! Even if Moore's law as originally stated is no longer valid, there are other plausible computing advances that would preserve the spirit of the law.

Minor modifications of current research (as described in EETimes) include:

  • Improvements in 3D circuit design (Wikipedia), so that we can stack multiple layers of circuits one on top of the other, and therefore pack more computing power per unit volume.
  • Improvements in understanding electronics at the nanoscale, in particular understanding subthreshold leakage (Wikipedia) and how to tackle it.

Then, there are possibilities for totally new computing paradigms. These have fairly low probability, and are highly unlikely to become commercially viable within 10-15 years. Each of these offers an advantage over currently available general-purpose computing only for special classes of problems, generally those that are parallelizable in particular ways (the type of parallelizability needed differs somewhat between the computing paradigms).

  • Quantum computing (Wikipedia) (speeds up particular types of problems). Quantum computers already exist, but the current ones can tackle only a few qubits. Currently, the best known quantum computers in action are those maintained at the Quantum AI Lab (Wikipedia) run jointly by Google, NASA, and USRA. It is currently unclear how to manufacture quantum computers with a larger number of qubits. It's also unclear how the cost will scale in the number of qubits. If the cost scales exponentially in the number of qubits, then quantum computing will offer little advantage over classical computing. Ray Kurzweil explains this as follows:
    A key question is: how difficult is it to add each additional qubit? The computational power of a quantum computer grows exponentially with each added qubit, but if it turns out that adding each additional qubit makes the engineering task exponentially more difficult, we will not be gaining any leverage. (That is, the computational power of a quantum computer will be only linearly proportional to the engineering difficulty.) In general, proposed methods for adding qubits make the resulting systems significantly more delicate and susceptible to premature decoherence.

    Kurzweil, Ray (2005-09-22). The Singularity Is Near: When Humans Transcend Biology (Kindle Locations 2152-2155). Penguin Group. Kindle Edition.
  • DNA computing (Wikipedia)
  • Other types of molecular computing (Technology Review featured story from 2000, TR story from 2010)
  • Spintronics (Wikipedia): The idea is to store information using the spin of the electron, a quantum property that is binary and can be toggled at zero energy cost (in principle). The main potential utility of spintronics is in data storage, but it could potentially help with computation as well.
  • Optical computing aka photonic computing (Wikipedia): This uses beams of photons that store the relevant information that needs to be manipulated. Photons promise to offer higher bandwidth than electrons, the tool used in computing today (hence the name electronic computing).
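Kurzweil's caveat about qubits can be made concrete with a toy calculation. In this purely hypothetical model, computational power grows as 2^q in the number of qubits q, but engineering difficulty is assumed to grow at the same exponential rate, so the power gained per unit of engineering effort never improves:

```python
# Hypothetical illustration of Kurzweil's pessimistic case: quantum
# computational power is 2^q in qubits q, but engineering difficulty
# is assumed to also grow as 2^q.
for q in [1, 5, 10, 20]:
    power = 2 ** q
    difficulty = 2 ** q          # assumed exponential engineering cost
    leverage = power / difficulty  # power per unit of engineering effort
    print(f"q={q}: power={power}, leverage={leverage}")
```

The leverage stays at 1.0 for every q: exponentially growing power divided by exponentially growing difficulty leaves computational power only linearly proportional to engineering effort, which is Kurzweil's point.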

Uncertainty #2: Demand for computing

Even if computational advances are possible in principle, the absence of the right kind of demand can lead to a lack of financial incentive to pursue the relevant advances. I discussed the interaction between supply and demand in detail in this post.

As that post discussed, demand for computational power at the consumer end is probably reaching saturation. The main source of increased demand will now be companies that want to crunch huge amounts of data in order to more efficiently mine data for insight and offer faster search capabilities to their users. The extent to which such demand grows is uncertain. In principle, the demand is unlimited: the more data we collect (including "found data" that will expand considerably as the Internet of Things grows), the more computational power is needed to apply machine learning algorithms to the data. Since the complexity of many machine learning algorithms grows at least linearly (and in some cases quadratically or cubically) in the data, and the quantity of data itself will probably grow superlinearly, we do expect a robust increase in demand for computing.
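The interaction between data growth and algorithmic complexity described above can be quantified with a toy calculation. The yearly data-doubling rate here is purely an assumption for illustration:

```python
# If data volume doubles each year, required compute grows by that same
# factor raised to the algorithm's complexity exponent.
data_growth_per_year = 2.0  # assumed yearly multiplier for data volume

for name, exponent in [("O(n)", 1), ("O(n^2)", 2), ("O(n^3)", 3)]:
    compute_growth = data_growth_per_year ** exponent
    print(f"{name}: compute demand grows {compute_growth:.0f}x per year")
```

Under these assumptions, a quadratic algorithm's compute demand quadruples yearly even though the data merely doubles, which is why superlinear data growth translates into robustly growing demand for computing.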

Uncertainty #3: Interaction with software

Much of the increased demand for computing, as noted above, does not arise so much from a need for raw computing power by consumers, but a need for more computing power to manipulate and glean insight from large data sets. While there has been some progress with algorithms for machine learning and data mining, the fields are probably far from mature. So an alternative to hardware improvements is improvements in the underlying algorithms. In addition to the algorithms themselves, execution details (such as better use of parallel processing capabilities and more efficient use of idle processor capacity) can also yield huge performance gains.

This might be a good time to note a common belief about software and why I think it's wrong. We often tend to hear of software bloat, and some people subscribe to Wirth's law, the claim that software is getting slower more quickly than hardware is getting faster. I think that there are some software products that have gotten feature-bloated over time, largely because there are incentives to keep putting out new editions that people are willing to pay money for, and Microsoft Word might be one case of such bloat. For the most part, though, software has been getting more efficient, partly by utilizing the new hardware better, but also partly due to underlying algorithmic improvements. This was one of the conclusions of Katja Grace's report on algorithmic progress (see also this link on progress on linear algebra and linear programming algorithms). There are a few software products that get feature-bloated and as a result don't appear to improve over time as far as speed goes, but it's arguably the case that people's revealed preferences show that they are willing to put up with the lack of speed improvements as long as they're getting feature improvements.

Computing technology progress over the next 10-15 years: my three scenarios

  1. Slowdown to ordinary rates of growth of cutting-edge industrial productivity: For the last few decades, several dimensions of computing technology have experienced doublings over time periods ranging from six months to five years. With such fast doubling, we can expect price-performance thresholds for new categories of products to be reached every few years, with multiple new product categories a decade. Consider, for instance, desktops, then laptops, then smartphones, then tablets. If the doubling time reverts to the norm seen in other cutting-edge industrial sectors, namely 10-25 years, then we'd probably see the introduction of revolutionary new product categories only about once a generation. There are already some indications of a possible slowdown, and it remains to be seen whether we see a bounceback.
  2. Continued fast doubling: The other possibility is that the evidence for a slowdown is largely illusory, and computing technology will continue to experience doublings over timescales of less than five years. There would therefore be scope to introduce new product categories every few years.
  3. New computing paradigm with high promise, but requiring significant adjustment: This is an unlikely, but not impossible, scenario. Here, a new computing paradigm, such as quantum computing, reaches the realm of feasibility. However, the existing infrastructure of algorithms is ill-suited to quantum computing; in fact, quantum computing endangers many existing security protocols while offering its own unbreakable ones. Making good use of this new paradigm would require a massive re-architecting of the world's computing infrastructure.

There are two broad features that are likely to be common to all scenarios:

  • Growing importance of algorithms: Scenario (1): If technological progress in computing power stalls, then the pressure for improvements to the algorithms and software may increase. Scenario (2): if technological progress in computing power continues, that might only feed the hunger for bigger data. And as the size of data sets increases, asymptotic performance starts mattering more (the distinction between O(n) and O(n²) matters more when n is large). In both cases, I expect more pressure on algorithms and software, but in different ways: in the case of stalling hardware progress, the focus will be more on improving the software and making minor changes to improve the constants, whereas in the case of rapid hardware progress, the focus will be more on finding algorithms that have better asymptotic (big-O) performance. Scenario (3): In the case of paradigm shifts, the focus will be on algorithms that better exploit the new paradigm. In all cases, there will need to be some sort of shift toward new algorithms and new code that better exploits the new situation.
  • Growing importance of parallelization: Although the specifics of how algorithms will become more important vary between the scenarios, one common feature is that algorithms that can better make parallel use of large numbers of machines will become more important. We have seen parallelization grow in importance over the last 15 years, even as the computing gains for individual processors through Moore's law seem to be plateauing, while data centers have proliferated in number. However, the full power of parallelization is far from tapped out. Again, parallelization matters for slightly different reasons in different cases. Scenario (1): A slowdown in technological progress would mean that gains in the amount of computation can largely be achieved by scaling up the number of machines. In other words, the usage of computing shifts further in a capital-intensive direction. Parallel computing is important for effective utilization of this capital (the computing resources). Scenario (2): Even in the face of rapid hardware progress, automatic big data generation will likely grow much faster than storage, communication, and bandwidth. This "big data" is too huge to store or even stream on a single machine, so parallel processing across huge clusters of machines becomes important. Scenario (3): Note also that almost all the new computing paradigms currently under consideration (including quantum computing) offer massive advantages for special types of parallelizable problems, so parallelization matters even in the case of a paradigm shift in computing.
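To make the point about asymptotics concrete, here is a minimal sketch (my own illustration, not from any of the cited reports) of why the O(n) versus O(n²) distinction dominates at large n, using duplicate detection as the example task:

```python
# Hypothetical illustration: two ways to detect duplicates in a list of n items.

def has_duplicates_quadratic(items):
    """O(n^2): compare every pair of items."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items):
    """O(n) expected time: a single pass using a hash set."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

# At n = 100,000 the linear version does ~10^5 set lookups, while the
# quadratic version would do on the order of 5 * 10^9 comparisons.
data = list(range(100_000)) + [0]  # one duplicate at the end
assert has_duplicates_linear(data)
```

Constant-factor tuning can speed up either function, but no amount of it closes a gap that grows with n; that is why larger data sets shift attention toward asymptotically better algorithms.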

Other scenario analyses

McKinsey carried out a scenario analysis here, focused more on the implications for the semiconductor manufacturing industry than for users of computing. The report notes the importance of Moore's law in driving productivity improvements over the last few decades:

As a result, Moore’s law has swept much of the modern world along with it. Some estimates ascribe up to 40 percent of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements.

The scenario analysis identifies four potential sources of innovation related to Moore's law:

  1. More Moore (scaling)
  2. Wafer-size increases (maximize productivity)
  3. More than Moore (functional diversification)
  4. Beyond CMOS (new technologies)

Their scenario analysis uses a 2 × 2 model, with the two dimensions under consideration being performance improvements (continue versus stop) and cost improvements (continue versus stop). The case where both performance improvements and cost improvements continue is the "good" case for the semiconductor industry. The case where both stop is the one where the industry is highly likely to get commodified, with profit margins going down and small players catching up to the big ones. In the intermediate cases (where one of the two continues and the other stops), consolidation of the semiconductor industry is likely to continue, but there is still a risk of falling demand.

The McKinsey scenario analysis was discussed by Timothy Taylor on his blog, The Conversable Economist, here.

Roland Berger carried out a detailed scenario analysis focused on the "More than Moore" strategy here.

Blegging for missed scenarios, common features and early indicators

Are there scenarios that the analyses discussed above missed? Are there some types of scenario analysis that we didn't adequately consider? If you had to do your own scenario analysis for the future of computing technology and hardware progress over the next 10-15 years, what scenarios would you generate?

As I noted in my earlier post:

The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.

I already identified some features I believe to be common to all scenarios (namely, increased focus on algorithms, and increased focus on parallelization). Do you agree with my assessment that these are likely to matter regardless of scenario? Are there other such common features you have high confidence in?

If you generally agree with one or more of the scenario analyses here (mine or McKinsey's or Roland Berger's), what early indicators would you use to identify which of the enumerated scenarios we are in? Is it possible to look at how events unfold over the next 2-3 years and draw intelligent conclusions from that about the likelihood of different scenarios?

Communicating forecast uncertainty

5 VipulNaik 12 July 2014 09:30PM

Note: This post is part of my series of posts on forecasting, but this particular post may be of fairly limited interest to many LessWrong readers. I'm posting it here mainly for completeness. As always, I appreciate feedback.

In the course of my work looking at forecasting for MIRI, I repeatedly encountered discussions of how to communicate forecasts. In particular, a concern that emerged repeatedly was the clear communication of the uncertainty in forecasts. Nate Silver's The Signal and the Noise, in particular, focused quite a bit on the virtue of clear communication of uncertainty, in contexts as diverse as financial crises, epidemiology, weather forecasting, and climate change.

In this post, I pull together discussions from a variety of domains about the communication of uncertainty, and also included my overall impression of the findings.

Summary of overall findings

  • In cases where forecasts are made and used frequently (the most salient example being temperature and precipitation forecasts) people tend to form their own models of the uncertainty surrounding forecasts, even if you present forecasts as point estimates. The models people develop are quite similar to the correct ones, but still different in important ways.
  • In cases where forecasts are made more rarely, as with forecasting rare events, people are more likely to have simpler models that acknowledge some uncertainty but are less nuanced. In these cases, acknowledging uncertainty becomes quite important, because wrong forecasts of such events can lead to a loss of trust in the forecasting process, and can lead people to ignore correct forecasts later.
  • In some cases, there are arguments for modestly exaggerating small probabilities to overcome specific biases that people have that cause them to ignore low-probability events.
  • However, the balance of evidence suggests that forecasts should be reported as honestly as possible, and all uncertainty should be clearly acknowledged. If the forecast does not acknowledge uncertainty, people are likely to either use their own models of uncertainty, or lose faith in the forecasting process entirely if the forecast turns out to be far off from reality.

Probabilities of adverse events and the concept of the cost-loss ratio

A useful concept developed for understanding the utility of weather forecasting is the cost-loss model (Wikipedia). Consider that if a particular adverse event occurs, and we do not take precautionary measures, the loss incurred is L, whereas if we do take precautionary measures, the cost is C, regardless of whether the event occurs. An example: you're planning an outdoor party, and the adverse event in question is rain. If it rains during the event, you experience a loss of L. If you knew in advance that it would rain, you'd move the venue indoors, at a cost of C. Obviously, C must be less than L for you to even consider the precautionary measure.

The ratio C/L is termed the cost-loss ratio and describes the probability threshold above which it makes sense to take the precautionary measure.
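The decision rule can be sketched in a few lines. The numbers below are illustrative (they are not from the post), but the logic is exactly the cost-loss comparison just described: take the precaution when the forecast probability p of the adverse event exceeds C/L.

```python
# Minimal sketch of the cost-loss decision rule: the expected loss from
# doing nothing is p * loss; the precaution costs `cost` for certain.

def should_take_precaution(p, cost, loss):
    """Take the precaution exactly when p * loss > cost, i.e. p > C/L."""
    return p * loss > cost

# Outdoor party example with made-up numbers: moving indoors costs 200,
# rain on an outdoor party loses 1000, so the cost-loss ratio is 0.2.
assert should_take_precaution(0.3, cost=200, loss=1000)       # 0.3 > 0.2: act
assert not should_take_precaution(0.1, cost=200, loss=1000)   # 0.1 < 0.2: don't
```

The same rule applied by a user with the wrong C/L estimate makes the wrong call at intermediate probabilities, which is the situation the next section turns to.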

One way of thinking of the utility of weather forecasting, particularly in the context of forecasting adverse events (rain, snow, winds, and more extreme events) is in terms of whether people have adequate information to make correct decisions based on their cost-loss model. This would boil down to several questions:

  • Is the probability of the adverse event communicated with sufficient clarity and precision that people who need to use it can plug it into their cost-loss model?
  • Do people have a correct estimate of their cost-loss ratio (implicitly or explicitly)?

As I discussed in an earlier post, The Weather Channel has admitted to explicitly introducing wet bias into its probability-of-precipitation (PoP) forecasts. The rationale they offered could be interpreted as a claim that people overestimate their cost-loss ratio. For instance, a person may think his cost-loss ratio for precipitation is 0.2 (20%), but his actual cost-loss ratio may be 0.05 (5%). In this case, in order to make sure people still make the "correct" decision, PoP forecasts that fall between 0.05 and 0.2 would need to be inflated to 0.2 or higher. Note that TWC does not introduce wet bias at higher probabilities of precipitation, arguably because (they believe) such probabilities are well above the cost-loss ratio for most situations.
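The wet-bias logic can be sketched as a toy inflation rule. To be clear, this is my illustration of the rationale as described above, not TWC's actual adjustment formula, and the threshold numbers are the hypothetical ones from the example:

```python
# Toy sketch of wet bias: if consumers act as though their cost-loss
# ratio were `perceived_ratio` (0.2) when it is really `actual_ratio`
# (0.05), inflating PoPs in between nudges them toward the "correct"
# decision. Illustrative rule only, not TWC's actual formula.

def report_pop(true_pop, perceived_ratio=0.2, actual_ratio=0.05):
    if actual_ratio <= true_pop < perceived_ratio:
        return perceived_ratio  # inflate so people still take precautions
    return true_pop             # report honestly outside that band

assert report_pop(0.10) == 0.2   # inflated into the action zone
assert report_pop(0.50) == 0.5   # high PoPs left untouched
assert report_pop(0.02) == 0.02  # negligible PoPs left untouched
```

The cost of this policy, as the rest of the post argues, is calibration: users who track forecasts against outcomes will eventually notice that "20%" days rain less often than advertised.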

Words of estimative probability

In 1964, Sherman Kent (Wikipedia), the father of intelligence analysis, wrote an essay titled "Words of Estimative Probability" that discussed the use of words to describe probability estimates, and how different people may interpret the same word as referring to very different ranges of probability estimates. The concept of words of estimative probability (Wikipedia), along with its acronym, WEP, is now standard jargon in intelligence analysis.

Some related discussion of the use of words to convey uncertainty in estimates can be found in the part of this post where I excerpt from the paper discussing the communication of uncertainty in climate change.

Other general reading

#1: The case of weather forecasting

Weather forecasting has some features that make it stand out among other forecasting domains:

  • Forecasts are published explicitly and regularly: News channels and newspapers carry forecasts every day. Weather websites update their forecasts on at least an hourly basis, sometimes even faster, particularly if there are unusual weather developments. In the United States, The Weather Channel is dedicated to 24/7 weather news coverage.
  • Forecasts are targeted at and consumed by the general public: This sets weather forecasting apart from other forms of forecasting and prediction. We can think of prices in financial markets and betting markets as implicit forecasts. But they are targeted at the niche audiences that pay attention to them, not at everybody. The mode of consumption varies. Some people just get their forecasts from the weather reports in their local TV and radio channel. Some people visit the main weather websites (such as the National Weather Service, The Weather Channel, AccuWeather, or equivalent sources in other countries). Some people have weather reports emailed to them daily. As smartphones grow in popularity, weather apps are an increasingly common way for people to keep tabs on the weather. The study on communicating weather uncertainty (discussed below) found that in the United States, people in its sample audience saw weather forecasts an average of 115 times a month. Even assuming heavy selection bias in the study, people in the developed world probably encounter a weather forecast at least once a day.
  • Forecasts are used to drive decision-making: Particularly in places where weather fluctuations are significant, forecasts play an important role in event planning for individuals and organizations. At the individual level, this can include deciding whether to carry an umbrella, choosing what clothes to wear, deciding whether to wear snow boots, deciding whether conditions are suitable for driving, and many other small decisions. At the organizational level, events may be canceled or relocated based on forecasts of adverse weather. In locations with variable weather, it's considered irresponsible to plan an event without checking the weather forecast.
  • People get quick feedback on whether the forecast was accurate: The next day, people know whether what was forecast transpired.

The upshot: people are exposed to weather forecasts, pay attention to them, base decisions on them, and then come to know whether the forecast was correct. This happens on a daily basis. Therefore, they have both the incentives and the information to form their own mental model of the reliability and uncertainty in forecasts. Note also that because the reliability of forecasts varies considerably by location, people who move from one location to another may take time adjusting to the new location. (For instance, when I moved to Chicago, I didn't pay much attention to weather forecasts in the beginning, but soon learned that the high variability of the weather, combined with the reasonable accuracy of forecasts, made them worth paying attention to. Now that I'm in Berkeley, I probably pay too much attention to the forecast relative to its value, given the stability of weather in Berkeley.)

With these general thoughts in mind, let's look at the paper Communicating Uncertainty in Weather Forecasts: A Survey of the U. S. Public by Rebecca E. Morss, Julie L. Demuth, and Jeffrey K. Lazo. The paper is based on a survey of about 1500 people in the United States. The whole paper is worth a careful read if you find the issue fascinating. But for the benefits of those of you who find the issue somewhat interesting but not enough to read the paper, I include some key takeaways from the paper.

Temperature forecasts: the authors find that even though temperature forecasts are generally made as point estimates, people interpret these point estimates as temperature ranges. The ranges are not necessarily centered at the point estimates. Further, the range widens with the forecast horizon: people (correctly) realize that forecasts made for three days out carry more uncertainty than forecasts made for one day out. In short, people's understanding of the nature of forecast uncertainty in temperatures is correct, at least in a broad qualitative sense.

The authors believe that people arrive at these correct models through their own personal history of seeing weather forecasts and evaluating how they compare with the reality. Clearly, most people don't keep close track of how forecasts compare with the reality, but they are still able to get the general idea over several years of exposure to weather forecasts. The authors also believe that since the accuracy of weather forecasts varies by region, people's models of uncertainty may also differ by region. However, the data they collect does not allow for a test of this hypothesis. For more, read Sections 3a and 3b of the paper.

Probability-of-precipitation (PoP) forecasts: The authors also look at people's perception of probability-of-precipitation (PoP) forecasts. The correct meteorological interpretation of PoP is "the probability that precipitation occurs given these meteorological conditions." The frequentist operationalization of this would be "the fraction (situations with meteorological conditions like this where precipitation does occur)/(situations with meteorological conditions like this)." To what extent are people aware of this meaning? One of the questions in the survey elicits information on this front:

TABLE 2. Responses to Q14a, the meaning of the forecast “There is a 60% chance of rain for tomorrow” (N = 1330).

  • It will rain tomorrow in 60% of the region. (16% of respondents)
  • It will rain tomorrow for 60% of the time. (10% of respondents)
  • It will rain on 60% of the days like tomorrow.* (19% of respondents)
  • 60% of weather forecasters believe that it will rain tomorrow. (22% of respondents)
  • I don’t know. (9% of respondents)
  • Other (please explain). (24% of respondents)

* Technically correct interpretation, according to how PoP forecasts are verified, as interpreted by Gigerenzer et al. (2005).

So about 19% of participants choose the correct meteorological interpretation. However, of the 24% who offer other explanations, many suggest that they are not so much interested in the meteorological interpretation as in how this affects their decision-making. So it might be the case that even if people aren't aware of the frequentist definition, they are still using the information approximately correctly as it applies to their lives. One such application would be a comparison with the cost-loss ratio to determine whether to engage in precautionary measures. Note that, as noted earlier in the post, it may be the case that people overestimate their own cost-loss ratio, but this is a distinct problem from incorrectly interpreting the probability.
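The frequentist operationalization above can be sketched as a verification check: among all occasions on which a 60% PoP was issued, rain should occur on about 60% of them. The forecast/outcome pairs below are simulated stand-ins for a real verification data set:

```python
# Sketch of how PoP forecasts are verified under the frequentist reading:
# collect all days with a given issued PoP and compare the observed rain
# frequency to that PoP. Simulated data; a real check uses archived
# forecasts and observations.

import random
random.seed(0)

# 10,000 simulated days on which a 60% PoP was issued, with rain
# actually occurring 60% of the time (i.e., a calibrated forecaster).
days = [(0.6, random.random() < 0.6) for _ in range(10_000)]

outcomes = [rained for pop, rained in days if pop == 0.6]
observed_frequency = sum(outcomes) / len(outcomes)

# For a well-calibrated forecaster this is close to 0.6; a wet-biased
# forecaster would show an observed frequency well below the issued PoP.
assert abs(observed_frequency - 0.6) < 0.05
```

Note that this check says nothing about any single day; it is a property of the forecaster's track record, which is exactly why the "60% of days like tomorrow" reading is the technically correct one.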

I also found the following resources, which I haven't had the time to read through, but which might help people interested in exploring the issue in more detail (I'll add more to this list if I find more):

#2: Extreme rare events (usually weather-related) that require significant response

For some rare events (such as earthquakes) we don't know how to make specific predictions of their imminent arrival. But for others, such as hurricanes, cyclones, blizzards, tornadoes, and thunderstorms, specific probabilistic predictions can be made. Based on these predictions, significant action can be undertaken, ranging from everybody deciding to stock up on supplies and stay at home, to a mass evacuation. Such responses are quite costly, but the loss they would avert if the event did occur is even bigger. In the cost-loss framework discussed above, we are dealing with both a high cost and a loss that could be much higher. However, unlike the binary case discussed above, the loss spans more of a continuum: the amount of loss that would occur without precautionary measures depends on the intensity of the event. Similarly, the costs span a continuum: the cost depends on the extent of precautionary measures taken.

Since both the cost and loss are huge, it's quite important to get a good handle on the probability. But should the correct probability be communicated, or should it be massaged or simply converted to a "yes/no" statement? We discussed earlier the (alleged) problem of people overestimating their cost-loss ratio, and therefore not taking adequate precautionary measures, and how the Weather Channel addresses this by deliberately introducing a wet bias. But the stakes are much higher when we are talking of shutting down a city for a day or ordering a mass evacuation.

Another complication is that the rarity of the event means that people's own mental models haven't had a lot of data to calibrate the accuracy and reliability of forecasts. When it comes to temperature and precipitation forecasts, people have years of experience to rely on. They will not lose faith in a forecast based on a single occurrence. When it comes to rare events, even a few memories of incorrect forecasts, and the concomitant huge costs or huge losses, can lead people to be skeptical of the forecasts in the future. In The Signal and the Noise, Nate Silver extensively discusses the case of Hurricane Katrina and the dilemmas facing the mayor of New Orleans that led him to delay the evacuation of the city, and led many people to ignore the evacuation order even after it was announced.

A direct strike of a major hurricane on New Orleans had long been every weather forecaster’s worst nightmare. The city presented a perfect set of circumstances that might contribute to the death and destruction there. [...]

The National Hurricane Center nailed its forecast of Katrina; it anticipated a potential hit on the city almost five days before the levees were breached, and concluded that some version of the nightmare scenario was probable more than forty-eight hours away. Twenty or thirty years ago, this much advance warning would almost certainly not have been possible, and fewer people would have been evacuated. The Hurricane Center’s forecast, and the steady advances made in weather forecasting over the past few decades, undoubtedly saved many lives.

Not everyone listened to the forecast, however. About 80,000 New Orleanians —almost a fifth of the city’s population at the time— failed to evacuate the city, and 1,600 of them died. Surveys of the survivors found that about two-thirds of them did not think the storm would be as bad as it was. Others had been confused by a bungled evacuation order; the city’s mayor, Ray Nagin, waited almost twenty-four hours to call for a mandatory evacuation, despite pleas from Mayfield and from other public officials. Still other residents— impoverished, elderly, or disconnected from the news— could not have fled even if they had wanted to.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 109-110). Penguin Group US. Kindle Edition.

So what went wrong? Silver returns to this later in the chapter:

As Max Mayfield told Congress, he had been prepared for a storm like Katrina to hit New Orleans for most of his sixty-year life. Mayfield grew up around severe weather— in Oklahoma, the heart of Tornado Alley— and began his forecasting career in the Air Force, where people took risk very seriously and drew up battle plans to prepare for it. What took him longer to learn was how difficult it would be for the National Hurricane Center to communicate its forecasts to the general public.

“After Hurricane Hugo in 1989,” Mayfield recalled in his Oklahoma drawl, “I was talking to a behavioral scientist from Florida State. He said people don’t respond to hurricane warnings. And I was insulted. Of course they do. But I have learned that he is absolutely right. People don’t respond just to the phrase ‘hurricane warning.’ People respond to what they hear from local officials. You don’t want the forecaster or the TV anchor making decisions on when to open shelters or when to reverse lanes.”

Under Mayfield’s guidance, the National Hurricane Center began to pay much more attention to how it presented its forecasts. In contrast to most government agencies, whose Web sites look as though they haven’t been updated since the days when you got those free AOL CDs in the mail, the Hurricane Center takes great care in the design of its products, producing a series of colorful and attractive charts that convey information intuitively and accurately on everything from wind speed to storm surge.

The Hurricane Center also takes care in how it presents the uncertainty in its forecasts. “Uncertainty is the fundamental component of weather prediction,” Mayfield said. “No forecast is complete without some description of that uncertainty.” Instead of just showing a single track line for a hurricane’s predicted path, for instance, their charts prominently feature a cone of uncertainty—“ some people call it a cone of chaos,” Mayfield said. This shows the range of places where the eye of the hurricane is most likely to make landfall. Mayfield worries that even this isn’t enough. Significant impacts like flash floods (which are often more deadly than the storm itself) can occur far from the center of the storm and long after peak wind speeds have died down. No people in New York City died from Hurricane Irene in 2011 despite massive media hype surrounding the storm, but three people did from flooding in landlocked Vermont once the TV cameras were turned off.

[...]


Mayfield told Nagin that he needed to issue a mandatory evacuation order, and to do so as soon as possible.

Nagin dallied, issuing a voluntary evacuation order instead. In the Big Easy, that was code for “take it easy”; only a mandatory evacuation order would convey the full force of the threat. Most New Orleanians had not been alive when the last catastrophic storm, Hurricane Betsy, had hit the city in 1965. And those who had been, by definition, had survived it. “If I survived Hurricane Betsy, I can survive that one, too. We all ride the hurricanes, you know,” an elderly resident who stayed in the city later told public officials. Responses like these were typical. Studies from Katrina and other storms have found that having survived a hurricane makes one less likely to evacuate the next time one comes.

The reasons for Nagin’s delay in issuing the evacuation order are a matter of some dispute— he may have been concerned that hotel owners might sue the city if their business was disrupted. Either way, he did not call for a mandatory evacuation until Sunday at 11 A.M. —and by that point the residents who had not gotten the message yet were thoroughly confused. One study found that about a third of residents who declined to evacuate the city had not heard the evacuation order at all. Another third heard it but said it did not give clear instructions. Surveys of disaster victims are not always reliable— it is difficult for people to articulate why they behaved the way they did under significant emotional strain, and a small percentage of the population will say they never heard an evacuation order even when it is issued early and often. But in this case, Nagin was responsible for much of the confusion.

There is, of course, plenty of blame to go around for Katrina— certainly to FEMA in addition to Nagin. There is also credit to apportion— most people did evacuate, in part because of the Hurricane Center’s accurate forecast. Had Betsy topped the levees in 1965, before reliable hurricane forecasts were possible, the death toll would probably have been even greater than it was in Katrina. One lesson from Katrina, however, is that accuracy is the best policy for a forecaster. It is forecasting’s original sin to put politics, personal glory, or economic benefit before the truth of the forecast. Sometimes it is done with good intentions, but it always makes the forecast worse. The Hurricane Center works as hard as it can to avoid letting these things compromise its forecasts. It may not be a coincidence that, in contrast to all the forecasting failures in this book, theirs have become 350 percent more accurate in the past twenty-five years alone.

“The role of a forecaster is to produce the best forecast possible,” Mayfield says. It’s so simple— and yet forecasters in so many fields routinely get it wrong.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 138-141). Penguin Group US. Kindle Edition. 

Silver notes similar failures of communication of forecast uncertainty in other domains, including exaggeration of the 1976 swine flu outbreak.

I also found a few related papers that may be worth reading if you're interested in understanding the communication of weather-related rare event forecasts:

#3: Long-run changes that might necessitate policy responses or long-term mitigation or adaptation strategies, such as climate change

In marked contrast to daily weather forecasting as well as extreme rare event forecasting is the forecasting of gradual long-term structural changes. Examples include climate change, economic growth, changes in the size and composition of the population, and technological progress. Here, the general recommendation is clear and detailed communication of uncertainty using multiple formats, with the format tailored to the types of decisions that will be based on the information.

On the subject of communicating uncertainty in climate change, I found the paper Communicating uncertainty: lessons learned and suggestions for climate change assessment by Anthony Patt and Suraje Dessai. The paper is quite interesting (and has been referenced by some of the other papers mentioned in this post).

The paper identifies three general sources of uncertainty:

  • Epistemic uncertainty arises from incomplete knowledge of processes that influence events.
  • Natural stochastic uncertainty refers to the chaotic nature of the underlying system (in this case, the climate system).
  • Human reflexive uncertainty refers to uncertainty in human activity that could affect the system. Some of the activity may be undertaken specifically in response to the forecast.

This is somewhat similar to, but not directly mappable to, the classification of sources of uncertainty by Gavin Schmidt from NASA that I discussed in my post on weather and climate forecasting:

  • Initial condition uncertainty: This form of uncertainty dominates short-term weather forecasts (though not necessarily the very short term weather forecasts; it seems to matter the most for intervals where numerical weather prediction gets too uncertain but long-run equilibrating factors haven't kicked in). Over timescales of several years, this form of uncertainty is not influential.
  • Scenario uncertainty: This is uncertainty that arises from lack of knowledge of how some variable (such as carbon dioxide levels in the atmosphere, or levels of solar radiation, or aerosol levels in the atmosphere, or land use patterns) will change over time. Scenario uncertainty rises over time, i.e., scenario uncertainty plagues long-run climate forecasts far more than it plagues short-run climate forecasts.
  • Structural uncertainty: This is uncertainty that is inherent to the climate models themselves. Structural uncertainty is problematic at all time scales to a roughly similar degree (some forms of structural uncertainty affect the short run more whereas some affect the long run more).

Section 2 of the paper has a general discussion of interpreting and communicating probabilities. One of the general points made is that the more extreme the event, the lower people's mental probability threshold for verbal descriptions of likelihood. For instance, for a serious disease, the probability threshold for "very likely" may be 30%, whereas for a minor ailment, it may be 90% (these numbers are my own, not from the paper). The authors also discuss the distinction between frequentist and Bayesian approaches and claim that the frequentist approach is better suited to assimilating multiple pieces of information, and that frequentist framings should therefore be preferred to Bayesian framings when communicating uncertainty:

As should already be evident, whether the task of estimating and responding to uncertainty is framed in stochastic (usually frequentist) or epistemic (often Bayesian) terms can strongly influence which heuristics people use, and likewise lead to different choice outcomes [23]. Framing in frequentist terms on the one hand promotes the availability heuristic, and on the other hand promotes the simple acts of multiplying, dividing, and counting. Framing in Bayesian terms, by contrast, promotes the representativeness heuristic, which is not well adapted to combining multiple pieces of information. In one experiment, people were given the problem of estimating the chances that a person has a rare disease, given a positive result from a test that sometimes generates false positives. When people were given the problem framed in terms of a single patient receiving the diagnostic test, and the base probabilities of the disease (e.g., 0.001) and the reliability of the test (e.g., 0.95), they significantly over-estimate the chances that the person has the disease (e.g., saying there is a 95% chance). But when people were given the same problem framed in terms of one thousand patients being tested, and the same probabilities for the disease and the test reliability, they resorted to counting patients, and typically arrived at the correct answer (in this case, about 2%). It has, indeed, been speculated that the gross errors at probability estimation, and indeed errors of logic, observed in the literature take place primarily when people are operating within the Bayesian probability framework, and that these disappear when people evaluate problems in frequentist terms [23,58].
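The quoted experiment's arithmetic can be checked directly. The sketch below uses the numbers given in the quote (base rate 0.001, test reliability 0.95); interpreting "reliability" as both the sensitivity and the specificity of the test is my assumption, chosen because it reproduces the paper's stated answer of about 2%. It computes the same posterior two ways: via Bayes' rule (the single-patient framing) and by counting expected cases among 1,000 patients (the frequentist framing).

```python
# Numbers from the quoted example
base_rate = 0.001        # P(disease)
sensitivity = 0.95       # P(positive | disease) -- assumed equal to "reliability"
false_positive = 0.05    # P(positive | no disease) -- assumed 1 - reliability

# Bayesian framing: posterior probability for a single patient via Bayes' rule
posterior = (base_rate * sensitivity) / (
    base_rate * sensitivity + (1 - base_rate) * false_positive
)

# Frequentist framing: count expected cases among 1,000 tested patients
n = 1000
true_positives = n * base_rate * sensitivity              # ~0.95 patients
false_positives = n * (1 - base_rate) * false_positive    # ~49.95 patients
fraction = true_positives / (true_positives + false_positives)

print(round(posterior, 3), round(fraction, 3))  # both about 0.019, i.e. roughly 2%
```

Both framings give the same number, about 1.9%; the paper's point is that the counting version makes the answer obvious to people, while the single-patient version invites the representativeness heuristic and answers like "95%".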

The authors offer the following suggestions in the discussion section (Section 4) of their paper:

The challenge of communicating probabilistic information so that it will be used, and used appropriately, by decision-makers has been long recognized. [...] In some cases, the heuristics that people use are not well suited to the particular problem that they are solving or decision that they are making; this is especially likely for types of problems outside their normal experience. In such cases, the onus is on the communicators of the probabilistic information to help people find better ways of using the information, in such a manner that respects the users’ autonomy, full set of concerns and goals, and cognitive perspective.

These difficulties appear to be most pronounced when dealing with predictions of one-time events, where the probability estimates result from a lack of complete confidence in the predictive models. When people speak about such epistemic or structural uncertainty, they are far more likely to shun quantitative descriptions, and are far less likely to combine separate pieces of information in ways that are mathematically correct. Moreover, people perceive decisions that involve structural uncertainty as riskier, and will take decisions that are more risk averse. By contrast, when uncertainty results from well-understood stochastic processes, for which the probability estimate results from counting of relative frequencies, people are more likely to work effectively with multiple pieces of information, and to take decisions that are more risk neutral.

In many ways, the most recent approach of the IPCC WGI responds to these issues. Most of the uncertainties with respect to climate change science are in fact epistemic or structural, and the probability estimates of experts reflect degrees of confidence in the occurrence of one-time events, rather than measurement of relative frequencies in relevant data sets. Using probability language, rather than numerical ranges, matches people’s cognitive framework, and will likely make the information both easier to understand, and more likely to be used. Moreover, defining the words in terms of specific numerical ranges ensures consistency within the report, and does allow comparison of multiple events, for which the uncertainty may derive from different sources.

We have already mentioned the importance of target audiences in communicating uncertainties, but this cannot be emphasized enough. The IPCC reports have a wide readership so a pluralistic approach is necessary. For example, because of its degree of sophistication, the water chapter could communicate uncertainties using numbers, whereas the regional chapters might use words and the adaptive capacity chapter could use narratives. “Careful design of communication and reporting should be done in order to avoid information divide, misunderstandings, and misinterpretations. The communication of uncertainty should be understandable by the audience. There should be clear guidelines to facilitate clear and consistent use of terms provided. Values should be made explicit in the reporting process” [32].

However, by writing the assessment in terms of people’s intuitive framework, the IPCC authors need to understand that this intuitive framework carries with it several predictable biases. [...]

The literature suggests, and the two experiments discussed here further confirm, that the approach of the IPCC leaves room for improvement. Further, as the literature suggests, there is no single solution for these potential problems, but there are communication practices that could help. [...]

Finally, the use of probability language, instead of numbers, addresses only some of the challenges in uncertainty communication that have been identified in the modern decision support literature. Most importantly, it is important in the communication process to address how the information can and should be used, using heuristics that are appropriate for the particular decisions. [...] Obviously, there are limits to the length of the report, but within the balancing act of conciseness and clarity, greater attention to full dimensions of uncertainty could likely increase the chances that users will decide to take action on the basis of the new information.
