Comment author: polymathwannabe 12 July 2014 01:43:58AM | 3 points

Off the top of my head, other rare events worth anticipating:

  • Assassination of a head of state and/or coup d'état

  • War between and/or within highly developed countries

  • A new pandemic

  • Unavoidable meteorite

  • Extraterrestrial invasion

Comment author: VipulNaik 12 July 2014 02:34:03PM | 1 point

Thanks! I added pandemics (though not in the depth I should have). I'll look at some of the others.

Forecasting rare events

VipulNaik | 11 July 2014 10:48PM

In an earlier post, I looked at some general domains of forecasting. This post looks at some more specific classes of forecasting, some of which overlap with the general domains, and some of which are more isolated. The common thread to these classes of forecasting is that they involve rare events.

Different types of forecasting for rare events

When it comes to rare events, there are three different classes of forecasts:

  1. Point-in-time-independent probabilistic forecasts: Forecasts that provide a probability estimate for the event occurring in a given timeframe, but with no distinction based on the point in time. In other words, the forecast may say "there is a 5% chance of an earthquake higher than 7 on the Richter scale in this geographical region in a year" but the forecast is not sensitive to the choice of year. These are sufficient to inform decisions on general preparedness. In the case of earthquakes, for instance, the amount of care to be taken in building structures can be determined based on these forecasts. On the other hand, they're useless for deciding the timing of specific activities.
  2. Point-in-time-dependent probabilistic forecasts: Forecasts that provide a probability estimate that varies somewhat over time based on history, but aren't precise enough for a remedial measure that substantially offsets major losses. For instance, if I know that an earthquake will occur in San Francisco in the next 6 months with probability 90%, it's still not actionable enough for a mass evacuation of San Francisco. But some preparatory measures may be undertaken.
  3. Predictions made with high confidence (i.e., a high estimated probability when the event is predicted) and with a specific time, location, and set of characteristics: predictions of date and place precise enough to support remedial measures that substantially offset major losses (though possibly still at a large, if much smaller, cost). The situation with hurricanes, tornadoes, and blizzards is roughly in this category.

Statistical distributions: normal distributions versus power law distributions

Perhaps the most ubiquitous distribution used in probability and statistics is the normal distribution. The normal distribution is a symmetric distribution whose probability density function decays superexponentially with distance from the mean (more precisely, it is exponential decay in the square of the distance). In other words, the probability decays slowly at the beginning, and faster later. Thus, for instance, the ratio of pdfs at 2 standard deviations from the mean and at 1 standard deviation from the mean is greater than the ratio of pdfs at 3 standard deviations and at 2 standard deviations. To give explicit numbers: about 68.2% of the distribution lies between -1 and +1 SD, 95.4% lies between -2 and +2 SD, 99.7% lies between -3 and +3 SD, and about 99.99% lies between -4 and +4 SD. So the probability of being more than 4 standard deviations from the mean is less than 1 in 10,000.
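These tail figures are easy to verify numerically. A minimal sketch using only Python's standard library (`math.erfc` gives the two-sided tail probability of a standard normal):

```python
import math

def two_sided_tail(k):
    """P(|X| > k) for a standard normal variable X."""
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"beyond {k} SD: {two_sided_tail(k):.6f}")
```

Running it reproduces the figures above; in particular, the mass beyond 4 SD is roughly 1 in 16,000, i.e., less than 1 in 10,000.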

If the probability distribution for intensity looks (roughly) like a normal distribution, then high-intensity events are extremely unlikely, and we do not have to worry about them much.

The types of situations where rare event forecasting becomes more important are those where events that are high-intensity, or "extreme" in some sense, occur rarely, but not as rarely as under a normal distribution. We say that the tails of such distributions are thicker than those of the normal distribution, and the distributions are termed "thick-tailed" or "fat-tailed" distributions. [Formally, the thickness of tails is measured using a quantity called excess kurtosis, which compares the fourth central moment with the square of the second central moment (the second central moment is the variance, the square of the standard deviation), then subtracts off the number 3, which is the corresponding value for the normal distribution. If the excess kurtosis of a distribution is positive, it is a thick-tailed distribution.]
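The bracketed definition translates directly into code. A small sketch (the function name and the samples mentioned in the note below are just illustrative):

```python
def excess_kurtosis(xs):
    """Fourth central moment over squared variance, minus 3 (the normal's value)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance (second central moment)
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0
```

A light-tailed sample such as a uniform spread of values gives a negative result (around -1.2), while a sample with one extreme outlier gives a large positive result, flagging a thick tail.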

The most common example of such distributions that is of interest to us is the power law distribution. Here, the probability density is proportional to a negative power of the value. If you remember some basic precalculus/calculus, you'll recall that power functions (such as the square function or cube function) grow more slowly than exponential functions. Correspondingly, power law distributions decay subexponentially: more slowly than exponential decay (to be more precise, the decay starts off fast, then slows down). As noted above, the pdf of the normal distribution decays exponentially in the square of the distance from the mean, so the upshot is that power law distributions decay far more slowly than normal distributions.
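To see how much difference the tail shape makes, compare the two-sided normal tail with a Pareto (power law) tail at the same distance out; the exponent alpha = 2 and cutoff xmin = 1 below are arbitrary illustrative choices:

```python
import math

def normal_tail(k):
    """P(|X| > k) for a standard normal."""
    return math.erfc(k / math.sqrt(2))

def pareto_tail(k, alpha=2.0, xmin=1.0):
    """P(X > k) for a Pareto (power law) distribution with exponent alpha."""
    return (xmin / k) ** alpha if k >= xmin else 1.0

for k in (2, 4, 8):
    print(k, normal_tail(k), pareto_tail(k))
```

At 8 units out, the normal tail is on the order of 10^-15 while the Pareto tail is still about 1.6%: events a normal model treats as essentially impossible remain live possibilities under a power law.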

For most of the rare event classes we discuss, to the extent that it has been possible to pin down a distribution, it has looked a lot more like a power law distribution than a normal distribution. Thus, rare events need to be heeded. (There's obviously a selection effect here: for those cases where the distributions are close to normal, forecasting rare events just isn't that challenging, so they wouldn't be included in my post).

UPDATE: Aaron Clauset, who appears in #4, pointed me (via email) to his Rare Events page, containing the code (Matlab and Python) that he used in his terrorism statistics paper mentioned as an update at the bottom of #4. He noted in the email that the statistical methods are fairly general, so interested people could use the code if they were interested in cross-applying to rare events in other domains.

Talebisms

One of the more famous advocates of the idea that people overestimate the ubiquity of normal distributions and underestimate the prevalence of power law distributions is Nassim Nicholas Taleb. Taleb calls the world of normal distributions Mediocristan (the world of mediocrity, where things are mostly ordinary and weird things are very very rare) and the world of power law distributions Extremistan (the world of extremes, where rare and weird events are more common). Taleb has elaborated on this thesis in his book The Black Swan, though some parts of the idea are also found in his earlier book Fooled by Randomness.

I'm aware that a lot of people swear by Taleb, but I personally don't find his writing very impressive. He does cover a lot of important ideas, but they didn't originate with him, and he goes off on a lot of tangents. In contrast, I found Nate Silver's The Signal and the Noise a pretty good read, and although it wasn't focused on rare events per se, I drew on the parts that did discuss such forecasting for this post.

(Sidenote: My criticism of Taleb is broadly similar to that offered by Jamie Whyte here in Standpoint Magazine. Also, here's a review of Taleb by Steve Sailer. Sailer is much more favorably inclined toward the normal distribution than Taleb is, and this is probably related to his desire to promote IQ-distribution/The Bell Curve-type ideas, but I think many of Sailer's criticisms are spot on.)

Examples of rare event classes that we discuss in this post

The classes discussed in this post include:

  1. Earthquakes: Category #1; also hypothesized to follow a power law distribution.
  2. Volcanoes: Category #2.
  3. Extreme weather events (hurricanes/cyclones, tornadoes, blizzards): Category #3.
  4. Major terrorist acts: Questionable; at least Category #1, and some argue Category #2 or Category #3. Hypothesized to follow a power law distribution.
  5. Power outages (could be caused by any of 1-4, typically 3).
  6. Server outages (could be caused by 5).
  7. Financial crises.
  8. Global pandemics, such as the 1918 flu pandemic (popularly called the "Spanish flu") that, according to Wikipedia, "infected 500 million people across the world, including remote Pacific islands and the Arctic, and killed 50 to 100 million of them—three to five percent of the world's population." They probably fall under Category #2, but I couldn't get a clear picture. (Pandemics were not in the list at the time of original publication of the post; I added them based on a comment suggestion).
  9. Near-earth object impacts (not in the list at the time of original publication of the post; I added them based on a comment suggestion).

Other examples of rare events would also be appreciated.

#1: Earthquakes

Earthquake prediction remains mostly in category 1: there are probability estimates of the occurrence of earthquakes of a given severity or higher within a given timeframe, but these estimates do not distinguish between different points in time. In The Signal and the Noise, statistician and forecasting expert Nate Silver talks to Susan Hough (Wikipedia) of the United States Geological Survey and describes what she has to say about the current state of earthquake forecasting:

What seismologists are really interested in— what Susan Hough calls the “Holy Grail” of seismology— are time-dependent forecasts, those in which the probability of an earthquake is not assumed to be constant across time.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (p. 154). Penguin Group US. Kindle Edition.

The whole Silver chapter is worth reading, as is the Wikipedia page on earthquake prediction, which covers much of the same ground.

In fact, even for time-independent earthquake forecasting, the best known forecasting method is currently the extremely simple Gutenberg-Richter law, which says that for a given location, the frequency of earthquakes obeys a power law with respect to magnitude. Since the Richter scale is logarithmic (to base 10), this means that adding a point on the Richter scale makes the frequency of earthquakes decrease by a roughly constant factor (about a factor of 10 for typical parameter values). Note that the Gutenberg-Richter law can't be the full story: there are probably absolute limits on earthquake magnitude (some people believe that an earthquake of magnitude 10 or higher is impossible). But so far, it seems to have the best track record.

Why haven't we been able to come up with better models? This relates to the problem of overfitting common in machine learning and statistics: when the number of data points is very small, and quite noisy, then trying a more complicated law (with more freely varying parameters) ends up fitting the noise in the data rather than the signal, and therefore ends up being a poor fit for new, out-of-sample data. The problem is dealt with in statistics using various goodness of fit tests and measures such as the Akaike information criterion, and it's dealt with in machine learning using a range of techniques such as cross-validation, regularization, and early stopping. These approaches can generally work well in situations where there is lots of data and lots of parameters. But in cases where there is very little data, it often makes sense to just manually select a simple model. The Gutenberg-Richter law has two parameters, and can be fit using a simple linear regression. There isn't enough information to reliably fit even modestly more complicated models, such as the characteristic earthquake models, and past attempts based on characteristic earthquakes failed in both directions (a predicted earthquake at Parkfield never materialized, and the probability of the 2011 Japan earthquake was underestimated by the model relative to the Gutenberg-Richter law).
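As a sketch of how simple the fit is: Gutenberg-Richter is an ordinary least-squares line through log10(frequency) versus magnitude. The magnitudes and counts below are made-up illustrative numbers, not real data:

```python
import math

# Hypothetical cumulative counts N(m): earthquakes per year of magnitude >= m
# in some region (illustrative numbers only, not real data)
magnitudes = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]
counts = [1000, 320, 100, 31, 10, 3]

# Gutenberg-Richter: log10 N(m) = a - b * m; fit a and b by least squares
ys = [math.log10(c) for c in counts]
n = len(magnitudes)
mx = sum(magnitudes) / n
my = sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(magnitudes, ys))
         / sum((x - mx) ** 2 for x in magnitudes))
b = -slope
a = my - slope * mx

# Out-of-sample extrapolation: expected yearly rate of magnitude >= 8 events,
# which this hypothetical region has never recorded
rate_m8 = 10 ** (a - b * 8.0)
print(b, rate_m8)
```

With these made-up counts, the fitted b comes out close to 1 (a typical empirical value), and the extrapolated rate for magnitude 8 or higher is roughly 0.1 per year, even though no such event appears in the data.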

Silver's chapter and other sources do describe some possibilities for short-term forecasting based on foreshocks and aftershocks, and seismic disturbances, but note considerable uncertainty.

The existence of time-independent forecasts for earthquakes has probably had major humanitarian benefits. Building codes and standards, in particular, can adapt to the probability of earthquakes. For instance, building standards are stricter in the San Francisco Bay Area than in other parts of the United States, partly because of the greater probability of earthquakes. Note also that Gutenberg-Richter does make out-of-sample predictions: it can use the frequency of low-intensity earthquakes to predict the frequency of high-intensity earthquakes, and therefore provide a time-independent forecast of such an earthquake in a region that may never have experienced one.

#2: Volcanic eruptions

Volcanoes are an easier case than earthquakes. Silver's book doesn't discuss them, but the Wikipedia article offers basic information. A few points:

  • Volcanic activity falls close to category #2: time-dependent forecasts can be made, albeit with considerable uncertainty.
  • Volcanic activity poses less immediate risk because fewer people live close to the regions where volcanoes typically erupt.
  • However, volcanic activity can affect regional and global climate for a few years (in the cooling direction), and might even shift the intercept of other long-term secular and cyclic trends in climate (the reason is that the dust particles released by volcanoes into the atmosphere reduce the extent to which solar radiation is absorbed). For instance, the 1991 Mount Pinatubo eruption is credited with making the next 1-2 years cooler than they would otherwise have been, masking the heating effect of a strong El Niño.

#3: Extreme weather events (lightning, hurricanes/cyclones, blizzards, tornadoes)

Forecasting for lightning and thunderstorms has improved quite a bit over the last century, and falls squarely within Category #3. In The Signal and the Noise, Nate Silver notes that the probability of an American dying from lightning has dropped from 1 in 400,000 in 1940 to 1 in 11,000,000 today, and a large part of the credit goes to better weather forecasting causing people to avoid the outdoors at the times and places that lightning might strike.

Forecasting for hurricanes and cyclones (which are the same weather phenomenon, just at different latitudes) is quite good, and getting better. It falls squarely in category #3: in addition to having general probability estimates of the likelihood of particular types of extreme weather events, we can forecast them a day or a few days in advance, allowing for preparation and minimization of negative impact.

The precision of forecasts for the eye of the storm has increased about 3.5-fold in length terms (so about 12-fold in area terms) over the last 25 years. Nate Silver notes that 25 years ago, the National Hurricane Center's forecasts for where a hurricane would hit on landfall, made three days in advance, were 350 miles off on average. Now they're about 100 miles off on average. Most of the major hurricanes that hit the United States, and many other parts of the world, were forecast well in advance, and people even made preparations (for instance, by declaring holidays, or stocking up on goods). Blizzard forecasting is also fairly impressive: I was in Chicago in 2011 when a blizzard hit, and it had been forecast at least a day in advance. With tornadoes, warning alerts are often issued, though the tornado often doesn't actually touch down even after the alert is issued (fortunately for us).

See also my posts on weather forecasting and climate forecasting.

#4: Major terrorist acts

Terrorist attacks are interesting. It has been claimed that the frequency-damage relationship for terrorist attacks follows a power law. The academic paper that popularized this observation is a paper by Aaron Clauset, Maxwell Young and Kristian Gleditsch titled "On the Frequency of Severe Terrorist Attacks" (Journal of Conflict Resolution 51(1), 58 - 88 (2007)), here. Bruce Schneier wrote a blog post about a later paper by Clauset and Frederick W. Wiegel, and see also more discussion here, here, here, and here (I didn't select these links through a very discerning process; I just picked the top results of a Google Search).

Silver's book discusses power laws for terrorism (I originally wrote that I couldn't find any reference to Clauset in the book, but my Kindle search was buggy) and says the following about Clauset:

Clauset’s insight, however, is actually quite simple— or at least it seems that way with the benefit of hindsight. What his work found is that the mathematics of terrorism resemble those of another domain discussed in this book: earthquakes.

Imagine that you live in a seismically active area like California. Over a period of a couple of decades, you experience magnitude 4 earthquakes on a regular basis, magnitude 5 earthquakes perhaps a few times a year, and a handful of magnitude 6s. If you have a house that can withstand a magnitude 6 earthquake but not a magnitude 7, would it be right to conclude that you have nothing to worry about?

Of course not. According to the power-law distribution that these earthquakes obey, those magnitude 5s and magnitude 6s would have been a sign that larger earthquakes were possible—inevitable, in fact, given enough time. The big one is coming, eventually. You ought to have been prepared.

Terror attacks behave in something of the same way. The Lockerbie bombing and Oklahoma City were the equivalent of magnitude 7 earthquakes. While destructive enough on their own, they also implied the potential for something much worse— something like the September 11 attacks, which might be thought of as a magnitude 8. It was not an outlier but instead part of the broader mathematical pattern.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 427-428). Penguin Group US. Kindle Edition.

So terrorist attacks are at least in category 1. What about categories 2 and 3? Can we forecast terrorist attacks the way we can forecast volcanoes, or the way we can forecast hurricanes? One difference between terrorist acts and the "acts of God" discussed so far is that to the extent one has inside information about a terrorist attack that's good enough to predict it with high accuracy, that information is usually also sufficient to actually prevent the attack. So Category 3 becomes trickier to define: should we count the numerous foiled terrorist plots as evidence that terrorist acts can be successfully "predicted," or should we count only the attacks that were actually carried out? Another complication is that terrorist acts are responsive to geopolitical decisions in ways that earthquakes are definitely not, with extreme weather events falling somewhere in between.

As for Category 2, the evidence is unclear, but it's highly likely that terrorist acts can be forecast in a time-dependent fashion to quite a degree. If you want to crunch the numbers yourself, the Global Terrorism Database (website, Wikipedia) and Suicide Attack Database (website, Wikipedia) are available for you to use. I discussed some general issues with political and conflict forecasting in my earlier post on the subject.

UPDATE: Clauset emailed me with some corrections to this section of the post, which I have made. He also pointed to a recent paper he co-wrote with Ryan Woodard about estimating the historical and future probabilities of terror events, available on the arXiv. Here's the abstract:

Quantities with right-skewed distributions are ubiquitous in complex social systems, including political conflict, economics and social networks, and these systems sometimes produce extremely large events. For instance, the 9/11 terrorist events produced nearly 3000 fatalities, nearly six times more than the next largest event. But, was this enormous loss of life statistically unlikely given modern terrorism's historical record? Accurately estimating the probability of such an event is complicated by the large fluctuations in the empirical distribution's upper tail. We present a generic statistical algorithm for making such estimates, which combines semi-parametric models of tail behavior and a nonparametric bootstrap. Applied to a global database of terrorist events, we estimate the worldwide historical probability of observing at least one 9/11-sized or larger event since 1968 to be 11-35%. These results are robust to conditioning on global variations in economic development, domestic versus international events, the type of weapon used and a truncated history that stops at 1998. We then use this procedure to make a data-driven statistical forecast of at least one similar event over the next decade.

#5: Power outages

Power outages could have many causes. Note that insofar as we can forecast the phenomena underlying the causes, this can be used to reduce, rather than simply forecast, power outages.

  • Poor load forecasting, i.e., electricity companies failing to forecast how much demand there will be and therefore not preparing supplies adequately. This is less of an issue in developed countries, where power systems are more redundant (at some cost to efficiency). Note that here the outage results from the failure of a more mundane forecasting exercise, so forecasting the frequency of outages from this cause is basically an exercise in calibrating the quality of that mundane forecasting.
  • Abrupt or significant shortages in fuel, often for geopolitical reasons. This therefore ties in with the general exercise of geopolitical forecasting (see my earlier post on the subject). It seems rare in the modern world, due to the considerable redundancy built into global fuel supplies.
  • Disruption of power lines or power supply units due to weather events. The most common causes appear to be lightning, ice, wind, rain, and flooding. This ties in with #3, and with my weather forecasting and climate forecasting posts. This is the most common cause of power outages in developed countries with advanced electricity grids (see, for instance, here and here).
  • Disruption by human or animal activity, including car accidents and animals climbing onto and playing with the power lines.
  • Perhaps the most niche source of power outages, one that many people may be unaware of, is geomagnetic storms (Wikipedia). These are low-frequency, low-probability events, but they can cause major blackouts with potentially severe negative impact. Geomagnetic storms were discussed in past MIRI posts (here and here).

My impression is that when it comes to power outages, we are at Category 2 in forecasting. Load forecasting can identify seasons, times of the day, and special occasions when power demand will be high. Note that the infrastructure needs to be built for peak capacity.

We can't quite be in Category 3, because in cases where we can forecast more finely, we could probably prevent the outage anyway.

What sort of preventive measures do people undertake with knowledge of the frequency of power outages? In places where power outages are more likely, people are more likely to have backup generators. People may be more likely to use battery-powered devices. If you know that a power outage is likely to happen in the next few days, you might take more care to charge the batteries on your devices.

#6: Server outages

In our increasingly connected world, websites going down can have a huge effect on the functioning of the Internet and of the world economy. As with power infrastructure, the complexity of server infrastructure needed to increase uptime increases very quickly. The point is that routing around failures at different points in the infrastructure requires redundancy. For instance, if any one server fails 10% of the time, and the failures of different components are independent, you'd need two servers to get to a 1% failure rate. But in practice, the failures aren't independent. For instance, having loads of servers in a single datacenter covers the risk of any given server there crashing, but it doesn't cover the risk of the datacenter itself getting disconnected (e.g., losing electricity, or getting disconnected from the Internet, or catching fire). So now we need multiple datacenters. But multiple datacenters are far from each other, so that increases the time costs of synchronization. And so on. For more detailed discussions of the issues, see here and here.
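The redundancy arithmetic in this paragraph can be sketched directly (the 10%-per-server figure is the one from the text; the 0.5% datacenter figure is a made-up illustration, and real failure correlations are messier still):

```python
def independent_downtime(p_server, n_replicas):
    """Probability that all n independent replicas are down simultaneously."""
    return p_server ** n_replicas

def single_datacenter_downtime(p_server, n_replicas, p_datacenter):
    """With all replicas in one datacenter, the datacenter itself is a single
    point of failure: its outage probability is a floor that no amount of
    in-datacenter replication can push below."""
    return p_datacenter + (1 - p_datacenter) * p_server ** n_replicas

# Each server down 10% of the time: two independent replicas -> 1% downtime
print(independent_downtime(0.10, 2))
# But if the shared datacenter is itself down 0.5% of the time, even ten
# replicas inside it can't get total downtime below ~0.5%
print(single_datacenter_downtime(0.10, 10, 0.005))
```

The floor imposed by the shared datacenter is why the next step is multiple datacenters, with the synchronization costs described above.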

My impression is that server outages are largely Category 1: we can use the probability of outages to determine the trade-off between the cost of having redundant infrastructure and the benefit of more uptime. There is an element of Category 2: in some cases, we have knowledge that traffic will be higher at specific times, and additional infrastructure can be brought to bear for those times. As with power infrastructure, server infrastructure needs to be built to handle peak capacity.

#7: Financial crises

The forecasting of financial crises is a topic worthy of its own post. As with climate science, financial crisis forecasting has the potential for heavy politicization, given the huge stakes both of forecasting financial crises and of any remedial or preventative measures that may be undertaken. In fact, the politicization and ideology problem is probably substantially worse in financial crisis forecasting. At the same time, real-world feedback occurs faster, providing more opportunity for people to update their beliefs and less scope for people getting away with sloppiness because their predictions take too long to evaluate.

Taken literally, a strong version of the efficient market hypothesis (EMH) (Wikipedia) would suggest that financial crises are almost impossible to forecast, while a weaker reading of the EMH would suggest that the financial market is efficient (Wikipedia) in the sense that it's hard to make money off the business of forecasting financial crises (for instance, you may know with high probability that a financial crisis is imminent, but the element of uncertainty, particularly with regard to timing, can destroy your ability to leverage that information to make money). On the other hand, there are a lot of people, often subscribed to competing schools of economic thought, who successfully forecast the 2007-08 financial crisis, at least in broad strokes.

Note that there are people who reject the EMH, yet claim that financial crises are very hard to forecast in a time-dependent fashion. Among them is Nassim Nicholas Taleb, as described here. Interestingly, Taleb's claim to fame appears to have been that he forecast the 2007-08 financial crisis, though it was more of a time-independent forecast than a specific timed call. The irony was noted by Jamie Whyte here in Standpoint Magazine.

I found a few sources of information on financial crises, which are discussed below.

Economic Predictions records predictions made by many prominent people and how they compared to what transpired. In particular, this page on their website notes how many top investors, economists, and bureaucrats missed the financial crisis, but also identifies some exceptions: Dean Baker, Med Jones, Peter Schiff, and Nouriel Roubini. The page also discusses other candidates who claim to have forecast the crisis in advance, and reasons why they were not included. While I think they've put a fair amount of effort into their project, I didn't see good evidence that they have a strong grasp of the underlying fundamental issues they are discussing.

An insightful general overview of the financial crisis is found in Chapter 1 of Nate Silver's The Signal and the Noise, a book that I recommend you read in its entirety. Silver notes four levels of forecasting failure.

  • The housing bubble can be thought of as a poor prediction. Homeowners and investors thought that rising prices implied that home values would continue to rise, when in fact history suggested this made them prone to decline.
  • There was a failure on the part of the ratings agencies, as well as by banks like Lehman Brothers, to understand how risky mortgage-backed securities were. Contrary to the assertions they made before Congress, the problem was not that the ratings agencies failed to see the housing bubble. Instead, their forecasting models were full of faulty assumptions and false confidence about the risk that a collapse in housing prices might present.
  • There was a widespread failure to anticipate how a housing crisis could trigger a global financial crisis. It had resulted from the high degree of leverage in the market, with $50 in side bets staked on every $1 that an American was willing to invest in a new home.
  • Finally, in the immediate aftermath of the financial crisis, there was a failure to predict the scope of the economic problems that it might create. Economists and policy makers did not heed Reinhart and Rogoff’s finding that financial crises typically produce very deep and long-lasting recessions.


Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 42-43). Penguin Group US. Kindle Edition.

Silver finds a common thread among all the failures (emphases in original):

There is a common thread among these failures of prediction. In each case, as people evaluated the data, they ignored a key piece of context:

  • The confidence that homeowners had about housing prices may have stemmed from the fact that there had not been a substantial decline in U.S. housing prices in the recent past. However, there had never before been such a widespread increase in U.S. housing prices like the one that preceded the collapse.
  • The confidence that the banks had in Moody’s and S&P’s ability to rate mortgage-backed securities may have been based on the fact that the agencies had generally performed competently in rating other types of financial assets. However, the ratings agencies had never before rated securities as novel and complex as credit default options.
  • The confidence that economists had in the ability of the financial system to withstand a housing crisis may have arisen because housing price fluctuations had generally not had large effects on the financial system in the past. However, the financial system had probably never been so highly leveraged, and it had certainly never made so many side bets on housing before.
  • The confidence that policy makers had in the ability of the economy to recuperate quickly from the financial crisis may have come from their experience of recent recessions, most of which had been associated with rapid, “V-shaped” recoveries. However, those recessions had not been associated with financial crises, and financial crises are different.

There is a technical term for this type of problem: the events these forecasters were considering were out of sample. When there is a major failure of prediction, this problem usually has its fingerprints all over the crime scene.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (p. 43). Penguin Group US. Kindle Edition.

While I find Silver's analysis plausible and generally convincing, I don't think I have enough of an inside-view understanding of the issue.

A few other resources that I found, but didn't get a chance to investigate, are listed below:

#8: Pandemics

I haven't investigated this thoroughly, but here are a few of my impressions and findings:

  • I think that pandemics stand in relation to ordinary epidemiology in the same way that extreme weather events stand in relation to ordinary weather forecasting. In both cases, the main way we can get better at forecasting the rare and high-impact events is by getting better across the board. One difference, though, makes the relationship between moderate disease outbreaks and pandemics even more important than the corresponding relationship for weather: measures taken quickly in reaction to local disease outbreaks can help prevent global pandemics.
  • Chapter 7 of Nate Silver's The Signal and the Noise, titled "Role Models", discusses forecasting and prediction in the domain of epidemiology. The goal of epidemiologists is to obtain predictive models with a level of accuracy and precision similar to those used for the weather. However, the greater complexity of human behavior, as well as the self-fulfilling and self-canceling nature of various predictions, makes the modeling problem harder. Silver notes that agent-based modeling (Wikipedia) is one of the commonly used tools. Silver cites a few examples from recent history where people were overly alarmed about possible pandemics, when the reality turned out to be considerably milder. However, the precautions taken due to the alarm may still have saved lives. Silver talks in particular of the 1976 swine flu outbreak (where the reaction turned out to be grossly disproportionate to the problem, and caused its own unintended consequences) and the 2009 flu pandemic.
  • In recent years, Google Flu Trends (website, Wikipedia) has been a widely used tool for identifying and taking quick action against the flu. Essentially, Google uses the volume of web searches for flu-related terms, by geographic location, to estimate the incidence of the flu in each location. This offers an early "leading indicator" of flu incidence compared to official reports, which are published after a time lag. However, Google Flu Trends has run into reliability problems: news stories about the flu might prompt people to search for flu-related terms even if they aren't experiencing symptoms of the flu. It may even be that Google's own helpful search query completions lead people to search for flu-related terms once others start searching for them. Tim Harford discusses the problems in the Financial Times here. I think Silver doesn't discuss this (which is a surprise, since it would have fit well with the theme of his chapter).
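As a rough illustration of the agent-based approach Silver mentions, here is a minimal SIR-style simulation in Python. All parameters (population size, transmission and recovery probabilities, contact rate) are made up for illustration; real epidemiological models track far more structure, such as geography, age, and behavior change.

```python
import random

def simulate_sir(n_agents=1000, initial_infected=5, p_transmit=0.03,
                 contacts_per_day=10, p_recover=0.1, days=100, seed=0):
    """Toy agent-based SIR epidemic: each day, every infected agent meets
    a few random agents, possibly infecting susceptible ones, and may
    then recover. Returns the daily count of infected agents."""
    rng = random.Random(seed)
    # 'S' = susceptible, 'I' = infected, 'R' = recovered
    state = ['I'] * initial_infected + ['S'] * (n_agents - initial_infected)
    history = []
    for _ in range(days):
        infected_today = [i for i, s in enumerate(state) if s == 'I']
        for i in infected_today:
            for _ in range(contacts_per_day):
                j = rng.randrange(n_agents)
                if state[j] == 'S' and rng.random() < p_transmit:
                    state[j] = 'I'
            if rng.random() < p_recover:
                state[i] = 'R'
        history.append(state.count('I'))
    return history

curve = simulate_sir()
print("peak infected:", max(curve), "on day", curve.index(max(curve)) + 1)
```

Even in this crude form, the model exhibits the qualitative behavior forecasters care about: a threshold (roughly, contacts × transmission probability ÷ recovery probability) above which a handful of cases grows into an outbreak, and below which it fizzles.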

#9: Near-earth object impacts

I haven't looked into this category in sufficient detail. I'll list below the articles I read.

Comment author: buybuydandavis 11 July 2014 03:07:20AM *  1 point [-]

Is this what it comes down to, that Gore refused to bet, so they presumed to make a pretend bet for him?

Boo. Lame. Worse than lame. Deceptive. (On their part.)

Tell me it aint so.

http://www.theclimatebet.com/?p=206&cpage=1#comment-229

“Now, assume that Armstrong and Gore made a gentleman’s bet (no money) and that the ten years of the bet started on January 1, 2008. Armstrong’s forecast was that there would be no change in global mean temperature over the next ten years. Gore did not specify a method or a forecast. Nor did searches of his book or the Internet reveal any quantitative forecasts or any methodology that he relied on. He did, however, imply that the global mean temperature would increase at a rapid rate – presumably at least as great as the IPCC’s 1992 projection of 0.03°C-per-year. Thus, the IPCC’s 1992 projection is used as Gore’s forecast.

Comment author: VipulNaik 11 July 2014 03:11:38AM 0 points [-]

The full correspondence is here:

http://www.theclimatebet.com/?page_id=4

Maybe it's lame (?) but I don't think they're being deceptive -- they're quite explicit that Gore refused to bet.

The fact that he refused to bet could be interpreted either as evidence that the bet was badly designed and didn't reflect the fundamental point of disagreement between Gore and Armstrong, or as evidence that Gore was unwilling to put his money where his mouth is.

I'm not sure what interpretation to take.

btw, here's a bet that was actually properly entered into by both parties (neither of them a climate scientist):

http://econlog.econlib.org/archives/2014/06/bauman_climate.html

[QUESTION]: Driverless car forecasts

7 VipulNaik 11 July 2014 12:25AM

Of the technologies that have a reasonable chance of coming to mass market in the next 20-25 years and having a significant impact on human society, driverless cars (also known as self-driving cars or autonomous cars) stand out. I was originally planning to collect material discussing driverless cars, but Gwern has a really excellent compendium of statements about driverless cars, published January 2013 (if you're reading this, Gwern, thanks!). There have been a few developments since then (for instance, Google's announcement that it was building its own driverless car, or a startup called Cruise Automation planning to build a $10,000 driverless car) but the overall landscape remains similar. There's been some progress with understanding and navigating city streets and with handling adverse weather conditions, and the technology is more or less on schedule.

My question is about driverless car forecasts. Driverless Future has a good summary page of forecasts made by automobile manufacturers, insurers, and professional societies. The projected arrival dates for the first commercial driverless cars range from 2018 to 2030. The timelines for driverless cars to achieve mass penetration are similarly staggered, from the early 2020s to 2040. (The forecasts aren't all directly comparable.)

A few thoughts come to mind:

  1. Insurer societies and professional societies seem more conservative in their estimates than manufacturers (both automobile manufacturers and makers of driverless car technology). Note that many manufacturers' estimates are centered on the projected release dates of their own driverless cars. This suggests an obvious conflict of interest: manufacturers may be incentivized to be optimistic about when driverless cars will be released, insofar as optimistic predictions win them news coverage and might also improve their market valuations. (At the same time, the release dates are sufficiently far in the future that manufacturers are unlikely to be held to account for wrong projections, so there isn't a strong incentive to be conservative, the way there is with quarterly sales and earnings forecasts.) Overall, then, I'd defer more to the judgment of the professional societies, namely the IEEE and the Society of Automotive Engineers.
  2. The statements compiled by Gwern point to the many legal hurdles and other thorny issues of ethics that would need to be resolved, at least partially, before driverless cars start becoming a big presence in the market.
  3. The general critique made by Schnaars in Megamistakes (that I discussed here) applies to driverless car technology: consumers may be unwilling to pay the added cost despite the safety benefits. Some of the quotes in Gwern's compendium reference related issues. This points further in the direction of forecasts by manufacturers being overly optimistic.

Questions for the people here:

  • Do you agree with my points (1)-(3) above?
  • Would you care to make forecasts for things such as: (a) the date that the first commercial driverless car will hit the market in a major country or US state, (b) the date by which over 10% of new cars sold in a large country or US state will be driverless (i.e., capable of fully autonomous operation), (c) same as (b), but over 50%, (d) the date by which over 10% of cars on the road (in a large country or US state) will be operating autonomously, (e) same as (d), but over 50%? You don't have to answer these exact questions; I'm just providing some suggestions, since "forecast the future of driverless cars" is overly vague.
  • What's your overall view on whether it is desirable at the margin to speed up or slow down the arrival of autonomous vehicles on the road? What factors would you consider in answering such a question?
Comment author: buybuydandavis 09 July 2014 09:28:23PM 1 point [-]

I hope some high profile people start challenging big talkers with public bets. Put up or shut up, publicly.

Comment author: VipulNaik 10 July 2014 03:08:05PM 1 point [-]

Have you looked at http://www.theclimatebet.com (mentioned in an UPDATE at the end of Critique #1 in my post)?

Comment author: Emile 10 July 2014 06:05:29AM 0 points [-]

(Your quote is mangled, you probably have four spaces at the beginning which makes the rendering engine interpret it as a needing to be formatted like code, i.e. No linebreaks)

Comment author: VipulNaik 10 July 2014 06:17:00AM 2 points [-]

Thanks, fixed!

Comment author: bramflakes 09 July 2014 09:48:31PM 3 points [-]

When you're done with this sequence, you should really make a summary post in Main laying out links to them in order, along with brief descriptions of each. I'd hate to see these posts disappear into the abyss of old open threads and links.

Comment author: VipulNaik 09 July 2014 09:55:28PM *  2 points [-]

Thanks for both the appreciation and the suggestion.

I intend to do a concluding post on the MIRI blog, linking to all of these; if Luke agrees, I can cross-post that to LessWrong and accompany that with a full listing of blog posts.

I'll also put a list of all my posts on my personal website later on.

Comment author: Lumifer 09 July 2014 08:56:43PM 3 points [-]

A big hole in your list is forecasting of financial markets which is highly lucrative (when it works) and so attracts a considerable amount of effort and talent.

Comment author: VipulNaik 09 July 2014 09:35:40PM 2 points [-]

Good point. I'd looked at financial market forecasting along with macroeconomic forecasting, when I was investigating survey-based macroeconomic forecasting. I have some of the collected material, but I don't think I ever wrote it up. Thanks for reminding me! I'll add it to this post later.

Comment author: drnickbone 09 July 2014 07:27:24PM *  1 point [-]

Thanks for a comprehensive summary - that was helpful.

It seems that A&G contacted the working scientists to identify papers which (in the scientists' view) contained the most credible climate forecasts. Not many responded, but 30 referred to the recent (at the time) IPCC WG1 report, which in turn referenced and attempted to summarize over 700 primary papers. There also appear to have been a bunch of other papers cited by the surveyed scientists, but the site has lost them. So we're somewhat at a loss to decide which primary sources climate scientists find most credible/authoritative. (Which is a pity, because those would be worth rating, surely?)

However, A&G did their rating/scoring on the IPCC WG1, Chapter 8. But they didn't contact the climate scientists to help with this rating (or they did, but none of them answered?) They didn't attempt to dig into the 700 or so underlying primary papers, identify which of them contained climate forecasts, and/or had been identified by the scientists as containing the most credible forecasts and then rate those. Or even pick a random sample, and rate those? All that does sound just a tad superficial.

What I find really bizarre is their site's conclusion that because IPCC got a low score by their preferred rating principles, then a "no change" forecast is superior, and more credible! That's really strange, since "no change" has historically done much worse as a predictor than any of the IPCC models.

Comment author: VipulNaik 09 July 2014 08:24:56PM 0 points [-]

Actually, it's somewhat unclear whether the IPCC scenarios did better than a "no change" model -- they certainly did over the short time period, but perhaps not over longer time periods in which temperatures moved in other directions.

Co-author Green later wrote a paper claiming that the IPCC models did not do better than the no change model when tested over a broader time period:

http://www.kestencgreen.com/gas-improvements.pdf

But it's just a draft paper and I don't know if the author ever plans to clean it up or have it published.

I would really like to see more calibrations and scorings of the models from a pure outside view approach over longer time periods.

Armstrong was (perhaps wrongly) confident enough of his views that he decided to make a public bet claiming that the No Change scenario would beat out the other scenario. The bet is described at:

http://www.theclimatebet.com/

Overall, I have high confidence in the view that models informed by some knowledge of climate should beat the No Change model, though a lot depends on the details of how the competition is framed (Armstrong's climate bet may have been rigged in favor of No Change). That said, it's not clear how well complex climate models can do relative to simple time series forecasting approaches, or relative to simple (linear trend from radiative forcing + cyclic trend from ocean currents) approaches. The number of independent out-of-sample validations does not seem to be large enough, and the marginal predictive power of complex models over simple curve-fitting models seems to be low (possibly negative). So I think that arguments of the form "our most complex, sophisticated models show X" should be treated with suspicion, and should not necessarily be given more credence than arguments that rely on simple models and historical observations.
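As a toy sketch of how a No Change forecast might be pitted against a simple trend model, consider the following comparison on a synthetic series (the trend, noise level, and window sizes are all made up; this is not real temperature data, and the outcome depends heavily on how much trend there is relative to noise):

```python
import random

# Synthetic series: gentle upward trend plus noise (invented numbers).
rng = random.Random(42)
series = [0.01 * t + rng.gauss(0, 0.1) for t in range(60)]
train, test = series[:40], series[40:]

# "No change": forecast every future value as the last observed value.
no_change_errors = [abs(y - train[-1]) for y in test]

# Simple trend: ordinary least squares line fit on the training
# window, extrapolated forward into the test window.
n = len(train)
xbar = (n - 1) / 2
ybar = sum(train) / n
slope = (sum((i - xbar) * (y - ybar) for i, y in enumerate(train))
         / sum((i - xbar) ** 2 for i in range(n)))
intercept = ybar - slope * xbar
trend_errors = [abs(y - (intercept + slope * (n + i)))
                for i, y in enumerate(test)]

def mae(errors):
    """Mean absolute error of a list of forecast errors."""
    return sum(errors) / len(errors)

print("no-change MAE:", round(mae(no_change_errors), 3))
print("trend MAE:", round(mae(trend_errors), 3))
```

The point of the framing question is visible even here: shrink the trend or shorten the horizon and No Change becomes hard to beat, which is one reason the terms of a bet like Armstrong's matter so much.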


Comment author: VipulNaik 09 July 2014 08:08:29PM *  2 points [-]

See the last sentence in my longer quote:

We sent out general calls for experts to use the Forecasting Audit Software to conduct their own audits and we also asked a few individuals to do so. At the time of writing, none have done so.

It's not clear how much effort they put into this step, and whether e.g. they offered the Forecasting Audit Software for free to people they asked (if they were trying to sell the software, which they themselves created, that might have seemed bad).

My guess is that most of the climate scientists they contacted just labeled them mentally along with the numerous "cranks" they usually have to deal with, and didn't bother engaging.

I also am skeptical of some aspects of Armstrong and Green's exercise. But a first outside-view analysis that doesn't receive much useful engagement from insiders can only go so far. What would have been interesting was if, after Armstrong and Green published their analysis and it was somewhat clear that their critique would receive attention, climate scientists had offered a clearer and more direct response to the specific criticisms, and perhaps even read up more about the forecasting principles and the evidence cited for them. I don't think all climate scientists should have done so, I just think at least a few should have been interested enough to do it. Even something similar to Nate Silver's response would have been nice. And maybe that did happen -- if so, I'd like to see links. Schmidt's response, on the other hand, seems downright careless and bad.

My focus here is the critique of insularity, not so much the effect it had on the factual conclusions. Basically, did climate scientists carefully consider forecasting principles (or statistical methods, or software engineering principles) then reject them? Had they never heard of the relevant principles? Did they hear about the principles, but dismiss them as unworthy of investigation? Armstrong and Green's audit may have been sloppy (though perhaps a first pass shouldn't be expected to be better than sloppy) but even if the audit itself wasn't much use, did it raise questions or general directions of inquiry worthy of investigation (or a simple response pointing to past investigation)? Schmidt's reaction seems evidence in favor of the dismissal hypothesis. And in the particular instance, maybe he was right, but it does seem to fit the general idea of insularity.

View more: Prev | Next