Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Claim: Scenario planning is preferable to quantitative forecasting for understanding and coping with AI progress

0 VipulNaik 25 July 2014 03:43AM

As part of my work for MIRI on forecasting, I'm considering the implications of what I've read up for the case of thinking about AI. My purpose isn't to actually come to concrete conclusions about AI progress, but more to provide insight into what approaches are more promising and what approaches are less promising for thinking about AI progress.

I've written a post on general-purpose forecasting and another post on scenario analysis. In a recent post, I considered scenario analyses for technological progress. I've also looked at many domains of forecasting and at forecasting rare events. With the knowledge I've accumulated, I've shifted in the direction of viewing scenario analysis as a more promising tool than timeline-driven quantitative forecasting for understanding AI and its implications.

I'll first summarize what I mean by scenario analysis and quantitative forecasting in the AI context. People who have some prior knowledge of the terms can probably skim through the summary quickly. Those who find the summary insufficiently informative, or want to delve deeper, are urged to read my more detailed posts linked above and the references therein.

Quantitative forecasting and scenario analysis in the AI context

The two approaches I am comparing are:

  • Quantitative forecasting: Here, specific predictions or forecasts are made, recorded, and later tested against what actually transpired. The forecasts are made in a form where it's easy to score whether they happened. Probabilistic forecasts are also included. These are scored using one of the standard methods to score probabilistic forecasts (such as logarithmic scoring or quadratic scoring).
  • Scenario analysis: A number of scenarios of how the future might unfold are generated in considerable detail. Predetermined elements, common to the scenario, are combined with critical uncertainties, that vary between the scenarios. Early indicators that help determine which scenario will transpire are identified. In many cases, the goal is to choose strategies that are robust to all scenarios. For more, read my post on scenario analysis.

Quantitative forecasts are easier to score for accuracy, and in particular offer greater scope for falsification. This has perhaps attracted rationalists more to quantitative forecasting, as a way of distinguishing themselves from what appears to be the more wishy-washy realm of unfalsifiable scenario analysis. In this post, I argue that, given the considerable uncertainty surrounding progress in artificial intelligence, scenario analysis is a more apt tool.

There are probably some people on LessWrong who have high confidence in quantitative forecasts. I'm happy to make bets (financial or purely honorary) on such subjects. However, if you're claiming high certainty while I am claiming uncertainty, I do want to have odds in my favor (depending on how much confidence you express in your opinion), for reasons similar to those that Bryan Caplan described here.

Below, I describe my reasons for preferring scenario analysis to forecasting.

#1: Considerable uncertainty

Proponents of the view that AI is scheduled to arrive in a few decades typically cite computing advances such as Moore's law. However, there's considerable uncertainty even surrounding short-term computing advances, as I described in my scenario analyses for technological progress. When it comes to the question of progress in AI, we have to combine uncertainties in hardware progress with uncertainties in software progress.

Quantitative forecasting methods, such as trend extrapolation, tend to do reasonably well, and might be better than nothing. But they are not foolproof. In particular, the impending death of Moore's law, despite the trend staying quite robust for about 50 years, should make us cautious about too naive an extrapolation of trends. Arguably, simple trend extrapolation is still the best choice relative to other forecasting methods, at least as a general rule. But acknowledging uncertainty and considering multiple scenarios could prepare us a lot better for reality.

In a post in May 2013 titled When Will AI Be Created?, MIRI director Luke Muehlhauser (who later assigned me the forecasting project) looked at the wide range of beliefs about the time horizon for the arrival of human-level AI. Here's how Luke described the situation:

To explore these difficulties, let’s start with a 2009 bloggingheads.tv conversation between MIRI researcher Eliezer Yudkowsky and MIT computer scientist Scott Aaronson, author of the excellent Quantum Computing Since Democritus. Early in that dialogue, Yudkowsky asked:

It seems pretty obvious to me that at some point in [one to ten decades] we’re going to build an AI smart enough to improve itself, and [it will] “foom” upward in intelligence, and by the time it exhausts available avenues for improvement it will be a “superintelligence” [relative] to us. Do you feel this is obvious?

Aaronson replied:

The idea that we could build computers that are smarter than us… and that those computers could build still smarter computers… until we reach the physical limits of what kind of intelligence is possible… that we could build things that are to us as we are to ants — all of this is compatible with the laws of physics… and I can’t find a reason of principle that it couldn’t eventually come to pass…

The main thing we disagree about is the time scale… a few thousand years [before AI] seems more reasonable to me.

Those two estimates — several decades vs. “a few thousand years” — have wildly different policy implications.

After more discussion of AI forecasts as well as some general findings on forecasting, Luke continues:

Given these considerations, I think the most appropriate stance on the question “When will AI be created?” is something like this:

We can’t be confident AI will come in the next 30 years, and we can’t be confident it’ll take more than 100 years, and anyone who is confident of either claim is pretending to know too much.

How confident is “confident”? Let’s say 70%. That is, I think it is unreasonable to be 70% confident that AI is fewer than 30 years away, and I also think it’s unreasonable to be 70% confident that AI is more than 100 years away.

This statement admits my inability to predict AI, but it also constrains my probability distribution over “years of AI creation” quite a lot.

I think the considerations above justify these constraints on my probability distribution, but I haven’t spelled out my reasoning in great detail. That would require more analysis than I can present here. But I hope I’ve at least summarized the basic considerations on this topic, and those with different probability distributions than mine can now build on my work here to try to justify them.

I believe that in the face of this considerable uncertainty, considering multiple scenarios, and the implications of each scenario, can be quite helpful.

#2: Isn't scenario analysis unfalsifiable, and therefore unscientific? Why not aim for rigorous quantitative forecasting instead, that can be judged against reality?

First off, just because a forecast is quantitative doesn't mean it is actually rigorous. I think it's worthwhile to elicit and record quantitative forecasts. These can have high value for near-term horizons, and can provide a rough idea of the range of opinion for longer timescales.

However, simply phoning up experts to ask them for their timelines, or sending them an Internet survey, is not too useful. Tetlock's work, described in Muehlhauser's post and in my post on historical evaluations of forecasting, shows that unaided expert judgment has little value. Asking people who haven't thought through the issue to come up with numbers can give a fake sense of precision with little accuracy (and little genuine precision, either, if we consider the diverse range of responses from different experts). On the other hand, eliciting detailed scenarios from experts can force them to think more clearly about the issues and the relationships between them. Note that there are dangers to eliciting detailed scenarios: people may fall into their own make-believe world. But I think the trade-off with the uncertainty in quantitative forecasting still points in favor of scenario analysis.

Explicit quantitative forecasts can be helpful when people have an opportunity to learn from wrong forecasts and adjust their methodology accordingly. Therefore, I argue that if we want to go down the quantitative forecasting route, it's important to record forecasts about the near and medium future instead of or in addition to forecasts about the far future. Also, providing experts some historical information and feedback at the time they make their forecasts can help reduce the chances of them simply saying things without reflecting. Depending on the costs of recording forecasts, it may be worthwhile to do so anyway, even if we don't have high hopes that the forecasts will yield value. Broadly, I agree with Luke's suggestions:

  • Explicit quantification: “The best way to become a better-calibrated appraiser of long-term futures is to get in the habit of making quantitative probability estimates that can be objectively scored for accuracy over long stretches of time. Explicit quantification enables explicit accuracy feedback, which enables learning.”
  • Signposting the future: Thinking through specific scenarios can be useful if those scenarios “come with clear diagnostic signposts that policymakers can use to gauge whether they are moving toward or away from one scenario or another… Falsifiable hypotheses bring high-flying scenario abstractions back to Earth.”13
  • Leveraging aggregation: “the average forecast is often more accurate than the vast majority of the individual forecasts that went into computing the average…. [Forecasters] should also get into the habit that some of the better forecasters in [an IARPA forecasting tournament called ACE] have gotten into: comparing their predictions to group averages, weighted-averaging algorithms, prediction markets, and financial markets.” See Ungar et al. (2012) for some aggregation-leveraging results from the ACE tournament.

But I argue that the bulk of the effort should go into scenario generation and scenario analysis. Even here, the problem of absence of feedback is acute: we can design scenarios all we want for what will happen over the next century, but we can't afford to wait a century to know if our scenarios transpired. Therefore, it makes sense to break the scenario analysis exercises into chunks of 10-15 years. For instance, one scenario analysis could consider scenarios for the next 10-15 years. For each of the scenarios, we can have a separate scenario analysis exercise that considers scenarios for the 10-15 years after that. And so on. Note that the number of scenarios increases exponentially with the time horizon, but this is simply a reflection of the underlying complexity and uncertainty. In some cases, scenarios could "merge" at later times, as scenarios with slow early progress and fast later progress yield the same end result that scenario with fast early progress and slow later progress do.

#3: Evidence from other disciplines

Explicit quantitative forecasting is common in many disciplines, but the more we look at longer time horizons, and the more uncertainty we are dealing with, the more common scenario analysis becomes. I considered many examples of scenario analysis in my scenario analysis post. As you'll see from the list there, scenario analysis, and variants of it, have become influential in areas ranging from climate change (as seen in IPCC reports) to energy to macroeconomic and fiscal analysis to land use and transportation analysis. And big consulting companies such as McKinsey & Company use scenario analysis frequently in their reports.

It's of course possible to argue that the use of scenario analyses is a reflection of human failing: people don't want to make single forecasts because they are afraid of being proven wrong, or of contradicting other people's beliefs about the future. Or maybe people are shy of thinking quantitatively. I think there is some truth to such a critique. But until we have human-level AI, we have to rely on the failure-prone humans for input on the question of AI progress. Perhaps scenario analysis is superior to quantitative forecasting because humans are insufficiently rational, but to the extent it's superior, it's superior.

Addendum: What are the already existing scenario analyses for artificial intelligence?

I had a brief discussion with Luke Muehlhauser and some of the names below were suggested by him, but I didn't run the final list by him. All responsibility for errors is mine.

To my knowledge (and to the knowledge of people I've talked to) there are no formal scenario analyses of Artificial General Intelligence structured in a manner similar to the standard examples of scenario analyses. However, if scenario analysis is construed sufficiently loosely as a discussion of various predetermined elements and critical uncertainties and a brief mention of different possible scenarios, then we can list a few scenario analyses:

  • Nick Bostrom's book Superintelligence (released in the UK and on Kindle, but not released as a print book in the US at the time of this writing) discusses several scenarios for paths to AGI.
  • Eliezer Yudkowsky's report on Intelligence Explosion Microeconomics (93 pages, direct PDF link) can be construed as an analysis of AI scenarios.
  • Robin Hanson's forthcoming book on em economics discusses one future scenario that is somewhat related to AI progress.
  • The Hanson-Yudkowsky AI Foom debate includes a discussion of many scenarios.

The above are scenario analyses for the eventual properties and behavior of an artificial general intelligence, rather than scenario analyses for the immediate future. The work of Ray Kurzwzeil can be thought of as a scenario analysis that lays out an explicit timeline from now to the arrival of AGI.

[QUESTION]: Looking for insights from machine learning that helped improve state-of-the-art human thinking

1 VipulNaik 25 July 2014 02:10AM

This question is a follow-up of sorts to my earlier question on academic social science and machine learning.

Machine learning algorithms are used for a wide range of prediction tasks, including binary (yes/no) prediction and prediction of continuous variables. For binary prediction, common models include logistic regression, support vector machines, neural networks, and decision trees and forests.

Now, I do know that methods such as linear and logistic regression, and other regression-type techniques, are used extensively in science and social science research. Some of this research looks at the coefficients of such a model and then re-interprets them.

I'm interesting in examples where knowledge of the insides of other machine learning techniques (i.e., knowledge of the parameters for which the models perform well) has helped provide insights that are of direct human value, or perhaps even directly improved unaided human ability. In my earlier post, I linked to an example (courtesy Sebastian Kwiatkowski) where the results of  naive Bayes and SVM classifiers for hotel reviews could be translated into human-understandable terms (namely, reviews that mentioned physical aspects of the hotel, such as "small bedroom", were more likely to be truthful than reviews that talked about the reasons for the visit or the company that sponsored the visit).

PS: Here's a very quick description of how these supervised learning algorithms work. We first postulate a functional form that describes how the output depends on the input. For instance, the functional form in the case of logistic regression outputs the probability as the logistic function applied to a linear combination of the inputs (features). The functional form has a number of unknown parameters. Specific values of the parameters give specific functions that can be used to make predictions. Our goal is to find the parameter values.

We use a huge amount of labeled training data, plus a cost function (which itself typically arises from a statistical model for the nature of the error distribution) to find the parameter values. In the crudest form, this is purely a multivariable calculus optimization problem: choose parameters so that the total error function between the predicted function values and the observed function values is as small as possible. There are a few complications that need to be addressed to get to working algorithms.

So what makes machine learning problems hard? There are a few choice points:

  1. Feature selection: Figuring out the inputs (features) to use in predicting the outputs.
  2. Selection of the functional form model
  3. Selection of the cost function (error function)
  4. Selection of the algorithmic approach used to optimize the cost function, addressing the issue of overfitting through appropriate methods such as regularization and early stopping.

Of these steps, (1) is really the only step that is somewhat customized by domain, but even here, when we have enough data, it's more common to just throw in lots of features and see which ones actually help with prediction (in a regression model, the features that have predictive power will have nonzero coefficients in front of them, and removing them will increase the overall error of the model). (2) and (3) are mostly standardized, with our choice really being between a small number of differently flavored models (logistic regression, neural networks, etc.). (4) is the part where much of the machine learning research is concentrated: figuring out newer and better algorithms to find (approximate) solutions to the optimization problems for particular mathematical structures of the data.


Intuitive cooperation

3 Adele_L 25 July 2014 01:48AM

This is an exposition of some of the main ideas in the paper Robust Cooperation. My goal is to make the ideas and proofs seem natural and intuitive - instead of some mysterious thing where we invoke Löb's theorem at the right place and the agents magically cooperate. Also I hope it is accessible to people without a math or CS background. Be warned, it is pretty cheesy ok.



In a small quirky town, far away from other cities or towns, the most exciting event is a game called (for historical reasons) The Prisoner's Dilemma. Everyone comes out to watch the big tournament at the end of Summer, and you (Alice) are especially excited because this year it will be your first time playing in the tournament! So you've been thinking of ways to make sure that you can do well.


The way the game works is this: Each player can choose to cooperate or defect with the other player. If you both cooperate, then you get two points each. If one of you defects, then that player will get three points, and the other player won't get any points. But if you both defect, then you each get only one point. You have to make your decisions separately, without communicating with each other - however, everyone is required to register the algorithm they will be using before the tournament, and you can look at the other player's algorithm if you want to. You also are allowed to use some outside help in your algorithm. 

Now if you were a newcomer, you might think that no matter what the other player does, you can always do better by defecting. So the best strategy must be to always defect! Of course, you know better, if everyone tried that strategy, then they would end up defecting against each other, which is a shame since they would both be better off if they had just cooperated. 

But how can you do better? You have to be able to describe your algorithm in order to play. You have a few ideas, and you'll be playing some practice rounds with your friend Bob soon, so you can try them out before the actual tournament. 

Your first plan:

I'll cooperate with Bob if I can tell from his algorithm that he'll cooperate with me. Otherwise I'll defect. 

For your first try, you'll just run Bob's algorithm and see if he cooperates. But there's a problem - if Bob tries the same strategy, he'll have to run your algorithm, which will run his algorithm again, and so on into an infinite loop!

So you'll have to be a bit more clever than that... luckily you know a guy, Shady, who is good at these kinds of problems. 



You call up Shady, and while you are waiting for him to come over, you remember some advice your dad Löb gave you. 

(Löb's theorem) "If someone says you can trust them on X, well then they'll just tell you X." 

If  (someone tells you If [I tell you] X, then X is true)

Then  (someone tells you X is true)

(See The Cartoon Guide to Löb's Theorem[pdf] for a nice proof of this)

Here's an example:

Sketchy watch salesman: Hey, if I tell you these watches are genuine then they are genuine!

You: Ok... so are these watches genuine?

Sketchy watch salesman: Of course!

It's a good thing to remember when you might have to trust someone. If someone you already trust tells you you can trust them on something, then you know that something must be true. 

On the other hand, if someone says you can always trust them, well that's pretty suspicious... If they say you can trust them on everything, that means that they will never tell you a lie - which is logically equivalent to them saying that if they were to tell you a lie, then that lie must be true. So by Löb's theorem, they will lie to you. (Gödel's second incompleteness theorem)



Despite his name, you actually trust Shady quite a bit. He's never told you or anyone else anything that didn't end up being true. And he's careful not to make any suspiciously strong claims about his honesty.

So your new plan is to ask Shady if Bob will cooperate with you. If so, then you will cooperate. Otherwise, defect. (FairBot)

It's game time! You look at Bob's algorithm, and it turns out he picked the exact same algorithm! He's going to ask Shady if you will cooperate with him. Well, the first step is to ask Shady, "will Bob cooperate with me?" 

Shady looks at Bob's algorithm and sees that if Shady says you cooperate, then Bob cooperates. He looks at your algorithm and sees that if Shady says Bob cooperates, then you cooperate. Combining these, he sees that if he says you both cooperate, then both of you will cooperate. So he tells you that you will both cooperate (your dad was right!)

Let A stand for "Alice cooperates with Bob" and B stand for "Bob cooperates with Alice".

From looking at the algorithms,  and 

So combining these, .

Then by Löb's theorem, .

Since that means that Bob will cooperate, you decide to actually cooperate. 

Bob goes through an analagous thought process, and also decides to cooperate. So you cooperate with each other on the prisoner's dilemma! Yay!



That night, you go home and remark, "it's really lucky we both ended up using Shady to help us, otherwise that wouldn't have worked..."

Your dad interjects, "Actually, it doesn't matter - as long as they were both smart enough to count, it would work. This  doesn't just say 'I tell you X', it's stronger than that - it actually says 'Anyone who knows basic arithmetic will tell you X'. So as long as they both know a little arithmetic, it will still work - even if one of them is pro-axiom-of-choice, and the other is pro-axiom-of-life. The cooperation is robust." That's really cool! 

But there's another issue you think of. Sometimes, just to be tricky, the tournament organizers will set up a game where you have to play against a rock. Yes, literally just a rock that holding the cooperate button down. If you played against a rock with your current algorithm, well you start by asking Shady if the rock will cooperate with you. Shady is like, "well yeah, duh." So then you cooperate too. But you could have gotten three points by defecting! You're missing out on a totally free point! 

You think that it would be a good idea to make sure the other player isn't a complete idiot before you cooperate with them. How can you check? Well, let's see if they would cooperate with a rock placed on the defect button (affectionately known as 'DefectRock'). If they know better than that, and they will cooperate with you, then you will cooperate with them. 



The next morning, you excitedly tell Shady about your new plan. "It will be like before, except this time, I also ask you if the other player will cooperate with DefectRock! If they are dumb enough to do that, then I'll just defect. That way, I can still cooperate with other people who use algorithms like this one, or the one from before, but I can also defect and get that extra point when there's just a rock on cooperate."

Shady get's an awkward look on his face, "Sorry, but I can't do that... or at least it wouldn't work out the way you're thinking. Let's say you're playing against Bob, who is still using the old algorithm. You want to know if Bob will cooperate with DefectRock, so I have to check and see if I'll tell Bob that DefectRock will cooperate with him. I would have say I would never tell Bob that DefectRock will cooperate with him. But by Löb's theorem, that means I would tell you this obvious lie! So that isn't gonna work."

Notation,  if X cooperates with Y in the prisoner's dilemma (or = D if not). 

You ask Shady, does ?

Bob's algorithm:  only if .

So to say , we would need 

This is equivalent to , since  is an obvious lie. 

By Löb's theorem, , which is a lie. 

<Extra credit: does the fact that Shady is the one explaining this mean you can't trust him?>

<Extra extra credit: find and fix the minor technical error in the above argument.>

Shady sees the dismayed look on your face and adds, "...but, I know a guy who can vouch for me, and I think maybe that could make your new algorithm work."

So Shady calls his friend T over, and you work out the new details. You ask Shady if Bob will cooperate with you, and you ask T if Bob will cooperate with DefectRock. So T looks at Bob's algorithm, which asks Shady if DefectRock will cooperate with him. Shady, of course, says no. So T sees that Bob will defect against DefectRock, and lets you know. Like before, Shady tells you Bob will cooperate with you, and thus you decide to cooperate! And like before, Bob decides to cooperate with you, so you both cooperate! Awesome! (PrudentBot)

If Bob is using your new algorithm, you can see that the same argument goes through mostly unchanged, and that you will still cooperate! And against a rock on cooperate, T will tell you that it will cooperate with DefectRock, so you can defect and get that extra point! This is really great!!



(ok now it's time for the really cheesy ending)

It's finally time for the tournament. You have a really good feeling about your algorithm, and you do really well! Your dad is in the audience cheering for you, with a really proud look on his face. You tell your friend Bob about your new algorithm so that he can also get that extra point sometimes, and you end up tying for first place with him!

A few weeks later, Bob asks you out, and you two start dating. Being able to cooperate with each other robustly is a good start to a healthy relationship, and you live happily ever after! 

The End.

Three questions about source code uncertainty

6 cousin_it 24 July 2014 01:18PM

In decision theory, we often talk about programs that know their own source code. I'm very confused about how that theory applies to people, or even to computer programs that don't happen to know their own source code. I've managed to distill my confusion into three short questions:

1) Am I uncertain about my own source code?

2) If yes, what kind of uncertainty is that? Logical, indexical, or something else?

3) What is the mathematically correct way for me to handle such uncertainty?

Don't try to answer them all at once! I'll be glad to see even a 10% answer to one question.

Alpha Mail

5 Chef 24 July 2014 05:01AM

I recently stumbled upon an article from early 2003 in Physics World outlining a bit of evidence that some of the constants in nature may change over time. In this particular case, researchers studying quasars noticed that the fine-structure constant (α) might have fluctuated a bit billions of years ago, in both directions (bigger and smaller) with significance 4.1 sigma. What intrigues me about this is that I’ve previously pondered if something like this might be found, albeit for very different reasons.

Back in the 90s I read a book that made a case for the universe as a computer simulation. That particular book wasn’t all that compelling to me, but I’ve never been completely satisfied with arguments against that model and tend to think of the universe generally in those terms anyway. Can I still call myself an atheist if I allow the possibility of a creator in this context? A non-practicing atheist maybe?

If this universe is a computer-generated simulation, programmed by another life form, perhaps the search for extraterrestrial intelligence (SETI) should be expanded to include life forms beyond our universe. It sounds nonsensical, but is it?

If I was to design and code an environment sophisticated enough to allow a species of life to evolve in that environment, I am not convinced that I would have many tools at my disposal to truly be able to understand and evaluate that species very well. Sure, I may be able to see them generating patterns that indicate intelligent life within my simulation, but this life form evolved and exists in an environment completely alien to me. I might have only limited methods at my disposal through which to communicate with them. They would exist in a place that to me is not exactly real and vice-versa.

I’ve always imagined it would be more like evaluating patterns and data readouts or viewing cells through a microscope more than say something like, The Sims.  Having designed and implemented the very laws of their universe though, the fundamental constants of the universe could act as a sort of communication channel – one that allows me to at the very least let them know I existed (assuming they were intelligent and were looking). I could modify those constants in such a way over time in much the same manner that we might try to communicate with the more local and familiar concept of alien.

I realize this is all just rambling, but because the alpha is so closely related to those parts of nature that allow for our own existence, it made me take notice, and wonder if this could be some sort of alpha mail. The thought of being able to communicate with an external intelligence is thought provoking enough for me that I decided to write this as my first post here. Who knows? If it ever was confirmed, perhaps we could turn out to be the paper clip maximizer, and we should start looking for our ticket out of here.    


Jokes Thread

15 JosephY 24 July 2014 12:31AM

This is a thread for rationality-related or LW-related jokes and humor. Please post jokes (new or old) in the comments.


Q: Why are Chromebooks good Bayesians?

A: Because they frequently update!


A super-intelligent AI walks out of a box...


Q: Why did the psychopathic utilitarian push a fat man in front of a trolley?

A: Just for fun.

Fifty Shades of Self-Fulfilling Prophecy

19 PhilGoetz 24 July 2014 12:17AM

The official story: "Fifty Shades of Grey" was a Twilight fan-fiction that had over two million downloads online. The publishing giant Vintage Press saw that number and realized there was a huge, previously-unrealized demand for stories like this. They filed off the Twilight serial numbers, put it in print, marketed it like hell, and now it's sold 60 million copies.

The reality is quite different.

continue reading »

Gauging interest for a Tokyo area meetup group

6 lirene 23 July 2014 11:55AM

I'd like to gauge interest in an (english-language) Tokyo area meetup - given Tokyo's size, if a couple people are interested, it would be good to pick a location/day that's convenient for everybody. Otherwise I will announce a date and time and wait in a cafe with a book hoping that somebody will turn up.


I have been to several LW gatherings and have met consistently awesome and nice people, so if any Tokyo lurkers are reading this, I can assure you it's totally worth it to come! Please make yourself heard in the comments if you are interested.

Top-Down and Bottom-Up Logical Probabilities

2 Manfred 22 July 2014 08:53AM


I don't know very much model theory, and thus I don't fully understand Hutter et al.'s logical prior, detailed here, but nonetheless I can tell you that it uses a very top-down approach. About 60% of what I mean is that the prior is presented as a completed object with few moving parts, which fits the authors' mathematical tastes and proposed abstract properties the function should have. And for another thing, it uses model theory - a dead giveaway.

There are plenty of reasons to take a top-down approach. Yes, Hutter et al.'s function isn't computable, but sometimes the properties you want require uncomputability. And it's easier to come up with something vaguely satisfactory if you don't have to have many moving parts. This can range from "the prior is defined as a thing that fulfills the properties I want" on the lawful good side of the spectrum, to "clearly the right answer is just the exponential of the negative complexity of the statement, duh".

Probably the best reason to use a top-down approach to logical uncertainty is so you can do math to it. When you have some elegant description of global properties, it's a lot easier to prove that your logical probability function has nice properties, or to use it in abstract proofs. Hence why model theory is a dead giveaway.

There's one other advantage to designing a logical prior from the top down, which is that you can insert useful stuff like a complexity penalty without worrying too much. After all, you're basically making it up as you go anyhow, you don't have to worry about where it comes from like you would if you were going form the bottom up.

A bottom-up approach, by contrast, starts with an imagined agent with some state of information and asks what the right probabilities to assign are. Rather than pursuing mathematical elegance, you'll see a lot of comparisons to what humans do when reasoning through similar problems, and demands for computability from the outset.

For me, a big opportunity of the bottom-up approach is to use desiderata that look like principles of reasoning. This leads to more moving parts, but also outlaws some global properties that don't have very compelling reasons behind them.



Before we get to the similarities, rather than the differences, we'll have to impose the condition of limited computational resources. A common playing field, as it were. It would probably serve just as well to extend bottom-up approaches to uncomputable heights, but I am the author here, and I happen to be biased towards the limited-resources case.

The part of top-down assignment using limited resources will be played by a skeletonized pastiche of Paul Christiano's recent report:

i. No matter what, with limited resources we can only assign probabilities to a limited pool of statements. Accordingly, step one is to use some process to choose the set S0 of statements (and their negations) to assign probabilities.

ii. Then we use something a weakened consistency condition (that can be decided between pairs of sentences in polynomial time) to set constraints on the probability function over S0. For example, sentences that are identical except for a double-negation have to be given the same probability.

iii. Christiano constructs a description-length-based "pre-prior" function that is bigger for shorter sentences. There are lots of options for different pre-priors, and I think this is a pretty good one.

iv. Finally, assign a logical probability function over S0 that is as similar as possible to the pre-prior while fulfilling the consistency condition. Christiano measures similarity using cross-entropy between the two functions, so that the problem is one of minimizing cross-entropy subject to a finite list of constraints. (Even if the pre-prior decreases exponentially, this doesn't mean that complicated statements will have exponentially low logical probability, because of the condition from step two that P(a statement) + P(its negation) = 1 - in a state of ignorance, everything still gets probability 1/2. The pre-prior only kicks in when there are more options with different description lengths.)

Next, let's look at the totally different world of a bottom-up assignment of logical probabilities, played here by a mildly rephrased version of my past proposal.

i. Pick a set of sentences S1 to try and figure out the logical probabilities of.

ii. Prove the truth or falsity of a bunch of statements in the closure of S1 under conjugation and negation (i.e. if sentences a and b are in S1, a&b is in the closure of S1).

iii. Assign a logical probability function over the closure of S1 under conjugation with maximum entropy, subject to the constraints proved in part two, plus the constraints that each sentence && its negation has probability 0.

These turn out to be really similar! Look in step three of my bottom-up example - there's a even a sneakily-inserted top-down condition about going through every single statement and checking an aspect of consistency. In the top-down approach, every theorem of a certain sort is proved, while in the bottom-up approach there are allowed to be lots of gaps - but the same sorts of theorems are proved. I've portrayed one as using proofs only about sentences in S0, and the other as using proofs in the entire closure of S1 under conjunction, but those are just points on an available continuum (for more discussion, see Christiano's section on positive semidefinite methods).

The biggest difference is this "pre-prior" thing. On the one hand, it's essential for giving us guarantees about inductive learning. On the other hand, what piece of information do we have that tells us that longer sentences really are less likely? I have unresolved reservations, despite the practical advantages.



A minor confession - my choice of Christiano's report was not coincidental at all. The causal structure went like this:

Last week - Notice dramatic similarities in what gets proved and how it gets used between my bottom-up proposal and Christiano's top-down proposal.

Now - Write post talking about generalities of top-down and bottom-up approaches to logical probability, and then find as a startling conclusion the thing that motivated me to write the post in the first place.

The teeensy bit of selection bias here means that though these similarities are cool, it's hard to draw general conclusions.

So let's look at one more proposal, this one due to Abram Demski, modified by to use limited resources.

i. Pick a set of sentences S2 to care about.

ii. Construct a function on sentences in S2 that is big for short sentences and small for long sentences.

iii. Start with the set of sentences that are axioms - we'll shortly add new sentences to the set.

iv. Draw a sentence from S2 with probability proportional to the function from step two.

v. Do a short consistency check (can use a weakened consistency condition, or just limited time) between this sentence and the sentences already in the set. If it's passed, add the sentence to the set.

vi. Keep doing steps four and five until you've either added or ruled out all the sentences in S2.

vii. The logical probability of a sentence is defined as the probability that it ends up in our set after going through this process. We can find this probability using Monte Carlo by just running the process a bunch of times and counting up what portion of the time each sentences is in the set by the end.

Okay, so this one looks pretty different. But let's look for the similarities. The exact same kinds of things get proved again - weakened or scattershot consistency checks between different sentences. If all you have in S2 are three mutually exclusive and exhaustive sentences, the one that's picked first wins - meaning that the probability function over what sentence gets picked first is acting like our pre-prior.

So even though the method is completely different, what's really going on is that sentences are being given measure that looks like the pre-prior, subject to the constraints of weakened consistency (via rejection sampling) and normalization (keep repeating until all statements are checked).

In conclusion: not everything is like everything else, but some things are like some other things.

Compiling my writings for Lesswrong and others.

1 diegocaleiro 22 July 2014 08:11AM

I've just inserted about 50 new links to my list of writings, most of which from Lesswrong, here. For convenience, I'm copying it below.


I write a lot about a variety of topics in English and until 2013 also did in Portuguese, Notice Google Chrome automatically translates texts if you need. This will someday be a compilation of all my writings, divided by Borgean topics. There are also writings I wish I had written:

The ones I really, really want you to read before you read the rest:

Those that may help you save the world:

Those that are very long and full of ideas:

Those short:

Those about how to live life to the fullest:

Those related to evolution:

Those about minds:

Those which are on Lesswrong but I think should have been read more:

Those defying authority and important notions of the Status Quo:

Those I currently dislike or find silly:

Those humorous:


Those I want someone else to finish or rehash:

Those in portuguese:

Those not above:

Politics is hard mode

18 RobbBB 21 July 2014 10:14PM

Summary: I don't think 'politics is the mind-killer' works well rthetorically. I suggest 'politics is hard mode' instead.


Some people in and catawampus to the LessWrong community have objected to "politics is the mind-killer" as a framing (/ slogan / taunt). Miri Mogilevsky explained on Facebook:

My usual first objection is that it seems odd to single politics out as a “mind-killer” when there’s plenty of evidence that tribalism happens everywhere. Recently, there has been a whole kerfuffle within the field of psychology about replication of studies. Of course, some key studies have failed to replicate, leading to accusations of “bullying” and “witch-hunts” and what have you. Some of the people involved have since walked their language back, but it was still a rather concerning demonstration of mind-killing in action. People took “sides,” people became upset at people based on their “sides” rather than their actual opinions or behavior, and so on.

Unless this article refers specifically to electoral politics and Democrats and Republicans and things (not clear from the wording), “politics” is such a frightfully broad category of human experience that writing it off entirely as a mind-killer that cannot be discussed or else all rationality flies out the window effectively prohibits a large number of important issues from being discussed, by the very people who can, in theory, be counted upon to discuss them better than most. Is it “politics” for me to talk about my experience as a woman in gatherings that are predominantly composed of men? Many would say it is. But I’m sure that these groups of men stand to gain from hearing about my experiences, since some of them are concerned that so few women attend their events.

In this article, Eliezer notes, “Politics is an important domain to which we should individually apply our rationality — but it’s a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational.” But that means that we all have to individually, privately apply rationality to politics without consulting anyone who can help us do this well. After all, there is no such thing as a discussant who is “rational”; there is a reason the website is called “Less Wrong” rather than “Not At All Wrong” or “Always 100% Right.” Assuming that we are all trying to be more rational, there is nobody better to discuss politics with than each other.

The rest of my objection to this meme has little to do with this article, which I think raises lots of great points, and more to do with the response that I’ve seen to it — an eye-rolling, condescending dismissal of politics itself and of anyone who cares about it. Of course, I’m totally fine if a given person isn’t interested in politics and doesn’t want to discuss it, but then they should say, “I’m not interested in this and would rather not discuss it,” or “I don’t think I can be rational in this discussion so I’d rather avoid it,” rather than sneeringly reminding me “You know, politics is the mind-killer,” as though I am an errant child. I’m well-aware of the dangers of politics to good thinking. I am also aware of the benefits of good thinking to politics. So I’ve decided to accept the risk and to try to apply good thinking there. [...]

I’m sure there are also people who disagree with the article itself, but I don’t think I know those people personally. And to add a political dimension (heh), it’s relevant that most non-LW people (like me) initially encounter “politics is the mind-killer” being thrown out in comment threads, not through reading the original article. My opinion of the concept improved a lot once I read the article.

In the same thread, Andrew Mahone added, “Using it in that sneering way, Miri, seems just like a faux-rationalist version of ‘Oh, I don’t bother with politics.’ It’s just another way of looking down on any concerns larger than oneself as somehow dirty, only now, you know, rationalist dirty.” To which Miri replied: “Yeah, and what’s weird is that that really doesn’t seem to be Eliezer’s intent, judging by the eponymous article.”

Eliezer replied briefly, to clarify that he wasn't generally thinking of problems that can be directly addressed in local groups (but happen to be politically charged) as "politics":

Hanson’s “Tug the Rope Sideways” principle, combined with the fact that large communities are hard to personally influence, explains a lot in practice about what I find suspicious about someone who claims that conventional national politics are the top priority to discuss. Obviously local community matters are exempt from that critique! I think if I’d substituted ‘national politics as seen on TV’ in a lot of the cases where I said ‘politics’ it would have more precisely conveyed what I was trying to say.

But that doesn't resolve the issue. Even if local politics is more instrumentally tractable, the worry about polarization and factionalization can still apply, and may still make it a poor epistemic training ground.

A subtler problem with banning “political” discussions on a blog or at a meet-up is that it’s hard to do fairly, because our snap judgments about what counts as “political” may themselves be affected by partisan divides. In many cases the status quo is thought of as apolitical, even though objections to the status quo are ‘political.’ (Shades of Pretending to be Wise.)

Because politics gets personal fast, it’s hard to talk about it successfully. But if you’re trying to build a community, build friendships, or build a movement, you can’t outlaw everything ‘personal.’

And selectively outlawing personal stuff gets even messier. Last year, daenerys shared anonymized stories from women, including several that discussed past experiences where the writer had been attacked or made to feel unsafe. If those discussions are made off-limits because they relate to gender and are therefore ‘political,’ some folks may take away the message that they aren’t allowed to talk about, e.g., some harmful or alienating norm they see at meet-ups. I haven’t seen enough discussions of this failure mode to feel super confident people know how to avoid it.

Since this is one of the LessWrong memes that’s most likely to pop up in cross-subcultural dialogues (along with the even more ripe-for-misinterpretation “policy debates should not appear one-sided“…), as a first (very small) step, my action proposal is to obsolete the ‘mind-killer’ framing. A better phrase for getting the same work done would be ‘politics is hard mode’:

1. ‘Politics is hard mode’ emphasizes that ‘mind-killing’ (= epistemic difficulty) is quantitative, not qualitative. Some things might instead fall under Middlingly Hard Mode, or under Nightmare Mode…

2. ‘Hard’ invites the question ‘hard for whom?’, more so than ‘mind-killer’ does. We’re used to the fact that some people and some contexts change what’s ‘hard’, so it’s a little less likely we’ll universally generalize.

3. ‘Mindkill’ connotes contamination, sickness, failure, weakness. In contrast, ‘Hard Mode’ doesn’t imply that a thing is low-status or unworthy. As a result, it’s less likely to create the impression (or reality) that LessWrongers or Effective Altruists dismiss out-of-hand the idea of hypothetical-political-intervention-that-isn’t-a-terrible-idea. Maybe some people do want to argue for the thesis that politics is always useless or icky, but if so it should be done in those terms, explicitly — not snuck in as a connotation.

4. ‘Hard Mode’ can’t readily be perceived as a personal attack. If you accuse someone of being ‘mindkilled’, with no context provided, that smacks of insult — you appear to be calling them stupid, irrational, deluded, or the like. If you tell someone they’re playing on ‘Hard Mode,’ that’s very nearly a compliment, which makes your advice that they change behaviors a lot likelier to go over well.

5. ‘Hard Mode’ doesn’t risk bringing to mind (e.g., gendered) stereotypes about communities of political activists being dumb, irrational, or overemotional.

6. ‘Hard Mode’ encourages a growth mindset. Maybe some topics are too hard to ever be discussed. Even so, ranking topics by difficulty encourages an approach where you try to do better, rather than merely withdrawing. It may be wise to eschew politics, but we should not fear it. (Fear is the mind-killer.)

7. Edit: One of the larger engines of conflict is that people are so much worse at noticing their own faults and biases than noticing others'. People will be relatively quick to dismiss others as 'mindkilled,' while frequently flinching away from or just-not-thinking 'maybe I'm a bit mindkilled about this.' Framing the problem as a challenge rather than as a failing might make it easier to be reflective and even-handed.

This is not an attempt to get more people to talk about politics. I think this is a better framing whether or not you trust others (or yourself) to have productive political conversations.

When I playtested this post, Ciphergoth raised the worry that 'hard mode' isn't scary-sounding enough. As dire warnings go, it's light-hearted—exciting, even. To which I say: good. Counter-intuitive fears should usually be argued into people (e.g., via Eliezer's politics sequence), not connotation-ninja'd or chanted at them. The cognitive content is more clearly conveyed by 'hard mode,' and if some group (people who love politics) stands to gain the most from internalizing this message, the message shouldn't cast that very group (people who love politics) in an obviously unflattering light. LW seems fairly memetically stable, so the main issue is what would make this meme infect friends and acquaintances who haven't read the sequences. (Or Dune.)

If you just want a scary personal mantra to remind yourself of the risks, I propose 'politics is SPIDERS'. Though 'politics is the mind-killer' is fine there too.

If you and your co-conversationalists haven’t yet built up a lot of trust and rapport, or if tempers are already flaring, conveying the message ‘I’m too rational to discuss politics’ or ‘You’re too irrational to discuss politics’ can make things worse. In that context, ‘politics is the mind-killer’ is the mind-killer. At least, it’s a needlessly mind-killing way of warning people about epistemic hazards.

‘Hard Mode’ lets you speak as the Humble Aspirant rather than the Aloof Superior. Strive to convey: ‘I’m worried I’m too low-level to participate in this discussion; could you have it somewhere else?’ Or: ‘Could we talk about something closer to Easy Mode, so we can level up together?’ More generally: If you’re worried that what you talk about will impact group epistemology, you should be even more worried about how you talk about it.

[ACTIVITY]: Exploratory Visit to the Bay Area

2 Daniel_Burfoot 21 July 2014 07:49PM

In my opinion, living anywhere other than the center of your industry is a mistake. A lot of people — those who don’t live in that place — don’t want to hear it. But it’s true. Geographic locality is still — even in the age of the Internet — critically important if you want to maximize your access to the best companies, the best people, and the best opportunities. You can always cite exceptions, but that’s what they are: exceptions.

- Marc Andreessen


Like many people in the technology industry, I have been thinking seriously about moving to the Bay Area. However, before I decide to move, I want to do a lot of information gathering. Some basic pieces of information - employment prospects, cost of living statistics, and weather averages - can be found online. But I feel that one's quality of life is determined by a large number of very subtle factors - things like walkability, public transportation, housing quality/dollar of rent, lifestyle options, and so on. These kinds of things seem to require first-hand, in-person examination. For that reason, I'm planning to visit the Bay Area and do an in-depth exploration next month, August 20th-24th. 

My guess is that a significant number of LWers are also thinking about moving to the Bay Area, and so I wanted to invite people to accompany me in this exploration. Here are some activities we might do: 


  • Travel around using public transportation. Which places are convenient to get from/to, and which places aren't?
  • Visit the offices of the major tech companies like Google, Facebook, Apple, and Twitter. Ask some of their employees how they feel about being a software engineer in Silicon Valley.
  • Eat at local restaurants - not so much the fancy/expensive ones, but the ones a person might go to for a typical, everyday  lunch outing. 
  • See some of the sights. Again, the emphasis would be on the things that would affect our everyday lifestyle, should be decide to move, not so much on the tourist attractions. For example, the Golden Gate Bridge is an awesome structure, but I doubt it would improve my everyday life very much. In contrast, living near a good running trail would be a big boost to my lifestyle. 
  • Do some apartment viewing, to get a feel for how much rent a good/medium/student apartment costs in different areas and how good the amenities are. 
  • Go to some local LW meetups, if there are any scheduled for the time window. 
  • Visit the Stanford and UC Berkeley campuses and the surrounding areas.
  • Interact with locals and ask them about their experience living in the region
  • Visit a number of different neighborhoods, to try to get a sense of the pros and cons of each
  • Discuss how to apply Bayesian decision theory to the problem of finding the optimal place to live ;)

I would also love to connect with LWers who are currently living in the Bay Area. If you are willing to meet up, discuss your experience living in the area, and share some local tips, I'd be happy to compensate you with a nice dinner or a few beers. 

If you are interested in participating in this activity, either as a visitor to the area or as a local, please comment below and I will PM you details for how to contact me. Depending on the level of interest, I will probably set up a shared Google Doc or one-off email list to distribute information. 

In general, my plan is to keep things loosely organized - less like a conference and more like a couple of friends on a weekend vacation. If you want to participate for a single day or just one activity, that's fine. The main exception is: if you are interested in sharing accommodations, please let me know and we will try to coordinate something (sharing rooms will make things cheaper on a per-person basis). I am planning to use AirBNB (if you are a local LWer who rents a room through AirBNB, that would be perfect!)









Open thread, July 21-27, 2014

3 polymathwannabe 21 July 2014 01:15PM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one.

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

A simple game that has no solution

6 James_Miller 20 July 2014 06:36PM

The following simple game has one solution that seems correct, but isn’t.  Can you figure out why?


The Game


Player One moves first.  He must pick A, B, or C.  If Player One picks A the game ends and Player Two does nothing.  If Player One picks B or C, Player Two will be told that Player One picked B or C, but will not be told which of these two strategies Player One picked, Player Two must then pick X or Y, and then the game ends.  The following shows the Players’ payoffs for each possible outcome.  Player One’s payoff is listed first.


A   3,0    [And Player Two never got to move.]

B,X 2,0

B,Y 2,2

C,X 0,1

C,Y 6,0

continue reading »

Experiments 1: Learning trivia

11 casebash 20 July 2014 10:31AM

There has been some talk of a lack of content being posted to Less Wrong, so I decided to start a series on various experiments that I've tried and what I've learned from them as I believe that experimentation is key to being a rationalist. My first few posts will be adapted from content I've written for /r/socialskills, but as Less Wrong has a broader scope I plan to post some original content too. I hope that this post will encourage other people to share detailed descriptions of the experiments that they have tried as I believe that this is much more valuable than a list of lessons posted outside of the context in which they were learned. If anyone has already posted any similar posts, then I would really appreciate any links.

Trivia Experiment

I used to have a lot of trouble in conversation thinking of things to say. I wanted to be a more interesting person and I noticed that my brother uses his knowledge of a broad range of topics to engage people in conversations, so I wanted to do the same.

I was drawn quite quickly towards facts because of how quickly they can be read. If a piece of trivia takes 10 seconds to read, then you can read 360 in an hour. If only 5% are good, then that's still 18 usable facts per hour. Articles are longer, but have significantly higher chances of teaching you something. It seemed like you should be able to prevent ever running out of things to talk about with a reasonable investment of time. It didn't quite work out this way, but this was the idea.d

Another motivation was that I have always valued intelligence and learning more information made me feel good about myself.


Today I learned: #1 recommended source

The straight dope: Many articles in the archive are quite interesting, but I unsubscribed because I found the more recent ones boring

Damn interesting

Now I know

Cracked: Not the most reliable source and can be a huge time sink, but occasionally there are articles there that will give you 6 or 7 interesting facts in one go

Dr Karl: Science blog

Skeptics Stackexchange

Mythbusters results

The future is now

I read through the top 1000 links on Today I learned, the entire archive of the straight dope, maybe half of damn interesting and now I know, half of Karl and all the mythbusters results up to about a year or two ago. We are pretty much talking about months of solid reading.


You probably guessed it, but my return on investment wasn't actually that great. I tended to consume this trivia in ridiculously huge batches because by reading all this information I at least felt like I was doing something. If someone came up to me and asked me for a random piece of trivia - I actually don't have that much that I can pull out. It's actually much easier if someone asks about a specific topic, but there's still not that much I can access.

To test my knowledge I decided to pick the first three topics that came into my head and see how much random trivia I could remember about each. As you can see, the results were rather disappointing:


  • Cats can survive falls from a higher number of floors better than a lower number of falls because they have a low terminal velocity and more time to orient themselves to ensure they land on their feet
  • House cats can run faster than Ursain bolt


  • If you are attacked by a dog the best strategy is to shove your hand down its mouth and attack the neck with your other hand
  • Dogs can be trained to drive cars (slowly)
  • There is such a thing as the world's ugliest dog competition


  • Cheese is poisonous to rats
  • The existence of rat kings - rats who got their tails stuck together

Knowing these facts does occasionally help me by giving me something interesting to say when I wouldn't have otherwise had it, but quite often I want to quote one of these facts, but I can't quite remember the details. It's hard to quantify how much this helps me though. There have been a few times when I've been able to get someone interested in a conversation that they wouldn't have otherwise been interested in, but I can also go a dozen conversations without quoting any of these facts. No-one has ever gone "Wow, you know so many facts!". Another motivation I had was that being knowledgeable makes me feel good about myself. I don't believe that there was any significant impact in this regard either - I don't have a strong self-concept of myself as someone who is particularly knowledgeable about random facts. Overall this experiment was quite disappointing given the high time investment.

Other benefits:

While the social benefits have been extremely minimal, learning all of these facts has expanded my world view.

Possible Refinements:

While this technique worked poorly for me, there are many changes that I could have made that might have improved effectiveness.

  • Lower batch sizes: when you read too many facts in one go you get tired and it all tends to blur together
  • Notes: I started making notes of the most interesting facts I was finding using Evernote. I regularly add new facts, but only very occasionally go back and actually look them up. I was trying to review the new facts that I learned regularly, but I got busy and just fell out of the habit. Perhaps I could have a separate list for the most important facts I learn every week and this would be less effort?
  • Rereading saved facts: I did a complete reread through my saved notes once. I still don't think that I have a very good recall - probably related to batch size!
  • Spaced repetition: Many people claim that this make memorisation easy
  • Thoughtback: This is a lighter alternative to spaced repetition - it gives you notifications on your phone of random facts - about one per day
  • Talking to other people: This is a very effective method for remembering facts. That vast majority of facts that I've shared with other people, I still remember. Perhaps I should create a list of facts that I want to remember and then pick one or two at a time to share with people. Once I've shared them a few times, I could move on to the next fact
  • Blog posts - perhaps if I collected some of my related facts into blog posts, having to decide which to include and which to not include my help me remember these facts more
  • Pausing: I find that I am more likely to remember things if I pause and think that this is something that I want to remember. I was trying to build that habit, but I didn't succeed in this
  • Other memory techniques: brains are better at remembering things if you process them. So if you want to remember the story where thieves stole a whole beach in one night, try to picture the beach and then the shock when some surfer turns up and all the sand is gone. Try to imagine what you'd need to pull that off.

I believe that if I had spread my reading out over a greater period of time, then the cost would have been justified. Part of this would have been improved retention and part of this would have been having a new interesting fact to use in conversation every week that I know I hadn't told anyone else before.

The social benefits are rather minimal, so it would be difficult to get them to match up with the time invested. I believe that with enough refinement, someone could improve their effectiveness to the stage where the benefits matched up with the effort invested, but broadening one's knowledge will always be the primary advantage gained.

LINK: Top HIV researcher killed in plane crash

-5 polymathwannabe 19 July 2014 05:03PM

As most of you may already know, the plane that recently crashed on disputed Ukrainian soil carried some of the world's top HIV researchers.

One part of me holds vehemently that all human beings are of equal value.

Another part of me wishes there could be extra-creative punishments for depriving the world of its best minds.




[QUESTION]: Academic social science and machine learning

11 VipulNaik 19 July 2014 03:13PM

I asked this question on Facebook here, and got some interesting answers, but I thought it would be interesting to ask LessWrong and get a larger range of opinions. I've modified the list of options somewhat.

What explains why some classification, prediction, and regression methods are common in academic social science, while others are common in machine learning and data science?

For instance, I've encountered probit models in some academic social science, but not in machine learning.

Similarly, I've encountered support vector machines, artificial neural networks, and random forests in machine learning, but not in academic social science.

The main algorithms that I believe are common to academic social science and machine learning are the most standard regression algorithms: linear regression and logistic regression.

Possibilities that come to mind:

(0) My observation is wrong and/or the whole question is misguided.

(1) The focus in machine learning is on algorithms that can perform well on large data sets. Thus, for instance, probit models may be academically useful but don't scale up as well as logistic regression.

(2) Academic social scientists take time to catch up with new machine learning approaches. Of the methods mentioned above, random forests and support vector machines was introduced as recently as 1995. Neural networks are older but their practical implementation is about as recent. Moreover, the practical implementations of these algorithm in the standard statistical softwares and packages that academics rely on is even more recent. (This relates to point (4)).

(3) Academic social scientists are focused on publishing papers, where the goal is generally to determine whether a hypothesis is true. Therefore, they rely on approaches that have clear rules for hypothesis testing and for establishing statistical significance (see also this post of mine). Many of the new machine learning approaches don't have clearly defined statistical approaches for significance testing. Also, the strength of machine learning approaches is more exploratory than testing already formulated hypotheses (this relates to point (5)).

(4) Some of the new methods are complicated to code, and academic social scientists don't know enough mathematics, computer science, or statistics to cope with the methods (this may change if they're taught more about these methods in graduate school, but the relative newness of the methods is a factor here, relating to (2)).

(5) It's hard to interpret the results of fancy machine learning tools in a manner that yields social scientific insight. The results of a linear or logistic regression can be interpreted somewhat intuitively: the parameters (coefficients) associated with individual features describe the extent to which those features affect the output variable. Modulo issues of feature scaling, larger coefficients mean those features play a bigger role in determining the output. Pairwise and listwise R^2 values provide additional insight on how much signal and noise there is in individual features. But if you're looking at a neural network, it's quite hard to infer human-understandable rules from that. (The opposite direction is not too hard: it is possible to convert human-understandable rules to a decision tree and then to use a neural network to approximate that, and add appropriate fuzziness. But the neural networks we obtain as a result of machine learning optimization may be quite different from those that we can interpret as humans). To my knowledge, there haven't been attempts to reinterpret neural network results in human-understandable terms, though Sebastian Kwiatkowski's comment on my Facebook post points to an example where the results of  naive Bayes and SVM classifiers for hotel reviews could be translated into human-understandable terms (namely, reviews that mentioned physical aspects of the hotel, such as "small bedroom", were more likely to be truthful than reviews that talked about the reasons for the visit or the company that sponsored the visit). But Kwiatkowski's comment also pointed to other instances where the machine's algorithms weren't human-interpretable.

What's your personal view on my main question, and on any related issues?

Look for the Next Tech Gold Rush?

29 Wei_Dai 19 July 2014 10:08AM

In early 2000, I registered my personal domain name weidai.com, along with a couple others, because I was worried that the small (sole-proprietor) ISP I was using would go out of business one day and break all the links on the web to the articles and software that I had published on my "home page" under its domain. Several years ago I started getting offers, asking me to sell the domain, and now they're coming in almost every day. A couple of days ago I saw the first six figure offer ($100,000).

In early 2009, someone named Satoshi Nakamoto emailed me personally with an announcement that he had published version 0.1 of Bitcoin. I didn't pay much attention at the time (I was more interested in Less Wrong than Cypherpunks at that point), but then in early 2011 I saw a LW article about Bitcoin, which prompted me to start mining it. I wrote at the time, "thanks to the discussion you started, I bought a Radeon 5870 and started mining myself, since it looks likely that I can at least break even on the cost of the card." That approximately $200 investment (plus maybe another $100 in electricity) is also worth around six figures today.

Clearly, technological advances can sometimes create gold rush-like situations (i.e., first-come-first-serve opportunities to make truly extraordinary returns with minimal effort or qualifications). And it's possible to stumble into them without even trying. Which makes me think, maybe we should be trying? I mean, if only I had been looking for possible gold rushes, I could have registered a hundred domain names optimized for potential future value, rather than the few that I happened to personally need. Or I could have started mining Bitcoins a couple of years earlier and be a thousand times richer.

I wish I was already an experienced gold rush spotter, so I could explain how best to do it, but as indicated above, I participated in the ones that I did more or less by luck. Perhaps the first step is just to keep one's eyes open, and to keep in mind that tech-related gold rushes do happen from time to time and they are not impossibly difficult to find. What other ideas do people have? Are there other past examples of tech gold rushes besides the two that I mentioned? What might be some promising fields to look for them in the future?

Effective Writing

6 diegocaleiro 18 July 2014 08:45PM

Granted, writing is not very effective. But some of us just love writing...

Earning to Give Writing: Which are the places that pay 1USD or more dollars per word?

Mind Changing Writing: What books need being written that can actually help people effectively change the world?

Clarification Writing: What needs being written because it is only through writing that these ideas will emerge in the first place?

Writing About Efficacy: Maybe nothing else needs to be written on this.

What should we be writing about if we have already been, for very long, training the craft? What has not yet been written, what is the new thing?

The world surely won't save itself through writing, but it surely won't write itself either.


Be Wary of Thinking Like a FAI

6 kokotajlod 18 July 2014 08:22PM

I recently realized that, encouraged by LessWrong, I had been using a heuristic in my philosophical reasoning that I now think is suspect. I'm not accusing anybody else of falling into the same trap; I'm just recounting my own situation for the benefit of all.

I actually am not 100% sure that the heuristic is wrong. I hope that this discussion about it generalizes into a conversation about intuition and the relationship between FAI epistemology and our own epistemology.

The heuristic is this: If the ideal FAI would think a certain way, then I should think that way as well. At least in epistemic matters, I should strive to be like an ideal FAI.

Examples of the heuristic in use are:

--The ideal FAI wouldn't care about its personal identity over time; it would have no problem copying itself and deleting the original as the need arose. So I should (a) not care about personal identity over time, even if it exists, and (b) stop believing that it exists.

--The ideal FAI wouldn't care about its personal identity at a given time either; if it was proven that 99% of all observers with its total information set were in fact Boltzmann Brains, then it would continue to act as if it were not a Boltzmann Brain, since that's what maximizes utility. So I should (a) act as if I'm not a BB even if I am one, and (b) stop thinking it is even a meaningful possibility.

--The ideal FAI would think that the specific architecture it is implemented on (brains, computers, nanomachines, giant look-up tables) is irrelevant except for practical reasons like resource efficiency. So, following its example, I should stop worrying about whether e.g. a simulated brain would be conscious.

--The ideal FAI would think that it was NOT a "unified subject of experience" or an "irreducible substance" or that it was experiencing "ineffable, irreducible quale," because believing in those things would only distract it from understanding and improving its inner workings. Therefore, I should think that I, too, am nothing but a physical mechanism and/or an algorithm implemented somewhere but capable of being implemented elsewhere.

--The ideal FAI would use UDT/TDT/etc. Therefore I should too.

--The ideal FAI would ignore uncomputable possibilities. Therefore I should too.


Arguably, most if not all of the conclusions I drew in the above are actually correct. However, I think that the heuristic is questionable, for the following reasons:

(1) Sometimes what we think of as the ideal FAI isn't actually ideal. Case in point: The final bullet above about uncomputable possibilities. We intuitively think that uncomputable possibilites ought to be countenanced, so rather than overriding our intuition when presented with an attractive theory of the ideal FAI (in this case AIXI) perhaps we should keep looking for an ideal that better matches our intuitions.

(2) The FAI is a tool for serving our wishes; if we start to think of ourselves as being fundamentally the same sort of thing as the FAI, our values may end up drifting badly. For simplicity, let's suppose the FAI is designed to maximize happy human life-years. The problem is, we don't know how to define a human. Do simulated brains count? What about patterns found inside rocks? What about souls, if they exist? Suppose we have the intuition that humans are indivisible entities that persist across time. If we reason using the heuristic I am talking about, we would decide that, since the FAI doesn't think it is an indivisible entity that persists across time, we shouldn't think we are either. So we would then proceed to tell the FAI "Humans are naught but a certain kind of functional structure," and (if our overruled intuition was correct) all get killed.



Note 1: "Intuitions" can (I suspect) be thought of as another word for "Priors."

Note 2: We humans are NOT solomonoff-induction-approximators, as far as I can tell. This bodes ill for FAI, I think.

Weekly LW Meetups

1 FrankAdamek 18 July 2014 04:25PM

[LINK] Another "LessWrongers are crazy" article - this time on Slate

8 CronoDAS 18 July 2014 04:57AM

The Correct Use of Analogy

24 SilentCal 16 July 2014 09:07PM

In response to: Failure by AnalogySurface Analogies and Deep Causes

Analogy gets a bad rap around here, and not without reason. The kinds of argument from analogy condemned in the above links fully deserve the condemnation they get. Still, I think it's too easy to read them and walk away thinking "Boo analogy!" when not all uses of analogy are bad. The human brain seems to have hardware support for thinking in analogies, and I don't think this capability is a waste of resources, even in our highly non-ancestral environment. So, assuming that the linked posts do a sufficient job detailing the abuse and misuse of analogy, I'm going to go over some legitimate uses.


The first thing analogy is really good for is description. Take the plum pudding atomic model. I still remember this falsified proposal of negative 'raisins' in positive 'dough' largely because of the analogy, and I don't think anyone ever attempted to use it to argue for the existence of tiny subnuclear particles corresponding to cinnamon. 

But this is only a modest example of what analogy can do. The following is an example that I think starts to show the true power: my comment on Robin Hanson's 'Don't Be "Rationalist"'. To summarize, Robin argued that since you can't be rationalist about everything you should budget your rationality and only be rational about the most important things; I replied that maybe rationality is like weightlifting, where your strength is finite yet it increases with use. That comment is probably the most successful thing I've ever written on the rationalist internet in terms of the attention it received, including direct praise from Eliezer and a shoutout in a Scott Alexander (yvain) post, and it's pretty much just an analogy.

Here's another example, this time from Eliezer. As part of the AI-Foom debate, he tells the story of Fermi's nuclear experiments, and in particular his precise knowledge of when a pile would go supercritical.

What do the above analogies accomplish? They provide counterexamples to universal claims. In my case, Robin's inference that rationality should be spent sparingly proceeded from the stated premise that no one is perfectly rational about anything, and weightlifting was a counterexample to the implicit claim 'a finite capacity should always be directed solely towards important goals'. If you look above my comment, anon had already said that the conclusion hadn't been proven, but without the counterexample this claim had much less impact.

In Eliezer's case, "you can never predict an unprecedented unbounded growth" is the kind of claim that sounds really convincing. "You haven't actually proved that" is a weak-sounding retort; "Fermi did it" immediately wins the point. 

The final thing analogies do really well is crystallize patterns. For an example of this, let's turn to... Failure by Analogy. Yep, the anti-analogy posts are themselves written almost entirely via analogy! Alchemists who glaze lead with lemons and would-be aviators who put beaks on their machines are invoked to crystallize the pattern of 'reasoning by similarity'. The post then makes the case that neural-net worshippers are reasoning by similarity in just the same way, making the same fundamental error.

It's this capacity that makes analogies so dangerous. Crystallizing a pattern can be so mentally satisfying that you don't stop to question whether the pattern applies. The antidote to this is the question, "Why do you believe X is like Y?" Assessing the answer and judging deep similarities from superficial ones may not always be easy, but just by asking you'll catch the cases where there is no justification at all.

LINK: Blood from youth keeps you young

2 polymathwannabe 16 July 2014 01:06AM

In experiments performed on mice, blood transfusions from young mice reversed age-related markers in older mice. The protein involved is identical in humans.



Group Rationality Diary, July 16-31

1 therufs 16 July 2014 12:34AM

This is the public group instrumental rationality diary for July 16-31. 

It's a place to record and chat about it if you have done, or are actively doing, things like: 

  • Established a useful new habit
  • Obtained new evidence that made you change your mind about some belief
  • Decided to behave in a different way in some set of situations
  • Optimized some part of a common routine or cached behavior
  • Consciously changed your emotions or affect with respect to something
  • Consciously pursued new valuable information about something that could make a big difference in your life
  • Learned something new about your beliefs, behavior, or life that surprised you
  • Tried doing any of the above and failed

Or anything else interesting which you want to share, so that other people can think about it, and perhaps be inspired to take action themselves. Try to include enough details so that everyone can use each other's experiences to learn about what tends to work out, and what doesn't tend to work out.

Thanks to cata for starting the Group Rationality Diary posts, and to commenters for participating.

Previous diary: July 1-15

Rationality diaries archive

An Experiment In Social Status: Software Engineer vs. Data Science Manager

17 JQuinton 15 July 2014 08:24PM

Here is an interesting blog post about a guy who did a resume experiment between two positions which he argues are by experience identical, but occupy different "social status" positions in tech: A software engineer and a data manager.

Interview A: as Software Engineer

Bill faced five hour-long technical interviews. Three went well. One was so-so, because it focused on implementation details of the JVM, and Bill’s experience was almost entirely in C++, with a bit of hobbyist OCaml. The last interview sounds pretty hellish. It was with the VP of Data Science, Bill’s prospective boss, who showed up 20 minutes late and presented him with one of those interview questions where there’s “one right answer” that took months, if not years, of in-house trial and error to discover. It was one of those “I’m going to prove that I’m smarter than you” interviews...

Let’s recap this. Bill passed three of his five interviews with flying colors. One of the interviewers, a few months later, tried to recruit Bill to his own startup. The fourth interview was so-so, because he wasn’t a Java expert, but came out neutral. The fifth, he failed because he didn’t know the in-house Golden Algorithm that took years of work to discover. When I asked that VP/Data Science directly why he didn’t hire Bill (and he did not know that I knew Bill, nor about this experiment) the response I got was “We need people who can hit the ground running.” Apparently, there’s only a “talent shortage” when startup people are trying to scam the government into changing immigration policy. The undertone of this is that “we don’t invest in people”.

Or, for a point that I’ll come back to, software engineers lack the social status necessary to make others invest in them.

Interview B: as Data Science manager.

A couple weeks later, Bill interviewed at a roughly equivalent company for the VP-level position, reporting directly to the CTO.

Worth noting is that we did nothing to make Bill more technically impressive than for Company A. If anything, we made his technical story more honest, by modestly inflating his social status while telling a “straight shooter” story for his technical experience. We didn’t have to cover up periods of low technical activity; that he was a manager, alone, sufficed to explain those away.

Bill faced four interviews, and while the questions were behavioral and would be “hard” for many technical people, he found them rather easy to answer with composure. I gave him the Golden Answer, which is to revert to “There’s always a trade-off between wanting to do the work yourself, and knowing when to delegate.” It presents one as having managerial social status (the ability to delegate) but also a diligent interest in, and respect for, the work. It can be adapted to pretty much any “behavioral” interview question...

Bill passed. Unlike for a typical engineering position, there were no reference checks. The CEO said, “We know you’re a good guy, and we want to move fast on you”. As opposed tot he 7-day exploding offers typically served to engineers, Bill had 2 months in which to make his decision. He got a fourth week of vacation without even having to ask for it, and genuine equity (about 75% of a year’s salary vesting each year)...

It was really interesting, as I listened in, to see how different things are once you’re “in the club”. The CEO talked to Bill as an equal, not as a paternalistic, bullshitting, “this is good for your career” authority figure. There was a tone of equality that a software engineer would never get from the CEO of a 100-person tech company.

The author concludes that positions that are labeled as code-monkey-like are low status, while positions that are labeled as managerial are high status. Even if they are "essentially" doing the same sort of work.

Not sure about this methodology, but it's food for thought.

Wealth from Self-Replicating Robots

3 Algernoq 15 July 2014 04:42AM

I have high confidence that economically-valuable self-replicating robots are possible with existing technology: initially, something similar in size and complexity to a RepRap, but able to assemble a copy of itself from parts ordered online with zero human interaction. This is important because more robots could provide the economic growth needed to solve many urgent problems. I've held this idea for long enough that I'm worried about being a crank, so any feedback is appreciated.

I care because to fulfill my naive and unrealistic dreams (not dying, owning a spaceship) I need the world to be a LOT richer. Specifically, naively assuming linear returns to medical research funding, a funding increase of ~10x (to ~$5 trillion/year, or ~30% of current USA GDP) is needed to achieve actuarial escape velocity (average lifespans currently increase by about 1 year each decade, so a 10x increase is needed for science to keep up with aging). The simplest way to get there is to have 10x as many machines per person.

My vision is that someone does for hardware what open-source has done for software: make useful tools free. A key advantage of software is that making a build or copying a program takes only one step. In software, you click "compile" and (hopefully) it's done and ready to test in seconds. In hardware, it takes a bunch of steps to build a prototype (order parts, screw fiddly bits together, solder, etc.). A week is an insanely short lead time for building a new prototype of something mechanical. 1-2 months is typical in many industries. This means that mechanical things have high marginal cost, because people have to build and debug them, and typically transport them for thousands of miles from factory to consumer.

Relevant previous research projects include trivial self-replication from pre-fabricated components and an overly-ambitious NASA-funded plan from the 1980s to develop the Moon using self-replicating robots. Current research funding tends to go toward bio-inspired systems, re-configurable systems using prefabricated cubes (conventionally-manufactured), or chemistry deceptively called "nanotech", all of which seem to miss the opportunity to use existing autonomous assembly technology with online ordering of parts to make things cheaper by getting rid of setup cost and building cost.

I envision a library/repository of useful robots for specific tasks (cleaning, manufacturing, etc.), in a standard format for download (parts list, 3D models, assembly instructions, etc.). Parts could be ordered online. A standard fabricator robot with the capability to identify and manipulate parts, and fasten them using screws, would verify that the correct parts were received, put everything together, and run performance checks. For comparison, the RepRap takes >9 hours of careful human labor to build. An initial self-replicating implementation would be a single fastener robot. It would spread by undercutting the price of competing robot arm systems. Existing systems sell for ~2x the cost of components, due to overhead for engineering, assembly, and shipping. This appears true for robots at a range of price points, including $200 robot arms using hobby servos and $40,000+ robot arms using optical encoders and direct-drive brushless motors. A successful system that undercut the price of conventionally-assembled hobby robots would provide a platform for hobbyists to create additional robots that could be autonomously built (e.g. a Roomba for 1/5 the price, due to not needing to pay the 5x markup for overhead and distribution). Once a beachhead is established in the form of a successful self-replicating assembly robot, market pressures would drive full automation of more products/industries, increasing output for everyone.

This is a very hard programming challenge, but the tools exist to identify, manipulate and assemble parts. Specifically, ROS is an open-source software library whose packages can be put together to solve tasks such as mapping a building or folding laundry. It's hard because it would require a lot of steps and a new combination of existing tools.

This is also a hard systems/mechanical challenge: delivering enough data and control bandwidth for observability and controllability, and providing lightweight and rigid hardware, so that the task for the software is possible rather than impossible. Low-cost components have less performance: a webcam has limited resolution, and hobby servos have limited accuracy. The key problem - autonomously picking up a screw and screwing it into a hole - has been solved years ago for assembly-line robots. Doing the same task with low-cost components appears possible in principle. A comparable problem that has been solved is autonomous construction using quadcopters.

Personally, I would like to build a robot arm that could assemble more robot arms. It would require, at minimum, a robot arm using hobby servos, a few webcams, custom grippers (for grasping screws, servos, and laser-cut sheet parts), custom fixtures (blocks with a cutout to hold two parts in place while the robot arm inserts a screw; ideally multiple robot arms would be used to minimize unique tooling but fixtures would be easier initially), and a lot of challenging code using ROS and Gazebo. Just the mechanical stuff, which I have the education for, would be a challenging months-long side project, and the software stuff could take years of study (the equivalent of a CS degree) before I'd have the required background to reasonably attempt it.

I'm not sure what to do with this idea. Getting a CS degree on top of a mechanical engineering degree (so I could know enough to build this) seems like a good career choice for interesting work and high pay (even if/when this doesn't work). Previous ideas like this I've had that are mostly outside my field have been unfeasible for reasons only someone familiar with the field would know. It's challenging to stay motivated to work on this, because the payoff is so distant, but it's also challenging not to work on this, because there's enough of a chance that this would work that I'm excited about it. I'm posting this here in the hopes someone with experience with industrial automation will be inspired to build this, and to get well-reasoned feedback.

How deferential should we be to the forecasts of subject matter experts?

11 VipulNaik 14 July 2014 11:41PM

This post explores the question: how strongly should we defer to predictions and forecasts made by people with domain expertise? I'll assume that the domain expertise is legitimate, i.e., the people with domain expertise do have a lot of information in their minds that non-experts don't. The information is usually not secret, and non-experts can usually access it through books, journals, and the Internet. But experts have more information inside their head, and may understand it better. How big an advantage does this give them in forecasting?

Tetlock and expert political judgment

In an earlier post on historical evaluations of forecasting, I discussed Philip E. Tetlock's findings on expert political judgment and forecasting skill, and summarized his own article for Cato Unbound co-authored with Dan Gardner that in turn summarized the themes of the book:

  1. The average expert’s forecasts were revealed to be only slightly more accurate than random guessing—or, to put more harshly, only a bit better than the proverbial dart-throwing chimpanzee. And the average expert performed slightly worse than a still more mindless competition: simple extrapolation algorithms that automatically predicted more of the same.
  2. The experts could be divided roughly into two overlapping yet statistically distinguishable groups. One group (the hedgehogs) would actually have been beaten rather soundly even by the chimp, not to mention the more formidable extrapolation algorithm. The other (the foxes) would have beaten the chimp and sometimes even the extrapolation algorithm, although not by a wide margin.
  3. The hedgehogs tended to use one analytical tool in many different domains; they preferred keeping their analysis simple and elegant by minimizing “distractions.” These experts zeroed in on only essential information, and they were unusually confident—they were far more likely to say something is “certain” or “impossible.” In explaining their forecasts, they often built up a lot of intellectual momentum in favor of their preferred conclusions. For instance, they were more likely to say “moreover” than “however.”
  4. The foxes used a wide assortment of analytical tools, sought out information from diverse sources, were comfortable with complexity and uncertainty, and were much less sure of themselves—they tended to talk in terms of possibilities and probabilities and were often happy to say “maybe.” In explaining their forecasts, they frequently shifted intellectual gears, sprinkling their speech with transition markers such as “although,” “but,” and “however.”
  5. It's unclear whether the performance of the best forecasters is the best that is in principle possible.
  6. This widespread lack of curiosity—lack of interest in thinking about how we think about possible futures—is a phenomenon worthy of investigation in its own right.

Tetlock has since started The Good Judgment Project (website, Wikipedia), a political forecasting competition where anybody can participate, and with a reputation of doing a much better job at prediction than anything else around. Participants are given a set of questions and can basically collect freely available online information (in some rounds, participants were given additional access to some proprietary data). They then use that to make predictions. The aggregate predictions are quite good. For more information, visit the website or see the references in the Wikipedia article. In particular, this Economist article and this Business Insider article are worth reading. (I discussed the GJP and other approaches to global political forecasting in this post).

So at least in the case of politics, it seems that amateurs, armed with basic information plus the freedom to look around for more, can use "fox-like" approaches and do a better job of forecasting than political scientists. Note that experts still do better than ignorant non-experts who are denied access to information. But once you have basic knowledge and are equipped to hunt more down, the constraining factor does not seem to be expertise, but rather, the approach you use (fox-like versus hedgehog-like). This should not be taken as a claim that expertise is irrelevant or unnecessary to forecasting. Experts play an important role in expanding the scope of knowledge and methodology that people can draw on to make their predictions. But the experts themselves, as people, do not have a unique advantage when it comes to forecasting.

Tetlock's research focused on politics. But the claim that the fox-hedgehog distinction turns out to be a better prediction of forecasting performance than the level of expertise is a general one. How true is this claim in domains other than politics? Domains such as climate science, economic growth, computing technology, or the arrival of artificial general intelligence?

Armstrong and Green again

J. Scott Armstrong is a leading figure in the forecasting community. Along with Kesten C. Green, he penned a critique of the forecasting exercises in climate science in 2007, with special focus on the IPCC reports. I discussed the critique at length in my post on the insularity critique of climate science. Here, I quote a part from the introduction of the critique that better explains the general prior that Armstrong and Green claim to be bringing to the table when they begin their evaluation. Of the points they make at the beginning, two bear directly on the deference we should give to expert judgment and expert consensus:

  • Unaided judgmental forecasts by experts have no value: This applies whether the opinions are expressed in words, spreadsheets, or mathematical models. It applies regardless of how much scientific evidence is possessed by the experts. Among the reasons for this are:
    a) Complexity: People cannot assess complex relationships through unaided observations.
    b) Coincidence: People confuse correlation with causation.
    c) Feedback: People making judgmental predictions typically do not receive unambiguous feedback they can use to improve their forecasting.
    d) Bias: People have difficulty in obtaining or using evidence that contradicts their initial beliefs. This problem is especially serious for people who view themselves as experts.
  • Agreement among experts is only weakly related to accuracy: This is especially true when the experts communicate with one another and when they work together to solve problems, as is the case with the IPCC process.

Armstrong and Green later elaborate on these claims, referencing Tetlock's work. (Note that I have removed the parts of the section that involve direct discussion of climate-related forecasts, since the focus here is on the general question of how much deference to show to expert consensus).

Many public policy decisions are based on forecasts by experts. Research on persuasion has shown that people have substantial faith in the value of such forecasts. Faith increases when experts agree with one another. Our concern here is with what we refer to as unaided expert judgments. In such cases, experts may have access to empirical studies and other information, but they use their knowledge to make predictions without the aid of well-established forecasting principles. Thus, they could simply use the information to come up with judgmental forecasts. Alternatively, they could translate their beliefs into mathematical statements (or models) and use those to make forecasts.

Although they may seem convincing at the time, expert forecasts can make for humorous reading in retrospect. Cerf and Navasky’s (1998) book contains 310 pages of examples, such as Fermi Award-winning scientist John von Neumann’s 1956 prediction that “A few decades hence, energy may be free”. [...] The second author’s review of empirical research on this problem led him to develop the “Seer-sucker theory,” which can be stated as “No matter how much evidence exists that seers do not exist, seers will find suckers” (Armstrong 1980). The amount of expertise does not matter beyond a basic minimum level. There are exceptions to the Seer-sucker Theory: When experts get substantial well-summarized feedback about the accuracy of their forecasts and about the reasons why their forecasts were or were not accurate, they can improve their forecasting. This situation applies for short-term (up to five day) weather forecasts, but we are not aware of any such regime for long-term global climate forecasting. Even if there were such a regime, the feedback would trickle in over many years before it became useful for improving forecasting.

Research since 1980 has provided much more evidence that expert forecasts are of no value. In particular, Tetlock (2005) recruited 284 people whose professions included, “commenting or offering advice on political and economic trends.” He asked them to forecast the probability that various situations would or would not occur, picking areas (geographic and substantive) within and outside their areas of expertise. By 2003, he had accumulated over 82,000 forecasts. The experts barely if at all outperformed non-experts and neither group did well against simple rules. Comparative empirical studies have routinely concluded that judgmental forecasting by experts is the least accurate of the methods available to make forecasts. For example, Ascher (1978, p. 200), in his analysis of long-term forecasts of electricity consumption found that was the case.

Note that the claims that Armstrong and Green make are in relation to unaided expert judgment, i.e., expert judgment that is not aided by some form of assistance or feedback that promotes improved forecasting. (One can argue that expert judgment in climate science is not unaided, i.e., that the critique is mis-applied to climate science, but whether that is the case is not the focus of my post). While Tetlock's suggestion to be more fox-like, Armstrong and Green recommend the use of their own forecasting principles, as encoded in their full list of principles and described on their website.

A conflict of intuitions, and an attempt to resolve it

I have two conflicting intuitions here. I like to use the majority view among experts as a reasonable Bayesian prior to start with, that I might then modify based on further study. The relevant question here is who the experts are. Do I defer to the views of domain experts, who may know little about the challenges of forecasting, or do I defer to the views of forecasting experts, who may know little of the domain but argue that domain experts who are not following good forecasting principles do not have any advantage over non-experts?

I think the following heuristics are reasonable starting points:

  • In cases where we have a historical track record of forecasts, we can use that to evaluate the experts and non-experts. For instance, I reviewed the track record of survey-based macroeconomic forecasts, thanks to a wealth of recorded data on macroeconomic forecasts by economists over the last few decades. (Unfortunately, these surveys did not include corresponding data on layperson opinion).
  • The faster the feedback from making a forecast to knowing whether it's right, the more likely it is that experts would have learned how to make good forecasts.
  • The more central forecasting is to the overall goals of the domain, the more likely people are to get it right. For instance, forecasting is a key part of weather and climate science. But forecasting progress on mathematical problems has a negligible relation with doing mathematical research.
  • Ceteris paribus, if experts are clearly recording their forecasts and the reasons behind them, and systematically evaluating the performance on past forecasts, that should be taken as (weak) evidence in favor of the experts' views being taken more seriously (even if we don't have enough of a historical track record to properly calibrate forecast accuracy). However, if they simply make forecasts but then fail to review their past history of forecasts, this may be taken as being about as bad as not forecasting at all. And in cases that the forecasts were bold, failed miserably, and yet the errors were not acknowledged, this should be taken as being considerably worse than not forecasting at all.
  • A weak inside view of the nature of domain expertise can give some idea of whether expertise should generally translate to better forecasting skill. For instance, even a very weak understanding of physics will tell us that physicists are no more likely to determine whether a coin toss will yield heads or tails, even though the fate of the coin is determined by physics. Similarly, with the exception of economists who specialize in the study of macroeconomic indicators, one wouldn't expect economists to be able to forecast macroeconomic indicators better than most moderately economically informed people.


My first thought was that the more politicized a field, the less reliable any forecasts coming out of it. I think there are obvious reasons for that view, but there are also countervailing considerations.

The main claimed danger of politicization is groupthink and lack of openness to evidence. It could even lead to suppression, misrepresentation, or fabrication of evidence. Quite often, however, we see these qualities in highly non-political fields. People believe that certain answers are the right ones. Their political identity or ego is not attached to it. They just have high confidence that that answer is correct, and when the evidence they have does not match up, they think there is a problem with the evidence. Of course, if somebody does start challenging the mainstream view, and the issue is not quickly resolved either way, it can become politicized, with competing camps of people who hold the mainstream view and people who side with the challengers. Note, however, that the politicization has arguably reduced the aggregate amount of groupthink in the field. Now that there are two competing camps rather than one received wisdom, new people can examine evidence and better decide which camp is more on the side of truth. People in both camps, now that they are competing, may try to offer better evidence that could convince the undecideds or skeptics. So "politicization" might well improve the epistemic situation (I don't doubt that the opposite happens quite often). Examples of such politicization might be the replacement of geocentrism by heliocentrism, the replacement of creationism by evolution, and the replacement of Newtonian mechanics by relativity and/or quantum mechanics. In the first two cases, religious authorities pushed against the new idea, even though the old idea had not been a "politicized" tenet before the competing claims came along. In the case of Newtonian and quantum mechanics, the debate seems to have been largely intra-science, but quantum mechanics had its detractors, including Einstein, famous for the "God does not play dice" quip. (This post on Slate Star Codex is somewhat related).

The above considerations aren't specific to forecasting, and they apply even for assertions that fall squarely within the domain of expertise and require no forecasting skill per se. The extent to which they apply to forecasting problems is unclear. It's unclear whether most domains have any significant groupthink in favor of particular forecasts. In fact, in most domains, forecasts aren't really made or publicly recorded at all. So concerns of groupthink in a non-politicized scenario may not apply to forecasting. Perhaps the problem is the opposite: forecasts are so unimportant in many domains that the forecasts offered by experts are almost completely random and hardly informed in a systematic way by their expert knowledge. Even in such situations, politicization can be helpful, in so far as it makes the issue more salient and might prompt individuals to give more attention to trying to figure out which side is right.

The case of forecasting AI progress

I'm still looking at the case of forecasting AI progress, but for now, I'd like to point people to Luke Muehlhauser's excellent blog post from May 2013 discussing the difficulty with forecasting AI progress. Interestingly, he makes many points similar to those I make here. (Note: Although I had read the post around the time it was published, I hadn't read it recently until I finished drafting the rest of my current post. Nonetheless, my views can't be considered totally independent of Luke's because we've discussed my forecasting contract work for MIRI).

Should we expect experts to be good at predicting AI, anyway? As Armstrong & Sotala (2012) point out, decades of research on expert performance2 suggest that predicting the first creation of AI is precisely the kind of task on which we should expect experts to show poor performance — e.g. because feedback is unavailable and the input stimuli are dynamic rather than static. Muehlhauser & Salamon (2013) add, “If you have a gut feeling about when AI will be created, it is probably wrong.”


On the other hand, Tetlock (2005) points out that, at least in his large longitudinal database of pundit’s predictions about politics, simple trend extrapolation is tough to beat. Consider one example from the field of AI: when David Levy asked 1989 World Computer Chess Championship participants when a chess program would defeat the human World Champion, their estimates tended to be inaccurately pessimistic,8 despite the fact that computer chess had shown regular and predictable progress for two decades by that time. Those who forecasted this event with naive trend extrapolation (e.g. Kurzweil 1990) got almost precisely the correct answer (1997).

Looking for thoughts

I'm particularly interested in thoughts from people on the following fronts:

  1. What are some indicators you use to determine the reliability of forecasts by subject matter experts?
  2. How do you resolve the conflict of intuitions between deferring to the views of domain experts and deferring to the conclusion that forecasters have drawn about the lack of utility of domain experts' forecasts?
  3. In particular, what do you think of the way that "politicization" affects the reliability of forecasts?
  4. Also, how much value do you assign to agreement between experts when judging how much trust to place in expert forecasts?
  5. Comments that elaborate on these questions or this general topic within the context of a specific domain or domains would also be welcome.

Scenario analyses for technological progress for the next decade

10 VipulNaik 14 July 2014 04:31PM

This is a somewhat long and rambling post. Apologies for the length. I hope the topic and content are interesting enough for you to forgive the meandering presentation.

I blogged about the scenario planning method a while back, where I linked to many past examples of scenario planning exercises. In this post, I take a closer look at scenario analysis in the context of understanding the possibilities for the unfolding of technological progress over the next 10-15 years. Here, I will discuss some predetermined elements and critical uncertainties, offer my own scenario analysis, and then discuss scenario analyses by others.

Remember: it is not the purpose of scenario analysis to identify a set of mutually exclusive and collectively exhaustive outcomes. In fact, usually, the real-world outcome has some features from two or more of the scenarios considered, with one scenario dominating somewhat. As I noted in my earlier post:

The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.

The predetermined element: the imminent demise of Moore's law "as we know it"

As Steven Schnaars noted in Megamistakes (discussed here), forecasts of technological progress in most domains have been overoptimistic, but in the domain of computing, they've been largely spot-on, mostly because the raw technology has improved quickly. The main reason has been Moore's law, and a couple other related laws, that have undergirded technological progress. But now, the party is coming to an end! The death of Moore's law (as we know it) is nigh, and there are significant implications for the future of computing.

Moore's law refers to many related claims about technological progress. Some forms of this technological progress have already stalled. Other forms are slated to stall in the near future, barring unexpected breakthroughs. These facts about Moore's law form the backdrop for all our scenario planning.

The critical uncertainty arises in how industry will respond to the prospect of Moore's law death. Will there be a doubling down on continued improvement at the cutting edge? Will the battle focus on cost reductions? Or will we have neither cost reduction nor technological improvement? What sort of pressure will hardware stagnation put on software?

Now, onto a description of the different versions of Moore's law (slightly edited version of information from Wikipedia):

  • Transistors per integrated circuit. The most popular formulation is of the doubling of the number of transistors on integrated circuits every two years.

  • Density at minimum cost per transistor. This is the formulation given in Moore's 1965 paper. It is not just about the density of transistors that can be achieved, but about the density of transistors at which the cost per transistor is the lowest. As more transistors are put on a chip, the cost to make each transistor decreases, but the chance that the chip will not work due to a defect increases. In 1965, Moore examined the density of transistors at which cost is minimized, and observed that, as transistors were made smaller through advances in photolithography, this number would increase at "a rate of roughly a factor of two per year".

  • Dennard scaling. This suggests that power requirements are proportional to area (both voltage and current being proportional to length) for transistors. Combined with Moore's law, performance per watt would grow at roughly the same rate as transistor density, doubling every 1–2 years. According to Dennard scaling transistor dimensions are scaled by 30% (0.7x) every technology generation, thus reducing their area by 50%. This reduces the delay by 30% (0.7x) and therefore increases operating frequency by about 40% (1.4x). Finally, to keep electric field constant, voltage is reduced by 30%, reducing energy by 65% and power (at 1.4x frequency) by 50%. Therefore, in every technology generation transistor density doubles, circuit becomes 40% faster, while power consumption (with twice the number of transistors) stays the same.

So how are each of these faring?

  • Transistors per integrated circuit: At least in principle, this can continue for a decade or so. The technological ideas exist to publish transistor sizes down from the current values of 32 nm and 28 nm all the way down to 7 nm.
  • Density at minimum cost per transistor. This is probably stopping around now. There is good reason to believe that, barring unexpected breakthroughs, the transistor size for which we have minimum cost per transistor shall not go down below 28 nm. There may still be niche applications that benefit from smaller transistor sizes, but there will be no overwhelming economic case to switch production to smaller transistor sizes (i.e., higher densities).
  • Dennard scaling. This broke down around 2005-2007. So for approximately a decade, we've essentially seen continued miniaturization but without any corresponding improvement in processor speed or performance per watt. There have been continued overall improvements in energy efficiency of computing, but not through this mechanism. The absence of automatic speed improvements has led to increased focus on using greater parallelization (note that the miniaturization means more parallel processors can be packed in the same space, so Moore's law is helping in this other way). In particular, there has been an increased focus on multicore processors, though there may be limits to how far that can take us too.

Moore's law isn't the only law that is slated to end. Other similar laws, such as Kryder's law (about the cost of hard disk space) may also end in the near future. Koomey's law on energy efficiency may also stall, or might continue to hold but through very different mechanisms compared to the ones that have driven it so far.

Some discussions that do not use explicit scenario analysis

The quotes below are to give a general idea of what people seem to generally agree on, before we delve into different scenarios.

EETimes writes:

We have been hearing about the imminent demise of Moore's Law quite a lot recently. Most of these predictions have been targeting the 7nm node and 2020 as the end-point. But we need to recognize that, in fact, 28nm is actually the last node of Moore's Law.


Summarizing all of these factors, it is clear that -- for most SoCs -- 28nm will be the node for "minimum component costs" for the coming years. As an industry, we are facing a paradigm shift because dimensional scaling is no longer the path for cost scaling. New paths need to be explored such as SOI and monolithic 3D integration. It is therefore fitting that the traditional IEEE conference on SOI has expanded its scope and renamed itself as IEEE S3S: SOI technology, 3D Integration, and Subthreshold Microelectronics.

Computer scientist Moshe Yardi writes:

So the real question is not when precisely Moore's Law will die; one can say it is already a walking dead. The real question is what happens now, when the force that has been driving our field for the past 50 years is dissipating. In fact, Moore's Law has shaped much of the modern world we see around us. A recent McKinsey study ascribed "up to 40% of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements." Indeed, the demise of Moore's Law is one reason some economists predict a "great stagnation" (see my Sept. 2013 column).

"Predictions are difficult," it is said, "especially about the future." The only safe bet is that the next 20 years will be "interesting times." On one hand, since Moore's Law will not be handing us improved performance on a silver platter, we will have to deliver performance the hard way, by improved algorithms and systems. This is a great opportunity for computing research. On the other hand, it is possible that the industry would experience technological commoditization, leading to reduced profitability. Without healthy profit margins to plow into research and development, innovation may slow down and the transition to the post-CMOS world may be long, slow, and agonizing.

However things unfold, we must accept that Moore's Law is dying, and we are heading into an uncharted territory.

CNet says:

"I drive a 1964 car. I also have a 2010. There's not that much difference -- gross performance indicators like top speed and miles per gallon aren't that different. It's safer, and there are a lot of creature comforts in the interior," said Nvidia Chief Scientist Bill Dally. If Moore's Law fizzles, "We'll start to look like the auto industry."

Three critical uncertainties: technological progress, demand for computing power, and interaction with software

Uncertainty #1: Technological progress

Moore's law is dead, long live Moore's law! Even if Moore's law as originally stated is no longer valid, there are other plausible computing advances that would preserve the spirit of the law.

Minor modifications of current research (as described in EETimes) include:

  • Improvements in 3D circuit design (Wikipedia), so that we can stack multiple layers of circuits one on top of the other, and therefore pack more computing power per unit volume.
  • Improvements in understanding electronics at the nanoscale, in particular understanding subthreshold leakage (Wikipedia) and how to tackle it.

Then, there are possibilities for totally new computing paradigms. These have fairly low probability, and are highly unlikely to become commercially viable within 10-15 years. Each of these offers an advantage over currently available general-purpose computing only for special classes of problems, generally those that are parallelizable in particular ways (the type of parallelizability needed differs somewhat between the computing paradigms).

  • Quantum computing (Wikipedia) (speeds up particular types of problems). Quantum computers already exist, but the current ones can tackle only a few qubits. Currently, the best known quantum computers in action are those maintained at the Quantum AI Lab (Wikipedia) run jointly by Google, NASA. and USRA. It is currently unclear how to manufacture quantum computers with a larger number of qubits. It's also unclear how the cost will scale in the number of qubits. If the cost scales exponentially in the number of qubits, then quantum computing will offer little advantage over classical computing. Ray Kurzweil explains this as follows:
    A key question is: how difficult is it to add each additional qubit? The computational power of a quantum computer grows exponentially with each added qubit, but if it turns out that adding each additional qubit makes the engineering task exponentially more difficult, we will not be gaining any leverage. (That is, the computational power of a quantum computer will be only linearly proportional to the engineering difficulty.) In general, proposed methods for adding qubits make the resulting systems significantly more delicate and susceptible to premature decoherence.

    Kurzweil, Ray (2005-09-22). The Singularity Is Near: When Humans Transcend Biology (Kindle Locations 2152-2155). Penguin Group. Kindle Edition.
  • DNA computing (Wikipedia)
  • Other types of molecular computing (Technology Review featured story from 2000, TR story from 2010)
  • Spintronics (Wikipedia): The idea is to store information using the spin of the electron, a quantum property that is binary and can be toggled at zero energy cost (in principle). The main potential utility of spintronics is in data storage, but it could potentially help with computation as well.
  • Optical computing aka photonic computing (Wikipedia): This uses beams of photons that store the relevant information that needs to be manipulated. Photons promise to offer higher bandwidth than electrons, the tool used in computing today (hence the name electronic computing).

Uncertainty #2: Demand for computing

Even if computational advances are possible in principle, the absence of the right kind of demand can lead to a lack of financial incentive to pursue the relevant advances. I discussed the interaction between supply and demand in detail in this post.

As that post discussed, demand for computational power at the consumer end is probably reaching saturation. The main source of increased demand will now be companies that want to crunch huge amounts of data in order to more efficiently mine data for insight and offer faster search capabilities to their users. The extent to which such demand grows is uncertain. In principle, the demand is unlimited: the more data we collect (including "found data" that will expand considerably as the Internet of Things grows), the more computational power is needed to apply machine learning algorithms to the data. Since the complexity of many machine learning algorithms grows at least linearly (and in some cases quadratically or cubically) in the data, and the quantity of data itself will probably grow superlinearly, we do expect a robust increase in demand for computing.

Uncertainty #3: Interaction with software

Much of the increased demand for computing, as noted above, does not arise so much from a need for raw computing power by consumers, but a need for more computing power to manipulate and glean insight from large data sets. While there has been some progress with algorithms for machine learning and data mining, the fields are probably far from mature. So an alternative to hardware improvements is improvements in the underlying algorithms. In addition to the algorithms themselves, execution details (such as better use of parallel processing capabilities and more efficient use of idle processor capacity) can also yield huge performance gains.

This might be a good time to note a common belief about software and why I think it's wrong. We often tend to hear of software bloat, and some people subscribe to Wirth's law, the claim that software is getting slower more quickly than hardware is getting faster. I think that there are some softwares that have gotten feature-bloated over time, largely because there are incentives to keep putting out new editions that people are willing to pay money for, and Microsoft Word might be one case of such bloat. For the most part, though, software has been getting more efficient, partly by utilizing the new hardware better, but also partly due to underlying algorithmic improvements. This was one of the conclusions of Katja Grace's report on algorithmic progress (see also this link on progress on linear algebra and linear programming algorithms). There are a few softwares that get feature-bloated and as a result don't appear to improve over time as far as speed goes, but it's arguably the case that people's revealed preferences show that they are willing to put up with the lack of speed improvements as long as they're getting feature improvements.

Computing technology progress over the next 10-15 years: my three scenarios

  1. Slowdown to ordinary rates of growth of cutting-edge industrial productivity: For the last few decades, several dimensions of computing technology have experienced doublings over time periods ranging from six months to five years. With such fast doubling, we can expect price-performance thresholds for new categories of products to be reached every few years, with multiple new product categories a decade. Consider, for instance, desktops, then laptops, then smartphones, then tablets. If the doubling time reverts to the norm seen in other cutting-edge industrial sectors, namely 10-25 years, then we'd probably see the introduction of revolutionary new product categories only about once a generation. There are already some indications of a possible slowdown, and it remains to be seen whether we see a bounceback.
  2. Continued fast doubling: The other possibility is that the evidence for a slowdown is largely illusory, and computing technology will continue to experience doublings over timescales of less than five years. There would therefore be scope to introduce new product categories every few years.
  3. New computing paradigm with high promise, but requiring significant adjustment: This is an unlikely, but not impossible, scenario. Here, a new computing paradigm, such as quantum computing, reaches the realm of feasibility. However, the existing infrastructure of algorithms is ill-designed for quantum computing, and in fact, quantum computing engenders many security protocols while offering its own unbreakable ones. Making good use of this new paradigm requires a massive re-architecting of the world's computing infrastructure.

There are two broad features that are likely to be common to all scenarios:

  • Growing importance of algorithms: Scenario (1): If technological progress in computing power stalls, then the pressure for improvements to the algorithms and software may increase. Scenario (2): if technological progress in computing power continues, that might only feed the hunger for bigger data. And as the size of data sets increases, asymptotic performance starts mattering more (the distinction between O(n) and O(n2) matters more when n is large). In both cases, I expect more pressure on algorithms and software, but in different ways: in the case of stalling hardware progress, the focus will be more on improving the software and making minor changes to improve the constants, whereas in the case of rapid hardware progress, the focus will be more on finding algorithms that have better asymptotic (big-oh) performance. Scenario (3): In the case of paradigm shifts, the focus will be on algorithms that better exploit the new paradigm. In all cases, there will need to be some sort of shift toward new algorithms and new code that better exploits the new situation.
  • Growing importance of parallelization: Although the specifics of how algorithms will become more important varies between the scenarios, one common feature is that algorithms that can better make parallel use of large numbers of machines will become more important. We have seen parallelization grow in importance over the last 15 years, even as the computing gains for individual processors through Moore's law seems to be plateauing out, while data centers have proliferated in number. However, the full power of parallelization is far from tapped out. Again, parallelization matters for slightly different reasons in different cases. Scenario (1): A slowdown in technological progress would mean that gains in the amount of computation can largely be achieved by scaling up the number of machines. In other words, the usage of computing shifts further in a capital-intensive direction. Parallel computing is important for effective utilization of this capital (the computing resources). Scenario (2): Even in the face of rapid hardware progress, automatic big data generation will likely improve much faster than storage, communication, and bandwidth. This "big data" is too huge to store or even stream on a single machine, so parallel processing across huge clusters of machines becomes important. Scenario (3): Note also that almost all the new computing paradigms currently under consideration (including quantum computing) offer massive advantages for special types of parallelizable problems, so parallelization matters even in the case of a paradigm shift in computing.

Other scenario analyses

McKinsey carried out a scenario analysis here, focused more on the implications for the semiconductor manufacturing industry than for users of computing. The report notes the importance of Moore's law in driving productivity improvements over the last few decades:

As a result, Moore’s law has swept much of the modern world along with it. Some estimates ascribe up to 40 percent of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements.

The scenario analysis identifies four potential sources of innovation related to Moore's law:

  1. More Moore (scaling)
  2. Wafer-size increases (maximize productivity)
  3. More than Moore (functional diversification)
  4. Beyond CMOS (new technologies)

Their scenario analysis uses a 2 X 2 model, with the two dimensions under consideration being performance improvements (continue versus stop) and cost improvements (continue versus stop). The case that both performance improvements and cost improvements continue is the "good" case for the semiconductor industry. The case that both stop is the case where the industry is highly likely to get commodified, with profit margins going down and small players catching up to the big ones. In the intermediate cases (where one of the two continues and the other stops), consolidation of the semiconductor industry is likely to continue, but there is still a risk of falling demand.

The McKinsey scenario analysis was discussed by Timothy Taylor on his blog, The Conversable Economist, here.

Roland Berger carried out a detailed scenario analysis focused on the "More than Moore" strategy here.

Blegging for missed scenarios, common features and early indicators

Are there scenarios that the analyses discussed above missed? Are there some types of scenario analysis that we didn't adequately consider? If you had to do your own scenario analysis for the future of computing technology and hardware progress over the next 10-15 years, what scenarios would you generate?

As I noted in my earlier post:

The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.

I already identified some features I believe to be common to all scenarios (namely, increased focus on algorithms, and increased focus on parallelization). Do you agree with my assessment that these are likely to matter regardless of scenario? Are there other such common features you have high confidence in?

If you generally agree with one or more of the scenario analyses here (mine or McKinsey's or Roland Berger's), what early indicators would you use to identify which of the enumerated scenarios we are in? Is it possible to look at how events unfold over the next 2-3 years and draw intelligent conclusions from that about the likelihood of different scenarios?

Open thread, 14-20 July 2014

4 David_Gerard 14 July 2014 11:16AM

Previous thread


If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one.

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

LW Australia's online hangout results, (short stories about cognitive biases)

2 Elo 14 July 2014 06:25AM

In the Australia Mega-Online-hangout; a member mentioned a task/goal of his to write a few short stories to convey cognitive biases.  After a while and a few more goals, someone suggested we actually write the short stories (the power of group resources!).  So we did.  They might be a bit silly, answers are at the very bottom, try to guess the biases.

We had some fun writing them up.  This project was intended to be a story-per-day blog.  feel free to write a short story in the discussion, or comment on how a different cognitive bias might be attributed to any of the stories.

Guess the bias in the short stories:

Cathy hates catching the train.  She hates waiting in line for tickets, she hates lazy people who can't get their wallet out before they get to the front of the line, she hates missing her train because people are disorganised and carry bags of junk around with them, "why are you so disorganised", she said to the woman in front of her, who looks at her in a huff.  As she gets to the front of the line she opens her bag to find her wallet, she looks under her umbrella that she keeps for a rainy day, even though its not rainy today, moves her phone to her pocket so that she can listen to a rationality audiobook when she gets on the train, moves her book away, shuffles the gum around that she never eats, rifles past the dirty tissues and finally pulls out her wallet.  A grumpy man behind cathy in the line mutters, "why are you so disorganised".  Which she knows is not true because she is usually very organised.


Mark always felt like an outcast.  He was always dressing a little wacky, and enjoyed hanging out with people like him. He was especially fond of wearing Hawaiian shirts!  When we was walking in the mall yesterday a man in a suit and holding a clipboard came up to him and started talking to him about donating to charity.  As usual he brushed him off and kept walking.  Today a man in a Hawaiian shirt and shorts; also with a clipboard came up to him and started talking to him about donating to charity.  But that's okay, he was just doing his job.  Mark chatted to him for a few minutes and considered donating.


Mr. Fabulous Fox was in a hurry, he had to get to the Millar farm before Mr. Millar got back. Mr. Fox had never been before but he knew that it would take at least 10 minutes to get there, and he had to guess it would take him at least 20 minutes to grab some chickens and ducks to feed his family. Mr. Fox waited until he saw Mr. Millar drive away to the fair, Mr. Millar would be selling the plumpest hens and the fattest ducks, for a tidy profit, and Mr. Fox could take advantage of that to have himself a bountiful meal.

Mr. Fox dashed out onto the road and made his down the farmyard road, scuttling his way toward the ducks in their pen, he jumped the fence and caught a few, looking forward to snacking on them. Sneaking into the henhouse, Mr. Fox spotted the fattest hen he’d ever seen sitting down the very end of the shack.  He immediately bolted down to catch it, chasing it up and down the wooden floorboards, scattering the other hens and causing a ruckus. 

Catching the Fat Hen had only taken an hour, so it was somewhat of a surprise to Mr. Fabulous Fox when he spotted Mr. Millar, moments before he shot him.


Mike is an extraordinarily compassionate and nice person. He is so nice that someone once said that he used Mike to ground morality. Many people who know Mike concurred, and Alice once observed that ‘Do what Mike Blume would do’ was the most effective practical ethical decision-making algorithm they could think of for people capable of modelling Mike Blume.

One day, Jessica was in trouble. She had to vote on a motion, but the motion was phrased in incredibly obtuse language that she didn’t have time to study. She realized that Mike was also voting, and sighed in relief. Reassured by Mike’s ethical soundness, she voted with him on the motion.  She figured that was better than voting based on the extremely lossy interpretation she would come up with in 10 minutes. Later, when looking at the motion, she realized it was terrible, and she was shocked at the failure of the usually-excellent algorithm!


Eliot walked along the cold, grey road.  The cool  breeze reminded him that it was nearly autumn.  Then, he remembered it: the stock market had recently crashed.  He had taken this walk to get  away from the news stories about the recession on the television at  home.  As he walked, he came across a vending machine.  In the mood for  some simple chocolate comfort, he pitched in some quarters and out came a  sugary snack.  As he ate, he remembered his mother.  She had taken him  in after he lost his job a few weeks ago.  The sweet, woody smell of  coffee drifted past.  Enjoying the smell, he realized that it would give  him energy: just what he needed.  He stopped in at the coffee shop and  ordered a tall coffee, black.  After enjoying the first few sips, he  wandered back into the city.  He watched the cars go past one after  another as he walked, watched them stream up into the distance in a long  traffic jam.  Monday rush hour.  He found it odd, but he wished that he  was in it.  He decided to stop at the video store and rent a few movies  to take his mind off of things.  When it was time to make the purchase,  he was shocked to discover that he didn't have enough money left over  to cover the movie he chose.  He thought to himself "If I'm going to  survive the recession, I had better get control over my spending."

Fred squirrel had long been a good friend to Jean Squirrel, and she hadn't seen him in many years. She decided to visit him to reminisce about their high school days. As she was walking though the forest, looking forward to having acorns with her good friend, she found Fred lying on the ground, unconscious. It was immediately clear that Fred must've fallen out of the tree and hit his head whilst he was storing nuts for the winter. Jean was inclined to think that this was due to his laziness and lack of vigilance whilst climbing around the tree. Obviously he deserved to fall and hit his head to teach him a lesson.

Jean later found out that he'd been hit on the head by a falling bowl of petunias.

Cathy Story
Fundamental Attribution Error, Illusory superiority

Mark Story
Ingroup Bias

Mr. Fox
Planning Fallacy, Normalcy Bias, Optimism Bias?

Mike Story
Halo  Effect (Actually, wouldn't halo effect require you to start with Mike  Bloom's good looks and then make assumptions about his decision-making  based on this?  I think this is not really halo effect.  Is it halo  effect if the positive trait you assume is not *different* from the  positive trait you observed?)

Elliot Story
Denomination Effect, Insensitivity to sample size

Bragging Thread, July 2014

7 diegocaleiro 14 July 2014 03:22AM

Your job, should you choose to accept it, is to comment on this thread explaining the most awesome thing you've done this since June 1st. You may be as blatantly proud of yourself as you feel. You may unabashedly consider yourself the coolest freaking person ever because of that awesome thing you're dying to tell everyone about. This is the place to do just that.

Remember, however, that this isn't any kind of progress thread. Nor is it any kind of proposal thread. This thread is solely for people to talk about the awesome things they have done. Not "will do". Not "are working on". Have already done. This is to cultivate an environment of object level productivity rather than meta-productivity methods.

So, what's the coolest thing you've done this month?

This is why we can't have social science

33 Costanza 13 July 2014 09:04PM

Jason Mitchell is [edit: has been] the John L. Loeb Associate Professor of the Social Sciences at Harvard. He has won the National Academy of Science's Troland Award as well as the Association for Psychological Science's Janet Taylor Spence Award for Transformative Early Career Contribution.

Here, he argues against the principle of replicability of experiments in science. Apparently, it's disrespectful, and presumptively wrong.

Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value.

Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.

Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.

Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.

The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to restrict their “degrees of freedom,” for example, by specifying designs in advance.

Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.

This is why we can't have social science. Not because the subject is not amenable to the scientific method -- it obviously is. People are conducting controlled experiments and other people are attempting to replicate the results. So far, so good. Rather, the problem is that at least one celebrated authority in the field hates that, and would prefer much, much more deference to authority.

Why I Am Not a Rationalist, or, why several of my friends warned me that this is a cult

10 Algernoq 13 July 2014 05:54PM

A common question here is how the LW community can grow more rapidly. Another is why seemingly rational people choose not to participate.

I've read all of HPMOR and some of the sequences, attended a couple of meetups, am signed up for cryonics, and post here occasionally. But, that's as far as I go. In this post, I try to clearly explain why I don't participate more and why some of my friends don't participate at all and have warned me not to participate further.

  • Rationality doesn't guarantee correctness. Given some data, rational thinking can get to the facts accurately, i.e. say what "is". But, deciding what to do in the real world requires non-rational value judgments to make any "should" statements. (Or, you could not believe in free will. But most LWers don't live like that.) Additionally, huge errors are possible when reasoning beyond limited data. Many LWers seem to assume that being as rational as possible will solve all their life problems. It usually won't; instead, a better choice is to find more real-world data about outcomes for different life paths, pick a path (quickly, given the time cost of reflecting), and get on with getting things done. When making a trip by car, it's not worth spending 25% of your time planning to shave off 5% of your time driving. In other words, LW tends to conflate rationality and intelligence.

  • In particular, AI risk is overstated There are a bunch of existential threats (asteroids, nukes, pollution, unknown unknowns, etc.). It's not at all clear if general AI is a significant threat. It's also highly doubtful that the best way to address this threat is writing speculative research papers, because I have found in my work as an engineer that untested theories are usually wrong for unexpected reasons, and it's necessary to build and test prototypes in the real world. My strong suspicion is that the best way to reduce existential risk is to build (non-nanotech) self-replicating robots using existing technology and online ordering of materials, and use the surplus income generated to brute-force research problems, but I don't know enough about manufacturing automation to be sure.

  • LW has a cult-like social structure. The LW meetups (or, the ones I experienced) are very open to new people. Learning the keywords and some of the cached thoughts for the LW community results in a bunch of new friends and activities to do. However, involvement in LW pulls people away from non-LWers. One way this happens is by encouraging contempt for less-rational Normals. I imagine the rationality "training camps" do this to an even greater extent. LW recruiting (hpmor, meetup locations near major universities) appears to target socially awkward intellectuals (incl. me) who are eager for new friends and a "high-status" organization to be part of, and who may not have many existing social ties locally.

  • Many LWers are not very rational. A lot of LW is self-help. Self-help movements typically identify common problems, blame them on (X), and sell a long plan that never quite achieves (~X). For the Rationality movement, the problems (sadness! failure! future extinction!) are blamed on a Lack of Rationality, and the long plan of reading the sequences, attending meetups, etc. never achieves the impossible goal of Rationality (impossible because "is" cannot imply "should"). Rationalists tend to have strong value judgments embedded in their opinions, and they don't realize that these judgments are irrational.

  • LW membership would make me worse off. Though LW membership is an OK choice for many people needing a community (joining a service organization could be an equally good choice), for many others it is less valuable than other activities. I'm struggling to become less socially awkward, more conventionally successful, and more willing to do what I enjoy rather than what I "should" do. LW meetup attendance would work against me in all of these areas. LW members who are conventionally successful (e.g. PhD students at top-10 universities) typically became so before learning about LW, and the LW community may or may not support their continued success (e.g. may encourage them, with only genuine positive intent, to spend a lot of time studying Rationality instead of more specific skills). Ideally, LW/Rationality would help people from average or inferior backgrounds achieve more rapid success than the conventional path of being a good student, going to grad school, and gaining work experience, but LW, though well-intentioned and focused on helping its members, doesn't actually create better outcomes for them.

  • "Art of Rationality" is an oxymoron.  Art follows (subjective) aesthetic principles; rationality follows (objective) evidence.

I desperately want to know the truth, and especially want to beat aging so I can live long enough to find out what is really going on. HPMOR is outstanding (because I don't mind Harry's narcissism) and LW is is fun to read, but that's as far as I want to get involved. Unless, that is, there's someone here who has experience programming vision-guided assembly-line robots who is looking for a side project with world-optimization potential.

Communicating forecast uncertainty

5 VipulNaik 12 July 2014 09:30PM

Note: This post is part of my series of posts on forecasting, but this particular post may be of fairly limited interest to many LessWrong readers. I'm posting it here mainly for completeness. As always, I appreciate feedback.

In the course of my work looking at forecasting for MIRI, I repeatedly encountered discussions of how to communicate forecasts. In particular, a concern that emerged repeatedly was the clear communication of the uncertainty in forecasts. Nate Silver's The Signal and the Noise, in particular, focused quite a bit on the virtue of clear communication of uncertainty, in contexts as diverse as financial crises, epidemiology, weather forecasting, and climate change.

In this post, I pull together discussions from a variety of domains about the communication of uncertainty, and also included my overall impression of the findings.

Summary of overall findings

  • In cases where forecasts are made and used frequently (the most salient example being temperature and precipitation forecasts) people tend to form their own models of the uncertainty surrounding forecasts, even if you present forecasts as point estimates. The models people develop are quite similar to the correct ones, but still different in important ways.
  • In cases where forecasts are made more rarely, as with forecasting rare events, people are more likely to have simpler models that acknowledge some uncertainty but are less nuanced. In these cases, acknowledging uncertainty becomes quite important, because wrong forecasts of such events can lead to a loss of trust in the forecasting process, and can lead people to ignore correct forecasts later.
  • In some cases, there are arguments for modestly exaggerating small probabilities to overcome specific biases that people have that cause them to ignore low-probability events.
  • However, the balance of evidence suggests that forecasts should be reported as honestly as possible, and all uncertainty should be clearly acknowledged. If the forecast does not acknowledge uncertainty, people are likely to either use their own models of uncertainty, or lose faith in the forecasting process entirely if the forecast turns out to be far off from reality.

Probabilities of adverse events and the concept of the cost-loss ratio

A useful concept developed for understanding the utility of weather forecasting is the cost-loss model (Wikipedia). Consider that if a particular adverse event occurs, and we do not take precautionary measures, the loss incurred is L, whereas if we do take precautionary measures, the cost is C, regardless of whether the event occurs. An example: you're planning an outdoor party, and the adverse event in question is rain. If it rains during the event, you experience a loss of L. If you knew in advance that it would rain, you'd move the venue indoors, at a cost of C. Obviously, C < L for you to even consider the precautionary measure.

The ratio C/L is termed the cost-loss ratio and describes the probability threshold above which it makes sense to take the precautionary measure.

One way of thinking of the utility of weather forecasting, particularly in the context of forecasting adverse events (rain, snow, winds, and more extreme events) is in terms of whether people have adequate information to make correct decisions based on their cost-loss model. This would boil down to several questions:

  • Is the probability of the adverse event communicated with sufficient clarity and precision that people who need to use it can plug it into their cost-loss model?
  • Do people have a correct estimate of their cost-loss ratio (implicitly or explicitly)?

As I discussed in an earlier post, The Weather Channel has admitted to explicitly introducing wet bias into its probability-of-precipitation (PoP) forecasts. The rationale they offered could be interpreted as a claim that people overestimate their cost-loss ratio. For instance, a person may think his cost-loss ratio for precipitation is 0.2 (20%), but his actual cost-loss ratio may be 0.05 (5%). In this case, in order to make sure people still make the "correct" decision, PoP forecasts that fall between 0.05 and 0.2 would need to inflated to 0.2 or higher. Note that TWC does not introduce wet bias at higher probabilities of precipitation, arguably because (they believe) that this is well above the cost-loss ratio for most situations.

Words of estimative probability

In 1964, Sherman Kent (Wikipedia), the father of intelligence analysis, wrote an essay titled "Words of Estimative Probability" that discussed the use of words to describe probability estimates, and how different people may interpret the same word as referring to very different ranges of probability estimates. The concept of words of estimative probability (Wikipedia), along with its acronym, WEP, is now standard jargon in intelligence analysis.

Some related discussion of the use of words to convey uncertainty in estimates can be found in the part of this post where I excerpt from the paper discussing the communication of uncertainty in climate change.

Other general reading

#1: The case of weather forecasting

Weather forecasting has some features that make it stand out among other forecasting domains:

  • Forecasts are published explicitly and regularly: News channels and newspapers carry forecasts every day. Weather websites update their forecasts on at least an hourly basis, sometimes even faster, particularly if there are unusual weather developments. In the United States, The Weather Channel is dedicated to 24 X 7 weather news coverage.
  • Forecasts are targeted at and consumed by the general public: This sets weather forecasting apart from other forms of forecasting and prediction. We can think of prices in financial markets and betting markets as implicit forecasts. But they are targeted at the niche audiences that pay attention to them, not at everybody. The mode of consumption varies. Some people just get their forecasts from the weather reports in their local TV and radio channel. Some people visit the main weather websites (such as the National Weather Service, The Weather Channel, AccuWeather, or equivalent sources in other countries). Some people have weather reports emailed to them daily. As smartphones grow in popularity, weather apps are an increasingly common way for people to keep tabs on the weather. The study on communicating weather uncertainty (discussed below) found that in the United States, people in its sample audience saw weather forecasts an average of 115 times a month. Even assuming heavy selection bias in the study, people in the developed world probably encounter a weather forecast at least once a day.
  • Forecasts are used to drive decision-making: Particularly in places where weather fluctuations are significant, forecasts play an important role in event planning for individuals and organizations. At the individual level, this can include deciding whether to carry an umbrella, choosing what clothes to wear, deciding whether to wear snow boots, deciding whether conditions are suitable for driving, and many other small decisions. At the organizational level, events may be canceled or relocated based on forecasts of adverse weather. In locations with variable weather, it's considered irresponsible to plan an event without checking the weather forecast.
  • People get quick feedback on whether the forecast was accurate: The next day, people know whether what was forecast transpired.

The upshot: people are exposed to weather forecasts, pay attention to them, base decisions on them, and then come to know whether the forecast was correct. This happens on a daily basis. Therefore, they have both the incentives and the information to form their own mental model of the reliability and uncertainty in forecasts. Note also that because the reliability of forecasts varies considerably by location, people who move from one location to another may take time adjusting to the new location. (For instance, when I moved to Chicago, I didn't pay much attention to weather forecasts in the beginning, but soon learned that the high variability of the weather combined with reasonable accuracy of forecasts made then worth paying attention to. Now that I'm in Berkeley, I probably pay too much attention to the forecast relative to its value, given the stability of weather in Berkeley).

With these general thoughts in mind, let's look at the paper Communicating Uncertainty in Weather Forecasts: A Survey of the U. S. Public by Rebecca E. Morss, Julie L. Demuth, and Jeffrey K. Lazo. The paper is based on a survey of about 1500 people in the United States. The whole paper is worth a careful read if you find the issue fascinating. But for the benefits of those of you who find the issue somewhat interesting but not enough to read the paper, I include some key takeaways from the paper.

Temperature forecasts: the authors find that even though temperature forecasts are generally made as point estimates, people interpret these point estimates as temperature ranges. The temperature ranges are not even necessarily centered at the point estimates. Further, the range of temperatures increases with the forecast horizon. In other words, people (correctly) realize that forecasts made for three days later have more uncertainty attached to them than forecasts made for one day later. In other words, peoples understanding of the nature of forecast uncertainty in temperatures is correct, at least in the broad qualitative sense.

The authors believe that people arrive at these correct models through their own personal history of seeing weather forecasts and evaluating how they compare with the reality. Clearly, most people don't keep close track of how forecasts compare with the reality, but they are still able to get the general idea over several years of exposure to weather forecasts. The authors also believe that since the accuracy of weather forecasts varies by region, people's models of uncertainty may also differ by region. However, the data they collect does not allow for a test of this hypothesis. For more, read Sections 3a and 3b of the paper.

Probability-of-precipitation (PoP) forecasts: The authors also look at people's perception of probability-of-precipitation (PoP) forecasts. The correct meteorological interpretation of PoP is "the probability that precipitation occurs given these meteorological conditions." The frequentist operationalization of this would be "the fraction (situations with meteorological conditions like this where precipitation does occur)/(situations with meteorological conditions like this)." To what extent are people aware of this meaning? One of the questions in the survey elicits information on this front:

TABLE 2. Responses to Q14a, the meaning of the forecast
“There is a 60% chance of rain for tomorrow” (N 1330).
It will rain tomorrow in 60% of the region. 16% of respondents
It will rain tomorrow for 60% of the time. 10% of respondents
It will rain on 60% of the days like tomorrow.* 19% of respondents
60% of weather forecasters believe that it will rain tomorrow. 22% of respondents
I don’t know. 9% of respondents
Other (please explain). 24% of respondents
* Technically correct interpretation, according to how PoP forecasts are verified, as interpreted by Gigerenzer et al. (2005).

So about 19% of participants choose the correct meteorological interpretation. However, of the 24% who offer other explanations, many suggest that they are not so much interested in the meteorological interpretation as in how this affects their decision-making. So it might be the case that even if people aren't aware of the frequentist definition, they are still using the information approximately correctly as it applies to their lives. One such application would be a comparison with the cost-loss ratio to determine whether to engage in precautionary measures. Note that, as noted earlier in the post, it may be the case that people overestimate their own cost-loss ratio, but this is a distinct problem from incorrectly interpreting the probability.

I also found the following resources, that I haven't had the time to read through, but that might help people interested in exploring the issue in more detail (I'll add more to this list if I find more):

#2: Extreme rare events (usually weather-related) that require significant response

For some rare events (such as earthquakes) we don't know how to make specific predictions of their imminent arrival. But for others, such as hurricanes, cyclones, blizzards, tornadoes, and thunderstorms, specific probabilistic predictions can be made. Based on these predictions, significant action can be undertaken, ranging from everybody deciding to stock up on supplies and stay at home, to a mass evacuation. Such responses are quite costly, but the loss they would avert if the event did occur is even bigger. In the cost-loss framework discussed above, we are dealing with both a high cost and a loss that could be much higher. However, unlike the binary case discussed above, the loss spans more of a continuum: the amount of loss that would occur without precautionary measures depends on the intensity of the event. Similarly, the costs span a continuum: the cost depends on the extent of precautionary measures taken.

Since both the cost and loss are huge, it's quite important to get a good handle on the probability. But should the correct probability be communicated, or should it be massaged or simply converted to a "yes/no" statement? We discussed earlier the (alleged) problem of people overestimating their cost-loss ratio, and therefore not taking adequate precautionary measures, and how the Weather Channel addresses this by deliberately introducing a wet bias. But the stakes are much higher when we are talking of shutting down a city for a day or ordering a mass evacuation.

Another complication is that the rarity of the event means that people's own mental models haven't had a lot of data to calibrate the accuracy and reliability of forecasts. When it comes to temperature and precipitation forecasts, people have years of experience to rely on. They will not lose faith in a forecast based on a single occurrence. When it comes to rare events, even a few memories of incorrect forecasts, and the concomitant huge costs or huge losses, can lead people to be skeptical of the forecasts in the future. In The Signal and the Noise, Nate Silver extensively discusses the case of Hurricane Katrina and the dilemmas facing the mayor of New Orleans that led him to delay the evacuation of the city, and led many people to ignore the evacuation order even after it was announced.

A direct strike of a major hurricane on New Orleans had long been every weather forecaster’s worst nightmare. The city presented a perfect set of circumstances that might contribute to the death and destruction there. [...]

The National Hurricane Center nailed its forecast of Katrina; it anticipated a potential hit on the city almost five days before the levees were breached, and concluded that some version of the nightmare scenario was probable more than forty-eight hours away . Twenty or thirty years ago, this much advance warning would almost certainly not have been possible, and fewer people would have been evacuated. The Hurricane Center’s forecast, and the steady advances made in weather forecasting over the past few decades, undoubtedly saved many lives.

Not everyone listened to the forecast, however. About 80,000 New Orleanians —almost a fifth of the city’s population at the time— failed to evacuate the city, and 1,600 of them died. Surveys of the survivors found that about two-thirds of them did not think the storm would be as bad as it was. Others had been confused by a bungled evacuation order; the city’s mayor, Ray Nagin, waited almost twenty-four hours to call for a mandatory evacuation, despite pleas from Mayfield and from other public officials. Still other residents— impoverished, elderly, or disconnected from the news— could not have fled even if they had wanted to.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 109-110). Penguin Group US. Kindle Edition.

So what went wrong? Silver returns to this later in the chapter:

As Max Mayfield told Congress, he had been prepared for a storm like Katrina to hit New Orleans for most of his sixty-year life. Mayfield grew up around severe weather— in Oklahoma, the heart of Tornado Alley— and began his forecasting career in the Air Force, where people took risk very seriously and drew up battle plans to prepare for it. What took him longer to learn was how difficult it would be for the National Hurricane Center to communicate its forecasts to the general public.

“After Hurricane Hugo in 1989,” Mayfield recalled in his Oklahoma drawl, “I was talking to a behavioral scientist from Florida State. He said people don’t respond to hurricane warnings. And I was insulted. Of course they do. But I have learned that he is absolutely right. People don’t respond just to the phrase ‘hurricane warning.’ People respond to what they hear from local officials. You don’t want the forecaster or the TV anchor making decisions on when to open shelters or when to reverse lanes.”

Under Mayfield’s guidance, the National Hurricane Center began to pay much more attention to how it presented its forecasts. It contrast to most government agencies, whose Web sites look as though they haven’t been updated since the days when you got those free AOL CDs in the mail, the Hurricane Center takes great care in the design of its products, producing a series of colorful and attractive charts that convey information intuitively and accurately on everything from wind speed to storm surge.

The Hurricane Center also takes care in how it presents the uncertainty in its forecasts. “Uncertainty is the fundamental component of weather prediction,” Mayfield said. “No forecast is complete without some description of that uncertainty.” Instead of just showing a single track line for a hurricane’s predicted path, for instance, their charts prominently feature a cone of uncertainty—“ some people call it a cone of chaos,” Mayfield said. This shows the range of places where the eye of the hurricane is most likely to make landfall. Mayfield worries that even this isn’t enough. Significant impacts like flash floods (which are often more deadly than the storm itself) can occur far from the center of the storm and long after peak wind speeds have died down. No people in New York City died from Hurricane Irene in 2011 despite massive media hype surrounding the storm, but three people did from flooding in landlocked Vermont once the TV cameras were turned off.


Mayfield told Nagin that he needed to issue a mandatory evacuation order, and to do so as soon as possible.

Nagin dallied, issuing a voluntary evacuation order instead. In the Big Easy, that was code for “take it easy”; only a mandatory evacuation order would convey the full force of the threat. Most New Orleanians had not been alive when the last catastrophic storm, Hurricane Betsy, had hit the city in 1965. And those who had been, by definition, had survived it. “If I survived Hurricane Betsy, I can survive that one, too. We all ride the hurricanes, you know,” an elderly resident who stayed in the city later told public officials. Reponses like these were typical. Studies from Katrina and other storms have found that having survived a hurricane makes one less likely to evacuate the next time one comes. 

The reasons for Nagin’s delay in issuing the evacuation order is a matter of some dispute— he may have been concerned that hotel owners might sue the city if their business was disrupted. Either way, he did not call for a mandatory evacuation until Sunday at 11 A.M. —and by that point the residents who had not gotten the message yet were thoroughly confused . One study found that about a third of residents who declined to evacuate the city had not heard the evacuation order at all. Another third heard it but said it did not give clear instructions. Surveys of disaster victims are not always reliable— it is difficult for people to articulate why they behaved the way they did under significant emotional strain, and a small percentage of the population will say they never heard an evacuation order even when it is issued early and often. But in this case, Nagin was responsible for much of the confusion.

There is, of course, plenty of blame to go around for Katrina— certainly to FEMA in addition to Nagin. There is also credit to apportion— most people did evacuate, in part because of the Hurricane Center’s accurate forecast. Had Betsy topped the levees in 1965, before reliable hurricane forecasts were possible, the death toll would probably have been even greater than it was in Katrina. One lesson from Katrina, however, is that accuracy is the best policy for a forecaster. It is forecasting’s original sin to put politics, personal glory, or economic benefit before the truth of the forecast. Sometimes it is done with good intentions, but it always makes the forecast worse. The Hurricane Center works as hard as it can to avoid letting these things compromise its forecasts. It may not be a concidence that, in contrast to all the forecasting failures in this book, theirs have become 350 percent more accurate in the past twenty-five years alone.

“The role of a forecaster is to produce the best forecast possible,” Mayfield says. It’s so simple— and yet forecasters in so many fields routinely get it wrong.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 138-141). Penguin Group US. Kindle Edition. 

Silver notes similar failures of communication of forecast uncertainty in other domains, including exaggeration of the 1976 swine flu outbreak.

I also found a few related papers that may be worth reading if you're interested in understanding the communication of weather-related rare event forecasts:

#3: Long-run changes that might necessitate policy responses or long-term mitigation or adaptation strategies, such as climate change

In marked contrast to daily weather forecasting as well as extreme rare event forecasting is the forecasting of gradual long-term structural changes. Examples include climate change, economic growth, changes in the size and composition of the population, and technological progress. Here, the general recommendation is clear and detailed communication of uncertainty using multiple formats, with the format tailored to the types of decisions that will be based on the information.

On the subject of communicating uncertainty in climate change, I found the paper Communicating uncertainty: lessons learned and suggestions for climate change assessment by Anthony Patt and Suraje Dessai. The paper is quite interesting (and has been referenced by some of the other papers mentioned in this post).

The paper identifies three general sources of uncertainty:

  • Epistemic uncertainty arises from incomplete knowledge of processes that influence events.
  • Natural stochastic uncertainty refers to the chaotic nature of the underlying system (in this case, the climate system).
  • Human reflexive uncertainty refers to uncertainty in human activity that could affect the system. Some of the activity may be undertaken specifically in response to the forecast.

This is somewhat similar to, but not directly mappable to, the classification of sources of uncertainty by Gavin Schmidt from NASA that I discussed in my post on weather and climate forecasting:

  • Initial condition uncertainty: This form of uncertainty dominates short-term weather forecasts (though not necessarily the very short term weather forecasts; it seems to matter the most for intervals where numerical weather prediction gets too uncertain but long-run equilibrating factors haven't kicked in). Over timescales of several years, this form of uncertainty is not influential.
  • Scenario uncertainty: This is uncertainty that arises from lack of knowledge of how some variable (such as carbon dioxide levels in the atmosphere, or levels of solar radiation, or aerosol levels in the atmosphere, or land use patterns) will change over time. Scenario uncertainty rises over time, i.e., scenario uncertainty plagues long-run climate forecasts far more than it plagues short-run climate forecasts.
  • Structural uncertainty: This is uncertainty that is inherent to the climate models themselves. Structural uncertainty is problematic at all time scales to a roughly similar degree (some forms of structural uncertainty affect the short run more whereas some affect the long run more).

Section 2 of the paper has a general discussion of interpreting and communicating probabilities. One of the general points made is that the more extreme the event, the lower people's mental probability threshold for verbal descriptions of likelihood. For instance, for a serious disease, the probability threshold for "very likely" may be 30%, whereas for a minor ailment, it may be 90% (these numbers are my own, not from the paper). The authors also discuss the distinction between frequentist and Bayesian approaches and claim that the frequentist approach is better suited to assimilating multiple pieces of information, and therefore, frequentist framings should be preferred to Bayesian framings when communicating uncertainty:

As should already be evident, whether the task of estimating and responding to uncertainty is framed in stochastic (usually frequentist) or epistemic (often Bayesian) terms can strongly influence which heuristics people use, and likewise lead to different choice outcomes [23]. Framing in frequentist terms on the one hand promotes the availability heuristic, and on the other hand promotes the simple acts of multiplying, dividing, and counting. Framing in Bayesian terms, by contrast, promotes the representativeness heuristic, which is not well adapted to combining multiple pieces of information. In one experiment, people were given the problem of estimating the chances that a person has a rare disease, given a positive result from a test that sometimes generates false positives. When people were given the problem framed in terms of a single patient receiving the diagnostic test, and the base probabilities of the disease (e.g., 0.001) and the reliability of the test (e.g., 0.95), they significantly over-estimate the chances that the person has the disease (e.g., saying there is a 95% chance). But when people were given the same problem framed in terms of one thousand patients being tested, and the same probabilities for the disease and the test reliability, they resorted to counting patients, and typically arrived at the correct answer (in this case, about 2%). It has, indeed, been speculated that the gross errors at probability estimation, and indeed errors of logic, observed in the literature take place primarily when people are operating within the Bayesian probability framework, and that these disappear when people evaluate problems in frequentist terms [23,58].

The authors offer the following suggestions in the discussion section (Section 4) of their paper:

The challenge of communicating probabilistic information so that it will be used, and used appropriately, by decision-makers has been long recognized. [...] In some cases, the heuristics that people use are not well suited to the particular problem that they are solving or decision that they are making; this is especially likely for types of problems outside their normal experience. In such cases, the onus is on the communicators of the probabilistic information to help people find better ways of using the information, in such a manner that respects the users’ autonomy, full set of concerns and goals, and cognitive perspective.

That these difficulties appear to be most pronounced when dealing with predictions of one-time events, where the probability estimates result from a lack of complete confidence in the predictive models. When people speak about such epistemic or structural uncertainty, they are far more likely to shun quantitative descriptions, and are far less likely to combine separate pieces of information in ways that are mathematically correct. Moreover, people perceive decisions that involve structural uncertainty as riskier, and will take decisions that are more risk averse. By contrast, when uncertainty results from well-understood stochastic processes, for which the probability estimate results from counting of relative frequencies, people are more likely to work effectively with multiple pieces of information, and to take decisions that are more risk neutral.

In many ways, the most recent approach of the IPCC WGI responds to these issues. Most of the uncertainties with respect to climate change science are in fact epistemic or structural, and the probability estimates of experts reflect degrees of confidence in the occurrence of one-time events, rather than measurement of relative frequencies in relevant data sets. Using probability language, rather than numerical ranges, matches people’s cognitive framework, and will likely make the information both easier to understand, and more likely to be used. Moreover, defining the words in terms of specific numerical ranges ensures consistency within the report, and does allow comparison of multiple events, for which the uncertainty may derive from different sources.

We have already mentioned the importance of target audiences in communicating uncertainties, but this cannot be emphasized enough. The IPCC reports have a wide readership so a pluralistic approach is necessary. For example, because of its degree of sophistication, the water chapter could communicate uncertainties using numbers, whereas the regional chapters might use words and the adaptive capacity chapter could use narratives. “Careful design of communication and reporting should be done in order to avoid information divide, misunderstandings, and misinterpretations. The communication of uncertainty should be understandable by the audience. There should be clear guidelines to facilitate clear and consistent use of terms provided. Values should be made explicit in the reporting process” [32].

However, by writing the assessment in terms of people’s intuitive framework, the IPCC authors need to understand that this intuitive framework carries with it several predictable biases. [...]

The literature suggests, and the two experiments discussed here further confirm, that the approach of the IPCC leaves room for improvement. Further, as the literature suggests, there is no single solution for these potential problems, but there are communication practices that could help. [...]

Finally, the use of probability language, instead of numbers, addresses only some of the challenges in uncertainty communication that have been identified in the modern decision support literature. Most importantly, it is important in the communication process to address how the information can and should be used, using heuristics that are appropriate for the particular decisions. [...] Obviously, there are limits to the length of the report, but within the balancing act of conciseness and clarity, greater attention to full dimensions of uncertainty could likely increase the chances that users will decide to take action on the basis of the new information.


1 CyrilDan 12 July 2014 06:34AM

Hi, everyone.

I just started reading Total Freedom by Chris Sciabarra (warning: politics book), and a good half of it seems to be about 'dialectics' as a thinking tool, but it's been total rubbish in trying to explain it. From poking around on the internet, it seems to have been a proto-systems theory that became a Marxist shibboleth.

Am I understanding that correctly? The LW survey says about 1 in 4 of us is a communist, so I'm hoping someone can point to me resources or something. Also, I've read through most of the sequences, and it didn't use the word dialectics in there at all, which seems strange if it's such a useful thinking tool. Is there something wrong with it as an epistemological practice? Is the word just outdated?

Sorry about the (tangentially) political post, I'm just kind of confused. Help?

Forecasting rare events

5 VipulNaik 11 July 2014 10:48PM

In an earlier post, I looked at some general domains of forecasting. This post looks at some more specific classes of forecasting, some of which overlap with the general domains, and some of which are more isolated. The common thread to these classes of forecasting is that they involve rare events.

Different types of forecasting for rare events

When it comes to rare events, there are three different classes of forecasts:

  1. Point-in-time-independent probabilistic forecasts: Forecasts that provide a probability estimate for the event occurring in a given timeframe, but with no distinction based on the point in time. In other words, the forecast may say "there is a 5% chance of an earthquake higher than 7 on the Richter scale in this geographical region in a year" but the forecast is not sensitive to the choice of year. These are sufficient to inform decisions on general preparedness. In the case of earthquakes, for instance, the amount of care to be taken in building structures can be determined based on these forecasts. On the other hand, it's useless for deciding the timing of specific activities.
  2. Point-in-time-dependent probabilistic forecasts: Forecasts that provide a probability estimate that varies somewhat over time based on history, but aren't precise enough for a remedial measure that substantially offsets major losses. For instance, if I know that an earthquake will occur in San Francisco in the next 6 months with probability 90%, it's still not actionable enough for a mass evacuation of San Francisco. But some preparatory measures may be undertaken.
  3. Predictions made with high confidence (i.e., a high estimated probability when the event is predicted) and a specific time, location, and characteristics: Precise predictions of date and time, sufficient for remedial measures that substantially offset major losses (but possibly at huge, if much smaller, cost). The situation with hurricanes, tornadoes, and blizzards is roughly in this category.

Statistical distributions: normal distributions versus power law distributions

Perhaps the most ubiquitous distribution used in probability and statistics is the normal distribution. The normal distribution is a symmetric distribution whose probability density function decays superexponentially with distance from the mean (more precisely, it is exponential decay in the square of the distance). In other words, the probability decays slowly at the beginning, and faster later. Thus, for instance, the ratio of pdfs for 2 standard deviations from the mean and 1 standard deviation from the mean is greater than the ratio of pdfs for 3 standard deviations from the mean and 2 standard deviations from the mean. To give explicit numbers: about 68.2% of the distribution lies between -1 and +1 SD, 95.4% lies between -2 and +2 SD, 99.7% lies between -3 and +3 SD, and 99.99% lies between -4 and +4 SD. So the probability of being more than 4 standard deviations is less than 1 in 10000.

If the probability distribution for intensity looks (roughly) like a normal distribution, then high-intensity events are extremely unlikely. So, if the probability distribution for intensity is normal, we do not have to worry about high-intensity events much.

The types of situations where rare event forecasting becomes more important is where events that are high-intensity, or "extreme" in some sense, occur rarely but not as rarely as in a normal distribution. We say that the tails of such distributions are thicker than those of the normal distribution, and the distributions are termed "thick-tailed" or "fat-tailed" distributions. [Formally, the thickness of tails is measured using a quantity called excess kurtosis, which sees how the fourth central moment compares with the square of the second central moment (the second central moment is the variance, and it is the square of the standard deviation), then subtracts off the number 3, which is the corresponding value for the normal distribution. If the excess kurtosis for a distribution is positive, it is a thick-tailed distribution.]

The most common example of such distributions that is of interest to us is power law distributions. Here, the probability is proportional to a negative power. So the decay is like a power. If you remember some basic precalculus/calculus, you'll recall that power functions (such as the square function or cube function) grow more slowly than exponential functions. So power law distributions decay more subexponentially: they decay more slowly than exponential decay (to be more precise, the decay starts off as fast, then slows down). As noted above, the pdf for the normal distribution decays exponentially in the square of the distance from the mean, so the upshot is that power law distributions decay more slowly than normal distributions.

For most of the rare event classes we discuss, to the extent that it has been possible to pin down a distribution, it has looked a lot more like a power law distribution than a normal distribution. Thus, rare events need to be heeded. (There's obviously a selection effect here: for those cases where the distributions are close to normal, forecasting rare events just isn't that challenging, so they wouldn't be included in my post).

UPDATE: Aaron Clauset, who appears in #4, pointed me (via email) to his Rare Events page, containing the code (Matlab and Python) that he used in his terrorism statistics paper mentioned as an update at the bottom of #4. He noted in the email that the statistical methods are fairly general, so interested people could use the code if they were interested in cross-applying to rare events in other domains.


One of the more famous advocates of the idea that people overestimate the ubiquity of normal distributions and underestimate the prevalence of power law distributions is Nassim Nicholas Taleb. Taleb calls the world of normal distributions Mediocristan (the world of mediocrity, where things are mostly ordinary and weird things are very very rare) and the world of power law distributions Extremistan (the world of extremes, where rare and weird events are more common). Taleb has elaborated on this thesis in his book The Black Swan, though some parts of the idea are also found in his earlier book Fooled by Randomness.

I'm aware that a lot of people swear by Taleb, but I personally don't find his writing very impressive. He does cover a lot of important ideas but they didn't originate with him, and he goes off on a lot of tangents. In contrast, I found Nate Silver's The Signal and the Noise a pretty good read, and although it wasn't focused on rare events per se, the parts of it that did discuss such forecasting were used by me in this post.

(Sidenote: My criticism of Taleb is broadly similar to that offered by Jamie Whyte here in Standpoint Magazine. Also, here's a review by Steve Sailer of Taleb. Sailer is much more favorably inclined to the normal distribution than Taleb is, and this is probably related to his desire to promote IQ distributions/The Bell Curve type ideas, but I think many of Sailer's criticisms are spot on).

Examples of rare event classes that we discuss in this post

The classes discussed in this post include:

  1. Earthquakes: Category #1, also, hypothesized to follows a power law distribution.
  2. Volcanoes: Category #2.
  3. Extreme weather events (hurricanes/cyclones, tornadoes, blizzards): Category #3.
  4. Major terrorist acts: Questionable, at least Category #1, some argue it is Category #2 or Category #3. Hypothesized to follow a power law distribution.
  5. Power outages (could be caused by any of 1-4, typically 3)
  6. Server outages (could be caused by 5)
  7. Financial crises
  8. Global pandemics, such as the 1918 flu pandemic (popularly called the "Spanish flu") that, according to Wikipedia, "infected 500 million people across the world, including remote Pacific islands and the Arctic, and killed 50 to 100 million of them—three to five percent of the world's population." They probably fall under Category #2, but I couldn't get a clear picture. (Pandemics were not in the list at the time of original publication of the post; I added them based on a comment suggestion).
  9. Near-earth object impacts (not in the list at the time of original publication of the post; I added them based on a comment suggestion).

Other examples of rare events would also be appreciated.

#1: Earthquakes

Earthquake prediction remains mostly in category 1: there are probability estimates of the occurrence of earthquakes of a given severity or higher within a given timeframe, but these estimates do not distinguish between different points in time. In The Signal and the Noise, statistician and forecasting expert Nate Silver talks to Susan Hough (Wikipedia) of the United States Geological Survey and describes what she has to say about the current state of earthquake forecasting:

What seismologists are really interested in— what Susan Hough calls the “Holy Grail” of seismology— are time-dependent forecasts, those in which the probability of an earthquake is not assumed to be constant across time.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (p. 154). Penguin Group US. Kindle Edition.

The whole Silver chapter is worth reading, as is the Wikipedia page on earthquake prediction, which covers much of the same ground.

In fact, even for the time-independent earthquake forecasting, currently the best known forecasting method is the extremely simple Gutenberg-Richter law, which says that for a given location, the frequency of earthquakes obeys a power law with respect to intensity. Since the Richter scale is logarithmic (to base 10), this means that adding a point on the Richter scale makes the frequency of earthquakes decrease to a fraction of the previous value. Note that the Gutenberg-Richter law can't be the full story: there are probably absolute limits on the intensity of the earthquake (some people believe that an earthquake of intensity 10 or higher is impossible). But so far, it seems to have the best track record.

Why haven't we been able to come up with better models? This relates to the problem of overfitting common in machine learning and statistics: when the number of data points is very small, and quite noisy, then trying a more complicated law (with more freely varying parameters) ends up fitting the noise in the data rather than the signal, and therefore ends up being a poor fit for new, out-of-sample data. The problem is dealt with in statistics using various goodness of fit tests and measures such as the Akaike information criterion, and it's dealt with in machine learning using a range of techniques such as cross-validation, regularization, and early stopping. These approaches can generally work well in situations where there is lots of data and lots of parameters. But in cases where there is very little data, it often makes sense to just manually select a simple model. The Gutenberg-Richter law has two parameters, and can be fit using a simple linear regression. There isn't enough information to reliably fit even modestly more complicated models, such as the characteristic earthquake models, and past attempts based on characteristic earthquakes failed in both directions (a predicted earthquake at Parkfield never materialized, and the probability of the 2011 Japan earthquake was underestimated by the model relative to the Gutenberg-Richter law).

Silver's chapter and other sources do describe some possibilities for short-term forecasting based on foreshocks and aftershocks, and seismic disturbances, but note considerable uncertainty.

The existence of time-independent forecasts for earthquakes has probably had major humanitarian benefits. Building codes and standards, in particular, can adapt to the probability of earthquakes. For instance, building standards are greater in the San Francisco Bay Area than in other parts of the United States, partly because of the greater probability of earthquakes. Note also that Gutenberg-Richter does make out-of-sample predictions: it can use the frequency of low-intensity earthquakes to predict the frequency of high-intensity earthquakes, and therefore obtain a time-independent forecast of such an earthquake in a region that may never have experienced it.

#2: Volcanic eruptions

Volcanoes are an easier case than earthquakes. Silver's book doesn't discuss them, but the Wikipedia article offers basic information. A few points:

  • Volcanic activity falls close to category #2: time-dependent forecasts can be made, albeit with considerable uncertainty.
  • Volcanic activity poses less immediate risk because fewer people live close to the regions where volcanoes typically erupt.
  • However, volcanic activity can affect regional and global climate for a few years (in the cooling direction), and might even shift the intercept of other long-term secular and cyclic trends in climate (the reason is that the dust particles released by volcanoes into the atmosphere reduce the extent to which solar radiation is absorbed). For instance, the 1991 Mount Pinatubo eruption is credited with causing the next 1-2 years to be cooler than they otherwise would be, masking the heating effect of a strong El Nino.

#3:  Extreme weather events (lightning, hurricanes/cyclones, blizzards, tornadoes)

Forecasting for lightning and thunderstorms has improved quite a bit over the last century, and falls squarely within Category #3. In The Signal and the Noise, Nate Silver notes that the probability of an American dying from lightning has dropped from 1 in 400,000 in 1940 to 1 in 11,000,000 today, and a large part of the credit goes to better weather forecasting causing people to avoid the outdoors at the times and places that lightning might strike.

Forecasting for hurricanes and cyclones (which are the same weather phenomenon, just at different latitudes) is quite good, and getting better. It falls squarely in category #3: in addition to having general probability estimates of the likelihood of particular types of extreme weather events, we can forecast them a day or a few days in advance, allowing for preparation and minimization of negative impact.

The precision for forecasting the eye of the storm has increased about 3.5-fold in length terms (so about 12-fold in area terms) over the last 25 years. Nate Silver notes that 25 years ago, the National Hurricane Center's forecasts for where a hurricane would hit on landfall, made three days in advance, were 350 miles off on average. Now they're about 100 miles off on average. Most of the major hurricanes that hit the United States, and many other parts of the world, were forecast well in advance, and people even made preparations (for instance, by declaring holidays, or stocking up on goods). Blizzard forecasting is also fairly impressive: I was at Chicago in 2011 when a blizzard hit, and it had been forecast at least a day in advance. With tornadoes, tornado warning alerts are often issued, albeit the tornado often doesn't actually touch down even after the alert is issued (fortunately for us).

See also my posts on weather forecasting and climate forecasting.

#4: Major terrorist acts

Terrorist attacks are interesting. It has been claimed that the frequency-damage relationship for terrorist attacks follows a power law. The academic paper that popularized this observation is a paper by Aaron Clauset, Maxwell Young and Kristian Gleditsch titled "On the Frequency of Severe Terrorist Attacks" (Journal of Conflict Resolution 51(1), 58 - 88 (2007)), here. Bruce Schneier wrote a blog post about a later paper by Clauset and Frederick W. Wiegel, and see also more discussion here, here, here, and here (I didn't select these links through a very discerning process; I just picked the top results of a Google Search).

Silver's book does allude to power laws for terrorism, but I couldn't find any reference to Clauset in his book (oops, seems like my Kindle search was buggy!) and says the following about Clauset:

Clauset’s insight, however, is actually quite simple— or at least it seems that way with the benefit of hindsight. What his work found is that the mathematics of terrorism resemble those of another domain discussed in this book: earthquakes.

Imagine that you live in a seismically active area like California. Over a period of a couple of decades, you experience magnitude 4 earthquakes on a regular basis, magnitude 5 earthquakes perhaps a few times a year, and a handful of magnitude 6s. If you have a house that can withstand a magnitude 6 earthquake but not a magnitude 7, would it be right to conclude that you have nothing to worry about?

Of course not. According to the power-law distribution that these earthquakes obey, those magnitude 5s and magnitude 6s would have been a sign that larger earthquakes were possible—inevitable, in fact, given enough time. The big one is coming, eventually. You ought to have been prepared.

Terror attacks behave in something of the same way. The Lockerbie bombing and Oklahoma City were the equivalent of magnitude 7 earthquakes. While destructive enough on their own, they also implied the potential for something much worse— something like the September 11 attacks, which might be thought of as a magnitude 8. It was not an outlier but instead part of the broader mathematical pattern.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 427-428). Penguin Group US. Kindle Edition.

So terrorist attacks are at least in category 1. What about categories 2 and 3? Can we forecast terrorist attacks the way we can forecast volcanoes, or the way we can forecast hurricanes. One difference between terrorist acts and the "acts of God" discussed so far is that to the extent one has inside information about a terrorist attack that's good enough to predict it with high accuracy, it's usually also sufficient to actually prevent the terrorist attack. So Category 3 becomes trickier to define. Should we count the numerous foiled terrorist plots as evidence that terrorist acts can be successfully "predicted" or should we only consider successful terrorist acts in the denominator? And another complication is that terrorist acts are responsive to geopolitical decisions in ways that earthquakes are definitely not, with extreme weather events falling somewhere in between.

As for Category 2, the evidence is unclear, but it's highly likely that terrorist acts can be forecast in a time-dependent fashion to quite a degree. If you want to crunch the numbers yourself, the Global Terrorism Database (website, Wikipedia) and Suicide Attack Database (website, Wikipedia) are available for you to use. I discussed some general issues with political and conflict forecasting in my earlier post on the subject.

UPDATE: Clauset emailed me with some corrections to this section of the post, which I have made. He also pointing to a recent paper he co-wrote with Ryan Woodward about estimating the historical and future probabilities of terror events, available on the ArXiV. Here's the abstract:

Quantities with right-skewed distributions are ubiquitous in complex social systems, including political conflict, economics and social networks, and these systems sometimes produce extremely large events. For instance, the 9/11 terrorist events produced nearly 3000 fatalities, nearly six times more than the next largest event. But, was this enormous loss of life statistically unlikely given modern terrorism's historical record? Accurately estimating the probability of such an event is complicated by the large fluctuations in the empirical distribution's upper tail. We present a generic statistical algorithm for making such estimates, which combines semi-parametric models of tail behavior and a nonparametric bootstrap. Applied to a global database of terrorist events, we estimate the worldwide historical probability of observing at least one 9/11-sized or larger event since 1968 to be 11-35%. These results are robust to conditioning on global variations in economic development, domestic versus international events, the type of weapon used and a truncated history that stops at 1998. We then use this procedure to make a data-driven statistical forecast of at least one similar event over the next decade.

#5: Power outages

Power outages could have many causes. Note that insofar as we can forecast the phenomena underlying the causes, this can be used to reduce, rather than simply forecast, power outages.

  • Poor load forecasting, i.e., electricity companies don't forecast how much demand there will be and don't prepare supplies adequately. This is less of an issue in developed countries, where the power systems are more redundant (at some cost to efficiency): Note here that the power outage occurs due to a failure of a more mundane forecasting exercise. Forecasting the frequency of power outages due to this cause is basically an exercise in calibrating the quality of the mundane forecasting exercise.
  • Abrupt or significant shortages in fuel, often for geopolitical reasons. This therefore ties in with the general exercise of geopolitical forecasting (see my earlier post on the subject). This seems rare in the modern world, due to the considerably redundancy built into global fuel supplies.
  • Disruption of power lines or power supply units due to weather events. The most common causes appear to be lightning, ice, wind, rain, and flooding. This ties in with #3, and with my weather forecasting and climate forecasting posts. This is the most common cause of power outages in developed countries with advanced electricity grids (see, for instance, here and here).
  • Disruption by human or animal activity, including car accidents and animals climbing onto and playing with the power lines.
  • Perhaps the most niche source of power outages, that many people may be unaware of, is geomagnetic storms (Wikipedia). These are quite rare, but can result in major power blackouts. Geomagnetic storms were discussed in past MIRI posts (here and here). Geomagnetic storms are low-frequency and low-probability events but with potentially severe negative impact.

My impression is that when it comes to power outages, we are at Category 2 in forecasting. Load forecasting can identify seasons, times of the day, and special occasions when power demand will be high. Note that the infrastructure needs to built for peak capacity.

We can't quite be in Category 3,  because in cases where we can forecast more finely, we could probably prevent the outage anyway.

What sort of preventive measures do people undertake with knowledge of the frequency of power outages? In places where power outages are more likely, people are more likely to have backup generators. People may be more likely to use battery-powered devices. If you know that a power outage is likely to happen in the next few days, you might take more care to charge the batteries on your devices.

#6: Server outages

In our increasingly connected world, websites going down can have a huge effect on the functioning of the Internet and of the world economy. As with power infrastructure, the complexity of server infrastructure needed to increase uptime increases very quickly. The point is that routing around failures at different points in the infrastructure requires redundancy. For instance, if any one server fails 10% of the time, and the failures of different components are independent, you'd need two servers to get to a 1% failure rate. But in practice, the failures aren't independent. For instance, having loads of servers in a single datacenter covers the risk of any given server there crashing, but it doesn't cover the risk of the datacenter itself getting disconnected (e.g., losing electricity, or getting disconnected from the Internet, or catching fire). So now we need multiple datacenters. But multiple datacenters are far from each other, so that increases the time costs of synchronization. And so on. For more detailed discussions of the issues, see here and here.

My impression is that server outages are largely Category 1: we can use the probability of outages to determine the trade-off between the cost of having redundant infrastructure and the benefit of more uptime. There is an element of Category 2: in some cases, we have knowledge that traffic will be higher at specific times, and additional infrastructure can be brought to bear for those times. As with power infrastructure, server infrastructure needs to be built to handle peak capacity.

#7: Financial crises

The forecasting of financial crises is a topic worthy of its own post. As with climate science, financial crisis forecasting has the potential for heavy politicization, given the huge stakes both of forecasting financial crises and of any remedial or preventative measures that may be undertaken. In fact, the politicization and ideology problem is probably substantially worse in financial crisis forecasting. At the same time, real-world feedback occurs faster, providing more opportunity for people to update their beliefs and less scope for people getting away with sloppiness because their predictions take too long to evaluate.

A literally taken strong efficient market hypothesis (EMH) (Wikipedia) would suggest that financial crises are almost impossible to forecast, while a weaker reading of the EMH would suggest that the financial market is efficient (Wikipedia): it's hard to make money off the business of forecasting financial crises (for instance, you may know that a financial crisis is imminent with high probability, but the element of uncertainty, particularly with regards to timing, can destroy your ability to leverage that information to make money). On the other hand, there are a lot of people, often subscribed to competing schools of economic thought, who successfully forecast the 2007-08 financial crisis, at least in broad strokes.

Note that there are people who reject the EMH, yet claim that financial crises are very hard to forecast in a time-dependent fashion. Among them is Nassim Nicholas Taleb, as described here. Interestingly, Taleb's claim to fame appears to have been that he was able to forecast the 2007-08 financial crisis, albeit it was more of a time-independent forecast than a specific timed call. The irony was noted by by Jamie Whyte here in Standpoint Magazine.

I found a few sources of information for financial crises, that are discussed below.

Economic Predictions records predictions made by many prominent people and how they compared to what transpired.  In particular, this page on their website notes how many of the top investors, economists, and bureaucrats missed the financial crisis, but also identifies some exceptions: Dean Baker, Med Jones, Peter Schiff, and Nouriel Roubini. The page also discusses other candidates who claim to have forecasted the crisis in advance, and reasons why they were not included. While I think they've put in a fair deal of effort into their project, I didn't see good evidence that they have a strong grasp of the underlying fundamental issues they are discussing.

An insightful general overview of the financial crisis is found in Chapter 1 of Nate Silver's The Signal and the Noise, a book that I recommend you read in its entirety. Silver notes four levels of forecasting failure.


  • The housing bubble can be thought of as a poor prediction. Homeowners and investors thought that rising prices implied that home values would continue to rise, when in fact history suggested this made them prone to decline.
  • There was a failure on the part of the ratings agencies, as well as by banks like Lehman Brothers, to understand how risky mortgage-backed securities were. Contrary to the assertions they made before Congress, the problem was not that the ratings agencies failed to see the housing bubble. Instead, their forecasting models were full of faulty assumptions and false confidence about the risk that a collapse in housing prices might present.
  • There was a widespread failure to anticipate how a housing crisis could trigger a global financial crisis. It had resulted from the high degree of leverage in the market, with $ 50 in side bets staked on every $ 1 that an American was willing to invest in a new home.
  • Finally, in the immediate aftermath of the financial crisis, there was a failure to predict the scope of the economic problems that it might create. Economists and policy makers did not heed Reinhart and Rogoff’s finding that financial crises typically produce very deep and long-lasting recessions.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (pp. 42-43). Penguin Group US. Kindle Edition.


Silver finds a common thread among all the failures (emphases in original):

There is a common thread among these failures of prediction. In each case, as people evaluated the data, they ignored a key piece of context:

  • The confidence that homeowners had about housing prices may have stemmed from the fact that there had not been a substantial decline in U.S. housing prices in the recent past. However, there had never before been such a widespread increase in U.S. housing prices like the one that preceded the collapse.
  • The confidence that the banks had in Moody’s and S& P’s ability to rate mortgage-backed securities may have been based on the fact that the agencies had generally performed competently in rating other types of financial assets. However, the ratings agencies had never before rated securities as novel and complex as credit default options.
  • The confidence that economists had in the ability of the financial system to withstand a housing crisis may have arisen because housing price fluctuations had generally not had large effects on the financial system in the past. However, the financial system had probably never been so highly leveraged, and it had certainly never made so many side bets on housing before.
  • The confidence that policy makers had in the ability of the economy to recuperate quickly from the financial crisis may have come from their experience of recent recessions, most of which had been associated with rapid, “V-shaped” recoveries. However, those recessions had not been associated with financial crises, and financial crises are different.

There is a technical term for this type of problem: the events these forecasters were considering were out of sample. When there is a major failure of prediction, this problem usually has its fingerprints all over the crime scene.

Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (p. 43). Penguin Group US. Kindle Edition.

While I find Silver's analysis plausible and generally convincing, I don't think I have enough of an inside-view understanding of the issue.

A few other resources that I found, but didn't get a chance to investigate, are listed below:

#8: Pandemics

I haven't investigated this thoroughly, but here are a few of my impressions and findings:

  • I think that pandemics stand in relation to ordinary epidemiology in the same way that extreme weather events stand in relation to ordinary weather forecasting. In both cases, the main way we can get better at forecasting the rare and high-impact events is by getting better across the board. There is a difference that makes the relation between moderate disease outbreaks and pandemics even more important than the corresponding case for weather: measures taken quickly to react to local disease outbreaks can help prevent global pandemics.
  • Chapter 7 of Nate Silver's The Signal and the Noise, titled "Role Models", discusses forecasting and prediction in the domain of epidemiology. The goal of epidemiologists is to obtain predictive models that have a level of accuracy and precision similar to those used for the weather. However, the greater complexity of human behavior, as well as the self-fulfilling and self-canceling nature of various predictions, makes the modeling problem harder. Silver notes that agent-based modeling (Wikipedia) is one of the commonly used tools. Silver cites a few examples from recent history where people were overly alarmed about possible pandemics, when the reality turned out to be considerably milder. However, the precautions taken due to the alarm may still have saved lives. Silver talks in particular of the 1976 swine flu outbreak (where the reaction turned out be grossly disproportional to the problem, and caused its own unintended consequences) and the 2009 flu pandemic.
  • In recent years, Google Flu Trends (website, Wikipedia) has been a common technique in identifying and taking quick action against the flu. Essentially, Google uses the volume of web search for flu-related terms by geographic location to identify the incidence of the flu by geographic location. It offers an early "leading indicator" of flu incidence compared to official reports, that are published after a time lag. However, Google Flu Trends has run into problems of reliability: news stories about the flu might prompt people to search for flu-related terms, even if they aren't experiencing symptoms of the flu. Or it may even be the case that Google's own helpful search query completions get people to search for flu-related terms once other people start searching for the term. Tim Harford discusses the problems in the Financial Times here. I think Silver doesn't discuss this (which is a surprise, since it would have fit well with the theme of his chapter).

#9: Near-earth object impacts

I haven't looked into this category in sufficient detail. I'll list below the articles I read.

[Question] Adoption and twin studies confounders

4 Stuart_Armstrong 11 July 2014 04:44PM

Adoption and twin studies are very important for determining the impact of genes versus environment in the modern world (and hence the likely impact of various interventions). Other types of studies tend to show larger effects for some types of latter interventions, but these studies are seen as dubious, as they may fail to adjust for various confounders (eg families with more books also have more educated parents).

But adoption studies have their own confounders. The biggest ones are that in many countries, the genetic parents have a role in choosing the adoptive parents. Add the fact that adoptive parents also choose their adopted children, and that various social workers and others have great influence over the process, this would seem a huge confounder interfering with the results.

This paper also mentions a confounder for some types of twin studies, such as identical versus fraternal twins. They point out that identical twins in the same family will typically get a much greater shared environment than fraternal twins, because people will treat them much more similarly. This is to my mind quite a weak point, but it is an issue nonetheless.

Since I have very little expertise in these areas, I was just wondering if anyone knew about efforts to estimate the impact of these confounders and adjust for them.

Weekly LW Meetups

1 FrankAdamek 11 July 2014 04:00PM
This summary was posted to LW Main on July 4th, the following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Mountain View, New York, Philadelphia, Research Triangle NC, Salt Lake City, Seattle, Sydney, Toronto, Vienna, Washington DC, Waterloo, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.

continue reading »

[QUESTION]: Driverless car forecasts

7 VipulNaik 11 July 2014 12:25AM

Of the technologies that have a reasonable chance of come to mass market in the next 20-25 years and having a significant impact on human society, driverless cars (also known as self-driving cars or autonomous cars) stand out. I was originally planning to collect material discussing driverless cars, but Gwern has a really excellent compendium of statements about driverless cars, published January 2013 (if you're reading this, Gwern, thanks!). There have been a few developments since then (for instance, Google's announcement that it was building its own driverless car, or a startup called Cruise Automation planning to build a $10,000 driverless car) but the overall landscape remains similar. There's been some progress with understanding and navigating city streets and with handling adverse weather conditions, and it's more or less on schedule.

My question is about driverless car forecasts. Driverless Future has a good summary page of forecasts made by automobile manufacturer, insurers, and professional societies. The range of time for the arrival of the first commercial driverless cars varies between 2018 and 2030. The timeline for driverless cars to achieve mass penetration is similarly stagged between the early 2020s and 2040. (The forecasts aren't all directly comparable).

A few thoughts come to mind:

  1. Insurer societies and professional societies seem more conservative in their estimates than manufacturers (both automobile manufacturers and people manufacturing the technology for driverless cars). Note that the estimates of many manufacturers are centered on their projected release dates for their own driverless cars. This suggests an obvious conflict of interest: manufacturers may be incentivized to be optimistic in their projections of when driverless cars will be released, insofar as making more optimistic predictions wins them news coverage and might also improve their market valuation. (At the same time, the release dates are sufficiently far in the future that it's unlikely that they'll be held to account for false projections, so there isn't a strong incentive to be conservative the same way as there is with quarterly sales and earning forecasts). Overall, then, I'd defer more to the judgment of the professional societies, namely the IEEE and the Society of Autonomous Engineers.
  2. The statements compiled by Gwern point to the many legal hurdles and other thorny issues of ethics that would need to be resolved, at least partially, before driverless cars start becoming a big presence in the market.
  3. The general critique made by Schnaars in Megamistakes (that I discussed here) applies to driverless car technology: consumers may be unwilling to pay the added cost despite the safety benefits. Some of the quotes in Gwern's compendium reference related issues. This points further in the direction of forecasts by manufacturers being overly optimistic.

Questions for the people here:

  • Do you agree with my points (1)-(3) above?
  • Would you care to make forecasts for things such as: (a) the date that the first commercial driverless car will hit the market in a major country or US state? (b) the date by which over 10% of new cars sold in a large country or US state will be driverless (i.e., capable of fully autonomous operation), (c) same as (b), but over 50%, (d) the date by which over 10% of cars on the road (in a large country or US state) will be operating autonomously, (e) same as (d), but over 50%. You don't have to answer these exact questions, I'm just providing some suggestions since "forecast the future of driverless cars" is overly vague.
  • What's your overall view on whether it is desirable at the margin to speed up or slow down the arrival of autonomous vehicles on the road? What factors would you consider in answering such a question?

View more: Next