As previously discussed, on June 6th I received a message from jackk, a Trike Admin. He reported that the user Jiro had asked Trike to carry out an investigation into the retributive downvoting that Jiro had been subjected to. The investigation revealed that the user Eugine_Nier had downvoted over half of Jiro's comments, amounting to hundreds of downvotes.
I asked the community's guidance on dealing with the issue, and while the matter was being discussed, I also reviewed previous discussions about mass downvoting and looked for other people who mentioned being the victims of it. I asked Jack to compile reports on several other users who mentioned having been mass-downvoted, and it turned out that Eugine was also overwhelmingly the biggest downvoter of users David_Gerard, daenarys, falenas108, ialdabaoth, shminux, and Tenoke. As this discussion was going on, it turned out that user Ander had also been targeted by Eugine.
I sent two messages to Eugine, requesting an explanation. I received a response today. Eugine admitted his guilt, expressing the opinion that LW's karma system was failing to carry out its purpose of keeping out weak material and that he was engaged in a "weeding" of users who he did not think displayed sufficient rationality.
Needless to say, it is not the place of individual users to unilaterally decide that someone else should be "weeded" out of the community. The Less Wrong content deletion policy contains this clause:
Harassment of individual users.
If we determine that you're e.g. following a particular user around and leaving insulting comments to them, we reserve the right to delete those comments. (This has happened extremely rarely.)
Although the wording does not explicitly mention downvoting, harassment by downvoting is still harassment. Several users have indicated that they have experienced considerable emotional anguish from the harassment, and have in some cases been discouraged from using Less Wrong at all. This is not a desirable state of affairs, to say the least.
I was originally given my moderator powers on a rather ad-hoc basis, with someone awarding mod privileges to the ten users with the highest karma at the time. The original purpose for that appointment was just to delete spam. Nonetheless, since retributive downvoting has been a clear problem for the community, I asked the community for guidance on dealing with the issue. The rough consensus of the responses seemed to authorize me to deal with the problem as I deemed appropriate.
The fact that Eugine remained quiet about his guilt until directly confronted with the evidence, despite several public discussions of the issue, is indicative of him realizing that he was breaking prevailing social norms. Eugine's actions have worsened the atmosphere of this site, and that atmosphere will remain troubled for as long as he is allowed to remain here.
Therefore, I now announce that Eugine_Nier is permanently banned from posting on LessWrong. This decision is final and will not be changed in response to possible follow-up objections.
Unfortunately, it looks like while a ban prevents posting, it does not actually block a user from casting votes. I have asked jackk to look into the matter and find a way to actually stop the downvoting. Jack indicated earlier on that it would be technically straightforward to apply a negative karma modifier to Eugine's account, and wiping out Eugine's karma balance would prevent him from casting future downvotes. Whatever the easiest solution is, it will be applied as soon as possible.
Last month I saw this post: http://lesswrong.com/lw/kbc/meta_the_decline_of_discussion_now_with_charts/ addressing whether the discussion on LessWrong was in decline. As a relatively new user who had only just started to post comments, my reaction was: “I hope that LessWrong isn’t in decline, because the sequences are amazing, and I really like this community. I should try to write a couple articles myself and post them! Maybe I could do an analysis/summary of certain sequences posts, and discuss how they had helped me to change my mind”. I started working on writing an article.
Then I logged into LessWrong and saw that my Karma value was roughly half of what it had been the day before. Previously I hadn’t really cared much about Karma, aside from whatever micro-utilons of happiness it provided to see that the number slowly grew because people generally liked my comments. Or at least, I thought I didn’t really care, until my lizard brain reflexes reacted to what it perceived as an assault on my person.
Had I posted something terrible and unpopular that had been massively downvoted during the several days since my previous login? No, in fact my ‘past 30 days’ Karma was still positive. Rather, it appeared that everything I had ever posted to LessWrong now had a -1 on it instead of a 0. Of course, my loss probably pales in comparison to that of other, more prolific posters who I have seen report this behavior.
So what controversial subject must I have commented on in order to trigger this assault? Well, let’s see, in the past week I had asked if anyone had any opinions of good software engineer interview questions I could ask a candidate. I posted in http://lesswrong.com/lw/kex/happiness_and_children/ that I was happy to not have children, and finally, here in what appears to me to be by far the most promising candidate: http://lesswrong.com/r/discussion/lw/keu/separating_the_roles_of_theory_and_direct/ I replied to a comment about global warming data, stating that I routinely saw headlines about data supporting global warming.
Here is our scenario: A new user, attempting to participate on a message board that values empiricism and rationality, posted that the evidence supports climate change being real. (Wow, really rocking the boat here!) Then, apparently in an effort to ‘win’ this discussion by silencing opposition, someone went and downvoted every comment this user had ever made on the site. Apparently they would like to see LessWrong be a bastion of empiricism and rationality and climate change denial instead? And the way to achieve this is not to have a fair and rational discussion of the existing empirical data, but rather to simply Karmassassinate anyone who would oppose them?
Here is my hypothesis: The continuing problem of karma downvote stalkers is contributing to the decline of discussion on the site. I definitely feel much less motivated to try and contribute anything now, and I have been told by multiple other people at LessWrong meetings things such as “I used to post a lot on LessWrong, but then I posted X, and got mass downvoted, so now I only comment on Yvain’s blog”. These anecdotes are, of course, only very weak evidence to support my claim. I wish I could provide more, but I will have to defer to any readers who can supply more.
Perhaps this post will simply trigger more retribution, or maybe it will trigger an outswelling of support, or perhaps just be dismissed by people saying I should’ve posted it to the weekly discussion thread instead. Whatever the outcome, rather than meekly leaving LessWrong and letting my 'stalker' win, I decided to open a discussion about the issue. Thank you!
This paper, or more often the New Scientist's exposition of it, is being discussed online and is rather topical here. In a nutshell, stimulating one small but central area of the brain reversibly rendered one epilepsy patient unconscious without disrupting wakefulness. Impressively, this phenomenon has apparently been hypothesized before, just never tested (because it's hard and usually unethical). A quote from the New Scientist article (emphasis mine):
One electrode was positioned next to the claustrum, an area that had never been stimulated before.
When the team zapped the area with high frequency electrical impulses, the woman lost consciousness. She stopped reading and stared blankly into space, she didn't respond to auditory or visual commands and her breathing slowed. As soon as the stimulation stopped, she immediately regained consciousness with no memory of the event. The same thing happened every time the area was stimulated during two days of experiments (Epilepsy and Behavior, doi.org/tgn).
To confirm that they were affecting the woman's consciousness rather than just her ability to speak or move, the team asked her to repeat the word "house" or snap her fingers before the stimulation began. If the stimulation was disrupting a brain region responsible for movement or language she would have stopped moving or talking almost immediately. Instead, she gradually spoke more quietly or moved less and less until she drifted into unconsciousness. Since there was no sign of epileptic brain activity during or after the stimulation, the team is sure that it wasn't a side effect of a seizure.
If confirmed, this hints at several interesting points. For example, a complex enough brain is not sufficient for consciousness; a sort-of command and control structure is required as well, even if relatively small. A low-consciousness state of late-stage dementia sufferers might be due to the damage specifically to the claustrum area, not just the overall brain deterioration. The researchers speculate that stimulating the area in vegetative-state patients might help "push them out of this state". From an AI research perspective, understanding the difference between wakefulness and consciousness might be interesting, too.
Jason Mitchell is [edit: has been] the John L. Loeb Associate Professor of the Social Sciences at Harvard. He has won the National Academy of Sciences' Troland Award as well as the Association for Psychological Science's Janet Taylor Spence Award for Transformative Early Career Contribution.
Here, he argues against the principle of replicability of experiments in science. Apparently, it's disrespectful, and presumptively wrong.
Recent hand-wringing over failed replications in social psychology is largely pointless, because unsuccessful experiments have no meaningful scientific value.
Because experiments can be undermined by a vast number of practical mistakes, the likeliest explanation for any failed replication will always be that the replicator bungled something along the way. Unless direct replications are conducted by flawless experimenters, nothing interesting can be learned from them.
Three standard rejoinders to this critique are considered and rejected. Despite claims to the contrary, failed replications do not provide meaningful information if they closely follow original methodology; they do not necessarily identify effects that may be too small or flimsy to be worth studying; and they cannot contribute to a cumulative understanding of scientific phenomena.
Replication efforts appear to reflect strong prior expectations that published findings are not reliable, and as such, do not constitute scientific output.
The field of social psychology can be improved, but not by the publication of negative findings. Experimenters should be encouraged to restrict their “degrees of freedom,” for example, by specifying designs in advance.
Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.
This is why we can't have social science. Not because the subject is not amenable to the scientific method -- it obviously is. People are conducting controlled experiments and other people are attempting to replicate the results. So far, so good. Rather, the problem is that at least one celebrated authority in the field hates that, and would prefer much, much more deference to authority.
On a recent trip to Ireland, I gave a talk on tactics for having better arguments (video here). There's plenty in the video that's been discussed on LW before (Ideological Turing Tests and other reframes), but I thought I'd highlight one other class of trick I use to have more fruitful disagreements.
It's hard, in the middle of a fight, to remember, recognize, and defuse common biases, rhetorical tricks, emotional triggers, etc. I'd rather cheat than solve a hard problem, so I put a lot of effort into shifting disagreements into environments where it's easier for me and my opposite-number to reason and argue well, instead of relying on willpower. Here's a recent example of the kind of shift I like to make:
A couple months ago, a group of my friends were fighting about the Brendan Eich resignation on facebook. The posts were showing up fast; everyone was, presumably, on the edge of their seats, fueled by adrenaline, and alone at their various computers. It’s a hard place to have a charitable, thoughtful debate.
I asked my friends (since they were mostly DC based) if they’d be amenable to pausing the conversation and picking it up in person. I wanted to make the conversation happen in person, not in front of an audience, and in a format that let people speak for longer and ask questions more easily. If so, I promised to bake cookies for the ultimate donnybrook.
My friends probably figured that I offered cookies as a bribe to get everyone to change venues, and they were partially right. But my cookies had another strategic purpose. When everyone arrived, I was still in the process of taking the cookies out of the oven, so I had to recruit everyone to help me out.
“Alice, can you pour milk for people?”
“Bob, could you pass out napkins?”
“Eve, can you greet people at the door while I’m stuck in the kitchen with potholders on?”
Before we could start arguing, people on both sides of the debate were working on taking care of each other and asking for each other’s help. Then, once the logistics were set, we all broke bread (sorta) with each other and had a shared, pleasurable experience. Then we laid into each other.
Sharing a communal experience of mutual service didn’t make anyone pull their intellectual punches, but I think it made us more patient with each other and less anxiously fixated on defending ourselves. Sharing food and seating helped remind us of the relationships we enjoyed with each other, and why we cared about probing the ideas of this particular group of people.
I prefer to fight with people I respect, who I expect will fight in good faith. It's hard to remember that's what I'm doing if I argue with them in the same forums (comment threads, fb, etc) where I usually see bad fights. An environment shift and other compensatory gestures make it easier to leave habituated errors and fears at the door.
In early 2000, I registered my personal domain name weidai.com, along with a couple others, because I was worried that the small (sole-proprietor) ISP I was using would go out of business one day and break all the links on the web to the articles and software that I had published on my "home page" under its domain. Several years ago I started getting offers, asking me to sell the domain, and now they're coming in almost every day. A couple of days ago I saw the first six figure offer ($100,000).
In early 2009, someone named Satoshi Nakamoto emailed me personally with an announcement that he had published version 0.1 of Bitcoin. I didn't pay much attention at the time (I was more interested in Less Wrong than Cypherpunks at that point), but then in early 2011 I saw a LW article about Bitcoin, which prompted me to start mining it. I wrote at the time, "thanks to the discussion you started, I bought a Radeon 5870 and started mining myself, since it looks likely that I can at least break even on the cost of the card." That approximately $200 investment (plus maybe another $100 in electricity) is also worth around six figures today.
Clearly, technological advances can sometimes create gold rush-like situations (i.e., first-come-first-serve opportunities to make truly extraordinary returns with minimal effort or qualifications). And it's possible to stumble into them without even trying. Which makes me think, maybe we should be trying? I mean, if only I had been looking for possible gold rushes, I could have registered a hundred domain names optimized for potential future value, rather than the few that I happened to personally need. Or I could have started mining Bitcoins a couple of years earlier and be a thousand times richer.
I wish I was already an experienced gold rush spotter, so I could explain how best to do it, but as indicated above, I participated in the ones that I did more or less by luck. Perhaps the first step is just to keep one's eyes open, and to keep in mind that tech-related gold rushes do happen from time to time and they are not impossibly difficult to find. What other ideas do people have? Are there other past examples of tech gold rushes besides the two that I mentioned? What might be some promising fields to look for them in the future?
Analogy gets a bad rap around here, and not without reason. The kinds of argument from analogy condemned in the above links fully deserve the condemnation they get. Still, I think it's too easy to read them and walk away thinking "Boo analogy!" when not all uses of analogy are bad. The human brain seems to have hardware support for thinking in analogies, and I don't think this capability is a waste of resources, even in our highly non-ancestral environment. So, assuming that the linked posts do a sufficient job detailing the abuse and misuse of analogy, I'm going to go over some legitimate uses.
The first thing analogy is really good for is description. Take the plum pudding atomic model. I still remember this falsified proposal of negative 'raisins' in positive 'dough' largely because of the analogy, and I don't think anyone ever attempted to use it to argue for the existence of tiny subnuclear particles corresponding to cinnamon.
But this is only a modest example of what analogy can do. The following is an example that I think starts to show the true power: my comment on Robin Hanson's 'Don't Be "Rationalist"'. To summarize, Robin argued that since you can't be rationalist about everything you should budget your rationality and only be rational about the most important things; I replied that maybe rationality is like weightlifting, where your strength is finite yet it increases with use. That comment is probably the most successful thing I've ever written on the rationalist internet in terms of the attention it received, including direct praise from Eliezer and a shoutout in a Scott Alexander (yvain) post, and it's pretty much just an analogy.
Here's another example, this time from Eliezer. As part of the AI-Foom debate, he tells the story of Fermi's nuclear experiments, and in particular his precise knowledge of when a pile would go supercritical.
What do the above analogies accomplish? They provide counterexamples to universal claims. In my case, Robin's inference that rationality should be spent sparingly proceeded from the stated premise that no one is perfectly rational about anything, and weightlifting was a counterexample to the implicit claim 'a finite capacity should always be directed solely towards important goals'. If you look above my comment, anon had already said that the conclusion hadn't been proven, but without the counterexample this claim had much less impact.
In Eliezer's case, "you can never predict an unprecedented unbounded growth" is the kind of claim that sounds really convincing. "You haven't actually proved that" is a weak-sounding retort; "Fermi did it" immediately wins the point.
The final thing analogies do really well is crystallize patterns. For an example of this, let's turn to... Failure by Analogy. Yep, the anti-analogy posts are themselves written almost entirely via analogy! Alchemists who glaze lead with lemons and would-be aviators who put beaks on their machines are invoked to crystallize the pattern of 'reasoning by similarity'. The post then makes the case that neural-net worshippers are reasoning by similarity in just the same way, making the same fundamental error.
It's this capacity that makes analogies so dangerous. Crystallizing a pattern can be so mentally satisfying that you don't stop to question whether the pattern applies. The antidote to this is the question, "Why do you believe X is like Y?" Assessing the answer and judging deep similarities from superficial ones may not always be easy, but just by asking you'll catch the cases where there is no justification at all.
Separating the roles of theory and direct empirical evidence in belief formation: the examples of minimum wage and anthropogenic global warming
I recently asked two questions on Quora with similar question structures, and the similarities and differences between the responses were interesting.
Question #1: Anthropogenic global warming, the greenhouse effect, and the historical weather record
I asked the question here. Question statement:
If you believe in Anthropogenic Global Warming (AGW), to what extent is your belief informed by the theory of the greenhouse effect, and to what extent is it informed by the historical temperature record?
In response to some comments, I added the following question details:
I also posted to Facebook here asking my friends about the pushback to my use of the term "belief" in my question.
Question #2: Effect of increase in the minimum wage on unemployment
I asked the question here. Question statement:
If you believe that raising the minimum wage is likely to increase unemployment, to what extent is your belief informed by the theory of supply and demand and to what extent is it informed by direct empirical evidence?
I added the following question details:
By "direct empirical evidence" I am referring to empirical evidence that directly pertains to the relation between minimum wage raises and employment level changes, not empirical evidence that supports the theory of supply and demand in general (because transferring that to the minimum wage context would require one to believe the transferability of the theory).
Also, when I say "believe that raising the minimum wage is likely to increase unemployment" I am talking about minimum wage increases of the sort often considered in legislative measures, and by "likely" I just mean that it's something that should always be seriously considered whenever a proposal to raise the minimum wage is made. The belief would be consistent with believing that in some cases minimum wage raises have no employment effects.
I also posted the question to Facebook here.
Similarities between the questions
The questions are structurally similar, and belong to a general question type of considerable interest to the LessWrong audience. The common features to the questions:
- In both cases, there is a theory (the greenhouse effect for Question #1, and supply and demand for Question #2) that is foundational to the domain and is supported through a wide range of lines of evidence.
- In both cases, the quantitative specifics of the extent to which the theory applies in the particular context are not clear. There are prima facie plausible arguments that other factors may cancel out the effect and there are arguments for many different effect sizes.
- In both cases, people who study the broad subject (climate scientists for Question #1, economists for Question #2) are more favorably disposed to the belief than people who do not study the broad subject.
- In both cases, a significant part of the strength of belief of subject matter experts seems to be their belief in the theory. The data, while consistent with the theory, does not seem to paint a strong picture in isolation. For the minimum wage, consider the Card and Krueger study. Bryan Caplan discusses how Bayesian reasoning with strong theoretical priors can lead one to continue believing that minimum wage increases cause unemployment to rise, without addressing Card and Krueger at the object level. For the case of anthropogenic global warming, consider the draft by Kesten C. Green (addressing whether a warming-based forecast has higher forecast accuracy than a no-change forecast) or the paper AGW doesn't cointegrate by Beenstock, Reingewertz, and Paldor (addressing whether, looking at the data alone, we can get good evidence that carbon dioxide concentration increases are linked with temperature increases).
- In both cases, outsiders to the domain, who nonetheless have expertise in other areas that one might expect gives them insight into the question, are often more skeptical of the belief. A number of weather forecasters, physicists, and forecasting experts are skeptical of long-range climate forecasting or confident assertions about anthropogenic global warming. A number of sociologists, lawyers, and politicians are often disparaging of the belief that minimum wage increases cause unemployment levels to rise. The criticism is similar in both cases: namely, that a basically correct theory is being overstretched or incorrectly applied to a situation that is too complex.
- In both cases, the debate is somewhat politically charged, largely because one's beliefs here affect one's views of proposed legislation (climate change mitigation legislation and minimum wage increase legislation). The anthropogenic global warming belief is more commonly associated with environmentalists, social democrats, and progressives, and (in the United States) with Democrats, whereas opposition to it is more common among conservatives and libertarians. The minimum wage belief is more commonly associated with free market views and (in the United States) with conservatives and Republicans, and opposition to it is more common among progressives and social democrats.
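The interaction between a strong theoretical prior and a single ambiguous study, as described in the list above, can be illustrated with a small calculation. This is only a sketch: the function name `posterior` and every number below are hypothetical, chosen for illustration, and are not taken from Card and Krueger, Caplan, or any of the climate papers mentioned.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability of a hypothesis after one piece of evidence.

    likelihood_ratio = P(evidence | hypothesis) / P(evidence | not hypothesis)
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical: a study that is twice as likely if the hypothesis is false.
study_lr = 0.5

# Someone whose belief rests heavily on the theory starts near certainty.
with_theory = posterior(0.95, study_lr)
# Someone looking at the data alone starts undecided.
data_only = posterior(0.50, study_lr)

print(f"strong theoretical prior: 0.95 -> {with_theory:.3f}")  # 0.905
print(f"no theoretical prior:     0.50 -> {data_only:.3f}")    # 0.333
```

Under these made-up numbers the same study leaves the theory-driven believer still fairly confident while pushing the data-only observer toward disbelief, which is one way to see why the two groups can read the same evidence so differently.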
Looking for help
I'm interested in thoughts from the people here on these questions:
- Thoughts on the specifics of Question #1 and Question #2.
- Other possible questions in the same reference class (where a belief arises from a mix of theory and data, and the theory plays a fairly big role in driving the belief, while the data on its own is very ambiguous).
- Other similarities between Question #1 and Question #2.
- Ways that Question #1 and Question #2 are disanalogous.
- General thoughts on how this relates to Bayesian reasoning and other modes of belief formation based on a combination of theory and data.
Let me tell you a parable of the future. Let’s say, 70 years from now, in a large Western country we’ll call Nacirema.
One day far from now: scientific development has continued apace, and a large government project (with, unsurprisingly, a lot of military funding) has taken the scattered pieces of cutting-edge research and put them together into a single awesome technology, which could revolutionize (or at least, vastly improve) all sectors of the economy. Leading thinkers had long forecast that this area of science’s mysteries would eventually yield to progress, despite theoretical confusion and perhaps-disappointing initial results and the scorn of more conservative types and the incomprehension (or outright disgust, for ‘playing god’) of the general population, and at last - it had! The future was bright.
Unfortunately, it was hurriedly decided to use an early prototype outside the lab in an impoverished foreign country. Whether out of arrogance, bureaucratic inertia, overconfidence on the part of the involved researchers, condescending racism, the need to justify the billions of grant-dollars that cumulatively went into the project over the years by showing some use of it - whatever, the reasons no longer mattered after the final order was signed. The technology was used, but the consequences turned out to be horrific: over a brief period of what seemed like mere days, entire cities collapsed and scores - hundreds - of thousands of people died. (Modern economies are extremely interdependent and fragile, and small disruptions can have large consequences; more people died in the chaos of the evacuation of the areas around Fukushima than will die of the radiation.)
Here is an interesting blog post about a guy who did a resume experiment between two positions which he argues are by experience identical, but occupy different "social status" positions in tech: A software engineer and a data manager.
Interview A: as Software Engineer
Bill faced five hour-long technical interviews. Three went well. One was so-so, because it focused on implementation details of the JVM, and Bill’s experience was almost entirely in C++, with a bit of hobbyist OCaml. The last interview sounds pretty hellish. It was with the VP of Data Science, Bill’s prospective boss, who showed up 20 minutes late and presented him with one of those interview questions where there’s “one right answer” that took months, if not years, of in-house trial and error to discover. It was one of those “I’m going to prove that I’m smarter than you” interviews...
Let’s recap this. Bill passed three of his five interviews with flying colors. One of the interviewers, a few months later, tried to recruit Bill to his own startup. The fourth interview was so-so, because he wasn’t a Java expert, but came out neutral. The fifth, he failed because he didn’t know the in-house Golden Algorithm that took years of work to discover. When I asked that VP/Data Science directly why he didn’t hire Bill (and he did not know that I knew Bill, nor about this experiment) the response I got was “We need people who can hit the ground running.” Apparently, there’s only a “talent shortage” when startup people are trying to scam the government into changing immigration policy. The undertone of this is that “we don’t invest in people”.
Or, for a point that I’ll come back to, software engineers lack the social status necessary to make others invest in them.
Interview B: as Data Science manager.
A couple weeks later, Bill interviewed at a roughly equivalent company for the VP-level position, reporting directly to the CTO.
Worth noting is that we did nothing to make Bill more technically impressive than for Company A. If anything, we made his technical story more honest, by modestly inflating his social status while telling a “straight shooter” story for his technical experience. We didn’t have to cover up periods of low technical activity; that he was a manager, alone, sufficed to explain those away.
Bill faced four interviews, and while the questions were behavioral and would be “hard” for many technical people, he found them rather easy to answer with composure. I gave him the Golden Answer, which is to revert to “There’s always a trade-off between wanting to do the work yourself, and knowing when to delegate.” It presents one as having managerial social status (the ability to delegate) but also a diligent interest in, and respect for, the work. It can be adapted to pretty much any “behavioral” interview question...
Bill passed. Unlike for a typical engineering position, there were no reference checks. The CEO said, “We know you’re a good guy, and we want to move fast on you”. As opposed to the 7-day exploding offers typically served to engineers, Bill had 2 months in which to make his decision. He got a fourth week of vacation without even having to ask for it, and genuine equity (about 75% of a year’s salary vesting each year)...
It was really interesting, as I listened in, to see how different things are once you’re “in the club”. The CEO talked to Bill as an equal, not as a paternalistic, bullshitting, “this is good for your career” authority figure. There was a tone of equality that a software engineer would never get from the CEO of a 100-person tech company.
The author concludes that positions that are labeled as code-monkey-like are low status, while positions that are labeled as managerial are high status, even if they are "essentially" doing the same sort of work.
Not sure about this methodology, but it's food for thought.
So I just wound up in a debate with someone over on Reddit about the value of conventional academic philosophy. He linked me to a book review, in which both the review and the book are absolutely godawful. That is, the author (and the reviewer following him) starts with ontological monism (the universe only contains a single kind of Stuff: mass-energy), adds in the experience of consciousness, reasons deftly that emergence is a load of crap... and then arrives at the conclusion of panpsychism.
WAIT HOLD ON, DON'T FLAME YET!
Of course panpsychism is bunk. I would be embarrassed to be caught upholding it, given the evidence I currently have, but what I want to talk about is the logic being followed.
1) The universe is a unified, consistent whole. Good!
2) The universe contains the experience/existence of consciousness. Easily observable.
3) If consciousness exists, something in the universe must cause or give rise to consciousness. Good reasoning!
4) "Emergence" is a non-explanation, so that can't be it. Good!
5) Therefore, whatever stuff the unified universe is made of must be giving rise to consciousness in a nonemergent way.
6) Therefore, the stuff must be innately "mindy".
What went wrong in steps (5) and (6)? The man was actually reasoning more-or-less correctly! Given the universe he lived in, and the impossibility of emergence, he reallocated his probability mass to the remaining answer. When he had eliminated the impossible, whatever remained, however low its prior, must be true.
The problem was, he eliminated the impossible, but left open a vast space of possible hypotheses that he didn't know about (but which we do): the most common of these is the computational theory of mind and consciousness, which says that we are made of cognitive algorithms. A Solomonoff Inducer can just go on to the next length of bit-strings describing Turing machines, but we can't.
Now, I can spot the flaw in the reasoning here. What frightens me is: what if I'm presented with some similar argument, and I can't spot the flaw? What if, instead, I just neatly and stupidly reallocate my belief to what seems to me to be the only available alternative, while failing to go out and look for alternatives I don't already know about? Notably, it seems like expected evidence is conserved, but expecting to locate new hypotheses means I should be reducing my certainty about all currently-available hypotheses now to have some for dividing between the new possibilities.
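To make the worry concrete, here is a minimal sketch of the two ways of "eliminating the impossible". The hypothesis names and prior numbers are made up purely for illustration, and `eliminate` is a hypothetical helper, not anything from the book or the review. The only difference between the two belief sets is whether some probability mass was reserved in advance for hypotheses not yet thought of.

```python
def eliminate(beliefs, ruled_out):
    """Drop refuted hypotheses and renormalize what is left."""
    survivors = {h: p for h, p in beliefs.items() if h not in ruled_out}
    total = sum(survivors.values())
    return {h: p / total for h, p in survivors.items()}

# Hypothetical priors, chosen only to illustrate the point.
closed = {"emergence": 0.9, "panpsychism": 0.1}
open_ended = {
    "emergence": 0.6,
    "panpsychism": 0.1,
    "something I haven't thought of": 0.3,  # explicit catch-all
}

closed_post = eliminate(closed, {"emergence"})
open_post = eliminate(open_ended, {"emergence"})

print(closed_post)  # all remaining mass lands on panpsychism
print(open_post)    # most remaining mass lands on the catch-all
```

With the closed hypothesis set, refuting emergence forces full confidence in panpsychism, however low its prior. With the catch-all in place, the same refutation leaves panpsychism at roughly a quarter, and most of the mass says "go look for alternatives".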
If you can notice when you're confused, how do you notice when you're ignorant?
Following the interest in this proposal a couple of weeks ago, I've set up a Google Group for the purpose of giving people a venue to discuss R, talk about their projects, seek advice, share resources, and provide a social motivator to hone their skills. Having done this, I'd now like to bullet-point a few reasons for learning applied statistical skills in general, and R in particular:
The General Case:
- Statistics seems to be a subject where it's easy to delude yourself into thinking you know a lot about it. This is visibly apparent on Less Wrong. Although there are many subject experts on here, there are also a lot of people making bold pronouncements about Bayesian inference who wouldn't recognise a beta distribution if it sat on them. Don't be that person! It's hard to fool yourself into thinking you know something when you have to practically apply it.
- Whenever you think "I wonder what kind of relationship exists between [x] and [y]", it's within your power to investigate this.
- Statistics has a rich conceptual vocabulary for reasoning about how observations generalise, and how useful those generalisations might be when making inferences about future observations. These are the sorts of skills we want to be practising as aspiring rationalists.
- Scientific literature becomes a lot more readable when you appreciate the methods behind it. You'll have a much greater understanding of scientific findings if you appreciate what the finding means in the context of statistical inference, rather than going off whatever paraphrased upshot is given in the abstract.
- Statistical techniques make use of fundamental mathematical methods in an applicable way. If you're learning linear algebra, for example, and you want an intuitive understanding of eigenvectors, you could do a lot worse than learning about principal component analysis.
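As a small illustration of the eigenvector point, here is a sketch of principal component analysis on two-dimensional data, written in plain Python rather than R so that it needs no packages. The function name `principal_axis` and the sample points are invented for this example. The first principal component is the eigenvector of the sample covariance matrix with the largest eigenvalue, which for a 2x2 matrix can be found in closed form.

```python
import math

def principal_axis(points):
    """Return (largest eigenvalue, unit eigenvector) of the 2x2 sample covariance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)

    # Eigenvalues of [[sxx, sxy], [sxy, syy]] from the characteristic polynomial.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest = half_trace + root

    # (sxy, largest - sxx) is an eigenvector for `largest` whenever sxy != 0.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return largest, (vx / norm, vy / norm)

# Made-up points lying exactly on the line y = 2x.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
variance, direction = principal_axis(points)
print(variance, direction)
```

For these points the principal direction has slope 2: the eigenvector is simply the direction along which the data varies most, and its eigenvalue is the variance along that direction.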
R in particular:
- It's non-proprietary (read "free"). Many competing products are ridiculously expensive to license.
- Since it's common in academia, newer or more exotic statistical tools and procedures are more likely to have been implemented and made available in R than proprietary statistical packages or other software libraries.
- R skills are a strong signal of technical competence that will distinguish you from SPSS mouse-jockeys.
- There are many out-of-the-box packages for carrying out statistical procedures that you'd probably have to cobble together yourself if you were working in Python or Java.
- Having said that, popular languages such as Python and Java have libraries for interfacing with R.
- There's a discussion / support group for R with Less Wrong users in it. :-)
This post is an explanation of a recent paper coauthored by Sean Carroll and Charles Sebens, where they propose a derivation of the Born rule in the context of the Many-Worlds approach to quantum mechanics. While the attempt itself is not fully successful, it contains interesting ideas and is thus worth knowing.
A note to the reader: here I will try to illuminate the preconditions and give only a very general view of their method, and for this reason you won’t find any equations. It is my hope that if after having read this you’re still curious about the real math, you will point your browser to the preceding link and read the paper for yourself.
If you are not totally new to LessWrong, you should know by now that the preferred interpretation of quantum mechanics (QM) around here is the Many-Worlds Interpretation (MWI), which denies the collapse of the wave-function and postulates a distinct reality (that is, a branch) for every base state composing a quantum superposition.
MWI historically suffered from three problems: the absence of macroscopic superpositions, the preferred basis problem, and the derivation of the Born rule. The development of decoherence famously solved the first and, to a lesser degree, the second problem, but the third remains one of the most poorly understood sides of the theory.
Quantum mechanics assigns an amplitude, a complex number, to each branch of a superposition, and postulates that the probability that an observer finds the system in that branch is the squared norm of the amplitude. This, very briefly, is the content of the Born rule (for pure states).
Quantum mechanics remains agnostic about the ontological status of both amplitudes and probabilities, but MWI, assigning a reality status to every branch, demotes ontological uncertainty (which branch will become real after observation) to indexical uncertainty (which branch the observer will find itself correlated to after observation).
Simple indexical uncertainty, though, cannot reproduce the exact predictions of QM: by the Indifference principle, if you have no information privileging any member of a set of hypotheses, you should assign equal probability to each one. This leads to forming a probability distribution by counting the branches, which only in special circumstances coincides with amplitude-derived probabilities. This discrepancy, and how to account for it, constitutes the Born rule problem in MWI.
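The discrepancy is easy to see numerically. The following is only an illustrative sketch: the function names and the two example superpositions are my own choices and do not come from the paper. It compares the squared-norm probabilities with naive branch counting for a two-branch state.

```python
import math

def born_probabilities(amplitudes):
    """Probability of each branch as the normalized squared norm of its amplitude."""
    norms = [abs(a) ** 2 for a in amplitudes]
    total = sum(norms)
    return [n / total for n in norms]

def branch_counting(amplitudes):
    """Naive indifference: every branch gets the same probability."""
    return [1 / len(amplitudes)] * len(amplitudes)

# Two hypothetical two-branch superpositions (phases are arbitrary).
unequal = [math.sqrt(2 / 3), 1j * math.sqrt(1 / 3)]
equal = [1 / math.sqrt(2), -1 / math.sqrt(2)]

born_unequal = born_probabilities(unequal)
count_unequal = branch_counting(unequal)
born_equal = born_probabilities(equal)
count_equal = branch_counting(equal)

print(born_unequal, count_unequal)  # these disagree
print(born_equal, count_equal)      # these agree
```

Only in the special case of equal-norm amplitudes do the two prescriptions coincide; for the unequal state, branch counting gives one half to each branch while the squared norms give roughly two thirds and one third.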
There have of course been many attempts at solving it. For a summary, I quote the article directly:
One approach is to show that, in the limit of many observations, branches that do not obey the Born Rule have vanishing measure. A more recent twist is to use decision theory to argue that a rational agent should act as if the Born Rule is true. Another approach is to argue that the Born Rule is the only well-defined probability measure consistent with the symmetries of quantum mechanics.
These proposals have failed to uniformly convince physicists that the Born rule problem is solved, and the paper by Carroll and Sebens is another attempt to reach a solution.
Before describing their approach, there are some assumptions that have to be clarified.
The first, and this is good news, is that they are treating probabilities as rational degrees of belief about a state of the world. They are thus using a Bayesian approach, although they never call it that.
The second is that they’re using self-locating indifference, again from a Bayesian perspective.
Self-locating indifference is the principle that you should assign equal probabilities to finding yourself in different places in the universe, if you have no information that distinguishes the alternatives. For a Bayesian, this is almost trivial: self-locating propositions are propositions like any other, so the principle of indifference should be used on them as it should on any other prior information. This is valid for quantum branches too.
The third assumption is where they start to deviate from pure Bayesianism: it’s what they call the Epistemic Separability Principle, or ESP. In their words:
the outcome of experiments performed by an observer on a specific system shouldn’t depend on the physical state of other parts of the universe.
This is a kind of Markov condition: the requirement that the system be such that it screens the interaction between the observer and the observed system from every possible influence of the environment.
It is obviously false for many partitions of a system into an experiment and an environment, but rather than taking it as a Principle, we can make it an assumption: an experiment is such only if it obeys the condition.
In the context of QM, this condition amounts to splitting the universal wave-function into two components, the experiment and the environment, so that there’s no entanglement between the two, and considering only interactions that factor as a product of an evolution for the environment and an evolution for the experiment. In this case, the environment evolution acts as the identity operator on the experiment, and does not affect the behavior of the experiment wave-function.
Thus, their formulation requires that the probability that an observer finds itself in a certain branch after a measurement is independent of the operations performed on the environment.
Note, though, an unspoken but very important point: probabilities of this kind depend only on the superposition structure of the experiment.
A probability, being an abstract degree of belief, can depend on all sorts of prior information. With their quantum version of ESP, Carroll and Sebens are declaring that, in a factored environment, probabilities of a subsystem do not depend on the information one has about the environment. Indeed, in this treatment, they are equating factorization and lack of logical connection.
This is of course true in quantum mechanics, but is a significant burden in a pure Bayesian treatment.
That said, let’s turn to their setup.
They imagine a system in a superposition of base states, which first interacts and decoheres with an environment, then gets perceived by an observer. This sequence is crucial: the Carroll-Sebens move can only be applied when the system already has decohered with a sufficiently large environment.
I say “sufficiently large” because the next step is to consider a unitary transformation on the “system+environment” block. This transformation needs to be of this kind:
- it respects ESP, in that it has to factor as an identity transformation on the “observer+system” block;
- it needs to equally distribute the probability of each branch in the original superposition on a different branch in the decohered block, according to their original relative measure.
Then, by a simple method of rearranging labels of the decohered base, one can show that the correct probabilities come out by the indifference principle, in the very same way that the principle is used to derive the uniform probability distribution in the second chapter of Jaynes’ Probability Theory.
As an example, consider a superposition of a quantum bit, and say that one branch has a higher measure with respect to the other by a factor of square root of 2. The environment needs in this case to have at least 8 different base states to be relabeled in such a way to make the indifference principle work.
In theory, in this way you can only show that the Born rule is valid for amplitudes which differ from one another by the square root of a rational number. Again I quote the paper for their conclusion:
however, since this is a dense set, it seems reasonable to conclude that the Born Rule is established.
Evidently, this approach suffers from a number of limits: the first and the most evident is that it works only in a situation where the system to be observed has already decohered with an environment. It is not applicable to, say, a situation where a detector reads a quantum superposition directly, e.g. in a Stern-Gerlach experiment.
The second limit, although less serious, is that it can work only when the system to be observed decoheres with an environment which has sufficiently many base states to distribute the relative measure in different branches. This number, for a transcendental amplitude, is bound to be infinite.
The third limit is that it can only work if we are allowed to interact with the environment in such a way as to leave the amplitudes of the interaction between the system and the observer untouched.
All of these, which are understood as limits, can naturally be reversed and considered as defining conditions, saying: the Born rule is valid only within those limits.
I’ll leave it to you to determine if this constitutes a sufficient answer to the Born rule problem in MWI.
Summary: I don't think 'politics is the mind-killer' works well rhetorically. I suggest 'politics is hard mode' instead.
My usual first objection is that it seems odd to single politics out as a “mind-killer” when there’s plenty of evidence that tribalism happens everywhere. Recently, there has been a whole kerfuffle within the field of psychology about replication of studies. Of course, some key studies have failed to replicate, leading to accusations of “bullying” and “witch-hunts” and what have you. Some of the people involved have since walked their language back, but it was still a rather concerning demonstration of mind-killing in action. People took “sides,” people became upset at people based on their “sides” rather than their actual opinions or behavior, and so on.
Unless this article refers specifically to electoral politics and Democrats and Republicans and things (not clear from the wording), “politics” is such a frightfully broad category of human experience that writing it off entirely as a mind-killer that cannot be discussed or else all rationality flies out the window effectively prohibits a large number of important issues from being discussed, by the very people who can, in theory, be counted upon to discuss them better than most. Is it “politics” for me to talk about my experience as a woman in gatherings that are predominantly composed of men? Many would say it is. But I’m sure that these groups of men stand to gain from hearing about my experiences, since some of them are concerned that so few women attend their events.
In this article, Eliezer notes, “Politics is an important domain to which we should individually apply our rationality — but it’s a terrible domain in which to learn rationality, or discuss rationality, unless all the discussants are already rational.” But that means that we all have to individually, privately apply rationality to politics without consulting anyone who can help us do this well. After all, there is no such thing as a discussant who is “rational”; there is a reason the website is called “Less Wrong” rather than “Not At All Wrong” or “Always 100% Right.” Assuming that we are all trying to be more rational, there is nobody better to discuss politics with than each other.
The rest of my objection to this meme has little to do with this article, which I think raises lots of great points, and more to do with the response that I’ve seen to it — an eye-rolling, condescending dismissal of politics itself and of anyone who cares about it. Of course, I’m totally fine if a given person isn’t interested in politics and doesn’t want to discuss it, but then they should say, “I’m not interested in this and would rather not discuss it,” or “I don’t think I can be rational in this discussion so I’d rather avoid it,” rather than sneeringly reminding me “You know, politics is the mind-killer,” as though I am an errant child. I’m well-aware of the dangers of politics to good thinking. I am also aware of the benefits of good thinking to politics. So I’ve decided to accept the risk and to try to apply good thinking there. [...]
I’m sure there are also people who disagree with the article itself, but I don’t think I know those people personally. And to add a political dimension (heh), it’s relevant that most non-LW people (like me) initially encounter “politics is the mind-killer” being thrown out in comment threads, not through reading the original article. My opinion of the concept improved a lot once I read the article.
In the same thread, Andrew Mahone added, “Using it in that sneering way, Miri, seems just like a faux-rationalist version of ‘Oh, I don’t bother with politics.’ It’s just another way of looking down on any concerns larger than oneself as somehow dirty, only now, you know, rationalist dirty.” To which Miri replied: “Yeah, and what’s weird is that that really doesn’t seem to be Eliezer’s intent, judging by the eponymous article.”
Eliezer replied briefly, to clarify that he wasn't generally thinking of problems that can be directly addressed in local groups (but happen to be politically charged) as "politics":
Hanson’s “Tug the Rope Sideways” principle, combined with the fact that large communities are hard to personally influence, explains a lot in practice about what I find suspicious about someone who claims that conventional national politics are the top priority to discuss. Obviously local community matters are exempt from that critique! I think if I’d substituted ‘national politics as seen on TV’ in a lot of the cases where I said ‘politics’ it would have more precisely conveyed what I was trying to say.
But that doesn't resolve the issue. Even if local politics is more instrumentally tractable, the worry about polarization and factionalization can still apply, and may still make it a poor epistemic training ground.
A subtler problem with banning “political” discussions on a blog or at a meet-up is that it’s hard to do fairly, because our snap judgments about what counts as “political” may themselves be affected by partisan divides. In many cases the status quo is thought of as apolitical, even though objections to the status quo are ‘political.’ (Shades of Pretending to be Wise.)
Because politics gets personal fast, it’s hard to talk about it successfully. But if you’re trying to build a community, build friendships, or build a movement, you can’t outlaw everything ‘personal.’
And selectively outlawing personal stuff gets even messier. Last year, daenerys shared anonymized stories from women, including several that discussed past experiences where the writer had been attacked or made to feel unsafe. If those discussions are made off-limits because they relate to gender and are therefore ‘political,’ some folks may take away the message that they aren’t allowed to talk about, e.g., some harmful or alienating norm they see at meet-ups. I haven’t seen enough discussions of this failure mode to feel super confident people know how to avoid it.
Since this is one of the LessWrong memes that’s most likely to pop up in cross-subcultural dialogues (along with the even more ripe-for-misinterpretation “policy debates should not appear one-sided”…), as a first (very small) step, my action proposal is to obsolete the ‘mind-killer’ framing. A better phrase for getting the same work done would be ‘politics is hard mode’:
1. ‘Politics is hard mode’ emphasizes that ‘mind-killing’ (= epistemic difficulty) is quantitative, not qualitative. Some things might instead fall under Middlingly Hard Mode, or under Nightmare Mode…
2. ‘Hard’ invites the question ‘hard for whom?’, more so than ‘mind-killer’ does. We’re used to the fact that some people and some contexts change what’s ‘hard’, so it’s a little less likely we’ll universally generalize.
3. ‘Mindkill’ connotes contamination, sickness, failure, weakness. In contrast, ‘Hard Mode’ doesn’t imply that a thing is low-status or unworthy. As a result, it’s less likely to create the impression (or reality) that LessWrongers or Effective Altruists dismiss out-of-hand the idea of hypothetical-political-intervention-that-isn’t-a-terrible-idea. Maybe some people do want to argue for the thesis that politics is always useless or icky, but if so it should be done in those terms, explicitly — not snuck in as a connotation.
4. ‘Hard Mode’ can’t readily be perceived as a personal attack. If you accuse someone of being ‘mindkilled’, with no context provided, that smacks of insult — you appear to be calling them stupid, irrational, deluded, or the like. If you tell someone they’re playing on ‘Hard Mode,’ that’s very nearly a compliment, which makes your advice that they change behaviors a lot likelier to go over well.
5. ‘Hard Mode’ doesn’t risk bringing to mind (e.g., gendered) stereotypes about communities of political activists being dumb, irrational, or overemotional.
6. ‘Hard Mode’ encourages a growth mindset. Maybe some topics are too hard to ever be discussed. Even so, ranking topics by difficulty encourages an approach where you try to do better, rather than merely withdrawing. It may be wise to eschew politics, but we should not fear it. (Fear is the mind-killer.)
7. Edit: One of the larger engines of conflict is that people are so much worse at noticing their own faults and biases than noticing others'. People will be relatively quick to dismiss others as 'mindkilled,' while frequently flinching away from or just-not-thinking 'maybe I'm a bit mindkilled about this.' Framing the problem as a challenge rather than as a failing might make it easier to be reflective and even-handed.
This is not an attempt to get more people to talk about politics. I think this is a better framing whether or not you trust others (or yourself) to have productive political conversations.
When I playtested this post, Ciphergoth raised the worry that 'hard mode' isn't scary-sounding enough. As dire warnings go, it's light-hearted—exciting, even. To which I say: good. Counter-intuitive fears should usually be argued into people (e.g., via Eliezer's politics sequence), not connotation-ninja'd or chanted at them. The cognitive content is more clearly conveyed by 'hard mode,' and if some group (people who love politics) stands to gain the most from internalizing this message, the message shouldn't cast that very group (people who love politics) in an obviously unflattering light. LW seems fairly memetically stable, so the main issue is what would make this meme infect friends and acquaintances who haven't read the sequences. (Or Dune.)
If you just want a scary personal mantra to remind yourself of the risks, I propose 'politics is SPIDERS'. Though 'politics is the mind-killer' is fine there too.
If you and your co-conversationalists haven’t yet built up a lot of trust and rapport, or if tempers are already flaring, conveying the message ‘I’m too rational to discuss politics’ or ‘You’re too irrational to discuss politics’ can make things worse. In that context, ‘politics is the mind-killer’ is the mind-killer. At least, it’s a needlessly mind-killing way of warning people about epistemic hazards.
‘Hard Mode’ lets you speak as the Humble Aspirant rather than the Aloof Superior. Strive to convey: ‘I’m worried I’m too low-level to participate in this discussion; could you have it somewhere else?’ Or: ‘Could we talk about something closer to Easy Mode, so we can level up together?’ More generally: If you’re worried that what you talk about will impact group epistemology, you should be even more worried about how you talk about it.
The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and remain within the resources of the AI's controller - energy, money, social power, etc. The best plans are the ones that use these resources in the most effective and economic way to achieve the goal.
And the AI's controller has one special type of resource, uniquely effective at what it does. Namely, the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is "turn me into an agent AI with that goal." The smarter the AI, the better this plan is. Of course, the plan need not read literally like that - it could simply be a complicated plan that, as a side-effect, turns the tool AI into an agent. Or copy the AI's software into an agent design. Or it might just arrange things so that we always end up following the tool AI's advice and consult it often, which is an indirect way of making it into an agent. Depending on how we've programmed the tool AI's preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.
In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So without a hint of agency, it's motivated to make us make it into an agent.
I play Starcraft:BW sometimes with my brothers. One of my brothers is much better than the rest of us combined. This story is typical: In a free-for-all, the rest of us gang up on him, knowing that he is the biggest threat. By sheer numbers we beat him down, but foolishly allow him to escape with a few workers. Despite suffering this massive setback, he rebuilds in hiding and ends up winning due to his ability to tirelessly expand his economy while simultaneously fending off our armies.
This story reminds me of some AI-takeover scenarios. I wonder: Could we make a video game that illustrates many of the core ideas surrounding AGI? For example, a game where the following concepts were (more or less) accurately represented as mechanics:
--AI arms race
--AI friendliness and unfriendliness
--rogue AI and AI takeover
--AI being awesome at epistemology and science and having amazing predictive power
--Interesting conversations between AI and their captors about whether or not they should be unboxed.
I thought about this for a while, and I think it would be feasible and (for some people at least) fun. I don't foresee myself being able to actually make this game any time soon, but I like thinking about it anyway. Here is a sketch of the main mechanics I envision:
(1) The most crucial part of this design is the "Modeling AI Predictive Power" section. This is how we represent the AI's massive advantage in predictive power. However, this comes at the cost of tripling the amount of time the game takes to play. Can you think of a better way to do this?
(2) I'd like AIs to be able to "predict" the messages that players send to each other also. However, it would be too much to ask players to make "Decoy Message Logs." Is it worth dropping the decoy idea (and making the predictions 100% accurate) to implement this?
(3) Any complaints about the skeleton sketched above? Perhaps something is wildly unrealistic, and should be replaced by a different mechanic that more accurately captures the dynamics of AGI?
For what it's worth, I spent a reasonable amount of time thinking about the mechanics I used, and I think I could justify their realism. I expect to have made quite a few mistakes, but I wasn't just making stuff up on the fly.
(4) Any other ideas for mechanics to add to the game?
This is the first of two (or more) posts that look at the domain of weather and climate forecasting and what we can learn from the history and current state of these fields for forecasting as a domain. It may not be of general interest to the LessWrong community, but I hope that it's of interest to people here who have some interest either in weather-related material or in forecasting in general.
The science of weather forecasting has come a long way over the past century. Since people started measuring and recording the weather (temperature, precipitation, etc.), two simple algorithms for weather prediction have existed (see also this):
- Persistence: Assume that the weather tomorrow will be the same as the weather today.
- Climatology: Assume that the weather on a given day of the year will be the same as the average of the weather on that same day in the last few years (we might also use averages for nearby days if we don't have enough years of data).
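Both benchmarks are simple enough to write down in a few lines. The sketch below is only illustrative: the function names, the `history` layout (year, then day of year, then a temperature reading) and all the numbers are assumptions made for this example, not data from any real station.

```python
from statistics import mean

def persistence_forecast(today_value):
    """Persistence: tomorrow looks like today."""
    return today_value

def climatology_forecast(history, day_of_year, window=0):
    """Climatology: average the readings for this day over past years.

    `history` maps year -> {day_of_year: value}. A positive `window` also
    pulls in nearby days, which helps when few years of data exist.
    Wrap-around at the end of the year is ignored to keep the sketch short.
    """
    samples = [
        value
        for year_records in history.values()
        for day, value in year_records.items()
        if abs(day - day_of_year) <= window
    ]
    return mean(samples)

# Hypothetical temperatures in degrees Celsius for days 99-101 of three years.
history = {
    2011: {99: 14.0, 100: 15.0, 101: 16.0},
    2012: {99: 16.0, 100: 17.0, 101: 18.0},
    2013: {99: 12.0, 100: 13.0, 101: 14.0},
}

tomorrow_by_persistence = persistence_forecast(19.5)
day_100_by_climatology = climatology_forecast(history, 100)
day_100_with_neighbours = climatology_forecast(history, 100, window=1)
print(tomorrow_by_persistence, day_100_by_climatology, day_100_with_neighbours)
```

Neither function uses any physics: persistence needs one reading and climatology needs only aggregate records, which is why they serve as the baselines any more sophisticated method has to beat.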
Until the end of the 19th century, there was no weather prediction algorithm that did consistently better than both persistence and climatology. Between persistence and climatology, climatology won out over medium to long time horizons (a week or more), whereas persistence won out in some kinds of places over short horizons (1-2 days), though even there, climatology sometimes does better (see more here). Both methods have very limited utility when it comes to predicting and preparing for rare extreme weather events, such as blizzards, hurricanes, cyclones, polar winds, or heavy rainfall.
This blog post discusses the evolution and progress of weather forecasting algorithms that significantly improve over the benchmarks of persistence and climatology, and the implications both for the future of weather forecasting and for our understanding of forecasting as a domain.
Sources for further reading (uncited material in my post is usually drawn from one of these): Wikipedia's page on the history of numerical weather prediction, The Signal and the Noise by Nate Silver, and The origins of computer weather prediction and climate modeling by Peter Lynch.
The three challenges: theory, measurement, and computation
The problem facing any method that tries to do better than persistence and climatology is that whereas persistence and climatology can rely on existing aggregate records, any more sophisticated prediction algorithm relies on measuring, theorizing about, and computing with a much larger number of other observed quantities. There are three aspects to the challenge:
- The theoretical challenge or model selection challenge: The goal is to write a system of equations describing how the climate system evolves from certain initial measurements. In the weather prediction context, this was the first of the challenges to be nailed: the basic equations of the atmosphere come from physics, which has been well understood for over a century now.
- The measurement challenge: A large number of measurements at different points in the area and at regular intervals of time need to be taken to initialize the data appropriately for the weather simulation. The measurement challenge was largely resolved early on: it was easy to set up stations for measuring temperature, humidity, and other indicators around the world, and communications technology enabled the data to be quickly relayed to a central processing station. Of course, many improvements have occurred over the 20th century: we can now make measurements using satellites, as well as directly measure weather indicators at different altitudes. But measurement challenges were not critical in getting weather prediction started.
- The computational challenge: This appears to have been the most difficult of the challenges and the critical constraint in making real-time weather predictions. The computations needed for making a forecast that could beat persistence and climatology over any time horizon were just too numerous for humans to carry out in real time. In fact, the ability to make accurate weather predictions was one of the motivations for the development of improved computing machinery.
The basic theory of weather forecasting
The basic idea of weather forecasting is to use the equations of physics to model the evolution of the atmospheric system. In order to do this, we need to know how the system looks at a given point. In principle, that information, combined with the equations, should allow us to compute the weather indefinitely into the future. In practice, the equations we create don't have any closed-form solutions, the data is only partial (we don't have initial data on the whole world) and even small variations at a given time can balloon to bigger changes (this is called the butterfly effect; more on this later in the post).
Instead of trying to solve the system analytically, we discretize the problem (we use discrete spatial locations and discrete time steps) and then solve the problem numerically (this is a bit like using a difference quotient instead of a derivative when computing a rate of change). There are four dimensions to the discretization (three spatial and one temporal), and how fine we make the grid in each dimension is called the resolution in that dimension.
- Spatial dimensions and spatial resolution: The region over which we are interested in forecasting the weather is converted to a grid. We have freedom in how fine we make the grid. In general, finer grids make for more precise and accurate weather predictions, but require more computational resources. The grid has two horizontal dimensions and one height dimension, hence a total of three dimensions. Thus, making the grid x times as fine (i.e., making the spatial resolution x times as fine) means increasing the number of regions to x^3 times the current value.
- Time dimension and temporal resolution: We also choose a time step. In general, smaller time steps make for more precise and accurate weather predictions, but require more computational resources (because the number of time steps needed to traverse a particular length of time is more). If we divide the time step by x, we multiply the time and space storage needs by a factor of x.
Thus, roughly, becoming finer by a factor of x in all three spatial dimensions and the time dimension requires upping computational resources to about x^4 times the original value. So, doubling the resolution in all four dimensions requires improving computational power to 16 times the original value, which means four doublings. Combining this with natural improvements in computing, such as Moore's law, we expect that we should be able to make our grid twice as fine in all dimensions (i.e., double the spatial and temporal resolution) every 8 years or so.
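The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and the two-year doubling period is my assumption (it is the figure implied by "four doublings" taking "8 years or so"), not a measured property of any forecasting system.

```python
import math


def cost_multiplier(refinement, spatial_dims=3, temporal_dims=1):
    """Factor by which computation grows when every dimension of the grid
    is made `refinement` times as fine."""
    return refinement ** (spatial_dims + temporal_dims)


def years_to_afford(refinement, doubling_period_years=2.0):
    """Years of hardware improvement needed to pay for the refinement,
    ASSUMING computing power doubles every `doubling_period_years`."""
    doublings = math.log2(cost_multiplier(refinement))
    return doublings * doubling_period_years


if __name__ == "__main__":
    print(cost_multiplier(2))   # doubling all four dimensions
    print(years_to_afford(2))   # under the assumed two-year doubling period
```

Changing `doubling_period_years` shows how sensitive the "every 8 years" estimate is to the assumed pace of hardware improvement.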
How much precision and accuracy does high resolution buy?
My intuitive prior would be that, for sufficiently short time horizons where we don't expect chaos to play a dominant role, we'd expect the relationship suggested by the logarithmic timeline. Does this agree with the literature?
I don't feel like I have a clear grasp of the literature, so the summary below is somewhat ad hoc. I hope it still helps with elucidating the relationship.
- Higher resolution means greater precision of forecasts holding the time horizon constant (assuming that it's a time horizon over which we could reasonably make forecasts). This makes sense: higher temporal resolution allows us to approximate the (almost) continuous evolution of the atmospheric system better, and higher spatial resolution allows us to work with a better initialization as well as approximate the continuous evolution better. For instance, a page on the website of weather forecasting service meteoblue has the title Resolution means precision.
- The relation between resolution and accuracy is less clear. Although, up to a point, higher resolution enables more accurate forecasts, the relation does not continue at ever-higher levels of precision, for a variety of reasons (including the chaos problem discussed next). For more, see Climate prediction: a limit to adaptation by Suraje Dessai, Mike Hulme, Robert Lempert, and Roger Pielke, Jr.
- The type of resolution that matters more can depend on the type of phenomenon that we are predicting. In some cases, temporal resolution is more important than spatial resolution. In some cases, particularly phenomena relating to interactions between the different layers of the atmosphere, vertical resolution matters more than horizontal resolution, whereas in other cases, horizontal resolution matters more. For instance, the paper Impacts of Numerical Weather Prediction Spatial Resolution On An Atmospheric Decision Aid For Directed Energy Weapon Systems finds that vertical resolution matters more than horizontal resolution for a particular application.
The problem of chaos and the butterfly effect
The main problem with weather prediction is hypersensitivity to initial conditions: even small differences in initial values can have huge effects over longer timescales (this is sometimes called the butterfly effect). This effect could occur at many different levels.
- Measurements may not be sufficiently precise or detailed (temperature and precipitation are measured only at a few weather stations rather than everywhere). Some of the measurements may be somewhat flawed as well. Apart from the usual measurement error, the choice of weather stations may introduce bias: weather stations have often historically been located close to airports and to other hubs of activity, where temperatures may be higher due to the heat generated by the processes nearby (see also the page on urban heat island).
- The computer programs that do numerical weather simulation don't store data to infinite precision. Choices of how to round off can profoundly affect weather predictions.
- There may be actions by humans, animals, or human institutions that aren't modeled in the atmospheric system, but perturb it sufficiently to affect weather predictions. For instance, if lots of people burst firecrackers on a day, that might affect local temperatures and air composition in a small manner that might have larger effects over the coming days.
Due to these problems, modern algorithms for numerical weather prediction run simulations with many slight variations of the given initial conditions, using a probabilistic model to assign probabilities to the different scenarios considered. Note that here we are making slight variations to the data and running the model on these variations to generate a collection of scenarios weighted by probability.
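To make the idea concrete, here is a toy sketch of running many slightly perturbed copies of the same initial condition. It does not use a weather model at all: the logistic map with parameter 4 stands in as a simple chaotic system, the perturbation size and ensemble size are arbitrary choices of mine, and every ensemble member is given equal weight (real systems use more careful probabilistic weighting).

```python
import random


def logistic_map_step(x):
    """One step of a toy chaotic system (NOT an atmospheric model)."""
    return 4.0 * x * (1.0 - x)


def run_ensemble(x0=0.3, members=50, steps=50, noise=1e-6, seed=0):
    """Evolve `members` copies of the system from slightly perturbed
    versions of x0. Returns the final states and the ensemble spread
    (max minus min) at every step, starting with step 0."""
    rng = random.Random(seed)
    states = [x0 + rng.uniform(-noise, noise) for _ in range(members)]
    spreads = [max(states) - min(states)]
    for _ in range(steps):
        states = [logistic_map_step(x) for x in states]
        spreads.append(max(states) - min(states))
    return states, spreads


def event_probability(states, threshold=0.5):
    """Fraction of equally weighted members whose state exceeds `threshold`."""
    return sum(1 for x in states if x > threshold) / len(states)


if __name__ == "__main__":
    final_states, spreads = run_ensemble()
    print("initial spread:", spreads[0])
    print("final spread:", spreads[-1])
    print("P(state > 0.5):", event_probability(final_states))
```

The initial spread is tiny, but after a few dozen steps the members disagree wildly, so the only sensible output is a probability rather than a single value.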
As the time horizon for forecasting increases (we get to one week ahead or beyond) our understanding of how the equilibrating influences of the weather play out is more fuzzy. For such timescales, we use ensemble forecasting with a collection of different models. The models may use different data and give attention to different aspects of the data, based on slightly different underlying theories of how the different weather phenomena interact. As before, we generate probabilistic weather predictions.
Can (and should) weather forecasting be fully automated?
Nate Silver observed in his book The Signal and the Noise that the proportional improvement that human input made to the computer models has stayed constant at about 25% for precipitation forecasts and 10% for temperature forecasts, even as the computer models, and therefore the final forecast, have improved considerably over the last few decades. The sources cited by Silver don't seem to be online, but I found another paper with the data that Silver uses. Silver says that humans' main input is in the following respects:
- Human vision (literally) is powerful in terms of identifying patterns and getting a broad sense of what is happening. Computers still have trouble seeing patterns. This is related to the fact that humans in general have an easier time with CAPTCHAs than computers, although that might be changing as machine learning improves.
- Humans have better intuition at identifying false runaway predictions. For instance, a computer might think that a particular weather phenomenon will snowball, whereas humans are likely to identify equilibrating influences that will prevent the snowballing. Humans are also better at reasoning about what is reasonable to expect based on climatological history.
On a related note, Charles Doswell has argued that it would be dangerous to try to fully automate weather forecasting, because direct involvement with the weather forecasting process is crucial for meteorologists to get a better sense of how to make improvements to their models.
Machine learning in weather prediction?
For most of its history, weather prediction has relied on models grounded in our understanding of physics, with some aspects of the models being tweaked based on experience running the models. This differs somewhat from the spirit of machine learning algorithms. Supervised machine learning algorithms take a bunch of input data and output data and try to learn how to predict the outputs from the inputs, with considerable agnosticism about the underlying theoretical mechanisms. In the context of weather prediction, a machine learning algorithm might view the current measured data as the input, and the measured data after a certain time interval (or some summary variable, such as a binary variable recording whether or not it rained) as the output to be predicted. The algorithm would then try to learn a relation from the inputs to the outputs.
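As a minimal sketch of what "learning a relation from the inputs to the outputs" could look like, the code below fits a logistic regression to synthetic data using plain gradient descent. Everything here is an illustrative assumption: the two features (`humidity`, `pressure_drop`), the made-up rule that generates the rain labels, and the training settings. It is not how any operational forecasting system works.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_days(n, seed=0):
    """Generate (features, label) pairs from a MADE-UP rule:
    it 'rains' when humidity + pressure_drop exceeds 1."""
    rng = random.Random(seed)
    days = []
    for _ in range(n):
        humidity = rng.random()
        pressure_drop = rng.random()
        rained = 1 if humidity + pressure_drop > 1.0 else 0
        days.append(((humidity, pressure_drop), rained))
    return days


def train(days, epochs=2000, lr=1.0):
    """Fit weights and bias by full-batch gradient descent on log loss."""
    w, b, n = [0.0, 0.0], 0.0, len(days)
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for x, y in days:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            gw[0] += err * x[0]
            gw[1] += err * x[1]
            gb += err
        w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
        b -= lr * gb / n
    return w, b


def accuracy(w, b, days):
    """Share of days where the predicted label matches the actual one."""
    hits = 0
    for x, y in days:
        predicted = 1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0
        hits += predicted == y
    return hits / len(days)


if __name__ == "__main__":
    w, b = train(make_synthetic_days(200, seed=0))
    print("weights:", w, "bias:", b)
    print("held-out accuracy:", accuracy(w, b, make_synthetic_days(200, seed=1)))
```

Note that the algorithm is never told the rule that generated the labels; it only sees input-output pairs, which is the agnosticism about mechanism described above.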
In recent years, machine learning ideas have started being integrated into weather forecasting. However, the core of weather prediction still relies on using theoretically grounded models. (Relevant links: Quora question on the use of machine learning algorithms in weather forecasting, Freakonomics post on a company that claims to use machine learning to predict the weather far ahead, Reddit post about that Freakonomics post).
My uninformed speculation is that machine learning algorithms would be most useful in substituting for the human input element to the model rather than the core of the numerical simulation. In particular, to the extent that machine learning algorithms can make progress on the problem of vision, they might be able to use their "vision" to better interpret the results of numerical weather prediction. Moreover, the machine learning algorithms would be particularly well-suited to using Bayesian priors (arising from knowledge of historical climate) to identify cases where the numerical models are producing false feedback loops and predicting things that seem unlikely to happen.
Prehistory: before weather simulation came to fruition: meeting the theoretical challenge
Here is a quick summary of the initial steps taken to realizing weather forecasting as a science. These steps concentrated on the theoretical challenge. The measurement and computational challenges would be tackled later.
- Cleveland Abbe made the basic observation that weather prediction was essentially a problem of the application of hydrodynamics and thermodynamics to the atmosphere. He detailed his observations in the 1901 paper The physical basis of long-range weather forecasts. But this was more an identification of the general reference class of models to use than a concrete model of how to predict the weather.
- Vilhelm Bjerknes, in 1904, set down a two-step plan for rational forecasting: a diagnostic step, where the initial state of the atmosphere is determined using observations, and a prognostic step, where the laws of motion are used to calculate the evolution of the system over time. He even identified most of the relevant equations needed to compute the evolution of the system, but he didn't try to prepare his ideas for actual practical use.
- Lewis Fry Richardson published in 1922 a detailed description of how to predict the weather, and applied his model to an attempted 6-hour forecast that took him 6 weeks to compute by hand. His forecast was off by a huge margin, but he was still convinced that the model was broadly correct, and with enough data and computing power, it could produce useful predictions.
It's interesting that scientists such as Richardson were so confident of their approach despite its failure to make useful predictions. The confidence arguably stemmed from the fact that the basic equations of physics that the model relied on were indubitably true. It's not surprising that Richardson's confidence wasn't widely shared. What's perhaps more surprising is that enough people were convinced by the approach that, when the world's first computers were made, weather prediction was viewed as a useful initial use of these computers. How they were able to figure out that this approach could bear fruit is related to questions I raised in my paradigm shifts in forecasting post.
Could Richardson have fixed the model and made correct predictions by hand? With the benefit of hindsight, it turns out that if he'd applied a standard procedure to tweak the original data he worked with, he would have been able to make decent predictions by hand. But this is easier seen in hindsight, when we have the benefit of being able to try out many different tweaks of the algorithm and compare their performance in real time (more on this point below).
The first successful computer-based numerical weather prediction
In the mid-1940s, John von Neumann, one of the key figures in modern computing, stumbled across weather prediction and identified it as an ideal problem suited to computers: it required a huge amount of calculation using clearly defined algorithms from measured initial data. An initiative supported by von Neumann led to the meteorologist Jule Charney getting interested in the problem. In 1950, a team led by Charney came up with a complete numerical algorithm for weather prediction, building on and addressing some computational issues in Richardson's original algorithm. This was then implemented on the ENIAC, the only computer available at the time. The simulation had a time ratio of 1: it took 24 hours to simulate 24 hours of weather. Charney called it a vindication of the vision of Lewis Fry Richardson.
Progress since then: the interplay of computational and theoretical
Since the 1950 ENIAC implementation of weather forecasting, weather forecasting has improved slowly and steadily. The bulk of the progress has been through access to faster computing power. Theoretical models have also improved. However, these improvements are not cleanly separable. The ability to run faster simulations computationally allows for quicker testing and comparison of different algorithms, and experimental adjustment to make them work faster and better. In this case, one-time access to better computational resources can lead to long-term improvements in the algorithms used.
Through a series of diagrams, this article will walk through key concepts in Nick Bostrom’s Superintelligence. The book is full of heavy content, and though well written, its scope and depth can make it difficult to grasp the concepts and mentally hold them together. The motivation behind making these diagrams is not to repeat an explanation of the content, but rather to present the content in such a way that the connections become clear. Thus, this article is best read and used as a supplement to Superintelligence.
Note: Superintelligence is now available in the UK. The hardcover is coming out in the US on September 3. The Kindle version is already available in the US as well as the UK.
Roadmap: there are two diagrams, both presented with an accompanying description. The two diagrams are combined into one mega-diagram at the end.
Figure 1: Pathways to Superintelligence
Figure 1 displays the five pathways toward superintelligence that Bostrom describes in chapter 2 and returns to in chapter 14 of the text. According to Bostrom, brain-computer interfaces are unlikely to yield superintelligence. Biological cognition, i.e., the enhancement of human intelligence, may yield a weak form of superintelligence on its own. Additionally, improvements to biological cognition could feed back into driving the progress of artificial intelligence or whole brain emulation. The arrows from networks and organizations likewise indicate technologies feeding back into AI and whole brain emulation development.
Artificial intelligence and whole brain emulation are two pathways that can lead to fully realized superintelligence. Note that neuromorphic is listed under artificial intelligence, but an arrow connects from whole brain emulation to neuromorphic. In chapter 14, Bostrom suggests that neuromorphic is a potential outcome of incomplete or improper whole brain emulation. Synthetic AI includes all the approaches to AI that are not neuromorphic; other terms that have been used are algorithmic or de novo AI.
There has been some talk of a lack of content being posted to Less Wrong, so I decided to start a series on various experiments that I've tried and what I've learned from them as I believe that experimentation is key to being a rationalist. My first few posts will be adapted from content I've written for /r/socialskills, but as Less Wrong has a broader scope I plan to post some original content too. I hope that this post will encourage other people to share detailed descriptions of the experiments that they have tried as I believe that this is much more valuable than a list of lessons posted outside of the context in which they were learned. If anyone has already posted any similar posts, then I would really appreciate any links.
I used to have a lot of trouble in conversation thinking of things to say. I wanted to be a more interesting person and I noticed that my brother uses his knowledge of a broad range of topics to engage people in conversations, so I wanted to do the same.
I was drawn quite quickly towards facts because of how quickly they can be read. If a piece of trivia takes 10 seconds to read, then you can read 360 in an hour. If only 5% are good, then that's still 18 usable facts per hour. Articles are longer, but have significantly higher chances of teaching you something. It seemed like you should be able to prevent ever running out of things to talk about with a reasonable investment of time. It didn't quite work out this way, but this was the idea.
Another motivation was that I have always valued intelligence and learning more information made me feel good about myself.
Today I learned: #1 recommended source
The straight dope: Many articles in the archive are quite interesting, but I unsubscribed because I found the more recent ones boring
Cracked: Not the most reliable source and can be a huge time sink, but occasionally there are articles there that will give you 6 or 7 interesting facts in one go
Dr Karl: Science blog
I read through the top 1000 links on Today I learned, the entire archive of the straight dope, maybe half of damn interesting and now I know, half of Karl and all the mythbusters results up to about a year or two ago. We are pretty much talking about months of solid reading.
You probably guessed it, but my return on investment wasn't actually that great. I tended to consume this trivia in ridiculously huge batches because by reading all this information I at least felt like I was doing something. If someone came up to me and asked me for a random piece of trivia - I actually don't have that much that I can pull out. It's actually much easier if someone asks about a specific topic, but there's still not that much I can access.
To test my knowledge I decided to pick the first three topics that came into my head and see how much random trivia I could remember about each. As you can see, the results were rather disappointing:
- Cats can survive falls from a higher number of floors better than falls from a lower number of floors because they have a low terminal velocity and more time to orient themselves to ensure they land on their feet
- House cats can run faster than Usain Bolt
- If you are attacked by a dog the best strategy is to shove your hand down its mouth and attack the neck with your other hand
- Dogs can be trained to drive cars (slowly)
- There is such a thing as the world's ugliest dog competition
- Cheese is poisonous to rats
- The existence of rat kings - rats who got their tails stuck together
Knowing these facts does occasionally help me by giving me something interesting to say when I wouldn't have otherwise had it, but quite often I want to quote one of these facts, but I can't quite remember the details. It's hard to quantify how much this helps me though. There have been a few times when I've been able to get someone interested in a conversation that they wouldn't have otherwise been interested in, but I can also go a dozen conversations without quoting any of these facts. No-one has ever gone "Wow, you know so many facts!". Another motivation I had was that being knowledgeable makes me feel good about myself. I don't believe that there was any significant impact in this regard either - I don't have a strong self-concept of myself as someone who is particularly knowledgeable about random facts. Overall this experiment was quite disappointing given the high time investment.
While the social benefits have been extremely minimal, learning all of these facts has expanded my world view.
- I had no idea how crazy nature was: most surprising fact I've learned is that Bluebottles are multiple organisms
- Some of the stuff that the CIA got up to is unbelievable - you'd almost think it came from a conspiracy theorist
- There are many things that you take for granted, but when you think about it, are actually amazing coincidences - moon and sun appearing around the same size
- You don't want to get on the wrong side of the law as it can be horribly unjust
- The government is pretty careless with nuclear weapons. If we can't trust the government to look after nukes, what can we trust them to look after?
While this technique worked poorly for me, there are many changes that I could have made that might have improved effectiveness.
- Lower batch sizes: when you read too many facts in one go you get tired and it all tends to blur together
- Notes: I started making notes of the most interesting facts I was finding using Evernote. I regularly add new facts, but only very occasionally go back and actually look them up. I was trying to review the new facts that I learned regularly, but I got busy and just fell out of the habit. Perhaps I could have a separate list for the most important facts I learn every week and this would be less effort?
- Rereading saved facts: I did a complete reread through my saved notes once. I still don't think that I have a very good recall - probably related to batch size!
- Spaced repetition: Many people claim that this makes memorisation easy
- Thoughtback: This is a lighter alternative to spaced repetition - it gives you notifications on your phone of random facts - about one per day
- Talking to other people: This is a very effective method for remembering facts. The vast majority of facts that I've shared with other people, I still remember. Perhaps I should create a list of facts that I want to remember and then pick one or two at a time to share with people. Once I've shared them a few times, I could move on to the next fact
- Blog posts: perhaps if I collected some of my related facts into blog posts, having to decide which to include and which to leave out might help me remember these facts better
- Pausing: I find that I am more likely to remember things if I pause and think that this is something that I want to remember. I was trying to build that habit, but I didn't succeed in this
- Other memory techniques: brains are better at remembering things if you process them. So if you want to remember the story where thieves stole a whole beach in one night, try to picture the beach and then the shock when some surfer turns up and all the sand is gone. Try to imagine what you'd need to pull that off.
I believe that if I had spread my reading out over a greater period of time, then the cost would have been justified. Part of this would have been improved retention and part of this would have been having a new interesting fact to use in conversation every week that I know I hadn't told anyone else before.
The social benefits are rather minimal, so it would be difficult to get them to match up with the time invested. I believe that with enough refinement, someone could improve their effectiveness to the stage where the benefits matched up with the effort invested, but broadening one's knowledge will always be the primary advantage gained.
I asked this question on Facebook here, and got some interesting answers, but I thought it would be interesting to ask LessWrong and get a larger range of opinions. I've modified the list of options somewhat.
What explains why some classification, prediction, and regression methods are common in academic social science, while others are common in machine learning and data science?
For instance, I've encountered probit models in some academic social science, but not in machine learning.
The main algorithms that I believe are common to academic social science and machine learning are the most standard regression algorithms: linear regression and logistic regression.
Possibilities that come to mind:
(0) My observation is wrong and/or the whole question is misguided.
(1) The focus in machine learning is on algorithms that can perform well on large data sets. Thus, for instance, probit models may be academically useful but don't scale up as well as logistic regression.
(2) Academic social scientists take time to catch up with new machine learning approaches. Of the methods mentioned above, random forests and support vector machines were introduced as recently as 1995. Neural networks are older but their practical implementation is about as recent. Moreover, the practical implementation of these algorithms in the standard statistical software and packages that academics rely on is even more recent. (This relates to point (4)).
(3) Academic social scientists are focused on publishing papers, where the goal is generally to determine whether a hypothesis is true. Therefore, they rely on approaches that have clear rules for hypothesis testing and for establishing statistical significance (see also this post of mine). Many of the new machine learning approaches don't have clearly defined statistical approaches for significance testing. Also, the strength of machine learning approaches is more exploratory than testing already formulated hypotheses (this relates to point (5)).
(4) Some of the new methods are complicated to code, and academic social scientists don't know enough mathematics, computer science, or statistics to cope with the methods (this may change if they're taught more about these methods in graduate school, but the relative newness of the methods is a factor here, relating to (2)).
(5) It's hard to interpret the results of fancy machine learning tools in a manner that yields social scientific insight. The results of a linear or logistic regression can be interpreted somewhat intuitively: the parameters (coefficients) associated with individual features describe the extent to which those features affect the output variable. Modulo issues of feature scaling, larger coefficients mean those features play a bigger role in determining the output. Pairwise and listwise R^2 values provide additional insight on how much signal and noise there is in individual features. But if you're looking at a neural network, it's quite hard to infer human-understandable rules from that. (The opposite direction is not too hard: it is possible to convert human-understandable rules to a decision tree and then to use a neural network to approximate that, and add appropriate fuzziness. But the neural networks we obtain as a result of machine learning optimization may be quite different from those that we can interpret as humans). To my knowledge, there haven't been attempts to reinterpret neural network results in human-understandable terms, though Sebastian Kwiatkowski's comment on my Facebook post points to an example where the results of naive Bayes and SVM classifiers for hotel reviews could be translated into human-understandable terms (namely, reviews that mentioned physical aspects of the hotel, such as "small bedroom", were more likely to be truthful than reviews that talked about the reasons for the visit or the company that sponsored the visit). But Kwiatkowski's comment also pointed to other instances where the machine's algorithms weren't human-interpretable.
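For readers who haven't met probit models (mentioned in possibility (1) above), the sketch below compares the two link functions side by side. Both map a linear score to a probability: logistic regression uses the logistic function, while probit uses the standard normal CDF. The rescaling constant 1.702 is a commonly quoted approximation rather than anything specific to this post, and the function names are my own.

```python
import math


def logistic_cdf(z):
    """Link function used by logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Link function used by probit models (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_gap(scale=1.702, lo=-6.0, hi=6.0, steps=1201):
    """Largest absolute difference between logistic_cdf(scale * z) and
    normal_cdf(z) over an evenly spaced grid of z values."""
    worst = 0.0
    for i in range(steps):
        z = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(logistic_cdf(scale * z) - normal_cdf(z)))
    return worst


if __name__ == "__main__":
    print("max gap after rescaling:", max_gap())
    print("max gap without rescaling:", max_gap(scale=1.0))
```

Once the score is rescaled, the two curves are nearly indistinguishable, which suggests that the choice between probit and logistic models may owe more to the conventions and tooling of each field than to differences in what the models can fit.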
What's your personal view on my main question, and on any related issues?
This post explores the question: how strongly should we defer to predictions and forecasts made by people with domain expertise? I'll assume that the domain expertise is legitimate, i.e., the people with domain expertise do have a lot of information in their minds that non-experts don't. The information is usually not secret, and non-experts can usually access it through books, journals, and the Internet. But experts have more information inside their head, and may understand it better. How big an advantage does this give them in forecasting?
Tetlock and expert political judgment
In an earlier post on historical evaluations of forecasting, I discussed Philip E. Tetlock's findings on expert political judgment and forecasting skill, and summarized his own article for Cato Unbound co-authored with Dan Gardner that in turn summarized the themes of the book:
- The average expert’s forecasts were revealed to be only slightly more accurate than random guessing—or, to put more harshly, only a bit better than the proverbial dart-throwing chimpanzee. And the average expert performed slightly worse than a still more mindless competition: simple extrapolation algorithms that automatically predicted more of the same.
- The experts could be divided roughly into two overlapping yet statistically distinguishable groups. One group (the hedgehogs) would actually have been beaten rather soundly even by the chimp, not to mention the more formidable extrapolation algorithm. The other (the foxes) would have beaten the chimp and sometimes even the extrapolation algorithm, although not by a wide margin.
- The hedgehogs tended to use one analytical tool in many different domains; they preferred keeping their analysis simple and elegant by minimizing “distractions.” These experts zeroed in on only essential information, and they were unusually confident—they were far more likely to say something is “certain” or “impossible.” In explaining their forecasts, they often built up a lot of intellectual momentum in favor of their preferred conclusions. For instance, they were more likely to say “moreover” than “however.”
- The foxes used a wide assortment of analytical tools, sought out information from diverse sources, were comfortable with complexity and uncertainty, and were much less sure of themselves—they tended to talk in terms of possibilities and probabilities and were often happy to say “maybe.” In explaining their forecasts, they frequently shifted intellectual gears, sprinkling their speech with transition markers such as “although,” “but,” and “however.”
- It's unclear whether the performance of the best forecasters is the best that is in principle possible.
- This widespread lack of curiosity—lack of interest in thinking about how we think about possible futures—is a phenomenon worthy of investigation in its own right.
Tetlock has since started The Good Judgment Project (website, Wikipedia), a political forecasting competition where anybody can participate, and with a reputation for doing a much better job at prediction than anything else around. Participants are given a set of questions and can basically collect freely available online information (in some rounds, participants were given additional access to some proprietary data). They then use that to make predictions. The aggregate predictions are quite good. For more information, visit the website or see the references in the Wikipedia article. In particular, this Economist article and this Business Insider article are worth reading. (I discussed the GJP and other approaches to global political forecasting in this post).
So at least in the case of politics, it seems that amateurs, armed with basic information plus the freedom to look around for more, can use "fox-like" approaches and do a better job of forecasting than political scientists. Note that experts still do better than ignorant non-experts who are denied access to information. But once you have basic knowledge and are equipped to hunt more down, the constraining factor does not seem to be expertise, but rather, the approach you use (fox-like versus hedgehog-like). This should not be taken as a claim that expertise is irrelevant or unnecessary to forecasting. Experts play an important role in expanding the scope of knowledge and methodology that people can draw on to make their predictions. But the experts themselves, as people, do not have a unique advantage when it comes to forecasting.
Tetlock's research focused on politics. But the claim that the fox-hedgehog distinction turns out to be a better predictor of forecasting performance than the level of expertise is a general one. How true is this claim in domains other than politics? Domains such as climate science, economic growth, computing technology, or the arrival of artificial general intelligence?
Armstrong and Green again
J. Scott Armstrong is a leading figure in the forecasting community. Along with Kesten C. Green, he penned a critique of the forecasting exercises in climate science in 2007, with special focus on the IPCC reports. I discussed the critique at length in my post on the insularity critique of climate science. Here, I quote a part from the introduction of the critique that better explains the general prior that Armstrong and Green claim to be bringing to the table when they begin their evaluation. Of the points they make at the beginning, two bear directly on the deference we should give to expert judgment and expert consensus:
- Unaided judgmental forecasts by experts have no value: This applies whether the opinions are expressed in words, spreadsheets, or mathematical models. It applies regardless of how much scientific evidence is possessed by the experts. Among the reasons for this are:
a) Complexity: People cannot assess complex relationships through unaided observations.
b) Coincidence: People confuse correlation with causation.
c) Feedback: People making judgmental predictions typically do not receive unambiguous feedback they can use to improve their forecasting.
d) Bias: People have difficulty in obtaining or using evidence that contradicts their initial beliefs. This problem is especially serious for people who view themselves as experts.
- Agreement among experts is only weakly related to accuracy: This is especially true when the experts communicate with one another and when they work together to solve problems, as is the case with the IPCC process.
Armstrong and Green later elaborate on these claims, referencing Tetlock's work. (Note that I have removed the parts of the section that involve direct discussion of climate-related forecasts, since the focus here is on the general question of how much deference to show to expert consensus).
Many public policy decisions are based on forecasts by experts. Research on persuasion has shown that people have substantial faith in the value of such forecasts. Faith increases when experts agree with one another. Our concern here is with what we refer to as unaided expert judgments. In such cases, experts may have access to empirical studies and other information, but they use their knowledge to make predictions without the aid of well-established forecasting principles. Thus, they could simply use the information to come up with judgmental forecasts. Alternatively, they could translate their beliefs into mathematical statements (or models) and use those to make forecasts.
Although they may seem convincing at the time, expert forecasts can make for humorous reading in retrospect. Cerf and Navasky’s (1998) book contains 310 pages of examples, such as Fermi Award-winning scientist John von Neumann’s 1956 prediction that “A few decades hence, energy may be free”. [...] The second author’s review of empirical research on this problem led him to develop the “Seer-sucker theory,” which can be stated as “No matter how much evidence exists that seers do not exist, seers will find suckers” (Armstrong 1980). The amount of expertise does not matter beyond a basic minimum level. There are exceptions to the Seer-sucker Theory: When experts get substantial well-summarized feedback about the accuracy of their forecasts and about the reasons why their forecasts were or were not accurate, they can improve their forecasting. This situation applies for short-term (up to five day) weather forecasts, but we are not aware of any such regime for long-term global climate forecasting. Even if there were such a regime, the feedback would trickle in over many years before it became useful for improving forecasting.
Research since 1980 has provided much more evidence that expert forecasts are of no value. In particular, Tetlock (2005) recruited 284 people whose professions included, “commenting or offering advice on political and economic trends.” He asked them to forecast the probability that various situations would or would not occur, picking areas (geographic and substantive) within and outside their areas of expertise. By 2003, he had accumulated over 82,000 forecasts. The experts barely if at all outperformed non-experts and neither group did well against simple rules. Comparative empirical studies have routinely concluded that judgmental forecasting by experts is the least accurate of the methods available to make forecasts. For example, Ascher (1978, p. 200), in his analysis of long-term forecasts of electricity consumption found that was the case.
Note that the claims that Armstrong and Green make are in relation to unaided expert judgment, i.e., expert judgment that is not aided by some form of assistance or feedback that promotes improved forecasting. (One can argue that expert judgment in climate science is not unaided, i.e., that the critique is mis-applied to climate science, but whether that is the case is not the focus of my post). While Tetlock's suggestion is to be more fox-like, Armstrong and Green recommend the use of their own forecasting principles, as encoded in their full list of principles and described on their website.
A conflict of intuitions, and an attempt to resolve it
I have two conflicting intuitions here. I like to use the majority view among experts as a reasonable Bayesian prior to start with, that I might then modify based on further study. The relevant question here is who the experts are. Do I defer to the views of domain experts, who may know little about the challenges of forecasting, or do I defer to the views of forecasting experts, who may know little of the domain but argue that domain experts who are not following good forecasting principles do not have any advantage over non-experts?
I think the following heuristics are reasonable starting points:
- In cases where we have a historical track record of forecasts, we can use that to evaluate the experts and non-experts. For instance, I reviewed the track record of survey-based macroeconomic forecasts, thanks to a wealth of recorded data on macroeconomic forecasts by economists over the last few decades. (Unfortunately, these surveys did not include corresponding data on layperson opinion).
- The faster the feedback from making a forecast to knowing whether it's right, the more likely it is that experts would have learned how to make good forecasts.
- The more central forecasting is to the overall goals of the domain, the more likely people are to get it right. For instance, forecasting is a key part of weather and climate science. But forecasting progress on mathematical problems has a negligible relation with doing mathematical research.
- Ceteris paribus, if experts are clearly recording their forecasts and the reasons behind them, and systematically evaluating the performance on past forecasts, that should be taken as (weak) evidence in favor of the experts' views being taken more seriously (even if we don't have enough of a historical track record to properly calibrate forecast accuracy). However, if they simply make forecasts but then fail to review their past history of forecasts, this may be taken as being about as bad as not forecasting at all. And in cases that the forecasts were bold, failed miserably, and yet the errors were not acknowledged, this should be taken as being considerably worse than not forecasting at all.
- A weak inside view of the nature of domain expertise can give some idea of whether expertise should generally translate to better forecasting skill. For instance, even a very weak understanding of physics will tell us that physicists are no more likely than anyone else to correctly predict whether a coin toss will yield heads or tails, even though the fate of the coin is determined by physics. Similarly, with the exception of economists who specialize in the study of macroeconomic indicators, one wouldn't expect economists to be able to forecast macroeconomic indicators better than most moderately economically informed people.
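To make the first heuristic concrete, here is a minimal sketch of how a recorded track record could be scored. It uses the Brier score (mean squared error between stated probabilities and what actually happened), which is one common scoring rule and my choice for illustration rather than anything the sources above prescribe. The outcomes and the three forecasters' numbers are entirely hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is perfect, and always answering 0.5 scores 0.25.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equally long, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: 1 means the event happened, 0 means it did not.
outcomes = [1, 0, 0, 1, 0]

# Hypothetical forecasters, labeled after the two styles discussed above.
confident_forecaster = [1.0, 1.0, 0.0, 0.0, 1.0]  # speaks only in certainties
hedged_forecaster = [0.7, 0.3, 0.2, 0.6, 0.4]     # speaks in probabilities
coin_flip_baseline = [0.5] * len(outcomes)        # the "dart-throwing chimp"

for name, forecasts in [
    ("confident", confident_forecaster),
    ("hedged", hedged_forecaster),
    ("baseline", coin_flip_baseline),
]:
    print(f"{name}: {brier_score(forecasts, outcomes):.3f}")
```

With these made-up numbers the confident forecaster scores worse than the coin-flip baseline, because a confident miss is penalized heavily. That only illustrates how the scoring works; it is not evidence about any real group of experts.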
My first thought was that the more politicized a field, the less reliable any forecasts coming out of it. I think there are obvious reasons for that view, but there are also countervailing considerations.
The main claimed danger of politicization is groupthink and lack of openness to evidence. It could even lead to suppression, misrepresentation, or fabrication of evidence. Quite often, however, we see these qualities in highly non-political fields. People believe that certain answers are the right ones. Their political identity or ego is not attached to it. They just have high confidence that that answer is correct, and when the evidence they have does not match up, they think there is a problem with the evidence. Of course, if somebody does start challenging the mainstream view, and the issue is not quickly resolved either way, it can become politicized, with competing camps of people who hold the mainstream view and people who side with the challengers. Note, however, that the politicization has arguably reduced the aggregate amount of groupthink in the field. Now that there are two competing camps rather than one received wisdom, new people can examine evidence and better decide which camp is more on the side of truth. People in both camps, now that they are competing, may try to offer better evidence that could convince the undecideds or skeptics. So "politicization" might well improve the epistemic situation (I don't doubt that the opposite happens quite often). Examples of such politicization might be the replacement of geocentrism by heliocentrism, the replacement of creationism by evolution, and the replacement of Newtonian mechanics by relativity and/or quantum mechanics. In the first two cases, religious authorities pushed against the new idea, even though the old idea had not been a "politicized" tenet before the competing claims came along. In the case of Newtonian and quantum mechanics, the debate seems to have been largely intra-science, but quantum mechanics had its detractors, including Einstein, famous for the "God does not play dice" quip. (This post on Slate Star Codex is somewhat related).
The above considerations aren't specific to forecasting, and they apply even for assertions that fall squarely within the domain of expertise and require no forecasting skill per se. The extent to which they apply to forecasting problems is unclear. It's unclear whether most domains have any significant groupthink in favor of particular forecasts. In fact, in most domains, forecasts aren't really made or publicly recorded at all. So concerns of groupthink in a non-politicized scenario may not apply to forecasting. Perhaps the problem is the opposite: forecasts are so unimportant in many domains that the forecasts offered by experts are almost completely random and hardly informed in a systematic way by their expert knowledge. Even in such situations, politicization can be helpful, in so far as it makes the issue more salient and might prompt individuals to give more attention to trying to figure out which side is right.
The case of forecasting AI progress
I'm still looking at the case of forecasting AI progress, but for now, I'd like to point people to Luke Muehlhauser's excellent blog post from May 2013 discussing the difficulty with forecasting AI progress. Interestingly, he makes many points similar to those I make here. (Note: Although I had read the post around the time it was published, I hadn't read it recently until I finished drafting the rest of my current post. Nonetheless, my views can't be considered totally independent of Luke's because we've discussed my forecasting contract work for MIRI).
Should we expect experts to be good at predicting AI, anyway? As Armstrong & Sotala (2012) point out, decades of research on expert performance suggest that predicting the first creation of AI is precisely the kind of task on which we should expect experts to show poor performance — e.g. because feedback is unavailable and the input stimuli are dynamic rather than static. Muehlhauser & Salamon (2013) add, “If you have a gut feeling about when AI will be created, it is probably wrong.”
On the other hand, Tetlock (2005) points out that, at least in his large longitudinal database of pundit’s predictions about politics, simple trend extrapolation is tough to beat. Consider one example from the field of AI: when David Levy asked 1989 World Computer Chess Championship participants when a chess program would defeat the human World Champion, their estimates tended to be inaccurately pessimistic, despite the fact that computer chess had shown regular and predictable progress for two decades by that time. Those who forecasted this event with naive trend extrapolation (e.g. Kurzweil 1990) got almost precisely the correct answer (1997).
Looking for thoughts
I'm particularly interested in thoughts from people on the following fronts:
- What are some indicators you use to determine the reliability of forecasts by subject matter experts?
- How do you resolve the conflict of intuitions between deferring to the views of domain experts and deferring to the conclusion that forecasters have drawn about the lack of utility of domain experts' forecasts?
- In particular, what do you think of the way that "politicization" affects the reliability of forecasts?
- Also, how much value do you assign to agreement between experts when judging how much trust to place in expert forecasts?
- Comments that elaborate on these questions or this general topic within the context of a specific domain or domains would also be welcome.
One of many problems with the contemporary university system is that the same institutions that educate students also give them their degrees and grades. This obviously creates massive incentives for grade inflation and lowering of standards. Giving a thorough education requires hard work not only from students but also from the professors. In the absence of an independent body that tests that the students actually have learnt what they are supposed to have learnt, many professors spend as little time as possible on teaching, giving the students light workloads (something most of them of course happily accept). The faculty/student non-aggression pact is an apt term for this.
To see how absurd this system is, imagine that we had the same system for drivers' licenses: that the driving schools that train prospective drivers also tested them and issued their drivers' licenses. In such a system, people would most probably choose the most lenient schools, leading to a lowering of standards. For fear of such a lowering of standards, prospective drivers are in many countries (I would guess universally but do not know that for sure) tested by government bodies.
Presumably, the main reason for this is that governments really care about the lowering of drivers' standards. Ensuring that all drivers are appropriately educated is seen as very important. By contrast, the governments don't care that much about the lowering of academic standards. If they did, they would long ago have replaced the present grading/certification system with one where students are tested by independent bodies, rather than by the universities themselves.
This is all the more absurd given how much politicians in most countries talk about the importance of education. More often than not they talk about education, especially higher education, as a panacea for all ills. However, if we look at the politicians' actions, rather than at their words, it doesn't seem like they actually do think it's quite as important as they say to ensure that the population is well-educated.
Changing the system for certifying students is important not least in order to facilitate innovation in higher education. The present system discriminates in favour of traditional campus courses, which are both expensive and fail to teach the students as much as they should. I'm not saying that online courses, and other non-standard courses, are necessarily better or more cost-effective, but they should get the chance to prove that they are.
The system is of course hard to change, since there are lots of vested interests that don't want it to change. This is nicely illustrated by the reactions to a small baby-step towards the system that I'm envisioning that OECD is presently trying to take. Financial Times (which has a paywall, unfortunately) reports that OECD is attempting to introduce Pisa-style tests to compare students from higher education institutions around the world. Third year students would be tested on critical thinking, analytical reasoning, problem solving and written communication. There would also be discipline-specific trials for economics and engineering.
These attempts have, however, not progressed because of resistance from some universities and member countries. OECD says that the resistance often comes from "the most prestigious institutions, because they have very little to win...and a lot to lose". In contrast, "the greatest supporters are the ones that add the greatest value...many of the second-tier institutes are actually a lot better and they're very keen to get on a level playing field."
I figure that if OECD gets enough universities on board, they could start implementing the system without the obstructing top universities. They could also allow students from those universities to take the tests independently. If employers started taking these tests seriously, students would have every reason to take them even if their universities haven't joined. Slowly, these presumably more objective tests, or others like them, would become more important at the cost of the universities' inflated grades. People often try to change institutions or systems directly, but sometimes it is more efficient to build alternative systems, show that they're useful to the relevant actors, and start out-competing the dominant system (as discussed in these comments).
This is a somewhat long and rambling post. Apologies for the length. I hope the topic and content are interesting enough for you to forgive the meandering presentation.
I blogged about the scenario planning method a while back, where I linked to many past examples of scenario planning exercises. In this post, I take a closer look at scenario analysis in the context of understanding the possibilities for the unfolding of technological progress over the next 10-15 years. Here, I will discuss some predetermined elements and critical uncertainties, offer my own scenario analysis, and then discuss scenario analyses by others.
Remember: it is not the purpose of scenario analysis to identify a set of mutually exclusive and collectively exhaustive outcomes. In fact, usually, the real-world outcome has some features from two or more of the scenarios considered, with one scenario dominating somewhat. As I noted in my earlier post:
The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.
The predetermined element: the imminent demise of Moore's law "as we know it"
As Steven Schnaars noted in Megamistakes (discussed here), forecasts of technological progress in most domains have been overoptimistic, but in the domain of computing, they've been largely spot-on, mostly because the raw technology has improved quickly. The main reason has been Moore's law, and a couple other related laws, that have undergirded technological progress. But now, the party is coming to an end! The death of Moore's law (as we know it) is nigh, and there are significant implications for the future of computing.
Moore's law refers to many related claims about technological progress. Some forms of this technological progress have already stalled. Other forms are slated to stall in the near future, barring unexpected breakthroughs. These facts about Moore's law form the backdrop for all our scenario planning.
The critical uncertainty arises in how industry will respond to the prospect of Moore's law's death. Will there be a doubling down on continued improvement at the cutting edge? Will the battle focus on cost reductions? Or will we have neither cost reduction nor technological improvement? What sort of pressure will hardware stagnation put on software?
Now, onto a description of the different versions of Moore's law (slightly edited version of information from Wikipedia):
Density at minimum cost per transistor. This is the formulation given in Moore's 1965 paper. It is not just about the density of transistors that can be achieved, but about the density of transistors at which the cost per transistor is the lowest. As more transistors are put on a chip, the cost to make each transistor decreases, but the chance that the chip will not work due to a defect increases. In 1965, Moore examined the density of transistors at which cost is minimized, and observed that, as transistors were made smaller through advances in photolithography, this number would increase at "a rate of roughly a factor of two per year".
Dennard scaling. This suggests that power requirements are proportional to area (both voltage and current being proportional to length) for transistors. Combined with Moore's law, performance per watt would grow at roughly the same rate as transistor density, doubling every 1–2 years. According to Dennard scaling transistor dimensions are scaled by 30% (0.7x) every technology generation, thus reducing their area by 50%. This reduces the delay by 30% (0.7x) and therefore increases operating frequency by about 40% (1.4x). Finally, to keep electric field constant, voltage is reduced by 30%, reducing energy by 65% and power (at 1.4x frequency) by 50%. Therefore, in every technology generation transistor density doubles, circuit becomes 40% faster, while power consumption (with twice the number of transistors) stays the same.
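The percentages in the Dennard scaling description follow from a single 0.7x shrink factor, and the arithmetic can be checked with a short sketch. One assumption here is not stated in the text above: that capacitance also scales with length, so switching energy goes as capacitance times voltage squared. The function and variable names are mine, purely for illustration.

```python
def dennard_generation(scale=0.7):
    """Relative changes after one technology generation under ideal Dennard scaling.

    `scale` is the factor applied to linear transistor dimensions (0.7 = a 30% shrink).
    All returned values are ratios relative to the previous generation.
    """
    area = scale ** 2                # both dimensions shrink, so area is about 0.49x
    density = 1 / area               # about 2x as many transistors per unit area
    frequency = 1 / scale            # delay falls to 0.7x, so frequency is about 1.43x
    voltage = scale                  # reduced to keep the electric field constant
    capacitance = scale              # ASSUMPTION: capacitance scales with length
    energy = capacitance * voltage ** 2        # switching energy, about 0.34x
    power_per_transistor = energy * frequency  # about 0.49x
    power_per_area = power_per_transistor * density  # about 1.0x: unchanged
    return {
        "area": area,
        "density": density,
        "frequency": frequency,
        "energy": energy,
        "power_per_transistor": power_per_transistor,
        "power_per_area": power_per_area,
    }


for quantity, ratio in dennard_generation().items():
    print(f"{quantity}: {ratio:.2f}x")
```

The last ratio is the important one. Twice as many transistors, each drawing about half the power, leaves power per unit area unchanged, which is what allowed each generation to run faster without running hotter.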
So how is each of these faring?
- Transistors per integrated circuit: At least in principle, this can continue for a decade or so. The technological ideas exist to push transistor sizes down from the current values of 32 nm and 28 nm all the way down to 7 nm.
- Density at minimum cost per transistor. This is probably stopping around now. There is good reason to believe that, barring unexpected breakthroughs, the transistor size for which we have minimum cost per transistor shall not go down below 28 nm. There may still be niche applications that benefit from smaller transistor sizes, but there will be no overwhelming economic case to switch production to smaller transistor sizes (i.e., higher densities).
- Dennard scaling. This broke down around 2005-2007. So for approximately a decade, we've essentially seen continued miniaturization but without any corresponding improvement in processor speed or performance per watt. There have been continued overall improvements in energy efficiency of computing, but not through this mechanism. The absence of automatic speed improvements has led to increased focus on using greater parallelization (note that the miniaturization means more parallel processors can be packed in the same space, so Moore's law is helping in this other way). In particular, there has been an increased focus on multicore processors, though there may be limits to how far that can take us too.
Moore's law isn't the only law that is slated to end. Other similar laws, such as Kryder's law (about the cost of hard disk space) may also end in the near future. Koomey's law on energy efficiency may also stall, or might continue to hold but through very different mechanisms compared to the ones that have driven it so far.
Some discussions that do not use explicit scenario analysis
The quotes below are to give a general idea of what people seem to generally agree on, before we delve into different scenarios.
We have been hearing about the imminent demise of Moore's Law quite a lot recently. Most of these predictions have been targeting the 7nm node and 2020 as the end-point. But we need to recognize that, in fact, 28nm is actually the last node of Moore's Law.
Summarizing all of these factors, it is clear that -- for most SoCs -- 28nm will be the node for "minimum component costs" for the coming years. As an industry, we are facing a paradigm shift because dimensional scaling is no longer the path for cost scaling. New paths need to be explored such as SOI and monolithic 3D integration. It is therefore fitting that the traditional IEEE conference on SOI has expanded its scope and renamed itself as IEEE S3S: SOI technology, 3D Integration, and Subthreshold Microelectronics.
Computer scientist Moshe Vardi writes:
So the real question is not when precisely Moore's Law will die; one can say it is already a walking dead. The real question is what happens now, when the force that has been driving our field for the past 50 years is dissipating. In fact, Moore's Law has shaped much of the modern world we see around us. A recent McKinsey study ascribed "up to 40% of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements." Indeed, the demise of Moore's Law is one reason some economists predict a "great stagnation" (see my Sept. 2013 column).
"Predictions are difficult," it is said, "especially about the future." The only safe bet is that the next 20 years will be "interesting times." On one hand, since Moore's Law will not be handing us improved performance on a silver platter, we will have to deliver performance the hard way, by improved algorithms and systems. This is a great opportunity for computing research. On the other hand, it is possible that the industry would experience technological commoditization, leading to reduced profitability. Without healthy profit margins to plow into research and development, innovation may slow down and the transition to the post-CMOS world may be long, slow, and agonizing.
However things unfold, we must accept that Moore's Law is dying, and we are heading into an uncharted territory.
"I drive a 1964 car. I also have a 2010. There's not that much difference -- gross performance indicators like top speed and miles per gallon aren't that different. It's safer, and there are a lot of creature comforts in the interior," said Nvidia Chief Scientist Bill Dally. If Moore's Law fizzles, "We'll start to look like the auto industry."
Three critical uncertainties: technological progress, demand for computing power, and interaction with software
Uncertainty #1: Technological progress
Moore's law is dead, long live Moore's law! Even if Moore's law as originally stated is no longer valid, there are other plausible computing advances that would preserve the spirit of the law.
Minor modifications of current research (as described in EETimes) include:
- Improvements in 3D circuit design (Wikipedia), so that we can stack multiple layers of circuits one on top of the other, and therefore pack more computing power per unit volume.
- Improvements in understanding electronics at the nanoscale, in particular understanding subthreshold leakage (Wikipedia) and how to tackle it.
Then, there are possibilities for totally new computing paradigms. These have fairly low probability, and are highly unlikely to become commercially viable within 10-15 years. Each of these offers an advantage over currently available general-purpose computing only for special classes of problems, generally those that are parallelizable in particular ways (the type of parallelizability needed differs somewhat between the computing paradigms).
- Quantum computing (Wikipedia) (speeds up particular types of problems). Quantum computers already exist, but the current ones can tackle only a few qubits. Currently, the best known quantum computers in action are those maintained at the Quantum AI Lab (Wikipedia) run jointly by Google, NASA, and USRA. It is currently unclear how to manufacture quantum computers with a larger number of qubits. It's also unclear how the cost will scale in the number of qubits. If the cost scales exponentially in the number of qubits, then quantum computing will offer little advantage over classical computing. Ray Kurzweil explains this as follows:
A key question is: how difficult is it to add each additional qubit? The computational power of a quantum computer grows exponentially with each added qubit, but if it turns out that adding each additional qubit makes the engineering task exponentially more difficult, we will not be gaining any leverage. (That is, the computational power of a quantum computer will be only linearly proportional to the engineering difficulty.) In general, proposed methods for adding qubits make the resulting systems significantly more delicate and susceptible to premature decoherence.
Kurzweil, Ray (2005-09-22). The Singularity Is Near: When Humans Transcend Biology (Kindle Locations 2152-2155). Penguin Group. Kindle Edition.
- DNA computing (Wikipedia)
- Other types of molecular computing (Technology Review featured story from 2000, TR story from 2010)
- Spintronics (Wikipedia): The idea is to store information using the spin of the electron, a quantum property that is binary and can be toggled at zero energy cost (in principle). The main potential utility of spintronics is in data storage, but it could potentially help with computation as well.
- Optical computing aka photonic computing (Wikipedia): This uses beams of photons that store the relevant information that needs to be manipulated. Photons promise to offer higher bandwidth than electrons, the tool used in computing today (hence the name electronic computing).
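Kurzweil's leverage argument in the quantum computing item above can be made concrete with a little arithmetic. The sketch below is only an illustration under stated assumptions: computational power is taken to be 2**n for n qubits, and engineering difficulty is taken to be b**n for some hypothetical base b (the function name and the bases tried are mine, not Kurzweil's). Under those assumptions, power equals difficulty raised to the exponent log(2)/log(b), so b = 2 gives exactly the "only linearly proportional" case he describes.

```python
import math


def leverage_exponent(difficulty_base: float) -> float:
    """Return k such that power = difficulty**k.

    Assumes (hypothetically) power = 2**n and difficulty = difficulty_base**n
    for n qubits, so k = log(2) / log(difficulty_base).
    """
    return math.log(2) / math.log(difficulty_base)


# Illustrative bases only; nobody knows the real scaling of difficulty.
for base in (1.5, 2.0, 4.0):
    k = leverage_exponent(base)
    print(f"difficulty grows as {base}**n -> power = difficulty**{k:.2f}")
```

An exponent above 1 means each unit of engineering effort buys more than proportional computing power; an exponent of 1 or below means no leverage is gained.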
Uncertainty #2: Demand for computing
Even if computational advances are possible in principle, the absence of the right kind of demand can lead to a lack of financial incentive to pursue the relevant advances. I discussed the interaction between supply and demand in detail in this post.
As that post discussed, demand for computational power at the consumer end is probably reaching saturation. The main source of increased demand will now be companies that want to crunch huge amounts of data in order to more efficiently mine data for insight and offer faster search capabilities to their users. The extent to which such demand grows is uncertain. In principle, the demand is unlimited: the more data we collect (including "found data" that will expand considerably as the Internet of Things grows), the more computational power is needed to apply machine learning algorithms to the data. Since the complexity of many machine learning algorithms grows at least linearly (and in some cases quadratically or cubically) in the data, and the quantity of data itself will probably grow superlinearly, we do expect a robust increase in demand for computing.
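To get a rough feel for how data growth and algorithmic complexity compound, here is a minimal sketch. It assumes an idealized algorithm whose cost is exactly n**k for n data points; the tenfold data growth figure is made up purely for illustration.

```python
def compute_demand_growth(data_growth: float, complexity_exponent: int) -> float:
    """Factor by which computation grows when the data grows by `data_growth`
    and the algorithm's cost scales as n**complexity_exponent."""
    return data_growth ** complexity_exponent


# Hypothetical scenario: the data set becomes ten times larger.
for exponent, label in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    factor = compute_demand_growth(10.0, exponent)
    print(f"{label} algorithm: 10x data -> {factor:,.0f}x computation")
```

Even the linear case inherits whatever superlinear growth the data itself exhibits; the quadratic and cubic cases amplify it further.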
Uncertainty #3: Interaction with software
Much of the increased demand for computing, as noted above, does not arise so much from a need for raw computing power by consumers, but a need for more computing power to manipulate and glean insight from large data sets. While there has been some progress with algorithms for machine learning and data mining, the fields are probably far from mature. So an alternative to hardware improvements is improvements in the underlying algorithms. In addition to the algorithms themselves, execution details (such as better use of parallel processing capabilities and more efficient use of idle processor capacity) can also yield huge performance gains.
This might be a good time to note a common belief about software and why I think it's wrong. We often hear of software bloat, and some people subscribe to Wirth's law, the claim that software is getting slower more quickly than hardware is getting faster. I think that some software products have gotten feature-bloated over time, largely because there are incentives to keep putting out new editions that people are willing to pay money for, and Microsoft Word might be one case of such bloat. For the most part, though, software has been getting more efficient, partly by utilizing the new hardware better, but also partly due to underlying algorithmic improvements. This was one of the conclusions of Katja Grace's report on algorithmic progress (see also this link on progress on linear algebra and linear programming algorithms). A few software products get feature-bloated and as a result don't appear to improve over time as far as speed goes, but it's arguably the case that people's revealed preferences show that they are willing to put up with the lack of speed improvements as long as they're getting feature improvements.
Computing technology progress over the next 10-15 years: my three scenarios
- Slowdown to ordinary rates of growth of cutting-edge industrial productivity: For the last few decades, several dimensions of computing technology have experienced doublings over time periods ranging from six months to five years. With such fast doubling, we can expect price-performance thresholds for new categories of products to be reached every few years, with multiple new product categories a decade. Consider, for instance, desktops, then laptops, then smartphones, then tablets. If the doubling time reverts to the norm seen in other cutting-edge industrial sectors, namely 10-25 years, then we'd probably see the introduction of revolutionary new product categories only about once a generation. There are already some indications of a possible slowdown, and it remains to be seen whether we see a bounceback.
- Continued fast doubling: The other possibility is that the evidence for a slowdown is largely illusory, and computing technology will continue to experience doublings over timescales of less than five years. There would therefore be scope to introduce new product categories every few years.
- New computing paradigm with high promise, but requiring significant adjustment: This is an unlikely, but not impossible, scenario. Here, a new computing paradigm, such as quantum computing, reaches the realm of feasibility. However, the existing infrastructure of algorithms is ill-designed for quantum computing, and in fact, quantum computing endangers many security protocols while offering its own unbreakable ones. Making good use of this new paradigm requires a massive re-architecting of the world's computing infrastructure.
There are two broad features that are likely to be common to all scenarios:
- Growing importance of algorithms: Scenario (1): If technological progress in computing power stalls, then the pressure for improvements to the algorithms and software may increase. Scenario (2): If technological progress in computing power continues, that might only feed the hunger for bigger data. And as the size of data sets increases, asymptotic performance starts mattering more (the distinction between O(n) and O(n^2) matters more when n is large). In both cases, I expect more pressure on algorithms and software, but in different ways: in the case of stalling hardware progress, the focus will be more on improving the software and making minor changes to improve the constants, whereas in the case of rapid hardware progress, the focus will be more on finding algorithms that have better asymptotic (big-oh) performance. Scenario (3): In the case of paradigm shifts, the focus will be on algorithms that better exploit the new paradigm. In all cases, there will need to be some sort of shift toward new algorithms and new code that better exploits the new situation.
- Growing importance of parallelization: Although the specifics of how algorithms will become more important vary between the scenarios, one common feature is that algorithms that can better make parallel use of large numbers of machines will become more important. We have seen parallelization grow in importance over the last 15 years, even as the computing gains for individual processors through Moore's law seem to be plateauing, while data centers have proliferated in number. However, the full power of parallelization is far from tapped out. Again, parallelization matters for slightly different reasons in different cases. Scenario (1): A slowdown in technological progress would mean that gains in the amount of computation can largely be achieved by scaling up the number of machines. In other words, the usage of computing shifts further in a capital-intensive direction. Parallel computing is important for effective utilization of this capital (the computing resources). Scenario (2): Even in the face of rapid hardware progress, automatic big data generation will likely grow much faster than storage, communication, and bandwidth. This "big data" is too huge to store or even stream on a single machine, so parallel processing across huge clusters of machines becomes important. Scenario (3): Note also that almost all the new computing paradigms currently under consideration (including quantum computing) offer massive advantages for special types of parallelizable problems, so parallelization matters even in the case of a paradigm shift in computing.
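The contrast drawn above between improving constants and improving asymptotic performance can be illustrated with a toy cost model. This is only a sketch: the two cost functions and their constants (100 for the linear algorithm, 1 for the quadratic one) are made-up numbers, not measurements of any real software.

```python
def linear_cost(n: int, constant: float = 100.0) -> float:
    """Cost of a hypothetical O(n) algorithm with a large constant factor."""
    return constant * n


def quadratic_cost(n: int, constant: float = 1.0) -> float:
    """Cost of a hypothetical O(n^2) algorithm with a small constant factor."""
    return constant * n * n


def break_even(linear_constant: float, quadratic_constant: float) -> float:
    """Data size at which the two toy cost models are equal."""
    return linear_constant / quadratic_constant


# For small n the quadratic algorithm is cheaper; for large n it loses badly.
for n in (10, 100, 10_000, 1_000_000):
    ratio = quadratic_cost(n) / linear_cost(n)
    print(f"n = {n:>9,}: quadratic / linear cost ratio = {ratio:g}")

# Halving the quadratic algorithm's constant only shifts the break-even point.
print(break_even(100.0, 1.0), break_even(100.0, 0.5))
```

Tuning the constant moves the break-even data size by a fixed factor, whereas the gap between the two algorithms keeps widening as n grows, which is why large data sets push attention toward asymptotics.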
Other scenario analyses
McKinsey carried out a scenario analysis here, focused more on the implications for the semiconductor manufacturing industry than for users of computing. The report notes the importance of Moore's law in driving productivity improvements over the last few decades:
As a result, Moore’s law has swept much of the modern world along with it. Some estimates ascribe up to 40 percent of the global productivity growth achieved during the last two decades to the expansion of information and communication technologies made possible by semiconductor performance and cost improvements.
The scenario analysis identifies four potential sources of innovation related to Moore's law:
- More Moore (scaling)
- Wafer-size increases (maximize productivity)
- More than Moore (functional diversification)
- Beyond CMOS (new technologies)
Their scenario analysis uses a 2 X 2 model, with the two dimensions under consideration being performance improvements (continue versus stop) and cost improvements (continue versus stop). The case that both performance improvements and cost improvements continue is the "good" case for the semiconductor industry. The case that both stop is the case where the industry is highly likely to get commodified, with profit margins going down and small players catching up to the big ones. In the intermediate cases (where one of the two continues and the other stops), consolidation of the semiconductor industry is likely to continue, but there is still a risk of falling demand.
The McKinsey scenario analysis was discussed by Timothy Taylor on his blog, The Conversable Economist, here.
Roland Berger carried out a detailed scenario analysis focused on the "More than Moore" strategy here.
Blegging for missed scenarios, common features and early indicators
Are there scenarios that the analyses discussed above missed? Are there some types of scenario analysis that we didn't adequately consider? If you had to do your own scenario analysis for the future of computing technology and hardware progress over the next 10-15 years, what scenarios would you generate?
As I noted in my earlier post:
The utility of scenario analysis is not merely in listing a scenario that will transpire, or a collection of scenarios a combination of which will transpire. The utility is in how it prepares the people undertaking the exercise for the relevant futures. One way it could so prepare them is if the early indicators of the scenarios are correctly chosen and, upon observing them, people are able to identify what scenario they're in and take the appropriate measures quickly. Another way is by identifying some features that are common to all scenarios, though the details of the feature may differ by scenario. We can therefore have higher confidence in these common features and can make plans that rely on them.
I already identified some features I believe to be common to all scenarios (namely, increased focus on algorithms, and increased focus on parallelization). Do you agree with my assessment that these are likely to matter regardless of scenario? Are there other such common features you have high confidence in?
If you generally agree with one or more of the scenario analyses here (mine or McKinsey's or Roland Berger's), what early indicators would you use to identify which of the enumerated scenarios we are in? Is it possible to look at how events unfold over the next 2-3 years and draw intelligent conclusions from that about the likelihood of different scenarios?
If your 5-year-old seems to have an unhealthy appetite for chocolate, you’d take measures to prevent them from consuming it. Any time they’d ask you to buy them some, you’d probably refuse their request, even if they begged. You might make sure that any chocolate in the house is well-hidden and out of their reach. You might even confiscate chocolate they already have, like if you forced them to throw out half their Halloween candy. You’d almost certainly trigger a temper tantrum and considerably worsen their mood. But no one would label you an unrelenting tyrant. Instead, you’d be labeled a good parent.
Your 5-year-old isn’t expected to have the capacity to understand the consequences to their actions, let alone have the efficacy to accomplish the actions they know are right. That’s why you’re a good parent when you force them to do the right actions, even against their explicit desires.
You know chocolate is a superstimulus and that 5-year-olds have underdeveloped mental executive functions. You have good reasons to believe that your child’s chocolate obsession isn’t caused by their agency, and instead caused by an obsolete evolutionary adaptation. But from your child’s perspective, desiring and eating chocolate is an exercise in agency. They’re just unaware of how their behaviors and desires are suboptimal. So by removing their ability to act upon their explicit desires, you’re denying their agency.
So far, denying agency doesn’t seem so bad. You have good reason to believe your child isn’t capable of acting rationally and you’re only helping them in the long run. But the ethicality gets murky when your assessment of their rationality is questionable.
Imagine you and your mother have an important flight to catch 2 hours from now. You realize that you have to leave for the airport now in order to make it on time. As you’re about to leave, you recall the 2 beers you recently consumed. But you feel the alcohol left in your system will barely affect your driving, if at all. The problem is that if your mother found out about your beer consumption, she’d refuse to be your passenger until you completely sobered up - as she’s done in the past. You know this would cause you to miss your flight because she can’t drive and there are no other means of transportation.
A close family member died in a drunk driving accident several years ago and, ever since, she overreacts to drinking and driving risks. You think her reaction is irrational and reveals she has non-transitive preferences. For example, one time she was content to be your passenger after you warned her you were sleep deprived and your driving might be affected. Another time she refused to be your passenger after finding out you had one cocktail that hardly affected you. She’s generally a rational person, but with the recent incident and her past behavior, you deem her incapable of having a calibrated reaction. With all this in mind, you contemplate the ethicality of denying her agency by not telling her about your beer drinking.
Similar to the scenario with your 5-year-old, your intention is to ultimately help the person whose agency you’re denying. But in the scenario with your mother, it’s less clear whether you have enough information or are rational enough yourself to assess your mother’s capacity to act within her preferences. Humans are notoriously good at self-deception and rationalizing their actions. Your motivation to catch your flight might be making you irrational about how much alcohol affects your driving. Or maybe the evidence you collected against her rationality is skewed by confirmation bias. If you’re wrong about your assessment, you’d be disrespecting her wishes.
I can modify the scenario to make its ethicality even murkier. Imagine your mother wasn’t catching the plane with you. Instead, you promised to drive her back to her retirement home before your flight. You don’t want to break your promise nor miss your flight, so you contemplate not telling her about your beer consumption.
In this modified version, you’re not actually making your mother better off by denying her agency - you’re only benefiting yourself. You just believe her reaction to your beer consumption isn’t calibrated, and it would cause you to miss your flight. Even if you had plenty of evidence to back up your assessment of her rationality, would it be ethical to deny her agency when it’s only benefiting you?
What are some times you’ve denied someone’s agency? What are your justifications for doing so?
To quickly escape the great filter, should we flood our galaxy with radio signals? While communicating with fellow humans we already send out massive amounts of information that an alien civilization could eventually pick up, but should we engage in positive SETI? Or, if you fear the attention of dangerous aliens, should we set up powerful long-lived solar or nuclear powered automated radio transmitters in the desert and in space that stay silent so long as they receive a yearly signal from us, but then if they fail to get the no-go signal because our civilization has fallen, continuously transmit our dead voice to the stars? If we do destroy ourselves, it would be an act of astronomical altruism to warn other civilizations of our fate, especially if we broadcast news stories from just before our demise, e.g. physicists excited about a new high energy experiment.
Note: Please see this post of mine for more on the project, my sources, and potential sources for bias.
One of the categories of critique that have been leveled against climate science is the critique of insularity. Broadly, it is claimed that the type of work that climate scientists are trying to do draws upon insight and expertise in many other domains, but climate scientists have historically failed to consult experts in those domains or even to follow well-documented best practices.
Note: I wrote a preliminary version of this before drafting the post, but after having done most of the relevant investigation. I reviewed and edited it prior to publication. Note also that I don't justify these takeaways explicitly in my later discussion, because a lot of these come from general intuitions of mine and it's hard to articulate how the information I received explicitly affected my reaching the takeaways. I might discuss the rationales behind these takeaways more in a later post.
- Many of the criticisms are broadly on the mark: climate scientists should have consulted best practices in other domains, and in general should have either followed them or clearly explained the reasons for divergence.
- However, this criticism is not unique to climate science: academia in general has suffered from problems of disciplines being relatively insular (UPDATE: Here's Robin Hanson saying something similar). And many similar things may be true, albeit in different ways, outside academia.
- One interesting possibility is that bad practices here operate via founder effects: for an area that starts off as relatively obscure and unimportant, setting up good practices may not be considered important. But as the area grows in importance, it is quite rare for the area to be cleaned up. People and institutions get used to the old ways of doing things. They have too much at stake to make reforms. This does suggest that it's important to get things right early on.
- (This is speculative, and not discussed in the post): The extent of insularity of a discipline seems to be an area where a few researchers can have significant effect on the discipline. If a few reasonably influential climate scientists had pushed for more integration with and understanding of ideas from other disciplines, the history of climate science research would have been different.
Relevant domains they may have failed to use or learn from
- Forecasting research: Although climate scientists were engaging in an exercise that had a lot to do with forecasting, they neither cited research nor consulted experts in the domain of forecasting.
- Statistics: Climate scientists used plenty of statistics in their analysis. They did follow the basic principles of statistics, but in many cases used them incorrectly or combined them with novel approaches that were nonstandard and did not have clear statistical literature justifying the use of such approaches.
- Programming and software engineering: Climate scientists used a lot of code both for their climate models and for their analyses of historical climate. But their code failed basic principles of decent programming, let alone good software engineering principles such as documentation, unit testing, consistent variable names, and version control.
- Publication of data, metadata, and code: This is a phenomenon becoming increasingly common in some other sectors of academia and industry. Climate scientists failed to learn from econometrics and biomedical research, fields that had been struggling with some qualitatively similar problems and that had been moving to publishing data, metadata, and code.
Let's look at each of these critiques in turn.
Critique #1: Failure to consider forecasting research
We'll devote more attention to this critique, because it has been made, and addressed, cogently in considerable detail.
J. Scott Armstrong (faculty page, Wikipedia) is one of the big names in forecasting. In 2007, Armstrong and Kesten C. Green co-authored a global warming audit (PDF of paper, webpage with supporting materials) for the Forecasting Principles website that was critical of the forecasting exercises by climate scientists used in the IPCC reports.
Armstrong and Green began their critique by noting the following:
- The climate science literature did not reference any of the forecasting literature, and there was no indication that they had consulted forecasting experts, even though what they were doing was to quite an extent a forecasting exercise.
- There was only one paper, by Stewart and Glantz, dating back to 1985, that could be described as a forecasting audit, and that paper was critical of the methodology of climate forecasting. That paper appears to have been cited very little in the years that followed.
- Armstrong and Green tried to contact leading climate scientists. Of the few who responded, none listed specific forecasting principles they followed, or reasons for not following general forecasting principles. They pointed to the IPCC reports as the best source for forecasts. Armstrong and Green estimated that the IPCC report violated 72 of 89 forecasting principles they were able to rate (their list of forecasting principles includes 140 principles, but they judged only 127 as applicable to climate forecasting, and were able to rate only 89 of them). No climate scientists responded to their invitation to provide their own ratings for the forecasting principles.
How significant are these general criticisms? It depends on the answers to the following questions:
- In general, how much credence do you assign to the research on forecasting principles, and how strong a prior do you have in favor of these principles being applicable to a specific domain? I think the answer is that forecasting principles as identified on the Forecasting Principles website are a reasonable starting point, and therefore, any major forecasting exercise (or exercise that implicitly generates forecasts) should at any rate justify major points of departure from these principles.
- How representative are the views of Armstrong and Green in the forecasting community? I have no idea about the representativeness of their specific views, but Armstrong in particular is high-status in the forecasting community (that I described a while back) and the Forecasting Principles website is one of the go-to sources, so material on the website is probably not too far from views in the forecasting community. (Note: I asked the question on Quora a while back, but haven't received any answers).
So it seems like there was arguably a failure of proper procedure in the climate science community in terms of consulting and applying practices from relevant domains. Still, how germane was it to the quality of their conclusions? Maybe it didn't matter after all?
In Chapter 12 of The Signal and the Noise, statistician and forecaster Nate Silver offers the following summary of Armstrong and Green's views:
- First, Armstrong and Green contend that agreement among forecasters is not related to accuracy—and may reflect bias as much as anything else. “You don’t vote,” Armstrong told me. “That’s not the way science progresses.”
- Next, they say the complexity of the global warming problem makes forecasting a fool’s errand. “There’s been no case in history where we’ve had a complex thing with lots of variables and lots of uncertainty, where people have been able to make econometric models or any complex models work,” Armstrong told me. “The more complex you make the model the worse the forecast gets.”
- Finally, Armstrong and Green write that the forecasts do not adequately account for the uncertainty intrinsic to the global warming problem. In other words, they are potentially overconfident.
Silver, Nate (2012-09-27). The Signal and the Noise: Why So Many Predictions Fail-but Some Don't (p. 382). Penguin Group US. Kindle Edition.
Silver addresses each of these in his book (read it to know what he says). Here are my own thoughts on the three points as put forth by Silver:
- I think consensus among experts (to the extent that it does exist) should be taken as a positive signal, even if the experts aren't good at forecasting. But certainly, the lack of interest or success in forecasting should dampen the magnitude of the positive signal. We should consider it likely that climate scientists have identified important potential phenomena, but should be skeptical of any actual forecasts derived from their work.
- I disagree somewhat with this point. I think forecasting could still be possible, but as of now, there is little of a successful track record of forecasting (as Green notes in a later draft paper). So forecasting efforts, including simple ones (such as persistence, linear regression, random walk with drift) and ones based on climate models (both the ones in common use right now and others that give more weight to the PDO/AMO), should continue but the jury is still out on the extent to which they work.
- I agree here that many forecasters are potentially overconfident.
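The simple forecasting methods named in the second point (persistence, linear regression, random walk with drift) are easy to state precisely. The sketch below shows one plain-Python reading of each; the function names and the short made-up series are my own illustrations and are not taken from Green's paper or from any climate data set.

```python
from typing import List, Sequence


def persistence_forecast(series: Sequence[float], horizon: int) -> List[float]:
    """No-change forecast: every future value equals the last observation."""
    return [series[-1]] * horizon


def drift_forecast(series: Sequence[float], horizon: int) -> List[float]:
    """Random walk with drift: extend the last value by the mean historical step."""
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + drift * h for h in range(1, horizon + 1)]


def linear_trend_forecast(series: Sequence[float], horizon: int) -> List[float]:
    """Ordinary least squares fit of value against time index, extrapolated."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n - 1 + h) for h in range(1, horizon + 1)]


# Made-up numbers purely for illustration (not real temperature anomalies).
history = [0.10, 0.18, 0.15, 0.27, 0.31, 0.29]
for method in (persistence_forecast, drift_forecast, linear_trend_forecast):
    print(method.__name__, [round(v, 3) for v in method(history, 3)])
```

Baselines like these are cheap to compute, which is part of why they make a useful yardstick: a more elaborate model earns credibility only to the extent that it beats them out of sample.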
Some counterpoints to the Armstrong and Green critique:
- One can argue that what climate scientists are doing isn't forecasting at all, but scenario analysis. After all, the IPCC generates scenarios, but not forecasts. But as I discussed in an earlier post, scenario planning and forecasting are closely related, and even if scenarios aren't direct explicit unconditional forecasts, they often involve implicit conditional forecasts. To its credit, the IPCC does seem to have used some best practices from the scenario planning literature in generating its emissions scenarios. But that is not part of the climate modeling exercise of the IPCC.
- Many other domains that involve planning for the future don't reference the forecasting literature. Examples include scenario planning (discussed here) and the related field of futures studies (discussed here). Insularity of disciplines from each other is a common feature (or bug) in much of academia. Can we really expect or demand that climate scientists hold themselves to a higher standard?
UPDATE: I forgot to mention in my original draft of the post that Armstrong challenged Al Gore to a bet pitting Armstrong's No Change model against the IPCC model. Gore did not accept the bet, but Armstrong created the website (here) anyway to record the relative performance of the two models.
UPDATE 2: Read drnickbone's comment and my replies for more information on the debate. drnickbone in particular points to responses from Real Climate and Skeptical Science, which I discuss in my response to his comment.
Critique #2: Inappropriate or misguided use of statistics, and failure to consult statisticians
To some extent, this overlaps with Critique #1, because best practices in forecasting include good use of statistical methods. However, the critique is a little broader. There are many parts of climate science not directly involved with forecasting, but where statistical methods still matter. Historical climate reconstruction is one such example. The purpose of these is to get a better understanding of the sorts of climate that could occur and have occurred, and how different aspects of the climate correlated. Unfortunately, historical climate data is not very reliable. How do we deal with different proxies for the climate variables we are interested in so that we can reconstruct them? A careful use of statistics is important here.
Let's consider an example that's quite far removed from climate forecasting, but has (perhaps undeservedly) played an important role in the public debate on global warming: Michael Mann's famed hockey stick (Wikipedia), discussed in detail in Mann, Bradley and Hughes (henceforth, MBH98) (available online here). The major critiques of the paper arose in a series of papers by McIntyre and McKitrick, the most important of them being their 2005 paper in Geophysical Research Letters (henceforth, MM05) (available online here).
I read about the controversy in the book The Hockey Stick Illusion by Andrew Montford (Amazon, Wikipedia), but the author also has a shorter article titled Caspar and the Jesus paper that covers the story as it unfolds from his perspective. While there's a lot more to the hockey stick controversy than statistics alone, some of the main issues are statistical.
Unfortunately, I wasn't able to resolve the statistical issues myself well enough to have an informed view. But my very crude intuition, as well as the statements made by statisticians as recorded below, supports Montford's broad outline of the story. I'll try to describe the broad critiques leveled from the statistical perspective:
- Choice of centering and standardization: The data was centered around the 20th century, a method known as short-centering, and bound to create a bias in favor of picking hockey stick-like shapes when doing principal components analysis. Each series was also standardized (divided by the standard deviation for the 20th century), which McIntyre argued was inappropriate.
- Unusual choice of statistic used for significance: MBH98 used a statistic called the RE statistic (reduction of error statistic). This is a fairly unusual statistic to use. In fact, it doesn't have a Wikipedia page, and practically the only stuff on the web (on Google and Google Scholar) about it was in relation to tree-ring research (the proxies used in MBH98 were tree rings). This should seem suspicious: why is tree-ring research using a statistic that's basically unused outside the field? There are good reasons to avoid using statistical constructs on which there is little statistical literature, because people don't have a feel for how they work. MBH98 could have used the R^2 statistic instead, and in fact, they mentioned it in their paper but then ended up not using it.
- Incorrect calculation of significance threshold: MM05 (plus subsequent comments by McIntyre) claims that not only is the RE statistic nonstandard, there were problems with the way MBH98 used it. First off, there is no theoretical distribution of the RE statistic, so calculating the cutoff needed to attain a particular significance level is a tricky exercise (this is one of many reasons why using a RE statistic may be ill-advised, according to McIntyre). MBH98 calculated the cutoff value for 99% significance incorrectly to be 0. The correct value according to McIntyre was about 0.54, whereas the actual RE statistic value for the data set in MBH98 was 0.48, i.e., not close enough. A later paper by Ammann and Wahl, cited by many as a vindication of MBH98, computed a similar cutoff of 0.52, so that the actual RE statistic value failed the significance test. So how did it manage to vindicate MBH98 when the value of the RE statistic failed the cutoff? They appear to have employed a novel statistical procedure, coming up with something called a calibration/verification RE ratio. McIntyre was quite critical of this, for reasons he described in detail here.
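For readers unfamiliar with the RE statistic, here is a minimal sketch of how it is commonly defined in the tree-ring literature, as far as I understand it: one minus the ratio of the reconstruction's squared error to the squared error of a reference forecast that always predicts the calibration-period mean. This definition is my assumption about the standard usage and is not quoted from MBH98 or MM05; the threshold comparison simply reuses the numbers cited above.

```python
from typing import Sequence


def reduction_of_error(
    observed: Sequence[float],
    predicted: Sequence[float],
    calibration_mean: float,
) -> float:
    """RE = 1 - SSE(model) / SSE(reference), evaluated on verification data.

    The reference forecast always predicts `calibration_mean`.
    RE = 1 is a perfect fit; RE = 0 means no better than the reference;
    negative values mean worse than the reference.
    """
    sse_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_reference = sum((o - calibration_mean) ** 2 for o in observed)
    return 1.0 - sse_model / sse_reference


def passes(re_value: float, cutoff: float) -> bool:
    """Whether an RE value exceeds a given significance cutoff."""
    return re_value > cutoff


# Figures as reported in the discussion above.
reported_re = 0.48
for label, cutoff in [("MBH98", 0.0), ("McIntyre", 0.54), ("Ammann and Wahl", 0.52)]:
    print(f"{label} cutoff {cutoff}: passes = {passes(reported_re, cutoff)}")
```

The same RE value passes or fails depending entirely on the cutoff, which is why the absence of a theoretical distribution for the statistic matters so much to the dispute.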
There has been a lengthy debate on the subject, plus two external inquiries and reports on the debate: the NAS Panel Report headed by Gerry North, and the Wegman Report headed by Edward Wegman. Both of them agreed with the statistical criticisms made by McIntyre, but the NAS report did not make any broader comments on what this says about the discipline or the general hockey stick hypothesis, while the Wegman report made more explicit criticism.
The Wegman Report made the insularity critique in some detail:
In general, we found MBH98 and MBH99 to be somewhat obscure and incomplete and the criticisms of MM03/05a/05b to be valid and compelling. We also comment that they were attempting to draw attention to the discrepancies in MBH98 and MBH99, and not to do paleoclimatic temperature reconstruction. Normally, one would try to select a calibration dataset that is representative of the entire dataset. The 1902-1995 data is not fully appropriate for calibration and leads to a misuse in principal component analysis. However, the reasons for setting 1902-1995 as the calibration point presented in the narrative of MBH98 sounds reasonable, and the error may be easily overlooked by someone not trained in statistical methodology. We note that there is no evidence that Dr. Mann or any of the other authors in paleoclimatology studies have had significant interactions with mainstream statisticians.
In our further exploration of the social network of authorships in temperature reconstruction, we found that at least 43 authors have direct ties to Dr. Mann by virtue of coauthored papers with him. Our findings from this analysis suggest that authors in the area of paleoclimate studies are closely connected and thus ‘independent studies’ may not be as independent as they might appear on the surface. This committee does not believe that web logs are an appropriate forum for the scientific debate on this issue.
It is important to note the isolation of the paleoclimate community; even though they rely heavily on statistical methods they do not seem to be interacting with the statistical community. Additionally, we judge that the sharing of research materials, data and results was haphazardly and grudgingly done. In this case we judge that there was too much reliance on peer review, which was not necessarily independent. Moreover, the work has been sufficiently politicized that this community can hardly reassess their public positions without losing credibility. Overall, our committee believes that Mann’s assessments that the decade of the 1990s was the hottest decade of the millennium and that 1998 was the hottest year of the millennium cannot be supported by his analysis.
McIntyre has a lengthy blog post summarizing what he sees as the main parts of the NAS Panel Report, the Wegman Report, and other statements made by statisticians critical of MBH98.
Critique #3: Inadequate use of software engineering, project management, and coding documentation and testing principles
In the aftermath of Climategate, most public attention was drawn to the content of the emails. But apart from the emails, data and code were also leaked, and this gave the world an inside view of the code that's used to simulate the climate. A number of criticisms of the coding practice emerged.
Chicago Boyz had a lengthy post titled Scientists are not Software Engineers that noted the sloppiness in the code and some of its implications. The post was also quick to point out that poor-quality code is not unique to climate science: it is a general problem when small-scale academic research projects grow beyond what the coders originally intended and nobody makes a systematic effort to refactor the code. (If you have thoughts on the general prevalence of good software engineering practices in code for academic research, feel free to share them by answering my Quora question here, and if you have insights on climate science code in particular, answer my Quora question here.) Below are some excerpts from the post:
No, the real shocking revelation lies in the computer code and data that were dumped along with the emails. Arguably, these are the most important computer programs in the world. These programs generate the data that is used to create the climate models which purport to show an inevitable catastrophic warming caused by human activity. It is on the basis of these programs that we are supposed to massively reengineer the entire planetary economy and technology base.
The dumped files revealed that those critical programs are complete and utter train wrecks.
The design, production and maintenance of large pieces of software require project management skills greater than those required for large material construction projects. Computer programs are the most complicated pieces of technology ever created. By several orders of magnitude they have more “parts” and more interactions between those parts than any other technology.
Software engineers and software project managers have created procedures for managing that complexity. It begins with seemingly trivial things like style guides that regulate what names programmers can give to attributes of software and the associated datafiles. Then you have version control in which every change to the software is recorded in a database. Programmers have to document absolutely everything they do. Before they write code, there is extensive planning by many people. After the code is written comes the dreaded code review in which other programmers and managers go over the code line by line and look for faults. After the code reaches its semi-complete form, it is handed over to Quality Assurance which is staffed by drooling, befanged, malicious sociopaths who live for nothing more than to take a programmer’s greatest, most elegant code and rip it apart and possibly sexually violate it. (Yes, I’m still bitter.)
Institutions pay for all this oversight and double-checking and programmers tolerate it because it is impossible to create a large, reliable and accurate piece of software without such procedures firmly in place. Software is just too complex to wing it.
Clearly, nothing like these established procedures was used at CRU. Indeed, the code seems to have been written overwhelmingly by just two people (one at a time) over the past 30 years. Neither of these individuals was a formally trained programmer and there appears to have been no project planning or even formal documentation. Indeed, the comments of the second programmer, the hapless “Harry”, as he struggled to understand the work of his predecessor are now being read as a kind of programmer’s Icelandic saga describing a death march through an inexplicable maze of ineptitude and boobytraps.
A lot of the CRU code is clearly composed of hacks. Hacks are informal, off-the-cuff solutions that programmers think up on the spur of the moment to fix some little problem. Sometimes they are so elegant as to be awe inspiring and they enter programming lore. More often, however, they are crude, sloppy and dangerously unreliable. Programmers usually use hacks as a temporary quick solution to a bottleneck problem. The intention is always to come back later and replace the hack with a more well-thought-out and reliable solution, but with no formal project management and time constraints it’s easy to forget to do so. After a time, more code evolves that depends on the existence of the hack, so replacing it becomes a much bigger task than just replacing the initial hack would have been.
(One hack in the CRU software will no doubt become famous. The programmer needed to calculate the distance and overlapping effect between weather monitoring stations. The non-hack way to do so would be to break out the trigonometry and write a planned piece of code to calculate the spatial relationships. Instead, the CRU programmer noticed that the visualization software that displayed the program’s results already plotted the station’s locations so he sampled individual pixels on the screen and used the color of the pixels between the stations to determine their location and overlap! This is a fragile hack because if the visualization changes the colors it uses, the components that depend on the hack will fail silently.)
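For a sense of what "breaking out the trigonometry" might look like, here is a minimal Python sketch of a great-circle distance between two stations using the haversine formula. This is an illustration only, not the CRU code or a proposed replacement for it. The function name, the spherical-Earth assumption, and the radius constant are assumptions made for the example.

```python
import math

# Mean Earth radius in kilometres; treating the Earth as a sphere is an
# assumption of this sketch.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two hypothetical station locations on the equator, half the globe apart.
half_circumference = great_circle_km(0.0, 0.0, 0.0, 180.0)
```

Unlike sampling pixel colors, a function like this depends only on the station coordinates, so changes to the plotting code cannot silently alter its results.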
For some choice comments excerpted from a code file, see here.
Critique #4: Practices of publication of data, metadata, and code (that had gained traction in other disciplines)
When McIntyre wanted to replicate MBH98, he emailed Mann asking for his data and code. Mann, though initially cooperative, soon started trying to fend McIntyre off. Part of this was because he thought McIntyre was out to find something wrong with his work (a well-grounded suspicion). But part of it was also that his data and code were a mess. He didn't maintain them in a way that he'd be comfortable sharing them around to anybody other than an already sympathetic academic. And, more importantly, as Mann's colleague Stephen Schneider noted, nobody asked for the code and underlying data during peer review. And most journals at the time did not require authors to submit or archive their code and data at the time of submission or acceptance of their paper. This also closely relates to Critique #3: a requirement or expectation that one's data and code would be published along with one's paper might make people more careful to follow good coding practices and avoid using various "tricks" and "hacks" in their code.
Here's how Andrew Montford puts it in The Hockey Stick Illusion:
The Hockey Stick affair is not the first scandal in which important scientific papers underpinning government policy positions have been found to be non-replicable – McCullough and McKitrick review a litany of sorry cases from several different fields – but it does underline the need for a more solid basis on which political decision-making should be based. That basis is replication. Centuries of scientific endeavour have shown that truth emerges only from repeated experimentation and falsification of theories, a process that only begins after publication and can continue for months or years or decades thereafter. Only through actually reproducing the findings of a scientific paper can other researchers be certain that those findings are correct. In the early history of European science, publication of scientific findings in a journal was usually adequate to allow other researchers to replicate them. However, as science has advanced, the techniques used have become steadily more complicated and consequently more difficult to explain. The advent of computers has allowed scientists to add further layers of complexity to their work and to handle much larger datasets, to the extent that a journal article can now, in most cases, no longer be considered a definitive record of a scientific result. There is simply insufficient space in the pages of a print journal to explain what exactly has been done. This has produced a rather profound change in the purpose of a scientific paper. As geophysicist Jon Claerbout puts it, in a world where powerful computers and vast datasets dominate scientific research, the paper ‘is not the scholarship itself, it is merely advertising of the scholarship’. The actual scholarship is the data and code used to generate the figures presented in the paper and which underpin its claims to uniqueness.
In passing we should note the implications of Claerbout’s observations for the assessment for our conclusions in the last section: by using only peer review to assess the climate science literature, the policymaking community is implicitly expecting that a read-through of a partial account of the research performed will be sufficient to identify any errors or other problems with the paper. This is simply not credible. With a full explanation of methodology now often not possible from the text of a paper, replication can usually only be performed if the data and code are available. This is a major change from a hundred years ago, but in the twenty-first century it should be a trivial problem to address. In some specialisms it is just that. We have seen, however, how almost every attempt to obtain data from climatologists is met by a wall of evasion and obfuscation, with journals and funding bodies either unable or unwilling to assist. This is, of course, unethical and unacceptable, particularly for publicly funded scientists. The public has paid for nearly all of this data to be collated and has a right to see it distributed and reused. As the treatment of the Loehle paper shows, for scientists to open themselves up to criticism by allowing open review and full data access is a profoundly uncomfortable process, but the public is not paying scientists to have comfortable lives; they are paying for rapid advances in science. If data is available, doubts over exactly where the researcher has started from fall away. If computer code is made public too, then the task of replication becomes simpler still and all doubts about the methodology are removed. The debate moves on from foolish and long-winded arguments about what was done (we still have no idea exactly how Mann calculated his confidence intervals) onto the real scientific meat of whether what was done was correct.
As we look back over McIntyre’s work on the Hockey Stick, we see that much of his time was wasted on trying to uncover from the obscure wording of Mann’s papers exactly what procedures had been used. Again, we can only state that this is entirely unacceptable for publicly funded science and is unforgiveable in an area of such enormous policy importance. As well as helping scientists to find errors more quickly, replication has other benefits that are not insignificant. David Goodstein of the California Institute of Technology has commented that the possibility that someone will try to replicate a piece of work is a powerful disincentive to cheating – in other words, it can help to prevent scientific fraud. Goodstein also notes that, in reality, very few scientific papers are ever subject to an attempt to replicate them. It is clear from Stephen Schneider’s surprise when asked to obtain the data behind one of Mann’s papers that this criticism extends into the field of climatology. In a world where pressure from funding agencies and the demands of university careers mean that academics have to publish or perish, precious few resources are free to replicate the work of others. In years gone by, some of the time of PhD students might have been devoted to replicating the work of rival labs, but few students would accept such a menial task in the modern world: they have their own publication records to worry about. It is unforgiveable, therefore, that in paleoclimate circles, the few attempts that have been made at replication have been blocked by all of the parties in a position to do something about it. Medical science is far ahead of the physical sciences in the area of replication.
Doug Altman, of Cancer Research UK’s Medical Statistics group, has commented that archiving of data should be mandatory and that a failure to retain data should be treated as research misconduct. The introduction of this kind of regime to climatology could have nothing but a salutary effect on its rather tarnished reputation. Other subject areas, however, have found simpler and less confrontational ways to deal with the problem. In areas such as econometrics, which have long suffered from politicisation and fraud, several journals have adopted clear and rigorous policies on archiving of data. At publications such as the American Economic Review, Econometrica and the Journal of Money, Credit and Banking, a manuscript that is submitted for publication will simply not be accepted unless data and fully functional code are available. In other words, if the data and code are not public then the journals will not even consider the article for publication, except in very rare circumstances. This is simple, fair and transparent and works without any dissent. It also avoids any rancorous disagreements between journal and author after the event. Physical science journals are, by and large, far behind the econometricians on this score. While most have adopted one pious policy or another, giving the appearance of transparency on data and code, as we have seen in the unfolding of this story, there has been a near-complete failure to enforce these rules. This failure simply stores up potential problems for the editors: if an author refuses to release his data, the journal is left with an enforcement problem from which it is very difficult to extricate themselves. Their sole potential sanction is to withdraw the paper, but this then merely opens them up to the possibility of expensive lawsuits. It is hardly surprising that in practice such drastic steps are never taken.
The failure of climatology journals to enact strict policies or enforce weaker ones represents a serious failure in the system of assurance that taxpayer-funded science is rigorous and reliable. Funding bodies claim that they rely on journals to ensure data availability. Journals want a quiet life and will not face down the academics who are their lifeblood. Will Nature now go back to Mann and threaten to withdraw his paper if he doesn’t produce the code for his confidence interval calculations? It is unlikely in the extreme. Until politicians and journals enforce the sharing of data, the public can gain little assurance that there is any real need for the financial sacrifices they are being asked to accept. Taking steps to assist the process of replication will do much to improve the conduct of climatology and to ensure that its findings are solidly based, but in the case of papers of pivotal importance politicians must also go further. Where a paper like the Hockey Stick appears to be central to a set of policy demands or to the shaping of public opinion, it is not credible for policymakers to stand back and wait for the scientific community to test the veracity of the findings over the years following publication. Replication and falsification are of little use if they happen after policy decisions have been made. The next lesson of the Hockey Stick affair is that if governments are truly to have assurance that climate science is a sound basis for decision-making, they will have to set up a formal process for replicating key papers, one in which the oversight role is performed by scientists who are genuinely independent and who have no financial interest in the outcome.
Montford, Andrew (2011-06-06). The Hockey Stick Illusion (pp. 379-383). Stacey Arts. Kindle Edition.
I think there's widespread assent on LW that the sequences were pretty awesome. Not only do they elucidate a lot of useful concepts, but they provide useful shorthand terms for those concepts which help in thinking and talking about them. When I see a word or phrase in a sentence which, rather than doing any semantic work, simply evokes a positive association in the reader, I have the useful handle of "applause light" for it. I don't have to think "oh, there's one of those...you know...things where a word isn't doing any semantic work but just evokes a positive association in the reader". This is a common enough pattern that having the term "applause light" is tremendously convenient.
I would like this thread to be a location where people propose such patterns in comments, and respondents determine (a) whether this pattern actually exists and / or is useful; (b) whether there is already a term or sufficiently-related concept that adequately describes it; and (c) what a useful / pragmatic / catchy term might be for it, if none exists already.
I would like to propose some suggested formatting to make this go more smoothly.
(ETA: feel free to ignore this and post however you like, though)
When proposing a pattern, include a description of the general case as well as at least one motivating example. This is useful for establishing what you think the general pattern is, and why you think it matters. For instance:
When someone uses a term without any thought to what that term means in context, but to elicit a positive association in their audience.
I was at a conference where someone said AI development should be "more democratic". I didn't understand what they meant in context, and upon quizzing them, it turned out that they didn't either. It seems to me that they just used the word "democratic" as decoration to make the audience attach positive feelings to what they were saying.
When I think about it, this seems like quite a common rhetorical device.
When responding to a pattern, please specify whether your response is:
(a) wrangling with the definition, usefulness or existence of the pattern
(b) making a claim that a term or sufficiently-related concept exists that adequately describes it
(c) suggesting a completely fresh, hitherto-uncoined name for it
(ETA: or don't, if you don't want to)
Obviously, upvote suggestions that you think are worthy. If this post takes off, I may do a follow-up with the most upvoted suggestions.
Dear effective altruist,
have you considered artificial utility monsters as a high-leverage form of altruism?
In the traditional sense, a utility monster is a hypothetical being which gains so much subjective wellbeing (SWB) from marginal input of resources that any other form of resource allocation is inferior on a utilitarian calculus. (as illustrated on SMBC)
This has been used to show that utilitarianism is not as egalitarian as it intuitively may appear, since it prioritizes some beings over others rather strictly - including humans.
The traditional utility monster is implausible even in principle - it is hard to imagine a mind that is constructed such that it will not succumb to diminishing marginal utility from additional resource allocation. There is probably some natural limit on how much SWB a mind can implement, or at least how much this can be improved by spending more on the mind. This would probably even be true for an algorithmic mind that can be sped up with faster computers, and there are probably limits to how much a digital mind can benefit in subjective speed from the parallelization of its internal subcomputations.
However, we may broaden the traditional definition somewhat and call any technology utility-monstrous if it implements high SWB with exceptionally good cost-effectiveness and in a scalable form - even if this scalability stems from a larger set of minds running in parallel, rather than one mind feeling much better or living much longer per additional joule/dollar.
Under this definition, it may be very possible to create and sustain many artificial minds reliably and cheaply, while they all have a very high SWB level at or near subsistence. An important point here is that possible peak intensities of artificially implemented pleasures could be far higher than those commonly found in evolved minds: Our worst pains seem more intense than our best pleasures for evolutionary reasons - but the same does not have to be true for artificial sentience, whose best pleasures could be even more intense than our worst agony, without any need for suffering anywhere near this strong.
If such technologies can be invented - which seems highly plausible in principle, if not yet in practice - then the original conclusion for the utilitarian calculus is retained: It would be highly desirable for utilitarians to facilitate the invention and implementation of such utility-monstrous systems and allocate marginal resources to subsidize their existence. This makes it a potential high-value target for effective altruism.
Many tastes, many utility monsters
Human motivation is barely stimulated by abstract intellectual concepts, and "utilitronium" sounds more like "aluminium" than something to desire or empathize with. Consequently, the idea is as sexy as a brick. "Wireheading" evokes associations of having a piece of metal rammed into one's head, which is understandably unattractive to any evolved primate (unless it's attached to an iPod, which apparently makes it okay).
Technically, "utility monsters" suffer from a similar association problem, which is that the idea is dangerous or ethically monstrous. But since the term is so specific and established in ethical philosophy, and since "monster" can at least be given an emotive and amicable - almost endearing - tone, it seems realistic to use it positively. (Suggestions for a better name are welcome, of course.)
So a central issue for the actual implementation and funding is human attraction. It is more important to motivate humans to embrace the existence of utility monsters than it is for them to be optimally resource-efficient - after all, a technology that is never implemented or funded properly gains next to nothing from being efficient.
A compromise between raw efficiency of SWB per joule/dollar and better forms to attract humans might be best. There is probably a sweet spot - perhaps various different ones for different target groups - between resource-efficiency and attractiveness. Only die-hard utilitarians will actually want to fund something like hedonium, but the rest of the world may still respond to "The Sims - now with real pleasures!", likeable VR characters, or a new generation of reward-based Tamagotchis.
Once we step away somewhat from maximum efficiency, the possibilities expand drastically. Implementation forms may be:
- decorative like gimmicks or screensavers,
- fashionable like sentient wearables,
- sophisticated and localized like works of art,
- cute like pets or children,
- personalized like computer game avatars retiring into paradise,
- erotic like virtual lovers who continue to have sex without the user,
- nostalgic like digital spirits of dead loved ones in artificial serenity,
- crazy like hyperorgasmic flowers,
- semi-functional like joyful household robots and software assistants,
- and of course generally a wide range of human-like and non-human-like simulated characters embedded in all kinds of virtual narratives.
Possible risks and mitigation strategies
Open-source utility monsters could be made public as templates, which would add a further check that the implementation of sentience is correct and positive and would make better variations easy to explore. However, this would come with the downside of malicious abuse and reckless harm potential. Risks of suffering could come from artificial unhappiness desired by users, e.g. for narratives that contain sadism, dramatic violence or punishment of evil characters for quasi-moral gratification. Another such risk could come simply from bad local modifications that implement suffering by accident.
Despite these risks, one may hope that most humans who care enough to run artificial sentience are more benevolent and careful than malevolent and careless in a way that causes more positive SWB than suffering. After all, most people love their pets and do not torture them, and other people look down on those who do (compare this discussion of Norn abuse, which resulted in extremely hostile responses). And there may be laws against causing artificial suffering. Still, this is an important point of concern.
Closed-source utility monsters may further mitigate some of this risk by not making the sentient phenotypes directly available to the public, but encapsulating their internal implementation within a well-defined interface - like a physical toy or closed-source software that can be used and run by private users, but not internally manipulated beyond a well-tested state-space without hacking.
An extremely cautionary approach would be to run the utility monsters by externally controlled dedicated institutions and only give the public - such as voters or donors - some limited control over them through communication with the institution. For instance, dedicated charities could offer "virtual paradises" to donors so they can "adopt" utility monsters living there in certain ways without allowing those donors to actually lay hands on their implementation. On the other hand, this would require a high level of trustworthiness of the institutions or charities and their controllers.
Not for the sake of utility monsters alone
Human values are complex, and it has been argued on LessWrong that the resource allocation of any good future should not be spent for the sake of pleasure or happiness alone. As evolved primates, we all have more than one intuitive value we hold dear, even among self-identified intellectual utilitarians, who compose only a tiny fraction of the population.
However, some discussions in the rationalist community touching related technologies like pleasure wireheading, utilitronium, and so on, have suffered from implausible or orthogonal assumptions and associations. Since the utilitarian calculus favors SWB maximization above all else, it has been feared, we run the risk of losing a more complex future because
a) utilitarianism knows no compromise and
b) the future will be decided by one winning singleton who takes it all and
c) we have only one world with only one future to get it right
In addition, low status has been ascribed to wireheads, with the association of fake utility or cheating life as a form of low-status behavior. People have been competing for status by associating themselves with the miserable Socrates instead of the happy pig, without actually giving up real option value in their own lives.
On Scott Alexander's blog, there's a good example of a mostly pessimistic view both in the OP and in the comments. And in this comment on an effective altruism critique, Carl Shulman names hedonistic utilitarianism turning into a bad political ideology similar to communist states as a plausible failure mode of effective altruism.
So, will we all be killed by a singleton who turns us into utilitronium?
Be not afraid! These fears are plausibly unwarranted because:
a) Utilitarianism is consequentialism, and consequentialists are opportunistic compromisers - even within the conflicting impulses of their own evolved minds. The number of utilitarians who would accept existential risk for the sake of pleasure maximization is small, and practically all of them subscribe to the philosophy of cooperative compromise with orthogonal, non-exclusive values in the political marketplace. Those who don't are incompetent almost by definition and will never gain much political traction.
b) The future may very well not be decided by one singleton but by a marketplace of competing agency. Building a singleton is hard and requires the strict subjugation or absorption of all competition. Even if it were to succeed, the singleton will probably not implement only one human value, since it will be created by many humans with complex values, or at least it will have to make credible concessions to a critical mass of humans with diverse values who can stop it before it reaches singleton status. And if these mitigating assumptions are all false and a fooming singleton is possible and easy, then too much pleasure should be the least of humanity's worries - after all, in this case the Taliban, the Chinese government, the US military or some modern King Joffrey are just as likely to get the singleton as the utilitarians.
c) There are plausibly many Everett branches and many Hubble volumes like ours, implementing more than one future-earth outcome, as summed up by Max Tegmark here. Even if infinitarian multiverse theories should all end up false against current odds, a very large finite universe would still be far more realistic than a small one, given our physical observations. This makes a pre-existing value diversity highly probable if not inevitable. For instance, if you value pristine nature in addition to SWB, you should accept the high probability of many parallel earth-like planets with pristine nature regardless of what you do, and consider that we may be in an exceptional minority position to improve the measure of other values that do not naturally evolve easily, such as a very high positive-SWB-over-suffering surplus.
From the present, into the future
If we accept the conclusion that utility-monstrous technology is a high-value vector for effective altruism (among others), then what could current EAs do as we transition into the future? To my best knowledge, we don't have the capacity yet to create artificial utility monsters.
However, foundational research in neuroscience and artificial intelligence/sentience theory is already ongoing today and certainly a necessity if we ever want to implement utility-monstrous systems. In addition, outreach and public discussion of the fundamental concepts is also possible and plausibly high-value (hence this post). Generally, the following steps seem all useful and could use the attention of EAs, as we progress into the future:
- spread the idea, refine the concepts, apply constructive criticism to all its weak spots until it becomes either solid or revealed as irredeemably undesirable
- identify possible misunderstandings, fears, biases etc. that may reduce human acceptance and find compromises and attraction factors to mitigate them
- fund and do the scientific research that, if successful, could lead to utility-monstrous technologies
- fund the implementation of the first actual utility monsters and test them thoroughly, then improve on the design, then test again, etc.
- either make the templates public (open-source approach) or make them available for specialized altruistic institutions, such as private charities
- perform outreach and fundraising to give existence donations to as many utility monsters as possible
All of this can be done without much self-sacrifice on the part of any individual. And all of this can be done within existing political systems, existing markets, and without violating anyone's rights.
The following simple game has one solution that seems correct, but isn’t. Can you figure out why?
Player One moves first. He must pick A, B, or C. If Player One picks A the game ends and Player Two does nothing. If Player One picks B or C, Player Two will be told that Player One picked B or C, but will not be told which of these two strategies Player One picked, Player Two must then pick X or Y, and then the game ends. The following shows the Players’ payoffs for each possible outcome. Player One’s payoff is listed first.
A 3,0 [And Player Two never got to move.]
WARNING: Memetic hazard.
Is there anything we should do?
Of the technologies that have a reasonable chance of coming to mass market in the next 20-25 years and having a significant impact on human society, driverless cars (also known as self-driving cars or autonomous cars) stand out. I was originally planning to collect material discussing driverless cars, but Gwern has a really excellent compendium of statements about driverless cars, published January 2013 (if you're reading this, Gwern, thanks!). There have been a few developments since then (for instance, Google's announcement that it was building its own driverless car, or a startup called Cruise Automation planning to build a $10,000 driverless car) but the overall landscape remains similar. There's been some progress with understanding and navigating city streets and with handling adverse weather conditions, and it's more or less on schedule.
My question is about driverless car forecasts. Driverless Future has a good summary page of forecasts made by automobile manufacturers, insurers, and professional societies. The range of time for the arrival of the first commercial driverless cars varies between 2018 and 2030. The timeline for driverless cars to achieve mass penetration is similarly staggered between the early 2020s and 2040. (The forecasts aren't all directly comparable).
A few thoughts come to mind:
- Insurers and professional societies seem more conservative in their estimates than manufacturers (both automobile manufacturers and people manufacturing the technology for driverless cars). Note that the estimates of many manufacturers are centered on their projected release dates for their own driverless cars. This suggests an obvious conflict of interest: manufacturers may be incentivized to be optimistic in their projections of when driverless cars will be released, insofar as making more optimistic predictions wins them news coverage and might also improve their market valuation. (At the same time, the release dates are sufficiently far in the future that it's unlikely that they'll be held to account for false projections, so there isn't a strong incentive to be conservative the same way as there is with quarterly sales and earnings forecasts). Overall, then, I'd defer more to the judgment of the professional societies, namely the IEEE and the Society of Automotive Engineers.
- The statements compiled by Gwern point to the many legal hurdles and other thorny issues of ethics that would need to be resolved, at least partially, before driverless cars start becoming a big presence in the market.
- The general critique made by Schnaars in Megamistakes (that I discussed here) applies to driverless car technology: consumers may be unwilling to pay the added cost despite the safety benefits. Some of the quotes in Gwern's compendium reference related issues. This points further in the direction of forecasts by manufacturers being overly optimistic.
Questions for the people here:
- Do you agree with my points (1)-(3) above?
- Would you care to make forecasts for things such as: (a) the date that the first commercial driverless car will hit the market in a major country or US state? (b) the date by which over 10% of new cars sold in a large country or US state will be driverless (i.e., capable of fully autonomous operation), (c) same as (b), but over 50%, (d) the date by which over 10% of cars on the road (in a large country or US state) will be operating autonomously, (e) same as (d), but over 50%. You don't have to answer these exact questions, I'm just providing some suggestions since "forecast the future of driverless cars" is overly vague.
- What's your overall view on whether it is desirable at the margin to speed up or slow down the arrival of autonomous vehicles on the road? What factors would you consider in answering such a question?
Vincent Müller and Nick Bostrom have just released a paper surveying the results of a poll of experts about future progress in artificial intelligence. The authors have also put up a companion site where visitors can take the poll and see the raw data. I just checked the site and so far only one individual has submitted a response. This provides an opportunity for testing the views of LW members against those of experts. So if you are willing to complete the questionnaire, please do so before reading the paper. (I have abstained from providing a link to the pdf to create a trivial inconvenience for those who cannot resist temptation. Once you take the poll, you can easily find the paper by conducting a Google search with the keywords: bostrom muller future progress artificial intelligence.)
At the recent Tel Aviv meetup, after a discussion of the open problems in the field of FAI, we reached the conclusion that the problem of logical uncertainty is one of the most major of the problems open today. In this post I will try to give a few insights I had on this problem, which can be thought of as the problem of constructing a (non-degenerate) probability measure over the set of the statements of an arbitrary logical system.
To clarify my goal: I'm trying to make a Solomonoff-like system for assigning probabilities to logical statements. For reasons much like the reasons that Solomonoff Induction is uncomputable, this system will be uncomputable as well. This puts certain limits on its usefulness, but it's certainly a start. Solomonoff Induction is very useful, if only as a limiting case of computable processes. I believe the system I present here has some of that same value when it comes to the problem of logical uncertainty.
The obvious tack to take is one involving proof-lengths: It is reasonable to say that a sentence that is harder to prove should have a lower measure.
Let's start with a definition:
For each provably true statement φ , let ProofLength(φ) be the length of φ's shortest proof. For all unprovable yet true statements, let ProofLength(φ) be ∞.
Therefore, if we have some probability measure P over true statements in our system, we want P to be monotonically decreasing with regard to proof length (i.e., P(φ1)<P(φ2) ⇔ ProofLength(φ1)>ProofLength(φ2)).
For obvious reasons, we want to assign probability 1 to our axioms. As an axiom always has a proof of length one (the mere statement of the axiom itself, without any previous statements that it is derived from), we want the property P(φ) = 1 ⇔ ProofLength(φ) = 1.
Lastly, for statements that are unprovable, we have to assign a probability of ½. Why? Let φ be an unprovable statement. Because P is a probability measure, P(φ)+P(~φ) = 1. Exactly one of φ or ~φ is true, but as they are unprovable, we have no way of knowing, even in theory, which is which. Thus, by symmetry, P(φ)=P(~φ)=½.
Given these desiderata, we will see what measure P we can construct that follows them.
For each true statement φ I will define P(φ) to be 2^(-ProofLength(φ)) + 2^(-1). This seems at first rather arbitrary, but it matches the desiderata in a fairly elegant way. This P is monotonic with regards to the proof length, as we demanded, and it correctly assigns a probability of 1 to our axioms. It also correctly assigns probability ½ to unprovable statements.
For this to be a probability measure over the set of all statements, we still need to define it on the false statements as well. This is trivial, as we can define P(φ)=1-P(~φ) for all false statements φ. This gives us all the properties we might demand of a logical probability measure that is based on proof lengths:
- Statements that can be easily proved true are given more measure than those that are hard (or even impossible) to prove to be true.
- And in reverse for false statements: Statements that can be easily proved to be false are given a lower measure than those that are hard (or even impossible) to prove to be false.
- Specifically, it assigns probability 1 to axioms (and 0 to the negations of axioms.)
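As a quick sanity check of these properties, here is a minimal Python sketch. It takes the measure to be P(φ) = 2^(-ProofLength(φ)) + 2^(-1), which is the reading that matches the stated desiderata, and it simply accepts a proof length as input, since finding the shortest proof is uncomputable in general. The function names prob_true and prob_false are illustrative, not part of the original proposal.

```python
import math


def prob_true(proof_length):
    """P(phi) for a true statement whose shortest proof has the given length.

    Unprovable-but-true statements are passed in as math.inf.
    """
    if math.isinf(proof_length):
        return 0.5  # the 2^(-length) term vanishes, leaving 2^(-1)
    return 2.0 ** (-proof_length) + 0.5


def prob_false(negation_proof_length):
    """P(phi) for a false statement phi, defined as 1 - P(~phi)."""
    return 1.0 - prob_true(negation_proof_length)


# Hypothetical proof lengths, chosen only to illustrate the shape of P.
for n in (1, 2, 3, 10, math.inf):
    print(n, prob_true(n), prob_false(n))
```

Under this reading, axioms (length 1) get probability 1, longer proofs approach ½ from above, and false statements mirror that from below.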
I have no idea if this is the best way to construct a logical probability measure, but it seems like a start. This seems like as decent a way as any to assign priors to statements of logical probability.
That handles priors, but it doesn't seem to give an easy way to update on evidence. Obviously, once you prove something is true or false, you want its probability to rise to 1 or drop to 0 accordingly. Also, if you take some steps in the direction of proving something to be true or false, you want its probability to rise or fall accordingly.
To take a logical sentence and just blindly assign it the probability described above, ignoring everything else, is just as bad as taking the Solomonoff prior for the probability of a regular statement (in the standard system of probability), and refusing to update away from that. The role of the P described above is very much like that of the Solomonoff prior in normal inductive Bayesian reasoning. This is nice, and perhaps it is a step forwards, but it isn't a system that can be used by itself for making decisions.
Luckily, there is a way to update on information in Solomonoff induction. You merely "slice off" the worlds that are now impossible given the new evidence, and recalculate. (It can be proven that doing so is equivalent to updating using Bayes' Theorem.)
To my delight, I found that something similar is possible with this system too! This is the truly important insight here, as this gives (for the first time, so far as I know) a method for actually updating on logical probabilities, so that as you advance towards a proof, your probability estimate of that sentence being true approaches 1, but only reaches 1 once you have a full proof.
What you do is exactly the same as in Solomonoff induction: Every time you prove something, you update by recalculating the probability of every statement, given that you now know the newly proven thing. Informally, the reasoning is like this: If you proved a statement, that means you know it with probability 1, or in other words, it can be considered as effectively a new axiom. So add it to your axioms, and you will get your updated probability!
In more technical terms, in a given logical system S, P(φ|ψ) will be defined as the P(φ) I described above, just in the logical system S+ψ, rather than in S. This obeys all the properties we want out of an update on evidence: An update increases the probability we assign to a statement that we proved part of, or proved a lemma for, or whatever, and decreases the probability we assign to a statement that we proved part of its negation, or a lemma for the proof of its negation, or the like.
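To make the update rule concrete, here is a small Python sketch on a deliberately tiny toy system. It is not a real logic: statements are strings, every inference rule has a single premise, and a proof is just a chain of statements starting at an axiom, so the shortest proof length can be found by breadth-first search. The names proof_lengths and prob, and the example statements A through D, are all assumptions made for illustration. Statements unreachable from the axioms are treated as unprovable.

```python
import math
from collections import deque


def proof_lengths(axioms, rules):
    """Shortest proof length of every derivable statement in the toy system.

    rules is a list of (premise, conclusion) pairs. An axiom has proof
    length 1; each inference step adds one line to the proof.
    """
    length = {a: 1 for a in axioms}
    queue = deque(axioms)
    while queue:
        current = queue.popleft()
        for premise, conclusion in rules:
            if premise == current and conclusion not in length:
                length[conclusion] = length[current] + 1
                queue.append(conclusion)
    return length


def prob(statement, axioms, rules):
    """P(statement) = 2^(-ProofLength) + 2^(-1) in the system given by axioms."""
    n = proof_lengths(axioms, rules).get(statement, math.inf)
    return 0.5 if math.isinf(n) else 2.0 ** (-n) + 0.5


# Toy system S: one axiom A and the chain A -> B -> C -> D.
rules = [("A", "B"), ("B", "C"), ("C", "D")]
prior = prob("D", {"A"}, rules)           # P(D) in S
posterior = prob("D", {"A", "C"}, rules)  # P(D | C), i.e. P(D) in S + C
print(prior, posterior)
```

Proving the lemma C and adding it as an axiom shortens the remaining proof of D, so the probability assigned to D rises, and it reaches 1 only once D itself has been proved and added.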
This is not a complete theory of logical uncertainty, but it could be a foundation. It certainly includes some insights I haven't seen before, or at least that I haven't seen explained in these terms. In the upcoming weeks the Tel Aviv meetup group is planning to do a MIRIx workshop on the topic of logical uncertainty, and we hope to make some real strides in it. Perhaps we will expand on this, or perhaps we will come up with some other avenue of attack. If we can give logical uncertainty a formal grounding, that will be a fairly major step. After all, the black box of logical uncertainty sits right at the heart of most attempts to advance AGI, and at the moment it is merely handwaved away in most applications. But eventually it needs an underpinning, and that is what we are aiming at.
[LINK] Why Talk to Philosophers: Physicist Sean Carroll Discusses "Common Misunderstandings" about Philosophy
See also Sean Carroll's own blog entry, Physicists Should Stop Saying Silly Things about Philosophy.
Sean classifies the disparaging comments physicists make about philosophy as follows: "Roughly speaking, physicists tend to have three different kinds of lazy critiques of philosophy: one that is totally dopey, one that is frustratingly annoying, and one that is deeply depressing". Specifically:
- “Philosophy tries to understand the universe by pure thought, without collecting experimental data.”
- “Philosophy is completely useless to the everyday job of a working physicist.”
- “Philosophers care too much about deep-sounding meta-questions, instead of sticking to what can be observed and calculated.”
He counters each argument presented.
Personally, I am underwhelmed, since he does not address the point of view that philosophy is great at asking interesting questions but lousy at answering them. Typically, an interesting answer to a philosophical question requires first recasting it in a falsifiable form, so that it becomes a natural science question, be it physics, cognitive sciences, AI research or something else. This is locally known as hacking away at the edges. Philosophical questions don't have philosophical answers.
Granted, writing is not very effective. But some of us just love writing...
Earning to Give Writing: Which are the places that pay 1 USD or more per word?
Clarification Writing: What needs being written because it is only through writing that these ideas will emerge in the first place?
What should we be writing about if we have already been, for very long, training the craft? What has not yet been written, what is the new thing?
I recently realized that, encouraged by LessWrong, I had been using a heuristic in my philosophical reasoning that I now think is suspect. I'm not accusing anybody else of falling into the same trap; I'm just recounting my own situation for the benefit of all.
I actually am not 100% sure that the heuristic is wrong. I hope that this discussion about it generalizes into a conversation about intuition and the relationship between FAI epistemology and our own epistemology.
The heuristic is this: If the ideal FAI would think a certain way, then I should think that way as well. At least in epistemic matters, I should strive to be like an ideal FAI.
Examples of the heuristic in use are:
--The ideal FAI wouldn't care about its personal identity over time; it would have no problem copying itself and deleting the original as the need arose. So I should (a) not care about personal identity over time, even if it exists, and (b) stop believing that it exists.
--The ideal FAI wouldn't care about its personal identity at a given time either; if it was proven that 99% of all observers with its total information set were in fact Boltzmann Brains, then it would continue to act as if it were not a Boltzmann Brain, since that's what maximizes utility. So I should (a) act as if I'm not a BB even if I am one, and (b) stop thinking it is even a meaningful possibility.
--The ideal FAI would think that the specific architecture it is implemented on (brains, computers, nanomachines, giant look-up tables) is irrelevant except for practical reasons like resource efficiency. So, following its example, I should stop worrying about whether e.g. a simulated brain would be conscious.
--The ideal FAI would think that it was NOT a "unified subject of experience" or an "irreducible substance" or that it was experiencing "ineffable, irreducible quale," because believing in those things would only distract it from understanding and improving its inner workings. Therefore, I should think that I, too, am nothing but a physical mechanism and/or an algorithm implemented somewhere but capable of being implemented elsewhere.
--The ideal FAI would use UDT/TDT/etc. Therefore I should too.
--The ideal FAI would ignore uncomputable possibilities. Therefore I should too.
Arguably, most if not all of the conclusions I drew in the above are actually correct. However, I think that the heuristic is questionable, for the following reasons:
(1) Sometimes what we think of as the ideal FAI isn't actually ideal. Case in point: The final bullet above about uncomputable possibilities. We intuitively think that uncomputable possibilities ought to be countenanced, so rather than overriding our intuition when presented with an attractive theory of the ideal FAI (in this case AIXI) perhaps we should keep looking for an ideal that better matches our intuitions.
(2) The FAI is a tool for serving our wishes; if we start to think of ourselves as being fundamentally the same sort of thing as the FAI, our values may end up drifting badly. For simplicity, let's suppose the FAI is designed to maximize happy human life-years. The problem is, we don't know how to define a human. Do simulated brains count? What about patterns found inside rocks? What about souls, if they exist? Suppose we have the intuition that humans are indivisible entities that persist across time. If we reason using the heuristic I am talking about, we would decide that, since the FAI doesn't think it is an indivisible entity that persists across time, we shouldn't think we are either. So we would then proceed to tell the FAI "Humans are naught but a certain kind of functional structure," and (if our overruled intuition was correct) all get killed.
Note 1: "Intuitions" can (I suspect) be thought of as another word for "Priors."
Note 2: We humans are NOT Solomonoff-induction-approximators, as far as I can tell. This bodes ill for FAI, I think.
Your job, should you choose to accept it, is to comment on this thread explaining the most awesome thing you've done since June 1st. You may be as blatantly proud of yourself as you feel. You may unabashedly consider yourself the coolest freaking person ever because of that awesome thing you're dying to tell everyone about. This is the place to do just that.
Remember, however, that this isn't any kind of progress thread. Nor is it any kind of proposal thread. This thread is solely for people to talk about the awesome things they have done. Not "will do". Not "are working on". Have already done. This is to cultivate an environment of object level productivity rather than meta-productivity methods.
So, what's the coolest thing you've done this month?
Note: This post is part of my series on forecasting for MIRI. I recommend reading my earlier post on the general-purpose forecasting community, my post on scenario planning, and my post on futures studies. Although this post doesn't rely on those, they do complement each other.
Note 2: If I run across more domains where I have substantive things to say, I'll add them to this post (if I've got a lot to say, I'll write a separate post and add a link to it as well). Suggestions for other domains worth looking into, that I've missed below, would be appreciated.
Below, I list some examples of domains where forecasting is commonly used. In the post, I briefly describe each of the domains, linking to other posts of mine, or external sources, for more information. The list is not intended to be comprehensive. It's just the domains that I investigated at least somewhat and therefore have something to write about.
- Weather and climate forecasting
- Agriculture, crop simulation
- Business forecasting, including demand, supply, and price forecasting
- Macroeconomic forecasting
- Political and geopolitical forecasting: This includes forecasting of election results, public opinion on issues, armed conflicts or political violence, and legislative changes
- Demographic forecasting, including forecasting of population, age structure, births, deaths, and migration flows.
- Energy use forecasting (demand forecasting, price forecasting, and supply forecasting, including forecasting of conventional and alternative energy sources; borrows some general ideas from business forecasting)
- Technology forecasting
Let's look into these in somewhat more detail.
Note that for some domains, scenario planning may be more commonly used than forecasting in the traditional sense. Some domains have historically been more closely associated with machine learning, data science, and predictive analytics techniques (this is usually the case when a large number of explanatory variables are available). Some domains have been more closely associated with futures studies, that I discussed here. I've included the relevant observations for individual domains where applicable.
Climate and weather forecasting
- The best weather forecasting methods use physical models rather than statistical models (though some statistics/probability is used to tackle some inherently uncertain processes, such as cloud formation). Moreover, they use simulations rather than direct closed form expressions. Errors compound over time due to a combination of model errors, measurement errors, and hypersensitivity to initial conditions.
- There are two baseline models against which the quality of any model can be judged: persistence (weather tomorrow is predicted to be the same as weather today) and climatology (weather tomorrow is predicted to be the average of the weather on that day over the last few years). We can think of persistence and climatology as purely statistical approaches, and these already do quite well. Any approach that consistently beats them needs to run very computationally intensive weather simulations.
- Even though a lot of computing power is used in weather prediction, human judgment still adds considerable value, about 10-25%, relative to what the computer models generate. This is attributed to humans being better able to integrate historical experience and common sense into their forecasts, and to offer better sanity checks. The use of machine learning tools in sanity-checking weather forecasts might substitute for the human value-added.
- Long-run climate forecasting methods are more robust in the sense of not being hypersensitive to initial conditions. Long-run forecasts require a better understanding of the speed and strength of various feedback mechanisms and equilibrating processes, and this makes them more uncertain. Whereas the uncertainty in short-run forecasts is mostly initial condition uncertainty, the uncertainty in long run forecasts arises from scenario uncertainty, plus uncertainty about the strength of various feedback mechanisms.
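The two baseline models mentioned above, persistence and climatology, are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up, the inputs are plain temperature readings, and mean absolute error is just one possible way to score a candidate model against the baselines.

```python
from statistics import mean


def persistence_forecast(today_value):
    """Persistence: tomorrow is predicted to be the same as today."""
    return today_value


def climatology_forecast(same_day_past_years):
    """Climatology: predict the average for this calendar day over past years."""
    return mean(same_day_past_years)


def mean_absolute_error(forecasts, actuals):
    """Average absolute miss; lower is better."""
    return mean(abs(f - a) for f, a in zip(forecasts, actuals))


# Hypothetical temperatures in degrees Celsius, for illustration only.
print(persistence_forecast(21.0))
print(climatology_forecast([18.0, 20.0, 22.0]))
```

A simulation-based model is only worth its computational cost if its error on held-out days is consistently lower than the error of both baselines.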
With long-term climate forecasting, a common alternative to forecasting is scenario analysis, such as that used by the IPCC in its discussion of long-term climate change. An example is the IPCC Special Report on Emissions Scenarios.
In addition to my overviews of weather and climate forecasting, I also wrote a series of posts on climate change science and some of its implications. These provide some interesting insight into the different points of contention related to making long-term climate forecasts, identifying causes, and making sense of a somewhat politicized realm of discourse. My posts in the area so far are below (I'll update this list with more posts as and when I make them):
- Climate science: how it matters for understanding forecasting, materials I've read or plan to read, sources of potential bias
- Time series forecasting for global temperature: an outside view of climate forecasting
- Carbon dioxide, climate sensitivity, feedback, and the historical record: a cursory examination of the Anthropogenic Global Warming (AGW) hypothesis
- [QUESTION]: What are your views on climate change, and how did you form them?
- The insularity critique of climate science
Agriculture and crop simulation
- Predictions of agricultural conditions and crop yields are made using crop simulation models (Wikipedia, PDF overview). Crop simulation models include purely statistical models, physical models that rely on simulations, and approximate physical models that use functional expressions.
- Weather and climate predictions are a key component of agricultural prediction, because of the dependence of agricultural yield on climate conditions. Some companies, such as The Climate Corporation (website, Wikipedia) specialize in using climate prediction to make predictions and recommendations for farmers.
- Business forecasting includes forecasting of demand, supply, and price.
- Time series forecasting (i.e., trying to predict future values of a variable from past values of that variable alone) is quite common for businesses operating in environments where they have very little understanding of or ability to identify and measure explanatory variables.
- As with weather forecasting, persistence (or slightly modified versions thereof, such as trend persistence that assumes a constant rate of growth) can generally be simple to implement while coming close to the theoretical limit of what can be predicted.
- More about business forecasting can be learned from the SAS Business Forecasting Blog or the Institute of Business Forecasting and Planning website and LinkedIn group.
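As a rough illustration of the persistence baselines described above, the following Python sketch contrasts level persistence with trend persistence under a constant growth rate. The function names and the sales figures are hypothetical, and the series is assumed to have at least two positive observations.

```python
def level_persistence(series):
    """Predict that the next value equals the most recent value."""
    return series[-1]


def trend_persistence(series):
    """Predict that the most recent growth ratio continues for one more period."""
    previous, last = series[-2], series[-1]
    # Equivalent to last * (last / previous): same growth rate, one step on.
    return last * last / previous


# Hypothetical monthly sales figures.
monthly_sales = [100.0, 110.0]
print(level_persistence(monthly_sales))
print(trend_persistence(monthly_sales))
```

Either function can serve as the benchmark that a more elaborate time series method has to beat before it is worth adopting.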
Two commonly used journals in business forecasting are:
- Journal of Business Forecasting (website)
- International Journal of Business Forecasting and Marketing Intelligence (website)
Many of the time series used in the Makridakis Competitions (that I discussed in my review of historical evaluations of forecasting) come from businesses, so the lessons of that competition can broadly be said to apply to the realm of business forecasting (the competition also uses a few macroeconomic time series).
There is a mix of explicit forecasting models and individual judgment-based forecasters in the macroeconomic forecasting arena. However, unlike the case of weather forecasting, where the explicit forecasting models (or more precisely, the numerical weather simulations) improve forecast accuracy to a level that would be impossible for unaided humans, the situation with macroeconomic forecasting is more ambiguous. In fact, the most reliable macroeconomic forecasts seem to arise by taking averages of the forecasts of a reasonably large number of expert forecasters, each using their own intuition, judgment, or formal model. For an overview of the different examples of survey-based macroeconomic forecasting and how they compare with each other, see my earlier post on the track record of survey-based macroeconomic forecasting.
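The aggregation step behind survey-based forecasts can be sketched very simply. In this Python example the function name consensus_forecast and the growth figures are invented for illustration; real surveys differ in which summary statistic they report, so both the mean and the median are shown.

```python
from statistics import mean, median


def consensus_forecast(individual_forecasts):
    """Summarize a panel of forecasts by its mean and its median."""
    return {
        "mean": mean(individual_forecasts),
        "median": median(individual_forecasts),
    }


# Hypothetical GDP growth forecasts (percent) from five forecasters.
gdp_growth_forecasts = [2.0, 2.5, 3.0, 1.5, 3.5]
print(consensus_forecast(gdp_growth_forecasts))
```

The median is less sensitive to a single extreme forecaster, which is one reason a survey might prefer it over the mean.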
Political and geopolitical forecasting
I reviewed political and geopolitical forecasting, including forecasting for political conflicts and violence, in this post. A few key highlights:
- This is the domain where Tetlock did his famous work showing that experts don't do a great job of predicting things, as described in his book Expert Political Judgment. I discussed Tetlock's work briefly in my review of historical evaluations of forecasting.
- Currently, the most reliable source of forecasts for international political questions is The Good Judgment Project (website, Wikipedia), which relies on aggregating the judgments of contestants who are given access to basic data and are allowed to use web searches. The GJP is co-run by Tetlock. For election forecasting in the United States, PollyVote (website, Wikipedia), FiveThirtyEight (website, Wikipedia), and prediction markets such as Intrade (website, Wikipedia) and the Iowa Electronic Markets (website, Wikipedia) are good forecast sources. Of these, PollyVote appears to have done the best, but the others have been more widely used.
- Quantitative approaches to prediction rely on machine learning and data science, combined with text analysis of news of political events.
Forecasting of future population is a tricky business, but some aspects are easier to forecast than others. For instance, the population of 25-year-olds 5 years from now can be determined with reasonable precision by knowing the population of 20-year-olds now. Other variables, such as birth rates, are harder to predict (they can go up or down fast, at least in principle) but in practice, assuming level persistence or trend persistence can often offer reasonably good forecasts over the short term. While there are long-run trends (such as a trend of decline in both period fertility and total fertility) I don't know how well these declines were predicted. I wrote up some of my findings on the recent phenomenon of ultra-low fertility in many countries, so I have some knowledge of fertility trends, but I did not look systematically into the question of whether people were able to correctly forecast specific trends.
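The cohort example above (20-year-olds now becoming 25-year-olds in 5 years) can be written as a one-line projection. The Python sketch below is a simplification under assumptions that are not in the original: a constant annual survival rate, and net migration supplied as a single number for the whole period. The function name and all figures are hypothetical.

```python
def project_cohort(population_now, annual_survival_rate, years, net_migration=0.0):
    """Project the size of a birth cohort some years ahead.

    Survivors are population_now * rate^years; net migration over the
    period is added on top (negative for net emigration).
    """
    return population_now * annual_survival_rate ** years + net_migration


# Hypothetical inputs: one million 20-year-olds, 99.9% annual survival.
projected_25_year_olds = project_cohort(1_000_000, 0.999, 5)
print(projected_25_year_olds)
```

Because survival rates at these ages are close to 1 and change slowly, most of the uncertainty in such a projection comes from migration rather than mortality, which is why this kind of forecast is comparatively easy.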
Gary King (Wikipedia) has written a book on demographic forecasting and also prepared slides covering the subject. I skimmed through his writing, but not enough to comment on it. It seems like mostly simple mathematics and statistics, tailored somewhat to the context of demographics.
With demographics, depending on context, scenario analyses may be more useful than forecasts. For instance, land use planning or city development may be done keeping in mind different possibilities for how the population and age structure might change.
Energy use forecasting (demand and supply)
Short-term energy use forecasting is often treated as a data science or predictive modeling problem, though ideas from general-purpose forecasting also apply. You can get an idea of the state of energy use forecasting by checking out the Global Energy Forecasting Competition (website, Wikipedia), carried out by a team led by Dr. Tao Hong, and cooperating with data science competitions company Kaggle (website, Wikipedia), some of the IEEE working groups, and the International Journal of Forecasting (one of the main journals of the forecasting community).
For somewhat more long-term energy forecasting, scenario analyses are more common. Energy is so intertwined with the global economy that an analysis of long-term energy use often involves thinking about many other elements of the world.
Shell (the organization to pioneer scenario analysis for the private sector) publishes some of its scenario analyses online at the Future Energy Scenarios page. While the understanding of future energy demand and supply is a driving force for the scenario analyses, they cover a wide range of aspects of society. For instance, the New Lens Scenario published in 2012 described two candidate futures for how the world might unfold till 2100, a "Mountains" future where governments played a major role and coordinated to solve global crises, and an "Oceans" future that was more decentralized and market-driven. (For a critique of Shell's scenario planning, see here). Shell competitor BP also publishes an Energy Outlook that is structured more as a forecast than as a scenario analysis, but does briefly consider alternative assumptions in a fashion similar to scenario analysis.
Many people in the LessWrong audience might find technology forecasting to be the first thing that crosses their minds when the topic of forecasting is raised. This is partly because technology improvements are quite salient. Improvements in computing are closely linked with the possibility of an Artificial General Intelligence. Famous among the people who view technology trends as harbingers of superintelligence is technologist and inventor Ray Kurzweil, who has been evaluated on LessWrong before. Websites such as KurzweilAI.net and Exponential Times have popularized the idea of rapid, unprecedented, exponential growth that, despite its fast pace, is somewhat predictable because of its close-to-exponential pattern.
One other point about technology forecasting: compared to other types of forecasting, technology forecasting is more intricately linked with the domain of futures studies (which I described here). Why technology forecasting specifically? Futures studies seems designed more for studying and bringing about change than for determining what will happen at or by a specific time. Technology forecasting, unlike other forms of forecasting, is about forecasting changes in the technology that we use to operate our lives. So this is the most transformative forecasting domain, and it naturally attracts more attention from futures studies.
Note: Please see this post of mine for more on the project, my sources, and potential sources for bias.
I have written a couple of blog posts on my understanding of climate forecasting, climate change, and the Anthropogenic Global Warming (AGW) hypothesis (here and here). I also listed the sources I was using to inform myself here.
I think one question that a number of readers may have had is: given my lack of knowledge of the subject (and my unwillingness to undertake extensive study of it), why am I investigating it at all, rather than relying on the expert consensus as documented by the IPCC, which, even if we're not sure it is correct, is still the best bet humanity has for getting things right? In a future post, I intend to elaborate on my reasons for taking a closer look at the matter while still refraining from making the study of atmospheric science a full-time goal.
Right now, I'm curious to hear how you formed your views on climate change. In particular, I'm interested in answers to questions such as these (not necessarily answers to all of them, or even to only these questions).
- What are your current beliefs on climate change? Specifically, would you defer to the view that greenhouse gas forcing is the main source of long-term climate change? How long-term? Would you defer to the IPCC range for climate sensitivity estimates?
- What were your beliefs on climate change when you first came across the subject, and how did your views evolve (if at all) on further reading (if you did any)? (Obviously, your initial views wouldn't have included beliefs about terms like "greenhouse gas forcing" or "climate sensitivity.")
- What are some surprising things you learned when reading up about climate change that led you to question your beliefs (regardless of whether you changed them)? For instance, perhaps reading about Climategate caused you to critically examine your deference to expert consensus on the issue, but you eventually concluded that the expert consensus was still right.
- If you read my recent posts linked above, did the posts contain information that was new to you? Did any of this information surprise you? Do you think it's valuable to carry out this sort of exercise in order to better understand the climate change debate?