Thank you; I think this has direct applications for evaluating research in everyday life. Specifically, the most valuable role an expert can play is not synthesis of evidence (presenting a conclusion) but simply ensuring that there is an accurate overview of what evidence is available. I should increase my credence in experts who seem to be engaging in this behavior (collecting and summarizing evidence) and lower my credence in experts who engage in lots of synthesis. Likewise, I should not bother synthesizing myself, but instead devote effort to finding the best available evidence, collating it, and then getting feedback from many others on what the best synthesis would be. Perhaps I could do things like Rot13-ing my own preliminary conclusions and asking people to comment on the evidence before reading them.
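For what it's worth, the Rot13 step is trivial to automate. Here is a minimal Python sketch (the conclusion text is just a made-up placeholder) of how one might encode a preliminary conclusion so readers can weigh the evidence before decoding it:

```python
import codecs

# A made-up preliminary conclusion, used only as a placeholder.
conclusion = "My tentative read: the evidence favors hypothesis A over B."

# Rot13-encode it so readers can review the evidence first,
# then decode only when they are ready to compare conclusions.
encoded = codecs.encode(conclusion, "rot_13")
print(encoded)

# Decoding is the same operation applied again.
print(codecs.decode(encoded, "rot_13"))
```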
The key is to see how much their experience in the subject matter has made them an expert at forecasting the subject matter, or at forecasting in general. A doctor may be an expert in the mechanics of a disease process, but wholly incompetent at statistical inference about that disease process.
Good post. To which I would add...
There is much more to expertise than forecasting. Also, "things" could include social systems, people, business structures, and advertising campaigns, not just machines, of course.
A person may be a very good football coach, in the sense of putting together a team that wins or fixing a losing team, but may not be very good at making predictions. Doctors are notoriously bad at predicting patient outcomes but are often very skilled at actually treating them.
I think to a degree you confuse assessing whether a group does have expertise with assessing whether they are *likely* to have expertise.
As for factors that count against expertise being reliably or significantly present, to your point about politics I would add:
1. Money. The medical literature is replete with studies showing huge effect sizes from "who paid the piper". In pharmaceutical research this seems to result in roughly a 4x difference in the chance of a positive result. But there is more to it than that; the ability to offer speaking and consultancy fees, funding of future projects, and so on can have a powerful effect.
Another example is the problem alluded to in relation to the consensus about the historical Jesus. When a field is dominated by people whose livelihood depends on their continuing to espouse a certain belief, the effect goes beyond those individuals and infects the whole field.
2. The pernicious effects of "great men" who can suppress dissent against their out-of-date views. Is the field realistically pluralistic, and is dissent allowed? Science advances funeral by funeral. Have a look at what happened to John "Pure, White and Deadly" Yudkin.
3. Politics beyond what we normally think of as politics. Academia is notoriously "political" in this wider sense. Amplifying your point about reality checks, if feedback is not accurate, rapid, and unambiguous, it is hard for people in the field to know who is right, if anyone.
4. "Publish" or perish. There are massive incentives to get published or to get publicity or a high profile. This leads to people claiming expertise, results, achievements that are bogus. Consider for example the case of Theranos, which seemed, if media reports are accurate, to have no useful ability to build systems that did pathology tests, yet apparently hoodwinked many into thinking that they did.
You make a good point that claims of expertise without evidence or, worse, in the face of adverse evidence, are really really bad. I would go as far as to say that if you claim expertise but cannot prove it, I have a strong prior that you don't have it.
There are large groups of self-described experts who do not have expertise or at best have far less than they think. One should be alert to the possibility that "experts" aren't.
Also, to reinforce a very important point: even when experts are not very expert, they are probably a lot better than you + Google + 30 minutes!
I believe we should use analytics to find the commonalities in the opinions of groups of interacting experts.
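For concreteness, here is one minimal sketch of what that could look like (the experts and claims are invented, and this is only one of many possible approaches): tally which claims are endorsed by a majority of a panel.

```python
from collections import Counter

# Made-up panel: which claims each expert endorses.
expert_opinions = {
    "expert_1": {"claim_A", "claim_B"},
    "expert_2": {"claim_A", "claim_C", "claim_D"},
    "expert_3": {"claim_A", "claim_B", "claim_C"},
}

# Count how many experts endorse each claim.
endorsements = Counter(
    claim for claims in expert_opinions.values() for claim in claims
)

# The "commonalities" are the claims endorsed by a majority of the panel.
majority = len(expert_opinions) / 2
common_ground = sorted(c for c, n in endorsements.items() if n > majority)
print(common_ground)  # ['claim_A', 'claim_B', 'claim_C']; claim_D falls short
```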
This post explores the question: how strongly should we defer to predictions and forecasts made by people with domain expertise? I'll assume that the domain expertise is legitimate, i.e., the people with domain expertise do have a lot of information in their minds that non-experts don't. The information is usually not secret, and non-experts can usually access it through books, journals, and the Internet. But experts have more information inside their heads, and may understand it better. How big an advantage does this give them in forecasting?
Tetlock and expert political judgment
In an earlier post on historical evaluations of forecasting, I discussed Philip E. Tetlock's findings on expert political judgment and forecasting skill, and summarized his own article for Cato Unbound co-authored with Dan Gardner that in turn summarized the themes of the book:
Tetlock has since started The Good Judgment Project (website, Wikipedia), a political forecasting competition where anybody can participate, and which has a reputation for doing a much better job at prediction than anything else around. Participants are given a set of questions and can basically collect freely available online information (in some rounds, participants were given additional access to some proprietary data). They then use that to make predictions. The aggregate predictions are quite good. For more information, visit the website or see the references in the Wikipedia article. In particular, this Economist article and this Business Insider article are worth reading. (I discussed the GJP and other approaches to global political forecasting in this post.)
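The GJP's actual aggregation methods aren't covered here, but to make "aggregate predictions" concrete, here is a minimal sketch (with invented numbers) of the simplest ways to combine individual probability forecasts on a yes/no question:

```python
# Invented individual probability forecasts for a single yes/no question.
forecasts = [0.60, 0.72, 0.55, 0.80, 0.65]

# Simplest aggregate: the unweighted mean of the individual probabilities.
mean_forecast = sum(forecasts) / len(forecasts)

# A common alternative is the median, which is more robust
# to a single badly calibrated forecaster.
median_forecast = sorted(forecasts)[len(forecasts) // 2]

print(round(mean_forecast, 3), median_forecast)  # 0.664 0.65
```

Even these crude aggregates tend to cancel out individual noise; more sophisticated schemes weight forecasters by track record, but that is beyond this sketch.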
So at least in the case of politics, it seems that amateurs, armed with basic information plus the freedom to look around for more, can use "fox-like" approaches and do a better job of forecasting than political scientists. Note that experts still do better than ignorant non-experts who are denied access to information. But once you have basic knowledge and are equipped to hunt more down, the constraining factor does not seem to be expertise, but rather, the approach you use (fox-like versus hedgehog-like). This should not be taken as a claim that expertise is irrelevant or unnecessary to forecasting. Experts play an important role in expanding the scope of knowledge and methodology that people can draw on to make their predictions. But the experts themselves, as people, do not have a unique advantage when it comes to forecasting.
Tetlock's research focused on politics. But the claim that the fox-hedgehog distinction is a better predictor of forecasting performance than the level of expertise is a general one. How true is this claim in domains other than politics? Domains such as climate science, economic growth, computing technology, or the arrival of artificial general intelligence?
Armstrong and Green again
J. Scott Armstrong is a leading figure in the forecasting community. Along with Kesten C. Green, he penned a critique of the forecasting exercises in climate science in 2007, with special focus on the IPCC reports. I discussed the critique at length in my post on the insularity critique of climate science. Here, I quote a part from the introduction of the critique that better explains the general prior that Armstrong and Green claim to be bringing to the table when they begin their evaluation. Of the points they make at the beginning, two bear directly on the deference we should give to expert judgment and expert consensus:
Armstrong and Green later elaborate on these claims, referencing Tetlock's work. (Note that I have removed the parts of the section that involve direct discussion of climate-related forecasts, since the focus here is on the general question of how much deference to show to expert consensus).
Note that the claims Armstrong and Green make are in relation to unaided expert judgment, i.e., expert judgment that is not aided by some form of assistance or feedback that promotes improved forecasting. (One can argue that expert judgment in climate science is not unaided, i.e., that the critique is misapplied to climate science, but whether that is the case is not the focus of my post.) While Tetlock's suggestion is to be more fox-like, Armstrong and Green recommend the use of their own forecasting principles, as encoded in their full list of principles and described on their website.
A conflict of intuitions, and an attempt to resolve it
I have two conflicting intuitions here. I like to use the majority view among experts as a reasonable Bayesian prior to start with, that I might then modify based on further study. The relevant question here is who the experts are. Do I defer to the views of domain experts, who may know little about the challenges of forecasting, or do I defer to the views of forecasting experts, who may know little of the domain but argue that domain experts who are not following good forecasting principles do not have any advantage over non-experts?
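To make the "majority view as a Bayesian prior" idea concrete, here is a minimal worked sketch with invented numbers: take the prior from the expert majority and update it on one piece of further evidence via Bayes' rule.

```python
# Invented numbers, purely for illustration.
prior = 0.8            # prior that the expert-majority view is correct
p_e_given_h = 0.3      # chance of seeing this new evidence if the view is correct
p_e_given_not_h = 0.6  # chance of seeing it if the view is wrong

# Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
posterior = p_e_given_h * prior / p_e

print(round(posterior, 3))  # 0.667: the surprising evidence pulls the prior down
```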
I think the following heuristics are reasonable starting points:
Politicization?
My first thought was that the more politicized a field, the less reliable any forecasts coming out of it. I think there are obvious reasons for that view, but there are also countervailing considerations.
The main claimed danger of politicization is groupthink and lack of openness to evidence. It could even lead to suppression, misrepresentation, or fabrication of evidence. Quite often, however, we see these qualities in highly non-political fields. People believe that certain answers are the right ones. Their political identity or ego is not attached to these answers. They just have high confidence that that answer is correct, and when the evidence they have does not match up, they think there is a problem with the evidence.

Of course, if somebody does start challenging the mainstream view, and the issue is not quickly resolved either way, it can become politicized, with competing camps of people who hold the mainstream view and people who side with the challengers. Note, however, that the politicization has arguably reduced the aggregate amount of groupthink in the field. Now that there are two competing camps rather than one received wisdom, new people can examine evidence and better decide which camp is more on the side of truth. People in both camps, now that they are competing, may try to offer better evidence that could convince the undecideds or skeptics. So "politicization" might well improve the epistemic situation (I don't doubt that the opposite happens quite often).

Examples of such politicization might be the replacement of geocentrism by heliocentrism, the replacement of creationism by evolution, and the replacement of Newtonian mechanics by relativity and/or quantum mechanics. In the first two cases, religious authorities pushed against the new idea, even though the old idea had not been a "politicized" tenet before the competing claims came along. In the case of Newtonian and quantum mechanics, the debate seems to have been largely intra-science, but quantum mechanics had its detractors, including Einstein, famous for the "God does not play dice" quip. (This post on Slate Star Codex is somewhat related.)
The above considerations aren't specific to forecasting, and they apply even for assertions that fall squarely within the domain of expertise and require no forecasting skill per se. The extent to which they apply to forecasting problems is unclear. It's unclear whether most domains have any significant groupthink in favor of particular forecasts. In fact, in most domains, forecasts aren't really made or publicly recorded at all. So concerns of groupthink in a non-politicized scenario may not apply to forecasting. Perhaps the problem is the opposite: forecasts are so unimportant in many domains that the forecasts offered by experts are almost completely random and hardly informed in a systematic way by their expert knowledge. Even in such situations, politicization can be helpful, in so far as it makes the issue more salient and might prompt individuals to give more attention to trying to figure out which side is right.
The case of forecasting AI progress
I'm still looking at the case of forecasting AI progress, but for now, I'd like to point people to Luke Muehlhauser's excellent blog post from May 2013 discussing the difficulty of forecasting AI progress. Interestingly, he makes many points similar to those I make here. (Note: Although I had read the post around the time it was published, I didn't reread it until after I finished drafting the rest of my current post. Nonetheless, my views can't be considered totally independent of Luke's, because we've discussed my forecasting contract work for MIRI.)
Looking for thoughts
I'm particularly interested in thoughts from people on the following fronts: