I'm unfortunately swamped right now, though I'd love to spend time working on this. Still, I want to include a few notes, plus reserve a spot to potentially reply more in depth when I decide to engage in some procrastivity.
First, the need for extremizing forecasts (see: Baron, J., Mellers, B. A., Tetlock, P. E., Stone, E., & Ungar, L. H. (2014). Two Reasons to Make Aggregated Probability Forecasts More Extreme. Decision Analysis, 11(2), 133-145. http://dx.doi.org/10.1287/deca.2014.0293) seems like evidence that info-cascades aren't typically the dominant factor in forecasting. However, cf. the usefulness of teaming and sharing as a way to ensure actual reasons get accounted for (Mellers, B., Ungar, L., Baron, J., Ramos, J., Gurcay, B., Fincher, K., ... & Murray, T. (2014). Psychological strategies for winning a geopolitical forecasting tournament. Psychological Science, 25(5), 1106-1115).
Second, the solution Pearl proposed for message passing, which eliminates over-reinforcement / double-counting of data, seems critical and is missing from this discussion. See his book: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades. The assumption of both models, however, is that there is iterated / repeated communication. I suspect that we can model info-cascades as a failure at exactly that point: in the examples given, people publish papers, and there is no dialogue. For forecasting, explicit discussion of forecasting reasons should fix this. (That is, I might say "My model says 25%, but I'm giving that only 50% credence and allocating the rest to the consensus value of 90%, leading to my final estimate of 57.5%")
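To spell out the arithmetic in that example as a tiny sketch (the 50/50 credence split and the specific numbers are purely illustrative):

```python
# Linear blend of my own model's output with the consensus, weighted by how much
# credence I give my own model (the numbers are the illustrative ones above).
def blend(own_p, consensus_p, credence_in_own_model):
    return credence_in_own_model * own_p + (1 - credence_in_own_model) * consensus_p

print(blend(own_p=0.25, consensus_p=0.90, credence_in_own_model=0.5))  # 0.575
```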
Third, I'd be really interested in formulating testable experimental setups on MTurk or similar to show (or fail to show) this occurring, but on reflection this seems non-trivial, and I haven't thought much about how to do it other than to note that it's not as easy as it sounded at first.
Thanks for this.
Re extremizing, the recent (excellent) AI Impacts overview of good forecasting practices notes that "more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke."
That's a great point. I'm uncertain if the analyses account for the cited issue, where we would expect a priori that extremizing slightly would on average hurt accuracy, but in any moderately sized sample (like the forecasting tournament) it is likely to help. It also relates to a point I made in a tweetstorm about why proper scoring rules are not incentive-compatible in tournaments: https://twitter.com/davidmanheim/status/1080460223284948994
Interestingly, a similar dynamic may happen in tournaments, and could be part of where info-cascades occur. I can, in expectation, outscore everyone else slightly and minimize my risk of doing very poorly by putting my predictions a bit to the extreme of the current predictions. It's almost the equivalent of bidding a dollar more than the current high bid on The Price Is Right: you don't need to be close, you just need to beat the other people's scores to win. But if I report my best-strategy answer instead of my true guess, it seems that this could cascade if others are unaware I am doing so.
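For reference, the extremizing step under discussion is usually some variant of pushing the aggregate away from 0.5 in odds space. Here's a minimal sketch of the common log-odds version (the exact transform and exponent used in the tournament analyses may differ):

```python
# Minimal sketch of an extremizing aggregator (assumed form; the published analyses
# may use a different variant): average the individual forecasts, then push the
# result away from 0.5 by raising the odds to a power d > 1.
import statistics

def extremize(p, d=2.0):
    odds = p / (1.0 - p)
    ext_odds = odds ** d
    return ext_odds / (1.0 + ext_odds)

forecasts = [0.70, 0.65, 0.80, 0.75]
mean_p = statistics.mean(forecasts)          # 0.725
print(round(extremize(mean_p, d=2.0), 3))    # ~0.874, i.e. more extreme than the mean
```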
I need to think about this more, but if Aumann agreement is done properly, people eventually converge on correct models of other reasoners, which should also stop info-cascades.
There has been work on this. I believe this is a relevant reference, but I can't tell for sure without paying to access the article:
Protocols Forcing Consensus, Paul Krasucki
The idea is this: Aumann agreement is typically studied with two communicating agents. We can instead study networks of agents, with various protocols (i.e., rules for when agents talk to each other). However, not all such protocols reach consensus the way we see with two agents!
I believe the condition for reaching consensus is directly analogous to the condition for correctness of belief propagation in Bayesian networks, i.e., the graph should be a tree.
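For intuition, here's a toy sketch (my own construction, not taken from Krasucki's paper or Pearl's book) of why loops cause double-counting while trees don't: agents hold log-odds beliefs about a binary hypothesis and pass messages to their neighbours. With the subtract-back rule (don't echo back what the recipient already told you), beliefs on a tree settle at the correct pooled value; on a cycle, or with naive echoing, the same piece of evidence keeps getting re-counted.

```python
# Toy message-passing on a network of agents. Each agent has private evidence
# (a log-odds value). Message from i to j: i's evidence plus everything i has heard,
# optionally excluding what j itself told i ("subtract back").
def run(edges, evidence, rounds=20, subtract_back=True):
    nodes = list(evidence)
    nbrs = {i: set() for i in nodes}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    msg = {(i, j): 0.0 for i in nodes for j in nbrs[i]}   # what i last told j
    for _ in range(rounds):
        new_msg = {}
        for i in nodes:
            for j in nbrs[i]:
                heard = sum(msg[(k, i)] for k in nbrs[i]
                            if not (subtract_back and k == j))
                new_msg[(i, j)] = evidence[i] + heard
        msg = new_msg
    return {i: evidence[i] + sum(msg[(k, i)] for k in nbrs[i]) for i in nodes}

evidence = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}   # one unit of evidence, held by node 0
tree = [(0, 1), (1, 2), (1, 3)]
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]

print(run(tree, evidence))                        # all beliefs settle at 1.0: correct pooling
print(run(cycle, evidence))                       # beliefs keep growing: the signal loops around
print(run(tree, evidence, subtract_back=False))   # naive echoing inflates beliefs even on a tree
```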
We (jacobjacob and Benito) decided to award $150 (out of the total bounty of $800) to this answer (and the additional points made in the discussion).
It offers relevant and robust evidence about the role of info-cascades in forecasting environments, together with a discussion of its interpretation.
I'll PM you about payment details.
Here's a quick bibliography we threw together.
Background:
Information Cascades and Rational Herding: An Annotated Bibliography and Resource Reference (Bikhchandani et al., 2004). The best resource on the topic; see in particular the initial papers on the subject.
Y2K Bibliography of Experimental Economics and Social Science Information Cascades and Herd Effects (Holt, 1999). Less thorough, but it catches some papers the first one misses.
“Information cascade” from Wikipedia. An excellent introduction.
“Understanding Information Cascades” from Investopedia.
Previous LessWrong posts referring to info cascades:
Information cascades, by Johnicholas, 2009
Information cascades in scientific practice, by RichardKennaway, 2009
Information cascades, LW Wiki
And then here are all the LW posts we could find that used the concept (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11). Not sure how relevant they are, but they might be useful in orienting around the concept.
Two recent articles that review the existing economic literature on information cascades:
An earlier review:
Information Cascades in Multi-Agent Models by Arthur De Vany & Cassey Lee has a section with a useful summary of the relevant economic literature up to 1999. (For more recent overviews, see my other comment.) I copy it below, with links to the works cited (with the exception of Chen (1978) and Lee (1999), both unpublished doctoral dissertations, and De Vany and Walls (1999b), an unpublished working paper):
A seminal paper by Bikhchandani et al (1992) explains the conformity and fragility of mass behavior in terms of informational cascades. In a closely related paper Banerjee (1992) models optimizing agents who engage in herd behavior which results in an inefficient equilibrium. Anderson and Holt (1997) are able to induce information cascades in a laboratory setting by implementing a version of Bikhchandani et al (1992) model.
The second strand of literature examines the relationship between information cascades and large fluctuations. Lee (1998) shows how failures in information aggregation in a security market under sequential trading result in market volatility. Lee advances the notion of “informational avalanches” which occurs when hidden information (e.g. quality) is revealed during an informational cascade thus reversing the direction of information cascades.
The third strand explores the link between information cascades and heavy tailed distributions. Cont and Bouchaud (1998) put forward a model with random groups of imitators that gives rise to stock price variations that are heavy-tailed distributed. De Vany and Walls (1996) use a Bose-Einstein allocation model to model the box office revenue distribution in the motion picture industry. The authors describe how supply adapts dynamically to an evolving demand that is driven by an information cascade (via word-of-mouth) and show that the distribution converges to a Pareto-Lévy distribution. The ability of the Bose-Einstein allocation model to generate the Pareto size distribution of rank and revenue has been proven by Hill (1974) and Chen (1978). De Vany and Walls (1996) present empirical evidence that the size distribution of box office revenues is Pareto. Subsequent work by Walls (1997), De Vany and Walls (1999a), and Lee (1999) has verified this finding for other markets, periods and larger data sets. De Vany and Walls (1999a) show that the tail weight parameter of the Pareto-Levy distribution implies that the second moment may not be finite. Lastly, De Vany and Walls (1999b) have shown that motion picture information cascades begin as action-based, noninformative cascades, but undergo a transition to an informative cascade after enough people have seen it to exchange “word of mouth” information. At the point of transition from an uninformed to an informed cascade, there is loss of correlation and an onset of turbulence, followed by a recovery of week to week correlation among high quality movies.
We (jacobjacob and Ben Pace) decided to award $100 (out of the total bounty of $800) to this answer.
It compiles a useful summary of the literature (we learnt a lot from going through one of the papers linked), and it attaches handy links to everything, a task which is very helpful to other people yet tedious and without much marginal benefit for the writer, and so likely to be under-incentivised.
I'll PM you for payment details.
Generally, there is a substantial literature on the topic within the field of network science. The right keywords for Google Scholar are something like "spreading dynamics in complex networks". Information cascades does not seem to be the best choice of keywords.
There are many options for how you model the state of a node (discrete states, oscillators, continuous variables, vectors of any of the above, ...), multiple options for how you represent the dynamics (something like an Ising model / softmax, versions of the voter model, oscillator coupling, ...), and multiple options for how you model the topology (graphs with weighted or unweighted edges, adaptive wiring or not, topologies based on stochastic block models, scale-free networks, Erdős–Rényi, Watts–Strogatz, or real-world network data, ...). This creates a rather large space of options, most of which have already been explored somewhere in the literature.
Possibly the single most important thing to know here is that there are universality classes of systems which exhibit similar behaviour, so you can often ignore the details of the dynamics/topology/state representation.
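To make that modelling space a bit more concrete, here is a minimal sketch of one point in it (my own illustrative choice of dynamics and parameters, not any particular paper's model): an independent-cascade style spread on an Erdős–Rényi graph, recording the distribution of cascade sizes across many random seed nodes.

```python
# Independent-cascade spread on an Erdős–Rényi graph (illustrative parameters only).
import random
from collections import Counter

def erdos_renyi(n, p_edge, rng):
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_edge:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs

def cascade_size(nbrs, p_adopt, rng):
    seed = rng.randrange(len(nbrs))
    active, frontier = {seed}, [seed]
    while frontier:
        nxt = []
        for i in frontier:
            for j in nbrs[i]:
                # each newly active node gets one chance to activate each neighbour
                if j not in active and rng.random() < p_adopt:
                    active.add(j)
                    nxt.append(j)
        frontier = nxt
    return len(active)

rng = random.Random(0)
graph = erdos_renyi(n=200, p_edge=0.03, rng=rng)
sizes = [cascade_size(graph, p_adopt=0.2, rng=rng) for _ in range(1000)]
print(sorted(Counter(sizes).items())[:10])   # cascade-size distribution (smallest sizes first)
```

Sweeping p_adopt through the critical region (roughly where the average number of new adoptions per adopter crosses 1) is where the heavy-tailed cascade-size distributions show up.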
Overall I would suggest approaching this with some intellectual humility and studying the existing research more, rather than trying to reinvent a large part of network science on LessWrong. (My guess is something like >2000 research-years have been spent on the topic, often by quite good people.)
I haven't looked through your links in much detail, but wanted to reply to this:
Overall I would suggest approaching this with some intellectual humility and studying the existing research more, rather than trying to reinvent a large part of network science on LessWrong. (My guess is something like >2000 research-years have been spent on the topic, often by quite good people.)
I either disagree or am confused. It seems good to use resources to outsource your ability to do literature reviews, distillation or extrapolation, to someone with higher comparative advantage...
Agreed. I realise the OP could be misread; I've updated the first paragraph with an extra sentence mentioning that summarising and translating existing work/literature in related domains is also really helpful.
Thanks for the pointers to network science Jan, I don't know this literature, and if it's useful here then I'm glad you understand it well enough to guide us (and others) to key parts of it. I don't see yet how to apply it to thinking quantitatively about scientific and forecasting communities.
If you (or another LWer) thinks that the theory around universality classes is applicable in thinking about how to ensure good info propagation in e.g. a scientific community, and you're right, then I (and Jacob and likely many others) would ...
Short summary of why the linked paper is important: you can think about bias as some sort of perturbation. You are then interested in the "cascade of spreading" of the perturbation, and especially in factors like the distribution of cascade sizes. The universality classes tell you this can be predicted by just a few parameters (Table 1 in the linked paper), depending mainly on the local dynamics (forecaster-forecaster interactions). Now, if you have a good model of the local dynamics, you can determine the parameters and hence which universality class the problem belongs to. You can also try to infer the dynamics if you have good data on the interactions.
I'm afraid I don't know enough about how "forecasting communities" work to be able to give you good guesses about what the points of leverage may be. One quick idea, if you have everybody on the same platform, would be to do some sort of A/B experiment: manipulate the data so that some forecasters see the predictions of others with an artificially introduced perturbation, and see how their output differs from the control group's. If you have data on "individual dynamics" like that, and some knowledge of the network structure, the theory can help you predict the cascade size distribution.
(I also apologize for not being more helpful, but I really don't have time to work on this for you.)
Information cascades does not seem to be the best choice of keywords.
I wouldn't say that 'information cascades' isn't the best choice of keywords. What's happening here is that the same phenomenon is studied by different disciplines in relative isolation from each other. As a consequence, the phenomenon is discussed under different names, depending on the discipline studying it. 'Information cascades' (or, as it is sometimes spelled, 'informational cascades') is the name used in economics, while network science seems to use a variety of related expressions, such as the one you mention.
We (jacobjacob and Ben Pace) decided to award $200 (out of the total bounty of $800) to this answer (and the additional comment below).
It seems to offer a learnt summary of the relevance of network science (which offers a complementary perspective on the phenomenon to the microeconomic literature linked by other commenters), which not implausibly took Jan at least an order of magnitude less time to compile than it would have taken us. (For example, the seemingly simple fact of using a different Google scholar keyword than "information cascade" m...
This paper looks at the dynamics of information flows in social networks using multi-agent reinforcement learning. I haven't read it, but am impressed by the work of the second author. Abstract:
We model the spread of news as a social learning game on a network. Agents can either endorse or oppose a claim made in a piece of news, which itself may be either true or false. Agents base their decision on a private signal and their neighbors' past actions. Given these inputs, agents follow strategies derived via multi-agent deep reinforcement learning and receive utility from acting in accordance with the veracity of claims. Our framework yields strategies with agent utility close to a theoretical, Bayes optimal benchmark, while remaining flexible to model re-specification. Optimized strategies allow agents to correctly identify most false claims, when all agents receive unbiased private signals. However, an adversary's attempt to spread fake news by targeting a subset of agents with a biased private signal can be successful. Even more so when the adversary has information about agents' network position or private signal. When agents are aware of the presence of an adversary they re-optimize their strategies in the training stage and the adversary's attack is less effective. Hence, exposing agents to the possibility of fake news can be an effective way to curtail the spread of fake news in social networks. Our results also highlight that information about the users' private beliefs and their social network structure can be extremely valuable to adversaries and should be well protected.
There are better, simpler results, which I recall but cannot locate right now, on doing this kind of local updating algebraically rather than with deep learning. I did find this, which is related in that it models this type of information flow and shows it works even without fully Bayesian reasoning: Jadbabaie, A., Molavi, P., Sandroni, A., & Tahbaz-Salehi, A. (2012). Non-Bayesian social learning. Games and Economic Behavior, 76(1), 210-225. https://doi.org/10.1016/j.geb.2012.06.001
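Roughly, the flavour of update rule in that line of work is something like the following (a toy paraphrase with made-up numbers and network, not the paper's exact model): each agent mixes a Bayesian update on its own private signal with a weighted average of its neighbours' current beliefs.

```python
# Toy non-Bayesian social learning: part Bayesian update on a private signal,
# part naive averaging of neighbours' beliefs about a binary hypothesis H.
import random

def bayes_update(belief, signal, p_true=0.7):
    like_h = p_true if signal else 1 - p_true          # P(signal | H)
    like_not = (1 - p_true) if signal else p_true      # P(signal | not H)
    num = belief * like_h
    return num / (num + (1 - belief) * like_not)

def step(beliefs, nbrs, self_weight, rng, p_true=0.7, truth=True):
    new = {}
    for i, b in beliefs.items():
        signal = rng.random() < (p_true if truth else 1 - p_true)
        social = sum(beliefs[j] for j in nbrs[i]) / len(nbrs[i])
        new[i] = self_weight * bayes_update(b, signal, p_true) + (1 - self_weight) * social
    return new

rng = random.Random(0)
nbrs = {0: [1], 1: [0, 2], 2: [1]}        # a small line network
beliefs = {i: 0.5 for i in nbrs}
for _ in range(50):
    beliefs = step(beliefs, nbrs, self_weight=0.3, rng=rng)
print(beliefs)                             # beliefs drift toward the true hypothesis
```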
Given those types of results, the fact that RL agents can learn to do this should be obvious. (Though the social game dynamic result in the paper is cool, and relevant to other things I'm working on, so thanks!)
Have you read the Backreaction blog, where Sabine Hossenfelder details much the same phenomenon in high-energy physics? She claims that the prevailing groupthink ended up believing in string theory without a shred of evidence for it (only some vague hints), and with every single prediction of it refuted so far.
UPDATE.
We (jacobjacob and Ben Pace) have finally settled on the allocation of the $800 bounty for this question. All the motivations are summarised in this comment, together with links to the relevant prize-winning answer/comment.
We will also post individual notices with motivations next to each comment for ease of discussing them.
We'll PM all prize winners to sort out logistical details of payment.
Main post
David Manheim (answer and additional points made in discussion) $150
This answer offers relevant and robust evidence about the role of info-cascades in forecasting environments, together with a discussion of its interpretation.
Jan Kulveit (answer and additional comment below) $200
This answer seems to offer a learnt summary of the relevance of network science (which offers a complementary perspective on the phenomenon to the microeconomic literature linked by other commenters), which not implausibly took Jan at least an order of magnitude less time to compile than it would have taken us. (For example, the seemingly simple fact of using a different Google scholar keyword than "information cascade" might have taken several hours to realise for a non-expert.) It also attempts to apply these to the case of forecasting (despite Jan's limited knowledge of the domain), which is a task that would likely have been even harder to do without deep experience of the field.
Pablo (1 and 2) $100
These answers compile a useful summary of the literature (we learnt a lot from going through one of the papers linked), and they attach handy links to everything, a task which is very helpful to other people yet tedious and without much marginal benefit for the writer, and so likely to be under-incentivised.
Michael McLaren $50
This answer:
Ways of responding
David Manheim $50
This answer offers a practical example of a cascade-like phenomenon, which is both generally applicable and has real economic consequences. Also, the fact that it comes with a game to understand and practice responding is rare and potentially quite valuable (I (jacobjacob) am of the opinion that deliberate practice is currently a neglected virtue in the rationality/EA spheres).
rossry $250
This answer does several important things.
Is it necessarily a good idea to break up the topic into so many separate questions before having a general discussion post about it first? I would imagine that people might have comments which were related to several of the different questions, but now the discussion is going to get fragmented over many places.
E.g. if someone knows about a historical info cascade in academia and how people failed to deal with it, then that example falls under two different questions. So the answer with that example either has to be split into two, or be posted in an essentially similar form on both pages, neither of which is good for keeping the entire context of the discussion in one place.
Separately, there's a part of me that finds it viscerally annoying to have multiple questions around the same theme posted around the same time. It feels like it incentivizes people with a pet topic to promote that topic by asking a lot of questions about it so that other topics get temporarily drowned out. Even if the topic is sometimes important enough to be worth it, it still feels like the kind of thing to discourage.
I also have this visceral feeling. It feels like a "subquestions" feature could fix both these issues.
Seems like a sensible worry, and we did consider some version of it. My reasoning was roughly:
1) The questions feature is quite new, and if it turns out to be very valuable, most use-cases and the proper UI haven't been discovered yet (these can be hard to predict in advance without getting users to play around with different things and then talking to them).
No one has yet attempted to use multiple questions. So it would be valuable for the LW team and the community to experiment with that, despite possible countervailing considerations (any good experiment will have sufficient uncertainty that such considerations will always exist).
2) Questions 1/2, 3 and 4 are quite different, and it seems good to be able to do research on one sub-problem without taking mindshare from everyone working on the other sub-problems.
This is an update on the timeline for paying out the bounties on this question. They will be awarded for work done before May 13th, but we're delayed by another few weeks in deciding on the allocation. Apologies!
Nissen et al. 2016 ("Publication bias and the canonization of false facts") give a simple model for how publication bias in academic research can have a similar effect to the "information cascades" described in the OP. False scientific claims are likely to be falsified by any given experiment, but will sometimes be found to be true. Positive results supporting a claim may be more likely to be published than negative results against it. The authors' model assumes that the scientific community's credence in the claim is determined by the number of published positive and negative results, and that new studies will repeatedly test the claim until the credence becomes sufficiently close to 0 or 1. The publication bias favoring positive results can overpower the odds against getting a positive result in any given experimental replication, and lead to false claims becoming canonized as fact with non-negligible probability.
The mechanism here differs in a sense from the "information cascade" examples in the OP and on the Wikipedia page in that the false claim is being repeatedly tested with new experiments. However, I think it could be seen as fundamentally the same as the citation bias example of Greenberg 2009 in the OP, if we think of the scientific community rather than an individual scientist as being the actor. In the Greenberg 2009 example, the problem is that individual scientists tend only to cite positive findings; in the Nissen et al model, the scientific community tends to only publish positive findings. (Of course, this second problem feeds into the first.)
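Here is a minimal sketch of that dynamic as I read it (the parameter values are my own illustrative choices, not the paper's): a false claim is tested repeatedly; positive results are always published while negative ones often aren't; the community updates naively on the published record until the claim is either canonized or rejected.

```python
# Toy canonization-of-false-facts dynamic (illustrative parameters, not the paper's).
import math
import random

def run_claim(rng, alpha=0.05, power=0.8, pub_neg=0.1, prior=0.1,
              canonize=0.99, reject=0.01):
    log_odds = math.log(prior / (1 - prior))
    hi = math.log(canonize / (1 - canonize))
    lo = math.log(reject / (1 - reject))
    while lo < log_odds < hi:
        positive = rng.random() < alpha            # the claim is actually false
        if not positive and rng.random() > pub_neg:
            continue                               # negative result goes unpublished
        # naive Bayesian update on the published result, ignoring publication bias
        log_odds += math.log(power / alpha) if positive else math.log((1 - power) / (1 - alpha))
    return log_odds >= hi                          # True if the false claim got canonized

rng = random.Random(0)
runs = 10000
print(sum(run_claim(rng) for _ in range(runs)) / runs)
# A non-negligible fraction of false claims end up canonized when negative results
# are rarely published.
```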
Meta: Because we think understanding info cascades is important, we recently spent ~10 hours trying to figure out how to quantitatively model them, and have contributed our thinking as answers below. While we don't currently have time to continue exploring, we wanted to experiment with seeing how much the LW community could together build on top of our preliminary search, so we've put up a basic prize for more work and tried to structure the work around a couple of open questions. This is an experiment! We're looking forward to reading any of your contributions to the topic, including things like summaries of existing literature and building out new models of the domain.
Background
Consider the following situation:
An information cascade occurs when people update on each other's beliefs, rather than sharing the causes of those beliefs, and those beliefs end up with a vestige of support that far outstrips the evidence for them. Satvik Beri might describe this as the problem of only sharing the outputs of your thinking process, not your inputs.
The dynamics here are perhaps reminiscent of those underlying various failures of collective rationality such as asset bubbles, bystander effects and stampedes.
Note that this effect is different from other problems of collective rationality like the replication crisis, which involve low standards for evidence (such as unreasonably lax p-value thresholds or coordination problems preventing publishing of failed experiments), or the degeneracy of much online discussion, which involves tribal signalling and UI encouraging problematic selection effects. Rather, information cascades involve people rationally updating without any object-level evidence at all, and would persist even if the replication crisis and online outrage culture disappeared. If nobody lies or tells untruths, you can still be subject to an information cascade.
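To make the mechanism concrete, here's a minimal sketch in the spirit of the classic sequential-choice model (the signal accuracy, tie-breaking rule, and other details are our own illustrative choices): agents act in order, each seeing a private signal plus everyone's earlier choices; once the public record outweighs any single signal, rational agents start ignoring their own evidence and simply copy the herd.

```python
# Toy sequential cascade: binary true state, each agent gets a private signal that
# is correct with probability Q, sees all earlier choices, and picks the option its
# posterior favours (breaking exact ties in favour of its own signal).
import math
import random

Q = 0.7
STEP = math.log(Q / (1 - Q))       # log-odds contribution of a single signal

def choose(public_log_odds, signal):
    posterior = public_log_odds + (STEP if signal else -STEP)
    if posterior == 0:             # exact tie: follow own signal
        return signal
    return posterior > 0

def run(n_agents, rng, true_state=True):
    public_log_odds, choices = 0.0, []
    for _ in range(n_agents):
        signal = (rng.random() < Q) if true_state else (rng.random() > Q)
        c = choose(public_log_odds, signal)
        choices.append(c)
        # A choice reveals the private signal only while the decision still depends
        # on it; once the public evidence is worth two signals, choices carry no
        # information and the public belief stops updating: a cascade has started.
        if abs(public_log_odds) < 2 * STEP:
            public_log_odds += STEP if c else -STEP
    return choices

rng = random.Random(0)
runs = [run(12, rng) for _ in range(10000)]
print(sum(not r[-1] for r in runs) / len(runs))
# A noticeable minority of runs herd on the wrong answer, even though every agent
# updated rationally on everything they could observe.
```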
Questions
Ben and I are confused about how to think about the negative effects of this problem. We understand the basic idea, but aren't sure how to reason quantitatively about the impacts, or how to trade off solving these problems in a community against making other improvements to that community's overall efficacy and efficiency. We currently only know how to think about these things qualitatively.
We’re posting a couple of related questions that we have some initial thoughts on, that might help clarify the problem.
How common, and how large, are info-cascades in communities that seek to make intellectual progress, such as academia?
How can we quantify the impact (harm) of info-cascades, and think about them in cost-effectiveness terms?
What have been some historically effective ways of responding to cascades, and where have those approaches failed?
How do you mathematically formalise information cascades around continuous variables?
If you have something you’d like to contribute, but that doesn’t seem to fit into the related questions above, leave it as an answer to this question.
Bounties
We are committing to pay at least either $800 or (No. of answers and comments * $25), whichever is smaller, for work on this problem recorded on LW, done before May 13th. The prize pool will be split across comments in accordance with how valuable we find them, and we might make awards earlier than the deadline (though if you know you’ll put in work in x weeks, it would be good to mention that to one of us via PM).
Ben and Jacob are each responsible for half of the prize money.
Jacob is funding this through Metaculus AI, a new forecasting platform tracking and improving the state of the art in AI forecasting, partly to help avoid info-cascades in the AI safety and policy communities (we're currently live and inviting beta users; you can sign up here).
Examples of work each of us is especially excited about:
Jacob
Contributions to our Guesstimate model (linked here), such as reducing uncertainty on the inputs or using better models.
Extensions of the Guesstimate model beyond biomedicine, especially in ways that make it more directly applicable to the rationality/effective altruism communities
Examples and analysis of existing interventions that deal with this and what makes them work, possibly suggestions for novel ones (though avoiding the trap of optimising for good-seeming ideas)
Discussion of how the problem of info-cascades relates to forecasting
Ben
Concise summaries of relevant papers and their key contributions
Clear and concise explanations of what other LWers have found (e.g. turning 5 long answers into 1 medium sized answer that links back to the others while still conveying the key info. Here’s a good example of someone distilling an answer section).