Curated.
There's a certain challenge in articulating theories, but another challenge in showing that those theories are borne out in the real world. I really value this post for taking the contents of a highly upvoted post that only used a vivid illustration and showing that you actually see those things in the wild. It's confirming that the map actually matches the territory, and I'd love to see that happen for even more of the ideas developed on LessWrong. Kudos!
Another possible example:
If we view markets as prediction systems, there is a great example of a self-fulfilling prophecy in the form of the Black-Scholes option pricing model. Before its publication, option prices were very random and could be almost anywhere. Once a (supposedly) normative model for prices was available, people's willingness to trade converged to those prices fairly quickly.
(This simplifies slightly, because part of the B-S model was arbitrage, which allowed markets to reinforce these "correct" prices, but it's a useful example of when a prediction can stabilize the system.)
For anyone interested, the keyword to read about things like this in the economics literature is "performativity".
There's a legend about a stock market prediction scam:
Pick 2^N potential targets. Send half of them a prediction that a stock will go up and the other half a prediction that it will go down. Eliminate the people who got a wrong prediction, and then do this again and again. Eventually you'll end up with one guy who's convinced you're never wrong, so charge him an arm and a leg for investment advice.
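For concreteness, here is a minimal sketch of the halving procedure described above. The function name `run_scam` and the coin-flip model of the stock are my own illustrative assumptions, not part of the legend; the point is only that exactly one recipient ends up with a perfect record no matter what the stock does.

```python
import random


def run_scam(n_rounds: int, seed: int = 0) -> list[int]:
    """Simulate the legend: return the recipients who saw only correct calls.

    Hypothetical toy model: the stock's direction each round is a fair coin flip.
    """
    rng = random.Random(seed)
    recipients = list(range(2 ** n_rounds))  # 2^N potential targets
    for _ in range(n_rounds):
        half = len(recipients) // 2
        told_up, told_down = recipients[:half], recipients[half:]
        went_up = rng.random() < 0.5
        # Keep only the half that received the prediction that came true.
        recipients = told_up if went_up else told_down
    return recipients


survivors = run_scam(n_rounds=10)
print(f"Started with {2 ** 10} targets, {len(survivors)} saw ten correct calls in a row.")
```

Changing the seed changes which recipient survives, but never how many: the scammer needs no predictive skill at all.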
You can see this effect in election predictions: plenty of smallish predictors called the result of the current election closely, but it's easy to speculate that this is just a selection effect.
Thankfully, this scam is far less viable now that people can google the writers of these predictions.
And there was always the simple defense of not trusting stock picks from anyone who isn't already very wealthy and successfully managing other people's money in public view.
Thanks for this selection of examples! Predict-o-Matic scenarios are some of the short term scenarios that worry me the most, and it's great to see someone tackling them.
What I would personally want to know is "Which minimal conditions are necessary for a Predict-O-Matic scenario to appear?". Splitting the issues as you did will definitely help in answering that question!
"Superforecasters learning to choose easier questions"
Just wanted to note that it's not easier questions, per se, it's ones where you have a marginal advantage due to information or skill asymmetry. And because it's a competition, at least sometimes, you have an incentive to predict on questions that are being ignored as well. There are definitely fewer people forecasting more intrinsically uncertain questions, but since participants get scored with the superforecaster median for questions they don't answer, that's a resource allocation question, rather than the system interfering with the real world. We see this happening broadly when prediction scoring systems don't match incentives, but I've discussed that elsewhere, and there was a recent LW post on the point as well.
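To illustrate the scoring point above, here is a rough sketch. It assumes Brier scoring (lower is better) and that a skipped question is scored with the median forecast; the function names and the numbers are hypothetical, chosen only to show the incentive.

```python
from typing import Optional


def brier(forecast: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2


def question_score(own: Optional[float], median: float, outcome: int) -> float:
    """Score one question; a skipped question (own is None) is imputed the median forecast."""
    forecast = median if own is None else own
    return brier(forecast, outcome)


median, outcome = 0.6, 1  # hypothetical question that resolves "yes"
skipped = question_score(None, median, outcome)
better_informed = question_score(0.8, median, outcome)
worse_informed = question_score(0.4, median, outcome)

print(f"skip: {skipped:.2f}, edge over median: {better_informed:.2f}, "
      f"worse than median: {worse_informed:.2f}")
```

Under these assumptions, skipping costs nothing relative to the median, so effort is best spent on questions where you expect to beat it, which is the resource allocation point rather than a preference for easy questions as such.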
Mostly, this type of interference runs from real-world goals to predictions, rather than the reverse. We do see some cases of interference with prediction markets aimed at changing real-world outcomes, for instance in the first half of the 20th century: "The newspapers periodically contained charges that the partisans were manipulating the reported betting odds to create a bandwagon effect." (Rhode and Strumpf, 2003)
I read that line differently, though I agree with your remarks. "Superforecasters learning to choose easier questions" was, to me, at least as much about the suite of questions posed to the forecasters as the questions each individual forecaster chooses to answer. If a forecasting firm wants to build a reputation, they could potentially learn how to ask questions that look harder to answer than they really are.
That's a good point. For some of the questions, that's a reasonable criticism, but as GJ Inc. becomes increasingly based on client-driven questions, it's a less viable strategy.
Note that Trump got around 63M votes in 2016, and around 71M in 2020, whereas Democrats got 66M and 75M respectively.
The 2020 results are 81M-74M with some votes still left to count. 75M-71M might have been the margin a few weeks ago when there were still a bunch more not-yet-counted votes.
Related to the ReplicationMarkets example: on Metaculus, there is an entire category of self-resolving questions, where resolution is at least in part determined by how users predict the question will resolve. We have seen at least one instance of manipulation of such questions. And there is even a kind of meta-self-resolving question, asking users to predict what the sentiment of Metaculus users will be with regard to self-resolving questions.
Predictive accuracy brings trust, and trust brings power. Making a series of correct and meaningful predictions can bring fame and fortune.
It's actually surprising that people don't do it more. Even if they're just guessing, it's a little bit like buying a lottery ticket. Maybe this is because our society has enforcement mechanisms against wild prognostication. You have to earn the right to make forecasts.
Perhaps we can view credentialism, in this light, as a guard against false positives.
Unfortunately, we don't take the further step of vetting the predictive track record of the people with credentials. We just kind of assume we know what they're talking about.
I think this is exactly what most pundits do, and it's well known that correct predictions are reputation makers.
The problem is that making more than one correct but still low-probability prediction is incredibly unlikely, since you multiply two small numbers. This functions as a very strong filter. And you don't need to carefully vet track records to see when someone loudly gets it wrong, so as we see, most pundits stop making clear and non-consensus predictions once they start making money as pundits.
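As a back-of-the-envelope illustration of that filter, here is a small sketch. It assumes the predictions are independent and equally unlikely, and the 10% figure and the pool of 1,000 guessers are made-up numbers, not data.

```python
def chance_of_streak(p_single: float, n_calls: int) -> float:
    """Probability that n independent long-shot calls, each with probability p_single, all come true."""
    return p_single ** n_calls


p = 0.10          # hypothetical chance that one non-consensus call is right
guessers = 1_000  # hypothetical number of people making pure guesses

for n in (1, 2, 3):
    prob = chance_of_streak(p, n)
    print(f"{n} correct call(s): probability {prob:.4f}, "
          f"expected lucky guessers about {guessers * prob:.0f}")
```

Each extra call multiplies in another small number, so pure guessers thin out very quickly, while a single loud miss is visible to everyone.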
Another example, from @albrgr
"This is kind of crazy: https://nber.org/digest-202012/corporate-reporting-era-artificial-intelligence Companies have learned to use (or exclude) certain words to make their corporate filings be interpreted more positively by financial ML algorithms."
Then quoting from the article:
The researchers find that companies expecting higher levels of machine readership prepare their
disclosures in ways that are more readable by this audience. "Machine readability" is measured in
terms of how easily the information can be processed and parsed, with a one standard deviation
increase in expected machine downloads corresponding to a 0.24 standard deviation increase in
machine readability. For example, a table in a disclosure document might receive a low readability
score because its formatting makes it difficult for a machine to recognize it as a table. A table in a
disclosure document would receive a high readability score if it made effective use of tagging so
that a machine could easily identify and analyze the content.
Companies also go beyond machine readability and manage the sentiment and tone of their
disclosures to induce algorithmic readers to draw favorable conclusions about the content. For
example, companies avoid words that are listed as negative in the directions given to algorithms.
The researchers show this by contrasting the occurrence of positive and negative words from the
Harvard Psychosocial Dictionary — which has long been used by human readers — with those
from an alternative, finance-specific dictionary that was published in 2011 and is now used
extensively to train machine readers. After 2011, companies expecting high machine readership
significantly reduced their use of words labelled as negatives in the finance-specific dictionary,
relative to words that might be close synonyms in the Harvard dictionary but were not included in
the finance publication. A one standard deviation increase in the share of machine downloads for a
company is associated with a 0.1 percentage point drop in negative-sentiment words based on the
finance-specific dictionary, as a percentage of total word count.
In particular, I’d appreciate more examples of prediction systems making the world more predictable.
There is a possibly apocryphal anecdote about how, prior to the publication of the Black-Scholes model, option prices approximately reflected theory. After the publication of the model, option prices precisely reflected theory because everyone was using the model to price options!
I have never been able to find a source for this story but it should be easy enough to verify through historical options data.
EDIT: I apparently failed to read as far as the first comment: https://www.lesswrong.com/posts/6bSjRezJDxR2omHKE/real-life-examples-of-prediction-systems-interfering-with?commentId=2kKZ87cQxmMviyJmc
I think crypto markets can't be regulated except by random moderators' filtering of bets and bettors' choices of where to put money. It seems someone could put a million dollars against a terrorist attack on a certain date and hope someone bets against it & executes to get the money. So a betting market allows hiring for certain tasks (not most tasks) with reliable verification & payout, and you get your money back if it doesn't happen. I have some faith in moderators' filters, though. I hope they would have the wisdom to forbid bets on terrorist attacks, assassinations, etc. Insider trading cannot be prevented (as far as I can tell) if betting is anonymous…
One such example that comes to my mind, and that happens all the time: a grocery store sends its supplier a forecast of how much it wants to buy, and the supplier gets the goods and delivers them. The store cannot sell more than it has =) So the forecast impacts reality: even if twice as many customers actually want to buy that product, the store will sell only what it forecasted. Measured against sales, the forecast is 100% accurate (because the store sells everything it forecasted), but measured against true demand it was a very, very bad forecast.
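A tiny sketch of the grocery example, with made-up numbers: the store stocks exactly what it forecast, so observed sales are capped at the forecast. The function name `observed_sales` and the figures are hypothetical.

```python
def observed_sales(true_demand: int, stocked: int) -> int:
    """The store cannot sell more units than it has on the shelf."""
    return min(true_demand, stocked)


forecast = 100     # units ordered from the supplier (hypothetical)
true_demand = 200  # units customers actually wanted (hypothetical)

sales = observed_sales(true_demand, stocked=forecast)
error_vs_sales = abs(forecast - sales)         # what the store's records show
error_vs_demand = abs(forecast - true_demand)  # what actually happened

print(f"sales={sales}, error vs sales={error_vs_sales}, error vs demand={error_vs_demand}")
```

Scored against recorded sales the forecast looks perfect, while the error against true demand is as large as the forecast itself; the store's records alone cannot tell the two apart.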
Two other examples:
Thanks to Ozzie Gooen for reviewing this post.
Introduction
The Parable of the Predict-O-Matic is a short story which considers a forecasting system which is ostensibly set up to maximize accuracy, and which ends up interfering with the world in unintended ways. In the original story, some of these problems were:
Below, I give some real-life examples of these problems, though some are speculative.
Previous work:
Fake polls by PredictIt forecasters
Example of: Markets for entropy.
PredictIt traders created fake polls to fool and troll other forecasters and the media, per FiveThirtyEight’s Fake Polls Are A Real Problem. Quoting liberally from the article:
(the story then continues).
The paper Fake Polls, Real Consequences: The Rise of Fake Polls and the Case for Criminal Liability contains many more examples in pages 140 to 150 (13 to 23 of the linked pdf):
Stock markets
Example of: Self-fulfilling prophecies, markets for entropy.
This example was mentioned in the original Predict-O-Matic story: "If it says stocks will rise, they'll rise." One sometimes sees this effect with companies Warren Buffett is rumored to be buying.
Additionally, hedge funds normally try to predict which companies will do better, but companies such as Third Point Management also exist:
Further, rules against insider trading exist in order to avoid markets for entropy; otherwise a CEO of a company could profit by shorting its stock and running the company into the ground. More narratively satisfying, in Casino Royale the villain buys put options on an experimental aerospace manufacturer, betting on the company's failure and then organizing a terrorist attack on its only experimental plane.
Outside the realm of fiction:
US election
Example of: Fixed-point problems
Plausibly, in the 2016 election, overconfident win predictions for Hillary Clinton led to lower turnout, which led to her loss. Note that Trump got around 63M votes in 2016, and around 74M in 2020, whereas Democrats got 66M and 81M respectively.
This paper (available on sci-hub) makes a similar point (note in particular Figure 3, with two fixed points):
This NYT article makes a similar point:
Ebola forecast may have run into fixed-point problem
Example of: Fixed-point problems.
A fatalistic Ebola forecast may have played a role in Ebola having been contained early.
Source: Assessing the Performance of Real-Time Epidemic Forecasts: A Case Study of Ebola in the Western Area Region of Sierra Leone, 2014-15.
ReplicationMarkets participants may have tried to cheat a Keynesian beauty contest
Example of: Markets for entropy.
ReplicationMarkets is an experiment to see if the replication of papers can be predicted. They run contests, structured with a survey round, in which participants make predictions alone, followed by a market round, in which participants trade contracts in a market.
Some of the papers are then chosen for replication, and the contracts resolve, giving some payouts to the participants. But this happens far in the future, and in the meantime, participants are also paid according to their predictions during the survey round. I suspect some participants coordinated to exploit this mechanism by predicting something unlikely during the survey round:
Source: Speculation, ReplicationMarkets newsletter, this comment.
Superforecasters learning to choose easier questions
Example of: Other.
Tetlock explicitly mentions this in one of his Ten Commandments for Superforecasters: "Focus on questions where your hard work is likely to pay off," so Superforecasters learn to not forecast on the more intractable questions.
Surnames as a mechanism of control and taxation
Example of: Nudge towards legibility and predictability.
The introduction of surnames facilitated identification, taxation and statistical aggregation, and was often resisted by the local population. In this example, the prediction problem is usually “how much can the authorities tax or conscript?,” and the interference is forcing or incentivizing locals to adopt unambiguous name-surname combinations.
One can see an example of this need in this scene from The Wire (the big guy is ironically called "Little Kevin", and the police can't identify him.)
Source: The Production of Legal Identities Proper to States: The Case of the Permanent Family Surname (available on sci-hub):
Conclusion
Above are some real-life examples of prediction systems problematically interfering with the real world. More examples are welcome! In particular, I’d appreciate more examples of prediction systems making the world more predictable.