2021 Note: This was written and posted in December, 2016. The date it shows on LessWrong is 2019; I believe this refers to a time the post was (very minorly) updated, as part of moving it to the Prediction-Driven Collaborative Reasoning Systems sequence.

Prediction markets are powerful, but also still quite niche. I believe that part of this lack of popularity could be solved with significantly better tools. During my work with Guesstimate I’ve thought a lot about this issue and have some ideas for what I would like to see in future attempts at prediction technologies.

1. Machine learning for forecast aggregation

In financial prediction markets, the aggregation method is the market price. In non-market prediction systems, simple algorithms are often used. For instance, in the Good Judgment Project, the consensus trend displays “the median of the most recent 40% of the current forecasts from each forecaster.”[1] Non-financial prediction aggregation is a fairly contested field with several proposed methods.[2][3][4]

I haven’t heard much about machine learning being used for forecast aggregation. It would seem that many, many factors could be useful in aggregating forecasts. For instance, elements of someone’s social media profile may be indicative of their forecasting ability. Perhaps information about the educational differences between individuals could provide insight into how correlated their knowledge is.

Perhaps aggregation methods, especially with training data, could partially detect and offset predictable human biases. If it is well known that people estimating project timelines are overconfident, then this could be taken into account. For instance, someone enters “I think I will finish this project in 8 weeks,” and the system infers something like, “Well, given the reference class I have of similar people making similar calls, I’d expect it to take 12 weeks.”
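As a minimal sketch of this kind of offset (the reference class numbers are entirely hypothetical), a system could learn a correction factor from past predicted-versus-actual timelines:

```python
import statistics

def timeline_correction_factor(reference_class):
    """Median ratio of actual to predicted duration across past
    (predicted_weeks, actual_weeks) pairs from similar forecasters."""
    return statistics.median(actual / predicted
                             for predicted, actual in reference_class)

# Hypothetical reference class of similar people making similar calls.
reference_class = [(10, 15), (8, 13), (6, 9), (12, 17)]

factor = timeline_correction_factor(reference_class)
user_estimate_weeks = 8
print(f"Adjusted estimate: {user_estimate_weeks * factor:.1f} weeks")  # 12.0
```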

A strong machine learning system would of course require a lot of sample data, but small strides may be possible even with limited data. If data is needed, I imagine lots of people on platforms like Mechanical Turk could be sampled.
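To make the idea concrete, here is a minimal sketch of what learned aggregation might look like, using scikit-learn’s gradient boosting; the features and data are entirely hypothetical, and a real system would need far richer inputs:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# One row per resolved binary question (all values hypothetical).
# Features: mean forecast, spread between forecasters, mean track record,
# and a crude proxy for how correlated the forecasters' backgrounds are.
X = np.array([
    [0.80, 0.10, 0.65, 0.2],
    [0.30, 0.25, 0.50, 0.8],
    [0.60, 0.05, 0.70, 0.3],
    [0.20, 0.15, 0.55, 0.6],
    [0.90, 0.20, 0.60, 0.1],
    [0.40, 0.30, 0.45, 0.9],
])
y = np.array([1, 0, 1, 0, 1, 0])  # how each question actually resolved

model = GradientBoostingClassifier().fit(X, y)

# Aggregate a new question's forecasts into one calibrated probability.
new_question = np.array([[0.70, 0.12, 0.68, 0.25]])
print(model.predict_proba(new_question)[0, 1])
```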

2. Prediction interval input

The prediction tools I am familiar with focus on estimating the probabilities of binary events. This can be extremely limiting. For instance, instead of allowing users to estimate what Trump’s favorable rating would be, they instead have to bet on whether it will be over a specific amount, like “Will Trump’s favorable rate be at least 45.0% on December 31st?”[5]

It’s probably no secret that I have a love for probability densities. I propose that users should be able to enter probability densities directly. User-entered probability densities would require more advanced aggregation techniques, but this is doable.[6]
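As one illustration of what such aggregation could look like (a minimal sketch, not a full method like the quantile regression averaging of footnote 6): averaging the users’ quantile functions, sometimes called Vincentization, combines densities without assuming any particular shape.

```python
import numpy as np
from scipy import stats

# Two hypothetical user-entered densities for a continuous quantity,
# represented here as parametric distributions for simplicity.
user_a = stats.norm(loc=45.0, scale=3.0)
user_b = stats.norm(loc=48.0, scale=5.0)

# Combine by averaging quantile functions: the aggregate's q-th quantile
# is the mean of each user's q-th quantile.
qs = np.linspace(0.01, 0.99, 99)
combined_quantiles = (user_a.ppf(qs) + user_b.ppf(qs)) / 2

print(combined_quantiles[49])  # aggregate median: (45 + 48) / 2 = 46.5
```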

Probability density inputs would also require additional understanding from users. While this could definitely be a challenge, many prediction markets are already quite complicated, and existing users of these tools are quite sophisticated.

I would suspect that using probability densities could simplify questions about continuous variables and also give much more useful information about predictions. Tail risks would be obvious; and, perhaps more interestingly, probability distributions from prediction tools could be used directly in further calculations. For instance, if there were separate predictions for the population of the US and the average income, these could be multiplied to produce an estimate of the total GDP (correlations complicate this, but for some problems they may not be much of an issue, and in others perhaps they could be estimated as well).
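A minimal sketch of that kind of calculation (parameters purely illustrative, and correlations ignored, as noted):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical aggregate forecasts, kept as full distributions.
population = rng.normal(325e6, 5e6, n)                     # people
income_per_capita = rng.lognormal(np.log(58_000), 0.1, n)  # dollars/person

# Propagate uncertainty by multiplying samples elementwise.
total = population * income_per_capita

low, median, high = np.percentile(total, [5, 50, 95])
print(f"90% interval: {low:.3e} to {high:.3e} (median {median:.3e})")
```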

Probability densities make less sense for questions with a discrete set of options, like predicting who will win an election. There are a few ways of dealing with these. One is to simply leave such questions to other platforms, or to fall back on the common technique of having users estimate specific percentage likelihoods. Another is to reframe some of them as continuous variables that determine discrete outcomes, like the number of electoral college votes a U.S. presidential candidate will receive. A third option is to estimate the ‘true’ probability of something as a distribution, where the ‘true’ probability is defined very specifically. For instance, a group could make probability density forecasts of the probability that the blog 538 will assign to a specific outcome on a specific date. At the beginning of an election season, people would guess the probability 538 will give one candidate a month before the election.

3. Intelligent prize systems

I think the main reason so many academics and rationalists are excited about prediction markets is their positive externalities. Prediction markets like InTrade seem to do quite well at predicting many political and other future outcomes, and this information is very valuable to outside third parties.

I’m not sure how comfortable I feel about the incentives here. The fact that the main benefits come from externalities indicates that the main players in the markets aren’t exactly optimizing for these benefits. While users are incentivized to be correct and calibrated, they are not typically incentivized to predict things that happen to be useful for observing third parties.

I would imagine that the externalities created by prediction tools would correlate strongly with the value of information to these third parties, which in turn relies on there being uncertain, actionable decisions. So if the value of information from prediction markets were to be optimized, it would make sense for these third parties to have some way of ranking what gets attention based on the decisions they face.

For instance, a whole lot of prediction markets and related tools focus heavily on sports forecasts. I highly doubt that this is why most prediction market enthusiasts get excited about these markets.

In many ways, promoting prediction markets for their positive externalities is a very strange endeavor. It’s encouraging the creation of a marketplace because of the expected creation of some extra benefit that no one directly involved in that marketplace really cares about. Perhaps instead there should be otherwise-similar ways for those who desire information from prediction groups to pay for that information directly.

One possibility that has been discussed is for prediction markets to be subsidized in specific ways. This would obviously have to be done carefully in order not to distort incentives. I don’t recall seeing this implemented successfully yet, only hearing it proposed.

For prediction tools that aren’t markets, prizes can be given out by sponsoring parties. A naive system is for one large sponsor to sponsor a ‘category’, with the best few people in that category getting the prizes. I believe something like this is done by Hypermind.[7]

I imagine a much more sophisticated system could pay people as they make predictions. One could imagine a system that numerically estimates how much information a new prediction adds to the aggregate. Users with established track records will influence the aggregate forecast significantly more than newer ones, and thus will be rewarded proportionally. A more advanced system would also take into account estimated supply and demand: if there are conditions under which users particularly enjoy adding forecasts, they may not need to be compensated as much for those, regardless of the amount or value of information contributed.
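A minimal sketch of one possible payment rule (entirely illustrative): pay each contributor in proportion to how far their forecast moved the aggregate, measured by KL divergence. On its own this rewards movement rather than accuracy, so a real system would also need the reputation and accuracy incentives discussed below.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions over the same outcomes."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def payment_for_update(old_aggregate, new_aggregate, rate_per_nat=10.0):
    """Pay in proportion to how much a contribution shifted the aggregate;
    established users, who shift it more, are paid proportionally more."""
    return rate_per_nat * kl_divergence(new_aggregate, old_aggregate)

# A trusted forecaster moves a binary aggregate from 60% to 70%.
print(f"${payment_for_update([0.60, 0.40], [0.70, 0.30]):.2f}")
```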

On the prize side, a sophisticated system could allow various participants to pool money for different important questions and time periods. For instance, several parties could put down a total of $10k on the question ‘What will the US GDP be in 2020?’, with rewards distributed over the period of 2016 to 2017. Participants who put money down could be rewarded by accessing that information earlier than others or by having improved API access.

Using the system mentioned above, an actor could hypothetically build up a good reputation, then use it to make a biased prediction in the expectation that it would influence third parties. While this would be very possible, I would expect it to require the user to generate more value than their eventual biased prediction would cost. So while some metrics may become somewhat biased, many others would have to be improved in order for this to happen. If this were still a problem, perhaps forecasters could make bets in order to demonstrate confidence (even if the bets were made in a separate application).

4. Non-falsifiable questions

Prediction tools are really a subset of estimation tools, where the requirement is that they estimate things that are eventually falsifiable. This is obviously a very important restriction, especially when bets are made. However, it’s not an essential restriction, and hypothetically prediction technologies could be used for much more general estimates.

To begin, we could imagine how very long-term ideas could be forecasted. A simple model would be to have one set of forecasts for what the GDP will be in 2020, and another for what the system’s aggregate forecast of 2020 GDP will be in 2018. Then in 2018 everyone could be ranked, even though the actual event has not yet occurred.
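A minimal sketch of how that 2018 ranking might work (all distributions hypothetical): score each 2016 forecast against draws from the 2018 aggregate, as if the aggregate were a provisional outcome.

```python
import numpy as np
from scipy import stats

# Hypothetical 2016 forecasts of 2020 US GDP (trillions of dollars).
forecasts_2016 = {
    "alice": stats.norm(21.0, 1.0),
    "bob": stats.norm(23.5, 0.5),
}
# The system's hypothetical aggregate for the same question, as of 2018.
aggregate_2018 = stats.norm(21.5, 0.8)

# Provisional log score: average log density each 2016 forecast assigns
# to outcomes drawn from the 2018 aggregate (higher is better).
draws = aggregate_2018.rvs(size=50_000, random_state=0)
for name, forecast in forecasts_2016.items():
    print(name, np.mean(forecast.logpdf(draws)))
```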

In order for the result in 2018 to be predictive, it would obviously require that participants expect future forecasts to be predictive. If participants thought everyone else would be extremely optimistic, they would be encouraged to make optimistic predictions as well. This creates a feedback loop: the more accurate the system is thought to be, the more accurate it will be (approaching the accuracy of an immediately falsifiable prediction). If there is sufficient trust in a community and its aggregation system, I imagine this could work decently; if there isn’t, it won’t.

In practice, I would imagine that forecasters would be continually judged as future forecasts are contributed that agree or disagree with them, rather than only when definitive events happen that prove or disprove their forecasts. This means that forecasters could forecast things that happen in very long time horizons, and still be ranked based on their ability in the short term.

Going more abstract, there could be poll-like questions such as, “How many soldiers died in WW2?” or “How many DALYs would donating $10,000 to the AMF create in 2017?”. For these, individuals could propose their estimates, and the aggregation system would work roughly as normal to combine them. Even though the answers to these questions may never be known definitively, if there is built-in trust in the system, I could imagine them producing reasonable results.

One question here is how to evaluate the results of aggregation systems for non-falsifiable questions. I don’t imagine any direct way, but I could imagine approximating it by asking experts how reasonable the results seem to them. While methods for aggregating answers to non-falsifiable questions are themselves non-falsifiable, the alternatives are also very lacking. Given how many of these questions exist, it seems to me they should be dealt with, and perhaps they can reuse the communities and statistical infrastructure refined on questions that do have answers.

Conclusion

Each one of the above features could be described in much more detail, but I think the basic ideas are quite simple. I’m very enthusiastic about these, and would be interested in talking with anyone interested in collaborating on or just talking about similar tools. I’ve been considering attempting a system myself, but first want to get more feedback.

  1. The Good Judgment Project FAQ, https://www.gjopen.com/faq

  2. Sharpening Your Forecasting Skills, Link

  3. IARPA Aggregative Contingent Estimation (ACE) research program https://www.iarpa.gov/index.php/research-programs/ace

  4. The Good Judgment Project: A Large Scale Test of Different Methods of Combining Expert Predictions, Link

  5. “Will Trump’s favorable rate be at least 45.0% on December 31st?” on PredictIt (Link).

  6. I believe Quantile Regression Averaging is one way of aggregating prediction intervals https://en.wikipedia.org/wiki/Quantile_regression_averaging

  7. Hypermind (http://hypermind.com/)

Comments

Prediction should be built into more tools.

If we take computer programming, every time a programmer runs the tests for a new code change, they could be asked for their credence that the tests will pass.

I would expect such an addon to lead to skill development among programmers.
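A minimal sketch of such an addon (the names and logging format are hypothetical): wrap the test command, ask for a credence first, and log (credence, outcome) pairs for later calibration review.

```python
import subprocess

def run_tests_with_credence(test_command=("pytest",), log_path="credence_log.csv"):
    """Prompt for a pass-probability before running the tests, then record
    the prediction alongside the actual result."""
    credence = float(input("Credence that the tests will pass (0-1): "))
    passed = subprocess.run(test_command).returncode == 0
    with open(log_path, "a") as log:
        log.write(f"{credence},{int(passed)}\n")
    return passed
```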


I subsidized some InTrade contracts in 2008. See here, here and here.

What's the best way to have people indicate probability distributions? It has to be easy enough to not cut down on participation unduly.

My first guess would be asking people to state their 95% confidence intervals and pretending that the distribution is normal. Other possibilities: ask people to bet on binary events at various odds; ask people to drag-and-drop a probability curve; ask a short sequence of frequentist-language questions to get at intuition ("if you tried this ten times, how many times would you expect the positive outcome?" "if you tried this ten times, would you expect one example to reach a score of at least X?")

It's definitely an important question.

I think we've had pretty decent success with Guesstimate, where users enter 90% confidence intervals which are fitted to lognormal distributions (toggleable to normal distributions).

I'd imagine here that really simple defaults are useful, but for the cases that users want more precision, they should be able to get it. Being asked a few separate questions is one way of inferring that.

It also of course depends a bit on the education of the audience.

Separately, I imagine that if algorithms were to compete, they should just be able to enter distributions directly.
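For reference, a minimal sketch of that kind of fit (not Guesstimate's actual code): choose the lognormal whose 5th and 95th percentiles match the user's 90% interval.

```python
import numpy as np
from scipy import stats

Z90 = stats.norm.ppf(0.95)  # ~1.645; a 90% interval spans the 5th-95th percentiles

def lognormal_from_90ci(low, high):
    """Lognormal whose 5th and 95th percentiles equal the given interval."""
    mu = (np.log(low) + np.log(high)) / 2
    sigma = (np.log(high) - np.log(low)) / (2 * Z90)
    return stats.lognorm(s=sigma, scale=np.exp(mu))

dist = lognormal_from_90ci(5, 50)
print(dist.ppf([0.05, 0.50, 0.95]))  # ~[5.0, 15.8, 50.0]
```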

Thank you for this reference!

Sharpening Your Forecasting Skills, Link

Are there any case histories of how superforecasters work, where they "show their work" as it were?

I was a super-forecaster. I think my main advantages were 1) skill at Googling and 2) noticing that most people, when you ask them “Will [an interesting thing] happen?”, are irrationally biased toward saying yes. I also seem to be naturally foxy and well-calibrated, but not more so than lots of other people in the tournament. I did not obsess, but I tried fairly hard.

Source

Edit: "Foxy" in this context means "knowing many small things instead of one big thing". See this pair (one, two) of Overcoming Bias posts by the late Hal Finney.

Thanks for the link to that. I'm definitely on the foxy end of the spectrum.

I'm curious when Hedgehogs do well. Perhaps in Physics/Maths when they hedgehog on the right idea?

The most popular is the book Superforecasting. https://www.amazon.com/Superforecasting-Science-Prediction-Philip-Tetlock/dp/0804136718

That, and the resulting published papers, are really the only documentation I know of on 'superforecasters', in part because that project specifically coined the term.

The larger field of group aggregation and forecasting is much wider. The journal 'Foresight' is one dedicated to the topic. https://www.researchgate.net/journal/1465-9832_Foresight

I'd also recommend the book Business Forecasting. https://www.amazon.com/Business-Forecasting-Practical-Problems-Solutions/dp/111922456X/ref=sr_1_2?s=books&ie=UTF8&qid=1482353235&sr=1-2&keywords=business+forecasting

Prediction markets are powerful, but also still quite niche.

I believe you're mistaken about that. Financial markets are, to a large degree, prediction markets and they are not niche at all. Notably, a LOT of money and brainpower is committed to the task of constructing better tools for prediction in that sphere.

I'm discussing prediction markets outside of financial markets. I don't think most people would consider financial markets under 'prediction markets', though I obviously realize there are similarities.

You are talking about terminology: whether most people call financial markets a kind of prediction markets.

I'm pointing out that "prediction technologies" are basically the same for both, regardless of what is called what.

Literally the only difference in terms of prediction dynamics is that currently prediction markets include political/non-financial questions, which are only implicitly included in financial markets.

I think of the difference as: a prediction market is subsidized by someone with an interest in the information (or participants who want to influence some decision-maker who will act on the basis of market prices), while a financial market facilitates trade (or as a special case hedges risk).

How is a prediction market subsidized by someone with an interest in the information? As far as I'm aware, most of them make money on bid/ask spreads, and can be thought of as a future or Arrow–Debreu security.

As the current institutions stand there are differences. Prediction market sites and the Nasdaq are obviously different in a lot of institutional ways. In prediction markets you can't own companies. But in the more abstract way in which people trade on current information as a prediction, which is eventually realized, they are similar.

For example, a corporate bond is going to make a series of payments to the holder over its maturity. Market makers can strip off these payments and sell them as bespoke securities, so you could buy the rights to a single payment on debt from company X in 12 months. If you'd like, people can then write binary options on those such that they receive everything or nothing based on a specified strike price.

In the general security there is lots of information and dynamics, but with the right derivatives structure you can break it up into a series of binary predictions.
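In code, that decomposition is just an all-or-nothing payoff (a minimal sketch, names hypothetical):

```python
def binary_option_payoff(underlying_price, strike, notional=1.0):
    """All-or-nothing payoff on the underlying's price at expiry;
    structurally identical to a prediction-market contract on the
    binary question 'will the price finish at or above the strike?'"""
    return notional if underlying_price >= strike else 0.0
```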

The dynamic structure behind prediction markets and financial markets as trading current values built on models of future expectations is very similar, and I think identical.

I agree with you that there is no difference in kind between the assets traded in existing financial markets and those traded in a prediction market.

Existing prediction markets primarily offer amusements to participants, and are run like other gambling sites, with a profit to the market and the average trader losing money. Existing markets may hedge some participants' risk, and in that respect are like a financial market.

Around here, prediction markets are usually proposed as an institution for making predictions (following Robin). In that context, someone who wants a prediction subsidizes the market, perhaps by subsidizing a market maker. The traders aren't trading because it's fun or they have a hedging interest, they are doing it because they are getting paid by someone who values the cognitive work they are doing.

In some cases this is unnecessary, because someone has a natural interest in influencing the prediction (e.g. if the prediction will determine whether to fire a CEO, then the CEO has a natural interest in ensuring that the prediction is favorable). In this case the decision-maker pays for the cognitive work of the traders by making a slightly suboptimal decision. Manipulative traders pay for the right to influence the decision, and informed traders are compensated by being counterparties to a manipulative trader.

I think this is the important distinction between a prediction market and other kinds of markets: in the case of prediction markets, traders make money because someone is willing to pay for the information generated by the market. I agree that this is not the case for existing prediction markets, and so it's not clear if my story is reasonable. But it is clear that there is a difference in kind between the intended use of prediction markets and other financial markets.
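A minimal sketch of that kind of subsidy, using Hanson's logarithmic market scoring rule (parameters illustrative): the sponsor funds an automated market maker whose worst-case loss is bounded by b * ln(number of outcomes), and that bounded loss is what pays traders for their information.

```python
import math

class LMSRMarketMaker:
    """Logarithmic market scoring rule market maker (Hanson)."""

    def __init__(self, num_outcomes, b=100.0):
        self.b = b                      # liquidity / subsidy parameter
        self.q = [0.0] * num_outcomes   # outstanding shares per outcome

    def _cost(self, q):
        return self.b * math.log(sum(math.exp(qi / self.b) for qi in q))

    def price(self, outcome):
        """Current implied probability of an outcome."""
        total = sum(math.exp(qi / self.b) for qi in self.q)
        return math.exp(self.q[outcome] / self.b) / total

    def buy(self, outcome, shares):
        """Return what a trader pays to buy `shares` of `outcome`."""
        before = self._cost(self.q)
        self.q[outcome] += shares
        return self._cost(self.q) - before

mm = LMSRMarketMaker(num_outcomes=2)
print(mm.buy(0, 50))   # cost of moving the market
print(mm.price(0))     # implied probability after the trade, ~0.62
```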

On the topic of financial markets and aggregation of signals, there is actually a lot of ML-based signal aggregation going on within trading companies. A common structure is having specialized individuals or teams develop predictive signals (for publicly traded securities) and then have them aggregated into a single meta-prediction (typically a "microprice", or theoretical price) which is then used for order placement.

A couple anecdotes:

  1. There is a publicly available story about a high-frequency KOSPI options desk that made a substantial improvement in P/L after adding a Random Forest to their signal aggregation logic. Eventually, this became the difference between making and losing money.
  2. Accurate up until about a year or two ago, I believe one of the top 10 worldwide trading firms was using a simple average as its aggregation mechanism for most of its trading. This included averaging effectively redundant signals which could easily be identified as such.
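A minimal sketch of the random-forest version (data entirely synthetic): learn the mapping from individual signals to the realized price instead of averaging, so redundant signals share weight rather than being double-counted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic history: each row holds four specialists' predicted fair
# prices for the same security at one moment in time.
signals = rng.normal(100.0, 1.0, size=(500, 4))
# Synthetic realized prices those signals were trying to anticipate.
realized = signals.mean(axis=1) + rng.normal(0.0, 0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(signals, realized)

# The aggregated "microprice" used for order placement.
print(model.predict(signals[-1:]))
```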

I haven’t heard much about machine learning used for forecast aggregation. It would seem to me like many, many factors could be useful in aggregating forecasts. For instance, some elements of one’s social media profile may be indicative of their forecasting ability. Perhaps information about the educational differences between multiple individuals could provide insight on how correlated their knowledge is.

I think people are looking into it: the Good Judgment Project team used simple machine learning algorithms as part of their submission to IARPA during the ACE tournament. One of the PhD students involved in the project wrote his dissertation on a framework for aggregating probability judgments. In the Good Judgment team at least, people are also interested in using ML for other aspects of prediction - for example, predicting if a given comment will change another person's forecasts - but I don't think there's been much success.

I think a real problem is the paucity of data for ML-based prediction aggregation compared to most machine learning projects - a good prediction tournament gets a couple hundred forecasts resolving in a year, at most.

Probability density inputs would also require additional understanding from users. While this could definitely be a challenge, many prediction markets already are quite complicated, and existing users of these tools are quite sophisticated.

I think this is a bigger hurdle than you'd expect if you're implementing these for prediction tournaments, though it might be possible to do for prediction markets. (However, I'm curious how you're going to implement the market mechanism in this case.) Anecdotally speaking many of the people involved in GJ Open are not particularly math or tech savvy, even amongst the people who are good at prediction.

But will it convince people there is no dragon in the garage?

Explanation: No system is useful if people don't believe the results. A prediction market on "Did Lee Harvey Oswald act alone" won't convince any of the conspiracy theory peddlers, etc. I'd bet my immortal soul (or lack thereof) that there's no afterlife, but there's no way to actually resolve that bet without dying. ;)

I think that convincing people to use prediction technologies is a very different question from making more accurate and comprehensive technologies (which is what I focused on above). In my opinion there's more than enough evidence for them; it's more a matter of education.

I consider belief in prediction technologies to be an extension of empirical thinking. Not everyone may accept it, but those who do can get significant value.

No system is useful if people don't believe the results.

Will people who don't believe the results be willing to accept bets? Or, say, commit capital to liquid markets? :-D

They might make the bet, but they'll refuse to accept the result and won't pay up. For example, what bet could you make with someone claiming that the World Trade Center towers were brought down by a controlled demolition?

There's also the opposite problem. Suppose that a prediction market has assigned greater than zero probability to "In the year 3000, 2 + 2 = 5." This will not happen, but you won't make any money betting against it because you can't adjudicate it until after your death...

For example, what bet could you make with someone claiming that the World Trade Center towers were brought down by a controlled demolition?

That's not a prediction.

And refusing to pay up is a very old problem with a variety of solutions.

Suppose that a prediction market has assigned greater than zero probability to "In the year 3000, 2 + 2 = 5." This will not happen, but you won't make any money betting against it

I don't see how this is a problem.

In general, the usual sane standards of proper betting apply: be clear what you're betting on, be very specific about which criteria have to be satisfied for a bet's outcome to be decided in a particular way, try to precommit everyone to payment, etc.

That's not a prediction.

Exactly... the original post was about applying prediction market techniques and other opinion aggregation methods to other types of problems. I was saying that a prediction market is only as good as its eventual adjudication method - we already know, for example, that the Shroud of Turin was radiocarbon dated to long after the death of Jesus, but believers keep coming up with excuses not to change their minds anyway.

The techniques described in the post could be used in isolation; only the last was about non-forecast estimates. I agree that these are much, much tougher than ones with definitive answers. They may need pretty strict filtering of participants, and would be the least accepted.