Information Markets

eva_

Epistemic status: Exploratory. This post would be shorter and contain more math if I'd thought about it more first.

I don't like prediction markets, as currently described. They're similar to ordinary stock markets, which economists say are supposed to be efficient, but don't look it. People say "If you think the markets are wrong then you can bet on them and make a profit" but I don't actually expect that to be true, because markets don't only contain sincere attempts to optimise prices. They also contain schemes to extract money from others without adding information, or to cheat in forbidden ways without getting caught, and similar nonsense, and so honest trading strategies have to be not only beneficial but also inexploitable, or you'll just end up paying people to rob you. Most of this gets much worse in a prediction market, especially a market that is being used to inform decisions that people care about, and where who knows how many people have who knows how much private information behind their opaque betting strategies. I don't expect the libertarian zero-regulation "let the market blindly solve all our problems" approach will actually produce something anyone should trust here.

Other facts I dislike about prediction markets:

They aren't fair in any technical sense over the value of information provided.
Can't accurately collate private information known to different participants.
Implement EDT instead of LDT if followed blindly, which they don't provide any alternative to doing.
Don't legibly convey why they reached the conclusions they reached.
Pay information ransoms to people who intentionally create uncertainty in the first place, which is a perverse incentive.
Make adding information a race to the microsecond even though there's usually no strong time pressure on the part of a subsidiser / people who want the prediction.
Include rational agents who don't think others are lying, but are betting money, even though that's clearly a zero-sum game.

I think I can fix most of these, albeit at the cost of proposing something that is substantially less pure and simple than "everyone can bet about what will happen". To distinguish them, I'll call the thing I'm trying to describe an Information Market, although it fills the same niche.

Shareability levels of Information:

Public Information: Everyone has it; everyone knows everyone has it; it's basically part of the common prior.
Private, Verifiable Information: Some subset of the participants know it to start with, and they can prove it's true to anyone they feel like telling. Maybe there's a signed certificate from Omega, maybe they've got a photo of the murderer fleeing the scene, something like that. You can't meaningfully accuse them of lying, but they've still got the option of not telling you if, for some reason, they don't want to.
Private, Expensive-to-Fake Information: It's somewhat a matter of trust, but at least you can put a number to it. They'd have to pay x dollars on the fake certificate black market to tell the specific lie they're telling now, assuming it's a lie.
Private, Unverifiable Information: Some subset of the participants know it to start with, and you'll have to take their word for it as to whether it's true. It's entirely possible they're just making it up for a laugh or as a scheme to get money from you.
Delayed Verifiable Information: You've no idea if they're lying now, but you'll be able to find out later. There are some hazards around this, but mostly they can say something like "put my life savings in a holding account and confiscate it if it turns out I'm lying" and if they've got enough money total, it turns into Verifiable Information with Extra Steps.

How an Information Market works through a problem:

Information Selling: where people who know things they expect the market will want to know tell the market, and get paid a share of the subsidy in a few different ways calculated after the fact.
Price Calculation: where traders who have seen the sold information try to approximate the correct prices of the things the market is predicting.
Price Reflection: where traders can update on each other's estimates to reach an Agreement Theorem Price, or aggressively bet for a solution if that fails.
Subsidisation: where everyone is retroactively paid for the bits that their information adds to the market by everyone who derives value from the information contained in the aggregated predictions.

How is information bought from sellers?

Public information doesn't get bought from anyone, the traders all know it/have access to it/can look it up for free from public databases.
Private, Verifiable information gets bought by the market in ordinary cash, paying at least a tiny transactional fee representing the cost to the private dataholder of physically giving them the information, and at most a Shapley calculation of how much the information was worth that'll happen later.
Private, Unverifiable information can't be paid for in cash, because then people will show up pretending to have it when they don't, and sensible information markets don't pay people to tell lies. Instead, the seller is paid in the form of a bet. The bet is shaped to be slightly less than breakeven at what the market thinks the fair odds would have been without the information (which is also what it would think if the seller was faking the information), and to produce, as efficiently as possible and at minimum risk, an expected profit equal to the same Shapley calculation of how much the information was worth if and only if the information is true. Since the market doesn't actually expect the seller is lying (they'd be losing money if they were), none of the traders will want to bet against this seller who they all agree probably really does have private information, and so the other side of the bet has to be taken by the subsidiser.
Private, Expensive-to-Fake information is in between those two: you can pay in ordinary cash up to the price they'd theoretically have to spend to fake the information, but if the fair value of the information is greater than that, you have to pay the excess in the form of a well-shaped bet, as with unverifiable information.
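As a rough illustration of the bet-shaping idea above, here is a minimal sketch for a single yes/no question. It assumes the payment is made with simple binary shares that pay 1 if the seller's side wins, and the names `shape_bet`, `expected_profit`, `split_payment` and the `margin` parameter are hypothetical, not part of any existing system. It is one possible reading of "slightly less than breakeven at the prior, worth the fair value if true", not a definitive design.

```python
def shape_bet(prior, posterior, fair_value, margin=0.01):
    """Shape a bet for a seller of unverifiable information (illustrative only).

    prior:      market probability of YES without the seller's information
    posterior:  market probability of YES if the information is true
    fair_value: the seller's fair share (e.g. their Shapley value)
    margin:     assumed fraction of the price move the seller gives up, so the
                bet is slightly worse than breakeven at the prior
    Returns (side, shares, price) where each share pays 1 if `side` wins.
    """
    if not (0 < prior < 1 and 0 < posterior < 1) or prior == posterior:
        raise ValueError("need distinct probabilities strictly between 0 and 1")
    if posterior > prior:
        side, p0, p1 = "yes", prior, posterior
    else:
        # The information lowers the probability, so the seller backs NO.
        side, p0, p1 = "no", 1 - prior, 1 - posterior
    price = p0 + margin * (p1 - p0)      # slightly above the prior's fair price
    shares = fair_value / (p1 - price)   # so expected profit if true == fair_value
    return side, shares, price


def expected_profit(shares, price, prob_side_wins):
    """Expected profit of holding `shares` bought at `price`."""
    return shares * (prob_side_wins - price)


def split_payment(fair_value, cost_to_fake):
    """Expensive-to-fake case: cash up to the faking cost, the rest as a bet."""
    cash = min(fair_value, cost_to_fake)
    return cash, fair_value - cash


if __name__ == "__main__":
    side, shares, price = shape_bet(prior=0.5, posterior=0.8, fair_value=30.0)
    print(side, round(shares, 2), round(price, 4))
    print("if true :", round(expected_profit(shares, price, 0.8), 2))
    print("if faked:", round(expected_profit(shares, price, 0.5), 2))
```

The point of the sketch is only that the same bet is mildly unprofitable for someone with no information and worth the fair value to someone whose information is real.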

Important Note: Selling information is a legible activity. You include notes about who you are and why you think you know what you know, because it is useful for the market determining what to pay you, even if the reason is just "I have an inexplicable hunch". It is not a mysterious anonymous trade on an opaque platform. The more info you include in your sale, the more the market will want to update from it, so the more you expect to get paid when they divide gains. This should align everyone's incentives towards making the market as informed about things as they individually have the ability to do.

Who can sell information?

Verifiable information anyone can sell, because you don't have to trust them, except for people the market thinks caused the uncertainty in the first place in the hope of profiting later, which would be paying a ransom. If you've got verifiable information that you'd prefer the market not know by some amount for external reasons, either they'll pay you enough to cover your losses and you sell, or they say "sorry it's not worth enough to us to pay you to sell and you should keep your secrets". e.g. a market on "who did this murder" won't offer to pay the murderer for a confession enough to make it profitable for them to confess, if the market result is being seen by detectives, so the murderer still won't confess, and that's perfectly OK. If the market isn't being seen by detectives, e.g. if the family wants to know if their relative is dead or just missing and has sworn oaths not to tell the police about it, then the murderer does have sufficient incentive to give this information and can be paid cash for it when he attaches verifiable proof (although that's a whole separate bag of moral hazards to deal with).

Unverifiable or expensive-to-fake information has a possible hazard: people with strong interests in the market's outcome trying to trick it. If these interests are small enough that the seller can make a bet that is expected to lose enough to overcome the external interest if the information is fake, while profiting the Shapley value amount if it is true, then they can do that. If the subsidiser can't afford a bet sufficient to overpower the other interests, then this again gets a "sorry we can't afford to sufficiently incentivise you to be telling the truth enough for us to believe you, so we can't buy your information". e.g. In a presidential predictive performance market with a subsidy of $10,000, a presidential candidate might say, "I have private unverifiable information that all the other candidates are terrible and have a 90% chance of starting WW3." Even a bet which pays them the entire subsidy if they're right only costs them $100,000 to be wrong, which is less than the value to them of being elected. Nobody except the subsidiser would offer to make a bet large enough to incentivise the candidate to be honest, because there's no opportunity to profit from it: if you offer the bet and the candidate is lying, they'll withdraw the claim and you earn nothing, and if they actually make the bet, it's proof they have actual reason to believe their claim, and you stand to lose substantially more money when it turns out they're right. Since there is no expectation of profit from taking the other side of the bet, and no trust without the bet being made, you unfortunately can't buy the information.
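The condition described above (the bet must lose at least the external interest in expectation if the information is fake, while earning the fair value if it is true) can be sketched for a yes/no question. This is only one way to formalise it, assuming binary shares that pay 1 on YES and a claim that raises the probability; `deterrent_bet` and `is_affordable` are hypothetical names, and the numbers in the usage example are made up rather than taken from the presidential example.

```python
def deterrent_bet(prior, posterior, fair_value, external_interest):
    """Size a YES bet so that lying costs more than it gains (illustrative).

    Solves the two conditions
        shares * (posterior - price) == fair_value         (profit if true)
        shares * (price - prior)     == external_interest  (loss if fake)
    for the number of shares and the price per share.
    """
    if not 0 < prior < posterior < 1:
        raise ValueError("sketch assumes 0 < prior < posterior < 1")
    shares = (fair_value + external_interest) / (posterior - prior)
    price = prior + external_interest / shares
    return shares, price


def is_affordable(shares, price, seller_capital, subsidiser_budget):
    """Both sides must be able to cover their worst case."""
    seller_at_risk = shares * price            # lost if the event does not happen
    subsidiser_at_risk = shares * (1 - price)  # paid out if the event happens
    return seller_at_risk <= seller_capital and subsidiser_at_risk <= subsidiser_budget


if __name__ == "__main__":
    shares, price = deterrent_bet(prior=0.2, posterior=0.7,
                                  fair_value=100.0, external_interest=400.0)
    print(round(shares, 2), round(price, 4))
    print(is_affordable(shares, price, seller_capital=1000.0, subsidiser_budget=500.0))
    print(is_affordable(shares, price, seller_capital=1000.0, subsidiser_budget=300.0))
```

When the external interest is large, the required bet grows with it, and at some point either the seller's capital or the subsidiser's budget runs out, which is the "sorry, we can't buy your information" case.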

Unverifiable information from sellers without sufficient capital to fully support the bet required to get them their fair share also can't be fully paid for their information, but the market can at least buy the information at a reduced price so that they can afford the bet. This is sad but seems inevitable.

Moral Hazard Examples: If you buy information from someone who created uncertainty in the first place, e.g., paying the kidnapper to tell you where the victims were taken, then you've paid an implicit ransom and incentivised the behaviour you're trying to stop. You must refuse to buy from them, even though you'd be willing to pay a witness for that same information if they can verifiably prove they're not secretly the kidnapper in disguise.
A separate case is a terrorist who plants bombs that detonate if and only if the government doesn't fund his pet issue, and tells everyone about it. If the government has a market for "will this decision cause mass deaths and if so how many", an honest and correct Information Market says "Yes, because of a terroristic Threat that you should ignore", and the government can respond appropriately. This is possible because an information market knows why it thinks what it thinks, and is capable of being legible about its conclusions.

How is the actual prediction calculated?

Traders get to look at all the information sold to the market, and try to guess a probability distribution over final outputs according to whatever approximation of ideal Bayesianism they prefer. The important feature here is that "a skilled trader looked at the info and thinks x is the correct price" is a valuable piece of unverifiable private opinion, so the estimates over probability distributions are themselves pieces of information that the market values and can pay for in the form of bets to protect against fake uninformed traders. When the market looks at its own information plus the conclusions a group of traders reach based on that information, plus the conclusions the same or different traders reach on reflection after seeing each other's conclusions, the Agreement Theorem says they'll converge on a final prediction that is actually sensible.

That works because the traders are also legible about information. They want people to believe their own models are good and shift towards their conclusions because that gets them a greater share, and explaining why your model is correct convinces other traders to do that.

What happens if they don't converge?

People are, validly, scared by the idea of betting money against someone who knows what you believe and why you believe it and is willing to take the other side of a bet anyway even though it looks profitable or at least break-even to you.

It's possible they're just insane, and you should expect to profit, but it's also possible they've secretly got private information that they're not telling you, that they did not include in the reasons for their beliefs that they told you, so as to disguise themselves as insane. Maybe they're in league with a market manipulator, or maybe they've done secret forbidden deals with villains to get information the market isn't allowed to buy. You'd be foolish to bet against them if it was something like that.

Suppose, however, the trader can verify they haven't, can give logs of all the info they've accessed and every phone call they've made and expensively prove nothing shady is going on, that you just mutually think the other person is being incorrect. If not that, maybe at least they can be checked for shenanigans later and heavily fined enough for it not to be worth it retroactively. At this point, finally, I think there's a place for pure unsubsidised gambling, at whatever intermediate price balances the willingness to bet money, and may the best trader win. If they were being rational, it wouldn't have come to this, but you've got to back up beliefs with confidence at some point.

This gets a final predicted probability, at which everyone who added unverifiable information to the market will retroactively be betting against the subsidiser at whatever quantity is necessary to pay them their fair value without being exploitable.

How do retroactively calculated payments work?

People are supposed to be compensated for the Shapley value of their information. The subsidiser says, "I need accurate information about e.g. Weather Futures, will test the market's accuracy in the form of bits of information difference between outcomes vs. predictions and outcomes vs. priors, and will pay out subsidy based on some value-price curve involving the bits of information the market produces". The subsidy value will be divided between the market participants based on how many additional bits of information the market ends up producing (and therefore how much additional payment) on average if you add the participants in a random order.

Since the market has been managed in a way where nobody has an incentive to lie, it can at least trust that everyone was actually honest, so we can simplify for the purposes of internal division-of-gains that the market's final prediction vs. prior information improvement is correct and ideally calibrated, which means it only has to consider the information gain from predictions vs. priors and can ignore the outcomes.
In order to do that, we simply consider all the random subsets of participants, what they would have predicted, the average marginal information gain when each new participant is added, and therefore the average subsidy gain from adding them. Feel free to insert your favourite alternative Approximation-of-Shapley here if you want to.
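For a handful of participants, the division described above can be computed exactly by averaging marginal gains over every ordering. The sketch below is illustrative: the table `BITS` of how many bits each subset would have produced, the participant names, and the linear price of $100 per bit are all made-up assumptions standing in for the subsidiser's real value-price curve.

```python
from itertools import permutations
from math import factorial


def shapley_shares(players, value):
    """Average each player's marginal contribution over all join orders.

    value: function from a frozenset of players to the subsidy that subset
           would have earned on its own.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical example: A and B sell the same fact (1 bit on its own),
# C sells something that is worthless alone but adds 2 bits alongside it.
BITS = {
    frozenset(): 0.0,
    frozenset("A"): 1.0, frozenset("B"): 1.0, frozenset("C"): 0.0,
    frozenset("AB"): 1.0, frozenset("AC"): 3.0, frozenset("BC"): 3.0,
    frozenset("ABC"): 3.0,
}
DOLLARS_PER_BIT = 100.0  # assumed linear value-price curve


def subsidy_for(coalition):
    return DOLLARS_PER_BIT * BITS[frozenset(coalition)]


if __name__ == "__main__":
    for player, share in shapley_shares(["A", "B", "C"], subsidy_for).items():
        print(player, round(share, 2))
```

Exact enumeration grows factorially with the number of participants, so anything beyond a toy market would need sampling of random orders or some other approximation.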

Now that we know everyone's fair share of subsidy, the people who can verify their information get paid that in cash, and the people who can't get to make a bet against the subsidiser that pays out their fair share in expectation if their information is true, or somewhat less than that if they don't have enough capital to back the bet. Traders' estimates are considered unverifiable information, because proving you've done the statistical modelling exactly correctly is even more intractable than this is already, and so they all end up making a bunch of expected-to-be-profitable bets against the subsidiser at prices between what the prediction would be without them and what it is now including them.

This results in the subsidiser making a whole bunch of bets that have no particular obligation to cancel out exactly, and therefore having a huge stake on one side or the other of the final target. This is risk that the subsidiser probably doesn't want, so it sells the risk to a friendly nearby insurance company based on the conveniently provided estimate of the final probabilities that the prediction market just created. This leaves it on the hook only for the expected loss of the bets it made at the final price, which happens to be exactly the subsidy price it signed up to pay out. Yay!
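To make the arithmetic of that hand-off concrete, here is a small sketch for a single yes/no outcome. It assumes every bet is a purchase of YES-shares from the subsidiser (a NO bet can be written as a negative number of shares at the complementary price), and that the insurer prices the risk at exactly the market's final probability. The function names and the example bets are hypothetical.

```python
def subsidiser_payoffs(bets):
    """Subsidiser's profit in each world, before any insurance.

    bets: list of (shares, price); a seller bought `shares` YES-shares at
          `price` each, and each share pays the seller 1 if YES happens.
    """
    if_yes = sum(-shares * (1 - price) for shares, price in bets)
    if_no = sum(shares * price for shares, price in bets)
    return if_yes, if_no


def hedge(bets, final_prob):
    """Swap the outcome-dependent payoff for its expectation at final_prob."""
    if_yes, if_no = subsidiser_payoffs(bets)
    expected = final_prob * if_yes + (1 - final_prob) * if_no
    return {
        "subsidiser_fixed_cost": -expected,  # paid whatever happens
        "insurer_if_yes": if_yes - expected,  # insurer absorbs the swings
        "insurer_if_no": if_no - expected,
    }


if __name__ == "__main__":
    bets = [(100.0, 0.5), (50.0, 0.7)]
    result = hedge(bets, final_prob=0.8)
    for key, val in result.items():
        print(key, round(val, 2))
```

In this toy case the fixed cost equals the sum of the sellers' expected profits at the final price, and the insurer's position has zero expected value at that price, which is the sense in which the subsidiser ends up paying only the subsidy.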

Finally, the subsidiser gets to be told that tomorrow's weather is sunny but in a decision-theoretically complicated way where Omega's planning to make it rain if and only if they don't bring an umbrella, so they can't actually use the information. Sucks to be them. Implementing the difference between EDT and LDT after the market has given you correctly calibrated beliefs with legible explanations remains your problem, not mine. I'm just trying to produce correct group beliefs without anyone having to act irrationally or exploitably.

Shortcuts to make this even close to possible:

If multiple sources are adding the same fact to the market, you can use the standard Shapley identical-players constraint, so really you only have to evaluate the value of the combinations of facts and then divide between the repeated sources.
If all the traders are in practice using nice coherent statistical models of the information added, they can cheaply run the same models on subsets of the information to get the counterfactual predictions of how much the information sellers are adding in terms of predicted outputs.
If traders are using more nice coherent statistical models for updating their conclusions based on each other's conclusions, then they can grade each other's work. This shouldn't be exploitable because updating your estimate based on theirs gives credit to them, and if you refuse to credit the other traders then you end up in the destructive betting phase which costs you money unless you let it go to the true price in which case it credits the other traders correctly anyway.

This works best for predicting many many things simultaneously, e.g., for all the weather everywhere at once, where sellers are weather stations and balloons and traders are meteorologists and statisticians. Since everyone tells the market their statistical model and its conclusions after they see the evidence but before they see the results of other traders, you can explicitly calculate the bits of information added by each model to the prediction over the constant-weather prior. After that, whatever weighted combination of those models is actually observed to be correct after the fact can be used to determine the payout to each trader's meta-model of how to combine them.

The market can separate out the verify-sellers-are-honest step to the insurance company, in the sense that you can imagine all sellers being assumed unverifiable and required to justify evidence with bets, and then the same insurance company buying the risk from the subsidiser can buy the other half of the risk from as many sellers as it can without creating moral hazards.

If you've got information many people want, like again the weather, the subsidiser can easily be a huge crowd paying 10c every time they open the weather app, providing payout to everyone who was in the market adding information at the time they did that, for the specific bits added to the places and times they personally care about.
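For the weather example above, the "bits of information added by each model over the constant-weather prior" can be measured with a log score once outcomes are known. The sketch below assumes yes/no events (say, "rain at this station tomorrow") and uses made-up probabilities; `bits_gained` is a hypothetical helper name.

```python
from math import log2


def bits_gained(predictions, priors, outcomes):
    """Total bits a model gains over the prior on a batch of yes/no events.

    predictions: the model's probability of YES for each event
    priors:      the prior's probability of YES for each event
    outcomes:    True if YES actually happened
    Positive means the model beat the prior; negative means it did worse.
    """
    total = 0.0
    for p, q, happened in zip(predictions, priors, outcomes):
        if happened:
            total += log2(p / q)
        else:
            total += log2((1 - p) / (1 - q))
    return total


if __name__ == "__main__":
    # A model that doubles the prior's probability on an event that happens
    # gains one bit on that event.
    print(bits_gained([0.5], [0.25], [True]))
    # Moving the wrong way on the second event gives that bit back.
    print(bits_gained([0.5, 0.75], [0.25, 0.5], [True, False]))
```

Summing over many places and times is what makes this usable in practice, since a single event says very little about whether a model is calibrated.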

Conclusions

Does this make policy markets work? Not by itself.
In a policy market, your bet must be in terms of what happens if a policy is chosen, which only pays out if it is chosen, which gets exponentially unlikely as the market becomes convinced the policy is a bad idea, so you need exponentially massive bets to protect the same small expected payout. This means information that a policy is bad gets paid less as it gets stronger, unless you're committed to ignoring it, so people who think it's very bad will bet as if it's merely slightly bad in hopes of a better payout. I think the solution there is to try multiple policies, in small boxes or different regions, so you can pay for predicting the results of multiple experiments actually performed and validate models that way before applying them more generally.
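The scaling problem can be seen with a few lines of arithmetic. This sketch assumes a conditional bet that is refunded when the policy is not chosen and that earns a fixed expected edge per dollar staked when it is chosen. The function name and the numbers are illustrative only.

```python
def stake_needed(target_expected_profit, prob_policy_chosen, edge_per_dollar):
    """Stake required for a conditional bet to keep a fixed expected profit.

    The bet only resolves if the policy is chosen, so
        expected profit = prob_policy_chosen * edge_per_dollar * stake.
    """
    if not 0 < prob_policy_chosen <= 1 or edge_per_dollar <= 0:
        raise ValueError("need 0 < prob_policy_chosen <= 1 and a positive edge")
    return target_expected_profit / (prob_policy_chosen * edge_per_dollar)


if __name__ == "__main__":
    # As the market becomes convinced the policy is bad, the chance it is
    # chosen falls, and the stake needed for the same payout blows up.
    for prob in (0.5, 0.1, 0.01, 0.001):
        print(prob, round(stake_needed(100.0, prob, edge_per_dollar=0.1)))
```

The stake grows in inverse proportion to the chance the policy is chosen, so the better your evidence that the policy is bad, the more capital you need to be paid the same amount for it.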

There are probably a bunch of clever exploits I've missed if traders aren't being as friendly as they're supposed to be, perhaps involving multiple traders who are conspiring to pretend to disagree. At least the demand for legibility can make detecting that possible at all. It's kludgy and complicated in the ways that legal systems are instead of the ways that financial systems are, because this does require a whole bunch of regulation to prevent shenanigans. This is also very intractable, but people could at least try to make a reasonable approximation of it and know that any remaining unfairness is due to a lack of processing power and not a failure to even try.

What it does get you is a system with:

People who have information getting paid for telling the community about it by people who are paying the community for the value of that information.
Nobody constantly afraid of clever schemes to exploit them.
Information collation between different private sources, who are all being paid, by statisticians who are also being paid, in a way where faking skill does not result in a profit.
Legible explanations for why the market believes its conclusions, which makes participating in it and using its conclusions far less scary.
A reasonable approximation of "what the community collectively believes" for any problem that you want good societal epistemics on.
