I came across a 2015 blog post by Vitalik Buterin that contains some ideas similar to Paul Christiano's recent Crowdsourcing moderation without sacrificing quality. The basic idea in both is that it would be nice to have a panel of trusted moderators carefully pore over every comment and decide on its quality, but since that is too expensive, we can instead use some tools to predict moderator decisions, and have the trusted moderators look at only a small subset of comments in order to calibrate the prediction tools. In Paul's proposal the prediction tool is machine learning (mainly using individual votes as features), and in Vitalik's proposal it's prediction markets where people bet on what the moderators would decide if they were to review each comment.

It seems worth thinking about how to combine the two proposals to get the best of both worlds. One fairly obvious idea is to let people both vote on comments as an expression of their own opinions and place bets about moderator decisions, with ML setting the baseline odds, which would reduce how much the forum has to pay out to incentivize accurate prediction markets. The hoped-for outcome is that the ML algorithm would make correct decisions most of the time, but people could bet against it when they see it making mistakes, and moderators would review the comments with the greatest disagreements between ML and people, or between different bettors in general. Another part of Vitalik's proposal is that each commenter has to make an initial bet that moderators would decide that their comment is good. The article notes that such a bet can also be viewed as a refundable deposit. Such forced bets / refundable deposits would help solve a security problem with Paul's ML-based proposal.
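
As a rough illustration only, here is a minimal sketch in Python of how the combined mechanism might route comments to moderators; the function name, inputs, and the simple absolute-difference disagreement measure are assumptions for the example, not part of either proposal.

```python
# A minimal sketch (hypothetical names): the ML model sets baseline odds,
# bettors trade against them, and the comments where the market price most
# disagrees with the ML baseline are sent for human review.

def select_for_review(comments, ml_prob, market_prob, review_budget):
    """Return the comments whose market price diverges most from the ML baseline.

    ml_prob[c]     -- ML-predicted probability that moderators approve comment c
    market_prob[c] -- current market price of "moderators approve comment c"
    review_budget  -- how many comments the moderators can review this round
    """
    disagreement = {c: abs(ml_prob[c] - market_prob[c]) for c in comments}
    return sorted(comments, key=lambda c: disagreement[c], reverse=True)[:review_budget]
```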

Are there better ways to combine these prediction tools to help with forum moderation? Are there other prediction tools that can be used instead or in addition to these?



Luis von Ahn (of CAPTCHA fame) came up with a number of games with a purpose, such as the ESP game. The idea to bet on moderator decisions reminded me of those games.

I recall another one that judged the aesthetic quality of images, but I don't remember the name. We could use something similar to judge the quality of posts in a way that would be resistant to abuse.

I'm pretty sure that aesthetic game was on the original gwap.com, but unfortunately, the creators seem to have moved on to other projects and the game part of the site doesn't seem to exist anymore. I'm not sure I remember exactly how it worked. Maybe you could find the rules in the Wayback Machine, but the site doesn't seem to be preserved very well there. Does anyone remember the name of that game or how it worked?

Von Ahn also invented reCAPTCHA, which gives me another idea. We could perhaps require that a user participate in a judging game on two other posts as a cost to submit a post of their own.

The aim of the judging game is to predict how the other player would judge the post (for example: upvote, downvote, flag as inappropriate, or some other set of emojis with standard meanings). The post could be chosen randomly from recent posts. The identity of the other player is kept secret by the system. If they agree on an emoji, it gets applied to the post, and both players earn points toward their predictor score. If they disagree, they lose predictor points, and the emoji doesn't get applied.

New moderators can then be chosen from the highest-scoring predictors.
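
To make the game mechanics above concrete, here is an illustrative sketch; the class, method names, and point values are assumptions made for the example, not a specification.

```python
# Sketch of the judging game: two anonymous players judge the same post,
# agreement applies the emoji and earns predictor points, disagreement costs
# points, and the highest-scoring predictors become moderator candidates.

AGREE_POINTS = 1
DISAGREE_POINTS = -1

class JudgingGame:
    def __init__(self):
        self.predictor_score = {}  # player id -> cumulative predictor score
        self.post_emojis = {}      # post id -> emojis applied by agreement

    def judge(self, post_id, player_a, emoji_a, player_b, emoji_b):
        """Score one round in which two players judged the same post."""
        if emoji_a == emoji_b:
            self.post_emojis.setdefault(post_id, []).append(emoji_a)
            delta = AGREE_POINTS
        else:
            delta = DISAGREE_POINTS
        for player in (player_a, player_b):
            self.predictor_score[player] = self.predictor_score.get(player, 0) + delta

    def moderator_candidates(self, n=5):
        """The highest-scoring predictors are candidates for new moderators."""
        return sorted(self.predictor_score, key=self.predictor_score.get, reverse=True)[:n]
```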

Games with a purpose seem to work when useful things are also fun or can be made fun; indeed, to the extent that people sometimes do fun but useless things while there are fun and useful things to be done, I certainly agree that we should be moving as much of the fun to "useful stuff" as possible.

Schemes based on markets can work even if they are not fun (e.g. even when participants are algorithms and companies offering professional moderation services).

Luis von Ahn (of CAPTCHA fame)

And Duolingo.

Some thoughts:

  • Algorithms can be participants in prediction markets. That seems like the most natural way to combine them. There isn't a distinction in kind between baseline odds and other bets: in particular, people can write algorithms and have them participate in markets, so the ML algorithm does not need a distinguished role. In this sense my proposal is a special case of Vitalik's, and I think this is the main observation that makes the prediction market proposal workable.
  • We can choose what the moderator reviews based on dissent in prediction markets. The moderator can review a post when there is sufficiently persistent disagreement to create arbitrage opportunities that are profitable enough to cover the cost of the moderator's time (as in my proposal for arbitrating disagreements); when there isn't persistent disagreement, we can use the market estimate to decide what to display (see the sketch after this list).
  • The social cost of putting up content can hopefully be quantified using market liquidity and prices, e.g. as the subsidy needed to make the market clear "well enough." Done right, this could produce estimates that reflect the value of the moderator's time, the ability of other participants to quickly approximate the moderator's judgments, and the probability that a post is spam. (e.g. it would include "friends of trusted users can post more cheaply" as a special case).
  • The deposit / initial bet provided by a contributor is what actually fixes the security problem you identified in my proposal. Using prediction markets might reduce the size of the problem but I don't see how it qualitatively changes the situation (you still require someone to make informative bets on the market, and that costs something).
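
As a hedged sketch of the review trigger in the second bullet: the names below and the simple "crossed market" arbitrage measure are assumptions, not the actual arbitration mechanism.

```python
# The moderator is called in only when persistent disagreement locks in enough
# arbitrage profit to cover the cost of their time; otherwise the market
# estimate is used directly to decide what to display.

def should_call_moderator(best_bid, best_ask, matched_volume, moderator_cost):
    """Trigger a review when the locked-in arbitrage can pay the moderator's fee.

    best_bid, best_ask -- standing prices for "the moderator approves this post"
    matched_volume     -- amount of money committed at these inconsistent prices
    moderator_cost     -- what the moderator charges to rule on one post
    """
    spread = best_bid - best_ask  # positive only if bettors persistently disagree
    return spread > 0 and spread * matched_volume >= moderator_cost

def display_score(market_price):
    """Without persistent disagreement, use the market estimate for display."""
    return market_price
```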

I am quite excited about seeing these things worked out in more detail over the coming years.

See also dialog markets.

Algorithms can be participants in prediction markets.

Nope. To participate in markets you need to commit capital. Which you need to own to start with.

Anyone can, of course, put an algorithm in charge of his money, but you do need to have money.

The social cost of putting up content can hopefully be quantified using market liquidity and prices, e.g. as the subsidy needed to make the market clear "well enough."

What does that mean? A liquid market always clears; you may not like the price or the volume, but that is a different issue.

The deposit / initial bet provided by a contributor

If you require a deposit from a contributor, you will find that you have very very few contributions.

Given my experience with forum moderation, most attackers aren't very sophisticated, so I don't think the security concerns raised here are a huge issue.

A prediction market adds unnecessary complexity to the user experience.

A prediction market adds unnecessary complexity to the user experience.

Most users presumably never see the prediction market; they simply vote (or respond with a comment, or whatever). These signals can be used by algorithms and a small number of gamblers in order to place bets on the markets.

most attackers aren't very sophisticated, so I don't think the security concerns raised here are a huge issue.

This is also my intuition for now, but over the very long run I am hoping that similar systems will be used in cases that are radically higher stakes, e.g. for structuring policy discussions that will directly shape major power policies.

I also think that sophisticated manipulation will become cheaper as AI improves, and that there is a significant chance that many existing norms for online discussion will break down significantly over the coming decade or two as sophisticated autonomous participants with manipulative agendas greatly outnumber humans.

This is also my intuition for now, but over the very long run I am hoping that similar systems will be used in cases that are radically higher stakes, e.g. for structuring policy discussions that will directly shape major power policies.

For high-stakes situations, the task of picking moderators becomes a lot more complex. I don't think prediction markets help for that purpose.

It also adds an attack vector, both for those willing to spend to influence the automation, and for those wanting to make a profit on their influence over the moderators.

I'd love to see a model displayed alongside the actual karma and results, and I'd like to be able to set my thresholds for each mechanism independently. But I don't think there's any solution that doesn't involve a lot more ground-truthing by trusted evaluators.

Note that we could move one level of abstraction out: use algorithms (possibly ML, possibly simple analytics) to identify trust levels in moderators, which the actual owners (those who pick the moderators and algorithms) can use to spread the moderation load more widely.

It also adds an attack vector, both for those willing to spend to influence the automation

I'm optimistic that we can cope with this in a very robust way (e.g. by ensuring that when there is disagreement, the disagreeing parties end up putting in enough money that the arbitrage can be used to fund moderation).

and for those wanting to make a profit on their influence over the moderators

This seems harder to convincingly address.

But I don't think there's any solution that doesn't involve a lot more ground-truthing by trusted evaluators.

So far I don't see any lower bounds on the amount of ground truth required. I expect that there aren't really theoretical limits: if the moderator were only willing to moderate in return for very large sums of money, then the cost per comment would be quite high, but they would potentially have to moderate very few times. I see two fundamental limits:

  • Moderation is required in order to reveal info about the moderator's behavior, which is needed by sophisticated bettors. This could also be provided in other ways.
  • Moderation is required in order to actually move money from the bad predictors to the good predictors. (This doesn't seem important for "small" forums, since then the incentive effects are always the main thing, i.e. the relevant movement of funds from bad- to good- predictors is happening at the scale of the world at large, not at the scale of a particular small forum).

I'm optimistic that we can cope with this in a very robust way (e.g. by ensuring that when there is disagreement, the disagreeing parties end up putting in enough money that the arbitrage can be used to fund moderation).

That assumes that many people are aware of a given post over which there are disagreements in the first place.

But I don't think there's any solution that doesn't involve a lot more ground-truthing by trusted evaluators.

100 to 1000 votes by trusted evaluators might not be enough. On the other hand, I think the number of votes needed is small enough to have a stable system.

When a given post is very unclear because there are strong signals that it should be hidden and also strong signals that it should be displayed prominently, that post could go to a special assessment list that the moderator prunes from time to time.

I don't think the effort would be too high for a blogger like Scott.

If the moderator had enough voting power or stake, then it would be a pure prediction market, and the decision could be the final voting result. Maybe some mechanism could be designed to ensure that early voters earn a profit if their vote matches the final voting result. If an incident happens and the other side increases substantially because of it, the final result could also be reversed.
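
As a rough sketch of the suggestion above, rewarding early voters might look something like this; the payout rule (earlier correct votes get larger shares of a fixed pot) is an assumption, not a worked-out mechanism.

```python
# Voters whose choice matches the final result split a pot, with earlier
# correct votes weighted more heavily.

def settle_early_voter_rewards(votes, final_result, pot):
    """votes: list of (voter, choice) in the order the votes were cast."""
    correct = [voter for voter, choice in votes if choice == final_result]
    if not correct:
        return {}
    # Rank 1 (earliest correct vote) gets the largest weight.
    weights = {voter: 1.0 / rank for rank, voter in enumerate(correct, start=1)}
    total = sum(weights.values())
    return {voter: pot * weight / total for voter, weight in weights.items()}
```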

To each of us personally, the value of content is a function of our own goals. So, ideally, I would want to have access to all comments, and simply have a smart filter to zero in on those comments that matter to me. That would be a lot more universally useful and desirable to me than something one-directionally useful, such as a machine learning model that simulates a moderator based on a single standard, a limited set of values, or a limited set of extracted features.

So, one way to be universally useful would be to empower users to compute the scores themselves, based on arbitrary goals, by providing all of the uninterpreted raw data to them. However, since the community usually does have an opinion about what kinds of posts a first-time viewer should see to get a sense of what the forum should feel like, it would make sense for the forum community to define some specific goal as the default moderating filter.
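
As an illustration of "raw data plus a user-chosen filter", a minimal sketch follows; the signal names and the linear scoring rule are assumptions for the example.

```python
# Each reader scores comments against their own goals from uninterpreted raw
# signals; first-time viewers see a community-chosen default filter.

def score_comment(raw_signals, goal_weights):
    """raw_signals: e.g. {"upvotes": 12, "bets_for": 3.0, "replies": 4}.
    goal_weights: the reader's goals expressed as per-signal weights."""
    return sum(goal_weights.get(signal, 0.0) * value
               for signal, value in raw_signals.items())

# A community-chosen default shown to first-time viewers.
COMMUNITY_DEFAULT_WEIGHTS = {"upvotes": 1.0, "bets_for": 2.0, "replies": 0.5}

def ranked_feed(comments, goal_weights=COMMUNITY_DEFAULT_WEIGHTS):
    """Each reader may pass their own weights; newcomers get the default ranking."""
    return sorted(comments,
                  key=lambda c: score_comment(c["raw_signals"], goal_weights),
                  reverse=True)
```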

Would such an ML-based system also keep track of users as individuals? For example, if one particular user makes the same moderation calls as the actual moderator 99% of the time, should the system start weighting that person's calls more heavily?

I would say that it should do so. Even if a high-scoring community member decides to maliciously downvote somebody they are biased against 1% of the time, they're still providing their useful moderator-aligned judgement 99% of the time, and the overall system still comes out ahead.
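
A minimal sketch, with hypothetical names, of the per-user weighting discussed above: users whose past calls track the moderator's decisions get more weight, and the smoothing is an assumption, not a proposal detail.

```python
# Weight each user's moderation call by how often they have historically
# agreed with the actual moderator, with a small prior so new users start
# near 0.5 rather than at an extreme.

def user_weight(agreements, total_calls, prior=1.0):
    """A smoothed agreement rate with the moderator's past decisions."""
    return (agreements + prior) / (total_calls + 2 * prior)

def weighted_moderation_score(calls, history):
    """calls: {user: +1 (keep) or -1 (remove)};
    history: {user: (agreements_with_moderator, total_calls)}.

    A positive score suggests the moderator would keep the comment."""
    return sum(vote * user_weight(*history.get(user, (0, 0)))
               for user, vote in calls.items())
```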