I'm pretty sure (epistemic status: Good Judgment Project Superforecaster) the "AI" in the name is pure buzz and the underlying aggregation algorithm is something very simple. If you want to set up some quick group predictions for free, there's https://tinycast.cultivatelabs.com/, which has a transparent and battle-tested aggregation mechanism (LMSR prediction markets) and doesn't use catchy buzzwords to market itself. For other styles of aggregation there's "the original" Good Judgment Inc, a spinoff from GJP, which actually ran an aggregation-algorithm contest in parallel with the forecaster contest (somehow with no "AI" buzz either). They run a public competition at https://www.gjopen.com/ where anyone can forecast and get scored, but if you want to ask your own questions, that's a bit more expensive than Swarm. Unfortunately there doesn't seem to be a good survey-style group forecasting platform out in the open. But that's fine; TinyCast is adequate as long as you read their LMSR algorithm intro (a sketch follows below).
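For anyone who skips that intro: LMSR is Hanson's Logarithmic Market Scoring Rule, where a market maker's cost function determines prices. Here's a minimal sketch of the mechanism (variable names are mine, and the liquidity parameter `b` is an illustrative value, not anything from TinyCast's docs):

```python
import math

def lmsr_cost(q, b=100.0):
    """LMSR cost function: C(q) = b * ln(sum_i exp(q_i / b))."""
    return b * math.log(sum(math.exp(x / b) for x in q))

def lmsr_price(q, i, b=100.0):
    """Instantaneous price (implied probability) of outcome i."""
    weights = [math.exp(x / b) for x in q]
    return weights[i] / sum(weights)

def trade_cost(q, i, shares, b=100.0):
    """What a trader pays to buy `shares` of outcome i: C(q') - C(q)."""
    new_q = list(q)
    new_q[i] += shares
    return lmsr_cost(new_q, b) - lmsr_cost(q, b)

# A fresh binary market starts at 50/50; buying YES pushes the price up.
q = [0.0, 0.0]                 # shares sold of [YES, NO]
print(lmsr_price(q, 0))        # 0.5
print(trade_cost(q, 0, 50))    # ~28.1, the cost of 50 YES shares
q[0] += 50
print(lmsr_price(q, 0))        # ~0.62, the new implied probability
```

The liquidity parameter b controls how much prices move per share traded (and caps the market maker's worst-case loss at b * ln(number of outcomes)). That's what makes the aggregation transparent: the group's probability estimate is just the current price.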
The fact that they're measuring accuracy in a pretty bad way is evidence against them having a good algorithm.
Here's Anthony Aguirre (Metaculus) and Julia Galef on Rationally Speaking:
Anthony: On the results side, there's now an accrued track record of a couple of hundred predictions that have been resolved, and you can just look at the numbers. So, that shows that it does work quite well.
Julia: Oh, how do you measure how well it works?
Anthony: There's a few ways — going from the bad but easy to explain, to the better but harder to explain…
Julia: That's a good progression.
Anthony: And there's the worst way, which I won't even use — which is just to give you some examples of great predictions that it made. This I hate, so I won't even do it.
Julia: Good for you for shunning that.
Anthony: So looking over roughly the last half year, since December 1st, for example: if you ask for how many predictions Metaculus was on the right side of 50% (above 50% if it happened, below 50% if it didn't), that happened for 77 of the 81 questions that resolved, so that's quite good.
And some of the aficionados will know about Brier scores. That's sort of the fairly easy to understand way to do it: you assign a zero if something doesn't happen and a one if it does, and take the difference between the predicted probability and that number. So if you predicted 20% and it didn't happen, that's a .2; if you predicted 80% and it did happen, that's also a .2, because it's the difference between the 80% and a one. Then you square that number.
So Brier scores can run from basically zero to one, where low numbers are good. And if you calculate that for that same set of 80 questions, it's .072, which is a pretty good score.
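Both metrics Anthony describes take only a few lines to compute. Here's a quick sketch on made-up data (the probabilities and outcomes below are illustrative, not Metaculus's actual track record):

```python
# Toy resolved questions: (predicted probability, outcome as 0 or 1).
# These numbers are made up for illustration, not real Metaculus data.
resolved = [(0.9, 1), (0.2, 0), (0.7, 1), (0.6, 0), (0.15, 0)]

# "Right side of 50%": the prediction was above 0.5 iff the event happened.
right_side = sum((p > 0.5) == bool(o) for p, o in resolved)
print(f"right side of 50%: {right_side}/{len(resolved)}")  # 4/5

# Brier score: mean squared difference between probability and outcome.
brier = sum((p - o) ** 2 for p, o in resolved) / len(resolved)
print(f"Brier score: {brier:.3f}")  # 0.104
```

For context on that .072: always answering 50% scores exactly .25, so Metaculus is doing much better than maximal uncertainty, though how impressive that is still depends on how hard the questions were.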
https://unanimous.ai/swarm/
I remember playing with this a while back, answering random questions. I guess now they've released it as a business tool for companies to run their own voting rooms.
Quick overview:
Their video and website claim they got a lot of hard predictions right. Of course, they don't say how many predictions they made in total, so it's hard to tell how magical it is, but it seems worth trying out. I'm up for joining people's rooms if they want to run some experiments.