Follow-up to: Good Judgment Project, Season Three.
During the last forecasting season I took part in the Good Judgment Project (GJP; see also the blog). This is a short summary of my participation (triggered by hamnox's comment).
The GJP elicits forecasts of world events such as:
- Ukraine conflict
- Arctic ice cap melting
- Ebola outbreak duration
- Chinese sea conflict
- ISIS attacks
- Terrorist attacks
- Oil price
- Certain exchange rates
- Election results
- and many other political events
To participate in the study one has to register (I can't remember where exactly I stumbled over the link, possibly the one at the top), complete a preparatory online course, and pass an online test. At least I had to complete it; whether the result affected my assignment to any group I can't say. The course explains the scoring and gives recommendations for making good forecasts (choose questions where one has an edge, estimate early, update often, do post-mortems). The test appears to measure calibration and accuracy by asking about known (mostly political) events and how confident one is about them.
The current forecasting season started in November 2014 and has just ended. I invested significantly less than half an hour a week on 8 of about 100 questions (and thus less than I projected in an early questionnaire). I made 2 to 15 updates per question and earned a score in the middle range (mostly due to getting hit by an unexpected terrorist attack). As I just learned, I was assigned to a study condition where I could see neither the aggregate group estimate nor the estimates of the other group members, only their comments. I was somewhat disappointed by this, as I had hoped to learn something from how the scores developed. Too bad I wasn't in a prediction market group. But I hope to get the study results later.
I will not take part in further rounds because I shy away from the effort required for these types of forecasts, which are mostly political. They are political because the sponsor (guess who) is interested mostly in political events, less in economic, environmental, scientific, or other types. But I enjoyed forecasting Arctic ice cap melting and Ebola, and netted a better-than-average score on those.
The scoring, at least in this group, is interesting: it uses an averaged Brier score, averaged a) over all forecast questions and b) within a question, over all the days for which a forecast is provided. I intended to game this by betting on questions that a) I could forecast well and b) had a reliably predictable outcome. Sadly there were few of type a).
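To make this concrete, here is a minimal sketch of how such a daily-averaged Brier score could be computed for a binary question. Everything below is my own illustration: the function names and numbers are invented, I use the common squared-error form (the original Brier definition sums over all outcome categories, which would double these values for binary questions), and I assume a forecast is carried forward unchanged until the next update.

```python
from typing import List, Tuple

def daily_brier_scores(updates: List[Tuple[int, float]],
                       n_days: int, outcome: int) -> List[float]:
    """Squared-error Brier score for each scored day of a binary question.

    updates: (day, probability) pairs sorted by day; a forecast is
    assumed to stay in effect until the next update.
    outcome: 1 if the event happened, 0 otherwise.
    """
    scores, p, i = [], None, 0
    for day in range(n_days):
        while i < len(updates) and updates[i][0] <= day:
            p = updates[i][1]
            i += 1
        if p is not None:                      # days before the first
            scores.append((p - outcome) ** 2)  # forecast are not scored
    return scores

def question_score(updates, n_days, outcome):
    """b) within a question: average over all days with a forecast."""
    daily = daily_brier_scores(updates, n_days, outcome)
    return sum(daily) / len(daily)

# a) overall: average over all forecast questions (invented numbers)
q1 = question_score([(0, 0.7), (10, 0.9)], 30, outcome=1)
q2 = question_score([(0, 0.2)], 30, outcome=0)
print((q1 + q2) / 2)
```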
From this experience I learned that
- such prediction organizations ask mostly for political events,
- political events are hard to predict and
- predicting political events requires a lot of background information,
- I'm below average at predicting political events (at least compared to my group, which I'd guess has more interest in politics than I do), but
- I'm above average on non-political topics.
Thanks for posting this. The GJP's sparked only sporadic discussion here, maybe because it focuses so much on world politics as opposed to stereotypically LWesque STEM stuff, and that's a bit of a shame. I'm a STEM nerd myself, but in a way that made the GJP more enticing because I thought participating in it might nudge me to learn a tiny bit about world politics (it did), and because I wanted to see whether I could beat the averages despite having minimal domain-specific knowledge (I could).
IIRC I filled out a pre-registration form that just asked for bare-bones demographic info like occupation and highest educational qualification. After the GJP let me into the study, but before they assigned me to a group, I think I filled out a longer background survey about myself, and did the political knowledge/calibration test.
I did the short training session after getting the group assignment. Presumably the (sub)group assignments are randomized so the researchers can make causal inferences about which treatments generate better forecasts.
It's actually still running for my group. We have 31 questions still open which don't close until the 8th or 9th.
I wound up putting in more time than I think I anticipated, probably more than half an hour a week most weeks, and so far I've made 335 predictions on 36 questions. Since GJP started displaying my rank in my group, my overall Brier score's consistently been in the lowest 20%.
Maybe we were in the same group. My group also had no prediction markets, but I could read "tips" written by other people who were apparently chatting to each other in a forum to which I didn't have access. I also couldn't/can't see other users' predictions in real time, although I could see the group's median Brier score for each question after it was closed.
Ah, but if you didn't make a prediction on a question, you still got a Brier score for it — GJP gave you the median score of the group members who did make a prediction. (Or that's how it worked for me, anyway.) So the trick is to choose questions where you expect to do better than the median predictor, even if those questions look difficult. (Perhaps especially questions which look difficult to you, because other people might be overconfident about them.) The sample size is small, but on each of the 4 questions where my Brier score was high (≥ 0.5) I scored 0.07-0.15 fewer points than the group score, which really helped drive down my overall score.
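A toy calculation of that rule (my own illustration, with invented numbers) shows why picking your questions pays off even when your own scores aren't great in absolute terms:

```python
# Skipped questions are scored with the group's median, so the baseline
# for doing nothing is the mean of the medians. Predicting only where
# you expect to beat the median pushes you below that baseline.
group_median = {"q1": 0.10, "q2": 0.60, "q3": 0.40}   # invented numbers

baseline = sum(group_median.values()) / 3              # skip all: ~0.367

my_scores = {"q2": 0.50}   # predicted only on "hard" q2, beat its median
overall = sum(my_scores.get(q, m)
              for q, m in group_median.items()) / 3    # ~0.333 < baseline
print(baseline, overall)
```

Note that q2 looks like the hardest question here (highest median score), yet it is exactly where predicting is worthwhile.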
Mostly true, although election-result questions tended to be nice & easy. A few other political events weren't obvious slam-dunks if I looked at them from a distance, but became very obvious slam-dunks as soon as I investigated them.
Example: "Will a referendum on Quebec's affiliation with Canada be held before 31 December 2014?", which I didn't touch until October. But when I ran Google News searches about it, the lack of positive evidence for expecting a referendum was stark, and I immediately gave it only a 5% probability. During October I monotonically lowered that as tips came in pointing out that the one party pushing for a referendum was unpopular and leader-less, and that a referendum would take time to organize. For all of November & December I had that question at 0%, and my final Brier score for it halved the (already tiny) group score.
I also discovered that the prediction difficulty of the political questions was often time-dependent. IARPA tried to pick relevant & topical questions, which meant that a lot of questions were provoked by news coverage. But because the news prefers dramatic, sudden events, quite a few of the resulting questions were about transient crises or other hot issues that rapidly cooled down and became highly predictable within days or weeks, leaving them easy to predict for most of the (months-long) prediction windows.
A good tactic therefore turned out to be: just wait. It'd be interesting to see how people would do in a GJP re-run with shorter prediction windows, where that tactic would surely be less successful.
Yes, actually it is. Somehow I misinterpreted one of the recent emails. At least all the questions I forecast on are closed.
Maybe. The best forecaster in my group is grossz18.