Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

[LINK] Get paid to train your rationality

Post author: XFrequentist 03 August 2011 03:01PM 27 points

A tournament is currently being initiated by the Intelligence Advanced Research Projects Activity (IARPA) with the goal of improving forecasting methods for global events of national (US) interest. One of the teams (the Good Judgment Team) is recruiting volunteers to have their forecasts tracked. Volunteers will receive an annual honorarium ($150), and it appears there will be ongoing training to improve one's forecast accuracy (not sure exactly what form this will take).

I'm registered, and wondering if any other LessWrongers are participating/considering it. It could be interesting to compare methods and results.

Extensive quotes and links below the fold.

Despite its importance in modern life, forecasting remains (ironically) unpredictable. Who is a good forecaster? How do you make people better forecasters? Are there processes or technologies that can improve the ability of governments, companies, and other institutions to perceive and act on trends and threats? Nobody really knows.

The goal of the Good Judgment Project is to answer these questions. We will systematically compare the effectiveness of different training methods (general education, probabilistic-reasoning training, divergent-thinking training) and forecasting tools (low- and high-information opinion-polls, prediction market, and process-focused tools) in accurately forecasting future events. We also will investigate how different combinations of training and forecasting work together. Finally, we will explore how to more effectively communicate forecasts in ways that avoid overwhelming audiences with technical detail or oversimplifying difficult decisions.

Over the course of each year, forecasters will have an opportunity to respond to 100 questions, each requiring a separate prediction, such as “How many countries in the Euro zone will default on bonds in 2011?” or “Will Southern Sudan become an independent country in 2011?” Researchers from the Good Judgment Project will look for the best ways to combine these individual forecasts to yield the most accurate “collective wisdom” results.  Participants also will receive feedback on their individual results.

All training and forecasting will be done online. Forecasters’ identities will not be made public; however, successful forecasters will have the option to publicize their own track records.

Who We Are

The Good Judgment research team is based in the University of Pennsylvania and the University of California Berkeley. The project is led by psychologists Philip Tetlock, author of the award-winning Expert Political Judgment, Barbara Mellers, an expert on judgment and decision-making, and Don Moore, an expert on overconfidence. Other team members are experts in psychology, economics, statistics, interface design, futures, and computer science.

We are one of five teams competing in the Aggregative Contingent Estimation (ACE) Program, sponsored by IARPA (the U.S. Intelligence Advanced Research Projects Activity). The ACE Program aims "to dramatically enhance the accuracy, precision, and timeliness of forecasts for a broad range of event types, through the development of advanced techniques that elicit, weight, and combine the judgments of many intelligence analysts." The project is unclassified: our results will be published in traditional scholarly and scientific journals, and will be available to the general public.
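The blurb says the researchers will look for the best ways to combine individual forecasts into "collective wisdom", but not how. Purely as an illustration, here are two textbook ways to pool probabilities for a yes/no question: the plain average, and the average taken in log-odds space. Neither is claimed to be what the Good Judgment Project or the ACE Program actually uses, and the function names are hypothetical.

```python
import math


def mean_probability(probs):
    """Pool yes/no forecasts by simply averaging the probabilities."""
    return sum(probs) / len(probs)


def mean_log_odds(probs):
    """Pool yes/no forecasts by averaging their log-odds, then converting back.

    Every probability must be strictly between 0 and 1.
    """
    logits = [math.log(p / (1 - p)) for p in probs]
    avg = sum(logits) / len(logits)
    return 1 / (1 + math.exp(-avg))  # logistic function undoes the log-odds
```

For forecasts of 0.9 and 0.5, the plain mean is 0.7, while the log-odds mean is 0.75 (odds of 9 and 1 average geometrically to odds of 3), so log-odds pooling leans toward the more confident forecaster. Whether that helps accuracy is the kind of empirical question the project describes itself as studying.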

A general description of the expected benefits for volunteers:

All decisions involve forecasts, and we all make forecasts all the time.  When we decide to change jobs, we perform an analysis of potential futures for each of our options.  When a business decides to invest or disinvest in a project, it moves in the direction it believes to present the best opportunity.  The same applies when a government decides to launch or abandon a policy.

But we virtually never keep score. Very few forecasters know what their forecasting batting average is — or even how to go about estimating what it is.

If you want to discover what your forecasting batting average is — and how to think about the very concept — you should seriously consider joining The Good Judgment Project. Self-knowledge is its own reward. But with self-knowledge, you have a baseline against which you can measure improvement over time. If you want to explore how high your forecasting batting average could go, and are prepared to put in some work at self-improvement, this is definitely the project for you.

Could that be any more LessWrong-esque?

Prediction markets can harness the "wisdom of crowds" to solve problems, develop products, and make forecasts. These systems typically treat collective intelligence as a commodity to be mined, not a resource that can be grown and improved. That’s about to change.

Starting in mid-2011, five teams will compete in a U.S.-government-sponsored forecasting tournament. Each team will develop its own tools for harnessing and improving collective intelligence and will be judged on how well its forecasters predict major trends and events around the world over the next four years.

The Good Judgment Team, based in the University of Pennsylvania and the University of California Berkeley, will be one of the five teams competing – and we’d like you to consider joining our team as a forecaster. If you're willing to experiment with ways to improve your forecasting ability and if being part of cutting-edge scientific research appeals to you, then we want your help.

We can promise you the chance to: (1) learn about yourself (your skill in predicting – and your skill in becoming more accurate over time as you learn from feedback and/or special training exercises); (2) contribute to cutting-edge scientific work on both individual-level factors that promote or inhibit accuracy and group- or team-level factors that contribute to accuracy; and (3) help us distinguish better from worse approaches to generating forecasts of importance to national security, global affairs, and economics.

Who Can Participate

Requirements for participation include the following:

(1) A baccalaureate, bachelors, or undergraduate degree from an accredited college or university (more advanced degrees are welcome);

(2) A curiosity about how well you make predictions about world events – and an interest in exploring techniques for improvement.

More info: http://goodjudgmentproject.blogspot.com/

Pre-Register: http://surveys.crowdcast.com/s3/ACERegistration

Comments (55)

Comment author: Morendil 02 November 2011 06:29:51PM 2 points

The Good Judgment project has started publishing a leaderboard. FWIW, as of this writing I am in pole position with a "Brier score" of 0.18, with numbers 2 and 3 at 0.2 and 0.23 respectively. (I'm not sure whether other participants are also from LW.)

(ETA: dethroned! I'm #2 now, #1 has a score of .16.)

Team scores seem a bit worse than the best individual scores: 0.32, 0.33 and 0.36 for the best three teams.
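Since lower Brier scores are better, .16 beats .18. The thread doesn't define the score, so this sketch assumes the original multi-category form: squared error summed over every possible outcome, which gives 0 for a perfect forecast and 2 for full confidence in a wrong outcome. For a yes/no question, some sites use a one-outcome version that is exactly half of this. The function names are hypothetical.

```python
def brier_score(forecast, outcome_index):
    """Multi-category Brier score for one question (assumed form).

    forecast: probabilities for each possible outcome (should sum to 1).
    outcome_index: index of the outcome that actually happened.
    Returns a value from 0 (perfect) to 2 (all probability on a wrong outcome).
    """
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(forecast)
    )


def mean_brier(scored_questions):
    """Average Brier score over (forecast, outcome_index) pairs; lower is better."""
    scores = [brier_score(f, o) for f, o in scored_questions]
    return sum(scores) / len(scores)
```

Under this form a 50/50 forecast on a yes/no question always scores 0.5, so averages around 0.2 suggest forecasts that mostly leaned the right way.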

From the emails I've been getting from the organizers, they have trouble sustaining participation from all who signed up; poor participation is leading to poor forecasting scores.

Comment author: anonomouse 08 November 2011 04:22:45PM 1 point

FYI the leaderboard rankings are fake, or at least generated strategically to provide users with specific information. I am near the top of my own leaderboard, while my friend sees his own name but not mine. Also, my Brier is listed at 0.19, strikingly close to yours. I wonder if they are generated with some apparent distribution.

My take is that the leader stats are some kind of specific experimental treatment they're toying with.

Comment author: Morendil 08 November 2011 06:27:44PM 1 point

This is almost more interesting than the study itself. :)

Are your friend and you able to see each other's comments on predictions?

Comment author: JoshuaZ 02 November 2011 06:32:13PM 1 point

poor participation is leading to poor forecasting scores.

Hmm, correlation v. causation maybe? It is possible that some people were doing poorly and so started participating less?

Comment author: Morendil 02 November 2011 06:54:45PM 1 point

Yes, it's possible too. By "leading to" I meant a direct link: some predictions are of the form "event X will happen before date D", and you lose points if you fail to revise your estimates as D draws nearer.

Apparently many people weren't aware of this aspect - they took a "fire and forget" approach to prediction. (That is in itself an interesting lesson.) That was before the leaderboard was set up.
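Why a stale forecast keeps losing points can be shown with a scoring rule that re-scores the standing forecast every day the question is open and averages the daily scores. That rule is an assumption here (the thread doesn't spell out the project's exact scoring), and `daily_averaged_brier` is a hypothetical name.

```python
def daily_averaged_brier(updates, num_days, outcome_index):
    """Hypothetical time-averaged Brier score for one question.

    updates: dict mapping day number (0-based) to a probability vector;
             must contain day 0 (the initial forecast).
    num_days: number of days the question stayed open.
    outcome_index: index of the outcome that actually happened.
    """
    if 0 not in updates:
        raise ValueError("an initial forecast on day 0 is required")
    current = None
    total = 0.0
    for day in range(num_days):
        if day in updates:
            current = updates[day]  # a revision replaces the standing forecast
        # The standing forecast is scored again every day, even if unchanged.
        total += sum(
            (p - (1.0 if i == outcome_index else 0.0)) ** 2
            for i, p in enumerate(current)
        )
    return total / num_days


# "X happens before D": outcome 0 = yes, outcome 1 = no. Suppose X never happens.
fire_and_forget = daily_averaged_brier({0: [0.5, 0.5]}, 10, 1)
updated = daily_averaged_brier({0: [0.5, 0.5], 5: [0.1, 0.9]}, 10, 1)
```

Here the fire-and-forget forecaster scores 0.5, while the one who revised toward "no" halfway through scores about 0.26, the whole difference coming from the days after the revision.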

Comment author: Lightwave 04 August 2011 08:14:48AM 2 points

Is this limited to graduates from U.S. universities?

Comment author: Morendil 04 August 2011 11:43:04AM 3 points

Apparently the only way to know is to try. It seems likely that there is such a restriction. I'd estimate a better than 70% chance that I get turned down. :)

Comment author: gwern 05 August 2011 08:17:34PM 5 points

I got an email an hour ago from the study saying I was accepted and taking me to the initial survey (a long one, covering calibration on geopolitics, finance, and religion; personality surveys with a lot of fox/hedgehog questions; basic probability; a critical thinking test, the CRT; and then what looked like a full matrix IQ test). The message at the end of all the questions:

Congratulations! You’ve completed the survey. Sometime later this year, we’ll post information on the distribution of answers among those participating in this study.

What comes next? Some of you (by random assignment) will receive an e-mail with a link to a training exercise. Again, we ask you to complete that exercise before forecasting begins on September 1st. That’s the big day for the entire team – the official start of forecasting on 9/1/2011.

Be sure to watch your e-mail for a personalized link to “your” forecasting website. We hope you’re as eager as we are for the tournament to begin.

So I'm marking myself as accepted, anyway.

Comment author: Morendil 07 September 2011 10:05:03AM 2 points

And the "tournament" has now begun. Just got an email with login instructions.

Looks somewhat similar to PredictionBook, actually. :)

Comment author: gwern 07 September 2011 12:49:06PM 3 points

I did all my predictions last night immediately after the email showed up, so that meant I got to place a lot of bets at 50/50 odds :)

(Then I recorded everything privately in PredictionBook. No point in leaving my predictions trapped on their site.)

Interface-wise, I don't like it at all. I'm still not sure what exactly I am betting at or with, compared to PB with straight probabilities or Intrade with share prices.

Comment author: Morendil 07 September 2011 02:51:23PM 2 points

Did you take the "training refresher"? That includes a general-knowledge test at the end which scores you on both calibration and resolution. My results were pretty poor (but not abysmal):

You got 63% of the items correct, and your average confidence rating over all of the items was 74.33%. (...) In this exercise, your calibration is 11.00 (average confidence minus percent correct). (...) Your confidence when you were correct was 75.26%, and your confidence when you were incorrect was 72.73%. The difference is 2.53%.

I'd be curious to compare with yours if you'd care to share.
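Both numbers in that feedback are easy to compute from per-item results. This sketch takes (confidence, correct) pairs with confidence in percent; `calibration_summary` is a hypothetical name, and the definitions follow the quoted feedback (average confidence minus percent correct; average confidence when correct minus average confidence when incorrect). The quoted 63% is presumably rounded, since 74.33 minus 11.00 gives 63.33%.

```python
def calibration_summary(items):
    """Summarize a confidence-rated quiz.

    items: list of (confidence, correct) pairs; confidence is a percentage
           and correct is a bool.
    Returns (calibration, resolution_gap), both in percentage points:
      calibration    = average confidence - percent correct
                       (positive means overconfident)
      resolution_gap = average confidence on correct items
                       - average confidence on incorrect items
    """
    n = len(items)
    mean_conf = sum(c for c, _ in items) / n
    pct_correct = 100.0 * sum(1 for _, ok in items if ok) / n
    right = [c for c, ok in items if ok]
    wrong = [c for c, ok in items if not ok]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect item")
    resolution_gap = sum(right) / len(right) - sum(wrong) / len(wrong)
    return mean_conf - pct_correct, resolution_gap


# Hypothetical four-item quiz, for illustration only.
calib, gap = calibration_summary([(90, True), (70, True), (80, False), (60, False)])
```

In this made-up quiz, average confidence is 75% against 50% correct (calibration 25), and confidence averages 80% on right answers versus 70% on wrong ones (a gap of 10).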

Comment author: gwern 07 September 2011 03:00:53PM 2 points

Without actually going through the whole refresher, it seems to be the same; when I did the training, I don't remember that calibration/resolution test. Perhaps that is one of the experimental differences.

Comment author: Morendil 07 September 2011 03:05:02PM 2 points

I didn't remember that test from earlier, either. Worth checking out? I don't mind accidentally unblinding a little if it is an experimental/control difference - curious folks will be curious.

Comment author: gwern 07 September 2011 03:16:52PM 2 points

I just went through the whole thing again; there was no test of that kind at the end. (What there was was the previous multiple-choice quiz about some example forecasts and how they went wrong.) Looks like this is an experimental/control difference. I'd rather not discuss that bit further - this isn't about possibly life-or-death drugs, after all, and I already know where I can find calibration tests like that.

Comment author: Morendil 07 September 2011 03:19:57PM 1 point

Fine with me. :)

BTW, look what I found. Did you know about this one?

Comment author: Morendil 08 September 2011 01:38:09PM 1 point

Have you entered any comments on your predictions at the GJ site? (You're supposed to enter a minimum number of comments over one year, and also a minimum number of responses to others' comments. My understanding is that this will in time be run as a team game, with team play conventions.)

From my first experiences, I'm assuming the scoring will be pretty much as with PB.com - based on probability. Their model seems to be calibration/resolution rather than the visual "slope" representation.

Comment author: gwern 08 September 2011 02:21:31PM 1 point

Comments? I don't see any relevant fields for that, checking right now, nor does my 'About' include the substring "comment". Another experimental difference, I guess...

Comment author: Morendil 09 September 2011 07:00:16AM 1 point

The "Why did you answer the way you did" field. I've been assuming we're both using the same underlying app, i.e. Crowdcast. But perhaps we're not...

Comment author: gwern 04 August 2011 05:31:21PM 4 points
Comment author: Morendil 05 August 2011 08:17:03PM 2 points

I'm in; pleasantly surprised.

This bit from the final registration page is interesting - "But one other requirement for forecasters has changed. We can welcome those who are not US citizens." Implying that at some prior point non-US citizens were not accepted.

Comment author: XFrequentist 04 August 2011 06:00:25PM 1 point

That is awesome!

Comment author: Morendil 04 August 2011 08:47:11PM 2 points

Especially (mischievous mode ON) as I've only implied, not outright stated, that I've applied.

Mischievous mode OFF - that's a problem in arbitrating predictions, btw - the potential for ambiguity inherent in all human languages. If I hadn't in fact applied (I have), how should the prediction that I am "turned down" be judged?

I should use PredictionBook more often but I don't, partly due to this kind of thing, and also due to the trivial inconvenience of having to come up with my own predictions to assess, and the general uselessness for that purpose of the stream of other users' predictions.

Other than Tricycle folks, is anyone here on LW officially (or unofficially) "in charge" of maintaining and enhancing PredictionBook?

Comment author: gwern 04 August 2011 11:09:17PM 3 points

Other than Tricycle folks, is anyone here on LW officially (or unofficially) "in charge" of maintaining and enhancing PredictionBook?

I have some sort of moderator power; I am de facto in charge of the content house-keeping - editing bad due-dates, making private bad or long-overdue-unjudged predictions, criticizing predictions, etc. I also make and register hundreds of predictions, obviously.

(In addition, I have commit access to the codebase on GitHub, but I don't know Ruby, so I will probably never make use of said commit-bit.)

Comment author: Morendil 04 September 2011 03:53:59PM 1 point

One thing that would probably greatly improve PB for my purposes is a tagging / filtering system, so that you could for instance pick out predictions about consumer devices or predictions about politics; or conversely leave out some uninteresting categories (e.g. predictions about the private lives of particular PB users, which I interpret as pure noise).

Comment author: gwern 04 September 2011 03:57:35PM 0 points

Google is not sufficient, I take it?

Comment author: Morendil 04 September 2011 04:38:01PM 3 points

No; I just tried the query "consumer electronics site:predictionbook.com", and that only returned 1 hit; I know there are more (including one I just made and another I just voted on). It really is the lack of user-supplied meta-information that prevents useful querying, not the lack of a UI for doing so. The UI encourages predictions to be written very tersely, and doesn't supply an extended-info field when you make a prediction.

PB.com is quite possibly the least well executed idea out there that I keep not giving up on. :)

Comment author: gwern 04 September 2011 05:37:13PM 1 point

Ah, that's what you meant by tags. Yes, that would be nice. On the other hand, I rather doubt that tags would instantly create massive demand for PB's services - other places like Intrade have well-categorized predictions/bets, and none of them have seen traffic explode the moment they implemented that feature.

If you really found tags all that valuable, you could start doing them inside comments. Go over the 969 upcoming predictions and add comments like 'tags: personal, exercise' or 'tags: America, politics'. Later, it'd be even easier to turn them into some real software-supported tags/categories, and in the meantime, you can query using Google. This wouldn't even take very long - at 30 predictions a day, which ought to take 10 minutes max, you'd be done in a month.

(I doubt you will adopt my suggestion and tag even 500 predictions (10%). This seems to be common to suggestions for PB: 'I'd use and really find PB useful if only it were executed better in this way', which of course never happens. It's starting to remind me of cryonics.)
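The tags-in-comments workaround is straightforward to query once the comments are in hand. This sketch assumes the "tags: America, politics" convention suggested above and a local dict of prediction id to comment strings; how those comments would be exported from PredictionBook is not shown, and every name here is hypothetical.

```python
import re

# Hypothetical convention: a comment line such as "tags: America, politics".
TAG_LINE = re.compile(r"^\s*tags:\s*(.+)$", re.IGNORECASE | re.MULTILINE)


def extract_tags(comment):
    """Return the set of lower-cased tags found in 'tags:' lines of one comment."""
    tags = set()
    for match in TAG_LINE.finditer(comment):
        tags.update(t.strip().lower() for t in match.group(1).split(",") if t.strip())
    return tags


def filter_predictions(predictions, include=(), exclude=()):
    """predictions: dict mapping prediction id -> list of comment strings.

    Keep ids tagged with every tag in include and with none of the tags in exclude.
    """
    want = {t.lower() for t in include}
    skip = {t.lower() for t in exclude}
    kept = []
    for pid, comments in predictions.items():
        tags = set().union(*(extract_tags(c) for c in comments))
        if want <= tags and not (skip & tags):
            kept.append(pid)
    return kept
```

Excluding a tag like "personal" covers Morendil's wish to drop predictions about individual users' private lives. At the suggested pace of 30 a day, the 969 upcoming predictions would take about 33 days to tag.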

Comment author: XFrequentist 04 August 2011 01:53:07PM 1 point

No. My degrees are from Canada and France, and I'm in.

Comment author: Morendil 04 August 2011 08:37:41PM 2 points

How long did it take between your "preregistering" and hearing back?

What form did that take? (I.e. form email, personal email, direct link to a Web page, or whatever?)

How long between hearing back and being fully accepted? (I'm assuming that's what "I'm in" means...)

Comment author: XFrequentist 05 August 2011 12:24:52PM 1 point

How long did it take between your "preregistering" and hearing back?

3-4 days.

What form did that take?

An email welcoming me to the study with a link to the pre-study survey (which is a mix of attitude/ideology, knowledge, logic, and intelligence questions).

How long between hearing back and being fully accepted?

Same as above.

Comment author: Morendil 30 November 2011 05:50:51PM 1 point

Excerpted from interesting recent news from the GJP, which is now entering the "official" tournament phase:

Meanwhile, we have updated the scoring sidebar accessible from the "About" tab of your forecasting website to provide forecasters affected by the new scoring rule with more information (this does not apply to prediction-market forecasters). We also will be using the FAQs to provide all of you with details about the number of forecasters participating in the tournament (currently over 2,700 on the Good Judgment Team, spread over 12 experimental conditions) and other topics that have prompted questions to our Help Desk or project administrator.

Comment author: mindspillage 12 August 2011 07:55:25PM 1 point

I'm in too. Took 5 days to get back to me.

Comment author: Jayson_Virissimo 06 August 2011 02:08:52AM 1 point

XFrequentist, I owe you one.

Comment author: nazgulnarsil 05 August 2011 09:17:27PM 1 point

“How many countries in the Euro zone will default on bonds in 2011?” or “Will Southern Sudan become an independent country in 2011?”

It's hard to make predictions about politics because the decision makers have perverse/unknown sets of incentives. In contrast, it's much easier to make guesses with reasonable error bars when the decision maker is spending his/her own money.

Comment author: magfrump 03 August 2011 08:03:50PM 1 point

I'm in.

Comment author: GuySrinivasan 03 August 2011 05:24:20PM 1 point

That's fantastic. Thanks for the pointer.

Comment author: gwern 03 August 2011 03:50:00PM 1 point

Many thanks for posting this! I'd probably want to do this even if there were no payment, so it's doubly attractive to me. I've submitted the form.

EDIT: I wonder if I'll get in; it just got posted to Marginal Revolution and I doubt they have that huge a budget...

Comment author: Armok_GoB 05 August 2011 06:24:31PM 0 points

So... what's the catch?

Also, my main reason for not signing up is time, responsibility and commitment. Any idea how much of those this might require?

Edit: entire conversation removed due to complete failure of reading comprehension.

Comment author: XFrequentist 05 August 2011 08:32:54PM 2 points

The necessary time commitment is on the order of 6-10 hours per year. You can put as much time into training as you like, of course.

Comment author: Armok_GoB 05 August 2011 10:28:34PM 0 points

is that 10 hours the week of signup, or 1.6438356164 minutes per day?

... wait, that's still almost 2 min? Probably not worth it even then.

Comment author: Cyan 06 August 2011 02:25:27AM 1 point

Just to be clear, the deal is that you will receive somewhere between $15 and $25 per hour and also receive an assessment of your calibration and possibly also receive forecasting training...
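The arithmetic in this exchange holds up if the $150 honorarium and the 6 to 10 hours per year are taken at face value: 10 hours spread over 365 days is about 1.64 minutes a day, and $150 over 6 to 10 hours works out to $25 down to $15 an hour. A quick check, with hypothetical helper names:

```python
HONORARIUM_USD = 150  # annual honorarium quoted in the post


def minutes_per_day(hours_per_year):
    """Spread a yearly time commitment evenly over 365 days."""
    return hours_per_year * 60 / 365


def hourly_rate(hours_per_year):
    """Effective pay per hour if the whole honorarium covers that many hours."""
    return HONORARIUM_USD / hours_per_year
```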

Comment author: Armok_GoB 06 August 2011 02:25:28PM 0 points

Well, I'm probably disqualified by virtue of this conversation taking place then.