[LINK] Get paid to train your rationality
A tournament is currently being launched by the Intelligence Advanced Research Projects Activity (IARPA) with the goal of improving forecasting methods for global events of national (US) interest. One of the teams (the Good Judgment Team) is recruiting volunteers to have their forecasts tracked. Volunteers will receive an annual honorarium ($150), and it appears there will be ongoing training to improve one's forecast accuracy (not sure exactly what form this will take).
I'm registered, and wondering if any other LessWrongers are participating/considering it. It could be interesting to compare methods and results.
Extensive quotes and links below the fold.
Despite its importance in modern life, forecasting remains (ironically) unpredictable. Who is a good forecaster? How do you make people better forecasters? Are there processes or technologies that can improve the ability of governments, companies, and other institutions to perceive and act on trends and threats? Nobody really knows.
The goal of the Good Judgment Project is to answer these questions. We will systematically compare the effectiveness of different training methods (general education, probabilistic-reasoning training, divergent-thinking training) and forecasting tools (low- and high-information opinion-polls, prediction market, and process-focused tools) in accurately forecasting future events. We also will investigate how different combinations of training and forecasting work together. Finally, we will explore how to more effectively communicate forecasts in ways that avoid overwhelming audiences with technical detail or oversimplifying difficult decisions.
Over the course of each year, forecasters will have an opportunity to respond to 100 questions, each requiring a separate prediction, such as “How many countries in the Euro zone will default on bonds in 2011?” or “Will Southern Sudan become an independent country in 2011?” Researchers from the Good Judgment Project will look for the best ways to combine these individual forecasts to yield the most accurate “collective wisdom” results. Participants also will receive feedback on their individual results.
All training and forecasting will be done online. Forecasters’ identities will not be made public; however, successful forecasters will have the option to publicize their own track records.
Who We Are
The Good Judgment research team is based at the University of Pennsylvania and the University of California, Berkeley. The project is led by psychologists Philip Tetlock, author of the award-winning Expert Political Judgment, Barbara Mellers, an expert on judgment and decision-making, and Don Moore, an expert on overconfidence. Other team members are experts in psychology, economics, statistics, interface design, futures, and computer science.
We are one of five teams competing in the Aggregative Contingent Estimation (ACE) Program, sponsored by IARPA (the U.S. Intelligence Advanced Research Projects Activity). The ACE Program aims "to dramatically enhance the accuracy, precision, and timeliness of forecasts for a broad range of event types, through the development of advanced techniques that elicit, weight, and combine the judgments of many intelligence analysts." The project is unclassified: our results will be published in traditional scholarly and scientific journals, and will be available to the general public.
A general description of the expected benefits for volunteers:
All decisions involve forecasts, and we all make forecasts all the time. When we decide to change jobs, we perform an analysis of potential futures for each of our options. When a business decides to invest or disinvest in a project, it moves in the direction it believes to present the best opportunity. The same applies when a government decides to launch or abandon a policy.
But we virtually never keep score. Very few forecasters know what their forecasting batting average is — or even how to go about estimating what it is.
If you want to discover what your forecasting batting average is — and how to think about the very concept — you should seriously consider joining The Good Judgment Project. Self-knowledge is its own reward. But with self-knowledge, you have a baseline against which you can measure improvement over time. If you want to explore how high your forecasting batting average could go, and are prepared to put in some work at self-improvement, this is definitely the project for you.
Could that be any more LessWrong-esque?
Prediction markets can harness the "wisdom of crowds" to solve problems, develop products, and make forecasts. These systems typically treat collective intelligence as a commodity to be mined, not a resource that can be grown and improved. That’s about to change.
Starting in mid-2011, five teams will compete in a U.S.-government-sponsored forecasting tournament. Each team will develop its own tools for harnessing and improving collective intelligence and will be judged on how well its forecasters predict major trends and events around the world over the next four years.
The Good Judgment Team, based at the University of Pennsylvania and the University of California, Berkeley, will be one of the five teams competing – and we’d like you to consider joining our team as a forecaster. If you're willing to experiment with ways to improve your forecasting ability and if being part of cutting-edge scientific research appeals to you, then we want your help.
We can promise you the chance to: (1) learn about yourself (your skill in predicting – and your skill in becoming more accurate over time as you learn from feedback and/or special training exercises); (2) contribute to cutting-edge scientific work on both individual-level factors that promote or inhibit accuracy and group- or team-level factors that contribute to accuracy; and (3) help us distinguish better from worse approaches to generating forecasts of importance to national security, global affairs, and economics.
Who Can Participate
Requirements for participation include the following:
(1) A baccalaureate, bachelors, or undergraduate degree from an accredited college or university (more advanced degrees are welcome);
(2) A curiosity about how well you make predictions about world events – and an interest in exploring techniques for improvement.
More info: http://goodjudgmentproject.blogspot.com/
Pre-Register: http://surveys.crowdcast.com/s3/ACERegistration
Comments (55)
The Good Judgment project has started publishing a leaderboard. FWIW, as of this writing I am in pole position with a "Brier score" of 0.18, with numbers 2 and 3 at 0.2 and 0.23 respectively. (I'm not sure whether other participants are also from LW.)
(ETA: dethroned! I'm #2 now, #1 has a score of .16.)
Team scores seem a bit below the best individual scores: 0.32, 0.33 and 0.36 for the best three teams.
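For anyone unfamiliar with the "Brier score" figures being compared above, here is a minimal sketch of the standard binary-outcome version (mean squared error between forecast probabilities and outcomes). The function name and data layout are my own illustration, not anything from the GJP site; some scoring variants differ by a constant factor, so treat this as the general idea rather than the tournament's exact formula.

```python
def brier_score(forecasts):
    """Mean squared error between forecast probabilities and outcomes.

    forecasts: list of (probability, outcome) pairs, where probability is
    the predicted chance of the event and outcome is 1 if it happened,
    0 otherwise. Lower is better: 0 is a perfect score, and a constant
    50% forecast earns 0.25 regardless of what happens.
    """
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

# Well-calibrated, decisive forecasts score low:
brier_score([(0.9, 1), (0.8, 1), (0.1, 0)])  # ≈ 0.02
```

Under this scoring, the leaderboard values quoted (0.16–0.23) sit well below the 0.25 that pure coin-flipping would earn.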
From the emails I've been getting from the organizers, they have trouble sustaining participation from all who signed up; poor participation is leading to poor forecasting scores.
FYI the leaderboard rankings are fake, or at least generated strategically to provide users with specific information. I am near the top of my own leaderboard, while my friend sees his own name but not mine. Also, my Brier is listed at 0.19, strikingly close to yours. I wonder if they're drawn from some plausible-looking distribution.
My take is that the leader stats are some kind of specific experimental treatment they're toying with.
This is almost more interesting than the study itself. :)
Are your friend and you able to see each other's comments on predictions?
Hmm, correlation v. causation maybe? It is possible that some people were doing poorly and so started participating less?
Yes, that's possible too. I used "causing" to refer to a direct link: some predictions are of the form "event X will happen before date D", and you lose points if you fail to revise your estimates as D draws nearer.
Apparently many people weren't aware of this aspect - they took a "fire and forget" approach to prediction. (That is in itself an interesting lesson.) That was before the leaderboard was set up.
Is this limited to graduates from U.S. universities?
Apparently the only way to know is to try. It seems likely that there is such a restriction. I'd estimate a better than 70% chance that I get turned down. :)
I got an email an hour ago from the study saying I was accepted and taking me to the initial survey (a long one, covering calibration on geopolitics, finance, and religion; personality surveys with a lot of fox/hedgehog questions; basic probability; a critical thinking test, the CRT; and then what looked like a full matrix IQ test). The message at the end of all the questions:
So I'm marking me as accepted, anyway.
And the "tournament" has now begun. Just got an email with login instructions.
Looks somewhat similar to PredictionBook, actually. :)
I did all my predictions last night immediately after the email showed up, so that meant I got to place a lot of bets at 50/50 odds :)
(Then I recorded everything privately in PredictionBook. No point in leaving my predictions trapped on their site.)
Interface-wise, I don't like it at all. I'm still not sure what exactly I am betting at or with, compared to PB with straight probabilities or Intrade with share prices.
Did you take the "training refresher"? That includes a general-knowledge test at the end which scores you on both calibration and resolution. My results were pretty poor (but not abysmal):
I'd be curious to compare with yours if you'd care to share.
Without actually going through the whole refresher, it seems to be the same; when I did the training, I don't remember that calibration/resolution test. Perhaps that is one of the experimental differences.
I didn't remember that test from earlier, either. Worth checking out? I don't mind accidentally unblinding a little if it is an experimental/control difference - curious folks will be curious.
I just went through the whole thing again; there was no test of that kind at the end. (What there was was the previous multiple-choice quiz about some example forecasts and how they went wrong.) Looks like this is an experimental/control difference. I'd rather not discuss that bit further - this isn't about possibly life-or-death drugs, after all, and I already know where I can find calibration tests like that.
Fine with me. :)
BTW, look what I found. Did you know about this one?
Have you entered any comments on your predictions at the GJ site? (You're supposed to enter a minimum number of comments over one year, and also a minimum number of responses to others' comments. My understanding is that this will in time be run as a team game, with team play conventions.)
From my first experiences, I'm assuming the scoring will be pretty much as with PB.com - based on probability. Their model seems to be calibration/resolution rather than the visual "slope" representation.
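If the scoring really is built on a calibration/resolution model, the standard way to make that precise is the Murphy decomposition of the Brier score: reliability (calibration error, lower is better) minus resolution (how much forecasts discriminate, higher is better) plus base-rate uncertainty. A rough sketch, assuming binned binary forecasts; the bin scheme and names are my own illustration, not the site's actual method:

```python
from collections import defaultdict

def murphy_decomposition(forecasts, n_bins=10):
    """Split the Brier score into three standard components:

        Brier = reliability - resolution + uncertainty

    forecasts: list of (probability, outcome) pairs with outcome in {0, 1}.
    Forecasts are grouped into n_bins equal-width probability bins; the
    decomposition is exact when forecasts within a bin share a probability.
    """
    n = len(forecasts)
    base_rate = sum(o for _, o in forecasts) / n
    bins = defaultdict(list)
    for p, o in forecasts:
        bins[min(int(p * n_bins), n_bins - 1)].append((p, o))
    reliability = resolution = 0.0
    for items in bins.values():
        k = len(items)
        p_avg = sum(p for p, _ in items) / k  # mean forecast in bin
        o_avg = sum(o for _, o in items) / k  # observed frequency in bin
        reliability += k / n * (p_avg - o_avg) ** 2
        resolution += k / n * (o_avg - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty
```

A "slope"-style display like PB's shows roughly the same information as the reliability term, bin by bin, while resolution rewards forecasters who commit to probabilities far from the base rate and turn out to be right.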
Comments? I don't see any relevant fields for that, checking right now, nor does my 'About' include the substring "comment". Another experimental difference, I guess...
The "Why did you answer the way you did" field. I've been assuming we're both using the same underlying app, i.e. Crowdcast. But perhaps we're not...
I'm in; pleasantly surprised.
This bit from the final registration page is interesting - "But one other requirement for forecasters has changed. We can welcome those who are not US citizens." Implying that at some prior point non-US citizens were not accepted.
That is awesome!
Especially (mischievous mode ON) as I've only implied, not outright stated, that I've applied.
Mischievous mode OFF - that's a problem in arbitrating predictions, btw - the potential for ambiguity inherent in all human languages. If I hadn't in fact applied (I have), how should the prediction that I am "turned down" be judged?
I should use PredictionBook more often but I don't, partly due to this kind of thing, and also due to the trivial-inconvenience effort of having to come up with my own predictions to assess, plus the general uselessness for that purpose of the stream of other users' predictions.
Other than Tricycle folks, is anyone here on LW officially (or unofficially) "in charge" of maintaining and enhancing PredictionBook?
I have some sort of moderator power; I am de facto in charge of the content house-keeping - editing bad due-dates, making private bad or long-overdue-unjudged predictions, criticizing predictions, etc. I also make and register hundreds of predictions, obviously.
(In addition, I have commit access to the codebase on GitHub, but I don't know Ruby, so I will probably never make use of said commit-bit.)
One thing that would probably greatly improve PB for my purposes is a tagging / filtering system, so that you could for instance pick out predictions about consumer devices or predictions about politics; or conversely leave out some uninteresting categories (e.g. predictions about the private lives of particular PB users, which I interpret as pure noise).
Google is not sufficient, I take it?
No; I just tried the query "consumer electronics site:predictionbook.com", and that only returned 1 hit; I know there are more (including one I just made and another I just voted on). It really is the lack of user-supplied meta-information that prevents useful querying, not the lack of a UI for doing so. The UI encourages predictions to be written very tersely, and doesn't supply an extended-info field when you make a prediction.
PB.com is quite possibly the least well executed idea out there that I keep not giving up on. :)
Ah, that's what you meant by tags. Yes, that would be nice. On the other hand, I rather doubt that tags would instantly create massive demand for PB's services - other places like Intrade have well-categorized predictions/bets, and none of them have seen traffic explode the moment they implemented that feature.
If you really found tags all that valuable, you could start doing them inside comments. Go over the 969 upcoming predictions and add comments like 'tags: personal, exercise' or 'tags: America, politics'. Later, it'd be even easier to turn them into some real software-supported tags/categories, and in the meantime, you can query using Google. This wouldn't even take very long - at 30 predictions a day, which ought to take 10 minutes max, you'd be done in a month.
(I doubt you will adopt my suggestion and tag even 500 predictions (10%). This seems to be common to suggestions for PB: 'I'd use and really find PB useful if only it were executed better in this way', which of course never happens. It's starting to remind me of cryonics.)
No. My degrees are from Canada and France, and I'm in.
How long did it take between your "preregistering" and hearing back?
What form did that take? (I.e. form email, personal email, direct link to a Web page, or whatever?)
How long between hearing back and being fully accepted? (I'm assuming that's what "I'm in" means...)
3-4 days.
An email welcoming me to the study with a link to the pre-study survey (which is a mix of attitude/ideology, knowledge, logic, and intelligence questions).
Same as above.
Excerpted from interesting recent news from the GJP, which is now entering the "official" tournament phase:
I'm in too. Took 5 days to get back to me.
XFrequentist, I owe you one.
"How many countries in the Euro zone will default on bonds in 2011?” or “Will Southern Sudan become an independent country in 2011?”
It's hard to make predictions about politics because the decision makers have perverse/unknown sets of incentives. In contrast, it's much easier to make guesses with reasonable error bars when the decision maker is spending his/her own money.
I'm in.
That's fantastic. Thanks for the pointer.
Many thanks for posting this! I'd probably want to do this even if there were no payment, so it's doubly attractive to me. I've submitted the form.
EDIT: I wonder if I'll get in; it just got posted to Marginal Revolution and I doubt they have that huge a budget...
So... what's the catch?
Also, my main reason for not signing up is time, responsibility and commitment. Any idea how much of those this might require?
Edit: entire conversation removed due to complete failure of reading comprehension.
The necessary time commitment is in the order of 6-10 hours per year. You can put as much time into training as you like, of course.
is that 10 hours the week of signup, or 1.6438356164 minutes per day?
... wait, that's still almost 2 min? Probably not worth it even then.
Just to be clear, the deal is that you will receive somewhere between $15 and $25 per hour and also receive an assessment of your calibration and possibly also receive forecasting training...
Oh.
Well, I'm probably disqualified by virtue of this conversation taking place then.