Stefan_King comments on Raising the forecasting waterline (part 1) - LessWrong
On 16 questions currently scored, I've done better than the team average on 15. Two of the questions where I outperformed the team by a large margin were the Syrian refugee question (basically a matter of extrapolating a trend and predicting the status quo with respect to the conflict) and the Kismayo question (basically a matter of knowing my loss function). I had zero home-ground advantage on either question.
Some of my wins resulted purely from general knowledge rather than from having any idea of the specifics of the situation. For instance, in mid-August I answered 40% to "Will Kuwait commence parliamentary elections before 1 October 2012?", reflecting only status quo bias in that a date for the election had not yet been announced. Early in September, however, I downgraded this to 10%, because I know that as a rule of thumb it takes at least one month to convene an election. The week before the deadline I went to 5% (and even that was quite a generous margin), while several of my teammates, after I published mine, made predictions of 15%, 19%, 33% and even 51% (!).
This felt like entering a poker tournament where people routinely raise pre-flop with a "beer hand" (seven and two: when you play this, either you've had too many beers, or it's time you had one). Elections aren't mysterious; we all take part in one every so often. You need to print ballots, set up voting booths, audit voter registration records, and give people time to campaign on national media: all very mundane stuff. Even dictatorships make at least a half-hearted attempt at this, and it's not as if anyone in Kuwait had any particular interest in meeting an October deadline; this was strictly an internal-to-GJP deadline.
So while this question was ostensibly about something happening in Kuwait, all you needed to make a call at least as good as mine was background knowledge of extremely mundane, practical matters. If I had any hint that you wouldn't factor those in when making a close-to-home prediction, I wouldn't trust you to organize so much as a PTA president election. Maybe a birthday party.
I wouldn't go so far as to claim that "skill at forecasting macro trends transfer to microeconomic moves".
But I'd take a stand on "demonstrated incompetence at the most elementary moves of forecasting, in a macro domain, is a strong indicator of likely incompetence at forecasting in any micro domain, other than the few narrow ones you might happen to be good at".
Yeah. Answering "1%" to "there will be a major earthquake in California during $time_period" a month before the end of $time_period kind of felt like cheating to me.
How does GJP score predictions that change over time?
They compute your Brier score for each day that the question is open, according to what your forecast is on that day, and average over all days.
Suppose you start at 80%, switch to 40% after six days (three days before the deadline), and the event doesn't happen. Your score is (6*(0.8)^2 + 3*(0.4)^2)/9 = .48, which is a so-so score, but an improvement over the .64 you'd get if you hadn't changed your mind.
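To make the averaging concrete, here is a minimal Python sketch of the daily scoring described above. It follows the arithmetic in the example, scoring each day as the squared error (p - outcome)^2 on the event's probability alone; GJP's official formula may differ (for instance, by summing the error over both outcomes), and the function name daily_brier is hypothetical.

```python
def daily_brier(daily_forecasts, outcome):
    """Average the squared error (p - outcome)^2 over every day the question is open.

    Assumption: single-outcome squared error, as in the worked example above.
    daily_forecasts: probability assigned to the event on each open day.
    outcome: 1 if the event happened, 0 otherwise.
    """
    if not daily_forecasts:
        raise ValueError("need at least one day of forecasts")
    return sum((p - outcome) ** 2 for p in daily_forecasts) / len(daily_forecasts)


# The example: 80% for six days, then 40% for the last three; the event doesn't happen.
changed_mind = daily_brier([0.8] * 6 + [0.4] * 3, outcome=0)
stuck_at_80 = daily_brier([0.8] * 9, outcome=0)
print(round(changed_mind, 2), round(stuck_at_80, 2))  # 0.48 0.64
```

Because every open day counts equally, a late correction only earns back the days that remain, which is why updating early matters.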
In a nutshell, no.
Consider some practicalities. An advantage of forecasting world events is that it permits participation by a much broader population. I could run a forecasting contest on when the city of Paris will complete a construction project on the banks of the Seine, which is "my backyard" compared to Syria. Nobody would bother.
The point is to find out something about how you think, and comparing yourself to other people will yield information that you can't get by sitting on your own, minding your own business. (On the other hand, there's nothing preventing you from doing both.)
Finally, I'm not aware that people routinely make explicit, quantified forecasts even about their own business. Rather, it seems plain that most of the time, we think "probable" the things we would like to happen, and as a result fail to plan for contingencies we don't like to think about.
To go from not forecasting at all to making forecasts in any domain is progress. It would certainly be useful to many to make forecasts about their daily lives (which I now do, a little bit). But let's imagine this were taught in schools as a life skill: I suspect you would have people practicing precisely on events that they have no control over and that allow interpersonal comparison.
Thanks for inspiring the following bit of staircase wit, which might make it into a future version of the post: Tool 0 of forecasting is "forecast". If you don't do it, you can't become better at it.
Gwern prefers PredictionBook - where you can, if you want, record private predictions - to GJP. For my part I prefer GJP, precisely because they ask me questions that might not occur to me otherwise, and the competitive aspect suits me. You could also do just fine by recording your own forecasts in a spreadsheet or a notepad, on whatever topics you like.
Is accuracy what you're after? Which component of accuracy? I can get perfect calibration by throwing a thousand coin flips and predicting 50% all the time. What I seek is debiasing, making the most of whatever information is available without overweighting any part of it (including my own hunches, feelings and fears); and I'm most vulnerable to bias when there are many moving parts, many of which are hidden from me or unknown to me.
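The coin-flip point can be checked directly. The Python sketch below uses simulated flips (not real data) and two hypothetical helpers, calibration_table and brier. Always saying 50% lands close to the observed frequency, so it is well calibrated, yet its Brier score is stuck at exactly 0.25 whatever happens, because the forecast never distinguishes one flip from another. Calibration alone says nothing about how informative the forecasts are.

```python
import random
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group forecasts by stated probability; report how often each group came true."""
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        bins[p].append(o)
    return {p: sum(os) / len(os) for p, os in sorted(bins.items())}


def brier(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


rng = random.Random(0)  # fixed seed so the simulation is reproducible
flips = [rng.randint(0, 1) for _ in range(1000)]
always_half = [0.5] * len(flips)

print(calibration_table(always_half, flips))  # observed frequency close to 0.5
print(brier(always_half, flips))              # 0.25, regardless of the flips
```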
No, tool 0 is more like 'mind your base rates' or 'don't predict what you would like, predict what you really think would happen'. I dunno where you're getting Tool 0 as 'Mind your own business' from; certainly I or Morendil didn't write it.
I dunno, did you look into any research?
Per the huge amount of material on Outside View vs Inside View and the performance of SPRs (statistical prediction rules) already discussed on LW, I would guess quite the opposite.
Do you know that, or are you just guessing, as you said you were before?
Or was your entire comment just an excuse to ask an awful lot of rhetorical questions?
I think he's saying it's a waste of effort to predict what will happen in the world if you can't exert any control over it. At first that sort of makes sense, because it seems useless to worry about such things. But it's important to understand the consequences of other people's actions so that you can react to them, and he didn't take that into account. For example, a French citizen might want to know who the next US president will be because of the implications for their business contacts in America.
Buying insurance is a decision that relates to things that may or may not happen, that you have little or no control over: illness, accidents, burglaries, etc. Being able to make informed predictions as to the likelihood of these things is a valuable life skill.
Are they different in kind? I'm uncertain.
The distinction seems arbitrary at first glance both because what's personal for one person is impersonal for another and because causality is causality no matter where it occurs. However, if you meant that they're different in kind in a more epistemic sense, that they're different in kind from any particular perspective because of the way that they go through your reasoning process, then that seems plausible.
The question is then what types of data work best and why. You're likely to have less data overall in Near Mode, but you'll be working with things that matter to you personally, which it seems evolution would favor (individual selection).
On the other hand, evolution seems to make biases more frequent and more intense when they're about personal matters. But evolution wouldn't do this if it hadn't worked often in the past, so perhaps those biases are good? I think that this is fairly plausible, but I also think that these biases would only be "good" in a reproductive sense and not in the sense of epistemic accuracy. They would move you towards maximizing your social status, not the quality of your predictions. It's unlikely those would overlap.
How likely is it that people are good at evaluating the credibility of the ideas of specific people? I would say that most people are probably bad at this when seeing others face to face because of things like the halo effect and because credibility is rather easy to fake. I would also say that people are rather good at this otherwise. Are these evaluations still accurate when they interact with social motivations, like rivalry? I would say that they probably end up even worse under those circumstances.
So, I believe that personal events and impersonal events should be considered differently because I believe trying to evaluate the accuracy of the views of specific experts would improve the accuracy of your predictions if and only if you avoided personal familiarity or intimacy with those experts, and that otherwise it would damage your accuracy.
I failed to consider the implications of social motivation for professional accuracy, and a bunch of other stuff.
I'm sorry, either I'm misunderstanding you or you misunderstood my comment. I don't understand what you mean by the phrase "choosing types of data". I think that although we're better at dealing with some types of data, that doesn't mean we should focus exclusively on that type of data. I think that becoming a skilled general forecaster is a very useful thing and something that should be pursued.
What sort of questions did you have in mind?
Well, I can give you an argument, though you'll have to evaluate the strength of it yourself.
Forecasting, in a Bayesian sense, is a matter of repeated application of Bayes' theorem. In short, I make an observation (B) and then ask - what are the chances of prediction (A), given observation (B)? ('Prediction' may be the wrong word, given that I may be predicting something unseen that has already happened). Bayes' theorem states that this is equal to the following:
The chances of observation B given prediction A, multiplied by the prior probability of prediction A, divided by the prior probability of observation B. In symbols: P(A|B) = P(B|A) × P(A) / P(B).
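Here is a minimal Python sketch of that equation and of its "repeated application", where each posterior becomes the prior for the next observation. Two assumptions beyond the text: P(B) is expanded by the law of total probability, so the inputs are P(A), P(B|A) and P(B|not A); and updating repeatedly this way treats the observations as independent given A. The function name bayes_update and all the numbers are hypothetical, for illustration only.

```python
def bayes_update(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) from P(A), P(B | A) and P(B | not A).

    P(B) is computed by the law of total probability:
    P(B) = P(B | A) * P(A) + P(B | not A) * (1 - P(A)).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    if p_b == 0:
        raise ValueError("observation B has zero probability under these inputs")
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs: start at 50%, then make two observations, each twice as
# likely if A is true (0.6) as if it is false (0.3).
belief = 0.5
for _ in range(2):
    belief = bayes_update(belief, 0.6, 0.3)  # yesterday's posterior is today's prior
print(round(belief, 3))  # 0.8
```

The output is only as good as the three inputs: change 0.6 and 0.3 to a novice's guesses and the same arithmetic yields a different answer.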
Now, the result of the equation is only as good as the figures you feed into it. In your example of the freelancer, the new freelancer (just starting out) has poor estimates of the probabilities involved, though he can improve these estimates by asking a more experienced freelancer for help. The experienced freelancer, on the other hand, has got a better grasp of the input probabilities, and thus gets a more accurate output probability. The equation works for both large-scale, macro events and small-scale, personal events - the difference is, once again, a matter of the input numbers. For a macro event, you'll have more people looking at, commenting on, discussing the situation; reading the words of others will improve your estimates of the probabilities involved, and putting better numbers in will get you better numbers out. Also, with macro events, you're more likely to have more time to sit down with pencil and paper and work it out.
However, predicting macro events will help you practice the equation, and thus learn to apply it more quickly and easily to micro events. Sufficient practice will also help you estimate the result for a given set of inputs more quickly and accurately. So while the skill of guessing the input probabilities for macro events may have little to do with the skill of guessing the input probabilities for micro events (though there is some correlation: the skill of accurately putting figures to a probability may transfer to some degree), the skill of applying the equation is transferable between the two realms.
Well, to start with: what evidence do you have at the moment about how well calibrated you are?
The methods that Morendil is discussing here are pretty general forecasting techniques, not limited to a particular domain. Some skills are worth developing, even if you're practicing them in domains you don't care about.
Personal example: I was a bio major in college, and I found it very difficult to care about organic chemistry, because we were mostly learning about chemicals that had no biological relevance. Consequently, I didn't learn it very well, which came back to bite me pretty hard when I took biochemistry.