Raising the forecasting waterline (part 1)

31 Post author: Morendil 09 October 2012 03:49PM

Previously: Raising the waterline, see also: 1001 PredictionBook Nights (LW copy), Techniques for probability estimates

Low waterlines imply that it's relatively easy for a novice to outperform the competition. (In poker, as discussed in Nate Silver's book, the "fish" are those who can't master basic techniques such as folding when they have a poor hand, or calculating even roughly the expected value of a pot.) Does this apply to the domain of making predictions? It's early days, but it looks as if a smallish set of tools - a conscious status quo bias, respecting probability axioms when considering alternatives, considering reference classes, leaving yourself a line of retreat, detaching from sunk costs, and a few more - can at least place you in a good position.

A bit of backstory

Like perhaps many LessWrongers, my first encounter with the notion of calibrated confidence was "A Technical Explanation of Technical Explanation". My first serious stab at publicly expressing my own beliefs as quantified probabilities was the Amanda Knox case - an eye-opener, waking me up to how everyday opinions could correspond to degrees of certainty, and how these had consequences. By the following year, I was trying to improve my calibration for work-related purposes, and playing with various Web sites, like PredictionBook or Guessum (now defunct).

Then the Good Judgment Project was announced on Less Wrong. Like several of us, I applied, unexpectedly got in, and started taking forecasting more seriously. (I tend to apply myself somewhat better to learning when there is a competitive element - not an attitude I'm particularly proud of, but being aware of that is useful.)

The GJP is both a contest and an experimental study, in fact a group of related studies: several distinct groups of researchers (1,2,3,4) are being funded by IARPA to each run their own experimental program. Within each, a small or large number of participants have been recruited, allocated to different experimental conditions, and encouraged to compete with each other (or even, as far as I know, for some experimental conditions, collaborate with each other). The goal is to make predictions about "world events" - and if possible to get them more right, collectively, than we would individually.1

Tool 1: Favor the status quo

The first hint I got that my approach to forecasting needed more explicit thinking tools was a blog post by Paul Hewitt I came across late in the first season. My scores in that period (summer 2011 to spring 2012) had been decent but not fantastic; I ended up 5th on my team, which itself placed quite modestly in the contest.

Hewitt pointed out that in general, you could do better than most other forecasters by favoring the status quo outcome.2 This may not quite be on the same order of effectiveness as the poker advice to "err on the side of folding mediocre hands more often", but it makes a lot of sense, at least for the Good Judgment Project (and possibly for many of the questions we might worry about). Many of the GJP questions refer to possibilities that loom large in the media at a given time, that are highly available - in the sense of the availability heuristic. This results in a tendency to favor forecasts of change from status quo.

For instance, one of the Season 1 questions was "Will Marine LePen cease to be a candidate for President of France before 10 April 2012?" (also on PredictionBook). Just because the question is being asked doesn't mean that you should assign "yes" and "no" equal probabilities of 50%, or even close to 50%, any more than you should assign 50% to the proposition "I will win the lottery".

Rather, you might start from a relatively low prior probability that anyone who undertakes something as significant as a bid for national presidency would throw in the towel before the contest even starts. Then, try to find evidence that positively favors a change. In this particular case, there was such evidence -  the National Front, of which she was the candidate, consistently reports difficulties rounding up the endorsements required to register a candidate legally. However, only once in the past (1981) had this resulted in their candidate being barred (admittedly a very small sample). It would have been a mistake to weigh that evidence excessively. (I got a good score on that question, compared to the team, but definitely owing to a "home ground advantage" as a French citizen rather than my superior forecasting skills.)

Tool 2: Flip the question around

The next technique I try to apply consistently is respecting the axioms of probability. If the probability of event A is 70%, then the probability of not-A is 30%.

This may strike everyone as obvious... it's not. In Season 2, several of my team-mates are on record as assigning a 75% probability to the proposition "The number of registered Syrian conflict refugees reported by the UNHCR will exceed 250,000 at any point before 1 April 2013".

That number was reached today, six months in advance of the deadline. This was clear as early as August. The trend in the past few months has been an increase of 1000 to 2000 a day, and the UNHCR have recently provided estimates that this number will eventually reach 700,000. The kicker is that this number is only the count of people who are fully processed by the UNHCR administration and officially in their database; there are tens of thousands more in the camps who only have "appointments to be registered".

I've been finding it hard to understand why my team-mates haven't been updating to, maybe not 100%, but at least 99%; and how one wouldn't see these as the only answers worth considering. At any point in the past few weeks, to state your probability as 85% or 91% (as some have quite recently) was to say, "There is still a one in ten chance that the Syrian conflict will suddenly stop and all these people will go home, maybe next week?"

This is kind of like saying "There is a one in ten chance Santa Claus will be the one distributing the presents this year." It feels like a huge "clack".

I can only speculate as to what's going on there. Queried for a probability, people are translating something like "Sure, A is happening" into a biggish number, and reporting that. They are totally failing to flip the question around and explicitly consider what it would take for not-A to happen. (Perhaps, too, people have been so strongly cautioned, by Tetlock and others, against being overconfident that they reflexively shy away from the extreme numbers.)

Just because you're expressing beliefs as percentages doesn't mean that you are automatically applying the axioms of probability. Just because you use "75%" as a shorthand for "I'm pretty sure" doesn't mean you are thinking probabilistically; you must train the skill of seeing that for some events, its complement "25%" also counts as "I'm pretty sure". The axioms are more important than the use of numbers - in fact for this sort of forecast "91%" strikes me as needlessly precise; increments of 5% are more than enough, away from the extremes.

Tool 3: Reference class forecasting

The order in which I'm discussing these "basics of forecasting" reflects not so much their importance, as the order in which I tend to run through them when encountering a new question. (This might not be the optimal order, or even very good - but that should matter little if the waterline is indeed low.)

Using reference classes was actually part of the "training package" of the GJP. From the linked post comes the warning that "deciding what's the proper reference class is not straightforward". And in fact, this tool only applies in some cases, not systematically. One of our recently closed questions was "Will any government force gain control of the Somali town of Kismayo before 1 November 2012?". Clearly, you could spend quite a while trying to figure out an appropriate reference class here. (In fact, this question also stands as a counter-example to the "Favor status quo" tool, and flipping the question around might not have been too useful either. All these tools require some discrimination.)

On the other hand, it came in rather handy in assessing the short-term question we got in late September: "What change will occur in the FAO Food Price index during September 2012?" - with barely two weeks to go before the FAO was to post the updated index in early October. More generally, it's a useful tool when you're asked to make predictions regarding a numerical indicator, for which you can observe past data.

The FAO price data can be retrieved as a spreadsheet (.xls download). Our forecast question divided the outcomes into five: A) an increase of 3% or more, B) an increase of less than 3%, C) a decrease of less than 3%, D) a decrease of more than 3%, E) "no change" - meaning a change too small to alter the value rounded to the nearest integer.

It's not clear from the chart that there is any consistent seasonal variation. A change of 3% would have been about 6.4 points; since 8/2011 there had been four month-on-month changes of that magnitude, 3 decreases and 1 increase. Based on that reference class, the probability of a small change (B+C+E) came out to about 2/3. The probability for "no change" (E) came out to 1/12 - the August price was the same as the July price. The probability for an increase (A+B) was roughly the same as for a decrease (C+D). My first-cut forecast allocated the probability mass as follows: 15/30/30/15/10.

However, I figured I did need to apply a correction, based on reports of a drought in the US that could lead to some food shortages. I took 10% probability mass from the "decrease" outcomes and allocated it to the "increase" outcomes. My final forecast was 20/35/25/10/10. I didn't mess around with it any more than that. As it turned out, the actual outcome was B! My score was bettered by only 3 forecasters, out of a total of 9.
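The arithmetic behind this forecast can be checked with a short sketch. Two things here are assumptions rather than statements from the post: the 12-month window is inferred from the "1/12" and "about 2/3" figures, and the even split of the 10-point correction (5 points off each of C and D, 5 points onto each of A and B) is read off the difference between the two forecasts.

```python
from fractions import Fraction

# Reference class: month-on-month changes in the index since 8/2011.
# Assumption: 12 observed changes, inferred from the post's 1/12 figure.
months = 12
large_changes = 4   # changes of 3% or more (3 decreases, 1 increase)
no_change = 1       # August equal to July

p_large = Fraction(large_changes, months)   # outcomes A + D
p_small = 1 - p_large                       # outcomes B + C + E
p_no_change = Fraction(no_change, months)   # outcome E

# First-cut forecast from the post, in percentage points, outcomes A..E.
first_cut = {"A": 15, "B": 30, "C": 30, "D": 15, "E": 10}

# Drought correction: move 10 points from "decrease" to "increase".
# Assumption: the 10 points are split evenly, 5 per outcome.
shift = {"A": +5, "B": +5, "C": -5, "D": -5, "E": 0}
final = {k: first_cut[k] + shift[k] for k in first_cut}

print("P(large change) =", p_large)
print("P(small change) =", p_small)
print("P(no change)    =", p_no_change)
print("final forecast  =", final)
```

The first-cut numbers are rounded versions of the reference-class frequencies (A+D at 30% against a base rate of 1/3, E at 10% against 1/12), and both forecasts sum to 100.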

Next up: lines of retreat, ditching sunk costs, loss functions

This post has grown long enough, and I still have 3+ tools I want to cover. Stay tuned for Part 2!



1 The GJP is being run by Phil Tetlock, known for his "hedgehog and fox" analysis of forecasting. At that time I wasn't aware of the competing groups - one of them, DAGGRE, is run by Robin Hanson (of OB fame) among others, which might have made it an appealing alternate choice if I'd known about it.

2 Unfortunately, the experimental condition Paul belonged to used a prediction market where forecasters played virtual money by "betting" on predictions; this makes it hard to translate the numbers he provides into probabilities. The general point is still interesting.

Comments (108)

Comment author: thomblake 09 October 2012 08:37:09PM 5 points [-]

Talking about increments of 5% runs counter to my intuitions regarding good thinking about probability estimates. For most purposes, the difference between 90% and 95% is significantly larger than the difference between 50% and 55%. Think in logs.
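One way to make "think in logs" concrete is to compare the two 5-point moves on a log-odds scale. This is only an illustrative sketch; the helper name `log_odds` is made up for the example.

```python
import math

def log_odds(p):
    """Natural-log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))

# Same 5-point move in probability, very different move in log-odds.
shift_high = log_odds(0.95) - log_odds(0.90)
shift_mid = log_odds(0.55) - log_odds(0.50)

print("90% -> 95%:", round(shift_high, 3))
print("50% -> 55%:", round(shift_mid, 3))
```

On this scale the move from 90% to 95% is more than three times as large as the move from 50% to 55%.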

Comment author: Morendil 09 October 2012 09:30:19PM 4 points [-]

Yes, near the extremes it makes a difference - but we're using a Brier scoring rule, averaged over all days a forecast is open. That makes thinking in logs less important - 99% isn't much worse than 100% on errors. I'll discuss that in pt.2 under 'loss function'.
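A rough sketch of why the scoring rule matters here, assuming the two-outcome form of the Brier score (squared error summed over "yes" and "no", so 0 is best and 2 is worst). The exact GJP variant and the averaging over days are not reproduced; the logarithmic score is included only for contrast.

```python
import math

def brier(p_yes, happened):
    """Two-outcome Brier score: squared error summed over yes and no."""
    outcome = 1.0 if happened else 0.0
    return (p_yes - outcome) ** 2 + ((1 - p_yes) - (1 - outcome)) ** 2

def log_loss(p_yes, happened):
    """Negative log of the probability assigned to what actually happened."""
    p = p_yes if happened else 1 - p_yes
    return -math.log(p)

for p in (0.75, 0.91, 0.99):
    print(
        p,
        "Brier if right:", round(brier(p, True), 4),
        "Brier if wrong:", round(brier(p, False), 4),
        "log loss if wrong:", round(log_loss(p, False), 3),
    )
```

Under Brier scoring the penalty for a wrong 99% forecast is bounded just below the maximum of 2, whereas the log penalty grows without limit as the forecast approaches 100%.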

Comment author: thomblake 10 October 2012 01:49:12PM 0 points [-]

I'll discuss that in pt.2 under 'loss function'.

Hooray!

Comment author: army1987 10 October 2012 09:30:44AM 1 point [-]

It depends on whether you're using probabilities epistemically or instrumentally. Changing the probability of A from 90% to 95% doesn't affect your expected utility any more than changing it from 50% to 55%.

Comment author: roystgnr 12 October 2012 08:00:15AM 2 points [-]

The change in expected utility given constant decisions is the same for any 5% change in probability regardless of where the baseline is for the change. However, that "given constant decisions" criterion may be less likely to hold for a change from 90-95% than it is for a change from 50-55%. If you have to choose whether to risk a negative consequence of not-A in exchange for some benefit, for example, then it matters whether the expected negative utility of not-A just fell by a tenth or by half.

Comment author: thomblake 10 October 2012 01:49:36PM 1 point [-]

Yeah, that's why I said "For most purposes".

Comment author: gwern 10 October 2012 12:29:12AM 4 points [-]

I've been finding it hard to understand why my team-mates haven't been updating to, maybe not 100%, but at least 99%; and how one wouldn't see these as the only answers worth considering. At any point in the past few weeks, to state your probability as 85% or 91% (as some have quite recently) was to say, "There is still a one in ten chance that the Syrian conflict will suddenly stop and all these people will go home, maybe next week?"

GJP has pointed out already that forecasters are not updating as fast as they could. I assume a lot of forecasters are like me in rarely updating their predictions.

(In season 2, out of distaste for the new UI, I've barely been participating at all.)

Comment author: Pablo_Stafforini 11 October 2012 04:54:37AM 3 points [-]

So, when do you predict you'll post Part 2?

Comment author: Morendil 11 October 2012 06:17:26PM 3 points [-]

I really, really want to answer "shortly" and leave it at that.

But since you ask, 45% chance I'll do it tomorrow, 25% over the week-end, 10% monday, and 20% later than monday.

Predictions involving what I'm gonna do are trickier for me, because there's a feedback loop between the act of making a prediction, and my likelihood of taking the corresponding actions once the prediction has turned them into a public commitment; it's a complicated one which sometimes triggers procrastination, sometimes increased motivation.

Comment author: Pablo_Stafforini 11 October 2012 09:13:53PM *  1 point [-]

Thanks. Your prediction is now recorded on PredictionBook.

I hope you don't take it personally, but my estimate that you'll have the essay ready by tomorrow is lower than yours. Even those who, like Kahneman, know that the inside view yields overoptimistic estimates in cases of this sort tend to rely on it more than they should.

Of course, the fact that I'm making this prediction might also enter into the feedback loop you describe. I suspect the overall effect is that your prediction is now more likely to be true as a consequence of my having publicly given a lower estimate than you did.

Comment author: Pablo_Stafforini 12 October 2012 06:40:30PM 1 point [-]
Comment author: Pablo_Stafforini 11 October 2012 09:27:11PM *  0 points [-]

By the way, when you say "I really, really want to answer 'shortly'", is this just because you sometimes dislike giving precise estimates, or do you think there is sometimes a rational justification for this reluctance? Without having thought about the matter carefully, it seems to me that the only valid reason for abstaining from giving precise estimates is that one's audience might make assumptions about the reliability of the estimate from the fact that it is expressed in precise language (more precision suggests higher reliability). But provided one gives independent reliability measures (by e.g. being explicit about one's confidence intervals), can this reluctance still be justified?

Comment author: Morendil 12 October 2012 12:47:26AM 0 points [-]

is this just because you sometimes dislike giving precise estimates

It's because it's now 3am and I've stuck a knife in the back of my tomorrow-self, who will wake up sleep deprived, so that my present-self (with a 1500 word first draft completed) can enjoy the certainty of hitting an estimate which was only that, not a commitment. Hyperbolic discounting is a royal pain.

It's because I'm a sucker for this kind of thing, as are many of my colleagues working in software development. :-/

Comment author: Matt_Simpson 10 October 2012 05:33:32PM *  2 points [-]

Hewitt pointed out that in general, you could do better than most other forecasters by favoring the status quo outcome.

I vaguely recall some academic work showing this to be true, or more generally if you're predicting the variable X_t over time, the previous period's value tends to be a better predictor than more complicated models. Can anyone confirm/deny my memory? And maybe provide a citation?

Comment author: gwern 10 October 2012 07:00:45PM 4 points [-]

This is a theme of multiple papers in the 2001 anthology Principles of Forecasting (a PDF of which is findable online), to give a specific citation.

Comment author: Matt_Simpson 10 October 2012 09:23:07PM 0 points [-]

Thanks! That's exactly the sort of thing I was looking for, and maybe remembering.

Comment author: Vaniver 10 October 2012 05:47:21PM 2 points [-]

I vaguely recall some academic work showing this to be true, or more generally if you're predicting the variable X_t over time, the previous period's value tends to be a better predictor than more complicated models.

These get called AR(1) models, for autoregressive 1.

Most complicated models that I'm familiar with include both the previous value and other factors (since there is generally more going on than a random walk).

Comment author: TraderJoe 10 October 2012 10:24:46AM *  2 points [-]

[comment deleted]

Comment author: TheOtherDave 10 October 2012 02:13:21PM *  1 point [-]

provided I think a question and its negation are equally likely to have been asked, there is a 50% chance that the answer to the question you have asked is yes.

Well, yes. But ought I believe that a yes/no question I have no idea about is as likely as its negation to have been asked? (Especially if it's being asked implicitly by a situation, rather than explicitly by a human?)

Comment author: TraderJoe 10 October 2012 05:48:05PM *  1 point [-]

[comment deleted]

Comment author: chaosmosis 10 October 2012 06:22:14PM 1 point [-]

Ratio of true statements to false ones: low. Probability TraderJoe wants to make TheOtherDave look foolish: moderate, slightly on the higher end. Ratio of the probability that giving an obviously false statement an answer of relatively high probability would make TheOtherDave look foolish to the probability that giving an obviously true statement a relatively low probability would make TheOtherDave look foolish: moderately high. Probability that the statement is neither true nor false: low.

Conclusion: أنا من أمريكا is most likely false.

Comment author: TheOtherDave 10 October 2012 06:31:59PM 0 points [-]

Ratio of the probability that giving an obviously false statement an answer of relatively high probability would make TheOtherDave look foolish to the probability that giving an obviously true statement a relatively low probability would make TheOtherDave look foolish: moderately high.

That's interesting.

I considered a proposition like this, decided the ratio was roughly even, concluded that TraderJoe might therefore attempt to predict my answer (and choose their question so I'd be wrong), decided they'd have no reliable basis on which to do so and would know that, and ultimately discarded the whole line of reasoning.

Comment author: chaosmosis 10 October 2012 07:02:40PM *  -2 points [-]

I considered a proposition like this, decided the ratio was roughly even, concluded that TraderJoe might therefore attempt to predict my answer (and choose their question so I'd be wrong),

I figured that it would be more embarrassing to say something like "It is true that I am a sparkly unicorn" than to say "It is false that an apple is a fruit". Falsehoods are much more malleable, largely as an effect of the fact that there are so many more of them than truths, also because they don't have to be consistent. Since falsehoods are more malleable it seems that they'd be more likely to be ones used in an attempt to insult someone.

decided they'd have no reliable basis on which to do so and would know that, and ultimately discarded the whole line of reasoning.

My heuristic in situations with recursive mutual modeling is to assume that everyone else will discard whatever line of reasoning is recursive. I then go one layer deeper into the recursion than whatever the default assumption is. It works well.

Comment author: TheOtherDave 10 October 2012 07:17:46PM 3 points [-]

I then go one layer deeper into the recursion than whatever the default assumption is. It works well.

Sadly, I appear to lack your dizzying intellect.

Comment author: chaosmosis 10 October 2012 07:26:32PM 3 points [-]

I used to play a lot of Rock, Paper, Scissors; I'm pretty much a pro.

Comment author: gjm 10 October 2012 09:40:07PM 1 point [-]

It is possible that you may have missed TheOtherDave's allusion there.

Comment author: chaosmosis 10 October 2012 10:27:16PM *  0 points [-]

The phrase sounded familiar, but I don't recognize where it's from and a Google search for "lack your dizzying intellect" yielded no results.

Comment author: chaosmosis 10 October 2012 07:04:26PM *  -1 points [-]

My heuristic in situations with recursive mutual modeling is to assume that everyone else will discard whatever line of reasoning is recursive. I then go one layer deeper into the recursion than whatever the default assumption is. It works well.

Preempt: None of you have any way of knowing whether this is a lie.

Comment author: chaosmosis 10 October 2012 07:07:55PM -1 points [-]

The parent of this comment (yes, this one) is a lie.

Comment author: chaosmosis 10 October 2012 07:08:03PM -2 points [-]

The parent of this comment (yes, this one) is a lie.

Comment author: chaosmosis 10 October 2012 07:09:02PM -2 points [-]

The parent of this comment is true. On my honor as a rationalist.

I would like people to try to solve the puzzle.

This comment (yes, this one) is true.

Comment author: chaosmosis 10 October 2012 06:23:34PM -1 points [-]

PBEERPG.

Comment author: TheOtherDave 10 October 2012 06:08:04PM 1 point [-]

I assume you mean without looking it up.

My answer is roughly the same as TimS's... it mostly depends on "Would TraderJoe pick a true statement in this context or a false one?" Which in turn mostly depends on "Would a randomly selected LWer pick a true statement in this context or a false one?" since I don't know much about you as a distinct individual.

I seem to have a prior probability somewhat above 50% for "true", though thinking about it I'm not sure why exactly that is.

Looking it up, it amuses me to discover that I'm still not sure if it's true.

Comment author: CCC 11 October 2012 06:57:12AM 0 points [-]

This is a perfect situation for a poll.

How probable is it that TraderJoe's statement, in the parent comment, is true?

Comment author: chaosmosis 12 October 2012 04:14:11AM *  0 points [-]

I voted with what I thought my previous estimate was before I'd checked via rot13.

Comment author: TraderJoe 11 October 2012 10:44:11AM *  0 points [-]

[comment deleted]

Comment author: TimS 10 October 2012 05:52:48PM 0 points [-]

It seems like my guess should be based on how likely I think it is that you are trying to trick me in some sense. I assume you didn't pick a sentence at random.

Comment author: TraderJoe 12 October 2012 07:34:07AM *  0 points [-]

[comment deleted]

Comment author: TraderJoe 10 October 2012 05:49:12PM *  0 points [-]

[comment deleted]

Comment author: Kindly 11 October 2012 02:44:27PM 0 points [-]

The transliteration does, but the actual Arabic means "V'z Sebz Nzrevpn".

So in fact TraderJoe's prediction of 0.5 was a simple average over the two statements given, and everyone else giving a prediction failed to take into account that the answer could be neither "true" nor "false".

Comment author: TheOtherDave 10 October 2012 06:11:57PM 0 points [-]

Not according to google translate. Incidentally, that string is particularly easy to uncypher by inspection.

Comment author: TraderJoe 11 October 2012 06:40:41AM *  0 points [-]

[comment deleted]

Comment author: thomblake 10 October 2012 06:15:04PM 0 points [-]

Yeah, that's an interesting discrepancy.

Comment author: chaosmosis 10 October 2012 06:17:19PM *  0 points [-]

All questions that you encounter will be asked by a human. I get what you mean though, if other humans are asking a human a question then distortions are probably magnified.

Comment author: TheOtherDave 10 October 2012 06:25:29PM 1 point [-]

Some questions are implicitly raised by a situation. "Is this coffee cup capable of holding coffee without spilling it?", for example. When I pour coffee into the cup, I am implicitly expressing more than 50% confidence that the answer is "yes".

Comment author: chaosmosis 10 October 2012 10:34:36PM 0 points [-]

What I'm saying is that what's implicit is a fact about you, not the situation, and the way the question is formed is partially determined by you. I was vague in saying so, however.

Comment author: TheOtherDave 10 October 2012 10:48:17PM 0 points [-]

I agree that the way the question is formed is partially determined by me. I agree that there's a relevant implicit fact about me. I disagree that there's no relevant implicit fact about the situation.

Comment author: chaosmosis 11 October 2012 02:54:41AM 0 points [-]

Nothing can be implicit without interpretation, sometimes the apparent implications of a situation are just misguided notions that we have inside our heads. You're going to have a natural tendency to form your questions in certain ways, and some of these ways will lead you to asking nonsensical questions, such as questions with contradictory expectations.

Comment author: TheOtherDave 11 October 2012 03:22:07AM 1 point [-]

I agree that the apparent implications of a situation are notions in our heads, and that sometimes those notions are nonsensical and/or contradictory and/or misguided.

Comment author: FAWS 10 October 2012 10:56:54AM 1 point [-]

I disagree with this. The reason you shouldn't assign 50% to the proposition "I will win the lottery" is because you have some understanding of the odds behind the lottery. If a yes/no question which I have no idea about is asked, I am 50% confident that the answer is yes. The reason for this is point 2: provided I think a question and its negation are equally likely to have been asked, there is a 50% chance that the answer to the question you have asked is yes.

That's only reasonable if some agent is trying to maximize the information content of your answer. The vast majority of possible statements of a given length are false.

Comment author: TraderJoe 10 October 2012 05:54:10PM 2 points [-]

Sure, but how often do you see each of the following sentences in some kind of logic discussion: 2+2=3 2+2=4 2+2=5 2+2=6 2+2=7

I have seen the first and third from time to time, the second more frequently than any other, and virtually never see 2+2 = n for n > 5. Not all statements are shown with equal frequency. My guess is that the percentage of the time when "2+2 = x" is written in contexts where the statement is for a true/false logic proposition rather than an equation x = 4 is more common than all other values put together.

Comment author: ArisKatsaris 10 October 2012 11:25:04AM 0 points [-]

The vast majority of possible statements of a given length are false.

That's surely an artifice of human languages and even so it would depend on whether the statement is mostly structured using "or" or using "and".

There's a 1-to-1 mapping between true and false statements (just add 'the following is false:' in front of each statement to get the opposite). In a language where 'the following is false' is assumed, the reverse would be actual.

Comment author: TimS 10 October 2012 12:54:07PM -1 points [-]

I'm not sure your statement is true.

Consider:
The sky is blue.
The sky is red.
The sky is yellow.
The sky is pink.

Comment author: army1987 10 October 2012 01:27:16PM *  1 point [-]

The sky is not blue. The sky is not red. The sky is not yellow. The sky is not pink.

Anyway, it depends on what you mean by “statement”. The vast majority of all possible strings are ungrammatical, the vast majority of all grammatical sentences are meaningless, and most of the rest refer to different propositions if uttered in different contexts (“the sky is ochre” refers to a true proposition if uttered on Mars, or when talking about a picture taken on Mars).

Comment author: Will_Sawin 11 October 2012 06:03:46AM 1 point [-]

The typical mode of communication is an attempt to convey information by making true statements. One only brings up false statements in much rarer circumstances, such as when one entity's information contradicts another entity's information. Thus, an optimized language is one where true statements are high in information.

Otherwise, to communicate efficiently, you'd have to go around making a bunch of statements with an extraneous not above the default for the language, which is weird.

This has the potential to be trans-human, I think.

Comment author: army1987 11 October 2012 03:27:04PM *  1 point [-]

But whether a statement is true or false depends on things other than the language itself. (The sentence “there were no aces or kings in the flop” is the same length whether or not there were any aces or kings in the flop.) The typical mode of communication is an attempt to convey information by making true but non-tautological statements (for certain values of “typical” -- actually implicatures are often at least as important as truth conditions). So, how would such a mechanism work?

Comment author: Kindly 10 October 2012 01:22:59PM 1 point [-]

But, on the other hand:

The sky is not blue. The sky is not red. The sky is not yellow. The sky is not pink.

Comment author: ArisKatsaris 10 October 2012 01:02:40PM *  0 points [-]

You need to be more specific about what exactly it is I said that you're disputing - I am not sure what it is that I must 'consider' about these statements.

Comment author: TimS 10 October 2012 02:30:04PM 1 point [-]

On further consideration, I take it back. I was trying to make the point that "Sky not blue" != "Sky is pink". Which is true, but does not counter your point that (P or !P) must be true by definition.

It is the case that the vast majority of grammatical statements of a given length are false. But until we have a formal way of saying that statements like "The Sky is Blue" or "The Sky is Pink" are more fundamental than statements like "The Sky is Not Blue" or "The Sky is Not Pink," you must be correct that this is an artifact of the language used to express the ideas. For example, a language where negation was the default and additional length was needed to assert truth would have a different proportion of true and false statements for any given sentence length.

Also, lots of downvotes in this comment path (on both sides of the discussion). Any sense of why?

Comment author: FAWS 10 October 2012 12:45:19PM *  -1 points [-]

That's surely an artifice of human languages and even so it would depend on whether the statement is mostly structured using "or" or using "and".

It's true of any language optimized for conveying information. The information content of a statement is reciprocal to its prior probability, and therefore more or less proportional to how many other statements of the same form would be false.

In your counter example the information content of a statement in the basic form decreases with length.

Comment author: Morendil 10 October 2012 07:55:35PM *  0 points [-]

The reason you shouldn't assign 50% to the proposition "I will win the lottery" is because you have some understanding of the odds behind the lottery.

Yup. Similarly you don't assign 50% to the proposition "X will change", where X is a relatively long-lasting feature of the world around you - long-lasting enough to have been noticed as such in the first place and given rise to the hypothesis that it will change. (In the Le Pen prediction, the important word is "cease", not "Le Pen" or "election".)

ETA: what I'm getting at is that nobody gives a damn about the class of question "yes/no question which I have no idea about". The subthread about these questions is a red herring. When a question comes up about "world events", you have some idea of the odds for change vs status quo based on the general category of things that the question is about. For instance many GJP questions are of the form "Will Prime Minister of Country X resign or otherwise vacate that position within the next six months?". Even if you are not familiar with the politics of Country X, you have some grounds for thinking that the "No" side of the question is more likely than the "Yes" side - for having an overall status quo bias on this type of question.

Comment author: AspiringRationalist 10 October 2012 05:41:38PM 1 point [-]

Just because you use "75%" as a shorthand for "I'm pretty sure" doesn't mean you are thinking probabilistically; you must train the skill of seeing that for some events, its complement "25%" also counts as "I'm pretty sure".

Would expressing these things in terms of odds rather than probability make it easier to avoid this error?

Comment author: Morendil 10 October 2012 06:33:18PM 0 points [-]

Dunno. I have trouble with odds, for some reason, and rarely if ever think in terms of odds.

Comment author: mfb 10 October 2012 12:46:03PM *  0 points [-]

That reminds me of a question about judging predictions: Is there any established method to say "x made n predictions, was underconfident / calibrated properly / overconfident and the quality of the predictions was z"? Assuming the predictions are given as "x will happen (y% confidence)".

It is easy to make 1000 unbiased predictions about lottery drawings, but this does not mean you are good at making predictions.

Comment author: Morendil 11 October 2012 08:16:33AM *  0 points [-]

Is there any established method

Yes: use a scoring rule to rate your predictions, giving you an overall evaluation of their quality. If you use, say, the Brier score, that admits decompositions into separate components, for instance "calibration" and "refinement"; if your "refinement" score was high on the lottery drawings, meaning that you'd assigned higher probabilities of winning to the people who did in fact win (as opposed to correctly calling the probabilities of winning overall), you'd be a suspect for game-rigging or psi powers. ;)
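As a rough illustration of the calibration/refinement split, here is a minimal Python sketch. It assumes the binary form of the Brier score (one probability per yes/no event) and groups forecasts by their exact stated value; the function names and the two example forecasters are made up for illustration and are not GJP's scoring code. In this formulation both terms are penalties, so a lower refinement term means sharper forecasts.

```python
from collections import defaultdict

def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

def calibration_refinement(forecasts, outcomes):
    """Split the binary Brier score into a calibration and a refinement term.

    Forecasts are grouped by their exact stated probability (an assumption
    that only suits forecasters who use a handful of distinct values).
    """
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    n = len(forecasts)
    calibration = 0.0
    refinement = 0.0
    for f, outs in groups.items():
        rate = sum(outs) / len(outs)      # observed frequency within this group
        weight = len(outs) / n            # share of all forecasts in this group
        calibration += weight * (f - rate) ** 2    # 0 when stated odds match frequencies
        refinement += weight * rate * (1 - rate)   # 0 when each group is all-yes or all-no
    return calibration, refinement

# Hypothetical forecaster 1: always says 10%, and 10 of 100 events happen.
lottery_f = [0.1] * 100
lottery_o = [1] * 10 + [0] * 90

# Hypothetical forecaster 2: same events, but singles out 20 candidates at 50%
# (10 of which happen) and gives 0% to the other 80 (none of which happen).
sharp_f = [0.5] * 20 + [0.0] * 80
sharp_o = [1] * 10 + [0] * 10 + [0] * 80

for name, f, o in [("lottery", lottery_f, lottery_o), ("sharp", sharp_f, sharp_o)]:
    cal, ref = calibration_refinement(f, o)
    print(name, "brier:", brier_score(f, o), "calibration:", cal, "refinement:", ref)
```

Both forecasters are perfectly calibrated here, so the whole difference in their Brier scores shows up in the refinement term.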

Comment author: mfb 12 October 2012 07:29:43PM *  1 point [-]

Interesting, thanks, but not exactly what I looked for. As an example, take a simplified lottery: 1 number is drawn out of 10. I can predict "number X will have a probability of 10%" 100 times in a row - this is correct, and will give a good score in all scoring rules. However, those predictions are not interesting.

If I make 100 predictions "a meteorite will hit position X tomorrow (10% confidence)" and 10% of them are correct, those predictions are very interesting - you would expect that I have some additional knowledge (for example, observed an approaching asteroid).

The difference between the examples is the quality of the predictions: Everybody can get correct (unbiased) 10%-predictions for the lottery, but getting enough evidence to make correct 10%-probabilities for asteroid impacts is hard - most predictions for those positions will be way lower.

Comment author: Morendil 12 October 2012 09:59:08PM *  0 points [-]

Interesting, thanks, but not exactly what I looked for.

Help me understand what you're describing? Below is a stab at working out the math (I'm horrible at math, I have to laboriously work things out with a bc-like program, but I'm more confident in my grasp of the concepts).

The salient feature of your meteorite predictions is location. We can score these forecasts exactly as GJP scores multiple-choice forecasts, as long as they're well-specified. Let's refine "hit position X" to "within 10 miles of X". That translates to roughly a one in a million chance of calling the location correctly (surface area of the Earth divided by a 10-mile radius area is about 10 to the 6). We can make a similar calculation with respect to the probability that a meteorite hits at all; it comes out to roughly one per day on average, so we can simplify and assume exactly one hits every day.

So a forecast that "a meteorite will hit location X tomorrow at 10% confidence" is equivalent to dividing Earth into one million cells, each cell being one possible outcome in a multiple-outcome forecast, and putting 10% probability mass into one cell. Let's say you distribute the remaining probability evenly among the 999,999 remaining cells. We can now compute your Brier loss function, the sum of squared errors.

Either the meteorite hits X, and your score is .81 (the penalty for predicting an event at 10% confidence that turns out to happen), plus epsilon times one million minus one for the other cells. Or the meteorite hits a different cell, and your Brier score is 1.01 minus epsilon: 1 minus epsilon for hitting a cell that you had predicted would be hit at a probability close to 0, plus .01 for failing to hit X, plus epsilon for failing to hit the other cells.

So, over 100 such events, the expected value of your score ranges from 81 if you have laser-like accuracy, to 101 if you're just guessing at random. Intermediate values reflect intermediate accuracies. The range of scores is fairly narrow, because your probability mass isn't very concentrated - only a 10% bump on the "jackpot" cell, the rest spread around the surface of the earth.
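The arithmetic above can be checked with a short Python sketch. It assumes exactly one million equally sized cells, 10% of the probability mass on cell X, and the remaining 90% spread evenly over the other cells, as described; the variable names are illustrative, and the Earth radius of 3959 miles is only used for an order-of-magnitude check on the cell count.

```python
N_CELLS = 1_000_000                   # "about 10 to the 6" cells, taken at face value
P_TARGET = 0.10                       # probability mass placed on cell X
q = (1 - P_TARGET) / (N_CELLS - 1)    # even share for each of the other cells ("epsilon")

# Case 1: the meteorite hits X.
# Error on X is (1 - 0.1)^2; each other cell contributes q^2.
score_hit = (1 - P_TARGET) ** 2 + (N_CELLS - 1) * q ** 2

# Case 2: the meteorite hits some other cell.
# Error on the struck cell is (1 - q)^2, on X it is 0.1^2, on the rest q^2 each.
score_miss = (1 - q) ** 2 + P_TARGET ** 2 + (N_CELLS - 2) * q ** 2

print("per event:", score_hit, score_miss)             # roughly 0.81 and 1.01
print("over 100 events:", 100 * score_hit, 100 * score_miss)

# Order-of-magnitude check on the number of cells: sphere area 4*pi*R^2
# divided by the area pi*r^2 of a circle of radius 10 miles.
EARTH_RADIUS_MILES = 3959
cells_estimate = 4 * EARTH_RADIUS_MILES ** 2 / 10 ** 2
print("cells estimate:", cells_estimate)
```

The cell estimate comes out somewhat under a million, which is close enough for the "one in a million" simplification used here.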

If any of the above is wrong (math-wise) or stupid, or misrepresents your model, I'd appreciate knowing. :)

Comment author: mfb 13 October 2012 01:38:46PM 0 points [-]

To calculate the Brier score, you used >your< assumption that meteorites have a 1 in a million chance to hit a specific area. What about events without a natural way to get those assumptions?

Let's use another example:

Assume that I predict that neither Obama nor Romney will be elected with 95% confidence. If that prediction becomes true, it is amazing and indicates a high predictive power (especially if I make multiple similar predictions and most of them become true).

Assume that I predict that either Obama or Romney will be elected with 95% confidence. If that prediction becomes true, it is not surprising.

Where is the difference? The second event is expected by others. How can we quantify "difference to expectations of others" and include it in the score? Maybe with an additional weight - weight each prediction with the difference from the expectations of others (as mean of the log ratio or something like that).

Comment author: Kindly 13 October 2012 03:31:40PM 0 points [-]

If the objective is to get better scores than others, then that helps, though it's not clear to me that it does so in any consistent way (in particular, the strategy to maximize your score and the strategy to get the best score with the highest probability may well be different, and one of them might involve mis-reporting your own degree of belief).

Comment author: Morendil 13 October 2012 02:45:08PM *  0 points [-]

How can we quantify "difference to expectations of others" and include it in the score?

You're getting this from the "refinement" part of the calibration/refinement decomposition of the Brier score. Over time, your score will end up much higher than others' if you have better refinement (e.g. from "inside information", or from a superior methodology), even if everyone is identically (perfectly) calibrated.

This is the difference between a weather forecast derived from looking at a climate model, e.g. I assign 68% probability to the proposition that the temperature today in your city is within one standard deviation of its average October temperature, and one derived from looking out the window.

ETA: what you say about my using an assumption is not correct - I've only been making the forecast well-specified, such that the way you said you allocated your probability mass would give us a proper loss function, and simplifying the calculation by using a uniform distribution for the rest of your 90%. You can compute the loss function for any allocation of probability among outcomes that you care to name - the math might become more complicated, is all. I'm not making any assumptions as to the probability distribution of the actual events. The math doesn't, either. It's quite general.

Comment author: mfb 14 October 2012 09:17:00PM 0 points [-]

I can still make 100000 lottery predictions, and get a good score. I look for a system which you cannot trick in that way. Ok, for each prediction, you can subtract the average score from your score. That should work. Assuming that all other predictions are rational, too, you get an expectation of 0 difference in the lottery predictions.
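One way to read "subtract the average score from your score" is sketched below in Python. This is only an interpretation: the binary Brier score, the function names, and the example numbers are assumptions for illustration, not an established scoring method. With squared-error scores lower is better, so a negative relative score means beating the crowd.

```python
def brier(p, outcome):
    """Squared error of a single probability against a 0/1 outcome."""
    return (p - outcome) ** 2

def relative_score(my_forecasts, crowd_forecasts, outcomes):
    """Average of (my Brier score - crowd's mean Brier score) over all questions."""
    total = 0.0
    for mine, crowd, o in zip(my_forecasts, crowd_forecasts, outcomes):
        crowd_avg = sum(brier(p, o) for p in crowd) / len(crowd)
        total += brier(mine, o) - crowd_avg
    return total / len(outcomes)

outcomes = [1] * 10 + [0] * 90          # 10 of 100 events happen

# Lottery-style questions: everyone, including me, says 10%.
lottery = relative_score([0.1] * 100, [[0.1, 0.1, 0.1]] * 100, outcomes)

# Meteorite-style questions: I say 10%, everyone else says 0.1%.
meteorite = relative_score([0.1] * 100, [[0.001, 0.001, 0.001]] * 100, outcomes)

print("lottery:", lottery)       # essentially zero: no credit for the obvious
print("meteorite:", meteorite)   # negative: better than the crowd
```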

I've only been making the forecast well-specified

I think "impact here (10% confidence), no impact at that place (90% confidence)" is quite specific. It is a binary event.

Comment author: [deleted] 09 October 2012 11:57:25PM *  0 points [-]

del

Comment author: Morendil 11 October 2012 02:05:08PM *  3 points [-]

Maybe a similar rule in forecasting is "home ground advantage plus personal or professional stakes, or don't bother."

On 16 questions currently scored, I've done better than the team average at 15. Two of the questions where I outperformed the team by a large margin were the Syrian refugee question, basically a matter of extrapolating a trend and predicting status quo with respect to the conflict, and the Kismayo question, basically a matter of knowing my loss function. I had zero home ground advantage on either question.

Some of my wins resulted purely from general knowledge rather than from having any idea of the specifics of the situation: for instance, in mid-August I answered 40% to "Will Kuwait commence parliamentary elections before 1 October 2012?", reflecting only status quo bias in that a date for the election had not yet been announced. However, early in September I downgraded this to 10%, because I know that as a rule of thumb it takes at least one month to convene an election. The week before, I went to 5% (and even that was quite a generous margin), while several of my teammates made predictions, after I published mine, of 15%, 19%, 33% and even 51% (!).

This felt like entering a poker tournament where people routinely raise pre-flop with a "beer hand" (seven and two - when you play this, either you've had too many beers, or it's time you have one). Elections aren't a mysterious thing, we participate in one every so often. You need to print ballots, set up voting booths, audit voter registration records, give people time to campaign on national media, all very mundane stuff. Even dictatorships make at least a half-hearted attempt at this, and it's not like anyone in Kuwait had any particular interest in meeting an October deadline, this was strictly an internal-to-GJP deadline.

So while this question had to do, ostensibly, with something happening in Kuwait, all you needed to make a call at least as good as mine was background knowledge about extremely mundane, practical stuff that, if I had any hint that you wouldn't factor that in when making a close-to-home prediction, I wouldn't trust you with organizing so much as the PTA president election. Maybe a birthday party.

I wouldn't go so far as to claim that "skill at forecasting macro trends transfers to microeconomic moves".

But I'd take a stand on "demonstrated incompetence at the most elementary moves of forecasting, in a macro domain, is a strong indicator of likely incompetence at forecasting in any micro domain, other than the few narrow ones you might happen to be good at".

Comment author: army1987 11 October 2012 08:26:24PM *  1 point [-]

Some of my wins resulted purely from general knowledge rather than from having any idea of the specifics of the situation: for instance, in mid-August I answered 40% to "Will Kuwait commence parliamentary elections before 1 October 2012?", reflecting only status quo bias in that a date for the election had not yet been announced. However, early in September I downgraded this to 10%, because I know that as a rule of thumb it takes at least one month to convene an election. The week before, I went to 5% (and even that was quite a generous margin), while several of my teammates made predictions, after I published mine, of 15%, 19%, 33% and even 51% (!).

Yeah. Answering “1%” that “there will be a major earthquake in California during $time_period” a month before the end of $time_period kind-of felt like cheating to me.

Comment author: Kindly 11 October 2012 02:11:54PM 0 points [-]

How does GJP score predictions that change over time?

Comment author: Morendil 11 October 2012 03:04:15PM *  3 points [-]

They compute your Brier score for each day that the question is open, according to what your forecast is on that day, and average over all days.

Suppose you start at 80%, six days pass, you switch to 40% three days before the deadline, and the event doesn't happen. Your score is then (6*(0.8)^2+3*(0.4)^2)/9 = .48, which is a so-so score - but an improvement over the .64 that you'd get if you didn't change your mind.
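The same calculation as a minimal Python sketch, assuming one forecast per day and the single-probability squared error used in the example above. The function name is illustrative; this is not GJP's actual scoring code.

```python
def time_averaged_brier(daily_forecasts, outcome):
    """Average the squared error over every day the question was open.

    daily_forecasts: the probability held on each day, in order.
    outcome: 1 if the event happened, 0 if it did not.
    """
    return sum((p - outcome) ** 2 for p in daily_forecasts) / len(daily_forecasts)

changed = [0.8] * 6 + [0.4] * 3    # 80% for six days, then 40% for the last three
unchanged = [0.8] * 9              # never updated

print(time_averaged_brier(changed, 0))     # about 0.48
print(time_averaged_brier(unchanged, 0))   # about 0.64
```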

Comment author: Morendil 10 October 2012 04:42:57AM *  2 points [-]

Isn't Tool 0 of forecasting 'Mind your own business'?

In a nutshell, no.

Consider some practicalities. An advantage of forecasting world events is that it permits participation by a much broader population. I could run a forecasting contest on when the city of Paris will complete a construction project on the banks of the Seine, which is "my backyard" compared to Syria. Nobody would bother.

The point is to find out something about how you think, and comparing yourself to other people will yield information that you can't get by sitting on your own, minding your own business. (On the other hand, there's nothing preventing you from doing both.)

Finally, I'm not aware that people routinely make explicit, quantified forecasts even about their own business. Rather, it seems plain that most of the time, we think "probable" the things we would like to happen, and as a result fail to plan for contingencies we don't like to think about.

To go from not forecasting at all to making forecasts in any domain is progress. It would certainly be useful to many to make forecasts about their daily lives (which I now do, a little bit). But let's imagine this were taught in schools as a life skill: I suspect you would have people practicing precisely on events that they have no control over and that allow interpersonal comparison.

Comment author: [deleted] 10 October 2012 01:33:53PM *  0 points [-]

del

Comment author: Morendil 11 October 2012 11:32:59AM 1 point [-]

Isn't Tool 0 of forecasting 'Mind your own business'?

Thanks for inspiring the following bit of staircase wit, which might make it into some further version of the post: Tool 0 of forecasting is "forecast". If you don't do it, you can't become better at it.

Gwern prefers PredictionBook - where you can, if you want, record private predictions - to GJP. For my part I prefer GJP, precisely because they ask me questions that might not occur to me otherwise, and the competitive aspect suits me. You could also do just fine by recording your own forecasts in a spreadsheet or a notepad, on whatever topics you like.

The key to accuracy is having fewer moving parts, all of which are visible and known by you.

Is accuracy what you're after? Which component of accuracy? I can get perfect calibration by throwing a thousand coin flips and predicting 50% all the time. What I seek is debiasing, making the most of whatever information is available without overweighting any part of it (including my own hunches, feelings and fears); and I'm most vulnerable to bias when there are many moving parts, many of which are hidden from me or unknown to me.

Comment author: gwern 10 October 2012 12:32:41AM 1 point [-]

Isn't Tool 0 of forecasting 'Mind your own business'? One rule of the thumb in poker is "Jacks or better", meaning that you shouldn't even consider playing with anything less. Maybe a similar rule in forecasting is "home ground advantage plus personal or professional stakes, or don't bother."

No, tool 0 is more like 'mind your base rates' or 'don't predict what you would like, predict what you really think would happen'. I dunno where you're getting Tool 0 as 'Mind your own business' from; certainly I or Morendil didn't write it.

How well does skill at forecasting macro trends transfer to microeconomic moves?

I dunno, did you look into any research?

I'm guessing successful policymakers know a lot about their colleagues' personalities, histories, and connections, and that good policy comes from navigating those instead of forecasting macrotrends accurately.

Per the huge amount of material on Outside View vs Inside View and performance of SPRs already discussed on LW, I would guess quite the opposite.

How much would someone boost their batting average by limiting opinions to things of concern, such as for example health-related science, psychology as a science, or which people to draw closer or keep away? Entering a service-based deal with someone who turns out to be incompetent is costly, painful. Or failing to size up new "friends" fast enough for you to retreat before they waste your time. Many people perform well with evolved and practiced instinct by making sure that their forecasts never leave the home ground of their concern.

Do you know that, or are you just guessing, as you said you were before?

Or was your entire comment just an excuse to do an awful lot of rhetorical questions?

Comment author: chaosmosis 10 October 2012 04:53:33AM 2 points [-]

I think he's saying it's a waste of effort to predict who or what will happen in the world if you can't exert any control over it. That sort of makes sense because it seems useless to worry about those sorts of things, at first. But it's important to understand the consequences of the actions of other people so that you can react to them, and he didn't take that into account. So, for example, a French citizen might be interested in knowing who the next US president will be because they're curious about the implications that has for their business contacts in America.

Comment author: Morendil 10 October 2012 05:09:49AM 3 points [-]

I think he's saying it's a waste of effort to predict who or what will happen in the world if you can't exert any control over it.

Buying insurance is a decision that relates to things that may or may not happen, that you have little or no control over: illness, accidents, burglaries, etc. Being able to make informed predictions as to the likelihood of these things is a valuable life skill.

Comment author: [deleted] 10 October 2012 01:40:46PM *  0 points [-]

del

Comment author: chaosmosis 10 October 2012 06:09:36PM *  0 points [-]

Are they different in kind? I'm uncertain.

The distinction seems arbitrary at first glance both because what's personal for one person is impersonal for another and because causality is causality no matter where it occurs. However, if you meant that they're different in kind in a more epistemic sense, that they're different in kind from any particular perspective because of the way that they go through your reasoning process, then that seems plausible.

The question is then what types of data work best and why. You're likely to have less data in total in Near Mode, but you'll be working with things that are important to you personally, which it seems like evolution would favor (individual selection).

On the other hand, evolution seems to make biases more frequent and more intense when they're about personal matters. But evolution wouldn't do this if it hadn't worked often in the past, so perhaps those biases are good? I think that this is fairly plausible, but I also think that these biases would only be "good" in a reproductive sense and not in the sense of epistemic accuracy. They would move you towards maximizing your social status, not the quality of your predictions. It's unlikely those would overlap.

How likely is it that people are good at evaluating the credibility of the ideas of specific people? I would say that most people are probably bad at this when seeing others face to face because of things like the halo effect and because credibility is rather easy to fake. I would also say that people are rather good at this otherwise. Are these evaluations still accurate when they interact with social motivations, like rivalry? I would say that they probably end up even worse under those circumstances.

So, I believe that personal events and impersonal events should be considered differently because I believe trying to evaluate the accuracy of the views of specific experts would improve the accuracy of your predictions if and only if you avoided personal familiarity or intimacy with those experts, and that otherwise it would damage your accuracy.

I failed to consider the implications of social motivation for professional accuracy, and a bunch of other stuff.

Comment author: [deleted] 11 October 2012 01:19:53AM *  0 points [-]

del

Comment author: chaosmosis 11 October 2012 02:48:12AM 0 points [-]

I'm sorry, either I'm misunderstanding you or you misunderstood my comment. I don't understand what you mean by the phrase "choosing types of data". I think that although we're better at dealing with some types of data, that doesn't mean we should focus exclusively on that type of data. I think that becoming a skilled general forecaster is a very useful thing and something that should be pursued.

What sort of questions did you have in mind?

Comment author: [deleted] 11 October 2012 04:04:38AM *  1 point [-]

del

Comment author: CCC 11 October 2012 07:38:27AM 1 point [-]

Well, I can give you an argument, though you'll have to evaluate the strength of it yourself.


Forecasting, in a Bayesian sense, is a matter of repeated application of Bayes' theorem. In short, I make an observation (B) and then ask - what are the chances of prediction (A), given observation (B)? ('Prediction' may be the wrong word, given that I may be predicting something unseen that has already happened). Bayes' theorem states that this is equal to the following:

The chances of observation B, given prediction A, multiplied by the prior probability of prediction A, divided by the prior probability of observation B

Now, the result of the equation is only as good as the figures you feed into it. In your example of the freelancer, the new freelancer (just starting out) has poor estimates of the probabilities involved, though he can improve these estimates by asking a more experienced freelancer for help. The experienced freelancer, on the other hand, has got a better grasp of the input probabilities, and thus gets a more accurate output probability. The equation works for both large-scale, macro events and small-scale, personal events - the difference is, once again, a matter of the input numbers. For a macro event, you'll have more people looking at, commenting on, discussing the situation; reading the words of others will improve your estimates of the probabilities involved, and putting better numbers in will get you better numbers out. Also, with macro events, you're more likely to have more time to sit down with pencil and paper and work it out.

However, predicting macro events will help you to better practice the equation, and thus learn how to apply it more quickly and easily to micro events. Sufficient practice will also help you to more quickly and accurately estimate the result for a given set of inputs. So while it is true that the skill of guessing the input probabilities for macro events may have little to do with the skill of guessing the input probabilities for micro events (though there is some correlation there - the skill of accurately putting figures to the probability may transfer to some degree), the skill of practicing the application of the equation is transferable between the two realms.
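To make the equation concrete, here is a minimal Python sketch of a single Bayesian update. All the numbers are hypothetical, and the prior probability of the observation B is expanded with the law of total probability, which is an added assumption rather than something stated above. The two calls differ in only one input, to show how much the output depends on the figures fed in.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) = P(B | A) * P(A) / P(B).

    P(B) is computed as P(B | A) * P(A) + P(B | not A) * (1 - P(A)).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical "experienced" estimate: the observation is rare unless A is true.
experienced = posterior(prior_a=0.3, p_b_given_a=0.8, p_b_given_not_a=0.2)

# Hypothetical "novice" estimate: same prior, but a poor guess for P(B | not A).
novice = posterior(prior_a=0.3, p_b_given_a=0.8, p_b_given_not_a=0.6)

print("experienced:", experienced)   # about 0.63
print("novice:", novice)             # about 0.36
```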

Comment author: Morendil 11 October 2012 08:07:43AM 0 points [-]

So I'm interested in forecasting. It's an important skill. I'm going on about it because I want to be good, smart, well-calibrated, about what matters to me.

Well, to start with: what evidence do you have at the moment about how well calibrated you are?

Comment author: AspiringRationalist 10 October 2012 05:36:58PM 0 points [-]

The methods that Morendil is discussing here are pretty general forecasting techniques, not limited to a particular domain. Some skills are worth developing, even if you're practicing them in domains you don't care about.

Personal example: I was a bio major in college, and I found it very difficult to care about organic chemistry, because we were mostly learning about chemicals that had no biological relevance. Consequently, I didn't learn it very well, which came back to bite me pretty hard when I took biochemistry.

Comment author: chaosmosis 11 October 2012 05:36:07PM -3 points [-]

Are there self consistent ways for people to believe that trickle-down economic policies should be encouraged but also to believe that small businesses are the primary drivers of growth? Many people seem to believe both and I do not understand why.

Comment author: TheOtherDave 11 October 2012 05:43:54PM 2 points [-]

Are there self consistent ways for people to believe that trickle-down economic policies should be encouraged but also to believe that small businesses are the primary drivers of growth?

Sure. Just to pick an obvious example, I might believe that trickle-down economic policies will benefit a subset of the population, and also believe that small businesses are primary growth drivers for the population as a whole, and believe that trickle-down economic policies should be encouraged because I consider benefits to the former subset more valuable than growth.

Comment author: Morendil 11 October 2012 06:11:57PM 1 point [-]

What's the connection with the OP? I'm not seeing it...

Comment author: chaosmosis 11 October 2012 11:04:17PM 2 points [-]

Oh. Err.

I meant to comment with this in open thread. My mistake.