This is a linkpost for https://cuttyshark.substack.com/p/forecasting-the-way-i-think-about
Good points well made. I'm not sure what you mean by "my expected log score is maximized" (and would like to know), but in any case it's probably your average world rather than your median world that does it?
Figure 1 is clumsy, sorry. In the case of a smooth probability distribution of infinite worlds, I think the median and the average world are the same? But in practice, yes, it's an expected value calculation, summing P(world) * P(U|world) for all the worlds you've thought about.
This is the first post in a little series I'm slowly writing on how I see forecasting, particularly conditional forecasting; what it's good for; and whether we should expect people to agree if they just talk to each other enough.
Views are my own. I work at the Forecasting Research Institute (FRI), I forecast with the Samotsvety group, and to the extent that I have formal training in this stuff, it's mostly from studying and collaborating with Leonard Smith, a chaos specialist.
My current plan is:
What do I do when I forecast? Let's say I'm forecasting an arbitrary bad outcome U that we're going to resolve in/by 2100 (e.g. AI-related catastrophe). I ask myself:
Imagining all the worlds is impossible, so I wind up decomposing the probability mass function into a few types of worlds and thinking about how being in each world would affect P(U). That is, for worlds A, B, C, … I have P(U|A), P(U|B), P(U|C), etc. (Fig. 2), and I have ideas about how likely we are to wind up in each of A, B, C, etc. Here, B is my "modal world" and my "expectation" world is somewhere between C and D on the P(U) scale.
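To make the decomposition concrete, here is a minimal Python sketch of the calculation, summing P(world) * P(U|world) over a handful of worlds. The world labels follow the text, but every number is made up for illustration and is not one of my actual estimates; the function names are likewise just illustrative.

```python
# Hypothetical world types with made-up numbers (illustration only).
# Each entry maps a world label to (P(world), P(U | world)).
worlds = {
    "A": (0.10, 0.01),
    "B": (0.40, 0.05),  # the modal (most likely) world
    "C": (0.25, 0.15),
    "D": (0.15, 0.30),
    "E": (0.10, 0.60),
}


def overall_probability(worlds):
    """Law of total probability: P(U) = sum over worlds of P(world) * P(U | world)."""
    total_weight = sum(p_world for p_world, _ in worlds.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("world probabilities must sum to 1")
    return sum(p_world * p_u for p_world, p_u in worlds.values())


def modal_world(worlds):
    """Label of the single most likely world."""
    return max(worlds, key=lambda name: worlds[name][0])


p_u = overall_probability(worlds)
mode = modal_world(worlds)
print(f"modal world: {mode}, P(U | {mode}) = {worlds[mode][1]:.3f}")
print(f"all-things-considered P(U) = {p_u:.4f}")
```

With these invented numbers, the all-things-considered P(U) lands between P(U|C) and P(U|D), well above P(U|B), even though B is the most likely single world.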
If you want to get really fancy, you can factor in uncertainty about U in each of these worlds, treat them all as distributions (some are pointier, some are more uncertain), and think about your all-things-considered P(U) as a mixture distribution over all of your worlds. This can always be distilled into a point estimate by taking the mixture's center of mass (dotted line in Fig. 1). You can use tools like Squiggle for this.
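As a rough stand-in for what a tool like Squiggle would do, here is a plain-Python sketch of the mixture idea. It assumes, purely for illustration, that uncertainty about P(U|world) in each world is a Beta distribution; the weights and Beta parameters are invented, and this is not Squiggle code.

```python
import random

# Hypothetical worlds as (P(world), alpha, beta). Uncertainty about
# P(U | world) is modeled as Beta(alpha, beta); this is an assumption
# made for the sketch, not something the post specifies.
worlds = [
    (0.5, 2.0, 38.0),  # pointy distribution, mean 0.05
    (0.3, 3.0, 17.0),  # mean 0.15
    (0.2, 2.0, 2.0),   # very uncertain, mean 0.50
]


def sample_mixture(worlds, n, rng):
    """Draw n samples of P(U): pick a world by its weight, then sample within it."""
    weights = [w for w, _, _ in worlds]
    samples = []
    for _ in range(n):
        _, alpha, beta = rng.choices(worlds, weights=weights, k=1)[0]
        samples.append(rng.betavariate(alpha, beta))
    return samples


def analytic_mean(worlds):
    """Center of mass of the mixture: weighted sum of each Beta mean alpha/(alpha+beta)."""
    return sum(w * alpha / (alpha + beta) for w, alpha, beta in worlds)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = sample_mixture(worlds, 50_000, rng)
mc_mean = sum(samples) / len(samples)
exact_mean = analytic_mean(worlds)
print(f"Monte Carlo point estimate: {mc_mean:.3f}")
print(f"analytic center of mass:    {exact_mean:.3f}")
```

The point estimate only depends on each world's weight and mean; the within-world spread changes the shape of the mixture but not its center of mass.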
Side-note: I think some people just think about the modal world B by default. It's probably the first world you think of. It's the world you most think will come to pass. But you don't maximize your expected log score by forecasting P(U|B) when you're asked for P(U).
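A quick numerical check of this side-note, as a sketch: if your beliefs are a weighted set of worlds, the forecast q that maximizes your expected log score is the weighted average P(U), not P(U|B). The numbers below are invented, and the grid search is just a simple way to find the maximum without calculus.

```python
import math

# Hypothetical worlds (made-up numbers): label -> (P(world), P(U | world)).
worlds = {
    "A": (0.10, 0.01),
    "B": (0.40, 0.05),  # modal world
    "C": (0.25, 0.15),
    "D": (0.15, 0.30),
    "E": (0.10, 0.60),
}


def expected_log_score(q, worlds):
    """Expected log score of forecasting q for U, averaged over your own world beliefs.

    In each world, U happens with probability P(U | world), scoring log(q),
    and fails to happen otherwise, scoring log(1 - q).
    """
    return sum(
        p_world * (p_u * math.log(q) + (1.0 - p_u) * math.log(1.0 - q))
        for p_world, p_u in worlds.values()
    )


p_mean = sum(p_world * p_u for p_world, p_u in worlds.values())
p_modal = worlds["B"][1]

# Brute-force search over forecasts 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]
best_q = max(grid, key=lambda q: expected_log_score(q, worlds))

print(f"best forecast on the grid: {best_q:.3f} (weighted average P(U) = {p_mean:.4f})")
print(f"expected log score at the average:  {expected_log_score(p_mean, worlds):.4f}")
print(f"expected log score at P(U | B):     {expected_log_score(p_modal, worlds):.4f}")
```

The best forecast on the grid sits next to the weighted average, and forecasting the modal world's P(U|B) scores strictly worse in expectation.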
In our projects at FRI, we've conditioned on things of the form "[x happens] by [year]." Let's say [x happens] is a certain policy being implemented. Understandably, our study participants have factored in what "this policy" being implemented may imply about the world in [year]. Maybe you think it highly unlikely that this policy would be implemented if we were living in World B, so conditioning on it makes you think we're probably in World F, where Russia has nuked the UK and there are dragons. Conditioning on any given thing changes the shape of your curve. Now it might look something like this:
This is a problem if what we want to know is how the policy would causally affect the ultimate outcome that we care about. Can we say whether this policy would be good or bad (measured by its impact on P(U))? Not really. But if you ask a forecaster to "hold all else equal" and try to isolate just the effect of the policy, I'd argue that they're hardly forecasting anymore. Any forecast generated that way can't be scored. Worlds A, B, C, etc. could manifest, whereas a world where nothing changes except that this policy is implemented isn't realizable. In fact, Adam Dorr has written about this as the ceteris paribus fallacy: when you're forecasting, it's a mistake to imagine "single-variable futures" (h/t Michał Dubrawski, without whom I probably wouldn't have read Dorr).
If only we had a way to capture how much of my forecast owes to "evidential" considerations like P(B|policy) and how much is more like causal reasoning! We need better ways for people to articulate their models of the world and what they're weighing in their forecasts. Dan Schwarz has written about that need here. I have some thoughts I'll share in my next post.
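To illustrate just the "evidential" part, here is a hedged Python sketch of how conditioning on the policy reweights the worlds via Bayes' rule. All numbers are invented, including the P(policy|world) column. The sketch deliberately assumes the policy has no causal effect on U within any world, so the whole shift in P(U) comes from learning which world we're probably in.

```python
# Hypothetical numbers: label -> (P(world), P(U | world), P(policy | world)).
worlds = {
    "A": (0.10, 0.01, 0.30),
    "B": (0.40, 0.05, 0.02),  # modal world; policy is very unlikely here
    "C": (0.20, 0.15, 0.10),
    "D": (0.15, 0.30, 0.20),
    "E": (0.10, 0.45, 0.30),
    "F": (0.05, 0.60, 0.80),  # unlikely world where the policy is very likely
}


def posterior_over_worlds(worlds):
    """Bayes' rule: P(world | policy) is proportional to P(policy | world) * P(world)."""
    joint = {name: p_w * p_pol for name, (p_w, _, p_pol) in worlds.items()}
    p_policy = sum(joint.values())
    return {name: value / p_policy for name, value in joint.items()}


posterior = posterior_over_worlds(worlds)

p_u_prior = sum(p_w * p_u for p_w, p_u, _ in worlds.values())

# Assumption for this sketch: P(U | world, policy) = P(U | world),
# i.e. the policy does nothing causally inside any given world.
p_u_given_policy = sum(posterior[name] * worlds[name][1] for name in worlds)

most_likely_after = max(posterior, key=posterior.get)
print(f"P(U) = {p_u_prior:.3f}, P(U | policy) = {p_u_given_policy:.3f}")
print(f"most likely world after conditioning: {most_likely_after}")
```

With these numbers, P(U|policy) is much higher than P(U) and World F becomes the most likely world, even though the policy was assumed to have zero causal effect. A conditional forecast alone can't tell that apart from a policy that actually makes things worse.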
Keen to hear how different this is from how you, dear reader, think about forecasting.