I think Tetlock and co. might have already done some related work?
Question decomposition is part of the superforecasting commandments, though I can't recall off the top of my head if they were RCT'd individually or just as a whole.
ETA: This is the relevant paper (h/t Misha Yagudin). It was not about the 10 commandments. Apparently those haven't been RCT'd at all?
I don't remember anything specific from reading their stuff, but that would of course be useful. Sadly, I haven't been able to find any more recent investigations into decomposition, e.g. Connected Papers for MacGregor 1999 gives nothing worthwhile after 2006 on a first skim, but I'll perhaps look more at it.
"What is the empirical evidence for decomposition being a technique that improves forecasts?"
I might be misunderstanding here, but I'm fairly confident that the recent history of predicting sports outcomes and developing live betting odds very strongly supports decomposition as a technique (under some conditions).
It seems like the only rational way of predicting the outcome of a multi-stage sports event (like the FIFA World Cup, for example) is decomposing the chances of a team winning the World Cup into the chances of them winning each previous game. (And then adding a K-factor to adjust to recent results).
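To make the World Cup example concrete, here is a minimal sketch of that kind of decomposition. Reading "K-factor" as a reference to Elo-style ratings is my assumption, as are the function names, the ratings and the fixed list of opponents (in a real bracket the opponents are themselves uncertain, and draws are ignored here).

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of team A against team B (draws ignored)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """Move a rating towards the observed result; k is the K-factor."""
    return rating + k * (actual - expected)


def tournament_win_probability(team_rating: float, opponent_ratings: list[float]) -> float:
    """Decompose 'wins the title' into 'wins every knockout game' and multiply."""
    p = 1.0
    for opponent in opponent_ratings:
        p *= elo_win_probability(team_rating, opponent)
    return p


# Hypothetical ratings, purely for illustration.
team = 2000.0
knockout_opponents = [1800.0, 1900.0, 2000.0, 2050.0]
p_title = tournament_win_probability(team, knockout_opponents)
print(f"P(title) under these assumptions: {p_title:.3f}")

# After a win against an equally rated side, the rating moves up by k / 2.
team_after_win = elo_update(team, elo_win_probability(team, 2000.0), actual=1.0)
```

The product is only as good as the per-game probabilities, and treating the games as independent given the ratings is itself a modelling choice.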
Maybe to clarify, by question decomposition I mean techniques such as saying "A will happen if and only if B₁ and B₂ and ... all happen, so we estimate P(B₁) and P(B₂|B₁) &c, and then multiply them together to estimate P(A) = P(B₁) ⋅ P(B₂|B₁) ⋅ …", which is how it is done in the sources I linked.
Do you by chance have links about how this is done in sports betting? I'd be interested in that.
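A small sketch of the multiplicative decomposition described above, writing the target event as A and the conditions as B1, B2, ... (my notation). The toy joint distribution and the function name are made up for illustration. It also shows one standard way such a product can come out too low: multiplying unconditional probabilities when the steps are positively correlated.

```python
from math import prod


def decomposed_estimate(step_probs: list[float]) -> float:
    """Multiply step probabilities, ideally P(B1), P(B2|B1), P(B3|B1,B2), ..."""
    if not all(0.0 <= p <= 1.0 for p in step_probs):
        raise ValueError("every step probability must lie in [0, 1]")
    return prod(step_probs)


# Hypothetical joint distribution over two positively correlated binary events.
joint = {
    (True, True): 0.4,
    (True, False): 0.1,
    (False, True): 0.1,
    (False, False): 0.4,
}

p_b1 = sum(p for (b1, _), p in joint.items() if b1)  # marginal P(B1)
p_b2 = sum(p for (_, b2), p in joint.items() if b2)  # marginal P(B2)
p_b2_given_b1 = joint[(True, True)] / p_b1           # conditional P(B2|B1)

chain_rule = decomposed_estimate([p_b1, p_b2_given_b1])  # recovers P(B1 and B2)
naive = decomposed_estimate([p_b1, p_b2])                # ignores the correlation

print(f"chain rule: {chain_rule:.2f}, naive product of marginals: {naive:.2f}")
```

With properly conditioned steps the product is exact by the chain rule; the open empirical question is whether human forecasters estimate those conditional steps well.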
I think this is highly confounded with effort. Asking people to decompose a forecast will, on average, cause them to think more. This further calls into question any positive findings for decomposition.
I find this baffling. It seems like breaking predictions into sub-parts should help. But I haven't thought about it much :)
One possible counter-factor is in structuring people's judgments artificially. If asking them to break a prediction into sub-parts makes them factor the problem in different ways than they would in their own thinking, I can see how that would hurt judgments.
And it could actually cost time. Asking sub-questions could cause people to spend their cognitive time on the particulars of those sub-problems, rather than spending that time on sub-problems they thought of themselves, and that work naturally with their overall strategy for making that prediction.
This seems like a question one shouldn't be using statistical evidence to form an opinion about. It seems tractable to just grok (and intuify) the theoretical considerations and thus gain a much better understanding of when vs when not to decompose (and with how much granularity and by which method). Deferring to statistics on it seems liable to distort the model, such that I don't think a temporary increase in the accuracy of final-stage judgments would be worth it.
Question decomposition appears to be a relatively common method for forecasting (see Allyn-Feuer & Sanders 2023, Silver 2016, Kaufman 2011 and Hanson 2011), but there have been conceptual arguments against this technique: Yudkowsky 2017 and Gwern 2019 both state that it reliably underestimates the probability of events.
What is the empirical evidence for decomposition?
Lawrence et al. 2006 summarize the state of the field:
(Emphasis mine).
The types of decomposition described here seem quite different from the ones used in the sources above: Decomposed time series are quite dissimilar to multiplied probabilities for binary predictions, and in combination with the conceptual counter-arguments the evidence appears quite weak.
It appears that a team of a few (let's say 4) dedicated forecasters could run a small experiment to determine whether multiplicative decomposition for binary forecasts is a good method, by randomly spending 20 minutes on either an explicitly decomposed forecast or a control forecast (although the exact method for the control needs to be elaborated on). Working in parallel, making 70 forecasts should take 70 ⋅ (1/3) / 4 ≈ 5.8 hours, i.e. a little under 6 hours, although it'd be useful to search for more recent literature on the question.
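The time budget and the random assignment can be sketched in a few lines. The balanced 35/35 split, the fixed seed and all names here are my assumptions; the text above only says that the condition is chosen randomly and leaves the control method open.

```python
import random


def total_hours(n_forecasts: int, minutes_each: float, n_forecasters: int) -> float:
    """Wall-clock hours if the forecasts are split evenly and done in parallel."""
    return n_forecasts * (minutes_each / 60.0) / n_forecasters


def assign_conditions(question_ids, seed: int = 0) -> dict:
    """Randomly assign half of the questions to each condition (assumed design)."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    ids = list(question_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {
        qid: ("decomposed" if i < half else "control")
        for i, qid in enumerate(ids)
    }


hours = total_hours(n_forecasts=70, minutes_each=20, n_forecasters=4)
assignment = assign_conditions(range(70))

print(f"{hours:.1f} hours of parallel work")
print(sum(cond == "decomposed" for cond in assignment.values()), "decomposed forecasts")
```

Fixing the assignment before anyone reads the questions keeps forecasters from picking decomposition only where it looks easy.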