Agile programming notices that we're bad at forecasting, but concludes that we're systematically bad. The approach it takes is to ask the programmers to aim for consistency in their forecasts of individual tasks, and to put the learning in the next phase, which is planning. So as individual members of a development team, we're supposed to simultaneously believe that we can make consistent forecasts of particular tasks, and that our ability to make estimates is consistently off, and that applying a correction factor will make it work.
Partly this is like learning to throw darts, and partly it's a theory about aggregating biased estimates. What I mean by the dart-throwing example is that beginning dart throwers are taught to aim first for consistency. Once you can get all your darts to land close to the same spot, you can adjust small things to move that spot around. The first requirement for being a decent dart thrower is being able to throw the same way on each toss. Once you can do that, you can turn slightly, or learn other tricks to adjust how your aim point relates to where the darts land.
The aggregation theory says that the problem in forecasting is not with the individual estimates; it's with random and unforeseen factors that are easier to correct for in the aggregate. The problem with the individual forecasts might be overhead tasks that reliably steal time away, unpredictable bugs, or redesign that only becomes apparent as you make progress. These are easier to account for in aggregate planning than when thinking about individual tasks. If you pad every task for its worst case, you end up with too much padding.
Over the long term, the expansion factor from individual tasks to weeks or months of work can be fairly consistent.
Providing Slack at the project level instead of the task level is a really good idea, and has worked well in many fields outside of programming. It is analogous to the concept of insurance: the ROI on Slack is higher when you aggregate many events with at least partially uncorrelated errors.
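To make the aggregation/insurance point concrete, here is a minimal simulation sketch. It is my illustration rather than anything from the comment above, and the numbers (8 tasks, lognormal durations around a 3-week median, overruns assumed independent) are made up just to show the shape of the effect: padding each task for its own worst case costs much more than holding one pooled buffer sized for the worst case of the total.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: 8 tasks whose true durations are lognormal around a
# 3-week median, with overruns assumed independent across tasks.
n_tasks, n_sims = 8, 100_000
durations = rng.lognormal(mean=np.log(3), sigma=0.5, size=(n_sims, n_tasks))

# Scheme A: pad every task to its own 95th-percentile worst case.
per_task_worst = np.quantile(durations, 0.95, axis=0)
budget_padded_tasks = per_task_worst.sum()

# Scheme B: budget tasks leanly and hold one shared slack pool sized so the
# *total* comes in under budget 95% of the time.
budget_pooled = np.quantile(durations.sum(axis=1), 0.95)

print(f"Sum of per-task worst cases: {budget_padded_tasks:.1f} weeks")
print(f"95th percentile of the total: {budget_pooled:.1f} weeks")
```

With these made-up numbers the per-task worst cases add up to roughly 55 weeks, while the pooled buffer needs only around 36 to give the same overall confidence; the less correlated the overruns, the bigger that gap, which is the insurance-style ROI described above.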
One major problem with trying to fix estimates at the task level is that there are strong incentives not to finish a task too early. For example, if you estimated 6 weeks, and are almost done after 3, and something moderately urgent comes up, you're more likely to switch and fix that urgent thing since you have time. On the other hand, if you estimated 4 weeks, you're more likely to delay the other task (or ask someone else to do it).
As a result, I've found that teams are literally likely to finish projects faster and with higher quality if you estimate the project as, say, 8 3-week tasks with 24 weeks of overall slack (so 48 weeks total) than if you estimate it as 8 6-week tasks.
This is somewhat counterintuitive but really easy to apply in practice if you have a bit of social capital.
We're not systematically bad forecasters. We're subject to widespread rewards for overconfidence.
There's a bunch of things in the content of this post I really like:
I don't feel like I fully understand (i.e. could easily replicate) how your notion of the policy level interacts with the gears - I might try to work through some examples in a separate comment later, and if I can do that I'll probably start using these terms myself.
Also, something about this post felt really well structured. Anyhow, for all these reasons, I've Featured this.
One thing I'd add to that list is that the post focuses on refining existing concepts, which is quite valuable and generally doesn't get enough attention.
Doesn't this basically move the reference class tennis to the meta-level?
"Oh, in general I'm terrible at planning, but not in cases involving X, Y and Z!"
It seems reasonable that this is harder to do on a meta-level, but do any of the other points you mention actually "solve" this problem?
Valid question. I think the way I framed it may over-sell the solution -- there'll still be some problems of reference-class choice. But, you should move toward gears to try and resolve that.
The way in which policy-level thinking improves the reference-class tennis is that you're not stopping at reference-class reasoning. You're trying to map the input-output relations involved.
If you think you're terrible at planning, but not in cases X, Y, and Z, what mechanisms do you think lie behind that? You don't have to know in order to use the reference classes, but if you are feeling uncertain about the validity of the reference classes, digging down into causal details is likely a pretty good way to disambiguate.
For example, maybe you find that you're tempted to say something very critical of someone (accuse them of lying), but you notice that you're in a position where you are socially incentivised to do so (everyone is criticising this person and you feel like joining in). However, you also value honesty, and don't want to accuse them of lying unfairly. You don't think you're being dishonest with yourself about it, but you can remember other situations where you've joined in group criticism and later realized you were unfair due to the social momentum.
I think the reference-class reaction is to just downgrade your certainty, and maybe have a policy of not speaking up in those kinds of situations. This isn't a bad reaction, but it can be seen as a sort of epistemic learned helplessness. "I've been irrational in this sort of situation, therefore I'm incapable of being rational in this sort of situation." You might end up generally uncomfortable with this sort of social situation and feeling like you don't know how to handle it well.
So, another reaction, which might be better in the long-term, would be to take a look at your thinking process. "What makes me think they're a liar? Wow, I'm not working on much evidence here. There are all kinds of alternative explanations, I just gravitated to that one..."
It's a somewhat subtle distinction there; maybe not the clearest example.
Another example which is pretty clear: someone you don't like provides a critique of your idea. You are tempted to reject it out of hand, but outside view tells you you're likely to reject that person's comments regardless of truth. A naive adjustment might be to upgrade your credence in their criticism. This seems like something you only want to do if your reasoning in the domain is too messy to assess objectively. Policy-level thinking might say that the solution is to judge their critique on its merits. Your "inside view" (in the sense of how-reality-naively-seems-to-you) is that it's a bad critique; but this obviously isn't a gears-level assessment.
Maybe your policy-level analysis is that you'll be unable to judge a critique objectively in such cases, even if you pause to try and imagine what you'd think of it if you came up with it yourself. In that case, maybe you decide that what you'll do in such cases is write it down and think it through in more detail later (and say as much to the critic).
Or, maybe your best option really is to downgrade your own belief without assessing the critique at the gears level. (Perhaps this issue isn't that important to you, and you don't want to spend time evaluating arguments in detail.) But the point is that you can go into more detail.
Inside view vs outside view has been a fairly useful intuition-pump for rationality. However, the dichotomy has a lot of shortcomings. We've just gotten a whole sequence about failures of a cluster of practices called modest epistemology, which largely overlaps with what people call outside view. I'm not ready to stop championing what I think of as the outside view. However, I am ready for a name change. The term outside view doesn't exactly have a clear definition; or, to the extent that it does have one, it's "reference class forecasting", which is not what I want to point at. Reference class forecasting has its uses, but many problems have been noted.
I propose gears level & policy level. But, before I discuss why these are appropriate replacements, let's look at my motives for finding better terms.
Issues with Inside vs Outside
Problems with the concept of outside view as it currently exists:
The existing notion of inside view is also problematic:
The Gears Level and the Policy Level
Gears-level understanding is a term from CFAR, so you can't blame me for it. Well, I'm endorsing it, so I suppose you can blame me a little. In any case, I like the term, and I think it fits my purposes. Some features of gears-level reasoning:
The policy level is not a CFAR concept. It is similar to the CFAR concept of the strategic level, which I suspect is based on Nate Soares' Staring Into Regrets. In any case, here are some things which point in the right direction:
Most of the existing ideas I can point to are about actions: game theory, decision theory, the planning fallacy. That's probably the worst problem with the terminology choice. Policy-level thinking has a very instrumental character, because it is about process. However, at its core, it is epistemic. Gears-level thinking is the practice of good map-making. The output is a high-quality map. Policy-level thinking, on the other hand, is the theory of map-making. The output is a refined strategy for making maps.
The standard example with the planning fallacy illustrates this: although the goal is to improve planning, which sounds instrumental, the key is noticing the miscalibration of time estimates. The same trick works for any kind of mental miscalibration: if you know about it, you can adjust for it.
This is not just reference class forecasting, though. You don't adjust your time estimates for projects upward and stop there. The fact that you normally underestimate how long things will take makes you think about your model. "Hm, that's interesting. My plans almost never come out as stated, but I always believe in them when I'm making them." You shouldn't be satisfied with this state of affairs! You can slap on a correction factor and keep planning like you always have, but this is a sort of paradoxical mental state to maintain. If you do manage to keep the disparity between your past predictions and actual events actively in mind, I think it's more natural to start considering which parts of your plans are most likely to go wrong.
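For concreteness, here is a minimal sketch of the bare correction-factor move, using a hypothetical log of past estimates versus actuals; it's my illustration, not something from the post, and it is exactly the reference-class-style adjustment you shouldn't stop at.

```python
# Hypothetical log of (estimated days, actual days) for past tasks.
past = [(5, 9), (2, 3), (10, 19), (4, 6)]

# The "slap on a correction factor" move: inflate new estimates by the
# historical ratio of actual time to estimated time.
correction = sum(actual for _, actual in past) / sum(est for est, _ in past)

def adjusted(estimate_days: float) -> float:
    """Reference-class-style adjustment, without examining why plans slip."""
    return estimate_days * correction

print(adjusted(7))  # about 12.3 days with the hypothetical log above
```

Policy-level thinking starts where this snippet stops: instead of only scaling the number, you ask which parts of the plan are producing the gap.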
If I had to spell it out in steps:
I don't know quite what I can say here to convey the importance of this. There is a skill here; a very important skill, which can be done in a split second. It is the skill of going meta.
Gears-Level and Policy-Level Are Not Opposites
The second-most confusing thing about my proposed terms is probably that they are not opposites of each other. They'd be snappier if they were; "inside view vs outside view" had a nice sound to it. On the other hand, I don't want the concepts to be opposed. I don't want a dichotomy that serves as a descriptive clustering of ways of thinking; I want to point at skills of thinking. As I mentioned, the virtuous features of gears-level thinking are still present when thinking at the policy level; unlike in reference class forecasting, the ideal is still to get a good causal model of what's going on (IE, a good causal model of what is producing systematic bias in your way of thinking).
The opposite of gears-level thinking is un-gears-like thinking: reasoning by analogy, loose verbal arguments, rules of thumb. Policy-level thinking will often be like this when you seek to make simple corrections for biases. But, remember, these are error models in the errors-vs-bugs dichotomy; real skill improvement relies on bug models (as studies in deliberate practice suggest).
The opposite of policy-level thinking? Stimulus-response; reinforcement learning; habit; scripted, sphexish behavior. This, too, has its place.
Still, like inside and outside view, gears and policy thinking are made to work together. Learning the principles of strong gears-level thinking helps you fill in the intricate structure of the universe. It allows you to get past social reasoning about who said what and what you were taught and what you're supposed to think and believe, and instead, get at what's true. Policy-level thinking, on the other hand, helps you to not get lost in the details. It provides the rudder which can keep you moving in the right direction. It's better at cooperating with others, maintaining sanity before you figure out how it all adds up to normality, and optimizing your daily life.
Gears and policies both constitute moment-to-moment ways of looking at the world which can change the way you think. There's no simple place to go to learn the skillsets behind each of them, but if you've been around LessWrong long enough, I suspect you know what I'm gesturing at.