I think this sort of problem is far more common than most engineers intuitively expect.
Minor nit: Most software engineers I know who have worked on systems that interact with the real world, and especially with people (e.g. anything in the 'gig economy', real estate tech, etc.), are aware of these problems, even if they can't do nearly as good a job as you did of explaining them and describing when they're likely to occur. So I would say either "more common than engineers intuitively expect until they work with a messy system like this" or "more common than more theory-oriented people intuitively expect".
What you describe is part of what I was trying (and not doing as good a job as you) to get at in my cruxes shortform post. Specifically, having a healthy respect for how difficult "messiness" is to deal with and its resistance to elegant solutions is very related to the long tail.
Interestingly, a lot of that cruxes thing resonates with me - I hate definition-theorem-proof textbooks, strongly prefer my math to be driven by applications and examples, have ample exposure to real-world messiness, and I've only very grudgingly come to accept that I'm not an engineer at heart.
But at the same time, I still think that, for instance, "AGI needs Judea Pearl more than John Carmack".
I think that too many people on both sides of the theory/engineering aesthetic divide identify "theory" with what I'd call "bad theory" - think Bourbaki-esque opaque math textbooks, where 80% of the effort is just spent defining things and there don't seem to be many (if any) practical applications. The definition-theorem-proof writing style is a central example here.
By contrast, I think "good theory" usually lets us take real-world phenomena which we already intuitively recognize, formalize them, then use that formalization to make predictions about them and/or design them. Great examples:
These all took things which we didn't previously know how to handle mathematically, and gave us a way to handle them mathematically. The models don't always match the messy world perfectly, but they tell us what questions to ask, what the key bottlenecks are, and what design approaches are possible. They give us frames through which the world makes more sense - they show ways in which the world is less messy than it first appears, and they help us identify which parts of the mess are actually relevant to particular phenomena.
An analogy: bad theory is drawing a map of an unfamiliar city by sitting in a room with the shades closed. It may produce some cool pictures, but it won't produce a map of the world. Good theory goes out and looks at the world, says "hey I've noticed a pattern where <interesting thing>, and I don't have a map for that", and then makes the map by abstracting the relevant parts of the real world.
Maybe my post makes it seem otherwise (although I hope not), but I agree with everything you said.
A minor meta point: since writing my original comment I've also learned more about graphical models and causality, which has led me to realize I previously underestimated Pearl's (and his students' / collaborators') achievements.
https://www.newyorker.com/science/maria-konnikova/hazards-automation
I haven't gone deep into these studies, but I am aware of claims that automating a large percentage of a task, but not all of it, can have negative consequences, because the human operator does not get enough practice to be effective when she needs to take over, especially in rare, high-leverage situations.
Anecdotally, I work in software development for a company that has a lot of services. A service that is not 100% resilient to incomplete/missing data at startup gets restarted any time its key data changes. There is no point in making something 90% resilient.
The automation rule of thumb is that the automated process has to be significantly better than manual or human-assisted automation by every metric. It does not have to be 100%, but it has to be way better than the alternative. An autonomous vehicle can make mistakes, but if there are far fewer of them and if they are of the same kind a human would have made (otherwise it's "worse than a human driver" by at least one metric), then it's acceptable.
Suppose we have a self-driving car which works 99% of the time. A human driver only needs to intervene to prevent an accident on one trip out of 100. How much economic value does this generate, relative to a full human-level-or-better self-driving car?
I would guess less than 10%, maybe even less than 1% of the value. Why? Because with a 1% error rate, we still need a human driver, and the vast majority of the value of self-driving cars comes from removing the human. Things like automated emergency braking, parallel parking, warning systems, or cruise control definitely have some value - but they’re nowhere near the value of giving every worker in the United States one extra hour every weekday (roughly the average round-trip commute time).
I think this sort of problem is far more common than most engineers intuitively expect. There are a lot of areas where it seems like it should be easy to automate, if not all of the work, at least 90% of it. It looks like there are metaphorical hundred-dollar bills lying on the ground. But I think, in most of these cases, automating 90% just doesn’t generate all that much value - because you still need a human watching everything, waiting to jump in as soon as the other 10% comes up. The vast majority of the value comes from taking the human out of the loop.
Personally, I ran into this at a mortgage startup. We wanted to automate as much of the approval process as possible; we figured at least 90% of approval conditions (weighted by how often they’re needed) should be tractable. In retrospect, that was true - 90% of it was pretty tractable. But we realized that, even with the easy 90% automated, we would still need humans most of the time. The large majority of our loans had at least some “hair” on them - something which was weird and needed special handling. Sometimes it was FHA/VA subsidies (each requiring a bunch of extra legwork). Sometimes it was income from a side gig or alimony. Sometimes it was a condition associated with the appraisal - e.g. a roof repair in progress. Sometimes it was an ex-spouse on the title. No single issue was very common, but most loans had something weird on them. And as soon as a human needed to be in the loop at all, most of the automation value was gone - we couldn’t offer substantive instant results.
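To make the "no single issue was very common, but most loans had something weird" arithmetic concrete, here is a rough sketch. The numbers are made up for illustration (they are not figures from the startup), the function name is hypothetical, and the calculation assumes the different kinds of "hair" show up independently of one another, which real loans probably violate.

```python
def fraction_needing_human(issue_rates):
    """Chance that a loan has at least one special-case issue.

    Assumes (for illustration only) that issues occur independently.
    """
    p_no_issues = 1.0
    for rate in issue_rates:
        p_no_issues *= 1.0 - rate  # the loan must dodge every issue
    return 1.0 - p_no_issues


# Hypothetical numbers: 20 distinct kinds of "hair", each on only 5% of loans.
rates = [0.05] * 20
print(f"{fraction_needing_human(rates):.0%} of loans need a human")  # 64%
```

Under those assumptions, each issue on its own looks rare, yet roughly two loans out of three still end up with a human in the loop.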
In general, when would we expect this problem to show up?
I see two main ways to circumvent the issue:
The problem should show up mainly when circumvention fails. So, we’d expect the majority of automation-value to be in the long tail when:
The first condition is both self-explanatory and common; the second condition is probably the limiting factor more often. When and why would having one human oversee multiple tasks not be helpful?
The self-driving car and mortgage examples offer some possible reasons. In both cases, reaction time is a key issue. For the car, we need a reaction fast enough to avert an accident; for the mortgage, we want to offer substantive instantaneous approval checks. In either case, drawing the attention of a human and asking them to solve the problem would take too long.
Another factor is necessary context. For some mortgages - especially those near the approval boundary - there’s a lot of interdependence between requirements. We need to document A or B or C (unless D), and B is only an option if E but not A, and so forth. The overall effect is that a human needs the whole context of the mortgage application in order to handle any one piece of it. When that happens, the value-add of automation is very low; even if some subset of the work is automated, the human still needs to figure out all the context, and that’s where most of the work is.
To summarize, the value of automation will mostly be in the long tail when:
I’m curious to hear people’s thoughts on other relevant factors.