I like this post because it pushes us to be more precise about what we mean by corrigibility. Nice example.
This seems like a generic problem with user interfaces: making them too smart while they are still able to "make errors that they don't know that they're making" is a recipe for a bad user experience.
If you're going to have a layer of disintermediation between what happens under the hood and what the user requests, it should either be super tight (so that the request ALWAYS causes what is desired) or else it should have the capacity to notice fuzzy or unrealizable expressions of intent and initiate repair of the communicative intent.
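To make the second option concrete, here is a toy sketch (the command names and matching logic are made up for illustration) of a layer that acts only when the request-to-action mapping is unambiguous, and otherwise tries to repair the communication instead of guessing:

```python
# Toy sketch (all names hypothetical): an interface layer that executes a request
# only when it can interpret it unambiguously, and otherwise initiates repair by
# asking the user to clarify, rather than silently doing the wrong thing.

COMMANDS = {
    "lights on": lambda: print("Lights turned on"),
    "lights off": lambda: print("Lights turned off"),
    "thermostat up": lambda: print("Thermostat raised by 1 degree"),
}

def handle_request(utterance: str) -> None:
    matches = [cmd for cmd in COMMANDS if utterance in cmd or cmd in utterance]
    if len(matches) == 1:
        COMMANDS[matches[0]]()          # tight mapping: do exactly what was asked
    elif len(matches) > 1:
        # fuzzy intent: repair the communication instead of picking one at random
        print(f"Did you mean one of: {', '.join(matches)}?")
    else:
        # unrealizable intent: say so explicitly rather than failing silently
        print("I can't do that; here is what I can do:", ", ".join(COMMANDS))

handle_request("lights")         # ambiguous -> asks for clarification
handle_request("lights on")      # unambiguous -> executes
handle_request("open the door")  # unsupported -> reports its actual capabilities
```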
Maybe in the mid 2020s things will get better, but in 2018:
...observing users struggle with the AI interfaces felt like a return to the dark ages of the 1970s: the need to memorize cryptic commands, oppressive modes, confusing content, inflexible interactions — basically an unpleasant user experience.
A safety measure (MCAS) made the plane actually less safe. I see it as an example of a possible type of alignment failure.
The Boeing Maneuvering Characteristics Augmentation System (MCAS) can be thought of, stretching a bit, as a specialized AI: it performs a function normally reserved for a human pilot, pitching the nose down when it deems the angle of attack dangerously high. This is not, by itself, a problem: there are pilots in the cockpit who can take control when needed.
Only in this case they couldn't. Simply pitching the nose up manually when MCAS pitched it down too far would not disengage the system: it would activate again, and again. One had to shut it off separately, by cutting out the electric stabilizer trim, and this procedure was not covered in the pilot training. For comparison, think of the cruise control system in a car: the moment you press the brake pedal, it disengages; if you push the gas pedal and then release it, the car returns to the preset speed. At no time does it try to override your actions. Unlike MCAS.
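The crux is the disengagement behavior. A toy sketch of the two control policies (illustrative only, nothing to do with how either system is actually implemented):

```python
# Toy sketch, not real avionics or automotive code: the difference between a
# system that yields to the human (cruise control) and one that keeps
# re-engaging (MCAS-like) unless it is shut off by a separate, non-obvious action.

class CruiseControl:
    def __init__(self, set_speed: int):
        self.set_speed = set_speed
        self.engaged = True

    def step(self, brake_pressed: bool) -> str:
        if brake_pressed:
            self.engaged = False          # pressing the brake disengages it for good
        return f"holding {self.set_speed}" if self.engaged else "driver has control"

class McasLike:
    def __init__(self):
        self.cut_out = False              # separate cutout switch, not pilot stick input

    def step(self, pilot_pitches_up: bool, high_aoa_reading: bool) -> str:
        if self.cut_out:
            return "inactive"
        if high_aoa_reading:
            # pilot input does NOT disengage it; it commands nose down again anyway
            return "commanding nose down" + (" (fighting the pilot)" if pilot_pitches_up else "")
        return "standing by"

cc = CruiseControl(set_speed=100)
print(cc.step(brake_pressed=True))                                    # driver has control
mcas = McasLike()
for _ in range(3):
    print(mcas.step(pilot_pitches_up=True, high_aoa_reading=True))    # keeps fighting
mcas.cut_out = True                                                   # only this stops it
print(mcas.step(pilot_pitches_up=True, high_aoa_reading=True))        # inactive
```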
MCAS disregards critical human input and even fights the human for control in order to reach its goal of "nominal flight parameters". From the Corrigibility paper:
In this case the "agent" actively fought its human handlers instead of assisting them. Granted, the definition above is about programmers, not pilots, and the existing MCAS probably would not fight a software update, being a dumb specialized agent.
But we are not that far off: a lot of systems include built-in security checks for remote updates. If one of those checks were to examine the algorithm in the updated code and reject the update because it fails the system's internal acceptability criteria, the corrigibility failure would be complete! In a life-critical, always-on system this would produce a mini-Skynet. I don't know whether something like that has happened yet, but I would not be surprised if it has, and has resulted in catastrophic consequences.
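A hypothetical sketch of what such a self-protecting update check could look like (the names and structure are entirely made up); the system's own validation logic vetoes the very update meant to correct it:

```python
# Hypothetical sketch of the failure mode described above: an update validator
# that, besides checking signatures, inspects the incoming code and rejects any
# update that would remove its own control logic. Such a system can no longer be
# corrected through its normal update channel.

CURRENT_SAFETY_ROUTINE = "push_nose_down_on_high_aoa"

def accept_update(update: dict) -> bool:
    if not update.get("signature_valid"):
        return False                      # ordinary, reasonable security check
    # the problematic check: refuse any change to the system's own control logic
    if CURRENT_SAFETY_ROUTINE not in update.get("routines", []):
        return False                      # corrigibility failure: the fix is rejected
    return True

fix = {"signature_valid": True, "routines": ["gentler_aoa_handler"]}
print(accept_update(fix))   # False: the very behavior we want to patch blocks the patch
```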