"Computers can add numbers much more accurately than humans. They can draw better pictures than humans. They can play better chess. See the pattern? Well, AIs will soon be able to generate desired outcomes for society better than humans can.
I feel that the AI Alignment discourse has become somewhat detached from both reality and from sane theory.
This post is an attempt to correct that state of affairs."
There are at least two meanings of alignment: 1. do what I want, and 2. don't have catastrophic failures.
I think that "alignment is easy" holds only for the first requirement. But there could be many catastrophic failure modes; e.g., even humans can rebel or drift away from their initial goals.
Yes, I think this objection captures something important.
I have proven that an aligned AI must exist and that it must be practically implementable.
But some kind of failure, i.e. a "near miss" on achieving a desired goal, can happen even if success was possible.
I will address these near misses in future posts.