It seems like it would be really easy to come up with a lot of moral questions and answers and then ask an AI to tell us what it predicts humans preferring as an outcome.
There's a possibility that AI is not good at modeling human preferences, but if that's the case, it'll be very apparent at lower levels because that will mean commands will have to be very specific to get results. Any model that can't answer basic questions about it's intended goals is not going to be given the (metaphorical) nuclear codes.
In fact, why wouldn't you just test every AI by asking it to explain how it's going to solve your problem before it actually solves it?
How do I die?
I think there may have been some miscommunication here, either I'm not understanding you or you're not understanding me, so I'll explain my second paragraph point in a different way in case it was my mistake.
My model is that at lower levels of AI, 'misalignment' will be measurable but not catastrophic. It would look like producing an advertising campaign that is funny but does not feature the product, or a tool that is very cheap but very useless. Any misunderstanding of human preferences will lead to failure, so either humans will improve their ability to understand and communicate with AI, or vice versa, but it will happen by necessity of getting AI to do anything.
That's why you make its goal to communicate what it would do, not to actually do the thing. Which I guess is based on the assumptions that oracular AI can be created, and we have enough time to build the superintelligence, and then use the superintelligence to align itself.
On a side note, I enjoyed the article. My answer to the problem would be more testing, more training, and always using hypotheticals. I don't see why you couldn't ask an AI to predict what it would do if you gave it a particular goal system and let it out of the box.
Also, I don't think it's an argument against any one theory of AI, it seems like a general problem "AI can misunderstand humans and their reactions." That problem can be mitigated, prevented, and backdoored, but it doesn't seem like the problem or solutions differ among AI systems, no? If EY's style of alignment wouldn't have that problem, I might need it explained to me how that would be the case.