Most concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is “why would we give an AI goals anyway?” I think there are good reasons to expect goal-oriented behavior, and I’ve been on that side of a lot of arguments. But I don’t think the issue is settled, and it might be possible to get better outcomes without them. I flesh out one possible alternative here, based on the dictum "take the action I would like best" rather than "achieve the outcome I would like best."
(As an experiment I wrote the post on Medium, so that it is easier to provide sentence-level feedback, especially feedback on writing or low-level comments.)
I think you've moved all the complexity into distinguishing between "outcome" and "action".
IOW, taboo those terms and try to write the same proposal, because right now ISTM that you're relying on an intuitive appeal to human concepts of the difference, rather than being precise.
Even at this level, you're leaving out that Hugh doesn't really approve of actions per se -- Hugh endorses actions in situations as contributing to some specific, salient goal or value. If Arthur says, "I want to move my foot over here", then no matter how many hours Hugh thinks it over, it's not going to mean anything in particular...
Even if it's the first step in a larger action of "walk over there and release the nanovirus". ;-)
For example, if we output a sequence of bits which are fed into an actuator, then I can treat each bit as an action. We could also apply the concept to actions at a higher or lower level of granularity; the idea is to apply it at all levels (and to make it explicit at the lowest level at which it is practical to do so, in the same way we might make goal-directed behavior explicit at the lowest level where doing so is practical).
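To make the bit-level reading concrete, here is a minimal sketch of action-by-action selection, where each bit is chosen to maximize predicted approval of that single action rather than to optimize any downstream outcome. The `predicted_approval` function is a hypothetical stand-in: a real agent would learn it from the overseer's feedback, while here it is just a toy scoring function over a fixed target sequence.

```python
def predicted_approval(history, bit):
    # Toy stand-in for a learned model of the overseer's approval:
    # the overseer "approves" of bits matching a fixed target sequence.
    target = [1, 0, 1, 1]
    i = len(history)
    return 1.0 if i < len(target) and bit == target[i] else 0.0

def approval_directed_bits(n_bits):
    """Emit n_bits one at a time, choosing each bit to maximize the
    predicted approval of that individual action."""
    history = []
    for _ in range(n_bits):
        best = max((0, 1), key=lambda b: predicted_approval(history, b))
        history.append(best)
    return history

print(approval_directed_bits(4))  # reproduces the toy target sequence
```

Nothing in the loop represents a goal or a future world-state; the same structure could be applied at coarser granularity by replacing single bits with larger composite actions.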