Paul, I think you're headed in a good direction here.
On the subject of approval-directed behavior:
One broad reason people and governments disapprove of behaviors is that they break the law or violate the ethical norms that supplement it. Many AGI disaster scenarios seem to involve some law-breaking fairly early on.
Putting aside an advanced AI that can start working on changing the law, shouldn't one thing (though not the only thing) an approval-directed AI does be to constantly check whether its actions are legal before taking them?
The law by itself is not a complete set of norms of acceptable behavior, and violating the law may be acceptable in exceptional circumstances.
However, why can't we start there?
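As a rough sketch of what starting there might look like, the legality check could act as a filter in front of the approval-directed choice. Everything here is hypothetical and for illustration only: the `Action` type, the `approval` score, and the `legal` flag stand in for an approval estimate and a legality check that nobody knows how to build yet.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Action:
    name: str
    approval: float  # hypothetical estimate of how much the overseer would approve
    legal: bool      # hypothetical result of a legality check on this action


def choose_action(candidates: Iterable[Action]) -> Optional[Action]:
    """Pick the most-approved action among those that pass the legality check."""
    permitted = [a for a in candidates if a.legal]
    if not permitted:
        # No legal option: return nothing so the caller can defer to the overseer.
        return None
    return max(permitted, key=lambda a: a.approval)
```

Note that this treats legality as a hard constraint, which does not capture the exceptional cases where violating the law may be acceptable; it is only meant as a starting point.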
(Crossposted from ordinary ideas).
I’ve recently been thinking about AI safety, and some of the writeups might be interesting to some LWers:
I’m excited about a few possible next steps: