Besides judging a priori whether an action would be approved, an approval-directed AI could also consult a large database of past actions that were either approved or disapproved.
Alternatively, before making any real-world decision, the approval-directed AI could generate example scenarios and propose actions in them, many thousands of times, to people judged to be effective moral reasoners. Their responses would greatly help the system build a model of whether an action is approvable, and by whom.
Approval data of this kind could be produced fairly easily and in large quantities, and the AI could train on it.
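As a toy illustration of "a model of whether an action is approvable, and by whom," here is a minimal sketch that assumes judgments are stored as (action, rater, approved) records with actions and raters as plain strings. The class `ApprovalDatabase` and its methods are hypothetical, and the estimator is simple smoothed counting (Beta pseudo-counts), not a learned model and not a method proposed in the text.

```python
from collections import defaultdict


class ApprovalDatabase:
    """Hypothetical store of past (action, rater, approved) judgments.

    Approval is estimated by smoothed counting; a real system would
    generalize to unseen actions with a learned model instead.
    """

    def __init__(self, prior_approvals=1.0, prior_disapprovals=1.0):
        # Pseudo-counts so that unseen (action, rater) pairs get a neutral estimate.
        self.prior_a = prior_approvals
        self.prior_d = prior_disapprovals
        # (action, rater) -> [approved_count, disapproved_count]
        self.counts = defaultdict(lambda: [0, 0])

    def record(self, action, rater, approved):
        """Store one rater's verdict on one proposed action."""
        self.counts[(action, rater)][0 if approved else 1] += 1

    def approval_probability(self, action, rater=None):
        """Smoothed estimate that `action` is approved (by `rater`, or by anyone)."""
        if rater is None:
            # Pool verdicts from every rater who has judged this action.
            approved = sum(c[0] for (a, _), c in self.counts.items() if a == action)
            disapproved = sum(c[1] for (a, _), c in self.counts.items() if a == action)
        else:
            # .get avoids inserting an empty entry for unseen pairs.
            approved, disapproved = self.counts.get((action, rater), (0, 0))
        total = approved + disapproved + self.prior_a + self.prior_d
        return (approved + self.prior_a) / total


db = ApprovalDatabase()
db.record("send_email", "alice", True)
db.record("send_email", "alice", True)
db.record("send_email", "bob", False)
print(db.approval_probability("send_email", "alice"))  # 0.75
print(db.approval_probability("send_email"))           # 0.6
print(db.approval_probability("delete_files"))         # 0.5
```

Keeping per-rater counts is what lets the sketch answer "by whom": the same action can look approvable to one rater and not to another, while the pooled estimate answers the broader question.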
(Crossposted from ordinary ideas).
I’ve recently been thinking about AI safety, and some of the resulting writeups might interest LWers:
I’m excited about a few possible next steps: