Approval-directed agents

paulfchristiano

Most concern about AI comes down to the scariness of goal-oriented behavior. A common response to such concerns is “why would we give an AI goals anyway?” I think there are good reasons to expect goal-oriented behavior, and I’ve been on that side of a lot of arguments. But I don’t think the issue is settled, and it might be possible to get better outcomes without goals. I flesh out one possible alternative here, based on the dictum "take the action I would like best" rather than "achieve the outcome I would like best."

(As an experiment I wrote the post on Medium, so that it is easier to provide sentence-level feedback, especially feedback on writing or low-level comments.)

I feel like if you give the AI enough freedom for its intelligence to be helpful, you'd have the same pitfalls as having the AI pick a goal you'd approve of. I also feel like it's not clear exactly which decisions you'd oversee. What if the AI convinces you that its actions are fine, because you'd approve of its method of choosing them, and that its method is fine, because you'd approve of the individual actions?
