_rpd comments on The AI That Pretends To Be Human - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (69)
Okay, to simplify, suppose the AI has a function ...
Boolean humankind_approves(Outcome o)
... that returns true when humankind would approve of a particular outcome o, and false otherwise.
... and a function ...
Outcome U(Input i)
... which returns the outcome(s) (e.g., answer, plan) that optimizes expected utility given the input i.
Assuming the AI is corrigible (I think we all agree that if the AI is not corrigible, it shouldn't be turned on), we modify its utility function to U', where
U'(i) = U(i) when humankind_approves(U(i)), and null otherwise (i.e., when no outcome exists that humankind would approve of).
I suggest that an AI with utility function U' is a friendly AI.
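The filtering idea above can be sketched in a few lines. This is a toy illustration only: humankind_approves and U are hypothetical stand-ins for the comment's functions, and the example outcomes are invented for demonstration.

```python
from typing import Optional

def humankind_approves(outcome: str) -> bool:
    # Toy approval oracle: returns True when humankind would
    # approve of the given outcome (here, anything but one bad plan).
    return outcome != "convert everything to paperclips"

def U(i: str) -> str:
    # Toy optimizer: returns the outcome (answer, plan) that
    # maximizes expected utility for input i.
    if i == "maximize paperclips":
        return "convert everything to paperclips"
    return "produce a helpful answer"

def U_prime(i: str) -> Optional[str]:
    # Filtered utility function U': pass U(i) through only when
    # humankind approves of it; otherwise return None (null).
    outcome = U(i)
    return outcome if humankind_approves(outcome) else None
```

With these toy definitions, U_prime("maximize paperclips") returns None, while an approved input passes through unchanged.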
I think extrapolation from existing research is an interesting area of study, but I was attempting to evoke the surprise of a breakthrough invention. To me, the most interesting inventions are precisely those that are not mundane extrapolations of existing techniques.