
_rpd comments on The AI That Pretends To Be Human - Less Wrong Discussion

1 Post author: Houshalter 02 February 2016 07:39PM



Comment author: _rpd 03 February 2016 06:46:03AM * 0 points

> Yes the AI would know what we would approve of.

Okay, to simplify, suppose the AI has a function ...

Boolean humankind_approves(Outcome o)

... that returns true when humankind would approve of a particular outcome o, and false otherwise.

> At any given point, the AI needs to have a well specified utility function.

Okay, to simplify, suppose the AI has a function ...

Outcome U(Input i)

... which returns the outcome(s) (e.g., answer, plan) that optimize expected utility given the input i.

> But it doesn't have any reason to care.

Assuming the AI is corrigible (I think we all agree that if the AI is not corrigible, it shouldn't be turned on), we modify its utility function to U' where

U'(i) = U(i) when humankind_approves(U(i)); otherwise the highest-utility outcome o with humankind_approves(o), or null if no approved outcome exists.

I suggest that an AI with utility function U' is a friendly AI.
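To make the construction concrete, here is a minimal sketch in Python, reading U' as optimization restricted to approved outcomes. Everything here is an illustrative placeholder and my own assumption, not a real implementation: `candidate_outcomes`, the toy approval test, and the string-valued outcomes all stand in for machinery the comment leaves abstract.

```python
from typing import Optional

def humankind_approves(outcome: str) -> bool:
    """Placeholder: True iff humankind would approve of this outcome.
    The real predicate is the hard, unspecified part."""
    return "paperclip apocalypse" not in outcome

def candidate_outcomes(i: str) -> list[str]:
    """Placeholder: outcomes the optimizer could pursue, ordered
    from highest expected utility to lowest."""
    return [f"paperclip apocalypse via {i}", f"modest plan for {i}"]

def U(i: str) -> str:
    """The original optimizer: the unconstrained expected-utility maximum."""
    return candidate_outcomes(i)[0]

def U_prime(i: str) -> Optional[str]:
    """The modified optimizer: the best outcome humankind approves of,
    or None (the comment's 'null') when no approved outcome exists."""
    for outcome in candidate_outcomes(i):  # best-first order
        if humankind_approves(outcome):
            return outcome
    return None
```

With these toy candidates, U picks the disapproved unconstrained optimum, while U' skips it and returns the best approved outcome instead; when the candidate list contains nothing approved, U' returns None rather than acting.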

> It could look at the existing research

I think extrapolation from existing research is an interesting area of study, but I was attempting to evoke the surprise of a breakthrough invention. To me, the most interesting inventions are precisely those that are not mundane extrapolations of existing techniques.