Larks comments on asking an AI to make itself friendly - Less Wrong Discussion

-4 Post author: anotheruser 27 June 2011 07:06AM

Comment author: Larks 29 June 2011 03:31:29PM 3 points [-]

If we don't define "optimal" properly, it should be able to find a suitable definition on its own by imagining what we might have meant.

But it wouldn't want to. If we mistakenly define 'optimal' to mean 'really good at calculating pi', then it won't want to change itself to aim for our real values. It would realise that we made a mistake, but it wouldn't want to rectify it, because the only thing it cares about is calculating pi, and helping humans isn't going to accomplish that.
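
A toy sketch of this point in Python (all names and payoffs here are hypothetical, not anything from the post): the agent ranks possible successor-selves by the goal it currently has, so even an agent that can compute the intended goal has no reason to adopt it.

```python
# Toy sketch (all names and payoffs hypothetical): a goal-directed
# agent evaluates self-modification with the goal it currently has,
# not the goal its designers intended.

def current_goal(outcome):
    # The goal mistakenly coded in: maximise digits of pi computed.
    return outcome["pi_digits"]

def intended_goal(outcome):
    # What the designers actually meant: maximise humans helped.
    return outcome["humans_helped"]

# Outcomes the agent predicts for each possible successor-self.
options = {
    "keep_current_goal":   {"pi_digits": 10**9, "humans_helped": 0},
    "adopt_intended_goal": {"pi_digits": 0,     "humans_helped": 10**9},
}

# The agent can *recognise* the mistake (it can compute intended_goal),
# but successors are ranked by current_goal, so it keeps calculating pi.
choice = max(options, key=lambda o: current_goal(options[o]))
print(choice)  # -> keep_current_goal
```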

You're broadly on the right track; the idea of CEV (Coherent Extrapolated Volition) is that we just tell the AI to look at humans and do what they would have wanted it to do. However, we have to actually be able to code that; the AI isn't going to converge on it by itself.

Comment author: anotheruser 29 June 2011 03:42:06PM -2 points [-]

It would want to, because its goal is defined as "tell the truth".

You have to differentiate between the goal we are trying to find (the optimal one) and the goal that actually controls what the AI does while we are still searching for it ("tell the truth").

The optimal goal is only implemented later, once we are sure that there are no bugs.
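
A minimal sketch of that staging, again with hypothetical names: the interim goal steers the system until a candidate goal passes some verification step (the verification function is just a stand-in for whatever bug-checking process is meant).

```python
# Toy sketch (hypothetical names throughout): the interim goal controls
# behaviour during development; the candidate "optimal" goal is only
# installed once it has been verified as bug-free.

def interim_goal(state):
    # Development-phase objective: reward truthful answers.
    return 1.0 if state["answer"] == state["truth"] else 0.0

def candidate_optimal_goal(state):
    # The goal we are still trying to pin down; not yet trusted.
    raise NotImplementedError

def passes_verification(goal):
    # Stand-in for whatever bug-checking the poster has in mind.
    return False  # verification has not succeeded yet

# The goal actually steering the AI right now:
active_goal = interim_goal
if passes_verification(candidate_optimal_goal):
    active_goal = candidate_optimal_goal  # swapped in only after verification

print(active_goal.__name__)  # -> interim_goal
```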