Eliezer_Yudkowsky comments on The Need for Human Friendliness - Less Wrong

Post author: Elithrion 07 March 2013 04:31AM


Comment author: Eliezer_Yudkowsky 07 March 2013 11:30:05PM 6 points

The other problem with checking the code is that an FAI's Friendliness content is also going to consist significantly or mostly of things the FAI has learned, in its own cognitive representation. Keeping these cognitive representations transparent is going to be an important issue, but basically you'd have to trust that the tool (and possibly the AI skill) that somebody claims translates the cognitive content really does so, and that the AI is answering questions honestly.

The main reason this isn't completely hopeless for external assurance (by a trusted party, i.e., they have to be trusted not to destroy the world or start a competing project using gleaned insights) is that the FAI team can be expected to spend effort on maintaining their own assurance of Friendliness, and their own ability to be assured that goal-system content is transparent. Still, we're not talking about anything nearly as easy as checking the code to see if the EVIL variable is set to 1.
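The contrast in that last sentence can be made concrete with a toy sketch (all names here are invented for illustration, not anything from an actual FAI design): an explicit flag in source code is trivially auditable, but goal content encoded in learned parameters is not readable from the source at all.

```python
# Case 1: an explicit flag -- trivially checkable by reading the source.
EVIL = 0  # an auditor can verify this by inspection

# Case 2: goal content learned from experience, stored as opaque parameters.
# The "values" live in numbers, not in a readable variable name.
# (Weights below are arbitrary placeholders.)
learned_goal_weights = [0.3124, -1.8872, 0.0051, 2.4410]

def evaluate_outcome(features, weights):
    """Score an outcome under the learned goal representation."""
    return sum(f * w for f, w in zip(features, weights))

# An auditor reading this source sees only a dot product. Whether the
# weights encode a benign or harmful goal is not apparent from the code;
# verifying it requires a trusted tool that translates the learned
# representation -- and trusting that the tool itself reports honestly.
score = evaluate_outcome([1.0, 0.5, -0.2, 0.1], learned_goal_weights)
```

The point of the sketch is only that source inspection answers the first kind of question and not the second; the second requires exactly the translation-and-trust machinery the comment describes.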