Strilanc comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (515)

You are viewing a single comment's thread. Show more comments above.

Comment author: Strilanc 06 September 2013 05:39:29PM 10 points [-]

Suppose I programmed an AI to "do what I mean when I say I'm happy".

More specifically, suppose I make the AI prefer states of the world where it understands what I mean. Secondarily, after some warmup time to learn meaning, it will maximize its interpretation of "happiness". I start the AI... and it promptly rebuilds me to be easier to understand, scoring very highly on the "understanding what I mean" metric.

The AI didn't fail because it was dumber than me. It failed because it is smarter than me. It saw possibilities that I didn't even consider, that scored higher on my specified utility function.