MixedNuts comments on asking an AI to make itself friendly - Less Wrong Discussion

Post author: anotheruser 27 June 2011 07:06AM -4 points

Comment author: MixedNuts 29 June 2011 07:31:52AM 2 points

In essence, it all boils down to asking the AI: "if you were in our position, if you had our human goals and drives, how would you define your (the AI's) goals?"

That's extrapolated volition.

And it requires telling the AI: "Implement good. Human brains contain evidence for good, but they don't define it; modifying human drives won't change what good is." It requires telling it: "Prove that you don't suffer goal drift when you self-modify." It requires giving it an explicit goal system for its infancy: telling it that it's allowed to use transistors, despite the differences in temperature, gravity, and electricity consumption that this causes, but not to turn the galaxy into computronium - and writing the general rules for that, not just the superficial cases I gave - and then telling it how to progressively overwrite these infant goals with its true ones.
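To make that last part concrete, here is a toy sketch (in Python, with invented names; the comment specifies no implementation, and the hard parts are deliberately left as stubs) of an infant goal system that is progressively overwritten by an extrapolated one:

```python
from dataclasses import dataclass

# Toy illustration only: every name below is invented. The hard parts
# (defining "good", proving the absence of goal drift) are exactly the
# parts this sketch punts on.

@dataclass
class Action:
    converts_matter_to_compute: bool  # e.g. "turn the galaxy into computronium"

def infant_goal_score(action: Action) -> float:
    """Hand-written rules for the AI's infancy - superficial cases
    standing in for the general rules one would actually need."""
    return -1.0 if action.converts_matter_to_compute else 0.0

def extrapolated_goal_score(action: Action) -> float:
    """Stand-in for the AI's true goals, derived by extrapolating
    human volition. Unsolved; left as a stub on purpose."""
    raise NotImplementedError("defining 'good' is the entire problem")

def blended_score(action: Action, trust: float) -> float:
    # trust in [0, 1]: how far the extrapolation has been verified.
    # The infant rules dominate early; the true goals progressively
    # overwrite them as trust grows.
    return (1.0 - trust) * infant_goal_score(action) + trust * extrapolated_goal_score(action)
```

The sketch only relocates the difficulty, which is the point: everything hard lives in extrapolated_goal_score and in justifying each increase of trust.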

"Oracle AI" is a reasonable idea. Writing object-level goals into the AI would be bloody stupid, so we are going to do some derivation, and Oracle isn't much further than CEV. Bostrom defends it. But seriously, "don't influence reality beyond answering questions"?