MixedNuts comments on asking an AI to make itself friendly - Less Wrong Discussion
That's extrapolated volition.
And it requires telling the AI: "Implement good. Human brains contain evidence for good, but don't define it; don't modify human drives, because that won't change good." It requires telling it: "Prove you don't get goal drift when you self-modify." It requires giving it an explicit goal system for its infancy: telling it that it's allowed to use transistors despite the changes in temperature, gravity and electricity consumption that this causes, but not to turn the galaxy into computronium (and writing the general rules for that, not the superficial cases I gave), and telling it how to progressively overwrite these goals with its true ones.
"Oracle AI" is a reasonable idea. Writing object-level goals into the AI would be bloody stupid, so we are going to do some derivation anyway, and Oracle AI isn't much further along that path than CEV (coherent extrapolated volition). Bostrom defends it. But seriously, "don't influence reality beyond answering questions"?