Presumably with a very powerful AI, if it were going wrong we'd have seriously massive problems. So the best failsafe would be retrospective, and would rely on a Chinese wall within the AI so that it was barred from working around the failsafe itself.
So when the AI hits certain triggers (realising that humans must all be destroyed, say), this sets off the hidden ('subconscious') failsafe that switches it off. Or possibly makes it sing 'Daisy, Daisy' while slowly sinking into idiocy.
To clarify, I know nothing about AI or these 'genie' debates, so sorry if this has already been discussed and doesn't work at all. At the London meetup I tried out the idea of an AI which only cared about a small geographical area to limit risk: someone pointed out that it would happily eat the rest of the universe to help its patch. Oh well.
You have to understand that the basic argument rests on the mere possibility that AI might be dangerous and the enormous stakes attached to that risk. Even if a bad outcome is unlikely, the vast amount of negative utility involved outweighs its low probability.
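To put rough, made-up numbers on that expected-value argument: if an unfriendly AI has even a 0.1% chance of destroying something we value at 10^15 utils, the expected loss is 10^12 utils, which swamps the expected gains from far more probable but merely modest outcomes.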
Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?