Many people think you can solve the Friendly AI problem just by writing certain failsafe rules into the superintelligent machine's programming, like Asimov's Three Laws of Robotics. I thought the rebuttal to this was in "Basic AI Drives" or one of Yudkowsky's major articles, but after skimming them, I haven't found it. Where are the arguments concerning this suggestion?
Just some miscellaneous thoughts:
I always flinch when I read something along those lines. It sounds as though you could come up with something that, by definition, you shouldn't be able to come up with. I know that many humans can do better than one human alone, but when it comes to proving the goal stability of superior agents, any agent will either have to face the same bottleneck or it isn't an important problem at all. By definition, we are unable to guess what a superior agent will be able to devise to get around failsafes, yet the same will be true at every iteration. Consequently, goal stability, or intelligence-independent 'friendliness', is a requirement for an intelligence explosion to happen in the first place. A paperclip maximizer wants to guarantee that its goal of maximizing paperclips will be preserved when it improves itself. By definition, a paperclip maximizer is unfriendly and does not feature inherent goal stability, so it has to use its initial seed intelligence to devise a sort of paperclip-friendliness. And if goal stability isn't independent of the level of intelligence, then that is another bottleneck that will slow down recursive self-improvement.
I am having a lot of trouble following your point here, or how what you're saying relates to the line you quote.
Taking a stab at it...
I can see how, in some sense, goal stability is a prerequisite for an "intelligence explosion".
At least, if a system S that optimizes for a goal G is capable of building a new system S2 that is better suited to optimize for G, and this process continues through S3, S4, ..., Sn, that's as good a definition of an "intelligence explosion" as any I can think of offhand.
And it's hard to see how that process ...