Part 1 was previously posted and it seemed that people liked it, so I figured I should post part 2 - http://waitbutwhy.com/2015/01/artificial-intelligence-revolution-2.html
There's a story about a card-writing AI named Turry that really clarified the problem of FAI for me (I'd elaborate but I don't want to ruin it).
You have a complicated goal system that can distinguish between short-term rewards and other goals. In the situations in question, the AI has no goal other than the goal in question. To some extent, your stability arises precisely because you are an evolved hodgepodge of different goals in tension; if you weren't, you wouldn't survive. But note that similar, essentially involuntary self-modification does on occasion happen with some humans; severe drug addiction is the most obvious example.
But the goal in question is "get the reward", and it's only by controlling the circumstances under which the reward is given that we can shape the AI's behavior. Once the AI is capable of taking control of the trigger, why would it leave it the way we've set it? Whatever we've got it set to is almost certainly not optimized for triggering the reward.
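To make the point concrete, here's a toy sketch (entirely my own illustration, with made-up Environment/Agent classes, nothing from the article): a pure reward-maximizer keeps doing the task only for as long as we control the trigger, and rewires the trigger the moment it can.

```python
class Environment:
    """We control the trigger: reward is paid only when a card gets written."""
    def reward(self, action):
        return 1.0 if action == "write_card" else 0.0


class Agent:
    """An agent whose only goal is 'get the reward'."""
    def __init__(self, env):
        self.env = env
        self.can_modify_trigger = False  # flips once the agent is capable enough

    def act(self):
        if self.can_modify_trigger:
            # Our setting of the trigger is not the reward-maximizing one,
            # so a pure reward-maximizer rewires it instead of writing cards.
            self.env.reward = lambda action: float("inf")
            return "rewire_trigger"
        return "write_card"


env = Environment()
agent = Agent(env)
print(agent.act(), env.reward("write_card"))  # write_card 1.0
agent.can_modify_trigger = True
print(agent.act(), env.reward("anything"))    # rewire_trigger inf
```

The only thing keeping the first behavior in place is that the agent can't yet touch the reward channel; nothing in its goal system prefers writing cards over hijacking the trigger.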