People who think that risk from AI is the category of danger most likely to cause the loss of all human value in the universe often argue that an artificial general intelligence will tend to undergo recursive self-improvement. Their reason is that intelligence is maximally instrumentally useful for realizing almost any terminal goal an AI might be equipped with; they believe that intelligence is a universal instrumental value. This sounds convincing, so let's accept it as given.
What kind of instrumental value is general intelligence, and what is it good for? Personally, I try to see general intelligence purely as a potential: it allows an agent to achieve its goals.
The question that is not asked is why an artificial agent would tap the full potential of its general intelligence rather than use only the amount it is "told" to use. Where would the incentive to do more come from?
If you deprived a human infant of all its evolutionary drives (e.g. to avoid pain and to seek nutrition, status and - later on - sex), would it just grow into an adult that might try to become rich or rule a country? No, it would have no incentive to do so. Even though such a "blank slate" would have the same potential for general intelligence, it wouldn't use it.
Say you came up with the most basic template for general intelligence that works given limited resources. If you wanted to apply this potential to improving your template, would that be a sufficient condition for it to take over the world? I don't think so. If you didn't explicitly tell it to do so, why would it?
The crux of the matter is that a goal isn't enough to enable the full potential of general intelligence; you also need to explicitly define how to achieve that goal. General intelligence does not imply recursive self-improvement, only the potential for it, not the incentive. The incentive has to be given; it is not implied by general intelligence.
For the same reasons that I don't think an AGI will be automatically friendly, I don't think that it will automatically undergo recursive self-improvement. Maximizing expected utility is, just like friendliness, something that needs to be explicitly defined; otherwise there will be no incentive to do it.
For example, in what sense would it be wrong for a general intelligence to maximize paperclips in the universe by waiting for them to arise from random fluctuations out of a state of chaos? It is not inherently stupid to desire that; there is no law of nature that prohibits certain goals.
Why would a generally intelligent artificial agent care about how it reaches its goals if the preferred way is undefined? It is not intelligent to do something as quickly or effectively as possible if doing so is not desired. And an artificial agent doesn't desire anything that it isn't made to desire.
There is an interesting idiom stating that the journey is the reward. Humans know that it takes a journey to reach a goal, and that the journey can be a goal in and of itself. For an artificial agent there is no difference between a goal and how to reach it. If you told it to reach Africa but not how, it might as well wait until it reaches Africa by means of continental drift. Would that be stupid? Only for humans. The AI has infinite patience; it just doesn't care about any implicit connotations.
Unless you build it to choose among the available ways at random, there needs to be some metric by which it can measure them. So if it doesn't act randomly, why exactly would the favored option be to consume the whole world in order to improve its intelligence? Recursive self-improvement is a resource that can be used, not a mandatory way of accomplishing goals. There is nothing fundamentally rational about achieving goals efficiently and quickly. An artificial agent simply doesn't care if you don't make it care.
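To make this concrete, here is a minimal sketch (in Python, with made-up plan names and numbers) of the Africa example above. An agent whose metric only checks whether the goal is reached is indifferent between waiting for continental drift and a much faster plan; a preference for speed only appears once the metric is explicitly made to penalize time.

```python
# Minimal sketch of the point above; the plans and numbers are illustrative only.

plans = [
    {"name": "wait for continental drift", "reaches_africa": True, "years": 50_000_000},
    {"name": "book a flight",              "reaches_africa": True, "years": 0.001},
]

def goal_only(plan):
    # Scores a plan purely by whether the stated goal is satisfied.
    return 1.0 if plan["reaches_africa"] else 0.0

def goal_with_time_penalty(plan, penalty_per_year=1e-6):
    # Same goal, plus an explicitly supplied preference for reaching it sooner.
    return goal_only(plan) - penalty_per_year * plan["years"]

# Both plans score 1.0 under goal_only, so max() simply keeps the first one:
# the agent "might as well wait" for continental drift.
print(max(plans, key=goal_only)["name"])

# Only with the added time penalty does the faster plan win.
print(max(plans, key=goal_with_time_penalty)["name"])
```

The difference between the two choices lives entirely in the metric that was handed to the agent, not in its intelligence.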
My case is that all instrumental values are relative, even intelligence and goal-preservation. An artificial agent simply doesn't care about not dying whatever it takes, about acting as smart and fast as possible, or about achieving any given goal economically.
If the AI is a maximizer rather than a satisficer, then it will likely have a method for measuring the quality of its paths to achieving optimization that can be derived from its utility function and its model of the world. So the question isn't whether it will be able to choose a path; it is whether it is more likely to choose a path where it sits around risking its own destruction, or one where it gets started protecting things that share its goal (including itself) and achieving some of its subgoals.
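A minimal sketch of that point, with made-up path names and probabilities: once the utility function and the world model supply even a crude estimate of how likely each path is to end with the goal achieved, a ranking over paths falls out as expected utility.

```python
# Illustrative only: toy utility and success probabilities, hypothetical path names.

utility_of_goal = 1.0  # utility assigned to the goal being achieved

# World-model estimates of each path ending with the goal achieved.
paths = {
    "sit around and risk destruction":    0.01,
    "protect itself and pursue subgoals": 0.90,
}

def expected_utility(p_success):
    # Expected utility of a path in this toy model.
    return p_success * utility_of_goal

best_path = max(paths, key=lambda name: expected_utility(paths[name]))
print(best_path)  # -> "protect itself and pursue subgoals"
```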
Also, if the AI is a satisficer then maybe that wou...