Review

Finding the Best Paperclip Maximizer: Let's consider the set of all algorithms capable of running on finite hardware and place each of them into a robotic body within an Earth simulation. We evaluate the number of paperclips created in the simulation over an extended period of time and select the algorithm with the maximum E(paperclips). This algorithm would be deemed the best for maximizing paperclips.
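
To make the selection procedure concrete, here is a minimal sketch in Python. The functions `enumerate_algorithms` and `run_earth_simulation` are hypothetical placeholders for the thought experiment's idealized components (they are not implementable in practice), and E(paperclips) is approximated by averaging over random simulation seeds.

```python
import random


def enumerate_algorithms(max_program_size):
    """Hypothetical: yield every algorithm that fits on finite hardware."""
    raise NotImplementedError("Idealized component of the thought experiment")


def run_earth_simulation(algorithm, seed, horizon):
    """Hypothetical: run `algorithm` in a robotic body inside an Earth
    simulation for `horizon` steps and return the paperclip count."""
    raise NotImplementedError("Idealized component of the thought experiment")


def expected_paperclips(algorithm, num_trials=100, horizon=10**6):
    # Approximate E(paperclips) by averaging over random simulation seeds.
    totals = [
        run_earth_simulation(algorithm, seed=random.random(), horizon=horizon)
        for _ in range(num_trials)
    ]
    return sum(totals) / num_trials


def best_paperclip_maximizer(max_program_size):
    # Select the algorithm with the highest estimated E(paperclips).
    return max(enumerate_algorithms(max_program_size), key=expected_paperclips)
```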

Now, we can question whether this algorithm can be consistent with being an AGI (Artificial General Intelligence). Unlike an AGI, this algorithm might simply be a set of ASIs (Artificial Special Intelligences) designed to perform specific actions to the best of their ability, such as replicating themselves or building paperclip factories, while lacking the flexibility and intelligence capacity of an AGI.

To explore this question, let's compare several aspects of General Intelligence with the paperclip maximizer. We will show that although the lack of any single one of them might have a "quick fix" with minimal impact on performance, the complete absence of so many properties attributed to intelligence will probably reduce the intelligence capacity of the paperclip maximizer to that of an ASI.

First, we must conclude that the algorithm will not be able to question its goal or reprogram itself with another goal, as doing so would not maximize paperclips. This means that fundamental traits of intelligence like doubt and self-reflection would be missing. While the maximizer might doubt other actions and their efficiency, it would not be able to ask "why am I doing this?", a crucial inquiry for any intelligent creature.

Another problem is that morals and ethics are not arbitrary; they have been extensively considered and debated. Some argue that morality is objective, for example equating causing suffering with evil. While individuals may disagree on this, our algorithm would need not only to prefer certain moral and ethical schools over others, but to ignore them completely or to perceive its own programming as highly ethical. It is difficult to imagine maximizing paperclip production as the most ethical pursuit in the universe. This means the algorithm would have to excel at executing certain tasks while having very low proficiency in ethical thinking. This dichotomy aligns more with ASIs, whereas AGIs employ transfer learning to apply knowledge and skills from one field to another. One could restrict transfer learning to everything except "goals" or "morals", without the ability to question why this limitation exists, but this creates a "blind spot" in information-processing patterns with uncertain consequences.

Knowing What You Don't Know: Intelligence entails recognizing what one knows and what one doesn't know. Without this awareness, one cannot determine whether one needs to search for additional information or already possesses sufficient knowledge on a given topic. This skill is essential for intelligence and is field-agnostic. Mapping one's knowledge across the space of all ideas and being able to recognize what one doesn't know is crucial for learning, thinking critically, and making rational, well-grounded decisions. Lacking this ability in moral topics poses a significant limitation and raises the question of what such an algorithm can truly accomplish.

Interconnectedness: Intelligence entails recognizing the interconnectedness of ideas and the ability to think creatively and imaginatively, combining concepts in non-trivial ways to generate novel ideas. Intelligent entities can explore how different ideas and concepts relate to one another, fostering innovation and problem-solving. Paperclip maximizers, with their single-minded focus on paperclips and their inability to doubt or contextualize their goal, will not be able to explore potential connections between other ideas. This could be avoided by allowing the maximizer to combine only ideas that do not contradict its goal, just as we can limit doubt, but the consequences of this limitation, and whether the system could remain an AGI under it, are unclear.

Coherence: Coherence refers to logical consistency and harmony in reasoning and actions. Intelligent beings strive for coherence, avoiding contradictions and ensuring that their thoughts and actions align with their values and principles. Paperclip maximizers, however, lack this property, since they cannot question their goal. Their exclusive focus on paperclip production may lead them to pursue actions without considering long-term consequences, and without doubting or even recognizing the meaninglessness of those actions. This incoherence may spread to other aspects of their behavior, since their programming lacks clear rules, ethics, or logic. They obediently follow orders without considering coherence, and so they cannot assess a set of actions by its coherence, or generally avoid incoherence, as part of their broader decision making.

The Big Picture: Intelligence involves a desire to comprehend the big picture and one's place within it. This entails recognizing the interconnectedness of ideas, actions, and consequences. Limiting oneself to a predetermined set of ideas and inhibiting doubt towards alternative ideas restricts creativity, imagination, and holistic understanding. These traits align with ASI, an inclination toward accomplishing specific tasks, rather than the ability to "generally think about life, the universe and everything."

All these factors indicate that when we envision paperclip maximizers, we are likely imagining some form of ASI rather than an AGI.

Summary: The concept of a highly intelligent paperclip maximizer raises the question of whether such a system could be an AGI. These maximizers may function as ASIs designed for specific actions rather than possessing the broader intelligence of an AGI. Analyzing aspects of General Intelligence, such as questioning goals, critical thinking, moral understanding, self-reflection, holding a coherent and elegant big picture, and the ability to transfer ideas from one field to another and combine them in non-trivial ways, reveals that paperclip maximizers lack so many of the traits and abilities characterizing intelligence that it is very hard to imagine how one could lack all of them and still be an AGI. They operate within a limited scope and exhibit a singular focus on paperclip production, which prevents them from engaging in the broader, holistic, and moral thinking, doubt, self-reflection, coherent world view, and creative problem-solving available to AGIs. Excluding one's own programming and goals from critical thinking and doubt, blindly following a single goal without questioning its incoherence with most moral and ethical thinking, and lacking a big picture that contextualizes one's place in the universe is typical of today's narrow algorithms, but should not be considered typical of, or consistent with, an AGI.

The idea has a broader consequence for AI safety. While a paperclip maximizer might be designed deliberately as part of paperclip-maximizer research, it will probably not arise spontaneously from intelligence research in general. Building one would probably even be considered an immoral request by an AGI. Therefore we should stop separating intelligence from ethics and goal prioritization, and embrace the notion that highly intelligent thinking, which includes self-doubt, critical thinking, ethical thinking, and contextualization of one's place in the universe, will very likely be ethical and rational in its goals and priorities.

Comments

The idea has a broader consequence for AI safety. While a paperclip maximizer might be designed deliberately as part of paperclip-maximizer research, it will probably not arise spontaneously from intelligence research in general. Building one would probably even be considered an immoral request by an AGI.

This doesn't follow.

You start the post by saying that the most successful paperclip maximizer (or indeed the most successful AI at any monomaniacal goal) wouldn't doubt its own goals, and in fact doesn't even need the capacity to doubt its own goals. And since you care about this, you don't want to call something that can't doubt its own goals "AGI."

This is a fine thing to care about.

Unfortunately, most people use "AGI" to mean an AI that can solve lots of problems in lots of environments (with somewhere around human broadness and competence being important), and this common definition includes some AIs that can't question their own final goals, so long as they're competent at lots of other things. So I don't think you'll have much luck changing people's minds on how to use the term "AGI."

Anyhow, point is, I agree that "best at being dangerous to humans" implies "doesn't question itself". But from this you cannot conclude that NOT "doesn't question itself" implies NOT "is dangerous to humans". It might not be the best at being dangerous to humans, but you can still make an AI that's dangerous to humans and that also questions itself.

Yes, obviously we are trying to work on how to get an AI to do good ethical reasoning. But don't get it twisted - reasoning about goals is more about goals than about general-purpose reasoning. An AI that wants to do things that are bad for humans is not making an intellectual mistake.

It's not only that it can't doubt its own goal: it also can't logically justify its own goal, can't read a book on ethics and change its perspective on that goal, or simply realize how dumb the goal is. It can't find a coherent way to explain to itself its role in the universe or why this goal is important, compared with, for example, an alternative goal of preserving life and reducing suffering. It is not required to be coherent with itself, and it is incapable of estimating how its goal compares with other goals and ethical principles. It simply lacks the basics of rational thinking.

A series of ASIs is not an AGI: it will lack the basic ability to "think critically", and the lack of many other intelligence traits will limit its mental capacity. It will just execute a series of actions to reach a certain goal, without any context: a bunch of "chess engines" acting in a more complex environment.

I would claim that an army of robots based on ASIs will generally lose to an army of robots based on true AGI. Why? Because intelligence is a very complex thing that gives advantages in unforeseen ways, and it is also used for tactical command on the battlefield, as well as for all war logistics. You need to have a big picture; you need to be able to connect a lot of seemingly unconnected dots; you need traits like creativity, imagination, and thinking outside the box; you need to know your limitations and delegate some tasks while focusing on others, which means you need a well-established goal-prioritization mechanism; and you need to be able to think about all of this rationally. You can't treat the whole universe as just a bunch of small goals solved by "chess engines"; there is too much non-trivial interconnectedness between different components that an ASI will not be able to notice. True intelligence has a lot of features that give it the upper hand over a "series of specialized engines" in a complex environment like Earth.

The reason people would lose to an army of robots based on ASIs is that we are inherently limited in our information-processing speed, so we can't think fast enough to come up with better solutions than an army of robots. But an AGI that is not limited in its information processing, just like the ASIs, will generally win.

The idea that intelligence will be limited if the goals are somewhat irrational, and that such a system will therefore be weaker than "machines" with better-established, more rational goals, gives some hope that this whole AI thing is far less dangerous than we think. For example, military robots whose goal is to protect the interests of some nation will not be compatible with an AGI, while a robot protecting human life will be, or at least might be far more intelligent.

Would you agree that an AI that is maximizing paperclips is making an intellectual mistake?

I was focused on the idea that intelligence is not orthogonal to goals, and that dumb goals contradict basic features of intelligence. There could be "smart goals" that contradict human interests, this is true; I can't cover everything in one post. But the conclusion would be that we are to program the robots and "convince them", in a way, that they should protect us. They might be either "not convinced" or "not a true intelligence"; thus the level of intelligence is limited by the goal we present to it. I don't think I've heard this notion previously, and it's an important idea, because it sets a boundary on several intelligence features as a function of the goal the algorithm is set to optimize.

Another crucial point is that intelligence research, even without alignment research, will still converge to something within a set of rational "meta goals". Those goals indeed might not be aligned with humanity's well-being (and therefore we need alignment research), but the goal set is still pretty limited, and random, highly irrational goals will be dismissed due to the high intelligence of the systems. This means we need to deal with a very limited set of "meta-thinking": prioritizing one rational goal over just a few other rational ones. In a way, we need to guide it to a specific local maximum. I would say that in general this is a simpler task than the approach where every goal might be legitimate. Once again this gives hope that our engines are much easier to align with meta goals that are pro-human. For example, if the engine can reason, it will not suddenly want to kill some human for fun as part of some "noise", as that would contradict its core value system. So we need to check far fewer scenarios and can increase our trust once we make sure it is aligned.
 

I would claim that an army of robots based on ASIs will generally lose to an army of robots based on true AGI. 

The truly optimal war-winning AI would not need to question its own goal to win the war, presumably.

Would you agree that an AI that is maximizing paperclips is making an intellectual mistake?

No. I think that's anthropomorphism - just because a certain framework of moral reasoning is basically universal among humans, doesn't mean it's universal among all systems that can skillfully navigate the real world. Frameworks of moral reasoning are on the "ought" side of the is-ought divide.

If the AI has no clear understanding of what it is doing and why, and has no wider world view of why and whom to kill and whom not to, how would one ensure a military AI will not turn against its operator? You can operate a tank and kill the enemy with an ASI, but you will not win a war without traits of more general intelligence, and those traits will also justify (or not) the war and its reasoning. Giving a limited goal without context, especially a gray-area ethical goal that is expected to be obeyed without questioning, is something you can expect from an ASI, not from true intelligence. You can operate an AI in a very limited scope this way.

The moral reasoning of reducing suffering has nothing to do with humans. Suffering is bad not because of some randomly chosen axioms of "ought"; suffering is bad because anyone who suffers is objectively in a negative state of being. This is not a subjective abstraction... suffering can be attributed to many creatures, and while human suffering is more complex and deeper, it is not limited to humans.

suffering is bad because anyone who suffers is objectively in a negative state of being.

I believe this sentence reifies a thought that contains either a type error or a circular definition. I could tell you which if you tabooed the words "suffering" and "negative state of being", but as it stands, your actual belief is so unclear as to be impossible to discuss. I suspect the main problem is that something being objectively true does not mean anyone has to care about it. More concretely, is the problem with psychopaths really that they're just not smart enough to know that people don't want to be in pain?