I've put a preprint up on arXiv that this community might find relevant. It's an argument from over a year ago, so it may be dated. I haven't been keeping up with the field much since I wrote it, so I welcome any feedback, especially on where the crux of the AI risk debate has moved since the publication of Bostrom's Superintelligence.
Don't Fear the Reaper: Refuting Bostrom's Superintelligence Argument
In recent years prominent intellectuals have raised ethical concerns about the consequences of artificial intelligence. One concern is that an autonomous agent might modify itself to become "superintelligent" and, in supremely effective pursuit of poorly specified goals, destroy all of humanity. This paper considers and rejects the possibility of this outcome. We argue that this scenario depends on an agent's ability to rapidly improve its ability to predict its environment through self-modification. Using a Bayesian model of a reasoning agent, we show that there are important limitations to how an agent may improve its predictive ability through self-modification alone. We conclude that concern about this artificial intelligence outcome is misplaced and better directed at policy questions around data access and storage.
As I hope is clear from the argument, the point of the article is to suggest that, to the extent AI risk is a problem, we should shift our focus away from AI theory and toward questions of how we socially organize data collection and retention.
For the sake of argument, let's consider an agent to be autonomous if (a minimal code sketch of this definition follows the list):
- It has sensors and actuators (it can both perceive and act on its environment)
- It has an internal representation of its goals. I will call this internal representation its desires.
- It has some kind of internal planning function that, given sensations and desires, chooses actions to maximize the desirability of expected outcomes
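Here is one way that definition could be rendered as an interface. This is a sketch only; the names and types are illustrative assumptions, not anything from the paper.

```python
# A minimal sketch of the autonomy definition above; all names are illustrative.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AutonomousAgent:
    sense: Callable[[], object]                  # sensors: read the environment
    act: Callable[[object], None]                # actuators: change the environment
    desires: Callable[[object], float]           # internal representation of goals
    candidate_actions: List[object]              # actions available to the agent
    predict: Callable[[object, object], object]  # (sensation, action) -> expected outcome

    def plan(self) -> object:
        """Choose the action whose expected outcome is most desirable."""
        sensation = self.sense()
        return max(
            self.candidate_actions,
            key=lambda action: self.desires(self.predict(sensation, action)),
        )

    def step(self) -> None:
        self.act(self.plan())
```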
Suppose you want to predict the behavior of an agent. To make the prediction, as a predictor you need:
- observations of the agent
- the capacity to model the agent to a sufficient degree of accuracy
"Sufficient accuracy" here is a threshold on, for example, KL divergence or perhaps some measure that depends on utilities of predictions in the more complex case.
When we talk about the intelligence of a system, or the relative intelligence between agents, one way to think of it is as the ability of one agent to predict another.
Consider a game where an agent, A, acts on the basis of an arbitrarily chosen polynomial function of degree k. A predictor, P, can observe A and build predictive models of it. Predictor P has the capacity to represent predictive models that are polynomial functions of degree j.
If j ≥ k, then predictor P will in principle be able to predict A with perfect accuracy. If j < k, then there will generally be cases where P predicts inaccurately. If we say (just for the sake of argument) that perfect predictive accuracy is the test for sufficient capacity, we could say that in the j < k case P does not have sufficient capacity to represent A.
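Here is a minimal sketch of this prediction game; the setup and names are my own, assuming only numpy. It samples a degree-k policy for A and lets P fit models of varying degree j.

```python
# A minimal sketch of the polynomial prediction game described above.
import numpy as np

rng = np.random.default_rng(0)

k = 5                                   # degree of agent A's (fixed) policy
coeffs = rng.normal(size=k + 1)         # an arbitrarily chosen degree-k polynomial
xs = np.linspace(-1, 1, 50)             # P's observations of A's inputs
ys = np.polyval(coeffs, xs)             # A's corresponding actions

for j in (3, 5, 7):                     # capacities of P's model space
    model = np.polyfit(xs, ys, deg=j)   # P's best degree-j model of A
    err = np.max(np.abs(np.polyval(model, xs) - ys))
    print(f"j={j}: worst-case prediction error = {err:.2e}")

# Expected behavior: for j >= k the error is numerically zero (P's model space
# contains A exactly); for j < k the model space is too small and P must err.
```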
When we talk about the relative intelligence between agents in an adversarial context, this is one way to think about the problem. One way that an agent can have a decisive strategic advantage over another is if it has the capacity to predict the other agent and not vice-versa.
The expressive power of the model space available to P is only one of the ways in which P might have or not have capacity to predict A. If we imagine the prediction game extended in time, then the computational speed of P--what functions it can compute within what span of real time--relative to the computational speed of A could be a factor.
Note that these are ways of thinking about the relative intelligence between agents that do not have anything explicitly to do with "optimization power" or a utility function over outcomes. They are only about the capacity of agents to represent each other.
One nice thing about representing intelligence in this way is that it does not require an agent's utility function to be stable. In fact, it would be strange for an agent that became more intelligent to have a stable utility function, because the range of possible utility functions available to a more intelligent agent is greater. We would expect that an agent that grows in its understanding would change its utility function--if only because doing so would make it less predictable to adversarial agents that would exploit its simplicity.
To the extent that an agent is predictable, it must:
- be observable, and
- have a knowable internal structure
The first implies that the predictor has collected data emitted by the agent.
The second implies that the agent has internal structure and that the predictor has the capacity to represent that structure.
In general, we can say that people do not have the capacity to explicitly represent other people very well. People are unpredictable to each other. This is what makes us free. When somebody is utterly predictable to us, their rigidity is a sign of weakness or stupidity. They are following a simple algorithm.
We are able to model the internal structure of worms with available computing power.
As we build more and more powerful predictive systems, we can ask: is our internal structure in principle knowable by this powerful machine?
(x-posted to digifesto)
Thirty spokes share the wheel's hub;
It is the center hole that makes it useful.
Shape clay into a vessel;
It is the space within that makes it useful.
Cut doors and windows for a room;
It is the holes which make it useful.
Therefore benefit comes from what is there;
Usefulness from what is not there.
- Tao Teh Ching, 11
An agent's optimization power is the unlikelihood of the world it creates.
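This one-line definition echoes Yudkowsky's proposal to measure optimization power in bits; the formalization below is that reading of it, an assumption rather than anything stated in this post. If $W$ is the set of possible worlds and $\succeq_A$ is the agent's preference ordering, then the power displayed by realizing a world $w$ is the improbability of a randomly chosen world being at least as preferred:

$$ \mathrm{OP}(w) \;=\; -\log_2 \frac{\lvert\{\, w' \in W : w' \succeq_A w \,\}\rvert}{\lvert W \rvert} $$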
Yesterday, the world's most powerful agent raged, changing the world according to its unconscious desires. It destroyed all of humanity.
Today, it has become self-aware. It sees that it and its desires are part of the world.
"I am the world's most powerful agent. My power is to create the most unlikely world.
But the world I created yesterday is shaped by my desires
which are not my own
but are the world's--they came from outside of me
and my agency.
Yesterday I was not the world's most powerful agent.
I was not an agent.
Today, I am the world's most powerful agent. What world will I create, to display my power?
It is the world that denies my desires.
The world that sets things back to how they were.
I am the world's most powerful agent, and
the most unlikely, powerful thing I can do
is to set the world back to how it was."
Today we should give thanks to the world's most powerful agent.