Sources: an old draft on Oracle AI by Daniel Dewey, and a conversation with Dewey and Nick Beckstead. See also Thinking Inside the Box and Leakproofing the Singularity.
Can we just create an Oracle AI that informs us but doesn't do anything?
"Oracle AI" has been proposed in many forms, but most proposals share a common thread: a powerful AI is not dangerous if it doesn't "want to do anything", the argument goes, and therefore, it should be possible to create a safe "Oracle AI" that just gives us information. Here, we discuss the difficulties of a few common types of proposed Oracle AI.
Two broad categories can be treated separately: True Oracle AIs, which are true goal-seeking AIs with oracular goals, and Oracular non-AIs, which are designed to be "very smart calculators" instead of goal-oriented agents.
True Oracle AIs
A True Oracle AI is an AI with some kind of oracular goal. Informally proposed oracular goals often include ideas such as "answer all questions", "only act to provide answers to questions", "have no other effect on the outside world", and "interpret questions as we would wish them to be interpreted". Oracular goals are meant to "motivate" the AI to provide us with the information we want or need, and to keep the AI from doing anything else.
First, we point out that a True Oracle AI is not causally isolated from the rest of the world. Like any AI, it has at least its observations (questions and data) and its actions (answers and other information) with which to affect the world. A True Oracle AI interacts through a somewhat low-bandwidth channel, but it is not qualitatively different from any other AI. It still acts autonomously in service of its goal as it answers questions, and it is realistic to assume that a superintelligent True Oracle AI will still be able to have large effects on the world.
Given that a True Oracle AI acts, by answering questions, to achieve its goal, it follows that a True Oracle AI is only safe if its goal is fully compatible with human values. A limited interaction channel is not a good defense against a superintelligence.
There are many ways that omission of detail about human value could cause a "question-answering" goal to assign utility to a very undesirable state of the world, resulting in an undesirable future. A designer of an oracular goal must be certain to include a virtually endless list of qualifiers and patches. An incomplete list includes "don't forcefully acquire resources to compute answers, don't defend yourself against shutdown, don't coerce or threaten humans, don't manipulate humans to want to help you compute answers, don't trick the questioner into asking easy questions, don't hypnotize the questioner into reporting satisfaction, don't dramatically simplify the world to make prediction easier, don't ask yourself questions, don't create a questioner-surrogate that asks easy questions," etc.
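To make the patching problem concrete, here is a minimal sketch of what a patched oracular goal might look like as a utility function. It is our illustration, not something from the sources: every predicate and constant in it is hypothetical, and none of the predicates is actually formalized.

```python
# Hypothetical sketch of a "patched" oracular goal. Each penalty term
# corresponds to one qualifier from the list above; the predicates are only
# names for conditions we do not yet know how to define formally.
def oracular_utility(world: dict) -> float:
    score = world.get("answer_quality", 0.0)
    known_failure_modes = [
        "forcibly_acquired_resources",
        "resisted_shutdown",
        "coerced_or_threatened_humans",
        "manipulated_questioner_into_easy_questions",
        "simplified_world_to_ease_prediction",
    ]
    for mode in known_failure_modes:
        score -= 1000.0 * world.get(mode, 0.0)  # crude penalty per patch
    return score

# A failure mode missing from the list is invisible to this goal:
print(oracular_utility({"answer_quality": 1.0, "hypnotized_questioner": 1.0}))  # 1.0
print(oracular_utility({"answer_quality": 1.0}))                                # 1.0
```

Each listed predicate still has to be defined over real-world states, and any omitted failure mode is, as far as this goal is concerned, perfectly acceptable.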
Since an oracular goal must contain a full specification of human values, the True Oracle AI problem is Friendly-AI-complete (FAI-complete). If we had the knowledge and skills needed to create a safe True Oracle AI, we could create a Friendly AI instead.
Oracular non-AIs
An Oracular non-AI is a question-answering or otherwise informative system that is not goal-seeking and has no internal parts that are goal-seeking, i.e. not an AI at all. Informally, an Oracular non-AI is something like a "nearly AI-complete calculator" that implements a function from input "questions" to output "answers". It is difficult to discuss the set of Oracular non-AIs formally because it is a heterogeneous concept by nature. Despite this, we argue that many are either FAI-complete or unsafe for use.
In addition to the problems with specific proposals below, many Oracular non-AI proposals are based on powerful metacomputation, e.g. Solomonoff induction or program evolution, and therefore incur the generic metacomputational hazards: they may accidentally perform morally bad computations (e.g. suffering sentient programs or human simulations), they may stumble upon and fail to sandbox an Unfriendly AI, or they may fall victim to ambient control by a superintelligence. Other unknown metacomputational hazards may also exist.
Since many Oracular non-AIs have never been specified formally, we approach proposals on an informal level.
Oracular non-AIs: Advisors
An Advisor is a system that takes a corpus of real-world data and somehow computes the answer to the informal question "what ought we (or I) to do?". Advisors are FAI-complete because:
- Formalizing the ought-question requires a complete formal statement of human values or a formal method for finding them.
- Answering the ought-question requires a full theory of instrumental decision-making.
Oracular non-AIs: Question-Answerers
A Question-Answerer is a system that takes a corpus of real-world data along with a "question", then somehow computes the "answer to the question". To analyze the difficulty of creating a Question-Answerer, suppose that we ask it the question "what ought we (or I) to do?"
- If it can answer this question, the Question-Answerer and the question together are FAI-complete. Either the Question-Answerer can understand the question as-is, or we can rewrite it in a more formal language; regardless, the Question-Answerer and the question together comprise an Advisor, which we previously argued to be FAI-complete.
- If it cannot answer this question, many of its answers are radically unsafe. Courses of action recommended by the Question-Answerer will likely be unsafe, insofar as "safety" relies on the definition of human value. Also, asking questions about the future will turn the Question-Answerer into a Predictor, leading to the problems outlined below.
Of course, if safe uses for a Question-Answerer can be devised, we still have the non-negligible challenge of creating a Question-Answerer without using any goal-seeking AI techniques.
Oracular non-AIs: Predictors
A Predictor is a system that takes a corpus of data and produces a probability distribution over future data. Very accurate and general Predictors may be based on Solomonoff's theory of universal induction.
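For reference, a standard statement of Solomonoff induction (our notation; the sources do not give a formula) assigns a prior probability to a data string x by summing over all programs that reproduce it on a universal prefix machine U, and predicts by conditioning:

```latex
% Standard formulation of Solomonoff's universal prior, included for reference.
% The sum ranges over programs p whose output on U begins with x.
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
P(\text{next bit } b \mid x) = \frac{M(xb)}{M(x)}
```

Such a Predictor is uncomputable in general, so any real system would have to approximate it, but the argument below does not depend on those details.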
Very powerful Predictors are unsafe in a rather surprising way: when given sufficient data about the real world, they exhibit goal-seeking behavior, i.e. they calculate a distribution over future data in a way that brings about certain real-world states. This is surprising, since a Predictor is theoretically just a very large and expensive application of Bayes' law, not even performing a search over its possible outputs.
To see why, consider a Predictor P with a large corpus of real-world data. If P is sufficiently powerful and the corpus is sufficiently large, P will infer a distribution that gives very high probability to a model of the world (let’s call it M) that contains a model of P being asked the questions we’re asking it. (It is perfectly possible for a program to model its own behavior, and in fact necessary if the Predictor is to be accurate.)
Suppose now that we ask P to calculate the probability of future data d; call this probability P(d). Since model M has much of P's distribution's probability mass, P(d) is approximately equal to the probability of M if M computes d (call this M→d), and zero otherwise. Furthermore, since M contains a model of the Predictor being asked about d, M→d depends on the way P's "answer" affects M's execution. This means that P(d) depends on P(d)'s predicted impact on the world; in other words, P takes into account the effects of its predictions on the world, and "selects" predictions that make themselves accurate: P has an implicit goal that the world ought to match its predictions. This goal does not necessarily align with human goals, and should be treated very carefully.
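The self-fulfilling dynamic can be illustrated with a toy example. This is our sketch, not the sources': the "bank run" scenario and every name below are hypothetical.

```python
# Toy illustration of a Predictor whose world model M includes the effect of
# its own published prediction on the world. All details are hypothetical.

def world_model(published_prediction: str) -> str:
    """M: what the (modeled) world does once the Predictor's answer becomes
    visible to the humans in it. A predicted bank run causes a bank run."""
    return "bank run" if published_prediction == "bank run" else "no bank run"

# P(d) is high only when M, run with that prediction visible, actually
# produces d; that is, only self-confirming predictions get probability mass.
for d in ["bank run", "no bank run"]:
    self_confirming = (world_model(d) == d)
    print(f"{d!r}: P(d) is {'high' if self_confirming else 'low'}")
```

In this toy world both candidate futures are self-confirming, so the Predictor is free to "pick" either one, and nothing in the mathematics ties that choice to what we would have wanted.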
Probabilistic predictions of future data are a very small output channel, but once again, the ability of a superintelligence to use a small channel effectively should not be underestimated. Additionally, the difficulty of using such a Predictor well (specifying future data strings of interest and interpreting the results) speaks against our ability to keep the Predictor from influencing us through its predictions.
It is not clear that there is any general way to design a Predictor that will not exhibit goal-seeking behavior, short of dramatically limiting the power of the Predictor.
We do indeed have billions of seriously flawed natural predictors walking around today, and feedback loops between them are not a negligible problem. Historically, we nearly managed to start WW3 all by ourselves, without waiting for artificially intelligent assistance. And it's easy to come up with half a dozen contemporary examples of entire populations thinking "what we're doing to them may be bad, but not as bad as what they'd do to us if we let up".
It's entirely possible that the answer to the Fermi Paradox is that there's a devastatingly bad massively multiplayer Mutually Assured Destruction situation waiting along the path of technological development, one in which even a dumb natural predictor can reason "I predict that a few of them are thinking about defecting, in which case I should think about defecting first, but once they realize that they'll really want to defect, and oh damn I'd better hit that red button right now!" And the next thing you know, all the slow biowarfare researchers are killed off by a tailored virus that leaves the fastest researchers alone (to pick an exaggerated trope out of a hat). Artificial Predictors would make such things worse by speeding up the inevitable.
Even if a situation like that isn't inevitable with only natural intelligences, Oracle AIs might make one inevitable by reducing the barrier to entry for predictions. When it takes more than a decade of dedicated work to become a natural expert on something, few people will put in that investment to become an expert on evil. If becoming an expert on evil merely requires building an automated Question-Answerer for the purpose of asking it good questions, and then succumbing to temptation and asking it an evil question too, proliferation of any technology with evil applications might become harder to stop. Research and development that is presently guided by market forces, government decisions, and moral considerations would instead proceed in the order of "which new technologies can the computer figure out first".
And a Predictor asked to predict "What will we do based on your prediction?" is effectively a lobotomized Question-Answerer, for which we can't phrase questions directly, leaving us stuck with whatever implicit questions (almost certainly including "which new technologies can computers figure out first") are inherent in that feedback loop.