(Update 2022: Enjoy the post, but note that it’s old, has some errors, and is certainly not reflective of my current thinking. –Steve)
Low confidence; offering this up for discussion
An Oracle AI is an AI that only answers questions, and doesn't take any other actions. The opposite of an Oracle AI is an Agent AI, which might also send emails, control actuators, etc.
I'm especially excited about the possibility of non-self-improving oracle AIs, dubbed Tool AI in a 2012 article by Holden Karnofsky.
I've seen two arguments against this "Tool AI":
- First, as in Eliezer's 2012 response to Holden, we don't know how to safely make and operate an oracle AGI (just like every other type of AGI). Fair enough! I never said this is an easy solution to all our problems! (But see my separate post for why I'm thinking about this.)
- Second, as in Gwern's 2016 essay, there's a coordination problem. Even if we could build a safe oracle AGI, the argument goes, there will still be an economic incentive to build an agent AGI, because you can do more and better and faster by empowering the AGI to take actions. Thus, agreeing to never ever build agent AGIs is a very hard coordination problem for society. I don't find the coordination argument compelling—in fact, I think it's backwards—and I wrote this post to explain why.
Five reasons I don't believe the coordination / competitiveness argument against oracles
1. If the oracle isn't smart or powerful enough for our needs, we can solve that by bootstrapping. Even if the oracle is not inherently self-modifying, we can ask it for advice and do human-in-the-loop modifications to make more powerful successor oracles. By the same token, we can ask an oracle AGI for advice about how to design a safe agent AGI.
2. Avoiding coordination problems is a pipe dream; we need to solve the coordination problem at some point, and that point might as well be at the oracle stage. As far as I can tell, we will never get to a stage where we know how to build safe AGIs and where there is no possibility of making more-powerful-and-less-safe AGIs. If we have a goal in the world that we really really want to happen, a low-impact agent is going to be less effective than a not-impact-restrained agent; an act-based agent is going to be less effective than a goal-seeking agent;[1] and so on and so forth. It seems likely that, no matter how powerful a safe AGI we can make, there will always be an incentive for people to try experimenting with even more powerful unsafe alternative designs.
Therefore, at some point in AI development, we have to blow the whistle, declare that technical solutions aren't enough, and start relying 100% on actually solving the coordination problem. When is that point? Hopefully far enough along that we realize the benefits of AGI for humanity: automating the development of new technology to help solve problems, dramatically improving our ability to think clearly and foresightedly about our decisions, and so on. Oracles can do all that! So why not just stop when we get to AGI oracles?
Indeed, once I started thinking along those lines, I began to see the coordination argument going in the other direction! I say restricting ourselves to oracle AI makes coordination easier, not harder! Why is that? Two more reasons:
3. We want a high technological barrier between us and the most dangerous systems: These days, I don't think anyone takes seriously the idea of building an all-powerful benevolent dictator AGI implementing CEV. [ETA: If you do take that idea seriously, see point 1 above on bootstrapping.] At least as far as I can tell from the public discourse, there seems to be a growing consensus that humans should always and forever be in the loop of AGIs. (That certainly sounds like a good idea to me!) Thus, the biggest coordination problem we face is: "Don't ever make a human-out-of-the-loop free-roaming AGI world-optimizer." This is made easier by having a high technological barrier between the safe AGIs that we are building and using, and the free-roaming AGI world-optimizers that we are forbidding. If we make an agent AGI—whether corrigible, aligned, norm-following, low-impact, or whatever—I just don't see any technological barrier there. It seems like it would be trivial for a rogue employee to tweak such an AGI to stop asking permission, deactivate the self-restraint code, and go tile the universe with hedonium at all costs (or whatever that rogue employee happens to value). By contrast, if we stop when we get to oracle AI, it seems like there would be a higher technological barrier to turning it into a free-roaming AGI world-optimizer—probably not that high a barrier, but higher than the alternatives. (The height of this technological barrier, and indeed whether there's a barrier at all, is hard to say.... It probably depends on how exactly the oracles are constructed and access-controlled.)
4. We want a bright-line, verifiable rule between us and the most dangerous systems: Even more importantly, take the rule:
"AGIs are not allowed to do anything except output pixels onto a screen."
This is a nice, simple, bright-line rule, which moreover has at least a chance of being verifiable by external auditors. By contrast, if we try to draw a line through the universe of agent AGIs, defining how low-impact is low-impact enough, how act-based is act-based enough, and so on, it seems to me like it would inevitably be a complicated, blurry, and unenforceable line. This would make a very hard coordination problem very much harder still.
[Clarifications on this rule: (A) I'm not saying this rule would be easy to enforce (globally and forever), only that it would be less hard than alternatives; (B) I'm not saying that, if we enforce this rule, we are free and clear of all possible existential risks, but rather that this would be a very helpful ingredient along with other control and governance measures; (C) Again, I'm presupposing here that we succeed in making superintelligent AI oracles that always give honest and non-manipulative answers; (D) I'm not saying we should outlaw all AI agents, just that we should outlaw world-modeling AGI agents. Narrow-AI robots and automated systems are fine. (I'm not sure exactly how that line would be drawn.)]
Finally, one more thing:
5. Maybe superintelligent oracle AGI is "a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted...I don’t think there is a strong case for thinking much further ahead than that." (copying from this Paul Christiano post). I hate this argument. It's a cop-out. It's an excuse to recklessly plow forward with no plan and everything at stake. But I have to admit, it seems to have a kernel of truth...
[1] See Paul's research agenda FAQ section 0.1 for things that act-based agents are unlikely to be able to do.
Maybe; but there also seems to be a general consensus that humans should be kept in the loop when making any important decisions; yet there are also powerful incentives pushing various actors to automate their modern-day autonomous systems. In particular, there are cases where not having a human in the loop is an advantage in itself, for example because it buys you a faster reaction time (see high-frequency trading).
From "Disjunctive Scenarios of Catastrophic AI Risk":
Suppose that you have a powerful government or corporate actor which has been spending a long time upgrading its AI systems to be increasingly powerful, and achieved better and better gains that way. Now someone shows up and says that they shouldn't make [some set of additional upgrades], because that would push it to the level of a general intelligence, and having autonomous AGIs is bad. I would expect them to do everything in their power to argue that no, actually this is still narrow AI, doing these upgrades and keeping the system in control of their operations are fine - especially if they know that failing to do so is likely to confer an advantage to one of their competitors.
The problem is related to one discussed by Goertzel & Pitt (2012): it seems unlikely that governments would ban narrow AI or restrict its development, but there's no clear dividing line between narrow AI and AGI, meaning that if you don't restrict narrow AI then you can't restrict AGI either.
It does seem that regulation of AI, should it become necessary, basically has to take the form of regulating access to computer chips. Supercomputers (and server farms) are relatively expensive. You can't make your own in your basement. Production is centralized at a few locations and so it would not be terribly difficult to track who they're sold to. They also use lots of electricity, making it easier to track down people who have acquired lots of them illicitly.
I think it's likely that the computing power required for dangerous AGI will r...