In 2012, Holden Karnofsky[1] critiqued MIRI (then SI) by saying "SI appears to neglect the potentially important distinction between 'tool' and 'agent' AI." He particularly claimed:
Is a tool-AGI possible? I believe that it is, and furthermore that it ought to be our default picture of how AGI will work
I understand this to be the first introduction of the "tool versus agent" ontology, and it is a helpful (relatively) concrete prediction. Eliezer replied here, making the following summarized points (among others):
- Tool AI is nontrivial
- Tool AI is not obviously the way AGI should or will be developed
Gwern more directly replied by saying:
AIs limited to pure computation (Tool AIs) supporting humans, will be less intelligent, efficient, and economically valuable than more autonomous reinforcement-learning AIs (Agent AIs) who act on their own and meta-learn, because all problems are reinforcement-learning problems.
11 years later, can we evaluate the accuracy of these predictions?
- ^
Some Bayes points go to LW commenter shminux for saying that this Holden kid seems like he's going places
The more I stare at this observation, the more it feels potentially more profound than I intended when writing it.
Consider the “cauldron-filling” task. Does anyone doubt that, with at most a few very incremental technological steps from today, one could train a multimodal, embodied large language model (“RobotGPT”), to which you could say, “please fill up the cauldron”, and it would just do it, using a reasonable amount of common sense in the process — not flooding the room, not killing anyone or going to any other extreme lengths, and stopping if asked? Isn’t this basically how ChatGPT behaves now when you ask it for most things, bringing to bear a great deal of common sense in its understanding of your request, and avoiding overly-literal interpretations which aren’t what you really want?
Compare that to Nate’s 2017 description of the fiendish difficulty of this problem:
Regarding off switches:
These all seem like great arguments that we should not build and run a utility maximizer with some hand-crafted goal, and indeed RobotGPT isn’t any such thing. The contrast between this story and where we seem to be heading seems pretty stark to me. (Obviously it’s a fictional story, but Nate did say “as fictional depictions of AI go, this is pretty realistic”, and I think it does capture the spirit of much actual AI alignment research.)
Perhaps one could say that these sorts of problems only arise with superintelligent agents, not agents at ~GPT-4 level. I grant that the specific failure modes available to a system will depend on its capability level, but the story is about the difficulty of pointing a “generally intelligent system” to any common sense goal at all. If the story were basically right, GPT-4 should already have lots of “dumb-looking” failure modes today due to taking instructions too literally. But mostly it has pretty decent common sense.
Certainly, valid concerns remain about instrumental power-seeking, deceptive alignment, and so on, so I don’t say this means we should be complacent about alignment, but it should probably give us some pause that the situation is this different in practice from how it was envisioned only six years ago in the worldview represented in that story.