"Ten years ago, everyone was talking about superintelligence, the singularity, the robot apocalypse. What happened?"
What is this referencing? I was only 10 years old in 2009 but I have a strong impression that AI risk gets a lot more attention now than it did then.
Also, what are the most salient differences between CAIS and the cluster of concepts Karnofsky and others were calling "Tool AI"?
It might also be worth comparing CAIS and "tool AI" to Paul Christiano's IDA and the desiderata MIRI tends to talk about (task-directed AGI [1,2,3], mild optimization, limited AGI).
At a high level, I tend to think of Christiano and Drexler as both approaching alignment from very much the right angle, in that they're (a) trying to break apart the vague idea of "AGI reasoning" into smaller parts, and (b) shooting for a system that won't optimize harder (or more domain-generally) than we need for a given task. From conversations with Nate, one way I'd summarize MIRI-cluster disagreements with Christiano and Drexler's proposals is that MIRI people don't tend to think these proposals decompose cognitive work enough. Without a lot more decomposition/understanding, either the system as a whole won't be capable enough, or it will be capable by virtue of atomic parts that are smart enough to be dangerous, where safety is a matter of how well we can open those black boxes.
In my experience people use "tool AI" to mean a bunch of different things, including things MIRI considers very important and useful (like "only works on a limited task, rather than putting any cognitive work into more general topics or trying to open-endedly optimize the future") as well as ideas that don't seem relevant or that obscure where the hard parts of the problem probably are.
A lot of the distinction between a service and an agent seems to rest on the difference between thinking and doing. Is there a well-defined concept of action for intelligent agents?
AGIs will have a causal model of the world. If an AGI's own output is part of that model, and it works forward from there to the real-world consequences of its outputs, and it chooses outputs partly based on those consequences, then it's an agent by (my) definition. The outputs are called "actions" and the consequences are called "goals". In all other cases I'd call it a service, unless I'm forgetting about some edge cases.
A system whose only output is text on a screen can be either a service or an agent, depending on the computational process generating the text. A simple test: if there's a weird, non-obvious way to manipulate the people reading the text (according to the everyday, bad-connotation sense of "manipulate"), would the system take advantage of it? Agents would do so (by default, unless they had a complicated goal involving ethics etc.); services would not by default.
Nobody knows how to build a useful AI capable of world-modeling and formulating intelligent plans but which is not an agent, although I'm personally hopeful that it might be possible by self-supervised learning (cf. Self-Supervised Learning and AGI safety).
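To make that definition concrete, here's a minimal, self-contained toy (my own illustration; the function names, the "world model", and the goal are all made up) contrasting a service, which maps input straight to output, with an agent, which chooses its output by working forward through a causal model to predicted consequences and scoring them against a goal:

```python
def service(x):
    # No model of its own output's consequences: just compute an answer.
    return x * 2

def toy_world_model(state, output):
    # Predicted real-world consequence of emitting `output` in `state` (toy dynamics).
    return state + output

def goal(consequence, target=10):
    # Higher is better: prefer consequences close to the target.
    return -abs(consequence - target)

def agent(state, candidate_outputs):
    # The agent's own output appears in its causal model: it works forward from
    # each candidate output to its predicted consequences, then picks the
    # candidate whose predicted consequences best satisfy the goal.
    return max(candidate_outputs, key=lambda o: goal(toy_world_model(state, o)))

print(service(3))           # -> 6
print(agent(3, range(20)))  # -> 7, because 3 + 7 lands exactly on the target
```

The point is just that the difference lives in the computation, not the output channel: both of these only ever return a number.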
This sounds like we're resting on an abstract generalization of 'outputs.' Is there any work being done to distinguish between different outputs, and consider how a computer might recognize a kind it doesn't already have?
Right, I was using "output" in a broad sense of "any way that the system can causally impact the rest of the world". We can divide that into "intended output channels" (text on a screen etc.) and "unintended output channels" (sending out radio signals using RAM etc.). I'm familiar with a small amount of work on avoiding unintended output channels (e.g. using homomorphic encryption or fancy vacuum-sealed Faraday cage boxes).
Usually the assumption is that a superintelligent AI will figure out what it is, and where it is, and how it works, and what all its output channels are (both intended and unintended), unless there is some strong reason to believe otherwise (example). I'm not sure this answers your question ... I'm a bit confused at what you're getting at.
I am aiming directly at questions of how an AI that starts with only a robotic arm might get to controlling drones or trading stocks, from the perspective of the AI. My intuition, driven by Moravec's Paradox, is that each new kind of output (or input) has a pretty hefty computational threshold associated with it, so I suspect that the details of the initial inputs/outputs will have a big influence on the risk any given service or agent presents.
The reason I am interested in this is that it feels like doing things has no intrinsic connection to learning things, and that we only link them because so much of our learning and doing is unconscious. That is to say, I suspect actions are orthogonal to intelligence.
Regarding "computational threshold", my working assumption is that any given capability X is either (1) always and forever out of reach of a system by design, or (2) completely useless, or (3) very likely to be learned by a system, if the system has long-term real-world goals. Maybe it takes some computational time and effort to learn it, but AIs are not lazy (unless we program them to be). AIs are just systems that make good decisions in pursuit of a goal, and if "acquiring capability X" is instrumentally helpful towards achieving goals in the world, it will probably make that decision if it can (cf. "Instrumental convergence").
If I have a life goal that is best accomplished by learning to use a forklift, I'll learn to use a forklift, right? Maybe I won't be very fluid at it, but fine, I'll operate it more slowly and deliberately, or design a forklift autopilot subsystem, or whatever...
A lot of the distinction between a service and an agent seems to rest on the difference between thinking and doing.
That doesn't seem right to me. There are several, potentially subtle differences between services and agents – the boundary (or maybe even 'boundaries') is probably nebulous at high resolution.
A good prototypical service is Google Translate. You submit text to it to translate and it outputs a translation as text. It's both thinking and doing but the 'doing' is limited – it just outputs translated text.
A good prototypical agent is AlphaGo. It pursues a goal, to win a game of Go, but does so in a (more) open-ended fashion than a service. It will continue to play as long as it can.
Down-thread, you wrote:
I am aiming directly at questions of how an AI that starts with only a robotic arm might get to controlling drones or trading stocks, from the perspective of the AI.
I think one thing to point out up-front is that a lot of current AI systems are generated or built in a stage distinct from the stage in which they 'operate'. A lot of machine learning algorithms involve a distinct period of learning, first, which produces a model. That model can then be used – as a service. The model/service would do something like 'tell me if an image is of a hot dog'. Or, in the case of AlphaGo, something like 'given a game state X, what next move or action should be taken?'.
What makes AlphaGo an agent is that its model is operated in a mode whereby it's continually fed a sequence of game states, and, crucially, both its output controls the behavior of a player in the game and the next game state it's given depends on its previous output. It becomes embedded or embodied via the feedback between its output, player behavior, and its subsequent input, a game state that includes the consequences of its previous output.
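Here's a rough, self-contained toy (the policy and environment dynamics are mine, not anything from AlphaGo) of what that feedback loop looks like: the same fixed, pre-trained model can be queried once as a service, or run in a closed loop where its output changes the state it is fed next:

```python
def fixed_policy(state):
    # A stand-in for a pre-trained model: given a state, return an action.
    return 1 if state < 10 else 0

def as_service(state):
    # One-shot use: "given game state X, what next move or action should be taken?"
    return fixed_policy(state)

def as_agent(state, steps=5):
    # Closed loop: the action feeds back into the environment, so the next
    # state the model sees includes the consequences of its previous output.
    history = []
    for _ in range(steps):
        action = fixed_policy(state)
        state = state + action  # toy environment dynamics
        history.append((action, state))
    return history

print(as_service(3))  # -> 1
print(as_agent(3))    # -> [(1, 4), (1, 5), (1, 6), (1, 7), (1, 8)]
```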
But, we're still missing yet another crucial ingredient to make an agent truly (or at least more) dangerous – 'online learning'.
Instead of training a model/service all at once up-front, we could train it while it acts as an agent or service, i.e. 'online'.
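Continuing the toy example (still entirely made up), the 'online' variant keeps updating the model from feedback gathered while it is already acting, rather than freezing it after an up-front training stage:

```python
def run_online(steps=10):
    estimate = 0.0      # the "model": a running estimate of a hidden value
    hidden_value = 7.0  # something about the environment worth learning
    for t in range(1, steps + 1):
        action = estimate                 # act using the current model
        feedback = hidden_value - action  # the environment's response to the action
        estimate += feedback / t          # update the model *while it operates*
    return estimate

print(run_online())  # -> 7.0, learned during operation rather than up-front
```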
I would be very surprised if an AI installed to control a robotic arm would gain control of drones or be able to trade stocks, but only because I would expect such an AI to not use online learning and to be very limited overall in the inputs it's provided with (e.g. the position of the arm and maybe a camera covering its work area) and the outputs to which it has direct access (e.g. a sequence of arm motions to be performed).
Probably the most dangerous kind of tool/service AI imagined is an oracle AI, i.e. an AI to which people would pose general open-ended questions, e.g. 'what should I do?'. For oracle AIs, I think some other (possibly) key dangerous ingredients might be present:
https://slatestarcodex.com/2019/08/27/book-review-reframing-superintelligence/
A takeaway: