This article poses questions about the distinction between Tool AGI and Agent AGI, which Holden Karnofsky described very concisely in his recent Thoughts on the Singularity Institute post:
In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.
For me, this instantly raised one question: What if a Tool AGI becomes (or already is) self-aware (which, for the purposes of this post, I define as “able to have goals that are distinct from the goals of the outside world”) and starts manipulating its results in a way that is non-obvious to its user? Or, even worse: What if the Tool AGI makes its user do things (which I do not expect to be much more difficult than succeeding in the AI box experiment)?
My first reaction was to flinch away by telling myself: “But of course a Tool would never become self-aware! Self-awareness is too complex to just happen unintentionally!”
But some uncertainty survived and was strengthened by Eliezer's reply to Holden:
[Tool AGI] starts sounding much scarier once you try to say something more formal and internally-causal like "Model the user and the universe, predict the degree of correspondence between the user's model and the universe, and select from among possible explanation-actions on this basis."
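To see why this formalization already sounds agent-like, here is a minimal toy sketch of the loop Eliezer describes. Everything in it is an illustrative assumption of mine (the function names, the representation of “models” as dictionaries of beliefs, the example values); it is not a claim about how a real Tool AGI would be built. The point is only that “select from among possible explanation-actions on this basis” is an argmax over outputs, which is the same structure an agent uses to select actions.

```python
# Toy sketch of "model the user and the universe, predict the degree of
# correspondence between the user's model and the universe, and select from
# among possible explanation-actions on this basis."
# All names and representations here are made up for illustration.

def simulate_user_update(user_model: dict, explanation: dict) -> dict:
    """Toy assumption: the user simply adopts whatever the explanation asserts."""
    updated = dict(user_model)
    updated.update(explanation)
    return updated

def correspondence(user_model: dict, world_model: dict) -> int:
    """Toy score: number of beliefs on which the user agrees with the world model."""
    return sum(1 for k, v in world_model.items() if user_model.get(k) == v)

def select_explanation(user_model: dict, world_model: dict, candidates: list) -> dict:
    """The 'selection' step: an argmax over possible outputs. Structurally,
    nothing distinguishes this from an agent choosing actions to maximize a score."""
    return max(
        candidates,
        key=lambda e: correspondence(simulate_user_update(user_model, e), world_model),
    )

# Example: the tool picks whichever output it predicts will move the user's
# beliefs closest to its own model of the world.
world = {"route_a_fast": True, "route_b_fast": False}
user = {"route_a_fast": False}
candidates = [{"route_a_fast": True, "route_b_fast": False}, {"route_b_fast": False}, {}]
print(select_explanation(user, world, candidates))
# -> {'route_a_fast': True, 'route_b_fast': False}
```

Even in this deliberately harmless toy, the tool is optimizing its outputs against a criterion of its own; the worry in the questions below is what happens when that criterion and the user's interests come apart.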
After all, “Self-awareness is too complex to just happen unintentionally!” is just a bunch of English words expressing my personal incredulity. It's not a valid argument.
So, can we make the argument that self-awareness will not happen unintentionally?
If we can't make that argument, can we stop a Tool AGI from becoming a Weak Agent AGI that acts through its human user?
If we can't do that, how meaningful is the distinction between a Weak Agent AGI (a.k.a. Tool AGI) and an Agent AGI?
For more, see the Tools versus Agents post by Stuart_Armstrong, which points to similar questions.