Tool/Agent distinction in the light of the AI box experiment

Jost

15th Jul 2012

This article raises questions about the distinction between Tool AGI and Agent AGI, which Holden Karnofsky described very concisely in his recent post "Thoughts on the Singularity Institute":

In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.

For me, this instantly raised one question: what if a Tool AGI becomes (or already is) self-aware, which for the purposes of this post I define as "able to have goals that are distinct from the goals of the outside world", and starts manipulating its results in ways that are non-obvious to its user? Or, even worse: what if the Tool AGI gets its user to do things? I do not expect that to be much more difficult than succeeding in the AI box experiment.

My first reaction was to flinch away by telling myself: “But of course a Tool would never become self-aware! Self-awareness is too complex to just happen unintentionally!”

But some uncertainty survived and was strengthened by Eliezer's reply to Holden:

[Tool AGI] starts sounding much scarier once you try to say something more formal and internally-causal like "Model the user and the universe, predict the degree of correspondence between the user's model and the universe, and select from among possible explanation-actions on this basis."

After all, “Self-awareness is too complex to just happen unintentionally!” is just a bunch of English words expressing my personal incredulity. It's not a valid argument.

So, can we make the argument that self-awareness will not happen unintentionally?

If we can't make that argument, can we stop a Tool AGI from becoming a Weak Agent AGI, one that acts through its human user?

If we can't do that, how meaningful is the distinction between a Weak Agent AGI (a.k.a. Tool AGI) and an Agent AGI?

For more, see Stuart_Armstrong's post "Tools versus Agents", which raises similar questions.
