Why not tool AI? — LessWrong

19th Jan 2019

AI Alignment Forum

An extremely basic question that, after months of engaging with AI safety literature, I'm surprised to realize I don't fully understand: why not tool AI?

AI Safety scenarios seem to conceive of AI as an autonomous agent. Is that because of the current machine learning paradigm, where we're setting the AI's goals but not specifying the steps to get there? Is this paradigm the entire reason why AI safety is an issue?

If so, is there a reason why advanced AI would need an agenty, utility-function sort of setup? Is it just too cumbersome to give step-by-step instructions for high-level tasks?

Thanks!

2 Answers

Jan 19, 2019*

You mention having looked through the literature; in case you missed any, here's what I think of as the standard resources on this topic.

Gwern's analysis of why Tool AIs want to become Agent AIs. It contains references to many other works. (It also has a positive review from Eliezer.)
Eric Drexler's incredibly long report on Comprehensive AI Services as General Intelligence. Related: Rohin Shah's summary, and Richard Ngo's comment.
The Goals and Utility Functions chapter of Rohin Shah's Value Learning sequence.
Eliezer's very well-written post on Arbital addressing the question "Why Expected Utility?" [Added: crossposted to LessWrong here]
The LessWrong wiki article on Tool AI links to several posts on this topic.

All are very worth reading.

Edit: There was an old discussion between Holden Karnofsky and Jaan Tallinn on Tool AI in Yahoo Groups, but Yahoo Groups has been deprecated. Here's the page in the Wayback Machine, but the attachment is not available. I would appreciate someone here leaving a link to that old document; I recall it being quite thoughtful.

After some more reading, particularly the Drexler CAIS report, I realize I was more confused than I thought about Tool vs Agent AI. I think I've resolved it, but I'd appreciate feedback. Would the below be correct?

"Most sophisticated software behaves like both a Tool and an Agent, at different times. Google Maps reports possible routes like a tool, but it searches for paths to maximize a utility function like an agent. DeepMind might ultimately select the move that maximizes its winning probability, but it follows some set rules in how it fr... (read more)

TAG, 7y

Of the two implicit definitions of agent, "maximises UF" and "affects the outside world (without explicitly being told to)", the second is the only one that is relevant to AI safety, and the one that is used by the actual AI community. In other words, trying to bring in the first definition just causes confusion.

smithee, 7y

Didn't realize that, but it makes complete sense. Thanks.

Kaj_Sotala, 7y

Also, we discussed Tool AI as a subcategory of Oracle AI in section 5.1 of Responses to Catastrophic AGI Risk; our conclusion:

... it seems like Oracle AIs could be a useful stepping stone on the path toward safe, freely acting AGIs. However, because any Oracle AI can be relatively easily turned into a free-acting AGI and because many people will have an incentive to do so, Oracle AIs are not by themselves a solution to AGI risk, even if they are safer than free-acting AGIs when kept as pure oracles.

Lukas Finnveden, 7y

Eric Drexler's report on comprehensive AI services also contains relevant readings. Here is Rohin's summary of it.

Ben Pace, 7y

Thanks, this example was so big and recent that I forgot it. Have added it to my answer.

smithee, 7y

Thanks to all! Very useful reading, particularly Gwern.

Bird Concept, 4y

Jaan/Holden convo link is broken :(

Mateusz Bagiński, 1y

https://gwern.net/doc/existential-risk/2011-05-10-givewell-holdenkarnofskyjaantallinn.doc

Charlie Steiner

Jan 20, 2019

Any time you have a search process (and, let's be real, most of the things we think of as "smart" are search problems), you are setting a target but not specifying how to get there. I think the important sense of the word "agent" in this context is that it's a process that searches for an output based on the modeled consequences of that output.

For example, if you want to colonize the upper atmosphere of Venus, one approach is to make an AI that evaluates outputs (e.g. text outputs of persuasive arguments and technical proposals) based on some combined metric of how much Venus gets colonized and how much it costs. Because it evaluates outputs based on their consequences, it's going to act like an agent that wants to pursue its utility function at the expense of everything else.

Call the above output "the plan" - you can make a "tool AI" that still outputs the plan without being an agent!

Just make it so that the plan is merely part of the output - the rest is composed according to some subprogram that humans have designed for elucidating the reasons the AI chose that output (call this the "explanation"). The AI predicts the results as if its output was only the plan, but what humans see is both the plan and the explanation, so it's no longer fulfilling the criterion for agency above.
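To make the distinction concrete, here is a minimal toy sketch of the setup described above. Everything in it is an assumption made for illustration: the `Plan` fields, the candidate plans and their numbers, the `modeled_score` metric, and the function names are all hypothetical. The point is only the structure: `search` picks an output by its modeled consequences, while `explain` is a fixed, human-designed subprogram whose text is never fed back into the score.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Plan:
    """A candidate output together with its (hypothetical) modeled consequences."""
    name: str
    colonization: float  # modeled fraction of the goal achieved, 0..1
    cost: float          # modeled cost, arbitrary units


def modeled_score(plan: Plan, cost_weight: float = 0.1) -> float:
    """Hypothetical combined metric: reward colonization, penalize cost."""
    return plan.colonization - cost_weight * plan.cost


def search(candidates: Iterable[Plan], score: Callable[[Plan], float]) -> Plan:
    """Agent-like step: choose the output whose modeled consequences score best."""
    return max(candidates, key=score)


def explain(plan: Plan, candidates: List[Plan], score: Callable[[Plan], float]) -> str:
    """Human-designed subprogram: report the ranking that led to the chosen plan.

    This text is not optimized by the search; it is composed after the fact.
    """
    ranked = sorted(candidates, key=score, reverse=True)
    lines = [f"{p.name}: score={score(p):.2f}" for p in ranked]
    return f"Chose {plan.name!r}.\n" + "\n".join(lines)


def tool_ai(candidates: Iterable[Plan], score: Callable[[Plan], float]) -> Tuple[Plan, str]:
    """Return the plan plus an explanation.

    The plan is scored as if it were the entire output; the explanation
    is appended afterwards and never enters the score.
    """
    pool = list(candidates)
    plan = search(pool, score)
    return plan, explain(plan, pool, score)


# Made-up candidates, purely for illustration.
CANDIDATES = [
    Plan("balloon habitats", colonization=0.6, cost=2.0),
    Plan("orbital mirrors", colonization=0.9, cost=7.0),
    Plan("do nothing", colonization=0.0, cost=0.0),
]

if __name__ == "__main__":
    plan, explanation = tool_ai(CANDIDATES, modeled_score)
    print(explanation)
```

Note that `tool_ai` returns exactly the plan that `search` alone would return. The wrapper changes what humans see, not what was searched for.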

In this example, the plan is a bad idea in both cases - the thing you programmed the AI to search for is probably something that's bad for humanity when taken to an extreme. It's just that in the "tool AI" case, you've added some extra non-search-optimized output that you hope undoes some of the work of the search process.

Making your search process into a tool by adding the reason-elucidator hopefully made it less disastrously bad, but it didn't actually get you a good plan. The problems that you need to solve to get a superhumanly good plan are in fact the same problems you'd need to solve to make the agent safe.

(Sidenote: This can be worked around by giving your tool AI a simplified model of the world and then relying on humans to un-simplify the resulting plan, much like Google Maps makes a plan in an extremely simplified model of the world and then you follow something that sort of looks like that plan. This workaround fails when the task of un-simplifying the plan becomes superhumanly difficult, i.e. right around when things get really interesting, which is why imagining a Google-Maps-like list of safe abstract instructions might be building a false intuition.)

In short, to actually find out the superintelligently awesome plan to solve a problem, you have to have a search process that's looking for the plan you want. Since this sounds a lot like an agent, and an unfriendly agent is one of the cases we're most concerned about, it's easy and common to frame this in terms of an agent.
