Why would an AI try to figure out its goals?

cousin_it

9th Nov 2011

"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives

"This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)." -- Eliezer Yudkowsky, What I Think, If Not Why

I have stopped understanding why these quotes are correct. Help!

More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't that anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?

88 Comments

Vladimir_Nesov (15y)

Saying that there is an agent refers (in my view; definition for this thread) to a situation where future events are in some sense expected to be optimized according to some goals, to the extent certain other events ("actions") control those future events. There might be many sufficient conditions for that in terms of particular AI designs, but they should amount to this expectation.

So an agent is already associated with goals in terms of its actual effect on its environment. Given that the agent's own future state (design) is an easily controlled part of the environment, it's one of the things that will be optimized, and given that agents are particularly powerful incantations, it's a good bet that the future will retain agent-y patterns, at least for a start. If the future agent has goals different from the original's, then by the same definition the future will be optimized for those different goals, and yet in a way controllable by the original agent's actions (through the future agent). This contradicts the assumption that the original agent is an agent (with the original goals). And since the task of constructing future agent includes specification of goals, the original agent needs to figure out what they are.

Tyrrell_McAllister (15y)

"And since the task of constructing future agent includes specification of goals . . ."

There seems to be a leap, here. An agent, qua agent, has goals. But is it clear that the historical way in which the future-agent is constructed by the original agent must pass through an explicit specification of the future-agent's goals? The future-agent could be constructed that way, but must it? (Analogously, a composite integer has factors, but a composite can be constructed without explicitly specifying its factors.)

cousin_it (15y)

Omohundro's paper says: [...] It's not obvious to me why any of these systems would be "agents" under your definition. So I guess your definition is too strong. My question stands.

XiXiDu (15y)

If you added general intelligence and consciousness to IBM Watson, where does the urge to refine or protect its Jeopardy skills come from? Why would it care if you pulled the plug on it? I just don't see how optimization and goal protection are inherent features of general intelligence, agency or even consciousness.
