Why would an AI try to figure out its goals?

cousin_it

"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives

This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever). -- Eliezer Yudkowsky, What I Think, If Not Why

I have stopped understanding why these quotes are correct. Help!

More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't that anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?

We think we have a goal, and so we do by the ordinary English meaning of the word, but then there are things we are not prepared to do to achieve it, so it turns out what we have is not a goal by the ultimate criterion of decision theory on which Omohundro draws.

Hmm. This reminds me of my recent discussion with Matt M. about constraints.

Optimising under constraints is extremely similar to optimising some different function that incorporates the constraints as utility penalties.

Identifying constraints and then rejecting optimisation-based explanations just doesn't follow, IMHO.
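To make the constraints-as-penalties point concrete, here is a toy sketch. Everything in it (the action space, the utility, the constraint, the penalty size) is a made-up assumption for illustration, not anything from Omohundro or the quoted comment. It compares picking the best action among those that satisfy a hard constraint with picking the best action under a modified utility that subtracts a large penalty for violations.

```python
def constrained_argmax(candidates, utility, allowed):
    """Maximise utility over only those candidates that satisfy the constraint."""
    feasible = [c for c in candidates if allowed(c)]
    return max(feasible, key=utility)


def penalised_argmax(candidates, utility, allowed, penalty=1e6):
    """Maximise a modified utility over all candidates; violations cost `penalty`."""
    def penalised(c):
        return utility(c) - (0.0 if allowed(c) else penalty)
    return max(candidates, key=penalised)


# Hypothetical toy problem (all three definitions are assumptions).
candidates = list(range(11))   # actions 0..10
utility = lambda x: x * x      # "bigger is better"
allowed = lambda x: x <= 6     # the thing we are "not prepared to do": exceed 6

hard = constrained_argmax(candidates, utility, allowed)
soft = penalised_argmax(candidates, utility, allowed)
print(hard, soft)  # 6 6
```

The two agree here only because the penalty is larger than the whole range of the utility and at least one candidate satisfies the constraint. With a small penalty the second agent would sometimes trade a violation for a gain, which is a genuinely different behaviour.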

if we try to rescue the overuse of decision theory by appealing to a broader goal, it still doesn't work; regardless of what level you look at, there is no function such that humans will say "yes, this is my utility function, and I care about nothing but maximizing it."

...and at this point, I usually just cite Dewey:

Any agent can be expressed as an O-maximizer (as we show in Section 3.1),

This actually only covers any computable agent.

Humans might reject the idea that they are utility maximisers, but they are. Their rejection is likely to be signalling their mysteriousness and wondrousness - not truth seeking.

Any agent can be expressed as an O-maximizer

Not just any agent, but any entity. A leaf blown on the wind can be thought of as optimizing the function of following the trajectory dictated by the laws of physics. Which is my point: if you broaden a theory to the point where it can explain anything whatsoever, then it makes no useful predictions.
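The leaf example can be written out as a toy sketch. This is not Dewey's construction from Section 3.1, just a trivial stand-in under assumed names (`rationalising_utility`, `leaf_policy`, and the four-direction world are all hypothetical). Given any observed behaviour, it builds a utility function that scores 1 for whatever the entity actually does and 0 for everything else, so the entity "maximizes" it by definition.

```python
def rationalising_utility(policy):
    """Build a utility function under which `policy` is optimal by construction."""
    def utility(state, action):
        return 1.0 if action == policy(state) else 0.0
    return utility


def best_action(utility, state, actions):
    """Return the action with the highest utility in the given state."""
    return max(actions, key=lambda a: utility(state, a))


# Hypothetical toy world: the state is the wind direction, and the
# "leaf" simply moves wherever the wind blows.
ACTIONS = ["north", "south", "east", "west"]

def leaf_policy(wind_direction):
    return wind_direction


leaf_utility = rationalising_utility(leaf_policy)
matches = all(
    best_action(leaf_utility, wind, ACTIONS) == leaf_policy(wind)
    for wind in ACTIONS
)
print(matches)  # True
```

Since the same trick works for any `policy` whatsoever, the utility function is read off the behaviour rather than predicting it, which is the sense in which such a description says nothing beyond the behaviour itself.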
