Why would an AI try to figure out its goals?

cousin_it

"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives

"This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)." -- Eliezer Yudkowsky, What I Think, If Not Why

I have stopped understanding why these quotes are correct. Help!

More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't expecting it to do so anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?

Goals don't need to be specified explicitly; all that's required is that the future agent in fact has goals similar to the original agent's. However, since construction of the future agent is part of the original agent's behavior that contributes to the original agent's goals (by my definition), it doesn't necessarily make sense for the agent to prove that goals are preserved. It just needs to be true that they are (to some extent), more as an indication that we understand the original agent correctly than as a consideration that the agent takes into account.

For example, the original agent might be bad at accomplishing its "normative" goals: it does optimize the environment to some extent, but not very well. In that case the definition of its "normative" goals (which my definition relates to the actual effect on the environment) doesn't clearly derive from the original agent's construction, except through its tendency to construct future agents with certain goals (assuming it can do that true to the "normative" goals). The future agent's goals (as parameters of design) are then closer to the mark (the actual effect on the environment and the "normative" goals) than the original agent's (as parameters of design).

"However, since construction of the future agent is part of the original agent's behavior that contributes to the original agent's goals (by my definition), it doesn't necessarily make sense for the agent to prove that goals are preserved. It just needs to be true that they are (to some extent), more as an indication that we understand the original agent correctly than as a consideration that the agent takes into account."

(Emphasis added.) For that sense of "specify", I agree.
