Why would an AI try to figure out its goals?

cousin_it

"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives

"This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever)." -- Eliezer Yudkowsky, What I Think, If Not Why

I have stopped understanding why these quotes are correct. Help!

More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't that anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?

Hmm - so: publicly soliciting personally identifiable expressions of murderous intent is probably not the best way of going about this.

It was a rhetorical question. I'm confident the answer is no - the law only works when most people are basically honest.

We think we have a goal, and so we do by the ordinary English meaning of the word. But then there are things we are not prepared to do to achieve it, so it turns out that what we have is not a goal by the ultimate criterion of decision theory on which Omohundro draws. If we try to rescue the overuse of decision theory by appealing to a broader goal, it still doesn't work: regardless of what level you look at, there is no function such that humans will say "yes, this is my utility function, and I care about nothing but maximizing it."

The idea of goals in the sense of decision theory is like the idea of particles in the sense of Newtonian physics - a useful approximation for many purposes, provided we remember that it is only an approximation and that if we get a division by zero error the fault is in our overzealous application of the theory, not in reality.

OK - but even plants are optimising. There are multiple optimisation processes.

Precisely. There are many optimization processes - and none of them work the way they would need to work for Omohundro's argument to be relevant.

What do you mean exactly? Humans have the pieces for it to be relevant, but many constraints keep it from applying, such as the difficulty of changing our brains' design. A mind very like a human's that could test out new brain components and organizations seems like it would fit the argument.

timtyler (15y ago, -2 points)

Hmm. This reminds me of my recent discussion with Matt M. about constraints. Optimising under constraints is extremely similar to optimising some different function that incorporates the constraints as utility penalties. Identifying constraints and then rejecting optimisation-based explanations just doesn't follow, IMHO. [...] ...and at this point, I usually just cite Dewey: [...] This actually only covers any computable agent. Humans might reject the idea that they are utility maximisers, but they are. Their rejection is likely to be signalling their mysteriousness and wondrousness - not truth seeking.
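To make the point about constraints and penalties concrete, here is a minimal toy sketch in Python. Everything in it is made up for illustration: the action set, the `utility` function, the `allowed` constraint and the penalty size are all assumptions, not anything taken from the discussion or from Dewey. It only shows that, for a small finite set of actions and a large enough penalty, "maximise under a hard constraint" and "maximise a penalised utility with no constraint" pick the same action.

```python
# Toy illustration only: all names and numbers here are hypothetical.
actions = [0, 1, 2, 3, 4, 5]


def utility(a):
    """Unconstrained preference: more is always better."""
    return 3 * a


def allowed(a):
    """Hard constraint: actions above 3 are forbidden."""
    return a <= 3


def penalized_utility(a, penalty=1000):
    """Same preference, with the constraint folded in as a utility penalty."""
    return utility(a) - (0 if allowed(a) else penalty)


# Agent 1: maximise utility over the allowed actions only.
constrained_choice = max((a for a in actions if allowed(a)), key=utility)

# Agent 2: maximise the penalised utility over every action.
penalized_choice = max(actions, key=penalized_utility)

print(constrained_choice, penalized_choice)
```

The two agents agree here only because the penalty is larger than any utility gain available from breaking the constraint. With a small penalty (say, 1) the penalised agent would pick a forbidden action, so the equivalence depends on how the penalty is chosen.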
