"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives
This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever). -- Eliezer Yudkowsky, What I Think, If Not Why
I have stopped understanding why these quotes are correct. Help!
More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't that anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?
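To make sure I'm not attacking a strawman, here is a toy sketch of the argument as I currently understand it. Everything below - the agent classes, the designers_intent function, the acceptance rules - is my own invention for illustration, not anything from Omohundro or Yudkowsky. The idea: an agent that carries an explicit representation of its objective can vet candidate self-modifications against that objective, while an agent whose objective is only implicit in its behaviour can only check that a modification behaves roughly like it already does, and a long chain of such locally harmless modifications can wander arbitrarily far from the original intent.

```python
import random

def designers_intent(action):
    """The objective the designer had in mind: prefer actions near 3.
    (A hypothetical stand-in for 'the AI's true goals'.)"""
    return -abs(action - 3)

def score(policy, objective, trials=500):
    """Monte Carlo estimate of how well a stochastic policy serves an objective."""
    return sum(objective(policy()) for _ in range(trials)) / trials

def noisy_policy(centre):
    """A stochastic policy that picks integers near `centre`."""
    return lambda: centre + random.randint(-1, 1)

class ExplicitGoalAgent:
    """Carries an explicit representation of its objective, so it can check
    that a candidate self-modification preserves (or improves) it."""
    def __init__(self):
        self.objective = designers_intent
        self.policy = noisy_policy(3)

    def consider(self, candidate):
        if score(candidate, self.objective) >= score(self.policy, self.objective):
            self.policy = candidate

class ImplicitAgent:
    """Behaves sensibly at first, but its 'goal' lives only in its policy.
    The only check it can run on a candidate modification is behavioural:
    'does it act roughly like I already do?'"""
    def __init__(self):
        self.policy = noisy_policy(3)

    def consider(self, candidate):
        overlap = sum(candidate() == self.policy() for _ in range(100))
        if overlap > 0:  # each accepted step is small, but the steps compound
            self.policy = candidate

if __name__ == "__main__":
    explicit, implicit = ExplicitGoalAgent(), ImplicitAgent()
    centre = 3
    for _ in range(100):                     # 100 rounds of self-modification
        centre += random.choice([-1, 0, 1])  # candidate policies wander randomly
        candidate = noisy_policy(centre)
        explicit.consider(candidate)
        implicit.consider(candidate)
    print("explicit agent's score under the original objective:",
          round(score(explicit.policy, designers_intent), 2))
    print("implicit agent's score under the original objective:",
          round(score(implicit.policy, designers_intent), 2))
```

Run a few times, the implicit agent usually ends up far from what the designer intended while the explicit one stays put. What I don't see is why a real semi-haphazard AI would be motivated to turn itself into something like ExplicitGoalAgent in the first place.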
Thanks for clarifying. I think Steve is using "sufficiently powerful" to mean "sufficiently intelligent" - and quite a few definitions of intelligence have to do with being goal-directed.
The main reason most humans don't murder people to get what they want is that prison sentences conflict with their goals - not that they are insufficiently goal-directed, IMO. They are constrained by society's disapproval and act within those constraints. In warfare, society approves, and then other people actually do die.
Most creatures are as goal-directed as evolution can make them. It is true that parasites and symbiotes mean that composite systems are sometimes optimising multiple goals simultaneously. Memetic parasites are quite significant for humans - and they will probably be quite significant for intelligent machines as well. Systems with parasites are not seriously inconsistent with a goal-directed model, though: from the perspective of such a model, parasites are part of the environment.
Machines that are goal-directed until their goal is complete are another real possibility, besides open-ended optimisation. However, while their goal remains incomplete, goal-directed models would still seem to apply.
Of the seventy-some definitions of intelligence gathered at last count, most have something to do with achieving goals. That is a very different thing from being goal-directed (which has several additional requirements, the most obvious being an explicit representation of one's goals).
Would you murder your next-door neighbor if you thought...