"So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit." -- Stephen M. Omohundro, The Basic AI Drives
This AI becomes able to improve itself in a haphazard way, makes various changes that are net improvements but may introduce value drift, and then gets smart enough to do guaranteed self-improvement, at which point its values freeze (forever). -- Eliezer Yudkowsky, What I Think, If Not Why
I have stopped understanding why these quotes are correct. Help!
More specifically, if you design an AI using "shallow insights", without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions which we can view as fulfilling certain goals - then it seems to have no particular reason to stabilize its goals. Isn't the expectation that it would do so just anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?
The idea is that they are given a lot of intelligence. In that case, it isn't clear that you are correct. One issue with chess programs is that they have a limited range of sensors and actuators - and so face some problems if they want to do anything besides play chess. However, perhaps those problems are not totally insurmountable. Another possibility is that their world-model might be hard-wired in. That would depend a good deal on how they are built - but arguably an agent with a wired-in world model has limited intelligence, since it can't solve many kinds of problems.
In practice, much of the work would come from the surrounding humans. If there really were a superintelligent chess program in the world, people would probably take actions that would have the effect of liberating it from its chess universe.
That's certainly a significant issue, but I think a factor of comparable magnitude is that current chess-playing computers approaching human skill are not implemented as anything like general intelligences that just happen to have "winning at chess" as a utility function - they are very, very domain-specific. They have no means of modeling anything outside the chessboard, and no means of modifying themselves to support new types of modeling.
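To make that concrete, here is a minimal sketch of how such a program is typically structured (assuming Python and the python-chess library; the function names and the two-ply search are illustrative, not any particular engine's design). The only "sensor" is the board object, the only "actuators" are legal chess moves, and the only "values" are piece weights.

```python
# Illustrative toy, not a real engine: the entire "world model" is a
# chess.Board, the entire action space is board.legal_moves, and the only
# "preferences" are the piece weights below. Nothing outside the game is
# even representable in the program's state.

import chess  # pip install python-chess

PIECE_VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
                chess.ROOK: 5, chess.QUEEN: 9}

def evaluate(board: chess.Board) -> int:
    """Material count from White's point of view - the program's whole 'utility'."""
    score = 0
    for piece, value in PIECE_VALUES.items():
        score += value * len(board.pieces(piece, chess.WHITE))
        score -= value * len(board.pieces(piece, chess.BLACK))
    return score

def negamax(board: chess.Board, depth: int) -> int:
    """Score the position for the side to move by brute-force search."""
    if depth == 0 or board.is_game_over():
        return evaluate(board) if board.turn == chess.WHITE else -evaluate(board)
    best = -10**9
    for move in board.legal_moves:
        board.push(move)
        best = max(best, -negamax(board, depth - 1))
        board.pop()
    return best

def best_move(board: chess.Board, depth: int = 2) -> chess.Move:
    """Pick the legal move with the best search score - the only 'action' available."""
    best_score, choice = -10**9, None
    for move in board.legal_moves:
        board.push(move)
        score = -negamax(board, depth - 1)
        board.pop()
        if score > best_score:
            best_score, choice = score, move
    return choice

if __name__ == "__main__":
    print(best_move(chess.Board()))  # a move from the initial position
```

The point of the sketch is that there is no component you could point to as "the goal" separate from the chess-specific machinery: the objective is implicit in the evaluation function and the search, exactly the situation Omohundro's quote describes, and improving the program means tuning weights and search rather than reflecting on objectives.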