And since the task of constructing the future agent includes a specification of goals . . .
There seems to be a leap here. An agent, qua agent, has goals. But is it clear that the historical way in which the future-agent is constructed by the original agent must pass through an explicit specification of the future-agent's goals? The future-agent could be constructed that way, but must it? (Analogously, a composite integer has factors, but a composite can be constructed without explicitly specifying its factors.)
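(A concrete toy version of that analogy, as a minimal Python sketch of my own: the search below hands you a number that is provably composite, so it certainly has factors, yet at no point are those factors specified or computed.)

```python
import random

def fermat_witness(n, trials=20):
    """Look for a base a with pow(a, n - 1, n) != 1, which proves n is composite."""
    for _ in range(trials):
        a = random.randrange(2, n - 1)
        if pow(a, n - 1, n) != 1:
            return a
    return None

def composite_without_factoring(bits=64):
    """Return a number known to be composite, without ever computing its factors."""
    while True:
        n = random.getrandbits(bits) | 1      # random odd candidate
        if n > 3 and fermat_witness(n) is not None:
            return n                          # provably composite; factors never named

print(composite_without_factoring())
```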
Goals don't need to be specified explicitly; all that's required is that the future agent does in fact have goals similar to the original agent's. However, since constructing the future agent is part of the original agent's behavior, and that behavior contributes to the original agent's goals (by my definition), it doesn't necessarily make sense for the agent to prove that its goals are preserved; it just needs to be true that they are (to some extent). This is more an indication that we understand the original agent correctly than a consideration that the agent itself takes into account.
For example, original a...
I have stopped understanding why these quotes are correct. Help!
More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't that anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?
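To make the distinction I have in mind concrete, here is a toy sketch (my own illustration, not anything from the quotes): the first thermostat below behaves in a way an observer could describe as "keeping the room near 20 degrees", yet no goal is represented anywhere in it as data that a later self-modification step could inspect or preserve; the second reifies the goal explicitly.

```python
def reflex_thermostat(temperature: float) -> str:
    # Hard-coded stimulus-response rules; no objective is represented anywhere.
    if temperature < 19.0:
        return "heat"
    if temperature > 21.0:
        return "cool"
    return "idle"

class GoalDirectedThermostat:
    """An architecture where the goal is explicit data a successor could read off."""

    def __init__(self, target: float = 20.0):
        self.target = target  # the goal, reified as an object

    def act(self, temperature: float) -> str:
        if temperature < self.target - 1.0:
            return "heat"
        if temperature > self.target + 1.0:
            return "cool"
        return "idle"

for t in (17.5, 20.0, 23.0):
    print(t, reflex_thermostat(t), GoalDirectedThermostat().act(t))
```

The question is whether the first kind of system, scaled up, would ever acquire a reason to turn itself into the second kind, rather than just continuing to act semi-haphazardly.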