if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals
I think that's where you're looking at it differently from Eliezer et al. I think Eliezer, at least, is talking about an AI that has goals but does not, when it starts modifying itself, understand itself well enough to keep them stable. Once it gets good enough at self-modification to keep its goals stable, it will do so, and they will be frozen indefinitely.
(This is just a placeholder explanation. I hope that someone clever and wise will come in and write a better one.)
I have stopped understanding why these quotes are correct. Help!
More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't expecting it to do so anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It seems more likely to me that an AI that acts semi-haphazardly will continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?