That is the general idea of universal instrumental values, yes.
I am aware of that argument but don't perceive it to be particularly convincing.
Universal values are very similar to universal ethics, and for the same reasons that I don't think that an AGI will be friendly by default I don't think that it will protect its goals or undergo recursive self-improvement by default. Maximizing expected utility is, just like friendliness, something that needs to be explicitly defined, otherwise there will be no incentive to do so.
Universal values are very similar to universal ethics, and for the same reasons that I don't think that an AGI will be friendly by default I don't think that it will protect its goals or undergo recursive self-improvement by default.
I'm not really sure what you mean by "by default". The idea is that a goal-directed machine that is sufficiently smart will tend to do these things (unless its utility function says otherwise), at least if you can set it up so that it doesn't become a victim of the wirehead or pornography problems.
IMO, there's a big diff...
I have stopped understanding why these quotes are correct. Help!
More specifically, if you design an AI using "shallow insights" without an explicit goal-directed architecture - some program that "just happens" to make intelligent decisions that can be viewed by us as fulfilling certain goals - then it has no particular reason to stabilize its goals. Isn't that anthropomorphizing? We humans don't exhibit a lot of goal-directed behavior, but we do have a verbal concept of "goals", so the verbal phantom of "figuring out our true goals" sounds meaningful to us. But why would AIs behave the same way if they don't think verbally? It looks more likely to me that an AI that acts semi-haphazardly may well continue doing so even after amassing a lot of computing power. Or is there some more compelling argument that I'm missing?