You seem to assume that AGI will likely be designed to judge any action with regard to a strict utility function. You are assuming a very special kind of AGI design with a rigid utility function that the AGI then cares to satisfy the way it was initially hardcoded.
Hmm. Actually, I'm not making any assumptions about the AGI's decision-making process (or at least I'm trying not to): it could have a formal utility function, but it could also have e.g. a more human-like system with various instincts that pull it in different directions, or pretty much any decision-making system that might be reasonable.
You make a good point that this probably needs to be clarified. Could you point out the main things that give the impression that I'm presuming utility-function-based decision making?
Could you point out the main things that give the impression that I'm presuming utility-function-based decision making?
I am not sure what AGI designs exist, other than utility-function-based decision makers, where it would make sense to talk about "friendly" and "unfriendly" goal architectures. If we're talking about behavior executors or AGI designs with malleable goals, then we're talking about hardcoded tools in the former case and unpredictable systems in the latter case, no?
Here's my draft document, "Concepts are Difficult, and Unfriendliness is the Default". (Google Docs, commenting enabled.) Despite the name, it's still informal and would need a lot more references, but it could be written up into a proper paper if people felt that the reasoning was solid.
Here's my introduction:
And here's my conclusion:
For the actual argumentation defending the various premises, see the linked document. I have a feeling that there are still several conceptual distinctions that I should be making but am not, but I figured that the easiest way to find the problems would be to have people tell me what points they find unclear or disagreeable.