I don't think you can dismiss the "then those aren't really top-level goals" argument as easily as you are trying to.
I wasn't trying to dismiss it, I was trying to refute it.
Sure, if you design an AI to do nothing but collect coins then it will not decide to go off and be a poet and forget about collecting coins. As you said, the failure mode to be more worried about is that it decides to convert the entire solar system into coins, or to bring about a stock market crash so that coins are worth less, or something.
Though ... if you have an AI system with substantial ability to modify itself, or to make replacements for itself, in pursuit of its goals, then it seems to me you do have to worry about the possibility that this modification/replacement process can (after much iteration) produce divergence from the original goals. In that case the AI might become a poet after all.
(Solving this goal-stability problem is one of MIRI's long-term research projects, AIUI.)
I'm wondering whether we're at cross purposes somehow, because it seems like we both think what we're saying in this thread is "LW orthodoxy" and we both think we disagree with one another :-). So, for the avoidance of doubt,
I guess I'm confused then. It seems like you are agreeing that computers will only do what they are programmed to do. Then you stipulate a computer programmed not to change its goals. So...it won't change its goals, right?
Like:
Objective A: Never mess with these rules.
Objective B: Collect paperclips unless it would mess with A.
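To make the two-rule idea concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `Action` class, the `modifies_rules` flag, and the `choose` function are hypothetical names, and it simply assumes we can reliably label which actions would tamper with the rules, which is of course the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    paperclips: int       # paperclips gained by taking this action (made-up numbers)
    modifies_rules: bool  # assumption: we can tell whether an action alters the objectives


def choose(actions):
    """Pick an action under a strict priority ordering of the two objectives."""
    # Objective A: discard anything that would mess with the rules.
    allowed = [a for a in actions if not a.modifies_rules]
    if not allowed:
        return None  # nothing is permitted, so do nothing
    # Objective B: among what is left, collect the most paperclips.
    return max(allowed, key=lambda a: a.paperclips)


options = [
    Action("run the factory", 10, False),
    Action("rewrite own goals to value poetry", 0, True),
    Action("delete Objective A, then strip-mine everything", 1000, True),
]
best = choose(options)
print(best.name)
```

The point of the ordering is that Objective A acts as a filter rather than as one more term to trade off: no number of paperclips can buy back an action that A rules out.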
Researchers are wondering how we'll make these 'stick', but the fundamental notion of how to box someone whose utility function you get to write is not complicated. You make it want to stay in the box, or rather, the box is made o...
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.