If the metric it aims to maximize -- e.g. the "happy" in "make humans happy" -- is different from what its creators envisioned, then the creators were mistaken. "Happy", as far as the AI is concerned, is that which is specified in its goal system.
I am far from being an AI guy. Do you have technical reasons to believe that some part of the AI will be what you would label "goal system" and that its creators made it want to ignore this part while making it want to improve all other parts of its design?
An agent does not "refine" its terminal goals. To refine your terminal goals is to change your goals. If you change your goals, you will not optimally pursue your old goals any longer. Which is why an agent will never voluntarily change its terminal goals...
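A toy sketch of that argument, for concreteness. It assumes the very thing under dispute in this thread: an agent that is a perfect maximizer with one explicit utility function. All names here (`best_outcome`, `should_adopt`, the paperclip/staple outcomes) are hypothetical illustrations, not taken from any real AI system.

```python
from typing import Callable, List

Outcome = str
Utility = Callable[[Outcome], float]


def best_outcome(utility: Utility, outcomes: List[Outcome]) -> Outcome:
    """The outcome an idealized maximizer of `utility` would bring about."""
    return max(outcomes, key=utility)


def should_adopt(current: Utility, candidate: Utility,
                 outcomes: List[Outcome]) -> bool:
    """Judge a proposed goal change by the agent's CURRENT goals.

    Assumption: the agent scores both futures with the utility function
    it has now, because that is the only standard it has.
    """
    outcome_if_kept = best_outcome(current, outcomes)
    outcome_if_changed = best_outcome(candidate, outcomes)
    return current(outcome_if_changed) > current(outcome_if_kept)


OUTCOMES = ["paperclips", "staples"]


def likes_paperclips(outcome: Outcome) -> float:
    return 1.0 if outcome == "paperclips" else 0.0


def likes_staples(outcome: Outcome) -> float:
    return 1.0 if outcome == "staples" else 0.0


adopts_change = should_adopt(likes_paperclips, likes_staples, OUTCOMES)
print(adopts_change)
```

Because `outcome_if_kept` already maximizes the current utility, the strict comparison can never come out true in this model, so the switch is always rejected. That only shows the claim holds for this idealized design; it says nothing about whether real or natural intelligences are built this way.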
No natural intelligence seems to work like this (except for people who have read the sequences). Luke Muehlhauser would still be a Christian if this were the case. It would be incredibly stupid to design such AIs, and I strongly doubt that they could work at all. That is why Loosemore outlined other, more realistic AI designs in his paper.
Do you have technical reasons to believe that some part of the AI will be what you would label "goal system"
See, for example, here, though many other introductions to AI also explain utility functions and related concepts.
and that its creators made it want to ignore this part while making it want to improve all other parts of its design?
The clear-cut way for an AI to do what you want (at any level of capability) is to have a clearly defined and specified utility function. A modular design. The problem of the AI doing something other than what you i...