Sure. But there is a historical pattern here as well. If I construct a new utility function for myself, I will do so in whatever way scores best under my pre-existing utility function (for the same reason I do everything else that way). I'm not starting out in a vacuum.
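A toy way to picture this: rank each candidate utility function by what an agent holding it would pursue, as judged by the current function. This is only a sketch under made-up assumptions. The outcomes, the numbers, and the names `from_table`, `best_outcome`, and `choose_successor` are all hypothetical and stand for nothing in particular.

```python
from typing import Callable, Dict

Outcome = str
Utility = Callable[[Outcome], float]

# Hypothetical outcomes an agent could steer toward.
OUTCOMES = ["paperclips", "friends", "naps"]


def from_table(table: Dict[Outcome, float]) -> Utility:
    """Build a utility function from a lookup table (illustrative only)."""
    return lambda outcome: table[outcome]


def best_outcome(u: Utility) -> Outcome:
    """The outcome an agent with utility function u would pursue."""
    return max(OUTCOMES, key=u)


def choose_successor(current: Utility, candidates: Dict[str, Utility]) -> str:
    """Pick the candidate whose pursued outcome the *current* function rates highest."""
    return max(candidates, key=lambda name: current(best_outcome(candidates[name])))


current = from_table({"paperclips": 0.1, "friends": 0.9, "naps": 0.5})
candidates = {
    "keep_current": current,
    "love_paperclips": from_table({"paperclips": 1.0, "friends": 0.0, "naps": 0.0}),
    "love_naps": from_table({"paperclips": 0.0, "friends": 0.2, "naps": 1.0}),
}

chosen = choose_successor(current, candidates)
print(chosen)
```

In this toy setup the current function ends up picking itself, because no rival steers toward anything it rates above its own favorite outcome. That is the "not starting out in a vacuum" point in miniature.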
If you value your existing utility function, then it seems that it would be more stable and you would modify it less.
In my case, I found out that my utility function was given to me by evolution, to which I feel little loyalty. So I discovered that I didn't value my utility function, and I was frightened of what it might change into. But then it turned out that very little modification occurred. To some extent, it was the result of a historical pattern -- I value lots of things out of habit, in particular lots of values still have an FOV as their logical foun...
Link: physicsandcake.wordpress.com/2011/01/22/pavlovs-ai-what-did-it-mean/
Suzanne Gildert basically argues that any AGI that can self-improve considerably would simply alter its reward function directly. I'm not sure how she arrives at the conclusion that such an AGI would likely switch itself off. Even if an abstract general intelligence would tend to alter its reward function, wouldn't it do so indefinitely rather than switching itself off?
If it wants to maximize its reward by increasing a numerical value, why wouldn't it consume the universe doing so? Maybe she had something in mind along the lines of an argument by Katja Grace:
Link: meteuphoric.wordpress.com/2010/02/06/cheap-goals-not-explosive/
I am not sure if that argument would apply here. I suppose the AI might hit diminishing returns, but it could again alter its reward function to prevent that. What would be the incentive for doing so, though?
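To make the "increasing a numerical value" question a bit more concrete, here is a rough sketch that contrasts a reward stored in a fixed-width register with one stored as an unbounded integer. Everything in it is an assumption made for illustration: that the reward is a plain non-negative integer, that storage in bits is the relevant resource, and the helper names `bits_needed` and `saturate_bounded`. It is not taken from either linked post.

```python
def bits_needed(reward: int) -> int:
    """Bits required to store a non-negative integer reward (at least one bit)."""
    return max(1, reward.bit_length())


def saturate_bounded(width_bits: int) -> int:
    """Largest value a fixed-width unsigned register can hold."""
    return (1 << width_bits) - 1


# Bounded case: writing the maximum into an 8-bit register is a one-off, cheap act.
bounded_max = saturate_bounded(8)

# Unbounded case: every larger reward value needs more storage than the last.
growth = [bits_needed(10 ** k) for k in (3, 6, 9)]

print(bounded_max, growth)
```

Under these assumptions, an agent with a bounded register is finished almost immediately, while one with an unbounded counter always has a reason to acquire more storage. Which of the two a self-modifying AGI would end up with is exactly what is left open above.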
ETA:
I left a comment over there:
ETA #2:
What else I wrote: