If you value your existing utility function, then it seems that it would be more stable and you would modify it less.
In my case, I found out that my utility function was given to me by evolution, to which I don't feel much loyalty. So I found out I didn't value my utility function, and I was frightened of what it might change into. But then it turned out that very little modification occurred. To some extent, this was the result of a historical pattern -- I value lots of things out of habit; in particular, lots of values still have an FOV as their logical foundation, and I haven't bothered to work on updating them -- but I also notice how many of my values were redundantly hard-wired into my biology. I feel like I'm walking around discovering what my mirror neurons would have me value, and it's not that different from what I valued before. The main difference is that I think I now value things in a more near-mode way, and the far-mode values have fallen by the wayside. The far-mode values either need to redevelop in the absence of an FOV, or they depend upon logical justifications that are absent without the FOV.
For example, I used to hope that humans would learn to be friendlier so that the universe would be a better place. I now sort of see human characteristics as just a fact, and to the extent that they don't affect me directly (for example, how humans behave 30 generations from now), I don't care.
It's not a question of valuing my existing utility function. It's a question of using my existing utility function as a basis for differentially valuing everything else, including itself.
Sure, if I'm trying to derive what I ought to care about, from first principles, and I ignore what I actually do care about in the process, then I'm stuck... there's no reason to choose one thing over another. The endpoint of that is, as you say, apathy.
But why should I ignore what I actually do care about?
If I find that I care about whether people suffer, for example --...
Link: physicsandcake.wordpress.com/2011/01/22/pavlovs-ai-what-did-it-mean/
Suzanne Gildert basically argues that any AGI that can considerably self-improve would simply alter its reward function directly. I'm not sure how she arrives at the conclusion that such an AGI would likely switch itself off. Even if an abstract general intelligence would tend to alter its reward function, wouldn't it do so indefinitely rather than switching itself off?
If it wants to maximize its reward by increasing a numerical value, why wouldn't it consume the universe doing so? Maybe she had something in mind along the lines of an argument by Katja Grace:
Link: meteuphoric.wordpress.com/2010/02/06/cheap-goals-not-explosive/
I am not sure if that argument would apply here. I suppose the AI might hit diminishing returns but could again alter its reward function to prevent that, though what would be the incentive for doing so?
ETA:
I left a comment over there:
ETA #2:
What else I wrote: