Your comment seems absolutely right, I have no idea where the whole 'turn itself off' thing came from.
I doubt diminishing returns would come into effect. Examples like Graham's number and Conway chained arrow notation are strong evidence that the task of 'store the biggest number possible' does not run into diminishing returns, but instead achieves accelerating returns of truly mind-boggling proportions.
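To give a feel for that growth, here is a minimal sketch using Knuth's up-arrow notation, which is the notation Graham's number is usually defined in (Conway's chained arrows generalise it). The function name `up_arrow` is my own choice, and only tiny inputs can actually be evaluated.

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Compute a (n arrows) b in Knuth's up-arrow notation.

    One arrow is ordinary exponentiation; each extra arrow iterates
    the operation with one fewer arrow.
    """
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


print(up_arrow(3, 1, 3))  # 3^3 = 27
print(up_arrow(3, 2, 3))  # 3^(3^3) = 7625597484987
print(up_arrow(2, 3, 3))  # 2^^4 = 2^(2^(2^2)) = 65536
```

Adding a single arrow to the description costs one extra symbol, yet 3 with three arrows and 3 is already a power tower of 3s more than seven trillion levels high, far beyond anything this function could evaluate.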
However, I have to admit that I think the whole idea is rubbish. The main problem is that the author is confusing two different tasks: "maximise the extent to which the future meets my future preferences" and "maximise the extent to which the future meets my current preferences".
To explain what I mean more rigorously, suppose we have an AI with a utility function U0, which is considering whether or not it should alter its utility function to a new function U1. It extrapolates possible futures and deduces that if it sticks with U0 the universe will end up in state A, whereas if it switches to U1 the universe will end up in state B (e.g. if U0 is paper-clip maximising, then A contains a lot of paper-clips).
"Maximise the extent to which the future meets my future preferences" means it will switch if and only if U1(B) > U0(A)
As the article points out, it is very easy to find a U1 which meets this criterion: simply define U1(x) = U0(x) + 1 (actions are unaffected by positive affine transformations of a utility function, so B = A for this choice of U1).
"Maximise the extent to which the future meets my current preferences" means it will switch if and only if U0(B) > U0(A)
This criterion is much more demanding. For example, U1(x) = U0(x) + 1 clearly no longer works: it gives B = A, so U0(B) = U0(A).
I suspect that for most internally consistent utility functions this criterion is impossible to satisfy (thought experiment: is there any utility function a paper-clip maximiser could switch to which would result in a universe containing more paper-clips?).
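The two criteria can be sketched in a few lines of Python. This is only a toy model under assumptions of my own: a world state is just a paper-clip count, the agent is a perfect optimiser that picks the best of a fixed list of reachable futures, and all names (`predicted_outcome`, `switch_by_future_preferences`, `switch_by_current_preferences`) are hypothetical.

```python
from typing import Callable, Sequence

World = int  # assumption: a world state is just its paper-clip count
Utility = Callable[[World], float]


def u0(world: World) -> float:
    """Current utility function: one util per paper-clip."""
    return float(world)


def u1(world: World) -> float:
    """Candidate replacement: U1(x) = U0(x) + 1."""
    return u0(world) + 1.0


def predicted_outcome(utility: Utility, futures: Sequence[World]) -> World:
    """Assume an agent steers the universe to the future its utility ranks highest."""
    return max(futures, key=utility)


def switch_by_future_preferences(old: Utility, new: Utility,
                                 futures: Sequence[World]) -> bool:
    """Switch iff U1(B) > U0(A): judge B by the new function."""
    a = predicted_outcome(old, futures)
    b = predicted_outcome(new, futures)
    return new(b) > old(a)


def switch_by_current_preferences(old: Utility, new: Utility,
                                  futures: Sequence[World]) -> bool:
    """Switch iff U0(B) > U0(A): judge both outcomes by the current function."""
    a = predicted_outcome(old, futures)
    b = predicted_outcome(new, futures)
    return old(b) > old(a)


futures = [0, 10, 1000]  # hypothetical reachable futures
print(switch_by_future_preferences(u0, u1, futures))   # True
print(switch_by_current_preferences(u0, u1, futures))  # False
```

In this toy model the second test can never return True, because A is by construction the best reachable future under U0. An agent with imperfect search or self-prediction would not be covered by that argument.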
Even if I am wrong about it being mostly impossible, it is not an especially worrying problem. I would have no problem with an FAI switching to a new utility function which was even more friendly than the one we gave it.
Of course, you could program an AI to do either of the tasks, but there are a number of reasons why I consider the second to be better. Firstly, for all the reasons the article gives, it is more likely to do whatever you wanted it to do. Secondly, it is more general, since the former can be given as a special case of the latter.
The article's mistake is right there in the title: it fails to break out of the rather anthropomorphic reward/punishment mode of thinking.
Your comment seems absolutely right, I have no idea where the whole 'turn itself off' thing came from.
Suzanne is proposing that that's (essentially) what happens to wireheads when they finger their reward signal - they collapse in an ecstatic heap.
In reality, there are, of course, other types of wirehead behaviour to consider. The heroin addict doesn't exactly collapse in a corner when looking for their next fix.
Link: physicsandcake.wordpress.com/2011/01/22/pavlovs-ai-what-did-it-mean/
Suzanne Gildert basically argues that any AGI that can considerably self-improve would simply alter its reward function directly. I'm not sure how she arrives at the conclusion that such an AGI would likely switch itself off. Even if an abstract general intelligence would tend to alter its reward function, wouldn't it do so indefinitely rather than switching itself off?
If it wants to maximize its reward by increasing a numerical value, why wouldn't it consume the universe doing so? Maybe she had something in mind along the lines of an argument by Katja Grace:
Link: meteuphoric.wordpress.com/2010/02/06/cheap-goals-not-explosive/
I am not sure if that argument would apply here. I suppose the AI might hit diminishing returns but could again alter its reward function to prevent that, though what would be the incentive for doing so?
ETA:
I left a comment over there:
ETA #2:
What else I wrote: