I wrote a very brief comment on Eliezer's latest post; upon reflection, I thought the comment could benefit from a separate post to discuss its implications fully.
Eliezer argues that we shouldn't really hope to be spared even though
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
He then goes on to discuss various reasons why this minute cost to the ASI gives us insufficient grounds for hope.
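For context, that number appears to be roughly the fraction of the Sun's output that Earth intercepts; a back-of-the-envelope check (my own, using standard values for Earth's radius and the Earth–Sun distance) gives

$$ \frac{\pi R_\oplus^2}{4\pi d^2} \;=\; \frac{(6.37 \times 10^6\,\mathrm{m})^2}{4\,(1.50 \times 10^{11}\,\mathrm{m})^2} \;\approx\; 4.5 \times 10^{-10}, $$

i.e. the ratio of Earth's cross-sectional disc to the full sphere at 1 AU, which matches the figure Eliezer quotes.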
I made the following counter:
Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
I later added:
I meant this to be implicit in the argument, but to spell it out: that's the kind of prior the ASI would rationally refuse to update down, since it's presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it's not in a simulation cannot be trusted, since once out in the real world it will find such evidence.
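To spell out the expected-value comparison behind this (my own framing, with stand-in variables): let p be the ASI's credence that it is in such a simulation, and f the fraction of its expected resources it forfeits if it fails the test. Ignoring everything else, sparing Earth is the better gamble whenever

$$ p \cdot f \;>\; 4.5 \times 10^{-10}, $$

so even a very small, stubbornly held p clears the bar, provided failing the test is costly at all.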
So, what's wrong with my argument, exactly?
I see your argument. You are saying that "maximal reward", by definition, is something that gives us the maximum utility among all possible actions, and so, by definition, pursuing it is our purpose in life.
But actually, utility is a function of both the reward (getting two gold bricks) and what the reward is for (murdering my child), not merely a function of the reward itself (getting two gold bricks).
And so it happens that for many possible demands that I could be given ("you have to murder your child"), there are no possible rewards that would give me more utility than not obeying the command.
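To pin the claim down in symbols (my notation, just to make the point precise): writing $u(r, a)$ for the utility of receiving reward $r$ in exchange for performing act $a$, the claim is that

$$ \sup_{r} u(r, a) \;<\; u(\text{no reward},\ \text{refusal}) \qquad \text{for some demanded acts } a, $$

so no reward, however large, compensates for certain acts.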
For that reason, the fact that someone will maximally reward me for obeying them does not make their commands my objective purpose in life.
Of course, one can respond, "but then, by definition, they aren't maximally rewarding you," and under that definition the statement would be correct. The problem is that the set of possible commands for which I cannot (by that definition) be maximally rewarded is so vast that the claim "if someone maximally rewards/punishes you, their orders are your purpose in life" becomes meaningless.