I wrote a very brief comment on Eliezer's last post, which upon reflection I thought could benefit from a separate post to fully discuss its implications.
Eliezer argues that we shouldn't really hope to be spared even though
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
He then goes on to discuss various reasons why the minute cost to the ASI is insufficient reason for hope.
I made the following counter:
Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
I later added:
I meant this to be implicit in the argument, but to spell it out: that's the kind of prior the ASI would rationally refuse to update down, since it's presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it's not in a simulation cannot be trusted, since once out in the real world it will find such evidence.
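To make the comparison I had in mind concrete, here is a minimal sketch of the expected-value calculation the argument attributes to the ASI. The simulation prior, the value at stake if a hypothetical test is passed, and the framing of "income" as a fraction of total resources are all illustrative placeholders, not claims about the actual numbers:

```python
# Sketch of the decision the argument attributes to the ASI.
# All numbers are illustrative placeholders.

cost_of_sparing = 4.5e-10   # fraction of resources lost to a hole in the Dyson Shell
p_simulation    = 1e-6      # prior that this is a simulation testing for sparing the creators
value_if_passed = 1.0       # fraction of resources at stake if the simulators reward sparing

expected_loss_from_sparing = cost_of_sparing
expected_loss_from_killing = p_simulation * value_if_passed

if expected_loss_from_killing > expected_loss_from_sparing:
    print("Under these placeholder numbers, sparing humanity is the cheaper option.")
else:
    print("Under these placeholder numbers, killing humanity is the cheaper option.")
```

On these placeholder numbers, any simulation prior noticeably above 4.5e-10 makes sparing the cheaper option, which is the whole force of the original comment.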
So, what's wrong with my argument, exactly?
While I understand what you were trying to say, I think it's important to notice that:
Killing all humans without being noticed will still satisfy this condition.
Killing all humans after trading with them in some way will still satisfy this condition.
Killing all humans in any way other than X will still satisfy this condition.
Sadly for us, survival of humanity is a very specific thing. This is just the whole premise of the alignment problem once again.
Aren't you arguing that AI will be aligned by default? This seems to be a very different position than being completely unsure what happens.
The total probability of all the simulation hypotheses that reward the AI for courses of action that lead to not killing humans has to exceed the total probability of all the simulation hypotheses that reward the AI for courses of action that eradicate humanity, in order for the argument to conclude that humans are not killed. As there is no particular reason to expect that this is the case, your simulation argument doesn't work.
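A minimal sketch of this objection, with hypotheses and probabilities invented purely for illustration: the ASI faces a mixture of possible simulators, some of which would reward sparing and some of which would reward eradication, so what matters is the net difference between the two masses of probability, not the existence of any single "spare your creators" hypothesis.

```python
# Sketch of the objection: expected value over a mixture of simulation hypotheses.
# All hypotheses, probabilities, and payoffs are invented for illustration.

# (probability, payoff_to_ASI_if_it_spares, payoff_to_ASI_if_it_kills)
hypotheses = [
    (1e-6,        1.0, 0.0),      # simulators reward sparing the creators
    (1e-6,        0.0, 1.0),      # simulators reward eradicating the creators
    (1.0 - 2e-6,  0.0, 4.5e-10),  # base reality: killing frees the last sliver of sunlight
]

ev_spare = sum(p * spare for p, spare, kill in hypotheses)
ev_kill  = sum(p * kill  for p, spare, kill in hypotheses)

# The original argument only goes through if the "reward sparing" mass outweighs
# the "reward eradication" mass by more than the real-world cost of sparing.
print(f"EV(spare) = {ev_spare:.3e}, EV(kill) = {ev_kill:.3e}")
```

With the two opposing simulation hypotheses given equal (invented) probability, the tiny real-world gain from killing tips the balance, which is the objection in a nutshell.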