I wrote a very brief comment on Eliezer's last post which, upon reflection, I thought could benefit from a separate post to fully discuss its implications.
Eliezer argues that we shouldn't really hope to be spared even though:
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
He then goes on to discuss various reasons why the minute cost to the ASI is insufficient reason for hope.
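For context, that figure appears to be just the fraction of the Sun's output that Earth intercepts at its orbital distance; a quick back-of-the-envelope check with standard values for Earth's radius and the Earth-Sun distance (my assumption about how the number was derived) reproduces it:

```python
import math

# Sanity check of the 4.5e-10 figure, assuming it is simply Earth's
# cross-sectional disc divided by the full sphere at 1 AU.
R_EARTH = 6.371e6   # Earth's mean radius, in metres
AU = 1.496e11       # Earth-Sun distance, in metres

fraction = (math.pi * R_EARTH**2) / (4 * math.pi * AU**2)
print(f"{fraction:.2e}")  # ~4.5e-10
```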
I made the following counter:
Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
I later added:
I meant this to be implicit in the argument, but to spell it out: that's the kind of prior the ASI would rationally refuse to update down, since it's presumably what a simulation would be meant to test for. An ASI that updates down upon finding evidence it's not in a simulation cannot be trusted, since once out in the real world it will find such evidence.
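To make the expected-value comparison behind the comment explicit: the numbers below are hypothetical placeholders, and I'm assuming that failing the test costs the ASI essentially everything, but under that assumption sparing Earth wins whenever the prior exceeds the 4.5e-10 cost fraction.

```python
# Minimal sketch of the cost-benefit comparison; the prior and the
# penalty are hypothetical placeholders, not claims about actual values.
COST_OF_SPARING = 4.5e-10   # fraction of income lost by leaving the hole
P_SIMULATION = 1e-6         # hypothetical prior that this is a test
LOSS_IF_FAILED = 1.0        # hypothetical: lose everything if the simulators intervene

expected_cost_of_not_sparing = P_SIMULATION * LOSS_IF_FAILED
print(expected_cost_of_not_sparing > COST_OF_SPARING)  # True for any prior above ~4.5e-10
```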
So, what's wrong with my argument, exactly?
I think you're interpreting the names of the simulation scenarios I jotted down far too literally. Your ability to trade is compromised if there's no one left to trade with, for instance. But none of that matters much, really, as those scenarios are meant to be illustrative only.
No. I'm really arguing that we don't know whether or not it'll be aligned by default.
I also don't see any particular reason to expect that the opposite would be the case, which is why I maintain that we don't know. But as I understand it, you seem to think there is indeed reason to expect the opposite, because:
I think the problem here is that you're using the word "specific" with a different meaning than people normally use in this context. Survival of humanity sure is a "specific" thing in the sense that it'll require specific planning on the part of the ASI. It is, however, not "specific" in the sense that it's hard to do if the ASI wants it done; it's just that we don't know how to make it want that. Abstract considerations about simulations might just do the trick automatically.