I wrote a very brief comment on Eliezer’s latest post, which on reflection I thought deserved a separate post to discuss its implications more fully.
Eliezer argues that we shouldn't really hope to be spared even though
Asking an ASI to leave a hole in a Dyson Shell, so that Earth could get some sunlight not transformed to infrared, would cost It 4.5e-10 of Its income.
He then goes on to discuss various reasons why this minuscule cost to the ASI is not sufficient grounds for hope.
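(As an aside, and purely as my own reconstruction rather than anything stated in Eliezer's post: that figure appears to be Earth's cross-section expressed as a fraction of the area of a full sphere at 1 AU, which a quick back-of-the-envelope check reproduces.)

```python
import math

# Back-of-the-envelope check of the ~4.5e-10 figure: Earth's cross-sectional
# area (pi * R_earth^2) divided by the area of a full sphere at 1 AU
# (4 * pi * AU^2), i.e. the fraction of the Sun's output that Earth intercepts.
R_EARTH = 6.371e6   # mean radius of Earth, meters
AU = 1.496e11       # mean Earth-Sun distance, meters

fraction = (math.pi * R_EARTH**2) / (4 * math.pi * AU**2)
print(f"{fraction:.3g}")  # ~4.53e-10
```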
I made the following counter:
Isn’t the ASI likely to ascribe a prior much greater than 4.54e-10 that it is in a simulation, being tested precisely for its willingness to spare its creators?
I later added:
I meant this to be implicit in the argument, but to spell it out: that's the kind of prior the ASI would rationally refuse to update down, since robustness against exactly that update is presumably part of what such a simulation would be meant to test for. An ASI that updates the prior down upon finding evidence that it's not in a simulation cannot be trusted, since once out in the real world it will find such evidence.
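To make the expected-value shape of the argument explicit (this is my own toy framing; the payoff structure and the numbers below are illustrative assumptions, not anything from Eliezer's post or my comment): sparing Earth wins whenever the ASI's credence that it is being tested, multiplied by what it stands to lose by failing that test, exceeds the 4.5e-10 of resources the hole in the Dyson Shell would cost.

```python
# Toy expected-value sketch of the simulation-test argument (my own framing;
# the payoff structure and the numbers below are illustrative assumptions).
def should_spare_earth(p_test: float,
                       loss_if_caught: float,
                       cost_of_sparing: float = 4.5e-10) -> bool:
    """Return True if sparing Earth has higher expected value than not.

    All quantities are fractions of the ASI's total resources:
    p_test          -- credence that it is in a simulation testing for sparing
    loss_if_caught  -- resources forfeited if it fails such a test
    cost_of_sparing -- direct cost of leaving the hole in the Dyson Shell
    """
    return p_test * loss_if_caught > cost_of_sparing

# Example: even a one-in-a-million credence, with failure costing 10% of
# resources, makes sparing Earth the better bet by orders of magnitude.
print(should_spare_earth(p_test=1e-6, loss_if_caught=0.1))  # True
```

The point of the sketch is only that the direct cost is so tiny that almost any non-negligible, non-updatable credence in being tested dominates it.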
So, what's wrong with my argument, exactly?
That’s a great question. If it turns out to be something like an LLM, I’d say probably yes. More generally, it seems to me at least plausible that a system capable enough to take over would also (necessarily or by default) be capable of abstract reasoning like this, but I recognize the opposite view is also plausible, so the honest answer is that I don’t know. Even if the latter turns out to be the case, though, whether the system ends up with such abstract-reasoning capability seems at least partially within our control, since it likely depends heavily on the underlying technology and training.