Will sentient, self-interested agents ever be free from the existential risks of UFAI/intelligence amplification without some form of oversight? It's nice to think that humanity will grow up and learn how to get along, but even if that's true for 99.9999999% of humans that leaves 7 people from today's population who would probably have the power to trigger their own UFAI hard takeoff after a FAI fixes the world and then disappears. Even if such a disaster could be stopped it is a risk probably worth the cost of keeping some form of FAI around indefinitely. What FAI becomes is anyone's guess but the need for what FAI does will probably not go away. If we can't trust humans to do FAI's job now, I don't think we can trust humanity's descendents to do FAI's job either, just from Loeb's theorem. I think it is unlikely that humans will become enough like FAI to properly do FAI's job. They would essentially give up their humanity in the process.
Thanks for explaning the reasoning!
I do agree that it seems quite likely that even in the long run, we may not want to modify ourselves so that we are perfectly dependable, because it seems like that would mean getting rid of traits we want to keep around. That said, I agree with Eliezer's reply about why this doesn't mean we need to keep an FAI around forever; see also my comment here.
I don't think Löb's theorem enters into it. For example, though I agree that it's unlikely that we'd want to do so, I don't believe Löb's theorem would be an obstacle to mod...
One open question in AI risk strategy is: Can we trust the world's elite decision-makers (hereafter "elites") to navigate the creation of human-level AI (and beyond) just fine, without the kinds of special efforts that e.g. Bostrom and Yudkowsky think are needed?
Some reasons for concern include:
But if you were trying to argue for hope, you might argue along these lines (presented for the sake of argument; I don't actually endorse this argument):
The basic structure of this 'argument for hope' is due to Carl Shulman, though he doesn't necessarily endorse the details. (Also, it's just a rough argument, and as stated is not deductively valid.)
Personally, I am not very comforted by this argument because:
Obviously, there's a lot more for me to spell out here, and some of it may be unclear. The reason I'm posting these thoughts in such a rough state is so that MIRI can get some help on our research into this question.
In particular, I'd like to know: