A secure operating system for governed matter doesn't need to take the form of a powerful optimization process
This seems non-obvious. (So I'm surprised to see you state it as if it was obvious. Unless you already wrote about the idea somewhere else and are expecting people to pick up the reference?) If we want the "secure OS" to stop posthumans from running private hell simulations, it has to determine what constitutes a hell simulation and successfully detect all such attempts despite superintelligent efforts at obscuration. How does it do that without being superintelligent itself?
nor does verification of transparent agents trusted to run at root level
This sounds interesting but I'm not sure what it means. Can you elaborate?
Hm, that's true. Okay, you do need enough intelligence in the OS to detect certain types of simulations / and/or the intention to build such simulations, however obscured.
If you can verify an agent's goals (and competence at self-modification), you might be able to trust zillions of different such agents to all run at root level, depending on what the tiny failure probability worked out to quantitatively.
One open question in AI risk strategy is: Can we trust the world's elite decision-makers (hereafter "elites") to navigate the creation of human-level AI (and beyond) just fine, without the kinds of special efforts that e.g. Bostrom and Yudkowsky think are needed?
Some reasons for concern include:
But if you were trying to argue for hope, you might argue along these lines (presented for the sake of argument; I don't actually endorse this argument):
The basic structure of this 'argument for hope' is due to Carl Shulman, though he doesn't necessarily endorse the details. (Also, it's just a rough argument, and as stated is not deductively valid.)
Personally, I am not very comforted by this argument because:
Obviously, there's a lot more for me to spell out here, and some of it may be unclear. The reason I'm posting these thoughts in such a rough state is so that MIRI can get some help on our research into this question.
In particular, I'd like to know: