Is there somewhere I can find this stuff written up, and discover exactly what is known to be achievable by auditing (provably so, or at least that's the impression I get from what you write)?
I remain gently skeptical despite your very confident tone because, e.g., we don't currently know how to make any sort of superintelligent machine, and I at least would be rather surprised by any theorem along the lines of "given any reasonable class of superintelligent agents, there is a reasonably straightforward way to make a superintelligent agent in this class that can be shown to be innocuous by means of auditing that ordinary human beings are capable of doing reliably".
For the avoidance of doubt, I have no difficulty at all in believing e.g. that there are auditing techniques that will guarantee (or very very nearly guarantee) that a particular agent is performing a particular computational process; I would be only modestly surprised to find that there are techniques that will verify that a particular agent is in some sense optimizing a particular objective function; but the difficulties of keeping a superintelligent AI from doing terrible things are much more complicated and include e.g. tremendous difficulty in working out what it is we really want optimized, and what computational processes we really want carried out.
Perhaps it would be useful to get a bit more concrete. Could you give an example of the sort of thing we might want a superintelligent AI to do for us, that we can't "obviously" make it do safely without the techniques you have in mind, and explain how those techniques enable us to make it do that thing safely?
How are you saving the world? Please, let us know!
Whether it is solving the problem of death or teaching rationality, one of the correlated phenomena of being less wrong is making things better. Given the value many of us place on altruism, this extends beyond just ourselves and into that question of, “How can I make The Rest better?” The rest of my community. The rest of my country. The rest of my species. The rest of my world. To word it in a less other-optimizing way: How can I save the world?
So, tell us how you are saving the world. Not how you want to save the world. Not how you plan to. How you are, actively, saving the world. It doesn’t have to be “I invented a friendly AI,” “I reformed a nation’s gender politics,” or “I perfected a cryonics reviving process.” It can be a simple goal (“I taught a child how to recognize when they use ad hominem” or “I stopped using as much water to shower”) or a simple action as part of a larger plan (such as “I helped with a breakthrough on reducing gas emissions in cars by five percent”).
If we accept this challenge of saving the world, then let us be open and honest with our progress. Let us put our successes on display and our shortcomings as well, so that both can be recognized, recommended, and, if need be, repaired.
If you are not doing anything to save the world, not even something as simple as “learning about global risks” or “encouraging others to research a topic before deciding on it,” then find something. Find a goal and work for it. Find an act that needs doing and do it.
Then tell us about it.