One of the main principles of engineering safety is multilevel defence. When a nuclear bomb accidentally fell from the sky in the US, 3 of 4 defence levels failed. The last one prevented the nuclear explosion: https://en.wikipedia.org/wiki/1961_Goldsboro_B-52_crash
Multilevel defence is used a lot in the nuclear industry and includes different systems of passive and active safety, starting from the use of delayed neutrons for the reaction activation and up to control rods, containment building and exclusion zones.
Here, I present a look at the AI safety from the point of view of multilevel defence. This is mainly based on two of my yet unpublished articles: “Global and local solutions to AI safety” and “Catching treacherous turn: multilevel AI containment system”.
The special property of the multilevel defence, in the case of AI, is that the biggest defence comes from only the first level, which is AI alignment. Other levels have progressively smaller chances to provide any protection, as the power of self-improving AI will grow after it will break of each next level. So we may ignore all levels after AI alignment, but, oh Houston, we have a problem: based on the current speed of AI development, it seems that powerful and dangerous AI could appear within several years, but AI safety theory needs several decades to be created.
The map is intended to demonstrate a general classification principle of the defence levels in AI safety, but not to list all known ideas on the topic. I marked in “yellow” boxes, which are part of the plan of MIRI according to my understanding.
I also add my personal probability estimates as to whether each level will work (under the condition that AI risks are the only global risk, and previous levels have failed).
The principles of the construction of the map are similar to my “plan of x-risks prevention” map and my “immortality map”, which are also based around the idea of the multilevel defence.
pdf: https://goo.gl/XH3WgK
I hadn't thought about it that way.
I do think that either compiler time flags for the AI system or a second 'monitor' system chained to the AI system in order to enforce the named rules would probably limit the damage.
The broader point is that probabilistic AI safety is probably a much more tractable problem than absolute AI safety for a lot of reasons, to further the nuclear analogy, emergency shutdown is probably a viable safety measure for a lot of the plausible 'paperclip maximizer turns us into paperclips' scenarios.
"I need to disconnect the AI safety monitoring robot from my AI-enabled nanotoaster robot prototype because it keeps deactivating it" might still be the last words a human ever speaks, but hey, we tried.
There seems to be a complexity limit to what humans can build. A full GAI is likely to be somewhere beyond that limit.
The usual solution to that problem -- see the EY's fooming scenario -- is to make the process recursive: let a mediocre AI improve itself, and as it gets better it can improve itself more rapidly. Exponential growth can go fast and far.
This, of course, gives rise to another problem: you have no idea what the end product is going to look like. If you're looking at the gazillionth iteration, your compiler flags were probably lost around the t... (read more)