Central claim: Ambitious alignment of superintelligent AI will not be solved in time; therefore, different variants of a plan B should be explored and implemented.
Below is a short overview of, and links to, my works on AI safety. The central theme of all of them is “plan B”, so the collection could be called the “plan B agenda”.
Ambitious alignment of an arbitrary superintelligent AI can’t be solved before AGI is created: a) any proof of alignment would be too complex and unreliable; b) creating it would require at least a superintelligent AI safety team; c) dangerous AI may appear soon, possibly by 2030. Many different types of AI-related x-risks are possible, not only the paperclipper.
The best available solution is limiting the power of AI: artificial stupidity, a limited number of use cases, internal surveillance, and multilevel boxing.
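As an illustration only, here is a minimal sketch of what “a limited number of use cases plus internal surveillance” could look like as a software wrapper around a model. The class name `BoxedAI`, the task allowlist, the output cap, and the audit log are my illustrative assumptions, not a design taken from the papers listed below.

```python
# Hypothetical sketch of a "boxed" AI wrapper; names and policies are illustrative.
import datetime
import json

ALLOWED_TASKS = {"summarize", "translate", "classify"}  # limited number of use cases


class BoxedAI:
    def __init__(self, model, log_path="audit.log", max_output_chars=2000):
        self.model = model                        # any callable: prompt -> text
        self.log_path = log_path                  # internal surveillance: log everything
        self.max_output_chars = max_output_chars  # "artificial stupidity": cap output size

    def run(self, task: str, prompt: str) -> str:
        if task not in ALLOWED_TASKS:
            raise PermissionError(f"Task '{task}' is outside the allowed use cases")
        output = self.model(prompt)[: self.max_output_chars]
        self._log(task, prompt, output)
        return output

    def _log(self, task: str, prompt: str, output: str) -> None:
        record = {
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "task": task,
            "prompt": prompt,
            "output": output,
        }
        with open(self.log_path, "a") as f:
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Trivial stand-in model, just to show the wrapper in use.
    box = BoxedAI(model=lambda p: p.upper())
    print(box.run("summarize", "some long text"))
```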
A safer AI architecture is needed: data-driven AI with a human-like functional architecture, for interpretability. A human-like AI could be modelled on a real-world person: it is not an upload, but it is very similar to side-loading, e.g. a model trained on that person’s texts.
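As a rough illustration of the side-loading idea, here is a minimal sketch of fine-tuning a small causal language model on one person’s writings, assuming the Hugging Face `transformers` and `datasets` libraries. The base model (`gpt2`), the file `my_texts.txt`, and the hyperparameters are placeholder assumptions, not a tested recipe.

```python
# Illustrative sketch: fine-tune a small causal LM on one person's texts ("side-loading").
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # small stand-in base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A plain text file with the person's writings, one passage per line (assumed format).
dataset = load_dataset("text", data_files={"train": "my_texts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sideload",
                           num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("sideload")
```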
Not only a local but also a global solution to AI safety is needed: an AI Nanny based on global surveillance and empowered by an agreement between the nuclear powers. A human-like AI, or the first upload, could be the “virtual tzar” of such a system.
The nature of human values depends on who the “moral subject” is: society, the brain, the personality, or a subpersonality. Human values are not stable preferences. “Real choices” are more important than values, as “values” are not always a useful philosophical construction. Meta-values, such as life extension and personal universes, are more important still, as they give everyone the freedom to have their own values.
An AI that is learning human values is dangerous precisely because it does not yet know those values.
An unfriendly AI will not be very interested in killing humans for their atoms, as atoms have very small instrumental value, while living humans have larger instrumental value at every stage of the AI’s evolution. But we have to help future AI, especially “Young AI”, understand this. We could strengthen our position by making certain precommitments.
My works on AI Safety:
AI safety defense levels, a roadmap
Catching Treacherous Turn: A Model of the Multilevel AI Boxing
Catastrophically Dangerous AI is Possible Before 2030
AI Alignment Problem: “Human Values” don’t Actually Exist
First human upload as AI Nanny
Artificial Intelligence in Life Extension: from Deep Learning to Superintelligence
Military AI as a Convergent Goal of Self-Improving AI
Classification of Global Catastrophic Risks Connected with Artificial Intelligence
The Global Catastrophic Risks Connected with Possibility of Finding Alien AI During SETI
Classification of the Global Solutions of the AI Safety Problem
Levels of Self-Improvement in AI and their Implications for AI Safety
"Possible Dangers of the Unrestricted Value Learners” – LW post