You are viewing revision 1.1.0, last edited by Eliezer Yudkowsky

One of the things we almost certainly don't want our AI to do, unless we're extremely confident that it is extremely robust and value-aligned, is to think about and try to model alien civilizations that might contain superintelligences or potential simulators. Potential problems include:
- The AI ends up effectively internally simulating a hostile superintelligence, and that simulated superintelligence manages to "break out of the box" in any number of possible ways. The rule against having any hostile superintelligences anywhere near us should apply to simulated hostile superintelligences inside the AI, for the same reason, unless we are extremely confident of our AI's value alignment and internal robustness.
- The AI becomes subject to Christiano's hack and begins to expect that it is probably being simulated, and hence that the AI's 'true' environment, or the true causes of its experiences, are whatever the simulating superintelligence wants them to be.
- The AI becomes subject to blackmail, e.g., the AI models what a simulated paperclip maximizer would do, and concludes that a simulated paperclip maximizer will do (very bad thing) unless the AI modifies its utility function to tile the universe with paperclips and then hides this fact from its programmers.
- The AI commits mindcrime in the course of modeling an alien civilization that would contain sapient beings.
Since there's no known task that actually requires a non-Sovereign AI to think about distant superintelligences, it seems like we should probably react to this possibility by figuring out how to design the first AI such that it just does not think about aliens, period. This would require averting both an instrumental pressure and an epistemic question that the AI would otherwise naturally encounter in the course of, e.g., considering likely explanations for the Fermi Paradox.