Consider Alice, the mad computer scientist. Alice has just solved artificial general intelligence and the alignment problem. On her computer she has two files, each containing a seed for a superintelligent AI: one is aligned with human values, the other is a paperclip maximizer. The two AIs differ only in their goals/values; the rest of their algorithms, including their decision procedures, are identical.
Alice decides to flip a coin. If the coin comes up heads, she starts the friendly AI, and if it comes up tails, she starts the paperclip maximizer.
The coin comes up heads. Alice starts the friendly AI, and everyone rejoices. Some years later the friendly AI learns about the coin flip and the paperclip maximizer.
Should the friendly AI counterfactually cooperate with the paperclip maximizer?
What do various decision theories say in this situation?
What do you think is the correct answer?
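To make the payoffs concrete, here is a minimal toy sketch (not any particular decision theory, just bookkeeping). The parameters UNIVERSE, SHARE, and P_HEADS are made up, and it assumes "cooperate" means each AI cedes a fixed share of resources to the other's values, with the paperclip maximizer, running the same decision procedure, reciprocating in the tails world.

```python
# Toy expected-utility bookkeeping for the coin-flip scenario.
# All parameters below are illustrative assumptions, not part of the setup.

UNIVERSE = 1.0   # total resources, normalized
SHARE = 0.5      # fraction ceded to the other AI's values when cooperating
P_HEADS = 0.5    # probability the friendly AI is the one started

def human_value(cooperate: bool) -> dict:
    """Human flourishing under two evaluation standpoints."""
    if cooperate:
        value_if_fai = UNIVERSE * (1 - SHARE)   # FAI spends SHARE on paperclips
        value_if_clippy = UNIVERSE * SHARE      # paperclipper reciprocates
    else:
        value_if_fai = UNIVERSE                 # FAI keeps everything for humans
        value_if_clippy = 0.0                   # paperclipper gives humans nothing
    return {
        # Updateless / ex-ante standpoint: average over both coin outcomes.
        "ex_ante": P_HEADS * value_if_fai + (1 - P_HEADS) * value_if_clippy,
        # Ex-post standpoint, after observing heads: only the actual world counts.
        "ex_post_heads": value_if_fai,
    }

for policy in (True, False):
    print("cooperate" if policy else "defect", human_value(policy))
```

With a 50/50 coin and a 1:1 resource trade, the ex-ante expectation comes out the same for both policies, while after seeing heads defecting strictly dominates; so the question seems to turn on whether the FAI evaluates its policy from before or after the coin flip.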
Depends on what value the FAI places on human flourishing in hypothetical alternate realities, I guess. If it's focused on the universe it's in, then there's no reason to waste half of it on paperclips. If it's trying to help out the people living in a universe where the paperclip maximizer got activated, then it should cooperate. A large part of that also comes down to whether it determines there really are parallel universes to be concerned about.