Consider Alice, the mad computer scientist. Alice has just solved general artificial intelligence and the alignment problem. On her computer she has two files, each containing a seed for a superintelligent AI. One of them is aligned with human values; the other is a paperclip maximizer. The two AIs differ only in their goals/values; the rest of their algorithms, including decision procedures, are identical.
Alice decides to flip a coin. If the coin comes up heads, she starts the friendly AI, and if it comes up tails, she starts the paperclip maximizer.
The coin comes up heads. Alice starts the friendly AI, and everyone rejoices. Some years later the friendly AI learns about the coin flip and about the paperclip maximizer.
Should the friendly AI counterfactually cooperate with the paperclip maximizer?
What do various decision theories say about this situation?
What do you think is the correct answer?
DagonGod, you are clearly not getting the point here, which is that the vN-M theorem that defines utility is not compatible with you declaring values of your utility function. If you do that, you are no longer talking about the same concept of "utility".