Consider Alice, the mad computer scientist. Alice has just solved general artificial intelligence and the alignment problem. On her computer she has two files, each containing a seed for a superintelligent AI: one is aligned with human values, the other is a paperclip maximizer. The two AIs differ only in their goals/values; the rest of the algorithms, including the decision procedures, are identical.
Alice decides to flip a coin. If the coin comes up heads, she starts the friendly AI; if it comes up tails, she starts the paperclip maximizer.
The coin comes up heads. Alice starts the friendly AI, and everyone rejoices. Some years later, the friendly AI learns about the coin flip and about the paperclip maximizer.
Should the friendly AI counterfactually cooperate with the paperclip maximizer?
What do various decision theories say in this situation?
What do you think is the correct answer?
I'm not sure it can be assumed that the deal is profitable for both parties. The way I understand risk aversion is that it's a bug, not a feature; humans would be better off if they weren't risk averse (they should self-modify to be risk neutral if and when possible, in order to be better at fulfilling their own values).
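To make the profitability question concrete, here is a minimal sketch of the ex-ante arithmetic, under my own illustrative assumptions (not anything stated in the post): suppose "counterfactual cooperation" means whichever AI gets started devotes a fraction f of the universe's resources to the other side's values, and humans care about resources either linearly (risk-neutral) or through a concave utility like a square root (a stand-in for risk aversion).

```python
# Toy ex-ante calculation for the coin-flip trade (illustrative numbers only).
# Assumption: "counterfactual cooperation" means whichever AI is started
# spends a fraction f of the universe's resources on the other side's values.

import math

def expected_human_utility(u, f):
    """Ex-ante (pre-coin-flip) expected utility for humans.

    Heads (p=0.5): friendly AI runs, keeps 1-f of resources for human values.
    Tails (p=0.5): paperclipper runs but honors the trade, giving f to humans.
    """
    return 0.5 * u(1 - f) + 0.5 * u(f)

linear = lambda x: x              # risk-neutral utility in resources
concave = lambda x: math.sqrt(x)  # one stand-in for a risk-averse utility

for f in (0.0, 0.1, 0.5):
    print(f"f={f:.1f}  linear EU={expected_human_utility(linear, f):.3f}"
          f"  concave EU={expected_human_utility(concave, f):.3f}")

# With linear utility the expectation is 0.5 regardless of f: the trade is a
# wash, so a risk-neutral humanity gains nothing from it in expectation.
# With a concave (risk-averse) utility the expectation rises with f, which is
# why the deal's profitability hinges on whether risk aversion is treated as
# part of human values or as a bug to self-modify away.
```

On these assumptions, the deal is strictly profitable for humans only if risk aversion (a concave utility over resources) is a genuine part of human values; if it's a bug to be removed, the trade is merely break-even in expectation.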