Related to: The True Prisoner's Dilemma; Let's split the cake, lengthwise, upwise and slantwise; If you don't know the name of the game, just tell me what I mean to you
tl;dr: In the true PD, played against agents capable of superrationality, there may be situations in which you should cooperate while expecting the other player to defect, or vice versa, because the relative weight of the outcomes can differ greatly between the two parties. Agents that take this into account could outperform even superrational ones.
So, it turns out that our benevolent Omega has an evil twin, just as trustworthy as his sibling, but one who abducts people into far worse hypothetical scenarios. Here is one:
You wake up in a strange dimension, and this Evil-Omega is smiling at you. He explains that you're about to play a game with an unknown paperclip maximizer from another dimension, one you haven't interacted with before and will never interact with again. The alien is like a GLUT when it comes to consciousness: it runs a simple approximation of a rational decision algorithm, but has nothing you could call a "personality" or "soul". Also, since it doesn't have a soul, you have absolutely no reason to feel bad about its losses. This is the true PD.
You are also told some specifics about the algorithm the alien uses to reach its decision, and that the alien is told about as much about you. At this point I don't want to pin the opposing alien's algorithm down to one specific implementation; we're looking for a method that wins when summed over all these possibilities. In particular, we're looking at the class of AIs capable of superrationality, since against other agents the game is trivial.
The payoff matrix is as follows (the first letter is your move, the second is the alien's; each entry lists your outcome first, then the alien's):
DD = (you lose 3 billion lives and are tortured; the alien loses 4 paperclips)
CC = (you lose 2 billion lives and are made miserable; the alien loses 2 paperclips)
CD = (you lose 5 billion lives and are tortured a lot; the alien loses nothing)
DC = (you lose nothing; the alien loses 8 paperclips)
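To make the structure explicit, here is a minimal sketch in Python that simply encodes the matrix above and walks through the superrational comparison; the payoffs are copied verbatim, and nothing new is assumed beyond the ordering of outcomes.

```python
# Minimal sketch of the first game's payoff structure.
# (your move, alien's move) -> (your loss, alien's loss), copied from the matrix above.
GAME_1 = {
    ("D", "D"): ("3 billion lives lost, you are tortured", "4 paperclips lost"),
    ("C", "C"): ("2 billion lives lost, you are made miserable", "2 paperclips lost"),
    ("C", "D"): ("5 billion lives lost, you are tortured a lot", "nothing lost"),
    ("D", "C"): ("nothing lost", "8 paperclips lost"),
}

# Under superrationality, two relevantly similar reasoners end up making the
# same choice, so the live comparison is along the diagonal: CC versus DD.
for moves in (("C", "C"), ("D", "D")):
    your_loss, alien_loss = GAME_1[moves]
    print(moves, "-> you:", your_loss, "| alien:", alien_loss)

# Both players prefer CC to DD, which is why the "naive" superrational answer
# is mutual cooperation.
```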
So, what do you do? Your opponent is capable of superrationality. In "The True Prisoner's Dilemma" it was (vaguely, implicitly) assumed for simplicity's sake that this information is enough to decide whether to defect or not, and on this information the answer might well be to cooperate. I argue, however, that the information given is not enough.
Back to the hypothetical: the in-hypothetical you is still weighing the decision, but we zoom out and observe that, unbeknownst to you, Omega has also abducted a fellow LW reader and another paperclip maximizer from that same dimension, and is making them play a PD. This time their payoff matrix is (same convention as above):
DD = (your friend loses $0.04; 2 random, small changes are made to the alien's utility function and 200 paperclips are lost)
CC = (your friend loses $0.02; 1 change and 100 paperclips lost)
CD = (your friend loses $0.08; the alien loses nothing)
DC = (your friend loses nothing; 4 changes and 400 paperclips lost)
Now, if it's not "rational" to take the relative losses into account, we're bound to end up in a situation where billions of humans die; you might even come to regret your rationality. It should be obvious by now that you'd wish you could somehow negotiate across both of these PDs: you defect in the first game while your opponent cooperates, and in exchange your friend cooperates in the second game while the other paperclip maximizer defects. You'd be entirely willing to take the $0.08 hit for that, perhaps paying it in full on your friend's behalf. And as it happens, the paperclip maximizers have exactly the mirror-image incentive, since the second game matters far more to them than the first.
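As a rough sanity check on that intuition, here is a toy comparison between naive mutual cooperation in both games and the "swap" just described. The numeric utility scales below are entirely my own illustrative assumptions (nothing in the scenario fixes them); only the rough relative magnitudes matter.

```python
# Toy arithmetic for the cross-game trade. All numbers are illustrative
# assumptions: humans care overwhelmingly about game 1, the paperclip
# maximizers care overwhelmingly about game 2.

# Human-side losses (lives in game 1, dollars in game 2; misery/torture ignored).
HUMAN_LOSS = {
    "game1": {"CC": 2_000_000_000, "DC": 0},
    "game2": {"CC": 0.02, "CD": 0.08},
}
# Paperclipper-side losses, counting one utility-function change as
# (arbitrarily) equivalent to 1000 paperclips.
CLIPPY_LOSS = {
    "game1": {"CC": 2, "DC": 8},
    "game2": {"CC": 1 * 1000 + 100, "CD": 0},
}

def total(losses, outcome_g1, outcome_g2):
    return losses["game1"][outcome_g1] + losses["game2"][outcome_g2]

# Naive superrational play: both games end in CC.
print("naive - humans lose:", total(HUMAN_LOSS, "CC", "CC"),
      "| clippies lose:", total(CLIPPY_LOSS, "CC", "CC"))
# The swap: you defect while the alien cooperates in game 1 (DC),
# your friend cooperates while the alien defects in game 2 (CD).
print("swap  - humans lose:", total(HUMAN_LOSS, "DC", "CD"),
      "| clippies lose:", total(CLIPPY_LOSS, "DC", "CD"))
# Both sides lose less under the swap, so both have an incentive to find it.
```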
But of course the players don't know about the whole situation, so they might not be able to act optimally in this specific scenario. However, if they take into account how much the other side cares about the outcomes, by some as-yet-unknown method, they just might be able to perform systematically better than "naive" PD players playing against each other (if we posed more problems of this sort, or selected the payoffs at random for the one-shot game). Naivety here means simply and blindly cooperating against an equally rational opponent. How to achieve this is the open question; a toy illustration of the kind of rule I have in mind is sketched below.
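For concreteness, here is one crude rule of the kind I mean. It assumes, and this is exactly the big open assumption, that both players' losses have somehow been put on a comparable scale, and it only pays off when averaged over many randomly skewed games of this sort (in a single game the low-stakes side is simply giving something up). It is not Armstrong's proposal or anyone else's, just an illustration of "weighting by how much each side cares".

```python
# A crude, purely illustrative rule for "taking relative stakes into account".
# It assumes both players' losses are already on a common scale, which is the
# hard, unsolved part; this only shows the shape such a rule might take.

def stake(losses):
    """How much the game matters to a player: spread between worst and best outcome."""
    return max(losses.values()) - min(losses.values())

def choose(my_losses, their_losses, ratio=10.0):
    """Return 'C' or 'D' for a superrational player who also weighs relative stakes.

    Both players run the same rule on the same shared information, so their
    choices stay consistent: the low-stakes side cooperates, the high-stakes
    side defects, and with comparable stakes both fall back to cooperation.
    """
    mine, theirs = stake(my_losses), stake(their_losses)
    if mine >= ratio * theirs:
        return "D"   # the game matters vastly more to me
    if theirs >= ratio * mine:
        return "C"   # the game matters vastly more to them
    return "C"       # comparable stakes: ordinary superrational cooperation

# Example with hypothetical common-scale numbers loosely based on game 1:
human_g1 = {"DD": 3e9, "CC": 2e9, "CD": 5e9, "DC": 0}
clippy_g1 = {"DD": 4, "CC": 2, "CD": 0, "DC": 8}
print(choose(human_g1, clippy_g1), choose(clippy_g1, human_g1))  # -> D C
```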
-
Stuart Armstrong, for example, has an actual proposal for how to cooperate when the payoffs are skewed, whereas I'm only pointing out that there's a problem to be solved, so none of this is really news. Still, I think this topic hasn't been explored as much as it deserves.