I'd go so far as to say that anyone who advocates cooperating in a one-shot prisoners' dilemma simply doesn't understand the setting. By definition, defecting gives you a better outcome than cooperating. Anyone who claims otherwise is changing the definition of the prisoners' dilemma.
I think this is correct. I think the reason to cooperate is not to get the best personal outcome, but because you care about the other person. I think we have evolved to cooperate, or perhaps that should be stated as we have evolved to want to cooperate. We have evolved to value cooperating. Our values come from our genes and our memes, and both are subject to evolution, to natural selection. But we want to cooperate.
So if I am in a prisoner's dilemma against another human, if I perceive that other human as "one of us," I will choose cooperation. Essentially, I care about their outcome. But in a one-shot PD defecting is the "better" strategy. The problem is that with genetic and/or memetic evolution of cooperation, we are not playing in a one-shot PD. We are playing with a set of values that developed over many shots.
Of course we don't always cooperate. But when we do cooperate in one-shot PD's, it is because, in some sense, there are so darn many one-shot PD's, especially in the universe of hypotheticals, that we effectively know there is no such thing as a one-shot PD. This should not be too hard to accept around here where people semi-routinely accept simulations of themselves or clones of themselves as somehow just as important as their actual selves. I.e. we don't even accept the "one-shottedness" of ourselves.
Subscribe to RSS Feed
= f037147d6e6c911a85753b9abdedda8d)
Is it?
Assume that:
a) There will be a future AI so powerful to torture people, even posthumously (I think this is quite speculative, but let's assume it for the sake of the argument).
b) This AI will be have a value system based on some form of utilitarian ethics.
c) This AI will use an "acausal" decision theory (one that one-boxes in Newcomb's problem).
Under these premises it seems to me that Roko's argument is fundamentally correct.
As far as I can tell, belief in these premises was not only common in LessWrong at that time, but it was essentially the officially endorsed position of Eliezer Yudkowsky and SIAI. Therefore, we can deduce that EY should have believed that Roko's argument was correct.
But EY claims that he didn't believe that Roko's argument was correct. So the question is: is EY lying?
His behavior was certainly consistent with him believing Roko's argument. If he wanted to prevent the diffusion of that argument, then even lying about its correctness seems consistent.
So, is he lying? If he is not lying, then why didn't he believe Roko's argument? As far as I know, he never provided a refutation.
This was addressed on the LessWrongWiki page; I didn't copy the full article here.
A few reasons Roko's argument doesn't work:
The 'should' here is normative: there are probably some decision theories that let agents acausally blackmail each other, but others that perform well in Newcomb's problem and the smoking lesion problem but can't acausally blackmail each other; it hasn't been formally demonstrated which theories fall into which category.
2 - Assuming you for some reason are following a decision theory that does put you at risk of acausal blackmail: Since the hypothetical agent is superintelligent, it has lots of ways to trick people into thinking it's going to torture people without actually torturing them. Since this is cheaper, it would rather do that. And since we're aware of this, we know any threat of blackmail would be empty. This means that we can't be blackmailed in practice.
3 - A stronger version of 2 is that rational agents actually have an incentive to harshly punish attempts at blackmail in order to discourage it. So threatening blackmail can actually decrease an agent's probability of being created, all else being equal.
4 - Insofar as it's "utilitarian" to horribly punish anyone who doesn't perfectly promote human flourishing, SIAI doesn't seem to have endorsed utilitarianism.
4 means that the argument lacks practical relevance. The idea of CEV doesn't build in very much moral philosophy, and it doesn't build in predictions about the specific dilemmas future agents might end up in.