You know, you're right.

I was thrown off by the word "precommit", which implies a reflectively inconsistent strategy, which is TDT-anathema. On the other hand, rational agents win, so having that strategy does make sense in that case, despite the fact that we might incur negative utility relative to playing submissively if we had to actually carry it out.

The solution, I think, is to be "the type of agent who would be ruthlessly vindictive against opponents who have enough predictive capability to see that I'm this type of agent, and enough strategic capability to accept that this means they gain nothing by defecting against me." That makes it a reflectively consistent part of a decision theory, by keeping the negative-utility behavior in the realm of the pure counterfactual. As long as you know that having that strategy will effectively deter the other player, I think it can work.

And if not, or if I've made an error in some detail of my reasoning of how to make it work, I'm fairly confident at this point that an ideal TDT-agent could find a valid way to address the problem case in a reflectively consistent and strategically sound manner.

I can certainly empathize with that statement. And if my opponent is not only dominating in ability but exploiting that advantage to the point where I'm losing just as much by submitting as I would by exacting punishment, then that's the tipping point where I start hitting back. Of course, I'd attempt retaliatory behavior initially when I was unsure how dominated I was, as well, but once I know that the opponent is just that much better than me, and as long as they're not abusing that advantage to the point where retaliation becomes cost-effective, then I'd have to concede my opponent's superiority, grit my teeth, bend over, and take one for the team. Especially with a 1 million human lives per util ratio. With lives at stake, I shut up and multiply.

If I judge the probability that I am a simulation or equivalent construct to be greater than 1/499500, yes.

(EDIT: Er, make that 1/999000, actually. What's the markup code for strikethrough 'round these parts?)

(EDIT 2: Okay, I'm posting too quickly. It should be just 10^-6, straight up. If I'm a figment then the $1000 isn't real disutility.)

(EDIT 3: ARGH. Sorry. 24 hours without sleep here. I might not be the sim, duh. Correct calculations:

u(pay|sim) = 10^6; u(~pay|sim) = 0; u(pay|~sim) = -1000; u(~pay|~sim) = 0

u(~pay) = 0; u(pay) = P(sim) * 10^6 - P(~sim) * 1000 = 1,001,000 * P(sim) - 1000

Pay if P(sim) > 1/1001.

Double-checking... triple-checking... okay, I think that's got it. No... no... NOW that's got it.)
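
For concreteness, here is a minimal Python sketch of that calculation (the specific P(sim) values fed in at the end are purely illustrative):

```python
# Expected utility of paying vs. not paying, given a probability p_sim
# that I am the simulation. Payoffs in utils, as above: paying while
# simulated is worth 10^6, paying while real costs 1000, not paying is 0.

def u_pay(p_sim):
    return p_sim * 10**6 - (1 - p_sim) * 1000

def should_pay(p_sim):
    # u(~pay) = 0, so pay whenever u(pay) > 0, i.e. whenever p_sim > 1/1001.
    return u_pay(p_sim) > 0

print(should_pay(1/1001 + 1e-9))  # True
print(should_pay(1/1001 - 1e-9))  # False
```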

To be clear:

  • Are both I and my simulation told this is a one-time offer?

  • Is a simulation generated whether the real coin is heads or tails?

  • Are both my simulation and I told that one of us is a simulation?

  • Does the simulation persist after the choice is made?

I suppose the second and fourth points don't matter particularly... as long as the first and third are true, then I consider it plus EV to pay the $1000.

Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker.

Compute the probabilities P(0), P(1), ..., where P(n) is the probability that this deal will be offered to you again exactly n times in the future. Sum 499,500 * n * P(n) over all n, and agree to pay if the sum is greater than 1,000.
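
A quick sketch of that test, assuming we are handed some distribution over how many more times the deal will be offered (the example distribution at the end is made up purely for illustration). The 499,500 is presumably the expected value of each future offer: 0.5 * 1,000,000 - 0.5 * 1,000 = 499,500.

```python
def pay_now(p_n):
    """p_n[n] = probability the deal is offered again exactly n times."""
    expected_future_gain = sum(499_500 * n * p for n, p in enumerate(p_n))
    return expected_future_gain > 1_000

# Illustrative distribution: 99% chance of no repeat, 1% chance of exactly one.
print(pay_now([0.99, 0.01]))  # 499500 * 1 * 0.01 = 4995 > 1000, so True
```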

Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote....

...Every majority coalition and division of the pie, is dominated by another majority coalition in which each agent of the new majority gets more pie. There does not appear to be any such thing as a dominant majority vote.

I suggest offering the following deal at the outset:

"I offer each of you the opportunity to lobby for an open spot in a coalition with me, to split the pie equally six ways, formed with a mutual promise that we will not defect, and if any coalition members do defect, we agree to exclude them from future dealings and remain together as a voting bloc, offering the defectors' spots to the remaining agents not originally aligned with us, for a 1/6 + epsilon share, the cost of the excess portion divided among those of us remaining. I will award spots in this coalition to the five of you who are most successful at convincing me you will adhere to these terms."

Here's yet another problem whose proper formulation I'm still not sure of, and it runs as follows. First, consider the Prisoner's Dilemma. Informally, two timeless decision agents with common knowledge of the other's timeless decision agency, but no way to communicate or make binding commitments, will both Cooperate because they know that the other agent is in a similar epistemic state, running a similar decision algorithm, and will end up doing the same thing that they themselves do. In general, on the True Prisoner's Dilemma, facing an opponent who can accurately predict your own decisions, you want to cooperate only if the other agent will cooperate if and only if they predict that you will cooperate. And the other agent is reasoning similarly: They want to cooperate only if you will cooperate if and only if you accurately predict that they will cooperate.

But there's actually an infinite regress here which is being glossed over - you won't cooperate just because you predict that they will cooperate, you will only cooperate if you predict they will cooperate if and only if you cooperate. So the other agent needs to cooperate if they predict that you will cooperate if you predict that they will cooperate... (...only if they predict that you will cooperate, etcetera).

On the Prisoner's Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion, so that you can expect their action to be the symmetrical analogue of your own - in which case (C, C) is preferable to (D, D). But what if you're facing a more general decision problem, with many agents having asymmetrical choices, and everyone wants to have their decisions depend on how they predict that other agents' decisions depend on their own predicted decisions? Is there a general way of resolving the regress?

Yes. You can condition on two prior probabilities: that an agent will successfully predict your actual action, and that an agent will respond in a particular way based on the action they predict you to take. For the solution in the case of the Truly Iterated Prisoner's Dilemma, see here.

(EDIT, 6/18/2011:

On further consideration, my assertion that the indicated solution to the Prisoner's Dilemma constitutes a general method for resolving infinite regress in the full class of problems specified is a naive oversimplification. The solution to that specific dilemma suggests the region of solution space in which to search for solutions to similar problems, but considerable work remains before a general solution to the problem class can justifiably be claimed. I'll analyze the full problem further and see what I come up with.)

Hi. Found the site about a week ago. I read the TDT paper and was intrigued enough to start poring through Eliezer's old posts. I've been working my way through the sequences and following backlinks. The material on rationality has helped me reconstruct my brain after a Halt, Melt and Catch Fire event. Good stuff.

I observe that comments on old posts are welcome, and I notice no one has yet come back to this post with the full formal solution for this dilemma since the publication of TDT. So here it is.

Whatever our opponent's decision algorithm may be, it will either depend to some degree on a prediction of our behavior, or it will not. It can only rationally base its decision on a prediction of our behavior to the extent that it believes a) we will attempt to predict its own behavior; and b) we will only cooperate to the extent that we believe it will cooperate. It will thus be incentivized to cooperate to the extent that it believes we can and will successfully condition our behavior on its own. To the extent that it chooses independently of any prediction of our behavior, its only rational choice is to defect. Any other choices it could make will do worse than the above decisions in all cases, and the following strategy will gain extra utility against any such suboptimal choices, as will become clear.

There are thus two unknown probabilities for us to condition on: The probability that the opponent will choose to cooperate iff it believes we will cooperate, which I'll call P(c), and the probability that the opponent will be able to successfully predict our action, which I'll call P(p).

We want to calculate the utility of cooperating, u(C), and the utility of defecting, u(D), for each relevant case, using payoffs of 3 for defecting against a cooperator, 2 for mutual cooperation, 1 for mutual defection, and 0 for cooperating against a defector. So we shut up and multiply.

If the opponent is uncooperative (~c), they always defect. Thus u(C|~c) = 0 and u(D|~c) = 1.

In cases where a potentially cooperative opponent successfully predicts our action, we have u(C|c,p) = 2 and u(D|c,p) = 1. When such an opponent guesses our action incorrectly, we have u(C|c,~p) = 0 and u(D|c,~p) = 3.

Thus we have:

u(C) = 2 P(c) P(p)

u(D) = P(~c) + P(c) P(p) + 3 P(c) P(~p) = 1 - P(c) + P(c) P(p) + 3 P(c) (1 - P(p)) = 1 + 2 P(c) - 2 P(c) P(p)

We consider the one-shot dilemma first. An intelligent opponent can be assumed to have behavioral predictive capabilities at least better than chance (P(p) > 0.5), and perhaps approaching perfection (P(p) ~ 1) if it is a superintelligence. In the worst case, u(C) ~ P(c), and u(D) ~ 1 + P(c), and we should certainly defect. In the best case, u(C) ~ 2 * P(c) and u(D) ~ 1, so we should defect if P(c) < 0.5, that is, if we assess that our opponent is even slightly more likely to automatically defect than to consider cooperation. If we have optimistic priors for both probabilities due to applicable previous experiences or any immediate observational cues, we may choose to cooperate; we plug in our numbers, and shut up and multiply.
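
Here is a minimal numeric sketch of that one-shot decision rule, built from the two expressions above (the probability values in the examples are arbitrary):

```python
# One-shot decision rule from the expressions above:
#   u(C) = 2 P(c) P(p)
#   u(D) = 1 + 2 P(c) - 2 P(c) P(p)

def u_cooperate(p_c, p_p):
    return 2 * p_c * p_p

def u_defect(p_c, p_p):
    return 1 + 2 * p_c - 2 * p_c * p_p

def one_shot_choice(p_c, p_p):
    return "C" if u_cooperate(p_c, p_p) > u_defect(p_c, p_p) else "D"

# Near-perfect predictor (P(p) ~ 1): cooperate iff P(c) > 0.5.
print(one_shot_choice(0.6, 0.99))  # C
print(one_shot_choice(0.4, 0.99))  # D
# A predictor no better than chance (P(p) = 0.5): defect regardless of P(c).
print(one_shot_choice(0.9, 0.5))   # D
```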

In the iterated case, we have the opportunity to observe our opponent's behavior and update our priors as we go. We are incentivized to cooperate when we believe our opponent will do so, and to defect when we believe it will defect or when we believe we can do so without it anticipating us. Both players are incentivized to cooperate more often than they defect when they believe the other is good at predicting them. A player with a dominating edge in predictive capabilities can potentially attain a better result than pure mutual cooperation against an opponent with weak capabilities, through occasional strategic defections; the weaker player may find themselves incentivized not to punish the defector if they realize that they cannot do so without being anticipated and losing just as many utilons as the superior player would lose from the punishment. To the extent that the superior predictor can ascertain that their opponent is savvy enough to know when it's dominated and would choose not to lose further utilons through vindictive play, such a strategy may be profitable.
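
As a toy illustration of that updating process, here's a sketch that keeps running frequency estimates of P(c) and P(p) and re-applies the one-shot rule each round. It leans on the strong simplifying assumption, made only to keep the sketch short, that each round gives us a clean after-the-fact reading of whether the opponent played conditionally and whether it guessed our move correctly:

```python
def iterated_play(rounds, prior_c=0.5, prior_p=0.5, prior_weight=2.0):
    """rounds: list of (was_conditional, guessed_right) observations, each 0 or 1."""
    est_c, est_p, n = prior_c, prior_p, prior_weight
    history = []
    for was_conditional, guessed_right in rounds:
        u_c = 2 * est_c * est_p                   # u(C) from above
        u_d = 1 + 2 * est_c - 2 * est_c * est_p   # u(D) from above
        history.append("C" if u_c > u_d else "D")
        # Fold this round's observation into the running estimates.
        est_c = (est_c * n + was_conditional) / (n + 1)
        est_p = (est_p * n + guessed_right) / (n + 1)
        n += 1
    return history

# Against an opponent that keeps looking conditionally cooperative and
# keeps predicting us correctly, play drifts from defection to cooperation:
print(iterated_play([(1, 1)] * 6))  # ['D', 'D', 'D', 'D', 'C', 'C']
```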

Thus the spoils go to the algorithm with the best ability to predict an opponent. Skilled poker players or experts at "Rock-Paper-Scissors" could perform quite well in such contests against the average human. That could be fun to watch.