Vladimir_Nesov comments on Towards a New Decision Theory - Less Wrong

50 Post author: Wei_Dai 13 August 2009 05:31AM


Comment author: Wei_Dai 16 August 2009 12:02:24PM *  3 points [-]

I guess my reductio ad absurdum wasn't quite sufficient. I'll try to think this through more thoroughly and carefully. Let me know which steps in the following line of reasoning, if any, you disagree with or find unclear.

  1. TDT couldn't have arisen by evolution.
  2. Until a few years ago, almost everyone on Earth was running some sort of non-TDT decision procedure that plays defect in one-shot PD.
  3. It's possible that upon learning about TDT, some people might spontaneously switch to running it, depending on whatever meta-DT controls this, and whether the human brain is malleable enough to run TDT.
  4. If, in any identifiable group of people, a sufficient fraction switches to TDT, and that proportion is public knowledge, the TDT-running individuals in that group should start playing cooperate in one-shot PD with other members of the group.
  5. The threshold proportion is higher if the remaining defectors can cause greater damage. If the remaining defectors can use their gains from defection to better reproduce themselves, or to gather more resources that let them increase their gains/damage, then the threshold proportion must be close to 1, because even a single defector can start a chain reaction that ends with all of the group's resources held by defectors.
  6. What proportion of skilled AI designers would switch to TDT is ultimately an empirical question, but it seems to me that it's unlikely to be close to unity.
  7. TDT-running AI designers will design their AIs to run TDT. Non-TDT-running AI designers will design their AIs to run non-TDT (not necessarily the same non-TDT).
  8. Assume that a TDT-running AI (TAI) can't tell which other AIs are running TDT and which ones aren't, so in every game it faces the decision described in steps 4 and 5. A TAI will cooperate in situations where the benefit from cooperation is relatively high and the damage from being defected against is relatively low, and defect in other situations.
  9. As a result, non-TAIs will do better than TAIs, but the damage to TAIs will be limited.
  10. Only if a TAI is sure that all AIs are TAIs will it play cooperate unconditionally.
  11. If a TAI encounters an AI of alien origin, the same logic applies. The alien AI will be TAI if and only if its creator was running TDT. If the TAI knows nothing about the alien creator, then it has to estimate what fraction of AI-builders in the universe runs TDT. Taking into account that TDT can't arise from evolution, and not seeing any reason for evolution to create a meta-DT that would pick TDT upon discovering it, this fraction seems pretty low, and so the TAI will likely play defect against the alien AI.
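Steps 4 and 5 can be sketched as a toy threshold calculation. The payoff numbers and the correlation assumption below are illustrative, not anything fixed by the argument above:

```python
# Toy sketch of steps 4-5 under the usual PD payoff labels T > R > P > S
# (all concrete numbers here are assumed for illustration). A TDT agent
# treats its choice as correlated with every other TDT agent's choice:
# if it cooperates, so do they; defectors defect regardless. So against
# a group where fraction p runs TDT:
#   EU(cooperate) = p*R + (1-p)*S
#   EU(defect)    = P
# and cooperation pays only when p > (P - S) / (R - S).

def cooperation_threshold(R, P, S):
    """Smallest fraction of TDT agents that makes cooperation pay."""
    return (P - S) / (R - S)

# Step 4: with mild payoffs, a modest fraction of TDT agents suffices.
print(cooperation_threshold(R=3, P=1, S=0))    # 0.333...

# Step 5: the more damage a defector can do (S far below P), the closer
# the required fraction gets to 1.
print(cooperation_threshold(R=3, P=1, S=-10))  # 0.846...
```

The second print illustrates the step 5 claim: as the sucker payoff S falls, the threshold (P - S)/(R - S) approaches 1.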

Hmm, this exercise has cleared a lot of my own confusion. Obviously a lot more work needs to be done to make the reasoning rigorous, but hopefully I've gotten the gist of it right.

ETA: According to this line of argument, your hypothesis that all skilled AI designers play cooperate in one-shot PD against each other is equivalent to saying that skilled AI designers have minds malleable enough to run TDT, and have a meta-DT that causes them to switch to running TDT. But I do not see an evolutionary reason for this, so if it's true, it must be true by luck. Do you agree?

Comment author: Vladimir_Nesov 16 August 2009 01:47:34PM *  2 points [-]

It looks like in this discussion you assume that switching to "TDT" (it's highly uncertain what this means) immediately gives the decision to cooperate in "true PD". I don't see why it should be so. Summarizing my previous comments: exactly what the players know about each other, and exactly in what way they know it, may make their decisions go either way. That the players switch from CDT to some kind of more timeless decision theory doesn't determine the answer to be "cooperate"; it merely opens up a possibility that was previously decreed irrational. And I suspect that what's important in the new setting for making the decision go either way isn't captured properly in the problem statement of "true PD".

Also, the way you treat "agents with TDT" seems more appropriate for "agents with Cooperator prefix" from cousin_it's Formalizing PD. And this is a simplified thing far removed from a complete decision theory, although a step in the right direction.

Comment author: Wei_Dai 16 August 2009 07:17:56PM 0 points [-]

I don't assume that switching to TDT immediately gives the decision to cooperate in "true PD". I assume that an AI running TDT would decide to cooperate if it thinks the expected utility of cooperating is higher than the EU of defecting, and that is true if its probability of facing another TDT agent is sufficiently high compared to its probability of facing a defector (how high is sufficient depends on the payoffs of the game). Well, this is necessary but not sufficient: if, for example, the other TDT agent doesn't think its own probability of facing a TDT agent is high enough, it won't cooperate, so we also need some common knowledge of the relevant probabilities and payoffs.
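As a toy illustration of this condition (all payoff values and the threshold formula are assumptions for the sketch, not anything fixed in the thread):

```python
# Sketch: mutual cooperation between two TDT agents requires that EACH
# agent's estimated probability of facing another TDT agent clears the
# threshold, and (assumed here) that both probabilities are common
# knowledge. Payoffs R=3, P=1, S=0 are illustrative.

def eu_threshold(R=3.0, P=1.0, S=0.0):
    # Cooperate when p*R + (1-p)*S > P, i.e. when p > (P - S) / (R - S).
    return (P - S) / (R - S)

def both_cooperate(p1, p2, R=3.0, P=1.0, S=0.0):
    """p1, p2: each agent's probability that its opponent runs TDT."""
    t = eu_threshold(R, P, S)
    return p1 > t and p2 > t

print(both_cooperate(0.9, 0.9))  # True: both clear the 1/3 threshold
print(both_cooperate(0.9, 0.2))  # False: the second agent won't cooperate
```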

Does my line of reasoning make sense now, given this additional explanation?

Comment author: Vladimir_Nesov 16 August 2009 07:39:23PM *  0 points [-]

Actually it makes less sense now, since your explanation seems to agree that two "TDT" algorithms that know each of them is "TDT" won't necessarily cooperate, which undermines my hypothesis for why you were talking about cooperation as a sure thing in some relation to "TDT". I still think you make that assumption, though. Quoting from your argument:

4. If, in any identifiable group of people, a sufficient fraction switches to TDT, and that proportion is public knowledge, the TDT-running individuals in that group should start playing cooperate in one-shot PD with other members of the group.

Comment author: Wei_Dai 16 August 2009 08:04:32PM *  0 points [-]

I'm having trouble understanding what you're talking about again. Do you agree or disagree with step 4? To rephrase it a bit: if an identifiable group of people contains a high fraction of individuals running TDT, and that proportion is public knowledge, then TDT-running individuals in that group should play cooperate in one-shot PD with other members of the group in games where the payoffs are such that the potential gain from mutual cooperation is large compared to the potential loss from being defected against. (Assuming being in such a group is the best evidence available about whether someone is running TDT or not.)

If you disagree, why do you think a TDT-running individual might not play cooperate in this situation? Can you give an example to help me understand?

Comment author: Vladimir_Nesov 16 August 2009 09:49:42PM 0 points [-]

I disagree with step 4: I think TDT players that know they are both TDT players sometimes won't cooperate. But this discussion stirred up some of the relevant issues, so I'll answer later, once I figure out what I should believe now.

Comment author: Eliezer_Yudkowsky 16 August 2009 10:18:41PM 0 points [-]

I don't see why TDT players would fail to cooperate under conditions of common knowledge. Are you talking about a case where they each know the other is TDT but think the other doesn't know they know, or something like that?

Comment author: Vladimir_Nesov 19 August 2009 01:11:57PM *  2 points [-]

I don't know the whole answer, but consider, for example, what happens to Pareto-efficiency in PD when you allow mixed strategies. (A mixed strategy is essentially a nontrivial dependence of the played move on the agent's state of knowledge, beyond what is restricted by the experiment; so there is no actual choice about allowing mixed strategies: they are what's there by default, even if the problem states that players select some certain play.) Now, the Pareto-efficient plays are those where one player cooperates with certainty, while the other cooperates or defects with some probability. These strategies correspond to bargaining between the players. I don't know how to solve the bargaining problem (aka the fairness problem, aka the first-mover problem in TDT), but I see no good reason to expect that the solution in this case is going to be exactly pure cooperation. This is what I meant by the insufficient correspondence between true PD and pure cooperation: true PD seems to give too little info, leaving uncertainty about the outcome, at least in this sense. This example doesn't allow both players to defect, but it isn't pure cooperation either.
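A small sketch of this claim, under standard assumed payoffs T=5, R=3, P=1, S=0 (the thread never fixes concrete numbers): holding one player at certain cooperation while the other mixes traces out one arm of the Pareto frontier.

```python
# Assumed PD payoffs (illustrative, not from the thread).
T, R, P, S = 5, 3, 1, 0

def payoffs(q):
    """Player 1 cooperates for sure; player 2 cooperates with prob q."""
    u1 = q * R + (1 - q) * S  # R when both cooperate, S when exploited
    u2 = q * R + (1 - q) * T  # R when cooperating, T when defecting
    return u1, u2

for q in (0.0, 0.5, 1.0):
    print(payoffs(q))  # (0.0, 5.0), (1.5, 4.0), (3.0, 3.0)
```

None of these points Pareto-dominates another: moving along the segment from (0, 5) to (3, 3) trades one player's payoff against the other's, which is the bargaining problem referred to above, and only the q=1 endpoint is pure mutual cooperation.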