Eliezer_Yudkowsky comments on Towards a New Decision Theory - Less Wrong

50 Post author: Wei_Dai 13 August 2009 05:31AM


Comments (142)


Comment author: Wei_Dai 16 August 2009 12:02:24PM * 3 points

I guess my reductio ad absurdum wasn't quite sufficient. I'll try to think this through more thoroughly and carefully. Let me know which steps in the following line of reasoning, if any, you disagree with or find unclear.

  1. TDT couldn't have arisen by evolution.
  2. Until a few years ago, almost everyone on Earth was running some sort of non-TDT which plays defect in one-shot PD.
  3. It's possible that upon learning about TDT, some people might spontaneously switch to running it, depending on whatever meta-DT controls this, and whether the human brain is malleable enough to run TDT.
  4. If, in any identifiable group of people, a sufficient fraction switches to TDT, and that proportion is public knowledge, the TDT-running individuals in that group should start playing cooperate in one-shot PD with other members of the group.
  5. The threshold proportion is higher if the remaining defectors can cause greater damage. If the remaining defectors can use their gains from defection to better reproduce themselves, or to gather more resources that will let them increase their gains/damage, then the threshold proportion must be close to 1, because even a single defector can start a chain reaction that causes all the resources of the group to become held by defectors.
  6. What proportion of skilled AI designers would switch to TDT is ultimately an empirical question, but it seems to me that it's unlikely to be close to unity.
  7. TDT-running AI designers will design their AIs to run TDT. Non-TDT-running AI designers will design their AIs to run non-TDT (not necessarily the same non-TDT).
  8. Assume that a TDT-running AI (TAI) can't tell which other AIs are running TDT and which ones aren't, so in every game it faces the decision described in steps 4 and 5. A TAI will cooperate in some situations where the benefit from cooperation is relatively high and damage from defection relatively low, and not in other situations.
  9. As a result, non-TAI will do better than TAI, but the damage to TAIs will be limited.
  10. A TAI will play cooperate unconditionally only if it is sure that all AIs are TAIs.
  11. If a TAI encounters an AI of alien origin, the same logic applies. The alien AI will be TAI if-and-only-if its creator was running TDT. If the TAI knows nothing about the alien creator, then it has to estimate what fraction of AI-builders in the universe runs TDT. Taking into account that TDT can't arise from evolution, and not seeing any reason for evolution to create a meta-DT that would pick TDT upon discovering it, this fraction seems pretty low, and so the TAI will likely play defect against the alien AI.
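
The threshold in steps 4 and 5 can be made concrete with a toy calculation. This is only a sketch under assumptions that are not in the comment above: a single one-shot PD with payoffs `reward` (both cooperate), `punishment` (both defect), and `sucker` (cooperate against a defector); a publicly known fraction `p` of TDT-runners; and all TDT-runners choosing the same policy, so that a TDT-runner who defects expects other TDT-runners to defect as well. The function names are hypothetical. The model ignores the chain-reaction effect in step 5, which needs more than one round.

```python
def cooperation_threshold(reward, punishment, sucker):
    """Fraction p of TDT-runners above which cooperating beats defecting.

    Assumed toy model: cooperating yields p*reward + (1-p)*sucker,
    defecting yields punishment against everyone. Setting the two equal
    and solving for p gives the threshold below.
    """
    if not (reward > punishment > sucker):
        raise ValueError("expected reward > punishment > sucker")
    return (punishment - sucker) / (reward - sucker)


def tdt_should_cooperate(p, reward, punishment, sucker):
    """True if a TDT-runner's expected payoff from cooperating is higher."""
    expected_cooperate = p * reward + (1 - p) * sucker
    expected_defect = punishment  # every opponent defects in this branch
    return expected_cooperate > expected_defect


if __name__ == "__main__":
    # Mild damage from defectors: a modest TDT fraction is enough.
    print(cooperation_threshold(reward=3, punishment=1, sucker=0))
    # Severe damage from defectors: the threshold moves toward 1.
    print(cooperation_threshold(reward=3, punishment=1, sucker=-10))
    print(tdt_should_cooperate(0.5, reward=3, punishment=1, sucker=0))
```

In this model, lowering `sucker` (defectors do more damage) pushes the threshold toward 1, which is the direction step 5 describes. The temptation payoff does not appear because, under the stated assumption, a TDT-runner never expects to defect against a cooperating TDT-runner.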

Hmm, this exercise has cleared a lot of my own confusion. Obviously a lot more work needs to be done to make the reasoning rigorous, but hopefully I've gotten the gist of it right.

ETA: According to this line of argument, your hypothesis that all skilled AI designers play cooperate in one-shot PD against each other is equivalent to saying that skilled AI designers have minds malleable enough to run TDT, and have a meta-DT that causes them to switch to running TDT. But I do not see an evolutionary reason for this, so if it's true, it must be true by luck. Do you agree?

Comment author: Eliezer_Yudkowsky 16 August 2009 10:28:40PM 1 point

Btw, agree with steps 3-9.