cousin_it comments on Ingredients of Timeless Decision Theory - Less Wrong

43 Post author: Eliezer_Yudkowsky 19 August 2009 01:10AM




Comment author: Eliezer_Yudkowsky 19 August 2009 08:01:41AM 10 points

> Or does my example fall outside of the specified problem class?

If I wanted to defend the original thesis, I would say yes, because TDT doesn't cooperate or defect depending directly on your decision, but depending on how your decision depends on its decision. (This was one of the open problems I listed: the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior.) So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the "problem class" under which the original TDT is automatically reflectively consistent.

Nonetheless it's certainly an interesting dilemma.

Your comment here is actually pre-empting a comment that I'd planned to make after providing some of the background for the content of TDT. I'd thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner's Dilemma and predictably get away with it, provided you did so for unusual reasons. But the conditions on "unusual reasons" are much more difficult than your posts seem to imply. We can't all act on unusual reasons and end up doing the same thing, after all. How is it that these two TDT AIs got here, if not by act of Omega, if the sensible thing to do is always to submit a CDT AI?

To introduce yet another complication: What if the TDTs that you're playing against decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player? After all, your reason for submitting a CDT player involves your expectation about how the TDT players will respond, and your belief that you can "get away with it". It's the TDTs' responses that make them "exploitable" by your decision to submit a CDT player - so what if they employ a different strategy instead? (This is another open problem - "who acts first" in timeless negotiations.)

There might be a certain sense in which being in a "small subgroup internally correlated but not correlated with larger groups" could possibly act as a sort of resource for getting away with defection in the true PD, because if you're in a large group then defecting shifts the probability of an opponent likewise defecting by a lot, but if you're in a small subgroup then it shifts the probability of the opponent defecting by a little, so there's a lower penalty for defection, so in marginal cases a small subgroup might play defection while a large subgroup plays cooperate. (But again, the conditions on this are difficult. If all small subgroups reason this way, then all small subgroups form a large correlated group!)
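The group-size argument above can be put into numbers with a minimal sketch. Everything here is an illustrative assumption rather than part of TDT itself: the payoff values, the function names, and the toy model in which an opponent copies your move with probability `p_correlated` (standing in for the relative size of your correlated group) and otherwise cooperates with a fixed probability `q_baseline`.

```python
# Assumed Prisoner's Dilemma payoffs (illustrative only, not from the thread).
# T: temptation, R: reward, P: punishment, S: sucker's payoff; T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0


def expected_utility(move: str, p_correlated: float, q_baseline: float) -> float:
    """Expected payoff of playing `move` ("C" or "D") in the toy model.

    p_correlated: chance the opponent belongs to my correlated group and
                  therefore makes the same move I make (hypothetical parameter).
    q_baseline:   chance an uncorrelated opponent cooperates, whatever I do
                  (hypothetical parameter).
    """
    if move == "C":
        mirrored = R  # a correlated opponent cooperates too
        independent = q_baseline * R + (1 - q_baseline) * S
    elif move == "D":
        mirrored = P  # a correlated opponent defects too
        independent = q_baseline * T + (1 - q_baseline) * P
    else:
        raise ValueError("move must be 'C' or 'D'")
    return p_correlated * mirrored + (1 - p_correlated) * independent


def best_move(p_correlated: float, q_baseline: float) -> str:
    """Return whichever move has the higher expected payoff (ties go to 'C')."""
    eu_c = expected_utility("C", p_correlated, q_baseline)
    eu_d = expected_utility("D", p_correlated, q_baseline)
    return "C" if eu_c >= eu_d else "D"


# Sweep the size of the correlated group and see where the choice flips.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, best_move(p, q_baseline=0.5))
```

With these assumed payoffs and `q_baseline = 0.5`, the two expected utilities cross at `p_correlated = 3/7`: below that, defection has the lower penalty and wins; above it, cooperation wins. The sketch deliberately ignores the caveat in the parenthetical above, namely that every small subgroup reasoning this way would itself form one large correlated group.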

Anyway - you can't end up in a small subgroup if you start out in a large one, because if you decide to deliberately condition on noise in order to decrease the size of your subgroup, that itself is a correlated sort of decision with a clear line of reasoning and motive, and others in your correlated group will try doing the same thing, with predictable results. So to the extent that lots of AI designers in distant parts of Reality are discussing this same issue with the same logic, we are already in a group of a certain minimum size.

But this does lead to an argument for CEV (values extrapolating / Friendly AI) algorithms that don't automatically, inherently correlate us with larger groups than we already started out being in. If uncorrelation is a nonrenewable resource then FAI programmers should at least be careful not to wantonly burn it. You can't deliberately add noise, but you might be able to preserve existing uncorrelation.

Also, other TDTs can potentially set their "minimum cooperator frequency threshold" at just the right level that if any group of noticeable size chooses to defect, all the TDTs start defecting - though this itself is a possibility I am highly unsure of, and once again it has to do with "who goes first" in timeless strategies, which is an open problem.

But these are issues in which my understanding is still shaky, and it very rapidly gets us into very dangerous territory like trying to throw the steering wheel out the window while playing chicken.

So far as evolved biological organisms go, I suspect that the ones who create successful Friendly AIs (instead of losing control and dying at the hands of paperclip maximizers) would hardly start out seeing only the view from CDT - most of them/us would be making the decision "Should I build TDT, knowing that the decisions of other biological civilizations are correlated with this one?" and not "Should I build TDT, having never thought of that?" In other words, we may already be part of a large correlated subgroup - though I sometimes suspect that most of the AIs out there are paperclip maximizers born of experimental accidents, and in that case, if there is no way of verifying source code, nor of telling the difference between SIs containing bio-values-preserving civs and SIs containing paperclip maximizers, then we might be able to exploit the relative smallness of the "successful biological designer" group...

...but a lot of this presently has the quality of "No fucking way would I try that in real life", at least based on my current understanding. The closest I would get might be trying for a CEV algorithm that did not inherently add correlation to decision systems with which we were not already correlated.

Comment author: cousin_it 20 June 2013 03:47:52PM 2 points

> What if the TDTs that you're playing against decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player?

That's a good point, but what if the process that gives birth to CDT doesn't listen to the incentives you give it? For example, it could be evolution or random chance.

Here's an example, similar to Wei's example above. Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob's life will be much happier than Alice's, right?
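A minimal sketch of the Alice/Bob comparison, under assumptions that are mine and not spelled out in the example: the payoff numbers are illustrative, and the TDT neighbors are assumed to keep cooperating in every blind PD because they cannot tell Bob apart from anyone else and a single mutant is too rare to change their policy.

```python
# Assumed Prisoner's Dilemma payoffs (illustrative only): T > R > P > S.
T, R, P, S = 5, 3, 1, 0

# Payoff to the first player for each (my_move, neighbor_move) pair.
PAYOFF = {
    ("C", "C"): R,
    ("C", "D"): S,
    ("D", "C"): T,
    ("D", "D"): P,
}


def lifetime_payoff(my_move: str, neighbor_move: str, games: int) -> int:
    """Total payoff over `games` blind PDs when both sides repeat one move."""
    return games * PAYOFF[(my_move, neighbor_move)]


games = 100  # hypothetical number of blind PDs in a lifetime

# Alice (TDT) cooperates, and her TDT neighbors cooperate back.
alice_total = lifetime_payoff("C", "C", games)

# Bob (CDT) defects, while his neighbors still cooperate (the assumption above).
bob_total = lifetime_payoff("D", "C", games)

print("Alice:", alice_total)
print("Bob:  ", bob_total)
```

Under these assumptions Bob collects the temptation payoff in every game while Alice collects the mutual-cooperation payoff, so Bob comes out ahead by `(T - R) * games`. The whole result hangs on the neighbors not changing their strategy in response.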

> We can't all act on unusual reasons and end up doing the same thing, after all.

What force will push against evolution and keep the number of Bobs small?
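To show what the question is pointing at, here is a rough sketch using discrete replicator dynamics, a standard toy model of selection that I am swapping in for "evolution". It assumes illustrative payoffs, random pairing, and (crucially) that TDT agents go on cooperating blindly no matter how common the Bobs become. That last assumption is exactly what is in dispute, so treat this as a picture of the worry, not an answer to it.

```python
# Assumed Prisoner's Dilemma payoffs (illustrative only): T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0


def next_bob_fraction(f: float) -> float:
    """One generation of discrete replicator dynamics.

    f is the current fraction of CDT agents ("Bobs"). Agents are paired at
    random; TDT agents always cooperate and CDT agents always defect
    (an assumption of this sketch, not a claim about how TDT would behave).
    """
    tdt_fitness = (1 - f) * R + f * S  # cooperator's average payoff
    cdt_fitness = (1 - f) * T + f * P  # defector's average payoff
    mean_fitness = (1 - f) * tdt_fitness + f * cdt_fitness
    # Each type's share grows in proportion to its relative fitness.
    return f * cdt_fitness / mean_fitness


def simulate(f0: float, generations: int) -> list[float]:
    """Return the Bob fraction for generation 0 through `generations`."""
    history = [f0]
    for _ in range(generations):
        history.append(next_bob_fraction(history[-1]))
    return history


# Start with 1% Bobs and watch the fraction over ten generations.
for generation, fraction in enumerate(simulate(0.01, 10)):
    print(generation, round(fraction, 3))
```

Because the defector's average payoff exceeds the cooperator's at every mixed population in this model, the fraction of Bobs rises each generation. Any force that keeps the Bobs rare has to come from something the sketch leaves out, such as the TDT agents changing their policy once defectors become noticeable.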