Eliezer_Yudkowsky comments on Ingredients of Timeless Decision Theory - Less Wrong

43 Post author: Eliezer_Yudkowsky 19 August 2009 01:10AM

Comments (226)

Comment author: Eliezer_Yudkowsky 19 August 2009 08:01:41AM 10 points [-]

Or does my example fall outside of the specified problem class?

If I wanted to defend the original thesis, I would say yes, because TDT doesn't cooperate or defect depending directly on your decision, but cooperates or defects depending on how your decision depends on its decision (which was one of the open problems I listed - the original TDT is for cases where Omega offers you straightforward dilemmas in which its behavior is just a direct transform of your behavior). So where one algorithm has one payoff matrix for defection or cooperation, the other algorithm gets a different payoff matrix for defection or cooperation, which breaks the "problem class" under which the original TDT is automatically reflectively consistent.

Nonetheless it's certainly an interesting dilemma.

Your comment here is actually pre-empting a comment that I'd planned to make after providing some of the background for the content of TDT. I'd thought about your dilemmas, and then did manage to translate into my terms a notion about how it might be possible to unilaterally defect in the Prisoner's Dilemma and predictably get away with it, provided you did so for unusual reasons. But the conditions on "unusual reasons" are much more difficult than your posts seem to imply. We can't all act on unusual reasons and end up doing the same thing, after all. How is it that these two TDT AIs got here, if not by act of Omega, if the sensible thing to do is always to submit a CDT AI?

To introduce yet another complication: What if the TDTs that you're playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player? Given that your reason for submitting a CDT player involves your expectation about how the TDT players will respond, and that you can "get away with it"? It's the TDT's responses that make them "exploitable" by your decision to submit a CDT player - so what if they employ a different strategy instead? (This is another open problem - "who acts first" in timeless negotiations.)

There might be a certain sense in which being in a "small subgroup internally correlated but not correlated with larger groups" could possibly act as a sort of resource for getting away with defection in the true PD, because if you're in a large group then defecting shifts the probability of an opponent likewise defecting by a lot, but if you're in a small subgroup then it shifts the probability of the opponent defecting by a little, so there's a lower penalty for defection, so in marginal cases a small subgroup might play defection while a large subgroup plays cooperate. (But again, the conditions on this are difficult. If all small subgroups reason this way, then all small subgroups form a large correlated group!)
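The marginal argument above can be made concrete with a toy calculation. This is only a sketch under stated assumptions: the payoff numbers (T=5, R=3, P=1, S=0), the parameter `q` (the chance that the opponent belongs to your correlated group and so mirrors your move), and `base_coop` (how often an uncorrelated opponent cooperates) are all hypothetical and do not come from the thread.

```python
# Toy model of the "small correlated subgroup" argument.
# All numbers and the mirroring rule are illustrative assumptions.
T, R, P, S = 5.0, 3.0, 1.0, 0.0  # temptation, reward, punishment, sucker

def expected_payoff(my_move, q, base_coop):
    """Expected payoff when the opponent mirrors my move with probability q
    and otherwise cooperates with probability base_coop regardless of me."""
    if my_move == "C":
        p_opp_coop = q + (1 - q) * base_coop
        return p_opp_coop * R + (1 - p_opp_coop) * S
    p_opp_coop = (1 - q) * base_coop
    return p_opp_coop * T + (1 - p_opp_coop) * P

def best_move(q, base_coop=0.5):
    """Return the move with the higher expected payoff (ties go to C)."""
    cooperate = expected_payoff("C", q, base_coop)
    defect = expected_payoff("D", q, base_coop)
    return "C" if cooperate >= defect else "D"

print(best_move(0.9))  # large correlated group
print(best_move(0.1))  # small correlated group
```

With these particular numbers the switch from defecting to cooperating happens at q = 3/7, so the penalty for defection grows with the size of the group your decision is correlated with. The sketch deliberately ignores the objection in the parenthesis: it treats `q` as fixed rather than as something other small subgroups reason about too.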

Anyway - you can't end up in a small subgroup if you start out in a large one, because if you decide to deliberately condition on noise in order to decrease the size of your subgroup, that itself is a correlated sort of decision with a clear line of reasoning and motive, and others in your correlated group will try doing the same thing, with predictable results. So to the extent that lots of AI designers in distant parts of Reality are discussing this same issue with the same logic, we are already in a group of a certain minimum size.

But this does lead to an argument for CEV (values extrapolating / Friendly AI) algorithms that don't automatically, inherently correlate us with larger groups than we already started out being in. If uncorrelation is a nonrenewable resource then FAI programmers should at least be careful not to wantonly burn it. You can't deliberately add noise, but you might be able to preserve existing uncorrelation.

Also, other TDTs can potentially set their "minimum cooperator frequency threshold" at just the right level that if any group of noticeable size chooses to defect, all the TDTs start defecting - though this itself is a possibility I am highly unsure of, and once again it has to do with "who goes first" in timeless strategies, which is an open problem.

But these are issues in which my understanding is still shaky, and it very rapidly gets us into very dangerous territory like trying to throw the steering wheel out the window while playing chicken.

So far as evolved biological organisms go, I suspect that the ones who create successful Friendly AIs (instead of losing control and dying at the hands of paperclip maximizers), would hardly start out seeing only the view from CDT - most of them/us would be making the decision "Should I build TDT, knowing that the decisions of other biological civilizations are correlated to this one?" and not "Should I build TDT, having never thought of that?" In other words, we may already be part of a large correlated subgroup - though I sometimes suspect that most of the AIs out there are paperclip maximizers born of experimental accidents, and in that case, if there is no way of verifying source code, nor of telling the difference between SIs containing bio-values-preserving civs and SIs containing paperclip maximizers, then we might be able to exploit the relative smallness of the "successful biological designer" group...

...but a lot of this presently has the quality of "No fucking way would I try that in real life", at least based on my current understanding. The closest I would get might be trying for a CEV algorithm that did not inherently add correlation to decision systems with which we were not already correlated.

Comment author: Wei_Dai 19 August 2009 12:42:04PM *  7 points [-]

This is another open problem - "who acts first" in timeless negotiations.

You're right, I failed to realize that with timeless agents, we can't do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.

Here's an idea. The logical order of decisions is related to simulation ability. Suppose A can simulate B, meaning it has trustworthy information about B's source code and has sufficient computing power to fully simulate B or sufficient intelligence to analyze B using reliable shortcuts, but B can't simulate A. Then the logical order of decisions is B followed by A, because when B makes his decision, he can treat A's decision as conditional on his. But when A makes her decision, she has to take B's decision as a given.

Does that make sense?

Comment author: Eliezer_Yudkowsky 19 August 2009 03:05:15PM *  7 points [-]

Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power. Rational agents should not regret having more computing power (because they can always use less) or more knowledge (because they can always implement the same strategy they would use with less knowledge) - this sort of thing is a sure sign of reflective inconsistency.

To see why moving logically second is a disadvantage, consider that it lets an opponent playing Chicken always toss their steering wheel out the window and get away with it.
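A minimal sketch of the Chicken point, assuming an illustrative payoff table (the numbers are hypothetical; only their ordering matters). The first mover commits, the second mover best-responds to that commitment, and the first mover picks the commitment whose best response suits them.

```python
# Chicken with an assumed payoff table: (first mover, second mover).
PAYOFF = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}
MOVES = ("swerve", "straight")

def best_response(first_move):
    """Second mover's best reply once the first move is fixed and known."""
    return max(MOVES, key=lambda m: PAYOFF[(first_move, m)][1])

def first_mover_choice():
    """First mover commits to whichever move does best against the reply."""
    return max(MOVES, key=lambda m: PAYOFF[(m, best_response(m))][0])

commitment = first_mover_choice()
reply = best_response(commitment)
print(commitment, reply, PAYOFF[(commitment, reply)])
```

Under these assumptions the committed player goes straight and the responder swerves, which is the sense in which tossing the steering wheel out the window "gets away with it" against someone who moves logically second.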

That both players desire to move "logically first" argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.

(I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole 'nother story, and of course also falls into the "Way the fuck too dangerous to try in real life" category relative to my present knowledge.)

With timeless agents, we can't do backwards induction using the physical order of decisions. We need some notion of the logical order of decisions.

BTW, thanks for this compact way of putting it.

Comment author: rwallace 19 August 2009 07:33:55PM 1 point [-]

Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so.

One category of counterexample comes from warfare, where if you know what the enemy will do and he doesn't know what you will do, you have the upper hand. (The logical versus temporal distinction is clear here: being temporally the first to reach an objective can be a big advantage.)

Another counterexample is in negotiation where a buyer and seller are both uncertain about fair market price; each may prefer the other to be first to suggest a price. (In practice this is often resolved by the party with more knowledge, or more at stake, or both - usually the seller - being first to suggest a price.)

Comment author: Wei_Dai 20 August 2009 12:18:02AM 0 points [-]

Being logically second only keeps being a disadvantage because examples keep being chosen to be of the kind that make it so.

You're right. Rock-paper-scissors is another counter-example. In these cases, the relationship between the logical order of moves and simulation ability seems pretty obvious and intuitive.

Comment author: Eliezer_Yudkowsky 20 August 2009 12:19:37AM 1 point [-]

Except that the analogy to rock-paper-scissors would be that I get to move logically first by deciding my conditional strategy "rock if you play scissors" etc., and simulating you simulating me without running into an apparently non-halting computation (that would otherwise have to be stopped by my performing counterfactual surgery on the part of you that simulates my own decision), then playing rock if I simulate you playing scissors.

At least I think that's how the analogy would work.

Comment author: Vladimir_Nesov 20 August 2009 12:36:21AM *  2 points [-]

I suspect that this kind of problem will run into computational complexity issues, not clever decision theory issues. Like with a certain variation on the St. Petersburg paradox (see the last two paragraphs), where you need to count to the greatest finite number to which you can count, and then stop.

Comment author: Wei_Dai 20 August 2009 12:29:38AM 1 point [-]

Suppose I know that's your strategy, and decide to play the move equal to (the first googolplex digits of pi mod 3), and I can actually compute that but you can't. What are you going to do?

If you can predict what I do, then your conditional strategy works, which just shows that move order is related to simulation ability.

Comment author: Eliezer_Yudkowsky 20 August 2009 03:32:21AM *  4 points [-]

In this zero-sum game, yes, it's possible that whoever has the most computing power wins, if neither can access unpredictable random or private variables. But what if both sides have exactly equal computing power? We could define a Timeless Paper-Scissors-Rock Tournament this way - standard language, no random function, each program gets access to the other's source code and exactly 100 million ticks, if you halt without outputting a move then you lose 2 points.

Comment author: Wei_Dai 20 August 2009 09:13:40AM 1 point [-]

This game is pretty easy to solve, I think. A simple equilibrium is for each side to do something like iterate x = SHA-512(x), with a random starting value, using an optimal implementation of SHA-512, until time is just about to run out, then output x mod 3. SHA-512 is easy to optimize (in the sense of writing the absolutely fastest implementation), and it seems very unlikely that there could be shortcuts to computing (SHA-512)^n until n gets so big (around 2^256 unless SHA-512 is badly designed) that the function starts to cycle.
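The strategy described above might be sketched as follows. This is an illustration, not a tournament entry: it stops after a fixed iteration count instead of watching a 100-million-tick budget, and `seed`, `iterations`, and the move ordering in `MOVES` are hypothetical choices made for the example.

```python
import hashlib

MOVES = ("rock", "paper", "scissors")  # assumed mapping for x mod 3

def iterated_sha512_move(seed: bytes, iterations: int) -> int:
    """Iterate x = SHA-512(x) a fixed number of times, then return x mod 3.

    A fixed count stands in for "until time is just about to run out".
    """
    x = seed
    for _ in range(iterations):
        x = hashlib.sha512(x).digest()
    # Interpret the final 64-byte digest as a big-endian integer.
    return int.from_bytes(x, "big") % 3

move = iterated_sha512_move(b"hard-coded starting value", 100_000)
print(MOVES[move])
```

The point of the construction is that an opponent with the same budget has no obvious way to learn the output other than running the same chain, which leaves no ticks to spare for acting on the answer.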

I think I've answered your specific question, but the answer doesn't seem that interesting, and I'm not sure why you asked it.

Comment author: ciphergoth 23 May 2012 11:51:39AM 1 point [-]

Schneier et al. here prove that being able to calculate H^n(x) quickly leads to a faster way of finding collisions in H. http://www.schneier.com/paper-low-entropy.html

Comment author: Eliezer_Yudkowsky 20 August 2009 10:11:17PM 1 point [-]

Well, it's probably not all that interesting from a purely theoretical perspective, but if the prize money was divided up among only the top fifth of players, you'd actually have to try to win, and that would be an interesting challenge for computer programmers.

Comment author: Wei_Dai 19 August 2009 07:31:16PM *  1 point [-]

Moving second is a disadvantage (at least it seems to always work out that way, counterexamples requested if you can find them) and A can always use less computing power.

But if you are TDT, you can't always use less computing power, because that might be correlated with your opponents also deciding to use less computing power, or will be distrusted by your opponent because it can't simulate you.

But if you simply don't have that much computing power (and opponent knows this) then you seem to have the advantage of logically moving first.

(I should comment in the future about the possibility that bio-values-derived civs, by virtue of having evolved to be crazy, can succeed in moving logically first using crazy reasoning, but that would be a whole 'nother story, and of course also falls into the "Way the fuck too dangerous to try in real life" category relative to my present knowledge.)

Lack of computing power could be considered a form of "crazy reasoning"...

Why does TDT lead to the phenomenon of "stupid winners"? If there's a way to explain this as a reasonable outcome, I'd feel a lot better. But is that like a two-boxer asking for an explanation of why, when the stupid (from their perspective) one-boxers keep winning, that's a reasonable outcome?

Comment author: Eliezer_Yudkowsky 19 August 2009 07:55:42PM *  0 points [-]

But if you are TDT, you can't always use less computing power, because that correlates with your opponents also deciding to use less computing power.

Substitute "move logically first" for "use less computing power"? Using less computing power seems like a red herring to me. TDT on simple problems (with the causal / logical structure already given) uses skeletally small amounts of computing power. "Who moves first" is a "battle"(?) over the causal / logical structure, not over who can manage to run out of computing power first. If you're visualizing this using lots of computing power for the core logic, rather than computing the 20th decimal place of some threshold or verifying large proofs, then we've got different visualizations.

The idea of "if you do this, the opponent does the same" might apply to trying to move logically first, but in my world this has nothing to do with computing power, so at this point I think it'd be pretty odd if the agents were competing to be stupider.

Besides, you don't want to respond to most logical threats, because that gives your opponent an incentive to make logical threats; you only want to respond to logical offers that you want your opponent to have an incentive to make. This gets into the scary issues I was hinting at before, like determining in advance that if you see your opponent predetermine to destroy the universe in a mutual suicide unless you pay a ransom, you'll call their bet and die with them, even if they've predetermined to ignore your decision, etcetera; but if they offer to trade you silver for gold at a Ricardian-advantageous rate, you'll predetermine to cooperate, etc. The point, though, is that "If I do X, they'll do Y" is not a blank check to decide that minds do X, because you could choose a different form of responsiveness.

But anyway, I don't see in the first place that agents should be having these sorts of contests over how little computing power to use. That doesn't seem to me like a compelling advantage to reach for.

But if you simply don't have that much computing power then you seem to have the advantage of logically moving first.

If you've got that little computing power then perhaps you can't simulate your opponent's skeletally small TDT decision, i.e., you can't use TDT at all. If you can't close the loop of "I simulate you simulating me" - which isn't infinite, and actually terminates rather quickly in the simple cases I know how to analyze at all, because we perform counterfactual surgery inside the loop - then you can't use TDT at all.

Lack of computing power could be considered a form of "crazy reasoning"...

No, I mean much crazier than that. Like "This doesn't follow, but I'm going to believe it anyway!" That's what it takes to get "unusual reasons" - the sort of madness that only strictly naturally selected biological minds would find compelling in advance of a timeless decision to be crazy. Like "I'M GOING TO THROW THE STEERING WINDOW OUT THE WHEEL AND I DON'T CARE WHAT THE OPPONENT PREDETERMINES" crazy.

Why does TDT lead to the phenomenon of "stupid winners"?

It has not been established to my satisfaction that it does. It is a central philosophical intuition driving my decision theory that increased computing power, knowledge, or self-control, should not harm a rational agent.

Comment author: Eliezer_Yudkowsky 20 August 2009 12:11:18AM 0 points [-]

That both players desire to move "logically first" argues strongly that neither one will; that the resolution here does not involve any particular fixed global logical order of decisions.

...possibly employing mixed strategies, by analogy to the equilibrium of games where neither agent gets to go first and both must choose simultaneously? But I haven't done anything with this idea, yet.
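For reference, here is what the simultaneous-move analogy looks like in an ordinary game of Chicken: a sketch that computes the symmetric mixed-strategy equilibrium from an indifference condition. The payoff numbers are illustrative assumptions, and nothing here claims to resolve the timeless case.

```python
from fractions import Fraction

# My payoffs in a symmetric game of Chicken (assumed numbers).
A = Fraction(0)    # I swerve, opponent swerves
B = Fraction(-1)   # I swerve, opponent goes straight
C = Fraction(1)    # I go straight, opponent swerves
D = Fraction(-10)  # both go straight (crash)

def straight_probability(a, b, c, d):
    """Probability p of the opponent going straight that makes me indifferent:
    (1 - p) * a + p * b == (1 - p) * c + p * d."""
    return (c - a) / ((c - a) + (b - d))

def expected(p, vs_swerve, vs_straight):
    """My expected payoff when the opponent goes straight with probability p."""
    return (1 - p) * vs_swerve + p * vs_straight

p = straight_probability(A, B, C, D)
print(p, expected(p, A, B), expected(p, C, D))
```

With these numbers each player goes straight one time in ten, and both pure moves then earn the same expected payoff, so neither player gains by deviating.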

Comment author: [deleted] 13 June 2014 06:52:49AM -1 points [-]

This reminds me of logical Fatalism and the Argument from Bivalence

Comment author: cousin_it 20 June 2013 03:47:52PM *  2 points [-]

What if the TDTs that you're playing against, decide to defect unconditionally if you submit a CDT player, in order to give you an incentive to submit a TDT player?

That's a good point, but what if the process that gives birth to CDT doesn't listen to the incentives you give it? For example, it could be evolution or random chance.

Here's an example, similar to Wei's example above. Imagine two parallel universes, both containing large populations of TDT agents. In both universes, a child is born, looking exactly like everyone else. The child in universe A is a TDT agent named Alice. The child in universe B is named Bob and has a random mutation that makes him use CDT. Both children go on to play many blind PDs with their neighbors. It looks like Bob's life will be much happier than Alice's, right?
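A toy tally of the Alice and Bob comparison, assuming the neighbors cooperate in every blind PD because they cannot tell the child apart from any other TDT agent. The payoff values and the number of games are hypothetical.

```python
# Assumed Prisoner's Dilemma payoffs: temptation, reward, punishment, sucker.
T, R, P, S = 5, 3, 1, 0

def pd_payoff(mine, theirs):
    """Payoff to the player choosing `mine` against `theirs`."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[(mine, theirs)]

def lifetime_payoff(child_move, games):
    """Total payoff against neighbors who always cooperate (an assumption)."""
    neighbor_move = "C"
    return sum(pd_payoff(child_move, neighbor_move) for _ in range(games))

alice = lifetime_payoff("C", 100)  # TDT child cooperates
bob = lifetime_payoff("D", 100)    # CDT mutant defects
print(alice, bob)
```

The tally only restates the premise: as long as the neighbors keep cooperating, Bob comes out ahead in every game. Whether they would keep cooperating once mutants like Bob become common is exactly the open question.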

We can't all act on unusual reasons and end up doing the same thing, after all.

What force will push against evolution and keep the number of Bobs small?