Timeless Decision Theory: Problems I Can't Solve

Eliezer_Yudkowsky 20 July 2009 12:02AM 25 points

Suppose you're out in the desert, running out of water, and soon to die - when someone in a motor vehicle drives up next to you.  Furthermore, the driver of the motor vehicle is a perfectly selfish ideal game-theoretic agent, and even further, so are you; and what's more, the driver is Paul Ekman, who's really, really good at reading facial microexpressions.  The driver says, "Well, I'll convey you to town if it's in my interest to do so - so will you give me $100 from an ATM when we reach town?"

Now of course you wish you could answer "Yes", but as an ideal game theorist yourself, you realize that, once you actually reach town, you'll have no further motive to pay off the driver.  "Yes," you say.  "You're lying," says the driver, and drives off leaving you to die.

If only you weren't so rational!

This is the dilemma of Parfit's Hitchhiker, and the above is the standard resolution according to mainstream philosophy's causal decision theory, which also two-boxes on Newcomb's Problem and defects in the Prisoner's Dilemma.  Of course, any self-modifying agent who expects to face such problems - in general, or in particular - will soon self-modify into an agent that doesn't regret its "rationality" so much.  So from the perspective of a self-modifying-AI-theorist, classical causal decision theory is a wash.  And indeed I've worked out what seems like an elegant theory, tentatively labeled "timeless decision theory", which covers these three Newcomblike problems and delivers a first-order answer that is already reflectively consistent, without need to explicitly consider such notions as "precommitment".  Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.
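To make the contrast concrete, here is a minimal sketch of the hitchhiker's situation. It is not the timeless decision theory itself, only an illustration of why an accurately predicted agent does better by being the paying type. The dollar value placed on dying and the function names are assumptions made up for the example.

```python
# Hypothetical payoffs (assumptions, not from the post): dying in the
# desert is scored as -1,000,000 and paying the driver costs 100.
DEATH = -1_000_000
FARE = -100


def driver_gives_ride(predicted_to_pay: bool) -> bool:
    """The driver reads your face, so the ride depends on the prediction."""
    return predicted_to_pay


def outcome(pays_in_town: bool) -> int:
    """Payoff for an agent whose in-town behaviour is accurately predicted."""
    if not driver_gives_ride(predicted_to_pay=pays_in_town):
        return DEATH
    return FARE if pays_in_town else 0


def causal_choice_in_town() -> bool:
    """Once in town the ride is already a fact, so only the fare is compared
    against paying nothing. Paying is strictly worse, so this returns False."""
    return FARE > 0


def policy_choice() -> bool:
    """Pick the disposition whose predicted consequences are best overall."""
    return max([True, False], key=outcome)


print(outcome(causal_choice_in_town()))  # the agent who would refuse in town
print(outcome(policy_choice()))          # the agent who is the paying type
```

The only difference between the two agents is the point from which the choice is evaluated; the driver's rule is identical in both cases.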

However, there are some other timeless decision problems for which I do not possess a general theory.

For example, there's a problem introduced to me by Gary Drescher's marvelous Good and Real (OOPS: The below formulation was independently invented by Vladimir Nesov; Drescher's book actually contains a related dilemma in which box B is transparent, and only contains $1M if Omega predicts you will one-box whether B appears full or empty, and Omega has a 1% error rate) which runs as follows:

Suppose Omega (the same superagent from Newcomb's Problem, who is known to be honest about how it poses these sorts of dilemmas) comes to you and says:

"I just flipped a fair coin.  I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000.  And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads.  The coin came up heads - can I have $1000?"

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker.

But I don't have a general theory which replies "Yes".  At the point where Omega asks me this question, I already know that the coin came up heads, so I already know I'm not going to get the million.  It seems like I want to decide "as if" I don't know whether the coin came up heads or tails, and then implement that decision even if I know the coin came up heads.  But I don't have a good formal way of talking about how my decision in one state of knowledge has to be determined by the decision I would make if I occupied a different epistemic state, conditioning using the probability previously possessed by events I have since learned the outcome of...  Again, it's easy to talk informally about why you have to reply "Yes" in this case, but that's not the same as being able to exhibit a general algorithm.
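The tension can be shown with a few lines of arithmetic. The sketch below assumes Omega predicts perfectly and simply compares the value of each policy under the pre-flip probabilities with its value after learning that the coin came up heads. It illustrates the problem and is not a proposed general algorithm; the function names are invented for the example.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)   # fair coin, as stated by Omega
COST = 1_000               # what Omega asks for on heads
PRIZE = 1_000_000          # what Omega pays on tails if it predicts you pay


def payoff(pays_on_heads: bool, coin: str) -> int:
    """Payoff of a fixed policy in one branch (assumes a perfect predictor)."""
    if coin == "heads":
        return -COST if pays_on_heads else 0
    return PRIZE if pays_on_heads else 0


def value_before_flip(pays_on_heads: bool) -> Fraction:
    """Expected value using the probabilities held before the flip."""
    return (P_HEADS * payoff(pays_on_heads, "heads")
            + (1 - P_HEADS) * payoff(pays_on_heads, "tails"))


def value_after_seeing_heads(pays_on_heads: bool) -> int:
    """Value after updating on heads: the tails branch no longer counts."""
    return payoff(pays_on_heads, "heads")


for policy in (True, False):
    print(policy, value_before_flip(policy), value_after_seeing_heads(policy))
```

Before the flip, the paying policy is worth $499,500 in expectation and the refusing policy is worth $0. After seeing heads, paying is worth -$1000 and refusing $0. The missing piece is a principled rule that says which of the two evaluations the agent should act on.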

Another stumper was presented to me by Robin Hanson at an OBLW meetup.  Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote.  Let's say that six of them form a coalition and decide to vote to divide the pie among themselves, one-sixth each.  But then two of them think, "Hey, this leaves four agents out in the cold.  We'll get together with those four agents and offer them to divide half the pie among the four of them, leaving one quarter apiece for the two of us.  We get a larger share than one-sixth that way, and they get a larger share than zero, so it's an improvement from the perspectives of all six of us - they should take the deal."  And those six then form a new coalition and redivide the pie.  Then another two of the agents think:  "The two of us are getting one-eighth apiece, while four other agents are getting zero - we should form a coalition with them, and by majority vote, give each of us one-sixth."

And so it goes on:  Every majority coalition and division of the pie, is dominated by another majority coalition in which each agent of the new majority gets more pie.  There does not appear to be any such thing as a dominant majority vote.
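The claim can be checked mechanically. The sketch below replays the three divisions described above and then uses one particular construction, which is an assumption of this example rather than anything from Hanson's argument, to produce a dominating division for any starting point: take the six poorest agents, confiscate whatever the other four hold, and share it equally among the six.

```python
from fractions import Fraction


def dominates(new, old):
    """True if a strict majority of agents get strictly more under `new`."""
    better_off = sum(1 for n, o in zip(new, old) if n > o)
    return better_off > len(old) // 2


def dominating_division(old):
    """Build a division that some majority strictly prefers to `old`.

    Assumed construction: the smallest majority of the poorest agents splits
    the holdings of everyone else. The excluded agents are the richest ones,
    so they always hold a positive amount and every member of the new
    majority strictly gains.
    """
    n = len(old)
    majority = n // 2 + 1
    poorest = sorted(range(n), key=lambda i: old[i])[:majority]
    spoils = sum(old[i] for i in range(n) if i not in poorest)
    new = [Fraction(0)] * n
    for i in poorest:
        new[i] = old[i] + spoils / majority
    return new


sixth, quarter, eighth, zero = (Fraction(1, 6), Fraction(1, 4),
                                Fraction(1, 8), Fraction(0))
first = [sixth] * 6 + [zero] * 4                      # agents 0-5 share the pie
second = [quarter] * 2 + [zero] * 4 + [eighth] * 4    # agents 0-1 join agents 6-9
third = [zero] * 2 + [sixth] * 6 + [zero] * 2         # agents 6-7 join agents 2-5

print(dominates(second, first), dominates(third, second))

division = third
for _ in range(5):
    nxt = dominating_division(division)
    print(dominates(nxt, division), [str(share) for share in nxt])
    division = nxt
```

Every step of the loop finds a new majority that is strictly better off, which is the sense in which no division is stable under majority vote.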

(Robin Hanson actually used this to suggest that if you set up a Constitution which governs a society of humans and AIs, the AIs will be unable to conspire among themselves to change the constitution and leave the humans out in the cold, because then the new compact would be dominated by yet other compacts and there would be chaos, and therefore any constitution stays in place forever.  Or something along those lines.  Needless to say, I do not intend to rely on such, but it would be nice to have a formal theory in hand which shows how ideal reflectively consistent decision agents will act in such cases (so we can prove they'll shed the old "constitution" like used snakeskin.))

Here's yet another problem whose proper formulation I'm still not sure of, and it runs as follows.  First, consider the Prisoner's Dilemma.  Informally, two timeless decision agents with common knowledge of the other's timeless decision agency, but no way to communicate or make binding commitments, will both Cooperate because they know that the other agent is in a similar epistemic state, running a similar decision algorithm, and will end up doing the same thing that they themselves do.  In general, on the True Prisoner's Dilemma, facing an opponent who can accurately predict your own decisions, you want to cooperate only if the other agent will cooperate if and only if they predict that you will cooperate.  And the other agent is reasoning similarly:  They want to cooperate only if you will cooperate if and only if you accurately predict that they will cooperate.

But there's actually an infinite regress here which is being glossed over - you won't cooperate just because you predict that they will cooperate, you will only cooperate if you predict they will cooperate if and only if you cooperate.  So the other agent needs to cooperate if they predict that you will cooperate if you predict that they will cooperate... (...only if they predict that you will cooperate, etcetera).

On the Prisoner's Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion, so that you can expect their action to be the symmetrical analogue of your own - in which case (C, C) is preferable to (D, D).  But what if you're facing a more general decision problem, with many agents having asymmetrical choices, and everyone wants to have their decisions depend on how they predict that other agents' decisions depend on their own predicted decisions?  Is there a general way of resolving the regress?

On Parfit's Hitchhiker and Newcomb's Problem, we're told how the other behaves as a direct function of our own predicted decision - Omega rewards you if you (are predicted to) one-box, the driver in Parfit's Hitchhiker saves you if you (are predicted to) pay $100 on reaching the city.  My timeless decision theory only functions in cases where the other agents' decisions can be viewed as functions of one argument, that argument being your own choice in that particular case - either by specification (as in Newcomb's Problem) or by symmetry (as in the Prisoner's Dilemma).  If their decision is allowed to depend on how your decision depends on their decision - like saying, "I'll cooperate, not 'if the other agent cooperates', but only if the other agent cooperates if and only if I cooperate - if I predict the other agent to cooperate unconditionally, then I'll just defect" - then in general I do not know how to resolve the resulting infinite regress of conditionality, except in the special case of predictable symmetry.
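A toy illustration of the regress, under heavy simplifying assumptions: agents are Python functions that receive their opponent as an argument, "predicting" the opponent means calling it, and "provably the same computation" is crudely approximated by an identity check. The payoff numbers are hypothetical and only their ordering matters.

```python
def naive_agent(opponent):
    """Cooperate exactly when a simulation of the opponent cooperates.

    The opponent is simulated by calling it with this agent as its own
    opponent, so two such agents simulate each other without bottoming out.
    """
    return "C" if opponent(naive_agent) == "C" else "D"


# Hypothetical payoffs for the two outcomes that symmetry leaves possible.
PAYOFF = {("C", "C"): 3, ("D", "D"): 1}


def symmetric_agent(opponent):
    """If the opponent is the same computation, both outputs must match,
    so only (C, C) and (D, D) are possible and the better one is chosen."""
    if opponent is symmetric_agent:
        return max("CD", key=lambda move: PAYOFF[(move, move)])
    return "D"  # outside the symmetric case this sketch has nothing to say


try:
    naive_agent(naive_agent)
    regress = False
except RecursionError:
    regress = True

print(regress)                           # the naive regress never terminates
print(symmetric_agent(symmetric_agent))  # symmetry cuts the regress short
```

The fallback branch in `symmetric_agent` is exactly where the open problem sits: without symmetry or a specified dependence, the sketch has no way to evaluate how the opponent's choice depends on its own.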

You perceive that there is a definite note of "timelessness" in all these problems.

Any offered solution may assume that a timeless decision theory for direct cases already exists - that is, if you can reduce the problem to one of "I can predict that if (the other agent predicts) I choose strategy X, then the other agent will implement strategy Y, and my expected payoff is Z", then I already have a reflectively consistent solution which this margin is unfortunately too small to contain.

(In case you're wondering, I'm writing this up because one of the SIAI Summer Project people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always "No", but this is actually the one exception that I can think of.  (Anyone actually taking a shot at this should probably familiarize themselves with the existing literature on Newcomblike problems - the edited volume "Paradoxes of Rationality and Cooperation" should be a sufficient start (and I believe there's a copy at the SIAI Summer Project house.)))

Comments (81)

cousin_it 28 December 2009 08:58:27PM* 2 points

Here's a comment that took me way too long to formulate:

On the Prisoner's Dilemma in particular, this infinite regress can be cut short by expecting that the other agent is doing symmetrical reasoning on a symmetrical problem and will come to a symmetrical conclusion...

Eliezer, if such reasoning from symmetry is allowed, then we sure don't need your "TDT" to solve the PD!

Eliezer_Yudkowsky 28 December 2009 10:47:04PM* 0 points

TDT allows you to use whatever you can prove mathematically. If you can prove that two computations have the same output because their global structures are isomorphic, it doesn't matter if the internal structure is twisty or involves regresses you haven't yet resolved. However, you need a license to use that sort of mathematical reasoning in the first place, which is provided by TDT but not CDT.

PhilGoetz 06 August 2009 12:57:43AM 3 points

Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

  • But it is the writeup most-frequently requested of you, and also, I think, the thing you have done that you refer to the most often.

  • Nobody's going to offer. You have to ask them.

wedrifid 29 July 2009 02:06:42AM 1 point

Unfortunately this "timeless decision theory" would require a long sequence to write up, and it's not my current highest writing priority unless someone offers to let me do a PhD thesis on it.

Can someone tell me the matrix of pay-offs for taking on Eliezer as a PhD student?

Wei_Dai 20 July 2009 08:18:50AM 13 points

There does not appear to be any such thing as a dominant majority vote.

Eliezer, are you aware that there's an academic field studying issues like this? It's called Social Choice Theory, and happens to be covered in chapter 4 of Hervé Moulin's Fair Division and Collective Welfare, which I recommended in my post about Cooperative Game Theory.

I know you're probably approaching this problem from a different angle, but it should still be helpful to read what other researchers have written about it.

A separate comment I want to make is that if you want others to help you solve problems in "timeless decision theory", you really need to publish the results you've got already. What you're doing now is like if Einstein had asked people to help him predict the temperature of black holes before having published the general theory of relativity.

As far as needing a long sequence, are you assuming that the reader has no background in decision theory? What if you just write to an audience of professional decision theorists, or someone who has at least read "The Foundations of Causal Decision Theory" or the equivalent?

cousin_it 20 July 2009 09:18:00AM 4 points

Seconded. I, for one, would be perfectly OK with posts requiring a lot of unfamiliar background math as long as they're correct and give references. For example, Scott Aaronson isn't afraid of scary topics and I'm not afraid of using his posts as entry points into the maze.

AaronBenson 23 July 2009 03:43:14PM 0 points

For that matter, I'm sure someone else would be willing to write a sequence on decision theory to ensure everyone has the required background knowledge. This might work even better if Eliezer suggested some topics to be covered in the sequence so that the background was more specific.

In fact, I would happily do that and I'm sure others would too.

Jayson_Virissimo 20 July 2009 09:03:16PM 1 point

"Now of course you wish you could answer "Yes", but as an ideal game theorist yourself, you realize that, once you actually reach town, you'll have no further motive to pay off the driver."

Can't you contract your way out of this one?

Nanani 21 July 2009 04:03:14AM 0 points

Indeed. It would seem sufficient to push a bit further and take in the desirability of upholding verbal contracts. Unless, of course, the driver is so harsh as to drive away for a mere second of considering non-payment.

RichardKennaway 20 July 2009 04:35:20PM* 2 points

If the ten pie-sharers is to be more than a theoretical puzzle, but something with applicability to real decision problems, then certain expansions of the problem suggest themselves. For example, some of the players might conspire to forcibly exclude the others entirely. And then a subset of the conspirators do the same.

This is the plot of "For a Few Dollars More".

How do criminals arrange these matters in real life?

RobinZ 20 July 2009 04:43:23PM 0 points

Dagnabbit, another movie I have to see now!

(i.e. thanks for the ref!)

Jotaf 21 July 2009 03:52:16AM 1 point

The Dark Knight has an even better example - in the bank robbery scene, each subgroup excludes only one more member, until the only man left is... That's enough of a spoiler I guess.

RobinZ 21 July 2009 01:21:40PM 0 points

Yeah ... guess which scene I came in during the middle of? :P

cousin_it 20 July 2009 09:49:50AM* 2 points

Is your majority vote problem related to Condorcet's paradox? It smells so, but I can't put a handle on why.

I cheated the PD infinite regress problem with a quine trick in Re-formalizing PD. The asymmetric case seems to be hard because fair division of utility is hard, not because quining is hard. Given a division procedure that everyone accepts as fair, the quine trick seems to solve the asymmetric case just as well.

Post your "timeless decision theory" already. If it's correct, it shouldn't be that complex. With your intelligence you can always write a PhD on some other AI topic should the opportunity arise. But after conversations with Vladimir Nesov I was kinda under the impression that you could solve the asymmetric PD-like cases too; if not, I'm a little disappointed in advance. :-(
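For readers who have not seen the idea, here is a rough sketch of what a source-comparison trick can look like. It is not the construction from Re-formalizing PD: as a simplifying assumption, the harness hands each agent its own source text instead of having the agent reproduce it quine-style, and all names are made up for the example.

```python
# Agents are given as source text defining decide(my_source, opponent_source).
CLIQUE_SOURCE = '''
def decide(my_source, opponent_source):
    # Cooperate only with an exact textual copy of myself.
    return "C" if opponent_source == my_source else "D"
'''

DEFECT_SOURCE = '''
def decide(my_source, opponent_source):
    # Defect no matter who the opponent is.
    return "D"
'''


def run(source, opponent_source):
    """Load an agent from its source text and ask it for a move."""
    namespace = {}
    exec(source, namespace)
    return namespace["decide"](source, opponent_source)


def play(source_a, source_b):
    """Return the pair of moves when each agent sees the other's source."""
    return run(source_a, source_b), run(source_b, source_a)


print(play(CLIQUE_SOURCE, CLIQUE_SOURCE))
print(play(CLIQUE_SOURCE, DEFECT_SOURCE))
```

Because the decision is made by comparing text rather than by simulating the opponent, nothing recurses. The price is that the agent cooperates only with exact copies, which is why the asymmetric case needs something more, such as an agreed division procedure.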

Psychohistorian 20 July 2009 06:31:53AM* 3 points

Hanson's example of ten people dividing the pie seems to hinge on arbitrarily passive actors who get to accept and make propositions instead of being able to solicit other deals or make counter proposals, and it is also contingent on infinite and costless bargaining time. The bargaining time bit may be a fair (if unrealistic) assumption, but the passivity does not make sense. It really depends on the kind of commitments and bargains players are able to make and enforce, and the degree/order of proposals from outgroup and ingroup members.

When the first two defectors say, "Hey, you each get an eighth if you join us," the four could pick another two of the in-crowd and say, "Hey, they offered us Y apiece, but we'll join you instead if you each give us X, Y<X (which is actually profitable to the other four so long as X < 1/4 - they get cut out entirely if they can't bargain)." No matter how it is divided, there will always be a subgroup in the in-crowd that could profitably bargain with the out-crowd, and there will always be a different subgroup in the in-crowd that will be able to make a better offer. So long as there is an out-crowd, there are people who can bargain profitably, and so long as the in-crowd is > 6, people can be profitably removed.

If bargaining time is finite (or especially if it has non-zero cost), I suspect, but can't prove (for lack of effort/technical proficiency, not saying it's unprovable) that each actor will opt for the even 10-person split (especially if risk-averse) because it is (statistically) equivalent (or superior) to the sum*probability of other potential arrangements.

CronoDAS 20 July 2009 06:57:25AM* 0 points

What if we try a simpler model?

Let's go from ten agents to two, with the stipulation that nobody gets any pie until both agents agree on the split...

cousin_it 20 July 2009 10:31:54AM* 3 points

This is the Nash bargaining game. Voting plays no role there, but it's a necessary ingredient in our game; this means we've simplified too much.

Velochy 20 July 2009 11:00:44AM 0 points

But three people should do already. I'm fairly convinced that this game is unstable, in the sense that it would not make sense for any of them to agree to get 1/3, as they can always guarantee themselves more by defecting with someone (even by offering them 1/6 - epsilon, which is REALLY hard to turn down). It seems that a given majority getting 1/2 each would be a more probable solution, but you would really need to formalize the rules before this can be proven. I'm a cryptologist, so this is sadly not really my area...

Psychohistorian 20 July 2009 08:29:09PM* 2 points

I almost posted on the three-person situation earlier, but what I wrote wasn't cogent enough. It does seem like it should work as an archetype for any N > 2.

The problem is how the game is iterated. Call the players A, B, and C. If A says, "B, let's go 50-50," and you assume C doesn't get to make a counter-offer and they vote immediately, 50-50-0 is clearly the outcome. This is probably also the case for the 10-person game if there's no protracted bargaining.

If there is protracted bargaining, it turns into an infinite regression as long as there is an out-group, and possibly even without an outgroup. Take this series of proposals, each of which will be preferred to the one prior (format is Proposer:A gets-B gets-C gets):

A:50-50-0

C:0-55-45

A:50-0-50

B:55-45-0

C:0-55-45

A:50-0-50 ...

There's clearly no stable equilibrium. It seems (though I'm not sure how to prove this) that an equal split is the appropriate efficient outcome. Any action by any individual will create an outgroup that will spin them into an infinite imbalance. Moreover, if we are to arbitrarily stop somewhere along that infinite chain, the expected value for each player is going to be 100/3 (they get part of a two-way split twice which should average to 50 each time overall, and they get zero once per three exchanges). Thus, at 33-33-33, one can't profitably defect. At 40-40-20, C could defect and have a positive expected outcome.
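The proposal sequence listed above can be checked line by line. The following sketch only verifies that each proposal is strictly preferred by two of the three players to the one before it, and then averages each player's share over one pass of the repeating three-step cycle. The helper names are made up for the example.

```python
# Each entry: (proposer, (A's share, B's share, C's share)), as listed above.
PROPOSALS = [
    ("A", (50, 50, 0)),
    ("C", (0, 55, 45)),
    ("A", (50, 0, 50)),
    ("B", (55, 45, 0)),
    ("C", (0, 55, 45)),
    ("A", (50, 0, 50)),
]


def supporters(new, old):
    """Players who get strictly more under `new` than under `old`."""
    return [name for name, n, o in zip("ABC", new, old) if n > o]


def beats(new, old):
    """A proposal passes if at least two of the three players prefer it."""
    return len(supporters(new, old)) >= 2


for (_, old), (proposer, new) in zip(PROPOSALS, PROPOSALS[1:]):
    print(proposer, new, "backed by", supporters(new, old), beats(new, old))

# One pass of the repeating cycle: A:50-0-50, B:55-45-0, C:0-55-45.
cycle = [split for _, split in PROPOSALS[2:5]]
averages = [sum(shares) / len(cycle) for shares in zip(*cycle)]
print(averages)
```

Every proposal in the list passes 2 to 1 against its predecessor. Averaged over this particular cycle, the shares come out at 35, 33.3 and 31.7 for A, B and C, so the 100/3 figure holds on average across players but not exactly for each one with these specific numbers.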

If the players have no bargaining costs whatsoever, and always have the opportunity to bargain before a deal is voted on, and have an infinite amount of time and do not care how long it takes to reach agreement (or if agreement is reached), then it does seem like you get an infinite loop, because there's always going to be an outgroup that can outbid one of the ingroup. This same principle should also apply to the 10-person model; with infinite free time and infinite free bargaining, no equilibrium can be reached. If there is some cost to defecting, or a limitation on bargaining, there should be an even N/2+1-way split (depending admittedly on how those costs and limits are defined). If there is no limitation on bargaining and no cost to defecting, but time has a cost or time will be arbitrarily "called," an even N-way split seems like the most likely/efficient outcome. The doubly-infinite situation is so far divorced from reality that it does not seem worth losing sleep over.

Also, the problem may stem from our limitation of thinking of this as a linear series of propositions, because that's how people would have to actually bargain. In the no-repeated bargaining game, whether it's 50-50-0 or 0-50-50 all depends on who asks first, which seems like an improper and unrealistic determining factor. This linear, proposer-centered view may not be how such beings would actually bargain.

cousin_it 21 July 2009 05:28:15AM* 0 points

The example of the Rubinstein bargaining model suggests that you could make players alternate offers and introduce exponential temporal discounting. An equal split isn't logically necessary in this case: a player's payoff will likely depend on their personal rate of utility discounting, also known as "impatience", and others' perceptions of it. The search keyword is "n-person bargaining"; there seems to be a lot of literature that I'm too lazy and stupid to quickly summarize.

Liron 20 July 2009 12:06:35PM 1 point

But I don't have a general theory which replies "Yes" [to a counterfactual mugging].

You don't? I was sure you'd handled this case with Timeless Decision Theory.

I will try to write up a sketch of my idea, which involves using a Markov State Machine to represent world states that transition into one another. Then you distinguish evidence about the structure of the MSM, from evidence of your historical path through the MSM. And the best decision to make in a world state is defined as the decision which is part of a policy that maximizes expected utility for the whole MSM.

OK, I just tried for four hours but couldn't successfully describe a useful formalism that provides a good analysis of counterfactual mugging. Will keep trying later.

Vladimir_Nesov 20 July 2009 01:52:38AM 6 points

In case you're wondering, I'm writing this up because one of the SIAI Summer Project people asked if there was any Friendly AI problem that could be modularized and handed off and potentially written up afterward, and the answer to this is almost always "No"

Does it mean that the problem isn't reduced enough to reasonably modularize? It would be nice if you wrote up an outline of the state of research at SIAI (even a brief one with unexplained labels) or an explanation of why you won't.

Jonathan_Graehl 20 July 2009 01:31:49AM* 4 points

"I believe X to be like me" => "whatever I decide, X will decide also" seems tenuous without some proof of likeness that is beyond any guarantee possible in humans.

I can accept your analysis in the context of actors who have irrevocably committed to some mechanically predictable decision rule, which, along with perfect information on all the causal inputs to the rule, gives me perfect predictions of their behavior, but I'm not sure such an actor could ever trust its understanding of an actual human.

Maybe you could aspire to such determinism in a proven-correct software system running on proven-robust hardware.

Eliezer_Yudkowsky 20 July 2009 01:52:43AM 2 points

"I believe X to be like me" => "whatever I decide, X will decide also" seems tenuous without some proof of likeness that is beyond any guarantee possible in humans...

Maybe you could aspire to such determinism in a proven-correct software system running on proven-robust hardware.

Well, yeah, this is primarily a theory for AIs dealing with other AIs.

You could possibly talk about human applications if you knew that the N of you had the same training as rationalists, or if you assigned probabilities to the others having such training.

SoullessAutomaton 20 July 2009 01:49:23AM* 1 point

I'm not sure such an actor could ever trust its understanding of an actual human.

Let's play a little game; you and an opponent, 10 rounds of the prisoner's dilemma. It will cost you each $5 to play, with the following payouts on each round:

  • (C,C) = $0.75 each
  • (C,D) = $1.00 for D, $0 for C
  • (D,D) = $0.25 each

Conventional game theory says both people walk away with $2.50 and a grudge against each other, and I, running the game, pocket the difference.

Your opponent is Eliezer Yudkowsky.

How much money do you expect to have after the final round?
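For reference, the two extreme outcomes of this game can be tallied directly from the payout list above. This is plain arithmetic and makes no claim about which outcome to expect; the function name is made up for the example.

```python
from fractions import Fraction

ENTRY_FEE = Fraction(5)
ROUNDS = 10
# Per-round payouts as (my payout, opponent's payout), from the list above.
PAYOUT = {
    ("C", "C"): (Fraction(3, 4), Fraction(3, 4)),
    ("C", "D"): (Fraction(0), Fraction(1)),
    ("D", "C"): (Fraction(1), Fraction(0)),
    ("D", "D"): (Fraction(1, 4), Fraction(1, 4)),
}


def winnings(my_moves, their_moves):
    """Total payout to me over all rounds, before subtracting the entry fee."""
    return sum(PAYOUT[(m, t)][0] for m, t in zip(my_moves, their_moves))


all_c = "C" * ROUNDS
all_d = "D" * ROUNDS

for label, mine, theirs in [("mutual defection", all_d, all_d),
                            ("mutual cooperation", all_c, all_c)]:
    total = winnings(mine, theirs)
    print(label, float(total), "net", float(total - ENTRY_FEE))
```

Mutual defection in every round pays $2.50 against a $5 entry fee, and mutual cooperation in every round pays $7.50.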

Eliezer_Yudkowsky 20 July 2009 01:54:25AM 1 point

But that's not the true PD.

SoullessAutomaton 20 July 2009 02:02:03AM 1 point

The statistical predictability of human behavior in less extreme circumstances is a much weaker constraint. I thought the (very gentle) PD presented sufficed to make the point that prediction is not impossible even in a real-world scenario.

I don't know that I have confidence in even you to cooperate on the True PD--sorry. A hypothetical transhuman Bayesian intelligence with your value system? Quite possibly.

Eliezer_Yudkowsky 20 July 2009 04:06:15AM 7 points

Well, let me put it this way - if my opponent is Eliezer Yudkowsky, I would be shocked to walk away with anything but $7.50.

SoullessAutomaton 20 July 2009 10:23:30AM 1 point

Well, obviously. But the more interesting question is what if you suspect, but are not certain, that your opponent is Eliezer Yudkowsky? Assuming identity makes the problem too easy.

My position is that I'd expect a reasonable chance that an arbitrary, frequent LW participant playing this game against you would also end with 10 (C,C)s. I'd suggest actually running this as an experiment if I didn't think I'd lose money on the deal...

Jonathan_Graehl 20 July 2009 07:28:50PM* 2 points

Harsher dilemmas (more meaningful stake, loss from an unreciprocated cooperation that may not be recoverable in the remaining iterations) would make me increasingly hesitant to assume "this person is probably like me".

This makes me feel like I'm in "no true Scotsman" territory; nobody "like me" would fail to optimistically attempt cooperation. But if caring more about the difference in outcomes makes me less optimistic about other-similarity, then in a hypothetical where I am matched up against essentially myself (but I don't know this), I defeat myself exactly when it matters - when the payoff is the highest.

MBlume 20 July 2009 07:35:25PM* 8 points

And this is exactly the problem: If your behavior on the prisoner's dilemma changes with the size of the outcome, then you aren't really playing the prisoner's dilemma. Your calculation in the low-payoff case was being confused by other terms in your utility function, terms for being someone who cooperates -- terms that didn't scale.

Jonathan_Graehl 21 July 2009 01:27:28AM 0 points

Yes, my point was that my variable skepticism is surely evidence of bias or rationalization, and that we can't learn much from "mild" PD. I do also agree that warm fuzzies from being a cooperator don't scale.

Gavin 20 July 2009 07:14:49PM* 2 points

If we wanted to be clever we could include Eliezer playing against himself (just report back to him the same value) as a possibility, though if it's a high probability that he faces himself it seems pointless.

I'd be happy to front the (likely loss of) $10.

It might be possible to make it more like the true prisoner's dilemma if we could come up with two players, each of whom wants the money donated to a cause that they consider worthy but the other player opposes or considers ineffective.

Though I have plenty of paperclips, sadly I lack the resources to successfully simulate Eliezer's true PD . . .

SoullessAutomaton 20 July 2009 10:11:21PM 1 point

I'd be happy to front the (likely loss of) $10.

Meaningful results would probably require several iterations of the game, though, with different players (also, the expected loss in my scenario was $5 per game).

I seem to recall Douglas Hofstadter did an experiment with several of his more rational friends, and was distressed by the globally rather suboptimal outcome. I do wonder if we on LW would do better, with or without Eliezer?

srdiamond 20 July 2009 04:35:23AM 1 point

Obviously, the only reflectively consistent answer in this case is "Yes - here's the $1000", because if you're an agent who expects to encounter many problems like this in the future, you will self-modify to be the sort of agent who answers "Yes" to this sort of question - just like with Newcomb's Problem or Parfit's Hitchhiker.

But I don't have a general theory which replies "Yes".

If you think being a rational agent includes an infinite ability to modify oneself, then the game has no solution because such an agent would be unable to guarantee the new trait's continued, unmodified existence without sacrificing the rationality that is a premise of the game.

So, for the game to be solvable, the self-modification ability must have limits, and the limits appear as parameters in the formalism.

Liron 20 July 2009 08:20:26AM 3 points

An agent can guarantee the persistence of a trait by self-modifying into code that provably can never lead to the modification of that trait. A trivial example is that the agent can self-modify into code that preserves a trait and can't self-modify.

srdiamond 20 July 2009 04:54:28PM 0 points

But more precisely, an agent can guarantee the persistence of a trait only "by self-modifying into code that provably can never lead to the modification of that trait." Anything tied to rationality that guarantees the existence of a conforming modification at the time of offer must guarantee the continued existence of the same capacity after the modification, making the proposed self-modification self-contradictory.

SoullessAutomaton 20 July 2009 12:22:32AM* 2 points

As a first off-the-cuff thought, the infinite regress of conditionality sounds suspiciously close to general recursion. Do you have any guarantee that a fully general theory that gives a decision wouldn't be equivalent to a Halting Oracle?

ETA: If you don't have such a guarantee, I would submit that the first priority should be either securing one, or proving isomorphism to the Entscheidungsproblem and, thus, the impossibility of the fully general solution.

Eliezer_Yudkowsky 20 July 2009 01:51:02AM 0 points

Obviously any game theory is equivalent to the halting problem if your opponents can be controlled by arbitrary Turing machines. But this sort of infinite regress doesn't come from a big complex starting point, it comes from a simple starting point that keeps passing the recursive buck.

SoullessAutomaton 20 July 2009 02:41:01AM* 2 points

I understand that much, but if there's anything I've learned from computer science it's that Turing completeness can pop up in the strangest places.

I of course admit it was an off-the-cuff, intuitive thought, but the structure of the problem reminds me vaguely of combinatory logic, particularly Smullyan's Mockingbird forest.

thomblake 20 July 2009 02:58:06AM 0 points

This was a clever ploy to distract me with logic problems, wasn't it?

SoullessAutomaton 20 July 2009 03:00:37AM 1 point

No, but mentioning the rest of Smullyan's books might be.

JulianMorrison 20 July 2009 01:02:50AM 0 points

Hah! Same thought!

What's the moral action when the moral problem seems to diverge, and you don't have the compute resources to follow it any further? Flip a coin?

SoullessAutomaton20 July 2009 01:11:34AM0 points [-]

I would suggest that the best move would be to attempt to coerce the situation into one where the infinite regress is subject to analysis without Halting issues, in a way that is predicted to be least likely to have negative impacts.

Remember, Halting is only undecidable in the general case, and it is often quite tractable to decide on some subset of computations.
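To make the "tractable subset" point concrete, here is a minimal sketch of one such subset: deterministic processes with finitely many reachable states, where halting is decidable by cycle detection. The names `halts`, `countdown`, and `spin` are hypothetical illustrations, not anything from the thread.

```python
from typing import Callable, Hashable, Optional


def halts(step: Callable[[Hashable], Optional[Hashable]], start: Hashable) -> bool:
    """Decide halting for a deterministic process whose reachable state
    space is finite (an assumption; without it this loop may not end).

    `step` returns the next state, or None when the process halts.
    """
    seen = set()
    state = start
    while state is not None:
        if state in seen:
            # Deterministic process revisited a state: it loops forever.
            return False
        seen.add(state)
        state = step(state)
    return True


def countdown(n):
    """Hypothetical process that halts on reaching 0."""
    return None if n == 0 else n - 1


def spin(n):
    """Hypothetical process that cycles through 0..9 forever."""
    return (n + 1) % 10


print(halts(countdown, 7))
print(halts(spin, 3))
```

The restriction to a finite state space does all the work here; nothing in the sketch says whether a given moral problem can be coerced into that form.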

JulianMorrison20 July 2009 01:45:57AM0 points [-]

Unless you're saying "don't answer the question, use the answer from a different but closely related one", then a moral problem is either going to be known transformable into a decidable halting problem, or not. And if not, my above question remains unanswered.

SoullessAutomaton20 July 2009 01:57:17AM* 0 points [-]

I meant something more like "don't make a decision, change the context such that there is a different question that must be answered". In practice this would probably mean colluding to enforce some sort of amoral constraints on all parties.

I grant that at some point you may get irretrievably stuck. And no, I don't have an answer, sorry. Choosing randomly is likely to be better than inaction, though.

PhilGoetz06 August 2009 01:16:48AM0 points [-]

Another stumper was presented to me by Robin Hanson at an OBLW meetup. Suppose you have ten ideal game-theoretic selfish agents and a pie to be divided by majority vote. Let's say that six of them form a coalition and decide to vote to divide the pie among themselves, one-sixth each. But then two of them think, "Hey, this leaves four agents out in the cold. We'll get together with those four agents and offer them to divide half the pie among the four of them, leaving one quarter apiece for the two of us. We get a larger share than one-sixth that way, and they get a larger share than zero, so it's an improvement from the perspectives of all six of us - they should take the deal." And those six then form a new coalition and redivide the pie. Then another two of the agents think: "The two of us are getting one-eighth apiece, while four other agents are getting zero - we should form a coalition with them, and by majority vote, give each of us one-sixth."

How I would approach this problem:

Suppose that it is easier to adjust the proportions within your existing coalitions than to switch coalitions. An agent will not consider switching coalitions until it cannot improve its share in its present coalition. Therefore, any coalition will reach a stable configuration before you need consider agents switching to another coalition. If you can show that the only stable configuration is an equal division, then there will be no coalition-switching.

You can probably show that any agent receiving less than its share can receive a larger share by switching to a different coalition. Assume the other agents know this proof. You may then be able to show that they can hold onto a larger share by giving that agent its fair share than by letting it quit the coalition. You may need to use derivatives to do this. Or not.
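As a rough illustration of why some extra assumption (such as a cost to switching coalitions) is needed, the sketch below assumes pure majority rule with costless switching and shows that any division of the pie, including the equal one, can be beaten by a new six-agent coalition. The function name `beating_coalition` and the tie-breaking rule are assumptions made for the example.

```python
from fractions import Fraction


def beating_coalition(shares):
    """Given a division of the pie (non-negative shares summing to 1),
    return (new_division, members) where `members` is a strict majority
    and every member gets strictly more than before.

    Assumes costless coalition switching and simple majority vote.
    """
    n = len(shares)
    majority = n // 2 + 1
    # Recruit the agents who currently hold the smallest shares.
    members = sorted(range(n), key=lambda i: shares[i])[:majority]
    # Whatever the excluded agents held is split evenly among the members.
    surplus = Fraction(1) - sum(shares[i] for i in members)
    new = [Fraction(0)] * n
    for i in members:
        new[i] = shares[i] + surplus / majority
    return new, members


division = [Fraction(1, 10)] * 10  # start from the equal split
for _ in range(3):
    division, members = beating_coalition(division)
    print(members, [str(s) for s in division])
```

Each round produces a division that six agents strictly prefer, so the cycle in the quoted problem never has to stop on its own under these assumptions.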

MrHen22 July 2009 05:50:36PM0 points [-]

"I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. The coin came up heads - can I have $1000?"

Err... pardon my noobishness but I am failing to see the game here. This is mostly me working it out audibly.

A less Omega version of this game involves flipping a coin, getting $100 on tails, losing $1 on heads. Using humans, it makes sense to have an arbiter holding $100 from the Flipper and $1 from the Guesser. With this setup, the Guesser should always play.

If the Flipper is Omega and offered the same game with the same fair arbiter there is no reason to not play. If Omega was a perfect predictor and knew what the coin would do before flipping it, should we play? If Omega commits to playing the game regardless of the prediction, yes, we should play.

If the arbiter is removed and Omega stands in as the arbiter, we should still play because it is assumed that Omega is honest and will pay out if tails appears. Even if we prepay before the coin flip, we should still play.

If the Flipper flips the coin before we prepay the arbiter, it should not matter. This is equivalent to the scenario of Omega being a perfect predictor.

The only two changes remaining are:

  • Us knowing the coin flip before we agree to play
  • Us not paying before we see the coin flip

The latter assumes we could renege on payment after seeing the coin but I highly doubt Omega would play the game with someone like this since this would be known to a perfect predictor. This means we can completely eliminate the arbiter.

This leaves us at the following scenario:

I just flipped a fair coin. I decided, before I flipped the coin, that if it came up heads, I would ask you for $1000. And if it came up tails, I would give you $1,000,000 if and only if I predicted that you would give me $1000 if the coin had come up heads. I know the result of the coin but will wait for you to agree to the game before I tell you what it is. Do you want to play?

The answer is "Yes." Why does it matter if Omega blurts out the answer beforehand? Because we know we will "lose"?

In my opinion this is a trivial problem. If we assume that Omega is (a) fair and (b) accurate we would always play the game. Omega is predefined to not take advantage of us. We just got unlucky, which is perfectly acceptable as long as we do not know the answer beforehand.

So... what am I missing? It seems like there is mental warning when imagining myself before Omega and handing him $1000 when I "never had a shot". But I did have a shot. I would never pay anyone other than Omega, but I am assuming Omega is being completely honest.

Why would anyone answer "No"? The basic answer, "Because you do not want to lose $1000" seems completely irrational to me. I can see why it would appear rational, but Omega's definition makes it irrational.
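The "I did have a shot" reasoning can be written out as an expected-value comparison between two policies, evaluated before the coin is flipped. This is only a sketch under the problem's stated assumptions (a fair coin, an honest Omega that predicts perfectly); the function name and parameters are made up for the illustration.

```python
from fractions import Fraction


def expected_value(pays_on_heads, p_heads=Fraction(1, 2),
                   cost=1_000, prize=1_000_000):
    """Expected winnings of a fixed policy, judged before the flip.

    Assumes Omega predicts the policy perfectly, so the prize on tails
    goes only to agents who would have paid on heads.
    """
    heads_outcome = -cost if pays_on_heads else 0
    tails_outcome = prize if pays_on_heads else 0
    return p_heads * heads_outcome + (1 - p_heads) * tails_outcome


print(expected_value(True))   # policy: hand over the $1000 on heads
print(expected_value(False))  # policy: refuse
```

The paying policy comes out at 499500 and the refusing policy at 0; the open question in the original post is how to justify following that policy after already seeing heads.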

Vladimir_Nesov22 July 2009 08:28:29PM1 point [-]

See counterfactual mugging for an extended discussion in comments.

MrHen22 July 2009 08:35:26PM0 points [-]

Thanks.

Jotaf21 July 2009 03:47:58AM0 points [-]

I don't really wanna rock the boat here, but in the words of one of my professors, it "needs more math".

I predict it will go somewhat like this: you specify the problem in terms of A implies B, etc; you find out there's infinite recursion; you prove that the solution doesn't exist. Reductio ad absurdum anyone?

ArthurB21 July 2009 03:10:39AM0 points [-]

Instead of assuming that others will behave as a function of our choice, we look at the rest of the universe (including other sentient beings, including Omega) as a system where our own code is part of the data.

Given a prior on physics, there is a well defined code that maximizes our expected utility.

That code wins. It one-boxes, it pays Omega when the coin falls on heads, etc.

I think this solves the infinite regress problem, albeit in a very impractical way.

Vladimir_Nesov21 July 2009 01:34:42PM0 points [-]

This doesn't sound obviously wrong, but is too vague even for an informal answer.

ArthurB21 July 2009 03:05:11PM* 0 points [-]

Well, if you want practicality, I think Omega problems can be disregarded; they're not realistic. It seems that the only feature needed for the real world is the ability to make trusted promises as we encounter the need to make them.

If we are not concerned with practicality but with the theoretical problem behind these paradoxes, the key is that other agents make predictions about your behavior, which is the same as saying they have a theory of mind, which is simply a belief distribution over your own code.

To win, you should take the actions that make their belief about your own code favorable to you, which can include lying, or modifying your own code and showing it to make your point.

It's not our choice that matters in these problems but our choosing algorithm.

Vladimir_Nesov21 July 2009 06:30:51PM* 0 points [-]

Again, if you can state the same with precision, it could be valuable, while on this level my reply is "So?".

ArthurB21 July 2009 06:45:39PM* 0 points [-]

I confess I do not grasp the problem well enough to see where the problem lies in my comment. I am trying to formalize the problem, and I think the formalism I describe is sensible.

Once again, I'll reword it, but I think you'll still find it too vague: to win, one must act rationally, and the set of possible actions includes modifying one's code.

The question was

My timeless decision theory only functions in cases where the other agents' decisions can be viewed as functions of one argument, that argument being your own choice in that particular case - either by specification (as in Newcomb's Problem) or by symmetry (as in the Prisoner's Dilemma). If their decision is allowed to depend on how your decision depends on their decision - like saying, "I'll cooperate, not 'if the other agent cooperates', but only if the other agent cooperates if and only if I cooperate - if I predict the other agent to cooperate unconditionally, then I'll just defect" - then in general I do not know how to resolve the resulting infinite regress of conditionality, except in the special case of predictable symmetry

I do not know the specifics of Eliezer's timeless decision theory, but it seems to me that if one looks at the decision process of others as based on their belief about your code, not on your decisions, there is no infinite regress.

You could say: Ah, but there is your belief about an agent's code, then his belief about your belief about his code, then your belief about his belief about your belief about his code, and that looks like an infinite regression. However, there is really no regression, since "his belief about your belief about his code" is entirely contained in "your belief about his code".
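One way to see decisions depending on code rather than on decisions is a toy Prisoner's Dilemma in which each agent is handed the other's source text. This is a sketch of a well-known style of construction (an agent that cooperates only with exact copies of itself), not a formalization of the belief-distribution idea above; every name in it is hypothetical.

```python
# Each agent is a source string defining decide(my_src, their_src).
CLIQUE_SRC = '''def decide(my_src, their_src):
    return "C" if their_src == my_src else "D"
'''

DEFECT_SRC = '''def decide(my_src, their_src):
    return "D"
'''

COOPERATE_SRC = '''def decide(my_src, their_src):
    return "C"
'''


def load(src):
    """Compile an agent's source text into a callable."""
    namespace = {}
    exec(src, namespace)
    return namespace["decide"]


def play(src_a, src_b):
    """Run one round; each agent sees both source texts, never the other's move."""
    a, b = load(src_a), load(src_b)
    return a(src_a, src_b), b(src_b, src_a)


print(play(CLIQUE_SRC, CLIQUE_SRC))
print(play(CLIQUE_SRC, DEFECT_SRC))
print(play(CLIQUE_SRC, COOPERATE_SRC))
```

Because each move is a function of two fixed strings, no agent has to simulate the other's reasoning about its own reasoning, so no regress arises. The price is brittleness: exact text comparison fails against agents that are equivalent but written differently.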

Vladimir_Nesov21 July 2009 09:06:51PM0 points [-]

Thanks, this comment makes your point clearer. See cousin_it's post Re-formalizing PD.

JamesAndrix20 July 2009 04:00:07PM0 points [-]

I swear I'll give you a PhD if you write the thesis. On fancy paper and everything.

Would timeless decision theory handle negotiation with your future self? For example, if a timeless decision agent likes paperclips today but knows it is going to be modified to like apples tomorrow (and not care a bit about paperclips), will it abstain from destroying the apple orchard, and its future self abstain from destroying the paperclips in exchange?

And is negotiation the right way to think about reconciling the difference between what I now want and what a predicted smarter, grown-up, more knowledgeable version of me would want? Or am I going the wrong way?

MBlume20 July 2009 06:09:20PM* 3 points [-]

To talk about turning a paperclip maximizer into an apple maximizer is needlessly confusing. Better to talk about destroying a paperclip maximizer and creating an apple maximizer. And yes, timeless decision theory should allow these two agents to negotiate, though it gets confusing fast.

Peter_de_Blanc20 July 2009 06:03:19PM2 points [-]

In what sense is that a future self?

JamesAndrix20 July 2009 06:33:41PM1 point [-]

In the paperclip->apple scenario, in the sense that it retains the memory and inherits the assets of the original, and everything else that keeps you 'you' when you start wanting something different.

In the simulation scenario, I'm not sure.

nawitus20 July 2009 10:58:38AM0 points [-]

If you're an AI, you do not have to (and shouldn't) pay the first $1000; you can just self-modify to pay $1000 in all the following coin flips (if we assume that the AI can easily rewrite/modify its own behaviour in this way). Human brains probably don't have this capability, so I guess paying $1000 even in the first game makes sense.

JamesAndrix20 July 2009 07:26:33PM0 points [-]

That assumes that you didn't expect to face problems like that in the future before Omega presented you with the problem, but do expect to face problems like that in the future after Omega presents you with the problem. It doesn't work at all if you only get one shot at it. (And you should already be a person who would pay, just in case you do.)

CannibalSmith20 July 2009 10:37:22AM* 0 points [-]

I stopped reading at "Yes, you say". The correct solution is obviously obvious: you give him your credit card and promise to tell the PIN number once you're at the ATM.

You could also try to knock him off his bike.

JGWeissman20 July 2009 11:03:39PM1 point [-]

It seems quite convenient that you can physically give him your credit card.

timtyler20 July 2009 07:03:33AM* 0 points [-]

I had a look at the existing literature. It seems as though the idea of a "rational agent" who takes one box goes quite a way back:

"Rationality, Dispositions, and the Newcomb Paradox" (Philosophical Studies, volume 88, number 1, October 1997)

Abstract: "In this article I point out two important ambiguities in the paradox. [...] I draw an analogy to Parfit's hitchhiker example which explains why some people are tempted to claim that taking only one box is rational. I go on to claim that although the ideal strategy is to adopt a necessitating disposition to take only one box, it is never rational to choose only one box. [...] I conclude that the rational action for a player in the Newcomb Paradox is taking both boxes, but that rational agents will usually take only one box because they have rationally adopted the disposition to do so."

Warrigal20 July 2009 06:07:06AM0 points [-]

At the point where Omega asks me this question, I already know that the coin came up heads, so I already know I'm not going to get the million. It seems like I want to decide "as if" I don't know whether the coin came up heads or tails, and then implement that decision even if I know the coin came up heads. But I don't have a good formal way of talking about how my decision in one state of knowledge has to be determined by the decision I would make if I occupied a different epistemic state, conditioning using the probability previously possessed by events I have since learned the outcome of...

Well, it seems to me that you always want to do this. According to timeless-reflectively-consistent-yada-yada decision theory, the best decision to make is to follow the strategy that you would have chosen at the very beginning.

The precise constraint this problem places on you is that the context you make your decision in is that there is a 50% chance that your decision results in you getting $1,000,000 instead of nothing.

Treat your observations as putting you in the context in which you make your decision.

Bo10201020 July 2009 12:38:59AM* 0 points [-]

On dividing the pie, I ran across this in an introduction to game theory class. I think the instructor wanted us to figure out that there's a regress and see how we dealt with it. Different groups did different things, but two members of my group wanted to be nice and not cut anyone out, so our collective behavior was not particularly rational. "It's not about being nice! It's about getting the points!" I kept saying, but at the time the group was about 16 (and so was I), and had varying math backgrounds, and some were less interested in that aspect of the game.

I think at least one group realized there would always be a way to undermine the coalitions that assembled, and cut everyone in equally.

dclayh20 July 2009 01:22:14AM* 0 points [-]

two members of my group wanted to be nice and not cut anyone out, so our collective behavior was not particularly rational.

One might guess that evolution granted us a strong fairness drive to avoid just these sorts of decision regresses.

kpreid20 July 2009 02:11:01AM* 0 points [-]

Statement of the obvious: Spending excessive time deciding is neither rational nor evolutionarily favored.

Eliezer_Yudkowsky20 July 2009 01:55:30AM0 points [-]
dclayh20 July 2009 05:00:30AM0 points [-]

It's not group selection: if group A splits things evenly and moves on, while group B goes around and around with fractious coalitions until a tiger comes along and eats them, then being in group A confers an individual advantage.

Clearly evolution also gave us the ability to make elaborate justifications as to why we, particularly, deserve more than an equal share. But that hardly disallows the fairness heuristic as a fallback option when the discussion is taking longer than it deserves. (And some people just have the stamina to keep arguing until everyone else has given up in disgust. These usually become middle managers or Congressmen.)

orthonormal20 July 2009 05:47:04AM0 points [-]

What you just described is group selection, and thus highly unlikely.

It's to your individual benefit to be more (unconsciously) selfish and calculating in these situations, whether the other people in your group have a fairness drive or not.

Vladimir_Nesov20 July 2009 11:40:03AM* 2 points [-]

It's to your individual benefit to be more (unconsciously) selfish and calculating in these situations, whether the other people in your group have a fairness drive or not.

Not if you are punished for selfishness. I'm not sure how reasonable the following analysis is (since I didn't study this kind of thing at all); it suggests that fairness is a stable strategy, and given some constraints a more feasible one than selfishness:

M. A. Nowak, et al. (2000). "Fairness versus reason in the ultimatum game." Science 289(5485):1773-1775.

orthonormal20 July 2009 05:28:45PM0 points [-]

See reply to Tim Tyler.

timtyler20 July 2009 07:39:00AM1 point [-]

...and if your companions have circuitry for detecting and punishing selfish behaviour - what then? That's how the "fairness drive" is implemented - get mad and punish cheaters until it hurts. That way, cheaters learn that crime doesn't pay - and act fairly.
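A toy ultimatum-game calculation shows the mechanism: once responders reject (punish) offers below some threshold, a purely selfish proposer does best by offering exactly that threshold. This is a sketch with made-up names and a made-up pie of 100 units, not a model of the evolutionary dynamics.

```python
def proposer_payoff(offer, threshold, pie=100):
    """Proposer keeps pie - offer if the responder accepts.

    The responder rejects any offer below `threshold`, in which case
    both sides get nothing (the punishment).
    """
    return pie - offer if offer >= threshold else 0


def best_offer(threshold, pie=100):
    """Offer that maximizes the selfish proposer's own payoff."""
    return max(range(pie + 1), key=lambda o: proposer_payoff(o, threshold, pie))


for threshold in (0, 20, 50):
    offer = best_offer(threshold)
    print(threshold, offer, proposer_payoff(offer, threshold))
```

With no punishment the selfish optimum is to offer nothing; with a threshold of 50 the selfish optimum is the even split.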

orthonormal20 July 2009 05:27:15PM1 point [-]

I agree. But you see how this individual selection pressure towards fairness is different from the group selection pressure that dclayh was actually asserting?

timtyler20 July 2009 07:17:23PM-2 points [-]

You and EY seem to be the people who are talking about group selection.

dclayh20 July 2009 06:36:48AM0 points [-]

It's to your individual benefit to be more (unconsciously) selfish and calculating in these situations

Not when the cost (including opportunity cost) of doing the calculating outweighs the benefit it would give you.

orthonormal20 July 2009 05:32:15PM1 point [-]

You're introducing weaker and less plausible factors to rescue a mistaken assertion. It's not worth it.

As pointed out below in this thread, the fairness drive almost certainly comes from the individual pressure of cheaters being punished, not from any group pressure as you tried to say above.