b1shop comments on Harry Potter and the Methods of Rationality discussion thread, part 3 - Less Wrong

5 Post author: Unnamed 30 August 2010 05:37AM


Comment author: b1shop 31 August 2010 04:28:19AM 3 points [-]

I have a question about TDT's application in chapter 33 of HP:MoR.

I didn't say you should just automatically cooperate. Not on a true Prisoner's Dilemma like this one. What I said was that when you choose, you shouldn't think like you're choosing for just yourself, or like you're choosing for everyone. You should think like you're choosing for all the people who are similar enough to you that they'll probably do the same thing you do for the same reasons.

Should businessmen collude on one-shot pricing? The decision theory I learned in school says "Never!", but I can see Harry's beliefs going in either direction.

Links to a good summary of TDT are welcome.

Comment author: Perplexed 31 August 2010 06:00:31AM *  8 points [-]

good TDT summary

Well, not really good. Merely best.

Comment author: wedrifid 31 August 2010 04:40:00AM 4 points [-]

Links to a good summary of TDT are welcome.

*cough* They certainly would be!

Comment author: Eliezer_Yudkowsky 31 August 2010 04:13:48PM 3 points [-]

Should businessmen collude on one-shot pricing?

All they need to do is find someone who can help enforce the decision, or make it matter reputationally to friends, or iterate it, and they don't need to worry about whether they're doing the same thing for the same reasons.

Comment author: PhilGoetz 31 August 2010 06:31:58PM 3 points [-]

I think we should be charitable (make the interpretation that makes the question most sensible or interesting), and assume b1shop is assuming those conditions don't apply.

Comment author: SilasBarta 03 September 2010 07:16:47PM 2 points [-]

What about new entrants?

Comment author: PhilGoetz 31 August 2010 03:52:27PM 2 points [-]

I don't understand TDT, and the people who write about it are very smart; but that passage made it sound like sympathetic magic. You're not choosing for all the people who are similar to you.

You can argue that doing so leads to better outcomes in PDs - but then you're really just arguing for cooperation, not choosing the decision theory that maximizes your utility. "All the people who are similar enough to you that they'll probably do the same thing you do for the same reasons" just means "All the other cooperators". So saying "I implement timeless decision theory" seems to be a way to clothe saying "I cooperate on PD" in bogus Bayesian respectability.

Comment author: orthonormal 31 August 2010 09:18:18PM *  8 points [-]

That's because the passage isn't actually about TDT; Eliezer is trying to avoid throwing anachronisms into a story set in the 1990s. It's instead about the closest thing that existed at the time, Hofstadter's idea of superrationality, which does (IMO) suffer from the flaw you posit.

Comment author: WrongBot 31 August 2010 04:07:25PM 7 points [-]

"I always cooperate", "I always cooperate with agents I know will cooperate with me", and "I always cooperate with agents I know will cooperate with me iff I cooperate" are all separate decision making processes. Depending on who they're playing the PD with, they can make different decisions.

Comment author: TobyBartels 01 September 2010 02:00:54AM 2 points [-]

OK, this is off-topic, but why do people stop there? Why not ‘I always cooperate with agents I know will cooperate with me iff I cooperate iff they cooperate’, and so on? These are not equivalent.

Incidentally, in classical logic, I cooperate iff (you cooperate iff (I cooperate iff you cooperate)) is always true. (But we don't really have that here, because the modal operator ‘I know’ interferes.)
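
That classical-logic claim can be checked mechanically; here is a quick truth-table sketch (with the 'I know' operator set aside, and `i`, `u` standing for "I cooperate" and "you cooperate"):

```python
from itertools import product

def iff(a, b):
    # Material biconditional: true exactly when both sides match.
    return a == b

# "I cooperate iff (you cooperate iff (I cooperate iff you cooperate))"
# should come out true under all four truth assignments.
for i, u in product([True, False], repeat=2):
    assert iff(i, iff(u, iff(i, u)))
```

Since the biconditional is associative and commutative, the statement rearranges to (i iff i) iff (u iff u), which is trivially true; it's the modal 'I know' that blocks this collapse in the original discussion.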

Comment author: WrongBot 01 September 2010 02:18:31AM 4 points [-]

It isn't necessary to stop there, and you can follow that chain pretty much infinitely.

I think TDT jumps to the end of that regression by cooperating iff you and I are both implementations of the same abstract computation.

Comment author: TobyBartels 01 September 2010 03:04:45AM 0 points [-]

Yeah, but each stage is rather different from the one before; at no stage would you actually cooperate with yourself, since those ‘iff’s are so strict.

But if this (which I've seen here before) is not supposed to be what TDT really says, but just some handwaving to give the idea, then that's all right.

Comment author: WrongBot 01 September 2010 03:37:03AM 2 points [-]

TDT hasn't been published in anything resembling a finished form, and I'm a curious amateur when it comes to decision theory, at best. I imagine there's more to it, but I can't really speculate about what it might be.

Comment author: saturn 01 September 2010 04:02:06AM *  1 point [-]

People stop there because going further starts hurting instead of helping. The PD payoff matrix implies that I want to avoid cooperating if I can, but it's more important that I get you to cooperate, even if, in order to do that, I have to cooperate myself. Adding more restrictions on your reasons for cooperating can't make the outcome better for me; I only care that you do it.

Comment author: TobyBartels 01 September 2010 11:37:53PM *  3 points [-]

Going one step further doesn't (generally) add restrictions; it just changes them. Consider:

  1. I will cooperate if I know anything.
  2. I will cooperate if I know that you will cooperate.
  3. I will cooperate if I know that you will cooperate iff I cooperate.
  4. I will cooperate if I know that you will cooperate iff I cooperate iff you cooperate.
  5. I will cooperate if I know that you will cooperate iff I cooperate iff you cooperate iff I cooperate.

Using classical logic after the modal operator, these reduce to:

  1. I will cooperate if I know anything.
  2. I will cooperate if I know that you will cooperate.
  3. I will cooperate if I know that we will perform the same action.
  4. I will cooperate if I know that I will cooperate.
  5. I will cooperate if I know anything.
  6. … (repeats)

Actually, now that I write it out like this, I can see why one would choose (3)!

It's important that there's an ‘if I know that’ instead of an ‘iff’, which I've seen before. But the version above is how I parsed WrongBot's statement, so hopefully WrongBot quoted it correctly. (The search function is not helping me find an original.)
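
The reductions in the list above can likewise be verified with a truth table; a minimal sketch (the 'I know that' modality set aside, `i` = "I cooperate", `y` = "you cooperate", and the 'iff' chains parsed right-associatively):

```python
from itertools import product

def iff(a, b):
    return a == b  # material biconditional

# Conditions 3-5 from the list above:
cond3 = lambda i, y: iff(y, i)                  # you cooperate iff I cooperate
cond4 = lambda i, y: iff(y, iff(i, y))          # ... iff I cooperate iff you cooperate
cond5 = lambda i, y: iff(y, iff(i, iff(y, i)))  # one more "iff" link

for i, y in product([True, False], repeat=2):
    assert cond3(i, y) == (i == y)  # 3: "we will perform the same action"
    assert cond4(i, y) == i         # 4: "I will cooperate"
    assert cond5(i, y) is True      # 5: a tautology, i.e. "if I know anything"
```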

Comment author: PhilGoetz 31 August 2010 04:16:44PM *  2 points [-]

When I said "I cooperate on PD", I didn't mean to lump all those together. It's shorthand for those who cooperate in the way resulting from TDT. My point is that TDT itself, as described in that one sentence from the Harry Potter story, is no different from saying "People should cooperate because then they will all get better payoffs" (although in a more specific way).

Comment author: wnoise 31 August 2010 06:02:13PM 3 points [-]

It's pretty clear that if your opponent in a symmetric game with the same information is running the exact same deterministic algorithm as you, with the exact same computational resources, you'll have to make the same move. This essentially "vanishes" the off-diagonal boxes. The TDT proponents want to take advantage of this to have a "better, more winning" decision theory, which would give both players the (C, C) payoff rather than (D, D) even in a one-shot prisoner's dilemma. Grafting this on by itself currently buys little, as there is never this degree of symmetry.

Now, suppose we are playing a symmetric game and are 90% confident that we are playing a clone with the exact same computational resources, and 10% confident that we're playing someone random (and that if we're playing a clone, it's 90% confident that it's playing a clone that's 90% confident that ...). What do we do in this case?

I claim that in this case we can still take advantage of this, as long as the probability is high enough relative to the utility losses. We need to do the calculations for both "they act independently" (assume they choose the Nash equilibrium if we don't know anything else about their decision-making) and "they act the same" (diagonal), and merge them. The proper thing, as always, is to choose based on expected utility, which is just the probability-weighted utility for each choice.

For the standard prisoner's dilemma, where (C, C) = (3, 3), (C, D) = (0, 5), (D, C) = (5, 0), and (D, D) = (1, 1), we can see that no matter what your opponent chooses, it's better to Defect than to Cooperate, so the Nash equilibrium is the pure strategy D. If, on the other hand, your opponent is a computational clone, choosing to Cooperate gets you the (C, C) box, not the (C, D) box. So we have 0.9 * 3 + 0.1 * 0 = 2.7 for Cooperate, and 0.9 * 1 + 0.1 * 1 = 1 for Defect. Cooperation is the better choice, and remains so as long as p > 1/3.
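
The arithmetic above can be written out as a short sketch (payoffs as given; the non-clone opponent is assumed, as in the comment, to play the Nash move, Defect):

```python
# Payoffs for the row player in the standard PD described above.
payoff = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def expected_utility(my_move, p_clone):
    vs_clone = payoff[(my_move, my_move)]  # a clone necessarily mirrors my move
    vs_other = payoff[(my_move, 'D')]      # an independent player defects (Nash)
    return p_clone * vs_clone + (1 - p_clone) * vs_other

ev_c = expected_utility('C', 0.9)  # 0.9 * 3 + 0.1 * 0 = 2.7
ev_d = expected_utility('D', 0.9)  # 0.9 * 1 + 0.1 * 1 = 1.0
# Cooperate beats Defect exactly when 3p > 1, i.e. when p > 1/3.
```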

I also claim this is pretty much the limit of what we can do. If the algorithm you share is non-deterministic (with different random number sources), then off-diagonal results are now possible. If the computational resources are different, then we are analyzing to a different recursive cutoff than our opponent, and so may come to different conclusions. If we have the same resources, but the problem is asymmetric, then we can't simply say "they'll do the same thing we will". All the boxes must then be considered.

In some sense we also get to choose the algorithm we use. If the game is close enough to symmetric we can choose to play as if it is, and so long as that decision process is symmetric, we recover the result. If we were computers we could choose to avoid using true random number generators, and only use the same pseudorandom number generators (we do want to keep the ability to act randomly against non-clones).

Comment author: NihilCredo 01 September 2010 12:36:49AM 0 points [-]

we're playing some one random

We need to do both the calculations for "they act independently" (Assume they choose the Nash equilibrium, if we don't know anything else about their decision-making)

Why do we assume that?

Comment author: wnoise 01 September 2010 06:12:49AM 0 points [-]

In order to figure out our best move, we need to know something about how the other person will move. It's generally a good idea to assume your opponent is indeed rational and utility-maximizing. The Nash equilibrium strategy is stable in the sense that neither side can do better by unilaterally switching, which gives it a great deal of stability. As such, it's often a good model for what people will do.

In this case, the full machinery isn't necessary. Defect strictly dominates Cooperate (it's better no matter what the opponent chooses), so barring special considerations like TDT, it's what anyone with a modicum of sense will do.

Comment author: TobyBartels 02 September 2010 12:32:28AM 5 points [-]

Nash equilibrium […]'s often a good model for what people will do.

Is this a fact? Where can I read about the evidence for this?

Comment author: wnoise 03 September 2010 06:13:18PM *  8 points [-]

I don't have any direct citations to current social science, no, but I'll tell a plausible story, and give some indirect citations.

Often? Yes. Always or near always? No. It depends crucially on the complexity of the game, the familiarity of the person playing the game, and the intelligence of the people playing.

Most people playing a game iteratively update their strategies with each game, learning both which moves of theirs worked better, and what their opponents are likely to do.

If both sides constantly do these updates, they are driven towards a Nash equilibrium. (It might be better to say they are driven away from anything that is not a Nash equilibrium.) The definition of a Nash equilibrium is a combination of strategies where neither side can do better by unilaterally altering their own move. If one side can do better by altering their own move, and realizes it, they will.

Complexity makes it harder for people to explore the space and find the Nash equilibrium. The more familiar with the game, the more likely you are to have found that neighborhood and play near it.

The smarter you are the faster this happens -- you can essentially model your opponent to figure out what they'll do, and respond. But your opponent can model you as well, so you must include that in your model of him, and so forth.

A surprisingly large number of people are essentially "level 0" modelers, who aren't influenced at all by a model of what their opponent does until they have gained data on it. They may use the "maximin" strategy: pick the choice that maximizes your gain assuming that, for each of your possible choices, your opponent does what helps you the least -- brutal pessimism. Similarly, there is an optimistic "maximax" strategy: pick the choice that maximizes your gain assuming that, for each of your possible choices, your opponent chooses what is best for you. Or there is the expected value over a flat distribution of the opponent's choices. Strategies that output a number for each choice can also be combined with some weighting. There are many other possibilities, of course, but if one choice strictly dominates another (is better for every value of the opponent's choice), they should not pick the one that is strictly dominated.
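
As a sketch, the level-0 rules just described can be run against the standard PD payoffs used earlier in the thread; all of them land on Defect:

```python
# Row player's payoffs: my move -> opponent's move -> my payoff.
payoffs = {
    'C': {'C': 3, 'D': 0},
    'D': {'C': 5, 'D': 1},
}

def maximin(p):
    # Pessimist: assume the opponent does whatever helps me least.
    return max(p, key=lambda m: min(p[m].values()))

def maximax(p):
    # Optimist: assume the opponent does whatever helps me most.
    return max(p, key=lambda m: max(p[m].values()))

def flat_expectation(p):
    # Assume the opponent picks uniformly at random.
    return max(p, key=lambda m: sum(p[m].values()) / len(p[m]))

# Defect strictly dominates, so every level-0 rule agrees with Nash here.
assert maximin(payoffs) == maximax(payoffs) == flat_expectation(payoffs) == 'D'
```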

Another large number are "level 1" modelers -- they figure the other guy will do something given by one of these "level 0" models. There are a few "level 2" modelers that model the other guy as "level 1", and level 3, and so forth. The Nash equilibria are the stable fixed points of this process, so they are what "high enough" modelers will do. (Note that this process may not converge unless it starts at a fixed point -- consider rock, paper, scissors. But doing the equivalent of Cesàro summation will make it converge to the unique mixed strategy of picking uniformly at random.)

In practice, you want to be exactly one level higher than your opponent. It is, of course, possible to model your opponent as a probabilistic mixture of these, though your best response is (in general) not going to be a probabilistic mixture of levels one-higher. And the best response to you modeling like that will not be a simple mixture of levels either.

So, why do I say many are level 0 and level 1? Well, consider the Guess 2/3 of the average game. Players are restricted to numbers between 0 and 100. The person guessing closest to 2/3 of the mean wins (utility 1), and everyone else loses (utility 0); pick the winner randomly in case of ties. What will a level 0 modeler do? The maximin strategy gives no restriction -- you can always lose. The maximax strategy eliminates everything above 66 2/3, because that's the maximum the average can possibly be. The equiprobable expected-value strategy puts the mean at 50, and suggests 33 1/3 (getting more peaked the more people there are and the more they are modeled as independent). A level 1 modeler realizes that everybody else should know at least this, so will probably guess around 2/3 * 33 1/3 = 22, perhaps higher after realizing that some level 0s won't go through even the utility analysis and will pick randomly. A level 2 modeler will be at 2/3 of this, roughly 14 or 15. This obviously converges to 0 for a "level infinity" modeler, and that is the Nash equilibrium.
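
The level-k chain for this game can be sketched directly (assuming the level-0 guess of 33 1/3 derived above, with each higher level best-responding at 2/3 of the level below):

```python
def level_k_guess(k, level0=100 / 3):
    # Level 0 guesses 33 1/3; each higher level guesses 2/3 of the one below.
    guess = level0
    for _ in range(k):
        guess *= 2 / 3
    return guess

# Levels 0-3 give roughly 33.3, 22.2, 14.8, 9.9 -- matching the estimates
# in the comment -- and the sequence converges to the Nash equilibrium, 0.
guesses = [round(level_k_guess(k), 1) for k in range(4)]
```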

But of course, very few people pick near 0, so it is not a good idea to pick 0. What is it rational to pick? From the link, a newspaper ran this with a prize, and the winner was at 21.6 (so the average was around 32.4). http://museumofmoney.org/exhibitions/games/numberpop.html references a study with college students winning with 24 (average of 36!), and a financial newspaper winning with 13 (average of 19.5). When I've seen histograms, they tend to have a spike at 33 1/3, indicating a lot of essentially direct level-0 play. Curiously, there also tends to be a spike at 66 2/3, indicating a fair number who really don't quite understand the game.

In this case, with an unfamiliar game, playing the Nash equilibrium is not optimal, and people don't do it. Levels 1-3 seem to be what wins in this case, with the majority effectively playing at levels 0-2. But I can guarantee that played multiple times with the same people this will go down to 0 quite rapidly. Tautologically, people are more familiar with the games they play more often, and will in practice be effectively higher -- not because they explicitly model higher levels, but because the "level 0" models are not random, but incorporate how people have played before (rather than how they should play now).

For the specific case of the Prisoner's Dilemma, all of the "level 0" strategies pick Defect, which is the Nash equilibrium. Even so, http://en.wikipedia.org/wiki/Prisoner's_dilemma#cite_note-2 claims that 40% cooperate. I would expect that this comes from some innate valuing of fairness, so that the rewards they get are not actually their utilities for those outcomes, but this is not clear.

EDITED: links fixed, and a bit of clarifications and grammar rewrites.

Comment author: TobyBartels 03 September 2010 09:08:00PM *  0 points [-]

Thanks; upvoted.

Comment author: NihilCredo 01 September 2010 01:56:37PM *  1 point [-]

It's generally a good idea to assume your opponent is indeed rational and utility maximizing.

Why? I'm not just asking rhetorical questions - your opponent may be non-rational in any of a trillion very realistic, very common ways: maybe he'll cooperate because of his personal moral/religious views, maybe he'll defect because he doesn't want to think of himself as a 'sucker'. Or maybe he is rational but has made a mistake somewhere along the way of his musings on PD.

If all your formulation says is that you're playing against "someone random", at a minimum this means a randomly chosen human, most of which have a terribly flawed rational process. At a maximum it means any randomly-chosen or randomly-generated entity capable of picking an option - you could be playing against Eliza or Paul the Octopus for all you know.

Also, if you are going to assume that he is rational and not mistaken, why assume that he is just rational enough to do the obvious, zero-depth payoff analysis, but not rational enough to have a model any more sophisticated than that? (indeed, why should TDT be discarded as a "special consideration"?)

Comment author: wnoise 03 September 2010 07:26:36PM *  2 points [-]

Why? ... your opponent may be non-rational in one of a trillion of very realistic, very common ways ...

This is a fair point. For games people are not familiar with, that they have not played to death, this is absolutely true. For the games people have played a lot (or have had their genes and memes evolve under, contributing towards their moves), anything but what the Nash equilibrium prescribes gets outcompeted. Playing poker against a novice, there should be options much better than Nash. Against a pro? Not so much. Against a hustler (who isn't actually cheating)? Well, they're optimized to take advantage of novices by leading them into bigger bets, so you can probably take advantage of this by not scaring them off by playing too well too soon.

maybe he'll cooperate because of his personal moral/religious views, maybe he'll defect because he doesn't want to think of himself as a 'sucker'.

These are, of course, part of the utility function -- and if you don't know them, modeling is a bitch.

most of which have a terribly flawed rational process.

Most people do not play by reasoning it out to any great depth at a conscious level. They play by gut instinct, set by genes (and memes) and sharpened by experience. For games where the genes are relevant, this is going to push towards Nash. Experience is also going to push towards Nash.

Comment author: LucasSloan 31 August 2010 04:46:03AM 1 point [-]

Should businessmen collude on one-shot pricing?

Yes. If they expect to be using the same decision algorithm, they will maximize their personal profits if they both set the price that gives the greatest total return.