Previously: Simplified Poker, Simplified Poker Strategy

Related (Eliezer Yudkowsky): Meta Honesty: Firming Honesty Around Its Edge Cases

About forty people submitted programs that used randomization. Several of those programs correctly solved for the Nash equilibrium, and they did well.

I submitted the only deterministic program.

I won going away.

I broke even against the Nash programs, utterly crushed vulnerable programs, and lost a non-trivial amount to only one program, a resounding heads-up defeat handed to me by the only other top-level gamer in the room, fellow Magic: the Gathering semi-pro player Eric Phillips.

Like me, Eric had an escape hatch in his program that reversed his decisions (rather than retreating to Nash) if he was losing by enough. Unlike me, his actually got implemented – the professor decided that given how well I was going to do anyway, I’d hit the complexity limit, so my escape hatch was left out.
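For the programmers, here is a minimal sketch of the two escape-hatch designs: a wrapper around a base strategy that watches the running score. It assumes a binary action set of `bet` and `check` and a simple net-score counter. `with_escape_hatch`, `loss_threshold`, and `fallback` are hypothetical names, not taken from either actual program.

```python
from typing import Callable, Optional

# Hypothetical binary action set; the real programs' interfaces are not shown here.
OPPOSITE = {"bet": "check", "check": "bet"}

Strategy = Callable[[dict], str]


def with_escape_hatch(base_strategy: Strategy, loss_threshold: int,
                      mode: str = "reverse",
                      fallback: Optional[Strategy] = None) -> Callable[[dict, int], str]:
    """Wrap a strategy so it changes behavior once net losses reach loss_threshold.

    mode="reverse":  flip every decision (the design described for Eric's program).
    mode="fallback": defer to another strategy, such as a Nash-equilibrium player.
    """
    if mode not in ("reverse", "fallback"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "fallback" and fallback is None:
        raise ValueError("fallback mode needs a fallback strategy")

    def strategy(state: dict, net_score: int) -> str:
        if net_score > -loss_threshold:
            return base_strategy(state)        # not losing by enough: play normally
        if mode == "reverse":
            return OPPOSITE[base_strategy(state)]
        return fallback(state)                 # retreat to the safe strategy

    return strategy


# Usage: a base strategy that always bets, with the hatch set at 10 units down.
always_bet = lambda state: "bet"
hatch = with_escape_hatch(always_bet, loss_threshold=10)
print(hatch({}, net_score=-3))   # bet
print(hatch({}, net_score=-12))  # check
```

The reverse mode corresponds to the hatch described for Eric's program; the fallback mode is the more conventional retreat to Nash.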

Rather than get into implementation details or prove the Nash equilibrium, I’ll discuss two things. First, how few levels people play on. Second, the motivating point: things are already more distinct and random than you think they are, and you can take advantage of that.

Next Level

In the comments to the first two posts, most people focused on finding the Nash equilibrium. A few people tried to do something that would better exploit obviously stupid players, but none tried to discover their opponents’ strategies.

The only reason not to play an exploitable strategy is if you’re worried someone will exploit it!

Consider thinking as having levels. Level N+1 attempts to optimize against Levels N and below, or just Level N.

Level 0 isn’t thinking or optimizing, so higher levels all crush it, mostly.

Level 1 thinking picks actions that are generically powerful and likely to lead to good outcomes, without considering what opponents might do. Do ‘natural’ things.

Level 2 thinking considers what to do against opponents using Level 1 thinking. You try to counter the ‘natural’ actions, and exploit standard behaviors.

Level 3 counters Level 2. You assume your opponents are trying to exploit basic behaviors, and attempt to exploit those trying to do this.

Level 4 counters Level 3. You assume your opponents are trying to exploit exploitative behavior, and act accordingly. So you do what’s best against that.

And so on. Being caught one level below your opponent is death. Being one level ahead is amazing. Two or more levels different, and strange things happen.
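A toy model makes the level structure concrete. This sketch uses rock-paper-scissors rather than the poker game, assumes Level 1 simply plays rock as the ‘natural’ move (the sketch has no separate Level 0), and has each Level N+1 best-respond to Level N only.

```python
# Hypothetical illustration: rock-paper-scissors, assuming Level 1 plays the
# "natural" move (rock) and each Level N+1 best-responds to Level N only.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value
COUNTER = {loser: winner for winner, loser in BEATS.items()}       # what beats the key


def level_move(level: int, natural: str = "rock") -> str:
    """Move chosen by a Level-`level` thinker, for level >= 1."""
    move = natural
    for _ in range(level - 1):
        move = COUNTER[move]  # best response to the level just below
    return move


def result(mine: str, theirs: str) -> int:
    """+1 if `mine` wins, -1 if it loses, 0 on a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


for opponent in range(1, 6):
    outcome = result(level_move(3), level_move(opponent))
    print(f"Level 3 vs Level {opponent}: {outcome:+d}")
```

Level 3 beats Level 2 and loses to Level 4. Two levels away, it loses to Level 1 and beats Level 5, which is one version of strange things happening.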

Life is messy. Political campaigns, major corporation strategic plans, theaters of war. The big stuff. A lot of Level 0. Level 1 is industry standard. Level 2 is inspired, exceptional. Level 3 is the stuff of legend.

In well-defined situations where losers are strongly filtered out, such as tournaments, you can get glimmers of high level behavior. But mostly, you get it by changing the view of what Level 1 is. The old Level 2 and Level 3 strategies become the new ‘rules of the game’. The brain chunks them into basic actions. Only then can the cycle begin again.

Also, ‘getting’ someone with Level 3 thinking risks giving the game away. What level should one be on next time, then?

Effective Randomization

There is a strong instinct that whenever predictable behavior can be punished, one must randomize one’s behavior.

That’s true. But only from another’s point of view. You can’t be predictable, but that doesn’t mean you need to be random.

It’s another form of illusion of transparency. If you think about a problem differently than others, their attempts to predict or model you will get it wrong. The only requirement is that your decision process is complex, and doesn’t reduce to a simple model.

If you also have different information than they do, that’s even better.

When analyzing the hand histories, I know what cards I was dealt, and use that to deduce what cards my opponent likely held, and in turn to guess their behaviors. My opponent, meanwhile, likely has no clue what process I’m using, how I implemented it, or what data I’m feeding into it. All of that is effective randomization.

If that reduces to me always betting with a 1, they might catch on eventually. But since I’m constantly re-evaluating what they’re doing, and reacting accordingly, on an impossible-to-predict schedule, such catching on might end up backfiring. It’s the same at a human poker table. If you’re good enough at reading people to figure out what I’m thinking and stay one step ahead, I need to retreat to Nash, but that’s rare. Mostly, I only need to worry, at most, if my actions are effectively doing something simple and easy to model.

Playing the same exact scenarios, or with the same exact people, or both, for long enough, both increases the amount of data available for analysis, and reduces the randomness behind it. Eventually, such tactics stop working. But it takes a while, and the more you care about long histories in non-obvious ways, the longer it will take.
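To put rough numbers on ‘long enough’, here is a sketch using a one-sample z-test for a proportion. It assumes the observer knows what rate would look unremarkable (`p_expected`), sees independent repetitions of the same spot, and flags you once the deviation reaches a chosen number of standard errors (`z`). None of these parameters come from the actual tournament.

```python
import math
from fractions import Fraction


def spots_to_detect(p_actual: Fraction, p_expected: Fraction,
                    z: Fraction = Fraction(2)) -> int:
    """Smallest number of identical, independent spots after which an observed
    rate of p_actual sits at least z standard errors from p_expected
    (one-sample z-test for a proportion, normal approximation)."""
    gap = p_actual - p_expected
    if gap == 0:
        raise ValueError("no deviation to detect")
    variance = p_expected * (1 - p_expected)
    # z <= |gap| / sqrt(variance / n)  <=>  n >= z^2 * variance / gap^2
    return math.ceil(z * z * variance / (gap * gap))


print(spots_to_detect(Fraction(7, 10), Fraction(1, 2)))  # 25
print(spots_to_detect(Fraction(6, 10), Fraction(1, 2)))  # 100
```

Exact fractions avoid floating-point rounding right at the boundary. A blatant 70% habit where 50% looks normal shows up after about 25 identical spots, while a subtler 60% habit takes about 100, and real hands are rarely identical.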

Rather than be actually random, instead one adjusts when one’s behavior has sufficiently deviated from what would look random, such that others will likely adjust to account for it. That adjustment, too, need not be random.

Rushing into doing things to mix up your play, before others have any data to work with, only leaves value on the table.

One strong strategy when one needs to mix it up is to do what the details favor. Thus, if there’s something you need to occasionally do, and today is an unusually good day for it, or now is an especially good time, do it now, and adjust your threshold for that depending on how often you’ve done it recently.
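One way to read this rule as a deterministic procedure is sketched below. It assumes each occasion can be given a numeric score for how favorable it is, and that you have a target long-run frequency. `ThresholdMixer`, `target_rate`, `window`, and `sensitivity` are hypothetical names, and no random number generator is involved.

```python
from collections import deque


class ThresholdMixer:
    """Take an occasional action when conditions favor it, raising the bar after
    recent use and lowering it after a drought. Fully deterministic."""

    def __init__(self, target_rate: float, base_threshold: float,
                 window: int = 10, sensitivity: float = 1.0) -> None:
        self.target_rate = target_rate        # how often you want to act overall
        self.base_threshold = base_threshold  # bar for acting when on target
        self.sensitivity = sensitivity        # how hard recent history moves the bar
        self.recent = deque(maxlen=window)    # True/False for the last `window` decisions

    def threshold(self) -> float:
        if not self.recent:
            return self.base_threshold
        recent_rate = sum(self.recent) / len(self.recent)
        # Acted more than planned lately -> demand a better occasion, and vice versa.
        return self.base_threshold + self.sensitivity * (recent_rate - self.target_rate)

    def decide(self, favorability: float) -> bool:
        act = favorability >= self.threshold()
        self.recent.append(act)
        return act


m = ThresholdMixer(target_rate=0.2, base_threshold=0.8)
print(m.decide(0.9))  # True: good occasion, nothing done recently
print(m.decide(0.9))  # False: just acted, so the bar jumped to 1.6
```

The same favorable score can produce different decisions depending on recent history, which is what keeps the pattern from reducing to a simple model without making it random.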

A mistake I often make is to choose actions as if I assumed others knew my decision algorithm and would exploit it to extract all the information. Most of the time this is silly.

This brings us to the issue of Glomarization.

Glomarization

Are you harboring any criminals? Did you rob a bank? Is there a tap on my phone? Does this make me look fat?

If when the answer is no I would tell you no, then refusing to answer is the same as saying yes. So if you want to avoid lying, and want to keep secrets, you need to sometimes refuse to answer questions, to avoid making refusing to answer too meaningful an action. Eliezer discussed such issues recently.
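The arithmetic behind this is Bayes’ rule. This sketch uses illustrative numbers (a 10% prior that the true answer is yes, and always refusing when it is) to show how much even a small rate of refusing when the answer is no dilutes what a refusal reveals. The function name is hypothetical.

```python
from fractions import Fraction


def p_yes_given_refusal(prior_yes: Fraction, refuse_if_yes: Fraction,
                        refuse_if_no: Fraction) -> Fraction:
    """Bayes' rule: probability the true answer is 'yes' given that you refused."""
    refused_and_yes = prior_yes * refuse_if_yes
    refused_and_no = (1 - prior_yes) * refuse_if_no
    return refused_and_yes / (refused_and_yes + refused_and_no)


# Never refusing when the answer is 'no' turns every refusal into a 'yes'.
print(p_yes_given_refusal(Fraction(1, 10), Fraction(1), Fraction(0)))     # 1
# Refusing one time in nine when the answer is 'no' makes a refusal a coin flip.
print(p_yes_given_refusal(Fraction(1, 10), Fraction(1), Fraction(1, 9)))  # 1/2
```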

This section was the original motivation for writing the poker series up now, but having written it, I think a full treatment should mostly just be its own thing. And I’m not happy with my ability to explain these concepts concisely. But a few thoughts here.

The advantage of fully explicit meta-honesty, telling people exactly under what conditions you would lie or refuse to share information, is that it protects a system of full, reliable honesty.

The problem with fully explicit meta-honesty is that saying exactly when you would Glomarize vastly expands the amount of Glomarization you need.

Eliezer correctly points out that if the Feds ask you where you were last night, your answer of ‘I can neither confirm nor deny where I was last night’ is going to sound mighty suspicious regardless of how often you answer that way. Saying ‘none of your goddamn business’ is only marginally better. Also, letting them know that you always refuse to answer that question might not be the best way to make them think you’re less suspicious.

This means full Glomarization isn’t practical unless (this actually does come up) your response to a question can reliably be ‘that’s a trap!’.

However, partial Glomarization is fine. As long as you mix in some refusing to answer when the answer wouldn’t hurt you, people don’t know much. Most importantly, they don’t know how often you’d refuse to answer. 

If the last five times you’ve refused to answer if there was a dragon in your garage, there was a dragon in your garage, your refusal to answer is rather strong evidence there’s a dragon in your garage.

If it only happened one of the last five times, then there’s certainly a Bayesian update one can make, but they don’t know how often you Glomarize, so it’s hard for them to know how much to update. The key question is, what’s the threshold where they feel the need to look in your garage? Can you muddy the waters enough to avoid that?
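From the observer’s side, one simple model treats your past refusals as interchangeable and estimates the chance that a refusal hides a dragon with Laplace’s rule of succession (a uniform prior on that chance). The inspection threshold and the function names are hypothetical; the goal is to keep the estimate below whatever threshold they actually use.

```python
from fractions import Fraction


def estimated_dragon_rate(dragons_when_refused: int, refusals: int) -> Fraction:
    """Observer's estimate that a refusal hides a dragon, using Laplace's rule of
    succession (uniform prior on the chance that any given refusal hides one)."""
    return Fraction(dragons_when_refused + 1, refusals + 2)


def will_inspect(dragons_when_refused: int, refusals: int, threshold: Fraction) -> bool:
    """Hypothetical rule: the observer checks the garage once the estimate crosses threshold."""
    return estimated_dragon_rate(dragons_when_refused, refusals) >= threshold


print(estimated_dragon_rate(5, 5))  # 6/7
print(estimated_dragon_rate(1, 5))  # 2/7
```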

Once you’re doing that, it is almost certainly fine to answer ‘no’ when it especially matters that they know there isn’t a dragon there, because they don’t know when it’s important, or what rule you’re following. If you went and told them exactly when you answer the question, it would be bad. But if they’re not sure, it’s fine.

One can complement that by understanding how conversations and topics develop, and avoiding setting yourself up for questions you don’t want to answer. If you have a dragon in your garage and don’t want to lie about it or reveal that it’s there, it’s a really bad idea to talk about the idea of dragons in garages. Someone is going to ask. So when your refusal to answer would be suspicious, especially when it would be a potential sign of a heretical belief, the best strategy is to not get into position to get asked.

That, in turn, means avoiding perfectly harmless things gently, invisibly, without saying that this is what you’re doing. Posts that don’t get written, statements not made, rather than questions not answered. As a new practitioner of such arts, hard and fast rules are good. As an expert, they only serve to give the game away.

Remember the illusion of transparency. Your counterfactual selves would need to act differently. But if no one knows that, it’s not a problem.

2 comments

>I broke even against the Nash programs, utterly crushed vulnerable programs, and lost a non-trivial amount to only one program, a resounding heads-up defeat handed to me by the only other top-level gamer in the room, fellow Magic: the Gathering semi-pro player Eric Phillips.

Great series.

Do you have the win/loss stats or final amounts by strategy? Or a rough approximation from memory?

>In the comments to the first two posts, most people focused on finding the Nash equilibrium. A few people tried to do something that would better exploit obviously stupid players, but none that tried to discover the opponents’ strategy.

I mean, I explicitly considered that; I just thought it was unlikely to pay off with only 50 rounds. I am curious to see how many hands it took your strategy to correctly identify its opponent. It's possible that I wasn't accounting for the fact that most opponents would only take you down half of the pathways (and thus would be twice as easy to learn as a general agent).