Thanks for this summary!
In 2017 I commented on the two-player version here.
... if the player bets in [a winning] situation only when holding the best possible hand, then the opponents would know to always fold in response. To cope with this, Pluribus keeps track of the probability it would have reached the current situation with each possible hand according to its strategy. Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand, being careful to balance its strategy across all the hands so as to remain unpredictable to the opponent. Once this balanced strategy across all hands is computed, Pluribus then executes an action for the hand it is actually holding.
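To make the quoted idea concrete, here is a minimal sketch in Python. The hands, frequencies, and `toy_solver` are all made up for illustration; they are not Pluribus's actual strategy or search procedure.

```python
import random

# toy_solver is a hypothetical stand-in for Pluribus's search: it decides how
# to play *every* hand that could have reached this spot, so that bets come
# from a balanced mix of strong hands and bluffs.
def toy_solver(possible_hands):
    return {
        "nut flush":   {"bet": 0.9, "check": 0.1},
        "middle pair": {"bet": 0.1, "check": 0.9},
        "busted draw": {"bet": 0.4, "check": 0.6},  # the balancing bluff
    }

def choose_action(actual_hand):
    # 1. First compute the strategy for all possible hands ...
    full_strategy = toy_solver(["nut flush", "middle pair", "busted draw"])
    # 2. ... only then look up and sample an action for the hand actually
    #    held, so the observable behavior stays unpredictable.
    probs = full_strategy[actual_hand]
    actions, weights = zip(*probs.items())
    return random.choices(actions, weights=weights)[0]

print(choose_action("busted draw"))  # sometimes "bet": a balanced bluff
```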
Human professional players try to approximate this kind of balance as well, using computer programs ("solvers"). See this youtube video for an example of a hand with solver analysis. To get the solver analysis started, one needs to specify the input hand ranges one expects people to have in the specific situation, as well as the bet sizes for the solver to consider (more than 2-3 bet sizes would be too much for the solver to handle). To choose those parameters, professionals make guesses (sometimes based on data) about how other players play. Because the input parameters depend on human-learned wisdom rather than worked-out game theory, solvers can't quite be said to have solved poker.
So, like the computer, human players try to simplify the game tree in order to approximate balanced play. However, this is much easier for computers. Pluribus knows its own counterfactuals perfectly: it can make sure it always covers all the cards it could be holding (in order to represent different board textures) and pairs the right number of bluffs with good hands for every state of the game given past actions.
It almost seems easy to beat humans this way, except that knowing how to simplify and then model the situations in the first place seems to have been the bottleneck up until 2017.
Donk betting: some kind of uncommon play that's usually considered dumb (like a donkey). I didn't figure out what it actually means.
"Donk betting" has a bad reputation because it's a typical mistake amateur players make, doing it in the wrong type of situations with the wrong types of hands. You can only donk bet in some betting round if you're first to act, and a general weakness amateur players have is that they don't understand the value of being last to act (having more information). To at least somewhat mitigate the awfulness of being first to act, good players try to give out as little information as possible. If you played the previous street passively and your opponent displayed strength, you generally want to check because your opponent already expects you to be weaker, and so will do the betting for you often enough because they're still telling their story of having a stronger hand. If you donk bet when a new card improved you, you telegraph information and your opponent can play perfectly against that, folding their weak hands and continuing only with strong hands. If you check instead, you get more value from your opponent's bluffs, and you almost always still get to put in your raise after they bet for you, reopening the betting round for you.
However, there are instances where donk betting is clearly good: when a new card is much more likely to improve your range of hands than your opponent's. In certain situations a new card is terrible for one player and great for the other. In those instances, you can expect thinking opponents to check behind even with most of their strong hands, because they have become apprehensive about how much your range of hands has improved. In that case, you sometimes want to bet out right away (both in some of the cases where you hit, and with bluffs).
However, Pluribus disagrees with the folk wisdom that “donk betting” (starting a round by betting when one ended the previous betting round with a call) is a mistake; Pluribus does this far more often than professional humans do.
It might just be that professional humans decide to keep the game tree simple by not developing donk bet strategies for situations where this is complicated to balance and only produces small benefits if done perfectly. But it could be that Pluribus found a more interesting reason to occasionally use donk bets in situations where professional players would struggle to see the immediate use. Unfortunately I couldn't find any discussion of hand histories illustrating the concept.
Thanks for the post. I would recommend reading the original blog post by Noam Brown as it has the proper level of exposition and more details/nuances.
Overall, it seems that Pluribus is conceptually very similar to Libratus; sadly, no new insights about >2-player games. My impression is that because poker players don't collude/cooperate too much, playing something close to an equilibrium against them will make you rich.
Thanks for this.
Nitpick:
The description of a big blind:
Big blind: the minimal money/poker chips that every player must bet in order to play. For example, $0.1 would be a reasonable amount in casual play.
sounds more like an ante than a big blind. This is important for understanding the discussion of limping in Ars Technica.
This is very interesting, thanks for posting.
The explanations of the AI's algorithms sound pretty simplified, i.e. I wouldn't be surprised if all these descriptions of how the algorithm works applied to efforts from 10+ years ago. Why did the human-level threshold just get crossed now?
I cannot see anything that is particularly innovative in the paper, though I'm not an expert on this.
Maybe ask people working on poker AI, like Sandholm, directly. Perhaps it's something like this: many details of the particular program (and the paper is full of these details) had to be assembled in order for it to be trainable cheaply enough.
A very interesting analysis of an interesting article. I'm not familiar with AI development and because of that my questions may be too elementary.
Its major strength is its ability to use mixed strategies... to do this in a perfectly random way and to do so consistently. Most people just can't.
It amazes me how much of the advantage of AI and other computer programs derives from their lower bias compared to humans.
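As a toy illustration of what executing a mixed strategy "in a perfectly random way" means (the bet/check frequencies below are invented for the example, not Pluribus's):

```python
import random

# Sample each action with a fixed probability, hand after hand.
actions, probs = ["bet", "check"], [0.3, 0.7]
sampled = random.choices(actions, weights=probs, k=100_000)
print(sampled.count("bet") / len(sampled))  # ~0.30 -- no drift after a bad run
```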
Because poker is played commercially, the risk associated with releasing the code outweighs the benefits. To aid reproducibility, we have included the pseudocode for the major components of our program in the supplementary materials.
It took two professionals to develop its algorithm and one of them to code it. As I understand it, with the provided pseudocode one would only need to code it again to create a new instance of the AI. Or is there something crucial that is missing? Can you estimate how much work/cost would be necessary to do that?
Could you include a link to the analyzed article in the introduction? It is easy to find, but it feels strange without a direct link.
On July 11, a new poker AI was published in Science. Called Pluribus, it plays 6-player no-limit Texas Hold'em at a superhuman level.
In this post, we read through the paper. The level of exposition is between the paper (too serious) and the popular press (too entertaining).
Basics of Texas Hold'em
If you don't know what it even is, like me, then playing a tutorial would be best. I used Learn Poker on my phone.
Now that you know how to play it, it's time to deal with some of the terminology.
The authors
The authors are Noam Brown and Tuomas Sandholm. Previously, they made the news by writing Libratus, a poker AI that beat human champions in 2-player no-limit Texas Hold'em, in 2017.
Pluribus contains a lot of the code from Libratus and its siblings:
Scroll to the bottom for more on the two companies.
Highlights from the paper
Is Nash equilibrium even worthwhile?
In multiplayer games, Nash equilibria are not easy to compute, and might not even matter. Consider the Lemonade Stand Game:
The Nash equilibrium is when the three of you are equidistant from each other, but there's no way to achieve that unilaterally. You might decide to just stay at Stand 0 and wait for the others to move to Stand 4 and Stand 8, but they might settle on a different Nash equilibrium.
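A small check of this, assuming the common formulation of the game (12 spots on a ring; each player's payoff is the sum of ring distances to the other two players, so everyone wants to be far from everyone else):

```python
N = 12  # spots on the ring

def ring_dist(a, b):
    d = abs(a - b) % N
    return min(d, N - d)  # length of the shorter arc

def payoff(me, others):
    return sum(ring_dist(me, o) for o in others)

# The equidistant profile (0, 4, 8) is a Nash equilibrium: no player can
# strictly improve by moving alone.
profile = (0, 4, 8)
for i, pos in enumerate(profile):
    others = [p for j, p in enumerate(profile) if j != i]
    assert payoff(pos, others) == max(payoff(x, others) for x in range(N))

# But no player can reach it unilaterally: if you sit at Stand 0, nothing
# forces the others toward Stands 4 and 8 rather than toward some other,
# equally valid equilibrium such as (1, 5, 9).
```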
The authors decided to go all empirical and not consider the problem of Nash equilibrium:
The success of Pluribus appears to vindicate them:
Description of Pluribus
Pluribus first produces a "blueprint" strategy by offline self-play, then adapts it during live play:
Since the first round (like a chess opening vs. the midgame) has the smallest amount of variation, Pluribus could afford to train an almost complete blueprint strategy for the first round. For later rounds, some real-time search was needed:
Pluribus uses Monte Carlo counterfactual regret minimization. The details can be found in the link.
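The full algorithm is involved, but the update rule at its heart is regret matching. The toy sketch below (illustrative only, not the paper's pseudocode) self-plays rock-paper-scissors and converges to the mixed equilibrium; Pluribus applies the same idea to poker's information sets, with Monte Carlo sampling over cards and actions.

```python
import random

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def strategy_from_regrets(regrets):
    # Play each action in proportion to its positive accumulated regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

regrets = [0.0] * ACTIONS
strategy_sum = [0.0] * ACTIONS
for _ in range(100_000):
    strat = strategy_from_regrets(regrets)
    strategy_sum = [s + p for s, p in zip(strategy_sum, strat)]
    me = random.choices(range(ACTIONS), weights=strat)[0]
    opp = random.choices(range(ACTIONS), weights=strat)[0]  # self-play
    # Regret: how much better each alternative action would have done.
    for a in range(ACTIONS):
        regrets[a] += PAYOFF[a][opp] - PAYOFF[me][opp]

total = sum(strategy_sum)
print([round(s / total, 3) for s in strategy_sum])  # ~[0.333, 0.333, 0.333]
```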
Pluribus can be sneaky:
This was corroborated by a comment from a human opponent:
Scroll down for how Ferguson lost to Pluribus.
Pluribus is cheap, small, and fast
In order to make Pluribus small, the blueprint strategy is "abstracted": it intentionally conflates some game actions (because really, a $200 bet and a $201 bet are not so different).
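As a made-up example of the action-abstraction part (the bucket sizes below are not the ones Pluribus uses), snapping every raw bet to the nearest member of a small menu collapses $200 and $201 into the same action in the game tree:

```python
BET_BUCKETS = [50, 100, 200, 400, 800, 1600]  # hypothetical menu of bet sizes

def abstract_bet(amount):
    # Snap an arbitrary bet size to the nearest representative size.
    return min(BET_BUCKETS, key=lambda b: abs(b - amount))

assert abstract_bet(200) == abstract_bet(201) == 200
assert abstract_bet(950) == 800
```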
The abstraction paid off. Pluribus was cheap to train, cheap to run, and faster than humans:
On Amazon right now, Intel® Xeon® Processor E5-2695 v3 CPUs cost just $500 each, and 128 GB of RAM costs $750. The whole setup can be built for under $2000. It would only take a little while to recoup the cost if it went into online poker.
Pluribus vs Human professionals. Pluribus wins!
Professional poker is an endurance game, like a marathon:
And there was prize money, of course, for the humans. Pluribus played for free -- what a champ.
Pluribus had a very high win rate, and was statistically demonstrated to be profitable when playing against 5 elite humans:
"mbb/game" means "milli big blinds per game". "big blind" just means "the least amount that one must bet at the beginning of the game", and poker players use it as a unit of measurement of the size of bets. "milli" means 1/1000. So Pluribus would on average win 4.8% of the big blind each game. Very impressive.
AIVAT is a statistical technique designed specifically to evaluate how good a poker player is, while filtering out variance from card luck. From (Neil Burch et al., 2018):
Pluribus vs Jesus (and Elias)
Pluribus did not gang up on the poor human:
The humans were paid on average $0.60 per game:
Pluribus won!
Ferguson lost less than Elias:
Pluribus is an alien, like AlphaZero
And like AlphaZero, it confirms some human strategies, and dismisses some others:
Two examples in particular:
Too dangerous to be released, again
The program was not released, due to some kind of unspecified risk. (News articles made it specifically about the risk of wrecking the online gambling industry.)
Useful quotes from other news reports
From Ars Technica:
From MIT Technology Review:
There are a few details about Sandholm's two companies:
"Better computer games"... hm, sounds suspiciously nonspecific.