A test of your rationality skills

[-]jaspax3y88

I think you're underestimating the effort required to understand this scenario for someone who doesn't already follow poker. I am a lifelong player of trick-taking games (casually, at the kitchen table with family members), but I've never played poker, and here's how the play description reads to me:

called an all-in shove

Only a vague idea of what this means, based on the everyday idiom of being "all-in".

with the jack of clubs and four of hearts on a board

Don't know what it means for these to be "on a board".

reading ThTc9c3h

Gibberish.

her jack high held against Adelstein’s eight of clubs and seven of clubs

Only vaguely comprehensible. I don't know poker's hand-scoring rules.

Additional details that are necessary to interpret the situation: is the deck continually shuffled, or are multiple hands played off of the same shuffle? (Implicitly: are there card-counting strategies that provide relevant information?) What are the point rules / rank of hands? How does suit interact with card rank? Is there a concept of trump? What was the sequence of bets leading up to the play in question? How typical is this behavior in high-level play? How high-level are these people? Robbi is called a "recreational" player -- does this mean "top-level amateur" or "low-level pro", or something else?

In the absence of these details, all I really get is "Robbi made a risky play off a mediocre hand, and won big". And yes, this is Bayesian evidence in favor of cheating, but how strong the evidence is depends heavily on all of the unknown details mentioned above. At the same time, the fact that no one identified the means by which the cheating occurred despite heavy scrutiny is Bayesian evidence against cheating.
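To make the weighing of those two pieces of evidence concrete, here is a minimal sketch of an odds-form Bayesian update. Every number in it is a made-up placeholder, not an estimate for this case, and the function name is hypothetical. It also assumes the two pieces of evidence are conditionally independent given each hypothesis, which may not hold in practice.

```python
def posterior_probability(prior, likelihood_ratios):
    """Convert a prior probability to odds, multiply by each likelihood
    ratio P(evidence | cheating) / P(evidence | no cheating), and convert
    the result back to a probability."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical placeholder values, chosen only to illustrate the arithmetic.
prior_cheating = 0.01        # assumed base rate of cheating
lr_unusual_call = 10.0       # assumed: the call is 10x likelier if cheating
lr_no_method_found = 0.5     # assumed: finding no method is half as likely if cheating

posterior = posterior_probability(
    prior_cheating, [lr_unusual_call, lr_no_method_found]
)
print(f"posterior probability of cheating: {posterior:.3f}")
```

The point of the sketch is only that evidence pulling in opposite directions multiplies into the same odds, so the conclusion is dominated by whichever likelihood ratios and prior you are least sure about.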

My operational decision would be that this is enough evidence to subject Robbi to heightened scrutiny in future tournaments, but not enough to ban her or claw back her winnings. This is a good test, but maybe not as good as you think it is, due to the amount of uncommon background knowledge required.

[-]Max H3y3-3

(Also, FYI for others: this comment is close to violating my bolded request not to post object-level conclusions or speculations publicly. I'll let this one slide since it's mostly just an initial reaction, but I may ask that similar comments be deleted.)

[-]Max H3y*10

I may be underestimating the background knowledge and effort required, yes. Understanding the rules of poker and Texas Hold'em in particular is pretty essential for this exercise, so it might be worth writing a longer introduction and explanation that provides some of the required background knowledge.

Though, this is the kind of thing I expect GPT to be a great help with, and so for those unfamiliar with poker, this is also a good test of a different set of skills: using AI tools to get up to speed quickly in an unfamiliar domain.

Here's what GPT-4 said in response to your comment:

Let me try to explain the key terms and concepts in the description of the hand.
"Called an all-in shove" - In poker, a player can bet all their remaining chips (money) in a single move. This is known as going "all-in." When a player calls an all-in shove, they match the bet and put their own chips at risk, deciding to continue playing with their current hand.
"On a board" - In Texas Hold'em poker, a variant played in the described scenario, there are five community cards that are dealt face-up in the middle of the table. These cards are shared by all players and are referred to as the "board." Players combine their two private cards (hole cards) with the community cards to form the best possible hand.
"Reading ThTc9c3h" - This is the composition of the board at the time of the described hand. It indicates that the community cards are the Ten of hearts (Th), Ten of clubs (Tc), Nine of clubs (9c), and Three of hearts (3h).
"Her jack high held against Adelstein’s eight of clubs and seven of clubs" - In poker, the objective is to have the best hand or convince others to fold. In this case, Lew's best hand was a single high card, the Jack, which is better than Adelstein's best hand, which had an Eight as the high card. The term "held" means that Lew's hand remained the best hand after all the community cards were dealt.
As for your other questions:
In poker, the deck is shuffled before each hand, so card-counting strategies are not relevant.
Poker hands are ranked based on the combination of cards, from high card (weakest) to royal flush (strongest). There is no concept of trump, and suits do not affect hand ranking.
The sequence of bets leading up to the play in question and the players' levels were not provided in the original description, but the focus is on the suspicious nature of Lew's call given her weak hand.
Robbi being called a "recreational" player usually means she is not a professional poker player and plays casually or for fun.
In summary, the controversy revolves around Robbi making a risky call with a weak hand and winning a large amount of money. This incident raises suspicions of cheating, but without concrete evidence, it's difficult to confirm. As you mentioned, heightened scrutiny for Robbi in future tournaments would be a reasonable course of action, but without more information or evidence, it would be unfair to ban her or take back her winnings.

I think it's pretty good! If anyone wants to learn more, I suggest pasting the description (or other, longer descriptions available online) into ChatGPT and querying interactively. Note that I used GPT-4 for the version above; I'm not sure how well the free version does on something like this. Bing might do really well with this, since it can query external / up-to-date info on the web.
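The board notation can also be decoded mechanically. Below is a minimal sketch, assuming the common two-character convention (a rank character followed by a lowercase suit letter). The names `RANKS`, `SUITS`, and `parse_cards` are made up for illustration and are not from any poker library.

```python
# Rank and suit lookup tables for the assumed two-character notation.
RANKS = {
    "2": "Two", "3": "Three", "4": "Four", "5": "Five", "6": "Six",
    "7": "Seven", "8": "Eight", "9": "Nine", "T": "Ten",
    "J": "Jack", "Q": "Queen", "K": "King", "A": "Ace",
}
SUITS = {"c": "clubs", "d": "diamonds", "h": "hearts", "s": "spades"}


def parse_cards(text):
    """Split a string such as 'ThTc9c3h' into readable card names."""
    if len(text) % 2 != 0:
        raise ValueError("expected an even number of characters")
    cards = []
    # Walk the string two characters at a time: rank, then suit.
    for i in range(0, len(text), 2):
        rank, suit = text[i], text[i + 1]
        if rank not in RANKS or suit not in SUITS:
            raise ValueError(f"unrecognized card: {rank}{suit}")
        cards.append(f"{RANKS[rank]} of {SUITS[suit]}")
    return cards


board = parse_cards("ThTc9c3h")
print(board)
```

Applied to the board from the description, this gives the same four cards that the GPT-4 explanation lists.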

[-]Dagon3y44

I think this is a bad exercise or test of rationality skill. First, it's massively time-consuming, as a LOT has been written about it. Second (though perhaps more important), there's no reasonable scoring rubric (so not good as a test), and no feedback loop to improve on (so not good as an exercise).

I have, in fact, followed the topic - I used to play poker at semi-professional levels (played in big games and cashed in many small and medium tourneys, net positive over many years, never actually devoted the energy to make it a big part of my income), and still have close friends in the biz (organizers, authors, and players). There is a consensus among those I know well enough to have a positive opinion on their honesty and epistemology, but it's complex enough that it's not a very good topic for abstract rationality practice.

More standard prediction contests would seem strictly superior for testing and practice. Pick some Metaculus medium-term predictions, make individual bets, then discuss reasoning and make new bets. Practice crux-finding and input metrics you can use to resolve actual work disagreements.

[-]Max H3y10

I used to be quite an active and profitable trader on PredictIt. I've also looked into this incident a bit myself. I think the rationality skills needed to do well in prediction contests are important, but different from the kind needed to investigate a question like this, the Amanda Knox case, or the Sabatini incident.

[-]faul_sname3y20

I have opinions on the object-level here, but I concur that this is probably more of a test of "how familiar are you with what is and is not normal in a high-stakes cash game" and also "how familiar are you with the specific math" than of more general rationality.

[-]Alex Vermillion3y20

(I haven't read the post yet) The mention of the Knox posts made me think of this comment chain about the slowly-growing number of similar posts on LW: https://www.lesswrong.com/posts/YTJp5WBcktBimdxBG/staying-split-sabatini-and-social-justice?commentId=xctop8E3zpuCFjj4p

I don't know if it's worth adding in to your post anywhere, but here it is if you would like it.

[-]Sune3y11

Seems difficult to mark answers to this question.

The type of replies you get, and the skills you are testing, would also depend on how long the subject is spending on the test. Did you have a particular time limit in mind?

[-]Max H3y10

I think timeboxing it to 3 hours or so would be a good standard; maybe a bit more if you're totally unfamiliar with poker.

I don't think judging responses would be particularly difficult; even if we don't know what actually happened for certain, you can still judge whether someone used valid rules of inference to reach a plausible estimate. (Judging well requires rationality skills too, of course - rationalists should be more easily convinced of true propositions than false ones, and be able to distinguish invalid reasoning from valid reasoning.)

Also, I suspect that most strong rationalists would independently converge to the same probability estimate for approximately the same reasons, if they looked into the matter, which could serve as a baseline.

[-]Dagon3y22

Also, I suspect that most strong rationalists would independently converge to the same probability estimate for approximately the same reasons, if they looked into the matter, which could serve as a baseline.

ROFL. The very setup of the post (it's controversial and there's no consensus, even among professionals who've spent a lot more than a few hours looking into it) contradicts this. There's also a bunch of private information and priors (such as "what is the base rate of cheating in high-stakes poker" and "what side payments had been made among participants and crew") that are very hard to validate. Even if there were a reasonable base rate, there would still be the question of whether this KIND of cheating (alleged access to the hole-card camera feed) is comparable to other kinds (soft-playing or signaling a compatriot, acting out of turn for information, milder angle-shooting).

[-]Max H3y10

The fact that it is controversial among non-rationalists does not mean that it would be similarly controversial among (strong) rationalists.

This is probably not worth their time and too expensive to test, but concretely, I predict: if Duncan Sabien, gwern, Zvi (or other people in this general reference class, or people who did well on the Amanda Knox test for the right reasons, etc.) each spent some hours looking into this, they would independently reach mostly the same conclusions for mostly the same reasons.
