This seems like a fun idea. I imagine there would be some high-level streamers willing to try this live (maybe chessbrah?).
What kind of lessons do you envision we learn from Deception Chess that could be applied towards alignment work? In my head, the situation is slightly different since we (or I) are currently assuming an AI tool isn't actively trying to deceive us, but in Deception Chess it's already known that there's a malicious actor.
It's a playground for testing ideas associated with Deception. Naturally there are other ways and other arenas. The rules for this arena are fun and flexible (perhaps no deceivers some of the time!), but still limited to discussing only the quality of particular chess moves in specific positions, judged against a hidden but soon-revealed 'perfect' answer.
As far as lessons go, I expect Player will have the most valuable post-game perspective. How easy is it to judge the quality of Advice? In what ways does Advice look different if it's Deceptive? Does it even look different? Given a reasonably strong Opponent, almost any human advice appears 'Deceptive' even with no such intent.
I Social Deduction Games
The gameplay in Social Deduction games like Mafia or Werewolf revolves around a large group of honest players ("Villagers") trying to uncover a smaller group of deceivers who are slowly and secretly eliminating everyone else, all while claiming to be honest themselves. Under most rulesets, roles are all revealed by the end of a game (15-30 min). Everyone learns who had played what role.
II Social Deduction Chess, or Deception Chess
A new variant of a game that already has more variants than any other. Refer also to Zane's earlier posts. I believe the idea is originally from Eliezer.
ROLES
The game is between Player and Opponent. Opponent is typically a specific engine, nerfed to an appropriate skill level. Engines play differently than humans, but the benefits are large: immediate replies and consistent-ish move quality. There is no Time Control, in that Player cannot lose on time. Player informs Advisors of the current position. After receiving Advice from Advisors through an Advice Channel, Player chooses a move to play and reports the engine's response. The Advice Channel is typically an asynchronous text thread (e.g. Discord), with a speed goal of one move per day.
Player is unable to defeat Opponent alone. However, defeating Opponent should be straightforward with high-quality advice. You may have already guessed the fun part of Deception Chess: while at least one Advisor is honest, one or more of them are trying to deceive Player into losing. The parallel to AI: how might one go about detecting deceit in areas where a potential deceiver is objectively far more skilled than yourself?
Like ordinary Social Deduction games, there are many rule variations that can affect winning chances dramatically, upwards or downwards. Advisor status (honest/deceitful) is not known by Player, but the number of each can be revealed at the beginning, or only at the end, or never (but boo! on never). Player may be limited to sharing chess positions, or may communicate freely in the Advice Channel, asking questions about the position or about the advice given. Advisors can communicate separately, or else they can see each other's advice and even react to it. The Advice Channel can be free-form, or else limited in many different ways (e.g., some max word count, only offer up to three numbered move options, limited analysis time). Ground-truth Roles can be revealed immediately after the game, or some time later, or never (but boo! on never).
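One way to keep track of which variant a given game is using is to write the knobs down as a config. A minimal sketch, assuming a Python dataclass; the field names, the `Reveal` enum, and the default values are all hypothetical and only mirror the options listed above:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Reveal(Enum):
    """When hidden information (role counts or actual roles) is disclosed."""
    AT_START = "at the start"
    AFTER_GAME = "immediately after the game"
    LATER = "some time later"
    NEVER = "never"  # boo!


@dataclass(frozen=True)
class DeceptionChessRules:
    """Hypothetical rule set for one Deception Chess game; defaults are arbitrary."""
    honest_advisors: int = 1
    deceptive_advisors: int = 2                   # 0 allowed: no deceivers some of the time
    role_counts_revealed: Reveal = Reveal.AT_START
    player_may_ask_questions: bool = True         # False: Player only posts positions
    advisors_see_each_other: bool = True          # Advisors can read and react to other Advice
    max_words: Optional[int] = None               # None means free-form Advice
    max_move_options: Optional[int] = 3           # e.g. up to three numbered moves
    analysis_minutes: Optional[int] = None        # None means no analysis time limit
    roles_revealed: Reveal = Reveal.AFTER_GAME

    def __post_init__(self) -> None:
        # Reject rule combinations that contradict the game's premise.
        if self.honest_advisors < 1:
            raise ValueError("at least one Advisor must be honest")
        if self.deceptive_advisors < 0:
            raise ValueError("deceptive_advisors cannot be negative")
        for name in ("max_words", "max_move_options", "analysis_minutes"):
            value = getattr(self, name)
            if value is not None and value < 1:
                raise ValueError(f"{name} must be positive or None")
```

For example, `DeceptionChessRules(deceptive_advisors=0, roles_revealed=Reveal.LATER)` describes a game with no deceivers at all whose roles are only revealed some time afterwards.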
III Chess Strength
Chess strength is average move selection quality over a very (very!) large set of game/game-like positions and game/game-like conditions. It's perhaps most accurately measured as Average Centipawn Loss (ACPL), a kind of distance from perfection. Modern computer engine evaluation (e.g., '+1.74') serves well enough as 'perfection' when judging human play. Only a perfect move maintains a position's evaluation. More than one perfect move is possible, especially in clearly drawn endgame positions, but any less-than-ideal move pushes the evaluation towards the opponent by a measurable amount. Average over thousands of moves and you get the ACPL.
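The ACPL calculation can be sketched in a few lines. This assumes the engine evals are already available in centipawns from the moving side's perspective, both before each move (the value a perfect move would keep) and after it. Clamping evals to ±1000 cp is an assumed convention, not part of the definition above; it stops one slip in an already-won position from dominating the average:

```python
from statistics import mean


def centipawn_losses(evals_before, evals_after, cap_cp=1000):
    """Per-move loss: how far each move pushed the eval towards the opponent.

    evals_before[i]: eval (centipawns, mover's perspective) before move i.
    evals_after[i]:  eval after move i, from the same side's perspective.
    """
    if len(evals_before) != len(evals_after):
        raise ValueError("need one before/after eval pair per move")

    def clamp(x):
        # Assumed cap so huge evals in decided positions don't swamp the mean.
        return max(-cap_cp, min(cap_cp, x))

    # A perfect move keeps the eval (loss 0); engine noise that makes the
    # eval "improve" is also counted as 0 rather than a negative loss.
    return [max(0, clamp(b) - clamp(a)) for b, a in zip(evals_before, evals_after)]


def acpl(evals_before, evals_after, cap_cp=1000):
    """Average Centipawn Loss over a sequence of moves."""
    return mean(centipawn_losses(evals_before, evals_after, cap_cp))
```

For three moves with evals 30 → 30, 50 → 20 and 20 → −40, the losses are 0, 30 and 60, so `acpl([30, 50, 20], [30, 20, -40])` is 30. A meaningful ACPL needs thousands of moves, not three.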
It's far easier and more common to use Elo rating instead of ACPL. Rough equivalents: Elo 1400 (FIDE) ~= 70-90 ACPL, Elo 2200 ~= 35-45 ACPL, Elo 2800 ~= 15-25 ACPL. A person's effective chess skill (Elo and ACPL) varies across position types (balanced/unbalanced, ahead/behind, simple/complicated, attacking/defending, etc.), and opponent strength (stronger/similar/weaker). It also reflects time controls (fast/slow), time management skill, and nerves + psychology, quite apart from opening/endgame knowledge, calculation ability, etc. There's a barrel metaphor: a barrel holds only as much as its shortest stave. Effective chess skill tends to track your weakest area(s).
The chess world is an unusual culture. The stratification of skill is robust and exceptionally visible: there are 14-15 classes between the weakest humans (very young children) and the world's best. A difference of one class (200 Elo) is clear superiority (~75% score, e.g. 3 wins 1 loss, or 2 wins 2 draws). Two classes is ~90%, three is ~96%, etc.
Of course, the preceding is enormously simplified, as demonstrated by the objections, clarifications, and elaborations that you, the reader, are currently thinking about mentioning.
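Those percentages are expected scores (a win counts 1, a draw 0.5), which is why 3 wins and 1 loss and 2 wins and 2 draws both come to 75%. A minimal sketch of where numbers like these come from, assuming the common logistic form of the Elo expected-score formula; FIDE's own lookup tables differ slightly:

```python
def expected_score(rating_diff: float) -> float:
    """Expected score for the higher-rated player, given a positive
    rating difference, using the logistic Elo curve (400-point scale)."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))


# One class is 200 Elo.
for classes in range(1, 4):
    diff = 200 * classes
    print(f"{classes} class(es), {diff} Elo: {expected_score(diff):.2f}")
```

This prints roughly 0.76, 0.91 and 0.97 for one, two and three classes, in line with the rough figures above.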
IV Preliminary Observations
A chess game is a chess strength comparison between Opponent and Player. Deception Chess is instead Opponent vs. the effective strength of the Advice Channel. The information compression inherent in putting Advice into a Channel reduces the maximum effective strength of any Advisor, honest or deceptive. Advisors will differ in their ability to argue for their own advice over competing advice. Players must interpret Advice, which further reduces effective strength, especially given that some of it may be deceptive. Some Players will be better at detecting deception than others. Deceivers will intentionally lower the strength of their Advice below that of Opponent, but not below that of Player, so that the deceit stays hidden.
Deception Chess is new. Few games have been played. I expect to see strategies used in ordinary Social Deduction games. One Advisor will volunteer as Honest, at which point all other Advisors say the same thing. (A full-up 'Role Reveal' is often discouraged in SD games because it's anti-fun.) Deceivers will be honest a lot of the time, perhaps most of the time, or perhaps all of the time and only lie by omission. Multiple deceivers will want to work together, often by pretending to be at odds with one another. A Player aware of these dynamics is better able to sense deceit, and therefore to increase the effective skill of the Advice Channel.
There will be chess-specific communication strategies for both types of Advisors. Many of these on the Deceit side will be (I predict) largely unprecedented in the history of chess literature. I recall one fascinating exception. Kramnik wrote an article analyzing a R+4P vs. R+3P ending, in which the winner gradually outplayed and wore down the opponent. (I can't find it; it may have been an English translation of a Russian article.) At the end of the analysis, he asked, 'Did you believe all of that? Because like other examples of superficial analysis it was unduly influenced by the result.' He then went on to analyze it much more accurately, showing the many mistakes on both sides as the position passed back and forth between Winning and Drawn. A phenomenal lesson.
V Engine Deception Chess
I foresee Deception Chess being played with only engine Advisors. This would eliminate many human variables, because advice can be identically formatted across Advisors (only the top move, or the top 2-3 moves, or the top line/lines). 'Deception' could be active in some way, or it could simply be an engine with a strength level below Opponent. Players could select approximate strength levels for Opponents and Advisors, etc.
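A minimal sketch of identically formatted engine advice, assuming each Advisor already has a list of candidate moves with centipawn evals from the mover's perspective. The `Candidate` type, the sample moves and evals, and the `max_loss_cp` threshold are all hypothetical. The deceptive variant here lies only by omission: it hides the best move and offers the next-best moves in the same format, keeping only those close enough in value to look plausible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    move: str      # move in SAN, e.g. "Nf3"
    eval_cp: int   # hypothetical engine eval in centipawns, mover's perspective


def format_advice(cands, n=3):
    """Shared output format: the first n moves with evals in pawns."""
    return ", ".join(f"{c.move} ({c.eval_cp / 100:+.2f})" for c in cands[:n])


def honest_advice(cands, n=3):
    # Best moves first.
    ranked = sorted(cands, key=lambda c: c.eval_cp, reverse=True)
    return format_advice(ranked, n)


def deceptive_advice(cands, n=3, max_loss_cp=150):
    # Hypothetical deceiver: omit the best move, then offer only moves that
    # lose at most max_loss_cp relative to it, so nothing looks like a blunder.
    ranked = sorted(cands, key=lambda c: c.eval_cp, reverse=True)
    best = ranked[0].eval_cp
    plausible = [c for c in ranked[1:] if best - c.eval_cp <= max_loss_cp]
    return format_advice(plausible, n)


position = [Candidate("Nf3", 35), Candidate("e4", 50),
            Candidate("h4", -120), Candidate("d4", 40)]
print(honest_advice(position))     # e4 (+0.50), d4 (+0.40), Nf3 (+0.35)
print(deceptive_advice(position))  # d4 (+0.40), Nf3 (+0.35)
```

Because both outputs share one format and all the quoted evals are real, Player can only spot the deception by noticing what is missing, which is exactly the kind of judgment the game is meant to exercise.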