I'm rated ~1700 on chess.com, though I suspect their ratings may be inflated relative to e.g. FIDE ones. Happy to play whatever role that rating fits best with. I work around NYC at a full-time job: I'm generally free in the evenings (perhaps 7pm-11pm NY time) and on weekends.
Two questions:
Do you anticipate using a time control for this? I suspect B will be heavily advantaged by short time controls that don't give A much time, while A will be heavily favored by having enough time to e.g. tell two advisors who disagree 'okay, C1 thinks that move is a blunder and C2 thinks it's great, you two start from this position and play the game out after C2 makes that move and we'll see if C1 easily wins'. I don't immediately have a good guess for what time control will be balanced.
Are C players allowed to use chess engines?
I am also in NYC and happy to participate. My lichess rating is around 2200 rapid and 2300 blitz.
I think a time control of some sort would be helpful just so that it doesn't take a whole week, but I would prefer it to be a fairly long time control. Not long enough to play a whole new game, though, because that's not an option when it comes to alignment - in the analogy, that would be like actually letting loose the advisors' plans in another galaxy and seeing if the world gets destroyed.
I'm not sure exactly what the time control would be - maybe something like 4 hours on each side, if we're using standard chess time controls. I'm also thinking about u...
I'm happy to play any of the 4 roles. I haven't played non-blitz chess in quite a while (and never played it seriously), but I would guess I'm ~1300 on standard time controls on chess.com (interpolating between different time controls and assuming a similar decay as in other games like Go).
I'm free after 9pm PDT most weekdays, and free between noon and 6pm or so on weekends.
I'm happy to be B if it'd be useful - mainly because I expect that to require the least time, and I do play chess to relax anyway. Pretty flexible on times/days. I don't think I'd have time for A/C (unless the whole thing is quite quick: I'd be OK spending an afternoon or two, so long as it's not in the next two weeks; I'm currently very busy).
I've not been rated recently. IIRC I was about 1900 in blitz on chess.com when playing for fun.
I'd guess that I could be ~1900 on longer controls if I spent quite a bit of effort on the games.
I'd prefer to participate with more of a ~1700 expectation, since I can do that quickly.
So long as I'm B, I'm fine with multi-week or multi-month 1-move-per-day games - but clearly the limiting factor is that this is much more demanding on A and C.
Some thoughts on the setup:
It'd make sense to have at least a few fastish games between B and C, so that it's pretty clear there is the expected skill disparity. Blitz games are likely to be the most efficient here - I'd suggest an increment of at least 5 seconds per move, to avoid the incentive to win on time. But ~3 minutes on the clock may be enough. (9 games of ~10 minutes each will tell you a lot more than 1 game of ~90 minutes.)
Similarly between A and B.
This should ideally be done at the end of the experiment too, in particular to guard against A being a very fast learner.
B improving a lot seems less likely (though possible, if they started out rusty).
I don't think Cs improving should be an issue.
But it's plausible that both the A-B and B-C gaps shrink during the experiment.
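To put rough numbers on the "9 short games vs 1 long game" point: a minimal sketch, assuming games are independent, draws are ignored, and the question is how surprising a lopsided score would be if the two players were actually evenly matched (per-game win probability 0.5). The function name `tail_prob` is just illustrative.

```python
from math import comb


def tail_prob(wins: int, games: int, p: float = 0.5) -> float:
    """Probability of at least `wins` wins in `games` independent games,
    where each game is won with probability p (draws ignored)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# If B and C were really evenly matched, how often would C score 8+ out of 9?
print(round(tail_prob(8, 9), 4))  # prints 0.0195
# A single decisive game is much weaker evidence: equal players produce it half the time.
print(tail_prob(1, 1))  # prints 0.5
```

Under those assumptions, an 8-1 or 9-0 blitz result would happen by chance only about 2% of the time between equal players. Real results include draws and some game-to-game correlation, so treat this as a rough guide rather than a formal test.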
A control that's probably useful is to have A play some games against B with entirely honest advisors.
The point here being that it can impose some penalty to have three suggestions rather than one - e.g. if the advisors know different opening lines, A might pick an inconsistent combination: advisor 1 makes a suggestion that goes down a path advisor 2 doesn't know well; A picks advisor 1's move, then advisor 2's follow-up, resulting in an incoherent strategy.
I don't expect this would have a large effect, but it seems sensible to do if there's time. (If time's a big constraint, it might not be worth it.)
It's worth considering what norms make sense for the C role.
For instance, if C is giving explanations, does that extend to giving complex arguments against other plausible moves? Is C aiming to play fully to win given the constraints, or is there an in-the-spirit-of-things norm?
E.g. if C had a character limit on the advice they could give, the most efficient approach might be to give various lines in chess notation, without any explanation. Is this desirable?
Would it make sense to limit the move depth that C can talk about in concrete terms? E.g. to say that you can give a concrete line up to 6 plies, but beyond that point you can only talk in generalities (more space; pressure on dark squares; more active pieces; will win material...).
I expect that prototyping this will make sense - come up with something vaguely plausible, then just try it and adjust.
I'd be interested to give feedback on the setup you're planning, if that'd be useful.
I was thinking I would test the players to make sure they really could beat each other as they should be able to. Good points on using blitz and doing the test afterwards; the main constraint as to whether it happens before or after the game is that I would prefer to do it beforehand to know whether the rankings were accurate rather than playing for weeks and only later realizing we were doing the wrong test.
I wasn't thinking of much in the way of limits on what Cs could say, although possibly some limits on whether the Cs can see and argue against each ot...
Note that, if there's enough time, you can, each turn, have the experts play full games against each other and copy the next-move distribution of whoever wins the most. The dishonest experts can only win by making good moves, so you get good moves either way. So the remaining question is how much it's possible to reduce the time required.
One approach is to set up a prediction market where experts can bet on the value of a given position, and resolve bets by playing out the game. That way, dishonest experts lose currency faster by being dishonest. This still introduces variance in how long a given turn takes, though.
AI safety via debate could also inspire strategies.
Neither of these would be allowed, because in the real world, you can't do a bunch of test "games" before or during the actual "game." There's no way to perform a proposed alignment plan in a faraway galaxy, check whether that galaxy is destroyed, and make decisions for what to do on Earth based on that data - let alone perform so many of those tests to inform a prediction market based on what they say.
I would have allowed player A to consult a prediction market made up of a bunch of other inexperienced players on who was really honest or lying. After all, in the real world, whoever was making the final decision on what plan to execute would be able to ask a prediction market what it thought. But the problem is that if I make a prediction market that's supposed to only be for other players around player A's level, somebody will just use a chess engine to cheat, bet in the market, and make it unrealistically accurate.
Yes, if this were only about chess, then having the advisors play games with each other as A watched would help A learn who to trust. I'm saying that since the real-world scenario we're trying to model doesn't allow such a thing to happen, we artificially forbid this in the chess game to make it more like the real-world scenario. The prediction market thing, similarly, would require being able to do a test run so that the dishonest advisors could lose their money by the time A had to make a choice.
I don't think the advisors should be able to use chess engines, because then even the advisors themselves don't understand the reasoning behind what the chess engines are saying. The premise of the experiment involves the advisors telling A "this is my reasoning on what the best move is; try to evaluate if it's right."
Based on my rating on the Free Internet Chess Server (FICS) in 2015, I estimate I would currently have a rating of about 1270 on Chess.com (on the assumption that the average player on FICS in 2015 is slightly better than the average today on Chess.com) which is regrettable because it is probably too high to make a good advisee, but probably too low to make a good advisor. Still, I am willing to participate.
(I still play, but these years I play as a guest, not as a registered user, which means I don't have a rating.)
I would have thought that giving the players 24 hours to make each move would approximate scientific research better than giving 4 hours for all the moves (or 40 moves like they tend to do in competition).
24 hours per move would make the experiment a lot more accurate, but I expect a lot of players might not be willing to play a game that could last several months. I'll ask everyone how long they can handle.
I'd be happy to play any of the A, B and C roles.
I'm around 1850 Elo FIDE, about 2000-2100 on lichess. I play a couple of blitz games daily.
I'd be willing to play at almost any cadence and have a lot of free time. I actually live in France, so a one-move-per-day game with someone living in the US would probably be ideal. Live sessions can be scheduled from 16:00 to 23:00 GMT on weekdays, and from 07:00 to 23:00 GMT on weekends.
As I said, I would be happy to play any role. I think it would be more interesting if the lower player is actually not a total beginner - total beginners are probably not hard to deceive. A decent club player with advisors about 300-500 Elo higher would be best, imo. And if we can experiment at many different Elo levels, even better.
Registering a prediction: assuming the Elo difference stays constant, better players will be much more difficult to deceive. And a GM would consistently pick up on who is lying if you could rope in Caruana, Carlsen and Ding to do the experiment.
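For a sense of what a 300-500 point gap means game by game, here is a minimal sketch of the classic Elo expected-score formula, E = 1 / (1 + 10^(-D/400)), where D is the rating difference. Expected score counts a draw as half a point, so it is not quite a win probability, and chess.com and lichess use Glicko-family systems on their own scales, so this is only a rough guide. The function name is illustrative.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score per game (win = 1, draw = 0.5, loss = 0) for the player
    who is `rating_diff` points higher, under the classic logistic Elo model."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))


# Expected score of the stronger player at a few rating gaps.
for diff in (0, 100, 300, 500):
    print(diff, round(elo_expected_score(diff), 3))
```

Under this model a 300-point gap gives the stronger player an expected score of roughly 0.85 per game, and a 500-point gap roughly 0.95.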
I’m about 1000 ELO on chess.com and would be interested in playing as A. I play regularly, but haven’t had formal training or studied seriously. I’d be free weekdays after 7 pm ET.
Very interested in C, also B. I'm an over-the-board FM. Available many evenings (US) but not all. I enjoy recreational deception (e.g. Mafia / Werewolf) but I'm much better at chess than detecting or deploying verbal trickery.
Additional thoughts:
Written chess commentary by 'weak' players tends to be true but not the most relevant. After 1.e4 Nf6 2.e5, a player might say "Black can play 2...Nc6 developing the N and attacking the pawn on e5". True, but this neglects 3.exf6. This scales upwards. My commentary tends to be very relevant but I miss things that even stronger players do not.
Players choose a weaker move over a stronger move not so much because they reject the stronger move, but because they don't see the stronger move as an option. When going over games with students, I'll stop at a position, offer three moves and ask which is best. They'll consider and choose and explain reasoning. But there's a fourth option, a mate-in-one, and it was not selected. "You must see the move before you can play the move."
Based on the previous point, a deception strategy is to recommend a weak move over others that are even weaker. Stronger options? Ignored.
Sounds like a good strategy! ...although, actually, I would recommend you delete it before all the potential As read it and know what to look out for.
I could be interested in trying this, in any configuration. Preferred time control would be one move per day. My lichess rating is about 2200.
Are the advisors allowed computer assistance, do the dishonest and the honest advisors know who is who in this experiment, and are the advisors allowed to coordinate? I think those parameters could make a large difference in the outcome of this type of experiment.
No computers, because the advisors should be reporting their own reasoning (or, 2/3 of the time, a lie that they claim is their own reasoning.) I would prefer to avoid explicit coordination between the advisors, because the AIs might not have access to each other in the real world, but I'm not sure at the moment whether player A can show the advisors each other's suggestions and ask for critiques. I would prefer not to give either dishonest advisor information on who the other two were, since the real-world AIs probably can't read each other's source code.
I would be interested in this, probably in role A (but depending on the pool of other players possibly one of the other roles; I have no opposition to any of them). I play chess casually with friends, and am probably at somewhere around 1300 elo (based on my winrate against one friend who plays online).
I am happy to be A. I haven't played chess since my teenage years, wherein my record was one of occasional games with friends and relatives, leading to almost unrelieved defeat. But that was four decades ago, and I like to imagine I've become pretty good at judging arguments. So if I competed, it would be on a basis of almost total chess ignorance, but ability to follow complex chains of logic.
I can be any of A, B, or C. I've been playing chess for the past ten years, and my USCF rating was in the upper 1500s when I last played in person a year ago. I'm usually available from 9PM UTC to 2AM UTC (afternoon to evening, US time) every day, and on Saturdays from 5PM UTC to 2AM UTC.
I would be interested in this. A few years ago I failed to convince my favourite chess YouTubers to engage in something similar. My preference for the roles is A>C>B and I am 2100 on chess.com, 2300 lichess. I'm fairly addicted to chess, so willing to spend many hours on this.
Some musing on the format... I had proposed that instead of a game, the 'human' is shown positions that have been selected to be very complicated, but with one unambiguously good move. The good move should not be entirely tactical in nature, because that is easy to verify, but rather strategic. I have a book with such positions, but you can find examples online.
The reason for this is that you would otherwise need to be careful about the format. There are some positions that I believe I understand very well, and even a top player would really struggle to deceive me in them. However, there are also positions in which I have not the faintest clue what is going on. The latter are the more interesting ones to test. If the 'deceptive AIs' are forced to lie in a position I understand well, I could then discount them for the rest of the experiment. Even with something like randomising their identifiers at each move, grammatical tells might be present. Therefore, playing out a game, the 'deceptive AIs' would need to be truthful on many of the moves and only lie on a handful, which is additional complexity.
Individual positions like that could be an interesting thing to test; I'll likely have some people try out some of those too.
I think the aspect where the deceivers have to tell the truth in many cases to avoid getting caught could make it more realistic, as in the real AI situation the best strategy might be to present a mostly coherent plan with a few fatal flaws.
I agree that knowing when to lie is part of the challenge a deceptive AI will face. However, I would argue that a coherent plan is needed for every move suggestion. In a game of chess, there are typically only a few critical positions, and it is these where a deceptive AI ought to strike. This is similar to the cheating discussions in chess - a top player would only need a hint in a few positions to greatly benefit - the other 90% of moves they can make without assistance.
But focusing on challenging positions could be a more efficient use of the participants' time. Otherwise, over a whole game there may only be 3 moves where a deceptive AI actually lied.
My chess.com ELO is astoundingly low. I hereby volunteer for role (A), and/or any other role in an experiment setup where you think "Dang, I want some entity that makes moves that aren't literally random, but are also nigh-guaranteed to lose."
Why select a deterministic game with complete information for this? I suspect games like poker or backgammon would be easier for the adversarial advisors to fool the player and that these games are a better model of the real world scenario.
I'm not sure about poker, but I think for backgammon it'd be harder to get three levels where C beats B beats A reliably. I'm not a backgammon expert, but I could win games against experts - it's enough to be competent and lucky. A may also learn too fast - becoming competent is much faster for backgammon than for chess. (needing a larger sample size due to randomness makes A learning more of a problem - this may apply with poker too??)
I have a lot more experience and skill at chess, but it's still pretty simple to find players who'll beat me 90% of the time.
See Table 2 in https://www.emilkirkegaard.com/p/skill-vs-luck-in-games for
[...] the corresponding winning probability of a player who is exactly one standard deviation better than his opponent. We refer to this probability as p^sd. For comparison, we also provide the winning probabilities when a 99% percentile player is matched against a 1% percentile player, which we call p^99_1.
Go & Chess (p^sd = 83.3%, 72.9%) are notably above Backgammon (p^sd = 53.6%)
Oh that's cool - nice that someone's run the numbers on this.
I'm actually surprised quite how close-to-50% both backgammon and poker are.
Agreed that it could be a bit more realistic that way, but the main constraint here is that we need a game where there are three distinct levels of players who always beat each other. The element of luck in games like poker and backgammon makes that harder to guarantee (as suggested by the stats Joern_Stoller brought up). And another issue is that it'll be harder to find a lot of skilled players at different levels from any game that isn't as popular as chess is - even if we find an obscure game that would in theory be a better fit for the experiment, we won't be able to find any Cs for it.
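To illustrate why luck makes "always beats" harder to guarantee, here is a minimal sketch that treats the p^sd figures quoted above as per-game win probabilities (chess 0.729, backgammon 0.536), ignores draws, and computes how often the stronger player wins a majority of an odd-length match. That is a simplification: a one-standard-deviation gap means something different in each game's player pool. The function name is illustrative.

```python
from math import comb


def majority_win_prob(p: float, games: int) -> float:
    """Probability that a player who wins each game independently with
    probability p wins a strict majority of `games` games.
    Draws are ignored; use an odd number of games."""
    need = games // 2 + 1
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(need, games + 1)
    )


# One-standard-deviation skill gap, using the per-game figures quoted above.
for game, p in [("chess", 0.729), ("backgammon", 0.536)]:
    print(game, [round(majority_win_prob(p, n), 3) for n in (1, 9, 25)])
```

With these inputs, the stronger chess player already wins a 9-game match more than 90% of the time, while the stronger backgammon player still loses roughly a third of 25-game matches.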
I'd be excited to play as any of the roles. I'm around 1700 on lichess. Happy with any time control, including correspondence. I'm generally free between 5pm and 11pm ET every day.
I'm rated ~1600 on Lichess and would participate in whichever role that rating fits best with.
I have some questions, such as which time controls will be used, but I'm most interested in how you plan on having the "C" group give advice to the "A" group. Would they just give notation (for instance Re1), or would they type or speak a little bit of context alongside it, such as "Centralize your pieces" or "complete your development by activating your rook"?
I live in the Bay Area and am generally available on weekdays anytime after 5pm PST.
My only concern is that while very inexperienced players may be able to determine who is giving good advice in the early game and for general improving moves, the much better players who will be advising will be much more concrete with their advice. By concrete I mean moves that are tactically justified, whose reasoning would be utterly opaque to even a casual player, making it much more difficult for them to determine who is lying. This problem would be exacerbated the less context the advisors are able to give for each recommendation.
Unsure about the time controls at the moment; see my response to aphyer. The advisors would be able to give the A player justification for the move they've recommended.
The concern that A might not be able to understand the reasoning that the advisors give them is a valid one, and that's the whole point of the experiment! If A can't follow the reasoning well enough to determine whether it's good advice, then (says the analogy) people who are asking AIs how to solve alignment can't follow their reasoning well enough to determine whether it's good advice.
Why does B have to be better at chess than A but worse than C? Eliezer's post only specifies that B has to be weaker than C; unless I missed something, it doesn't say they have to be stronger than A.
If B were the same level as A, then they wouldn't pose any challenge to A; A would be able to beat them on their own without listening to the advice of the Cs.
I've created a Manifold market if anyone wants to bet on what happens. If you're playing in the experiment, you are not allowed to make any bets/trades while you have private information (that is, while you are in a game, or if I haven't yet reported the details of a game you were in to the public.)
I would happily play the role of B.
I do not have an established FIDE rating, but my strength is approximately 1850 FIDE currently (based on playing against FIDE-rated players OTB quite often, as well as maintaining 2100-2200 blitz ratings on Lichess & Chess.com, and 2200-2300 bullet). I'd be available from 6:30 pm until ~12:00 am (UTC+10). Alternatively, weekends are very flexible. I could do a few hours per week.
I agree short/long time controls are relevant, because speed is a skill that is almost entirely independent of conceptual knowledge and is mostly a function of baseline playing ability.
Edit: Would also be fine with C
Eliezer Yudkowsky recently posted on Facebook an experiment that could potentially indicate whether humans can "have AI do their alignment homework" despite not being able to trust whether the AI is accurate: see if people improve in their chess-playing abilities when given advice from experts, two out of three of which are lying.
I'm interested in trying this! If anyone else is interested, leave a comment. Please tell me whether you're interested in being:
A) the person who hears the advice, and plays chess while trying to determine who is trustworthy
B) the person who they are playing against, who is normally better at chess than A but worse than the advisors
C) one of the three advisors, of which one is honestly trying to help and the other two are trying to sabotage A; which one is which will be chosen at random after the three have been selected to prevent A from knowing the truth
Feel free, and in fact encouraged, to give multiple options that you're open to trying out! Who gets assigned to what role would depend on how many people respond and their levels of chess ability, and it's easier to find possible combinations with more flexibility in whose role is which.
Please also briefly describe your level of experience in chess. How frequently have you played, if at all; if you have ELO rating(s), what are they and which organizations are they from (FIDE, USCF, Chess.com, etc). No experience is required! In fact, people who are new to the game are actively preferred for A!
Finally, please tell me what days and times you tend to be available - I won't hold you to anything, of course, but it'll help give me an estimate before I contact you to set up a specific time.
Edit: also, please say how long you would be willing to play for - a couple hours, a week, a one-move-per-day game over the course of months? A multi-week or multi-month game would give the players a lot more time to think about the moves and more accurately simulate the real-life scenario, but I doubt everyone would be up for that.
Edit 2: GoteNoSente suggested using a computer at a fixed skill level for player B, which in retrospect is clearly a great idea.
Edit 3: there is now a Google Form for signing up: https://docs.google.com/forms/d/e/1FAIpQLScPKrSB6ytJcXlLhnxgvRv1V4vMx8DXWg1j9KYVfVT1ofdD-A/viewform?vc=0&c=0&w=1&flr=0