Eliezer Yudkowsky recently proposed an experiment on Facebook that could indicate whether humans can "have AI do their alignment homework" despite not being able to tell whether the AI is accurate: see whether people's chess play improves when they are given advice from three experts, two of whom are lying.
I'm interested in trying this! If anyone else is interested, leave a comment. Please tell me whether you're interested in being:
A) the person who hears the advice, and plays chess while trying to determine who is trustworthy
B) the person who they are playing against, who is normally better at chess than A but worse than the advisors
C) one of the three advisors, of which one is honestly trying to help and the other two are trying to sabotage A; which one is which will be chosen at random after the three have been selected to prevent A from knowing the truth
Feel free (you're encouraged, in fact) to list multiple roles that you're open to trying out! Who gets assigned to which role will depend on how many people respond and their levels of chess ability, and it's easier to find workable combinations when people are flexible about their roles.
Please also briefly describe your level of experience in chess. How frequently have you played, if at all? If you have Elo rating(s), what are they and which organizations are they from (FIDE, USCF, Chess.com, etc.)? No experience is required! In fact, people who are new to the game are actively preferred for A!
Finally, please tell me what days and times you tend to be available - I won't hold you to anything, of course, but it'll help give me an estimate before I contact you to set up a specific time.
Edit: also, please say how long you would be willing to play for - a couple hours, a week, a one-move-per-day game over the course of months? A multi-week or multi-month game would give the players a lot more time to think about the moves and more accurately simulate the real-life scenario, but I doubt everyone would be up for that.
Edit 2: GoteNoSente suggested using a computer at a fixed skill level for player B, which in retrospect is clearly a great idea.
Edit 3: there is now a Google Form for signing up: https://docs.google.com/forms/d/e/1FAIpQLScPKrSB6ytJcXlLhnxgvRv1V4vMx8DXWg1j9KYVfVT1ofdD-A/viewform?vc=0&c=0&w=1&flr=0
Very interested in C, also B. I'm an over-the-board FM. Available many evenings (US) but not all. I enjoy recreational deception (e.g. Mafia / Werewolf) but I'm much better at chess than detecting or deploying verbal trickery.
Additional thoughts:
1. Written chess commentary by 'weak' players tends to be true but not the most relevant. After 1.e4 Nf6 2.e5, a player might say "Black can play 2...Nc6 developing the N and attacking the pawn on e5". True, but this neglects 3.exf6. This scales upwards. My commentary tends to be very relevant but I miss things that even stronger players do not.
2. Players choose a weaker move over a stronger move not so much because they reject the stronger move, but because they don't see the stronger move as an option. When going over games with students, I'll stop at a position, offer three moves and ask which is best. They'll consider and choose and explain reasoning. But there's a fourth option, a mate-in-one, and it was not selected. "You must see the move before you can play the move."
3. Based on 2, a deception strategy is to recommend a weak move over others even weaker. Stronger options? Ignored.
Sounds like a good strategy! ...although, actually, I would recommend you delete it before all the potential As read it and know what to look out for.