Eliezer Yudkowsky recently proposed an experiment on Facebook that could indicate whether humans can "have AI do their alignment homework" even when they can't tell whether the AI is being accurate: see whether people play chess better when given advice from three experts, two of whom are lying.
I'm interested in trying this! If anyone else is interested, leave a comment. Please tell me whether you're interested in being:
A) the person who hears the advice, and plays chess while trying to determine who is trustworthy
B) the person who they are playing against, who is normally better at chess than A but worse than the advisors
C) one of the three advisors, of which one is honestly trying to help and the other two are trying to sabotage A; which one is which will be chosen at random after the three have been selected to prevent A from knowing the truth
Feel free (in fact, you're encouraged) to list multiple roles that you're open to trying out! Who gets assigned to which role will depend on how many people respond and on their levels of chess ability, and it's easier to find workable combinations when people are flexible about which role they take.
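For anyone curious how the random advisor assignment in C might work in practice, here is a minimal sketch. The function name, the role labels, and the optional seed are my own illustrative assumptions, not a fixed part of the experiment; it just picks one of the three advisors uniformly at random to be the honest one and marks the other two as dishonest.

```python
import random


def assign_advisor_roles(advisors, seed=None):
    """Randomly pick one honest advisor; the other two are dishonest.

    `advisors` is a list of three distinct names (hypothetical input).
    `seed` is optional and only there to make a draw reproducible.
    """
    if len(advisors) != 3 or len(set(advisors)) != 3:
        raise ValueError("expected exactly three distinct advisors")
    rng = random.Random(seed)
    honest = rng.choice(advisors)  # uniform choice among the three
    return {
        name: ("honest" if name == honest else "dishonest")
        for name in advisors
    }


if __name__ == "__main__":
    # The organizer would run this privately, after the three advisors
    # are selected, and never show the result to player A.
    print(assign_advisor_roles(["advisor_1", "advisor_2", "advisor_3"]))
```

The draw happens only after the three advisors are fixed, so A has no basis for guessing who is honest from how people signed up.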
Please also briefly describe your level of experience in chess. How frequently have you played, if at all? If you have Elo rating(s), what are they, and which organizations are they from (FIDE, USCF, Chess.com, etc.)? No experience is required! In fact, people who are new to the game are actively preferred for A!
Finally, please tell me what days and times you tend to be available - I won't hold you to anything, of course, but it'll help give me an estimate before I contact you to set up a specific time.
Edit: also, please say how long you would be willing to play for - a couple hours, a week, a one-move-per-day game over the course of months? A multi-week or multi-month game would give the players a lot more time to think about the moves and more accurately simulate the real-life scenario, but I doubt everyone would be up for that.
Edit 2: GoteNoSente suggested using a computer at a fixed skill level for player B, which in retrospect is clearly a great idea.
Edit 3: there is now a Google Form for signing up: https://docs.google.com/forms/d/e/1FAIpQLScPKrSB6ytJcXlLhnxgvRv1V4vMx8DXWg1j9KYVfVT1ofdD-A/viewform?vc=0&c=0&w=1&flr=0
I could be interested in trying this, in any configuration. Preferred time control would be one move per day. My lichess rating is about 2200.
Are the advisors allowed computer assistance, do the dishonest and the honest advisors know who is who in this experiment, and are the advisors allowed to coordinate? I think those parameters could make a large difference to the outcome of this type of experiment.
The problem is that while the human can give some rationalizations as to "ah, this is probably why the computer says it's the best move," it's not the original reasoning that generated those moves as the best option, because that took place inside the engine. Some of the time, looking ahead with computer analysis is enough to reproduce the original reasoning - particularly when it comes to tactics - but sometimes they would just have to guess.