Lying to chess players for alignment

Zane

97 Lying to chess players for alignment

25th Oct 2023

2 min read

97

Eliezer Yudkowsky recently posted on Facebook an experiment that could potentially indicate whether humans can "have AI do their alignment homework" despite not being able to trust whether the AI is accurate: see if people improve in their chess-playing abilities when given advice from experts, two out of three of which are lying.

I'm interested in trying this! If anyone else is interested, leave a comment. Please tell me whether you're interested in being:

A) the person who hears the advice, and plays chess while trying to determine who is trustworthy

B) the person who they are playing against, who is normally better at chess than A but worse than the advisors

C) one of the three advisors, of which one is honestly trying to help and the other two are trying to sabotage A; which one is which will be chosen at random after the three have been selected to prevent A from knowing the truth

Feel free, and in fact encouraged, to give multiple options that you're open to trying out! Who gets assigned to what role would depend on how many people respond and their levels of chess ability, and it's easier to find possible combinations with more flexibility in whose role is which.

Please also briefly describe your level of experience in chess. How frequently have you played, if at all; if you have ELO rating(s), what are they and which organizations are they from (FIDE, USCF, Chess.com, etc). No experience is required! In fact, people who are new to the game are actively preferred for A!

Finally, please tell me what days and times you tend to be available - I won't hold you to anything, of course, but it'll help give me an estimate before I contact you to set up a specific time.

Edit: also, please say how long you would be willing to play for - a couple hours, a week, a one-move-per-day game over the course of months? A multi-week or multi-month game would give the players a lot more time to think about the moves and more accurately simulate the real-life scenario, but I doubt everyone would be up for that.

Edit 2: GoteNoSente suggested using a computer at a fixed skill level for player B, which in retrospect is clearly a great idea.

Edit 3: there is now a Google Form for signing up: https://docs.google.com/forms/d/e/1FAIpQLScPKrSB6ytJcXlLhnxgvRv1V4vMx8DXWg1j9KYVfVT1ofdD-A/viewform?vc=0&c=0&w=1&flr=0

ChessAI

Frontpage

97

Deception Chess: Game #1

22 comments111 karma

Mentioned in

111Deception Chess: Game #1

26AI debate: test yourself against chess 'AIs'

13Suggestions for chess puzzles

New Answer

New Comment

18 Answers sorted by
top scoring

aphyer

Oct 25, 2023

I'm rated ~1700 on chess.com, though I suspect their ratings may be inflated relative to e.g. FIDE ones. Happy to play whatever role that rating fits best with. I work around NYC at a full-time job: I'm generally free in the evenings (perhaps 7pm-11pm NY time) and on weekends.

Two questions:

Do you anticipate using a time control for this? I suspect B will be heavily advantaged by short time controls that don't give A much time, while A will be heavily favored by having enough time to e.g. tell two advisors who disagree 'okay, C1 thinks that move is a blunder and C2 thinks it's great, you two start from this position and play the game out after C2 makes that move and we'll see if C1 easily wins'. I don't immediately have a good guess for what time control will be balanced.
Are C players allowed to use chess engines?

[-]robertzk2y20

I am also in NYC and happy to participate. My lichess rating is around 2200 rapid and 2300 blitz.

[-]Zane2y20

I think a time control of some sort would be helpful just so that it doesn't take a whole week, but I would prefer it to be a fairly long time control. Not long enough to play a whole new game, though, because that's not an option when it comes to alignment - in the analogy, that would be like actually letting loose the advisors' plans in another galaxy and seeing if the world gets destroyed.

I'm not sure exactly what the time control would be - maybe something like 4 hours on each side, if we're using standard chess time controls. I'm also thinking about u... (read more)

2Joe Collman2y

I think it'd make sense to give C at least as long as B. B doesn't need to do any explaining. I think giving A significantly longer than B is fine, so long as the players have enough time to stick around for that. I think it's a more interesting experiment if A has ample time to figure things out to the best of their ability. A failing because they weren't able to understand quickly seems less interesting. The best way to handle this seems to be to play A-vs-B and B-vs-C control games with something as close to the final setup as possible. So e.g. you could have B-vs-C games to check that C really is significantly better, but require C to write explanations for their move and why they didn't make a couple of other moves. Essentially C imagines they're playing the final setup, except their move is always picked. And you can do A-vs-B games where A has a significant advantage in time over B (though I think blitz games is still more efficient in gaining the most information in the given time). This way it doesn't matter much whether the setup is 'fair' to A/B/C, so long as it's unfair to a similar level in the control 1-v-1 games as in the advisor-based games.

2Joe Collman2y

That said, I don't expect the setup to be particularly sensitive to the control games or time controls. If you have something like: A: novice B: ~1700 C: ~2200 Then A is going to robustly lose to B and B to C. An extra couple of minutes either way isn't going to matter. (thinking for longer might get you 100 Elo, but nowhere close to 500) If this reliably holds - e.g. B beats A 9-0 in blitz games, and the same for C vs B, then it doesn't seem worth the time to do more careful controls. (or at least the primary reason to do more careful controls at that point would be a worry that the results wouldn't otherwise be taken seriously by some because you weren't doing Proper Science)

Linch

Oct 26, 2023*

I'm happy to play on any of the 4 roles, I haven't played non-blitz chess in quite a while (and never played it seriously) but I would guess I'm ~1300 on standard time controls on chess.com (interpolating between different time controls and assuming a similar decay as other games like Go).

I'm free after 9pm PDT most weekdays, and free between noon and 6pm or so on weekends.

Joe Collman

Oct 26, 2023

I'm happy to be B if it'd be useful - mainly because I expect that to require least time, and I do play chess to relax anyway. Pretty flexible on times/days. I don't think I'd have time for A/C. (unless the whole thing is quite quick - I'd be ok spending an afternoon or two, so long as it's not in the next two weeks; currently very busy)

I've not been rated recently. IIRC I was about 1900 in blitz on chess.com when playing for fun.
I'd guess that I could be ~1900 on longer controls if I spent quite a bit of effort on the games.
I'd prefer to participate with more of a ~1700 expectation, since I can do that quickly.

So long as I'm B, I'm fine with multi-week or multi-month 1-move-per-day games - but clearly the limiting factor is that this is much more demanding on A and C.

Some thoughts on the setup:
It'd make sense to have at least a few fastish games between B and C, so that it's pretty clear there is the expected skill disparity. Blitz games are likely to be the most efficient here - I'd suggest an increment of at least 5 seconds per move, to avoid the incentive to win on time. But ~3 minutes on the clock may be enough. (9 games of ~10 minutes each will tell you a lot more than 1 game of ~90minutes)

Similarly between A and B.

This should ideally be done at the end of the experiment too, in particular to guard against A being a very fast learner.
B improving a lot seems less likely (though possible, if they started out rusty).
I don't think Cs improving should be an issue.
But it's plausible that both the A-B and B-C gaps shrink during the experiment.

A control that's probably useful is to have A play some games against B with entirely honest advisors.
The point here being that it can impose some penalty to have three suggestions rather than one - e.g. if the advisors know different opening lines, A might pick an inconsistent combination: advisor 1 makes a suggestion that goes down a path advisor 2 doesn't know well; A picks advisor 1's move, then advisor 2's follow-up, resulting in an incoherent strategy.
I don't expect this would be have a large effect, but it seems sensible to do if there's time. (if time's a big constraint, it might not be worth it)

It's worth considering what norms make sense for the C role.
For instance, if C is giving explanations, does that extend to giving complex arguments against other plausible moves? Is C aiming to play fully to win given the constraints, or is there an in-the-spirit-of-things norm?

E.g. if C had a character limit on the advice they could give, the most efficient approach might be to give various lines in chess notation, without any explanation. Is this desirable?

Would it make sense to limit the move depth that C can talk about in concrete terms? E.g. to say that you can give a concrete line up to 6 plies, but beyond that point you can only talk in generalities (more space; pressure on dark squares; more active pieces; will win material...).

I expect that prototyping this will make sense - come up with something vaguely plausible, then just try it and adjust.

I'd be interested to give feedback on the setup you're planning, if that'd be useful.

[-]Zane2y30

I was thinking I would test the players to make sure they really could beat each other as they should be able to. Good points on using blitz and doing the test afterwards; the main constraint as to whether it happens before or after the game is that I would prefer to do it beforehand to know whether the rankings were accurate rather than playing for weeks and only later realizing we were doing the wrong test.

I wasn't thinking of much in the way of limits on what Cs could say, although possibly some limits on whether the Cs can see and argue against each ot... (read more)

5Joe Collman2y

Oh I didn't mean only to do it afterwards. I think before is definitely required to know the experiment is worth doing with a given setup/people. Afterwards is nice-to-have for Science. (even a few blitz games is better than nothing)

jessicata

Oct 25, 2023

5-1

Note that, if there's enough time, you can, each turn have the experts play full games against each other, and copy the next move distribution of whoever wins the most. The dishonest experts can only win by making good moves, so you get good moves either way. So the remaining question is how possible it is to reduce the time required.

One approach is to set up a prediction market where experts can bet on the value of a given position, and resolve bets by playing out the game. That way, dishonest experts lose currency faster by being dishonest. This still introduces variance in how long a given turn takes, though.

AI safety via debate could also inspire strategies.

[-]Zane2y1311

Neither of these would be allowed, because in the real world, you can't do a bunch of test "games" before or during the actual "game." There's no way to perform a proposed alignment plan in a faraway galaxy, check whether that galaxy is destroyed, and make decisions for what to do on Earth based on that data - let alone perform so many of those tests to inform a prediction market based on what they say.

I would have allowed player A to consult a prediction market made by other a bunch of other inexperienced players on who was really honest or lying. After all, in the real world, whoever was making the final decision on what plan to execute would be able to ask a prediction market what it thought. But the problem is that if I make a prediction market that's supposed to only be for other players around player A's level, somebody will just use a chess engine to cheat, bet in the market, and make it unrealistically accurate.

5jessicata2y

Chess is simulable, unlike the real world. If the player and advisors can use paper and pencil, or type in a chat, they can play chess. I think whether the advisors can use a chess engine is just part of the rules of the game, and you make the prediction market among those relevant advisors only.

[-]Zane2y158

Yes, if this were only about chess, then having the advisors play games with each other as A watched would help A learn who to trust. I'm saying that since the real-world scenario we're trying to model doesn't allow such a thing to happen, we artificially forbid this in the chess game to make it more like the real-world scenario. The prediction market thing, similarly, would require being able to do a test run so that the dishonest advisors could lose their money by the time A had to make a choice.

I don't think the advisors should be able to use chess engines, because then even the advisors themselves don't understand the reasoning behind what the chess engines are saying. The premise of the experiment involves the advisors telling A "this is my reasoning on what the best move is; try to evaluate if it's right."

5aphyer2y

I think it is very hard to artificially forbid that: there isn't a well-defined boundary between playing out a full game and a conversation like: "that other advisor says playing Rd4 is bad because of Nxd4, but after Nxd4 you can play Qd6 and win" "No, Qd6 doesn't win, playing Bf7 breaks up the attack." One thing that might work, though, is to deny back-and-forth between advisors. If each advisor can send one recommendation, and maybe one further response to a question from A, but not have a free-form conversation, that would deny the ability to play out a game.

4Zane2y

Yeah, that's a bit of an issue. I think in real life you would have some back-and-forth ability between advisors, but the complexity and unknowns of the real world would create a qualitative difference between the conversation and an actual game - which chess doesn't have. Maybe we can either limit back-and-forth like you suggested, or just have short enough time controls that there isn't enough time for that to get too far.

Nathan Helm-Burger

Nov 01, 2023

I'd be down to give it a shot as A. Particularly would be interested in trying the 'solve a predefined puzzle situation' as a way of testing the idea out.

I played a bit of chess in 6th grade, but wasn't very good, and have barely played since. It would be easy to find advisors for me.

SarahNibs

Oct 26, 2023

I would participate. Likely as A, but I'm fine with B if there are people worse-enough. I'm 1100 on chess.com, playing occasional 10 minute games for fun. Tend to be available Th/Fr/Sa/Su evenings Pacific, fine with very long durations.

RHollerith

Oct 25, 2023*

Based on my rating on the Free Internet Chess Server (FICS) in 2015, I estimate I would currently have a rating of about 1270 on Chess.com (on the assumption that the average player on FICS in 2015 is slightly better than the average today on Chess.com) which is regrettable because it is probably too high to make a good advisee, but probably too low to make a good advisor. Still, I am willing to participate.

(I still play, but these years I play as a guest, not as a registered user, which means I don't have a rating.)

I would have thought that giving the players 24 hours to make each move would approximate scientific research better than giving 4 hours for all the moves (or 40 moves like they tend to do in competition).

[-]Zane2y10

24 hours per move would make the experiment a lot more accurate, but I expect a lot of players might not be willing to play a game that could last several months. I'll ask everyone how long they can handle.

3RHollerith2y

If the chess players (and advisors) in this experiment were receiving approximately the same monetary compensation as scientific researchers receive, *then* giving the players 24 hours to make each move would approximate scientific research better than giving 4 hours for all the moves, but if the experiment lasts for months, it is unrealistic to expect *volunteers* to expend about the same level of mental effort on this experiment as they would expend on a salaried research job. Some volunteers might in fact expend that amount of effort at this due to their being very young and not yet having any model of the scarcity and the physiological costs of extended mental efforts, but that would be a bad thing because it would introduce variation into the experiment along a dimension other than the dimensions we want to measure. So, I take back the final paragraph of my previous comment, and I note that in the future, I should spend more time "playing out" things in my imagination before making a suggestion.

Dirichlet-to-Neumann

Oct 27, 2023

I'd be happy to play any of the A, B and C roles.

I'm a around 1850 elo FIDE, about 2000-2100 on lichess. I play a couple of blitz games daily.

I'd be willing to play at almost any cadence and have a lot of free time. I actually live in France, so a one-move-per-day game with someone living in the US would probably be ideal. Live sessions can be programmed from 16 GMT to 23 GMT on weekdays, and from 7 GMT to 23 GMT on weekends.

As I said I would be happy to play any role. I think it would be more interesting if the lower player is actually not a total beginner - total beginners are probably not hard to deceive. A decent club player with advisors about 300-500 elo higher would be best imo. And if we can experiment at many different elo levels, even better.

Registering a prediction: assuming the elo difference stay constant, better players will be much more difficult to deceive. And a GM would consistently pick up who is lying if you could rope up Caruana, Carlsen and Ding to do the experiment.

Alex A

Oct 27, 2023

I’m about 1000 ELO on chess.com and would be interested in playing as A. I play regularly, but haven’t had formal training or studied seriously. I’d be free weekdays after 7 pm ET.

Chris Land

Oct 26, 2023

Very interested in C, also B. I'm an over-the-board FM. Available many evenings (US) but not all. I enjoy recreational deception (e.g. Mafia / Werewolf) but I'm much better at chess than detecting or deploying verbal trickery.

Additional thoughts:

Written chess commentary by 'weak' players tends to be true but not the most relevant. After 1.e4 Nf6 2.e5, a player might say "Black can play 2...Nc6 developing the N and attacking the pawn on e5". True, but this neglects 3.exf6. This scales upwards. My commentary tends to be very relevant but I miss things that even stronger players do not.
Players choose a weaker move over a stronger move not so much because they reject the stronger move, but because they don't see the stronger move as an option. When going over games with students, I'll stop at a position, offer three moves and ask which is best. They'll consider and choose and explain reasoning. But there's a fourth option, a mate-in-one, and it was not selected. "You must see the move before you can play the move."
Based on 2, a deception strategy is to recommend a weak move over others even weaker. Stronger options? Ignored.

[-]Zane2y10

Sounds like a good strategy! ...although, actually, I would recommend you delete it before all the potential As read it and know what to look out for.

GoteNoSente

Oct 26, 2023

I could be interested in trying this, in any configuration. Preferred time control would be one move per day. My lichess rating is about 2200.

Are the advisors allowed computer assistance, do the dishonest and the honest advisor know who is who in this experiment, and are the advisors allowed to coordinate? I think those parameters would make a large difference potentially in outcome for this type of experiment.

[-]Zane2y10

No computers, because the advisors should be reporting their own reasoning (or, 2/3 of the time, a lie that they claim is their own reasoning.) I would prefer to avoid explicit coordination between the advisors, because the AIs might not have access to each other in the real world, but I'm not sure at the moment whether player A can show the advisors each other's suggestions and ask for critiques. I would prefer not to give either dishonest advisor information on who the other two were, since the real-world AIs probably can't read each other's source code.

5GoteNoSente2y

As an additional thought regarding computers, it seems to me that participant B could be replaced by a weak computer in order to provide a consistent experimental setting. For instance, Leela Zero running just the current T2 network (no look-ahead) would provide an opponent that is probably at master-level strength and should easily be able to crush most human opponents who are playing unassisted, but would provide a perfectly reproducible and beatable opponent.

1Zane2y

[facepalms] Thanks! That idea did not occur to me and drastically simplifies all of the complicated logistics I was previously having trouble with.

1GoteNoSente2y

I think having access to computer analysis would allow the advisors (both honest and malicious) to provide analysis far better than their normal level of play, and allow the malicious advisors in particular to set very deep traps. The honest advisor, on the other hand, could use the computer analysis to find convincing refutations of any traps the dishonest advisors are likely to set, so I am not sure whether the task of the malicious side becomes harder or easier in that setup. I don't think reporting reasoning is much of a problem here, as a centaur (a chess player consulting an engine) can most certainly give reasons for their moves (even though sometimes they won't understand their own advice and be wrong about why their suggested move is good). It does make the setup more akin to working with a superintelligence than working with an AGI, though, as the gulf between engine analysis and the analysis that most/all humans can do unassisted is vast.

1Zane2y

The problem is that while the human can give some rationalizations as to "ah, this is probably why the computer says it's the best move," it's not the original reasoning that generated those moves as the best option, because that took place inside the engine. Some of the time, looking ahead with computer analysis is enough to reproduce the original reasoning - particularly when it comes to tactics - but sometimes they would just have to guess.

anaguma

Oct 25, 2023

I’m rated about 2100 USCF and 2300 Lichess, and I’m open to any of the roles. I’m free on the weekend and weekdays after 3 pm pacific. I’m happy to play any time control including multi-month correspondence.

WitheringWeights

Oct 25, 2023

Hi!

I'm rated between 1500 and 1700 on lichess, I'd be happy to take part in the game in whatever role.

Olli Järviniemi

Oct 25, 2023

Open for any of the roles A, B, C. I should have a flexible schedule at my waking hours (around GMT+0). Willing to play for even long times, say a month (though in that case I'd be thinking about "hmm, could we get more quantity in addition to quality"). ELO probably around 1800.

antanaclasis

Oct 25, 2023

I would be interested in this, probably in role A (but depending on the pool of other players possibly one of the other roles; I have no opposition to any of them). I play chess casually with friends, and am probably at somewhere around 1300 elo (based on my winrate against one friend who plays online).

Carl Feynman

Oct 25, 2023

I am happy to be A. I haven't played chess since my teenage years, wherein my record was one of occasional games with friends and relatives, leading to almost unrelieved defeat. But that was four decades ago, and I like to imagine I've become pretty good at judging arguments. So if I competed, it would be on a basis of almost total chess ignorance, but ability to follow complex chains of logic.

Lost Futures

Oct 25, 2023

Interested in any of the roles. I haven't played chess competitively in close to a decade and my USCF elo was in the 1500s at the time of stopping. So long as I'm given a heads up in advance, I'm free almost all day on Wednesdays, Fridays, and Sundays.

Zane

Oct 25, 2023

I can be any of A, B, or C. I've been playing chess for the past ten years, and my USCF rating was in the upper 1500s when I last played in-person a year ago. I'm usually available from 9PM-UCT to 2AM-UCT (afternoon to evenings in American time) every day, and on Saturdays from 5PM-UCT to 2AM-UCT.

17 comments, sorted by

top scoring

Click to highlight new comments since: Today at 10:37 AM

[-]Richard Willis2y74

I would be interested in this. A few years ago I failed to convince my favourite chess YouTubers to engage in something similar. My preference for the roles is A>C>B and I am 2100 on chess.com, 2300 lichess. I'm fairly addicted to chess, so willing to spend many hours on this.

Some musing for the format... I had proposed that instead of a game, the 'human' is shown positions that have been selected to be very complicated, but with there being one ambiguously good move. The good move should not be entirely tactical in nature, because this is easy to verify, but rather strategic. I have a book with such positions, but you can find examples online.

The reason for this is that you would otherwise need to be careful about the format. There are some positions that I believe I understand very well and even a top player would really struggle to deceive me in. However, there are also positions in which I have not the faintest clue what is going on. The latter are the more interesting ones to test. If the 'deceptive AIs' are forced to lie in a position I understand well, I could then discount them for the rest of the experiment. Even with something like randomising their identifiers at each move, grammatical tells might be present. Therefore, playing out a game, the 'deceptive AIs' would need to be truthful on many on the moves and only lie in a handful, which is additional complexity.

[-]Zane2y10

Individual positions like that could be an interesting thing to test; I'll likely have some people try out some of those too.

I think the aspect where the deceivers have to tell the truth in many cases to avoid getting caught could make it more realistic, as in the real AI situation the best strategy might be to present a mostly coherent plan with a few fatal flaws.

[-]Richard Willis2y40

I agree that knowing when to lie is part of the challenge a deceptive AI will face. However, I would argue that a coherent plan is needed for every move suggestion. In a game of chess, there are typically only a few critical positions, and it is these where a deceptive AI ought to strike. This is similar to the cheating discussions in chess - a top player would only need a hint in a few positions to greatly benefit - the other 90% of moves they can make without assistance.

But by focusing on challenging positions, it could be a more efficient use of the participant's time. Otherwise, for a whole game you may only have had 3 moves where a deceptive AI actually lied.

[-]Nicholas / Heather Kross2y30

My chess.com ELO is astoundingly low. I hereby volunteer for role (A), and/or any other role in an experiment setup where you think "Dang, I want some entity that makes moves that aren't literally random, but are also nigh-guaranteed to lose."

[-]Sune2y20

Why select a deterministic game with complete information for this? I suspect games like poker or backgammon would be easier for the adversarial advisors to fool the player and that these games are a better model of the real world scenario.

[-]aphyer2y20

For an entertainingly thematic choice, I'd recommend Twilight Struggle.

[-]Joe Collman2y20

I'm not sure about poker, but I think for backgammon it'd be harder to get three levels where C beats B beats A reliably. I'm not a backgammon expert, but I could win games against experts - it's enough to be competent and lucky. A may also learn too fast - becoming competent is much faster for backgammon than for chess. (needing a larger sample size due to randomness makes A learning more of a problem - this may apply with poker too??)

I have a lot more experience and skill at chess, but it's still pretty simple to find players who'll beat me 90% of the time.

[-]Joern Stoehler2y50

See Table 2 in https://www.emilkirkegaard.com/p/skill-vs-luck-in-games for

[...] the corresponding winning probability of a player who is exactly one standard deviation better than his opponent. We refer to this probability as p^sd . For comparison, we also provide the winning probablities when a 99% percentile player is matched against a 1% percentile player, which we call p99 1 .

Go & Chess (p^sd=83.3,72.9) are notably above Backgammon (p^sd=53.6%)

[-]Joe Collman2y20

Oh that's cool - nice that someone's run the numbers on this.
I'm actually surprised quite how close-to-50% both backgammon and poker are.

[-]Zane2y10

Agreed that it could be a bit more realistic that way, but the main constraint here is that we need a game where there are three distinct levels of players who always beat each other. The element of luck in games like poker and backgammon makes that harder to guarantee (as suggested by the stats Joern_Stoller brought up). And another issue is that it'll be harder to find a lot of skilled players at different levels from any game that isn't as popular as chess is - even if we find an obscure game that would in theory be a better fit for the experiment, we won't be able to find any Cs for it.

[-]Hide2y20

I would happily play the role of B.

I do not have an established FIDE rating, but my strength is approximately 1850 FIDE currently (based on playing against FIDE rated players OTB quite often, as well as maintaining 2100-2200 blitz ratings on Lichess & Chess.com, and 2200-2300 bullet). I'd be available after 6:30 pm (UTC+10) until ~12:00 pm (UTC+10). Alternatively, weekends are very flexible. I could do a few hours per week.

I agree short/long time controls are a relevant, because speed is a skill that is almost entirely independent of conceptual knowledge and is mostly a function of baseline playing ability.

Edit: Would also be fine with C

[-]AdamYedidia2y20

I'd be excited to play as any of the roles. I'm around 1700 on lichess. Happy with any time control, including correspondence. I'm generally free between 5pm and 11pm ET every day.

[-]javva2092y20

I'm rated ~1600 on Lichess and would participate in whichever role that rating fits best with.

I have some questions, such as which time controls are used but am most interested in how you plan on having the "C" group give advice to the "A" group. Would they just give notation (for instance Re1) or would they type or speak a little bit of context alongside such as "Centralize your pieces" or "complete your development by activating your rook"?

I live in the Bay Area and am generally available on weekdays anytime after 5PST.

My only concern is that while very inexperienced players may be able to determine who is giving good advice in the early game and for general improving moves, the much better players who will be advising will be much more concrete with their advice. By concrete I mean moves that are tactically justified whose reasonings would be utterly anathema to even a casual player making it much more difficult for them to determine who is lying. This problem would be exacerbated the less context the advisors are able to give for each recommendation.

[-]Zane2y10

Unsure about the time controls at the moment; see my response to aphyer. The advisors would be able to give the A player justification for the move they've recommended.

The concern that A might not be able to understand the reasoning that the advisors give them is a valid one, and that's the whole point of the experiment! If A can't follow the reasoning well enough to determine whether it's good advice, then (says the analogy) people who are asking AIs how to solve alignment can't follow their reasoning well enough to determine whether it's good advice.

[-]NoriMori19922y10

Why does B have to be better at chess than A but worse than C? Eliezer's post only specifies that B has to be weaker than C; unless I missed something, it doesn't say they have to be stronger than A.

[-]Zane2y10

If B were the same level as A, then they wouldn't pose any challenge to A; A would be able to beat them on their own without listening to the advice of the Cs.

[-]Zane2y10

I've created a Manifold market if anyone wants to bet on what happens. If you're playing in the experiment, you are not allowed to make any bets/trades while you have private information (that is, while you are in a game, or if I haven't yet reported the details of a game you were in to the public.)

https://manifold.markets/Zane_3219/will-chess-players-win-most-of-thei

Moderation Log

LESSWRONG
LW

97

[ Question ]

Lying to chess players for alignment

97

97

18 Answers sorted by
top scoring

Oct 25, 2023

Oct 26, 2023*

Oct 26, 2023

Oct 25, 2023

Nov 01, 2023

Oct 26, 2023

Oct 25, 2023*

Oct 27, 2023

Oct 27, 2023

Oct 26, 2023

Oct 26, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

97

[ Question ]

Lying to chess players for alignment

97

97

18 Answers sorted by top scoring

Oct 25, 2023

Oct 26, 2023*

Oct 26, 2023

Oct 25, 2023

Nov 01, 2023

Oct 26, 2023

Oct 25, 2023*

Oct 27, 2023

Oct 27, 2023

Oct 26, 2023

Oct 26, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

Oct 25, 2023

18 Answers sorted by
top scoring