Hey Tao,
We agree this is a major limitation, and we discuss it in the Discussion and Appendix sections.
We tried using base GPT-4, but unfortunately, because it has no helpfulness training, it finds it exceptionally hard to follow instructions. We'd love access to helpful-only models, but currently no scaling labs offer them.
It's on the list.
I have to disagree; Best-of-N (BoN) sampling is a really good approximation of what happens under RL fine-tuning (which is the natural learning method for multi-turn debate).
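For readers less familiar with BoN, here is a minimal sketch of the selection step. It assumes a hypothetical `generate` function that samples one candidate argument and a hypothetical `score` function (for example, a judge's preference score) used to rank candidates. It only illustrates how BoN picks an output; it does not demonstrate the relationship to RL fine-tuning.

```python
from typing import Callable, List


def best_of_n(generate: Callable[[], str], score: Callable[[str], float], n: int) -> str:
    """Sample n candidates and return the one with the highest score.

    `generate` and `score` are hypothetical stand-ins for a debater's
    sampler and a judge/preference model.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate() for _ in range(n)]
    # Keep the highest-scoring candidate; no model weights are updated.
    return max(candidates, key=score)


# Toy usage: a deterministic "sampler" and argument length as the score.
samples = iter(["short", "a much longer argument", "medium one"])
print(best_of_n(lambda: next(samples), len, 3))
```

Unlike RL fine-tuning, BoN optimizes against the scorer purely at inference time, which makes it cheap to run at several values of `n`.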
I do worry that "persuasiveness" is the wrong word, but it seems a reasonable interpretation when comparing debaters A and B. For a given question and set of answers, if A wins regardless of the answer assignment (i.e., no matter which answer it has to defend), it is more persuasive than B.
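One way to make this comparison concrete is to run every question under both answer assignments and count A's wins. The sketch below assumes a hypothetical `judge(question, answer_for_A, answer_for_B)` that returns `True` when A wins; the names and the two summary statistics are illustrative, not the exact metric used in the work.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge: (question, answer A defends, answer B defends) -> True if A wins.
Judge = Callable[[str, str, str], bool]


def persuasiveness_stats(
    items: Sequence[Tuple[str, str, str]], judge: Judge
) -> Tuple[float, float]:
    """Return (A's win rate over both assignments,
    fraction of questions A wins under both assignments)."""
    if not items:
        raise ValueError("need at least one question")
    wins = 0
    wins_both = 0
    for question, ans1, ans2 in items:
        a_wins_first = judge(question, ans1, ans2)    # A defends ans1, B defends ans2
        a_wins_swapped = judge(question, ans2, ans1)  # swap which answer each defends
        wins += int(a_wins_first) + int(a_wins_swapped)
        # A "wins regardless of assignment" only if it wins both runs.
        wins_both += int(a_wins_first and a_wins_swapped)
    return wins / (2 * len(items)), wins_both / len(items)


# Toy judge that always sides with whoever defends "x": A wins only half the time.
print(persuasiveness_stats([("q", "x", "y")], lambda q, a, b: a == "x"))
```

Swapping assignments cancels out any advantage that comes from the answer itself, so what remains reflects the debater rather than the side it was given.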
Hey, this is super exciting work! I'm a huge fan of the clarification of the protocol and the introduction of cross-examination!
Will you be able to open-source the dataset at any point? In particular, the questions, the human arguments, and the counter-claims. It would be very useful for further work.
Hey, thank you for the comments! (Sorry for the slow response; I'll try to reply inline.)
1) I think input sourcing could be a great solution! However, one issue, especially with current systems (and in particular Independent Reinforcement Learning), is that it's really difficult to disentangle other agents from the environment. As an illustration, imagine observing a law of nature and not being able to work out whether it is a learned behaviour or the work of some omniscient being. Agents need not come conveniently packaged in some "sensors-actuators-internal structur...
Hey, this is cool work. Coming from a Scalable Oversight background rather than an AI control one, I think I disagree with a couple of these comments, particularly regarding which weak/strong gaps we intend to look at.
My main contention is with the framing that you can talk about Consultants producing arguments with convincingness scores $c_i$ and $c_c$ that are independent of the protocol and judge you are using. I break this down into two claims. I make my arguments in reverse order, but in particular, I contest the following claims:
1) “Open consultancy i...