User Comment Replies

Open consultancy: Letting untrusted AIs choose what answer to argue for

Hey, this is cool work. I think I disagree with a couple of these comments, coming from a Scalable Oversight background rather than an AI control one, and on what weak/strong gaps we intend to look over.

My main contention is with the framing that you can talk about Consultants producing arguments with convincingness scores $c_i$ and $c_c$ which are independent of the protocol and judge that you are using. I try to break this down into two claims. I make my arguments backwards, but in particular, I contest the claims:

1) “Open consultancy i... (read more)

2Fabien Roger1y

I agree that open consultancy will likely work less well than debate in practice for very smart AIs, especially if they don't benefit much from CoT, in large part because interactions between arguments are often important. But I think it's not clear when and for what tasks this starts to matter, and this is an empirical question (no need to argue about what is "grounded"). I'm also not convinced that calibration of judges is an issue, or that getting a nice probability matters as opposed to getting an accurate 0-1 answer. Maybe the extent to which open consultancy dominates regular consultancy is overstated in the post, but I still think that you should be able to identify the kinds of questions for which you have no signal, and avoid the weird distributions of convincingness / non-expert calibrations where the noise from regular consultancy is better than actual random noise on top of open consultancy.

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan1y10

Hey Tao,

We agree this is a major limitation, and discuss this within the Discussion and Appendix sections.

We tried using base GPT-4; unfortunately, as it has no helpfulness training, it finds it exceptionally hard to follow instructions. We'd love access to helpful-only models, but currently no scaling labs offer this.

It's on the list.

Debating with More Persuasive LLMs Leads to More Truthful Answers

Akbir Khan1y*63

I have to disagree; BoN is a really good approximation of what happens under RL-finetuning (which is the natural learning method for multi-turn debate).
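For readers unfamiliar with the term, here is a minimal sketch of best-of-N (BoN) sampling. The names `best_of_n`, `sample`, and `score` are hypothetical, and `score` is only a stand-in for whatever judge or preference model rates an argument; it is not the setup from the paper.

```python
from typing import Callable, List


def best_of_n(sample: Callable[[], str], score: Callable[[str], float], n: int) -> str:
    """Draw n candidate arguments and keep the one the scorer rates highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [sample() for _ in range(n)]
    return max(candidates, key=score)


# Toy usage: the "debater" emits three canned arguments and the
# "scorer" is just string length (an assumption for illustration only).
pool = iter(["short", "a longer argument", "mid length"])
best = best_of_n(lambda: next(pool), score=len, n=3)
```

Raising `n` puts more selection pressure on the debater's outputs without changing its weights.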

I do worry "persuasiveness" is the incorrect word, but it seems to be a reasonable interpretation when comparing debaters A and B. E.g., for a given question and set of answers, if A wins independent of the answer assignment (i.e., no matter which answer it has to defend), it is more persuasive than B.
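The comparison above can be sketched as a check over both answer assignments. This is an illustrative sketch only: `wins_under_all_assignments` and `judge_a_wins` are hypothetical names, and `judge_a_wins(a_answer, b_answer)` is assumed to return whether the judge sided with A when A defended `a_answer` and B defended `b_answer`.

```python
from typing import Callable, Tuple


def wins_under_all_assignments(
    judge_a_wins: Callable[[str, str], bool],
    answers: Tuple[str, str],
) -> bool:
    """True if debater A wins whichever of the two answers it is assigned."""
    first, second = answers
    # Assignment 1: A defends `first`; assignment 2: the answers are swapped.
    return judge_a_wins(first, second) and judge_a_wins(second, first)


# A judge that always sides with A: A counts as more persuasive than B.
always_a = wins_under_all_assignments(lambda a, b: True, ("x", "y"))

# A judge that sides with whoever defends "x": neither debater dominates.
tracks_answer = wins_under_all_assignments(lambda a, b: a == "x", ("x", "y"))
```

In the second case the outcome depends on the answer rather than on the debater, so on this definition A is not more persuasive than B.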

Debate update: Obfuscated arguments problem

Akbir Khan2yΩ020

Hey, this is super exciting work. I'm a huge fan of the clarification of the protocol and the introduction of cross-examination!

Will you be able to open-source the dataset at any point? In particular, the questions, human arguments and then counter-claims. It would be very useful for further work.

Why multi-agent safety is important

Akbir Khan3y20

Hey, thank you for the comments! (Sorry for the slow response; I'll try to reply inline.)

1) So I think input sourcing could be a great solution! However, one issue we have, especially with current systems (and in particular Independent Reinforcement Learning), is that it's really, really difficult to disentangle other agents from the environment. As a premise, imagine watching a law of nature and not being able to work out if this is a learned behaviour or some omniscient being. Agents need not come conveniently packaged in some "sensors-actuators-internal structur... (read more)


All of Akbir Khan's Comments + Replies