Oh that's interesting. Wouldn't that slightly bias the results? For example, the paper claims no advantage of Debate over QA without article. Intuitively, if the weak LLM isn't pretrained on QA without article, then Debate should work better than Consultancy. On the other hand, if it is, then intuitively there should be no difference between Debate and Consultancy, which is what the team observes. Wdyt?
Ah that makes sense, thank you.
Did the team also ensure that there wasn't any data leakage between the tasks being evaluated and the pretraining data? For context, I'm thinking of replicating the results with Llama, so I'm wondering about the same thing.
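In case it helps anyone else replicating this, here's a minimal sketch of the kind of contamination check I have in mind: flagging evaluation questions that share long word n-grams with the pretraining corpus. The function names, inputs, and the 13-gram threshold are my own assumptions, not anything from the paper.

```python
# Minimal sketch of an n-gram overlap contamination check.
# Assumptions (not from the paper): eval_questions and pretraining_docs are
# iterables of plain text, and any shared 13-gram is treated as a leak signal.

def ngrams(text: str, n: int = 13) -> set:
    """Lowercased word n-grams of a document."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def flag_contaminated(eval_questions, pretraining_docs, n: int = 13):
    """Return the eval questions sharing any n-gram with the pretraining corpus."""
    corpus_ngrams = set()
    for doc in pretraining_docs:
        corpus_ngrams |= ngrams(doc, n)
    return [q for q in eval_questions if ngrams(q, n) & corpus_ngrams]
```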
My apologies, I didn't frame my question correctly.
"Our current work is looking into training our LLM judges to be better proxies of human judges"
My understanding from this statement is that the team plans to finetune Weak LLMs on human judgments and then use them as judges for Strong LLM Debates. This makes sense right now, when human judges are able to assess Strong LLM Debates fairly robustly.
What happens when we want to use a Weak LLM as a judge but no sufficiently accurate human judge exists? At that point we can't finetune the Weak LLM against human judgments. Do we assume that by that stage the Weak LLM itself will be robust enough?
"Our current work is looking into training our LLM judges to be better proxies of human judges"
How does this scale to superintelligent AI capabilities? Wouldn't Debate be severely restricted by a lack of accurate human judges at that point? Or is the idea akin to Weak-to-Strong generalisation, wherein the human judge acts as a weak teacher judge?
That makes sense.
Do you suppose a suitable proxy for prompt quality could be to replicate these experiments with LLM debaters/judges of different sizes? Say P is the optimal prompt and Q is a suboptimal one; then we'd expect: performance of the LLM with prompt Q <= performance of the LLM with prompt P <= performance of a bigger LLM with prompt Q.
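To make the check I'm imagining concrete, here's a rough sketch. Everything in it (`evaluate_judge`, the model and prompt arguments) is a hypothetical placeholder for whatever harness scores judge accuracy, not anything from the paper.

```python
# Hypothetical sketch: test whether judge accuracy respects the expected ordering
#   acc(small model, prompt Q) <= acc(small model, prompt P) <= acc(big model, prompt Q)
# `evaluate_judge` is a placeholder for a harness returning accuracy on an eval set;
# the models and prompts are illustrative only.

def check_prompt_quality_ordering(evaluate_judge, small_model, big_model,
                                  prompt_p, prompt_q, eval_set):
    acc_small_q = evaluate_judge(small_model, prompt_q, eval_set)
    acc_small_p = evaluate_judge(small_model, prompt_p, eval_set)
    acc_big_q = evaluate_judge(big_model, prompt_q, eval_set)
    return acc_small_q <= acc_small_p <= acc_big_q
```

Returning True on a given benchmark would be consistent with the hypothesis, though of course not proof of it.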