This is a linkpost for https://cullenokeefe.com/blog/debate-evidence
Planned summary for the Alignment Newsletter:
<@Debate@>(@Writeup: Progress on AI Safety via Debate@) requires us to provide a structure for a debate as well as rules for how the human judge should decide who wins. This post points out that we have an existing system that has been heavily optimized for this already: evidence law, which governs how court cases are run. A court case is high-stakes and involves two sides presenting opposing opinions; evidence law tells us how to structure these arguments and how to limit the kinds of arguments debaters can use. Evidence is generally admissible by default, but there are many exceptions, often based on the fallibility of fact-finders.
As a result, it may be fruitful to look to evidence law for how we might structure debates, and to see what types of arguments we should be looking for.
Planned opinion:
This seems eminently sensible to me. Of course, evidence law is going to be specialized to arguments about innocence or guilt of a crime, and may not generalize to what we would like to do with debate, but it still seems like we should be able to learn some generalizable lessons.
In this post, I highlight some parallels between AI Safety by Debate (“Debate”) and evidence law.
Evidence law structures high-stakes arguments with human judges.
The prima facie reason that Evidence law (“Evidence”) is relevant to Debate is that Evidence, like Debate, is one of the few areas where debates have high stakes, potentially including severe criminal penalties or millions of dollars in liability. Other high-stakes debates could include parliamentary or electoral debates, but these are less substantively limited (i.e., there are fewer restraints on what debaters can do) and less aimed at seeking truth (and more aimed at political theater).
In court proceedings, questions of law are decided by the judge, while questions of fact are decided by the finder of fact (usually the jury, but sometimes a judge). The finder of fact weighs the persuasiveness of factual arguments (e.g., whether the defendant shot the victim, and whether he intended to do so). In all cases, as in Debate, the final arbiter of factual disputes is a human.
Evidence law limits the types of arguments available to debaters.
The goal of the Federal Rules of Evidence is “ascertaining the truth and securing a just determination.”[1] Therefore, generally, “relevant evidence is admissible unless [otherwise provided].”[2] A piece of evidence is relevant if “(a) it has any tendency to make a fact more or less probable than it would be without the evidence; and (b) the fact is of consequence in determining the action.”[3]
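The two-part relevance test quoted above can be read probabilistically. The following is a minimal sketch under that reading, which is an assumption of this example rather than something the Rules themselves prescribe: the function name `is_relevant`, its parameters, and the numeric probabilities are all hypothetical illustrations.

```python
import math


def is_relevant(p_without: float, p_with: float, of_consequence: bool) -> bool:
    """Two-part relevance test under a probabilistic reading (an assumption).

    p_without: probability of the fact before seeing the evidence
    p_with: probability of the fact after seeing the evidence
    of_consequence: whether the fact matters to deciding the action
    """
    # Part (a): "any tendency" means even a small shift in probability counts.
    changes_probability = not math.isclose(p_without, p_with, abs_tol=1e-9)
    # Part (b): the fact itself must matter to the outcome.
    return changes_probability and of_consequence


# Hypothetical examples with made-up probabilities.
weak_but_relevant = is_relevant(0.50, 0.51, of_consequence=True)
no_effect = is_relevant(0.50, 0.50, of_consequence=True)
immaterial = is_relevant(0.20, 0.90, of_consequence=False)
```

Note how low the bar is in this sketch: a shift from 0.50 to 0.51 already counts as relevant, while a large shift about a fact that does not matter to the action does not.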
However, the bulk of Evidence law is dedicated to exceptions to this presumption of admissibility. The precision of these exceptions varies significantly. Some are less precise (“standards,” in legal jargon) such as Rule 403: “The court may exclude relevant evidence if its probative value is substantially outweighed by a danger of one or more of the following: unfair prejudice, confusing the issues, misleading the jury, undue delay, wasting time, or needlessly presenting cumulative evidence.”[4] Others are more specific (“rules”).
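As Rule 403 exemplifies, many of the exceptions to the general admissibility of relevant evidence are based on the fallibility of fact-finders. Evidence that is relevant but likely to be on-balance detrimental to truth-seeking is therefore excluded. Other examples of rules of this form include:
- The general prohibition on using evidence of a person's character to prove that the person acted in accordance with that character on a particular occasion.[5]
- The general prohibition on hearsay.[6]
- Limits on using a witness's prior criminal convictions to attack the witness's character for truthfulness.[7]
- The prohibition on using a witness's religious beliefs or opinions to attack or support the witness's credibility.[8]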
Relevance to Debate
Types of Arguments to Watch For
The rules of Evidence have evolved over long experience with high-stakes debates, so their substantive findings on the types of arguments that prove problematic for truth-seeking are relevant to Debate.
Opportunities for Structuring Debate
The rules of Evidence could also be used to structure Debate: e.g., by training AI debaters not to make certain types of arguments, or by having a mediator screen out any arguments that would violate the rules, so that the ultimate judge never sees them.
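The mediator option could look something like the sketch below. It is only an illustration under stated assumptions: the `Argument` type, the `labels` field, `exclude_label`, and `screen` are hypothetical names, and the sketch assumes each argument has already been labeled (by a human or a separate classifier) with the kind of argument it is. Real Evidence rules carry many exceptions that simple labels like these would not capture.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Argument:
    debater: str
    text: str
    # Hypothetical labels assumed to come from some upstream classifier.
    labels: frozenset = field(default_factory=frozenset)


# A rule returns True when the argument should be kept from the judge.
Rule = Callable[[Argument], bool]


def exclude_label(label: str) -> Rule:
    """Build a rule that excludes any argument carrying the given label."""
    def rule(arg: Argument) -> bool:
        return label in arg.labels
    return rule


def screen(arguments: List[Argument], rules: List[Rule]) -> List[Argument]:
    """Mediator step: admit only arguments that no rule excludes."""
    return [a for a in arguments if not any(r(a) for r in rules)]


# Hypothetical rule set loosely inspired by character and hearsay exclusions.
rules = [exclude_label("character"), exclude_label("hearsay")]

transcript = [
    Argument("A", "The logs show the file was modified at 09:14.",
             frozenset({"direct"})),
    Argument("B", "My opponent has been wrong before, so it is wrong now.",
             frozenset({"character"})),
    Argument("B", "An unnamed source says the logs were altered.",
             frozenset({"hearsay"})),
]

# Only the admitted arguments would be shown to the human judge.
admitted = screen(transcript, rules)
```

In this sketch the judge would see only debater A's first argument. Admissibility is the default and rules act as exclusions, mirroring the presumption of admissibility described earlier in the post.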
[1] Fed. R. Evid. 102.
[2] Fed. R. Evid. 402.
[3] Fed. R. Evid. 401.
[4] Fed. R. Evid. 403.
[5] Fed. R. Evid. 404.
[6] Fed. R. Evid. 801–02.
[7] Fed. R. Evid. 609.
[8] Fed. R. Evid. 610.