How to be skeptical
The Center For Applied Rationality (CFAR) checklist is a heuristic for assessing the admissibility of one's own testimony.
What of the challenge of evaluating the testimony of others?
Slapping the label of a bias on a situation? Arguing at the object level by providing evidence to the contrary? This risks a Gish gallop. For those who prefer to pick their battles, I commissioned this post of my own time as a structural intervention into the information ecosystem.
We need not reinvent the wheel: legal theorists have researched this issue for years, and practitioners and courts have identified heuristics useful to lay people interested in this field.
Precedent
The Daubert standard provides a rule of evidence regarding the admissibility of expert witnesses' testimony during United States federal legal proceedings. Pursuant to this standard, a party may raise a Daubert motion, which is a special case of motion in limine raised before or during trial to exclude the presentation of unqualified evidence to the jury. The Daubert trilogy refers to the three United States Supreme Court cases that articulated the Daubert standard: Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993), General Electric Co. v. Joiner (1997), and Kumho Tire Co. v. Carmichael (1999).
— https://en.wikipedia.org/wiki/Daubert_standard
Further reading on the case is available on Google Scholar.
Practice
How can this be applied in practice?
What is the first principle of skepticism? It's effectively synonymous: question.
What question? This isn't the 5 W's of primary school, after all.
I have summarized critical questions here to get the ball rolling:
Issues to consider when contesting and evaluating expert opinion evidence
A. Relevance (on the voir dire)
I accept that you are highly qualified and have extensive experience, but how do we know that your level of performance regarding . . . [the task at hand — eg, voice comparison] is actually better than that of a lay person (or the jury)?
What independent evidence... [such as published studies of your technique and its accuracy] can you direct us to that would allow us to answer this question?
What independent evidence confirms that your technique works?
Do you participate in a blind proficiency testing program?
Given that you undertake blind proficiency exercises, are these exercises also given to lay persons to determine if there are significant differences in results, such that your asserted expertise can be supported?
B. Validation
Do you accept that techniques should be validated?
Can you direct us to specific studies that have validated the technique that you used?
What precisely did these studies assess (and is the technique being used in the same way in this case)?
Have you ever had your ability formally tested in conditions where the correct answer was known? (ie, not a previous investigation or trial)
Might different analysts using your technique produce different answers?
Has there been any variation in the result on any of the validation or proficiency tests you know of or participated in?
Can you direct us to the written standard or protocol used in your analysis?
Was it followed?
C. Limitations and errors
Could you explain the limitations of this technique?
Can you tell us about the error rate or potential sources of error associated with this technique?
Can you point to specific studies that provide an error rate or an estimation of an error rate for your technique?
How did you select what to examine?
Were there any differences observed when making your comparison . . . [eg, between two fingerprints], but which you ultimately discounted? On what basis were these discounted?
Could there be differences between the samples that you are unable to observe?
Might someone using the same technique come to a different conclusion?
Might someone using a different technique come to a different conclusion?
Did any of your colleagues disagree with you?
Did any express concerns about the quality of the sample, the results, or your interpretation?
Would some analysts be unwilling to analyse this sample (or produce such a confident opinion)?
...
D. Personal proficiency
...
Have you ever had your own ability... [doing the specific task/using the technique] tested in conditions where the correct answer was known?
If not, how can we be confident that you are proficient?
If so, can you provide independent empirical evidence of your performance?
E. Expressions of opinion
...
Can you explain how you selected the terminology used to express your opinion? Is it based on a scale or some calculation?
If so, how was the expression selected?
Would others analyzing the same material produce similar conclusions, and a similar strength of opinion? How do you know?
Is the use of this terminology derived from validation studies?
Did you report all of your results?
You would accept that forensic science results should generally be expressed in non-absolute terms?
More
For further reading, I recommend the seminal 1903 text on cross-examination, The Art of Cross-Examination by Francis L. Wellman.
The full text is available for free on Project Gutenberg.
Other countries use different standards, such as the Opinion Rule in Australia.
The failure of counter-arguments argument
Suppose you read a convincing-seeming argument by Karl Marx, and get swept up in the beauty of the rhetoric and clarity of the exposition. Or maybe a creationist argument carries you away with its elegance and power. Or maybe you've read Eliezer's take on AI risk, and, again, it seems pretty convincing.
How could you know if these arguments are sound? OK, you could whack the creationist argument with the scientific method, and Karl Marx with the verdict of history, but what would you do if neither were available (as they currently aren't for the AI risk argument)? Even if you're pretty smart, there's no guarantee that you haven't missed a subtle logical flaw or a dubious premise or two, or haven't got caught up in the rhetoric.
One thing should make you believe the argument more strongly: if it has been repeatedly criticised, and the criticisms have failed to puncture it. Unless you have the time to become an expert yourself, this is the best way to evaluate arguments where evidence isn't available or conclusive. After all, opposing experts presumably know the subject intimately, and are motivated to identify and illuminate the argument's weaknesses.
If counter-arguments seem incisive, pointing out serious flaws, or if the main argument is continually being patched to defend it against criticisms, then this is strong evidence that the main argument is flawed. Conversely, if the counter-arguments continually fail, then this is good evidence that the main argument is sound. Not logical evidence (a failure to find a disproof doesn't establish a proposition) but good Bayesian evidence.
In fact, the failure of counter-arguments is much stronger evidence than whatever is in the argument itself. If you can't find a flaw, that just means you can't find a flaw. If counter-arguments fail, that means many smart and knowledgeable people have thought deeply about the argument - and haven't found a flaw.
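To make this concrete, here is a minimal sketch of how repeated failed critiques compound as Bayesian evidence, in odds form. The specific likelihood values (0.9 and 0.4) are illustrative assumptions of mine, not measured quantities from the post.

```python
# Sketch: repeated failed critiques as Bayesian evidence, in odds form.
# The probabilities below are illustrative assumptions, not measurements.

def update_odds(prior_odds, likelihood_ratio):
    """One Bayesian update in odds form: posterior odds = prior odds * LR."""
    return prior_odds * likelihood_ratio

# Assumed: P(critique fails | argument sound) = 0.9 (sound arguments resist attack)
# Assumed: P(critique fails | argument flawed) = 0.4 (flaws are often, not always, found)
LR = 0.9 / 0.4  # each independent failed critique multiplies the odds by 2.25

odds = 1.0  # start at even odds: P(sound) = 0.5
for _ in range(3):  # three independent critiques all fail
    odds = update_odds(odds, LR)

prob = odds / (1 + odds)
print(f"P(argument sound) after 3 failed critiques: {prob:.2f}")  # → 0.92
```

The point of the odds form is that independent pieces of evidence simply multiply, which is why several failed critiques can move you much further than staring at the argument itself.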
And as far as I can tell, critics have constantly failed to counter the AI risk argument. To pick just one example, Holden recently provided a cogent critique of the value of MIRI's focus on AI risk reduction. Eliezer wrote a response to it (I wrote one as well). The core of Eliezer's and my response wasn't anything new; they were mainly a rehash of what had been said before, with a different emphasis.
And most responses to critics of the AI risk argument take this form. Thinking for a short while, one can rephrase essentially the same argument, with a change in emphasis, to take down the criticism. After a few examples, it becomes quite easy: a kind of paint-by-numbers process of showing that the ideas the critic has assumed do not actually make the AI safe.
You may not agree with my assessment of the critiques, but if you do, then you should adjust your belief in AI risk upwards. There's a kind of "conservation of expected evidence" here: if the critiques had succeeded, you'd have reduced the probability of AI risk, so their failure must push you in the opposite direction.
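Conservation of expected evidence can be verified with a few lines of arithmetic: the prior is the probability-weighted average of the two possible posteriors, so if one outcome of a test would lower P(H), the other must raise it. All the numbers below are illustrative assumptions.

```python
# Sketch of "conservation of expected evidence". All numbers are
# illustrative assumptions, not claims from the post.

p_h = 0.5             # prior P(AI risk argument is sound)
p_fail_given_h = 0.9  # assumed P(critique fails | argument sound)
p_fail_given_not = 0.4  # assumed P(critique fails | argument flawed)

# Total probability that a given critique fails:
p_fail = p_fail_given_h * p_h + p_fail_given_not * (1 - p_h)

# Posteriors under each outcome, by Bayes' theorem:
post_if_fail = p_fail_given_h * p_h / p_fail
post_if_succeed = (1 - p_fail_given_h) * p_h / (1 - p_fail)

# The prior equals the expectation of the posterior over outcomes:
expected_posterior = post_if_fail * p_fail + post_if_succeed * (1 - p_fail)
assert abs(expected_posterior - p_h) < 1e-12

print(post_if_fail > p_h)     # a failed critique pushes belief up
print(post_if_succeed < p_h)  # a successful critique pushes it down
```

Since the expectation of the posterior must equal the prior, you cannot coherently expect critiques to fail while refusing to update when they do.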
In my opinion, the strength of the AI risk argument derives 30% from the actual argument, and 70% from the failure of counter-arguments. This would be higher, but we haven't yet seen the most prominent people in the AI community take a really good swing at it.