Suppose you read a convincing-seeming argument by Karl Marx, and get swept up in the beauty of the rhetoric and clarity of the exposition. Or maybe a creationist argument carries you away with its elegance and power. Or maybe you've read Eliezer's take on AI risk, and, again, it seems pretty convincing.

How could you know if these arguments are sound? Ok, you could whack the creationist argument with the scientific method, and Karl Marx with the verdict of history, but what would you do if neither was available (as neither is currently available for assessing the AI risk argument)? Even if you're pretty smart, there's no guarantee that you haven't missed a subtle logical flaw or a dubious premise or two, or gotten caught up in the rhetoric.

One thing should make you believe an argument more strongly: if the argument has been repeatedly criticised, and the criticisms have failed to puncture it. Unless you have the time to become an expert yourself, this is the best way to evaluate arguments where evidence isn't available or conclusive. After all, opposing experts presumably know the subject intimately, and are motivated to identify and illuminate the argument's weaknesses.

If counter-arguments seem incisive, pointing out serious flaws, or if the main argument is being continually patched to defend it against criticisms - well, this is strong evidence that the main argument is flawed. Conversely, if the counter-arguments continually fail, then this is good evidence that the main argument is sound. Not logical evidence - a failure to find a disproof doesn't establish a proposition - but good Bayesian evidence.

In fact, the failure of counter-arguments is much stronger evidence than whatever is in the argument itself. If you can't find a flaw, that just means you can't find a flaw. If counter-arguments fail, that means many smart and knowledgeable people have thought deeply about the argument - and haven't found a flaw.

And as far as I can tell, critics have consistently failed to counter the AI risk argument. To pick just one example, Holden recently provided a cogent critique of the value of MIRI's focus on AI risk reduction. Eliezer wrote a response to it (I wrote one as well). The core of Eliezer's response and of mine wasn't anything new; they were mainly rehashes of what had been said before, with a different emphasis.

And most responses to critics of the AI risk argument take this form. Thinking for a short while, one can rephrase essentially the same argument, with a change in emphasis to take down the criticism. After a few examples, it becomes quite easy, a kind of paint-by-numbers process of showing that the assumptions the critic has made do not actually make the AI safe.

You may not agree with my assessment of the critiques, but if you do, then you should adjust your belief in AI risk upwards. There's a kind of "conservation of expected evidence" here: if the critiques had succeeded, you'd have reduced the probability of AI risk, so their failure must push you in the opposite direction.
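The "conservation of expected evidence" point can be made precise with a small numerical sketch. All the probabilities below are made-up illustrative assumptions; the only substantive claim is the structural one: because the prior must equal the expectation of the posterior, if a successful critique would lower your credence, a failed critique must raise it.

```python
# Let H = "the argument is sound", E = "the critique succeeds".
# Illustrative (assumed) probabilities:
p_h = 0.5              # prior that the argument is sound
p_e_given_h = 0.2      # a sound argument is rarely punctured
p_e_given_not_h = 0.8  # a flawed argument is usually punctured

# Marginal probability that the critique succeeds
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posteriors via Bayes' rule
p_h_given_e = p_e_given_h * p_h / p_e
p_h_given_not_e = p_h * (1 - p_e_given_h) / (1 - p_e)

# Conservation of expected evidence: the prior is the
# probability-weighted average of the two posteriors.
expected_posterior = p_h_given_e * p_e + p_h_given_not_e * (1 - p_e)
assert abs(p_h - expected_posterior) < 1e-12

print(p_h_given_e)      # 0.2 -- belief falls if the critique succeeds
print(p_h_given_not_e)  # 0.8 -- so it must rise if the critique fails
```

The exact numbers don't matter; whenever P(H | E) is below the prior, P(H | not E) is forced above it.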

In my opinion, the strength of the AI risk argument derives 30% from the actual argument, and 70% from the failure of counter-arguments. This would be higher, but we haven't yet seen the most prominent people in the AI community take a really good swing at it.

Comments
Jack:

The flip side of this is the "Failure to Convince" argument. If experts being unable to knock down an argument is evidence for that argument, then the argument failing to convince the experts is evidence against the argument.

Assuming, of course, that there is evidence of the experts having actually considered the argument properly, instead of just giving the first or second answer that came to mind in response to it. (Or to be exact, the first-thought-that-came-to-mind dismissal is evidence too, but only very weakly so.)

This is unfortunately reminiscent of the standard LW objection "well, they're only smart in the lab" to any scientist who doesn't support local tropes.

This is unfortunately reminiscent of the standard LW objection "well, they're only smart in the lab" to any scientist who doesn't support local tropes.

I don't believe that is a standard LW objection.

It's only strong evidence if there's a good chance that a good argument will convince them. Would you expect a good argument to convince an expert creationist that he's wrong?

Yes, this is correct (on conservation of expected evidence, if nothing else: if experts were all convinced, then we'd believe the argument more).

Now, there's a whole host of reasons to suspect that the experts are not being rational with the AI risk argument (and if we didn't have those reasons, we'd be even more worried!), but the failure to convince experts is a (small) worry. Especially so with experts who have really grappled with the arguments and still reject them (though I can only think of one person in that category: Robin Hanson).

Do you mean that the counterarguments were objectively unconvincing, or that you personally were not convinced? The latter is a common problem with arguments with creationists: they just won't believe the counterarguments.

The former is of course hard to ascertain. But an example would be how well the argument does convincing the unconvinced.

Do you mean that the counterarguments were objectively unconvincing, or that you personally were not convinced?

The second (the critics didn't commit any logical fallacies, so there's no objective criterion). :-(

So yes, it's a judgement call on my part, that you don't have to share. But what I'm claiming is that if you judge that the critiques have failed, then you should become more confident in the initial argument.

There's a kind of conservation of expected evidence here: if you'd judged the critics had succeeded, then you would have adjusted the other way.

But an example would be how well the argument does convincing the unconvinced.

Meh. Unusual arguments rarely convince people, whatever their merits. Maybe if you walked an unconvinced person through the initial argument, a rebuttal, a counter-rebuttal, and so on, and asked whether they judged that the rebuttals worked (independently of the truth of the initial argument)?

But what I'm claiming is that if you judge that the critiques have failed, then you should become more confident in the initial argument.

This is your advice in the congruent case of the unconvinceable creationist?

An already very convinced person has a probability distribution already very peaked at his/her own truth, so no amount of evidence, argumentative or factual, could change that.
I think Stuart was aiming at rational people who avoid having peaked opinions on something for which there's no evidential support. Counter-argument failure is weak evidence, so it won't sway the zealot, but it should sway, albeit by a small quantity, the reasonable.

Yes, but everyone thinks they're reasonable. Just because I can imagine sufficiently rational behaviour does not make me sufficiently rational on an arbitrary issue for this to hold.

Well, everyone can think they're reasonable for an arbitrary definition of reasonable. If we take it to be "being moved by small quantities of evidence", then it is possible to check whether we are being reasonable or not on a subject.
I don't think we have such poor access to our brains that we aren't even able to tell whether we are convinced or not...

Er - ignore them? Creationists have well established epistemic failures. Unless you have evidence the same failures are at work here, then what creationists think isn't particularly relevant.

The fact that the counter-critiques to creationism seem so very effective, and are very weakly re-rebutted, is more relevant.

Yes, I know they're wronger than a wrong thing (hence my using them as a counterexample - and, of course, that you used them in your post), but you haven't shown the difference in the shape of the reasoning applied: your proposed razor doesn't work well on beliefs held on a level below rational consideration (rational consideration being something that many creationists can do quite well, if sufficiently compartmentalised away from their protected beliefs).

I'm not sure I get your point here, sorry!

I'm trying to think of ways that rational people can use to evaluate claims, not ways that can be used rhetorically to convince people in general...

My point is to warn people who want to be rational of a failure mode that makes this razor not something to be relied upon precisely when they hold the belief in question strongly.

Ok, I see the point, and agree it's an issue.

You don't say what "the AI risk argument" is. Instead, you provide a link to a long and rambling document by one E. Yudkowsky with no abstract or clear thesis. What thesis are you claiming is being strengthened by a lack of credible counter-arguments?

I agree that the counterarguments to AI risk seem quite poor (to me), and this is some evidence in favor of AI risk. However, since humans are biased towards not being convinced by any counterargument to something they strongly believe, this limits how much we can use the unconvincingness of counterarguments to AI risk to update in favor of AI risk.

However, since humans are biased towards not being convinced by any counterargument to something they strongly believe

Indeed. It has been experimentally tested that strong counterarguments are less likely to convince a strong believer.

“The general idea is that it’s absolutely threatening to admit you’re wrong,” says political scientist Brendan Nyhan, the lead researcher on the Michigan study. The phenomenon — known as “backfire” — is “a natural defense mechanism to avoid that cognitive dissonance.”

(I'm sure this was linked on LW before, but couldn't find it quickly ...)

So the proposed "failure of counter-arguments argument" falls afoul of errors in human thinking.

gwern:

The backfire effect, as far as I know, has never been replicated and bears all the hallmarks of being your classic counterintuitive psych finding which will disappear the moment anyone looks too hard at it. I've been trying to discourage people from citing it...

gjm:

I've been trying to discourage people from citing it...

Nononono, don't do that! It'll just make them believe it more strongly.

(The nice thing is that actually your attempts at discouragement will, roughly speaking, work if and only if the discouragement is correct!)

What other (popular) psychology findings do you think won't hold up?

Stereotype threat looks extremely questionable to me, and dual n-back is more or less finished as far as I'm concerned. Those are the only two famous findings which I can think of off-hand; did you have any specific ones in mind?

Or maybe the evidence for the effect is so strong that you refuse to believe it. :)

I feel obligated to point out that the backfire effect, even in the original paper, applied only to a few zealots, and not everyone and not relatively moderate subjects. So your obvious joke is not itself consistent with the paper.

I'm not sure I understand why you think any given counter-argument would not be susceptible to employing the same sorts of bias-inducing mechanisms (elegant rhetoric, etc.) as the argument it sets out to counter.

As long as there are some actual arguments in the critiques and counter-critiques, you will gain a better appreciation of the strength of the initial argument, even if there is a lot of rhetoric.

There is also the issue that many of the AI risk arguments have traditionally only been available in inconvenient formats, like the whole complexity of value argument requiring you to read half the Sequences in order to properly understand it. More recent papers have begun to fix this, but to a large extent the lack of good critiques of the AI risk argument could be because there haven't been good arguments for AI risk easily available.

Good point. We'll see if better counter-arguments develop.

In my opinion, the strength of the AI risk argument derives 30% from the actual argument, and 70% from the failure of counter-arguments.

The most salient example of this phenomenon for me was the FOOM debate. Eliezer's arguments were unremarkable, while Robin Hanson's arguments in both the posts and the comments did an admirable job of supporting Eliezer's position.

One thing to consider is that the argument for AI Risk covers a lot of probability-space. Counterarguments can only remove small portions of probability-space by ruling out specific subsets of X. Additionally, as you point out, many counterarguments seem to overlap, and ruling out each particular version of a counterargument against a subset of X cannot increase the probability-space of AI Risk by more than the independent probability-space that subset of X accounts for. The existence of weak counterarguments also does not mean that the probability-space they belong to is not vulnerable to a stronger counterargument. P(AI Risk | ~Counterargument-1) may equal P(AI Risk | ~Counterargument-1 AND ~Counterargument-2) but be greater than P(AI Risk | Strong-counterargument-X).
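A toy sketch of the overlap point (the slice names, priors, and counterargument assignments below are all illustrative assumptions): if two counterarguments depend on the same slice of no-risk probability-space, then the failure of both is no stronger evidence than the failure of one.

```python
p_risk = 0.5
# Assumed slices of the "no AI risk" hypothesis, with prior mass.
no_risk = {"slice_A": 0.3, "slice_B": 0.2}
# Which no-risk slice each (hypothetical) counterargument depends on;
# here both counterarguments happen to rest on the same slice.
depends_on = {"counter_1": "slice_A", "counter_2": "slice_A"}

def p_risk_given_failures(failed):
    """P(risk) after the listed counterarguments fail, i.e. after
    the no-risk slices they depended on are ruleded out and the
    remaining mass is renormalised."""
    ruled_out = {depends_on[c] for c in failed}
    remaining = sum(p for s, p in no_risk.items() if s not in ruled_out)
    return p_risk / (p_risk + remaining)

one = p_risk_given_failures({"counter_1"})
both = p_risk_given_failures({"counter_1", "counter_2"})
print(one, both)  # identical: the second failure adds nothing
```

A strong counterargument landing in the untouched slice_B would still shift the estimate, which is the commenter's remaining worry.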

I think you've mostly accounted for this by only considering well-thought-out counterarguments that are more likely to be independent and strong. I am not confident enough in my ability to predict how likely it is for strong counterarguments to be found to use this as evidence for AI Risk.

I think it's more straightforward when the situation is compared to the "No AI Risk" default position that lays out a fairly straightforward progression of technology and human development in harmony: In this case AI Risk is the counterargument, and it's very convincing to me. Counter-counterarguments against AI Risk are often just "but of course AI will work out okay, that's what our original argument said". A convincing argument for No AI Risk would have to lay out a highly probable scenario for the harmonious future development of humans and machines into the indefinite future with no intervention in the development of AI. That seems like a very tall order after reading about AI Risk.

There's a kind of "conservation of expected evidence" here: if the critiques had succeeded, you'd have reduced the probability of AI risk, so their failure must push you in the opposite direction.

This doesn't sound quite right. I think what matters is the amount of cogent, non-redundant thought about the subject demonstrated by the failed critiques. If a critique is just generally bad, it shouldn't have an impact either way (I could randomly computer-generate millions of incoherent critiques of creationism, but that wouldn't affect my credence).

Similarly if a hundred experts make the same flawed counterargument, there would be rapidly diminishing returns after the first few.

Similarly if a hundred experts make the same flawed counterargument, there would be rapidly diminishing returns after the first few.

Actually, no: because this would be a sign that they aren't capable of finding a better counterargument!

I thought of that. If they all think of the same thing, they could be going for something obvious but not quite right - something that pops out at this class of expert, so they don't feel the need to go any further. But even if you have an answer to that objection, it doesn't mean there isn't a remaining problem that isn't visible to surface inspection.

Suppose you read Svante Arrhenius' prediction of global warming from greenhouse gases, and you get swept up in the clarity of the exposition. Do you evaluate the content of the paper on its own merits? God no.

(No-sarcasm mode: I think that while criticism provides an opportunity to test a work, direct evaluation is still best. Among other problems, what happens in the inevitable case that someone tries to fake all the signs of their argument having successfully repelled criticism? Shit becomes he-said-she-said awful fast if you don't do direct evaluations.)

The problem is in things like AI predictions, where direct evaluations aren't easy to come by.

By direct evaluation I mean just evaluating things (both claims and criticisms) using your own brainpower and basic resources that you harness to figure out the truth.

In the case of Svante Arrhenius, this means reading through the paper carefully, looking for math mistakes and important assumptions, and checking these assumptions against external references (e.g. on CO2 spectra, human outputs, atmospheric composition). This can be done to any degree of thoroughness - the more thorough you are, the better evidence you get, but weak evidence is still evidence.

This is probably a lot easier to do for AI predictions than for global warming, or even just Arrhenius' prediction.

Your own brainpower is overrated. Unless you suspect that politics has rotted the field completely, a large collection of experts will be more likely to find flaws than you on your own.

I agree with the checking against external references, though! Experts don't do this enough, so you can add a lot of value by doing this.

Are you assuming that if effective criticisms of the argument in question existed, they would be published in a manner that we should expect to find them?

Yes. And I think it's a reasonable assumption.

I was just pointing out a limit on the applicability of the heuristic; if someone controls the discourse, they can allow only bad criticisms of their arguments enter the discourse.

Zian:

This sounds like an expanded exposition of the discussion in HPMoR between Harry and Draco about trying to find good counterarguments to the Death Eaters' main belief. Is that a fair representation of your article?

If not, I'd like to know so I know to re-read the article. :)