Argument Screens Off Authority

32 Post author: Eliezer_Yudkowsky 14 December 2007 12:05AM

Black Belt Bayesian (aka "steven") tries to explain the asymmetry between good arguments and good authority, but it doesn't seem to be resolving the comments on Reversed Stupidity Is Not Intelligence, so let me take my own stab at it:

Scenario 1:  Barry is a famous geologist.  Charles is a fourteen-year-old juvenile delinquent with a long arrest record and occasional psychotic episodes.  Barry flatly asserts to Arthur some counterintuitive statement about rocks, and Arthur judges it 90% probable.  Then Charles makes an equally counterintuitive flat assertion about rocks, and Arthur judges it 10% probable.  Clearly, Arthur is taking the speaker's authority into account in deciding whether to believe the speaker's assertions.

Scenario 2:  David makes a counterintuitive statement about physics and gives Arthur a detailed explanation of the arguments, including references.  Ernie makes an equally counterintuitive statement, but gives an unconvincing argument involving several leaps of faith.  Both David and Ernie assert that this is the best explanation they can possibly give (to anyone, not just Arthur).  Arthur assigns 90% probability to David's statement after hearing his explanation, but assigns a 10% probability to Ernie's statement.

It might seem like these two scenarios are roughly symmetrical: both involve taking into account useful evidence, whether strong versus weak authority,  or strong versus weak argument.

But now suppose that Arthur asks Barry and Charles to make full technical cases, with references; and that Barry and Charles present equally good cases, and Arthur looks up the references and they check out.  Then Arthur asks David and Ernie for their credentials, and it turns out that David and Ernie have roughly the same credentials—maybe they're both clowns, maybe they're both physicists.

Assuming that Arthur is knowledgeable enough to understand all the technical arguments—otherwise they're just impressive noises—it seems that Arthur should view David as having a great advantage in plausibility over Ernie, while Barry has at best a minor advantage over Charles.

Indeed, if the technical arguments are good enough, Barry's advantage over Charles may not be worth tracking.  A good technical argument is one that eliminates reliance on the personal authority of the speaker.

Similarly, if we really believe Ernie that the argument he gave is the best argument he could give, which includes all of the inferential steps that Ernie executed, and all of the support that Ernie took into account—citing any authorities that Ernie may have listened to himself—then we can pretty much ignore any information about Ernie's credentials.  Ernie can be a physicist or a clown, it shouldn't matter.  (Again, this assumes we have enough technical ability to process the argument.  Otherwise, Ernie is simply uttering mystical syllables, and whether we "believe" these syllables depends a great deal on his authority.)

So it seems there's an asymmetry between argument and authority.  If we know authority we are still interested in hearing the arguments; but if we know the arguments fully, we have very little left to learn from authority.

Clearly (says the novice) authority and argument are fundamentally different kinds of evidence, a difference unaccountable in the boringly clean methods of Bayesian probability theory.  For while the strength of the evidences—90% versus 10%—is just the same in both cases, they do not behave similarly when combined.  How, oh how, will we account for this?

Here's half a technical demonstration of how to represent this difference in probability theory.  (The rest you can take on my personal authority, or look up in the references.)

If p(H|E1) = 90% and p(H|E2) = 9%, what is the probability p(H|E1,E2)?  If learning E1 is true leads us to assign 90% probability to H, and learning E2 is true leads us to assign 9% probability to H, then what probability should we assign to H if we learn both E1 and E2?  This is simply not something you can calculate in probability theory from the information given.  No, the missing information is not the prior probability of H.  E1 and E2 may not be independent of each other.
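
To see why the answer is underdetermined, here is a minimal sketch with two made-up "worlds". Each is a joint distribution over (H, E1, E2). The numbers are illustrative assumptions, chosen so that both worlds agree on p(H), p(H|E1) = 90% and p(H|E2) = 9%, yet disagree about p(H|E1,E2). The names world_a, world_b and p_h_given are hypothetical helpers, not anything from the references.

```python
from fractions import Fraction as F

# Joint distributions over (H, E1, E2), with 1 = true and 0 = false.
# Triples that are left out have probability 0. All numbers are assumptions.
world_a = {
    (1, 1, 1): F(45, 1000),
    (0, 1, 1): F(5, 1000),
    (0, 0, 1): F(450, 1000),
    (0, 0, 0): F(500, 1000),
}
world_b = {
    (0, 1, 1): F(4, 1000),
    (1, 1, 0): F(36, 1000),
    (1, 0, 1): F(9, 1000),
    (0, 0, 1): F(87, 1000),
    (0, 0, 0): F(864, 1000),
}

def p_h_given(joint, e1=None, e2=None):
    """p(H=1 | E1=e1, E2=e2); passing None leaves that variable unobserved."""
    def matches(w):
        return (e1 is None or w[1] == e1) and (e2 is None or w[2] == e2)
    den = sum(p for w, p in joint.items() if matches(w))
    num = sum(p for w, p in joint.items() if matches(w) and w[0] == 1)
    return num / den

for name, world in (("A", world_a), ("B", world_b)):
    print(name,
          p_h_given(world),               # prior p(H)
          p_h_given(world, e1=1),         # p(H|E1)
          p_h_given(world, e2=1),         # p(H|E2)
          p_h_given(world, e1=1, e2=1))   # p(H|E1,E2)
```

Both worlds report 9/200, 9/10 and 9/100 for the first three quantities, but world A gives p(H|E1,E2) = 9/10 while world B gives 0. Exact fractions are used so the comparison does not depend on floating-point rounding.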

Suppose that H is "My sidewalk is slippery", E1 is "My sprinkler is running", and E2 is "It's night."  The sidewalk is slippery starting from 1 minute after the sprinkler starts, until just after the sprinkler finishes, and the sprinkler runs for 10 minutes.  So if we know the sprinkler is on, the probability is 90% that the sidewalk is slippery.  The sprinkler is on during 10% of the nighttime, so if we know that it's night, the probability of the sidewalk being slippery is 9%.  If we know that it's night and the sprinkler is on—that is, if we know both facts—the probability of the sidewalk being slippery is 90%.

We can represent this in a graphical model as follows:

Night -> Sprinkler -> Slippery

Whether or not it's Night causes the Sprinkler to be on or off, and whether the Sprinkler is on causes the Sidewalk to be slippery or unslippery.

The direction of the arrows is meaningful.  If I wrote:

Night -> Sprinkler <- Slippery

This would mean that, if I didn't know anything about the Sprinkler, the probability of Nighttime and Slipperiness would be independent of each other.  For example, suppose that I roll Die One and Die Two, and add up the showing numbers to get the Sum:

Die 1 -> Sum <- Die 2.

If you don't tell me the sum of the two numbers, and you tell me the first die showed 6, this doesn't tell me anything about the result of the second die, yet.  But if you now also tell me the sum is 7, I know the second die showed 1.

Figuring out when various pieces of information are dependent or independent of each other, given various background knowledge, actually turns into a quite technical topic.  The books to read are Judea Pearl's Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference and Causality.  (If you only have time to read one book, read the first one.)

If you know how to read causal graphs, then you look at the dice-roll graph and immediately see:

p(die1,die2) = p(die1)*p(die2)
p(die1,die2|sum) ≠ p(die1|sum)*p(die2|sum)
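
If you would rather check the dice claims by brute force than by reading the graph, a short enumeration over the 36 equally likely rolls will do. This is only a sketch; the helper name prob and the particular events tested (first die shows 6, second die shows 1, sum is 7) are my own choices for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) outcomes for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event, given=lambda o: True):
    """p(event | given), computed by counting equally likely outcomes."""
    pool = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in pool if event(o)), len(pool))

def die1_is_6(o):
    return o[0] == 6

def die2_is_1(o):
    return o[1] == 1

def sum_is_7(o):
    return o[0] + o[1] == 7

# Without the sum, the dice are independent.
joint = prob(lambda o: die1_is_6(o) and die2_is_1(o))
print(joint, prob(die1_is_6) * prob(die2_is_1))        # 1/36 1/36

# Given the sum, they are not.
joint_given_sum = prob(lambda o: die1_is_6(o) and die2_is_1(o), sum_is_7)
product_given_sum = prob(die1_is_6, sum_is_7) * prob(die2_is_1, sum_is_7)
print(joint_given_sum, product_given_sum)              # 1/6 1/36
```

Once the sum is known to be 7, learning that the first die showed 6 pins the second die at 1, which is why the joint probability (1/6) is larger than the product of the separate conditionals (1/36).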

If you look at the correct sidewalk diagram, you see facts like:

p(slippery|night) ≠ p(slippery)
p(slippery|sprinkler) ≠ p(slippery)
p(slippery|night, sprinkler) = p(slippery|sprinkler)

That is, the probability of the sidewalk being Slippery, given knowledge about the Sprinkler and the Night, is the same probability we would assign if we knew only about the Sprinkler.  Knowledge of the Sprinkler has made knowledge of the Night irrelevant to inferences about Slipperiness.
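
Here is a rough sketch of the sidewalk model as an explicit joint distribution, built by multiplying along the chain Night -> Sprinkler -> Slippery. Only the 10% and 90% figures come from the example above. The other numbers are assumptions made so the table is complete: that it is night half the time, that the sprinkler runs 2% of the daytime, and that the sidewalk is never slippery with the sprinkler off. The helper names pick and p_slippery are likewise hypothetical.

```python
from fractions import Fraction as F
from itertools import product

P_NIGHT = F(1, 2)                                  # assumption
P_SPRINKLER = {True: F(1, 10), False: F(1, 50)}    # keyed by night; 1/50 is an assumption
P_SLIPPERY = {True: F(9, 10), False: F(0)}         # keyed by sprinkler; 0 is an assumption

def pick(p, value):
    """Probability that a yes/no variable with p(yes) = p takes `value`."""
    return p if value else 1 - p

# Joint over (night, sprinkler, slippery), factored along the chain.
joint = {
    (n, s, w): pick(P_NIGHT, n) * pick(P_SPRINKLER[n], s) * pick(P_SLIPPERY[s], w)
    for n, s, w in product([True, False], repeat=3)
}

def p_slippery(night=None, sprinkler=None):
    """p(slippery | observed nodes); None means that node is unobserved."""
    rows = [
        (w, p) for (n, s, w), p in joint.items()
        if (night is None or n == night) and (sprinkler is None or s == sprinkler)
    ]
    return sum(p for w, p in rows if w) / sum(p for _, p in rows)

print(p_slippery())                               # 27/500
print(p_slippery(night=True))                     # 9/100
print(p_slippery(sprinkler=True))                 # 9/10
print(p_slippery(night=True, sprinkler=True))     # 9/10
print(p_slippery(night=False, sprinkler=True))    # 9/10
```

Under these assumptions, Night and Sprinkler each shift the probability of Slippery on their own, but once the Sprinkler is known, Night changes nothing: the last three lines all print 9/10.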

This is known as screening off, and the criterion that lets us read such conditional independences off causal graphs is known as D-separation.

For the case of argument and authority, the causal diagram looks like this:

Truth -> Argument Goodness -> Expert Belief

If something is true, then it therefore tends to have arguments in favor of it, and the experts therefore observe these evidences and change their opinions.  (In theory!)

If we see that an expert believes something, we infer back to the existence of evidence-in-the-abstract (even though we don't know what that evidence is exactly), and from the existence of this abstract evidence, we infer back to the truth of the proposition.

But if we know the value of the Argument node, this D-separates the node "Truth" from the node "Expert Belief" by blocking all paths between them, according to certain technical criteria for "path blocking" that seem pretty obvious in this case.  So even without checking the exact probability distribution, we can read off from the graph that:

p(truth|argument,expert) = p(truth|argument)

This does not represent a contradiction of ordinary probability theory.  It's just a more compact way of expressing certain probabilistic facts.  You could read the same equalities and inequalities off an unadorned probability distribution—but it would be harder to see it by eyeballing.  Authority and argument don't need two different kinds of probability, any more than sprinklers are made out of ontologically different stuff than sunlight.
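
To make that concrete, here is a sketch that reads the equality off an unadorned joint distribution for the chain Truth -> Argument Goodness -> Expert Belief. Every number in it is hypothetical, picked only so the arithmetic is easy to follow; the helper names pick and p_truth are my own.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical numbers, for illustration only.
P_TRUE = F(1, 2)                                # p(truth)
P_GOOD_ARG = {True: F(4, 5), False: F(1, 5)}    # p(good argument | truth)
P_EXPERT = {True: F(9, 10), False: F(1, 10)}    # p(expert believes | good argument)

def pick(p, value):
    """Probability that a yes/no variable with p(yes) = p takes `value`."""
    return p if value else 1 - p

# Joint over (truth, argument, expert), factored along the chain.
joint = {
    (t, a, e): pick(P_TRUE, t) * pick(P_GOOD_ARG[t], a) * pick(P_EXPERT[a], e)
    for t, a, e in product([True, False], repeat=3)
}

def p_truth(argument=None, expert=None):
    """p(truth | observed nodes); None means that node is unobserved."""
    rows = [
        (t, p) for (t, a, e), p in joint.items()
        if (argument is None or a == argument) and (expert is None or e == expert)
    ]
    return sum(p for t, p in rows if t) / sum(p for _, p in rows)

print(p_truth())                               # 1/2
print(p_truth(expert=True))                    # 37/50
print(p_truth(argument=True))                  # 4/5
print(p_truth(argument=True, expert=True))     # 4/5
print(p_truth(argument=True, expert=False))    # 4/5
```

With the argument unobserved, the expert's belief moves the estimate from 1/2 to 37/50, so authority is real evidence. With the argument observed, the expert's belief or disbelief leaves the estimate at 4/5 either way. This is the idealized chain only; it does not model the caveats about counterevidence and intuition.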

In practice you can never completely eliminate reliance on authority.  Good authorities are more likely to know about any counterevidence that exists and should be taken into account; a lesser authority is less likely to know this, which makes their arguments less reliable.  This is not a factor you can eliminate merely by hearing the evidence they did take into account.

It's also very hard to reduce arguments to pure math; and otherwise, judging the strength of an inferential step may rely on intuitions you can't duplicate without the same thirty years of experience.

There is an ineradicable legitimacy to assigning slightly higher probability to what E. T. Jaynes tells you about Bayesian probability, than you assign to Eliezer Yudkowsky making the exact same statement.  Fifty additional years of experience should not count for literally zero influence.

But this slight strength of authority is only ceteris paribus, and can easily be overwhelmed by stronger arguments.  I have a minor erratum in one of Jaynes's books—because algebra trumps authority.


Part of the Politics Is the Mind-Killer subsequence of How To Actually Change Your Mind

Next post: "Hug the Query"

Previous post: "Reversed Stupidity Is Not Intelligence"

Comments (31)

Comment author: RobinHanson 14 December 2007 12:14:23AM 19 points [-]

Unfortunately, it is only in a few rare technical areas where one can find anything like "full technical cases, with references" given to a substantial group "knowledgeable enough to understand all the technical arguments", and it is even more rare that they actually bother to do so. Even when people appear to be giving such technical arguments to such knowledgeable audiences, the truth is more often otherwise. For example, the arguments presented are often only a small fraction of what convinced someone to support a position.

Comment author: Eliezer_Yudkowsky 14 December 2007 12:23:34AM 13 points [-]

Robin, that's surely true. But the human default seems to be to give too much credence to authority in cases where we can partially evaluate the arguments. Even experts exhibit herd behavior, math errors go undetected, etc. It's certainly a mistake to believe plausible verbal arguments from a nonexpert over math you can't understand. But I think you could make a good case that as a general heuristic, it is wiser to try to rely harder on argument, and less on authority, wherever you can.

Comment author: Eliezer_Yudkowsky 14 December 2007 12:48:27AM 9 points [-]

An example of where not to apply this advice: There are so many different observations bearing on global warming, that if you try to check the evidence for yourself, you will be even more doomed than if you try to decide which authority to trust.

Comment author: Ian_C. 14 December 2007 12:50:14AM 0 points [-]

Was there supposed to be a second book there?


Comment author: Doug_S. 14 December 2007 01:14:17AM 1 point [-]

Book 1: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

Book 2: Causality

Comment author: RobinHanson 14 December 2007 01:16:30AM 4 points [-]

Sometimes people attend too much to authority, and sometimes too little. I'm not sure I can discern an overall bias either way.

Comment author: Rinon 04 June 2012 03:28:39PM 5 points [-]

I haven't done any studies, but I have a feeling that people attend to authority when it supports their natural biases, and ignore authority when it opposes their natural biases.

Comment author: manuelg 14 December 2007 01:21:54AM 1 point [-]

Apropos of nothing: you have a lot to say about the discrete Bayesian. But I would argue that talking about the quality of manufacturing processes, one would often do best talking about continuous distributions.

The distributions that my metal-working machines manifest (over the dimensions under tolerance that my customers care about) are the Gaussian normal, the log normal, and the Pareto.

When the continuous form of the Bayesian is discussed, they always talk about the Beta distributions.

I have tried reasoning with the lathe, the mill, and the drill presses, to begin exhibiting the Beta, but they just ignore my pleadings, and spit hot metal chips at me.

The standard frequentist approaches seem like statistical theater. So I am inclined to explore other approaches.

Comment author: James_Bach 14 December 2007 06:01:38AM 2 points [-]

You said: "So it seems there's an asymmetry between argument and authority. If we know authority we are still interested in hearing the arguments; but if we know the arguments fully, we have very little left to learn from authority."

I like your conclusion, but I can't find anything in your argument to support it! By rearranging some words in your text I could construct an equally plausible (to a hypothetical neutral observer) argument that authority screens off evidence. You seem to believe that evidence screens off authority simply because you think evidence is what makes authority believe something. But isn't that assuming the very thing you want to demonstrate?

Your scenarios in the first paragraphs are neither arguments nor demonstrations. They are statements of what you believe. Fair enough. But then I was expecting that you'd provide some reason for me to reject the hypothesis-- a hypothesis that carried a lot of weight during the era of Scholasticism-- that there is no such thing as evidence without authority (in other words, it is authority that consecrates evidence *as* evidence).

I used to wonder how anyone could take the obviously wrong physics of Aristotle seriously, until I learned enough about history that it dawned on me that for the Scholastic thinkers of the middle ages, how physics really worked was far less important than maintaining social order. If maintaining social order is the problem that trumps all others in your life and in your society, then evidence must necessarily carry little weight compared to authority. You will give up a lot of science, of course, but you will give it up gladly.

Obviously, we aren't in that situation. But I worry when I see, for instance, rational arguments for the existence of God that assume the very thing they purport to prove. And your argument (hopefully I've misunderstood it) seems a lot like those.

Comment author: JoshuaZ 30 April 2010 05:24:30AM *  13 points [-]

Much of what is obviously wrong about Aristotle or likely to be wrong was discussed. Oresme for example wrote in the 1300s and discussed a lot of problems with Aristotle (or at least his logic). He proposed concepts of momentum and gravity that were more or less correct but lacked any quantification. And people from a much earlier time understood that Aristotle's explanation of movement of thrown objects was deeply wrong. Attempts to repair this occurred well before the Scholastics even were around. Scholastics were more than willing to discuss alternate theories, especially theories of impetus. People seem to fail to realize how much discussion there was in the middle ages about these issues. It didn't go Aristotle and then Galileo and Newton. Between Aristotle and Galileo were Oresme, Benedetti (who proposed a law of falling objects very similar to Galileo) and many others. Also, many of the Scholastics paid very careful attention to Avicenna's criticism and analysis of Aristotle (Edit: My impression is that they became in some ways more knee-jerk Aristotelian after Averroism became prevalent but I don't know enough about the exact details to comment on ratios or the like).

It might be fun to dismiss everyone in the Middle Ages as religion-bound control freaks, but that's simply not the case. The actual history is much more complicated.

Comment author: ejstheman 14 July 2011 06:11:28PM 5 points [-]

If we observe experts changing their beliefs based on evidence often, but evidence changing based on the beliefs of experts never, then it seems reasonable that the chain of causality goes reality->evidence->beliefs of experts->beliefs of non-experts, with the possible shortcut reality->evidence->beliefs of non-experts, when the evidence is particularly abundant or clear.

Comment author: durgadas 20 August 2012 09:16:12PM 0 points [-]

"I used to wonder how anyone could take the obviously wrong physics of Aristotle seriously, until I learned enough about history that it dawned on me that for the Scholastic thinkers of the middle ages, how physics really worked was far less important than maintaining social order. If maintaining social order is the problem that trumps all others in your life and in your society, then evidence must necessarily carry little weight compared to authority. You will give up a lot of science, of course, but you will give it up gladly.

Obviously, we aren't in that situation. But I worry when I see, for instance, rational arguments for the existence of God that assume the very thing they purport to prove. And your argument (hopefully I've misunderstood it) seems a lot like those."

Well, reading Sam Harris' account of speaking to prominent atheists backing a moralistic relativism "on behalf of" the world's religions would lead me to suspect that we are just as, maybe more, influenced by the idea of maintaining social order. I think that the tyranny of choice (50 kinds of ketchup anyone?) makes it seem like we've got more 'apparent choices', many of which aren't fundamentally different from each other as far as what social cliques to participate in.

If you look closely, each of these apparently different groups has a uniform and a rallying cry, but on the whole say much the same thing, even where the 'authority' in each case seems quite different.

Comment author: Eliezer_Yudkowsky 14 December 2007 06:26:10AM 1 point [-]

Changed first use of "evidence" to link to "What is Evidence?" and first use of "Bayesian" to link to "An Intuitive Explanation of Bayesian Reasoning", respectively the qualitative and quantitative definitions of evidence that I use as standard. See also this on rationality as engine of map-territory correlation.

Map-territory correlation ("truth") being my goal, I have no use for Scholasticism.

Comment author: Unknown 14 December 2007 08:28:34AM 5 points [-]

The overall bias that people have is to point to authority when it seems to support their position more, but to point to argument when it seems to support their position more: i.e. confirmation bias.

Comment author: Bob_Unwin6 14 December 2007 09:43:00AM 3 points [-]

Similarly, if we really believe Ernie that the argument he gave is the best argument he could give, which includes all of the inferential steps that Ernie executed, and all of the support that Ernie took into account - citing any authorities that Ernie may have listened to himself - then we can pretty much ignore any information about Ernie's credentials.

It might take an intellectual life-time (or much more) to get all the relevant background. For example, mathematicians (and other people in very technical domains) develop very good intuitions about whether or not certain statements hold. They might be quite sure that something is true long before they are able to give even a sketchy proof and it seems rational to follow them based on their credentials (e.g. having made contributions to this sub-discipline). Yet there is probably no way to really get a grasp of their inferential steps without having done lots of the math they have.

I stress "doing" the math, rather than reading about it. Lots of math is "knowing how" rather than "knowing that". The same sort of thing might hold for aesthetical or ethical judgments. Without having played (or at least studied) a lot of classical music for the clarinet, I might not be able to grasp the "inferential" steps that led a professional player to his judgment about the superiority of a certain piece of music.

Comment author: billswift 14 December 2007 02:50:11PM 5 points [-]

Part of the problem is that "authority" conflates two distinct ideas. The first is "justified use of coercion" as when the government is referred to as "the authorities". The second is as a synonym for expertise. The two are united in parents but otherwise distinct. It may be useful to do as I have in my notes and avoid using "authority" when "expertise" is what is meant, at least it reduces the confusion a little.

Comment author: Dynamically_Linked 15 December 2007 03:48:34AM 0 points [-]

Has anyone read Learning Bayesian Networks by Richard E. Neapolitan? How does it compare with Judea Pearl's two books as an introduction to Bayesian Networks? I'm reading Pearl's first book now, but I wonder if Neapolitan's would be better since it is newer and is written specifically as a textbook.

Comment author: Richard_Hollerith2 16 December 2007 12:19:25PM 0 points [-]

Sorry, I do not know that book.

Bob Unwin, in my humble opinion, math is a poor choice of example to make your point because mathematical knowledge can be established by a proof (with a calculation being a kind of proof) and what distinguishes a proof from other kinds of arguments is the ease with which a proof can be verified by nonexperts. (Yes, yes, a math expert's opinion on whether someone will discover a proof of a particular proposition is worth something, but the vast majority of the value of math resides in knowledge for which a proof already exists.)

Comment author: Joshua_Fox 16 December 2007 02:47:07PM 0 points [-]

Great stuff as always. Enhanced diagrams (beyond the simple ASCII ones), with clear labels, and even inline explanations, on nodes and edges, would make the Bayesian explanations much clearer.

Comment author: steven 16 December 2007 02:55:09PM 0 points [-]

Eliezer, good reduxification. I'm still not sure about the point that Tom McCabe made about when authority stops mattering because overwhelming evidence brings the probability close to 0 or 1. Screening seems to do at least *some* of the work, though.


"The standard frequentist approaches seem like statistical theater."

I lost any remaining respect for standard frequentist inference when I was taught a test that would sometimes "neither reject nor fail to reject" a null hypothesis. Haha.

Comment author: Eliezer_Yudkowsky 16 December 2007 07:54:10PM 0 points [-]

Dynamically, I haven't read Neapolitan's book, but judging by the table of contents, it's more directed toward people who just want to use the algorithms and less at people who want a really deep understanding of why they work, where they come from, what the meaning is, and why these algorithms and no others. Read Pearl's book first.

Billswift, I think I've consistently used "authority" in the sense of "trusted expert", and for social coercion I've used "regulation" or "government".

Comment author: Wei_Dai2 20 December 2007 11:00:47AM 2 points [-]

Eliezer, what is your view of the relationship between Bayesian Networks and Solomonoff Induction? You've talked about both of these concepts on this blog, but I'm having trouble understanding how they fit together. A Google search for both of these terms together yields only one meaningful hit, which happens to be a mailing list post by you. But it doesn't really touch on my question.

On the face of it, both Bayesian Networks and Solomonoff Induction are "Bayesian", but they seem to be incompatible with each other. In the Bayesian Networks approach, conditional probabilities are primary, and the full probability distribution function is more of a mathematical formalism that stays in the background. Solomonoff Induction on the other hand starts with a fully specified (even if uncomputable) prior distribution and derives any conditional probabilities from it as needed. Do you have any idea how to reconcile these two approaches?

Comment author: clockbackward 11 October 2010 02:03:26PM 2 points [-]

Unfortunately, in practice, being as knowledgeable about the details of a particular scenario as an expert does not imply that you will process the facts as correctly as the expert. For instance, an expert and I may both know all of the facts of a murder case, but (if expertise means anything) they are still more likely to make correct judgements about what actually happened due to their prior experience. If I actually had their prior experience, it's true that their authority would mean a lot less, but in that case I would be closer to an expert myself.

To give another example, a mathematically inclined high school student may see a mathematical proof, with each step laid out before them in detail. The high school student may have the opportunity to analyze every step to look for potential problems in the proof and see none. Then, a mathematician may come along, glance over the proof, and say that it is invalid. Who are you going to believe?

In some cases, we are the high school student. We can stare at all the raw facts (the details of the proof) and they all make sense to us and we feel very strongly that we can draw a certain inference from them. And yet, we are unaware of what we don't know that the expert does know. Or the expert is simply better at reasoning in these kinds of problems, or avoiding falling into logical traps that sound valid but are not.

Of course, the more you know about the expert's arguments, the less their authority counts. But sometimes, the expertise lies in the ability to correctly process the type of facts at hand. If a mathematician's argument about the invalidness of step 3 does not seem convincing to you, and your argument about why step 3 is valid seems totally convincing, you should still at least hesitate in concluding you are correct.

Comment author: mat33 05 October 2011 10:56:12AM 0 points [-]

"If we know authority we are still interested in hearing the arguments; but if we know the arguments fully, we have very little left to learn from authority."

Really? We don't deny any ideas/possibilities without 5 minutes of thinking, at least (on the authority of Harry Potter :)). Right. But I'll need a lot more time (days at least) to understand the advanced research of any able professional. And I am ready to fail to understand any work of true genius before it's included in the textbooks for, well, students.

Comment author: Dojan 18 October 2011 11:12:25AM 1 point [-]

This post begs the question of when we assign authority to someone. For example, I don't usually take the pope very seriously, even though by many standards he is a high authority; but Carl Sagan rocks. But if I listen ever so slightly more to Sagan than to the pope (which isn't true: I don't listen even a little to the pope), when did I decide that? I mean, if I only assign authority to the people who already agree with me and share my worldview, isn't that a short trip to the happy death spiral?

Comment author: royf 23 August 2012 05:16:25AM 1 point [-]

p(H|E1,E2) [...] is simply not something you can calculate in probability theory from the information given [i.e. p(H|E1) and p(H|E2)].

Jaynes would disapprove.

You continue to give more information, namely that p(H|E1,E2) = p(H|E1). Thanks, that reduces our uncertainty about p(H|E1,E2).

But we are hardly helpless without it. Whatever happened to the Maximum Entropy Principle? Incidentally, the maximum entropy distribution (given the initial information) does have E1 and E2 independent. If your intuition says this before having more information, it is good.

Don't say that an answer can't be reached without further information. Say: here's more information to make your answer better.

Comment author: BlueAjah 12 January 2013 05:08:26PM 1 point [-]

You've called two different things "Argument Goodness" so you can draw your diagram, but in reality the arguments that the expert heard that led them to their opinion, and the argument that they gave you, are always going to be slightly different.

Also your ability to evaluate the "Argument Goodness" of the argument they gave you is going to be limited, while the expert will probably be better at it.

Comment author: cousin_it 23 September 2013 10:06:29AM *  1 point [-]

Note that if we strengthen "argument" to "valid formal proof", and "authority" to "proof generator", then the statement of this post is wrong. For a good decision theory, seeing a valid formal proof that some action leads to higher utility than others should not be reason enough to choose that action, because such a decision theory would be exploitable by Löbian proof generators.

I'm not sure if this counterargument transfers continuously to everyday reasoning, or it's just a fluke of how we think about decision theory. Maybe there could be a different formalization of logical counterfactuals in which "argument screens off authority" stays true. But that doesn't seem likely to me...

Comment author: private_messaging 24 September 2013 06:47:16AM *  0 points [-]

I think what applies to everyday reasoning is that an argument is usually an informal suggestion pointing at a single component out of, often, a very huge sum, or, in other cases, a proposition reliant on a very large number of implicit assumptions and/or very prone to being destroyed "from the outside" by expert knowledge.

If the term from the sum was picked at random, it would have to be regressed towards the mean when you estimate the expected value of the sum; when the term is not picked at random, and you don't know to what extent its choice is correlated with its value, you can't really use it in any way to meaningfully improve an estimate of the sum (even though the authority and non-authority alike will demand that you add in their argument somehow, and will not suggest you treat it as an estimation of the totality of the arguments).

Comment author: Colombi 20 February 2014 05:24:35AM 0 points [-]

Hmmm. I'm not sure what to believe here: you, or So8rien.