"I feel like I'm not the sort of person who's allowed to have opinions about the important issues like AI risk."
"What's the bad thing that might happen if you expressed your opinion?"
"It would be wrong in some way I hadn't foreseen, and people would think less of me."
"Do you think less of other people who have wrong opinions?"
"Not if they change their minds when confronted with the evidence."
"Would you do that?"
"Do you think other people think less of those who do that?"
"Well, if it's alright for other people to make mistakes, what makes YOU so special?"
A lot of my otherwise very smart and thoughtful friends seem to have a mental block around thinking on certain topics, because they're the sort of topics Important People have Important Opinions around. There seem to be two very different reasons for this sort of block:
- Being wrong feels bad.
- They might lose the respect of others.
If you don't have an opinion, you can hold onto the fantasy that someday, once you figure the thing out, you'll end up having a right opinion. But if you put yourself out there with an opinion that's unmistakably your own, you don't have that excuse anymore.
This is related to the desire to pass tests. The smart kids go through school and are taught - explicitly or tacitly - that as long as they get good grades they're doing OK, and if they try at all they can get good grades. So when they bump up against a problem that might actually be hard, there's a strong impulse to look away, to redirect to something else. So they do.
You have to understand that this system is not real, it's just a game. In real life you have to be straight-up wrong sometimes. So you may as well get it over with.
If you expect to be wrong when you guess, then you're already wrong, and paying the price for it. As Eugene Gendlin said:
What is true is already so. Owning up to it doesn't make it worse. Not being open about it doesn't make it go away. And because it's true, it is what is there to be interacted with. Anything untrue isn't there to be lived. People can stand what is true, for they are already enduring it.
What you would be mistaken about, you're already mistaken about. Owning up to it doesn't make you any more mistaken. Not being open about it doesn't make it go away.
You're already "wrong" in the sense that your anticipations aren't perfectly aligned with reality. You just haven't put yourself in a situation where you've openly tried to guess the teacher's password. But if you want more power over the world, you need to focus your uncertainty - and this only reliably makes you righter if you repeatedly test your beliefs. Which means sometimes being wrong, and noticing. (And then, of course, changing your mind.)
Being wrong is how you learn - by testing hypotheses.
Getting used to being wrong - forming the boldest hypotheses your current beliefs can truly justify so that you can correct your model based on the data - is painful and I don't have a good solution to getting over it except to tough it out. But there's a part of the problem we can separate out, which is - the pain of being wrong publicly.
When I attended a Toastmasters club, one of the things I liked a lot about giving speeches there was that the stakes were low in terms of the content. If I were giving a presentation at work, I had to worry about my generic presentation skills, but also whether the way I was presenting it was a good match for my audience, and also whether the idea I was pitching was a good strategic move for the company or my career, and also whether the information I was presenting was accurate. At Toastmasters, all the content-related stakes were gone. No one with the power to promote or fire me was present. Everyone was on my side, and the group was all about helping each other get better. So all I had to think about was the form of my speech.
Once I'd learned some general presentation skills at Toastmasters, it became easier to give talks where I did care about the content and there were real-world consequences to the quality of the talk. I'd gotten practice on the form of public speaking separately - so now I could relax about that, and just focus on getting the content right.
Similarly, expressing opinions publicly can be stressful because of the work of generating likely hypotheses, and revealing to yourself that you are farther behind in understanding things than you thought - but also because of the perceived social consequences of sounding stupid. You can at least isolate the last factor, by starting out thinking things through in secret. This works by separating epistemic uncertainty from social confidence. (This is closely related to the dichotomy between social and objective respect.)
Of course, as soon as you can stand to do this in public, that's better - you'll learn faster, you'll get help. But if you're not there yet, this is a step along the way. If the choice is between having private opinions and having none, have private opinions. (Also related: If we can't lie to others, we will lie to ourselves.)
Read and discuss a book on a topic you want to have opinions about, with one trusted friend. Start a secret blog - or just take notes. Practice having opinions at all, that you can be wrong about, before you worry about being accountable for your opinions. One step at a time.
Before you're publicly right, consider being secretly wrong. Better to be secretly wrong, than secretly not even wrong.
(Cross-posted at my personal blog.)
It is obvious that the same thing will not be willing to do or undergo opposites in the same part of itself, in relation to the same thing, at the same time. --Book IV of Plato's Republic
Can you simultaneously want sex and not want it? Can you believe in God and not believe in Him at the same time? Can you be fearless while frightened?
To be fair to Plato, this was meant not as an assertion that such contradictions are impossible, but as an argument that the soul has multiple parts. It seems we can, in fact, want something while also not wanting it. This is awfully strange, and it led Plato to conclude the soul must have multiple parts, for surely no one part could contain both sides of the contradiction.
Often, when we attempt to accept contradictory statements as correct, it causes cognitive dissonance--that nagging, itchy feeling in your brain that won't leave you alone until you admit that something is wrong. Like when you try to convince yourself that staying up just a little longer playing 2048 won't have adverse effects on the presentation you're giving tomorrow, when you know full well that's exactly what's going to happen.
But it may be that cognitive dissonance is the exception in the face of contradictions, rather than the rule. How would you know? If it doesn't cause any emotional friction, the two propositions will just sit quietly together in your brain, never mentioning that it's logically impossible for both of them to be true. When we accept a contradiction wholesale without cognitive dissonance, it's what Orwell called "doublethink".
When you're a mere mortal trying to get by in a complex universe, doublethink may be adaptive. If you want to be completely free of contradictory beliefs without spending your whole life alone in a cave, you'll likely waste a lot of your precious time working through conundrums, which will often produce even more conundrums.
Suppose I believe that my husband is faithful, and I also believe that the unfamiliar perfume on his collar indicates he's sleeping with other women without my permission. I could let that pesky little contradiction turn into an extended investigation that may ultimately ruin my marriage. Or I could get on with my day and leave my marriage intact.
It's better to just leave those kinds of thoughts alone, isn't it? It probably makes for a happier life.
Suppose you believe that driving is dangerous, and also that, while you are driving, you're completely safe. As established in Doublethink, there may be some benefits to letting that mental configuration be.
There are also some life-shattering downsides. One of the things you believe is false, you see, by the law of the excluded middle. In point of fact, it's the one that goes "I'm completely safe while driving". Believing false things has consequences.
Be irrationally optimistic about your driving skills, and you will be happily unconcerned where others sweat and fear. You won't have to put up with the inconvenience of a seatbelt. You will be happily unconcerned for a day, a week, a year. Then CRASH, and spend the rest of your life wishing you could scratch the itch in your phantom limb. Or paralyzed from the neck down. Or dead. It's not inevitable, but it's possible; how probable is it? You can't make that tradeoff rationally unless you know your real driving skills, so you can figure out how much danger you're placing yourself in. --Eliezer Yudkowsky, Doublethink (Choosing to be Biased)
What are beliefs for? Please pause for ten seconds and come up with your own answer.
Ultimately, I think beliefs are inputs for predictions. We're basically very complicated simulators that try to guess which actions will cause desired outcomes, like survival or reproduction or chocolate. We input beliefs about how the world behaves, make inferences from them to which experiences we should anticipate given various changes we might make to the world, and output behaviors that get us what we want, provided our simulations are good enough.
My car is making a mysterious ticking sound. I have many beliefs about cars, and one of them is that if my car makes noises it shouldn't, it will probably stop working eventually, and possibly explode. I can use this input to simulate the future. Since I've observed my car making a noise it shouldn't, I predict that my car will stop working. I also believe that there is something causing the ticking. So I predict that if I intervene and stop the ticking (in non-ridiculous ways), my car will keep working. My belief has thus led to the action of researching the ticking noise, planning some simple tests, and will probably lead to cleaning the sticky lifters.
If it's true that solving the ticking noise will keep my car running, then my beliefs will cash out in correctly anticipated experiences, and my actions will cause desired outcomes. If it's false, perhaps because the ticking can be solved without addressing a larger underlying problem, then the experiences I anticipate will not occur, and my actions may lead to my car exploding.
Doublethink guarantees that you believe falsehoods. Some of the time you'll call upon the true belief ("driving is dangerous"), anticipate future experiences accurately, and get the results you want from your chosen actions ("don't drive three times the speed limit at night while it's raining"). But some of the time, if you actually believe the false thing as well, you'll call upon the opposite belief, anticipate inaccurately, and choose the last action you'll ever take.
Without any principled algorithm determining which of the contradictory propositions to use as an input for the simulation at hand, you'll fail as often as you succeed. So it makes no sense to anticipate more positive outcomes from believing contradictions.
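The arithmetic behind that claim can be sketched in a few lines. This is only an illustration under loud assumptions: the function name `expected_payoff`, the payoffs of +1 for acting on the true belief and -1 for acting on the false one, and the idea that a doublethinker consults each belief half the time are all hypothetical choices made for the example, not anything measured.

```python
def expected_payoff(p_true: float, gain: float = 1.0, loss: float = -1.0) -> float:
    """Average payoff per decision when the true belief is consulted
    with probability p_true and the false one otherwise.

    gain and loss are made-up payoffs for acting on the true and the
    false belief respectively (an assumption of this sketch).
    """
    return p_true * gain + (1.0 - p_true) * loss


# No principled selection rule: either belief is equally likely to be used.
doublethinker = expected_payoff(0.5)

# One consistent, true belief: it is the only input ever consulted.
consistent = expected_payoff(1.0)

print(doublethinker, consistent)
```

With no selection rule, the wins and losses cancel out on average. Only a mind that reliably consults the true belief comes out ahead, and if the loss is a car crash rather than -1, the average gets much worse.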
Contradictions may keep you happy as long as you never need to use them. Should you call upon them, though, to guide your actions, the debt on false beliefs will come due. You will drive too fast at night in the rain, you will crash, you will fly out of the car with no seat belt to restrain you, you will die, and it will be your fault.
Against Against Doublethink
What if Plato was pretty much right, and we sometimes believe contradictions because we're sort of not actually one single person?
It is not literally true that Systems 1 and 2 are separate individuals the way you and I are. But the idea of Systems 1 and 2 suggests to me something quite interesting with respect to the relationship between beliefs and their role in decision making, and modeling them as separate people with very different personalities seems to work pretty darn well when I test my suspicions.
I read Atlas Shrugged probably about a decade ago. I was impressed with its defense of capitalism, which really hammers home the reasons it’s good and important on a gut level. But I was equally turned off by its promotion of selfishness as a moral ideal. I thought that was *basically* just being a jerk. After all, if there’s one thing the world doesn’t need (I thought) it’s more selfishness.
Then I talked to a friend who told me Atlas Shrugged had changed his life. That he’d been raised in a really strict family that had told him that ever enjoying himself was selfish and made him a bad person, that he had to be working at every moment to make his family and other people happy or else let them shame him to pieces. And the revelation that it was sometimes okay to consider your own happiness gave him the strength to stand up to them and turn his life around, while still keeping the basic human instinct of helping others when he wanted to and he felt they deserved it (as, indeed, do Rand characters). --Scott of Slate Star Codex in All Debates Are Bravery Debates
If you're generous to a fault, "I should be more selfish" is probably a belief that will pay off in positive outcomes should you install it for future use. If you're selfish to a fault, the same belief will be harmful. So what if you were too generous half of the time and too selfish the other half? Well, then you would want to believe "I should be more selfish" with only the generous half, while disbelieving it with the selfish half.
Systems 1 and 2 need to hear different things. System 2 might be able to understand the reality of biases and make appropriate adjustments that would work if System 1 were on board, but System 1 isn't so great at being reasonable. And it's not System 2 that's in charge of most of your actions. If you want your beliefs to positively influence your actions (which is the point of beliefs, after all), you need to tailor your beliefs to System 1's needs.
For example: The planning fallacy is nearly ubiquitous. I know this because for the past three years or so, I've gotten everywhere five to fifteen minutes early. Almost every single person I meet with arrives five to fifteen minutes late. It is very rare for someone to be on time, and only twice in three years have I encountered the (rather awkward) circumstance of meeting with someone who also arrived early.
Before three years ago, I was also usually late, and I far underestimated how long my projects would take. I knew, abstractly and intellectually, about the planning fallacy, but that didn't stop System 1 from thinking things would go implausibly quickly. System 1's just optimistic like that. It responds to, "Dude, that is not going to work, and I have a twelve point argument supporting my position and suggesting alternative plans," with "Naaaaw, it'll be fine! We can totally make that deadline."
At some point (I don't remember when or exactly how), I gained the ability to look at the true due date, shift my System 1 beliefs to make up for the planning fallacy, and then hide my memory that I'd ever seen the original due date. I would see that my flight left at 2:30, and be surprised to discover on travel day that I was not late for my 2:00 flight, but a little early for my 2:30 one. I consistently finished projects on time, and only disasters caused me to be late for meetings. It took me about three months before I noticed the pattern and realized what must be going on.
I got a little worried I might make a mistake, such as leaving a meeting thinking the other person just wasn't going to show when the actual meeting time hadn't arrived. I did have a couple close calls along those lines. But it was easy enough to fix; in important cases, I started receiving Boomeranged notes from past-me around the time present-me expected things to start that said, "Surprise! You've still got ten minutes!"
This unquestionably improved my life. You don't realize just how inconvenient the planning fallacy is until you've left it behind. Clearly, considered in isolation, the action of believing falsely in this domain was instrumentally rational.
Doublethink, and the Dark Arts generally, applied to carefully chosen domains is a powerful tool. It's dumb to believe false things about really dangerous stuff like driving, obviously. But you don't have to doublethink indiscriminately. As long as you're careful, as long as you suspend epistemic rationality only when it's clearly beneficial to do so, employing doublethink at will is a great idea.
Instrumental rationality is what really matters. Epistemic rationality is useful, but what use is holding accurate beliefs in situations where that won't get you what you want?
Against Against Against Doublethink
There are indeed epistemically irrational actions that are instrumentally rational, and instrumental rationality is what really matters. It is pointless to believe true things if it doesn't get you what you want. This has always been very obvious to me, and it remains so.
There is a bigger picture.
Certain epistemic rationality techniques are not compatible with dark side epistemology. Most importantly, the Dark Arts do not play nicely with "notice your confusion", which is essentially your strength as a rationalist. If you use doublethink on purpose, confusion doesn't always indicate that you need to find out what false thing you believe so you can fix it. Sometimes you have to bury your confusion. There's an itsy bitsy pause where you try to predict whether it's useful to bury.
As soon as I finally decided to abandon the Dark Arts, I began to sweep out corners I'd allowed myself to neglect before. They were mainly corners I didn't know I'd neglected.
The first one I noticed was the way I responded to requests from my boyfriend. He'd mentioned before that I often seemed resentful when he made requests of me, and I'd insisted that he was wrong, that I was actually happy all the while. (Notice that in the short term, since I was probably going to do as he asked anyway, attending to the resentment would probably have made things more difficult for me.) This self-deception went on for months.
Shortly after I gave up doublethink, he made a request, and I felt a little stab of dissonance. Something I might have swept away before, because it seemed more immediately useful to bury the confusion than to notice it. But I thought (wordlessly and with my emotions), "No, look at it. This is exactly what I've decided to watch for. I have noticed confusion, and I will attend to it."
It was very upsetting at first to learn that he'd been right. I feared the implications for our relationship. But that fear didn't last, because we both knew the only problems you can solve are the ones you acknowledge, so it is a comfort to know the truth.
I was far more shaken by the realization that I really, truly was ignorant that this had been happening. Not because the consequences of this one bit of ignorance were so important, but because who knows what other epistemic curses have hidden themselves in the shadows? I realized that I had not been in control of my doublethink, that I couldn't have been.
Pinning down that one tiny little stab of dissonance took great preparation and effort, and there's no way I'd been working fast enough before. "How often," I wondered, "does this kind of thing happen?"
Very often, it turns out. I began noticing and acting on confusion several times a day, where before I'd been doing it a couple times a week. I wasn't just noticing things that I'd have ignored on purpose before; I was noticing things that would have slipped by because my reflexes slowed as I weighed the benefit of paying attention. "Ignore it" was not an available action in the face of confusion anymore, and that was a dramatic change. Because there are no disruptions, acting on confusion is becoming automatic.
I can't know for sure which bits of confusion I've noticed since the change would otherwise have slipped by unseen. But here's a plausible instance. Tonight I was having dinner with a friend I've met very recently. I was feeling a little bit tired and nervous, so I wasn't putting as much effort as usual into directing the conversation. At one point I realized we had stopped making any progress toward my goals, since it was clear we were drifting toward small talk. In a tired and slightly nervous state, I imagine that I might have buried that bit of information and abdicated responsibility for the conversation--not by means of considering whether allowing small talk to happen was actually a good idea, but by not pouncing on the dissonance aggressively, and thereby letting it get away. Instead, I directed my attention at the feeling (without effort this time!), inquired of myself what precisely was causing it, identified the prediction that the current course of conversation was leading away from my goals, listed potential interventions, weighed their costs and benefits against my simulation of small talk, and said, "What are your terminal values?"
(I know that sounds like a lot of work, but it took at most three seconds. The hard part was building the pouncing reflex.)
When you know that some of your beliefs are false, and you know that leaving them be is instrumentally rational, you do not develop the automatic reflex of interrogating every suspicion of confusion. You might think you can do this selectively, but if you do, I strongly suspect you're wrong in exactly the way I was.
I have long been more viscerally motivated by things that are interesting or beautiful than by things that correspond to the territory. So it's not too surprising that toward the beginning of my rationality training, I went through a long period of being so enamored with a-veridical instrumental techniques--things like willful doublethink--that I double-thought myself into believing accuracy was not so great.
But I was wrong. And that mattered. Having accurate beliefs is a ridiculously convergent incentive. Every utility function that involves interaction with the territory--interaction of just about any kind!--benefits from a sound map. Even if "beauty" is a terminal value, "being viscerally motivated to increase your ability to make predictions that lead to greater beauty" increases your odds of success.
Dark side epistemology prevents total dedication to continuous improvement in epistemic rationality. Though individual dark side actions may be instrumentally rational, the patterns of thought required to allow them are not. Though instrumental rationality is ultimately the goal, your instrumental rationality will always be limited by your epistemic rationality.
That was important enough to say again: Your instrumental rationality will always be limited by your epistemic rationality.
It only takes a fraction of a second to sweep an observation into the corner. You don't have time to decide whether looking at it might prove problematic. If you take the time to protect your compartments, false beliefs you don't endorse will slide in from everywhere through those split-second cracks in your art. You must attend to your confusion the very moment you notice it. You must be relentless and unmerciful toward your own beliefs.
Excellent epistemology is not the natural state of a human brain. Rationality is hard. Without extreme dedication and advanced training, without reliable automatic reflexes of rational thought, your belief structure will be a mess. You can't have totally automatic anti-rationalization reflexes if you use doublethink as a technique of instrumental rationality.
This has been a difficult lesson for me. I have lost some benefits I'd gained from the Dark Arts. I'm late now, sometimes. And painful truths are painful, though now they are sharp and fast instead of dull and damaging.
And it is so worth it! I have much more work to do before I can move on to the next thing. But whatever the next thing is, I'll tackle it with far more predictive power than I otherwise would have--though I doubt I'd have noticed the difference.
So when I say that I'm against against against doublethink--that dark side epistemology is bad--I mean that there is more potential on the light side, not that the dark side has no redeeming features. Its fruits hang low, and they are delicious.
But the fruits of the light side are worth the climb. You'll never even know they're there if you gorge yourself in the dark forever.
Robin Hanson once wrote:
On average, contrarian views are less accurate than standard views. Honest contrarians should admit this, that neutral outsiders should assign most contrarian views a lower probability than standard views, though perhaps a high enough probability to warrant further investigation. Honest contrarians who expect reasonable outsiders to give their contrarian view more than normal credence should point to strong outside indicators that correlate enough with contrarians tending more to be right.
I tend to think through the issue in three stages:
- When should I consider myself to be holding a contrarian view? What is the relevant expert community?
- If I seem to hold a contrarian view, when do I have enough reason to think I’m correct?
- If I seem to hold a correct contrarian view, what can I do to give other people good reasons to accept my view, or at least to take it seriously enough to examine it at length?
I don’t yet feel that I have “answers” to these questions, but in this post (and hopefully some future posts) I’d like to organize some of what has been said before, and push things a bit further along, in the hope that further discussion and inquiry will contribute toward significant progress in social epistemology. Basically, I hope to say a bunch of obvious things, in a relatively well-organized fashion, so that less obvious things can be said from there.
In this post, I’ll just address stage 1. Hopefully I’ll have time to revisit stages 2 and 3 in future posts.
Is my view contrarian?
World model differences vs. value differences
Is my effective altruism a contrarian view? It seems to be more of a contrarian value judgment than a contrarian world model, and by “contrarian view” I tend to mean “contrarian world model.” Some apparently contrarian views are probably actually contrarian values.
But what’s the relevant expert population, here? Suppose it’s “academics who specialize in the arguments and evidence concerning whether a god or gods exist.” If so, then the expert population is probably dominated by academic theologians and religious philosophers, and my atheism is a contrarian view.
For example, we should consider the selection effects operating on communities of experts. If someone doesn’t believe in God, they’re unlikely to spend their career studying arcane arguments for and against God’s existence. So most people who specialize in this topic are theists, but nearly all of them were theists before they knew the arguments.
Perhaps instead the relevant expert community is “scholars who study the fundamental nature of the universe” (philosophers and physicists, maybe?). They’re mostly atheists. This is starting to get pretty ad-hoc, but maybe that’s unavoidable.
What about my view that the overall long-term impact of AGI will be, most likely, extremely bad? A recent survey of the top 100 authors in artificial intelligence (by citation index) suggests that my view is somewhat out of sync with the views of those researchers. But is that the relevant expert population? My impression is that AI experts know a lot about contemporary AI methods, especially within their subfield, but usually haven’t thought much about, or read much about, long-term AI impacts.
Instead, perhaps I’d need to survey “AGI impact experts” to tell whether my view is contrarian. But who is that, exactly? There’s no standard credential.
Moreover, the most plausible candidates around today for “AGI impact experts” are — like the “experts” of many other fields — mere “scholastic experts,” in that they know a lot about the arguments and evidence typically brought to bear on questions of long-term AI outcomes. They generally are not experts in the sense of “Reliably superior performance on representative tasks” — they don’t have uniquely good track records on predicting long-term AI outcomes, for example. As far as I know, they don’t even have uniquely good track records on predicting short-term geopolitical or sci-tech outcomes — e.g. they aren’t among the “super forecasters” discovered in IARPA’s forecasting tournaments.
Furthermore, we might start to worry about selection effects, again. E.g. if we ask AGI experts when they think AGI will be built, they may be overly optimistic about the timeline: after all, if they didn’t think AGI was feasible soon, they probably wouldn’t be focusing their careers on it.
Perhaps we can salvage this approach for determining whether one has a contrarian view, but for now, let’s consider another proposal.
Mildly extrapolated elite opinion
Nick Beckstead instead suggests that, at least as a strong prior, one should believe what one thinks “a broad coalition of trustworthy people would believe if they were trying to have accurate views and they had access to [one’s own] evidence.” Below, I’ll propose a modification of Beckstead’s approach which aims to address the “Is my view contrarian?” question, and I’ll call it the “mildly extrapolated elite opinion” (MEEO) method for determining the relevant expert population. 
First: which people are “trustworthy”? With Beckstead, I favor “giving more weight to the opinions of people who can be shown to be trustworthy by clear indicators that many people would accept, rather than people that seem trustworthy to you personally.” (This guideline aims to avoid parochialism and self-serving cognitive biases.)
What are some “clear indicators that many people would accept”? Beckstead suggests:
IQ, business success, academic success, generally respected scientific or other intellectual achievements, wide acceptance as an intellectual authority by certain groups of people, or success in any area where there is intense competition and success is a function of ability to make accurate predictions and good decisions…
Of course, trustworthiness can also be domain-specific. Very often, elite common sense would recommend deferring to the opinions of experts (e.g., listening to what physicists say about physics, what biologists say about biology, and what doctors say about medicine). In other cases, elite common sense may give partial weight to what putative experts say without accepting it all (e.g. economics and psychology). In other cases, they may give less weight to what putative experts say (e.g. sociology and philosophy).
Hence MEEO outsources the challenge of evaluating academic consensus in different fields to the “generally trustworthy people.” But in doing so, it raises several new challenges. How do we determine which people are trustworthy? How do we “mildly extrapolate” their opinions? How do we weight those mildly extrapolated opinions in combination?
This approach might also be promising, or it might be even harder to use than the “expert consensus” method.
In practice, I tend to do something like this:
- To determine whether my view is contrarian, I ask whether there’s a fairly obvious, relatively trustworthy expert population on the issue. If there is, I try to figure out what their consensus on the matter is. If it’s different than my view, I conclude I have a contrarian view.
- If there isn’t an obvious trustworthy expert population on the issue from which to extract a consensus view, then I basically give up on step 1 (“Is my view contrarian?”) and just move to the model combination in step 2 (see below), retaining pretty large uncertainty about how contrarian my view might be.
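The two-step habit above can be written down as a tiny decision procedure. This is a sketch only: `contrarian_status`, its string labels, and the use of `None` to mean "no obvious trustworthy expert population" are hypothetical names chosen for illustration, and comparing views by simple equality glosses over the hard part, which is extracting a consensus in the first place.

```python
from typing import Optional


def contrarian_status(my_view: str, expert_consensus: Optional[str]) -> str:
    """Classify a view following the two bullets above.

    expert_consensus is None when there is no fairly obvious, relatively
    trustworthy expert population to extract a consensus from.
    """
    if expert_consensus is None:
        # Give up on step 1 and carry large uncertainty into step 2.
        return "unknown"
    if my_view != expert_consensus:
        return "contrarian"
    return "not contrarian"


# Hypothetical inputs, for illustration only.
print(contrarian_status("view A", "view B"))
print(contrarian_status("view A", "view A"))
print(contrarian_status("view A", None))
```

The "unknown" branch is the interesting one: it does not settle the question, it just hands the uncertainty on to the model combination step.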
When do I have good reason to think I’m correct?
Suppose I conclude I have a contrarian view, as I plausibly have about long-term AGI outcomes, and as I might have about the technological feasibility of preserving myself via cryonics. How much evidence do I need to conclude that my view is justified despite the informed disagreement of others?
I’ll try to tackle that question in a future post. Not surprisingly, my approach is a kind of model combination and adjustment.
I don’t have a concise definition for what counts as a “contrarian view.” In any case, I don’t think that searching for an exact definition of “contrarian view” is what matters. In an email conversation with me, Holden Karnofsky concurred, making the point this way: “I agree with you that the idea of ‘contrarianism’ is tricky to define. I think things get a bit easier when you start looking for patterns that should worry you rather than trying to Platonically define contrarianism… I find ‘Most smart people think I’m bonkers about X’ and ‘Most people who have studied X more than I have plus seem to generally think like I do think I’m wrong about X’ both worrying; I find ‘Most smart people think I’m wrong about X’ and ‘Most people who spend their lives studying X within a system that seems to be clearly dysfunctional and to have a bad track record think I’m bonkers about X’ to be less worrying.” ↩
For a diverse set of perspectives on the social epistemology of disagreement and contrarianism not influenced (as far as I know) by the Overcoming Bias and Less Wrong conversations about the topic, see Christensen (2009); Ericsson et al. (2006); Kuchar (forthcoming); Miller (2013); Gelman (2009); Martin & Richards (1995); Schwed & Bearman (2010); Intemann & de Melo-Martin (2013). Also see Wikipedia’s article on scientific consensus. ↩
I suppose I should mention that my entire inquiry here is, à la Goldman (1998), premised on the assumptions that (1) the point of epistemology is the pursuit of correspondence-theory truth, and (2) the point of social epistemology is to evaluate which social institutions and practices have instrumental value for producing true or well-calibrated beliefs. ↩
Holden Karnofsky seems to agree: “I think effective altruism falls somewhere on the spectrum between ‘contrarian view’ and ‘unusual taste.’ My commitment to effective altruism is probably better characterized as ‘wanting/choosing to be an effective altruist’ than as ‘believing that effective altruism is correct.’” ↩
Without such heuristics, we can also rather quickly arrive at contradictions. For example, the majority of scholars who specialize in Allah’s existence believe that Allah is the One True God, and the majority of scholars who specialize in Yahweh’s existence believe that Yahweh is the One True God. Consistency isn’t everything, but contradictions like this should still be a warning sign. ↩
According to the PhilPapers Surveys, 72.8% of philosophers are atheists, 14.6% are theists, and 12.6% categorized themselves as “other.” If we look only at metaphysicians, atheism remains dominant at 73.7%. If we look only at analytic philosophers, we again see atheism at 76.3%. As for physicists: Larson & Witham (1997) found that 77.9% of physicists and astronomers are disbelievers, and Pew Research Center (2009) found that 71% of physicists and astronomers did not believe in a god. ↩
Muller & Bostrom (forthcoming). “Future Progress in Artificial Intelligence: A Poll Among Experts.” ↩
But this is unclear. First, I haven’t read the forthcoming paper, so I don’t yet have the full results of the survey, along with all its important caveats. Second, distributions of expert opinion can vary widely between polls. For example, Schlosshauer et al. (2013) report the results of a poll given to participants in a 2011 quantum foundations conference (mostly physicists). When asked “When will we have a working and useful quantum computer?”, 9% said “within 10 years,” 42% said “10–25 years,” 30% said “25–50 years,” 0% said “50–100 years,” and 15% said “never.” But when the exact same questions were asked of participants at another quantum foundations conference just two years later, Norsen & Nelson (2013) report, the distribution of opinion was substantially different: 9% said “within 10 years,” 22% said “10–25 years,” 20% said “25–50 years,” 21% said “50–100 years,” and 12% said “never.” ↩
I say “they” in this paragraph, but I consider myself to be a plausible candidate for an “AGI impact expert,” in that I’m unusually familiar with the arguments and evidence typically brought to bear on questions of long-term AI outcomes. I also don’t have a uniquely good track record on predicting long-term AI outcomes, nor am I among the discovered “super forecasters.” I haven’t participated in IARPA’s forecasting tournaments myself because it would just be too time consuming. I would, however, very much like to see these super forecasters grouped into teams and tasked with forecasting longer-term outcomes, so that we can begin to gather scientific data on which psychological and computational methods result in the best predictive outcomes when considering long-term questions. Given how long it takes to acquire these data, we should start as soon as possible. ↩
Beckstead’s “elite common sense” prior and my “mildly extrapolated elite opinion” method are epistemic notions that involve some kind of idealization or extrapolation of opinion. One earlier such proposal in social epistemology was Habermas’ “ideal speech situation,” a situation of unlimited discussion between free and equal humans. See Habermas’ “Wahrheitstheorien” in Schulz & Fahrenbach (1973) or, for an English description, Geuss (1981), pp. 65–66. See also the discussion in Tucker (2003), pp. 502–504. ↩
Beckstead calls his method the “elite common sense” prior. I’ve named my method differently for two reasons. First, I want to distinguish MEEO from Beckstead’s prior, since I’m using the method for a slightly different purpose. Second, I think “elite common sense” is a confusing term even for Beckstead’s prior, since there’s some extrapolation of views going on. But also, it’s only a “mild” extrapolation — e.g. we aren’t asking what elites would think if they knew everything, or if they could rewrite their cognitive software for better reasoning accuracy. ↩
My rough impression is that among the people who seem to have thought long and hard about AGI outcomes, and seem to me to exhibit fairly good epistemic practices on most issues, my view on AGI outcomes is still an outlier in its pessimism about the likelihood of desirable outcomes. But it’s hard to tell: there haven’t been systematic surveys of the important-to-me experts on the issue. I also wonder whether my views about long-term AGI outcomes are more a matter of seriously tackling a contrarian question rather than being a matter of having a particularly contrarian view. On this latter point, see this Facebook discussion. ↩
I haven’t seen a poll of cryobiologists on the likely future technological feasibility of cryonics. Even if there were such polls, I’d wonder whether cryobiologists also had the relevant philosophical and neuroscientific expertise. I should mention that I’m not personally signed up for cryonics, for these reasons. ↩
In the previous article in this sequence, I conducted a thought experiment in which simple probability was not sufficient to choose how to act. Rationality required reasoning about meta-probabilities, the probabilities of probabilities.
Relatedly, lukeprog has a brief post that explains how this matters; a long article by HoldenKarnofsky makes meta-probability central to utilitarian estimates of the effectiveness of charitable giving; and Jonathan_Lee, in a reply to that, has used the same framework I presented.
In my previous article, I ran thought experiments that presented you with various colored boxes you could put coins in, gambling with uncertain odds.
The last box I showed you was blue. I explained that it had a fixed but unknown probability of a twofold payout, uniformly distributed between 0 and 0.9. The overall probability of a payout was 0.45, so the expectation value for gambling was 0.9—a bad bet. Yet your optimal strategy was to gamble a bit to figure out whether the odds were good or bad.
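As a quick check on the arithmetic above, here is a minimal Python sketch. The function names (`expected_return`, `fraction_favourable`, `simulate`) are illustrative choices of mine, not from the original article, and the simulation assumes one coin is gambled per freshly drawn box. It recomputes the 0.9 expectation, estimates it by simulation, and also reports what share of such boxes would be a good bet if their true probability were known.

```python
import random


def expected_return(p_low=0.0, p_high=0.9, payout=2.0):
    """Expected return per coin when the payout probability is uniform on [p_low, p_high]."""
    mean_p = (p_low + p_high) / 2  # mean of a uniform distribution: 0.45 for the blue box
    return mean_p * payout         # 0.45 * 2 = 0.9 coins back per coin gambled


def fraction_favourable(p_low=0.0, p_high=0.9, payout=2.0):
    """Share of boxes whose fixed probability beats the break-even point 1 / payout."""
    break_even = 1.0 / payout  # 0.5 for a twofold payout
    favourable_width = max(0.0, p_high - max(p_low, break_even))
    return favourable_width / (p_high - p_low)


def simulate(n_boxes=100_000, seed=0):
    """Monte Carlo estimate: draw a box, gamble one coin in it, average the returns."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_boxes):
        p = rng.uniform(0.0, 0.9)  # the box's fixed but unknown payout probability
        if rng.random() < p:
            total += 2.0           # twofold payout on a win, nothing otherwise
    return total / n_boxes


if __name__ == "__main__":
    print("analytic expectation:", expected_return())
    print("simulated expectation:", simulate())
    print("share of favourable boxes:", fraction_favourable())
```

Under these assumptions a bit under half of such boxes would individually be favourable, which fits the point that some exploratory gambling can be worthwhile even though the average bet loses.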
Let’s continue the experiment. I hand you a black box, shaped rather differently from the others. Its sealed faceplate is carved with runic inscriptions and eldritch figures. “I find this one particularly interesting,” I say.
This article is the first in a sequence that will consider situations where probability estimates are not, by themselves, adequate to make rational decisions. This one introduces a "meta-probability" approach, borrowed from E. T. Jaynes, and uses it to analyze a gambling problem. This situation is one in which reasonably straightforward decision-theoretic methods suffice. Later articles introduce increasingly problematic cases.
[I have edited the introduction of this post for increased clarity.]
This post is my attempt to answer the question, "How should we take account of the distribution of opinion and epistemic standards in the world?" By “epistemic standards,” I roughly mean a person’s way of processing evidence to arrive at conclusions. If people were good Bayesians, their epistemic standards would correspond to their fundamental prior probability distributions. At a first pass, my answer to this question is:
Main Recommendation: Believe what you think a broad coalition of trustworthy people would believe if they were trying to have accurate views and they had access to your evidence.
The rest of the post can be seen as an attempt to spell this out more precisely and to explain, in practical terms, how to follow the recommendation. Note that there are therefore two broad ways to disagree with the post: you might disagree with the main recommendation, or with the guidelines for following the main recommendation.
I am aware of two relatively close intellectual relatives to my framework: what philosophers call “equal weight” or “conciliatory” views about disagreement and what people on LessWrong may know as “philosophical majoritarianism.” Equal weight views roughly hold that when two people who are expected to be roughly equally competent at answering a certain question have different subjective probability distributions over answers to that question, those people should adopt some impartial combination of their subjective probability distributions. Unlike equal weight views in philosophy, my position is meant as a set of rough practical guidelines rather than a set of exceptionless and fundamental rules. I accordingly focus on practical issues for applying the framework effectively and am open to limiting the framework’s scope of application. Philosophical majoritarianism is the idea that on most issues, the average opinion of humanity as a whole will be a better guide to the truth than one’s own personal judgment. My perspective differs from both equal weight views and philosophical majoritarianism in that it emphasizes an elite subset of the population rather than humanity as a whole and that it emphasizes epistemic standards more than individual opinions. My perspective differs from what you might call "elite majoritarianism" in that, according to me, you can disagree with what very trustworthy people think on average if you think that those people would accept your views if they had access to your evidence and were trying to have accurate opinions.
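One simple reading of "some impartial combination" is a linear opinion pool, that is, a weighted average of the two probability distributions with equal weights. That reading is my illustrative assumption, not necessarily what equal weight theorists or this post have in mind; the function name `linear_pool` and the example numbers are hypothetical.

```python
def linear_pool(distributions, weights=None):
    """Weighted average of discrete probability distributions over the same outcomes."""
    n = len(distributions)
    if n == 0:
        raise ValueError("need at least one distribution")
    if weights is None:
        weights = [1.0 / n] * n  # equal weight: every peer counts the same
    if len(weights) != n or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must match the distributions and sum to 1")
    size = len(distributions[0])
    if any(len(d) != size for d in distributions):
        raise ValueError("all distributions must cover the same outcomes")
    # Average the probability assigned to each outcome across people.
    return [sum(w * d[i] for w, d in zip(weights, distributions)) for i in range(size)]


# Two equally competent people disagree about a yes/no question.
alice = [0.8, 0.2]  # hypothetical: P(yes), P(no)
bob = [0.4, 0.6]
pooled = linear_pool([alice, bob])
print(pooled)
```

Passing unequal `weights` would model the post's emphasis on giving more weight to more trustworthy people, though how to pick those weights is exactly the hard part.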
I am very grateful to Holden Karnofsky and Jonah Sinick for thought-provoking conversations on this topic which led to this post. Many of the ideas ultimately derive from Holden’s thinking, but I've developed them, made them somewhat more precise and systematic, discussed additional considerations for and against adopting them, and put everything in my own words. I am also grateful to Luke Muehlhauser and Pablo Stafforini for feedback on this post.
In the rest of this post I will:
- Outline the framework and offer guidelines for applying it effectively. I explain why I favor relying on the epistemic standards of people who are trustworthy by clear indicators that many people would accept, why I favor paying more attention to what people think than why they say they think it (on the margin), and why I favor stress-testing critical assumptions by attempting to convince a broad coalition of trustworthy people to accept them.
- Offer some considerations in favor of using the framework.
- Respond to the objection that common sense is often wrong, the objection that the most successful people are very unconventional, and objections of the form “elite common sense is wrong about X and can’t be talked out of it.”
- Discuss some limitations of the framework and some areas where it might be further developed. I suspect it is weakest in cases where there is a large upside to disregarding elite common sense, there is little downside, and you’ll find out whether your bet against conventional wisdom was right within a tolerable time limit, and cases where people are unwilling to carefully consider arguments with the goal of having accurate beliefs.
I give a history of the 2009 leaked script, discuss internal & external evidence for its authenticity including stylometrics; and then give a simple step-by-step Bayesian analysis of each point. We finish with high confidence in the script's authenticity, discussion of how this analysis was surprisingly enlightening, and what followup work the analysis suggests would be most valuable.
Strategic Reliabilism is an epistemological framework that, unlike other contemporary academic theories, is grounded in psychology and seeks to give genuine advice on how to form beliefs. The framework was first laid out by Michael Bishop and J.D. Trout in their book Epistemology and the Psychology of Human Judgment. Although regular readers won’t necessarily find a lot of new material here, Bishop and Trout provide a clear description of many of the working assumptions and goals of this community. In contrast to standard epistemology, which seeks to explain what constitutes a justified belief, Strategic Reliabilism is meant to explain excellent reasoning. In particular, reasoning is excellent to the extent it reliably and efficiently produces truths about significant matters. When combined with the Aristotelian principle that good reasoning tends to produce good outcomes in the long run (i.e. rationalists should win), empirical findings about good reasoning gain prescriptive power. Rather than getting bogged down in definitional debates, epistemology really is about being less wrong.
The book is an easily read 150 pages, and I highly recommend you find a copy, but a chapter-by-chapter summary is below. As I said, you might not find a lot of new ideas in this book, but it went a long way in clarifying how I think about this topic. For instance, even though it can seem trivial to be told to focus on significant problems, these basic issues deserve a little extra thought.
If you enjoy podcasts, check out lukeprog’s interview with Michael Bishop. This article provides another overview of Strategic Reliabilism, addressing objections raised since the publication of the book.
ETA: As stated below, criticizing beliefs is trivial in principle: either they were arrived at with an approximation to Bayes' rule starting with a reasonable prior and then updated with actual observations, or they weren't. Subsequent conversation made it clear that criticizing behavior is also trivial in principle, since someone is either taking the action that they believe will best suit their preferences, or not. Finally, criticizing preferences became trivial too -- the relevant question is "Does/will agent X behave as though they have preferences Y", and that's a belief, so go back to Bayes' rule and a reasonable prior. So the entire issue that this post was meant to solve has evaporated, in my opinion. Here's the original article, in case anyone is still interested:
Pancritical rationalism is a fundamental value in Extropianism that has only been mentioned in passing on LessWrong. I think it deserves more attention here. It's an approach to epistemology, that is, the question of "How do we know what we know?", that avoids the contradictions inherent in some of the alternative approaches.
The fundamental source document for it is William Bartley's Retreat to Commitment. He describes three approaches to epistemology, along with the dissatisfying aspects of the other two:
- Nihilism. Nothing matters, so it doesn't matter what you believe. This path is self-consistent, but it gives no guidance.
- Justificationalism. Your belief is justified because it is a consequence of other beliefs. This path is self-contradictory. Eventually you'll go in circles trying to justify the other beliefs, or you'll find beliefs you can't justify. Justificationalism itself cannot be justified.
- Pancritical rationalism. You have taken the available criticisms for the belief into account and still feel comfortable with the belief. This path gives guidance about what to believe, although it does not uniquely determine one's beliefs. Pancritical rationalism can be criticized, so it is self-consistent in that sense.
Read on for a discussion about emotional consequences and extending this to include preferences and behaviors as well as beliefs.
Designed to gauge responses to some parts of the planned “Noticing confusion about meta-ethics” sequence, which should intertwine with or be absorbed by Lukeprog’s meta-ethics sequence at some point.
Disclaimer: I am going to leave out many relevant details. If you want, you can bring them up in the comments, but in general meta-ethics is still very confusing and thus we could list relevant details all day and still be confused. There are a lot of subtle themes and distinctions that have thus far been completely ignored by everyone, as far as I can tell.
Problem 1: Torture versus specks
Imagine you’re at a Less Wrong meetup when out of nowhere Eliezer Yudkowsky proposes his torture versus dust specks problem. Years of bullet-biting make this a trivial dilemma for any good philosopher, but suddenly you have a seizure during which you vividly recall all of those history lessons where you learned about the horrible things people do when they feel justified in being blatantly evil because of some abstract moral theory that is at best an approximation of sane morality and at worst an obviously anti-epistemic spiral of moral rationalization. Temporarily humbled, you decide to think about the problem a little longer:
"Considering I am deciding the fate of 3^^^3+1 people, I should perhaps not immediately assert my speculative and controversial meta-ethics. Instead, perhaps I should use the averaged meta-ethics of the 3^^^3+1 people I am deciding for, since it is probable that they have preferences that implicitly cover edge cases such as this, and disregarding the meta-ethical preferences of 3^^^3+1 people is certainly one of the most blatantly immoral things one can do. After all, even if they never learn anything about this decision taking place, people are allowed to have preferences about it. But... that the majority of people believe something doesn’t make it right, and that the majority of people prefer something doesn’t make it right either. If I expect that these 3^^^3+1 people are mostly wrong about morality and would not reflectively endorse their implicit preferences being used in this decision instead of my explicitly reasoned and reflected upon preferences, then I should just go with mine, even if I am knowingly arrogantly blatantly disregarding the current preferences of 3^^^3 currently-alive-and-not-just-hypothetical people in doing so and thus causing negative utility many, many, many times more severe than the 3^^^3 units of negative utility I was trying to avert. I may be willing to accept this sacrifice, but I should at least admit that what I am doing largely ignores their current preferences, and there is some chance it is wrong upon reflection regardless, for though I am wiser than those 3^^^3+1 people, I notice that I too am confused."
You hesitantly give your answer and continue to ponder the analogies to Eliezer’s document “CEV”, and this whole business about “extrapolation”...
(Thinking of people as having coherent non-contradictory preferences is very misleadingly wrong, not taking into account preferences at gradient levels of organization is probably wrong, not thinking of typical human preferences as implicitly preferring to update in various ways is maybe wrong (i.e. failing to see preferences as processes embedded in time is probably wrong), et cetera, but I have to start somewhere and this is already glossing over way too much.)
Bonus problem 1: Taking trolleys seriously
"...Wait, considering how unlikely this scenario is, if I ever actually did end up in it then that would probably mean I was in some perverse simulation set up by empirical meta-ethicists with powerful computers, in which case they might use my decision as part of a propaganda campaign meant to somehow discredit consequentialist reasoning or maybe deontological reasoning, or maybe they'd use it for some other reason entirely, but at any rate that sure complicates the problem..." (HT: Steve Rayhawk)