There’s a story I like, about this little kid who wants to be a writer. So she writes a story and shows it to her teacher.
“You misspelt the word ‘ocean’”, says the teacher.
“No I didn’t!”, says the kid.
The teacher looks a bit apologetic, but persists: “‘Ocean’ is spelt with a ‘c’ rather than an ‘sh’; this makes sense, because the ‘e’ after the ‘c’ changes its sound…”
“No I didn’t!” interrupts the kid.
“Look,” says the teacher, “I get it that it hurts to notice mistakes. But that which can be destroyed by the truth should be! You did, in fact, misspell the word ‘ocean’.”
“I did not!” says the kid, whereupon she bursts into tears, and runs away and hides in the closet, repeating again and again: “I did not misspell the word! I can too be a writer!”.
Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality”
In the days since we published our previous post, a number of people have come up to me and expressed concerns about our new mission. Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”
I would here like to reply to these people and others, and to clarify what is and isn’t entailed by our new focus on AI safety.
"I feel like I'm not the sort of person who's allowed to have opinions about the important issues like AI risk."
"What's the bad thing that might happen if you expressed your opinion?"
"It would be wrong in some way I hadn't foreseen, and people would think less of me."
"Do you think less of other people who have wrong opinions?"
"Not if they change their minds when confronted with the evidence."
"Would you do that?"
"Do you think other people think less of those who do that?"
"Well, if it's alright for other people to make mistakes, what makes YOU so special?"
A lot of my otherwise very smart and thoughtful friends seem to have a mental block around thinking on certain topics, because they're the sort of topics Important People have Important Opinions around. There seem to be two very different reasons for this sort of block:
- Being wrong feels bad.
- They might lose the respect of others.
If you don't have an opinion, you can hold onto the fantasy that someday, once you figure the thing out, you'll end up having a right opinion. But if you put yourself out there with an opinion that's unmistakably your own, you don't have that excuse anymore.
This is related to the desire to pass tests. The smart kids go through school and are taught - explicitly or tacitly - that as long as they get good grades they're doing OK, and if they try at all they can get good grades. So when they bump up against a problem that might actually be hard, there's a strong impulse to look away, to redirect to something else. So they do.
You have to understand that this system is not real, it's just a game. In real life you have to be straight-up wrong sometimes. So you may as well get it over with.
If you expect to be wrong when you guess, then you're already wrong, and paying the price for it. As Eugene Gendlin said:
What is true is already so. Owning up to it doesn't make it worse. Not being open about it doesn't make it go away. And because it's true, it is what is there to be interacted with. Anything untrue isn't there to be lived. People can stand what is true, for they are already enduring it.
What you would be mistaken about, you're already mistaken about. Owning up to it doesn't make you any more mistaken. Not being open about it doesn't make it go away.
You're already "wrong" in the sense that your anticipations aren't perfectly aligned with reality. You just haven't put yourself in a situation where you've openly tried to guess the teacher's password. But if you want more power over the world, you need to focus your uncertainty - and this only reliably makes you righter if you repeatedly test your beliefs. Which means sometimes being wrong, and noticing. (And then, of course, changing your mind.)
Being wrong is how you learn - by testing hypotheses.
Getting used to being wrong - forming the boldest hypotheses your current beliefs can truly justify so that you can correct your model based on the data - is painful and I don't have a good solution to getting over it except to tough it out. But there's a part of the problem we can separate out, which is - the pain of being wrong publicly.
When I attended a Toastmasters club, one of the things I liked a lot about giving speeches there was that the stakes were low in terms of the content. If I were giving a presentation at work, I had to worry about my generic presentation skills, but also whether the way I was presenting it was a good match for my audience, and also whether the idea I was pitching was a good strategic move for the company or my career, and also whether the information I was presenting was accurate. At Toastmasters, all the content-related stakes were gone. No one with the power to promote or fire me was present. Everyone was on my side, and the group was all about helping each other get better. So all I had to think about was the form of my speech.
Once I'd learned some general presentation skills at Toastmasters, it became easier to give talks where I did care about the content and there were real-world consequences to the quality of the talk. I'd gotten practice on the form of public speaking separately - so now I could relax about that, and just focus on getting the content right.
Similarly, expressing opinions publicly can be stressful because of the work of generating likely hypotheses, and revealing to yourself that you are farther behind in understanding things than you thought - but also because of the perceived social consequences of sounding stupid. You can at least isolate the last factor, by starting out thinking things through in secret. This works by separating epistemic uncertainty from social confidence. (This is closely related to the dichotomy between social and objective respect.)
Of course, as soon as you can stand to do this in public, that's better - you'll learn faster, you'll get help. But if you're not there yet, this is a step along the way. If the choice is between having private opinions and having none, have private opinions. (Also related: If we can't lie to others, we will lie to ourselves.)
Read and discuss a book on a topic you want to have opinions about, with one trusted friend. Start a secret blog - or just take notes. Practice having opinions at all, that you can be wrong about, before you worry about being accountable for your opinions. One step at a time.
Before you're publicly right, consider being secretly wrong. Better to be secretly wrong, than secretly not even wrong.
(Cross-posted at my personal blog.)
A bit about our last few months:
- We’ve been working on getting a simple clear mission and an organization that actually works. We think of our goal as analogous to the transition that the old Singularity Institute underwent under Lukeprog (during which chaos was replaced by a simple, intelligible structure that made it easier to turn effort into forward motion).
- As part of that, we’ll need to find a way to be intelligible.
- This is the first of several blog posts aimed at causing our new form to be visible from outside. (If you're in the Bay Area, you can also come meet us at tonight's open house.) (We'll be talking more about the causes of this mission-change; the extent to which it is in fact a change, etc. in an upcoming post.)
We care a lot about AI Safety efforts in particular, and about otherwise increasing the odds that humanity reaches the stars.
Also, we believe such efforts are bottlenecked more by our collective epistemology, than by the number of people who verbally endorse or act on "AI Safety", or any other "spreadable viewpoint" disconnected from its derivation.
Our aim is therefore to find ways of improving both individual thinking skill, and the modes of thinking and social fabric that allow people to think together. And to do this among the relatively small sets of people tackling existential risk.
The most useful thinking skill I've taught myself, which I think should be more widely practiced, is writing what I call "fact posts." I write a bunch of these on my blog. (I write fact posts about pregnancy and childbirth here.)
To write a fact post, you start with an empirical question, or a general topic. Something like "How common are hate crimes?" or "Are epidurals really dangerous?" or "What causes manufacturing job loss?"
It's okay if this is a topic you know very little about. This is an exercise in original seeing and showing your reasoning, not finding the official last word on a topic or doing the best analysis in the world.
Then you open up a Google doc and start taking notes.
You look for quantitative data from conventionally reliable sources. CDC data for incidences of diseases and other health risks in the US; WHO data for global health issues; Bureau of Labor Statistics data for US employment; and so on. Published scientific journal articles, especially from reputable journals and large randomized studies.
You explicitly do not look for opinion, even expert opinion. You avoid news, and you're wary of think-tank white papers. You're looking for raw information. You are taking a sola scriptura approach, for better and for worse.
And then you start letting the data show you things.
You see things that are surprising or odd, and you note that.
You see facts that seem to be inconsistent with each other, and you look into the data sources and methodology until you clear up the mystery.
You orient towards the random, the unfamiliar, the things that are totally outside your experience. One of the major exports of Germany is valves? When was the last time I even thought about valves? Why valves, what do you use valves in? OK, show me a list of all the different kinds of machine parts, by percent of total exports.
And so, you dig in a little bit, to this part of the world that you hadn't looked at before. You cultivate the ability to spin up a lightweight sort of fannish obsessive curiosity when something seems like it might be a big deal.
And you take casual notes and impressions (though keeping track of all the numbers and their sources in your notes).
You do a little bit of arithmetic to compare things to familiar reference points. How does this source of risk compare to the risk of smoking or going horseback riding? How does the effect size of this drug compare to the effect size of psychotherapy?
You don't really want to do statistics. You might take percents, means, standard deviations, maybe a Cohen's d here and there, but nothing fancy. You're just trying to figure out what's going on.
It's often a good idea to rank things by raw scale. What is responsible for the bulk of deaths, the bulk of money moved, etc? What is big? Then pay attention more to things, and ask more questions about things, that are big. (Or disproportionately high-impact.)
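As a rough illustration of the "little bit of arithmetic" described above, here is a minimal Python sketch. It assumes you have copied a handful of numbers out of your sources by hand. The function names (`cohens_d`, `rank_by_scale`) and all of the numbers are made up for illustration and do not come from any real dataset.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference, using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def rank_by_scale(counts):
    """Sort categories from largest to smallest and attach each one's share of the total."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(name, value, value / total) for name, value in ranked]


# Hypothetical outcome scores for two groups (invented numbers).
treated = [6.0, 7.0, 8.0, 9.0, 10.0]
control = [4.0, 5.0, 6.0, 7.0, 8.0]
effect = cohens_d(treated, control)
print(f"Cohen's d: {effect:.2f}")

# Hypothetical counts per category (invented numbers).
made_up_counts = {"cause_b": 300, "cause_a": 600, "cause_c": 100}
ranking = rank_by_scale(made_up_counts)
for name, value, share in ranking:
    print(f"{name}: {value} ({share:.0%} of total)")
```

Nothing here is fancier than a mean, a standard deviation, and a sort, which is the point: the ranking tells you which categories are big enough to deserve more of your questions.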
You may find that this process gives you contrarian beliefs, but often you won't, you'll just have a strongly fact-based assessment of why you believe the usual thing.
There's a quality of ordinariness about fact-based beliefs. It's not that they're never surprising -- they often are. But if you do fact-checking frequently enough, you begin to have a sense of the world overall that stays in place, even as you discover new facts, instead of swinging wildly around at every new stimulus. For example, after doing lots and lots of reading of the biomedical literature, I have sort of a "sense of the world" of biomedical science -- what sorts of things I expect to see, and what sorts of things I don't. My "sense of the world" isn't that the world itself is boring -- I actually believe in a world rich in discoveries and low-hanging fruit -- but the sense itself has stabilized, feels like "yeah, that's how things are" rather than "omg what is even going on."
In areas where I'm less familiar, I feel more like "omg what is even going on", which sometimes motivates me to go accumulate facts.
Once you've accumulated a bunch of facts, and they've "spoken to you" with some conclusions or answers to your question, you write them up on a blog, so that other people can check your reasoning. If your mind gets changed, or you learn more, you write a follow-up post. You should, on any topic where you continue to learn over time, feel embarrassed by the naivety of your early posts. This is fine. This is how learning works.
The advantage of fact posts is that they give you the ability to form independent opinions based on evidence. It's a sort of practice of the skill of seeing. They likely aren't the optimal way to get the most accurate beliefs -- listening to the best experts would almost certainly be better -- but you, personally, may not know who the best experts are, or may be overwhelmed by the swirl of controversy. Fact posts give you a relatively low-effort way of coming to informed opinions. They make you into the proverbial 'educated layman.'
Being an 'educated layman' makes you much more fertile in generating ideas, for research, business, fiction, or anything else. Having facts floating around in your head means you'll naturally think of problems to solve, questions to ask, opportunities to fix things in the world, applications for your technical skills.
Ideally, a group of people writing fact posts on related topics, could learn from each other, and share how they think. I have the strong intuition that this is valuable. It's a bit more active than a "journal club", and quite a bit more casual than "research". It's just the activity of learning and showing one's work in public.
Double crux is one of CFAR's newer concepts, and one that's forced a re-examination and refactoring of a lot of our curriculum (in the same way that the introduction of TAPs and Inner Simulator did previously). It rapidly became a part of our organizational social fabric, and is one of our highest-EV threads for outreach and dissemination, so it's long overdue for a public, formal explanation.
Note that while the core concept is fairly settled, the execution remains somewhat in flux, with notable experimentation coming from Julia Galef, Kenzi Amodei, Andrew Critch, Eli Tyre, Anna Salamon, myself, and others. Because of that, this post will be less of a cake and more of a folk recipe: it is long and meandering on purpose, because the priority is to transmit the generators of the thing over the thing itself. Accordingly, if you think you see stuff that's wrong or missing, you're probably onto something, and we'd appreciate having it added here as commentary.
To a first approximation, a human can be thought of as a black box that takes in data from its environment, and outputs beliefs and behaviors (that black box isn't really "opaque" given that we do have access to a lot of what's going on inside of it, but our understanding of our own cognition seems uncontroversially incomplete).
When two humans disagree—when their black boxes output different answers, as below—there are often a handful of unproductive things that can occur.
The most obvious (and tiresome) is that they'll simply repeatedly bash those outputs together without making any progress (think most disagreements over sports or politics; the people above just shouting "triangle!" and "circle!" louder and louder). On the second level, people can (and often do) take the difference in output as evidence that the other person's black box is broken (i.e. they're bad, dumb, crazy) or that the other person doesn't see the universe clearly (i.e. they're biased, oblivious, unobservant). On the third level, people will often agree to disagree, a move which preserves the social fabric at the cost of truth-seeking and actual progress.
Double crux in the ideal solves all of these problems, and in practice even fumbling and inexpert steps toward that ideal seem to produce a lot of marginal value, both in increasing understanding and in decreasing conflict-due-to-disagreement.
This post will occasionally delineate two versions of double crux: a strong version, in which both parties have a shared understanding of double crux and have explicitly agreed to work within that framework, and a weak version, in which only one party has access to the concept, and is attempting to improve the conversational dynamic unilaterally.
In either case, the following things seem to be required:
- Epistemic humility. The number one foundational backbone of rationality seems, to me, to be how readily one is able to think "It's possible that I might be the one who's wrong, here." Viewed another way, this is the ability to take one's beliefs as object, rather than being subject to them and unable to set them aside (and then try on some other belief and productively imagine "what would the world be like if this were true, instead of that?").
- Good faith. An assumption that people believe things for causal reasons; a recognition that having been exposed to the same set of stimuli would have caused one to hold approximately the same beliefs; a default stance of holding-with-skepticism what seems to be evidence that the other party is bad or wants the world to be bad (because as monkeys it's not hard for us to convince ourselves that we have such evidence when we really don't).1
- Confidence in the existence of objective truth. I was tempted to call this "objectivity," "empiricism," or "the Mulder principle," but in the end none of those quite fit. In essence: a conviction that for almost any well-defined question, there really truly is a clear-cut answer. That answer may be impractically or even impossibly difficult to find, such that we can't actually go looking for it and have to fall back on heuristics (e.g. how many grasshoppers are alive on Earth at this exact moment, is the color orange superior to the color green, why isn't there an audio book of Fight Club narrated by Edward Norton), but it nevertheless exists.
- Curiosity and/or a desire to uncover truth. Originally, I had this listed as truth-seeking alone, but my colleagues pointed out that one can move in the right direction simply by being curious about the other person and the contents of their map, without focusing directly on the territory.
At CFAR workshops, we hit on the first and second through specific lectures, the third through osmosis, and the fourth through osmosis and a lot of relational dynamics work that gets people curious and comfortable with one another. Other qualities (such as the ability to regulate and transcend one's emotions in the heat of the moment, or the ability to commit to a thought experiment and really wrestle with it) are also helpful, but not as critical as the above.
How to play
Let's say you have a belief, which we can label A (for instance, "middle school students should wear uniforms"), and that you're in disagreement with someone who believes some form of ¬A. Double cruxing with that person means that you're both in search of a second statement B, with the following properties:
- You and your partner both disagree about B as well (you think B, your partner thinks ¬B).
- The belief B is crucial for your belief in A; it is one of the cruxes of the argument. If it turned out that B was not true, that would be sufficient to make you think A was false, too.
- The belief ¬B is crucial for your partner's belief in ¬A, in a similar fashion.
In the example about school uniforms, B might be a statement like "uniforms help smooth out unhelpful class distinctions by making it harder for rich and poor students to judge one another through clothing," which your partner might sum up as "optimistic bullshit." Ideally, B is a statement that is somewhat closer to reality than A—it's more concrete, grounded, well-defined, discoverable, etc. It's less about principles and summed-up, induced conclusions, and more of a glimpse into the structure that led to those conclusions.
(It doesn't have to be concrete and discoverable, though—often after finding B it's productive to start over in search of a C, and then a D, and then an E, and so forth, until you end up with something you can research or run an experiment on).
At first glance, it might not be clear why simply finding B counts as victory—shouldn't you settle B, so that you can conclusively choose between A and ¬A? But it's important to recognize that arriving at B means you've already dissolved a significant chunk of your disagreement, in that you and your partner now share a belief about the causal nature of the universe.
If B, then A. Furthermore, if ¬B, then ¬A. You've both agreed that the states of B are crucial for the states of A, and in this way your continuing "agreement to disagree" isn't just "well, you take your truth and I'll take mine," but rather "okay, well, let's see what the evidence shows." Progress! And (more importantly) collaboration!
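The three properties of B can be written down as a small check. The Python sketch below is only an illustration of the definition, not CFAR's procedure: the `Position` class, its fields, and the example statements are all hypothetical, and it assumes each person has already listed the statements that would flip their view on A.

```python
from dataclasses import dataclass


@dataclass
class Position:
    """One person's stance on the main claim A and on candidate statements."""
    believes_a: bool
    # Maps a candidate statement to this person's belief about it
    # (True means they think it holds, False means they think it does not).
    beliefs: dict
    # Statements whose reversal would be enough to flip this person's view on A.
    cruxes: frozenset


def double_cruxes(you, partner):
    """Return statements that are cruxes for both people and on which they disagree."""
    if you.believes_a == partner.believes_a:
        return []  # No disagreement about A, so there is nothing to double crux on.
    shared = you.cruxes & partner.cruxes
    return sorted(
        b for b in shared
        if b in you.beliefs and b in partner.beliefs
        and you.beliefs[b] != partner.beliefs[b]
    )


# Hypothetical example based on the school uniforms disagreement.
you = Position(
    believes_a=True,
    beliefs={
        "uniforms reduce class distinctions": True,
        "uniforms stifle self-expression": False,
    },
    cruxes=frozenset({"uniforms reduce class distinctions"}),
)
partner = Position(
    believes_a=False,
    beliefs={
        "uniforms reduce class distinctions": False,
        "uniforms stifle self-expression": True,
    },
    cruxes=frozenset({
        "uniforms reduce class distinctions",
        "uniforms stifle self-expression",
    }),
)

found = double_cruxes(you, partner)
print(found)
```

In this made-up case only the class-distinctions statement qualifies: it is a crux for both people and they disagree about it. The self-expression statement is a crux for the partner alone, so settling it would move only one side.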
This is where CFAR's versions of the double crux unit are currently weakest—there's some form of magic in the search for cruxes that we haven't quite locked down. In general, the method is "search through your cruxes for ones that your partner is likely to disagree with, and then compare lists." For some people and some topics, clearly identifying your own cruxes is easy; for others, it very quickly starts to feel like one's position is fundamental/objective/un-break-downable.
- Increase noticing of subtle tastes, judgments, and "karma scores." Often, people suppress a lot of their opinions and judgments due to social mores and so forth. Generally loosening up one's inner censors can make it easier to notice why we think X, Y, or Z.
- Look forward rather than backward. In places where the question "why?" fails to produce meaningful answers, it's often more productive to try making predictions about the future. For example, I might not know why I think school uniforms are a good idea, but if I turn on my narrative engine and start describing the better world I think will result, I can often sort of feel my way toward the underlying causal models.
- Narrow the scope. A specific test case of "Steve should've said hello to us when he got off the elevator yesterday" is easier to wrestle with than "Steve should be more sociable." Similarly, it's often easier to answer questions like "How much of our next $10,000 should we spend on research, as opposed to advertising?" than to answer "Which is more important right now, research or advertising?"
- Do "Focusing" and other resonance checks. It's often useful to try on a perspective, hypothetically, and then pay attention to your intuition and bodily responses to refine your actual stance. For instance: (wildly asserts) "I bet if everyone wore uniforms there would be a fifty percent reduction in bullying." (pauses, listens to inner doubts) "Actually, scratch that—that doesn't seem true, now that I say it out loud, but there is something in the vein of reducing overt bullying, maybe?"
- Seek cruxes independently before anchoring on your partner's thoughts. This one is fairly straightforward. It's also worth noting that if you're attempting to find disagreements in the first place (e.g. in order to practice double cruxing with friends) this is an excellent way to start—give everyone the same ten or fifteen open-ended questions, and have everyone write down their own answers based on their own thinking, crystallizing opinions before opening the discussion.
Overall, it helps to keep the ideal of a perfect double crux in the front of your mind, while holding the realities of your actual conversation somewhat separate. We've found that, at any given moment, increasing the "double cruxiness" of a conversation tends to be useful, but worrying about how far from the ideal you are in absolute terms doesn't. It's all about doing what's useful and productive in the moment, and that often means making sane compromises—if one of you has clear cruxes and the other is floundering, it's fine to focus on one side. If neither of you can find a single crux, but instead each of you has something like eight co-cruxes of which any five are sufficient, just say so and then move forward in whatever way seems best.
(Variant: a "trio" double crux conversation in which, at any given moment, if you're the least-active participant, your job is to squint at your two partners and try to model what each of them is saying, and where/why/how they're talking past one another and failing to see each other's points. Once you have a rough "translation" to offer, do so—at that point, you'll likely become more central to the conversation and someone else will rotate out into the squinter/translator role.)
Ultimately, each move should be in service of reversing the usual antagonistic, warlike, "win at all costs" dynamic of most disagreements. Usually, we spend a significant chunk of our mental resources guessing at the shape of our opponent's belief structure, forming hypotheses about what things are crucial and lobbing arguments at them in the hopes of knocking the whole edifice over. Meanwhile, we're incentivized to obfuscate our own belief structure, so that our opponent's attacks will be ineffective.
(This is also terrible because it means that we often fail to even find the crux of the argument, and waste time in the weeds. If you've ever had the experience of awkwardly fidgeting while someone spends ten minutes assembling a conclusive proof of some tangential sub-point that never even had the potential of changing your mind, then you know the value of someone being willing to say "Nope, this isn't going to be relevant for me; try speaking to that instead.")
If we can move the debate to a place where, instead of fighting over the truth, we're collaborating on a search for understanding, then we can recoup a lot of wasted resources. You have a tremendous comparative advantage at knowing the shape of your own belief structure—if we can switch to a mode where we're each looking inward and candidly sharing insights, we'll move forward much more efficiently than if we're each engaged in guesswork about the other person. This requires that we want to know the actual truth (such that we're incentivized to seek out flaws and falsify wrong beliefs in ourselves just as much as in others) and that we feel emotionally and socially safe with our partner, but there's a doubly-causal dynamic where a tiny bit of double crux spirit up front can produce safety and truth-seeking, which allows for more double crux, which produces more safety and truth-seeking, etc.
First and foremost, it matters whether you're in the strong version of double crux (cooperative, consent-based) or the weak version (you, as an agent, trying to improve the conversational dynamic, possibly in the face of direct opposition). In particular, if someone is currently riled up and conceives of you as rude/hostile/the enemy, then saying something like "I just think we'd make better progress if we talked about the underlying reasons for our beliefs" doesn't sound like a plea for cooperation—it sounds like a trap.
So, if you're in the weak version, the primary strategy is to embody the question "What do you see that I don't?" In other words, approach from a place of explicit humility and good faith, drawing out their belief structure for its own sake, to see and appreciate it rather than to undermine or attack it. In my experience, people can "smell it" if you're just playing at good faith to get them to expose themselves; if you're having trouble really getting into the spirit, I recommend meditating on times in your past when you were embarrassingly wrong, and how you felt prior to realizing it compared to after realizing it.
(If you're unable or unwilling to swallow your pride or set aside your sense of justice or fairness hard enough to really do this, that's actually fine; not every disagreement benefits from the double-crux-nature. But if your actual goal is improving the conversational dynamic, then this is a cost you want to be prepared to pay—going the extra mile, because a) going what feels like an appropriate distance is more often an undershoot, and b) going an actually appropriate distance may not be enough to overturn their entrenched model in which you are The Enemy. Patience- and sanity-inducing rituals recommended.)
As a further tip that's good for either version but particularly important for the weak one, model the behavior you'd like your partner to exhibit. Expose your own belief structure, show how your own beliefs might be falsified, highlight points where you're uncertain and visibly integrate their perspective and information, etc. In particular, if you don't want people running amok with wrong models of what's going on in your head, make sure you're not acting like you're the authority on what's going on in their head.
Speaking of non-sequiturs, beware of getting lost in the fog. The very first step in double crux should always be to operationalize and clarify terms. Try attaching numbers to things rather than using misinterpretable qualifiers; try to talk about what would be observable in the world rather than how things feel or what's good or bad. In the school uniforms example, saying "uniforms make students feel better about themselves" is a start, but it's not enough, and going further into quantifiability (if you think you could actually get numbers someday) would be even better. Often, disagreements will "dissolve" as soon as you remove ambiguity—this is success, not failure!
Finally, use paper and pencil, or whiteboards, or get people to treat specific predictions and conclusions as immutable objects (if you or they want to change or update the wording, that's encouraged, but make sure that at any given moment, you're working with a clear, unambiguous statement). Part of the value of double crux is that it's the opposite of the weaselly, score-points, hide-in-ambiguity-and-look-clever dynamic of, say, a public political debate. The goal is to have everyone understand, at all times and as much as possible, what the other person is actually trying to say—not to try to get a straw version of their argument to stick to them and make them look silly. Recognize that you yourself may be tempted or incentivized to fall back to that familiar, fun dynamic, and take steps to keep yourself in "scout mindset" rather than "soldier mindset."
This is the double crux algorithm as it currently exists in our handbook. It's not strictly connected to all of the discussion above; it was designed to be read in context with an hour-long lecture and several practice activities (so it has some holes and weirdnesses) and is presented here more for completeness and as food for thought than as an actual conclusion to the above.
1. Find a disagreement with another person
   - A case where you believe one thing and they believe the other
   - A case where you and the other person have different confidences (e.g. you think X is 60% likely to be true, and they think it’s 90%)
2. Operationalize the disagreement
   - Define terms to avoid getting lost in semantic confusions that miss the real point
   - Find specific test cases: rather than (e.g.) discussing whether you should be more outgoing, evaluate whether you should have said hello to Steve in the office yesterday morning
   - Wherever possible, try to think in terms of actions rather than beliefs; it’s easier to evaluate arguments like “we should do X before Y” than it is to converge on “X is better than Y.”
3. Seek double cruxes
   - Seek your own cruxes independently, and compare with those of the other person to find overlap
   - Seek cruxes collaboratively, by making claims (“I believe that X will happen because Y”) and focusing on falsifiability (“It would take A, B, or C to make me stop believing X”)
   - Spend time “inhabiting” both sides of the double crux, to confirm that you’ve found the core of the disagreement (as opposed to something that will ultimately fail to produce an update)
   - Imagine the resolution as an if-then statement, and use your inner sim and other checks to see if there are any unspoken hesitations about the truth of that statement
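As a rough illustration of the "compare with those of the other person to find overlap" step, here is a minimal Python sketch. It assumes each person has already written their cruxes down as plain strings with agreed wording; the function name and the example statements are hypothetical and do not come from the handbook.

```python
def find_double_cruxes(my_cruxes, their_cruxes):
    """Return the statements both people list as cruxes, keeping my ordering.

    Assumes both lists use identical wording for the same statement, which in
    practice only holds after the disagreement has been operationalized.
    """
    theirs = set(their_cruxes)
    return [crux for crux in my_cruxes if crux in theirs]


# Hypothetical crux lists for two people who disagree about a policy.
alice = ["uniforms reduce bullying", "uniforms cost families more"]
bob = ["uniforms suppress self-expression", "uniforms reduce bullying"]

shared = find_double_cruxes(alice, bob)
print(shared)
```

An empty result would suggest going back to the operationalizing step rather than arguing over statements only one side treats as a crux.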
We think double crux is super sweet. To the extent that you see flaws in it, we want to find them and repair them, and we're currently betting that repairing and refining double crux is going to pay off better than trying something totally different. In particular, we believe that embracing the spirit of this mental move has huge potential for unlocking people's abilities to wrestle with all sorts of complex and heavy hard-to-parse topics (like existential risk, for instance), because it provides a format for holding a bunch of partly-wrong models at the same time while you distill the value out of each.
Comments appreciated; critiques highly appreciated; anecdotal data from experimental attempts to teach yourself double crux, or teach it to others, or use it on the down-low without telling other people what you're doing extremely appreciated.
- Duncan Sabien
[1] One reason good faith is important is that even when people are "wrong," they are usually partially right: there are flecks of gold mixed in with their false belief that can be productively mined by an agent who's interested in getting the whole picture. Normal disagreement-navigation methods have some tendency to throw out that gold, either by allowing everyone to protect their original belief set or by replacing everyone's view with whichever view is shown to be "best," thereby throwing out data, causing information cascades, disincentivizing "noticing your confusion," etc.
The central assumption is that the universe is like a large and complex maze that each of us can only see parts of. To the extent that language and communication allow us to gather info about parts of the maze without having to investigate them ourselves, that's great. But when we disagree on what to do because we each see a different slice of reality, it's nice to adopt methods that allow us to integrate and synthesize, rather than methods that force us to pick and pare down. It's like the parable of the three blind men and the elephant—whenever possible, avoid generating a bottom-line conclusion until you've accounted for all of the available data.
The agent at the top mistakenly believes that the correct move is to head to the left, since that seems to be the most direct path toward the goal. The agent on the right can see that this is a mistake, but it would never have been able to navigate to that particular node of the maze on its own.
Despite all priors and appearances, our little community (the "aspiring rationality" community; the "effective altruist" project; efforts to create an existential win; etc.) has a shot at seriously helping with this puzzle. This sounds like hubris, but it is at this point at least partially a matter of track record.
To aid in solving this puzzle, we must probably find a way to think together, accumulatively.
Update December 22: Our donors came together during the fundraiser to get us most of the way to our $750,000 goal. In all, 251 donors contributed $589,248, making this our second-biggest fundraiser to date. Although we fell short of our target by $160,000, we have since made up this shortfall thanks to November/December donors. I’m extremely grateful for this support, and will plan accordingly for more staff growth over the coming year.
As described in our post-fundraiser update, we are still fairly funding-constrained. Donations at this time will have an especially large effect on our 2017–2018 hiring plans and strategy, as we try to assess our future prospects. For some external endorsements of MIRI as a good place to give this winter, see recent evaluations by Daniel Dewey, Nick Beckstead, Owen Cotton-Barratt, and Ben Hoskin.
Our 2016 fundraiser is underway! Unlike in past years, we'll only be running one fundraiser in 2016, from Sep. 16 to Oct. 31.
Employer matching and pledges to give later this year also count towards the total. Click here to learn more.
MIRI is a nonprofit research group based in Berkeley, California. We do foundational research in mathematics and computer science that’s aimed at ensuring that smarter-than-human AI systems have a positive impact on the world. 2016 has been a big year for MIRI, and for the wider field of AI alignment research. Our 2016 strategic update in early August reviewed a number of recent developments:
- A group of researchers headed by Chris Olah of Google Brain and Dario Amodei of OpenAI published “Concrete problems in AI safety,” a new set of research directions that are likely to bear both on near-term and long-term safety issues.
- Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, and Stuart Russell published a new value learning framework, “Cooperative inverse reinforcement learning,” with implications for corrigibility.
- Laurent Orseau of Google DeepMind and Stuart Armstrong of the Future of Humanity Institute received positive attention from news outlets and from Alphabet executive chairman Eric Schmidt for their new paper “Safely interruptible agents,” partly supported by MIRI.
- MIRI ran a three-week AI safety and robustness colloquium and workshop series, with speakers including Stuart Russell, Tom Dietterich, Francesca Rossi, and Bart Selman.
- We received a generous $300,000 donation and expanded our research and ops teams.
- We started work on a new research agenda, “Alignment for advanced machine learning systems.” This agenda will be occupying about half of our time going forward, with the other half focusing on our agent foundations agenda.
We also published new results in decision theory and logical uncertainty, including “Parametric bounded Löb’s theorem and robust cooperation of bounded agents” and “A formal solution to the grain of truth problem.” For a survey of our research progress and other updates from last year, see our 2015 review. In the last three weeks, there have been three more major developments:
- We released a new paper, “Logical induction,” describing a method for learning to assign reasonable probabilities to mathematical conjectures and computational facts in a way that outpaces deduction.
- The Open Philanthropy Project awarded MIRI a one-year $500,000 grant to scale up our research program, with a strong chance of renewal next year.
- The Open Philanthropy Project is supporting the launch of the new UC Berkeley Center for Human-Compatible AI, headed by Stuart Russell.
Things have been moving fast over the last nine months. If we can replicate last year’s fundraising successes, we’ll be in an excellent position to move forward on our plans to grow our team and scale our research activities.
Recently, multiple suspicious user accounts were created on Less Wrong. These accounts don't post any content in the forum. Instead, they are used only to send private messages to the existing users.
Many users have received a copy of the same message, but different variants exist, too. Here are the examples I know about. If you have received a different variant, please post it in a comment below this article:
Hi good day. My boss is interested on donating to MIRI's project and he is wondering if he could send money through you and you donate to miri through your company and thus accelertaing the value created. He wants to use "match donations" as a way of donating thats why he is looking for people in companies like you. I want to discuss more about this so if you could see this message please give me a reply. Thank you!
I don't yet know of anyone who replied and got scammed, so this is all based on indirect evidence. If you got scammed, please tell me. If you are ashamed, I can publish your story anonymously. Your story could help other potential victims.
Most likely, the scheme is the following:
1. The scammer will send you money.
2. Then they will ask for some of the money back, because they changed their mind, or they mistakenly sent you more than they wanted, or their financial situation suddenly changed, or whatever.
3. After receiving the money from you, they will flag the original transaction as fraud, so they get back the money they originally sent you, plus the money you sent them back. Then they disappear, or it turns out they used a stolen identity, etc.
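To see why the scheme above leaves the victim out of pocket, here is a small Python sketch of the victim's balance across the three steps. The amounts and the function name are made up for illustration, and the sketch assumes the reversal claws back the full original payment.

```python
def victim_net_after_reversal(received, sent_back):
    """Track the victim's balance change through the suspected scheme.

    Assumes (hypothetically) that the original payment is reversed in full.
    """
    balance = 0
    balance += received   # the scammer sends money to the victim
    balance -= sent_back  # the victim returns part of it on request
    balance -= received   # the original transaction is reversed as fraud
    return balance


# Hypothetical amounts: 1000 received, 400 "refunded" to the scammer.
print(victim_net_after_reversal(1000, 400))
```

Whatever the size of the incoming payment, the victim ends up down by exactly the amount they sent back.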
If you replied to the original message and are already in the middle of the process, please inform your bank as soon as possible! Even if step 2 hasn't happened yet, so that you can still get out without losing money, warning your bank about the scammer could help other potential victims.
Warning: If you have already received a check or a payment confirmation, and someone is asking you to send the overpayment back quickly, do not send anything. The check or the payment confirmation is fake, and the goal is to make you send money before you find out. (Thanks to