In the days since we published our previous post, a number of people have come up to me and expressed concerns about our new mission.  Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”

I would here like to reply to these people and others, and to clarify what is and isn’t entailed by our new focus on AI safety.

First: How are CFAR’s activities affected by the cause(s) it chooses to prioritize?

The question of which causes CFAR aims to help (via its rationality training) plugs into our day-to-day activities in at least four ways:

1)  It affects which people we target.  If AI safety is our aim, we must then backchain from “Who is likely both to impact AI safety better if they have more rationality skills, and also to be able to train rationality skills with us?” to decide whom to target with specialized workshops.

2) It affects which rationality skills we prioritize.  AI safety work benefits from the ability to reason about abstract, philosophically confusing issues (notably: AI), which presumably benefits from various rationality skills.  Competitive marathon running probably also benefits from certain rationality skills; but they are probably different ones.  Designing an “art of rationality” that can support work on AI safety is different from designing an “art of rationality” for some other cause.  (Although see point C, below.)

3)  It affects what metrics or feedback systems we make interim use of, and how we evaluate our work.  If “AI safety via rationality training” is the mission, then “person X produced work A that looks existential risk-reducing on our best guess, and X says they would’ve been less able to do A without us” is the obvious proxy measure of whether we’re having impact.  If we have this measure, we can use our measurements of it to steer.

4) It affects explicit curriculum at AI-related or EA-related events.  E.g., it affects whether we’re allowed to run events at which participants double crux about AI safety, and whether we’re allowed to present arguments from Bostrom’s Superintelligence without also presenting a commensurate amount of analysis of global poverty interventions.

In addition to the above four effects, it has traditionally also affected: 5) what causes/opinions CFAR staff feel free to talk about when speaking informally to participants at workshops or otherwise representing CFAR.  (We used to try not to bring up such subjects.)

One thing to notice, here, is that CFAR’s mission doesn’t just affect our external face; it affects the details of our day-to-day activities.  (Or at minimum, it should affect these.)  It is therefore very important that our mission be: (a) actually important; (b) simple, intelligible, and usable by our staff on a day-to-day basis; (c) corresponding to a detailed (and, ideally, accurate) model in the heads of at least a few CFARians doing strategy (or, better, in all CFARians), so that the details of what we’re doing can in fact “cut through” to reducing existential risk.

So, okay, we just looked concretely at how CFAR’s mission (and, in particular, its prioritization of AI safety) can affect its day-to-day choices.

It’s natural next to ask what upsides people were hoping for from a (previous or imagined) “cause neutral” CFAR, and to discuss which of those upsides we can still access, and which we can’t.  I’ll start with the ones we can do.

Some components that people may be hoping for from “cause neutral”, that we can do, and that we intend to do:

A.  For students of all intellectual vantage points, we can make a serious effort to be “epistemically trustworthy relative to their starting point”.

By this I mean:
  • We can be careful to include all information that they, from their vantage point, would want to know -- even if on our judgment, some of the information is misleading or irrelevant, or might pull them to the “wrong” conclusions.

  • Similarly, we can attempt to expose people to skilled thinkers they would want to talk with, regardless of those thinkers’ viewpoints; and we can be careful to allow their own thoughts, values, and arguments to develop, regardless of which “side” this may lead to them supporting.

  • More generally, we can and should attempt to cooperate with each student’s extrapolated volition, and to treat the student as they (from their initial epistemic vantage point; and with their initial values) would wish to be treated.  Which is to say that we should not do anything that would work less well if the algorithm behind it were known, and that we should attempt to run such workshops (and to have such conversations, and so on) as would cause good people of varied initial views to stably on reflection want to participate in them.

In asserting this commitment, I do not mean to assert that others should believe this of us; only that we will aim to do it.  You are welcome to stare skeptically at us about potential biases; we will not take offense; it is probably prudent.  Also, our execution will doubtless have flaws; still, we’ll appreciate it if people point such flaws out to us.

B.  We can deal forthrightly and honorably with potential allies who have different views about what is important.

That is: we can be clear and explicit about the values and beliefs we are basing CFAR’s actions on, and we can attempt to negotiate clearly and explicitly with individuals who are interested in supporting particular initiatives, but who disagree with us about other parts of our priorities.[1]

C.  We can create new “art of rationality” content at least partly via broad-based exploratory play — and thus reduce the odds that our “art of rationality” ends up in a local optimum around one specific application.

That is: we can follow Feynman’s lead and notice and chase “spinning plates”.  We can bring in new material by bringing in folks with very different skillsets, and seeing what happens to our art and theirs when we attempt to translate things into one another’s languages.  We can play; and we can nourish an applied rationality community that can also play.

Some components that people may be hoping for from “cause neutral”, that we can’t or won’t do:

i. Appear to have no viewpoints, in hopes of attracting people who don’t trust those with our viewpoints.

We can’t do this one.  Both CFAR as an entity and individual CFAR staff do in fact have viewpoints; there is no high-integrity way to mask that fact.  Also, “integrity” isn’t a window-dressing that one pastes onto a thing, or a nicety that one can compromise for the sake of results; “integrity” is a word for the basic agreements that make it possible for groups of people to work together while stably trusting one another.  Integrity is thus structurally necessary if we are to get anything done at all.

All we can do is do our best to *be* trustworthy in our dealings with varied people, and assume that image will eventually track substance.  (And if image doesn’t, we can look harder at our substance, see if we may still be subtly acting in bad faith, and try again.  Integrity happens from the inside out.)

ii. Leave our views or plans stalled or vague, in cases where having a particular viewpoint would expose us to possibly being wrong (or to possibly alienating those who disagree).

Again, we can’t do this one; organizations need a clear plan for their actions to have any chance at either: i) working; or ii) banging into data and allowing one to notice that the plan was wrong.  Flinching from clearly and visibly held views is the mother of wasting time.  (Retaining a willingness to say “Oops!” and change course is, however, key.)

iii. Emphasize all rationality use cases evenly.  Cause all people to be evenly targeted by CFAR workshops.

We can’t do this one either; we are too small to pursue all opportunities without horrible dilution and failure to capitalize on the most useful opportunities.

We are presently targeting all workshops at: (a) folks who are more likely than usual to directly impact existential risk; (b) folks who will add to a robust rationality community; and/or (c) folks who will allow us to learn more about the art (e.g., by having a different mix of personalities, skills, or backgrounds than most folks here).

Coming soon: 
  • CFAR’s history around our mission: How did we come to change?



[1] In my opinion, I goofed this up historically in several instances, most notably with respect to Val and Julia, who joined CFAR in 2012 with the intention to create a cause-neutral rationality organization.  Most integrity-gaps are caused by lack of planning rather than strategic deviousness; someone tells their friend they’ll have a project done by Tuesday and then just… doesn’t.  My mistakes here seem to me to be mostly of this form.  In any case, I expect the task to be much easier, and for me and CFAR to do better, now that we have a simpler and clearer mission.

38 comments

I feel a lot of relief upon hearing that CFAR will be taking integrity and honor seriously, including being transparent about overall goals.

It's likely you'll address this in future posts, but I'm curious now. To me it seems like CFAR played a very important role in attracting people to the bay. "Come for the rationality, stay for the x-risk." I have a feeling that with this pivot it'll be harder to attract people to the community. What are your thoughts on that?

That seems fine to me as long as the people who do get attracted are selected harder for being relevant to AI safety; arguably this would be an improvement.

I'm not sure how much of this was CFAR and x-risk vs. programming and autism. Certainly a lot of the people at the SF meetup were not CFARniks based on my completely unscientific examination of my memory. The community's survival and growth is secondary to X-risk solving now, even if before the goal was to make a community devoted to these arts.

I like this and intend to continue donating to CFAR.

I dislike CFAR's new focus, and I will probably stop my modest annual donations as a result.

In my opinion, the most important benefit of cause-neutrality is that it safeguards the integrity of the young and still-evolving methods of rationality. If it is official CFAR policy that reducing AI risk is the most important cause, and CFAR staff do almost all of their work with people who are actively involved with AI risk, and then go and do almost all of their socializing with rationalists (most of whom also place a high value on reducing AI risk), then there will be an enormous temptation to discover, promote, and discuss only those methods of reasoning that support the viewpoint that reducing AI risk is the most important value. This is bad partly because it might stop CFAR from changing its mind in the face of new evidence, but mostly because the methods that CFAR will discover (and share with the world) will be stunted -- students will not receive the best-available cognitive tools; they will only receive the best-available cognitive tools that encourage people to reduce AI risk.

You might also lose out on discovering methods of (teaching) rationality that would only be found by people with different sorts of brains -- it might turn out that the sort of people who strongly prioritize friendly AI think in certain similar ways, and if you surround yourself with only those people, then you limit yourself to learning only what those people have to teach, even if you somehow maintain perfect intellectual honesty.

Another problem with focusing exclusively on AI risk is that it is such a Black Swan-type problem that it is extremely difficult to measure progress, which in turn makes it difficult to assess the value or success of any new cognitive tools. If you work on reducing global warming, you can check the global average temperature. More importantly, so can any layperson, and you can all evaluate your success together. If you work on reducing nuclear proliferation for ten years, and you haven't secured or prevented a single nuclear warhead, then you know you're not doing a good job. But how do you know if you're failing to reduce AI risk? Even if you think you have good evidence that you're making progress, how could anyone who's not already a technical expert possibly assess that progress? And if you propose to train all of the best experts in your methods, so that they learn to see you as a source of wisdom, then how many of them will retain the capacity to accuse you of failure?

I would not object to CFAR rolling out a new line of seminars that are specifically intended for people working on AI risk -- it is a very important cause, and there's something to be gained in working on a specific problem, and as you say, CFAR is small enough that CFAR can't do it all. But what I hear you saying is that the mission is now going to focus exclusively on reducing AI risk. I hear you saying that if all of CFAR's top leadership is obsessed with AI risk, then the solution is not to aggressively recruit some leaders who care about other topics, but rather to just be honest about that obsession and redirect the institution's policies accordingly. That sounds bad. I appreciate your transparency, but transparency alone won't be enough to save the CFAR/MIRI community from the consequences of deliberately retreating into a bubble of AI researchers.

I see here a description of several potential costs of the new focus but no attempt to weigh those costs against the potential benefit.

Well, like I said, AI risk is a very important cause, and working on a specific problem can help focus the mind, so running a series of AI-researcher-specific rationality seminars would offer the benefit of (a) reducing AI risk, (b) improving morale, and (c) encouraging rationality researchers to test their theories using a real-world example. That's why I think it's a good idea for CFAR to run a series of AI-specific seminars.

What is the marginal benefit gained by moving further along the road to specialization, from "roughly half our efforts these days happen to go to running an AI research seminar series" to "our mission is to enlighten AI researchers?" The only marginal benefit I would expect is the potential for an even more rapid reduction in AI risk, caused by being able to run, e.g., 4 seminars a quarter for AI researchers, instead of 2 for AI researchers and 2 for the general public. I would expect any such potential to be seriously outweighed by the costs I describe in my main post (e.g., losing out on rationality techniques that would be invented by people who are interested in other issues), such that the marginal effect of moving from 50% specialization to 100% specialization would be to increase AI risk. That's why I don't want CFAR to specialize in educating AI researchers to the exclusion of all other groups.

What is the marginal benefit gained by moving further along the road to specialization, from "roughly half our efforts these days happen to go to running an AI research seminar series" to "our mission is to enlighten AI researchers?" The only marginal benefit I would expect is the potential for an even more rapid reduction in AI risk, caused by being able to run, e.g., 4 seminars a quarter for AI researchers, instead of 2 for AI researchers and 2 for the general public.

Yes, I agree that this is the important question. I think there are benefits around stronger coordination among 1) CFAR staff, 2) CFAR supporters, and 3) CFAR participants around AI safety that are not captured by a quantitative increase in the number of seminars being run or whatever.

In the ideal situation, you can try to create a group of people who have common knowledge that everyone else in the group is actually dedicated to AI safety, and it allows them to coordinate better because it allows them to act and make plans under the assumption that everyone else is dedicated to AI safety, at every level of meta (e.g. when you make plans which are contingent on someone else's plans). If CFAR instead continues to publicly present as approximately cause-neutral, these assumptions shatter and people can't rely on each other and coordinate as well. I think it would be pretty difficult to attempt to quantify the benefit of doing this but I'd be skeptical of any confident and low upper bounds.

There are also benefits from CFAR signaling that it cares enough about AI safety in particular to drop cause neutrality; that could encourage some people who otherwise might not have to take the cause more seriously.

Yeah, that pretty much sums it up: do you think it's more important for rationalists to focus even more heavily on AI research so that their example will sway others to prioritize FAI, or do you think it's more important for rationalists to broaden their network so that rationalists have more examples to learn from?

Shockingly, as a lawyer who's working on homelessness and donating to universal income experiments, I prefer a more general focus. Just as shockingly, the mathematicians and engineers who have been focusing on AI for the last several years prefer a more specialized focus. I don't see a good way for us to resolve our disagreement, because the disagreement is rooted primarily in differences in personal identity.

I think the evidence is undeniable that rationality memes can help young, awkward engineers build a satisfying social life and increase their productivity by 10% to 20%. As an alum of one of CFAR's first minicamps back in 2011, I'd hoped that rationality would amount to much more than that. I was looking forward to seeing rationalist tycoons, rationalist Olympians, rationalist professors, rationalist mayors, rationalist DJs. I assumed that learning how to think clearly and act accordingly would fuel a wave of conspicuous success, which would in turn attract more resources for the project of learning how to think clearly, in a rapidly expanding virtuous cycle.

Instead, five years later, we've got a handful of reasonably happy rationalist families, an annual holiday party, and a couple of research institutes dedicated to pursuing problems that, by definition, will provide no reliable indicia of their success until it is too late. I feel very disappointed.

I think a lot of this is a fair concern. (I care about AI but am currently neutral/undecided on whether this change was a good one.)

But I also note that "a couple research institutions" is sweeping a lot of work into deliberately innocuous sounding words.

First, we have lots of startups that aren't AI related that I think were in some fashion facilitated by the overall rationality community project (with CFAR playing a major role in pushing that project forward).

We also have Effective Altruism Global, and many wings of the EA community that have benefited from CFAR and Eliezer's original writings, which has had huge benefits to plenty of cause areas other than AI. We have your aforementioned young, awkward engineers with their 20% increase in productivity, often earning to give (often to non AI causes), or embarking on startups of their own.

Second, very credible progress has happened on AI as a result of the institutions working on AI. Elon Musk pledged $10 million to AI safety, and he did that because FLI held a conference bringing him and top AI people together, and FLI was able to do that because of a sizeable base of CFAR inspired volunteers as well as the FLI leadership having attended CFAR.

Even if everything MIRI does turns out to be worthless (which I also think is unlikely), FLI has demonstrably changed the landscape of AI safety.

do you think it's more important for rationalists to focus even more heavily on AI research so that their example will sway others to prioritize FAI, or do you think it's more important for rationalists to broaden their network so that rationalists have more examples to learn from?

I think this question implicitly assumes as a premise that CFAR is the main vehicle by which the rationality community grows. That may be more or less true now, plausibly it can become less true in the future, but most interestingly it suggests that you already understand the value of CFAR as a coordination point (for rationality in general). That's the kind of value I think CFAR is trying to generate in the future as a coordination point for AI safety in particular, because it might in fact turn out to be that important.

I sympathize with your concerns - I would love for the rationality community to be more diverse along all sorts of axes - but I worry they're predicated on a perspective on existential risk-like topics as these luxuries that maybe we should devote a little time to but that aren't particularly urgent, and that if you had a stronger sense of urgency around them as a group (not necessarily around any of them individually) you might be able to have more sympathy for people (such as the CFAR staff) who really, really just want to focus on them, even though they're highly uncertain and even though there are no obvious feedback loops, because they're important enough to work on anyway.

I am always trying to cultivate a little more sympathy for people who work hard and have good intentions! CFAR staff definitely fit in that basket. If your heart's calling is reducing AI risk, then work on that! Despite my disappointment, I would not urge anyone who's longing to work on reducing AI risk to put that dream aside and teach general-purpose rationality classes.

That said, I honestly believe that there is an anti-synergy between (a) cultivating rationality and (b) teaching AI researchers. I think each of those worthy goals is best pursued separately.

That said, I honestly believe that there is an anti-synergy between (a) cultivating rationality and (b) teaching AI researchers. I think each of those worthy goals is best pursued separately.

That seems fine to me. At some point someone might be sufficiently worried about the lack of a cause-neutral rationality organization to start a new one themselves, and that would be probably fine; CFAR would probably try to help them out. (I don't have a good sense of CFAR's internal position on whether they should themselves spin off such an organization.)

At some point someone might be sufficiently worried about the lack of a cause-neutral rationality organization to start a new one themselves, and that would be probably fine

Incidentally, if someone decides to do this please advertise here. This change in focus has made me stop my (modest) donations to CFAR. If someone started a cause-neutral rationality building institute I'd fund it, at a higher(*) level than I funded CFAR.

(*) One of the things that restrained my CFAR charity in the last few years, other than lack of money until recently, was uncertainty over their cause neutrality. They seemed to be biased in the causes they pushed for, and that gave me hesitation against funding them further. Now that they've come out of the closet on the issue I'm against giving them even 1 cent.

I like your (A)-(C), particularly (A). This seems important, and something that isn't always found by default in the world at large.

Because it's somewhat unusual, I think it's helpful to give strong signals that this is important to you. For example I'd feel happy about it being a core part of the CFAR identity, appearing in even short statements of organisational mission. (I also think this can help organisation insiders to take it even more seriously.)

On (i), it seems clearly a bad idea for staff to pretend they have no viewpoints. And if the organisation has viewpoints, it's a bad idea to hide them. I think there is a case for keeping organisational identity small -- not taking views on things it doesn't need views on. Among other things, this helps to make sure that it actually delivers on (A). But I thought the start of your post (points (1)-(4)) did a good job of explaining why there are in fact substantive benefits to having an organisational view on AI, and I'm more supportive of this than before. I still think it is worth trying to keep organisational identity relatively small, and I'm still not certain whether it would be better to have separate organisations.

Can you clarify what sort of people you're trying to create, and the sort of people who should be pointed to attend your workshops? For example, I know people who will work on important parts of the problem (policy people, networking people, management people) but who will never do FHI or MIRI research, and I don't know whether these are people I should point your way.

(Given my experience of your content, my guess is these people would get a lot of value from your workshops, but perhaps you're planning to radically change the workshops to make them math/philosophy focused)

We would indeed love to help those people train.

We can be careful to include all information that they, from their vantage point, would want to know -- even if on our judgment, some of the information is misleading or irrelevant, or might pull them to the “wrong” conclusions.

I did not understand this part.

I don't know how it plays out in the CFAR context specifically, but the sort of situation being described is this:

Alice is a social democrat and believes in redistributive taxation, a strong social safety net, and heavy government regulation. Bob is a libertarian and believes taxes should be as low as possible and "flat", safety-nets should be provided by the community, and regulation should be light or entirely absent. Bob asks Alice[1] what she knows about some topic related to government policy. Should Alice (1) provide Bob with all the evidence she can favouring the position she holds to be correct, or (2) provide Bob with absolutely all the relevant information she knows of, or (3) provide Bob with all the information she has that someone with Bob's existing preconceptions will find credible?

It's tempting to do #1. Anna is saying that CFAR will do (the equivalent of) #2 or even #3.

[1] I flipped a coin to decide who would ask whom.

Yes. Or will seriously attempt this, at least. It seems required for cooperation and good epistemic hygiene.

I like this and think it's good.

If AI safety is our aim, we must then backchain from “Who is likely both to impact AI safety better if they have more rationality skills, and also to be able to train rationality skills with us?”

Whoever is most likely to impact AI safety is someone who knows a lot about AI. Is such a person likely to be badly lacking in rationality?

ETA: And could we have concrete examples of people in a position to impact AI safety?

Whoever is most likely to impact AI safety is someone who knows a lot about AI. Is such a person likely to be badly lacking in rationality?

Sure. I think selecting for knowing a lot about AI mostly selects for raw intelligence and a particular kind of curiosity, and that neither of these are all that correlated with what one might call "street rationality," except insofar as street rationality requires enough raw intelligence to reliably do metacognition. There are plenty of very intelligent people who do almost no metacognition.

And could we have concrete examples of people in a position to impact AI safety?

Elon Musk, Peter Thiel, people who work or might work at DeepMind and similar groups...

Sure. I think selecting for knowing a lot about AI mostly selects for raw intelligence and a particular kind of curiosity,

The two things you mention add up, minimally, to wanting to know about AI.

There is a third component to actually knowing a lot about AI, which is having succeeded in having learnt about AI, which is to say, having "won" in a certain sense. If rationality is winning, or knowing how to use raw intelligence effectively, a baseline level of rationality is indicated.

and that neither of these are all that correlated with what one might call "street rationality," except insofar as street rationality requires enough raw intelligence to reliably do metacognition. There are plenty of very intelligent people who do almost no metacognition.

Why is metacognition needed for AI safety? I can see how an average person might need to understand that they are, for instance, making anthropomorphic assumptions, but someone with a good understanding of AI would not do that; in fact, someone with hands-on experience of AI would be less biased in their assumptions about AI than someone who merely theorises about AI safety.

Salamon does not use the word "metacognition", but does use the word "philosophical". I can see how AI safety touches on typically philosophical issues like ethics and philosophy of mind, and I can see how people with a tech background might be lacking in that kind of area. I can't see how either raw intelligence or generic thinking skills is going to help with that, since all the evidence is that domain knowledge dominates the raw and the generic. And there is an obvious answer to "AI safety needs an injection of philosophy", and that is to start a joint enterprise with both domain experts in AI and domain experts in the appropriate areas of philosophy. (Compare with the way medical ethics is done, for instance.) This is something MIRI and CFAR have not ... done twice over.

And could we have concrete examples of people in a position to impact AI safety?

Elon Musk, Peter Thiel, people who work or might work at DeepMind and similar groups...

And how do you sell rationality training to them? Presumably not on the basis that they don't know how to win...

There is a third component to actually knowing a lot about AI, which is having succeeded in having learnt about AI, which is to say, having "won" in a certain sense. If rationality is winning, or knowing how to use raw intelligence effectively, a baseline level of rationality is indicated.

Have you heard the anecdote about Kahneman and the planning fallacy? It's from Thinking Fast and Slow, and deals with him creating curriculum to teach judgment and decision-making in high school. He puts together a team of experts, they meet for a year, and have a solid outline. They're talking about estimating uncertain quantities, and he gets the bright idea of having everyone estimate how long it will take them until they submit a finished draft to the Ministry of Education. He solicits everyone's probabilities using one of the approved-by-research methods they're including in the curriculum, and their guesses are tightly centered around two years (ranging from about 1.5 to 2.5).

Then he decides to employ the outside view, and asks the curriculum expert how long it took similar teams in the past. That expert realizes that, in the past, about 40% of similar teams gave up and never finished; of those who finished, none took less than seven years. (Kahneman tries to rescue them by asking about skills and resources, and it turns out that this team is below average, but not by much.)

We should have quit that day. None of us was willing to invest six more years of work in a project with a 40% chance of failure. Although we must have sensed that persevering was not reasonable, the warning did not provide an immediately compelling reason to quit. After a few minutes of desultory debate, we gathered ourselves together and carried on as if nothing had happened. The book was eventually completed eight(!) years later.


It seems to me that if the person who discovered the planning fallacy is unable to make basic use of the planning fallacy when plotting out projects, a general sense that experts know what they're doing and are able to use their symbolic manipulation skills on their actual lives is dangerously misplaced. If it is a bad idea to publish things about decision theory in academia (because the costs outweigh the benefits, say), then it will only be bad decision-makers who publish on decision theory!

If we live in a world where the discoverer of the planning fallacy can fall victim to it, we live in a world where teachers of rationality fail to improve anyone's rationality skills.

This conclusion is way too strong. To just give one way: there's a big space of possibilities where discovering the planning fallacy in fact makes you less susceptible to the planning fallacy, but not immune.

Actually, if CFAR could reliably reduce susceptibility to the planning fallacy, it would be wasting its time with AI safety--it could be making a fortune teaching its methods to the software industry, or to engineers in general.

Wow, I've read the story, but I didn't quite realize the irony of it being a textbook (not a curriculum--a textbook, right?) about judgment and decision making.

There is a third component to actually knowing a lot about AI, which is having succeeded in learning about AI--which is to say, having "won" in a certain sense. If rationality is winning, or knowing how to use raw intelligence effectively, then knowing a lot about AI indicates a baseline level of rationality.

To speak from my own personal experience, I know a lot of math, and mostly the reason I know a lot of math is a combination of raw intelligence and teachers pushing me hard in that direction (for which I'm very grateful). I used almost no metacognition that I can remember; people just shoved topics in my direction and I got curious about and thought about some of them a lot. (But I did not, for example, do any thinking about where my curiosity should be aimed and why, nor did I spend time explicitly brainstorming ways I could be learning math faster or anything like that.)

Why is metacognition needed for AI safety? I can see how an average person might need to understand that they are, for instance, making anthropomorphic assumptions, but someone with a good understanding of AI would not do that. In fact, someone with hands-on experience of AI would be less biased in their assumptions about AI than someone who merely theorises about AI safety.

This is not at all clear to me. I think you underestimate how compartmentalized the thinking of even very intelligent academics can be.

And how do you sell rationality training to them? Presumably not on the basis that they don't know how to win...

You can try convincing them that CFAR teaches skills they don't have that would help them in some way. In any case, some kind of pitch was good enough for Max Tegmark and Jaan Tallinn, both of whom attended workshops and then played a role in making the Puerto Rico conference happen and in founding FLI, along with a few other CFAR alumni. My impression is that this event was more or less responsible for getting Elon Musk on board with AI safety as a cause, which in turn did a lot to normalize AI safety as a topic people could talk about publicly.

Can you answer the question: why is metacognition needed for AI safety?

If there are patterns in your thinking that are consistently causing you to think things that are not true, metacognition is the general tool by which you can notice that and try to correct the situation.

To be more specific, I can very easily imagine AI researchers not believing that AI safety is an issue due to something like cognitive dissonance: if they admitted that AI safety was an issue, they'd be admitting that what they're working on is dangerous and maybe shouldn't be worked on, which contradicts their desire to work on it. The easiest way to resolve the cognitive dissonance, and the most socially acceptable way barring people like Stuart Russell publicly pumping in the other direction, is to dismiss the concern as Luddite fear-mongering. This is the sort of thing you can try to notice and correct about yourself with the right metacognitive tools.

To make another analogy with math, I have never once heard a mathematics graduate student or professor speculate, publicly or privately, about the extent to which pure mathematics is mostly useless and overfunded. This is unsayable among mathematicians, maybe even unthinkable.

If there are patterns in your thinking that are consistently causing you to think things that are not true, metacognition is the general tool by which you can notice that and try to correct the situation.

And if there isn't that problem, there is no need for that solution. For your argument to go through, you need to show that the people likely to have an impact on AI safety are likely to have cognitive problems that affect them when they are doing AI safety. (Saying something like "academics are irrational because some of them believe in God" isn't enough. Compartmentalised beliefs have little impact precisely because they are compartmentalised. Instrumental rationality is not epistemic rationality.)

To be more specific, I can very easily imagine AI researchers not believing that AI safety is an issue due to something like cognitive dissonance:

I dare say

if they admitted that AI safety was an issue, they'd be admitting that what they're working on is dangerous and maybe shouldn't be worked on, which contradicts their desire to work on it.

I can easily imagine an AI safety researcher maintaining a false belief that AI safety is a huge deal, because if they didn't they would be a nobody working on a non-problem. Funny how you can make logic run in more than one direction.

And how do you sell rationality training to them?

That's why you don't sell them a rationality workshop but a workshop for rationally thinking about AGI risks.

iii. Emphasize all rationality use cases evenly. Cause all people to be evenly targeted by CFAR workshops.

We can’t do this one either; we are too small to pursue all opportunities without horrible dilution and failure to capitalize on the most useful opportunities.

This surprised me, since I think of rationality as the general principles of truth-finding.

What have you found about the degree to which rationality instruction needs to be tailored to a use-case?

Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”

I don't think that AI safety is important, which I guess makes me one of the "more varied participants made wary by an explicit focus on AI." Happy you're being explicit about your goals but I don't like them.