Further discussion of CFAR’s focus on AI safety, and the good things folks wanted from “cause neutrality”

AnnaSalamon

12th Dec 2016

Follow-up to:

CFAR's new focus, and AI safety
CFAR's new mission statement (link post; links to our website).

In the days since we published our previous post, a number of people have come up to me and expressed concerns about our new mission. Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”

I would here like to reply to these people and others, and to clarify what is and isn’t entailed by our new focus on AI safety.

First: Where are CFAR’s activities affected by the cause(s) it chooses to prioritize?

The question of which causes CFAR aims to help (via its rationality training) plugs into our day-to-day activities in at least 4 ways:

1) It affects which people we target. If AI safety is our aim, we must then backchain from “Who is likely both to impact AI safety better if they have more rationality skills, and also to be able to train rationality skills with us?” to the question of whom to target with specialized workshops.

2) It affects which rationality skills we prioritize. AI safety work benefits from the ability to reason about abstract, philosophically confusing issues (notably: AI); which presumably benefits from various rationality skills. Competitive marathon running probably also benefits from certain rationality skills; but they are probably different ones. Designing an “art of rationality” that can support work on AI safety is different from designing an “art of rationality” for some other cause. (Although see point C, below.)

3) It affects what metrics or feedback systems we make interim use of, and how we evaluate our work. If “AI safety via rationality training” is the mission, then “person X produced work A that looks existential risk-reducing on our best guess, and X says they would’ve been less able to do A without us” is the obvious proxy measure of whether we’re having impact. If we have this measure, we can use our measurements of it to steer.

4) It affects explicit curriculum at AI-related or EA-related events. E.g., it affects whether we’re allowed to run events at which participants double crux about AI safety, and whether we’re allowed to present arguments from Bostrom’s Superintelligence without also presenting a commensurate amount of analysis of global poverty interventions.

In addition to the above four effects, it has traditionally also affected: 5) what causes/opinions CFAR staff feel free to talk about when speaking informally to participants at workshops or otherwise representing CFAR. (We used to try not to bring up such subjects.)

One thing to notice, here, is that CFAR’s mission doesn’t just affect our external face; it affects the details of our day-to-day activities. (Or at minimum, it should affect these.) It is therefore very important that our mission be: (a) actually important; (b) simple, intelligible, and usable by our staff on a day-to-day basis; (c) one that corresponds to a detailed (and, ideally, accurate) model in the heads of at least a few CFARians doing strategy (or, better, in all CFARians), so that the details of what we’re doing can in fact “cut through” to reducing existential risk.

So, okay, we just looked concretely at how CFAR’s mission (and, in particular, its prioritization of AI safety) can affect its day-to-day choices.

It’s natural next to ask what upsides people were hoping for from a (previous or imagined) “cause neutral” CFAR, and to discuss which of those upsides we can access still, and which we can’t. I’ll start with the ones we can do.

Some components that people may be hoping for from “cause neutral”, that we can do, and that we intend to do:

A. For students of all intellectual vantage points, we can make a serious effort to be “epistemically trustworthy relative to their starting point”.

By this I mean:

We can be careful to include all information that they, from their vantage point, would want to know -- even if, in our judgment, some of the information is misleading or irrelevant, or might pull them to the “wrong” conclusions.
Similarly, we can attempt to expose people to skilled thinkers they would want to talk with, regardless of those thinkers’ viewpoints; and we can be careful to allow their own thoughts, values, and arguments to develop, regardless of which “side” this may lead to them supporting.
More generally, we can and should attempt to cooperate with each student’s extrapolated volition, and to treat the student as they (from their initial epistemic vantage point; and with their initial values) would wish to be treated. Which is to say that we should not do anything that would work less well if the algorithm behind it were known, and that we should attempt to run such workshops (and to have such conversations, and so on) as would cause good people of varied initial views to stably on reflection want to participate in them.

In asserting this commitment, I do not mean to assert that others should believe this of us; only that we will aim to do it. You are welcome to stare skeptically at us about potential biases; we will not take offense; it is probably prudent. Also, our execution will doubtless have flaws; still, we’ll appreciate it if people point such flaws out to us.

B. We can deal forthrightly and honorably with potential allies who have different views about what is important.

That is: we can be clear and explicit about the values and beliefs we are basing CFAR’s actions on, and we can attempt to negotiate clearly and explicitly with individuals who are interested in supporting particular initiatives, but who disagree with us about other parts of our priorities.[1]

C. We can create new “art of rationality” content at least partly via broad-based exploratory play — and thus reduce the odds that our “art of rationality” ends up in a local optimum around one specific application.

That is: we can follow Feynman’s lead and notice and chase “spinning plates”. We can bring in new material by bringing in folks with very different skillsets, and seeing what happens to our art and theirs when we attempt to translate things into one another’s languages. We can play; and we can nourish an applied rationality community that can also play.

Some components that people may be hoping for from “cause neutral”, that we can’t or won’t do:

i. Appear to have no viewpoints, in hopes of attracting people who don’t trust those with our viewpoints.

We can’t do this one. Both CFAR as an entity and individual CFAR staff do in fact have viewpoints; there is no high-integrity way to mask that fact. Also, “integrity” isn’t a window-dressing that one pastes onto a thing, or a nicety that one can compromise for the sake of results; “integrity” is a word for the basic agreements that make it possible for groups of people to work together while stably trusting one another. Integrity is thus structurally necessary if we are to get anything done at all.

All we can do is do our best to *be* trustworthy in our dealings with varied people, and assume that image will eventually track substance. (And if image doesn’t, we can look harder at our substance, see if we may still be subtly acting in bad faith, and try again. Integrity happens from the inside out.)

ii. Leave our views or plans stalled or vague, in cases where having a particular viewpoint would expose us to possibly being wrong (or to possibly alienating those who disagree).

Again, we can’t do this one; organizations need a clear plan for their actions to have any chance at either: i) working; or ii) banging into data and allowing one to notice that the plan was wrong. Flinching from clearly and visibly held views is the mother of wasting time. (Retaining a willingness to say “Oops!” and change course is, however, key.)

iii. Emphasize all rationality use cases evenly. Cause all people to be evenly targeted by CFAR workshops.

We can’t do this one either; we are too small to pursue all opportunities without horrible dilution and failure to capitalize on the most useful opportunities.

We are presently targeting all workshops at: (a) folks who are more likely than usual to directly impact existential risk; (b) folks who will add to a robust rationality community; and/or (c) folks who will allow us to learn more about the art (e.g., by having a different mix of personalities, skills, or backgrounds than most folks here).

Coming soon:

CFAR’s history around our mission: How did we come to change?

[1] In my opinion, I goofed this up historically in several instances, most notably with respect to Val and Julia, who joined CFAR in 2012 with the intention to create a cause-neutral rationality organization. Most integrity-gaps are caused by lack of planning rather than strategic deviousness; someone tells their friend they’ll have a project done by Tuesday and then just… doesn’t. My mistakes here seem to me to be mostly of this form. In any case, I expect the task to be much easier, and for me and CFAR to do better, now that we have a simpler and clearer mission.

38 Comments

Mass_Driver

I dislike CFAR's new focus, and I will probably stop my modest annual donations as a result.

In my opinion, the most important benefit of cause-neutrality is that it safeguards the integrity of the young and still-evolving methods of rationality. If it is official CFAR policy that reducing AI risk is the most important cause, and CFAR staff do almost all of their work with people who are actively involved with AI risk, and then go and do almost all of their socializing with rationalists (most of whom also place a high value on reducing AI risk), then there will be an enormous temptation to discover, promote, and discuss only those methods of reasoning that support the viewpoint that reducing AI risk is the most important value. This is bad partly because it might stop CFAR from changing its mind in the face of new evidence, but mostly because the methods that CFAR will discover (and share with the world) will be stunted -- students will not receive the best-available cognitive tools; they will only receive the best-available cognitive tools that encourage people to reduce AI risk. You might also lose out on discovering methods of (teaching) rationality that would only be found by people with different sorts of brains -- it might turn out that the sort of people who strongly prioritize friendly AI think in certain similar ways, and if you surround yourself with only those people, then you limit yourself to learning only what those people have to teach, even if you somehow maintain perfect intellectual honesty.

Another problem with focusing exclusively on AI risk is that it is such a Black Swan-type problem that it is extremely difficult to measure progress, which in turn makes it difficult to assess the value or success of any new cognitive tools. If you work on reducing global warming, you can check the global average temperature. More importantly, so can any layperson, and you can all evaluate your success together. If you work on reducing nuclear proliferation for ten years, and you haven't secured or prevented a single nuclear warhead, then you know you're not doing a good job. But how do you know if you're failing to reduce AI risk? Even if you think you have good evidence that you're making progress, how could anyone who's not already a technical expert possibly assess that progress? And if you propose to train all of the best experts in your methods, so that they learn to see you as a source of wisdom, then how many of them will retain the capacity to accuse you of failure?

I would not object to CFAR rolling out a new line of seminars that are specifically intended for people working on AI risk -- it is a very important cause, and there's something to be gained in working on a specific problem, and as you say, CFAR is small enough that CFAR can't do it all. But what I hear you saying is that the mission is now going to focus exclusively on reducing AI risk. I hear you saying that if all of CFAR's top leadership is obsessed with AI risk, then the solution is not to aggressively recruit some leaders who care about other topics, but rather to just be honest about that obsession and redirect the institution's policies accordingly. That sounds bad. I appreciate your transparency, but transparency alone won't be enough to save the CFAR/MIRI community from the consequences of deliberately retreating into a bubble of AI researchers.

Qiaochu_Yuan

I see here a description of several potential costs of the new focus but no attempt to weigh those costs against the potential benefit.
