Follow-up to:
- CFAR's new focus, and AI safety
- CFAR's new mission statement (link post; links to our website).
In the days since we published our previous post, a number of people have come up to me and expressed concerns about our new mission. Several of these had the form “I, too, think that AI safety is incredibly important — and that is why I think CFAR should remain cause-neutral, so it can bring in more varied participants who might be made wary by an explicit focus on AI.”
I would here like to reply to these people and others, and to clarify what is and isn’t entailed by our new focus on AI safety.
First: Where are CFAR’s activities affected by the cause(s) it chooses to prioritize?
Some components that people may be hoping for from “cause neutral”, that we can do, and that we intend to do:
- We can be careful to include all information that participants, from their vantage point, would want to know -- even if, in our judgment, some of that information is misleading or irrelevant, or might pull them to the “wrong” conclusions.
- Similarly, we can attempt to expose people to skilled thinkers they would want to talk with, regardless of those thinkers’ viewpoints; and we can be careful to allow their own thoughts, values, and arguments to develop, regardless of which “side” this may lead them to support.
- More generally, we can and should attempt to cooperate with each student’s extrapolated volition, and to treat each student as they (from their initial epistemic vantage point, and with their initial values) would wish to be treated. Which is to say: we should not do anything that would work less well if the algorithm behind it were known, and we should attempt to run workshops (and conversations, and so on) that good people of varied initial views would, stably and on reflection, want to participate in.
Some components that people may be hoping for from “cause neutral”, that we can’t or won’t do:
CFAR’s history around our mission: How did we come to change?
[1] In my opinion, I goofed this up historically in several instances, most notably with respect to Val and Julia, who joined CFAR in 2012 with the intention to create a cause-neutral rationality organization. Most integrity-gaps are caused by lack of planning rather than strategic deviousness; someone tells their friend they’ll have a project done by Tuesday and then just… doesn’t. My mistakes here seem to me to be mostly of this form. In any case, I expect the task to be much easier, and for me and CFAR to do better, now that we have a simpler and clearer mission.
I dislike CFAR's new focus, and I will probably stop my modest annual donations as a result.
In my opinion, the most important benefit of cause-neutrality is that it safeguards the integrity of the young and still-evolving methods of rationality. If it is official CFAR policy that reducing AI risk is the most important cause, and CFAR staff do almost all of their work with people who are actively involved with AI risk, and then do almost all of their socializing with rationalists (most of whom also place a high value on reducing AI risk), then there will be an enormous temptation to discover, promote, and discuss only those methods of reasoning that support the viewpoint that reducing AI risk is the most important cause. This is bad partly because it might stop CFAR from changing its mind in the face of new evidence, but mostly because the methods that CFAR discovers (and shares with the world) will be stunted -- students will not receive the best-available cognitive tools; they will only receive the best-available cognitive tools that encourage people to reduce AI risk. You might also lose out on discovering methods of (teaching) rationality that would only be found by people with different sorts of brains -- it might turn out that the sort of people who strongly prioritize friendly AI think in certain similar ways, and if you surround yourself with only those people, then you limit yourself to learning only what those people have to teach, even if you somehow maintain perfect intellectual honesty.
Another problem with focusing exclusively on AI risk is that it is such a Black Swan-type problem that it is extremely difficult to measure progress, which in turn makes it difficult to assess the value or success of any new cognitive tools. If you work on reducing global warming, you can check the global average temperature. More importantly, so can any layperson, and you can all evaluate your success together. If you work on reducing nuclear proliferation for ten years, and you haven't secured or prevented a single nuclear warhead, then you know you're not doing a good job. But how do you know if you're failing to reduce AI risk? Even if you think you have good evidence that you're making progress, how could anyone who's not already a technical expert possibly assess that progress? And if you propose to train all of the best experts in your methods, so that they learn to see you as a source of wisdom, then how many of them will retain the capacity to accuse you of failure?
I would not object to CFAR rolling out a new line of seminars specifically intended for people working on AI risk -- it is a very important cause, there is something to be gained in working on a specific problem, and, as you say, CFAR is small enough that it can't do it all. But what I hear you saying is that the mission is now going to focus exclusively on reducing AI risk. I hear you saying that if all of CFAR's top leadership is obsessed with AI risk, then the solution is not to aggressively recruit some leaders who care about other topics, but rather to just be honest about that obsession and redirect the institution's policies accordingly. That sounds bad. I appreciate your transparency, but transparency alone won't be enough to save the CFAR/MIRI community from the consequences of deliberately retreating into a bubble of AI researchers.
I see here a description of several potential costs of the new focus but no attempt to weigh those costs against the potential benefit.