In response to Wei Dai's claim that a multi-post 2009 Less Wrong discussion on gender issues and offensive speech went well, MIRI researcher Evan Hubinger writes—
Do you think having that debate online was something that needed to happen for AI safety/x-risk? Do you think it benefited AI safety at all? I'm genuinely curious. My bet would be the opposite—that it caused AI safety to be more associated with political drama that helped further taint it.
Okay, but the reason you think AI safety/x-risk is important is because twenty years ago, people like Eliezer Yudkowsky and Nick Bostrom were trying to do systematically correct reasoning about the future, noticed that the alignment problem looked really important, and followed that line of reasoning where it took them—even though it probably looked "tainted" to the serious academics of the time. (The robot apocalypse is nigh? Pftt, sounds like science fiction.)
The cognitive algorithm of "Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c." wouldn't have led us to noticing the alignment problem, and I would be pretty surprised if it were sufficient to solve it (although that would be very convenient).
An analogy: it's actually easier to build a calculator that does correct arithmetic than it is to build a "triskaidekaphobic calculator" that does "correct arithmetic, except that it never displays the result 13", because the simplest implementation of the latter is just a calculator plus an extra conditional that puts something else on the screen when the real answer would have been 13.
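As a minimal sketch of that "extra conditional" in Python (the function names, and 15 as the substitute output, are my own illustrative choices, not anything specified in the analogy):

```python
def add(a: int, b: int) -> int:
    """An honest calculator: it just does the arithmetic."""
    return a + b

def triskaidekaphobic_add(a: int, b: int) -> int:
    """The same calculator, plus one extra conditional that puts
    something else on the screen when the real answer would be 13."""
    result = add(a, b)
    if result == 13:
        return 15  # display a substitute value rather than the forbidden 13
    return result
```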
If you don't actually understand how arithmetic works, but you feel intense social pressure to produce a machine that never displays the number 13, I don't think you actually succeed at building a triskaidekaphobic calculator: you're attempting, under an extra constraint, a problem that's strictly harder than the one you already can't solve.
Similarly, I conjecture that it's actually easier to build a rationality/alignment research community that does systematically correct reasoning, than it is to build a Catholic rationality/alignment research community that does "systematically correct reasoning, except never saying anything the Pope disagrees with." The latter is a strictly harder problem: you have to somehow both get the right answer, and throw out all of the steps of your reasoning that the Pope doesn't want you to say.
You're absolutely right that figuring out how politics and the psychology of offense work doesn't directly help increase the power and prestige of the "AI safety" research agenda. It's just that the caliber of thinkers who can solve AGI alignment should also be able to solve politics and the psychology of offense, much as how a calculator that can compute 1423 + 1389 should also be able to compute 6 + 7.
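Continuing the hypothetical sketch above, note that the two example sums land on opposite sides of the taboo, since 6 + 7 is exactly 13:

```python
add(1423, 1389)              # 2812 -- both versions get this right
add(6, 7)                    # 13 -- the true answer
triskaidekaphobic_add(6, 7)  # 15 -- the one sum it is forbidden to get right
```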
In the analogy, it's only possible to build a calculator that outputs the right answer on non-13 numbers because you already understand the true nature of addition. It might be more difficult if you were confused about addition, and were trying to come up with a general theory by extrapolating from known cases -- then, thinking 6 + 7 = 15 could easily send you down the wrong path. In the real world, we're similarly confused about human preferences, mind architecture, the nature of politics, etc., but some of the information we might want to use to build a general theory is taboo. I think that some of these questions are directly relevant to AI -- e.g. the nature of human preferences is relevant to building an AI to satisfy those preferences, the nature of politics could be relevant to reasoning about what the lead-up to AGI will look like, etc.