In response to Wei Dai's claim that a multi-post 2009 Less Wrong discussion on gender issues and offensive speech went well, MIRI researcher Evan Hubinger writes—
Do you think having that debate online was something that needed to happen for AI safety/x-risk? Do you think it benefited AI safety at all? I'm genuinely curious. My bet would be the opposite—that it caused AI safety to be more associated with political drama that helped further taint it.
Okay, but the reason you think AI safety/x-risk is important is because twenty years ago, people like Eliezer Yudkowsky and Nick Bostrom were trying to do systematically correct reasoning about the future, noticed that the alignment problem looked really important, and followed that line of reasoning where it took them—even though it probably looked "tainted" to the serious academics of the time. (The robot apocalypse is nigh? Pftt, sounds like science fiction.)
The cognitive algorithm of "Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c." wouldn't have led us to noticing the alignment problem, and I would be pretty surprised if it were sufficient to solve it (although that would be very convenient).
An analogy: it's actually easier to build a calculator that does correct arithmetic than it is to build a "triskaidekaphobic calculator" that does "correct arithmetic, except that it never displays the result 13", because the simplest implementation of the latter is just a calculator plus an extra conditional that puts something else on the screen when the real answer would have been 13.
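(To make the "extra conditional" concrete, here is a minimal Python sketch of the two machines. It's only an illustration of the analogy, not code from the original discussion, and the function names are hypothetical.)

```python
def calculator(a: int, b: int) -> int:
    """Correct arithmetic."""
    return a + b


def triskaidekaphobic_calculator(a: int, b: int):
    """Correct arithmetic, except never display the result 13."""
    result = calculator(a, b)  # still has to compute the real answer first
    if result == 13:
        return "error"  # put something else on the screen instead of 13
    return result
```

Note that the triskaidekaphobic version contains the entire working calculator as a subroutine, which is the sense in which it's a strictly harder thing to build.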
If you don't actually understand how arithmetic works, but you feel intense social pressure to produce a machine that never displays the number 13, I don't think you actually succeed at building a triskaidekaphobic calculator: you're trying to solve a problem under constraints that make it impossible to solve a strictly easier problem.
Similarly, I conjecture that it's actually easier to build a rationality/alignment research community that does systematically correct reasoning, than it is to build a Catholic rationality/alignment research community that does "systematically correct reasoning, except never saying anything the Pope disagrees with." The latter is a strictly harder problem: you have to somehow both get the right answer, and throw out all of the steps of your reasoning that the Pope doesn't want you to say.
You're absolutely right that figuring out how politics and the psychology of offense work doesn't directly help increase the power and prestige of the "AI safety" research agenda. It's just that the caliber of thinkers who can solve AGI alignment should also be able to solve politics and the psychology of offense, much as how a calculator that can compute 1423 + 1389 should also be able to compute 6 + 7.
Both sides make good points. One side being Zack, and the other side being everyone else. :D
Instead of debating object-level stuff here, everyone talks in metaphors. Which is supposed to be a good way to avoid political mindkilling. Except that mostly everyone knows what the metaphors represent, so I doubt this really works. And it seems to me that rationality requires looking at the specific things. So, do I wish that people stopped using metaphors and addressed the underlying specific topics? Aaah... yes and no. Yes, because it seems to me there is otherwise no way to come to a meaningful conclusion. No, because that would invite other people, encourage people on Twitter to share carefully selected screenshots, and make everyone worry about having parts of their text quoted out of context. So maybe the metaphors actually do something useful by adding extra complexity.
In real life, I'd say: "Ok guys, let's sit in this room, everyone turn off their recording devices, and let's talk, with the agreement that what happens in this room stays in this room." Which is exactly the thing that is difficult to do online. (On second thought, is it? What about a chat where only selected people can join, but everyone gets assigned a random nickname, and perhaps the nicknames also reset randomly in the middle of the conversation...)
Paul Graham recommends: "Draw a sharp line between your thoughts and your speech. Inside your head, anything is allowed. [...] But, as in a secret society, nothing that happens within the building should be told to outsiders. The first rule of Fight Club is, you do not talk about Fight Club."
The problem is, how to apply this to an online community, where anything that happens automatically has a written record; and how to allow new members to join the community without making everything that happens there automatically public. (How would you keep the Fight Club secret when people have smartphones?)
As a semi-outsider, I find that rationalists seem remarkably unlikely to altruistically punish each other for this sort of casual betrayal. (This is a significant part of why I've chosen to remain a semi-outsider by only participating online.)