Free Speech and Triskaidekaphobic Calculators: A Reply to Hubinger on the Relevance of Public Online Discussion to Existential Risk

Zack_M_Davis

77 Free Speech and Triskaidekaphobic Calculators: A Reply to Hubinger on the Relevance of Public Online Discussion to Existential Risk

21st Dec 2019

2 min read

77

In response to Wei Dai's claim that a multi-post 2009 Less Wrong discussion on gender issues and offensive speech went well, MIRI researcher Evan Hubinger writes—

Do you think having that debate online was something that needed to happen for AI safety/x-risk? Do you think it benefited AI safety at all? I'm genuinely curious. My bet would be the opposite—that it caused AI safety to be more associated with political drama that helped further taint it.

Okay, but the reason you think AI safety/x-risk is important is because twenty years ago, people like Eliezer Yudkowsky and Nick Bostrom were trying to do systematically correct reasoning about the future, noticed that the alignment problem looked really important, and followed that line of reasoning where it took them—even though it probably looked "tainted" to the serious academics of the time. (The robot apocalypse is nigh? Pftt, sounds like science fiction.)

The cognitive algorithm of "Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c." wouldn't have led us to noticing the alignment problem, and I would be pretty surprised if it were sufficient to solve it (although that would be very convenient).

An analogy: it's actually easier to build a calculator that does correct arithmetic than it is to build a "triskaidekaphobic calculator" that does "correct arithmetic, except that it never displays the result 13", because the simplest implementation of the latter is just a calculator plus an extra conditional that puts something else on the screen when the real answer would have been 13.

If you don't actually understand how arithmetic works, but you feel intense social pressure to produce a machine that never displays the number 13, I don't think you actually succeed at building a triskaidekaphobic calculator: you're trying to solve a problem under constraints that make it impossible to solve a strictly easier problem.

Similarly, I conjecture that it's actually easier to build a rationality/alignment research community that does systematically correct reasoning, than it is to build a Catholic rationality/alignment research community that does "systematically correct reasoning, except never saying anything the Pope disagrees with." The latter is a strictly harder problem: you have to somehow both get the right answer, and throw out all of the steps of your reasoning that the Pope doesn't want you to say.

You're absolutely right that figuring out how politics and the psychology of offense work doesn't directly help increase the power and prestige of the "AI safety" research agenda. It's just that the caliber of thinkers who can solve AGI alignment should also be able to solve politics and the psychology of offense, much as how a calculator that can compute 1423 + 1389 should also be able to compute 6 + 7.

PoliticsCensorship

Frontpage

77

Mentioned in

47If Clarity Seems Like Death to Them

New Comment

25 comments, sorted by

top scoring

Click to highlight new comments since: Today at 1:24 PM

[-]Scott Alexander5y570

Maybe I'm misunderstanding you, but I'm not getting why having the ability to discuss involves actually discussing. Compare two ways to build a triskaidekaphobic calculator.

1. You build a normal calculator correctly, and at the end you add a line of code IF ANSWER == 13, PRINT: "ERROR: IT WOULD BE IMPOLITE OF ME TO DISCUSS THIS PARTICULAR QUESTION".

2. You somehow invent a new form of mathematics that "naturally" never comes up with the number 13, and implement it so perfectly that a naive observer examining the calculator code would never be able to tell which number you were trying to avoid.

Imagine some people who were trying to take the cosines of various angles. If they used method (1), they would have no problem, since cosines are never 13. If they used method (2), it's hard for me to imagine exactly how this would work but probably they would have a lot of problems.

It sounds like the proposal you're arguing against (and which I want to argue for) - not talking about taboo political issues on LW - is basically (1). We discuss whatever we want, we use logic which (we hope) would output the correct (taboo) answer on controversial questions, but if for some reason those questions come up (which they shouldn't, because they're pretty different from AI-related questions), we instead don't talk about them. If for some reason they're really relevant to some really important issue at some point, then we take the hit for that issue only, with lots of consultation first to make sure we're not stuck in the Unilateralist's Curse.

This seems like the right answer even in the metaphor - if people burned down calculator factories whenever any of their calculators displayed "13", and the sorts of problems people used calculators for almost never involved 13, just have the calculator display an error message at that number.

(...plus doing other activism and waterline-raising work to deal with the fact that your society is insane, but that work isn't going to look like having your calculators display 13 and dying when your factory burns down)

[-]Zack_M_Davis5y460

not talking about taboo political issues on LW

Sure, I'm happy to have separate discussion forums for different topics. For example, I wouldn't want people talking about football on /r/mylittlepony—that would be crazy!^[1]

"Take it to /r/TheMotte, you guys" is not that onerous of a demand, and it's a demand I'm happy to support: I really like the Less Wrong æsthetic of doing everything at the meta level.^[2]

But Hubinger seems to argue that the demand should be, "Take it offline," and that seems extremely onerous to me.

The operative principle here is "Permalink or It Didn't Happen": if it's not online, does it really exist? I mean, okay, there's a boring literal sense in which it "exists", but does it exist in a way that matters?

If they used method (2), it's hard for me to imagine exactly how this would work but probably they would have a lot of problems.

The problem is that between the massive evidential entanglement between facts, the temptation to invent fake epistemology lessons to justify conclusions that you couldn't otherwise get on the merits, and the set of topics that someone has an interest in distorting being sufficiently large, I think we do end up with the analogue of nonsense-math in large areas of psychology, sociology, political science, history, &c. Which is to say, life.

In terms of the calculator metaphor, imagine having to use a triskaidekaphobic calculator multiple times as part of solving a complicated problem with many intermediate results. Triskaidekaphobia doesn't just break your ability to compute 6 + 7. It breaks your ability to compute the infinite family of expressions that include 13 as an intermediate result, like (6 + 7) + 1. It breaks the associativity of addition, because now you can't count on (6 + 7) + 1 being the same as 6 + (7 + 1).^[3] And so on.

Also, what Interstice said.

with lots of consultation first to make sure we're not stuck in the Unilateralist's Curse.

So, I agree that regression to the mean exists, which implies that the one who unilaterally does a thing is likely to be the one who most overestimated the value of the thing. I'm very suspicious of this "Unilateralist's Curse" meme being motivatedly and selectively wielded as an excuse for groupthink and conformity, to police and intimidate anyone in your cult^[4] who tries to do anything interesting that might damage the "rationalist" or "effective altruism" brand names.

In the context of existential risk, regression to the mean recommends a principle of conformity because you don't want everyone deciding independently that their own AI is probably safe. But if we're talking about censorship about boring ordinary things like political expression or psychology research (not nukes or bioweapons), I suspect considering regression to the mean makes policy recommendations in the other direction.^[5] I've been meaning to write a post to be titled "First Offender Models and the Unilateralist's Blessing": if shattering a preference-falsification equilibrium would be good for Society, but requires some brave critical group to eat the upfront cost, then that group is likely to be the one that most underestimated the costs to themselves. Maybe we should let them!

if people burned down calculator factories whenever any of their calculators displayed "13"

If.

On the other hand, if people threatened to burn down factories that produced correct calculators but were obviously bluffing, or if they TPed the factories, then calculator manufacurers who care about correct arithmetic might find it better policy to say, "I don't negotiate with terrorists!^[6] Do your worst, you superstitious motherfuckers!"

It would be nice if calculator manufacturers with different risk-tolerances or decision theories could manage to cooperate with each other. For a cute story about a structually similar scenario in which that kind of cooperation doesn't emerge, see my latest Less Wrong post, "Funk-tunul's Legacy; Or, The Legend of the Extortion War."^[7]

At this point the hypothetical adversary in my head is saying, "Zack, you're being motivatedly dense—you know damned well why that example of separate forums isn't analogous!" I reply, "Yeah, sorry, sometimes the text-generating process in my head is motivatedly dense to make a rhetorical point when I understand the consideration my interlocutor is trying to bring up, but I consider it non-normative, the sort of thing an innocent being wouldn't understand. Call it angelic irony, after the angels in Unsong who can't understand deception. It's not intellectually dishonest if I admit I'm doing it in a footnote." ↩︎
Although as Wei Dai points out, preceded by an earlier complaint by Vanessa Kosoy, this does carry the cost of encouraging hidden agendas. ↩︎
This is the part where a pedant points out that real-world floating-point numbers (which your standard desk calculator uses) aren't associative anyway. I hope there aren't any pedants on this website! ↩︎
In an earlier draft of this comment, this phrase was written in the first person: "our cult." (Yes, this is a noncentral usage of the word cult, but I think the hyperlink to "Every Cause Wants To Be", and this footnote, is adequate to clarify what I mean.) On consideration, the second person seems more appropriate, because by now I think I've actually reached the point of pseudo-ragequitting the so-called "rationalist" community. "Pseudo" because once you've spent your entire adult life in a cult, you can't realistically leave, because your vocabulary has been trained so hard on the cult's foundational texts that you can't really talk to anyone else. Instead, what happens is you actually become more active in intra-cult discourse, except being visibly contemptuous about it (putting the cult's name in scare quotes, using gratuitous cuss words, being inappropriately socially-aggressive to the cult leaders, &c.). ↩︎
But I have pretty intense psychological reasons to want to believe this, so maybe you shouldn't believe me until I actually come up with the math. ↩︎
I tend to use this slogan and appeals to timeless decision theory a lot in the context of defying censorship (example), but I very recently realized that this was kind of stupid and/or intellectually dishonest of me. The application of decision theory to the real world can get very complicated very quickly: if the math doesn't turn out the way I hope, am I actually going to change my behavior? Probably not. Therefore I shouldn't pretend that my behavior is the result of sophisticated decision-theoretic computations on my part, when the real explanation is a raw emotional disposition that might be usefully summarized in English as, "Do your worst, you motherfuckers!" That disposition probably is the result of a sophisticated decision-theoretic computation—it's just that it was a distributed computation that took place over thousands of years in humanity's environment of evolutionary adaptedness. ↩︎
But you should be suspicious of the real-world relevance of my choice of modeling assumptions in accordance with the psychological considerations in the previous two footnotes, especially since I kind of forced it because it's half past one in the morning and I really really wanted to shove this post out the door. ↩︎

[-]Scott Alexander5y230

I agree much of psychology etc are bad for the reasons you state, but this doesn't seem to be because everyone else has fried their brains by trying to simulate how to appease triskaidekaphobics too much. It's because the actual triskaidekaphobics are the ones inventing the psychology theories. I know a bunch of people in academia who do various verbal gymnastics to appease the triskaidekaphobics, and when you talk to them in private they get everything 100% right.

I agree that most people will not literally have their buildings burned down if they speak out against orthodoxies (though there's a folk etymology for getting fired which is relevant here). But I appreciate Zvi's sequence on super-perfect competition as a signpost of where things can end up. I don't think academics, organization leaders, etc. are in super-perfect competition the same way middle managers are, but I also don't think we live in the world where everyone has infinite amounts of slack to burn endorsing taboo ideas and nothing can possibly go wrong.

[-]Zack_M_Davis5y340

when you talk to them in private they get everything 100% right.

I'm happy for them, but I thought the point of having taxpayer-funded academic departments was so that people who aren't insider experts can have accurate information with which to inform decisions? Getting the right answer in private can only help those you talk to in private.

I also don't think we live in the world where everyone has infinite amounts of slack to burn endorsing taboo ideas and nothing can possibly go wrong.

Can you think of any ways something could possibly go wrong if our collective map of how humans work fails to reflect the territory?

(I drafted a vicious and hilarious comment about one thing that could go wrong, but I fear that site culture demands that I withhold it.)

[-]steven04615y110

"Take it to /r/TheMotte, you guys" is not that onerous of a demand, and it's a demand I'm happy to support

I'd agree having political discussions in some other designated place online is much less harmful than having them here, but on the other hand, a quick look at what's being posted on the Motte doesn't support the idea that rationalist politics discussion has any importance for sanity on more general topics. If none of it had been posted, as far as I can tell, the rationalist community wouldn't have been any more wrong on any major issue.

[-]philh5y70

real-world floating-point numbers (which your standard desk calculator uses)

Not that it matters, but I expect (and Google seems to confirm) most calculators will use something else, mostly fixed point decimal-based arithmetic. I don't offhand know if that's associative.

I hope there aren’t any pedants on this website!

Well this is awkward.

[-]interstice5y*220

In the analogy, it's only possible to build a calculator that outputs the right answer on non-13 numbers because you already understand the true nature of addition. It might be more difficult if you were confused about addition, and were trying to come up with a general theory by extrapolating from known cases -- then, thinking 6 + 7 = 15 could easily send you down the wrong path. In the real world, we're similarly confused about human preferences, mind architecture, the nature of politics, etc., but some of the information we might want to use to build a general theory is taboo. I think that some of these questions are directly relevant to AI -- e.g. the nature of human preferences is relevant to building an AI to satisfy those preferences, the nature of politics could be relevant to reasoning about what the lead-up to AGI will look like, etc.

[-]Wei Dai5y410

The cognitive algorithm of “Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c.” wouldn’t have led us to noticing the alignment problem, and I would be pretty surprised if it were sufficient to solve it (although that would be very convenient).

This seems to be straw-manning Evan. I'm pretty sure he would explicitly disagree with "Assume my current agenda is the most important thing, and then execute whatever political strategies are required to protect its social status, funding, power, un-taintedness, &c." and I don't see much evidence he is implicitly doing this either. In my own case I think taintedness is one consideration that I have to weigh among many and I don't see what's wrong with that. Practically, this means trying to find some way of talking about political issues on LW or an LW-adjacent space while limiting how much the more political discussions taint the more technical discussions or taint everyone who is associated with LW in some way.

I'm not sure what your own position is on the "should we discuss object-level politics on LW" question. Are you suggesting that we should just go ahead and do it without giving any weight to taintedness? (You didn't fully spell out the analogy with the Triskaidekaphobic Calculator, but it seems like you're saying that worrying about taintedness while trying to solve AI safety is like trying to make a calculator while worrying about triskaidekaphobia, so we shouldn't do that?)

I've been saying that I don't see how things work out well if x-risk / AI safety people don't get better at thinking about, talking about, and doing politics. I'm doubtful that I'm understanding you correctly, but if I am, I also don't see how things work out well if LW starts talking about politics without any regard to taintedness and as a result x-risk / AI safety becomes politically radioactive to most people who have to worry about conventional politics.

[-]Zack_M_Davis5y160

I'm not sure what your own position is on the "should we discuss object-level politics on LW" question. Are you suggesting that we should just go ahead and do it without giving any weight to taintedness?

No. (Sorry, I guess this isn't clear at all from the post, which was written hastily and arguably should have just been a mere comment on the "Against Premature Abstraction" thread; I wanted to make it a top-level post for psychological reasons that I probably shouldn't elaborate on because (a) you probably don't care, and (b) they reflect poorly on me. Feel free to downvote if I made the wrong call.)

You didn't fully spell out the analogy with the Triskaidekaphobic Calculator, but it seems like you're saying that worrying about taintedness while trying to solve AI safety is like trying to make a calculator while worrying about triskaidekaphobia, so we shouldn't do that?

More like—we should at least be aware that worrying about taintedness is making the task harder and that we should be on the lookout for ways to strategize around that (e.g., encouraging the use of a separate forum and pseudonyms for non-mathy topics, having "mutual defense pact" norms where curious thinkers support each other rather than defaulting to "Well, you should've known better than to say that" victim-blaming, &c.).

[-]Ben Pace5y240

Okay, but the reason you think AI safety/x-risk is important is because twenty years ago, people like Eliezer Yudkowsky and Nick Bostrom were trying to do systematically correct reasoning about the future, noticed that the alignment problem looked really important, and followed that line of reasoning where it took them—even though it probably looked "tainted" to the serious academics of the time. (The robot apocalypse is nigh? Pftt, sounds like science fiction.)

Yep, this is true.

(Well, I don't think EY + NB just wanted true beliefs, I think they were also specifically asking themselves like "Which questions are important to ask?" and focused on things like nanotech and EMs and AGI, but I think your key point is that they also had to push against a lot of political tides in academia and other circles, and that cow-towing to such pressures universally will kill you, not literally but kinda.)

But, I will add that I think there's a general variable we can track a bit, which is whether a topic is pulling along a major/common dimension in the conversational/political tug-o-war, or whether the idea is trying to pull the rope sideways. I tend to be more sympathetic to conversations that are pulling it sideways, and want to be more charitable and give space to such ideas, than to ideas that are being debated on every other space on the internet (example).

I feel some desire to say specifically to you Zack, that obviously I don't think this means that you should allow people to all agree with one side of such a topic; you shouldn't pick a side, but instead dissuade arguments for either side equally. Jim Babcock was the person who taught me the rule that if you get punished for saying X, then even if you believe not-X, it is generally unethical to argue for not-X, as the other side isn't allowed to counter.

[-]evhub5y200

I'm not really interested in debating this on LessWrong, for basically the exact reasons that I stated in the first place, which is that I don't really think these sorts of conversations can be done effectively online. Thus, I probably won't try to respond to any replies to this comment.

At the very least, though, I think it's worth clarifying that my position is certainly not "assume what you're doing is the most important thing and run with it." Rather, I think that trying to think really hard about the most important things to be doing is an incredibly valuable exercise, and I think the effective altruism community provides a great model of how I think that should be done. The only thing I was advocating was not discussing hot-button political issues specifically online. I think to the extent that those sorts of things are relevant to doing the most good, they should be done offline, where the quality of the discussion can be higher and nobody ends up tainted by other people's beliefs by association.

[-]steven04615y140

Okay, but the reason you think AI safety/x-risk is important is because twenty years ago, people like Eliezer Yudkowsky and Nick Bostrom were trying to do systematically correct reasoning about the future, noticed that the alignment problem looked really important, and followed that line of reasoning where it took them—even though it probably looked "tainted" to the serious academics of the time. (The robot apocalypse is nigh? Pftt, sounds like science fiction.)

Those subjects were always obviously potentially important, so I don't see this as evidence against a policy of picking one's battles by only arguing for unpopular truths that are obviously potentially important.

[-]Zack_M_Davis5y30

Hm, touché. Although ... if "the community" were actually following a policy of strategically arguing for things based on importance-times-neglectedness, I would expect to see a lot more people working on eugenics, which looks really obviously potentially important to me, either on a Christiano-esque Outside View (smarter humans means relatively more human optimization power steering the future rather than unalignable machine-learning algorithms), or a hard-takeoff view (smarter humans sooner means more time to raise alignment-researcher tykebombs). Does that seem right or wrong to you? (Feel free to email or PM me.)

[-]Matt Goldenberg5y30

Importance * Neglectedness - Reputation hit might be more accurate.

[-]Zack_M_Davis5y20

I was thinking that reputation-hit contributes to neglectedness. Maybe what we really need is a way to reduce reputational "splash damage", so that people with different levels of reputation risk-tolerance can work together or at least talk to each other (using, for example, a website).

[-]Charlie Steiner5y100

AI value alignment is a hard problem, definitely. But it has one big advantage over politics, if we're shopping for problems: it hasn't been quite so optimized by memetic evolution for ability to take over a conversation.

I think talking about gender issues on LW, that time everyone talked about mostly gender issues for a while, was good (not as solving anything, but as a political act on the object level). But also, saying things like "we should be able to solve politics" is how you get struck by memetic Zeus' lightning. SSC has a subreddit and another subreddit for a reason, and that reason isn't because rationalists are so good at solving politics that they need two whole subreddits to do it in.

[-]Pattern5y70

I thought it was because the founder got tired of being harassed.

[-]FactorialCode5y80

I think the whole thing is a very relevant case study. It should be investigated if we want to develop methods that will allow rationalists to discuss and understand object level politics. All CW threads were initially quarantined to a single thread on the subreddit. Then the entire thread needed to be moved to a separate subreddit.

Building a calculator that adds 6+7 properly can be exceptionally difficult in an environment full of Triskaidekaphobes who want to smash it and harass it's creators.

[-]Chris_Leong5y*80

One worry I have about having a norm of not discussing politics is that sometimes politics will affect the rationality community. For example, the rationality community will have to deal with claims of harassment or making people feeling unwelcome. How you handle that probably depends on your stance on feminism. So avoiding discussing politics could make these decisions worse. (That said, I'm not actually claiming that discussing these issues would lead to an improvement, rather than everyone just sticking to whatever view they had before)

[-]Viliam5y60

Debating politics is meta-complicated, because even the decisions on whether/how to debate politics can themselves be (or be interpreted as) moves in the political game.

"You don't want to have a debate about X? That means you agree with the majority opinion on X! While pretending to be neutral! And these people call themselves 'rationalists' without anyone calling you out on such blatant political move?" (Heh, heh, I just made them debate X anyway, again.)

[-]Viliam5y40

Both sides make good points. One side being Zack, and the other side being everyone else. :D

Instead of debating object-level stuff here, everyone talks in metaphors. Which is supposed to be a good way to avoid political mindkilling. Except that mostly everyone knows what the metaphors represent, so I doubt this really works. And it seems to me that rationality requires to look at the specific things. So, do I wish that people stopped using metaphors and addressed the underlying specific topics? Aaah... yes and no. Yes, because it seems to me there is otherwise no way to come to a meaningful conclusion. No, because that would invite other people, encourage people on Twitter to share carefully selected screenshots, and make everyone worry about having parts of their text quoted out of context. So maybe the metaphors actually do something useful by adding extra complexity.

In real life, I'd say: "Ok guys, let's sit in this room, everyone turn off their recording devices, and let's talk, with the agreement that what happens in this room stays in this room." Which is exactly the thing that is difficult to do online. (On the second thought, is it? What about a chat, where only selected people can join, but everyone gets assigned a random nickname, and perhaps the nicknames also reset randomly in the middle of conversation...)

Paul Graham recommends: "Draw a sharp line between your thoughts and your speech. Inside your head, anything is allowed. [...] But, as in a secret society, nothing that happens within the building should be told to outsiders. The first rule of Fight Club is, you do not talk about Fight Club."

The problem is, how to apply this to an online community, where anything that happens automatically has a written record; and how to allow new members to join the community without making everything that happens there automatically public. (How would you keep the Fight Club secret when people have smartphones?)

[-]bgaesop5y210

real life, I'd say: "Ok guys, let's sit in this room, everyone turn off their recording devices, and let's talk, with the agreement that what happens in this room stays in this room."

The one time I did this with rationalists, the person (Adam Widmer) who organized the event and explicitly set forth the rule you just described, then went on to remember what people had said and bring it up publicly later in order to shake them into changing their behavior to fit his (if you'll excuse me speaking ill of the dead) spoiled little rich boy desires.

So my advice, based on my experience, and which my life would have been noticably better had someone told me before, is: DON'T do this, and if anyone suggests doing this, stop trusting them and run away

Which is not to say that you are untrustworthy and trying to manipulate people into revealing sensitive information so you can use it to manipulate them; in order for me to confidently reach that conclusion, you'd have to actually attempt to organize such an event, not just casually suggest one on the internet

[-]Viliam5y110

Well, that sucks. Good point that no matter what the rules are, people can simply break them. The more you think about the details of the rules, the easier you forget that the rules do not become physical law.

Though I'd expect social consequences for breaking such rules to be quite severe. Which again, deters some kinds of people more, and some of them less.

[-]clone of saturn5y190

As a semi-outsider, rationalists seem remarkably unlikely to altruistically punish each other for this sort of casual betrayal. (This is a significant part of why I've chosen to remain a semi-outsider by only participating online.)

[-]Pattern5y-10

encourage people on Twitter to share carefully selected screenshots

So maybe the metaphors actually do something useful by adding extra complexity.

This sounds like security through obscurity. Just use encryption.

, and make everyone worry about having parts of their text quoted out of context.

One way of dealing with this is offense.

What about a chat, where only selected people can join, but everyone gets assigned a random nickname, and perhaps the nicknames also reset randomly in the middle of conversation...

Sounds a bit like 4chan. Some variations:

1) No names.

2) Participants are randomly selected.

The problem is, how to apply this to an online community, where anything that happens automatically has a written record;

3) Ditch the written record.

4) Have the chat run forever. The names keep changing.

Moderation Log