So8res - LessWrong

A case for courage, when speaking of AI danger

(From a moderation perspective:

I consider the following question-cluster to be squarely topical: "Suppose one believes it is evil to advance AI capabilities towards superintelligence, on the grounds that such a superintelligence would be quite likely to kill us all. Suppose further that one fails to unapologetically name this perceived evil as 'evil', e.g. out of a sense of social discomfort. Is that a failure of courage, in the sense of this post?"
I consider the following question-cluster to be a tangent: "Suppose person X is contributing to a project that I believe will, in the future, cause great harms. Does person X count as 'evil'? Even if X agrees with me about which outcomes are good and disagrees about the consequences of the project? Even if the harms of the project have not yet occurred? Even if X would not be robustly harmful in other circumstances? What if X thinks they're trying to nudge the project in a less-bad direction?"
I consider the following sort of question to be sliding into the controversy attractor: "Are people working at AI companies evil?"

The LW mods told me they're considering implementing a tool to move discussions to the open thread (so that they may continue without derailing the topical discussions). FYI @habryka: if it existed, I might use it on the tangents, idk. I encourage people to pump against the controversy attractor.)

A case for courage, when speaking of AI danger

So8res11d1717

(The existence of exceptions is why I said "most anyone" instead of "anyone".)

A case for courage, when speaking of AI danger

So8res11d*4836

To be clear, my recommendation for SB-1047 was not "be basically the same bill but talk about extinction risks and levy a few more restrictions on the labs", but rather "focus very explicitly on the extinction threat; say 'this bill is trying to address a looming danger described by a variety of scientists and industry leaders' or suchlike, shape the bill differently to actually address the extinction threat straightforwardly".

I don't have a strong take on whether SB-1047 would have been more likely to pass in that world. My recollection is that, back when I attempted to give this advice, I said I thought it would make the bill less likely to pass but more likely to have good effects on the conversation (in addition to it being much more likely to matter in cases where it did pass). But that could easily be hindsight bias; it's been a few years. And post facto, the modern question of what is "more likely" depends a bunch on things like how stochastic you think Newsom is (we already observed that he vetoed the real bill, so I think there's a decent argument that a bill with different content has a better chance even if it's lower than our a priori odds on SB-1047), though that's a digression.

I do think that SB-1047 would have had a substantially better effect on the conversation if it was targeted towards the "superintelligence is on track to kill us all" stuff. I think this is a pretty low bar because I think that SB-1047 had an effect that was somewhere between neutral and quite bad, depending on which follow-on effects you attribute to it. Big visible bad effects that I think you can maybe attribute to it are Cruz and Vance polarizing against (what they perceived as) attempts to regulate a budding normal tech industry, and some big dems also solidifying a position against doing much (e.g. Newsom and Pelosi). More insidiously and less clearly, I suspect that SB-1047 was a force holding the Overton window together. It was implicitly saying "you can't talk about the danger that AI kills everyone and be taken seriously" to all who would listen. It was implicitly saying "this is a sort of problem that could be pretty-well addressed by requiring labs to file annual safety reports" to all who would listen. I think these are some pretty false and harmful memes.

With regards to the Overton window shifting: I think this effect is somewhat real, but I doubt it has as much importance as you imply.

For one thing, I started meeting with various staffers in the summer of 2023, and the reception I got is a big part of why I started pitching Eliezer on the world being ready for a book (a project that we started in early 2024). Also, the anecdote in the post is dated to late 2024 but before o3 or DeepSeek. Tbc, it did seem to me like the conversation changed markedly in the wake of DeepSeek, but it changed from a baseline of elected officials being receptive in ways that shocked onlookers.

For another thing, in my experience, anecdotes like "the AI cheats and then hides it" or experimental results like "the AI avoids shutdown sometimes" are doing as much if not more of the lifting as capabilities advances. (Though I think that's somewhat of a digression.)

For a third thing, I suspect that one piece of the puzzle you're missing is how much the Overton window has been shifting because courageous people have been putting in the legwork for the last couple years. My guess is that the folks putting these demos and arguments in front of members of Congress are a big part of why we're seeing the shift, and my guess is that the ones who are blunt and courageous are causing the shift to happen more so (and are causing it to happen in a better direction).

I'm worried about the people who go in and talk only about (e.g.) AI-enabled biorisk while avoiding saying a word about superintelligence or loss-of-control. I think this happens pretty often and that it comes with a big opportunity cost in the best cases, and that it's actively harmful in the worst cases -- when (e.g.) it reinforces a silly Overton window, or when it shuts down some congress member's budding thoughts about the key problems, or when it orients them towards silly issues. I also think it spends down future credibility; I think it risks exasperating them when you try to come back next year and say that we're on track to all die. I also think that the lack of earnestness is fishy in a noticeable way (per the link in the OP).

[edited for clarity and to fix typos, with apologies about breaking the emoji-reaction highlights]

A case for courage, when speaking of AI danger

So8res12d*4433

I don't think most anyone who's studied the issues at hand thinks the chance of danger is "really small", even among people who disagree with me quite a lot (see e.g. here). I think folks who retreat to arguments like "you should pay attention to this even if you think there's a really small chance of it happening" are doing a bunch of damage, and this is one of many problems I attribute to a lack of this "courage" stuff I'm trying to describe.

When I speak of "finding a position you have courage in", I do not mean "find a position that you think should be logically unassailable." I'm apparently not doing a very good job at transmitting the concept, but here's some positive and negative examples:

✓ "The race towards superintelligence is ridiculously risky and I don't think humanity should be doing it."
✓ "I'm not talking about a tiny risk. On my model, this is your most likely cause of death."
✓ "Oh I think nobody should be allowed to race towards superintelligence, but I'm trying to build it anyway because I think I'll do it better than the next guy. Ideally all AI companies in the world should be shut down, though, because we'd need way more time to do this properly." (The action is perhaps a bit cowardly, but the statement is courageous, if spoken by someone for whom it's true.)

✗ "Well we can all agree that it'd be bad if AIs were used to enable terrorists to make bioweapons" (spoken by someone who thinks the chance is quite high).
✗ "Even if you think the chance of it happening is very small, it's worth focusing on, because the consequences are so huge" (spoken by someone who believes the danger is substantial).
✗ "In some unlikely but extreme cases, these companies put civilization at risk, and the companies should be responsible for managing those tail risks" (spoken by someone who believes the danger is substantial).

One litmus test here is: have you communicated the real core of the situation and its direness as you perceive it? Not like "have you caused them as much concern as you can manage to", more like "have you actually just straightforwardly named the key issue". (There's also some caveats here about how, if you think there's a lowish chance of disaster because you think humanity will come to its senses and change course, then this notion of "courage" I'm trying to name still entails communicating how humanity is currently on course for a full-fledged disaster, without mincing words.)

A case for courage, when speaking of AI danger

So8res12d*4828

A few claims from the post (made at varying levels of explicitness) are:

1. Often people are themselves motivated by concern X (ex: "the race to superintelligence is reckless and highly dangerous") and decide to talk about concern Y instead (ex: "AI-enabled biorisks"), perhaps because they think it is more palatable.

2. Focusing on the "palatable" concerns is a pretty grave mistake.

2a. The claims Y are often not in fact more palatable; people are often pretty willing to talk about the concerns that actually motivate you.

2b. When people try talking about the concerns that actually motivate them while loudly signalling that they think their ideas are shameful and weird, this is not a terribly good test of claim (2a).

2c. Talking about claims other than the key ones comes at a steep opportunity cost.

2d. Talking about claims other than the key ones risks confusing people who are trying to make sense of the situation.

2e. Talking about claims other than the key ones risks making enemies of allies (when those would-be allies agree about the high-stakes issues and disagree about how to treat the mild stuff).

2f. Talking about claims other than the key ones triggers people's bullshit detectors.

3. Nate suspects that many people are confusing "I'd be uncomfortable saying something radically different from the social consensus" with "if I said something radically different from the social consensus then it would go over poorly", and that this conflation is hindering their ability to update on the evidence.

3a. Nate is hopeful that evidence of many people's receptiveness to key concerns will help address this failure.

3b. Nate suspects that various tropes and mental stances associated with the word "courage" are perhaps a remedy to this particular error, and hopes that advice like "speak with the courage of your convictions" is helpful for remembering the evidence in (3a) and overcoming the error of (3).

I think the people I know who worked on SB-1047 are totally happy to say "it's ridiculous that these companies don't have any of the types of constraints that might help mitigate extreme risks from their work" without wavering

I don't think this is in much tension with my model.

For one thing, that whole sentence has a bunch of the property I'd call "cowardice". "Risks" is how one describes tail possibilities; if one believes that a car is hurtling towards a cliff-edge, it's a bit cowardly to say "I think we should perhaps talk about gravity risks" rather than "STOP". And the clause "help mitigate extreme risks from their work" lets the speaker pretend the risks are tiny on Monday and large on Tuesday; it doesn't extend the speaker's own neck.

For another thing, willingness to say that sort of sentence when someone else brings up the risks (or to say that sort of sentence in private) is very different from putting the property I call "courage" into the draft legislation itself.

I observe that SB-1047 itself doesn't say anything about a big looming extinction threat that requires narrowly targeted legislation. It maybe gives the faintest of allusions to it, and treads no closer than that. The bill lacks the courage of the conviction "AI is on track to ruin everything." Perhaps you believe this simply reflects the will of Scott Wiener. (And for the record: I think it's commendable that Senator Wiener put forth a bill that was also trying to get a handle on sub-extinction threats, though it's not what I would have done.) But my guess is that the bill would be written very differently if the authors believed that the whole world knew how insane and reckless the race to superintelligence is. And "write as you would if your ideas were already in the Overton window" is not exactly what I mean by "have the courage of your convictions", but it's close.

(This is also roughly my answer to the protest "a lot of the people in D.C. really do care about AI-enabled biorisk a bunch!". If the whole world was like "this race to superintelligence is insane and suicidal; let's start addressing that", would the same people be saying "well our first priority should be AI-enabled biorisk; we can get to stopping the suicide race later"? Because my bet is that they're implicitly focusing on issues that they think will fly, and I think that this "focus on stuff you think will fly" calculation is gravely erroneous and harmful.)

As for how the DC anecdote relates: it gives an example of people committing error (1), and it provides fairly direct evidence for claims (2a) and (2c). (It also provided evidence for (3a) and (3b), in that the people at the dinner all expressed surprise to me post-facto, and conceptualized this pretty well in terms of 'courage', and have been much more Nate!courageous at future meetings I've attended, to what seem to me like good effects. Though I didn't spell those bits out in the original post.)

I agree that one could see this evidence and say "well it only shows that courage works for that exact argument in that exact time period" (as is mentioned in a footnote, and as is a running theme throughout the post). Various other parts of the post provide evidence for other claims (e.g. the Vance, Cruz, and Sacks references provide evidence for (2d), (2e), and (2f)). I don't expect this post to be wholly persuasive, and indeed a variant of it has been sitting in my drafts folder for years. I'm putting it out now in part because I am trying to post more in the lead-up to the book, and in part because folks have started saying things like "this is slightly breaking my model of where things are in the overton window", which causes me to think that maybe the evidence has finally piled up high enough that people can start to internalize hypothesis (3), even despite how bad and wrong it might feel for them to (e.g.) draft legislation in accordance with beliefs of theirs that radically disagree with perceived social consensus.

A case for courage, when speaking of AI danger

So8res13d*1713

Huh! I've been in various conversations with elected officials and have had the sense that most people speak without the courage of their convictions (which is not quite the same thing as "confidence", but which is more what the post is about, and which is the property I'm more interested in discussing in this comment section). One factor in the lack of courage is broadcasting uncertainty about things like "25% vs 90+%" when they could instead be broadcasting confidence about "this is ridiculous and should stop". In my experience, this lack of courage is common to the point that others express explicit surprise when someone does speak with courage and it works (as per the anecdote in the post).

I am uncertain to what degree we're seeing very different conversations, versus to what degree I just haven't communicated the phenomena I'm talking about, versus to what degree we're making different inferences from similar observations.

A case for courage, when speaking of AI danger

So8res13d4629

I don't think the weed/local turf wars really cause the problems here, why do you think that?

The hypothesized effect is: people who have been engaged in the weeds/turf wars think of themselves as "uncertain" (between e.g. the 25%ers and the 90+%ers) and forget that they're actually quite confident about some proposition like "this whole situation is reckless and crazy and Earth would be way better off if we stopped". And then there's a disconnect where (e.g.) an elected official asks a local how bad things look, and they answer while mentally inhabiting the uncertain position ("well I'm not sure whether it's 25%ish or 90%ish risk"), and all they manage to communicate is a bunch of wishy-washy uncertainty. And (on this theory) they'd do a bunch better if they set aside all the local disagreements and went back to the prima-facie "obvious" recklessness/insanity of the situation and tried to communicate about that first. (It is, I think, usually the most significant part to communicate!)

A case for courage, when speaking of AI danger

So8res13d5230

I agree that it's usually helpful and kind to model your conversation-partner's belief-state (and act accordingly).

And for the avoidance of doubt: I am not advocating that anyone pretend they think something is obvious when they in fact do not.

By "share your concerns as if they’re obvious and sensible", I was primarily attempting to communicate something more like: I think it's easy for LessWrong locals to get lost in arguments like whether AI might go fine because we're all in a simulation anyway, or confused by turf wars about whether AI has a 90+% chance of killing us or "only" a ~25% chance. If someone leans towards the 90+% model and gets asked their thoughts on AI, I think it's worse for them to answer in a fashion that's all wobbly and uncertain because they don't want to be seen as overconfident against the 25%ers, and better for them to connect back to the part where this whole situation (where companies are trying to build machine superintelligence with very little idea of what they're doing) is wacky and insane and reckless, and speak from there.

I don't think one should condescend about the obviousness of it. I do think that this community is generally dramatically failing to make the argument "humanity is building machine superintelligence while having very little idea of what it's doing, and that's just pretty crazy on its face" because it keeps getting lost in the weeds (or in local turf wars).

And I was secondarily attempting to communicate something like: I think our friends in the policy world tend to cede far too much social ground. A bunch of folks in DC seem to think that the views of (say) Yann LeCun and similar are the scientific consensus with only a few radicals protesting, whereas the actual fact is that "there's actually a pretty big problem here" is much closer to consensus, and that a lack of scientific consensus is a negative sign rather than a positive sign in a situation like this one (because it's an indication that the scientific field has been able to get this far without really knowing what the heck it's doing, which doesn't bode well if it goes farther). I think loads of folks are mentally framing themselves as fighting for an unpopular fringe wacky view when that's not the case, and they're accidentally signaling "my view is wacky and fringe" in cases where that's both false and harmful.

(I was mixing both these meanings into one sentence because I was trying to merely name my old spiels rather than actually giving them, because presenting the old spiels was not the point of the post. Perhaps I'll edit the OP to make this point clearer, with apologies to future people for confusion caused if the lines that Buck and I are quoting have disappeared.)

A case for courage, when speaking of AI danger

So8res14d157

That doesn't spark any memories (and people who know me rarely describe my conversational style as "soft and ever-so-polite"). My best guess is nevertheless that this tweet is based on a real event (albeit filtered through some misunderstandings, e.g. perhaps my tendency to talk with a tone of confidence was misinterpreted as a status game; or perhaps I made some hamfisted attempt to signal "I don't actually like talking about work on dates" and accidentally signaled "I think you're dumb if you don't already believe these conclusions I'm reciting in response to your questions").

To be quite clear: I endorse everyone thinking through the AI danger arguments on their own, no matter what anyone else says to them and no matter what tone they say it in.

All that said, I don't quite see how any of this relates to the topic at hand, and so I'll go ahead and delete this comment thread in the morning unless I'm compelled by argument not to.

Futarchy's fundamental flaw

So8res25d74

Are you claiming that this is mistaken, or rather that this is correct but it's not a problem?

mistaken.

But if you like money, you’ll pay more for a contract on coin B.

this is an invalid step. it's true in some cases but not others, depending on how the act of paying for a contract on coin B (with no additional knowledge of whether it's double-headed) affects the chance that the market tosses coin B.
