Raemon

LessWrong team member / moderator. I've been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I've been interested in improving my own epistemic standards and helping others to do so as well.

Comments (sorted by newest)

Foyle's Shortform
Raemon · 3h

It doesn't feel worth my time to listen to the whole thing, but if someone pulled out the highlights ("what particular new analogies are there? any particular nuances to the presentation that were interesting?") I'd be interested.

Plan 1 and Plan 2
Raemon · 4h

Maybe I'm not sure what you mean by "have a respectable position."

I'm not sure either, but for example if a scientist publishes an experiment, and then another scientist with a known track record of understanding things publishes a critique, the first scientist can't respectably dismiss the critique unsubstantially.

I think:

  • there isn't consensus on what counts as a good track record of understanding things, or a good critique
    • (relatedly, there's disagreement about which epistemic norms are important)
  • A few points haven't really received a critique that interlocutors consider very substantive or clear; instead people sort of frustratedly rehash the same arguments the interlocutors found unpersuasive the first time.

And for at least some of those points, I'm personally like "my intuitions lean in the other direction from y'all Camp B people, but I don't feel like I can really confidently stand by that; I don't think the argument has been made very clearly."

Things I have in mind:

On "How Hard is Success?"

  • "How anti-natural is corrigibility?" (i.e. "sure, I see some arguments for thinking corrigibility might get hard as you dial up capabilities. But, can't we just... not dial up capabilities past that point? It seems like humans understand corrigibility pretty easily when they try, it seems like Claude-et-al currently actually understand corrigibility reasonable well and if focused on training that I don't see why it wouldn't basically work?")
  • "How likely is FOOM?" (i.e. "if I believed FOOM was very likely, I'd agree we had to be a lot more careful about ramping up capabilities and being scared the next training run would be our last. But, I don't see reason to think FOOM is particularly likely, and I see reasons to think it's not.")
  • "What capabilities are needed to make a pivotally-scary demo or game-changing coordination tech?" (i.e. you maybe don't need to actually do anything that complicated to radically change how much coordination is possible for a proper controlled takeoff)

On "How Bad is Failure?"

  • "How nice is AI likely to be?" (i.e. it really only needs to be very slightly nice to give us the solar system, and it seems weird for the niceness to be "zero")
  • "How likely is whatever ends up being created to have moral value?". (i.e. consciousness is pretty confusing, seems pretty plausible that whatever ends up getting created would at least be a pretty interesting successor species)

For all of those, like, I know the arguments against, but my own current take is not like >75% on any of these given model uncertainty, and meanwhile, if your probabilities are below 50% on the relevant MIRI-ish argument, you also have to worry about...

...

Other geopolitical concerns and considerations

  • The longer a pause goes on, the more likely it is that things get unstable and something goes wrong
  • If you think alignment isn't that hard, or that sticking to a safe-but-high power level isn't that hard, you do have to take more seriously the risk of serious misuse.
  • You might think buy-in for a serious pause or controlled takeoff is basically impossible until we have seriously scary demos, and the "race to build them, then use them to rally world leaders and then burn the lead" plan might seem better than "try to pause now."
  • The sorts of things necessary for a pause seem way more likely to go badly than well (i.e. it's basically guaranteed to create a molochian bureaucratic hellscape that stifles wide ranging innovation and makes it harder to do anything sensible with AI development)
Plan 1 and Plan 2
Raemon · 4h

I am maybe starting from the assumption that sooner or later, alignment research would reach this point, and "well, help the alignment research progress as fast as possible" seemed like a straightforward goal on the meta-level and is one of the obvious things to be shooting for whether or not it's currently tractable.

I have a current set of projects, but, the meta-level one is "look for ways to systematically improve people's ability to quickly navigate confusing technical problems, and see what works, and stack as many interventions as we can."

(I can drill into that but I'm not sure what level your skepticism was at)

leogao's Shortform
Raemon · 5h

works on safety, and because international coordination seems possible, so we need to focus on regulation and policy before ASI kills everyone

Is this actually a quadrant? I'm not sure I'm parsing what the axes are.

kave's Shortform
Raemon · 1d

FYI the particular thing I care about here is less "our usual literal frontpage criteria", and more people doing things that seem aimed at bypassing the frontpage criteria specifically for the purpose of getting attention on a promotional thing. (Which may be somewhat different from kave's take.)

Plan 1 and Plan 2
Raemon · 1d

I mean why can't the people in these LW conversations say something like "yeah the lion's share of the relevant power is held by people who don't sincerely hold A or B / Type 1/2"?

Here are some groups who I think are currently relevant (not in any particular order, and without quite knowing where I'm going with this yet):

  • Nvidia (I hadn't realized how huge they were till recently)
  • Sam Altman in particular
  • Demis Hassabis in particular
  • Dario Amodei in particular
  • OpenAI, Google, Google DeepMind, Anthropic, and maybe by now other leading labs (in slightly less particular than Sam/Demis/Dario)
  • The cluster of AI industry leaders of whom Sam/Demis/Dario are representative.
  • People at labs who are basically AI researchers (who might have at some point said the words "I do alignment research" because those are the mouthwords the culture at their company said, but weren't meaningfully involved with safety efforts)
  • Anthropic safety engineers and similar
  • Eliezer in particular
  • cluster including MIRI / Lightcone / relatively highly bought in friends
  • Oliver Habryka in particular
  • OpenPhil
  • Dustin Moskovitz in particular
  • Jaan Tallinn
  • "Constellation"
  • "AI Safety" egregore
  • "The EAgregore"
  • Future of Life Institute
  • Maybe Max Tegmark in particular, I'm not sure
  • Trump
  • MAGA
  • Elon Musk

Okay, writing that out turned out to take most of the time I felt like spending right now, but the next questions I have in mind are "who has power, here, over what?" or "what is the 'relevant' power?"

But, roughly:

a) a ton of the above are "fake" in some sense

b) On the worldscale, the OpenPhil/Constellation/Anthropic cluster is relatively weak. 

c) within OpenPhil/Constellation/Anthropic, there are people more like Dario, Holden, Jack Clark, and Dustin, and people who are more rank-and-file-EA/AI-ish. I think the latter are fake the way I think you think things are fake. I think the former are differently fake from the way I think you think things are fake.

d) there are a ton of vague EA/AI-safety people that I think are fake in the way you think they are fake, but they don't really matter except for The Median Researcher Problem

New Statement Calls For Not Building Superintelligence For Now
Raemon · 1d

I want that statement too, but that doesn't seem like this one's job. This one is for establishing common knowledge that "it'd be bad to build ASI under current conditions"; there probably wouldn't be enough consensus yet for "...and that means stop building AGI", so it wouldn't be very useful to try.

Plan 1 and Plan 2
Raemon · 1d

Yes, you have to understand that they are not doing the "have a respectable position" thing.

I think this is false for the particular people I have in mind (who to be clear are filtered for "are willing to talk to me", but, they seem like relatively central members of a significant class of people).

Maybe I'm not sure what you mean by "have a respectable position."

(I think a large chunk of the problem is "the full argument is complicated, people aren't tracking all the pieces." Which is maybe not "intellectually respectable", though IMO understandable, and importantly different from 'biased.' But, when I sit someone down and make sure to lay out all the pieces and make sure they understand each piece and understand how the pieces fit together, we still hit a few pieces where they are like 'yeah I just don't buy that.')

Maybe I should just check, are you consciously trying to deny a conflict-type stance, and consciously trying to (conflictually) assert the mistake-type stance, as a strategy?

I'm saying you seem to be conflict-stanced in a way that is inaccurate to me (i.e. you are making a mistake in your conflict)

I think it's correct to be conflict-stanced, but you need, like, a good model of who/what the enemy is ("sniper mindset"), and the words you're saying sound to me like you don't have one (in a way that seems more tribally biased than you usually seem to me)

Plan 1 and Plan 2
Raemon · 1d

A thing I've been thinking lately (this is reposted from a twitter thread where it was more squarely on-topic, but seems like a reasonable part of the convo here, riffing off the Tsvi thread)

It matters a fair amount which biases people have, here.

A few different biases pointing in the "Plan 2 for bad reasons" direction:

1. a desire for wealth
2. a desire to not look weird in front of your friends
3. a desire to "be important"
4. subtly different from #3, a desire to "have some measure of control over the big forces playing out."
5. a desire to be high status in the world's Big League status tier
6. action bias, i.e. inability to do nothing.
7. bias against abstract arguments that you can't clearly see/measure, or against sitting with confusion.
8. bias to think things are basically okay and you don't need to majorly change your life plans.
9. being annoyed at people who keep trying to stop you or make you feel bad or be lower status.
10. being annoyed at people who seem to be missing an important point when they argue with you about AI doom.

All of these seem in-play to me. But depending on these things' relative strength, they suggest different modes of dealing with the problem.

A reason I am optimistic about If Anyone Builds It is that I think it has a decent chance of changing how reasonable it feels to say "yo guys I do think we might kill everyone" in front of both your friends and high status big wigs.

This won't be sufficient to change decisionmaking at labs, or people's propensity to join labs. But I think the next biggest bias is more like "feeling important/in-control" than "having wealth."

I view this all pretty cynically. BUT, not necessarily pessimistically. If IABIED works, then, the main remaining blockers are "having an important/in-control thing to do, which groks some arguments that are more abstract."

You don't have to get rid of people's biases, or defeat them memetically (although those are both live options too). You can also steer towards a world where their biases become irrelevant.

So, while I do really wanna grab people by the collar and shout:

"Dudes, Dario is one of the most responsible parties for causing the race conditions that Anthropic uses to justify their actions, and he lied or was grossly negligent about whether Anthropic would push the capabilities frontier. If your 'well Plan 2 seems more tractable' attitude doesn't include 'also, our leader was the guy who gave the current paradigm to OpenAI, then left OpenAI, gained early resources via deception/communication-negligence and caused the current race to start in earnest' you have a missing mood and that's fucked."

...I also see part of my goal as trying to help the "real alignment work" technical field reach a point where the stuff-that-needs-doing is paradigmatic enough that you can just point at it, and the action-biased-philosophy-averse lab "safety" people can just say "oh, sure it sounds obvious when you put it like that, why didn't you say that before?"

Plan 1 and Plan 2
Raemon · 1d

I think "want to feel traction/in-control" is more obviously a bias (and people vary in whether they read to me as having this bias.). 

I think the attitude of "a position that doesn't share my core intuitions isn't a respectable position" is, well, idk, you can have that attitude if you want, but I don't think it's going to help you understand or persuade people.

There is no clear line between Type 2 and Type 3 people. It can be true both that people have earnest intellectual positions you find frustrating (where it's fundamentally an intellectual disagreement) and that they have biases which you-and-they would both agree would be bad, and the share of causal impact between the intellectual positions and the biases can range from like 99% to 1% in either direction.

Even among people who do seem to have followed the entire Alignment-Is-Hard arguments and understand them all, a bunch just say "yeah I don't buy that as obvious" to stuff that seems obvious to me (or, in some cases, that seems obviously '>50% likely' to me). And they seem sincere to me.

This results in sad conversations where you're like 'but, clearly, I am picking up that you've got biased cope in you!' and they're like 'but, clearly, I can tell that my thinking here is not just cope, I know what my copey-thinking feels like, and the particular argument we're talking about doesn't feel like that.' (And both are correct – there was cope, but it lay elsewhere.)

Sequences

  • Step by Step Metacognition
  • Feedbackloop-First Rationality
  • The Coordination Frontier
  • Privacy Practices
  • Keep your beliefs cruxy and your frames explicit
  • LW Open Source Guide
  • Tensions in Truthseeking
  • Project Hufflepuff
  • Rational Ritual
Posts

  • Early stage goal-directedness (15 points, 4d, 8 comments)
  • "Intelligence" -> "Relentless, Creative Resourcefulness" (71 points, 19d, 28 comments)
  • Nice-ish, smooth takeoff (with imperfect safeguards) probably kills most "classic humans" in a few decades. (154 points, 23d, 19 comments)
  • </rant> </uncharitable> </psychologizing> (56 points, 24d, 13 comments)
  • Why Corrigibility is Hard and Important (i.e. "Whence the high MIRI confidence in alignment difficulty?") (83 points, 1mo, 52 comments)
  • The Illustrated Petrov Day Ceremony (93 points, 1mo, 11 comments)
  • "Shut It Down" is simpler than "Controlled Takeoff" (102 points, 1mo, 29 comments)
  • Accelerando as a "Slow, Reasonably Nice Takeoff" Story (72 points, 1mo, 20 comments)
  • The title is reasonable (196 points, 1mo, 128 comments)
  • Meetup Month (45 points, 1mo, 10 comments)
Wikitag Contributions

  • AI Consciousness (2 months ago)
  • AI Auditing (3 months ago, +25)
  • AI Auditing (3 months ago)
  • Guide to the LessWrong Editor (6 months ago)
  • Guide to the LessWrong Editor (6 months ago)
  • Guide to the LessWrong Editor (6 months ago)
  • Guide to the LessWrong Editor (6 months ago, +317)
  • Sandbagging (AI) (7 months ago)
  • Sandbagging (AI) (7 months ago, +88)
  • AI "Agent" Scaffolds (7 months ago)