
habryka

Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com. 

(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)

Comments (sorted by newest)
[Meta] New moderation tools and moderation guidelines
habryka · 10h*

Thanks for the follow-up! I talked with Scott about LW moderation a long time ago (my guess is around 2019) and Said's name came up then. My guess is he doesn't remember. It wasn't an incredibly intense mention, but we were talking about what makes LW comment sections good or bad, and he was one of the commenters we discussed in that conversation.

I think you can clearly see how the Jacob Falkovich one is complicated. He basically says "I used to be frustrated by you, but this thing made that a lot better". I don't remember the exact time I talked to Jacob about it, but it came up at some point in a context where we were discussing LW comment sections. It's plausible to me it was before he made this comment, though that would be a bit surprising to me, since that's pretty early in LW's history.

tlevin's Shortform
habryka · 1d

It did cause my probability to go from 20% to 80%, so it definitely helped! 

If Anyone Builds It, Everyone Dies: Advertisement design competition
habryka · 1d

PLEASE PLEASE PLEASE stop being paranoid about hyperstition. It's fine, it almost never happens. Most things happen for boring reasons, not because of some weird self-fulfilling prophecy. Hyperstition is rare and weird and usually not a real concern. If bad futures are likely, say that. If bad futures are unlikely, say that. Do not worry too much about how much your prediction will shift the outcome; it very rarely does, and the anxiety about whether it will is not actually making anything better.

tlevin's Shortform
habryka · 1d*

I am genuinely uncertain whether this is a joke. 

We do happen to have had the great Air Conditioner War of 2022: https://www.lesswrong.com/posts/MMAK6eeMCH3JGuqeZ/everything-i-need-to-know-about-takeoff-speeds-i-learned

Zach Stein-Perlman's Shortform
habryka · 2d

"I'm [not] worried about risk from current models, but [I care] because it's a bad sign about noticing risks when warning signs appear, being honest about risk/safety even when it makes you look bad, etc."

I agree that this is the key dimension, but I don't currently think RSPs are a great vehicle for that. Indeed, looking at the regulatory advocacy of a company seems like a much better indicator, since I expect that to have a bigger effect on the conversation about risk/safety than the RSP and eval results (though it's not overwhelmingly clear to me). 

And again, many RSPs and eval results seem to me to be active propaganda, and so are harmful on this dimension, and it's better to do nothing than to be harmful in this way (though I agree that if xAI said they would do a thing and then didn't, that is quite bad).

"I guess your belief 'no actions that seem at all plausible for any current AI company to take have really any chance of making it so that it's non-catastrophic for them to develop and deploy systems much smarter than humans' is a crux; I disagree, and so I care about marginal differences in risk-preparedness."

Makes sense. I am not overwhelmingly confident there isn't something control-esque to be done here, though that's the only real candidate I have, and it's not currently clear to me that current safety evals positively correlate with control interventions being easier or harder, or even more likely to be implemented. For example, my sense is that having models be trained for harmlessness makes them worse for control interventions; you would much rather have purely helpful + honest models.

Zach Stein-Perlman's Shortform
habryka · 2d

"I'm surprised if you think that variance in regulatory outcomes is not just more important than variance in what-a-company-does outcomes but also sufficiently tractable for the marginal company that it's the key question."

Huh, I am not sure what you mean. I am surprised if you think that I think that marginal frontier-lab safety research is making progress on the hard parts of the alignment problem. I've been pretty open that I think valuable progress on that dimension has been close to zero. 

This doesn't mean the actions of an AI company do not matter, but I do think that no actions that seem at all plausible for any current AI company to take have really any chance of making it so that it's non-catastrophic for them to develop and deploy systems much smarter than humans. So the key dimension is how much the labs' actions might prevent smarter-than-human systems from being deployed in the near future.

I think there are roughly two dimensions of variance where I do think AI lab behavior has a big effect on that: 

  • Do they advocate for reasonable regulatory action and speak openly about the catastrophic risk from AI?
    • Of the people at leading labs, Elon has at least historically been the best on the latter; I don't know about the former.
  • Do they have any plans that involve leveraging AI systems to improve the degree to which humanity can coordinate to not build superhuman systems in the near future?
    • xAI is in a very promising position here, and deploying Grok on Twitter (barring things like the MechaHitler incident) and generally leveraging Twitter seems like it could be really good. But Elon making Grok sycophantic to him seems quite bad for trust, so the sign here is unclear.

I think RSPs can be helpful insofar as they create common knowledge of risks. No current company's RSP commits the company to stopping or slowing down if risks get too high, so at the company-policy level they seem largely powerless. I also think many RSPs and Risk Management Frameworks are active safety-washing, written with an intent to actively confuse people about the risks from AI, so marginally less of that is often good (though I do think there are real teeth to some evals and RSPs, but definitely not all).

"If a company says it thinks a model is safe on the basis of eval results"

All current models are safe. No strongly superhuman future models are safe. They will stay safe until they non-trivially exceed top human performance. There, I did it. 

Like, this is of course tongue-in-cheek, but I think it's really important to not pretend that current safety evals are measuring the safety or risks of present systems. There is approximately no variance in how dangerous systems from different AI companies of the same capability level are. The key question, according to my models, is when these systems will become capable of disempowering humanity. None of these systems are anywhere remotely close to being aligned enough to take reasonable actions if they ever get into a position to disempower humanity. The only evals that matter are the capability evals. No mitigations have ever helped with the fundamental risk model at all (I think there is value in reducing misuse risk, but misuse risk isn't going to kill everyone, and so is many orders of magnitude less important than accident risk[1]).

  1. ^

    Like, there is some important interplay between people racing and misuse risk, though I think stuff like bioterrorism evals does not really measure that either. There is also some interesting thinking to be done about the security requirements of RSPs, though the sign here is confusing and unclear to me; it could be a large effect size.

Zach Stein-Perlman's Shortform
habryka · 2d

FWIW, I think the key question is to understand the regulatory demands that xAI is making. It's not like the RSPs or safety evaluations will really tell anyone that much that's new. Indeed, it seems very sad to evaluate the quality of a frontier company's safety work on the basis of the pretty fake-seeming RSPs and Risk Management Frameworks that other companies have created, which seem pretty powerless. Clearly, if one were thinking from first principles about what a responsible company should do, those are not the relevant dimensions.

I don't know what the current lobbying and advocacy efforts of xAI are, but if they are absent, then it seems to me like they are messing up less than e.g. OpenAI, and if they are calling for real and weighty regulations (as at least Elon has done in the past, though he seems to have changed his tune recently), then that seems like it would matter more.

Edit: To be clear, a thing I do think really matters is keeping your commitments, even if those commitments are to things I don't think are super important. So on this dimension, xAI does seem like it messed up pretty badly, given this:

"‬We plan to release an updated version of this policy within three months" but it was published on Feb 10, over five months ago"

Noah Weinberger's Shortform
habryka · 2d

Welcome! Glad to have you around and hope you have a good time!

Annapurna's Shortform
habryka · 2d · Moderator Comment

This comment too is not fit for this site. What is going on with y'all? Why is fertility such a weirdly mindkilling issue? Please don't presume your theory to be true, try to highlight cruxes, try to summon up at least a bit of curiosity about your interlocutors, all the usual things.

Like, it's fine to have a personally confident take on the causes of low fertility in western countries, but man, you can't just treat your personal confidence as shared by and obvious to everyone else, at least not in this way.

Annapurna's Shortform
habryka · 2d · Moderator Comment

What... is going on in this comment? It has so much snark, and so my guess is it's downstream of some culture-war gremlins. Please don't leave comments like this.

The basic observation is fine: status might be a kind of conserved quantity, and as such, in order to advocate for raising the status of one thing, you also need to be transparent about which things you would feel comfortable lowering in status. But this isn't the way to communicate that observation.

Sequences

  • A Moderate Update to your Artificial Priors
  • A Moderate Update to your Organic Priors
  • Concepts in formal epistemology

Wikitag Contributions

  • Roko's Basilisk (6d)
  • AI Psychology (6mo, +58/-28)

Posts

  • 56 · Habryka's Shortform Feed · Ω · 6y · 436
  • 87 · Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity · 2d · 28
  • 20 · Open Thread - Summer 2025 · 19d · 15
  • 91 · ASI existential risk: Reconsidering Alignment as a Goal · 3mo · 14
  • 346 · LessWrong has been acquired by EA · 3mo · 53
  • 77 · 2025 Prediction Thread · 6mo · 21
  • 23 · Open Thread Winter 2024/2025 · 7mo · 60
  • 45 · The Deep Lore of LightHaven, with Oliver Habryka (TBC episode 228) · 7mo · 4
  • 36 · Announcing the Q1 2025 Long-Term Future Fund grant round · 7mo · 2
  • 112 · Sorry for the downtime, looks like we got DDosd · 7mo · 13
  • 610 · (The) Lightcone is nothing without its people: LW + Lighthaven's big fundraiser · 7mo · 270