Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
It did cause my probability to go from 20% to 80%, so it definitely helped!
PLEASE PLEASE PLEASE stop being paranoid about hyperstition. It's fine, it almost never happens. Most things happen for boring reasons, not because of some weird self-fulfilling prophecy. Hyperstition is rare and weird and usually not a real concern. If bad futures are likely, say that. If bad futures are unlikely, say that. Do not worry too much about how much your prediction will shift the outcome; it very rarely does, and the anxiety about whether it does is not actually making anything better.
I am genuinely uncertain whether this is a joke.
We do happen to have had the great Air Conditioner War of 2022: https://www.lesswrong.com/posts/MMAK6eeMCH3JGuqeZ/everything-i-need-to-know-about-takeoff-speeds-i-learned
I'm worried not because of risk from current models, but because it's a bad sign about noticing risks when warning signs appear, being honest about risk/safety even when it makes you look bad, etc.
I agree that this is the key dimension, but I don't currently think RSPs are a great vehicle for that. Indeed, looking at the regulatory advocacy of a company seems like a much better indicator, since I expect that to have a bigger effect on the conversation about risk/safety than the RSP and eval results (though it's not overwhelmingly clear to me).
And again, many RSPs and eval results seem to me to be active propaganda, and so are harmful on this dimension, and it's better to do nothing than to be harmful in this way (though I agree that if xAI said they would do a thing and then didn't, then that is quite bad).
I guess your belief "no actions that seem at all plausible for any current AI company to take have really any chance of making it so that it's non-catastrophic for them to develop and deploy systems much smarter than humans" is a crux; I disagree, and so I care about marginal differences in risk-preparedness.
Makes sense. I am not overwhelmingly confident there isn't something control-esque to be done here, though that's the only real candidate I have, and it's not currently clear to me whether current safety evals make control interventions easier or harder, or even more likely to be implemented. For example, my sense is that having models be trained for harmlessness makes them worse for control interventions; you would much rather have pure helpful + honest models.
I'm surprised if you think that variance in regulatory outcomes is not just more important than variance in what-a-company-does outcomes but also sufficiently tractable for the marginal company that it's the key question.
Huh, I am not sure what you mean. I am surprised if you think that I think that marginal frontier-lab safety research is making progress on the hard parts of the alignment problem. I've been pretty open that I think valuable progress on that dimension has been close to zero.
This doesn't mean the actions of an AI company do not matter, but I do think that no actions that seem at all plausible for any current AI company to take have really any chance of making it so that it's non-catastrophic for them to develop and deploy systems much smarter than humans. So the key dimension is how much the labs' actions might prevent smarter-than-human systems from being deployed in the near future.
I think there are roughly two dimensions of variance where AI lab behavior has a big effect on that:
I think RSPs can be helpful inasmuch as they create common knowledge of risks. No current company's RSP commits the company to stopping or slowing down if risks get too high, and so on the company-policy level they seem largely powerless. I also think many RSPs and Risk Management Frameworks are active safety washing, written with an intent to actively confuse people about the risks from AI, so marginally less of that is often good (though I do think there are real teeth to some evals and RSPs, but definitely not all).
If a company says it thinks a model is safe on the basis of eval results
All current models are safe. No strongly superhuman future models are safe. They will stay safe until they non-trivially exceed top human performance. There, I did it.
Like, this is of course tongue-in-cheek, but I think it's really important to not pretend that current safety evals are measuring the safety or risks of present systems. There is approximately no variance in how dangerous systems from different AI companies of the same capability level are. The key question, according to my models, is when these systems will become capable of disempowering humanity. None of these systems are anywhere remotely close to being aligned enough to take reasonable actions if they ever get into a position of disempowering humanity. The only evals that matter are the capability evals. No mitigations have ever helped with the fundamental risk model at all (I think there is value in reducing misuse risk, but misuse risk isn't going to kill everyone, and so is many orders of magnitude less important than accident risk[1]).
Like, there is some important interplay between people racing and misuse risk, though I think stuff like bioterrorism evals do not really measure that either. There is also some interesting thinking to be done about the security requirements of RSPs, though the sign here is confusing and unclear to me, but it could be a large effect size.
FWIW, I think the key question is to understand the regulatory demands that xAI is making. It's not like the RSPs or safety evaluations will really tell anyone that much new. Indeed, it seems very sad to evaluate the quality of a frontier company's safety work on the basis of the pretty fake-seeming RSPs and Risk Management Frameworks that other companies have created, which seem pretty powerless. Clearly if one was thinking from first principles about what a responsible company should do, those are not the relevant dimensions.
I don't know what the current lobbying and advocacy efforts of xAI are, but if they are absent then it seems to me like they are messing up less than e.g. OpenAI, and if they are calling for real and weighty regulations (as at least Elon has done in the past, though he seems to have changed his tune recently), then that seems like it would matter more.
Edit: To be clear, a thing I do think really matters is keeping your commitments, even if they committed you to doing things I don't think are super important. So on this dimension, xAI does seem like it messed up pretty badly, given this:
"We plan to release an updated version of this policy within three months" but it was published on Feb 10, over five months ago"
Welcome! Glad to have you around and hope you have a good time!
This comment too is not fit for this site. What is going on with y'all? Why is fertility such a weirdly mindkilling issue? Please don't presume your theory to be true, try to highlight cruxes, try to summon up at least a bit of curiosity about your interlocutors, all the usual things.
Like, it's fine to have a personally confident take on the causes of low fertility in western countries, but man, you can't just treat your personal confidence as shared by and obvious to everyone else, at least in this way.
What... is going on in this comment? It has so much snark, and so my guess is that it's downstream of some culture war gremlins. Please don't leave comments like this.
The basic observation that status might be a kind of conserved quantity, and that as such, in order to advocate for raising the status of one thing you also need to be transparent about which things you would feel comfortable lowering in status, is a fine one, but this isn't the way to communicate that observation.
Thanks for the follow-up! I talked with Scott about LW moderation a long time ago (my guess is around 2019) and Said's name came up then. My guess is he doesn't remember. It wasn't an incredibly intense mention, but we were talking about what makes LW comment sections good or bad, and he was a commenter we discussed in that conversation in 2019 or so.
I think you can clearly see how the Jacob Falkovich one is complicated. He basically says "I used to be frustrated by you, but this thing made that a lot better". I don't remember the exact time I talked to Jacob about it, but it had come up at some point in some context where we discussed LW comment sections. It's plausible to me it was before he made this comment, though it would be a bit surprising to me, since that's pretty early in LW's history.