There are all kinds of benefits to acting in good faith, and people should not feel licensed to abandon good-faith dialogue just because they're SUPER confident and this issue is REALLY IMPORTANT.
When something is really serious it becomes even more important to do boring +EV things like "remember that you can be wrong sometimes" and "don't take people's quotes out of context, misrepresent their position, and run smear campaigns on them; and definitely don't make that your primary contribution to the conversation".
Like, for Connor & people who ...
I don't expect most people to agree with that point, but I do believe it. It ends up depending on a lot of premises, so expanding on my view there in full would be a whole post of its own. But to try to give a short version:
There are a lot of specific reasons I think having people working in AI capabilities is so strongly +EV. But I don't expect people to agree with those specific views. The reason I think it's obvious is that even when I make massive concessions to the anti-capabilities people, these organizations... still seem +EV? Let's make a bun...
Yeah, fair enough.
But I don't think that would be a sensible position. The correct counterfactual is in fact the one where Google Brain, Meta, and NVIDIA led the field. Like, if DM + OpenAI + Anthropic didn't exist - something he has publicly wished for - that is the most likely situation we would find ourselves in. We certainly wouldn't find CEOs who advocate for a total stop on AI.
(Ninth, I am aware of the irony of calling for more civil discourse in a highly inflammatory comment. Mea culpa.)
I believe your model of AI risk is wrong, and you have abandoned the niceness/civilization norms that act to protect you from the downsides of having false beliefs and help you navigate your way out of them. When people explain why they disagree with you, you accuse them of lying for personal gain rather than engaging with their arguments deeply enough to find your way out of the hole you're in.
Eighth, yes, working in AI capabilities is absolutely a reasonable alignment plan, and it raises the odds of success immensely. I know, you're so overconfident on this point that even reading this will trigger you to dismiss my comment. And yet it's still true - and what's more, obviously so. I don't know how you and others egged each other into the position that it doesn't matter whether the people working on AI care about AI risk, but it's insane.
I agreed with most of your comment until this line. Is your argument that there's a lot of nuance to getting saf...
First, this is a minor point where you're wrong, but it's also a sufficiently obvious point that it should hopefully make clear how wrong your world model is: the AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I'd like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bri...
This post is fun but I think it's worth pointing out that basically nothing in it is true.
-"Clown attacks" are not a common or particularly effective form of persuasion
-They are certainly not a zero day exploit; having a low status person say X because you don't want people to believe X has been available to humans for our entire evolutionary history
-Zero day exploits in general are not a thing you have to worry about; it isn't an analogy that applies to humans because we're far more robust than software. A zero day exploit on an operating system can give ...
> Zero-day exploits in general are not a thing you have to worry about; it isn't an analogy that applies to humans because we're far more robust than software. A zero-day exploit on an operating system can give you total control of it; a 'zero-day exploit' like junk food can make you consume 5% more calories per day than you otherwise would.
The "just five percent more calories" example reveals nicely how meaningless this heuristic is. The vast majority of people alive today are the effective mental subjects of some religion, political party, national identi...
Seems like we mostly agree and our difference comes down to timelines. I agree the effect is more of a long-term one, although I wouldn't say decades. OpenAI was founded in 2015 and raised the profile of AI risk in 2022, so in the counterfactual case where Sam Altman was dissuaded from founding OpenAI due to timeline concerns, AI risk would have much lower public credibility less than a decade later.
Public recognition as a researcher does seem to favour longer periods of time, though: the biggest names are all people who've been in the field for multiple decades, so you have a point there.
I think we're talking past each other a bit. I'm saying that people sympathetic to AI risk will be discouraged from publishing AI capability work, and publishing AI capability work is exactly why Stuart Russell and Yoshua Bengio have credibility. Because publishing AI capability work is so strongly discouraged, any new professors of AI will to some degree be selected for not caring about AI risk, which was not the case when Russell or Bengio entered the field.
The focus of the piece is on the cost of various methods taken to slow down AI timelines, with the thesis being that across a wide variety of different beliefs about the merit of slowing down AI, these costs aren't worth it. I don't think it's confused to be agnostic about the merits of slowing down AI when the tradeoffs being taken are this bad.
Views on the merit of slowing down AI will be highly variable from person to person and will depend on a lot of extremely difficult and debatable premises that are nevertheless easy to have an opinion on...
> I just think it’s extraordinarily important to be doing things on a case-by-case basis here. Like, let’s say I want to work at OpenAI, with the idea that I’m going to advocate for safety-promoting causes, and take actions that are minimally bad for timelines.
Notice that this phrasing treats AI safety and AI timelines as two equal concerns that are worth trading off against each other. I don't think they are equal, and I think most people would have far better impact if they completely struck "I'm worried this will advance timelines" from their think...
Capabilities withdrawal doesn't just mean fewer people worried about AI risk at top labs - it also means fewer Stuart Russells and Yoshua Bengios. In fact, publishing AI research is seen as worse than working in a lab that keeps its work private.
My specific view:
Is the overall karma for this mostly just people boosting it for visibility? Because I don't see how this would be a quality comment by any other standard.
So if I'm understanding you correctly, capabilities and safety are highly correlated, and there can't be situations where capabilities and alignment decouple.
Not that far; more like they don't decouple until more progress has been made. Pure alignment is an advanced subtopic of AI research that requires a certain amount of prior progress before it's a viable field.
I'm not super confident in the above and wouldn't discourage people from doing alignment work now (plus the obvious nuance that it's not one big lump, there are some things that can be done later an...
I'm very unconfident in the following but, to sketch my intuition:
I don't really agree with the idea of serial alignment progress that is independent from capability progress. This is what I was trying to get at with:

> "AI capabilities" and "AI alignment" are highly related to each other, and "AI capabilities" has to come first in that alignment assumes that there is a system to align.
By analogy, nuclear fusion safety research is inextricable from nuclear fusion capability research.
When I try to think of ways to align AI my mind points towards questions like ...
Right, I specifically think that someone would be best served by trying to think of ways to get a SOTA result on an Atari benchmark, not simply reading up on past results (although you'd want to do that as part of your attempt). There's a huge difference between reading about what's worked in the past and trying to think of new things that could work and then trying them out to see if they do.
As I've learned more about deep learning and tried to understand the material, I've constantly had ideas that I think could improve things. Then I've tried them out, ...
"AI capabilities" and "AI alignment" are highly related to each other, and "AI capabilities" has to come first in that alignment assumes that there is a system to align. I agree that for people on the cutting edge of research like OpenAI, it would be a good idea for at least some of them to start thinking deeply about alignment instead. There's two reasons for this:
1) OpenAI is actually likely to advance capabilities a pretty significant amount, and
2) Due to their expertise that they've developed from working on AI capabilities, they're much more lik...
If I'm understanding this one right, OpenAI did something similar to this for purely pragmatic reasons with VPT, a Minecraft agent. They first trained a "foundation model" to imitate...
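For anyone who hasn't seen the pattern, here's a rough sketch of that "imitate first, fine-tune later" pipeline in generic PyTorch. Every name here is hypothetical; this is the general shape of the approach under my assumptions, not VPT's actual code:

```python
# Generic sketch of the two-stage pattern: behavioral-cloning pretraining,
# then task-specific fine-tuning. All names are hypothetical; this is the
# general shape of the approach, not VPT's actual implementation.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim=128, n_actions=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)  # action logits

def pretrain_on_demos(policy, demo_batches, lr=1e-4):
    """Stage 1: imitate (observation, action) pairs from demonstrations."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for obs, actions in demo_batches:
        opt.zero_grad()
        loss_fn(policy(obs), actions).backward()
        opt.step()
    return policy

# Stage 2 would then fine-tune the pretrained policy with RL (e.g. PPO)
# on the target task, starting from the imitation weights rather than
# from scratch.
```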
Nice post! I think these are good criticisms, but they don't justify the title. Points 1 through 4 are all (specific, plausible) examples of ways we may interpret the activation space incorrectly. This is worth keeping in mind, and I agree that just looking at the activation space of a single layer isn't enough, but it still seems like a very good place to start.
A layer's activations form a relatively simple space, constructed by the model, that contains all the information the model needs to make its prediction. This makes it a great place to look if you're trying to understand how the model is thinking.
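To make that concrete, here's a minimal sketch of what "looking at a layer's activations" can mean in practice. The model (GPT-2 via the transformers library) and the choice of layer are illustrative assumptions on my part, not anything from the post:

```python
# Minimal sketch of inspecting one layer's activations with a forward hook.
# The model (GPT-2) and the layer index are arbitrary illustrative choices.
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

captured = {}

def hook(module, inputs, output):
    # For a GPT-2 block, output[0] is the hidden-state tensor
    # of shape (batch, seq_len, hidden_dim).
    captured["acts"] = output[0].detach()

# Register on a middle transformer block (layer 6 of 12, arbitrary choice).
handle = model.h[6].register_forward_hook(hook)

with torch.no_grad():
    ids = tokenizer("The capital of France is", return_tensors="pt")
    model(**ids)

handle.remove()
print(captured["acts"].shape)  # e.g. torch.Size([1, 5, 768])
```

From here you can probe, cluster, or visualize `captured["acts"]`; the point of the comment above is just that this single tensor already summarizes everything the later layers get to use.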