All of 307th's Comments + Replies

307th61

Nice post! I think these are good criticisms that don't justify the title. Points 1 through 4 are all (specific, plausible) examples of ways we may interpret the activation space incorrectly. This is worth keeping in mind, and I agree that just looking at the activation space of a single layer isn't enough, but it still seems like a very good place to start. 

 

A layer's activation is a relatively simple space, constructed by the model, that contains all the information that the model needs to make its prediction. This makes it a great place to look if you're trying to understand how the model is thinking.
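
As a concrete illustration of what "looking at the activation space of a single layer" can mean in practice, here is a minimal sketch (my own, not from the post) that captures one layer's activations with a PyTorch forward hook; the choice of gpt2 and of layer 6 is purely illustrative.

```python
# Minimal sketch (illustrative, not from the post): capture the activations of a
# single transformer layer with a PyTorch forward hook so they can be inspected.
# The model ("gpt2") and the layer index are arbitrary choices for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

captured = {}

def save_activations(module, inputs, output):
    # For a GPT-2 block, output[0] is the hidden-state tensor
    # of shape (batch, sequence_length, hidden_dim).
    captured["acts"] = output[0].detach()

layer_index = 6  # which layer to inspect is a judgment call
handle = model.h[layer_index].register_forward_hook(save_activations)

with torch.no_grad():
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    model(**inputs)

handle.remove()
print(captured["acts"].shape)  # one activation vector per token at this layer
```

From there you can probe, cluster, or visualize those per-token vectors; the point above is that this single space already carries everything the model uses for its prediction.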

307th1815

There are all kinds of benefits to acting in good faith, and people should not feel licensed to abandon good-faith dialogue just because they're SUPER confident and this issue is REALLY IMPORTANT.

When something is really serious it becomes even more important to do boring +EV things like "remember that you can be wrong sometimes" and "don't take people's quotes out of context, misrepresent their position, and run smear campaigns on them; and definitely don't make that your primary contribution to the conversation".

Like, for Connor & people who ... (read more)

307th40

I don't expect most people to agree with that point, but I do believe it. It ends up depending on a lot of premises, so expanding on my view there in full would be a whole post of its own. But to try to give a short version: 

There are a lot of specific reasons I think having people working in AI capabilities is so strongly +EV. But I don't expect people to agree with those specific views. The reason I think it's obvious is that even when I make massive concessions to the anti-capabilities people, these organizations... still seem +EV? Let's make a bun... (read more)

3rotatingpaguro
I have the impression that the big guys started taking AI risk seriously when they saw capabilities that impressed them. So I expect that if Musk, Altman & the rest of the Dreamgrove had not embarked on pushing the frontier faster than it was otherwise moving, AI researchers would have taken the risk just as seriously once the same capability level was reached. Famous AI scientists already knew about the AI risk arguments; where OpenAI made a difference was not in telling them about AI risk, but in shoving GPT up their noses. I think the public would then have been able to side with Distinguished Serious People raising warnings about the dangers of ultra-intelligent machines even if Big Corp claimed otherwise.
307th1414

Yeah, fair enough.

But I don't think that would be a sensible position. The correct counterfactual is in fact the one where Google Brain, Meta, and NVIDIA led the field. Like, if DM + OpenAI + Anthropic didn't exist - something he has publicly wished for - that is in fact the most likely situation we would find. We certainly wouldn't find CEOs who advocate for a total stop on AI.

307th21

(Ninth, I am aware of the irony of calling for more civil discourse in a highly inflammatory comment. Mea culpa)

307th6719

I believe you're wrong in your model of AI risk, and that you have abandoned the niceness/civilization norms that act to protect you from the downside of having false beliefs and help you navigate your way out of them. When people explain why they disagree with you, you accuse them of lying for personal gain rather than reflecting on their arguments deeply enough to find your way out of the hole you're in.

First, this is a minor point where you're wrong, but it's also a sufficiently obvious point that it should hopefully make clear how wrong your world mo... (read more)


Eighth, yes, working in AI capabilities is absolutely a reasonable alignment plan that raises odds of success immensely. I know, you're so overconfident on this point that even reading this will trigger you to dismiss my comment. And yet it's still true - and what's more, obviously so. I don't know how you and others egged each other into the position that it doesn't matter whether the people working on AI care about AI risk, but it's insane.

I agreed with most of your comment until this line. Is your argument that there's a lot of nuance to getting saf... (read more)

First, this is a minor point where you're wrong, but it's also a sufficiently obvious point that it should hopefully make clear how wrong your world model is: the AI safety community in general, and DeepMind + Anthropic + OpenAI in particular, have all made your job FAR easier. This should be extremely obvious upon reflection, so I'd like you to ask yourself how on earth you ever thought otherwise. CEOs of leading AI companies publicly acknowledging AI risk has been absolutely massive for public awareness of AI risk and its credibility. You regularly bri

... (read more)
2307th
(Ninth, I am aware of the irony of calling for more civil discourse in a highly inflammatory comment. Mea culpa)
307th337

This post is fun but I think it's worth pointing out that basically nothing in it is true.

-"Clown attacks" are not a common or particularly effective form of persuasion
-They are certainly not a zero day exploit; getting a low-status person to say X, so that people won't believe X, is a tactic that has been available to humans for our entire evolutionary history
-Zero day exploits in general are not a thing you have to worry about; it isn't an analogy that applies to humans because we're far more robust than software. A zero day exploit on an operating system can give ... (read more)

lc*1116

Zero day exploits in general are not a thing you have to worry about; it isn't an analogy that applies to humans because we're far more robust than software. A zero day exploit on an operating system can give you total control of it; a 'zero day exploit' like junk food can make you consume 5% more calories per day than you otherwise do.

The "just five percent more calories" example reveals nicely how meaningless this heuristic is. The vast majority of people alive today are the effective mental subjects of some religion, political party, national identi... (read more)

trevor110
  1. Yes they are: clown attacks are an incredibly powerful and flexible form of Overton window manipulation. They can even become a self-fulfilling prophecy by selectively sorting domains of thought among winners and losers in real life, e.g. only losers think about the lab leak hypothesis.
  2. It's a zero-day exploit because it's a flaw in the human brain that modern systems are extremely capable of utilizing to steer people's thinking without their knowledge (in this case, denial of certain lines of cognition). You're right that it's not new enough to count days, lik
... (read more)
307th30

Seems like we mostly agree and our difference is based on timelines. I agree the effect is more of a long-term one, although I wouldn't say decades. OpenAI was founded in 2015 and raised the profile of AI risk in 2022, so in the counterfactual case where Sam Altman was dissuaded from founding OpenAI due to timeline concerns, AI risk would have had much lower public credibility less than a decade later.

Public recognition as a researcher does seem to favour longer periods of time, though: the biggest names are all people who've been in the field for multiple decades, so you have a point there.

307th12

I think we're talking past each other a bit. I'm saying that people sympathetic to AI risk will be discouraged from publishing AI capability work, and publishing AI capability work is exactly why Stuart Russell and Yoshua Bengio have credibility. Because publishing AI capability work is so strongly discouraged, any new professors of AI will to some degree be selected for not caring about AI risk, which was not the case when Russell or Bengio entered the field.

7habryka
I agree that this is a concern that hypothetically could make a difference, but as I said in my other comment, we are likely to alienate many of the best people by doing means-end reasoning like this (including people like Stuart and Yoshua), and also, this seems like a very slow process that would take decades to have a large effect, and my timelines are not that long.
2habryka
Stuart and Yoshua seem to be welcomed into the field just fine, and their stature as respected people on the topic of existential risk seems to be in good shape, and I don't expect that to change on the relevant timescales. I think people talking openly about the danger and harm caused by developing cutting edge systems is exactly the kind of thing that made them engage with the field, and a field that didn't straightforwardly recognize and try to hold the people who are causing the harm accountable would have been much less likely to get at least Stuart involved (I know less about Yoshua). Stuart himself is one of the people who is harshest about people doing dangerous research, and who is most strongly calling for pretty hard accountability.
307th10

The focus of the piece is on the cost of various methods taken to slow down AI timelines, with the thesis being that across a wide variety of different beliefs about the merit of slowing down AI, these costs aren't worth it. I don't think it's confused to be agnostic about the merits of slowing down AI when the tradeoffs being taken are this bad. 

Views on the merit of slowing down AI will be highly variable from person to person and will depend on a lot of extremely difficult and debatable premises that are nevertheless easy to have an opinion on... (read more)

307th54

> I just think it’s extraordinarily important to be doing things on a case-by-case basis here. Like, let’s say I want to work at OpenAI, with the idea that I’m going to advocate for safety-promoting causes, and take actions that are minimally bad for timelines. 

Notice that this is phrasing AI safety and AI timelines as two equal concerns that are worth trading off against each other. I don't think they are equal, and I think most people would have far better impact if they completely struck "I'm worried this will advance timelines" from their think... (read more)

6Steven Byrnes
This seems confused in many respects. AI safety is the thing I care about. I think AI timelines are a factor contributing to AI safety, via having more time to do AI safety technical research, and maybe also other things like getting better AI-related governance and institutions.

You’re welcome to argue that shorter AI timelines other things equal do not make safe & beneficial AGI less likely—i.e., you can argue for: “Shortening AI timelines should be excluded from cost-benefit analysis because it is not a cost in the first place.” Some people believe that, although I happen to strongly disagree. Is that what you believe? If so, I’m confused. You should have just said it directly. It would make almost everything in this OP besides the point, right? I understood this OP to be taking the perspective that shortening AI timelines is bad, but the benefits of doing so greatly outweigh the costs, and the OP is mainly listing out various benefits of being willing to shorten timelines.

Putting that aside, “two equal concerns” is a strange phrasing. The whole idea of cost-benefit analysis is that the costs and benefits are generally not equal, and we’re trying to figure out which one is bigger (in the context of the decision in question).

If someone thinks that shortening AI timelines is bad, then I think they shouldn’t and won’t ignore that. If they estimate that, in a particular decision, they’re shortening AI timelines infinitesimally, in exchange for a much larger benefit, then they shouldn’t ignore that either. I think “shortening AI timelines is bad but you should completely ignore that fact in all your actions” is a really bad plan. Not all timeline-shortening actions have infinitesimal consequence, and not all associated safety benefits are much larger. In some cases it’s the other way around—massive timeline-shortening for infinitesimal benefit. You won’t know which it is in a particular circumstance if you declare a priori that you’re not going to think about it i
307th10

Capabilities withdrawal doesn't just mean fewer people worried about AI risk at top labs - it also means fewer Stuart Russells and Yoshua Bengios. In fact, publishing AI research is seen as worse than working in a lab that keeps its work private.

4habryka
Then I think you are arguing against a straw man. I don't know anyone who argued against doing the kind of research or writing that seems to have been compelling to Stuart or Yoshua. As far as I can tell, Stuart read a bunch of Bostrom and Eliezer stuff almost a decade ago, and Yoshua did the same and also has a bunch of grad students who did very clearly safety research with few capability externalities.
307th*20-3

My specific view: 

  • OpenAI's approach seems most promising to me
  • Alignment work will look a lot like regular AI work; it is unlikely that someone trying to theorize about how to solve alignment, separate from any particular AI system that they are trying to align, will see success.
  • Takeoff speed is more important than timelines. The ideal scenario is one where compute is the bottleneck and we figure out how to build AGI well before we have enough compute to build it, because this allows us to experiment with subhuman AGI systems.
  • Slow takeoff is pretty lik
... (read more)
0Mitchell_Porter
Do you understand the nature of consciousness? Do you know the nature of right and wrong? Do you know how an AI would be able to figure out these things? Do you think a superintelligent AI can be "aligned" without knowing these things? 
307th1624

Is the overall karma for this mostly just people boosting it for visibility? Because I don't see how this would be a quality comment by any other standards.

Frontpage comment guidelines:

  • Maybe try reading the post
2Garrett Baker
LessWrong gives those with higher karma greater post and comment karma starting out, under the assumption that their posts and comments are better and more representative of the community. Probably the high karma you’re seeing is a result of that. I think this is mostly a good thing. That particular guideline you quoted doesn’t seem to appear on my commenting guidelines text box.
307th4-8

> So if I'm getting at things correctly, capabilities and safety are highly correlated, and there can't be situations where capabilities and alignment decouple.

Not that far, more like they don't decouple until more progress has been made. Pure alignment is an advanced subtopic of AI research that requires a certain amount of prior progress before it's a viable field.

I'm not super confident in the above and wouldn't discourage people from doing alignment work now (plus the obvious nuance that it's not one big lump; there are some things that can be done later an... (read more)

2Nathan Helm-Burger
I like your comments, 307th, and your linked post on RL SotA. I don't agree with everything you say, but some of what you say is quite on point. In particular I agree that 'RL is currently being rather unimpressive in achieving complicated goals in complex wide-possible-action-space simulation worlds'. I agree that some fundamental breakthroughs are needed to change this, not just scaling existing methods.

I disagree that such breakthroughs will necessarily require many calendar years of research. I think probably the eyes of the big research labs will soon be turning to focus more fully upon tackling complex-world RL, and that it won't be long at all before significant breakthroughs start being made.

I think rather than thinking about research progress in terms of years, or even 'researcher hours', it's more helpful to think of progress in terms of 'research points' devoted to the specific topic. An hour of a highly effective researcher at a well-funded lab, with a well-set-up research environment that makes new experiments easy to run, is worth vastly more 'research points' towards a topic than an hour of a compute-limited grad student without polished experiment-running code patterns, without access to huge compute resources, and without much experience running large experiments over many variables.
307th3-2

I'm very unconfident in the following but, to sketch my intuition:

I don't really agree with the idea of serial alignment progress that is independent of capability progress. This is what I was trying to get at with:

"AI capabilities" and "AI alignment" are highly related to each other, and "AI capabilities" has to come first in that alignment assumes that there is a system to align.

By analogy, nuclear fusion safety research is inextricable from nuclear fusion capability research.

When I try to think of ways to align AI my mind points towards questions like ... (read more)

5Noosphere89
Yeah, that might be a big idea. If you're right that AI capabilities work and AI alignment work are the same thing, the problem is solved by definition. So if I'm getting at things correctly, capabilities and safety are highly correlated, and there can't be situations where capabilities and alignment decouple.
307th14-1

Right, I specifically think that someone would be best served by trying to think of ways to get a SOTA result on an Atari benchmark, not simply reading up on past results (although you'd want to do that as part of your attempt). There's a huge difference between reading about what's worked in the past and trying to think of new things that could work and then trying them out to see if they do.

As I've learned more about deep learning and tried to understand the material, I've constantly had ideas that I think could improve things. Then I've tried them out, ... (read more)

3LawrenceC
I do agree that people should try their ideas out, even if the ideas are "capabilities" flavored. However, I do think (if you buy the serial vs parallel distinction in the OP) that you should try to not do capabilities research. As you say, most ML ideas people come up with at first are pretty doomed to failure, and the main way people learn is via experience. This is in part due to the overconfidence of newbies in any field, but also in part due to how counterintuitive many ML results are to most people. [1]

The key thing people should know is, if you stumble on an actual capabilities insight... you can just... not publish it or talk about it. I think I'd emphasize this point over the other points. Do the research most helpful for learning, and then in the unlikely event it ends up being impressive capabilities work, you can always just put it into your filing cabinet and walk away. [2]

As for the ML PhD example: I think you can think very deeply about AIs without going out and working on critical-path capabilities work! You should think very deeply about AIs in general, if you're working in the field, regardless of what you're doing! But if you're in a job where you have to publish to advance (assuming you buy the assumptions in the OP), it seems pretty bad to actively seek out and work on critical-path capabilities work, as opposed to skill-building work or safety work.

Finally, while I agree with your overall takeaway, I strongly disagree with this style of argument:

Because the expected effect of most people on moving anything significant forward by a couple of months is probably going to be zero, including solutions to alignment; there's just a lot of people out there and the problems people want to work on are really hard.

What matters isn't whether or not we can measure the impact of single people in terms of full percentage points of various outcomes or full weeks of time, but whether the expected gains are larger from doing the PhD vs not
3Jay Bailey
Thanks for making things clearer! I'll have to think about this one - some very interesting points from a side I had perhaps unfairly dismissed before.
307th1-10

"AI capabilities" and "AI alignment" are highly related to each other, and "AI capabilities" has to come first in that alignment assumes that there is a system to align. I agree that for people on the cutting edge of research like OpenAI, it would be a good idea for at least some of them to start thinking deeply about alignment instead. There's two reasons for this: 


1) OpenAI is actually likely to advance capabilities a pretty significant amount, and

2) Due to the expertise they've developed from working on AI capabilities, they're much more lik... (read more)

8Jay Bailey
"Working on AI capabilities" explicitly means working to advance the state-of-the-art of the field. Skilling up doesn't do this. Hell, most ML work doesn't do this. I would predict >50% of AI alignment researchers would say that building an AI startup that commercialises the capabilities of already-existing models does not count as "capabilities work" in the sense of this post. For instance, I've spent the last six months studying reinforcement learning and Transformers, but I haven't produced anything that has actually reduced timelines, because I haven't improved anything beyond the level that humanity was capable of before, let alone published it. If you work on research engineering in a similar manner, but don't publish any SOTA results, I would say you haven't worked on AI capabilities in the way this post refers to them.
307th23
  1. A paper which does for deceptive alignment what the goal misgeneralization paper does for inner alignment, i.e. describing it in ML language and setting up toy examples (for example, telling GPT-3 to take actions which minimize changes in its weights, given that it’s being trained using actor-critic RL with a certain advantage function, and seeing if it knows how to do so).


If I'm understanding this one right, OpenAI did something similar to this for purely pragmatic reasons with VPT, a Minecraft agent. They first trained a "foundation model" to imitate... (read more)
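
To make the two-stage pattern being referred to concrete, here is a minimal sketch (my own illustration, not code from VPT or from the paper idea quoted above): pretrain a policy by imitation on demonstrations, then fine-tune it against a downstream objective. The Policy class, demos, and rollout_fn are hypothetical stand-ins.

```python
# Minimal sketch (illustrative): imitation pretraining followed by fine-tuning.
# All names here are hypothetical stand-ins, not VPT's actual code.
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)  # action logits

def pretrain_by_imitation(policy, demos, epochs=1, lr=1e-3):
    """Stage 1: behavioral cloning on (observation, action) demonstration batches."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for obs, action in demos:  # obs: float (B, obs_dim), action: long (B,)
            opt.zero_grad()
            loss = loss_fn(policy(obs), action)
            loss.backward()
            opt.step()

def finetune(policy, rollout_fn, steps=100, lr=1e-4):
    """Stage 2: fine-tune the pretrained policy on a downstream objective.
    rollout_fn is assumed to collect rollouts and return a differentiable scalar
    loss (e.g. a policy-gradient surrogate); those details are elided here."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = rollout_fn(policy)
        loss.backward()
        opt.step()
```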