All of ryan_greenblatt's Comments + Replies

(I don't expect o3-mini is a much better agent than 3.5 sonnet new out of the box, but probably a hybrid scaffold with o3 + 3.5 sonnet will be substantially better than 3.5 sonnet alone. Just o3 might also be very good. Putting aside cost, I think o1 is usually better than o3-mini on open-ended programming agency tasks.)

The question of context might be important; see here. I wouldn't find 15 minutes that surprising for a ~50% success rate, but I've seen numbers more like 1.5 hours. I thought this was likely to be an overestimate, so I went down to 1 hour, but more like 15-30 minutes is also plausible.

Keep in mind that I'm talking about agent scaffolds here.

2habryka
Yeah, I have failed to get any value out of agent scaffolds, and I don't think I know anyone else who has so far. If anyone has gotten more value out of them than just the Cursor chat, I would love to see how they do it!  All things like Cursor composer and codebuff and other scaffolds have been worse than useless for me (though I haven't tried it again after o3-mini, which maybe made a difference, it's been on my to-do list to give it another try).

I mean, I don't think AI R&D is a particularly hard field per se, but I do think it involves lots of tricky stuff and isn't much easier to automate than some other plausibly-important-to-takeover field (e.g., robotics). (I could imagine that the AIs have a harder time automating philosophy even if they were trying to work on this, but it's more confusing to reason about because human work on this is so dysfunctional.) The main reason I focused on AI R&D is that I think it is much more likely to be fully automated first and seems like it will probably be fully automated prior to AI takeover.

2TsviBT
Ok, I think I see what you're saying. To check part of my understanding: when you say "AI R&D is fully automated", I think you mean something like:

I think you can add mirror enzymes which can break down mirror carbs. Minimally we are aware of enzymes which break down mirror glucose.

No, sorry, I was mostly focused on "such that if you didn't see them within 3 or 5 years, you'd majorly update about time to the type of AGI that might kill everyone". I didn't pick up on "most impressive" and instead tried to focus on something that occurs substantially before things get crazy.

Most impressive would probably be stuff like "automate all of AI R&D and greatly accelerate the pace of research at AI companies". (This seems about 35% likely to me within 5 years, so I'd update by at least that much.) But this hardly seems that interesting? I think we can agree that once the AIs are automating whole companies, stuff is very near.

2TsviBT
Ok. So I take it you're very impressed with the difficulty of the research that is going on in AI R&D. (FWIW I don't agree with that; I don't know what companies are up to, some of them might not be doing much difficult stuff and/or the managers might not be able to or care to tell the difference.)

Importantly, this is an example of developing a specific application (surgical robot) rather than advancing the overall field (robots in general). It's unclear whether the analogy to an individual application or an overall field is more appropriate for AI safety.

I think if you look at "horizon length" (the task duration, in terms of human completion time, at which AIs get the task right 50% of the time), the trends indicate doubling times of maybe 4 months (though 6 months is plausible). Let's say 6 months to be more conservative. I think AIs are at something like 30 minutes on math? And 1 hour on software engineering. It's a bit unclear, but let's go with that. Then, to get to 64 hours on math, we'd need 7 doublings = 3.5 years. So I think the naive trend extrapolation is much faster than you think? (And this estimate strikes me as conservative, at least for math.)
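
As a quick sanity check on that arithmetic, here's a minimal sketch (assuming the ~30-minute current horizon on math and the 6-month doubling time above):

```python
# Minimal sketch of the horizon-length extrapolation above (assumes a
# ~30-minute current horizon on math and a 6-month doubling time).
import math

current_horizon_hours = 0.5   # assumed current 50%-success horizon on math
target_horizon_hours = 64.0   # target horizon from the comment
doubling_time_years = 0.5     # the conservative 6-month doubling time

doublings = math.log2(target_horizon_hours / current_horizon_hours)
years = doublings * doubling_time_years
print(f"{doublings:.0f} doublings -> {years:.1f} years")  # 7 doublings -> 3.5 years
```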

2habryka
FWIW, this seems like an overestimate to me. Maybe o3 is better than other things, but I definitely can't get equivalents of 1-hour chunks out of language models, unless it happens to be an extremely boilerplate-heavy step. My guess is more like 15 minutes, and for debugging (which in my experience is close to most software-engineering time), more like 5-10 minutes.

Consider tasks that quite good software engineers (maybe top 40% at Jane Street) typically do in 8 hours without substantial prior context on that exact task. (As in, an 8-hour median completion time.) Now, we'll aim to sample these tasks such that their distribution and characteristics are close to the distribution of work tasks in actual software engineering jobs (we probably can't get that close because of the limited-context constraint, but we'll try).

In short timelines, I expect AIs will be able to succeed at these tasks 70% of the time with... (read more)

2TsviBT
Thanks... but wait, this is among the most impressive things you expect to see? (You know more than I do about that distribution of tasks, so you could justifiably find it more impressive than I do.)

I would find this post much more useful to engage with if you more concretely described the types of tasks that you think AIs will remain bad at and gave a bunch of examples. (Or at least made an argument for why it is hard to construct examples, if that is your perspective.)

I think you're pointing to a category like "tasks that require lots of serial reasoning for humans, e.g., hard math problems, particularly ones where the output should be a proof". But I find this confusing, because we've pretty clearly seen huge progress on this in the last year such that ... (read more)

5Rafael Harth
So, I agree that there has been substantial progress in the past year, hence the post title. But I think if you naively extrapolate that rate of progress, you get around 15 years. The problem with the three examples you've mentioned is again that they're all comparing human cognitive work across a short amount of time with AI performance. I think the relevant scale doesn't go from 5th grade performance over 8th grade performance to university-level performance or whatever, but from "what a smart human can do in 5 minutes" over "what a smart human can do in an hour" over "what a smart human can do in a day", and so on. I don't know if there is an existing benchmark that measures anything like this. (I agree that more concrete examples would improve the post, fwiw.) And then a separate problem is that math problems are in the easiest category from §3.1 (as are essentially all benchmarks).
4TsviBT
What are some of the most impressive things you do expect to see AI do, such that if you didn't see them within 3 or 5 years, you'd majorly update about time to the type of AGI that might kill everyone?

Sam also implies that GPT-5 will be based on o3.

IDK if Sam is trying to imply that GPT-5 will be "the AGI", but regardless, I think we can be pretty confident that o3 isn't capable enough to automate large fractions of cognitive labor, let alone "outperform humans at most economically valuable work" (the original OpenAI definition of AGI).

2Nikola Jurkovic
Oh, I didn't get the impression that GPT-5 will be based on o3. Through the GPT-N convention I'd assume GPT-5 would be a model pretrained with 8-10x more compute than GPT-4.5 (which is the biggest internal model according to Sam Altman's statement at UTokyo).

I think 0.4 is far on the lower end (maybe 15th percentile) for all the way down to one accelerated researcher, but seems pretty plausible at the margin.

As in, 0.4 suggests that 1000 researchers = 100 researchers at 2.5x speed which seems kinda reasonable while 1000 researchers = 1 researcher at 16x speed does seem kinda crazy / implausible.

So, I think my current median lambda at likely margins is like 0.55 or something and 0.4 is also pretty plausible at the margin.
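
To make the arithmetic explicit, here's a minimal sketch of the power-law model I have in mind (treating the serial-equivalent speedup as N^lambda is my simplification):

```python
# Minimal sketch: serial-equivalent speedup from N parallel researchers,
# modeled as N**lam (a simplification of the labor-aggregation assumption).
def equivalent_speedup(n_researchers: int, lam: float) -> float:
    return n_researchers ** lam

lam = 0.4
# 1000 researchers vs 100 researchers: the ratio of speedups is ~2.5x.
print(equivalent_speedup(1000, lam) / equivalent_speedup(100, lam))  # ~2.51
# 1000 researchers collapsed all the way down to 1: ~15.8x, the "16x" figure.
print(equivalent_speedup(1000, lam))  # ~15.85
```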

See appendix B.3 in particular:

Competitors receive a higher score for submitting their solutions faster. Because models can think in parallel and simultaneously attempt all problems, they have an innate advantage over humans. We elected to reduce this advantage in our primary results by estimating o3’s score for each solved problem as the median of the scores of the human participants that solved that problem in the contest with the same number of failed attempts.

We could instead use the model’s real thinking time to compute ratings. o3 uses a learned sc

... (read more)
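
For concreteness, here's a minimal sketch of the estimation rule described in that excerpt (the data layout and helper name below are hypothetical, not OpenAI's actual code):

```python
# Hypothetical sketch of the scoring rule quoted above: estimate the model's
# score on a solved problem as the median score of the human participants who
# solved that problem with the same number of failed attempts.
from statistics import median

def estimated_score(model_failed_attempts: int,
                    human_results: list[tuple[float, int]]) -> float:
    # human_results: (score, failed_attempts) for each human who solved it.
    matching = [score for score, fails in human_results
                if fails == model_failed_attempts]
    return median(matching)

# Example with made-up numbers:
print(estimated_score(1, [(95.0, 0), (80.0, 1), (70.0, 1), (60.0, 2)]))  # 75.0
```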
2Olli Järviniemi
Huh, I tried to paste that excerpt as an image to my comment, but it disappeared. Thanks.

I expect substantially more integrated systems than you do at the point when AIs are obsoleting (almost all) top human experts, such that I don't expect these things will happen by default, and indeed I think it might be quite hard to get them to work.

METR has a list of policies here. Notably, xAI does have a policy, so that isn't correct on the tracker.

(I found it hard to find this policy, so I'm not surprised you missed it!)

Your description of GDM's policy doesn't take into account the FSF update.

However, it has yet to be fleshed out: mitigations have not been connected to risk thresholds

This is no longer fully true.

I'm a bit late for a review, but I've recently been reflecting on decision theory and this post came to mind.

When I initially saw this post I didn't make much of it. I now feel like the thesis of "decision theory is very confusing and messed up" is true, insightful, and pretty important based on spending more time engaging with particular problems (mostly related to acausal/simulation trade and other interactions). I don't know if the specific examples in this post aged well, but I think the bottom line is worth keeping in mind.

You are possibly the first person I know of who reacted to MONA with "that's obvious"

I also have the "that's obvious" reaction, but possibly I'm missing some details. I also think it won't perform well enough in practice to pencil out, given other, better places to allocate safety budget (if it does trade off, which is unclear).

2Rohin Shah
I meant "it's obvious you should use MONA if you are seeing problems with long-term optimization", which I believe is Fabien's position (otherwise it would be "hard to find"). Your reaction seems more like "it's obvious MONA would prevent multi-step reward hacks"; I expect that is somewhat more common (though still rare, and usually depends on already having the concept of multi-step reward hacking).

It's just surprising that Sam is willing to say/confirm all of this given that AI companies normally at least try to be secretive.

3Davidmanheim
He says things that are advantageous, and sometimes they are even true. The benefit of not being known to be a liar usually keeps the correlation between claims and truth positive, but in his case it seems that ship has sailed. (Checkably false claims are still pretty rare, and this may be one of those.)

I doubt that person was thinking about the opaque vector reasoning making it harder to catch the rogue AIs.

(I don't think it's good to add a canary in this case (the main concern would be takeover strategies, but I basically agree this isn't that helpful), but I think people might be reacting to "might be worth adding" and are disagree-reacting to your comment because it says "are you actually serious", which seems more dismissive than needed. IMO, we want AIs trained on this if they aren't themselves very capable (to improve epistemics around takeover risk), and I feel close to indifferent for AIs that are plausibly very capable, as the effect on takeover plans is small and you still get some small epistemic boost.)

6jbash
For the record, I genuinely did not know if it was meant to be serious.

There are two interpretations you might have for that third bullet:

  • Can we stop rogue AIs? (Which are operating without human supervision.)
  • Can we stop AIs deployed in their intended context?

(See also here.)

In the context of "can the AIs take over?", I was trying to point to the rogue AI interpretation. As in, even if the AIs were rogue and had a rogue internal deployment inside the frontier AI company, how do they end up with actual hard power? For catching already-rogue AIs and stopping them, opaque vector reasoning doesn't make much of a difference.

1WilliamKiely
Thanks for the clarification. My conclusion is that I think your emoji was meant to signal disagreement with the claim that 'opaque vector reasoning makes a difference' rather than a thing I believe. I had rogue AIs in mind as well, and I'll take your word on "for catching already rogue AIs and stopping them, opaque vector reasoning doesn't make much of a difference".

I think there are good reasons to expect large fractions of humans might die even if humans immediately surrender:

  • It might be an unstable position given that the AI has limited channels of influence on the physical world. (While if there are far fewer humans, this changes.)
  • The AI might not care that much or might be myopic or might have arbitrary other motivations etc.

For many people, "can the AIs actually take over" is a crux and seeing a story of this might help build some intuition.

1SAB25
Sadly, I don't think there are going to be many people who are both unconcerned about AI risk and willing to read an 8,500-word story on the topic.
8WilliamKiely
Good point. At the same time, I think the underlying cruxes that lead people to being skeptical of the possibility that AIs could actually take over are commonly:

  • Why would an AI that well-intentioned human actors create be misaligned and motivated to take over?
  • How would such an AI go from existing on computer servers to acquiring power in the physical world?
  • How would humanity fail to notice this and/or stop this?

I mention these points because people who mention these objections typically wouldn't raise these objections to the idea of an intelligent alien species invading Earth and taking over. People generally have no problem granting that aliens may not share our values, may have actuators / the ability to physically wage war against humanity, and could plausibly overpower us with their superior intellect and technological know-how. Providing a detailed story of what a particular alien takeover process might look like then isn't actually necessarily helpful to addressing the objections people raise about AI takeover.

I'd propose that authors of AI takeover stories should therefore make sure that they aren't just describing aspects of a plausible AI takeover story that could just as easily be aspects of an alien takeover story, but are instead actually addressing peoples' underlying reasons for being skeptical that AI could take over. This means doing things like focusing on explaining:

  • what about the future development of AIs leads to the development of powerful agentic AIs with misaligned goals where takeover could be a plausible instrumental subgoal,
  • how the AIs initially acquire substantial amounts of power in the physical world,
  • how they do the above either without people noticing or without people stopping them.

(With this comment I don't intend to make a claim about how well the OP story does these things, though that could be analyzed. I'm just making a meta point about what kind of description of a plausible AI takeover scenario

Keeping the humans alive at this point is extremely cheap in terms of fraction of long term resource consumption while avoiding killing humans might substantially reduce the AI's chance of successful takeover.

4Noosphere89
I think my crux is I don't expect keeping humans alive to be very cheap at all, such that we can ignore extremely small kindness/indifference for the purpose of AI alignment, for the reasons described here: https://www.lesswrong.com/posts/xvBZPEccSfM8Fsobt/what-are-the-best-arguments-for-against-ais-being-slightly#wy9cSASwJCu7bjM6H

Wow, that is a surprising amount of information. I wonder how reliable we should expect this to be.

6Thane Ruthenis
Is it? What of this is new? To my eyes, the only remotely new thing is the admission that "there’s a lot of research still to get to [a coding agent]".
6Hopenope
Would you update your timelines if he is telling the truth?

I think you might first reach wildly superhuman AI via scaling up some sort of machine learning (and most of that is something well described as deep learning). Note that I said "needed". So, I would also count it as acceptable to build the AI with deep learning to allow for current tools to be applied even if something else would be more competitive.

(Note that I was responding to "between now and superintelligence", not claiming that this would generalize to all superintelligences built in the future.)

I agree that literal Jupiter brains will very likely be built using something totally different from machine learning.

6johnswentworth
Yeah ok. Seems very unlikely to actually happen, and unsure whether it would even work in principle (as e.g. scaling might not take you there at all, or might become more resource intensive faster than the AIs can produce more resources). But I buy that someone could try to intentionally push today's methods (both AI and alignment) to far superintelligence and simply turn down any opportunity to change paradigm.

"fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks"

Suppose we replace "AIs" with "aliens" (or even, some other group of humans). Do you agree that doesn't (necessarily) kill you due to slop if you don't have a full solution to the superintelligence alignment problem?

Aliens kill you due to slop, humans depend on the details.

The basic issue here is that the problem of slop (i.e. outputs which look fine upon shallow review but aren't fine) plus the problem of aligning a parent-AI in such a way that its more-powerful descendants will robustly remain aligned, is already the core of the superintelligence alignment problem. You need to handle those problems in order to safely do the handoff, and at that point the core hard problems are done anyway. Same still applies to aliens: in order to safely do the handoff, you need to handle the "slop/nonslop is hard to verify" problem, and you need to handle the "make sure agents the aliens build will also be aligned, and their children, etc" problem.

I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.

Maybe, but it is interesting to note that:

  • A majority of productive work is occurring on small subproblems even if some previous paradigm change was required for this.
  • For many fields (e.g., deep learning), many people didn't recognize (and potentially still don't recognize!) that
... (read more)

It's not clear to me we'll have (or will "need") new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks.

If you want to not die to slop, then "fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks" is not a thing which happens at all until the full superintelligence alignment problem is solved. That is how you die to slop.

9johnswentworth
I find it hard to imagine such a thing being at all plausible. Are you imagining that jupiter brains will be running neural nets? That their internal calculations will all be differentiable? That they'll be using strings of human natural language internally? I'm having trouble coming up with any "alignment" technique of today which would plausibly generalize to far superintelligence. What are you picturing?

This post seems to assume that research fields have big, hard central problems that are solved with some specific technique or paradigm.

This isn't always true. Many fields have the property that most of the work is on making small components work slightly better in ways that are very interoperable and don't have complex interactions. For instance, consider the case of making AIs more capable in the current paradigm. There are many different subcomponents which are mostly independent and interact mostly multiplicatively:

  • Better training data: This is extre
... (read more)
5Ariel G.
To give a maybe helpful anecdote: I am a mechanical engineer (though I now work in AI governance), and in my experience that isn't true, at least for R&D (e.g. a surgical robot) where you aren't just iterating or working in a highly standardized field (aerospace, HVAC, mass manufacturing, etc.). The "bottleneck" in that case is usually figuring out the requirements (e.g. which surgical tools to support? what's the motion range, the design envelope for interferences?). If those are wrong, the best design will still be wrong. In more standardized engineering fields the requirements (and user needs) are much better known, so perhaps the bottleneck now becomes a bunch of small things rather than one big thing.

This post seems to assume that research fields have big hard central problems that are solved with some specific technique or paradigm.

This isn't always true. [...]

I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.

And insofar as there remains some problem which is simply not solvable within a certain paradigm, that is a "big hard ce... (read more)

Ok, I think what is going on here is maybe that the constant you're discussing here is different from the constant I was discussing. I was trying to discuss the question of how much worse serial labor is than parallel labor, but I think the lambda you're talking about takes into account compute bottlenecks and similar?

Not totally sure.

[This comment is no longer endorsed by its author]

Lower lambda. I'd now use more like lambda = 0.4 as my median. There's really not much evidence pinning this down; I think Tamay Besiroglu thinks there's some evidence for values as low as 0.2.

Isn't this really implausible? This implies that if you had 1000 researchers/engineers of average skill at OpenAI doing AI R&D, this would be as good as having one average-skill researcher running at 16x (1000^0.4 ≈ 16) speed. It does seem very slightly plausible that having someone as good as the best researcher/engineer at OpenAI run at 16x speed would be competit... (read more)

2ryan_greenblatt
I think 0.4 is far on the lower end (maybe 15th percentile) for all the way down to one accelerated researcher, but seems pretty plausible at the margin. As in, 0.4 suggests that 1000 researchers = 100 researchers at 2.5x speed which seems kinda reasonable while 1000 researchers = 1 researcher at 16x speed does seem kinda crazy / implausible. So, I think my current median lambda at likely margins is like 0.55 or something and 0.4 is also pretty plausible at the margin.
2ryan_greenblatt
Ok, I think what is going on here is maybe that the constant you're discussing here is different from the constant I was discussing. I was trying to discuss the question of how much worse serial labor is than parallel labor, but I think the lambda you're talking about takes into account compute bottlenecks and similar? Not totally sure.

I agree, but it is important to note that the authors of the paper disagree here.

(It's somewhat hard for me to tell if the crux is more that they don't expect that everyone would get AI aligned to them (at least as representatives) even if this was technically feasible with zero alignment tax, or if the crux is that even if everyone had single-single-aligned corrigible AIs representing their interests and with control over their assets and power, that would still result in disempowerment. I think it is more like the second thing here.)

So Zvi is accurately representing the perspective of the authors, I just disagree with them.

1David Duvenaud
Yes, Ryan is correct. Our claim is that even fully-aligned personal AI representatives won't necessarily be able to solve important collective action problems in our favor. However, I'm not certain about this. The empirical crux for me is: Do collective action problems get easier to solve as everyone gets smarter together, or harder?

As a concrete example, consider a bunch of local polities in a literal arms race. If each had their own AGI diplomats, would they be able to stop the arms race? Or would the more sophisticated diplomats end up participating in precommitment races or other exotic strategies that might still prevent a negotiated settlement? Perhaps the less sophisticated diplomats would fear that a complicated power-sharing agreement would lead to their disempowerment eventually anyways, and refuse to compromise?

As a less concrete example, our future situation might be analogous to a population of monkeys who unevenly have access to human representatives which earnestly advocate on their behalf. There is a giant, valuable forest that the monkeys live in next to a city where all important economic activity and decision-making happens between humans. Some of the human population (or some organizations, or governments) end up not being monkey-aligned, instead focusing on their own growth and security. The humans advocating on behalf of monkeys can see this is happening, but because they can't always participate directly in wealth generation as well as independent humans, they eventually become a small and relatively powerless constituency. The government and various private companies regularly bid or tax enormous amounts of money for forest land, and even the monkeys with index funds eventually are forced to sell, and then go broke from rent.

I admit that there are many moving parts of this scenario, but it's the closest simple analogy to what I'm worried about that I've found so far. I'm happy for people to point out ways this analogy won't m

Yes, but random people can't comment or post on the Alignment Forum, and in practice I find that lots of AI-relevant stuff doesn't make it there (and the frontpage is generally worse than my LessWrong frontpage after misc tweaking).

TBC, I'm not really trying to make a case that something should happen here, just trying to quickly articulate why I don't think the alignment forum fully addresses what I want.

[Mildly off topic]

I think it would be better if the default was that LW is a site about AI and longtermist cause areas, with other stuff hidden by default. (Or other stuff shown by default on rat.lesswrong.com or whatever.)

Correspondingly, I wouldn't like penalizing multiple of the same tag.

I think the non-AI stuff is less good for the world than the AI stuff and there are downsides in having LW feature non-AI stuff from my perspective (e.g., it's more weird and random from the perspective of relevant people).

Your preferences are reasonable preferences, and also I disagree with them and plan to push the weird fanfiction and cognitive self-improvement angles on LessWrong. May I offer you a nice AlignmentForum in this trying time? 

8habryka
(I would not work on LessWrong if that was the case and think it would be very harmful for the world)

I don't disagree. I assumed Raemon intended something more elaborate than just a salient button with this effect.

Fair. For reference, here are my selections, which I think are a good default strategy for people who just come to LW for AI/AI safety reasons:

(Why "-" a bunch of stuff rather than "+" AI? Well, because "+" adds some fixed karma while "-" multiplies by a constant (less than 1), and I don't think adding karma is a good strategy (as it shows you really random posts often). I do like minus-ing these specific things also. If I could, I'd probably do AI * 1.5 and then reduce the negative weight on these things a bit.)

So this might be a good suggestion for a ... (read more)

I don't understand. Can't people just hide posts tagged as AI?

2Screwtape
I will cheerfully bet at 1:1 odds that half the people who show up on LessWrong do not know how to filter posts on the frontpage. Last time I asked that on a survey it was close to 50% and I'm pretty sure selection effects for who takes the LW Census pushed that number up.
2Raemon
Yes but it's an out-of-the-way button mixed in with a bunch of buttons, as opposed to like, kinda the primary choice you're expected to make on the site.

Another way to put this is that strategy stealing might not work due to technical alignment difficulties or for other reasons, and I'm not sold that the other reasons I've heard so far are very lethal. I do think the situation might really suck, though, with e.g. tons of people dying of bioweapons and with some groups that aren't sufficiently ruthless (or which don't defer enough to AIs) getting disempowered.

Yes, it is relevant for untrusted monitoring. I think the basic version of this is pretty resolvable via:

  • Paraphrasing (such that exact "I generated this text" doesn't work). Note that we just need to paraphrase stuff sent to the monitor, not the actual action / reasoning.
  • Using a different fine-tune for the untrusted monitor so it doesn't have the exact same probability distribution.

We discuss this in more detail here.
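
A minimal sketch of how the two mitigations fit together (the helper objects below are hypothetical placeholders, not an existing API):

```python
# Hypothetical sketch of the two mitigations above: paraphrase only what the
# untrusted monitor sees, and use a separately fine-tuned model as the monitor.
def check_with_untrusted_monitor(action_text: str, paraphraser, monitor) -> float:
    # Paraphrasing the monitor's input (not the action itself) breaks exact
    # "I generated this text" self-recognition signals.
    paraphrased = paraphraser.rewrite(action_text)
    # A different fine-tune means the monitor doesn't share the policy's exact
    # probability distribution, making collusion via logit-matching harder.
    return monitor.suspicion_score(paraphrased)
```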

I predict that group A survives, but the humans are no longer in power. I think this illustrates the basic dynamic. ETA: Do you understand what I'm getting at? Can you explain what you think is wrong with thinking of it this way?

I think something like this is a reasonable model but I have a few things I'd change.

Whichever group has more power at the end of the week survives.

Why can't both groups survive? Why is it winner takes all? Can we just talk about the relative change in power over the week? (As in, how much does the power of B reduce relat... (read more)

2ryan_greenblatt
Another way to put this is that strategy stealing might not work due to technical alignment difficulties or for other reasons, and I'm not sold that the other reasons I've heard so far are very lethal. I do think the situation might really suck, though, with e.g. tons of people dying of bioweapons and with some groups that aren't sufficiently ruthless (or which don't defer enough to AIs) getting disempowered.

I view the world today as highly dysfunctional in many ways: corruption, coordination failures, preference falsification, coercion, inequality, etc. are rampant. This state of affairs both causes many bad outcomes and many aspects are self-reinforcing. I don't expect AI to fix these problems; I expect it to exacerbate them.

Sure, but these things don't result in non-human entities obtaining power, right? Like, usually these are somewhat negative-sum, but mostly just involve inefficient transfer of power. I don't see why these mechanisms would on net tran... (read more)

2David Scott Krueger (formerly: capybaralet)
First, RE the role of "solving alignment" in this discussion, I just want to note that:

1) I disagree that alignment solves gradual disempowerment problems.
2) Even if it would, that does not imply that gradual disempowerment problems aren't important (since we can't assume alignment will be solved).
3) I'm not sure what you mean by "alignment is solved"; I'm taking it to mean "AI systems can be trivially intent aligned". Such a system may still say things like "Well, I can build you a successor that I think has only a 90% chance of being aligned, but will make you win (e.g. survive) if it is aligned. Is that what you want?" and people can respond with "yes" -- this is the sort of thing that probably still happens IMO.
4) Alternatively, you might say we're in the "alignment basin" -- I'm not sure what that means, precisely, but I would operationalize it as something like "the AI system is playing a roughly optimal CIRL game". It's unclear how good of performance that can yield in practice (e.g. it can't actually be optimal due to compute limitations), but I suspect it still leaves significant room for fuck-ups.
5) I'm more interested in the case where alignment is not "perfectly" "solved", and so there are simply clear and obvious opportunities to trade off safety and performance; I think this is much more realistic to consider.
6) I expect such trade-off opportunities to persist when it comes to assurance (even if alignment is solved), since I expect high-quality assurance to be extremely costly. And it is irresponsible (because it's subjectively risky) to trust a perfectly aligned AI system absent strong assurances. But of course, people who are willing to YOLO it and just say "seems aligned, let's ship" will win. This is also part of the problem...

My main response, at a high level:

Consider a simple model:

  • We have 2 human/AI teams in competition with each other, A and B.
  • A and B both start out with the humans in charge, and then decide whether the
2Noosphere89
BTW, this is also a crux for me, in that I do believe that absent technical misalignment, some humans will have most of the power by default, rather than AIs, because I believe AI rights will be limited by default.

The way the isoFLOPs are shaped suggests that 90-95% sparsity might turn out to be optimal here, that is you can only get worse loss with 98+% sparsity with 1e20 FLOPs, however you vary the number of epochs and active params!

I'm currently skeptical and, more minimally, I don't understand the argument you're making. Probably not worth getting into.

I do think there will be a limit to how sparse you want to go, even in the regime of very high compute relative to data, for various reasons (computational if nothing else). I don't see how these graphs support 90-95% s... (read more)

4Vladimir_Nesov
With 90% sparsity you do get better loss than dense; this is sufficient to broadly carry your argument. But with 98% sparsity (your llama-3-405B variant example has 95% sparsity) you might get worse loss than with 90% when data is scarce, though it'll still be better than dense. The principle about MoE damaging data efficiency (optimal tokens/param ratio) hints that this might be the case even before looking at the experiments.

So compute optimal MoEs don't improve data efficiency, don't contribute to mitigating data scarcity.

I agree that compute-optimal MoEs don't improve data utilization. But naively, you might expect that MoEs can be used to reduce issues with data scarcity at a fixed level of compute by training a much bigger model on a fixed amount of data.

As in, because there are returns to both more data and bigger models, you can use MoE to effectively use a much bigger model at the same compute.

Like, maybe you would have trained llama-3-405B on 15T tokens. You could instea... (read more)
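
A back-of-envelope sketch of that point, using the standard training FLOPs ≈ 6 · (active params) · (tokens) approximation (the specific numbers are illustrative assumptions, not actual Llama configurations):

```python
# Illustrative sketch: at fixed data and fixed compute, an MoE with the same
# number of *active* params but many more *total* params costs about the same
# training FLOPs as the dense model (using FLOPs ~ 6 * active_params * tokens).
def train_flops(active_params: float, tokens: float) -> float:
    return 6 * active_params * tokens

tokens = 15e12                            # fixed, scarce data
dense_flops = train_flops(405e9, tokens)  # dense 405B on 15T tokens
moe_flops = train_flops(405e9, tokens)    # MoE: 405B active, e.g. ~3T total params
print(moe_flops / dense_flops)            # ~1.0: a "much bigger model", same compute and data
```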

3Vladimir_Nesov
Chinchilla scaling shows that tokens/params ratio for compute optimal models only changes slowly with compute, making it a good anchor to frame other things in terms of. The experiments from this MoE scaling paper show that under fixed data, varying sparsity in MoEs that are compute optimal at that amount of data preserves perplexity. This also seems like a nice principle for framing the way compute optimal models sit in the space of hyperparameters.

With infinite data, isoFLOPs for loss depending on number of active params are parabolas with some minimum point. But with finite data you need to repeat it to train with fewer active params, which damages loss. This moves the minima of isoFLOPs to the right if the minima already required 5x repetition or more. So under data scarcity, compute optimal models have more active params than under infinite data, and the effect gets worse with more compute. This way we maintain the framing of search for compute optimal hyperparameters rather than undertraining.

Now consider the 1e20 FLOPs plot in Figure 12, left. If there's only 2B tokens of training data and no more, all minima already ask for 12-31 epochs, so the distortion that increases loss will move the minima to the right (and up), and move the high sparsity minima further than lower sparsity minima compared to their original (infinite data) locations. The way the isoFLOPs are shaped suggests that 90-95% sparsity might turn out to be optimal here, that is you can only get worse loss with 98+% sparsity at 1e20 FLOPs, however you vary the number of epochs and active params!

This seems counterintuitive, as in an infinite data regime more sparsity only makes things better (if we ignore practical difficulties). But sure, 90% sparsity will still be better than dense, at least until we use even more compute and sparser minima start asking for even more epochs.
1Archimedes
Even if it’s the same cost to train, wouldn’t it still be a win if inference is a significant part of your compute budget?

This is overall output throughput, not latency (which would be output tokens per second for a single context).

a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput

This just claims that you can run a bunch of parallel instances of R1.
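
To illustrate the distinction (the batch size below is a made-up assumption):

```python
# Aggregate throughput across many parallel contexts vs per-context speed.
aggregate_tokens_per_sec = 3872     # the server-wide figure quoted above
concurrent_contexts = 64            # hypothetical number of parallel requests
per_context_tokens_per_sec = aggregate_tokens_per_sec / concurrent_contexts
print(per_context_tokens_per_sec)   # ~60 tok/s for any single context
```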

4Nick_Tarleton
https://artificialanalysis.ai/leaderboards/providers claims that Cerebras achieves that OOM performance, for a single prompt, for 70B-parameter models. So nothing as smart as R1 is currently that fast, but some smart things come close.
2Nathan Helm-Burger
Oops, bamboozled. Thanks, I'll look into it more and edit accordingly.

Discussion in this other comment is relevant:

Anthropic releasing their RSP was an important change in the AI safety landscape. The RSP was likely a substantial catalyst for policies like RSPs—which contain if-then commitments and more generally describe safety procedures—becoming more prominent. In particular, OpenAI now has a beta Preparedness Framework, Google DeepMind has a Frontier Safety Framework but there aren't any concrete publicly-known policies yet, many companies agreed to the Seoul commitments which require making a similar policy, and SB-10

... (read more)

Importantly, in the Dictator's Handbook case, some humans do actually get the power.

2Noosphere89
This is why I was stating the scenario in the paper cannot really lead to existential catastrophe, at least without other assumptions here.

Yes, that is an accurate understanding.

(Not sure what you mean by "This would explain Sonnet's reaction below!")

8Daniel Kokotajlo
What I meant was, Sonnet is smart enough to know the difference between text it generated and text a different dumber model generated. So if you feed it a conversation with Opus as if it were its own, and ask it to continue, it notices & says "JFYI I'm a different AI bro"

  • I think it's much easier to RL on huge numbers of math problems, including because it is easier to verify and because you can more easily get many problems. Also, for random reasons, doing single-turn RL is substantially less complex and maybe faster than multi-turn RL on agency (due to the variable number of steps and variable delay from environments).
  • OpenAI probably hasn't gotten around to doing as much computer use RL partially due to prioritization.

Agreed, downvoted my comment. (You can't strong downvote your own comment, or I would have done that.)

I was mostly just trying to point to prior arguments against similar arguments while expressing my view.
