The question of context might be important, see here. I wouldn't find 15 minutes that surprising for ~50% success rate, but I've seen numbers more like 1.5 hours. I thought this was likely to be an overestimate so I went down to 1 hour, but more like 15-30 minutes is also plausible.
Keep in mind that I'm talking about agent scaffolds here.
I mean, I don't think AI R&D is a particularly hard field per se, but I do think it involves lots of tricky stuff and isn't much easier than automating some other plausibly-important-to-takeover field (e.g., robotics). (I could imagine that the AIs have a harder time automating philosophy even if they were trying to work on this, but it's more confusing to reason about because human work on this is so dysfunctional.) The main reason I focused on AI R&D is that I think it is much more likely to be fully automated first and seems like it is probably fully automated prior to AI takeover.
I think you can add mirror enzymes which can break down mirror carbs. Minimally we are aware of enzymes which break down mirror glucose.
No, sorry, I was mostly focused on "such that if you didn't see them within 3 or 5 years, you'd majorly update about time to the type of AGI that might kill everyone". I didn't actually pick up on "most impressive" and instead tried to focus on something that occurs substantially before things get crazy.
Most impressive would probably be stuff like "automate all of AI R&D and greatly accelerate the pace of research at AI companies". (This seems about 35% likely to me within 5 years, so I'd update by at least that much.) But this hardly seems that interesting? I think we can agree that once the AIs are automating whole companies stuff is very near.
Importantly, this is an example of developing a specific application (surgical robot) rather than advancing the overall field (robots in general). It's unclear whether the analogy to an individual application or an overall field is more appropriate for AI safety.
I think if you look at "horizon length"---at what task duration (in terms of human completion time) do the AIs get the task right 50% of the time---the trends will indicate doubling times of maybe 4 months (though 6 months is plausible). Let's say 6 months more conservatively. I think AIs are at like 30 minutes on math? And 1 hour on software engineering. It's a bit unclear, but let's go with that. Then, to get to 64 hours on math, we'd need 7 doublings = 3.5 years. So, I think the naive trend extrapolation is much faster than you think? (And this estimate strikes me as conservative at least for math IMO.)
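To spell out the arithmetic, here's a rough sketch; all the inputs are just the estimates stated above (30-minute current horizon, 64-hour target, 6-month doubling time), not measured values:

```python
import math

# Rough trend extrapolation using the estimates above (not measured values).
current_horizon_hours = 0.5   # ~30 minutes on math
target_horizon_hours = 64
doubling_time_months = 6      # conservative; 4 months also seems plausible

doublings = math.log2(target_horizon_hours / current_horizon_hours)
years = doublings * doubling_time_months / 12
print(doublings, years)  # 7.0 doublings, 3.5 years (~2.3 years with 4-month doublings)
```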
Consider tasks that quite good software engineers (maybe top 40% at Jane Street) typically do in 8 hours without substantial prior context on that exact task. (As in, 8 hour median completion time.) Now, we'll aim to sample these tasks such that the distribution and characteristics of these tasks are close to the distribution of work tasks in actual software engineering jobs (we probably can't get that close because of the limited context constraint, but we'll try).
In short timelines, I expect AIs will be able to succeed at these tasks 70% of the time with...
I would find this post much more useful to engage with if you more concretely described the types of tasks that you think AIs will remain bad at and gave a bunch of examples. (Or at least made an argument for why it is hard to construct examples, if that is your perspective.)
I think you're pointing to a category like "tasks that require lots of serial reasoning for humans, e.g., hard math problems particularly ones where the output should be a proof". But, I find this confusing, because we've pretty clearly seen huge progress on this in the last year such that ...
Sam also implies that GPT-5 will be based on o3.
IDK if Sam is trying to imply that GPT-5 will be "the AGI", but regardless, I think we can be pretty confident that o3 isn't capable enough to automate large fractions of cognitive labor, let alone "outperform humans at most economically valuable work" (the original OpenAI definition of AGI).
I think 0.4 is far on the lower end (maybe 15th percentile) for all the way down to one accelerated researcher, but seems pretty plausible at the margin.
As in, 0.4 suggests that 1000 researchers = 100 researchers at 2.5x speed, which seems kinda reasonable, while 1000 researchers = 1 researcher at 16x speed does seem kinda crazy / implausible.
So, I think my current median lambda at likely margins is like 0.55 or something and 0.4 is also pretty plausible at the margin.
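Spelling out the arithmetic behind the 1000-researcher comparison above (assuming the usual functional form where research output scales as N^lambda for N parallel researchers):

```python
# Quick check of the comparison above, assuming output scales as N ** lam.
lam = 0.4
print(1000 ** lam)          # ~15.8: 1000 researchers ~= 1 researcher at ~16x speed
print((1000 / 100) ** lam)  # ~2.5: 1000 researchers ~= 100 researchers at ~2.5x speed
```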
See appendix B.3 in particular:
...Competitors receive a higher score for submitting their solutions faster. Because models can think in parallel and simultaneously attempt all problems, they have an innate advantage over humans. We elected to reduce this advantage in our primary results by estimating o3’s score for each solved problem as the median of the scores of the human participants that solved that problem in the contest with the same number of failed attempts.
We could instead use the model’s real thinking time to compute ratings. o3 uses a learned sc
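A minimal sketch of the scoring rule described in the quoted appendix; the data structures here are hypothetical, not OpenAI's actual format:

```python
import statistics

def estimated_score(problem_id, model_failed_attempts, human_results):
    """Sketch of the quoted rule: use the median score of human contestants
    who solved the same problem with the same number of failed attempts.
    `human_results` is assumed to be a list of dicts (hypothetical schema)."""
    matching = [
        r["score"]
        for r in human_results
        if r["problem_id"] == problem_id
        and r["solved"]
        and r["failed_attempts"] == model_failed_attempts
    ]
    return statistics.median(matching) if matching else None
```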
I expect substantially more integrated systems than you do by the point when AIs are obsoleting (almost all) top human experts, so I don't expect these things will happen by default, and indeed I think it might be quite hard to get them to work.
METR has a list of policies here. Notably, xAI does have a policy, so the tracker is incorrect on that point.
(I found it hard to find this policy, so I'm not surprised you missed it!)
Your description of GDM's policy doesn't take into account the FSF update.
However, it has yet to be fleshed out: mitigations have not been connected to risk thresholds
This is no longer fully true.
I'm a bit late for a review, but I've recently been reflecting on decision theory and this post came to mind.
When I initially saw this post I didn't make much of it. I now feel like the thesis of "decision theory is very confusing and messed up" is true, insightful, and pretty important based on spending more time engaging with particular problems (mostly related to acausal/simulation trade and other interactions). I don't know if the specific examples in this post aged well, but I think the bottom line is worth keeping in mind.
You are possibly the first person I know of who reacted to MONA with "that's obvious".
I also have the "that's obvious" reaction, but possibly I'm missing some details. I also think it won't perform well enough in practice to pencil out, given other better places to allocate safety budget (if it does trade off, which is unclear).
It's just surprising that Sam is willing to say/confirm all of this given that AI companies normally at least try to be secretive.
I doubt that person was thinking about the opaque vector reasoning making it harder to catch the rogue AIs.
(I don't think it's good to add a canary in this case (the main concern would be takeover strategies, but I basically agree this isn't that helpful), but I think people might be reacting to "might be worth adding" and are disagree-reacting to your comment because it says "are you actually serious", which seems more dismissive than needed. IMO, we want AIs trained on this if they aren't themselves very capable (to improve epistemics around takeover risk), and I feel close to indifferent for AIs that are plausibly very capable, as the effect on takeover plans is small and you still get some small epistemic boost.)
There are two interpretations you might have for that third bullet:
(See also here.)
In the context of "can the AIs take over?", I was trying to point to the rogue AI interpretation. As in, even if the AIs were rogue and had a rogue internal deployment inside the frontier AI company, how do they end up with actual hard power? For catching already-rogue AIs and stopping them, opaque vector reasoning doesn't make much of a difference.
I think there are good reasons to expect large fractions of humans might die even if humans immediately surrender:
For many people, "can the AIs actually take over" is a crux and seeing a story of this might help build some intuition.
Keeping the humans alive at this point is extremely cheap in terms of the fraction of long-term resource consumption, while avoiding killing humans might substantially reduce the AI's chance of successful takeover.
Wow, that is a surprising amount of information. I wonder how reliable we should expect this to be.
I think you might first reach wildly superhuman AI via scaling up some sort of machine learning (and most of that is something well described as deep learning). Note that I said "needed". So, I would also count it as acceptable to build the AI with deep learning to allow for current tools to be applied even if something else would be more competitive.
(Note that I was responding to "between now and superintelligence", not claiming that this would generalize to all superintelligences built in the future.)
I agree that literal Jupiter brains will very likely be built using something totally different than machine learning.
"fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks"
Suppose we replace "AIs" with "aliens" (or even some other group of humans). Do you agree that this doesn't (necessarily) kill you due to slop if you don't have a full solution to the superintelligence alignment problem?
Aliens kill you due to slop, humans depend on the details.
The basic issue here is that the problem of slop (i.e. outputs which look fine upon shallow review but aren't fine) plus the problem of aligning a parent-AI in such a way that its more-powerful descendants will robustly remain aligned, is already the core of the superintelligence alignment problem. You need to handle those problems in order to safely do the handoff, and at that point the core hard problems are done anyway. Same still applies to aliens: in order to safely do the handoff, you need to handle the "slop/nonslop is hard to verify" problem, and you need to handle the "make sure agents the aliens build will also be aligned, and their children, etc" problem.
I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.
Maybe, but it is interesting to note that:
It's not clear to me we'll have (or will "need") new paradigms before fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks.
If you want to not die to slop, then "fully handing over all technical and strategic work to AIs which are capable enough to obsolete humans at all cognitive tasks" is not a thing which happens at all until the full superintelligence alignment problem is solved. That is how you die to slop.
This post seems to assume that research fields have big, hard central problems that are solved with some specific technique or paradigm.
This isn't always true. Many fields have the property that most of the work is on making small components work slightly better in ways that are very interoperable and don't have complex interactions. For instance, consider the case of making AIs more capable in the current paradigm. There are many different subcomponents which are mostly independent and interact mostly multiplicatively:
This post seems to assume that research fields have big hard central problems that are solved with some specific technique or paradigm.
This isn't always true. [...]
I would say it is basically-always true, but there are some fields (including deep learning today, for purposes of your comment) where the big hard central problems have already been solved, and therefore the many small pieces of progress on subproblems are all of what remains.
And insofar as there remains some problem which is simply not solvable within a certain paradigm, that is a "big hard ce...
Ok, I think what is going on here is maybe that the constant you're discussing here is different from the constant I was discussing. I was trying to discuss the question of how much worse serial labor is than parallel labor, but I think the lambda you're talking about takes into account compute bottlenecks and similar?
Not totally sure.
Lower lambda. I'd now use more like lambda = 0.4 as my median. There's really not much evidence pinning this down; I think Tamay Besiroglu thinks there's some evidence for values as low as 0.2.
Isn't this really implausible? This implies that if you had 1000 researchers/engineers of average skill at OpenAI doing AI R&D, this would be as good as having one average skill researcher running at 16x (1000^0.4 ≈ 16) speed. It does seem very slightly plausible that having someone as good as the best researcher/engineer at OpenAI run at 16x speed would be competit...
I agree, but it is important to note that the authors of the paper disagree here.
(It's somewhat hard for me to tell if the crux is more that they don't expect that everyone would get AI aligned to them (at least as representatives) even if this was technically feasible with zero alignment tax, or if the crux is that even if everyone had single-single aligned, corrigible AIs representing their interests and with control over their assets and power, that would still result in disempowerment. I think it is more like the second thing here.)
So Zvi is accurately representing the perspective of the authors, I just disagree with them.
Yes, but random people can't comment or post on the alignment forum and in practice I find that lots of AI relevant stuff doesn't make it there (and the frontpage is generally worse than my lesswrong frontpage after misc tweaking).
TBC, I'm not really trying to make a case that something should happen here, just trying to quickly articulate why I don't think the alignment forum fully addresses what I want.
[Mildly off topic]
I think it would be better if the default was that LW is a site about AI and longtermist cause areas and other stuff was hidden by default. (Or other stuff is shown by default on rat.lesswrong.com or whatever.)
Correspondingly, I wouldn't like penalizing multiple of the same tag.
I think the non-AI stuff is less good for the world than the AI stuff and there are downsides in having LW feature non-AI stuff from my perspective (e.g., it's more weird and random from the perspective of relevant people).
Your preferences are reasonable preferences, and also I disagree with them and plan to push the weird fanfiction and cognitive self-improvement angles on LessWrong. May I offer you a nice AlignmentForum in this trying time?
I don't disagree. I assumed Raemon intended something more elaborate than just a salient button with this effect.
Fair. For reference, here are my selections, which I think are a good default strategy for people who just come to LW for AI/AI safety reasons:
(Why "-" a bunch of stuff rather than "+" AI? Well, because "+" adds some fixed karma while "-" multiplies by a constant (less than 1), and I don't think adding karma is a good strategy (as it shows you really random posts often). I do like minus-ing these specific things also. If I could, I'd probably do AI * 1.5 and then reduce the negative weight on these things a bit.)
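A tiny sketch of the behavior described above, just to make the additive-vs-multiplicative distinction concrete; the specific constants and the function shape are assumptions on my part, not LessWrong's actual implementation:

```python
def adjusted_karma(karma, post_tags, filters):
    """Hypothetical sketch: "+" filters add a fixed karma bonus, while "-"
    filters multiply karma by a constant below 1 (constants are made up)."""
    PLUS_BONUS = 25      # assumed fixed bonus for "+" tags
    MINUS_FACTOR = 0.5   # assumed multiplier for "-" tags
    for tag in post_tags:
        if filters.get(tag) == "+":
            karma += PLUS_BONUS    # boosts even very low-karma posts a lot
        elif filters.get(tag) == "-":
            karma *= MINUS_FACTOR  # scales with the post's existing karma
    return karma
```

This is why a multiplier (e.g., AI * 1.5) would behave differently from "+": it preserves the ranking signal from karma instead of surfacing random low-karma posts.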
So this might be a good suggestion for a ...
I don't understand. Can't people just hide posts tagged as AI?
Another way to put this is that strategy stealing might not work due to technical alignment difficulties or for other reasons and I'm not sold the other reasons I've heard so far are very lethal. I do think the situation might really suck though with e.g. tons of people dying of bioweapons and with some groups that aren't sufficiently ruthless or which don't defer enough to AIs getting disempowered.
Yes, it is relevant for untrusted monitoring. I think the basic version of this is pretty resolvable via:
We discuss in more detail here.
I predict that group A survives, but the humans are no longer in power. I think this illustrates the basic dynamic. ETA: Do you understand what I'm getting at? Can you explain what you think is wrong with thinking of it this way?
I think something like this is a reasonable model but I have a few things I'd change.
Whichever group has more power at the end of the week survives.
Why can't both groups survive? Why is it winner takes all? Can we just talk about the relative change in power over the week? (As in, how much does the power of B reduce relat...
I view the world today as highly dysfunctional in many ways: corruption, coordination failures, preference falsification, coercion, inequality, etc. are rampant. This state of affairs both causes many bad outcomes and many aspects are self-reinforcing. I don't expect AI to fix these problems; I expect it to exacerbate them.
Sure, but these things don't result in non-human entities obtaining power, right? Like, usually these are somewhat negative-sum, but mostly just involve inefficient transfer of power. I don't see why these mechanisms would on net tran...
The way the isoFLOPs are shaped suggests that 90-95% sparsity might turn out to be optimal here, that is you can only get worse loss with 98+% sparsity with 1e20 FLOPs, however you vary the number of epochs and active params!
I'm currently skeptical and more minimally, I don't understand the argument you're making. Probably not worth getting into.
I do think there will be a limit to how sparse you want to go, even in the very-high-compute-relative-to-data regime, for various reasons (computational if nothing else). I don't see how these graphs support 90-95% s...
It did OK at control.
So compute optimal MoEs don't improve data efficiency, don't contribute to mitigating data scarcity.
I agree compute optimal MoEs don't improve data utilization. But, naively you might expect that MoEs can be used to reduce issues with data scarcity at a fixed level of compute by training a much bigger model on a fixed amount of data.
As in, because there are returns to both more data and bigger models, you can use MoE to effectively use a much bigger model at the same compute.
Like, maybe you would have trained llama-3-405B on 15T tokens. You could instea...
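A rough numerical illustration of the point above; all numbers are hypothetical and the 6*N*D cost rule is only an approximation:

```python
# Training FLOPs scale roughly as 6 * active_params * tokens, so at fixed data
# and fixed compute an MoE can have far more *total* parameters than a dense model.
tokens = 15e12                            # fixed dataset, e.g. ~15T tokens
dense_params = 405e9                      # dense model: every parameter is active
moe_active_params = 405e9                 # MoE with the same active params per token
moe_total_params = 8 * moe_active_params  # assumed 8x expert expansion (made up)

flops_dense = 6 * dense_params * tokens
flops_moe = 6 * moe_active_params * tokens
print(flops_dense == flops_moe)  # True: same training compute and data, much bigger total model
```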
This is overall output throughput, not latency (which would be output tokens per second for a single context).
a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full, 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput
This just claims that you can run a bunch of parallel instances of R1.
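To make the throughput/latency distinction concrete, here's a toy calculation; the batch size is an assumption of mine, since the quote only gives the aggregate number:

```python
# Toy illustration: aggregate throughput vs. per-context generation speed.
aggregate_tokens_per_sec = 3872   # figure quoted above for the 8xH200 server
concurrent_requests = 64          # hypothetical batch size, not stated in the quote
per_request_tokens_per_sec = aggregate_tokens_per_sec / concurrent_requests
print(per_request_tokens_per_sec)  # ~60 tokens/sec per context under this assumption
```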
Discussion in this other comment is relevant:
...Anthropic releasing their RSP was an important change in the AI safety landscape. The RSP was likely a substantial catalyst for policies like RSPs—which contain if-then commitments and more generally describe safety procedures—becoming more prominent. In particular, OpenAI now has a beta Preparedness Framework, Google DeepMind has a Frontier Safety Framework but there aren't any concrete publicly-known policies yet, many companies agreed to the Seoul commitments which require making a similar policy, and SB-10
Importantly, in the Dictator's Handbook case, some humans do actually get the power.
Yes, that is an accurate understanding.
(Not sure what you mean by "This would explain Sonnet's reaction below!")
Agreed, downvoted my comment. (You can't strong downvote your own comment, or I would have done that.)
I was mostly just trying to point to prior arguments against similar arguments while expressing my view.
(I don't expect o3-mini is a much better agent than 3.5 Sonnet New out of the box, but probably a hybrid scaffold with o3 + 3.5 Sonnet will be substantially better than 3.5 Sonnet alone. Just o3 might also be very good. Putting aside cost, I think o1 is usually better than o3-mini on open-ended programming agency tasks.)