Software engineer and repeat startup founder; best known for Writely (aka Google Docs). Now starting https://www.aisoup.org to foster constructive expert conversations about open questions in AI and AI policy, and posting at https://amistrongeryet.substack.com and https://x.com/snewmanpv.
Thanks.
I'm now very strongly feeling the need to explore the question of what sorts of activities go into creating better models, what sorts of expertise are needed, and how that might change as things move forward. Which unfortunately I know ~nothing about, so I'll have to find some folks who are willing to let me pick their brains...
Thanks! I agree that my statements about Amdahl's Law primarily hinge on my misunderstanding of the milestones, as elucidated in the back-and-forth with Ryan. I need to digest that; as Ryan anticipates, possibly I'll wind up with thoughts worth sharing regarding the "human-only, software-only" time estimates, especially for the earlier stages, but it'll take me some time to chew on that.
(As a minor point of feedback, I'd suggest adding a bit of material near the top of the timelines and/or takeoff forecasts, clarifying the range of activities meant to be included in "superhuman coder" and "superhuman AI researcher", e.g. listing some activities that are and are not in scope. I was startled to see Ryan say "my sense is that an SAR has to be better than humans at basically everything except vision"; I would never have guessed that was the intended interpretation.)
I've (briefly) addressed the compute bottleneck question on a different comment branch, and "hard-to-automate activities aren't a problem" on another (confusion regarding the definition of various milestones).
[Dependence on Narrow Data Sets] is only applicable to the timeline to the superhuman coder milestone, not to takeoff speeds once we have a superhuman coder. (Or maybe you think a similar argument applies to the time between superhuman coder and SAR.)
I do think it applies, if indirectly. Most data relating to progress in AI capabilities comes from benchmarks of crisply encapsulated tasks. I worry this may skew our collective intuitions regarding progress toward broader capabilities, especially as I haven't seen much attention paid to exploring the delta between things we currently benchmark and "everything".
Hofstadter's Law As Prior
Math: We're talking about speed up relative to what the human researchers would have done by default, so this just divides both sides equally and cancels out.
This feels like one of those "the difference between theory and practice is smaller in theory than in practice" situations... Hofstadter's Law would imply that Hofstadter's Law applies here. :-)
For one concrete example of how that could manifest, perhaps there is a delay between "AI models exist that are superhuman at all activities involved in developing better models" and "those models have been fully adopted across the organization". Within a frontier lab, that specific delay might be immaterial; it's just meant as an existence proof that there's room for us to be missing things.
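To spell out why I don't think the cancellation is automatic (toy math, with made-up symbols): suppose the human-only time for some stage is T, and full automation would ideally cut it to T/k, but the accelerated path also incurs a fixed overhead D (adoption lag, integration work, whatever) that the human-only path doesn't. Then the realized multiplier is

$$\frac{T}{T/k + D} < k,$$

and the shortfall grows as k gets large. The cancellation only goes through if every source of "takes longer than expected" scales both paths by the same factor.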
I think my short, narrowly technical response to this would be "agreed".
Additional thoughts, which I would love your perspective on:
1. I feel like the idea that the human activities involved in creating better models are broader than just the stereotypical things an ML Ph.D. would do is under-explored. Elsewhere in this thread you say "my sense is that an SAR has to be better than humans at basically everything except vision." There's a lot to unpack there, and I don't think I've seen it discussed anywhere, including in AI 2027. Do the stereotypical things an ML Ph.D. would do constitute 95% of the work? 50%? Less? Does the rest mostly consist of other sorts of narrowly technical software work (coding, distributed systems design, etc.), or is there broad spillover into other areas of expertise, including non-STEM expertise? What does that look like? Etc.
(I try to make this point a lot, generally don't get much acknowledgement, and as a result have started to feel a bit like a crazy person. I appreciate you giving some validation to the idea. Please let me know if you suspect I've over-interpreted that validation.)
1a. Why "except vision"? Does an SAR have to be superhuman at creative writing, so that it can push forward creative writing capabilities in future models? (Obviously, substitute any number of other expertise domains for "creative writing".) If yes, then why doesn't it also need to be superhuman at vision (so that it can push forward vision capabilities)? If no, then presumably creative writing is one of the exceptions implied by the "basically" qualifier; what else falls in there?
2. "Superhuman AI researcher" feels like a very bad term for a system that is meant to be superhuman at the full range of activities involved in producing better models. It strongly suggests a narrower set of capabilities, thus making it hard to hold onto the idea that a broad definition is intended. Less critically, it also seems worthwhile to better define what is meant to fall within the umbrella of "superhuman coder".
3. As I read through AI 2027 and then wrote my post here, I was confused as to the breadth of skills meant to be implied by "superhuman coder" and (especially) "superhuman AI researcher", and probably did not maintain a consistent definition in my head, which may have confused my thinking.
4. I didn't spend much time evaluating the reasoning behind the estimated speedups at each milestone (5x, 25x, 250x, 2000x). I might have more to say after digging into that. If/when I find the time, that, plus the discussion we've just had here, might be enough grist for a followup post.
We now have several branches going, so I'm going to consolidate most of my response into just one branch, since they're converging on similar questions anyway. Here, I'll just address this:
But, when considering activities that aren't bottlenecked on the environment, then to achieve 10x acceleration you just need 10x more speed at the same level of capability.
I'm imagining that, at some intermediate stages of development, there will be skills for which AI does not even match human capability (for the relevant humans), and its outputs are of unusably low quality.
This is valid, but doesn't really engage with the specific arguments here. By definition, when we consider the potential for AI to accelerate the path to ASI, we are contemplating the capabilities of something that is not a full ASI. Today's models have extremely jagged capabilities, with lots of holes, and (I would argue) they aren't anywhere near exhibiting sophisticated high-level planning skills able to route around their own limitations. So the question becomes, what is the shape of the curve of AI filling in weak capabilities and/or developing sophisticated strategies for routing around those weaknesses?
Maybe just by training very small models very quickly, they can discover a ton of new technologies which can scale to large models.
This exactly misses my point. Training a cutting-edge model today involves a broad range of activities, not all of which fall under the heading of "discovering technologies" or "improving algorithms" or whatever. I am arguing that if all you can do is rapidly find better algorithms, that's valuable, but it's not going to speed up overall progress by very large factors. Also, it may be that "by training very small models very quickly", the AI would discover new technologies that improve some aspects of models but fail to advance other important aspects.
Sure, but for output quality better than what humans could (ever) do to matter for the relative speed up, you have to argue about compute bottlenecks, not Amdahl's law for just the automation itself!
I'm having trouble parsing this sentence... which may not be important – the rest of what you've said seems clear, so unless there's a separate idea here that needs responding to then it's fine.
It sounds like your actual objection is in the human-only, software-only time from superhuman coder to SAR (you think this would take more than 1.5-10 years).
Or perhaps your objection is that you think there will be a smaller AI R&D multiplier for superhuman coders. (But this isn't relevant once you hit full automation!)
Agreed that these two statements do a fairly good job of characterizing my objection. I think the discussion is somewhat confused by the term "AI researcher". Presumably, for an SAR to accelerate R&D by 25x, "AI researcher" needs to cover nearly all human activities that go into AI R&D? And even more so for SAIR/250x. While I've never worked at an AI lab, I presume that the full set of activities involved in producing better models is pretty broad, with tails extending into domains pretty far from the subject matter of an ML Ph.D. and sometimes carried out by people whose job titles and career paths bear no resemblance to "AI researcher". Is that a fair statement?
If "producing better models" (AI R&D) requires more than just narrow "AI research" skills, then either SAR and SAIR need to be defined to cover that broader skill set (in which case, yes, I'd argue that 1.5-10 years is unreasonably short for unaccelerated SC->SAR), or if we stick with narrower definitions for SAR and SAIR then, yes, I'd argue for smaller multipliers.
This is valid for activities which benefit from speed and scale. But when output quality is paramount, speed and scale may not always provide much help?
My mental model is that, for some time to come, there will be activities where AIs simply aren't very competent at all, such that even many copies running at high speed won't provide uplift. For instance, if AIs aren't in general able to make good choices regarding which experiments to run next, then even an army of very fast poor-experiment-choosers might not be worth much; we might still need to rely on people to choose experiments. Or if AIs aren't much good at evaluating strategic business plans, it might be hard to train AIs to be better at running a business (a component of the SAIR -> ASI transition) without relying on human input for that task.
For Amdahl's Law purposes, I've been shorthanding "incompetent AIs that don't become useful for a task even when taking speed + scale into account" as "AI doesn't provide uplift for that task".
EDIT: of course, in practice it's generally at least somewhat possible to trade speed + scale for quality, e.g. using consensus algorithms, or generate-and-test if you have a good way of identifying the best output. So a further refinement: very high acceleration requires assuming that this trade doesn't hit significantly diminishing returns across an important set of activities.
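To put a toy number on where the speed+scale-for-quality trade runs out: with generate-and-test, if each sample independently clears the required quality bar with probability p, and you can reliably pick out a good one, then

$$P(\text{at least one usable output in } n \text{ samples}) = 1 - (1-p)^n.$$

Scale rescues you quickly when p is even modest, but it provides no uplift at all when p = 0, i.e. when the model can't produce output of the needed quality at any sampling budget. (Toy model, obviously: real verification is imperfect, and correlated samples make things worse.)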
EDIT2:
(My sense is that the progress multipliers in AI 2027 are too high but also that the human-only times between milestones are somewhat too long. On net, this makes me expect somewhat slower takeoff with a substantial chance on much slower takeoff.)
I find this quite plausible.
Yes, but you're assuming that human-driven AI R&D is very highly bottlenecked on a single, highly serial task, which is simply not the case. (If you disagree: which specific narrow activity are you referring to that constitutes the non-parallelizable bottleneck?)
Amdahl's Law isn't just a bit of math; it's a bit of math coupled with long experience of how complex systems tend to decompose in practice.
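For concreteness, the version of the math I have in mind (illustrative numbers, not a forecast): if AI accelerates a fraction f of the work by a factor s and leaves the rest at human speed, the overall multiplier is

$$\text{speedup} = \frac{1}{(1 - f) + f/s}.$$

Even with f = 0.9 and s → ∞, that tops out at 10x; reaching 250x requires the un-accelerated residue to be well under half a percent of the total work. That's why I keep harping on the breadth of activities involved in producing better models.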
Thanks everyone for all the feedback and answers to my unending questions! The branching comments are starting to become too much to handle, so I'm going to take a breather and then write a followup post – hopefully by the end of the week but we'll see – in which I'll share some consolidated thoughts on the new (to me) ideas that surfaced here and also respond to some specific points.