I'm curious how good current robots are compared to where they'd need to be to automate the biggest bottlenecks in further robot production. You say we start from 10,000/year, but is it plausible that all current robots are too clumsy/incapable for many key bottleneck tasks, and that getting to 10,000 sufficiently good robots produced per year might be a long path - e.g. it would take a decade+ for humans? Or are current robots close to sufficient with good enough software?
I also imagine that even taking current robot production processes, the gap between ...
Yeah that's right, I made too broad a claim and only meant to say it was an argument against their ability to pose a threat as rogue independent agents.
I think in our current situation shutting down all rogue AI operations might be quite tough (though limiting their scale is certainly possible, and I agree with the critique that, absent slowdowns, alignment problems, regulatory obstacles, etc., it would be surprising if these rogue agents could compete with legitimate actors with many more GPUs).
Assuming the AI agents have money, there are maybe three remaining constraints they need to deal with:
I did some BOTECs on this and think 1 GB/s is sort of borderline: it probably works, but not obviously.
E.g. I assumed an MoE model that's ~10 TB at fp8, with a sparsity factor of 4 and a hidden size of 32768.
With 32 kB per token (the 32768-wide hidden state at one byte per fp8 activation) you could send at most ~30k tokens/second over a 1 GB/s interconnect. Not quite sure what a realistic utilization would be, but maybe we halve that to 15k?
If the model was split across 20 8xH100 boxes, then each box might do ~250 GFLOP/token (2 * 10T parameters / (4 * 20)), so each box would do at most 3.75 PFLOP/second, which might be about ~20-25% utilization.
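For concreteness, here's the BOTEC as a small Python script. The inputs are the assumptions above, plus one of my own: ~2 PFLOP/s of fp8 per H100, i.e. ~16 PFLOP/s per box, which is roughly what the 20-25% figure implies.

```python
# BOTEC sketch of the numbers above. All inputs are the assumptions stated
# in the comment; the per-box peak FLOPs is my own added assumption.

TOTAL_PARAMS = 10e12      # ~10T parameters; at fp8 that's ~10 TB of weights
SPARSITY = 4              # MoE sparsity factor: 1/4 of params active per token
HIDDEN_SIZE = 32768       # hidden dimension
BYTES_PER_ACT = 1         # fp8 activations are 1 byte each
N_BOXES = 20              # model split across 20 8xH100 boxes
INTERCONNECT_BPS = 1e9    # 1 GB/s between boxes
BOX_PEAK_FLOPS = 16e15    # assumed ~2 PFLOP/s fp8 per H100, 8 per box

bytes_per_token = HIDDEN_SIZE * BYTES_PER_ACT        # 32 kB per token
max_tok_per_s = INTERCONNECT_BPS / bytes_per_token   # ~30k tokens/s ceiling
realistic_tok_per_s = max_tok_per_s / 2              # halve for link utilization

flop_per_token_per_box = 2 * TOTAL_PARAMS / (SPARSITY * N_BOXES)  # ~250 GFLOP
box_flops = realistic_tok_per_s * flop_per_token_per_box          # ~3.8 PFLOP/s
print(f"tokens/s: {realistic_tok_per_s:,.0f}")
print(f"compute utilization per box: {box_flops / BOX_PEAK_FLOPS:.0%}")  # ~24%
```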
Things that distinguish an "RSP" or "RSP-type commitment" for me (though as with most concepts, something could lack a few of the items below and still overall seem like an "RSP" or a "draft aiming to eventually be an RSP"):
Yeah, I agree that the lack of agency skills is an important part of the remaining human<>AI gap, and that it's possible that this won't be too difficult to solve (and that this could then lead to rapid further recursive improvements). I was just pointing toward evidence that there is a gap at the moment, and that current systems are poorly described as AGI.
I agree the term AGI is rough and might be more misleading than it's worth in some cases. But I do quite strongly disagree that current models are 'AGI' in the sense most people intend.
Examples of very important areas where 'average humans' plausibly do way better than current transformers:
Current AIs suck at agency skills. Put a bunch of them in AutoGPT scaffolds and give them each their own computer and access to the internet and contact info for each other and let them run autonomously for weeks and... well I'm curious to find out what will happen, I expect it to be entertaining but not impressive or useful. Whereas, as you say, randomly sampled humans would form societies and find jobs etc.
This is the common thread behind all your examples, Hjalmar. Once we teach our AIs agency (i.e. once they have lots of training-experience operating autonomously...
They do, as far as I can tell, commit to a fairly strong sort of "timeline" for implementing these things: before they scale to ASL-3 capable models (i.e. ones that pass their evals for autonomous capabilities or misuse potential).
ARC evals has only existed since last fall, so for obvious reasons we have not evaluated very early versions. Going forward I think it would be valuable and important to evaluate models during training or to scale up models in incremental steps.
I work at ARC evals, and mostly the answer is that this was sort of a trial run.
For some context, ARC evals spun up as a project last fall, and collaborations with labs are very much in their infancy. We tried to be very open about limitations in the system card, and I think the takeaways from our evaluations so far should mostly be that "It seems urgent to figure out the techniques and processes for evaluating dangerous capabilities of models; several leading scaling labs (including OpenAI and Anthropic) have been working with ARC on early attempts to do t...
I might be missing something but are they not just giving the number of parameters (in millions of parameters) on a log10 scale? Scaling laws are usually by log-parameters, and I suppose they felt that it was cleaner to subtract the constant log(10^6) from all the results (e.g. taking log(1300) instead of log(1.3B)).
The B they put at the end is a bit weird though.
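If that reading is right, the axis value for, say, a 1.3B-parameter model would be computed roughly like this (a sketch of the suspected convention, not their actual plotting code):

```python
import math

# Suspected convention: parameter count in millions, on a log10 scale.
def axis_value(n_params: float) -> float:
    return math.log10(n_params / 1e6)

print(axis_value(1.3e9))  # log10(1300) ≈ 3.11 for a 1.3B-parameter model
```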
As someone who has been feeling increasingly skeptical of working in academia, I really appreciate this post and the discussion on it for challenging some of my thinking here.
I do want to respond especially to this part though, which seems cruxy to me:
...Furthermore, it is a mistake to simply focus efforts on whatever timelines seem most likely; one should also consider tractability and neglectedness of strategies that target different timelines. It seems plausible that we are just screwed on short timelines, and somewhat longer timelines are more tractable...
Strongly agree with this, I think this seems very important.
These sorts of problems are what caused me to want a presentation which didn't assume well-defined agents and boundaries in the ontology, but I'm not sure how it applies to the above - I am not looking for optimization as a behavioral pattern but as a concrete type of computation, which involves storing world-models and goals and doing active search for actions which further the goals. Neither a thermostat nor the world outside seems to do this from what I can see? I think I'm likely missing your point.
Theron Pummer has written about this precise thing in his paper on Spectrum Arguments, where he touches on this argument for "transitivity=>comparability" (here notably used as an argument against transitivity rather than an argument for comparability) and its relation to 'Sorites arguments' such as the one about sand heaps.
Personally I think the spectrum arguments are fairly convincing for making me believe in comparability, but I think there's a wide range of possible positions here and it's not entirely obvious which are actually inconsistent. Pummer
...Understanding the internal mechanics of corrigibility seems very important, and I think this post helped me get a more fine-grained understanding and vocabulary for it.
I've historically strongly preferred the type of corrigibility which comes from pointing to the goal and letting it be corrigible for instrumental reasons, I think largely because it seems very elegant and that when it works many good properties seem to pop out 'for free'. For instance, the agent is motivated to improve communication methods, avoid coercion, tile properly and even possibly i
...I really like this model of computation and how naturally it deals with counterfactuals; I'm surprised it isn't talked about more often.
This raises the issue of abstraction - the core problem of embedded agency.
I'd like to understand this claim better - are you saying that the core problem of embedded agency is relating high-level agent models (represented as causal diagrams) to low-level physics models (also represented as causal diagrams)?
I wonder if you can extend it to also explain non-agentic approaches to Prosaic AI Alignment (and why some people prefer those).
I'm quite confused about what a non-agentic approach actually looks like, and I agree that extending this to give a proper account would be really interesting. A possible argument for actively avoiding 'agentic' models from this framework is:
Maybe distracting technicality:
This seems to make the simplifying assumption that the R&D automation is applied to a large fraction of all the compute that was previously driving algorithmic progress, right?
If we imagine that a company only owns 10% of the compute being used to drive algorithmic progress pre-automation (and is only responsible for say 30% of its own algorithmic progress, with the rest coming from other labs/academia/open-source), and this company is the only one automating their AI R&D, then the effect on overall progress might be r...
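As a toy illustration of that dilution, using the illustrative shares above and a made-up 10x speedup on the automated R&D:

```python
# Toy dilution BOTEC. The shares are the illustrative numbers from the
# comment; the 10x speedup on automated R&D is a made-up assumption.

speedup = 10.0  # hypothetical multiplier on the company's own R&D output

# From the company's perspective: only 30% of its progress is self-generated,
# the rest arrives from other labs/academia/open-source at the old 1x rate.
own_share = 0.3
company_rate = own_share * speedup + (1 - own_share)
print(f"company's progress: {company_rate:.1f}x (vs naive {speedup:.0f}x)")  # 3.7x

# From the world's perspective: the company drives only 10% of the compute
# behind overall algorithmic progress, and nobody else automates.
world_share = 0.1
world_rate = world_share * speedup + (1 - world_share)
print(f"overall progress: {world_rate:.1f}x")  # 1.9x
```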