User Comment Replies

We are headed into an extreme compute overhang

This scenario now seems less likely given OpenAI's o-series models. It seems we might first reach AGI with heavy inference compute costs, which would mean much less overhang.

Orienting to 3 year AGI timelines

devrandom6mo103

The post doesn't seem to contemplate the effect that open-weights models will have on take-off dynamics. For example, the DeepSeek V3 release seems to show that whatever performance is achieved at the frontier is then achieved in open weights at a much lower cost.

Given that, the centralization forces might not dominate.

Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety

devrandom10mo*12

There seem to be substantial problems with low probability events, coherent predictions over time, short term events, probabilities adding up to more than 100%, etc.

A probabilistic oracle being inconsistent is completely beside the point. If I have a probabilistic oracle that has high accuracy but is sometimes inconsistent, I can just post-process the predictions to force them into a consistent format. For example, I can normalize the probabilities so that they sum to 100%.

The economic value is in the overall accuracy. Being consistent is a cosmetic consideration.
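The normalization step mentioned above can be sketched as follows. This is a minimal illustration, assuming the forecasts cover a set of mutually exclusive and exhaustive outcomes; the function name `normalize` and the example numbers are hypothetical and not taken from the bot under discussion.

```python
def normalize(probs):
    """Rescale forecasts for mutually exclusive, exhaustive outcomes so they sum to 1."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return {outcome: p / total for outcome, p in probs.items()}


# Hypothetical raw forecasts that add up to 125%.
raw = {"A": 0.5, "B": 0.4, "C": 0.35}
consistent = normalize(raw)
print(consistent)
```

Rescaling preserves the relative ranking of the outcomes, so it only removes the inconsistency; it does not by itself make the forecasts more accurate.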

We are headed into an extreme compute overhang

devrandom1y10

New Transformer-specific chips from Etched are in the works. This might make inference even cheaper compared to compute.

We are headed into an extreme compute overhang

devrandom1y10

Post from Epoch AI about trading off training compute against inference compute.

We are headed into an extreme compute overhang

devrandom1y30

These are good points.

But don't the additional GPU requirements apply equally to training and inference? If that's the case, then the number of inference instances that can be run on training hardware (post-training) will still be on the order of 1e6.

1jacob_cannell1y

Not for transformers, for which training and inference are fundamentally different. Transformer training parallelizes over time, but that isn't feasible for inference. So transformer inference backends have to parallelize over batch/space, just like RNNs, which is enormously less efficient in RAM and RAM bandwidth use. So if you had a large attention model that uses say 1TB of KV cache (fast weights) and 1TB of slow weights, transformer training can often run near full efficiency, flop limited, parallelizing over time. But similarly fully efficient transformer inference would require running about K instances/agents in parallel, where K is the flop/mem_bw ratio (currently up to 1000 on H100). So that would be 1000 * 1TB of RAM for the KV cache (fast weights), as it is unique per agent instance. This, in a nutshell, is part of why we don't already have AGI. Transformers are super efficient at absorbing book knowledge, but just as inefficient as RNNs at inference (generating new experiences, which is a key bottleneck on learning from experience). Thus there is of course much research in more efficient long KV cache, tree/graph inference that can share some of the KV cache over similar branching agents, etc.
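The memory arithmetic in this reply can be written out as a rough sketch. The numbers are the ones quoted in the comment (1TB of KV cache per instance, 1TB of slow weights, K of about 1000); treating the slow weights as a single copy shared by all instances is an assumption made here for illustration, and `inference_ram_tb` is a hypothetical helper name.

```python
# Figures quoted in the comment above.
KV_CACHE_TB_PER_INSTANCE = 1.0   # fast weights, unique per agent instance
SLOW_WEIGHTS_TB = 1.0            # model weights
K = 1000                         # flop/mem_bw ratio, i.e. instances needed to stay flop limited


def inference_ram_tb(instances, kv_tb, weights_tb):
    """Total RAM in TB, assuming one shared copy of the slow weights (an assumption)."""
    return instances * kv_tb + weights_tb


total_tb = inference_ram_tb(K, KV_CACHE_TB_PER_INSTANCE, SLOW_WEIGHTS_TB)
print(f"RAM for flop-limited inference: {total_tb:.0f} TB")
print(f"RAM for a single instance:      {inference_ram_tb(1, 1.0, 1.0):.0f} TB")
```

Under these assumptions the per-instance KV cache dominates the total, which is why sharing it across similar agents is the research direction the reply points to.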

We are headed into an extreme compute overhang

devrandom1y10

https://www.lesswrong.com/posts/aH9R8amREaDSwFc97/rapid-capability-gain-around-supergenius-level-seems also seems relevant to this discussion.

We are headed into an extreme compute overhang

devrandom1y10

The main advantage is that you can immediately distribute fine-tunes to all of the copies. This is much higher bandwidth compared to our own low-bandwidth/high-effort knowledge dissemination methods.

The monolithic aspect may potentially be a disadvantage, but there are a couple of mitigations:

AGIs are by definition generalists
you can segment the population into specialists (see also this comment about MoE)

We are headed into an extreme compute overhang

devrandom1y10

I think this only holds if fine tunes are composable [...] you probably can't take a million independently-fine-tuned models and merge them [...]

The purpose of a fine-tune is to "internalize" some knowledge - either because it is important to have implicit knowledge of it, or because you want to develop a skill.

Although you may have a million instances executing tasks, the knowledge you want to internalize is likely much more sparse. For example, if an instance is tasked with exploring a portion of a search space, and it doesn't find a solution... (read more)

We are headed into an extreme compute overhang

devrandom1y10

On the other hand, the world already contains over 8 billion human intelligences. So I think you are assuming that a few million AGIs, possibly running at several times human speed (and able to work 24/7, exchange information electronically, etc.), will be able to significantly "outcompete" (in some fashion) 8 billion humans? This seems worth further exploration / justification.

Good point, but a couple of thoughts:

the operational definition of AGI referred to in the article is significantly stronger than the average human
the humans are poorly organized
t

... (read more)

2snewman1y

All of this is plausible, but I'd encourage you to go through the exercise of working out these ideas in more detail. It'd be interesting reading and you might encounter some surprises / discover some things along the way. Note, for example, that the AGIs would be unlikely to focus on AI research and self-improvement if there were more economically valuable things for them to be doing, and if (very plausibly!) there were not more economically valuable things for them to be doing, why wouldn't a big chunk of the 8 billion humans have been working on AI research already (such that an additional 1.6 million agents working on this might not be an immediate game changer)? There might be good arguments to be made that the AGIs would make an important difference, but I think it's worth spelling them out.

We are headed into an extreme compute overhang

devrandom1y10

Thank you, I missed it while looking for prior art.

Evolution Solved Alignment (what sharp left turn?)

devrandom2y10

If we haven't seen such an extinction in the archaeological record, it can mean one of several things:

misalignment is rare, or
misalignment is not rare once the species becomes intelligent, but intelligence is rare, or
intelligence usually results in transcendence, so there's only one transition before the bio becomes irrelevant in the lightcone (and we are it)

We don't know which. I think it's a combination of 2 and 3.

Introducing AlignmentSearch: An AI Alignment-Informed Conversional Agent

devrandom2y20

The app is not currently working - it complains about the token.

1BionicD0LPH1N2y

The API token was cancelled, sorry about that. The most recent version of the chatbot is now at https://chat.stampy.ai and https://chat.aisafety.info, and should not have the API token issue.

1William the Kiwi 2y

I have had a similar issue. The model outputs the comment: Incorrect API key provided: sk-5LW5C***************************************HGYd. You can find your API key at https://platform.openai.com/account/api-keys.

LOVE in a simbox is all you need

devrandom2y*30

and thus AGI arrives - quite predictably^[17] - around the end of Moore's Law

Given that the brain only consumes 20 W because of biological competitiveness constraints, and that 200 kW only costs around $20/hour in data centers, we can afford to be four OOMs less efficient than the brain while maintaining parity of capabilities. This results in AGI's potential arrival at least a couple of decades earlier than the end of Moore's Law.
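The "four OOMs" figure follows directly from the two power numbers in the comment. A quick check, using only the quoted values (20 W, 200 kW, $20/hour); the implied electricity price is a derived quantity, not something stated in the comment.

```python
import math

BRAIN_POWER_W = 20             # quoted brain power draw
DATACENTER_POWER_W = 200_000   # 200 kW, quoted data center budget
COST_PER_HOUR_USD = 20         # quoted cost of running 200 kW for one hour

# Ratio of power budgets and its order of magnitude.
ratio = DATACENTER_POWER_W / BRAIN_POWER_W
ooms = math.log10(ratio)

# Price per kWh implied by the quoted cost (derived, not from the source).
implied_usd_per_kwh = COST_PER_HOUR_USD / (DATACENTER_POWER_W / 1000)

print(f"power ratio: {ratio:.0f}x, about {ooms:.0f} orders of magnitude")
print(f"implied price: ${implied_usd_per_kwh:.2f} per kWh")
```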

LESSWRONG

All of devrandom's Comments + Replies