anaguma - LessWrong

I wonder how you found out about the Curve conference? I’ve been to a few fun Lighthaven events (e.g. Manifest), and I’m looking for a way to be alerted to new ones.

keltan's Shortform

anaguma5d10

What signal do we get from DeepSeek continuing to publish?

Daniel Kokotajlo's Shortform

anaguma9d10

I wonder by what factor do you think automated AI R&D speeds up research? How large is the effective compute overhang made possible by the discovery of better architectures and training methods?

Why Don't We Just... Shoggoth+Face+Paraphraser?

anaguma14d100

I'm confused about why we would expect the misaligned behavior to be displayed in the Shoggoth rather than the Face. For example, let's take the setting from the AI control paper, where a misaligned model is trying to insert a backdoor in a coding question, despite it passing a series of unit tests. This could be incentivized in the training loop if e.g. the reward supervision is weak and the backdoor grants the model access to additional resources/compute. I can see a few strategies succeeding:

The Shoggoth speculates on different ways to insert a backdoor, and the Face writes a solution using one of these backdoors.
The Shoggoth outputs the correct solution to the coding question, and the Face modifies it to insert a backdoor.
Some combination of 1) and 2), for instance the Shoggoth writes several versions of the function, one of which contains a backdoor, and the Face returns the version with the backdoor.

All of these result in the Face outputting potentially same backdoored function, and should be equally rewarded by the reward model. Why should we expect that 2) is less likely than 1) or 3)?

Why Don't We Just... Shoggoth+Face+Paraphraser?

anaguma14d50

I wonder if you could say more about how the training pipeline would work. E.g. if the reward model is applied to the outputs of the face, how do we train the Shoggoth to produce useful CoTs for the face? Is the idea to fix the Face, sample many CoTs from the Shoggoth, and fine-tune the Shoggoth on the ones which achieve high reward according to the Face's output?

AI #89: Trump Card

anaguma23d32

So $1 buys 7e17 useful FLOPs, or inference with 75-120B^[1] active parameters for 1 million tokens.

Is this right? My impression was that the 6ND (or 9.6 ND) estimate was for training, not inference. E.g. in the original scaling law paper, it states

C ~ 6 NBS – an estimate of the total non-embedding training compute, where B is the batch size, and S is the number of training steps (ie parameter updates).

Bogdan Ionut Cirstea's Shortform

anaguma1mo60

An important caveat to the data movement limit:

“A recent paper which was published only a few days before the publication of our own work, Zhang et al. (2024), finds a scaling of B = 17.75 D^0.47 (in units of tokens). If we rigorously take this more aggressive scaling into account in our model, the fall in utilization is pushed out by two orders of magnitude; starting around 3e30 instead of 2e28. Of course, even more aggressive scaling might be possible with methods that Zhang et al. (2024) do not explore, such as using alternative optimizers.”

I haven’t looked carefully at Zhang et al., but assuming their analysis is correct and the data wall is at 3e30 FLOP, it’s plausible that we hit resource constraints ($10-100 trillion training runs, 2-20 TW power required) before we hit the data movement limit.

Bogdan Ionut Cirstea's Shortform

anaguma1mo100

The post argues that there is a latency limit at 2e31 FLOP, and I've found it useful to put this scale into perspective.

Current public models such as Llama 3 405B are estimated to be trained with ~4e25 flops , so such a model would require 500,000 x more compute. Since Llama 3 405B was trained with 16,000 H-100 GPUs, the model would require 8 billion H-100 GPU equivalents, at a cost of $320 trillion with H-100 pricing (or ~$100 trillion if we use B-200s). Perhaps future hardware would reduce these costs by an order of magnitude, but this is cancelled out by another factor; the 2e31 limit assumes a training time of only 3 months. If we were to build such a system over several years and had the patience to wait an additional 3 years for the training run to complete, this pushes the latency limit out by another order of magnitude. So at the point where we are bound by the latency limit, we are either investing a significant percentage of world GDP into the project, or we have already reached ASI at a smaller scale of compute and are using it to dramatically reduce compute costs for successor models.

Of course none of this analysis applies to the earlier data limit of 2e28 flop, which I think is more relevant and interesting.

Bogdan Ionut Cirstea's Shortform

anaguma1mo12

Interesting paper, though the estimates here don’t seem to account for Epoch’s correction to the chinchilla scaling laws: https://epochai.org/blog/chinchilla-scaling-a-replication-attempt

This would imply that the data movement bottleneck is a bit further out.

Dario Amodei — Machines of Loving Grace

anaguma2mo65

Maybe he thinks that much faster technological progress would cause social problems and thus wouldn’t be implemented by an aligned AI, even if it were possible. Footnote 2 points at this:

“ I do anticipate some minority of people’s reaction will be ‘this is pretty tame’… But more importantly, tame is good from a societal perspective. I think there’s only so much change people can handle at once, and the pace I’m describing is probably close to the limits of what society can absorb without extreme turbulence. ”

A separate part of the introduction argues that causing this extreme societal turbulence would be unaligned:

“Many things cannot be done without breaking laws, harming humans, or messing up society. An aligned AI would not want to do these things (and if we have an unaligned AI, we’re back to talking about risks). Many human societal structures are inefficient or even actively harmful, but are hard to change while respecting constraints like legal requirements on clinical trials, people’s willingness to change their habits, or the behavior of governments. ”

LESSWRONG
is fundraising!
LW
$

Posts

Wiki Contributions

Comments