All of ZeroRelevance's Comments + Replies

The population growth problem should be somewhat addressed by healthspan extension. A big reason why people aren't having kids now is that they lack the resources - be it housing, money, or time. If we could extend the average healthspan by a few decades, then older people who have spent enough time working to accumulate those resources, but would otherwise be too old to raise children, should now be able to have kids. Moreover, it means that people who already have many kids but have just become too old will also be able to have more. For those reasons, I don't t... (read more)

Sorry for the late reply, but yeah, it was mostly vibes based on what I'd seen before. I've been looking over the benchmarks in the Technical Report again though, and I'm starting to feel like 500B+10T isn't too far off. Although the language benchmarks are fairly similar, the improvements in mathematical capabilities over the previous SOTA are much larger than I first realised, and seem to match a model of that size, judging by the performance of the conventionally trained PaLM and its derivatives.

Apparently all OPT models were trained with a 2k token context length. Based on this, assuming basic O(n^2) scaling, an 8k token version of the 175B model would have the attention stage account for about 35% of the FLOPs, and a 32k token version would have it account for almost 90% of the FLOPs. 8k tokens is somewhat excusable, but 32k tokens is still overwhelmingly significant even with a 175B parameter model, costing around 840% more compute than a 2k token model. That percentage will probably only drop to a reasonable level at around the 10T parameter model level,... (read more)
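
One way to sanity-check numbers like these is a Kaplan-style per-token FLOPs accounting. The sketch below uses OPT-175B's published 96 layers and 12288 hidden size; everything else is a simplifying assumption, and the resulting percentages are quite sensitive to which operations you count (projections, softmax, backward pass), so different accountings give noticeably different attention shares.

```python
# Rough estimate of attention's share of per-token FLOPs for a GPT-style model.
# Kaplan-style accounting (an assumption, not the exact accounting used above):
#   parameter matmuls : ~2 * N FLOPs per token (forward pass)
#   attention         : ~4 * n_layer * n_ctx * d_model FLOPs per token
#                        (QK^T scores plus attention-weighted values)

def attention_fraction(n_params, n_layer, d_model, n_ctx):
    param_flops = 2 * n_params                  # per-token FLOPs from parameter matmuls
    attn_flops = 4 * n_layer * n_ctx * d_model  # per-token FLOPs from the attention stage
    return attn_flops / (param_flops + attn_flops)

# OPT-175B's published config: 96 layers, hidden size 12288.
for n_ctx in (2048, 8192, 32768):
    frac = attention_fraction(n_params=175e9, n_layer=96, d_model=12288, n_ctx=n_ctx)
    print(f"n_ctx={n_ctx:>6}: attention ~{frac:.1%} of per-token forward FLOPs")
```

The key structural point either way: per-token attention cost grows linearly in context length (quadratically per sequence), while per-token parameter cost stays fixed, so the attention share only becomes negligible again once the parameter count grows much faster than the context window.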

I always get annoyed when people use this as an example of 'lacking intelligence'. Though it certainly is in part an issue with the model, the primary reason for this failure is much more likely the tokenization process than anything else. A GPT-4, or likely even a GPT-3, trained with character-level tokenization would probably have zero issues answering these questions. It's for the same reason that the base GPT-3 struggled so much with rhyming, for instance.
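
For a concrete sense of what the model actually sees, here's a minimal sketch using the tiktoken library (assuming it's installed; the specific word is just an example): the text arrives as a handful of opaque multi-character token IDs, so answering character-level questions means reasoning about letters the model never observes as separate symbols.

```python
# Minimal illustration of why character-level questions are hard for
# BPE-tokenized models: the model sees token IDs, not individual letters.
# Requires the `tiktoken` package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a GPT-4-era encoding

word = " strawberry"  # leading space, as the word would appear mid-sentence
ids = enc.encode(word)
pieces = [enc.decode([i]) for i in ids]

print(ids)     # a short list of integer IDs
print(pieces)  # the word split into a few multi-character chunks
# Counting a particular letter requires recovering character structure
# that was collapsed away before the model ever saw the input.
```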

1Bezzi
Independently of the root causes of the issue, I am still very reluctant to call something "superintelligent" when it cannot reliably count to three.

According to the Chinchilla paper, a compute-optimal model of that size should have ~500B parameters and have been trained on ~10T tokens. Based on GPT-4's demonstrated capabilities though, that's probably an overestimate.
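
For reference, here's a minimal back-of-the-envelope sketch of that Chinchilla-style estimate, using the common C ≈ 6ND approximation for training compute and the paper's roughly 20-tokens-per-parameter rule of thumb; the compute budget plugged in below is just the one implied by a 500B/10T run, not a claim about GPT-4's actual budget.

```python
# Back-of-the-envelope Chinchilla-style sizing.
# Assumptions: training compute C ~= 6 * N * D (N params, D tokens),
# and the compute-optimal ratio D ~= 20 * N from the Chinchilla paper.
import math

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    # Solve C = 6 * N * (tokens_per_param * N) for N.
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Placeholder budget: the compute a ~500B-param / ~10T-token run would use.
C = 6 * 500e9 * 10e12
N, D = chinchilla_optimal(C)
print(f"N ~ {N/1e9:.0f}B params, D ~ {D/1e12:.1f}T tokens")
```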

2Lukas Finnveden
Are you saying that you would have expected GPT-4 to be stronger if it was 500B+10T? Is that based on benchmarks/extrapolations or vibes?
4sairjy
Yeah, agreed. I think it would make sense that it's trained on 10x-20x the number of tokens of GPT-3, so around 3-5T tokens (2x-3x Chinchilla), and that would give around 200-300B parameters given those laws.