I have the impression that within-lifetime human learning is orders of magnitude more sample efficient than large language model training, but there are several caveats:
- We don't have "an ecological evaluation objective" for language models (they weren't actually optimised for the downstream language usage tasks on which we compare them to humans)
- Insomuch as we do have an ecological evaluation objective (predictive loss on the test set) language models are already very superhuman and apparently even GPT-1 was superhuman at next token prediction
- Though for similar reasons, next token prediction is not an ecological training objective for humans
- Humans who specialised in next token prediction (the way some humans specialise in chess) may show markedly different results
- It's plausible that most of the optimisation involved in producing the brain happened over the course of our evolutionary history and within lifetime human learning is more analogous to fine tuning than to training from scratch.
The last caveat notwithstanding, I'm curious whether we have any robust estimates of how within lifetime human learning compares to deep learning on sample efficiency across various tasks of interest.
Why Does This Matter?
The brain is known to be very energy efficient compared to GPUs of comparable processing power.
However, energy efficiency is a much less binding constraint for human engineering than it was for biology (electricity has a much higher throughput than ATP, and we have a much larger energy budget). This relative energy abundance will likely persist (or rather intensify) as AI systems become more capable.
Thus, the energy efficiency of the brain does not provide much evidence with respect to whether advanced AGI will be neuromorphic.
On the other hand, it seems very plausible that data efficiency is just part and parcel of general intelligence. It may be the case that sufficiently powerful systems would necessarily be more data efficient than the brain (this seems very plausible to me).
If deep learning is sufficiently less data efficient than the brain, that may be evidence that deep learning won't produce existentially dangerous systems.
We may thus have reason not to expect deep learning to scale to superhuman general intelligence.
The average human lifespan is about 70 years, or approximately 2.2 billion seconds. The average human brain contains about 86 billion neurons and roughly 100 trillion synaptic connections. In comparison, something like GPT-3 has 175 billion parameters and a training corpus of roughly 500 billion tokens. Assuming, very crudely, that a weight is equivalent to a synapse and a token to a second of experience, the human ratio of parameters to data is much greater than GPT-3's: humans have far more parameters than timesteps (100 trillion to 2.2 billion), while GPT-3 has fewer parameters than timesteps (175 billion to 500 billion). Granted, the information gain per timestep differs between the two, but as I said, these are crude approximations meant to convey the ballpark relative difference.
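For anyone who wants to check the arithmetic, here is a minimal sketch of the comparison in Python. It only restates the figures quoted above and keeps the same crude assumptions (one synapse counted as one parameter, one second of experience counted as one token); the function and variable names are my own, and none of this is a measurement.

```python
# Back-of-envelope comparison using only the figures quoted in the comment.
# Crude assumptions, as stated there: 1 synapse ~ 1 parameter,
# 1 second of experience ~ 1 token.

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def params_per_timestep(parameters: float, timesteps: float) -> float:
    """Return how many parameters a learner has per timestep of data."""
    return parameters / timesteps


# Human: 70-year lifespan, roughly 100 trillion synapses.
human_seconds = 70 * SECONDS_PER_YEAR
human_synapses = 100e12

# GPT-3: 175 billion parameters, roughly 500 billion tokens.
gpt3_parameters = 175e9
gpt3_tokens = 500e9

human_ratio = params_per_timestep(human_synapses, human_seconds)
gpt3_ratio = params_per_timestep(gpt3_parameters, gpt3_tokens)

print(f"human lifetime in seconds: {human_seconds:.3g}")
print(f"human parameters per timestep: {human_ratio:,.0f}")
print(f"GPT-3 parameters per timestep: {gpt3_ratio:.2f}")
print(f"gap between the two ratios: {human_ratio / gpt3_ratio:,.0f}x")
```

Under these assumptions the human comes out at roughly 45,000 parameters per timestep against 0.35 for GPT-3, a gap of about five orders of magnitude. Changing the token/second equivalence moves the exact number a lot, but it would take a very large correction to close the gap.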
This basically means that humans are much more prone to overfitting the data and, in particular, to memorizing individual data points, which may be why humans have episodic memory of unique events. It's not clear that GPT-3 has enough parameters to memorize its training data with that level of clarity, and arguably this is why such models seem less sample efficient. A human can learn from a single example by memorizing it and retrieving it later when relevant. GPT-3 has to see it enough times in the training data for SGD to update the weights sufficiently that the general concept is embedded in the model's highly compressed representation.
It's thus not certain whether existing ML models are sample inefficient because of the algorithms being used or because they just don't have enough parameters yet, in which case increased efficiency would emerge from scaling further.