(note: Quickly written. I've attempted to number my arguments "formally", but I have no training in this format. Edits/suggestions welcome.)
1. Ability to generalize is "lumpy" (unpredictable): sometimes a lot of input leads to little progress, and sometimes a small input leads to a lot of progress.
2. Takeoff is threshold-based: once machines have enough ability to generalize, takeoff will happen.
thus
3. As we increase the ability of machines to generalize, we have a high chance of putting in a small amount of input and getting over the takeoff threshold.
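To make the step from the two premises to the conclusion concrete, here is a toy sketch (not a model of real AI progress). It assumes that "lumpy" can be stood in for by heavy-tailed gains per unit of input; the Pareto distribution, the `alpha` value, the threshold of 100, and all function names are illustrative assumptions of mine, not part of the argument. It compares how much of the total capability at the moment of crossing comes from the single step that crosses the threshold.

```python
import random


def crossing_share(gains, threshold):
    """Share of total capability contributed by the step that crosses the threshold.

    Returns None if the threshold is never reached.
    """
    total = 0.0
    for gain in gains:
        total += gain
        if total >= threshold:
            return gain / total
    return None


def smooth_gains(n):
    # Smooth world: every unit of input yields the same gain.
    return [1.0] * n


def lumpy_gains(n, rng, alpha=1.1):
    # Lumpy world (assumption): heavy-tailed gains, so most inputs yield
    # little and a few yield a lot. paretovariate returns values >= 1,
    # so subtracting 1 gives a non-negative gain.
    return [rng.paretovariate(alpha) - 1.0 for _ in range(n)]


def mean_lumpy_share(trials, n, threshold, seed=0):
    # Average the crossing share over many independent lumpy runs.
    rng = random.Random(seed)
    shares = []
    for _ in range(trials):
        share = crossing_share(lumpy_gains(n, rng), threshold)
        if share is not None:
            shares.append(share)
    return sum(shares) / len(shares)


if __name__ == "__main__":
    print("smooth:", crossing_share(smooth_gains(200), 100.0))
    print("lumpy (mean over runs):", mean_lumpy_share(1000, 1000, 100.0))
```

In the smooth world the crossing step contributes exactly 1% of the total, so you can see the threshold coming. In the lumpy world the crossing step can contribute a much larger share, which is the sense in which a small input might carry us over the threshold. How large that share is depends entirely on the assumed distribution.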
Argument for claim 1:
4. Lumpy input->output curves exist in individual domains as a result of forming new and powerful abstractions in those domains.
5. Cognitive architecture components generalize across domains as in (4).
6. Cognitive architecture is itself a "domain" as in (4) and thus lumpy.
thus
7. (restated) If we get a new abstraction in the form of a cognitive architecture component that generalizes across domains, we will see rapid cross-domain progress, and thus ability to generalize is lumpy.
Evidence for claim 4 ("narrow" lumpy generality: a new and powerful abstraction gives rise to abrupt domain-specific performance improvements):
8. Anecdotally, humans have a lumpy learning experience. It is common for people to talk about "aha moments", "eureka moments", and so on.
9. AlphaGo Zero seems somehow relevant (I have trouble pointing at precisely how, but it convinces me regardless).
10. The "Grokking" paper (thanks to Quintin Pope and other commenters!)
Evidence for claim 5 — cognitive architecture components generalize:
11. Chimps -> humans: Humans' cognitive architecture is very general and allows us to quickly form a broad set of domain-specific abstractions that perform extremely well with little energy.
Evidence for claim 2 — takeoff is threshold-based:
12. Chimps -> humans: chimps didn't take over the world but humans did.
13. (The usual arguments for hardware overhang, ability to scale and copy what works, etc.)
This has been my vague intuition as well, and I'm confused as to where exactly people think this argument goes wrong. So I would appreciate some rebuttals to this.
For 9, are you thinking of grokking?
Thanks! Am probably convinced by the third point, unsure about the others due to not having much time to think at the moment.