(note: Quickly written. I've attempted to number my arguments "formally", but I have no training in this format. Edits/suggestions welcome.)
1. Ability to generalize is "lumpy": unpredictable -- sometimes lots of inputs lead to little progress, sometimes small inputs lead to lots of progress.
2. Takeoff is threshold-based: when machines get enough ability to generalize, takeoff will happen.
thus
3. As we increase the ability of machines to generalize, we have a high chance of putting in a small amount of input and getting over the takeoff threshold.
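(A toy illustration of how the two premises combine, not evidence for either. Everything in it is my own assumption: the threshold of 100 "capability units", the Pareto distribution standing in for "lumpy", and the function names. Both processes gain one unit of progress per unit of input on average; the sketch measures how far below the threshold each one was just before the single input that crossed it.)

```python
import random

THRESHOLD = 100.0  # hypothetical takeoff threshold, in arbitrary "capability" units


def smooth_step(rng):
    # Smooth world: every unit of input yields exactly one unit of progress.
    return 1.0


def lumpy_step(rng, alpha=1.1):
    # Lumpy world (assumed heavy-tailed): Pareto-distributed progress,
    # rescaled so the mean is also 1.0. Most inputs yield little, a few yield a lot.
    return rng.paretovariate(alpha) * (alpha - 1.0) / alpha


def gap_before_crossing(draw_step, rng, threshold=THRESHOLD):
    """Fraction of the threshold still missing just before the input that crosses it."""
    level = 0.0
    while True:
        step = draw_step(rng)
        if level + step >= threshold:
            return (threshold - level) / threshold
        level += step


def mean_gap(draw_step, trials=1000, seed=0):
    # Average the pre-crossing gap over many independent runs.
    rng = random.Random(seed)
    return sum(gap_before_crossing(draw_step, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("smooth:", mean_gap(smooth_step, trials=1))
    print("lumpy: ", mean_gap(lumpy_step))
```

In the smooth world the system is always 1% short of the threshold right before crossing, so you see it coming. In the lumpy world the average shortfall is larger, because some runs jump over the threshold from well below it on a single input. How much larger depends entirely on the assumed distribution.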
Argument for claim 1:
4. Lumpy input->output curves exist in individual domains as a result of forming new and powerful abstractions in those domains.
5. Cognitive architecture components generalize across domains as in (4).
6. Cognitive architecture is itself a "domain" as in (4) and thus lumpy.
thus
7. (restated) If we get a new abstraction in the form of a cognitive architecture component that generalizes across domains, we will see rapid cross-domain progress, and thus ability to generalize is lumpy.
Evidence for claim 4 — "Narrow" lumpy generality — specifically, new and powerful abstraction gives rise to abrupt domain-specific performance improvements:
- Anecdotally, humans have a lumpy learning experience. It is common for people to talk about "Aha moments", "eureka moments" and so on.
- AlphaGo Zero seems relevant. I have trouble pointing at precisely how, but it convinces me regardless.
- The "Grokking" paper (thanks to Quintin Pope and other commenters!)
Evidence for claim 5 — cognitive architecture components generalize:
- Chimps -> humans: Humans' cognitive architecture is very general and allows us to quickly form a broad set of domain-specific abstractions that perform extremely well with little energy.
Evidence for claim 2 — takeoff is threshold-based:
- Chimps -> humans: chimps didn't take over the world but humans did.
- (The usual arguments for hardware overhang, ability to scale and copy what works, etc.)
Thanks! This is very helpful, and yes, I did mean to refer to grokking! Will update the post.