Consider this abridged history of recent ML progress:
A decade or two ago, computer vision was a field that employed dedicated researchers who designed specific, increasingly complex feature recognizers (SIFT, SURF, HOG, etc.). These were usurped by deep CNNs with fully learned features in the 2010s[1], which subsequently saw success in speech recognition, various NLP tasks, and much of AI, competing with other general ANN models, namely various RNNs and LSTMs. Then SOTA in vision and NLP evolved separately towards increasingly complex architectures, until the simpler/more general transformers took over NLP and quickly spread to other domains (even RL), there also often competing with newer simple/general architectures arising within those domains, such as MLP-Mixers in vision. Waves of colonization in design-space.
So the pattern is: increasing human optimization power steadily pushing up architecture complexity is occasionally upset/reset by a new, simpler, more general model, where the new simple/general model substitutes automated machine optimization power for human optimization power[2], enabled by improved compute scaling, à la the bitter lesson. DL isn't just a new AI/ML technique; it's a paradigm shift.
Ok, fine, then what's next?
All of these models, from the earliest deep CNNs on GPUs up to GPT-3 and EfficientZero, generally have a few major design components that haven't much changed:
- Human-designed architecture, rather than learned or even SGD-learnable at all
- Human-designed backprop SGD variants (with only a bit of evolution from vanilla SGD to Adam & friends)
Obviously there are research tracks in DL, such as AutoML/architecture search and meta-learning, aiming to automate the optimization of architectures and learning algorithms. They just haven't dominated yet.
So here is my hopefully-now-obvious prediction: in this new decade internal meta-optimization will take over, eventually leading to strongly recursively self-optimizing learning machines: models that have broad general flexibility to adaptively reconfigure their internal architecture and learning algorithms dynamically based on the changing data environment/distribution and available compute resources[3].
If we just assume for a moment that the strong version of this hypothesis is correct, it suggests some pessimistic predictions for AI safety research:
- Interpretability will fail - future DL descendants will be more of a black box, not less
- Human-designed architectural constraints fail, as human-designed architecture itself fails
- IRL/Value Learning is far more difficult than first appearances suggest (see #2)
- Progress is hyper-exponential, not exponential. Thus trying to trend-predict DL superintelligence from transformer scaling is more difficult than trying to predict transformer scaling from pre-2000-ish ANN tech, long before rectifiers and deep-layer training tricks.
- Global political coordination on constraints will likely fail, due to #4 and innate difficulty.
There is an analogy here to the history-revision attack against Bitcoin. Bitcoin's security derives from the computational sacrifice invested into the longest chain. But Moore's Law leads to an exponential decrease in the total cost of that sacrifice over time, which when combined with an exponential increase in total market cap, can lead to the surprising situation where recomputing the entire PoW history is not only plausible but profitable.[4]
In 2010 few predicted that computer Go would beat a human champion just 5 years hence[5], and far fewer (or none) predicted that a future successor of that system would do much better by relearning the entire history of Go strategy from scratch, essentially throwing out the entire human tech tree [6].
So it's quite possible that future meta-optimization throws out the entire human architecture/algorithm tech tree for something else substantially more effective[7]. The circuit-algorithmic landscape lacks almost all of the complexity of the real world, and in that sense is arguably much more similar to Go or chess. Humans are general enough learning machines to do reasonably well at anything, but we can only apply a fraction of our brain capacity to such an evolutionarily novel task, and tend to lose out to more specialized, scaled-up DL algorithms long before said algorithms outcompete humans at all tasks, or even everyday tasks.
Yudkowsky anticipated recursive self-improvement would be the core thing that enables AGI/superintelligence. Reading over that 2008 essay now in 2021, I think he mostly got the gist of it right, even if he didn't foresee/bet that connectionism would be the winning paradigm. EY2008 seems to envision RSI as an explicit cognitive process where the AI reads research papers, discusses ideas with human researchers, and rewrites its own source code.
Instead, in the recursive-self-optimization-through-DL future we seem to be careening towards, the 'source code' is the ANN circuit architecture (as or more powerful than code), and reading human papers, discussing research: all that is unnecessary baggage, as unnecessary as it was for AlphaGo Zero to discuss Go with human Go experts over tea or study their games over lunch. History-revision attack, incoming.
So what can we do? In the worst case we have near-zero control over AGI architecture or learning algorithms. So that only leaves initial objective/utility functions, compute and training environment/data. Compute restriction is obvious and has an equally obvious direct tradeoff with capability - not much edge there.
Even a super powerful recursive self-optimizing machine initially starts with some seed utility/objective function at the very core. Unfortunately it increasingly looks like efficiency strongly demands some form of inherently unsafe self-motivation utility function, such as empowerment or creativity, and self-motivated agentic utility functions are the natural strong attractor[8].
Control over the training environment/data is a major remaining lever that doesn't seem to be explored much, and probably has better capability/safety tradeoffs than compute. What you get out of the recursive self-optimization or universal learning machinery is always a product of the data you put in, the embedded environment; that is ultimately what separates Go bots, image detectors, story-writing AIs, feral children, and unaligned superintelligences.
And then finally we can try to exert control on the base optimizer, which in this case is the whole technological research industrial economy. Starting fresh with a de novo system may be easier than orchestrating a coordination miracle from the current Powers.
AlexNet is typically considered the turning point, but the transition started earlier; sparse coding and RBMs are two examples of successful feature-learning techniques pre-DL. ↩︎
If you go back far enough, the word 'computer' itself originally denoted a human occupation! This trend is at least a century old. ↩︎
DL ANNs do a form of approximate Bayesian updating over the implied circuit architecture space with every backprop update, which already is a limited form of self-optimization. ↩︎
Blockchain systems have a simple defense against history-revision attacks: checkpointing. Unfortunately that doesn't have a realistic equivalent in our case - we don't control the timestream. ↩︎
I would have bet against this; AlphaGo Zero surprised me far more than AlphaGo. ↩︎
Quite possible != inevitable. There is still a learning efficiency gap vs the brain, and I have uncertainty over how quickly we will progress past that gap, and what happens after. ↩︎
Tool-AI, like GPT-3, is a form of capability constraint, but economic competition is always pressuring tool-AIs to become agent-AIs. ↩︎
TLDR: I think our crux reduces to some mix of these 3 key questions
So my core position is then:
In other words, future ML systems will reach a point where they evolve faster than we can understand. This may be a decade away or more, but it's not a century away.
For interpretability to scale in this scenario, you need to outsource it to already-trusted systems (i.e. something like iterated amplification).
The system you are imagining sounds equivalent to transformers. Content-based dynamic routing, soft attention, and content-addressable memory are all just variations/descriptions of the same thing.
A matrix multiply A*M, where A consists of 1-hot row vectors, is mathematically equivalent to an array of memory lookup ops: each 1-hot row of A is a memory address referencing some row of the memory matrix M. Relaxing the 1-hot constraint naturally can't make it less flexible than a memory lookup; it becomes a more general soft memory-blend operation.
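A minimal numpy sketch of that equivalence (my own illustration, with made-up shapes, not taken from any particular implementation):

```python
import numpy as np

M = np.arange(12, dtype=float).reshape(4, 3)   # "memory": 4 rows of 3 values each

# One-hot rows of A act as addresses: A @ M simply selects rows of M.
A_hard = np.array([[0., 0., 1., 0.],
                   [1., 0., 0., 0.]])
assert np.allclose(A_hard @ M, M[[2, 0]])      # identical to a direct row lookup

# Relaxing the one-hot constraint yields a soft blend of rows instead.
A_soft = np.array([[0.1, 0.0, 0.9, 0.0]])
assert np.allclose(A_soft @ M, 0.1 * M[0] + 0.9 * M[2])
```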
Then if you compute A with a nonlinear input layer of the form A = f(Q*K), where f is some competitive non-linearity and Q and K are query and key matrices, that implements a more general soft version of content-based addressing; chain them together and you get soft content-addressable memory (which is universal, and equivalent to/implements routing).
Standard relu deepnets don't use matrix transpose in the forward pass, only the backward pass, and thus have fixed K and M matrices that only change slowly with SGD. They completely lack the ability to do attention/routing/memory operations over dynamic activations. Transformers add the transpose op as a forward-pass building block, allowing the output activations to feed into K and/or M, which simultaneously enables universal attention/routing/memory operations.
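To make the contrast concrete, here is a rough numpy sketch of both cases; the shapes, the softmax choice for f, and the Wq/Wk/Wm projections are my own assumptions for illustration, not a claim about any specific model:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                # dynamic activations: 5 tokens, dim 8
Wq, Wk, Wm = (rng.normal(size=(8, 8)) for _ in range(3))

# Transformer-style: keys K and memory M are recomputed from the activations on
# every forward pass, then read via a soft content-addressed lookup.
Q, K, M = X @ Wq, X @ Wk, X @ Wm
A = softmax(Q @ K.T)                       # A = f(Q*K^T): soft content addressing
out_dynamic = A @ M                        # soft content-addressable memory read

# Standard relu layer: the "memory" W is a fixed weight matrix that only changes
# slowly via SGD; activations never become keys or stored rows.
W = rng.normal(size=(8, 8))
out_static = np.maximum(X @ W, 0.0)
```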
I disagree - you may simply be failing to imagine the future as I do. Fully justifying why I disagree is not something I should do on a public forum, but I will say that the brain is already changing its internals more quickly than you seem to be implying, and ANNs on von Neumann hardware are potentially vastly more flexible in their ability to expand/morph existing layers, add modules, distill others, explore new pathways, learn to predict sub-module training trajectories, meta-learn to predict, and so on. The brain is limited by the topological constraints of both the far less flexible neuromorphic substrate and a constrained volume; constraints that largely do not apply to ANNs on von Neumann hardware.
Getting out of the optimizer's way allows it to explore complexity beyond human capability. The bitter lesson is not one of simplicity beating complexity; it is about design complexity emergence shifting from the human optimization substrate to the machine optimization substrate. The main point I was making is that meta subsumes - moving from architecture to meta-learned meta-architecture (e.g. recursions of learning to learn the architecture of compressed hypernetworks that generate/train the lower-level architectures).
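As a toy sketch of the hypernetwork direction gestured at above (the two-level setup, sizes, and variable names are all my own assumptions, not a description of any existing system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Meta level: a compressed "architecture embedding" z and hypernetwork weights H.
z = rng.normal(size=16)
H = rng.normal(size=(16, 8 * 4))

# The hypernetwork emits the weights of a lower-level layer (8 -> 4).
W_low = (z @ H).reshape(8, 4)

# Lower-level forward pass using the generated weights.
x = rng.normal(size=8)
y = np.maximum(x @ W_low, 0.0)

# In an actual meta-learning setup, the loss would be backpropagated through
# W_low into z and H, so optimization pressure lands on the generator rather
# than on a hand-designed architecture.
```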
Both a 1950's computer and a 2021 GPU-based computer, each running some complex software of their era, accept/generate human interpretable inputs/outputs, but one is enormously more difficult to understand at any deep level.
Side note, but that's such a dumb name for a paper - it's the equivalent of "Feed-Forward Network Layers are Soft Threshold Memories", and only marginally better than "Linear Neural Network Layers Are Matrix Multiplies".