The Brain is Not Close to Thermodynamic Limits on Computation
## Introduction

This post is written as a response to jacob_cannell's recent post Contra Yudkowsky on AI Doom. He writes:

> EY correctly recognizes that thermodynamic efficiency is a key metric for computation/intelligence, and he confidently, brazenly claims (as of late 2021), that the brain is about 6 OOM from thermodynamic efficiency limits
>
> [...]
>
> EY is just completely out of his depth here: he doesn't seem to understand how the Landauer limit actually works, doesn't seem to understand that synapses are analog MACs which minimally require OOMs more energy than simple binary switches, doesn't seem to understand that interconnect dominates energy usage regardless, etc.

Most of Jacob's analysis of brain efficiency is contained in his post Brain Efficiency: Much More than You Wanted to Know. I believe that analysis is flawed with respect to the thermodynamic energy efficiency of the brain. That's the scope of this post: I will respond to Jacob's claims about thermodynamic limits on brain energy efficiency. Other constraints are out of scope, as is a discussion of the rest of the analysis in Brain Efficiency.

## The Landauer limit

To review quickly, the Landauer limit says that erasing 1 bit of information has an energy cost of $kT\log 2$, which must be dissipated as heat into the environment. Here $k$ is Boltzmann's constant and $T$ is the temperature of the environment. At room temperature, this is about 0.02 eV.

Erasing a bit is something you have to do quite often in many types of computation, and the more bit erasures your computation needs, the more energy that computation costs. (To give a general sense of how many erasures a given amount of computation needs: if we add two $n$-bit numbers $a$ and $b$ to get $a+b \bmod 2^n$, and then throw away the original values of $a$ and $b$, that costs $n$ bit erasures, i.e. an energy cost of $nkT\log 2$.)

## Extra reliability costs?

Brain Efficiency claims that the energy dissipation required to erase a bit reliably is substantially larger than $kT\log 2$.
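Before turning to that claim, here is a quick Python sanity check of the baseline numbers from the previous section (my own illustration, not from either post): it evaluates $kT\log 2$ at an assumed room temperature of 300 K, and the $nkT\log 2$ cost of the addition example for an assumed $n = 64$.

```python
import math

# Physical constants (CODATA values).
k_B = 1.380649e-23        # Boltzmann constant, J/K
eV = 1.602176634e-19      # joules per electronvolt

# Assumption for illustration: "room temperature" taken as 300 K.
T = 300.0

# Landauer limit: minimum heat dissipated to erase one bit is kT log 2
# (natural log), which is what the ~0.02 eV figure above refers to.
landauer_J = k_B * T * math.log(2)
print(f"kT log 2 at {T:.0f} K: {landauer_J:.2e} J = {landauer_J / eV:.3f} eV")
# -> about 2.9e-21 J, i.e. roughly 0.018 eV

# Adding two n-bit numbers and discarding the original operands erases
# n bits, so it costs at least n * kT log 2. Assumed n = 64 here.
n = 64
print(f"Minimum cost of a {n}-bit addition (operands discarded): "
      f"{n * landauer_J:.2e} J")
```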