Wei_Dai comments on Steelmaning AI risk critiques - Less Wrong Discussion
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (98)
Do you have a citation for this? My understanding is that biological neural networks operate far from the Landauer Limit (sorry I couldn't find a better citation but this seems to be a common understanding), whereas we already have proposals for hardware that is near that limit.
I should probably rephrase the brain optimality argument, as it isn't just about energy per se. The brain is on the Pareto efficiency surface - it is optimal with respect to some complex tradeoffs between area/volume, energy, and speed/latency.
Energy is pretty dominant, so it's much closer to those limits than the rest. The typical futurist understanding about the Landauer limit is not even wrong - way off, as I point out in my earlier reply below and related links.
A consequence of the brain being near optimal for energy of computation for intelligence given its structure is that it is also near optimal in terms of intelligence per switching event.
The brain computes with just around 10^14 switching events per second (10^14 synapses * 1 Hz average firing rate). That is something of an upper bound for the average firing rate. [1]
The typical synapse is very small, has a low SNR and thus is equivalent to a low bit op, and only activates maybe 25% of the time. [2] We can roughly compare these minimal SNR analog ops with the high precision single bit ops that digital transistors implement. The Landauer principle allows us to rate them as reasonably equivalent in computational power.
So the brain computes with just 10^14 switching events per second. That is essentially miraculous. A modern GPU uses perhaps 10^18 switching events per second.
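As a rough check on the comparison above, here is a minimal sketch that just multiplies out the quoted figures. The constant and function names are made up for illustration, and the inputs are the order-of-magnitude estimates from this comment, not measurements.

```python
# Order-of-magnitude figures quoted in the comment above (estimates, not measurements).
SYNAPSES = 1e14                  # synapse count used in the comment
AVG_FIRING_RATE_HZ = 1.0         # average firing rate used in the comment
GPU_SWITCH_EVENTS_PER_S = 1e18   # "perhaps 10^18" for a modern GPU


def brain_switch_events_per_s(synapses=SYNAPSES, rate_hz=AVG_FIRING_RATE_HZ):
    """Switching events per second, counting one event per synapse per spike."""
    return synapses * rate_hz


# How many more switching events per second the GPU uses than the brain.
gpu_to_brain_ratio = GPU_SWITCH_EVENTS_PER_S / brain_switch_events_per_s()
print(f"brain: {brain_switch_events_per_s():.0e} events/s")
print(f"GPU uses about {gpu_to_brain_ratio:.0e} times as many")
```

On these numbers the gap is about four orders of magnitude, which is the sense in which the comment calls the brain's figure small.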
So the important thing here is not just energy - but overall circuit efficiency. The brain is crazy super efficient - and as far as we can tell near optimal - in its use of computation towards intelligence.
This explains why our best SOTA techniques in almost all AI are some version of brain-like ANNs (the key defining principle being search/optimization over circuit space). It predicts that the best we can do for AGI is to reverse engineer the brain. Yes eventually we will scale far beyond the brain, but that doesn't mean that we will use radically different algorithms.
What do you mean by, given its structure? Does this still leave open that a brain with some differences in organization could get more intelligence out of the same number of switching events per second?
Similarly, I assume the same argument applies to all animal brains. Do you happen to have stats on the number of switching events per second for e.g. the chimpanzee?
EDIT: see this comment and this comment on reddit for some references on circuit efficiency.
Computers are circuits and thus networks/graphs. For primitive devices the switches (nodes) are huge so they use up significant energy. For advanced devices the switches are not much larger than wires, and the wire energy dominates. If you look at the cross section of a modern chip, it contains a hierarchy of metal layers of decreasing wire size, with the transistors at the bottom. The side view section of the cortex looks similar, with vasculature and long distance wiring taking the place of the upper metal layers.
The vast majority of the volume in both modern digital circuits and brain circuits consists of wiring. The transistors and the synapses are just tiny little things in comparison.
Modern computer memory systems have a wire energy efficiency of around 10^-12 to 10^-13 J/bit/mm. The limit for reliable signals is perhaps only 10x better. I think the absolute limit for unreliable bits is 10^-15 or so, will check citation for that when I get home. Wire energy efficiency for bandwidth is not improving at all and hasn't since the 90's. The next big innovation is simply moving the memory closer; that's about all we can do.
The min wire energy is close to that predicted by a simple model of a molecular wire where each molecule sized 1 nm section is a switch (10^-19 to 10^-21 * 10^6 = 10^-13 to 10^-15). In reality of course it's somewhat more complex - smaller wires actually dissipate more energy, but also require less to represent a signal.
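The simple model above can be written out as a short sketch: treat each 1 nm section of wire as one switch, so 1 mm of wire is 10^6 switches. The names below are hypothetical, and the per-switch energy range is the one quoted in the comment; the complications mentioned at the end of the paragraph are ignored.

```python
# Simple molecular-wire model from the comment: one switch per 1 nm section of wire.
NM_PER_MM = 1e6
SECTION_NM = 1.0                      # assumed section length, as in the comment
SWITCH_ENERGY_RANGE_J = (1e-21, 1e-19)  # per-switch energy range quoted in the comment


def wire_energy_j_per_bit_mm(switch_energy_j, section_nm=SECTION_NM):
    """Energy to move one bit across 1 mm if every section dissipates switch_energy_j."""
    switches_per_mm = NM_PER_MM / section_nm
    return switch_energy_j * switches_per_mm


low, high = (wire_energy_j_per_bit_mm(e) for e in SWITCH_ENERGY_RANGE_J)
print(f"model range: {low:.0e} to {high:.0e} J/bit/mm")
```

The model range of roughly 10^-15 to 10^-13 J/bit/mm overlaps the lower end of the 10^-12 to 10^-13 J/bit/mm quoted for current memory systems.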
Also keep in mind that synapses are analog devices which require analog impulse inputs and outputs - they do more work than a single binary switch.
So Moore's law is ending and we are already pretty close to the limits of wire efficiency. If you add up the wiring paths in the brain you get a similar estimate. Axons/dendrites appear to be at least as efficient as digital wires and are thus near optimal. None of this should be surprising - biological cells are energy optimal true nanocomputers. Neural circuits evolved from the bottom up - there was never a time at which they were inefficient.
However, it is possible to avoid wire dissipation entirely with some reversible signal path. Optics is one route but photons and thus photonic devices are impractically large. The other option is superconducting circuits, which work in labs but also have far too many disadvantages to be practical yet. Eventually cold superconducting reversible computers could bypass energy issues, but that tech appears to be far.
What about just replacing the copper wire inside a conventional CMOS chip with a superconductor? It took some searching, but I managed to find a paper titled Cryogenically Cooled CMOS which talks about the benefits and feasibility of doing this. Quoting from the relevant section:
So it looks like there's no fundamental reason why it couldn't be done, just a matter of finding the right substrate material and solving other engineering problems.
That is the type of tech I was referring to by superconducting circuits as precursor to full reversible. From what I understand, if you chill everything down then you also change resistance in the semiconductor along with all the other properties, so it probably isn't as easy as just replacing the copper wires.
A room temperature superconductor circuit breakthrough is one of the main wild cards over the next decade or so. Cryogenic cooling is pretty impractical for mainstream computing.
Yeah, it's just a question of timetables. If it's decades away, we have a longer period of stalled Moore's law during which AGI will slowly surpass the brain, rather than rapidly.
From the sources I've read, there aren't any major issues running CMOS at 77 K, you only run into problems at lower temperatures, less than 40 K. I guess people aren't seriously trying this because it's probably not much harder to go directly to full superconducting computers (i.e., with logic gates made out of superconductors as well) which offers a lot more benefits. Here is an article about a major IARPA project pursuing that. It doesn't seem safe to assume that we'll get AGI before we get superconducting computers. Do you disagree, if so can you explain why?
There was similar interest in superconducting chips about a decade ago which was pretty much the same story - DARPA/IARPA spearheading research, major customer would be US intelligence.
The 500 gigaflops per watt figure is about 100 times more computation/watt than on a current GPU - which is useful because it shows that about 99% of GPU energy cost is interconnect/wiring.
In terms of viability and impact, it is still uncertain how much funding superconducting circuits will require to become competitive. And even if it is competitive in some markets for say the NSA, that doesn't make it competitive for general consumer markets. Cryogenic cooling means these things will only work in very special data rooms - so the market is more niche.
The bigger issue though is total cost competitiveness. GPUs are sort of balanced in that the energy cost is about half of the TCO (total cost of ownership). It is extremely unlikely that superconducting chips will be competitive in total cost of computation in the near future. All the various tradeoffs in a superconducting design and the overall newness of the tech imply lower circuit densities. Smaller market implies less research amortization and higher costs. Even if a superconducting chip used 0 energy, it will still be much more expensive and provide less ops/$.
Once we run out of scope for further CPU/GPU improvements over the next decade, the TCO budget will shift increasingly towards energy, and these types of chips will become increasingly viable. So I'd estimate that the probability of impact in the next 5 years is small, but 10 years or more out it's harder to say. To make a more viable forecast I'd need to read more on this tech and understand more about the costs of cryogenic cooling.
But really roughly - the net effect of this could be to add another leg to Moore's law style growth, at least for server computation.
It takes energy to maintain cryogenic temperatures, probably much more than the energy that would be saved by eliminating wire resistance. If I understand correctly, the interest in superconducting circuits is mostly in using them to implement quantum computation.
Barring room temperature superconductors, there are probably no benefits of using superconducting circuits for classical computation.
From the article I linked to:
ETA: 100 petaflops per 200 kW equals 500 gigaflops per watt, so it's estimated to be about 100 times more energy efficient.
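The conversion above, plus the inference drawn from it earlier in the thread, can be checked with a few lines. This is only a sketch: the variable names are illustrative, and the last step assumes, as the earlier comment did, that the whole efficiency gap is attributable to interconnect/wiring.

```python
# Figures from the thread: 100 petaflops at 200 kW, "about 100 times" a current GPU.
SYSTEM_FLOPS = 100e15      # 100 petaflops
SYSTEM_POWER_W = 200e3     # 200 kW
ADVANTAGE_OVER_GPU = 100   # efficiency ratio claimed in the thread

flops_per_watt = SYSTEM_FLOPS / SYSTEM_POWER_W
gflops_per_watt = flops_per_watt / 1e9

# GPU efficiency implied by the claimed 100x ratio.
implied_gpu_gflops_per_watt = gflops_per_watt / ADVANTAGE_OVER_GPU

# Assumption (from the earlier comment): the entire gap is wiring energy.
implied_interconnect_share = 1 - 1 / ADVANTAGE_OVER_GPU

print(f"{gflops_per_watt:.0f} gigaflops per watt")
print(f"implied GPU efficiency: {implied_gpu_gflops_per_watt:.0f} gigaflops per watt")
print(f"implied interconnect share of GPU energy: {implied_interconnect_share:.0%}")
```

Note that the figure excludes nothing on its own; whether cooling power is inside the 200 kW depends on the linked article, which is not reproduced here.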
Ok, I guess it depends on how big your computer is, due to the square-cube law. Bigger computers would be at an advantage.
As the efficiency of a logically irreversible computer approaches the Landauer limit, its speed must approach zero, for the same reason why as the efficiency of a heat engine approaches the Carnot limit its speed must approach zero.
I don't have an equation at hand, but I wouldn't be surprised if it turned out that biological neurons operate close to the physical limit for their speed.
EDIT:
I found this Physics Stack Exchange answer about the thermodynamic efficiency of human muscles.
Hmm... after more searching, I found this page, which says:
So biological neurons still don't seem to be near the physical limit, since they fire at only around 100 Hz and, according to my previous link, dissipate millions to billions of times more than k_B T ln(2).
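For scale, a minimal sketch of what k_B T ln(2) comes to and what "millions to billions of times" that would be. The temperature of 310 K (roughly body temperature) is an assumption made here, not a figure from the thread, and the names are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)
TEMP_K = 310.0       # assumed temperature, roughly body temperature


def landauer_limit_j(temp_k=TEMP_K):
    """Minimum energy to erase one bit at temperature temp_k: k_B * T * ln(2)."""
    return K_B * temp_k * math.log(2)


limit = landauer_limit_j()
# The range quoted in the comment: millions to billions of times the limit.
low_j, high_j = 1e6 * limit, 1e9 * limit
print(f"Landauer limit at {TEMP_K:.0f} K: {limit:.2e} J per bit")
print(f"millions to billions of times that: {low_j:.1e} to {high_j:.1e} J")
```

So the quoted range corresponds to very roughly 10^-15 to 10^-12 J per event at this temperature.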
A 100kT signal is only reliable for a distance of a few nanometers. The energy cost is all in pushing signals through wires. So the synapse signal is a million times larger than 100kT to cross a distance of around 1 mm or so, which works out to 10^-13 J per synaptic event. Thus 10 watts for 10^14 synapses and a 1 Hz rate. For a 100 Hz rate, the average distance would need to be less.
Not my field of expertise, but I don't understand where this bound comes from. In this paper for short erasure cycles they find an exponential law, although they don't give the constants (I suppose they are system-dependent).
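The arithmetic in the comment above can be sketched as follows. Two things are assumptions made here rather than statements from the thread: the temperature of 310 K used to evaluate kT, and the idea that energy per event scales linearly with wire distance (which is what lets the last function say how much shorter the average distance would have to be). All names are illustrative.

```python
K_B = 1.380649e-23    # Boltzmann constant in J/K
TEMP_K = 310.0        # assumed temperature, roughly body temperature
SYNAPSES = 1e14       # synapse count from the comment
POWER_BUDGET_W = 10.0 # brain power figure from the comment

# 100 kT, scaled up a million times to cross about 1 mm, as in the comment.
energy_per_event_j = 100 * K_B * TEMP_K * 1e6

# The comment rounds this to 10^-13 J per synaptic event over about 1 mm.
ROUNDED_ENERGY_J_PER_MM = 1e-13


def brain_power_w(rate_hz, distance_mm=1.0):
    """Total power if every synaptic event costs the rounded energy per mm travelled."""
    return SYNAPSES * rate_hz * ROUNDED_ENERGY_J_PER_MM * distance_mm


def max_avg_distance_mm(rate_hz, budget_w=POWER_BUDGET_W):
    """Average distance that keeps total power within budget (linear-scaling assumption)."""
    return budget_w / (SYNAPSES * rate_hz * ROUNDED_ENERGY_J_PER_MM)


print(f"unrounded energy per event: {energy_per_event_j:.1e} J")
print(f"power at 1 Hz over 1 mm: {brain_power_w(1.0):.0f} W")
print(f"max average distance at 100 Hz: {max_avg_distance_mm(100.0):.2f} mm")
```

The unrounded energy per event comes out a few times larger than 10^-13 J, so the 10 W figure should be read as an order-of-magnitude estimate.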