To build intuition about content vs architecture in AI (a distinction that comes up a lot in discussions of AI takeoff involving Robin Hanson), I've been wondering about content size vs architecture size, where size is measured in number of bits.
Here's how I'm operationalizing content and architecture size for ML systems:
- content size: The number of bits required to store the learned model of the ML system (e.g. all the floating-point weights in a neural network).
- architecture size: The number of bits of source code. I'm not sure if it makes sense to include the source code of supporting software (e.g. standard machine learning libraries). (A rough sketch of how one might measure both is below.)
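To make the operationalization concrete, here's a minimal sketch of how I'd measure both quantities for a PyTorch model. The function names and the choice to count source-file sizes on disk are my own assumptions, not any standard:

```python
import os
import torch.nn as nn

def content_size_bits(model: nn.Module) -> int:
    """Bits needed to store the learned parameters ("content")."""
    return sum(p.numel() * p.element_size() * 8 for p in model.parameters())

def architecture_size_bits(source_files) -> int:
    """Bits of source code ("architecture"), excluding library code."""
    return sum(os.path.getsize(f) * 8 for f in source_files)

# Example: a small fully-connected network
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
print(content_size_bits(model))  # ~203k float32 parameters -> ~6.5 million bits
# architecture_size_bits(["train.py"])  # "train.py" is a hypothetical file name
```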
I tried looking at the AlphaGo paper to see if I could find this kind of information, but after about 30 minutes of trying I was unable to find what I wanted. I can't tell if this is because I'm not acquainted enough with the ML field to know where to look, or because that information just isn't in the paper.
Is this information easily available for various ML systems? What is the fastest way to gather this information?
I'm also wondering about the same content vs architecture split for humans. One way I'm thinking of it is "amount of information encoded in inheritance mechanisms" vs "amount of information encoded in a typical adult human brain". I know that Eliezer Yudkowsky has cited 750 megabytes as the amount of information in human DNA, while also emphasizing that most of this information is junk. That was in 2011, and I don't know whether there's a newer consensus, or how to factor in epigenetic information. There is also content stored in the genes themselves, and I'm not sure how to separate out content from architecture there.
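(For what it's worth, the 750 megabyte figure seems to come from a back-of-the-envelope calculation like the following, using the standard rough numbers of ~3 billion base pairs at 2 bits each:)

```python
base_pairs = 3e9            # roughly 3 billion base pairs in the human genome
bits = base_pairs * 2       # 4 possible bases (A/C/G/T) -> 2 bits per base pair
megabytes = bits / 8 / 1e6
print(megabytes)            # 750.0 -> the "750 megabytes" figure
```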
I'm pretty uncertain about whether this is even a good way to think about this topic, so I would also appreciate any feedback on this question itself. For example, if this isn't an interesting question to ask, I would like to know why.
OK, I think that helps.
It sounds like your question should really be more like: how many programmer-hours go into putting domain-specific content / capabilities into an AI? (You can disagree.) If that number is very high, then we're in the Robin-Hanson-world where different companies make AI-for-domain-X, AI-for-domain-Y, etc., and they trade and collaborate. If it's very low, then it's more plausible that someone will have a good idea and Bam, they have an AGI (although it might still require huge amounts of compute).
If so, I don't think the information content of the weights of a trained model is relevant. The weights are learned automatically. Changing the code from
num_hidden_layers = 10
to
num_hidden_layers = 100
is not 10× the programmer effort. (It may or may not require more compute, and it may or may not require more labeled examples, and it may or may not require more hyperparameter tuning, but those are all different things, and in no case is there any reason to think it's a factor of 10, except maybe some aspects of compute.)

I don't think the size of the PyTorch codebase is relevant either.
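To illustrate the point (a toy sketch of my own, not anything from a real codebase): the source code below is the same number of bits whether it builds a small network or a much deeper one; only the learned content changes.

```python
import torch.nn as nn

def make_mlp(num_hidden_layers: int, width: int = 256) -> nn.Sequential:
    """Same source code regardless of how big the learned model is."""
    layers = [nn.Linear(784, width), nn.ReLU()]
    for _ in range(num_hidden_layers):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 10))
    return nn.Sequential(*layers)

small = make_mlp(num_hidden_layers=10)   # ~0.9M parameters
big   = make_mlp(num_hidden_layers=100)  # ~6.8M parameters
# The "architecture size" (this source file) is identical in both cases;
# the "content size" (the weights) differs by nearly an order of magnitude.
```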
I agree that the size of the human genome is relevant, as long as we all keep in mind that it's a massive upper bound, because perhaps a vanishingly small fraction of that is "domain-specific content / capabilities". Even within the brain, you have to synthesize tons of different proteins, control the concentrations of tons of chemicals, etc. etc.
I think the core of your question is generalizability. If you have AlphaStar but want to control a robot instead, how much extra code do you need to write? Do insights in computer vision help with NLP and vice-versa? That kind of stuff. I think generalizability has been pretty high in AI, although maybe that statement is so vague as to be vacuous. I'm thinking, for example, it's not like we have "BatchNorm for machine translation" and "BatchNorm for image segmentation" etc. It's the same BatchNorm.
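For example (a toy illustration; PyTorch happens to expose 1-D and 2-D variants, but it's one normalization idea):

```python
import torch.nn as nn

# The same building block serves unrelated domains:
vision_block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),   # normalizes activations in an image model
    nn.ReLU(),
)
sequence_block = nn.Sequential(
    nn.Linear(512, 512),
    nn.BatchNorm1d(512),  # the same normalization idea in a sequence/text model
    nn.ReLU(),
)
```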
On the brain side, I'm a big believer in the theory that the neocortex has one algorithm which simultaneously does planning, action, classification, prediction, etc. (The merging of action and understanding in particular is explained in my post here; see also Planning By Probabilistic Inference.) So that helps with generalizability. And I already mentioned my post on cortical uniformity.

I think a programmer who knows the core neocortical algorithm and wants to then imitate the whole neocortex would mainly need:
- a database of "innate" region-to-region connections, organized by connection type (feedforward, feedback, hormone receptors) and structure (2D array of connections vs 1D, etc.);
- a database of region-specific hyperparameters, especially when each region should lock itself down to prevent further learning ("sensitive periods").

Assuming that's the right starting point, I don't have a great sense for how many bits of data this is, but I think the information is out there in the developmental neuroscience literature. My wild guess right now would be on the order of a few KB, but with very low confidence. It's something I want to look into more when I get a chance. Note also that a would-be AGI engineer can potentially just figure out those few KB from the neuroscience literature, rather than discovering them in a more laborious way.
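To make that concrete, here's the kind of schema I have in mind. This is purely illustrative; all the type names, field names, and region labels below are made up:

```python
from dataclasses import dataclass
from enum import Enum

class ConnectionType(Enum):
    FEEDFORWARD = 1
    FEEDBACK = 2
    HORMONE = 3

@dataclass
class RegionConnection:
    src: str                     # e.g. "V1" -- region names are placeholders
    dst: str                     # e.g. "V2"
    kind: ConnectionType
    topology: str                # "2D" vs "1D" array of connections

@dataclass
class RegionHyperparams:
    region: str
    sensitive_period_end: float  # when the region locks down further learning

# A few thousand such records, at tens of bytes each, is one way to
# arrive at "a few KB" -- but again, that estimate is very low-confidence.
```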
Oh, you also probably need code for certain non-neocortex functions like flagging human speech sounds as important to attend to etc. I suspect that that particular example is about as straightforward as it sounds, but there might be other things that are hard to do, or where it's not clear what needs to be done. Of course, for an aligned AGI, there could potentially be a lot of work required to sculpt the reward function.
Just thinking out loud :)