There is quite a large literature on "stage-wise development" in neuroscience and psychology, going back to people like Piaget but quite extensively developed in both theoretical and experimental directions. One concrete place to start on the agenda you're outlining here might be to systematically survey that literature from an SLT-informed perspective.
SLT predicts when this will happen!
Maybe. This is potentially part of the explanation for "data double descent" although I haven't thought about it beyond the 5min I spent writing that page and the 30min I spent talking about it with you at the June conference. I'd be very interested to see someone explore this more systematically (e.g. in the setting of Anthropic's "other" TMS paper https://www.anthropic.com/index/superposition-memorization-and-double-descent which contains data double descent in a setting where the theory of our recent TMS paper might allow you to do something).
Though I'm not fully confident that is indeed what they did
The k-gons are critical points of the loss, and as the number of samples n varies, the free energy is determined by integrals restricted to neighbourhoods of these critical points in weight space.
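To spell out the free-energy picture being invoked here (this is the standard SLT heuristic in my own notation, not something quoted from the TMS paper): write n for the number of samples, L_n for the empirical loss, and φ for the prior.

```latex
% Bayesian free energy at sample size n:
F_n \;=\; -\log \int_W e^{-n L_n(w)}\,\varphi(w)\,dw
% Splitting the integral into neighbourhoods of the critical points w^* (here, the k-gons)
% gives, roughly,
F_n \;\approx\; \min_{w^*}\Big[\, n L_n(w^*) \;+\; \lambda(w^*)\,\log n \,\Big] \;+\; \text{lower-order terms},
% where \lambda(w^*) is the local learning coefficient (RLCT) attached to each critical point.
```

Which critical point attains the minimum can change as n grows, which is roughly where the phase-transition / stage-wise picture comes from.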
Are brains singular do you think?
Note that in the SLT setting, "brains" or "neural networks" are not the sorts of things that can be singular (or really, have a certain RLCT) on their own - instead they're singular for certain distributions of data. So the question is whether brains are singular on real-world data. This matters: e.g. neural networks are more singular on some data (for example, data generated by a thinner neural network) than on others. [EDIT: I'm right about the RLCT but wrong about what 'being singular' means, my apologies.]
Anyway, here's roughly how you could tell: if your brain were "optimal" on the data it saw, how many different ways would there be of continuously perturbing your brain such that it were still optimal? The more ways, the more singular you are.
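To make "how many ways of perturbing while staying optimal" a bit more concrete, here's a throwaway numerical sketch. Caveat: counting zero eigenvalues of the Hessian only sees quadratic flatness, whereas the RLCT is also sensitive to higher-order degeneracy, and the two-parameter loss below is invented purely for illustration.

```python
import numpy as np

def hessian(loss, w, eps=1e-4):
    """Finite-difference Hessian of a scalar loss at parameter vector w."""
    d = len(w)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i], np.eye(d)[j]
            H[i, j] = (
                loss(w + eps * e_i + eps * e_j)
                - loss(w + eps * e_i - eps * e_j)
                - loss(w - eps * e_i + eps * e_j)
                + loss(w - eps * e_i - eps * e_j)
            ) / (4 * eps**2)
    return H

# Toy loss whose optimum only pins down the product a*b, so there is a whole
# curve {a*b = 1} of "optimal brains".
loss = lambda w: (w[0] * w[1] - 1.0) ** 2

w_opt = np.array([1.0, 1.0])                  # one point on the optimal curve
eigvals = np.linalg.eigvalsh(hessian(loss, w_opt))
flat = int(np.sum(np.abs(eigvals) < 1e-6))    # directions you can perturb in and stay optimal
print("Hessian eigenvalues:", eigvals)
print("Flat directions at the optimum:", flat)
```

The one zero eigenvalue corresponds to sliding along the curve of equally-optimal parameters; more such directions (or degeneracy beyond quadratic order) means "more singular".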
Singularity is actually a property of the parameter-function map, not the data distribution. The RLCT is defined in terms of the loss function/reward and the parameter-function map. See definition 1.7 of the grey book for the definition of singular, strictly singular, and regular models.
Edit: To clarify, you do need the loss function & a set of data (or in the case of RL and the human brain, the reward signals) in order to talk about the singularities of a parameter-function map, and to calculate the RLCT. You just don't need them to make the statement that the parameter-function map is strictly singular.
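For readers without the grey book to hand, here's a rough paraphrase of the distinction being pointed at (from memory, so defer to definition 1.7 for the precise statement):

```latex
% Model p(y \mid x, w), parameters w \in W, inputs x \sim q(x).
% Fisher information matrix:
I(w)_{jk} \;=\; \mathbb{E}_{x \sim q,\; y \sim p(\cdot \mid x, w)}
  \big[\, \partial_{w_j} \log p(y \mid x, w)\;\partial_{w_k} \log p(y \mid x, w) \,\big]
% Regular:            w \mapsto p(\cdot \mid \cdot, w) is injective and I(w) is positive
%                     definite everywhere on W.
% Strictly singular:  not regular, i.e. the map is non-injective or I(w) is degenerate
%                     at some w.
```

The expectation over x is where the input distribution sneaks in, which is consistent with the point above: you need data (or reward signals) to compute quantities like the RLCT, but non-injectivity of the parameter-function map can be stated without them.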
One thing I noticed when reflecting on this dialogue later was that I really wasn't considering the data distribution's role in creating the loss landscape. So thanks for bringing this up!
Suppose I had some separation of the features of my brain into "parameters" and "activations". Would my brain be singular if there were multiple values the parameters could take such that for all possible inputs the activations were the same? Or would it have to be that those parameters were also local minima?
(I suppose it's not that realistic that the activations would be the same for all inputs, even assuming the separation into parameters and activations, because some inputs vaporise my brain)
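Here's a minimal made-up illustration of the situation in the question (a "dead unit" toy, nothing SLT-specific): two different parameter settings whose activations agree on every input, and hence whose loss agrees on any dataset.

```python
import numpy as np

def tiny_net(w, x):
    """A 1-hidden-unit ReLU 'brain': parameters w = (w1, w2), activations (h, y)."""
    h = np.maximum(0.0, w[0] * x)   # hidden activation
    y = w[1] * h                    # output activation
    return h, y

xs = np.linspace(0.0, 1.0, 100)     # pretend all inputs are non-negative

# Two parameter settings whose hidden unit is "dead" (w1 <= 0 while x >= 0):
w_a = np.array([-0.3, 5.0])
w_b = np.array([-1.7, 5.0])

acts_a = np.stack(tiny_net(w_a, xs))
acts_b = np.stack(tiny_net(w_b, xs))
print(np.allclose(acts_a, acts_b))  # True: identical activations on every input
```

Since any loss computed from the activations can't tell w_a and w_b apart, if one of them is a minimum then the whole family is; whether that family is actually where the loss is minimised is the data-dependent part, which is what the replies below get at.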
Note that in the SLT setting, "brains" or "neural networks" are not the sorts of things that can be singular (or really, have a certain RLCT) on their own - instead they're singular for certain distributions of data.
This is a good point I often see neglected. Though there's some sense in which a model can "be singular" independent of data: if the parameter-to-function map is not locally injective. Then, if a distribution minimizes the loss, its preimage in parameter space can have non-trivial geometry.
These are called "degeneracies," and they can be understood for a particular model without talking about data. Though the actual distribution that minimizes the loss is determined by data, so it's sort of like the "menu" of degeneracies is data-independent, and the data "selects one off the menu." Degeneracies imply singularities, but not necessarily vice-versa, so they aren't everything. But we do think that degeneracies will be fairly important in practice.
we can copy the relevant parts of the human brain (the parts that do the things our analysis of our models said they would get wrong), either empirically (informed by theory, of course), or purely theoretically if we just need a little bit of inspiration for what the relevant formats need to look like.
I struggle to follow you guys in this part of the dialogue, could you unpack this a bit for me please?
The idea is that currently there's a bunch of formally unsolved alignment problems relating to things like ontology shifts, value stability under reflection & replication, non-muggable decision theories, and potentially other risks we haven't thought of yet, such that if an agent pursues your values adequately in a limited environment, it's difficult to say much confidently about whether it will continue to pursue your values adequately in a less limited environment.
But we see that humans are generally able to pursue human values (or at least, not go bonkers in the ways we worry about above), so maybe we can copy off of whatever evolution did to fix these traps.
The hope is that SLT + neuroscience can either shed some light on what that is, or tell us (in a very abstract way) that our agent will think about these sorts of things the same way humans do under certain set-ups, or give us a better understanding of which of the risks above are actually something you need to worry about versus something you don't.
I think Garrett is saying: our science gets good enough that we can tell that, in some situations, our models are going to do stuff we don't like. We then look at the brain and try to see what the brain would do in that situation.
This seems possible, but I'm thinking more mechanistically than that. Borrowing terminology from (I think) Redwood's mechanistic anomaly detection strategy, we want our AIs to make decisions for the same reasons that humans make decisions (though you can't actually use their methods or directly apply their conceptual framework here, because we also want our AIs to get smarter than humans, which necessitates them making decisions for different reasons than humans, and because humans make decisions on the basis of a bunch of stuff depending on context and their current mind-state).
But all spaces are projections of some space where the projection gives singularities, surely?
Uniform priors will generically turn into non-uniform priors after you project, which I think is going to change the learning dynamics / relevance of the RLCTs?
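As a quick sanity check of the "uniform priors turn non-uniform after projection" point, here's a throwaway numerical sketch; the projection taking (a, b) to a*b is just an arbitrary example of a non-injective map, not anything canonical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Uniform prior on the square [-1, 1]^2 in the "bigger" parameter space.
a, b = rng.uniform(-1, 1, size=(2, 1_000_000))

# Push it forward through a non-injective projection; the induced prior on
# c = a * b is far from uniform (it piles up around c = 0).
c = a * b
hist, edges = np.histogram(c, bins=20, range=(-1, 1), density=True)
for left, density in zip(edges[:-1], hist):
    print(f"[{left:+.1f}, {left + 0.1:+.1f}): density {density:.2f}")
```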
This is a fairly "chat"-style dialogue that I (kave) had with Garrett about singular learning theory (SLT) and his ambitious plans for solving ambitious value learning by building off of it.
A colleague found this gave them better trailheads for SLT than current expositions (though they're still confused) and I got a much clearer sense of the scope of Garrett's hopes from this conversation than from his post alone.
What is Singular Learning Theory?
Garrett's hopes for grand singular learning theory-based theories
Capabilities externalities