I found this video during Summer Of Math Exposition 2. I learned more about how renormalization works, which improved my understanding of physics.
But it also seems more generally relevant. For instance, in AI, people often discuss the possibility of phase transitions where new skills are gained or effects are unlocked as the nature of the AI changes smoothly. Renormalization sort of explains what causes such phase transitions to exist.
The Principles of Deep Learning Theory uses renormalization group flow in its analysis of deep learning, though it is applied at a 'lower level' than an AI's capabilities.
I like the video, but it seems like there are problems applying it to AI. GPT-3's output isn't simple. Calling the rapid learning of a skill a phase transition is more of a metaphor: the skill isn't a simple state, it's the simple output of a complicated way of measuring skills.
The phase transitions I had in mind were more "Foom" scenarios than sigmoid skill curves, though a similar logic may at times apply to both:
While any one output of the model is highly complex and contextual, the impact of a model will mainly come from it producing many outputs in a row. This is where renormalization comes in: these many outputs add up, and the way they add up may cause phase transitions.
For instance with "Foom", one phase transition would be how the AI's output impacts its power in the world. Currently, nice outputs in training may cause it to get deployed, but beyond that its power is fairly fixed. However, if an individual output could increase the AI's power in the world slightly on average, then repeated outputs could renormalize into a "Foom" scenario where it rapidly takes control. (At least in principle; there are all sorts of complications to this question which have been discussed at length in Foom debates.)
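As a rough illustration of the compounding point (toy numbers, not a claim about any actual system), the qualitative difference between fixed power and a small average gain per output looks like this:

```python
# Toy sketch of the compounding intuition. Assumes each output multiplies
# the AI's "power" by (1 + epsilon) on average, versus a baseline where
# power stays fixed after deployment. Epsilon is a hypothetical number.

epsilon = 0.01          # hypothetical average power gain per output
power_fixed = 1.0       # baseline: power does not grow with outputs
power_compounding = 1.0

for step in range(1, 1001):
    power_compounding *= 1 + epsilon
    if step in (100, 500, 1000):
        print(f"after {step} outputs: fixed={power_fixed:.1f}, "
              f"compounding={power_compounding:.1f}")

# Even a 1% average gain per output compounds to ~2.7x after 100 outputs
# and ~21,000x after 1,000 outputs, which is the qualitative gap between
# "fairly fixed power" and a runaway regime.
```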
Arguably, though it is not what I had in mind when I wrote the comment, renormalization also applies to sigmoid skill curves. Usually, solving a task requires outputting multiple correct words in a row. So to get 50% accuracy on a task that requires outputting 20 correct words in sequence, it needs 0.5^(1/20) ≈ 96.6% accuracy on each individual word. There's a phase transition once individual-word accuracy climbs above roughly 90%, where the model rapidly acquires the skill, because the skill depends on getting many outputs right in a row.
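For concreteness, here is a minimal sketch of that arithmetic (the 20-word requirement is just the example above; the per-word accuracies are hypothetical):

```python
# Toy model: a task counts as solved only if all n words in a row are correct,
# so task accuracy is p**n, where p is the per-word accuracy.

n = 20  # number of words that must all be correct

for p in [0.80, 0.90, 0.95, 0.965, 0.99, 1.00]:
    task_accuracy = p ** n
    print(f"per-word accuracy {p:.3f} -> task accuracy {task_accuracy:.3f}")

# Task accuracy stays near 0 until p is close to 1
# (0.90 -> ~0.12, 0.965 -> ~0.49, 0.99 -> ~0.82), so the measured skill
# appears to switch on abruptly even though p improves smoothly.
```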
Kudos to the speaker; as a (physics) layman I found it really well explained. The connection between renormalization group flows and phase transitions was really elegant.