After skimming this paper, I don't feel that impressed. Maybe someone who has read it in detail could correct me.
There's a boring claim and an exciting claim here:
The boring claim is that diffusion models learn to generalize beyond their exact training set: models trained on training sets drawn from the same distribution will be pretty good at denoising unseen images drawn from that distribution - and because they're both pretty good, they'll overlap in their suggested denoisings.
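(One way to see why the boring claim should be expected: the MSE-optimal denoiser is determined by the data distribution alone, not by the particular training sample, so any two models that approximate it well have to roughly agree. In symbols, for a noisy observation $y$ of a clean image $x$, the optimal denoiser is the posterior mean

$$\hat{x}(y) = \mathbb{E}[x \mid y].$$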
The exciting claim is that diffusion models trained on non-overlapping subsets of the same dataset learn nearly the same algorithm, which can be seen because they produce suggested denoisings that are similar in ways that would be vanishingly unlikely if they didn't overlap mechanistically.
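Concretely, the experiment that would test the exciting claim looks something like this. A minimal sketch with a toy architecture and stand-in data of my own choosing (the paper uses UNet denoisers and real image datasets; none of the names below are theirs):

```python
import torch
import torch.nn as nn

def make_denoiser():
    # Tiny conv net standing in for the paper's UNet denoisers (illustrative only).
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),
    )

def train(denoiser, images, sigma=0.5, steps=500, lr=1e-3):
    # Standard denoising objective: predict the clean image from a noisy one.
    opt = torch.optim.Adam(denoiser.parameters(), lr=lr)
    for _ in range(steps):
        batch = images[torch.randint(len(images), (64,))]
        noisy = batch + sigma * torch.randn_like(batch)
        loss = ((denoiser(noisy) - batch) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stand-in data; in the real experiment these would be disjoint halves
# of an actual image dataset (e.g. faces), not Gaussian noise.
data = torch.randn(2000, 1, 32, 32)
den_a, den_b = make_denoiser(), make_denoiser()
train(den_a, data[:1000])   # model A sees only the first half
train(den_b, data[1000:])   # model B sees only the second half

# The exciting claim predicts near-identical outputs on fresh noisy inputs,
# well beyond the agreement implied by "both are decent denoisers".
clean = torch.randn(16, 1, 32, 32)
noisy = clean + 0.5 * torch.randn_like(clean)
with torch.no_grad():
    gap = (den_a(noisy) - den_b(noisy)).pow(2).mean().sqrt().item()
print(f"RMS disagreement between the two denoisers: {gap:.4f}")
```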
AFAICT, they show the boring claim that everyone already knew, and imply the exciting claim but don't support it at all.
Haven't read it in detail, but Fig. 2 seems to me to support the exciting claim (especially since these are overparameterized models with ~70k trainable parameters)?
Okay, sure, I kind of buy it. Generated images from the two models are closer to each other than either is to the nearest image in its training set. And the denoisers learn similar heuristics like "do averaging" and "there's probably a face in the middle of the image."
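For what it's worth, that nearest-neighbor comparison is easy to state in code. A rough reconstruction of the check (the tensors below are made-up stand-ins that bake in the claim, just to show what's being measured; this is not the paper's code or data):

```python
import torch

def nearest_train_distance(sample, train_set):
    # L2 distance from one generated image (as a flat vector)
    # to its closest training image.
    return (train_set - sample).norm(dim=1).min().item()

# Stand-ins: two models seeded with the same noise produce nearly
# identical samples (what the claim predicts), and each model has
# its own training set of 1000 images.
gen_a = torch.randn(1, 32 * 32)
gen_b = gen_a + 0.01 * torch.randn_like(gen_a)
train_a = torch.randn(1000, 32 * 32)   # model A's training set
train_b = torch.randn(1000, 32 * 32)   # model B's training set

print(f"dist(gen_a, gen_b):            {(gen_a - gen_b).norm().item():.2f}")
print(f"dist(gen_a, nearest in A set): {nearest_train_distance(gen_a, train_a):.2f}")
print(f"dist(gen_b, nearest in B set): {nearest_train_distance(gen_b, train_b):.2f}")
```

If the first distance is much smaller than the other two, the samples can't just be memorized training images.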
I still don't really feel excited, but maybe that's me and not the paper.
This is a linkpost for https://arxiv.org/abs/2310.02557.