So perhaps the way to encourage good generalization in models is to measure how well the weights can be predicted by another model (?). Apologies if this sounds like a crackpot idea.
Interesting idea. The natural next question is: how would you use that second model to determine the Kolmogorov complexity (or some similar metric) of the first model's weights? Say you want to use the complexity of the second model, under the assumption that it is the simplest possible model that can predict the first model's weights. But to satisfy that assumption, you'd need a third model in the same role to minimize the complexity of the second. And so on. Eventually you have to estimate the complexity of some model's weights without training another model, using a direct metric (whether it's weight norms, performance after pruning, or [insert clever method from the future]). Why not just apply that metric to the first model and skip training the additional ones?
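As a toy illustration of that last point: whatever direct metric you would end up applying at the bottom of the regress can just as well be applied to the first model itself. Here's a minimal sketch using a simple total weight norm as the complexity proxy; the norm is purely a hypothetical stand-in for [clever method from the future], and the "models" are just random weight arrays, not trained networks.

```python
import numpy as np

def weight_norm_complexity(weights):
    """Crude complexity proxy: total L2 norm over all weight arrays.

    Hypothetical stand-in for whatever metric you'd otherwise apply
    to the second (or third, ...) model in the regress -- the point
    is that it can be applied to the first model directly.
    """
    return float(np.sqrt(sum(np.sum(w ** 2) for w in weights)))

# Toy "models": same shapes, different weight scales.
rng = np.random.default_rng(0)
small_weights = [rng.normal(0, 0.1, (4, 4)), rng.normal(0, 0.1, (4,))]
large_weights = [rng.normal(0, 1.0, (4, 4)), rng.normal(0, 1.0, (4,))]

print(weight_norm_complexity(small_weights) < weight_norm_complexity(large_weights))
```

Under this proxy the smaller-scale weights score as "simpler" without any second model in the loop, which is the shortcut being suggested.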
That said, I could be overlooking something, and empirical results might suggest otherwise, so the idea may still be worth testing.