by rai

Kernel of an idea that might inspire someone who knows more than I do.

Assuming that weights which have "grokked" a task are more interpretable, is there value in modifying the loss function to make grokking more likely? Perhaps by adding a term that is path-dependent on the history of the weight updates themselves?
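
A minimal sketch of one way that could look, purely as an assumption about what "path-dependent on the weight updates" might mean: keep a running (EMA) record of each parameter's recent update direction and add a penalty coupling the current step to that history. Everything here (`path_dependent_loss`, `ema_updates`, `lam`, `beta`) is a made-up illustration, not a worked-out method; for context, plain heavy weight decay is already often reported to encourage grokking, so a bespoke term may not even be necessary.

```python
import torch

def path_dependent_loss(task_loss, params, prev_params, ema_updates,
                        lam=1e-3, beta=0.9):
    """Add a speculative path-dependent penalty to an ordinary task loss.

    params:      current model parameters (require grad)
    prev_params: detached copies of the parameters from the previous step
    ema_updates: persistent buffers holding an EMA of past update steps
    """
    penalty = 0.0
    for p, p_prev, ema in zip(params, prev_params, ema_updates):
        delta = p - p_prev                                   # this step's drift, differentiable in p
        ema.mul_(beta).add_(delta.detach(), alpha=1 - beta)  # update the no-grad history buffer
        penalty = penalty + (delta * ema).sum()              # penalize movement that keeps following past updates
    return task_loss + lam * penalty
```

The caller would refresh `prev_params` with detached clones after each optimizer step. With `lam > 0` this damps persistent drift in one direction (roughly, slowing fast memorization-style fitting); flipping the sign would reward it instead. Whether either direction actually raises the odds of grokking, or makes the resulting weights more interpretable, is exactly the open question above.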