I want to look at deep neural-net learning and hierarchical inference through an information-theoretic lens, and try to show why hierarchical learning is such a powerful general principle. Does anyone know whether mutual information or KL divergence is the usual measure for this kind of study, why I might prefer one over the other, or where I might look for literature beyond surveys of deep learning?
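One thing worth noting when weighing the two: mutual information is itself a KL divergence — the divergence between the joint distribution and the product of the marginals — so the choice is less "one or the other" than "which distributions you compare". A minimal sketch of that identity, using a made-up 2×2 joint distribution as the example:

```python
import numpy as np

# Hypothetical joint distribution of two binary variables X and Y
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1, keepdims=True)  # marginal over X (column vector)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal over Y (row vector)

# Mutual information as a KL divergence:
#   I(X;Y) = D_KL( p(x,y) || p(x) p(y) )
# computed in bits (log base 2)
mi = np.sum(p_xy * np.log2(p_xy / (p_x * p_y)))
print(round(mi, 3))  # ~0.278 bits for this joint
```

The practical upshot is that KL divergence is the more general tool (it compares any two distributions, e.g. a model's output to a target), while mutual information is the natural choice when the question is specifically how much one variable — say, a hidden layer — tells you about another, which is why it shows up in information-bottleneck analyses of deep networks.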
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Check immediately before: refresh the list-of-threads page right before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.