You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet.

[Link] An exact mapping between the Variational Renormalization Group and Deep Learning

5 Post author: Gunnar_Zarncke 08 December 2014 02:33PM

An exact mapping between the Variational Renormalization Group and Deep Learning by Pankaj Mehta, David J. Schwab

Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relatively little is understood theoretically about why these techniques are so successful at feature learning and compression. Here, we show that deep learning is intimately related to one of the most important and successful techniques in theoretical physics, the renormalization group (RG). RG is an iterative coarse-graining scheme that allows for the extraction of relevant features (i.e. operators) as a physical system is examined at different length scales. We construct an exact mapping between the variational renormalization group, first introduced by Kadanoff, and deep learning architectures based on Restricted Boltzmann Machines (RBMs). We illustrate these ideas using the nearest-neighbor Ising Model in one and two dimensions. Our results suggest that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.
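For readers unfamiliar with RBMs, here is a minimal NumPy sketch of the two ingredients the paper's mapping rests on: the RBM energy function and the conditional sampling of the hidden layer (which, in the mapping, plays the role of the coarse-grained degrees of freedom). The sizes, initialization, and variable names are my own illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny RBM: 4 visible (spin-like, binary) units, 2 hidden units.
n_vis, n_hid = 4, 2
W = rng.normal(0, 0.1, size=(n_vis, n_hid))  # visible-hidden couplings
b = np.zeros(n_vis)                          # visible biases
c = np.zeros(n_hid)                          # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    # Standard RBM energy: E(v, h) = -b.v - c.h - v.W.h
    return -(b @ v + c @ h + v @ W @ h)

def sample_hidden(v):
    # p(h_j = 1 | v): one block-Gibbs half-step. In the paper's mapping,
    # the hidden units correspond to the coarse-grained spins of an RG step.
    p = sigmoid(c + v @ W)
    return (rng.random(n_hid) < p).astype(float), p

v = np.array([1.0, 0.0, 1.0, 1.0])
h, p = sample_hidden(v)
```

Stacking such layers, so that each hidden layer becomes the visible layer of the next RBM, is what gives the iterated coarse-graining structure the paper compares to RG.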

To me this paper suggests that deep learning is an approach that is already, or could be made, conceptually general enough to learn everything there is to learn (given sufficient time and resources). Thus it could already serve as the base algorithm of a self-optimizing AGI.

Comments (9)

Comment author: IlyaShpitser 08 December 2014 05:47:06PM *  8 points [-]

[ Meta comment about deep NNs and ML: they are very impressive predictors, but please beware of hype; AI, and now machine learning, is culturally rather hype-prone. I actually think statistics culture is superior to machine learning culture in this respect. ML and statistics are ultimately about the same topic: drawing conclusions from data intelligently. ]

Comment author: V_V 08 December 2014 04:19:48PM 1 point [-]

This suggests that deep learning is an approach that could be made or is already conceptually general enough to learn everything there is to learn (assuming sufficient time and resources). Thus it could already be used as the base algorithm of a self-optimizing AGI.

The paper is interesting, but I don't think that the authors make this claim or that this claim is suggested by the paper.

Comment author: Gunnar_Zarncke 08 December 2014 05:42:25PM 0 points [-]

Agreed. This suggestion is made by me. I will clarify this in the post.

Comment author: Punoxysm 08 December 2014 05:44:40PM *  1 point [-]

could be made or is already conceptually general enough to learn everything there is to learn

Universality of neural networks is a known result (in the sense that a basic fully-connected net with an input layer, one hidden layer, and an output layer can represent any function, given sufficiently many hidden nodes).

Comment author: skeptical_lurker 10 December 2014 12:13:24AM 1 point [-]

Nitpick: Any continuous function on a compact set. Still, I think this should include most real-life problems.
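A minimal sketch of this universality claim (my own illustration, not from the thread): fix random input weights and biases for a one-hidden-layer sigmoid net, fit only the output weights by least squares to a continuous target on a compact interval, and watch the approximation error shrink as the hidden layer grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on the compact interval [0, 1].
f = lambda x: np.sin(2 * np.pi * x)
x = np.linspace(0.0, 1.0, 200)[:, None]

def fit_one_hidden_layer(n_hidden):
    # Random input weights/biases, sigmoid hidden layer; only the output
    # weights are fit (ordinary least squares). The universal approximation
    # theorem says enough such units suffice to approximate any continuous
    # function on a compact set arbitrarily well.
    W = rng.normal(0, 10, size=(1, n_hidden))
    b = rng.uniform(-10, 10, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(x @ W + b)))          # hidden activations
    w_out, *_ = np.linalg.lstsq(H, f(x).ravel(), rcond=None)
    return np.max(np.abs(H @ w_out - f(x).ravel())) # sup-norm error on grid

err_small = fit_one_hidden_layer(5)
err_big = fit_one_hidden_layer(200)
```

With this setup the 200-unit net fits the target far more closely than the 5-unit one, which is the "inefficiently so" part: representation is guaranteed, but the required width can be large.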

Comment author: Gunnar_Zarncke 08 December 2014 05:46:52PM 0 points [-]

Universality over functions: yes (though inefficiently so). But the claim made in the paper goes deeper.

Comment author: Punoxysm 08 December 2014 08:55:55PM 0 points [-]

Can you explain? I don't know much about renormalization groups.

Comment author: Gunnar_Zarncke 08 December 2014 09:31:59PM 0 points [-]

The idea behind RG is to find a new coarse-grained description of the spin system where one has “integrated out” short distance fluctuations.

Physics has lots of structure that is local. 'Averaging' over local structures can reveal higher-level structure. On rereading I realized that the critical choice remains in the way the RG is constructed, so the approach isn't as general as I initially imagined it to be.
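The "integrating out short-distance fluctuations" step can be sketched concretely with a Kadanoff block-spin transformation, the simplest real-space RG move. This is an illustrative toy (random spins rather than an equilibrated Ising sample, majority rule with ties broken toward +1), not the variational scheme the paper actually maps to RBMs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random +-1 spin configuration on an 8x8 lattice, standing in for a
# sample of the 2D nearest-neighbor Ising model.
spins = rng.choice([-1, 1], size=(8, 8))

def block_spin(config, b=2):
    # Kadanoff block-spin step: tile the lattice into b x b blocks and
    # replace each block by the sign of its summed spin (majority rule;
    # ties broken toward +1). Short-distance fluctuations are integrated
    # out, leaving a coarser effective lattice.
    n = config.shape[0] // b
    blocks = config.reshape(n, b, n, b).sum(axis=(1, 3))
    return np.where(blocks >= 0, 1, -1)

coarse = block_spin(spins)     # 8x8 -> 4x4
coarser = block_spin(coarse)   # 4x4 -> 2x2
```

The critical modeling choice mentioned above is exactly the rule inside `block_spin`: majority rule is one choice among many, and the paper's point is that an RBM's hidden layer can implement such a coarse-graining rule learned from data rather than fixed by hand.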

Comment author: Luke_A_Somers 08 December 2014 05:15:18PM 1 point [-]

This is looking back at existing AI work and noticing a connection. I don't know that the AI folks have much to learn from the renormalization group, unless they happen to be leaving fundamental symmetries around unexploited.