Ok, so the motivation is to learn templates to correlate with the image at each location. But where would you get the idea to then do the same thing again on the resulting correlation map? That seems non-obvious to me. Or do you mean biological vision?
Nope, didn't mean biological vision. Not totally sure I understand your comment, so let me know if I'm rambling.
You can think of lower layers (the ones closer to the input pixels) as "smaller" or "more local," and higher layers as "bigger," or "more global," or "composed of nonlinear combinations of lower-level features." (EDIT: In fact, this restricted connectivity of neurons is an important insight of CNNs, compared to fully connected NNs.)
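To make the "local" versus "global" picture a bit more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the templates are hand-picked rather than learned, the helper names (`correlate2d_valid`, `relu`) are made up, and the image is a toy 7x7 grid. It correlates a 3x3 template with the image, applies a nonlinearity, and then correlates a second 3x3 template with the resulting map, so each second-layer unit ends up depending on a 5x5 patch of the input.

```python
def correlate2d_valid(image, template):
    """Slide the template over the image (no padding) and return the correlation map."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    out = []
    for r in range(ih - th + 1):
        row = []
        for c in range(iw - tw + 1):
            s = 0.0
            for i in range(th):
                for j in range(tw):
                    s += image[r + i][c + j] * template[i][j]
            row.append(s)
        out.append(row)
    return out


def relu(feature_map):
    """Elementwise nonlinearity: keep positive responses, zero out the rest."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical hand-picked template (a real CNN would learn it):
# responds strongly to a bright horizontal bar one pixel thick.
horizontal = [[-1, -1, -1],
              [ 2,  2,  2],
              [-1, -1, -1]]

# Hypothetical second-layer template: sums nearby first-layer responses.
combine = [[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]

# Toy 7x7 image containing a single horizontal line in row 3.
image = [[0.0] * 7 for _ in range(7)]
image[3] = [1.0] * 7

# Layer 1: each unit sees a 3x3 patch of pixels.
layer1 = relu(correlate2d_valid(image, horizontal))   # 5x5 map

# Layer 2: each unit sees a 3x3 patch of layer 1, i.e. a 5x5 patch of pixels.
layer2 = relu(correlate2d_valid(layer1, combine))     # 3x3 map

# Receptive field of a layer-2 unit, measured in input pixels.
receptive_field = 1 + (3 - 1) + (3 - 1)               # 5
```

With these choices the first-layer map is nonzero only where the template is centered on the line, and the second-layer units sum those responses over a wider area. The nonlinearity between the two steps matters: without it, two stacked correlations would collapse into a single larger linear template.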
So if you want to recognize horizontal lines, the lowest layer of a CNN might...
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.