If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
I don't think "sparse neural networks" fit the bill. All the references I've turned up for the phrase describe the usual kind of network, what I've been calling layered NNs, but with most of the parameters set to zero. This leaves the layer structure intact.
To express more precisely the sort of connectivity I'm talking about, for any NN, construct the following directed graph. There is one node for every neuron, and an arc from each neuron A to each neuron B whose output depends directly on an output value of A.
For the NNs described in, e.g., Andrej Karpathy's lectures (which I'm currently going through), this graph is a DAG. Furthermore, it is a DAG having the property of layeredness, which I define thus:
A DAG is layered if every node A can be assigned an integer label L(A), such that for every edge from A to B, L(B) = L(A)+1. A layer is the set of all the nodes having a given label.
The sparse NNs I've found in the literature are all layered. A "full" (i.e. not sparse) NN would also satisfy the converse of the condition in the above definition, i.e. L(B) = L(A)+1 would imply an edge from A to B.
The simplest example of a non-layered DAG is one with three nodes A, B, and C, with edges from A to B, A to C, and B to C. If you tried to structure this into layers, you would either find an edge between two nodes in the same layer, or an edge that skips a layer.
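The definition and the three-node example can be checked mechanically. Below is a minimal Python sketch, not a reference implementation: the function name `assign_layers` and the edge-list representation are assumptions made for illustration. It spreads labels across each connected component, adding 1 when following an edge forwards and subtracting 1 when following it backwards, and gives up as soon as two paths disagree about a node's label.

```python
from collections import defaultdict, deque

def assign_layers(nodes, edges):
    """Return {node: label} with L(b) == L(a) + 1 for every edge (a, b),
    or None if no such labelling exists. Names here are illustrative."""
    # Each edge constrains both endpoints: +1 going forwards, -1 going backwards.
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append((b, 1))
        neighbours[b].append((a, -1))

    labels = {}
    for start in nodes:
        if start in labels:
            continue
        labels[start] = 0  # arbitrary anchor for this connected component
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b, step in neighbours[a]:
                expected = labels[a] + step
                if b not in labels:
                    labels[b] = expected
                    queue.append(b)
                elif labels[b] != expected:
                    return None  # two paths disagree about b's layer
    return labels

# A three-node chain is layered.
print(assign_layers(["x", "h", "y"], [("x", "h"), ("h", "y")]))
# {'x': 0, 'h': 1, 'y': 2}

# The three-node example: A -> B, A -> C, B -> C.
print(assign_layers(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")]))
# None
```

Any directed cycle also makes the check fail, since the +1 steps around the cycle cannot sum to zero, so a graph that passes is automatically a DAG.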
To cover non-DAG NNs also, I'd call one layered if in the above definition, "L(B) = L(A)+1" is replaced by "L(B) = L(A) ± 1". (ETA: This is equivalent to the graph being bipartite: the nodes can be divided into two sets such that every edge goes from a node in one set to a node in the other.)
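The bipartite form of the condition can be tested with an ordinary two-colouring, ignoring edge direction. The sketch below is illustrative only; `two_colour` is a hypothetical name, and the colours 0 and 1 double as labels satisfying L(B) = L(A) ± 1.

```python
from collections import defaultdict, deque

def two_colour(nodes, edges):
    """Return {node: 0 or 1} with differently coloured endpoints on every edge,
    or None if the graph (direction ignored) is not bipartite."""
    neighbours = defaultdict(list)
    for a, b in edges:  # direction is deliberately ignored
        neighbours[a].append(b)
        neighbours[b].append(a)

    colour = {}
    for start in nodes:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in neighbours[a]:
                if b not in colour:
                    colour[b] = 1 - colour[a]
                    queue.append(b)
                elif colour[b] == colour[a]:
                    return None  # an odd cycle: no valid two-colouring
    return colour

# A directed 4-cycle is not a DAG, but it is layered in the +/-1 sense.
four_cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(two_colour(["A", "B", "C", "D"], four_cycle) is not None)  # True

# The three-node example A -> B, A -> C, B -> C still fails.
print(two_colour(["A", "B", "C"], [("A", "B"), ("A", "C"), ("B", "C")]))  # None
```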
It could be called approximately layered if most edges satisfy the condition.
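One way to make "most edges" concrete is to score a given labelling by the fraction of edges that satisfy L(B) = L(A)+1. This is only a sketch under assumptions: `layered_fraction` is a hypothetical helper, the labelling is supplied by hand rather than optimised, and no particular threshold for "most" is implied.

```python
def layered_fraction(labels, edges):
    """Fraction of edges (a, b) with labels[b] == labels[a] + 1."""
    if not edges:
        return 1.0  # no edges, so no violations
    ok = sum(1 for a, b in edges if labels[b] == labels[a] + 1)
    return ok / len(edges)

# Hypothetical example: a four-node chain plus one skip connection from A to D.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
labels = {"A": 0, "B": 1, "C": 2, "D": 3}
print(layered_fraction(labels, edges))  # 0.75
```

A different labelling of the same graph can give a different score, so a fuller treatment would take the best score over all labellings.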
Are there any not-even-approximately-layered NNs in the literature?