"The answer is that the universe is governed by a tiny subset of all possible functions. In other words, when the laws of physics are written down mathematically, they can all be described by functions that have a remarkable set of simple properties."
"For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order." "These properties mean that neural networks do not need to approximate an infinitude of possible mathematical functions but only a tiny subset of the simplest ones."
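For a concrete sense of what "polynomial Hamiltonians of low order" means (a standard textbook example, not something taken from the article): the harmonic oscillator's Hamiltonian is just a degree-2 polynomial in position x and momentum p,

    H(x, p) = p^2 / (2m) + (1/2) m w^2 x^2

and if I remember the paper right, the Hamiltonians physicists actually use top out around degree 4 (the Standard Model included), a vanishingly small corner of the space of all possible functions.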
Interesting article, and I'm just diving into the paper now, but it looks like this is a big boost to the simulation argument. If the universe is built like a game engine, with nested self-similar structures like the Mandelbrot set, then the simplicity itself becomes a driver for a fabricated reality: simple laws are cheap to compute.
https://www.technologyreview.com/s/602344/the-extraordinary-link-between-deep-neural-networks-and-the-nature-of-the-universe/
Why does deep and cheap learning work so well?
http://arxiv.org/abs/1608.08225
This question answers itself. If neural networks could really approximate every possible function, they could never generalize. That is the whole point of statistical learning theory: you get a Probably Approximately Correct (PAC) generalization bound when 1) your learning machine achieves good empirical accuracy and 2) the set of functions the machine can express is small, by some capacity measure such as VC dimension or log-cardinality, relative to the number of training samples.
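As a rough sketch of point 2 (my own illustration, not from the paper): for a finite hypothesis class H, Hoeffding's inequality plus a union bound give that with probability at least 1 - delta, the true error is at most the empirical error plus sqrt((ln|H| + ln(1/delta)) / (2m)) over m training samples:

    import math

    def pac_bound(emp_error, log_num_hypotheses, num_samples, delta=0.05):
        # Finite-class PAC bound (Hoeffding + union bound): with
        # probability >= 1 - delta, true error <= emp_error + slack.
        slack = math.sqrt((log_num_hypotheses + math.log(1.0 / delta))
                          / (2.0 * num_samples))
        return emp_error + slack

    # Same empirical accuracy, same amount of data; a bigger
    # function class gives a much weaker guarantee:
    print(pac_bound(0.05, 20 * math.log(2), 10000))    # |H| = 2^20,   ~0.08
    print(pac_bound(0.05, 2000 * math.log(2), 10000))  # |H| = 2^2000, ~0.31

A machine that could express literally every function has unbounded log|H| (and VC dimension), so the slack term never shrinks no matter how much data you collect, which is exactly why universal approximation alone can't explain why deep learning works.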