Updated link: https://www.learningtheory.org/learning-has-just-started-an-interview-with-prof-vladimir-vapnik/ (while looking up his very weird transfer-learning research).
This discussion, complex world vs. simple rules, is very old and goes back to Plato and Aristotle. Plato explained our ability to recognize each and every A: all concrete examples partake, in varying degrees, in the ideal A. As these ideals do not exist in the world of the senses, he postulated some kind of hyperreality, the world of ideas, where they exist in timeless perfection. Our souls come from there to this world, and all recognition is re-cognition. Of course, this stuff is hard to swallow for a programmer trying to build some damned machine. A good prototype is better than nothing, but the ideal A has so far eluded every constructivist attempt.
His critic Aristotle did not believe in the world of ideas. As a taxonomist, he described the camel by its attributes: if the distinguishing attributes are present, it's a camel; otherwise it's not. Characterizing an 'A' by its attributes has proved harder than it seems. What is an attribute? Which attributes are useful? Is this line still short, or is it already long? Is this a round bow or an edge? Not every A looks like a pointy hat! Does one very characteristic feature compensate for the lack of three others? Even if we have good features, there may be no simple rules. There is the well-known "rule" that there is no rule without exception: even folk wisdom discourages any attempt to catch A-ness in a simple net of if and else.
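Here is a toy sketch (my own, with invented feature names and thresholds) of what catching A-ness in a net of if and else might look like, and why every threshold immediately invites exceptions:

```python
# Toy illustration (invented features and thresholds): trying to define
# "A-ness" with hand-written rules in the Aristotelian, attribute-checking style.

def looks_like_A(has_apex: bool, has_crossbar: bool, stroke_angle_deg: float) -> bool:
    """Classify a glyph as 'A' from a few hand-picked attributes.

    Every threshold below is an arbitrary choice -- exactly the problem described
    above: is 35 degrees already 'pointy'? What about a slightly curved crossbar?
    """
    if not has_apex:
        return False          # but some fonts draw 'A' with a flat or rounded top...
    if not has_crossbar:
        return False          # ...and handwritten A's often lose the crossbar entirely
    if stroke_angle_deg < 10 or stroke_angle_deg > 80:
        return False          # "is this line already too upright to count?"
    return True

# Each exception forces another special case, and the rule set never converges.
print(looks_like_A(True, True, 35.0))    # True
print(looks_like_A(True, False, 35.0))   # False -- yet many real A's would fail here
```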
We should not expect that the concept of A-ness can be expressed by such simple means. The set of all "A" (it exists as a Platonic set somewhere, or doesn't it?) may be pulled back into R^n as a set of grayscale images, but do we really know anything about its geometric structure? For large n it is a thin subset, a complicated geometric object. The metric of R^n preserves its local structure, but that's all; it tells us nothing about the concept of A-ness. We should expect that large amounts of memory are necessary.
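A minimal numpy sketch of the "pulled back into R^n" picture, using a synthetic 8x8 image (so n = 64; the crude 'A' below is made up for illustration): a tiny perturbation stays nearby, so the metric preserves local structure, but the same letter shifted by one pixel already lands about as far away as an unrelated random image.

```python
# Minimal sketch with synthetic 8x8 "images": each grayscale image is one point
# in R^n (here n = 64; for 28x28 images n is already 784).
import numpy as np

side = 8
n = side * side
rng = np.random.default_rng(0)

# A crude stand-in for an 'A': two diagonal strokes plus a crossbar.
a_img = np.zeros((side, side))
for i in range(side):
    a_img[i, i // 2] = 1.0              # left stroke
    a_img[i, side - 1 - i // 2] = 1.0   # right stroke
a_img[side // 2, :] = 1.0               # crossbar

a = a_img.reshape(n)                            # the image as a point in R^64
a_noisy = a + 0.05 * rng.standard_normal(n)     # tiny perturbation of the same image
a_shift = np.roll(a_img, 1, axis=1).reshape(n)  # same letter, shifted one pixel
blob = rng.random(n)                            # an unrelated random image

print(np.linalg.norm(a - a_noisy))  # small: the metric preserves local structure
print(np.linalg.norm(a - a_shift))  # roughly comparable to...
print(np.linalg.norm(a - blob))     # ...the distance to an unrelated image:
                                    # the global metric says nothing about A-ness
```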
FYI (shameless plug): I've tried to illustrate my ideas about a connection between the topology of finite spaces, continuous maps, product spaces and quotient spaces, and the factorization of classifying maps on my website, learning-by-glueing.com. It's not finished; any comments are welcome.
Interesting quotes from the interview:
In classical philosophy there are two principles to explain the generalization phenomenon. One is Occam's razor and the other is Popper's falsifiability. It turns out that by using machine learning arguments one can show that both of them are not very good and that one can generalize violating these principles. There are other justifications for inferences.
...
What happens if it goes to a value a which is not zero? Then one can prove that there exists in the space X a subspace X0 with probability measure a, such that the subset of training vectors that belong to this subspace can be separated in all possible ways. This means that you cannot generalize.
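For concreteness (my own illustration, not something from the interview): "separated in all possible ways" is the notion of shattering from VC theory. Assuming scikit-learn is available, a brute-force check over all +/-1 labelings of a small point set looks like this:

```python
# Illustration: a point set is shattered by linear separators if every +/-1
# labeling of it can be realized with zero training error.
import itertools
import numpy as np
from sklearn.svm import SVC

def is_shattered(points: np.ndarray) -> bool:
    """Return True if every +/-1 labeling of `points` is realized by some
    linear separator (checked by fitting a near-hard-margin linear SVM)."""
    m = len(points)
    for labels in itertools.product([-1, 1], repeat=m):
        if len(set(labels)) == 1:
            continue  # a constant labeling is trivially separable
        y = np.array(labels)
        clf = SVC(kernel="linear", C=1e6).fit(points, y)
        if clf.score(points, y) < 1.0:
            return False  # this labeling cannot be separated
    return True

# Three points in general position in the plane are shattered by lines...
print(is_shattered(np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])))  # True
# ...but four points in an XOR-like configuration are not.
print(is_shattered(np.array([[0.0, 0.0], [1.0, 1.0],
                             [1.0, 0.0], [0.0, 1.0]])))              # False
```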
Scary thought: What if the rules for AI are so complex that it is impossible to build one, or to prove that an AI will be stable and/or friendly? If this turns out to be the case, then the singularity will never happen, and we have an explanation for the Fermi paradox.
It's a legitimate possibility that FAI is just too hard for the human race to achieve from anything like our current state, so that (barring some fantastic luck) we're either doomed to an extinction event, or to a "cosmic locust" future, or to something completely different.
In fact, I'd bet 20 karma against 10 that Eliezer would assign a probability of at least 1% to this being the case, and I'd bet 50 against 10 that Robin assigns a probability of 50% or greater to it.
However, if FAI is in fact too difficult, then the SIAI program seems to do no harm; and if it's not too hard, it could do a world of good. (This is one benefit of the "provably Friendly" requirement, IMO.)
For example, in a complex world one should give up explainability (the main goal in classical science) to gain better predictability.
This sounds a lot like True vs. Useful, again.
(Of course it's a bit redundant to call it "machine" learning, since we are learning machines, and there's little reason to assume that we don't learn using mechanical processes optimized for multi-factor matching. And that would tend to explain why learning and skills don't always transfer well between Theory and Practice.)
I recently stumbled across this remarkable interview with Vladimir Vapnik, a leading light in statistical learning theory, one of the creators of the Support Vector Machine algorithm, and generally a cool guy. The interviewer obviously knows his stuff and asks probing questions. Vapnik describes his current research and also makes some interesting philosophical comments:
Later: