Once upon a time . . .
This is a story from when I first met Marcello, with whom I would later work for a year on AI theory; but at this point I had not yet accepted him as my apprentice. I knew that he competed at the national level in mathematical and computing olympiads, which sufficed to attract my attention for a closer look; but I didn’t know yet if he could learn to think about AI.
I had asked Marcello to say how he thought an AI might discover how to solve a Rubik’s Cube. Not in a preprogrammed way, which is trivial, but rather how the AI itself might figure out the laws of the Rubik universe and reason out how to exploit them. How would an AI invent for itself the concept of an “operator,” or “macro,” which is the key to solving the Rubik’s Cube?
At some point in this discussion, Marcello said: “Well, I think the AI needs complexity to do X, and complexity to do Y—”
And I said, “Don’t say ‘complexity.’ ”
Marcello said, “Why not?”
I said, “Complexity should never be a goal in itself. You may need to use a particular algorithm that adds some amount of complexity, but complexity for the sake of complexity just makes things harder.” (I was thinking of all the people whom I had heard advocating that the Internet would “wake up” and become an AI when it became “sufficiently complex.”)
And Marcello said, “But there’s got to be some amount of complexity that does it.”
I closed my eyes briefly, and tried to think of how to explain it all in words. To me, saying “complexity” simply felt like the wrong move in the AI dance. No one can think fast enough to deliberate, in words, about each sentence of their stream of consciousness; for that would require an infinite recursion. We think in words, but our stream of consciousness is steered below the level of words, by the trained-in remnants of past insights and harsh experience . . .
I said, “Did you read ‘A Technical Explanation of Technical Explanation’?”1
“Yes,” said Marcello.
“Okay,” I said. “Saying ‘complexity’ doesn’t concentrate your probability mass.”
“Oh,” Marcello said, “like ‘emergence.’ Huh. So . . . now I’ve got to think about how X might actually happen . . .”
That was when I thought to myself, “Maybe this one is teachable.”
Complexity is not a useless concept. It has mathematical definitions attached to it, such as Kolmogorov complexity and Vapnik-Chervonenkis complexity. Even on an intuitive level, complexity is often worth thinking about—you have to judge the complexity of a hypothesis and decide if it’s “too complicated” given the supporting evidence, or look at a design and try to make it simpler.
But concepts are not useful or useless of themselves. Only usages are correct or incorrect. In the step Marcello was trying to take in the dance, he was trying to explain something for free, get something for nothing. It is an extremely common misstep, at least in my field. You can join a discussion on artificial general intelligence and watch people doing the same thing, left and right, over and over again—constantly skipping over things they don’t understand, without realizing that’s what they’re doing.
In an eyeblink it happens: putting a non-controlling causal node behind something mysterious, a causal node that feels like an explanation but isn’t. The mistake takes place below the level of words. It requires no special character flaw; it is how human beings think by default, how they have thought since ancient times.
What you must avoid is skipping over the mysterious part; you must linger at the mystery to confront it directly. There are many words that can skip over mysteries, and some of them would be legitimate in other contexts—“complexity,” for example. But the essential mistake is that skip-over, regardless of what causal node goes behind it. The skip-over is not a thought, but a microthought. You have to pay close attention to catch yourself at it. And when you train yourself to avoid skipping, it will become a matter of instinct, not verbal reasoning. You have to feel which parts of your map are still blank, and more importantly, pay attention to that feeling.
I suspect that in academia there is a huge pressure to sweep problems under the rug so that you can present a paper with the appearance of completeness. You’ll get more kudos for a seemingly complete model that includes some “emergent phenomena,” versus an explicitly incomplete map where the label says “I got no clue how this part works” or “then a miracle occurs.” A journal may not even accept the latter paper, since who knows but that the unknown steps are really where everything interesting happens?2
And if you’re working on a revolutionary AI startup, there is an even huger pressure to sweep problems under the rug; or you will have to admit to yourself that you don’t know how to build the right kind of AI yet, and your current life plans will come crashing down in ruins around your ears. But perhaps I am over-explaining, since skip-over happens by default in humans. If you’re looking for examples, just watch people discussing religion or philosophy or spirituality or any science in which they were not professionally trained.
Marcello and I developed a convention in our AI work: when we ran into something we didn’t understand, which was often, we would say “magic”—as in, “X magically does Y”—to remind ourselves that here was an unsolved problem, a gap in our understanding. It is far better to say “magic” than “complexity” or “emergence”; the latter words create an illusion of understanding. Wiser to say “magic,” and leave yourself a placeholder, a reminder of work you will have to do later.
1 http://lesswrong.com/rationality/a-technical-explanation-of-technical-explanation
2 And yes, it sometimes happens that all the non-magical parts of your map turn out to also be non-important. That’s the price you sometimes pay, for entering into terra incognita and trying to solve problems incrementally. But that makes it even more important to know when you aren’t finished yet. Mostly, people don’t dare to enter terra incognita at all, for the deadly fear of wasting their time.
The solution (I also posted it elsewhere):
To solve a Rubik's Cube, you can just do hill climbing, with a breadth-first-ish search for a higher point on the hill (i.e., you find a higher point even if it is several moves away). This discovers the sequences. Cache the sequences.
It's a very general problem-solving method: hill climbing with N-move look-ahead. You try maximizing various metrics that are maximal in the final state, and find one that works for you without getting stuck in a local maximum for too long. You also try various orders of iterating over the moves (e.g., one could opt for repetitive sequences).
This works for chess as well, and for pretty much all puzzles. It is how I solve puzzles when I see one for the first time, except of course I have terabytes' worth of tricks I can try, and 10^15-ish operations per second; parallel, of course, but parallel works. Pre-generating sequences is not necessary: you arrive at them while hill climbing with breadth-first search, and cache them. You can also tell them to other people whom you want to turn into Rubik's-Cube-solvers. The thing that can't be stressed enough: try to figure out a good metric to climb. Some sides of the hill are smoother than others.
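Since the method above is described only in prose, here is a minimal sketch of that loop in Python, under stated assumptions: `legal_moves`, `apply_move`, and `score` are hypothetical stand-ins for the puzzle's move generator, move application, and whatever metric you choose to climb (one that is maximal in the solved state), and states are assumed to be hashable.

```python
from collections import deque

def climb(state, legal_moves, apply_move, score, max_depth=4):
    """Greedily improve `state`; each step breadth-first searches up to
    `max_depth` moves ahead for a strictly better state."""
    cached_sequences = []                  # the discovered sequences ("macros")
    while True:
        best_score, best_seq = score(state), []
        frontier = deque([(state, [])])
        seen = {state}                     # states assumed hashable
        while frontier:
            s, seq = frontier.popleft()
            if len(seq) >= max_depth:
                continue
            for m in legal_moves(s):
                s2 = apply_move(s, m)
                if s2 in seen:
                    continue
                seen.add(s2)
                seq2 = seq + [m]
                if score(s2) > best_score:
                    best_score, best_seq = score(s2), seq2
                frontier.append((s2, seq2))
        if not best_seq:                   # no improvement within the horizon
            return state, cached_sequences
        cached_sequences.append(best_seq)  # cache the sequence for reuse
        for m in best_seq:
            state = apply_move(state, m)
```

The cached `best_seq` lists are what the essay calls "operators" or "macros": move sequences that reliably push the metric uphill, discovered rather than preprogrammed.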
One could hill climb some sort of complexity metric. Evolution did that to arrive at humans, even though bacteria are a better solution to 'reproduction'. You only need a comparator for climbing. Comparators are easy: you can make agents fight (or you can make agents cooperate). You don't need a mapping to real numbers. You can do evolutionary hill climbing with N-move look-ahead. Edit: note that you do NOT need a good ordering for hill climbing either. If sometimes a > b and b > c and c > a, that is okay as long as you remember where you have already been and avoid looping. That may still get you to the top of the hill.
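For the comparator-only variant, here is a small sketch under the same assumptions as above, with `better` as a hypothetical pairwise comparator that may not be transitive; the visited set is what keeps an ordering like a > b > c > a from sending the climb in circles.

```python
def climb_with_comparator(state, legal_moves, apply_move, better):
    """Hill climb using only a pairwise comparator, no real-valued metric.
    The visited set prevents loops even if `better` is not transitive."""
    visited = {state}                      # states assumed hashable
    improved = True
    while improved:
        improved = False
        for m in legal_moves(state):
            s2 = apply_move(state, m)
            if s2 not in visited and better(s2, state):
                visited.add(s2)
                state = s2
                improved = True
                break                      # take the first accepted move
    return state
```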
I can't understand what you mean. Surely you don't mean that natural selection rewarded something besides inclusive genetic fitness.