Artificial Mysterious Intelligence

Eliezer Yudkowsky

Previously in series: Failure By Affective Analogy

I once had a conversation that I still remember for its sheer, purified archetypicality. This was a nontechnical guy, but pieces of this dialog have also appeared in conversations I've had with professional AIfolk...

Him: Oh, you're working on AI! Are you using neural networks?

Me: I think emphatically not.

Him: But neural networks are so wonderful! They solve problems and we don't have any idea how they do it!

Me: If you are ignorant of a phenomenon, that is a fact about your state of mind, not a fact about the phenomenon itself. Therefore your ignorance of how neural networks are solving a specific problem, cannot be responsible for making them work better.

Him: Huh?

Me: If you don't know how your AI works, that is not good. It is bad.

Him: Well, intelligence is much too difficult for us to understand, so we need to find some way to build AI without understanding how it works.

Me: Look, even if you could do that, you wouldn't be able to predict any kind of positive outcome from it. For all you knew, the AI would go out and slaughter orphans.

Him: Maybe we'll build Artificial Intelligence by scanning the brain and building a neuron-by-neuron duplicate. Humans are the only systems we know are intelligent.

Me: It's hard to build a flying machine if the only thing you understand about flight is that somehow birds magically fly. What you need is a concept of aerodynamic lift, so that you can see how something can fly even if it isn't exactly like a bird.

Him: That's too hard. We have to copy something that we know works.

Me: (reflectively) What do people find so unbearably awful about the prospect of having to finally break down and solve the bloody problem? Is it really that horrible?

Him: Wait... you're saying you want to actually understand intelligence?

Me: Yeah.

Him: (aghast) Seriously?

Me: I don't know everything I need to know about intelligence, but I've learned a hell of a lot. Enough to know what happens if I try to build AI while there are still gaps in my understanding.

Him: Understanding the problem is too hard. You'll never do it.

That's not just a difference of opinion you're looking at, it's a clash of cultures.

For a long time, many different parties and factions in AI, adherent to more than one ideology, have been trying to build AI without understanding intelligence. And their habits of thought have become ingrained in the field, and even transmitted to parts of the general public.

You may have heard proposals for building true AI which go something like this:

Calculate how many operations the human brain performs every second. This is "the only amount of computing power that we know is actually sufficient for human-equivalent intelligence". Raise enough venture capital to buy a supercomputer that performs an equivalent number of floating-point operations in one second. Use it to run the most advanced available neural network algorithms.
The brain is huge and complex. When the Internet becomes sufficiently huge and complex, intelligence is bound to emerge from the Internet. (I get asked about this in 50% of my interviews.)
Computers seem unintelligent because they lack common sense. Program a very large number of "common-sense facts" into a computer. Let it try to reason about the relation of these facts. Put a sufficiently huge quantity of knowledge into the machine, and intelligence will emerge from it.
Neuroscience continues to advance at a steady rate. Eventually, super-MRI or brain sectioning and scanning will give us precise knowledge of the local characteristics of all human brain areas. So we'll be able to build a duplicate of the human brain by duplicating the parts. "The human brain is the only example we have of intelligence."
Natural selection produced the human brain. It is "the only method that we know works for producing general intelligence". So we'll have to scrape up a really huge amount of computing power, and evolve AI.

What do all these proposals have in common?

They are all ways to make yourself believe that you can build an Artificial Intelligence, even if you don't understand exactly how intelligence works.

Now, such a belief is not necessarily false! Methods 4 and 5, if pursued long enough and with enough resources, will eventually work. (5 might require a computer the size of the Moon, but give it enough crunch and it will work, even if you have to simulate a quintillion planets and not just one...)

But regardless of whether any given method would work in principle, the unfortunate habits of thought will already begin to arise, as soon as you start thinking of ways to create Artificial Intelligence without having to penetrate the mystery of intelligence.

I have already spoken of some of the hope-generating tricks that appear in the examples above. There is invoking similarity to humans, or using words that make you feel good. But really, a lot of the trick here just consists of imagining yourself hitting the AI problem with a really big rock.

I know someone who goes around insisting that AI will cost a quadrillion dollars, and as soon as we're willing to spend a quadrillion dollars, we'll have AI, and we couldn't possibly get AI without spending a quadrillion dollars. "Quadrillion dollars" is his big rock, that he imagines hitting the problem with, even though he doesn't quite understand it.

It often will not occur to people that the mystery of intelligence could be any more penetrable than it seems: By the power of the Mind Projection Fallacy, being ignorant of how intelligence works will make it seem like intelligence is inherently impenetrable and chaotic. They will think they possess a positive knowledge of intractability, rather than thinking, "I am ignorant."

And the thing to remember is that, for these last decades on end, any professional in the field of AI trying to build "real AI", had some reason for trying to do it without really understanding intelligence (various fake reductions aside).

The New Connectionists accused the Good-Old-Fashioned AI researchers of not being parallel enough, not being fuzzy enough, not being emergent enough. But they did not say, "There is too much you do not understand."

The New Connectionists catalogued the flaws of GOFAI for years on end, with fiery castigation. But they couldn't ever actually say: "How exactly are all these logical deductions going to produce 'intelligence', anyway? Can you walk me through the cognitive operations, step by step, which lead to that result? Can you explain 'intelligence' and how you plan to get it, without pointing to humans as an example?"

For they themselves would be subject to exactly the same criticism.

In the house of glass, somehow, no one ever gets around to talking about throwing stones.

To tell a lie, you have to lie about all the other facts entangled with that fact, and also lie about the methods used to arrive at beliefs: The culture of Artificial Mysterious Intelligence has developed its own Dark Side Epistemology, complete with reasons why it's actually wrong to try and understand intelligence.

Yet when you step back from the bustle of this moment's history, and think about the long sweep of science - there was a time when stars were mysterious, when chemistry was mysterious, when life was mysterious. And in this era, much was attributed to black-box essences. And there were many hopes based on the similarity of one thing to another. To many, I'm sure, alchemy just seemed very difficult rather than even seeming mysterious; most alchemists probably did not go around thinking, "Look at how much I am disadvantaged by not knowing about the existence of chemistry! I must discover atoms and molecules as soon as possible!" They just memorized libraries of random things you could do with acid, and bemoaned how difficult it was to create the Philosopher's Stone.

In the end, though, what happened is that scientists achieved insight, and then things got much easier to do. You also had a better idea of what you could or couldn't do. The problem stopped being scary and confusing.

But you wouldn't hear a New Connectionist say, "Hey, maybe all the failed promises of 'logical AI' were basically due to the fact that, in their epistemic condition, they had no right to expect their AIs to work in the first place, because they couldn't actually have sketched out the link in any more detail than a medieval alchemist trying to explain why a particular formula for the Philosopher's Stone will yield gold." It would be like the Pope attacking Islam on the basis that faith is not an adequate justification for asserting the existence of their deity.

Yet in fact, the promises did fail, and so we can conclude that the promisers overreached what they had a right to expect. The Way is not omnipotent, and a bounded rationalist cannot do all things. But even a bounded rationalist can aspire not to overpromise - to only say you can do, that which you can do. So if we want to achieve that reliably, history shows that we should not accept certain kinds of hope. In the absence of insight, hopes tend to be unjustified because you lack the knowledge that would be needed to justify them.

We humans have a difficult time working in the absence of insight. It doesn't reduce us all the way down to being as stupid as evolution. But it makes everything difficult and tedious and annoying.

If the prospect of having to finally break down and solve the bloody problem of intelligence seems scary, you underestimate the interminable hell of not solving it.

Dude. Dude. No wonder you've been so emphatic in your denunciations of mysterious answers to mysterious questions.

The universe doesn't have to be kind and make all problems amenable to insight....

There are only a certain number of short programs, and once a program gets above a certain length it is hard to compress (I can't remember the reference for this, so it may be wrong, can anyone help?). We can of course reorder things, but then we have to make things currently simple complex.

That said I do think insight will play some small part in the development of AI, but that there may well be a hell a lot of parameter tweaking that we don't understand or know why they are so.

The arguments Eliezer describes are made, and his reactions are fair. But really the actual research community "grew out" of most of this stuff a while back. CYC and the "common sense" efforts were always a sideshow (in terms of research money and staff, not to mention results). Neural networks were a metonym for statistical learning for a while, then serious researchers figured out they needed to address statistical learning explicitly. Etc.

Admittedly there's always excessive enthusiasm for the current hot thing. A few years ago it was support vector machines, I'm not sure what now.

I recognize there's some need to deflate popular misconceptions, but there's also a need to move on and look at current work.

Eliezer, I'd be very interested in your comments on (what I regard as) the best current work. Examples for you to consider would be Sebastian Thrun, Andrew Ng (both in robotics at Stanford), Chris Manning (linguistics at Stanford), and the papers in the last couple of NIPS conferences (the word "Neural" in the conference title is just a fossil, don't have an allergic reaction).

As an entertaining side note, here's an abstract for a poster for NIPS '08 (happening tomorrow) that addresses the crossover between AI and ems:

A Bayesian Approach for Extracting State Transition Dynamics from Multiple Spike Trains

Neural activity is non-stationary and varies across time. Hidden Markov Models (HMMs) have been used to track the state transition among quasi-stationary discrete neural states. Within this context, an independent Poisson model has been used for the output distribution of HMMs; hence, the model is incapable of tracking the change in correlation without modulating the firing rate. To achieve this, we applied a multivariate Poisson distribution with a correlation term for the output distribution of HMMs. We formulated a Variational Bayes (VB) inference for the model. The VB could automatically determine the appropriate number of hidden states and correlation types while avoiding the overlearning problem. We developed an efficient algorithm for computing posteriors using the recursive relationship of a multivariate Poisson distribution. We demonstrated the performance of our method on synthetic data and a real spike train recorded from a songbird.This is a pretty good example of what I meant by "solving engineering problems" and it should help the ems program "cut corners".

Will Pearson: are you suggesting that the simplest algorithm for intelligence is too large to fit in human memory?

I should mention that the NIPS '08 papers aren't on line yet, but all previous conferences do have the papers, tutorials, slides, background material, etc. on line. For example here's last year.

4 seems important to me. I wouldn't expect intelligence to come via that route, but that route does seem to put a fairly credible (e.g. I would bet 4:1 on claims that credible and expect to win in the long term), though high, soft upper bound to how long we can go on with roughly current rate scientific progress without achieving AI. I'd say that it suggests such a soft uupper bound in the 2070s. That said, I wouldn't be at all surprised by science ceasing to advance at something like the current rate long before then, accelerating or decelerating a lot even without a singularity.

We shouldn't under-rate the power of insight, but we shouldn't over-rate it either; some systems can just be a mass of details, and to master such systems you must master those details. And if you pin your hopes for AI progress on powerful future insights, you have to ask how often such insights occur, and how many we would need. The track record so far doesn't look especially encouraging.

A question about Andrew Ng, who was mentined in this thread. Is that his real name?

A question about Andrew Ng, who was mentined in this thread. Is that his real name?

Here's his homepage: http://robotics.stanford.edu/~ang/

Looks like he's one of the people who worked on the LittleDog robot: http://cs.stanford.edu/groups/littledog/

Robin, the question of whether compact insights exist and whether they are likely to be obtained in reasonable time (and by how large a group, etc) are very different questions and should be considered separately, in order.

Jed, I've met Sebastian Thrun, he's smart but not quite Bayesian-to-the-uttermost-core. (I.e., he once gave an anecdote involving a randomized algorithm and it took me a while to figure out how to derandomize it.) Haven't met the others, but I agree with you that a whole lot of AI has moved on far ahead from the old days. Artificial Intelligence: A Modern Approach is a gorgeous textbook; e.g., they give some idea of what logic is good for and what it's not. Statistical learning is cool (I'm a Bayesian, what do you expect me to say?) and again the great marker is that often they can say, in an abstract sense, what kind of regularity the algorithm assumes and exploits.

But for some odd reason, a lot of the people working in the field of Artificial General Intelligence - "true AI" startups and projects - still seem to adhere to older ways, and to believe that mysterious algorithms can work... an obvious selection effect once you realize it.

But if the brain does not work by magic (of course), then insight does not either. Genius is 99% perspiration, 10,000 failed lightbulbs and all that...

I think the kind of experimental approach Jed Harris was talking about yesterday is where AI will eventually come from. Some researcher who has 10,000 failed AI programs on his hard drive will then have the insight, but not before. The trick is, once he has the insight, to not implement it right away but stop and think about friendliness! But after so many failures how could you resist...

Edison is not a good example of someone who produced insights.

Eliezer: Sorry to harp on something tangential to your main point, but you keep repeating the same mistake and it's bugging me. Evolution is not as slow as you think it is.

In an addendum to this post you mention that you tried a little genetic algorithm in Python, and it didn't do as badly as you would have expected from the math. There is a reason for this. You have the math completely wrong. Or rather, you have it correct for asexual reproduction, and then wrongly assume the limit still applies when you add in sex. As has been pointed out before, genetic algorithms with sex are much, much, much faster than asexual algorithms. Not faster by a constant factor; faster in proportion to the square root of the genome length, which can be pretty damn big.

The essential difference is that as a genome gets more fit, the odds of a mutation hurting rather than helping fitness go up, which limits your acceptable mutation rate, which limits your search rate for asexual reproduction. But if you rely on crossover (sex) to produce new individuals, (1) the expected fitness of the child is equal to the average fitness of the parents, even if the parents are already very fit; and (2) mutation isn't the main thing driving the search anyway, so even if your mutation rate is very low you can still search a large space quickly.

Once again I'll point to MacKay's Information Theory, Learning, and Inference Algorithms for a much better explanation of the math (from an information theory standpoint) than I'm capable of.

And just for fun, but here's another way to look at it that I haven't seen described before. Asexual evolution, as you've pointed out a few times, relies on mutations to generate new candidates, so it's just doing a random walk to adjacent positions in search space. But if you take two parent genomes and do a random crossover, you're effectively taking the hypercube (in the search space of genome strings) whose corners are the two parents, and randomly picking a different corner. So you're taking a much larger step. Mutations are now only necessary to prevent the population from collapsing into a lower-dimensional search space; they aren't the driver of innovation any more.

Jeff - thanks for your comment on evolutionary algorithms. Gave me a completely new perspective on these.

Real robots are hardly going to be the route to AI - too slow a build-test cycle.

Tim Tyler:

As much as I like your posts, one formal note:

If you are responding to somebody else, it is always a good idea to put his name at the beginning of post.

Jeff, if you search for my pseudonym in the comments of the "Natural Selection's Speed Limit and Complexity Bound" post, you will see that I have already brought MacKay's work to Eliezer's attention. Whatever conclusions he's come to have already factored MacKay in.

I'd previously read some of MacKay's book but not that particular section; I actually still have the information-gain problem marked as "pending". Anyone who can't distinguish between 1s gained in a bitstring, and negentropy gained in allele frequencies, is politely invited not to try to solve this particular problem.

How is this different from saying,

"For a long time, many different parties and factions in AI, adherent to more than one ideology, have been trying to build AI without understanding consciousness. Unfortunate habits of thought will already begin to arise, as soon as you start thinking of ways to create Artificial Intelligence without having to penetrate the mystery of consciousness. Instead of all this mucking about with neurons and neuroanatomy and population encoding and spike trains, we should be facing up to the hard problem of understanding what consciousness is."

I would consider that a perfectly legitimate remark if you were trying to build an Artificial Consciousness.

Eliezer: I was making a parallel. I didn't mean "how are these different"; I really meant, "This statement below about consciousness is wrong; yet it seems very similar to Eliezer's post. What is different about Eliezer's post that would make it not be wrong in the same way?"

That said, we don't know what consciousness is, and we don't know what intelligence is; and both occur in every instance of intelligence that we know of; and it would be surprising to find one without the other even in an AI; so I don't think we can distinguish between them.

Eliezer: "Anyone who can't distinguish between 1s gained in a bitstring, and negentropy gained in allele frequencies, is politely invited not to try to solve this particular problem."

Ok, here's the argument translated into allele frequencies. With sexual selection, mutations spread rapidly through the population, so we assume that each individual gets a random sample from the set of alleles for each gene. This means that some poor bastards will get more than their share of the bad ones and few of the good ones (for the current environment), while luckier individuals gets lots of good ones and few bad ones. When the unlucky individuals fail to reproduce, they're going to eliminate bad genes at a higher-than-average rate, and good genes at a lower-than-average rate.

"On average, one detrimental mutation leads to one death" does not hold with sexual selection.

Also, just in case I'm giving the wrong impression--I'm not trying to argue that genetic algorithms are some kind of secret sauce that has special relevance for AI. They just aren't as slow as you keep saying.

Eliezer, MacKay's math isn't very difficult. I think it will take you at most a couple of hours to go through how he derived his equations, understand what they mean, and verify that they are correct. (If I knew you were going to put this off for a year, I'd mentioned that during the original discussion.) After doing that, the idea that sexual reproduction speeds up evolution by gathering multiple bad mutations together to be disposed of at once will become pretty obvious in retrospect.

Jeff, I agree with what you are saying, but you're using the phrase "sexual selection" incorrectly, which might cause confusion to others. I think what you mean is "natural selection in a species with sexual reproduction". "Sexual selection" actually means "struggle between the individuals of one sex, generally the males, for the possession of the other sex".

Wei, I already understand the idea that sexual selection gathers multiple mutations to dispose - i.e., the rule is still "one mutation, one death" but not "one death, one mutation". I can even accept that eliminating half the population applies more than one bit of selection pressure, because the halves are not randomly assigned. But it's not a simple matter of reading MacKay and accepting everything he says - I have to reconcile with Worden's speed limit and with Kimura, and try to get a grasp on the negentropy produced rather than the number of bits in a bitstring.

Jeff, my estimate above on how slow evolution is, was based on the historical evolution time of Earth, rather than any numerical calculation.