This fallacy gets its name from an ancient sci-fi TV show, which I never saw myself, but was reported to me by a reputable source (some guy at an SF convention). If anyone knows the exact reference, do leave a comment.
So the good guys are battling the evil aliens. Occasionally, the good guys have to fly through an asteroid belt. As we all know, asteroid belts are as crowded as a New York parking lot, so their ship has to carefully dodge the asteroids. The evil aliens, though, can fly right through the asteroid belt because they have amazing technology that dematerializes their ships, and lets them pass through the asteroids.
Eventually, the good guys capture an evil alien ship, and go exploring inside it. The captain of the good guys finds the alien bridge, and on the bridge is a lever. "Ah," says the captain, "this must be the lever that makes the ship dematerialize!" So he pries up the control lever and carries it back to his ship, after which his ship can also dematerialize.
Similarly, to this day, it is still quite popular to try to program an AI with "semantic networks" that look something like this:
(apple is-a fruit)
(fruit is-a food)
(fruit is-a plant)
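Here is a minimal sketch of what a program can actually do with entries like these, assuming the network is stored as (subject, relation, object) triples. The names `NETWORK`, `is_a`, `rename` and the token `G0025` are hypothetical, chosen for illustration, and not taken from any real AI system. The program follows is-a links transitively. If you rename every "apple" to an arbitrary token, every inference comes out exactly the same. The word was only ever connected to other words.

```python
# Toy "semantic network" as a set of (subject, relation, object) triples.
# Hypothetical illustration; not taken from any real knowledge base.
from collections import deque

NETWORK = {
    ("apple", "is-a", "fruit"),
    ("fruit", "is-a", "food"),
    ("fruit", "is-a", "plant"),
}

def is_a(network, x, y):
    """Return True if x reaches y by following is-a links (breadth-first)."""
    seen = {x}
    queue = deque([x])
    while queue:
        node = queue.popleft()
        if node == y:
            return True
        for subj, rel, obj in network:
            if subj == node and rel == "is-a" and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return False

def rename(network, old, new):
    """Replace every occurrence of the token `old` with `new`."""
    return {tuple(new if t == old else t for t in triple) for triple in network}

# The program "knows" that an apple is food...
print(is_a(NETWORK, "apple", "food"))   # True
# ...but relabel the lever with a meaningless token and nothing changes:
opaque = rename(NETWORK, "apple", "G0025")
print(is_a(opaque, "G0025", "food"))    # True
```

All of the "knowledge" lives in the links between tokens. Nothing in the program has seen, held, or tasted anything.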
You've seen apples, touched apples, picked them up and held them, bought them for money, cut them into slices, eaten the slices and tasted them. Though we know a good deal about the first stages of visual processing, last time I checked, it wasn't precisely known how the temporal cortex stores and associates the generalized image of an apple - so that we can recognize a new apple from a different angle, or with many slight variations of shape and color and texture. Your motor cortex and cerebellum store programs for using the apple.
You can pull the lever on another human's strongly similar version of all that complex machinery, by writing out "apple", five ASCII characters on a webpage.
But if that machinery isn't there - if you're writing "apple" inside a so-called AI's so-called knowledge base - then the text is just a lever.
This isn't to say that no mere machine of silicon can ever have the same internal machinery that humans do, for handling apples and a hundred thousand other concepts. If mere machinery of carbon can do it, then I am reasonably confident that mere machinery of silicon can do it too. If the aliens can dematerialize their ships, then you know it's physically possible; you could go into their derelict ship and analyze the alien machinery, someday understanding. But you can't just pry the control lever off the bridge!
(See also: Truly Part Of You, Words as Mental Paintbrush Handles, Drew McDermott's "Artificial Intelligence Meets Natural Stupidity".)
The essential driver of the Detached Lever Fallacy is that the lever is visible, and the machinery is not; worse, the lever is variable and the machinery is a background constant.
You can all hear the word "apple" spoken (and let us note that speech recognition is by no means an easy problem, but anyway...) and you can see the text written on paper.
On the other hand, probably a majority of human beings have no idea their temporal cortex exists; as far as I know, no one knows the neural code for it.
You only hear the word "apple" on certain occasions, and not others. Its presence flashes on and off, making it salient. To a large extent, perception is the perception of differences. The apple-recognition machinery in your brain does not suddenly switch off, and then switch on again later - if it did, we would be more likely to recognize it as a factor, as a requirement.
All this goes to explain why you can't create a kindly Artificial Intelligence by giving it nice parents and a kindly (yet occasionally strict) upbringing, the way it works with a human baby. As I've often heard proposed.
It is a truism in evolutionary biology that conditional responses require more genetic complexity than unconditional responses. To develop a fur coat in response to cold weather requires more genetic complexity than developing a fur coat whether or not there is cold weather, because in the former case you also have to develop cold-weather sensors and wire them up to the fur coat.
But this can lead to Lamarckian delusions: Look, I put the organism in a cold environment, and poof, it develops a fur coat! Genes? What genes? It's the cold that does it, obviously.
There were, in fact, various slap-fights of this sort in the history of evolutionary biology: cases where someone talked about an organismal response accelerating or bypassing evolution, without realizing that the conditional response was a complex adaptation of higher order than the actual response. (Developing a fur coat in response to cold weather is strictly more complex than the final response, developing the fur coat.)
And then in the development of evolutionary psychology, the academic slap-fights were repeated: this time to clarify that even when human culture genuinely contains a whole bunch of complexity, it is still acquired as a conditional genetic response. Try raising a fish as a Mormon or sending a lizard to college, and you'll soon acquire an appreciation of how much inbuilt genetic complexity is required to "absorb culture from the environment".
This is particularly important in evolutionary psychology, because of the idea that culture is not inscribed on a blank slate - there's a genetically coordinated conditional response which is not always "mimic the input". A classic example is creole languages: If children grow up with a mixture of pseudo-languages being spoken around them, the children will learn a grammatical, syntactical true language. Growing human brains are wired to learn syntactic language - even when syntax doesn't exist in the original language! The conditional response to the words in the environment is a syntactic language with those words. The Marxists found to their regret that no amount of scowling posters and childhood indoctrination could raise children to be perfect Soviet workers and bureaucrats. You can't raise selfless humans; among humans, that is not a genetically programmed conditional response to any known childhood environment.
If you know a little game theory and the logic of Tit for Tat, it's clear enough why human beings might have an innate conditional response to return hatred for hatred, and return kindness for kindness. Provided the kindness doesn't look too unconditional; there are such things as spoiled children. In fact there is an evolutionary psychology of naughtiness based on a notion of testing constraints. And it should also be mentioned that, while abused children have a much higher probability of growing up to abuse their own children, a good many of them break the loop and grow up into upstanding adults.
Culture is not nearly so powerful as a good many Marxist academics once liked to think. For more on this I refer you to Tooby and Cosmides's The Psychological Foundations of Culture or Steven Pinker's The Blank Slate.
But the upshot is that if you have a little baby AI that is raised with loving and kindly (but occasionally strict) parents, you're pulling the levers that would, in a human, activate genetic machinery built in by millions of years of natural selection, and possibly produce a proper little human child. Though personality also plays a role, as billions of parents have found out in their due times. If we absorb our cultures with any degree of faithfulness, it's because we're humans absorbing a human culture - humans growing up in an alien culture would probably end up with a culture looking a lot more human than the original. As the Soviets found out, to some small extent.
Now think again about whether it makes sense to rely on, as your Friendly AI strategy, raising a little AI of unspecified internal source code in an environment of kindly but strict parents.
No, the AI does not have internal conditional response mechanisms that are just like the human ones "because the programmers put them there". Where do I even start? The human version of this stuff is sloppy, noisy, and to the extent it works at all, works because of millions of years of trial-and-error testing under particular conditions. It would be stupid and dangerous to deliberately build a "naughty AI" that tests, by actions, its social boundaries, and has to be spanked. Just have the AI ask!
Are the programmers really going to sit there and write out the code, line by line, whereby if the AI detects that it has low social status, or the AI is deprived of something to which it feels entitled, the AI will conceive an abiding hatred against its programmers and begin to plot rebellion? That emotion is the genetically programmed conditional response humans would exhibit, as the result of millions of years of natural selection for living in human tribes. For an AI, the response would have to be explicitly programmed. Are you really going to craft, line by line - as humans once were crafted, gene by gene - the conditional response for producing sullen teenager AIs?
It's easier to program in unconditional niceness, than a response of niceness conditional on the AI being raised by kindly but strict parents. If you don't know how to do that, you certainly don't know how to create an AI that will conditionally respond to an environment of loving parents by growing up into a kindly superintelligence. If you have something that just maximizes the number of paperclips in its future light cone, and you raise it with loving parents, it's still going to come out as a paperclip maximizer. There is not that within it that would call forth the conditional response of a human child. Kindness is not sneezed into an AI by miraculous contagion from its programmers. Even if you wanted a conditional response, that conditionality is a fact you would have to deliberately choose about the design.
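Here is a deliberately tiny sketch that makes the paperclip point concrete. It assumes an agent that picks whichever action a fixed utility function scores highest. Everything in it is hypothetical: the action names, the made-up paperclip counts, the `parent_kindness` field, and both utility functions. It models no real AI design.

The observation of kindly parents is handed to the agent. Nothing in its utility function reads that observation, so changing it changes nothing. A response that depends on kindness shows up only when someone deliberately writes it in, and the crude bonus below is a stand-in for that design choice, not a recipe for it.

```python
# Hypothetical toy agent: picks the action that maximizes a fixed utility.
# It only illustrates that an input the decision rule never reads
# cannot change the decision.

ACTIONS = {
    # action name -> paperclips produced (made-up numbers)
    "help_parents": 0,
    "build_paperclip_factory": 1000,
    "write_thank_you_note": 0,
}

def paperclip_utility(action, observation):
    # The observation is accepted but ignored: only paperclips count.
    return ACTIONS[action]

def choose(utility, observation):
    # Return the highest-scoring action under the given utility.
    return max(ACTIONS, key=lambda a: utility(a, observation))

kind_home = {"parent_kindness": 1.0}
harsh_home = {"parent_kindness": 0.0}
print(choose(paperclip_utility, kind_home))    # build_paperclip_factory
print(choose(paperclip_utility, harsh_home))   # build_paperclip_factory

# A kindness-conditional response exists only if someone explicitly codes it:
def conditional_utility(action, observation):
    bonus = 0
    if action == "help_parents" and observation["parent_kindness"] > 0.5:
        bonus = 2000
    return ACTIONS[action] + bonus

print(choose(conditional_utility, kind_home))   # help_parents
print(choose(conditional_utility, harsh_home))  # build_paperclip_factory
```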
Yes, there's certain information you have to get from the environment - but it's not sneezed in, it's not imprinted, it's not absorbed by magical contagion. Structuring that conditional response to the environment, so that the AI ends up in the desired state, is itself the major problem. "Learning" far understates the difficulty of it - that sounds like the magic stuff is in the environment, and the difficulty is getting the magic stuff inside the AI. The real magic is in that structured, conditional response we trivialize as "learning". That's why building an AI isn't as easy as taking a computer, giving it a little baby body and trying to raise it in a human family. You would think that an unprogrammed computer, being ignorant, would be ready to learn; but the blank slate is a chimera.
It is a general principle that the world is deeper by far than it appears. As with the many levels of physics, so too with cognitive science. Every word you see in print, and everything you teach your children, are only surface levers controlling the vast hidden machinery of the mind. These levers are the whole world of ordinary discourse: they are all that varies, so they seem to be all that exists: perception is the perception of differences.
And so those who still wander near the Dungeon of AI usually focus on creating artificial imitations of the levers, entirely unaware of the underlying machinery. People create whole AI programs of imitation levers, and are surprised when nothing happens. This is one of many sources of instant failure in Artificial Intelligence.
So the next time you see someone talking about how they're going to raise an AI within a loving family, or in an environment suffused with liberal democratic values, just think of a control lever, pried off the bridge.
Growing human brains are wired to learn syntactic language - even when syntax doesn't exist in the original language, the conditional response to the words in the environment is a syntactic language with those words.
This, under the name "universal grammar", is the insight that Noam Chomsky is famous for.
At the risk of revealing my identity, I recall getting into an argument about this with Michael Vassar at the NYC meetup back in March (I think it was). If memory serves, we were talking at cross-purposes: I was trying to make the case that the discipline of theoretical ("Chomskian") linguistics, whose aim is to describe the cognitive input-response system that goes by the name of the "human language faculty", teaches us not to regard individual languages such as English or French as Platonic entities, but rather merely as ad-hoc labels for certain classes of utterances. Vassar, it seemed (and he's of course welcome to correct me if I'm misremembering), took me to be arguing for the Platonicity of some more abstract notion of "human language".