If I had to pick a single statement that relies on more Overcoming Bias content I've written than any other, that statement would be:
Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth.
"Well," says the one, "maybe according to your provincial human values, you wouldn't like it. But I can easily imagine a galactic civilization full of agents who are nothing like you, yet find great value and interest in their own goals. And that's fine by me. I'm not so bigoted as you are. Let the Future go its own way, without trying to bind it forever to the laughably primitive prejudices of a pack of four-limbed Squishy Things -"
My friend, I have no problem with the thought of a galactic civilization vastly unlike our own... full of strange beings who look nothing like me even in their own imaginations... pursuing pleasures and experiences I can't begin to empathize with... trading in a marketplace of unimaginable goods... allying to pursue incomprehensible objectives... people whose life-stories I could never understand.
That's what the Future looks like if things go right.
If the chain of inheritance from human (meta)morals is broken, the Future does not look like this. It does not end up magically, delightfully incomprehensible.
With very high probability, it ends up looking dull. Pointless. Something whose loss you wouldn't mourn.
Seeing this as obvious, is what requires that immense amount of background explanation.
And I'm not going to iterate through all the points and winding pathways of argument here, because that would take us back through 75% of my Overcoming Bias posts. Except to remark on how many different things must be known to constrain the final answer.
Consider the incredibly important human value of "boredom" - our desire not to do "the same thing" over and over and over again. You can imagine a mind that contained almost the whole specification of human value, almost all the morals and metamorals, but left out just this one thing -
- and so it spent until the end of time, and until the farthest reaches of its light cone, replaying a single highly optimized experience, over and over and over again.
Or imagine a mind that contained almost the whole specification of which sort of feelings humans most enjoy - but not the idea that those feelings had important external referents. So that the mind just went around feeling like it had made an important discovery, feeling it had found the perfect lover, feeling it had helped a friend, but not actually doing any of those things - having become its own experience machine. And if the mind pursued those feelings and their referents, it would be a good future and true; but because this one dimension of value was left out, the future became something dull. Boring and repetitive, because although this mind felt that it was encountering experiences of incredible novelty, this feeling was in no wise true.
Or the converse problem - an agent that contains all the aspects of human value, except the valuation of subjective experience. So that the result is a nonsentient optimizer that goes around making genuine discoveries, but the discoveries are not savored and enjoyed, because there is no one there to do so. This, I admit, I don't quite know to be possible. Consciousness does still confuse me to some extent. But a universe with no one to bear witness to it, might as well not be.
Value isn't just complicated, it's fragile. There is more than one dimension of human value, where if just that one thing is lost, the Future becomes null. A single blow and all value shatters. Not every single blow will shatter all value - but more than one possible "single blow" will do so.
And then there is the long defense of this proposition, which relies on 75% of my Overcoming Bias posts, so that it would be more than one day's work to summarize all of it. Maybe some other week. There are so many branches I've seen that discussion tree go down.
After all - a mind shouldn't just go around having the same experience over and over and over again. Surely no superintelligence would be so grossly mistaken about the correct action?
Why would any supermind want something so inherently worthless as the feeling of discovery without any real discoveries? Even if that were its utility function, wouldn't it just notice that its utility function was wrong, and rewrite it? It's got free will, right?
Surely, at least boredom has to be a universal value. It evolved in humans because it's valuable, right? So any mind that doesn't share our dislike of repetition, will fail to thrive in the universe and be eliminated...
If you are familiar with the difference between instrumental values and terminal values, and familiar with the stupidity of natural selection, and you understand how this stupidity manifests in the difference between executing adaptations versus maximizing fitness, and you know this turned instrumental subgoals of reproduction into decontextualized unconditional emotions...
...and you're familiar with how the tradeoff between exploration and exploitation works in Artificial Intelligence...
...then you might be able to see that the human form of boredom that demands a steady trickle of novelty for its own sake isn't a grand universal, but just a particular algorithm that evolution coughed out into us. And you might be able to see how the vast majority of possible expected utility maximizers would engage in only so much efficient exploration, and then spend most of their time exploiting the best alternative found so far, over and over and over.
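To make the exploration/exploitation point concrete, here is a minimal toy sketch of my own - not anything from the original posts - with invented arm payouts, exploration budget, and horizon. A simple expected-reward maximizer facing a multi-armed bandit explores just long enough to estimate its options, then repeats the single best one for the rest of its run; nothing in its objective rewards novelty for its own sake.

```python
# Toy sketch only: the payouts, budget, and horizon are invented for illustration.
import random

random.seed(0)

TRUE_PAYOUTS = [0.3, 0.7, 0.5]   # hypothetical mean reward of each "experience"
EXPLORATION_BUDGET = 30          # total pulls spent estimating the options
HORIZON = 10_000                 # total pulls available

def pull(arm: int) -> float:
    """Noisy reward from one arm (a stand-in for trying an experience)."""
    return TRUE_PAYOUTS[arm] + random.gauss(0, 0.1)

# Phase 1: a bounded amount of exploration, just enough to rank the arms.
estimates = []
for arm in range(len(TRUE_PAYOUTS)):
    samples = [pull(arm) for _ in range(EXPLORATION_BUDGET // len(TRUE_PAYOUTS))]
    estimates.append(sum(samples) / len(samples))

# Phase 2: exploit the best alternative found so far, over and over.
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
choices = [best_arm] * (HORIZON - EXPLORATION_BUDGET)

print(f"explored for {EXPLORATION_BUDGET} steps, "
      f"then repeated arm {best_arm} for {len(choices)} steps")
```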
That's a lot of background knowledge, though.
And so on and so on and so on through 75% of my posts on Overcoming Bias, and many chains of fallacy and counter-explanation. Some week I may try to write up the whole diagram. But for now I'm going to assume that you've read the arguments, and just deliver the conclusion:
We can't relax our grip on the future - let go of the steering wheel - and still end up with anything of value.
And those who think we can -
- they're trying to be cosmopolitan. I understand that. I read those same science fiction books as a kid: The provincial villains who enslave aliens for the crime of not looking just like humans. The provincial villains who enslave helpless AIs in durance vile on the assumption that silicon can't be sentient. And the cosmopolitan heroes who understand that minds don't have to be just like us to be embraced as valuable -
I read those books. I once believed them. But the beauty that jumps out of one box, is not jumping out of all boxes. (This being the moral of the sequence on Lawful Creativity.) If you leave behind all order, what is left is not the perfect answer, what is left is perfect noise. Sometimes you have to abandon an old design rule to build a better mousetrap, but that's not the same as giving up all design rules and collecting wood shavings into a heap, with every pattern of wood as good as any other. The old rule is always abandoned at the behest of some higher rule, some higher criterion of value that governs.
If you loose the grip of human morals and metamorals - the result is not mysterious and alien and beautiful by the standards of human value. It is moral noise, a universe tiled with paperclips. To change away from human morals in the direction of improvement rather than entropy, requires a criterion of improvement; and that criterion would be physically represented in our brains, and our brains alone.
Relax the grip of human value upon the universe, and it will end up seriously valueless. Not, strange and alien and wonderful, shocking and terrifying and beautiful beyond all human imagination. Just, tiled with paperclips.
It's only some humans, you see, who have this idea of embracing manifold varieties of mind - of wanting the Future to be something greater than the past - of being not bound to our past selves - of trying to change and move forward.
A paperclip maximizer just chooses whichever action leads to the greatest number of paperclips.
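As a bare illustration of that sentence - a sketch of mine with made-up action names and payoff numbers, not a description of any real agent design - the decision rule is nothing more than an argmax over predicted paperclip counts; no term for novelty, sentience, or beauty appears unless someone puts it there.

```python
# Hypothetical illustration: action names and predicted payoffs are invented.
from typing import Callable, Dict

def choose_action(actions: Dict[str, Callable[[], float]]) -> str:
    """Pick whichever action the world-model predicts yields the most paperclips."""
    return max(actions, key=lambda name: actions[name]())

# Placeholder world-model: predicted paperclips produced by each candidate action.
actions = {
    "build_paperclip_factory": lambda: 1e9,
    "preserve_interesting_aliens": lambda: 0.0,   # worth nothing in this utility function
    "tile_region_with_paperclips": lambda: 1e12,
}

print(choose_action(actions))  # -> "tile_region_with_paperclips"
```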
No free lunch. You want a wonderful and mysterious universe? That's your value. You work to create that value. Let that value exert its force through you who represent it, let it make decisions in you to shape the future. And maybe you shall indeed obtain a wonderful and mysterious universe.
No free lunch. Valuable things appear because a goal system that values them takes action to create them. Paperclips don't materialize from nowhere for a paperclip maximizer. And a wonderfully alien and mysterious Future will not materialize from nowhere for us humans, if our values that prefer it are physically obliterated - or even disturbed in the wrong dimension. Then there is nothing left in the universe that works to make the universe valuable.
You do have values, even when you're trying to be "cosmopolitan", trying to display a properly virtuous appreciation of alien minds. Your values then fade further into the invisible background - they are less obviously human. Your brain probably won't even generate an alternative so awful that it would wake you up, make you say "No! Something went wrong!" even at your most cosmopolitan. E.g., "a nonsentient optimizer absorbs all matter in its future light cone and tiles the universe with paperclips". You'll just imagine strange alien worlds to appreciate.
Trying to be "cosmopolitan" - to be a citizen of the cosmos - just strips off a surface veneer of goals that seem obviously "human".
But if you wouldn't like the Future tiled over with paperclips, and you would prefer a civilization of...
...sentient beings...
...with enjoyable experiences...
...that aren't the same experience over and over again...
...and are bound to something besides just being a sequence of internal pleasurable feelings...
...learning, discovering, freely choosing...
...well, I've just been through the posts on Fun Theory that went into some of the hidden details on those short English words.
Values that you might praise as cosmopolitan or universal or fundamental or obvious common sense, are represented in your brain just as much as those values that you might dismiss as merely human. Those values come of the long history of humanity, and the morally miraculous stupidity of evolution that created us. (And once I finally came to that realization, I felt less ashamed of values that seemed 'provincial' - but that's another matter.)
These values do not emerge in all possible minds. They will not appear from nowhere to rebuke and revoke the utility function of an expected paperclip maximizer.
Touch too hard in the wrong dimension, and the physical representation of those values will shatter - and not come back, for there will be nothing left to want to bring it back.
And the referent of those values - a worthwhile universe - would no longer have any physical reason to come into being.
Let go of the steering wheel, and the Future crashes.
This is the key point on which I disagree with Eliezer. I don't disagree with what he literally says, but with what he implies and what he concludes. The key context he leaves out is that his argument applies fully only to a hard-takeoff AI scenario. Consider what he says about boredom:
The things he lists here make an argument that, in the absence of competition, an existing value system can drift into one that doesn't mind boredom. But none of them address the argument he's supposedly answering: that bored creatures will fail to compete and be eliminated. I infer that he dismisses that argument because the thing he describes as undergoing value drift, the thing that appears in the next line, is a hard-takeoff AI that doesn't have to compete.
The right way to begin asking whether minds can evolve not to be bored is to sample minds that have evolved, preferably independently, and find how many of them mind boredom.
Earth has no independently-evolved minds that I know of; all intelligent life is metazoan, and all metazoans are intelligent. This does show us, however, that intelligence is an evolutionary ratchet. Unlike other phenotypic traits, it doesn't disappear from any lineage after evolving. That's remarkable, and relevant: intelligence doesn't disappear. So we can strike "universe of mindless plankton" off our list of moderately probable fears.
Some metazoans live lives of extreme boredom. For instance, spiders, sea urchins, molluscs, most fish, maybe alligators. Others suffer physically from boredom: parrots, humans, dogs, cats. What distinguishes these two categories?
Animals that don't become bored are generally small and small-brained, with low metabolisms, short lifespans, and many offspring, and they live in a very narrow range of environmental conditions. Animals that become bored are just the opposite. There are exceptions: fish and alligators have long lifespans, and alligators are large. But we can see how these traits conspire to produce an organism that can afford a life of monotony:
Small-brained, short lifespan, large number of offspring, narrow range of environmental conditions: These are all conditions under which it is better for the species to adapt to the environment by selection or by environment-directed development than by learning. Insects and nematodes can't learn much except via selection; their brains appear to have identical neuron number and wiring within a species. Alligator brains weigh about 8 grams.
Low metabolism merely correlates with low activity, which is how I identified most of these organisms, equating "not moving" with "not minding boredom." Small size correlates with short lifespan and a small brain.
These things require learning: a long lifespan, a changing environment, and minimizing reproduction time. If an organism needs to compete in a changing environment or across many different environments, as birds and mammals do, it will need to learn. If a mother's knowledge is encoded in a form that she can't transmit to her children, they'll need to learn.
This business of having children is difficult to translate to a world of AIs. But the business of adaptation is clear. Given that active, curious, intelligent, environment-transforming minds already exist, and given continued competition, only minds that can adapt to rapid change will be able to remain powerful. So we can also strike "world dominated by beings who build paperclips" off our list of fears, provided those conditions are maintained. All we need do is ensure continued competition. Intelligence will not de-evolve, and intelligence will keep the environment changing rapidly enough that constant learning will be necessary, and so will boredom.
The space of possible minds is large. The space of possible evolved minds is much smaller. The space of possible minds co-evolved with competition is much smaller than that. The space X of possible co-evolved minds capable of dominating the human race is much smaller than that.
Let Y = the set of value systems that might be produced by trying to enumerate human "final" values and put them into a utility function evaluated by a single symbolic logic engine; incorporating all types of values above the level of the gene (body, mind, conscious mind, kin group, social group, for starters); using context-free set-membership functions that classify percepts into a finite set of atomic symbols before considering the context those symbols will be used in; and designed to prevent final values from changing. I take that to be roughly Eliezer's approach.
Let f(Z) be a function over sets of possible value systems, which tells how many of them are not repugnant to us.
My estimate is that f(X) / |X| >> f(Y) / |Y|. Therefore, the best approach is not to try to enumerate human final values and code them into an AI, but to study how co-evolution works and what conditions give rise to the phenomena we value, such as intelligence, consciousness, curiosity, and affection. Then try to direct the future to stay within those conditions.
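Restating that comparison in symbols, using the definitions above (the probabilistic reading in the final comment is my own gloss, not an additional claim):

```latex
% X    = co-evolved minds capable of dominating the human race (defined above)
% Y    = hand-coded value systems of the kind just described
% f(Z) = the number of value systems in Z that are not repugnant to us
\frac{f(X)}{\lvert X \rvert} \;\gg\; \frac{f(Y)}{\lvert Y \rvert}
% i.e., a value system sampled uniformly from X is far more likely to be
% acceptable than one sampled uniformly from Y.
```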