The Thing That I Protect

Eliezer Yudkowsky

Followup to: Something to Protect, Value is Fragile

"Something to Protect" discursed on the idea of wielding rationality in the service of something other than "rationality". Not just that rationalists ought to pick out a Noble Cause as a hobby to keep them busy; but rather, that rationality itself is generated by having something that you care about more than your current ritual of cognition.

So what is it, then, that I protect?

I quite deliberately did not discuss that in "Something to Protect", leaving it only as a hanging implication. In the unlikely event that we ever run into aliens, I don't expect their version of Bayes's Theorem to be mathematically different from ours, even if they generated it in the course of protecting different and incompatible values. Among humans, the idiom of having "something to protect" is not bound to any one cause, and therefore, to mention my own cause in that post would have harmed its integrity. Causes are dangerous things, whatever their true importance; I have written somewhat on this, and will write more about it.

But still - what is it, then, the thing that I protect?

Friendly AI? No - a thousand times no - a thousand times not anymore. It's not thinking of the AI that gives me strength to carry on even in the face of inconvenience.

I would be a strange and dangerous AI wannabe if that were my cause - the image in my mind of a perfected being, an existence greater than humankind. Maybe someday I'll be able to imagine such a child and try to build one, but for now I'm too young to be a father.

Those of you who've been following along with recent discussions, particularly "Value is Fragile", might have noticed something else that I might, perhaps, hold precious. Smart agents want to protect the physical representation of their utility function for almost the same reason that male organisms are built to be protective of their testicles. From the standpoint of the alien god, natural selection, losing the germline - the gene-carrier that propagates the pattern into the next generation - means losing almost everything that natural selection cares about. Unless you already have children to protect, can protect relatives, etcetera - few are the absolute and unqualified statements that can be made in evolutionary biology - but still, if you happen to be a male human, you will find yourself rather protective of your testicles; that one, centralized vulnerability is why a kick in the testicles hurts more than being hit on the head.

To lose the pattern of human value - which, for now, is physically embodied only in the human brains that care about those values - would be to lose the Future itself; if there's no agent with those values, there's nothing to shape a valuable Future.

And this pattern, this one most vulnerable and precious pattern, is indeed at risk of being distorted or destroyed. Growing up is a hard problem either way, whether you try to edit existing brains, or build de novo Artificial Intelligence that mirrors human values. If something more powerful than humans, and not sharing human values, comes into existence - whether by de novo AI gone wrong, or augmented humans gone wrong - then we can expect to lose, hard. And value is fragile; losing just one dimension of human value can destroy nearly all of the utility we expect from the future.

So is that, then, the thing that I protect?

If it were - then what inspired me when times got tough would be, say, thinking of people being nice to each other. Or thinking of people laughing, and contemplating how humor probably exists among only an infinitesimal fraction of evolved intelligent species and their descendants. I would marvel at the power of sympathy to make us feel what others feel -

But that's not quite it either.

I once attended a small gathering whose theme was "This I Believe". You could interpret that phrase in a number of ways; I chose "What do you believe that most other people don't believe which makes a corresponding difference in your behavior?" And it seemed to me that most of how I behaved differently from other people boiled down to two unusual beliefs. The first belief could be summarized as "intelligence is a manifestation of order rather than chaos"; this accounts both for my attempts to master rationality, and my attempt to wield the power of AI.

And the second unusual belief could be summarized as: "Humanity's future can be a WHOLE LOT better than its past."

Not desperately Darwinian robots surging out to eat as much of the cosmos as possible, mostly ignoring their own internal values to try and grab as many stars as possible, with most of the remaining matter going into making paperclips.

Not some bittersweet ending where you and I fade away on Earth while the inscrutable robots ride off into the unknowable sunset, having grown beyond such merely human values as love or sympathy.

Screw bittersweet. To hell with that melancholy-tinged crap. Why leave anyone behind? Why surrender a single thing that's precious?

(And the compromise-futures are all fake anyway; at this difficulty level, you steer precisely or you crash.)

The pattern of fun is also lawful. And, though I do not know all the law - I do think that written in humanity's value-patterns is the implicit potential of a happy future. A seriously goddamn FUN future. A genuinely GOOD outcome. Not something you'd accept with a sigh of resignation for nothing better being possible. Something that would make you go "WOOHOO!"

In the sequence on Fun Theory, I have given you, I hope, some small reason to believe that such a possibility might be consistently describable, if only it could be made real. How to read that potential out of humans and project it into reality... might or might not be as simple as "superpose our extrapolated reflected equilibria". But that's one way of looking at what I'm trying to do - to reach the potential of the GOOD outcome, not the melancholy bittersweet compromise. Why settle for less?

To really have something to protect, it has to be able to bring tears to your eyes. That, generally, requires something concrete to visualize - not just abstract laws. Reading the Laws of Fun doesn't bring tears to my eyes. I can visualize a possibility or two that makes sense to me, but I don't know if it would make sense to others the same way.

What does bring tears to my eyes? Imagining a future where humanity has its act together. Imagining children who grow up never knowing our world, who don't even understand it. Imagining the rescue of those now in sorrow, the end of nightmares great and small. Seeing in reality the real sorrows that happen now, so many of which are unnecessary even now. Seeing in reality the signs of progress toward a humanity that's at least trying to get its act together and become something more - even if the signs are mostly just symbolic: a space shuttle launch, a march that protests a war.

(And of course these are not the only things that move me. Not everything that moves me has to be a Cause. When I'm listening to e.g. Bach's Jesu: Joy of Man's Desiring, I don't think about how every extant copy might be vaporized if things go wrong. That may be true, but it's not the point. It would be as bad as refusing to listen to that melody because it was once inspired by belief in the supernatural.)

To really have something to protect, you have to be able to protect it, not just value it. My battleground for that better Future is, indeed, the fragile pattern of value. Not to keep it in stasis, but to keep it improving under its own criteria rather than randomly losing information. And then to project that through more powerful optimization, to materialize the valuable future. Without surrendering a single thing that's precious, because losing a single dimension of value could lose it all.

There's no easy way to do this, whether by de novo AI or by editing brains. But with a de novo AI, cleanly and correctly designed, I think it should at least be possible to get it truly right and win completely. It seems, for all its danger, the safest and easiest and shortest way (yes, the alternatives really are that bad). And so that is my project.

That, then, is the service in which I wield rationality. To protect the Future, on the battleground of the physical representation of value. And my weapon, if I can master it, is the ultimate hidden technique of Bayescraft - to explicitly and fully know the structure of rationality, to such an extent that you can shape the pure form outside yourself - what some call "Artificial General Intelligence" and I call "Friendly AI". Which is, itself, a major unsolved research problem, and so it calls into play the more informal methods of merely human rationality. That is the purpose of my art and the wellspring of my art.

That's pretty much all I wanted to say here about this Singularity business...

...except for one last thing; so after tomorrow, I plan to go back to posting about plain old rationality on Monday.

I'm both excited for and somewhat disappointed with the return to rationality. I've enjoyed many of the posts on other topics, but the rationality posts are also immensely useful to me in everyday life. Maybe you can toss in some more fiction now and then? (Of course, I'll probably get speared by other commenters for saying that...)

Very interesting concept... would it mean that a five-celled organism, whose behavior could all be black-box analyzed to suggest that it wishes to protect its own life, would thus be considered rational?

I'm curious if anyone knows of any of EY's other writings that address the phenomenon of rationality as not requiring consciousness.

"I'm curious if anyone knows of any of EY's other writings that address the phenomenon of rationality as not requiring consciousness."

Cf. Eliezer-sub-2002 on evolution and rationality.

This post needs more explosions.

Ian C: neither group is changing human values as it is referred to here: everyone is still human, no one is suggesting neurosurgery to change how brains compute value. See the post "Value is Fragile".

Are not all/most organisms built to protect reproductive organs? Not just "male organisms are built to be protective of their testicles"?

Please explain what you mean by: "that one, centralized vulnerability is why a kick in the testicles hurts more than being hit on the head." A kick in the head can leave you as unable to reproduce as a kick in the balls. I think it is more likely to kill you, too.

Nick Hay: "[N]either group is changing human values as it is referred to here: everyone is still human, no one is suggesting neurosurgery to change how brains compute value."

Once again I fail to see how culturally-derived values can be brushed away as irrelevant under CEV. When you convince someone with a political argument, you are changing how their brain computes value. Just because the effect is many orders of magnitude subtler than major neurosurgery doesn't mean it's trivial.

Z. M. Davis: Good point, I was brushing that distinction under the rug. From this perspective, all people arguing about values are trying to change someone's value computation, to a greater or lesser degree; i.e., this is not the place to look if you want to discriminate between "liberal" and "conservative".

With the obvious way to implement a CEV, you start by modeling a population of actual humans (e.g. Earth's), then consider extrapolations of these models (knew more, thought faster, etc.). No "wipe culturally-defined values" step, however that would be defined.

Where was it suggested otherwise?

Nick: "Where was it suggested otherwise?"

Oh, no one's explicitly proposed a "wipe culturally-defined values" step; I'm just saying that we shouldn't assume that extrapolated human values converge. Cf. the thread following "Moral Error and Moral Disagreement."

I'm happy to hear that Eliezer will go back to posting on rationality.

CFAI 3.4.4: "The renormalizing shaper network should ultimately ground itself in the panhuman and gaussian layers..."

Nick, ZM, this is CFAI rather than CEV and in context it's about programmer independence, but doesn't this count as "wiping culturally defined values"?

TGGP,

Why, precisely?

CFAI is obsolete - nothing in there is my current thought unless I explicitly declare otherwise. I don't think there's anything left in CFAI now that isn't obsoleted by (a) "Coherent Extrapolated Volition", (b) some Overcoming Bias post, or (c) a good AI textbook such as "Artificial Intelligence: A Modern Approach".

With that said, ceteris paribus in terms of reasonable construal, ways of construing someone's 'reflective equilibrium' that tend to depend more heavily on current beliefs and values will make it less likely for different reflective equilibria to overlap. Similarly with a fixed way of construing a reflective equilibrium, and arguments or observations which suggest that this fixed construal depends more heavily on mental content with more unconstrained degrees of freedom.

"Thus the freer the judgement of a man is in regard to a definite issue, with so much greater necessity will the substance of this judgement be determined." -- Friedrich Engels, Anti-Dühring

Would you mind if people modified CFAI and distributed the modified version? (Yes, I'm asking for a liberal copyright license.)

Yes, I'd mind. It's way too obsolete.

Is the content itself something you don't want propagated, or do you simply not want your name on it? If the former, I should note that parts of it certainly seem useful (e.g. injunctions, which are apparently not in CEV and likely not in any textbook); what should we do with these? Share them as quotes? Paraphrase them? Keep them to ourselves? Shun them without reading them first?

I'm in a position where it's not critical (yet), but I'd really like to build up some awareness of Friendliness.

What do you think I should point people at? Keeping in mind that a large number of links means they're likely to wander off in the middle and forget their place.

Wasn't there some material in CFAI about solving the wirehead problem?

"I plan to go back to posting about plain old rationality on Monday."

You praise Bayes highly and frequently. Yet you haven't posted a commensurate amount of material on Bayesian theory. I've read the Intuitive and Technical Explanation essays, and they made me think that you could write a really superb series on Bayesian theory.

Philosophers have written lots on a priori arguments for Bayesianism (e.g. Cox's Theorem, Dutch Book Arguments, etc.). I'm more curious about the fruitfulness of Bayesianism: e.g. what issues it clarifies and what interesting questions it brings to light. Here are some more specific questions:

What are some of the insights you've gained from Pearl's work on causal graphs and counterfactuals? How did reading Pearl change your views about certain topics? What are the insights from Pearl that have been most productive for you in your own thinking? What do you disagree with Pearl about?

What are some more practical examples of powerful applications of Bayesianism in AI? That Bayesianism is the correct normative theory of rationality doesn't imply that adopting a Bayesian framework will immediately yield big practical advantages in AI design. It might take people time to develop practical methods. How good are those methods? (I'm thinking, for example, about tractability, as well as the fact that many AI people over 40 won't have had so much early training on Bayes.)

What areas of the Bayesian picture need development? What problems do you think cannot currently be given a very satisfying treatment in the Bayesian framework?

Given your ability, demonstrated in "Intuitive" and elsewhere, to not just tell people how to think about a topic but to get them thinking in the right way, a series on Bayesian theory that started elementary and built up could be very worthwhile.

Eliezer: Thanks for the clarification.

Wondering, I like rationality posts.

EJ: It takes a much harder kick to the head to hurt as much as a kick to the balls.

As a martial artist (tae kwon do, specifically), I have been kicked in the head and balls many, many times - and I would much rather be kicked in the head than the balls. The strongest kick to the head I've taken hurt a fair bit and made me groggy for an hour; but the strongest kick to my balls (which wasn't very strong) ruined my entire day.