Protected From Myself

Eliezer Yudkowsky

Followup to: The Magnitude of His Own Folly, Entangled Truths, Contagious Lies

Every now and then, another one comes before me with the brilliant idea: "Let's lie!"

Lie about what?—oh, various things. The expected time to Singularity, say. Lie and say it's definitely going to be earlier, because that will get more public attention. Sometimes they say "be optimistic", sometimes they just say "lie". Lie about the current degree of uncertainty, because there are other people out there claiming to be certain, and the most unbearable prospect in the world is that someone else pull ahead. Lie about what the project is likely to accomplish—I flinch even to write this, but occasionally someone proposes to go and say to the Christians that the AI will create Christian Heaven forever, or go to the US government and say that the AI will give the US dominance forever.

But at any rate, lie. Lie because it's more convenient than trying to explain the truth. Lie, because someone else might lie, and so we have to make sure that we lie first. Lie to grab the tempting benefits, hanging just within reach—

Eh? Ethics? Well, now that you mention it, lying is at least a little bad, all else being equal. But with so much at stake, we should just ignore that and lie. You've got to follow the expected utility, right? The loss of a lie is much less than the benefit to be gained, right?

Thus do they argue. Except—what's the flaw in the argument? Wouldn't it be irrational not to lie, if lying has the greatest expected utility?

When I look back upon my history—well, I screwed up in a lot of ways. But it could have been much worse, if I had reasoned like those who offer such advice, and lied.

Once upon a time, I truly and honestly believed that either a superintelligence would do what was right, or else there was no right thing to do; and I said so. I was uncertain of the nature of morality, and I said that too. I didn't know if the Singularity would be in five years or fifty, and this also I admitted. My project plans were not guaranteed to deliver results, and I did not promise to deliver them. When I finally said "Oops", and realized that I needed to go off and do more fundamental research instead of rushing to write code immediately—

—well, I can imagine the mess I would have had on my hands, if I had told the people who trusted me: that the Singularity was surely coming in ten years; that my theory was sure to deliver results; that I had no lingering confusions; and that any superintelligence would surely give them their own private island and a harem of catpersons of the appropriate gender. How exactly would one then explain why you're now going to step back and look for math-inventors instead of superprogrammers, or why the code now has to be theorem-proved?

When you make an honest mistake, on some subject you were honest about, the recovery technique is straightforward: Just as you told people what you thought in the first place, you now list out the actual reasons that you changed your mind. This diff takes you to your current true thoughts, that imply your current desired policy. Then, just as people decided whether to aid you originally, they re-decide in light of the new information.

But what if you were "optimistic" and only presented one side of the story, the better to fulfill that all-important goal of persuading people to your cause? Then you'll have a much harder time persuading them away from that idea you sold them originally—you've nailed their feet to the floor, which makes it difficult for them to follow if you yourself take another step forward.

And what if, for the sake of persuasion, you told them things that you didn't believe yourself? Then there is no true diff from the story you told before, to the new story now. Will there be any coherent story that explains your change of heart?

Conveying the real truth is an art form. It's not an easy art form—those darned constraints of honesty prevent you from telling all kinds of convenient lies that would be so much easier than the complicated truth. But, if you tell lots of truth, you get good at what you practice. A lot of those who come to me and advocate lies, talk earnestly about how these matters of transhumanism are so hard to explain, too difficult and technical for the likes of Joe the Plumber. So they'd like to take the easy way out, and lie.

We don't live in a righteous universe where all sins are punished. Someone who practiced telling lies, and made their mistakes and learned from them, might well become expert at telling lies that allow for sudden changes of policy in the future, and telling more lies to explain the policy changes. If you use the various forbidden arts that create fanatic followers, they will swallow just about anything. The history of the Soviet Union and their sudden changes of policy, as presented to their ardent Western intellectual followers, helped inspire Orwell to write 1984.

So the question, really, is whether you want to practice truthtelling or practice lying, because whichever one you practice is the one you're going to get good at. Needless to say, those who come to me and offer their unsolicited advice do not appear to be expert liars. For one thing, a majority of them don't seem to find anything odd about floating their proposals in publicly archived, Google-indexed mailing lists.

But why not become an expert liar, if that's what maximizes expected utility? Why take the constrained path of truth, when things so much more important are at stake?

Because, when I look over my history, I find that my ethics have, above all, protected me from myself. They weren't inconveniences. They were safety rails on cliffs I didn't see.

I made fundamental mistakes, and my ethics didn't halt that, but they played a critical role in my recovery. When I was stopped by unknown unknowns that I just wasn't expecting, it was my ethical constraints, and not any conscious planning, that had put me in a recoverable position.

You can't duplicate this protective effect by trying to be clever and calculate the course of "highest utility". The expected utility just takes into account the things you know to expect. It really is amazing, looking over my history, the extent to which my ethics put me in a recoverable position from my unanticipated, fundamental mistakes, the things completely outside my plans and beliefs.

Ethics aren't just there to make your life difficult; they can protect you from Black Swans. A startling assertion, I know, but not one entirely irrelevant to current affairs.

If you've been following along with my story, you'll recall that the downfall of all my theories began with a tiny note of discord. A tiny note that I wouldn't ever have followed up, if I had only cared about my own preferences and desires. It was the thought of what someone else might think—someone to whom I felt I owed an ethical consideration—that spurred me to follow up that one note.

And I have watched others fail utterly on the problem of Friendly AI, because they simply try to grab the banana in one form or another—seize the world for their own favorite moralities, without any thought of what others might think—and so they never enter into the complexities and second thoughts that might begin to warn them of the technical problems.

We don't live in a righteous universe. And so, when I look over my history, the role that my ethics have played is so important that I've had to take a step back and ask, "Why is this happening?" The universe isn't set up to reward virtue—so why did my ethics help so much? Am I only imagining the phenomenon? That's one possibility. But after some thought, I've concluded that, to the extent you believe that my ethics did help me, these are the plausible reasons in order of importance:

1) The honest Way often has a kind of simplicity that transgressions lack. If you tell lies, you have to keep track of different stories you've told different groups, and worry about which facts might encounter the wrong people, and then invent new lies to explain any unexpected policy shifts you have to execute on account of your mistake. This simplicity is powerful enough to explain a great deal of the positive influence that I attribute to my ethics, in a universe that doesn't reward virtue per se.

2) I was stricter with myself, and held myself to a higher standard, when I was doing various things that I considered myself ethically obligated to do. Thus my recovery from various failures often seems to have begun with an ethical thought of some type—e.g. the whole development where "Friendly AI" led into the concept of AI as a precise art. That might just be a quirk of my own personality; but it seems to help account for the huge role my ethics played in leading me to important thoughts, which I cannot just explain by saying that the universe rewards virtue.

3) The constraints that the wisdom of history suggests, to avoid hurting other people, may also stop you from hurting yourself. When you have some brilliant idea that benefits the tribe, we don't want you to run off and do X, Y, and Z, even if you say "the end justifies the means!" Evolutionarily speaking, one suspects that the "means" have more often benefited the person who executes them, than the tribe. But this is not the ancestral environment. In the more complicated modern world, following the ethical constraints can prevent you from making huge networked mistakes that would catch you in their collapse. Robespierre led a shorter life than Washington.

Part of the sequence Ethical Injunctions

Next post: "Ethical Inhibitions"

Previous post: "Ends Don't Justify Means (Among Humans)"

"But what if you were "optimistic" and only presented one side of the story, the better to fulfill that all-important goal of persuading people to your cause? Then you'll have a much harder time persuading them away from that idea you sold them originally - you've nailed their feet to the floor, which makes it difficult for them to follow if you yourself take another step forward."

Hmmm... if you don't need people following you, could it help you (from a rationality standpoint) to lie? Suppose that you read about AI technique X. Technique X looks really impressive, but you're still skeptical of it. If you talk about how great technique X looks, people will start to associate you with technique X, and if you try to change your mind about it, they'll demand an explanation. But if you lie (either by omission, or directly if someone asks you about X), you can change your mind about X later on and nobody will call you on it.

NOTE: This does require telling the same lie to everyone; telling different lies to different groups of people is, as noted, too messy.

I'm not sure that "Technique X looks really impressive, but you're still skeptical of it" is too complicated to explain, if that's the truth.

If you don't need people following you, why bother lying?

I suspect whatever reason there is to lie will be related to a reason to tell the truth.

"The universe isn't set up to reward virtue", but I think most people are. If someone is deceiving you then doing what they ask is likely not in your interest, otherwise they could persuade you without deception.

If something is difficult to explain due to technical understanding, you can 'lie' about it, while noting that it is an oversimplification intended to give an idea, and not wholly correct. I believe this is the norm for science publications targeted at the general population.

To lie effectively, I find the only way is to convince myself of something I know to be false. Then I can subsequently tell what I believe to be the truth without things like keeping track of what I told who or body language clues. This is, of course, still perilous and immoral in other ways, and often non-permanent since certain things can trigger the original memory.

Is it only honesty that has this protection-rail tendency, or have other ethics also had it?

Other ethics. For example, robbing a bank might seem like a good way to get funding, but there are all too many ways for it to go wrong.

On the other hand, I'm not sure there are any unethical risks that you'd still follow through with if you were being honest about it.

This is a worrisome line of thought, as I consider one of the main underlying points of this blog to question the necessity and rationality of conventional ethics.

What if the belief in God grants you some form of protection against threats of which you are not currently aware? For example, the threat of insanity, which we know to be sort of an occupational hazard among AI researchers?

Just for the sake of devil's advocacy:

4) You want to attribute good things to your ethics, and thus find a way to interpret events that enables you to do so.

If we see that adhering to ethics in the past has wound up providing us with utility, the correct course of action is not to throw out the idea of maximizing our utility, but rather to use adherence to ethics as an integral part of our utility maximization strategy.

I wonder if liars or honest folk are happier and/or more successful in life.

simon: "Just for the sake of devil's advocacy: 4) You want to attribute good things to your ethics, and thus find a way to interpret events that enables you to do so."

Eliezer: "The universe isn't set up to reward virtue - so why did my ethics help so much? Am I only imagining the phenomenon? That's one possibility."

I think considerations like these are probably not too meaningful. You're likely to be mentally unstable or misguided in some small way that has an overriding influence (at least at this level of effect) that you're unaware of.

The universe isn't set up to reward virtue.

I believe that ethics are an effort to improve the odds of good outcomes. So it's not that the universe is set up to reward ethics, it's that ethics are set up to follow the universe.

The challenge is that what we're taught is good is a mixture of generally useful rules, rules which are more useful to the people in charge than to the people who aren't, and mere mistakes.

When I saw The Dark Knight, I was left thinking how long it's going to take before some truth-seeking cop realizes that Batman didn't kill those people and Gordon is part of the conspiracy. Acceptable risk, Batman?

You can't duplicate this protective effect by trying to be clever and calculate the course of "highest utility". The expected utility just takes into account the things you know to expect. It really is amazing, looking over my history, the extent to which my ethics put me in a recoverable position from my unanticipated, fundamental mistakes, the things completely outside my plans and beliefs.

You acted as though you anticipated the unanticipated?

Probably either: you were lucky; your utility function isn't what you consciously thought it was; - or you have supernatural moral powers.

Probably either: you were lucky; your utility function isn't what you consciously thought it was; - or you have supernatural moral powers.

Or it is a tiny note of accord, to be attended to as diligently as the tiny notes of discord. Which is what the post went on to do.

Success is as much to be learned from as failure.

Excellent post. Please write more on ethics as safety rails on unseen cliffs.

Good consequences may come from good virtues, I gather.

pdf23ds: I think considerations like these are probably not too meaningful. You're likely to be mentally unstable or misguided in some small way that has an overriding influence (at least at this level of effect) that you're unaware of.

Also, they might not be too meaningful if, anticipating in advance, one is allowed to say at a future point, 'Well, I applied virtues R, and this had optimal outcome A', because, anticipating in advance, one is allowed to think at a future point, 'Well, I applied virtues R, and unfortunately this had suboptimal outcome B'. This might be like planning to try and not planning to do, if the virtue variable is bound and the outcome variable is free.

Is it only honesty that has this protection-rail tendency, or have other ethics also had it?

Interesting question. As far as I can tell, the two main effects that leap out at me are (1) the benefit of having not done various life-complicating bad things in the pursuit of early goals that I later had to change, and (2) the beneficial effect of holding myself to a higher standard when pursuing ethical obligations.

Has my life been better because of my sense of ethical inhibition against taking and wielding power? I honestly don't know - I can't compare my possible selves side-by-side. Maybe that other Eliezer learned to wield power well through practice, and built a large solid organization. Or maybe he turned to the dark side and ended up surrounded by a coterie a la Rand. In the absence of anything that even looks like a really blatant effect, it's hard to extract so much as an anecdote.

Excellent post!

As for explanation, the way I would put it is that ethics consists of hard-won wisdom from many lifetimes, which is how it is able to provide me with a safety rail against the pitfalls I have yet to encounter in my single lifetime.

I'm confused, you aren't really arguing that people hiding Jews from the Nazis should answer to the SS honestly? Sometimes honesty is unethical.

If statements I make shift a listener's priors, then we can evaluate the statements I choose to make based on how much they shift the listener's priors, and towards which truths. This is an interesting way to compare decisions among different types of possible statements, with lies as a special case. "Successful" lies move at least one of the listener's priors away from the truth: their belief about what you believe.

Even if I'm willing to restrict myself to true statements, which in extreme cases I won't, I face the dilemma of choosing which true statements to make.

This relates to your post about the clever arguer and filtered evidence.

I'm confused, you aren't really arguing that people hiding Jews from the Nazis should answer to the SS honestly? Sometimes honesty is unethical.

Yes, I was planning to mention that today - as an illustration of when you would willfully take on the unsimplicity and unforeseen pathways of lies.

If statements I make shift a listener's priors then we can evaluate the statements I choose to make based on how much they shift the listener's priors towards which truths.

That's a dangerous sort of path to go down - the idea that anything that persuades someone of what you believe to be true must be a good argument to make, without further restriction. It doesn't just take us toward the clever arguer; it takes us into the realm of manipulating people "for their own good", using lies for the sake of what is argued to be a greater epistemic good. This is the rationalization brought to me by many of the foolish advisors.

How can ethics be judged other than by referring to their consequences? You certainly can't use ethics to judge themselves.

The idea that "the universe does not reward virtue" gets it wrong. 'Virtue' is a meaningless concept by itself; it only has meaning in terms of what the universe does. Virtue is what the universe rewards, so to speak, to the degree that we can say the universe offers rewards.

It would be more accurate to say that virtue is what works in regards to the universe.

Sometimes honesty is unethical.

Ethics are just sets of rules used to determine our behavior in some context. Sometimes X is unethical, for any given value of X, depending on what ethics have been established.

"Always lie" is an ethic. Not a very evolutionarily fit ethic, nor a practical one. But it's an ethic.

Russell: "ethics consists of hard-won wisdom from many lifetimes, which is how it is able to provide me with a safety rail against the pitfalls I have yet to encounter in my single lifetime."

Yes, generations of selection for "what works" encoded in terms of principles tend to outweigh assessment within the context of an individual agent in terms of expected utility -- to the extent that the present environment is representative of the environment of adaptation. To the extent it isn't, then the best one can do is rely on the increasing weight of principles perceived hierarchically as increasingly effective over increasing scope of consequences, e.g. action on the basis of the principle known as the "law of gravity" is a pretty certain bet.

increasing weight of principles perceived hierarchically as increasingly effective over increasing scope of consequences

Ack. Could you please invent some terminology so you don't have to keep repeating this unwieldy phrase?

pdf23ds: "Ack. Could you please invent some terminology so you don't have to keep repeating this unwieldy phrase?"

I'm eager for an apt idiom for the concept, and one also for "increasing coherence over increasing context."

It seems significant, and indicative of our cultural unfamiliarity -- even discomfort -- with concepts of systems, information, and evolutionary theory, that we don't have such shorthand.

But then I look at the gross misunderestimation of almost every issue of any complexity at every level of supposed sophistication of social decision-making, and then geek speak seems not so bad.

Suggestions?

the threat of insanity, which we know to be sort of an occupational hazard among AI researchers

What? That sounds like sci-fi/horror writing, I've never heard of it happening in real life.

pdf23ds: "Ack. Could you please invent some terminology so you don't have to keep repeating this unwieldy phrase?"

Well, there are worse things than an unwieldy phrase! Consider how many philosophers have spent entire books trying to communicate their thoughts, and still failed. Looked at that way, Jef's phrase has a very good ratio of length to precision.

For the record, I never intended to argue that any statement which shifts the audience's priors towards what I perceive to be the truth is justified.

What I was starting to get at, and I hope Eliezer will address, is how we should select which true statements to make.

What about true statements which shift at least one of the listener's priors away from the true prior? What about avoiding true statements which would improve the listener's priors?

I believe that intelligent people sometimes avoid telling lies by selectively choosing truths which manipulate someone's priors.