Ethics Notes

Eliezer Yudkowsky

21st Oct 2008

Followup to: Ethical Inhibitions, Ethical Injunctions, Prices or Bindings?

(Some collected replies to comments on the above three posts.)

From Ethical Inhibitions:

Spambot: Every major democratic political leader lies abundantly to obtain office, as it's a necessity to actually persuade the voters. So Bill Clinton, Jean Chretien, Winston Churchill should qualify for at least half of your list of villainy.

Have the ones who've lied more, done better?

In cases where the politician who told more lies won, has that politician gone on to rule well in an absolute sense?

Is it actually true that no one who refused to lie (and this is not the same as always telling the whole truth) could win political office?

Are the lies expected, and in that sense, less than true betrayals of someone who trusts you?

Are there understood Rules of Politics that include lies but not assassinations, which the good politicians abide by, so that they are not really violating the ethics of their tribe?

Will the world be so much worse off if sufficiently good people refuse to tell outright lies and are thereby barred from public office; or would we thereby lose a George Washington or Marcus Aurelius or two, and thereby darken history?

Pearson: American revolutionaries as well ended human lives for the greater good

Police must sometimes kill the guilty. Soldiers must sometimes kill civilians (or if the enemy knows you're reluctant, that gives them a motive to use civilians as a shield). Spies sometimes have legitimate cause to kill people who helped them, but this has probably been done far more often than it has been justified by a need to end the Nazi nuclear program.

I think it's worth noting that in all such cases, you can write out something like a code of ethics and at least try to have social acceptance of it. Politicians, who lie, may prefer not to discuss the whole thing, but politicians are only a small slice of society.

Are there many who transgress even the unwritten rules and end up really implementing the greater good? (And no, there's no unwritten rule that says you can rob a bank to stop global warming.)

...but if you're placing yourself under unusual stress, you may need to be stricter than what society will accept from you. In fact, I think it's fair to say that the further I push any art, such as rationality or AI theory, the more I perceive that what society will let you get away with is tremendously too sloppy a standard.

Yvain: There are all sorts of biases that would make us less likely to believe people who "break the rules" can ever turn out well. One is the halo effect. Another is availability bias—it's much easier to remember people like Mao than it is to remember the people who were quiet and responsible once their revolution was over, and no one notices the genocides that didn't happen because of some coup or assassination.

When the winners do something bad, it's never interpreted as bad after the fact. Firebombing a city to end a war more quickly, taxing a populace to give health care to the less fortunate, intervening in a foreign country's affairs to stop a genocide: they're all likely to be interpreted as evidence for "the ends don't justify the means" when they fail, but glossed over or treated as common sense interventions when they work.

Both fair points. One of the difficult things in reasoning about ethics is the extent to which we can expect historical data to be distorted by moral self-deception on top of the more standard fogs of history.

Morrison: I'm not sure you aren't "making too much stew from one oyster". I certainly feel a whole lot less ethically inhibited if I'm really, really certain I'm not going to be punished. When I override, it feels very deliberate—"system two" grappling and struggling with "system one"'s casual amorality, and with a significant chance of the override attempt failing.

Weeks: This entire post is kind of surreal to me, as I'm pretty confident I've never felt the emotion described here before... I don't remember ever wanting to do something that I both felt would be wrong and wouldn't have consequences otherwise.

I don't know whether to attribute this to genetic variance, environmental variance, misunderstanding, or a small number of genuine sociopaths among Overcoming Bias readers. Maybe Weeks is referring to "not wanting" in terms of not finally deciding to do something he felt was wrong, rather than not being tempted?

From Ethical Injunctions:

Psy-Kosh: Given the current sequence, perhaps it's time to revisit the whole Torture vs Dust Specks thing?

I can think of two positions on torture to which I am sympathetic:

Strategy 1: No legal system or society should ever refrain from prosecuting those who torture. Anything important enough that torture would even be on the table, like the standard nuclear bomb in New York, is important enough that everyone involved should be willing to go to prison for the crime of torture.

Strategy 2: The chance of actually encountering a "nuke in New York" situation, that can be effectively resolved by torture, is so low, and the knock-on effects of having the policy in place so awful, that a blanket injunction against torture makes sense.

In case 1, you would choose TORTURE over SPECKS, and then go to jail for it, even though it was the right thing to do.

In case 2, you would say "TORTURE over SPECKS is the right alternative of the two, but a human can never be in an epistemic state where they have justified belief that this is the case". This would tie in well to the Hansonian argument that you have an O(3^^^3) probability penalty from the unlikelihood of finding yourself in such a unique position.

So I am sympathetic to the argument that people should never torture, or that a human can't actually get into the epistemic state of a TORTURE vs. SPECKS decision.

But I can't back the position that SPECKS over TORTURE is inherently the right thing to do, which I did think was the issue at hand. This seems to me to mix up an epistemic precaution with morality.

There are certainly worse things than torturing one person: torturing two people, for example. But if you adopt position 2, then you would refuse to torture one person with your own hands even to save a thousand people from torture, while simultaneously saying that it is better for one person to be tortured at your own hands than for a thousand people to be tortured at someone else's.

I try to use the words "morality" and "ethics" consistently as follows: The moral questions are over the territory (or, hopefully equivalently, over epistemic states of absolute certainty). The ethical questions are over epistemic states that humans are likely to be in. Moral questions are terminal. Ethical questions are instrumental.
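The 3^^^3 in the Torture vs. Dust Specks discussion above is written in Knuth's up-arrow notation, in which each additional arrow iterates the operation one level below it. A minimal Python sketch of the recursive definition (the function name `up_arrow` is just an illustrative choice) shows how fast it grows even for tiny arguments:

```python
def up_arrow(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow: a followed by n arrows, then b.

    n == 1 is ordinary exponentiation; each extra arrow iterates the
    operation one level below it. Only tiny inputs are computable in
    practice: 3^^^3 itself is far too large to evaluate.
    """
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))


print(up_arrow(3, 2, 2))  # 3^^2 = 3^3 = 27
print(up_arrow(3, 2, 3))  # 3^^3 = 3^27 = 7625597484987
```

Following the same definition, 3^^^3 = 3^^(3^^3) = 3^^7625597484987: a tower of exponentiated 3s that is 7,625,597,484,987 levels tall, which is why a probability penalty of that order swamps any payoff a human could plausibly be offered.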

Hanson: The problem here of course is how selective to be about rules to let into this protected level of "rules almost no one should think themselves clever enough to know when to violate." After all, your social training may well want you to include "Never question our noble leader" in that set. Many a Christian has been told the mysteries of God are so subtle that they shouldn't think themselves clever enough to know when they've found evidence that God isn't following a grand plan to make this the best of all possible worlds.

Some of the flaws in Christian theology lie in what they think their supposed facts would imply: e.g., that because God did miracles you can know that God is good. Other problems come more from the falsity of the premises than the invalidity of the deductions. Which is to say, if God did exist and were good, then you would be justified in being cautious around stomping on parts of God's plan that didn't seem to make sense at the moment. But this epistemic state would best be arrived at via a long history of people saying, "Look how stupid God's plan is, we need to do X" and then X blowing up on them. Rather than, as is actually the case, people saying "God's plan is X" and then X blows up on them.

Or if you'd found with some historical regularity that, when you challenged the verdict of the black box, that you seemed to be right 90% of the time, but the other 10% of the time you got black-swan blowups that caused a hundred times as much damage, that would also be cause for hesitation—albeit it doesn't quite seem like grounds for suspecting a divine plan.
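To make the black-box arithmetic explicit, here is a worked version under two assumptions the text leaves implicit: a correct challenge yields some benefit $B$, and "a hundred times as much damage" means a loss of $100B$, both measured against simply deferring to the black box (value $0$):

$$
\mathbb{E}[\text{challenge}] = 0.9\,B - 0.1 \times 100\,B = 0.9\,B - 10\,B = -9.1\,B
$$

Under these assumptions, challenging only breaks even when your hit rate $p$ satisfies

$$
p\,B = (1 - p)\,100\,B \quad\Longrightarrow\quad p = \tfrac{100}{101} \approx 0.990
$$

so even a 90% track record of being right would argue for deferring.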

Nominull: So... do you not actually believe in your injunction to "shut up and multiply"? Because for some time now you seem to have been arguing that we should do what feels right rather than trying to figure out what is right.

Certainly I'm not saying "just do what feels right". There's no safe defense, not even ethical injunctions. There's also no safe defense, not even "shut up and multiply".

I probably should have been clearer about this before, but I was trying to discuss things in order, and didn't want to wade into ethics without specialized posts...

People often object to the sort of scenarios that illustrate "shut up and multiply" by saying, "But if the experimenter tells you X, what if they might be lying?"

Well, in a lot of real-world cases, then yes, there are various probability updates you perform based on other people being willing to make bets against you; and just because you get certain experimental instructions doesn't imply the real world is that way.

But the base case has to be moral comparisons between worlds, or comparisons of expected utility between given probability distributions. If you can't ask about the base case, then what good can you get from instrumental ethics built on top?

Let's be very clear that I don't think that one small act of self-deception is an inherently morally worse event than, say, getting a hand chopped off. I'm asking, rather, how one should best avoid the dismembering chainsaw; and I am arguing that in reasonable states of knowledge a human can attain, the answer is, "Don't deceive yourself, it's a black-swan bet at best." Furthermore, that in the vast majority of cases where I have seen people conclude otherwise, it has indicated messed-up reasoning more than any actual advantage.

Vassar: For such a reason, I would be very wary of using such rules in an AGI, but of course, perhaps the actual mathematical formulation of the rule in question within the AGI would be less problematic, though a few seconds of thought doesn't give me much reason to think this.

Are we still talking about self-deception? Because I would give odds around as extreme as the odds I would give of anything, that if you tell me "the AI you built is trying to deceive itself", it indicates that some kind of really epic error has occurred. Controlled shutdown, immediately.

Vassar: In a very general sense though, I see a logical problem with this whole line of thought. How can any of these injunctions survive except as self-protecting beliefs? Isn't this whole approach just the sort of "fighting bias with bias" that you and Robin usually argue against?

Maybe I'm not being clear about how this would work in an AI!

The ethical injunction isn't self-protecting, it's supported within the structural framework of the underlying system. You might even find ethical injunctions starting to emerge without programmer intervention, in some cases, depending on how well the AI understood its own situation.

But the kind of injunctions I have in mind wouldn't be reflective—they wouldn't modify the utility function, or kick in at the reflective level to ensure their own propagation. That sounds really scary, to me—there ought to be an injunction against it!

You might have a rule that would trigger a controlled shutdown of the (non-mature) AI if it tried to execute a certain kind of source code change, but that wouldn't be the same as having an injunction that exerts direct control over the source code to propagate itself.

To the extent the injunction sticks around in the AI, it should be as the result of ordinary reasoning, not of reasoning that takes the injunction itself into account! That would be the wrong kind of circularity; you can unwind past ethical injunctions!

My ethical injunctions do not come with an extra clause that says, "Do not reconsider this injunction, including not reconsidering this clause." That would be going way too far. If anything, you ought to have an injunction against that kind of circularity (since it seems like a plausible failure mode in which the system has been parasitized by its own content).
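The distinction between a rule that triggers a controlled shutdown and an injunction that propagates itself can be illustrated with a toy Python sketch. Everything in it (`ShutdownGuard`, `ProposedChange`, `ControlledShutdown`, the `utility_function` target) is hypothetical and only shows the shape of the idea: the guard can veto a proposed change by halting, but it never rewrites code to guarantee its own survival, and nothing prevents ordinary reasoning from revising its rule set later.

```python
from dataclasses import dataclass, field


class ControlledShutdown(Exception):
    """Raised to halt the (hypothetical) system instead of applying a change."""


@dataclass(frozen=True)
class ProposedChange:
    # Hypothetical description of a self-modification: the component it
    # touches and a short human-readable summary.
    target: str
    summary: str


@dataclass
class ShutdownGuard:
    # Hypothetical category of source code change that halts a non-mature system.
    protected_targets: frozenset = frozenset({"utility_function"})
    log: list = field(default_factory=list)

    def review(self, change: ProposedChange) -> ProposedChange:
        """Return the change untouched, or raise ControlledShutdown.

        The guard only vetoes. It does not edit the change, the rest of the
        codebase, or its own rule set to ensure that it persists, and it has
        no clause forbidding its own reconsideration.
        """
        if change.target in self.protected_targets:
            self.log.append(f"shutdown: {change.summary}")
            raise ControlledShutdown(change.summary)
        self.log.append(f"allowed: {change.summary}")
        return change


guard = ShutdownGuard()
guard.review(ProposedChange("planner", "faster search heuristic"))
try:
    guard.review(ProposedChange("utility_function", "reweight outcomes"))
except ControlledShutdown as halt:
    print(f"controlled shutdown triggered by: {halt}")
```

The design choice being illustrated is that halting is the guard's only power; a version that also patched other code to keep itself in place would be the self-protecting, reflective kind of injunction the text warns against.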

You should never, ever murder an innocent person who's helped you, even if it's the right thing to do

Shut up and do the impossible!

Ord: As written, both these statements are conceptually confused. I understand that you didn't actually mean either of them literally, but I would advise against trading on such deep-sounding conceptual confusions.

I can't weaken them and make them come out as the right advice.

Even after "Shut up and do the impossible", there was that commenter who posted on their failed attempt at the AI-Box Experiment by saying that they thought they gave it a good try—which shows how hard it is to convey the sentiment of "Shut up and do the impossible!"

Readers can work out on their own how to distinguish the map and the territory, I hope. But if you say "Shut up and do what seems impossible!", then that, to me, sounds like dispelling part of the essential message—that what seems impossible doesn't look like it "seems impossible", it just looks impossible.

Likewise with "things you shouldn't do even if they're the right thing to do". Only the paradoxical phrasing, which is obviously not meant to be taken literally, conveys the danger and tension of ethics—the genuine opportunities you might be passing up—and for that matter, how dangerously meta the whole line of argument is.

"Don't do it, even if it seems right" sounds merely clever by comparison—like you're going to reliably divine the difference between what seems right and what is right, and happily ride off into the sunset.

Crowe: This seems closely related to inside-view versus outside-view. The think-lobe of the brain comes up with a cunning plan. The plan breaks an ethical rule but calculation shows it is for the greater good. The executive-lobe of the brain then ponders the outside view. Everyone who has executed an evil cunning plan has run a calculation of the greater good and had their plan endorsed. So the calculation lacks outside-view credibility.

Yes, inside view versus outside view is definitely part of this. And the planning fallacy, optimism, and overconfidence, too.

But there are also biases arguing against the same line of reasoning, as noted by Yvain: History may be written by the victors to emphasize the transgressions of the losers while overlooking the moral compromises of those who achieved "good" results, etc.

Also, some people who execute evil cunning plans may just have evil intent—possibly also with outright lies about their intentions. In which case, they really wouldn't be in the reference class of well-meaning revolutionaries, albeit you would have to worry about your comrades; the Trotsky->Lenin->Stalin slide.

Kurz: What's to prohibit the meta-reasoning from taking place before the shutdown triggers? It would seem that either you can hard-code an ethical inhibition or you can't. Along those lines, is it fair to presume that the inhibitions are always negative, so that non-action is the safe alternative? Why not just revert to a known state?

If a self-modifying AI with the right structure will write ethical injunctions at all, it will also inspect the code to guarantee that no race condition exists with any deliberative-level supervisory systems that might have gone wrong in the condition where the code executes. Otherwise you might as well not have the code.

Inaction isn't safe but it's safer than running an AI whose moral system has gone awry.

Finney: Which is better: conscious self-deception (assuming that's even meaningful), or unconscious?

Once you deliberately choose self-deception, you may have to protect it by adopting other Dark Side Epistemology. I would, of course, say "neither" (as otherwise I would be swapping to the Dark Side) but if you ask me which is worse—well, hell, even I'm still undoubtedly unconsciously self-deceiving, but that's not the same as going over to the Dark Side by allowing it!

From Prices or Bindings?:

Psy-Kosh: Hrm. I'd think "avoid destroying the world" itself to be an ethical injunction too.

The problem is that this is phrased as an injunction over positive consequences. Deontology does better when it's closer to the action level and negative rather than positive.

Imagine trying to give this injunction to an AI. Then it would have to do anything that it thought would prevent the destruction of the world, without other considerations. Doesn't sound like a good idea.

Crossman: Eliezer, can you be explicit which argument you're making? I thought you were a utilitarian, but you've been sounding a bit Kantian lately.

If all I want is money, then I will one-box on Newcomb's Problem.

I don't think that's quite the same as being a Kantian, but it does reflect the idea that similar decision algorithms in similar epistemic states will tend to produce similar outputs, and that such decision systems should not pretend to the logical impossibility of local optimization. But this is a deep subject on which I have yet to write up my full views.
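The one-boxing claim can be checked with a naive expected-value calculation. This sketch assumes the stakes usually quoted for Newcomb's Problem ($1,000 in the transparent box, $1,000,000 in the opaque box) and a predictor whose accuracy is a free parameter; it conditions on the predictor being right, which is exactly the step causal decision theorists reject, so treat it as an illustration of the money-maximizing intuition rather than a full analysis.

```python
def newcomb_expected_payoffs(accuracy: float,
                             small: float = 1_000,
                             big: float = 1_000_000) -> tuple[float, float]:
    """Return (one_box, two_box) expected dollar payoffs.

    Assumes `small` is always in the transparent box and `big` is in the
    opaque box only if the predictor foresaw one-boxing. The predictor is
    assumed correct with probability `accuracy` whichever way you choose.
    """
    # One-boxing pays `big` exactly when the predictor was right.
    one_box = accuracy * big
    # Two-boxing always gets `small`, plus `big` when the predictor was wrong.
    two_box = (1 - accuracy) * big + small
    return one_box, two_box


def break_even_accuracy(small: float = 1_000, big: float = 1_000_000) -> float:
    """Accuracy above which one-boxing has the higher expected payoff."""
    # Solve accuracy * big == (1 - accuracy) * big + small for accuracy.
    return (big + small) / (2 * big)


one, two = newcomb_expected_payoffs(0.99)
print(f"one-box: {one:,.0f}  two-box: {two:,.0f}")
print(break_even_accuracy())  # 0.5005
```

With a 99% accurate predictor, the one-boxer expects about $990,000 against the two-boxer's $11,000; under these assumptions one-boxing comes out ahead whenever the predictor is right more than 50.05% of the time.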

Clay: Put more seriously, I would think that being believed to put the welfare of humanity ahead of concerns about personal integrity could have significant advantages itself.

The whole point here is that "personal integrity" doesn't have to be about being a virtuous person. It can be about trying to save the world without any concern for your own virtue. It can be the sort of thing you'd want a pure nonsentient decision agent to do, something that was purely a means and not at all an end in itself.

Andrix: There seems to be a conflict here between not lying to yourself, and holding a traditional rule that suggests you ignore your rationality.

Your rationality is the sum of your full abilities, including your wisdom about what you refrain from doing in the presence of what seem like good reasons.

Yvain: I am glad Stanislav Petrov, contemplating his military oath to always obey his superiors and the appropriate guidelines, never read this post.

An interesting point, for several reasons.

First, did Petrov actually swear such an oath, and would it apply in such fashion as to require him to follow the written policy rather than using his own military judgment?

Second, you might argue that Petrov's oath wasn't intended to cover circumstances involving the end of the world, and that a common-sense exemption should apply when the stakes suddenly get raised hugely beyond the intended context of the original oath. I think this fails, because Petrov was regularly in charge of a nuclear-war installation and so this was exactly the sort of event his oath would be expected to apply to.

Third, the Soviets arguably implemented what I called Strategy 1 above: Petrov did the right thing, and was censured for it anyway.

Fourth—maybe, on sober reflection, we wouldn't have wanted the Soviets to act differently! Yes, the written policy was stupid. And the Soviet Union was undoubtedly censuring Petrov out of bureaucratic coverup, not for reasons of principle. But do you want the Soviet Union to have a written, explicit policy that says, "Anyone can ignore orders in a nuclear war scenario if they think it's a good idea," or even an explicit policy that says "Anyone who ignores orders in a nuclear war scenario, who is later vindicated by events, will be rewarded and promoted"?

Part of the sequence Ethical Injunctions

(end of sequence)

Previous post: "Prices or Bindings?"