Devil's Offers

Eliezer Yudkowsky

An iota of fictional evidence from The Golden Age by John C. Wright:

    Helion had leaned and said, "Son, once you go in there, the full powers and total command structures of the Rhadamanth Sophotech will be at your command. You will be invested with godlike powers; but you will still have the passions and distempers of a merely human spirit. There are two temptations which will threaten you. First, you will be tempted to remove your human weaknesses by abrupt mental surgery. The Invariants do this, and to a lesser degree, so do the White Manorials, abandoning humanity to escape from pain. Second, you will be tempted to indulge your human weakness. The Cacophiles do this, and to a lesser degree, so do the Black Manorials. Our society will gladly feed every sin and vice and impulse you might have; and then stand by helplessly and watch as you destroy yourself; because the first law of the Golden Oecumene is that no peaceful activity is forbidden. Free men may freely harm themselves, provided only that it is only themselves that they harm."
    Phaethon knew what his sire was intimating, but he did not let himself feel irritated. Not today. Today was the day of his majority, his emancipation; today, he could forgive even Helion's incessant, nagging fears.
    Phaethon also knew that most Rhadamanthines were not permitted to face the Noetic tests until they were octogenarians; most did not pass on their first attempt, or even their second. Many folk were not trusted with the full powers of an adult until they reached their Centennial. Helion, despite criticism from the other Silver-Gray branches, was permitting Phaethon to face the tests five years early...

    Then Phaethon said, "It's a paradox, Father. I cannot be, at the same time and in the same sense, a child and an adult. And, if I am an adult, I cannot be, at the same time, free to make my own successes, but not free to make my own mistakes."
    Helion looked sardonic. "'Mistake' is such a simple word. An adult who suffers a moment of foolishness or anger, one rash moment, has time enough to delete or destroy his own free will, memory, or judgment. No one is allowed to force a cure on him. No one can restore his sanity against his will. And so we all stand quietly by, with folded hands and cold eyes, and meekly watch good men annihilate themselves. It is somewhat... quaint... to call such a horrifying disaster a 'mistake.'"

Is this the best Future we could possibly get to—the Future where you must be absolutely stern and resistant throughout your entire life, because one moment of weakness is enough to betray you to overwhelming temptation?

Such flawless perfection would be easy enough for a superintelligence, perhaps—for a true adult—but for a human, even a hundred-year-old human, it seems like a dangerous and inhospitable place to live. Even if you are strong enough to always choose correctly—maybe you don't want to have to be so strong, always at every moment.

This is the great flaw in Wright's otherwise shining Utopia—that the Sophotechs are helpfully offering up overwhelming temptations to people who would not be at quite so much risk from only themselves. (Though if not for this flaw in Wright's Utopia, he would have had no story...)

If I recall correctly, it was while reading The Golden Age that I generalized the principle "Offering people powers beyond their own is not always helping them."

If you couldn't just ask a Sophotech to edit your neural networks—and you couldn't buy a standard package at the supermarket—but, rather, had to study neuroscience yourself until you could do it with your own hands—then that would act as something of a natural limiter. Sure, there are pleasure centers that would be relatively easy to stimulate; but we don't tell you where they are, so you have to do your own neuroscience. Or we don't sell you your own neurosurgery kit, so you have to build it yourself—metaphorically speaking, anyway—

But you see the idea: it is not so terrible a disrespect for free will, to live in a world in which people are free to shoot their feet off through their own strength—in the hope that by the time they're smart enough to do it under their own power, they're smart enough not to.

The more dangerous and destructive the act, the more you require people to do it without external help. If it's really dangerous, you don't just require them to do their own engineering, but to do their own science. A singleton might be justified in prohibiting standardized textbooks in certain fields, so that people have to do their own science—make their own discoveries, learn to rule out their own stupid hypotheses, and fight their own overconfidence. Besides, everyone should experience the joy of major discovery at least once in their lifetime, and to do this properly, you may have to prevent spoilers from entering the public discourse. So you're getting three social benefits at once, here.

But now I'm trailing off into plots for SF novels, instead of Fun Theory per se. (It can be fun to muse how I would create the world if I had to order it according to my own childish wisdom, but in real life one rather prefers to avoid that scenario.)

As a matter of Fun Theory, though, you can imagine a better world than the Golden Oecumene depicted above—it is not the best world imaginable, fun-theoretically speaking. We would prefer (if attainable) a world in which people own their own mistakes and their own successes, and yet they are not given loaded handguns on a silver platter, nor do they perish through suicide by genie bottle.

Once you imagine a world in which people can shoot off their own feet through their own strength, are you making that world incrementally better by offering incremental help along the way?

It's one matter to prohibit people from using dangerous powers that they have grown enough to acquire naturally—to literally protect them from themselves. One expects that if a mind kept getting smarter, at some eudaimonic rate of intelligence increase, then—if you took the most obvious course—the mind would eventually become able to edit its own source code, and bliss itself out if it chose to do so. Unless the mind's growth were steered onto a non-obvious course, or monitors were mandated to prohibit that event... To protect people from their own powers might take some twisting.

To descend from above and offer dangerous powers as an untimely gift, is another matter entirely. That's why the title of this post is "Devil's Offers", not "Dangerous Choices".

And to allow dangerous powers to be sold in a marketplace—or alternatively to prohibit them from being transferred from one mind to another—that is somewhere in between.

John C. Wright's writing has a particular poignancy for me, for in my foolish youth I thought that something very much like this scenario was a good idea—that a benevolent superintelligence ought to go around offering people lots of options, and doing as it was asked.

In retrospect, this was a case of a pernicious distortion where you end up believing things that are easy to market to other people.

I know someone who drives across the country on long trips, rather than flying. Air travel scares him. Statistics, naturally, show that flying a given distance is much safer than driving it. But some people fear too much the loss of control that comes from not having their own hands on the steering wheel. It's a common complaint.

The future sounds less scary if you imagine yourself having lots of control over it. For every awful thing that you imagine happening to you, you can imagine, "But I won't choose that, so it will be all right."

And if it's not your own hands on the steering wheel, you think of scary things, and imagine, "What if this is chosen for me, and I can't say no?"

But in real life rather than imagination, human choice is a fragile thing. If the whole field of heuristics and biases teaches us anything, it surely teaches us that. Nor has it been the verdict of experiment, that humans correctly estimate the flaws of their own decision mechanisms.

I flinched away from that thought's implications, not so much because I feared superintelligent paternalism myself, but because I feared what other people would say of that position. If I believed it, I would have to defend it, so I managed not to believe it. Instead I told people not to worry, a superintelligence would surely respect their decisions (and even believed it myself). A very pernicious sort of self-deception.

Human governments are made up of humans who are foolish like ourselves, plus they have poor incentives. Less skin in the game, and specific human brainware to be corrupted by wielding power. So we've learned the historical lesson to be wary of ceding control to human bureaucrats and politicians. We may even be emotionally hardwired to resent the loss of anything we perceive as power.

Which is just to say that people are biased, by instinct, by anthropomorphism, and by narrow experience, to underestimate how much they could potentially trust a superintelligence which lacks a human's corruption circuits, doesn't easily make certain kinds of mistakes, and has strong overlap between its motives and your own interests.

Do you trust yourself? Do you trust yourself to know when to trust yourself? If you're dealing with a superintelligence kindly enough to care about you at all, rather than disassembling you for raw materials, are you wise to second-guess its choice of who it thinks should decide? Do you think you have a superior epistemic vantage point here, or what?

Obviously we should not trust all agents who claim to be trustworthy—especially if they are weak enough, relative to us, to need our goodwill. But I am quite ready to accept that a benevolent superintelligence may not offer certain choices.

If you feel safer driving than flying, because that way it's your own hands on the steering wheel, statistics be damned—

—then maybe it isn't helping you, for a superintelligence to offer you the option of driving.

Gravity doesn't ask you if you would like to float up out of the atmosphere into space and die. But you don't go around complaining that gravity is a tyrant, right? You can build a spaceship if you work hard and study hard. It would be a more dangerous world if your six-year-old son could do it in an hour using string and cardboard.

"I flinched away from that thought's implications, not so much because I feared superintelligent paternalism myself, but because I feared what other people would say of that position."

This is basically THE reason I always advocate increased comfort with lying. It seems to me that this fear of believing what they don't want to say if they only believe truth is the single largest seemingly removable barrier to people becoming rationalists at all, or passing that barrier, to becoming the best rationalists they can be.

Can you expound on this just a bit? The second sentence is slightly difficult to parse, but it sounds like an interesting notion, so I'd like to be sure I understand what you said.

If you follow the rule that you honestly report your beliefs, and you believe something that other people disapprove of (they think it is crazy, immoral, whatever), then your rule says you have to honestly report your belief that other people disapprove of.

If you also don't want to report beliefs other people disapprove of, then you may wish to avoid acquiring beliefs that other people disapprove of.

This goal can contradict the epistemic rationalist's goal of acquiring accurate beliefs, when people disapprove of the truth. Therefore, (says Michael Vassar) discard the rule that you honestly report your beliefs, so that avoiding disapproved-of beliefs will not be your goal.

(An alternate solution is to be willing to report beliefs others disapprove of.)

There's a rather serious problem with that philosophy: It makes you a liar, which is often morally a bad thing to be. If you really eliminate all your discomfort with lying, you have converted yourself into a sociopath and probably become a scam artist to boot. You'll be like the guy selling "healing crystals" who knows full well that the crystals are bunk, but is making too much money to care.

My own solution is to think of disapproval as a relatively mild pain. We evolved to think that disapproval is one of the worst things in the universe, because back on the savannah it was: getting shunned meant getting killed.

But in a modern liberal society, you can be an atheist, a Singularitarian, a lesbian, a furry, and be disapproved of... and not actually die from this. In fact, usually it hurts about as much as a papercut. And since there may in fact be benefits to telling people you are these things---a more secular and pro-Singularity society for the former, more tolerance for your fellow lesbians and furries on the other---then speaking the disapproved belief is often precisely the right thing to do.

In some rare circumstances you can still die from disapproved beliefs---e.g. in Iran you can be hanged for being an atheist---in which case I honestly have no qualms about lying. I'll lie to torturers and murderers all day long if I have to.

It's a bit harder in intermediate cases, where the disapproval has real consequences but not fatal ones. But even then, one can escape most of the feelings of moral guilt by reminding oneself: I wouldn't be lying if they weren't bigots.

At least in my social surroundings, lying has never been called for when I have a non-acceptable opinion; just keeping my mouth shut about it would be enough.

You are forgetting about "Werewolf Contracts" in the Golden Age. Under these contracts you can appoint someone who can "use force, if necessary, to keep the subscribing party away from addictions, bad nanomachines, bad dreams or other self-imposed mental alterations."

If you sign such a contract then, unlike what you wrote, it's not true that "one moment of weakness is enough to betray you."

I think the general point he's making still stands. You can always choose to remove the Werewolf Contract of your own volition, then force any sort of fever dream or nightmare onto yourself.

Moreover, The Golden Age also makes a point about the dangers of remaining unchanged. Orpheus, the most wealthy man in history, has modified his brain such that his values and worldview will never shift. This puts him in sharp contrast to Phaethon as the protagonist, whose whole arc is about shifting the strict moral equilibrium of the public to make important change happen. Orpheus, trapped in his morals, is as out of touch in the era of Phaethon as would be a Catholic crusader in modern Rome.

What is the point of trying to figure out what your friendly AI will choose in each standard difficult moral choice situation, if in each case the answer will be "how dare you disagree with it since it is so much smarter and more moral than you?" If the point is that your design of this AI will depend on how well various proposed designs agree with your moral intuitions in specific cases, well then the rest of us have great cause to be concerned about how much we trust your specific intuitions.

James is right; you only need one moment of "weakness" to approve a protection against all future moments of weakness, so it is not clear there is an asymmetric problem here.

In addition to what James said, I'm reminded of the mechanism to change screen resolution in Windows XP: It automatically resets to its original resolution in X seconds, in case you can't see the screen. This is so people can't break their computers in one moment of weakness.

A similar thing could be done with self-modification. Self-destruction would still be possible, of course, just as it is now (I could go jump off of a bridge). But just as suicide is something that is built up to in humans, failsafes could be put in place so self-modification was equally deliberate.
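As a rough illustration of the revert-unless-confirmed failsafe described above, here is a minimal Python sketch. The name `apply_with_revert` and the `confirmed` callback are hypothetical, invented for this example; the callback merely stands in for "the change was confirmed before the timer ran out", and nothing here models the actual Windows dialog.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def apply_with_revert(current: T, proposed: T, confirmed: Callable[[T], bool]) -> T:
    """Tentatively apply `proposed`; keep it only if the change is confirmed.

    `confirmed` is a hypothetical stand-in for "the user approved the new
    state before the timeout expired". If it returns False, the old state
    is restored, so an unconfirmed change never becomes permanent.
    """
    trial = proposed          # the change takes effect provisionally
    if confirmed(trial):      # confirmation arrived in time
        return trial
    return current            # no confirmation: fall back to the old state


# Example: an unreadable screen mode is never confirmed, so it is rolled back.
kept = apply_with_revert("1024x768", "1600x1200", lambda mode: False)
changed = apply_with_revert("1024x768", "800x600", lambda mode: True)
```

The design choice worth noticing is that the default outcome is the old state: the change has to earn permanence, rather than the revert having to be requested.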

"In addition to what James said, I'm reminded of the mechanism to change screen resolution in Windows XP: It automatically resets to its original resolution in X seconds, in case you can't see the screen. This is so people can't break their computers in one moment of weakness."

But you are absolutely allowed to break your computer in "one moment of weakness"; it isn't even hard. The reason for that dialog is because the computer honestly, genuinely can't predict if the new screen mode will work.

@James:

Doesn't the choice of a perfect external regulator amount to the same thing as directly imposing restrictions on yourself, thereby going back to the original problem? I suppose such a regulator, or indeed any stabilizing self-modification, could have the advantage of being publicly available and widely used, and therefore of being well-tested, with thoroughly understood operations and consequences.

Another way to do it might be to create many copies of yourself (I'm assuming this scenario takes place inside a computer) and let majority (or 2/3s majority or etc) rule when it comes to "rescuing" copies that have made un-self-recoverable errors.

Anyway I suppose this is all somewhat beside the point since such a scenario was chosen as an example of what Eliezer expects a successful future to not look like.

@michael vassar:

So, are you saying that lying about your beliefs can be good because it allows you to freely believe some non-PC or otherwise unpopular idea (that your reason leads you to believe is the truth), without having to worry about the social consequences of being discovered to have such a belief?

I'm not sure whether I agree or not, but it's worth thinking about.

Hrm... I think, at least initially I'd want some limiters for myself along the lines of, well, the system telling me "this isn't going to do what you actually really want it to do, so no."

But "no mental alteration at all without being a neuroanatomical master yourself in the first place", at least initially, seems a bit too harsh. That is, to the extent that one needs a bit of an intel/etc boost to fully master it in the first place, we'd have a bit of a problem here. :)

I'd be perfectly happy with something where, if, say, I said "I'd like to lower my aggression," it came back with "Uh, no. The structure of your mind is such that that is tied to ambition/get-it-doneness and so on, and even by your own measure you're overly passive as is. This is not what you want, even if you think it is."

(Note, I'm not saying that I have the knowledge to, well, say that aggression and ambition are tied to each other like that. This is just a hypothetical, though at least from personal introspection they do seem potentially related like that.)

But I'd like it if it also added on something like "However, here is a subtler change that can be made that would have the effects you actually wanted out of what you just asked for."

or even "And on that note, here's something to do about that passivity/laziness that seems to be something that is a much larger source of frustration on your part."

However, I don't really have any objection to it sometimes returning with "No. This is the sort of thing you really want (even if you don't know it) to do/work out for yourself in terms of what's already available to you."

And on the other other other hand, there's the issue of "do we really want it to be the sort of thing that we'd perceive as a person, rather than an abstract process?"

"A singleton might be justified in prohibiting standardized textbooks in certain fields, so that people have to do their own science [...]"

No textbooks?! CEV had better overrule you on this one, or my future selves across the many worlds are all going to scream bloody murder. It may be said that I'm missing the point: that ex hypothesi the Friendly AI knows better than me.

But I'm still going to cry.

ShardPhoenix wrote "Doesn't the choice of a perfect external regulator amount to the same thing as directly imposing restrictions on yourself, thereby going back to the original problem?"

No, because if there are many possible future states of the world it wouldn't be practical for you in advance to specify what restrictions you will have in every possible future state. It's much more practical for you to appoint a guardian who will make decisions after it has observed what state of the world has come to pass. Also, you might pick a regulator who would impose different restrictions on you than you would if you acted without a regulator.

ShardPhoenix also wrote "Another way to do it might be to create many copies of yourself (I'm assuming this scenario takes place inside a computer) and let majority (or 2/3s majority or etc) rule when it comes to 'rescuing' copies that have made un-self-recoverable errors."

Good idea, except that in the Golden Age world these copies would become free individuals who could modify themselves. You would also be financially responsible for all of these copies until they became adults.

Robin, if people are tempted to gloss my metaethical agenda as "creating a God to rule us all", then it seems clear that there's an expected benefit from talking about my object-level guesses in order to contradict this, since talking about the meta stuff doesn't seem to grab in quite the same way.

There are also the other standard reasons to talk about Fun Theory, such as people asking too little of the future (a God to rule over us is an example of this pattern, as is expecting wonderful new video games); or further crushing religious notions of theodicy (by illustrating what a well-designed world that respected its inhabitants' free will and self-determination would look like, in contrast to this one).

Frelkins, Vassar advocates that rationalists should learn to lie, I advocate that rationalists should practice telling the truth more effectively, and we're still having that argument.

"Frelkins, Vassar advocates that rationalists should learn to lie, I advocate that rationalists should practice telling the truth more effectively, and we're still having that argument."

A little over three years later, what are your and Vassar's current positions on this?

Re: Vassar advocates that rationalists should learn to lie, I advocate that rationalists should practice telling the truth more effectively, and we're still having that argument.

Uh huh. What are the goals of these hypothetical rational agents?

ShardPhoenix: Yes. This is the same principle that says that credible confidentiality within a group can sometimes improve aggregate information flow and collective epistemology.

Tim Tyler: Human goals. I definitely do NOT want alien rationalists to be able to lie, but I doubt I have much choice regarding that. Also not transhuman children. There I might have some limited choice.

Eliezer: I certainly think that rationalists should practice telling truth more effectively as well as lie, and you admit that not lying enough makes people gullible, so it's mostly a matter of estimates of the magnitude of the relevant trade-offs here. I think that our disagreements are based on radically different models of social psychology. We disagree a great deal about the degree to which being known to sometimes lie reduces future credibility in the eyes of actual existent humans relative to being known to sometimes mislead without lying. I believe that being known to lie increases credibility somewhat relative to "wizards oath", while you think it greatly decreases it. I think that I know your reasons for your belief and that you don't know mine. I'm not sure whether you think that I know your reasons, and I'm not sure whether this difference in social psychological theory is the specific belief we disagree about. I'd like confirmation on whether you agree that this is our main point of disagreement. Also possibly a poll of the audience on the social psychology fact.

For many reasons I think it's better to remember to see a superintelligence as modeling the world (including people in it) on a level different from intentionality, and using concepts unnatural to a human. The world with a superintelligence in it, if you need to understand its impact on the world, doesn't have any humans, any intelligent agents at all, not even the singleton itself in the model that the singleton runs in its moments of decision. Only the singleton makes decisions, and with respect to those decisions everything else is stuff of its mind, the material that gets optimized, according to a humane utility function. The utility function is ultimately over the stuff of reality, not over transhuman people or any kind of sentient beings. This underlies the perspective on the singleton as the new humane physics of the world.

The way we interpret the world in a singleton and actions of a singleton on the world is different from the way it interprets the world and makes decisions on it, even if a simplified model agrees with reality nine times out of ten. What the singleton builds can be interpreted back from our perspective as sentient beings, and again the sentient beings that we interpret from the optimized stuff of reality could from our perspective be seen as interpreting what's going on as there being multiple sentient beings going around in a new world, learning, communicating, living their lives. They can even (be interpreted to) interpret the actions of the singleton as certain adjustments to the physics, to people's minds, to objects in the world, but it's not the level where the singleton's decisions are being made. It's the level on which they make their own decisions. Their decisions are determined by their cognitive algorithms, but the outcomes of their decisions are taken into account in arranging the conditions that allow those decisions to be made, even to be thought about, even to the options for thoughts of one agent that lead to thoughts of other agents after object-level interaction that lead to the outcome in question. It's a perpetual worldwide Newcomb's paradox in action, with the singleton arranging everything it can to be right, including keeping a balance with unwanted interference, and unwanted awareness of interference, which is interference in its own right, and so on. You are the stuff of physics, and you determine what comes of your actions, but this time physics is not at all simple, in very delicate ways, and you consist of this superintelligent physics as well. I think that this perspective allows one to see how the guiding process can be much more subtle than prohibiting things that fall in natural human or transhuman categories.

Of course, these human interpretations would apply to an optimized future only if the singleton is tuned so perfectly as to produce something that can be described by them, and maybe not even then, because a creative surprise could show a better unexpected way.

I think that an empirical approach to self-modification would quickly become prominent: alter one variable and test it, with a self-imposed timeout clause. The problem is that this does not apply to one sort of change: a change in utility function. An inadvertent change of utility function is extremely dangerous, because changing your utility function is of infinite negative utility by the standards of your current utility, and vice-versa.

"An inadvertent change of utility function is extremely dangerous, because changing your utility function is of infinite negative utility by the standards of your current utility, and vice-versa."

Not true at all. A change from N_paperclips to N_paperclips + 10^-100N_staples, for instance, probably has no effect. A change to N_paperclips + .5N_staples might result in fewer paperclips, but finitely many.
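To make the "fewer paperclips, but finitely many" point concrete, here is a toy Python sketch. Everything in it is an assumption made up for illustration: a fixed budget of 100 resource units, square-root (diminishing-returns) production for both goods, and the helper names `best_allocation` and `paperclips_lost`. It is not a model anyone in this thread proposed.

```python
import math

BUDGET = 100  # hypothetical resource units to split between the two goods


def best_allocation(staple_weight: float) -> int:
    """Units spent on paperclips by an agent maximizing
    N_paperclips + staple_weight * N_staples, assuming (for illustration)
    that x units of resource yield sqrt(x) items of either good."""
    def utility(x: int) -> float:
        return math.sqrt(x) + staple_weight * math.sqrt(BUDGET - x)
    return max(range(BUDGET + 1), key=utility)


def paperclips_lost(staple_weight: float) -> float:
    """Loss measured by the ORIGINAL utility (paperclips only)."""
    original = math.sqrt(best_allocation(0.0))
    modified = math.sqrt(best_allocation(staple_weight))
    return original - modified


tiny = paperclips_lost(1e-100)   # negligible weight on staples
half = paperclips_lost(0.5)      # substantial weight on staples
```

Under these assumptions the agent with weight 10^-100 still spends the whole budget on paperclips, so nothing is lost, while the agent with weight 0.5 diverts 20 of the 100 units to staples and ends up with about one paperclip fewer out of ten: a real cost by the original utility's standards, but a bounded one.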

I should have specified a domain change. A modification that varies your utility function by degree has a calculable negative utility.

nazgulnarsil, can you give examples? I don't understand your claim. What do you mean by "domain change" here?

Michael Vassar: maybe you chose to work in an area where you had to lie to survive. Perhaps Eli works in an area where the discovery of lying has a higher price (in destroyed reputation) than sticking to the inconvenient truth. But unfortunately I think it is easier to discount a truth-sayer (he is after all an alien) than a randomised liar (he is one of us). In other words it is easier to buy the mix of truth-and-untruth than the truth and nothing but the truth. But the social result seems to be the same - untruth wins.

Wright either didn't know or chose to ignore the thinking that led to Asimov's Three Laws. While the laws themselves (that robots must keep humans from coming to harm, obey human orders, and preserve themselves, in that order of priority) are impossible to codify, the underlying insight that we make knives with hilts is sound. Science fiction has a dystopian/idiot inventor streak because that makes it easier to get the plot going.

From another angle, part of sf is amplifying aspects of the real world. We can wreck our lives in a moment of passion or bad judgement, or by following a bad idea repeatedly.

Having to figure out the neuroscience by yourself is not an especially good protection against mistakes. Knowing how to make a change is different from and easier than knowing how to debug a change.

I don't think prohibiting textbooks is necessary or sufficient to give people the pleasure of making major discoveries. Some people are content to solve puzzles, but others don't just want to be right; they want to be right about something new. My feeling is that the world is always going to be more complex than what we know about it. I'm hoping that improved tools, including improved cognition, will mean that we'll never run out of new things, including new general principles, to discover.

I agree with Psy-Kosh that advice should and would be available, and also something like therapy if you suspect that you've deeply miscalibrated yourself. However, there is going to be more than one system of advice and of therapy because there isn't going to be agreement on what constitutes an improvement.

Excuse me if it's been covered here, but in an environment like that, deciding not just what you want but what changes turn you into not-you is a hard problem.

Eliezer, this post seems to me to reinforce, not weaken, a "God to rule us all" image. Oh, and among the various clues that might indicate to me that someone would make a good choice with power, the ability to recreate that power from scratch does not seem a particularly strong clue.

"among the various clues that might indicate to me that someone would make a good choice with power, the ability to recreate that power from scratch does not seem a particularly strong clue."

That was my first reaction as well, but Eliezer must have intentionally chosen a "clue" that is not too strong. After all, an FAI doesn't really need to use any clues--it can just disallow any choice that is not actually good (except that would destroy the feeling of free will). So I think "let someone make a choice with a power if they can recreate that power from scratch" is meant to be an example of the kind of tradeoff an FAI might make between danger and freedom.

What I don't understand is, since this is talking about people born after the Singularity, why do parents continue to create children who are so prone to making bad choices? I can understand not wanting to take an existing person and forcibly "fix" them, but is there supposed to be something in our CEV that says even new beings created from scratch must have a tendency to make wrong choices to be maximally valuable?

In re lying when you're trying to set up a research and invention organization: It seems to me that it would make recruiting difficult. The public impression of what you're doing is going to be your lies, which makes it even harder to get the truth of what you're doing to the people you want to work with. And even if the discrepancy between your public and private versions doesn't appear in some embarrassing form on the internet, you're going to tend to attract sneaky people and repel candid people, and this will probably make it harder to have an organization which does what you want.

The fact that Michael Vassar is willing to advocate "increased comfort with lying" in a public forum suggests to me that we are not talking about a literal Secret a la intelligence work, but something more along the lines of little white lies like "You're looking good today" where the listener as well as the speaker knows to apply a discounting factor. I might be willing to tolerate that in people I associate with - in fact, I do so all the time - so long as the overall system is one where it's okay if I give only true answers when I'm questioned myself.

However, the fact that Michael Vassar can't think of a better word than "lie" for this, for the sake of PR purposes, suggests to me that he's not going to be very good at shading the truth - that he's still trying to approach things the nerd way. Non-nerds lie easily and they'd never think of calling the process "increased comfort with lying", either - at least I've never read a non-nerd using those words outright, whatever it is they're actually advocating. But now I'm getting into the details of our current strategic debates, which isn't really on-topic for this post.

I should have read Michael Vassar's original post in this thread more carefully.

I suspect that people's fear of becoming more rational has at least as much to do with the perceived consequences of being more honest with themselves about what they're doing as it does with the fear of having to tell the truth to other people.

Michael, I thought that you advocated comfort with lying because smart people marginalize themselves by compulsive truth-telling. For instance, they find it hard to raise venture capital. Or (to take an example that happened at my company), when asked "Couldn't this project of yours be used to make a horrible terrorist bioweapon?", they say, "Yes." (And they interpret questions literally instead of practically; e.g., the question actually intended, and that people actually hear, is more like, "Would this project significantly increase the ease of making a bioweapon?", which might have a different answer.)

Am I compulsively telling the truth again? Doggone it.

Is it just me, or did Wright's writing style sound very much like Eliezer's?

Surely the problem with the clipping isn't the loaded gun or the stern stoicism - it's the daft Prime Directive. Of course you should edit someone back to sanity, by force if necessary. I could play rhetorical tricks and argue from incapacity, but I won't even do that. Saving people is just obviously the right thing to do.

@James Miller:

What justifies the right of your past self to exert coercive control over your future self? There may be overlap of interests, which is one of the typical de facto criteria for coercive intervention; but can your past self have an epistemic vantage point over your future self?

Can you write a contract saying that if your future self ever converts away from Christianity, the Church has the right to convert you back? Can you write a contract saying that your mind is to be overwritten with an approximation of Richard Dawkins who will then be tortured in hell forever for his sins?

If you constrain the contracts that can be written, then clearly you have an idea of good or bad mindstates apart from the raw contract law, and someone is bound to ask why you don't outlaw the bad mindstates directly.

If children under 75 don't need Werewolf Contracts, why should children under 750?

Phaethon, in the story, refuses to sign a Werewolf Contract out of pride, just like his father. You could laugh and call him an idiot. Personally, I think that (a) many people are at least that stupid, at least right now and (b) it's cruel to inflict horrific punishments on people for no greater idiocy than that. But at any rate, why force Phaethon to sacrifice his pride, by putting him in that environment? Why make him give up on his dream of adulthood? Why force everyone to take a cautious non-heroic approach to life or else risk a fate worse than death? Phaethon is being harmed by the extra options offered him, one way or another.

Peter: if your change of utility functions is of domain rather than degree, you can't calculate the negative utility. The difference in utility between making 25 paperclips a day and 500 a day is a calculable difference for a paperclip-maximizing optimization process.

However, if the paperclip optimizer self-modifies and inadvertently changes his utility function to maximizing staples... well, you can't calculate paperclips in terms of staples. This outcome is of infinite negative utility from the perspective of the paperclip maximizer. And vice versa: once the utility function has changed to maximizing staples, it would be of infinite negative utility, from the perspective of the staple maximizer, to change back to paperclips.

This defeats the built-in time-out clause. With a modification that only affects your ability to reach your current utility, you have a measurable output. With a change that changes your utility, you are changing the very thing you were using to measure success by.

I know that this isn't worded very well. I'm sure one of Eliezer's posts has done this subject better at some point.

Eliezer-

“What justifies the right of your past self to exert coercive control over your future self? There may be overlap of interests, which is one of the typical de facto criteria for coercive intervention; but can your past self have an epistemic vantage point over your future self?”

In general I agree. But werewolf contracts protect against temporary lapses in rationality. My level of rationality varies. Even assuming that I remain in good health for eternity there will almost certainly exist some hour in the future in which my rationality is much lower than it is today. My current self, therefore, will almost certainly have an “epistemic vantage point over [at least a small part of my] future self.” Given that I could cause great harm to myself in a very short period of time I am willing to significantly reduce my freedom in return for protecting myself against future temporary irrationality.

Having my past self exert coercive control of my future self will reduce my future information costs. For example, when you download something from the web you must often agree to a long list of conditions. Under current law, if these terms and conditions included something like “you must give Microsoft all of your wealth”, the term wouldn’t be enforced. If the law did enforce such terms, then you would have to spend a lot of time examining the terms of everything you agreed to. You would be much better off if your past self prevented your current self from giving away too much in the fine print of agreements.

“If you constrain the contracts that can be written, then clearly you have an idea of good or bad mindstates apart from the raw contract law, and someone is bound to ask why you don't outlaw the bad mindstates directly.”

The set of possible future mindstates / world state combinations is very large. It’s too difficult to figure out in advance which combinations are bad. It’s much more practical to sign a Werewolf contract which gives your guardian the ability to look at the mindstate / worldstate you are in and then decide if you should be forced to move to a different mindstate.

“why force Phaethon to sacrifice his pride, by putting him in that environment?”

Phaethon placed greater weight on freedom than pride and your type of paternalism would reduce his freedom.

But in general I agree that if most humans alive today were put in the Golden Age world then many would do great harm to themselves, and in such a world I would prefer that the Sophotechs exercise some paternalism. But if such paternalism didn’t exist, then Werewolf contracts would greatly reduce the type of harm you refer to.

nazgulnarsil, I think you're confused about what a utility function is. "Maximizing paperclips" or "maximizing staples" are not utility functions, although they may describe the actions carried out by an expected utility maximizer. Try reading the Wikipedia article on expected utility.

When something is particularly dangerous or potentially destructive you must not be allowed to have a textbook telling you the safe way to implement it. Instead, you should discover such destructive powers by your own (relatively) weak skills and, presumably, trial and error. You are not permitted to learn from other people's mistakes. You must make them yourself.

I'm not feeling safe yet. Am I at least allowed to see casualty statistics for exploring particular fields of prohibited study? Perhaps a graph of success rate vs IQ and time spent in background research?

Cameron, I suppose that's a fair enough comment. I'm used to the way things work in AI, where the naive simply fail utterly and completely to accomplish anything whatsoever, rather than hurting anyone else or themselves, and you have to get pretty far to get beyond that to the realm of dangers.

Not to mention, I'm used to the idiom that the lack of any prior feedback or any second try is what makes something an adult problem - but that really is representative of the dangers faced by someone able to modify their own brain circuitry. If there's an AI that can say "No" but you're allowed to ignore the "No" then one mistake is fatal. The sort of people who think, "Gee, I'll just run myself with the modification for a while and see what happens 'cuz I can always go back" - they might ignore a "No" based on their concept of "testing", and then that would be the end of them.

You want to put the dangerous things behind a challenge/lock such that by the time you pass it you know how dangerous they really are. "Make an AI", unfortunately, may not be quite strong enough as a case of this, but "Make an AI without anyone helping you on a 100MHz computer with 1GB of RAM and 100GB of disk space" is probably strong enough.

Why not just make it so you have to take time to make a choice? If you have to spend 10% of your life choosing something, or even ten continuous hours without changing your mind back during any of it, it will take a lot more than a moment of weakness to make a big mistake.
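The waiting-period idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `CoolingOffChoice`, its methods, and the ten-hour default are all hypothetical, and time is modeled as a plain number of hours rather than a real clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoolingOffChoice:
    """Hypothetical sketch: a choice takes effect only after it has been
    held continuously for `required_hours`."""

    required_hours: float = 10.0
    committed_since: Optional[float] = None  # hour the current commitment began

    def commit(self, now: float) -> None:
        # Start the clock only if there is no commitment already running.
        if self.committed_since is None:
            self.committed_since = now

    def change_mind(self) -> None:
        # Any reversal, however brief, resets the clock.
        self.committed_since = None

    def may_execute(self, now: float) -> bool:
        # Allowed only after an unbroken commitment of the required length.
        if self.committed_since is None:
            return False
        return now - self.committed_since >= self.required_hours


# Usage: a moment of doubt resets the waiting period.
choice = CoolingOffChoice(required_hours=10.0)
choice.commit(now=0.0)
choice.change_mind()        # doubt somewhere before hour 6
choice.commit(now=6.0)      # recommit at hour 6
early = choice.may_execute(now=12.0)  # only 6 continuous hours so far
late = choice.may_execute(now=16.0)   # 10 continuous hours reached
```

The point of resetting on every reversal is that a single rash moment never suffices; whether ten hours is long enough is a separate question the sketch does not answer.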

I'm aware that I'm some three years late on this, but I can't help but disagree with you here. I'm all for having default safeguards on our Really Powerful Optimization Process to prevent people from condemning themselves to eternal hell or committing suicide on a whim - maybe something along the lines of the doctor's Do No Harm. You could phrase it, if not formalize it, something like:

'If a decision will predictably lead to consequences horrifying to those it affects, the system will refuse to help - if you wish to self destruct, you must do so with your own strength.'

Beyond that, though -- the advantage to Libertarianism is that you can implement any other system you want in it. If you want the AI to remove low-value choices from your environment, you are welcome to instruct it to do that. If you want it to prohibit you, in the future, from subtler methods of self-destruction, you are free to do that too. These options were available to Phaethon in the series, and he refused to take them -- because, frankly, the protagonist of those novels was an idiot.

It is certainly true that human choice is a fragile and whimsical thing -- but that doesn't mean that it has no value. If people are made unhappy by the default behavior of the system, they have the choice to change it, at least where it affects them. It feels wrong to me for you to suggest that we prohibit people from judging otherwise, forever, as part of the design for our superintelligences. Our lives have always been lived on the precipice of disaster, and we have always been given choices that limit those risks.

Just to make sure I understand the system you're proposing: suppose there's a Do No Harm rule like the one you propose, and I tell the AI to give me the option of "subtler methods of self-destruction" and the AI predicts that giving me that option is likely to lead to consequences that horrify someone it affects (or some more formal version of that condition).

In that case, the AI refuses to give me that option. Right?

If so, can you clarify how that is different from the OP's proposed behavior in this case?

I should have clarified: I meant horrifying in a pretty extreme sense. Like, telling the machine to torture you forever, or destroy you completely, or remove your sense of boredom.

Just doing something that, say, alienates all your friends wouldn't qualify. Or loses all your money, if money is still a thing that makes sense. I was also including all the things that you CAN do with your own strength but probably shouldn't. Building a machine to torture your upload forever wouldn't be disallowed, but you might want to prohibit the system from letting you.

I meant the 'Do No Harm' rule to be a bare-minimal safeguard against producing a system with net negative utility because a small minority manage to put themselves into infinitely negative utility situations. Not to be a general-class 'the system knows what is best' measure, which is what it sounded to me like EY was proposing. Now, in his defense, this is probably, in the context of strong AI, a discussion of what the CEV of humanity might end up choosing wisely, but I don't like it.

I don't know that I agree with the OP's proposed basis for distinction, but I at least have a reasonable feel for what it would preclude. (I would even agree that, given clients substantially like modern-day humans, precluding that stuff is reasonably ethical. That said, the notion that a system on the scale the OP is discussing would have clients substantially like modern-day humans and relate to them in a fashion substantially like the fictional example given strikes me as incomprehensibly absurd.)

I don't quite understand the basis for distinction you're suggesting instead. I mean, I understand the specific examples you're listing for exclusion, of course (eternal torture, lack of boredom, complete destruction), but not what they have in common or how I might determine whether, for example, choosing to be eternally alienated from friendship should be allowed or disallowed. Is that sufficiently horrifying? How could one tell?

I do understand that you don't mean the system to prevent, say, my complete self-destruction as long as I can build the tools to destroy myself without the system's assistance. The OP might agree with you about that, I'm not exactly sure. I suspect I disagree, personally, though I admit it's a tricky enough question that a lot depends on how I frame it.

I know someone who drives across the country on long trips, rather than flying. Air travel scares him. Statistics, naturally, show that flying a given distance is much safer than driving it. But some people fear too much the loss of control that comes from not having their own hands on the steering wheel. It's a common complaint.

If that's their true rejection, they should be scared of getting into a car driven by someone else too.

When comparing travel safety you shouldn't compare those statistics directly. If, when traveling by car, you don't accept any drivers who are suicidal, on drugs (including alcohol), falling asleep, or wannabe racing drivers, your chance of an accident goes to ~10% of the figure used in those statistics.
