Less Wrong is a community blog devoted to refining the art of human rationality.

That Tiny Note of Discord

Post author: Eliezer_Yudkowsky 23 September 2008 06:02AM 16 points

Followup to: The Sheer Folly of Callow Youth

When we last left Eliezer1997, he believed that any superintelligence would automatically do what was "right", and indeed would understand that better than we could; even though, he modestly confessed, he did not understand the ultimate nature of morality.  Or rather, after some debate had passed, Eliezer1997 had evolved an elaborate argument, which he fondly claimed to be "formal", that we could always condition upon the belief that life has meaning; and so cases where superintelligences did not feel compelled to do anything in particular would fall out of consideration.  (The flaw being the unconsidered and unjustified equation of "universally compelling argument" with "right".)

So far, the young Eliezer is well on the way toward joining the "smart people who are stupid because they're skilled at defending beliefs they arrived at for unskilled reasons".  All his dedication to "rationality" has not saved him from this mistake, and you might be tempted to conclude that it is useless to strive for rationality.

But while many people dig holes for themselves, not everyone succeeds in clawing their way back out.

And from this I learn my lesson:  That it all began—

—with a small, small question; a single discordant note; one tiny lonely thought...

As our story starts, we advance three years to Eliezer2000, who in most respects resembles his self of 1997.  He currently thinks he's proven that building a superintelligence is the right thing to do if there is any right thing at all.  From which it follows that there is no justifiable conflict of interest over the Singularity, among the peoples and persons of Earth.

This is an important conclusion for Eliezer2000, because he finds the notion of fighting over the Singularity to be unbearably stupid.  (Sort of like the notion of God intervening in fights between tribes of bickering barbarians, only in reverse.)  Eliezer2000's self-concept does not permit him—he doesn't even want—to shrug and say, "Well, my side got here first, so we're going to seize the banana before anyone else gets it."  It's a thought too painful to think.

And yet then the notion occurs to him:

Maybe some people would prefer an AI do particular things, such as not kill them, even if life is meaningless?

His immediately following thought is the obvious one, given his premises:

In the event that life is meaningless, nothing is the "right" thing to do; therefore it wouldn't be particularly right to respect people's preferences in this event.

This is the obvious dodge.  The thing is, though, Eliezer2000 doesn't think of himself as a villain.  He doesn't go around saying, "What bullets shall I dodge today?"  He thinks of himself as a dutiful rationalist who tenaciously follows lines of inquiry.  Later, he's going to look back and see a whole lot of inquiries that his mind somehow managed to not follow—but that's not his current self-concept. 

So Eliezer2000 doesn't just grab the obvious out.  He keeps thinking.

But if people believe they have preferences in the event that life is meaningless, then they have a motive to dispute my Singularity project and go with a project that respects their wish in the event life is meaningless.  This creates a present conflict of interest over the Singularity, and prevents right things from getting done in the mainline event that life is meaningful.

Now, there are a lot of excuses Eliezer2000 could have potentially used to toss this problem out the window.  I know, because I've heard plenty of excuses for dismissing Friendly AI.  "The problem is too hard to solve" is one I get from AGI wannabes who imagine themselves smart enough to create true Artificial Intelligence, but not smart enough to solve a really difficult problem like Friendly AI.  Or "worrying about this possibility would be a poor use of resources, what with the incredible urgency of creating AI before humanity wipes itself out—you've got to go with what you have", this being uttered by people who just basically aren't interested in the problem.

But Eliezer2000 is a perfectionist.  He's not perfect, obviously, and he doesn't attach as much importance as I do to the virtue of precision, but he is most certainly a perfectionist. The idea of metaethics that Eliezer2000 espouses, in which superintelligences know what's right better than we do, previously seemed to wrap up all the problems of justice and morality in an airtight wrapper.

The new objection seems to poke a minor hole in the airtight wrapper.  This is worth patching.  If you have something that's perfect, are you really going to let one little possibility compromise it?

So Eliezer2000 doesn't even want to drop the issue; he wants to patch the problem and restore perfection.  How can he justify spending the time?  By thinking thoughts like:

What about Brian Atkins?  [Brian Atkins being the startup funder of the Singularity Institute.]  He would probably prefer not to die, even if life were meaningless.  He's paying for the Singularity Institute right now; I don't want to taint the ethics of our cooperation.

Eliezer2000's sentiment doesn't translate very well—English doesn't have a simple description for it, and neither does any other culture I know of.  Maybe the passage in the Old Testament, "Thou shalt not boil a young goat in its mother's milk".  Someone who helps you out of altruism shouldn't regret helping you; you owe them not so much fealty as the assurance that they're actually doing what they think they're doing by helping you.

Well, but how would Brian Atkins find out, if I don't tell him?  Eliezer2000 doesn't even think this except in quotation marks, as the obvious thought that a villain would think in the same situation.  And Eliezer2000 has a standard counter-thought ready too, a ward against temptations to dishonesty—an argument that justifies honesty in terms of expected utility, not just a personal love of personal virtue:

Human beings aren't perfect deceivers; it's likely that I'll be found out.  Or what if genuine lie detectors are invented before the Singularity, sometime over the next thirty years?  I wouldn't be able to pass a lie detector test.

Eliezer2000 lives by the rule that you should always be ready to have your thoughts broadcast to the whole world at any time, without embarrassment.  Otherwise, clearly, you've fallen from grace: either you're thinking something you shouldn't be thinking, or you're embarrassed by something that shouldn't embarrass you.

(These days, I don't espouse quite such an extreme viewpoint, mostly for reasons of Fun Theory.  I see a role for continued social competition between intelligent life-forms, at least as far as my near-term vision stretches.  I admit, these days, that it might be all right for human beings to have a self; as John McCarthy put it, "If everyone were to live for others all the time, life would be like a procession of ants following each other around in a circle."  If you're going to have a self, you may as well have secrets, and maybe even conspiracies.  But I do still try to abide by the principle of being able to pass a future lie detector test, with anyone else who's also willing to go under the lie detector, if the topic is a professional one.  Fun Theory needs a commonsense exception for global catastrophic risk management.)

Even taking honesty for granted, there are other excuses Eliezer2000 could use to flush the question down the toilet.  "The world doesn't have the time" or "It's unsolvable" would still work.  But Eliezer2000 doesn't know that this problem, the "backup" morality problem, is going to be particularly difficult or time-consuming.  He's just now thought of the whole issue.

And so Eliezer2000 begins to really consider the question:  Supposing that "life is meaningless" (that superintelligences don't produce their own motivations from pure logic), then how would you go about specifying a fallback morality?  Synthesizing it, inscribing it into the AI?

There's a lot that Eliezer2000 doesn't know, at this point.  But he has been thinking about self-improving AI for three years, and he's been a Traditional Rationalist for longer than that.  There are techniques of rationality that he has practiced, methodological safeguards he's already devised.  He already knows better than to think that all an AI needs is the One Great Moral Principle.  Eliezer2000 already knows that it is wiser to think technologically than politically.  He already knows the saying that AI programmers are supposed to think in code, to use concepts that can be inscribed in a computer.  Eliezer2000 already has a concept that there is something called "technical thinking" and it is good, though he hasn't yet formulated a Bayesian view of it. And he's long since noticed that suggestively named LISP tokens don't really mean anything, etcetera.  These injunctions prevent him from falling into some of the initial traps, the ones that I've seen consume other novices on their own first steps into the Friendly AI problem... though technically this was my second step; I well and truly failed on my first.

But in the end, what it comes down to is this:  For the first time, Eliezer2000 is trying to think technically about inscribing a morality into an AI, without the escape-hatch of the mysterious essence of rightness.

That's the only thing that matters, in the end.  His previous philosophizing wasn't enough to force his brain to confront the details.  This new standard is strict enough to require actual work.  Morality slowly starts being less mysterious to him—Eliezer2000 is starting to think inside the black box.

His reasons for pursuing this course of action—those don't matter at all.

Oh, there's a lesson in his being a perfectionist.  There's a lesson in the part about how Eliezer2000 initially thought this was a tiny flaw, and could have dismissed it out-of-mind if that had been his impulse.

But in the end, the chain of cause and effect goes like this:  Eliezer2000 investigated in more detail, therefore he got better with practice.  Actions screen off justifications.  If your arguments happen to justify not working things out in detail, like Eliezer1996, then you won't get good at thinking about the problem.  If your arguments call for you to work things out in detail, then you have an opportunity to start accumulating expertise.

That was the only choice that mattered, in the end—not the reasons for doing anything.

I say all this, as you may well guess, because of the AI wannabes I sometimes run into, who have their own clever reasons for not thinking about the Friendly AI problem.  Our clever reasons for doing what we do tend to matter a lot less to Nature than they do to ourselves and our friends.  If your actions don't look good when they're stripped of all their justifications and presented as mere brute facts... then maybe you should re-examine them.

A diligent effort won't always save a person.  There is such a thing as lack of ability.  Even so, if you don't try, or don't try hard enough, you don't get a chance to sit down at the high-stakes table—never mind the ability ante.  That's cause and effect for you.

Also, perfectionism really matters.  The end of the world doesn't always come with trumpets and thunder and the highest priority in your inbox.  Sometimes the shattering truth first presents itself to you as a small, small question; a single discordant note; one tiny lonely thought, that you could dismiss with one easy effortless touch...

...and so, over succeeding years, understanding begins to dawn on that past Eliezer, slowly.  That sun rose slower than it could have risen.  To be continued.


Part of the sequence Yudkowsky's Coming of Age

Next post: "Fighting a Rearguard Action Against the Truth"

Previous post: "The Sheer Folly of Callow Youth"

Comments (33)

Comment author: Doug_S. 23 September 2008 06:10:27AM 10 points

I'm pretty confident that I have some very stupid beliefs. The problem is that I don't know what they are.

Comment author: Valter 23 September 2008 09:16:00AM 2 points

It seems that Eliezer1997 thought that there is exactly ONE "meaning of life", valid for all intelligent beings at all times and without any conflicts of interest.

It does not seem a very intuitive belief (except for very religious types and Eliezer1997 was not one of those), so what was its justification?

Comment author: Ben_Jones 23 September 2008 10:43:27AM 2 points

Must have been pretty distressing when that inherent universal rightness started looking less and less...inherent. To the point that it wasn't there at all. Kind of like standing on solid ground that slowly reveals itself to be the edge of a precipice.

My inner Hanson asks whether you can vividly remember that youthful sense of being absolutely, ineluctably right and correct in your assertions about things like this. It sounds as though maybe you can't - particularly when you talk about yourself in the third person.

Maybe you should embrace the fact that the maker of all these mistakes was in fact you, and not some strange entity distant in time and intellect. The consequence of not doing so would seem to be overconfidence in the positions that now seem so obviously correct.

What price further enlightenments?

Comment author: Eric5 23 September 2008 05:15:30PM 1 point

Ben,

Maybe you should embrace the fact that the maker of all these mistakes was in fact you, and not some strange entity distant in time and intellect. The consequence of not doing so would seem to be overconfidence in the positions that now seem so obviously correct.

It seems to me that there's a difference here between looking back on opinions which you now disagree with, and looking back on methodologies which you now see as unreliable. Yes, I was as confident in the past about my religiosity as I am now about my rationality, and yes I look back on that time as though it were a completely different person (which, though there is a continuous progression from that person to the one I am now, is very true in a sense). But that's because I can see how prior methodologies led me to wrong conclusions, not just that I disagree with conclusions I once agreed with. The phrase occurs to me: "The wheel of science turns, but it doesn't turn backward." This seems to be a similar situation. (I wish I had slept better last night so my brain wasn't so foggy and I could elucidate better, but hopefully the concept comes through.)

Comment author: Lara_Foster2 23 September 2008 05:34:26PM 0 points

I second Valter and Ben. It's hard for me to grasp that you actually believed there was any meaning to life at all, let alone with high confidence. Any ideas on where that came from? The thought, "But what if life is meaningless?" hardly seems like a "Tiny Note of Discord," but like a huge epiphany in my book. I was not raised with any religion (well, some atheist-communism, but still), and so never thought there was any meaning to life to begin with. I don't think this ever bothered me 'til I was 13 and recognized the concept of determinism, but that's another issue. Still- why would someone who believed that we're all just information-copying-optimization matter think there was any meaning to begin with?

Comment author: Lara_Foster2 23 September 2008 05:51:13PM 7 points

Actually, I CANNOT grasp what life being 'meaningful' well... means. Meaningful to what? To the universe? That only makes sense if you believe there is some objective judge of what state of the universe is best. And then, why should *we* care? Cuz we should? HUH? Meaningful to us? Well yes- we want things...Did you think that there was one thing all people wanted? Why would you think that necessary to evolution? What on earth did you think 'meaning' could be?

Comment author: Tom_McCabe2 23 September 2008 05:57:05PM 2 points

"Eliezer2000 lives by the rule that you should always be ready to have your thoughts broadcast to the whole world at any time, without embarrassment."

I can understand most of the paths you followed during your youth, but I don't really get this. Even if it's a good idea for Eliezer_2000 to broadcast everything, wouldn't it be stupid for Eliezer_1200, who just discovered scientific materialism, to broadcast everything?

"If everyone were to live for others all the time, life would be like a procession of ants following each other around in a circle."

For a more mathematical version of this, see http://www.acceleratingfuture.com/tom/?p=99.

"It does not seem a very intuitive belief (except for very religious types and Eliezer1997 was not one of those), so what was its justification?"

WARNING: Eliezer-1999 content.

http://yudkowsky.net/tmol-faq/tmol-faq.html

"Even so, if you don't try, or don't try hard enough, you don't get a chance to sit down at the high-stakes table - never mind the ability ante."

Are you referring to external exclusion of people who don't try, or self-exclusion?

Comment author: Valter 23 September 2008 06:04:59PM 1 point

Lara: I believe that Eliezer1997 did conceive of the possibility that life has no meaning (apparently equated with a constant utility function for everyone?); my question was more along the lines of "why did he think there is only ONE meaning?"

After all, even classical candidates for "meaning of life" really imply different goals - e.g., "happiness" (or power or survival, etc.) could be MY happiness or YOUR happiness or the happiness of my future self, etc. and these "meanings" may well be mutually incompatible goals.

Comment author: Zubon 23 September 2008 07:02:47PM 2 points

I took "meaning" in the sense that Lara Foster and Valter are discussing to be an example of the mind projection fallacy. As Lara says, "meaningful to x" is coherent, but just "meaningful" is like "taller" without "taller than x."

See also St. Anselm's ontological argument, which assumes that "to be conceived of" is a property of the thing being conceived.

Comment author: Eliezer_Yudkowsky 23 September 2008 07:12:40PM 4 points

Valter: Mind projection fallacy. It seemed like right actions had an inherent indescribable rightness about them, and that this was just a fact like any other fact. Eliezer_1999 didn't think a human could unravel that mystery and so he didn't try - that is, he felt the same way about the ineffable rightness of right, as many philosophers talk about qualia and the indescribable redness of red. Those philosophers talking mysteriously helped legitimize the mistake for him, unfortunately. But see also the previous posts in this thread.

The truth is only one, but there can be a thousand mistakes, and so different people's mistakes need not seem compelling to each other.

Tom, the rule is not that broadcasting your thoughts shouldn't offend anyone, but that it should give no justifiable complaint against you.

Ben:

My inner Hanson asks whether you can vividly remember that youthful sense of being absolutely, ineluctably right and correct in your assertions about things like this. It sounds as though maybe you can't - particularly when you talk about yourself in the third person.

I'm not sure I did have such a youthful sense. I think I was adopting an attitude of sober scientific modesty to myself, and then nonetheless I would quietly talk of "proven beyond a reasonable doubt", and go on and make a mistake. I have seen the same behavior many times in others, who may not shout like classic teenagers, but this only amounts to their not admitting to themselves that they have in effect staked their life on a single possibility - so that they cannot even see their own degree of confidence. So the lesson I learn is not to congratulate myself on humility unless I really do have doubts and not just dutiful doubts, and I am preparing for those doubts and have fallback plans for them... this is covered under "The Proper Use of Doubt" and "The Proper Use of Humility", because how not to fake doubt and humility to yourself is a long discussion.

But don't think that any Eliezer was ever a classic teenager. He knew what a teenager was from the beginning, and avoided it. But to avoid the usual mistakes takes only a warning plus a relatively small amount of ability. This just leads to more original and creative mistakes that you weren't warned against, unless you hit a very high standard of precision indeed.

Of course you are correct that memory can't be trusted; this is demonstrated in many experiments. Memory fades, and then is recreated anew with each recollection. It seems that since I don't trust my memory, I don't remember a lot of the things that other people claim to remember. But if the memories that I can recall seem like foreign things now, that does seem to justify some degree of speaking in the third person - though my past memories to this self are a strange mix of total familiarity and alienness.

To move forward, you have to strike a balance between dismissing your past self out of hand - "Oh, teenagers will be teenagers, but I'm an adult now," said by someone who still falls prey to a different sort of peer pressure - versus identifying with your past self to a degree that averts Tsuyoku Naritai; to the point where it becomes a mere confession of sins, and not your determination to become a different person who sins less. You want to be able to declare yourself a different and improved person, but only after meeting some standard that forces you to put in genuine work on improvement.

One of the lessons here is that doing difficult things is largely about holding yourself to a high enough standard to force yourself to do real work on them; this is usually much higher than what we would instinctively take to ourselves as proof of having made an effort, since almost any effort will do for that.

Comment author: Phil_Goetz5 23 September 2008 09:42:42PM 1 point

It sounds to me like this is leading towards collective extrapolated volition, and that you are presenting it as "patching" your previous set of beliefs so as to avoid catastrophic results in case life is meaningless.

It's not a patch. It's throwing out the possibility that life is not meaningless. Or, at least, it now opens up a big security hole for a set of new paths to catastrophe.

Approach 1: Try to understand morality. Try to design a system to be moral, or design a space for that system in which the gradient of evolution is similar to the gradient for morality.

Approach 2: CEV.

If there is some objective aspect to morality - perhaps not a specific morality, but let us say there are meta-ethics, rules that let us evaluate moral systems - then approach 1 can optimize above and beyond human morality.

Approach 2 can optimize accomplishment of our top-level goals, but can't further-optimize the top-level goals. It freezes-in any existing moral flaws at that level forever (such flaws do exist if there is an objective aspect to morality). Depending on the nature of the search space, it may inevitably lead to moral collapse (if we are at some point in moral space that has been chosen by adaptive processes that keep that point near some "ideal" manifold, and trajectories followed through moral space via CEV diverge from that manifold).

Comment author: Carl_Shulman 23 September 2008 09:52:01PM 2 points

Phil,

If Approach 2 fails to achieve the aims of Approach 1, then humanity generally wouldn't want to pursue Approach 1 regardless. Are you asserting that your audience would tend to diverge from the rest of humanity if extrapolated, in the direction of Approach 1?

Comment author: Eliezer_Yudkowsky 23 September 2008 10:03:55PM 2 points

As far as I can tell, Phil Goetz is still pursuing a mysterious essence of rightness - something that could be right, when the whole human species has the wrong rule of meta-morals.

Comment author: Lara_Foster2 23 September 2008 10:48:28PM 0 points

What I think is a far more likely scenario than missing out on the mysterious essence of rightness by indulging the collective human id is that what 'humans' want as a compiled whole is not what we'll want as individuals. Phil might be aesthetically pleased by a coherent metamorality, and distressed if the CEV determines what most people want is puppies, sex, and crack. Remember that the percentage of the population that actually engages in debates over moral philosophy is vanishingly small, and everyone else just acts, frequently incoherently.

Comment author: Ben_Jones 24 September 2008 09:47:00AM 0 points

To move forward, you have to strike a balance between dismissing your past self out of hand [...] versus identifying with your past self to a degree that averts Tsuyoku Naritai

Nail, head. Well said sir.

I think that if there's anything one should strive to remember from youth, it's just how easy it is to accept assumption as fact. How sure was I that God was in the sky looking down at me? As sure as it's possible to be. And I try to remember that yes, it was me who believed that. Keeps me honest about the likelihood of my current beliefs.

Comment author: Richard_Hollerith2 24 September 2008 06:17:56PM 0 points

If Eliezer had not abandoned the metaethics he adopted in 1997 or so by the course described in this blog entry, he might have abandoned it later in the design of the seed AI when it became clear to him that the designer of the seed must choose the criterion the AI will use to recognize objective morality when it finds it. In other words, there is no way to program a search for objective morality or for any other search target without the programmer specifying or defining what constitutes a successful conclusion of the search.

The reason a human seems to be able to search for things without being able to define clearly at the start of the search what he or she is searching for is that humans have preferences and criteria that no one can articulate fully. Well, the reader might be thinking, why not design the AI so that it, too, has criteria that no one can articulate? My answer has 2 parts: one part explains that CEV is not such an unarticulatable design; the other part asserts that any truly unarticulatable design would be irresponsible.

Although it is true that no one currently in existence can articulate the volition of the humans, it is possible for some of us to specify or define with enough precision and formality what the volition of the humans is and how the AI should extrapolate it. In turn, a superintelligent AI in possession of such a definition can articulate the volition of the humans.

The point is that although it is a technically and scientifically challenging problem, it is not outside the realm of current human capability to define what is meant by the phrase "coherent extrapolated volition" in sufficient precision, reliability and formality to bet the outcome of the intelligence explosion on it.

Like I said, humans have systems of value and systems of goals that no one can articulate. The only thing that keeps it from being completely unethical to rely on humans for any important purpose is that we have no alternative means of achieving the important purpose. In contrast, it is possible to design a seed AI whose goal system is "articulatable", which means that some human or some team or community of humans can understand it utterly, the way that some humans can currently understand relativity theory utterly. An agent with an articulatable goal system is vastly preferable to the alternative because it is vastly desirable for the designer of the agent to do his or her best in choosing the optimization target of the agent, and choosing an unarticulatable goal system is simply throwing away that ability to choose -- leaving the choice up to "chance".

To switch briefly to a personal note, when I found Eliezer's writings in 2001 his home page still linked to his explanation of the metaethics he adopted in 1997 or so, which happened to coincide with my metaethics at the time (which coincidence made me say to myself, "What a wonderful young man!"). I can, for example, recall using the argument Eliezer gives below in a discussion of ethics with my roommate in 1994:

In the event that life is meaningless, nothing is the "right" thing to do; therefore it wouldn't be particularly right to respect people's preferences in this event.

Anyway, I have presented the argument against the metaethics to which Eliezer and I used to subscribe that I find the most persuasive.

Comment author: Eliezer_Yudkowsky 24 September 2008 07:14:36PM 2 points

In other words, there is no way to program a search for objective morality or for any other search target without the programmer specifying or defining what constitutes a successful conclusion of the search.

If you understand this, then I am wholly at a loss to understand why you think an AI should have "universal" goals or a goal system zero or whatever it is you're calling it.

Comment author: michael_vassar3 24 September 2008 09:13:20PM 0 points

I think that Hollerith thinks that Omohundro's "Basic AI Drives" *are* G. E. Moore's "good".

Comment author: Phil_Goetz5 24 September 2008 09:54:16PM 1 point

Eliezer says:

As far as I can tell, Phil Goetz is still pursuing a mysterious essence of rightness - something that could be right, when the whole human species has the wrong rule of meta-morals.

Eliezer,

I have made this point twice now, and you've failed to comprehend it either time, and you're smart enough to comprehend it, so I conclude that you are overconfident. :)

The human species does not consciously have any rule of meta-morals. Neither do they consciously follow rules to evolve in a certain direction. Evolution happens because the system dynamics cause them to happen. There is a certain subspace of possible (say) genomes that is, by some objective measures, "good".

Likewise, human morality may have evolved in ways that are "good", without humans knowing how that happened. I'm not going to try to figure out here what "good" might mean; but I believe the analogy I'm about to make is strong enough that you should admit this as a possibility. And if you don't, you must admit (which you haven't) my accusation that CEV is abandoning the possibility that there is such a thing as "good".

(And if you don't admit any possibility that there is such a thing as goodness, you should close up shop, go home, and let the paperclipping AIs take over.)

If we seize control over our physical and moral evolution, we'd damn well better understand what we're replacing. CEV means replacing evolution with a system whereby people vote on what feature they'd like to evolve next.

I know you can understand this next part, so I'm hoping to hear some evidence of comprehension from you, or some point on which you disagree:

  • Dynamic systems can be described by trajectories through a state space. Suppose you take a snapshot of a bunch of particles traveling along these trajectories. For some open systems, the entropy of the set of particles can decrease over time. (You might instead say that, for the complete closed system, the entropy of the projection of a set of particles onto a manifold of its space can decrease. I'm not sure this is equivalent, but my instinct is that it is.) I will call these systems "interesting".

  • For a dynamic system to be interesting, it must have dimensions or manifolds in its space along which trajectories contract; in a bounded state space, this means that trajectories will end at a point, or in a cycle, or in a chaotic attractor.

  • We desire, as a rule of meta-ethics, for humanity to evolve according to rules that are interesting, in the sense just described. This is equivalent to saying that the complexity of humanity/society, by some measure, should increase. (Agree? I assume you are familiar enough with complex adaptive systems that I don't need to justify this.)

  • A system can be interesting only if there is some dynamic causing these attractors. In evolution, this dynamic is natural selection. Most trajectories for an organism's genome, without selection, would lead off of the manifold in which that genome builds a viable creature. Without selection, mutation would simply increase the entropy of the genome. Natural selection is a force pushing these trajectories back towards the "good" manifold.

  • CEV proposes to replace natural selection with (trans)human supervision. You want to do this even though you don't know what the manifold for "good" moralities is, nor what aspects of evolution have kept us near that manifold in the past. The only way you can NOT expect this to be utterly disastrous, is if you are COMPLETELY CERTAIN that morality is arbitrary, and there is no such manifold.
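As a toy illustration of the third and fourth bullets, here is a minimal sketch under strong simplifying assumptions; it is not anything Phil proposed. Each "genome" is treated as a single real number, mutation is a Gaussian step, and selection is truncation toward a hypothetical viable point at 0. The helper names and parameters (`bin_width`, `sigma`, `viable`) are made up for illustration. Mutation alone spreads the population out and raises its coarse-grained entropy; adding selection keeps the population pinned near the attractor.

```python
import math
import random
from collections import Counter


def histogram_entropy(xs, bin_width=0.5):
    """Shannon entropy (bits) of particle positions, coarse-grained into fixed-width bins."""
    counts = Counter(math.floor(x / bin_width) for x in xs)
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutate(xs, rng, sigma=0.5):
    """Random drift: each particle takes an independent Gaussian step (hypothetical sigma)."""
    return [x + rng.gauss(0.0, sigma) for x in xs]


def select(xs, viable=0.0):
    """Truncation selection: keep the half closest to the (hypothetical) viable point, then duplicate it."""
    survivors = sorted(xs, key=lambda x: abs(x - viable))[: len(xs) // 2]
    return survivors * 2


def simulate(steps=50, n=1000, with_selection=False, seed=0):
    """Run the toy dynamics and return the final coarse-grained entropy."""
    rng = random.Random(seed)
    xs = [0.0] * n  # every particle starts on the viable point (zero entropy)
    for _ in range(steps):
        xs = mutate(xs, rng)
        if with_selection:
            xs = select(xs)
    return histogram_entropy(xs)


if __name__ == "__main__":
    print("mutation only:", simulate())
    print("with selection:", simulate(with_selection=True))
```

Running both variants should show the mutation-only population ending at a much higher entropy than the selected one; the exact numbers depend on the seed and bin width. The toy says nothing about whether a moral analogue of the "viable" point exists, which is the point actually in dispute.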

Since there OBVIOUSLY IS such a manifold for "fitness", I think the onus is on you to justify your belief that there is no such manifold for "morality". We don't even need to argue about terms. The fact that you put forth CEV, and that you worry about the ethics of AIs, proves that you do believe "morality" is a valid concept. We don't need to understand that concept; we need only to know that it exists, and is a by-product of evolution. "Morality" as developed further under CEV is something different than "morality" as we know it, by which I mean, precisely, that it would depart from the manifold. Whatever the word means, what CEV would lead to would be something different.

CEV makes an unjustified, arbitrary distinction between levels. It considers the "preferences" (which I, being a materialist, interpret as "statistical tendencies") of organisms, or of populations, but not of the dynamic system. Why do you discriminate against the larger system?

Carl writes,

If Approach 2 fails to achieve the aims of Approach 1, then humanity generally wouldn't want to pursue Approach 1 regardless. Are you asserting that your audience would tend to diverge from the rest of humanity if extrapolated, in the direction of Approach 1?

Yes; but reverse the way you say that. There are already forces in place that keep humanity evolving in ways that may be advantageous morally. CEV wants to remove those forces without trying to understand them first. Thus it is CEV that will diverge from the way human morality has evolved thus far.

Comment author: Phil_Goetz5 24 September 2008 10:02:17PM 0 points

What I think is a far more likely scenario than missing out on the mysterious essence of rightness by indulging the collective human id, is that what 'humans' want as a compiled whole is not what we'll want as individuals. Phil might be aesthetically pleased by a coherent metamorality, and distressed if the CEV determines what most people want is puppies, sex, and crack. Remember that the percentage of the population that actually engages in debates over moral philosophy is diminishingly small, and everyone else just acts, frequently incoherently.

Ooh! I vote for puppies, sex, and crack.

(Just not all at the same time.)

Comment author: Ben_Jones 25 September 2008 11:06:02AM 1 point

Phil, very well articulated and interesting stuff. Have you seen Wall-E? It's the scenario your post warns against, but with physical instead of evolutionary fitness.

I agree that Eliezer seems to have brushed aside your viewpoint without giving it due deliberation, when the topic of the ethics of transcending evolution seems right up his street for blogging on.

However: It considers the "preferences" (which I, being a materialist, interpret as "statistical tendencies") of organisms, or of populations, but not of the dynamic system. Why do you discriminate against the larger system?

Because he can. You're straying close to the naturalistic fallacy here. Just as soon as natural selection gets around to building a Bayesian superintelligence, it can specify whatever function it wants to. We build the AI, we get to give it our preferences. What's unfair about that?

Besides, we departed from selection's straight-and-narrow when we made chocolate, condoms, penicillin and spacecraft. We are what selection made us, with our thousand shards of desire, but I see no reason why we should be constrained by that. Our ethics are long since divorced from their evolutionary origins. It's understandable to worry that this makes them vulnerable - I think we all do. It won't be easy bringing them with us into the future, but that's why we're working hard at it.

@Lara: what 'humans' want as a compiled whole is not what we'll want as individuals

Great description of why people in democracies bitch constantly but never rise up. The collective gets what it wants but the individuals are never happy. If I was a superintelligence I'd just paperclip us all and be done with it.

Comment author: Caledonian2 25 September 2008 08:52:31PM -1 points

Our ethics are long since divorced from their evolutionary origins.

I think that statement is vastly overconfident, unless perhaps the 'our' refers to a very limited subset of posters/readers here at OB.

Talking about posters/readers here, or humans in general, renders your claim incorrect -- and obviously so. Our intellectual models might in some cases have diverged from selection, but not what we actually value.

Comment author: Tim_Tyler 26 September 2008 07:01:01AM 0 points

Besides, we departed from selection's straight-and-narrow when we made chocolate, condoms, penicillin and spacecraft.

The large numbers of those objects strongly suggest that their production is favoured by selection processes. They are simply not the result of selection acting on human DNA.

Comment author: Richard_Hollerith2 26 September 2008 03:33:30PM 0 points

In other words, there is no way to program a search for objective morality or for any other search target without the programmer specifying or defining what constitutes a successful conclusion of the search.

If you understand this, then I am wholly at a loss to understand why you think an AI should have "universal" goals or a goal system zero or whatever it is you're calling it.

The flip answer is that the AI must have some goal system (and the designer of the AI must choose it). The community contains vocal egoists, like Peter Voss, Hopefully Anonymous, maybe Denis Bider. They want the AI to help them achieve their egoistic ends. Are you less at a loss to understand them than me?

Comment author: anon666 26 September 2008 04:05:14PM 0 points

"The flip answer is that the AI must have some goal system (and the designer of the AI must choose it). The community contains vocal egoists, like Peter Voss, Hopefully Anonymous, maybe Denis Bider. They want the AI to help them achieve their egoistic ends. Are you less at a loss to understand them than me?"

I certainly am. Your proposal doesn't benefit *anyone* at all.

Comment author: Eliezer_Yudkowsky 26 September 2008 06:03:04PM 2 points

Second anon666's question. A selfish human is much more comprehensible to me, than turning the galaxy into computers that could run a question if there were any question to run, which there isn't.

(Voss is a libertarian/Randian type egoist, the sort who justifies egoism by pointing to how society will be better off, not the Denis-type egoist who just genuinely doesn't give a damn for anyone else.)

Comment author: Richard_Hollerith2 28 September 2008 02:50:15AM 0 points

It is true that my proposal does not benefit any person, human or otherwise, except as a means to further ends.

A human or a sentient has no intrinsic value in my way of thinking about morality -- though of course humans have great instrumental value as long as they remain the only intelligent agents in the known universe.

Now note that one galaxy converted into a superintelligent cloud of matter and energy suffices to keep each and every human alive for billions of years, end disease and suffering, etc, with plenty of matter and energy left over for frivolous toys like a planet transformed into a child's toy.

My proposal is mainly an answer to the question of what end to put all those other galaxies that are not being used to provide a nice place for the humans and their descendants to live.

turning the galaxy into computers that could run a question if there were any question to run.

That characterization of my system is unfair. The goal is more like turning the easy-to-reach matter and energy into computers and von-Neumann probes that will turn less-easy-to-reach matter and energy into computers and von-Neumann probes in an unending cycle, except that eventually the computers and probes will probably have to adopt other means of continuing, e.g., when it becomes clear that there is no way for computers and probes to continue to exist in this space-time continuum because the continuum itself will end in, e.g., a Big Rip.

Comment author: Tim_Tyler 28 September 2008 06:33:52AM 0 points

The goal is more like turning the easy-to-reach matter and energy into computers and von-Neumann probes that will turn less-easy-to-reach matter and energy into computers and von-Neumann probes in an unending cycle [...]

It sounds rather like what evolution would give us by default - what I call "God's utility function", after Richard Dawkins, 1992 - though he didn't quite get it right.

Comment author: Ida 30 September 2008 04:44:15AM -1 points

If you're going to have a self, you may as well have secrets, and maybe even conspiracies. But I do still try to abide by the principle of being able to pass a future lie detector test, with anyone else who's also willing to go under the lie detector, if the topic is a professional one. Fun Theory needs a commonsense exception for global catastrophic risk management.)

Comment author: idlewire 03 August 2009 03:55:07PM 4 points

This reminds me of an idea I had after first learning about the singularity. I assumed that once we are uploaded into a computer, a large percentage of our memories could be recovered in detail, digitized, reconstructed and categorized and then you would have the opportunity to let other people view your life history (assuming that minds in a singularity are past silly notions of privacy and embarrassment or whatever).

That means all those 'in your head' comments that you make when having conversations might be up for review or to be laughed at. Every now and then I make comments in my head that are intended for a transhuman audience watching a reconstruction of my life.

The idea actually has roots in my attempt to understand a heaven that existed outside of time, back when I was a believer. If heaven was not bound by time and I 'met the requirements', I was already up there looking down at a time-line version of my experience on earth. I knew for sure I'd be interested in my own life so I'd talk to the (hopefully existing) me in heaven.

On another note, I've been wanting to write a sci-fi story where a person slowly discovers they are an artificial intelligence led to believe they're human and are being raised on a virtual earth. The idea is that they are designed to empathize with humanity to create a Friendly AI. The person starts gaining either superpowers or super-cognition as the simulators become convinced the AI person will use their power for good over evil. Maybe even have some evil AIs from the same experiment to fight. If anyone wants to steal this idea, go for it.

Comment author: BethMo 08 May 2011 07:17:57AM 0 points

On another note, I've been wanting to write a sci-fi story where a person slowly discovers they are an artificial intelligence led to believe they're human and are being raised on a virtual earth. The idea is that they are designed to empathize with humanity to create a Friendly AI. The person starts gaining either superpowers or super-cognition as the simulators become convinced the AI person will use their power for good over evil. Maybe even have some evil AIs from the same experiment to fight. If anyone wants to steal this idea, go for it.

I want to read that story! Has anyone written it yet?

Comment author: wallowinmaya 16 May 2011 03:11:47PM 1 point

Or "worrying about this possibility would be a poor use of resources, what with the incredible urgency of creating AI before humanity wipes itself out - you've got to go with what you have", this being uttered by people who just basically aren't interested in the problem.

I think this is unfair. When I first encountered the problem of FAI I reasoned similarly, and I was very interested in the problem. Now I know this argument has huge flaws, but they weren't caused by lack of interest.

Comment author: Brickman 27 July 2011 01:07:43AM 0 points

Despite having seen you say it in the past, it wasn't until reading this article that it sank in for me just how little danger we were actually in of Eliezer1997 (or even Eliezer2000) actually making his AI. He had such a poor understanding of the problem, I don't see how he could've gotten there from here without having to answer the question of "Ok, now what do I tell the AI to do?" The danger was in us almost never getting Eliezer2008, or in Eliezer2000 wasting a whole bunch of future-minded people's money getting to the point where he realized he was stuck.

Except I suppose he did waste a lot of other people's money and delay present-you by several years. So I guess that danger wasn't entirely dodged after all. And maybe you did have something you planned to tell the AI to do anyways, something simple and useful sounding in and of itself with a tangible result. Probably something it could do "before" solving the question of what morality is, as a warmup. That's what the later articles in this series suggest, at least.

I also peeked at the Creating Friendly AI article just to see it. That, unlike this, looks like the work of somebody who is very, very ready to turn the universe into paperclips. There was an entire chapter about why the AI probably won't ever learn to "retaliate", as if that was one of the most likely ways for it to go wrong. I couldn't even stand to read more than half a chapter and I'm not you.

"To the extent that they were coherent ideas at all" you've said of half-baked AI ideas in other articles. It's nice to finally understand what that means.