Mirrors and Paintings

Eliezer Yudkowsky

Followup to: Sorting Pebbles Into Correct Heaps, Invisible Frameworks

Background: There's a proposal for Friendly AI called "Coherent Extrapolated Volition" which I don't really want to divert the discussion to, right now. Among many other things, CEV involves pointing an AI at humans and saying (in effect) "See that? That's where you find the base content for self-renormalizing morality."

Hal Finney commented on the Pebblesorter parable:

I wonder what the Pebblesorter AI would do if successfully programmed to implement [CEV]... Would the AI pebblesort? Or would it figure that if the Pebblesorters got smarter, they would see that pebblesorting was pointless and arbitrary? Would the AI therefore adopt our own parochial morality, forbidding murder, theft and sexual intercourse among too-young people? Would that be the CEV of Pebblesorters?

I imagine we would all like to think so, but it smacks of parochialism, of objective morality. I can't help thinking that Pebblesorter CEV would have to include some aspect of sorting pebbles. Doesn't that suggest that CEV can malfunction pretty badly?

I'm giving this question its own post, because it touches on questions similar to ones I once pondered - dilemmas that forced my current metaethics as their resolution.

Yes indeed: A CEV-type AI, taking Pebblesorters as its focus, would wipe out the Pebblesorters and sort the universe into prime-numbered heaps.

This is not the right thing to do.

That is not a bug.

A primary motivation for CEV was to answer the question, "What can Archimedes do if he has to program a Friendly AI, despite being a savage barbarian by the Future's standards, so that the Future comes out right anyway? Then whatever general strategy Archimedes could plausibly follow, that is what we should do ourselves: For we too may be ignorant fools, as the Future measures such things."

It is tempting to further extend the question, to ask, "What can the Pebblesorters do, despite wanting only to sort pebbles, so that the universe comes out right anyway? What sort of general strategy should they follow, so that despite wanting something that is utterly pointless and futile, their Future ends up containing sentient beings leading worthwhile lives and having fun? Then whatever general strategy we wish the Pebblesorters to follow, that is what we should do ourselves: For we, too, may be flawed."

You can probably see in an intuitive sense why that won't work. We did in fact get here from the Greek era, which shows that the seeds of our era were in some sense present then - albeit this history doesn't show that no extra information was added, that there were no contingent moral accidents that sent us into one attractor rather than another. But still, if Archimedes said something along the lines of "imagine probable future civilizations that would come into existence", the AI would visualize an abstracted form of our civilization among them - though perhaps not only our civilization.

The Pebblesorters, by construction, do not contain any seed that might grow into a civilization valuing life, health, happiness, etc. Such wishes are nowhere present in their psychology. All they want is to sort pebble heaps. They don't want an AI that keeps them alive, they want an AI that can create correct pebble heaps rather than incorrect pebble heaps. They are much disturbed by the question of how such an AI can be created, when different civilizations are still arguing about heap sizes - though most of them believe that any sufficiently smart mind will see which heaps are correct and incorrect, and act accordingly.

You can't get here from there. Not by any general strategy. If you want the Pebblesorters' future to come out humane, rather than Pebblish, you can't advise the Pebblesorters to build an AI that would do what their future civilizations would do. You can't advise them to build an AI that would do what Pebblesorters would do if they knew everything the AI knew. You can't advise them to build an AI more like Pebblesorters wish they were, and less like what Pebblesorters are. All those AIs just sort the universe into prime heaps. The Pebblesorters would celebrate that and say "Mission accomplished!" if they weren't dead, but it isn't what you want the universe to be like. (And it isn't right, either.)

What kind of AI would the Pebblesorters have to execute, in order to make the universe a better place?

They'd have to execute an AI that did not do what Pebblesorters would-want, but an AI that simply, directly, did what was right - an AI that cared directly about things like life, health, and happiness.

But where would that AI come from?

If you were physically present on the scene, you could program that AI. If you could send the Pebblesorters a radio message, you could tell them to program it - though you'd have to lie to them about what the AI did.

But if there's no such direct connection, then it requires a causal miracle for the Pebblesorters' AI to do what is right - a perpetual motion morality, with information appearing from nowhere. If you write out a specification of an AI that does what is right, it takes a certain number of bits; it has a Kolmogorov complexity. Where is that information appearing from, since it is not yet physically present in the Pebblesorters' Solar System? What is the cause already present in the Pebble System, of which the right-doing AI is an eventual effect? If the right-AI is written by a meta-right AI then where does the meta-right AI come from, causally speaking?

Be ye wary to distinguish between yonder levels. It may seem to you that you ought to be able to deduce the correct answer just by thinking about it - surely, anyone can see that pebbles are pointless - but that's a correct answer to the question "What is right?", which carries its own invisible framework of arguments that it is right to be moved by. This framework, though harder to see than arguments, has its physical conjugate in the human brain. The framework does not mention the human brain, so we are not persuaded by the argument "That's what the human brain says!" But this very event of non-persuasion takes place within a human brain that physically represents a moral framework that doesn't mention the brain.

This framework is not physically represented anywhere in the Pebble System. It's not a different framework in the Pebble System, any more than different numbers are prime here than there. So far as idealized abstract dynamics are concerned, the same thing is right in the Pebble System as is right here. But that idealized abstract framework is not physically embodied anywhere in the Pebble System. If no human sends a physical message to the Pebble System, then how does anything right just happen to happen there, given that the right outcome is a very small target in the space of all possible outcomes? It would take a thermodynamic miracle.

As for humans doing what's right - that's a moral miracle but not a causal miracle. On a moral level, it's astounding indeed that creatures of mere flesh and goo, created by blood-soaked natural selection, should decide to try and transform the universe into a place of light and beauty. On a moral level, it's just amazing that the brain does what is right, even though "The human brain says so!" isn't a valid moral argument. On a causal level... once you understand how morality fits into a natural universe, it's not really all that surprising.

And if that disturbs you, if it seems to smack of relativism - just remember, your universalizing instinct, the appeal of objectivity, and your distrust of the state of human brains as an argument for anything, are also all implemented in your brain. If you're going to care about whether morals are universally persuasive, you may as well care about people being happy; a paperclip maximizer is moved by neither argument. See also Changing Your Metaethics.

It follows from all this, by the way, that the algorithm for CEV (the Coherent Extrapolated Volition formulation of Friendly AI) is not the substance of what's right. If it were, then executing CEV anywhere, at any time, would do what was right - even with the Pebblesorters as its focus. There would be no need to elaborately argue this, to have CEV on the left-hand side and rightness on the right-hand side; the two would be identical, or bear the same relation as PA+1 and PA.

So why build CEV? Why not just build a do-what's-right AI?

Because we don't know the complete list of our own terminal values; we don't know the full space of arguments we can be moved by. Human values are too complicated to program by hand. We might not recognize the source code of a do-what's-right AI, any more than we would recognize a printout of our own neuronal circuitry if we saw it. Sort of like how Peano Arithmetic doesn't recognize itself in a mirror. If I listed out all your values as mere English words on paper, you might not be all that moved by the list: is it more uplifting to see sunlight glittering off water, or to read the word "beauty"?

But in this art of Friendly AI, understanding metaethics on a naturalistic level, we can guess that our morals and metamorals will be physically represented in our brains, even though our morality (considered as an idealized abstracted dynamic) doesn't attach any explicit moral force to "Because a brain said so."

So when we try to make an AI whose physical consequence is the implementation of what is right, we make that AI's causal chain start with the state of human brains - perhaps nondestructively scanned on the neural level by nanotechnology, or perhaps merely inferred with superhuman precision from external behavior - but not passed through the noisy, blurry, destructive filter of human beings trying to guess their own morals.

The AI can't start out with a direct representation of rightness, because the programmers don't know their own values (not to mention that there are other human beings out there than the programmers, if the programmers care about that). The programmers can neither brain-scan themselves and decode the scan, nor superhumanly precisely deduce their internal generators from their outward behavior.

So you build the AI with a kind of forward reference: "You see those humans over there? That's where your utility function is."

As previously mentioned, there are tricky aspects to this. You can't say: "You see those humans over there? Whatever desire is represented in their brains, is therefore right." This, from a moral perspective, is wrong - wanting something doesn't make it right - and the conjugate failure of the AI is that it will reprogram your brains to want things that are easily obtained in great quantity. If the humans are PA, then we want the AI to be PA+1, not Self-PA... metaphorically speaking.

You've got to say something along the lines of, "You see those humans over there? Their brains contain the evidence you will use to deduce the correct utility function, even though right-ness is not caused by those brains, so that intervening to alter the brains won't alter the correct utility function." Here, the "correct" in "correct utility function" is relative to a meta-utility framework that points to the humans and defines how their brains are to be treated as information. I haven't worked out exactly how to do this, but it does look solvable.

And as for why you can't have an AI that rejects the "pointless" parts of a goal system and only keeps the "wise" parts - so that even in the Pebble System the AI rejects pebble-sorting and keeps the Pebblesorters safe and warm - it's the problem of the invisible framework again; you've only passed the recursive buck. Humans contain the physical representations of the framework that we appeal to, when we ask whether a goal is pointless or wise. Without a message sent to the Pebble System, information as to which goals are pointless or wise cannot physically materialize there from nowhere. This doesn't mean that different goals are pointless in the Pebble System, it means that no physical brain there is asking that question.

The upshot is that structurally similar CEV algorithms will behave differently depending on whether they have humans at the focus, or Pebblesorters. You can infer that CEV will do what's right in the presence of humans, but the general algorithm in CEV is not the direct substance of what's right. There is no moral imperative to execute CEVs regardless of their focus, on any planet. It is only right to execute CEVs on decision systems that contain the seeds of rightness, such as humans. (Again, see the concept of a moral miracle that is not a causal surprise.)

Think of a Friendly AI as being like a finely polished mirror, which reflects an image more accurately than any painting drawn with blurred eyes and shaky hand. If you need an image that has the shape of an apple, you would do better to put an actual apple in front of the mirror, and not try to paint the apple by hand. Even though the drawing would inherently be apple-shaped, it wouldn't be a good one; and even though the mirror is not inherently apple-shaped, in the presence of an actual apple it is a better picture than any painting could be.

"Why not just use an actual apple?" you ask. Well, maybe this isn't a merely accurate mirror; it has an internal camera system that lightens the apple's image before displaying it. An actual apple would have the right starting shape, but it wouldn't be bright enough.

You may also want a composite image of a lot of apples that have multiple possible reflective equilibria.

As for how the apple ended up apple-shaped, when the substance of the apple doesn't define apple-shaped-ness - in the very important sense that squishing the apple won't change what's apple-shaped - well, it wasn't a miracle, but it involves a strange loop through the invisible background framework.

And if the whole affair doesn't sound all that right... well... human beings were using numbers a long time before they invented Peano Arithmetic. You've got to be almost as smart as a human to recognize yourself in a mirror, and you've got to be smarter than human to recognize a printout of your own neural circuitry. This Friendly AI stuff is somewhere in between. Would the rightness be easier to recognize if, in the end, no one died of Alzheimer's ever again?

Eliezer,

I vaguely remember from the last time I visited this site that you are in the inductivist camp. In several articles you seemed to express a deep belief in Bayesian reasoning.

I think that while you are an intelligent guy, your abandonment of falsification in favor of induction is one of your primary mistakes. Falsification subsumes induction. Popper wins over Bayes.

Any presumed inductivism has foundations in trial and error, and not the other way around. Popper’s construction is so much more straightforward than this convoluted edifice you are creating.

Once you understand falsification there is no problem explaining why science isn’t based on “faith”. That’s because once you accept falsification as the basis for science it is clear that one is not using mere induction.

At this point I’m wondering if you are a full blown inductionist. Do you believe that my beliefs are founded upon induction? Do you believe that because you believe I have no way to avoid the use of induction? I had a long discussion once with an inductivist and for the life of me I couldn’t get him to understand the difference between being founded upon and using.

I don’t even believe that I am using induction in many of the cases where inductivists claim that I am. I don’t assume the floor will be there when I step out of bed in the morning because of induction, nor do I believe the sun will rise tomorrow because of induction.

I believe those things because I have well tested models. Models about how wood behaves, and models about how objects behave. Often I don’t even believe what is purported to be my belief.

The question, “will the sun rise tomorrow” has a broader meaning than “The sun will rise on August 24, 2008” in this discussion. In fact, I don’t explicitly and specifically hold such beliefs in any sort of long term storage. I don’t have a buffer for whether the sun is going to rise on the 24th, the 25th, and so forth. I don’t have enough memory for that. Nor do I determine the values to place in each of those buffers by an algorithm of induction.

I would only take the question to refer to August the 24th given further clarification by the speaker. I think he means “how do we know the sun will keep rising” and not that the questioner had any particular concern about the 24th.

I did run into a guy at a park who asked me if I believed the world would end on December 21, 2012. I had no idea what he was on about till he mentioned something about the Mayan calendar.

So in fact, in this discussion, when we are talking about the question “will the sun rise tomorrow,” we aren’t concerned about whether any single new observation will match priors; we are concerned about the principles upon which the sun operates. We are talking models, not observations.

As a child I remember just assuming the sun would rise. I don’t in fact remember any process of induction I went through to justify it. Of course, that doesn’t mean my brain might not be operating via induction unbeknownst to me. The same could be said of animals. They too operate on the assumption that the sun will rise tomorrow.

They even have specific built-in behaviors that are geared towards this. It’s pretty clear that where these assumptions are encoded outside the brain, the encoding was done by evolutionary processes, and we know natural selection does not operate via induction.

What about the mental processes of animals? Must the fact that animals mentally operate on the presumption that “the sun will rise tomorrow” mean that they must have somewhere deep inside an inductive module to deal with the sun rising? I don’t think so. It isn’t even clear that they believe that they believe “the sun will rise tomorrow” either specifically or generally.

Even if they do, it is not clear that induction plays a part in such a belief. It may be that natural selection has built up many different possible mental models for operational possibilities and that observation is only used to classify things as fitting one of these predefined models.

Heck, I can even build new categories of models on the fly this way, this too on the basis of trial and error. A flexible mind finding that the behavior of some object in the real world does not quite fit one of the categories can take guesses at ways to tweak the model to better fit.

So it is not at all clear that anything has foundationally been arrived at via induction.

In fact, if my memory serves me when I first inquired about the sun I was seeking a more sophisticated model. I knew I already had it categorized as the kind of object that behaved the same way as it did in the past, but was concerned that perhaps I was mistaken and that it might be categorized in some other way. Perhaps as something that doesn’t follow such a simple rule.

Now I’m not even sure I asked the question precisely as “will the sun rise tomorrow” but I do remember my mental transitions. At first I don’t remember even thinking about it. Later I modified my beliefs in various ways and I don’t recall in what order, or why. I came to understand the sun rose repetitively, on a schedule, etc.

I do remember certain specific transitions, like the time I realized, because of tweaks to other models, that the statement “The sun will rise tomorrow,” taken generally, is not true. That, I know, certainly came to mind when I learned the sun was going to burn out in six billion years. My model, in the sense that I believed “the sun will rise tomorrow” meaning the next day would come on schedule, was wrong.

In my view, “things that act Bayesian” is just another model. Thus, I never found the argument that Bayes refutes Popper very compelling. Reading many of the articles linked off this one, I see that you seem to be spinning your wheels. Popper covered the issue of justification much more satisfactorily than you have with your article “Where Recursive Justification Hits Rock Bottom” (http://lesswrong.com/lw/s0/where_recursive_justification_hits_bottom/).

The proper answer is that justification doesn’t hit rock bottom and that science isn’t about absolute proof. Science is about having tentative beliefs that are open to change given more information based on models that are open to falsification by whatever means.

Pursuing a foundationalist philosophical belief system is a fool’s errand once you understand that there is no base foundation to knowledge. The entire question of whether knowledge is based on faith vs. empiricism evaporates with this understanding. Proper knowledge is based on neither.

I could go on with this. I have thought these things through to a very great extent but I know you have a comment length restriction here and I’ve probably already violated it. That’s a shame because it limits the discussion and allows you to continue in your biases.

You are definitely on the wrong track here with your discussions on morality also. You are missing the fundamental role of natural selection in all this, both in constraining our creations and in how morality arises. In my view, the Pebblesorters’ morality is already divorced from survival, and therefore it should be of no concern to them whatsoever if their AI becomes uncontrollable, builds its own civilization, etc. Fish, in fact, do create piles of pebbles despite their beliefs, and you expressed no belief on their part that they must destroy incorrectly piled pebbles created by nature. So why should they have moral concerns if their AI wins independence and goes off and does the “wrong” thing?

For them to be concerned about the AI requires broader assumptions than you have made explicit in your assumption. Assumptions like feeling responsible for chains of events you have set in place. There are assumptions that are objectively required to even consider something a morality. Otherwise we have classified incorrectly. In fact, the pebble sorters are suffering from an obsessive delusion and not a true morality. Pebblesorting fails to fit even the most simplistic criteria for a morality.

I am limited in both the length and the number of posts, and I don’t feel like splitting this into multiple posts over multiple articles, so this comment is in response to many of your articles: Invisible Frameworks, Mirrors and Paintings, Pebblesorters, When Recursive Justification Hits Rock Bottom, etc.

I could post it on an older thread to be buried a hundred comments deep, but that too isn’t a rational choice, as I’d like people to actually see it, and to see that this abandonment of falsification for induction is based on faulty reasoning. I’m concerned about this because I have been watching science become increasingly corrupted by politics over my lifetime, and one of the main levers used to do this is the argument that real scientists don’t use falsification (while totally misunderstanding what the term means) but induction.