PhilGoetz comments on The scourge of perverse-mindedness - Less Wrong
There actually is a way in which they're right.
My first thought was, "You've got it backwards - it isn't that materialism isn't gloomy; it's that spiritualism is even gloomier." Because spiritual beliefs - I'm usually thinking of Christianity when I say that - don't really give you oughtness for free; they take the arbitrary moral judgements of the big guy in the sky and declare them correct. And so you're not only forced to obey this guy; you're forced to enjoy obeying him, and have to feel guilty if you have any independent moral ideas. (This is why Christianity, Islam, communism, and other similar religions often make their followers morally deficient.)
But what do I mean by gloomier? I must have some baseline expectation which both materialism and spirituality fall short of, to feel that way.
And I do. It's memories of how I felt when I was a Christian. Like I was a part of a difficult but Good battle between right and wrong.
Now, hold off for a moment on asking whether that view is rational or coherent, and consider a dog. A dog wants to make its master happy. Dogs have been bred for thousands of years specifically not to want to challenge their master, or to pursue their own goals, as wolves do. When a dog can be with its master, and do what its master tells it to, and see that its master is pleased, the dog is genuinely, tail-waggingly happy. Probably happier than you or I are even capable of being.
A Christian just wants to be a good dog. They've found a way to reach that same blissful state themselves.
The materialistic worldview really is gloomy compared to being a dog.
And we don't have any way to say that we're right and they're wrong.
Factually, of course, they're wrong. But when you're a dog, being factually wrong isn't important. Obeying your master is important. Judged by our standards of factual correctness, we're right and they're wrong. Judged by their standards of being (or maybe feeling like) a good dog, they're right and we're wrong.
One of the problems with CEV, perhaps related to wireheading, is that it would probably fall into a doglike attractor. Possibly you can avoid it by writing into the rules that factual correctness trumps all other values. I don't think you can avoid it that easily. But even if you could, by doing so, you've already decided whose values you're going to implement, before your FAI has even booted up; and the whole framework of CEV is just a rationalization to excuse the fact that the world is going to end up looking the way you want it to look.
I disagree with most of this but vote it up for being an excellent presentation of a complex and important position that must be addressed (though as noted, I think it can be) and hasn't been adequately addressed to satisfy (or possibly even to be understood by) all or most LW readers.
Phil, I suggest that you look at Christian and secular children (and possibly those of some other religions) and decide empirically whether they really differ so much in happiness or well-being. Looking at people across a wide range of cultures and situations would be helpful in general, but that contrast in particular (or, as I suspect, mostly the lack of one) would be especially helpful.
Children are where not to look. Dogs psychologically resemble wolf-pups; they are childlike. Religion, like the breeding of dogs, is neotenous; it allows retention of childlike features into adulthood. To see the differences I'm talking about, you therefore need to look at adults.
Anyway, if you're asking me to judge based on who is the happiest, you've taken the first step down the road to wireheading. Dogs have been genetically reprogrammed to develop in a way that wires their value system to getting a pat on the head from their master.
The basic problem here is how we can simultaneously preserve human values, and not become wireheads, when some people are already wireheads. The religious worldview I spoke of above is a kind of wireheading. Would CEV dismiss it as wireheading? If so, what human values aren't wireheading? How do we walk the tightrope between wireheads and moral realists? Is there even a tightrope to walk there?
IAWYC except for the last paragraph. While CEV isn't guaranteed to be a workable concept, and while it's dangerous to get into the habit of ruling out classes of counterargument by definition, I think there's a problem with criticizing CEV on the grounds "I think CEV will probably go this way, but I think that way is a big mistake, and I expect we'd all see it as a mistake even if we knew more, thought faster, etc." This is exactly the sort of error the CEV project is built to avoid.
I was a strong proponent of CEV as the most-correct theory I had heard on the topic of what goals to set, but I've become more skeptical as Eliezer started talking about potential tweaks to avoid insane results like the dog scenario above.
It seems similar in nature to the rule-building method of goal definition, where you write out a list of rules, a method which has been roundly criticized as nearly impossible to do correctly.
I also dislike tweaks, but I think that Eliezer does too. I certainly don't endorse any sort of tweak that I have heard and understood.
FWIW, Eliezer seems to have suggested an anti-selfish-bastard tweak here.
Thanks! I'm unhappy to see that, but my preferences are over states of the world, not beliefs, unless they simply strongly favor the belief that they are over states of the world.
Fortunately, we have some time, but that does bode ill I think. OTOH, the general trend, though not the universal trend, is for CEV to look more difficult and stranger with time.
I don't trust CEV. The further you extrapolate from where you are, the less experience you have with applying the virtue you're trying to implement.
So you would like experience with the interactions through which our virtues unfold and are developed to be part of the extrapolation dynamic? http://www.google.com/search?q=%22grown+up+further+together%22&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a That always was intended I think.
If that's not what you mean, well, if you can propose alternatives to CEV that don't automatically fail and which also don't look to me like variations on CEV, I think you will be the first to do so. CEV is terribly underspecified, so it's hard to think hard about the problem and propose something that doesn't already fall within the current specification.
That's why I prefer the 'would it satisfy everyone who ever lived?' strategy over CEV. Humanity's future doesn't have to be coherent. Coherence is something that happens at evolutionary choke-points, when some species dies back to within an order of magnitude of the minimum sustainable population. When some revolutionary development allows unprecedented surpluses, the more typical response is diversification.
Consider the trilobites. If there had been a trilobite-Friendly AI using CEV, invincible articulated shells would comb carpets of wet muck with the highest nutrient density possible within the laws of physics, across worlds orbiting every star in the sky. If there had been a trilobite-engineered AI going by 100% satisfaction of all historical trilobites, then trilobites would live long, healthy lives in a safe environment of adequate size, and the Cambrian explosion (or something like it) would have proceeded without them.
Most people don't know what they want until you show it to them, and most of what they really want is personal. Food, shelter, maybe a rival tribe that's competent enough to be interesting but always loses when something's really at stake. The option of exploring a larger world, seldom exercised. It doesn't take a whole galaxy's resources to provide that, even if we're talking trillions of people.
I realized a pithy way of stating my objection to that strategy: given how unlikely I think it is that the test could be passed fairly by a Friendly AI, an AI passing the test is stronger evidence that the AI is cheating somehow than that the AI is Friendly.
If the AI is programmed so that it genuinely wants to pass the test (or the closest feasible approximation of the test) fairly, cheating isn't an issue. This isn't a matter of fast-talking its way out of a box. A properly-designed AI would be horrified at the prospect of 'cheating,' the way a loving mother is horrified at the prospect of having her child stolen by fairies and replaced with a near-indistinguishable simulacrum made from sticks and snow.
It is probably possible to pass that test by exploiting human psychology. It is probably impossible to do well on that test by trying to convince humans that your viewpoint is right.
You're talking past orthonormal. You're assuming a properly-designed AI. He's saying that accomplishing the task would be strong evidence of unfriendliness.
What Phil said, and also:
Taboo "fairly": this is another word whose specification requires the whole of human values. Proving that the AI understands what we mean by fairness and wants to pass the test fairly is no easier than proving it Friendly in the first place.
"Fairly" was the wrong word in this context. Better might be 'honest' or 'truthful.' A truthful piece of information is one which increases the recipient's ability to make accurate predictions; an honest speaker is one whose statements contain only truthful information.
About what? Anything? That sounds very easy.
Remember Goodhart's Law - what we want is G, Good, not any particular G* normally correlated with Good.
Walking from Helsinki to Saigon sounds easy, too, depending on how it's phrased. Just one foot in front of the other, right?
Humans make predictions all the time. Any time you perceive anything and are less than completely surprised by it, that's because you made a prediction which was at least partly successful. If, after you receive and assimilate the information in question, any of your predictions becomes less accurate, or any part of your map becomes less closely aligned with the territory, then the information was not perfectly honest. If you ignore or misinterpret it for whatever reason, even when it's in some higher sense objectively accurate, that still fails the honesty test.
A rationalist should win; an honest communicator should make the audience understand.
Given the option, I'd take personal survival even at the cost of accurate perception and ability to act, but it's not a decision I expect to be in the position of needing to make: an entity motivated to provide me with information that improves my ability to make predictions would not want to kill me, since any incoming information that causes my death necessarily also reduces my ability to think.
Your trilobite example is at odds with your everyone-who-lived strategy. The impact of the trilobite example is to show that CEV is fundamentally wrong, because trilobite cognition, no matter how far you extrapolate it, would never lead to love, or value it if it arose by chance.
Some degree of randomness is necessary to allow exploration of the landscape of possible worlds. CEV is designed to prevent exploration of that landscape.
Let me expand upon Vladimir's comment:
You have not yet learned that a certain argumentative strategy against CEV is doomed to self-referential failure. You have just argued that "exploring the landscape of possible worlds" is a good thing, something that you value. I agree, and I think it's a reflectively consistent value, which others generally share at some level and which they might share more completely if they knew more, thought faster, had grown up farther together, etc.
You then assume, without justification, that "exploring the landscape of possible worlds" will not be expressed as a part of CEV, and criticize it on these grounds.
Huh? What friggin' definition of CEV are you using?!?
EDIT: I realized there was an insult in my original formulation. I apologize for being a dick on the Internet.
Because EY has specifically said that that must be avoided, when he describes evolution as something dangerous. I don't think there's any coherent way of saying both that CEV will constrain future development (which is its purpose), and that it will not prevent us from reaching some of the best optimums.
Most likely, all the best optimums lie in places that CEV is designed to keep us away from, just as trilobite CEV would keep us away from human values. So CEV is worse than random.
That a "trilobite CEV" would never lead to human values is hardly a criticism of CEV's effectiveness. The world we have now is not "trilobite friendly"; trilobites are extinct!
CEV, as I understand it, is very weakly specified. All it says is that a developing seed AI chooses its value system after somehow taking into account what everyone would wish for, if they had a lot more time, knowledge, and cognitive power than they do have. It doesn't necessarily mean, for example, that every human being alive is simulated, given superintelligence, and made to debate the future of the cosmos in a virtual parliament. The combination of better knowledge of reality and better knowledge of how the human mind actually works may make it extremely clear that the essence of human values, extrapolated, is XYZ, without any need for a virtual referendum, or even a single human simulation.
It is a mistake to suppose, for example, that a human-based CEV process will necessarily give rise to a civilizational value system which attaches intrinsic value to such complexities as food, sex, or sleep, and which will therefore be prejudiced against modes of being which involve none of these things. You can have a value system which attributes positive value to human beings getting those things, not because they are regarded as intrinsically good, but because entities getting what they like is regarded as intrinsically good.
If a human being is capable of proposing a value system which makes no explicit mention of human particularities at all (e.g. Ben Goertzel's "growth, choice, and joy"), then so is the CEV process. So if the worry is that the future will be kept unnecessarily anthropomorphic, that is not a valid critique. (It might happen if something goes wrong, but we're talking about the basic idea here, not the ways we might screw it up.)
You could say, even a non-anthropomorphic CEV might keep us away from "the best optimums". But let's consider what that would mean. The proposition would be that even in a civilization making the best, wisest, most informed, most open-minded choices it could make, it still might fall short of the best possible worlds. For that to be true, must it not be the case that those best possible worlds are extremely hard to "find"? And if you propose to find them by just being random, must there not be some risk of instead ending up in very bad futures? This criticism may be comparable to the criticism that rational investment is a bad idea, because you'd make much more money if you won the lottery. If these distant optima are so hard to find, even when you're trying to find good outcomes, I don't see how luck can be relied upon to get you there.
This issue of randomness is not absolute. One might expect a civilization with an agreed-upon value system to nonetheless conduct fundamental experiments from time to time. But if there were experiments whose outcomes might be dangerous as well as rewarding, it would be very foolish to go ahead with them just because the consequences would be good if we got lucky. Therefore, I do not think that unconstrained evolution can be favored over the outcomes of non-anthropomorphic CEV.
That doesn't mean that you can't examine possible trajectories of evolution for good things you wouldn't have thought of yourself, just that you shouldn't allow evolution to determine the actual future.
I'm not sure what you mean by "constrain" here. A process that reliably reaches an optimum (I'm not saying CEV is such a process) constrains future development to reach an optimum. Any nontrivial (and non-self-undermining, I suppose; one could value the nonexistence of optimization processes or something) value system, whether "provincially human" or not, prefers the world to be constrained into more valuable states.
I don't see where you've responded to the point that CEV would incorporate whatever reasoning leads you to be concerned about this.
Or to take one step back:
It seems that you think there are two tiers of values, one consisting of provincial human values, and another consisting of the true universal values like "exploring the landscape of possible worlds". You worry that CEV will catch only the first group of values.
From where I stand, this is just a mistaken question; the values you worry will be lost are provincial human values too! There's no dividing line to miss.
This is one of the things I don't understand: If you think everything is just a provincial human value, then why do you care? Why not play video games or watch YouTube videos instead of arguing about CEV? Is it just more fun?
(There's a longish section trying to answer this question in the CEV document, but I can't make sense of it.)
There's a distinction that hasn't been made on LW yet, between personal values and evangelical values. Western thought traditionally blurs the distinction between them, and assumes that, if you have personal values, you value other people having your values, and must go on a crusade to get everybody else to adopt your personal values.
The CEVer position is, as far as I can tell, that they follow their values because that's what they are programmed to do. It's a weird sort of double-think that can only arise when you act on the supposition that you have no free will with which to act. They're talking themselves into being evangelists for values that they don't really believe in. It's like taking the ability to follow a moral code that you know has no outside justification from Nietzsche's "master morality", and combining it with the prohibition against value-creation from his "slave morality".
That's how most values work. In general, I value human life. If someone does not share this value, and they decide to commit murder, then I would stop them if possible. If someone does not share this value, but is merely apathetic about murder rather than a potential murderer themselves, then I would cause them to share this value if possible, so there will be more people to help me stop actual murderers. So yes, at least in this case, I would act to get other people to adopt my values, or inhibit them from acting on their own values. Is this overly evangelical? What is bad about it?
In any case, history seems to indicate that "evangelizing your values" is a "universal human value".
Groups that didn't/don't value evangelizing their values:
We get into one sort of confusion by using particular values as examples. You talk about valuing human life. How about valuing the taste of avocados? Do you want to evangelize that? That's kind of evangelism-neutral. How about the preferences you have that make one particular private place, or one particular person, or other limited resource, special to you? You don't want to evangelize those preferences, or you'd have more competition. Is the first sort of value the only one CEV works with? How does it make that distinction?
We get into another sort of confusion by not distinguishing between the values we hold as individuals, the values we encourage our society to hold, and the values we want God to hold. The kind of values you want your God to hold are very different from the kind of values you want people to hold, in the same way that you want the referee to have different desires than the players. CEV mushes these two very different things together.
Good points. I haven't thoroughly read the CEV document yet, so I don't know if there is any discussion of this, but it does seem that it should make a distinction between those different types of values and preferences.
I understand what you're saying, and I've heard that answer before, repeatedly; and I don't buy it.
Suppose we were arguing about the theory of evolution in the 19th century, and I said, "Look, this theory just doesn't work, because our calculations indicate that selection doesn't have the power necessary." That was the state of things around the turn of the century, when genetic inheritance was assumed to be analog rather than discrete.
An acceptable answer would be to discover that genes were discrete things that an organism had just 2 copies of, and that one was often dominant, so that the equations did in fact show that selection had the necessary power.
An unacceptable answer would be to say, "What definition of evolution are you using? Evolution makes organisms evolve! If what you're talking about doesn't lead to more complex organisms, then it isn't evolution."
Just saying "Organisms become more complex over time" is not a theory of evolution. It's more like an observation of evolution. A theory means you provide a mechanism and argue convincingly that it works. To get to a theory of CEV, you need to define what it's supposed to accomplish, propose a mechanism, and show that the mechanism might accomplish the purpose.
You don't have to get very far into this analysis to see why the answer you've given doesn't, IMHO, work. I'll try to post something later this afternoon on why.
I won't get around to posting that today, but I'll just add that I know that the intent of CEV is to solve the problems I'm complaining about. I know there are bullet points in the CEV document that say, "Renormalizing the dynamic", "Caring about volition," and, "Avoid hijacking the destiny of humankind."
But I also know that the CEV document says,
and
I think there is what you could call an order-of-execution problem, and I think there's a problem with things being ill-defined, and I think the desired outcome is logically impossible. I could be wrong. But since Eliezer worries that this could be the case, I find it strange that Eliezer's bulldogs are so sure that there are no such problems, and so quick to shoot down discussion of them.
You never learn.
Folks. Vladimir's response is not acceptable in a rational debate. The fact that it currently has 3 points is an indictment of the Less Wrong community.
Normally I would agree, but he was responding to "Some degree of randomness is necessary". Seriously, you should know that isn't right.
That post is about a different issue. It's about whether introducing noise can help an optimization algorithm. Sounds similar; isn't. The difference is that the optimization algorithm already knows the function that it's trying to optimize.
The basic problem with CEV is that it requires reifying values in a strange way so that there are atomic "values" that can be isolated from an agent's physical and cognitive architecture; and that (I think) it assumes that we have already evolved to the point where we have discovered all of these values. You can make very general value statements, such as that you value diversity, or complexity. But a trilobite can't make any of those value statements. I think it's likely that there are even more important fundamental value statements to be made that we have not yet conceptualized; and CEV is designed from the ground up specifically to prevent such new values from being incorporated into the utility function.
The need for randomness is not because random is good; it's because, for the purpose of discovering better primitives (values) to create better utility functions, any utility function you can currently state is necessarily worse than random.
Since when is randomness required to explore the "landscape of possible worlds"? Or the possible values that we haven't considered? A methodical search would be better. How did you miss that lesson from Worse Than Random, when it included an example (the pushbutton combination lock) of exploring a space of potential solutions?
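A minimal sketch of the methodical-versus-random point, assuming a hypothetical lock with three positions and ten buttons each (these parameters are illustrative, not taken from the Worse Than Random post). Enumerating combinations in a fixed order never repeats a guess, so it needs at most N = 1000 tries and about N/2 on average; guessing uniformly at random with replacement needs about N tries on average and has no upper bound at all.

```python
import itertools
import random

# Hypothetical lock parameters: 3 positions, 10 buttons each (N = 10**3 combinations).
DIGITS = 3
BASE = 10


def methodical_search(secret, digits=DIGITS, base=BASE):
    """Try every combination exactly once, in lexicographic order; return the number of tries."""
    for tries, combo in enumerate(itertools.product(range(base), repeat=digits), start=1):
        if combo == secret:
            return tries
    raise ValueError("secret is not in the search space")


def random_search(secret, digits=DIGITS, base=BASE, rng=None):
    """Guess uniformly at random, with replacement (guesses may repeat); return the number of tries."""
    rng = rng or random.Random()
    tries = 0
    while True:
        tries += 1
        guess = tuple(rng.randrange(base) for _ in range(digits))
        if guess == secret:
            return tries


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 300
    secrets = [tuple(rng.randrange(BASE) for _ in range(DIGITS)) for _ in range(trials)]
    avg_methodical = sum(methodical_search(s) for s in secrets) / trials
    avg_random = sum(random_search(s, rng=rng) for s in secrets) / trials
    print(f"methodical: worst case {BASE ** DIGITS}, average {avg_methodical:.0f}")
    print(f"random:     no worst-case bound, average {avg_random:.0f}")
```

The design choice being illustrated is simply that a search which remembers what it has already tried dominates one that doesn't; randomness adds nothing here except the chance of wasted repeats.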
There are several grounds for criticism here. Criticizing CEV by saying, "I think CEV will lead to good dogs, because that's what a lot of people would like," sounds valid to me, but would merit more argumentation (on both sides).
Another problem I mentioned is possibly fundamental to CEV. When CEV assumes that reasoned extrapolation trumps all existing values, is it legitimate to say that this is not the same as asserting that reason is the primary value? You could argue that reason is just an engine in service of some other value. There's some evidence that that actually works, as demonstrated by the theologians of the Roman Catholic Church, who have a long history of using reason to defeat reason. But I'm not convinced that makes sense. If it doesn't, then it means that CEV already assumes from the start the very kind of value that its entire purpose is to prevent being assumed.
Third, most human values, like dog-values, are neutral with respect to rationality or threatened by rationality. The dog itself needs to not be much more rational or intelligent than it is.
The only solution is to say that the rationality and the values are in the FAI sysop, while the conscious locus of the values is in the humans. That is, the sysop gets smarter and smarter, with dog-values as its value system. It knows that to get the experiential value out of dog-values, the conscious experiencer needs limited cognition; but that's okay, because the humans are the designated experiencers, while the FAI is the designated thinker and keeper-of-the-values.
There are two big problems with this.
By keeping the locus of consciousness out of the sysop, we're steering dangerously close to one of the worst-possible-of-all-worlds, which is building a singleton that, one way or the other, eventually ends up using most of the universe's computational energy, yet is not itself conscious. That's a waste of a universe.
Value systems are deictic, meaning they use the word "I" a lot. To interpret their meaning, you fill in the "I" with the identity of the reasoning agent. The sysop literally can't have human values if it doesn't have deictic values; and if it has deictic values, they're not going to stay doglike under extrapolation. (You could possibly get around this by using a non-deictic representation, and saying that the values have meaning only when seen in light of the combined sysop+humans system. Like the knowledge of Chinese in Searle's Chinese room.)
The FAI document says it's important to use non-deictic representations in the AI. Aside from the fact that this is probably impossible - cognition is compression, and deictic representations are much more compact, so any intelligence is going to end up using something equivalent to deictic representations - I don't know if it's meaningful to talk about non-deictic values. That would be like saying "I value the taste of chocolate" without saying who is tasting the chocolate. (That's one entry-point into paperclipping scenarios.)
The final, biggest problem illustrated by dog-values is that it's just not sensible to preserve "human values", when human values, even those found within the same person at different times of life, are as different as it is possible for values to be different. Sure, maybe we would have different values if we could see in the ultraviolet, or had seven sexes; but there is just no bigger difference between values than "valuing states of the external world", and "valuing phenomenal perceptions within my head." And there are already humans committed to each of those two fundamental value systems.
You have a point here. But as you mentioned, we aren't really capable of such a state, nor would it be virtuous to chase after one.
You guys have totally lost me with this AI stuff. I guess there's probably a sequence on it somewhere...