Kevin comments on Open Thread: February 2010, part 2 - Less Wrong
Objections to Coherent Extrapolated Volition
http://www.singinst.org/blog/2007/06/13/objections-to-coherent-extrapolated-volition/
Some quibbles:
These need seed content, but seem like they can be renormalized.
This may be a problem, but it seems to me that choosing this particular example, and being as confident of it as you appear to be, are symptomatic of an affective death spiral.
The original CEV proposal appears to me to endorse using something like a CFAI-style controlled ascent rather than blind FOOM: "A key point in building a young Friendly AI is that when the chaos in the system grows too high (spread and muddle both add to chaos), the Friendly AI does not guess. The young FAI leaves the problem pending and calls a programmer, or suspends, or undergoes a deterministic controlled shutdown."
What you're looking for is a way to construe the extrapolated volition that washes out superstition and dementation.
To the extent that vengefulness turns out to be a simple direct value that survives under many reasonable construals, it seems to me that one simple and morally elegant solution would be to filter, not the people, but the spread of their volitions, by the test, "Would your volition take into account the volition of a human who would unconditionally take into account yours?" This filters out extrapolations that end up perfectly selfish and those which end up with frozen values irrespective of what other people think - something of a hack, but it might be that many genuine reflective equilibria are just like that, and only a values-based decision can rule them out. The "unconditional" qualifier is meant to rule out TDT-like considerations, or they could just be ruled out by fiat, i.e., we want to test for cooperation in the Prisoner's Dilemma, not in the True Prisoner's Dilemma.
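The filtering test above can be caricatured as a toy sketch. Everything here is an illustrative assumption: `filter_volitions`, `takes_into_account`, and the stand-in "unconditional cooperator" are hypothetical names, not anything from the CEV proposal itself.

```python
# Toy sketch (hypothetical): filter the *spread* of extrapolated
# volitions, keeping only those that would take into account the
# volition of a human who would unconditionally take into account theirs.

def filter_volitions(extrapolations, takes_into_account):
    """extrapolations: list of candidate extrapolated volitions.
    takes_into_account(v, other) -> bool: whether extrapolation v gives
    any weight to `other`'s volition. UNCONDITIONAL_COOPERATOR is a
    stand-in for a human who unconditionally weighs v's volition."""
    UNCONDITIONAL_COOPERATOR = object()
    return [v for v in extrapolations
            if takes_into_account(v, UNCONDITIONAL_COOPERATOR)]

# Two crude caricatures of reflective equilibria:
perfectly_selfish = {"weighs_others": False}
reciprocator = {"weighs_others": True}

kept = filter_volitions(
    [perfectly_selfish, reciprocator],
    lambda v, other: v["weighs_others"])
assert kept == [reciprocator]  # the perfectly selfish one is filtered out
```

Note the filter acts on extrapolations, not people: a person whose extrapolation fails the test is not excluded, only that portion of the spread.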
It's possible that having a complete mind design on hand would mean that there were no philosophy problems left, since the resources that human minds have to solve philosophy problems are finite, and knowing the exact method to use to solve a philosophy problem usually makes solving it pretty straightforward (the limiting factor on philosophy problems is never computing power). The reason why I pick on this particular cited problem as problematic is that, as stated, it involves an inherent asymmetry between the problems you want the AI to solve and your own understanding of how to meta-approach those problems, which is indeed a difficult and dangerous sort of state.
All approaches to superintelligence, without exception, have this problem. It is not quite as automatically lethal as it sounds (though it is certainly automatically lethal to all other parties' proposals for building superintelligence). You can build in test cases and warning criteria beforehand to your heart's content. You can detect incoherence and fail safely instead of doing something incoherent. You could, though it carries with its own set of dangers, build human checking into the system at various stages and with various degrees of information exposure. But it is the fundamental problem of superintelligence, not a problem of CEV.
I will not lend my skills to any such thing.
Is that just a bargaining position, or do you truly consider that no human values surviving is preferable to allowing an "unfair" weighing of volitions?
"Would your volition take into account the volition of a human who would unconditionally take into account yours?"
Doesn't this still give them the freedom to weight that volition as small as they like?
I wish you had written this a few weeks earlier, because it's perfect as a link for the "their associated difficulties and dangers" phrase in my "Complexity of Value != Complexity of Outcome" post.
Please consider upgrading this comment to a post, perhaps with some links and additional explanations. For example, what is the ontology problem in ethics?
In practice, I find that this is never a problem. You usually rest your values on some intuitively obvious part of whatever originally caused you to create the concepts in question.
I think mind copying technology may be a better illustration of the subjective anticipation problem than MW QM, but I agree that it's a good example of the ontology problem. BTW, do you have a reference for where the ontology problem was first stated, in case I need to reference it in the future?
Thanks for the pointer, but I think the argument you gave in that post is wrong. You argued that an agent smaller than the universe has to represent its goals using an approximate ontology (and therefore would have to later re-phrase its goals relative to more accurate ontologies). But such an agent can represent its goals/preferences in compressed form, instead of using an approximate ontology. With such compressed preferences, it may not have the computational resources to determine with certainty which course of action best satisfies its preferences, but that is just a standard logical uncertainty problem.
I think the ontology problem is a real problem, but it may just be a one-time problem, where we or an AI have to translate our fuzzy human preferences into some well-defined form, instead of a problem that all agents must face over and over again.
I invented it sometime around the dawn of time, don't know if Marcello did in advance or not.
Actually, I don't know if I could have claimed to invent it, there may be science fiction priors.
Useful and interesting list, thanks.
I thought the point of defining CEV as what we would choose if we knew better was (partly) that you wouldn't have to subset. We wouldn't be superstitious, vengeful, and so on if we knew better.
Also, can you expand on what you mean by "Rawlsian Reflective Equilibrium"? Are you referring (however indirectly) to the "veil of ignorance" concept?
http://plato.stanford.edu/entries/reflective-equilibrium/
I am only part way through but I really recommend that link. So far it's really helped me think about this.
The rest of Rawls' Theory of Justice is good too. I'm trying to figure out for myself (before I finally break down and ask) how CEV compares to the veil of ignorance.
Learning about the game-theoretic roots of a desire seems to generally weaken its force, and makes it apparent that one has a choice about whether or not to retain it. I don't know what fraction of people would choose in such a state not to be vengeful, though. (Related: 'hot' and 'cold' motivational states. CEV seems to naturally privilege cold states, which should tend to reduce vengefulness, though I'm not completely sure this is the right thing to do rather than something like a negotiation between hot and cold subselves.)
What it's like to be hurt is also factual knowledge, and seems like it might be extremely motivating towards empathy generally.
Why do you think it likely that people would retain that evaluative judgment upon losing the closely coupled beliefs? Far more plausibly, they could retain the general desire to punish violations of conservative social norms, but see above.
"If we knew better" is an ambiguous phrase, I probably should have used Eliezer's original: "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". That carries a lot of baggage, at least for me.
I don't experience (significant) desires of revenge, so I can only extrapolate from fictional evidence. Say the "someone" in question killed a loved one, and I wanted to hurt them for that. Suppose further that they were no longer able to kill anyone else. Given the time and the means to think about it clearly, I could see that hurting them would not improve the state of the world for me, or for anyone else, and would only impose further unnecessary suffering.
The (possibly flawed) assumption of CEV, as I understood it, is that if I could reason flawlessly, non-pathologically about all of my desires and preferences, I would no longer cleave to the self-undermining ones, and what remains would be compatible with the non-self-undermining desires and preferences of the rest of humanity.
Caveat: I have read the original CEV document but not quite as carefully as maybe I should have, mainly because it carried a "Warning: obsolete" label and I was expecting to come across more recent insights here.
I find it interesting that there seems to be a lot of variation in people's views regarding how much coherence there'd be in an extrapolation... You say that choosing the right group of humans is important, while I'm under the impression that there is no such problem; basically everyone should be in the game, and making higher-level considerations about which humans to include is merely an additional source of error. Nevertheless, if there really will be as much coherence as I think, and I think there'd be a hell of a lot, picking some subset of humanity would pretty much produce a CEV that is very akin to the CEVs of other possible human groups.
I think that even being an Islamic radical fundamentalist is a petty factor in overall coherence. If I'm correct, Vladimir Nesov has said several times that people can be wrong about their values, and I pretty much agree. Of course, there is an obvious caveat that it's rather shaky to guess what other people's real values might be. Saying "You're wrong about your professed value X; your real value is along the lines of Y because you cannot possibly diverge that much from the psychological unity of mankind" also risks seeming like claiming excessive moral authority. Still, I think it is a potentially valid argument, depending on the exact nature of X and Y.
I'd ask Omega, "Which construal of volition are you using?"
There's light in us somewhere, a better world inside us somewhere, the question is how to let it out. It's probably more closely akin to the part of us that says "Wouldn't everyone getting their wishes really turn out to be awful?" than the part of us that thinks up cool wishes. And it may even be that Islamic fundamentalists just don't have any note of grace in them at all, that there is no better future written in them anywhere, that every reasonable construal of them ends up with an atheist who still wants others to burn in hell; and if so, the test I cited in the other comment, about filtering portions of the extrapolated volition that wouldn't respect the volition of another who unconditionally respected theirs, seems like it ought to filter that.
The lives of most evildoers are of course largely incredibly prosaic, and I find it hard to believe their values in their most prosaic doings are that dissimilar from everyone else around the world doing prosaic things.
I think part of the point of what you call "moral anti-realism" is that it frees up words like "evil" so that they can refer to people who have particular kinds of "should function", since there's nothing cosmic that the word could be busy referring to instead.
If I had to offer a demonology, I guess I might loosely divide evil minds into: 1) those capable of serious moral reflection but avoiding it, e.g. because they're busy wallowing in negative other-directed emotion, 2) those engaging in serious moral reflection but making cognitive mistakes in doing so, 3) those whose moral reflection genuinely outputs behavior that strongly conflicts with (the extension of) one's own values. I think 1 comes closest to what's traditionally meant by "evil", with 2 being more "misguided" and 3 being more "Lovecraftian". As I understand it, CEV is problematic if most people are "Lovecraftian" but less so if they're merely "evil" or "misguided", and I think you may in general be too quick to assume Lovecraftianity. (ETA: one main reason why I think this is that I don't see many people actually retaining values associated with wrong belief systems when they abandon those belief systems; do you know of many atheists who think atheists or even Christians should burn in hell?)
Alternately: They're evil. They have a very different 'should function' to me.
Consider the distinction between whether the output of a preference-aggregation algorithm will be very different for the Angolan Christian, and whether it should be very different. Some preference-aggregation algorithms may just be confused into giving diverging results because of inconsequential distinctions, which would be bad news for everyone, even the "enlightened" westerners.
(To be precise, the relevant factual statement is about whether any two same-culture people get preferences visibly closer to each other than any two culturally distant people. It's like with the relatively small genetic relevance of skin color, where within-race variation is greater than between-race variation.)
I think we agree about this actually - several people's picture of someone with alien values was an Islamic fundamentalist, and they were the "evildoers" I have in mind...
Eliezer has already talked about this and argued that the right thing would be to run the CEV on the whole of humanity, basing himself partly on the argument that if some particular group (not us) got control of the programming of the AI, we would prefer that they run it on the whole of humanity rather than running it on themselves.
The right thing for me to do is to run CEV on myself, almost by definition. The CEV oracle that I am using to work out my CEV can dereference the dependencies to other CEVs better than I can.
If truly, really wildly different? Obviously, I'd just disassemble them to useful matter via nanobots.
No, not obviously; I can't say I've ever seen anyone else claim to completely condition their concern for other people on the possession of similar reflective preferences.
(Or is your point that they probably wouldn't stay people for very long, if given the means to act on their reflective preferences? That wouldn't make it OK to kill them before then, and it would probably constitute undesirable True PD defection to do so afterwards.)
Well, my above reply was a bit tongue-in-cheek. My concern for other things in general is just as complex as my morality and it contains many meta elements such as "I'm willing to modify my preference X in order to conform to your preference Y because I currently care about your utility to a certain extent". On the simplest level, I care for things on a sliding scale that ranges from myself to rocks or Clippy AIs with no functional analogues for human psychology (pain, etc.). Somebody with a literally wildly differing reflective preference would not be a person and, as you say, would be preferably dealt with in True PD manners rather than ordinary human-human altruism contaminated interactions.
This is a very nonstandard usage; personhood is almost universally defined in terms of consciousness and cognitive capacities, and even plausibly relevant desire-like properties like boredom don't have much to do with reflective preference/volition.
Maybe I'm crazy but all that doesn't sound so hard.
More precisely, there's one part, the solution to which should require nothing more than steady hard work, and another part which is so nebulous that even the problems are still fuzzy.
The first part - requiring just steady hard work - is everything that can be reduced to existing physics and mathematics. We're supposed to take the human brain as input and get a human-friendly AI as output. The human brain is a decision-making system; it's a genetically encoded decision architecture or decision architecture schema, with the parameters of the schema being set in the individual by genetic or environmental contingencies. CEV is all about answering the question: If a superintelligence appeared in our midst, what would the human race want its decision architecture to be, if we had time enough to think things through and arrive at a stable answer? So it boils down to asking, if you had a number of instances of the specific decision architecture human brain, and they were asked to choose a decision architecture for an entity of arbitrarily high intelligence that was to be introduced into their environment, what would be their asymptotically stable preference? That just doesn't sound like a mindbogglingly difficult problem. It's certainly a question that should be answerable for much simpler classes of decision architecture.
So it seems to me that the main challenge is simply to understand what the human decision architecture is. And again, that shouldn't be beyond us at all. The human genome is completely sequenced, we know the physics of the brain down to nucleons, there's only a finite number of cell types in the body - yes it's complicated, but it's really just a matter of sticking with the problem. (Or would be, if there was no time factor. But how to do all this quickly is a separate problem.)
So to sum up, all we need to do is to solve the decision theory problem 'if agents X, Y, Z... get to determine the value system and cognitive architecture of a new, superintelligent agent A which will be introduced into their environment, what would their asymptotic preference be?'; correctly identify the human decision architecture; and then substitute this for X, Y, Z... in the preceding problem.
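The "easy part" as summed up above can be pictured, very loosely, as a fixed-point computation: iterate an idealized deliberation step until the preferred architecture stops changing. This is a hypothetical caricature; `asymptotic_preference` and `deliberate` are illustrative names, and nothing here is an actual proposal for how the computation would be done.

```python
# Illustrative caricature: the "asymptotically stable preference" as a
# fixed point of a deliberation step that updates the agents' current
# candidate decision architecture for the new superintelligent agent A.

def asymptotic_preference(agents, deliberate, initial, max_rounds=1000):
    """agents: the decision architectures X, Y, Z...
    deliberate(agents, current) -> next candidate architecture for A
    after one more round of idealized reflection (hypothetical function).
    Iterates until the candidate stops changing, i.e. is stable."""
    current = initial
    for _ in range(max_rounds):
        nxt = deliberate(agents, current)
        if nxt == current:  # reached a stable (fixed-point) answer
            return current
        current = nxt
    raise RuntimeError("no stable preference found within max_rounds")

# A trivial stand-in deliberation step that converges to 42:
step = lambda agents, x: min(x + 1, 42)
print(asymptotic_preference(["X", "Y", "Z"], step, 0))  # 42
```

The sketch also makes the obvious failure mode visible: if deliberation never converges (cycles, or diverges), there simply is no asymptotically stable preference to extract.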
That's the first part, the 'easy' part. What's the second part, the hard but nebulous part? Everything to do with consciousness, inconceivable future philosophy problems, and so forth. Now what's peculiar about this situation is that the existence of nebulous hard problems suggests that the thinker is missing something big about the nature of reality, and yet the easy part of the problem seems almost completely specified. How can the easy part appear closed, an exactly specified problem simply awaiting solution, and yet at the same time, other aspects of the overall task seem so beyond understanding? This contradiction is itself something of a nebulous hard problem.
Anyway, achieving the CEV agenda seems to require a combination of steady work on a well-defined problem where we do already have everything we need to solve it, and rumination on nebulous imponderables in the hope of achieving clarity - including clarity about the relationship between the imponderables and the well-defined problem. I think that is very doable - the combination of steady work and contemplation, that is. And the contemplation is itself another form of steady work - steadily thinking about the nebulous problems, until they resolve themselves.
So long as there are still enigmas in the existential equation we can't be sure of the outcome, but I think we can know, right now, that it's possible to work on the problem (easy and hard aspects alike) in a systematic and logical way.
Isn't this one of the problems you can let the FAI solve?
And what if preferences cannot be measured by a common "ruler"? What then?
Could you clarify for me what you mean by requiring that a human consciousness be instantiated? Is it that you don't believe it is possible to elicit a CEV from a human unless instantiation is involved, or that you object to the consequences of simulating human consciousnesses in potentially undesirable situations?
In the case of the latter I observe that this is only a problem under certain CEVs and so is somewhat different in nature to the other requirements. Some people's CEVs could then be extracted more easily than others.
I am nowhere near caught up on FAI readings, but here is a humble thought.
What I have read so far seems to assume a single-jump FAI. That is, once the FAI is set, it must take us to where we ultimately want to go without further human input. Please correct me if I am wrong.
What about a multistage approach?
The problem that people might immediately bring up is that a multistage approach might lead to elevating subgoals to goals. We say, "take us to mastery of nanotech," and the AI decides to rip us apart and organize all existing ribosomes under a coherent command.
However, perhaps what we need to do is verify that any intermediate goal state is better than the current state.
So what if we have the AI guess a goal state, then simulate that goal state and expose some subset of humans to that simulation? The AI then asks, "Proceed to this stage or not?" The humans answer.
Once in the next stage we can reassess.
To give a sense of motivation: it seems that verifying the goodness of a future state is easier than trying to construct the basic rules of good-statedness.
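The multistage approach described above can be sketched as a simple propose-simulate-approve loop. All of the names here (`multistage`, `propose`, `simulate`, `panel_approves`) are hypothetical stand-ins for the AI's guess, its simulation, and the human vote; this is a toy illustration, not a workable safety mechanism.

```python
# Hypothetical sketch of the multistage approach: the AI proposes an
# intermediate goal state, simulates it, shows the simulation to a
# human panel, and proceeds only if they judge it better than now.

def multistage(current_state, propose, simulate, panel_approves, steps=10):
    """propose(state) -> candidate goal state (the AI's guess).
    simulate(state) -> a preview shown to the human subset.
    panel_approves(current, preview) -> the humans' yes/no answer."""
    for _ in range(steps):
        candidate = propose(current_state)   # AI guesses a goal state
        preview = simulate(candidate)        # expose a simulation of it
        if not panel_approves(current_state, preview):
            break                            # humans say "no": stop here
        current_state = candidate            # proceed, then reassess
    return current_state

# Toy run: states are integers, "better" means strictly larger, and the
# panel refuses anything past 3.
final = multistage(0,
                   propose=lambda s: s + 1,
                   simulate=lambda s: s,
                   panel_approves=lambda cur, new: cur < new <= 3)
print(final)  # 3
```

The sketch makes the key design choice explicit: verification happens on each intermediate state (the comparison in `panel_approves`), rather than trying to specify good-statedness once up front.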