wedrifid comments on In favour of a selective CEV initial dynamic - Less Wrong Discussion
I am wary of using arguments along the lines of "CEV<not everyone> is better for everyone than CEV<everyone>". If calculating based on a subset happens to be the most practical instrumentally useful hack for implementing CEV<everyone> then an even remotely competent AI can figure that out itself.
I would still implement the CEV<subset> option but I'd do it for real reasons.
That something you want to say in public?
Yes. I really don't want the volition of psychopaths, suicidal fanatics and jerks in general to be extrapolated in such a way as it could destroy all that I hold dear. Let this be my solemnly sworn testimony made public where all can see. Allow me (wedrifid_2011) to commit to my declaration of my preferences as of the 21st of October by requesting that you quote me, leaving me unable to edit it away.
Yes, but now they see you coming.
You are treading on treacherous moral ground! Your "jerk" may be my best mate (OK, he's a bit intense... but you are no angel either!). Your "suicidal fanatic" may be my hero. As for psychopaths, see this.
Also, I can understand "I really don't want the volition of ANYONE to be extrapolated in such a way as it could destroy all that I hold dear" - why pick on psychopaths, suicidal fanatics and jerks in particular?
If so then I don't want your volition extrapolated either. Because that would destroy everything I hold dear as well (given the extent to which you would either care about their dystopic values yourself or care about them getting those same values achieved).
I obviously would prefer an FAI to extrapolate only MY volition. Any other preference is a trivial reductio to absurdity. The reason to support the implementation of an FAI that extrapolates more generally is so that I can cooperate with other people whose preferences are not too much different to mine (and in some cases may even resolve to be identical). Cooperative alliances are best formed with people with compatible goals and not those whose success would directly sabotage your own.
Do I need to write a post "Giving a few examples does not assert a full specification of a set"? I'm starting to feel the need to have such a post to link to pre-emptively.
How? As is, psychopaths have some influence, and I don't consider the world worthless. Whatever their slice of a much larger pie, how would that be a difference in kind, something other than a lost opportunity?
There is a reasonably good chance that, when averaged out by the currently unspecified method used by the CEV process, any abominable volitions are offset by volitions that are at least vaguely acceptable. But that doesn't mean including Jerks (where 'Jerk' is defined as an agent whose extrapolated volition is deprecated) in the process that determines the fate of the universe is The Right Thing To Do, any more than including paperclippers, superhappies and babyeaters in the process is obviously The Right Thing To Do.
CEV<all of the humans> might turn out OK. Given the choice between setting loose a {Superintelligence Optimising CEV<all of the humans>} or {Nothing At All, and we all go extinct}, I'll choose the former. There are also obvious political reasons why such a compromise might be necessary.
If anyone thinks that CEV<all of the humans> is not a worse thing to set loose than CEV<all of the humans that are not obviously Jerks>, then they are not being altruistic or moral; they are being confused about a matter of fact.
Disclaimer that is becoming almost mandatory in this kind of discussion: altruism, ethics and morality belong inside utility functions and volitions, not in game theory or abstract optimisation processes.
Sure, inclusion is a thing that causes good and bad outcomes, and not necessarily net good outcomes.
Sure, but it's not logically necessary that it's a compromise, though it might be. It might be that the good outweighs the bad, or not, I'm not sure from where I stand.
Because I value inclusiveness more than zero, that's not necessarily true. It's probably true, or better yet, if one includes the best of the obvious Jerks along with the rest of humanity, it's quite probably true. All else equal, I'd rather an individual be in than out, so if someone is, all else equal, worse than useless but only light ballast, having them is still a net good.
It's Adam and Eve, not Adam and Vilfredo Pareto!
Huh? Chewbacca?
I think your distinction is artificial. Can you use it to show how one example question is a wrong question and another isn't, and show how your distinction sorts between those two types?
You are a jerk!
. . . .
See where this approach gets us?
Not anywhere closer to understanding how altruism and morality apply to extrapolated volition for a start.
Note that the conditions attached to the quote but not included here are rather significant. Approximately, it is conditional on your volition being to help other agents do catastrophically bad things to the future light cone.
What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not rule out (or even make unlikely) an outcome that takes into consideration all the preferences of those who are not safe to include (while simply ignoring the obnoxiously toxic ones).
I barely understand this sentence. Do you mean: Excluding "jerks" from CEV does not guarantee that their destructive preferences will not be included?
If so, I totally do not agree with you, as my opinion is: Including "jerks" in CEV will not pose a danger, and saves the trouble of determining who is a "jerk" in the first place.
This is based on the observation that "jerks" are a minority, an opinion that "EV-jerks" are practically non-existent, and an understanding that where a direct conflict exists between the EV of a minority and the EV of a majority, it is the EV of the majority that will prevail in the CEV. If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.
I hope you are right. But that is what it is, hope. I cannot know with any confidence that an Artificial Intelligence implementing CEV<every human> is Friendly. I cannot know if it will result in me and the people I care about continuing to live. It may result in something that, say, Robin Hanson considers desirable (and that I would consider worse than simple extinction).
Declaring CEV<humanity> to be optimal amounts to saying "I have faith that everyone is all right on the inside and we would all get along if we thought about it a bit more." Bullshit. That's a great belief to have if you want to signal your personal ability to enforce cooperation in your social environment, but not a belief that you want actual decision makers to have. Or, at least, not one you want them to simply assume without huge amounts of both theoretical and empirical research.
(Here I should again refer you to the additional safeguards Eliezer proposed/speculated on in case CEV<humanity> results in Jerkiness. This is the benefit of being able to acknowledge that CEV<humanity> isn't good by definition. You can plan ahead just in case!)
It is primarily a question of understanding (and being willing to understand) the content.
You don't know that. Particularly since EV is not currently sufficiently defined to make any absolute claims. EV doesn't magically make people nice or especially cooperative unless you decide to hack in a "make nicer" component to the extrapolation routine.
You don't know that either. The 'coherence' part of CEV is even less specified than the EV part. Majority rule is one way of resolving conflicts between competing agents. It isn't the only one. But I don't even know that AI<CEV<'majority'>> results in something I would consider Friendly. Again, there is a decent chance that it is not-completely-terrible but that isn't something to count on without thorough research and isn't an ideal to aspire to either. Just something that may need to be compromised down to.
One possibility is a system inclined to shut down rather than do anything that is not neutral-or-better from every perspective. Such a system is pretty likely useless, but also likely to be safe, and not certainly useless. Variants allow some negatives, but I don't know how one would draw a line - allowing everyone a veto and requiring negotiation with them would be pretty safe, but also nearly useless.
I'm not sure exactly what you're implying, so I'll state something you may or may not agree with. It seems likely that it makes people more cooperative in some areas and has unknown implications in others, so whether it makes them ultimately more or less cooperative is unknown. But the little we can see is of cooperation increasing, and it would be unreasonable to be greatly surprised if that were found to be the overwhelming net effect.
As most possible minds don't care about humans, I object to using "unfriendly" to mean "an AI that would result in a world that I don't value." I think it better to use "unfriendly" to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.
Through me, my dog is included. All the more so mothers' sons!
I don't think this is true, the safeguard that's safe is to shut down if a conflict exists. That way, either things are simply better or no worse; judging between cases when each case has some advantages over the other is tricky.
You missed my point 3 times out of 3. Wait, I'll put down the flyswatter and pick up this hammer...:
Excluding certain persons from CEV creates issues that CEV was intended to resolve in the first place. The mechanic you suggest - excluding persons that YOU deem to be unfit - might look attractive to you, but it will not be universally acceptable.
Note that "our coherent extrapolated volition is our wish if we knew more, were smarter..." etc . The EVs of yourself and that suicidal fanatic should be pretty well aligned - you both probably value freedom, justice, friendship, security and like good food, sex and World of Warcraft(1)... you just don't know why he believes that suicidal fanaticism is the right way under his circumstances, and he is, perhaps, not smart enough to see other options to strive for his values.
Can I also ask you to re-read CEV, paying particular attention to Q4 and Q8 in the PAQ section? They deal with the instinctive discomfort of including everyone in the CEV.
(1) that was a backhand with the flyswatter, which I grabbed with my left hand just then.
No. I will NOT assume that extrapolating the volition of people with vastly different preferences to mine will magically make them compatible with mine. The universe is just not that convenient. Pretending it is while implementing an FAI is suicidally naive.
I'm familiar with the document, as well as approximately everything else said on the subject here, even in passing. This includes Eliezer proposing ad-hoc workarounds to the "What if people are jerks?" problem.
What do you mean? As an analogy, .01% sure and 99.99% sure are both states of uncertainty. EVs are exactly the same or they aren't. If someone's unmuddled EV is different from mine - and it will be - I am better off with mine influencing the future alone rather than the future being influenced by both of us, unless my EV sufficiently values that person's participation.
My current EV places some non-infinite value on each person's participation. You can assume for the sake of argument each person's EV would more greatly value this.
You can correctly assume that, for each person, all else equal, I'd rather have them than not (though not necessarily at the cost of having the universe diverted from my wishes), but I don't really see why the death of most of the single ring species that is everything alive today makes selecting humans alone for CEV the right thing to do in a way that avoids the problem of excluding the disenfranchised whom the creators don't care sufficiently about.
If enough humans value what other humans want, and more so when extrapolated, it's an interlocking enough network to scoop up all humans, but the biologist who spends all day with chimpanzees (dolphins, octopuses, dogs, whatever) is going to be a bit disappointed by the first-order exclusion of his or her friends from consideration.
I mean, once they both take pains to understand each other's situation and have a good, long think about it, they would find that they will agree on the big issues and be able to easily accommodate their differences. I even suspect that overall they would value the fact that certain differences exist.
EVs can, of course, be exactly the same, or differ to some degree. But - provided we restrict ourselves to humans - the basic human needs and wants are really quite consistent across an overwhelming majority. There is enough material (on the web and in print) to support this.
Wedrifid (IMO) is making the mistake of confusing some situation-dependent subgoals (like, say, "obliterate Israel" or "my way or the highway") with high-level goals.
I have not thought about extending CEV beyond human species, apart from taking into account the wishes of your example biologists etc. I suspect it would not work, because extrapolating wishes of "simpler" creatures would be impossible. See http://xkcd.com/605/.
You are mistaken. That I entertain no such confusion should be overwhelmingly clear from reading nearby comments.
That sounds awfully convenient. If there really is a threshold of how "non-simple" a lifeform has to be to have coherently extrapolatable volitions, do you have any particular evidence that humans clear that threshold and, say, dolphins don't?
For my part, I suspect strongly that any technique that arrives reliably at anything that even remotely approximates CEV for a human can also be used reliably on many other species. I can't imagine what that technique would be, though.
(Just for clarity: that's not to say one has to take other species' volition into account, any more than one has to take other individuals' volition into account.)
The lack of a threshold is exactly the issue. If you include dolphins and chimpanzees explicitly, you'd be in a position to apply the same reasoning to include parrots and dogs, then rodents and octopuses, and so on.
Eventually you'll slide far enough down this slippery slope to reach caterpillars and parasitic wasps. Now, what would a wasp want to do, if it understood how its acts affect the other creatures worthy of inclusion in the CEV?
This is what I see as the difficulty in extrapolating the wishes of simpler creatures. Perhaps in fact there is a coherent solution, but having only thought about this a little, I suspect there might not be one.
Just, uh... just making sure: you do know that wedrifid has more than fourteen thousand karma for a reason, right? It's actually not solely because he's an old-timer; he can be counted on to have thought about this stuff pretty thoroughly.
Edit: I'm not saying "defer to him because he has high status", I'm saying "this is strong evidence that he is not an idiot."
I admit to being a little embarrassed as I wrote that paragraph, because this sort of thing can come across as "fuck you". Not my intent at all, just that the reference is relevant, well written, supports my point - and is too long to quote.
Having said that, your comment is pretty stupid. Yes, he has heaps more karma here - so what? I have more karma here than R. Dawkins and B. Obama combined!
(I prefer "Godspeed!")
The "so what" is, he's already read it. Also, he's, you know, smart. A bit abrasive (or more than a bit), but still. He's not going to go "You know, you're right! I never thought about it that way, what a fool I've been!"
Edit: Discussed here.
I suppose "ethical egoism" fits. But only in some completely subverted "inclusive ethical egoist" sense in which my own "self interest" already takes into account all my altruistic moral and ethical values. ie. I'm basically not an ethical egoist at all. I just put my ethics inside the utility function where they belong.
I'm not sure this can mean one thing that is also important.
Huh? Yes it can. It means "results in something closer to CEV<everyone> than the alternative does", which is pretty damn important given that it is exactly what the context was talking about.
I agree that context alone pointed to that interpretation, but as that makes your statement a tautology, I thought it more likely than not you were referencing a more general meaning than the one under discussion. This was particularly so because of the connotations of "wary", i.e. "this sort of argument tends to seem more persuasive than it should, but the outside view doesn't rule them out entirely," rather than "arguments of this form are always wrong because they are logically inconsistent".
Because Phlebas's argument is not, in fact, tautologically false and is merely blatantly false I chose to refrain from a (false) accusation of inconsistency.
Here is the post that you linked to, in which you ostensibly prove that an excerpt of my essay was blatantly false:
Note that I have made no particular claim in this excerpt about how likely it is that a selective CEV would produce output closer to that of an ideal universal CEV dynamic than a universal CEV would. I merely claimed that a universal CEV dynamic designed by humans is not what humans most desire collectively “by definition”, i.e. it is not logically necessary that it approximates the ideal human-wide CEV (such as a superintelligence might develop) better than the selective CEV.
Here is a comment claiming that CEV most accurately identifies a group’s average desires “by definition” (assuming he doesn’t edit it). So it is not a strawman position that I am criticising in that excerpt.
You argue that even given a suboptimal initial dynamic, the superintelligent AI “can” figure out a better dynamic and implement that instead. Well of course it “can” – nowhere have I denied that the universal CEV might (with strong likelihood in fact) ultimately produce at least as close an approximation to the ideal CEV of humanity as a selective CEV would.
Nonetheless, high probability =/= logical necessity. Therefore you may wish to revisit your accusation of blatant fallacy.
How probable exactly is an interesting question, but best left alone in this comment since I don't wish to muddy the waters regarding the nature of the original statement that you were criticising.