
wedrifid comments on In favour of a selective CEV initial dynamic - Less Wrong Discussion

12 [deleted] 21 October 2011 05:33PM


Comment author: wedrifid 27 October 2011 09:53:51AM 0 points

See where this approach gets us?

Not anywhere closer to understanding how altruism and morality apply to extrapolated volition for a start.

Note that the conditions that apply to the quote but that are not included are rather significant. Roughly, it is conditional on your volition being to help other agents do catastrophically bad things to the future light cone.

What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not even rule out (or even make unlikely) the resultant outcome taking into consideration all the preferences of those who are not safe to include (and just ignoring the obnoxiously toxic ones).

Comment author: D_Alex 31 October 2011 05:22:05AM 0 points

What I am confident you do not understand is that excluding wannabe accomplices to Armageddon from the set of agents given to a CEV implementation does not even rule out (or even make unlikely) the resultant outcome taking into consideration all the preferences of those who are not safe to include (and just ignoring the obnoxiously toxic ones).

I barely understand this sentence. Do you mean: Excluding "jerks" from CEV does not guarantee that their destructive preferences will not be included?

If so, I totally do not agree with you, as my opinion is: Including "jerks" in CEV will not pose a danger, and saves the trouble of determining who is a "jerk" in the first place.

This is based on the observation that "jerks" are a minority, an opinion that "EV-jerks" are practically non-existent, and an understanding that where a direct conflict exists between EV of a minority and EV of a majority, it is the EV of a majority that will prevail in the CEV. If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.

Comment author: wedrifid 31 October 2011 06:48:18AM *  1 point

I hope you are right. But that is what it is: hope. I cannot know with any confidence that an Artificial Intelligence implementing CEV<every human> is Friendly. I cannot know if it will result in me and the people I care about continuing to live. It may result in something that, say, Robin Hanson considers desirable (and that I would consider worse than simple extinction).

Declaring CEV<humanity> to be optimal amounts to saying "I have faith that everyone is all right on the inside and we would all get along if we thought about it a bit more." Bullshit. That's a great belief to have if you want to signal your personal ability to enforce cooperation in your social environment, but not a belief that you want actual decision makers to have. Or, at least, not one you want them to simply assume without huge amounts of both theoretical and empirical research.

(Here I should again refer you to the additional safeguards Eliezer proposed/speculated on, in case CEV<humanity> results in Jerkiness. This is the benefit of being able to acknowledge that CEV<humanity> isn't good by definition: you can plan ahead just in case!)

If you disagree with any of these, please elaborate, but use a writing style that does not exceed the comprehension abilities of an M. Eng.

It is primarily a question of understanding (and being willing to understand) the content.

This is based on the observation that "jerks" are a minority, an opinion that "EV-jerks" are practically non-existent

You don't know that. Particularly since EV is not currently sufficiently defined to make any absolute claims. EV doesn't magically make people nice or especially cooperative unless you decide to hack in a "make nicer" component to the extrapolation routine.

and an understanding that where a direct conflict exists between EV of a minority and EV of a majority, it is the EV of a majority that will prevail in the CEV

You don't know that either. The 'coherence' part of CEV is even less specified than the EV part. Majority rule is one way of resolving conflicts between competing agents; it isn't the only one. But I don't even know that AI<CEV<'majority'>> results in something I would consider Friendly. Again, there is a decent chance that it is not-completely-terrible, but that isn't something to count on without thorough research, and it isn't an ideal to aspire to either. Just something that may need to be compromised down to.
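The point that majority rule is only one possible "coherence" mechanism, and that under it a minority's extrapolated volition simply loses, can be made concrete with a toy sketch. Nothing below comes from any actual CEV specification; the outcomes, utility functions, and the `majority_choice` helper are all invented for illustration.

```python
# Toy sketch only: "coherence" resolved by simple majority over candidate
# outcomes. Agents are modeled as utility functions over outcomes; every
# name and number here is hypothetical, not part of any CEV proposal.

def majority_choice(outcomes, agents):
    """Pick the outcome that the most agents rank first."""
    votes = {o: 0 for o in outcomes}
    for utility in agents:  # each agent votes for its top-ranked outcome
        best = max(outcomes, key=utility)
        votes[best] += 1
    return max(outcomes, key=lambda o: votes[o])

# Three agents share an extrapolated volition; one dissents entirely.
outcomes = ["A", "B"]
majority = lambda o: 1.0 if o == "A" else 0.0
minority = lambda o: 0.0 if o == "A" else 1.0
agents = [majority, majority, majority, minority]

print(majority_choice(outcomes, agents))  # "A": the minority's EV is simply outvoted
```

Under this rule the dissenter's preferences carry no weight at all once outvoted, which is one way of seeing why majority rule is not obviously Friendly from the minority's perspective.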

Comment author: lessdazed 31 October 2011 07:52:39AM 0 points

The 'coherence' part of CEV is even less specified than the EV part.

One possibility is a system inclined to shut down rather than do anything that is not neutral or better from every perspective. Such a system is probably useless, but it is also likely to be safe, and it is not certainly useless. Variants would allow some negatives, but I don't know how one would draw the line; allowing everyone a veto and requiring negotiation with them would be pretty safe, but also nearly useless.
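The shutdown-by-default variant described here can also be sketched as a toy decision rule: act only on an outcome that every included agent rates neutral-or-better than doing nothing; any veto leaves the system safely idle. Again, this is purely illustrative; the function and all utilities are invented, not drawn from any proposal.

```python
# Toy sketch of the veto/unanimity variant: an outcome is eligible only if
# no agent rates it below the status quo (shutdown). Otherwise the system
# does nothing. All names and values here are hypothetical.

STATUS_QUO = "shutdown"

def unanimous_or_shutdown(outcomes, agents):
    """Return the first outcome no agent rates below shutdown, else shut down."""
    for o in outcomes:
        if all(u(o) >= u(STATUS_QUO) for u in agents):
            return o
    return STATUS_QUO  # a single veto is enough to leave us safely idle

u1 = {"A": 2, "B": 1, "shutdown": 0}.get
u2 = {"A": -1, "B": 1, "shutdown": 0}.get  # u2 vetoes A

print(unanimous_or_shutdown(["A", "B"], [u1, u2]))  # "B": acceptable to everyone
```

The safety/uselessness trade-off is visible directly: the more agents hold vetoes, the smaller the set of eligible outcomes, until shutdown is all that remains.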

EV doesn't magically make people nice or especially cooperative

I'm not sure exactly what you're implying, so I'll state something you may or may not agree with. It seems likely that EV makes people more cooperative in some areas and has unknown implications in others, so whether it makes them ultimately more or less cooperative is unknown. But the little we can see is of cooperation increasing, and it would be unreasonable to be greatly surprised if that were found to be the overwhelming net effect.

But I don't even know that AI<CEV<'majority'>> results in something I would consider Friendly.

As most possible minds don't care about humans, I object to using "unfriendly" to mean "an AI that would result in a world that I don't value." I think it better to use "unfriendly" to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.

Comment author: wedrifid 31 October 2011 01:09:09PM *  0 points

As most possible minds don't care about humans, I object to using "unfriendly" to mean "an AI that would result in a world that I don't value." I think it better to use "unfriendly" to mean those minds indifferent to humans and the few hateful ones. Those that have value according to many but not all, such as perhaps those that seriously threaten to torture people, but only when they know those threatened will buckle, are better thought of as being a subspecies of Friendly AI.

I disagree. I will never refer to anything that wants to kill or torture me as friendly, because that would be insane. AIs that are friendly to certain other people but not to me are instances of uFAIs in the same way that paperclippers are uFAIs (that are Friendly to paperclips). I incidentally also reject FAI<babyeaters> and FAI<superhappies>, although in the latter case I would still choose it as an alternative to nothing (which likely defaults to extinction).

Mind you, the nomenclature isn't really sufficient to the task either way. I prefer to make my meaning clear of ambiguities: if I am talking about "Friendly" AI that will kill me, I tend to use the quotes that I just used, while if I am talking about something that is Friendly to a specific group, I'll parameterize.

Comment author: lessdazed 31 October 2011 04:21:53PM 0 points

I will never refer to anything that wants to kill or torture me as friendly

OK - this is included under what I would suggest calling "Friendly", certainly if it only wanted to do so instrumentally, so we have a genuine disagreement. This is a good example for you to raise, as most people even here might agree with how you put it.

Nonetheless, my example is not included under this, so let's be sure not to talk past each other. It was intended to be a moderate case, one in which you might not call something Friendly when many others here would*: one in which a being wouldn't desire to torture you, and would be bluffing in the sense, if not in others, that it had scrupulously avoided possible futures in which anyone would be tortured (i.e. it actually would torture you, if you chose the way you won't).

As for not killing you, that sounds like an obviously badly phrased genie wish. Since a point similar to the one you expressed would be reasonable and would fully contrast with mine, I'm surprised you added that.

One can go either way (or other or both ways) on this labeling. I am apparently buying into the mind-projection fallacy and trying to use "Friendly" the way terms like "funny" or "wrong" are regularly used in English. If every human but me finds something funny, it's often least confusing to say it's "a funny thing that isn't funny to me", or "something everyone else considers wrong that I don't consider wrong (according to the simplest way of dividing concept-space), which is also advantageous for me". You favor taking this new term, avoiding the MPF with it (unlike with other English terms), and having it be understood that listeners are never to infer meaning as if the speaker were committing it; I favor just using it like any other term.

So:

Mind you the nomenclature isn't really sufficient to the task either way

My way, a being that wanted to do well by some humans and not others would be objectively both Friendly and Unfriendly, so that might be enough to make my usage inferior. But if my molecules are made out of usefulonium and no one else's are, I very much mind a being exploiting me for that, but I wouldn't mind other humans calling that being friendly when it uses the usefulonium to shield the Earth from a supernova, or whatever - and it's not just not minding by comparison, either.

*I mean both when others refer to beings making analogous threats to them and to the one that would make them to you.

Comment author: lessdazed 31 October 2011 07:25:45AM *  0 points

Do you mean: Excluding "jerks" from CEV does not guarantee that their destructive preferences will not be included?

If so, I totally do not agree with you

Through me, my dog is included. All the more so mothers' sons!

an understanding that where a direct conflict exists between EV of a minority and EV of a majority, it is the EV of a majority that will prevail in the CEV.

I don't think this is true; the safeguard that's safe is to shut down if a conflict exists. That way, things are either simply better or no worse; judging between cases when each has some advantages over the other is tricky.