XiXiDu comments on What is Eliezer Yudkowsky's meta-ethical theory? - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (368)
It seems so utterly wrong to me that I concluded it must be me who simply doesn't understand it. Why would it be right to help people have more fun if helping people have more fun doesn't match up with your current preferences? The main reason I was able to abandon religion was realizing that what I want implies what is right. That still feels intuitively right. I didn't expect to see many people on LW argue that there exist preference/(agent/mind)-independent moral statements like 'it is right to help people' or 'killing is generally wrong'. I got a similar reply from Alicorn. Fascinating. This makes me doubt my own intelligence more than anything I've come across so far. If I parse this right, it would mean that a Paperclip Maximizer is morally bankrupt?
Well, something I've been noticing is that in the "tell your rationalist origin story" threads, the reasons a lot of people give for why they left their religion aren't actually valid arguments. Make of that what you will.
Yes. It is morally bankrupt. (or would you not mind turning into paperclips if that's what the Paperclip Maximizer wanted?)
BTW, your current position is more-or-less what theists mean when they say atheists are amoral.
Yes, but that is a matter of taste.
Why would I ever change my current position? If Yudkowsky told me there were moral laws written into the fabric of reality, what difference would that make? Either such laws are imperative, so that I am unable to escape them, or I simply ignore them if they oppose my preferences.
Assume all I wanted to do is to kill puppies. Now Yudkowsky told me that this is prohibited and I will suffer disutility because of it. The crucial question would be, does the disutility outweigh the utility I assign to killing puppies? If it doesn't, why should I care?
Perhaps you assign net utility to killing puppies. If you do, you do. What EY tells you, what I tell you, what is prohibited, etc., has nothing to do with it. Nothing forces you to care about any of that.
If I understand EY's position, it's that it cuts both ways: whether killing puppies is right or wrong doesn't force you to care, but whether or not you care doesn't change whether it's right or wrong.
If I understand your position, it's that what's right and wrong depends on the agent's preferences: if you prefer killing puppies, then killing puppies is right; if you don't, it isn't.
My own response to EY's claim is "How do you know that? What would you expect to observe if it weren't true?" I'm not clear what his answer to that is.
My response to your claim is "If that's true, so what? Why is right and wrong worth caring about, on that model... why not just say you feel like killing puppies?"
I don't think those terms are useless, or that morality doesn't exist. But you have to use those words with great care, because on their own they are meaningless. If I know what you want, I can approach the conditions that would be right for you. If I know how you define morality, I can act morally according to you. But I will do so only if I care about your preferences. If part of my preferences is to see other human beings happy, then I have to account for your preferences to some extent, which makes them a subset of my preferences. All those different values are then weighted accordingly. Do you disagree with that understanding?
I agree with you that your preferences account for your actions, and that my preferences account for my actions, and that your preferences can include a preference for my preferences being satisfied.
But I think it's a mistake to use the labels "morality" and "preferences" as though they are interchangeable.
If you have only one referent -- which it sounds like you do -- then I would recommend picking one label and using it consistently, and not use the other at all. If you have two referents, I would recommend getting clear about the difference and using one label per referent.
Otherwise, you introduce way too many unnecessary vectors for confusion.
It seems relatively clear to me that EY has two referents -- he thinks there are two things being talked about. If I'm right, then you and he disagree on something, and by treating the language of morality as though it referred to preferences you obscure that disagreement.
More precisely: consider a system S comprising two agents A and B, each of which has a set of preferences Pa and Pb, and each of which has knowledge of their own and the other's preferences. Suppose I commit an act X in S.
If I've understood correctly, you and EY agree that knowing all of that, you know enough in principle to determine whether X is right or wrong. That is, there isn't anything left over, there's no mysterious essence of rightness or external privileged judge or anything like that.
In this, both of you disagree with many other people, such as theists (who would say that you need to consult God's will to make that determination) and really really strict consequentialists (who would say that you need to consult the whole future history of the results of X to make that determination).
If I've understood correctly, you and EY disagree on symmetry. That is, if A endorses X and B rejects X, you would say that whether X is right or not is undetermined... it's right by reference to A, and wrong by reference to B, and there's nothing more to be said. EY, if I understand what he's written, would disagree -- he would say that there is, or at least could be, additional computation to be performed on S that will tell you whether X is right or not.
For example, if A = pebblesorters and X = sorting four pebbles into a pile, A rejects X, and EY (I think) would say that A is wrong to do so... not "wrong with reference to humans," but simply wrong. You would (I think) say that such a distinction is meaningless, "wrong" is always with reference to something. You consider "wrong" a two-place predicate, EY considers "wrong" a one-place predicate -- at least sometimes. I think.
For example, if A = SHFP and B = humans and X = allowing people to experience any pain at all, A rejects X and B endorses X. You would say that X is "right_human" and "wrong_SHFP" and that whether X is right or not is an insufficiently specified question. EY would say that X is right and the SHFP are mistaken.
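The arity distinction above can be sketched in code. This is a hypothetical illustration, not anything from the thread: the names, the toy preference table, and the choice of "human" as the fixed reference frame are all assumptions made for the example.

```python
# Toy preference table (illustrative only): humans endorse act X =
# "allow_any_pain", the SHFP reject it.
PREFERENCES = {
    "human": {"allow_any_pain"},
    "SHFP": set(),
}

def wrong_two_place(act, reference):
    """XiXiDu's reading: 'wrong' is always relative to some agent's
    preferences, so it takes the reference frame as a parameter."""
    return act not in PREFERENCES[reference]

def wrong_one_place(act):
    """EY's reading, as glossed here: 'wrong' rigidly designates
    evaluation against one fixed set of values, so the reference
    frame is baked into the definition rather than passed in."""
    return wrong_two_place(act, "human")

assert wrong_two_place("allow_any_pain", "SHFP")       # wrong_SHFP
assert not wrong_two_place("allow_any_pain", "human")  # right_human
assert not wrong_one_place("allow_any_pain")           # simply "right"
```

On the one-place reading, asking "wrong according to whom?" is like asking "cold according to whom?" of absolute zero: the function simply has no slot for the question.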
So, I disagree with your understanding, or at least your labeling, insofar as it leads you to elide real disagreements. I endorse clarity about disagreement.
As for whether I agree with your position or EY's, I certainly find yours easier to justify.
Thanks for this, very enlightening! A very good framing and analysis of my beliefs.
Yeah. While I'm reasonably confident that he holds the belief, I have no confidence in any theories of how he arrives at that belief.
What I have gotten from his writing on the subject is a combination of "Well, it sure seems that way to me," and "Well, if that isn't true, then I don't see any way to build a superintelligence that does the right thing, and there has to be a way to build a superintelligence that does the right thing." Neither of which I find compelling.
But there's a lot of the metaethics sequence that doesn't make much sense to me at all, so I have little confidence that what I've gotten out of it is a good representation of what's there.
It's also possible that I'm completely mistaken and he simply insists on "right" as a one-place predicate as a rhetorical trick; a way of drawing the reader's attention away from the speaker's role in that computation.
I am fairly sure EY would say (and I agree) that there's no reason to expect them to. Different agents with different preferences will have different beliefs about right and wrong, possibly incorrigibly different.
Humans and Babykillers as defined will simply never agree about how the universe would best be ordered, even if they come to agree (as a political exercise) on how to order the universe without the exercise of force (as the SHFP propose to do, for example).
Um.
Certainly, this model says that you can order world-states in terms of their rightness and wrongness, and there might therefore be a single possible world-state that's most right within the set of possible world-states (though there might instead be several possible world-states that are equally right and better than all other possibilities).
If there's only one such state, then I guess "right" could designate a future world state; if there are several, it could designate a set of world states.
But this depends on interpreting "right" to mean maximally right, in the same sense that "cold" could be understood to designate absolute zero. These aren't the ways we actually use these words, though.
I don't see what the concept of free will contributes to this discussion.
I'm fairly certain that EY would reject the idea that what's right is logically implied by cause and effect, if by that you mean that an intelligence that started out without the right values could somehow infer, by analyzing causality in the world, what the right values were.
My own jury is to some degree still out on this one. I'm enough of a consequentialist to believe that an adequate understanding of cause and effect lets you express all judgments about right and wrong action in terms of more and less preferable world-states, but I cannot imagine how you could derive "preferable" from such an understanding. That said, my failure of imagination does not constitute a fact about the world.
Humans and Babykillers are not talking about the same subject matter when they debate what-to-do-next, and their doing different things does not constitute disagreement.
There's a baby in front of me, and I say "Humans and Babykillers disagree about what to do next with this baby."
The one replies: "No, they don't. They aren't talking about the same subject when they debate what to do next; this is not a disagreement."
"Let me rephrase," I say. "Babykillers prefer that this baby be killed. Humans prefer that this baby have fun. Fun and babykilling can't both be implemented on the same baby: if it's killed, it's not having fun; if it's having fun, it hasn't been killed."
Have I left out anything of value in my restatement? If so, what have I left out?
More generally: given all the above, why should I care whether or not what humans and Babykillers have with respect to this baby is a disagreement? What difference does that make?
The fact that killing puppies is wrong follows from the definition of wrong. The fact that Eliezer does not want to do what is wrong is a fact about his brain, determined by introspection.
Because right is a rigid designator. It refers to a specific set of terminal values. If your terminal values don't match up with this specific set of values, then they are wrong, i.e. not right. Not that you would particularly care, of course. From your perspective, you only want to maximize your own values and no others. If your values don't match up with the values defined as moral, so much for morality. But you still should be moral because should, as it's defined here, refers to a specific set of terminal values - the one we labeled "right."
(Note: I'm using the term should exactly as EY uses it, unlike in my previous comments in these threads. In my terms, should=should_human and on the assumption that you, XiXiDu, don't care about the terminal values defined as right, should_XiXiDu =/= should)
I'm getting the impression that nobody here actually disagrees but that some people are expressing themselves in a very complicated way.
I parse your comment to mean that the definition of moral is a set of terminal values of some agents and should is the term that they use to designate instrumental actions that do serve that goal?
Your second paragraph looks correct. 'Some agents' refers to humanity rather than any group of agents. Technically, should is the term anything should use when discussing humanity's goals, at least when speaking Eliezer's language.
Your first paragraph is less clear. You definitely disagree with others. There are also some other disagreements.
Correct, I disagree. What I wanted to say with my first paragraph was that I might disagree because I don't understand what others believe because they expressed it in a way that was too complicated for me to grasp. You are also correct that I myself was not clear in what I tried to communicate.
ETA: That is, if you believe that disagreement fundamentally arises out of misunderstanding, as long as one is not talking about matters of taste.
In Eliezer's metaethics, all disagreements arise from misunderstanding. A paperclip maximizer agrees about what is right; it just has no reason to act correctly.
To whoever voted the parent down, this is (edit: nearly) exactly correct. A paperclip maximizer could, in principle, agree about what is right. It doesn't have to, I mean a paperclip maximizer could be stupid, but assuming it's intelligent enough, it could discover what is moral. But a paperclip maximizer doesn't care about what is right, it only cares about paperclips, so it will continue maximizing paperclips and only worry about what is "right" when doing so helps it create more paperclips. Right is a specific set of terminal values that the paperclip maximizer DOESN'T have. On the other hand you, being human, do have those terminal values on EY's metaethics.
Agreed that a paperclip maximizer can "discover what is moral," in the sense that you're using it here. (Although there's no reason to expect any particular PM to do so, no matter how intelligent it is.)
Can you clarify why this sort of discovery is in any way interesting, useful, or worth talking about?
It drives home the point that morality is an objective feature of the universe that doesn't depend on the agent asking "what should I do?"
Huh. I don't see how it drives home that point at all. But OK, at least I know what your intention is... thank you for clarifying that.
Fascinating. I still don't understand in what sense this could be true, except maybe the way I tried to interpret EY here and here. But those comments simply got downvoted without any explanation or attempt to correct me, therefore I can't draw any particular conclusion from those downvotes.
You could argue that morality (what is right?) is human, and that other species will agree that, from a human perspective, what is moral is right. Although I would agree, I don't understand how such a confusing use of terms is helpful.
I was the second to vote down the grandparent. It is not exactly correct. In particular it claims "all disagreement" and "a paperclip maximiser agrees", not "could in principle agree".
While the comment could perhaps be salvaged with some tweaks, as it stands it is not correct and would just serve to further obfuscate what some people find confusing as it is.
I concede that I was implicitly assuming that all agents have access to the same information. Other than that, I can think of no source of disagreements apart from misunderstanding. I also meant that if a paperclip maximizer attempted to find out what is right and did not make any mistakes, it would arrive at the same answer as a human, though there is not necessarily any reason for it to try in the first place. I did not think these distinctions were non-obvious, but this may be overconfidence on my part.
Can you say more about how the sufficiently intelligent paperclip maximizer goes about finding out what is right?
Yep, with the caveat that endoself added below: "should" refers to humanity's goals, no matter who is using the term (on EY's theory and semantics).
And if you modify this to say a certain subset of what you want -- the subset you'd still call "right" given omniscience, I think -- then it seems correct, as far as it goes. It just doesn't get you any closer to a more detailed answer, specifying the subset in question.
Or not much closer. At best it tells you not to worry that you 'are' fundamentally evil and that no amount of information would change that.
For what it's worth, I'm also one of those people, and I never did have religion. I don't know if there's a correlation there.
It is useful to think of right and wrong as being some agent's preferences. That agent doesn't have to be you - or even to exist IRL. If you are a sadist (no slur intended) you might want to inflict pain - but that would not make it "right" - in the eyes of conventional society.
It is fairly common to use "right" and "wrong" to describe society-level preferences.
Why would a sadistic Boltzmann brain conclude that it is wrong to be a sadistic Boltzmann brain? Whatever some society thinks is completely irrelevant to an agent with outlier preferences.
Morality serves several functions:
The lower items on the list have some significance, IMO.