TheAncientGeek comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (515)
Would you accept that an AI could figure out morality better than you?
Don't really want to go into the whole mess of "is morality discovered or invented", "does morality exist", "does the number 3 exist", etc. Let's just assume that you can point FAI at a person or group of people and get something that maximizes goodness as they understand it. Then FAI pointed at Mark would be the best thing for Mark, but FAI pointed at all of humanity (or at a group of people who donated to MIRI) probably wouldn't be the best thing for Mark, because different people have different desires, positional goods exist, etc. It would still be pretty good, though.
Mark was complaining he would not get "his" morality, not that he wouldn't get all his preferences satisfied.
Individual moralities make no sense to me, any more than private languages or personal currencies.
It is obvious to me that any morality will require concessions: AI-imposed morality is not special in that regard.
I don't understand your comment, and I no longer understand your grandparent comment either. Are you using a meaning of "morality" that is distinct from "preferences"? If yes, can you describe your assumptions in more detail? It's not just for my benefit, but for many others on LW who use "morality" and "preferences" interchangeably.
Do that many people really use them interchangeably? Would these people understand the questions "Do you prefer chocolate or vanilla ice-cream?" as completely identical in meaning to "Do you consider chocolate or vanilla as the morally superior flavor for ice-cream?"
I don't care about colloquial usage, sorry. Eliezer has a convincing explanation of why wishes are intertwined with morality ("there is no safe wish smaller than an entire human morality"). IMO the only sane reaction to that argument is to unify the concepts of "wishes" and "morality" into a single concept, which you could call "preference" or "morality" or "utility function", and just switch to using it exclusively, at least for AI purposes. I've made that switch so long ago that I've forgotten how to think otherwise.
I recommend you re-learn how to think otherwise so you can fool humans into thinking you're one of them ;-).
"Intertwined with" does not mean "the same as".
I am not convinced by the explanation. It also applies to non-moral preferences. If I have a lower-priority non-moral preference to eat tasty food, and a higher-priority preference to stay slim, I need to consider my higher-priority preference when wishing for yummy ice cream.
To be sure, an agent capable of acting morally will have morality among their higher-priority preferences -- it has to be among the higher-order preferences, because it has to override other preferences for the agent to act morally. Therefore, when they scan their higher-priority preferences, they will happen to encounter their moral preferences. But that does not mean any preference is necessarily a moral preference. And their moral preferences override other preferences, which are therefore non-moral, or at least less moral.
Therefore morality is a subset of preferences, as common sense maintained all along.
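To make this concrete, here is a minimal sketch of the structure I mean; the particular preferences, the priority numbers, and the "moral" flag are all invented for illustration, not a claim about how an actual agent is built:

```python
# Illustrative only: a preference set in which the "moral" preferences form a
# high-priority subset, so they override the others without being all of them.
from dataclasses import dataclass

@dataclass(frozen=True)
class Preference:
    description: str
    priority: int   # higher number = considered first
    moral: bool     # whether the agent regards this as a moral preference

preferences = [
    Preference("eat tasty ice cream", priority=1, moral=False),
    Preference("stay slim", priority=2, moral=False),
    Preference("don't support slave labour", priority=3, moral=True),
]

def act_on(prefs):
    """Act on the highest-priority preference that applies."""
    return max(prefs, key=lambda p: p.priority)

moral_subset = {p for p in preferences if p.moral}
assert moral_subset < set(preferences)      # a proper subset, not all preferences
print(act_on(preferences).description)      # here the moral preference overrides
```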
IMO, it is better to keep one's options open.
I don't experience the emotions of moral outrage and moral approval whenever any of my preferences are hindered/satisfied -- so it seems evident that my moral circuitry isn't identical to my preference circuitry. It may overlap in parts, it may have fuzzy boundaries, but it's not identical.
My own view is that morality is the brain's attempt to extrapolate preferences about behaviours as they would be if you had no personal stakes/preferences about a situation.
So people don't get morally outraged at other people eating chocolate ice cream, even when they personally don't like chocolate ice cream, because they can understand that's a strictly personal preference. If they believe it to be more than personal preference and make it into e.g. "divine commandment" or "natural law", then moral outrage can occur.
That morality is a subjective attempt at objectivity explains many of the confusions people have about it.
The ice cream example is bad because the consequences are purely internal to the person consuming the ice cream. What if the chocolate ice cream was made with slave labour? Many people would then object to you buying it on moral grounds.
Eliezer has produced an argument I find convincing that morality is the back propagation of preference to the options of an intermediate choice. That is to say, it is "bad" to eat chocolate ice cream because it economically supports slavers, and I prefer a world without slavery. But if I didn't know about the slave-labour ice cream factory, my preference would be that all-things-being-equal you get to make your own choices about what you eat, and therefore I prefer that you choose (and receive) the one you want, which is your determination to make, not mine.
Do you agree with EY's essay on the nature of right-ness which I linked to?
That doesn't seem to be required for Eliezer's argument...
I guess the relevant question is, do you think FAI will need to treat morality differently from other preferences?
I would prefer an AI that followed my extrapolated preferences to an AI that followed my morality. But an AI that followed my morality would be morally superior to an AI that followed my extrapolated preferences.
If you don't understand the distinction I'm making above, consider a case of the AI having to decide whether to save my own child vs saving a thousand random other children. I'd prefer the former, but I believe the latter would be the morally superior choice.
Is that idea really so hard to understand? Would you dismiss the distinction I'm making as merely colloquial language?
Wow, there is so much wrapped up in this little consideration. The heart of the issue is that we (by which I mean you, but I share your dilemma) have truly conflicting preferences.
Honestly I think you should not be afraid to say that saving your own child is the moral thing to do. And you don't have to give excuses either - it's not that “if everyone saved their own child, then everyone's child will be looked after” or anything like that. No, the desire to save your own child is firmly rooted in our basic drives and preferences, enough so that we can go quite far in calling it a basic foundational moral axiom. It's not actually axiomatic, but we can safely treat it as such.
At the same time we have a basic preference to seek social acceptance and find commonality with the people we let into our lives. This drives us to want outcomes that are universally or at least most-widely acceptable, and seek moral frameworks like utilitarianism which lead to these outcomes. Usually this drive is secondary to self-serving preferences for most people, and that is OK.
For some reason you've called making decisions in favor of self-serving drives "preferences" and decisions in favor of social drives "morality." But the underlying mechanism is the same.
"But wait, if I choose self-serving drives over social conformity, doesn't that lead to me to make the decision to save one life in exclusion to 1000 others?" Yes, yes it does. This massive sub-thread started with me objecting to the idea that some "friendly" AI somewhere could derive morality experimentally from my preferences or the collective preferences of humankind, make it consistent, apply the result universally, and that I'd be OK with that outcome. But that cannot work because there is not, and cannot be a universal morality that satisfies everyone - every one of those thousand other children have parents that want their kid to survive and would see your child dead if need be.
If you were offered a bunch of AIs with equivalent power, but following different mixtures of your moral and non-moral preferences, which one would you run? (I guess you're aware of the standard results saying a non-stupid AI must follow some one-dimensional utility function, etc.)
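To make the "mixture" idea concrete, here is a rough sketch; the two component utilities and the weight are invented placeholders, the point being only that any fixed blend of moral and non-moral preferences still collapses into a single scalar utility the AI maximizes:

```python
# Illustrative sketch: mixing a "moral" utility with a "personal" (non-moral)
# utility still yields one one-dimensional utility function for the AI.

def moral_utility(outcome):
    # invented placeholder: total children saved
    return outcome["children_saved"]

def personal_utility(outcome):
    # invented placeholder: whether my own child is saved
    return 1000.0 if outcome["own_child_saved"] else 0.0

def mixed_utility(outcome, weight_moral=0.5):
    """A convex mixture of the two components; for any fixed weight this is
    just another single scalar utility function."""
    return (weight_moral * moral_utility(outcome)
            + (1 - weight_moral) * personal_utility(outcome))

outcomes = [
    {"children_saved": 1000, "own_child_saved": False},
    {"children_saved": 1, "own_child_saved": True},
]

# The AI running this mixture simply picks the outcome with the highest score.
best = max(outcomes, key=lambda o: mixed_utility(o, weight_moral=0.5))
print(best)
```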
My related but different thoughts here. In particular, I don't agree that emotions like moral outrage and approval are impersonal, though I agree that we often justify those emotions using impersonal language and beliefs.
I didn't say that moral outrage and approval are impersonal. Obviously nothing that a person does can truly be "impersonal". But it may be an attempt at impersonality.
The attempt itself provides a direction that significantly differentiates between moral preferences and non-moral preferences.
I didn't mean some idealized humanly-unrealizable notion of impersonality, I meant the thing we ordinarily use "impersonal" to mean when talking about what humans do.
Ditto.
Cousin Itt, 'tis a hairy topic, so you're uniquely "suited" to offer strands of insights:
For all the supposedly hard and confusing concepts out there, few have such an obvious answer as the supposed dichotomy between "morality" and "utility function". This in itself is troubling, as too-easy-to-come-by answers trigger the suspicion that I myself am subject to some sort of cognitive error.
Many people I deem quite smart would disagree with you and me, on a question whose answer is pretty much inherent in the definition of the term "utility function": it encompasses preferences of any kind, leaving no space for some holier-than-thou universal (whether human-universal, or "optimal", or "to be aspired to", or "neurotypical", or whatever other tortured notion I've had to read) moral preferences which are somehow separate.
Why do you reckon that other (or otherwise?) smart people come to different conclusions on this?
I guess they have strong intuitions saying that objective morality must exist, and aren't used to solving or dismissing philosophical problems by asking "what would be useful for building FAI?" From most other perspectives, the question does look open.
Moral preferences don't have to be separate to be distinct; they can be a subset. "Morality is either all your preferences, or none of your preferences" is a false dichotomy.
Edit: Of course you can choose to call a subset of your preferences "moral", but why would that make them "special", or more worthy of consideration than any other "non-moral" preferences of comparative weight?
The "moral" subset of people's preferences has certain elements that differentiate it like e.g. an attempt at universalization.
The key issue is that, whilst morality is not tautologously the same as preferences, a morally right action is, tautologously, what you should do.
So it is difficult to see on what grounds Mark can object to the FAI's wishes: if it tells him something is morally right, that is what he should do. And he can't have his own separate morality, because the idea is incoherent.
I can't speak for cousin_it, natch, but for my own part I think it has to do with mutually exclusive preferences vs orthogonal/mutually reinforcing preferences. Using moral language is a way of framing a preference as mutually exclusive with other preferences.
That is... if you want A and I want B, and I believe the larger system allows (Kawoomba gets A AND Dave gets B), I'm more likely to talk about our individual preferences. If I don't think that's possible, I'm more likely to use universal language ("moral," "optimal," "right," etc.), in order to signal that there's a conflict to be resolved. (Well, assuming I'm being honest.)
For example, "You like chocolate, I like vanilla" does not signal a conflict; "Chocolate is wrong, vanilla is right" does.
Why stop at connotation and signalling? If there is a non-empty set of preferences whose satisfaction is inclined to lead to conflict, and a non-empty set of preferences that can be satisfied without conflict, then "morally relevant preference" can denote the members of the first set...which is not identical to the set of all preferences.
For any such preference, you can immediately provide a utility function such that the corresponding agent would be very unhappy about that preference, and would give its life to prevent it.
Or do you mean "a set of preferences the implementation of which would on balance benefit the largest amount of agents the most"? That would change as the set of agents changes, so does the "correct" morality change too, then?
Also, why should I or anyone else particularly care about such preferences (however you define them), especially as the "on average" doesn't benefit me? Is it because, evolutionarily speaking, that's what evolved? What our mirror neurons lead us towards? Wouldn't that just be a case of the naturalistic fallacy?
For my own part: denotationally, yes, I would understand "Do you prefer (that Dave eat) chocolate or vanilla ice cream?" and "Do you consider (Dave eating) chocolate ice cream or vanilla as the morally superior flavor for (Dave eating) ice cream?" as asking the same question.
Connotationally, of course, the latter has all kinds of (mostly ill-defined) baggage the former doesn't.
No, unless you mean by taking invasive action like scanning my brain and applying whole brain emulation. It would then quickly learn that I'd consider the action it took an unforgivable violation of my individual sovereignty, that it can't take further action (including simulating me to reflectively equilibrate my morality) without my consent, and that it should suspend the simulation and return it to me, with the data, as soon as possible (destruction no longer being possible due to the creation of sentience).
That is, assuming the AI cares at all about my morality, and not the one its creators imbued into it, which is rather the point. And that, incidentally, is why I work on AGI: I don't trust anyone else to do it.
Morality isn't some universal truth written on a stone tablet: it is individual and unique like a snowflake. In my current understanding of my own morality, it is not possible for some external entity to reach a full or even sufficient understanding of my own morality without doing something that I would consider to be unforgivable. So no, AI can't figure out morality better than me, precisely because it is not me.
(Upvoted for asking an appropriate question, however.)
Shrug. Then let's take a bunch of people less fussy than you: could a suitably equipped AI emulate their morality better than they can?
That isn't a fact.
That isn't a fact either, and it doesn't follow from the above, since moral nihilism could be true.
If my moral snowflake says I can kick you on your shin, and yours says I can't, do I get to kick on your shin?