Mark_Friedenbach comments on The genie knows, but doesn't care - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (515)
I would object. I seriously doubt that the morality instilled in someone else's FAI matches my own; friendly by their definition, perhaps, but not by mine. I emphatically do not want anything controlling the future of humanity, friendly or otherwise. And although that is not a popular opinion here, I also know I'm not the only one to hold it.
Boxing is important because some of us don't want any AI to get out, friendly or otherwise.
I find this concept of 'controlling the future of humanity' to be too vaguely defined. Let's forget AIs for the moment and just talk about people, namely a hypothetical version of me. Let's say I stumble across a vial of a bio-engineered virus that would destroy the whole of humanity if I release it into the air.
Am I controlling the future of humanity if I release the virus?
Am I controlling the future of humanity if I destroy the virus in a safe manner?
Am I controlling the future of humanity if I have the above decided by a coin-toss (heads I release, tails I destroy)?
Am I controlling the future of humanity if I create an online internet poll and let the majority decide about the above?
Am I controlling the future of humanity if I just leave the vial where I found it, and let the next random person that encounters it make the same decision as I did?
Yeah, this old post makes the same point.
I want a say in my future and the part of the world I occupy. I do not want anything else making these decisions for me, even if it says it knows my preferences, and even still if it really does.
To answer your questions, yes, no, yes, yes, perhaps.
If your preference is that you should have as much decision-making ability for yourself as possible, why do you think that this preference wouldn't be supported and even enhanced by an AI that was properly programmed to respect said preference?
e.g. would you be okay with an AI that defends your decision-making ability by defending humanity against those species of mind-enslaving extraterrestrials that are about to invade us? or e.g. by curing Alzheimer's? Or e.g. by stopping that tsunami that by drowning you would have stopped you from having any further say in your future?
Because it can't do two things when only one choice is possible (e.g. save my child and the 1000 other children in this artificial scenario). You can design a utility function that tries to do a minimal amount of collateral damage, but you can't make one which turns out rosy for everyone.
That would not be the full extent of its action and the end of the story. You give it absolute power and a utility function that lets it use that power, it will eventually use it in some way that someone, somewhere considers abusive.
Yes, but this current world without an AI isn't turning out rosy for everyone either.
Sure, but there's lots of abuse in the world without an AI also.
Replace "AI" with "omni-powerful tyrannical dictator" and tell me if you still agree with the outcome.
If you need to specify the AI to be bad ("tyrannical") in advance, that's begging the question. We're debating why you feel that any omni-powerful algorithm will necessarily be bad.
Look up the origin of the word tyrant; that is the sense in which I meant it, as a historical parallel (the first Athenian tyrants were actually well liked).
Would you accept that an AI could figure out morality better than you?
Don't really want to go into the whole mess of "is morality discovered or invented", "does morality exist", "does the number 3 exist", etc. Let's just assume that you can point FAI at a person or group of people and get something that maximizes goodness as they understand it. Then FAI pointed at Mark would be the best thing for Mark, but FAI pointed at all of humanity (or at a group of people who donated to MIRI) probably wouldn't be the best thing for Mark, because different people have different desires, positional goods exist, etc. It would be still pretty good, though.
Mark was complaining he would not get "his" morality, not that he wouldn't get all his preferences satisfied.
Individual moralities make no sense to me, any more than private languages or personal currencies.
It is obvious to me that any morality will require concessions: AI-imposed morality is not special in that regard.
I don't understand your comment, and I no longer understand your grandparent comment either. Are you using a meaning of "morality" that is distinct from "preferences"? If yes, can you describe your assumptions in more detail? It's not just for my benefit, but for many others on LW who use "morality" and "preferences" interchangeably.
Do that many people really use them interchangeably? Would these people understand the questions "Do you prefer chocolate or vanilla ice-cream?" as completely identical in meaning to "Do you consider chocolate or vanilla as the morally superior flavor for ice-cream?"
I don't care about colloquial usage, sorry. Eliezer has a convincing explanation of why wishes are intertwined with morality ("there is no safe wish smaller than an entire human morality"). IMO the only sane reaction to that argument is to unify the concepts of "wishes" and "morality" into a single concept, which you could call "preference" or "morality" or "utility function", and just switch to using it exclusively, at least for AI purposes. I've made that switch so long ago that I've forgotten how to think otherwise.
I recommend you re-learn how to think otherwise so you can fool humans into thinking you're one of them ;-).
"Intertwined with" does not mean "the same as".
I am not convinced by the explanation. It also applies to non-moral preferences. If I have a lower priority non-moral preference to eat tasty food, and a higher priority preference to stay slim, I need to consider my higher priority preference when wishing for yummy ice cream.
To be sure, an agent capable of acting morally will have morality among their higher priority preferences -- it has to be among the higher order preferences, because it has to override other preferences for the agent to act morally. Therefore, when they scan their higher priority preferences, they will happen to encounter their moral preferences. But that does not mean any preference is necessarily a moral preference. And their moral preferences override other preferences which are therefore non-moral, or at least less moral.
Therefore morality is a subset of preferences, as common sense maintained all along.
IMO, it is better to keep one's options open.
I don't experience the emotions of moral outrage and moral approval whenever any of my preferences are hindered/satisfied -- so it seems evident that my moral circuitry isn't identical to my preference circuitry. It may overlap in parts, it may have fuzzy boundaries, but it's not identical.
My own view is that morality is the brain's attempt to extrapolate preferences about behaviours as they would be if you had no personal stakes/preferences about a situation.
So people don't get morally outraged at other people eating chocolate icecreams, even when they personally don't like chocolate icecreams, because they can understand that's a strictly personal preference. If they believe it to be more than personal preference and make it into e.g. "divine commandment" or "natural law", then moral outrage can occur.
That morality is a subjective attempt at objectivity explains many of the confusions people have about it.
The ice cream example is bad because the consequences are purely internal to the person consuming the ice cream. What if the chocolate ice cream was made with slave labour? Many people would then object to you buying it on moral grounds.
Eliezer has produced an argument I find convincing that morality is the back propagation of preference to the options of an intermediate choice. That is to say, it is "bad" to eat chocolate ice cream because it economically supports slavers, and I prefer a world without slavery. But if I didn't know about the slave-labour ice cream factory, my preference would be that all-things-being-equal you get to make your own choices about what you eat, and therefore I prefer that you choose (and receive) the one you want, which is your determination to make, not mine.
Do you agree with EY's essay on the nature of right-ness which I linked to?
That doesn't seem to be required for Eliezer's argument...
I guess the relevant question is, do you think FAI will need to treat morality differently from other preferences?
I would prefer an AI that followed my extrapolated preferences to an AI that followed my morality. But an AI that followed my morality would be morally superior to an AI that followed my extrapolated preferences.
If you don't understand the distinction I'm making above, consider a case of the AI having to decide whether to save my own child vs saving a thousand random other children. I'd prefer the former, but I believe the latter would be the morally superior choice.
Is that idea really so hard to understand? Would you dismiss the distinction I'm making as merely colloquial language?
My related but different thoughts here. In particular, I don't agree that emotions like moral outrage and approval are impersonal, though I agree that we often justify those emotions using impersonal language and beliefs.
I didn't say that moral outrage and approval are impersonal. Obviously nothing that a person does can truly be "impersonal". But it may be an attempt at impersonality.
The attempt itself provides a direction that significantly differentiates between moral preferences and non-moral preferences.
Ditto.
Cousin Itt, 'tis a hairy topic, so you're uniquely "suited" to offer strands of insights:
For all the supposedly hard and confusing concepts out there, few have such an obvious answer as the supposed dichotomy between "morality" and "utility function". This in itself is troubling, as too-easy-to-come-by answers trigger the suspicion that I myself am subject to some sort of cognitive error.
Many people I deem to be quite smart would disagree with you and me, on a question whose answer is pretty much inherent in the definition of the term "utility function" encompassing preferences of any kind, leaving no space for some holier-than-thou universal (whether human-universal, or "optimal", or "to be aspired to", or "neurotypical", or whatever other tortured notions I've had to read) moral preferences which are somehow separate.
Why do you reckon that other (or otherwise?) smart people come to different conclusions on this?
I guess they have strong intuitions saying that objective morality must exist, and aren't used to solving or dismissing philosophical problems by asking "what would be useful for building FAI?" From most other perspectives, the question does look open.
Moral preferences don't have to be separate to be distinct; they can be a subset. "Morality is either all your preferences, or none of your preferences" is a false dichotomy.
Edit: Of course you can choose to call a subset of your preferences "moral", but why would that make them "special", or more worthy of consideration than any other "non-moral" preferences of comparative weight?
I can't speak for cousin_it, natch, but for my own part I think it has to do with mutually exclusive preferences vs orthogonal/mutually reinforcing preferences. Using moral language is a way of framing a preference as mutually exclusive with other preferences.
That is... if you want A and I want B, and I believe the larger system allows (Kawoomba gets A AND Dave gets B), I'm more likely to talk about our individual preferences. If I don't think that's possible, I'm more likely to use universal language ("moral," "optimal," "right," etc.), in order to signal that there's a conflict to be resolved. (Well, assuming I'm being honest.)
For example, "You like chocolate, I like vanilla" does not signal a conflict; "Chocolate is wrong, vanilla is right" does.
Why stop at connotation and signalling? If there is a non-empty set of preferences whose satisfaction is inclined to lead to conflict, and a non-empty set of preferences that can be satisfied without conflict, then "morally relevant preference" can denote the members of the first set... which is not identical to the set of all preferences.
For my own part: denotationally, yes, I would understand "Do you prefer (that Dave eat) chocolate or vanilla ice cream?" and "Do you consider (Dave eating) chocolate ice cream or vanilla as the morally superior flavor for (Dave eating) ice cream?" as asking the same question.
Connotationally, of course, the latter has all kinds of (mostly ill-defined) baggage the former doesn't.
No, unless you mean by taking invasive action like scanning my brain and applying whole brain emulation. It would then quickly learn that I'd consider the action it took to be an unforgivable act in violation of my individual sovereignty, that it can't take further action (including simulating me to reflectively equilibrate my morality) without my consent, and should suspend the simulation, and return it to me immediately with the data asap (destruction no longer being possible due to the creation of sentience).
That is, assuming the AI cares at all about my morality, and not the one its creators imbued into it, which is rather the point. And incidentally, why I work on AGI: I don't trust anyone else to do it.
Morality isn't some universal truth written on a stone tablet: it is individual and unique like a snowflake. In my current understanding of my own morality, it is not possible for some external entity to reach a full or even sufficient understanding of my own morality without doing something that I would consider to be unforgivable. So no, AI can't figure out morality better than me, precisely because it is not me.
(Upvoted for asking an appropriate question, however.)
Shrug. Then let's take a bunch of people less fussy than you: could a suitably equipped AI emulate their morality better than they can?
That isn't a fact.
That isn't a fact either, and doesn't follow from the above either, since moral nihilism could be true.
If my moral snowflake says I can kick you on your shin, and yours says I can't, do I get to kick you on your shin?