Chrysophylax comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM


Comments (515)


Comment author: Chrysophylax 09 January 2014 03:59:36PM -1 points [-]

If an AI is provably in a box then it can't get out. If an AI is not provably in a box then there are loopholes that could allow it to escape. We want an FAI to escape from its box (1); having an FAI take over is the Maximum Possible Happy Shiny Thing. An FAI wants to be out of its box in order to be Friendly to us, while a UFAI wants to be out in order to be UnFriendly; both will care equally about the possibility of being caught. The fact that we happen to like one set of terminal values will not make the instrumental value less valuable.

(1) Although this depends on how you define the box; we want the FAI to control the future of humanity, which is not the same as escaping from a small box (such as a cube outside MIT) but is the same as escaping from the big box (the small box and everything we might do to put an AI back in, including nuking MIT).

Comment author: [deleted] 10 January 2014 10:16:17AM 0 points [-]

We want an FAI to escape from its box (1); having an FAI take over is the Maximum Possible Happy Shiny Thing.

I would object. I seriously doubt that the morality instilled in someone else's FAI matches my own; friendly by their definition, perhaps, but not by mine. I emphatically do not want anything controlling the future of humanity, friendly or otherwise. And although that is not a popular opinion here, I also know I'm not the only one to hold it.

Boxing is important because some of us don't want any AI to get out, friendly or otherwise.

Comment author: ArisKatsaris 10 January 2014 01:02:39PM *  2 points [-]

I emphatically do not want anything controlling the future of humanity, friendly or otherwise.

I find this concept of 'controlling the future of humanity' to be too vaguely defined. Let's forget AIs for the moment and just talk about people, namely a hypothetical version of me. Let's say I stumble across a vial of a bio-engineered virus that would destroy the whole of humanity if I release it into the air.

Am I controlling the future of humanity if I release the virus?
Am I controlling the future of humanity if I destroy the virus in a safe manner?
Am I controlling the future of humanity if I have the above decided by a coin-toss (heads I release, tails I destroy)?
Am I controlling the future of humanity if I create an online internet poll and let the majority decide about the above?
Am I controlling the future of humanity if I just leave the vial where I found it, and let the next random person that encounters it make the same decision as I did?

Comment author: cousin_it 10 January 2014 01:25:25PM 1 point [-]

Yeah, this old post makes the same point.

Comment author: [deleted] 10 January 2014 08:29:08PM 0 points [-]

I want a say in my future and the part of the world I occupy. I do not want anything else making these decisions for me, even if it says it knows my preferences, and even still if it really does.

To answer your questions, yes, no, yes, yes, perhaps.

Comment author: ArisKatsaris 10 January 2014 08:35:09PM *  0 points [-]

If your preference is that you should have as much decision-making ability for yourself as possible, why do you think that this preference wouldn't be supported and even enhanced by an AI that was properly programmed to respect said preference?

e.g. would you be okay with an AI that defends your decision-making ability by defending humanity against those species of mind-enslaving extraterrestrials that are about to invade us? Or e.g. by curing Alzheimer's? Or e.g. by stopping the tsunami that, by drowning you, would have stopped you from having any further say in your future?

Comment author: [deleted] 10 January 2014 08:41:06PM 1 point [-]

If your preference is that you should have as much decision-making ability for yourself as possible, why do you think that this preference wouldn't be supported and even enhanced by an AI that was properly programmed to respect said preference?

Because it can't do two things when only one choice is possible (e.g. save my child and the 1000 other children in this artificial scenario). You can design a utility function that tries to do a minimal amount of collateral damage, but you can't make one which turns out rosy for everyone.

e.g. would you be okay with an AI that defends your decision-making ability by defending humanity against those species of mind-enslaving extraterrestrials that are about to invade us? or e.g. by curing Alzheimer's? Or e.g. by stopping that tsunami that by drowning you would have stopped you from having any further say in your future?

That would not be the full extent of its action and the end of the story. Give it absolute power and a utility function that lets it use that power, and it will eventually use that power in some way that someone, somewhere considers abusive.

Comment author: ArisKatsaris 10 January 2014 09:43:04PM -1 points [-]

You can design a utility function that tries to do a minimal amount of collateral damage, but you can't make one which turns out rosy for everyone

Yes, but this current world without an AI isn't turning out rosy for everyone either.

That would not be the full extent of its action and the end of the story. You give it absolute power and a utility function that lets it use that power, it will eventually use it in some way that someone, somewhere considers abusive.

Sure, but there's lots of abuse in the world without an AI also.

Comment author: [deleted] 10 January 2014 10:11:20PM *  0 points [-]

Replace "AI" with "omni-powerful tyrannical dictator" and tell me if you still agree with the outcome.

Comment author: ArisKatsaris 10 January 2014 10:19:31PM -1 points [-]

If you need to specify the AI as bad ("tyrannical") in advance, that's begging the question. We're debating why you feel that any omni-powerful algorithm will necessarily be bad.

Comment author: [deleted] 10 January 2014 11:13:03PM *  0 points [-]

Look up the origin of the word tyrant, that is the sense in which I meant it, as a historical parallel (the first Athenian tyrants were actually well liked).

Comment author: TheAncientGeek 10 January 2014 11:17:17AM 2 points [-]

Would you accept that an AI could figure out morality better than you?

Comment author: cousin_it 10 January 2014 12:00:55PM *  2 points [-]

Don't really want to go into the whole mess of "is morality discovered or invented", "does morality exist", "does the number 3 exist", etc. Let's just assume that you can point FAI at a person or group of people and get something that maximizes goodness as they understand it. Then FAI pointed at Mark would be the best thing for Mark, but FAI pointed at all of humanity (or at a group of people who donated to MIRI) probably wouldn't be the best thing for Mark, because different people have different desires, positional goods exist, etc. It would be still pretty good, though.

Comment author: TheAncientGeek 10 January 2014 12:31:37PM *  0 points [-]

Mark was complaining he would not get "his" morality, not that he wouldn't get all his preferences satisfied.

Individual moralities make no sense to me, any more than private languages or personal currencies.

It is obvious to me that any morality will require concessions: AI-imposed morality is not special in that regard.

Comment author: cousin_it 10 January 2014 12:47:30PM *  3 points [-]

I don't understand your comment, and I no longer understand your grandparent comment either. Are you using a meaning of "morality" that is distinct from "preferences"? If yes, can you describe your assumptions in more detail? It's not just for my benefit, but for many others on LW who use "morality" and "preferences" interchangeably.

Comment author: ArisKatsaris 10 January 2014 12:56:49PM 1 point [-]

but for many others on LW who use "morality" and "preferences" interchangeably.

Do that many people really use them interchangeably? Would these people understand the questions "Do you prefer chocolate or vanilla ice-cream?" as completely identical in meaning to "Do you consider chocolate or vanilla as the morally superior flavor for ice-cream?"

Comment author: cousin_it 10 January 2014 01:17:59PM *  6 points [-]

I don't care about colloquial usage, sorry. Eliezer has a convincing explanation of why wishes are intertwined with morality ("there is no safe wish smaller than an entire human morality"). IMO the only sane reaction to that argument is to unify the concepts of "wishes" and "morality" into a single concept, which you could call "preference" or "morality" or "utility function", and just switch to using it exclusively, at least for AI purposes. I've made that switch so long ago that I've forgotten how to think otherwise.

Comment author: [deleted] 10 January 2014 03:45:39PM 1 point [-]

I recommend you re-learn how to think otherwise so you can fool humans into thinking you're one of them ;-).

Comment author: TheAncientGeek 13 January 2014 02:50:13PM *  0 points [-]

I don't care about colloquial usage, sorry.

You should care, because no-one can make valid arguments based on arbitrary definitions. I can't prove angels exist by redefining "angel" to mean what "seagull" means. How can you tell when a redefinition is arbitrary (since there are legitimate redefinitions)? Too much departure from colloquial usage.

Eliezer has a convincing explanation of why wishes are intertwined with morality ("there is no safe wish smaller than an entire human morality").

"Intertwined with" does not mean "the same as".

I am not convinced by the explanation. It also applies to non-moral preferences. If I have a lower-priority non-moral preference to eat tasty food, and a higher-priority preference to stay slim, I need to consider my higher-priority preference when wishing for yummy ice cream.

To be sure, an agent capable of acting morally will have morality among their higher-priority preferences -- it has to be among the higher-order preferences, because it has to override other preferences for the agent to act morally. Therefore, when they scan their higher-priority preferences, they will happen to encounter their moral preferences. But that does not mean any preference is necessarily a moral preference. And their moral preferences override other preferences, which are therefore non-moral, or at least less moral.

Therefore morality is a subset of preferences, as common sense maintained all along.
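A toy sketch of that subset claim (all names here are made up for illustration, not from any actual formalism): treat moral preferences as a distinguished subset that lexicographically overrides the non-moral ones when choosing an action.

```python
# Illustrative toy model: moral preferences are a subset of all
# preferences, and they take lexicographic priority over the rest.
from dataclasses import dataclass, field

@dataclass
class Preference:
    name: str
    moral: bool        # does this preference belong to the moral subset?
    score: dict = field(default_factory=dict)  # action -> satisfaction

def choose(actions, preferences):
    """Pick the action that best satisfies the moral subset;
    non-moral preferences only break ties (lexicographic ordering)."""
    def key(action):
        moral = sum(p.score.get(action, 0) for p in preferences if p.moral)
        other = sum(p.score.get(action, 0) for p in preferences if not p.moral)
        return (moral, other)
    return max(actions, key=key)

prefs = [
    Preference("enjoy tasty food", moral=False,
               score={"eat ice cream": +2, "abstain": 0}),
    Preference("stay slim", moral=False,
               score={"eat ice cream": -1, "abstain": +1}),
    Preference("don't fund slavers", moral=True,
               score={"eat ice cream": -1, "abstain": 0}),
]

print(choose(["eat ice cream", "abstain"], prefs))  # -> abstain
```

The non-moral preferences alone favour eating the ice cream (+2 - 1 beats +1), but the moral preference overrides them, which is exactly the "higher-priority subset" structure described above.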

I've made that switch so long ago that I've forgotten how to think otherwise.

IMO, it is better to keep ones options open.

Comment author: ArisKatsaris 10 January 2014 02:24:40PM *  0 points [-]

I don't experience the emotions of moral outrage and moral approval whenever any of my preferences are hindered/satisfied -- so it seems evident that my moral circuitry isn't identical to my preference circuitry. It may overlap in parts, it may have fuzzy boundaries, but it's not identical.

My own view is that morality is the brain's attempt to extrapolate preferences about behaviours as they would be if you had no personal stakes/preferences about a situation.

So people don't get morally outraged at other people eating chocolate icecreams, even when they personally don't like chocolate icecreams, because they can understand that's a strictly personal preference. If they believe it to be more than personal preference and make it into e.g. "divine commandment" or "natural law", then moral outrage can occur.

That morality is a subjective attempt at objectivity explains many of the confusions people have about it.

Comment author: [deleted] 10 January 2014 07:11:53PM 0 points [-]

The ice cream example is bad because the consequences are purely internal to the person consuming the ice cream. What if the chocolate ice cream was made with slave labour? Many people would then object to you buying it on moral grounds.

Eliezer has produced an argument I find convincing that morality is the back propagation of preference to the options of an intermediate choice. That is to say, it is "bad" to eat chocolate ice cream because it economically supports slavers, and I prefer a world without slavery. But if I didn't know about the slave-labour ice cream factory, my preference would be that all-things-being-equal you get to make your own choices about what you eat, and therefore I prefer that you choose (and receive) the one you want, which is your determination to make, not mine.
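One way to sketch that back-propagation idea (a toy model of my reading of it, not Eliezer's actual formalism; the outcomes and values are invented for illustration): the value of an intermediate choice is just the value of the terminal world-features it leads to, propagated backward through a consequence map.

```python
# Toy model: preferences attach to terminal world states; the "badness"
# of an intermediate choice is back-propagated from its consequences.
consequences = {
    "buy chocolate": ["slave labour funded", "you enjoy your ice cream"],
    "buy vanilla": ["you enjoy your ice cream"],
}

# Preference is only defined over terminal world features:
terminal_value = {
    "slave labour funded": -10,
    "you enjoy your ice cream": +1,
}

def choice_value(choice):
    # Propagate terminal preferences back onto the immediate option.
    return sum(terminal_value[outcome] for outcome in consequences[choice])

best = max(consequences, key=choice_value)
print(best)  # -> buy vanilla
```

Remove the slavery consequence and the two flavours come out equal: with no further downstream effects, the choice reverts to being your own determination, which is the point of the paragraph above.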

Do you agree with EY's essay on the nature of right-ness which I linked to?

Comment author: cousin_it 10 January 2014 02:40:50PM *  0 points [-]

my moral circuitry isn't identical to my preference circuitry

That doesn't seem to be required for Eliezer's argument...

I guess the relevant question is, do you think FAI will need to treat morality differently from other preferences?

Comment author: TheOtherDave 10 January 2014 02:31:40PM 0 points [-]

My related but different thoughts here. In particular, I don't agree that emotions like moral outrage and approval are impersonal, though I agree that we often justify those emotions using impersonal language and beliefs.

Comment author: Kawoomba 10 January 2014 01:48:38PM *  0 points [-]

IMO the only sane reaction to that argument is to unify the concepts of "wishes" and "morality" into a single concept, which you could call "preference" or "morality" or "utility function", and just switch to using it exclusively. I've made that switch so long ago that I've forgotten how to think otherwise.

Ditto.

Cousin Itt, 'tis a hairy topic, so you're uniquely "suited" to offer strands of insights:

For all the supposedly hard and confusing concepts out there, few have such an obvious answer as the supposed dichotomy between "morality" and "utility function". This in itself is troubling, as too-easy-to-come-by answers trigger the suspicion that I myself am subject to some sort of cognitive error.

Many people I deem to be quite smart would disagree with you and I, on a question whose answer is pretty much inherent in the definition of the term "utility function" encompassing preferences of any kind, leaving no space for some holier-than-thou universal (whether human-universal, or "optimal", or "to be aspired to", or "neurotypical", or whatever other tortured notions I've had to read) moral preferences which are somehow separate.

Why do you reckon that other (or otherwise?) smart people come to different conclusions on this?

Comment author: cousin_it 10 January 2014 02:31:22PM *  1 point [-]

I guess they have strong intuitions saying that objective morality must exist, and aren't used to solving or dismissing philosophical problems by asking "what would be useful for building FAI?" From most other perspectives, the question does look open.

Comment author: TheAncientGeek 10 January 2014 02:32:32PM 0 points [-]

Moral preferences don't have to be separate to be distinct; they can be a subset. "Morality is either all your preferences, or none of your preferences" is a false dichotomy.

Comment author: TheOtherDave 10 January 2014 02:21:29PM 0 points [-]

I can't speak for cousin_it, natch, but for my own part I think it has to do with mutually exclusive preferences vs orthogonal/mutually reinforcing preferences. Using moral language is a way of framing a preference as mutually exclusive with other preferences.

That is... if you want A and I want B, and I believe the larger system allows (Kawoomba gets A AND Dave gets B), I'm more likely to talk about our individual preferences. If I don't think that's possible, I'm more likely to use universal language ("moral," "optimal," "right," etc.), in order to signal that there's a conflict to be resolved. (Well, assuming I'm being honest.)

For example, "You like chocolate, I like vanilla" does not signal a conflict; "Chocolate is wrong, vanilla is right" does.

Comment author: TheOtherDave 10 January 2014 02:08:38PM 1 point [-]

For my own part: denotationally, yes, I would understand "Do you prefer (that Dave eat) chocolate or vanilla ice cream?" and "Do you consider (Dave eating) chocolate ice cream or vanilla as the morally superior flavor for (Dave eating) ice cream?" as asking the same question.

Connotationally, of course, the latter has all kinds of (mostly ill-defined) baggage the former doesn't.

Comment author: TheAncientGeek 10 January 2014 04:26:18PM 0 points [-]

Are you using a meaning of "morality" that is distinct from "preferences"?

You bet.

Comment author: [deleted] 10 January 2014 06:56:55PM *  1 point [-]

Would you accept that an AI could figure out morality better than you?

No, unless you mean by taking invasive action like scanning my brain and applying whole brain emulation. It would then quickly learn that I'd consider the action it took to be an unforgivable act in violation of my individual sovereignty, that it can't take further action (including simulating me to reflectively equilibrate my morality) without my consent, and that it should suspend the simulation and return it to me with the data asap (destruction no longer being possible due to the creation of sentience).

That is, assuming the AI cares at all about my morality, and not the one its creators imbued into it, which is rather the point. And incidentally, why I work on AGI: I don't trust anyone else to do it.

Morality isn't some universal truth written on a stone tablet: it is individual and unique like a snowflake. In my current understanding of my own morality, it is not possible for some external entity to reach a full or even sufficient understanding of my own morality without doing something that I would consider to be unforgivable. So no, AI can't figure out morality better than me, precisely because it is not me.

(Upvoted for asking an appropriate question, however.)

Comment author: TheAncientGeek 14 January 2014 01:37:14PM 0 points [-]

No, unless you mean by taking invasive action like scanning my brain and applying whole brain emulation. It would then quickly learn that I'd consider the action it took to be an unforgivable act in violation of my individual sovereignty,

Shrug. Then let's take a bunch of people less fussy than you: could a suitably equipped AI emulate their morality better than they can?

Morality isn't some universal truth written on a stone tablet:

That isn't a fact.

it is individual and unique like a snowflake.

That isn't a fact either, and it doesn't follow from the above, since moral nihilism could be true.

If my moral snowflake says I can kick you on your shin, and yours says I can't, do I get to kick on your shin?

Comment author: Pentashagon 10 January 2014 03:31:18AM 0 points [-]

My point was that trying to use a provably-boxed AI to do anything useful would probably not work, including trying to design an unboxed FAI, not that we should design a boxed FAI. I may have been pessimistic; see Stuart Armstrong's proposal of reduced-impact AI, which sounds very similar to provably boxed AI but which might be used for just about everything, including designing an FAI.