Comment author: kjmiller 08 October 2011 03:05:07AM *  6 points [-]

You can construct a set of values and a utility function to fit your observed behavior, no matter how your brain produces that behavior.

I'm deeply hesitant to jump into a debate that I don't know the history of, but...

Isn't it pretty generally understood that this is not true? The Utility Theory folks showed that behavior of an agent can be captured by a numerical utility function iff the agent's preferences conform to certain axioms, and Allais and others have shown that human behavior emphatically does not.

Seems to me that if human behavior could in general be captured by a utility function, we wouldn't need this website. We'd be making the best choices we could, given the information we had, to maximize our utility, by definition. In other words, "instrumental rationality" would be easy and automatic for everyone. It's not, and it seems to me that a big part of what we can do to become more rational is to try to wrestle our decision-making algorithms around until the choices they make are captured by some utility function. In the meantime, the fact that we're puzzled by things like moral dilemmas looks like a symptom of irrationality.
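The Allais result mentioned above can be checked mechanically. A minimal sketch (using the standard Allais payoffs, which the comment does not spell out): no assignment of utilities to the three outcomes can rationalize the pair of choices most people actually make.

```python
import random

# A sketch of the Allais paradox with the standard payoffs: gambles over
# $0, $1M, and $5M, each given as a list of (probability, outcome) pairs.
A = [(1.00, "1M")]
B = [(0.89, "1M"), (0.10, "5M"), (0.01, "0")]
C = [(0.11, "1M"), (0.89, "0")]
D = [(0.10, "5M"), (0.90, "0")]

def expected(u, gamble):
    """Expected utility of a gamble under utility assignment u."""
    return sum(p * u[outcome] for p, outcome in gamble)

# Most people prefer A over B (certainty) and D over C (bigger prize).
# No fixed utility assignment rationalizes both: A > B reduces to
# 0.11*u(1M) > 0.10*u(5M) + 0.01*u(0), and D > C to the exact reverse.
random.seed(0)
for _ in range(10_000):
    u = {"0": random.random(), "1M": random.random(), "5M": random.random()}
    assert not (expected(u, A) > expected(u, B) and
                expected(u, D) > expected(u, C))
```

The two preferences contradict each other term-by-term under expected utility, so the assertion holds for every random utility assignment, not just the sampled ones.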

Comment author: TimFreeman 09 October 2011 05:20:16AM *  5 points [-]

The Utility Theory folks showed that behavior of an agent can be captured by a numerical utility function iff the agent's preferences conform to certain axioms, and Allais and others have shown that human behavior emphatically does not.

A person's behavior can always be understood as optimizing a utility function; it's just that if they are irrational (as in the Allais paradox), the utility functions start to look ridiculously complex. If all else fails, a utility function can be used that has a strong dependency on time in whatever way is required to match the observed behavior of the subject: "The subject had a strong preference for sneezing at 3:15:03pm October 8, 2011."
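The sneezing example can be made concrete. A minimal sketch (illustrative only; the timestamps are made up): a degenerate utility function that rewards exactly the observed action at each moment fits any behavior whatsoever, which is precisely why it explains nothing.

```python
# A degenerate "utility function" of the kind described above: it assigns
# utility 1 to whatever the subject was actually observed doing at each
# moment, and 0 to everything else. It fits any behavior at all.
observed = {
    "2011-10-08T15:15:03": "sneeze",
    "2011-10-08T15:15:04": "blink",
}

def U(time, action):
    return 1.0 if observed.get(time) == action else 0.0

assert U("2011-10-08T15:15:03", "sneeze") == 1.0  # observed behavior: max utility
assert U("2011-10-08T15:15:03", "smile") == 0.0   # anything else: min utility
```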

From the point of view of someone who wants to get FAI to work, the important question is, if the FAI does obey the axioms required by utility theory, and you don't obey those axioms for any simple utility function, are you better off if:

  • the FAI ascribes to you some mixture of possible complex utility functions and helps you to achieve that, or

  • the FAI uses a better explanation of your behavior, perhaps one of those alternative theories listed in the wikipedia article, and helps you to achieve some component of that explanation?

I don't understand the alternative theories well enough to know if the latter option even makes sense.

In response to A Rationalist's Tale
Comment author: [deleted] 09 September 2011 02:26:07PM 2 points [-]

Well written and thought out. Thanks for sharing. I have a very similar story; for me it has meant a dramatic difference in the way that I approach life. Before my rejection of faith, I was plagued by a feeling of impending doom. I carried the world on my shoulders as if the fate of all these "souls" relied upon my efforts. I was continually depressed and struggling with negative emotions and thoughts. It turns out it can be rather upsetting and extremely difficult for a person who thinks and asks questions to maintain a relationship with a being who never reciprocates the effort. After 6 years of slowly picking my way through all of the bogus arguments in favor of faith, I began to go through the typical winnowing process (Christian pluralist, pluralist, agnostic, indifferent, atheist). Stumbling upon Sagan's The Demon-Haunted World started me on a process of rearranging my way of thinking and processing information. Truth be told, the gloom of life is gone; I am free to accept reality and embrace it. In rejecting fantasy and embracing reality, I have come to find that the route to peace of mind is not divine favor from an invisible deity, but to reasonably approach disagreeable circumstances with calm thought and reason. Honestly, I have never looked back.

In response to comment by [deleted] on A Rationalist's Tale
Comment author: TimFreeman 30 September 2011 12:25:12AM 6 points [-]

Before my rejection of faith, I was plagued by a feeling of impending doom.

I was a happy atheist until I learned about the Friendly AI problem and estimated the likely outcome. I am now plagued by a feeling of impending doom.

Comment author: Vaniver 23 August 2011 10:42:26PM -1 points [-]

If everyone's inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that's a fine outcome.

That's not the situation I'm describing; if 0 is "you and all your friends and relatives getting tortured to death" and 1 is "getting everything you want," the utility monster is someone who puts "not getting one thing I want" at, say, .1 whereas normal people put it at .9999.

I think it can, in principle, estimate utilities from behavior.

And if humans turn out to be adaption-executers, then utility is going to look really weird, because it'll depend a lot on framing and behavior.

The problems I'm aware of have to do with creating new people.

How do you add two utilities together? If you can't add, how can you average?

As I said, because maximizing average utility seems to get a reasonable result in that case.

If people dislike losses more than they like gains and status is zero-sum, does that mean the reasonable result of average utilitarianism when applied to status is that everyone must be exactly the same status?

Comment author: TimFreeman 23 September 2011 08:11:50PM *  1 point [-]

If everyone's inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that's a fine outcome.

That's not the situation I'm describing; if 0 is "you and all your friends and relatives getting tortured to death" and 1 is "getting everything you want," the utility monster is someone who puts "not getting one thing I want" at, say, .1 whereas normal people put it at .9999.

You have failed to disagree with me. My proposal exactly fits your alleged counterexample.

Suppose Alice is a utility monster where:

  • U(Alice, torture of everybody) = 0
  • U(Alice, everything) = 1
  • U(Alice, no cookie) = 0.1
  • U(Alice, Alice dies) = 0.05

And Bob is normal, except he doesn't like Alice:

  • U(Bob, torture of everybody) = 0
  • U(Bob, everything) = 1
  • U(Bob, Alice lives, no cookie) = 0.8
  • U(Bob, Alice dies, no cookie) = 0.9

If the FAI has a cookie it can give to Bob or Alice, it will give it to Alice, since U(cookie to Bob) = U(Bob, everything) + U(Alice, everything but a cookie) = 1 + 0.1 = 1.1 < U(cookie to Alice) = U(Bob, everything but a cookie) + U(Alice, everything) = 0.8 + 1 = 1.8. Thus Alice gets her intended reward for being a utility monster.

However, if there are no cookies available and the FAI can kill Alice, it will do so for the benefit of Bob, since U(Bob, Alice lives, no cookie) + U(Alice, Alice lives, no cookie) = 0.8 + 0.1 = 0.9 < U(Bob, Alice dies, no cookie) + U(Alice, Alice dies) = 0.9 + 0.05 = 0.95. The basic problem is that Alice's cookie fixation ate up so much of her utility range that her desire to live in the absence of the cookie was outweighed by Bob finding her irritating.

Another problem with Alice's utility is that it supports the FAI doing lotteries that Alice would apparently prefer but a normal person would not. For example, assuming the outcome for Bob does not change, the FAI should prefer 50% Alice dies + 50% Alice gets a cookie (adds to 0.525) over 100% Alice lives without a cookie (which is 0.1). This is a different issue from interpersonal utility comparison.
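The arithmetic in the three scenarios above can be checked mechanically; a minimal Python sketch using the utility values from the bullets:

```python
# The utility values from the bullets above.
U_alice = {"everything": 1.0, "no_cookie": 0.1, "dies": 0.05}
U_bob = {"everything": 1.0, "alice_lives_no_cookie": 0.8,
         "alice_dies_no_cookie": 0.9}

# Scenario 1: one cookie, give it to Bob or to Alice? Sum the utilities.
cookie_to_bob = U_bob["everything"] + U_alice["no_cookie"]                # 1.1
cookie_to_alice = U_bob["alice_lives_no_cookie"] + U_alice["everything"]  # 1.8
assert cookie_to_alice > cookie_to_bob  # Alice the monster gets the cookie

# Scenario 2: no cookie; is the FAI better off (by this sum) killing Alice?
alice_lives = U_bob["alice_lives_no_cookie"] + U_alice["no_cookie"]  # 0.9
alice_dies = U_bob["alice_dies_no_cookie"] + U_alice["dies"]         # 0.95
assert alice_dies > alice_lives  # yes, which is the problem

# Scenario 3: the lottery, for Alice alone.
lottery = 0.5 * U_alice["dies"] + 0.5 * U_alice["everything"]  # 0.525
assert lottery > U_alice["no_cookie"]  # 0.525 > 0.1
```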

How do you add two utilities together?

They are numbers. Add them.

And if humans turn out to be adaption-executers, then utility is going to look really weird, because it'll depend a lot on framing and behavior.

Yes. So far as I can tell, if the FAI is going to do what people want, it has to model people as though they want something, and that means ascribing utility functions to them. Better alternatives are welcome. Giving up because it's a hard problem is not welcome.

If people dislike losses more than they like gains and status is zero-sum, does that mean the reasonable result of average utilitarianism when applied to status is that everyone must be exactly the same status?

No. If Alice has high status and Bob has low status, and the FAI takes action to lower Alice's status and raise Bob's, and people hate losing, then Alice's utility decrease will exceed Bob's utility increase, so the FAI will prefer to leave the status as it is. Similarly, the FAI isn't going to want to increase Alice's status at the expense of Bob. The FAI just won't get involved in the status battles.

I have not found this conversation rewarding. Unless there's an obvious improvement in the quality of your arguments, I'll drop out.

Edit: Fixed the math on the FAI-kills-Alice scenario. Vaniver continued to change the topic with every turn, so I won't be continuing the conversation.

In response to Moral enhancement
Comment author: TimFreeman 20 September 2011 03:59:32AM 6 points [-]

There seems to be an assumption here that empathy leads to morality. Sometimes, at least, empathy leads to being jerked around by the stupid goals of others instead of pursuing your own stupid goals, and in this case it's not all that likely to lead to something fitting any plausible definition of "moral behavior". Chogyam Trungpa called this "idiot compassion".

Thus it's important to distinguish caring about humanity as a whole from caring about individual humans. I read some of the links in the OP and did not see this distinction mentioned.

Comment author: feanor1600 19 September 2011 07:59:07PM 7 points [-]

Academics may be relevant as a similar "thinking community". I've heard many academics say they are severe procrastinators. Possible reasons for this are:

  1) Procrastinators are attracted to the job, its independence and long-term deadlines.
  2) The nature of the work makes people procrastinate; research is hard, plus no boss and long-term deadlines mean immediate punishment for procrastination is rare.
  3) The job makes people feel they have akrasia even when they don't, perhaps because colleagues and competitors seem smarter and harder-working than in other fields.

If nothing else, reading LW makes me feel 3)

Comment author: TimFreeman 19 September 2011 09:22:45PM 0 points [-]

I procrastinated when in academia, but did not feel particularly attracted to the job, so option 1 is not always true. Comparison with people not in academia makes it seem that option 3 is not true for me either.

Comment author: TimFreeman 19 September 2011 08:46:24PM *  5 points [-]

More questions to perhaps add:

What is self-modification? (In particular, does having one AI build another bigger and more wonderful AI while leaving "itself" intact count as self-modification? The naive answer is "no", but I gather the informed answer is "yes", so you'll want to clarify this before using the term.)

What is wrong with the simplest decision theory? (That is, enumerate the possible actions and pick the one for which the expected utility of the outcome is best. I'm not sure what the standard name for that is.) It's important to answer this so at some point you state the problem that timeless decision theory etc. are meant to solve.

I gather one of the problems with the simplest decision theory is that it gives the AI an incentive to self-modify under certain circumstances, and there's a perceived need for the AI to avoid routine self-modification. The FAQ question might be "How can we avoid giving the AI an incentive to self-modify?" and perhaps "What are the risks of allowing the AI to self-modify?"

What problem is solved by extrapolation? (This goes in the CEV section.)

What are the advantages and disadvantages of having a bounded utility function?

Can we just upload a moral person? (In the "Need for FAI" section. IMO the answer is a clear "no".)

I suggest rephrasing "What powers might it have?" in 1.10 to "What could we reasonably expect it to be able to do?". The common phrase "magical powers" gives the word "powers" undesired connotations in this context and makes us sound like loonies.

Comment author: Vaniver 18 August 2011 04:47:19PM 0 points [-]

For example, if it develops some diet drug that lets you safely enjoy eating and still stay skinny and beautiful, that might be a better result than you could provide for yourself, and it doesn't need any special understanding of you to make that happen.

It might not need special knowledge of my psychology, but it certainly needs special knowledge of my physiology.

But notice that the original point was about human preferences. Even if it provides new technologies that dissolve internal conflicts, the question of whether or not to use the technology becomes a conflict. Remember, we live in a world where some people have strong ethical objections to vaccines. An old psychological finding is that oftentimes, giving people more options makes them worse off. If the AI notices that one of my modules enjoys sensory pleasure, offers to wirehead me, and I reject it on philosophical grounds, I could easily become consumed by regret or struggles with temptation, and wish that I never had been offered wireheading in the first place.

Putting an inferior argument first is good if you want to try to get the last word, but it's not a useful part of problem solving. You should try to find the clearest problem where solving that problem solves all the other ones.

I put the argument of internal conflicts first because it was the clearest example, and you'll note it obliquely refers to the argument about status. Did you really think that, if a drug were available to make everyone have perfectly sculpted bodies, one would get the same social satisfaction from that variety of beauty?

If it can do a reasonable job of comparing utilities across people, then maximizing average utility seems to do the right thing here.

I doubt it can measure utilities, as I argued two posts ago, and simple average utilitarianism is so wracked with problems that I'm not even sure where to begin.

Comparing utilities between arbitrary rational agents doesn't work, but comparing utilities between humans seems to -- there's an approximate universal maximum (getting everything you want) and an approximate universal minimum (you and all your friends and relatives getting tortured to death).

A common tactic in human interaction is to care about everything more than the other person does, and explode (or become depressed) when they don't get their way. How should such real-life utility monsters be dealt with?

Status conflicts are not one of the interesting use cases.

Why do you find status uninteresting?

Comment author: TimFreeman 23 August 2011 08:28:59PM 0 points [-]

A common tactic in human interaction is to care about everything more than the other person does, and explode (or become depressed) when they don't get their way. How should such real-life utility monsters be dealt with?

If everyone's inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that's a fine outcome.

I doubt it can measure utilities

I think it can, in principle, estimate utilities from behavior. See http://www.fungible.com/respect.

simple average utilitarianism is so wracked with problems I'm not even sure where to begin.

The problems I'm aware of have to do with creating new people. If you assume a fixed population and humans who have comparable utilities as described above, are there any problems left? Creating new people is a more interesting use case than status conflicts.

Why do you find status uninteresting?

As I said, because maximizing average utility seems to get a reasonable result in that case.

Comment author: Vaniver 17 August 2011 02:28:35AM 0 points [-]

Its understanding of you doesn't have to be more rigorous than your understanding of you.

It does if I want it to give me results any better than I can provide for myself. I also provided the trivial example of internal conflicts- external conflicts are much more problematic. Human desire for status is possibly the source of all human striving and accomplishment. How will a FAI deal with the status conflicts that develop?

Comment author: TimFreeman 18 August 2011 03:51:14AM 0 points [-]

Its understanding of you doesn't have to be more rigorous than your understanding of you.

It does if I want it to give me results any better than I can provide for myself.

No. For example, if it develops some diet drug that lets you safely enjoy eating and still stay skinny and beautiful, that might be a better result than you could provide for yourself, and it doesn't need any special understanding of you to make that happen. It just makes the drug, makes sure you know the consequences of taking it, and offers it to you. If you choose to take it, that tells the AI more about your preferences, but there's no profound understanding of psychology required.

I also provided the trivial example of internal conflicts- external conflicts are much more problematic.

Putting an inferior argument first is good if you want to try to get the last word, but it's not a useful part of problem solving. You should try to find the clearest problem where solving that problem solves all the other ones.

How will a FAI deal with the status conflicts that develop?

If it can do a reasonable job of comparing utilities across people, then maximizing average utility seems to do the right thing here. Comparing utilities between arbitrary rational agents doesn't work, but comparing utilities between humans seems to -- there's an approximate universal maximum (getting everything you want) and an approximate universal minimum (you and all your friends and relatives getting tortured to death). Status conflicts are not one of the interesting use cases. Do you have anything better?
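The interpersonal comparison being proposed here can be sketched as follows (the raw scales and scores below are hypothetical): rescale each person's utility so their worst imaginable outcome maps to 0 and their best to 1, then sum the normalized values.

```python
# Rescale each person's raw utility so their personal worst case maps to 0
# and their best case to 1; normalized values are then (approximately)
# comparable and can be summed across people.
def normalize(raw, worst, best):
    return (raw - worst) / (best - worst)

# Hypothetical raw scores on arbitrary personal scales:
# Alice rates outcomes on a -50..10 scale, Bob on a 0..200 scale.
alice_n = normalize(-44.0, worst=-50.0, best=10.0)  # 0.1
bob_n = normalize(160.0, worst=0.0, best=200.0)     # 0.8
total = alice_n + bob_n  # comparable only after normalization
```

This is the sense in which an "approximate universal maximum" and "minimum" make the comparison possible: they pin down the two ends of each person's scale.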

Comment author: Vaniver 14 August 2011 09:43:22PM 2 points [-]

Can you attempt to state an argument for that?

Sure. I think I should clarify first that I meant evo psych should have been enough to show that human preferences are not rigorously coherent. If I tell a FAI to make me do what I want to do, its response is going to be "which you?", as there is no Platonic me with a quickly identifiable utility function that it can optimize for me. There's just a bunch of modules that won the evolutionary tournament of survival because they're a good way to make grandchildren.

If I am conflicted between the emotional satisfaction of food and the emotional dissatisfaction of exercise combined with the social satisfaction of beauty, will a FAI be able to resolve that for me any more easily than I can resolve it?

If my far mode desires are rooted in my desire to have a good social identity, should the FAI choose those over my near mode desires which are rooted in my desire to survive and enjoy life?

In some sense, the problem of FAI is the problem of rigorously understanding humans, and evo psych suggests that will be a massively difficult problem. That's what I was trying to suggest with my comment.

Comment author: TimFreeman 16 August 2011 05:57:37PM 0 points [-]

In some sense, the problem of FAI is the problem of rigorously understanding humans, and evo psych suggests that will be a massively difficult problem.

I think that bar is unreasonably high. If you have a conflict between enjoying eating a lot vs. being skinny and beautiful, and the FAI helps you do one or the other, then you aren't in a position to complain that it did the wrong thing. Its understanding of you doesn't have to be more rigorous than your understanding of you.

Comment author: kragensitaker 13 August 2011 03:48:32AM 2 points [-]

Oh, thank you! I didn't realize that. Perhaps a process could be developed? For example, maybe you could chill the body rapidly to organ-donation temperatures, garrote the neck, extract the organs while maintaining head blood pressure with the garrote, then remove the head and connect perfusion apparatus to it?

Comment author: TimFreeman 13 August 2011 08:56:10PM 8 points [-]

For example, maybe you could chill the body rapidly to organ-donation temperatures, garrote the neck, ...

It's worse than I said, by the way. If the patient is donating kidneys and is brain dead, the cryonics people want the suspension to happen as soon as possible to minimize further brain damage. The organ donation people want the organ donation to happen when the surgical team and recipient are ready, so there will be conflict over the schedule.

In any case, the fraction of organ donors is small, and the fraction of cryonics cases is much smaller, and the two groups do not have a history of working with each other. Thus even if the procedure is technically possible, I don't know of an individual who would be interested in developing the hybrid procedure. There's lots of other stuff that is more important to everyone involved.
