Related: Pinpointing Utility
Let's go for lunch at the Hypothetical Diner; I have something I want to discuss with you.
We will pick our lunch from the set of possible orders, and we will receive a meal drawn from the set of possible meals, O. In general, each possible order has an associated probability distribution over O. The Hypothetical Diner takes care to simplify your analysis: the probability distribution is trivial; you always get exactly what you ordered.
Again to simplify your lunch, the Hypothetical Diner offers only two choices on the menu: the Soup, and the Bagel.
To then complicate things so that we have something to talk about, suppose there is some set M of ways other things could be that may affect your preferences. Perhaps you have sore teeth on some days.
Suppose for the purposes of this hypothetical lunch date that you are VNM rational. Shocking, I know, but the hypothetical results are clear: you have a utility function, U. The domain of the utility function is the product of all the variables that affect your preferences (which meal, and whether your teeth are sore): U: M x O -> utility.
In our case, if your teeth are sore, you prefer the soup, as it is less painful. If your teeth are not sore, you prefer the bagel, because it is tastier:
U(sore & soup) > U(sore & bagel)
U(~sore & soup) < U(~sore & bagel)
Your global utility function can be partially applied to some m in M to get an "object-level" utility function U_m: O -> utility. Note that the restrictions of U made in this way need not have any resemblance to each other; they are completely separate.
It is convenient to think about and define these restricted "utility function patches" separately. Let's pick some units and datums so we can get concrete numbers for our utilities:
U_sore(soup) = 1 ; U_sore(bagel) = 0
U_unsore(soup) = 0 ; U_unsore(bagel) = 1
Those are separate utility functions now, so we could pick units and datums separately. Because of this, the sore numbers are totally incommensurable with the unsore numbers. Don't try to compare them between the utility functions or you will get type-poisoning. The actual numbers are just a straightforward encoding of the preferences mentioned above.
What if we are unsure about where we fall in M? Say you won't know whether your teeth are sore until you take the first bite; that is, we have a probability distribution over M. Maybe we are 70% sure that your teeth won't hurt you today. What should you order?
Well, it's usually a good idea to maximize expected utility:
EU(soup) = 30%*U(sore&soup) + 70%*U(~sore&soup) = ???
EU(bagel) = 30%*U(sore&bagel) + 70%*U(~sore&bagel) = ???
Suddenly we need those utility function patches to be commensurable so that we can actually compute these, but we went and defined them separately. Darn. All is not lost, though: recall that they are just restrictions of a global utility function to a particular soreness-circumstance, with some (positive) linear transforms, f_m, thrown in to make the numbers nice:
f_sore(U(sore&soup)) = 1 ; f_sore(U(sore&bagel)) = 0
f_unsore(U(~sore&soup)) = 0 ; f_unsore(U(~sore&bagel)) = 1
At this point, it's just a bit of clever function-inverting and all is dandy. We can pick some linear transform g to be canonical, and transform all the utility function patches into that basis. So for all m, we can get g(U(m & o)) by inverting the f_m and then applying g:

g.U(sore & x) = (g.inv(f_sore).f_sore)(U(sore & x)) = k_sore*U_sore(x) + c_sore
g.U(~sore & x) = (g.inv(f_unsore).f_unsore)(U(~sore & x)) = k_unsore*U_unsore(x) + c_unsore

(I'm using . to represent composition of those transforms. I hope that's not too confusing.)
Linear transforms are really nice; all the inverting and composing collapses down to a scale k and an offset c for each utility function patch. Now we've turned our bag of utility function patches into a utility function quilt! One more bit of math before we get back to deciding what to eat:
EU(x) = P(sore) *(k_sore *U_sore(x) + c_sore) + (1-P(sore))*(k_unsore*U_unsore(x) + c_unsore)
Notice that the terms involving c_m do not involve x, meaning that the c_m terms don't affect our decision, so we can cancel them out and forget they ever existed! This is only true because I've implicitly assumed that P(m) does not depend on our actions. If it did, like if we could go to the dentist or take some painkillers, then it would be P(m | x), and c_m would be relevant in the whole joint decision.
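To make the offset-cancellation concrete, here is a quick numerical check in Python. The patch utilities are the illustrative lunch numbers from this post; the offsets are arbitrary made-up values:

```python
# Expected utility with per-patch scale k_m and offset c_m.
# Claim: the offsets c_m shift every option's EU by the same amount,
# so they never change which option maximizes EU (as long as P(m)
# does not depend on our action).

def eu(option, p_sore, k, c, u):
    return (p_sore       * (k["sore"]   * u["sore"][option]   + c["sore"]) +
            (1 - p_sore) * (k["unsore"] * u["unsore"][option] + c["unsore"]))

u = {"sore":   {"soup": 1, "bagel": 0},
     "unsore": {"soup": 0, "bagel": 1}}
k = {"sore": 1, "unsore": 1/5}

for c in [{"sore": 0, "unsore": 0}, {"sore": 42, "unsore": -7}]:
    diff = eu("soup", 0.3, k, c, u) - eu("bagel", 0.3, k, c, u)
    print(round(diff, 10))  # same EU gap regardless of the offsets
```

Both iterations print the same gap, so the choice of datum for each patch really is irrelevant to the decision.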
We can define the canonical utility basis g to be whatever we like (among positive linear transforms); for example, we can make it equal to f_sore so that we can at least keep the simple numbers from U_sore. Then we throw all the c_m's away, because they don't matter. Then it's just a matter of getting the remaining scaling constants k_m.
Ok, sorry, those last few paragraphs were rather abstract. Back to lunch. We just need to define these mysterious scaling constants and then we can order lunch. There is only one left: k_unsore. In general there will be n - 1 of them, where n is the size of M. I think the easiest way to approach this is to let k_unsore = 1/5 and see what that implies:
g.U(sore & soup) = 1 ; g.U(sore & bagel) = 0
g.U(~sore & soup) = 0 ; g.U(~sore & bagel) = 1/5
EU(soup) = (1-P(~sore))*1 = 0.3
EU(bagel) = P(~sore)*k_unsore = 0.14
EU(soup) > EU(bagel)
After all the arithmetic, it looks like if k_unsore = 1/5, then even though we expect you to have non-sore teeth with P(sore) = 0.3, we are unsure enough, and the relative importance is big enough, that we should play it safe and go with the soup anyway. In general we would choose soup if P(~sore) < 1/(k_unsore + 1), or equivalently, if k_unsore < (1 - P(~sore))/P(~sore).
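If you want to sanity-check that algebra, a brute-force sweep in Python (using the same 0/1 patch utilities as above) confirms the threshold:

```python
# Check the decision rule: prefer soup iff P(~sore) < 1/(k_unsore + 1).
# Brute-force over a grid of probabilities and a few scales.

def prefers_soup(p_unsore, k_unsore):
    eu_soup  = (1 - p_unsore) * 1      # soup only pays off when sore
    eu_bagel = p_unsore * k_unsore     # bagel only pays off when not sore
    return eu_soup > eu_bagel

for k in [0.2, 1.0, 5.0]:
    for i in range(1, 100):
        p = i / 100
        assert prefers_soup(p, k) == (p < 1 / (k + 1))
print("decision rule checks out")
```

At the lunch numbers (P(~sore) = 0.7, k_unsore = 0.2), the rule indeed comes out in favor of the soup.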
k is somehow the relative importance of possible preference structures under uncertainty. A smaller k in this lunch example means that the tastiness of a bagel over a soup is small relative to the pain saved by eating the soup instead. With this intuition, we can see that 1/5 is a somewhat reasonable value for this scenario, while, for example, 1 would not be, and neither would a value very much smaller.
What if we are uncertain about k? Are we simply pushing the problem up some meta-chain? It turns out that no, we are not. Because k is linearly related to utility, you can simply use its expected value if it is uncertain.
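A small numerical illustration of why this works: expected utility is linear in k, so averaging the expected utilities over possible values of k gives exactly the same answer as plugging in E[k]. (The candidate k values below are made up.)

```python
# If k_unsore is itself uncertain, EU is linear in k, so averaging
# over possible k values equals using the single number E[k].
import random

random.seed(0)
ks = [random.uniform(0.1, 0.5) for _ in range(10)]  # possible values of k_unsore
pk = [1 / len(ks)] * len(ks)                        # uniform belief over them

def eu_bagel(p_unsore, k):
    return p_unsore * k

avg_over_k   = sum(p * eu_bagel(0.7, k) for p, k in zip(pk, ks))
using_mean_k = eu_bagel(0.7, sum(p * k for p, k in zip(pk, ks)))
assert abs(avg_over_k - using_mean_k) < 1e-12
print("E[k] substitution is exact")
```

This is the sense in which uncertainty about k does not push the problem up a meta-chain: the expectation collapses back into a single effective scale.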
It's kind of ugly to have these k_m's and these U_m's, so we can just reason over the product K x M instead of over M and K separately. This is nothing weird; it just means we have more utility function patches (many of which encode the exact same object-level preferences).
In the most general case, the utility function patches in K x M are the space of all functions O -> RR, with offset equivalence but not scale equivalence. (Sovereign utility functions have full linear-transform equivalence, but these patches are only equivalent under offset.) Remember, though, that these are just restricted patches of a single global utility function.
So what is the point of all this? Are we just playing in the VNM sandbox, or is this result actually interesting for anything besides sore teeth?
Perhaps Moral/Preference Uncertainty? I didn't mention it until now because it's easier to think about lunch than a philosophical minefield, but it is the point of this post. Sorry about that. Let's conclude with everything restated in terms of moral uncertainty.
If we have:
- A set of object-level outcomes, O,
- A set of "epiphenomenal" (outside of O) "moral" outcomes, M,
- A probability distribution over M, possibly correlated with uncertainty about O, but not in a way that allows our actions to influence uncertainty over M (that is, assuming moral facts cannot be changed by your actions),
- A utility function over O for each possible value of M (these can be arbitrary VNM-rational moral theories, as long as they share the same object-level),
- And we wish to be VNM rational over whatever uncertainty we have,
then we can quilt together a global utility function U: (M, K, O) -> RR, where U(m, k, o) = k*U_m(o), so that EU(o) is the sum over all m of P(m)*E(k | m)*U_m(o).
Somehow this all seems like legal VNM.
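Here is the whole quilt as a short, illustrative Python sketch. The probabilities, scales, and patch utilities are just the lunch numbers from earlier, standing in for moral theories:

```python
# A sketch of the "quilted" utility function from the summary above:
# EU(o) = sum over m of P(m) * E[k | m] * U_m(o).
# All names and numbers are illustrative, not from any canonical theory.

def quilted_eu(option, p_m, e_k, u_m):
    return sum(p_m[m] * e_k[m] * u_m[m][option] for m in p_m)

p_m = {"sore": 0.3, "unsore": 0.7}   # belief over "moral" outcomes
e_k = {"sore": 1.0, "unsore": 0.2}   # expected scale for each patch
u_m = {"sore":   {"soup": 1, "bagel": 0},
       "unsore": {"soup": 0, "bagel": 1}}

best = max(["soup", "bagel"], key=lambda o: quilted_eu(o, p_m, e_k, u_m))
print(best)  # soup (EU 0.3 vs. the bagel's 0.14)
```

Swapping in different patch utilities and scales is all it takes to model different sets of candidate moral theories.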
So. Just the possible object-level preferences and a probability distribution over those is not enough to define our behaviour. We need to know the scale for each so we know how to act when uncertain. This is analogous to the switch from ordinal preferences to interval preferences when dealing with object-level uncertainty.
Now we have a well-defined framework for reasoning about preference uncertainty, if all our possible moral theories are VNM rational, moral facts are immutable, and we have a joint probability distribution over K x M.
In particular, updating your moral beliefs upon hearing new arguments is no longer a mysterious dynamic; it is just a Bayesian update over possible moral theories.
This requires a "moral prior" that correlates moral outcomes and their relative scales to the observable evidence. In the lunch example, we implicitly used such a moral prior to update on observable thought experiments and conclude that 1/5 was a plausible value for k_unsore.
Moral evidence is probably things like preference thought-experiments, neuroscience and physics results, etc. The actual model for this, and discussion about the issues with defining and reasoning on such a prior are outside the scope of this post.
This whole argument couldn't prove its way out of a wet paper bag, and is merely suggestive. Bits and pieces may be found incorrect, and formalization might change things a bit.
This framework requires that we have already worked out the outcome-space O (which we haven't), have limited our moral confusion to a set of VNM-rational moral theories over O (which we haven't), and have defined a "moral prior" so we can have a probability distribution over moral theories and their weights (which we haven't).
Nonetheless, we can sometimes get those things in special limited cases, and even in the general case, having a model for moral uncertainty and updating is a huge step up from the terrifying confusion I (and everyone I've talked to) had before working this out.
Keith Stanovich is a leading expert on the cogsci of rationality, but he has also written on a problem related to CEV, that of the "rational integration" of our preferences. Here he is on pages 81-86 of Rationality and the Reflective Mind (currently my single favorite book on rationality, out of the dozens I've read):
All multiple-process models of mind capture a phenomenal aspect of human decision making that is of profound importance — that humans often feel alienated from their choices. We display what folk psychology and philosophers term weakness of will. For example, we continue to smoke when we know that it is a harmful habit. We order a sweet after a large meal, merely an hour after pledging to ourselves that we would not. In fact, we display alienation from our responses even in situations that do not involve weakness of will — we find ourselves recoiling from the sight of a disfigured person even after a lifetime of dedication to diversity and inclusion.
This feeling of alienation — although emotionally discomfiting when it occurs — is actually a reflection of a unique aspect of human cognition: the use of Type 2 metarepresentational abilities to enable a cognitive critique of our beliefs and our desires. Beliefs about how well we are forming beliefs become possible because of such metarepresentation, as does the ability to evaluate one's desires — to desire to desire differently...
...There is a philosophical literature on the notion of higher-order evaluation of desires... For example, in a classic paper on second-order desires, Frankfurt (1971) speculated that only humans have such metarepresentational states. He evocatively termed creatures without second-order desires (other animals, human babies) wantons... A wanton simply does not reflect on his/her goals. Wantons want — but they do not care what they want.
Nonwantons, however, can represent a model of an idealized preference structure — perhaps, for example, a model based on a superordinate judgment of long-term lifespan considerations... So a human can say: I would prefer to prefer not to smoke. This second-order preference can then become a motivational competitor to the first-order preference. At the level of second-order preferences, I prefer to prefer to not smoke; nevertheless, as a first-order preference, I prefer to smoke. The resulting conflict signals that I lack what Nozick (1993) terms rational integration in my preference structures. Such a mismatched first-/second-order preference structure is one reason why humans are often less rational than bees in an axiomatic sense (see Stanovich 2004, pp. 243-247). This is because the struggle to achieve rational integration can destabilize first-order preferences in ways that make them more prone to the context effects that lead to the violation of the basic axioms of utility theory (see Lee, Amir, & Ariely 2009).
The struggle for rational integration is also what contributes to the feeling of alienation that people in the modern world often feel when contemplating the choices that they have made. People easily detect when their high-order preferences conflict with the choices actually made.
Of course, there is no limit to the hierarchy of higher-order desires that might be constructed. But the representational abilities of humans may set some limits — certainly three levels above seems a realistic limit for most people in the nonsocial domain (Dworkin 1988). However, third-order judgments can be called upon to help achieve rational integration at lower levels. So, for example, imagine that John is a smoker. He might realize the following when he probes his feelings: He prefers his preference to prefer not to smoke over his preference for smoking.
We might in this case say that John's third-order judgment has ratified his second-order evaluation. Presumably this ratification of his second-order judgment adds to the cognitive pressure to change the first-order preference by taking behavioral measures that will make change more likely (entering a smoking cessation program, consulting his physician, staying out of smoky bars, etc.).
On the other hand, a third-order judgment might undermine the second-order preference by failing to ratify it: John might prefer to smoke more than he prefers his preference to prefer not to smoke.
In this case, although John wishes he did not want to smoke, the preference for this preference is not as strong as his preference for smoking itself. We might suspect that this third-order judgment might not only prevent John from taking strong behavioral steps to rid himself of his addiction, but that over time it might erode his conviction in his second-order preference itself, thus bringing rational integration to all three levels.
Typically, philosophers have tended to bias their analyses toward the highest level desire that is constructed — privileging the highest point in the regress of higher-order evaluations, using that as the foundation, and defining it as the true self. Modern cognitive science would suggest instead a Neurathian project in which no level of analysis is uniquely privileged. Philosopher Otto Neurath... employed the metaphor of a boat having some rotten planks. The best way to repair the planks would be to bring the boat ashore, stand on firm ground, and replace the planks. But what if the boat could not be brought ashore? Actually, the boat could still be repaired but at some risk. We could repair the planks at sea by standing on some of the planks while repairing others. The project could work — we could repair the boat without being on the firm foundation of ground. The Neurathian project is not guaranteed, however, because we might choose to stand on a rotten plank. For example, nothing in Frankfurt's (1971) notion of higher-order desires guarantees against higher-order judgments being infected by memes... that are personally damaging.
Preferences are important both for rationality and for Friendly AI, so preferences are a major topic of discussion on Less Wrong. We've discussed preferences in the context of economics and decision theory, but I think AI has a more robust set of tools for working with preferences than either economics or decision theory has, so I'd like to introduce Less Wrong to some of these tools. In particular, I think AI's toolset for working with preferences may help us think more clearly about CEV.
In AI, we can think of working with preferences in four steps:
- Preference acquisition: In this step, we aim to extract preferences from a user. This can occur either by preference learning or by preference elicitation. Preference learning occurs when preferences are acquired from data about the user's past behavior or past preferences. Preference elicitation occurs as a result of an interactive process with the user, e.g. a question-answer process.
- Preference modeling: Our next step is to mathematically express these acquired preferences as preferences between pairwise choices. The properties of a preference model are important. For example, is the relation transitive? (If the model tells us that choice c1 is preferred to c2, and c2 is preferred to c3, can we conclude that c1 is preferred to c3?) And is the relation complete? (Is any choice comparable to any other choice, or are there some incomparabilities?)
- Preference representation: Assuming we want to capture and manipulate the user's preferences robustly, we'll next want to represent the preferences model in a preference representation language.
- Preference reasoning: Once a user's preferences are represented in a preference representation language, we can do cool things like preference aggregation (involving the preferences of multiple agents) and preference revision (a user's new preferences being added to her old preferences). We can also perform the usual computations of decision theory, game theory, and more.
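As a toy illustration of the modeling step above, here is a minimal Python sketch that checks transitivity and completeness of a pairwise preference relation. The choices and preferences are made up:

```python
# Check two properties of a preference model: given pairwise preferences
# as a set of (preferred, dispreferred) pairs, test whether the relation
# is transitive and complete over a set of choices.

prefers = {("c1", "c2"), ("c2", "c3"), ("c1", "c3")}
choices = ["c1", "c2", "c3"]

def is_transitive(rel):
    # Every a > b and b > c must imply a > c.
    return all((a, c) in rel
               for (a, b) in rel for (b2, c) in rel if b == b2)

def is_complete(rel, items):
    # Every pair of distinct choices must be comparable one way or the other.
    return all((a, b) in rel or (b, a) in rel
               for i, a in enumerate(items) for b in items[i+1:])

print(is_transitive(prefers), is_complete(prefers, choices))  # True True
```

Dropping the ("c1", "c3") pair makes the relation both intransitive and incomplete, which is exactly the kind of defect a preference model has to surface.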
Preference learning is typically an application of supervised machine learning (classification). Throw the algorithm at a database containing a user's preferences, and it will learn that user's preferences and make predictions about the preferences not listed in the database, including preferences about pairwise choices the user may never have faced before.
Preference elicitation involves asking a user a series of questions, and extracting their preferences from the answers they give. Chen & Pu (2004) survey some of the methods used for this.
In studying CEV, I am interested in methods built for learning a user's utility function from inconsistent behavior (because humans make inconsistent choices). Nielsen & Jensen (2004) provided two computationally tractable algorithms which handle the problem by interpreting inconsistent behavior as random deviations from an underlying "true" utility function. As far as I know, however, nobody in AI has tried to solve the problem with an algorithm informed by the latest data from neuroeconomics on how human choice is the product of at least three valuation systems, only one of which looks anything like an "underlying true utility function."
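To be clear, what follows is not Nielsen & Jensen's algorithm — just a toy sketch of the general idea: treat inconsistent choices as noisy comparisons and recover an approximate utility ordering from win rates. The observed choices are invented:

```python
# Toy illustration of learning utilities from inconsistent choices:
# each observation is (chosen, rejected); an option's score is the
# fraction of its appearances in which it was chosen.
from collections import Counter

observed = [("bagel", "soup"), ("bagel", "soup"), ("soup", "bagel"),
            ("bagel", "donut"), ("soup", "donut"), ("donut", "soup")]

wins = Counter(w for w, _ in observed)
trials = Counter()
for w, l in observed:
    trials[w] += 1
    trials[l] += 1

utility = {o: wins[o] / trials[o] for o in trials}
ranking = sorted(utility, key=utility.get, reverse=True)
print(ranking)  # bagel first, despite the occasional contrary choice
```

Real approaches do much better than raw win rates (e.g. maximum-likelihood fits of a noise model around a latent utility function), but the shape of the problem is the same.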
A model of a user's preferences describes one of three relations between any two choices ("objects"): a strict preference relation which says that one choice is preferred to another, an indifference relation, and an incomparability relation. Kaci (2011), chapter 2 provides a brief account of preference modeling.
In decision theory, a preference relation is represented by a numerical function which associates a utility value with each choice. But this may not be the best representation. We face an exponential number of choices whose explicit enumeration and evaluation is time-consuming. Moreover, users can't compare all pairwise choices and evaluate how satisfactory each choice is.
Luckily, choices are often made on the basis of a set of attributes, e.g. cost, color, price, etc. You can use a preference representation language to represent partial descriptions of preferences and rank-order possible choices. The challenge of a preference representation language is that it should (1) cope with a user's preferences, (2) faithfully represent the user's preferences such that it rank-orders choices in a way similar to how the user would specify choices if they were able to provide preferences for every pairwise comparison, (3) cope with possibly inconsistent preferences, and (4) offer attractive complexity properties, i.e. the spatial cost of representing partial descriptions of preferences and the time cost of comparing pairwise choices or computing the best choices.
One popular method of preference representation is with the graphical representation language of conditional preference networks or "CP-nets." They look like this.
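The linked picture isn't reproduced here, but the flavor of a CP-net can be sketched in a few lines: each variable carries a preference order conditioned on the values of its parent variables. This is a hypothetical hand-rolled structure, not a real CP-net library:

```python
# A toy CP-net-style representation: each variable's preference order
# over its values depends on the values of its parent variables.

cpnet = {
    "main":  {"parents": (),        "order": {(): ["soup", "bagel"]}},
    "drink": {"parents": ("main",), "order": {("soup",):  ["water", "wine"],
                                              ("bagel",): ["wine", "water"]}},
}

def preferred(var, assignment):
    # Look up this variable's preference order given its parents' values.
    key = tuple(assignment[p] for p in cpnet[var]["parents"])
    return cpnet[var]["order"][key][0]

# Sweep through the variables in an order consistent with the parent
# structure, picking each variable's most-preferred value in context.
choice = {"main": preferred("main", {})}
choice["drink"] = preferred("drink", choice)
print(choice)  # {'main': 'soup', 'drink': 'water'}
```

The conditional tables are what make CP-nets compact: preferences over drinks need only be stated per value of "main", not per complete outcome.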
There are a multitude of ways in which one might want to reason algorithmically about preferences. I point the reader to Part II of Kaci (2011) for a very incomplete overview.
Domshlak et al. (2011). Preferences in AI: An Overview. Artificial Intelligence 175: 1037-1052.
Fürnkranz & Hüllermeier (2010). Preference Learning. Springer.
Kaci (2011). Working with Preferences: Less is More. Springer.
I've recently been thinking about future job prospects and ways that I might alter my preferences to increase the likelihood that I'll be happy with my future career. I have read some of the LessWrong resources about this issue, but they don't seem to address my particular concerns. I think there is a high relative importance in selecting a career with a high capacity for making me happy. It will consume at least 8 prime daylight hours of my work days, and in many cases also some of the weekend. In all likelihood I will also be forced to sit in front of a computer for extended periods of time. The tasks I am assigned may have nothing to do with the things that I happen to find intellectually interesting or of ethical importance. And the work will likely sap me of most of the energy that I could use to pursue hobbies or other more "intrinsically worthwhile endeavors" (intrinsic to my personal preference ordering). Given that I believe these factors will largely determine whether I feel happy in many future situations, and also whether I feel generically happy about the content of my life as a whole, I think it is worthwhile to seek advice from other rationalists on how to choose an appropriate career goal and take steps to pursue it.
What I have found on LessWrong, however, is that ambiguous and open-ended pleas for advice generally steer off course, even if the tangential issues are very interesting and insightful. Rather than query everyone for open advice about preference hacking, vague goal achievement, and wisdom for properly assigning value to some of the factors I have listed above, I propose a simpler informal job survey.
If you are interested, please briefly list the job you have or the job of someone you know very well (well enough that you feel you know relevant details about the job, details that may be hard to gather in less than 1 hour of internet searching). You don't have to reveal the location or name of the employer or anything like that, just the type of job. Optionally, please also include a sentence stating whether you (or your friend, etc.) seem to enjoy the job and why. For example, my entry would be like this:
I am a graduate student studying applied mathematics. I enjoy the access to educational resources and the flexible schedule that my current job offers, but I think my personal displeasure with computer programming and my perception that future jobs doing mathematical theory are scarce cause me to dislike the job overall.
If enough people are willing to participate, my hope is that the stream of small anecdotal remarks will serve as a brainstorming session. I hope to hear about jobs I may never have thought of, and also reasons for liking or disliking a job that I may never have thought of. The goal is to spark additional search on my own and also to gauge my current preferences in light of preferences that others have experienced with specific jobs. Such a survey would be a very helpful resource allowing me to synthesize data about job directions where the initial search will have a higher probability of being helpful for me.
I was inspired by the recent post discussing self-hacking for the purpose of changing a relationship perspective to achieve a goal. Despite my feeling inspired, though, I also felt like life hacking was not something I could ever want to do even if I perceived benefits to doing it. It seems to me that the place where I would need to begin is hacking myself in order to cause myself to want to be hacked. But then I started contemplating whether this is a plausible thing to do.
In my own case, there are two concrete examples in mind. I am a graduate student working on applied math and probability theory in the field of machine vision. I was one of those bright-eyed, bushy-tailed dolts as an undergrad who just sort of floated to grad school believing that as long as I worked sufficiently hard, it was a logical conclusion that I would get a tenure-track faculty position at a desirable university. Even though I am a fellowship award winner and I am working with a well-known researcher at an Ivy League school, my experience in grad school (along with some noted articles) has forced me to re-examine a lot of my priorities. Tenure-track positions are just too difficult to achieve, and achieving them is based on networking, politics, and whether the popularity of your research happens to have a peak at the same time that your productivity in that area also has a peak.
But the alternatives that I see are: join the consulting/business/startup world, become a programmer/analyst for a large software/IT/computer company, work for a government research lab. I worked for two years at MIT's Lincoln Laboratory as a radar analyst and signal processing algorithm developer prior to grad school. The main reason I left that job was because I (foolishly) thought that graduate school was where someone goes to specifically learn the higher-level knowledge and skills to do theoretical work that transcends the software development / data processing work that is so common. I'm more interested in creating tools that go into the toolbox of an engineer than with actually using those tools to create something that people want to pay for.
I have been deeply thinking about these issues for more than two years now, almost every day. I read everything that I can and I try to be as blunt and to-the-point about it as I can be. Future career prospects seem bleak to me. Everyone is getting crushed by data right now. I was just talking with my adviser recently about how so much of the mathematical framework for studying vision over the last 30 years is just being flushed down the tubes because of the massive amount of data processing and large scale machine learning we can now tractably perform. If you want to build a cup-detector, for example, you can do lots of fancy modeling, stochastic texture mapping, active contour models, fancy differential geometry, occlusion modeling, etc. Or... you can just train an SVM on 50,000,000 weakly labeled images of cups you find on the internet. And that SVM will utterly crush the performance of the expert system based on 30 years of research from amazing mathematicians. And this crushing effect only stands to get much much worse, and at an increasing pace.
In light of this, it seems to me that I should be learning as much as I can about large-scale data processing, GPU computing, advanced parallel architectures, and the gross details of implementing bleeding edge machine learning. But, currently, this is exactly the sort of thing I hate and went to graduate school to avoid. I wanted to study Total Variation minimization, or PDE-driven diffusion models in image processing, etc. And these are things that are completely crushed by large data processing.
So anyway, long story short: suppose that I really like "math theory and teaching at a respected research university" but I see the coming data steamroller and believe that this preference will cause me to feel unhappy in the future when many other preferences I have (and some I don't yet know about) are affected negatively by pursuit of a phantom tenure-track position. But suppose also that another preference I have is that I really hate "writing computer code to build widgets for customers," which can include large scale data analyses, and thus I feel an aversion to even trying to *want* to hack myself and orient myself to a more practical career goal.
How does one hack one's self to change one's preferences when the preference in question is "I don't want to hack myself?"
Some people see never-existed people as moral agents, and claim that we can talk about their preferences. Generally this means their personal preference for existing versus non-existing. Formulations such as "it is better for someone to have existed than not" reflect this way of thinking.
But if the preferences of the never-existed are relevant, then their non-personal preferences are also relevant. Do they prefer a blue world or a pink one? Would they want us to change our political systems? Would they want us to not bring into existence some never-existent people they don't like?
It seems that those who are advocating bringing never-existent people into being in order to satisfy those people's preferences should be focusing their attention on their non-personal preferences instead. After all, we can only bring into being so many trillions of trillions of trillions; but there is no theoretical limit to the number of never-existent people whose non-personal preferences we can satisfy. Just get some reasonable measure across the preferences of never-existent people, and see if there's anything that sticks out from the mass.