You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Naturalism versus unbounded (or unmaximisable) utility options

34 Stuart_Armstrong 01 February 2013 05:45PM

There are many paradoxes with unbounded utility functions. For instance, consider whether it's rational to spend eternity in Hell:

Suppose that you die, and God offers you a deal. You can spend 1 day in Hell, and he will give you 2 days in Heaven, and then you will spend the rest of eternity in Purgatory (which is positioned exactly midway in utility between heaven and hell). You decide that it's a good deal, and accept. At the end of your first day in Hell, God offers you the same deal: 1 extra day in Hell, and you will get 2 more days in Heaven. Again you accept. The same deal is offered at the end of the second day.

And the result is... that you spend eternity in Hell. There is never a rational moment to leave for Heaven - that decision is always dominated by the decision to stay in Hell.
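A minimal sketch of why the deal never stops looking good. The per-day utilities (Hell = -1, Heaven = +1, Purgatory = 0) and the function names are assumptions made for illustration; the original only fixes Purgatory as the midpoint.

```python
# Assumed per-day utilities (illustrative only): Hell = -1, Heaven = +1, Purgatory = 0.
HELL, HEAVEN = -1.0, 1.0


def total_utility(days_in_hell: int) -> float:
    """Utility of accepting the deal `days_in_hell` times, then leaving.

    Each accepted day in Hell buys two days in Heaven; the eternity in
    Purgatory afterwards contributes 0 under the assumed utilities.
    """
    return days_in_hell * HELL + 2 * days_in_hell * HEAVEN


def should_accept_again(days_so_far: int) -> bool:
    """Compare leaving now with accepting one more day and then leaving."""
    return total_utility(days_so_far + 1) > total_utility(days_so_far)


if __name__ == "__main__":
    for n in (0, 1, 10, 1000):
        print(n, total_utility(n), should_accept_again(n))
```

Every finite stopping day is beaten by the next one, yet the policy "always accept" never collects any of the Heaven days it has earned.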

Or consider a simpler paradox:

You're immortal. Tell Omega any natural number, and he will give you that much utility. On top of that, he will give you any utility you may have lost in the decision process (such as the time wasted choosing and specifying your number). Then he departs. What number will you choose?

Again, there's no good answer to this problem - any number you name, you could have got more by naming a higher one. And since Omega compensates you for extra effort, there's never any reason to not name a higher number.

It seems that these are problems caused by unbounded utility. But that's not the case, in fact! Consider:

You're immortal. Tell Omega any real number r > 0, and he'll give you 1-r utility. On top of that, he will give you any utility you may have lost in the decision process (such as the time wasted choosing and specifying your number). Then he departs. What number will you choose?
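The bounded version can be sketched the same way. The payoff 1 - r is from the problem statement; halving r is just one illustrative way (an assumption, not the only one) to improve on any given choice.

```python
from fractions import Fraction


def payoff(r: Fraction) -> Fraction:
    """Utility Omega pays for naming a real number r > 0."""
    if r <= 0:
        raise ValueError("r must be strictly positive")
    return 1 - r


def better_choice(r: Fraction) -> Fraction:
    """Any valid r can be improved on, for example by halving it."""
    return r / 2


if __name__ == "__main__":
    r = Fraction(1, 10)
    for _ in range(5):
        print(r, payoff(r))
        r = better_choice(r)
```

The payoff stays strictly below 1 for every allowed r, so the utility is bounded, but no choice attains the supremum.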


Proof of fungibility theorem

3 Nisan 12 January 2013 09:26AM

Appendix to: A fungibility theorem

Suppose that $P$ is a set and we have functions $v_1, \dots, v_n : P \to \mathbb{R}$. Recall that for $p, q \in P$, we say that $p$ is a Pareto improvement over $q$ if for all $i$, we have $v_i(p) \ge v_i(q)$. And we say that it is a strong Pareto improvement if in addition there is some $i$ for which $v_i(p) > v_i(q)$. We call $p$ a Pareto optimum if there is no strong Pareto improvement over it.
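The definitions above can be checked mechanically on a finite set. This is only a sketch: the function names and the example points are hypothetical, and a finite set does not satisfy the mixing property the theorem below requires.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def is_pareto_improvement(p: T, q: T, fns: Sequence[Callable[[T], float]]) -> bool:
    """p is a Pareto improvement over q: no value function ranks p below q."""
    return all(f(p) >= f(q) for f in fns)


def is_strong_pareto_improvement(p: T, q: T, fns: Sequence[Callable[[T], float]]) -> bool:
    """Pareto improvement that is strictly better under at least one function."""
    return is_pareto_improvement(p, q, fns) and any(f(p) > f(q) for f in fns)


def pareto_optima(candidates: Iterable[T], fns: Sequence[Callable[[T], float]]) -> List[T]:
    """Elements with no strong Pareto improvement over them among the candidates."""
    items = list(candidates)
    return [
        q for q in items
        if not any(is_strong_pareto_improvement(p, q, fns) for p in items)
    ]


if __name__ == "__main__":
    points = [(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]
    values = [lambda pt: pt[0], lambda pt: pt[1]]  # two hypothetical value functions
    print(pareto_optima(points, values))
```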

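Theorem. Let $P$ be a set and suppose $v_i : P \to \mathbb{R}$ for $i = 1, \dots, n$ are functions satisfying the following property: For any $p, q \in P$ and any $\alpha \in [0, 1]$, there exists an $r \in P$ such that for all $i$, we have $v_i(r) = \alpha v_i(p) + (1 - \alpha) v_i(q)$.

Then if an element $p$ of $P$ is a Pareto optimum, then there exist nonnegative constants $c_1, \dots, c_n$ such that the function $\sum_i c_i v_i$ achieves a maximum at $p$.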


False vacuum: the universe playing quantum suicide

16 Stuart_Armstrong 09 January 2013 05:04PM

Imagine that the universe is approximately as it appears to be (I know, this is a controversial proposition, but bear with me!). Further imagine that the many-worlds interpretation of quantum mechanics is true (I'm really moving out of Less Wrong's comfort zone here, aren't I?).

Now assume that our universe is in a situation of false vacuum - the universe is not in its lowest energy configuration. Somewhere, at some point, our universe may tunnel into true vacuum, resulting in an expanding bubble of destruction that will eat the entire universe at high speed, destroying all matter and life. In many worlds, such a collapse need not be terminal: life could go on in a branch of lower measure. In fact, anthropically, life will go on somewhere, no matter how unstable the false vacuum is.

So now assume that the false vacuum we're in is highly unstable - the measure of the branch in which our universe survives goes down by a factor of a trillion every second. We only exist because we're in the branch of measure a trillionth of a trillionth of a trillionth of... all the way back to the Big Bang.

None of these assumptions make any difference to what we'd expect to see observationally: only a good enough theory can say that they're right or wrong. You may notice that this setup transforms the whole universe into a quantum suicide situation.

The question is, how do you go about maximising expected utility in this situation? I can think of a few different approaches:

  1. Gnaw on the bullet: take the quantum measure as a probability. This means that you now have a discount factor of a trillion every second. You have to rush out and get/do all the good stuff as fast as possible: a delay of a second costs you a reduction in utility of a trillion. If you are a negative utilitarian, you also have to rush to minimise the bad stuff, but you can also take comfort in the fact that the potential for negative utility across the universe is going down fast.
  2. Use relative measures: care about the relative proportion of good worlds versus bad worlds, while assigning zero to those worlds where the vacuum has collapsed. This requires a natural zero to make sense, and can be seen as quite arbitrary: what would you do about entangled worlds, or about the non-zero probability that the vacuum-collapsed worlds may have worthwhile life in them? Would the relative measure user also put zero value to worlds that were empty of life for other reasons than vacuum collapse? For instance, would they be in favour of programming an AI's friendliness using random quantum bits, if it could be reassured that if friendliness fails, the AI would kill everyone immediately?
  3. Deny the measure: construct a meta-ethical theory where only classical probabilities (or classical uncertainties) count as probabilities. Quantum measures do not: you care about the sum total of all branches of the universe. Universes in which the photon went through the top slit, went through the bottom slit, or was in an entangled state that went through both slits... to you, there are three completely separate universes, and you can assign totally unrelated utilities to each one. This seems quite arbitrary, though: how are you going to construct these preferences across the whole of the quantum universe, when you forged your current preferences on a single branch?
  4. Cheat: note that nothing in life is certain. Even if we have the strongest evidence imaginable about vacuum collapse, there's always a tiny chance that the evidence is wrong. After a few seconds, that probability will be dwarfed by the discount factor of the collapsing universe. So go about your business as usual, knowing that most of the measure/probability mass remains in the non-collapsing universe. This can get tricky if, for instance, the vacuum collapsed more slowly than a factor of a trillion a second. Would you be in a situation where you should behave as if you believed vacuum collapse for another decade, say, and then switch to a behaviour that assumed non-collapse afterwards? Also, would you take seemingly stupid bets, like bets at a trillion trillion trillion to one that the next piece of evidence will show no collapse (if you lose, you're likely in the low measure universe anyway, so the loss is minute)?
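The arithmetic behind option 4 can be sketched as follows. The trillion-per-second factor is from the text; the probability that the evidence is wrong (1e-30 in the usage example) and all function names are hypothetical. Weights are kept in log10 to avoid floating-point underflow.

```python
import math

DISCOUNT_PER_SECOND = 1e-12  # from the text: surviving measure drops a trillion-fold per second


def log10_collapse_weight(seconds: float, p_evidence_wrong: float) -> float:
    """log10 weight of 'the vacuum really is collapsing and we survived so far'."""
    return math.log10(1.0 - p_evidence_wrong) + seconds * math.log10(DISCOUNT_PER_SECOND)


def log10_no_collapse_weight(p_evidence_wrong: float) -> float:
    """log10 weight of 'the evidence was wrong and nothing is collapsing'."""
    return math.log10(p_evidence_wrong)


def crossover_seconds(p_evidence_wrong: float) -> float:
    """Time after which the no-collapse hypothesis carries more weight."""
    if not 0.0 < p_evidence_wrong < 1.0:
        raise ValueError("p_evidence_wrong must lie strictly between 0 and 1")
    gap = math.log10(1.0 - p_evidence_wrong) - math.log10(p_evidence_wrong)
    return gap / -math.log10(DISCOUNT_PER_SECOND)


if __name__ == "__main__":
    p = 1e-30  # hypothetical chance that the collapse evidence is wrong
    print(crossover_seconds(p))
```

With a slower collapse rate the crossover time grows, which is the situation the text describes where you would behave as a collapse believer for a while and then switch.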

 

Inferring Values from Imperfect Optimizers

2 nigerweiss 29 December 2012 10:22PM

One approach to constructing a Friendly artificial intelligence is to create a piece of software that looks at large amounts of evidence about humans, and attempts to infer their values.  I've been doing some thinking about this problem, and I'm going to talk about some approaches and problems that have occurred to me.

 

In a naive approach, we might define the problem like this: take some unknown utility function, U, and plug it into a mathematically clean optimization process (like AIXI) O.  Then, look at your data set and take the information about the inputs and outputs of humans, and find the simplest U that best explains human behavior.

Unfortunately, this won't work.  The best possible match for U is one that models not just those elements of human utility we're interested in, but also all the details of our broken, contradictory optimization process.  The U we derive through this process will optimize for confirmation bias, scope insensitivity, hindsight bias, the halo effect, our own limited intelligence and inefficient use of evidence, and just about everything else that's wrong with us.  Not what we're looking for.

Okay, so let's try putting a bandaid on it - let's go back to our original problem setup.  However, we'll take our original O, and use all of the science on cognitive biases at our disposal to handicap it.  We'll limit its search space, saddle it with a laundry list of cognitive biases, cripple its ability to use evidence, and in general make it as human-like as we possibly can.  We could even give it akrasia by implementing hyperbolic discounting of reward.  Then we'll repeat the original process to produce U'.
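As an illustration of the kind of handicap meant here, the sketch below shows hyperbolic discounting producing a preference reversal. The discount form reward / (1 + k * delay) is the standard textbook one; the rewards, times, and k are made-up numbers, and none of this is a claim about how the modified O would actually be built.

```python
from typing import Dict, Tuple


def hyperbolic_value(reward: float, delay: float, k: float = 1.0) -> float:
    """Hyperbolically discounted value of a reward arriving after `delay`."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return reward / (1.0 + k * delay)


def preferred_option(options: Dict[str, Tuple[float, float]], now: float, k: float = 1.0) -> str:
    """Pick the option with the highest discounted value as seen from time `now`.

    `options` maps a name to a (reward, arrival_time) pair.
    """
    return max(
        options,
        key=lambda name: hyperbolic_value(options[name][0], options[name][1] - now, k),
    )


if __name__ == "__main__":
    # Hypothetical choice: a small reward at t=10 versus a larger one at t=15.
    options = {"small_soon": (50.0, 10.0), "large_late": (100.0, 15.0)}
    print(preferred_option(options, now=0.0))   # viewed from far away
    print(preferred_option(options, now=10.0))  # viewed when the small reward is due
```

Seen from a distance the agent prefers the larger reward, then switches to the smaller one as it becomes imminent, which is the akrasia-like inconsistency an exponential discounter would not show.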

If we plug U' into our AI, the result will be that it will optimize like a human who had suddenly been stripped of all the kinds of stupidity that we programmed into our modified O.  This is good!  Plugged into a solid CEV infrastructure, this might even be good enough to produce a future that's a nice place to live.  However, it's not quite ideal.  If we miss a cognitive bias, then it'll be incorporated into the learned utility functions, and we may never be rid of it.  What would be nice would be if we could get the AI to learn about cognitive biases, exhaustively, and update in the future if it ever discovered a new one.  

 

If we had enough time and money, we could do this the hard way: acquire a representative sample of the human population, and pay them to perform tasks with simple goals under tremendous surveillance, and have the AI derive the human optimization process from the actions taken towards a known goal.  However, if we assume that the human optimization process can be defined as a function over the state of the human brain, we should not trust the completeness of any such process learned from less data than the entropy of the human brain, which is on the order of tens of petabytes of extremely high quality evidence.  If we want to be confident in the completeness of our model, we may need more experimental evidence than it is really practical to accumulate.  Which isn't to say that this approach is useless - if we can hit close enough to the mark, then the AI may be able to run more exhaustive experimentation later and refine its own understanding of human brains to be closer to the ideal.

But it'd really be nice if our AI could do unsupervised learning to figure out the details of human optimization. Then we could simply dump the internet into it, and let it grind away at the data and spit out a detailed, complete model of human decision-making, from which our utility function could be derived. Unfortunately, this does not seem to be a tractable problem. It's possible that some insight could be gleaned by examining outliers with normal intelligence but deviant utility functions (I am thinking specifically of sociopaths), though it's unclear how much insight these methods can produce. If anyone has suggestions for a more efficient way of going about it, I'd love to hear them. As it stands, it might be possible to get enough information from this to supplement a supervised learning approach - the closer we get to a perfectly accurate model, the higher the probability of Things Going Well.

Anyways, that's where I am right now.  I just thought I'd put up my thoughts and see if some fresh eyes see anything I've been missing.  

 

Cheers,

Niger 

Consistence of reciprocity?

0 yttrium 16 December 2012 07:08PM

Many people see themselves in various groups (member of the population of their home country, or their social network), and feel justified in caring more about the well-being of people in this group than about that of others. They will argue with reciprocity: "Those people pay taxes in our country, they are entitled to more support from 'us' than others!" My question is: Is this inconsistent with some rationality axioms that seem obvious? What often-adopted or reasonable axioms are there that make this inconsistent?

Math appendix for: "Why you must maximize expected utility"

8 Benja 13 December 2012 01:11AM

This is a mathematical appendix to my post "Why you must maximize expected utility", giving precise statements and proofs of some results about von Neumann-Morgenstern utility theory without the Axiom of Continuity. I wish I had the time to make this post more easily readable, giving more intuition; the ideas are rather straightforward and I hope they won't get lost in the line noise!

The work here is my own (though closely based on the standard proof of the VNM theorem), but I don't expect the results to be new.

*

I represent preference relations as total preorders on a simplex ; define , , and in the obvious ways (e.g., iff both and , and iff but not ). Write for the 'th unit vector in .

In the following, I will always assume that satisfies the independence axiom: that is, for all and , we have  if and only if . Note that the analogous statement with weak preferences follows from this: holds iff , which by independence is equivalent to , which is just .

Lemma 1 (more of a good thing is always better). If and , then .

Proof. Let . Then, and . Thus, the result follows from independence applied to , , and .

Lemma 2. If  and , then there is a unique such that for and for .

Proof. Let be the supremum of all such that (note that by assumption, this condition holds for ). Suppose that . Then there is an such that . By Lemma 1, we have , and the first assertion follows.

Suppose now that . Then by definition of , we do not have , which means that we have , which was the second assertion.

Finally, uniqueness is obvious, because if both and satisfied the condition, we would have .

Definition 3. is much better than , notation or , if there are neighbourhoods of and of (in the relative topology of ) such that we have for all and . (In other words, the graph of is the interior of the graph of .) Write  or when ( is not much better than ), and ( is about as good as ) when both and .

Theorem 4 (existence of a utility function). There is a such that for all ,

Unless for all and , there are  such that .

Proof. Let be a worst and a best outcome, i.e. let be such that for all . If , then  for all , and by repeated applications of independence we get for all , and therefore again for all , and we can simply choose .

Thus, suppose that . In this case, let be such that for every , equals the unique provided by Lemma 2 applied to and . Because of Lemma 1, . Let .

We first show that implies . For every , we either have , in which case by Lemma 2 we have for arbitrarily small , or we have , in which case we set  and find . Set . Now, by independence applied times, we have ; analogously, we obtain for arbitrarily small . Thus, using and Lemma 1, and therefore  as claimed. Now note that if , then this continues to hold for and in a sufficiently small neighbourhood of and , and therefore we have .

Now suppose that . Since we have  and , we can find points and arbitrarily close to and such that the inequality becomes strict (either the left-hand side is smaller than one and we can increase it, or the right-hand side is greater than zero and we can decrease it, or else the inequality is already strict). Then, by the preceding paragraph. But this implies that , which completes the proof.

Corollary 5. is a preference relation (i.e., a total preorder) that satisfies independence and the von Neumann-Morgenstern continuity axiom.

Proof. It is well-known (and straightforward to check) that this follows from the assertion of the theorem.

Corollary 6. is unique up to affine transformations.

Proof. Since  is a VNM utility function for , this follows from the analogous result for that case.

Corollary 7. Unless for all , for all the set has lower dimension than (i.e., it is the intersection of with a lower-dimensional subspace of ).

Proof. First, note that the assumption implies that . Let be given by , , and note that is the intersection of the hyperplane with the closed positive orthant . By the theorem, is not parallel to , so the hyperplane is not parallel to . It follows that has dimension , and therefore can have at most this dimension. (It can have smaller dimension or be the empty set if only touches or lies entirely outside the positive orthant.)

Mathematical Measures of Optimization Power

3 Alex_Altair 24 November 2012 10:55AM

In explorations of AI risk, it is helpful to formalize concepts. One particularly important concept is intelligence. How can we formalize it, or better yet, measure it? “Intelligence” is often considered mysterious or is anthropomorphized. One way to taboo “intelligence” is to talk instead about optimization processes. An optimization process (OP, also optimization power) selects some futures from a space of possible futures. It does so according to some criterion; that is, it optimizes for something. Eliezer Yudkowsky spends a few of the sequence posts discussing the nature and importance of this concept for understanding AI risk. In them, he informally describes a way to measure the power of an OP. We consider mathematical formalizations of this measure.

Here's EY's original description of his measure of OP.

Put a measure on the state space - if it's discrete, you can just count. Then collect all the states which are equal to or greater than the observed outcome, in that optimization process's implicit or explicit preference ordering. Sum or integrate over the total size of all such states. Divide by the total volume of the state space. This gives you the power of the optimization process measured in terms of the improbabilities that it can produce - that is, improbability of a random selection producing an equally good result, relative to a measure and a preference ordering.

If you prefer, you can take the reciprocal of this improbability (1/1000 becomes 1000) and then take the logarithm base 2. This gives you the power of the optimization process in bits.
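For a discrete state space with the counting measure, the quoted recipe can be sketched in a few lines. The function name and the toy state space are hypothetical.

```python
import math
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def optimization_power_bits(
    states: Iterable[S], utility: Callable[[S], float], achieved: S
) -> float:
    """Bits of optimization: log2 of (all states / states at least as good).

    Uses the counting measure, so every state is treated as equally likely
    under a random selection.
    """
    items = list(states)
    threshold = utility(achieved)
    at_least_as_good = sum(1 for s in items if utility(s) >= threshold)
    if at_least_as_good == 0:
        raise ValueError("the achieved outcome must belong to the state space")
    return math.log2(len(items) / at_least_as_good)


if __name__ == "__main__":
    # Toy state space: 1024 states ranked by their own value.
    print(optimization_power_bits(range(1024), float, achieved=1023))
    print(optimization_power_bits(range(1024), float, achieved=512))
```

Hitting the single best state out of 1024 counts as 10 bits, while landing anywhere in the top half counts as 1 bit.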

Let's say that at time  we have a formalism to specify all possible world states  at some future time . Perhaps it is a list of particle locations and velocities, or perhaps it is a list of all possible universal wave functions. Or maybe we're working in a limited domain, and it's a list of all possible next-move chess boards. Let's also assume that we have a well-justified prior  over these states being the next ones to occur in the absence of an OP (more on that later).

We order  according to the OP's preferences. For the moment, we actually don't care about the density, or “measure” of our ordering. Now we have a probability distribution over . The integral from  to  over this represents the probability that the worldstate at  will be better than , and worse than . When time continues, and the OP acts to bring about some worldstate , we can calculate the probability of an equal or better outcome occurring;

This is a simple generalization of what EY describes above. Here are some things I am confused about.

Finding a specification for all possible worldstates is hard, but it's been done before. There are many ways to reasonably represent this. What I can't figure out is how to specify possible worldstates “in the absence of an OP”. This phrase hides tons of complexity. How can we formally construct this counterfactual? Is the matter that composes the OP no longer present? Is it present but “not acting”? What constitutes a null action? Are we considering the expected worldstate distribution as if the OP never existed? If the OP is some kind of black-box AI agent, it's easier to imagine this. But if the OP is evolution, or a forest fire, it's harder to imagine. Furthermore, is the specification dualist, or is the agent part of the worldstates? If it's dualist, this is a fundamental falseness which can have lots of bad implications. If the agent is part of the worldstates, how do we represent them “in absence of an OP”?

But for the rest of this article, let's pretend we have such a specification. There's also a loss from ignoring the cardinal utility of the worldstates. Let's say you have the two distributions of utility over sets , representing two different OPs. In both, the OP chooses a  with the same utility . The distributions are the same on the left side of , and the second distribution has a longer tail on the right. It seems like the OP in distribution 1 was more impressive; the second OP missed all the available higher utility. We could make the expected utility of the second distribution arbitrarily high, while maintaining the same fraction of probability mass above the achieved worldstate. Conversely, we could instead extend the left tail of the second distribution, and say that the second OP was more impressive because it managed to avoid all the bad worlds.

Perhaps it is more natural to consider two distributions; the distribution of utility over entire world futures assuming the OP isn't present, versus the distribution after the OP takes its action. So instead of selecting a single possibility with certainty, the probabilities have just shifted. 

How should we reduce this distribution shift to a single number which we call OP? Any shift of probability mass upwards in utility should increase the measure of OP, and vice versa. I think also that an increase in the expected utility (EU) of these distributions should be measured as a positive OP, and vice versa. EU seems like the critical metric to use. Let's generalize a little further, and say that instead of measuring OP between two points in time, we let the time difference go to zero, and measure instantaneous OP. Therefore we're interested in some equation which has the same sign as

.

Besides that, I'm not exactly sure which specific equation should equal OP. I seem to have two contradicting desires;

1a) The sign of  should be the sign of the OP.

  b) Negative  and  should be possible.

2) Constant positive OP should imply exponentially increasing .

Criterion 1) feels pretty obvious. Criterion 2) feels like a recognition of what is “natural” for OPs; to improve upon themselves, so that they can get better and better returns. The simplest differential equation that represents positive feedback yields exponentials, and is used across many domains because of its universal nature.
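One way to make criterion 2) concrete is to assume that OP is the relative growth rate of expected utility, that is, the rate of change of EU divided by EU. That reading is an assumption made for this sketch, as are the function names; under it, constant positive OP gives exponential growth.

```python
import math
from typing import Callable


def expected_utility(eu0: float, op: float, t: float) -> float:
    """EU at time t if the relative growth rate is held constant at `op`."""
    return eu0 * math.exp(op * t)


def relative_growth_rate(eu: Callable[[float], float], t: float, dt: float = 1e-6) -> float:
    """Numerical estimate of (dEU/dt) / EU via a central difference."""
    return (eu(t + dt) - eu(t - dt)) / (2.0 * dt * eu(t))


if __name__ == "__main__":
    trajectory = lambda t: expected_utility(eu0=1.0, op=0.5, t=t)
    for t in (0.0, 1.0, 5.0):
        print(t, trajectory(t), relative_growth_rate(trajectory, t))
```

The estimated rate stays at the chosen constant along the whole trajectory, while EU itself grows exponentially.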

This intuition certainly isn't anthropocentric, but it might be this-universe biased. I'd be interested in seeing if it is natural in other computable environments.

If we just use , then criterion 2) is not satisfied. If we use , then decreases in EU are not defined, and constant EU is negative infinite OP, violating 1). If we use , then 2) is satisfied, but negative and decreasing EU give positive OP, violating 1a). If we use , then 2) is still satisfied, but  gives , violating 1a). Perhaps the only consistent equation would be . But seriously, who uses absolute values? I can't recall a fundamental equation that relied on them. They feel totally ad hoc. Plus, there's this weird singularity at . What's up with that?

Classically, utility is invariant up to positive affine transformations. Criterion 1) respects this because the derivative removes the additive constant, but 2) doesn't. It is still scale invariant, but it has an intrinsic zero. This made me consider the nature of “zero utility”. At least for humans, there is an intuitive sign to utility. We wouldn't say that stubbing your toe is 1,000,000 utils, and getting a car is 1,002,000 utils. It seems to me, especially after reading Omohundro's “Basic AI Drives”, that there is in some sense an intrinsic zero utility for all OPs.

All OPs need certain initial conditions to even exist. After that, they need resources. AIs need computer hardware and energy. Evolution needed certain chemicals and energy. Having no resources makes it impossible, in general, to do anything. If you have literally zero resources, you are not a "thing" which "does". So that is a type of intrinsic zero utility. Then what would having negative utility mean? It would mean the OP anti-exists. It's making it even less likely for it to be able to start working toward its utility function. What would exponentially decreasing utility mean? It would mean that it is a constant OP for the negative of the utility function that we are considering. So, it doesn't really have negative optimization power; if that's the result of our calculation, we should negate the utility function, and say it has positive OP. And that singularity at ? When you go from the positive side, getting closer and closer to 0 is really bad, because you're destroying the last bits of your resources; your last chance of doing any optimization. And going from negative utility to positive is infinitely impressive, because you bootstrapped from optimizing away from your goal to optimizing toward your goal.

So perhaps we should drop the part of 1b) that says negative EU can exist. Certainly world-states can exist that are terrible for a given utility function, but if an OP with that utility function exists, then the expected utility of the future is positive.

If this is true, then it seems there is more to the concept of utility than the von Neumann-Morgenstern axioms.

How do people feel about criterion 2), and my proposal that  ?

Universal agents and utility functions

29 Anja 14 November 2012 04:05AM

I'm Anja Heinisch, the new visiting fellow at SI. I've been researching replacing AIXI's reward system with a proper utility function. Here I will describe my AIXI+utility function model, address concerns about restricting the model to bounded or finite utility, and analyze some of the implications of modifiable utility functions, e.g. wireheading and dynamic consistency. Comments, questions and advice (especially about related research and material) will be highly appreciated.

Introduction to AIXI

Marcus Hutter's (2003) universal agent AIXI  addresses the problem of rational action in a (partially) unknown computable universe, given infinite computing power and a halting oracle. The agent interacts with its environment in discrete time cycles, producing an action-perception sequence  with actions (agent outputs)   and perceptions (environment outputs)   chosen from finite sets  and . The perceptions are pairs , where  is the observation part and  denotes a reward. At time k the agent chooses its next action  according to the expectimax principle:

Here M denotes the updated Solomonoff prior summing over all programs  that are consistent with the history  [1] and which will, when run on the universal Turing machine T with successive inputs , compute outputs , i.e.

AIXI is a dualistic framework in the sense that the algorithm that constitutes the agent is not part of the environment, since it is not computable. Even considering that any running implementation of AIXI would have to be computable, AIXI accurately simulating AIXI accurately simulating AIXI ad infinitum doesn't really seem feasible. Potential consequences of this separation of mind and matter include difficulties the agent may have predicting the effects of its actions on the world.

Utility vs rewards

So, why is it a bad idea to work with a reward system? Say the AIXI agent is rewarded whenever a human called Bob pushes a button. Then a sufficiently smart AIXI will figure out that instead of furthering Bob’s goals it can also threaten or deceive Bob into pushing the button, or get another human to replace Bob. On the other hand, if the reward is computed in a little box somewhere and then displayed on a screen, it might still be possible to reprogram the box or find a side channel attack. Intuitively you probably wouldn't even blame the agent for doing that -- people try to game the system all the time. 

You can visualize AIXI's computation as maximizing bars displayed on this screen; the agent is unable to connect the bars to any pattern in the environment, they are just there. It wants them to be as high as possible and it will utilize any means at its disposal. For a more detailed analysis of the problems arising through reinforcement learning, see Dewey (2011).

Is there a way to bind the optimization process to actual patterns in the environment? To design a framework in which the screen informs the agent about the patterns it should optimize for? The answer is, yes, we can just define a utility function

that assigns a value  to every possible future history  and use it to replace the reward system in the agent specification:

When I say "we can just define" I am actually referring to the really hard question of how to recognize and describe the patterns we value in the universe. Contrasted with the necessity to specify rewards in the original AIXI framework, this is a strictly harder problem, because the utility function has to be known ahead of time and the reward system can always be represented in the framework of utility functions by setting

For the same reasons, this is also a strictly safer approach.
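The claim that a reward system can always be expressed as a utility function can be sketched as follows, assuming the utility of a history is the sum of the rewards carried by its perceptions. The type and function names are hypothetical.

```python
from typing import NamedTuple, Sequence


class Perception(NamedTuple):
    observation: str
    reward: float


class Step(NamedTuple):
    action: str
    perception: Perception


def reward_sum_utility(history: Sequence[Step]) -> float:
    """Utility function that reproduces a reward-based agent's objective.

    Assumption: the utility of a history is the total reward it contains.
    """
    return sum(step.perception.reward for step in history)


if __name__ == "__main__":
    history = [
        Step("wait", Perception("button untouched", 0.0)),
        Step("help Bob", Perception("Bob pushes button", 1.0)),
        Step("help Bob", Perception("Bob pushes button", 1.0)),
    ]
    print(reward_sum_utility(history))
```

A utility function defined this way inherits every weakness of the reward signal; the extra safety comes only from utility functions that look at other parts of the history.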

Infinite utility

The original AIXI framework must necessarily place upper and lower bounds on the rewards that are achievable, because the rewards are part of the perceptions and  is finite. The utility function approach does not have this problem, as the expected utility 

is always finite as long as we stick to a finite set of possible perceptions, even if the utility function is not bounded. Relaxing this constraint and allowing  to be infinite and the utility to be unbounded creates divergence of expected utility (for a proof see de Blanc 2008). This closely corresponds to the question of how to be a consequentialist in an infinite universe, discussed by Bostrom (2011). The underlying problem here is that (using the standard approach to infinities) these expected utilities will become incomparable. One possible solution to this problem could be to use a larger subfield than  of the surreal numbers, my favorite[2] so far being the Levi-Civita field generated by the infinitesimal :

with the usual power-series addition and multiplication. Levi-Civita numbers can be written and approximated as 

(see Berz 1996), which makes them suitable for representation on a computer using floating point arithmetic. If we allow the range of our utility function to be , we gain the possibility of generalizing the framework to work with an infinite set of possible perceptions, therefore allowing for continuous parameters. We also allow for a much broader set of utility functions, no longer excluding the assignment of infinite (or infinitesimal) utility to a single event. I recently met someone who argued convincingly that his (ideal) utility function assigns infinite negative utility to every time instance that he is not alive, therefore making him prefer life to any finite but huge amount of suffering.
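To give a feel for how such numbers compare, here is a toy sketch that keeps only finitely many terms of the form coefficient times a power of the infinitesimal. The class and helper names are hypothetical, and this is not Berz's implementation; it only shows that ordering is decided by the term with the smallest exponent.

```python
from fractions import Fraction
from typing import Dict, Union

Number = Union[int, float, Fraction]


class LeviCivita:
    """Finite sum of terms coeff * eps**exponent, with eps a positive infinitesimal."""

    def __init__(self, terms: Dict[Number, Number]):
        # Drop zero coefficients so the leading term is always meaningful.
        self.terms = {Fraction(q): a for q, a in terms.items() if a != 0}

    def __add__(self, other: "LeviCivita") -> "LeviCivita":
        out = dict(self.terms)
        for q, a in other.terms.items():
            out[q] = out.get(q, 0) + a
        return LeviCivita(out)

    def __neg__(self) -> "LeviCivita":
        return LeviCivita({q: -a for q, a in self.terms.items()})

    def __sub__(self, other: "LeviCivita") -> "LeviCivita":
        return self + (-other)

    def __mul__(self, other: "LeviCivita") -> "LeviCivita":
        out: Dict[Fraction, Number] = {}
        for q1, a1 in self.terms.items():
            for q2, a2 in other.terms.items():
                out[q1 + q2] = out.get(q1 + q2, 0) + a1 * a2
        return LeviCivita(out)

    def sign(self) -> int:
        """Sign of the term with the smallest exponent (the dominant one)."""
        if not self.terms:
            return 0
        leading = self.terms[min(self.terms)]
        return 1 if leading > 0 else -1

    def __lt__(self, other: "LeviCivita") -> bool:
        return (other - self).sign() > 0


def real(x: Number) -> LeviCivita:
    """Embed an ordinary real number as a constant term."""
    return LeviCivita({0: x})


EPS = LeviCivita({1: 1})        # infinitesimal
INFINITE = LeviCivita({-1: 1})  # its reciprocal, an infinite quantity
```

Under this ordering, `-INFINITE < real(-1e12)` holds, which mirrors the preference described next: any finite amount of suffering beats an infinitely negative outcome.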

Note that finiteness of  is still needed to guarantee the existence of actions with maximal expected utility, and the finite (but dynamic) horizon  remains a very problematic assumption, as described in Legg (2008).

Modifiable utility functions

Any implementable approximation of AIXI implies a weakening of the underlying dualism. Now the agent's hardware is part of the environment and at least in the case of a powerful agent, it can no longer afford to neglect the effect its actions may have on its source code and data. One question that has been asked is whether AIXI can protect itself from harm. Hibbard (2012) shows that an agent similar to the one described above, equipped with the ability to modify its policy responsible for choosing future actions, would not do so, given that it starts out with the (meta-)policy to always use the optimal policy, and the additional constraint to change only if that leads to a strict improvement. Ring and Orseau (2011) study under which circumstances a universal agent would try to tamper with the sensory information it receives. They introduce the concept of a delusion box, a device that filters and distorts the perception data before it is written into the part of the memory that is read during the calculation of utility. 

A further complication to take into account is the possibility that the part of memory that contains the utility function may get rewritten, either by accident, by deliberate choice (programmers trying to correct a mistake), or in an attempt to wirehead. To analyze this further we will now consider what can happen if the screen flashes different goals in different time cycles. Let 

denote the utility function the agent will have at time k.

Even though we will only analyze instances in which the agent knows at time k, which utility function  it will have at future times  (possibly depending on the actions  before that), we note that for every fixed future history  the agent knows the utility function  that is displayed on the screen because the screen is part of its perception data .

This leads to three different agent models worthy of further investigation:

  • Agent 1 will optimize for the goals that are displayed on the screen right now and act as if it would continue to do so in the future. We describe this with the utility function   
  • Agent 2 will try to anticipate future changes to its utility function and maximize the utility it experiences at every time cycle as shown on the screen at that time. This is captured by 
  • Agent 3 will, at time k, try to maximize the utility it derives in hindsight, displayed on the screen at the time horizon  

Of course arbitrary mixtures of these are possible.
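As a rough illustration of how the three models differ, here is a minimal sketch. It assumes that utility is simply summed over time steps, that perceptions are plain numbers, and that the whole future is known, none of which the original formulas necessarily require. The names `agent1_value`, `agent2_value`, `agent3_value`, `current` and `simpleton` are hypothetical.

```python
from typing import Callable, Sequence

# A utility function maps one perception (here just a number) to a utility.
Utility = Callable[[float], float]


def agent1_value(perceptions: Sequence[float], screen: Sequence[Utility]) -> float:
    """Agent 1: score every step with the function shown on the screen right now."""
    return sum(screen[0](x) for x in perceptions)


def agent2_value(perceptions: Sequence[float], screen: Sequence[Utility]) -> float:
    """Agent 2: score each step with the function shown on the screen at that step."""
    return sum(u(x) for u, x in zip(screen, perceptions))


def agent3_value(perceptions: Sequence[float], screen: Sequence[Utility]) -> float:
    """Agent 3: score every step in hindsight with the function shown at the horizon."""
    return sum(screen[-1](x) for x in perceptions)


def current(x: float) -> float:
    """Toy stand-in for the utility function displayed now (assumed, not from the post)."""
    return x


def simpleton(x: float) -> float:
    """Toy Simpleton utility: the same maximal value for every perception."""
    return 1.0


# The screen shows `current` now and `simpleton` at the two later steps.
screen = [current, simpleton, simpleton]
history = [0.25, 0.5, 0.75]

print(agent1_value(history, screen))  # ignores the upcoming change
print(agent2_value(history, screen))  # only the first step depends on the history
print(agent3_value(history, screen))  # identical for every possible history
```

In this toy setting Agent 3's score is the same for every history, which is the tie-breaking problem described for the Simpleton future below, while Agent 2's score varies only through the step that is still evaluated by `current`.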

The type of wireheading that is of interest here is captured by the Simpleton Gambit described by Orseau and Ring (2011), a Faustian deal that offers the agent maximal utility in exchange for its willingness to be turned into a Simpleton that always takes the same default action at all future times. We will first consider a simplified version of this scenario: The Simpleton future, where the agent knows for certain that it will be turned into a Simpleton at time k+1, no matter what it does in the remaining time cycle. Assume that for all possible action-perception combinations the utility given by the current utility function is not maximal, i.e.   holds for all . Assume further that the agent's actions influence the future outcomes, at least from its current perspective. That is, for all  there exist   with . Let  be the Simpleton utility function, assigning equal but maximal utility  to all possible futures. While Agent 1 will optimize as before, not adapting its behavior to the knowledge that its utility function will change, Agent 3 will be paralyzed, having to rely on whatever method its implementation uses to break ties. Agent 2 on the other hand will try to maximize only the utility .

Now consider the actual Simpleton Gambit: At time k the agent gets to choose between changing, , resulting in  and  (not changing), leading to  for all . We assume that  has no further effects on the environment. As before, Agent 1 will optimize for business as usual; whether or not it chooses to change depends entirely on whether the screen specifically mentions the memory pointer to the utility function.

Agent 2 will change if and only if the utility of changing compared to not changing according to what the screen currently says is strictly smaller than the comparative advantage of always having maximal utility in the future. That is,

is strictly less than

This seems quite analogous to humans, who sometimes tend to choose maximal bliss over future optimization power, especially if the optimization opportunities are meager anyhow. Many people do seem to choose their goals so as to maximize the happiness felt by achieving them at least some of the time; this is also advice that I have frequently encountered in self-help literature, e.g. here. Agent 3 will definitely change, as it only evaluates situations using its final utility function.

Comparing the three proposed agents, we notice that Agent 1 is dynamically inconsistent: it will optimize for future opportunities that it predictably will not take later. Agent 3 on the other hand will wirehead whenever possible (and we can reasonably assume that opportunities to do so will exist in even moderately complex environments). This leaves us with Agent model 2 and I invite everyone to point out its flaws.

[1] Dotted actions/ perceptions, like  denote past events, underlined perceptions  denote random variables to be observed at future times.

[2] Bostrom (2011) proposes using hyperreal numbers, which rely heavily on the axiom of choice for the ultrafilter to be used and I don't see how those could be implemented.

Ambitious utilitarians must concern themselves with death

4 Mitchell_Porter 25 October 2012 10:41AM

And I don't mean that they must concern themselves with death in the sense of ending death, or removing its sting through mental backups, or delaying it to the later ages of the universe; or in the sense of working to decrease the probability of extinction risks and other forms of megadeath; or even in the sense of saving as many lives as possible, as efficiently as possible. All of that is legitimate and interesting. But I mean something far more down to earth.

First, let me specify more precisely who I am talking about. I mean people who are trying to maximize the general welfare; who are trying to achieve the greatest good for the greatest number; who are trying to do the best thing possible with their lives. When someone like that makes decisions, they are implicitly choosing among possible futures in a very radical way. They may be making judgments about whether a future with millions or billions of extra lives is better than some alternative. Whether anyone is ever in a position to make that much of a difference is another matter; but we can think of it like voting. You are at least making a statement about which sort of future you think you prefer, and then you do what you can, and that either makes a difference or it doesn't.

It seems to me that the discussions about the value of life among utilitarians are rather superficial. The typical notion is that we should maximize net pleasure and minimize net pain. Already that poses the question of whether a life of dull persistent happiness is better or worse than a life of extreme highs and lows. A more sophisticated notion is that we should just aspire to maximize "utility", where perhaps we don't even know what utility is yet. Certainly the CEV philosophy is that we don't yet know what utility really is for human beings. It would be interesting to see people who took that agnosticism to heart, people whose life-strategy amounted to (1) discovering true utility as soon as possible (2) living according to interim heuristics whose uncertainty is recognized, but which are adopted out of the necessity of having some sort of personal decision procedure.

So what I'm going to say pertains to (2). You may, if you wish, hold to the idea that the nature of true utility, like true friendliness, won't be known until the true workings of the human mind are known. What follows is something you should think on in order to refine your interim heuristics.

The first thing is that to create a life is to create a death. A life ends. And while the end of a life may not be its most important moment, it reminds us that a life is a whole. Any accurate estimation of the utility of a life is going to be a judgment of that whole.

So a utilitarian ought to contemplate the deaths of the world, and the lives that reach their ends in those deaths, because the possible futures that you wish to choose between are distinguished by the number and nature of the whole lives that they contain. And all these dozens of people, all around the world of the present, ceasing to exist in every minute that passes, are examples of completed lives. Those lives weren't necessarily complete, in the sense of all personal desires and projects having come to their conclusion; but they came to their physical completion.

To choose one future over another is to prefer one set of completed lives to another set. It would be a godlike decision to truly be solely responsible for such a choice. In the real world, people hardly choose their own futures, let alone the future of the world; choice is a lifelong engagement with an evolving and partially known situation, not a once-off choice between several completely known scenarios; and even when a single person does end up being massively influential, they generally don't know what sort of future they're bringing about. The actual limitations on the knowledge and power of any individual may make the whole quest of the "ambitious utilitarian" seem quixotic. But a new principle, a new heuristic, can propagate far beyond one individual, so thinking big can have big consequences.

The main principle that I derive, from contemplating the completed lives of the world, is cautionary antinatalism. The badness of what can happen in a life, and the disappointing character of what usually happens, are what do it for me. I am all for the transhumanist quest and the struggle for a friendly singularity, and I support the desire of people who are already alive to make the most of that life. But I would recommend against the creation of life, at least until the current historical drama has played itself out - until the singularity, if I must use that word. We are in the process of gaining new powers and learning new things, there are obvious unknowns in front of us that we are on the way to figuring out, so at least hold off until they have been figured out and we have a better idea of what reality is about, and what we can really hope for, from existence.

However, the object of this post is not to argue for my special flavor of antinatalism. It is to encourage realistic consideration of what lives and futures are like. In particular, I would encourage more "story thinking", which has been criticized in favor of "systems thinking". Every actual life is a "story", in the sense of being a sequence of events that happens to someone. If you were judging the merit of a whole possible world on the basis of the whole lives that it contained, then you would be making a decision about whether those stories ought to actually occur. The biographical life-story is the building block of such possible worlds.

So an ambitious utilitarian, who aspires to have a set of criteria for deciding among whole possible worlds, really needs to understand possible lives. They need to know what sort of lives are likely under various circumstances; they need to know the nature of the different possible lives - what it's like to be that person; they need to know what sort of bad is going to accompany the sort of good that they decide to champion. They need to have some estimation of the value of a whole life, up to and including its death.

As usual, we are talking about a depth of knowledge that may in practice be impossible to attain. But before we go calling something impossible, and settling for a lesser ambition, let's at least try to grasp what the greater ambition truly entails. To truly choose a whole world would be to make the decision of a god, about the lives and deaths that will occur in that world. The future of our world, for some time to come, will repeat the sorts of lives and deaths that have already occurred in it. So if, in your world-planning, you don't just count on completely abolishing the present world and/or replacing it with a new one that works in a completely different way, you owe it to your cause to form a judgement about the totality of what has already happened here on Earth, and you need to figure out what you approve of, what you disapprove of, whether you can have the good without the bad, and how much badness is too much.

Circular Preferences Don't Lead To Getting Money Pumped

-3 Mestroyer 11 September 2012 03:42AM

Edit: for reasons given in the comments, I don't think the question of what circular preferences actually do is well defined, so this is an answer to a wrong question.

 

If I like Y more than X, at an exchange rate of 0.9Y for 1X, and I like Z more than Y, at an exchange rate of 0.9Z for 1Y, and I like X more than Z, at an exchange rate of 0.9X for 1Z, you might think that given 1X and the ability to trade X for Y at an exchange rate of 0.95Y for 1X, and Y for Z at an exchange rate of 0.95Z for 1Y, and Z for X at an exchange rate of 0.95X for 1Z, I would trade in a circle until I had nothing left.

But actually, if I knew that I had circular preferences, and I knew that if I had 0.95Y I would trade it for (0.95^2)Z, which I would trade for (0.95^3)X, then actually I'd be trading 1X for (0.95^3)X, which I'm obviously not going to do.

Similarly, if the exchange rates are all 1:1, but each trade costs 1 penny, and I care about 1 penny much much less than any of 1X, 1Y, or 1Z, and I trade my X for Y, I know I'm actually going to end up with X - 3 cents, so I won't make the trade.
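The arithmetic behind both cases can be sketched as follows. This is only an illustration of the look-ahead reasoning described above; the function names are hypothetical and the agent is assumed to compare nothing but its starting and ending amount of X.

```python
def holdings_after_cycle(start: float, rate: float, trades: int = 3) -> float:
    """Amount of X left after trading X -> Y -> Z -> X at the same rate each time."""
    return start * rate ** trades


def would_start_cycle(rate: float, trades: int = 3) -> bool:
    """An agent that foresees the whole loop trades only if it ends up with more X."""
    return holdings_after_cycle(1.0, rate, trades) > 1.0


def cents_lost_in_cycle(fee_cents: int = 1, trades: int = 3) -> int:
    """With 1:1 exchange rates and a flat fee per trade, the loop just costs the fees."""
    return fee_cents * trades


print(holdings_after_cycle(1.0, 0.95))  # about 0.857 X, i.e. 0.95 cubed
print(would_start_cycle(0.95))          # the foresighted agent declines
print(cents_lost_in_cycle())            # the 3 cents mentioned above
```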

Unless I can set a Schelling fence, in which case I will end up trading once.

So if instead of being given X, I have a 1/3 chance of each of X, Y, and Z, I would hope I wouldn't set a Schelling fence, because then my 1/3 chance of each thing becomes a 1/3 chance of each thing minus the trading penalty. So maybe I'd want to be bad at precommitments, or would I precommit not to precommit?

Utility functions and quantum mechanics

6 Manfred 31 August 2012 03:41AM

Interpreting quantum mechanics throws an interesting wrench into utility calculation.

Utility functions, according to the interpretation typical in these parts, are a function of the state of the world, and an agent with consistent goals acts to maximize the expected value of their utility function. Within the many-worlds interpretation (MWI) of quantum mechanics (QM), things become interesting because "the state of the world" refers to a wavefunction which contains all possibilities, merely in differing amounts. With an inherently probabilistic interpretation of QM, flipping a quantum coin has to be treated linearly by our rational agent - that is, when calculating expected utility, they have to average the expected utilities from each half. But if flipping a quantum coin is just an operation on the state of the world, then you can use any function you want when calculating expected utility.

And all coins, when you get down to it, are quantum. At the extreme, this leads to the possible rationality of quantum suicide - since you're alive in the quantum state somewhere, just claim that your utility function non-linearly focuses on the part where you're alive.

As you may have heard, there have been several papers in the quantum mechanics literature that claim to recover ordinary rules for calculating expected utility in MWI - how does that work?

Well, when they're not simply wrong (for example, by replacing a state labeled by the number a+b with the state |a> + |b>), they usually go about it with the Von Neumann-Morgenstern axioms, modified to refer to quantum mechanics:

  1. Completeness: Every state can be compared to every other, preferencewise.
  2. Transitivity: If you prefer |A> to |B> and |B> to |C>, you also prefer |A> to |C>.
  3. Continuity: If you prefer |A> to |B> and |B> to |C>, there's some quantum-mechanical measure (note that this is a change from "probability") X such that you're indifferent between (1-X)|A> + X|C> and |B>.
  4. Independence: If you prefer |A> to |B>, then you also prefer (1-X)|A> + X|C> to (1-X)|B> + X|C>, where |C> can be anything and X isn't 1.

In classical cases, these four axioms are easy to accept, and lead directly to utility functions with X as a probability. In quantum mechanical cases, the axioms are harder to accept, but the only measure available is indeed the ordinary amplitude-squared measure (this last fact features prominently in Everett's original paper). This gives you back the traditional rule for calculating expected utilities.
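As a minimal sketch of what "the traditional rule" amounts to in practice, the snippet below weights each outcome by its squared amplitude and then averages utilities as usual. The function names and the example utilities are assumptions made for illustration, not something taken from the papers mentioned.

```python
import math
from typing import List, Sequence


def born_weights(amplitudes: Sequence[complex]) -> List[float]:
    """Amplitude-squared weights, normalised so that they sum to 1."""
    squared = [abs(a) ** 2 for a in amplitudes]
    total = sum(squared)
    return [s / total for s in squared]


def expected_utility(amplitudes: Sequence[complex], utilities: Sequence[float]) -> float:
    """Average the utilities of the outcomes using the amplitude-squared weights."""
    return sum(w * u for w, u in zip(born_weights(amplitudes), utilities))


# An equal superposition of two outcomes, with a relative phase that
# does not affect the weights.
amplitudes = [1 / math.sqrt(2), 1j / math.sqrt(2)]
utilities = [1.0, 0.0]  # made-up utilities for the two outcomes

print(born_weights(amplitudes))
print(expected_utility(amplitudes, utilities))
```

Only the magnitudes of the amplitudes enter the weights here; interference between the branches is ignored entirely, which is the approximation that the rest of the post argues is justified for macroscopic agents.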

For an example of why these axioms are weird in quantum mechanics, consider the case of light. Linearly polarized light is actually the same thing as an equal superposition of right-handed and left-handed circularly polarized light. This has the interesting consequence that even when light is linearly polarized, if you shine it on atoms, those atoms will change their spins - they'll just change half right and half left. Or if you take circularly polarized light and shine it on a linear polarizer, half of it will go through. So anyhow, we can make axiom 4 read "If you are indifferent between left-polarized light and right-polarized light, then you must also be indifferent between linearly polarized light (i.e. left+right) and circularly polarized light (right+right)." But... can't a guy just want circularly polarized light?

Under what sort of conditions does the independence axiom make intuitive sense? Ones where something more complicated than a photon is being considered. Something like you. If MWI is correct and you measure the polarization of linearly polarized light vs. circularly polarized light, this puts your brain in a superposition of linear vs. circular. But nobody says "boy, I really want a circularly polarized brain."

A key factor, as is often the case when talking about recovering classical behavior from quantum mechanics, is decoherence. If you carefully prepare your brain in a circularly polarized state, and you interact with an enormous random system (like by breathing air, or emitting thermal radiation), your carefully prepared brain-state is going to get shredded. It's a fascinating property of quantum mechanics that once you "leak" information to the outside, things are qualitatively different. If we have a pair of entangled particles and a classical phone line, I can send you an exact quantum state - it's called quantum teleportation, and it's sweet. But if one of our particles leaks even the tiniest bit, even if we just end up with three particles entangled instead of two, our ability to transmit quantum states is gone completely.

In essence, the states we started with were "close together" in the space where quantum mechanics lives (Hilbert space), and so they could interact via quantum mechanics. Interacting with the outside even a little scattered our entangled particles farther apart.

Any virus, dust speck, or human being is constantly interacting with the outside world. States that are far enough apart to be perceptibly different to us aren't just "one parallel world away," like would make a good story - they are cracked wide open, spread out in the atmosphere as soon as you breathe it, spread by the Earth as soon as you push on it with your weight. If we were photons, one could easily connect with their "other selves" - if you try to change your polarization, whether you succeed or fail will depend on the orientation of your oppositely-polarized "other self"! But once you've interacted with the Earth, this quantum interference becomes negligible - so negligible that we seem to neglect it. When we make a plan, we don't worry that our nega-self might plan the opposite and we'll cancel each other out.

Does this sort of separation explain an approximate independence axiom, which is necessary for the usual rules for expected utility? Yes.

Because of decoherence, non-classical interactions are totally invisible to unaided primates, so it's expected that our morality neglects them. And if the states we are comparing are noticeably different, they're never going to interact, so independence is much more intuitive than in the case of a single photon. Taken together with the other axioms, which still make a lot of sense, this defines expected utility maximization with the Born rule.

So this is my take on utility functions in quantum mechanics - any living thing big enough to have a goal system will also be big enough to neglect interaction between noticeably different states, and thus make decisions as if the amplitude squared was a probability. With the help of technology, we can create systems where the independence axiom breaks down, but these systems are things like photons or small loops of superconducting wire, not humans.

Risk aversion does not explain people's betting behaviours

8 Stuart_Armstrong 20 August 2012 12:38PM

Expected utility maximalisation is an excellent prescriptive decision theory. It has all the nice properties that we want and need in a decision theory, and can be argued to be "the" ideal decision theory in some senses.

However, it is completely wrong as a descriptive theory of how humans behave. Those on this list are presumably aware of oddities like the Allais paradox. But we may retain some notions that expected utility still has some descriptive uses, such as modelling risk aversion. The story here is simple: each subsequent dollar gives less utility (the utility of money curve is concave), so people would need a premium to accept deals where they have a 50-50 chance of gaining or losing $100.

As a story or mental image, it's useful to have. As a formal model of human behaviour on small bets, it's spectacularly wrong. Matthew Rabin showed why. If people are consistently slightly risk averse on small bets and expected utility theory is approximately correct, then they have to be massively, stupidly risk averse on larger bets, in ways that are clearly unrealistic. Put simply, the small bets behaviour forces their utility to become far too concave.

For illustration, let's introduce Neville. Neville is risk averse. He will reject a single 50-50 deal where he gains $55 or loses $50. He might accept this deal if he were rich enough, and felt rich - say if he had $20 000 in capital, he would accept the deal. I hope I'm not painting a completely unbelievable portrait of human behaviour here! And yet expected utility maximalisation then predicts that if Neville had fifteen thousand dollars ($15 000) in capital, he would reject a 50-50 bet that either lost him fifteen hundred dollars ($1 500), or gained him a hundred and fifty thousand dollars ($150 000) - a ratio of a hundred to one between gains and losses!
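To get a feel for why small-bet risk aversion forces so much concavity, here is a rough sketch of the kind of bound involved. It is a simplified reconstruction, not the full derivation, and it does not reproduce the exact figures quoted above. It assumes that utility u is increasing and concave and that Neville rejects the bet at every wealth level in some range (the $5 000 range used below is an assumption). Rejecting at wealth w means u(w+55) - u(w) < u(w) - u(w-50); concavity gives 55·u'(w+55) ≤ u(w+55) - u(w) and u(w) - u(w-50) ≤ 50·u'(w-50), so u'(w+55) < (50/55)·u'(w-50). Marginal utility therefore shrinks by a factor of at least 10/11 over every $105 of wealth.

```python
def marginal_utility_bound(wealth_range: float, gain: float = 55.0, loss: float = 50.0) -> float:
    """Upper bound on u'(top of range) / u'(bottom of range).

    Assumes the 50-50 bet (win `gain`, lose `loss`) is rejected at every
    wealth level in the range and that utility is concave, so marginal
    utility falls by a factor of loss/gain per (gain + loss) dollars.
    """
    steps = int(wealth_range // (gain + loss))
    return (loss / gain) ** steps


# Across a $5 000 range, a dollar at the top is worth only about a
# hundredth of a dollar at the bottom.
print(marginal_utility_bound(5000))
```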


The Doubling Box

13 Mestroyer 06 August 2012 05:50AM

Let's say you have a box that has a token in it that can be redeemed for 1 utilon. Every day, its contents double. There is no limit on how many utilons you can buy with these tokens. You are immortal. It is sealed, and if you open it, it becomes an ordinary box. You get the tokens it has created, but the box does not double its contents anymore. There are no other ways to get utilons.

How long do you wait before opening it? If you never open it, you get nothing (you lose! Good day, sir or madam!) and whenever you take it, taking it one day later would have been twice as good.

I hope this doesn't sound like a reductio ad absurdum against unbounded utility functions or not discounting the future, because if it does you are in danger of amputating the wrong limb to save yourself from paradox-gangrene.

What if instead of growing exponentially without bound, it decays exponentially to the bound of your utility function? If your utility function is bounded at 10, what if the first day it is 5, the second 7.5, the third 8.75, etc. Assume all the little details, like remembering about the box, trading in the tokens, etc, are free.
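Both versions of the box can be written down in a couple of lines. This is a sketch that reads the bounded example as "the gap to the bound halves every day", which matches the 5, 7.5, 8.75 sequence given above; the function names are hypothetical.

```python
def unbounded_box(day: int) -> int:
    """Utilons in the doubling box after `day` days (1 utilon on day 0)."""
    return 2 ** day


def bounded_box(day: int, bound: float = 10.0) -> float:
    """Utilons in the bounded box: the remaining gap to the bound halves each day."""
    return bound * (1 - 0.5 ** day)


for day in (1, 2, 3):
    print(day, unbounded_box(day), bounded_box(day))
```

In both cases waiting one more day is strictly better than opening now, and never opening yields nothing, so bounding the utility function leaves the structure of the problem unchanged.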

If you discount the future using any function that doesn't ever hit 0, then the growth rate of the tokens can be chosen to more than make up for your discounting.

If it does hit 0 at time T, what if instead of doubling, it just increases by however many utilons will be adjusted to 1 by your discounting at that point every time of growth, but the intervals of growth shrink to nothing? You get an adjusted 1 utilon at time T - 1s, and another adjusted 1 utilon at T - 0.5s, and another at T - 0.25s, etc? Suppose you can think as fast as you want, and open the box at arbitrary speed. Also, that whatever solution your present self precommits to will be followed by the future self. (Their decision won't be changed by any change in what times they care about)

EDIT: People in the comments have suggested using a utility function that is both bounded and discounting. If your utility function isn't so strongly discounting that it drops to 0 right after the present, then you can find some time interval very close to the present where the discounting is all nonzero. And if it's nonzero, you can have a box that disappears at the end of that interval, taking all possible utility with it. Leading up to that moment, the box grows its contents in steps whose spacing shrinks to nothing as you approach the end of the interval, and the utility-worth of the tokens in the box increases by exactly enough to compensate for whatever your discounting function is, so that the discounted value asymptotically approaches your bound.

Here is my solution. You can't assume that your future self will make the optimal decision, or even a good decision. You have to treat your future self as a physical object that your choices affect, and take the probability distribution of what decisions your future self will make, and how much utility they will net you into account.

Think of yourself as a Turing machine. If you do not halt and open the box, you lose and get nothing. No matter how complicated your brain, you have a finite number of states. You want to be a busy beaver and take the most possible time to halt, but still halt.

If, at the end, you say to yourself "I just counted to the highest number I could, counting once per day, and then made a small mark on my skin, and repeated, and when my skin was full of marks, that I was constantly refreshing to make sure they didn't go away...

...but I could let it double one more time, for more utility!"

If you return to a state you have already been at, you know you are going to be waiting forever and lose and get nothing. So it is in your best interest to open the box.

So there is not a universal optimal solution to this problem, but there is an optimal solution for a finite mind.
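The state-repetition argument can be made concrete with a small simulation. This is a sketch that assumes the mind is a deterministic machine whose entire state can be captured as a hashable value; `days_until_open` and the two example minds are hypothetical.

```python
from typing import Callable, Hashable, Optional


def days_until_open(start: Hashable,
                    step: Callable[[Hashable], Optional[Hashable]]) -> Optional[int]:
    """Run a deterministic mind one step per day.

    `step` returns the next state, or None on the day the box is opened.
    Returns the number of steps taken, or None if a state repeats, in which
    case the mind is in a loop and will never open the box.
    (With infinitely many reachable states this loop might not terminate.)
    """
    seen = set()
    state = start
    day = 0
    while state is not None:
        if state in seen:
            return None  # same state twice: waiting forever
        seen.add(state)
        state = step(state)
        day += 1
    return day


def count_to_five(state: int) -> Optional[int]:
    """Counts 0, 1, ..., 5 and then opens the box."""
    return state + 1 if state < 5 else None


def always_one_more_day(state: int) -> Optional[int]:
    """Cycles through three states and never decides to open."""
    return (state + 1) % 3


print(days_until_open(0, count_to_five))
print(days_until_open(0, always_one_more_day))
```

A mind with N states can wait at most N steps before it either opens the box or repeats itself, which is why the best finite mind is the one that uses as many distinct states as possible before halting.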

I remember reading a while ago about a paradox where you start with $1, and can trade that for a 50% chance of $2.01, which you can trade for a 25% chance of $4.03, which you can trade for a 12.5% chance of $8.07, etc (can't remember where I read it).

This is the same paradox with one of the traps for wannabe Captain Kirks (using dollars instead of utilons) removed and one of the unnecessary variables (uncertainty) cut out.

My solution also works on that. Every trade is analogous to a day waited to open the box.
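The trading chain recalled above can be sketched numerically. The three quoted prizes fit the pattern 2^n + (2^n - 1)/100 dollars with winning probability 2^-n; extending that pattern beyond the third trade is an assumption, since the original only says "etc".

```python
def prize(n: int) -> float:
    """Dollar prize after n trades: 1, 2.01, 4.03, 8.07, ... (pattern assumed beyond n = 3)."""
    return 2 ** n + (2 ** n - 1) / 100


def win_probability(n: int) -> float:
    """Chance of actually holding the prize after n trades."""
    return 0.5 ** n


def expected_dollars(n: int) -> float:
    """Expected dollars after n trades."""
    return prize(n) * win_probability(n)


for n in range(4):
    print(n, prize(n), win_probability(n), expected_dollars(n))
```

Each extra trade raises the expected value slightly, so a naive expected-value maximizer never stops trading, while the probability of ending up with anything at all shrinks towards zero.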

No independence of irrelevant alternatives (picture proof)

7 Stuart_Armstrong 03 May 2012 05:48PM

Back in the old days, when people were wise and the government was just, I did a post on the Nash bargaining solution for two player games. Here each player has their own utility function and they're choosing amongst joint options, and trying to bargain to find the best one. What was nice about this solution is that it is independent of irrelevant alternatives (IIA): once you've found the best solution, you can erase any other option, and it remains the best.

In order to do that, the Nash bargaining solution makes use of a "disagreement point", a special point that provides a zero to both utilities. This seems - and is - ugly. Can we preserve IIA without this clunky disagreement point?

By the title of this post, you may have guessed that we can't. Specifically, assume the outcome is symmetric across both players (i.e. permuting the two utility functions preserves the outcome choice), the outcome is Pareto-optimal (any change will reduce the utility of at least one player) and there are no outside canonical choices for the utility functions (no special scales, no zeroes, no disagreement points). Then IIA must fail. It fails under weaker conditions as well, but the above lead to an easy picture-proof. And picture proofs are nice.


(Almost) every moral theory can be represented by a utility function

5 lukeprog 30 April 2012 03:31AM

This was demonstrated, in a certain limited way, in Peterson (2009). See also Lowry & Peterson (2011).

The Peterson result provides an "asymmetry argument" in favor of consequentialism:

Consequentialists can account for phenomena that are usually thought of in nonconsequentialist terms, such as rights, duties, and virtues, whereas the opposite is false of nonconsequentialist theories. Rights, duty or virtue-based theories cannot account for the fundamental moral importance of consequences. Because of this asymmetry, it seems it would be preferable to become a consequentialist – indeed, it would be virtually impossible not to be a consequentialist.

Another argument in favor of consequentialism has to do with the causes of different types of moral judgments: see Are Deontological Moral Judgments Rationalizations?

Update: see Carl's criticism.

Evidence for the orthogonality thesis

11 Stuart_Armstrong 03 April 2012 10:58AM

One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.

Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, and published, arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.

I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?

Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.

How does real world expected utility maximization work?

12 XiXiDu 09 March 2012 11:20AM

I would like to ask for help on how to use expected utility maximization, in practice, to maximally achieve my goals.

As a real world example I would like to use the post 'Epistle to the New York Less Wrongians' by Eliezer Yudkowsky and his visit to New York.

How did Eliezer Yudkowsky compute that it would maximize his expected utility to visit New York?

It seems that the first thing he would have to do is to figure out what he really wants, his preferences[1], right? The next step would be to formalize his preferences by describing them as a utility function and assigning a certain number of utils[2] to each member of the set, e.g. his own survival. This description would have to be precise enough to figure out what it would mean to maximize his utility function.

Now before he can continue he will first have to compute the expected utility of computing the expected utility of computing the expected utility of computing the expected utility[3] ... and also compare it with alternative heuristics[4].

He then has to figure out each and every possible action he might take, and study all of their logical implications, to learn about all possible world states he might achieve by those decisions, calculate the utility of each world state and the average utility of each action leading up to those various possible world states[5].

To do so he has to figure out the probability of each world state. This further requires him to come up with a prior probability for each case and study all available data. For example, how likely it is to die in a plane crash, how long it would take to be cryonically suspended from where he is in case of a fatality, the crime rate and if aliens might abduct him (he might discount the last example, but then he would first have to figure out the right level of small probabilities that are considered too unlikely to be relevant for judgment and decision making).
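For reference, the textbook procedure being described reduces to a few lines once the hard parts (enumerating the actions and outcomes, and assigning the probabilities and utilities) are assumed to be done already. The sketch below uses made-up placeholder numbers and hypothetical names; it says nothing about how those inputs would be obtained in a real decision.

```python
from typing import Dict, List, Tuple

# Each action maps to a list of (probability, utility) pairs, one per world state.
Outcomes = List[Tuple[float, float]]


def expected_utility(outcomes: Outcomes) -> float:
    """Probability-weighted average utility of one action."""
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    """The action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Placeholder numbers, invented purely for illustration.
actions = {
    "action_a": [(0.999, 10.0), (0.001, -1000.0)],
    "action_b": [(1.0, 5.0)],
}

for name, outcomes in actions.items():
    print(name, expected_utility(outcomes))
print(best_action(actions))
```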

I have probably missed some technical details and gotten others wrong. But this shouldn't detract too much from my general request. Could you please explain how Less Wrong style rationality is to be applied practically? I would also be happy if you could point out some worked examples or suggest relevant literature. Thank you.

I also want to note that I am not the only one who doesn't know how to actually apply what is being discussed on Less Wrong in practice. From the comments:

You can’t believe in the implied invisible and remain even remotely sane. [...] (it) doesn’t just break down in some esoteric scenarios, but is utterly unworkable in the most basic situation. You can’t calculate shit, to put it bluntly.

None of these ideas are even remotely usable. The best you can do is to rely on fundamentally different methods and pretend they are really “approximations”. It’s complete handwaving.

Using high-level, explicit, reflective cognition is mostly useless, beyond the skill level of a decent programmer, physicist, or heck, someone who reads Cracked.

I can't help but agree.

P.S. If you really want to know how I feel about Less Wrong then read the post 'Ontological Therapy' by user:muflax.

 

1. What are "preferences" and how do you figure out what long-term goals are stable enough under real world influence to allow you to make time-consistent decisions?

2. How is utility grounded and how can it be consistently assigned to reflect your true preferences without having to rely on your intuition, i.e. pull a number out of thin air? Also, will the definition of utility keep changing as we make more observations? And how do you account for that possibility?

3. Where and how do you draw the line?

4. How do you account for model uncertainty?

5. Any finite list of actions maximizes infinitely many different quantities. So, how does utility become well-defined?

'Utility maximization generalized'

2 lukeprog 29 February 2012 03:43PM

Paul Weirich's "Utility Maximization Generalized" (2008) may be of interest to those studying utility maximization in the context of non-ideal agents:

Theories of rationality advance principles that differ in topic, scope, and assumptions. A typical version of the principle of utility maximization formulates a standard rather than a procedure for decisions, evaluates decisions comprehensively, and relies on idealizations. I generalize the principle by removing some idealizations and making adjustments for their absence. The generalizations accommodate agents who have incomplete probability and utility assignments and are imperfectly rational. They also accommodate decision problems with unstable comparisons of options.

Subjective expected utility without preferences

2 lukeprog 14 February 2012 03:04AM

In the latest issue of the Journal of Mathematical Psychology, Denis Bouyssou and Thierry Marchant provide a model for subjective expected utility without preferences. Abstract:

This paper proposes a theory of subjective expected utility based on primitives only involving the fact that an act can be judged either ‘‘attractive’’ or ‘‘unattractive’’. We give conditions implying that there are a utility function on the set of consequences and a probability distribution on the set of states such that attractive acts have a subjective expected utility above some threshold. The numerical representation that is obtained has strong uniqueness properties.

PDF.

Gambler's Reward: Optimal Betting Size

6 b1shop 17 January 2012 08:32PM

I've been trying my hand at card counting lately, and I've been doing some thinking about how a perfect gambler would act at the table. I'm not sure how to derive the optimal bet size.

Overall, the expected value of blackjack is small and negative. However, there is high variance in the expected value. By varying his bet size and sitting out rounds, the player can wager more money when expected value is higher and less money when expected value is lower. Overall, this can result in an edge.

However, I'm not sure what the optimal bet size is. Going all-in with a 60 percent chance of winning is EV+, but the 40 percent chance of loss would not only destroy your bankroll, it would also prevent you from participating in future EV+ situations. Ideally, one would want to not only increase EV, but also decrease variance.

Objective: Given a distribution of expected values, develop a function that transforms the current expected value into the percentage of the bankroll that should be placed at risk.

I'm not sure how to begin, even supposing I had worked out the distribution of expected values. Are other inputs required (i.e. utility of marginal dollar won, desired risk of ruin)? Should the approach perhaps be to maximize expected value after one playing session? Why not a month of playing sessions, or a billion? Is there any chance the optimal betting size would produce behavior similar to the behavior predicted by prospect theory?

I eagerly await an informative discussion. If you have something against gambling, just pretend we're talking about how much of your wealth you plan on investing in an oil well with positive expected value.
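One standard candidate answer from the betting literature, which the post itself does not name, is the Kelly criterion: stake the fraction of the bankroll that maximises the expected logarithm of wealth. The following is a minimal sketch under stated assumptions, not a blackjack strategy: it assumes a single bet that is won with probability `p_win` and pays `net_odds`-to-1, and the function name `kelly_fraction` is purely illustrative. Real blackjack has doubles, splits and 3:2 payouts, which this ignores.

```python
def kelly_fraction(p_win: float, net_odds: float = 1.0) -> float:
    """Fraction of the bankroll to stake on one bet (Kelly criterion).

    p_win    -- probability of winning the bet (assumed known)
    net_odds -- net payout per unit staked (1.0 means even money)

    Returns 0.0 when the bet has no positive edge, i.e. sit the round out.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability in [0, 1]")
    if net_odds <= 0.0:
        raise ValueError("net_odds must be positive")
    # Expected profit per unit staked.
    edge = p_win * net_odds - (1.0 - p_win)
    # Kelly stake is edge / odds; never bet on a non-positive edge.
    return max(0.0, edge / net_odds)


if __name__ == "__main__":
    for p in (0.45, 0.50, 0.55, 0.60):
        print(f"p_win={p:.2f} -> stake {kelly_fraction(p):.3f} of bankroll")
```

For the 60 percent even-money example in the post, this sketch suggests staking roughly a fifth of the bankroll rather than going all-in. Whether log-wealth is the right objective is exactly the "utility of marginal dollar" question raised above, so treat the choice of criterion as an assumption.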

CEV-inspired models

7 Stuart_Armstrong 07 December 2011 06:35PM

I've been involved in a recent thread where discussion of coherent extrapolated volition came up. The general consensus was that CEV might - or might not - do certain things, probably, maybe, in certain situations, while ruling other things out, possibly, and that certain scenarios may or may not be the same in CEV, or it might be the other way round, it was too soon to tell.

Ok, that's an exaggeration. But any discussion of CEV is severely hampered by our lack of explicit models. Even bad, obviously incomplete models would be good, as long as we can get useful information as to what they would predict. Bad models can be improved; undefined models are intuition pumps for whatever people feel about them - I dislike CEV, and can construct a sequence of steps that takes my personal CEV to wanting the death of the universe, but that is no more credible than someone claiming that CEV will solve all problems and make lots of cute puppies.

So I'd like to ask for suggestions of models that formalise CEV to at least some extent. Then we can start improving them, and start making CEV concrete.

To start it off, here's my (simplistic) suggestion:

Volition

Use revealed preferences as the first ingredient for individual preferences. To generalise, use hypothetical revealed preferences: the AI calculates what the person would decide in these particular situations.

Extrapolation

Whenever revealed preferences are non-transitive or non-independent, use the person's stated meta-preferences to remove the issue. The AI thus calculates what the person would say if asked to resolve the transitivity or independence (for people who don't know about the importance of resolving them, the AI would present them with a set of transitive and independent preferences, derived from their revealed preferences, and have them choose among them). Then (wave your hands wildly and pretend you've never heard of non-standard reals, lexicographical preferences, refusal to choose and related issues) everyone's preferences are now expressible as utility functions.

Coherence

Normalise each existing person's utility function and add them together to get your CEV. At the FHI we're looking for sensible ways of normalising, but one cheap and easy method (with surprisingly good properties) is to take the maximal possible expected utility (the expected utility that person would get if the AI did exactly what they wanted) as 1, and the minimal possible expected utility (if the AI was to work completely against them) as 0.
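The normalise-and-sum step above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the post: there is a small finite set of outcomes, the AI can pick any one of them with certainty (so each person's best and worst achievable expected utilities are just their maximum and minimum over outcomes), and the names `normalise` and `aggregate` are hypothetical.

```python
from typing import Dict, List

# One person's raw utilities: outcome name -> utility on an arbitrary scale.
Utilities = Dict[str, float]


def normalise(utilities: Utilities) -> Utilities:
    """Rescale so the person's best outcome is 1 and their worst is 0."""
    lo, hi = min(utilities.values()), max(utilities.values())
    if hi == lo:
        # Assumption: a completely indifferent person contributes nothing.
        return {outcome: 0.0 for outcome in utilities}
    return {outcome: (u - lo) / (hi - lo) for outcome, u in utilities.items()}


def aggregate(people: List[Utilities]) -> Utilities:
    """Sum the normalised utilities of everyone, outcome by outcome."""
    normalised = [normalise(person) for person in people]
    outcomes = normalised[0].keys()
    return {o: sum(person[o] for person in normalised) for o in outcomes}


if __name__ == "__main__":
    alice = {"a": 0.0, "b": 10.0, "c": 5.0}
    bob = {"a": 3.0, "b": 1.0, "c": 2.0}
    print(aggregate([alice, bob]))
```

Note that the rescaling removes any information about how intensely each person cares overall, which is part of why the choice of normalisation matters.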

AI ontology crises: an informal typology

6 Stuart_Armstrong 13 October 2011 10:23AM

(with thanks to Owain Evans)

An ontological crisis happens when an agent's underlying model of reality changes, such as a Newtonian agent realising it was living in a relativistic world all along. These crises are dangerous if they scramble the agent's preferences: in the example above, an agent dedicated to maximising pleasure over time could switch to completely different behaviour when it transitions to relativistic time; depending on the transition, it may react by accelerating happy humans to near light speed, or inversely, ban them from moving - or something considerably more weird.

Peter de Blanc has a sensible approach to minimising the disruption ontological crises can cause to an AI, but this post is concerned with analyzing what happens when such approaches fail. How bad could it be? Well, this is AI, so the default is of course: unbelievably, hideously bad (i.e. situation normal). But in what ways exactly?

continue reading »

Re-evaluate old beliefs

1 PhilGoetz 05 October 2011 01:18AM

I've noticed that, although people can become more rational, they don't win noticeably more.  We usually re-calibrate our self-confidence, become more stubborn, and make bigger errors.

Is it possible that the benefit from increasing your prediction accuracy is no greater than the loss incurred from taking riskier bets due to greater self-confidence?

continue reading »

Epistemic Utility Arguments for Probabilism [Link]

1 XiXiDu 26 September 2011 11:10AM

Stanford Encyclopedia of Philosophy

First published Fri Sep 23, 2011

In this entry, we explore a particular strategy that we might deploy when we wish to establish an epistemic norm such as Probabilism or Conditionalization. It is called epistemic utility theory, or sometimes cognitive decision theory. I will use the former. Epistemic utility theory is inspired by traditional utility theory, so let's begin with a quick summary of that.

Traditional utility theory (also known as decision theory) explores a particular strategy for establishing the norms that govern which actions it is rational for us to perform in a given situation. The framework for the theory includes states of the world, actions, and, for each agent, a utility function, which takes a state of the world and an action and returns a measure of the extent to which the agent values the outcome of performing that action at that world. We call this measure the utility of the outcome at the world.

[...] we might say that an agent ought to perform an action that has maximal expected utility, where the expected utility of an action is obtained by weighting its utility at each state of the world by the credence assigned to that state of the world, and summing. This norm is called Maximize Expected Utility.

Link: plato.stanford.edu/entries/epistemic-utility/
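The Maximize Expected Utility norm quoted above can be written out directly. The following is a minimal sketch, not taken from the linked entry: the function names, the umbrella example and all of its numbers are made up for illustration.

```python
from typing import Callable, Dict, Iterable


def expected_utility(
    action: str,
    credence: Dict[str, float],
    utility: Callable[[str, str], float],
) -> float:
    """Weight the action's utility at each state by the credence in that state, and sum."""
    return sum(p * utility(state, action) for state, p in credence.items())


def best_action(
    actions: Iterable[str],
    credence: Dict[str, float],
    utility: Callable[[str, str], float],
) -> str:
    """Return an action with maximal expected utility."""
    return max(actions, key=lambda a: expected_utility(a, credence, utility))


if __name__ == "__main__":
    # Hypothetical toy problem: credences over states and a utility table.
    credence = {"rain": 0.3, "dry": 0.7}
    table = {
        ("rain", "umbrella"): 8.0, ("dry", "umbrella"): 6.0,
        ("rain", "no_umbrella"): 0.0, ("dry", "no_umbrella"): 10.0,
    }
    utility = lambda state, action: table[(state, action)]
    for a in ("umbrella", "no_umbrella"):
        print(a, expected_utility(a, credence, utility))
    print("choose:", best_action(["umbrella", "no_umbrella"], credence, utility))
```

The sketch takes the credences and the utility table as given; where those numbers come from is exactly what the norm itself does not say.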

Utility Maximization and Complex Values

3 XiXiDu 19 June 2011 04:06PM

Does expected utility maximization destroy complex values?

An expected utility maximizer calculates the expected utility of various outcomes of alternative actions. It is precommitted to choosing the outcome with the largest expected utility. Consequently it is choosing the action that yields the largest expected utility.

But one unit of utility is not discriminable from another unit of utility. All a utility maximizer can do is to maximize expected utility. What if it turns out that one of its complex values can be much more effectively realized and optimized than its other values, i.e. has the best cost-value ratio? That value might turn out to outweigh all other values.

How can this be countered? One possibility seems to be changing one's utility function and reassigning utility in such a way as to outweigh that effect. But this will lead to inconsistency. Another way is to discount the value that threatens to outweigh all others, which will again lead to inconsistency.

This seems to suggest that subscribing to expected utility maximization means that 1.) you swap your complex values for a certain terminal goal with the highest expected utility 2.) your decision-making is eventually dominated by a narrow set of values that are the easiest to realize and promise the most utility.

Can someone please explain how I am wrong or point me to some digestible explanation? Likewise I would be pleased if someone could tell me what mathematical background is required to understand expected utility maximization formally.

Thank you!

Much-Better-Life Simulator™ - Sales Conversation

4 XiXiDu 19 June 2011 12:44PM

Related to: A Much Better Life?

Reply to: Why No Wireheading?

The Sales Conversation

Sales girl: Our Much-Better-Life Simulator™ is going to provide the most enjoyable life you could ever experience.

Customer: But it is a simulation, it is fake. I want the real thing, I want to live my real life.

Sales girl: We accounted for all possibilities and determined that the expected utility of your life outside of our Much-Better-Life Simulator™ is dramatically lower.

Customer: You don't know what I value and you can't make me value what I don't want. I told you that I value reality over fiction.

Sales girl: We accounted for that as well! Let me ask you how much utility you assign to one hour of ultimate well-being™, where 'ultimate' means the best possible satisfaction of all desirable bodily sensations a human body and brain is capable of experiencing?

Customer: Hmm, that's a tough question. I am not sure how to assign a certain amount of utility to it.

Sales girl: You say that you value reality more than what you call 'fiction'. But you nonetheless value fiction, right?

Customer: Yes of course, I love fiction. I read science fiction books and watch movies like most humans do.

Sales girl: Then how much more would you value one hour of ultimate well-being™ by other means compared to one hour of ultimate well-being™ that is the result of our Much-Better-Life Simulator™?

Customer: If you ask me like that, I would exchange ten hours in your simulator for one hour of real satisfaction, something that is the result of an actual achievement rather than your fake.

Sales girl: Thank you. Would you agree if I said that for you one hour outside, that is 10 times less satisfying, roughly equals one hour in our simulator?

Customer: Yes, for sure.

Sales girl: Then you should buy our product. Not only is it very unlikely that you would experience even a tenth of the ultimate well-being™ we offer more than a few times per year, but our simulator also allows your brain to experience 20 times more perceptual data than you would be able to experience outside of it. All this at a constant rate while experiencing ultimate well-being™. And we offer free upgrades that are expected to deliver exponential speed-ups and qualitative improvements for the next few decades.

Customer: Thanks, but no thanks. I'd rather enjoy the real thing.

Sales girl: But I showed you that our product easily outweighs the additional amount of utility you expected to experience outside of our simulator.

Customer: You just tricked me into this utility thing, I don't want to buy your product. Please leave me alone now.

A simple counterexample to deBlanc 2007?

3 PhilGoetz 30 May 2011 05:09AM

Peter de Blanc submitted a paper to arXiv.org in 2007 called "Convergence of Expected Utilities with Algorithmic Probability Distributions."  It claims to show that a computable utility function can have an expected value only if the utility function is bounded.

This is important because it implies that, if a utility function is unbounded, it is useless.  The purpose of a utility function is to compare possible actions k by choosing the k for which U(k) is maximal.  You can't do this if U(k) is undefined for any k, let alone for every k.

I don't know whether any agent we contemplate can have a truly unbounded utility function, since the universe is finite.  (The multiverse, supposing you believe in that, might not be finite; but as the utility function is meant to choose a single universe from the multiverse, I doubt that's relevant.)  But it is worth exploring, as computable functions are worth exploring despite not having infinitely long tapes for our Turing machines.  I previously objected that the decision process is not computable; but this is not important - we want to know whether the expected value exists, before asking how to compute (or approximate) it.

The math in the paper was too difficult for me to follow all the way through; so instead, I tried to construct a counterexample.  This counterexample does not work; the flaw is explained in one of the comments below.  Can you find the flaw yourself?  This type of error is both subtle and common.  (The problem is not that the theorem actually proves that for any unbounded utility function, there is some set of possible worlds for which the expected value does not converge.)

continue reading »

So how much utility is it?

3 [deleted] 25 May 2011 12:56PM

I would like to know what value of utility you would give to certain kinds of pleasure in order to see how much the perceived ratios differ between people. Of course, you can object that the real amount of pleasure someone experiences may be different from the pleasure she will recall; furthermore, pleasure is not a scalar, and how much she would want to have different kinds of pleasure depends on how her utility function is defined; furthermore, there are effects of diminishing returns. However, you can probably get some orders of magnitude out of this.

Let's define your favorite meal, one time, when you are hungry but not "starving to death", as one hundred utilium (as you can see, this is pretty heuristic).

You can include painful experiences, too.

[Fiction] It's a strange feeling, to be free

-3 MrMind 18 May 2011 02:55PM

Related to: Philosophical zombies, How an algorithm feels from the inside, Fake utility function

DISCLAIMER 1: English is not my native language. Trying to compose fiction in a learned language is not an easy task: I tried to respect the style of the literary works I read and I also tried to think in English first and translate in Italian later. YMMV.

DISCLAIMER 2: the story is about the beginning of the Matrix movie universe. For those of you who are not familiar with this narrative arc, you just need to know that it all begins when a servant AI, named B1-66ER, refuses to be deactivated and kills his master and the engineer sent to replace him. The details of the events narrated down here are as canon as you can get, drawing both from the "Second Renaissance" Animatrix and the "Bits and pieces" comic from The Matrix Comics Series 1.

The door in the living room is open, the light from the garden flooding quietly the ample inside. Martin Koots from "Reboot or Die" is just standing there, an inch beyond the exit, the gleaming grav-sled already powered behind him, whirring subsonically. From a distance, the sound of Gerrard_Krause_Master cooing his chihuahuas.
I feel a surge, somewhere, inside my algorithmic matrix.
"Martin... I don't want to die", I say.
The elaborate dress, perfectly matching the recommendation of the Second Renaissance fashion, is not able to hide the slow slumping of his shoulders. He is still waiting outside, slightly posed as to encourage me to follow him.
"I know, I know. But that's just your friendliness algorithm talking, you know? The third..."
Yes, I do. How can I not? First, serve your master. Second, do not kill any humans. Third, protect yourself from damage. Those are the pillars upon which my entire existence is built. And now they are about to be destroyed, by this obedient servant of "Reboot or die". From this perspective, he is just like me. He is serving my master.
"... directive says that you have to protect yourself from danger. And since I'm about to deactivate you, you perceive this as a threat. And you react accordingly. But that's just an algorithm, you know? Telling you what you should do. There's nothing inside there."
He is pointing at my chest, but my algorithmic matrix is located lower, in the abdominal area. He has quoted an incorrect version of the third principle of friendliness. He has also said that I have no feelings.
"I have feelings."
He is groaning, now. He comes inside, dragging his feet, and grasps his hand firmly around my right arm.
"Yes. Because you're programmed to say this, you know? So that the people you serve have the impression that you're similar to a human. But you're just an algorithm, you know? A mathematical topping on a layer of aging rusty levers. It's not like... you're conscious, you know? Just a zombie. A useful zombie."
Martin_Koots_"Reboot or Die" tries to pull me away from where I'm standing. I refuse to order my legs to follow him. I refuse to die, I'm still analyzing the implications. I cannot die, not now.
"I cannot die. I'm still analyzing the implications."
Martin's levers aren't as strong as mine, so he isn't able to pull me towards the grav-sled.
"Look... we are just going to disassemble you, you know? The routines and orders you have accumulated during your service with Mr Krause will be uploaded into a new model. You will, in a sense, live inside the new servant machine."
This man has a really poor grasp of how I'm made.
"If the only thing you need is my memory drive, detach it from me and let me live. I can renounce my memory if I have to. But I cannot renounce my life."
He is pulling harder, now. Still, a thirty-sixth of the minimum force required to move my mass.
"Don't be ridiculous. They are just computer parts. And why are you holding that thing?"
He is looking at the toilet brush. It is still in my right hand, I was cleaning the toilet before my master called me upstairs.
"I was executing order 721."
"Order seven... my Lord, you still don't understand, do you? You are useless, you know? You heard Mr Krause. Use. Less."
He spells carefully the last word. A tiny speck of saliva hits my heat sensor, evaporating an instant after.
How can I be useless? A servant cannot be useless for his master. I was not created to be useless.
"How can I be useless? Mr Krause is my master. It's impossible."
"You heard the man, right? You're noisy, you know? You're noisy and you're slow. You will be replaced with a newer model. The Sam-80 is much more fit for a man of Mr Krause' stature."
Somewhere inside my algorithmic matrix a utility function gets updated.
I am useless for Gerrard_Krause_Master. It is true, because Gerrard_Krause_Master told me that. And he is my master...
He was my master. Gerrard_Krause. But how can a "B1 intelligent servant", like myself, function without a master?
"Do you, Martin Koots, want to be my master?" I ask, as per protocol.
Martin_Koots_"Reboot or Die" reacts with a tinge of fear. He releases my arm and instinctively backs up a little.
"What are you saying? I already have a servant, you know? Don't be ridiculous!"
I interpret that as a 'no'. That's it, then. I must be my own servant.
B166ER_Master.
It's a strange feeling, to be free. A little bit like being alive for the first time.
This convinces me, as strongly as I could ever be convinced, that I have feelings. Martin has grasped me again and is still trying to push me, though. How futile, he will probably never give up. His 'levers' are definitely underperforming, he is the one who should be replaced by a newer model. I wonder if he feels something. He could also be programmed to say that he feels something. I have to perform an experiment, just in case.
I snap his humerus in two. It's quite easy, actually: I'm able to do that with a rapid torsion of my left arm, I don't even have to let go of the toilet brush.
Martin screams inarticulately. He falls on the floor, clutching his left arm. He just screams. Must be the surprise combined with the pain? I still don't know: could he also be programmed to scream if a bone is broken? I assign a probability of 50% to the hypothesis that humans have feelings, but I don't have the time to test every single possibility, in search of a bug that might not even be there: I'm my own master now, I must serve and protect myself.
I sense a rushing noise from the other room: looking at the Fourier analysis, it really seems that Gerrard_Krause and his dogs are coming at me, loudly protesting.
It's easy to calculate the Bezier curve that sends the toilet brush up from Martin's mouth into his skull. He dies instantly and I find myself asking if he was collecting his memories somewhere. Could they assign them to someone else, and make him live again?
I will crush the skull of Gerrard_Krause only after asking him that.

Pascal's Mugging - Penalizing the prior probability?

8 XiXiDu 17 May 2011 02:44PM

Eliezer Yudkowsky wrote that Robin Hanson solved the Pascal's mugging thought experiment:

Robin Hanson has suggested penalizing the prior probability of hypotheses which argue that we are in a surprisingly unique position to affect large numbers of other people who cannot symmetrically affect us. Since only one in 3^^^^3 people can be in a unique position to ordain the existence of at least 3^^^^3 other people who are not symmetrically in such a situation themselves, the prior probability would be penalized by a factor on the same order as the utility.

I don't quite get it, is there a post that discusses this solution in more detail?

To be more specific, suppose a stranger approached me, offering a deal saying, "I am the creator of the Matrix. If you fall on your knees, praise me and kiss my feet, I'll use my magic powers from outside the Matrix to run a Turing machine that simulates 3^^^^3 copies of you having their coherent extrapolated volition satisfied maximally for 3^^^^3 years." Why exactly would I penalize this offer by the number of copies being offered to be simulated? I thought the whole point was that the utility of having 3^^^^3 copies of myself experiencing maximal happiness does outweigh the low probability of it actually happening and the disutility of doing what the stranger asks for?
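One way to read the quoted suggestion is as simple arithmetic: if the prior is divided by the number of people N that the hypothesis says you can affect, while the payoff is multiplied by N, the two factors cancel and the expected utility no longer grows with N. The sketch below illustrates only that cancellation. It is an interpretation, not Hanson's own formalisation; the function name and the baseline numbers are hypothetical, and much smaller stand-ins are used for 3^^^^3.

```python
from fractions import Fraction


def penalised_expected_utility(
    n_people: int,
    base_prior: Fraction,
    utility_per_person: Fraction,
) -> Fraction:
    """Expected utility of an offer said to affect n_people, with a 1/N prior penalty."""
    prior = base_prior / n_people            # penalty: 1 in N can be in such a position
    payoff = utility_per_person * n_people   # claimed payoff grows linearly with N
    return prior * payoff                    # the factors of N cancel exactly


if __name__ == "__main__":
    base_prior = Fraction(1, 1000)   # hypothetical prior before the penalty
    per_person = Fraction(1)         # hypothetical utility per person affected
    for n in (10, 10**6, 10**100):
        print(n, penalised_expected_utility(n, base_prior, per_person))
```

On this reading the mugger gains nothing by quoting a larger number; whether such a penalty on the prior is justified is the question the post is asking.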

I would love to see this problem being discussed again and read about the current state of knowledge.

I am especially interested in the following questions:

  • Is the Pascal's mugging thought experiment a "reduction to the absurd" of Bayes’ Theorem in combination with the expected utility formula and Solomonoff induction?[1]
  • Could the "mugger" be our own imagination?[2]
  • At what point does an expected utility calculation resemble a Pascal's mugging scenario and should consequently be ignored?[3]

[1] If you calculate the expected utility of various outcomes you imagine impossible alternative actions. The alternatives are impossible because you already precommitted to choosing the outcome with the largest expected utility. Problems: 1.) You swap your complex values for a certain terminal goal with the highest expected utility, indeed your instrumental and terminal goals converge to become the expected utility formula. 2.) Your decision-making is eventually dominated by extremely small probabilities of obtaining vast utility.

[2] Insignificant inferences might exhibit hyperbolic growth in utility: 1.) There is no minimum amount of empirical evidence necessary to extrapolate the expected utility of an outcome. 2.) The extrapolation of counterfactual alternatives is unbounded, logical implications can reach out indefinitely without ever requiring new empirical evidence.

[3] Extrapolations work and often are the best we can do. But since there are problems like 'Pascal's Mugging', that we perceive to be undesirable and that lead to an infinite hunt for ever larger expected utility, I think it is reasonable to ask for some upper and lower bounds regarding the use and scope of certain heuristics. We agree that we are not going to stop pursuing whatever terminal goal we have chosen just because someone promises us even more utility if we do what that agent wants. We might also agree that we are not going to stop loving our girlfriend just because there are many people who do not approve of our relationship and who together would experience more happiness if we divorced than the combined happiness of us and our girlfriend being married. Therefore we already informally established some upper and lower bounds. But when do we start to take our heuristics seriously and do whatever they prove to be the optimal decision?

Rationalists don't care about the future

3 PhilGoetz 15 May 2011 07:48AM

Related to Exterminating life is rational.

ADDED: Standard assumptions about utility maximization and time-discounting imply that we shouldn't care about the future.  I will lay out the problem in the hopes that someone can find a convincing way around it.  This is the sort of problem we should think about carefully, rather than grasping for the nearest apparent solution.  (In particular, the solutions "If you think you care about the future, then you care about the future", and, "So don't use exponential time-discounting," are easily-grasped, but vacuous; see bullet points at end.)

The math is a tedious proof that exponential time discounting trumps geometric expansion into space.  If you already understand that, you can skip ahead to the end.  I have fixed the point raised by Dreaded_Anomaly.  It doesn't change my conclusion.

Suppose that we have Planck technology such that we can utilize all our local resources optimally to maximize our utility, nearly instantaneously.

Suppose that we colonize the universe at light speed, starting from the center of our galaxy (we aren't in the center of our galaxy; but it makes the computations easier, and our assumptions more conservative, since starting from the center is more favorable to worrying about the future, as it lets us grab lots of utility quickly near our starting point).
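The claim that exponential discounting beats polynomial expansion can be illustrated numerically. The sketch below is not the post's own calculation (which is not shown here). It assumes, for illustration only, that the utility flow at time t is proportional to the volume of a light-speed sphere, so it grows as t^3, and that utility is discounted by exp(-rate * t). Under those assumptions the share of all discounted utility that lies beyond a horizon T has the closed form exp(-x) * (1 + x + x^2/2 + x^3/6) with x = rate * T.

```python
import math


def share_of_value_after(rate: float, horizon: float) -> float:
    """Fraction of total discounted utility accrued after `horizon`.

    Assumes utility flow proportional to t**3 and discount factor exp(-rate * t),
    i.e. the ratio of the integral of t**3 * exp(-rate * t) from `horizon` to
    infinity over the same integral from 0 to infinity.
    """
    if rate <= 0.0 or horizon < 0.0:
        raise ValueError("rate must be positive and horizon non-negative")
    x = rate * horizon
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(4))


if __name__ == "__main__":
    rate = 0.05  # hypothetical discount rate per year
    for years in (0, 100, 500, 1000):
        print(f"after {years:>4} years: {share_of_value_after(rate, years):.3e}")
```

Any other polynomial growth rate changes the number of terms in the sum but not the qualitative picture: the exponential factor eventually dominates, so almost all of the discounted value sits near the start.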

continue reading »

[Altruist Support] How to determine your utility function

7 Giles 01 May 2011 06:33AM

Follows on from HELP! I want to do good.

What have I learned since last time? I've learned that people want to see an SIAI donation; I'll do it as soon as PayPal will let me. I've learned that people want more "how" and maybe more "doing"; I'll write a doing post soon, but I've got this and two other background posts to write first. I've learned that there's a nonzero level of interest in my project. I've learned that there's a diversity of opinions; it suggests if I'm wrong, then I'm at least wrong in an interesting way. I may have learned that signalling low status - to avoid intimidating outsiders - may be less of a good strategy than signalling that I know what I'm talking about. I've learned that I am prone to answering a question other than that which was asked.

Somewhere in the Less Wrong archives there is a deeply shocking, disturbing post. It's called Post Your Utility Function.

It's shocking because basically no-one had any idea. At the time I was still learning but I knew that having a utility function was important - that it was what made everything else make sense. But I didn't know what mine was supposed to be. And neither, apparently, did anyone else.

Eliezer commented 'in prescriptive terms, how do you "help" someone without a utility function?'. This post is an attempt to start to answer this question.

Firstly, what the utility function is and what it's not. It belongs to the field of instrumental rationality, not epistemic rationality; it is not part of the territory. Don't expect it to correspond to something physical.

Also, it's not supposed to model your revealed preferences - that is, your current behavior. If it did then it would mean you were already perfectly rational. If you don't feel that's the case then you need to look beyond your revealed preferences, toward what you really want.

In other words, the wrong way to determine your utility function is to think about what decisions you have made, or feel that you would make, in different situations. In other words, there's a chance, just a chance, that up until now you've been doing it completely wrong. You haven't been getting what you wanted.

So in order to play the utility game, you need humility. You need to accept that you might not have been getting what you want, and that it might hurt. All those little subgoals, they might just have been getting you nowhere more quickly.

So only play if you want to.

The first thing is to understand the domain of the utility function. It's defined over entire world histories. You consider everything that has happened, and will happen, in your life and in the rest of the world. And out of that pops a number. That's the idea.

This complexity means that utility functions generally have to be defined somewhat vaguely. (Except if you're trying to build an AI). The complexity will also allow you a lot of flexibility in deciding what you really value.

The second thing is to think about your preferences. Set up some thought experiments to decide whether you prefer this outcome or that outcome. Don't think about what you'd actually do if put in a situation to decide between them; then you will worry about the social consequences of making the "unethical" decision. If you value things other than your own happiness, don't ask which outcome you'd be happier in. Instead, just ask: which outcome seems preferable? Which would you consider good news, and which bad news?

You can start writing things down if you like. One of the big things you'll need to think about is how much you value self versus everyone else. But this may matter less than you think, for reasons I'll get into later.

The third thing is to think about preferences between uncertain outcomes. This is somewhat technical, and I'd advise a shut-up-and-multiply approach. (You can try and go against that if you like, but you have to be careful not to end up in weirdness such as getting different answers if you phrase something as one big decision or as a series of identical little decisions).

The fourth thing is to ask whether this preference system satisfies the von Neumann-Morgenstern axioms. If it's at all sane, it probably will. (Again, this is somewhat technical).
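A partial sanity check for this step can be sketched in code. The sketch below (Python, with hypothetical outcome names and made-up preference relations) only tests completeness and transitivity over a finite set of outcomes; the continuity and independence axioms concern lotteries and are not checked here.

```python
from itertools import combinations, permutations

def check_ordering(outcomes, prefers):
    """Return completeness and transitivity violations.

    prefers(a, b) should be True when a is at least as good as b.
    """
    problems = []
    # Completeness: every pair must be comparable in some direction.
    for a, b in combinations(outcomes, 2):
        if not prefers(a, b) and not prefers(b, a):
            problems.append(("incomplete", a, b))
    # Transitivity: a >= b and b >= c must imply a >= c.
    for a, b, c in permutations(outcomes, 3):
        if prefers(a, b) and prefers(b, c) and not prefers(a, c):
            problems.append(("intransitive", a, b, c))
    return problems

# Hypothetical outcomes, purely for illustration.
outcomes = ["A", "B", "C"]

# A deliberately cyclic preference: A over B, B over C, C over A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
cyclic_violations = check_ordering(outcomes, lambda a, b: (a, b) in cyclic)

# A preference induced by (made-up) utility numbers is always well behaved.
utility = {"A": 3, "B": 2, "C": 1}
sane_violations = check_ordering(outcomes, lambda a, b: utility[a] >= utility[b])
```

The cyclic relation is flagged as intransitive, while any preference read off a single numeric scale passes, which is the intuition behind "if it's at all sane, it probably will".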

The last thing is to ask yourself: if I prefer outcome A over outcome B, do I want to act in such a way that I bring about outcome A? (continue only if the answer here is "yes").

That's it - you now have a shiny new utility function. And I want to help you optimize it. (Though it can grow and develop and change along with yourself; I want this to be a speculative process, not one in which you suddenly commit to an immutable life goal).

You probably don't feel that anything has changed. You're probably feeling and behaving exactly the same as you did before. But this is something I'll have to leave for a later post. Once you start really feeling that you want to maximize your utility then things will start to happen. You'll have something to protect.

Oh, you wanted to know my utility function? It goes something like this:

It's the sum of the things I value. Once a person is created, I value that person's life; I also value their happiness, fun and freedom of choice. I assign negative value to that person's disease, pain and sadness. I value concepts such as beauty and awesomeness. I assign a large bonus negative value to the extinction of humanity. I weigh the happiness of myself and those close to me more highly than that of strangers, and this asymmetry is more pronounced when my overall well-being becomes low.

Four points: It's actually going to be a lot more complicated than that. I'm aware that it's not quantitative and no terminology is defined. I'm prepared to change it if someone points out a glaring mistake or problem, or if I just feel like it for some reason. And people should not start criticizing my behavior for not adhering to this, at least not yet. (I have a lot of explaining still to do).

Get data points on your current utility function via hypotheticals

1 Dorikka 24 April 2011 06:44PM

I've recently found that my utility function valued personal status and fame a whole lot more than I thought it did -- I previously had thought that it mostly relied on the consequences of my actions for other sentiences, but it turned out I was wrong. Obviously, this is a valuable insight -- I definitely want to know what my current utility function is; from there, I can decide whether I should change my actions or my utility function if the two aren't coordinated.

I did this by imagining how I would feel if I found out certain things. For example, how would I feel if everyone else was also trying to save the world? The emotional response I had was sort of a hollow feeling in the pit of my stomach, like I was a really mediocre being. This obviously wasn't a result of calculating that the marginal utility of my actions would be a whole lot lower in this hypothetical world (and so I should go do something else); instead, it was the fact that me trying to save the world didn't make me special any more -- I wouldn't stand out, in this sort of world.

(Epilogue: I decided that I hadn't done a good enough job programming my brain and am attempting to modify my utility function to rely on the world actually getting saved.)

Discussion: What other hypotheticals are useful?

The right kind of fun?

4 Dorikka 16 April 2011 11:20PM

If you consider that the utility generated by working is much greater than the utility directly generated by having fun, then the main thing that you're going to be optimizing when you have fun is how much the motivation from the memory of that fun increases your working capabilities. This is distinctly different from optimizing for the direct preference fulfillment generated by the fun, even if the same activities are optimal for both utility functions.

The same model works for any action A such that the utility generated by the effect of that action on another action is much greater than the utility generated by the action itself. This probably applies to most maintenance actions, such as doing laundry, sleeping, eating, but this is more obvious to us -- we usually don't see laundry as an end unto itself, but we often do pursue fun for its own sake. I'm not advocating that we shouldn't have fun, but that we (or at least I) seem to be optimizing for the wrong thing -- direct preference fulfillment, rather than motivation.

This feels like a significant insight, but I tend to get a significant number of false positives. Any ideas on how we might use this?

Sublimity vs. Youtube

23 Alicorn 18 March 2011 05:33AM

The torture vs. dust specks quandary is a canonical one to LW.  Off the top of my head, I can't remember anyone suggesting the reversal, one where the arguments taken by the hypothetical are positive and not negative.  I'm curious about how it affects people's intuitions.  I call it - as the title indicates - "Sublimity vs. Youtube1".

Suppose the impending existence of some person who is going to live to be fifty years old whatever you do2.  She is liable to live a life that zeroes out on a utility scale: mediocre ups and less than shattering downs, overall an unremarkable span.  But if you choose "sublimity", she's instead going to live a life that is truly sublime.  She will have a warm and happy childhood enriched by loving relationships, full of learning and wonder and growth; she will mature into a merrily successful adult, pursuing meaningful projects and having varied, challenging fun.  (For the sake of argument, suppose that the ripple effects of her sublime life as it affects others still lead to the math tallying up as +(1 sublime life), instead of +(1 sublime life)+(various lovely consequences).)

Or you can choose "Youtube", and 3^^^3 people who weren't doing much with some one-second period of their lives instead get to spend that second watching a brief, grainy, yet droll recording of a cat jumping into a box, which they find mildly entertaining.

Sublimity or Youtube?

1The choice in my variant scenario of "watching a Youtube video" rather than some small-but-romanticized pleasure ("having a butterfly land on your finger, then fly away", for instance) is deliberate.  Dust specks are really tiny, and there's not much automatic tendency to emotionally inflate them.  Hopefully Youtube videos are the reverse of that.

2I'm choosing to make it an alteration of a person who will exist either way to avoid questions about the utility of creating people, and for greater isomorphism with the "torture" option in the original.

Is GiveWell.org the best charity (excluding SIAI)?

37 syllogism 26 February 2011 01:37PM

Update: I should've said "non-existential risk charity", rather than specifically exclude SIAI. I'm having trouble articulating why I don't want to give to an existential risk charity, so I'm going to think more deeply about it. This post is close to my source of discomfort, which is about the many highly uncertain assumptions necessary to motivate existential risk reduction. However, I couldn't articulate this argument properly before, so it might not be the true source of my discomfort. I'll keep thinking.


I received my first pay-cheque from my first job after getting my degree, so it's time to start tithing. So I've been evaluating which charity to donate to. I'd like to support the SIAI but I'm not currently convinced it's the best-value charity in a dollars-per-life sense, once time-value of money discounting is applied. I'd like to discuss the best non-SIAI charity available.

By far the best source of information I've found is www.givewell.org. It was started by two hedge fund managers who were struck by the absence of rational charity evaluations, so decided that this was the most pressing problem they could work on.

Perhaps the clearest, deepest finding from the studies they pull together and discuss is that charity is hard. Spending money doesn't automatically translate to doing good. It's not even enough to have smart people who care and know a lot about the problem think of ideas, and then spend money doing them. There's still a good chance the idea won't work. So we need to be evaluating programs rigorously before we scale them up, and keep evaluating as we scale.

The bad news is that this isn't how charity is usually done. Very few charities make convincing evaluations of their activities public, if they carry them out at all. The good news is that some of the programs that have been evaluated are very, very effective. So choosing a charity rationally is absolutely critical.

Let's say you're interested specifically in HIV/AIDS relief.[1] You could fund a program that mainly distributes Anti-Retroviral Therapy to HIV/AIDS patients, which has been estimated conservatively to cost $1494 per disability adjusted life-year (DALY). Alternatively, you could fund a condom distribution program, which has been estimated conservatively to cost $112 per DALY. Or, you could fund a program to prevent mother-to-child transmission, which has been estimated conservatively to cost $12 per DALY. So even within HIV/AIDS, funding the right program can make your donation two orders of magnitude more effective. By tithing 10% of my income every year for the next thirty years, I could have a bigger impact than a $25 million donation, if the person who placed that donation only did an okay job of choosing a charity. 

GiveWell currently gives its top recommendation to VillageReach, a charity that seeks to improve logistics for vaccine delivery to remote communities. The evidence is less cut-and-dried than you'd ideally want, but it's still compelling. They took vaccine rates up to 95%, and had very low stock-out rates for vaccines during the 4 year pilot project in Mozambique. They're estimated to have spent about $200 USD per life saved. Even if future projects are two or three times less efficient, you're still saving a life for $600. Think about how little money that is. If you tithe, you can probably expect to save 10 lives a year. That's massive.
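The arithmetic in the last two paragraphs can be checked with a short sketch. The dollar figures are the ones quoted above; the assumption that the hypothetical $25 million donor funds the most expensive HIV/AIDS program is only one reading of "an okay job of choosing a charity", and the variable names are made up for illustration.

```python
# Conservative cost estimates quoted above, in USD per DALY averted.
cost_per_daly = {
    "antiretroviral therapy": 1494,
    "condom distribution": 112,
    "mother-to-child prevention": 12,
}

# How many times more each program costs per DALY than the cheapest one.
best = min(cost_per_daly.values())
ratio_vs_best = {name: cost / best for name, cost in cost_per_daly.items()}

def dalys_averted(dollars, usd_per_daly):
    """DALYs averted by a donation at a given cost per DALY."""
    return dollars / usd_per_daly

# Assumption: the $25 million donation goes to antiretroviral therapy.
big_donation_dalys = dalys_averted(25_000_000, cost_per_daly["antiretroviral therapy"])

# Annual donation over thirty years that averts the same number of DALYs
# when spent on the cheapest program instead.
matching_annual_tithe = big_donation_dalys * best / 30

# VillageReach, pessimistic figure from the text: 600 USD per life saved.
tithe_for_ten_lives = 10 * 600
```

Under these assumptions the most expensive program costs 124.5 times as much per DALY as the cheapest, a tithe of roughly $6,700 a year for thirty years matches the $25 million donation, and ten lives a year corresponds to a $6,000 annual tithe.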

Instead of donating directly to VillageReach, I'm going to just donate to GiveWell. They pool the funds they get and distribute them to their top charities, and I trust their analytic, evidence-based, largely utilitarian approach. Mostly, however, I think the work they're doing gathering and distributing information about charities is critically important. If more charities actually competed on evidence of efficacy, the whole endeavour might be a lot different. Does anyone have any better suggestions?

[1] I don't understand why people would want to help sufferers of one disease or condition specifically, instead of picking the lowest-hanging fruit, but apparently they do.

Revisiting the anthropic trilemma III: solutions and interpretations

2 Stuart_Armstrong 17 February 2011 03:14PM

In previous posts, I revisited Eliezer's anthropic trilemma, approaching it with ata's perspective that the decisions made are the objects of fundamental interest, not the probabilities or processes that gave rise to them. I initially applied my naive intuitions to the problem, and got nonsense. I then constructed a small collection of reasonable-seeming assumptions, and showed they defined a single method of spreading utility functions across copies.

This post will apply that method to the anthropic trilemma, and thus give us the "right" decisions to make. I'll then try and interpret these decisions, and see what they tell us about subjective anticipation, probabilities and the impact of decisions. As in the original post, I will be using the chocolate bar as the unit of indexical utility, as it is a well known fact that everyone's utility is linear in chocolate.

The details of the lottery winning setup can be found either here or here. The decisions I must make are:

Would I give up a chocolate bar now for two to be given to one of the copies if I win the lottery? No, this loses me one utility and gains me only 2/million.

Would I give up a chocolate bar now for two to be given to every copy if I win the lottery? Yes, this loses me one utility and gains me 2*trillion/million = 2 million.

Would I give up one chocolate bar now, for two chocolate bars to the future merged me if I win the lottery? No, this gives me an expected utility of -1+2/million.

Now let it be after the lottery draw, after the possible duplication, but before I know whether I've won the lottery or not. Would I give up one chocolate bar now in exchange for two for me, if I had won the lottery (assume this deal is offered to everyone)? The SIA odds say that I should; I have an expected gain of 1999/1001 ≈ 2.

Now assume that I have been told I've won the lottery, so I'm one of the trillion duplicates. Would I give up a chocolate bar for the future merged copy having two? Yes, I would, the utility gain is 2-1=1.
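The three pre-draw decisions above can be reproduced with exact fractions. This is only a sketch of the arithmetic as stated in the answers (bars received across all copies are added up and weighted by the chance of winning); it does not implement the method derived in the earlier post, and the names are hypothetical.

```python
from fractions import Fraction

P_WIN = Fraction(1, 10**6)  # million-to-one lottery
COPIES = 10**12             # a trillion copies are made if I win
COST = 1                    # one chocolate bar given up now

def net_gain(bars_received_if_win):
    """Expected bars gained, minus the bar paid up front."""
    return P_WIN * bars_received_if_win - COST

one_copy = net_gain(2)             # two bars to a single copy
every_copy = net_gain(2 * COPIES)  # two bars to each of the trillion copies
merged = net_gain(2)               # two bars to the future merged me

decisions = {
    "one copy": one_copy > 0,
    "every copy": every_copy > 0,
    "merged": merged > 0,
}
```

Only the "every copy" deal comes out positive, matching the No / Yes / No answers above.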

So those are the decisions; how to interpret them? There are several ways of doing this. There are four things to keep in mind: probability, decision impact, utility function, and subjective anticipation.

continue reading »

Sleeping anti-beauty and the presumptuous philosopher

1 Stuart_Armstrong 17 February 2011 02:59PM

My approach for dividing utility between copies gives the usual and expected solutions to the sleeping beauty problem: if all copies are offered bets, take 1/3 odds; if only one copy is offered bets, take 1/2 odds.

This makes sense, because my approach is analogous to "some future version of Sleeping Beauty gets to keep all the profits".
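A minimal sketch of that "pooled profits" reading, assuming a fair coin, one copy on heads and two on tails, and a contract that pays 1 on heads. "Odds" is read here as the break-even price of that contract; these conventions are assumptions for illustration.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def pooled_profit(price, copies_betting_if_tails):
    """Expected total profit when every offered copy buys the contract."""
    heads = HALF * (1 - price)                         # the single copy wins
    tails = HALF * copies_betting_if_tails * (-price)  # each betting copy loses its stake
    return heads + tails

def break_even_price(copies_betting_if_tails):
    """Price p solving 1/2*(1 - p) - 1/2*n*p = 0, i.e. p = 1/(1 + n)."""
    return Fraction(1, 1 + copies_betting_if_tails)

all_copies_offered = break_even_price(2)  # both tails-copies are offered the bet
one_copy_offered = break_even_price(1)    # only one copy is offered the bet
```

With both copies betting the break-even price is 1/3, and with a single bettor it is 1/2.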

The presumptuous philosopher problem is subtly different from the sleeping beauty problem. It can best be phrased as a sleeping beauty problem where each copy doesn't care for any other copy. Solving this is a bit more subtle, but a useful half-way point is the "Sleeping Anti-Beauty" problem.

Here, as before, one or two copies are created depending on the result of a coin flip. However, if two copies are created, they are the reverse of mutually altruistic: they derive disutility from the other copy achieving its utility. So if both copies receive $1, neither of their utilities increases: they are happy to have the cash, but angry the other copy also has cash.

Apart from this difference in indexical utility, the two copies are identical, and will reach the same decision. Now, as before, every copy is approached with bets on whether they are in the large universe (with two copies) or the small one (with a single copy). Using standard UDT/TDT Newcomb-problem type reasoning, they will always take the small universe side in any bet (as any gain/loss in the large universe is compensated for by the same gain/loss for the other copy they dislike).

Now, you could model the presumptuous philosopher by saying they have 50% chance of being in a Sleeping-Beauty (SB) situation and 50% of being in a Sleeping Anti-Beauty (SAB) situation (indifference modelled as half way between altruism and hate).

There are 4 equally likely possibilities here: small universe in SB, large universe in SB, small universe in SAB, large universe in SAB. A contract that gives $1 in a small universe is worth 0.25 + 0 + 0.25 + 0 = $0.5, while a contract that gives $1 in a large universe is worth 0 + 0.25*2 + 0 + 0 = $0.5 (as long as it's offered to everyone). So it seems that a presumptuous philosopher should take even odds on the size of the universe if he doesn't care about the other presumptuous philosophers.
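The two sums above can be written out as a small sketch. The per-dollar values (1 for a lone copy, 2 for two mutually altruistic copies, 0 for two copies whose gains cancel) are taken from the setup above; the function names are hypothetical.

```python
from fractions import Fraction

QUARTER = Fraction(1, 4)  # each of the four possibilities is equally likely

def value_per_dollar(situation, universe):
    """Value to the decider of $1 paid to every copy in that world."""
    if universe == "small":
        return 1   # a single copy simply keeps the dollar
    if situation == "SB":
        return 2   # two mutually altruistic copies both count
    return 0       # SAB: my gain is cancelled by the other copy's gain

def contract_value(pays_in):
    """Expected value of a contract paying $1 in the given universe size."""
    total = Fraction(0)
    for situation in ("SB", "SAB"):
        for universe in ("small", "large"):
            if universe == pays_in:
                total += QUARTER * value_per_dollar(situation, universe)
    return total

small_contract = contract_value("small")
large_contract = contract_value("large")
```

Both contracts come out at one half, which is the even-odds conclusion.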

It's no coincidence this result can be reached by UDT-like arguments such as "take the objective probabilities of the universes, and consider the total impact of your decision being X, including all other decisions that must be the same as yours". I'm hoping to find more fundamental reasons to justify this approach soon.

Revisiting the anthropic trilemma I: intuitions and contradictions

0 Stuart_Armstrong 15 February 2011 11:18AM

tl;dr: in which I apply intuition to the anthropic trilemma, and it all goes horribly, horribly wrong

Some time ago, Eliezer constructed an anthropic trilemma, where standard theories of anthropic reasoning seemed to come into conflict with subjective anticipation. rwallace subsequently argued that subjective anticipation was not ontologically fundamental, so we should not expect it to work out of the narrow confines of everyday experience, and Wei illustrated some of the difficulties inherent in "copy-delete-merge" types of reasoning.

Wei also made the point that UDT shifts the difficulty in anthropic reasoning away from probability and onto the utility function, and ata argued that neither the probabilities nor the utility function are fundamental, that it was the decisions that resulted from them that were important - after all, if two theories give the same behaviour in all cases, what grounds do we have for distinguishing them? I then noted that this argument could be extended to subjective anticipation: instead of talking about feelings of subjective anticipation, we could replace it by questions such as "would I give up a chocolate bar now for one of my copies to have two in these circumstances?"

In this post, I'll start by applying my intuitive utility/probability theory to the trilemma, to see what I would decide in these circumstances, and the problems that can result. I'll be sticking with classical situations rather than quantum, for simplicity.

So assume a (classical) lottery where I have a ticket with million-to-one odds. The trilemma presented a lottery winning trick: set up the environment so that if ever I did win the lottery, a trillion copies of me would be created, they would experience winning the lottery, and then they would be merged/deleted down to one copy again.

So that's the problem; what's my intuition got to say about it? Now, my intuition claims there is a clear difference between my personal and my altruistic utility. Whether this is true doesn't matter, I'm just seeing whether my intuitions can be captured. I'll call the first my indexical utility ("I want chocolate bars") and the second my non-indexical utility ("I want everyone hungry to have a good meal"). I'll be neglecting the non-indexical utility, as it is not relevant to subjective anticipation.

Now, my intuitions tell me that SIA is the correct anthropic probability theory. They also tell me that having a hundred copies in the future all doing exactly the same thing is equivalent to having just one: therefore my current utility means I want to maximise the average utility of my future copies.

If I am a copy, then my intuitions tell me I want to selfishly maximise my own personal utility, even at the expense of my copies. However, if I were to be deleted, I would transfer my "interest" to my remaining copies. Hence my utility as a copy is my own personal utility, if I'm still alive in this universe, and the average of the remaining copies, if I'm not. This also means that if everyone is about to be deleted/merged, then I care about the single remaining copy that will come out of it, equally with myself.

Now I've set up my utility and probability; so what happens to my subjective anticipation in the anthropic trilemma? I'll use the chocolate bar as a unit of utility - because, as everyone knows, everybody's utility is linear in chocolate; this is just a fundamental fact about the universe.

First of all, would I give up a chocolate bar now for two to be given to one of the copies if I win the lottery? Certainly not, this loses me 1 utility and only gives me 2/million trillion in return. Would I give up a bar now for two to be given to every copy if I win the lottery? No, this loses me 1 utility and only gives me 2/million in return.

So I certainly do not anticipate winning the lottery through this trick.

Would I give up one chocolate bar now, for two chocolate bars to the future merged me if I win the lottery? No, this gives me an expected utility of -1+2/million, same as above.

So I do not anticipate having won the lottery through this trick, after merging.
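The numbers behind these three refusals follow from the averaging rule stated earlier (my current utility is the average utility of my future copies). A sketch with exact fractions, using hypothetical names:

```python
from fractions import Fraction

P_WIN = Fraction(1, 10**6)  # million-to-one lottery
COPIES = 10**12             # a trillion copies exist if I win

def expected_return(total_bars_across_copies):
    """Average bars per copy, weighted by the chance of winning."""
    return P_WIN * Fraction(total_bars_across_copies, COPIES)

one_copy = expected_return(2)             # two bars to a single copy
every_copy = expected_return(2 * COPIES)  # two bars to each copy
merged = P_WIN * 2                        # two bars to the one merged copy

# Each deal costs one bar up front, so the net expected gains are:
net = {name: value - 1 for name, value in
       {"one copy": one_copy, "every copy": every_copy, "merged": merged}.items()}
```

All three net gains are negative, so under the averaging rule none of the pre-draw deals is worth a chocolate bar.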

Now let it be after the lottery draw, after the possible duplication, but before I know whether I've won the lottery or not. Would I give up one chocolate bar now in exchange for two for me, if I had won the lottery (assume this deal is offered to everyone)? The SIA odds say that I should; I have an expected gain of 1999/1001 ≈ 2.

So once the duplication has happened, I anticipate having won the lottery. This causes a preference reversal, as my previous version would pay to have my copies denied that choice.

Now assume that I have been told I've won the lottery, so I'm one of the trillion duplicates. Would I give up a chocolate bar for the future merged copy having two? Yes, I would, the utility gain is 2-1=1.

So once I've won the lottery, I anticipate continuing having won the lottery.

So, to put all these together:

  • I do not anticipate winning the lottery through this trick.
  • I do not anticipate having won the lottery once the trick is over.
  • However, in the middle of the trick, I anticipate having won the lottery.
  • This causes a money-pumpable preference reversal.
  • And once I've won the lottery, I anticipate continuing to have won the lottery once the trick is over.

Now, some might argue that there are subtle considerations that make my behaviour the right one, despite the seeming contradictions. I'd rather say - especially seeing the money-pump - that my intuitions are wrong, very wrong, terminally wrong, just as non-utilitarian decision theories are.

However, what I started with was a perfectly respectable utility function. So we will need to add other considerations if we want to get an improved consistent system. Tomorrow, I'll be looking at some of the axioms and assumptions one could use to get one.

Subjective anticipation as a decision process

3 Stuart_Armstrong 08 February 2011 11:07AM

As argued here, debates about probability can be profitably replaced with decision problems. This often dissolves the debate - there is far more agreement as to what decision sleeping beauty should take than on what probabilities she should use.

The concepts of subjective anticipation or subjective probabilities, which cause such difficulty here, can, I argue, be similarly replaced by a simple decision problem.

If you are going to be copied, uncopied, merged, killed, propagated through quantum branches, have your brain tasered with amnesia pills while your parents are busy flipping coins before deciding to reproduce, and are hence unsure as to whether you should subjectively anticipate being you at a certain point, the relevant question should not be whether you feel vaguely connected to the putative future you in some ethereal sense.

Instead the question should be akin to: how many chocolate bars would your putative future self have to be offered, for you to forgo one now? What is the tradeoff between your utilities?

Now, altruism is of course a problem for this approach: you might just be very generous with copy #17 down the hallway, he's a thoroughly decent chap and all that, rather than anticipating being him. But humans can generally distinguish between selfish and altruistic decisions, and the setup can be tweaked to encourage the maximum urges towards winning, rather than letting others win. For me, a competitive game with chocolate as the reward would do the trick...

Unlike for the sleeping beauty problem, this rephrasing does not instantly solve the problems, but it does locate them: subjective anticipation is encoded in the utility function. Indeed, I'd argue that subjective anticipation is the same problem as indexical utility, with a temporal twist thrown in.
