
Continuity axiom of vNM

3 Stuart_Armstrong 30 July 2014 04:27PM

In a previous post, I left a somewhat cryptic comment on the continuity/Archimedean axiom of vNM expected utility.

  • (Continuity/Archimedean) This axiom (and acceptable weaker versions of it) is much more subtle than it seems; "No choice is infinitely important" is what it seems to say, but " 'I could have been a contender' isn't good enough" is closer to what it does. Anyway, that's a discussion for another time.

Here I'll explain briefly what I mean by it. Let's drop that axiom, and see what could happen. First of all, we could have a utility function with non-standard real values. This allows some things to be infinitely more important than others. A simple illustration is lexicographical ordering; e.g. my utility function consists of the amount of euros I end up owning, with the amount of sex I get serving as a tie-breaker.
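As a minimal sketch of such lexicographic preferences (the numbers are made up, and Python's built-in tuple ordering stands in for the non-standard reals):

```python
# Lexicographic preferences: euros dominate, sex only breaks ties.
# Python compares tuples left-to-right, which mimics the ordering.

def lex_utility(outcome):
    """Sortable key for an (euros, sex) outcome: euros first, sex as tie-breaker."""
    euros, sex = outcome
    return (euros, sex)

outcomes = [(100, 0), (100, 5), (99, 1000)]
print(max(outcomes, key=lex_utility))  # (100, 5): one euro outweighs any amount of sex
```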

There is nothing wrong with such a function! First, because in practice it functions as a standard utility function (I'm unlikely to be able to indulge in sex in a way that has absolutely no costs or opportunity costs, so the amount of euros will always predominate). Secondly, because even if it does make a difference... it's still expected utility maximisation, just a non-standard version.

But worse things can happen if you drop the axiom. Consider this decision criterion: I will act so that, at some point, there will have been a chance of me becoming heavy-weight champion of the world. This is compatible with all the other vNM axioms, but is obviously not what we want as a decision criterion. In the real world, such a criterion is vacuous (there is a non-zero chance of me becoming heavyweight champion of the world right now), but it certainly could apply in many toy models.

That's why I said that the continuity axiom is protecting us from "I could have been a contender (and that's all that matters)" type reasoning, not so much from "some things are infinitely important (compared to others)".

Also notice that the quantum many-worlds version of the above decision criterion - "I will act so that the measure of type X universe is non-zero" - does not sound quite as stupid, especially if you bring in anthropics.

An extended class of utility functions

3 Stuart_Armstrong 17 June 2014 04:36PM

This is a technical result that I wanted to check before writing up a major piece on value loading.

The purpose of a utility function is to give an agent criteria with which to make a decision. If two utility functions always give the same decisions, they're generally considered the same utility function. So, for instance, the utility function u always gives the same decisions as u+C for some constant C, or Du for some positive constant D. Thus we can say that utility functions are equivalent if they are related by a positive affine transformation.
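As a quick toy check of this equivalence (the actions and utility values here are invented), u and Du + C always select the same option:

```python
# Positive affine transformations preserve the ordering of expected utilities,
# so u and D*u + C always pick the same action (toy values, D > 0).
D, C = 3.0, -7.0

def u(action):
    return {"a": 1.0, "b": 2.5, "c": 0.3}[action]

def u_affine(action):
    return D * u(action) + C

actions = ["a", "b", "c"]
assert max(actions, key=u) == max(actions, key=u_affine) == "b"
```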

For specific utility functions, and specific agents, the class of functions that give the same decisions is quite a bit larger. For instance, imagine that v is a utility function with the property v("any universe which contains humans") = constant. Then any human who attempts to follow u could equivalently follow u+v (neglecting acausal trade) - it makes no difference. In general, if no action the agent could ever take would change the value of v, then u and u+v give the same decisions.

More subtly, if the agent can change v but cannot change the expectation of v, then u and u+v still give the same decisions. This is because for any actions a and b the agent could take:

E(u+v | a) = E(u | a) + E(v | a) = E(u | a) + E(v | b).

Hence E(u+v | a) > E(u+v | b) if and only if E(u | a) > E(u | b), and so the decision hasn't changed.

Note that E(v | a) need not be constant for all actions: simply that for every pair of actions a and b that an agent could take at a particular decision point, E(v | a) = E(v | b). It's perfectly possible for the expectation of v to be different at different moments, or conditional on different decisions made at different times.
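Here is a toy numerical check (all distributions and values invented for illustration): v varies across outcomes, but its expectation is the same under both available actions, so u and u+v rank the actions identically.

```python
# Two actions, three outcomes; v differs by outcome but E(v|a) = E(v|b) = 4.
dist = {  # action -> list of (probability, outcome)
    "a": [(0.5, "x"), (0.5, "y")],
    "b": [(1.0, "z")],
}
u = {"x": 10.0, "y": 0.0, "z": 4.0}
v = {"x": 2.0, "y": 6.0, "z": 4.0}

def expect(f, action):
    return sum(p * f[o] for p, o in dist[action])

assert expect(v, "a") == expect(v, "b") == 4.0  # the key property

u_plus_v = {o: u[o] + v[o] for o in u}
best_u = max(dist, key=lambda a: expect(u, a))
best_uv = max(dist, key=lambda a: expect(u_plus_v, a))
assert best_u == best_uv == "a"  # E(u|a)=5 > E(u|b)=4 and E(u+v|a)=9 > E(u+v|b)=8
```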

Finally, as long as v obeys the above properties, there is no reason for it to be a utility function in the classical sense - it could be constructed any way we want.

 

An example: suffer not from probability, nor benefit from it

The preceding seems rather abstract, but here is the motivating example. It's a correction term T that adds or subtracts utility as external evidence comes in (it's important that the evidence is external - the agent gets no correction from knowing what its own actions are/were). If the AI knows evidence e, and new (external) evidence f comes in, then its utility gets adjusted by T(e,f), which is defined as

T(e,f) = E(u | e) - E(u | e, f)

In other words, the agent's utility gets adjusted by the difference between the old expected utility and the new - and hence the agent's expected utility is unchanged by new external evidence.

Consider for instance an agent with a utility u linear in money. It must choose between a bet that goes 50-50 on $0 (heads) or $100 (tails), versus a sure $49. It correctly chooses the bet, which has an expected utility of $50 - in other words, E(u | bet) = $50. But now imagine that the coin comes out heads. The utility u plunges to $0 (in other words, E(u | bet, heads) = $0). But the correction term cancels that out:

u(bet, heads) + T(bet, heads) = $0 + E(u | bet) - E(u | bet, heads) = $0 + $50 - $0 = $50.

A similar effect leaves utility unchanged if the coin is tails, cancelling the increase. In other words, adding the T correction term removes the impact of stochastic effects on utility.
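Here's the bet as a minimal sketch, with the relevant expectations hard-coded from the calculation above; u + T comes out at $50 whichever way the coin lands.

```python
# Expected utilities for the toy bet, computed by hand above.
E_u = {"bet": 50.0, ("bet", "heads"): 0.0, ("bet", "tails"): 100.0}

def T(e, f):
    """Correction term: old expected utility minus new expected utility."""
    return E_u[e] - E_u[(e, f)]

for coin, winnings in [("heads", 0.0), ("tails", 100.0)]:
    print(coin, winnings + T("bet", coin))  # 50.0 both times
```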

But the agent will still make the same decisions. This is because before seeing evidence f, it cannot predict f's impact on E(u). In other words, summing over all possible pieces of evidence f:

E(u | e) = Σ p(f)E(u | e, f),

which is another way of phrasing "conservation of expected evidence". This implies that

E(T(e,-)) = Σ p(f) T(e,f)

= Σ p(f) (E(u | e) - E(u | e, f))

= E(u | e) - Σ p(f) E(u | e, f)

= 0,

and hence that adding the T term does not change the agent's decisions. All the various corrections add on to the utility as the agent continues making decisions, but none of them make the agent change what it does.
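A quick numerical check of that sum for the coin bet above (expectations hard-coded again):

```python
# Conservation of expected evidence: weighted over possible evidence f,
# the correction term T averages out to zero.
p = {"heads": 0.5, "tails": 0.5}
E_u = {"bet": 50.0, ("bet", "heads"): 0.0, ("bet", "tails"): 100.0}

expected_T = sum(p[f] * (E_u["bet"] - E_u[("bet", f)]) for f in p)
assert expected_T == 0.0  # 0.5*(50 - 0) + 0.5*(50 - 100) = 0
```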

The relevance of this will be explained in a subsequent post (unless someone finds an error here).

On the fragility of values

4 Stuart_Armstrong 04 November 2011 06:15PM

Programming human values into an AI is often taken to be very hard because values are complex (no argument there) and fragile. I would agree that values are fragile in their construction; anything lost in the definition might doom us all. But once coded into a utility function, they are reasonably robust.

As a toy model, let's say the friendly utility function U has a hundred valuable components - friendship, love, autonomy, etc... - assumed to have positive numeric values. Then to ensure that we don't lose any of these, U is defined as the minimum of all those hundred components.

Now define V as U, except we forgot the autonomy term. This will result in a terrible world, without autonomy or independence, and there will be wailing and gnashing of teeth (or there would, except the AI won't let us do that). Values are indeed fragile in the definition.
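As a toy sketch of that failure mode (three components standing in for the hundred, worlds and numbers invented): U rejects the autonomy-free world, while V happily picks it.

```python
# U takes the minimum over all components; V forgets the autonomy term.
worlds = [
    {"friendship": 9, "love": 9, "autonomy": 0},  # wailing and gnashing of teeth
    {"friendship": 5, "love": 5, "autonomy": 5},
]

def U(w):
    return min(w.values())

def V(w):
    return min(val for key, val in w.items() if key != "autonomy")

print(max(worlds, key=U))  # the balanced world: U = 5 beats U = 0
print(max(worlds, key=V))  # the autonomy-free world: V = 9 beats V = 5
```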


Satisficers want to become maximisers

21 Stuart_Armstrong 21 October 2011 04:27PM

(with thanks to Daniel Dewey, Owain Evans, Nick Bostrom, Toby Ord and BruceyB)

In theory, a satisficing agent has a lot to recommend it. Unlike a maximiser, which will attempt to squeeze every drop of utility out of the universe that it can, a satisficer will be content when it reaches a certain level of expected utility (a satisficer that is content with a certain level of utility is simply a maximiser with a bounded utility function). For instance, a satisficer with a utility linear in paperclips and a target level of 9 will be content once it's 90% sure that it's built ten paperclips, and will not try to optimize the universe either to build more paperclips (unbounded utility) or to obsessively count the ones it has already (bounded utility).
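In numbers (just restating the paperclip example):

```python
# 90% confidence in ten paperclips gives expected utility 0.9 * 10 = 9,
# which meets the target, so the satisficer stops optimising.
target = 9.0
expected_utility = 0.9 * 10  # utility linear in paperclips
print(expected_utility >= target)  # True
```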

Unfortunately, a self-improving satisficer has an extremely easy way to reach its satisficing goal: to transform itself into a maximiser. This is because, in general, if E denotes expectation,

E(U(there exists an agent A maximising U))  ≥  E(U(there exists an agent A satisficing U))

How is this true (apart from the special case when other agents penalise you specifically for being a maximiser)? Well, agent A will have to make decisions, and if it is a maximiser, it will always make the decision that maximises expected utility. If it is a satisficer, it will sometimes make a different decision, leading to lower expected utility in those cases.
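A minimal sketch of that comparison (options and values invented): whatever above-target option the satisficer settles on, the maximiser's pick is at least as good.

```python
import random

# Toy options with their expected utilities.
options = {"build_more": 12.0, "count_again": 9.5, "idle": 3.0}
target = 9.0

maximiser_value = max(options.values())
# A satisficer may settle on *any* option that clears the target:
satisficer_value = random.choice([v for v in options.values() if v >= target])

assert maximiser_value >= satisficer_value
```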

Hence if there were a satisficing agent for U, and it had some strategy S to accomplish its goal, then another way to accomplish this would be to transform itself into a maximising agent and let that agent implement S. If S is complicated, and transforming itself is simple (which would be the case for a self-improving agent), then self-transforming into a maximiser is the easier way to go.

So unless we have exceedingly well programmed criteria banning the satisficer from using any variant of this technique, we should assume satisficers are likely to be as dangerous as maximisers.

Edited to clarify the argument for why a maximiser maximises better than a satisficer.

Edit: See BruceyB's comment for an example where a (non-timeless) satisficer would find rewriting itself as a maximiser to be the only good strategy. Hence timeless satisficers would behave as maximisers anyway (in many situations). Furthermore, a timeless satisficer with bounded rationality may find that rewriting itself as a maximiser would be a useful precaution to take, if it's not sure to be able to precalculate all the correct strategies.