Qiaochu_Yuan12y310

Stanovich's paper on why humans are apparently worse at following the VNM axioms than some animals has some interesting things to say, although I don't like the way it says them. I quit halfway through the paper out of frustration, but what I got out of the paper (which may not be what the paper itself was trying to say) is more or less the following: humans model the world at different levels of complexity at different times, and at each of those levels different considerations come into play for making decisions. An agent behaving in this way can appear to be behaving VNM-irrationally when really it is just trying to efficiently use cognitive resources by not modeling the world at the maximum level of complexity all the time. Non-human animals may model the world at more similar levels of complexity over time, so they behave more VNM-rationally even if they have less overall optimization power than humans.

A related consideration, which is more about the methodology of studies claiming to measure human irrationality, is that the problem you think a test subject is solving is not necessarily the problem they're actually solving. I guess a well-known example is when you ask people to play the prisoner's dilemma but in their heads they're really playing the iterated prisoner's dilemma.

And another point: an agent can have a utility function and still behave VNM-irrationally if computing the VNM-rational thing to do given its utility function takes too much time, so the agent computes some approximation of it. It's a given in practical applications of Bayesian statistics that Bayesian inference is usually intractable, so it's necessary to compute some approximation to it, e.g. using Monte Carlo methods. The human brain may be doing something similar (a possibility explored in Lieder-Griffiths-Goodman, for example).
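As a toy illustration of the kind of approximation meant here (my own sketch, not taken from the comment or from Lieder-Griffiths-Goodman): the snippet below estimates an expected utility by sampling from a posterior instead of computing it exactly. The coin-betting setup, the uniform prior, the ±1 utility, and both function names are assumptions made up for the example. The problem is deliberately simple enough that the exact answer is known, so the Monte Carlo estimate can be checked against it.

```python
import random

def monte_carlo_expected_utility(heads, tails, n_samples=10_000, seed=0):
    """Estimate the expected utility of betting on heads by posterior sampling.

    Assumptions (all hypothetical): uniform prior on the coin's bias p, so the
    posterior is Beta(heads + 1, tails + 1); utility is +1 if the next flip is
    heads and -1 otherwise, so the expected utility given p is 2p - 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        p = rng.betavariate(heads + 1, tails + 1)  # one sample from the posterior
        total += 2 * p - 1                          # expected utility given this p
    return total / n_samples

def exact_expected_utility(heads, tails):
    """Closed form for the same toy problem: the posterior mean is (h+1)/(n+2)."""
    posterior_mean = (heads + 1) / (heads + tails + 2)
    return 2 * posterior_mean - 1

estimate = monte_carlo_expected_utility(7, 3)
exact = exact_expected_utility(7, 3)
print(f"Monte Carlo: {estimate:.3f}, exact: {exact:.3f}")
```

In a realistic problem there would be no closed form to compare against, and the sampler's error would be one source of apparent VNM-irrationality.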

(Which reminds me: we don't talk anywhere near enough about computational complexity on LW for my tastes. What's up with that? An agent can't do anything right if it can't compute what "right" means before the Sun explodes.)

6David_Gerard12y

I spent a large chunk of Sunday and Monday finally reading Death Note and came to appreciate how, to some people on LW, agents meticulously working out each other's "I know that you know that I know" and then acting so as to interact with their simulations of each other, including their simulations of simulating each other, can seem a reasonable thing to aspire to. Even if actual politicians and so forth seem to do it by intuition, i.e., much more in hardware.

We Don't Have a Utility Function

by [anonymous]

2nd Apr 2013

Related: Pinpointing Utility

If I ever say "my utility function", you could reasonably accuse me of cargo-cult rationality; trying to become more rational by superficially imitating the abstract rationalists we study makes about as much sense as building an air traffic control station out of grass to summon cargo planes.

There are two ways an agent could be said to have a utility function:

1. It could behave in accordance with the VNM axioms; always choosing in a sane and consistent manner, such that "there exists a U". The agent need not have an explicit representation of U.
2. It could have an explicit utility function that it tries to expected-maximize. The agent need not perfectly follow the VNM axioms all the time. (Real bounded decision systems will take shortcuts for efficiency and may not achieve perfect rationality, like how real floating point arithmetic isn't associative).
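The floating point aside is easy to check. A minimal sketch in Python, assuming standard IEEE 754 double-precision floats (the variable names are just for illustration): the same three numbers added with different groupings give different results, because each intermediate sum is rounded.

```python
# Floating-point addition is not associative: grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

The difference is tiny, which is the point of the analogy: a bounded system can be built around an exact ideal and still deviate from it slightly in practice.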

Neither of these is true of humans. Our behaviour and preferences are not consistent and sane enough to be VNM, and we are generally quite confused about what we even want, never mind having reduced it to a utility function. Nevertheless, you still see the occasional reference to "my utility function".

Sometimes "my" refers to "abstract me who has solved moral philosophy and/or become perfectly rational", which at least doesn't run afoul of the math, but is probably still wrong about the particulars of what such an abstract idealized self would actually want. But other times it's a more glaring error like using "utility function" as shorthand for "entire self-reflective moral system", which may not even be VNMish.

But this post isn't really about all the ways people misuse terminology, it's about where we're actually at on the whole problem for which a utility function might be the solution.

As above, I don't think any of us have a utility function in either sense; we are not VNM, and we haven't worked out what we want enough to make a convincing attempt at trying. Maybe someone out there has a utility function in the second sense, but I doubt that it actually represents what they would want.

Perhaps then we should speak of what we want in terms of "terminal values"? For example, I might say that it is a terminal value of mine that I should not murder, or that freedom from authority is good.

But what does "terminal value" mean? Usually, it means that the value of something is not contingent on or derived from other facts or situations, like for example, I may value beautiful things in a way that is not derived from what they get me. The recursive chain of valuableness terminates at some set of values.

There's another connotation, though, which is that your terminal values are akin to axioms; not subject to argument or evidence or derivation, and simply given, that there's no point in trying to reconcile them with people who don't share them. This is the meaning people are sometimes getting at when they explain failure to agree with someone as "terminal value differences" or "different set of moral axioms". This is completely reasonable, if and only if that is in fact the nature of the beliefs in question.

About two years ago, it very much felt like freedom from authority was a terminal value for me. Those hated authoritarians and fascists were simply wrong, probably due to some fundamental neurological fault that could not be reasoned with. The very prototype of "terminal value differences".

And yet here I am today, having been reasoned out of that "terminal value", such that I even appreciate a certain aesthetic in bowing to a strong leader.

If that was a terminal value, I'm afraid the term has lost much of its meaning to me. If it was not, if even the most fundamental-seeming moral feelings are subject to argument, I wonder if there is any coherent sense in which I could be said to have terminal values at all.

The situation here with "terminal values" is a lot like the situation with "beliefs" in other circles. Ask someone what they believe in most confidently, and they will take the opportunity to differentiate themselves from the opposing tribe on uncertain controversial issues; god exists, god does not exist, racial traits are genetic, race is a social construct. The pedantic answer, of course, is that the sky is probably blue, and that that box over there is about a meter long.

Likewise, ask someone for their terminal values, and they will take the opportunity to declare that those hated greens are utterly wrong on morality, and blueness is wired into their very core, rather than the obvious things like beauty and friendship being valuable, and paperclips not.

So besides not having a utility function, those aren't your terminal values. I'd be surprised if even the most pedantic answer weren't subject to argument; I don't seem to have anything like a stable and non-negotiable value system at all, and I don't think that I am even especially confused relative to the rest of you.

Instead of a nice consistent value system, we have a mess of intuitions and heuristics and beliefs that often contradict, fail to give an answer, and change with time and mood and memes. And that's all we have. One of the intuitions is that we want to fix this mess.

People have tried to do this "Moral Philosophy" thing before, myself included, but it hasn't generally turned out well. We've made all kinds of overconfident leaps to what turn out to be unjustified conclusions (utilitarianism, egoism, hedonism, etc), or just ended up wallowing in confused despair.

The zeroth step in solving a problem is to notice that we have a problem.

The problem here, in my humble opinion, is that we have no idea what we are doing when we try to do Moral Philosophy. We need to go up a meta-level and get a handle on Moral MetaPhilosophy. What's the problem? What are the relevant knowns? What are the unknowns? What's the solution process?

Ideally, we could do for Moral Philosophy approximately what Bayesian probability theory has done for Epistemology. My moral intuitions are a horrible mess, but so are my epistemic intuitions, and yet we more-or-less know what we are doing in epistemology. A problem like this has been solved before, and this one seems solvable too, if a bit harder.

It might be that when we figure this problem out to the point where we can be said to have a consistent moral system with real terminal values, we will end up with a utility function, but on the other hand, we might not. Either way, let's keep in mind that we are still on rather shaky ground, and at least refrain from believing the confident declarations of moral wisdom that we so like to make.

Moral Philosophy is an important problem, but the way is not clear yet.

Meta-Philosophy, Bounded Rationality, Utility Functions

[-]Qiaochu_Yuan12y310

[-]Vaniver12y140

(Which reminds me: we don't talk anywhere near enough about computational complexity on LW for my tastes. What's up with that? An agent can't do anything right if it can't compute what "right" means before the Sun explodes.)

I agree with this concern (and my professional life is primarily focused on heuristic optimization methods, where computational complexity is huge).

I suspect it doesn't get talked about much here because of the emphasis on intelligence explosion, missing AI insights, provably friendly, normative rationality, and there not being much to say. (The following are not positions I necessarily endorse.) An arbitrarily powerful intelligence might not care much about computational complexity (though it's obviously important if you still care about marginal benefit and marginal cost at that level of power). Until we understand what's necessary for AGI, the engineering details separating polynomial, exponential, and totally intractable algorithms might not be very important. It's really hard to prove how well heuristics do at optimization, let alone robustness. The Heuristics and Biases literature focuses on areas where it's easy to show humans aren't using the right math, rather than how best to think given the hardware you have, and some of that may be deeply embedded in the LW culture.

I think that there's a strong interest in prescriptive rationality, though, and if you have something to say on that topic or computational complexity, I'm interested in hearing it.

9[anonymous]12y

Right, this is an important point that could use more discussion. On closer inspection, a lot of the "irrationalities" are either rational in a higher-level game, or to be expected given the inability of people to "feel" abstract facts that they are told. That said, the inability to properly incorporate abstract information is quite a rationality problem.

2diegocaleiro12y

I've made this point quite a few times here and here

1Eugine_Nier12y

Depends; sometimes this is actually a decent way to avoid believing every piece of abstract information one is presented with.

6David_Gerard12y

5jooyous12y

Have you ever played that thumb game where you stand around in a circle with some people and at each turn show 0, 1 or 2 thumbs? And each person takes turns calling out a guess for the total number of thumbs that will be shown? Playing that game gives a really strong sense of "Aha! I modeled you correctly because I knew that you knew that I knew ..." but I never actually know if it's real modeling or hindsight bias because of the way the game is played in real time. Maybe there's a way to modify the rules to test that?

[-]TheOtherDave12y130

I once spent a very entertaining day with a friend wandering around art exhibits, with both of us doing a lot of "OK, you really like that and that and that and you hate that and that" prediction and subsequent correction.

One thing that quickly became clear was that I could make decent guesses about her judgments long before I could articulate the general rules I was applying to do so, which gave me a really strong sense of having modeled her really well.

One thing that became clear much more slowly was that the general rules I was applying, once I became able to articulate them, were not nearly as complex as they seemed to be when I was simply engaging with them as these ineffable chunks of knowledge.

I concluded from this that that strong ineffable sense of complex modeling is no more evidence of complex modeling than the similar strong ineffable sense of "being on someone's wavelength" is evidence of telepathy. It's just the way my brain feels when it's applying rules it can't articulate to predict the behavior of complex systems.

4TheOtherDave12y

This kind of explicit modelling is a recurring fictional trope. For example, Herbert uses it a lot in The Dosadi Experiment to show off how totes cognitively advanced the Dosadi are.

8David_Gerard12y

Yes, but aspiring to it as an achievable thing very much strikes me as swallowing fictional evidence whole. (And, around LW, manga and anime.)

4TheOtherDave12y

No argument; just citing prior fictional art. :-)

5[anonymous]12y

Yes. "(Real bounded decision systems will take shortcuts for efficiency and may not achieve perfect rationality, like how real floating point arithmetic isn't associative)." On one hand, a lot of this comes down to lacking a proper theory of logical uncertainty (I think). On the other hand, the usual solution is to step up a level and choose the best decision algorithm instead of trying to directly compute the best decision. Then you can step up again so as not to take forever at that. I don't know how to bottom this out. Related: A properly built AI need not do any explicit utility maximizing at all; it could all be built implicitly into hardcoded algorithms, the same way most algorithms have implicit probability distributions. Of course, one of the easiest ways to maximize expected utility is to explicitly do so, but I would still expect most code in an optimized AI to be implicitly maximizing.

[-]private_messaging12y210

What you need to estimate in order to maximize utility is not the utilities themselves but the sign of the difference in expected utilities. A "more accurate" estimate of utility on one side of the comparison can lead to a less accurate estimate of the sign of the difference. Which is what Pascal muggers use.

8[anonymous]12y

This is a very good point. I wonder what the implications are...

[-]private_messaging12y100

The main implication is that actions based on comparing the most complete available estimates of utility do not maximize utility. It is similar to evaluating sums: when evaluating 1-1/2+1/3-1/4 and so on (which converges to ln 2 ≈ 0.693), the partial sum 1+1/3+1/5+1/7 is more complete than 1 alone, in that you have processed more terms (and can pat yourself on the head for doing more arithmetic), but it is less accurate. In practice one obtains highly biased "estimates" from someone putting a lot more effort into finding terms of the sign that benefits them the most, and sometimes from some terms being easier to find.
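The sum in this comment can be checked numerically. A minimal sketch (the helper name and the choice of eight terms are mine, for illustration): it compares the first term alone, the selectively "more complete" sum of only the positive terms, and an unbiased prefix of the series against the true value ln 2.

```python
import math

def alternating_harmonic_terms(n):
    """First n terms of 1 - 1/2 + 1/3 - 1/4 + ..., which converges to ln 2."""
    return [(-1) ** (k + 1) / k for k in range(1, n + 1)]

true_value = math.log(2)
terms = alternating_harmonic_terms(8)

first_term_only = terms[0]                      # 1
positive_only = sum(t for t in terms if t > 0)  # 1 + 1/3 + 1/5 + 1/7
unbiased_prefix = sum(terms)                    # all eight terms, both signs

for label, value in [("first term only", first_term_only),
                     ("positive terms only", positive_only),
                     ("unbiased prefix", unbiased_prefix)]:
    print(f"{label:20s} value={value:.4f} error={abs(value - true_value):.4f}")
```

The positive-only sum uses four terms and is further from ln 2 than the single term 1, while the eight-term prefix that keeps both signs is closer than either.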

6[anonymous]12y

Yes, that is a problem. Are there other schemes that do a better job, though?

4private_messaging12y

In the above example, attempts to produce the most accurate estimate of the sum do a better job than attempts to produce the most complete sum. In general what you learn from applied mathematics is that plenty of methods that are in some abstract sense more distant from the perfect method have a result closer to the result of the perfect method. E.g. the perfect method could evaluate every possible argument, sum all of them, and then decide. The approximate method can evaluate a least biased sample of the arguments, sum them, and then decide, whereas the method that tries to match the perfect method the most would sum all available arguments. If you could convince an agent that the latter is 'most rational' (which may be intuitively appealing because it does resemble the perfect method the most) and is what should be done, then in a complex subject where the agent does not itself enumerate all arguments, you can feed arguments to that agent, biasing the sum, and extract profit of some kind.

3Jade12y

"Taken together the four experiments provide support for the Sampling Hypothesis, and the idea that there may be a rational explanation for the variability of children’s responses in domains like causal inference."

0Nornagest12y

That seems to be behind what I suspect is a paywall, except that the link I'd expect to solicit me for money is broken. Got a version that isn't?

7gwern12y

It's going through a university proxy, so it's just broken for you. Here's the paper: http://dl.dropboxusercontent.com/u/85192141/2013-denison.pdf

0Eugine_Nier12y

Notice the obvious implications to the ability of super-human AI's to behave VNM-rationally.

2private_messaging12y

Which are what? The AI that is managing some sort of upload society could trade its clock time for utility. It's no different from humans, where you can either waste your time pondering whether you're being rational about how jumpy you are when you see a moving shadow that looks sort of like a sabre-toothed tiger, or you can figure out how to tie a rock to a stick; in modern times, ponder what is a better deal at the store vs. try to invent something and make a lot of money.

0Eugine_Nier12y

It still has to deal with the external world.

3private_messaging12y

But the point is, its computing time costs utility, and so it can't waste it on things that will not gain it enough utility. If you consider a 2x1x1 cuboid to have a probability of 1/6 of landing on each side, you can still be VNM-rational about that: you won't be Dutch booked, but you'll lose money, because that cuboid is not a fair die and you'll accept losing bets. The real world is like that: it doesn't give cookies for non-Dutch-bookability, it gives cookies for correct predictions of what is actually going to happen.
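A rough sketch of the 2x1x1 example, under loudly stated assumptions: the "actual" landing frequencies below are invented purely for illustration (I don't know the real ones), and the face names, the bet, and the helper function are hypothetical. The agent's 1/6 beliefs form a coherent probability distribution, yet a bet that looks exactly fair to the agent loses money on average under the assumed true frequencies.

```python
from fractions import Fraction

# Faces of a 2x1x1 block: two small 1x1 ends and four large 2x1 sides.
faces = ["end_a", "end_b", "side_1", "side_2", "side_3", "side_4"]

# The agent's belief: a coherent distribution (sums to 1), but miscalibrated.
believed = {face: Fraction(1, 6) for face in faces}

# HYPOTHETICAL true frequencies, invented for illustration only.
actual = {"end_a": Fraction(1, 20), "end_b": Fraction(1, 20),
          "side_1": Fraction(9, 40), "side_2": Fraction(9, 40),
          "side_3": Fraction(9, 40), "side_4": Fraction(9, 40)}

def expected_profit(probabilities, face, stake, payout):
    """Expected profit of risking `stake` to win `payout` (net) if `face` comes up."""
    p = probabilities[face]
    return p * payout - (1 - p) * stake

# A bet on a small end at 5:1 looks exactly fair to the agent...
by_belief = expected_profit(believed, "end_a", stake=1, payout=5)
# ...but loses money on average under the assumed true frequencies.
in_reality = expected_profit(actual, "end_a", stake=1, payout=5)

print(by_belief, in_reality)
```

Coherence only rules out sets of bets that lose no matter what happens; it says nothing about whether the beliefs match how the block actually lands.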

[-]RogerS12y60

Confidence in moral judgments is never a sound criterion for them being "terminal", it seems to me.

To see why, consider that one's working values are unavoidably a function of two related things: one's picture of oneself, and of the social world. Thus, confident judgments are likely to reflect confidence in relevant parts of these pictures, rather than the shape of the function. To take your example, your adverse judgement of authority could have been a reflection of a confident picture of your ideal self as not being submissive, and of human soc... (read more)

1MugaSofer12y

Excellent suggestion. I would like to add "Nazi" to that list, and note that if you imagine doing something other than the historical results (in those cases where we know the historical result) you're doing this wrong. EDIT: reading this over, it sounds kinda sarcastic. Just want to clarify I'm being sincere here.

5RogerS12y

Yes indeed, it is a challenge to understand how the same human moral functionality "F" can result in a very different value system "M" to one's own, though I suspect a lot of historical reading would be necessary to fully understand the Nazis' construction of the social world - "S", in my shorthand. A contemporary example of the same challenge is the cultures that practice female genital mutilation. You don't have to agree with a construction of the world to begin to see how it results in the avowed values that emerge from it, but you do have to be able to picture it properly. In both cases, this challenge has to be distinguished from the somewhat easier task of explaining the origins of the value system concerned.

-1MugaSofer12y

Oh, I didn't mean it was particularly challenging - at least, as long as you avoid the antipattern of modelling them as Evil Monsters - just that it was a good exercise for this sort of thing. Indeed, I think most people can model the antisemitism (if not the philosophy and rhetorical/emotional power) by imagining society is being subverted by insidious alien Pod People. Another excellent point. I don't know much about FGM or the cultures that practice it, but it might easily be analogous to so-called "male genital mutilation" or circumcision.

[-]dspeyer12y60

Can you describe the process by which you changed your view on authority? I suspect that could be important.

[-][anonymous]12y100

Read lots, think lots, do lots.

More specifically, become convinced of consequentialism so that pragmatic concerns and exceptions can be handled in a principled way, realize that rule by Friendly AI would be acceptable, attempt to actually run a LW meetup and learn of the pragmatic effectiveness of central decision making, notice major inconsistencies in and highly suspect origin of my non-authoritarian beliefs, notice aesthetic value of leadership and following, etc.

7John_Maxwell12y

"Authority" isn't necessarily just one thing. For example, an all-powerful Friendly AI could choose to present itself in an extremely deferential way, and even conform exactly to its human users' wishes. Being a central decisionmaker, projecting high status, having impressive accomplishments, having others feel instinctively deferential to you, and having others actually act deferential to you are all distinct but frequently related. I think at least some of these are worrisome (link). If you increase the authority of a group's leader along all the dimensions of authority (which probably happens by default), I'd guess you get increased group coherence at the expense of decreased group rationality. You also run the risk of having the leader's preferences be satisfied at the expense of the group's preferences. In situations where it doesn't actually matter what you do much and it mostly just matters that everyone does it together in an orderly way, maybe this can be a good trade-off.

0Error12y

This is interesting. For some time, I've had my anti-authoritarianism (and anti-governmentism) sort of filed away in the back of my mind as "review this opinion when I think I can handle finding out I'm wrong." Sounds like you've been through the process already. How much of your change of heart would you attribute to explicit reasoning, aesthetics, and personal experience respectively?

4[anonymous]12y

Good question. I wouldn't say it breaks up so nicely. First of all, the aesthetic appreciation basically got uncovered when the big aversions went away. It was like, "ok authority can be practical a lot of the time, and oh, look, now that I'm not afraid of it, it's kind of beautiful after all." The personal experience (having been an anarchist, running a LW meetup, etc) mostly just provided a bit of extra verification fuel once anti-authoritarianism was being seriously questioned. The thing that actually got me to explicitly formulate the whole process was reading Moldbug. He pointed out some glitches in the matrix, so to speak. I don't know how to weight the importance of these, or what that would mean. Is there a more specific question you're interested in?

7Eugine_Nier12y

I think you're failing to distinguish between authority one voluntarily submits to (potentially even reserving the right to reverse the decision), e.g., meetup organizer, and authority backed by a monopoly on violence, i.e., the modern conception of government.

[-]MindTheLeap12y50

I hadn't come across the von Neumann-Morgenstern utility theorem before reading this post, thanks for drawing it to my attention.

Looking at Moral Philosophy through the lens of agents working with utility/value functions is an interesting exercise; it's something I'm still working on. In the long run, I think some deep thinking needs to be done about what we end up selecting as terminal values, and how we incorporate them into a utility function. (I hope that isn't stating something that is blindingly obvious.)

I guess where you might be headed is into Met... (read more)

2PrawnOfFate12y

The second sentence doesn't follow from the first. If rational agents converge on their values, that is objective enough. Analogy: one can accept that mathematical truth is objective (mathematicians will converge) without being a Platonist (mathematical truths have an existence separate from humans). I find that hard to follow. If the test is rationally justifiable, and leads to uniform results, how is that not objective? You seem to be using "objective" (having a truth value independent of individual humans) to mean what I would mean by "real" (having existence independent of humans).

0MindTheLeap12y

First of all, thanks for the comment. You have really motivated me to read and think about this more -- starting with getting clearer on the meanings of "objective", "subjective", and "intrinsic". I apologise for any confusion caused by my incorrect use of terminology. I guess that is why Eliezer likes to taboo words. I hope you don't mind me persisting in trying to explain my view and using those "taboo" words. Since I was talking about meta-ethical moral relativism, I hope that it was sufficiently clear that I was referring to moral values. What I meant by "objective values" was "objectively true moral values" or "objectively true intrinsic values". The second sentence was an explanation of the first: not logically derived from the first sentence, but a part of the argument. I'll try to construct my arguments more linearly in future. If I had to rephrase that passage I'd say: If there are no agents to value something, intrinsically or extrinsically, then there is also nothing to act on those values. In the absence of agents to act, values are effectively meaningless. Therefore, I'm not convinced that there is objective truth in intrinsic or moral values. However, the lack of meaningful values in the absence of agents hints at agents themselves being valuable. If value can only have meaning in the presence of an agent, then that agent probably has, at the very least, extrinsic/instrumental value. Even a paperclip maximiser would probably consider itself to have instrumental value, right? I think there is a difference between it being objectively true that, in certain circumstances, the values of rational agents converge, and it being objectively true that those values are moral. A rational agent can do really "bad" things if the beliefs and intrinsic values on which it is acting are "bad". Why else would anyone be scared of AI? I accept the possibility of objective truth values. I'm not convinced that it is objectively true that the convergence of subjective

-1PrawnOfFate12y

That's what I like to hear! But there is no need for morality in the absence of agents. When agents are there, values will be there, when agents are not there, the absence of values doesn't matter. I don't require their values to converge, I require them to accept the truths of certain claims. This happens in real life. People say "I don't like X, but I respect your right to do it". The first part says X is a disvalue, the second is an override coming from rationality.

0[anonymous]12y

I'm assuming a lot of background in this post that you don't seem to have. Have you read the sequences, specifically the metaethics stuff? Moral philosophy on LW is decades (at the usual philosophical pace) ahead of what you would learn elsewhere and a lot of the stuff you mentioned is considered solved or obsolete.

6wedrifid12y

Really? That's kind of scary if true. Moral philosophy on LW doesn't strike me as especially well developed (particularly compared to other rationality related subjects LW covers).

5[anonymous]12y

I don't believe anyone's really taken the metaethics sequence out for a test drive to see if it solves any nontrivial problems in moral philosophy.

2PrawnOfFate12y

It's worse than that. No-one even knows what the theory laid out is. EY says different things in different places.

-1wedrifid12y

If I recall correctly it struck me as an ok introduction to metaethics but it stopped before it got to the hard (i.e. interesting) stuff.

0[anonymous]12y

Moral philosophy is not well developed on LW, but I think it's further than it is elsewhere, and when I look at the pace of developments in philosophy, it looks like it will take decades for everyone else to catch up. Maybe I'm underestimating the quality of mainstream philosophy, though. All I know is that people who are interested in moral philosophy who haven't been exposed to LW are a lot more confused than those on LW. And that those on LW are more confused than they think they are (hence the OP).

2Wei Dai12y

What do you think represents the best moral philosophy that LW has to offer? Just a few months ago you seemed to be saying that we didn't need to study moral philosophy, but just try to maximize "awesomeness", which "You already know that you know how to compute". I find it confusing that this post doesn't mention that one at all. Have you changed your mind since then, if so why? Or are you clarifying your position, or something else?

2[anonymous]12y

The metaethics sequence sinks most of the standard confusions, though it doesn't offer actual conclusions or procedures. Complexity of value. Value being human specific. Morality as optimization target. Etc. Maybe it's just the epistemic quality around here though. LWers talking about morality are able to go much further without getting derailed than the best I've seen elsewhere, even if there weren't much good work on moral philosophy on LW. Right. This is a good question. For actually making decisions, use Awesomeness or something as your moral proxy, because it more or less just works. For those of us who want to go deeper and understand the theory of morality declaratively, the OP applies; we basically don't have any good theory. They are two sides of the same coin; the situation in moral philosophy is like the situation in physics a few hundred (progress subjective) years ago, and we need to recognize this before trying to build the house on sand, so to speak. So we are better off just using our current buggy procedural morality. I could have made the connection clearer I suppose. This post is actually a sort of precursor to some new and useful (I hope) work on the subject that I've written up but haven't gotten around to polishing and posting. I have maybe 5 posts worth of morality related stuff in the works, and then I'm getting out of this godforsaken dungeon.

2Wei Dai12y

Given that we don't have a good explicit theory of what morality really is, how do you know (and how could you confidently claim in that earlier post) that Awesomeness is a good moral proxy? I think I understand what you're saying now, thanks for the clarification. However, my current buggy procedural morality is not "maximize awesomeness" but more like an instinctive version of Bostrom and Ord's moral parliament.

2[anonymous]12y

It seems to fit with intuition. How exactly my intuitions are supposed to imply actual morality is an open question.

1PrawnOfFate12y

Could you nominate some confusions that are unsunk amongst professional philosophers (vis a vis your "decades ahead" claim).

0BerryPick612y

You don't tend to find much detailed academic discussion regarding metaethical philosophy on the blogosphere at all. Disclaimers: strictly comparing it to other subjects which I consider similar from an outside view, and supported only by personal experience and observation.

3PrawnOfFate12y

I have, and I found it unclear and inconclusive. A number of people have offered to explain it, and they all ended up bowing out, unable to do so. I find no evidence for that claim.

2MindTheLeap12y

Sorry, I have only read selections of the sequences, and not many of the posts on metaethics. Though as far as I've gotten, I'm not convinced that the sequences really solve, or make obsolete, many of the deeper problems of moral philosophy. The original post, and this one, seem to be running into the "is-ought" gap and moral relativism. Being unable to separate terminal values from biases is due to there being no truly objective terminal values. Despite Eliezer's objections, this is a fundamental problem for determining what terminal values or utility function we should use -- a task you and I are both interested in undertaking.

5TimS12y

I think this community vastly over-estimates its grip on meta-ethical concepts like moral realism or moral anti-realism. (E.g. the hopelessly confused discussion in this thread). I don't think the meta-ethics sequence resolves these sorts of basic issues.

2MindTheLeap12y

I'm still coming to terms with the philosophical definitions of different positions and their implications, and the Stanford Encyclopedia of Philosophy seems like a more rounded account of the different viewpoints than the meta-ethics sequences. I think I might be better off first spending my time continuing to read the SEP and trying to make my own decisions, and then reading the meta-ethics sequences with that understanding of the philosophical background. By the way, I can see your point that objections to moral anti-realism in this community may be somewhat motivated by the possibility that friendly AI becomes unprovable. As I understand it, any action can be "rational" if the value/utility function is arbitrary.

1Jabberslythe12y

There is a lot of diversity of opinion among philosophers, and while that may be true of the discipline as a whole, there is some good stuff to be found there. I'd recommend staying here for the most part rather than wading through philosophy elsewhere, though. Also, many moral philosophers may have very different moral sentiments from you, and that may make them seem like idiots more than they actually are. Different moral sentiments as to whether consequentialism is right at all, rather than just within consequentialism, among other things.

[-]Arkanj3l12y30

One day we're going to have to unpack "aesthetic" a bit. I think it's more than just 'oh it feels really nice and fun', but after we used it as applied to HPMOR and Atlas Shrugged - or parable fiction in general - I've been giving it a similar meaning as 'mindset' or 'way of viewing'. It's becoming less clear to me as to how to use the term.

I've been using it in justifications of reading (certain) fiction now, but I want to be careful that I'm not talking about something else, or something that doesn't exist, so my rationality can aim true.

0[anonymous]12y

What has aesthetics got to do with HPMOR and AS? Just taboo all the weird terms, and be specific. What are you reading and why?

[-]Eugine_Nier12y20

That's much more of a qualitative difference based on how much say you have.

[-]lukstafi12y20

Is there a reason you say "terminal value" rather than "intrinsic value"?

4Qiaochu_Yuan12y

It's the preferred local term. But if Wikipedia is to be believed, it's also a term used by mainstream philosophers, just less commonly.

2gjm12y

I'm not nyan_sandwich, but here's why I think those are different things and would use the former, not the latter, for what's being said here. An "intrinsic value" is intrinsic to the thing being valued; e.g., perhaps some beautiful things are beautiful in a way that's got nothing to do with the particular tastes human beings happen to have, and are just Beautiful In Themselves. A "terminal value" is terminal to the agent doing the valuing; e.g., perhaps my dislike of celery isn't reducible to any other preferences and principles I have, but just Is What It Is, and other value judgements I make build on my fundamental, irreducible, dislike of celery. There can be terminal values even if there aren't intrinsic values (e.g., maybe no value judgement is ever meaningful outside the context of a particular value system, but some value systems really do have things sufficiently axiom-like to be rightly called terminal values). There can be intrinsic values even if there aren't terminal values (e.g., maybe there is One True Value System but it doesn't have the sort of logical structure that would make some of its values terminal). nyan_sandwich is writing about the structure of human beings' value systems, and suggesting that they don't involve anything as axiom-like as terminal values. NS is not writing about the objective moral structure of the universe and suggesting that it doesn't involve ascribing intrinsic value to particular things. Therefore, the proposition NS is endorsing is "we don't have terminal values", rather than "things don't have intrinsic value". [EDITED to avoid formatting screwage from multiple underscores.]

[-]TimS12y20

About two years ago, it very much felt like freedom from authority was a terminal value for me. Those hated authoritarians and fascists were simply wrong, probably due to some fundamental neurological fault that could not be reasoned with. The very prototype of "terminal value differences".
And yet here I am today, having been reasoned out of that "terminal value", such that I even appreciate a certain aesthetic in bowing to a strong leader.

On what basis do you assert you were "reasoned out" of that position? For example, ... (read more)

0selylindi12y

TimS mentioned moral anti-realism as one possibility. I have a favorable opinion of desire utilitarianism (search for pros and cons), which is a system that would be compatible with another possibility: real and objective values, but not necessarily any terminal values. By analogy, such a situation would be a description for moral values like epistemological coherentism (versus foundationalism) describes knowledge. The mental model could be a web rather than a hierarchy. At least it's a possibility -- I don't intend to argue for or against it right now as I have minimal evidence.

0[anonymous]12y

I'll admit it's rather shaky and I'd be saying the same thing if I'd merely been brainwashed. It doesn't feel like it was precipitated by anything other than legitimate moral argument, though. If I can be brainwashed out of my "terminal values" so easily, and it really doesn't feel like something to resist, then I'd like a sturdier basis on which to base my moral reasoning. What is a conversation metaphor? I'm afraid I don't see what you're getting at. I still value freedom in what feels like a fundamental way, I just also value hierarchy and social order now. What is gone is the extreme feeling of ickyness attached to authority, and the feeling of sacredness attached to freedom, and the belief that these things were terminal values. The point is that things I'm likely to identify as "terminal values", especially in the contexts of disagreements, are simply not that fundamental, and are much closer to derived surface heuristics or even tribal affiliation signals. I feel like I'm not properly responding to your comment though.

7Furslid12y

Nyan, I think your freedom example is a little off. The converse of freedom is not bowing down to a leader. It's being made to bow. People choosing to bow can be beautiful and rational, but I fail to see any beauty in someone bowing when their values dictate they should stand.

4TimS12y

My fault for failing to clarify. There are roughly three ways one can talk about changes to an agent's terminal values. (1) Such changes never happen. (At a society level, this proposition appears to be false). (2) Such changes happen through rational processes (i.e. reasoning). (3) Such changes happen through non-rational processes (e.g. tribal affiliation + mindkilling). I was using "conversion" as a metaphorical shorthand for the third type of change.

4Eugine_Nier12y

BTW, you might want to change "conversation" to "conversion" in the grandparent.

2TimS12y

Ah! Thanks.

0[anonymous]12y

Ok. Then my answer to that is roughly this: This could of course use more detail, unless you understand what I'm getting at.

2TimS12y

That's certainly a serious risk, especially if terminal values work like axioms. There's a strong incentive in debate or policy conflict to claim an instrumental value was terminal just to insulate it from attack. And then, by process of the failure mode identified in Keep Your Identity Small, one is likely to come to believe that the value actually is a terminal value for oneself. I took your essay as trying to make a meta-ethical point about "terminal values" and how using the term with an incoherent definition causes confusion in the debate. Parallel to when you said if we interact with an unshielded utility, it's over, we've committed a type error. If that was not your intent, then I've misunderstood the essay.

0[anonymous]12y

Oops, it wasn't really about how we use terms or anything. I'm trying to communicate that we are not as morally wise as we sometimes pretend to be, or think we are. That Moral Philosophy is an unsolved problem, and we don't even have a good idea how to solve it (unlike, say, physics, where it's unsolved, but the problem is understood). This is in preparation for some other posts on the subject, the next of which will be posted tonight or soon.

-2Eugine_Nier12y

That said, there have been centuries of work on the subject that Eliezer unfortunately threw out because VNM-utilitarianism is so mathematically elegant.

1private_messaging12y

Are you sure you aren't simply trading open ended beliefs for those that circularly support themselves to a greater extent? When you trust in an authority which tells you to trust in that authority, that's sturdier.

0Strange712y

Gygax would say your alignment has shifted a step toward Lawful. I tend to prefer the Exalted system, which could represent such a shift through the purchase of a third dot in the virtue of Temperance.

[-]Shmi12y20

Thanks, it's a great post.

[-]Kerrigan2mo10

How are humans exploitable, given that they don't have utility functions?

[-]Pentashagon12y00

I wonder if, and when, we should behave as if we were VNM-rational. It seems vital to act VNM-rational if we're interacting with Omega or for that matter anyone else who is aware of VNM-rationality and capable of creating money pumps. But as you point out we don't have VNM-utility functions. Therefore, there exist some VNM-rational decisions that will make us unhappy. The big question is whether we can be happy about a plan to change all of our actual preferences so that we become VNM-rational, and if not, is there a way to remain happy while strategic... (read more)
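To make the money-pump worry concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than anything from the post: the function name `run_money_pump`, the three items, the fee, and the starting wealth are all made up. It assumes an agent that accepts any trade to an item it strictly prefers and pays a fixed fee for each trade.

```python
# Hypothetical illustration of a money pump. All names and numbers are
# assumptions made for this sketch, not taken from the post.

def run_money_pump(prefers, start_item, offers, fee, wealth):
    """Offer items one at a time; the agent swaps (and pays `fee`)
    whenever it strictly prefers the offered item to the one it holds.

    `prefers` is a set of pairs (x, y) meaning "x is strictly preferred to y".
    Returns the item held at the end and the remaining wealth.
    """
    item = start_item
    for offered in offers:
        if (offered, item) in prefers:
            item = offered
            wealth -= fee
    return item, wealth


# Cyclic (intransitive) preferences: A > B, B > C, C > A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

# Transitive preferences for comparison: A > B > C.
transitive = {("A", "B"), ("B", "C"), ("A", "C")}

# The same sequence of offers is made to both agents, three times around.
offers = ["C", "B", "A"] * 3

cyclic_item, cyclic_wealth = run_money_pump(cyclic, "A", offers, fee=1, wealth=100)
trans_item, trans_wealth = run_money_pump(transitive, "A", offers, fee=1, wealth=100)

print(cyclic_item, cyclic_wealth)  # the cyclic agent ends where it started, but poorer
print(trans_item, trans_wealth)    # the transitive agent refuses every trade
```

The cyclic agent accepts all nine trades and ends up holding the item it started with, minus nine fees, while the transitive agent never trades. The sketch only covers intransitivity; violations of the other VNM axioms would need a different construction.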

[-]SarahNibs12y00

I have not yet come to terms with how constructs of personal identity fit in with having or not having a utility function. What if it makes most sense to model my agency as a continuous limit of a series of ever more divided discrete agents who bring subsequent, very similar, future agents into existence? Maybe each of those tiny-in-time-extent agents have a utility function, and maybe that's significant?

2Qiaochu_Yuan12y

I think it makes the most sense to learn a ton of cognitive neuroscience and figure out what that neuroscience suggests about how you should model yourself. Kaj Sotala's mini-sequence about the modular mind seems to be a good place to start.

[-]Furslid12y00

I think your definition of terminal value is a little vague. The definition I prefer is as follows. A value is instrumental if it derives its value from its ability to make other values possible. To the degree that a value is not instrumental, it is a terminal value. Values may be fully instrumental (money), partially instrumental (health [we like being healthy, but it also lets us do other things we like]) or fully terminal (beauty).

Terminal values do not have the warm fuzzy glow of high concepts. Beauty, truth, justice, and freedom may be terminal valu... (read more)

1Eugine_Nier12y

Either that or a bias. The difficulty (or even impossibility) of separating out biases from terminal values is the main problem with thinking of oneself as a VNM-utilitarian.

2[anonymous]12y

What? How so? Are there other theories that don't have this problem? (for reference, I take VNM seriously but not absolutely, and I don't take utilitarianism seriously.)

2Eugine_Nier12y

You had an entire post on the subject, you even linked to it in the OP. I'm not sure. My point was that VNM is not nearly as final a solution to morality as a lot of people around here seem to think.

2[anonymous]12y

Sorry, I read your comment as implying that it was a failure of VNM in particular.

0Furslid12y

The difference between instrumental and terminal values is in the perception of the evaluator. If they believe that something is useful to achieve other values, then it is an instrumental value. If they are wrong about its usefulness, that makes it an error in evaluation, not a terminal value. The difference between instrumental and terminal values is in the map, not in the territory. For someone who believes in astrology, getting their horoscope done is an instrumental value.

0Eugine_Nier12y

In practice this criterion is frequently circular. See also the blue minimizing robot.

[-]BerryPick612y00

First off, I think your observations about terminal values are spot-on, and I was always confused by how little we actually talk about these queer entities known as terminal values.

This discussion reminds me a bit of Scanlon's What We Owe To Each Other. His formulation of moral discourse strikes me as a piece of Meta-Moral philosophy: 'An act is wrong if and only if any principle that permitted it would be one that could reasonably be rejected by people moved to find principles for the general regulation of behaviour that others, similarly motivated, could... (read more)

0khafra12y

The term "terminal values" kinda assumes a consequentialist meta-ethical framework, I think; and that particular statement (and Scanlon in general) is more on the contractualist side; a framework opposed to consequentialism.

[-]PhilGoetz12y-10

You do have a utility function (though it may be stochastic). You just don't know what it is. "Utility function" means the same thing as "decision function"; it just has different connotations. Something determines how you act; that something is your utility function, even if it can be described only as a physics problem plus random numbers generated by your free will and adjustments made by God. (God must be encapsulated in an oracle function.) We call it a utility function to clue people into our purposes and the literature that ... (read more)

5[anonymous]12y

This contradicts my knowledge. By "utility function", I mean that thing which VNM proves exists; a mapping from possible worlds to real numbers. Where are the references for "utility function" being interchangable with "decision algorithm"? I have never seen that stated in any technical discussion of decisions. I'm confused. Do you just mean the difference between modeling a thing as an agent, vs modeling it as a causal system? Can you elaborate on how this relates here? Agree. Moral philosophy is hard. I'm working on it. Can you elaborate on why you think it is impossible for a machine to do good things? Or why such a question is meaningless? Tricky question indeed. Again, working on it.
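For reference, the VNM sense of "utility function" used above can be stated as follows. This is the standard form of the theorem, with notation chosen for this sketch: $\preceq$ is the agent's preference relation over lotteries, $L$ and $M$ are lotteries over outcomes, and $u$ is the utility function. Strictly speaking $u$ is defined on outcomes, which here play the role of possible worlds.

```latex
% If \preceq satisfies completeness, transitivity, continuity, and independence,
% then there exists u : \text{Outcomes} \to \mathbb{R} such that, for all lotteries L, M:
L \preceq M \iff \mathbb{E}_{x \sim L}[u(x)] \le \mathbb{E}_{x \sim M}[u(x)]
% and u is unique up to positive affine transformation:
% u'(x) = a\,u(x) + b, \quad a > 0.
```

On this reading a utility function ranks lotteries by expected value, which is a narrower thing than "whatever determines how you act".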

2Decius12y

I have a utility function, but it is not time-invariant, and is often not continuous on the time axis.

4[anonymous]12y

And I'm a universe. Just a bit stochastic around the edges...

0Decius12y

Universes are like that. Are you deterministic, purely stochastic, or do you make decisions?

1A1987dM12y

What? Not having terminal values means that either you don't care about anything at all, or that “the recursive chain of valuableness” is infinitely deep. Neither of these seems likely to me.

0PrawnOfFate12y

I think there's a third possibility: values have a circular, strange-loop structure.

0itaibn012y

As far as I can tell, you aren't making any argument for your position that we have utility functions. You are merely asserting it.

4Kawoomba12y

"Whenever you do anything, that which determines your action - whatever it may be - can be called a decision - or utility - function. You are doing something, ergo you have a utility function."

4[anonymous]12y

I have never seen "utility function" used like this in any technical discussion. Am I missing something?

0Eugine_Nier12y

I think Phil is confusing the economist's (descriptive) utility function, with the VNM-ethicist's (prescriptive) utility function. Come to think of it, a case could be made that the VNM-utilitarian is similarly confused.

3[anonymous]12y

Care to expand on what you mean by VNM-utilitarian? You refer to it a lot and I'm never quite sure what you mean. (I'm also interested in what you think of it).

-1Eugine_Nier12y

By VNM-utilitarianism I mean the moral theories that one should act to maximize a utility function. Around here this is sometimes called "consequentialism" or simply "utilitarianism". Unfortunately, both terms are ambiguous. It's possible to have consequentialist theories that aren't based on a utility function, and "utilitarianism" is also used to mean the theory with the specific utility function of total happiness. Thus, I've taken to using "VNM-utilitarianism" as a hopefully less ambiguous and self-explanatory term. As for what I think of VNM-utilitarianism this comment gives a brief summary.

2wedrifid12y

When it is called 'utilitarianism' there are other people who call it wrong. I recommend saying consequentialism to avoid confusion. Mind you, I don't even know what you mean by those letters (VHM). My best guess is that you mean the Von Neumann Morgenstern utility theorem but got the letters wrong. If you are referring to those axioms then you could also consider saying VNM-utility instead of VNM-utilitarianism. Because those words have meanings that are far more different than their etymology might suggest to you.

-1Eugine_Nier12y

Oops. Fixed. That's why I talk about "VNM-utilitarianism" rather than simply "utilitarianism".

6wedrifid12y

That isn't enough to disambiguate the meaning. In fact, your intended meaning is not even one of the options to disambiguate between. Your usage is still wrong and misleading. I suggest following nshepperd's advice and using "VNM-rational" or "VNM-rationality". (Obviously I will be downvoting all comments that persist with "VNM-utilitarianism". Many others will not downvote but will take your muddled terminology to be strong evidence that you are confused or ill-informed about the subject matter.)

3Eugine_Nier12y

I'm curious, what were the options for what you thought it meant. How about "VNM-consequentialism"?

2wedrifid12y

Utilitarianism in practice means some kind of aggregation of all people's preferences. Most typically either 'total' or 'average'. Even though I am a consequentialist (at least in a highly abstract compatibilist sense) I dismiss utilitarianism as stupid, arbitrary and not worth privileging as a moral hypothesis. Adding VNM to it effectively narrows it down to 'preference utilitarianism' which at least gets rid of the worst of the crazy ('hedonic utilitarianism' Gahh!). But I don't think that is what you are trying to refer to when you challenge VNM-X (because it wouldn't be compatible with the points you make). Perfect! Please do. 'Consequentialism' means what one would naively expect 'utilitarianism' to mean, if not for an unfortunate history of bad philosophy having defined the term already. The VNM qualifier then narrows consequentialism down to the typical case that we tend to mean around here (because you are right, technically consequentialism is more broad than just that based on VNM axioms.)

4nshepperd12y

I believe "VNM-utilitarianism" is problematic because it would suggest that it is a kind of utilitarianism. By the most usual definition of "utilitarianism" (a moral theory involving an 'objective' aggregative measure of value + utility-maximising decision theory) it is not. However, I remember "VNM-rational" and "VNM-rationality" being accepted terminology.

0[anonymous]12y

No, I don't think it's just descriptive vs prescriptive, I mentioned both in my post and asserted that we had neither. Phil is saying that we do have a decision algorithm (I agree), and further, that "utility function" means "decision algorithm" (which I disagree with, but I'm not one to argue terminology).

0Eugine_Nier12y

Economists frequently assume humans have utility functions as part of their spherical cow model of human behavior. Unfortunately, they sometimes forget that this is just a spherical cow model, especially once one gets away from modeling collective economic behavior.

[-]MugaSofer12y-10

The problem here, in my humble opinion, is that we have no idea what we are doing when we try to do Moral Philosophy. We need to go up a meta-level and get a handle on Moral MetaPhilosophy. What's the problem? What are the relevant knowns? What are the unknowns? What's the solution process?

Wasn't there a whole sequence on this?

0[anonymous]12y

Yep, but I find that after that, I'm still a long way from actually knowing how to write a program that does moral philosophy, even in principle, whereas physics is much further along (Solomonoff induction, etc).

[+]timtyler12y-60
