A minute ago I finally understood why this site has adopted expected utility as a prescriptive theory (it's certainly not descriptive) and what real-world meaning we're supposed to assign to questions about "utility of one human life" and such. Basically, we're working out the kinks of yet another new moral code. Once you realize that, you can get with the project or leave it. Personally, I'd like to see people abandon it en masse.
The project of moving morality from brains into tools is the same project as moving arithmetic from brains into calculators: you are more likely to get a correct answer, and you become able to answer orders of magnitude more difficult questions. If the state of the tool is such that the intuitive answer is better, then one should embrace intuitive answers (for now). The goal is to eventually get a framework that is actually better than intuitive answers in at least some nontrivial area of applicability (or to work in the direction of this goal, while it remains unattainable).
The problem with "moral codes" is that they are mostly insane, overconfident in treating rather confused raw material as if it were useful answers. Trying to finally get it right is not the same as welcoming insanity, although the risk is always there.
You say: It's possible to specify a utility function such that, if we feed it to a strong optimization process, the result will be good.
I say: Yeah? Why do you think so? What little evidence we currently have isn't on your side.
You say: It's possible to specify a utility function such that, if we feed it to a strong optimization process, the result will be good.
Formally, it's trivially true even as you put it, since you can encode any program with an appropriately huge utility function. Therefore, whatever way of doing things is better than using ape-brains can be represented this way.
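To make the "trivially true" encoding concrete, here is a minimal sketch (the policy, observations, and actions are all hypothetical placeholders, not anything specified in this thread): any fixed policy can be reproduced by a maximizer whose utility function simply rewards acting as that policy would have acted.

```python
# Minimal sketch of the trivial encoding: a utility function that
# scores 1 for the action a given policy would take and 0 otherwise.
# `policy`, `observations`, and `actions` are hypothetical placeholders.

def make_utility(policy, observations):
    """Build a utility function that endorses exactly what `policy` does."""
    endorsed = {obs: policy(obs) for obs in observations}
    return lambda obs, action: 1.0 if action == endorsed[obs] else 0.0

def act_by_maximizing(utility, obs, actions):
    """A maximizer over this utility reproduces `policy` exactly."""
    return max(actions, key=lambda a: utility(obs, a))
```

This only shows that the representation exists; it says nothing about whether it is a useful form for a solution.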
It's not necessarily useful to look at the problem in the way you stated it: I'm at this point doubtful of "expected utility maximization" being the form of a usefully stated correct solution. So I speak of tools. That there are tools better than ape-brains should be intuitively obvious, as a particular case of a tool is just an ape-brain that has been healed of all ills, an example of a step in the right direction, proving that steps in the right direction are possible. I contend there are more steps to be taken, some not as gradual or obvious.
Vladimir, sorry. I noticed my mistake before you replied, and deleted my comment. Your reply is pretty much correct.
Do you think you can expand on this, perhaps in a top-level post? I feel somewhat sympathetic towards what you said, but would like to understand your position better to see if I really agree with it or not.
As far as "this site has adopted expected utility as a prescriptive theory", that's hardly surprising since expected utility has been the dominant paradigm of rationality for decades. (Perhaps centuries? Wikipedia says Bernoulli invented it in 1738.) This site has actually done more than most to challenge it, I think.
There certainly is a lot of moral prescription going on. This is mostly indirect, implicit in the kind of questions that get asked rather than directly asserted. "Expected utility" is the right thing to optimise for, almost by definition. But there is more than that at play. In particular, there tends to be an assumption that other people's utility functions will, and in fact 'should', contribute to mine in a simple, sometimes specific, way. I don't particularly respect that presumption.
Edit: Fixed the typo that cousin_it tactfully corrected in his quote.
You don't value other people's lives because they value their own lives. Paperclip maximizers value paperclips, but you won't take that into account. It's not so much contribution of other people's utility functions that drives your decisions (or morality). You just want mostly the same things, and care about others' well-being (which you should to an unknown extent, but which you obviously do at least somewhat).
"Expected utility" is the right thing to optimise for, almost by definition.
This isn't clear. Preferences of any actual human seem to form a directed graph, but it's incomplete and can contain cycles. Any way to transform it into a complete acyclic graph (any pair of situations comparable, no preference loops) must differ from the original graph somewhere. Different algorithms will destroy different facets of actual human preference, but there's certainly no algorithm that can preserve all of it; that much we can consider already proven beyond reasonable doubt. It's not obvious to me that there's a single, well-defined, canonical way to perform this surgery.
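To make the non-uniqueness concrete, here is a toy illustration (the options and stated preferences are invented, not anyone's actual data): a three-way preference cycle where each choice of which edge to discard yields a different total order.

```python
from itertools import permutations

# Toy cyclic preference graph: A > B, B > C, C > A.
# No total order satisfies all three preferences; dropping different
# edges repairs the cycle in different ways, each endorsing a
# different ranking, so the "surgery" is not canonical.
prefs = [("A", "B"), ("B", "C"), ("C", "A")]  # (preferred, dispreferred)
options = ["A", "B", "C"]

def orders_respecting(edges):
    """All total orders of the options consistent with the given edges."""
    return [order for order in permutations(options)
            if all(order.index(a) < order.index(b) for a, b in edges)]

for dropped in prefs:
    kept = [e for e in prefs if e != dropped]
    print("drop", dropped, "->", orders_respecting(kept))
# drop ('A', 'B') -> [('B', 'C', 'A')]
# drop ('B', 'C') -> [('C', 'A', 'B')]
# drop ('C', 'A') -> [('A', 'B', 'C')]
```

Each repair is a defensible "utility-style" ordering, and each disagrees with the original preferences somewhere.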
And it's not at all obvious that going from a single human to an aggregate of all humanity will mitigate the problem (see Torture vs Specks). That's just too many leaps of faith.
I agree with (and upvoted) your point. Human preferences are cyclic. I'd go further and say that without at least an acyclic preference graph it is not possible to optimise a decision at all. The very thought seems meaningless.
Assuming one can establish coherent preferences, the question of whether one should optimise for expected utility encounters a further complication. Many human preferences refer to our actions and not to outcomes. An agent could in fact decide to optimise for making 'Right' choices and to hell with the consequences. They could choose not to optimise for expected utility. Of course, it seems like that choice was the one with the highest expected value in their rather wacky utility function.
It's not an observation that warrants much more than those three words and the comma, but it seems to me that either you are optimising a decision for expected utility or you are doing something other than optimising. 'Expected utility' just happens to be the name given to the value in the function you use if you are optimising a decision.
In the light of the correction you've made just now, do you retract this comment as well? (It looks to be based on the same mistake, but if you don't think so, I'd like to argue.)
No, it's a different point, and one I'd be happy to argue. Here I talk about encoding actual human preferences over all possible futures, not designing an algorithm that will yield one good future. For example, an algorithm that gives one good future may never actually have to worry about torture vs dust specks. So it's not clear that we should worry about it either.
Preferences of any actual human seem to form a directed graph, but it's incomplete and can contain cycles.
I suspect you are not talking about neurons in the brain, but I have no idea what you do mean...
Any way to transform it into a complete acyclic graph (any pair of situations comparable, no preference loops) must differ from the original graph somewhere. Different algorithms will destroy different facets of actual human preference, but there's certainly no algorithm that can preserve all of it; that much we can consider already proven beyond reasonable doubt. It's not obvious to me that there's a single, well-defined, canonical way to perform this surgery.
By the Church-Turing thesis, you can construct an artifact behaviorally indistinguishable from a human even on the basis of expected utility maximization (even though it's an inadequate thing to do). Whatever you can expect of a real human, including answering hypothetical questions, you can expect from this construction.
Here I talk about encoding actual human preferences over all possible futures, not designing an algorithm that will yield one good future. For example, an algorithm that gives one good future may never actually have to worry about torture vs dust specks. So it's not clear that we should worry about it either.
Algorithms are strategies: they are designed to work depending on observations. When you design an algorithm, you design behaviors for all possible futures. Other than offering this remark, I don't know what to do with your comment...
Preference as an order on situations? Make that an order on histories, or better, an order on games to be provably won, but you should already know that, so again I don't see what you are saying.
Oh, okay, on possible histories. I really don't understand what's unclear to you. It's not obvious to me that there's a unique canonical way to build a complete acyclic graph (utility-based preference) from an incomplete graph with cycles (actual human preference). Yes, expected utility optimization can mimic any behavior, but I don't want to mimic behavior, I want to represent the data structure of preferences.
By C-T, you can represent any data, right? The utility-surrogate can have a detailed scan of a human in its virtual utility-maximizing pocket, or even run a simulation of a human brain, just on a different substrate.
For histories: you argue that people have cyclic preferences over world histories as well, because you consider preference to be the same thing as choice, which is prone to whim? That's not what I mean by preference (which you should also know), but it explains your comments in this thread.
Whims are all we can observe. We disagree on whether whims can be canonically regularized into something coherent. I don't think Eliezer knows that either (it's kind of similar to the question whether humanity's volition coheres). Yeah, he's trying to regularize his whims, and you may strive for that too, but what about the rest of us?
You can consider a person as a system that gives various counterfactual reactions to interaction -- most of these reactions won't be observed in the history of what actually happened to that person in the past. While it e.g. makes sense to talk about what a person (actually) answered to a question asked in English, you are not working with concepts themselves in this setting: just as the interpretation of words is a little iffy, deeper understanding of the meaning of the words (by the person who answers the questions) is even more iffy.
What you need in order to talk about preference is to compare huge formal strategies or games (not even snapshots of the history of the world), while what you get in the naive setting is asking "yes/no" questions in English.
The unavailability of an adequate formalization of what it means to ask the actual question about consequences doesn't justify jumping to an identification of preference with "yes/no" utterances resulting from questions obtained in an unspecified manner.
I don't see how going from yes/no questions to simulated games helps. People will still exhibit preference reversals in their actions, or just melt down.
I wasn't proposing a solution (I wasn't talking about simulating humans playing a game -- I was referring to a formal object). The strategies that need to be compared are too big for a human to comprehend -- that's one of the problems with defining what the preference is via asking questions (or simulating humans playing games). When you construct questions about the actual consequences in the world, you are simplifying, and through this simplification you lose precision. That a person can make mistakes, can be wrong, is the next step through which this process loses the original question, and a way in which you can get incoherent responses: that's noise. It doesn't follow from the presence of noise that the noise is inherent in the signal, and it doesn't make sense to define the signal as signal plus noise.
But you need at least a conceptual way to tell signal from noise. Maybe an analogy will help: do you also think that there's an ideal Platonic market price that gets tainted by real-world "noise"?
I don't understand market prices well enough to use this analogy. I don't propose solutions, I merely say that treating noise as part of the signal ignores the fact that it's noise. There is even a strong human intuition that there are errors. If I understand that I made an error, I consider it preferable that my responses-in-error not be considered correct by definition.
The concept of a correct answer is distinct from the concept of the answer actually given. When we ask questions about preference, we are interested in correct answers, not in answers actually given. Furthermore, we are interested in correct answers to questions that can't physically be either asked of or answered by a human.
Formalizing the sense of correct answers is a big chunk of FAI, while formalizing the sense of actual answers, or even counterfactual actual answers, is trivial if you start from physics. It seems clear that these concepts are quite different, and the (available) formalization of the second doesn't work for the first. Furthermore, "actual answers" also need to be interfaced with a tool that restates "complete states of the world, with all the quarks and stuff" as human-readable questions.
Preferences of any actual human seem to form a directed graph, but it's incomplete and can contain cycles. Any way to transform it into a complete acyclic graph (any pair of situations comparable, no preference loops) must differ from the original graph somewhere.
What graph??? An accurate account should take care of every detail. I feel you are attacking some simplistic strawman, but I'm not sure of what kind.
Do you agree that it's possible in principle to implement an artifact behaviorally indistinguishable from a human being that runs on expected utility maximization, with a sufficiently huge "utility function" and some simple prior? Well, this claim seems to be both trivial and useless, as it speaks not of improvement, just of a surrogate.
"Expected utility" is the right thing to optimise for, almost by definition.
Why? I don't see any conclusive justification for that yet, except mathematical convenience.
I get this from the post: a person may not only be happy in themselves, but bring (on net) happiness to others.
In such a case, the decision is clear: keep/add such a person, even if you care only for average happiness.
It is important to get straight what you are varying and what you are holding constant when you vary population. In this post you seem to be talking about adding one more person without having to pay the cost of creating that person, and then considering all the effects of adding that person to our total welfare. Most philosophers instead talk about holding most everything else constant while adding one more person at some assumed standard of living.
As an example, consider a parent and young child. If you allow one of them to die, not only do you end that life, but you make the other one significantly worse off.
No. The parent was already dead. Now we have just randomly 'saved' a young orphan and she will live a life of squalor. By consuming from what little resources are available in her miserable circumstances, her presence makes the overall quality of life for her and her fellow orphans even worse.
My point is that it is not appropriate to draw a conclusion that the value of a random human life is superlinear at the margin by providing a specific example of a human life that has significant and obvious impact on another. It would be equally absurd to 'prove' a negative utility for saving a life by mentioning saving an axe murderer while he is in his axe-slaying prime.
It would be equally absurd to 'prove' a negative utility for saving a life by mentioning saving an axe murderer while he is in his axe-slaying prime.
Not "equally absurd." The stated assumption is that the general case is more like the parent-child case than the axe murderer case:
But this generalizes: the marginal person (on average) produces positive net value to society (through being an employee, friend, spouse, etc.) in addition to accruing their own utilons
Also, are you actually saying the young orphan will be better off dead? Is that what she would choose, given the choice?
Not "equally absurd." The stated assumption is that the general case is more like the parent-child case than the axe murderer case:
True.
Also, are you actually saying the young orphan will be better off dead?
I'm saying that the marginal person on average adds negative net value to the expected utility of the universe as evaluated by me. As well as pointing out the negative utility of an orphan starving, I point out that a saved life can have negative externalities as well as positive ones. In this case it was to the other orphans, who lose some of their resources.
A negative externality that I place more weight on is the contribution to existential risk. I do discount the far distant future, but not to the extent that I don't consider existential risk at this critical stage of development to be pretty damn important.
Is that what she would choose, given the choice?
This isn't fallacious since the argument behind this line of questioning is implied rather than explicit. In most cases the reasoning is not even noticed consciously by the author or many readers.
let's just remember that it is an approximation.
From my work with modeling and simulation, you only approximate when you can do no better. In the case of calculating "utility," ostensibly towards some decision-based reasoning, that isn't good enough. At least for me. There are too many exogenous variables currently.
You seem to be implying that you can do better. Please tell us how; how do you approximate the utility curve of the human population, and how do you know that your approximation is "better" than a linear one?
You seem to be implying that you can do better.
Quite the contrary - I am saying it is currently impossible, and thus in my view useless for the accurate predictive models which should be applied to decision making. I think this is all great philosophy and science, but once we start talking about individual "utility functions" we are talking fantasy. This is where I diverge from the consequentialist camp (granted, I used to be a consequentialist).
That was the epigraph Eliezer used on a perfectly nice post reminding us to shut up and multiply when valuing human lives, rather than relying on the (roughly) logarithmic amount of warm fuzzies we'd receive. Implicit in the expected utility calculation is the idea that the value of human lives scales linearly: indeed, Eliezer explicitly says, "I agree that one human life is of unimaginably high value. I also hold that two human lives are twice as unimaginably valuable."
However, in a comment on Wei Dai's brilliant recent post comparing boredom and altruism, Vladimir Nesov points out that "you can value lives sublinearly" and still make an expected utility calculation rather than relying on warm-fuzzy intuition. This got me thinking about just what the functional form of U(N), the utility of N living persons, might be.
Attacking from the high end (the "marginal" calculation), it seems to me that the utility of human lives is actually superlinear to a modest degree [1]; that is, U(N+1)-U(N) > U(N)-U(N-1). As an example, consider a parent and young child. If you allow one of them to die, not only do you end that life, but you make the other one significantly worse off. But this generalizes: the marginal person (on average) produces positive net value to society (through being an employee, friend, spouse, etc.) in addition to accruing their own utilons, and economies of scale dictate that adding another person allows a little more specialization and hence a little more efficiency. I.e., the larger the pool of potential co-workers/friends/spouses is, the pickier everyone can be, and the better matches they're likely to end up with. Steven Landsburg (in Fair Play) uses a version of this argument to conclude that children have positive externalities and therefore people on average have fewer children than would be optimal.
In societies with readily available birth control, that is. And naturally, in societies which are insufficiently technological for each marginal person to be able to make a contribution, however indirect, to (e.g.) the food output, it's quite easy for the utility of lives to be sublinear, which is the classical Malthusian problem, and still very much with us in the poorer areas of the world. (In fact, I was recently informed by a humor website that the Black Death had some very positive effects for medieval Europe.)
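As a toy numerical check of the superlinear-at-the-margin claim (the functional form and coefficients here are invented purely for illustration, not taken from the post): a utility with a small positive quadratic term has a strictly increasing marginal value per life.

```python
# Toy illustration only: U(N) = a*N + b*N**2 with a small b > 0.
# Then U(N+1) - U(N) = a + b*(2N+1) exceeds U(N) - U(N-1) = a + b*(2N-1),
# i.e. the marginal life is worth slightly more the more people there are.
a, b = 1.0, 1e-6  # hypothetical per-life value and economies-of-scale term

def U(N):
    return a * N + b * N ** 2

for N in (10, 1000, 100000):
    up, down = U(N + 1) - U(N), U(N) - U(N - 1)
    assert up > down
    print(N, up, down)
```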
Now let's consider the other end of the problem (the "inductive" calculation). As an example let's assume that humanity has been mostly supplanted by AIs or some alien species. I would certainly prefer to have at least one human still alive: such a person could represent humanity (and by extension, me), carry on our culture, values and perspective on the universe, and generally push for our agenda. Adding a second human seems far less important—but still quite important, since social interactions (with other humans) are such a vital component of humanity. So adding a third person would be less important still, and so on. A sublinear utility function.
So are the marginal calculation and the inductive calculation inconsistent? I don't think so: it's perfectly possible to have a utility function whose first derivative is complex and non-monotonic. The two calculations are simply presenting two different terms of the function, which are dominant in different regimes. Moreover, the linear approximation is probably good enough for most ordinary circumstances; let's just remember that it is an approximation.
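One made-up functional form with that property, offered only as an illustration (every coefficient here is invented): a logarithmic "representation" term that dominates when very few humans exist, plus a mildly superlinear "economies of scale" term that dominates at large N, giving a non-monotonic marginal value.

```python
import math

# Purely illustrative shape, not a claim about the true function:
# a log term dominates at tiny populations, a small quadratic term
# dominates at large ones, and a linear term sits in between.
def U(N, a=10.0, b=1.0, c=1e-4):
    return a * math.log(N) + b * N + c * N ** 2

def marginal(N):
    """Value of the N-th life: U(N) - U(N-1)."""
    return U(N) - U(N - 1)

print(marginal(2), marginal(100), marginal(100000))
# ~7.93, ~1.12, ~21.0: high for the first few survivors, dipping toward
# roughly linear in the middle, rising again once scale effects dominate.
```

The only point is that a single function can contain both regimes; which term dominates at any given population is a separate question.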
[1] Note that in these arguments I'm averaging over "ability to create utility" (not to mention "capacity to experience utility").