The AI is now reflectively consistent, but is this the right outcome?
Yes.
Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time?
'Should'? Us deciding what should be is already us pretending, hoping, or otherwise counterfactually assuming, for the purposes of discussion, that we can choose the fate of the universe. It so happens that many people who happen to be alive at this arbitrary point in time have preferences with altruistic components that could consider future agents. Lucky them, assuming these arbitrary agents get their way.
Presumably, if you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.
That may explain my disagreement (or, as phrased, unexpected agreement). I tend to consider utilitarianism (as typically described) to be naive, verging on silly. The U'' option you describe at least seems to have the coherence required to be implemented in practice without a catastrophic or absurd result.
Given that most attempts at thinking through the consequences of utilitarian ethics resemble a proof by contradiction that utilitarianism cannot be a good basis for ethics, it surprises me how many people continue to embrace it and try to fix it.
Please don't throw around Gödel's Theorem before you've really understood it— that's one thing that makes people look like cranks!
"When does that ever happen and how does answering that question help me be more ethical?"
Very rarely; but pondering such hypotheticals has helped me to see what some of my actual moral intuitions are, once they are stripped of rationalizations (and chances to dodge the question). From that point on, I can reflect on them more effectively.
There's a far worse problem with the concept of 'utility function' as a static entity than that different generations have different preferences: The same person has very different preferences depending on his environment and neurochemistry. A heroin addict really does prefer heroin to a normal life (at least during his addiction). An ex-junkie friend of mine wistfully recalls how amazing heroin felt and how he realized he was failing out of school and slowly wasting away to death, but none of that mattered as long as there was still junk. Now, it's no...
The AI is now reflectively consistent, but is this the right outcome?
I'd say so.
I want the AI to maximize my utility, and not dilute the optimization power with anyone else's preferences (by definition). Of course, to the extent that I care about others they will get some weight under my utility function, but any more than that is not something I'd want.
Anything else is just cooperation, which is great, since it greatly increases the chance of it working, and even more so the chance of it working for you. The group of all people the designers can easi...
I wouldn't be so quick to discard the idea of the AI persuading us that things are pretty nice the way they are. There are probably strong limits to the persuadability of human beings, so it wouldn't be a disaster. And there is a long tradition of advice regarding the (claimed) wisdom of learning to enjoy life as you find it.
Obviously, weighing equally over every logically possible utility function will produce a null result - for every utility function, a corresponding utility function with the opposite preferences will exist.
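One way to make the cancellation explicit, as a rough sketch (the assumptions that the class of functions is closed under negation, that each function gets equal weight, and that the sum is well-defined are mine, added only to state the point):

```latex
% Pairing each admissible utility function U with its negation -U:
\sum_{U} U(w) \;=\; \tfrac{1}{2}\sum_{U}\bigl(U(w) + (-U)(w)\bigr) \;=\; 0
\qquad \text{for every world } w.
% The aggregate assigns the same score to every world, so it cannot
% recommend anything.
```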
If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.
To do this it would have to be badly programmed. We start out with a time-dependent utility function U'(t). We propose to change it to U'', where U''(t) = U'(0) for all times t. But those are different functions! The ...
[...] creating a bunch of new people whose preferences are more easily satisfied, or just use its super intelligence to persuade us to be more satisfied with the universe as it is.
Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time?
Well, making people's preferences coincide with the universe by adjusting people's preferences is not possible if people prefer their preferences not to be adjusted to the universe. Or possible only to the extent people currently prefer being chan...
I believe you can strip the AI of any preferences towards human utility functions with a simple hack.
Every decision of the AI will have two effects on expected human utility: it will change the expected utility itself, and it will change the human utility functions.
Have the AI make its decisions only based on the effect on the current expected human utility, not on the changes to the function. Add a term granting a large disutility for deaths, and this should do the trick.
Note the importance of the "current" expected utility in this setup; an AI will decide whether to in...
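For what it's worth, here is a minimal sketch in Python of the rule described above. Every name in it (`predict_outcome`, `expected_deaths`, `DEATH_PENALTY`, and so on) is a hypothetical placeholder; the only point is the structure: the utility functions used for scoring are frozen at decision time, so the AI gets no credit for changing them, and expected deaths carry a large flat penalty.

```python
# Sketch of the proposed decision rule, under the assumptions stated above.

DEATH_PENALTY = 1e12  # arbitrary large disutility per expected death


def score_action(action, current_utility_functions, predict_outcome, expected_deaths):
    """Score an action by how the *current* human utility functions rate
    its predicted outcome, ignoring any changes the action would cause to
    those functions, minus a penalty for expected deaths.

    current_utility_functions: callables world -> float, frozen at decision time.
    predict_outcome: action -> predicted world.
    expected_deaths: predicted world -> float.
    """
    world = predict_outcome(action)
    utility_now = sum(u(world) for u in current_utility_functions)
    return utility_now - DEATH_PENALTY * expected_deaths(world)


def choose(actions, current_utility_functions, predict_outcome, expected_deaths):
    # The scoring functions are not re-derived from the predicted world,
    # which is what removes the incentive to rewrite people's preferences.
    return max(actions, key=lambda a: score_action(
        a, current_utility_functions, predict_outcome, expected_deaths))
```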
Related question: What is the purpose of taking into consideration the preferences of people NOT around to deal with the AI?
The dead and the potential-future-people, not to mention the people of other possible worlds, haven't got any say in anything that happens now in this world. This is because it is physically impossible for us (people in the present of this possible world) to find out what those preferences are. At best, we can only guess and extrapolate.
Unless the AI has the ability to find out those preferences, it ought to weigh our current preferences more heavily because of that additional certainty.
So are we going to take into account the preferences of the AI itself? Or are we going to violate its rights by creating its preferences based on our current liking? What about the other AIs and their preferences? Obviously this is a paradox that arises from trying to please imaginary entities.
My version of utilitarianism is "dealism", and the way I'd suggest thinking about this is in terms of the scope of the implicit "deal" you are implementing. At one extreme you as dictator just enforce your temporary personal preferences over everything, while at the other extreme you weigh the preferences of all creatures who ever have existed or ever could exist. Doing anything but the latter may be a slippery slope. First you'll decide to ignore possible creatures, then future creatures, then animals, then maybe people with low IQ, ...
If you believe that human morality is isomorphic to preference utilitarianism - a claim that I do not endorse, but which is not trivially false - then using preferences from a particular point in time should work fine, assuming those preferences belong to humans. (Presumably humans would not value the creation of minds with other utility functions if this would obligate us to, well, value their preferences.)
use its super intelligence to persuade us to be more satisfied with the universe as it is.
Actually, I would consider this outcome pretty satisfactory. My life is (presumably) unimaginably good compared to that of a peasant from the 1400s but I'm only occasionally ecstatic with happiness. It's not clear to me that a radical upgrade in my standard of living would change this...
The AI might [...] just use its super intelligence to persuade us to be more satisfied with the universe as it is.
Well, that can’t be what we want.
Actually, I believe Buddhism says that this is exactly what we want.
As far as I can tell, all this post says is that utilitarianism is entirely dependent on a given set of preferences, and its outcomes will only be optimal from the perspective of those preferences.
This is true, but I'm not sure it's all that interesting.
I'm convinced of utilitarianism as the proper moral construct, but I don't think an AI should use a free-ranging utilitarianism, because it's just too dangerous. A relatively small calculation error, or a somewhat eccentric view of the future, can lead to very bad outcomes indeed.
A really smart, powerful AI, it seems to me, should be constrained by rules of behavior (no wiping out humanity/no turning every channel into 24-7 porn/no putting everyone to work in the paperclip factory). The assumption that something very smart would necessarily reach correct u...
I am not sure of the exact semantics of the word "utilitarianism" in your post, but IMO it would be better to use a multi-dimensional objective function rather than simple numbers.
For example, killing a single moral agent should outweigh convenience gains by any number of agents (see dust specks vs. torture). That can be modeled by a two-dimensional objective: the first number represents the immorality of the choice and the second is the total preference. The total order over the scoring would be a lexicographic order of the two components.
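A rough sketch of what such a two-component, lexicographically ordered objective might look like; the field names (`immorality`, `total_preference`) are invented stand-ins for whatever the real components would be:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    immorality: float        # e.g. expected number of moral agents killed
    total_preference: float  # ordinary summed preference satisfaction

    def sort_key(self):
        # Python's tuple comparison is lexicographic: lower immorality
        # always wins; preference satisfaction only breaks ties between
        # equally moral options (negated so that smaller keys are better).
        return (self.immorality, -self.total_preference)


def better(a: Score, b: Score) -> Score:
    return a if a.sort_key() < b.sort_key() else b


# Dust-specks-vs-torture flavour: the option with any immorality at all
# loses, no matter how much aggregate convenience it offers.
print(better(Score(immorality=1.0, total_preference=10**9),
             Score(immorality=0.0, total_preference=1.0)))
# -> Score(immorality=0.0, total_preference=1.0)
```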
Another aspect is t...
For example, killing a single moral agent should outweigh convenience gains by any number of agents (see dust specks vs. torture). That can be modeled by a two-dimensional objective: the first number represents the immorality of the choice and the second is the total preference. The total order over the scoring would be a lexicographic order of the two components.
If not killing has lexical priority, all other concerns will be entirely overridden by tiny differences in the probability of killing, in any non-toy case.
Anyway, more directly: our preferences just don't seem to give life lexical priority. We're willing to drive to the store for convenience, and endorse others doing so, even though driving imposes a nontrivial risk of death on oneself and others.
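To make that objection concrete with invented numbers, score the two options by the lexicographic rule sketched above, comparing expected killings first:

```latex
\begin{align*}
\text{drive to the store:} &\quad (\mathbb{E}[\text{killings}],\ \text{convenience}) = (10^{-6},\ C),\\
\text{stay home:}          &\quad (\mathbb{E}[\text{killings}],\ \text{convenience}) = (0,\ 0).
\end{align*}
% Since 0 < 10^{-6} on the first component, "stay home" is chosen for
% every value of C, however large; that is not how we actually trade
% convenience against small risks of death.
```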
In May of 2007, DanielLC asked at Felicifa, an “online utilitarianism community”:
Indeed, if we were to program a super-intelligent AI to use the utility function U(w) = sum of w’s utilities according to people (i.e., morally relevant agents) who exist in world-history w, the AI might end up killing everyone who is alive now and creating a bunch of new people whose preferences are more easily satisfied, or just use its super intelligence to persuade us to be more satisfied with the universe as it is.
Well, that can’t be what we want. Is there an alternative formulation of preference utilitarianism that doesn’t exhibit this problem? Perhaps. Suppose we instead program the AI to use U’(w) = sum of w’s utilities according to people who exist at the time of decision. This solves Daniel’s problem, but introduces a new one: time inconsistency.
The new AI’s utility function depends on who exists at the time of decision, and as that time changes and people are born and die, its utility function also changes. If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.
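Restating the three candidates side by side, purely as a transcription of the definitions above (the notation P(·) for the relevant set of people and u_p for person p's utility function is mine):

```latex
\begin{align*}
U(w)   &= \sum_{p \in P(w)} u_p(w)                     && \text{people who exist in world-history } w,\\
U'(w)  &= \sum_{p \in P(t_{\mathrm{decision}})} u_p(w) && \text{people who exist at the (moving) time of decision},\\
U''(w) &= \sum_{p \in P(T_0)} u_p(w)                   && \text{people who existed at the fixed time } T_0.
\end{align*}
% Only U'' is unchanged by the passage of time, which is why a reflective
% maximizer of U' would modify itself into a maximizer of U''.
```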
The AI is now reflectively consistent, but is this the right outcome? Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time? Presumably, if you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.
So, what is the solution to this problem? Robin Hanson’s approach to moral philosophy may work. It tries to take into account everyone’s preferences—those who lived in the past, those who will live in the future, and those who have the potential to exist but don’t—but I don’t think he has worked out (or written down) the solution in detail. For example, is the utilitarian AI supposed to sum over every logically possible utility function and weigh them equally? If not, what weighing scheme should it use?
Perhaps someone can follow up Robin’s idea and see where this approach leads us? Or does anyone have other ideas for solving this time inconsistency problem?