The Preference Utilitarian’s Time Inconsistency Problem

Wei Dai

15th Jan 2010

In May of 2007, DanielLC asked at Felicifia, an “online utilitarianism community”:

If preference utilitarianism is about making peoples’ preferences and the universe coincide, wouldn't it be much easier to change peoples’ preferences than the universe?

Indeed, if we were to program a superintelligent AI to use the utility function U(w) = sum of w’s utilities according to people (i.e., morally relevant agents) who exist in world-history w, the AI might end up killing everyone who is alive now and creating a bunch of new people whose preferences are more easily satisfied, or just use its superintelligence to persuade us to be more satisfied with the universe as it is.
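
A toy model may make the failure mode concrete. The sketch below is only an illustration under made-up assumptions: the three candidate world-histories, the names in them, and every numeric utility are hypothetical and not part of the original argument. It shows that a maximizer of U, which sums over whoever exists in each world-history, can rank "replace the population" above the status quo.

```python
# Hypothetical toy model: each candidate world-history maps the people who
# exist in it to the utility each of them assigns to that world-history.
# All names and numbers are invented for illustration.
WORLDS = {
    # Current people, with their current (hard to satisfy) preferences.
    "status_quo": {"alice": 3.0, "bob": 2.0},
    # Current people are gone; new, easily satisfied people exist instead.
    "replace_population": {"carol": 10.0, "dave": 10.0},
    # Current people remain, but have been persuaded to like the world as it is.
    "persuade_everyone": {"alice": 9.0, "bob": 9.0},
}


def U(world):
    """U(w): sum of w's utilities according to the people who exist in w."""
    return sum(world.values())


# The AI simply picks the world-history with the highest U.
best = max(WORLDS, key=lambda name: U(WORLDS[name]))
print(best)
```

Nothing in U refers to the people who exist before the choice is made, so in this sketch their preferences only count in world-histories where they continue to exist.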

Well, that can’t be what we want. Is there an alternative formulation of preference utilitarianism that doesn’t exhibit this problem? Perhaps. Suppose we instead program the AI to use U’(w) = sum of w’s utilities according to people who exist at the time of decision. This solves DanielLC’s problem, but introduces a new one: time inconsistency.

The new AI’s utility function depends on who exists at the time of decision, and as that time changes and people are born and die, its utility function also changes. If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T₀, where T₀ is a constant representing the time of self-modification.
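
The inconsistency can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real agent: the lifespans, the two candidate plans, the utilities, and the helper names (`alive_at`, `u_prime`, `make_u_double_prime`, `choose`) are all hypothetical. U’ is evaluated over whoever is alive at the moving decision time, while U’’ freezes the population at a constant T₀.

```python
# Hypothetical lifespans: person -> (birth, death), alive for birth <= t < death.
LIFESPANS = {"alice": (0, 1), "bob": (0, 2), "carol": (1, 3)}

# Hypothetical utilities each person assigns to two candidate world-histories.
UTILITIES = {
    "plan_a": {"alice": 5.0, "bob": 1.0, "carol": 0.0},
    "plan_b": {"alice": 0.0, "bob": 2.0, "carol": 5.0},
}


def alive_at(t):
    """People who exist at time t."""
    return {p for p, (birth, death) in LIFESPANS.items() if birth <= t < death}


def u_prime(world, decision_time):
    """U': sum over people who exist at the (moving) time of decision."""
    return sum(UTILITIES[world][p] for p in alive_at(decision_time))


def make_u_double_prime(t0):
    """U'': the same sum, with the population frozen at the constant t0."""
    population = alive_at(t0)

    def u_double_prime(world, decision_time=None):
        # decision_time is deliberately ignored: only t0 matters.
        return sum(UTILITIES[world][p] for p in population)

    return u_double_prime


def choose(utility, decision_time):
    """Pick the world-history that maximizes the given utility function."""
    return max(UTILITIES, key=lambda w: utility(w, decision_time))


choice_at_0 = choose(u_prime, 0)           # decided by alice and bob
choice_at_1 = choose(u_prime, 1)           # decided by bob and carol
u_locked = make_u_double_prime(0)          # self-modification at T0 = 0
locked_choice_at_1 = choose(u_locked, 1)   # still decided by alice and bob
print(choice_at_0, choice_at_1, locked_choice_at_1)
```

With these made-up numbers, the unmodified agent at time 1 picks the plan that its time-0 self rates lower, which is the reason the time-0 agent prefers to lock in U’’ before the population changes.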

The AI is now reflectively consistent, but is this the right outcome? Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time? If you’re a utilitarian in the first place, this is probably not the kind of utilitarianism that you’d want to subscribe to.

So, what is the solution to this problem? Robin Hanson’s approach to moral philosophy may work. It tries to take into account everyone’s preferences (those who lived in the past, those who will live in the future, and those who have the potential to exist but don’t), but I don’t think he has worked out (or written down) the solution in detail. For example, is the utilitarian AI supposed to sum over every logically possible utility function and weight them equally? If not, what weighting scheme should it use?

Perhaps someone can follow up on Robin’s idea and see where this approach leads us? Or does anyone have other ideas for solving this time inconsistency problem?

108 Comments

Nick_Tarleton (16y)

One can't really equate risking a life with outright killing.

Even if you can cleanly distinguish them for a human, what's the difference from the perspective of an effectively omniscient and omnipotent agent? (Whether or not an actual AGI would be such, a proposed morality should work in that case.)

If we want any system that is aligned with human morality, we just can't make decisions based on the desirability of the outcome. For example: "Is it right to kill a healthy person to give his organs to five terminally ill patients and therefore save five lives at a cost of one?" Our moral sense says that killing an innocent bystander is immoral, even if it saves more lives. (See http://www.justiceharvard.org/)

Er, doesn't that just mean human morality assigns low desirability to the outcome "innocent bystander killed to use organs"? (That is, if that actually is a pure terminal value. It seems to me that this intuition reflects a correct instrumental judgment based on things like harms to public trust, not a terminal judgment about the badness of a death increasing in proportion to the benefit ensuing from that death or something.)

If we want a system to be well-defined, reflectively consistent, and stable under omniscience and omnipotence, expected-utility consequentialism looks like the way to go. Fortunately, it's pretty flexible.

Christian_Szegedy (16y)

Even if you can cleanly distinguish them for a human, what's the difference from the perspective of an effectively omniscient and omnipotent agent? (Whether or not an actual AGI would be such, a proposed morality should work in that case.)

To me, "omniscience" and "omnipotence" seem to be self-contradictory notions. Therefore, I consider it a waste of time to think about beings with such attributes.

reflects a correct instrumental judgment based on things like harms to public trust, not a terminal judgment about the badness of a deat

...

Christian_Szegedy (16y)

That's why I wrote "I am unsure how you define utilitarianism". If you just evaluate the outcome, then you see f(1 dead) + f(5 alive). If you evaluate the whole process, you see f(1 guy killed as an innocent bystander) + f(5 alive), which may have a much lower desirability due to the moral impact.

The same consideration applies to the OP: if you only evaluate the final outcome, you may think that killing hard-to-satisfy people is a good thing. However, if you add the moral penalty of killing innocent people, then the equation suddenly changes.

The question of a one-dimensional versus multi-dimensional objective remains: extreme liberal moralism would say that it is not permissible to take one dollar from a person even if it could pay for saving one life, or that killing one innocent bystander is wrong even if it could save a billion lives, simply because our agents are autonomous entities with unalienable rights to life, property, and freedom that can't be violated, even for the greater good.

The above problems can only be solved if the moral agents voluntarily opt into a system that takes away a portion of their individual freedom for a greater good. However, this system should not give arbitrary power to a single entity; every (immoral) violation of autonomy should happen for a well-defined "higher" purpose.

I don't say that this is the definitive way to address morality abstractly in the presence of a superintelligent entity; these are just reiterations of some of the moral principles our liberal Western democracies are built upon.
