I'm convinced of utilitarianism as the proper moral construct, but I don't think an AI should use a free-ranging utilitarianism, because it's just too dangerous. A relatively small calculation error, or a somewhat eccentric view of the future, can lead to very bad outcomes indeed.
A really smart, powerful AI, it seems to me, should be constrained by rules of behavior (no wiping out humanity/no turning every channel into 24-7 porn/no putting everyone to work in the paperclip factory). The assumption that something very smart would necessarily reach correct utilitarian views seems facially false; it could assume that humans must think the way it does, or conclude that dogs generate more utility with less effort because they are more easily made happy, or decide that humans need more superintelligent machines in a great big hurry and should build them regardless of anything else.
And maybe it'd be right here or there. But maybe not. I think, almost by definition, that an FAI cannot be a full-on, free-range utilitarian of any stripe. Am I wrong?
The ideas under consideration aren't as simple as having the AI act by pleasure utilitarianism or preference utilitarianism, because we actually care about a whole lot of things in our evaluation of futures. Many of the things that might horrify us are things we've rarely or never needed to be consciously aware of, because nobody currently has the power or the desire to enact them; but if we miss adding just one hidden rule, we could wind up in a horrible future.
Thus "rule-following AI" has to get human nature just as right as "utilitarian A...
In May of 2007, DanielLC asked at Felicifia, an “online utilitarianism community”:
Indeed, if we were to program a super-intelligent AI to use the utility function U(w) = sum of w’s utilities according to people (i.e., morally relevant agents) who exist in world-history w, the AI might end up killing everyone who is alive now and creating a bunch of new people whose preferences are more easily satisfied, or just using its superintelligence to persuade us to be more satisfied with the universe as it is.
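To make that failure mode concrete, here is a minimal toy sketch in Python; the world-histories, names, and numbers are purely illustrative assumptions of mine, not anything from the original discussion:

```python
# Toy model: U(w) sums utilities over everyone who ever exists in world-history w.
# Each world-history maps every person who exists in it to that person's utility.
world_histories = {
    # Current people survive and live moderately good lives.
    "preserve": {"alice": 6, "bob": 5},
    # Current people are killed (utility 0) and replaced by beings whose
    # preferences are trivially easy to satisfy.
    "replace": {"alice": 0, "bob": 0, "new_1": 9, "new_2": 9, "new_3": 9},
}

def U(w):
    """Sum of utilities over all people who exist in world-history w."""
    return sum(world_histories[w].values())

print(max(world_histories, key=U))  # "replace" -- the maximizer prefers replacement
```

The point is only that nothing in U(w) itself distinguishes satisfying the people who already exist from manufacturing new, easily satisfied people.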
Well, that can’t be what we want. Is there an alternative formulation of preference utilitarianism that doesn’t exhibit this problem? Perhaps. Suppose we instead program the AI to use U’(w) = sum of w’s utilities according to people who exist at the time of decision. This solves DanielLC’s problem, but introduces a new one: time inconsistency.
The new AI’s utility function depends on who exists at the time of decision, and as that time changes and people are born and die, its utility function also changes. If the AI is capable of reflection and self-modification, it should immediately notice that it would maximize its expected utility, according to its current utility function, by modifying itself to use U’’(w) = sum of w’s utilities according to people who existed at time T0, where T0 is a constant representing the time of self-modification.
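Here is the same kind of toy sketch for the time-inconsistency argument, again with made-up populations and numbers; it only illustrates why, judged by its current utility function, the AI would want to freeze the population it sums over:

```python
# Toy model: U'(w) sums utilities over the people who exist at the time of decision,
# so the function the AI maximizes changes as people are born and die.
world_histories = {
    "plan_A": {"alice": 10, "bob": 8, "carol": 1},
    "plan_B": {"alice": 2, "bob": 3, "carol": 20},
}

def U_prime(w, people_alive):
    """Sum of world-history w's utilities over the people alive at decision time."""
    return sum(world_histories[w].get(p, 0) for p in people_alive)

population_T0 = {"alice", "bob"}        # alive when the AI can first self-modify
population_later = {"alice", "carol"}   # later on: bob has died, carol was born

best_at_T0 = max(world_histories, key=lambda w: U_prime(w, population_T0))
best_later = max(world_histories, key=lambda w: U_prime(w, population_later))
print(best_at_T0, best_later)  # plan_A plan_B -- the two selves disagree

# Judged by the T0 utility function, the later self's choice is strictly worse,
# so the AI at T0 maximizes its current expected utility by rewriting itself to
# use U''(w): U' with the population frozen at T0.
print(U_prime(best_later, population_T0) < U_prime(best_at_T0, population_T0))  # True
```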
The AI is now reflectively consistent, but is this the right outcome? Should the whole future of the universe be shaped only by the preferences of those who happen to be alive at some arbitrary point in time? If you’re a utilitarian in the first place, this is presumably not the kind of utilitarianism you’d want to subscribe to.
So, what is the solution to this problem? Robin Hanson’s approach to moral philosophy may work. It tries to take into account everyone’s preferences: those who lived in the past, those who will live in the future, and those who have the potential to exist but don’t. But I don’t think he has worked out (or written down) the solution in detail. For example, is the utilitarian AI supposed to sum over every logically possible utility function and weight them all equally? If not, what weighting scheme should it use?
Perhaps someone can follow up on Robin’s idea and see where this approach leads? Or does anyone have other ideas for solving this time inconsistency problem?