For example, if it develops some diet drug that lets you safely enjoy eating and still stay skinny and beautiful, that might be a better result than you could provide for yourself, and it doesn't need any special understanding of you to make that happen.
It might not need special knowledge of my psychology, but it certainly needs special knowledge of my physiology.
But notice that the original point was about human preferences. Even if it provides new technologies that dissolve internal conflicts, the question of whether or not to use the technology becomes a conflict. Remember, we live in a world where some people have strong ethical objections to vaccines. An old psychological finding is that oftentimes, giving people more options makes them worse off. If the AI notices that one of my modules enjoys sensory pleasure, offers to wirehead me, and I reject it on philosophical grounds, I could easily become consumed by regret or struggles with temptation, and wish that I never had been offered wireheading in the first place.
Putting an inferior argument first is good if you want to try to get the last word, but it's not a useful part of problem solving. You should try to find the clearest problem where solving that problem solves all the other ones.
I put the argument of internal conflicts first because it was the clearest example, and you'll note it obliquely refers to the argument about status. Did you really think that, if a drug were available to make everyone have perfectly sculpted bodies, one would get the same social satisfaction from that variety of beauty?
If it can do a reasonable job of comparing utilities across people, then maximizing average utility seems to do the right thing here.
I doubt it can measure utilities; as I argued two posts ago, and simple average utilitarianism is so wracked with problems I'm not even sure where to begin.
Comparing utilities between arbitrary rational agents doesn't work, but comparing utilities between humans seems to -- there's an approximate universal maximum (getting everything you want) and an approximate universal minimum (you and all your friends and relatives getting tortured to death).
A common tactic in human interaction is to care about everything more than the other person does, and explode (or become depressed) when they don't get their way. How should such real-life utility monsters be dealt with?
Status conflicts are not one of the interesting use cases.
Why do you find status uninteresting?
A common tactic in human interaction is to care about everything more than the other person does, and explode (or become depressed) when they don't get their way. How should such real-life utility monsters be dealt with?
If everyone's inferred utility goes from 0 to 1, and the real-life utility monster cares more than the other people about one thing, the inferred utility will say he cares less than other people about something else. Let him play that game until the something else happens, then he loses, and that's a fine outcome.
...I doubt it can measur
Anyone who does not believe mental states are ontologically fundamental - ie anyone who denies the reality of something like a soul - has two choices about where to go next. They can try reducing mental states to smaller components, or they can stop talking about them entirely.
In a utility-maximizing AI, mental states can be reduced to smaller components. The AI will have goals, and those goals, upon closer examination, will be lines in a computer program.
But in the blue-minimizing robot, its "goal" isn't even a line in its program. There's nothing that looks remotely like a goal in its programming, and goals appear only when you make rough generalizations from its behavior in limited cases.
Philosophers are still very much arguing about whether this applies to humans; the two schools call themselves reductionists and eliminativists (with a third school of wishy-washy half-and-half people calling themselves revisionists). Reductionists want to reduce things like goals and preferences to the appropriate neurons in the brain; eliminativists want to prove that humans, like the blue-minimizing robot, don't have anything of the sort until you start looking at high level abstractions.
I took a similar tack asking ksvanhorn's question in yesterday's post - how can you get a more accurate picture of what your true preferences are? I said:
A more practical example: when people discuss cryonics or anti-aging, the following argument usually comes up in one form or another: if you were in a burning building, you would try pretty hard to get out. Therefore, you must strongly dislike death and want to avoid it. But if you strongly dislike death and want to avoid it, you must be lying when you say you accept death as a natural part of life and think it's crass and selfish to try to cheat the Reaper. And therefore your reluctance to sign up for cryonics violates your own revealed preferences! You must just be trying to signal conformity or something.
The problem is that not signing up for cryonics is also a "revealed preference". "You wouldn't sign up for cryonics, which means you don't really fear death so much, so why bother running from a burning building?" is an equally good argument, although no one except maybe Marcus Aurelius would take it seriously.
Both these arguments assume that somewhere, deep down, there's a utility function with a single term for "death" in it, and all decisions just call upon this particular level of death or anti-death preference.
More explanatory of the way people actually behave is that there's no unified preference for or against death, but rather a set of behaviors. Being in a burning building activates fleeing behavior; contemplating death from old age does not activate cryonics-buying behavior. People guess at their opinions about death by analyzing these behaviors, usually with a bit of signalling thrown in. If they desire consistency - and most people do - maybe they'll change some of their other behaviors to conform to their hypothesized opinion.
One more example. I've previously brought up the case of a rationalist who knows there's no such thing as ghosts, but is still uncomfortable in a haunted house. So does he believe in ghosts or not? If you insist on there being a variable somewhere in his head marked $belief_in_ghosts = (0,1) then it's going to be pretty mysterious when that variable looks like zero when he's talking to the Skeptics Association, and one when he's running away from a creaky staircase at midnight.
But it's not at all mysterious that the thought "I don't believe in ghosts" gets reinforced because it makes him feel intelligent and modern, and staying around a creaky staircase at midnight gets punished because it makes him afraid.
Behaviorism was one of the first and most successful eliminationist theories. I've so far ignored the most modern and exciting eliminationist theory, connectionism, because it involves a lot of math and is very hard to process on an intuitive level. In the next post, I want to try to explain the very basics of connectionism, why it's so exciting, and why it helps justify discussion of behaviorist principles.