Bounded utility and infinite utility are different things. A utility function u from outcomes to real numbers is bounded if there is a number M such that for every outcome x, we have |u(x)| < M.
I was confused, thanks. There are two ways that I can imagine having a bounded utility function: either define the function so that it has a finite bound, or define it only over a finite domain. I was only thinking about the former when I wrote that comment (and not assuming its range was limited to the reals, e.g. "infinity" was a valid utility), and so I missed the fact that a utility function can be unbounded as the result of an infinite domain.
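To make that distinction concrete (my own toy example, not anything from your comment): over a finite domain any real-valued utility function is automatically bounded, while over an infinite domain it depends on the function itself:

$$u_1(n) = n \quad (n \in \mathbb{N}) \qquad \text{unbounded: no finite } M \text{ works}$$

$$u_2(n) = 1 - 2^{-n} \quad (n \in \mathbb{N}) \qquad \text{bounded: } |u_2(n)| < 1 \text{ for every } n$$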
When we talk about utility functions, we're talking about functions that encode a rational agent's preferences. A utility function does not represent how happy an agent is.
First of all, was I wrong in assuming that A's high preference for an odd number of stars puts it at a disadvantage to B in normalized utility, making B the utility monster? If I was, please explain how A can become a utility monster if, e.g., A's most important preference is having an odd number of stars and B's most important preference is happily living forever. Doesn't a utility monster only arise if one agent's utility for the same things is overvalued, which normalization should prevent?
What does it mean for A and B to "have identical preferences" if in fact A has an overriding preference for an odd number of stars? I think that the maximum utility (if it exists) that an agent can achieve should be normalized against the maximum utility of other agents; otherwise the immediate result is a utility monster. It's one thing for A to have its own high utility for something; it's quite another for A to have arbitrarily more utility than any other agent.
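One way to make "normalized against the maximum utility" precise (this is my own formalization, not necessarily what anyone here has in mind) is ordinary min-max normalization of each agent's utility function, assuming the maximum and minimum exist:

$$\hat{u}_i(x) = \frac{u_i(x) - \min_y u_i(y)}{\max_y u_i(y) - \min_y u_i(y)}$$

Every agent's worst outcome then scores 0 and its best scores 1 before any averaging, so no agent can have arbitrarily more utility than any other.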
Also, if A's highest preference has no chance of being an outcome, then isn't the solution to fix A's utility function instead of favoring B's achievable preferences? The other possibility is to do run-off voting on desired outcomes (a sketch of what I mean is below): A's top votes are always going to be for outcomes with an odd number of stars, but when those world states lose, the votes will run off to the outcomes that are identical except for there being an even or indeterminate number of stars, and then A's and B's voting preferences will be exactly the same.
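Here's a rough Python sketch of the run-off idea, purely to pin down what I mean; the ballot structure is made up for illustration:

```python
from collections import Counter

def instant_runoff(ballots):
    """Pick an outcome by instant run-off voting.

    ballots: one list per agent, ordering outcomes from most to least
    preferred. Each round, every ballot counts toward its highest-ranked
    surviving outcome; the outcome with the fewest votes is eliminated,
    and its supporters' votes run off to their next surviving choice.
    """
    remaining = {o for ballot in ballots for o in ballot}
    while len(remaining) > 1:
        tally = Counter()
        for ballot in ballots:
            for outcome in ballot:
                if outcome in remaining:
                    tally[outcome] += 1
                    break
        top, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()):
            return top                      # an outcome has a strict majority
        remaining.remove(min(remaining, key=lambda o: tally[o]))
    return remaining.pop()
```

In the star example, A's ballot ranks all the odd-star worlds above everything else; once those are eliminated, A's remaining preferences over the surviving worlds match B's exactly, so the run-off ends up where B's ballot alone would have put it.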
Ah, you're right. B would be the utility monster, not because A's normalized utilities are lower, but because the intervals between them are shorter. I could go into more detail in a top-level Discussion post, but I think we're basically in agreement here.
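A toy set of numbers (mine, purely illustrative) shows the compression effect. Normalize both agents to [0, 1], let A assign 1 to the unreachable odd-star world and at most 0.02 to anything achievable, and let B's achievable outcomes span the whole range. Then for an achievable outcome X that B loves and an achievable outcome Y that A prefers:

$$\tfrac{1}{2}\,(u_A(X) + u_B(X)) = \tfrac{1}{2}\,(0.01 + 1.00) = 0.505$$

$$\tfrac{1}{2}\,(u_A(Y) + u_B(Y)) = \tfrac{1}{2}\,(0.02 + 0.30) = 0.16$$

The averaging agent sides with B every time: A can never put more than 0.02 of normalized utility on the table to outweigh anything B wants.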
Also, if A's highest preference has no chance of being an outcome then isn't the solution to fix A's utility function instead of favoring B's achievable preferences?
Well, now you're abandoning the program of normalizing utilities and averaging them, the inadequacy of which program this thought experiment was meant to demonstrate.
I've been reading through this to get a sense of the state of the art at the moment:
http://lukeprog.com/SaveTheWorld.html
Near the bottom, in the discussion of safe utility functions, the focus seems to be on analyzing human values and extracting from them some sort of clean, mathematical utility function that is universal across humans. This seems like an enormously difficult (potentially impossible) way of solving the problem, due to all the issues mentioned there.
Why shouldn't we just try to design an average bounded utility maximizer? You'd build models of all your agents (if you can't model arbitrary ordered information systems, you haven't got an AI), run them through your model of the future resulting from a choice, take the sum of their utility over time, and take the average across all the people and all the time steps. To measure the utility (or at least approximate it), you could just ask the models. The number this spits out is the output of your utility function. It'd probably also be wise to add a reflexive consistency criterion, such that the original state of your model must consider all future states to be 'the same person' (and I acknowledge that that last one is going to be a bitch to formalize). When you've got this utility function, you just... maximize it.
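A minimal sketch of what I mean, in Python. Everything here is hypothetical: agent_models, future_model, report_utility, and is_same_person are stand-ins for machinery (whole-person models, a predictive world model, an identity criterion) that I'm not claiming to know how to build.

```python
def average_bounded_utility(choice, agent_models, future_model, horizon=100):
    """Score a candidate choice by the average utility the modelled agents
    report over a simulated future, with each report clamped to [0, 1]."""
    total, samples = 0.0, 0
    for t in range(horizon):
        state = future_model(choice, t)      # predicted world state at time t
        for agent in agent_models:
            # Reflexive consistency: the original model must regard its
            # future self in this state as 'the same person'.
            if not agent.is_same_person(state):
                return float("-inf")         # identity-breaking futures are ruled out
            # Bounding: clamp whatever the model reports into [0, 1], so no
            # single agent's report can dominate the average.
            utility = min(1.0, max(0.0, agent.report_utility(state)))
            total += utility
            samples += 1
    return total / samples if samples else 0.0

def pick_action(candidate_choices, agent_models, future_model):
    """Maximize the averaged bounded utility over the candidate choices."""
    return max(candidate_choices,
               key=lambda c: average_bounded_utility(c, agent_models, future_model))
```

The clamp is what I mean below by boundedness protecting against utility monsters; the is_same_person check is my stab at the reflexive consistency criterion, and it's exactly the part I expect to be hard to formalize.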
Something like this approach seems much more robust. Even if human values are inconsistent, we still end up in a universe where most (possibly all) people are happy with their lives, and nobody gets wireheaded. Because the utility function is bounded, you're even protected against utility monsters. Has something like this been considered? Is there an obvious reason why it won't work or would produce undesirable results?
Thanks,
Dolores