Brief Question about FAI approaches

Dolores1984

I've been reading through this to get a sense of the state of the art at the moment:

http://lukeprog.com/SaveTheWorld.html

Near the bottom, when discussing safe utility functions, the discussion seems to center on analyzing human values and extracting from them some sort of clean, mathematical utility function that is universal across humans. This seems like an enormously difficult (potentially impossible) way of solving the problem, due to all the problems mentioned there.

Why shouldn't we just try to design an average bounded utility maximizer? You'd build models of all your agents (if you can't model arbitrary ordered information systems, you haven't got an AI), run them through your model of the future resulting from a choice, sum each agent's utility over time, and take the average across all people and all times. To measure the utility (or at least approximate it), you could just ask the models. The number this spits out is the output of your utility function. It'd probably also be wise to add a reflexive consistency criterion, such that the original state of your model must consider all future states to be 'the same person' (and I acknowledge that that last one is going to be a bitch to formalize). When you've got this utility function, you just... maximize it.
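To make the proposal concrete, here is a minimal Python sketch of the scoring step. Everything in it is an assumption for illustration: a person model is just a function that reports a utility for a world state, a predicted future is a list of states, and the bound is taken to be the range [0, 1]. The reflexive consistency check is left out, since it is the part with no obvious formalization.

```python
from typing import Callable, Dict, Sequence

# Hypothetical stand-in: "asking the model" is a function from a world state
# to a reported utility. Real person models would be far more complex.
PersonModel = Callable[[float], float]


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Force a reported utility into the bounded range [lo, hi]."""
    return max(lo, min(hi, x))


def average_bounded_utility(people: Sequence[PersonModel],
                            trajectory: Sequence[float]) -> float:
    """Average clamped utility over all people and all time steps."""
    if not people or not trajectory:
        return 0.0
    total = 0.0
    for person in people:
        for state in trajectory:
            total += clamp(person(state))
    return total / (len(people) * len(trajectory))


def choose(people: Sequence[PersonModel],
           options: Dict[str, Sequence[float]]) -> str:
    """Pick the choice whose predicted trajectory scores highest."""
    return max(options,
               key=lambda name: average_bounded_utility(people, options[name]))
```

With this clamp, a model that reports a utility of 1000 counts the same as one that reports 1, which is the protection against utility monsters that the bound is meant to give.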

Something like this approach seems much more robust. Even if human values are inconsistent, we still end up in a universe where most (possibly all) people are happy with their lives, and nobody gets wireheaded. Because it's bounded, you're even protected against utility monsters. Has something like this been considered? Is there an obvious reason it won't work, or would produce undesirable results?

Thanks,

Dolores

Let me review the features of the algorithm:

The FAI maximizes overall utility.
It obtains a value for the overall utility of a possible world by adding the personal utilities of everyone in the world. But there is a bound. It's unclear to me whether the bound applies directly to personal utilities - so that a personal utility exceeding the bound is reduced to the bound for the purposes of subsequent calculation - or whether the bound applies to the sum of personal utilities - so that if the overall utility of a possible world exceeds the bound, it is reduced to the bound for the purposes of decision-making (comparison between worlds).
If one of the people whose personal utilities get summed is a future continuation of an existing person (someone who exists at the time the FAI gets going), then the present-day person gets to say whether that is a future self of which they would approve.
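The two readings of the bound can give different rankings of worlds. A minimal Python sketch, with hypothetical function names and made-up numbers, shows the difference:

```python
from typing import Sequence


def bound_per_person(utilities: Sequence[float], bound: float) -> float:
    """Reading 1: cap each personal utility at the bound, then sum."""
    return sum(min(u, bound) for u in utilities)


def bound_on_total(utilities: Sequence[float], bound: float) -> float:
    """Reading 2: sum the personal utilities, then cap the total."""
    return min(sum(utilities), bound)


# Illustrative worlds only: world_a contains one very large personal utility.
world_a = [1.0, 1.0, 50.0]
world_b = [3.0, 3.0, 3.0]
```

With a bound of 5, the per-person reading scores `world_a` at 7 and `world_b` at 9, so `world_b` wins. The total reading scores both at 5, so the FAI would be indifferent between them.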

The last part is the most underspecified aspect of the algorithm: how the approval-judgement is obtained, what form it takes, and how it affects the rest of the decision-making calculation. Is the FAI only to consider scenarios where future continuants of existing people are approved continuants, with any scenario containing an unapproved continuant just ruled out a priori? Or are there degrees of approval?
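The two options in that question could be sketched as follows. This is only one possible reading, not something the original proposal specifies: approval is assumed to be a number in [0, 1] per person, the hard version uses an arbitrary threshold, and the graded version weights each utility by its approval.

```python
from typing import Optional, Sequence


def score_hard_filter(utilities: Sequence[float],
                      approvals: Sequence[float],
                      threshold: float = 0.5) -> Optional[float]:
    """Rule the scenario out entirely if any continuant is unapproved.

    Returns None for a ruled-out scenario, otherwise the plain sum.
    The threshold value is an assumption made for this sketch.
    """
    if any(a < threshold for a in approvals):
        return None
    return sum(utilities)


def score_graded(utilities: Sequence[float],
                 approvals: Sequence[float]) -> float:
    """Degrees of approval: weight each utility by its approval level."""
    return sum(u * a for u, a in zip(utilities, approvals))
```

For two people with utility 1 each and approvals of 1.0 and 0.25, the hard filter discards the scenario, while the graded version scores it at 1.25.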

I think I will call my version (which probably deviates from your conception somewhere) a "Bounded Approved Utility Maximizer". It's still a dumb name, but it will have to do until we work our way to a greater level of clarity.

By bounded, I simply meant that all reported utilities are normalized to a universal range before being summed. Put another way, every person has a finite, equal fraction of the machine's utility to distribute among possible future universes. This is entirely to avoid utility monsters. It's basically a vote, and they can split it up however they like.
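As a rough Python sketch of the vote reading: each person's raw reports over the candidate futures are rescaled so they sum to 1, and the shares are then added up. The names, the assumption that raw reports are non-negative, and the even split for an all-zero ballot are choices made for this example.

```python
from typing import Dict, Sequence


def normalize_ballot(raw: Dict[str, float]) -> Dict[str, float]:
    """Rescale one person's raw utilities into shares that sum to 1."""
    if any(v < 0 for v in raw.values()):
        raise ValueError("this sketch assumes non-negative raw utilities")
    total = sum(raw.values())
    if total == 0:
        # No stated preference: split the vote evenly (an assumption).
        return {world: 1.0 / len(raw) for world in raw}
    return {world: v / total for world, v in raw.items()}


def tally(ballots: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Sum every person's normalized shares for each candidate future."""
    totals: Dict[str, float] = {}
    for raw in ballots:
        for world, share in normalize_ballot(raw).items():
            totals[world] = totals.get(world, 0.0) + share
    return totals


ballots = [
    {"A": 1, "B": 3},
    {"A": 1000, "B": 0},  # a would-be utility monster still gets one vote
    {"A": 0, "B": 5},
]
```

Here the person reporting 1000 for future A contributes exactly one vote's worth, so B still comes out ahead, 1.75 to 1.25.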

Also, the reflexive consistency criterion should probably be applied even to people who don't exist yet. We don't want plans to rely on creating new people, then turning them into happy monsters, even i...