The trolley problem
In 2009, a pair of computer scientists published a paper showing how to make a computer behave like a human on the trolley problem (PDF here). They developed a logic that a computer could use to justify not pushing one person onto the tracks in order to save five other people. They described this feat as showing "how moral decisions can be drawn computationally by using prospective logic programs."
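What such a "logic" amounts to, as far as I can tell, is a rule in the spirit of the doctrine of double effect: harm that is a side effect of saving the five is tolerable, but harm used as the means of saving them is not. Here is a toy sketch of that kind of rule (in Python, not the paper's prospective-logic formalism; the options and flags are mine, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    deaths: int
    harm_is_means: bool   # is someone's death the instrument of the rescue?

def permissible(option: Option) -> bool:
    """Rule in the spirit of double effect: never achieve the good *through* harming someone."""
    return not option.harm_is_means

def choose(options):
    # Minimize deaths, but only among the permissible options (fall back to all if none pass).
    allowed = [o for o in options if permissible(o)] or options
    return min(allowed, key=lambda o: o.deaths)

footbridge = [
    Option("do nothing", deaths=5, harm_is_means=False),
    Option("push one person onto the tracks", deaths=1, harm_is_means=True),
]
print(choose(footbridge).name)   # "do nothing": the fewer-deaths option is ruled out
```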
I would describe it as devoting a lot of time and effort to crippling a reasoning system by encoding human irrationality into its logic.
Which view is correct?
Dust specks
Eliezer argued that we should prefer one person being tortured for 50 years to 3^^^3 people each getting a barely-noticeable dust speck in the eye, once. Most people choose the many dust specks over the torture. Some people argued that "human values" include having a utility aggregation function that rounds tiny (in absolute value) utilities to zero, thus giving the "dust specks" answer. No, Eliezer said; this was an error in human reasoning. Is it an error, or a value?
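For concreteness, here is a minimal sketch of what that aggregation dispute amounts to (the disutility numbers are made up, and 3^100 stands in for 3^^^3, which nothing can represent): a straight sum of disutilities makes the specks collectively worse than the torture, so you pick the torture, while an aggregation rule that rounds negligible individual disutilities to zero makes the specks count for nothing, so you pick the specks.

```python
# Illustrative numbers only; 3**100 stands in for 3^^^3.
TORTURE_DISUTILITY = 1e7      # one person tortured for 50 years (assumed scale)
SPECK_DISUTILITY = 1e-6       # one barely-noticeable dust speck (assumed scale)
NUM_SPECKS = 3 ** 100

def plain_sum(per_person, n):
    """Add up disutility across everyone affected."""
    return per_person * n

def rounded_sum(per_person, n, epsilon=1e-3):
    """Round negligible individual disutilities to zero before adding them up."""
    return 0.0 if abs(per_person) < epsilon else per_person * n

# Straight summation: the specks are collectively worse, so choose the torture.
print(plain_sum(SPECK_DISUTILITY, NUM_SPECKS) > TORTURE_DISUTILITY)    # True
# Rounding aggregation: the specks count for nothing, so choose the specks.
print(rounded_sum(SPECK_DISUTILITY, NUM_SPECKS) > TORTURE_DISUTILITY)  # False
```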
Sex vs. punishment
In Crime and punishment, I argued that people want to punish criminals, even if there is a painless, less-costly way to prevent crime. This means that people value punishing criminals. This value may have evolved to accomplish the social goal of reducing crime. Most readers agreed that, since we can deduce this underlying reason, and accomplish it more effectively through reasoning, preferring to punish criminals is an error in judgement.
Most people want to have sex. This value evolved to accomplish the goal of reproducing. Since we can deduce this underlying reason, and accomplish it more efficiently than by going out to bars every evening for ten years, is this desire for sex an error in judgement that we should erase?
The problem for Friendly AI
Until you come up with a procedure for determining, in general, when something is a value and when it is an error, there is no point in trying to design artificial intelligences that encode human "values".
(P.S. - I think that necessary, but not sufficient, preconditions for developing such a procedure are to agree that only utilitarian ethics are valid, and to agree on an aggregation function.)
Estimates of individual utility functions can be averaged, if you do it right, so far as I can tell. A possible estimate of everybody's utility is a computable function that, given a person id and that person's circumstances, returns a rational number in the interval [0,1]. Discard the computable functions that are inconsistent with people's observed behavior. Average over all remaining possibilities, weighting by the universal prior; this gives you an estimated utility for each person in the range [0,1]. We're estimating utilities for humans, not arbitrary hypothetical creatures, so there's an approximate universal minimum utility (torturing you and everyone you care about to death) and an approximate maximum utility (you get everything you want). And we're estimating everybody's utility with one function, so an estimate that says I don't like to be tortured will be simpler than one that doesn't, even if I have never been tortured, because other people have attempted to avoid torture.
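To make that concrete, here is a rough sketch (the universal prior is uncomputable, so the finite candidate list, the 2^(-length) weights, and the choice-based consistency check below are stand-ins for the real thing):

```python
from typing import Callable, List, Tuple

# A candidate is (description_length_in_bits, utility_function), where the
# utility function maps (person_id, circumstances) to a number in [0, 1].
Candidate = Tuple[int, Callable[[str, dict], float]]

def estimated_utility(candidates: List[Candidate],
                      observations: List[Tuple[str, dict, dict]],
                      person_id: str,
                      circumstances: dict) -> float:
    """Prior-weighted average utility over candidates that fit observed choices."""

    def consistent(u) -> bool:
        # An observation (person, chosen, rejected) is explained only if the
        # candidate assigns the chosen circumstances at least as much utility.
        return all(u(p, chosen) >= u(p, rejected)
                   for p, chosen, rejected in observations)

    # Keep only candidates consistent with observed behavior, weighted 2^(-length)
    # as a stand-in for the universal prior. Assumes at least one survives.
    survivors = [(2.0 ** -length, u) for length, u in candidates if consistent(u)]
    total_weight = sum(w for w, _ in survivors)
    return sum(w * u(person_id, circumstances) for w, u in survivors) / total_weight
```

A real version would also have to handle noisy behavior, rather than discarding any candidate contradicted by a single observation.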
Does that proposal make sense? (I'm concerned that I may have been too brief.)
Does anything obvious break if you average these across humans?
As far as I can see, your proposal is well-defined and consistent. However, even if we ignore all the intractable problems with translating it into practical answers about concrete problems (of which I'm sure you're aware), it is still only one possible way to aggregate and compare utilities interpersonally, with no clear reason to use it instead of some other one that would favor and disfavor different groups and individuals.