They have preferences like ambiguity aversion, eg being willing to pay to find out, during a holiday, whether they were accepted for a job, while knowing that they can't make any relevant decisions with that early knowledge. This is not compatible with following a standard utility function.
I don't know what you mean by "standard" utility function. I don't even know what you mean by "following". We want to find out since uncertainty makes you nervous, being nervous is unpleasant and pleasure is a terminal value. It is entirely consistent with having a utility function and with my formalism in particular.
Humans are not ideal rational optimizers of their respective utility functions.
Then why claim that they have one? If humans have intransitive preferences (A>B>C>A), as I often do, then why claim that actually their preferences are secretly transitive but they fail to act on them properly?
In what epistemology are you asking this question? That is, what is the criterion according to which the validity of answer would be determined?
If you don't think human preferences are "secretly transitive", then why do you suggest the following:
Whenever revealed preferences are non-transitive or non-independent, use the person's stated meta-preferences to remove the issue. The AI thus calculates what the person would say if asked to resolve the transitivity or independence (for people who don't know about the importance of resolving them, the AI would present them with a set of transitive and independent preferences, derived from their revealed preferences, and have them choose among them).
What is the meaning of asking a person to resolve intransitivities if there are no transitive preferences underneath?
I don't even know what you mean by "following".
That is, what is the criterion according to which the validity of answer would be determined?
Those are questions for you, not for me. You're claiming that humans have a hidden utility function. What do you mean by that, and what evidence do you have for your position?
To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and them implement a system that implements that cut.
There are lots of suggestions on how to do this, and a lot of work in the area. But having been over the same turf again and again, it's possible we've got a bit stuck in a rut. So to generate new suggestions, I'm proposing that we look at a vaguely analogous but distinctly different question: how would you ban porn?
Suppose you're put in change of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.
The distinction between the two case is certainly not easy to spell out, and many are reduced to saying the equivalent of "I know it when I see it" when defining pornography. In terms of AI, this is equivalent with "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?
To get maximal creativity, it's best to ignore the ultimate aim of the exercise (to find inspirations for methods that could be adapted to AI) and just focus on the problem itself. Is it even possible to get a reasonable solution to this question - a question much simpler than designing a FAI?