In terms of AI, this is equivalent with "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you
But it is, and the contrary approach of teaching humans to recognize things doesn't have an obvious relation to FAI, unless we think that the details of teaching human brains by instruction and example are relevant to how you'd set up a similar training program for an unspecified AI algorithm. If this is the purported connection to FAI, it should be spelled out explicitly, and the possible failure of the connection spelled out explicitly. I'm also not sure the example is a good one for the domain. Why not ask how to distinguish happiness from pleasure, what people really want from what they say they want, or panic from justified fear? Or, if we want to start with something more object-level: what should be tested, and when should you draw confident conclusions about what someone's taste buds will like (under various circumstances)? That is, how much do you need to know to decide that someone will like the taste of a cinnamon candy if they've never tried anything cinnamon before? Porn vs. erotica seems meant to take us into a realm of conflicting values, disagreements, legalisms, and a large prior literature potentially getting in the way of original thinking - if each of these aspects is meant to be relevant, then can the relevance of each aspect be spelled out?
I like the "What does it take to predict taste buds?" question, of those I brainstormed above, because it's something we could conceivably test in practice. Or maybe an even more practical conjugate would be Netflix-style movie score prediction, only you can ask the subject whatever you like, have them rate particular other movies, etc., all to predict the rating on that one movie.
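To make the movie-score version concrete, here is a minimal sketch of one way such a prediction could be set up. It is only an illustration under stated assumptions: ratings are positive scores (say 1 to 5), the predictor is a plain similarity-weighted average over other raters, and the names `cosine_similarity` and `predict_rating` are hypothetical, not anything Netflix actually uses.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both raters have scored."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(subject, others, target):
    """Predict the subject's score for `target` (hypothetical helper).

    `subject` and each entry of `others` map movie title -> score.
    Returns a similarity-weighted average of the other raters' scores
    for `target`, or None if nobody comparable has rated it.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for ratings in others:
        if target not in ratings:
            continue
        weight = cosine_similarity(subject, ratings)
        weighted_sum += weight * ratings[target]
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Illustrative data only: the subject rates two movies, and we guess a third.
subject = {"Movie A": 5, "Movie B": 1}
others = [
    {"Movie A": 5, "Movie B": 1, "Target": 4},  # tastes match the subject
    {"Movie A": 1, "Movie B": 5, "Target": 2},  # tastes roughly opposite
]
prediction = predict_rating(subject, others, "Target")
```

The interesting part of the question is what this sketch leaves out: which movies to ask the subject to rate, and how many answers you need before the prediction deserves confidence.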
Porn vs. erotica seems meant to take us into a realm of conflicting values, disagreements, legalisms, and a large prior literature potentially getting in the way of original thinking - if each of these aspects is meant to be relevant, then can the relevance of each aspect be spelled out?
Well, conflicting values are obviously relevant, and disagreements seem so as well, to a lesser extent (consider the problem of choosing priors for an AI), for starters.
To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then build a system that implements that cut.
There are lots of suggestions on how to do this, and a lot of work in the area. But having been over the same turf again and again, it's possible we've got a bit stuck in a rut. So to generate new suggestions, I'm proposing that we look at a vaguely analogous but distinctly different question: how would you ban porn?
Suppose you're put in charge of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.
The distinction between the two cases is certainly not easy to spell out, and many are reduced to saying the equivalent of "I know it when I see it" when defining pornography. In terms of AI, this is equivalent with "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Could you make them sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?
To get maximal creativity, it's best to ignore the ultimate aim of the exercise (to find inspirations for methods that could be adapted to AI) and just focus on the problem itself. Is it even possible to get a reasonable solution to this question - a question much simpler than designing a FAI?