Just criminalize porn, and leave it to the jury to decide whether or not it's porn. That's how we handle most moral ambiguities, isn't it?
I will assume that the majority of the population shares my definition of porn and is on board with this, creating low risk of an activist jury (otherwise this turns into the harder problem of "how to seize power from the people".)
Edit: On more careful reading, I guess that's not allowed since it would fall in the "I know it when I see it" category. But then, since we obviously are not going to write an actual algorithm, how specific does the answer need to be?
Would "It is pornography if the intention is primarily to create sexual arousal, and it's up to the jury to decide intention" be an acceptably well-defined answer? Would "I'm going to use theoretically possible mind-reading technology to determine whether or not the viewer / creator of the pornography was primarily intending to view / create sexually arousing stimuli" be an acceptably well-defined answer? Do I have to define the precise threshold at which something is "primarily" about a factor with neuron-level accuracy, or can I just approximately define the threshold of "primarily" via a corpus of examples?
I guess what I'm saying is... "how to ban pornography" seems to be "solved" in the abstract as soon as you adequately define pornography, and the rest is all implementation.
To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then implement a system that implements that cut.
There are lots of suggestions on how to do this, and a lot of work in the area. But having been over the same turf again and again, it's possible we've got a bit stuck in a rut. So to generate new suggestions, I'm proposing that we look at a vaguely analogous but distinctly different question: how would you ban porn?
Suppose you're put in charge of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.
The distinction between the two cases is certainly not easy to spell out, and many are reduced to saying the equivalent of "I know it when I see it" when defining pornography. In terms of AI, this is equivalent to "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Could you make the distinction sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?
To get maximal creativity, it's best to ignore the ultimate aim of the exercise (to find inspirations for methods that could be adapted to AI) and just focus on the problem itself. Is it even possible to get a reasonable solution to this question - a question much simpler than designing a FAI?