To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints even when those joints are obscure and fractal - and then build a system that implements that cut.
There are lots of suggestions on how to do this, and a lot of work in the area. But having been over the same turf again and again, it's possible we've got a bit stuck in a rut. So to generate new suggestions, I'm proposing that we look at a vaguely analogous but distinctly different question: how would you ban porn?
Suppose you're put in charge of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.
The distinction between the two cases is certainly not easy to spell out, and many are reduced to saying the equivalent of "I know it when I see it" when defining pornography. In terms of AI, this is equivalent to "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts (a toy sketch of that loop is below). But suppose that approach were not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Could you make the criteria sufficiently clear that a scriptwriter would know exactly what they needed to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?
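Purely for illustration, here is a minimal sketch of what "value loading" might look like as an uncertainty-sampling loop. Everything in it - the `Work` class, the two made-up features, the `human_judge` stand-in, the linear boundary - is hypothetical and chosen only to show the shape of the loop, not anyone's actual proposal.

```python
# A toy "value loading" loop: propose the edge case the current concept is
# least sure about, ask the human judge, update the concept, repeat.

import random
from dataclasses import dataclass


@dataclass
class Work:
    explicitness: float       # 0.0 (chaste) .. 1.0 (graphic)
    artistic_framing: float   # 0.0 (none) .. 1.0 (heavy)


def human_judge(work: Work) -> int:
    """Stand-in for the human decision maker: 1 = pornography, 0 = eroticism."""
    return int(work.explicitness - 0.5 * work.artistic_framing > 0.5)


class ValueLoader:
    """Maintains a crude linear boundary and queries the judge on edge cases."""

    def __init__(self) -> None:
        self.w = [0.0, 0.0]
        self.b = 0.0

    def score(self, work: Work) -> float:
        return self.w[0] * work.explicitness + self.w[1] * work.artistic_framing + self.b

    def most_uncertain(self, pool: list) -> Work:
        # The "edge case": the work the current boundary is least sure about.
        return min(pool, key=lambda w: abs(self.score(w)))

    def update(self, work: Work, label: int, lr: float = 0.5) -> None:
        # Perceptron-style correction toward the judge's answer.
        prediction = int(self.score(work) > 0)
        err = label - prediction
        self.w[0] += lr * err * work.explicitness
        self.w[1] += lr * err * work.artistic_framing
        self.b += lr * err

    def load_values(self, pool: list, queries: int = 20) -> None:
        remaining = list(pool)
        for _ in range(queries):
            edge_case = self.most_uncertain(remaining)
            remaining.remove(edge_case)
            label = human_judge(edge_case)  # ask the "learned judge"
            self.update(edge_case, label)


if __name__ == "__main__":
    random.seed(0)
    pool = [Work(random.random(), random.random()) for _ in range(200)]
    loader = ValueLoader()
    loader.load_values(pool)
    print("learned boundary:", loader.w, loader.b)
```

In this toy version the "AI's concept" is just a line over two invented features and the judge is a deterministic function; the only point is the shape of the interaction: surface edge cases, ask, update.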
To get maximal creativity, it's best to ignore the ultimate aim of the exercise (to find inspiration for methods that could be adapted to AI) and just focus on the problem itself. Is it even possible to get a reasonable solution to this question - a question much simpler than designing an FAI?
Thank you for the thoughtful reply!
In the white-box approach it can't really hide. But I guess it's rather tangential to the discussion.
What do you mean by "follow a utility function"? Why do you think humans don't do it? If there is no such thing, what does it mean to have a correct solution to the FAI problem?
The main problem with Yvain's thesis is in the paragraph:
What does Yvain mean by "give the robot human level intelligence"? If the robot's code remained the same, in what sense does it have human level intelligence?
This is the part of the CEV proposal which always seemed redundant to me. Why should we do it? If you're designing the AI, why wouldn't you use your own utility function? At worst, an average utility function of the group of AI designers? Why do we want / need the whole humanity there? Btw, I would obviously prefer my utility function in the AI but I'm perfectly willing to settle on e.g. Yudkowsky's.
It seems that you're identifying my proposal with something like "maximize pleasure". The latter is a notoriously bad idea, as was discussed endlessly. However, my proposal is completely different. The AI wouldn't do something the upload wouldn't do because such an action is opposed to the upload's utility function.
Actually, I'm not far from it (at least I don't think I'm further than CEV). Note that I have already given a formal definition of I(A, U), where I = intelligence, A = agent, U = utility function. Now we can do something like: "U(A) is defined to be the U such that the probability that I(A, U) > I(R, U) for a random agent R is maximal". Maybe it's more correct to use something like a thermal ensemble with I(A, U) playing the role of energy; I don't know, and I don't claim to have solved it all already. I just think it's a good research direction.
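For what it's worth, here is one way the two variants could be written down. The reference distribution over random agents and the inverse temperature are placeholders introduced for illustration, not part of the original definition of I(A, U).

```latex
% The "argmax" reading: U(A) is the utility function that A is most
% unusually good at maximizing, relative to a reference distribution
% \mathcal{R} over random agents (the choice of \mathcal{R} is left open).
\[
  U(A) \;:=\; \operatorname*{arg\,max}_{U}\;
  \Pr_{R \sim \mathcal{R}}\!\left[\, I(A, U) > I(R, U) \,\right]
\]
% The "thermal ensemble" reading: a distribution over candidate utility
% functions, with I(A, U) playing the role of (negative) energy so that
% higher I(A, U) gets exponentially more weight; \beta is an inverse
% temperature parameter.
\[
  \Pr(U \mid A) \;\propto\; \exp\!\big(\beta\, I(A, U)\big)
\]
```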
Humans are neither independent nor transitive. Human preferences change over time, depending on arbitrary factors, including how choices are framed. Humans suffer because of things they cannot affect, and humans suffer because of details of their probability assessment (e.g. ambiguity aversion). That bears repeating - humans have preferences over their state of knowledge. The core of this is that "assessment of fact" and "values" are not disc...