I don't think this problem is very hard to resolve. If an AI is programmed to make sense of natural-language concepts like "chocolate bar", there should be a mechanism to acquire a best-effort understanding. So you could rewrite the motivation as:
"create things which the maximum amount of people understand to be a chocolate bar"
or alternatively:
"create things which the programmer is most likely to have understood to be a chocolate bar".
Are you saying the AI will rewrite its goals to make them easier, or will just not be motivated to fill in missing info?
In the first case, why won't it go the whole hog and wirehead? That is to say, any AI which does anything except wirehead will be resistant to that behaviour; it is something that needs to be solved, and which we can assume has been solved in a sensible AI design.
If you programme it with incomplete info, and without any goal to fill in the gaps, then it will have the behaviour you mention... but I'm not seeing the generality. There are many other ways to programme it.
An AI that was programmed to attempt to fill in gaps in knowledge it detected, halt if it found conflicts, etc. would not behave the way you describe. Consider the objection as actually saying:
"Why has the AI been programmed so as to have selective areas of ignorance and stupidity, which are immune from the learning abilities it displays elsewhere?"
PS: This has been discussed before; see
http://lesswrong.com/lw/m5c/debunking_fallacies_in_the_theory_of_ai_motivation/
and
http://lesswrong.com/lw/igf/the_genie_knows_but_doesnt_care/
see particularly
http://lesswrong.com/lw/m5c/debunking_fallacies_in_the_theory_of_ai_motivation/ccpn
We don't know how to program a foolproof method of "filling in the gaps" (and a lot of "filling in the gaps" would be a creative process rather than a mere learning one, such as figuring out how to extend natural language concepts to new areas).
And it helps if people speak about this problem in terms of coding, rather than high-level concepts, because all the specific examples people have ever come up with for coding learning have had these kinds of flaws. Learning natural language is not some sort of natural category.
Coding learning with some imperfections might be OK if the AI is motivated merely to learn, but it is positively pernicious if the AI has other motivations as to what to do with that learning (see my post here for a way of getting around it: https://agentfoundations.org/item?id=947).