Few people, when learning their values in childhood, ended up considering examples such as this one and explicitly learning that they were wrong. Yet the persuasive power of that example comes from the fact that most people instantly reject the desirability of the dopamine drip scenario when it's suggested to them.
I for one don't "instantly reject" the desirability of this scenario. I think it's a difficult philosophical problem whether the dopamine drip is desirable or not. My worry is that either the AI will not be as uncertain as I am about it, or it will not handle or resolve the normative uncertainty in the same way that I would or should.
Today's machine learning algorithms tend to be unreasonably certain (and wrong) about inputs very different from their training data, but that is perhaps just because machine learning researchers currently focus mostly on commercial settings, where inputs rarely differ much from the training data and getting things wrong carries no terrible consequences. So maybe we can expect this to improve as researchers start to focus more on safety.
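The overconfidence problem can be seen even in the simplest models. Here is a minimal sketch (not from the original discussion; the weights and inputs are made up for illustration): a logistic classifier fitted to data near x ≈ 1, then queried far outside that range. The sigmoid saturates, so the model reports near-certainty on an input unlike anything it was trained on.

```python
import math

def sigmoid(z):
    """Logistic function: maps a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted weights for training data clustered around x = 1.
w, b = 3.0, -3.0

# The first two inputs resemble the training data; the last does not.
for x in [0.5, 1.5, 100.0]:
    p = sigmoid(w * x + b)
    # At x = 100 the score is huge, so the reported probability saturates
    # to essentially 1.0, even though the model has never seen such inputs.
    print(f"x = {x:6.1f}   P(class 1) = {p:.6f}")
```

Nothing in the model's output distinguishes "confident because the input is well-covered by training data" from "confident because a linear score extrapolated to a saturated probability", which is the failure mode being described.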
But even if we manage to build an AI that is properly uncertain about whether something like the dopamine drip scenario is good or bad, how do we get it to resolve its uncertainty in the right way, especially if its creators/owners are also uncertain or possibly wrong, so that it can't just ask? Resolving the uncertainty incorrectly, or having the uncertainty permanently frozen into its utility function, seem to be the two big risks here. So I worry just as much about the reverse maverick nanny scenario, where we eventually, after centuries of philosophical progress, figure out that we actually do want to be put on dopamine drips, but the AI says "Sorry, I can't let you do that."
Today's machine learning algorithms tend to be unreasonably certain (and wrong) about inputs very different from their training data
Read about covariate shift. (More generally, ML people are now getting into systematic biases, including causal inference, in a big way.)
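For readers unfamiliar with the term: under covariate shift, the input distribution p(x) differs between training and deployment while p(y|x) stays fixed, and the standard correction is to reweight training points by p_test(x)/p_train(x). A minimal sketch, with all distributions, losses, and numbers made up for illustration:

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density, used here as a stand-in for p(x)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def p_train(x):
    # Hypothetical: training inputs cluster around 0.
    return normal_pdf(x, 0.0, 1.0)

def p_test(x):
    # Hypothetical: deployment inputs cluster around 2.
    return normal_pdf(x, 2.0, 1.0)

def loss(x):
    # Hypothetical per-example loss of some fixed model.
    return (x - 1.0) ** 2

train_xs = [-1.0, -0.5, 0.0, 0.5, 1.0]  # a tiny "training sample"

# Naive estimate: average loss over the training sample.
naive = sum(loss(x) for x in train_xs) / len(train_xs)

# Importance-weighted estimate: weight each point by p_test(x)/p_train(x),
# so the average targets the deployment distribution instead.
weights = [p_test(x) / p_train(x) for x in train_xs]
weighted = sum(w * loss(x) for w, x in zip(weights, train_xs)) / sum(weights)

print(f"naive estimate:      {naive:.3f}")
print(f"reweighted estimate: {weighted:.3f}")
```

The two estimates disagree because the training sample over-represents inputs the deployed model will rarely see; the reweighting is exactly the kind of systematic-bias correction the comment alludes to.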
This has little to do with AGI, though.
New essay summarizing some of my latest thoughts on AI safety, ~3500 words. I explain why I think that some of the thought experiments that have previously been used to illustrate the dangers of AI are flawed and should be used very cautiously, why I'm less worried about the dangers of AI than I used to be, and some of the reasons why I do continue to be somewhat worried.
Back-cover celebrity endorsement: "Thanks, Kaj, for a very nice write-up. It feels good to be discussing actually meaningful issues regarding AI safety. This is a big contrast to discussions I've had in the past with MIRI folks on AI safety, wherein they have generally tried to direct the conversation toward bizarre, pointless irrelevancies like "the values that would be held by a randomly selected mind", or "AIs with superhuman intelligence making retarded judgments" (like tiling the universe with paperclips to make humans happy), and so forth.... Now OTOH, we are actually discussing things of some potential practical meaning ;p ..." -- Ben Goertzel