I believe you can strip the AI of any preferences towards human utility functions with a simple hack.
Every decision of the AI will have two effects: it will change expected human utility, and it may change the human utility functions themselves.
Have the AI make its decisions only based on the effect on the current expected human utility, not on the changes to the function. Add a term granting a large disutility for deaths, and this should do the trick.
Note the importance of the "current" expected utility in this setup; an AI will decide whether to industrialise a primitive tribe based on their current utility; if it does industrialise them, it will base its subsequent decisions on their new, industrialised utility.
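The decision rule above can be sketched in code. Everything here (the `Outcome` type, the `DEATH_PENALTY` constant, the toy numbers) is a hypothetical illustration of the idea, not anything specified in the comment:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a predicted world-state.
@dataclass
class Outcome:
    happiness: float   # welfare as scored by some utility function
    deaths: int        # deaths the action is predicted to cause

DEATH_PENALTY = 1e9    # large fixed disutility per death, per the comment

def choose_action(actions, predict, current_utility):
    """Pick the action maximizing utility as measured by the utility
    function humans hold NOW (frozen at decision time), minus a large
    penalty for deaths. The rule deliberately ignores how an action
    might reshape human values afterwards."""
    def score(action):
        outcome = predict(action)
        return current_utility(outcome) - DEATH_PENALTY * outcome.deaths
    return max(actions, key=score)

# Toy usage mirroring the tribe example: industrialising scores higher
# on raw happiness, but its predicted deaths rule it out.
predictions = {
    "leave_alone": Outcome(happiness=10.0, deaths=0),
    "industrialise": Outcome(happiness=50.0, deaths=3),
}
best = choose_action(predictions, predictions.__getitem__,
                     lambda o: o.happiness)
```

Once an action is actually taken and values shift, the next decision would be scored against the new (post-change) utility function, matching the "current" semantics described above.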
I'm convinced of utilitarianism as the proper moral construct, but I don't think an AI should use a free-ranging utilitarianism, because it's just too dangerous. A relatively small calculation error, or a somewhat eccentric view of the future can lead to very bad outcomes indeed.
A really smart, powerful AI, it seems to me, should be constrained by rules of behavior (no wiping out humanity/no turning every channel into 24-7 porn/no putting everyone to work in the paperclip factory). The assumption that something very smart would necessarily reach correct utilitarian views seems facially false: it could assume that humans must think like it does, or that dogs generate more utility with less effort given how easily they are made happy, or decide that humans need more superintelligent machines in a great big hurry and should build them regardless of anything else.
And maybe it'd be right here or there. But maybe not. I think almost definitionally that FAI cannot be full-on, free-range utilitarian of any stripe. Am I wrong?
Why are you more concerned about something with unlimited ability to self reflect making a calculation error than about the above being a calculation error? The AI could implement the above if the calculation implicit in it is correct.