We have a reasonably clear sense of what "good" is, but it's not perfect. Suffering is bad, pleasure is good, more people living enjoyable lives is good, yes, but tradeoffs are hard. How much worse is it to go blind than to lose your leg? [1] How do we compare the death of someone at eighty to the death of someone at twelve? If you wanted to build some automated system that went from data about the world to a single number representing how well it's doing, such that you would prefer any world that scores higher to any world that scores lower, that would be very difficult.
Say, however, that you've built a metric that you think matches your values well and you put some powerful optimizer to work maximizing that metric. This optimizer might do many things you think are great, but it might be that the easiest ways to maximize the metric are the ones that pull it apart from your values. Perhaps after it's in place, it turns out your metric rewarded many things that were only strongly correlated with what you actually cared about, and those correlations break down under maximization.
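This failure mode is often called Goodhart's law. Here's a minimal sketch of it in Python, with everything invented for illustration: the metric adds a cheap correlate (a made-up "survey_score") to the welfare we actually care about, and a simple hill-climber standing in for a powerful optimizer learns to push on the cheap channel.

```python
# Sketch of a proxy metric coming apart from true values under optimization.
# All quantities and names here are illustrative, not from any real system.
import random

def true_value(state):
    # What we actually care about: only real welfare counts.
    return state["welfare"]

def proxy_metric(state):
    # What we measured: welfare plus something that merely correlates
    # with it under ordinary, non-optimized conditions.
    return state["welfare"] + 2.0 * state["survey_score"]

def optimize(metric, steps=10_000):
    # A dumb hill-climber standing in for a powerful optimizer.
    state = {"welfare": 0.0, "survey_score": 0.0}
    for _ in range(steps):
        key = random.choice(list(state))
        candidate = dict(state)
        candidate[key] += random.uniform(-1, 1)
        # Assume gaming surveys is far easier than improving welfare:
        # real welfare can only be nudged up a tiny amount per step.
        if key == "welfare":
            candidate[key] = min(candidate[key], state["welfare"] + 0.01)
        if metric(candidate) > metric(state):
            state = candidate
    return state

result = optimize(proxy_metric)
print("proxy score:", proxy_metric(result))  # very large, mostly survey_score
print("true value:", true_value(result))     # small: welfare was hard to move
```

The proxy and the true value agree early on, when both start at zero; it's only once the optimizer has exhausted the easy welfare gains that all further "improvement" comes from the gamed channel.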
What confuses me is that the people who warn about this scenario with respect to AI are often the same people in favor of futarchy. They both involve trying to define your values and then setting an indifferent optimizer to work on them. If you think AI would be very dangerous but futarchy would be very good, why?
I also posted this on my blog.
[1] This is a question people working in public health try to answer with Disability Weights for DALYs.
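To make the footnote concrete: DALYs combine years of life lost (YLL) with years lived with disability (YLD), where YLD is a disability weight between 0 and 1 multiplied by the years spent with the condition. The weights and the reference life expectancy below are rough placeholders I've chosen for illustration, not the official GBD figures.

```python
# Back-of-the-envelope DALY arithmetic. Disability weights here are
# illustrative placeholders, not the official Global Burden of Disease values.

def dalys(disability_weight, years_with_condition, years_of_life_lost=0.0):
    """DALYs = YLL + YLD, where YLD = disability weight * years with condition."""
    yld = disability_weight * years_with_condition
    return years_of_life_lost + yld

# "How much worse is blindness than losing a leg?" With these made-up
# weights, 40 years of blindness counts as roughly 3x the burden of
# 40 years with an amputated leg.
print(dalys(disability_weight=0.19, years_with_condition=40))  # 7.6 DALYs
print(dalys(disability_weight=0.06, years_with_condition=40))  # 2.4 DALYs

# Deaths are compared via years of life lost against a reference life
# expectancy (assumed here to be 86): a death at twelve costs far more
# DALYs than a death at eighty.
print(dalys(0.0, 0.0, years_of_life_lost=86 - 12))  # 74.0 DALYs
print(dalys(0.0, 0.0, years_of_life_lost=86 - 80))  # 6.0 DALYs
```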
This is kinda like how futarchy works... STAR WARS or STAR TREK... we let the swarm decide! The difference is that the outcome would be a lot more accurate with futarchy. Why? Because people would be putting their money where their mouths are.
As I pointed out here... AI Safety vs Human Safety... nobody that I know of has applied the best method we have for controlling humans (the market) to robots. That isn't too surprising, since AI largely falls under the scope of computer science. But the "safety" aspect also falls under the scope of economics. The development of an evil AI is most definitely an inefficient allocation of society's limited resources.
With futarchy we could bet on which organization/company is most likely to develop harmful AI. We could also bet on which organization is most likely to develop beneficial AI. Then we could shift our money from the former to the latter.
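One concrete way such bets could be priced is Robin Hanson's logarithmic market scoring rule (LMSR), the automated market maker usually proposed for futarchy-style markets. Here's a rough sketch, with invented lab names, dollar amounts, and a liquidity parameter chosen arbitrarily:

```python
# Sketch of an LMSR prediction market. Outcomes, traders, and amounts
# are all invented for illustration.
import math

class LMSRMarket:
    def __init__(self, outcomes, b=100.0):
        self.b = b  # liquidity parameter: higher b = prices move more slowly
        self.shares = {o: 0.0 for o in outcomes}

    def _cost(self, shares):
        # LMSR cost function: C(q) = b * ln(sum_i exp(q_i / b))
        return self.b * math.log(sum(math.exp(q / self.b) for q in shares.values()))

    def price(self, outcome):
        # Current price doubles as the market-implied probability.
        total = sum(math.exp(q / self.b) for q in self.shares.values())
        return math.exp(self.shares[outcome] / self.b) / total

    def buy(self, outcome, amount):
        # Cost to buy `amount` shares of `outcome`; paying it moves the price.
        old_cost = self._cost(self.shares)
        self.shares[outcome] += amount
        return self._cost(self.shares) - old_cost

market = LMSRMarket(["LabA builds harmful AI first", "LabB builds harmful AI first"])
print(market.price("LabA builds harmful AI first"))        # 0.5 to start
cost = market.buy("LabA builds harmful AI first", 50)      # backing a belief with money
print(round(cost, 2))                                      # ~28.10 paid
print(round(market.price("LabA builds harmful AI first"), 3))  # ~0.622 after the bet
```

The point of the mechanism is exactly the "money where their mouths are" property: moving the price costs real money, and you only profit if the world goes the way you bet.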
Don't Give Evil Robots A Leg To Stand On!
On a related point, here's a post about using swarms to build morality into intelligent systems:
http://unanimousai.com/building-moral/