On the other hand, a machine could fail a morality test simply by saying something controversial, or just failing to signal properly. For example atheism could be considered immoral by religious people; they could conclude that the machine is missing a part of human utility function. Or if some nice and correct belief has bad consequences, but humans compartmentalize it away and the machine would point it out explicitly, that could be percieved as a moral failure.
If the machine is allowed to lie, passing this test could just mean the machine is a skilled psychopath. If the machine is not allowed to lie, failing this test could just mean humans confuse signalling with the real thing.
It looks as though lukeprog has finished his series on how to purchase AI risk reduction. But the ideas lukeprog shares are not the only available strategies. Can Less Wrong come up with more?
A summary of recommendations from Exploring the Idea Space Efficiently:
If you're strictly a lurker, you can send your best ideas to lukeprog anonymously using his feedback box. Or send them to me anonymously using my feedback box so I can post them here and get all your karma.
Thread Usage
Please reply here if you wish to comment on the idea of this thread.
You're encouraged to discuss the ideas of others in addition to coming up with your own ideas.
If you split your ideas into individual comments, they can be voted on individually and you will probably increase your karma haul.