Yvain comments on A Brief Overview of Machine Ethics - Less Wrong

Post author: lukeprog 05 March 2011 06:09AM

Comment author: Yvain 05 March 2011 04:11:25PM 23 points

I started looking through some of the papers and so far I don't feel enlightened.

I've never been able to tell whether I don't understand Kantian ethics, or Kantian ethics is just stupid. Take Prospects For a Kantian Machine. The first part is about building a machine whose maxims satisfy the universalizability criterion: that they can be universalized without contradicting themselves.

But this seems to rely a lot on being very good at parsing categories in exactly the right way to arrive at the answer you wanted in the first place.

It seems reasonable, for example, to have maxims that apply only to certain portions of the population: "I, who am a policeman, will lock up this bank robber awaiting trial in my county jail" generalizes to "Other policemen will also lock up bank robbers awaiting trial in their county jails" if you're a human moral philosopher who knows how these things are supposed to work.

But I don't see what's stopping a robot from coming up with "Everyone will lock up everyone else" or "All the world's policemen will descend upon this one bank robber and try to lock him up in their own county jails". After all, Kant universalizes "I will deceive this murderer so he can't find his victim" to "Everyone will deceive everyone else all the time" and not to "Everyone will deceive murderers when a life is at stake". So if a robot were to propose "I, a robot, will kill all humans", why should we expect it to universalize it to "Everyone will kill everyone else" rather than "Other robots will also kill all humans", which just means the robot gets help?
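To make the worry concrete, here's a toy version (the representation and the universalize() rule are entirely my invention; the paper specifies no such algorithm) of how the output depends completely on which categories you generalize over:

    # Toy sketch: a maxim as structured data, with universalization
    # parameterized by the categories we generalize the agent and
    # patient over. Nothing in the formalism picks those categories.

    maxim = {"agent": "I, a policeman",
             "action": "lock up",
             "patient": "this bank robber, in my county jail"}

    def universalize(maxim, agent_category, patient_category):
        """Return the universalized maxim under one choice of categories."""
        return {"agent": agent_category,
                "action": maxim["action"],
                "patient": patient_category}

    # The reading a human moral philosopher intends:
    universalize(maxim, "all policemen", "bank robbers, in their county jails")

    # Equally licensed by the same operation:
    universalize(maxim, "everyone", "everyone else")
    universalize(maxim, "all the world's policemen", "this one bank robber")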

And even if it does universalize correctly, in the friendly AI context it need not be a contradiction! If this is a superintelligent AI we're talking about, then even in the best case scenario where everything goes right the maxim "I will try to kill all humans" will universalize to "Everyone will try to kill everyone else". Kant said this was contradictory in that every human will then be dead and none of them will gain the deserts of their murder - but in an AI context this isn't contradictory at all: the superintelligence will succeed at killing everyone else, the actions of the puny humans will be irrelevant, and the AI will be just fine.
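In toy form (my construction, not anything from the paper), the asymmetry looks like this:

    # Kant's "contradiction in conception" test, toy version: a maxim
    # fails if the agent's own purpose is defeated once everyone acts
    # on it. Crude stand-in: your attempt succeeds only if you
    # outpower everyone else acting on the same maxim.

    def purpose_survives(agent_power, max_rival_power):
        """Does the agent still achieve its aim once the maxim is universal?"""
        return agent_power > max_rival_power

    purpose_survives(1.0, 1.0)    # False: human murderers; Kant's contradiction
    purpose_survives(1e9, 1.0)    # True: a superintelligence among humans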

(Actually, just getting far enough to make either of those objections involves hand-waving away about thirty other intractable problems you would need to solve just to get that far; but these seemed like the most pertinent ones.)

I'll look through some of the other papers later, but so far I'm not seeing anything to make me think Eliezer's opinion of the state of the field was overly pessimistic.

Comment author: Yvain 05 March 2011 04:34:21PM 14 points

Allen - Prolegomena to Any Future Moral Agent places a lot of emphasis on figuring out whether a machine can be truly moral, in various metaphysical senses like "has the capacity to disobey the law, but doesn't" and "deliberates in a certain way". Not only is it possible that these are meaningless, but in a superintelligence the metaphysical implications should really take second place to the not-getting-turned-into-paperclips implications.

He proposes a moral Turing Test, where we call a machine moral if it can answer moral questions indistinguishably from a human. But Clippy would also pass this test, if a consequence of passing was that the humans lowered their guard/let him out of the box. In fact, every unfriendly superintelligence with a basic knowledge of human culture and a motive would pass.

Utilitarianism is considered difficult to implement because it's computationally impossible to predict all consequences. Given that any AI worth its salt would have a module for predicting the consequences of its actions anyway, and that the potential danger of the AI is directly related to how good this module is, that seems like a non-problem. It wouldn't be perfect, but it would do better than humans, at least.
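A minimal sketch of that point (all names hypothetical): the utilitarian layer just reuses whatever predictive machinery the AI needs for planning anyway, so prediction quality is a capability limit, not a special objection to utilitarianism.

    # Minimal sketch, assuming a predict() module the AI needs for any
    # kind of planning at all. The utilitarian part adds only a utility
    # function over the predicted outcomes.

    def choose_action(actions, predict, utility):
        """Pick the action whose predicted consequences score highest."""
        return max(actions, key=lambda action: utility(predict(action)))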

Deontology, same problem as the last one. Virtue ethics seems problematic depending on the AI's motivation - if it were motivated to turn the universe to paperclips, would it be completely honest about it, kill humans quickly and painlessly and with a flowery apology, and declare itself to have exercised the virtues of honesty, compassion, and politeness? Evolution would give us something at best as moral as humans and probably worse - see the Sequence post about the tanks in cloudy weather.

Still not impressed.

Comment author: Yvain 05 March 2011 04:46:11PM 9 points

Mechanized Deontic Logic is pretty okay, despite the dread I had because of the name. I'm no good at formal systems, but as far as I can understand it, it looks like a logic for proving some simple results about morality: the example they give is "If you should see to it that X, then you should see to it that you should see to it that X."
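If I'm transcribing the notation right, that example theorem reads something like this, with the circle read as "it ought to be that" and [a stit: X] as "agent a sees to it that X" (my rendering, so take it with salt):

    \vdash \bigcirc [\alpha\ \mathrm{stit}\colon X] \rightarrow \bigcirc [\alpha\ \mathrm{stit}\colon \bigcirc [\alpha\ \mathrm{stit}\colon X]]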

I can't immediately see a way this would destroy the human race, but that's only because it's nowhere near the point where it involves what humans actually think of as "morality" yet.

Comment author: Yvain 05 March 2011 05:03:18PM 11 points

Utilibot Project is about creating a personal care robot that will avoid accidentally killing its owner by representing the goal of "owner health" in a utilitarian way. It sounds like it might work for a robot with a very small list of potential actions (like "turn on stove" and "administer glucose") and a very specific list of owner health indicators (like "hunger" and "blood glucose level"), but it's not very relevant to the broader Friendly AI program.
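At that scale, something like the following toy loop is plausible (action names, effects, and numbers all invented here; the paper's actual architecture is more elaborate) - and the smallness of the tables is exactly what keeps it from generalizing:

    # Toy sketch of the Utilibot idea as I read it: a handful of
    # actions, a handful of health indicators, pick the action that
    # maximizes a simple "owner health" utility.

    ACTIONS = {
        "do nothing":         lambda s: s,
        "turn on stove":      lambda s: {**s, "hunger": s["hunger"] - 2},
        "administer glucose": lambda s: {**s, "blood_glucose": s["blood_glucose"] + 30},
    }

    def health_utility(state):
        # Penalize deviation of each indicator from a healthy target.
        return -abs(state["hunger"]) - abs(state["blood_glucose"] - 90)

    def act(state):
        # Pick the action whose predicted next state scores best.
        return max(ACTIONS, key=lambda name: health_utility(ACTIONS[name](state)))

    print(act({"hunger": 3, "blood_glucose": 55}))  # -> administer glucose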

Having read as many papers as I had time for before dinner, my provisional conclusion is that Vladimir Nesov hit the nail on the head.