A Brief Overview of Machine Ethics

lukeprog

Earlier, I lamented that even though Eliezer named scholarship as one of the Twelve Virtues of Rationality, there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.

Previously, I provided an overview of formal epistemology, that field of philosophy that deals with (1) mathematically formalizing concepts related to induction, belief, choice, and action, and (2) arguing about the foundations of probability, statistics, game theory, decision theory, and algorithmic learning theory.

Now, I've written Machine Ethics is the Future, an introduction to machine ethics, the academic field that studies the problem of how to design artificial moral agents that act ethically (along with a few related problems). There, you will find PDFs of a dozen papers on the subject.

Enjoy!

I started looking through some of the papers and so far I don't feel enlightened.

I've never been able to tell whether I don't understand Kantian ethics, or Kantian ethics is just stupid. Take Prospects For a Kantian Machine. The first part is about building a machine whose maxims satisfy the universalizability criterion: that they can be universalized without contradicting themselves.

But this seems to rely a lot on being very good at parsing categories in exactly the right way to come up with the answer you wanted originally.

For example, it seems reasonable to have maxims that only apply to certain portions of the population: "I, who am a policeman, will lock up this bank robber awaiting trial in my county jail" generalizes to "Other policemen will also lock up bank robbers awaiting trial in their county jails" if you're a human moral philosopher who knows how these things are supposed to work.

But I don't see what's stopping a robot from coming up with "Everyone will lock up everyone else" or "All the world's policemen will descend upon this one bank robber and try to lock him up in their own county jails". After all, Kant universalizes "I will deceive this murderer so he can't find his victim" to "Everyone will deceive everyone else all the time" and not to "Everyone will deceive murderers when a life is at stake". So if a robot were to propose "I, a robot, will kill all humans", why should we expect it to universalize it to "Everyone will kill everyone else" rather than "Other robots will also kill all humans", which just means the robot gets help?
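To make the worry concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the `Maxim` class, the `universalize` function, and the slot names are assumptions, not anything from the paper). It only shows that the result of universalizing depends entirely on which parts of the maxim you choose to generalize, and that nothing in the procedure itself makes that choice.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Maxim:
    """A hypothetical three-slot representation of a maxim."""
    agent: str   # who acts
    action: str  # what they do
    target: str  # whom or what they do it to

    def render(self) -> str:
        return f"{self.agent} will {self.action} {self.target}"


def universalize(maxim: Maxim, **generalized_slots: str) -> Maxim:
    """Swap the named slots for broader categories.

    The caller decides which slots to widen and how far.
    The function has no principled way to decide this itself,
    which is the whole problem.
    """
    return replace(maxim, **generalized_slots)


robot_maxim = Maxim(agent="I, a robot,", action="kill", target="all humans")

# The reading a human philosopher might expect: widen agent and target fully.
broad = universalize(robot_maxim, agent="Everyone", target="everyone else")

# An equally mechanical reading: widen only the agent, and only to robots.
narrow = universalize(robot_maxim, agent="Every robot")

print(broad.render())
print(narrow.render())
```

Both calls are the same operation applied to the same maxim. Only the first produces something Kant's test could even try to call self-defeating, and the code gives no reason to prefer it over the second.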

And even if it does universalize correctly, in the friendly AI context it need not be a contradiction! If this is a superintelligent AI we're talking about, then even in the best-case scenario where everything goes right, the maxim "I will try to kill all humans" will universalize to "Everyone will try to kill everyone else". Kant said this was contradictory in that every human will then be dead and none of them will gain the deserts of their murder - but in an AI context this isn't contradictory at all: the superintelligence will succeed at killing everyone else, the actions of the puny humans will be irrelevant, and the AI will be just fine.

(Actually, just getting far enough to make either of those objections involves hand-waving away about thirty other intractable problems, but these seemed like the most pertinent.)

I'll look through some of the other papers later, but so far I'm not seeing anything to make me think Eliezer's opinion of the state of the field was overly pessimistic.

Every sufficiently smart person who thinks about Kantian ethics comes up with this objection. I don't believe it's possible to defend against it entirely. However...

After all, Kant universalizes "I will deceive this murderer so he can't find his victim" to "Everyone will deceive everyone else all the time" and not to "Everyone will deceive murderers when a life is at stake".

That may be what Kant actually says (does he?) but if he does then I think he's wrong about his own theory. As I understand it, what you're supposed to...

0lukeprog15y

For Googleability, I'll note that this objection is called the problem of maxim specification.

2lukeprog15y

BTW, I so identify with this quote: [...] In fact, I've said the same thing myself, in slightly different words.
