lukeprog comments on A Brief Overview of Machine Ethics - Less Wrong
I started looking through some of the papers and so far I don't feel enlightened.
I've never been able to tell whether I don't understand Kantian ethics, or Kantian ethics is just stupid. Take Prospects For a Kantian Machine. The first part is about building a machine whose maxims satisfy the universalizability criterion: that they can be universalized without contradicting themselves.
But this seems to rely a lot on being very good at parsing categories in exactly the right way to come up with the answer you wanted originally.
For example, it seems reasonable to have maxims that apply only to certain portions of the population: "I, who am a policeman, will lock up this bank robber awaiting trial in my county jail" generalizes to "Other policemen will also lock up bank robbers awaiting trial in their county jails", if you're a human moral philosopher who knows how these things are supposed to work.
But I don't see what's stopping a robot from coming up with "Everyone will lock up everyone else" or "All the world's policemen will descend upon this one bank robber and try to lock him up in their own county jails". After all, Kant universalizes "I will deceive this murderer so he can't find his victim" to "Everyone will deceive everyone else all the time" and not to "Everyone will deceive murderers when a life is at stake". So if a robot were to propose "I, a robot, will kill all humans", why should we expect it to universalize it to "Everyone will kill everyone else" rather than "Other robots will also kill all humans", which just means the robot gets help?
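The ambiguity above can be made concrete with a toy sketch. This is my own illustration, not anything from the paper: all the predicates, maxims, and the `universalize` helper are invented. The point is just that a naive universalizer gives different results from the same maxim depending on which categories it chooses to abstract, and nothing in the formal test itself picks the "right" abstraction.

```python
# Toy illustration of the maxim-specification problem: the same maxim
# universalizes differently depending on which categories are abstracted.
# All names and predicates here are invented for illustration.

def universalize(maxim, abstraction):
    """Replace the agent and patient of a (agent, action, patient) maxim
    according to a chosen abstraction, returning the universalized form."""
    agent, action, patient = maxim
    return (abstraction.get(agent, agent), action,
            abstraction.get(patient, patient))

maxim = ("this policeman", "locks up", "this bank robber")

# Abstraction 1: the "intended" reading, generalizing within categories.
intended = {"this policeman": "every policeman",
            "this bank robber": "bank robbers awaiting trial in his county"}

# Abstraction 2: an equally syntactically valid but absurd reading.
absurd = {"this policeman": "everyone",
          "this bank robber": "everyone else"}

print(universalize(maxim, intended))
# ('every policeman', 'locks up', 'bank robbers awaiting trial in his county')
print(universalize(maxim, absurd))
# ('everyone', 'locks up', 'everyone else')
```

Both outputs are legitimate universalizations of the same maxim; choosing between them is exactly where the human moral knowledge gets smuggled in.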
And even if it does universalize correctly, in the friendly AI context it need not be a contradiction! If this is a superintelligent AI we're talking about, then even in the best case scenario where everything goes right, the maxim "I will try to kill all humans" will universalize to "Everyone will try to kill everyone else". Kant said this was contradictory in that every human would then be dead and none of them would gain the deserts of their murder; but in an AI context this isn't contradictory at all: the superintelligence will succeed at killing everyone else, the actions of the puny humans will be irrelevant, and the AI will be just fine.
(Actually, just getting far enough to make either of those objections involves hand-waving away about thirty other intractable problems you would need to solve first; but these seemed like the most pertinent.)
I'll look through some of the other papers later, but so far I'm not seeing anything to make me think Eliezer's opinion of the state of the field was overly pessimistic.
For Googleability, I'll note that this objection is called the problem of maxim specification.
That currently has no Google results besides your post.
Yes, sorry. "Maxim specification" by itself won't give you much, but variations on it will. People don't usually write "the problem of maxim specification" but instead things like "...specifying the maxim..." or "the maxim... specified..." and so on. In general it isn't as easily Googled as "is-ought gap" is.
But here is one use.