I started looking through some of the papers and so far I don't feel enlightened.
I've never been able to tell whether I don't understand Kantian ethics or whether Kantian ethics is just stupid. Take Prospects For a Kantian Machine. The first part is about building a machine whose maxims satisfy the universalizability criterion: that they can be universalized without contradicting themselves.
But this seems to rely a lot on being very good at parsing categories in exactly the right way to come up with the answer you wanted originally.
For example, it seems reasonable to have maxims that apply only to certain portions of the population: "I, who am a policeman, will lock up this bank robber awaiting trial in my county jail" generalizes to "Other policemen will also lock up bank robbers awaiting trial in their county jails", at least if you're a human moral philosopher who knows how these things are supposed to work.
But I don't see what's stopping a robot from coming up with "Everyone will lock up everyone else" or "All the world's policemen will descend upon this one bank robber and try to lock him up in their own county jails". After all, Kant univer...
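To make the worry concrete, here's a toy sketch of my own (nothing like this appears in the paper; all the slot names and substitutions are made up) of how much hangs on which terms the machine chooses to universalize, and to what:

```python
# Toy illustration of my own (not from the paper): whether a maxim
# "universalizes without contradiction" depends entirely on which of its
# terms the machine chooses to generalize, and to what.

maxim = {
    "agent":   "I, who am a policeman",
    "patient": "this bank robber awaiting trial",
    "jail":    "my county jail",
}

def universalize(maxim, substitutions):
    """Mechanically swap some terms of the maxim for more general ones."""
    return {**maxim, **substitutions}

# The reading a human moral philosopher intends:
intended = universalize(maxim, {
    "agent":   "every policeman",
    "patient": "bank robbers awaiting trial in his county",
    "jail":    "his own county jail",
})

# Two equally mechanical but perverse readings:
everyone_jails_everyone = universalize(maxim, {
    "agent":   "everyone",
    "patient": "everyone else",
})
pileup_on_one_robber = universalize(maxim, {
    "agent":   "all the world's policemen",
    "jail":    "their own county jails",
})

for reading in (intended, everyone_jails_everyone, pileup_on_one_robber):
    print(reading)
```

All three readings come out of the same substitution step; the universalizability test itself doesn't tell you which parsing of the categories is the right one.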
Allen - Prolegomena to Any Future Moral Agent places a lot of emphasis on figuring out whether a machine can be truly moral, in various metaphysical senses like "has the capacity to disobey the law, but doesn't" and "deliberates in a certain way". Not only is it possible that these are meaningless, but in a superintelligence the metaphysical implications should really take second place to the not-getting-turned-into-paperclips implications.
He proposes a moral Turing Test, where we call a machine moral if it can answer moral questions indistinguishably from a human. But Clippy would also pass this test, if a consequence of passing was that the humans lowered their guard/let him out of the box. In fact, every unfriendly superintelligence with a basic knowledge of human culture and a motive would pass.
Utilitarianism is considered difficult to implement because it's computationally impossible to predict all consequences. Given that any AI worth its salt would have a module for predicting the consequences of its actions anyway, and that the potential danger of the AI is directly related to how good this module is, that seems like a non-problem. It wouldn't be perfect, but it...
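Here's a minimal sketch of that point, with all names invented: the expensive piece is the consequence predictor, which a capable AI needs for its own planning anyway; the "utilitarian" layer on top is a thin argmax over its (imperfect) output.

```python
# Sketch only (names are mine, not from any paper): expected-utility choice
# reusing whatever consequence predictor the agent already plans with.

def expected_utility(action, predict, utility):
    """Expected utility of an action under an imperfect predictive model.
    `predict(action)` yields (probability, outcome) pairs."""
    return sum(p * utility(outcome) for p, outcome in predict(action))

def choose_action(actions, predict, utility):
    """Utilitarian selection: a thin argmax on top of the predictor."""
    return max(actions, key=lambda a: expected_utility(a, predict, utility))

# Toy usage with a deliberately crude model of two options.
toy_predict = lambda a: ([(0.9, "good"), (0.1, "bad")] if a == "cautious"
                         else [(0.5, "good"), (0.5, "bad")])
toy_utility = {"good": 1.0, "bad": -1.0}.get
print(choose_action(["cautious", "reckless"], toy_predict, toy_utility))  # -> cautious
```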
Mechanized Deontic Logic is pretty okay, despite the dread I had because of the name. I'm no good at formal systems, but as far as I can understand it looks like a logic for proving some simple results about morality: the example they give is "If you should see to it that X, then you should see to it that you should see to it that X."
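If I'm reading their example right, writing $O$ for "ought" and $[\mathrm{stit}: X]$ for "the agent sees to it that X", the theorem is something like (my transcription, so take the notation loosely):

$$O[\mathrm{stit}: X] \rightarrow O[\mathrm{stit}: O[\mathrm{stit}: X]]$$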
I can't immediately see a way this would destroy the human race, but that's only because it's nowhere near the point where it involves what humans actually think of as "morality" yet.
Utilibot Project is about creating a personal care robot that will avoid accidentally killing its owner by representing the goal of "owner health" in a utilitarian way. It sounds like it might work for a robot with a very small list of potential actions (like "turn on stove" and "administer glucose") and a very specific list of owner health indicators (like "hunger" and "blood glucose level"), but it's not very relevant to the broader Friendly AI program.
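For scale, here's a toy sketch of what I take the idea to be (every action, indicator, and number below is made up by me, not taken from the paper): a tiny action set, two health indicators, and a utility over their predicted values.

```python
# Toy sketch of the idea as I understand it (actions, indicators, and numbers
# are all invented): a tiny action set, two health indicators, and a utility
# over their predicted values.

ACTIONS = {
    # action: predicted effect on (hunger, blood glucose)
    "do nothing":         (+1,  -5),
    "turn on stove":      ( 0,   0),   # enables cooking; no direct effect modeled
    "serve meal":         (-3, +10),
    "administer glucose": ( 0, +20),
}

def owner_health_utility(hunger, glucose):
    """Higher is better: penalize hunger and deviation from a safe glucose level."""
    return -abs(hunger) - abs(glucose - 90) / 10

def best_action(state):
    hunger, glucose = state
    def value(action):
        d_hunger, d_glucose = ACTIONS[action]
        return owner_health_utility(hunger + d_hunger, glucose + d_glucose)
    return max(ACTIONS, key=value)

print(best_action((4, 70)))   # hungry owner, lowish glucose -> "serve meal"
```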
Having read as many papers as I have time to before dinner, my provisional conclusion is that Vladimir Nesov hit the nail on the head.
Earlier, I lamented that even though Eliezer named scholarship as one of the Twelve Virtues of Rationality, there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.
Eliezer defined the virtue of scholarship as (a) "Study many sciences and absorb their power as your own." He was silent on whether, after you survey a literature and conclude that nobody has the right approach yet, you should (b) still cite the literature (presumably to show that you're familiar with it), and/or (c) rebut the wrong approaches (presumably to try to lead others away from the wrong paths).
I'd say that (b) and (c) are much more situational than (a). (b) is mostly a signaling issue. If you can convince your audience to take you seriously without doing it, then why bother? And (c) depends on how much effort you'd have to spend to convince others that they are wrong, and how likely they are to contribute to the correct solution after you turn them around. Or perhaps you're not sure that your approach is right either, and think it should just be explored alongside others.
At least some of the lack of scholarship that you see h...
...there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.
I think one of the reasons is that the LW/SIAI crowd thinks all other people are below their standards. For example:
...I tried - once - going to an interesting-sounding mainstream AI conference that happened to be in my area. I met ordinary research scholars and looked at their posterboards and read some of their papers. I watched their presentations and talked to them at lunch. And they were way below the level of the b
I think one of the reasons is that the LW/SIAI crowd thinks all other people are below their standards.
"Below their standards" is a bad way to describe this situation, it suggests some kind of presumption of social superiority, while the actual problem is just that the things almost all researchers write presumably on this topic are not helpful. They are either considering a different problem (e.g. practical ways of making real near-future robots not kill wrong people, where it's perfectly reasonable to say that philosophy of consequentialism is useless, since there is no practical way to apply it; or applied ethics, where we ask how humans should act), or contemplate the confusingness of the problem, without making useful progress (a lot of philosophy).
This property doesn't depend on whether we are making progress ourselves, so it's perfectly possible (and to a large extent true) that SIAI isn't making progress that meets the standard of usefulness either.
A point where SIAI makes visible and useful progress is in communicating the difficulty of the problem, the very fact that most of what is purportedly progress on FAI is actually not.
You seem to be under the impression that Eliezer is going to create an artificial general intelligence, and oversight is necessary to ensure that he doesn't create one which places his goals over humanity's interests. It is important, you say, that he is not allowed unchecked power. This is all fine, except for one very important fact that you've missed.
Eliezer Yudkowsky can't program. He's never published a nontrivial piece of software, and doesn't spend time coding. In the one way that matters, he's a muggle. Ineligible to write an AI. Eliezer has not positioned himself to be the hero, the one who writes the AI or implements its utility function. The hero, if there is to be one, has not yet appeared on stage. No, Eliezer has positioned himself to be the mysterious old wizard - to lay out a path, and let someone else follow it. You want there to be oversight over Eliezer, and Eliezer wants to be the oversight over someone else, yet to be determined.
But maybe we shouldn't trust Eliezer to be the mysterious old wizard, either. If the hero/AI programmer comes to him with a seed AI, then he knows it exists, and finding out that a seed AI exists before it launches is the hardest part of any...
With regard to your (and Eliezer's) quest, I think Oppenheimer's Maxim is relevant:
It is a profound and necessary truth that the deep things in science are not found because they are useful; they are found because it was possible to find them.
A theory of machine ethics may very well be the most useful concept ever discovered by humanity. But as far as I can see, there is no reason to believe that such a theory can be found.
For the list:
The Ethics of Artificial Intelligence http://www.nickbostrom.com/ethics/artificial-intelligence.pdf
Ethical Issues in Advanced Artificial Intelligence http://www.nickbostrom.com/ethics/ai.html
Beyond AI http://mol-eng.com/
Earlier, I lamented that even though Eliezer named scholarship as one of the Twelve Virtues of Rationality, there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.
Previously, I provided an overview of formal epistemology, the field of philosophy that deals with (1) mathematically formalizing concepts related to induction, belief, choice, and action, and (2) arguing about the foundations of probability, statistics, game theory, decision theory, and algorithmic learning theory.
Now, I've written Machine Ethics is the Future, an introduction to machine ethics, the academic field that studies the problem of how to design artificial moral agents that act ethically (along with a few related problems). There, you will find PDFs of a dozen papers on the subject.
Enjoy!