I started looking through some of the papers and so far I don't feel enlightened.
I've never been able to tell whether I don't understand Kantian ethics, or Kantian ethics is just stupid. Take Prospects For a Kantian Machine. The first part is about building a machine whose maxims satisfy the universalizability criterion: that they can be universalized without contradicting themselves.
But this seems to rely a lot on being very good at parsing categories in exactly the right way to come up with the answer you wanted originally.
For example, it seems reasonable to have maxims that only apply to certain portions of the population: "I, who am a policeman, will lock up this bank robber awaiting trial in my county jail" generalizes to "Other policemen will also lock up bank robbers awaiting trial in their county jails" if you're a human moral philosopher who knows how these things are supposed to work.
But I don't see what's stopping a robot from coming up with "Everyone will lock up everyone else" or "All the world's policemen will descend upon this one bank robber and try to lock him up in their own county jails". After all, Kant univer...
Allen - Prolegomena to Any Future Moral Agent places a lot of emphasis on figuring out if a machine can be truly moral, in various metaphysical senses like "has the capacity to disobey the law, but doesn't" and "deliberates in a certain way". Not only is it possible that these are meaningless, but in a superintelligence the metaphysical implications should really take second place to the not-getting-turned-into-paperclips implications.
He proposes a moral Turing Test, where we call a machine moral if it can answer moral questions indistinguishably from a human. But Clippy would also pass this test, if a consequence of passing was that the humans lowered their guard/let him out of the box. In fact, every unfriendly superintelligence with a basic knowledge of human culture and a motive would pass.
Utilitarianism is considered difficult to implement because it's computationally impossible to predict all consequences. Given that any AI worth its salt would have a module for predicting the consequences of its actions anyway, and that the potential danger of the AI is directly related to how good this module is, that seems like a non-problem. It wouldn't be perfect, but it...
Mechanized Deontic Logic is pretty okay, despite the dread I had because of the name. I'm no good at formal systems, but as far as I can understand it looks like a logic for proving some simple results about morality: the example they give is "If you should see to it that X, then you should see to it that you should see to it that X."
I can't immediately see a way this would destroy the human race, but that's only because it's nowhere near the point where it involves what humans actually think of as "morality" yet.
Utilibot Project is about creating a personal care robot that will avoid accidentally killing its owner by representing the goal of "owner health" in a utilitarian way. It sounds like it might work for a robot with a very small list of potential actions (like "turn on stove" and "administer glucose") and a very specific list of owner health indicators (like "hunger" and "blood glucose level"), but it's not very relevant to the broader Friendly AI program.
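To make the "very small list of potential actions" case concrete, here is a minimal sketch of what choosing an action by expected utility over owner-health indicators could look like. It is not taken from the Utilibot paper: the action names, probabilities, target glucose value, and the utility function are all made-up assumptions for illustration.

```python
# Toy expected-utility chooser. Every name and number below is a
# hypothetical assumption for illustration, not the Utilibot design.
from typing import Dict, List, Tuple

State = Dict[str, float]  # owner health indicators


def utility(state: State) -> float:
    """Higher is better: penalize hunger and distance from a target glucose level."""
    target_glucose = 90.0  # assumed target, in mg/dL
    return -state["hunger"] - abs(state["blood_glucose"] - target_glucose) / 10.0


# Each action maps to a list of (probability, resulting state) pairs.
OUTCOMES: Dict[str, List[Tuple[float, State]]] = {
    "do_nothing": [(1.0, {"hunger": 5.0, "blood_glucose": 60.0})],
    "turn_on_stove": [
        (0.9, {"hunger": 1.0, "blood_glucose": 85.0}),  # owner eats a meal
        (0.1, {"hunger": 5.0, "blood_glucose": 60.0}),  # owner ignores the food
    ],
    "administer_glucose": [(1.0, {"hunger": 5.0, "blood_glucose": 90.0})],
}


def expected_utility(action: str) -> float:
    """Probability-weighted utility of the states an action can lead to."""
    return sum(p * utility(s) for p, s in OUTCOMES[action])


def best_action() -> str:
    """Pick the action with the highest expected utility."""
    return max(OUTCOMES, key=expected_utility)


if __name__ == "__main__":
    for name in OUTCOMES:
        print(name, round(expected_utility(name), 2))
    print("chosen:", best_action())
```

The whole thing only works because the outcome table can be written down by hand, which is exactly why it doesn't scale to an agent with an open-ended action space.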
Having read as many papers as I have time to before dinner, my provisional conclusion is that Vladimir Nesov hit the nail on the head
Earlier, I lamented that even though Eliezer named scholarship as one of the Twelve Virtues of Rationality, there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.
Eliezer defined the virtue of scholarship as (a) "Study many sciences and absorb their power as your own." He was silent on whether, after you survey a literature and conclude that nobody has the right approach yet, you should (b) still cite the literature (presumably to show that you're familiar with it), and/or (c) rebut the wrong approaches (presumably to try to lead others away from the wrong paths).
I'd say that (b) and (c) are much more situational than (a). (b) is mostly a signaling issue. If you can convince your audience to take you seriously without doing it, then why bother? And (c) depends on how much effort you'd have to spend to convince others that they are wrong, and how likely they are to contribute to the correct solution after you turn them around. Or perhaps you're not sure that your approach is right either, and think it should just be explored alongside others.
At least some of the lack of scholarship that you see h...
...there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.
I think one of the reasons is that the LW/SIAI crowd thinks all other people are below their standards. For example:
...I tried - once - going to an interesting-sounding mainstream AI conference that happened to be in my area. I met ordinary research scholars and looked at their posterboards and read some of their papers. I watched their presentations and talked to them at lunch. And they were way below the level of the b
I think one of the reasons is that the LW/SIAI crowd thinks all other people are below their standards.
"Below their standards" is a bad way to describe this situation; it suggests some kind of presumption of social superiority, while the actual problem is just that the things almost all researchers write, presumably on this topic, are not helpful. They are either considering a different problem (e.g. practical ways of making real near-future robots not kill the wrong people, where it's perfectly reasonable to say that philosophy of consequentialism is useless, since there is no practical way to apply it; or applied ethics, where we ask how humans should act), or contemplating the confusingness of the problem without making useful progress (a lot of philosophy).
This property doesn't depend on whether we are making progress ourselves, so it's perfectly possible (and to a large extent true) that progress that is up to the standard of being useful is not made by SIAI either.
A point where SIAI makes visible and useful progress is in communicating the difficulty of the problem, the very fact that most of what is purportedly progress on FAI is actually not.
You seem to be under the impression that Eliezer is going to create an artificial general intelligence, and oversight is necessary to ensure that he doesn't create one which places his goals over humanity's interests. It is important, you say, that he is not allowed unchecked power. This is all fine, except for one very important fact that you've missed.
Eliezer Yudkowsky can't program. He's never published a nontrivial piece of software, and doesn't spend time coding. In the one way that matters, he's a muggle. Ineligible to write an AI. Eliezer has not positioned himself to be the hero, the one who writes the AI or implements its utility function. The hero, if there is to be one, has not yet appeared on stage. No, Eliezer has positioned himself to be the mysterious old wizard - to lay out a path, and let someone else follow it. You want there to be oversight over Eliezer, and Eliezer wants to be the oversight over someone else to be determined.
But maybe we shouldn't trust Eliezer to be the mysterious old wizard, either. If the hero/AI programmer comes to him with a seed AI, then he knows it exists, and finding out that a seed AI exists before it launches is the hardest part of any...
With regards to your (and Eliezer's) quest, I think Oppenheimer's Maxim is relevant:
It is a profound and necessary truth that the deep things in science are not found because they are useful, they are found because it was possible to find them.
A theory of machine ethics may very well be the most useful concept ever discovered by humanity. But as far as I can see, there is no reason to believe that such a theory can be found.
For the list:
The Ethics of Artificial Intelligence http://www.nickbostrom.com/ethics/artificial-intelligence.pdf
Ethical Issues in Advanced Artificial Intelligence http://www.nickbostrom.com/ethics/ai.html
Beyond AI http://mol-eng.com/
There are definitely a number of true fanboys on this site, they may even be the majority (although I hope not)...
See, that one person who donated the current balance of his bank account got 52 upvotes for it. Now I'm not particularly shocked by him doing that or the upvotes. I don't worry that all that money might be better spent somehow. What drives me is curiosity mixed with my personality; I want to do what's right. That is the reason why I criticize and why some comments may seem, or actually are, derogatory. I think it needs to be said; I believe I can provoke feedback that way and learn more about the underlying rationale. I desperately try to figure out if there is something I am missing.
I haven't read most of the sequences yet, let me explain why. I'm a really slow reader, I have almost no education and need a lot of time to learn anything. I did a lot of spot tests, reading various posts and came across people who read the sequences but haven't been able to conclude that they should stop doing anything except trying to earn money for the SIAI. My conclusion is that reading the sequences shouldn't be a priority right now but rather learning the mathematical basics, programming and reading various books. But I still try to spend some time here to see if that assessment might be wrong.
My current take on the whole issue is that the sequences do not provide many useful insights. I already know that by all that we know today AGI is possible and that it is unlikely that humans are the absolute limit when it comes to intelligence. I intuitively agree with the notion that AGI in its abstract form (intelligence as an algorithm) doesn't share our values if you do not deliberately 'tell' it to care. I see that one can outweigh even a low probability of risks from AI by assuming a future galactic civilization that is at stake. So what is my problem? I've written hundreds of comments about all kinds of problems I have with it, but maybe the biggest problem is a simple bias. I have an overwhelming gut feeling telling me that something is wrong with all this. I also do not trust my current ability to assess the situation to the extent that I would sacrifice other more compelling goals right now. And I am simply risk-averse. I know that there is always either a best choice or all options are equal, no matter what uncertainty. Maybe everything is currently speaking in favor of the SIAI, but I'm not able to ignore my gut feeling right now. Trying to do so frequently makes me reluctant to do anything at all. Something is very wrong, I can't pinpoint what it is right now so I'm throwing everything I got at it to see if the facade crumbles. So far it did not crumble but neither have I received much reassuring feedback.
My recent comments have been made after a night of no sleep and being in a bad mood. I wouldn't have written them in that way on another day. I even messaged Eliezer yesterday telling him that he can edit/delete any of my submissions here that might be harmful without having to fear that I will protest and therefore cause more trouble. I don't care about myself much, but I care not to hurt others or cause damage. Sadly I often become reluctant, then I say 'fuck it' and just go ahead to write something because I was overwhelmed by all the possible implications and subsequently ignored them.
What is really confusing is that, taken at face value, the SIAI is working on the most important and most dangerous problem anyone will ever face. The SIAI is trying to take over the universe! Yet all I see in its followers is extreme scope insensitivity. How so? Because if you seriously believe that someone else believes that he is trying to take over the multiverse then you don't just trust him because he wrote a few posts about rationality and being honest. If the stakes are high, people do everything. Ask yourself, what difference would you expect to see if Dr. Evil would disguise as Eliezer Yudkowsky? Why wouldn't he write the sequences, why wouldn't he claim to be implementing CEV? That is one of the problems that make me feel that something is wrong here. Either people really don't believe all this stuff about fooming AI, galactic civilizations and the ability of the SIAI to create a seed AI, or I'm missing something. What I would expect to see is people asking for transparency. I expect people to demand oversight and ask how exactly their money is being spent. I expect people to be much more critical and to not just believe Yudkowsky but ask for data and progress reports. Nada.
Ask yourself, what difference would you expect to see if Dr. Evil would disguise as Eliezer Yudkowsky? Why wouldn't he write the sequences, why wouldn't he claim to be implementing CEV?
Yes, it is impossible to distinguish a sincere optimist from a perfectly selfish sociopath. At least until they gain power (or move to an audience where the signalling game is played at a higher level of sophistication than that of conveying altruism).
Earlier, I lamented that even though Eliezer named scholarship as one of the Twelve Virtues of Rationality, there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.
Previously, I provided an overview of formal epistemology, that field of philosophy that deals with (1) mathematically formalizing concepts related to induction, belief, choice, and action, and (2) arguing about the foundations of probability, statistics, game theory, decision theory, and algorithmic learning theory.
Now, I've written Machine Ethics is the Future, an introduction to machine ethics, the academic field that studies the problem of how to design artificial moral agents that act ethically (along with a few related problems). There, you will find PDFs of a dozen papers on the subject.
Enjoy!