A Brief Overview of Machine Ethics

6 Post author: lukeprog 05 March 2011 06:09AM

Earlier, I lamented that even though Eliezer named scholarship as one of the Twelve Virtues of Rationality, there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.

Previously, I provided an overview of formal epistemology, that field of philosophy that deals with (1) mathematically formalizing concepts related to induction, belief, choice, and action, and (2) arguing about the foundations of probability, statistics, game theory, decision theory, and algorithmic learning theory.

Now, I've written Machine Ethics is the Future, an introduction to machine ethics, the academic field that studies the problem of how to design artificial moral agents that act ethically (along with a few related problems). There, you will find PDFs of a dozen papers on the subject.

Enjoy!

Comments (90)

Comment author: Yvain 05 March 2011 04:11:25PM *  23 points [-]

I started looking through some of the papers and so far I don't feel enlightened.

I've never been able to tell whether I don't understand Kantian ethics, or Kantian ethics is just stupid. Take Prospects For a Kantian Machine. The first part is about building a machine whose maxims satisfy the universalizability criterion: that they can be universalized without contradicting themselves.

But this seems to rely a lot on being very good at parsing categories in exactly the right way to come up with the answer you wanted originally.

For example, it seems reasonable to have maxims that only apply to certain portions of the population, for example: "I, who am a policeman, will lock up this bank robber awaiting trial in my county jail" generalizes to "Other policemen will also lock up bank robbers awaiting trial in their county jails" if you're a human moral philosopher who knows how these things are supposed to work.

But I don't see what's stopping a robot from coming up with "Everyone will lock up everyone else" or "All the world's policemen will descend upon this one bank robber and try to lock him up in their own county jails". After all, Kant universalizes "I will deceive this murderer so he can't find his victim" to "Everyone will deceive everyone else all the time" and not to "Everyone will deceive murderers when a life is at stake". So if a robot were to propose "I, a robot, will kill all humans", why should we expect it to universalize it to "Everyone will kill everyone else" rather than "Other robots will also kill all humans", which just means the robot gets help?
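The ambiguity described above can be made concrete with a small sketch. The slot names and generality levels below are illustrative assumptions, not anything taken from the paper; the point is only that a mechanical universalizer faces several candidate generalizations of the same maxim and has nothing internal that picks out the intended one.

```python
from itertools import product

# Each slot of the maxim, listed from most specific to most general.
# These levels are hypothetical and chosen only for illustration.
SLOTS = {
    "agent": ["this policeman", "any policeman", "anyone"],
    "action": ["locks up"],
    "target": ["this bank robber", "any bank robber awaiting trial", "anyone"],
}


def universalizations(slots):
    """Enumerate every way of generalizing the maxim, one choice per slot."""
    names = list(slots)
    return [dict(zip(names, combo)) for combo in product(*(slots[n] for n in names))]


def render(maxim):
    """Turn a slot assignment back into an English sentence fragment."""
    return f"{maxim['agent']} {maxim['action']} {maxim['target']}"


candidates = [render(m) for m in universalizations(SLOTS)]

for sentence in candidates:
    print(sentence)
```

The output includes the reading a human philosopher would pick ("any policeman locks up any bank robber awaiting trial") alongside "anyone locks up anyone" and "any policeman locks up this bank robber". Choosing among them requires some extra criterion that the universalizability test itself does not supply.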

And even if it does universalize correctly, in the friendly AI context it need not be a contradiction! If this is a superintelligent AI we're talking about, then even in the best case scenario where everything goes right the maxim "I will try to kill all humans" will universalize to "Everyone will try to kill everyone else". Kant said this was contradictory in that every human will then be dead and none of them will gain the deserts of their murder - but in an AI context this isn't contradictory at all: the superintelligence will succeed at killing everyone else, the actions of the puny humans will be irrelevant, and the AI will be just fine.

(Actually, just getting far enough to make either of those objections involves hand-waving away about thirty other intractable problems you would need to solve first; but these seemed like the most pertinent.)

I'll look through some of the other papers later, but so far I'm not seeing anything to make me think Eliezer's opinion of the state of the field was overly pessimistic.

Comment author: Yvain 05 March 2011 04:34:21PM *  14 points [-]

Allen - Prolegomena to Any Future Moral Agent places a lot of emphasis on figuring out if a machine can be truly moral, in various metaphysical senses like "has the capacity to disobey the law, but doesn't" and "deliberates in a certain way". Not only is it possible that these are meaningless, but in a superintelligence the metaphysical implications should really take second place to the not-getting-turned-into-paperclips implications.

He proposes a moral Turing Test, where we call a machine moral if it can answer moral questions indistinguishably from a human. But Clippy would also pass this test, if a consequence of passing was that the humans lowered their guard/let him out of the box. In fact, every unfriendly superintelligence with a basic knowledge of human culture and a motive would pass.

Utilitarianism is considered difficult to implement because it's computationally impossible to predict all consequences. Given that any AI worth its salt would have a module for predicting the consequences of its actions anyway, and that the potential danger of the AI is directly related to how good this module is, that seems like a non-problem. It wouldn't be perfect, but it would do better than humans, at least.

Deontology, same problem as the last one. Virtue ethics seems problematic depending on the AI's motivation - if it were motivated to turn the universe to paperclips, would it be completely honest about it, kill humans quickly and painlessly and with a flowery apology, and declare itself to have exercised the virtues of honesty, compassion, and politeness? Evolution would give us something at best as moral as humans and probably worse - see the Sequence post about the tanks in cloudy weather.

Still not impressed.

Comment author: Yvain 05 March 2011 04:46:11PM *  9 points [-]

Mechanized Deontic Logic is pretty okay, despite the dread I had because of the name. I'm no good at formal systems, but as far as I can understand it looks like a logic for proving some simple results about morality: the example they give is "If you should see to it that X, then you should see to it that you should see to it that X."
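For readers who want to see the quoted result in symbols, here is one way it might be written. The notation is an assumption on my part: $O$ stands for "it ought to be that" and $[\alpha\ \mathrm{stit}\colon X]$ for "agent $\alpha$ sees to it that $X$"; the paper's own symbols may differ.

```latex
% "If you should see to it that X, then you should see to it
%  that you should see to it that X."
O[\alpha\ \mathrm{stit}\colon X] \;\rightarrow\; O[\alpha\ \mathrm{stit}\colon O[\alpha\ \mathrm{stit}\colon X]]
```

Read this way, the result says only that an obligation, once in place, generates an obligation to keep that obligation in place. It says nothing about which $X$ one is obliged to bring about.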

I can't immediately see a way this would destroy the human race, but that's only because it's nowhere near the point where it involves what humans actually think of as "morality" yet.

Comment author: Yvain 05 March 2011 05:03:18PM *  11 points [-]

Utilibot Project is about creating a personal care robot that will avoid accidentally killing its owner by representing the goal of "owner health" in a utilitarian way. It sounds like it might work for a robot with a very small list of potential actions (like "turn on stove" and "administer glucose") and a very specific list of owner health indicators (like "hunger" and "blood glucose level"), but it's not very relevant to the broader Friendly AI program.
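As a rough illustration of what "representing owner health in a utilitarian way" could look like with such a small action list, here is a minimal sketch. Everything in it is a hypothetical stand-in rather than the Utilibot design: the target values, the scaling constants, the predicted effect of each action, and the quadratic utility function are all assumptions made for the example.

```python
# Hypothetical healthy targets and scales for each indicator (illustrative numbers only).
TARGET = {"hunger": 0.0, "blood_glucose": 90.0}
SCALE = {"hunger": 1.0, "blood_glucose": 30.0}

# Hypothetical predicted change in each indicator caused by each action.
EFFECTS = {
    "do nothing": {},
    "turn on stove": {"hunger": -0.5},
    "administer glucose": {"blood_glucose": 40.0},
}


def predict(state, action):
    """Return the indicator values expected after taking the action."""
    new_state = dict(state)
    for indicator, delta in EFFECTS[action].items():
        new_state[indicator] += delta
    return new_state


def utility(state):
    """Negative sum of squared, scaled deviations from the healthy targets."""
    return -sum(((state[k] - TARGET[k]) / SCALE[k]) ** 2 for k in TARGET)


def choose_action(state):
    """Pick the action whose predicted outcome has the highest utility."""
    return max(EFFECTS, key=lambda action: utility(predict(state, action)))


print(choose_action({"hunger": 0.2, "blood_glucose": 50.0}))
print(choose_action({"hunger": 0.8, "blood_glucose": 90.0}))
print(choose_action({"hunger": 0.0, "blood_glucose": 90.0}))
```

With these made-up numbers, the robot gives glucose to a hypoglycemic owner, turns on the stove for a hungry one, and does nothing for a healthy one. The sketch also shows why the approach is narrow: all of the moral content lives in a hand-written table of indicators and effects.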

Having read as many papers as I have time to before dinner, my provisional conclusion is that Vladimir Nesov hit the nail on the head.

Comment author: lukeprog 05 March 2011 07:04:46PM 4 points [-]

I don't disagree with much of anything you've said here, by the way.

Remember that I'm writing a book that, for most of its length, will systematically explain why the proposed solutions in the literature won't work.

The problem is that SIAI is not even engaging in that discussion. Where is the detailed explanation of why these proposed solutions won't work? I don't get the impression someone like Yudkowsky has even read these papers, let alone explained why the proposed solutions won't work. SIAI is just speaking a different language than the professional machine ethics community is.

Most of the literature on machine ethics is not that useful, but that's true of almost any subject. The point of a literature hunt is to find the gems here and there that genuinely contribute to the important project of Friendly AI. Another point is to interact with the existing literature and explain to people why it's not going to be that easy.

Comment author: Vladimir_Nesov 05 March 2011 08:05:16PM *  7 points [-]

My sentiment about the role of engaging existing literature on machine ethics is analogous to what you describe in a recent post on your blog. Particularly this:

Oh God, you think. That’s where the level of discussion is, on this planet.

You either push the boundaries, or fight the good fight. And the good fight is best fought by writing textbooks and opening schools, not by public debates with distinguished shamans. But it's not entirely fair, since some of machine ethics addresses a reasonable problem of making well-behaved robots, which just happens to have the same surface feature of considering moral valuation of decisions of artificial reasoners, but on closer inspection is mostly unrelated to the problem of FAI.

Comment author: lukeprog 05 March 2011 10:56:50PM 3 points [-]

Sure. One of the hopes of my book is, as stated earlier, to bring people up to where Eliezer Yudkowsky was circa 2004.

Also, I worry that something is being overlooked by the LW / SIAI community because the response to suggestions in the literature has been so quick and dirty. I'm on the prowl for something that's been missed because nobody has done a thorough literature search and detailed rebuttal. We'll see what turns up.

Comment author: lukeprog 06 March 2011 12:17:56AM 2 points [-]

BTW, I so identify with this quote:

I've never been able to tell whether I don't understand Kantian ethics, or Kantian ethics is just stupid.

In fact, I've said the same thing myself, in slightly different words.

Comment author: AlephNeil 13 March 2011 07:19:13AM *  1 point [-]

Every sufficiently smart person who thinks about Kantian ethics comes up with this objection. I don't believe it's possible to defend against it entirely. However...

After all, Kant universalizes "I will deceive this murderer so he can't find his victim" to "Everyone will deceive everyone else all the time" and not to "Everyone will deceive murderers when a life is at stake".

That may be what Kant actually says (does he?) but if he does then I think he's wrong about his own theory. As I understand it, what you're supposed to do is look at the bit of reasoning which is actually causing you to want to do X and see whether that generalizes, not cast around for a bit of reasoning which would (or in this case, would not) generalize, and then pretend to be basing your action on that.

In the example you mention, you should only generalize to "everyone will deceive everyone all the time" if what you're considering doing is deceiving this person simply because he's a person. If you want to deceive him because of his intention to commit murder, and would not want to otherwise, then the thing you generalize must have this feature.

Similarly, I might try to justify lying to someone this morning on the basis that it generalizes to "I, who am AlephNeil, always lie on the morning of the 13th day of March 2011 if it is to my advantage" which is both consistent and advantageous (to me). But really I would be lying purely because it's to my advantage - the date and time, and the fact that I am AlephNeil, don't enter into the computation.

Comment author: lukeprog 13 March 2011 06:13:41AM *  0 points [-]

For Googleability, I'll note that this objection is called the problem of maxim specification.

Comment author: Document 23 April 2011 06:20:36AM 0 points [-]

That currently has no Google results besides your post.

Comment author: lukeprog 23 April 2011 06:51:05AM 1 point [-]

Yes, sorry. "Maxim specification" won't give you much, but variations on that will. People don't usually write "the problem of maxim specification" but instead things like "...specifying the maxim..." or "the maxim... specified..." and so on. It in general isn't easily Googled like "is-ought gap" is.

But here is one use.

Comment author: Wei_Dai 06 March 2011 07:55:05PM *  9 points [-]

Earlier, I lamented that even though Eliezer named scholarship as one of the Twelve Virtues of Rationality, there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.

Eliezer defined the virtue of scholarship as (a) "Study many sciences and absorb their power as your own." He was silent on whether, after you survey a literature and conclude that nobody has the right approach yet, you should (b) still cite the literature (presumably to show that you're familiar with it), and/or (c) rebut the wrong approaches (presumably to try to lead others away from the wrong paths).

I'd say that (b) and (c) are much more situational than (a). (b) is mostly a signaling issue. If you can convince your audience to take you seriously without doing it, then why bother? And (c) depends on how much effort you'd have to spend to convince others that they are wrong, and how likely they are to contribute to the correct solution after you turn them around. Or perhaps you're not sure that your approach is right either, and think it should just be explored alongside others.

At least some of the lack of scholarship that you see here just reflects a cost-benefit analysis on (b) and (c), instead of a lack of "interest" or "virtue". (Of course you probably have different intuitions on the costs and benefits involved, and I think you should certainly pursue writing your book if you think it's a good use of your time.)

Also, I note that there is remarkably little existing research on some of the topics we discuss here. For example, for my The Nature of Offense post, I was able to find just one existing article on the topic, and that was in a popular online magazine, instead of an academic publication.

Comment author: lukeprog 07 March 2011 03:24:38AM *  1 point [-]

This is an excellent comment, and you're probably right to some degree.

But I will say, I've learned many things already from the machine ethics literature, and I've only read about 1/4 of it so far.

Comment author: Vladimir_Nesov 07 March 2011 12:08:04PM 1 point [-]

But I will say, I've learned many things already from the machine ethics literature

Such as?

Comment author: lukeprog 07 March 2011 05:41:43PM 0 points [-]

Hold, please. I'm writing several articles and a book on this. :)

Comment author: lukeprog 09 March 2011 04:44:19AM 1 point [-]

But for now, this was Louie Helm's favorite paper among those we read during our survey of the literature on machine ethics.

Comment author: Pavitra 06 March 2011 08:01:10PM 1 point [-]

Citing the literature makes it easier for your reader to verify your reasoning. If you don't, then a proper confirmation or rebuttal requires (more) independent scholarship to discover the relevant existing literature from scratch.

Comment author: XiXiDu 05 March 2011 09:55:53AM 8 points [-]

...there is surprisingly little interest in (or citing of) the academic literature on some of Less Wrong's central discussion topics.

I think one of the reasons is that the LW/SIAI crowd thinks all other people are below their standards. For example:

I tried - once - going to an interesting-sounding mainstream AI conference that happened to be in my area. I met ordinary research scholars and looked at their posterboards and read some of their papers. I watched their presentations and talked to them at lunch. And they were way below the level of the big names. I mean, they weren't visibly incompetent, they had their various research interests and I'm sure they were doing passable work on them. And I gave up and left before the conference was over, because I kept thinking "What am I even doing here?" (Competent Elites)

More:

I don't mean to bash normal AGI researchers into the ground. They are not evil. They are not ill-intentioned. They are not even dangerous, as individuals. Only the mob of them is dangerous, that can learn from each other's partial successes and accumulate hacks as a community. (Above-Average AI Scientists)

Even more:

I am tempted to say that a doctorate in AI would be negatively useful, but I am not one to hold someone's reckless youth against them - just because you acquired a doctorate in AI doesn't mean you should be permanently disqualified. (So You Want To Be A Seed AI Programmer)

And:

If you haven't read through the MWI sequence, read it. Then try to talk with your smart friends about it. You will soon learn that your smart friends and favorite SF writers are not remotely close to the rationality standards of Less Wrong, and you will no longer think it anywhere near as plausible that their differing opinion is because they know some incredible secret knowledge you don't. (Eliezer_Yudkowsky August 2010 03:57:30PM)

Comment author: Vladimir_Nesov 05 March 2011 04:41:09PM *  15 points [-]

I think one of the reasons is that the LW/SIAI crowd thinks all other people are below their standards.

"Below their standards" is a bad way to describe this situation, it suggests some kind of presumption of social superiority, while the actual problem is just that the things almost all researchers write presumably on this topic are not helpful. They are either considering a different problem (e.g. practical ways of making real near-future robots not kill wrong people, where it's perfectly reasonable to say that philosophy of consequentialism is useless, since there is no practical way to apply it; or applied ethics, where we ask how humans should act), or contemplate the confusingness of the problem, without making useful progress (a lot of philosophy).

This property doesn't depend on whether we are making progress ourselves, so it's perfectly possible (and to a large extent true) that progress that is up to the standard of being useful is not made by SIAI either.

A point where SIAI makes visible and useful progress is in communicating the difficulty of the problem, the very fact that most of what is purportedly progress on FAI is actually not.

Comment author: lukeprog 06 March 2011 12:30:06AM 2 points [-]

A point where SIAI makes visible and useful progress is in communicating the difficulty of the problem...

This is, in fact, the main goal of my book on the subject. Except, I'll do it in more detail, and spend more time citing the specific examples from the literature that are wrong. Eliezer has done some of this, but there's lots more to do.

Comment author: benelliott 05 March 2011 12:06:27PM 3 points [-]

Your definition of 'LW/SIAI crowd' appears to be 'Eliezer Yudkowsky'.

Comment author: XiXiDu 05 March 2011 12:48:05PM *  -2 points [-]

Your definition of 'LW/SIAI crowd' appears to be 'Eliezer Yudkowsky'.

My current perception is that there are not many independent minds to be found here. I perceive there to be a strong tendency to jump if Yudkowsky tells people to jump. I'm virtually the only true critic of the SIAI, which is really sad and frightening. There are many examples that show how people just 'trust' him or believe in him and I haven't been able to figure out good reasons to do so.

ETA I removed the links to various 'examples' of what I have written above. Please PM me if you are curious.

Comment author: Emile 05 March 2011 09:36:04PM 4 points [-]

My current perception is that there are not many independent minds to be found here. I perceive there to be a strong tendency to jump if Yudkowsky tells people to jump. I'm virtually the only true critic of the SIAI, which is really sad and frightening.

Maybe it's because this "being an independent mind" thing isn't as great as you think it is? Like most people here, I've been raised hearing about the merits of challenging authority, thinking for yourself, questioning everything, not following the herd, etc. But there's a dark side to that, and it's thinking that when you disagree with the experts, you're right and the experts are wrong.

I now think that a lot of those "think for yourself" and "listen to your heart" things are borderline dark-side epistemology, and that by default, the experts are right and I should just shut up until I have some very good reasons to disagree. Any darn fool can decide the experts are victims of groupthink, or don't dare think outside the box, or just want to preserve the status quo. I think changing one's mind when faced with disagreeing expert opinion is a better sign of rationality than "thinking for oneself". I think that many rationalists' self-image as iconoclasts is harmful.

I'm willing to call myself an "Eliezer Yudkowsky fanboy" in a bullet-biting kind of way. I don't see the lack of systematic disagreement as a bad thing, and I don't care about looking like a cult member.

Comment author: XiXiDu 06 March 2011 10:26:03AM 1 point [-]

...by default, the experts are right and I should just shut up until I have some very good reasons to disagree.

Yet you decided to trust Yudkowsky, not the experts.

Any darn fool can decide the experts are victims of groupthink, or don't dare think outside the box, or just want to preserve the status quo.

I don't; that is why I am asking experts, many of whom seem not to share Yudkowsky's worries.

I'm willing to call myself an "Eliezer Yudkowsky fanboy" in a bullet-biting kind of way. I don't see the lack of systematic disagreement as a bad thing, and I don't care about looking like a cult member.

I actually had a link to his homepage and the SIAI on my homepage for a few years under 'favorite sites'.

Comment author: benelliott 05 March 2011 01:18:07PM *  4 points [-]

Your karma balance should be enough to prove that you definitely aren't the only critic on LW. Others who also disagree with him about various things have even higher balances.

There are definitely a number of true fanboys on this site, they may even be the majority (although I hope not), but they certainly aren't the whole of the LW crowd, and it is intellectually dishonest to put words in the rest of our mouths just by quoting Eliezer.

As for SIAI, by its very purpose it only attracts people who agree with Eliezer's philosophy of AI. There is nothing wrong with this. There is no good reason for someone who doesn't believe in the necessity or possibility of FAI to go work there. Would you also object if it seemed like everyone working for Village Reach agreed about giving vaccinations to African children being a good idea?

Comment author: XiXiDu 05 March 2011 04:46:39PM 7 points [-]

There are definitely a number of true fanboys on this site, they may even be the majority (although I hope not)...

See, that one person who donated the current balance of his bank account got 52 upvotes for it. Now I'm not particularly shocked by him doing that or the upvotes. I don't worry that all that money might be better spent somehow. What drives me is curiosity mixed with my personality, I want to do what's right. That is the reason why I criticize and why some comments may seem, or actually are, derogatory. I think it needs to be said, I believe I can provoke feedback that way and learn more about the underlying rationale. I desperately try to figure out if there is something I am missing.

I haven't read most of the sequences yet, let me explain why. I'm a really slow reader, I have almost no education and need a lot of time to learn anything. I did a lot of spot tests, reading various posts and came across people who read the sequences but haven't been able to conclude that they should stop doing anything except trying to earn money for the SIAI. My conclusion is that reading the sequences shouldn't be a priority right now but rather learning the mathematical basics, programming and reading various books. But I still try to spend some time here to see if that assessment might be wrong.

My current take on the whole issue is that the sequences do not provide many useful insights. I already know that by all that we know today AGI is possible and that it is unlikely that humans are the absolute limit when it comes to intelligence. I intuitively agree with the notion that AGI in its abstract form (intelligence as an algorithm) doesn't share our values if you do not deliberately 'tell' it to care. I see that one can outweigh even a low probability of risks from AI by assuming a future galactic civilization that is at stake. So what is my problem? I've written hundreds of comments about all kinds of problems I have with it, but maybe the biggest problem is a simple bias. I have an overwhelming gut feeling telling me that something is wrong with all this. I also do not trust my current ability to assess the situation to the extent that I would sacrifice other more compelling goals right now. And I am simply risk-averse. I know that there is always either a best choice or all options are equal, no matter what uncertainty. Maybe everything is currently speaking in favor of the SIAI, but I'm not able to ignore my gut feeling right now. Trying to do so frequently makes me reluctant to do anything at all. Something is very wrong, I can't pinpoint what it is right now so I'm throwing everything I got at it to see if the facade crumbles. So far it did not crumble but neither have I received much reassuring feedback.

My recent comments have been made after a night of no sleep and being in a bad mood. I wouldn't have written them in that way on another day. I even messaged Eliezer yesterday telling him that he can edit/delete any of my submissions here that might be harmful without having to fear that I will protest and therefore cause more trouble. I don't care about myself much, but I care not to hurt others or cause damage. Sadly I often become reluctant, then I say 'fuck it' and just go ahead to write something because I was overwhelmed by all the possible implications and subsequently ignored them.

What is really confusing is that, taken at face value, the SIAI is working on the most important and most dangerous problem anyone will ever face. The SIAI is trying to take over the universe! Yet all I see in its followers is extreme scope insensitivity. How so? Because if you seriously believe that someone else believes that he is trying to take over the multiverse then you don't just trust him because he wrote a few posts about rationality and being honest. If the stakes are high, people do everything. Ask yourself, what difference would you expect to see if Dr. Evil would disguise as Eliezer Yudkowsky? Why wouldn't he write the sequences, why wouldn't he claim to be implementing CEV? That is one of the problems that make me feel that something is wrong here. Either people really don't believe all this stuff about fooming AI, galactic civilizations and the ability of the SIAI to create a seed AI, or I'm missing something. What I would expect to see is people asking for transparency. I expect people to demand oversight and ask how exactly their money is being spent. I expect people to be much more critical and to not just believe Yudkowsky but ask for data and progress reports. Nada.

Comment author: Zack_M_Davis 05 March 2011 06:24:55PM 7 points [-]

Either people really don't believe all this stuff about [...] the ability of the SIAI to create a seed AI

It's worth noting that AGI is decades away; no one's trying to take over the universe just yet. In this light, donations to SingInst now are better seen as funding preliminary research and outreach regarding this important problem, rather than funding AI construction.

not just believe Yudkowsky but ask for data and progress reports.

What sort of data and progress reports are you looking for? Glancing at the first two pages of the SingInst blog, I see a list of 2010 publications, and newsletters for last July and October. There's certainly room for criticism (e.g., "Why no newsletter since last October?" or "All this outreach is not very useful; I want to see incremental progress towards FAI"), but I wouldn't say there've been no progress reports.

Comment author: XiXiDu 05 March 2011 07:02:54PM 7 points [-]

What sort of data and progress reports are you looking for?

  • What are they working on right now?
  • Why are they working on it?
  • What constitutes a success of the current project?
  • How much money was spent on that project?
  • What could be done with more or less money?

As far as I know Yudkowsky is currently writing a book. He earnt $95,550 last year.

What I can't reconcile right now is the strong commitment and what is actually being done. Quite a few people here actually seem to donate considerable amounts of their income to the SIAI. No doubt writing the sequences, a book and creating an online community is pretty cool but does not seem to be too cost intensive. At least others manage to do that without lots of people sending them money. I myself donated 3 times. But why are many people acting like the SIAI is the only charity that currently deserves funding, why is nobody asking if they actually need more money or if they are maybe sustainable right now? I haven't heard anything about the acquisition of a supercomputer, field experiments in neuroscience or the hiring of mathematicians. All that would justify further donations. I feel people here are not critical and demanding enough.

Comment author: Dorikka 05 March 2011 06:57:07PM *  4 points [-]

Upvoting for honesty and posting a true rejection.

I haven't read most of the sequences yet

Even if you're a slow reader, I think that it is very, very worth it to read most of the sequences. I've not read QM, Evolution, Decision Theory, and parts of Metaethics/ Human's guide to words, but I think that reading the others has drastically increased my rationality (especially the Core Sequences.) I don't think that reading technical books would have done so nearly as much because I find reading prose much more engaging than math.

My recent comments have been made after a night of no sleep and being in a bad mood.

I've recently concluded that I should place a 'highly suspect' marker on my thoughts (especially negative generalizations) if I am very hungry or tired. I tend to be quite irritable in both cases -- I'll get into arguments in which I'm really not interested in finding truth, but just getting a high from bashing the other person into the ground (please note that I am sharing my own experiences, not accusing you of this.) You may want to type these comments out so that you don't lose the thought but wait to post them until you're feeling better.

Because if you seriously believe that someone else believes that he is trying to take over the multiverse then you don't just trust him because he wrote a few posts about rationality and being honest. If the stakes are high, people do everything. Ask yourself, what difference would you expect to see if Dr. Evil would disguise as Eliezer Yudkowsky?

I've had these same thoughts before and since resolved them, but I've run out of mental steam and need to do some schoolwork. I may edit this or make a separate reply to this later.

Edit: Bolded script in this post was added for clarification -- bolding does not indicate emphasis here.

Comment author: benelliott 05 March 2011 05:34:45PM 4 points [-]

Interesting thought, I'll admit I hadn't actually considered that (I have a general problem with being too trusting and not seeing ulterior motives, although I suspect most people really aren't very dishonest).

I can see a few reasons why others might not be asking:

1) It's unlikely to get an answer. There hasn't been a whole lot of willingness to respond to similar requests in the past, and EY has a thing about not giving in to demands. This doesn't really explain why people are still donating.

2) The number of genuine Dr Evils in the world is very small. Historically the most dangerous individuals have been the well-intentioned but deluded rather than the rationally malicious, which is odd since the latter category seems much more dangerous, and therefore provides evidence of their rarity. Maybe people are just making an expected utility calculation and determining that the Dr Evil hypothesis is unlikely enough to trust SIAI anyway.

3) Eliezer is not the whole of SIAI, he is not even in charge. Some of the people involved have existing track records, so if there is a conspiracy it runs very deep. I suppose it's possible he has tricked every other member of the organization, but we are now adding a lot of burdensome details to what was already a fairly unlikely hypothesis.

4) If there are any real Dr Evils out there, then SIAI transparency might actually help them by giving away SIAI ideas while Dr Evil keeps his ideas to himself and as a result finishes his design first.

5) If I was Dr Evil trying to build an AI, then I wouldn't say that was what I was doing, since AI is quite a hard sell and will only get donations from a limited demographic (even more so for an out-of-the-mainstream idea like FAI). I would found the "organization for the protection of puppies kittens and bunnies" or something like that, which will probably get more donation money (or maybe even go into business rather than charity, since current evidence suggests that is overwhelmingly the most effective way to make large amounts of money).

Frankly, rather than a Dr Evil who wants to take over the Galaxy (I don't think he's ever said anything about the multiverse) a much more likely prospect is a conman who's found an underused niche. Of course, this wouldn't explain how he got some fairly big names like Jaan Tallinn and Edwin Evans to sponsor his donation drive.

Most of these reasons are being quite charitable to LW members, and unfortunately I suspect my own reason is the most common.

Comment author: wedrifid 06 March 2011 03:55:26AM *  2 points [-]

Ask yourself, what difference would you expect to see if Dr. Evil would disguise as Eliezer Yudkowsky? Why wouldn't he write the sequences, why wouldn't he claim to be implementing CEV?

Yes, it is impossible to distinguish a sincere optimist from a perfectly selfish sociopath. At least until they gain power (or move to an audience where the signalling game is played at a higher level of sophistication than that of conveying altruism).

Comment author: Vladimir_Nesov 05 March 2011 05:05:34PM *  -1 points [-]

Ask yourself, what difference would you expect to see if Dr. Evil would disguise as Eliezer Yudkowsky? Why wouldn't he write the sequences, why wouldn't he claim to be implementing CEV?

In that case, I would expect a stupid Eliezer Yudkowsky. But one shouldn't actually reason this way, the question is, what do you anticipate, given observations actually made; not how plausible are the observations actually made, given an uncaused hypothesis.

Comment author: Pavitra 05 March 2011 06:14:49PM 2 points [-]

You can't compute P(H|E) without computing P(E|H).
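
To make the relationship concrete, here is a minimal sketch of Bayes' theorem in Python. The numbers are purely hypothetical and chosen only for illustration; H stands for any hypothesis and E for any observation. It shows that the likelihood P(E|H) is an input to the posterior P(H|E), and that the two can differ by a wide margin.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) using Bayes' theorem.

    P(H|E) = P(E|H) * P(H) / P(E), where
    P(E)   = P(E|H) * P(H) + P(E|not H) * P(not H).
    """
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / p_e


# Hypothetical inputs (assumptions, not taken from the thread):
p_h = 0.01        # prior probability of the hypothesis
p_e_h = 0.9       # probability of the evidence if H is true
p_e_not_h = 0.5   # probability of the evidence if H is false

p_h_e = posterior(p_h, p_e_h, p_e_not_h)
print(f"P(E|H) = {p_e_h:.3f}")
print(f"P(H|E) = {p_h_e:.3f}")
```

With these made-up inputs the evidence is quite likely under H, yet the posterior stays small because the prior is low and the evidence is also fairly likely when H is false.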

Comment author: Vladimir_Nesov 05 March 2011 07:42:00PM 2 points [-]

But one shouldn't confuse the two.

Comment author: XiXiDu 05 March 2011 06:28:53PM *  2 points [-]

In that case, I would expect a stupid Eliezer Yudkowsky

Why is evil stupid and what evidence is there that Yudkowsky is smart enough not to be evil?

But one shouldn't actually reason this way, the question is, what do you anticipate, given observations actually made; not how plausible are the observations actually made, given an uncaused hypothesis.

If you got someone working on friendly AI you better ask if the person is friendly in the first place. You also shouldn't make conclusions based on the output of the subject of your conclusions. If Yudkowsky states what is right and states that he will do what is right that provides no evidence about the rightness and honesty of those statements. Besides, the most advanced statements about Yudkowsky's intentions are CEV and the meta-ethics sequence. Both are either criticized or not understood.

The question should be, what is the worst-case scenario regarding Yudkowsky and the SIAI and how can we discern it from what he is signaling? If the answer isn't clear, one should ask for transparency and oversight.

Comment author: Quirinus_Quirrell 05 March 2011 08:37:24PM 17 points [-]

You seem to be under the impression that Eliezer is going to create an artificial general intelligence, and oversight is necessary to ensure that he doesn't create one which places his goals over humanity's interests. It is important, you say, that he is not allowed unchecked power. This is all fine, except for one very important fact that you've missed.

Eliezer Yudkowsky can't program. He's never published a nontrivial piece of software, and doesn't spend time coding. In the one way that matters, he's a muggle. Ineligible to write an AI. Eliezer has not positioned himself to be the hero, the one who writes the AI or implements its utility function. The hero, if there is to be one, has not yet appeared on stage. No, Eliezer has positioned himself to be the mysterious old wizard - to lay out a path, and let someone else follow it. You want there to be oversight over Eliezer, and Eliezer wants to be the oversight over someone else to be determined.

But maybe we shouldn't trust Eliezer to be the mysterious old wizard, either. If the hero/AI programmer comes to him with a seed AI, then he knows it exists, and finding out that a seed AI exists before it launches is the hardest part of any plan to steal it and rewrite its utility function to conquer the universe. That would be pretty evil, but would "transparency and oversight" make things turn out better, or worse? As far as I can tell, transparency would mean announcing the existence of a pre-launch AI to the world. This wouldn't stop Eliezer from making a play to conquer the universe, but it would present that option to everybody else, including at least some people and organizations who are definitely evil.

So that's a bad plan. A better plan would be to write a seed AI yourself, keep it secret from Eliezer, and when it's time to launch, ask for my input instead.

Comment author: Eliezer_Yudkowsky 27 December 2013 12:22:47AM 4 points [-]

(For the record: I've programmed in C++, Python, Java, wrote some BASIC programs on a ZX80 when I was 5 or 6, and once very briefly when MacOS System 6 required it I wrote several lines of a program in 68K assembly. I admit I haven't done much coding recently, due to other comparative advantages beating that one out.)

Comment author: topynate 27 December 2013 12:50:18AM 0 points [-]

I can't find it by search, but haven't you stated that you've written hundreds of KLOC?

Comment author: XiXiDu 06 March 2011 11:57:45AM 1 point [-]

Eliezer has not positioned himself to be the hero, the one who writes the AI or implements its utility function

I disagree based on the following evidence:

After all, if you had the complete decision process, you could run it as an AI, and I'd be coding it up right now. (Eliezer_Yudkowsky 12 October 2009 06:19:28PM)

You further write:

If the hero/AI programmer comes to him with a seed AI, then he knows it exists, and finding out that a seed AI exists before it launches is the hardest part of any plan to steal it and rewrite its utility function to conquer the universe.

I'm not aware of any reason to believe that recursively self-improving artificial general intelligence is going to be something you can 'run away with'. It looks like some people here think so, that there will be some kind of, with hindsight, simple algorithm for intelligence that people can just run and get superhuman intelligence. Indeed, transparency could be very dangerous in that case. But that doesn't mean it is an all or nothing decision. There are many other reasons for transparency, including reassurance and the ability to discern a trickster or impotent individual from someone who deserves more money. But as I said, I don't see that anyway. It'll more likely be a blue sheet of different achievements that are each not dangerous on their own. I further think it will be not just a software solution but also a conceptual and computational revolution. In those cases an open approach will allow public oversight. And even if someone is going to run with it, you want them to use your solution rather than one that will most certainly be unfriendly.

Comment author: Vladimir_Nesov 05 March 2011 07:45:58PM *  2 points [-]

Evil is not necessarily stupid (well, it is, if we are talking about humans, but let's abstract from that). Still, it would take a stupid Dr Evil to decide that pretending to be Eliezer Yudkowsky is the best available course of action.

Comment author: timtyler 05 March 2011 08:48:33PM *  1 point [-]

You don't think that being Eliezer Yudkowsky is an effective way to accomplish the task at hand? What should Dr Evil do, then?

FWIW, my usual comparison is not with Dr Evil, but with Gollum. The Singularity Institute have explicitly said they are trying to form "The Fellowship of the AI". Obviously we want to avoid Gollum's final scene.

Gollum actually started out good - it was the exposure to the ring that caused problems later on.

Comment author: Leonhart 05 March 2011 08:58:03PM 0 points [-]

I seem to remember Smeagol being an unpleasant chap even before Deagol found the ring. But admittedly, we weren't given much.

Comment author: timtyler 05 March 2011 07:16:40PM *  -1 points [-]

what is the worst-case scenario regarding Yudkowsky and the SIAI and how can we discern it from what he is signaling? If the answer isn't clear, one should ask for transparency and oversight.

Transparency is listed as being desirable here:

It will become increasingly important to develop AI algorithms that are not just powerful and scalable, but also transparent to inspection - to name one of many socially important properties.

However, apparently, this doesn't seem to mean open source software - e.g. here:

the Singularity Institute does not currently plan to develop via an open-source method

Comment author: Vladimir_Nesov 05 March 2011 07:40:10PM 2 points [-]

You equivocate two unrelated senses of "transparency".

Comment author: XiXiDu 05 March 2011 05:10:39PM -1 points [-]

Would you also object if it seemed like everyone working for Village Reach agreed about giving vaccinations to African children being a good idea?

If I would disagree and believe that it is worth it to voice my disagreement, then yes. You just can't compare that though. Can you name another group of people who try to take over the universe?

As for SIAI, by its very purpose only attracts people who agree with Eliezer's philosophy of AI. There is nothing wrong with this.

Jehovah's Witnesses also only attract certain people. A lot of money is being donated and spent on brainwashing material designed to get even more money to spend on brainwashing. I think that is wrong. The problem is that nobody there is deliberately doing something 'wrong'. There is no guru, they all believe they are doing what is 'right'. Nobody is critical. But if they had a forum where one could openly discuss their ideas with them then I'd be there and challenge them. Not that I want to compare them with LW, that'd be crazy, but I want to challenge your argument.

Comment author: benelliott 05 March 2011 05:41:02PM 0 points [-]

The Village Reach argument was referring to SIAI, not Less Wrong. They are distinct entities, one is a forum for discussion and the other is an organization with the aim of doing something. It is quite right that the first has many dissenting opinions, whereas the latter does not. SIAI may be able to benefit from dissent on the many sub-issues related to FAI, but not to the fundamental idea that FAI is important.

Imagine a company where about 40% of the employees, even at the highest levels, disagreed with the premise that they should be trying to make money and instead either intentionally tried to lose the company money, or argued constantly with the other 60%. Nothing would get done.

Disagreement about FAI may be good for LW but it is probably not good for SIAI. Since there is disagreement on LW, I really don't see the problem.

Comment author: Vladimir_Nesov 05 March 2011 05:46:43PM *  2 points [-]

SIAI may be able to benefit from dissent on the many sub-issues related to FAI, but not to the fundamental idea that FAI is important.

If FAI is unimportant, SIAI should conclude that FAI is unimportant. Hence it's not clear where the following distinction happens.

Disagreement about FAI may be good for LW but it is probably not good for SIAI.

Comment author: benelliott 05 March 2011 05:55:23PM *  1 point [-]

I don't think it's the best use of any organization's money to employ people who disagree with the premise that the organization should exist.

Comment author: Vladimir_Nesov 05 March 2011 05:59:02PM 0 points [-]

Nonetheless, I don't think it's the best use of any organization's money to employ people who disagree with the premise that the organization should exist.

But disagreement itself is not the reason for this being a bad strategy.

Comment author: benelliott 05 March 2011 06:05:51PM *  3 points [-]

I don't quite follow. The only point I was trying to make was that "everybody in SIAI agrees about FAI, therefore they're all a bunch of brainwashed zombies" is not a valid complaint.

Comment author: Vladimir_Nesov 05 March 2011 05:15:41PM 0 points [-]

Not that I want to compare them with LW, that'd be crazy, but I want to challenge your argument.

What argument? benelliott suggested that your argument makes use of a very weak piece of evidence (presence of significant agreement). Obviously, interpreted as counterevidence of the opposite claim, it is equally weak.

Comment author: lukeprog 06 March 2011 03:28:16AM 2 points [-]

I doubt you're "virtually the only true critic of the SIAI."

But if you think I'm not much of a critic of SIAI/Yudkowsky, you're right. Many of my posts have included minor criticisms, but that's because it's not as valued here to just repeat all the thousands of things on which I agree with Eliezer.

Comment author: XiXiDu 06 March 2011 10:34:01AM *  3 points [-]

But if you think I'm not much of a critic of SIAI/Yudkowsky, you're right. Many of my posts have included minor criticisms, but that's because it's not as valued here to just repeat all the thousands of things on which I agree with Eliezer.

I actually messaged him telling him that he can edit/delete any harmful submissions of mine without having to expect harmful protest. Does that look like I particularly disagree with him, or assign a high probability to him being Dr. Evil? I don't, but it is a possibility and it is widely ignored. To get provable friendly AI you'll need provable friendly humans. If that isn't possible you'll need oversight and transparency.

  • Smart people can be wrong.
  • Smart people can be evil.
  • People can appear smarter than they are.

That's why I demand...

  • Third-party peer-review of Yudkowsky's work.
  • Oversight and transparency.
  • Progress reports, roadmaps and confirmable success.

Comment author: wedrifid 06 March 2011 10:45:12AM 2 points [-]

To get provable friendly AI you'll need provable friendly humans.

Not actually true.

Comment author: XiXiDu 06 March 2011 11:33:37AM *  1 point [-]

Not actually true.

Technically it isn't of course. But I don't expect unfriendly humans not to show me friendly AI but actually implement something else. What I meant is that you'll need friendly humans to not end up with some trickster who takes your money and in 30 years you notice that all he has done is to code some chat bot. There are a lot of reasons that the trustworthiness of the humans involved is important. Of course, provable friendly AI is provable friendly no matter who coded it.

Comment author: wedrifid 06 March 2011 03:41:26AM *  0 points [-]

My current perception is that there are not many independent minds to be found here. I perceive there to be a strong tendency to jump if Yudkowsky tells people to jump. I'm virtually the only true critic of the SIAI, which is really sad and frightening.

I criticise Eliezer frequently. I manage to do so without being particularly negatively received by the alleged Yudkowsky hive mind.

Note: My criticisms of EY/SIAI are specific even if consistent. Like lukeprog I do not feel the need to repeat the thousands of things about which I agree with EY.

Further Note: There are enough distinct things that I disagree with Eliezer about that, given my metacognitive confidence levels I can expect that on at least one of them I am mistaken. Which is a curious epistemic state to be in but purely tangential. ;)
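
The arithmetic behind that expectation can be sketched as follows. This is only an illustration under a loudly stated assumption: the disagreements are treated as independent, and the confidence levels are hypothetical, not figures from the comment.

```python
import math


def p_at_least_one_wrong(confidences):
    """Probability of being mistaken on at least one claim.

    Assumes (for illustration only) that the claims are independent,
    so P(all correct) is the product of the individual confidences.
    """
    return 1.0 - math.prod(confidences)


# Hypothetical example: ten disagreements, each held with 90% confidence.
confidences = [0.9] * 10
print(f"{p_at_least_one_wrong(confidences):.3f}")
```

Even with high confidence on each individual point, the chance of at least one error grows quickly as the number of points increases.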

Yet another edit: A clear example of criticism of Eliezer is with respect to his discussion of his metaethics and CEV. I didn't find his contribution in the linked conversation satisfactory and consider it representative of his other recent contributions on the subject. Everything except his sequence on the subject has been nowhere near the standard I would expect from someone dedicating their life to studying a subject that will rely on reasoning flawlessly in the related area!

Comment author: XiXiDu 06 March 2011 10:51:16AM 0 points [-]

Like lukeprog I do not feel the need to repeat the thousands of things about which I agree with EY.

You think I don't? I agree with almost everyone about thousands of things. I perceive myself to be an uneducated fool. If I read a few posts of someone like Yudkowsky and intuitively agree, that is very weak evidence to trust him or of his superior intellect.

I still think that he's one of the smartest people though. But there is a limit to what I'll just accept on mere reassurance. And I have seen nothing that would allow me to conclude that he could accomplish much regarding friendly AI without a billion dollars and a team of mathematicians and other specialists.

Comment author: wedrifid 06 March 2011 11:47:11AM 0 points [-]

You think I don't?

No, that wasn't for your benefit at all. Just disclaiming limits. Declarations of criticism are sometimes worth tempering just a tad. :)

Comment author: David_Gerard 05 March 2011 10:43:03AM *  0 points [-]

By the MWI sequence, I presume he means the QM sequence, which appears clear to me but bogus to physicists I've asked ... and, more importantly, to the physicists who commented on the posts in it and said that he couldn't do what he'd just done (to which he answered that he doesn't claim to be a physicist).

Also, judging by the low votes and small number of commenters, it seems that even people who claim to have read the sequences have tended to tl;dr at the QM sequence.

(I finally finished a first run through the million words of sequences and the millions of words of comments. I only finally tipped my tl;dr tilt sensor at the decision theory sequence, which isn't actually very sequential.)

Comment author: wedrifid 06 March 2011 01:02:41AM -1 points [-]

I love those quotes. The one about negatively useful AI doctorates is a favourite of mine. :)

Comment author: Manfred 05 March 2011 11:45:19AM -1 points [-]

Huh, just read So You Want To Be A Seed AI Programmer. Appears to be from 2009. I would recommend http://www.fastcompany.com/magazine/06/writestuff.html as a highly contrasting frame of thought.

Comment author: komponisto 05 March 2011 03:25:12PM 3 points [-]

Huh, just read So You Want To Be A Seed AI Programmer. Appears to be from 2009

It's from much earlier than that (like 2005 or something). That particular wiki isn't the original source.

Comment author: Daniel_Burfoot 05 March 2011 05:28:14PM *  2 points [-]

With regards to your (and Eliezer's) quest, I think Oppenheimer's Maxim is relevant:

It is a profound and necessary truth that the deep things in science are not found because they are useful, they are found because it was possible to find them.

A theory of machine ethics may very well be the most useful concept ever discovered by humanity. But as far as I can see, there is no reason to believe that such a theory can be found.

Comment author: lukeprog 06 March 2011 12:16:46AM 4 points [-]

Daniel_Burfoot,

I share your pessimism. When superintelligence arrives, humanity is almost certainly fucked. But we can try.

Comment author: timtyler 05 March 2011 07:41:11PM 0 points [-]

For the list:

The Ethics of Artificial Intelligence http://www.nickbostrom.com/ethics/artificial-intelligence.pdf

Ethical Issues in Advanced Artificial Intelligence http://www.nickbostrom.com/ethics/ai.html

Beyond AI http://mol-eng.com/

Comment author: lukeprog 06 March 2011 12:15:49AM 0 points [-]

Tim,

I have hundreds of papers I could upload and put on the list. The list was just a preview. Thanks anyway.

Comment author: Daniel_Burfoot 05 March 2011 02:42:28PM 0 points [-]

Cloos, “The Utilibot Project: An Autonomous Mobile Robot Based on Utilitarianism"

!!!