BOOK DRAFT: 'Ethics and Superintelligence' (part 1, revised)

lukeprog

As previously announced, I plan to post the first draft of the book, Ethics and Superintelligence, in tiny parts, to the Less Wrong discussion area. Your comments and constructive criticisms are much appreciated.

This is not a book for a mainstream audience. Its style is that of contemporary Anglophone philosophy. Compare to, for example, Chalmers' survey article on the singularity.

Bibliographic references are provided here.

This "part 1" section is probably the only part for which I will post a revision to Less Wrong. Revisions of further parts of the book will probably not appear publicly until the book is published.

Revised part 1 below....

1. The technological singularity is coming soon.

Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica (MacKenzie 1995). By the late 1990s, “expert systems” had surpassed human ability in a wide range of tasks.[i] In 1997, IBM’s Deep Blue defeated the reigning World Chess Champion Garry Kasparov (Campbell et al. 2002). In 2011, IBM’s Watson beat the best human players at a much more complicated game: Jeopardy! (Someone, 2011). Recently, a robot scientist was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results. It answered a question about yeast that had baffled human scientists for 150 years (King 2011).

Many experts think that human-level general intelligence may be created within this century.[ii] This raises an important question. What will happen when an artificial intelligence (AI) surpasses human ability at designing artificial intelligences?

I.J. Good (1965) speculated that such an AI would be able to improve its own intelligence, leading to a positive feedback loop of improving intelligence – an “intelligence explosion.” Such a machine would rapidly become intelligent enough to take control of the internet, use robots to build itself new hardware, do science on a massive scale, invent new computing technology and energy sources, or achieve similar dominating goals. As such, it could be humanity’s last invention (Bostrom 2003).

Humans would be powerless to stop such a “superintelligence” (Bostrom 1998) from accomplishing its goals. Thus, if such a scenario is at all plausible, then it is critically important to program the goal system of this superintelligence such that it does not cause human extinction when it comes to power.

Success in that project could mean the difference between a utopian solar system of unprecedented harmony and happiness, and a solar system in which all available matter (including human flesh) has been converted into parts for a planet-sized computer built to solve difficult mathematical problems.[iii]

The technical challenges of designing the goal system of such a superintelligence are daunting.[iv] But even if we can solve those problems, the question of which goal system to give the superintelligence remains. It is at least partly a question of philosophy – a question of ethics.

***

In this chapter I argue that a single, powerful superintelligence - one variety of what Bostrom (2006) calls a “singleton” - is likely to arrive within the next 200 years unless a worldwide catastrophe drastically impedes scientific progress.

The singleton will produce very different future worlds depending on which normative theory is used to design its goal system. In chapter two, I survey many popular normative theories, and conclude that none of them offers an attractive basis for designing the motivational system of a machine superintelligence.

Chapter three reformulates and strengthens what is perhaps the most developed plan for the design of the singleton’s goal system – Eliezer Yudkowsky’s (2004) “Coherent Extrapolated Volition.” Chapter four considers some outstanding worries about this plan.

In chapter five I argue that we cannot decide how to design the singleton’s goal system without considering meta-ethics, because normative theory depends on meta-ethics. The next chapter argues that we should invest little effort in meta-ethical theories that do not fit well with our emerging reductionist picture of the world, just as we quickly abandon scientific theories that don’t fit the available scientific data. I also identify several meta-ethical positions that I think are good candidates for abandonment.

But the looming problem of the technological singularity requires us to have a positive theory, too. Chapter seven proposes some meta-ethical claims about which I think naturalists should come to agree. In the final chapter, I consider the implications of these meta-ethical claims for the design of the singleton’s motivational system.

***

[i] For a detailed history of achievements and milestones in artificial intelligence, see Nilsson (2009).

[ii] Bainbridge (2005), Baum et al. (2010), Chalmers (2010), Legg (2008), Vinge (1993), Nielsen (2011), Yudkowsky (2008).

[iii] This particular nightmare scenario is given in Yudkowsky (2001), who believes Marvin Minsky may have been the first to suggest it.

[iv] These technical challenges are discussed in the literature on artificial agents in general and Artificial General Intelligence (AGI) in particular. Russell and Norvig (2009) provide a good overview of the challenges involved in the design of artificial agents. Goertzel and Pennachin (2010) provide a collection of recent papers on the challenges of AGI. Yudkowsky (2010) proposes a new extension of causal decision theory to suit the needs of a self-modifying AI. Yudkowsky (2001) discusses other technical (and philosophical) problems related to designing the goal system of a superintelligence.

(1) The subject "a program written in 1956" is vague. What program? Does it have a name? Who wrote it? Later you write "one of them" when referring to the proof. Which one?

(2) The next three sentences start the same way. Try to avoid that.

(3) Omit needless words.

(4) Avoid the passive voice.

(5) This section needs a rewrite.

(6) The adverb is not your friend.

(7) What is a "technological singularity"? You haven't defined this term.

Great start. I look forward to more.

It's great to see that you're doing this. I think it's incredibly valuable to formulate, clarify, and argue in a structured format like this. Keep going! Here are some comments; I haven't held back :)

If you want to reach, be taken seriously by, and convince the kind of audience for which Chalmers' paper was intended then you need to get much more careful and meticulous in your analysis.

In your first paragraph you give excited accounts of some successes in AI. This is fine for scene-setting in mainstream literature but if you're going to cite these in an academic setting then you need to expand and deepen your description of them.

Each subsequent paragraph takes quite gigantic analytical steps. To someone who isn't already familiar with the topic, it's not clear to what extent your citations back up your claims (one might just think they define terms or provide some weak corroboration). You could say things like "Bostrom shows..." or "Yudkowsky has developed...". Even if you target an audience that is already familiar with this area, I think it's worth greatly expanding this introductory area.

Since you're going to use singletons deeply within your analysis it would be well worth giving a clear and concise definition rather than only citing Bostrom.

Yeah; this is only the intro. I'm going to revisit all this material in more detail in the very next section.

Again, I'm not sure how relevant this is to your goals, but if you're trying to reach some kind of academic audience a la Chalmers' paper, then you should probably say "we will argue in chapter X that ..."

You are also making lots of specific claims, e.g. "unless a global catastrophe stops scientific progress" - are there really no other scenarios? In constructing a well-reasoned argument, you want to avoid as much of this unnecessary vulnerability as possible - people will quote this out of context and then dismiss your entire book.

I repeat that this is the intro. I don't argue for anything here. That comes later. Read the openings of any other academic work. They do not contain arguments. They contain previews of what will be argued.

He seems to be going into details about those other points in part 2, which he already posted.

Interestingly, I think he should go into less detail about those scenarios, to avoid boring his audience and losing them in irrelevant detail. Which goes to show there's no pleasing everybody.

Yeah, like most books for academics, this is definitely a book only for people who are highly interested in this very narrow topic.

Sounds great so far! Good job trying to reach out to new audiences! Upvoted.

However, I STRONGLY recommend discussing whatever grievances you have with Eliezer's theories with him for a few minutes before writing about them, to avoid embarrassing mistakes like spending a chapter on a problem he could instantly refute with a single sentence while mentioning only in passing the problem that actually turned out to be problematic.

Formatting note: You should really use relative links for anchors; your use of absolute links renders them broken. (This has happened with some of your other posts too, e.g. your last one, though there they weren't totally broken, as at least the linked page was on the web.)

Yudkowsky (2010) proposes a new extension of causal decision theory to suit the needs of a self-modifying AI.

I wouldn't describe TDT as "an extension of CDT". Also, the AI being self-modifying is probably not the central issue actually addressed by TDT (as opposed to what possibly motivated development of the ideas).

"An extension of CDT" is how Yudkowsky (2010) presents TDT. The paper also says one of its central goals is to handle self-modifying AI.

A page number, rather than a reminder of the publication year, would be a more helpful parenthetical there.

The first point is made in paragraph #1:

"Timeless decision theory (TDT) is an extension of causal decision networks that compactly represents uncertainty about correlated computational processes and represents the decision-maker as such a process."

The second point is made in paragraph #2:

"I show that an evidential or causal decision-maker capable of self-modifying actions, given a choice between remaining an evidential or causal decision-maker and modifying itself to imitate a timeless decision-maker, will choose to imitate a timeless decisionmaker on a large class of problems."

http://singinst.org/upload/TDT-v01o.pdf

"Timeless decision theory (TDT) is an extension of causal decision networks that compactly represents uncertainty about correlated computational processes and represents the decision-maker as such a process."

After a fashion, since causal networks are not exactly CDT, modeling correlated computations with causal networks makes them less "causal" (i.e. related to physical causality), and the paper doesn't achieve clear specification of how to do that (it's an open problem, but I can say that any nontrivial causal network relating computations may need to be revised in face of new logical evidence, which makes the decision procedure that itself works with resolution of logical uncertainty brittle).

"I show that an evidential or causal decision-maker capable of self-modifying actions, given a choice between remaining an evidential or causal decision-maker and modifying itself to imitate a timeless decision-maker, will choose to imitate a timeless decisionmaker on a large class of problems."

That CDT/EDT agents with self-modification would become more TDT-like is somewhat different from saying that TDT "suits the needs of a self-modifying AI". TDT is a more sane theory, and to the extent CDT/EDT agents would prefer to be more effective, they'd prefer to adopt TDT's greater sanity. But TDT is not a fixed point; suiting the needs of a self-modifying AI is a tall order that probably can't be met to any reasonable extent, since that would mean establishing some rules that the AI itself would not have much opportunity to rebuild. Perhaps laws of physics or logic can qualify, with appropriate framing.

(I agree with your description more now than when I posted the comment, my perception of the paper having clouded the memory of its wording.)

Fair enough. Thanks for this. I've clarified the wording in my copy of this intro.

The first point is made in paragraph #1:

"Timeless decision theory (TDT) is an extension of causal decision networks that compactly represents uncertainty about correlated computational processes and represents the decision-maker as such a process."

Warning: my comment here is based on dim memories and vague understandings. That said...

Are you sure that being an extension of causal decision networks is the same as being an extension of causal decision theory? I took "causal decision networks" to be referring to the Judea Pearl-inspired stuff. I thought that Eliezer said somewhere that CDT just assumes causal dependencies wherever it wants, while Eliezer derives them with causal networks.

I'm familiar with the link. I suggested adding the page number because that affords significant ease to finding the passage you claim is present, even given the pdf file.

Edit: To avoid this passive-aggressive beating around the bush: the part you've posted so far gives off the vibe of bibliography padding. You've done a great service to LW in your articles reminding everyone of the importance of building off the work of others. But if there's a specific claim the author makes that you're relying on, it helps establish relevance and makes things easy on the reader if you are more specific than just citing the whole work.

Re: your edit.

That is not standard practice when the point I'm citing the work for is in the abstract of the paper. Moreover, it's just not true that it's always standard practice to cite the page number of the specific work. I can point you to thousands of examples. That said, for books it is indeed helpful to cite page numbers.

Separate issues going on here: yes, it is common not to cite the specific page number, perhaps because one's referring to the work "as a whole". But my criticism of your paper having a vibe of bib-padding was not because of you saying "Yudkowsky (2010)" here, or any particular time in your excerpt, but rather, because taken as a whole, the excerpt looks that way. For example, there is much more citation than explanation of what the cited works are specifically there to substantiate.

Certainly, doing that a few times is fine, but here it looks more like what would result from someone trying to game the "bibliography length metric".

Independently of that, your comment on this thread was supposedly to help someone find where a paper makes a claim that others didn't remember the paper making. Regardless of the standards for academic papers, it is general etiquette that would suggest you give the page number in such a reply. And that shouldn't be hard if you know the paper well enough to be using it for that claim.

So while "Yudkowsky (2010)" might be fine for the paper, when someone questions whether it makes a specific claim, you should do a little more than just give the naked citation a second time within the same thread -- then you should give the page number.

I just disagree with all of this. The "explanation" stuff comes later; this is just the intro. If you read major works in Anglophone philosophy, you'll see that what I have here is very much the same style. You can even compare to, for example, the extended abstract submitted by Salamon & Bostrom for an edited volume on the technological singularity.

And no, you don't usually give the page number when the claims you're saying a cited work covers are in the abstract of the paper. The hope is that someone will bother to at least read the abstract of the cited paper before bothering the original author with questions about "Which page number is that on?"

"Um... page one?"

I just disagree with all of this.

I would ask you to reconsider. As SilasBarta says, "Yudkowsky (2010)" is fine in the paper, but you used it in the comments here in response to someone's question in this forum.

And no, you don't usually give the page number when the claims you're saying a cited work covers are in the abstract of the paper. The hope is that someone will bother to at least read the abstract of the cited paper before bothering the original author with questions about "Which page number is that on?"

"Um... page one?"

You seem to assume that the only way someone could have asked that question is if they hadn't read even the abstract. But it is easy for me to imagine someone who read the whole paper, or some significant fraction, and just missed or forgot the claim that you attribute to Yudkowsky. In which case, saying "It's on page one" would be helpful.

In fact, having read that significant fraction, I would be moderately surprised to hear Yudkowsky characterize TDT as an extension of CDT. He gave me the strong impression of offering an alternative to CDT, one which gets right answers where CDT is wrong. To me, calling TDT "an extension of CDT" implies that it applies to a wider range of problems than CDT, while agreeing with CDT where CDT gives a well-defined answer.

To me, calling TDT "an extension of CDT" implies that it applies to a wider range of problems than CDT, while agreeing with CDT where CDT gives a well-defined answer.

But this is a correct characterization of what TDT does. It extends applicability of CDT from action-determined to decision-determined problems.

But this is a correct characterization of what TDT does. It extends applicability of CDT from action-determined to decision-determined problems.

But CDT already gives well-defined answers to decision-determined problems such as Newcomb's problem. They're just not necessarily the right answers.

By "applies", I mean "yields an output, which supporters claim is correct", not "yields a correct output".
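To make the Newcomb's problem point above concrete, here is a rough sketch (not from the TDT paper or from anyone in this thread) of how a causal and an evidential expected-value calculation come apart. The payoffs are the usual textbook ones, and the predictor accuracy of 0.99, the prior `prob_full`, and all function names are assumptions made up for illustration.

```python
# Hypothetical illustration of Newcomb's problem. Payoffs, probabilities, and
# names are assumed for the example; none of them come from the TDT paper.
BIG = 1_000_000   # opaque box contents if one-boxing was predicted
SMALL = 1_000     # transparent box contents, always available


def edt_values(accuracy):
    """Expected payoffs when the action is treated as evidence about the prediction."""
    return {
        "one-box": accuracy * BIG,
        "two-box": (1 - accuracy) * BIG + SMALL,
    }


def cdt_values(prob_full):
    """Expected payoffs when the box contents are treated as fixed, independent of the action."""
    return {
        "one-box": prob_full * BIG,
        "two-box": prob_full * BIG + SMALL,
    }


def best(values):
    """Return the action with the highest expected payoff."""
    return max(values, key=values.get)


if __name__ == "__main__":
    print("EDT picks:", best(edt_values(accuracy=0.99)))
    print("CDT picks:", best(cdt_values(prob_full=0.5)))
```

Under these assumptions the causal calculation returns a definite answer (two-box) for any value of `prob_full`, since two-boxing always adds `SMALL`. That is the sense in which CDT "gives a well-defined answer" here, whatever one thinks of its correctness.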

Again, separate issues here that I think you're blurring (or I'm just being unclear): I'm not criticizing you for lacking page numbers in the paper excerpt, but for piling on whole-work citations without clarifying what specific insight each one adds. I will have to retract that in light of viewing this as an introduction, since you say your paper will cover that later.

WRT page numbers, my criticism was that when someone says, "hey, that claim about [citation of X which I have already read] doesn't sound right", then you should give a more helpful answer than "oh, that's in [citation of X]" -- you should point to a more specific passage.

And yes, it would have indeed been helpful if you had said "see the abstract", because that would have told the questioner (and the onlooker, me, who was wondering the same thing) what you are basing that claim on. In this matter, for the reasons Vladimir gave, TDT isn't best regarded as an extension of CDT. So a reply that you just got it from the abstract would show that (as turned out to be the case) your claim was based on a summary rather than on a specific analysis of the mechanics of TDT and its relationship to other decision theories.

Edit: I apologize for any abrasiveness in my comments here. I sensed a kind of stubbornness and condescension in your replies and overreacted to that.

WRT page numbers, my criticism was that when someone says, "hey, that claim about [citation of X which I have already read] doesn't sound right", then you should give a more helpful answer than "oh, that's in [citation of X]" -- you should point to a more specific passage.

Luke quoted the source and gave a link to it in digital form. If you want to find the context, open it up in Acrobat and copy the quote into the search bar.

It wasn't in quote form, and like I just said, it matters which instance of the "TDT = extension of CDT" claim lukeprog had in mind. As it turns out, he was relying on a qualification-free summary, rather than the meat of the paper. If he had said so the first time Vladimir asked about it (rather than repeat the exact citation he already gave, and which the questioner had already read and already had the URL for), then he could have saved himself, me, and Vladimir the time it took to unravel this oversimplification (as it turned out to be).

"Find the passage that helps me the most" just isn't good enough.

It wasn't in quote form

Luke's comment has two paragraphs in quotes, and a link to Eliezer's TDT paper. If you follow the link, you can copy either of the quoted paragraphs into Acrobat's search bar, and it will show where in the document that paragraph appears.

Your first request for clarification that led to this comment was justified. But that should have satisfied your desire for a citation.

That wasn't until ~3 rounds of back-and-forth! And I didn't ask further after that; I just said why the previous answers weren't as helpful. And while the desire for a citation may have been satisfied, the entire point was to reveal the level of understanding which led to his claim about the TDT paper, and that issue was not satisfied by lukeprog's responses. His exchange with Vladimir, however, did show my concerns to be justified.

Again, if he had just simply said from the beginning, "Oh, I was just copying what the abstract said", it would have saved everyone a lot of time. But instead, he decided to unhelpfully repeat a citation everyone already knew about, thus hiding his level of understanding for a few more rounds.

I don't know why you'd want to defend that.

That wasn't until ~3 rounds of back-and-forth!

You asked for clarification once, and Luke gave a satisfactory response. How do you get "~3 rounds" from that?

I just said why the previous answers weren't as helpful.

There was no reason for you to do that; you got an answer that addressed those concerns.

And while the desire for a citation may have been satisfied, the entire point was to reveal the level of understanding which led to his claim about the TDT paper, and that issue was not satisfied by lukeprog's responses.

Notice how Nesov, by focusing back on the object level after the issue of communicating citations was resolved, was able to deal with it in this comment.

It answered a question about yeast that had baffled human scientists for 150 years (King 2011).

I've read several of King's papers, and I don't recall it answering any important questions. Which King publication are you citing, and what was the discovery?

Such a machine would rapidly become intelligent enough to take control of the internet, use robots to build itself new hardware, do science on a massive scale, invent new computing technology and energy sources, or achieve similar dominating goals.

This section sounds like sci-fi, and might make some readers take the work as a whole less seriously. On the other hand, it is a pretty reasonable prediction (in terms of "how big an impact an AI could have", even if the details are wrong), so toning it down would be a bit dishonest.

I don't know what the best strategy is for talking about such technical risks without sounding like a loon. I'm more in favor of a cautious path: keeping such claims vague and general unless they are preceded by enough explanation that they don't seem that outlandish. But I don't know your audience as well as you do; I just think it's an aspect that requires more care than when writing for LessWrong.

Vague and general is never a good idea.

Vague and general claims about the future are more likely to be accurate than detailed claims, even though we might find detailed claims more believable - see Burdensome Details.

My objection was about the writing. See Williams.

Relevant old link: Singularity writing advice by Eliezer.
