Eliezer_Yudkowsky comments on Reply to Holden on 'Tool AI' - Less Wrong

92 Post author: Eliezer_Yudkowsky 12 June 2012 06:00PM

Comments (339)

Comment author: Eliezer_Yudkowsky 18 July 2012 02:12:12PM 25 points

So first a quick note: I wasn't trying to say that the difficulties of AIXI are universal and everything goes analogously to AIXI, I was just stating why AIXI couldn't represent the suggestion you were trying to make. The general lesson to be learned is not that everything else works like AIXI, but that you need to look a lot harder at an equation before thinking that it does what you want.

On a procedural level, I worry a bit that the discussion is trying to proceed by analogy to Google Maps. Let it first be noted that Google Maps simply is not playing in the same league as, say, the human brain, in terms of complexity; and that if we were to look at the winning "algorithm" of the million-dollar Netflix Prize competition, which was in fact a blend of 107 different algorithms, you would have a considerably harder time figuring out why it claimed anything it claimed.

But to return to the meta-point, I worry about conversations that go into "But X is like Y, which does Z, so X should do reinterpreted-Z". Usually, in my experience, that goes into what I call "reference class tennis" or "I'm taking my reference class and going home". The trouble is that there's an unlimited number of possible analogies and reference classes, and everyone has a different one. I was just browsing old LW posts today (to find a URL of a quick summary of why group-selection arguments don't work in mammals) and ran across a quotation from Perry Metzger to the effect that so long as the laws of physics apply, there will always be evolution, hence nature red in tooth and claw will continue into the future - to him, the obvious analogy for the advent of AI was "nature red in tooth and claw", and people who see things this way tend to want to cling to that analogy even if you delve into some basic evolutionary biology with math to show how much it isn't like intelligent design. For Robin Hanson, the one true analogy is to the industrial revolution and farming revolutions, meaning that there will be lots of AIs in a highly competitive economic situation with standards of living tending toward the bare minimum, and this is so absolutely inevitable and consonant with The Way Things Should Be as to not be worth fighting at all. That's his one true analogy and I've never been able to persuade him otherwise. For Kurzweil, the fact that many different things proceed at a Moore's Law rate to the benefit of humanity means that all these things are destined to continue and converge into the future, also to the benefit of humanity. For him, "things that go by Moore's Law" is his favorite reference class.

I can have a back-and-forth conversation with Nick Bostrom, who looks much more favorably on Oracle AI in general than I do, because we're not playing reference class tennis with "But surely that will be just like all the previous X-in-my-favorite-reference-class", nor saying, "But surely this is the inevitable trend of technology"; instead we lay out particular, "Suppose we do this?" and try to discuss how it will work, not with any added language about how surely anyone will do it that way, or how it's got to be like Z because all previous Y were like Z, etcetera.

My own FAI development plans call for trying to maintain programmer-understandability of some parts of the AI during development. I expect this to be a huge headache, possibly 30% of total headache, possibly the critical point on which my plans fail, because it doesn't happen naturally. Go look at the source code of the human brain and try to figure out what a gene does. Go ask the Netflix Prize winner for a movie recommendation and try to figure out "why" it thinks you'll like watching it. Go train a neural network and then ask why it classified something as positive or negative. Try to keep track of all the memory allocations inside your operating system - that part is humanly understandable, but it flies past so fast you can only monitor a tiny fraction of what goes on, and if you want to look at just the most "significant" parts, you would need an automated algorithm to tell you what's significant. Most AI algorithms are not humanly understandable. Part of Bayesianism's appeal in AI is that Bayesian programs tend to be more understandable than non-Bayesian AI algorithms. I have hopeful plans to try and constrain early FAI content to humanly comprehensible ontologies, prefer algorithms with humanly comprehensible reasons-for-outputs, carefully weigh up which parts of the AI can safely be less comprehensible, monitor significant events, slow down the AI so that this monitoring can occur, and so on. That's all Friendly AI stuff, and I'm talking about it because I'm an FAI guy. I don't think I've ever heard any other AGI project express such plans; and in mainstream AI, human-comprehensibility is considered a nice feature, but rarely a necessary one.

It should finally be noted that AI famously does not result from generalizing normal software development. If you start with a map-route program and then try to program it to plan more and more things until it becomes an AI... you're doomed, and all the experienced people know you're doomed. I think there's an entry or two in the old Jargon File aka Hacker's Dictionary to this effect. There's a qualitative jump to writing a different sort of software - from normal programming where you create a program conjugate to the problem you're trying to solve, to AI where you try to solve cognitive-science problems so the AI can solve the object-level problem. I've personally met a programmer or two who've generalized their code in interesting ways, and who feel like they ought to be able to generalize it even further until it becomes intelligent. This is a famous illusion among aspiring young brilliant hackers who haven't studied AI. Machine learning is a separate discipline and involves algorithms and problems that look quite different from "normal" programming.

Comment author: HoldenKarnofsky 18 July 2012 04:29:00PM 14 points

Thanks for the response. My thoughts at this point are that

  • We seem to have differing views of how to best do what you call "reference class tennis" and how useful it can be. I'll probably be writing about my views more in the future.
  • I find it plausible that AGI will have to follow a substantially different approach from "normal" software. But I'm not clear on the specifics of what SI believes those differences will be and why they point to the "proving safety/usefulness before running" approach over the "tool" approach.
  • We seem to have differing views of how frequently today's software can be made comprehensible via interfaces. For example, my intuition is that the people who worked on the Netflix Prize algorithm had good interfaces for understanding "why" it recommends what it does, and used these to refine it. I may further investigate this matter (casually, not as a high priority); on SI's end, it might be helpful (from my perspective) to provide detailed examples of existing algorithms for which the "tool" approach to development didn't work and something closer to "proving safety/usefulness up front" was necessary.
Comment author: oooo 06 July 2013 05:50:31PM *  3 points

Canonical software development examples emphasizing "proving safety/usefulness before running" over the "tool" software development approach are cryptographic libraries and NASA space shuttle navigation.

At the time of writing this comment, there was recent furor over software called CryptoCat that didn't provide enough warnings that it was not properly vetted by cryptographers and thus should have been assumed to be inherently insecure. Conventional wisdom and repeated warnings from the security community state that cryptography is extremely difficult to do properly and that attempting to create your own may have catastrophic results. A similar thought and development process goes into space shuttle code.

It seems that the FAI approach to "proving safety/usefulness" is more similar to the way cryptographic algorithms are developed than the (seemingly) much faster "tool" approach, which is more akin to web development where the stakes aren't quite as high.

EDIT: I believe the "prove" approach still allows one to run snippets of code in isolation, but tends to shy away from running everything end-to-end until significant effort has gone into individual component testing.
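To make the "snippets in isolation" point concrete, here is a minimal sketch of testing one component on its own before anything is wired together end-to-end. The helper names (`pad`, `unpad`, `check_padding_component`) are hypothetical and chosen only for illustration; the padding scheme is PKCS#7-style, and this is a toy, not vetted cryptographic code.

```python
def pad(data: bytes, block_size: int = 16) -> bytes:
    """PKCS#7-style padding: append n bytes, each with value n (1..block_size)."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Strip the padding added by pad(), rejecting malformed input."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("length is not a positive multiple of the block size")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


def check_padding_component() -> bool:
    """Exercise the padding component alone; no cipher or network is involved."""
    for length in range(40):
        message = bytes(range(length))
        padded = pad(message)
        assert len(padded) % 16 == 0      # always a whole number of blocks
        assert len(padded) > len(message) # at least one padding byte is added
        assert unpad(padded) == message   # round trip recovers the input
    try:
        unpad(b"\x00" * 16)               # a padding byte of 0 is never valid
    except ValueError:
        pass
    else:
        raise AssertionError("malformed padding was accepted")
    return True


print(check_padding_component())
```

The component is checked across a range of input lengths and against one malformed input before it is trusted inside a larger system, which is the sense of "individual component testing" meant above.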

Comment author: wedrifid 19 July 2012 02:05:49AM 1 point

I can have a back-and-forth conversation with Nick Bostrom, who looks much more favorably on Oracle AI in general than I do, because we're not playing reference class tennis with "But surely that will be just like all the previous X-in-my-favorite-reference-class", nor saying, "But surely this is the inevitable trend of technology"; instead we lay out particular, "Suppose we do this?" and try to discuss how it will work, not with any added language about how surely anyone will do it that way, or how it's got to be like Z because all previous Y were like Z, etcetera.

That's one way to "win" a game of reference class tennis. Declare unilaterally that what you are discussing falls into the reference class "things that are most effectively reasoned about by discussing low level details and abandoning or ignoring all observed evidence about how things with various kinds of similarity have worked in the past". Sure, it may lead to terrible predictions sometimes but by golly, it means you can score an 'ace' in the reference class tennis while pretending you are not even playing!

Comment author: Eliezer_Yudkowsky 19 July 2012 05:52:40PM 9 points

And atheism is a religion, and bald is a hair color.

The three distinguishing characteristics of "reference class tennis" are (1) that there are many possible reference classes you could pick and everyone engaging in the tennis game has their own favorite which is different from everyone else's; (2) that the actual thing is obviously more dissimilar to all the cited previous elements of the so-called reference class than all those elements are similar to each other (if they even form a natural category at all rather than having been picked out retrospectively based on similarity of outcome to the preferred conclusion); and (3) that the citer of the reference class says it with a cognitive-traffic-signal quality which attempts to shut down any attempt to counterargue the analogy because "it always happens like that" or because we have so many alleged "examples" of the "same outcome" occurring (for Hansonian rationalists this is accompanied by a claim that what you are doing is the "outside view" (see points 1 and 2 for why it's not) and that it would be bad rationality to think about the "individual details").
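Characteristic (2) can be read as a rough numerical check, sketched below under loud assumptions: that each case can be described by a feature vector at all, and that Euclidean distance is a sensible measure of dissimilarity. Neither assumption comes from the discussion itself, the function names are hypothetical, and the feature vectors are invented purely for illustration. Choosing the features is, of course, exactly the part people disagree about.

```python
from itertools import combinations
from math import dist
from statistics import mean


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs inside the reference class."""
    return mean(dist(a, b) for a, b in combinations(points, 2))


def mean_distance_to_class(target, points):
    """Average Euclidean distance from the new case to each class member."""
    return mean(dist(target, p) for p in points)


def fails_similarity_check(target, points):
    """True when the new case sits farther from the class members than the
    members sit from one another, i.e. characteristic (2) holds."""
    return mean_distance_to_class(target, points) > mean_pairwise_distance(points)


# Hypothetical feature vectors, invented for illustration only.
reference_class = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
distant_case = (9.0, 9.0)
nearby_case = (1.5, 1.5)

print(fails_similarity_check(distant_case, reference_class))  # True
print(fails_similarity_check(nearby_case, reference_class))   # False
```

The check only compares two averages; it says nothing about whether the class forms a natural category in the first place, which is the parenthetical worry in (2).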

I have also termed this Argument by Greek Analogy after Socrates's attempt to argue that, since the Sun appears the next day after setting, souls must be immortal.

Comment author: [deleted] 19 July 2012 10:20:53PM *  11 points

I have also termed this Argument by Greek Analogy after Socrates's attempt to argue that, since the Sun appears the next day after setting, souls must be immortal.

For the curious, this is from the Phaedo, pages 70-72. The run of the argument is basically thus:

P1 Natural changes are changes from and to opposites, like hot from relatively cold, etc.

P2 Since every change is between opposites A and B, there are two logically possible processes of change, namely A to B and B to A.

P3 If only one of the two processes were physically possible, then we should expect to see only one of the two opposites in nature, since the other will have passed away irretrievably.

P4 Life and death are opposites.

P5 We have experience of the process of death.

P6 We have experience of things which are alive.

C From P3, 4, 5, and 6 there is a physically possible, and actual, process of going from death to life.

The argument doesn't itself prove (haha) the immortality of the soul, only that living things come from dead things. The argument is made in support of the claim, made prior to this argument, that if living people come from dead people, then dead people must exist somewhere. The argument is particularly interesting for premises 1 and 2, which are hard to deny, and 3, which seems fallacious but for non-obvious reasons.

Comment author: Eliezer_Yudkowsky 20 July 2012 04:40:00PM 5 points

This sounds like it might be a bit of a reverent-Western-scholar steelman such as might be taught in modern philosophy classes; Plato's original argument for the immortality of the soul sounded more like this, which is why I use it as an early exemplar of reference class tennis:

-

Then let us consider the whole question, not in relation to man only, but in relation to animals generally, and to plants, and to everything of which there is generation, and the proof will be easier. Are not all things which have opposites generated out of their opposites? I mean such things as good and evil, just and unjust—and there are innumerable other opposites which are generated out of opposites. And I want to show that in all opposites there is of necessity a similar alternation; I mean to say, for example, that anything which becomes greater must become greater after being less.

True.

And that which becomes less must have been once greater and then have become less.

Yes.

And the weaker is generated from the stronger, and the swifter from the slower.

Very true.

And the worse is from the better, and the more just is from the more unjust.

Of course.

And is this true of all opposites? and are we convinced that all of them are generated out of opposites?

Yes.

And in this universal opposition of all things, are there not also two intermediate processes which are ever going on, from one to the other opposite, and back again; where there is a greater and a less there is also an intermediate process of increase and diminution, and that which grows is said to wax, and that which decays to wane?

Yes, he said.

And there are many other processes, such as division and composition, cooling and heating, which equally involve a passage into and out of one another. And this necessarily holds of all opposites, even though not always expressed in words—they are really generated out of one another, and there is a passing or process from one to the other of them?

Very true, he replied.

Well, and is there not an opposite of life, as sleep is the opposite of waking?

True, he said.

And what is it?

Death, he answered.

And these, if they are opposites, are generated the one from the other, and have there their two intermediate processes also?

Of course.

Now, said Socrates, I will analyze one of the two pairs of opposites which I have mentioned to you, and also its intermediate processes, and you shall analyze the other to me. One of them I term sleep, the other waking. The state of sleep is opposed to the state of waking, and out of sleeping waking is generated, and out of waking, sleeping; and the process of generation is in the one case falling asleep, and in the other waking up. Do you agree?

I entirely agree.

Then, suppose that you analyze life and death to me in the same manner. Is not death opposed to life?

Yes.

And they are generated one from the other?

Yes.

What is generated from the living?

The dead.

And what from the dead?

I can only say in answer—the living.

Then the living, whether things or persons, Cebes, are generated from the dead?

That is clear, he replied.

Then the inference is that our souls exist in the world below?

That is true.

(etc.)

Comment author: [deleted] 20 July 2012 07:36:25PM 2 points

This sounds like it might be a bit of a reverent-Western-scholar steelman such as might be taught in modern philosophy classes

That was roughly my aim, but I don't think I inserted any premises that weren't there. Did you have a complaint about the accuracy of my paraphrase? The really implausible premise there, namely that death is the opposite of life, is preserved I think.

As for reverence, why not? He was, after all, the very first person in our historical record to suggest that thinking better might make you happier. He was also an intellectualist about morality, at least sometimes a hedonic utilitarian, and held no great respect for logic. And he was a skilled myth-maker. He sounds like a man after your own heart, actually.

Comment author: thomblake 25 July 2012 08:14:40PM 2 points

I think your summary didn't leave anything out, or even apply anything particularly charitable.

Comment author: thomblake 25 July 2012 08:18:57PM 0 points

Esar's summary doesn't seem to be different from this, other than 1) adding the useful bit about "passed away irretrievably" and 2) yours makes it clear that the logical jump happens right at the end.

I'm actually not sure now why you consider this like "reference class tennis". The argument looks fine, except for the part where "souls exist in the world below" jumps in as a conclusion, not having been mentioned earlier in the argument.

Comment author: [deleted] 25 July 2012 08:50:02PM *  0 points

The 'souls exist in the world below' bit is directly before what Eliezer quoted:

Suppose we consider the question whether the souls of men after death are or are not in the world below. There comes into my mind an ancient doctrine which affirms that they go from hence into the other world, and returning hither, are born again from the dead. Now if it be true that the living come from the dead, then our souls must exist in the other world, for if not, how could they have been born again? And this would be conclusive, if there were any real evidence that the living are only born from the dead; but if this is not so, then other arguments will have to be adduced.

Very true, replied Cebes.

Then let us consider the whole question...

But you're right that nothing in the argument defends the idea of a world below, just that souls must exist in some way between bodies.

Comment author: TheAncientGeek 04 July 2014 12:14:14PM 0 points

The argument omits that living things can come from living things and dead things from dead things.

Therefore, the fact that living things can come from dead things does not mean that they have to in every case.

Although, if everything started off dead, they must have at some point.

So it's an argument for abiogenesis.

Comment author: bogdanb 10 July 2013 06:28:03PM *  0 points

just that souls must exist in some way between bodies.

Not even that, at least in the part of the argument I’ve seen (paraphrased?) above.

He just mentions an ancient doctrine, and then claims that souls must exist somewhere while they’re not embodied, because he can’t imagine where they would come from otherwise. I’m not even sure if the ancient doctrine is meant as argument from authority or is just some sort of Chewbacca defense.

(He doesn’t seem to explicitly claim the “ancient doctrine” to be true or plausible, just that it came to his mind. It feels like I’ve lost something in the translation.)

Comment author: wedrifid 20 July 2012 12:15:37AM 5 points

(2) that the actual thing is obviously more dissimilar to all the cited previous elements of the so-called reference class than all those elements are similar to each other (if they even form a natural category at all rather than having been picked out retrospectively based on similarity of outcome to the preferred conclusion);

Ok, it seems like under this definition of "reference class tennis" (particularly parts (2) and (3)) the participants must be wrong and behaving irrationally about it in order to be playing reference class tennis. So when they are either right or at least applying "outside view" considerations correctly given all the information available to them, they aren't actually playing "reference class tennis" but instead doing whatever it is that reasoning (boundedly) correctly using reference to actual relevant evidence about related occurrences is called when it isn't packaged with irrational wrongness.

With this definition in mind it is necessary to translate replies such as those here by Holden:

We seem to have differing views of how to best do what you call "reference class tennis" and how useful it can be. I'll probably be writing about my views more in the future.

Holden's meaning is, of course, not that he argues <reference class tennis: (1), (2), (3)> is actually a good thing, but rather that the label doesn't apply to what he is doing. He is instead doing that other thing that is actually sound thinking and thinks people are correct to do so.

Come to think of it if most people in Holden's shoes heard Eliezer accuse them of "reference class tennis" and actually knew that he intended it with the meaning he explicitly defines here rather than the one they infer from context they would probably just consider him arrogant, rude and mind killed then write him and his organisation off as not worth engaging with.

In the vast majority of cases where I have previously seen Eliezer argue against people using "outside view" I have agreed with Eliezer, and have grown rather fond of using the phrase "reference class tennis" as a reply myself where appropriate. But seeing how far Eliezer has taken the anti-outside-view position here and the extent to which "reference class tennis" is defined as purely an anti-outside-view semantic stop sign, I'll be far more hesitant to make use of it myself.

It is tempting to observe "Eliezer is almost always right when he argues against 'outside view' applications, and the other people are all confused. He is currently arguing against 'outside view' applications. Therefore, the other people are probably confused." To that I reply either "Reference class tennis!" or "F*$% you, I'm right and you're wrong!" (I'm honestly not sure which is the least offensive.)

Comment author: Eliezer_Yudkowsky 20 July 2012 12:43:07AM 5 points

Which of 1, 2 and 3 do you disagree with in this case?

Edit: I mean, I'm sorry to parody but I don't really want to carefully rehash the entire thing, so, from my perspective, Holden just said, "But surely strong AI will fall into the reference class of technology used to give users advice, just like Google Maps doesn't drive your car; this is where all technology tends to go, so I'm really skeptical about discussing any other possibility." Only Holden has argued to SI that strong AI falls into this particular reference class so far as I can recall, with many other people having their own favored reference classes e.g. Hanson et al. as cited above; a strong AI is far more internally dissimilar from Google Maps and Yelp than Google Maps and Yelp are internally similar to each other, plus there are many many other software programs that don't provide advice at all so arguably the whole class may be chosen-post-facto; and I'd have to look up Holden's exact words and replies to e.g. Jaan Tallinn to decide to what degree, if any, he used the analogy to foreclose other possibilities conversationally without further debate, but I do think it happened a little, but less so and less explicitly than in my Robin Hanson debate. If you don't think I should at this point diverge into explaining the concept of "reference class tennis", how should the conversation proceed further?

Also, further opinions desired on whether I was being rude, whether logically rude or otherwise.

Comment author: Randaly 25 July 2012 04:39:40AM 7 points

Viewed charitably, you were not being rude, although you did veer away from your main point in ways likely to be unproductive. (For example, being unnecessarily dismissive towards Hanson, who you'd previously stated had given arguments roughly as good as Holden's; or spending so much of your final paragraph emphasizing Holden's lack of knowledge regarding AI.)

On the most likely viewing, it looks like you thought Holden was probably playing reference class tennis. This would have been rude, because it would imply that you thought the following inaccurate things about him:

  • He was "taking his reference class and going home"
  • That you can't "have a back-and-forth conversation" with him

I don't think that you intended those implications. All the same, your final comment came across as noticeably less well-written than your post.

Comment author: Eliezer_Yudkowsky 25 July 2012 05:53:38PM 1 point

Thanks for the third-party opinion!

Comment author: TimS 20 July 2012 12:47:06AM 1 point

I'm confused how you thought "reference class tennis" was anything but a slur on the other side's argument. Likewise "mindkilled." Sometimes, slurs about arguments are justified (agnostic in the instant case) - but that's a separate issue.

Comment author: Sewing-Machine 19 July 2012 10:40:08PM 1 point

The three distinguishing characteristics of "reference class tennis" are

Do Karnofsky's contributions have even one of these characteristics, let alone all of them?

Comment author: Eliezer_Yudkowsky 20 July 2012 12:09:48AM 2 points

Empirically obviously 1 is true, I would argue strongly for 2 but it's a legitimate point of dispute, and I would say that there were relatively small but still noticeable but quite forgiveable traces of 3.

Comment author: aaronsw 04 August 2012 10:37:44AM *  0 points

Then it does seem like your AI arguments are playing reference class tennis with a reference class of "conscious beings". For me, the force of the Tool AI argument is that there's no reason to assume that AGI is going to behave like a sci-fi character. For example, if something like On Intelligence turns out to be true, I think the algorithms it describes will be quite generally intelligent but hardly capable of rampaging through the countryside. It would be much more like Holden's Tool AI: you'd feed it data, it'd make predictions, you could choose to use the predictions.

(This is, naturally, the view of that school of AI implementers. Scott Brown: "People often seem to conflate having intelligence with having volition. Intelligence without volition is just information.")

Comment author: MatthewBaker 18 July 2012 04:27:27PM *  -1 points

Your prospective AI plans for programmer-understandability seem very close to Starmap-AI, by which I mean:

It's called the Global Association Table. The points or stars represent concepts, and the lines are the links between them.

The best story I've read about a not-so-failed utopia involves this kind of accountability over the FAI. While I hate to generalize from fictional evidence, it definitely seems like a necessary step to not becoming a galaxy that tiles over the aliens with happy faces instead of just freezing them in place to prevent human harm.