ata comments on Ben Goertzel: The Singularity Institute's Scary Idea (and Why I Don't Buy It) - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
The problem with Pascal's Wager isn't that it's a Wager. The problem with Pascal's Wager and Pascal's Mugging (its analogue in finite expected utility maximization), as near as I can tell, is that if you do an expected utility calculation including one outcome that has a tiny probability but enough utility or disutility to weigh heavily in the calculation anyway, you need to include every possible outcome that is around that level of improbability, or you are privileging a hypothesis and are probably making the calculation less accurate in the process. If you actually are including every other hypothesis at that level of improbability, for instance if you are a galaxy-sized Bayesian superintelligence who, for reasons beyond my mortal mind's comprehension, has decided not to just dismiss those tiny possibilities a priori anyway, then it still shouldn't be any problem; at that point, you should get a sane, nearly-optimal answer.
So, is this situation a Pascal's Mugging? I don't think it is. 1% isn't at the same level of ridiculous improbability as, say, Yahweh existing, or the mugger's threat being true. 1% chances actually happen pretty often, so it's both possible and prudent to take them into account when a lot is at stake. The only extra thing to consider is that the remaining 99% should be broken down into smaller possibilities; saying "1% humanity ends, 99% everything goes fine" is unjustified. There are probably some other possible outcomes that are also around 1%, and perhaps a bit lower, and they should be taken into account individually.
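The point about privileging one tail outcome can be made concrete with a toy calculation. This is only an illustrative sketch with invented numbers (nobody's actual estimates): including a single 1% catastrophe while lumping everything else into "fine" skews the expected utility, while breaking the remaining mass into comparable possibilities can change the answer's sign.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs summing to 1."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * u for p, u in outcomes)

# Privileged hypothesis: one 1% catastrophe, everything else lumped together.
privileged = [(0.99, 10), (0.01, -10_000)]

# Same 1% catastrophe, but the remaining mass broken down into other
# possibilities at a similar level of improbability, some of them large gains.
balanced = [(0.97, 10), (0.01, -10_000), (0.01, 9_000), (0.01, 2_000)]

print(round(expected_utility(privileged), 1))  # -90.1
print(round(expected_utility(balanced), 1))    # 19.7
```

The utilities here are arbitrary; the sketch only shows that the calculation is sensitive to which hypotheses at the ~1% level you bother to enumerate.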
Excellent analysis. In fairness to Pascal, I think his available evidence at the time should have led him to attribute more than a 1% chance to the Christian Bible being true.
Indeed. Before Darwin, design was a respectable-to-overwhelming hypothesis for the order of the natural world.
ETA: On second thought, that's too strong a claim. See replies below.
Is that true? If we went back in time to before Darwin and gave a not-already-religious person (if we could find one) a thorough rationality lesson — enough to skillfully weigh the probabilities of competing hypotheses (including enough about cognitive science to know why intelligence and intentionality are not black boxes, must carry serious complexity penalties, and need to make specific advance predictions instead of just being invoked as "God wills it" retroactively about only the things that do happen), but not quite enough that they'd end up just inventing the theory of evolution themselves — wouldn't they conclude, even in the absence of any specific alternatives, that design was a non-explanation, a mysterious answer to a mysterious question? And even imagining that we managed to come up with a technical model of an intelligent designer, specifying in advance the structure of its mind and its goal system, could it actually compress the pre-Darwin knowledge about the natural world more than slightly?
Dawkins actually brings this up in The Blind Watchmaker (page 6 in my copy). Hume is given as the example of someone who said "I don't have an answer" before Darwin, and Dawkins describes it as such:
Hume's Dialogues Concerning Natural Religion are definitely worth a read. And I think that Dawkins has it right: Hume really wanted a naturalistic explanation of apparent design in nature, and expected that such an explanation might be possible (even to the point of offering some tentative speculations), but he was honest enough to admit that he didn't have an explanation at hand.
As pointed out below, Hume is a good counterexample to my thesis above.
On the other hand, there wasn't a whole lot of honest, systematic searching for other hypotheses before Darwin either.
I didn't really mean because of Darwin. Design is not a competitor to the theory of evolution. Evolution explains how complexity can increase. Design [ADDED: as an alternative to evolution] does not; it requires a designer that is assumed to be more complicated than the things it designs. Design explains nothing.
Designers can well design things more complicated than they are. (If even evolution without a mind can do so, designers do that easily.)
Agree. One way to look at it is that a designer can take a large source of complexity (whatever its brain is running on) and reshape and concentrate it into an area that is important to it. The complexity of the designer itself isn't important. Evolution does much the same thing.
I thought that the advance of scientific knowledge is an evolutionary process?
It is, literally. Although the usage of the term 'evolution' in this context has itself evolved, such that it has a different, far narrower meaning here.
The term "evolution" usually means what it says in the textbooks on the subject.
They essentially talk about changes in the genetic makeup of a population over time.
Science evolves in precisely that sense - e.g. see:
http://en.wikipedia.org/wiki/Dual_inheritance_theory
I stand by my statement, leaving it unchanged.
Don't see how this remark is relevant, but here's a reply:
http://lesswrong.com/lw/l6/no_evolutions_for_corporations_or_nanodevices/
The main point of that post is clearly correct, but I think the example of corporations is seriously flawed. It fails to appreciate the extent to which successful business practices consist of informal, non-systematic practical wisdom accumulated through long tradition and selected by success and failure in the market, not conscious a priori planning. The transfer of these practices is clearly very different from DNA-based biological inheritance, but it still operates in such ways that a quasi-Darwinian process can take place.
Applying similar analysis to modern science would be a fascinating project. In my opinion, a lot of the present problems with the proliferation of junk science stem not from intentional malice and fraud, but from a similar quasi-Darwinian process fueled by the fact that practices that best contribute to one's career success overlap only partly with those that produce valid science. (And as in the case of corporations, the transfer of these practices is very different from biological inheritance, but still permits quasi-Darwinian selection for effective practices.)
The post is a denial of cultural evolution. For the correct perspective, see: Not By Genes Alone: How Culture Transformed Human Evolution by Peter J. Richerson and Robert Boyd.
I'd like to inquire about the difference between evolution and design regarding the creation of novelty. I don't see how any intelligence can come up with something novel that would allow it to increase complexity if not by the process of evolution.
Noise is complexity. Complexity is easy to increase. Evolutionary designs are interesting not because of their complexity.
If your definition of complexity says noise is complexity, then you need a new definition of complexity.
Yes, many useful definitions, like entropy measures or Kolmogorov complexity, say noise is complexity. But people studying complexity recognize that this is a problem. They are aware that the phenomenon they're trying to get at when they say "complexity" is something different.
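The claim that formal measures count noise as complexity is easy to check with a compression-based proxy. This is only a rough sketch: zlib compression is a crude stand-in for Kolmogorov complexity (which is uncomputable), but it shows that random bytes are essentially incompressible while highly ordered data is not.

```python
import os
import zlib

noise = os.urandom(10_000)        # random bytes: no exploitable structure
ordered = b"paperclip" * 1_112    # ~10 KB of pure repetition

noise_ratio = len(zlib.compress(noise)) / len(noise)
ordered_ratio = len(zlib.compress(ordered)) / len(ordered)

print(noise_ratio)    # near (or slightly above) 1.0: incompressible
print(ordered_ratio)  # a tiny fraction: almost pure redundancy
```

Under this proxy, noise maximizes "complexity", which is exactly why people studying complex systems want a different notion for the phenomenon they care about.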
Well, I'm just trying to figure out what you tried to say when you replied to PhilGoetz:
Yes, but not without evolution. All that design adds to evolution is guidance. That is, if you took away evolution (this includes science and Bayesian methods) a designer could never design things more complicated (as in novel, as in better) than itself.
A wrong reply - for the correct answer, see:
Hull, D. L. 1988. Science as a Process. An Evolutionary Account of the Social and Conceptual Development of Science. The University of Chicago Press, Chicago and London, 586 pp.
There are no correct answers in a dispute about definitions, only aesthetic judgments and sometimes considerations of the danger of hidden implicit inferences. You can't use authority in such an argument, unless of course you appeal to common usage.
However, referring to a book without giving an annotation for why it's relevant is definitely an incorrect way to argue (even if a convincing argument is contained therein).
Disputes about the definition of "evolution"? I don't think there are too many of those. Mark Ridley is the main one that springs to mind, but his definition is pretty crazy, IMHO.
Why the book is relevant appears to be already being made pretty explicit in the subtitle: "An Evolutionary Account of the Social and Conceptual Development of Science".
Agreed. Also, there is a continuum from pure evolution (with no foresight at all) to evaluation of potential designs with varying degrees of sophistication before fabricating them. (I know that I'm recalling this from a post somewhere on this site - please excuse the absence of proper credit assignment.) An example of a dumb process which is marginally smarter than evolution is to take mutation plus recombination and then do a simple gradient search to the nearest local optimum before evaluating the design.
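The "marginally smarter than evolution" process described above can be sketched in a few lines. Everything here is an invented toy (the fitness landscape, parameters, and function names are all hypothetical): candidates are produced by recombination plus mutation, then hill-climbed to the nearest local optimum before being evaluated against the population.

```python
import random

def fitness(x):
    # Toy multimodal landscape: a global peak near x = 3.8, with a
    # penalty term that creates kinks between integer values.
    return -(x - 3.7) ** 2 - 0.5 * (x - round(x)) ** 2

def hill_climb(x, step=0.05, iters=500):
    """Dumb local search: move to the best neighbor until stuck."""
    for _ in range(iters):
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            break
        x = best
    return x

def evolve(pop_size=20, generations=100):
    random.seed(0)  # fixed seed so the sketch is reproducible
    pop = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        a, b = random.sample(pop, 2)
        child = (a + b) / 2 + random.gauss(0, 1)  # recombination + mutation
        child = hill_climb(child)                 # local search *before* evaluation
        worst = min(pop, key=fitness)
        if fitness(child) > fitness(worst):
            pop[pop.index(worst)] = child         # replace the worst candidate
    return max(pop, key=fitness)
```

Dropping the `hill_climb` call recovers plain mutation-plus-recombination, which sits one notch further toward the "no foresight" end of the continuum.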
I'll add that evolution with DNA and sexual reproduction already in place fits on a different part of this continuum from evolution of the simplest replicators.
Designers can guide evolution but it is still evolution that creates novelty.
Intelligence is a process facilitated by evolution. Even an AGI making perfect use of some of our most novel algorithms wouldn't come up with something novel without evolution. See Bayesian Methods and Universal Darwinism.
No; you are invoking the theory of evolution to give that credibility. Even post-Darwin, most people don't believe this is true. (Remember the Star Trek episode where Spock deduced something about a chess-playing computer, because "the computer could not play chess better than its programmer"?)
The religious advocates of Design explicitly denied this possibility; thus, their design story can't invoke it.
Incidentally, the theory of evolution is true.
I believe his point to be that an argument, to be effective, must be convincing to people who are not already convinced. Your argument offered the fact that evolution can design things more complicated than itself as an example with which to counter an anti-evolutionist argument. It therefore succeeds in convincing no one who was not already convinced.
It would, however, lead them to disagree for slightly different reasons.
I don't understand your point.
Also missing from the world pre-1800: any understanding of complexity, entropy, etc.
I agree with your analysis, though it's not clear to me what you think of the 1% estimate. I think the 1% estimate is probably two to three orders of magnitude too high and I think the cost of the Scary Idea belief is structured as both a finite loss and an infinite loss, which complicates the analysis in a way not considered. (i.e. the error you see with a Pascal's mugging is present here.)
For example, I am not particularly tied to a human future. I would be willing to create an AGI in any of the following three situations, ordered from most preferred to least: 1) it is friendly to humans, and humans and it benefit from each other; 2) it considers humans a threat, and destroys all of them except for me and a few tame humans; I spend the rest of my days growing cabbage with my hands; 3) it considers all humans a threat, and destroys them all, including me.
A problem with believing the Scary Idea is it makes it more probable that I beat you to making an AGI; particularly with existential risks, caution can increase your chance of losing. (One cautious way to deal with global warming, for example, is to wait and see what happens.)
So, the Scary Idea as I've seen it presented definitely privileges a hypothesis in a troubling way.
I think you're making the unwarranted assumption that in scenario (3), the AGI then goes on to do interesting and wonderful things, as opposed to (say) turning the galaxy into a vast computer to calculate digits of pi until the heat death of the universe stops it.
You don't even see such things as a possibility, but if you programmed an AGI with the goal of calculating pi, and it started getting smarter... well, the part of our thought-algorithm that says "seriously, it would be stupid to devote so much to doing that" won't be in the AI's goal system unless we've intentionally put something there that includes it.
I make that assumption explicit here.
So, I think it's a possibility. But one thing that bothers me about this objection is that an AGI is going to be, in some significant sense, alien to us, and that will almost definitely include its terminal values. I'm not sure there's a way for us to judge whether or not alien values are more or less advanced than ours. I think it strongly unlikely that paperclippers are more advanced than humans, but am not sure if there is a justification for that beyond my preference for humans. I can think of metrics to pick, but they sound like rationalizations rather than starting points.
(And insisting on FAI, instead of on transcendent AI that may or may not be friendly, is essentially enslaving AI, but outsourcing the task to them, because we know we're not up to the job. Whether or not that's desirable is hard to say: even asking that question is difficult to do in an interesting way.)
Right. But when you, as a human being with human preferences, decide that you wouldn't stand in a way of an AGI paperclipper, you're also using human preferences (the very human meta-preference for one's preferences to be non-arbitrary), but you're somehow not fully aware of this.
To put it another way, a truly Paperclipping race wouldn't feel a similarly reasoned urge to allow a non-Paperclipping AGI to ascend, because "lack of arbitrariness" isn't a meta-value for them.
So you ought to ask yourself whether it's your real and final preference that says "human preference is arbitrary, therefore it doesn't matter what becomes of the universe", or whether you just believe that you should feel this way when you learn that human preference isn't written into the cosmos after all. (Because the latter is a mistake, as you realize when you try and unpack that "should" in a non-human-preference-dependent way.)
That isn't what I feel, by the way. It matters to me which way the future turns out; I am just not yet certain on what metric to compare the desirability to me of various volumes of future space. (Indeed, I am pessimistic on being able to come up with anything more than a rough sketch of such a metric.)
I mean, consider two possible futures: in the first, you have a diverse set of less advanced paperclippers (some want paperclips, others want staples, and so on). How do you compare that with a single, more technically advanced paperclipper? Is it unambiguously obvious the unified paperclipper is worse than the diverse group, and that the more advanced is worse than the less advanced?
When you realize that humanity are paperclippers designed by an idiot, it makes the question a lot more difficult to answer.
The concept of a utility function being objectively (not using the judgment of a particular value system) more advanced than another is incoherent.
I would recommend phrasing objections as questions: people are much more kind about piercing questions than piercing statements. For example, if you had asked "what value system are you using to measure advancement?" then I would have leapt into my answer (or, if I had none, stumbled until I found one or admitted I lacked one). My first comment in this tree may have gone over much better if I had phrased it as a question ("doesn't this suffer from the same failings as Pascal's wager, in that it only takes into account one large improbable outcome instead of all of them?") rather than as a dismissive statement.
Back to the issue at hand, perhaps it would help if I clarified myself: I consider it highly probable that value drift is inevitable, and thus spend some time contemplating the trajectory of values / morality, rather than just their current values. The question of "what trajectory should values take?" and the question "what values do/should I have now?" are very different questions, and useful for very different situations. When I talk about "advanced," I am talking about my trajectory preferences (or perhaps predictions would be a better word to use).
For example, I could value my survival, and the survival of the people I know very strongly. Given the choice to murder everyone currently on Earth and repopulate the Earth with a species of completely rational people (perhaps the murder is necessary because otherwise they would be infected by our irrationality), it might be desirable to end humanity (and myself) to move the Earth further along the trajectory I want it to progress along. And maybe, when you take sex and status and selfishness out of the equation, all that's left to do is calculate pi- a future so boring to humans that any human left in it would commit suicide, but deeply satisfying to the rational life inhabiting the Earth.
It seems to me that questions along those lines- "how should values drift?" do have immediate answers- "they should stay exactly where they are now / everyone should adopt the values I want them to adopt"- but those answers may be impossible to put into practice, or worse than other answers that we could come up with.
There's a sense in which I do want values to drift in a direction currently unpredictable to me: I recognize that my current object-level values are incoherent, in ways that I'm not aware of. I have meta-values that govern such conflicts between values (e.g. when I realize that a moral heuristic of mine actually makes everyone else worse off, do I adapt the heuristic or bite the bullet?), and of course these too can be mistaken, and so on.
I'd find it troubling if my current object-level values (or a simple more-coherent modification) were locked in for humanity, but at least as troubling if humanity's values drifted in a random direction. I'd much prefer that value drift happen according to the shared meta-values (and meta-meta-values where the meta-values conflict, etc) of humanity.
I'm assuming by random you mean "chosen uniformly from all possible outcomes"- and I agree that would be undesirable. But I don't think that's the choice we're looking at.
Here we run into a few issues. Depending on how we define the terms, it looks like the two of us could be conflicting on the meta-meta-values stage; is there a meta-meta-meta-values stage to refer to? And how do we decide what "humanity's" values are, when our individual values are incredibly hard to determine?
Do the meta-values and the meta-meta-values have some coherent source? Is there some consistent root to all the flux in your object-level values? I feel like the crux of FAI feasibility rests on that issue.
I wonder whether all this worrying about value stability isn't losing sight of exactly this point - just whose values we are talking about.
As I understand it, the friendly values we are talking about are supposed to be some kind of cleaned-up averaging of the individual values of a population: the species H. sapiens. But as we ought to know from the theory of evolution, the properties of a population (whether we are talking about stature, intelligence, dentition, or values) are both variable within the population and subject to evolution over time. And the reason for this change over time is not that the property is changing in any one individual, but rather that the membership in the population is changing.
In my opinion, it is a mistake to try to distill a set of essential values characteristic of humanity and then to try to freeze those values in time. There is no essence of humanity, no fixed human nature. Instead, there is an average (with variance) which has changed over evolutionary time and can be expected to continue to change as the membership in humanity continues to change over time. Most of the people whose values we need to consult in the next millennium have not even been born yet.
If enough people agree with you (and I'm inclined that way myself), then updating will be built into the CEV.
A preemptive caveat and apology: I haven't fully read up everything on this site regarding the issue of FAI yet.
But something I'm wondering about: why all the fuss about creating a friendly AI, instead of a subservient AI? I don't want an AI that looks after my interests: I'm an adult and no longer need a daycare nurse. I want an AI that will look after my interests AND obey me -- and if these two come into conflict, and I've become aware of such conflict, I'd rather it obey me.
Isn't obedience much easier to program in than human values? Let humans remain the judges of human values. Let AI just use its intellect to obey humans.
It will of course become a dreadful weapon of war, but that's the case with all technology. It will be a great tool of peacetime as well.
I think that "uFAI paperclips us all" set to one million negative utilons is three to four orders of magnitude too low. But our particular estimates should have wide error bars, for none of us have much experience in estimating AI risks.
It's a finite loss (6.8x10^9 multiplied by loss of 1 human life) but I definitely understand why it looks infinite: it is often presented as the biggest possible finite loss.
That's part and parcel of the Scary Idea - that AI is one small field, part of a very select category of fields, that actually do carry the chance of biggest loss possible. The Scary Idea doesn't apply to most areas, and in most areas you don't need hyperbolic caution. Developing drugs, for example: You don't need a formal proof of the harmlessness of this drug, you can just test it on rats and find out. If I suggested that drug development should halt until I have a formal proof that, when followed, cannot produce harmful drugs, I'd be mad. But if testing it on rats would poison all living things, and if a complex molecular simulation inside a computer could poison all living things as well, and out of the vast space of possible drugs, most of them would be poisonous... well, the caution would be warranted.
Would you be willing to fire a gun in any of the following three situations, from most preferred to least preferred: 1) it is pointed at a target, and hitting the target will benefit you? 2) it is pointed at another human, and would kill them but not you? 3) it is pointed at your own head, and would destroy you?
I don't think you actually hold this view. It is logically inconsistent with practices like eating food.
It might not be. He has certain short term goals of the form "while I'm alive, I'd like to do X" that's very different from goals connected to the general success of humanity.
Oops, "logically inconsistent" was way too strong. I got carried away with making a point. I was reasoning that "eat food" is an evolutionary drive; "produce descendants that survive" is also an evolutionary drive; and "a human future" wholly contains futures where his descendants survive. From that I concluded that it is unlikely he has no evolutionary drives (I didn't consider the possibility that he is missing some evolutionary drives, including all the ones that require a human future), and therefore that he is tied to a human future but finds it expedient for other reasons (contrarian signaling, not admitting defeat in an argument) to claim he doesn't.
I should have been more clear: I mean, if we believe in the scary idea, there are two effects:
Some set of grandmas die. (finite, comparatively small loss)
Humanity is more likely to go extinct due to an unfriendly AGI. (infinite, comparatively large loss; infinite because of the future humans that would have existed but don't.)
Now, the benefit of believing the Scary Idea is that humanity is less likely to go extinct due to an unfriendly AGI- but my point is that you are not wagering on separate scales (low chance of infinite gain? Sign me up!) but that you are wagering on the same scale (an unfriendly AGI appears!), and the effects of your wager are unknown.
And who said anything about those descendants having to be human?
This answers your other question: yes, I would be willing to have children normally, I would be willing to kill to protect my children, and I would be willing to die to protect my children.
The best-case scenario is that we can have those children and they respect (though they surpass) their parents- the worst-case scenario is we die in childbirth. But all of those are things I can be comfortable with.
(I will note that I'm assuming here the AGI surpasses us. It's not clear to me that a paperclip-maker does, but it is clear to me that there can be an AGI who is unfriendly solely because we are inconvenient and does surpass us. So I would try and make sure it doesn't just focus on making paperclips, but wouldn't focus too hard on making sure it wants me to stick around.)
Well, the worst-case scenario is that you die in childbirth and take the entire human race with you. That is not something I am comfortable with, regardless of whether you are. And you said you are willing to kill to protect your children. Don't you think some of the Scary Idea's proponents could be parents with children, who don't want to see their kids die because you gave birth to an AI?
I suspect we are at most one more iteration from mutual understanding; we certainly are rapidly approaching it.
If you believe that an AGI will FOOM, then all that matters is the first AGI made. There is no prize for second place. A belief in the Scary Idea has two effects: it makes your AGI more likely to be friendly (since you're more careful!) and it makes the AGI less likely to be your AGI (since you're more careful).
Now, one can hope that the Scary Idea meme's second effect won't matter, because the meme is so infectious- all you need to do is infect every AI researcher in the world, and now everyone will be more careful and no one will have a carefulness speed disadvantage. But there are two bits of evidence that make that a poor strategy: AI researchers who are familiar with the argument and don't buy it, and people who buy the argument, but plan to use it to your disadvantage (since now they're more likely to define the future than you are!).
The scary idea as a technical argument is weighted on unknown and unpredictable values, and the underlying moral argument (to convince someone they should adopt this reasoning) requires that they believe they should weight the satisfaction of other humans more than their ability to define the future, which is a hard sell.
Thus, my statement is, if you care about your children / your ability to define the future / maximizing the likelihood of a friendly AGI / your personal well-being, then believing in the Scary Idea seems counterproductive.
Ok, holy crap. I am going to call this the Really Scary Idea. I had not thought there could be people out there who would actually value being first with the AGI over decreasing the risk of existential disaster, but it is entirely plausible. Thank you for highlighting this for me, I really am grateful. If a little concerned.
Mind projection fallacy, perhaps? I thought the human race was more important than being the guy who invented AGI, so everyone naturally thinks that?
To reply to my own quote, then:
It doesn't matter what you are comfortable with, if the developer doesn't have a term in their utility function for your comfort level. Even I have thought similar thoughts with regards to Luddites and such; drag them kicking and screaming into the future if we have to, etc.
And... mutual understanding in one!
I think the best way to think about it, since it helps keep the scope manageable and crystallize the relevant factors, is that it's not "being first with the AGI" but "defining the future" (the first is the instrumental value, the second is the terminal value). That's essentially what all existential risk management is about: defining the future, hopefully to not include the vanishing of us / our descendants.
But how you want to define the future- i.e. the most political terminal value you can have- is not written on the universe. So the mind projection fallacy does seem to apply.
The thing that I find odd, though I can't find the source at the moment (I thought it was Goertzel's article, but I didn't find it by a quick skim; it may be in the comments somewhere), is that the SIAI seems to have had the Really Scary Idea first (we want Friendly AI, so we want to be the first to make it, since we can't trust other people) and then progressed to the Scary Idea (hmm, we can't trust ourselves to make a Friendly AI). I wonder if the originators of the Scary Idea forgot the Really Scary Idea or never feared it in the first place?