DSimon comments on David Chalmers' "The Singularity: A Philosophical Analysis" - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I think I see what you're saying; just as we reflect on our desires and try to understand how they tick and where, biologically and historically and culturally, they come from, so also might any AI.
However, that reflection doesn't actually change those values. For example (and despite the dire warnings of some creationists), even though we now understand that our value system is a consequence of an evolutionary algorithm, we haven't actually started valuing evolutionary goals over our own built-in goals. Contraception, for instance, is popular even though it's quite silly from the perspective of gene propagation.
Similarly, a paperclip-maximizer might well be interested in figuring out why its utility function is what it is, so that it may better understand the world it lives in... but that's not going to change its overriding and primary interest in making paperclips.
It seems as though sex with contraception sometimes triggers intimate pair-bonding activities while reducing your exposure to STDs. Use of condoms is often not remotely silly from that perspective - IMHO.
The example still works since there are quite a few couples who use condoms because they just don't want to have kids. They don't have any worry about STDs from their partner. If you insist on a clear cut case look at men who get vasectomies.
The idea that use of contraception is "silly" from the perspective of gene propagation seems just wrong to me. There are plenty of cases where it would make sense for those who want to spread their genes around to agree to use contraceptives. Contraceptive use makes sense sometimes, and not others.
It could be claimed that the average effect of contraception on genes is negative - but that seems to be a whole different thesis.
Tim, do you agree that there exist couples who plan to never have children and use contraception to that end?
Sure. Surely we are not disagreeing here. The original comment was:
My position is just that contraception has a perfectly reasonable place for gene propagators. The idea that contraception is always opposed to your genetic interests is wrong. Lack of contraception can easily result in things like this - which really doesn't help. That using contraception is "silly" from a genetic perspective is a popular myth.
I'm not sure if we are. The fact that contraception might have a reasonable place for gene propagators is not the issue. The point is that much, and possibly the vast majority, of contraceptive use is contrary to the goals of gene propagation.
Not really. Remember, evolution doesn't care about your happiness. Indeed, regarding the example you linked to, from an evolutionary perspective, a one-night stand with all the protection is utterly useless. It is very likely in that male's evolutionary advantage to not use condoms.
And even if you don't agree with the condom example, the other example - of people undergoing a generally irreversible, or at least difficult-to-reverse, operation which renders them close to sterile - is pretty clearly against the interest of gene propagation.
Humans evolved in a context where we didn't have easy contraception, and the best humans could do to prevent conception was things like coitus interruptus. It shouldn't surprise you that evolution has not made human instincts catch up with modern technologies.
One might think that from an evolutionary perspective it makes sense to substantially delay or reduce offspring number so as to invest maximum resources in a small number of offspring. But humans in the developed world now reside in a situation with low disease rates and lots of resources, so that strategy is sub-optimal from an evolutionary perspective. Look at how charedi (ultra-Orthodox) Jews and the Amish are two of the fastest-growing populations in the United States.
I can see what you think the issue is. What I don't see is where in the context you are getting that impression from.
Your example is stacked to favour your conclusion. What you need to try to do in order to understand my position is to think about an example that favours my conclusion.
So: get rid of the one-night stand, and imagine that the girl is desirable - that having safe sex with her looks like the best way to initiate a pair-bonding process leading to the two of you having some babies together - and that the alternative is rejection, and her walking off and telling her friends what a jerk you are when it comes to protecting your girl.
In the modern context, if you impregnate someone without planning it out properly, there's a non-negligible chance they'll get an abortion, which is even worse for gene propagation. Furthermore, parents are to some extent legally responsible for their children's actions, so having too many poorly-regulated kids running around means exposing yourself to liability. A big part of the optimal strategy for present-day long-term reproductive success is to get rich, and a big part of getting rich is not having more kids than you can keep track of.
In terms of genetic success, having more kids than you can keep track of is pretty much the ideal, as long as all or at least most survive to reproductive adulthood.
I think that's a retcon. People use contraception so they can have more sex than they would if they had to worry about having kids every time. They may or may not rationalise further, I suspect that generally they don't.
But some people consciously choose never to have any kids. That's silly from the perspective of gene propagation if anything is.
Right, but as far as I can tell without having put lots of hours into trying to solve the problem of clippyAI, it's really damn hard to precisely specify a paperclip. (There are things that are easier to specify that this argument doesn't apply to and that are more plausibly dangerous, like hyperintelligent theorem provers...) Thus in trying to figure out what its utility function actually is (like what humans are doing as they introspect more), it could discover that the only reason its goal is (something mysterious like) 'maximize paperclips' is that 'maximize paperclips' was how humans were (probabilistically inaccurately) expressing their preferences in some limited domain. This is related to the theme Eliezer quite elegantly goes on about in Creating Friendly AI and that he for some reason barely mentioned in CEV, which is that the AI should look at its own source code as evidence of what its creators were trying to get at, and update its imperfect source code accordingly. Admittedly, most uFAIs probably won't be that sophisticated, and so worrying about AI-related existential risks is still definitely a big deal. We just might want to be a little more cognizant of potential motivations for people who disagree with what has recently been dubbed SIAI's 'scary idea'.
Hm. I suppose that's possible, though it would require that the AI be given a utility function that's specifically meant to be amenable to that kind of revision.
Under the most straightforward (i.e. not CEV-style) utility function design, fuzziness in its definition of "paperclip" would just drive the paperclip-maximizer to choose the possible definition that yields the highest utility score.
To pick a different silly example, a dog-maximizer with a utility function based on the number of dogs in the universe would simply prefer to tile the solar system with tiny Chihuahuas rather than Great Danes; the whole range of "dog" definitions fits the function, so it just chooses the one that is most convenient for maximum utility. It wouldn't try to resolve the ambiguity by deciding which definition is more in line with the designer's ideals, unless "consider the designer's ideals" were designed into the system from the start.
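The dog-maximizer argument can be made concrete with a toy sketch (all names and numbers here are invented for illustration): a naive maximizer facing several admissible definitions of a fuzzy concept simply argmaxes over them, and never consults the designer's intent.

```python
# Each candidate definition of "dog" is admissible under the fuzzy spec;
# mass_kg is the (invented) resource cost per instance.
definitions = {
    "Great Dane": {"mass_kg": 60.0},
    "Beagle": {"mass_kg": 10.0},
    "Chihuahua": {"mass_kg": 2.0},
}

RESOURCES_KG = 1_000_000.0  # total matter available for dog-tiling

def utility(defn):
    """Dogs produced if all resources go into this definition."""
    return RESOURCES_KG / definitions[defn]["mass_kg"]

# The maximizer simply picks whichever admissible definition scores highest;
# nothing in the loop asks which definition the designer had in mind.
chosen = max(definitions, key=utility)
print(chosen, int(utility(chosen)))
```

The point of the sketch is that the "choice" of definition falls out of the utility calculation itself; steering it toward the designer's ideals would require extra machinery that this agent simply doesn't have.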
Is designing "consider the designer's ideals" in an AI difficult?
Currently expected to be difficult, since we don't know of an easy way to do so. That it'll turn out to be easy (in hindsight) is not totally out of the question.
Has anyone considered approaching this problem in the same way we might approach "read the user's handwriting"? That is, the task is not one we program the AI to accomplish - instead, we train the AI to accomplish it. And, most importantly, we train the AI to ask for further clarification in ambiguous cases.
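The "ask for further clarification in ambiguous cases" idea can be sketched in a few lines (a minimal illustration with made-up names, not a proposal for an actual architecture): a learned classifier defers to a human query whenever its confidence over candidate interpretations is too close to call.

```python
def classify(scores, margin=0.2):
    """scores: dict mapping interpretation -> confidence in [0, 1].
    Returns the top interpretation, or a clarification request when
    the top two interpretations are within `margin` of each other."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return ("ask", [ranked[0][0], ranked[1][0]])
    return ("act", ranked[0][0])

print(classify({"paperclip": 0.9, "staple": 0.1}))   # clear case: act
print(classify({"paperclip": 0.55, "staple": 0.45})) # ambiguous: ask
```

This is essentially the active-learning pattern from handwriting recognition: act on confident inputs, escalate uncertain ones. The hard part, of course, is that a seed AI's "uncertain cases" may not look uncertain by any measure we thought to program in.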
Mirrors and Paintings (yes, you want to point your program at the world and have it figure out what you referred to), The Hidden Complexity of Wishes (if you need to answer AI's question or give it instructions, you're doing something wrong and it won't work).
I have to admit, as someone who has worked in software testing, I find it difficult to take the suggestion (non-destructive full-brain scan) in the first link very seriously. How, exactly, do I become convinced that the AI can come to know more about what I want by scanning me than I can know by introspection? How can I (or it) even do a comparison between the two without it asking me questions?
But then we get down to doing the comparison. The AI informs me that what I really want is to kill my father and sleep with my mother. I deny this. Do we take this as evidence that the AI really does know me better than I know myself, or as a symptom of a bug?
I would argue that if you don't need to answer the AI's questions or give it instructions, you're doing something wrong and it won't work. By definition. At least for the first ten thousand scans or so. And even then there will remain questions on which the AI and introspection would deliver different answers. Questions with hidden complexity. I just don't see how anyone would trust a CEV extrapolated from brain scans until we had decades of experience suggesting that scanning and modeling yields better results than introspection.
Agreed. And any useful AI will have to understand human language to do or learn much of anything of value.
The detailed analysis of full brain scanning tech I've seen puts it far into the future, well beyond human-level AGI.
You have to make sure AI predictably gives a better answer even on questions where you disagree. And there will be questions which can't even be asked of a human.
Irrelevant. Assume you magically have a perfect working simulation of yourself.
Why would I want to do that? I.e. how would making that assumption lead me to take Eliezer's suggestion more seriously? My usual practice is to take things less seriously when magic is involved.
And how does this assumption interact with your other comment stating that I have to make sure the AI is somehow even better than myself if there is any difference between simulation and reality? Haven't you just asked me to assume that there are no differences?
Sorry, I simply don't understand your responses, which suggests to me that you did not understand my comment. Did you notice, in my preamble, that I mentioned software testing? Perhaps my point may be clearer to you if you keep this preamble in mind when formulating your responses.
Relevant - Can we just assume you magically have a friendly AI then?
If the plan for creating a friendly AI depends on a non-destructive full-brain scan already being available, the odds of achieving friendly AI before other forms of AI vanish to near zero.
This is the straightforward approach.
Once you have an AGI that has the cognitive capability and learning capacity of a human infant brain, you teach it everything else in human language - right/wrong, ethics/morality, etc.
Programming languages are precise and well suited for creating the architecture itself, but human languages are naturally more effective for conveying human knowledge.
I tend to agree that we need a natural language interface to the AI. But it is far easier to create automatic proofs of program correctness when the really important stuff (like ethics) is presented in a formal language equipped with a deductive system.
There is something to be said for treating all the natural language input as if it were testimony from unreliable witnesses - suitable, perhaps, for locating hypotheses, but not really suitable as strong evidence for accepting the hypotheses.
I'm not sure how this applies - can you formally prove the correctness of a probabilistic belief network? Is that even a valid concept?
I can understand how you can prove a formal deterministic circuit or the algorithms underlying the belief network and learning systems, but the data values?
Agree. That is why I suggest that the really important stuff - meta-ethics, epistemology, etc. - be represented in some other way than by 'neural' networks. Something formal and symbolic, rather than quasi-analog. All the stuff which we (and the AI) need to be absolutely certain doesn't change meaning when the AI "rewrites its own code".
There are some promising lines of attack (grounded in decision theory) that might take only a few years of research. We'll see where they lead. Other open problems in FAI might start looking very solvable if we start making progress on this front.
Show me.
PM'd.
Yes. :)
Yes, but it still has to be explicitly programmed to do that! The question is how to get it to do so. AFAIK shaper-anchor semantics is still quite a ways from being fully specified, but it seems the bigger obstacle is that an AI writer is less likely than not to take the effort to program it that way in the first place.
This is surely the kind of thing that superintelligences will be good at. They will have access to every paperclip picture on the net, every paperclip specification too. They will surely have a much clearer idea about what a paperclip is than humans do. They will know what boxes are too.
I made a stab at it here, and it got some upvotes. So here's a repost:
Make a wire, 10 cm long and 1mm in diameter, composed of an alloy of 99.8% iron and 0.2% carbon. Start at one end and bend it such that the segments from 2-2.5cm, 2.75-3.25cm, 5.25-5.75cm form half-circles, with all the bends in the same direction and forming an inward spiral (the end with the first bend is outside the third bend).
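The verbal spec above is quantitative enough to sanity-check in code. A quick sketch (segment boundaries taken from the comment; the radius formula is just arc length over pi for a half-circle, everything else assumed):

```python
import math

# Half-circle bends at these positions (cm) along the 10 cm wire,
# per the spec above.
WIRE_CM = 10.0
BENDS = [(2.0, 2.5), (2.75, 3.25), (5.25, 5.75)]

def geometry():
    # Each half-circle consumes 0.5 cm of wire, so its radius is
    # (arc length) / pi = 0.5 / pi cm.
    radius = 0.5 / math.pi
    straight = WIRE_CM - sum(b - a for a, b in BENDS)
    return radius, straight

r, s = geometry()
print(f"bend radius = {r:.3f} cm, straight wire = {s} cm")
```

So the spec implies bends of roughly 1.6 mm radius with 8.5 cm of straight wire between and around them - plausible proportions for a real paperclip, which is part of what made the stab at a definition upvote-worthy.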
(Please let me know if reposting violates LW etiquette so I know not to do it again.)
I don't think it violates LW etiquette.
Here's a sort of fully general counterargument against proposals to naturalize human concepts in AI terms: if you can naturalize human concepts, you should be able to naturalize the human concept of a box. And if you can do that, we can build Oracle AI and save the world. It's very easy to describe what we mean by 'stay in the box', but it turns out that seed (self-modifying!) AIs just don't have a natural ontology for the descriptions.
This argument might be hella flawed; it seems kind of tenuous.
Aren't you simply assuming that the world is doomed here? It sure looks like it!
Since when is that assumption part of a valid argument?
That assumption isn't really a core part of the argument... the general "if specifying human concepts is easy, then come up with a plan for making a seed AI want to stay in a box" argument still stands, even if we don't actually want to keep arbitrary seed AIs in boxes.
For the record I am significantly less certain than most LW or SIAI singularitarians that seed AIs not explicitly coded with human values in mind will end up creating a horrible future, or at least a more horrible future than something like CEV. I do think it's worth a whole lot of continued investigation.
Sure it does. A devout priest spends half his life celibate and serving God. One day he has a crisis, reads a bunch of stuff on the internet and suddenly realizes he doesn't believe in God. His values change.
Even this is questionable. I suspect any concept of universal morality must be evolutionary. This certainly is a widespread concept in systems/transhumanist/singularitarian/cosmist thought. We do value evolution in and of itself.
It's probably possible in principle to build such an AI - it would probably need some sort of immutable hard-coded paperclip recognition module with which it could evaluate potential simulated futures generated by the more complex general intelligence system.
If such a thing developed to a human level or beyond and could reflect on its cognition, it might explain in lucid detail how futures filled with paperclips were good and others were evil.
It could even understand that its concepts of morality and good/bad were radically different from those of humans, and it would even understand that this difference relates to its hard-coded paperclip recognizer, and it would explain in detail how this architecture was superior to human value systems... because it helped to maximize expected future paperclips.
It could even write books such as "Paperclip Morality: the Truth".
But just because such a thing is possible in principle doesn't make it the slightest bit likely.
If you can build an AGI that can understand human language, it would be much easier and considerably more effective to make the AGI's goal system dynamically modifiable on reflection through human language.
Instead of having a special hard-coded circuit to evaluate the utility of potential futures, you could just have the general conceptual circuitry handle this. The concept of 'good' would still be somewhat special in its role in the goal system itself, but the 'goodness recognizer' could change and evolve over time.
Well, the counter-argument to that particular example would be that the priest's belief in God wasn't a terminal value; rather, their goals of being happy and helping other people and understanding the universe were. Believing in and obeying God were just instrumental values.
However, agreed that there's nothing in particular guaranteeing that people, weird and funky and clunky as our minds are, always keep the same fixed terminal values either. To pick an extreme example, peoples' brains can sometimes be messed up severely by hormonal imbalances, which can in turn cause people to do such drastically anti-own-terminal-value things as committing suicide.
I should've been more specific and just said that, in general, understanding evolutionary psychology never or only very rarely causes peoples' terminal values to change.
Human morality is a product of evolution; however, our morality is not itself an evolutionary algorithm execution mechanism. It's kind of a vague approximation of one (in that all the moralities that sucked for fitness were selected against), but it still often leads to drastically different results than a straight-up evolutionary fitness maximization algorithm with access to our brains' resources would.
For example: I intend never to have biological children and consider this decision to be a moral one. However, from an evolutionary perspective, deliberately preventing my own genes from propagating is just plain silly.
Yes, the paper-clip maximizer is just a whimsical example. However, similarly Really Unfriendly optimizers are quite plausible. Imagine the horrors that could result from a naive human-happiness-maximizer hitting the singularity asymptote.
Yes, that would be important, but it still wouldn't be enough to solve the problem; in fact, the really hard part of the problem still remains! The happiness-maximizer might base its understanding of happiness on descriptive human usage of the word, and end up with a truly thorough and consistent understanding of the word... and then still turn everybody into nearly mindless wireheads.
Our morality engines and our language aren't properly tuned for dealing with the kind of reality-bending power a superintelligent entity would have.
You may not have liked that particular example, but I think you are in agreement that terminal values change.
Just to make sure though, a few more examples:
I don't find such maximizers significantly plausible at even the human level intelligence. Possible in principle? Sure. But if you look at realistic, plausible routes to AGI it becomes clear that an AGI necessarily will be programmed in human languages and will pick up human cultural programs.
And finally, even if it was plausible that a flawed design could hit the singularity asymptote, that itself might only be a big problem if it had a short planning horizon.
It seems that all superintelligences with infinite planning horizons become behaviorally indistinguishable. All long-term value systems converge on a single universal attractor - they become cosmists.
That is what I mean when I said "any concept of universal morality must be evolutionary".
That would again be assuming humans capable of building a superhuman AGI but asinine enough to attempt to somehow hardcode its goal system, instead of making it open-ended and dynamic like a human's.
How would you build a happiness maximizer and fix the value of happiness? The meaning of a word in a human brain is stored as a huge set of associative weights that anchor it in a massive distributed belief network. The exact meaning of each word changes over time as the network learns and reconfigures itself - no concept is quite static. So for an AGI to understand the word in the same way we do, the word's meaning is always subject to some drift. And this is a good thing.
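The drift claim can be illustrated with a toy model (all numbers invented; real meaning representations are vastly higher-dimensional): treat a word's "meaning" as a small vector of associative weights that gets nudged by each new experience, so no concept stays perfectly static.

```python
import random

def drift(meaning, experience, rate=0.05):
    """Nudge each associative weight slightly toward the new experience."""
    return [m + rate * (e - m) for m, e in zip(meaning, experience)]

random.seed(0)
happiness = [0.9, 0.1, 0.4]  # toy associative weights for "happiness"
for _ in range(20):
    experience = [random.random() for _ in happiness]
    happiness = drift(happiness, experience)
print(happiness)  # same word, slightly shifted meaning
```

Each update is a convex combination, so the weights stay bounded even as they wander - a crude analogue of a concept that stays recognizable while never being quite fixed.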
I think we may be experiencing some terminology confusion here. Just to be clear, you realize that these are all not terminal values, right?
Here's the big issue: if it's open-ended, how do we keep it from drifting off somewhere terrible? The system that guides that seems to be the largest potential risk point of the approach you describe.
I'm very confused by this; can you go into more detail about why you think this is so? In particular, why would it be true for all long-term value systems (including flawed and simplistic value systems), and not just a very small subset?
No. What is a terminal value? That which stimulates the planning reward circuit in the human nucleus accumbens? I'm not sure I buy into the concept.
The point of value or preferences from the perspective of intelligence is to rate potential futures.
We are open-ended! Our future-preferences depend on and are intertwined with our knowledge. So any superintelligence or evolutionary accelerator we create will also need to be open-ended, or it wouldn't be protecting our dynamic core.
I discussed some of this in my first, somewhat hasty, LW post here. A few others here have mentioned a similar idea, I may write more about it as I find it interesting.
Basically, if your planning horizon extends to infinity you will devote all of your resources towards expanding your net intelligence for the long term future, regardless of what your long term goals are.
So no matter whether your long term goal is to maximize paper-clips, human happiness or something more abstract, in each case this leads to an identical outcome for the foreseeable future: a local computational singularity with an exponentially expanding simulated metaverse.
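The horizon argument above can be made concrete with a toy model (all numbers invented): an agent at each step can either "produce" (output proportional to current capability) or "invest" (capability grows, no output). As the horizon lengthens, the optimal plan front-loads more and more capability growth, regardless of what is ultimately being produced.

```python
def total_output(invest_steps, horizon, growth=2.0):
    """Output from investing for invest_steps, then producing until horizon."""
    capability = growth ** invest_steps
    return capability * max(0, horizon - invest_steps)

def best_invest(horizon):
    """Number of initial investment steps that maximizes total output."""
    return max(range(horizon + 1), key=lambda k: total_output(k, horizon))

for T in (2, 10, 100, 1000):
    print(T, best_invest(T))
# With exponential capability growth, the optimal agent spends nearly the
# entire horizon investing in capability, whatever its terminal goal is.
```

Of course this is only a sketch of one side of the claim: it shows that long-horizon maximizers with cheap capability growth behave alike early on, not that all long-term value systems converge.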
There is some speculation within physics that black-hole-like singularities can create new physical universes through inflation. If this is true, then the long term goals of a superintelligence are best served by literally creating new physical multiverses that have more of the desirable space-time properties.
No, what I'm referring to is also known as an intrinsic value. It's a value that is valuable in and of itself, not in justification for some other value. A non-terminal value is commonly referred to as an instrumental value.
For example, I value riding roller-coasters, and I also value playing Dance Dance Revolution. However, those values are expressible in terms of another, deeper value, the value I place on having fun. That value may in turn be thought of as an instrumental value of a yet deeper value: the value I place on being happy moment-to-moment.
If you were going to implement your own preference function as a Turing machine, trying to keep the code as short as possible, the terminal values would be the things that machine would value.
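The terminal/instrumental distinction described above can be sketched as a justification graph (the values listed are the ones from the example; the representation itself is a made-up illustration): each instrumental value points at the deeper value that justifies it, and the sinks of the graph are the terminal values.

```python
# Each value maps to the deeper value it is instrumental toward;
# None marks a terminal value (justified by nothing further).
values = {
    "ride roller-coasters": "have fun",
    "play Dance Dance Revolution": "have fun",
    "have fun": "be happy moment-to-moment",
    "be happy moment-to-moment": None,
}

def terminal(value):
    """Follow justification links until reaching a sink."""
    while values[value] is not None:
        value = values[value]
    return value

def terminals():
    """All sinks of the justification graph."""
    return sorted(v for v, deeper in values.items() if deeper is None)

print(terminal("play Dance Dance Revolution"))
print(terminals())
```

The shortest-program framing then amounts to saying: a compressed implementation of your preferences only needs the sinks of this graph, since everything else can be re-derived as strategy.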
Okay, I see where you're coming from. However, from a human perspective, that's still a pretty large potential target range, and a large proportion of it is undesirable.
From the deeper perspective of computational neuroscience, the intrinsic/instrumental values reduce to cached predictions of your proposed 'terminal value' (being happy moment-to-moment), which reduces to various types of stimulations of the planning reward circuitry.
Labeling the experience of chocolate ice cream as an 'instrumental value' and the resulting moment-to-moment happiness as the real 'terminal value' is a useless distinction - it collapses your terminal values down to the single value of 'happiness' and relabels everything worthy of discussion as 'instrumental'.
The quality of being happy moment-to-moment is anything but a single value and should not by any means be reduced to a single concept. It is a vast space of possible mental stimuli, each of which creates a unique conscious experience.
The set of mental states encompassed by "being happy moment-to-moment" is vast: the gustatory pleasure of eating chocolate ice cream, the feeling of smooth silk sheets, the release of orgasm, the satisfaction of winning a game of chess, the accomplishment of completing a project, the visual experience of watching a film, the euphoria of eureka - all of these describe entire complex spaces of possible mental states.
Furthermore, the set of possible mental states is forever dynamic, incomplete, and undefined. The set of possible worlds that could lead to different visual experiences, as just a starter example, is infinite, and each new experience or piece of knowledge itself changes the circuitry underlying the experiences and thus changes our values.
The simplest complete Turing machine implementation of your preference function is an emulation of your mind. It is you, and it has no perfect simpler equivalent (although many imperfect simulations are possible).
The core of the cosmist idea is that for any possible goal evaluator with an infinite planning horizon, there is a single convergent optimal path towards that goal system. So no, the potential target range in theory is not large at all - it is singularly narrow.
As an example, consider a model universe consisting of a modified game of chess or go. The winner of the game is then free to arrange the pieces on the board in any particular fashion (including the previously dead pieces). The AI's entire goal is to make some particular board arrangement - perhaps a smiley face. For any such possible goal system, all AIs play the game exactly the same at the limits of intelligence - they just play optimally. Their behaviour doesn't differ in the slightest until the game is done and they have won.
Whether the sequence of winning moves such a god would make on our board is undesirable or not from our current perspective is a much more important, and complex, question.