Reply to Holden on 'Tool AI'

94 Eliezer_Yudkowsky 12 June 2012 06:00PM

I begin by thanking Holden Karnofsky of GiveWell for the rare gift of his detailed, engaged, and helpfully-meant critical article Thoughts on the Singularity Institute (SI). In this reply I will engage with only one of the many subjects raised therein: the topic of, as I would term them, non-self-modifying planning Oracles, a.k.a. 'Google Maps AGI', a.k.a. 'tool AI', this being the topic that requires me personally to answer.  I hope that my reply will be accepted as addressing the most important central points, though I did not have time to explore every avenue.  I certainly do not wish to be logically rude, and if I have failed, please remember with compassion that it's not always obvious to one person what another person will think was the central point.

Luke Muehlhauser and Carl Shulman contributed to this article, but the final edit was my own, likewise any flaws.

Summary:

Holden's concern is that "SI appears to neglect the potentially important distinction between 'tool' and 'agent' AI." His archetypal example is Google Maps:

Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.

The reply breaks down into four heavily interrelated points:

First, Holden seems to think (and Jaan Tallinn apparently doesn't object, in their exchange) that if a non-self-modifying planning Oracle is indeed the best strategy, then all of SIAI's past and intended future work is wasted.  To me it looks like there's a huge amount of overlap in the underlying processes of the AI that would have to be built and the insights required to build it, and I would be trying to assemble mostly - though not quite exactly - the same kind of team if I were trying to build a non-self-modifying planning Oracle, with the same initial mix of talents and skills.

Second, a non-self-modifying planning Oracle doesn't sound nearly as safe once you stop saying human-English phrases like "describe the consequences of an action to the user" and start trying to come up with math that says scary dangerous things like (here translated into English) "increase the correspondence between the user's belief about relevant consequences and reality".  This is why the people on the team would have to solve the same sorts of problems.

Appreciating the force of the third point is a lot easier if one appreciates the difficulties discussed in points 1 and 2, but is actually empirically verifiable independently:  Whether or not a non-self-modifying planning Oracle is the best solution in the end, it's not such an obvious privileged-point-in-solution-space that someone should be alarmed at SIAI not discussing it.  This is empirically verifiable in the sense that 'tool AI' wasn't the obvious solution to e.g. John McCarthy, Marvin Minsky, I. J. Good, Peter Norvig, Vernor Vinge, or for that matter Isaac Asimov.  At one point, Holden says:

One of the things that bothers me most about SI is that there is practically no public content, as far as I can tell, explicitly addressing the idea of a "tool" and giving arguments for why AGI is likely to work only as an "agent."

If I take literally that this is one of the things that bothers Holden most... I think I'd start stacking up some of the literature on the number of different things that just respectable academics have suggested as the obvious solution to what-to-do-about-AI - none of which would be about non-self-modifying smarter-than-human planning Oracles - and beg him to have some compassion on us for what we haven't addressed yet.  It might be the right suggestion, but it's not so obviously right that our failure to prioritize discussing it reflects negligence.

The final point at the end is looking over all the preceding discussion and realizing that, yes, you want to have people specializing in Friendly AI who know this stuff, but as all that preceding discussion is actually the following discussion at this point, I shall reserve it for later.


Holden's Objection 1: Friendliness is dangerous

11 PhilGoetz 18 May 2012 12:48AM

Nick_Beckstead asked me to link to posts I referred to in this comment.  I should put up or shut up, so here's an attempt to give an organized overview of them.

Since I wrote these, LukeProg has begun tackling some related issues.  He has accomplished the seemingly-impossible task of writing many long, substantive posts none of which I recall disagreeing with.  And I have, irrationally, not read most of his posts.  So he may have dealt with more of these same issues.

I think that I only raised Holden's "objection 2" in comments, which I couldn't easily dig up; and in a critique of a book chapter, which I emailed to LukeProg and did not post to LessWrong.  So I'm only going to talk about "Objection 1:  It seems to me that any AGI that was set to maximize a "Friendly" utility function would be extraordinarily dangerous."  I've arranged my previous posts and comments on this point into categories.  (Much of what I've said on the topic has been in comments on LessWrong and Overcoming Bias, and in email lists including SL4, and isn't here.)

 

The concept of "human values" cannot be defined in the way that FAI presupposes

Human errors, human values:  Suppose all humans shared an identical set of values, preferences, and biases.  We cannot retain human values without retaining human errors, because there is no principled distinction between them.

A comment on this post:  There are at least three distinct levels of human values:  the values an evolutionary agent holds that maximize its reproductive fitness, the values a society holds that maximize its fitness, and the values a rational optimizer holds who has chosen to maximize social utility.  These often conflict.  Which of them are the real human values?

Values vs. parameters:  Eliezer has suggested using human values, but without time discounting (= changing the time-discounting parameter).  CEV presupposes that we can abstract human values and apply them in a different situation that has different parameters.  But the parameters are values.  There is no distinction between parameters and values.

A comment on "Incremental progress and the valley":  The "values" that our brains try to maximize in the short run are designed to maximize different values for our bodies in the long run.  Which are human values:  The motivations we feel, or the effects they have in the long term?  LukeProg's post Do Humans Want Things? makes a related point.

Group selection update:  The reason I harp on group selection, besides my outrage at the way it's been treated for the past 50 years, is that group selection implies that some human values evolved at the group level, not at the level of the individual.  This means that increasing the rationality of individuals may enable people to act more effectively in their own interests, rather than in the group's interest, and thus diminish the degree to which humans embody human values.  Identifying the values embodied in individual humans - supposing we could do so - would still not arrive at human values.  Transferring human values to a post-human world, which might contain groups at many different levels of a hierarchy, would be problematic.

I wanted to write about my opinion that human values can't be divided into final values and instrumental values, the way discussions of FAI presume they can.  That division is an idea that comes from mathematics, symbolic logic, and classical AI.  A symbolic approach would probably make proving safety easier.  But human brains don't work that way.  You can and do change your values over time, because you don't really have terminal values.

Strictly speaking, it is impossible for an agent whose goals are all indexical goals describing states involving itself to have preferences about a situation in which it does not exist.  Those of you who are operating under the assumption that we are maximizing a utility function with evolved terminal goals should, I think, admit that these terminal goals all involve either ourselves or our genes.  If they involve ourselves, then utility functions based on these goals cannot even be computed once we die.  If they involve our genes, then they are goals that our bodies are pursuing - goals that we, the conscious agents inside our bodies, call errors when we evaluate them.  In either case, there is no logical reason for us to wish to maximize some utility function based on these after our own deaths.  Any action I wish to take regarding the distant future necessarily presupposes that the entire SIAI approach to goals is wrong.

My view, under which it does make sense for me to say I have preferences about the distant future, is that my mind has learned "values" that are not symbols, but analog numbers distributed among neurons.  As described in "Only humans can have human values", these values do not exist in a hierarchy with some at the bottom and some on the top, but in a recurrent network which does not have a top or a bottom, because the different parts of the network developed simultaneously.  These values therefore can't be categorized into instrumental or terminal.  They can include very abstract values that don't need to refer specifically to me, because other values elsewhere in the network do refer to me, and this will ensure that actions I finally execute incorporating those values are also influenced by my other values that do talk about me.

Even if human values existed, it would be pointless to preserve them

Only humans can have human values:

  • The only preferences that can be unambiguously determined are the preferences a person (mind+body) implements, which are not always the preferences expressed by their beliefs.
  • If you extract a set of consciously-believed propositions from an existing agent, then build a new agent to use those propositions in a different environment, with an "improved" logic, you can't claim that it has the same values, since it will behave differently.
  • Values exist in a network of other values.  A key ethical question is to what degree values are referential (meaning they can be tested against something outside that network); or non-referential (and hence relative).
  • Supposing that values are referential helps only by telling you to ignore human values.
  • You cannot resolve the problem by combining information from different behaviors, because the needed information is missing.
  • Today's ethical disagreements are largely the result of attempting to extrapolate ancestral human values into a changing world.
  • The future will thus be ethically contentious even if we accurately characterize and agree on present human values, because these values will fail to address the new important problems.


Human values differ as much as values can differ:  There are two fundamentally different categories of values:

  • Non-positional, mutually-satisfiable values (physical luxury, for instance)
  • Positional, zero-sum social values, such as wanting to be the alpha male or the homecoming queen

All mutually-satisfiable values have more in common with each other than they do with any non-mutually-satisfiable values, because mutually-satisfiable values are compatible with social harmony and non-problematic utility maximization, while non-mutually-satisfiable values require eternal conflict.  If you found an alien life form from a distant galaxy with non-positional values, it would be easier to integrate those values into a human culture containing only human non-positional values than to integrate already-existing positional human values into that culture.

It appears that some humans have mainly the one type, while other humans have mainly the other type.  So talking about trying to preserve human values is pointless - the values held by different humans have already passed the most-important point of divergence.
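The distinction above can be made concrete with a toy sketch (my own illustrative example, not from the post): a non-positional utility depends only on an agent's own situation, so everyone's utility can rise at once, while a positional utility depends on rank, so the total is fixed and any gain is someone else's loss.

```python
# Toy sketch of non-positional vs. positional values (illustrative
# assumptions, not the post's formalism). Non-positional utility depends
# only on one's own consumption; positional utility depends on rank.

def nonpositional_utility(own_consumption):
    # Mutually satisfiable: raising everyone's consumption raises
    # everyone's utility simultaneously.
    return own_consumption

def positional_utility(own_score, all_scores):
    # Utility = number of agents you outrank: a zero-sum quantity.
    return sum(1 for s in all_scores if s < own_score)

# Doubling everyone's consumption raises every non-positional utility...
consumption = [1.0, 2.0, 3.0]
before = [nonpositional_utility(c) for c in consumption]
after = [nonpositional_utility(2 * c) for c in consumption]
assert all(a > b for a, b in zip(after, before))

# ...but doubling everyone's score leaves total positional utility fixed.
scores = [1.0, 2.0, 3.0]
total_before = sum(positional_utility(s, scores) for s in scores)
doubled = [2 * s for s in scores]
total_after = sum(positional_utility(s, doubled) for s in doubled)
print(total_before, total_after)  # equal totals: positional gains are zero-sum
```

However much the whole population improves, the positional totals cannot budge, which is one way of cashing out the claim that positional values "require eternal conflict."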

 

Enforcing human values would be harmful

The human problem:  This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving.  This is the most-important objection of all.

Re-reading this, I see that the critical paragraph is painfully obscure, as if written by Kant; but it summarizes the argument: "Once the initial symbol set has been chosen, the semantics must be set in stone for the judging function to be "safe" for preserving value; this means that any new symbols must be defined completely in terms of already-existing symbols.  Because fine-grained sensory information has been lost, new developments in consciousness might not be detectable in the symbolic representation after the abstraction process.  If they are detectable via statistical correlations between existing concepts, they will be difficult to reify parsimoniously as a composite of existing symbols.  Not using a theory of phenomenology means that no effort is being made to look for such new developments, making their detection and reification even more unlikely.  And an evaluation based on already-developed values and qualia means that even if they could be found, new ones would not improve the score.  Competition for high scores on the existing function, plus lack of selection for components orthogonal to that function, will ensure that no such new developments last."

Averaging value systems is worse than choosing one:  This describes a neural network that encodes preferences and, given an input pattern, computes a new pattern that optimizes those preferences.  Such a system is taken as an analogue of a value system and of an ethical system for attaining those values.  I then define a measure of the internal conflict produced by a set of values, and show that a system built by averaging together the parameters of many different systems will have higher internal conflict than any of the systems averaged to produce it.  The point is that the CEV plan of "averaging together" human values will result in a set of values that is worse (more self-contradictory) than any of the value systems it was derived from.
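The post's actual construction is a neural network with a quantitative conflict measure; as a simpler stand-in (my own toy example, not the post's), pairwise preferences over three options exhibit the same phenomenon: averaging individually consistent value systems can yield a system with strictly more internal conflict, here measured as preference cycles, than any of its sources.

```python
# Toy illustration (a Condorcet-style example, not the post's exact
# neural-network construction): each "value system" is a matrix of
# pairwise preference margins over three options, and internal conflict
# is the number of cyclic triads. Each source system is perfectly
# transitive; their parameter-wise average contains a preference cycle.
import itertools

options = ["A", "B", "C"]

def pref_matrix(ranking):
    """m[i][j] = +1 if the option at index i is preferred to the one at j."""
    m = [[0] * 3 for _ in range(3)]
    for pos, winner in enumerate(ranking):
        for loser in ranking[pos + 1:]:
            i, j = options.index(winner), options.index(loser)
            m[i][j], m[j][i] = 1, -1
    return m

def conflict(m):
    """Internal conflict: count of cyclic triads (i beats j beats k beats i)."""
    return sum(
        1
        for i, j, k in itertools.permutations(range(3), 3)
        if m[i][j] > 0 and m[j][k] > 0 and m[k][i] > 0
    )

# Three individually consistent (transitive) value systems...
systems = [pref_matrix(r) for r in (["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"])]
# ...averaged parameter-by-parameter into one combined system.
avg = [[sum(s[i][j] for s in systems) / 3 for j in range(3)] for i in range(3)]

print([conflict(s) for s in systems])  # each source system has 0 cycles
print(conflict(avg))                   # the average contains a preference cycle
```

Each source system is internally coherent on its own; only the averaged system prefers A to B, B to C, and C to A, which is the sense in which the blend is "more self-contradictory" than any ingredient.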


A point I may not have made in these posts, but made in comments, is that the majority of humans today think that women should not have full rights, homosexuals should be killed or at least severely persecuted, and nerds should be given wedgies.  These are not incompletely-extrapolated values that will change with more information; they are values.  Opponents of gay marriage make it clear that they do not object to gay marriage based on a long-range utilitarian calculation; they directly value not allowing gays to marry.  Many human values horrify most people on this list, so they shouldn't be trying to preserve them.

So You Want to Save the World

41 lukeprog 01 January 2012 07:39AM

This post is very out-of-date. See MIRI's research page for the current research agenda.

So you want to save the world. As it turns out, the world cannot be saved by caped crusaders with great strength and the power of flight. No, the world must be saved by mathematicians, computer scientists, and philosophers.

This is because the creation of machine superintelligence this century will determine the future of our planet, and in order for this "technological Singularity" to go well for us, we need to solve a particular set of technical problems in mathematics, computer science, and philosophy before the Singularity happens.

The best way for most people to save the world is to donate to an organization working to solve these problems, an organization like the Singularity Institute or the Future of Humanity Institute.

Don't underestimate the importance of donation. You can do more good as a philanthropic banker than as a charity worker or researcher.

But if you are a capable researcher, then you may also be able to contribute by working directly on one or more of the open problems humanity needs to solve. If so, read on...


A Rationalist's Tale

82 lukeprog 28 September 2011 01:17AM

Warning: sappy personal anecdotes ahead! See also Eliezer's Coming of Age story, SarahC's Reflections on rationality a year out, and Alicorn's Polyhacking.

On January 11, 2007, at age 21, I finally whispered to myself: There is no God.

I felt the world collapse beneath me. I'd been raised to believe that God was necessary for meaning, morality, and purpose. My skin felt cold and my tongue felt like cardboard. This was the beginning of the darkest part of my life, but the seed of my later happiness.

I grew up in Cambridge, Minnesota — a town of 5,000 people and 22 Christian churches (at the time). My father was (and still is) pastor of a small church. My mother volunteered to support Christian missionaries around the world.

I went to church and Bible study every week. I prayed often and earnestly. For 12 years I attended a Christian school that taught Bible classes and creationism. I played in worship bands. As a teenager I made trips to China and England to tell the godless heathens there about Jesus. I witnessed miraculous healings unexplained by medical science.

And I felt the presence of God. Sometimes I would tingle and sweat with the Holy Spirit. Other times I felt led by God to give money to a certain cause, or to pay someone a specific compliment, or to walk to the cross at the front of my church and bow before it during a worship service.

Around age 19 I got depressed. But then I read Dallas Willard’s The Divine Conspiracy, a manual for how to fall in love with God so that following his ways is not a burden but a natural and painless product of loving God. And one day I saw a leaf twirling in the wind and it was so beautiful — like the twirling plastic bag in American Beauty — that I had an epiphany. I realized that everything in nature was a gift from God to me. Grass, lakes, trees, sunsets — all these were gifts of beauty from my Savior to me. That's how I fell in love with God, and he delivered me from my depression.

I moved to Minneapolis for college and was attracted to a Christian group led by Mark van Steenwyk. Mark’s small group of well-educated Jesus-followers are 'missional' Christians: they think that loving and serving others in the way of Jesus is more important than doctrinal truth. That resonated with me, and we lived it out with the poor immigrants of Minneapolis.


Not for the Sake of Pleasure Alone

36 lukeprog 11 June 2011 11:21PM

Related: Not for the Sake of Happiness (Alone), Value is Fragile, Fake Fake Utility Functions, You cannot be mistaken about (not) wanting to wirehead, Utilons vs. Hedons, Are wireheads happy?

When someone tells me that all human action is motivated by the desire for pleasure, or that we can solve the Friendly AI problem by programming a machine superintelligence to maximize pleasure, I use a two-step argument to persuade them that things are more complicated than that.

First, I present them with a variation on Nozick's experience machine, something like this:

Suppose that an advanced team of neuroscientists and computer scientists could hook your brain up to a machine that gave you maximal, beyond-orgasmic pleasure for the rest of an abnormally long life, then blasted you and the pleasure machine into deep space at near light-speed so that you could never be interfered with. Would you let them do this for you?

Most people say they wouldn't choose the pleasure machine. They begin to realize that even though they usually experience pleasure when they get what they desired, they want more than just pleasure. They also want to visit Costa Rica and have good sex and help their loved ones succeed.

But we can be mistaken when inferring our desires from such intuitions, so I follow this up with some neuroscience.


The Urgent Meta-Ethics of Friendly Artificial Intelligence

45 lukeprog 01 February 2011 02:15PM

Barring a major collapse of human civilization (due to nuclear war, asteroid impact, etc.), many experts expect the intelligence explosion Singularity to occur within 50-200 years.

That fact means that many philosophical problems, about which philosophers have argued for millennia, are suddenly very urgent.

Those concerned with the fate of the galaxy must say to the philosophers: "Too slow! Stop screwing around with transcendental ethics and qualitative epistemologies! Start thinking with the precision of an AI researcher and solve these problems!"

If a near-future AI will determine the fate of the galaxy, we need to figure out what values we ought to give it. Should it ensure animal welfare? Is growing the human population a good thing?

But those are questions of applied ethics. More fundamental are the questions about which normative ethics to give the AI: How would the AI decide if animal welfare or large human populations were good? What rulebook should it use to answer novel moral questions that arise in the future?

But even more fundamental are the questions of meta-ethics. What do moral terms mean? Do moral facts exist? What justifies one normative rulebook over the other?

The answers to these meta-ethical questions will determine the answers to the questions of normative ethics, which, if we are successful in planning the intelligence explosion, will determine the fate of the galaxy.

Eliezer Yudkowsky has put forward one meta-ethical theory, which informs his plan for Friendly AI: Coherent Extrapolated Volition. But what if that meta-ethical theory is wrong? The galaxy is at stake.

Princeton philosopher Richard Chappell worries about how Eliezer's meta-ethical theory depends on rigid designation, which in this context may amount to something like a semantic "trick." Previously and independently, an Oxford philosopher expressed the same worry to me in private.

Eliezer's theory also employs something like the method of reflective equilibrium, about which there are many grave concerns from Eliezer's fellow naturalists, including Richard Brandt, Richard Hare, Robert Cummins, Stephen Stich, and others.

My point is not to beat up on Eliezer's meta-ethical views. I don't even know if they're wrong. Eliezer is wickedly smart. He is highly trained in the skills of overcoming biases and properly proportioning beliefs to the evidence. He thinks with the precision of an AI researcher. In my opinion, that gives him large advantages over most philosophers. When Eliezer states and defends a particular view, I take that as significant Bayesian evidence for reforming my beliefs.

Rather, my point is that we need lots of smart people working on these meta-ethical questions. We need to solve these problems, and quickly. The universe will not wait for the pace of traditional philosophy to catch up.

David Chalmers' "The Singularity: A Philosophical Analysis"

33 lukeprog 29 January 2011 02:52AM

David Chalmers is a leading philosopher of mind, and the first to publish a major philosophy journal article on the singularity:

Chalmers, D. (2010). "The Singularity: A Philosophical Analysis." Journal of Consciousness Studies 17:7-65.

Chalmers' article is a "survey" article in that it doesn't cover any arguments in depth, but quickly surveys a large number of positions and arguments in order to give the reader a "lay of the land." (Compare to Philosophy Compass, an entire journal of philosophy survey articles.) Because of this, Chalmers' paper is a remarkably broad and clear introduction to the singularity.

Singularitarian authors will also be pleased that they can now cite a peer-reviewed article by a leading philosopher of mind who takes the singularity seriously.

Below is a CliffsNotes of the paper for those who don't have time to read all 58 pages of it.

 

The Singularity: Is It Likely?

Chalmers focuses on the "intelligence explosion" kind of singularity, and his first project is to formalize and defend I.J. Good's 1965 argument. Defining AI as being "of human level intelligence," AI+ as AI "of greater than human level" and AI++ as "AI of far greater than human level" (superintelligence), Chalmers updates Good's argument to the following:

  1. There will be AI (before long, absent defeaters).
  2. If there is AI, there will be AI+ (soon after, absent defeaters).
  3. If there is AI+, there will be AI++ (soon after, absent defeaters).
  4. Therefore, there will be AI++ (before too long, absent defeaters).

By "defeaters," Chalmers means global catastrophes like nuclear war or a major asteroid impact. One way to satisfy premise (1) is to achieve AI through brain emulation (Sandberg & Bostrom, 2008). Against this suggestion, Lucas (1961), Dreyfus (1972), and Penrose (1994) argue that human cognition is not the sort of thing that could be emulated. Chalmers (1995; 1996, chapter 9) has responded to these criticisms at length. Briefly, Chalmers notes that even if the brain is not a rule-following algorithmic symbol system, we can still emulate it if it is mechanical. (Some say the brain is not mechanical, but Chalmers dismisses this as being discordant with the evidence.)
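Stripped of the "absent defeaters" and timing qualifiers, Chalmers' schematic argument is a simple chain of modus ponens steps; a sketch in Lean (the proposition names are my own, standing in for the informal claims):

```lean
-- Chalmers' schematic argument with the "absent defeaters" and timing
-- qualifiers abstracted away: three premises, one chained conclusion.
example (AI AIplus AIplusplus : Prop)
    (h1 : AI)                    -- premise 1: there will be AI
    (h2 : AI → AIplus)           -- premise 2: if AI, then AI+
    (h3 : AIplus → AIplusplus)   -- premise 3: if AI+, then AI++
    : AIplusplus :=              -- conclusion: there will be AI++
  h3 (h2 h1)
```

The formal validity is trivial; the philosophical work, as Chalmers' paper makes clear, lies entirely in defending the premises and the qualifiers abstracted away here.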


Anthropomorphic AI and Sandboxed Virtual Universes

-3 jacob_cannell 03 September 2010 07:02PM

Intro

The problem of Friendly AI is usually approached from a decision theoretic background that starts with the assumptions that the AI is an agent that has awareness of AI-self and goals, awareness of humans as potential collaborators and or obstacles, and general awareness of the greater outside world.  The task is then to create an AI that implements a human-friendly decision theory that remains human-friendly even after extensive self-modification.

That is a noble goal, but there is a whole different set of orthogonal, compatible strategies for creating human-friendly AI that take a completely different route: remove the starting assumptions and create AIs that believe they are humans, and are rational in thinking so.


Minimum computation and data requirements for consciousness.

-13 daedalus2u 23 August 2010 11:53PM

Consciousness is a difficult question because it is poorly defined and is the subjective experience of the entity experiencing it. Because an individual experiences their own consciousness directly, that experience is always richer and more compelling than the perception of consciousness in any other entity; your own consciousness always seems more "real" and richer than the would-be consciousness of another entity.

Because the experience of consciousness is subjective, we can never "know for sure" that an entity is actually experiencing consciousness. However, certain computational functions must be accomplished for consciousness to be experienced. I am not attempting to discuss all the computational functions that are necessary; this is just a first step at enumerating some of them and considering the implications.

First, an entity must have a "self detector": a pattern-recognition computational structure which it uses to recognize its own state of being an entity, and of being the same entity over time. If an entity is unable to recognize itself as an entity, then it can't be conscious that it is an entity. To rephrase Descartes: "I perceive myself to be an entity, therefore I am an entity."  It is possible to be an entity and not perceive that one is an entity; this happens in humans, but rarely. Other computational structures may be necessary as well, but without the ability to recognize itself as an entity, an entity cannot be conscious.


Positioning oneself to make a difference

5 Mitchell_Porter 18 August 2010 11:54PM

Last weekend, while this year's Singularity Summit took place in San Francisco, I was turning 40 in my Australian obscurity. 40 is old enough to be thinking that I should just pick a SENS research theme and work on it, and also move to wherever in the world is most likely to have the best future biomedicine (that might be Boston). But at least since the late 1990s, when Eliezer first showed up, I have perceived that superintelligence trumps life extension as a futurist issue. And since 2006, when I first grasped how something like CEV could be an answer to the problem of superintelligence, I've had it before me as a model of how the future could and should play out. I have "contrarian" ideas about how consciousness works, but they do not contradict any of the essential notions of seed AI and friendly AI; they only imply that those notions would need to be adjusted and fitted to the true ontology, whatever that may be.

So I think this is what I should be working on - not just the ontological subproblem, but all aspects of the problem. The question is, how to go about this. At the moment, I'm working on a lengthy statement of how I think a Friendly Singularity could be achieved - a much better version of my top-level posts here, along with new material. But the main "methodological" problem is economic and perhaps social - what can I live on while I do this, and where in the world and in society should I situate myself for maximum insight and productivity. That's really what this post is about.

The obvious answer is, apply to SIAI. I'm not averse to the idea, and on occasion I raise the possibility with them, but I have two reasons for hesitation.

The first is the problem of consciousness. I often talk about this in terms of vaguely specified ideas about quantum entanglement in the brain, but the really important part is the radical disjunction between the physical ontology of the natural sciences and the manifest nature of consciousness. I cannot emphasize enough that this is a huge gaping hole in the scientific understanding of the world, the equal of any gap in the scientific worldview that came before it, and that the standard "scientific" way of thinking about it is a form of property dualism, even if people won't admit this to themselves. All the quantum stuff you hear from me is just an idea about how to restore a type of monism. I actually think it's a conservative solution to a very big problem, but to believe that you would have to agree with me that the other solutions on offer can't work (as well as understanding just what it is that I propose instead).

This "reason for not applying to SIAI" leads to two sub-reasons. First, I'm not sure that the SIAI intellectual environment can accommodate my approach. Second, the problem with consciousness is of course not specific to SIAI, it is a symptom of the overall scientific zeitgeist, and maybe I should be working there, in the field of consciousness studies. If expert opinion changes, SIAI will surely notice, and so I should be trying to convince the neuroscientists, not the Friendly AI researchers.

The second top-level reason for hesitation is simply that SIAI doesn't have much money. If I can accomplish part of the shared agenda while supported by other means, that would be better. Mostly I think in terms of doing a PhD. A few years back I almost started one with Ben Goertzel as co-supervisor, which would have looked at implementing a CEV-like process in a toy physical model, but that fell through at my end. Lately I'm looking around again. In Australia we have David Chalmers and Marcus Hutter. I know Chalmers from my quantum-mind days in Arizona ten years ago, and I met with Hutter recently. The strong interdisciplinarity of my real agenda makes it difficult to see where I could work directly on the central task, but also implies that there are many fields (cognitive neuroscience, decision theory, various quantum topics) where I might be able to limp along with partial support from an institution.

So that's the situation. Are there any other ideas? (Private communications can go to mporter at gmail.)
