Metaphilosophical Mysteries

35 Wei_Dai 27 July 2010 12:55AM

Creating Friendly AI seems to require us humans to either solve most of the outstanding problems in philosophy, or to solve meta-philosophy (i.e., what is the nature of philosophy, how do we practice it, and how should we program an AI to do it?), and to do that in an amount of time measured in decades. I'm not optimistic about our chances of success, but out of these two approaches, the latter seems slightly easier, or at least less effort has already been spent on it. This post tries to take a small step in that direction, by asking a few questions that I think are worth investigating or keeping in the back of our minds, and generally raising awareness and interest in the topic.

Hacking the CEV for Fun and Profit

52 Wei_Dai 03 June 2010 08:30PM

It’s the year 2045, and Dr. Evil and the Singularity Institute have been in a long and grueling race to be the first to achieve machine intelligence, thereby controlling the course of the Singularity and the fate of the universe. Unfortunately for Dr. Evil, SIAI is ahead in the game. Its Friendly AI is undergoing final testing, and Coherent Extrapolated Volition is scheduled to begin in a week. Dr. Evil learns of this news, but there’s not much he can do, or so it seems.  He has succeeded in developing brain scanning and emulation technology, but the emulation speed is still way too slow to be competitive.

There is no way to catch up with SIAI's superior technology in time, but Dr. Evil suddenly realizes that maybe he doesn’t have to. CEV is supposed to give equal weighting to all of humanity, and surely uploads count as human. If he had enough storage space, he could simply upload himself, and then make a trillion copies of the upload. The rest of humanity would end up with less than 1% weight in CEV. Not perfect, but he could live with that. Unfortunately he only has enough storage for a few hundred uploads. What to do…

Ah ha, compression! A trillion identical copies of an object would compress down to be only a little bit larger than one copy. But would CEV count compressed identical copies to be separate individuals? Maybe, maybe not. To be sure, Dr. Evil gives each copy a unique experience before adding it to the giant compressed archive. Since they still share almost all of the same information, a trillion copies, after compression, just manage to fit inside the available space.

Now Dr. Evil sits back and relaxes. Come next week, the Singularity Institute and the rest of humanity are in for a rather rude surprise!
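
The two quantitative claims in the story can be checked with a rough sketch. The population figure, the 10 kB stand-in for an upload, and the helper name make_copy are assumptions made for illustration, and zlib is only a convenient stand-in for whatever compressor Dr. Evil would use.

```python
import random
import zlib

WORLD_POPULATION = 7_000_000_000   # assumed: roughly the 2010 world population
EVIL_COPIES = 1_000_000_000_000    # "a trillion copies" from the story

# Share of total weight left to everyone else if each mind counts equally.
rest_share = WORLD_POPULATION / (WORLD_POPULATION + EVIL_COPIES)
print(f"rest of humanity: {rest_share:.2%}")  # rest of humanity: 0.70%

# Toy stand-in for one upload: 10 kB of random (incompressible) bytes.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(10_000))


def make_copy(i: int) -> bytes:
    """Hypothetical copy: the shared upload plus a tiny 'unique experience'."""
    return base + f"experience-{i}".encode()


copies = [make_copy(i) for i in range(50)]
raw_size = sum(len(c) for c in copies)
one_size = len(zlib.compress(copies[0], 9))
archive_size = len(zlib.compress(b"".join(copies), 9))

print("raw bytes for 50 copies:", raw_size)
print("one copy, compressed:   ", one_size)
print("50 copies, compressed:  ", archive_size)
```

With these assumed numbers the rest of humanity keeps about 0.7% of the weight, consistent with "less than 1%". The archive of fifty near-identical copies should come out far smaller than the raw total, because the compressor stores the shared content once and only short back-references for each later copy.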

Human values differ as much as values can differ

13 PhilGoetz 03 May 2010 07:35PM

George Hamilton's autobiography Don't Mind if I Do, and the very similar book by Bob Evans, The Kid Stays in the Picture, give a lot of insight into human nature and values.  For instance: What do people really want?  When people have the money and fame to travel around the world and do anything that they want, what do they do?  And what is it that they value most about the experience afterward?

You may argue that the extremely wealthy and famous don't represent the desires of ordinary humans.  I say the opposite: Non-wealthy, non-famous people, being more constrained by need and by social convention, and having no hope of ever attaining their desires, don't represent, or even allow themselves to acknowledge, the actual desires of humans.

I noticed a pattern in these books:  The men in them value social status primarily as a means to an end, while the women value social status as an end in itself.

The Concepts Problem

9 Kaj_Sotala 16 April 2010 06:21AM

I'm not sure how obvious the following is to people, and it probably is obvious to most of the people thinking about FAI. But I just thought I'd throw out a summary of it here anyway, since this is the one topic that makes me the most pessimistic about the notion of Friendly AI being possible, or at least of an FAI based heavily on theory rather than on plenty of experimentation.

A mind can only represent a complex concept X by embedding it into a tightly interwoven network of other concepts that combine to give X its meaning. For instance, a "cat" is playful, four-legged, feline, a predator, has a tail, and so forth. These are the concepts that define what it means to be a cat; by itself, "cat" is nothing but a complex set of links defining how it relates to these other concepts. (As well as a set of links to memories about cats.) But then, none of those concepts means anything in isolation, either. A "predator" is a specific biological and behavioral class, the members of which hunt other animals for food. Of that definition, "biological" pertains to "biology", which is a "natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy". "Behavior", on the other hand, "refers to the actions of an organism, usually in relation to the environment". Of those words... and so on.
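
The regress described above can be made concrete with a small sketch. The network below is a hypothetical toy that loosely follows the "cat" example; the dictionary contents and the function name concepts_needed are assumptions for illustration, not a claim about how any real system stores concepts.

```python
# Hypothetical toy network: each concept links to the concepts that define it.
CONCEPTS = {
    "cat": ["playful", "four-legged", "feline", "predator", "tail"],
    "predator": ["biological", "behavior", "hunt", "animal", "food"],
    "biological": ["biology"],
    "biology": ["natural science", "life", "organism"],
    "behavior": ["action", "organism", "environment"],
}


def concepts_needed(root: str, network: dict[str, list[str]]) -> set[str]:
    """Collect every concept reachable from `root` via definition links."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        concept = stack.pop()
        if concept in seen:
            continue
        seen.add(concept)
        # Concepts with no entry in the network are treated as leaves here.
        stack.extend(network.get(concept, []))
    return seen


needed = concepts_needed("cat", CONCEPTS)
undefined = sorted(c for c in needed if c not in CONCEPTS)
print(len(needed), "concepts reachable from 'cat'")
print("still undefined:", undefined)
```

Even in this five-entry toy, defining "cat" pulls in seventeen concepts, twelve of which have no definition of their own yet. Each of those would need further links, which is the "and so on" of the paragraph above.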

It does not seem likely that humans could preprogram an AI with a ready-made network of concepts. There have been attempts to build knowledge ontologies by hand, but any such attempt is both hopelessly slow and lacking in much of the essential content. Even given a lifetime during which to work and countless assistants, could you ever hope to code everything you knew into a format from which it was possible to employ that knowledge usefully? An even worse problem is that the information would need to be in a format compatible with the AI's own learning algorithms, so that any new information the AI learnt would fit seamlessly into the previously-entered database. It does not seem likely that we can come up with an efficient language of thought that can be easily translated into a format that is intuitive for humans to work with.

Indeed, there are existing plans for AI systems which make the explicit assumption that the AI's network of knowledge will develop independently as the system learns, and the concepts in this network won't necessarily have an easy mapping to those used in human language. The OpenCog wikibook states that:

Some ConceptNodes and conceptual PredicateNode or SchemaNodes may correspond with human-language words or phrases like cat, bite, and so forth. This will be the minority case; more such nodes will correspond to parts of human-language concepts or fuzzy collections of human-language concepts. In discussions in this wikibook, however, we will often invoke the unusual case in which Atoms correspond to individual human-language concepts. This is because such examples are the easiest ones to discuss intuitively. The preponderance of named Atoms in the examples in the wikibook implies no similar preponderance of named Atoms in the real OpenCog system. It is merely easier to talk about a hypothetical Atom named "cat" than it is about a hypothetical Atom (internally) named [434]. It is not impossible that a OpenCog system represents "cat" as a single ConceptNode, but it is just as likely that it will represent "cat" as a map composed of many different nodes without any of these having natural names. Each OpenCog works out for itself, implicitly, which concepts to represent as single Atoms and which in distributed fashion.

Designers of Friendly AI seek to build a machine with a clearly-defined goal system, one which is guaranteed to preserve the highly complex values that humans have. But the nature of concepts poses a challenge for this objective. There seems to be no obvious way of programming those highly complex goals into the AI right from the beginning, nor to guarantee that any goals thus preprogrammed will not end up being drastically reinterpreted as the system learns. We cannot simply code "safeguard these human values" into the AI's utility function without defining those values in detail, and defining those values in detail requires us to build the AI with an entire knowledge network. On a certain conceptual level, the decision theory and goal system of an AI are separate from its knowledge base; in practice, it doesn't seem like such a separation would be possible.

The goal might not be impossible, though. Humans do seem to be pre-programmed with inclinations towards various complex behaviors which might suggest pre-programmed concepts to various degrees. Heterosexuality is considerably more common in the population than homosexuality, though this may have relatively simple causes such as an inborn preference towards particular body shapes combined with social conditioning. (Disclaimer: I don't really know anything about the biology of sexuality, so I'm speculating wildly here.) Most people also seem to react relatively consistently to different status displays, and people have collected various lists of complex human universals. The exact method of their transmission remains unknown, however, as does the role that culture serves in it. It also bears noting that most so-called "human universals" are actually cultural as opposed to individual universals. In other words, any given culture might be guaranteed to express them, but there will always be individuals who don't fit into the usual norms.

See also: Vladimir Nesov discusses a closely related form of this problem as the "ontology problem".

Interesting Peter Norvig interview

6 xamdam 03 March 2010 12:59AM

(Sorry this is mostly a link instead of a post, but I think it will be interesting to the FAI folks here)

I helped arrange this interview with Peter Norvig:

http://www.reddit.com/r/blog/comments/b8aln/peter_norvig_answers_your_questions_ask_me/

I think the answer to the AGI question 4 is telling, but judge for yourself. (BTW, the 'components' Peter referred to are probabilistic relational learning and hierarchical modeling. He singled out these two in his Singularity Summit talk.)

Complexity of Value ≠ Complexity of Outcome

32 Wei_Dai 30 January 2010 02:50AM

Complexity of value is the thesis that our preferences, the things we care about, don't compress down to one simple rule, or a few simple rules. To review why it's important (by quoting from the wiki):

  • Caricatures of rationalists often have them moved by artificially simplified values - for example, only caring about personal pleasure. This becomes a template for arguing against rationality: X is valuable, but rationality says to only care about Y, in which case we could not value X, therefore do not be rational.
  • Underestimating the complexity of value leads to underestimating the difficulty of Friendly AI; and there are notable cognitive biases and fallacies which lead people to underestimate this complexity.

I certainly agree with both of these points. But I worry that we (at Less Wrong) might have swung a bit too far in the other direction. No, I don't think that we overestimate the complexity of our values, but rather there's a tendency to assume that complexity of value must lead to complexity of outcome, that is, agents who faithfully inherit the full complexity of human values will necessarily create a future that reflects that complexity. I will argue that it is possible for complex values to lead to simple futures, and explain the relevance of this possibility to the project of Friendly AI.

Welcome to Heaven

23 denisbider 25 January 2010 11:22PM

I can conceive of the following 3 main types of meaning we can pursue in life.

1. Exploring existing complexity: the natural complexity of the universe, or complexities that others created for us to explore.

2. Creating new complexity for others and ourselves to explore.

3. Hedonic pleasure: more or less direct stimulation of our pleasure centers, with wire-heading as the ultimate form.

What I'm observing in the various FAI debates is a tendency of people to shy away from wire-heading as something the FAI should do. This reluctance is generally not substantiated or clarified with anything other than "clearly, this isn't what we want". This is not, however, clear to me at all.

The utility we get from exploration and creation is an enjoyable mental process that comes with these activities. Once an FAI can rewire our brains at will, we do not need to perform actual exploration or creation to experience this enjoyment. Instead, the enjoyment we get from exploration and creation becomes just another form of pleasure that can be stimulated directly.

If you are a utilitarian, and you believe in shut-up-and-multiply, then the correct thing for the FAI to do is to use up all available resources so as to maximize the number of beings, and then induce a state of permanent and ultimate enjoyment in every one of them. This enjoyment could be of any type - it could be explorative or creative or hedonic enjoyment as we know it. The most energy efficient way to create any kind of enjoyment, however, is to stimulate the brain-equivalent directly. Therefore, the greatest utility will be achieved by wire-heading. Everything else falls short of that.

What I don't quite understand is why everyone thinks that this would be such a horrible outcome. As far as I can tell, these seem to be cached emotions that are suitable for our world, but not for the world of FAI. In our world, we truly do need to constantly explore and create, or else we will suffer the consequences of not mastering our environment. In a world where FAI exists, there is no longer a point, nor even a possibility, of mastering our environment. The FAI masters our environment for us, and there is no longer a reason to avoid hedonic pleasure. It is no longer a trap.

Since the FAI can sustain us in safety until the universe goes poof, there is no reason for everyone not to experience ultimate enjoyment in the meanwhile. In fact, I can hardly tell this apart from the concept of a Christian Heaven, which appears to be a place where Christians very much want to go.

If you don't want to be "reduced" to an eternal state of bliss, that's tough luck. The alternative would be for the FAI to create an environment for you to play in, consuming precious resources that could sustain more creatures in a permanently blissful state. But don't worry; you won't need to feel bad for long. The FAI can simply modify your preferences so you want an eternally blissful state.

Welcome to Heaven.

Intuitive supergoal uncertainty

4 JustinShovelain 04 December 2009 05:21AM

There is a common intuition and feeling that our most fundamental goals may be uncertain in some sense. What causes this intuition? For this topic I need to be able to pick out one’s top-level goals, roughly one’s context-insensitive utility function, and not some task-specific utility function, and I do not want to imply that the top-level goals can be interpreted in the form of a utility function. Following from Eliezer’s CFAI paper I thus choose the word “supergoal” (sorry Eliezer, but I am fond of that old document and its tendency to coin new vocabulary). In what follows, I will naturalistically explore the intuition of supergoal uncertainty.

To posit a model, what goal uncertainty (including supergoal uncertainty as an instance) means is that you have a weighted distribution over a set of possible goals and a mechanism by which that weight may be redistributed. If we take away the distribution of weights how can we choose actions coherently, how can we compare? If we take away the weight redistribution mechanism we end up with a single goal whose state utilities may be defined as the weighted sum of the constituent goals’ utilities, and thus the weight redistribution mechanism is necessary for goal uncertainty to be a distinct concept.
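
The model above can be sketched in a few lines. Everything named here is hypothetical: the two candidate goals, the two states, and especially the Bayes-style reweighting in redistribute, which is just one assumed example of a weight redistribution mechanism and not one the post specifies.

```python
from typing import Callable, Dict, List

State = str
Goal = Callable[[State], float]

# Hypothetical candidate goals, each assigning a utility to a state.
GOALS: Dict[str, Goal] = {
    "explore": lambda s: {"library": 1.0, "beach": 0.4}.get(s, 0.0),
    "rest": lambda s: {"library": 0.2, "beach": 0.9}.get(s, 0.0),
}


def expected_utility(state: State, weights: Dict[str, float]) -> float:
    """Weighted sum of the candidate goals' utilities for one state."""
    return sum(w * GOALS[name](state) for name, w in weights.items())


def best_state(states: List[State], weights: Dict[str, float]) -> State:
    """With weights held fixed, this is just one ordinary combined goal."""
    return max(states, key=lambda s: expected_utility(s, weights))


def redistribute(weights: Dict[str, float],
                 likelihood: Dict[str, float]) -> Dict[str, float]:
    """Assumed mechanism: multiply by a likelihood, then renormalize."""
    unnorm = {g: w * likelihood[g] for g, w in weights.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


states = ["library", "beach"]
weights = {"explore": 0.7, "rest": 0.3}
before = best_state(states, weights)

# Some new consideration shifts weight toward "rest".
weights = redistribute(weights, {"explore": 0.1, "rest": 0.9})
after = best_state(states, weights)

print(before, "->", after)  # library -> beach
```

As long as the weights never change, expected_utility behaves exactly like a single fixed utility function, which is the collapse the paragraph describes. Only the redistribution step lets the preferred action change, from the library to the beach in this toy case.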

Quantifying ethicality of human actions

-14 bogus 13 October 2009 04:10PM

Background:  This article is licensed under the GNU Free Documentation License and Creative Commons Attribution-ShareAlike Unported. It was posted to Wikipedia by an author who wished to remain anonymous, known variously as "24" and "142".  It was subsequently removed from view on Wikipedia, but its text has been preserved by a number of mirrors.  While it could be seen as no more than a basic primer in moral philosophy, it is arguably required reading for anyone unfamiliar with the philosophical background of such concepts as Friendly AI and Coherent Extrapolated Volition.

The search for a formal method for evaluating and quantifying ethicality and morality of human actions stretches back to ancient times. While any simple view of right, wrong and dispute resolution relies on some linguistic and cultural norms, a 'formal' method presumably cannot, and must rely instead on knowledge of more basic human nature, and symbolic methods that allow for only very simple evidence.

Friendlier AI through politics

1 Jonathan_Graehl 16 August 2009 09:29PM

David Brin suggests that some kind of political system populated with humans and diverse but imperfectly rational and friendly AIs would evolve in a satisfactory direction for humans.

I don't know whether creating an imperfectly rational general AI is any easier, except that limited perceptual and computational resources obviously imply less than optimal outcomes; still, why shouldn't we hope for optimal given those constraints?  I imagine the question will become more settled before anyone nears unleashing a self-improving superhuman AI.

An imperfectly friendly AI, perfectly rational or not, is a very likely scenario.  Is it sufficient to create diverse singleton value-systems (demographically representative of humans' values) rather than a consensus (over all humans' values) monolithic Friendly AI?

What kind of competitive or political system would make fragmented squabbling AIs safer than an attempt to get the monolithic approach right?  Brin seems to have some hope of improving politics regardless of AI participation, but I'm not sure exactly what his dream is or how to get there (perhaps his "disputation arenas" would work if the participants were rational and altruistically honest).
