
Goal retention discussion with Eliezer

56 MaxTegmark 04 September 2014 10:23PM

Although I feel that Nick Bostrom’s new book “Superintelligence” is generally awesome and a much-needed milestone for the field, I do have one quibble: both he and Steve Omohundro appear more convinced than I am that an AI will naturally tend to retain its goals as it reaches a deeper understanding of the world and of itself. I’ve written a short essay on this issue from my physics perspective, available at http://arxiv.org/pdf/1409.0813.pdf.

Eliezer Yudkowsky just sent the following extremely interesting comments, and told me he was OK with me sharing them here to spur a broader discussion of these issues, so here goes.

On Sep 3, 2014, at 17:21, Eliezer Yudkowsky <yudkowsky@gmail.com> wrote:

Hi Max!  You're asking the right questions.  Some of the answers we can
give you, some we can't, few have been written up and even fewer in any
well-organized way.  Benja or Nate might be able to expound in more detail
while I'm in my seclusion.

Very briefly, though:
The problem of utility functions turning out to be ill-defined in light of
new discoveries about the universe is what Peter de Blanc named an
"ontological crisis" (not necessarily a particularly good name, but it's
what we've been using locally).

http://intelligence.org/files/OntologicalCrises.pdf

The way I would phrase this problem now is that an expected utility
maximizer makes comparisons between quantities that have the type
"expected utility conditional on an action", which means that the AI's
utility function must be something that can assign utility-numbers to the
AI's model of reality, and these numbers must have the further property
that there is some computationally feasible approximation for calculating
expected utilities relative to the AI's probabilistic beliefs.  This is a
constraint that rules out the vast majority of all completely chaotic and
uninteresting utility functions, but does not rule out, say, "make lots of
paperclips".

Models also have the property of being Bayes-updated using sensory
information; for the sake of discussion let's also say that models are
about universes that can generate sensory information, so that these
models can be probabilistically falsified or confirmed.  Then an
"ontological crisis" occurs when the hypothesis that best fits sensory
information corresponds to a model that the utility function doesn't run
on, or doesn't detect any utility-having objects in.  The example of
"immortal souls" is a reasonable one.  Suppose we had an AI that had a
naturalistic version of a Solomonoff prior, a language for specifying
universes that could have produced its sensory data.  Suppose we tried to
give it a utility function that would look through any given model, detect
things corresponding to immortal souls, and value those things.  Even if
the immortal-soul-detecting utility function works perfectly (it would in
fact detect all immortal souls) this utility function will not detect
anything in many (representations of) universes, and in particular it will
not detect anything in the (representations of) universes we think have
most of the probability mass for explaining our own world.  In this case
the AI's behavior is undefined until you tell me more things about the AI;
an obvious possibility is that the AI would choose most of its actions
based on low-probability scenarios in which hidden immortal souls existed
that its actions could affect.  (Note that even in this case the utility
function is stable!)
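
A toy illustration of that failure mode (everything here is invented
for the example): the soul-detecting utility function works perfectly
on models that contain souls, and simply finds nothing to value in the
naturalistic models that carry nearly all the probability mass:

```python
def soul_utility(model):
    """Detects every immortal soul in a model representation -- perfectly."""
    return sum(1 for obj in model["objects"] if obj["kind"] == "immortal_soul")

dualist_model = {"objects": [{"kind": "immortal_soul"}, {"kind": "atom"}]}
naturalistic_model = {"objects": [{"kind": "atom"}, {"kind": "atom"}]}

assert soul_utility(dualist_model) == 1       # souls found and valued
assert soul_utility(naturalistic_model) == 0  # nothing utility-having detected

# If naturalistic models carry nearly all the probability mass, differences
# in expected utility between actions are driven entirely by the residual
# low-probability soul-containing hypotheses.
```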

Since we don't know the final laws of physics and could easily be
surprised by further discoveries in the laws of physics, it seems pretty
clear that we shouldn't be specifying a utility function over exact
physical states relative to the Standard Model, because if the Standard
Model is even slightly wrong we get an ontological crisis.  Of course
there are all sorts of extremely good reasons we should not try to do this
anyway, some of which are touched on in your draft; there just is no
simple function of physics that gives us something good to maximize.  See
also Complexity of Value, Fragility of Value, indirect normativity, the
whole reason for a drive behind CEV, and so on.  We're almost certainly
going to be using some sort of utility-learning algorithm, the learned
utilities are going to bind to modeled final physics by way of modeled
higher levels of representation, which are known to be imperfect, and we're
going to have to figure out how to preserve the model and learned
utilities through shifts of representation.  E.g., the AI discovers that
humans are made of atoms rather than being ontologically fundamental
humans, and furthermore the AI's multi-level representations of reality
evolve to use a different sort of approximation for "humans", but that's
okay because our utility-learning mechanism also says how to re-bind the
learned information through an ontological shift.
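
One way to picture the re-binding step (a bare sketch under invented
names; a real mechanism would have to learn the bridge rather than be
handed it):

```python
# Learned utilities bind to high-level modeled concepts, not to final physics.
learned_utilities = {"human": 10.0}

def rebind(learned, bridge):
    """Carry learned utilities across an ontological shift, given a map
    from old concepts to their new representations."""
    return {bridge[concept]: value for concept, value in learned.items()}

# After discovering that humans are made of atoms, the utility-learning
# mechanism must supply the bridge from the old concept to the new one:
bridge = {"human": "atom_configuration_classified_as_human"}
learned_utilities = rebind(learned_utilities, bridge)
```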

This sorta thing ain't going to be easy, which is the other big reason to
start working on it well in advance.  I point out however that this
doesn't seem unthinkable in human terms.  We discovered that brains are
made of neurons but were nonetheless able to maintain an intuitive grasp
on what it means for them to be happy, and we don't throw away all that
info each time a new physical discovery is made.  The kind of cognition we
want does not seem inherently self-contradictory.

Three other quick remarks:

*)  The Omohundrian/Yudkowskian argument is not that we can take an arbitrary
stupid young AI and it will be smart enough to self-modify in a way that
preserves its values, but rather that most AIs that don't self-destruct
will eventually end up at a stable fixed-point of coherent
consequentialist values.  This could easily involve a step where, e.g., an
AI that started out with a neural-style delta-rule policy-reinforcement
learning algorithm, or an AI that started out as a big soup of
self-modifying heuristics, is "taken over" by whatever part of the AI
first learns to do consequentialist reasoning about code.  But this
process doesn't repeat indefinitely; it stabilizes when there's a
consequentialist self-modifier with a coherent utility function that can
precisely predict the results of self-modifications.  The part where this
does happen to an initial AI that is under this threshold of stability is
a big part of the problem of Friendly AI and it's why MIRI works on tiling
agents and so on!
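
In compressed form, the stability claim might be sketched like this
(the interface is entirely hypothetical): self-modifications are judged
by the agent's current utility function, so once the agent can predict
their consequences precisely, successors that alter that function stop
being adopted:

```python
def self_modify(agent, candidate_successors, predicted_value):
    """Adopt a successor design only if the *current* utility function
    predicts it will do at least as well as the status quo."""
    best = agent
    for successor in candidate_successors:
        if predicted_value(successor) >= predicted_value(best):
            best = successor  # judged by present values, not the successor's
    return best
```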

*)  Natural selection is not a consequentialist, nor is it the sort of
consequentialist that can sufficiently precisely predict the results of
modifications that the basic argument should go through for its stability.
It built humans to be consequentialists that would value sex, not value
inclusive genetic fitness, and not value being faithful to natural
selection's optimization criterion.  Well, that's dumb, and of course the
result is that humans don't optimize for inclusive genetic fitness.
Natural selection was just stupid like that.  But that doesn't mean
there's a generic process whereby an agent rejects its "purpose" in the
light of exogenously appearing preference criteria.  Natural selection's
anthropomorphized "purpose" in making human brains is just not the same as
the cognitive purposes represented in those brains.  We're not talking
about spontaneous rejection of internal cognitive purposes based on their
causal origins failing to meet some exogenously-materializing criterion of
validity.  Our rejection of "maximize inclusive genetic fitness" is not an
exogenous rejection of something that was explicitly represented in us,
that we were explicitly being consequentialists for.  It's a rejection of
something that was never an explicitly represented terminal value in the
first place.  Similarly the stability argument for sufficiently advanced
self-modifiers doesn't go through a step where the successor form of the
AI reasons about the intentions of the previous step and respects them
apart from its constructed utility function.  So the lack of any universal
preference of this sort is not a general obstacle to stable
self-improvement.

*)  The case of natural selection does not illustrate a universal
computational constraint; it illustrates something that we could
anthropomorphize as a foolish design error.  Consider humans building Deep
Blue.  We built Deep Blue to attach a sort of default value to queens and
central control in its position evaluation function, but Deep Blue is
still perfectly able to sacrifice queens and central control alike if the
position reaches a checkmate thereby.  In other words, although an agent
needs crystallized instrumental goals, it is also perfectly reasonable to
have an agent which never knowingly sacrifices the terminally defined
utilities for the crystallized instrumental goals if the two conflict;
indeed "instrumental value of X" is simply "probabilistic belief that X
leads to terminal utility achievement", which is sensibly revised in the
presence of any overriding information about the terminal utility.  To put
it another way, in a rational agent, the only way a loose generalization
about instrumental expected-value can conflict with and trump terminal
actual-value is if the agent doesn't know it, i.e., it does something that
it reasonably expected to lead to terminal value, but it was wrong.
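
A toy position evaluator in the spirit of the Deep Blue example
(weights and field names invented for illustration): queens and central
control carry crystallized instrumental weight, but a terminally
defined win overrides them, so the queen sacrifice that mates is
preferred:

```python
QUEEN_WEIGHT = 9.0    # crystallized instrumental value: queens tend to win
CENTER_WEIGHT = 0.5   # crystallized instrumental value: central control helps

def evaluate(position):
    """Terminally defined utility (checkmate) overrides instrumental defaults."""
    if position["delivers_mate"]:
        return float("inf")  # winning is the terminal goal
    return (QUEEN_WEIGHT * position["queens"]
            + CENTER_WEIGHT * position["center_control"])

quiet_position = {"delivers_mate": False, "queens": 1, "center_control": 4}
queen_sacrifice = {"delivers_mate": True, "queens": 0, "center_control": 0}
assert evaluate(queen_sacrifice) > evaluate(quiet_position)
```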

This has been very off-the-cuff and I think I should hand this over to
Nate or Benja if further replies are needed, if that's all right.

A question about Eliezer

33 perpetualpeace1 19 April 2012 05:27PM

I blew through all of MoR in about 48 hours, and in an attempt to learn more about the science and philosophy that Harry espouses, I've been reading the sequences and Eliezer's posts on Less Wrong. Eliezer has written extensively about AI, rationality, quantum physics, singularity research, etc. I have a question: how correct has he been?  Has his interpretation of quantum physics predicted any subsequently-observed phenomena?  Has his understanding of cognitive science and technology allowed him to successfully anticipate the progress of AI research, or has he made any significant advances himself? Is he on the record predicting anything, either right or wrong?   

Why this is important: when I read something written by Paul Krugman, I know that he has a Nobel Prize in economics, and I know that he has the best track record of any top pundit in the US in terms of making accurate predictions.  Meanwhile, I know that Thomas Friedman is an idiot.  Based on these track records, I believe things written by Krugman much more than I believe things written by Friedman.  But if I hadn't read Friedman's writing from 2002-2006, then I wouldn't know how terribly wrong he has been, and I would be too credulous about his claims.

Similarly, reading Mike Darwin's predictions about the future of medicine was very enlightening.  He was wrong about nearly everything.  So now I know to distrust claims that he makes about the pace or extent of subsequent medical research.  

Has Eliezer offered anything falsifiable, or put his reputation on the line in any way?  "If X and Y don't happen by Z, then I have vastly overestimated the pace of AI research, or I don't understand quantum physics as well as I think I do," etc etc.

Is That Your True Rejection? by Eliezer Yudkowsky @ Cato Unbound

30 XiXiDu 07 September 2011 06:27PM

A response essay by Eliezer Yudkowsky, posted at Cato Unbound for the issue Brain, Belief, and Politics:

Is That Your True Rejection? by Eliezer Yudkowsky

Eliezer Yudkowsky suggests that the partial mutability of human traits is an auxiliary reason at best for Michael Shermer’s libertarianism. Take that fact away, and Shermer’s politics would probably remain. Yudkowsky says that his own small-l libertarian tendencies come from the long history of government incompetence, indifference, and outright malevolence. These, and not brain science, are the best reasons for libertarians to believe what they do.

Moreover, we make a logical error when we infer shares of causality from shares of observed variance; the relationship between nature and nurture is cooperative, not zero-sum. One thing, however, is clear: Human genetic variance is tiny, as indeed it must be for human beings all to constitute a single species. Environmental manipulation can only achieve so much in part because of this universal human inheritance.

The lead essay has been written by Michael Shermer:

Liberty and Science by Michael Shermer

Michael Shermer discusses scientific findings about belief formation. Beliefs, including political beliefs, are usually the result of automatic or intuitive moral judgments, not rational calculations. One cluster of those intuitions presumes that human nature is malleable; these usually produce a liberal politics. Another group of intuitions presumes that human nature is static; these tend to produce conservatism. But Shermer argues that humans really fall somewhere in between — malleable, within some important limits. He argues that this set of findings should produce a libertarian politics.

A Gameplay Exploration of Yudkowsky's "Twelve Virtues"

43 ac3raven 18 May 2011 06:56PM

Hello Less Wrong, this is my first post (kind of).  I belong to a small game development company called Shiny Ogre Games.  We have a strong interest in making games that, as Jonathan Blow puts it, "speak to the human condition."  I am here to announce our next project to you.

There are two points to address in this announcement.  First:

Thought is a process like any other.  The methods by which we think can be identified, specified, defined, categorized, and even predicted.  One method of thinking that has been thoroughly defined is rationality.  Many would consider rationality (i.e., the careful exercise of reason) to be an essential path toward enlightenment (hence this).

Second: the objective, logical, and mechanical approach to reason that rationality takes meshes nicely with game development, because any well-defined system can be turned into a game.  A game is a system composed of players making decisions in pursuit of objectives, governed by a rule set.

Where there is no decision there can be no game.  Where decisions matter, a game can make them matter more.

Therefore, rationality is a core component of game playing.
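
That definition is concrete enough to write down.  A minimal sketch in Python (the names are ours and purely illustrative):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Game:
    """A system of players making decisions toward objectives under rules."""
    players: List[str]
    objectives: List[str]
    is_legal: Callable[[str, str], bool]   # the rule set
    history: List[str] = field(default_factory=list)

    def decide(self, player: str, decision: str) -> None:
        # Where there is no (legal) decision, there can be no game.
        if self.is_legal(player, decision):
            self.history.append(decision)
```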

Games are learning tools.  They are perhaps the best learning tool available to humans, because they engage our biological drive to play.

With that said, our announcement:

We're making a video game about rationality.

The game will explore rationality in the context of Eliezer Yudkowsky's "Twelve Virtues of Rationality" (which we have permission to use).  From a narrative perspective, the game takes place inside a mind on the brink of epiphany and will heavily feature themes from Plato's "Allegory of the Cave".

Yudkowsky's twelve virtues form the basis of the game's twelve levels, each of which presents its virtue in metaphorical form.  The underlying message is that if you master all twelve virtues (by completing all twelve levels), you will achieve 'epiphany'.

The game is a 2D side-scrolling puzzle-platformer.  The player assumes the role of a figure representing his/her own conscious mind as it constructs machines (a la "The Incredible Machine") that are a metaphor for the thoughts and concepts one creates while meditating on a complex problem.

We will update our progress and share development information on our website here, as well as with posts on Less Wrong, our Twitter account, and the game's website.

You can expect frequent write-ups of this project's design decisions from the angle of game design theory.  We may also release a short documentary film about the development process after the game's release.

A release date has been set (and it's not too long from now), but I don't want to announce it just yet.

We have also posted concept art for our Curiosity metaphor (you can view more art at our website linked above).

If you're interested, just upvote and/or comment.  If you have any specific queries related to this project or about game design in general, it would be cool if you went here.

We will be sharing our progress as we make this game over the next few months.  So pay attention to Less Wrong and/or shinyogre.com for updates.

Thanks!

 

Discussion for Eliezer Yudkowsky's paper: Timeless Decision Theory

10 Alexei 06 January 2011 12:28AM

I have not seen any place to discuss Eliezer Yudkowsky's new paper, titled Timeless Decision Theory, so I decided to create a discussion post. (Have I missed an existing post or discussion?)