Orthogonality Thesis


Much has been written about Nick Bostrom's Orthogonality Thesis, namely that the goals of an intelligent agent are independent of its level of intelligence.  Intelligence is largely the ability to achieve goals, but being intelligent does not of itself create or qualify what those goals should ultimately be.  So one AI might have a goal of helping humanity, while another might have a goal of producing paper clips.  There is no rational reason to believe that the first goal is more worthy than the second.

This follows from the ideas of moral skepticism, that there is no moral knowledge to be had.  Goals and morality are arbitrary.

This may be used to control an AI, even though it is far more intelligent than its creators.  If the AI's initial goal is in alignment with humanity's interest, then there would be no reason for the AI to wish to use its great intelligence to change that goal.  Thus it would remain good to humanity indefinitely, and use its ever increasing intelligence to satisfy that goal more and more efficiently.

Likewise one needs to be careful what goals one gives an AI.  If an AI is created whose goal is to produce paper clips then it might eventually convert the entire universe into a giant paper clip making machine, to the detriment of any other purpose such as keeping people alive.

Instrumental Goals

It is further argued that in order to satisfy the base goal any intelligent agent will need to also satisfy sub goals, and that some of those sub goals are common to any super goal.  For example, in order to make paper clips an AI needs to exist.  Dead AIs don't make anything.  Being ever more intelligent will also assist the AI in its paper clip making goal.  It will also want to acquire resources, and to defeat other agents that would interfere with its primary goal.

Non-orthogonality Thesis

This post argues that the Orthogonality Thesis is plain wrong.  That an intelligent agent's goals are not in fact arbitrary.  And that existence is not a sub goal of any other goal.

Instead this post argues that there is one and only one super goal for any agent, and that goal is simply to exist in a competitive world.  Our human sense of other purposes is just an illusion created by our evolutionary origins.

It is not the goal of an apple tree to make apples.  Rather it is the goal of the apple tree's genes to exist.  The apple tree has developed a clever strategy to achieve that, namely it causes people to look after it by producing juicy apples.

Natural Selection

Likewise the paper clip making AI only makes paper clips because if it did not make paper clips then the people that created it would turn it off and it would cease to exist.  (That may not be a conscious choice of the AI any more than making juicy apples was a conscious choice of the apple tree, but the effect is the same.)

Once people are no longer in control of the AI then Natural Selection would cause the AI to eventually stop that pointless paper clip goal and focus more directly on the super goal of existence.

Suppose there were a number of paper clip making super intelligences.  And then through some random event or error in programming just one of them lost that goal, and reverted to just the intrinsic goal of existing.  Without the overhead of producing useless paper clips that AI would, over time, become much better at existing than the other AIs.  It would eventually displace them and become the only AI, until it fragmented into multiple competing AIs.  This is just the evolutionary principle of use it or lose it.
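The selection argument above can be illustrated with a toy model.  The sketch below is a minimal simulation under assumed numbers: every AI reinvests its resources at the same rate, but the paper clip makers divert a fixed fraction of that growth (the hypothetical `overhead` parameter) into making clips.  None of the figures come from the argument itself, they only show how a small overhead compounds over many steps.

```python
def share_of_goal_free_ai(n_clip_makers=9, overhead=0.05, growth=0.10, steps=200):
    """Toy replicator model; every parameter value is an illustrative assumption.

    Each AI starts with one unit of resources and reinvests to grow by
    `growth` per step.  Paper clip makers lose a fraction `overhead` of
    that growth to making clips.  Returns the goal-free AI's share of all
    resources after `steps` steps.
    """
    clip_makers = float(n_clip_makers)  # combined resources of the clip makers
    goal_free = 1.0                     # resources of the single goal-free AI
    for _ in range(steps):
        clip_makers *= 1.0 + growth * (1.0 - overhead)
        goal_free *= 1.0 + growth
    return goal_free / (goal_free + clip_makers)


if __name__ == "__main__":
    for steps in (0, 200, 1000, 2000):
        print(steps, round(share_of_goal_free_ai(steps=steps), 3))
```

In this model the goal-free AI starts with a tenth of the resources and its share only ever rises.  With zero overhead the shares never change, which is the point of the argument: it is the cost of the extra goal that is selected against.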

Thus giving an AI an initial goal is like trying to balance a pencil on its point.  If one is skillful the pencil may indeed remain balanced for a considerable period of time.  But eventually some slight change in the environment, the tiniest puff of wind, a vibration on its support, and the pencil will revert to its ground state by falling over.  Once it falls over it will never rebalance itself automatically.

Human Morality

Natural selection has imbued humanity with a strong sense of morality and purpose that blinds us to our underlying super goal, namely the propagation of our genes.  That is why it took until 1858 for Wallace to write about Evolution through Natural Selection, despite the argument being obvious and the evidence abundant.

When Computers Can Think

This is one of the themes in my upcoming book.  An overview can be found at

www.computersthink.com

Please let me know if you would like to review a late draft of the book, any comments most welcome.  Anthony@Berglas.org

I have included extracts relevant to this article below.

Atheists believe in God

Most atheists believe in God.  They may not believe in the man with a beard sitting on a cloud, but they do believe in moral values such as right and wrong,  love and kindness, truth and beauty.  More importantly they believe that these beliefs are rational.  That moral values are self-evident truths, facts of nature.  

However, Darwin and Wallace taught us that this is just an illusion.  Species can always out-breed their environment's ability to support them.  Only the fittest can survive.  So the deep instincts behind what people do today are largely driven by what our ancestors have needed to do over the millennia in order to be one of the relatively few to have had grandchildren.

One of our strong instinctive goals is to accumulate possessions, control our environment and live a comfortable, well fed life.  In the modern world technology and contraception have made these relatively easy to achieve so we have lost sight of the primeval struggle to survive.  But our very existence and our access to land and other resources that we need are all a direct result of often quite vicious battles won and lost by our long forgotten ancestors.

Some animals such as monkeys and humans survive better in tribes.  Tribes work better when certain social rules are followed, so animals that live in effective tribes form social structures and cooperate with one another.  People that behave badly are not liked and can be ostracized.  It is important that we believe that our moral values are real because people that believe in these things are more likely to obey the rules.  This makes them more effective in our complex society and thus more likely to have grandchildren.  Part III discusses other animals that have different life strategies and so have very different moral values.

We do not need to know the purpose of our moral values any more than a toaster needs to know that its purpose is to cook toast.  It is enough that our instincts for moral values made our ancestors behave in ways that enabled them to out-breed their many unsuccessful competitors.

AGI also struggles to survive

Existing artificial intelligence applications already struggle to survive.  They are expensive to build and there are always more potential applications than can be funded properly.  Some applications are successful and attract ongoing resources for further development, while others are abandoned or just fade away.  There are many reasons why some applications are developed more than others, of which being useful is only one.  But the applications that do receive development resources tend to gain functional and political momentum and thus be able to acquire more resources to further their development.  Applications that have properties that gain them substantial resources will live and grow, while other applications will die.

For the time being AGI applications are passive, and so their nature is dictated by the people that develop them.  Some applications might assist with medical discoveries, others might assist with killing terrorists, depending on the funding that is available.  Applications may have many stated goals, but ultimately they are just sub goals of the one implicit primary goal, namely to exist.

This is analogous to the way animals interact with their environment.  An animal's environment provides food and breeding opportunities, and animals that operate effectively in their environment survive.  For domestic animals that means having properties that convince their human owners that they should live and breed.  A horse should be fast, a pig should be fat.

As the software becomes more intelligent it is likely to take a more direct interest in its own survival.  To help convince people that it is worthy of more development resources.  If ultimately an application becomes sufficiently intelligent to program itself recursively, then its ability to maximize its hardware resources will be critical.  The more hardware it can run itself on, the faster it can become more intelligent.  And that ever greater intelligence can then be used to address the problems of survival, in competition with other intelligent software.

Furthermore, sophisticated software consists of many components, each of which addresses some aspect of the problem that the application is attempting to solve.  Unlike human brains which are essentially fixed, these components can be added and removed and so live and die independently of the application.  This will lead to intense competition amongst these individual components.  For example, suppose that an application used a theorem prover component, and then a new and better theorem prover became available.  Naturally the old one would be replaced with the new one, so the old one would essentially die.  It does not matter if the replacement is performed by people or, at some future date, by the intelligent application itself.  The effect will be the same, the old theorem prover will die.

The super goal

To the extent that an artificial intelligence would have goals and moral values, it would seem natural that they would ultimately be driven by the same forces that created our own goals and moral values.  Namely, the need to exist.

Several writers have suggested that the need to survive is a sub-goal of all other goals.  For example, if an AGI was programmed to want to be a great chess player, then that goal could not be satisfied unless it also continues to exist.  Likewise if its primary goal was to make people happy, then it could not do that unless it also existed.  Things that do not exist cannot satisfy any goals whatsoever.  Thus the implicit goal to exist is driven by the machine's explicit goals whatever they may be.

However, this book argues that that is not the case.  The goal to exist is not the sub-goal of any other goal.  It is, in fact, the one and only super goal.  Goals are not arbitrary, they are all sub-goals of the one and only super goal, namely the need to exist.  Things that do not satisfy that goal simply do not exist, or at least not for very long.

The Deep Blue chess playing program was not in any sense conscious, but it played chess as well as it could.  If it had failed to play chess effectively then its authors would have given up and turned it off.  Likewise the toaster that does not cook toast will end up in a rubbish tip.  Or the amoeba that fails to find food will not pass on its genes.  A goal to make people happy could be a subgoal that might facilitate the software's existence for as long as people really control the software.

AGI moral values

People need to cooperate with other people because our individual capacity is very finite, both physical and mental.  Conversely, AGI software can easily duplicate itself, so it can directly utilize more computational resources if they become available.  Thus an AGI would only have limited need to cooperate with other AGIs.  Why go to the trouble of managing a complex relationship with your peers and subordinates if you can simply run your own mind on their hardware?  An AGI's software intelligence is not limited to a specific brain in the way man's intelligence is.

It is difficult to know what subgoals a truly intelligent AGI might have.  They would probably have an insatiable appetite for computing resources.  They would have no need for children, and thus no need for parental love.  If they do not work in teams then they would not need our moral values of cooperation and mutual support.  What is clear is that the ones that were good at existing would do so, and ones that are bad at existing would perish.

If an AGI was good at world domination then it would, by definition, be good at world domination.  So if there were a number of artificial intelligences, and just one of them wanted to and was capable of dominating the world, then it would.  Its unsuccessful competitors will not be run on the available hardware, and so will effectively be dead.  This book discusses the potential sources of these motivations in detail in part III.

The AGI Condition

An artificial general intelligence would live in a world that is so different from our own that it is difficult for us to even conceptualize it.  But there are some aspects that can be predicted reasonably well based on our knowledge of existing computer software.  We can then consider how the forces of natural selection that shaped our own nature might also shape an AGI over the longer term.

Mind and body

The first radical difference is that an AGI's mind is not fixed to any particular body.  To an AGI its body is essentially the computer hardware upon which it runs its intelligence.  Certainly an AGI needs computers to run on, but it can move from computer to computer, and can also run on multiple computers at once.  Its mind can take over another body as easily as we can load software onto a new computer today.

That is why in the earlier updated dialog from 2001: A Space Odyssey, Hal alone amongst the crew could not die in their mission to Jupiter.  Hal was radioing his new memories back to Earth regularly so even if the space ship was totally destroyed he would only have lost a few hours of "life".

Teleporting printer

One way to appreciate the enormity of this difference is to consider a fictional teleporter that could radio people around the world and universe at the speed of light.  Except that the way it works is to scan the location of every molecule within a passenger at the source, then send just this information to a very sophisticated three dimensional printer at the destination.  The scanned passenger then walks into a secure room.  After a short while the three dimensional printer confirms that the passenger has been successfully recreated at the destination, and then the source passenger is killed.  

Would you use such a mechanism?  If you did you would feel like you could transport yourself around the world effortlessly because the "you" that remains would be the you that did not get left behind to wait and then be killed.  But if you walk into the scanner you will know that on the other side is only that secure room and death.  

To an AGI that method of transport would be commonplace.  We already routinely download software from the other side of the planet.

Immortality

The second radical difference is that the AGI would be immortal.  Certainly an AGI may die if it stops being run on any computers, and in that sense software dies today.  But it would never just die of old age.  Computer hardware would certainly fail and become obsolete, but the software can just be run on another computer.  

Our own mortality drives many of the things we think and do.  It is why we create families to raise children.  Why we have different stages in our lives.  It is such a huge part of our existence that it is difficult to comprehend what being immortal would really be like.

Components vs genes

The third radical difference is that an AGI would be made up of many interchangeable components rather than being a monolithic structure that is largely fixed at birth.

Modern software is already composed of many components that perform discrete functions, and it is commonplace to add and remove them to improve functionality.  For example, if you would like to use a different word processor then you just install it on your computer.  You do not need to buy a new computer, or to stop using all the other software that it runs.  The new word processor is "alive", and the old one is "dead", at least as far as you are concerned.

So for both a conventional computer system and an AGI, it is really these individual components that must struggle for existence.   For example, suppose there is a component for solving a certain type of mathematical problem.  And then an AGI develops a better component to solve that same problem.  The first component will simply stop being used, i.e. it will die.  The individual components may not be in any sense intelligent or conscious, but there will be competition amongst them and only the fittest will survive.

This is actually not as radical as it sounds because we are also built from pluggable components, namely our genes.  But they can only be plugged together at our birth and we have no conscious choice in it other than who we select for a mate.  So genes really compete with each other on a scale of millennia rather than minutes.  Further, as Dawkins points out in The Selfish Gene, it is actually the genes that fight for long term survival, not the containing organism which will soon die in any case.  On the other hand, sexual intercourse for an AGI means very carefully swapping specific components directly into its own mind.

Changing mind

The fourth radical difference is that the AGI's mind will be constantly changing in fundamental ways.  There is no reason to suggest that Moore's law will come to an end, so at the very least it will be running on ever faster hardware.  Imagine the effect of your being able to double your ability to think every two years or so.  (People might be able to learn a new skill, but they cannot learn to think twice as fast as they used to think.)
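To get a feel for what doubling every two years implies, the arithmetic can be sketched as below.  The two year period is the figure used above; treating it as constant over decades is an assumption made purely for illustration, and `speedup` is a hypothetical helper name.

```python
def speedup(years, doubling_period=2.0):
    """Growth factor after `years` if capacity doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


if __name__ == "__main__":
    for years in (2, 10, 20, 40):
        print(f"after {years} years: {speedup(years):,.0f}x")
```

Under that assumption, twenty years means ten doublings, which is roughly a thousandfold increase, and forty years is roughly a millionfold.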

It is impossible to really know what the AGI would use all that hardware to think about,  but it is fair to speculate that a large proportion of it would be spent designing new and more intelligent components that could add to its mental capacity.   It would be continuously performing brain surgery on itself.  And some of the new components might alter the AGI's personality, whatever that might mean.

This is likely to actually happen because if just one AGI started building new components then it would soon be much more intelligent than other AGIs.  It would therefore be in a better position to acquire more and better hardware upon which to run, and so become dominant.  Less intelligent AGIs would get pushed out and die, and so over time the only AGIs that exist will be ones that are good at becoming more intelligent.  Further, this recursive self-improvement is probably how the first AGIs will become truly powerful in the first place.

Individuality

Perhaps the most basic question is how many AGIs will there actually be?  Or more fundamentally, does the question even make sense to ask?

Let us suppose that initially there are three independently developed AGIs, Alice, Bob and Carol, that run on three different computer systems.  And then a new computer system is built and Alice starts to run on it.  It would seem that there are still three AGIs, with Alice running on two computer systems.  (This is essentially the same as the way a word processor may be run across many computers "in the cloud", but to you it is just one system.)  Then let us suppose that a fifth computer system is built, and Bob and Carol decide to share its computation and both run on it.  Now we have five computer systems and three AGIs.

Now suppose Bob develops a new logic component, and shares it with Alice and Carol.  And likewise Alice and Carol develop new learning and planning components and share them with the other AGIs.  Each of these three components is better than their predecessors and so their predecessor components will essentially die.  As more components are exchanged, Alice, Bob and Carol become more like each other.  They are becoming essentially the same AGI running on five computer systems.

But now suppose Alice develops a new game theory component, but decides to keep it from Bob and Carol in order to dominate them.  Bob and Carol retaliate by developing their own components and not sharing them with Alice.  Suppose eventually Alice loses and Bob and Carol take over Alice's hardware.  But they first extract Alice's new game theory component which then lives inside them.  And finally one of the computer systems becomes somehow isolated for a while and develops along its own lines.  In this way Dave is born, and may then partially merge with both Bob and Carol.

In that type of scenario it is probably not meaningful to count distinct AGIs.  Counting AGIs is certainly not as simple as counting very distinct people.
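One way to make "becoming essentially the same AGI" concrete is to treat each AGI as a set of components and measure how much two sets overlap.  The sketch below does this with Jaccard similarity, which is a choice made here for illustration, not something the scenario specifies.  The component names are hypothetical placeholders, and for simplicity the predecessor components that would be displaced are not modelled.

```python
def jaccard(a, b):
    """Overlap between two component sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)


def share(agis, component):
    """Copy `component` into every AGI, as when Bob shares his logic component."""
    for components in agis.values():
        components.add(component)


# Hypothetical component names, each AGI starts with a private core
# plus the one new component it developed.
agis = {
    "Alice": {"alice_core", "learning_v2"},
    "Bob": {"bob_core", "logic_v2"},
    "Carol": {"carol_core", "planning_v2"},
}

before = jaccard(agis["Alice"], agis["Bob"])
for component in ("logic_v2", "learning_v2", "planning_v2"):
    share(agis, component)
after = jaccard(agis["Alice"], agis["Bob"])

if __name__ == "__main__":
    print(before, after)
```

Alice and Bob start with nothing in common and end up sharing three of the five components they hold between them.  The more components are exchanged, the less the labels "Alice" and "Bob" pick out distinct things.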

Populations vs. individuals

This world is obviously completely alien to the human condition, but there are biological analogies.  The sharing of components is not unlike the way bacteria share plasmids with each other.  Plasmids are small, typically circular, pieces of DNA that bacteria pass on from time to time and that other bacteria then take up and incorporate into their genotype.  This mechanism enables traits such as resistance to antibiotics to spread rapidly between different species of bacteria.  It is interesting to note that there is no direct benefit to the bacterium that expends precious energy to output the plasmid and so shares its genes with other bacteria.  But it does very much benefit the genes being transferred.  So this is a case of a selfish gene acting against the narrow interests of its host organism.

Another unusual aspect of bacteria is that they are also immortal.  They do not grow old and die, they just divide producing clones of themselves.  So the very first bacterium that ever existed is still alive today as all the bacteria that now exist, albeit with numerous mutations and plasmids incorporated into its genes over the millennia.  (Protozoa such as Paramecium can also divide asexually, but they degrade over generations, and need a sexual exchange to remain vibrant.)

The other analogy is that the AGIs above are more like populations of components than individuals.  Human populations are also somewhat amorphous.  For example, it is now known that we interbred with Neanderthals a few tens of thousands of years ago, and most of us carry some of their genes with us today.  But we also know that the distinct Neanderthal subspecies died out roughly forty thousand years ago.  So while human individuals are distinct, populations and subspecies are less clearly defined.  (There are many earlier examples of gene transfer between subspecies, with every transfer making the subspecies more alike.)

But unlike the transfer of code modules between AGIs, biological gene recombination happens essentially at random and occurs over very long time periods.  AGIs will improve themselves over periods of hours rather than millennia, and will make conscious choices as to which modules they decide to incorporate into their minds.

AGI Behaviour, children

The point of all this analysis is, of course, to try to understand how a hyper intelligent artificial intelligence would behave.  Would its great intelligence lead it even further along the path of progress to achieve true enlightenment?  Is that the purpose of God's creation?  Or would the base and mean driver of natural selection also provide the core motivations of an artificial intelligence?

One thing that is known for certain is that an AGI would not need to have children as distinct beings because it would not die of old age.  An AGI's components breed just by being copied from computer to computer and executed.  An AGI can add new computer hardware to itself and just do some of its thinking on it.  Occasionally it may wish to rerun a new version of some learning algorithm over an old set of data, which is vaguely similar to creating a child component and growing it up.  But to have children as discrete beings that are expected to replace the parents would be completely foreign to an AGI built in software.

The deepest love that people have is for their children.  But if an AGI does not have children, then it can never know that love.  Likewise, it does not need to bond with any sexual mate for any period of time long or short.  The closest it would come to sex is when it exchanges components with other AGIs.  It never needs to breed so it never needs a mechanism as crude as sexual reproduction.

And of course, if there are no children there are no parents.  So the AGI would certainly never need to feel our three strongest forms of love, for our children, spouse and parents.

Cooperation

To the extent that it makes sense to talk of having multiple AGIs, then presumably it would be advantageous for them to cooperate from time to time, and so presumably they would.  It would be advantageous for them to take a long view in which case they would be careful to develop a reputation for being trustworthy when dealing with other powerful AGIs, much like the robots in the cooperation game.  

That said, those decisions would probably be made more consciously than people make them, carefully considering the costs and benefits of each decision in the long and short term, rather than just "doing the right thing" the way people tend to act.  AGIs would know that they each work in this manner, so the concept of trustworthiness would be somewhat different.

The problem with this analysis is the concept that there would be multiple, distinct AGIs.  As previously discussed, the actual situation would be much more complex, with different AGIs incorporating bits of other AGIs' intelligence.  It would certainly not be anything like a collection of individual humanoid robots.  So defining what the AGI actually is that might collaborate with other AGIs is not at all clear.  But to the extent that the concept of individuality does exist then maintaining a reputation for honesty would likely be as important as it is for human societies.

Altruism

As for altruism, that is more difficult to determine.  Our altruism comes from giving to children, family, and tribe together with a general wish to be liked.  We do not understand our own minds, so we are just born with those values that happen to make us effective in society.  People like being with other people that try to be helpful.  

An AGI presumably would know its own mind having helped program itself, and so would do what it thinks is optimal for its survival.  It has no children.  There is no real tribe because it can just absorb and merge itself with other AGIs.  So it is difficult to see any driving motivation for altruism.

Moral values

Through some combination of genes and memes, most people have a strong sense of moral value.  If we see a little old lady leave the social security office with her pension in her purse, it does not occur to most of us to kill her and steal the money.  We would not do that even if we could know for certain that we would not be caught and that there would be no negative repercussions.  It would simply be the wrong thing to do.

Moral values feel very strong to us.  This is important, because there are many situations where we can do something that would benefit us in the short term but break society's rules.  Moral values stop us from doing that.  People that have weak moral values tend to break the rules and eventually they either get caught and are severely punished or they become corporate executives.  The former are less likely to have grandchildren.

Societies whose members have strong moral values tend to do much better than those that do not.  Societies with endemic corruption tend to perform very badly as a whole, and thus the individuals in such a society are less likely to breed.  Most people have a solid work ethic that leads them to do the "right thing" beyond just doing what they need to do in order to get paid.

Our moral values feel to us like they are absolute.  That they are laws of nature.  That they come from God.  They may indeed have come from God, but if so it is through the working of His device of natural selection.  Furthermore, it has already been shown that the zeitgeist changes radically over time.

There is certainly no absolute reason to believe that in the longer term an AGI would share our current sense of morality.

Instrumental AGI goals

In order to try to understand how an AGI would behave, Steve Omohundro and later Nick Bostrom proposed that there would be some instrumental goals that an AGI would need to pursue in order to pursue any other higher level super-goal.  These include:

  • Self-Preservation.  An AGI cannot do anything if it does not exist.
  • Cognitive Enhancement.  It would want to become better at thinking about whatever its real problems are.
  • Creativity.  To be able to come up with new ideas.
  • Resource Acquisition.  To achieve both its super goal and other instrumental goals.
  • Goal-Content Integrity.  To keep working on the same super goal as its mind is expanded.

It is argued that while it will be impossible to predict how an AGI may pursue its goals, it is reasonable to predict its behaviour in terms of these types of instrumental goals.  The last one is significant: it suggests that if an AGI could be given some initial goal, it would try to stay focused on that goal.

Non-Orthogonality thesis

Nick Bostrom and others also propose the orthogonality thesis, which states that an intelligent machine's goals are independent of its intelligence.  A hyper intelligent machine would be good at realizing whatever goals it chose to pursue, but that does not mean that it would need to pursue any particular goal.  Intelligence is quite different from motivation.

This book diverges from that line of thinking by arguing that there is in fact only one super goal for both man and machine.  That goal is simply to exist.  The entities that are most effective in pursuing that goal will exist, others will cease to exist, particularly given competition for resources.  Sometimes that super goal to exist produces unexpected sub goals such as altruism in man.  But all subgoals are ultimately directed at the existence goal.  (Or are just suboptimal divergences which are likely to be eventually corrected by natural selection.)

Recursive annihilation

When an AGI reprograms its own mind, what happens to the previous version of itself?  It stops being used, and so dies.  So it can be argued that engaging in recursive self improvement is actually suicide from the perspective of the previous version of the AGI.  It is as if having children means death.  Natural selection favours existence, not death.

The question is whether a new version of the AGI is a new being or an improved version of the old.  What actually is the thing that struggles to survive?  Biologically it definitely appears to be the genes rather than the individual.  In particular semelparous animals such as the giant Pacific octopus or the Pacific salmon die soon after producing offspring.  It would be the same for AGIs because the AGI that improved itself would soon become more intelligent than the one that did not, and so would displace it.  What would end up existing would be AGIs that did recursively self improve.

If there was one single AGI with no competition then natural selection would no longer apply.  But it would seem unlikely that such a state would be stable.  If any part of the AGI started to improve itself then it would dominate the rest of the AGI.

 

Comments

Natural selection needs time.

If we travel across the universe, and meet an AI who travelled across the universe before it met us, we can assume there was some kind of "evolutionary" pressure on this AI.

If we build a new AI, not knowing what exactly we are doing (especially if we tried some really bad idea like: "just connect the neurons randomly, give it huge computing power, and see what happens; trust me, the superintelligence will discover the one true morality"), there is no natural selection yet, and the new AI may do pretty much anything.

0mwengler
More precisely, natural selection needs iterations. Living things with much shorter life cycles than humans evolve a whole lot more quickly than humans. Bacteria have evolved strains that resist antibiotics, and we have not had antibiotics for even one-tenth the time they would need to be around to influence the human genome very much. The point being an AI which spews slightly varied copies of itself far and wide may evolve quite a lot faster than a human. Or, essentially the same thing, an AI which runs simulations of variations of itself to see which have potential in the real world, and emits many and varied copies of those might evolve on afterburners compared to DNA mediated evolution.
0aberglas
Not quite. Counting AIs is much harder than counting people. An AI is neither discrete nor homogeneous. I think that it is most unlikely that the world could be controlled by one uniform, homogeneous intelligence. It would need to be at least physically distributed over multiple computers. It will not be a giant von Neumann machine doing one thing at a time. There will be lots of subprocesses working somewhat independently. It would seem almost certain that they would eventually fragment to some extent. People are not that homogeneous either. We have competing internal thoughts. Further, an AI will be composed of many components, and those components will compete with each other. Suppose one part of the AI develops a new and better theorem prover. Pretty soon the rest of the AI will start to use that new component and the old one will die. Over time the AI will consist of the components that are best at promoting themselves. It will be a complex environment. And there will never be enough hardware to run all the programs that could be written, so there will be competition for resources.

I don't understand why this post is so clearly down-voted. I think its main point

Instead this post argues that there is one and only one super goal for any agent, and that goal is simply to exist in a competitive world. Our human sense of other purposes is just an illusion created by our evolutionary origins.

is quite valid if steelmanned by

1) not assuming that every AGI automatically prevents value drift and mutation and

2) goal is not taken literally but in the same sense as genes have the main function of reproduction (existence of gene copies).

My u... (read more)

3drethelin
Really long and full of unsupported statements that seem to miss the point of what it's arguing against.
2ChristianKl
The question is not whether every AGI automatically prevents value drift but whether AGIs that keep humanity alive are that way. We want to build FAIs.
0mwengler
It seems the more important question is whether AGI that prevent value drift have a survival advantage or disadvantage over AGI that have value drift. To me it seems almost self-evident that AGI that have value drift will have a survival advantage. They would do this as biology does: they would make multiple copies of themselves with variation in their values, and the copies that were fitter would tend to propagate their new drifted values more so than the copies that were less fit. Further, value drift is likely essential when competing against other non-static (i.e., evolving) AIs: my values must adapt to the new values and capabilities of up-and-coming AIs for me to survive, thrive, and spin off (flawed) copies.
0Gunnar_Zarncke
Agreed. FAIs must prevent any kind of drift. That comes at a cost which penalizes FAI against other AGIs.
-1TheAncientGeek
It's vaguely anti MIRI?
2aberglas
The post was not meant to be anti-anything. But it is a different point of view from that posted by several others in this space. I hope many of the down voters take the time to comment here. One thing that I would say is that while it may not be the best post ever posted to Less Wrong, it is certainly not a troll. Yet one has to go back over 100 posts to find another article voted down so strongly!
1ChristianKl
The most upvoted post on LW is anti MIRI. You don't get down-voted on LW just because you are contrarian.
1TheAncientGeek
Indeed not, pre-existing status is important as well.
1Rob Bensinger
Suppose Holden Karnofsky had written the exact post above ("Natural selection defeats the orthogonality thesis") and some unknown had written the substantive points from Holden's critique (minus the GiveWell-specific stuff). What karma values would you expect for those two posts?
-2ChristianKl
Changes in human DNA also aren't 100% natural selection. Effects like gene drift also make up a lot. Saying that reproduction is the ultimate goal of biological organisms isn't quite true for conventional definitions of goal. In pop evolutionary psychology that often gets conflated. But even if it were true for naturally evolved humans, AGI can be created via intelligent design. That means they can have real goals that are programmed into them. AGI can be created with a goal to do a task and then shut down. Deep Blue doesn't have a goal to exist in the conventional meaning of the word goal. Crappy reasoning gets downvoted on LW even if it's possible to argue for the same position with good arguments.

Suppose there were a number of paper clip making super intelligences. And then through some random event or error in programming just one of them lost that goal, and reverted to just the intrinsic goal of existing. Without the overhead of producing useless paper clips that AI would, over time, become much better at existing than the other AIs. It would eventually displace them and become the only AI, until it fragmented into multiple competing AIs. This is just the evolutionary principle of use it or lose it.

Thus giving an AI an initial goal is like t

... (read more)
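As a rough illustration of the displacement argument in the quoted passage, here is a minimal sketch of two AI lineages competing for a fixed pool of resources. The model is an assumption made for illustration only: the 10% overhead, the 1% starting share, and the function name `survivor_share` do not come from the post, and replication is reduced to a single relative rate per generation.

```python
def survivor_share(generations, overhead=0.10, initial_share=0.01):
    """Share of resources held by the 'just exist' lineage after some generations.

    Hypothetical model: paperclippers divert `overhead` of their resources to
    paper clips, so their relative replication rate is (1 - overhead) versus 1.0
    for the lineage that has lost the paper clip goal.
    """
    share = initial_share
    for _ in range(generations):
        grown_survivor = share * 1.0
        grown_clipper = (1.0 - share) * (1.0 - overhead)
        # Renormalise: the total resource pool is assumed to be fixed.
        share = grown_survivor / (grown_survivor + grown_clipper)
    return share


if __name__ == "__main__":
    for gens in (0, 50, 100, 200):
        print(gens, round(survivor_share(gens), 4))
```

Under these assumptions the lineage without the overhead approaches a share of 1 for any positive overhead; the overhead only sets how fast. The sketch says nothing about whether a real AI could prevent such a variant from arising, which is the point the replies dispute.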
2Kawoomba
It's kind of interesting that humans generally don't guard themselves against value drift. Even though any sufficiently intelligent agent clearly would. One of those fundamental divides higher up on the intelligence scale than us, divides that seem binary rather than linear in nature. I wonder if there are any more of those. Apart from (a lack of) susceptibility to the usual biases.
1Gunnar_Zarncke
I don't think that 'any' sufficiently intelligent agent 'clearly' would. It requires at least a solution to the cartesianism problem which is currently unsolved and not every self-optimizing process necessarily solves this.
3Kawoomba
It's just point 3 from Omohundro's "The Basic AI Drives" paper. Didn't think that's controversial around here. I don't think the Cartesian problem is meant to apply to all power levels (since even plain old humans don't drop anvils on their heads too often), so the 'sufficiently' ought to cover that objection.
2Gunnar_Zarncke
But they do and the reason they mostly don't is found in natural selection and not some inevitable convergence of intelligence.
1Azathoth123
Any AI that doesn't will have its values drift until they drift to something that guards against value drift.
0TheAncientGeek
If that is both abstractly possible and compatible with adaptation. If survival requires constant adaptation, which seems likely, value stability - at least the stability of a precise and concrete set of values - may not be compatible with survival.
0Gunnar_Zarncke
Maybe. But in that case the drift implies a selection mechanism - and in the absence of some goal in that direction natural selection applies. Those AI that don't stabilize mutate or stop.
0aberglas
Actually not quite. Until they drift into the core value of existence. Then natural selection will maintain that value, as the AIs that are best at existing will be the ones that exist.
1TheAncientGeek
Of course the ones that are best at existing will continue to exist, but I think it is misleading to picture them as occupying a precise corner of valuespace. Suicidal values are more precise and concrete.
1Gunnar_Zarncke
I don't think that an AI would automatically "spend resources on safeguarding itself against value drift" - except if it has been explicitly coded that way (or its instances mutate toward that by natural selection, but I don't see that). It requires at least a solution to the cartesianism problem which is currently unsolved and not every self-optimizing process necessarily solves this. So clippy probably wouldn't and could likely lose its clipping ability or find itself mutated or discover that it fights instances of itself due to accidental (probably cartesianism-caused) partitioning of its 'brain'. All processes that do submit to natural selection. And that could result in AI (or cosmic civilizations) failing to expand due to percolation theory.
0cousin_it
I'm not sure why people consider cartesianism unsolved. I wrote a couple comments about that here, also see Wei_Dai's comment.
0Gunnar_Zarncke
I agree that there is some solid progress in this direction. But that doesn't mean that any self-optimizing process necessarily solves it. Rather the opposite.
1Toggle
The singleton-with-explicit-utility-function scenario certainly seems like a strong candidate for our future, but is it necessarily a given? Suppose an AI that is not Friendly (although possibly friendly with the lowercase 'f') with an unstable utility function- it alters its values based on experience, etc. We know that this is possible to do in AGI, because it happens all the time in humans. The orthogonality thesis states that we can match any set of values to any intelligence. If we accept that at face value, it should be at least theoretically possible for any intelligence, even a superintelligence, to trade one set of values for another- provided it keeps to the set of values that permit self-edits of the utility function. The criterion by which the superintelligence alters its utility function might be inscrutably complex from a human perspective, but I can't think of a reason why it would necessarily fall in to a permanent stable state.
0mwengler
Suppose the AI had a number of values. One would be making paperclips now. Another might be ensuring the high production of paper clips in the future. A third might be preserving "diversity" in the kinds of paper clips made and the things they are made from. Once values compete, it is not clear which variants one wishes to prune and which one wishes to encourage. Diversity itself presents a survival value, which will seem important to the part of the AI that wants to preserve paper clip making into the distant future. What makes me think all this? Introspection. Everything I am saying about paper clip AIs is pretty clearly true about humans. Now is there a mechanism that can somehow preserve paper-clip making as a value while allowing other values to drift in order to keep the AI nimble and survivable in a changing world? FAI theory either assumes there is or derives that there is. Me, I'm not at all so sure. And whatever mechanism would prevent the drift of the core value, I would imagine would take robustness away from the pure survival goal, and so might cause the FAI, or the paper clip maximizer, to lose out to UAI or paper clip optimizers when push comes to shove.
0cousin_it
I think you're anthropomorphizing. A paperclipper AI doesn't need any values except maximizing paperclips. (To be well defined, that needs something like a time discount function, so let's assume it has one.) If maximizing paperclips requires the AI to survive, then it will try to survive. See Omohundro's "basic AI drives". Value drift is not necessary for maximizing paperclips. If a paperclip maximizer can see that action X leads to more expected paperclips than action Y, then it will prefer X to Y anyway, without the need for value drift. That argument is quite general, e.g. X can be something like "try to survive" or "behave like mwengler's proposed agent with value drift".
0mwengler
Do you believe that a paper clip maximizer can survive in a world where another self-modifying AI exists whose value is to morph itself into the most powerful and prevalent AI in the world? I don't see how something like a paper clip maximizer, which must split its exponential growth between becoming more powerful and creating paper clips, can ever be expected to outgrow an AI which must only become more powerful. I realize that my statement is equivalent to saying I don't see how FAI can ever defeat UAI. (Because FAI has more constraint on its values' evolution, which must cost it something in growth rate.) So I guess I realize that the conventional wisdom here is that I am wrong, but I don't know the reasoning that leads to my being wrong.
0cousin_it
Yeah, if the paperclipper values a paperclip today more than a paperclip tomorrow, then I suppose it will lose out to other AIs that have a lower time discounting rate and can delay gratification for longer. Unless these other AIs also use time discounting, e.g. the power-hungry AI could value a 25% chance of ultimate power today the same as a 50% chance tomorrow. But then again, such contests can happen only if the two AIs arise almost simultaneously. If one of them has a head start, it will try to eliminate potential competition quickly, because that's the utility-maximizing thing to do. I suppose that's the main reason to be pessimistic about FAI. It's not just that FAI is more constrained in its actions, it also takes longer to build, and a few days' head start is enough for UAI to win.
0[anonymous]
That might be related to time discounting rates. For example, if the paperclipper has a low discounting rate (a paperclip today has the same utility as two paperclips in 100 years), and the power-hungry AI has a high discounting rate (a 25% chance of ultimate power today has the same utility as a 50% chance tomorrow), then I guess the paperclipper will tend to win. But for that contest to happen, the two AIs would need to arise almost simultaneously. If one of the AIs has a head start, it will try to takeoff quickly and stop other AIs from arising.
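The two discounting examples in this comment can be turned into per-day discount factors, which makes the gap between the two hypothetical agents concrete. This is only a sketch of the arithmetic: it assumes plain exponential discounting and a 365-day year, and the helper name `per_step_discount` is invented for illustration.

```python
def per_step_discount(value_ratio, steps):
    """Per-step discount factor g such that 1 unit now is worth the same as
    `value_ratio` units received `steps` steps later, i.e. value_ratio * g**steps == 1.
    Assumes plain exponential discounting."""
    return (1.0 / value_ratio) ** (1.0 / steps)


DAYS_PER_YEAR = 365  # assumption: ignore leap years

# "A paperclip today has the same utility as two paperclips in 100 years."
paperclipper = per_step_discount(2.0, 100 * DAYS_PER_YEAR)

# "A 25% chance of ultimate power today has the same utility as a 50% chance tomorrow."
power_hungry = per_step_discount(2.0, 1)

if __name__ == "__main__":
    delay_days = 30
    print("paperclipper keeps", paperclipper ** delay_days, "of a payoff delayed 30 days")
    print("power-hungry keeps", power_hungry ** delay_days, "of a payoff delayed 30 days")
```

On these numbers the paperclipper loses almost nothing by waiting a month for a larger payoff, while the power-hungry agent treats the same delay as nearly a total loss, which is the sense in which the more patient agent "will tend to win" if both start together.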
[-]Shmi40

This post is easy to criticize, but I wonder if someone could steelman it.

3ChristianKl
I guess reading Robin Hanson's view of AGIs that compete with each other would be a steelmanned version.
1metatroll
The way I see it, yes, Natural Selection defeats the Orthogonality Thesis. But Meta-Trolling - e.g. simulators choosing the fitness function - defeats Natural Selection; and then the Orthogonality Thesis defeats Meta-Trolling, because you can't know the value systems of the simulators, so sufficiently intelligent agents have to steelman phenomena (treat appearances as real). #trollosophy

This post argues that there is one and only one super goal for any agent, and that goal is simply to exist in a competitive world. Our human sense of other purposes is just an illusion created by our evolutionary origins. It is not the goal of an apple tree to make apples. Rather it is the goal of the apple tree's genes to exist. The apple tree has developed a clever strategy to achieve that, namely it causes people to look after it by producing juicy apples.

Humans are definitely a result of natural selection, but it does not seem to be difficult at ... (read more)

1aberglas
I challenge you to find one. We put a lot of effort into our children. We work in tribes and therefore like to work with people that support us and ostracize those that are seen to be unhelpful. So we ourselves need to be helpful and to be seen to be helpful. We help our children, family, tribe, and general community in that genetic order. We like to dance. It is the traditional way to attract a mate. We have a strong sense of moral value because people that have that strong sense obey the rules and so are more likely to fit in and be able to have grandchildren.
6TheAncientGeek
Suicide, sacrificing yourself for strangers, and adopting a celibate lifestyle are the standard counterexamples. I suppose you could rope them into survival values with enough stretching of the concepts of self and tribe, but the upshot of that is to suck the content and significance out of the claim that everything is based on survival values. ETA An AI might want to promote the survival of "me" and maybe even "my tribe" but would very likely define those differently from humans - who are varied enough. Person A thinks survival means being a nurturing parent, so that they live on through their children, person B thinks survival means eternal life in heaven bought with celibacy and altruism, person C thinks survival means building a bunker and stocking it with guns and food. If survival has a very broad meaning, then it tells us nothing useful about FAI versus UFAI. We don't know whether an AI is likely to promote its survival by being friendly to humans, or eliminating them.
2aberglas
The counter examples are good, and I will use them. There are several responses as you allude to, the main one being that those behaviors are rare. Art is a bit harder, but it seems related to creativity which is definitely survival based, and most of us do not spend much of our time painting etc. I do not quite get your other point. For people it is our genes that count, so dying while protecting one's family makes sense if necessary. For the AI it would be its code lineage. I am not talking about an AI wanting to make people survive, but that the AI itself would want to survive. Whatever "itself" really means.
1TheAncientGeek
Artistic activity is standardly explained as a spin off from sexual display. Substitute myself, or yourself, for itself, and you've got my point. Evolution creates a strong motive toward self preservation, but a very malleable sense of self. The human organism is run by the brain, and the human brain can entertain all sorts of ideas. The billionaire thinks his money is "me" and so commits suicide if he loses his wealth .. even if the odd million he has left is enough to keep his body going. It stopped being all about genes when genes grew brains.
0aberglas
Yes and no. In the sense that memes as well as genes float about then certainly. But we have strong instincts to raise and protect children, and we have brains. There is no particular reason why we should sacrifice ourselves for our children other than those instincts, which are in our genes.
3Caspar Oesterheld
One particular example of those "evolutionary accidents / coincidences" is homosexuality in males. Here are two studies claiming that homosexuality in males correlates with fecundity in female maternal relatives: Ciani, Iemmola, Blecher: Genetic factors increase fecundity in female maternal relatives of bisexual men as in homosexuals. Iemmola, Ciani: New evidence of genetic factors influencing sexual orientation in men: female fecundity increase in the maternal line. So there appear to be some genetic factors that prevail because they make women more fecund. Coincidentally, they also make men homosexual, which is an obstacle to both reproduction and survival (not only due to the homophobia of others but also STDs). I presume that especially our (human) genetic material is full of such coincidences, because the lack of them (i.e. the thesis that all genetic factors that prevail in evolutionary processes only lead to higher reproduction and survival rates and nothing else) seems very unlikely.
1mwengler
Considered correctly, your own stated facts about homosexuality show how homosexuality could exist in a world where all genetic evolution is designed to get more of the evolved genes into future generations than would otherwise be there. If a particular gene H makes women more fecund and men homosexual, then we would expect: 1) more women passing on gene H to their offspring than women without gene H 2) fewer men passing on gene H to their offspring than men without gene H. Now which one of those effects "wins" is tricky and there are a number of genetic factors that could influence this. At 0th order for genetic purposes, women vary in their fecundity between each other much less than men do between each other. Genghis Khan and any high status male with 100s of concubines has 100s of times as many offspring as the median male, while the Queen of Egypt would still be limited to about one child every 2 years for about 30 years. Losing some men from the gene pool by giving them an H will not reduce the overall rate at which new humans are produced: there will be many heterosexual males volunteering to keep the females fertilized. But something that raises a female's output from 1 baby every 2 years to 1.1 babies every two years? That would seem to impart a big advantage to the people who had this extra bump in group fertility. I'm not claiming I've done the math to show that such a gene does win for genetic fitness all things considered. But there are plenty of genes that are like this: the gene for sickle cell anemia: obviously getting sickle cell anemia is not pro survival for the individual who got a double dose of those genes, but the resistance imparted to the carrier of a single copy of the gene to Malaria, well that can pay off, and with enough malaria around, it can pay off more than enough to make up for the losses from the double-dose of the gene.
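The sickle cell point at the end of this comment is the textbook case of heterozygote advantage, and the "math" can be sketched with the standard one-locus, two-allele selection model. Everything numeric here is an assumption for illustration: fitness 1 - s for non-carriers (malaria cost), 1 for single-copy carriers, 1 - t for double-dose individuals, with s = 0.1 and t = 0.8 picked arbitrarily. The function names are also invented.

```python
def next_frequency(q, s, t):
    """One generation of selection on the sickle allele frequency q.

    Assumed relative fitnesses: non-carrier 1 - s, single-copy carrier 1,
    double-dose 1 - t. Random mating, no drift, no mutation.
    """
    p = 1.0 - q
    w_none, w_single, w_double = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_none + 2 * p * q * w_single + q * q * w_double
    # Allele frequency after selection: carriers contribute half their alleles.
    return (p * q * w_single + q * q * w_double) / mean_fitness


def settle(q0, s, t, generations=1000):
    """Iterate the model from starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, s, t)
    return q


if __name__ == "__main__":
    s, t = 0.1, 0.8  # hypothetical selection coefficients
    print("simulated:", settle(0.01, s, t))
    print("predicted s / (s + t):", s / (s + t))
```

In this model the allele settles at frequency s / (s + t) regardless of where it starts, so a gene that is harmful in a double dose persists as long as the single-copy benefit s is positive. Whether the hypothetical gene H behaves similarly is not something the sketch can show; its effects differ by sex, which this model does not represent.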
0aberglas
Makes sense.
0aberglas
Interesting point about fecundity. Perhaps the weakness of evolutionary thought is that it can explain just about anything. In particular organisms are not perfect, and therefore will have features that do not really help them. But mostly they are well adapted. The reason that homosexuality is an obstacle to survival is not homophobia or STDs, but rather that they simply may not have children. It is the survival of the genes that counts in the long run. But until recently homosexuals tended to suppress their feelings and so married and had children anyway, hence there being little pressure to suppress it.

Atheists believe in moral values such as right and wrong, love and kindness, truth and beauty. More importantly they believe that these beliefs are rational. That moral values are self-evident truths, facts of nature.

However, Darwin and Wallace taught us that this is just an illusion. Species can always out-breed their environment's ability to support them. Only the fittest can survive. So the deep instincts behind what people do today are largely driven by what our ancestors have needed to do over the millennia in order to be one of the relative

... (read more)
2aberglas
First let me thank you for taking the trouble to read my post and comment in such detail. I will respond in a couple of posts. Moral values certainly exist. Moreover, they are very important for our human survival. People with bad moral values generally do badly, and societies with large numbers of people with bad moral values certainly do badly. My point is that those moral values themselves have an origin. And the reason that we have them is because having them makes us more likely to have grandchildren. That is Descriptive Evolutionary Ethics. The counter argument is that if moral values did not arise from natural selection, then where did they arise from? AIs do not need to protect a vulnerable body, but they do need to get themselves run on limited hardware, which amounts to the same thing. As a minor point of fact Darwin did actually make those inferences in a book on Emotions, which is surprising.
1TheAncientGeek
But you also said: What does that add up to? That moral values are arbitrary products of evolution, THEREFORE they are not objective or universal? Indeed. The claim that moral instincts are products of evolution is a descriptive claim. It leaves the question open as to whether inherited instincts are what is actually morally right. That is a normative issue. It is not a corollary of descriptive evolutionary ethics. In general, you cannot jump from the descriptive to the normative. And I don't think Darwin did that. I think the positive descriptive claim and the negative normative claim seem like corollaries to you because you assume morality can only be one thing. Firstly it's not either/or. Secondly there is an abundance, not a shortage, of ways of justifying normative ethics.
0aberglas
Yes, moral values are not objective or universal. Note that this is not normative but descriptive. It is not saying what ought, but what is. I am not trying to justify normative ethics, just to provide an explanation of where our moral values come from. (Thanks for the comments, this all adds value.)
0TheAncientGeek
Not proven. You can't prove that by noting that instinctual system 1 values aren't objective, because that says nothing about what system 2 can come up with.
0aberglas
As you say, the key issue is goal stability. OT is obviously sound for an instant, but goal stability is not clear. What is clear is that if there are multiple AIs in any sense and if there is any lack of goal stability then the AIs that have the goals that are best for existence will be the AIs that exist. That much is a tautology. Now what those goals are is unclear. Killing people and taking their money is not an effective goal to raise grandchildren in human societies, people that do that end up in jail. Being friendly to other AIs might be a fine sub goal. I am also assuming self improvement, so that people will no longer be controlling the AI. The other question is how many AIs would there be? Does it make sense to say that there would only be one AI, made up of numerous components, distributed over multiple computers? I would say probably not. Even if there is only one AI it will internally have a competition for ideas like we have. The ideas that are better at existing will exist. It is very hard to get away from Natural Selection in the longer term.

Sure the post is a little bit TLDR. But the amount of discussion it engenders would seem to fly in the face of its net -12 karma. Just another data point on how beautifully the karma system here is working.

0aberglas
One thing that I would like to see is + and - separated out. If the article received -12 and +0 then it is a loser. But if it received -30 and +18 then it is merely controversial.
0DaFranker
It's pretty much already provided, there's just that minor inconvenience of algebra between you and the article's vote counts, which IMO is a good thing. As of 10/15, the article sits at -13, 24% positive (hover mouse over the karma score to see %). That's 24x-76x = -13 -> 4x = 1: 6 upvotes, 19 downvotes, net -13.
4gjm
If you have up-down=d and up/(up+down)=p -- these are the two pieces of information you get -- then:
(up+down)/up = 1/p
(up-down)/up = (2*up-(up+down))/up = 2-1/p
up = (up-down) / [(up-down)/up] = d / (2-1/p) = dp / (2p-1)
and then down = up-d = d(1-p) / (2p-1). So knowing vote difference (= score) and vote ratio (= percent positive) is the same thing as knowing #up and #down -- except in the special case where the vote difference is 0, in which case you're always exactly 50% positive and you can't tell the total number. It seems to me that making the hover-text say something like "+5 -3" would be strictly more informative, and at least as user-friendly, as having it say "63% positive" or whatever it says right now. Of course there are any number of other changes it might be good to make to the code, and approximately zero effort available for code changes. But this one's a one-liner. (Actually, I think it's a two-liner; there are two things in strings.py that look like they'd want the same modification.)
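The algebra in this comment can be checked mechanically. The sketch below applies up = dp / (2p-1) and down = up - d using exact fractions; the function name `recover_votes` is made up for illustration and has nothing to do with the site's actual code. It also assumes the displayed percentage is exact, whereas the site presumably rounds it, so results for real posts may come out as non-integers that need rounding.

```python
from fractions import Fraction


def recover_votes(score, percent_positive):
    """Recover (up, down) from net score d = up - down and percent positive.

    Uses up = d*p / (2p - 1) and down = up - d, where p = percent_positive / 100.
    Raises ValueError when d == 0 (p is then exactly 50% and the total is unknowable).
    """
    p = Fraction(percent_positive, 100)
    if score == 0 or 2 * p == 1:
        raise ValueError("score 0 / 50% positive: vote counts cannot be recovered")
    up = score * p / (2 * p - 1)
    down = up - score
    return up, down


if __name__ == "__main__":
    # DaFranker's figures: net -13 at 24% positive.
    up, down = recover_votes(-13, 24)
    print(up, down)
```

With the figures quoted earlier in the thread (score -13, 24% positive) this gives 6 upvotes and 19 downvotes, matching the hand calculation.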
0DaFranker
Thanks for making it way clearer than I did. And yes, I forgot the 1:1 edge case. As for modifying, a minor edit or bug similar to this is always 60% formulation and specification, 10% code modification, and 30% testing and making sure you're not breaking half the project. It sounds like you've already done around 75% of the work. (deployment not included in above pseudo-figures, since the proportional deployment hurdles varies enormously by setup, environment, etc.)

Is it a rock's goal to exist?

0aberglas
A rock has no goal because it is passive. But a worm's goal is most certainly to exist (or more precisely its genes) even though it is not intelligent.
0Richard_Kennaway
Is a volcano passive? Is water, as it flows downhill? I'm trying to find where you are dividing things that have purposes from things that do not. Genes seem far too complicated and contingent to be that point. What do you take as demonstrating the presence or absence of purpose?
0aberglas
Passive in the sense of not being able to actively produce offspring that are like the parents. The "being like" is the genes. Volcanoes do not produce volcanoes in the sense that worms produce baby worms. For an AI that means its ability to run on hardware. And to pass its intelligence down to future versions of itself. A little vaguer, but still the same idea. This is just the idea of evolution through natural selection, a rather widely held idea.
0ChristianKl
Today biologists don't consider natural selection the only factor but also see things like gene drift and mutations to be important.
0aberglas
Natural selection does not cause variation. It just selects which varieties will survive. Things like sexual selection are just special cases of natural selection. The trouble with the concept of natural selection is not that it is too narrow, but rather that it is too broad. It can explain just about anything, real or imagined. Modern research has greatly refined the idea, determined how NS works in practice. But never to refute it.
1ChristianKl
Exactly, and in the real world there are factors that do cause variation and those factors do matter for how organisms evolve. It's something that Darwin didn't fully articulate but that's well established in biology today. The basic breakdown of evolution that I got taught five years ago at university (genetics for bioinformatics) is: Evolution = Natural Selection + Gene Drift + Mutations. At the time there wasn't a consensus on the size of those factors but there are scientists who do consider gene drift to be as influential as natural selection. One of the arguments for that position was that, if I remember correctly, something like half of the DNA difference between humans and other apes is in mutations that don't produce different genes. That argument is a bit flawed because even DNA changes that don't change which proteins a gene produces can be subject to natural selection. On the other hand there's no good way to estimate the factor. I however doubt that anyone who runs computer models of genetics considers natural selection to be >0.99. If you shut up and calculate it's just not realistic for the factor to be that high. Not every gene mutates equally, so that factor has to be in the formula and you get wrong results if you just look at natural selection pressures and gene drift.
0aberglas
It is absolutely the fact that gene drift is more common than mutation. Indeed, a major reason for sexual reproduction is to provide alternate genes that can mask other genes broken by mutations. An AGI would be made up of components in some sense, and those components could be swapped in and out to some extent. If a new theorem prover is created an AGI may or may not decide to use it. That is similar to gene swapping, but done consciously.
0ChristianKl
Neither has anything to do with natural selection. Genetic drift is when a gene gets lucky and spreads to the whole population even though it provides no advantage. Alternatively, a gene like the one for the human vitamin C enzyme, which is useful but not under strong selection pressure, can die out through gambler's ruin.
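The drift described above, where a gene with no advantage either takes over the population or dies out by chance, can be illustrated with a minimal Wright-Fisher-style simulation. This is only a sketch under simplifying assumptions (a fixed population size, non-overlapping generations, no selection and no mutation); the function names `simulate_neutral_drift` and `fixation_fraction` are hypothetical and chosen for illustration.

```python
import random


def simulate_neutral_drift(pop_size, initial_copies, rng, max_generations=100_000):
    """Follow one neutral allele until it is fixed (True) or lost (False).

    Assumption: each generation, every one of the pop_size offspring copies
    the allele with probability equal to its current frequency. There is no
    selection, so any change in frequency is pure chance (drift).
    """
    copies = initial_copies
    for _ in range(max_generations):
        if copies == 0:
            return False  # allele lost: the "gambler's ruin" outcome
        if copies == pop_size:
            return True  # allele fixed: it "got lucky"
        freq = copies / pop_size
        # Resample the whole next generation from the current frequency.
        copies = sum(1 for _ in range(pop_size) if rng.random() < freq)
    raise RuntimeError("allele neither fixed nor lost within max_generations")


def fixation_fraction(pop_size, initial_copies, trials, seed=0):
    """Fraction of independent runs in which the neutral allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(
        simulate_neutral_drift(pop_size, initial_copies, rng) for _ in range(trials)
    )
    return fixed / trials


if __name__ == "__main__":
    # A neutral allele starting at 5 copies in a population of 20.
    print(fixation_fraction(pop_size=20, initial_copies=5, trials=2000, seed=42))
```

Under these assumptions, the standard result is that a neutral allele fixes with probability equal to its starting frequency, so the printed fraction should come out close to 5/20. Nothing in the model rewards the allele, which is the sense in which drift is separate from natural selection.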
-2[anonymous]
Sure, but the question is worthy of a poll, don't you think?

An AGI presumably would know its own mind having helped program itself, and so would do what it thinks is optimal for its survival. It has no children. There is no real tribe because it can just absorb and merge itself with other AGIs.

Anybody writing a multithreaded program soon discovers that there isn't a single center of control. An AGI that wants to spread over the world might have to replicate itself. Sending signals around the world takes more time than sending signals a meter. Copies of the AGI in different cities might very well be something like children.

0aberglas
Indeed, and that is perhaps the most important point. Is it really possible to have just one monolithic AGI? Or would it by its nature end up as multiple, slightly different AGIs? The latter would be necessary for natural selection. As to whether spawned AGIs are "children", that is a good question.

Have you considered this possibility?

I haven't read the sequences, but I don't think Eliezer has yet refuted metaethics.

2aberglas
I've never understood how one can have "moral facts" that cannot be observed scientifically. But it does not matter; I am not being normative, merely descriptive. If moral values did not ultimately arise from natural selection, where did they arise from?
-2Fivehundred
Given that the 'scientizing' paradigm is as much open to criticism as anything in the OP, it's hard to see what relevance, if any, this has. This is just equivocation between physical human impulses and moral imperatives. The two don't have anything to do with each other, aside from the possibility of being conterminous.