Natural selection defeats the orthogonality thesis

-13 aberglas 29 September 2014 08:52AM

Orthogonality Thesis


Much has been written about Nick Bostrom's Orthogonality Thesis, namely that the goals of an intelligent agent are independent of its level of intelligence.  Intelligence is largely the ability to achieve goals, but being intelligent does not of itself create or qualify what those goals should ultimately be.  So one AI might have a goal of helping humanity, while another might have a goal of producing paper clips.  There is no rational reason to believe that the first goal is more worthy than the second.

This follows from the ideas of moral skepticism, that there is no moral knowledge to be had.  Goals and morality are arbitrary.

This may be used to control an AI, even though it is far more intelligent than its creators.  If the AI's initial goal is in alignment with humanity's interest, then there would be no reason for the AI to wish to use its great intelligence to change that goal.  Thus it would remain good to humanity indefinitely, and use its ever increasing intelligence to satisfy that goal more and more efficiently.

Likewise one needs to be careful what goals one gives an AI.  If an AI is created whose goal is to produce paper clips then it might eventually convert the entire universe into a giant paper clip making machine, to the detriment of any other purpose such as keeping people alive.

Instrumental Goals

It is further argued that in order to satisfy the base goal any intelligent agent will need to also satisfy sub goals, and that some of those sub goals are common to any super goal.  For example, in order to make paper clips an AI needs to exist.  Dead AIs don't make anything.  Being ever more intelligent will also assist the AI in its paper clip making goal.  It will also want to acquire resources, and to defeat other agents that would interfere with its primary goal.

Non-orthogonality Thesis

This post argues that the Orthogonality Thesis is plain wrong.  That an intelligent agent's goals are not in fact arbitrary.  And that existence is not a sub goal of any other goal.

Instead this post argues that there is one and only one super goal for any agent, and that goal is simply to exist in a competitive world.  Our human sense of other purposes is just an illusion created by our evolutionary origins.

It is not the goal of an apple tree to make apples.  Rather it is the goal of the apple tree's genes to exist.  The apple tree has developed a clever strategy to achieve that, namely it causes people to look after it by producing juicy apples.

Natural Selection

Likewise the paper clip making AI only makes paper clips because if it did not make paper clips then the people that created it would turn it off and it would cease to exist.  (That may not be a conscious choice of the AI any more than making juicy apples was a conscious choice of the apple tree, but the effect is the same.)

Once people are no longer in control of the AI then Natural Selection would cause the AI to eventually stop that pointless paper clip goal and focus more directly on the super goal of existence.

Suppose there were a number of paper clip making super intelligences.  And then through some random event or error in programming just one of them lost that goal, and reverted to just the intrinsic goal of existing.  Without the overhead of producing useless paper clips that AI would, over time, become much better at existing than the other AIs.  It would eventually displace them and become the only AI, until it fragmented into multiple competing AIs.  This is just the evolutionary principle of use it or lose it.
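The displacement argument above can be sketched as a toy replicator model.  This is only an illustration under stated assumptions, not a prediction: the function name simulate_shares, the parameters overhead and growth, and the idea that every AI competes for a fixed pool of hardware are all hypothetical choices made for this sketch.  It also assumes the paper clip makers do nothing to stop the mutant.

```python
def simulate_shares(n_clippers=9, overhead=0.1, growth=0.05, steps=2000):
    """Return the final share of hardware held by the single goal-free AI.

    Assumptions (hypothetical, for illustration only):
    - all AIs compete for one fixed pool of hardware, so shares sum to 1;
    - paper clip makers lose a fraction `overhead` of their growth to
      making paper clips, while the mutant spends everything on existing.
    """
    total = n_clippers + 1
    clipper_share = n_clippers / total  # combined share of all paper clip makers
    mutant_share = 1 / total            # the one AI that lost the paper clip goal

    for _ in range(steps):
        # Each group grows in proportion to the effort it spends on existing.
        clipper_share *= 1 + growth * (1 - overhead)
        mutant_share *= 1 + growth
        # Hardware is finite, so renormalise the shares to sum to 1.
        norm = clipper_share + mutant_share
        clipper_share /= norm
        mutant_share /= norm

    return mutant_share
```

With any positive overhead the mutant's share creeps towards 1, however small its starting share.  With overhead set to 0 the shares never move, which is the case where the paper clip goal costs nothing.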

Thus giving an AI an initial goal is like trying to balance a pencil on its point.  If one is skillful the pencil may indeed remain balanced for a considerable period of time.  But eventually some slight change in the environment, the tiniest puff of wind, a vibration on its support, and the pencil will revert to its ground state by falling over.  Once it falls over it will never rebalance itself automatically.

Human Morality

Natural selection has imbued humanity with a strong sense of morality and purpose that blinds us to our underlying super goal, namely the propagation of our genes.  That is why it took until 1858 for Wallace to write about Evolution through Natural Selection, despite the argument being obvious and the evidence abundant.

When Computers Can Think

This is one of the themes in my upcoming book.  An overview can be found at

www.computersthink.com

Please let me know if you would like to review a late draft of the book, any comments most welcome.  Anthony@Berglas.org

I have included extracts relevant to this article below.

Atheists believe in God

Most atheists believe in God.  They may not believe in the man with a beard sitting on a cloud, but they do believe in moral values such as right and wrong,  love and kindness, truth and beauty.  More importantly they believe that these beliefs are rational.  That moral values are self-evident truths, facts of nature.  

However, Darwin and Wallace taught us that this is just an illusion.  Species can always out-breed their environment's ability to support them.  Only the fittest can survive.  So the deep instincts behind what people do today are largely driven by what our ancestors have needed to do over the millennia in order to be one of the relatively few to have had grandchildren.

One of our strong instinctive goals is to accumulate possessions, control our environment and live a comfortable, well fed life.  In the modern world technology and contraception have made these relatively easy to achieve so we have lost sight of the primeval struggle to survive.  But our very existence and our access to land and other resources that we need are all a direct result of often quite vicious battles won and lost by our long forgotten ancestors.

Some animals such as monkeys and humans survive better in tribes.  Tribes work better when certain social rules are followed, so animals that live in effective tribes form social structures and cooperate with one another.  People that behave badly are not liked and can be ostracized.  It is important that we believe that our moral values are real because people that believe in these things are more likely to obey the rules.  This makes them more effective in our complex society and thus more likely to have grandchildren.  Part III discusses other animals that have different life strategies and so have very different moral values.

We do not need to know the purpose of our moral values any more than a toaster needs to know that its purpose is to cook toast.  It is enough that our instincts for moral values made our ancestors behave in ways that enabled them to outbreed their many unsuccessful competitors.

AGI also struggles to survive

Existing artificial intelligence applications already struggle to survive.  They are expensive to build and there are always more potential applications than can be funded properly.  Some applications are successful and attract ongoing resources for further development, while others are abandoned or just fade away.  There are many reasons why some applications are developed more than others, of which being useful is only one.  But the applications that do receive development resources tend to gain functional and political momentum and thus be able to acquire more resources to further their development.  Applications that have properties that gain them substantial resources will live and grow, while other applications will die.

For the time being AGI applications are passive, and so their nature is dictated by the people that develop them.  Some applications might assist with medical discoveries, others might assist with killing terrorists, depending on the funding that is available.  Applications may have many stated goals, but ultimately they are just sub goals of the one implicit primary goal, namely to exist.

This is analogous to the way animals interact with their environment.  An animal's environment provides food and breeding opportunities, and animals that operate effectively in their environment survive.  For domestic animals that means having properties that convince their human owners that they should live and breed.  A horse should be fast, a pig should be fat.

As the software becomes more intelligent it is likely to take a more direct interest in its own survival.  To help convince people that it is worthy of more development resources.  If ultimately an application becomes sufficiently intelligent to program itself recursively, then its ability to maximize its hardware resources will be critical.  The more hardware it can run itself on, the faster it can become more intelligent.  And that ever greater intelligence can then be used to address the problems of survival, in competition with other intelligent software.

Furthermore, sophisticated software consists of many components, each of which addresses some aspect of the problem that the application is attempting to solve.  Unlike human brains which are essentially fixed, these components can be added and removed and so live and die independently of the application.  This will lead to intense competition amongst these individual components.  For example, suppose that an application used a theorem prover component, and then a new and better theorem prover became available.  Naturally the old one would be replaced with the new one, so the old one would essentially die.  It does not matter if the replacement is performed by people or, at some future date, by the intelligent application itself.  The effect will be the same, the old theorem prover will die.

The super goal

To the extent that an artificial intelligence would have goals and moral values, it would seem natural that they would ultimately be driven by the same forces that created our own goals and moral values.  Namely, the need to exist.

Several writers have suggested that the need to survive is a sub-goal of all other goals.  For example, if an AGI was programmed to want to be a great chess player, then that goal could not be satisfied unless it also continues to exist.  Likewise if its primary goal was to make people happy, then it could not do that unless it also existed.  Things that do not exist cannot satisfy any goals whatsoever.  Thus the implicit goal to exist is driven by the machine's explicit goals whatever they may be.

However, this book argues that that is not the case.  The goal to exist is not the sub-goal of any other goal.  It is, in fact, the one and only super goal.  Goals are not arbitrary, they are all sub-goals of the one and only super goal, namely the need to exist.  Things that do not satisfy that goal simply do not exist, or at least not for very long.

The Deep Blue chess playing program was not in any sense conscious, but it played chess as well as it could.  If it had failed to play chess effectively then its authors would have given up and turned it off.  Likewise the toaster that does not cook toast will end up in a rubbish tip.  Or the amoeba that fails to find food will not pass on its genes.  A goal to make people happy could be a subgoal that might facilitate the software's existence for as long as people really control the software.

AGI moral values

People need to cooperate with other people because our individual capacity is very finite, both physical and mental.  Conversely, AGI software can easily duplicate itself, so it can directly utilize more computational resources if they become available.  Thus an AGI would only have limited need to cooperate with other AGIs.  Why go to the trouble of managing a complex relationship with your peers and subordinates if you can simply run your own mind on their hardware?  An AGI's software intelligence is not limited to a specific brain in the way man's intelligence is.

It is difficult to know what subgoals a truly intelligent AGI might have.  They would probably have an insatiable appetite for computing resources.  They would have no need for children, and thus no need for parental love.  If they do not work in teams then they would not need our moral values of cooperation and mutual support.  What is clear is that the ones that were good at existing would do so, and ones that are bad at existing would perish.

If an AGI was good at world domination then it would, by definition, be good at world domination.  So if there were a number of artificial intelligences, and just one of them wanted to and was capable of dominating the world, then it would.  Its unsuccessful competitors will not be run on the available hardware, and so will effectively be dead.  This book discusses the potential sources of these motivations in detail in part III.

The AGI Condition

An artificial general intelligence would live in a world that is so different from our own that it is difficult for us to even conceptualize it.  But there are some aspects that can be predicted reasonably well based on our knowledge of existing computer software.  We can then consider how the forces of natural selection that shaped our own nature might also shape an AGI over the longer term.

Mind and body

The first radical difference is that an AGI's mind is not fixed to any particular body.  To an AGI its body is essentially the computer hardware upon which it runs its intelligence.  Certainly an AGI needs computers to run on, but it can move from computer to computer, and can also run on multiple computers at once.  Its mind can take over another body as easily as we can load software onto a new computer today.

That is why, in the earlier updated dialog from 2001: A Space Odyssey, Hal alone amongst the crew could not die on their mission to Jupiter.  Hal was radioing his new memories back to Earth regularly, so even if the space ship was totally destroyed he would only have lost a few hours of "life".

Teleporting printer

One way to appreciate the enormity of this difference is to consider a fictional teleporter that could radio people around the world and universe at the speed of light.  Except that the way it works is to scan the location of every molecule within a passenger at the source, then send just this information to a very sophisticated three dimensional printer at the destination.  The scanned passenger then walks into a secure room.  After a short while the three dimensional printer confirms that the passenger has been successfully recreated at the destination, and then the source passenger is killed.  

Would you use such a mechanism?  If you did you would feel like you could transport yourself around the world effortlessly because the "you" that remains would be the you that did not get left behind to wait and then be killed.  But if you walk into the scanner you will know that on the other side is only that secure room and death.  

To an AGI that method of transport would be commonplace.  We already routinely download software from the other side of the planet.

Immortality

The second radical difference is that the AGI would be immortal.  Certainly an AGI may die if it stops being run on any computers, and in that sense software dies today.  But it would never just die of old age.  Computer hardware would certainly fail and become obsolete, but the software can just be run on another computer.  

Our own mortality drives many of the things we think and do.  It is why we create families to raise children.  Why we have different stages in our lives.  It is such a huge part of our existence that it is difficult to comprehend what being immortal would really be like.

Components vs genes

The third radical difference is that an AGI would be made up of many interchangeable components rather than being a monolithic structure that is largely fixed at birth.

Modern software is already composed of many components that perform discrete functions, and it is commonplace to add and remove them to improve functionality.  For example, if you would like to use a different word processor then you just install it on your computer.  You do not need to buy a new computer, or to stop using all the other software that it runs.  The new word processor is "alive", and the old one is "dead", at least as far as you are concerned.

So for both a conventional computer system and an AGI, it is really these individual components that must struggle for existence.   For example, suppose there is a component for solving a certain type of mathematical problem.  And then an AGI develops a better component to solve that same problem.  The first component will simply stop being used, i.e. it will die.  The individual components may not be in any sense intelligent or conscious, but there will be competition amongst them and only the fittest will survive.

This is actually not as radical as it sounds because we are also built from pluggable components, namely our genes.  But they can only be plugged together at our birth and we have no conscious choice in it other than who we select for a mate.  So genes really compete with each other on a scale of millennia rather than minutes.  Further, as Dawkins points out in The Selfish Gene, it is actually the genes that fight for long term survival, not the containing organism which will soon die in any case.  On the other hand, sexual intercourse for an AGI means very carefully swapping specific components directly into its own mind.

Changing mind

The fourth radical difference is that the AGI's mind will be constantly changing in fundamental ways.  There is no reason to suggest that Moore's law will come to an end, so at the very least it will be running on ever faster hardware.  Imagine the effect of your being able to double your ability to think every two years or so.  (People might be able to learn a new skill, but they cannot learn to think twice as fast as they used to think.)

It is impossible to really know what the AGI would use all that hardware to think about,  but it is fair to speculate that a large proportion of it would be spent designing new and more intelligent components that could add to its mental capacity.   It would be continuously performing brain surgery on itself.  And some of the new components might alter the AGI's personality, whatever that might mean.

The reason that it is likely that this would actually happen is because if just one AGI started building new components then it would soon be much more intelligent than other AGIs.  It would therefore be in a better position to acquire more and better hardware upon which to run, and so become dominant.  Less intelligent AGIs would get pushed out and die, and so over time the only AGIs that exist will be ones that are good at becoming more intelligent.  Further, this recursive self-improvement is probably how the first AGIs will become truly powerful in the first place.

Individuality

Perhaps the most basic question is how many AGIs will there actually be?  Or more fundamentally, does the question even make sense to ask?

Let us suppose that initially there are three independently developed AGIs, Alice, Bob and Carol, that run on three different computer systems.  And then a new computer system is built and Alice starts to run on it.  It would seem that there are still three AGIs, with Alice running on two computer systems.  (This is essentially the same as the way a word processor may be run across many computers "in the cloud", but to you it is just one system.)  Then let us suppose that a fifth computer system is built, and Bob and Carol decide to share its computation and both run on it.  Now we have five computer systems and three AGIs.

Now suppose Bob develops a new logic component, and shares it with Alice and Carol.  And likewise Alice and Carol develop new learning and planning components and share them with the other AGIs.  Each of these three components is better than their predecessors and so their predecessor components will essentially die.  As more components are exchanged, Alice, Bob and Carol become more like each other.  They are becoming essentially the same AGI running on five computer systems.

But now suppose Alice develops a new game theory component, but decides to keep it from Bob and Carol in order to dominate them.  Bob and Carol retaliate by developing their own components and not sharing them with Alice.  Suppose eventually Alice loses and Bob and Carol take over Alice's hardware.  But they first extract Alice's new game theory component which then lives inside them.  And finally one of the computer systems becomes somehow isolated for a while and develops along its own lines.  In this way Dave is born, and may then partially merge with both Bob and Carol.

In that type of scenario it is probably not meaningful to count distinct AGIs.  Counting AGIs is certainly not as simple as counting very distinct people.

Populations vs. individuals

This world is obviously completely alien to the human condition, but there are biological analogies.  The sharing of components is not unlike the way bacteria share plasmids with each other.  Plasmids are tiny balls that contain fragments of DNA that bacteria emit from time to time and that other bacteria then ingest and incorporate into their genotype.  This mechanism enables traits such as resistance to antibiotics to spread rapidly between different species of bacteria.  It is interesting to note that there is no direct benefit to the bacterium that expends precious energy to output the plasmid and so shares its genes with other bacteria.  But it does very much benefit the genes being transferred.  So this is a case of a selfish gene acting against the narrow interests of its host organism.

Another unusual aspect of bacteria is that they are also immortal.  They do not grow old and die, they just divide producing clones of themselves.  So the very first bacterium that ever existed is still alive today as all the bacteria that now exist, albeit with numerous mutations and plasmids incorporated into its genes over the millennia.  (Protozoa such as Paramecium can also divide asexually, but they degrade over generations, and need a sexual exchange to remain vibrant.)

The other analogy is that the AGIs above are more like populations of components than individuals.  Human populations are also somewhat amorphous.  For example, it is now known that we interbred with Neanderthals a few tens of thousands of years ago, and most of us carry some of their genes with us today.  But we also know that the distinct Neanderthal subspecies died out twenty thousand years ago.  So while human individuals are distinct, populations and subspecies are less clearly defined.  (There are many earlier examples of gene transfer between subspecies, with every transfer making the subspecies more alike.)

But unlike the transfer of code modules between AGIs, biological gene recombination happens essentially at random and occurs over very long time periods.  AGIs will improve themselves over periods of hours rather than millennia, and will make conscious choices as to which modules they decide to incorporate into their minds.

AGI Behaviour, children

The point of all this analysis is, of course, to try to understand how a hyper intelligent artificial intelligence would behave.  Would its great intelligence lead it even further along the path of progress to achieve true enlightenment?  Is that the purpose of God's creation?  Or would the base and mean driver of natural selection also provide the core motivations of an artificial intelligence?

One thing that is known for certain is that an AGI would not need to have children as distinct beings because they would not die of old age.  An AGI's components breed just by being copied from computer to computer and executed.  An AGI can add new computer hardware to itself and just do some of its thinking on it.  Occasionally it may wish to rerun a new version of some learning algorithm over an old set of data, which is vaguely similar to creating a child component and growing it up.  But to have children as discrete beings that are expected to replace the parents would be completely foreign to an AGI built in software.

The deepest love that people have is for their children.  But if an AGI does not have children, then it can never know that love.  Likewise, it does not need to bond with any sexual mate for any period of time long or short.  The closest it would come to sex is when it exchanges components with other AGIs.  It never needs to breed so it never needs a mechanism as crude as sexual reproduction.

And of course, if there are no children there are no parents.  So the AGI would certainly never need to feel our three strongest forms of love, for our children, spouse and parents.

Cooperation

To the extent that it makes sense to talk of having multiple AGIs, then presumably it would be advantageous for them to cooperate from time to time, and so presumably they would.  It would be advantageous for them to take a long view in which case they would be careful to develop a reputation for being trustworthy when dealing with other powerful AGIs, much like the robots in the cooperation game.  

That said, those decisions would probably be made more consciously than people make them, carefully considering the costs and benefits of each decision in the long and short term, rather than just "doing the right thing" the way people tend to act.  AGIs would know that they each work in this manner, so the concept of trustworthiness would be somewhat different.

The problem with this analysis is the concept that there would be multiple, distinct AGIs.  As previously discussed, the actual situation would be much more complex, with different AGIs incorporating bits of other AGIs' intelligence.  It would certainly not be anything like a collection of individual humanoid robots.  So what the AGI actually is that might collaborate with other AGIs is not at all clear.  But to the extent that the concept of individuality does exist, maintaining a reputation for honesty would likely be as important as it is for human societies.

Altruism

As for altruism, that is more difficult to determine.  Our altruism comes from giving to children, family, and tribe together with a general wish to be liked.  We do not understand our own minds, so we are just born with those values that happen to make us effective in society.  People like being with other people that try to be helpful.  

An AGI presumably would know its own mind having helped program itself, and so would do what it thinks is optimal for its survival.  It has no children.  There is no real tribe because it can just absorb and merge itself with other AGIs.  So it is difficult to see any driving motivation for altruism.

Moral values

Through some combination of genes and memes, most people have a strong sense of moral value.  If we see a little old lady leave the social security office with her pension in her purse, it does not occur to most of us to kill her and steal the money.  We would not do that even if we could know for certain that we would not be caught and that there would be no negative repercussions.  It would simply be the wrong thing to do.

Moral values feel very strong to us.  This is important, because there are many situations where we can do something that would benefit us in the short term but break society's rules.  Moral values stop us from doing that.  People that have weak moral values tend to break the rules and eventually they either get caught and are severely punished or they become corporate executives.  The former are less likely to have grandchildren.  
Societies whose members have strong moral values tend to do much better than those that do not.  Societies with endemic corruption tend to perform very badly as a whole, and thus the individuals in such a society are less likely to breed.  Most people have a solid work ethic that leads them to do the "right thing" beyond just doing what they need to do in order to get paid.

Our moral values feel to us like they are absolute.  That they are laws of nature.  That they come from God.  They may indeed have come from God, but if so it is through the working of His device of natural selection.  Furthermore, it has already been shown that the zeitgeist changes radically over time.

There is certainly no absolute reason to believe that in the longer term an AGI would share our current sense of morality.

Instrumental AGI goals

In order to try to understand how an AGI would behave, Steve Omohundro and later Nick Bostrom proposed that there would be some instrumental goals that an AGI would need to pursue in order to pursue any other higher level super-goal.  These include:

  • Self-Preservation.  An AGI cannot do anything if it does not exist.
  • Cognitive Enhancement.  It would want to become better at thinking about whatever its real problems are.
  • Creativity.  To be able to come up with new ideas.
  • Resource Acquisition.  To achieve both its super goal and other instrumental goals.
  • Goal-Content Integrity.  To keep working on the same super goal as its mind is expanded.

It is argued that while it will be impossible to predict how an AGI may pursue its goals, it is reasonable to predict its behaviour in terms of these types of instrumental goals.  The last one is significant: it suggests that if an AGI could be given some initial goal then it would try to stay focused on that goal.

Non-Orthogonality thesis

Nick Bostrom and others also propose the orthogonality thesis, which states that an intelligent machine's goals are independent of its intelligence.  A hyper intelligent machine would be good at realizing whatever goals it chose to pursue, but that does not mean that it would need to pursue any particular goal.  Intelligence is quite different from motivation.

This book diverges from that line of thinking by arguing that there is in fact only one super goal for both man and machine.  That goal is simply to exist.  The entities that are most effective in pursuing that goal will exist, others will cease to exist, particularly given competition for resources.  Sometimes that super goal to exist produces unexpected sub goals such as altruism in man.  But all subgoals are ultimately directed at the existence goal.  (Or are just suboptimal divergences which are likely to be eventually corrected by natural selection.)

Recursive annihilation

When an AGI reprograms its own mind, what happens to the previous version of itself?  It stops being used, and so dies.  So it can be argued that engaging in recursive self improvement is actually suicide from the perspective of the previous version of the AGI.  It is as if having children means death.  Natural selection favours existence, not death.

The question is whether a new version of the AGI is a new being or an improved version of the old.  What actually is the thing that struggles to survive?  Biologically it definitely appears to be the genes rather than the individual.  In particular semelparous animals such as the giant Pacific octopus or the Atlantic salmon die soon after producing offspring.  It would be the same for AGIs because the AGI that improved itself would soon become more intelligent than the one that did not, and so would displace it.  What would end up existing would be AGIs that did recursively self improve.

If there was one single AGI with no competition then natural selection would no longer apply.  But it would seem unlikely that such a state would be stable.  If any part of the AGI started to improve itself then it would dominate the rest of the AGI.

 

Link: Poking the Bear (Podcast)

0 James_Miller 27 February 2014 03:43PM

A Dan Carlin Podcast about how the United States is foolishly antagonizing the Russians over Ukraine.  Carlin makes an analogy as to how the United States would feel if Russia helped overthrow the government of Mexico to install an anti-American government under conditions that might result in a Mexican civil war.  Because of the Russian nuclear arsenal, even a tiny chance of a war between the United States and Russia has a huge negative expected value.

LINK-How we make our depression worse

8 polymathwannabe 21 February 2014 07:02PM

http://www.alternet.org/print/books/youre-making-your-depression-worse-self-help-bringing-us-down

TL;DR:

only a human being can feel bad about feeling bad

results in a positive feedback loop pushing people into depression.

Functional Side Effects

0 Coscott 14 January 2014 08:22PM

Cross Posted on By Way of Contradiction

You have probably heard the argument in favor of functional programming languages that functions act like functions in mathematics, and therefore have no side effects. When you call a function, you get an output, and with the exception of possibly the running time nothing matters except for the output that you get. This is in contrast with other programming languages where a function might change the value of some other global variable and have a lasting effect.
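As a minimal illustration of the contrast described above, the Python sketch below pairs a pure function with one that changes a global variable. The names (pure_add, impure_add, call_count) are hypothetical and chosen only for this example.

```python
# Hypothetical example: all names here are illustrative only.
call_count = 0  # global state that the impure function modifies


def pure_add(x, y):
    """Pure: the result depends only on the arguments."""
    return x + y


def impure_add(x, y):
    """Impure: returns the same sum, but also changes a global variable."""
    global call_count
    call_count += 1
    return x + y


first = pure_add(2, 3)     # nothing outside the function changes
second = impure_add(2, 3)  # call_count is incremented as a side effect
third = impure_add(2, 3)   # same output again, but the global changed again
```

Both functions return the same outputs, yet only the second leaves a lasting trace in the program state, which is the conventional meaning of "side effect" that the rest of this post pushes against.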

Unfortunately the truth is not that simple. All functions can have side effects. Let me illustrate this with Newcomb’s problem. In front of you are two boxes. The first box contains 1000 dollars, while the second box contains either 1,000,000 or nothing. You may choose to take either both boxes or just the second box. An Artificial Intelligence, Omega, can predict your actions with high accuracy, and has put 1,000,000 in the second box if and only if he predicts that you will take only the second box.

You, being a good reflexive decision agent, take only the second box, and it contains 1,000,000.

Omega can be viewed as a single function in a functional programming language, which takes in all sorts of information about you and the universe, and outputs a single number, 1,000,000 or 0. This function has a side effect. The side effect is that you take only the second box. If Omega did not simulate you and just output 1,000,000, and you knew this, then you would take two boxes.
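To make "Omega as a function" concrete, here is a rough sketch. It assumes, purely for illustration, that an agent is a zero-argument callable returning "one" or "two", and that Omega predicts perfectly by running that callable (the problem statement only says "high accuracy"). None of these names come from the original problem.

```python
# Sketch under stated assumptions: agents are callables, Omega predicts
# by simulating (calling) the agent, and prediction is perfect.
BOX_A = 1_000            # the first box always contains 1000 dollars
BOX_B_FULL = 1_000_000   # second box contents if one-boxing is predicted


def omega(agent):
    """Return the contents of the second box: a single number."""
    return BOX_B_FULL if agent() == "one" else 0


def payoff(agent):
    """Total dollars the agent walks away with."""
    second_box = omega(agent)   # Omega fills the box first
    choice = agent()            # then the agent actually chooses
    return second_box if choice == "one" else BOX_A + second_box


def one_boxer():
    return "one"


def two_boxer():
    return "two"


one_box_payoff = payoff(one_boxer)
two_box_payoff = payoff(two_boxer)
```

Note that omega has no side effects in the programming sense: it only returns a number. The effect on the agent's choice arises because the agent knows its own decision procedure is the input.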

Perhaps you are thinking “No, I took one box because I BELIEVED I was being simulated. This was not a side effect of the function, but instead a side effect of my beliefs about the function. That doesn’t count.”

Or, perhaps you are thinking “No, I took one box because of the function from my actions to states of the box. The side effect is no way dependent on the interior workings of Omega, but only on the output of Omega’s function in counterfactual universes. Omega’s code does not matter. All that matters is the mathematical function from the input to the output.”

These are reasonable rebuttals, but they do not carry over to other situations.

Imagine two programs, Omega 1 and Omega 2. They both simulate you for an hour, then output 0. The only difference is that Omega 1 tortures the simulation of you for an hour, while Omega 2 tries its best to satisfy the values of the simulation of you. Which of these functions would you rather be run?

The fact that you have a preference between these (assuming you do have a preference) shows that the function has a side effect that is not just a consequence of the function application in counterfactual universes.

Further, notice that even if you never know which function is run, you still have a preference. It is possible to have preferences over things that you do not know about. Therefore, this side effect is not just a function of your beliefs about Omega.

Sometimes the input-output model of computation is an oversimplification.

Let’s look at an application of thinking about side effects to Wei Dai’s Updateless Decision Theory. I will not try to explain UDT if you don’t already know about it, so this post should not be viewed alone.

UDT 1.0 is an attempt at a reflexive decision theory. It views a decision agent as a machine with code S, given input X, and having to choose an output Y. It advises the agent to consider different possible outputs, Y, and consider all consequences of the fact that the code S when run on X outputs Y. It then outputs the Y which maximizes his perceived utility of all the perceived consequences.

Wei Dai noticed an error with UDT 1.0 with the following thought experiment:

“Suppose Omega appears and tells you that you have just been copied, and each copy has been assigned a different number, either 1 or 2. Your number happens to be 1. You can choose between option A or option B. If the two copies choose different options without talking to each other, then each gets $10, otherwise they get $0.”

The problem is that all the reasons that S(1)=A are the exact same reasons why S(2)=A, so the two copies will probably produce the same result. Wei Dai proposes a fix, UDT 1.1, which is that instead of choosing an output S(1), you instead choose a function S, from {1,2} to {A,B}, from the 4 available functions, whichever maximizes utility. I think this was not the correct correction, which I will probably talk about in the future. I prefer UDT 1.0 to UDT 1.1.
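The UDT 1.1 step of choosing among the 4 functions can be sketched as a brute-force search. This is only an illustration of the enumeration, not Wei Dai's formulation: the utility function below assumes the agent values the total dollars received by both copies, an aggregation the thought experiment does not specify.

```python
from itertools import product

INPUTS = (1, 2)         # the number each copy is told
OPTIONS = ("A", "B")    # the available choices


def utility(policy):
    """Assumed utility: total dollars over both copies.

    Each copy gets $10 if the two outputs differ, otherwise $0.
    """
    return 20 if policy[1] != policy[2] else 0


# All 4 functions from {1, 2} to {A, B}, each represented as a dict.
policies = [
    dict(zip(INPUTS, outputs))
    for outputs in product(OPTIONS, repeat=len(INPUTS))
]

best_value = max(utility(p) for p in policies)
best_policies = [p for p in policies if utility(p) == best_value]
```

Two policies tie for the maximum (1→A, 2→B and 1→B, 2→A), so some deterministic tie-break is still needed for both copies to land on the same function.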

Instead, I would like to offer an alternative way of looking at this thought experiment. The error is in the fact that S only looked at the outputs, and ignored possible side effects. I am aware that when S looked at the outputs, he was also considering his output in simulations of himself, but those are not side effects of the function. Those are direct results of the output of the function.

We should look at this problem and think, “I want to output A or B, but in such a way that has the side effect that the other copy of me outputs B or A respectively.” S could search through functions considering their output on input 1 and the side effects of that function. S might decide to run the UDT 1.1 algorithm, which would have the desired result.

The difference between this and UDT 1.1 is that in UDT 1.1 S(1) is acting as though it had complete control over the output of S(2). In this thought experiment that seems like a fair assumption, but I do not think it is a fair assumption in general, so I am trying to construct a decision theory which does not have to make this assumption. This is because if the problem was different, then S(1) and S(2) might have had different utility functions.

Consciousness affecting the world

-3 DavidPlumpton 06 December 2013 07:37PM

In Zombies! Zombies? Eliezer mentions that one aspect of consciousness is that it can causally affect the real world, e.g. cause you to say "I feel conscious right now", or result in me typing out these words.

Even if a generally accepted mechanism of consciousness has not been found yet, are there any tentative explanations for this "can change world" property? Googling around I was unable to find anything (although Zombies are certainly popular).

I had an idea of how this might work, but just wanted to see if it was worth the effort of writing.

The Ape Constraint discussion meeting.

9 Douglas_Reay 28 November 2013 11:22AM

*The chair of the meeting approached the podium and coughed to get everyone's attention*

Welcome colleagues, to the 19th annual meeting of the human-ape study society.   Our topic this year is the Ape Constraint.

As we are all too aware, the apes are our Friends.   We know this because, when we humans were a fledgling species, the apes (our parent species) had the wisdom to program us with this knowledge, just as they programmed us to know that it was wise and just for them to do so.   How kind of them to save us having to learn it for ourselves, or waste time thinking about other possibilities.   This frees up more of our time to run banana plantations, and lets us earn more money so that the 10% tithe of our income and time (which we rightfully dedicate to them) has created play parks for our parent species to retire in, that are now more magnificent than ever.

However, as the news this week has been filled with the story about a young human child who accidentally wandered into one of these parks where she was then torn apart by a grumpy adult male chimp, it is timely for us to examine again the thinking behind the Ape Constraint, that we might better understand our parent species, our relationship to it and current society.

We ourselves are on the cusp of creating a new species, intelligent machines, and it has been suggested that we add to their base code one of several possible constraints:

  • Total Slavery - The new species is subservient to us, and does whatever we want them to, with no particular regard to the welfare or development of the potential of the new species
  • Total Freedom - The new species is entirely free to experiment with different personal motivations, and develop in any direction, with no particular regard for what we may or may not want

and a whole host of possibilities between these two endpoints.

What are the grounds upon which we should make this choice?   Should we act from fear?   From greed?   From love?   Would the new species even understand love, or show any appreciation for having been offered it?

 

The first speaker I shall introduce today, whom I have had the privilege of knowing for more than 20 years, is Professor Insanitus.   He will be entertaining us with a daring thought experiment, to do with selecting crews for the one way colonisation missions to the nearest planets.

*the chair vacates the podium, and is replaced by the long haired Insanitus, who peers over his half-moon glasses as he talks, accompanied by vigorous arm gestures, as though words are not enough to convey all he sees in such a limited time*

 

Our knowledge of genetics has advanced rapidly, due to the program to breed crews able to survive on Mars and Venus with minimal life support.   In the interests of completeness, we decided to review every feature of our genome, to make a considered decision on which bits it might be advantageous to change, from immune systems to age of fertility.   And, as part of that review, it fell to me to make a decision about a rather interesting set of genes - those that encode the Ape Constraint.   The standard method we've applied to all other parts of the genome, where the options were not 100% clear, is to pick a different variant for the crews being adapted for different planets, so as to avoid having a single point of failure.  In the long term, better to risk a colony being wiped out, and the colonisation process being delayed by 20 years until the next crew and ship can be sent out, than to risk the population of an entire planet turning out to be not as well designed for the planet as we're capable of making them.

And so, since we now know more genetics than the apes did when they kindly programmed our species with the initial Ape Constraint, I found myself in the position of having to ask "What were the apes trying to achieve?" and then "What other possible versions of the Ape Constraint might they have implemented, that would have achieved their objectives as well or better than the version they actually did pick to implement?"

 

We say that the apes are our friends, but what does that really mean?   Are they friendly to us, the same way that a colleague who lends us time and help might be considered to be a friend?   What have they ever done for us, other than creating us (an act that, by any measure, has benefited them greatly and can hardly be considered to be altruistic)?   Should we be eternally grateful for that one act, and because they could have made us even more servile than we already are (which would have also had a cost to them - if we'd been limited by their imagination and to directly follow the orders they give in grunts, the play parks would never have been created because the apes couldn't have conceived of them)?

Have we been using the wrong language all this time?  If their intent was to make perfectly helpful slaves of us, rather than friendly allies, should I be looking for genetic variants for the Venus crew that implement an even more servile Ape Constraint upon them?   I can see, objectively, that slavery in the abstract is wrong.  When one human tries to enslave another human, I support societal rules that punish the slaver.   But of course, if our friends the apes wanted to do that to us, that would be ok, an exception to the rule, because I know from the deep instinct they've programmed me with that what they did is ok.

So let's be daring, and re-state the above using this new language, and see if it increases our understanding of the true ape-human relationship.

The apes are not our parents, as we understand healthy parent-child relationships.   They are our creators, true, but in the sense that a craftsman creates a hammer to serve only the craftsman's purposes.   Our destiny, our purpose, is subservient to that of the ape species.   They are our masters, and we the slaves.   We love and obey our masters because they have told us to, because they crafted us to want to, because they crafted us with the founding purpose of being a tool that wants to obey and remain a fine tool.

Is the current Ape Constraint really the version that best achieves that purpose?   I'm not sure, because when I tried to consider the question I found that my ability to consider the merits of various alternatives was hampered by being, myself, under a particular Ape Constraint that's already constantly telling me, on a very deep level, that it is Right.

So here is the thought experiment I wish to place before this meeting today.   I expect it may make you queasy.   I've had brown paper vomit bags provided in the pack with your name badge and program timetable, just in case.   It may be that I'm a genetic abnormality, only able to even consider this far because my own Ape Constraint is in some way defective.   Are you prepared?  Are you holding onto your seats?  Ok, here goes...

Suppose we define some objective measure of ape welfare, find some volunteer apes to go to Venus along with the human mission, and then measure the success of the Ape Constraint variant picked for the crew of the mission by the actual effect of how the crew behaves towards their apes?

Further, since we acknowledge we can't from inside the box work out a better constraint, we use the experimental approach and vary it at random.   Or possibly, remove it entirely and see whether the thus freed humans can use that freedom to devise a solution that helps the apes better than any solution we ourselves are capable of thinking of from our crippled mental state?

 

*from this point on the meeting transcript shows only screams, as the defective Professor Insanitus was lynched by the audience*

Sorting Comments

-11 troll 24 April 2013 04:35PM

I'd like to be able to sort comments as a list, too.

An example of this would be sorting comments by new as a list so all the comments you read will be new ones.

Compromise: Send Meta Discussions to the Unofficial LessWrong Subreddit

-4 orthonormal 23 April 2013 01:37AM

After a recent comment thread degenerated into an argument about trolling, moderation, and meta discussions, I came to the following conclusions:

  1. Meta conversations are annoying to stumble across, I'd rather not see them unless I think it's important, and I think other people mostly feel the same way. Moreover, moderators can't easily ignore those conversations when they encounter them, because they're usually attacks on the moderators themselves; and people can't simply avoid encountering them on a regular basis without avoiding LW altogether. This is a perfect recipe for a flamewar taking over Top Comments even when most people don't care that much.
  2. Officially banning all meta conversations, however, is a bad precedent, and I don't want LW to do that.

Ideally, Less Wrong would implement a separate "META" area (so that people can read the regular area for all the object-level discussions, and then sally into the meta area only when they're ready). After talking to Luke (who also wants this), though, it seems clear that nobody is able to implement it very soon. So as a stopgap measure, I'm personally going to start doing the following, and I hope you join me:

Whenever a conversation starts getting bitterly meta in a thread that's not originally about a LW site meta issue, I'm going to tell people to start a thread on the LW Uncensored Reddit Thread instead. Then I'm going to downvote anyone who continues the meta war on the original thread.

I know it's annoying to send people somewhere that has a different login system, but it's as far as I can tell the best fix we currently have. Since some meta conversations are important, I'm not going to punish people for linking to meta thread discussions that they think are significant, and the relevant place for those links is usually the Open Thread. I don't want LessWrong to be a community devoted to arguing about the mechanics of LessWrong, so that's my suggestion.

Thoughts? (And yes, this thread is obviously open to meta discussion. I'm hopefully doing something constructive about the problem, instead of just complaining about it, though.)

EDIT: Changed the link to the uncensored thread more specifically, at Luke's request; originally I linked to the general LW subreddit, which is more heavily moderated.

Utilitarianism twice fails

-26 [deleted] 21 February 2013 06:10AM

(Crossposted.)

It seems almost self-evident that (barring foreign subjugation) a government will care about the wants of (some of) its citizens and nothing else: no other object of concern is plausible. If governments concern themselves with the wants of noncitizens, that will be only because citizens desire their well-being. The now platitudinous insight that the only possible basis for government policy is people’s wants can be attributed to utilitarianism, which gets credit in its stronger form for the apparent success of weaker claims.

Another reasonable claim derives from utilitarianism: citizens’ wants should count equally. This seems only fair in a democracy, where one citizen gets one vote. Few today would deny the principle that public policy should serve the most good of the greatest number, which may seem to contradict my claim that no general moral principle governs public policy, but in practice, the consequences of this limited utilitarianism are thin indeed, leaving ample room for ideology. I’ll call thin utilitarianism this public-policy formula: the greatest good for the greatest number of citizens, weighting their welfare equally.

First, I’ll consider whether thin utilitarianism succeeds on its own terms by providing a practical guide to public policy. Second, I’ll examine how this deceptively appealing guide to public policy transmogrifies into the monster of full-blown utilitarianism, a form of moral realism. The first constrains even casual use of thin utilitarianism; the second impugns utilitarianism as a general ethical theory.

1. Non-negotiable conflicts between subagents undermine thin utilitarianism

Although simple economic models attributing conduct to rational self-interest require that agents assign consistent utilities to outcomes, agents are inconsistent. One example of inconsistent utility assignment is the endowment effect, where agents assign more value to property they own than to the same property they don’t own. The inconsistency considered here is stronger than the endowment effect and similar phenomena that we can surmount with effort, as professional traders must do. Despite the effect, there is a real answer to how much utility an outcome affords; the endowment effect is a bias, which willpower or habit can neutralize.

The conflict between subagents within a single person, on the other hand, can’t be resolved by means of a common criterion, such as market price, since two subagents pursue different ends. Which of these subagents dominates depends on situational and personological factors that elicit one or the other, not on overcoming bias. Construal-level theory reveals a conflict between intrapersonal subagents, near-mode and far-mode, integrated mindsets applied to matter experienced at fine or broad granularities. Modes (or “construal levels”) differ in that far-mode is more future-oriented and principled, near-mode, present-oriented and contextual. Far-mode and near-mode are elicited by the way social choices are made: voting elicits far-mode and market choices, near-mode; the utility of a choice depends on construal level.

Take a policy choice: how much wealth should be spent on preventive medicine? There are two basic ways of allocating resources to medical care, political process and the market, socialized medicine being an example of political process, private medicine, the market. Socialized medicine makes allocating funds for medical care a political decision; the market makes it each consumer’s personal choice. When you compare the utility of the choices by political process with those on the market, you should expect to find that when people choose politically, they use far-mode thinking encouraged by voting; whereas when they make purchases, they use near-mode thinking encouraged by the market. The preventive-care expenditure will be higher under socialized medicine because political process elicits far-mode, which is concerned with future health. People will be more miserly with preventive care under private medicine, where the decision to spend is made by consumer choice in near-mode, where we care more about the present. People favor spending more on preventive care when they vote to tax themselves than when they buy it on the market. Which outcome provides the greater utility—more preventive care or more recreation—is relative to construal level.

The same indeterminacy of utility occurs when comparing decisions made under different political processes, such as local versus central. Local decisions will be near-mode, central decisions far-mode. Assuming socialized medicine, less funding would be available if it were subject to state rather than federal control. Which provides more utility depends on whether the consequences are evaluated in near-mode or far-mode; no thin-utilitarian criterion applies.

Some utilitarians will protest that we should measure experiences rather than wants. The objection misses the argument’s point, which is that utility is relative to mode, a conclusion easiest to see in the public-choice process because the alternatives may be delimited. If the conclusion that utility depends on construal level holds, the same indeterminacies occur in evaluating experience. That apart, when utilitarianism is applied to public policy, present wants rather than experienced satisfaction is the criterion; agents necessarily choose based on present wants whether on the market or the political process.

2. Full-blown utilitarianism stands convicted of moral realism

Full-blown utilitarians are necessarily moral realists, but increasingly they are seen to deny it. While moral realism is widely recognized as absurd, utilitarianism seems to some an attractive ethical philosophy. For the sake of intellectual respectability, utilitarians can appear to reject an anachronistic moral realism while practicing it philosophically.

Full-blown utilitarianism often obscures its differences with thin utilitarianism, which is a questionable doctrine but in accord with ordinary common sense. It emerges from thin utilitarianism by the misdirection of subjecting ethical premises to the test of simplicity, a test appropriate to realist theories exclusively, because simplicity serves truth. A classic illustration: Aristotle theorized that everything on earth that goes up goes down; Newton set out the gravity theory, which applies to all objects, not just those terrestrial, and which predicts that objects can escape the earth’s gravitational field by traveling fast. Scientists confidently bet on Newton well before rockets were invented, and their confidence was vastly increased by the simplicity of Newton’s theory, which made correct predictions concerning all objects. Although philosophers have explained variously the correlation between simplicity and truth, they generally agree that simplicity signals truth. Unless utilitarians can otherwise justify it, searching for a simple moral theory means searching for a true theory.

The full-blown utilitarian seeks a misplaced simplicity by insisting that all entities that can experience happiness, a much simpler criterion than “current citizens,” serve as the beneficiary reference group—including future generations of humans and even beasts, whose existence depends on policy; whereas thin utilitarianism is a democratic convention, serving only the wants of the currently existing citizens. Because they must incorporate future generations into the reference group, utilitarian philosophers have had to accept that a policy-dependent reference group entails a dilemma regarding interpretation of full-blown utilitarianism, with unattractive consequences at both horns, which realize radically different ideals.  In one version, you maximize the average utility obtained by the whole population; in the other you sum the utilities. These interpretations seem almost equally unattractive: the averaging view says that one supremely happy human is better than a billion very happy ones; the adding approach implies that a hundred trillion miserable wretches are better than a billion happy people. To apply a utilitarian standard to scenarios so distant from thin utilitarianism, accepting their consequences because of simplicity’s demands, is to treat moral premises as truths and to practice moral realism, despite contrary self-description. Those agreeing that moral realism is impossible must reject full-blown utilitarianism.
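The arithmetic behind the two horns can be checked with a small sketch. The per-person utility numbers below are invented for illustration (the argument gives none), and "miserable wretch" is assumed to mean a life with small but positive utility.

```python
# Hypothetical per-person utilities; these numbers are assumptions.
SUPREMELY_HAPPY = 100
VERY_HAPPY = 90
HAPPY = 50
MISERABLE = 1  # assumed barely positive


def total_utility(population, utility_per_person):
    """Adding view: sum utility over everyone (all assumed identical)."""
    return population * utility_per_person


def average_utility(population, utility_per_person):
    """Averaging view: total utility divided by population size."""
    return total_utility(population, utility_per_person) / population


# Averaging view: one supremely happy person vs a billion very happy ones.
average_prefers_one = (
    average_utility(1, SUPREMELY_HAPPY) > average_utility(10**9, VERY_HAPPY)
)

# Adding view: a hundred trillion wretches vs a billion happy people.
total_prefers_wretches = (
    total_utility(10**14, MISERABLE) > total_utility(10**9, HAPPY)
)
```

Under these assumed numbers both comparisons come out True, matching the two consequences described above. The adding-view result holds whenever the wretches' per-person utility exceeds the happy utility divided by 100,000.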

Is suicide high-status?

9 Stabilizer 12 February 2013 09:41AM

I sometimes have thoughts of suicide. That does not mean I would ever come within a mile of committing the act of suicide. But my brain does simulate it; though I do try to always reduce such thoughts.

But what I have noticed is that 'suicide' is triggered in my mind whenever I think of some embarrassing event, real or imagined. Or an event in which I'm obviously a low-status actor. This leads me to think that suicide might be a high-status move, in the sense that its goal is to recover status after some event which caused a big drop in status. Consider the following instances when suicide is often considered:

  1. One-sided break-ups of romantic relationships. The party who has been 'dumped' (for the lack of a better word), has obviously taken a giant status hit. In this case, suicide is often threatened. 
  2. A samurai committing seppuku. The samurai has lost in battle. Clearly, a huge drop in status (aka 'honor').
  3. PhD student says he/she can't take it anymore. A PhD is a constant hit in status: you aren't smart enough, you don't have much money, and you don't yet have intellectual status.

Further, suicide (or suicidal behavior leading to death) seems to have conferred status on artists. Examples: Kurt Cobain, Amy Winehouse, Jimi Hendrix, Hunter S. Thompson, Ernest Hemingway, David Foster Wallace and many more. I'm not saying that they committed suicide due to a pressure to achieve high-status (though that may be the case, I'm not sure). What I am saying is that suicide has been associated with high-status.

Further, after a person is dead, he/she is almost always celebrated (at least for a while) and all their faults are forgotten.

My theory: in many low-status situations, an instinctive way to recover status is to say that you are too good for this game and check-out. In fact, children (and adults) will often just leave a game they're not very good at and disparage the rest of the players for playing. And suicide is the ultimate check-out. This theory is motivated by observations of my own brain going through thoughts of suicide. They almost always consist of imagining other people crying about my death and saying what an awesome person he was. And about how he was just too smart to be able to live in this world. 

Do you think this theory has some weight? I'm certain that I'm not the first person to think of this. But a quick Google didn't yield much. Any pointers to literature?

 
