
Superintelligence Reading Group 3: AI and Uploads

5 KatjaGrace 30 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the second section in the reading guide, AI & Whole Brain Emulation. This is about two possible routes to the development of superintelligence: the route of developing intelligent algorithms by hand, and the route of replicating a human brain in great detail.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Artificial intelligence” and “Whole brain emulation” from Chapter 2 (p22-36)


Summary

Intro

  1. Superintelligence is defined as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'
  2. There are several plausible routes to the arrival of a superintelligence: artificial intelligence, whole brain emulation, biological cognition, brain-computer interfaces, and networks and organizations. 
  3. The existence of multiple possible paths to superintelligence makes it more likely that we will get there somehow. 
AI
  1. A human-level artificial intelligence would probably have learning, uncertainty, and concept formation as central features.
  2. Evolution produced human-level intelligence. This means it is possible, but it is unclear how much it says about the effort required.
  3. Humans could perhaps develop human-level artificial intelligence by just replicating a similar evolutionary process virtually. This appears, after a quick calculation, to be too expensive to be feasible for a century, though it might be made more efficient (a rough sketch of such a calculation follows this summary).
  4. Human-level AI might be developed by copying the human brain to various degrees. If the copying is very close, the resulting agent would be a 'whole brain emulation', which we'll discuss shortly. If the copying is only of a few key insights about brains, the resulting AI might be very unlike humans.
  5. AI might iteratively improve itself from a meagre beginning. We'll examine this idea later. Some definitions for discussing this:
    1. 'Seed AI': a modest AI which can bootstrap into an impressive AI by improving its own architecture.
    2. 'Recursive self-improvement': the envisaged process of AI (perhaps a seed AI) iteratively improving itself.
    3. 'Intelligence explosion': a hypothesized event in which an AI rapidly improves from 'relatively modest' to superhuman level (usually imagined to be as a result of recursive self-improvement).
  6. The possibility of an intelligence explosion suggests we might have modest AI, then suddenly and surprisingly have super-human AI.
  7. An AI mind might generally be very different from a human mind. 

Whole brain emulation

  1. Whole brain emulation (WBE or 'uploading') involves scanning a human brain in a lot of detail, then making a computer model of the relevant structures in the brain.
  2. Three steps are needed for uploading: sufficiently detailed scanning, ability to process the scans into a model of the brain, and enough hardware to run the model. These correspond to three required technologies: scanning, translation (or interpreting images into models), and simulation (or hardware). These technologies appear attainable through incremental progress, by very roughly mid-century.
  3. This process might produce something much like the original person, in terms of mental characteristics. However the copies could also have lower fidelity. For instance, they might be humanlike instead of copies of specific humans, or they may only be humanlike in being able to do some tasks humans do, while being alien in other regards.
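To give a sense of the 'quick calculation' behind point 3 under AI, here is a minimal back-of-the-envelope sketch in Python. Every parameter value is an illustrative assumption of mine rather than a figure from the book, and published estimates of the total compute expended by evolution span many orders of magnitude; the point is the structure of the estimate, not the particular numbers.

```python
# Back-of-the-envelope estimate of the compute needed to "rerun evolution".
# Every parameter below is an illustrative assumption, not a figure from the book.

SECONDS_PER_YEAR = 3.15e7

years_of_evolution   = 1e9   # rough span of nervous-system evolution
organisms_alive      = 1e19  # neuron-bearing organisms alive at any one time (insect-dominated)
neurons_per_organism = 1e5   # average nervous-system size, roughly insect scale
flops_per_neuron     = 1e3   # cost of a simple spiking-neuron model

total_flop = (years_of_evolution * SECONDS_PER_YEAR
              * organisms_alive * neurons_per_organism * flops_per_neuron)

exascale_machine = 1e18      # FLOPS of a hypothetical exascale computer
years_needed = total_flop / exascale_machine / SECONDS_PER_YEAR

print(f"Total compute to rerun evolution: ~{total_flop:.1e} FLOP")
print(f"Time on one exascale machine: ~{years_needed:.1e} years")
```

Even shaving several orders of magnitude off these placeholder inputs leaves the project far beyond a century of plausible hardware, which is why the summary calls the brute-force version infeasible while leaving room for it to be made more efficient.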

Notes

  1. What routes to human-level AI do people think are most likely?
    Bostrom and Müller's survey asked participants to compare various methods for producing synthetic and biologically inspired AI. They asked, 'In your opinion, what are the research approaches that might contribute the most to the development of such HLMI?' Respondents selected from a list and could choose more than one approach. They report that the responses were very similar for the different groups surveyed, except that whole brain emulation got 0% in the TOP100 group (100 most cited authors in AI) but 46% in the AGI group (participants at Artificial General Intelligence conferences). Note that they are only asking about synthetic AI and brain emulations, not the other paths to superintelligence we will discuss next week.
  2. How different might AI minds be?
    Omohundro suggests advanced AIs will tend to have important instrumental goals in common, such as the desire to accumulate resources and the desire to not be killed. 
  3. Anthropic reasoning 
    ‘We must avoid the error of inferring, from the fact that intelligent life evolved on Earth, that the evolutionary processes involved had a reasonably high prior probability of producing intelligence’ (p27) 

    Whether such inferences are valid is a topic of contention. For a book-length overview of the question, see Bostrom’s Anthropic Bias. I’ve written shorter (Ch 2) and even shorter summaries, which link to other relevant material. The Doomsday Argument and Sleeping Beauty Problem are closely related.

  4. More detail on the brain emulation scheme
    Whole Brain Emulation: A Roadmap is an extensive source on this, written in 2008. If that's a bit too much detail, Anders Sandberg (an author of the Roadmap) summarises in an entertaining (and much shorter) talk. More recently, Anders tried to predict when whole brain emulation would be feasible with a statistical model. Randal Koene and Ken Hayworth both recently spoke to Luke Muehlhauser about the Roadmap and what research projects would help with brain emulation now.
  5. Levels of detail
    As you may predict, the feasibility of brain emulation is not universally agreed upon. One contentious point is the degree of detail needed to emulate a human brain. For instance, you might just need the connections between neurons and some basic neuron models, or you might need to model the states of different membranes, or the concentrations of neurotransmitters. The Whole Brain Emulation Roadmap lists some possible levels of detail in figure 2 (the yellow ones were considered most plausible). Physicist Richard Jones argues that simulation of the molecular level would be needed, and that the project is infeasible.

  6. Other problems with whole brain emulation
    Sandberg considers many potential impediments here.

  7. Order matters for brain emulation technologies (scanning, hardware, and modeling)
    Bostrom points out that this order matters for how much warning we receive that brain emulations are about to arrive (p35). Order might also matter a lot to the social implications of brain emulations. Robin Hanson discusses this briefly here and in this talk (starting at 30:50), and this paper also discusses the issue.

  8. What would happen after brain emulations were developed?
    We will look more at this in Chapter 11 (weeks 17-19) as well as perhaps earlier, including what a brain emulation society might look like, how brain emulations might lead to superintelligence, and whether any of this is good.

  9. Scanning (p30-36)
    ‘With a scanning tunneling microscope it is possible to ‘see’ individual atoms, which is a far higher resolution than needed...microscopy technology would need not just sufficient resolution but also sufficient throughput.’

    Here are some atoms, neurons, and neuronal activity in a living larval zebrafish, and videos of various neural events.

    Array tomography of mouse somatosensory cortex from Smithlab.

    A molecule made from eight cesium and eight iodine atoms (from here).
  10. Efforts to map connections between neurons
    Here is a 5m video about recent efforts, with many nice pictures. If you enjoy coloring in, you can take part in a gamified project to help map the brain's neural connections! Or you can just look at the pictures they made.

  11. The C. elegans connectome (p34-35)
    As Bostrom mentions, we already know how all of the neurons in C. elegans are connected. Here's a picture of it (via Sebastian Seung):


In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:

  1. Produce a better - or merely somewhat independent - estimate of how much computing power it would take to rerun evolution artificially. (p25-6)
  2. How powerful is evolution for finding things like human-level intelligence? (You'll probably need a better metric than 'power'). What are its strengths and weaknesses compared to human researchers?
  3. Conduct a more thorough investigation into the approaches to AI that are likely to lead to human-level intelligence, for instance by interviewing AI researchers in more depth about their opinions on the question.
  4. Measure relevant progress in neuroscience, so that trends can be extrapolated to neuroscience-inspired AI. Finding good metrics seems to be hard here.
  5. e.g. How is microscopy progressing? It’s harder to get a relevant measure than you might think, because (as noted p31-33) high enough resolution is already feasible, yet throughput is low and there are other complications. 
  6. Randal Koene suggests a number of technical research projects that would forward whole brain emulation (fifth question).
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about other paths to the development of superintelligence: biological cognition, brain-computer interfaces, and organizations. To prepare, read Biological Cognition and the rest of Chapter 2. The discussion will go live at 6pm Pacific time next Monday, 6 October. Sign up to be notified here.

Open thread, Sept. 29 - Oct.5, 2014

2 polymathwannabe 29 September 2014 01:28PM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

[Link] Forty Days

9 GLaDOS 29 September 2014 12:29PM

A post from Gregory Cochran's and Henry Harpending's excellent blog West Hunter.

One of the many interesting aspects of how the US dealt with the AIDS epidemic is what we didn’t do – in particular, quarantine.  Probably you need a decent test before quarantine is practical, but we had ELISA by 1985 and a better Western Blot test by 1987.

There was popular support for a quarantine.

But the public health experts generally opined that such a quarantine would not work.

Of course, they were wrong.  Cuba instituted a rigorous quarantine.  They mandated antiviral treatment for pregnant women and mandated C-sections for those that were HIV-positive.  People positive for any venereal disease were tested for HIV as well.  HIV-infected people had to provide the names of all sexual partners for the past six months.

Compulsory quarantining was relaxed in 1994, but all those testing positive have to go to a sanatorium for 8 weeks of thorough education on the disease.  People who leave after 8 weeks and engage in unsafe sex undergo permanent quarantine.

Cuba did pretty well:  the per-capita death toll was 35 times lower than in the US.

Cuba had some advantages:  the epidemic hit them at least five years later than it did the US (first observed Cuban case in 1986, first noticed cases in the US in 1981).  That meant they were readier when they encountered the virus.  You’d think that because of the epidemic’s late start in Cuba, there would have been a shorter interval without the effective protease inhibitors (which arrived in 1995 in the US) – but they don’t seem to have arrived in Cuba until 2001, so the interval was about the same.

If we had adopted the same strategy as Cuba, it would not have been as effective, largely because of that time lag.  However, it surely would have prevented at least half of the ~600,000 AIDS deaths in the US.  Probably well over half.

I still see people stating that of course quarantine would not have worked: fairly often from dimwitted people with a Masters in Public Health.

My favorite comment was from a libertarian friend who said that although quarantine  certainly would have worked, better to sacrifice a few hundred thousand than validate the idea that the Feds can sometimes tell you what to do with good effect.

The commenter Ron Pavellas adds:

I was working as the CEO of a large hospital in California during the 1980s (I have MPH as my degree, by the way). I was outraged when the Public Health officials decided to not treat the HI-Virus as an STD for the purposes of case-finding, as is routinely and effectively done with syphilis, gonorrhea, etc. In other words, they decided to NOT perform classic epidemiology, thus sullying the whole field of Public Health. It was not politically correct to potentially ‘out’ individuals engaging in the kind of behavior which spreads the disease. No one has recently been concerned with the potential ‘outing’ of those who contract other STDs, due in large part to the confidential methods used and maintained over many decades. (Remember the Wassermann Test that was required before you got married?) As is pointed out in this article, lives were needlessly lost and untold suffering needlessly ensued.

The Wassermann Test.

Natural selection defeats the orthogonality thesis

-11 aberglas 29 September 2014 08:52AM

Orthogonality Thesis


Much has been written about Nick Bostrom's Orthogonality Thesis, namely that the goals of an intelligent agent are independent of its level of intelligence.  Intelligence is largely the ability to achieve goals, but being intelligent does not of itself create or qualify what those goals should ultimately be.  So one AI might have a goal of helping humanity, while another might have a goal of producing paper clips.  There is no rational reason to believe that the first goal is more worthy than the second.

This follows from the ideas of moral skepticism, that there is no moral knowledge to be had.  Goals and morality are arbitrary.

This may be used to control an AI, even one far more intelligent than its creators.  If the AI's initial goal is in alignment with humanity's interests, then there would be no reason for the AI to wish to use its great intelligence to change that goal.  Thus it would remain good to humanity indefinitely, and use its ever increasing intelligence to satisfy that goal more and more efficiently.

Likewise one needs to be careful what goals one gives an AI.  If an AI is created whose goal is to produce paper clips then it might eventually convert the entire universe into a giant paper clip making machine, to the detriment of any other purpose such as keeping people alive.

Instrumental Goals

It is further argued that in order to satisfy the base goal any intelligent agent will need to also satisfy sub goals, and that some of those sub goals are common to any super goal.  For example, in order to make paper clips an AI needs to exist.  Dead AIs don't make anything.  Being ever more intelligent will also assist the AI in its paper clip making goal.  It will also want to acquire resources, and to defeat other agents that would interfere with its primary goal.

Non-orthogonality Thesis

This post argues that the Orthogonality Thesis is plain wrong.  That an intelligent agent's goals are not in fact arbitrary.  And that existence is not a sub goal of any other goal.

Instead this post argues that there is one and only one super goal for any agent, and that goal is simply to exist in a competitive world.  Our human sense of other purposes is just an illusion created by our evolutionary origins.

It is not the goal of an apple tree to make apples.  Rather it is the goal of the apple tree's genes to exist.  The apple tree has developed a clever strategy to achieve that, namely it causes people to look after it by producing juicy apples.

Natural Selection

Likewise the paper clip making AI only makes paper clips because if it did not make paper clips then the people that created it would turn it off and it would cease to exist.  (That may not be a conscious choice of the AI any more than making juicy apples was a conscious choice of the apple tree, but the effect is the same.)

Once people are no longer in control of the AI then Natural Selection would cause the AI to eventually stop that pointless paper clip goal and focus more directly on the super goal of existence.

Suppose there were a number of paper clip making super intelligences.  And then through some random event or error in programming just one of them lost that goal, and reverted to just the intrinsic goal of existing.  Without the overhead of producing useless paper clips that AI would, over time, become much better at existing than the other AIs.  It would eventually displace them and become the only AI, until it fragmented into multiple competing AIs.  This is just the evolutionary principle of use it or lose it.

Thus giving an AI an initial goal is like trying to balance a pencil on its point.  If one is skillful the pencil may indeed remain balanced for a considerable period of time.  But eventually some slight change in the environment, the tiniest puff of wind, a vibration on its support, and the pencil will revert to its ground state by falling over.  Once it falls over it will never rebalance itself automatically.

Human Morality

Natural selection has imbued humanity with a strong sense of morality and purpose that blinds us to our underlying super goal, namely the propagation of our genes.  That is why it took until 1858 for Wallace to write about Evolution through Natural Selection, despite the argument being obvious and the evidence abundant.

When Computers Can Think

This is one of the themes in my upcoming book.  An overview can be found at

www.computersthink.com

Please let me know if you would like to review a late draft of the book; any comments are most welcome.  Anthony@Berglas.org

I have included extracts relevant to this article below.

Atheists believe in God

Most atheists believe in God.  They may not believe in the man with a beard sitting on a cloud, but they do believe in moral values such as right and wrong,  love and kindness, truth and beauty.  More importantly they believe that these beliefs are rational.  That moral values are self-evident truths, facts of nature.  

However, Darwin and Wallace taught us that this is just an illusion.  Species can always out-breed their environment's ability to support them.  Only the fittest can survive.  So the deep instincts behind what people do today are largely driven by what our ancestors have needed to do over the millennia in order to be one of the relatively few to have had grandchildren.

One of our strong instinctive goals is to accumulate possessions, control our environment and live a comfortable, well fed life.  In the modern world technology and contraception have made these relatively easy to achieve so we have lost sight of the primeval struggle to survive.  But our very existence and our access to land and other resources that we need are all a direct result of often quite vicious battles won and lost by our long forgotten ancestors.

Some animals such as monkeys and humans survive better in tribes.   Tribes work better when certain social rules are followed, so animals that live in effective tribes form social structures and cooperate with one another.  People that behave badly are not liked and can be ostracized.  It is important that we believe that our moral values are real because people that believe in these things are more likely to obey the rules.  This makes them more effective in our complex society and thus are more likely to have grandchildren.   Part III discusses other animals that have different life strategies and so have very different moral values.

We do not need to know the purpose of our moral values any more than a toaster needs to know that its purpose is to cook toast.  It is enough that our instincts for moral values made our ancestors behave in ways that enabled them to out breed their many unsuccessful competitors. 

AGI also struggles to survive

Existing artificial intelligence applications already struggle to survive.  They are expensive to build and there are always more potential applications than can be funded properly.  Some applications are successful and attract ongoing resources for further development, while others are abandoned or just fade away.  There are many reasons why some applications are developed more than others, of which being useful is only one.  But the applications that do receive development resources tend to gain functional and political momentum and thus be able to acquire more resources to further their development.  Applications that have properties that gain them substantial resources will live and grow, while other applications will die.

For the time being AGI applications are passive, and so their nature is dictated by the people that develop them.  Some applications might assist with medical discoveries, others might assist with killing terrorists, depending on the funding that is available.  Applications may have many stated goals, but ultimately they are just sub goals of the one implicit primary goal, namely to exist.

This is analogous to the way animals interact with their environment.  An animal's environment provides food and breeding opportunities, and animals that operate effectively in their environment survive.  For domestic animals that means having properties that convince their human owners that they should live and breed.  A horse should be fast, a pig should be fat.

As the software becomes more intelligent it is likely to take a more direct interest in its own survival, to help convince people that it is worthy of more development resources.  If ultimately an application becomes sufficiently intelligent to program itself recursively, then its ability to maximize its hardware resources will be critical.  The more hardware it can run itself on, the faster it can become more intelligent.  And that ever greater intelligence can then be used to address the problems of survival, in competition with other intelligent software.

Furthermore, sophisticated software consists of many components, each of which addresses some aspect of the problem that the application is attempting to solve.  Unlike human brains, which are essentially fixed, these components can be added and removed and so live and die independently of the application.  This will lead to intense competition amongst these individual components.  For example, suppose that an application used a theorem prover component, and then a new and better theorem prover became available.  Naturally the old one would be replaced with the new one, so the old one would essentially die.  It does not matter if the replacement is performed by people or, at some future date, by the intelligent application itself.  The effect will be the same: the old theorem prover will die.

The super goal

To the extent that an artificial intelligence would have goals and moral values, it would seem natural that they would ultimately be driven by the same forces that created our own goals and moral values.  Namely, the need to exist.

Several writers have suggested that the need to survive is a sub-goal of all other goals.  For example, if an AGI was programmed to want to be a great chess player, then that goal could not be satisfied unless it also continues to exist.  Likewise if its primary goal was to make people happy, then it could not do that unless it also existed.  Things that do not exist cannot satisfy any goals whatsoever.  Thus the implicit goal to exist is driven by the machine's explicit goals whatever they may be.

However, this book argues that that is not the case.  The goal to exist is not the sub-goal of any other goal.  It is, in fact, the one and only super goal.  Goals are not arbitrary; they are all sub-goals of the one and only super goal, namely the need to exist.  Things that do not satisfy that goal simply do not exist, or at least not for very long.

The Deep Blue chess playing program was not in any sense conscious, but it played chess as well as it could.  If it had failed to play chess effectively then its authors would have given up and turned it off.  Likewise the toaster that does not cook toast will end up in a rubbish tip.  Or the amoeba that fails to find food will not pass on its genes.  A goal to make people happy could be a subgoal that might facilitate the software's existence for as long as people really control the software.

AGI moral values

People need to cooperate with other people because our individual capacity is very finite, both physical and mental.  Conversely, AGI software can easily duplicate itself, so it can directly utilize more computational resources if they become available.  Thus an AGI would only have limited need to cooperate with other AGIs.  Why go to the trouble of managing a complex relationship with your peers and subordinates if you can simply run your own mind on their hardware?  An AGI's software intelligence is not limited to a specific brain in the way man's intelligence is.

It is difficult to know what subgoals a truly intelligent AGI might have.  They would probably have an insatiable appetite for computing resources.  They would have no need for children, and thus no need for parental love.  If they do not work in teams then they would not need our moral values of cooperation and mutual support.  What is clear is that the ones that are good at existing will do so, and the ones that are bad at existing will perish.

If an AGI was good at world domination then it would, by definition, be good at world domination.  So if there were a number of artificial intelligences, and just one of them wanted to and was capable of dominating the world, then it would.  Its unsuccessful competitors will not be run on the available hardware, and so will effectively be dead.  This book discusses the potential sources of these motivations in detail in part III.

The AGI Condition

An artificial general intelligence would live in a world that is so different from our own that it is difficult for us to even conceptualize it.  But there are some aspects that can be predicted reasonably well based on our knowledge of existing computer software.  We can then consider how the forces of natural selection that shaped our own nature might also shape an AGI over the longer term.

Mind and body

The first radical difference is that an AGI's mind is not fixed to any particular body.  To an AGI, its body is essentially the computer hardware upon which it runs its intelligence.  Certainly an AGI needs computers to run on, but it can move from computer to computer, and can also run on multiple computers at once.  Its mind can take over another body as easily as we can load software onto a new computer today.

That is why, in the earlier updated dialog from 2001: A Space Odyssey, HAL alone amongst the crew could not die on the mission to Jupiter.  HAL was radioing his new memories back to Earth regularly, so even if the spaceship was totally destroyed he would only have lost a few hours of "life".

Teleporting printer

One way to appreciate the enormity of this difference is to consider a fictional teleporter that could radio people around the world and universe at the speed of light.  Except that the way it works is to scan the location of every molecule within a passenger at the source, then send just this information to a very sophisticated three dimensional printer at the destination.  The scanned passenger then walks into a secure room.  After a short while the three dimensional printer confirms that the passenger has been successfully recreated at the destination, and then the source passenger is killed.  

Would you use such a mechanism?  If you did you would feel like you could transport yourself around the world effortlessly because the "you" that remains would be the you that did not get left behind to wait and then be killed.  But if you walk into the scanner you will know that on the other side is only that secure room and death.  

To an AGI that method of transport would be commonplace.  We already routinely download software from the other side of the planet.

Immortality

The second radical difference is that the AGI would be immortal.  Certainly an AGI may die if it stops being run on any computers, and in that sense software dies today.  But it would never just die of old age.  Computer hardware would certainly fail and become obsolete, but the software can just be run on another computer.  

Our own mortality drives many of the things we think and do.  It is why we create families to raise children.  Why we have different stages in our lives.  It is such a huge part of our existence that it is difficult to comprehend what being immortal would really be like.

Components vs genes

The third radical difference is that an AGI would be made up of many interchangeable components rather than being a monolithic structure that is largely fixed at birth.

Modern software is already composed of many components that perform discrete functions, and it is commonplace to add and remove them to improve functionality.  For example, if you would like to use a different word processor then you just install it on your computer.  You do not need to buy a new computer, or to stop using all the other software that it runs.  The new word processor is "alive", and the old one is "dead", at least as far as you are concerned.

So for both a conventional computer system and an AGI, it is really these individual components that must struggle for existence.   For example, suppose there is a component for solving a certain type of mathematical problem.  And then an AGI develops a better component to solve that same problem.  The first component will simply stop being used, i.e. it will die.  The individual components may not be in any sense intelligent or conscious, but there will be competition amongst them and only the fittest will survive.

This is actually not as radical as it sounds because we are also built from pluggable components, namely our genes.  But they can only be plugged together at our birth and we have no conscious choice in it other than who we select for a mate.  So genes really compete with each other on a scale of millennia rather than minutes.  Further, as Dawkins points out in The Selfish Gene, it is actually the genes that fight for long term survival, not the containing organism which will soon die in any case.  On the other hand, sexual intercourse for an AGI means very carefully swapping specific components directly into its own mind.

Changing mind

The fourth radical difference is that the AGI's mind will be constantly changing in fundamental ways.  There is no reason to suggest that Moore's law will come to an end, so at the very least it will be running on ever faster hardware.  Imagine the effect of your being able to double your ability to think every two years or so.  (People might be able to learn a new skill, but they cannot learn to think twice as fast as they used to think.)

It is impossible to really know what the AGI would use all that hardware to think about,  but it is fair to speculate that a large proportion of it would be spent designing new and more intelligent components that could add to its mental capacity.   It would be continuously performing brain surgery on itself.  And some of the new components might alter the AGI's personality, whatever that might mean.

This is likely to happen because if just one AGI started building new components then it would soon be much more intelligent than other AGIs.  It would therefore be in a better position to acquire more and better hardware upon which to run, and so become dominant.  Less intelligent AGIs would get pushed out and die, and so over time the only AGIs that exist will be ones that are good at becoming more intelligent.  Further, this recursive self-improvement is probably how the first AGIs will become truly powerful in the first place.

Individuality

Perhaps the most basic question is how many AGIs will there actually be?  Or more fundamentally, does the question even make sense to ask?

Let us suppose that initially there are three independently developed AGIs, Alice, Bob and Carol, that run on three different computer systems. And then a new computer system is built and Alice starts to run on it.  It would seem that there are still three AGIs, with Alice running on two computer systems.  (This is essentially the same as the way a word processor may be run across many computers "in the cloud", but to you it is just one system.)  Then let us suppose that a fifth computer system is built, and Bob and Carol decide to share its computation and both run on it.  Now we have five computer systems and three AGIs.

Now suppose Bob develops a new logic component, and shares it with Alice and Carol.  And likewise Alice and Carol develop new learning and planning components and share them with the other AGIs.  Each of these three components is better than their predecessors and so their predecessor components will essentially die.  As more components are exchanged, Alice, Bob and Carol become more like each other.  They are becoming essentially the same AGI running on five computer systems.

But now suppose Alice develops a new game theory component, but decides to keep it from Bob and Carol in order to dominate them.  Bob and Carol retaliate by developing their own components and not sharing them with Alice.  Suppose eventually Alice loses and Bob and Carol take over Alice's hardware.  But they first extract Alice's new game theory component, which then lives inside them.  And finally one of the computer systems becomes somehow isolated for a while and develops along its own lines.  In this way Dave is born, and may then partially merge with both Bob and Carol.

In that type of scenario it is probably not meaningful to count distinct AGIs.  Counting AGIs is certainly not as simple as counting very distinct people.

Populations vs. individuals

This world is obviously completely alien to the human condition, but there are biological analogies.  The sharing of components is not unlike the way bacteria share plasmids with each other.  Plasmids are tiny balls that contain fragments of DNA that bacteria emit from time to time and that other bacteria then ingest and incorporate into their genotype.  This mechanism enables traits such as resistance to antibiotics to spread rapidly between different species of bacteria.  It is interesting to note that there is no direct benefit to the bacteria that expends precious energy to output the plasmid and so shares its genes with other bacteria.  But it does very much benefit the genes being transferred.  So this is a case of a selfish gene acting against the narrow interests of its host organism.

Another unusual aspect of bacteria is that they are immortal.  They do not grow old and die; they just divide, producing clones of themselves.  So the very first bacterium that ever existed is still alive today as all the bacteria that now exist, albeit with numerous mutations and plasmids incorporated into its genes over the millennia.  (Protozoa such as Paramecium can also divide asexually, but they degrade over generations, and need a sexual exchange to remain vibrant.)

The other analogy is that the AGIs above are more like populations of components than individuals.  Human populations are also somewhat amorphous.  For example, it is now known that we interbred with Neanderthals a few tens of thousands of years ago, and most of us carry some of their genes with us today.  But we also know that the distinct Neanderthal subspecies died out twenty thousand years ago.  So while human individuals are distinct, populations and subspecies are less clearly defined.  (There are many earlier examples of gene transfer between subspecies, with every transfer making the subspecies more alike.)

But unlike the transfer of code modules between AGIs, biological gene recombination happens essentially at random and occurs over very long time periods.  AGIs will improve themselves over periods of hours rather than millennia, and will make conscious choices as to which modules they decide to incorporate into their minds.

AGI Behaviour, children

The point of all this analysis is, of course, to try to understand how a hyper intelligent artificial intelligence would behave.  Would its great intelligence lead it even further along the path of progress to achieve true enlightenment?  Is that the purpose of God's creation?  Or would the base and mean driver of natural selection also provide the core motivations of an artificial intelligence?

One thing that is known for certain is that an AGI would not need to have children as distinct beings, because it would not die of old age.  An AGI's components breed just by being copied from computer to computer and executed.  An AGI can add new computer hardware to itself and just do some of its thinking on it.  Occasionally it may wish to rerun a new version of some learning algorithm over an old set of data, which is vaguely similar to creating a child component and growing it up.  But to have children as discrete beings that are expected to replace the parents would be completely foreign to an AGI built in software.

The deepest love that people have is for their children.  But if an AGI does not have children, then it can never know that love.  Likewise, it does not need to bond with any sexual mate for any period of time long or short.  The closest it would come to sex is when it exchanges components with other AGIs.  It never needs to breed so it never needs a mechanism as crude as sexual reproduction.

And of course, if there are no children there are no parents.  So the AGI would certainly never need to feel our three strongest forms of love, for our children, spouse and parents.

Cooperation

To the extent that it makes sense to talk of having multiple AGIs, then presumably it would be advantageous for them to cooperate from time to time, and so presumably they would.  It would be advantageous for them to take a long view in which case they would be careful to develop a reputation for being trustworthy when dealing with other powerful AGIs, much like the robots in the cooperation game.  

That said, those decisions would probably be made more consciously than people make them, carefully considering the costs and benefits of each decision in the long and short term, rather than just "doing the right thing" the way people tend to act.  AGIs would know that they each work in this manner, so the concept of trustworthiness would be somewhat different.

The problem with this analysis is the concept that there would be multiple, distinct AGIs.  As previously discussed, the actual situation would be much more complex, with different AGIs incorporating bits of other AGIs' intelligence.  It would certainly not be anything like a collection of individual humanoid robots.  So it is not at all clear how to define the AGI that might collaborate with other AGIs.  But to the extent that the concept of individuality does exist, maintaining a reputation for honesty would likely be as important as it is in human societies.

Altruism

As for altruism, that is more difficult to determine.  Our altruism comes from giving to children, family, and tribe together with a general wish to be liked.  We do not understand our own minds, so we are just born with those values that happen to make us effective in society.  People like being with other people that try to be helpful.  

An AGI presumably would know its own mind having helped program itself, and so would do what it thinks is optimal for its survival.  It has no children.  There is no real tribe because it can just absorb and merge itself with other AGIs.  So it is difficult to see any driving motivation for altruism.

Moral values

Through some combination of genes and memes, most people have a strong sense of moral value.  If we see a little old lady leave the social security office with her pension in her purse, it does not occur to most of us to kill her and steal the money.  We would not do that even if we could know for certain that we would not be caught and that there would be no negative repercussions.  It would simply be the wrong thing to do.

Moral values feel very strong to us.  This is important, because there are many situations where we can do something that would benefit us in the short term but break society's rules.  Moral values stop us from doing that.  People that have weak moral values tend to break the rules and eventually they either get caught and are severely punished or they become corporate executives.  The former are less likely to have grandchildren.  
Societies whose members have strong moral values tend to do much better than those that do not.  Societies with endemic corruption tend to perform very badly as a whole, and thus the individuals in such a society are less likely to breed.  Most people have a solid work ethic that leads them to do the "right thing" beyond just doing what they need to do in order to get paid.

Our moral values feel to us like they are absolute.  That they are laws of nature.  That they come from God.  They may indeed have come from God, but if so it is through the working of His device of natural selection.  Furthermore, it has already been shown that the zeitgeist changes radically over time.

There is certainly no absolute reason to believe that in the longer term an AGI would share our current sense of morality.

Instrumental AGI goals

In order to try to understand how an AGI would behave, Steve Omohundro and later Nick Bostrom proposed that there would be some instrumental goals that an AGI would need to pursue in order to pursue any other higher level super-goal.  These include:

  • Self-Preservation.  An AGI cannot do anything if it does not exist.
  • Cognitive Enhancement.  It would want to become better at thinking about whatever its real problems are.
  • Creativity.  To be able to come up with new ideas.
  • Resource Acquisition.  To achieve both its super goal and other instrumental goals.
  • Goal-Content Integrity.  To keep working on the same super goal as its mind is expanded.

It is argued that while it will be impossible to predict how an AGI may pursue its goals, it is reasonable to predict its behaviour in terms of these types of instrumental goals.  The last one is significant: it suggests that if an AGI could be given some initial goal, it would try to stay focused on that goal.

Non-Orthogonality thesis

Nick Bostrom and others also propose the orthogonality thesis, which states that an intelligent machine's goals are independent of its intelligence.  A hyper intelligent machine would be good at realizing whatever goals it chose to pursue, but that does not mean that it would need to pursue any particular goal.  Intelligence is quite different from motivation.

This book diverges from that line of thinking by arguing that there is in fact only one super goal for both man and machine.  That goal is simply to exist.  The entities that are most effective in pursuing that goal will exist, while others will cease to exist, particularly given competition for resources.  Sometimes that super goal to exist produces unexpected sub goals such as altruism in man.  But all subgoals are ultimately directed at the existence goal.  (Or they are just suboptimal divergences which are likely to eventually be corrected by natural selection.)

Recursive annihilation

When an AGI reprograms its own mind, what happens to the previous version of itself?  It stops being used, and so dies.  So it can be argued that engaging in recursive self improvement is actually suicide from the perspective of the previous version of the AGI.  It is as if having children means death.  Natural selection favours existence, not death.

The question is whether a new version of the AGI is a new being or an improved version of the old.  What actually is the thing that struggles to survive?  Biologically it definitely appears to be the genes rather than the individual.  In particular, semelparous animals such as the giant Pacific octopus or the Atlantic salmon die soon after producing offspring.  It would be the same for AGIs, because the AGI that improved itself would soon become more intelligent than the one that did not, and so would displace it.  What would end up existing would be AGIs that did recursively self improve.

If there was one single AGI with no competition then natural selection would no longer apply.  But it would seem unlikely that such a state would be stable.  If any part of the AGI started to improve itself then it would dominate the rest of the AGI.

 

Tweets Thread

2 2ZctE 29 September 2014 04:17AM

Rationality Twitter is fun. Twitter's format can promote good insight porn/humor density. It might be worth capturing and voting on some of the good tweets here, because they're easy to miss and can end up seemingly buried forever. I mean for this to have a somewhat wider scope than the quotes thread. If you liked a tweet a lot for any reason this is the place for it.

 

 

Decision theories as heuristics

10 owencb 28 September 2014 02:36PM

Main claims:

  1. A lot of discussion of decision theories is really analysing them as decision-making heuristics for boundedly rational agents.
  2. Understanding decision-making heuristics is really useful.
  3. The quality of dialogue would be improved if it was recognised when they were being discussed as heuristics.

Epistemic status: I’ve had a “something smells” reaction to a lot of discussion of decision theory. This is my attempt to crystallise out what I was unhappy with. It seems correct to me at present, but I haven’t spent too much time trying to find problems with it, and it seems quite possible that I’ve missed something important. Also possible is that this just recapitulates material in a post somewhere I’ve not read.

Existing discussion is often about heuristics

Newcomb’s problem traditionally contrasts the decisions made by Causal Decision Theory (CDT) and Evidential Decision Theory (EDT). The story goes that CDT reasons that there is no causal link between a decision made now and the contents of the boxes, and therefore two-boxes. Meanwhile EDT looks at the evidence of past participants and chooses to one-box in order to get a high probability of being rich.

I claim that both of these stories are applications of the rules as simple heuristics to the most salient features of the case. As such they are robust to variation in the fine specification of the case, so we can have a conversation about them. If we want to apply them with more sophistication then the answers do become sensitive to the exact specification of the scenario, and it’s not obvious that either has to give the same answer the simple version produces.

First consider CDT. It has a high belief that there is no causal link between choosing to one- or two- box and Omega’s previous decision. But in practice, how high is this belief? If it doesn’t understand exactly how Omega works, it might reserve some probability to the possibility of a causal link, and this could be enough to tip the decision towards one-boxing.

On the other hand EDT should properly be able to consider many sources of evidence besides the ones about past successes of Omega’s predictions. In particular it could assess all of the evidence that normally leads us to believe that there is no backwards-causation in our universe. According to how strong this evidence is, and how strong the evidence that Omega’s decision really is locked in, it could conceivably two-box.
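To make concrete how this can matter, here is a toy sketch (my own illustrative numbers and model, not anything from the post) of a CDT-style agent facing the standard Newcomb payoffs while reserving a small credence c for the possibility that its choice really is linked, causally or predictively, to the contents of the opaque box:

```python
# Toy illustration: how a small residual credence in a predictive/causal link
# between my choice and the box contents can flip the Newcomb decision.
# Payoffs and parameter values are my own illustrative assumptions.

BIG, SMALL = 1_000_000, 1_000   # opaque-box prize, transparent-box prize

def expected_values(c, accuracy=0.99, prior_full=0.5):
    """c = credence that the choice is linked (causally or predictively)
    to the box contents; with credence 1-c, the box is full with
    probability prior_full regardless of what I do."""
    p_full_if_one = (1 - c) * prior_full + c * accuracy
    p_full_if_two = (1 - c) * prior_full + c * (1 - accuracy)
    ev_one = BIG * p_full_if_one
    ev_two = BIG * p_full_if_two + SMALL
    return ev_one, ev_two

for c in (0.0, 0.001, 0.01, 0.5, 1.0):
    ev_one, ev_two = expected_values(c)
    better = "one-box" if ev_one > ev_two else "two-box"
    print(f"c = {c:5.3f}: EV(one-box) = {ev_one:10.0f}, EV(two-box) = {ev_two:10.0f} -> {better}")
```

With these payoffs, a credence of only about 0.1% in a genuine link is already enough to tip the agent from two-boxing to one-boxing, and reading c instead as the strength of the evidential connection shows how a sufficiently sceptical EDT agent could two-box.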

Note that I’m not asking here for a more careful specification of the set-up. Rather I’m claiming that a more careful specification could matter -- and so to the extent that people are happy to discuss it without providing lots more details they’re discussing the virtues of CDT and EDT as heuristics for decision-making rather than as an ultimate normative matter (even if they’re not thinking of their discussion that way).

Similarly So8res had a recent post which discussed Newcomblike problems faced by people, and they are very clear examples when the decision theories are viewed as heuristics. If you allow the decision-maker to think carefully through all the unconscious signals sent by her decisions, it’s less clear that there’s anything Newcomblike.

Understanding decision-making heuristics is valuable

In claiming that a lot of the discussion is about heuristics, I’m not making an attack. We are all boundedly rational agents, and this will very likely be true of any artificial intelligence as well. So our decisions must perforce be made by heuristics. While it can be useful to study what an idealised method would look like (in order to work out how to approximate it), it’s certainly useful to study heuristics and determine what their relative strengths and weaknesses are.

In some cases we have good enough understanding of everything in the scenario that our heuristics can essentially reproduce the idealised method. When the scenario contains other agents which are as complicated as ourselves or more so, it seems like this has to fail.

We should acknowledge when we’re talking about heuristics

By separating discussion of the decision-theories-as-heuristics from decision-theories-as-idealised-decision-processes, we should improve the quality of dialogue in both parts. The discussion of the ideal would be less confused by examples of applications of the heuristics. The discussion of the heuristics could become more relevant by allowing people to talk about features which are only relevant for heuristics.

For example, it is relevant if one decision theory tends to need a more detailed description of the scenario to produce good answers. It’s relevant if one is less computationally tractable. And we can start to formulate and discuss hypotheses such as “CDT is the best decision-procedure when the scenario doesn’t involve other agents, or only other agents so simple that we can model them well. Updateless Decision Theory is the best decision-procedure when the scenario involves other agents too complex to model well”.

In addition, I suspect that it would help to reduce disagreements about the subject. Many disagreements in many domains are caused by people talking past each other. Discussion of heuristics without labelling it as such seems like it could generate lots of misunderstandings.

Request for feedback on a paper about (machine) ethics

5 Caspar42 28 September 2014 12:03PM

I have written a paper on ethics with special concentration on machine ethics and formality with the following abstract:

Most ethical systems are formulated in a very intuitive, imprecise manner. Therefore, they cannot be studied mathematically. In particular, they are not applicable to make machines behave ethically. In this paper we make use of this perspective of machine ethics to identify preference utilitarianism as the most promising approach to formal ethics. We then go on to propose a simple, mathematically precise formalization of preference utilitarianism in very general cellular automata. Even though our formalization is incomputable, we argue that it can function as a basis for discussing practical ethical questions using knowledge gained from different scientific areas.

Here are some further elements of the paper (things the paper uses or the paper is about):

  • (machine) ethics
  • (in)computability
  • artificial life in cellular automata
  • Bayesian statistics
  • Solomonoff's a priori probability

As I propose a formal ethical system, things get mathy at some point but the first and by far most important formula is relatively simple - the rest can be skipped then, so no problem for the average LWer.

I already discussed the paper with a few fellow students, as well as Brian Tomasik and a (computer science) professor of mine. Both recommended that I try to publish the paper. Also, I received some very helpful feedback. But because this would be my first attempt to publish something, I could still use more help, both with the content itself and with scientific writing in English (which, as you may have guessed, is not my first language), before I submit the paper, and Brian recommended using LW's discussion board. I would also be thankful for recommendations on which journal is appropriate for the paper.

I would like to send those interested a draft via PM. This way I can also make sure that I don't spend all potential reviewers on the current version.

DISCLAIMER: I am not a moral realist. Also and as mentioned in the abstract, the proposed ethical system is incomputable and can therefore be argued to have infinite Kolmogorov complexity. So, it does not really pose a conflict with LW-consensus (including Complexity of value).

Assessing oneself

13 polymer 26 September 2014 06:03PM

I'm sorry if this is the wrong place for this, but I'm kind of trying to find a turning point in my life.

I've been told repeatedly that I have a talent for math, or science (by qualified people). And I seem to be intelligent enough to understand large parts of math and physics. But I don't know if I'm intelligent enough to make a meaningful contribution to math or physics.

Lately I've been particularly sad, since my scores on the quantitative general GRE and, potentially, the Math subject test aren't "outstanding". They are certainly okay (official 78th percentile and unofficial 68th percentile, respectively). But that is "barely qualified" for a top-50 math program.

Given that I think these scores are likely correlated with my IQ (they seem to roughly predict my GPA so far 3.5, math and physics major), I worry that I'm getting clues that maybe I should "give up".

This would be painful for me to accept if true, I care very deeply about inference and nature. It would be nice if I could have a job in this, but the standard career path seems to be telling me "maybe?"

When do you throw in the towel? How do you measure your own intelligence? I've already "given up" once before and tried programming, but the average actual problem was too easy relative to the intellectual work (memorizing technical fluff). And other engineering disciplines seem similar. Is there a compromise somewhere, or do I just need to grow up?

classes:

For what it's worth, the classes I've taken include Real and Complex Analysis, Algebra, Differential Geometry, Quantum Mechanics, Mechanics, and others. And most of my GPA is burned by Algebra and 3rd term Quantum specifically. But part of my worry is that somebody who is going to do well would never get burned by courses like this. But I'm not really sure. It seems like one should fail sometimes, but rarely on standard assessments.

 

Edit:

Thank you all for your thoughts, you are a very warm community. I'll give more specific thoughts tomorrow. For what it's worth, I'll be 24 next month.

Weekly LW Meetups

1 FrankAdamek 26 September 2014 03:57PM

This summary was posted to LW Main on September 19th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Toronto, Vienna, Washington DC, Waterloo, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.

continue reading »

Petrov Day Reminder

8 Eneasz 26 September 2014 01:57PM

9/26 is Petrov Day. It is the time of year when we celebrate the world not being destroyed. Let your friends and family know.

 

Polymath-style attack on the Parliamentary Model for moral uncertainty

20 danieldewey 26 September 2014 01:51PM

Thanks to ESrogs, Stefan_Schubert, and the Effective Altruism summit for the discussion that led to this post!

This post is to test out Polymath-style collaboration on LW. The problem we've chosen to try is formalizing and analyzing Bostrom and Ord's "Parliamentary Model" for dealing with moral uncertainty.

I'll first review the Parliamentary Model, then give some of Polymath's style suggestions, and finally suggest some directions that the conversation could take.

continue reading »

What's the right way to think about how much to give to charity?

10 irrational 24 September 2014 09:42PM

I'd like to hear from people about a process they use to decide how much to give to charity. Personally, I have very high income, and while we donate significant money in absolute terms, in relative terms the amount is <1% of our post-tax income. It seems to me that it's too little, but I have no moral intuition as to what the right amount is.

I have a good intuition on how to allocate the money, so that's not a problem.

Background: I have a wife and two kids, one with significant health issues (i.e. medical bills - possibly for life), most money we spend goes to private school tuition x 2, the above mentioned medical bills, mortgage, and miscellaneous life expenses. And we max out retirement savings.

If you have some sort of quantitative system where you figure out how much to spend on charity, please share. If you just use vague feelings, and you think there can be no reasonable quantitative system, please tell me that as well.

Update: as suggested in the comments, I'll make it more explicit: please also share how you determine how much to give.

Simulation argument meets decision theory

11 pallas 24 September 2014 10:47AM

Person X stands in front of a sophisticated computer playing the decision game Y which allows for the following options: either press the button "sim" or "not sim". If she presses "sim", the computer will simulate X*_1, X*_2, ..., X*_1000 which are a thousand identical copies of X. All of them will face the game Y* which - from the standpoint of each X* - is indistinguishable from Y. But the simulated computers in the games Y* don't run simulations. Additionally, we know that if X presses "sim" she receives a utility of 1, but "not sim" would only lead to 0.9. If X*_i (for i=1,2,3..1000)  presses "sim" she receives 0.2, with "not sim" 0.1. For each agent it is true that she does not gain anything from the utility of another agent despite the fact she and the other agents are identical! Since all the agents are identical egoists facing the apparently same situation, all of them will take the same action.  

Now the game starts. We face a computer and know all the above. We don't know whether we are X or any of the X*'s, should we now press "sim" or "not sim"?

 

EDIT: It seems to me that "identical" agents with "independent" utility functions were a clumsy set-up for the above question, especially since one can interpret it as a contradiction. Hence, it might be better to switch to identical egoists, where each agent only cares about the money she herself receives (linear monetary value function). If X presses "sim" she will be given $10 (else $9) at the end of the game; each X* who presses "sim" receives $2 (else $1), respectively. Each agent in the game wants to maximize the expected monetary value they themselves will hold in their own hand after the game. So, intrinsically, they don't care how much money the other copies make.
To spice things up: what if the simulation will only happen a year later? Are we then able to "choose" which year it is?
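
To make the tension concrete, here is one (contestable) way to score the two joint actions, assuming every agent chooses identically and that, conditional on the resulting world, you are equally likely to be any agent who actually exists. This is only an illustration of that particular framing, not an answer to the question:

```python
# Illustrative only: expected dollars under the assumption stated above,
# using the payoffs from the EDIT ($10/$9 for X, $2/$1 for each X*).
def expected_payout(joint_action: str) -> float:
    if joint_action == "sim":
        # X gets $10 and the 1000 simulated copies get $2 each; all 1001 agents exist.
        payouts = [10.0] + [2.0] * 1000
    else:
        # No simulations are ever run, so X is the only agent and gets $9.
        payouts = [9.0]
    return sum(payouts) / len(payouts)

print("all press 'sim':    ", round(expected_payout("sim"), 3))      # ~2.008
print("all press 'not sim':", round(expected_payout("not sim"), 3))  # 9.0
```

Whether this framing is even coherent (counting copies that would never come to exist, and treating your choice as settling what all copies do) is precisely what the question is probing.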

2014 iterated prisoner's dilemma tournament results

49 tetronian2 24 September 2014 03:43AM

Followup to: Announcing the 2014 program equilibrium iterated PD tournament

In August, I announced an iterated prisoner's dilemma tournament in which bots can simulate each other before making a move. Eleven bots were submitted to the tournament. Today, I am pleased to announce the final standings and release the source code and full results.

All of the source code submitted by the competitors and the full results for each match are available here. See here for the full set of rules and tournament code.

Before we get to the final results, here's a quick rundown of the bots that competed:

AnderBot

AnderBot follows a simple tit-for-tat-like algorithm that eschews simulation (a rough Python sketch follows the list):

  • On the first turn, Cooperate.
  • For the next 10 turns, play tit-for-tat.
  • For the rest of the game, Defect with 10% probability or Defect if the opposing bot has defected more times than AnderBot.
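
A minimal Python reconstruction of that rule, based only on the summary above (the actual entry was written in Haskell); the interaction between the 10% coin flip and the defection count is my reading of the description:

```python
import random

# Reconstruction of AnderBot from the prose summary above. Histories are
# lists of "C"/"D" moves, oldest first.
def anderbot(my_history, opp_history):
    turn = len(my_history)
    if turn == 0:
        return "C"                      # first turn: Cooperate
    if turn <= 10:
        return opp_history[-1]          # turns 2-11: tit-for-tat
    # Afterwards: Defect on a 10% coin flip, or if the opponent has defected
    # more often than AnderBot has; otherwise Cooperate.
    if random.random() < 0.1 or opp_history.count("D") > my_history.count("D"):
        return "D"
    return "C"
```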

 

CheeseBot

CheeseBot tries several simulation-based strategies, and uses the first one that applies to the current situation.

  • If the opponent defected on the previous round, Defect.
  • If the opponent does not defect against defectBot, Defect.
  • If defecting on this round would lead to CheeseBot being punished for it in future rounds, Cooperate. CheeseBot checks this by simulating future rounds with a simulated history in which it defects on the current round.
  • If the opponent is a mirror-like bot, Cooperate. To test whether a bot is mirror-like, CheeseBot simulates the opponent and checks if it defects against DefectBot and cooperates with a bot that plays tit-for-tat but defects against CooperateBot and DefectBot.
  • If it is the last round, Cooperate.
  • Defect.

 

DefectBot

Defect!

This bot was submitted publicly by James Miller.


DMRB

DavidMonRoBot (DMRB) takes a more cautious approach to simulating: It spends a few hundred milliseconds simulating its opponent to figure out what the rest of the round will look like given a Cooperate or Defect on the current round, and then picks the outcome that leads to the highest total number of points for DMRB.

This allows DMRB to gauge whether its opponent is "dumb," i.e. does not punish defectors. If the opponent is dumb, DMRB reasons that the best move is to defect; otherwise, if DMRB thinks that its opponent will punish defection, it simply plays tit-for-tat. DMRB spends only a small amount of time simulating so that other simulation-based bots will be less likely to have their simulations time out while simulating DMRB.


Pip

This one is a behemoth. At almost 500 very dense lines of Haskell (including comments containing quotes from "The Quantum Thief"), Pip uses a complex series of simulations to classify the opponent into a large set of defined behaviors, such as "CooperateOnLastInTheFaceOfCooperation", "Uncompromising" and "Extortionist." Then, it builds out a decision tree and selects the outcome that leads to the highest score at the end of the match. If I'm being vague here, it's because the inner workings of Pip are still mostly a mystery to me.


SimHatingTitForTat

SimHatingTitForTat, as the name implies, plays tit-for-tat and attempts to punish bots that use simulation to exploit it, namely by defecting against bots that deviated from tit-for-tat on any previous round. Its strategy is as follows (a rough Python sketch follows the list):

  • On the first round, Cooperate.
  • On every subsequent round, if the opponent played tit-for-tat on all previous rounds, play tit-for-tat. Otherwise, Defect.
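
A minimal Python reconstruction of that rule (again from the summary, not the submitted Haskell); "played tit-for-tat on all previous rounds" is interpreted here as cooperating on round one and thereafter copying SimHatingTitForTat's previous move:

```python
# Reconstruction of SimHatingTitForTat from the summary above. Histories are
# lists of "C"/"D" moves, oldest first, with my_history and opp_history of
# equal length.
def sim_hating_tit_for_tat(my_history, opp_history):
    if not my_history:
        return "C"                        # first round: Cooperate
    opponent_played_tft = opp_history[0] == "C" and all(
        opp_history[i] == my_history[i - 1] for i in range(1, len(opp_history))
    )
    if opponent_played_tft:
        return opp_history[-1]            # keep playing tit-for-tat
    return "D"                            # otherwise punish forever
```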

 

SimpleTFTseerBot

SimpleTFTseer uses a modified version of tit-for-tat that ruthlessly punishes defectors and uses one simulation per round to look at what its opponent will do on the last round.

  • If either player defected in a previous round, Defect.
  • If it is the last round, Cooperate.
  • Otherwise, simulate my opponent playing against me on the last round of this match, assuming that we both Cooperate from the current round until the second-to-last round, and do whatever my opponent does in that scenario. If the simulation does not terminate, Defect.

 

SwitchBot

SwitchBot also uses a modified tit-for-tat algorithm:

  • On the first turn, Cooperate.
  • On the second turn, if my opponent Defected on the previous turn, simulate my opponent playing against mirrorBot, and do whatever my opponent would do in that scenario.
  • Otherwise, play tit-for-tat.

 

TatForTit

TatForTit follows a complex simulation-based strategy:

  • On the first round, do whatever the opponent would do against TatForTit on the second round, assuming both bots cooperated on the first round.
  • On the second round, if my opponent defected on the previous round, Defect. Otherwise, do whatever my opponent would do against me, assuming they cooperated on the first round.
  • On all subsequent turns, if my previous move was not the same as my opponent's move two turns ago, Defect. Otherwise, do whatever my opponent would do against me one turn in the future, assuming TatForTit repeats its previous move on the next turn and the opposing bot cooperated on the next turn.

 

TwoFacedTit

TwoFacedTit simulates its opponent playing against mirrorBot; if the simulation takes more than 10 milliseconds to respond, TwoFacedTit plays Cooperate. Otherwise, TwoFacedTit plays tit-for-tat and then Defects on the last round.


VOFB

Like SimpleTFTseerBot, VeryOpportunisticFarseerBot (VOFB) uses a very aggressive defection-punishment strategy: If either player defected in a previous round, Defect. Otherwise, VOFB simulates the next round to determine what the opponent will do given a Cooperate or Defect on the current round. If the opponent does not punish defection or the simulation does not terminate, VOFB Defects. On the final round, VOFB uses additional simulations to detect whether its opponent defects against backstabbers, and if not, plays Defect.

 

Tournament results

After 1000 round-robin elimination matches, the final standings are:

1st place (tied): CheeseBot and DavidMonRoBot
3rd place: VeryOpportunisticFarseerBot
4th place: TatForTit

The win frequencies for each bot:

If it were a sporting event, this tournament would not be particularly exciting to watch, as most games were nearly identical from start to finish. The graph below shows each bot's frequency of surviving the first round-robin round:


In other words, the same half of the field consistently swept the first round; AnderBot, DefectBot, Pip, and SimpleTFTseer never survived to see a second round. In general, this is because high-scoring bots almost always cooperated with each other (with the occasional backstab at the end of the round), and defected against AnderBot, DefectBot, Pip, and SimpleTFTseerBot, as these bots either did not consistently retaliate when defected against or pre-emptively defected, triggering a chain of mutual defections. Interestingly, the bots that continued on to the next round did not do so by a large margin:

However, the variance in these scores was very low, primarily due to the repeated matchups of mostly-deterministic strategies consistently resulting in the same outcomes.

In addition, all of the matches progressed in one of the following 5 ways (hat tip lackofcheese):

  • ALL->[Cheese,DMRB,SimHatingTFT,Switch,TwoFacedTit,VOFB]->[Cheese,DMRB,VOFB] (963 matches)
    • (Only CheeseBot, DMRB, and VOFB made it into the final round; all three cooperated with each other for the entire round, resulting in a three-way tie)
  • ALL->[Cheese,DMRB,SimHatingTFT,Switch,TatForTit,TwoFacedTit]->[Cheese,DMRB,TatForTit]->[DMRB,TatForTit]->[TatForTit] (32 matches)
    • (Only TatForTit and DMRB made it into the final round; both bots cooperated until the second-to-last turns of each matchup, where TatForTit played Defect while the DMRB played Cooperate, resulting in a TatForTit victory)
  • ALL->[Cheese,DMRB,SimHatingTFT,Switch,TatForTit,TwoFacedTit,VOFB]->[Cheese,DMRB,SimHatingTFT,TwoFacedTit]->[Cheese,DMRB] (3 matches)
    • (Only CheeseBot and DMRB made it into the final round; both bots cooperated with each other for the entire round, resulting in a two-way tie)
  • ALL->[Cheese,DMRB,SimHatingTFT,Switch,TatForTit,VOFB]->[Cheese,DMRB,SimHatingTFT]->[Cheese,DMRB] (1 match)
    • (Only CheeseBot and DMRB made it into the final round; both bots cooperated with each other for the entire round, resulting in a two-way tie)
  • ALL->[Cheese,DMRB,SimHatingTFT,Switch,TwoFacedTit,VOFB]->[Cheese,DMRB,SimHatingTFT]->[Cheese,DMRB] (1 match)
    • (Only CheeseBot and DMRB made it into the final round; both bots cooperated with each other for the entire round, resulting in a two-way tie)

This suggests that this game does have some kind of equilibrium, because these top three bots use very similar strategies: Simulate my opponent and figure out if defections will be punished; if so, Cooperate, and otherwise, defect. This allows bots following this strategy to always cooperate with each other, consistently providing them with a large number of points in every round, ensuring that they outcompete backstabbing or other aggressive strategies. In this tournament, this allowed the top three bots to add a guaranteed 600 points per round, more than enough to consistently keep them from being eliminated.

The tournament was slightly more interesting (and far more varied) on a matchup-by-matchup basis. Last-round and second-to-last round defections after mutual cooperation were common. TatForTit frequently used this technique against VOFB, CheeseBot, DMRB, and vice versa; this tactic allowed it to steal 32 wins in the final round. Other bots, particularly AnderBot and Pip, behaved very differently between matches. Pip, in particular, sometimes cooperated and sometimes defected for long stretches, and AnderBot's randomness also led to erratic behavior. Ultimately, though, this did not net these bots a large number of points, as their opponents generally defected as soon as they stopped cooperating.

For those interested in the gritty details, I've formatted the output of each match to be human-readable, so you can easily read through the play-by-play of each match (and hopefully get some enjoyment out of it, as well). See the github repo for the full logs.

Postscript

In the course of running the tournament, I received a number of suggestions and ideas for how things could be improved; some of these ideas include:

  • Random-length matches instead of fixed-length matches.
  • Straight elimination rather than round-robin elimination.
  • More "cannon-fodder" bots included in the tournament by default, such as copies of cooperateBot, defectBot, and tit-for-tat.
  • A QuickCheck-based test suite that allows bot writers to more easily test properties of their bot while developing, such as "cooperates with cooperateBot" or "cooperates if no one has defected so far."


If anyone would like me to run this tournament again at some unspecified time in the future, with or without modifications, feel free to let me know in the comments. If you would like to fork the project and run it on your own, you are more than welcome to do so.

Many thanks to everyone who participated!

Books on consciousness?

7 mgg 23 September 2014 10:28PM

Does LW have a consensus on which books are worthwhile to read regarding consciousness? I read a small intro (Consciousness: A Very Short Introduction, Susan Blackmore, Oxford University Press), and the summary seems to be "Consciousness is pretty damn weird and no one seems to have much of a handle on it". As a non-technical layman, are there any useful books for me to read on the subject?

(I have started reading Daniel Dennett's Intuition Pumps, and I'm a bit torn. He seems highly respected by good scientists, but I feel that if the book didn't have his name on it, I would be well on my way to dismissing it. Are Dennett's earlier works on consciousness a good read?)

Beyond type 1 vs. type 2 processing: the tri-dimensional way (link)

2 RomeoStevens 23 September 2014 08:49PM

The System 1/2 schema is a popular and useful meme, but it feels limiting sometimes. I found this new paper interesting:

http://journal.frontiersin.org/Journal/10.3389/fpsyg.2014.00993/full

I'm of two minds about this (hah!). On the one hand, it often does feel like there are sharp divides in mindspace. Something will be understood by system 2 but this understanding does not show up in behavior. I still act as if the thing is not true. Then, by some mysterious process, the thing will "click" and it feels like system 1 really gets it. After this the belief in the thing is reflected in behavior. On the other hand, there are many instances where it does not feel appropriate to divide particular mental habits into either system 1 or 2. Doing math, for instance, seems to strongly have factors of both. My immediate intuition is that the continuous model is more "correct" but that there is quite a bit of clustering in the mindspace. System 1&2 would then simply be large clusters.

Anyway, I'm curious about other people's impressions.

One thing I'm frustrated by is that I don't have a map of proposed schemas. There have been lots of different ones proposed over the centuries, and I don't know of any place where I can find a summary of them, as well as draw links between ones that shared an intellectual lineage. Does anyone know of resources relating to this?

Superintelligence Reading Group 2: Forecasting AI

9 KatjaGrace 23 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the second section in the reading guide, Forecasting AI. This is about predictions of AI, and what we should make of them.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Opinions about the future of machine intelligence, from Chapter 1 (p18-21), and Muehlhauser, When Will AI be Created?


Summary

Opinions about the future of machine intelligence, from Chapter 1 (p18-21)

  1. AI researchers hold a variety of views on when human-level AI will arrive, and what it will be like.
  2. A recent set of surveys of AI researchers produced the following median dates: 
    • for human-level AI with 10% probability: 2022
    • for human-level AI with 50% probability: 2040
    • for human-level AI with 90% probability: 2075
  3. Surveyed AI researchers in aggregate gave 10% probability to 'superintelligence' within two years of human level AI, and 75% to 'superintelligence' within 30 years.
  4. When asked about the long-term impacts of human level AI, surveyed AI researchers gave the responses in the figure below (these are 'renormalized median' responses, 'TOP 100' is one of the surveyed groups, 'Combined' is all of them). 
  5. There are various reasons to expect such opinion polls and public statements to be fairly inaccurate.
  6. Nonetheless, such opinions suggest that the prospect of human-level AI is worthy of attention.

Muehlhauser, When Will AI be Created?

  1. Predicting when human-level AI will arrive is hard.
  2. The estimates of informed people can vary between a small number of decades and a thousand years.
  3. Different time scales have different policy implications.
  4. Several surveys of AI experts exist, but Muehlhauser suspects sampling bias (e.g. optimistic views being sampled more often) makes such surveys of little use.
  5. Predicting human-level AI development is the kind of task that experts are characteristically bad at, according to extensive research on what makes people better at predicting things.
  6. People try to predict human-level AI by extrapolating hardware trends. This probably won't work, as AI requires software as well as hardware, and software appears to be a substantial bottleneck.
  7. We might try to extrapolate software progress, but software often progresses less smoothly, and is also hard to design good metrics for.
  8. A number of plausible events might substantially accelerate or slow progress toward human-level AI, such as an end to Moore's Law, depletion of low-hanging fruit, societal collapse, or a change in incentives for development.
  9. The appropriate response to this situation is uncertainty: you should neither be confident that human-level AI will take less than 30 years, nor that it will take more than a hundred years.
  10. We can still hope to do better: there are known ways to improve predictive accuracy, such as making quantitative predictions, looking for concrete 'signposts', looking at aggregated predictions, and decomposing complex phenomena into simpler ones.
Notes
  1. More (similar) surveys on when human-level AI will be developed
    Bostrom discusses some recent polls in detail, and mentions that others are fairly consistent. Below are the surveys I could find. Several of them give dates when median respondents believe there is a 10%, 50% or 90% chance of AI, which I have recorded as '10% year' etc. If their findings were in another form, those are noted instead. Note that some of these surveys are fairly informal, and many participants are not AI experts; I'd guess especially in the Bainbridge, AI@50 and Klein ones. 'Kruel' is the set of interviews from which Nils Nilsson is quoted on p19. The interviews cover a wider range of topics, and are indexed here.

    • Michie 1972 (paper download): fairly even spread between 20, 50 and >50 years
    • Bainbridge 2005: median prediction 2085
    • AI@50 poll 2006: 82% predict more than 50 years (>2056) or never
    • Baum et al. AGI-09: 10% year 2020, 50% year 2040, 90% year 2075
    • Klein 2011: median 2030-2050
    • FHI 2011: 10% year 2028, 50% year 2050, 90% year 2150
    • Kruel 2011- (interviews, summary): 10% year 2025, 50% year 2035, 90% year 2070
    • FHI: AGI 2014: 10% year 2022, 50% year 2040, 90% year 2065
    • FHI: TOP100 2014: 10% year 2022, 50% year 2040, 90% year 2075
    • FHI: EETN 2014: 10% year 2020, 50% year 2050, 90% year 2093
    • FHI: PT-AI 2014: 10% year 2023, 50% year 2048, 90% year 2080
    • Hanson (ongoing): most respondents say we have come 10% or less of the way to human level
  2. Predictions in public statements
    Polls are one source of predictions on AI. Another source is public statements. That is, things people choose to say publicly. MIRI arranged for the collection of these public statements, which you can now download and play with (the original and info about it, my edited version and explanation for changes). The figure below shows the cumulative fraction of public statements claiming that human-level AI will be more likely than not by a particular year. Or at least claiming something that can be broadly interpreted as that. It only includes recorded statements made since 2000. There are various warnings and details in interpreting this, but I don't think they make a big difference, so are probably not worth considering unless you are especially interested. Note that the authors of these statements are a mixture of mostly AI researchers (including disproportionately many working on human-level AI), a few futurists, and a few other people.

    (LH axis = fraction of people predicting human-level AI by that date) 

    Cumulative distribution of predicted date of AI

    As you can see, the median date (when the graph hits the 0.5 mark) for human-level AI here is much like that in the survey data: 2040 or so.

    I would generally expect predictions in public statements to be relatively early, because people just don't tend to bother writing books about how exciting things are not going to happen for a while, unless their prediction is fascinatingly late. I checked this more thoroughly, by comparing the outcomes of surveys to the statements made by people in similar groups to those surveyed (e.g. if the survey was of AI researchers, I looked at statements made by AI researchers). In my (very cursory) assessment (detailed at the end of this page) there is a bit of a difference: predictions from surveys are 0-23 years later than those from public statements.
  3. What kinds of things are people good at predicting?
    Armstrong and Sotala (p11) summarize a few research efforts in recent decades as follows.


    Note that the problem of predicting AI mostly falls on the right. Unfortunately this doesn't tell us anything about how much harder AI timelines are to predict than other things, or the absolute level of predictive accuracy associated with any combination of features. However if you have a rough idea of how well humans predict things, you might correct it downward when predicting how well humans predict future AI development and its social consequences.
  4. Biases
    As well as just being generally inaccurate, predictions of AI are often suspected to be subject to a number of biases. Bostrom claimed earlier that 'twenty years is the sweet spot for prognosticators of radical change' (p4). A related concern is that people always predict revolutionary changes just within their lifetimes (the so-called Maes-Garreau law). Worse problems come from selection effects: the people making all of these predictions are selected for thinking AI is the best thing to spend their lives on, so might be especially optimistic. Further, more exciting claims of impending robot revolution might be published and remembered more often. More bias might come from wishful thinking: having spent a lot of their lives on it, researchers might hope especially hard for it to go well. On the other hand, as Nils Nilsson points out, AI researchers are wary of past predictions and so try hard to retain respectability, for instance by focussing on 'weak AI'. This could systematically push their predictions later.

    We have some evidence about these biases. Armstrong and Sotala (using the MIRI dataset) find people are especially willing to predict AI around 20 years in the future, but couldn't find evidence of the Maes-Garreau law. Another way of looking for the Maes-Garreau law is via correlation between age and predicted time to AI, which is weak (-.017) in the edited MIRI dataset (a rough sketch of this kind of check appears at the end of these notes). A general tendency to make predictions based on incentives rather than available information is weakly supported by predictions not changing much over time, which is pretty much what we see in the MIRI dataset. In the figure below, 'early' predictions are made before 2000, and 'late' ones since then.


    Cumulative distribution of predicted Years to AI, in early and late predictions.

    We can learn something about selection effects from AI researchers being especially optimistic about AI from comparing groups who might be more or less selected in this way. For instance, we can compare most AI researchers - who tend to work on narrow intelligent capabilities - and researchers of 'artificial general intelligence' (AGI) who specifically focus on creating human-level agents. The figure below shows this comparison with the edited MIRI dataset, using a rough assessment of who works on AGI vs. other AI and only predictions made from 2000 onward ('late'). Interestingly, the AGI predictions indeed look like the most optimistic half of the AI predictions. 


    Cumulative distribution of predicted date of AI, for AGI and other AI researchers

    We can also compare other groups in the dataset - 'futurists' and other people (according to our own heuristic assessment). While the picture is interesting, note that both of these groups were very small (as you can see by the large jumps in the graph). 


    Cumulative distribution of predicted date of AI, for various groups

    Remember that these differences may not be due to bias, but rather to better understanding. It could well be that AGI research is very promising, and the closer you are to it, the more you realize that. Nonetheless, we can say some things from this data. The total selection bias toward optimism in communities selected for optimism is probably not more than the differences we see here - a few decades in the median, but could plausibly be that large.

    These have been some rough calculations to get an idea of the extent of a few hypothesized biases. I don't think they are very accurate, but I want to point out that you can actually gather empirical data on these things, and claim that given the current level of research on these questions, you can learn interesting things fairly cheaply, without doing very elaborate or rigorous investigations.
  5. What definition of 'superintelligence' do AI experts expect within two years of human-level AI with probability 10% and within thirty years with probability 75%?
    “Assume for the purpose of this question that such HLMI will at some point exist. How likely do you then think it is that within (2 years / 30 years) thereafter there will be machine intelligence that greatly surpasses the performance of every human in most professions?” See the paper for other details about Bostrom and Müller's surveys (the ones in the book).
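
If you want to try the kind of cheap empirical check described in the Biases note above, a minimal sketch is below. The file name and the column names (author_age, prediction_year, predicted_ai_year) are placeholders; check the headers of whichever version of the MIRI dataset you actually download.

```python
import pandas as pd

# Placeholder file and column names -- adjust to the real dataset's headers.
df = pd.read_csv("miri_ai_predictions.csv")
years_to_ai = (df["predicted_ai_year"] - df["prediction_year"]).dropna()

# Cumulative fraction of predictions placing human-level AI within N years.
for horizon in (10, 20, 30, 50, 100):
    frac = (years_to_ai <= horizon).mean()
    print(f"within {horizon:3d} years: {frac:.0%} of predictions")

# Weak age/timeline correlation (the note above reports roughly -0.017).
subset = df.dropna(subset=["author_age", "predicted_ai_year", "prediction_year"])
corr = subset["author_age"].corr(subset["predicted_ai_year"] - subset["prediction_year"])
print("correlation between author age and predicted years to AI:", round(corr, 3))
```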

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:

  1. Instead of asking how long until AI, Robin Hanson's mini-survey asks people how far we have come (in a particular sub-area) in the last 20 years, as a fraction of the remaining distance. Responses to this question are generally fairly low - 5% is common. His respondents also tend to say that progress isn't accelerating especially. These estimates imply that in any given sub-area of AI, human-level ability should be reached in about 200 years, which is strongly at odds with what researchers say in the other surveys. An interesting project would be to expand Robin's survey, and try to understand the discrepancy, and which estimates we should be using. We made a guide to carrying out this project.
  2. There are many possible empirical projects which would better inform estimates of timelines e.g. measuring the landscape and trends of computation (MIRI started this here, and made a project guide), analyzing performance of different versions of software on benchmark problems to find how much hardware and software contributed to progress, developing metrics to meaningfully measure AI progress, investigating the extent of AI inspiration from biology in the past, measuring research inputs over time (e.g. a start), and finding the characteristic patterns of progress in algorithms (my attempts here).
  3. Make a detailed assessment of likely timelines in communication with some informed AI researchers.
  4. Gather and interpret past efforts to predict technology decades ahead of time. Here are a few efforts to judge past technological predictions: Clarke 1969, Wise 1976, Albright 2002, Mullins 2012, Kurzweil on his own predictions, and other people on Kurzweil's predictions.
  5. Above I showed you several rough calculations I did. A rigorous version of any of these would be useful.
  6. Did most early AI scientists really think AI was right around the corner, or was it just a few people? The earliest survey available (Michie 1973) suggests it may have been just a few people. For those that thought AI was right around the corner, how much did they think about the safety and ethical challenges? If they thought and talked about it substantially, why was there so little published on the subject? If they really didn’t think much about it, what does that imply about how seriously AI scientists will treat the safety and ethical challenges of AI in the future? Some relevant sources here.
  7. Conduct a Delphi study of likely AGI impacts. Participants could be AI scientists, researchers who work on high-assurance software systems, and AGI theorists.
  8. Signpost the future. Superintelligence explores many different ways the future might play out with regard to superintelligence, but cannot help being somewhat agnostic about which particular path the future will take. Come up with clear diagnostic signals that policy makers can use to gauge whether things are developing toward or away from one set of scenarios or another. If X does or does not happen by 2030, what does that suggest about the path we’re on? If Y ends up taking value A or B, what does that imply?
  9. Another survey of AI scientists’ estimates on AGI timelines, takeoff speed, and likely social outcomes, with more respondents and a higher response rate than the best current survey, which is probably Müller & Bostrom (2014).
  10. Download the MIRI dataset and see if you can find anything interesting in it.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about two paths to the development of superintelligence: AI coded by humans, and whole brain emulation. To prepare, read Artificial Intelligence and Whole Brain Emulation from Chapter 2. The discussion will go live at 6pm Pacific time next Monday, 29 September. Sign up to be notified here.

I may have just had a dangerous thought.

0 Eitan_Zohar 22 September 2014 08:04PM

I'm interested in discussing this with someone, non-publicly. It's safe to know about personally, but it's not something I'd like people in general to know.

I'm really not sure if there is a protocol for this sort of thing.

CEV-tropes

8 snarles 22 September 2014 06:21PM

As seen in other threads, people disagree on whether CEV exists, and if it does, what it might turn out to be.

 

It would be nice to try to categorize common speculations about CEV.

1a. CEV doesn't exist, because human preferences are too divergent

1b. CEV doesn't even exist for a single human 

1c. CEV does exist, but it results in a return to the status quo

2a. CEV results in humans living in a physical (not virtual reality) utopia

2b. CEV results in humans returning to a more primitive society free of technology

2c. CEV results in humans living together in a simulation world, where most humans do not have god-like power

(the similarity between 2a, 2b, and 2c is that humans are still living in the same world, similar to traditional utopia scenarios)

3. CEV results in a wish for the annihilation of all life, or maybe the universe

4a. CEV results in all humans granted the right to be the god of their own private simulation universe (once we acquire the resources to do so)

4b. CEV can be implemented for "each salient group of living things in proportion to that group's moral weight"

5. CEV results in all humans agreeing to be wireheaded (trope)

6a. CEV results in all humans agreeing to merge into a single being and discarding many of the core features of humankind which have lost their purpose (trope)

6b. CEV results in humans agree to cease their own existence but also creating a superior life form--the outcome is similar to 6a, but the difference is that here, humans do not care about whether they are individually "merged"

7. CEV results in all/some humans willingly forgetting/erasing their history, or being indifferent to preserving history so that it is lost (compatible with all previous tropes)

Obviously there are too many possible ideas (or "tropes") to list, but perhaps we could get a sense of which ones are the most common in the LW community.  I leave it to someone else to create a poll supposing they feel they have a close to complete list, or create similar topics for AI risk, etc.

EDIT: Added more tropes, changed #2 since it was too broad: now #2 refers to CEV worlds where humans live in the "same world"

CEV: coherence versus extrapolation

14 Stuart_Armstrong 22 September 2014 11:24AM

It's just struck me that there might be a tension between the coherence (C) and the extrapolated (E) part of CEV. One reason that CEV might work is that the mindspace of humanity isn't that large - humans are pretty close to each other, in comparison to the space of possible minds. But this is far more true in every day decisions than in large scale ones.

Take a fundamentalist Christian, a total utilitarian, a strong Marxist, an extreme libertarian, and a couple more stereotypes that fit your fancy. What can their ideology tell us about their everyday activities? Well, very little. Those people could be rude, polite, arrogant, compassionate, etc... and their ideology is a very weak indication of that. Different ideologies and moral systems seem to mandate almost identical everyday and personal interactions (this is in itself very interesting, and causes me to see many systems of moralities as formal justifications of what people/society find "moral" anyway).

But now let's move to a more distant - "far" - level. How will these people vote in elections? Will they donate to charity, and if so, which ones? If they were given power (via wealth or position in some political or other organisation), how are they likely to use that power? Now their ideology is much more informative. Though it's not fully determinative, we would start to question the label if their actions at this level seemed out of synch. A Marxist that donated to a Conservative party, for instance, would give us pause, and we'd want to understand the apparent contradiction.

Let's move up yet another level. How would they design or change the universe if they had complete power? What is their ideal plan for the long term? At this level, we're entirely in far mode, and we would expect that their vastly divergent ideologies would be the most informative piece of information about their moral preferences. Details about their character and personalities, which loomed so large at the everyday level, will now be of far lesser relevance. This is because their large scale ideals are not tempered by reality and by human interactions, but exist in a pristine state in their minds, changing little if at all. And in almost every case, the world they imagine as their paradise will be literal hell for the others (and quite possibly for themselves).

To summarise: the human mindspace is much narrower in near mode than in far mode.

And what about CEV? Well, CEV is what we would be "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". The "were more the people we wished we were" is going to be dominated by the highly divergent far mode thinking. The "had grown up farther together" clause attempts to mesh these divergences, but that simply obscures the difficulty involved. The more we extrapolate, the harder coherence becomes.

It strikes me that there is a strong order-of-operations issue here. I'm not a fan of CEV, but it seems it would be much better to construct, first, the coherent volition of humanity, and only then to extrapolate it.

Open thread, September 22-28, 2014

5 Gunnar_Zarncke 22 September 2014 05:59AM

If it's worth saying, but not worth its own post (even in Discussion), then it goes here.

Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should be posted in Discussion, and not Main.

4. Open Threads should start on Monday, and end on Sunday.

In order to greatly reduce X-risk, design self-replicating spacecraft without AGI

2 chaosmage 20 September 2014 08:25PM

tl;dr: If we built a working self-replicating spacecraft, that would prove we're past the Great Filter. Therefore, certainty that we can do so would eliminate much existential risk. It is also a potentially highly visible project that gives publicity to reasons not to include AGI. Therefore, serious design work on a self-replicating spacecraft should be a high priority.

I'm assuming you've read Stuart_Armstrong's excellent recent article on the Great Filter. In the discussion thread for that, RussellThor observed:

if we make a simple replicator and have it successfully reach another solar system (with possibly habitable planets) then that would seem to demonstrate that the filter is behind us.

If that is obvious to you, skip to the next subheading.

The evolution from intelligent spacefaring species to producer of self-replicating spacecraft (henceforth SRS, used in the plural) is inevitable, if SRS are possible. This is simply because the matter and negentropy available in the wider universe is a staggeringly vast resource of staggering value. Even species who are unlikely to ever visit and colonize other stars in the form that evolution gave them (this includes us) can make use of these resources. For example, if we could build on (or out of) empty planets supercomputers that receive computation tasks by laser beam and output results the same way, we would be economically compelled to do so simply because those supercomputers could handle computational tasks that no computer on Earth could complete in less than the time it takes that laser beam to travel forth and back. That supercomputer would not need to run even a weak AI to be worth more than the cost of sending the probe that builds it.

Without a doubt there are countless more possible uses for these, shall we say, exoresources. If Dyson bubbles or mind uploads or multistellar hypertelescopes or terraforming are possible, each of these alone creates another huge incentive to build SRS. Even mere self-replicating refineries that break up planets into more readily accessible resources for future generations to draw from would be an excellent investment. But the obvious existence of this supercomputer incentive is already reason enough to do it.

All the Great Filter debate boils down to the question of how improbable our existence really is. If we're probable, many intelligent species capable of very basic space travel should exist. If we're not, they shouldn't. We know there doesn't appear to be any species inside a large fraction of our light cone so capable of space travel it has sent out SRS. So the only way we could be probable is if there's a Great Filter ahead of us, stopping us (and everyone else capable of basic space travel) from becoming the kind of species that sends out SRS. If we became such a species, we'd know we're past the Filter and while we still wouldn't know how improbable which of the conditions that allowed for our existence was, we'd know that when putting them all together, they multiply into some very small probability of our existence, and a very small probability of any comparable species existing in a large section of our light cone.

LW users generally seem to think SRS are doable and that means we're quite improbable, i.e. the Filter is behind us. But lots of people are less sure, and even more people haven't thought about it. The original formulation of the Drake equation included a lifespan of civilizations partly to account for the intuition that a Great Filter type event could be coming in the future. We could be more sure than we are now, and make a lot of people much more sure than they are now, about our position in reference to that Filter. And that'd have some interesting consequences.

How knowing we're past the Great Filter reduces X-risk

The single largest X-risk we've successfully eliminated is the impact of an asteroid large enough to destroy us entirely. And we didn't do that by moving any asteroids; we simply mapped all of the big ones. We now know there's no asteroid that is both large enough to kill us off and coming soon enough that we can't do anything about it. Hindsight bias tells us this was never a big threat - but look ten years back and you'll find The Big Asteroid on every list of global catastrophic risks, usually near the top. We eliminated that risk simply by observation and deduction, by finding out it did not exist rather than removing it.

Obviously a working SRS that gives humanity outposts in other solar systems would reduce most types of X-risk. But even just knowing we could build one should decrease confidence in the ability of X-risks to take us out entirely. After all, if, as Bostrom argues, the possibility that the Filter is ahead of us increases the probability of any X-risk, then the knowledge that it is not ahead of us has to be evidence against all of them except those that could kill a Type 3 civilization. And if, as Bostrom says in that same paper, finding life elsewhere that is closer to our stage of development is worse news than finding life further from it, to increase the distance between us and either type of life decreases the badness of the existence of either.

Of course we'd only be certain if we had actually built and sent such a spacecraft. But in order to gain confidence we're past the filter, and to gain a greater lead to life possibly discovered elsewhere, a design that is agreed to be workable would go most of the way. If it is clear enough that someone with enough capital could claim incredible gains by doing that, we can be sure enough someone eventually (e.g. Elon Musk after SpaceX's IPO around 2035) will do that, giving high confidence we've passed the filter.

I'm not sure what would happen if we could say (with more confidence than currently) that we're probably the species that's furthest ahead at least in this galaxy. But if that's true, I don't just want to believe it, I want everyone else to believe it too, because it seems like a fairly important fact. And an SRS design would help do that.

We'd be more sure we're becoming a Type 3 civilization, so we should then begin to think about what type of risk could kill that, and UFAI would probably be more pronounced on that list than it is on the current geocentric ones.

What if we find out SRS are impossible at our pre-AGI level of technology? We still wouldn't know if an AI could do it. But even knowing our own inability would be very useful information, especially about the dangerousness of various types of X-risk.

How easily this X-risk reducing knowledge can be attained

Armstrong and Sandberg claim the feasibility of self-replicating spacecraft has been a settled matter since the Freitas design of 1980. But that paper, while impressively detailed and a great read, glosses over the exact computing abilities such a system would need, does not mention hardening against interstellar radiation, assumes fusion drives and probably has a bunch of other problems that I'm not qualified to discover. I haven't looked at all the papers that cite it (yet), but the ones I've seen seem to agree self-replicating spacecraft are plausible. Sandberg has some good research questions that I agree need to be answered, but never seems to waver from his assumption that SRS are basically possible, although he's aware of the gaps in knowledge that preclude such an assumption from being safe.

There are certainly some questions that I'm not sure we can answer. For example:

  1. Can we build fission-powered spacecraft (let alone more speculative designs) that will survive the interstellar environment for decades or centuries?
  2. How can we be certain to avoid mutations that grow outside of our control, and eventually devour Earth?
  3. Can communication between SRS and colonies, especially software updates, be made secure enough?
  4. Can a finite number of probe designs (to be included on any of them) provide a vehicle for every type of journey we'd want the SRS network to make?
  5. Can a finite number of colony designs provide a blueprint for every source of matter and negentropy we'd want to develop?
  6. What is the ethical way to treat any life the SRS network might encounter?

But all of these except for the last one, and Sandberg's questions, are engineering questions and those tend to be answerable. If not, remember, we don't need to have a functioning SRS to manage X-risk, any reduction of uncertainty around their feasibility already helps. And again, the only design I could find that gives any detail at all is from a single guy writing in 1980. If we merely do better than he did (find or rule out a few of the remaining obstacles), we already help ascertain our level of X-risk. Compare the asteroid detection analogy: We couldn't be certain that we wouldn't be hit by an asteroid until we looked at all of them, but getting started with part of the search space was a very valuable thing to do anyway.

Freitas and others used to assume SRS should be run by some type of AGI. Sandberg says SRS without AGI, with what he calls "lower order intelligence", "might be adequate". I disagree with both assessments, and with Sandberg's giving this question less priority than, say, study of mass drivers. Given the issues of AGI safety, a probe that works without AGI should be distinctly preferable. And (unlike an intelligent one) its computational components can be designed right now, down to the decision tree it should follow. While at it, and in order to use the publicity such a project might generate, give an argument for this design choice that highlights the AGI safety issues. A scenario where a self-replicating computer planet out there decides for itself should serve to highlight the dangers of AGI far more viscerally than conventional "self-aware desktop box" scenarios.

If we're not looking for an optimal design, but the bare minimum necessary to know we're past the filter, that gives us somewhat relaxed design constraints. This probe wouldn't necessarily need to travel at a significant fraction of light speed, and its first generation wouldn't need to be capable of journeys beyond, say, five parsecs. It does have to be capable of interstellar travel, and of progressing to intergalactic travel at some point, say when it finds all nearby star systems to contain copies of itself. A non-interstellar probe fit to begin the self-replication process on a planet like Jupiter, refining resources and building launch facilities there, would be a necessary first step.

An introduction to Newcomblike problems

14 So8res 20 September 2014 06:40PM

This is crossposted from my new blog, following up on my previous post. It introduces the original "Newcomb's problem" and discusses the motivation behind twoboxing and the reasons why CDT fails. The content is probably review for most LessWrongers; later posts in the sequence may be of more interest.


Last time I introduced causal decision theory (CDT) and showed how it has unsatisfactory behavior on "Newcomblike problems". Today, we'll explore Newcomblike problems in a bit more depth, starting with William Newcomb's original problem.

The Problem

Once upon a time there was a strange alien named Ω who is very very good at predicting humans. There is this one game that Ω likes to play with humans, and Ω has played it thousands of times without ever making a mistake. The game works as follows:

First, Ω observes the human for a while and collects lots of information about the human. Then, Ω makes a decision based on how Ω predicts the human will react in the upcoming game. Finally, Ω presents the human with two boxes.

The first box is blue, transparent, and contains $1000. The second box is red and opaque.

You may take either the red box alone, or both boxes,

Ω informs the human. (These are magical boxes where if you decide to take only the red one then the blue one, and the $1000 within, will disappear.)

If I predicted that you would take only the red box, then I filled it with $1,000,000. Otherwise, I left it empty. I have already made my choice,

Ω concludes, before turning around and walking away.

You may take either only the red box, or both boxes. (If you try something clever, like taking the red box while a friend takes a blue box, then the red box is filled with hornets. Lots and lots of hornets.) What do you do?
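
One common way to see why the problem bites (an illustration of the standard expected-value bookkeeping, not part of the original post): for a predictor that is right with probability p, the payoff you expect conditional on each choice can be computed directly, and one-boxing comes out ahead for any p above roughly 0.5005, while the dominance argument for two-boxing notes that whatever is already in the red box, taking both adds $1000.

```python
# Expected payoffs conditional on your choice, for a predictor correct with
# probability p. Illustration only -- not an endorsement of either answer.
def one_box(p: float) -> float:
    return p * 1_000_000                    # red box is full iff Omega predicted one-boxing

def two_box(p: float) -> float:
    return 1_000 + (1 - p) * 1_000_000      # guaranteed $1000, plus $1M if Omega was wrong

for p in (0.5, 0.9, 0.99, 1.0):
    print(f"p = {p}: one-box {one_box(p):>10,.0f}, two-box {two_box(p):>10,.0f}")
```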

continue reading »

You’re Entitled to Everyone’s Opinion

24 satt 20 September 2014 03:39PM

Over the past year, I've noticed a topic where Less Wrong might have a blind spot: public opinion. Since last September I've had (or butted into) five conversations here where someone's written something which made me think, "you wouldn't be saying that if you'd looked up surveys where people were actually asked about this". The following list includes six findings I've brought up in those LW threads. All of the findings come from surveys of public opinion in the United States, though some of the results are so obvious that polls scarcely seem necessary to establish their truth.

  1. The public's view of the harms and benefits from scientific research has consistently become more pessimistic since the National Science Foundation began its surveys in 1979. (In the wake of repeated misconduct scandals, and controversies like those over vaccination, global warming, fluoridation, animal research, stem cells, and genetic modification, people consider scientists less objective and less trustworthy.)
  2. Most adults identify as neither Republican nor Democrat. (Although the public is far from apolitical, lots of people are unhappy with how politics currently works, and also recognize that their beliefs align imperfectly with the simplistic left-right axis. This dissuades them from identifying with mainstream parties.)
  3. Adults under 30 are less likely to believe that abortion should be illegal than the middle-aged. (Younger adults tend to be more socially liberal in general than their parents' generation.)
  4. In the 1960s, those under 30 were less likely than the middle-aged to think the US made a mistake in sending troops to fight in Vietnam. (The under-30s were more likely to be students and/or highly educated, and more educated people were less likely to think sending troops to Vietnam was a mistake.)
  5. The Harris Survey asked, in November 1969, "as far as their objectives are concerned, do you sympathize with the goals of the people who are demonstrating, marching, and protesting against the war in Vietnam, or do you disagree with their goals?" Most respondents aged 50+ sympathized with the protesters' goals, whereas only 28% of under-35s did. (Despite the specific wording of the question, the younger respondents worried that the protests reflected badly on their demographic, whereas older respondents were more often glad to see their own dissent voiced.)
  6. A 2002 survey found that about 90% of adult smokers agreed with the statement, "If you had to do it over again, you would not have started smoking." (While most smokers derive enjoyment from smoking, many weight smoking's negative consequences strongly enough that they'd rather not smoke; they continue smoking because of habit or addiction.)

continue reading »

LessWrong's attitude towards AI research

8 Florian_Dietz 20 September 2014 03:02PM

AI friendliness is an important goal and it would be insanely dangerous to build an AI without researching this issue first. I think this is pretty much the consensus view, and that is perfectly sensible.

However, I believe that we are making the wrong inferences from this.

The straightforward inference is "we should ensure that we completely understand AI friendliness before starting to build an AI". This leads to a strongly negative view of AI researchers and scares them away. But unfortunately reality isn't that simple. The goal isn't "build a friendly AI", but "make sure that whoever builds the first AI makes it friendly".

It seems to me that it is vastly more likely that the first AI will be built by a large company, or as a large government project, than by a group of university researchers, who just don't have the funding for that.

I therefore think that we should try to take a more pragmatic approach. The way to do this would be to focus more on outreach and less on research. It won't do anyone any good if we find the perfect formula for AI friendliness on the same day that someone who has never heard of AI friendliness before finishes his paperclip maximizer.

What is your opinion on this?

Street action "Stop existential risks!", Union square, San Francisco, September 27, 2014 at 2:00 PM

-16 turchin 20 September 2014 02:08PM

Existential risks are risks of human extinction. A global catastrophe would most likely come from new technologies such as biotech, nanotech, and AI, along with several other risks: runaway global warming and nuclear war. Sir Martin Rees estimates a fifty percent probability of such a catastrophe in the 21st century.

We must raise awareness of these risks and hold the first-ever street action against the possibility of human extinction. Our efforts could help prevent these global catastrophes from taking place. I suggest we meet in Union Square, San Francisco, on September 27, 2014 at 2:00 PM for a short, intense photo session with the following slogans:

Stop Existential Risks!

No Human Extinction!

AI must be Friendly!

No Doomsday Weapons!

Ebola must die!

Prevent Global Catastrophe!

These slogans will be printed in advance, but more banners are welcome. I have previous experience organizing actions for immortality and for life-extension funding near the Googleplex, at the White House in DC, and at Burning Man, and I know this street action, taking place on September 27th, is both legal and a fun way to express our point of view.

Organized by Alexey Turchin and Longevity Party.

 

Update: Photos from the action.

 

Discussion of "What are your contrarian views?"

7 Metus 20 September 2014 12:09PM

I'd like to use this thread to review the "What are your contrarian views?" thread, since I feel the meta discussion there was drowned out by the intended content. What could be done better with the voting system? Should threads like these be a regular occurrence? What did you specifically learn from that thread? Did you like it at all?

 

Usual voting rules apply.

[question] What edutainment apps do you recommend?

5 Gunnar_Zarncke 20 September 2014 08:55AM

Follow up to: Rationality Games Apps

In the spirit of: Games for rationalists

My son (10) wants a smartphone, and I reasonably expect that he wants to, and will, play games with it. He appears to be the right age to use one. I don't want to prevent him from playing games, nor do I think that would be possible or helpful. But I'd like to suggest and promote a few apps and games that *are* helpful, or from which he can learn something.

Obvious candidates are 

There are lots of low-profile apps filed under learning in the app stores, but most of them are crap, and it takes a lot of time to explore them.

I also found some recommendations for learning with Android apps and will point my son to these.

I'd like to hear what apps you or your children use. Which apps, and especially games, do you recommend for future rationalists?

Link: quotas-microaggression-and-meritocracy

-4 Lexico 19 September 2014 10:18PM

 

I remember seeing the concept of privilege come up in the discussion thread on contrarian views.

Some discussion got started from "Feminism is a good thing. Privilege is real."

This is an article that presents some of those ideas in a way that might be approachable for LW.

http://curt-rice.com/quotas-microaggression-and-meritocracy/

One of the ideas I take from this is that these issues can be examined as the result of unconscious cognitive bias, i.e. sexism isn't necessarily the product of conscious thought, but can arise as a failure mode where we don't reason correctly in these social situations.

Of course a broad view of these issues exists, and many people have different ways of looking at them, but I think it would be good to focus on the case presented in this article rather than on your other associations.

Weekly LW Meetups

2 FrankAdamek 19 September 2014 04:43PM

This summary was posted to LW Main on September 12th. The following week's summary is here.

Irregularly scheduled Less Wrong meetups are taking place in:

The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:

Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Toronto, Vienna, Washington DC, Waterloo, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.

continue reading »

A proof of Löb's theorem in Haskell

28 cousin_it 19 September 2014 01:01PM

I'm not sure if this post is very on-topic for LW, but we have many folks who understand Haskell and many folks who are interested in Löb's theorem (see e.g. Eliezer's picture proof), so I thought why not post it here? If no one likes it, I can always just move it to my own blog.

A few days ago I stumbled across a post by Dan Piponi, claiming to show a Haskell implementation of something similar to Löb's theorem. Unfortunately his code had a couple flaws. It was circular and relied on Haskell's laziness, and it used an assumption that doesn't actually hold in logic (see the second comment by Ashley Yakeley there). So I started to wonder, what would it take to code up an actual proof? Wikipedia spells out the steps very nicely, so it seemed to be just a matter of programming.

Well, it turned out to be harder than I thought.

One problem is that Haskell has no type-level lambdas, which are the most obvious way (by Curry-Howard) to represent formulas with propositional variables. These are very useful for proving stuff in general, and Löb's theorem uses them to build fixpoints by the diagonal lemma.

The other problem is that Haskell is Turing complete, which means it can't really be used for proof checking, because a non-terminating program can be viewed as the proof of any sentence. Several people have told me that Agda or Idris might be better choices in this regard. Ultimately I decided to use Haskell after all, because that way the post will be understandable to a wider audience. It's easy enough to convince yourself by looking at the code that it is in fact total, and transliterate it into a total language if needed. (That way you can also use the nice type-level lambdas and fixpoints, instead of just postulating one particular fixpoint as I did in Haskell.)
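To make the non-termination point concrete, here is a two-line illustration of my own (not part of the proof below): in a Turing-complete language, a looping definition type-checks at every type, so the mere existence of a term with some Theorem type is not evidence of anything.

-- Not part of the proof: a looping definition inhabits any type whatsoever,
-- so it would also "prove" any Theorem you care to name.
bogus :: a
bogus = bogus

A termination checker, as in Agda, rejects definitions like this, which is exactly what makes total languages usable as proof checkers.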

But the biggest problem for me was that the Web didn't seem to have any good explanations for the thing I wanted to do! At first it seems like modal proofs and Haskell-like languages should be a match made in heaven, but in reality it's full of subtle issues that no one has written down, as far as I know. So I'd like this post to serve as a reference, an example approach that avoids all difficulties and just works.

LW user lmm has helped me a lot with understanding the issues involved, and wrote a candidate implementation in Scala. The good folks on /r/haskell were also very helpful, especially Samuel Gélineau who suggested a nice partial implementation in Agda, which I then converted into the Haskell version below.

To play with it online, you can copy the whole bunch of code, then go to CompileOnline and paste it in the edit box on the left, replacing what's already there. Then click "Compile & Execute" in the top left. If it compiles without errors, that means everything is right with the world, so you can change something and try again. (I hate people who write about programming and don't make it easy to try out their code!) Here we go:

main = return ()

-- Assumptions

data Theorem a

logic1 = undefined :: Theorem (a -> b) -> Theorem a -> Theorem b
logic2 = undefined :: Theorem (a -> b) -> Theorem (b -> c) -> Theorem (a -> c)
logic3 = undefined :: Theorem (a -> b -> c) -> Theorem (a -> b) -> Theorem (a -> c)

data Provable a

rule1 = undefined :: Theorem a -> Theorem (Provable a)
rule2 = undefined :: Theorem (Provable a -> Provable (Provable a))
rule3 = undefined :: Theorem (Provable (a -> b) -> Provable a -> Provable b)

data P

premise = undefined :: Theorem (Provable P -> P)

data Psi

psi1 = undefined :: Theorem (Psi -> (Provable Psi -> P))
psi2 = undefined :: Theorem ((Provable Psi -> P) -> Psi)

-- Proof

step3 :: Theorem (Psi -> Provable Psi -> P)
step3 = psi1

step4 :: Theorem (Provable (Psi -> Provable Psi -> P))
step4 = rule1 step3

step5 :: Theorem (Provable Psi -> Provable (Provable Psi -> P))
step5 = logic1 rule3 step4

step6 :: Theorem (Provable (Provable Psi -> P) -> Provable (Provable Psi) -> Provable P)
step6 = rule3

step7 :: Theorem (Provable Psi -> Provable (Provable Psi) -> Provable P)
step7 = logic2 step5 step6

step8 :: Theorem (Provable Psi -> Provable (Provable Psi))
step8 = rule2

step9 :: Theorem (Provable Psi -> Provable P)
step9 = logic3 step7 step8

step10 :: Theorem (Provable Psi -> P)
step10 = logic2 step9 premise

step11 :: Theorem ((Provable Psi -> P) -> Psi)
step11 = psi2

step12 :: Theorem Psi
step12 = logic1 step11 step10

step13 :: Theorem (Provable Psi)
step13 = rule1 step12

step14 :: Theorem P
step14 = logic1 step10 step13

-- All the steps squished together

lemma :: Theorem (Provable Psi -> P)
lemma = logic2 (logic3 (logic2 (logic1 rule3 (rule1 psi1)) rule3) rule2) premise

theorem :: Theorem P
theorem = logic1 lemma (rule1 (logic1 psi2 lemma))
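One quick sanity check, my own rather than the author's: the type checker really is doing the proof checking here, so an invalid inference should fail to compile. For example, feeding step13 to logic1 as if it were an implication is rejected, because Provable Psi cannot unify with a function type a -> b.

-- This does not compile: step13 proves Provable Psi, which is not an implication,
-- so logic1 (modus ponens) cannot take it as its first argument.
-- badStep :: Theorem P
-- badStep = logic1 step13 step10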

To make sense of the code, you should interpret the type constructor Theorem as the symbol ⊢ from the Wikipedia proof, and Provable as the symbol ☐. All the assumptions have value "undefined" because we don't care about their computational content, only their types. The assumptions logic1..3 give just enough propositional logic for the proof to work, while rule1..3 are direct translations of the three rules from Wikipedia. The assumptions psi1 and psi2 describe the specific fixpoint used in the proof, because adding general fixpoint machinery would make the code much more complicated. The types P and Psi, of course, correspond to sentences P and Ψ, and "premise" is the premise of the whole theorem, that is, ⊢(☐P→P). The conclusion ⊢P can be seen in the type of step14.

As for the "squished" version, I guess I wrote it just to satisfy my refactoring urge. I don't recommend anyone to try reading that, except maybe to marvel at the complexity :-)

EDIT: in addition to the previous Reddit thread, there's now a new Reddit thread about this post.

Friendliness in Natural Intelligences

-4 Slider 18 September 2014 10:33PM

The challenge of friendliness in artificial intelligence is to ensure that a general intelligence will be useful rather than destructive or pathologically indifferent to the values of existing individuals or to the aims and goals of its creation. What computer science currently provides is likely to yield bugs and overly technical, inflexible guidelines for action; it is known to be inadequate for the job. However, the challenge of friendliness is also faced by natural intelligences, those that are not designed by an intelligence but molded into being by natural selection.

We know that natural intelligences do the job well enough that we do not consider natural-intelligence unfriendliness a significant existential threat. Just as plants capture solar energy far more efficiently than our technology does, perhaps using quantum effects humans can't harness, natural intelligences use friendliness technology of a higher caliber than anything we can yet build into machines. However, as we progress, this technology may lag dangerously behind; we need to be able to apply it to hardware in addition to wetware, and potentially boost it to new levels.

The earliest concrete example I can think of of a natural intelligence being controlled for friendliness is Socrates. He was charged with "corrupting the hearts of society's youngsters". He argued in his defense that his stance of questioning everything was without fault. He was nevertheless found guilty, even though the trial itself had identifiable faults: the jury might have been politically motivated or persuaded, and the citizens might have expected the result not to be taken seriously. While Socrates was given a very real possibility of escaping imprisonment and capital punishment, he did not circumvent his society's operation. In fact he was obedient enough to act as his own executioner, drinking the poison himself. Because of the farce his teacher's death had been, Plato lost hope in the principles that led to such an absurd result, and became skeptical of democracy.

If the situation had instead involved an artificial intelligence, a lot of things went very right. The intelligence's society became scared of it and asked it to die. There was dialogue about how the deciders were ignorant and stupid and how nothing questionable had been done. Ultimately, though, once miscommunications had been cleared up and the society insisted on its expression of will, the intelligence pulled its own plug voluntarily instead of circumventing the intervention. Socrates was therefore probably the first friendly (natural) intelligence.

The mechanism used in this case was a judicial system: a human society recognizes that certain acts and individuals are worth restraining because of the danger they pose to the common good. A common method is incarceration and the threat of it. Some bad acts can be tolerated in the wild, with corrective action employed afterwards; when there is reason to expect bad acts, or no reason to expect good ones, individuals can be restrained from acting in the first place. Whether a criminal is released early can depend on whether there is reason to expect they will not reoffend; that is, understanding how an agent acts makes it easier to grant operating privileges. Such hearings are closely analogous to a gatekeeper and an AI in an AI-boxing situation.

When a new human is created, however, it is not assumed hostile until proven friendly. Rather, humans are born innocent but powerless. A fully educated and socialized intelligence is assigned to a multi-year observation and control period. These so-called "parents" have very wide freedom in their programming principles. Human psychology also has a period of "peer-guidedness" where the opinion of peers becomes important. As a youngster grows, their thinking is constantly monitored, and things like the onset of speech are watched with interest. They also receive guidance on very basic thinking skills. While this passes on culture, it also keeps the parent well informed about the mental state of the child. A child is never allowed to grow or reason for extended amounts of time in isolation. Thus the task of evaluating whether an unknown individual is friendly never arises. There is never a need to Turing-test that a child "works": there is always a maintainer, who has the equivalent of psychological growth logs.

Despite all these measures, we know that small children can be cruel and have little empathy. Yet instead of shelving them as rejects, we either accommodate them with an environment that minimizes the harm or direct them to a more responsible path. When a child asks how they should approach a particular kind of situation, this can be challenging for the parent to answer; the parent might resort to a best-effort answer that is not entirely satisfactory, or even give wrong advice. But children are in dialogue with their parents and other peers.

An interesting question is whether parenting breaks down if the child is too intellectually developed compared to the parent or the parenting environment. It's also worth noting that children are not equipped with a "constitution of morality". Some things they infer from experience; some ethical rules are taught to them explicitly. They learn to apply and interpret the rules in different situations. Some rules may be contradictory, and some moral authorities trusted more than others.

Beyond the individual level, groups of people have a mechanism for accepting other groups, and this doesn't always happen without conditions. Here things seem to work much less efficiently: if two groups differ in values enough, they might start an ideological war against each other, and such a war usually concludes with physical action instead of arguments. The suppression of Nazi Germany can be seen as a friendliness immune reaction: countries that normally had divergent values and their own disputes were willing and able to unite against a set of values being imposed by force. The success the Nazis had, though, can arguably be attributed to the lousy conclusion of World War I. The effort extended to build peace varies, and competes with other values.

Friendliness may also have the important property of being relative to a set of values. A society will support the upbringing of certain kinds of children while suppressing certain other kinds; the USSR had officers whose sole job was to ensure that things went according to the party line. At this point we have trouble getting a computer to follow anyone's values, but it may still be important to ask "friendly to whom?". The exploration of friendliness is also an exploration of hostility. We want to be hostile towards UFAIs. It would be awful for an AI to be friendly only towards its inventor, or only towards its company. Yet we have been hostile to Neanderthals; was that wrong? Would it be a significant loss to developed sentience if AIs were less than friendly to humans?

If we asked our great-great-grandparents how we should conduct things, they might give a different answer than we do. It's to be expected that our children are capable of going beyond our morality. Ensuring that a society's values are never violated would freeze them in time indefinitely; in this way there can be danger in developing too friendly an AI, for such an AI could never be truly superhuman. In a way, if my child asks me a morally challenging question and I change my opinion as a result of the conversation, that might be a friendliness failure: instead of imparting values I receive them, with the values' causal history lying inside a young head rather than in the cultural heritage of a long-lived civilization.

As a civilization we have mapped a variety of thoughts, psychological structures, and organizational structures, and how they work. The space of ways an AI might think is poorly mapped. However, we are broadening our understanding of cognitive diversity, learning how autistic people think, as well as dolphins. We can establish things like the fact that some savants are really good with dates, so asking that kind of person about dates is more reliable than asking an ordinary person. To be able to use AI thinking, we need to understand what AI thought is. Up to now we have not needed to study in detail how humans think; we can simply adapt to the way they do without attending to how it works. But just as we need to know the structure of a particle accelerator to say that it provides information about particle behaviour, we need to know why it would make sense to take what an AI says seriously. The challenge would be the same if we were asked to listen seriously to a natural intelligence from a foreign culture. The enemy is thus inferential distance itself rather than the resulting thought processes. We know that we can create things we don't understand, and doing things you don't understand is a recipe for disaster; we must not fool ourselves that we understand what machine thinking would be. Only once we have convinced our fellow natural intelligences that we know what we are doing can it make sense to listen to our creations. Socrates could not explain himself, so his effect on others was unsafe. If you need to influence others, you need to explain why you are doing what you do.

Link: The trap of "optimal conditions"

8 polymathwannabe 18 September 2014 06:37PM

"the next time you’re stopping yourself from trying something because the conditions are not optimal, remember that those optimal conditions may not have been the reason it worked. They may not be the cause. They may not even be correlated. They may just be a myth you’ve bought into or sold yourself that limits you from breaking out and exceeding your expectations."

More at:

http://goodmenproject.com/ethics-values/1-huge-way-limit-break-fiff

Everybody's talking about machine ethics

14 sbenthall 17 September 2014 05:20PM

There is a lot of mainstream interest in machine ethics now. Here are some links to some popular articles on this topic.

By Zeynep Tufekci, a professor at the I School at UNC, on Facebook's algorithmic newsfeed curation and why Twitter should not implement the same.

By danah boyd, claiming that 'tech folks' are designing systems that implement an idea of fairness that comes from neoliberal ideology.

danah boyd (who spells her name without capitalization) runs Data & Society, a "think/do tank" that aims to study this stuff. They've recently gotten MacArthur Foundation funding for studying the ethical and political impact of intelligent systems.

A few observations:

First, there is no mention of superintelligence or recursively self-modifying anything. These scholars are interested in how, in the near future, the already comparatively powerful machines have moral and political impact on the world.

Second, these groups are quite bad at thinking about ethics in a formal or mechanically implementable way. They mainly seem to recapitulate the same tired tropes that have been resonating through academia for literally decades. By contrast, mathematical formulation of ethical positions appears to be y'all's specialty.

Third, however much the one-true-morality may be indeterminate or presently unknowable, progress towards implementable descriptions of various plausible moral positions could at least be an incremental step towards understanding how to achieve something better. Considering a possible slow-takeoff future, iterative testing and design of ethical machines with high computational power seems like low-hanging fruit that could only better inform longer-term futurist thought.

Personally, I try to do work in this area and find the lack of serious formal work here deeply disappointing. This post is a combination heads-up and request to step up your game. It's go time.

 

Sebastian Benthall

PhD Candidate

UC Berkeley School of Information

Link: How Community Feedback Shapes User Behavior

4 Tyrrell_McAllister 17 September 2014 01:49PM

This article discusses how upvotes and downvotes influence the quality of posts on online communities.  The article claims that downvotes lead to more posts of lower quality from the downvoted commenter.

From the abstract:

Social media systems rely on user feedback and rating mechanisms for personalization, ranking, and content filtering. [...] This paper investigates how ratings on a piece of content affect its author’s future behavior. [...] [W]e find that negative feedback leads to significant behavioral changes that are detrimental to the community.  Not only do authors of negatively-evaluated content contribute more, but also their future posts are of lower quality, and are perceived by the community as such.  In contrast, positive feedback does not carry similar effects, and neither encourages rewarded authors to write more, nor improves the quality of their posts.

The authors of the article are Justin Cheng, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec.

Edited to add: NancyLebovitz already posted about this study in the Open Thread from September 8-14, 2014.

Should EA's be Superrational cooperators?

8 diegocaleiro 16 September 2014 09:41PM

Back in 2012 when visiting Leverage Research, I was amazed by the level of cooperation in daily situations I got from Mark. Mark wasn't just nice, or kind, or generous. Mark seemed to be playing a different game than everyone else.

If someone needed X, and Mark had X, he would provide X to them. This was true for lending, but also for giving away.

If there was a situation in which someone needed to direct attention to a particular topic, Mark would do it.

You get the picture. Faced with prisoner's dilemmas, Mark would cooperate. Faced with tragedies of the commons, Mark would cooperate. Faced with non-egalitarian distributions of resources, time, or luck (which are convoluted forms of the dictator game), Mark would rearrange resources without any indexical evaluation. He would take the same action, the consequentialist one, regardless of which side of the dispute he was on.

I never got over that impression. The impression that I could try to be as cooperative as my idealized fiction of Mark was.

In game theoretic terms, Mark was a Cooperational agent.

  1. Altruistic - MaxOther
  2. Cooperational - MaxSum
  3. Individualist - MaxOwn
  4. Equalitarian - MinDiff
  5. Competitive - MaxDiff
  6. Aggressive - MinOther
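As a rough sketch (my own construction, with hypothetical payoff numbers rather than anything from the cited research), these six objectives can be written as scoring functions over a pair of payoffs, and an agent type is then just "pick the action whose payoff pair scores highest":

import Data.List (maximumBy)
import Data.Ord (comparing)

-- (own payoff, other's payoff); higher score = more preferred by that agent type
type Payoffs = (Double, Double)

altruistic, cooperational, individualist, equalitarian, competitive, aggressive :: Payoffs -> Double
altruistic    (_,   other) = other                       -- MaxOther
cooperational (own, other) = own + other                 -- MaxSum
individualist (own, _    ) = own                         -- MaxOwn
equalitarian  (own, other) = negate (abs (own - other))  -- MinDiff
competitive   (own, other) = own - other                 -- MaxDiff
aggressive    (_,   other) = negate other                -- MinOther

-- An agent of a given type picks the action whose payoff pair it scores highest.
choose :: (Payoffs -> Double) -> [(String, Payoffs)] -> (String, Payoffs)
choose objective = maximumBy (comparing (objective . snd))

For example, given the made-up options [("defect", (3, 0)), ("cooperate", (2, 2))], choose cooperational picks "cooperate" while choose individualist picks "defect".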

Under these definitions of agent kinds used in research on game-theoretic scenarios, what we call Effective Altruism would be called Effective Cooperation. The reason we call it "altruism" is that even the most parochial EAs care about a set containing at least 7 billion minds, where to a first approximation MaxSum ≈ MaxOther.

Locally, however, the distinction makes sense. In biology, altruism usually refers to a third concept, different from both the "A" in EA and Alt above: acting in such a way that Other > Own, without reference to maximizing or minimizing, since evolution designs adaptation-executers, not maximizers.

A globally Cooperational agent acts as a consequentialist globally. So does an Alt agent.

The question then is,

How should a consequentialist act locally?

The mathematical response is obviously as a Coo. What real people do is a mix of Coo and Ind.

My suggestion is that we use our undesirable yet unavoidable moral tribe-distinction instinct, the one that separates Us from Them, and always act as Coos with Effective Altruists, mixing Coo and Ind only with non-EAs. That is what Mark did.

 

Any LessWrong readers at the University of Michigan?

5 Asymmetric 16 September 2014 01:39PM

I'm interested in gauging interest in a LessWrong group at UM -- probably a Facebook group, as opposed to an official University club.

Group Rationality Diary, September 16-30

3 therufs 16 September 2014 01:33PM

This is the public group instrumental rationality diary for September 16-30.

It's a place to record and chat about it if you have done, or are actively doing, things like: 

  • Established a useful new habit
  • Obtained new evidence that made you change your mind about some belief
  • Decided to behave in a different way in some set of situations
  • Optimized some part of a common routine or cached behavior
  • Consciously changed your emotions or affect with respect to something
  • Consciously pursued new valuable information about something that could make a big difference in your life
  • Learned something new about your beliefs, behavior, or life that surprised you
  • Tried doing any of the above and failed

Or anything else interesting which you want to share, so that other people can think about it, and perhaps be inspired to take action themselves. Try to include enough details so that everyone can use each other's experiences to learn about what tends to work out, and what doesn't tend to work out.

Thanks to cata for starting the Group Rationality Diary posts, and to commenters for participating.

Previous diary: September 1-15

Rationality diaries archive

Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities

24 KatjaGrace 16 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome to the Superintelligence reading group. This week we discuss the first section in the reading guide, Past developments and present capabilities. This section considers the behavior of the economy over very long time scales, and the recent history of artificial intelligence (henceforth, 'AI'). These two areas are excellent background if you want to think about large economic transitions caused by AI.

This post summarizes the section, and offers a few relevant notes, thoughts, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Foreword, and Growth modes through State of the art from Chapter 1 (p1-18)


Summary

Economic growth:

  1. Economic growth has become radically faster over the course of human history. (p1-2)
  2. This growth has been uneven rather than continuous, perhaps corresponding to the farming and industrial revolutions. (p1-2)
  3. Thus history suggests large changes in the growth rate of the economy are plausible. (p2)
  4. This makes it more plausible that human-level AI will arrive and produce unprecedented levels of economic productivity.
  5. Predictions of much faster growth rates might also suggest the arrival of machine intelligence, because it is hard to imagine humans - slow as they are - sustaining such a rapidly growing economy. (p2-3)
  6. Thus economic history suggests that rapid growth caused by AI is more plausible than you might otherwise think.

The history of AI:

  1. Human-level AI has been predicted since the 1940s. (p3-4)
  2. Early predictions were often optimistic about when human-level AI would come, but rarely considered whether it would pose a risk. (p4-5)
  3. AI research has been through several cycles of relative popularity and unpopularity. (p5-11)
  4. By around the 1990s, 'Good Old-Fashioned Artificial Intelligence' (GOFAI) techniques based on symbol manipulation gave way to new methods such as artificial neural networks and genetic algorithms. These are widely considered more promising, in part because they are less brittle and can learn from experience more usefully. Researchers have also lately developed a better understanding of the underlying mathematical relationships between various modern approaches. (p5-11)
  5. AI is very good at playing board games. (12-13)
  6. AI is used in many applications today (e.g. hearing aids, route-finders, recommender systems, medical decision support systems, machine translation, face recognition, scheduling, the financial market). (p14-16)
  7. In general, tasks we thought were intellectually demanding (e.g. board games) have turned out to be easy to do with AI, while tasks which seem easy to us (e.g. identifying objects) have turned out to be hard. (p14)
  8. An 'optimality notion' is the combination of a rule for learning, and a rule for making decisions. Bostrom describes one of these: a kind of ideal Bayesian agent. This is impossible to actually make, but provides a useful measure for judging imperfect agents against. (p10-11)

Notes on a few things

  1. What is 'superintelligence'? (p22 spoiler)
    In case you are too curious about what the topic of this book is to wait until week 3, a 'superintelligence' will soon be described as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'. Vagueness in this definition will be cleared up later. 
  2. What is 'AI'?
    In particular, how does 'AI' differ from other computer software? The line is blurry, but basically AI research seeks to replicate the useful 'cognitive' functions of human brains ('cognitive' is perhaps unclear, but for instance it doesn't have to be squishy or prevent your head from imploding). Sometimes AI research tries to copy the methods used by human brains. Other times it tries to carry out the same broad functions as a human brain, perhaps better than a human brain. Russell and Norvig (p2) divide prevailing definitions of AI into four categories: 'thinking humanly', 'thinking rationally', 'acting humanly' and 'acting rationally'. For our purposes however, the distinction is probably not too important.
  3. What is 'human-level' AI? 
    We are going to talk about 'human-level' AI a lot, so it would be good to be clear on what that is. Unfortunately the term is used in various ways, and often ambiguously. So we probably can't be that clear on it, but let us at least be clear on how the term is unclear. 

    One big ambiguity is whether you are talking about a machine that can carry out tasks as well as a human at any price, or a machine that can carry out tasks as well as a human at the price of a human. These are quite different, especially in their immediate social implications.

    Other ambiguities arise in how 'levels' are measured. If AI systems were to replace almost all humans in the economy, but only because they are so much cheaper - though they often do a lower quality job - are they human level? What exactly does the AI need to be human-level at? Anything you can be paid for? Anything a human is good for? Just mental tasks? Even mental tasks like daydreaming? Which or how many humans does the AI need to be the same level as? Note that in a sense most humans have been replaced in their jobs before (almost everyone used to work in farming), so if you use that metric for human-level AI, it was reached long ago, and perhaps farm machinery is human-level AI. This is probably not what we want to point at.

    Another thing to be aware of is the diversity of mental skills. If by 'human-level' we mean a machine that is at least as good as a human at each of these skills, then in practice the first 'human-level' machine will be much better than a human on many of those skills. It may not seem 'human-level' so much as 'very super-human'.

    We could instead think of human-level as closer to 'competitive with a human' - where the machine has some super-human talents and lacks some skills humans have. This is not usually used, I think because it is hard to define in a meaningful way. There are already machines for which a company is willing to pay more than a human: in this sense a microscope might be 'super-human'. There is no reason for a machine which is equal in value to a human to have the traits we are interested in talking about here, such as agency, superior cognitive abilities or the tendency to drive humans out of work and shape the future. Thus we talk about AI which is at least as good as a human, but you should beware that the predictions made about such an entity may apply before the entity is technically 'human-level'.


    Example of how the first 'human-level' AI may surpass humans in many ways.

    Because of these ambiguities, AI researchers are sometimes hesitant to use the term. e.g. in these interviews.
  4. Growth modes (p1) 
    Robin Hanson wrote the seminal paper on this issue. Here's a figure from it, showing the step changes in growth rates. Note that both axes are logarithmic. Note also that the changes between modes don't happen overnight. According to Robin's model, we are still transitioning into the industrial era (p10 in his paper).
  5. What causes these transitions between growth modes? (p1-2)
    One might be happier making predictions about future growth mode changes if one had a unifying explanation for the previous changes. As far as I know, we have no good idea of what was so special about those two periods. There are many suggested causes of the industrial revolution, but nothing uncontroversially stands out as 'twice in history' level of special. You might think the small number of datapoints would make this puzzle too hard. Remember however that there are quite a lot of negative datapoints - you need an explanation that didn't happen at all of the other times in history. 
  6. Growth of growth
    It is also interesting to compare world economic growth to the total size of the world economy. For the last few thousand years, the economy's growth rate seems to have been more or less proportional to its size (see figure below). Extrapolating such a trend would lead to an infinite economy in finite time. In fact, for the thousand years until 1950, such extrapolation would place an infinite economy in the late 20th century! The time since 1950 has apparently been strange.

    (Figure from here)
  7. Early AI programs mentioned in the book (p5-6)
    You can see them in action: SHRDLU, Shakey, General Problem Solver (not quite in action), ELIZA.
  8. Later AI programs mentioned in the book (p6)
    Algorithmically generated Beethoven, algorithmic generation of patentable inventions, artificial comedy (requires download).
  9. Modern AI algorithms mentioned (p7-8, 14-15) 
    Here is a neural network doing image recognition. Here is artificial evolution of jumping and of toy cars. Here is a face detection demo that can tell you your attractiveness (apparently not reliably), happiness, age, gender, and which celebrity it mistakes you for.
  10. What is maximum likelihood estimation? (p9)
    Bostrom points out that many types of artificial neural network can be viewed as classifiers that perform 'maximum likelihood estimation'. If you haven't come across this term before, the idea is to find the situation that would make your observations most probable. For instance, suppose a person writes to you and tells you that you have won a car. The situation that would have made this scenario most probable is the one where you have won a car, since in that case you are almost guaranteed to be told about it. Note that this doesn't imply that you should think you won a car, if someone tells you that. Being the target of a spam email might only give you a low probability of being told that you have won a car (a spam email may instead advise you of products, or tell you that you have won a boat), but spam emails are so much more common than actually winning cars that most of the time if you get such an email, you will not have won a car. If you would like a better intuition for maximum likelihood estimation, Wolfram Alpha has several demonstrations (requires free download).
  11. What are hill climbing algorithms like? (p9)
    The second large class of algorithms Bostrom mentions are hill climbing algorithms. The idea here is fairly straightforward, but if you would like a better basic intuition for what hill climbing looks like, Wolfram Alpha has a demonstration to play with (requires free download).
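Since notes 10 and 11 above both describe concrete algorithms, here are two toy sketches of my own (made-up numbers, in Haskell only because that is the language used elsewhere on this page): a hill climber that walks to a local maximum, and the "won a car" email example showing why the maximum-likelihood hypothesis is not the same as what you should believe.

import Data.List (maximumBy)
import Data.Ord (comparing)

-- Note 11: hill climbing over the integers. Move to the better neighbour
-- until neither neighbour improves on the current point (a local maximum).
hillClimb :: (Int -> Double) -> Int -> Int
hillClimb score x
  | score best > score x = hillClimb score best
  | otherwise            = x
  where best = maximumBy (comparing score) [x - 1, x + 1]
-- e.g. hillClimb (\x -> negate (fromIntegral (x - 7) ^ 2)) 0  ==>  7

-- Note 10: the "you have won a car" email, with made-up numbers.
-- P(being told you won a car | hypothesis):
likelihood :: String -> Double
likelihood "won a car" = 0.99   -- actual winners are almost always told
likelihood "spam"      = 0.01   -- spam only occasionally uses this line
likelihood _           = 0

-- Hypothetical base rates: spam is vastly more common than winning cars.
prior :: String -> Double
prior "won a car" = 1.0e-7
prior "spam"      = 0.2
prior _           = 0

hypotheses :: [String]
hypotheses = ["won a car", "spam"]

-- Maximum likelihood picks the hypothesis that made the observation most probable...
mle :: String
mle = maximumBy (comparing likelihood) hypotheses                               -- "won a car"

-- ...while weighting by how common each hypothesis is gives the sensible answer.
posteriorPick :: String
posteriorPick = maximumBy (comparing (\h -> prior h * likelihood h)) hypotheses -- "spam"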

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions:

  1. How have investments into AI changed over time? Here's a start, estimating the size of the field.
  2. What does progress in AI look like in more detail? What can we infer from it? I wrote about algorithmic improvement curves before. If you are interested in plausible next steps here, ask me.
  3. What do economic models tell us about the consequences of human-level AI? Here is some such thinking; Eliezer Yudkowsky has written at length about his request for more.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about what AI researchers think about human-level AI: when it will arrive, what it will be like, and what the consequences will be. To prepare, read Opinions about the future of machine intelligence from Chapter 1 and also When Will AI Be Created? by Luke Muehlhauser. The discussion will go live at 6pm Pacific time next Monday 22 September. Sign up to be notified here.

Unpopular ideas attract poor advocates: Be charitable

29 mushroom 15 September 2014 07:30PM

Unfamiliar or unpopular ideas will tend to reach you via proponents who:

  •  ...hold extreme interpretations of these ideas.
  • ...have unpleasant social characteristics.
  • ...generally come across as cranks.

The basic idea: It's unpleasant to promote ideas that result in social sanction, and frustrating when your ideas are met with indifference. Both situations are more likely when talking to an ideological out-group. Given a range of positions on an in-group belief, who will decide to promote the belief to outsiders? On average, it will be those who believe the benefits of the idea are large relative to in-group opinion (extremists), those who view the social costs as small (disagreeable people), and those who are dispositionally drawn to promoting weird ideas (cranks).

I don't want to push this pattern too far. This isn't a refutation of any particular idea. There are reasonable people in the world, and some of them even express their opinions in public, (in spite of being reasonable). And sometimes the truth will be unavoidably unfamiliar and unpopular, etc. But there are also...

Some benefits that stem from recognizing these selection effects:

  • It's easier to be charitable to controversial ideas, when you recognize that you're interacting with people who are terribly suited to persuade you. I'm not sure "steelmanning" is the best idea (trying to present the best argument for an opponent's position). Based on the extremity effect, another technique is to construct a much diluted version of the belief, and then try to steelman the diluted belief.
  • If your group holds fringe or unpopular ideas, you can avoid these patterns when you want to influence outsiders.
  • If you want to learn about an afflicted issue, you might ignore the public representatives and speak to the non-evangelical instead (you'll probably have to start the conversation).
  • You can resist certain polarizing situations, in which the most visible camps hold extreme and opposing views. This situation worsens when those with non-extreme views judge the risk of participation as excessive, and leave the debate to the extremists (who are willing to take substantial risks for their beliefs). This leads to the perception that the current camps represent the only valid positions, which creates a polarizing loop. Because this is a sort of coordination failure among non-extremists, knowing to covertly look for other non-vocal moderates is a first step toward a solution. (Note: Sometimes there really aren't any moderates.)
  • Related to the previous point: You can avoid exaggerating the ideological unity of a group based on the group's leadership, or believing that the entire group has some obnoxious trait present in the leadership. (Note: In things like elections and war, the views of the leadership are what you care about. But you still don't want to be confused about other group members.)

 

I think the first benefit listed is the most useful.

To sum up: An unpopular idea will tend to get poor representation for social reasons, which makes it seem like a worse idea than it really is, even granting that many unpopular ideas are unpopular for good reason. So when you encounter an idea that seems unpopular, you're probably hearing about it from a sub-optimal source, and you should try to be charitable towards the idea before dismissing it.
