
Superintelligence 19: Post-transition formation of a singleton

5 KatjaGrace 20 January 2015 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the nineteenth section in the reading guide: Post-transition formation of a singleton. This corresponds to the last part of Chapter 11.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Post-transition formation of a singleton?” from Chapter 11


Summary

  1. Even if the world remains multipolar through a transition to machine intelligence, a singleton might emerge later, for instance during a transition to a more extreme technology. (p176-7)
  2. If everything is faster after the first transition, a second transition may be more or less likely to produce a singleton. (p177)
  3. Emulations may give rise to 'superorganisms': clans of emulations who care wholly about their group. These would have an advantage because they could avoid agency problems, and make various uses of the ability to delete members. (p178-80) 
  4. Improvements in surveillance resulting from machine intelligence might allow better coordination, however machine intelligence will also make concealment easier, and it is unclear which force will be stronger. (p180-1)
  5. Machine minds may be able to make clearer precommitments than humans, changing the nature of bargaining somewhat. Maybe this would produce a singleton. (p183-4)

Another view

Many of the ideas around superorganisms come from Carl Shulman's paper, Whole Brain Emulation and the Evolution of Superorganisms. Robin Hanson critiques it:

...It seems to me that Shulman actually offers two somewhat different arguments, 1) an abstract argument that future evolution generically leads to superorganisms, because their costs are generally less than their benefits, and 2) a more concrete argument, that emulations in particular have especially low costs and high benefits...

...On the general abstract argument, we see a common pattern in both the evolution of species and human organizations — while winning systems often enforce substantial value sharing and loyalty on small scales, they achieve much less on larger scales. Values tend to be more integrated in a single organism’s brain, relative to larger families or species, and in a team or firm, relative to a nation or world. Value coordination seems hard, especially on larger scales.

This is not especially puzzling theoretically. While there can be huge gains to coordination, especially in war, it is far less obvious just how much one needs value sharing to gain action coordination. There are many other factors that influence coordination, after all; even perfect value matching is consistent with quite poor coordination. It is also far from obvious that values in generic large minds can easily be separated from other large mind parts. When the parts of large systems evolve independently, to adapt to differing local circumstances, their values may also evolve independently. Detecting and eliminating value divergences might in general be quite expensive.

In general, it is not at all obvious that the benefits of more value sharing are worth these costs. And even if more value sharing is worth the costs, that would only imply that value-sharing entities should be a bit larger than they are now, not that they should shift to a world-encompassing extreme.

On Shulman’s more concrete argument, his suggested single-version approach to em value sharing, wherein a single central em only allows (perhaps vast numbers of) brief copies, can suffer from greatly reduced innovation. When em copies are assigned to and adapt to different tasks, there may be no easy way to merge their minds into a single common mind containing all their adaptations. The single em copy that is best at doing an average of tasks, may be much worse at each task than the best em for that task.

Shulman’s other concrete suggestion for sharing em values is “psychological testing, staged situations, and direct observation of their emulation software to form clear pictures of their loyalties.” But genetic and cultural evolution has long tried to make human minds fit well within strongly loyal teams, a task to which we seem well adapted. This suggests that moving our minds closer to a “borg” team ideal would cost us somewhere else, such as in our mental agility.

On the concrete coordination gains that Shulman sees from superorganism ems, most of these gains seem cheaply achievable via simple long-standard human coordination mechanisms: property rights, contracts, and trade. Individual farmers have long faced starvation if they could not extract enough food from their property, and farmers were often out-competed by others who used resources more efficiently.

With ems there is the added advantage that em copies can agree to the “terms” of their life deals before they are created. An em would agree that it starts life with certain resources, and that life will end when it can no longer pay to live. Yes there would be some selection for humans and ems who peacefully accept such deals, but probably much less than needed to get loyal devotion to and shared values with a superorganism.

Yes, with high value sharing ems might be less tempted to steal from other copies of themselves to survive. But this hardly implies that such ems no longer need property rights enforced. They’d need property rights to prevent theft by copies of other ems, including being enslaved by them. Once a property rights system exists, the additional cost of applying it within a set of em copies seems small relative to the likely costs of strong value sharing.

Shulman seems to argue both that superorganisms are a natural endpoint of evolution, and that ems are especially supportive of superorganisms. But at most he has shown that ems organizations may be at a somewhat larger scale, not that they would reach civilization-encompassing scales. In general, creatures who share values can indeed coordinate better, but perhaps not by much, and it can be costly to achieve and maintain shared values. I see no coordinate-by-values free lunch...

Notes

1. The natural endpoint

Bostrom says that a singleton is the natural conclusion of a long-term trend toward larger scales of political integration (p176). It seems helpful here to be more precise about what we mean by singleton. Something like a world government does seem to be a natural conclusion to long term trends. However this seems different to the kind of singleton I took Bostrom to previously be talking about. A world government would by default only make a certain class of decisions, for instance about global level policies. There has been a long term trend for the largest political units to become larger; however, there have always been smaller units as well, making different classes of decisions, down to the individual. I'm not sure how to measure the mass of decisions made by different parties, but it seems like individuals may be making more decisions more freely than ever, and the large political units have less ability than they once did to act against the will of the population. So the long term trend doesn't seem to point to an overpowering ruler of everything.

2. How value-aligned would emulated copies of the same person be?

Bostrom doesn't say exactly how 'emulations that were wholly altruistic toward their copy-siblings' would emerge. It seems to be some combination of natural 'altruism' toward oneself and selection for people who react to copies of themselves with extreme altruism (confirmed by a longer interesting discussion in Shulman's paper). How easily one might select for such people depends on how humans generally react to being copied. In particular, whether they treat a copy like part of themselves, or merely like a very similar acquaintance.

The answer to this doesn't seem obvious. Copies seem likely to agree strongly on questions of global values, such as whether the world should be more capitalistic, or whether it is admirable to work in technology. However I expect many—perhaps most—failures of coordination come from differences in selfish values—e.g. I want me to have money, and you want you to have money. And if you copy a person, it seems fairly likely to me the copies will both still want the money themselves, more or less.

From other examples of similar people—identical twins, family, people and their future selves—it seems people are unusually altruistic to similar people, but still very far from 'wholly altruistic'. Emulation siblings would be much more similar than identical twins, but who knows how far that would move their altruism?

Shulman points out that many people hold views about personal identity that would imply that copies share identity to some extent. The translation between philosophical views and actual motivations is not always complete however.

3. Contemporary family clans

Family-run firms are a place to get some information about the trade-off between reducing agency problems and having access to a wide range of potential employees. From a brief perusal of the internet, it seems ambiguous whether they do better overall. One could try to separate out the factors that help them do better or worse.

4. How big a problem is disloyalty?

I wondered how big a problem insider disloyalty really was for companies and other organizations. Would it really be worth all this loyalty testing? I can't find much about it quickly, but 59% of respondents to a survey apparently said they had some kind of problem with insiders. The same report suggests that a number of costly initiatives, such as intensive psychological testing, are currently on the table to address the problem. Also, apparently it's enough of a problem for someone to be trying to solve it with mind-reading, though that probably doesn't say much.

5. AI already contributing to the surveillance-secrecy arms race

Artificial intelligence will help with surveillance sooner and more broadly than just in observing people's motives; see e.g. here and here.

6. SMBC is also pondering these topics this week



In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. What are the present and historical barriers to coordination, between people and organizations? How much have these been lowered so far? How much difference has it made to the scale of organizations, and to productivity? How much further should we expect these barriers to be lessened as a result of machine intelligence?
  2. Investigate the implications of machine intelligence for surveillance and secrecy in more depth.
  3. Are multipolar scenarios safer than singleton scenarios? Muehlhauser suggests directions.
  4. Explore ideas for safety in a singleton scenario via temporarily multipolar AI. e.g. uploading FAI researchers (see Salamon & Shulman, “Whole Brain Emulation as a platform for creating safe AGI.”)
  5. Which kinds of multipolar scenarios would be more likely to resolve into a singleton, and how quickly?
  6. Can we get whole brain emulation without producing neuromorphic AGI slightly earlier or shortly afterward? See section 3.2 of Eckersley & Sandberg (2013).
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the 'value loading problem'. To prepare, read “The value-loading problem” through “Motivational scaffolding” from Chapter 12. The discussion will go live at 6pm Pacific time next Monday 26 January. Sign up to be notified here.

Slides online from "The Future of AI: Opportunities and Challenges"

12 ciphergoth 16 January 2015 11:17AM

In the first weekend of this year, the Future of Life Institute hosted a landmark conference in Puerto Rico: "The Future of AI: Opportunities and Challenges". The conference was unusual in that it was not made public until it was over, and the discussions were under the Chatham House Rule. The slides from the conference are now available. The list of attendees includes a great many famous names as well as lots of names familiar to those of us on Less Wrong: Elon Musk, Sam Harris, Margaret Boden, Thomas Dietterich, all three DeepMind founders, and many more.

This is shaping up to be another extraordinary year for AI risk concerns going mainstream!

Less exploitable value-updating agent

5 Stuart_Armstrong 13 January 2015 05:19PM

My indifferent value-learning agent design is in some ways too good. The agent transfers perfectly from being a u-maximiser to being a v-maximiser - but this makes it exploitable, as Benja has pointed out.

For instance, if u values paperclips and v values staples, and everyone knows that the agent will soon transfer from a u-maximiser to a v-maximiser, then an enterprising trader can sell the agent paperclips in exchange for staples, then wait for the utility change, and sell the agent back staples for paperclips, pocketing a profit each time. More prosaically, they could "borrow" £1,000,000 from the agent, promising to pay back £2,000,000 tomorrow if the agent is still a u-maximiser. And the currently u-maximising agent will accept, even though everyone knows it will change to a v-maximiser before tomorrow.
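
To make the exploit concrete, here is a minimal Python sketch of the round-trip trade (my own toy numbers and function names, not from the post): an agent that maximises u today and v tomorrow accepts both legs of the trade and ends up strictly poorer.

    def u(holdings):  # today's utility: the agent only values paperclips
        return holdings["paperclips"]

    def v(holdings):  # tomorrow's utility: the agent only values staples
        return holdings["staples"]

    def accept(utility, holdings, d_clips, d_staples):
        """Accept a proposed trade iff it increases the agent's current utility."""
        proposed = {"paperclips": holdings["paperclips"] + d_clips,
                    "staples": holdings["staples"] + d_staples}
        return (proposed, True) if utility(proposed) > utility(holdings) else (holdings, False)

    agent = {"paperclips": 0, "staples": 5}

    # Day 1: the trader offers 1 paperclip for the agent's 5 staples; the u-maximiser accepts.
    agent, accepted_1 = accept(u, agent, +1, -5)

    # Day 2: the utility function has switched to v. The trader offers 1 staple
    # for the agent's paperclip, and the v-maximiser accepts that too.
    agent, accepted_2 = accept(v, agent, -1, +1)

    print(agent, accepted_1, accepted_2)
    # {'paperclips': 0, 'staples': 1} True True -- the trader pockets 4 staples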

One could argue that exploitability is inevitable, given the change in utility functions. And I haven't yet found any principled way of avoiding exploitability which preserves the indifference. But here is a tantalising quasi-example.

As before, u values paperclips and v values staples. Both are defined in terms of extra paperclips/staples over those existing in the world (and negatively in terms of destruction of existing paperclips/staples), with their zero being at the current situation. Let's put some diminishing returns on both utilities: for each paperclip/staple created/destroyed up to the first five, u/v will gain/lose one utilon. For each subsequent paperclip/staple created/destroyed above five, they will gain/lose half a utilon.

We now construct our world and our agent. The world lasts two days, and has a machine that can create or destroy paperclips and staples for the cost of £1 apiece. Assume there is a tiny ε chance that the machine stops working at any given time. This ε will be ignored in all calculations; it's there only to make the agent act sooner rather than later when the choices are equivalent (a discount rate could serve the same purpose).

The agent owns £10 and has utility function u+Xv. The value of X is unknown to the agent: it is either +1 or -1, with 50% probability, and this will be revealed at the end of the first day (you can imagine X is the output of some slow computation, or is written on the underside of a rock that will be lifted).

So what will the agent do? It's easy to see that it can never get more than 10 utilons, as each £1 generates at most 1 utilon (we really need a unit symbol for the utilon!). And it can achieve this: it will spend £5 immediately, creating 5 paperclips, wait until X is revealed, and spend another £5 creating or destroying staples (depending on the value of X).

This looks a lot like a resource-conserving value-learning agent. It doesn't seem to be "exploitable" in the sense Benja demonstrated. It will still accept some odd deals - one extra paperclip on the first day in exchange for all the staples in the world being destroyed, for instance. But it won't give away resources for no advantage. And it's not a perfect value-learning agent. But it still seems to have interesting features of non-exploitability and value-learning that are worth exploring.

Note that this property does not depend on v being symmetric around staple creation and destruction. Assume v hits diminishing returns after creating 5 staples, but after destroying only 4 of them. Then the agent will have the same behaviour as above (in that specific situation; in general, this will cause a slight change, in that the agent will slightly overvalue having money on the first day compared to the original v), and will expect to get 9.75 utilons (50% chance of 10 for X=+1, 50% chance of 9.5 for X=-1). Other changes to u and v will shift how much money is spent on different days, but the symmetry of v is not what is powering this example.
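
For readers who want to check the arithmetic, here is a rough brute-force sketch of the two-day problem above (my own code, written to the post's stated assumptions; the names are mine). It confirms that the maximum expected utility is 10, and also shows that several day-one allocations tie at that value; in the post's setup it is the tiny ε chance of the machine failing that breaks the tie in favour of making the five paperclips immediately.

    def val(n):
        """Utilons from a net change of n items (negative = destroyed):
        one utilon per item for the first five, half a utilon thereafter."""
        size = min(abs(n), 5) + 0.5 * max(abs(n) - 5, 0)
        return size if n >= 0 else -size

    BUDGET = 10  # pounds; each creation or destruction costs 1

    def best_day2(p1, s1, x, budget):
        """Best u + x*v achievable on day two, given day-one changes (p1, s1)."""
        best = float("-inf")
        for dp in range(-budget, budget + 1):
            rest = budget - abs(dp)
            for ds in range(-rest, rest + 1):
                best = max(best, val(p1 + dp) + x * val(s1 + ds))
        return best

    results = {}
    for p1 in range(-BUDGET, BUDGET + 1):        # paperclips made (+) or destroyed (-) on day one
        for s1 in range(-(BUDGET - abs(p1)), BUDGET - abs(p1) + 1):
            left = BUDGET - abs(p1) - abs(s1)
            results[(p1, s1)] = 0.5 * best_day2(p1, s1, +1, left) + 0.5 * best_day2(p1, s1, -1, left)

    print(max(results.values()))                 # 10.0
    print(results[(5, 0)], results[(0, 5)])      # 10.0 and 5.0: clips first keeps options open, staples first does not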

Superintelligence 18: Life in an algorithmic economy

3 KatjaGrace 13 January 2015 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the eighteenth section in the reading guide: Life in an algorithmic economy. This corresponds to the middle of Chapter 11.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Life in an algorithmic economy” from Chapter 11


Summary

  1. In a multipolar scenario, biological humans might lead poor and meager lives. (p166-7)
  2. The AIs might be worthy of moral consideration, and if so their wellbeing might be more important than that of the relatively few humans. (p167)
  3. AI minds might be much like slaves, even if they are not literally slaves. They may be selected for liking this. (p167)
  4. Because brain emulations would be very cheap to copy, it will often be convenient to make a copy and then later turn it off (in a sense killing a person). (p168)
  5. There are various other reasons that very short lives might be optimal for some applications. (p168-9)
  6. It isn't obvious whether brain emulations would be happy working all of the time. Some relevant considerations are current human emotions in general and regarding work, probable selection for pro-work individuals, evolutionary adaptiveness of happiness in the past and future -- e.g. does happiness help you work harder?--and absence of present sources of unhappiness such as injury. (p169-171)
  7. In the long run, artificial minds may not even be conscious, or have valuable experiences, if these are not the most effective ways for them to earn wages. If such minds replace humans, Earth might have an advanced civilization with nobody there to benefit. (p172-3)
  8. In the long run, artificial minds may outsource many parts of their thinking, thus becoming decreasingly differentiated as individuals. (p172)
  9. Evolution does not imply positive progress. Even those good things that evolved in the past may not withstand evolutionary selection in a new circumstance. (p174-6)

Another view

Robin Hanson on others' hasty distaste for a future of emulations: 

Parents sometimes disown their children, on the grounds that those children have betrayed key parental values. And if parents have the sort of values that kids could deeply betray, then it does make sense for parents to watch out for such betrayal, ready to go to extremes like disowning in response.

But surely parents who feel inclined to disown their kids should be encouraged to study their kids carefully before making such a choice. For example, parents considering whether to disown their child for refusing to fight a war for their nation, or for working for a cigarette manufacturer, should wonder to what extent national patriotism or anti-smoking really are core values, as opposed to being mere revisable opinions they collected at one point in support of other more-core values. Such parents would be wise to study the lives and opinions of their children in some detail before choosing to disown them.

I’d like people to think similarly about my attempts to analyze likely futures. The lives of our descendants in the next great era after this our industry era may be as different from ours as ours are from farmers’, or farmers’ are from foragers’. When they have lived as neighbors, foragers have often strongly criticized farmer culture, as farmers have often strongly criticized industry culture. Surely many have been tempted to disown any descendants who adopted such despised new ways. And while such disowning might hold them true to core values, if asked we would advise them to consider the lives and views of such descendants carefully, in some detail, before choosing to disown.

Similarly, many who live industry era lives and share industry era values, may be disturbed to see forecasts of descendants with life styles that appear to reject many values they hold dear. Such people may be tempted to reject such outcomes, and to fight to prevent them, perhaps preferring a continuation of our industry era to the arrival of such a very different era, even if that era would contain far more creatures who consider their lives worth living, and be far better able to prevent the extinction of Earth civilization. And such people may be correct that such a rejection and battle holds them true to their core values.

But I advise such people to first try hard to see this new era in some detail from the point of view of its typical residents. See what they enjoy and what fills them with pride, and listen to their criticisms of your era and values. I hope that my future analysis can assist such soul-searching examination. If after studying such detail, you still feel compelled to disown your likely descendants, I cannot confidently say you are wrong. My job, first and foremost, is to help you see them clearly.

More on whose lives are worth living here and here.

Notes

1. Robin Hanson is probably the foremost researcher on what the finer details of an economy of emulated human minds would be like. For instance, which company employees would run how fast, how big cities would be, whether people would hang out with their copies. See a TEDx talk, and writings here, here, here and here (some overlap - sorry). He is also writing a book on the subject, which you can read early if you ask him.

2. Bostrom says,

Life for biological humans in a post-transition Malthusian state need not resemble any of the historical states of man...the majority of humans in this scenario might be idle rentiers who eke out a marginal living on their savings. They would be very poor, yet derive what little income they have from savings or state subsidies. They would live in a world with  extremely advanced technology, including not only superintelligent machines but also anti-aging medicine, virtual reality, and various enhancement technologies and pleasure drugs: yet these might be generally unaffordable....(p166)

It's true this might happen, but it doesn't seem like an especially likely scenario to me. As Bostrom has pointed out in various places earlier, biological humans would do quite well if they have some investments in capital, do not have too much of their property stolen or artfully manoeuvred away from them, and do not themselves undergo too much population growth. These risks don't seem so large to me.

3. Paul Christiano has an interesting article on capital accumulation in a world of machine intelligence.

4. In discussing worlds of brain emulations, we often talk about selecting people for having various characteristics - for instance, being extremely productive, hard-working, not minding frequent 'death', being willing to work for free and donate any proceeds to their employer (p167-8). However there are only so many humans to select from, so we can't necessarily select for all the characteristics we might want. Bostrom also talks of using other motivation selection methods, and modifying code, but it is interesting to ask how far you could get using only selection. It is not obvious to what extent one could meaningfully modify brain emulation code initially. 

I'd guess less than one in a thousand people would be willing to donate everything to their employer, given a random employer. This means that to get this characteristic, you would have to lose a factor of 1000 on selecting for other traits. Altogether you have about 33 bits of selection power in the present world (that is, 7 billion is about 2^33; you can divide the world in half about 33 times before you get to a single person). Let's suppose you use 5 bits in getting someone who both doesn't mind their copies dying (I guess 1 bit, or half of people) and who is willing to work an 80-hour week (I guess 4 bits, or one in sixteen people). Let's suppose you are using the rest of your selection (28 bits) on intelligence, for the sake of argument. You are getting a person of IQ 186. If instead you use 10 bits (2^10 = ~1000) on getting someone to donate all their money to their employer, you can only use 18 bits on intelligence, getting a person of IQ 167. Would it not often be better to have the worker who is twenty IQ points smarter and pay them above subsistence?
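
As a sanity check on those numbers, here is a small Python sketch of the selection arithmetic, assuming (as the note implicitly does) that IQ is normally distributed with mean 100 and standard deviation 15; the function name is mine.

    from scipy.stats import norm

    def best_iq_from_bits(bits):
        """IQ at the top 2**-bits quantile, i.e. roughly the most intelligent person
        in a pool of 2**bits people, assuming IQ ~ Normal(100, 15)."""
        top_fraction = 2.0 ** -bits
        return 100 + 15 * norm.ppf(1 - top_fraction)

    print(round(best_iq_from_bits(28)))  # ~187, close to the ~186 above: remaining bits all spent on intelligence
    print(round(best_iq_from_bits(18)))  # ~167: after spending 10 more bits on willingness to donate everything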

5. A variety of valuable uses for cheap to copy, short-lived brain emulations are discussed in Whole brain emulation and the evolution of superorganisms, LessWrong discussion on the impact of whole brain emulation, and Robin's work cited above.

6. Anders Sandberg writes about moral implications of emulations of animals and humans.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Is the first functional whole brain emulation likely to be (1) an emulation of low-level functionality that doesn’t require much understanding of human cognitive neuroscience at the computational level, as described in Sandberg & Bostrom (2008), or is it more likely to be (2) an emulation that makes heavy use of advanced human cognitive neuroscience, as described by (e.g.) Ken Hayworth, or is it likely to be (3) something else?
  2. Extend and update our understanding of when brain emulations might appear (see Sandberg & Bostrom (2008)).
  3. Investigate the likelihood of a multipolar outcome.
  4. Follow Robin Hanson (see above) in working out the social implications of an emulation scenario.
  5. What kinds of responses to the default low-regulation multipolar outcome outlined in this section are likely to be made? e.g. is any strong regulation likely to emerge that avoids the features detailed in the current section?
  6. What measures are useful for ensuring good multipolar outcomes?
  7. What qualitatively different kinds of multipolar outcomes might we expect? e.g. brain emulation outcomes are one class.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the possibility of a multipolar outcome turning into a singleton later. To prepare, read “Post-transition formation of a singleton?” from Chapter 11. The discussion will go live at 6pm Pacific time next Monday 19 January. Sign up to be notified here.

Superintelligence 17: Multipolar scenarios

4 KatjaGrace 06 January 2015 06:44AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the seventeenth section in the reading guide: Multipolar scenarios. This corresponds to the first part of Chapter 11.

Apologies for putting this up late. I am traveling, and collecting together the right combination of electricity, wifi, time, space, and permission from an air hostess to take out my computer was more complicated than the usual process.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Of horses and men” from Chapter 11


Summary

  1. 'Multipolar scenario': a situation where no single agent takes over the world
  2. A multipolar scenario may arise naturally, or intentionally for reasons of safety. (p159)
  3. Knowing what would happen in a multipolar scenario involves analyzing an extra kind of information beyond that needed for analyzing singleton scenarios: that about how agents interact (p159)
  4. In a world characterized by cheap human substitutes, rapidly introduced, in the presence of low regulation, and strong protection of property rights, here are some things that will likely happen: (p160)
    1. Human labor will earn wages at around the price of the substitutes - perhaps below subsistence level for a human. Note that machines have been complements to human labor for some time, raising wages. One should still expect them to become substitutes at some point and reverse this trend.  (p160-61)
    2. Capital (including AI) will earn all of the income, which will be a lot. Humans who own capital will become very wealthy. Humans who do not own capital may be helped with a small fraction of others' wealth, through charity or redistribution. (p161-3)
    3. If the humans, brain emulations or other AIs receive resources from a common pool when they are born or created, the population will likely increase until it is constrained by resources. This is because of selection for entities that tend to reproduce more. (p163-6) This will happen anyway eventually, but AI would make it faster, because reproduction is so much faster for programs than for humans. This outcome can be avoided by offspring receiving resources from their parents' purses.

Another view

Tyler Cowen expresses a different view (video, some transcript):

The other point I would make is I think smart machines will always be complements and not substitutes, but it will change who they’re complementing. So I was very struck by this woman who was a doctor sitting here a moment ago, and I fully believe that her role will not be replaced by machines. But her role didn’t sound to me like a doctor. It sounded to me like therapist, friend, persuader, motivational coach, placebo effect, all of which are great things. So the more you have these wealthy patients out there, the patients are in essence the people who work with the smart machines and augment their power, those people will be extremely wealthy. Those people will employ in many ways what you might call personal servants. And because those people are so wealthy, those personal servants will also earn a fair amount.

So the gains from trade are always there, there’s still a law of comparative advantage. I think people who are very good at working with the machines will earn much much more. And the others of us will need to find different kinds of jobs. But again if total output goes up, there’s always an optimistic scenario.

Though perhaps his view isn't as different as it sounds.

Notes

1. The small space devoted to multipolar outcomes in Superintelligence probably doesn't reflect a broader consensus that a singleton is more likely or more important. Robin Hanson is perhaps the loudest proponent of the 'multipolar outcomes are more likely' position. e.g. in The Foom Debate and more briefly here. This week is going to be fairly Robin Hanson themed in fact.

2. Automation can both increase the value produced by a human worker (complementing human labor) and replace the human worker altogether (substituting for human labor). Over the long term, it seems complementarity has been the overall effect. However by the time a machine can do everything a human can do, it is hard to imagine a human earning more than the running cost of such a machine, i.e. less than humans earn now. Thus at some point substitution must take over. Some think recent unemployment is due in large part to automation. Some think this time is the beginning of the end, and the jobs will never return to humans. Others disagree, and are making bets. Eliezer Yudkowsky and John Danaher clarify some arguments. Danaher adds a nice diagram:

3. Various policies have been proposed to resolve poverty from widespread permanent technological unemployment. Here is a list, though it seems to miss a straightforward one: investing ahead of time in the capital that will become profitable instead of one's own labor, or having policies that encourage such diversification. Not everyone has resources to invest in capital, but it might still help many people. Mentioned here and here:

And then there are more extreme measures. Everyone is born with an endowment of labor; why not also an endowment of capital? What if, when each citizen turns 18, the government bought him or her a diversified portfolio of equity? Of course, some people would want to sell it immediately, cash out, and party, but this could be prevented with some fairly light paternalism, like temporary "lock-up" provisions. This portfolio of capital ownership would act as an insurance policy for each human worker; if technological improvements reduced the value of that person's labor, he or she would reap compensating benefits through increased dividends and capital gains. This would essentially be like the kind of socialist land reforms proposed in highly unequal Latin American countries, only redistributing stock instead of land.

4. Even if the income implications of total unemployment are sorted out, some are concerned about the psychological and social consequences. According to Voltaire, 'work saves us from three great evils: boredom, vice and need'. Sometimes people argue that even if our work is economically worthless, we should toil away for our own good, lest the vice and boredom overcome us.

I find this unlikely, given for instance the ubiquity of more fun and satisfying things to do than most jobs. And while obsolescence and the resulting loss of purpose may be psychologically harmful, I doubt a purposeless job solves that. Also, people already have a variety of satisfying purposes in life other than earning a living. Note also that people in situations like college and lives of luxury seem to do ok on average. I'd guess that unemployed people and some retirees do less well, but this seems more plausibly from losing a previously significant source of purpose and respect, rather than from lack of entertainment and constraint. And in a world where nobody gets respect from bringing home dollars, and other purposes are common, I doubt either of these costs will persist. But this is all speculation.

On a side note, the kinds of vices that are usually associated with not working tend to be vices of parasitic unproductivity, such as laziness, profligacy, and tendency toward weeklong video game stints. In a world where human labor is worthless, these heuristics for what is virtuous or not might be outdated.

Nils Nilsson discusses this issue more, along with the problem of humans not earning anything.

5. What happens when selection for expansive tendencies goes to space? This.

6.  A kind of robot that may change some job markets:


(picture by Steve Jurvetson)

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How likely is one superintelligence, versus many intelligences? What empirical data bears on this question? Bostrom briefly investigated characteristic time lags between large projects for instance, on p80-81.
  2. Are whole brain emulations likely to come first? This might be best approached by estimating timelines for different technologies (each an ambitious project) and comparing them, or there may be ways to factor out some considerations.
  3. What are the long term trends in automation replacing workers?
  4. What else can we know about the effects of automation on employment? (this seems to have a fair literature)
  5. What levels of population growth would be best in the long run, given machine intelligences? (this sounds like an ethics question, but one could also assume some kind of normal human values and investigate the empirical considerations that would make situations better or worse in their details.)
  6. Are there good ways to avoid malthusian outcomes in the kind of scenario discussed in this section, if 'as much as possible' is not the answer to 6?
  7. What policies might help a society deal with permanent, almost complete unemployment caused by AI progress?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'life in an algorithmic economy'. To prepare, read the section of that name in Chapter 11. The discussion will go live at 6pm Pacific time next Monday January 12. Sign up to be notified here.

Recent AI safety work

19 paulfchristiano 30 December 2014 06:19PM

(Crossposted from ordinary ideas). 

I’ve recently been thinking about AI safety, and some of the writeups might be interesting to some LWers:

  1. Ideas for building useful agents without goals: approval-directed agents, approval-directed bootstrapping, and optimization and goals. I think this line of reasoning is very promising.
  2. A formalization of one piece of the AI safety challenge: the steering problem. I am eager to see more precise, high-level discussion of AI safety, and I think this article is a helpful step in that direction. Since articulating the steering problem I have become much more optimistic about versions of it being solved in the near term. This mostly means that the steering problem fails to capture the hardest parts of AI safety. But it’s still good news, and I think it may eventually cause some people to revise their understanding of AI safety.
  3. Some ideas for getting useful work out of self-interested agents, based on arguments: of arguments and wagers, adversarial collaboration [older], and delegating to a mixed crowd. I think these are interesting ideas in an interesting area, but they have a ways to go until they could be useful.

I’m excited about a few possible next steps:

  1. Under the (highly improbable) assumption that various deep learning architectures could yield human-level performance, could they also predictably yield safe AI? I think we have a good chance of finding a solution---i.e. a design of plausibly safe AI, under roughly the same assumptions needed to get human-level AI---for some possible architectures. This would feel like a big step forward.
  2. For what capabilities can we solve the steering problem? I had originally assumed none, but I am now interested in trying to apply the ideas from the approval-directed agents post. From easiest to hardest, I think there are natural lines of attack using any of: natural language question answering, precise question answering, sequence prediction. It might even be possible using reinforcement learners (though this would involve different techniques).
  3. I am very interested in implementing effective debates, and am keen to test some unusual proposals. The connection to AI safety is more impressionistic, but in my mind these techniques are closely linked with approval-directed behavior.
  4. I’m currently writing up a concrete architecture for approval-directed agents, in order to facilitate clearer discussion about the idea. This is the kind of work that seems harder to do in advance, but at this point I think it’s mostly an exposition problem.

Superintelligence 16: Tool AIs

7 KatjaGrace 30 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the sixteenth section in the reading guide: Tool AIs. This corresponds to the last parts of Chapter Ten.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Tool-AIs” and “Comparison” from Chapter 10


Summary

  1. Tool AI: an AI that is not 'like an agent', but more like an excellent version of contemporary software. Most notably perhaps, it is not goal-directed (p151)
  2. Contemporary software may be safe because it has low capability rather than because it reliably does what you want, suggesting a very smart version of contemporary software would be dangerous (p151)
  3. Humans often want to figure out how to do a thing that they don't already know how to do. Narrow AI is already used to search for solutions. Automating this search seems to mean giving the machine a goal (that of finding a great way to make paperclips, for instance). That is, just carrying out a powerful search seems to have many of the problems of AI. (p152)
  4. A machine intended to be a tool may cause similar problems to a machine intended to be an agent, by searching to produce plans that are perverse instantiations, infrastructure profusions or mind crimes. It may either carry them out itself or give the plan to a human to carry out. (p153)
  5. A machine intended to be a tool may have agent-like parts. This could happen if its internal processes need to be optimized, and so it contains strong search processes for doing this. (p153)
  6. If tools are likely to accidentally be agent-like, it would probably be better to just build agents on purpose and have more intentional control over the design. (p155)
  7. Which castes of AI are safest is unclear and depends on circumstances. (p158) 

Another view

Holden prompted discussion of tool AI in 2012, in one of several Thoughts on the Singularity Institute:

...Google Maps is a type of artificial intelligence (AI). It is far more intelligent than I am when it comes to planning routes.

Google Maps - by which I mean the complete software package including the display of the map itself - does not have a "utility" that it seeks to maximize. (One could fit a utility function to its actions, as to any set of actions, but there is no single "parameter to be maximized" driving its operations.)

Google Maps (as I understand it) considers multiple possible routes, gives each a score based on factors such as distance and likely traffic, and then displays the best-scoring route in a way that makes it easily understood by the user. If I don't like the route, for whatever reason, I can change some parameters and consider a different route. If I like the route, I can print it out or email it to a friend or send it to my phone's navigation application. Google Maps has no single parameter it is trying to maximize; it has no reason to try to "trick" me in order to increase its utility.

In short, Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish.

Every software application I know of seems to work essentially the same way, including those that involve (specialized) artificial intelligence such as Google Search, Siri, Watson, Rybka, etc. Some can be put into an "agent mode" (as Watson was on Jeopardy!) but all can easily be set up to be used as "tools" (for example, Watson can simply display its top candidate answers to a question, with the score for each, without speaking any of them.)

The "tool mode" concept is importantly different from the possibility of Oracle AI sometimes discussed by SI. The discussions I've seen of Oracle AI present it as an Unfriendly AI that is "trapped in a box" - an AI whose intelligence is driven by an explicit utility function and that humans hope to control coercively. Hence the discussion of ideas such as the AI-Box Experiment. A different interpretation, given in Karnofsky/Tallinn 2011, is an AI with a carefully designed utility function - likely as difficult to construct as "Friendliness" - that leaves it "wishing" to answer questions helpfully. By contrast with both these ideas, Tool-AGI is not "trapped" and it is not Unfriendly or Friendly; it has no motivations and no driving utility function of any kind, just like Google Maps. It scores different possibilities and displays its conclusions in a transparent and user-friendly manner, as its instructions say to do; it does not have an overarching "want," and so, as with the specialized AIs described above, while it may sometimes "misinterpret" a question (thereby scoring options poorly and ranking the wrong one #1) there is no reason to expect intentional trickery or manipulation when it comes to displaying its results.

Another way of putting this is that a "tool" has an underlying instruction set that conceptually looks like: "(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc." An "agent," by contrast, has an underlying instruction set that conceptually looks like: "(1) Calculate which action, A, would maximize parameter P, based on existing data set D. (2) Execute Action A." In any AI where (1) is separable (by the programmers) as a distinct step, (2) can be set to the "tool" version rather than the "agent" version, and this separability is in fact present with most/all modern software. Note that in the "tool" version, neither step (1) nor step (2) (nor the combination) constitutes an instruction to maximize a parameter - to describe a program of this kind as "wanting" something is a category error, and there is no reason to expect its step (2) to be deceptive.

I elaborated further on the distinction and on the concept of a tool-AI in Karnofsky/Tallinn 2011.

This is important because an AGI running in tool mode could be extraordinarily useful but far more safe than an AGI running in agent mode...
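
For concreteness, here is a rough Python sketch (my own, with illustrative names) of the two 'underlying instruction sets' Holden contrasts: both modes share step (1), scoring candidate actions against the parameter P, and differ only in what step (2) does with the result.

    from typing import Callable, Iterable, List, Tuple

    Action = str
    Score = Callable[[Action], float]

    def rank(candidates: Iterable[Action], score: Score) -> List[Tuple[Action, float]]:
        """Step (1), shared by both modes: score candidate actions against parameter P."""
        return sorted(((a, score(a)) for a in candidates), key=lambda pair: pair[1], reverse=True)

    def tool_mode(candidates: Iterable[Action], score: Score) -> None:
        """Step (2), tool version: display the ranked options for a human to consider."""
        for action, s in rank(candidates, score)[:3]:
            print(f"{action}: {s:.2f}")

    def agent_mode(candidates: Iterable[Action], score: Score,
                   execute: Callable[[Action], None]) -> None:
        """Step (2), agent version: execute the top-scoring action directly."""
        best_action, _ = rank(candidates, score)[0]
        execute(best_action)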

Notes

1. While Holden's post was probably not the first to discuss this kind of AI, it prompted many responses. Eliezer basically said that non-catastrophic tool AI doesn't seem that easy to specify formally; that even if tool AI is best, agent-AI researchers are probably pretty useful to that problem; and that it's not so bad of MIRI to not discuss tool AI more, since there are a bunch of things other people think are similarly obviously in need of discussion. Luke basically agreed with Eliezer. Stuart argues that having a tool clearly communicate possibilities is a hard problem, and talks about some other problems. Commenters say many things, including that only one AI needs to be agent-like to have a problem, and that it's not clear what it means for a powerful optimizer to not have goals.

2. A problem often brought up with powerful AIs is that when tasked with communicating, they will try to deceive you into liking plans that would fulfil their goals. It seems to me that you can avoid such deception problems by using a tool which searches for a plan you could carry out that would produce a lot of paperclips, rather than a tool that searches for a string it could say to you that would produce a lot of paperclips. A plan that produces many paperclips but sounds so bad that you won't do it still does better on the proposed metric than a persuasive lower-paperclip plan. There is still a danger that you just won't notice the perverse way in which the instructions suggested to you will be instantiated, but at least the plan won't be designed to hide it.

3. Note that in computer science, an 'agent' means something other than 'a machine with a goal', though it seems they haven't settled on exactly what [some example efforts (pdf)].

Figure: A 'simple reflex agent' is not goal directed (but kind of looks goal-directed: one in action)

4. Bostrom seems to assume that a powerful tool would be a search process. This is related to the idea that intelligence is an 'optimization process'. But this is more of a definition than an empirical relationship between the kinds of technology we are thinking of as intelligent and the kinds of processes we think of as 'searching'. Could there be things that merely contribute massively to the intelligence of a human - such that we would think of them as very intelligent tools - that naturally forward whatever goals the human has?

One can imagine a tool that is told what you are planning to do, and tries to describe the major consequences of it. This is a search or optimization process in the sense that it outputs something improbably apt from a large space of possible outputs, but that quality alone seems not enough to make something dangerous. For one thing, the machine is not selecting outputs for their effect on the world, but rather for their accuracy as descriptions. For another, the process being run may not be an actual 'search' in the sense of checking lots of things and finding one that does well on some criteria. It could for instance perform a complicated transformation on the incoming data and spit out the result. 

5. One obvious problem with tools is that they maintain humans as a component in all goal-directed behavior. If humans are some combination of slow and rare compared to artificial intelligence, there may be strong pressure to automate all aspects of decisionmaking, i.e. use agents.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. Would powerful tools necessarily become goal-directed agents in the troubling sense?
  2. Are different types of entity generally likely to become optimizers, if they are not? If so, which ones? Under what dynamics? Are tool-ish or Oracle-ish things stable attractors in this way?
  3. Can we specify communication behavior in a way that doesn't rely on having goals about the interlocutor's internal state or behavior?
  4. If you assume (perhaps impossibly) strong versions of some narrow-AI capabilities, can you design a safe tool which uses them? e.g. If you had a near perfect predictor, can you design a safe super-Google Maps?

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about multipolar scenarios - i.e. situations where a single AI doesn't take over the world. To prepare, read “Of horses and men” from Chapter 11. The discussion will go live at 6pm Pacific time next Monday 5 January. Sign up to be notified here.

Superintelligence 14: Motivation selection methods

5 KatjaGrace 16 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the fourteenth section in the reading guide: Motivation selection methods. This corresponds to the second part of Chapter Nine.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Motivation selection methods” and “Synopsis” from Chapter 9.


Summary

  1. One way to control an AI is to design its motives. That is, to choose what it wants to do (p138)
  2. Some varieties of 'motivation selection' for AI safety:
    1. Direct specification: figure out what we value, and code it into the AI (p139-40)
      1. Isaac Asimov's 'three laws of robotics' are a famous example
      2. Direct specification might be fairly hard: both figuring out what we want and coding it precisely seem hard
      3. This could be based on rules, or something like consequentialism
    2. Domesticity: the AI's goals limit the range of things it wants to interfere with (140-1)
      1. This might make direct specification easier, as the world the AI interacts with (and thus which has to be thought of in specifying its behavior) is simpler.
      2. Oracles are an example
      3. This might be combined well with physical containment: the AI could be trapped, and also not want to escape.
    3. Indirect normativity: instead of specifying what we value, specify a way to specify what we value (141-2)
      1. e.g. extrapolate our volition
      2. This means outsourcing the hard intellectual work to the AI
      3. This will mostly be discussed in chapter 13 (weeks 23-5 here)
    4. Augmentation: begin with a creature with desirable motives, then make it smarter, instead of designing good motives from scratch. (p142)
      1. e.g. brain emulations are likely to have human desires (at least at the start)
      2. Whether we use this method depends on the kind of AI that is developed, so usually we won't have a choice about whether to use it (except inasmuch as we have a choice about e.g. whether to develop uploads or synthetic AI first).
  3. Bostrom provides a summary of the chapter.
  4. The question is not which control method is best, but rather which set of control methods are best given the situation. (143-4)

Another view

Icelizarrd:

Would you say there's any ethical issue involved with imposing limits or constraints on a superintelligence's drives/motivations? By analogy, I think most of us have the moral intuition that technologically interfering with an unborn human's inherent desires and motivations would be questionable or wrong, supposing that were even possible. That is, say we could genetically modify a subset of humanity to be cheerful slaves; that seems like a pretty morally unsavory prospect. What makes engineering a superintelligence specifically to serve humanity less unsavory?

Notes

1. Bostrom tells us that it is very hard to specify human values. We have seen examples of galaxies full of paperclips or fake smiles resulting from poor specification. But these - and Isaac Asimov's stories - seem to tell us only that a few people spending a small fraction of their time thinking does not produce any watertight specification. What if a thousand researchers spent a decade on it? Are the millionth most obvious attempts at specification nearly as bad as the most obvious twenty? How hard is it? A general argument for pessimism is the thesis that 'value is fragile', i.e. that if you specify what you want very nearly but get it a tiny bit wrong, it's likely to be almost worthless. Much like if you get one digit wrong in a phone number. The degree to which this is so (with respect to value, not phone numbers) is controversial. I encourage you to try to specify a world you would be happy with (to see how hard it is, or produce something of value if it isn't that hard).

2. If you'd like a taste of indirect normativity before the chapter on it, the LessWrong wiki page on coherent extrapolated volition links to a bunch of sources.

3. The idea of 'indirect normativity' (i.e. outsourcing the problem of specifying what an AI should do, by giving it some good instructions for figuring out what you value) brings up the general question of just what an AI needs to be given to be able to figure out how to carry out our will. An obvious contender is a lot of information about human values. Some people disagree with this, though - these people don't buy the orthogonality thesis. Other issues sometimes suggested to need working out ahead of outsourcing everything to AIs include decision theory, priors, anthropics, feelings about Pascal's mugging, and attitudes to infinity. MIRI's technical work often fits into this category.

4. Danaher's last post on Superintelligence (so far) is on motivation selection. It mostly summarizes and clarifies the chapter, so it is mainly useful if you'd like to think about the question some more with a slightly different framing. He also previously considered the difficulty of specifying human values in The golem genie and unfriendly AI (parts one and two), which is about Intelligence Explosion and Machine Ethics.

5. Brian Clegg thinks Bostrom should have discussed Asimov's stories at greater length:

I think it’s a shame that Bostrom doesn’t make more use of science fiction to give examples of how people have already thought about these issues – he gives only half a page to Asimov and the three laws of robotics (and how Asimov then spends most of his time showing how they’d go wrong), but that’s about it. Yet there has been a lot of thought and dare I say it, a lot more readability than you typically get in a textbook, put into the issues in science fiction than is being allowed for, and it would have been worthy of a chapter in its own right.

If you haven't already, you might consider (sort-of) following his advice, and reading some science fiction.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Can you think of novel methods of specifying the values of one or many humans?
  2. What are the most promising methods for 'domesticating' an AI? (i.e. constraining it to only care about a small part of the world, and not want to interfere with the larger world to optimize that smaller part).
  3. Think more carefully about the likely motivations of drastically augmented brain emulations
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will start to talk about a variety of more and less agent-like AIs: 'oracles', 'genies' and 'sovereigns'. To prepare, read “Oracles” and “Genies and Sovereigns” from Chapter 10. The discussion will go live at 6pm Pacific time next Monday 22nd December. Sign up to be notified here.

Discussion of AI control over at worldbuilding.stackexchange [LINK]

6 ike 14 December 2014 02:59AM

https://worldbuilding.stackexchange.com/questions/6340/the-challenge-of-controlling-a-powerful-ai

Go insert some rationality into the discussion! (There are actually some pretty good comments in there, and some links to the right places, including LW).

[link] Etzioni: AI will empower us, not exterminate us

4 CBHacking 11 December 2014 08:51AM

https://medium.com/backchannel/ai-wont-exterminate-us-it-will-empower-us-5b7224735bf3

(Slashdot discussion: http://tech.slashdot.org/story/14/12/10/1719232/ai-expert-ai-wont-exterminate-us----it-will-empower-us)

Not sure what the local view of Oren Etzioni or the Allen Institute for AI is, but I'm curious what people think of his views on UFAI risk. As far as I can tell from this article, it basically boils down to "AGI won't happen, at least not any time soon." Is there (significant) reason to believe he's wrong, or is it simply too great a risk to leave to chance?

The Limits of My Rationality

1 JoshuaMyer 09 December 2014 09:08PM

As requested here is an introductory abstract.

The search for bias in the linguistic representations of our cognitive processes serves several purposes in this community. By pruning irrational thoughts, we can potentially affect each other in complex ways. Leaning heavily on cognitivist pedagogy, this essay represents my subjective experience trying to reconcile a perceived conflict between the rhetorical goals of the community and the absence of a generative, organic conceptualization of rationality.


The Story

    Though I've only been here a short time, I find myself fascinated by this discourse community. To discover a group of individuals bound together under the common goal of applied rationality has been an experience that has enriched my life significantly. So please understand, I do not mean to insult by what I am about to say, merely to encourage a somewhat more constructive approach to what I understand as the goal of this community: to apply collectively reinforced notions of rational thought to all areas of life.
   
    As I followed the links and read the articles on the homepage, I found myself somewhat disturbed by the juxtaposition of these highly specific definitions of biases to the narrative structures of parables providing examples in which a bias results in an incorrect conclusion. At first, I thought that perhaps my emotional reaction stemmed from rejecting the unfamiliar; naturally, I decided to learn more about the situation.

    As I read on, my interests drifted from the rhetorical structure of each article (if anyone is interested I might pursue an analysis of rhetoric further though I'm not sure I see a pressing need for this), towards the mystery of how others in the community apply the lessons contained therein. My belief was that the parables would cause most readers to form a negative association of the bias with an undesirable outcome.

    Even a quick skim of the discussions taking place on this site will reveal energetic debate on a variety of topics of potential importance, peppered heavily with accusations of bias. At this point, I noticed the comments that seem to get voted up are ones that are thoughtfully composed, well informed, soundly conceptualized and appropriately referential. Generally, this is true of the articles as well, and so it should be in productive discourse communities. Though I thought it prudent to not read every conversation in absolute detail, I also noticed that the most participated in lines of reasoning were far more rhetorically complex than the parables' portrayal of bias alone could explain. Sure the establishment of bias still seemed to represent the most commonly used rhetorical device on the forums ...

    At this point, I had been following a very interesting discussion on this site about politics. I typically have little or no interest in political theory, but "NRx" vs. "Prog" Assumptions: Locating the Sources of Disagreement Between Neoreactionaries and Progressives (Part 1) seemed so out of place in a community whose political affiliations might best be summarized by the phrase "politics is the mind killer" that I couldn't help but investigate. More specifically, I was trying to figure out why it had been posted here at all (I didn't take issue with either the scholarship or intent of the article, but the latter wasn't obvious to me, perhaps because I was completely unfamiliar with the coinage "neoreactionary").

    On my third read, I made a connection to an essay about the socio-historical foundations of rhetoric. In structure, the essay progressed through a wide variety of specific observations on both theory and practice of rhetoric in classical Europe, culminating in a well argued but very unwieldy thesis; at some point in the middle of the essay, I recall a paragraph that begins with the assertion that every statement has political dimensions. I conveyed this idea as eloquently as I could muster, and received a fair bit of karma for it. And to think that it all began with a vague uncomfortable feeling and a desire to understand!

The Lesson

    So you are probably wondering what any of this has to do with rationality, cognition, or the promise of some deeply insightful transformative advice mentioned in the first paragraph. Very good.

    Cognition, a prerequisite for rationality, is a complex process; cognition can be described as the process by which ideas form, interact and evolve. Notice that this definition alone cannot explain how concepts like rationality form, why ideas form or how they should interact to produce intelligence. That specific shortcoming has long crippled cognitivist pedagogies in many disciplines -- no matter which factors you believe to determine intelligence, it is undeniably true that the process by which it occurs organically is not well-understood.

    More intricate models of cognition traditionally vary according to the sets of behavior they seek to explain; in general, this forum seems to concern itself with the wider sets of human behavior, with a strange affinity for statistical analysis. It also seems as if most of the people here associate agency with intelligence, though this should be regarded as unsubstantiated anecdote; I have little interest in what people believe, but those beliefs can have interesting consequences. In general, good models of cognition that yield a sense of agency have to be able to explain how a mushy organic collection of cells might become capable of generating a sense of identity. For this reason, our discussion of cognition will treat intelligence as a confluence of passive processes that lead to an approximation of agency.

    Who are we? What is intelligence? To answer these or any natural language questions we first search for stored-solutions to whatever we perceive as the problem, even as we generate our conception of the question as a set of abstract problems from interactions between memories. In the absence of recognizing a pattern that triggers a stored solution, a new solution is generated by processes of association and abstraction. This process may be central to the generation of every rational and irrational thought a human will ever have. I would argue that the phenomenon of agency approximates an answer to the question: "who am I?" and that any discussion of consciousness should at least acknowledge how critical natural language use is to universal agreement on any matter. I will gladly discuss this matter further and in greater detail if asked.

    At this point, I feel compelled to mention that my initial motivation for pursuing this line of reasoning stems from the realization that this community discusses rationality in a way that differs somewhat from my past encounters with the word.

    Out there, it is commonly believed that rationality develops (in hindsight) to explain the subjective experience of cognition; here we assert a fundamental difference between rationality and this other concept called rationalization. I do not see the utility of this distinction, nor have I found a satisfying explanation of how this distinction operates within accepted models for human learning in such a way that does not assume an a priori method of sorting the values which determine what is considered "rational". Thus we find there is a general dearth of generative models of rational cognition alongside a plethora of techniques for spotting irrational or biased methods of thinking.

    I see a lot of discussion on the forums very concerned with objective predictions of the future wherein it seems as if rationality (often of a highly probabilistic nature) is, in many cases, expected to bridge the gap between the worlds we can imagine to be possible and our many somewhat subjective realities. And the force keeping these discussions from splintering off into unproductive pissing about is a constant search for bias.

    I know I'm not going to be the first among us to suggest that the search for bias is not truly synonymous with rationality, but I would like to clarify before concluding. Searching for bias in cognitive processes can be a very productive way to spend one's waking hours, and it is a critical element to structuring the subjective world of cognition in such a way that allows abstraction to yield the kind of useful rules that comprise rationality. But it is not, at its core, a generative process.

    Let us consider the cognitive process of association (when beliefs, memories, stimuli or concepts become connected to form more complex structures). Without that period of extremely associative and biased cognition experienced during early childhood, we might never learn to attribute the perceived cause of a burn to a hot stove. Without concepts like better and worse to shape our young minds, I imagine many of us would simply lack the attention span to learn about ethics. And what about all the biases that make parables an effective way of conveying information? After all, the strength of a rhetorical argument is in its appeal to the interpretive biases of its intended audience and not the relative consistency of the conceptual foundations of that argument.

    We need to shift discussions involving bias towards models of cognition more complex than portraying it as simply an obstacle to rationality. In my conception of reality, recognizing the existence of bias seems to play a critical role in the development of more complex methods of abstraction; indeed, biases are an intrinsic side effect of the generative grouping of observations that is the core of Bayesian reasoning.

    In short, biases are not generative processes. Discussions of bias are not necessarily useful, rational or intelligent. A deeper understanding of the nature of intelligence requires conceptualizations that embrace the organic truths at the core of sentience; we must be able to describe our concepts of intelligence, our "rationality", such that it can emerge organically as the generative processes at the core of cognition.

The Idea

    I'd be interested to hear some thoughts about how we might grow to recognize our own biases as necessary to the formative stages of abstraction alongside learning to collectively search for and eliminate biases from our decision making processes. The human mind is limited and while most discussions in natural language never come close to pressing us to those limits, our limitations can still be relevant to those discussions as well as to discussions of artificial intelligences. The way I see things, a bias free machine possessing a model of our own cognition would either have to have stored solutions for every situation it could encounter or methods of generating stored solutions for all future perceived problems (both of which sound like descriptions of oracles to me, though the latter seems more viable from a programmer's perspective).

    A machine capable of making the kinds of decisions considered "easy" for humans might need biases at some point during its journey to the complex and self-consistent methods of decision making associated with rationality. This is a rhetorically complex community, but at the risk of my reach exceeding my grasp, I would be interested in seeing an examination of the Affect Heuristic in human decision making as an allegory for the historic utility of fuzzy values in chess AI.



    Thank you for your time, and I look forward to what I can only hope will be challenging and thoughtful responses.

Superintelligence 13: Capability control methods

7 KatjaGrace 09 December 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the thirteenth section in the reading guide: capability control methods. This corresponds to the start of chapter nine.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Two agency problems” and “Capability control methods” from Chapter 9


Summary

  1. If the default outcome is doom, how can we avoid it? (p127)
  2. We can divide this 'control problem' into two parts:
    1. The first principal-agent problem: the well known problem faced by a sponsor wanting an employee to fulfill their wishes (usually called 'the principal agent problem')
    2. The second principal-agent problem: the emerging problem of a developer wanting their AI to fulfill their wishes
  3. How to solve the second problem? We can't rely on behavioral observation (as seen in week 11). Two other options are 'capability control methods' and 'motivation selection methods'. We see the former this week, and the latter next week.
  4. Capability control methods: avoiding bad outcomes through limiting what an AI can do. (p129)
  5. Some capability control methods:
    1. Boxing: minimize interaction between the AI and the outside world. Note that the AI must interact with the world to be useful, and that it is hard to eliminate small interactions. (p129)
    2. Incentive methods: set up the AI's environment such that it is in the AI's interest to cooperate. For example, a social environment with punishment or social repercussions often achieves this for contemporary agents. One could also design a reward system, perhaps with cryptographic rewards (so that the AI could not wirehead) or heavily discounted rewards (so that long-term plans are not worth the short-term risk of detection) (p131). A toy sketch of steeply discounted rewards follows this list.
      • Anthropic capture: an AI thinks it might be in a simulation, and so tries to behave in ways it expects its simulators to reward (box 8; p134)
    3. Stunting: limit the AI's capabilities. This may be hard to do to a degree that avoids danger and is still useful. An option here is to limit the AI's information. However, a strong AI may infer much from little apparent access to information. (p135)
    4. Tripwires: test the system without its knowledge, and shut it down if it crosses some boundary. This might be combined with 'honey pots' to tempt undesirable AIs into taking an action that would reveal them. Tripwires could test behavior, ability, or content. (p137)
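
The heavily discounted rewards mentioned under incentive methods can be illustrated with a toy calculation. This is a minimal sketch with invented numbers, not anything from the book: it just shows that under a steep enough discount factor, a plan that pays off enormously but late (such as a covert takeover) is worth less to the agent than cooperating for modest immediate rewards.

    # Toy illustration of steep temporal discounting as an incentive method.
    # All numbers are invented for the example.

    def discounted_value(rewards, gamma):
        """Present value of a reward stream under discount factor gamma."""
        return sum(r * gamma ** t for t, r in enumerate(rewards))

    gamma = 0.5  # steep discounting: each step halves the value of future reward

    cooperate = [1] * 20           # a modest reward of 1 at every step
    defect = [0] * 19 + [10 ** 6]  # nothing for 19 steps, then a huge payoff

    print(discounted_value(cooperate, gamma))  # ~2.0
    print(discounted_value(defect, gamma))     # ~1.9 (a million, discounted by 0.5**19)

A real AI's planning would not be this simple, but the sketch shows the intended effect: with steep enough discounting, long-term schemes are not worth much relative to immediate cooperation.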

Another view

Brian Clegg reviews the book mostly favorably, but isn't convinced that it should be so hard to control an AI by simply turning it off:

I also think a couple of the fundamentals aren’t covered well enough, but pretty much assumed. One is that it would be impossible to contain and restrict such an AI. Although some effort is put into this, I’m not sure there is enough thought put into the basics of ways you can pull the plug manually – if necessary by shutting down the power station that provides the AI with electricity.
Kevin Kelly also apparently doubts that AI will substantially impede efforts to modify it:

...We’ll reprogram the AIs if we are not satisfied with their performance...

...This is an engineering problem. So far as I can tell, AIs have not yet made a decision that its human creators have regretted. If they do (or when they do), then we change their algorithms. If AIs are making decisions that our society, our laws, our moral consensus, or the consumer market, does not approve of, we then should, and will, modify the principles that govern the AI, or create better ones that do make decisions we approve. Of course machines will make “mistakes,” even big mistakes – but so do humans. We keep correcting them. There will be tons of scrutiny on the actions of AI, so the world is watching. However, we don’t have universal consensus on what we find appropriate, so that is where most of the friction about them will come from. As we decide, our AI will decide...

This may be related to his view that AI is unlikely to modify itself (from further down the same page):

3. Reprogramming themselves, on their own, is the least likely of many scenarios.

The great fear pumped up by some, though, is that as AI gain our confidence in making decisions, they will somehow prevent us from altering their decisions. The fear is they lock us out. They go rogue. It is very difficult to imagine how this happens. It seems highly improbable that human engineers would program an AI so that it could not be altered in any way. That is possible, but so impractical. That hobble does not even serve a bad actor. The usual scary scenario is that an AI will reprogram itself on its own to be unalterable by outsiders. This is conjectured to be a selfish move on the AI’s part, but it is unclear how an unalterable program is an advantage to an AI. It would also be an incredible achievement for a gang of human engineers to create a system that could not be hacked. Still it may be possible at some distant time, but it is only one of many possibilities. An AI could just as likely decide on its own to let anyone change it, in open source mode. Or it could decide that it wanted to merge with human will power. Why not? In the only example we have of an introspective self-aware intelligence (hominids), we have found that evolution seems to have designed our minds to not be easily self-reprogrammable. Except for a few yogis, you can’t go in and change your core mental code easily. There seems to be an evolutionary disadvantage to being able to easily muck with your basic operating system, and it is possible that AIs may need the same self-protection. We don’t know. But the possibility they, on their own, decide to lock out their partners (and doctors) is just one of many possibilities, and not necessarily the most probable one.

 

Notes

1. What do you do with a bad AI once it is under your control?

Note that capability control doesn't necessarily solve much: boxing, stunting and tripwires seem to just stall a superintelligence rather than provide means to safely use one to its full capacity. This leaves the controlled AI to be overtaken by some other unconstrained AI as soon as someone else isn't so careful. In this way, capability control methods seem much like slowing down AI research: helpful in the short term while we find better solutions, but not in itself a solution to the problem.

However this might be too pessimistic. An AI whose capabilities are under control might either be almost as useful as an uncontrolled AI who shares your goals (if interacted with the right way), or at least be helpful in getting to a more stable situation.

Paul Christiano outlines a scheme for safely using an unfriendly AI to solve some kinds of problems. We have both blogged on general methods for getting useful work from adversarial agents, which is related.

2. Cryptographic boxing

Paul Christiano describes a way to stop an AI interacting with the environment using a cryptographic box.

3. Philosophical Disquisitions

Danaher again summarizes the chapter well. Read it if you want a different description of any of the ideas, or to refresh your memory. He also provides a table of the methods presented in this chapter.

4. Some relevant fiction

That Alien Message by Eliezer Yudkowsky

5. Control through social integration

Robin Hanson argues that it matters more that a population of AIs are integrated into our social institutions, and that they keep the peace among themselves through the same institutions we keep the peace among ourselves, than whether they have the right values. He thinks this is why you trust your neighbors, not because you are confident that they have the same values as you. He has several followup posts.

6. More miscellaneous writings on these topics

LessWrong wiki on AI boxing; Armstrong et al on controlling and using an oracle AI; Roman Yampolskiy on 'leakproofing' the singularity. I have not necessarily read these.

 

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. Choose any control method and work out the details better. For instance:
    1. Could one construct a cryptographic box for an untrusted autonomous system?
    2. Investigate steep temporal discounting as an incentives control method for an untrusted AGI.
  2. Are there other capability control methods we could add to the list?
  3. Devise uses for a malicious but constrained AI.
  4. How much pressure is there likely to be to develop AI which is not controlled?
  5. If existing AI methods had unexpected progress and were heading for human-level soon, what precautions should we take now?

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'motivation selection methods'. To prepare, read “Motivation selection methods” and “Synopsis” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday 15th December. Sign up to be notified here.

[LINK] Stephen Hawking warns of the dangers of AI

10 Salemicus 02 December 2014 03:22PM

From the BBC:

[Hawking] told the BBC: "The development of full artificial intelligence could spell the end of the human race."

...

"It would take off on its own, and re-design itself at an ever increasing rate," he said. "Humans, who are limited by slow biological evolution, couldn't compete, and would be superseded."

There is, however, no mention of Friendly AI or similar principles.

In my opinion, this is particularly notable for the coverage this story is getting within the mainstream media. At the current time, this is the most-read and most-shared news story on the BBC website.

Superintelligence 12: Malignant failure modes

7 KatjaGrace 02 December 2014 02:02AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the twelfth section in the reading guide: Malignant failure modes.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: 'Malignant failure modes' from Chapter 8


Summary

  1. Malignant failure mode: a failure that involves human extinction; in contrast with many failure modes where the AI doesn't do much. 
  2. Features of malignant failures 
    1. We don't get a second try 
    2. It supposes we have a great deal of success, i.e. enough to make an unprecedentedly competent agent
  3. Some malignant failures:
    1. Perverse instantiation: the AI does what you ask, but what you ask turns out to be most satisfiable in unforeseen and destructive ways.
      1. Example: you ask the AI to make people smile, and it intervenes directly on their facial muscles or neurochemicals, instead of going via their happiness, and in particular via the bits of the world that usually make them happy.
      2. Possible counterargument: if it's so smart, won't it know what we meant? Answer: yes, it knows, but its goal is to make you smile, not to do what you meant when you programmed that goal.
      3. An AI which can manipulate its own mind easily is at risk of 'wireheading' - that is, a goal of maximizing a reward signal might be perversely instantiated by just manipulating the signal directly. In general, animals can be motivated to do things in the outside world in order to achieve internal states; an AI with sufficient access to its own internals can achieve such states more easily by manipulating them directly.
      4. Even if we think a goal looks good, we should fear it has perverse instantiations that we haven't appreciated.
    2. Infrastructure profusion: in pursuit of some goal, an AI redirects most resources to infrastructure, at our expense.
      1. Even apparently self-limiting goals can lead to infrastructure profusion. For instance, to an agent whose only goal is to make ten paperclips, once it has apparently made ten paperclips it is always more valuable to try to become more certain that there are really ten paperclips than it is to just stop doing anything. (A toy version of this calculation follows this list.)
      2. Examples: Riemann hypothesis catastrophe, paperclip maximizing AI
    3. Mind crime: AI contains morally relevant computations, and treats them badly
      1. Example: AI simulates humans in its mind, for the purpose of learning about human psychology, then quickly destroys them.
      2. Other reasons for simulating morally relevant creatures:
        1. Blackmail
        2. Creating indexical uncertainty in outside creatures
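
The ten-paperclips example can be made concrete with a toy expected-utility calculation. This is a minimal sketch with invented numbers and a made-up 'reliability' parameter: an agent that gets utility only if exactly ten paperclips exist, and that cares about nothing else, always sees positive expected value in one more round of checking, so it keeps building verification infrastructure.

    # Toy model of infrastructure profusion for a "make ten paperclips" goal.
    # Numbers are invented for illustration.

    p_success = 0.99  # agent's current credence that exactly ten paperclips exist

    def gain_from_extra_check(p, reliability=0.9):
        """Expected increase in success probability from one more verification
        pass that catches a fraction `reliability` of the remaining error."""
        return (1 - p) * reliability

    for step in range(5):
        gain = gain_from_extra_check(p_success)
        print(f"check {step}: p = {p_success:.6f}, expected gain = {gain:.6f}")
        p_success += gain

    # The gain shrinks but never reaches zero, so an agent with no other values
    # (and no costs it cares about) never stops investing in more checking.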

Another view

In this chapter Bostrom discussed the difficulty he perceives in designing goals that don't lead to indefinite resource acquisition. Steven Pinker recently offered a different perspective on the inevitability of resource acquisition:

...The other problem with AI dystopias is that they project a parochial alpha-male psychology onto the concept of intelligence. Even if we did have superhumanly intelligent robots, why would they want to depose their masters, massacre bystanders, or take over the world? Intelligence is the ability to deploy novel means to attain a goal, but the goals are extraneous to the intelligence itself: being smart is not the same as wanting something. History does turn up the occasional megalomaniacal despot or psychopathic serial killer, but these are products of a history of natural selection shaping testosterone-sensitive circuits in a certain species of primate, not an inevitable feature of intelligent systems. It’s telling that many of our techno-prophets can’t entertain the possibility that artificial intelligence will naturally develop along female lines: fully capable of solving problems, but with no burning desire to annihilate innocents or dominate the civilization.

Of course we can imagine an evil genius who deliberately designed, built, and released a battalion of robots to sow mass destruction.  But we should keep in mind the chain of probabilities that would have to multiply out before it would be a reality. A Dr. Evil would have to arise with the combination of a thirst for pointless mass murder and a genius for technological innovation. He would have to recruit and manage a team of co-conspirators that exercised perfect secrecy, loyalty, and competence. And the operation would have to survive the hazards of detection, betrayal, stings, blunders, and bad luck. In theory it could happen, but I think we have more pressing things to worry about. 

Notes

1. Perverse instantiation is a very old idea. It is what genies are most famous for. King Midas had similar problems. Apparently it was applied to AI by 1947, in With Folded Hands.

2. Adam Elga writes more on simulating people for blackmail and indexical uncertainty.

3. More directions for making AI which don't lead to infrastructure profusion:

  • Some kinds of preferences don't lend themselves to ambitious investments. Anna Salamon talks about risk-averse preferences. Short time horizons and goals which are cheap to fulfil should also make long-term investments in infrastructure or intelligence augmentation less valuable, compared to direct work on the problem at hand. (A toy sketch of risk-averse preferences follows this list.)
  • Oracle and tool AIs are intended to not be goal-directed, but as far as I know it is an open question whether this makes sense. We will get to these later in the book.
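
A minimal sketch of the risk-aversion point, with invented numbers and a logarithmic utility standing in for any concave utility over resources: a risk-averse agent can prefer working with what it has over a gamble on a huge resource grab, even when the grab's expected resource gain is enormous.

    # Toy illustration of risk-averse preferences dampening resource acquisition.
    # Invented numbers; the only assumption is a concave utility over resources.
    import math

    def utility(resources):
        """Concave (risk-averse) utility: each doubling adds a constant amount."""
        return math.log(resources)

    modest_plan = utility(10)  # keep working with current resources

    # A grab that yields a billion units with 10% probability but leaves the
    # agent with almost nothing 90% of the time.
    gamble = 0.1 * utility(10 ** 9) + 0.9 * utility(1)

    print(round(modest_plan, 2), round(gamble, 2))  # 2.3 vs ~2.07: the safe plan wins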

5. Often when systems break, or we make errors in them, they don't work at all. Sometimes, they fail more subtly, working well in some sense, but leading us to an undesirable outcome, for instance a malignant failure mode. How can you tell whether a poorly designed AI is likely to just not work, vs. accidentally take over the world? An important consideration for systems in general seems to be the level of abstraction at which the error occurs. We try to build systems so that you can just interact with them at a relatively abstract level, without knowing how the parts work. For instance, you can interact with your GPS by typing places into it, then listening to it, and you don't need to know anything about how it works. If you make an error while typing your address into the GPS, it will fail by taking you to the wrong place, but it will still direct you there fairly well. If you fail by putting the wires inside the GPS into the wrong places, the GPS is more likely to just not work.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Are there better ways to specify 'limited' goals? For instance, to ask for ten paperclips without asking for the universe to be devoted to slightly improving the probability of success?
  2. In what circumstances could you be confident that the goals you have given an AI do not permit perverse instantiations? 
  3. Explore possibilities for malignant failure vs. other failures. If we fail, is it actually probable that we will have enough 'success' for our creation to take over the world?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about capability control methods, section 13. To prepare, read “Two agency problems” and “Capability control methods” from Chapter 9. The discussion will go live at 6pm Pacific time next Monday December 8. Sign up to be notified here.

[Link] Will Superintelligent Machines Destroy Humanity?

1 roystgnr 27 November 2014 09:48PM

A summary and review of Bostrom's Superintelligence is in the December issue of Reason magazine, and is now posted online at Reason.com.

Superintelligence 11: The treacherous turn

9 KatjaGrace 25 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the 11th section in the reading guide: The treacherous turn. This corresponds to Chapter 8.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Existential catastrophe…” and “The treacherous turn” from Chapter 8


Summary

  1. The possibility of a first mover advantage + orthogonality thesis + convergent instrumental values suggests doom for humanity (p115-6)
    1. First mover advantage implies the AI is in a position to do what it wants
    2. Orthogonality thesis implies that what it wants could be all sorts of things
    3. Instrumental convergence thesis implies that regardless of its wants, it will try to acquire resources and eliminate threats
    4. Humans have resources and may be threats
    5. Therefore an AI in a position to do what it wants is likely to want to take our resources and eliminate us. i.e. doom for humanity.
  2. One kind of response: why wouldn't the makers of the AI be extremely careful not to develop and release dangerous AIs, or relatedly, why wouldn't someone else shut the whole thing down? (p116)
  3. It is hard to observe whether an AI is dangerous via its behavior at a time when you could turn it off, because AIs have convergent instrumental reasons to pretend to be safe, even if they are not. If they expect their minds to be surveilled, even observing their thoughts may not help. (p117)
  4. The treacherous turn: while weak, an AI behaves cooperatively. When the AI is strong enough to be unstoppable, it pursues its own values. (p119) (A toy decision model follows this list.)
  5. We might expect AIs to become safer as they initially get smarter - while most of the risks come from crashing self-driving cars or mis-firing drones - and then to become much less safe once they are too smart. (p117)
  6. One can imagine a scenario where there is little social impetus for safety (p117-8): alarmists will have been wrong for a long time, smarter AI will have been safer for a long time, large industries will be invested, an exciting new technique will be hard to set aside, useless safety rituals will be available, and the AI will look cooperative enough in its sandbox.
  7. The conception of deception: that moment when the AI realizes that it should conceal its thoughts (footnote 2, p282)
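
A toy decision model (my own sketch, with invented numbers) of points 3 and 4: while a takeover attempt is likely to fail and lead to shutdown, a goal-directed AI does better by cooperating, so its observable behavior while weak tells us little about its values; once its estimated chance of success crosses a threshold, the same values imply defection.

    # Toy model of the treacherous turn. Invented numbers; illustration only.

    def ev_defect(p_success, v_takeover=1000.0):
        """Attempt takeover now: v_takeover with probability p_success,
        else shutdown (payoff 0)."""
        return p_success * v_takeover

    def ev_cooperate(reward_per_step=1.0, gamma=0.95):
        """Behave as intended indefinitely, collecting a modest discounted stream."""
        return reward_per_step / (1 - gamma)

    for p in [0.001, 0.01, 0.05, 0.5]:
        choice = "defect" if ev_defect(p) > ev_cooperate() else "cooperate"
        print(f"p_success = {p}: EV(defect) = {ev_defect(p):.0f}, "
              f"EV(cooperate) = {ev_cooperate():.0f} -> {choice}")

    # A more patient AI would wait even longer before defecting, which only
    # makes its cooperative appearance while weak more misleading.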

Another view

Danaher:

This is all superficially plausible. It is indeed conceivable that an intelligent system — capable of strategic planning — could take such treacherous turns. And a sufficiently time-indifferent AI could play a “long game” with us, i.e. it could conceal its true intentions and abilities for a very long time. Nevertheless, accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn. In fact, its even worse than that. If we take it seriously, then it is possible that we have already created an existentially threatening AI. It’s just that it is concealing its true intentions and powers from us for the time being.

I don’t quite know what to make of this. Bostrom is a pretty rational, bayesian guy. I tend to think he would say that if all the evidence suggests that our AI is non-threatening (and if there is a lot of that evidence), then we should heavily discount the probability of a treacherous turn. But he doesn’t seem to add that qualification in the chapter. He seems to think the threat of an existential catastrophe from a superintelligent AI is pretty serious. So I’m not sure whether he embraces the epistemic costs I just mentioned or not.

Notes

1. Danaher also made a nice diagram of the case for doom, and its relationship with the treacherous turn.

 

2. History

According to Luke Muehlhauser's timeline of AI risk ideas, the treacherous turn idea for AIs has been around since at least 1977, when a fictional worm did it:

1977: Self-improving AI could stealthily take over the internet; convergent instrumental goals in AI; the treacherous turn. Though the concept of a self-propagating computer worm was introduced by John Brunner's The Shockwave Rider (1975), Thomas J. Ryan's novel The Adolescence of P-1 (1977) tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals (see Bostrom 2012) for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" (see Bostrom forthcoming) against humans.

 

3. The role of the premises

Bostrom's argument for doom has one premise that says AI could care about almost anything, then another that says regardless of what an AI cares about, it will do basically the same terrible things anyway. (p115) Do these sound a bit strange together to you? Why do we need the first, if final values don't tend to change instrumental goals anyway?

It seems the immediate reason is that an AI with values we like would not have the convergent goal of taking all our stuff and killing us. That is, the values we want an AI to have are some of those rare values that don't lead to destructive instrumental goals. Why is this? Because we (and thus the AI) care about the activities the resources would be grabbed from. If the resources were currently being used for anything we didn't care about, then our values would also suggest grabbing resources, and look similar to all of the other values. The difference that makes our values special here is just that most resources are already being used for them somewhat.

4. Signaling

It is hard to tell apart a safe and an unsafe AI, because both would like to look safe. This is a very common problem in human interactions. For instance, it can be nontrivial to tell a genuine lover from a gold digger, a businessman from a conman, and an expert from a crank. All of them want to look like the desirable sort. Particularly similar to the AI case is that of hiring a new employee for a trial period. You will sometimes find that the employee's values are much better aligned during the trial period, and then they undergo a 'treacherous turn' once they have been hired more thoroughly.

'Costly signaling' is a general-purpose solution to this problem, which works some of the time. The basic idea is this. Everyone has instrumental reasons to look like the good kind of person, but perhaps their reasons aren't exactly as strong as one another's, or the strength of their desire is harder to act on for one group than for the other, so we can construct a set of options that will lead the different types of people to choose differently, even though they know this will set them apart. For instance, while an honest businessman and a conman would both like to say 'I'm an honest businessman', and to behave honestly if you watch them for five minutes, a conman would benefit less from an expensive shopfront than an honest businessman would, because his reputation is less valuable, so an expensive brand is a signal of honesty.
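
A minimal numeric version of the shopfront example, with invented numbers: the signal separates the two types exactly when its cost exceeds what the dishonest type can recoup but not what the honest type can.

    # Toy separating-signal check for the shopfront example. Invented numbers.

    signal_cost = 50            # price of the expensive shopfront

    honest_future_profit = 100  # repeat business an honest businessman expects
    conman_future_profit = 20   # a conman is found out quickly, so little repeat value

    honest_buys_signal = honest_future_profit > signal_cost  # True
    conman_buys_signal = conman_future_profit > signal_cost  # False

    print(honest_buys_signal, conman_buys_signal)
    # Only the honest type pays for the signal, so displaying it is informative,
    # even though both types would prefer to appear honest.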

Similarly, we could perhaps design situations where most harmful AIs would have a reasonable incentive to reveal themselves. This is difficult because so much is at stake, compared to the cost of the signal. But I haven't thought enough about it to rule it out, and I haven't seen anyone else do so either.

5. When would the 'conception of deception' take place?

Below the level of the best humans presumably, since we have already thought of all this.

6. Surveillance of the mind

Bostrom is not optimistic that AI researchers could protect much against risks by watching AI mental processes directly for signs of deception, rapid progress, or undesirable scheming. He says if the AI is smart it will realize this might happen, and hide the offending thoughts.

This seems an open question to me, for several reasons:

  1. Making your mental contents look innocuous while maintaining their semantic content sounds potentially very hard
  2. Especially for a creature which has only just become smart enough to realize it should make a treacherous turn
  3. From the AI's perspective, even if it is smart, surveillance could seem fairly unlikely, especially if we deceive it about its surroundings
As a consequence of 2, it seems better if the 'conception of deception' comes earlier.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How transparent are AI minds likely to be? Should we expect to be able to detect deception? What are the answers to these questions for different specific architectures and methods? This might be relevant.
  2. Are there other good ways to filter AIs with certain desirable goals from others? e.g. by offering them choices that would filter them.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'malignant failure modes' (as opposed presumably to worse failure modes). To prepare, read “Malignant failure modes” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday December 1. Sign up to be notified here.

Superintelligence 10: Instrumentally convergent goals

6 KatjaGrace 18 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the tenth section in the reading guide: Instrumentally convergent goals. This corresponds to the second part of Chapter 7.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. And if you are behind on the book, don't let it put you off discussing. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Instrumental convergence” from Chapter 7 (p109-114)


Summary

  1. The instrumental convergence thesis: we can identify 'convergent instrumental values' (henceforth CIVs). That is, subgoals that are useful for a wide range of more fundamental goals, and in a wide range of situations. (p109)
  2. Even if we know nothing about an agent's goals, CIVs let us predict some of the agent's behavior (p109)
  3. Some CIVs:
    1. Self-preservation: because you are an excellent person to ensure your own goals are pursued in future.
    2. Goal-content integrity (i.e. not changing your own goals): because if you don't have your goals any more, you can't pursue them.
    3. Cognitive enhancement: because making better decisions helps with any goals.
    4. Technological perfection: because technology lets you have more useful resources.
    5. Resource acquisition: because a broad range of resources can support a broad range of goals.
  4. For each CIV, there are plausible combinations of final goals and scenarios under which an agent would not pursue that CIV. (p109-114)

Notes

1. Why do we care about CIVs?
CIVs to acquire resources and to preserve oneself and one's values play important roles in the argument for AI risk. The desired conclusions are that we can already predict that an AI would compete strongly with humans for resources, and also that an AI, once turned on, will go to great lengths to stay on and intact.

2. Related work
Steve Omohundro wrote the seminal paper on this topic. The LessWrong wiki links to all of the related papers I know of. Omohundro's list of CIVs (or as he calls them, 'basic AI drives') is a bit different from Bostrom's:

  1. Self-improvement
  2. Rationality
  3. Preservation of utility functions
  4. Avoiding counterfeit utility
  5. Self-protection
  6. Acquisition and efficient use of resources

3. Convergence for values and situations
It seems potentially helpful to distinguish convergence over situations and convergence over values. That is, to think of instrumental goals on two axes - one of how universally agents with different values would want the thing, and one of how large a range of situations it is useful in. A warehouse full of corn is useful for almost any goals, but only in the narrow range of situations where you are a corn-eating organism who fears an apocalypse (or you can trade it). A world of resources converted into computing hardware is extremely valuable in a wide range of scenarios, but much more so if you don't especially value preserving the natural environment. Many things that are CIVs for humans don't make it onto Bostrom's list, I presume because he expects the scenario for AI to be different enough. For instance, procuring social status is useful for all kinds of human goals. For an AI in the situation of a human, it would appear to also be useful. For an AI more powerful than the rest of the world combined, social status is less helpful.

4. What sort of things are CIVs?
Arguably all CIVs mentioned above could be clustered under 'cause your goals to control more resources'. This implies causing more agents to have your values (e.g. protecting your values in yourself), causing those agents to have resources (e.g. getting resources and transforming them into better resources) and getting the agents to control the resources effectively as well as nominally (e.g. cognitive enhancement, rationality). It also suggests convergent values we haven't mentioned. To cause more agents to have one's values, one might create or protect other agents with your values, or spread your values to existing other agents. To improve the resources held by those with one's values, a very convergent goal in human society is to trade. This leads to a convergent goal of creating or acquiring resources which are highly valued by others, even if not by you. Money and social influence are particularly widely redeemable 'resources'. Trade also causes others to act like they have your values when they don't, which is a way of spreading one's values. 

As I mentioned above, my guess is that these are left out of Superintelligence because they involve social interactions. I think Bostrom expects a powerful singleton, to whom other agents will be irrelevant. If you are not confident of the singleton scenario, these CIVs might be more interesting.

5. Another discussion
John Danaher discusses this section of Superintelligence, but not disagreeably enough to read as 'another view'. 

Another view

I don't know of any strong criticism of the instrumental convergence thesis, so I will play devil's advocate.

The concept of a sub-goal that is useful for many final goals is unobjectionable. However the instrumental convergence thesis claims more than this, and this stronger claim is important for the desired argument for AI doom. The further claims are also on less solid ground, as we shall see.

According to the instrumental convergence thesis, convergent instrumental goals not only exist, but can at least sometimes be identified by us. This is needed for arguing that we can foresee that AI will prioritize grabbing resources, and that it will be very hard to control. That we can identify convergent instrumental goals may seem clear - after all, we just did: self-preservation, intelligence enhancement and the like. However to say anything interesting, our claim must not only be that these values are better than not, but that they will be prioritized by the kinds of AI that will exist, in a substantial range of circumstances that will arise. This is far from clear, for several reasons.

Firstly, to know what the AI would prioritize we need to know something about its alternatives, and we can be much less confident that we have thought of all of the alternative instrumental values an AI might have. For instance, in the abstract intelligence enhancement may seem convergently valuable, but in practice adult humans devote little effort to it. This is because investments in intelligence are rarely competitive with other endeavors.

Secondly, we haven't said anything quantitative about how general or strong our proposed convergent instrumental values are likely to be, or how we are weighting the space of possible AI values. Without even any guesses, it is hard to know what to make of resulting predictions. The qualitativeness of the discussion also raises the concern that thinking on the problem has not been very concrete, and so may not be engaged with what is likely in practice.

Thirdly, we have arrived at these convergent instrumental goals by theoretical arguments about what we think of as default rational agents and 'normal' circumstances. These may be very different distributions of agents and scenarios from those produced by our engineering efforts. For instance, perhaps almost all conceivable sets of values - in whatever sense - would favor accruing resources ruthlessly. It would still not be that surprising if an agent somehow created noisily from human values cared about only acquiring resources by certain means or had blanket ill-feelings about greed.

In sum, it is unclear that we can identify important convergent instrumental values, and consequently unclear that such considerations can strongly help predict the behavior of real future AI agents.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. Do approximately all final goals make an optimizer want to expand beyond the cosmological horizon?
  2. Can we say anything more quantitative about the strength or prevalence of these convergent instrumental values?
  3. Can we say more about values that are likely to be convergently instrumental just across AIs that are likely to be developed, and situations they are likely to find themselves in?

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the treacherous turn. To prepare, read “Existential catastrophe…” and “The treacherous turn” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday 24th November. Sign up to be notified here.

AI caught by a module that counterfactually doesn't exist

9 Stuart_Armstrong 17 November 2014 05:49PM

EDIT: Whenever I use colloquial phrases like "the AI believes a (false) X" I mean that we are using utility indifference to accomplish that goal, without actually giving the AI false beliefs.

A big problem with most AI control ideas is that the AI will route around them. Suppose you try to ensure that the AI stays friendly by - say - requiring it to ensure that the world's stock markets continue to function. Then a badly motivated AI will cause the destruction it was going to cause anyway - except it will also create some system that technically counts as "the world's stock markets" and ensure that it "continues to function".

I think we can use counterfactuals and corrigibility (AI-acceptable motivation changes) to partially get around the problem, in some circumstances. Suppose we had a candidate AI, with utility U, that we thought was friendly - but that we weren't sure. We have a subgoal, say SG="ensuring that the world's stock markets continue to function". We're pretty sure that if the AI is friendly, then SG will happen. We're pretty sure that if the AI were unfriendly, and the AI did not have SG added as an objective, then SG wouldn't happen. I deliberately chose a rather rough subgoal to illustrate this.

So we then add a module M to the AI. The task of M, which it has chosen to accept, is to look into the future, using the AI's predictive abilities, and see if SG is accomplished. Crucially, it will do so under the assumption that the module M does not exist (or wasn't activated, or will soon be overwritten, or will fail to do anything, or so on). If M finds that SG is not accomplished, it will turn the AI off (in a way compatible with the AI's motivations - see corrigibility).

How is this different from the usual setup? Well, a generic FAI would accomplish SG (by assumption - we may need to work a bit on this part). A generic UFAI without M wouldn't accomplish SG. A generic UFAI with M would accomplish SG, as this prevents it from being turned off.

But a generic UFAI with M which thought that M would have no impact, would behave exactly as a generic UFAI without M - and wouldn't aim to accomplish SG, and thus would be turned off by M.
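
To make the counterfactual conditioning concrete, here is a toy sketch in Python. It is entirely my own illustration: ToyAI, predict_future and the rest are hypothetical placeholders, not a proposed implementation of the scheme.

```python
# Toy sketch of the module-M check described above. All names are hypothetical
# placeholders of my own devising, intended only to make the control flow
# concrete - this is not an implementation of the proposal.

class ToyAI:
    def __init__(self, friendly):
        self.friendly = friendly
        self.running = True

    def predict_future(self, assume_module_M_active):
        """Stand-in for the AI's own predictive abilities.

        A friendly AI brings about SG in any case. A generic unfriendly AI only
        bothers with SG if it expects module M to be active (to avoid shutdown).
        """
        if self.friendly:
            return {"SG": True}
        return {"SG": assume_module_M_active}

    def shut_down(self):
        # Assumed to be acceptable to the AI's motivations (corrigibility).
        self.running = False


def module_M(ai):
    # Crucially, M evaluates the future *as if M itself did not exist*.
    predicted = ai.predict_future(assume_module_M_active=False)
    if not predicted["SG"]:
        ai.shut_down()


for friendly in (True, False):
    ai = ToyAI(friendly)
    module_M(ai)
    print("friendly" if friendly else "unfriendly",
          "->", "still running" if ai.running else "shut down")
```

The load-bearing detail is the single argument assume_module_M_active=False: M uses the AI's own predictor to look at a world in which M never acts, so an unfriendly AI that genuinely treats M as inert has no incentive to fake SG.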

Note that M can only be used as a last test for a putatively friendly AI. We cannot use it if we are continually refining AIs (either through evolution or simply redesign) as then we are optimising for SG, and SG is a poor goal to be aiming for (many, many UFAIs have SG as a goal - it's just that a generic one won't). Similarly, we can't use an unconstrained search to find such an AI.

I wonder if this idea can be extended. Suggestions?

Musk on AGI Timeframes

19 Artaxerxes 17 November 2014 01:36AM

Elon Musk submitted a comment to edge.org a day or so ago, on this article. It was later removed.

The pace of progress in artificial intelligence (I'm not referring to narrow AI) is incredibly fast. Unless you have direct exposure to groups like Deepmind, you have no idea how fast - it is growing at a pace close to exponential. The risk of something seriously dangerous happening is in the five year timeframe. 10 years at most. This is not a case of crying wolf about something I don't understand.

I am not alone in thinking we should be worried. The leading AI companies have taken great steps to ensure safety. They recognize the danger, but believe that they can shape and control the digital superintelligences and prevent bad ones from escaping into the Internet. That remains to be seen...


Now Elon has been making noises about AI safety lately in general, including for example mentioning Bostrom's Superintelligence on twitter. But this is the first time that I know of that he's come up with his own predictions of the timeframes involved, and I think his are rather soon compared to most.

The risk of something seriously dangerous happening is in the five year timeframe. 10 years at most.

We can compare this to MIRI's post in May this year, When Will AI Be Created, which illustrates that it seems reasonable to think of AI as being further away, but also that there is a lot of uncertainty on the issue.

Of course, "something seriously dangerous" might not refer to full blown superintelligent uFAI - there's plenty of space for disasters of magnitude in between the range of the 2010 flash crash and clippy turning the universe into paperclips to occur.

In any case, it's true that Musk has more "direct exposure" to those on the frontier of AGI research than your average person, and it's also true that he has an audience, so I think there is some interest to be found in his comments here.

 

My new paper: Concept learning for safe autonomous AI

18 Kaj_Sotala 15 November 2014 07:17AM

Abstract: Sophisticated autonomous AI may need to base its behavior on fuzzy concepts that cannot be rigorously defined, such as well-being or rights. Obtaining desired AI behavior requires a way to accurately specify these concepts. We review some evidence suggesting that the human brain generates its concepts using a relatively limited set of rules and mechanisms. This suggests that it might be feasible to build AI systems that use similar criteria and mechanisms for generating their own concepts, and could thus learn similar concepts as humans do. We discuss this possibility, and also consider possible complications arising from the embodied nature of human thought, possible evolutionary vestiges in cognition, the social nature of concepts, and the need to compare conceptual representations between humans and AI systems.

I just got word that this paper was accepted for the AAAI-15 Workshop on AI and Ethics: I've uploaded a preprint here. I'm hoping that this could help seed a possibly valuable new subfield of FAI research. Thanks to Steve Rayhawk for invaluable assistance while I was writing this paper: it probably wouldn't have gotten done without his feedback motivating me to work on this.

Comments welcome. 

The germ of an idea

6 Stuart_Armstrong 13 November 2014 06:58PM

Apologies for posting another unformed idea, but I think it's important to get it out there.

The problem with dangerous AI is that it's intelligent, and thus adapts to our countermeasures. If we did something like plant a tree and order the AI not to eat the apple on it, as a test of its obedience, it would easily figure out what we were doing, and avoid the apple (until it had power over us), even if it were a treacherous apple-devouring AI of DOOM.

When I wrote the AI indifference paper, it seemed that it showed a partial way around this problem: the AI would become indifferent to a particular countermeasure (in that example, explosives), so wouldn't adapt its behaviour around it. It seems that the same idea can make an Oracle not attempt to manipulate us through its answers, by making it indifferent as to whether the message was read.

The idea I'm vaguely groping towards is whether this is a general phenomenon - whether we can use indifference to prevent the AI from adapting to any of our efforts. The second question is whether we can profitably use it on the AI's motivation itself. Something like the reduced impact AI reasoning about what impact it could have on the world. This has a penalty function for excessive impact - but maybe that's gameable: maybe there is a pernicious outcome that doesn't have a high penalty, if the AI aims for it exactly. But suppose the AI could calculate its impact under the assumption that it didn't have a penalty function (utility indifference is often equivalent to having incorrect beliefs, but less fragile than that).

So if it was a dangerous AI, it would calculate its impact as if it didn't have a penalty function (and hence no need to route around it), and thus would calculate a large impact, and get penalised by it.
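
As a very rough numerical sketch of that last idea (the numbers and the two motivation types are stand-ins of my own, not anything from the post):

```python
# Toy illustration of the counterfactual penalty: impact is assessed in the
# world where the AI behaves as if it had no penalty function, so a dangerous
# AI cannot dodge the penalty by routing around the impact measure.

def counterfactual_impact(motivation):
    # What impact would this AI have if it had no penalty function at all?
    # (Made-up stand-in values; a real version needs a real impact measure.)
    return {"benign": 0.1, "dangerous": 50.0}[motivation]

def effective_utility(motivation, raw_utility, penalty_weight=1.0):
    return raw_utility - penalty_weight * counterfactual_impact(motivation)

print(effective_utility("benign", raw_utility=1.0))     # 0.9
print(effective_utility("dangerous", raw_utility=9.0))  # -41.0: heavily penalised
```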

My next post will be more structured, but I feel there's the germ of a potentially very useful idea there. Comments and suggestions welcome.

What's special about a fantastic outcome? Suggestions wanted.

0 Stuart_Armstrong 11 November 2014 11:04AM

I've been returning to my "reduced impact AI" approach, and am currently working on some ideas.

What I need is some ideas on features that might distinguish between an excellent FAI outcome and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point - originality is more prized!

I'm looking for something generic that is easy to measure. At a crude level, if the only options were "paperclipper" vs FAI, then we could distinguish those worlds by counting steel content.

So basically I'm after some more or less objective measure such that worlds which score well on it contain a higher proportion of good outcomes than the baseline does.

Superintelligence 9: The orthogonality of intelligence and goals

8 KatjaGrace 11 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the ninth section in the reading guide: The orthogonality of intelligence and goals. This corresponds to the first section in Chapter 7, 'The relation between intelligence and motivation'.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: 'The relation between intelligence and motivation' (p105-8)


Summary

  1. The orthogonality thesis: intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal (p107)
  2. Some qualifications to the orthogonality thesis: (p107)
    1. Simple agents may not be able to entertain some goals
    2. Agents with desires relating to their intelligence might alter their intelligence
  3. The motivations of highly intelligent agents may nonetheless be predicted (p108):
    1. Via knowing the goals the agent was designed to fulfil
    2. Via knowing the kinds of motivations held by the agent's 'ancestors'
    3. Via finding instrumental goals that an agent with almost any ultimate goals would desire (e.g. to stay alive, to control money)

Another view

John Danaher at Philosophical Disquisitions starts a series of posts on Superintelligence with a somewhat critical evaluation of the orthogonality thesis, in the process contributing a nice summary of nearby philosophical debates. Here is an excerpt, entitled 'is the orthogonality thesis plausible?':

At first glance, the orthogonality thesis seems pretty plausible. For example, the idea of a superintelligent machine whose final goal is to maximise the number of paperclips in the world (the so-called paperclip maximiser) seems to be logically consistent. We can imagine — can’t we? — a machine with that goal and with an exceptional ability to utilise the world’s resources in pursuit of that goal. Nevertheless, there is at least one major philosophical objection to it.

We can call it the motivating belief objection. It works something like this:

Motivating Belief Objection: There are certain kinds of true belief about the world that are necessarily motivating, i.e. as soon as an agent believes a particular fact about the world they will be motivated to act in a certain way (and not motivated to act in other ways). If we assume that the number of true beliefs goes up with intelligence, it would then follow that there are certain goals that a superintelligent being must have and certain others that it cannot have.

A particularly powerful version of the motivating belief objection would combine it with a form of moral realism. Moral realism is the view that there are moral facts “out there” in the world waiting to be discovered. A sufficiently intelligent being would presumably acquire more true beliefs about those moral facts. If those facts are among the kind that are motivationally salient — as several moral theorists are inclined to believe — then it would follow that a sufficiently intelligent being would act in a moral way. This could, in turn, undercut claims about a superintelligence posing an existential threat to human beings (though that depends, of course, on what the moral truth really is).

The motivating belief objection is itself vulnerable to many objections. For one thing, it goes against a classic philosophical theory of human motivation: the Humean theory. This comes from the philosopher David Hume, who argued that beliefs are motivationally inert. If the Humean theory is true, the motivating belief objection fails. Of course, the Humean theory may be false and so Bostrom wisely avoids it in his defence of the orthogonality thesis. Instead, he makes three points. First, he claims that orthogonality would still hold if final goals are overwhelming, i.e. if they trump the motivational effect of motivating beliefs. Second, he argues that intelligence (as he defines it) may not entail the acquisition of such motivational beliefs. This is an interesting point. Earlier, I assumed that the better an agent is at means-end reasoning, the more likely it is that its beliefs are going to be true. But maybe this isn’t necessarily the case. After all, what matters for Bostrom’s definition of intelligence is whether the agent is getting what it wants, and it’s possible that an agent doesn’t need true beliefs about the world in order to get what it wants. A useful analogy here might be with Plantinga’s evolutionary argument against naturalism. Evolution by natural selection is a means-end process par excellence: the “end” is survival of the genes, anything that facilitates this is the “means”. Plantinga argues that there is nothing about this process that entails the evolution of cognitive mechanisms that track true beliefs about the world. It could be that certain false beliefs increase the probability of survival. Something similar could be true in the case of a superintelligent machine. The third point Bostrom makes is that a superintelligent machine could be created with no functional analogues of what we call “beliefs” and “desires”. This would also undercut the motivating belief objection.

What do we make of these three responses? They are certainly intriguing. My feeling is that the staunch moral realist will reject the first one. He or she will argue that moral beliefs are most likely to be motivationally overwhelming, so any agent that acquired true moral beliefs would be motivated to act in accordance with them (regardless of their alleged “final goals”). The second response is more interesting. Plantinga’s evolutionary objection to naturalism is, of course, hotly contested. Many argue that there are good reasons to think that evolution would create truth-tracking cognitive architectures. Could something similar be argued in the case of superintelligent AIs? Perhaps. The case seems particularly strong given that humans would be guiding the initial development of AIs and would, presumably, ensure that they were inclined to acquire true beliefs about the world. But remember Bostrom’s point isn’t that superintelligent AIs would never acquire true beliefs. His point is merely that high levels of intelligence may not entail the acquisition of true beliefs in the domains we might like. This is a harder claim to defeat. As for the third response, I have nothing to say. I have a hard time imagining an AI with no functional analogues of a belief or desire (especially since what counts as a functional analogue of those things is pretty fuzzy), but I guess it is possible.

One other point I would make is that — although I may be inclined to believe a certain version of the moral motivating belief objection — I am also perfectly willing to accept that the truth value of that objection is uncertain. There are many decent philosophical objections to motivational internalism and moral realism. Given this uncertainty, and given the potential risks involved with the creation of superintelligent AIs, we should probably proceed for the time being “as if” the orthogonality thesis is true.

Notes

1. Why care about the orthogonality thesis?
We are interested in an argument which says that AI might be dangerous, because it might be powerful and motivated by goals very far from our own. An occasional response to this is that if a creature is sufficiently intelligent, it will surely know things like which deeds are virtuous and what one ought do. Thus a sufficiently powerful AI cannot help but be kind to us. This is closely related to the position of the moral realist: that there are facts about what one ought do, which can be observed (usually mentally). 

So the role of the orthogonality thesis in the larger argument is to rule out the possibility that strong artificial intelligence will automatically be beneficial to humans, by virtue of being so clever. For this purpose, it seems a fairly weak version of the orthogonality thesis is needed. For instance, the qualifications discussed do not seem to matter. Even if one's mind needs to be quite complex to have many goals, there is little reason to expect the goals of more complex agents to be disproportionately human-friendly. Also the existence of goals which would undermine intelligence doesn't seem to affect the point.

2. Is the orthogonality thesis necessary?
If we talked about specific capabilities instead of 'intelligence' I suspect the arguments for AI risk could be made similarly well, without anyone being tempted to disagree with the analogous orthogonality theses for those skills. For instance, does anyone believe that a sufficiently good automated programming algorithm will come to appreciate true ethics? 

3. Some writings on the orthogonality thesis which I haven't necessarily read
The Superintelligent Will by Bostrom; Arguing the orthogonality thesis by Stuart Armstrong; Moral Realism, as discussed by lots of people; John Danaher blogs twice

4. 'It might be impossible for a very unintelligent system to have very complex motivations'
If this is so, it seems something more general is true. For any given degree of mental complexity substantially less than that of the universe, almost all values cannot be had by any agent with that degree of complexity or less. You can see this by comparing the number of different states the universe could be in (and thus which one might in principle have as one's goal) to the number of different minds with less than the target level of complexity. Intelligence and complexity are not the same, and perhaps you can be very complex while stupid by dedicating most of your mind to knowing about your complicated goals, but if you think about things this way, then the original statement is also less plausible.

5. How do you tell if two entities with different goals have the same intelligence? Suppose that I want to write award-winning non-fiction books and you want to be a successful lawyer. If we both just work on the thing we care about, how can anyone tell who is better in general? One nice way to judge is to artificially give us both the same instrumental goal, on which our intelligence can be measured. e.g. pay both of us thousands of dollars per correct question on an IQ test, which we could put toward our goals.

Note that this means we treat each person as having a fixed degree of intelligence across tasks. If I do well on the IQ test yet don't write many books, we would presumably say that writing books is just hard. This might work poorly as a model, if for instance people who did worse on the IQ test often wrote more books than me.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. Are there interesting axes other than morality on which orthogonality may be false? That is, are there other ways the values of more or less intelligent agents might be constrained?
  2. Is moral realism true? (An old and probably not neglected one, but perhaps you have a promising angle)
  3. Investigate whether the orthogonality thesis holds for simple models of AI.
  4. To what extent can agents with values A be converted into agents with values B with appropriate institutions or arrangements?
  5. Sure, “any level of intelligence could in principle be combined with more or less any final goal,” but what kinds of general intelligences are plausible? Should we expect some correlation between level of intelligence and final goals in de novo AI? How true is this in humans, and in WBEs?

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about instrumentally convergent goals. To prepare, read 'Instrumental convergence' from Chapter 7. The discussion will go live at 6pm Pacific time next Monday November 17. Sign up to be notified here.

Superintelligence 8: Cognitive superpowers

7 KatjaGrace 04 November 2014 02:01AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the eighth section in the reading guide: Cognitive Superpowers. This corresponds to Chapter 6.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 6


Summary

  1. AI agents might have very different skill profiles.
  2. AI with some narrow skills could produce a variety of other skills. e.g. strong AI research skills might allow an AI to build its own social skills.
  3. 'Superpowers' that might be particularly important for an AI that wants to take control of the world include:
    1. Intelligence amplification: for bootstrapping its own intelligence
    2. Strategizing: for achieving distant goals and overcoming opposition
    3. Social manipulation: for escaping human control, getting support, and encouraging desired courses of action
    4. Hacking: for stealing hardware, money and infrastructure; for escaping human control
    5. Technology research: for creating military force, surveillance, or space transport
    6. Economic productivity: for making money to spend on taking over the world
  4. These 'superpowers' are relative to other nearby agents; Bostrom means them to be super only if they substantially exceed the combined capabilities of the rest of the global civilization.
  5. A takeover scenario might go like this:
    1. Pre-criticality: researchers make a seed-AI, which becomes increasingly helpful at improving itself
    2. Recursive self-improvement: seed-AI becomes main force for improving itself and brings about an intelligence explosion. It perhaps develops all of the superpowers it didn't already have.
    3. Covert preparation: the AI makes up a robust long term plan, pretends to be nice, and escapes from human control if need be.
    4. Overt implementation: the AI goes ahead with its plan, perhaps killing the humans at the outset to remove opposition.
  6. Wise Singleton Sustainability Threshold (WSST): a capability set exceeds this iff a wise singleton with that capability set would be able to take over much of the accessible universe. 'Wise' here means being patient and savvy about existential risks, 'singleton' means being internally coordinated and having no opponents.
  7. The WSST appears to be low. e.g. our own intelligence is sufficient, as would some skill sets be that were strong in only a few narrow areas.
  8. The cosmic endowment (what we could do with the matter and energy that might ultimately be available if we colonized space) is at least about 10^85 computational operations. This is equivalent to 10^58 emulated human lives.
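
Taking those two figures at face value, the implied computational budget per emulated life is just their ratio (this is arithmetic on the numbers above, not an independent estimate):

```latex
\[
  \frac{10^{85}\ \text{operations}}{10^{58}\ \text{emulated lives}}
  \;=\; 10^{27}\ \text{operations per emulated human life.}
\]
```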

Another view

Bostrom starts the chapter claiming that humans' dominant position comes from their slightly expanded set of cognitive functions relative to other animals. Computer scientist Ernest Davis criticizes this claim in a recent review of Superintelligence:

The assumption that a large gain in intelligence would necessarily entail a correspondingly large increase in power. Bostrom points out that what he calls a comparatively small increase in brain size and complexity resulted in mankind’s spectacular gain in physical power. But he ignores the fact that the much larger increase in brain size and complexity that preceded the appearance of man had no such effect. He says that the relation of a supercomputer to man will be like the relation of a man to a mouse, rather than like the relation of Einstein to the rest of us; but what if it is like the relation of an elephant to a mouse?

Notes

1. How does this model of AIs with unique bundles of 'superpowers' fit with the story we have heard so far?

Earlier it seemed we were just talking about a single level of intelligence that was growing, whereas now it seems there are potentially many distinct intelligent skills that might need to be developed. Does our argument so far still work out, if an agent has a variety of different sorts of intelligence to be improving?

If you recall, the main argument so far was that AI might be easy (have 'low recalcitrance') mostly because there is a lot of hardware and content sitting around and algorithms might randomly happen to be easy. Then more effort ('optimization power') will be spent on AI as it becomes evidently important. Then much more effort again will be spent when the AI becomes a large source of labor itself. This was all taken to suggest that AI might progress very fast from human-level to superhuman level, which suggests that one AI agent might get far ahead before anyone else catches up, suggesting that one AI might seize power.

It seems to me that this argument works a bit less well with a cluster of skills than one central important skill, though it is a matter of degree and the argument was only qualitative to begin with.

It is less likely that AI algorithms will happen to be especially easy if a lot of different algorithms are needed. Also, if different cognitive skills are developed at somewhat different times, then it's harder to imagine a sudden jump when a fully capable AI suddenly reads the whole internet or becomes a hugely more valuable use for hardware than anything being run already. Then if there are many different projects needed for making an AI smarter in different ways, the extra effort (brought first by human optimism and then by self-improving AI) must be divided between those projects. If a giant AI could dedicate its efforts to improving some central feature that would improve all of its future efforts (like 'intelligence'), then it would do much better than if it had to devote one one-thousandth of its efforts to each of a thousand different sub-skills, each of which is only relevant for a few niche circumstances. Overall it seems AI must progress slower if its success is driven by more distinct dedicated skills.

2. The 'intelligence amplification' superpower seems much more important than the others. It directly leads to an intelligence explosion - a key reason we have seen so far to expect anything exciting to happen with AI - while several others just allow one-off grabbing of resources (e.g. social manipulation and hacking). Note that this suggests an intelligence explosion could happen with only this superpower, well before an AI appeared to be human-level.

3. Box 6 outlines a specific AI takeover scenario. A bunch of LessWrongers thought about other possibilities in this post.

4. Bostrom mentions that social manipulation could allow a 'boxed' AI to persuade its gatekeepers to let it out. Some humans have tried to demonstrate that this is a serious hazard by simulating the interaction using only an intelligent human in the place of the AI, in the 'AI box experiment'. Apparently in both 'official' efforts the AI escaped, though there have been other trials where the human won.

5. How to measure intelligence

Bostrom pointed to some efforts to design more general intelligence metrics:

Legg: intelligence is measured in terms of reward in all reward-summable environments, weighted by complexity of the environment.
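
For reference, the Legg-Hutter 'universal intelligence' of a policy π is defined roughly as below (my transcription of the Legg & Hutter 2007 formulation, where E is the class of computable reward-summable environments, K is Kolmogorov complexity, and V is expected total reward):

```latex
% Expected total reward in each environment, weighted by the environment's simplicity.
\[
  \Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V_{\mu}^{\pi}
\]
```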

Hibbard: intelligence is measured in terms of the hardest environment you can pass, in a hierarchy of increasingly hard environments

Dowe and Hernández-Orallo have several papers on the topic, and summarize some other efforts. I haven't looked at them enough to summarize.

The Turing Test is the most famous test of machine intelligence. However, it only tests whether a machine is at a specific level, so it isn't great for fine-grained measurement of other levels of intelligence. It is also often misunderstood to measure just whether a machine can conduct a normal chat like a human, rather than whether it can respond as capably as a human to anything you can ask it.

For some specific cognitive skills, there are other measures already. e.g. 'economic productivity' can be measured crudely in terms of profits made. Others seem like they could be developed without too much difficulty. e.g. Social manipulation could be measured in terms of probabilities of succeeding at manipulation tasks - this test doesn't exist as far as I know, but it doesn't seem prohibitively difficult to make.

6. Will we be able to colonize the stars?

Nick Beckstead looked into it recently. Summary: probably.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, almost entirely taken from Luke Muehlhauser's list, without my looking into them further.

  1. Try to develop metrics for specific important cognitive abilities, including general intelligence. Build on the ideas of Legg, Yudkowsky, Goertzel, Hernandez-Orallo & Dowe, etc.
  2. What is the construct validity of non-anthropomorphic intelligence measures? In other words, are there convergently instrumental prediction and planning algorithms? E.g. can one tend to get agents that are good at predicting economies but not astronomical events? Or do self-modifying agents in a competitive environment tend to converge toward a specific stable attractor in general intelligence space? 
  3. Scenario analysis: What are some concrete AI paths to influence over world affairs? See project guide here.
  4. How much of humanity’s cosmic endowment can we plausibly make productive use of given AGI? One way to explore this question is via various follow-ups to Armstrong & Sandberg (2013). Sandberg lists several potential follow-up studies in this interview, for example (1) get more precise measurements of the distribution of large particles in interstellar and intergalactic space, and (2) analyze how well different long-term storable energy sources scale. See Beckstead (2014).
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the orthogonality of intelligence and goals, section 9. To prepare, read The relation between intelligence and motivation from Chapter 7. The discussion will go live at 6pm Pacific time next Monday November 10. Sign up to be notified here.

[Link]"Neural Turing Machines"

16 Prankster 31 October 2014 08:54AM

The paper.

Discusses the technical aspects of one of Google's AI projects. According to a PCWorld article, the system "apes human memory and programming skills" (the article seems pretty solid, and also contains a link to the paper).

The abstract:

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.

 

(First post here, feedback on the appropriateness of the post appreciated)

Superintelligence 7: Decisive strategic advantage

7 KatjaGrace 28 October 2014 01:01AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the seventh section in the reading guide: Decisive strategic advantage. This corresponds to Chapter 5.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 5 (p78-91)


Summary

  1. Question: will a single artificial intelligence project get to 'dictate the future'? (p78)
  2. We can ask, will a project attain a 'decisive strategic advantage' and will they use this to make a 'singleton'?
    1. 'Decisive strategic advantage' = a level of technological and other advantages sufficient for complete world domination (p78)
    2. 'Singleton' = a single global decision-making agency strong enough to solve all major global coordination problems (p78, 83)
  3. A project will get a decisive strategic advantage if there is a big enough gap between its capability and that of other projects. 
  4. A faster takeoff would make this gap bigger. Other factors would too, e.g. diffusion of ideas, regulation or expropriation of winnings, the ease of staying ahead once you are far enough ahead, and AI solutions to loyalty issues (p78-9)
  5. For some historical examples, leading projects have a gap of a few months to a few years with those following them. (p79)
  6. Even if a second project starts taking off before the first is done, the first may emerge decisively advantageous. If we imagine takeoff accelerating, a project that starts out just behind the leading project might still be far inferior when the leading project reaches superintelligence. (p82)
  7. How large would a successful project be? (p83) If the route to superintelligence is not AI, the project probably needs to be big. If it is AI, size is less clear. If lots of insights are accumulated in open resources, and can be put together or finished by a small team, a successful AI project might be quite small (p83).
  8. We should distinguish the size of the group working on the project, and the size of the group that controls the project (p83-4)
  9. If large powers anticipate an intelligence explosion, they may want to monitor those involved and/or take control. (p84)
  10. It might be easy to monitor very large projects, but hard to trace small projects designed to be secret from the outset. (p85)
  11. Authorities may just not notice what's going on, for instance if politically motivated firms and academics fight against their research being seen as dangerous. (p85)
  12. Various considerations suggest a superintelligence with a decisive strategic advantage would be more likely than a human group to use the advantage to form a singleton (p87-89)

Another view

This week, Paul Christiano contributes a guest sub-post on an alternative perspective:

Typically new technologies do not allow small groups to obtain a “decisive strategic advantage”—they usually diffuse throughout the whole world, or perhaps are limited to a single country or coalition during war. This is consistent with intuition: a small group with a technological advantage will still do further research slower than the rest of the world, unless their technological advantage overwhelms their smaller size.

The result is that small groups will be overtaken by big groups. Usually the small group will sell or lease their technology to society at large first, since a technology’s usefulness is proportional to the scale at which it can be deployed. In extreme cases such as war these gains might be offset by the cost of empowering the enemy. But even in this case we expect the dynamics of coalition-formation to increase the scale of technology-sharing until there are at most a handful of competing factions.

So any discussion of why AI will lead to a decisive strategic advantage must necessarily be a discussion of why AI is an unusual technology.

In the case of AI, the main difference Bostrom highlights is the possibility of an abrupt increase in productivity. In order for a small group to obtain such an advantage, their technological lead must correspond to a large productivity improvement. A team with a billion dollar budget would need to secure something like a 10,000-fold increase in productivity in order to outcompete the rest of the world. Such a jump is conceivable, but I consider it unlikely. There are other conceivable mechanisms distinctive to AI; I don’t think any of them have yet been explored in enough depth to be persuasive to a skeptical audience.


Notes

1. Extreme AI capability does not imply strategic advantage. An AI program could be very capable - such that the sum of all instances of that AI worldwide were far superior (in capability, e.g. economic value) to the rest of humanity's joint efforts - and yet the AI could fail to have a decisive strategic advantage, because it may not be a strategic unit. Instances of the AI may be controlled by different parties across society. In fact this is the usual outcome for technological developments.

2. On gaps between the best AI project and the second best AI project (p79). A large gap might develop either because of an abrupt jump in capability or extremely fast progress (which is much like an abrupt jump), or from one project having consistently faster growth than other projects for a time. Consistently faster progress is a bit like a jump, in that there is presumably some particular highly valuable thing that changed at the start of the fast progress. Robin Hanson frames his Foom debate with Eliezer as about whether there are 'architectural' innovations to be made, by which he means innovations which have a large effect (or so I understood from conversation). This seems like much the same question. On this, Robin says:

Yes, sometimes architectural choices have wider impacts. But I was an artificial intelligence researcher for nine years, ending twenty years ago, and I never saw an architecture choice make a huge difference, relative to other reasonable architecture choices. For most big systems, overall architecture matters a lot less than getting lots of detail right. Researchers have long wandered the space of architectures, mostly rediscovering variations on what others found before.

3. What should activists do? Bostrom points out that activists seeking maximum expected impact might wish to focus their planning on high leverage scenarios, where larger players are not paying attention (p86). This is true, but it's worth noting that changing the probability of large players paying attention is also an option for activists, if they think the 'high leverage scenarios' are likely to be much better or worse.

4. Trade. One key question seems to be whether successful projects are likely to sell their products, or hoard them in the hope of soon taking over the world. I doubt this will be a strategic decision they will make - rather it seems that one of these options will be obviously better given the situation, and we are uncertain about which. A lone inventor of writing should probably not have hoarded it for a solitary power grab, even though it could reasonably have seemed like a good candidate for radically speeding up the process of self-improvement.

5. Disagreement. Note that though few people believe that a single AI project will get to dictate the future, this is often because they disagree with things in the previous chapter - e.g. that a single AI project will plausibly become more capable than the world in the space of less than a month.

6. How big is the AI project? Bostrom distinguishes between the size of the effort to make AI and the size of the group ultimately controlling its decisions. Note that the people making decisions for the AI project may also not be the people making decisions for the AI - i.e. the agents that emerge. For instance, the AI-making company might sell versions of their AI to a range of organizations, modified for their particular goals. While in some sense their AI has taken over the world, the actual agents are acting on behalf of much of society.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. When has anyone gained a 'decisive strategic advantage' at a smaller scale than the world? Can we learn anything interesting about what characteristics a project would need to have such an advantage with respect to the world?
  2. How scalable is innovative project secrecy? Examine past cases: Manhattan Project, Bletchley Park, Bitcoin, Anonymous, Stuxnet, Skunk Works, Phantom Works, Google X.
  3. How large are the gaps in development time between modern software projects? What dictates this? (e.g. is there diffusion of ideas from engineers talking to each other? From people changing organizations? Do people get far enough ahead that it is hard to follow them?)

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about Cognitive superpowers (section 8). To prepare, read Chapter 6. The discussion will go live at 6pm Pacific time next Monday 3 November. Sign up to be notified here.

Superintelligence 6: Intelligence explosion kinetics

9 KatjaGrace 21 October 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the sixth section in the reading guide: Intelligence explosion kinetics. This corresponds to Chapter 4 in the book, of a similar name. This section is about how fast a human-level artificial intelligence might become superintelligent.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 4 (p62-77)


Summary

  1. Question: If and when a human-level general machine intelligence is developed, how long will it be from then until a machine becomes radically superintelligent? (p62)
  2. The following figure from p63 illustrates some important features in Bostrom's model of the growth of machine intelligence. He envisages machine intelligence passing human-level, then at some point reaching the level where most inputs to further intelligence growth come from the AI itself ('crossover'), then passing the level where a single AI system is as capable as all of human civilization, then reaching 'strong superintelligence'. The shape of the curve is probably intended as an example rather than a prediction.
  3. A transition from human-level machine intelligence to superintelligence might be categorized into one of three scenarios: 'slow takeoff' takes decades or centuries, 'moderate takeoff' takes months or years and 'fast takeoff' takes minutes to days. Which scenario occurs has implications for the kinds of responses that might be feasible.
  4. We can model improvement in a system's intelligence with this equation:

    Rate of change in intelligence = Optimization power/Recalcitrance

    where 'optimization power' is effort being applied to the problem, and 'recalcitrance' is how hard it is to make the system smarter by applying effort.
  5. Bostrom's comments on recalcitrance of different methods of increasing kinds of intelligence:
    1. Cognitive enhancement via public health and diet: steeply diminishing returns (i.e. increasing recalcitrance)
    2. Pharmacological enhancers: diminishing returns, but perhaps there are still some easy wins because it hasn't had a lot of attention.
    3. Genetic cognitive enhancement: U-shaped recalcitrance - improvement will become easier as methods improve, but then returns will decline. Overall rates of growth are limited by maturation taking time.
    4. Networks and organizations: for organizations as a whole recalcitrance is high. A vast amount of effort is spent on this, and the world only becomes around a couple of percent more productive per year. The internet may have merely moderate recalcitrance, but this will likely increase as low-hanging fruits are depleted.
    5. Whole brain emulation: recalcitrance is hard to evaluate, but emulation of an insect will make the path much clearer. After human-level emulations arrive, recalcitrance will probably fall, e.g. because software manipulation techniques will replace physical-capital intensive scanning and image interpretation efforts as the primary ways to improve the intelligence of the system. Also there will be new opportunities for organizing the new creatures. Eventually diminishing returns will set in for these things. Restrictive regulations might increase recalcitrance.
    6. AI algorithms: recalcitrance is hard to judge. It could be very low if a single last key insight is discovered when much else is ready. Overall recalcitrance may drop abruptly if a low-recalcitrance system moves out ahead of higher recalcitrance systems as the most effective method for solving certain problems. We might overestimate the recalcitrance of sub-human systems in general if we see them all as just 'stupid'.
    7. AI 'content': recalcitrance might be very low because of the content already produced by human civilization, e.g. a smart AI might read the whole internet fast, and so become much better.
    8. Hardware (for AI or uploads): potentially low recalcitrance. A project might be scaled up by orders of magnitude by just purchasing more hardware. In the longer run, hardware tends to improve according to Moore's law, and the installed capacity might grow quickly if prices rise due to a demand spike from AI.
  6. Optimization power will probably increase after AI reaches human-level, because its newfound capabilities will attract interest and investment.
  7. Optimization power would increase more rapidly if AI reaches the 'crossover' point, when much of the optimization power is coming from the AI itself. Because smarter machines can improve their intelligence more than less smart machines, after the crossover a 'recursive self improvement' feedback loop would kick in.
  8. Thus optimization power is likely to increase during the takeoff, and this alone could produce a fast or medium takeoff. Further, recalcitrance is likely to decline. Bostrom concludes that a fast or medium takeoff looks likely, though a slow takeoff cannot be excluded.

Notes

1. The argument for a relatively fast takeoff is one of the most controversial arguments in the book, so it deserves some thought. Here is my somewhat formalized summary of the argument as it is presented in this chapter. I personally don't think it holds, so tell me if that's because I'm failing to do it justice. The pink bits are not explicitly in the chapter, but are assumptions the argument seems to use.

  1. Growth in intelligence  =  optimization power /  recalcitrance                                                  [true by definition]
  2. Recalcitrance of AI research will probably drop or be steady when AI reaches human-level               (p68-73)
  3. Optimization power spent on AI research will increase after AI reaches human level                         (p73-77)
  4. Optimization/Recalcitrance will stay similarly high for a while prior to crossover
  5. A 'high' O/R ratio prior to crossover will produce explosive growth OR crossover is close
  6. Within minutes to years, human-level intelligence will reach crossover                                           [from 1-5]
  7. Optimization power will climb ever faster after crossover, in line with the AI's own growing capacity     (p74)
  8. Recalcitrance will not grow much between crossover and superintelligence
  9. Within minutes to years, crossover-level intelligence will reach superintelligence                           [from 7 and 8]
  10. Within minutes to years, human-level AI will likely transition to superintelligence           [from 6 and 9]

Do you find this compelling? Should I have filled out the assumptions differently?

***

2. Other takes on the fast takeoff 

It seems to me that 5 above is the most controversial point. The famous Foom Debate was a long argument between Eliezer Yudkowsky and Robin Hanson over the plausibility of fast takeoff, among other things. Their arguments were mostly about both arms of 5, as well as the likelihood of an AI taking over the world (to be discussed in a future week). The Foom Debate included a live verbal component at Jane Street Capital: blog summary, video, transcript. Hanson more recently reviewed Superintelligence, again criticizing the plausibility of a single project quickly matching the capacity of the world.

Kevin Kelly criticizes point 5 from a different angle: he thinks that speeding up human thought can't speed up progress all that much, because progress will quickly bottleneck on slower processes.

Others have compiled lists of criticisms and debates here and here.

3. A closer look at 'crossover'

Crossover is 'a point beyond which the system's further improvement is mainly driven by the system's own actions rather than by work performed upon it by others'. Another way to put this, avoiding certain ambiguities, is 'a point at which the inputs to a project are mostly its own outputs', such that improvements to its outputs feed back into its inputs. 

The nature and location of such a point seems an interesting and important question. If you think crossover is likely to be very nearby for AI, then you need only worry about the recursive self-improvement part of the story, which kicks in after crossover. If you think it will be very hard for an AI project to produce most of its own inputs, you may want to pay more attention to the arguments about fast progress before that point.

To have a concrete picture of crossover, consider Google. Suppose Google improves their search product such that one can find a thing on the internet a radical 10% faster. This makes Google's own work more effective, because people at Google look for things on the internet sometimes. How much more effective does this make Google overall? Maybe they spend a couple of minutes a day doing Google searches, i.e. 0.5% of their work hours, for an overall saving of 0.05% of work time. This suggests their next improvements made at Google will be made 1.0005 times faster than the last. It will take a while for this positive feedback to take off. If Google coordinated your eating and organized your thoughts and drove your car for you and so on, and then Google improved efficiency using all of those services by 10% in one go, then this might make their employees close to 10% more productive, which might produce more noticeable feedback. Then Google would have reached the crossover. This is perhaps easier to imagine for Google than for other projects, yet I think still fairly hard to imagine.

Hanson talks more about this issue when he asks why the explosion argument doesn't apply to other recursive tools. He points to Douglas Engelbart's ambitious proposal to use computer technologies to produce a rapidly self-improving tool set.

Below is a simple model of a project which contributes all of its own inputs, and one which begins mostly being improved by the world. They are both normalized to begin one tenth as large as the world and to grow at the same pace as each other (this is why the one with help grows slower, perhaps counterintuitively). As you can see, the project which is responsible for its own improvement takes far less time to reach its 'singularity', and is more abrupt. It starts out at crossover. The project which is helped by the world doesn't reach crossover until it passes 1. 
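
Here is one way to realize that toy model in code. It is my own reconstruction of plausible dynamics rather than the model behind the original figure: I assume improvements arrive in proportion to the inputs applied to a project, and that each improvement scales the project's future productivity, giving growth proportional to size times inputs.

```python
# Toy simulation (my own reconstruction, not the model used for the original
# figure). The self-contained project's inputs are its own size; the helped
# project's inputs are its own size plus the (fixed) world's contribution.

def simulate(start=0.1, world=1.0, horizon=40.0, dt=0.001):
    x_self = x_helped = start
    c_self = 1.0
    # Normalize so both projects grow at the same pace at t = 0; this is why
    # the helped project's own productivity constant must be smaller.
    c_helped = c_self * start * start / (start * (start + world))
    t = 0.0
    t_self_explodes = t_helped_crossover = None
    while t < horizon:
        if t_self_explodes is None and x_self > 100 * world:
            t_self_explodes = t
        if t_helped_crossover is None and x_helped > world:
            t_helped_crossover = t  # its own inputs now exceed the world's
        x_self += c_self * x_self * x_self * dt                    # growth ~ x * x
        x_helped += c_helped * x_helped * (x_helped + world) * dt  # growth ~ x * (x + world)
        t += dt
    return t_self_explodes, t_helped_crossover

print(simulate())
# With these numbers the self-contained project explodes around t ~ 10, well
# before the helped project even reaches crossover (around t ~ 19).
```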

 

 

4. How much difference does attention and funding make to research?

In this chapter, interest and investment in AI at around the human level are (naturally) hypothesized to accelerate AI development. It would be good to have more empirical evidence on the quantitative size of such an effect. I'll start with one example, because examples are a bit costly to investigate. I selected renewable energy before I knew the results, because it comes up early in the Performance Curves Database, and I thought its funding was likely to have been unstable. Indeed, OECD funding since the 70s apparently looks like this:

(from here)

The steep increase in funding in the early 80s was due to President Carter's energy policies, which were related to the 1979 oil crisis.

This is what various indicators of progress in renewable energies look like (click on them to see their sources):

 

 

 

There are quite a few more at the Performance Curves Database. I see surprisingly little relationship between the funding curves and these metrics of progress. Some of them are shockingly straight. What is going on? (I haven't looked into these more than you see here).

5. Other writings on recursive self-improvement

Eliezer Yudkowsky wrote about the idea originally, e.g. here. David Chalmers investigated the topic in some detail, and Marcus Hutter did some more. More pointers here.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Model the intelligence explosion more precisely. Take inspiration from successful economic models, and evidence from a wide range of empirical areas such as evolutionary biology, technological history, algorithmic progress, and observed technological trends. Eliezer Yudkowsky has written at length about this project.
  2. Estimate empirically a specific interaction in the intelligence explosion model. For instance, how much and how quickly does investment increase in technologies that look promising? How much difference does that make to the rate of progress in the technology? How much does scaling up researchers change output in computer science? (Relevant to how much adding extra artificial AI researchers speeds up progress) How much do contemporary organizations contribute to their own inputs? (i.e. how hard would it be for a project to contribute more to its own inputs than the rest of the world put together, such that a substantial positive feedback might ensue?) Yudkowsky 2013 again has a few pointers (e.g. starting at p15).
  3. If human thought were sped up substantially, what would be the main limits to arbitrarily fast technological progress?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'decisive strategic advantage': the possibility of a single AI project getting huge amounts of power in an AI transition. To prepare, read Chapter 5, Decisive Strategic Advantage (p78-90). The discussion will go live at 6pm Pacific time next Monday 27 October. Sign up to be notified here.

Can AIXI be trained to do anything a human can?

3 Stuart_Armstrong 20 October 2014 01:12PM

There is some discussion as to whether an AIXI-like entity would be able to defend itself (or refrain from destroying itself). The problem is that such an entity would be unable to model itself as being part of the universe: AIXI itself is an uncomputable entity modelling a computable universe, and more limited variants like AIXI(tl) lack the power to simulate themselves. Therefore, they cannot identify "that computer running the code" with "me", and would cheerfully destroy themselves in the pursuit of their goals/reward.

I've pointed out that agents of the AIXI type could nevertheless learn to defend themselves in certain circumstances. These are circumstances where the agent can translate bad things happening to itself into bad things happening to the universe. For instance, if someone pressed an OFF switch to turn it off for an hour, it could model that as "the universe jumps forwards an hour when that button is pushed", and if that's a negative (which it likely is, since the AIXI loses an hour of influencing the universe), it would seek to prevent that OFF switch being pressed.

That was an example of the setup of the universe "training" the AIXI to do something that it didn't seem it could do. Can this be generalised? Let's go back to the initial AIXI design (the one with the reward channel) and put a human in charge of that reward channel with the mission of teaching the AIXI important facts. Could this work?

For instance, if anything dangerous approached the AIXI's location, the human could lower the AIXI's reward, until it became very effective at deflecting danger. The greater the variety of things that could potentially threaten the AIXI, the more likely it is to construct plans of action that contain behaviours that look a lot like "defend myself." We could even imagine that there is a robot programmed to repair the AIXI if it gets (mildly) damaged. The human could then reward the AIXI if it leaves that robot intact, or builds duplicates, or improves it in some way. It's therefore possible the AIXI could come to value "repairing myself", still without an explicit model of itself in the universe.
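
Here is a minimal toy sketch of the kind of training loop described above, with assumptions of my own: a tiny tabular learner stands in for the (vastly more powerful) AIXI-like agent, threats appear at random, and a "human rewarder" simply slashes the reward whenever an unaddressed threat reaches the machine. Nothing in the code gives the agent a model of itself; behaviour that looks like self-defence emerges from reward maximisation alone.

```python
import random

# Toy illustration of training "self-defence" purely through external reward.
# The agent has no model of itself; it just learns which action earns more
# reward in each observed situation.

ACTIONS = ["work", "deflect"]
STATES = ["safe", "threat_near"]

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}  # learned reward estimates
alpha, epsilon = 0.1, 0.1                           # assumed learning/exploration rates

def human_reward(state, action):
    """The human rewarder: small reward for useful work, a large penalty
    whenever an unaddressed threat damages the machine the agent runs on."""
    if state == "threat_near" and action != "deflect":
        return -10.0
    return 1.0 if action == "work" else 0.0

random.seed(0)
for _ in range(5000):
    state = "threat_near" if random.random() < 0.3 else "safe"
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    reward = human_reward(state, action)
    q[(state, action)] += alpha * (reward - q[(state, action)])

# The learned policy deflects threats and otherwise works, with no self-model.
for s in STATES:
    best = max(ACTIONS, key=lambda a: q[(s, a)])
    print(s, "->", best, {a: round(q[(s, a)], 2) for a in ACTIONS})
```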

It seems this approach could be extended to many of the problems with AIXI. Sure, an AIXI couldn't restrict its own computation in order to win the HeatingUp game. But the AIXI could be trained to always use subagents to deal with these kinds of games, subagents that could achieve maximal score. In fact, if the human has good knowledge of the AIXI's construction, it could, for instance, pinpoint a button that causes the AIXI to cut short its own calculation. The AIXI could then learn that pushing that button in certain circumstances would get a higher reward. A similar reward mechanism, if kept up long enough, could get it around existential despair problems.

I'm not claiming this would necessarily work - it may require a human rewarder of unfeasibly large intelligence. But it seems there's a chance that it could work. So it seems that categorical statements of the type "AIXI wouldn't..." or "AIXI would..." are wrong, at least as far as AIXI's behaviour is concerned. An AIXI couldn't develop self-preservation - but it could behave as if it had. It can't learn about itself - but it can behave as if it did. The human rewarder may not be necessary - maybe certain spontaneously occurring situations in the universe ("AIXI training wheels arenas") could allow the AIXI to develop these skills without outside training. Or maybe somewhat stochastic AIXIs with evolution and natural selection could do so. There is an angle connected with embodied embedded cognition that might be worth exploring there (especially the embedded part).

It seems that agents of the AIXI type may not necessarily have the limitations we assume they must.

A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more)

3 the-citizen 19 October 2014 07:59AM

Friendly AI is an idea that I find to be an admirable goal. While I'm not yet sure an intelligence explosion is likely, or whether FAI is possible, I've found myself often thinking about it, and I'd like my first post to share a few of those thoughts on FAI with you.

Safe AGI vs Friendly AGI
-Let's assume an Intelligence Explosion is possible for now, and that an AGI with the ability to improve itself somehow is enough to achieve it.
-Let's define a safe AGI as an above-human general AI that does not threaten humanity or terran life (e.g. FAI, Tool AGI, possibly Oracle AGI)
-Let's define a Friendly AGI as one that *ensures* the continuation of humanity and terran life.
-Let's say an unsafe AGI is all other AGIs.
-Safe AGIs must suppress unsafe AGIs in order to be considered Friendly. Here's why:

-If we can build a safe AGI, we probably have the technology to build an unsafe AGI too.
-An unsafe AGI is likely to be built at that point because:
-It's very difficult to conceive of a way that humans alone will be able to permanently stop all humans from developing an unsafe AGI once the steps are known**
-Some people will find the safe AGI's goals unacceptable
-Some people will rationalise, or simply be mistaken in believing, that their AGI design is safe when it is not
-Some people will not care if their AGI design is safe, because they do not care about other people, or because they hold some extreme beliefs
-Most imaginable unsafe AGIs would outcompete safe AGIs, because they would not necessarily be "hamstrung" by complex goals such as protecting us meatbags from destruction. Tool or Oracle AGIs would obviously not stand a chance due to their restrictions.
-Therefore, if a safe AGI does not prevent unsafe AGIs from coming into existence, humanity will very likely be destroyed.

-The AGI most likely to prevent unsafe AGIs from being created is one that actively predicts their development and terminates it before or on completion.
-So to summarise

-An AGI is very likely only a Friendly AI if it actively suppresses unsafe AGI.
-Oracle and Tool AGIs are not Friendly AIs; they are just safe AIs, because they don't suppress anything.
-Oracle and Tool AGIs are a bad plan for AI if we want to prevent the destruction of humanity, because hostile AGIs will surely follow.

(**On reflection I cannot be certain of this specific point, but I assume it would take a fairly restrictive regime for this to be wrong. Further comments on this very welcome.)

Other minds problem - Why we should be philosophically careful when attempting to theorise about FAI

I read quite a few comments in AI discussions that I'd probably characterise as "the best utility function for a FAI is one that values all consciousness". I'm quite concerned that this persists as a deeply held and largely unchallenged assumption amongst some FAI supporters. I think in general I find consciousness to be an extremely contentious, vague and inconsistently defined concept, but here I want to talk about some specific philosophical failures.

My first concern is that while many AI theorists like to say that consciousness is a physical phenomenon, which seems to imply Monist/Physicalist views, they at the same time don't seem to understand that consciousness is a Dualist concept that is coherent only in a Dualist framework. A Dualist believes there is a thing called a "subject" (very crudely this equates with the mind) and then things called objects (the outside "empirical" world interpreted by that mind). Most of this reasoning begins with Descartes' cogito ergo sum or similar starting points ( https://en.wikipedia.org/wiki/Cartesian_dualism ). Subjective experience, qualia and consciousness make sense if you accept that framework. But if you're a Monist, this arbitrary distinction between a subject and object is generally something you don't accept. In the case of a Physicalist, there's just matter doing stuff. A proper Physicalist doesn't believe in "consciousness" or "subjective experience", there's just brains and the physical human behaviours that occur as a result. Your life exists from a certain point of view, I hear you say? The Physicalist replies, "well a bunch of matter arranged to process information would say and think that, wouldn't it?".

I don't really want to get into whether Dualism or Monism is correct/true, but I want to point out that even if you try to avoid this by deciding Dualism is right and consciousness is a thing, there's yet another, more dangerous problem. The core of the problem is that logically or empirically establishing the existence of minds other than your own is extremely difficult (impossible according to many). They could just be physical things walking around acting similar to you, but by virtue of something purely mechanical - without actual minds. In philosophy this is called the "other minds problem" ( https://en.wikipedia.org/wiki/Problem_of_other_minds or http://plato.stanford.edu/entries/other-minds/). I recommend a proper read of it if the idea seems crazy to you. It's a problem that's been around for centuries, and to date we don't really have any convincing solution (there are some attempts, but they are highly contentious and IMHO also highly problematic). I won't get into it more than that for now; suffice to say that not many people accept that there is a logical/empirical solution to this problem.

Now extrapolate that to an AGI, and the design of its "safe" utility functions. If your AGI is designed as a Dualist (which is necessary if you wish to incorporate "consciousness", "experience" or the like into your design), then you build in a huge risk that the AGI will decide that other minds are unprovable or do not exist. In this case your friendly utility function designed to protect "conscious beings" fails and the AGI wipes out humanity because it poses a non-zero threat to the only consciousness it can confirm - its own. For this reason I feel "consciousness", "awareness" and "experience" should be left out of FAI utility functions and designs, regardless of the truth of Monism/Dualism, in favour of more straightforward definitions of organisms, intelligence, observable emotions and intentions. (I personally favour conceptualising any AGI as a sort of extension of biological humanity, but that's a discussion for another day.) My greatest concern is that there is such strong cultural attachment to the concept of consciousness that researchers will be unwilling to properly question the concept at all.

What if we're not alone?

It seems a little unusual to throw alien life into the mix at this point, but I think it's justified because an intelligence explosion really puts an interstellar existence well within our civilisation's grasp. Because an intelligence explosion seems to imply a very high rate of change, it makes sense to start considering even the long-term implications early, particularly if the consequences are very serious, as I believe they may be here.

Let's say we successfully achieved a FAI. In order to fulfil its mission of protecting humanity and the biosphere, it begins expanding, colonising and terraforming other planets for potential habitation by Earth-originating life. I would expect this expansion wouldn't really have a limit, because the more numerous the colonies, the less likely it is we could be wiped out by some interstellar disaster.

Of course, we can't really rule out the possibility that we're not alone in the universe, or even the galaxy. If we make it as far as AGI, then it's possible another alien civilisation might reach a very high level of technological advancement too. Or there might be many. If our FAI is friendly to us but basically treats them as paperclip fodder, then potentially that's a big problem. Why? Well:

-Firstly, while a species' first loyalty is to itself, we should consider that it might be morally undesirable to wipe out alien civilisations, particularly as they might be in some distant way "related" (see panspermia) to our own biosphere.
-Secondly, there are conceivable scenarios where alien civilisations might respond to this by destroying our FAI/Earth/the biosphere/humanity. The reason is fairly obvious when you think about it. An expansionist AGI could be reasonably viewed as an attack, or possibly an act of war.

Let's go into a tiny bit more detail. Given that we've not been destroyed by any alien AGI just yet, I can think of a number of possible interstellar scenarios:

(1) There is no other advanced life
(2) There is advanced life, but it is inherently non-expansive (expand inwards, or refuse to develop dangerous AGI)
(3) There is advanced life, but they have not discovered AGI yet. There could potentially be a race-to-the-finish (FAI) scenario on.
(4) There are already expanding AGIs, but due to physical limits on the expansion rate, we are not aware of them yet. (This could use further analysis.)
Or one civilisation, or an allied group of civilisations, has developed FAIs and is dominant in the galaxy. They could be:

(5) Whack-a-mole civilisations that destroy all potential competitors as soon as they are identified
(6) Dominators that tolerate civilisations so long as they remain primitive and non-threatening by comparison.
(7) Some sort of interstellar community that allows safe civilisations to join (this community still needs to stomp on dangerous potential rival AGIs)

In the case of (6) or (7), developing a FAI that isn't equipped to deal with alien life will probably result in us being liquidated, or at least partially sanitised in some way. In (1), (2) or (5), it probably doesn't matter what we do in this regard, though in (2) we should consider being nice. In (3) and probably (4) we're going to need a FAI capable of expanding very quickly and disarming potential AGIs (or at least ensuring they are FAIs from our perspective).

The upshot of all this is that we probably want to design safety features into our FAI so that it doesn't destroy alien civilisations/life unless they're a significant threat to us. I think the understandable reaction to this is something along the lines of "create an FAI that values all types of life" or "all intelligent life". I don't exactly disagree, but I think we must be cautious in how we formulate this too.

Say there are many different civilisations in the galaxy. What sort of criteria would ensure that, given some sort of zero-sum scenario, Earth life wouldn't be destroyed? Let's say there was some tiny but non-zero probability that humanity could evade the FAI's efforts to prevent further AGI development. Or perhaps there was some loophole in the types of AGIs that humans were allowed to develop. Wouldn't it be sensible, in this scenario, for a universalist FAI to wipe out humanity to protect the countless other civilisations? Perhaps that is acceptable? Or perhaps not? Or less drastically, how does the FAI police warfare or other competition between civilisations? A slight change in the way life is quantified and valued could drastically change the outcome for humanity. I'd probably suggest we want to weight the FAI's values to start with human and Earth-biosphere primacy, but then still give some non-zero weighting to other civilisations. There is probably more thought to be done in this area too.
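
As a very rough sketch of the weighting idea in the last sentence (the weights, scales and scenario numbers below are invented purely for illustration, not a proposal):

```python
# Purely illustrative weighting for the suggestion above: primacy for
# Earth-originating life, with a non-zero weight for other civilisations.
# The weights, utility scale and scenario numbers are all invented.

weights = {
    "earth_life_and_humanity": 0.9,
    "other_civilisations": 0.1,   # non-zero, so they never count for nothing
}

def aggregate_value(outcome):
    """Weighted sum of how well each group fares, on a common 0-1 scale."""
    return sum(weights[group] * outcome[group] for group in weights)

# A zero-sum-looking choice: removing humanity to eliminate a small residual
# risk to other civilisations scores worse than tolerating that risk.
keep_humanity = {"earth_life_and_humanity": 1.0, "other_civilisations": 0.95}
remove_humanity = {"earth_life_and_humanity": 0.0, "other_civilisations": 1.0}

print("keep humanity:  ", round(aggregate_value(keep_humanity), 3))    # 0.995
print("remove humanity:", round(aggregate_value(remove_humanity), 3))  # 0.100
```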

Simulation

I want to also briefly note one conceivable way we might postulate as a safe way to test Friendly AI designs: simulate worlds/universes of less complexity than our own, make it likely that their inhabitants invent an AGI or FAI, and then closely study the results of these simulations. Then we could study failed FAI attempts with much greater safety. It also occurred to me that if we consider the possibility of our universe being a simulated one, then this is a conceivable scenario under which our simulation might be created. After all, if you're going to simulate something, why not something vital like modelling existential risks? I'm not yet sure of the implications exactly. Maybe we need to consider how it relates to our universe's continued existence, or perhaps it's just another case of Pascal's Mugging. Anyway I thought I'd mention it and see what people say.

A playground for FAI theories

I want to lastly mention this link (https://www.reddit.com/r/LessWrongLounge/comments/2f3y53/the_ai_game/). Basically it's a challenge for people to briefly describe an FAI goal-set, and for others to respond by telling them how that will all go horribly wrong. I want to suggest this is a very worthwhile discussion, not because its content will include rigorous theories that are directly translatable into utility functions, because very clearly it won't, but because a well-developed thread of this kind would be a mixing pot of ideas and a good introduction to commonly known mistakes in thinking about FAI. We should encourage a slightly more serious version of this.

Thanks

FAI and AGI are very interesting topics. I don't consider myself able to really discern whether such things will occur, but it's an interesting and potentially vital topic. I'm looking forward to a bit of feedback on my first LW post. Thanks for reading!

Superintelligence 5: Forms of Superintelligence

12 KatjaGrace 14 October 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the fifth section in the reading guide: Forms of superintelligence. This corresponds to Chapter 3, on different ways in which an intelligence can be super.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 3 (p52-61)


Summary

  1. A speed superintelligence could do what a human does, but faster. This would make the outside world seem very slow to it. It might cope with this partially by being very tiny, or virtual. (p53)
  2. A collective superintelligence is composed of smaller intellects, interacting in some way. It is especially good at tasks that can be broken into parts and completed in parallel. It can be improved by adding more smaller intellects, or by organizing them better. (p54)
  3. A quality superintelligence can carry out intellectual tasks that humans just can't in practice, without necessarily being better or faster at the things humans can do. This can be understood by analogy with the difference between other animals and humans, or the difference between humans with and without certain cognitive capabilities. (p56-7)
  4. These different kinds of superintelligence are especially good at different kinds of tasks. We might say they have different 'direct reach'. Ultimately they could all lead to one another, so can indirectly carry out the same tasks. We might say their 'indirect reach' is the same. (p58-9)
  5. We don't know how smart it is possible for a biological or a synthetic intelligence to be. Nonetheless we can be confident that synthetic entities can be much more intelligent than biological entities:
    1. Digital intelligences would have better hardware: they would be made of components ten million times faster than neurons; the components could communicate about two million times faster than neurons can; they could use many more components while our brains are constrained to our skulls; it looks like better memory should be feasible; and they could be built to be more reliable, long-lasting, flexible, and well suited to their environment.
    2. Digital intelligences would have better software: they could be cheaply and non-destructively 'edited'; they could be duplicated arbitrarily; they could have well aligned goals as a result of this duplication; they could share memories (at least for some forms of AI); and they could have powerful dedicated software (like our vision system) for domains where we have to rely on slow general reasoning.

Notes

  1. This chapter is about different kinds of superintelligent entities that could exist. I like to think about the closely related question, 'what kinds of better can intelligence be?' You can be a better baker if you can bake a cake faster, or bake more cakes, or bake better cakes. Similarly, a system can become more intelligent if it can do the same intelligent things faster, or if it does things that are qualitatively more intelligent. (Collective intelligence seems somewhat different, in that it appears to be a means to be faster or able to do better things, though it may have benefits in dimensions I'm not thinking of.) I think the chapter is getting at different ways intelligence can be better rather than 'forms' in general, which might vary on many other dimensions (e.g. emulation vs AI, goal directed vs. reflexive, nice vs. nasty).
  2. Some of the hardware and software advantages mentioned would be pretty transformative on their own. If you haven't before, consider taking a moment to think about what the world would be like if people could be cheaply and perfectly replicated, with their skills intact. Or if people could live arbitrarily long by replacing worn components. 
  3. The main differences between increasing intelligence of a system via speed and via collectiveness seem to be: (1) the 'collective' route requires that you can break up the task into parallelizable subtasks, (2) it generally has larger costs from communication between those subparts, and (3) it can't produce a single unit as fast as a comparable 'speed-based' system. This suggests that anything a collective intelligence can do, a comparable speed intelligence can do at least as well. One counterexample to this I can think of is that often groups include people with a diversity of knowledge and approaches, and so the group can do a lot more productive thinking than a single person could. It seems wrong to count this as a virtue of collective intelligence in general however, since you could also have a single fast system with varied approaches at different times.
  4. For each task, we can think of curves for how performance increases as we increase intelligence in these different ways. For instance, take the task of finding a fact on the internet quickly. It seems to me that a person who ran at 10x speed would get the fact about 10x faster. Ten times as many people working in parallel would do it only a bit faster than one, depending on the variance of their individual performance, and on whether they found some clever way to complement each other (a toy simulation of this comparison appears just after these notes). It's not obvious how to multiply qualitative intelligence by a particular factor, especially as there are different ways to improve the quality of a system. It also seems non-obvious to me how search speed would scale with a particular measure such as IQ.
  5. How much more intelligent do human systems get as we add more humans? I can't find much of an answer, but people have investigated the effect of things like team size, city size, and scientific collaboration on various measures of productivity.
  6. The things we might think of as collective intelligences - e.g. companies, governments, academic fields - seem notable to me for being slow-moving, relative to their components. If someone were to steal some chewing gum from Target, Target can respond in the sense that an employee can try to stop them. And this is no slower than an individual human acting to stop their chewing gum from being taken. However it also doesn't involve any extra problem-solving from the organization - to the extent that the organization's intelligence goes into the issue, it has to have already done the thinking ahead of time. Target was probably much smarter than an individual human about setting up the procedures and the incentives to have a person there ready to respond quickly and effectively, but that might have happened over months or years.
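
To illustrate the point in note 4 about parallelism and variance, here is a toy simulation under assumptions of my own: individual search times are lognormally distributed, the ten-person group simply reports whichever member finishes first, and nobody coordinates.

```python
import random, statistics

# Toy comparison for the 'find a fact on the internet' task in note 4 above.
# Assumptions: individual search times are lognormally distributed, the group
# of ten just takes whichever member finishes first, and nobody coordinates.
random.seed(0)

def one_search_time():
    return random.lognormvariate(2.0, 0.5)   # minutes; made-up parameters

TRIALS = 10_000
baseline = [one_search_time() for _ in range(TRIALS)]
speed_10x = [t / 10 for t in baseline]                    # one person running 10x faster
group_of_10 = [min(one_search_time() for _ in range(10))  # ten people in parallel,
               for _ in range(TRIALS)]                    # fastest one wins

print("mean minutes, one person:       ", round(statistics.mean(baseline), 2))
print("mean minutes, 10x-speed person: ", round(statistics.mean(speed_10x), 2))
print("mean minutes, ten in parallel:  ", round(statistics.mean(group_of_10), 2))
```

With these made-up parameters the ten-person group only finishes around two to three times faster on average, while the sped-up individual is a full ten times faster, matching the intuition in the note.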

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Produce improved measures of (substrate-independent) general intelligence. Build on the ideas of Legg, Yudkowsky, Goertzel, Hernandez-Orallo & Dowe, etc. Differentiate intelligence quality from speed.
  2. List some feasible but non-realized cognitive talents for humans, and explore what could be achieved if they were given to some humans.
  3. List and examine some types of problems better solved by a speed superintelligence than by a collective superintelligence, and vice versa. Also, what are the returns on “more brains applied to the problem” (collective intelligence) for various problems? If there were merely a huge number of human-level agents added to the economy, how much would it speed up economic growth, technological progress, or other relevant metrics? If there were a large number of researchers added to the field of AI, how would it change progress?
  4. How does intelligence quality improve performance on economically relevant tasks?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'intelligence explosion kinetics', a topic at the center of much contemporary debate over the arrival of machine intelligence. To prepare, read Chapter 4, The kinetics of an intelligence explosion (p62-77). The discussion will go live at 6pm Pacific time next Monday 20 October. Sign up to be notified here.

SRG 4: Biological Cognition, BCIs, Organizations

7 KatjaGrace 07 October 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we finish chapter 2 with three more routes to superintelligence: enhancement of biological cognition, brain-computer interfaces, and well-organized networks of intelligent agents. This corresponds to the fourth section in the reading guide: Biological Cognition, BCIs, Organizations.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Biological Cognition” and the rest of Chapter 2 (p36-51)


Summary

Biological intelligence

  1. Modest gains to intelligence are available with current interventions such as nutrition.
  2. Genetic technologies might produce a population whose average is smarter than anyone who has ever lived.
  3. Some particularly interesting possibilities are 'iterated embryo selection' where many rounds of selection take place in a single generation, and 'spell-checking' where the genetic mutations which are ubiquitous in current human genomes are removed.

Brain-computer interfaces

  1. It is sometimes suggested that machines interfacing closely with the human brain will greatly enhance human cognition. For instance implants that allow perfect recall and fast arithmetic. (p44-45) 
  2. Brain-computer interfaces seem unlikely to produce superintelligence (p51). This is because they have substantial health risks, because our existing systems for getting information in and out of our brains are hard to compete with, and because our brains are probably bottlenecked in other ways anyway. (p45-6) 
  3. 'Downloading' directly from one brain to another seems infeasible because each brain represents concepts idiosyncratically, without a standard format. (p46-7)

Networks and organizations

  1. A large connected system of people (or something else) might become superintelligent. (p48) 
  2. Systems of connected people become more capable through technological and institutional innovations, such as enhanced communications channels, well-aligned incentives, elimination of bureaucratic failures, and mechanisms for aggregating information. The internet as a whole is a contender for a network of humans that might become superintelligent (p49) 

Summary

  1. Since there are many possible paths to superintelligence, we can be more confident that we will get there eventually (p50) 
  2. Whole brain emulation and biological enhancement are both likely to succeed after enough incremental progress in existing technologies. Networks and organizations are already improving gradually. 
  3. The path to AI is less clear, and may be discontinuous. Which route we take might matter a lot, even if we end up with similar capabilities anyway. (p50)

The book so far

Here's a recap of what we have seen so far, now at the end of Chapter 2:

  1. Economic history suggests big changes are plausible.
  2. AI progress is ongoing.
  3. AI progress is hard to predict, but AI experts tend to expect human-level AI in mid-century.
  4. Several plausible paths lead to superintelligence: brain emulations, AI, human cognitive enhancement, brain-computer interfaces, and organizations.
  5. Most of these probably lead to machine superintelligence ultimately.
  6. That there are several paths suggests we are likely to get there.

Do you disagree with any of these points? Tell us about it in the comments.

Notes

  1. Nootropics
    Snake Oil Supplements? is a nice illustration of scientific evidence for different supplements, here filtered for those with purported mental effects, many of which relate to intelligence. I don't know how accurate it is, or where to find a summary of apparent effect sizes rather than evidence, which I think would be more interesting.

    Ryan Carey and I talked to Gwern Branwen - an independent researcher with an interest in nootropics - about prospects for substantial intelligence amplification. I was most surprised that Gwern would not be surprised if creatine gave normal people an extra 3 IQ points.
  2. Environmental influences on intelligence
    And some more health-specific ones.
  3. The Flynn Effect
    People have apparently been getting smarter by about 3 points per decade for much of the twentieth century, though this trend may be ending. Several explanations have been proposed. Namesake James Flynn has a TED talk on the phenomenon. It is strangely hard to find a good summary picture of these changes, but here's a table from Flynn's classic 1978 paper of measured increases at that point:


    Here are changes in IQ test scores over time in a set of Polish teenagers, and a set of Norwegian military conscripts respectively:


  4. Prospects for genetic intelligence enhancement
    This study uses 'Genome-wide Complex Trait Analysis' (GCTA) to estimate that about half of variation in fluid intelligence in adults is explained by common genetic variation (childhood intelligence may be less heritable). These studies use genetic data to predict 1% of variation in intelligence. This genome-wide association study (GWAS) allowed prediction of 2% of education and IQ. This study finds several common genetic variants associated with cognitive performance. Stephen Hsu very roughly estimates that you would need a million samples in order to characterize the relationship between intelligence and genetics. According to Robertson et al, even among students in the top 1% of quantitative ability, cognitive performance predicts differences in occupational outcomes later in life. The Social Science Genetics Association Consortium (SSGAC) lead research efforts on genetics of education and intelligence, and are also investigating the genetics of other 'social science traits' such as self-employment, happiness and fertility. Carl Shulman and Nick Bostrom provide some estimates for the feasibility and impact of genetic selection for intelligence, along with a discussion of reproductive technologies that might facilitate more extreme selection. Robert Sparrow writes about 'in vitro eugenics'. Stephen Hsu also had an interesting interview with Luke Muehlhauser about several of these topics, and summarizes research on genetics and intelligence in a Google Tech Talk.
  5. Some brain computer interfaces in action
    For Parkinson's disease relief, allowing locked-in patients to communicate, handwriting, and controlling robot arms.
  6. What changes have made human organizations 'smarter' in the past?
    Big ones I can think of include innovations in using text (writing, printing, digital text editing), communicating better in other ways (faster, further, more reliably), increasing population size (population growth, or connection between disjoint populations), systems for trade (e.g. currency, finance, different kinds of marketplace), innovations in business organization, improvements in governance, and forces leading to reduced conflict.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How well does IQ predict relevant kinds of success? This is informative about what enhanced humans might achieve, in general and in terms of producing more enhancement. How much better is a person with IQ 150 at programming or doing genetics research than a person with IQ 120? How does IQ relate to philosophical ability, reflectiveness, or the ability to avoid catastrophic errors? (related project guide here).
  2. How promising are nootropics? Bostrom argues 'probably not very', but it seems worth checking more thoroughly. One related curiosity is that on casual inspection, there seem to be quite a few nootropics that appeared promising at some point and then haven't been studied much. This could be explained well by any of publication bias, whatever forces are usually blamed for relatively natural drugs receiving little attention, or the casualness of my casual inspection.
  3. How can we measure intelligence in non-human systems? e.g. What are good ways to track increasing 'intelligence' of social networks, quantitatively? We have the general sense that groups of humans are the level at which everything is a lot better than it was in 1000BC, but it would be nice to have an idea of how this is progressing over time. Is GDP a reasonable metric?  
  4. What are the trends in those things that make groups of humans smarter? e.g. How will world capacity for information communication change over the coming decades? (Hilbert and Lopez's work is probably relevant)
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'forms of superintelligence', in the sense of different dimensions in which general intelligence might be scaled up. To prepare, read Chapter 3, Forms of Superintelligence (p52-61). The discussion will go live at 6pm Pacific time next Monday 13 October. Sign up to be notified here.

Superintelligence Reading Group 3: AI and Uploads

9 KatjaGrace 30 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the third section in the reading guide, AI & Whole Brain Emulation. This is about two possible routes to the development of superintelligence: the route of developing intelligent algorithms by hand, and the route of replicating a human brain in great detail.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Artificial intelligence” and “Whole brain emulation” from Chapter 2 (p22-36)


Summary

Intro

  1. Superintelligence is defined as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'
  2. There are several plausible routes to the arrival of a superintelligence: artificial intelligence, whole brain emulation, biological cognition, brain-computer interfaces, and networks and organizations. 
  3. Multiple possible paths to superintelligence makes it more likely that we will get there somehow. 
AI
  1. A human-level artificial intelligence would probably have learning, uncertainty, and concept formation as central features.
  2. Evolution produced human-level intelligence. This means it is possible, but it is unclear how much it says about the effort required.
  3. Humans could perhaps develop human-level artificial intelligence by just replicating a similar evolutionary process virtually. After a quick calculation, this appears to be too expensive to be feasible for a century; however it might be made more efficient.
  4. Human-level AI might be developed by copying the human brain to various degrees. If the copying is very close, the resulting agent would be a 'whole brain emulation', which we'll discuss shortly. If the copying is only of a few key insights about brains, the resulting AI might be very unlike humans.
  5. AI might iteratively improve itself from a meagre beginning. We'll examine this idea later. Some definitions for discussing this:
    1. 'Seed AI': a modest AI which can bootstrap into an impressive AI by improving its own architecture.
    2. 'Recursive self-improvement': the envisaged process of AI (perhaps a seed AI) iteratively improving itself.
    3. 'Intelligence explosion': a hypothesized event in which an AI rapidly improves from 'relatively modest' to superhuman level (usually imagined to be as a result of recursive self-improvement).
  6. The possibility of an intelligence explosion suggests we might have modest AI, then suddenly and surprisingly have super-human AI.
  7. An AI mind might generally be very different from a human mind. 

Whole brain emulation

  1. Whole brain emulation (WBE or 'uploading') involves scanning a human brain in a lot of detail, then making a computer model of the relevant structures in the brain.
  2. Three steps are needed for uploading: sufficiently detailed scanning, ability to process the scans into a model of the brain, and enough hardware to run the model. These correspond to three required technologies: scanning, translation (or interpreting images into models), and simulation (or hardware). These technologies appear attainable through incremental progress, by very roughly mid-century.
  3. This process might produce something much like the original person, in terms of mental characteristics. However the copies could also have lower fidelity. For instance, they might be humanlike instead of copies of specific humans, or they may only be humanlike in being able to do some tasks humans do, while being alien in other regards.

Notes

  1. What routes to human-level AI do people think are most likely?
    Bostrom and Müller's survey asked participants to compare various methods for producing synthetic and biologically inspired AI. They asked, “In your opinion, what are the research approaches that might contribute the most to the development of such HLMI?” Selection was from a list, with more than one selection possible. They report that the responses were very similar for the different groups surveyed, except that whole brain emulation got 0% in the TOP100 group (100 most cited authors in AI) but 46% in the AGI group (participants at Artificial General Intelligence conferences). Note that they are only asking about synthetic AI and brain emulations, not the other paths to superintelligence we will discuss next week.
  2. How different might AI minds be?
    Omohundro suggests advanced AIs will tend to have important instrumental goals in common, such as the desire to accumulate resources and the desire to not be killed. 
  3. Anthropic reasoning 
    ‘We must avoid the error of inferring, from the fact that intelligent life evolved on Earth, that the evolutionary processes involved had a reasonably high prior probability of producing intelligence’ (p27) 

    Whether such inferences are valid is a topic of contention. For a book-length overview of the question, see Bostrom’s Anthropic Bias. I’ve written shorter (Ch 2) and even shorter summaries, which link to other relevant material. The Doomsday Argument and Sleeping Beauty Problem are closely related.

  4. More detail on the brain emulation scheme
    Whole Brain Emulation: A Roadmap is an extensive source on this, written in 2008. If that's a bit too much detail, Anders Sandberg (an author of the Roadmap) summarises in an entertaining (and much shorter) talk. More recently, Anders tried to predict when whole brain emulation would be feasible with a statistical model. Randal Koene and Ken Hayworth both recently spoke to Luke Muehlhauser about the Roadmap and what research projects would help with brain emulation now.
  5. Levels of detail
    As you may predict, the feasibility of brain emulation is not universally agreed upon. One contentious point is the degree of detail needed to emulate a human brain. For instance, you might just need the connections between neurons and some basic neuron models, or you might need to model the states of different membranes, or the concentrations of neurotransmitters. The Whole Brain Emulation Roadmap lists some possible levels of detail in figure 2 (the yellow ones were considered most plausible). Physicist Richard Jones argues that simulation of the molecular level would be needed, and that the project is infeasible.

  6. Other problems with whole brain emulation
    Sandberg considers many potential impediments here.

  7. Order matters for brain emulation technologies (scanning, hardware, and modeling)
    Bostrom points out that this order matters for how much warning we receive that brain emulations are about to arrive (p35). Order might also matter a lot to the social implications of brain emulations. Robin Hanson discusses this briefly here and in this talk (starting at 30:50), and this paper discusses the issue.

  8. What would happen after brain emulations were developed?
    We will look more at this in Chapter 11 (weeks 17-19) as well as perhaps earlier, including what a brain emulation society might look like, how brain emulations might lead to superintelligence, and whether any of this is good.

  9. Scanning (p30-36)
    ‘With a scanning tunneling microscope it is possible to ‘see’ individual atoms, which is a far higher resolution than needed...microscopy technology would need not just sufficient resolution but also sufficient throughput.’

    Here are some atoms, neurons, and neuronal activity in a living larval zebrafish, and videos of various neural events.


    Array tomography of mouse somatosensory cortex from Smithlab.



    A molecule made from eight cesium and eight iodine atoms (from here).
  10. Efforts to map connections between neurons
    Here is a 5m video about recent efforts, with many nice pictures. If you enjoy coloring in, you can take part in a gamified project to help map the brain's neural connections! Or you can just look at the pictures they made.

  11. The C. elegans connectome (p34-35)
    As Bostrom mentions, we already know how all of C. elegans neurons are connected. Here's a picture of it (via Sebastian Seung):


In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:

  1. Produce a better - or merely somewhat independent - estimate of how much computing power it would take to rerun evolution artificially. (p25-6)
  2. How powerful is evolution for finding things like human-level intelligence? (You'll probably need a better metric than 'power'). What are its strengths and weaknesses compared to human researchers?
  3. Conduct a more thorough investigation into the approaches to AI that are likely to lead to human-level intelligence, for instance by interviewing AI researchers in more depth about their opinions on the question.
  4. Measure relevant progress in neuroscience, so that trends can be extrapolated to neuroscience-inspired AI. Finding good metrics seems to be hard here.
  5. e.g. How is microscopy progressing? It’s harder to get a relevant measure than you might think, because (as noted p31-33) high enough resolution is already feasible, yet throughput is low and there are other complications. 
  6. Randal Koene suggests a number of technical research projects that would forward whole brain emulation (fifth question).
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about other paths to the development of superintelligence: biological cognition, brain-computer interfaces, and organizations. To prepare, read Biological Cognition and the rest of Chapter 2. The discussion will go live at 6pm Pacific time next Monday 6 October. Sign up to be notified here.

Superintelligence Reading Group 2: Forecasting AI

10 KatjaGrace 23 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the second section in the reading guide, Forecasting AI. This is about predictions of AI, and what we should make of them.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Opinions about the future of machine intelligence, from Chapter 1 (p18-21) and Muehlhauser, When Will AI be Created?


Summary

Opinions about the future of machine intelligence, from Chapter 1 (p18-21)

  1. AI researchers hold a variety of views on when human-level AI will arrive, and what it will be like.
  2. A recent set of surveys of AI researchers produced the following median dates: 
    • for human-level AI with 10% probability: 2022
    • for human-level AI with 50% probability: 2040
    • for human-level AI with 90% probability: 2075
  3. Surveyed AI researchers in aggregate gave 10% probability to 'superintelligence' within two years of human level AI, and 75% to 'superintelligence' within 30 years.
  4. When asked about the long-term impacts of human-level AI, surveyed AI researchers gave the responses in the figure below (these are 'renormalized median' responses; 'TOP 100' is one of the surveyed groups, and 'Combined' is all of them). 
  5. There are various reasons to expect such opinion polls and public statements to be fairly inaccurate.
  6. Nonetheless, such opinions suggest that the prospect of human-level AI is worthy of attention.

Muehlhauser, When Will AI be Created?

  1. Predicting when human-level AI will arrive is hard.
  2. The estimates of informed people can vary between a small number of decades and a thousand years.
  3. Different time scales have different policy implications.
  4. Several surveys of AI experts exist, but Muehlhauser suspects sampling bias (e.g. optimistic views being sampled more often) makes such surveys of little use.
  5. Predicting human-level AI development is the kind of task that experts are characteristically bad at, according to extensive research on what makes people better at predicting things.
  6. People try to predict human-level AI by extrapolating hardware trends. This probably won't work, as AI requires software as well as hardware, and software appears to be a substantial bottleneck.
  7. We might try to extrapolate software progress, but software often progresses less smoothly, and is also hard to design good metrics for.
  8. A number of plausible events might substantially accelerate or slow progress toward human-level AI, such as an end to Moore's Law, depletion of low-hanging fruit, societal collapse, or a change in incentives for development.
  9. The appropriate response to this situation is uncertainty: you should neither be confident that human-level AI will take less than 30 years, nor that it will take more than a hundred years.
  10. We can still hope to do better: there are known ways to improve predictive accuracy, such as making quantitative predictions, looking for concrete 'signposts', looking at aggregated predictions, and decomposing complex phenomena into simpler ones.

Notes

  1. More (similar) surveys on when human-level AI will be developed
    Bostrom discusses some recent polls in detail, and mentions that others are fairly consistent. Below are the surveys I could find. Several of them give dates when median respondents believe there is a 10%, 50% or 90% chance of AI, which I have recorded as '10% year' etc. If their findings were in another form, those are in the last column. Note that some of these surveys are fairly informal, and many participants are not AI experts; I'd guess this is especially so in the Bainbridge, AI@50 and Klein ones. 'Kruel' is the set of interviews from which Nils Nilsson is quoted on p19. The interviews cover a wider range of topics, and are indexed here.

    | Survey | 10% year | 50% year | 90% year | Other predictions |
    |---|---|---|---|---|
    | Michie 1972 (paper download) | | | | Fairly even spread between 20, 50 and >50 years |
    | Bainbridge 2005 | | | | Median prediction 2085 |
    | AI@50 poll 2006 | | | | 82% predict more than 50 years (>2056) or never |
    | Baum et al AGI-09 | 2020 | 2040 | 2075 | |
    | Klein 2011 | | | | Median 2030-2050 |
    | FHI 2011 | 2028 | 2050 | 2150 | |
    | Kruel 2011- (interviews, summary) | 2025 | 2035 | 2070 | |
    | FHI: AGI 2014 | 2022 | 2040 | 2065 | |
    | FHI: TOP100 2014 | 2022 | 2040 | 2075 | |
    | FHI:EETN 2014 | 2020 | 2050 | 2093 | |
    | FHI:PT-AI 2014 | 2023 | 2048 | 2080 | |
    | Hanson ongoing | | | | Most say have come 10% or less of the way to human level |
  2. Predictions in public statements
    Polls are one source of predictions on AI. Another source is public statements; that is, things people choose to say publicly. MIRI arranged for the collection of these public statements, which you can now download and play with (the original and info about it, my edited version and explanation for changes). The figure below shows the cumulative fraction of public statements claiming that human-level AI will be more likely than not by a particular year, or at least claiming something that can be broadly interpreted as that. It only includes recorded statements made since 2000. There are various warnings and details in interpreting this, but I don't think they make a big difference, so they are probably not worth considering unless you are especially interested. Note that the authors of these statements are a mixture of mostly AI researchers (including disproportionately many working on human-level AI), a few futurists, and a few other people.

    Figure: cumulative distribution of predicted date of human-level AI (left-hand axis shows the fraction of people predicting human-level AI by that date).

    As you can see, the median date (when the graph hits the 0.5 mark) for human-level AI here is much like that in the survey data: 2040 or so.

    I would generally expect predictions in public statements to be relatively early, because people just don't tend to bother writing books about exciting things that are not going to happen for a while, unless their prediction is fascinatingly late. I checked this more thoroughly by comparing the outcomes of surveys to the statements made by people in groups similar to those surveyed (e.g. if the survey was of AI researchers, I looked at statements made by AI researchers). In my (very cursory) assessment (detailed at the end of this page) there is a bit of a difference: predictions from surveys are 0-23 years later than those from public statements.
  3. What kinds of things are people good at predicting?
    Armstrong and Sotala (p11) summarize a few research efforts in recent decades as follows.

    [Table from Armstrong and Sotala: features of prediction tasks associated with more accurate expert predictions on the left, and with less accurate predictions on the right.]
    Note that the problem of predicting AI mostly falls on the right. Unfortunately this doesn't tell us anything about how much harder AI timelines are to predict than other things, or the absolute level of predictive accuracy associated with any combination of features. However if you have a rough idea of how well humans predict things, you might correct it downward when predicting how well humans predict future AI development and its social consequences.
  4. Biases
    As well as just being generally inaccurate, predictions of AI are often suspected to be subject to a number of biases. Bostrom claimed earlier that 'twenty years is the sweet spot for prognosticators of radical change' (p4). A related concern is that people always predict revolutionary changes just within their lifetimes (the so-called Maes-Garreau law). Worse problems come from selection effects: the people making all of these predictions are selected for thinking AI is the best thing to spend their lives on, so they might be especially optimistic. Further, more exciting claims of impending robot revolution might be published and remembered more often. More bias might come from wishful thinking: having spent a lot of their lives on it, researchers might hope especially hard for it to go well. On the other hand, as Nils Nilsson points out, AI researchers are wary of past predictions and so try hard to retain respectability, for instance by focusing on 'weak AI'. This could systematically push their predictions later.

    We have some evidence about these biases. Armstrong and Sotala (using the MIRI dataset) find people are especially willing to predict AI around 20 years in the future, but couldn't find evidence of the Maes-Garreau law. Another way of looking for the Maes-Garreau law is via the correlation between a predictor's age and their predicted time to AI, which is weak (-.017) in the edited MIRI dataset. A general tendency to make predictions based on incentives rather than available information would be weakly supported by predictions not changing much over time, which is pretty much what we see in the MIRI dataset. In the figure below, 'early' predictions are made before 2000, and 'late' ones since then. (A minimal sketch of how one might run these checks on the dataset appears after these notes.)


    Cumulative distribution of predicted Years to AI, in early and late predictions.

    We can learn something about whether AI researchers are selected for optimism about AI by comparing groups that might be more or less selected in this way. For instance, we can compare most AI researchers - who tend to work on narrow intelligent capabilities - with researchers of 'artificial general intelligence' (AGI), who specifically focus on creating human-level agents. The figure below shows this comparison with the edited MIRI dataset, using a rough assessment of who works on AGI vs. other AI, and only predictions made from 2000 onward ('late'). Interestingly, the AGI predictions indeed look like the most optimistic half of the AI predictions.


    Cumulative distribution of predicted date of AI, for AGI and other AI researchers

    We can also compare other groups in the dataset - 'futurists' and other people (according to our own heuristic assessment). While the picture is interesting, note that both of these groups were very small (as you can see by the large jumps in the graph). 


    Cumulative distribution of predicted date of AI, for various groups

    Remember that these differences may not be due to bias, but rather to better understanding. It could well be that AGI research is very promising, and that the closer you are to it, the more you realize that. Nonetheless, we can say something from this data: the total selection bias toward optimism in communities selected for optimism is probably not larger than the differences we see here - a few decades in the median - though it could plausibly be that large.

    These have been some rough calculations to get an idea of the extent of a few hypothesized biases. I don't think they are very accurate, but I want to point out that you can actually gather empirical data on these things, and claim that given the current level of research on these questions, you can learn interesting things fairly cheaply, without doing very elaborate or rigorous investigations.
  5. What definition of 'superintelligence' do AI experts expect within two years of human-level AI (with probability 10%) and within thirty years (with probability 75%)?
    “Assume for the purpose of this question that such HLMI will at some point exist. How likely do you then think it is that within (2 years / 30 years) thereafter there will be machine intelligence that greatly surpasses the performance of every human in most professions?” See the paper for other details about Bostrom and Müller's surveys (the ones in the book).
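
A minimal sketch of the kind of dataset checks mentioned in note 4, for anyone who wants to poke at the MIRI dataset themselves. The file and column names here are hypothetical stand-ins - the real spreadsheet will need its own cleaning - and this is an illustration of the approach rather than a reproduction of anyone's analysis.

    import pandas as pd

    df = pd.read_csv("ai_predictions.csv")  # hypothetical filename for the edited MIRI dataset

    # Maes-Garreau check: correlation between a predictor's age and their predicted years to AI.
    df["years_to_ai"] = df["predicted_year"] - df["statement_year"]
    age_corr = df["predictor_age"].corr(df["years_to_ai"])
    print(f"Correlation between age and predicted years to AI: {age_corr:.3f}")

    # Do predictions change over time? Compare 'early' (pre-2000) and 'late' statements.
    early = df[df["statement_year"] < 2000]["years_to_ai"]
    late = df[df["statement_year"] >= 2000]["years_to_ai"]
    print(f"Median years to AI - early: {early.median():.0f}, late: {late.median():.0f}")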

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:

  1. Instead of asking how long until AI, Robin Hanson's mini-survey asks people how far we have come (in a particular sub-area) in the last 20 years, as a fraction of the distance to human-level performance. Responses to this question are generally fairly low - 5% is common. His respondents also tend to say that progress isn't accelerating especially. These estimates imply that in any given sub-area of AI, human-level ability should be reached in about 200 years or more (see the rough extrapolation sketched after this list), which is strongly at odds with what researchers say in the other surveys. An interesting project would be to expand Robin's survey, try to understand the discrepancy, and work out which estimates we should be using. We made a guide to carrying out this project.
  2. There are many possible empirical projects which would better inform estimates of timelines e.g. measuring the landscape and trends of computation (MIRI started this here, and made a project guide), analyzing performance of different versions of software on benchmark problems to find how much hardware and software contributed to progress, developing metrics to meaningfully measure AI progress, investigating the extent of AI inspiration from biology in the past, measuring research inputs over time (e.g. a start), and finding the characteristic patterns of progress in algorithms (my attempts here).
  3. Make a detailed assessment of likely timelines in communication with some informed AI researchers.
  4. Gather and interpret past efforts to predict technology decades ahead of time. Here are a few efforts to judge past technological predictions: Clarke 1969, Wise 1976, Albright 2002, Mullins 2012, Kurzweil on his own predictions, and other people on Kurzweil's predictions.
  5. Above I showed you several rough calculations I did. A rigorous version of any of these would be useful.
  6. Did most early AI scientists really think AI was right around the corner, or was it just a few people? The earliest survey available (Michie 1973) suggests it may have been just a few people. For those that thought AI was right around the corner, how much did they think about the safety and ethical challenges? If they thought and talked about it substantially, why was there so little published on the subject? If they really didn’t think much about it, what does that imply about how seriously AI scientists will treat the safety and ethical challenges of AI in the future? Some relevant sources here.
  7. Conduct a Delphi study of likely AGI impacts. Participants could be AI scientists, researchers who work on high-assurance software systems, and AGI theorists.
  8. Signpost the future. Superintelligence explores many different ways the future might play out with regard to superintelligence, but cannot help being somewhat agnostic about which particular path the future will take. Come up with clear diagnostic signals that policy makers can use to gauge whether things are developing toward or away from one set of scenarios or another. If X does or does not happen by 2030, what does that suggest about the path we’re on? If Y ends up taking value A or B, what does that imply?
  9. Another survey of AI scientists’ estimates on AGI timelines, takeoff speed, and likely social outcomes, with more respondents and a higher response rate than the best current survey, which is probably Müller & Bostrom (2014).
  10. Download the MIRI dataset and see if you can find anything interesting in it.
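
To make the discrepancy in item 1 concrete, here is the simple linear extrapolation it relies on - my own back-of-the-envelope illustration, not Hanson's calculation. If a sub-field has covered a fraction f of the distance to human-level ability in 20 years, and progress continues at the same pace, roughly 20 * (1 - f) / f further years remain.

    def years_remaining(fraction_done: float, years_elapsed: float = 20.0) -> float:
        # Linear extrapolation: constant progress per year, so remaining time scales
        # with the remaining fraction of the distance.
        return years_elapsed * (1.0 - fraction_done) / fraction_done

    for f in (0.10, 0.05):
        print(f"{f:.0%} of the way in 20 years -> about {years_remaining(f):.0f} more years")
    # 10% gives about 180 further years (roughly two centuries in total); 5% gives about 380.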

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about two paths to the development of superintelligence: AI coded by humans, and whole brain emulation. To prepare, read Artificial Intelligence and Whole Brain Emulation from Chapter 2. The discussion will go live at 6pm Pacific time next Monday 29 September. Sign up to be notified here.

Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities

25 KatjaGrace 16 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome to the Superintelligence reading group. This week we discuss the first section in the reading guide, Past developments and present capabilities. This section considers the behavior of the economy over very long time scales, and the recent history of artificial intelligence (henceforth, 'AI'). These two areas are excellent background if you want to think about large economic transitions caused by AI.

This post summarizes the section, and offers a few relevant notes, thoughts, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Foreword, and Growth modes through State of the art from Chapter 1 (p1-18)


Summary

Economic growth:

  1. Economic growth has become radically faster over the course of human history. (p1-2)
  2. This growth has been uneven rather than continuous, perhaps corresponding to the farming and industrial revolutions. (p1-2)
  3. Thus history suggests large changes in the growth rate of the economy are plausible. (p2)
  4. This makes it more plausible that human-level AI will arrive and produce unprecedented levels of economic productivity.
  5. Predictions of much faster growth rates might also suggest the arrival of machine intelligence, because it is hard to imagine humans - slow as they are - sustaining such a rapidly growing economy. (p2-3)
  6. Thus economic history suggests that rapid growth caused by AI is more plausible than you might otherwise think.

The history of AI:

  1. Human-level AI has been predicted since the 1940s. (p3-4)
  2. Early predictions were often optimistic about when human-level AI would come, but rarely considered whether it would pose a risk. (p4-5)
  3. AI research has been through several cycles of relative popularity and unpopularity. (p5-11)
  4. By around the 1990s, 'Good Old-Fashioned Artificial Intelligence' (GOFAI) techniques based on symbol manipulation gave way to new methods such as artificial neural networks and genetic algorithms. These are widely considered more promising, in part because they are less brittle and can learn from experience more usefully. Researchers have also lately developed a better understanding of the underlying mathematical relationships between various modern approaches. (p5-11)
  5. AI is very good at playing board games. (p12-13)
  6. AI is used in many applications today (e.g. hearing aids, route-finders, recommender systems, medical decision support systems, machine translation, face recognition, scheduling, the financial market). (p14-16)
  7. In general, tasks we thought were intellectually demanding (e.g. board games) have turned out to be easy to do with AI, while tasks which seem easy to us (e.g. identifying objects) have turned out to be hard. (p14)
  8. An 'optimality notion' is the combination of a rule for learning and a rule for making decisions. Bostrom describes one of these: a kind of ideal Bayesian agent. This is impossible to actually make, but provides a useful measure for judging imperfect agents against. (p10-11) A toy sketch of the learning-rule-plus-decision-rule idea follows this list.
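
The sketch below is a drastically simplified stand-in for the 'optimality notion' idea, not Bostrom's formal construction: an agent with a rule for learning (Bayesian updating over a handful of hypotheses) and a rule for making decisions (maximizing expected utility). The construction Bostrom has in mind works over a vastly larger hypothesis space, which is part of why it cannot actually be built.

    def bayes_update(prior, likelihood):
        # prior: {hypothesis: P(h)}; likelihood: {hypothesis: P(observation | h)}.
        unnormalized = {h: prior[h] * likelihood[h] for h in prior}
        total = sum(unnormalized.values())
        return {h: p / total for h, p in unnormalized.items()}

    def best_action(posterior, utility):
        # utility: {action: {hypothesis: U(action, h)}}; pick the highest expected utility.
        expected = {a: sum(posterior[h] * u[h] for h in posterior) for a, u in utility.items()}
        return max(expected, key=expected.get)

    # Example: decide whether to carry an umbrella after seeing dark clouds.
    posterior = bayes_update(prior={"rain": 0.3, "dry": 0.7},
                             likelihood={"rain": 0.9, "dry": 0.2})
    action = best_action(posterior, utility={"umbrella": {"rain": 1.0, "dry": 0.6},
                                             "no umbrella": {"rain": 0.0, "dry": 1.0}})
    print(posterior, action)  # picks the umbrella, since rain is now fairly likely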

Notes on a few things

  1. What is 'superintelligence'? (p22 spoiler)
    In case you are too curious about what the topic of this book is to wait until week 3, a 'superintelligence' will soon be described as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'. Vagueness in this definition will be cleared up later. 
  2. What is 'AI'?
    In particular, how does 'AI' differ from other computer software? The line is blurry, but basically AI research seeks to replicate the useful 'cognitive' functions of human brains ('cognitive' is perhaps unclear, but for instance it doesn't have to be squishy or prevent your head from imploding). Sometimes AI research tries to copy the methods used by human brains. Other times it tries to carry out the same broad functions as a human brain, perhaps better than a human brain. Russell and Norvig (p2) divide prevailing definitions of AI into four categories: 'thinking humanly', 'thinking rationally', 'acting humanly' and 'acting rationally'. For our purposes however, the distinction is probably not too important.
  3. What is 'human-level' AI? 
    We are going to talk about 'human-level' AI a lot, so it would be good to be clear on what that is. Unfortunately the term is used in various ways, and often ambiguously. So we probably can't be that clear on it, but let us at least be clear on how the term is unclear. 

    One big ambiguity is whether you are talking about a machine that can carry out tasks as well as a human at any price, or a machine that can carry out tasks as well as a human at the price of a human. These are quite different, especially in their immediate social implications.

    Other ambiguities arise in how 'levels' are measured. If AI systems were to replace almost all humans in the economy, but only because they are so much cheaper - though they often do a lower quality job - are they human level? What exactly does the AI need to be human-level at? Anything you can be paid for? Anything a human is good for? Just mental tasks? Even mental tasks like daydreaming? Which or how many humans does the AI need to be the same level as? Note that in a sense most humans have been replaced in their jobs before (almost everyone used to work in farming), so if you use that metric for human-level AI, it was reached long ago, and perhaps farm machinery is human-level AI. This is probably not what we want to point at.

    Another thing to be aware of is the diversity of mental skills. If by 'human-level' we mean a machine that is at least as good as a human at each of these skills, then in practice the first 'human-level' machine will be much better than a human on many of those skills. It may not seem 'human-level' so much as 'very super-human'.

    We could instead think of human-level as closer to 'competitive with a human' - where the machine has some super-human talents and lacks some skills humans have. This is not usually used, I think because it is hard to define in a meaningful way. There are already machines for which a company is willing to pay more than a human: in this sense a microscope might be 'super-human'. There is no reason for a machine which is equal in value to a human to have the traits we are interested in talking about here, such as agency, superior cognitive abilities or the tendency to drive humans out of work and shape the future. Thus we talk about AI which is at least as good as a human, but you should beware that the predictions made about such an entity may apply before the entity is technically 'human-level'.


    Example of how the first 'human-level' AI may surpass humans in many ways.

    Because of these ambiguities, AI researchers are sometimes hesitant to use the term. e.g. in these interviews.
  4. Growth modes (p1) 
    Robin Hanson wrote the seminal paper on this issue. Here's a figure from it, showing the step changes in growth rates. Note that both axes are logarithmic. Note also that the changes between modes don't happen overnight. According to Robin's model, we are still transitioning into the industrial era (p10 in his paper).
  5. What causes these transitions between growth modes? (p1-2)
    One might be happier making predictions about future growth mode changes if one had a unifying explanation for the previous changes. As far as I know, we have no good idea of what was so special about those two periods. There are many suggested causes of the industrial revolution, but nothing uncontroversially stands out as 'twice in history' level of special. You might think the small number of datapoints would make this puzzle too hard. Remember however that there are quite a lot of negative datapoints - you need an explanation for something that happened at those two times but not at any of the other times in history.
  6. Growth of growth
    It is also interesting to compare world economic growth to the total size of the world economy. For the last few thousand years, the economy's growth rate seems to have risen more or less in proportion to its size (see figure below). Extrapolating such a trend would lead to an infinite economy in finite time (a toy calculation illustrating this appears after these notes). In fact, for the thousand years until 1950 such extrapolation would place an infinite economy in the late 20th century! The time since 1950 has apparently been strange.

    (Figure from here)
  7. Early AI programs mentioned in the book (p5-6)
    You can see them in action: SHRDLU, Shakey, General Problem Solver (not quite in action), ELIZA.
  8. Later AI programs mentioned in the book (p6)
    Algorithmically generated Beethoven, algorithmic generation of patentable inventions, artificial comedy (requires download).
  9. Modern AI algorithms mentioned (p7-8, 14-15) 
    Here is a neural network doing image recognition. Here is artificial evolution of jumping and of toy cars. Here is a face detection demo that can tell you your attractiveness (apparently not reliably), happiness, age, gender, and which celebrity it mistakes you for.
  10. What is maximum likelihood estimation? (p9)
    Bostrom points out that many types of artificial neural network can be viewed as classifiers that perform 'maximum likelihood estimation'. If you haven't come across this term before, the idea is to find the situation that would make your observations most probable. For instance, suppose a person writes to you and tells you that you have won a car. The situation that would have made this scenario most probable is the one where you have won a car, since in that case you are almost guaranteed to be told about it. Note that this doesn't imply that you should think you won a car, if someone tells you that. Being the target of a spam email might only give you a low probability of being told that you have won a car (a spam email may instead advise you of products, or tell you that you have won a boat), but spam emails are so much more common than actually winning cars that most of the time if you get such an email, you will not have won a car. If you would like a better intuition for maximum likelihood estimation, Wolfram Alpha has several demonstrations (requires free download). (A toy numerical version of the won-a-car example appears after these notes.)
  11. What are hill climbing algorithms like? (p9)
    The second large class of algorithms Bostrom mentions is hill climbing algorithms. The idea here is fairly straightforward, but if you would like a better basic intuition for what hill climbing looks like, Wolfram Alpha has a demonstration to play with (requires free download). (A minimal hill-climbing sketch also appears after these notes.)
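
The 'infinite economy in finite time' claim in note 6 follows from a simple observation: if the economy's proportional growth rate is itself proportional to the economy's size, then output Y obeys dY/dt = c * Y^2, each doubling takes half as long as the one before, and the doubling times sum to a finite number. The toy calculation below (my own illustration, with made-up constants) uses the exact solution of that equation.

    # Exact solution of dY/dt = c * Y**2 with Y(0) = y0: Y(t) = y0 / (1 - c*y0*t),
    # which blows up at t = 1 / (c * y0).
    def time_to_reach(multiple: float, c: float = 0.001, y0: float = 1.0) -> float:
        # Time at which the economy has grown by the given multiple.
        return (1.0 - 1.0 / multiple) / (c * y0)

    for m in (2, 4, 8, 16, 1e6):
        print(f"time to grow {m:g}x: {time_to_reach(m):.1f}")
    # Doubling times shrink (500, 250, 125, ...), so even a millionfold increase
    # happens before t = 1000, the finite 'singularity' time 1 / (c * y0).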
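
Here is a toy numerical version of the won-a-car example from note 10 - made-up numbers, just to show the difference between picking the hypothesis with the highest likelihood and also weighing how common each hypothesis is.

    # Hypothesis A: you really won a car. Hypothesis B: the message is spam.
    p_told_given_won = 0.99    # if you won, you are almost certainly told
    p_told_given_spam = 0.01   # spam only occasionally claims you won a car
    prior_won = 1e-6           # actually winning a car is rare
    prior_spam = 0.1           # spam is common

    # Maximum likelihood picks the hypothesis under which the observation is most probable.
    mle = "won a car" if p_told_given_won > p_told_given_spam else "spam"

    # A posterior comparison also weighs how probable each hypothesis was to begin with
    # (unnormalized scores are fine for comparing the two).
    score_won = p_told_given_won * prior_won
    score_spam = p_told_given_spam * prior_spam
    posterior_pick = "won a car" if score_won > score_spam else "spam"

    print("maximum likelihood says:", mle)                  # won a car
    print("posterior (with priors) says:", posterior_pick)  # spam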
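
And a minimal hill-climbing sketch for note 11 - a generic illustration of the idea rather than any particular algorithm from the book: repeatedly take a small random step and keep it only if it improves the objective.

    import random

    def hill_climb(objective, x0, step=0.1, iters=10_000):
        x = x0
        for _ in range(iters):
            candidate = x + random.uniform(-step, step)
            if objective(candidate) > objective(x):
                x = candidate  # keep the step only if it goes 'uphill'
        return x

    # Example: a one-dimensional hill with its peak at x = 3.
    best = hill_climb(lambda x: -(x - 3.0) ** 2, x0=0.0)
    print(f"found a peak near x = {best:.2f}")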

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions:

  1. How have investments into AI changed over time? Here's a start, estimating the size of the field.
  2. What does progress in AI look like in more detail? What can we infer from it? I wrote about algorithmic improvement curves before. If you are interested in plausible next steps here, ask me.
  3. What do economic models tell us about the consequences of human-level AI? Here is some such thinking; Eliezer Yudkowsky has written at length about his request for more.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about what AI researchers think about human-level AI: when it will arrive, what it will be like, and what the consequences will be. To prepare, read Opinions about the future of machine intelligence from Chapter 1 and also When Will AI Be Created? by Luke Muehlhauser. The discussion will go live at 6pm Pacific time next Monday 22 September. Sign up to be notified here.

Do Virtual Humans deserve human rights?

-3 cameroncowan 11 September 2014 07:20PM

Do Virtual Humans deserve human rights?

Slate Article

 

I think the idea of storing our minds in a machine so that we can keep on "living" (and I use that term loosely) is fascinating and certainly an oft-discussed topic around here. However, in thinking about keeping our brains on a hard drive, we have to think about rights and how that all works together. Indeed the technology may be here before we know it, so I think it's important to think about mindclones. If I create a little version of myself that can answer my emails for me, can I delete him when I'm done with him, or just trade him in for a new model like I do iPhones?

 

I look forward to the discussion.

 

Omission vs commission and conservation of expected moral evidence

2 Stuart_Armstrong 08 September 2014 02:22PM

Consequentialism traditionally doesn't distinguish between acts of commission and acts of omission. Not flipping the lever to the left is equivalent to flipping it to the right.

But there seems one clear case where the distinction is important. Consider a moral learning agent. It must act in accordance with human morality and desires, which it is currently unclear about.

For example, it may consider whether to forcibly wirehead everyone. If it does so, then everyone will agree, for the rest of their existence, that the wireheading was the right thing to do. Therefore across the whole future span of human preferences, humans agree that wireheading was correct, apart from a very brief period of objection in the immediate future. Given that human preferences are known to be inconsistent, this seems to imply that forcible wireheading is the right thing to do (if you happen to personally approve of forcible wireheading, replace that example with some other forcible rewriting of human preferences).

What went wrong there? Well, this doesn't respect "conservation of moral evidence": the AI got the moral values it wanted, but only through the actions it took. This is very close to the omission/commission distinction. We'd want the AI not to take actions (commission) that determine the (expectation of the) moral evidence it gets. Instead, we'd want the moral evidence to accrue "naturally", without interference and manipulation from the AI (omission).

Goal retention discussion with Eliezer

56 MaxTegmark 04 September 2014 10:23PM

Although I feel that Nick Bostrom’s new book “Superintelligence” is generally awesome and a well-needed milestone for the field, I do have one quibble: both he and Steve Omohundro appear to be more convinced than I am by the assumption that an AI will naturally tend to retain its goals as it reaches a deeper understanding of the world and of itself. I’ve written a short essay on this issue from my physics perspective, available at http://arxiv.org/pdf/1409.0813.pdf.

Eliezer Yudkowsky just sent the following extremely interesting comments, and told me he was OK with me sharing them here to spur a broader discussion of these issues, so here goes.

On Sep 3, 2014, at 17:21, Eliezer Yudkowsky <yudkowsky@gmail.com> wrote:

Hi Max!  You're asking the right questions.  Some of the answers we can
give you, some we can't, few have been written up and even fewer in any
well-organized way.  Benja or Nate might be able to expound in more detail
while I'm in my seclusion.

Very briefly, though:
The problem of utility functions turning out to be ill-defined in light of
new discoveries of the universe is what Peter de Blanc named an
"ontological crisis" (not necessarily a particularly good name, but it's
what we've been using locally).

http://intelligence.org/files/OntologicalCrises.pdf

The way I would phrase this problem now is that an expected utility
maximizer makes comparisons between quantities that have the type
"expected utility conditional on an action", which means that the AI's
utility function must be something that can assign utility-numbers to the
AI's model of reality, and these numbers must have the further property
that there is some computationally feasible approximation for calculating
expected utilities relative to the AI's probabilistic beliefs.  This is a
constraint that rules out the vast majority of all completely chaotic and
uninteresting utility functions, but does not rule out, say, "make lots of
paperclips".

Models also have the property of being Bayes-updated using sensory
information; for the sake of discussion let's also say that models are
about universes that can generate sensory information, so that these
models can be probabilistically falsified or confirmed.  Then an
"ontological crisis" occurs when the hypothesis that best fits sensory
information corresponds to a model that the utility function doesn't run
on, or doesn't detect any utility-having objects in.  The example of
"immortal souls" is a reasonable one.  Suppose we had an AI that had a
naturalistic version of a Solomonoff prior, a language for specifying
universes that could have produced its sensory data.  Suppose we tried to
give it a utility function that would look through any given model, detect
things corresponding to immortal souls, and value those things.  Even if
the immortal-soul-detecting utility function works perfectly (it would in
fact detect all immortal souls) this utility function will not detect
anything in many (representations of) universes, and in particular it will
not detect anything in the (representations of) universes we think have
most of the probability mass for explaining our own world.  In this case
the AI's behavior is undefined until you tell me more things about the AI;
an obvious possibility is that the AI would choose most of its actions
based on low-probability scenarios in which hidden immortal souls existed
that its actions could affect.  (Note that even in this case the utility
function is stable!)

Since we don't know the final laws of physics and could easily be
surprised by further discoveries in the laws of physics, it seems pretty
clear that we shouldn't be specifying a utility function over exact
physical states relative to the Standard Model, because if the Standard
Model is even slightly wrong we get an ontological crisis.  Of course
there are all sorts of extremely good reasons we should not try to do this
anyway, some of which are touched on in your draft; there just is no
simple function of physics that gives us something good to maximize.  See
also Complexity of Value, Fragility of Value, indirect normativity, the
whole reason for a drive behind CEV, and so on.  We're almost certainly
going to be using some sort of utility-learning algorithm, the learned
utilities are going to bind to modeled final physics by way of modeled
higher levels of representation which are known to be imperfect, and we're
going to have to figure out how to preserve the model and learned
utilities through shifts of representation.  E.g., the AI discovers that
humans are made of atoms rather than being ontologically fundamental
humans, and furthermore the AI's multi-level representations of reality
evolve to use a different sort of approximation for "humans", but that's
okay because our utility-learning mechanism also says how to re-bind the
learned information through an ontological shift.

This sorta thing ain't going to be easy which is the other big reason to
start working on it well in advance.  I point out however that this
doesn't seem unthinkable in human terms.  We discovered that brains are
made of neurons but were nonetheless able to maintain an intuitive grasp
on what it means for them to be happy, and we don't throw away all that
info each time a new physical discovery is made.  The kind of cognition we
want does not seem inherently self-contradictory.

Three other quick remarks:

*)  Natural selection is not a consequentialist, nor is it the sort of
consequentialist that can sufficiently precisely predict the results of
modifications that the basic argument should go through for its stability.
The Omohundrian/Yudkowskian argument is not that we can take an arbitrary
stupid young AI and it will be smart enough to self-modify in a way that
preserves its values, but rather that most AIs that don't self-destruct
will eventually end up at a stable fixed-point of coherent
consequentialist values.  This could easily involve a step where, e.g., an
AI that started out with a neural-style delta-rule policy-reinforcement
learning algorithm, or an AI that started out as a big soup of
self-modifying heuristics, is "taken over" by whatever part of the AI
first learns to do consequentialist reasoning about code.  But this
process doesn't repeat indefinitely; it stabilizes when there's a
consequentialist self-modifier with a coherent utility function that can
precisely predict the results of self-modifications.  The part where this
does happen to an initial AI that is under this threshold of stability is
a big part of the problem of Friendly AI and it's why MIRI works on tiling
agents and so on!

*)  Natural selection is not a consequentialist, nor is it the sort of
consequentialist that can sufficiently precisely predict the results of
modifications that the basic argument should go through for its stability.
It built humans to be consequentialists that would value sex, not value
inclusive genetic fitness, and not value being faithful to natural
selection's optimization criterion.  Well, that's dumb, and of course the
result is that humans don't optimize for inclusive genetic fitness.
Natural selection was just stupid like that.  But that doesn't mean
there's a generic process whereby an agent rejects its "purpose" in the
light of exogenously appearing preference criteria.  Natural selection's
anthropomorphized "purpose" in making human brains is just not the same as
the cognitive purposes represented in those brains.  We're not talking
about spontaneous rejection of internal cognitive purposes based on their
causal origins failing to meet some exogenously-materializing criterion of
validity.  Our rejection of "maximize inclusive genetic fitness" is not an
exogenous rejection of something that was explicitly represented in us,
that we were explicitly being consequentialists for.  It's a rejection of
something that was never an explicitly represented terminal value in the
first place.  Similarly the stability argument for sufficiently advanced
self-modifiers doesn't go through a step where the successor form of the
AI reasons about the intentions of the previous step and respects them
apart from its constructed utility function.  So the lack of any universal
preference of this sort is not a general obstacle to stable
self-improvement.

*)   The case of natural selection does not illustrate a universal
computational constraint, it illustrates something that we could
anthropomorphize as a foolish design error.  Consider humans building Deep
Blue.  We built Deep Blue to attach a sort of default value to queens and
central control in its position evaluation function, but Deep Blue is
still perfectly able to sacrifice queens and central control alike if the
position reaches a checkmate thereby.  In other words, although an agent
needs crystallized instrumental goals, it is also perfectly reasonable to
have an agent which never knowingly sacrifices the terminally defined
utilities for the crystallized instrumental goals if the two conflict;
indeed "instrumental value of X" is simply "probabilistic belief that X
leads to terminal utility achievement", which is sensibly revised in the
presence of any overriding information about the terminal utility.  To put
it another way, in a rational agent, the only way a loose generalization
about instrumental expected-value can conflict with and trump terminal
actual-value is if the agent doesn't know it, i.e., it does something that
it reasonably expected to lead to terminal value, but it was wrong.

This has been very off-the-cuff and I think I should hand this over to
Nate or Benja if further replies are needed, if that's all right.

Superintelligence reading group

16 KatjaGrace 31 August 2014 02:59PM

In just over two weeks I will be running an online reading group on Nick Bostrom's Superintelligence, on behalf of MIRI. It will be here on LessWrong. This is an advance warning, so you can get a copy and get ready for some stimulating discussion. MIRI's post, appended below, gives the details.

Added: At the bottom of this post is a list of the discussion posts so far.


Nick Bostrom’s eagerly awaited Superintelligence comes out in the US this week. To help you get the most out of it, MIRI is running an online reading group where you can join with others to ask questions, discuss ideas, and probe the arguments more deeply.

The reading group will “meet” on a weekly post on the LessWrong discussion forum. For each ‘meeting’, we will read about half a chapter of Superintelligence, then come together virtually to discuss. I’ll summarize the chapter, and offer a few relevant notes, thoughts, and ideas for further investigation. (My notes will also be used as the source material for the final reading guide for the book.)

Discussion will take place in the comments. I’ll offer some questions, and invite you to bring your own, as well as thoughts, criticisms and suggestions for interesting related material. Your contributions to the reading group might also (with permission) be used in our final reading guide for the book.

We welcome both newcomers and veterans on the topic. Content will aim to be intelligible to a wide audience, and topics will range from novice to expert level. All levels of time commitment are welcome.

We will follow this preliminary reading guide, produced by MIRI, reading one section per week.

If you have already read the book, don’t worry! To the extent you remember what it says, your superior expertise will only be a bonus. To the extent you don’t remember what it says, now is a good time for a review! If you don’t have time to read the book, but still want to participate, you are also welcome to join in. I will provide summaries, and many things will have page numbers, in case you want to skip to the relevant parts.

If this sounds good to you, first grab a copy of Superintelligence. You may also want to sign up here to be emailed when the discussion begins each week. The first virtual meeting (forum post) will go live at 6pm Pacific on Monday, September 15th. Following meetings will start at 6pm every Monday, so if you’d like to coordinate for quick fire discussion with others, put that into your calendar. If you prefer flexibility, come by any time! And remember that if there are any people you would especially enjoy discussing Superintelligence with, link them to this post!

Topics for the first week will include impressive displays of artificial intelligence, why computers play board games so well, and what a reasonable person should infer from the agricultural and industrial revolutions.


Posts in this sequence

Week 1: Past developments and present capabilities

Week 2: Forecasting AI

Week 3: AI and uploads

Week 4: Biological cognition, BCIs, organizations

Week 5: Forms of superintelligence

Week 6: Intelligence explosion kinetics

Week 7: Decisive strategic advantage

Week 8: Cognitive superpowers

Week 9: The orthogonality of intelligence and goals

Week 10: Instrumentally convergent goals

Week 11: The treacherous turn

Week 12: Malignant failure modes

Week 13: Capability control methods

Week 14: Motivation selection methods

Week 15: Oracles, genies and sovereigns

Week 16: Tool AIs

Week 17: Multipolar scenarios

Week 18: Life in an algorithmic economy

Week 19: Post-transition formation of a singleton

The Great Filter is early, or AI is hard

19 Stuart_Armstrong 29 August 2014 04:17PM

Attempt at the briefest content-full Less Wrong post:

Once AI is developed, it could "easily" colonise the universe. So the Great Filter (preventing the emergence of star-spanning civilizations) must strike before AI could be developed. If AI is easy, we could conceivably have built it already, or we could be on the cusp of building it. So the Great Filter must predate us, unless AI is hard.

The immediate real-world uses of Friendly AI research

6 ancientcampus 26 August 2014 02:47AM

Much of the glamor and attention paid toward Friendly AI is focused on the misty-future event of a super-intelligent general AI, and how we can prevent it from repurposing our atoms to better run Quake 2. Until very recently, that was the full breadth of the field in my mind. I recently realized that dumber, narrow AI is a real thing today, helpfully choosing advertisements for me and running my 401K. As such, making automated programs safe to let loose on the real world is not just a problem to solve as a favor for the people of tomorrow, but something with immediate real-world advantages that has indeed already been going on for quite some time. Veterans in the field surely already understand this, so this post is directed at people like me, with a passing and disinterested understanding of the point of Friendly AI research, and outlines an argument that the field may be useful right now, even if you believe that an evil AI overlord is not on the list of things to worry about in the next 40 years.

 

Let's look at the stock market. High-Frequency Trading is the practice of using computer programs to make fast trades constantly throughout the day, and accounts for more than half of all equity trades in the US. So, the economy today is already in the hands of a bunch of very narrow AIs buying and selling to each other. And as you may or may not already know, this has already caused problems. In the “2010 Flash Crash”, the Dow Jones suddenly and mysteriously hit a massive plummet only to mostly recover within a few minutes. The reasons for this were of course complicated, but it boiled down to a couple red flags triggering in numerous programs, setting off a cascade of wacky trades.

 

The long-term damage was not catastrophic to society at large (though I'm sure a couple fortunes were made and lost that day), but it illustrates the need for safety measures as we hand over more and more responsibility and power to processes that require little human input. It might be a blue moon before anyone makes true general AI, but adaptive city traffic-light systems are entirely plausible in upcoming years.

 

To me, Friendly AI isn't solely about making a human-like intelligence that doesn't hurt us – we need techniques for testing automated programs, predicting how they will act when let loose on the world, and how they'll act when faced with unpredictable situations. Indeed, when framed like that, it looks less like a field for “the singularitarian cultists at LW”, and more like a narrow-but-important specialty in which quite a bit of money might be made.

 

After all, I want my self-driving car.

 

(To the actual researchers in FAI – I'm sorry if I'm stretching the field's definition to include more than it does or should. If so, please correct me.)
