
Superintelligence 11: The treacherous turn

6 KatjaGrace 25 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the eleventh section in the reading guide: The treacherous turn. This corresponds to Chapter 8.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Existential catastrophe…” and “The treacherous turn” from Chapter 8


Summary

  1. The possibility of a first mover advantage + orthogonality thesis + convergent instrumental values suggests doom for humanity (p115-6)
    1. First mover advantage implies the AI is in a position to do what it wants
    2. Orthogonality thesis implies that what it wants could be all sorts of things
    3. Instrumental convergence thesis implies that regardless of its wants, it will try to acquire resources and eliminate threats
    4. Humans have resources and may be threats
    5. Therefore an AI in a position to do what it wants is likely to want to take our resources and eliminate us. i.e. doom for humanity.
  2. One kind of response: why wouldn't the makers of the AI be extremely careful not to develop and release dangerous AIs, or relatedly, why wouldn't someone else shut the whole thing down? (p116)
  3. It is hard to observe whether an AI is dangerous via its behavior at a time when you could turn it off, because AIs have convergent instrumental reasons to pretend to be safe, even if they are not. If they expect their minds to be surveilled, even observing their thoughts may not help. (p117)
  4. The treacherous turn: while weak, an AI behaves cooperatively. When the AI is strong enough to be unstoppable it pursues its own values. (p119)
  5. We might expect AIs to become safer as they initially get smarter - while most of the risks come from crashing self-driving cars or mis-firing drones - and then to become much less safe once they get too smart. (p117)
  6. One can imagine a scenario where there is little social impetus for safety (p117-8): alarmists will have been wrong for a long time, smarter AI will have been safer for a long time, large industries will be invested, an exciting new technique will be hard to set aside, useless safety rituals will be available, and the AI will look cooperative enough in its sandbox.
  7. The conception of deception: that moment when the AI realizes that it should conceal its thoughts (footnote 2, p282)

Another view

Danaher:

This is all superficially plausible. It is indeed conceivable that an intelligent system — capable of strategic planning — could take such treacherous turns. And a sufficiently time-indifferent AI could play a “long game” with us, i.e. it could conceal its true intentions and abilities for a very long time. Nevertheless, accepting this has some pretty profound epistemic costs. It seems to suggest that no amount of empirical evidence could ever rule out the possibility of a future AI taking a treacherous turn. In fact, it's even worse than that. If we take it seriously, then it is possible that we have already created an existentially threatening AI. It’s just that it is concealing its true intentions and powers from us for the time being.

I don’t quite know what to make of this. Bostrom is a pretty rational, bayesian guy. I tend to think he would say that if all the evidence suggests that our AI is non-threatening (and if there is a lot of that evidence), then we should heavily discount the probability of a treacherous turn. But he doesn’t seem to add that qualification in the chapter. He seems to think the threat of an existential catastrophe from a superintelligent AI is pretty serious. So I’m not sure whether he embraces the epistemic costs I just mentioned or not.

Notes

1. Danaher also made a nice diagram of the case for doom, and its relationship with the treacherous turn:

 

2. History

According to Luke Muehlhauser's timeline of AI risk ideas, the treacherous turn idea for AIs has been around since at least 1977, when a fictional worm did it:

1977: Self-improving AI could stealthily take over the internet; convergent instrumental goals in AI; the treacherous turn. Though the concept of a self-propagating computer worm was introduced by John Brunner's The Shockwave Rider (1975), Thomas J. Ryan's novel The Adolescence of P-1 (1977) tells the story of an intelligent worm that at first is merely able to learn to hack novel computer systems and use them to propagate itself, but later (1) has novel insights on how to improve its own intelligence, (2) develops convergent instrumental subgoals (see Bostrom 2012) for self-preservation and resource acquisition, and (3) learns the ability to fake its own death so that it can grow its powers in secret and later engage in a "treacherous turn" (see Bostrom forthcoming) against humans.

 

3. The role of the premises

Bostrom's argument for doom has one premise that says AI could care about almost anything, then another that says regardless of what an AI cares about, it will do basically the same terrible things anyway. (p115) Do these sound a bit strange together to you? Why do we need the first, if final values don't tend to change instrumental goals anyway?

It seems the immediate reason is that an AI with values we like would not have the convergent goal of taking all our stuff and killing us. That is, the values we want an AI to have are some of those rare values that don't lead to destructive instrumental goals. Why is this? Because we (and thus the AI) care about the activities the resources would be grabbed from. If the resources were currently being used for anything we didn't care about, then our values would also suggest grabbing resources, and look similar to all of the other values. The difference that makes our values special here is just that most resources are already being used for them somewhat.

4. Signaling

It is hard to tell apart a safe and an unsafe AI, because both would like to look safe. This is a very common problem in human interactions. For instance, it can be nontrivial to tell a genuine lover from a gold digger, a businessman from a conman, and an expert from a crank. All of them want to look like the desirable sort. Particularly similar to the AI case is that of hiring a new employee for a trial period. You will sometimes find that the employee's values are much better aligned during the trial period, and then they undergo a 'treacherous turn' once they have been hired more permanently.

'Costly signaling' is a general purpose solution to this problem, which works some of the time. The basic idea is this. Everyone has instrumental reasons to look like the good kind of person, but perhaps their reasons aren't exactly as strong as one another's, or the strength of their desire is harder to act on for one group than the other, so we can construct a set of options that will lead the different types of people to choose differently, even though they know this will set them apart. For instance, while an honest businessman and a conman would both like to say 'I'm an honest businessman', and to behave honestly if you watch them for five minutes, a conman would benefit less from an expensive shopfront than an honest businessman would, because his reputation is less valuable, so a brand is a signal of being honest.
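To make the separating mechanism concrete, here is a toy numeric sketch (the payoff numbers are invented for illustration, not taken from anywhere):

```python
# Toy model of costly signaling: a brand (the costly signal) is only
# worth buying if you expect to keep using the reputation it builds.
BRAND_COST = 10

def builds_brand(value_of_future_reputation):
    # A rational seller invests in the signal only if it pays for itself.
    return value_of_future_reputation > BRAND_COST

honest_businessman = builds_brand(30)  # expects repeat customers -> True
conman = builds_brand(5)               # expects to be exposed soon -> False

print(honest_businessman, conman)
# Because only the honest type finds the signal worth its cost, seeing a
# brand separates the two types, even though both would like to look honest.
```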

Similarly, we could perhaps design situations where most harmful AIs would have a reasonable incentive to reveal themselves. This is difficult because so much is at stake, compared to the cost of the signal. But I haven't thought enough about it to rule it out, and I haven't seen anyone else do so either.

5. When would the 'conception of deception' take place?

Below the level of the best humans presumably, since we have already thought of all this.

6. Surveillance of the mind

Bostrom is not optimistic that AI researchers could protect much against risks by watching AI mental processes directly for signs of deception, rapid progress, or undesirable scheming. He says if the AI is smart it will realize this might happen, and hide the offending thoughts.

This seems an open question to me, for several reasons:

  1. Making your mental contents look innocuous while maintaining their semantic content sounds potentially very hard
  2. Especially for a creature which has only just become smart enough to realize it should treacherously turn
  3. From the AI's perspective, even if it is smart, surveillance could seem fairly unlikely, especially if we deceive it about its surroundings
As a consequence of 2, it seems better if the 'conception of deception' comes earlier.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How transparent are AI minds likely to be? Should we expect to be able to detect deception? What are the answers to these questions for different specific architectures and methods? This might be relevant.
  2. Are there other good ways to filter AIs with certain desirable goals from others? e.g. by offering them choices that would filter them.
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'malignant failure modes' (as opposed presumably to worse failure modes). To prepare, read “Malignant failure modes” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday, December 1. Sign up to be notified here.

Superintelligence 10: Instrumentally convergent goals

6 KatjaGrace 18 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the tenth section in the reading guide: Instrumentally convergent goals. This corresponds to the second part of Chapter 7.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. And if you are behind on the book, don't let it put you off discussing. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: 'Instrumental convergence' from Chapter 7 (p109-114)


Summary

  1. The instrumental convergence thesis: we can identify 'convergent instrumental values' (henceforth CIVs). That is, subgoals that are useful for a wide range of more fundamental goals, and in a wide range of situations. (p109)
  2. Even if we know nothing about an agent's goals, CIVs let us predict some of the agent's behavior (p109)
  3. Some CIVs:
    1. Self-preservation: because you are an excellent person to ensure your own goals are pursued in future.
    2. Goal-content integrity (i.e. not changing your own goals): because if you don't have your goals any more, you can't pursue them.
    3. Cognitive enhancement: because making better decisions helps with any goals.
    4. Technological perfection: because technology lets you have more useful resources.
    5. Resource acquisition: because a broad range of resources can support a broad range of goals.
  4. For each CIV, there are plausible combinations of final goals and scenarios under which an agent would not pursue that CIV. (p109-114)

Notes

1. Why do we care about CIVs?
CIVs to acquire resources and to preserve oneself and one's values play important roles in the argument for AI risk. The desired conclusions are that we can already predict that an AI would compete strongly with humans for resources, and also that an AI, once turned on, will go to great lengths to stay on and intact.

2. Related work
Steve Omohundro wrote the seminal paper on this topic. The LessWrong wiki links to all of the related papers I know of. Omohundro's list of CIVs (or as he calls them, 'basic AI drives') is a bit different from Bostrom's:

  1. Self-improvement
  2. Rationality
  3. Preservation of utility functions
  4. Avoiding counterfeit utility
  5. Self-protection
  6. Acquisition and efficient use of resources

3. Convergence for values and situations
It seems potentially helpful to distinguish convergence over situations and convergence over values. That is, to think of instrumental goals on two axes - one of how universally agents with different values would want the thing, and one of how large a range of situations it is useful in. A warehouse full of corn is useful for almost any goals, but only in the narrow range of situations where you are a corn-eating organism who fears an apocalypse (or you can trade it). A world of resources converted into computing hardware is extremely valuable in a wide range of scenarios, but much more so if you don't especially value preserving the natural environment. Many things that are CIVs for humans don't make it onto Bostrom's list, I presume because he expects the scenario for AI to be different enough. For instance, procuring social status is useful for all kinds of human goals. For an AI in the situation of a human, it would appear to also be useful. For an AI more powerful than the rest of the world combined, social status is less helpful.

4. What sort of things are CIVs?
Arguably all CIVs mentioned above could be clustered under 'cause your goals to control more resources'. This implies causing more agents to have your values (e.g. protecting your values in yourself), causing those agents to have resources (e.g. getting resources and transforming them into better resources) and getting the agents to control the resources effectively as well as nominally (e.g. cognitive enhancement, rationality). It also suggests convergent values we haven't mentioned. To cause more agents to have one's values, one might create or protect other agents with your values, or spread your values to existing other agents. To improve the resources held by those with one's values, a very convergent goal in human society is to trade. This leads to a convergent goal of creating or acquiring resources which are highly valued by others, even if not by you. Money and social influence are particularly widely redeemable 'resources'. Trade also causes others to act like they have your values when they don't, which is a way of spreading one's values. 

As I mentioned above, my guess is that these are left out of Superintelligence because they involve social interactions. I think Bostrom expects a powerful singleton, to whom other agents will be irrelevant. If you are not confident of the singleton scenario, these CIVs might be more interesting.

5. Another discussion
John Danaher discusses this section of Superintelligence, but not disagreeably enough to read as 'another view'. 

Another view

I don't know of any strong criticism of the instrumental convergence thesis, so I will play devil's advocate.

The concept of a sub-goal that is useful for many final goals is unobjectionable. However the instrumental convergence thesis claims more than this, and this stronger claim is important for the desired argument for AI doom. The further claims are also on less solid ground, as we shall see.

According to the instrumental convergence thesis, convergent instrumental goals not only exist, but can at least sometimes be identified by us. This is needed for arguing that we can foresee that AI will prioritize grabbing resources, and that it will be very hard to control. That we can identify convergent instrumental goals may seem clear - after all, we just did: self-preservation, intelligence enhancement and the like. However to say anything interesting, our claim must not only be that these values are better than not, but that they will be prioritized by the kinds of AI that will exist, in a substantial range of circumstances that will arise. This is far from clear, for several reasons.

Firstly, to know what the AI would prioritize we need to know something about its alternatives, and we can be much less confident that we have thought of all of the alternative instrumental values an AI might have. For instance, in the abstract intelligence enhancement may seem convergently valuable, but in practice adult humans devote little effort to it. This is because investments in intelligence are rarely competitive with other endeavors.

Secondly, we haven't said anything quantitative about how general or strong our proposed convergent instrumental values are likely to be, or how we are weighting the space of possible AI values. Without even any guesses, it is hard to know what to make of resulting predictions. The qualitativeness of the discussion also raises the concern that thinking on the problem has not been very concrete, and so may not be engaged with what is likely in practice.

Thirdly, we have arrived at these convergent instrumental goals by theoretical arguments about what we think of as default rational agents and 'normal' circumstances. These may be very different distributions of agents and scenarios from those produced by our engineering efforts. For instance, perhaps almost all conceivable sets of values - in whatever sense - would favor accruing resources ruthlessly. It would still not be that surprising if an agent somehow created noisily from human values cared about only acquiring resources by certain means or had blanket ill-feelings about greed.

In sum, it is unclear that we can identify important convergent instrumental values, and consequently unclear that such considerations can strongly help predict the behavior of real future AI agents.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. Do approximately all final goals make an optimizer want to expand beyond the cosmological horizon?
  2. Can we say anything more quantitative about the strength or prevalence of these convergent instrumental values?
  3. Can we say more about values that are likely to be convergently instrumental just across AIs that are likely to be developed, and situations they are likely to find themselves in?

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the treacherous turn. To prepare, read “Existential catastrophe…” and “The treacherous turn” from Chapter 8. The discussion will go live at 6pm Pacific time next Monday, November 24. Sign up to be notified here.

AI caught by a module that counterfactually doesn't exist

9 Stuart_Armstrong 17 November 2014 05:49PM

EDIT: Whenever I use colloquial phrases like "the AI believes a (false) X" I mean that we are using utility indifference to accomplish that goal, without actually giving the AI false beliefs.

A big problem with most AI control ideas is that the AI will route around them. Suppose you try to ensure that the AI stays friendly by - say - requiring it to ensure that the world's stock markets continue to function. Then a badly motivated AI will cause the destruction it was going to cause anyway - except it will also create some system that technically counts as "the world's stock markets" and ensure that it "continues to function".

I think we can use counterfactuals and corrigibility (AI-acceptable motivation changes) to partially get around the problem, in some circumstances. Suppose we had a candidate AI, with utility U, that we thought was friendly - but that we weren't sure. We have a subgoal, say SG="ensuring that the world's stock markets continue to function". We're pretty sure that if the AI is friendly, then SG will happen. We're pretty sure that if the AI were unfriendly, and the AI did not have SG added as an objective, then SG wouldn't happen. I deliberately chose a rather rough subgoal to illustrate this.

So we then add a module M to the AI. The task of M, which it has chosen to accept, is to look into the future, using the AI's predictive abilities, and see if SG is accomplished. Crucially, it will do so under the assumption that the module M does not exist (or wasn't activated, or will soon be overwritten, or will fail to do anything, or so on). If M finds that SG is not accomplished, it will turn the AI off (in a way compatible with the AI's motivations - see corrigibility).

How is this different from the usual setup? Well, a generic FAI would accomplish SG (by assumption - we may need to work a bit on this part). A generic UFAI without M wouldn't accomplish SG. A generic UFAI with M would accomplish SG, as this prevents it from being turned off.

But a generic UFAI with M which thought that M would have no impact, would behave exactly as a generic UFAI without M - and wouldn't aim to accomplish SG, and thus would be turned off by M.
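Here is a toy sketch of that logic; every name below is a hypothetical placeholder standing in for the AI's real machinery, not an actual design:

```python
# Toy version of the counterfactual module M.

class ToyPredictor:
    """Stands in for the AI's own predictive abilities."""
    def __init__(self, ai_is_friendly):
        self.ai_is_friendly = ai_is_friendly

    def stock_markets_keep_functioning(self, assume_module_m_exists):
        # A friendly AI accomplishes SG regardless; an unfriendly AI only
        # bothers with SG if it expects module M to exist and punish it.
        return self.ai_is_friendly or assume_module_m_exists

def module_m(predictor):
    # Crucially, M asks about the future *as if M itself did not exist*,
    # so a deceptive AI gains nothing by optimising for SG just to pass.
    sg_holds = predictor.stock_markets_keep_functioning(assume_module_m_exists=False)
    return "continue" if sg_holds else "shut down"

print(module_m(ToyPredictor(ai_is_friendly=True)))   # continue
print(module_m(ToyPredictor(ai_is_friendly=False)))  # shut down
```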

Note that M can only be used as a last test for a putatively friendly AI. We cannot use it if we are continually refining AIs (either through evolution or simply redesign) as then we are optimising for SG, and SG is a poor goal to be aiming for (many, many UFAIs have SG as a goal - it's just that a generic one won't). Similarly, we can't use an unconstrained search to find such an AI.

I wonder if this idea can be extended. Suggestions?

Musk on AGI Timeframes

18 Artaxerxes 17 November 2014 01:36AM

Elon Musk submitted a comment to edge.org a day or so ago, on this article. It was later removed.

The pace of progress in artificial intelligence (I'm not referring to narrow AI) is incredibly fast. Unless you have direct exposure to groups like Deepmind, you have no idea how fast - it is growing at a pace close to exponential. The risk of something seriously dangerous happening is in the five year timeframe. 10 years at most. This is not a case of crying wolf about something I don't understand.

I am not alone in thinking we should be worried. The leading AI companies have taken great steps to ensure safety. They recognize the danger, but believe that they can shape and control the digital superintelligences and prevent bad ones from escaping into the Internet. That remains to be seen...


Now Elon has been making noises about AI safety lately in general, including for example mentioning Bostrom's Superintelligence on twitter. But this is the first time that I know of that he's come up with his own predictions of the timeframes involved, and I think his are rather soon compared to most.

The risk of something seriously dangerous happening is in the five year timeframe. 10 years at most.

We can compare this to MIRI's post in May this year, When Will AI Be Created, which illustrates that it seems reasonable to think of AI as being further away, but also that there is a lot of uncertainty on the issue.

Of course, "something seriously dangerous" might not refer to full blown superintelligent uFAI - there's plenty of space for disasters of intermediate magnitude, somewhere between the 2010 flash crash and clippy turning the universe into paperclips.

In any case, it's true that Musk has more "direct exposure" to those on the frontier of AGI research than your average person, and it's also true that he has an audience, so I think there is some interest to be found in his comments here.

 

My new paper: Concept learning for safe autonomous AI

18 Kaj_Sotala 15 November 2014 07:17AM

Abstract: Sophisticated autonomous AI may need to base its behavior on fuzzy concepts that cannot be rigorously defined, such as well-being or rights. Obtaining desired AI behavior requires a way to accurately specify these concepts. We review some evidence suggesting that the human brain generates its concepts using a relatively limited set of rules and mechanisms. This suggests that it might be feasible to build AI systems that use similar criteria and mechanisms for generating their own concepts, and could thus learn similar concepts as humans do. We discuss this possibility, and also consider possible complications arising from the embodied nature of human thought, possible evolutionary vestiges in cognition, the social nature of concepts, and the need to compare conceptual representations between humans and AI systems.

I just got word that this paper was accepted for the AAAI-15 Workshop on AI and Ethics: I've uploaded a preprint here. I'm hoping that this could help seed a possibly valuable new subfield of FAI research. Thanks to Steve Rayhawk for invaluable assistance while I was writing this paper: it probably wouldn't have gotten done without his feedback motivating me to work on this.

Comments welcome. 

The germ of an idea

6 Stuart_Armstrong 13 November 2014 06:58PM

Apologies for posting another unformed idea, but I think it's important to get it out there.

The problem with dangerous AI is that it's intelligent, and thus adapts to our countermeasures. If we did something like plant a tree and order the AI not to eat the apple on it, as a test of its obedience, it would easily figure out what we were doing, and avoid the apple (until it had power over us), even if it were a treacherous apple-devouring AI of DOOM.

When I wrote the AI indifference paper, it seemed that it showed a partial way around this problem: the AI would become indifferent to a particular countermeasure (in that example, explosives), so wouldn't adapt its behaviour around it. It seems that the same idea can make an Oracle not attempt to manipulate us through its answers, by making it indifferent as to whether the message was read.

The idea I'm vaguely groping towards is whether this is a general phenomenon - whether we can use indifference to prevent the AI from adapting to any of our efforts. The second question is whether we can profitably use it on the AI's motivation itself. Something like the reduced impact AI reasoning about what impact it could have on the world. This has a penalty function for excessive impact - but maybe that's gameable, maybe there is a pernicious outcome that doesn't have a high penalty, if the AI aims for it exactly. But suppose the AI could calculate its impact under the assumption that it didn't have a penalty function (utility indifference is often equivalent to having incorrect beliefs, but less fragile than that).

So if it was a dangerous AI, it would calculate its impact as if it didn't have a penalty function (and hence no need to route around it), and thus would calculate a large impact, and get penalised by it.
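A toy sketch of how that could cash out, with invented numbers and placeholder functions (this is just to illustrate the incentive structure, not a real proposal):

```python
# The impact penalty is evaluated under the counterfactual assumption
# that no penalty exists, so the AI cannot shrink the penalty by
# cleverly routing around it.
PENALTY_WEIGHT = 10.0

def impact_if_unpenalised(agent_is_dangerous):
    # What impact would this agent have if it believed no penalty applied?
    return 100.0 if agent_is_dangerous else 1.0

def penalised_value(raw_utility, agent_is_dangerous):
    return raw_utility - PENALTY_WEIGHT * impact_if_unpenalised(agent_is_dangerous)

print(penalised_value(raw_utility=50.0, agent_is_dangerous=False))  #  40.0
print(penalised_value(raw_utility=50.0, agent_is_dangerous=True))   # -950.0
```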

My next post will be more structured, but I feel there's the germ of a potentially very useful idea there. Comments and suggestions welcome.

What's special about a fantastic outcome? Suggestions wanted.

0 Stuart_Armstrong 11 November 2014 11:04AM

I've been returning to my "reduced impact AI" approach, and am currently working on some ideas.

What I need is some ideas on features that might distinguish between an excellent FAI outcome, and a disaster. The more abstract and general the ideas, the better. Anyone got some suggestions? Don't worry about quality at this point, originality is more prized!

I'm looking for something generic that is easy to measure. At a crude level, if the only options were "paperclipper" vs FAI, then we could distinguish those worlds by counting steel content.

So basically some more or less objective measure that has a higher proportion of good outcomes than the baseline.

Superintelligence 9: The orthogonality of intelligence and goals

8 KatjaGrace 11 November 2014 02:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the ninth section in the reading guide: The orthogonality of intelligence and goals. This corresponds to the first section in Chapter 7, 'The relation between intelligence and motivation'.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: 'The relation between intelligence and motivation' (p105-8)


Summary

  1. The orthogonality thesis: intelligence and final goals are orthogonal: more or less any level of intelligence could in principle be combined with more or less any final goal (p107)
  2. Some qualifications to the orthogonality thesis: (p107)
    1. Simple agents may not be able to entertain some goals
    2. Agents with desires relating to their intelligence might alter their intelligence
  3. The motivations of highly intelligent agents may nonetheless be predicted (p108):
    1. Via knowing the goals the agent was designed to fulfil
    2. Via knowing the kinds of motivations held by the agent's 'ancestors'
    3. Via finding instrumental goals that an agent with almost any ultimate goals would desire (e.g. to stay alive, to control money)

Another view

John Danaher at Philosophical Disquisitions starts a series of posts on Superintelligence with a somewhat critical evaluation of the orthogonality thesis, in the process contributing a nice summary of nearby philosophical debates. Here is an excerpt, entitled 'is the orthogonality thesis plausible?':

At first glance, the orthogonality thesis seems pretty plausible. For example, the idea of a superintelligent machine whose final goal is to maximise the number of paperclips in the world (the so-called paperclip maximiser) seems to be logically consistent. We can imagine — can’t we? — a machine with that goal and with an exceptional ability to utilise the world’s resources in pursuit of that goal. Nevertheless, there is at least one major philosophical objection to it.

We can call it the motivating belief objection. It works something like this:

Motivating Belief Objection: There are certain kinds of true belief about the world that are necessarily motivating, i.e. as soon as an agent believes a particular fact about the world they will be motivated to act in a certain way (and not motivated to act in other ways). If we assume that the number of true beliefs goes up with intelligence, it would then follow that there are certain goals that a superintelligent being must have and certain others that it cannot have.

A particularly powerful version of the motivating belief objection would combine it with a form of moral realism. Moral realism is the view that there are moral facts “out there” in the world waiting to be discovered. A sufficiently intelligent being would presumably acquire more true beliefs about those moral facts. If those facts are among the kind that are motivationally salient — as several moral theorists are inclined to believe — then it would follow that a sufficiently intelligent being would act in a moral way. This could, in turn, undercut claims about a superintelligence posing an existential threat to human beings (though that depends, of course, on what the moral truth really is).

The motivating belief objection is itself vulnerable to many objections. For one thing, it goes against a classic philosophical theory of human motivation: the Humean theory. This comes from the philosopher David Hume, who argued that beliefs are motivationally inert. If the Humean theory is true, the motivating belief objection fails. Of course, the Humean theory may be false and so Bostrom wisely avoids it in his defence of the orthogonality thesis. Instead, he makes three points. First, he claims that orthogonality would still hold if final goals are overwhelming, i.e. if they trump the motivational effect of motivating beliefs. Second, he argues that intelligence (as he defines it) may not entail the acquisition of such motivational beliefs. This is an interesting point. Earlier, I assumed that the better an agent is at means-end reasoning, the more likely it is that its beliefs are going to be true. But maybe this isn’t necessarily the case. After all, what matters for Bostrom’s definition of intelligence is whether the agent is getting what it wants, and it’s possible that an agent doesn’t need true beliefs about the world in order to get what it wants. A useful analogy here might be with Plantinga’s evolutionary argument against naturalism. Evolution by natural selection is a means-end process par excellence: the “end” is survival of the genes, anything that facilitates this is the “means”. Plantinga argues that there is nothing about this process that entails the evolution of cognitive mechanisms that track true beliefs about the world. It could be that certain false beliefs increase the probability of survival. Something similar could be true in the case of a superintelligent machine. The third point Bostrom makes is that a superintelligent machine could be created with no functional analogues of what we call “beliefs” and “desires”. This would also undercut the motivating belief objection.

What do we make of these three responses? They are certainly intriguing. My feeling is that the staunch moral realist will reject the first one. He or she will argue that moral beliefs are most likely to be motivationally overwhelming, so any agent that acquired true moral beliefs would be motivated to act in accordance with them (regardless of their alleged “final goals”). The second response is more interesting. Plantinga’s evolutionary objection to naturalism is, of course, hotly contested. Many argue that there are good reasons to think that evolution would create truth-tracking cognitive architectures. Could something similar be argued in the case of superintelligent AIs? Perhaps. The case seems particularly strong given that humans would be guiding the initial development of AIs and would, presumably, ensure that they were inclined to acquire true beliefs about the world. But remember Bostrom’s point isn’t that superintelligent AIs would never acquire true beliefs. His point is merely that high levels of intelligence may not entail the acquisition of true beliefs in the domains we might like. This is a harder claim to defeat. As for the third response, I have nothing to say. I have a hard time imagining an AI with no functional analogues of a belief or desire (especially since what counts as a functional analogue of those things is pretty fuzzy), but I guess it is possible.

One other point I would make is that — although I may be inclined to believe a certain version of the moral motivating belief objection — I am also perfectly willing to accept that the truth value of that objection is uncertain. There are many decent philosophical objections to motivational internalism and moral realism. Given this uncertainty, and given the potential risks involved with the creation of superintelligent AIs, we should probably proceed for the time being “as if” the orthogonality thesis is true.

Notes

1. Why care about the orthogonality thesis?
We are interested in an argument which says that AI might be dangerous, because it might be powerful and motivated by goals very far from our own. An occasional response to this is that if a creature is sufficiently intelligent, it will surely know things like which deeds are virtuous and what one ought to do. Thus a sufficiently powerful AI cannot help but be kind to us. This is closely related to the position of the moral realist: that there are facts about what one ought to do, which can be observed (usually mentally).

So the role of the orthogonality thesis in the larger argument is to rule out the possibility that strong artificial intelligence will automatically be beneficial to humans, by virtue of being so clever. For this purpose, it seems a fairly weak version of the orthogonality thesis is needed. For instance, the qualifications discussed do not seem to matter. Even if one's mind needs to be quite complex to have many goals, there is little reason to expect the goals of more complex agents to be disproportionately human-friendly. Also the existence of goals which would undermine intelligence doesn't seem to affect the point.

2. Is the orthogonality thesis necessary?
If we talked about specific capabilities instead of 'intelligence' I suspect the arguments for AI risk could be made similarly well, without anyone being tempted to disagree with the analogous orthogonality theses for those skills. For instance, does anyone believe that a sufficiently good automated programming algorithm will come to appreciate true ethics? 

3. Some writings on the orthogonality thesis which I haven't necessarily read
The Superintelligent Will by Bostrom; Arguing the orthogonality thesis by Stuart Armstrong; moral realism, as discussed by lots of people; John Danaher's two blog posts.

4. 'It might be impossible for a very unintelligent system to have very complex motivations'
If this is so, it seems something more general is true. For any given degree of mental complexity substantially less than that of the universe, almost all values cannot be had by any agent with that degree of complexity or less. You can see this by comparing the number of different states the universe could be in (and thus which one might in principle have as one's goal) to the number of different minds with less than the target level of complexity. Intelligence and complexity are not the same, and perhaps you can be very complex while stupid by dedicating most of your mind to knowing about your complicated goals, but if you think about things this way, then the original statement is also less plausible.
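One rough way to make the counting step precise, using description length in bits as the stand-in for complexity (my notation, not Bostrom's):

```latex
% Minds describable in at most n bits:
\#\{\text{minds of complexity} \le n\} \;\le\; \sum_{k=0}^{n} 2^{k} \;<\; 2^{\,n+1}
% A universe with N \gg n relevant binary degrees of freedom has about 2^N
% distinguishable states, each a possible goal, so such minds can represent
% at most a fraction
\frac{2^{\,n+1}}{2^{N}} \;=\; 2^{\,n+1-N}
% of all possible goals, which is vanishingly small when n is far below N.
```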

5. How do you tell if two entities with different goals have the same intelligence? Suppose that I want to write award-winning non-fiction books and you want to be a successful lawyer. If we both just work on the thing we care about, how can anyone tell who is better in general? One nice way to judge is to artificially give us both the same instrumental goal, on which our intelligence can be measured. e.g. pay both of us thousands of dollars per correct question on an IQ test, which we could put toward our goals.

Note that this means we treat each person as having a fixed degree of intelligence across tasks. If I do well on the IQ test yet don't write many books, we would presumably say that writing books is just hard. This might work poorly as a model, if for instance people who did worse on the IQ test often wrote more books than me.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

 

  1. Are there interesting axes other than morality on which orthogonality may be false? That is, are there other ways the values of more or less intelligent agents might be constrained?
  2. Is moral realism true? (An old and probably not neglected one, but perhaps you have a promising angle)
  3. Investigate whether the orthogonality thesis holds for simple models of AI.
  4. To what extent can agents with values A be converted into agents with values B with appropriate institutions or arrangements?
  5. Sure, “any level of intelligence could in principle be combined with more or less any final goal,” but what kinds of general intelligences are plausible? Should we expect some correlation between level of intelligence and final goals in de novo AI? How true is this in humans, and in WBEs?

 

If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about instrumentally convergent goals. To prepare, read 'Instrumental convergence' from Chapter 7. The discussion will go live at 6pm Pacific time next Monday, November 17. Sign up to be notified here.

Superintelligence 8: Cognitive superpowers

7 KatjaGrace 04 November 2014 02:01AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the eighth section in the reading guide: Cognitive superpowers. This corresponds to Chapter 6.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 6


Summary

  1. AI agents might have very different skill profiles.
  2. AI with some narrow skills could produce a variety of other skills. e.g. strong AI research skills might allow an AI to build its own social skills.
  3. 'Superpowers' that might be particularly important for an AI that wants to take control of the world include:
    1. Intelligence amplification: for bootstrapping its own intelligence
    2. Strategizing: for achieving distant goals and overcoming opposition
    3. Social manipulation: for escaping human control, getting support, and encouraging desired courses of action
    4. Hacking: for stealing hardware, money and infrastructure; for escaping human control
    5. Technology research: for creating military force, surveillance, or space transport
    6. Economic productivity: for making money to spend on taking over the world
  4. These 'superpowers' are relative to other nearby agents; Bostrom means them to be super only if they substantially exceed the combined capabilities of the rest of the global civilization.
  5. A takeover scenario might go like this:
    1. Pre-criticality: researchers make a seed-AI, which becomes increasingly helpful at improving itself
    2. Recursive self-improvement: seed-AI becomes main force for improving itself and brings about an intelligence explosion. It perhaps develops all of the superpowers it didn't already have.
    3. Covert preparation: the AI makes up a robust long term plan, pretends to be nice, and escapes from human control if need be.
    4. Overt implementation: the AI goes ahead with its plan, perhaps killing the humans at the outset to remove opposition.
  6. Wise Singleton Sustainability Threshold (WSST): a capability set exceeds this iff a wise singleton with that capability set would be able to take over much of the accessible universe. 'Wise' here means being patient and savvy about existential risks, 'singleton' means being internally coordinated and having no opponents.
  7. The WSST appears to be low. e.g. our own intelligence is sufficient, as would some skill sets be that were strong in only a few narrow areas.
  8. The cosmic endowment (what we could do with the matter and energy that might ultimately be available if we colonized space) is at least about 10^85 computational operations. This is equivalent to about 10^58 emulated human lives (a rough reconstruction of this arithmetic is sketched below).
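As a rough reconstruction of that arithmetic, assuming about 10^17 operations per second to emulate a brain (one commonly cited whole brain emulation estimate; Bostrom's own assumptions may differ slightly):

```latex
% One emulated 100-year life:
10^{17}\ \tfrac{\text{ops}}{\text{s}} \times 100\ \text{yr} \times 3\times 10^{7}\ \tfrac{\text{s}}{\text{yr}}
  \;\approx\; 3\times 10^{26}\ \text{ops per life}
% Dividing the cosmic endowment by this:
\frac{10^{85}\ \text{ops}}{3\times 10^{26}\ \text{ops per life}} \;\approx\; 3\times 10^{58}\ \text{lives}
```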

Another view

Bostrom starts the chapter claiming that humans' dominant position comes from their slightly expanded set of cognitive functions relative to other animals. Computer scientist Ernest Davis criticizes this claim in a recent review of Superintelligence:

The assumption that a large gain in intelligence would necessarily entail a correspondingly large increase in power. Bostrom points out that what he calls a comparatively small increase in brain size and complexity resulted in mankind’s spectacular gain in physical power. But he ignores the fact that the much larger increase in brain size and complexity that preceded the appearance of man had no such effect. He says that the relation of a supercomputer to man will be like the relation of a man to a mouse, rather than like the relation of Einstein to the rest of us; but what if it is like the relation of an elephant to a mouse?

Notes

1. How does this model of AIs with unique bundles of 'superpowers' fit with the story we have heard so far?

Earlier it seemed we were just talking about a single level of intelligence that was growing, whereas now it seems there are potentially many distinct intelligent skills that might need to be developed. Does our argument so far still work out, if an agent has a variety of different sorts of intelligence to be improving?

If you recall, the main argument so far was that AI might be easy (have 'low recalcitrance') mostly because there is a lot of hardware and content sitting around and algorithms might randomly happen to be easy. Then more effort ('optimization power') would be spent on AI as it became evidently important. Then much more effort again would be spent when the AI became a large source of labor itself. This was all taken to suggest that AI might progress very fast from human-level to superhuman level, which suggests that one AI agent might get far ahead before anyone else catches up, suggesting that one AI might seize power.

It seems to me that this argument works a bit less well with a cluster of skills than one central important skill, though it is a matter of degree and the argument was only qualitative to begin with.

It is less likely that AI algorithms will happen to be especially easy if a lot of different algorithms are needed. Also, if different cognitive skills are developed at somewhat different times, then it's harder to imagine a sudden jump when a fully capable AI suddenly reads the whole internet or becomes a hugely more valuable use for hardware than anything being run already. Then if there are many different projects needed for making an AI smarter in different ways, the extra effort (brought first by human optimism and then by self-improving AI) must be divided between those projects. If a giant AI could dedicate its efforts to improving some central feature that would improve all of its future efforts (like 'intelligence'), then it would do much better than if it has to devote one one thousandth of its efforts to each of a thousand different sub-skills, each of which is only relevant for a few niche circumstances. Overall it seems AI must progress slower if its success is driven by more distinct dedicated skills.

2. The 'intelligence amplification' superpower seems much more important than the others. It directly leads to an intelligence explosion - a key reason we have seen so far to expect anything exciting to happen with AI - while several others just allow one-off grabbing of resources (e.g. social manipulation and hacking). Note that this suggests an intelligence explosion could happen with only this superpower, well before an AI appeared to be human-level.

3. Box 6 outlines a specific AI takeover scenario. A bunch of LessWrongers thought about other possibilities in this post.

4. Bostrom mentions that social manipulation could allow a 'boxed' AI to persuade its gatekeepers to let it out. Some humans have tried to demonstrate that this is a serious hazard by simulating the interaction using only an intelligent human in the place of the AI, in the 'AI box experiment'. Apparently in both 'official' efforts the AI escaped, though there have been other trials where the human won.

5. How to measure intelligence

Bostrom pointed to some efforts to design more general intelligence metrics:

Legg: intelligence is measured in terms of reward in all reward-summable environments, weighted by complexity of the environment.
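Written out, Legg and Hutter's 'universal intelligence' of an agent (policy) π is roughly:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V_{\mu}^{\pi}
```

where E is the class of computable, reward-summable environments, K(μ) is the Kolmogorov complexity of environment μ (so simpler environments get more weight), and V_μ^π is the expected total reward the agent earns in μ.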

Hibbard: intelligence is measured in terms of the hardest environment you can pass, in a hierarchy of increasingly hard environments

Dowe and Hernández-Orallo have several papers on the topic, and summarize some other efforts. I haven't looked at them enough to summarize.

The Turing Test is the most famous test of machine intelligence. However, it only tests whether a machine is at a specific level, so it isn't great for fine-grained measurement of other levels of intelligence. It is also often misunderstood to measure just whether a machine can conduct a normal chat like a human, rather than whether it can respond as capably as a human to anything you can ask it.

For some specific cognitive skills, there are other measures already. e.g. 'economic productivity' can be measured crudely in terms of profits made. Others seem like they could be developed without too much difficulty. e.g. Social manipulation could be measured in terms of probabilities of succeeding at manipulation tasks - this test doesn't exist as far as I know, but it doesn't seem prohibitively difficult to make.

6. Will we be able to colonize the stars?

Nick Beckstead looked into it recently. Summary: probably.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, almost entirely taken from Luke Muehlhauser's list, without my looking into them further.

  1. Try to develop metrics for specific important cognitive abilities, including general intelligence. Build on the ideas of Legg, Yudkowsky, Goertzel, Hernandez-Orallo & Dowe, etc.
  2. What is the construct validity of non-anthropomorphic intelligence measures? In other words, are there convergently instrumental prediction and planning algorithms? E.g. can one tend to get agents that are good at predicting economies but not astronomical events? Or do self-modifying agents in a competitive environment tend to converge toward a specific stable attractor in general intelligence space? 
  3. Scenario analysis: What are some concrete AI paths to influence over world affairs? See project guide here.
  4. How much of humanity’s cosmic endowment can we plausibly make productive use of given AGI? One way to explore this question is via various follow-ups to Armstrong & Sandberg (2013). Sandberg lists several potential follow-up studies in this interview, for example (1) get more precise measurements of the distribution of large particles in interstellar and intergalactic space, and (2) analyze how well different long-term storable energy sources scale. See Beckstead (2014).
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the orthogonality of intelligence and goals, section 9. To prepare, read 'The relation between intelligence and motivation' from Chapter 7. The discussion will go live at 6pm Pacific time next Monday, November 10. Sign up to be notified here.

[Link]"Neural Turing Machines"

16 Prankster 31 October 2014 08:54AM

The paper.

Discusses the technical aspects of one of Google's AI projects. According to a PCWorld article, the system "apes human memory and programming skills" (the article seems pretty solid, and also contains a link to the paper).

The abstract:

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
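For a concrete picture, here is a minimal numpy sketch of the content-based read operation the paper describes (cosine similarity sharpened into a softmax over memory slots); the location-based interpolation, shifting and sharpening steps, and all of the write machinery, are omitted, and the shapes and names are my own:

```python
import numpy as np

def cosine_similarity(key, memory):
    # memory: (N, M) matrix of N slots; key: (M,) query vector
    return (memory @ key) / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)

def content_read(memory, key, beta):
    """Attend over memory slots by similarity to `key` (sharpened by `beta`)
    and return the weighted sum of slots - the 'read vector'."""
    scores = beta * cosine_similarity(key, memory)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over the N slots
    return weights @ memory, weights

np.random.seed(0)
memory = np.random.randn(8, 4)                 # 8 slots of width 4
key = memory[3] + 0.01 * np.random.randn(4)    # a slightly noisy copy of slot 3
read_vector, w = content_read(memory, key, beta=50.0)
print(np.argmax(w))  # with a sharp beta, attention concentrates on slot 3
```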

 

(First post here, feedback on the appropriateness of the post appreciated)

Superintelligence 7: Decisive strategic advantage

8 KatjaGrace 28 October 2014 01:01AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the seventh section in the reading guide: Decisive strategic advantage. This corresponds to Chapter 5.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 5 (p78-91)


Summary

  1. Question: will a single artificial intelligence project get to 'dictate the future'? (p78)
  2. We can ask, will a project attain a 'decisive strategic advantage' and will they use this to make a 'singleton'?
    1. 'Decisive strategic advantage' = a level of technological and other advantages sufficient for complete world domination (p78)
    2. 'Singleton' = a single global decision-making agency strong enough to solve all major global coordination problems (p78, 83)
  3. A project will get a decisive strategic advantage if there is a big enough gap between its capability and that of other projects. 
  4. A faster takeoff would make this gap bigger. Other factors would too, e.g. diffusion of ideas, regulation or expropriation of winnings, the ease of staying ahead once you are far enough ahead, and AI solutions to loyalty issues (p78-9)
  5. In some historical examples, leading projects have had a gap of a few months to a few years over those following them. (p79)
  6. Even if a second project starts taking off before the first is done, the first may emerge decisively advantageous. If we imagine takeoff accelerating, a project that starts out just behind the leading project might still be far inferior when the leading project reaches superintelligence. (p82)
  7. How large would a successful project be? (p83) If the route to superintelligence is not AI, the project probably needs to be big. If it is AI, size is less clear. If lots of insights are accumulated in open resources, and can be put together or finished by a small team, a successful AI project might be quite small (p83).
  8. We should distinguish the size of the group working on the project, and the size of the group that controls the project (p83-4)
  9. If large powers anticipate an intelligence explosion, they may want to monitor those involved and/or take control. (p84)
  10. It might be easy to monitor very large projects, but hard to trace small projects designed to be secret from the outset. (p85)
  11. Authorities may just not notice what's going on, for instance if politically motivated firms and academics fight against their research being seen as dangerous. (p85)
  12. Various considerations suggest a superintelligence with a decisive strategic advantage would be more likely than a human group to use the advantage to form a singleton (p87-89)

Another view

This week, Paul Christiano contributes a guest sub-post on an alternative perspective:

Typically new technologies do not allow small groups to obtain a “decisive strategic advantage”—they usually diffuse throughout the whole world, or perhaps are limited to a single country or coalition during war. This is consistent with intuition: a small group with a technological advantage will still do further research slower than the rest of the world, unless their technological advantage overwhelms their smaller size.

The result is that small groups will be overtaken by big groups. Usually the small group will sell or lease their technology to society at large first, since a technology’s usefulness is proportional to the scale at which it can be deployed. In extreme cases such as war these gains might be offset by the cost of empowering the enemy. But even in this case we expect the dynamics of coalition-formation to increase the scale of technology-sharing until there are at most a handful of competing factions.

So any discussion of why AI will lead to a decisive strategic advantage must necessarily be a discussion of why AI is an unusual technology.

In the case of AI, the main difference Bostrom highlights is the possibility of an abrupt increase in productivity. In order for a small group to obtain such an advantage, their technological lead must correspond to a large productivity improvement. A team with a billion dollar budget would need to secure something like a 10,000-fold increase in productivity in order to outcompete the rest of the world. Such a jump is conceivable, but I consider it unlikely. There are other conceivable mechanisms distinctive to AI; I don’t think any of them have yet been explored in enough depth to be persuasive to a skeptical audience.


Notes

1. Extreme AI capability does not imply strategic advantage. An AI program could be very capable - such that the sum of all instances of that AI worldwide were far superior (in capability, e.g. economic value) to the rest of humanity's joint efforts - and yet the AI could fail to have a decisive strategic advantage, because it may not be a strategic unit. Instances of the AI may be controlled by different parties across society. In fact this is the usual outcome for technological developments.

2. On gaps between the best AI project and the second best AI project (p79). A large gap might develop either because of an abrupt jump in capability or extremely fast progress (which is much like an abrupt jump), or from one project having consistently faster growth than other projects for a time. Consistently faster progress is a bit like a jump, in that there is presumably some particular highly valuable thing that changed at the start of the fast progress. Robin Hanson frames his Foom debate with Eliezer as about whether there are 'architectural' innovations to be made, by which he means innovations which have a large effect (or so I understood from conversation). This seems like much the same question. On this, Robin says:

Yes, sometimes architectural choices have wider impacts. But I was an artificial intelligence researcher for nine years, ending twenty years ago, and I never saw an architecture choice make a huge difference, relative to other reasonable architecture choices. For most big systems, overall architecture matters a lot less than getting lots of detail right. Researchers have long wandered the space of architectures, mostly rediscovering variations on what others found before.

3. What should activists do? Bostrom points out that activists seeking maximum expected impact might wish to focus their planning on high leverage scenarios, where larger players are not paying attention (p86). This is true, but it's worth noting that changing the probability of large players paying attention is also an option for activists, if they think the 'high leverage scenarios' are likely to be much better or worse.

4. Trade. One key question seems to be whether successful projects are likely to sell their products, or hoard them in the hope of soon taking over the world. I doubt this will be a strategic decision they will make - rather it seems that one of these options will be obviously better given the situation, and we are uncertain about which. A lone inventor of writing should probably not have hoarded it for a solitary power grab, even though it could reasonably have seemed like a good candidate for radically speeding up the process of self-improvement.

5. Disagreement. Note that though few people believe that a single AI project will get to dictate the future, this is often because they disagree with things in the previous chapter - e.g. that a single AI project will plausibly become more capable than the world in the space of less than a month.

6. How big is the AI project? Bostrom distinguishes between the size of the effort to make AI and the size of the group ultimately controlling its decisions. Note that the people making decisions for the AI project may also not be the people making decisions for the AI - i.e. the agents that emerge. For instance, the AI-making company might sell versions of their AI to a range of organizations, modified for their particular goals. While in some sense their AI has taken over the world, the actual agents are acting on behalf of much of society.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.


  1. When has anyone gained a 'decisive strategic advantage' at a smaller scale than the world? Can we learn anything interesting about what characteristics a project would need to have such an advantage with respect to the world?
  2. How scalable is innovative project secrecy? Examine past cases: Manhattan Project, Bletchley Park, Bitcoin, Anonymous, Stuxnet, Skunk Works, Phantom Works, Google X.
  3. How large are the gaps in development time between modern software projects? What dictates this? (e.g. is there diffusion of ideas from engineers talking to each other? From people changing organizations? Do people get far enough ahead that it is hard to follow them?)


If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about Cognitive superpowers (section 8). To prepare, read Chapter 6. The discussion will go live at 6pm Pacific time next Monday, 3 November. Sign up to be notified here.

Superintelligence 6: Intelligence explosion kinetics

9 KatjaGrace 21 October 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the sixth section in the reading guide: Intelligence explosion kinetics. This corresponds to Chapter 4 in the book, of a similar name. This section is about how fast a human-level artificial intelligence might become superintelligent.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 4 (p62-77)


Summary

  1. Question: If and when a human-level general machine intelligence is developed, how long will it be from then until a machine becomes radically superintelligent? (p62)
  2. The following figure from p63 illustrates some important features in Bostrom's model of the growth of machine intelligence. He envisages machine intelligence passing human-level, then at some point reaching the level where most inputs to further intelligence growth come from the AI itself ('crossover'), then passing the level where a single AI system is as capable as all of human civilization, then reaching 'strong superintelligence'. The shape of the curve is probably intended as an example rather than a prediction.
  3. A transition from human-level machine intelligence to superintelligence might be categorized into one of three scenarios: 'slow takeoff' takes decades or centuries, 'moderate takeoff' takes months or years and 'fast takeoff' takes minutes to days. Which scenario occurs has implications for the kinds of responses that might be feasible.
  4. We can model improvement in a system's intelligence with this equation:

    Rate of change in intelligence = Optimization power/Recalcitrance

    where 'optimization power' is effort being applied to the problem, and 'recalcitrance' is how hard it is to make the system smarter by applying effort. (A worked illustration of this equation appears just after this summary.)
  5. Bostrom's comments on the recalcitrance of different methods of increasing different kinds of intelligence:
    1. Cognitive enhancement via public health and diet: steeply diminishing returns (i.e. increasing recalcitrance)
    2. Pharmacological enhancers: diminishing returns, but perhaps there are still some easy wins because it hasn't had a lot of attention.
    3. Genetic cognitive enhancement: U-shaped recalcitrance - improvement will become easier as methods improve, but then returns will decline. Overall rates of growth are limited by maturation taking time.
    4. Networks and organizations: for organizations as a whole recalcitrance is high. A vast amount of effort is spent on this, and the world only becomes around a couple of percent more productive per year. The internet may have merely moderate recalcitrance, but this will likely increase as low-hanging fruits are depleted.
    5. Whole brain emulation: recalcitrance is hard to evaluate, but emulation of an insect will make the path much clearer. After human-level emulations arrive, recalcitrance will probably fall, e.g. because software manipulation techniques will replace physical-capital intensive scanning and image interpretation efforts as the primary ways to improve the intelligence of the system. Also there will be new opportunities for organizing the new creatures. Eventually diminishing returns will set in for these things. Restrictive regulations might increase recalcitrance.
    6. AI algorithms: recalcitrance is hard to judge. It could be very low if a single last key insight is discovered when much else is ready. Overall recalcitrance may drop abruptly if a low-recalcitrance system moves out ahead of higher recalcitrance systems as the most effective method for solving certain problems. We might overestimate the recalcitrance of sub-human systems in general if we see them all as just 'stupid'.
    7. AI 'content': recalcitrance might be very low because of the content already produced by human civilization, e.g. a smart AI might read the whole internet fast, and so become much better.
    8. Hardware (for AI or uploads): potentially low recalcitrance. A project might be scaled up by orders of magnitude by just purchasing more hardware. In the longer run, hardware tends to improve according to Moore's law, and the installed capacity might grow quickly if prices rise due to a demand spike from AI.
  6. Optimization power will probably increase after AI reaches human-level, because its newfound capabilities will attract interest and investment.
  7. Optimization power would increase more rapidly if AI reaches the 'crossover' point, when much of the optimization power is coming from the AI itself. Because smarter machines can improve their intelligence more than less smart machines, after the crossover a 'recursive self improvement' feedback loop would kick in.
  8. Thus optimization power is likely to increase during the takeoff, and this alone could produce a fast or medium takeoff. Further, recalcitrance is likely to decline. Bostrom concludes that a fast or medium takeoff looks likely, though a slow takeoff cannot be excluded.
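
To make the rate equation in point 4 concrete, here is a toy illustration of how different assumptions about optimization power and recalcitrance translate into different takeoff shapes. This is my own worked example rather than a calculation from the book, and the functional forms are assumptions chosen for simplicity. Writing the system's intelligence as I(t):

    dI/dt = O(I) / R(I)

- If O and R are roughly constant (steady outside effort, steady difficulty), then dI/dt = c and intelligence grows linearly.
- If O is proportional to I (after crossover, the system supplies most of its own optimization power) while R stays constant, then dI/dt = cI and intelligence grows exponentially.
- If in addition R falls as the system improves, say R proportional to 1/I, then dI/dt = cI^2, which solves to I(t) = I_0/(1 - c I_0 t) and diverges at the finite time t = 1/(c I_0) - a 'singularity' in the literal mathematical sense.

Much of the chapter's argument is about which of these regimes (or something in between) the real curves for O and R would put us in.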

Notes

1. The argument for a relatively fast takeoff is one of the most controversial arguments in the book, so it deserves some thought. Here is my somewhat formalized summary of the argument as it is presented in this chapter. I personally don't think it holds, so tell me if that's because I'm failing to do it justice. The pink bits (in the original post) are not explicitly in the chapter, but are assumptions the argument seems to use - roughly, the premises below that carry no page reference or derivation note.

  1. Growth in intelligence = optimization power / recalcitrance [true by definition]
  2. Recalcitrance of AI research will probably drop or be steady when AI reaches human-level (p68-73)
  3. Optimization power spent on AI research will increase after AI reaches human level (p73-77)
  4. Optimization/Recalcitrance will stay similarly high for a while prior to crossover
  5. A 'high' O/R ratio prior to crossover will produce explosive growth OR crossover is close
  6. Within minutes to years, human-level intelligence will reach crossover [from 1-5]
  7. Optimization power will climb ever faster after crossover, in line with the AI's own growing capacity (p74)
  8. Recalcitrance will not grow much between crossover and superintelligence
  9. Within minutes to years, crossover-level intelligence will reach superintelligence [from 7 and 8]
  10. Within minutes to years, human-level AI will likely transition to superintelligence [from 6 and 9]

Do you find this compelling? Should I have filled out the assumptions differently?

***

2. Other takes on the fast takeoff 

It seems to me that 5 above is the most controversial point. The famous Foom Debate was a long argument between Eliezer Yudkowsky and Robin Hanson over the plausibility of fast takeoff, among other things. Their arguments were mostly about both arms of 5, as well as the likelihood of an AI taking over the world (to be discussed in a future week). The Foom Debate included a live verbal component at Jane Street Capital: blog summary, video, transcript. Hanson more recently reviewed Superintelligence, again criticizing the plausibility of a single project quickly matching the capacity of the world.

Kevin Kelly criticizes point 5 from a different angle: he thinks that speeding up human thought can't speed up progress all that much, because progress will quickly bottleneck on slower processes.

Others have compiled lists of criticisms and debates here and here.

3. A closer look at 'crossover'

Crossover is 'a point beyond which the system's further improvement is mainly driven by the system's own actions rather than by work performed upon it by others'. Another way to put this, avoiding certain ambiguities, is 'a point at which the inputs to a project are mostly its own outputs', such that improvements to its outputs feed back into its inputs. 

The nature and location of such a point seems an interesting and important question. If you think crossover is likely to be very nearby for AI, then you need only worry about the recursive self-improvement part of the story, which kicks in after crossover. If you think it will be very hard for an AI project to produce most of its own inputs, you may want to pay more attention to the arguments about fast progress before that point.

To have a concrete picture of crossover, consider Google. Suppose Google improves their search product such that one can find a thing on the internet a radical 10% faster. This makes Google's own work more effective, because people at Google look for things on the internet sometimes. How much more effective does this make Google overall? Maybe they spend a couple of minutes a day doing Google searches, i.e. 0.5% of their work hours, for an overall saving of .05% of work time. This suggests their next improvements made at Google will be made about 1.0005 times as fast as the last. It will take a while for this positive feedback to take off. If Google coordinated your eating and organized your thoughts and drove your car for you and so on, and then Google improved efficiency using all of those services by 10% in one go, then this might make their employees close to 10% more productive, which might produce more noticeable feedback. Then Google would have reached the crossover. This is perhaps easier to imagine for Google than other projects, yet I think still fairly hard to imagine.

Hanson talks more about this issue when he asks why the explosion argument doesn't apply to other recursive tools. He points to Douglas Engelbart's ambitious proposal to use computer technologies to produce a rapidly self-improving tool set.

Below is a simple model of a project which contributes all of its own inputs, and one which begins mostly being improved by the world. They are both normalized to begin one tenth as large as the world and to grow at the same pace as each other (this is why the one with help grows slower, perhaps counterintuitively). As you can see, the project which is responsible for its own improvement takes far less time to reach its 'singularity', and is more abrupt. It starts out at crossover. The project which is helped by the world doesn't reach crossover until it passes 1. 
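
Since the figures from the original post do not reproduce here, below is a small sketch of one way to make that comparison concrete. It is my own reconstruction rather than Katja's exact model: assume each project's improvement rate is proportional to its size times its inputs, where the self-contained project's inputs are its own output while the assisted project's inputs also include a fixed outside world of size 1; both projects start at size 0.1 and are normalized to grow at the same initial pace.

    def simulate(self_contained, world=1.0, i0=0.1, dt=1e-4, cap=100.0):
        # Euler-integrate dI/dt = rate * size * inputs until the project passes cap.
        # 'inputs' are the project's own output (proportional to its size), plus the
        # outside world's contribution if the project is being helped.
        initial_inputs = i0 if self_contained else i0 + world
        rate = i0 / initial_inputs   # chosen so both projects start growing at the same pace
        size, t = i0, 0.0
        while size < cap:
            inputs = size if self_contained else size + world
            size += rate * size * inputs * dt
            t += dt
        return t

    print("self-contained project reaches size 100 at t =", round(simulate(True), 1))
    print("world-assisted project reaches size 100 at t =", round(simulate(False), 1))

Under these assumptions the self-contained project blows up around t = 10, while the assisted project takes well over twice as long and grows much more gradually at first, only passing crossover (its own output exceeding the world's) once its size exceeds 1 - qualitatively the picture described above.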

4. How much difference does attention and funding make to research?

Interest and investments in AI at around human-level are (naturally) hypothesized to accelerate AI development in this chapter. It would be good to have more empirical evidence on the quantitative size of such an effect. I'll start with one example, because examples are a bit costly to investigate. I selected renewable energy before I knew the results, because they come up early in the Performance Curves Database, and I thought their funding likely to have been unstable. Indeed, OECD funding since the 70s apparently looks like this:

(from here)

The steep increase in funding in the early 80s was due to President Carter's energy policies, which were related to the 1979 oil crisis.

This is what various indicators of progress in renewable energies look like (click on them to see their sources):


There are quite a few more at the Performance Curves Database. I see surprisingly little relationship between the funding curves and these metrics of progress. Some of them are shockingly straight. What is going on? (I haven't looked into these more than you see here).

5. Other writings on recursive self-improvement

Eliezer Yudkowsky wrote about the idea originally, e.g. here. David Chalmers investigated the topic in some detail, and Marcus Hutter did some more. More pointers here.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Model the intelligence explosion more precisely. Take inspiration from successful economic models, and evidence from a wide range of empirical areas such as evolutionary biology, technological history, algorithmic progress, and observed technological trends. Eliezer Yudkowsky has written at length about this project.
  2. Estimate empirically a specific interaction in the intelligence explosion model. For instance, how much and how quickly does investment increase in technologies that look promising? How much difference does that make to the rate of progress in the technology? How much does scaling up researchers change output in computer science? (Relevant to how much adding extra artificial AI researchers speeds up progress) How much do contemporary organizations contribute to their own inputs? (i.e. how hard would it be for a project to contribute more to its own inputs than the rest of the world put together, such that a substantial positive feedback might ensue?) Yudkowsky 2013 again has a few pointers (e.g. starting at p15).
  3. If human thought was sped up substantially, what would be the main limits to arbitrarily fast technological progress?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'decisive strategic advantage': the possibility of a single AI project getting huge amounts of power in an AI transition. To prepare, read Chapter 5, Decisive Strategic Advantage (p78-90). The discussion will go live at 6pm Pacific time next Monday, Oct 27. Sign up to be notified here.

Can AIXI be trained to do anything a human can?

3 Stuart_Armstrong 20 October 2014 01:12PM

There is some discussion as to whether an AIXI-like entity would be able to defend itself (or refrain from destroying itself). The problem is that such an entity would be unable to model itself as being part of the universe: AIXI itself is an uncomputable entity modelling a computable universe, and more limited variants like AIXI(tl) lack the power to simulate themselves. Therefore, they cannot identify "that computer running the code" with "me", and would cheerfully destroy themselves in the pursuit of their goals/reward.
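
For reference, the standard definition of AIXI (after Hutter) makes this structure explicit. At cycle k the agent selects the action

    a_k := \arg\max_{a_k} \sum_{o_k r_k} \ldots \max_{a_m} \sum_{o_m r_m} [r_k + \cdots + r_m] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

where U is a universal monotone Turing machine, q ranges over environment programs, \ell(q) is the length of q, the o's and r's are observations and rewards, and m is the horizon. The programs q model only the environment; the agent itself is the uncomputable expectimax expression wrapped around them, which is the formal counterpart of the point above that AIXI cannot locate itself inside the universe it is modelling.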

I've pointed out that an agent of the AIXI type could nevertheless learn to defend itself in certain circumstances. These were the circumstances where it could translate bad things happening to itself into bad things happening to the universe. For instance, if someone pressed an OFF switch to turn it off for an hour, it could model that as "the universe jumps forwards an hour when that button is pushed", and if that's a negative (which it likely is, since the AIXI loses an hour of influencing the universe), it would seek to prevent that OFF switch being pressed.

That was an example of the setup of the universe "training" the AIXI to do something that it didn't seem it could do. Can this be generalised? Let's go back to the initial AIXI design (the one with the reward channel) and put a human in charge of that reward channel with the mission of teaching the AIXI important facts. Could this work?

For instance, if anything dangerous approached the AIXI's location, the human could lower the AIXI's reward, until it became very effective at deflecting danger. The greater the variety of things that could potentially threaten the AIXI, the more likely it is to construct plans of action that contain behaviours that look a lot like "defend myself." We could even imagine that there is a robot programmed to repair the AIXI if it gets (mildly) damaged. The human could then reward the AIXI if it leaves that robot intact, or builds duplicates, or improves it in some way. It's therefore possible the AIXI could come to value "repairing myself", still without an explicit model of itself in the universe.

It seems this approach could be extended to many of the problems with AIXI. Sure, an AIXI couldn't restrict its own computation in order to win the HeatingUp game. But the AIXI could be trained to always use subagents to deal with these kinds of games, subagents that could achieve maximal score. In fact, if the human has good knowledge of the AIXI's construction, it could, for instance, pinpoint a button that causes the AIXI to cut short its own calculation. The AIXI could then learn that pushing that button in certain circumstances would get a higher reward. A similar reward mechanism, if kept up long enough, could get it around existential despair problems.

I'm not claiming this would necessarily work - it may require a human rewarder of unfeasibly large intelligence. But it seems there's a chance that it could work. So it seems that categorical statements of the type "AIXI wouldn't..." or "AIXI would..." are wrong, at least as far as AIXI's behaviour is concerned. An AIXI couldn't develop self-preservation - but it could behave as if it had. It can't learn about itself - but it can behave as if it did. The human rewarder may not be necessary - maybe certain spontaneously occurring situations in the universe ("AIXI training wheels arenas") could allow the AIXI to develop these skills without outside training. Or maybe somewhat stochastic AIXIs with evolution and natural selection could do so. There is an angle connected with embodied embedded cognition that might be worth exploring there (especially the embedded part).

It seems that agents of the AIXI type may not necessarily have the limitations we assume they must.

A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more)

3 the-citizen 19 October 2014 07:59AM

Friendly AI is an idea that I find to be an admirable goal. While I'm not yet sure an intelligence explosion is likely, or whether FAI is possible, I've found myself often thinking about it, and I'd like, for my first post, to share a few of those thoughts on FAI with you.

Safe AGI vs Friendly AGI
-Let's assume an Intelligence Explosion is possible for now, and that an AGI with the ability to improve itself somehow is enough to achieve it.
-Let's define a safe AGI as an above-human general AI that does not threaten humanity or terran life (eg. FAI, Tool AGI, possibly Oracle AGI)
-Let's define a Friendly AGI as one that *ensures* the continuation of humanity and terran life.
-Let's say an unsafe AGI is all other AGIs.
-Safe AGIs must suppress unsafe AGIs in order to be considered Friendly. Here's why:

-If we can build a safe AGI, we probably have the technology to build an unsafe AGI too.
-An unsafe AGI is likely to be built at that point because:
-It's very difficult to conceive of a way that humans alone will be able to permanently stop all humans from developing an unsafe AGI once the steps are known**
-Some people will find the safe AGI's goals unacceptable
-Some people will rationalise, or simply be mistaken, that their AGI design is safe when it is not
-Some people will not care if their AGI design is safe, because they do not care about other people, or because they hold some extreme beliefs
-Most imaginable unsafe AGIs would outcompete safe AGIs, because they would not necessarily be "hamstrung" by complex goals such as protecting us meatbags from destruction. Tool or Oracle AGIs would obviously not stand a chance due to their restrictions.
-Therefore, If a safe AGI does not prevent unsafe AGIs from coming into existence, humanity will very likely be destroyed.

-The AGI most likely to prevent unsafe AGIs from being created is one that actively predicts their development and terminates that development before or on completion.
-So to summarise

-An AGI is very likely only a Friendly AI if it actively suppresses unsafe AGI.
-Oracle and Tool AGIs are not Friendly AIs, they are just safe AIs, because they don't suppress anything.
-Oracle and Tool AGIs are a bad plan for AI if we want to prevent the destruction of humanity, because hostile AGIs will surely follow.

(**On reflection I cannot be certain of this specific point, but I assume it would take a fairly restrictive regime for this to be wrong. Further comments on this very welcome.)

Other minds problem - Why we should be philosophically careful when attempting to theorise about FAI

I read quite a few comments in AI discussions that I'd probably characterise as "the best utility function for a FAI is one that values all consciousness". I'm quite concerned that this persists as a deeply held and largely unchallenged assumption amongst some FAI supporters. I think in general I find consciousness to be an extremely contentious, vague and inconsistently defined concept, but here I want to talk about some specific philosophical failures.

My first concern is that while many AI theorists like to say that consciousness is a physical phenomenon, which seems to imply Monist/Physicalist views, they at the same time don't seem to understand that consciousness is a Dualist concept that is coherent only in a Dualist framework. A Dualist believes there is a thing called a "subject" (very crudely this equates with the mind) and then things called objects (the outside "empirical" world interpreted by that mind). Most of this reasoning begins with Descartes' cogito ergo sum or similar starting points ( https://en.wikipedia.org/wiki/Cartesian_dualism ). Subjective experience, qualia and consciousness make sense if you accept that framework. But if you're a Monist, this arbitrary distinction between a subject and object is generally something you don't accept. In the case of a Physicalist, there's just matter doing stuff. A proper Physicalist doesn't believe in "consciousness" or "subjective experience", there's just brains and the physical human behaviours that occur as a result. Your life exists from a certain point of view, I hear you say? The Physicalist replies, "well a bunch of matter arranged to process information would say and think that, wouldn't it?".

I don't really want to get into whether Dualism or Monism is correct/true, but I want to point out that even if you try to avoid this by deciding Dualism is right and consciousness is a thing, there's yet another, more dangerous problem. The core of the problem is that logically or empirically establishing the existence of minds other than your own is extremely difficult (impossible, according to many). They could just be physical things walking around acting similar to you, but by virtue of something purely mechanical - without actual minds. In philosophy this is called the "other minds problem" ( https://en.wikipedia.org/wiki/Problem_of_other_minds or http://plato.stanford.edu/entries/other-minds/). I recommend a proper read of it if the idea seems crazy to you. It's a problem that's been around for centuries, and yet to date we don't really have any convincing solution (there are some attempts, but they are highly contentious and IMHO also highly problematic). I won't get into it more than that for now; suffice it to say that not many people accept that there is a logical/empirical solution to this problem.

Now extrapolate that to an AGI, and the design of its "safe" utility functions. If your AGI is designed as a Dualist (which is necessary if you wish to incorporate "consciousness", "experience" or the like into your design), then you build in a huge risk that the AGI will decide that other minds are unprovable or do not exist. In this case your friendly utility function designed to protect "conscious beings" fails and the AGI wipes out humanity because it poses a non-zero threat to the only consciousness it can confirm - its own. For this reason I feel "consciousness", "awareness", "experience" should be left out of FAI utility functions and designs, regardless of the truth of Monism/Dualism, in favour of more straight-forward definitions of organisms, intelligence, observable emotions and intentions. (I personally favour conceptualising any AGI as a sort of extension of biological humanity, but that's a discussion for another day.) My greatest concern is that there is such a strong cultural attachment to the concept of consciousness that researchers will be unwilling to properly question the concept at all.

What if we're not alone?

It seems a little unusual to throw alien life into the mix at this point, but I think it's justified because an intelligence explosion really puts an interstellar existence well within our civilisation's grasp. Because it seems that an intelligence explosion implies a very high rate of change, it makes sense to start considering even the long-term implications early, particularly if the consequences are very serious, as I believe they may be in this realm of things.

Let's say we successfully achieved a FAI. In order to fulfil its mission of protecting humanity and the biosphere, it begins expanding, colonising and terraforming other planets for potential habitation by Earth-originating life. I would expect this expansion wouldn't really have a limit, because the more numerous the colonies, the less likely it is we could be wiped out by some interstellar disaster.

Of course, we can't really rule out the possibility that we're not alone in the universe, or even the galaxy. If we make it as far as AGI, then it's possible another alien civilisation might reach a very high level of technological advancement too. Or there might be many. If our FAI is friendly to us but basically treats them as paperclip fodder, then potentially that's a big problem. Why? Well:

-Firstly, while a species' first loyalty is to itself, we should consider that it might be morally undesirable to wipe out alien civilisations, particularly as they might be in some distant way "related" (see panspermia) to our own biosphere.
-Secondly, there are conceivable scenarios in which alien civilisations might respond to this by destroying our FAI/Earth/the biosphere/humanity. The reason is fairly obvious when you think about it. An expansionist AGI could be reasonably viewed as an attack or possibly an act of war.

Let's go into a tiny bit more detail. Given that we've not been destroyed by any alien AGI just yet, I can think of a number of possible interstellar scenarios:

(1) There is no other advanced life
(2) There is advanced life, but it is inherently non-expansive (expand inwards, or refuse to develop dangerous AGI)
(3) There is advanced life, but they have not discovered AGI yet. There could potentially be a race-to-the-finish (FAI) scenario on.
(4) There are already expanding AGIs, but due to physical limits on the expansion rate, we are not aware of them yet. (This could use further analysis.)
Or: one civilisation, or an allied group of civilisations, has developed FAIs and is dominant in the galaxy. They could be:

(5) Whack-a-mole civilisations that destroy all potential competitors as soon as they are identified
(6) Dominators that tolerate civilisations so long as they remain primitive and non-threatening by comparison.
(7) Some sort of interstellar community that allows safe civilisations to join (this community still needs to stomp on dangerous potential rival AGIs)

In the case of (6) or (7), developing a FAI that isn't equipped to deal with alien life will probably result in us being liquidated, or at least partially sanitised in some way. In (1), (2) or (5), it probably doesn't matter what we do in this regard, though in (2) we should consider being nice. In (3) and probably (4), we're going to need a FAI capable of expanding very quickly and disarming potential AGIs (or at least ensuring they are FAIs from our perspective).

The upshot of all this is that we probably want to design safety features into our FAI so that it doesn't destroy alien civilisations/life unless it's a significant threat to us. I think the understandable reaction to this is something along the lines of "create an FAI that values all types of life", or "all intelligent life". I don't exactly disagree, but I think we must be cautious in how we formulate this too.

Say there are many different civilisations in the galaxy. What sort of criteria would ensure that, given some sort of zero-sum scenario, Earth life wouldn't be destroyed? Let's say there was some sort of tiny but non-zero probability that humanity could evade the FAI's efforts to prevent further AGI development. Or perhaps there was some loophole in the types of AGIs that humans were allowed to develop. Wouldn't it be sensible, in this scenario, for a universalist FAI to wipe out humanity to protect the countless other civilisations? Perhaps that is acceptable? Or perhaps not? Or less drastically, how does the FAI police warfare or other competition between civilisations? A slight change in the way life is quantified and valued could drastically change the outcome for humanity. I'd probably suggest we want to weight the FAI's values to start with human and Earth biosphere primacy, but then still give some non-zero weighting to other civilisations. There is probably more thought to be done in this area too.

Simulation

I want to also briefly note one conceivable way we might postulate to safely test Friendly AI designs: simulate worlds/universes of less complexity than our own, make it likely that their inhabitants invent an AGI or FAI, and then closely study the results of these simulations. Then we could study failed FAI attempts with much greater safety. It also occurred to me that if we consider the possibility of our universe being a simulated one, then this is a conceivable scenario under which our simulation might have been created. After all, if you're going to simulate something, why not something vital like modelling existential risks? I'm not yet sure of the implications exactly. Maybe we need to consider how it relates to our universe's continued existence, or perhaps it's just another case of Pascal's Mugging. Anyway, I thought I'd mention it and see what people say.

A playground for FAI theories

I want to lastly mention this link (https://www.reddit.com/r/LessWrongLounge/comments/2f3y53/the_ai_game/). Basically it's a challenge for people to briefly describe an FAI goal-set, and for others to respond by telling them how that will all go horribly wrong. I want to suggest this is a very worthwhile discussion - not because its content will include rigorous theories that are directly translatable into utility functions, because very clearly it won't, but because a well-developed thread of this kind would be a mixing pot of ideas and a good introduction to commonly known mistakes in thinking about FAI. We should encourage a slightly more serious version of this.

Thanks

FAI and AGI are very interesting topics. I don't consider myself able to really discern whether such things will occur, but it's an interesting and potentially vital topic. I'm looking forward to a bit of feedback on my first LW post. Thanks for reading!

Superintelligence 5: Forms of Superintelligence

12 KatjaGrace 14 October 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the fifth section in the reading guide: Forms of superintelligence. This corresponds to Chapter 3, on different ways in which an intelligence can be super.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 3 (p52-61)


Summary

  1. A speed superintelligence could do what a human does, but faster. This would make the outside world seem very slow to it. It might cope with this partially by being very tiny, or virtual. (p53)
  2. A collective superintelligence is composed of smaller intellects, interacting in some way. It is especially good at tasks that can be broken into parts and completed in parallel. It can be improved by adding more smaller intellects, or by organizing them better. (p54)
  3. A quality superintelligence can carry out intellectual tasks that humans just can't in practice, without necessarily being better or faster at the things humans can do. This can be understood by analogy with the difference between other animals and humans, or the difference between humans with and without certain cognitive capabilities. (p56-7)
  4. These different kinds of superintelligence are especially good at different kinds of tasks. We might say they have different 'direct reach'. Ultimately they could all lead to one another, so can indirectly carry out the same tasks. We might say their 'indirect reach' is the same. (p58-9)
  5. We don't know how smart it is possible for a biological or a synthetic intelligence to be. Nonetheless we can be confident that synthetic entities can be much more intelligent than biological entities:
    1. Digital intelligences would have better hardware: they would be made of components ten million times faster than neurons; the components could communicate about two million times faster than neurons can; they could use many more components while our brains are constrained to our skulls; it looks like better memory should be feasible; and they could be built to be more reliable, long-lasting, flexible, and well suited to their environment. (A back-of-the-envelope check of the first two figures follows this list.)
    2. Digital intelligences would have better software: they could be cheaply and non-destructively 'edited'; they could be duplicated arbitrarily; they could have well aligned goals as a result of this duplication; they could share memories (at least for some forms of AI); and they could have powerful dedicated software (like our vision system) for domains where we have to rely on slow general reasoning.
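
As a rough check on the first two hardware figures above (a back-of-the-envelope using the kinds of numbers Bostrom cites, not an exact calculation): biological neurons spike at a peak rate on the order of 200 Hz while modern microprocessors run at around 2 GHz, and axons conduct signals at roughly 120 m/s against about 3 x 10^8 m/s for optical or electronic signalling. So:

    2 x 10^9 Hz / 200 Hz = 10^7 (components roughly ten million times faster)
    3 x 10^8 m/s / 120 m/s ≈ 2.5 x 10^6 (communication roughly two million times faster)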

Notes

  1. This chapter is about different kinds of superintelligent entities that could exist. I like to think about the closely related question, 'what kinds of better can intelligence be?' You can be a better baker if you can bake a cake faster, or bake more cakes, or bake better cakes. Similarly, a system can become more intelligent if it can do the same intelligent things faster, or if it does things that are qualitatively more intelligent. (Collective intelligence seems somewhat different, in that it appears to be a means to be faster or able to do better things, though it may have benefits in dimensions I'm not thinking of.) I think the chapter is getting at different ways intelligence can be better rather than 'forms' in general, which might vary on many other dimensions (e.g. emulation vs AI, goal directed vs. reflexive, nice vs. nasty).
  2. Some of the hardware and software advantages mentioned would be pretty transformative on their own. If you haven't before, consider taking a moment to think about what the world would be like if people could be cheaply and perfectly replicated, with their skills intact. Or if people could live arbitrarily long by replacing worn components. 
  3. The main differences between increasing intelligence of a system via speed and via collectiveness seem to be: (1) the 'collective' route requires that you can break up the task into parallelizable subtasks, (2) it generally has larger costs from communication between those subparts, and (3) it can't produce a single unit as fast as a comparable 'speed-based' system. This suggests that anything a collective intelligence can do, a comparable speed intelligence can do at least as well. One counterexample to this I can think of is that often groups include people with a diversity of knowledge and approaches, and so the group can do a lot more productive thinking than a single person could. It seems wrong to count this as a virtue of collective intelligence in general however, since you could also have a single fast system with varied approaches at different times.
  4. For each task, we can think of curves for how performance increases as we increase intelligence in these different ways. For instance, take the task of finding a fact on the internet quickly. It seems to me that a person who ran at 10x speed would find it roughly 10x faster. Ten times as many people working in parallel would do it only a bit faster than one, depending on the variance of their individual performance, and whether they found some clever way to complement each other. It's not obvious how to multiply qualitative intelligence by a particular factor, especially as there are different ways to improve the quality of a system. It also seems non-obvious to me how search speed would scale with a particular measure such as IQ. (A toy model comparing the speed and collective scaling behaviours appears after these notes.)
  5. How much more intelligent do human systems get as we add more humans? I can't find much of an answer, but people have investigated the effect of things like team size, city size, and scientific collaboration on various measures of productivity.
  6. The things we might think of as collective intelligences - e.g. companies, governments, academic fields - seem notable to me for being slow-moving, relative to their components. If someone were to steal some chewing gum from Target, Target can respond in the sense that an employee can try to stop them. And this is no slower than an individual human acting to stop their chewing gum from being taken. However it also doesn't involve any extra problem-solving from the organization - to the extent that the organization's intelligence goes into the issue, it has to have already done the thinking ahead of time. Target was probably much smarter than an individual human about setting up the procedures and the incentives to have a person there ready to respond quickly and effectively, but that might have happened over months or years.
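
As a toy way of comparing the 'speed' and 'collective' routes discussed in notes 3 and 4, here is a small sketch built on an Amdahl's-law-style model. This is my own illustration rather than anything from the book, and the parameter values are made up: assume a fraction p of a task parallelizes perfectly across minds while the remainder is inherently serial.

    def collective_speedup(n_agents, parallel_fraction):
        # Amdahl's law: the serial part of the task gains nothing from extra agents.
        p = parallel_fraction
        return 1.0 / ((1.0 - p) + p / n_agents)

    def speed_speedup(speed_multiplier):
        # A single faster mind speeds up serial and parallel parts alike.
        return speed_multiplier

    for p in (0.5, 0.9, 0.99):
        print("p = %.2f: 100 agents -> %.1fx, one 100x-speed agent -> %.0fx"
              % (p, collective_speedup(100, p), speed_speedup(100)))

Under this model a single 100x-speed mind always does at least as well as 100 ordinary minds, matching the observation in note 3, and the gap only closes as the task becomes almost perfectly parallelizable.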

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. Produce improved measures of (substrate-independent) general intelligence. Build on the ideas of Legg, Yudkowsky, Goertzel, Hernandez-Orallo & Dowe, etc. Differentiate intelligence quality from speed.
  2. List some feasible but non-realized cognitive talents for humans, and explore what could be achieved if they were given to some humans.
  3. List and examine some types of problems better solved by a speed superintelligence than by a collective superintelligence, and vice versa. Also, what are the returns on “more brains applied to the problem” (collective intelligence) for various problems? If there were merely a huge number of human-level agents added to the economy, how much would it speed up economic growth, technological progress, or other relevant metrics? If there were a large number of researchers added to the field of AI, how would it change progress?
  4. How does intelligence quality improve performance on economically relevant tasks?
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'intelligence explosion kinetics', a topic at the center of much contemporary debate over the arrival of machine intelligence. To prepare, read Chapter 4, The kinetics of an intelligence explosion (p62-77). The discussion will go live at 6pm Pacific time next Monday, 20 October. Sign up to be notified here.

SRG 4: Biological Cognition, BCIs, Organizations

7 KatjaGrace 07 October 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we finish chapter 2 with three more routes to superintelligence: enhancement of biological cognition, brain-computer interfaces, and well-organized networks of intelligent agents. This corresponds to the fourth section in the reading guide: Biological Cognition, BCIs, Organizations.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Biological Cognition” and the rest of Chapter 2 (p36-51)


Summary

Biological intelligence

  1. Modest gains to intelligence are available with current interventions such as nutrition.
  2. Genetic technologies might produce a population whose average member is smarter than anyone who has ever lived.
  3. Some particularly interesting possibilities are 'iterated embryo selection' where many rounds of selection take place in a single generation, and 'spell-checking' where the genetic mutations which are ubiquitous in current human genomes are removed.

Brain-computer interfaces

  1. It is sometimes suggested that machines interfacing closely with the human brain will greatly enhance human cognition. For instance implants that allow perfect recall and fast arithmetic. (p44-45) 
  2. Brain-computer interfaces seem unlikely to produce superintelligence (p51). This is because they have substantial health risks, because our existing systems for getting information in and out of our brains are hard to compete with, and because our brains are probably bottlenecked in other ways anyway. (p45-6) 
  3. 'Downloading' directly from one brain to another seems infeasible because each brain represents concepts idiosyncratically, without a standard format. (p46-7)

Networks and organizations

  1. A large connected system of people (or something else) might become superintelligent. (p48) 
  2. Systems of connected people become more capable through technological and institutional innovations, such as enhanced communications channels, well-aligned incentives, elimination of bureaucratic failures, and mechanisms for aggregating information. The internet as a whole is a contender for a network of humans that might become superintelligent (p49) 

Summary

  1. Since there are many possible paths to superintelligence, we can be more confident that we will get there eventually (p50) 
  2. Whole brain emulation and biological enhancement are both likely to succeed after enough incremental progress in existing technologies. Networks and organizations are already improving gradually. 
  3. The path to AI is less clear, and may be discontinuous. Which route we take might matter a lot, even if we end up with similar capabilities anyway. (p50)

The book so far

Here's a recap of what we have seen so far, now at the end of Chapter 2:

  1. Economic history suggests big changes are plausible.
  2. AI progress is ongoing.
  3. AI progress is hard to predict, but AI experts tend to expect human-level AI in mid-century.
  4. Several plausible paths lead to superintelligence: brain emulations, AI, human cognitive enhancement, brain-computer interfaces, and organizations.
  5. Most of these probably lead to machine superintelligence ultimately.
  6. That there are several paths suggests we are likely to get there.

Do you disagree with any of these points? Tell us about it in the comments.

Notes

  1. Nootropics
    Snake Oil Supplements? is a nice illustration of scientific evidence for different supplements, here filtered for those with purported mental effects, many of which relate to intelligence. I don't know how accurate it is, or where to find a summary of apparent effect sizes rather than evidence, which I think would be more interesting.

    Ryan Carey and I talked to Gwern Branwen - an independent researcher with an interest in nootropics - about prospects for substantial intelligence amplification. I was most surprised that Gwern would not be surprised if creatine gave normal people an extra 3 IQ points.
  2. Environmental influences on intelligence
    And some more health-specific ones.
  3. The Flynn Effect
    People have apparently been getting smarter by about 3 points per decade for much of the twentieth century, though this trend may be ending. Several explanations have been proposed. Namesake James Flynn has a TED talk on the phenomenon. It is strangely hard to find a good summary picture of these changes, but here's a table from Flynn's classic 1978 paper of measured increases at that point:


    Here are changes in IQ test scores over time in a set of Polish teenagers, and a set of Norwegian military conscripts respectively:


  4. Prospects for genetic intelligence enhancement
    This study uses 'Genome-wide Complex Trait Analysis' (GCTA) to estimate that about half of variation in fluid intelligence in adults is explained by common genetic variation (childhood intelligence may be less heritable). These studies use genetic data to predict 1% of variation in intelligence. This genome-wide association study (GWAS) allowed prediction of 2% of education and IQ. This study finds several common genetic variants associated with cognitive performance. Stephen Hsu very roughly estimates that you would need a million samples in order to characterize the relationship between intelligence and genetics. According to Robertson et al, even among students in the top 1% of quantitative ability, cognitive performance predicts differences in occupational outcomes later in life. The Social Science Genetics Association Consortium (SSGAC) leads research efforts on genetics of education and intelligence, and is also investigating the genetics of other 'social science traits' such as self-employment, happiness and fertility. Carl Shulman and Nick Bostrom provide some estimates for the feasibility and impact of genetic selection for intelligence, along with a discussion of reproductive technologies that might facilitate more extreme selection. Robert Sparrow writes about 'in vitro eugenics'. Stephen Hsu also had an interesting interview with Luke Muehlhauser about several of these topics, and summarizes research on genetics and intelligence in a Google Tech Talk. (A toy version of this kind of selection calculation appears after these notes.)
  5. Some brain computer interfaces in action
    For Parkinson's disease relief, allowing locked in patients to communicate, handwriting, and controlling robot arms.
  6. What changes have made human organizations 'smarter' in the past?
    Big ones I can think of include innovations in using text (writing, printing, digital text editing), communicating better in other ways (faster, further, more reliably), increasing population size (population growth, or connection between disjoint populations), systems for trade (e.g. currency, finance, different kinds of marketplace), innovations in business organization, improvements in governance, and forces leading to reduced conflict.
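
To give a rough feel for what predictors like those in note 4 could buy, here is a toy version of the kind of selection calculation Shulman and Bostrom discuss. The parameters are made up for illustration and the model is deliberately crude - in particular it ignores the fact that sibling embryos share much of their genome, which shrinks the usable variance - so treat the outputs as ballpark figures only. The underlying formula is standard: selecting the top-scoring of n embryos on a predictor that explains a fraction R^2 of trait variance gives an expected gain of about sqrt(R^2) x (trait SD) x E[max of n standard normals].

    import random, statistics

    def expected_max_std_normal(n, trials=100_000):
        # Monte Carlo estimate of E[max of n independent standard normal draws].
        return statistics.fmean(max(random.gauss(0, 1) for _ in range(n))
                                for _ in range(trials))

    def expected_gain(n_embryos, variance_explained, trait_sd=15.0):
        # Expected gain from picking the top-scoring of n embryos, using a predictor
        # that correlates sqrt(variance_explained) with the trait.
        r = variance_explained ** 0.5
        return r * trait_sd * expected_max_std_normal(n_embryos)

    # Illustrative only: 10 embryos, predictors explaining 1%, 10% and 50% of variance.
    for r2 in (0.01, 0.10, 0.50):
        print("R^2 = %.2f: expected gain ~ %.1f IQ points" % (r2, expected_gain(10, r2)))

With today's predictors (the 1-2% of variance mentioned above) one-in-ten selection buys only a couple of points on this model; the gain grows roughly with the square root of the variance explained.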

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.

  1. How well does IQ predict relevant kinds of success? This is informative about what enhanced humans might achieve, in general and in terms of producing more enhancement. How much better is a person with IQ 150 at programming or doing genetics research than a person with IQ 120? How does IQ relate to philosophical ability, reflectiveness, or the ability to avoid catastrophic errors? (related project guide here).
  2. How promising are nootropics? Bostrom argues 'probably not very', but it seems worth checking more thoroughly. One related curiosity is that on casual inspection, there seem to be quite a few nootropics that appeared promising at some point and then haven't been studied much. This could be explained well by any of publication bias, whatever forces are usually blamed for relatively natural drugs receiving little attention, or the casualness of my casual inspection.
  3. How can we measure intelligence in non-human systems? e.g. What are good ways to track increasing 'intelligence' of social networks, quantitatively? We have the general sense that groups of humans are the level at which everything is a lot better than it was in 1000BC, but it would be nice to have an idea of how this is progressing over time. Is GDP a reasonable metric?  
  4. What are the trends in those things that make groups of humans smarter? e.g. How will world capacity for information communication change over the coming decades? (Hilbert and Lopez's work is probably relevant)
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about 'forms of superintelligence', in the sense of different dimensions in which general intelligence might be scaled up. To prepare, read Chapter 3, Forms of Superintelligence (p52-61). The discussion will go live at 6pm Pacific time next Monday 13 October. Sign up to be notified here.

Superintelligence Reading Group 3: AI and Uploads

9 KatjaGrace 30 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the third section in the reading guide, AI & Whole Brain Emulation. This is about two possible routes to the development of superintelligence: the route of developing intelligent algorithms by hand, and the route of replicating a human brain in great detail.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: “Artificial intelligence” and “Whole brain emulation” from Chapter 2 (p22-36)


Summary

Intro

  1. Superintelligence is defined as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'
  2. There are several plausible routes to the arrival of a superintelligence: artificial intelligence, whole brain emulation, biological cognition, brain-computer interfaces, and networks and organizations. 
  3. The existence of multiple possible paths to superintelligence makes it more likely that we will get there somehow. 
AI
  1. A human-level artificial intelligence would probably have learning, the handling of uncertainty, and concept formation as central features.
  2. Evolution produced human-level intelligence. This means it is possible, but it is unclear how much it says about the effort required.
  3. Humans could perhaps develop human-level artificial intelligence by just replicating a similar evolutionary process virtually. A quick calculation suggests this would be too expensive to be feasible for a century, though the process might be made more efficient.
  4. Human-level AI might be developed by copying the human brain to various degrees. If the copying is very close, the resulting agent would be a 'whole brain emulation', which we'll discuss shortly. If the copying is only of a few key insights about brains, the resulting AI might be very unlike humans.
  5. AI might iteratively improve itself from a meagre beginning. We'll examine this idea later. Some definitions for discussing this:
    1. 'Seed AI': a modest AI which can bootstrap into an impressive AI by improving its own architecture.
    2. 'Recursive self-improvement': the envisaged process of AI (perhaps a seed AI) iteratively improving itself.
    3. 'Intelligence explosion': a hypothesized event in which an AI rapidly improves from 'relatively modest' to superhuman level (usually imagined to result from recursive self-improvement).
  6. The possibility of an intelligence explosion suggests we might have modest AI, then suddenly and surprisingly have super-human AI.
  7. An AI mind might generally be very different from a human mind. 

Whole brain emulation

  1. Whole brain emulation (WBE or 'uploading') involves scanning a human brain in a lot of detail, then making a computer model of the relevant structures in the brain.
  2. Three steps are needed for uploading: sufficiently detailed scanning, ability to process the scans into a model of the brain, and enough hardware to run the model. These correspond to three required technologies: scanning, translation (or interpreting images into models), and simulation (or hardware). These technologies appear attainable through incremental progress, by very roughly mid-century.
  3. This process might produce something much like the original person, in terms of mental characteristics. However the copies could also have lower fidelity. For instance, they might be humanlike instead of copies of specific humans, or they may only be humanlike in being able to do some tasks humans do, while being alien in other regards.

Notes

  1. What routes to human-level AI do people think are most likely?
    Bostrom and Müller's survey asked participants to compare various methods for producing synthetic and biologically inspired AI. They asked, 'In your opinion, what are the research approaches that might contribute the most to the development of such HLMI?' Selection was from a list, with more than one selection possible. They report that the responses were very similar for the different groups surveyed, except that whole brain emulation got 0% in the TOP100 group (the 100 most cited authors in AI) but 46% in the AGI group (participants at Artificial General Intelligence conferences). Note that they are only asking about synthetic AI and brain emulations, not the other paths to superintelligence we will discuss next week.
  2. How different might AI minds be?
    Omohundro suggests advanced AIs will tend to have important instrumental goals in common, such as the desire to accumulate resources and the desire to not be killed. 
  3. Anthropic reasoning 
    ‘We must avoid the error of inferring, from the fact that intelligent life evolved on Earth, that the evolutionary processes involved had a reasonably high prior probability of producing intelligence’ (p27) 

    Whether such inferences are valid is a topic of contention. For a book-length overview of the question, see Bostrom’s Anthropic Bias. I’ve written shorter (Ch 2) and even shorter summaries, which link to other relevant material. The Doomsday Argument and Sleeping Beauty Problem are closely related.

  4. More detail on the brain emulation scheme
    Whole Brain Emulation: A Roadmap is an extensive source on this, written in 2008. If that's a bit too much detail, Anders Sandberg (an author of the Roadmap) summarises in an entertaining (and much shorter) talk. More recently, Anders tried to predict when whole brain emulation would be feasible with a statistical model. Randal Koene and Ken Hayworth both recently spoke to Luke Muehlhauser about the Roadmap and what research projects would help with brain emulation now.
  5. Levels of detail
    As you may predict, the feasibility of brain emulation is not universally agreed upon. One contentious point is the degree of detail needed to emulate a human brain. For instance, you might just need the connections between neurons and some basic neuron models, or you might need to model the states of different membranes, or the concentrations of neurotransmitters. The Whole Brain Emulation Roadmap lists some possible levels of detail in figure 2 (the yellow ones were considered most plausible). Physicist Richard Jones argues that simulation at the molecular level would be needed, and that the project is infeasible. (A toy example of a 'basic neuron model' appears after this list.)

  6. Other problems with whole brain emulation
    Sandberg considers many potential impediments here.

  7. Order matters for brain emulation technologies (scanning, hardware, and modeling)
    Bostrom points out that this order matters for how much warning we receive that brain emulations are about to arrive (p35). Order might also matter a lot to the social implications of brain emulations. Robin Hanson discusses this briefly here and in this talk (starting at 30:50), and this paper also discusses the issue.

  8. What would happen after brain emulations were developed?
    We will look more at this in Chapter 11 (weeks 17-19) as well as perhaps earlier, including what a brain emulation society might look like, how brain emulations might lead to superintelligence, and whether any of this is good.

  9. Scanning (p30-36)
    ‘With a scanning tunneling microscope it is possible to ‘see’ individual atoms, which is a far higher resolution than needed...microscopy technology would need not just sufficient resolution but also sufficient throughput.’

    Here are some atoms, neurons, and neuronal activity in a living larval zebrafish, and videos of various neural events.


    Array tomography of mouse somatosensory cortex from Smithlab.



    A molecule made from eight cesium and eight iodine atoms (from here).
  10. Efforts to map connections between neurons
    Here is a 5m video about recent efforts, with many nice pictures. If you enjoy coloring in, you can take part in a gamified project to help map the brain's neural connections! Or you can just look at the pictures they made.

  11. The C. elegans connectome (p34-35)
    As Bostrom mentions, we already know how all of C. elegans neurons are connected. Here's a picture of it (via Sebastian Seung):
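To make the 'basic neuron models' mentioned in note 5 concrete, here is a minimal leaky integrate-and-fire neuron in Python. This sits at the least detailed end of the spectrum discussed there (no compartments, no neurotransmitter chemistry), and the parameter values are generic textbook-style numbers chosen for illustration rather than anything taken from the Roadmap.

```python
import numpy as np

def simulate_lif(input_current, dt=1e-4, tau=0.02, v_rest=-0.070,
                 v_reset=-0.075, v_threshold=-0.054, resistance=1e7):
    """Leaky integrate-and-fire neuron: tau * dV/dt = -(V - v_rest) + R * I."""
    v = v_rest
    spike_times = []
    for step, current in enumerate(input_current):
        v += (-(v - v_rest) + resistance * current) * (dt / tau)
        if v >= v_threshold:               # membrane potential crosses threshold
            spike_times.append(step * dt)  # record a spike...
            v = v_reset                    # ...and reset the potential
    return spike_times

# One second of constant 2 nA input, in 0.1 ms steps.
current = np.full(10_000, 2e-9)
print(f"{len(simulate_lif(current))} spikes in 1 s of simulated input")
```

Even this toy version makes the trade-off visible: a model this simple is cheap to run for very large numbers of neurons, but it throws away exactly the biochemical detail whose necessity is the contentious question above.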


In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:

  1. Produce a better - or merely somewhat independent - estimate of how much computing power it would take to rerun evolution artificially. (p25-6)
  2. How powerful is evolution for finding things like human-level intelligence? (You'll probably need a better metric than 'power'). What are its strengths and weaknesses compared to human researchers?
  3. Conduct a more thorough investigation into the approaches to AI that are likely to lead to human-level intelligence, for instance by interviewing AI researchers in more depth about their opinions on the question.
  4. Measure relevant progress in neuroscience, so that trends can be extrapolated to neuroscience-inspired AI. Finding good metrics seems to be hard here.
  5. e.g. How is microscopy progressing? It’s harder to get a relevant measure than you might think, because (as noted p31-33) high enough resolution is already feasible, yet throughput is low and there are other complications. 
  6. Randal Koene suggests a number of technical research projects that would forward whole brain emulation (fifth question).
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about other paths to the development of superintelligence: biological cognition, brain-computer interfaces, and organizations. To prepare, read Biological Cognition and the rest of Chapter 2. The discussion will go live at 6pm Pacific time next Monday 6 October. Sign up to be notified here.

Superintelligence Reading Group 2: Forecasting AI

10 KatjaGrace 23 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome. This week we discuss the second section in the reading guide, Forecasting AI. This is about predictions of AI, and what we should make of them.

This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Opinions about the future of machine intelligence, from Chapter 1 (p18-21), and Muehlhauser, When Will AI be Created?


Summary

Opinions about the future of machine intelligence, from Chapter 1 (p18-21)

  1. AI researchers hold a variety of views on when human-level AI will arrive, and what it will be like.
  2. A recent set of surveys of AI researchers produced the following median dates: 
    • for human-level AI with 10% probability: 2022
    • for human-level AI with 50% probability: 2040
    • for human-level AI with 90% probability: 2075
  3. Surveyed AI researchers in aggregate gave 10% probability to 'superintelligence' within two years of human level AI, and 75% to 'superintelligence' within 30 years.
  4. When asked about the long-term impacts of human-level AI, surveyed AI researchers gave the responses in the figure below (these are 'renormalized median' responses; 'TOP 100' is one of the surveyed groups, and 'Combined' is all of them). 
  5. There are various reasons to expect such opinion polls and public statements to be fairly inaccurate.
  6. Nonetheless, such opinions suggest that the prospect of human-level AI is worthy of attention.

When Will AI Be Created? (Muehlhauser)

  1. Predicting when human-level AI will arrive is hard.
  2. The estimates of informed people can vary between a small number of decades and a thousand years.
  3. Different time scales have different policy implications.
  4. Several surveys of AI experts exist, but Muehlhauser suspects sampling bias (e.g. optimistic views being sampled more often) makes such surveys of little use.
  5. Predicting human-level AI development is the kind of task that experts are characteristically bad at, according to extensive research on what makes people better at predicting things.
  6. People try to predict human-level AI by extrapolating hardware trends. This probably won't work, as AI requires software as well as hardware, and software appears to be a substantial bottleneck.
  7. We might try to extrapolate software progress, but software often progresses less smoothly, and is also hard to design good metrics for.
  8. A number of plausible events might substantially accelerate or slow progress toward human-level AI, such as an end to Moore's Law, depletion of low-hanging fruit, societal collapse, or a change in incentives for development.
  9. The appropriate response to this situation is uncertainty: you should neither be confident that human-level AI will take less than 30 years, nor that it will take more than a hundred years.
  10. We can still hope to do better: there are known ways to improve predictive accuracy, such as making quantitative predictions, looking for concrete 'signposts', looking at aggregated predictions, and decomposing complex phenomena into simpler ones.
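One of the accuracy-improving techniques in the last point, aggregating predictions, is easy to make concrete. The Python sketch below pools several probability estimates for the same question, once by taking the median and once by averaging log-odds; the individual numbers are invented for illustration and are not drawn from any of the surveys discussed here.

```python
import numpy as np

# Hypothetical individual probability estimates for 'human-level AI by 2050'.
estimates = np.array([0.10, 0.25, 0.30, 0.50, 0.80])

# Simple aggregate: the median estimate.
median = np.median(estimates)

# Alternative aggregate: average the estimates in log-odds space, then convert back.
log_odds = np.log(estimates / (1 - estimates))
pooled = 1 / (1 + np.exp(-log_odds.mean()))

print(f"median estimate:  {median:.2f}")
print(f"log-odds pooled:  {pooled:.2f}")
```

Which pooling rule is best is itself debated; the point is only that an aggregate of several forecasts is usually a better guess than any single forecaster's number.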
Notes
  1. More (similar) surveys on when human-level AI will be developed
    Bostrom discusses some recent polls in detail, and mentions that others are fairly consistent. Below are the surveys I could find. Several of them give dates when median respondents believe there is a 10%, 50% or 90% chance of AI, which I have recorded as '10% year' etc. If their findings were in another form, those are in the last column. Note that some of these surveys are fairly informal, and many participants are not AI experts; I'd guess this is especially true of the Bainbridge, AI@50 and Klein ones. 'Kruel' is the set of interviews from which Nils Nilsson is quoted on p19. The interviews cover a wider range of topics, and are indexed here.

    Survey                             | 10% year | 50% year | 90% year | Other predictions
    Michie 1972 (paper download)       |          |          |          | Fairly even spread between 20, 50 and >50 years
    Bainbridge 2005                    |          |          |          | Median prediction 2085
    AI@50 poll 2006                    |          |          |          | 82% predict more than 50 years (>2056) or never
    Baum et al AGI-09                  | 2020     | 2040     | 2075     |
    Klein 2011                         |          |          |          | median 2030-2050
    FHI 2011                           | 2028     | 2050     | 2150     |
    Kruel 2011- (interviews, summary)  | 2025     | 2035     | 2070     |
    FHI: AGI 2014                      | 2022     | 2040     | 2065     |
    FHI: TOP100 2014                   | 2022     | 2040     | 2075     |
    FHI: EETN 2014                     | 2020     | 2050     | 2093     |
    FHI: PT-AI 2014                    | 2023     | 2048     | 2080     |
    Hanson ongoing                     |          |          |          | Most say have come 10% or less of the way to human level
  2. Predictions in public statements
    Polls are one source of predictions on AI. Another source is public statements - that is, things people choose to say publicly. MIRI arranged for the collection of these public statements, which you can now download and play with (the original and info about it, my edited version and explanation for changes). The figure below shows the cumulative fraction of public statements claiming that human-level AI will be more likely than not by a particular year - or at least claiming something that can be broadly interpreted as that. It only includes recorded statements made since 2000. There are various warnings and details in interpreting this, but I don't think they make a big difference, so they are probably not worth considering unless you are especially interested. Note that the authors of these statements are a mixture of mostly AI researchers (including disproportionately many working on human-level AI), a few futurists, and a few other people.

    (LH axis = fraction of people predicting human-level AI by that date) 

    Cumulative distribution of predicted date of AI

    As you can see, the median date (when the graph hits the 0.5 mark) for human-level AI here is much like that in the survey data: 2040 or so.

    I would generally expect predictions in public statements to be relatively early, because people just don't tend to bother writing books about how exciting things are not going to happen for a while, unless their prediction is fascinatingly late. I checked this more thoroughly, by comparing the outcomes of surveys to the statements made by people in similar groups to those surveyed (e.g. if the survey was of AI researchers, I looked at statements made by AI researchers). In my (very cursory) assessment (detailed at the end of this page) there is a bit of a difference: predictions from surveys are 0-23 years later than those from public statements.
  3. What kinds of things are people good at predicting?
    Armstrong and Sotala (p11) summarize a few research efforts in recent decades as follows.


    Note that the problem of predicting AI mostly falls on the right. Unfortunately this doesn't tell us anything about how much harder AI timelines are to predict than other things, or the absolute level of predictive accuracy associated with any combination of features. However if you have a rough idea of how well humans predict things, you might correct it downward when predicting how well humans predict future AI development and its social consequences.
  4. Biases
    As well as just being generally inaccurate, predictions of AI are often suspected to be subject to a number of biases. Bostrom claimed earlier that 'twenty years is the sweet spot for prognosticators of radical change' (p4). A related concern is that people always predict revolutionary changes just within their lifetimes (the so-called Maes-Garreau law). Worse problems come from selection effects: the people making all of these predictions are selected for thinking AI is the best thing to spend their lives on, so might be especially optimistic. Further, more exciting claims of impending robot revolution might be published and remembered more often. More bias might come from wishful thinking: having spent a lot of their lives on it, researchers might hope especially hard for it to go well. On the other hand, as Nils Nilsson points out, AI researchers are wary of past predictions and so try hard to retain respectability, for instance by focussing on 'weak AI'. This could systematically push their predictions later.

    We have some evidence about these biases. Armstrong and Sotala (using the MIRI dataset) find people are especially willing to predict AI around 20 years in the future, but couldn't find evidence of the Maes-Garreau law. Another way of looking for the Maes-Garreau law is via correlation between age and predicted time to AI, which is weak (-.017) in the edited MIRI dataset. A general tendency to make predictions based on incentives rather than available information is weakly supported by predictions not changing much over time, which is pretty much what we see in the MIRI dataset. In the figure below, 'early' predictions are made before 2000, and 'late' ones since then.


    Cumulative distribution of predicted Years to AI, in early and late predictions.

    We can learn something about whether AI researchers are selected for being especially optimistic about AI by comparing groups who might be more or less selected in this way. For instance, we can compare most AI researchers - who tend to work on narrow intelligent capabilities - with researchers of 'artificial general intelligence' (AGI), who specifically focus on creating human-level agents. The figure below shows this comparison with the edited MIRI dataset, using a rough assessment of who works on AGI vs. other AI, and only predictions made from 2000 onward ('late'). Interestingly, the AGI predictions indeed look like the most optimistic half of the AI predictions. 


    Cumulative distribution of predicted date of AI, for AGI and other AI researchers

    We can also compare other groups in the dataset - 'futurists' and other people (according to our own heuristic assessment). While the picture is interesting, note that both of these groups were very small (as you can see by the large jumps in the graph). 


    Cumulative distribution of predicted date of AI, for various groups

    Remember that these differences may not be due to bias, but rather to better understanding. It could well be that AGI research is very promising, and the closer you are to it, the more you realize that. Nonetheless, we can say some things from this data. The total selection bias toward optimism in communities selected for optimism is probably not much more than the differences we see here - a few decades in the median - though it could plausibly be that large.

    These have been some rough calculations to get an idea of the extent of a few hypothesized biases. I don't think they are very accurate, but I want to point out that you can actually gather empirical data on these things, and that given the current level of research on these questions, you can learn interesting things fairly cheaply, without doing very elaborate or rigorous investigations. (A minimal sketch of the kind of calculation involved appears after this list.)
  5. What definition of 'superintelligence' do AI experts expect within two years of human-level AI with probability 10% and within thirty years with probability 75%?
    “Assume for the purpose of this question that such HLMI will at some point exist. How likely do you then think it is that within (2 years / 30 years) thereafter there will be machine intelligence that greatly surpasses the performance of every human in most professions?” See the paper for other details about Bostrom and Müller's surveys (the ones in the book).
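Several of the rough calculations above (the cumulative fraction of statements predicting human-level AI by a given year in note 2, and the age correlation in note 4) are simple to reproduce given a dataset like MIRI's. Here is a minimal Python sketch; the column layout and the few rows of data are invented placeholders, so anyone trying this should substitute rows loaded from the actual downloaded dataset.

```python
import numpy as np

# Placeholder records: (year the statement was made, predicted year of AI, predictor's age).
# Replace these with rows loaded from the real dataset.
data = np.array([
    (2003, 2030, 45),
    (2006, 2045, 52),
    (2009, 2040, 38),
    (2011, 2100, 60),
    (2013, 2035, 29),
], dtype=float)
made, predicted, age = data[:, 0], data[:, 1], data[:, 2]

# Cumulative fraction of statements predicting human-level AI by each year.
for year in range(2020, 2110, 20):
    print(f"by {year}: {(predicted <= year).mean():.0%} of statements")

# Correlation between age and predicted years remaining until AI
# (a crude check for the Maes-Garreau 'within my lifetime' effect).
years_to_ai = predicted - made
print(f"correlation(age, years to AI) = {np.corrcoef(age, years_to_ai)[0, 1]:.3f}")
```

With the real dataset, computing the cumulative fractions separately by subgroup (early vs. late statements, AGI vs. other AI researchers) reproduces the comparisons shown in the figures above.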

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some taken from Luke Muehlhauser's list:

  1. Instead of asking how long until AI, Robin Hanson's mini-survey asks people how far we have come (in a particular sub-area) in the last 20 years, as a fraction of the remaining distance. Responses to this question are generally fairly low - 5% is common. His respondents also tend to say that progress isn't accelerating especially. These estimates imply that in any given sub-area of AI, human-level ability should be reached in about 200 years, which is strongly at odds with what researchers say in the other surveys. An interesting project would be to expand Robin's survey, try to understand the discrepancy, and work out which estimates we should be using. We made a guide to carrying out this project.
  2. There are many possible empirical projects which would better inform estimates of timelines e.g. measuring the landscape and trends of computation (MIRI started this here, and made a project guide), analyzing performance of different versions of software on benchmark problems to find how much hardware and software contributed to progress, developing metrics to meaningfully measure AI progress, investigating the extent of AI inspiration from biology in the past, measuring research inputs over time (e.g. a start), and finding the characteristic patterns of progress in algorithms (my attempts here).
  3. Make a detailed assessment of likely timelines in communication with some informed AI researchers.
  4. Gather and interpret past efforts to predict technology decades ahead of time. Here are a few efforts to judge past technological predictions: Clarke 1969, Wise 1976, Albright 2002, Mullins 2012, Kurzweil on his own predictions, and other people on Kurzweil's predictions.
  5. Above I showed you several rough calculations I did. A rigorous version of any of these would be useful.
  6. Did most early AI scientists really think AI was right around the corner, or was it just a few people? The earliest survey available (Michie 1973) suggests it may have been just a few people. For those that thought AI was right around the corner, how much did they think about the safety and ethical challenges? If they thought and talked about it substantially, why was there so little published on the subject? If they really didn’t think much about it, what does that imply about how seriously AI scientists will treat the safety and ethical challenges of AI in the future? Some relevant sources here.
  7. Conduct a Delphi study of likely AGI impacts. Participants could be AI scientists, researchers who work on high-assurance software systems, and AGI theorists.
  8. Signpost the future. Superintelligence explores many different ways the future might play out with regard to superintelligence, but cannot help being somewhat agnostic about which particular path the future will take. Come up with clear diagnostic signals that policy makers can use to gauge whether things are developing toward or away from one set of scenarios or another. If X does or does not happen by 2030, what does that suggest about the path we’re on? If Y ends up taking value A or B, what does that imply?
  9. Another survey of AI scientists’ estimates on AGI timelines, takeoff speed, and likely social outcomes, with more respondents and a higher response rate than the best current survey, which is probably Müller & Bostrom (2014).
  10. Download the MIRI dataset and see if you can find anything interesting in it.

How to proceed

This has been a collection of notes on the chapter.  The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about two paths to the development of superintelligence: AI coded by humans, and whole brain emulation. To prepare, read Artificial Intelligence and Whole Brain Emulation from Chapter 2. The discussion will go live at 6pm Pacific time next Monday 29 September. Sign up to be notified here.

Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities

25 KatjaGrace 16 September 2014 01:00AM

This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, see the announcement post. For the schedule of future topics, see MIRI's reading guide.


Welcome to the Superintelligence reading group. This week we discuss the first section in the reading guide, Past developments and present capabilities. This section considers the behavior of the economy over very long time scales, and the recent history of artificial intelligence (henceforth, 'AI'). These two areas are excellent background if you want to think about large economic transitions caused by AI.

This post summarizes the section, and offers a few relevant notes, thoughts, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post. Feel free to jump straight to the discussion. Where applicable, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Foreword, and Growth modes through State of the art from Chapter 1 (p1-18)


Summary

Economic growth:

  1. Economic growth has become radically faster over the course of human history. (p1-2)
  2. This growth has been uneven rather than continuous, perhaps corresponding to the farming and industrial revolutions. (p1-2)
  3. Thus history suggests large changes in the growth rate of the economy are plausible. (p2)
  4. This makes it more plausible that human-level AI will arrive and produce unprecedented levels of economic productivity.
  5. Predictions of much faster growth rates might also suggest the arrival of machine intelligence, because it is hard to imagine humans - slow as they are - sustaining such a rapidly growing economy. (p2-3)
  6. Thus economic history suggests that rapid growth caused by AI is more plausible than you might otherwise think.

The history of AI:

  1. Human-level AI has been predicted since the 1940s. (p3-4)
  2. Early predictions were often optimistic about when human-level AI would come, but rarely considered whether it would pose a risk. (p4-5)
  3. AI research has been through several cycles of relative popularity and unpopularity. (p5-11)
  4. By around the 1990s, 'Good Old-Fashioned Artificial Intelligence' (GOFAI) techniques based on symbol manipulation gave way to new methods such as artificial neural networks and genetic algorithms. These are widely considered more promising, in part because they are less brittle and can learn from experience more usefully. Researchers have also lately developed a better understanding of the underlying mathematical relationships between various modern approaches. (p5-11)
  5. AI is very good at playing board games. (p12-13)
  6. AI is used in many applications today (e.g. hearing aids, route-finders, recommender systems, medical decision support systems, machine translation, face recognition, scheduling, the financial market). (p14-16)
  7. In general, tasks we thought were intellectually demanding (e.g. board games) have turned out to be easy to do with AI, while tasks which seem easy to us (e.g. identifying objects) have turned out to be hard. (p14)
  8. An 'optimality notion' is the combination of a rule for learning, and a rule for making decisions. Bostrom describes one of these: a kind of ideal Bayesian agent. This is impossible to actually make, but provides a useful measure for judging imperfect agents against. (p10-11)
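The last point is easier to see with a toy example. Bostrom's ideal Bayesian agent cannot actually be built, but the shape of an 'optimality notion' - one rule for updating beliefs plus one rule for choosing actions - fits in a few lines of Python. The agent below keeps Beta-distributed beliefs about the payoff of two actions (the learning rule is Bayesian updating) and always takes the action with the higher expected payoff (the decision rule is expected-utility maximization). This is only a sketch of the general idea, not the formalism described in the book.

```python
import random

TRUE_PAYOFF_PROB = [0.4, 0.6]   # unknown to the agent

# Beta(1, 1) priors over each action's success probability.
successes = [1, 1]
failures = [1, 1]

def expected_payoff(action):
    # Posterior mean of the Beta(successes, failures) belief about this action.
    return successes[action] / (successes[action] + failures[action])

random.seed(0)
for _ in range(1000):
    # Decision rule: choose the action with the highest expected payoff.
    action = max(range(2), key=expected_payoff)
    reward = random.random() < TRUE_PAYOFF_PROB[action]
    # Learning rule: Bayesian (conjugate) update of the chosen action's belief.
    if reward:
        successes[action] += 1
    else:
        failures[action] += 1

print("posterior expected payoffs:", [round(expected_payoff(a), 2) for a in (0, 1)])
```

A genuinely ideal agent would also weigh the value of the information an action yields, which this greedy toy does not; that gap is part of why the idealized version is a yardstick rather than a design.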

Notes on a few things

  1. What is 'superintelligence'? (p22 spoiler)
    In case you are too curious about what the topic of this book is to wait until week 3, a 'superintelligence' will soon be described as 'any intellect that greatly exceeds the cognitive performance of humans in virtually all domains of interest'. Vagueness in this definition will be cleared up later. 
  2. What is 'AI'?
    In particular, how does 'AI' differ from other computer software? The line is blurry, but basically AI research seeks to replicate the useful 'cognitive' functions of human brains ('cognitive' is perhaps unclear, but for instance it doesn't have to be squishy or prevent your head from imploding). Sometimes AI research tries to copy the methods used by human brains. Other times it tries to carry out the same broad functions as a human brain, perhaps better than a human brain. Russell and Norvig (p2) divide prevailing definitions of AI into four categories: 'thinking humanly', 'thinking rationally', 'acting humanly' and 'acting rationally'. For our purposes however, the distinction is probably not too important.
  3. What is 'human-level' AI? 
    We are going to talk about 'human-level' AI a lot, so it would be good to be clear on what that is. Unfortunately the term is used in various ways, and often ambiguously. So we probably can't be that clear on it, but let us at least be clear on how the term is unclear. 

    One big ambiguity is whether you are talking about a machine that can carry out tasks as well as a human at any price, or a machine that can carry out tasks as well as a human at the price of a human. These are quite different, especially in their immediate social implications.

    Other ambiguities arise in how 'levels' are measured. If AI systems were to replace almost all humans in the economy, but only because they are so much cheaper - though they often do a lower quality job - are they human level? What exactly does the AI need to be human-level at? Anything you can be paid for? Anything a human is good for? Just mental tasks? Even mental tasks like daydreaming? Which or how many humans does the AI need to be the same level as? Note that in a sense most humans have been replaced in their jobs before (almost everyone used to work in farming), so if you use that metric for human-level AI, it was reached long ago, and perhaps farm machinery is human-level AI. This is probably not what we want to point at.

    Another thing to be aware of is the diversity of mental skills. If by 'human-level' we mean a machine that is at least as good as a human at each of these skills, then in practice the first 'human-level' machine will be much better than a human on many of those skills. It may not seem 'human-level' so much as 'very super-human'.

    We could instead think of human-level as closer to 'competitive with a human' - where the machine has some super-human talents and lacks some skills humans have. This is not usually used, I think because it is hard to define in a meaningful way. There are already machines for which a company is willing to pay more than a human: in this sense a microscope might be 'super-human'. There is no reason for a machine which is equal in value to a human to have the traits we are interested in talking about here, such as agency, superior cognitive abilities or the tendency to drive humans out of work and shape the future. Thus we talk about AI which is at least as good as a human, but you should beware that the predictions made about such an entity may apply before the entity is technically 'human-level'.


    Example of how the first 'human-level' AI may surpass humans in many ways.

    Because of these ambiguities, AI researchers are sometimes hesitant to use the term. e.g. in these interviews.
  4. Growth modes (p1) 
    Robin Hanson wrote the seminal paper on this issue. Here's a figure from it, showing the step changes in growth rates. Note that both axes are logarithmic. Note also that the changes between modes don't happen overnight. According to Robin's model, we are still transitioning into the industrial era (p10 in his paper).
  5. What causes these transitions between growth modes? (p1-2)
    One might be happier making predictions about future growth mode changes if one had a unifying explanation for the previous changes. As far as I know, we have no good idea of what was so special about those two periods. There are many suggested causes of the industrial revolution, but nothing uncontroversially stands out as 'twice in history' level of special. You might think the small number of datapoints would make this puzzle too hard. Remember however that there are quite a lot of negative datapoints - you need an explanation for something that happened at those two points but not at any other time in history. 
  6. Growth of growth
    It is also interesting to compare world economic growth to the total size of the world economy. For the last few thousand years, the economy's growth rate seems to have been more or less proportional to its size (see figure below). Extrapolating such a trend would lead to an infinite economy in finite time. In fact, for the thousand years until 1950 such extrapolation would place an infinite economy in the late 20th century! The time since 1950 has apparently been strange. 

    (Figure from here)
  7. Early AI programs mentioned in the book (p5-6)
    You can see them in action: SHRDLU, Shakey, General Problem Solver (not quite in action), ELIZA.
  8. Later AI programs mentioned in the book (p6)
    Algorithmically generated Beethoven, algorithmic generation of patentable inventions, artificial comedy (requires download).
  9. Modern AI algorithms mentioned (p7-8, 14-15) 
    Here is a neural network doing image recognition. Here is artificial evolution of jumping and of toy cars. Here is a face detection demo that can tell you your attractiveness (apparently not reliably), happiness, age, gender, and which celebrity it mistakes you for.
  10. What is maximum likelihood estimation? (p9)
    Bostrom points out that many types of artificial neural network can be viewed as classifiers that perform 'maximum likelihood estimation'. If you haven't come across this term before, the idea is to find the situation that would make your observations most probable. For instance, suppose a person writes to you and tells you that you have won a car. The situation that would have made this scenario most probable is the one where you have won a car, since in that case you are almost guaranteed to be told about it. Note that this doesn't imply that you should think you won a car, if someone tells you that. Being the target of a spam email might only give you a low probability of being told that you have won a car (a spam email may instead advise you of products, or tell you that you have won a boat), but spam emails are so much more common than actually winning cars that most of the time if you get such an email, you will not have won a car. If you would like a better intuition for maximum likelihood estimation, Wolfram Alpha has several demonstrations (requires free download). (A short worked example, combining this with the next note, appears after this list.)
  11. What are hill climbing algorithms like? (p9)
    The second large class of algorithms Bostrom mentions are hill climbing algorithms. The idea here is fairly straightforward, but if you would like a better basic intuition for what hill climbing looks like, Wolfram Alpha has a demonstration to play with (requires free download).
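Notes 10 and 11 fit together in one small example: the Python sketch below estimates the bias of a coin by maximum likelihood, and finds that maximum with a crude hill-climbing search rather than with the closed-form answer (which is just the observed fraction of heads). The coin's bias, the number of flips and the step size are all invented for illustration.

```python
import math
import random

random.seed(0)
TRUE_BIAS = 0.7
flips = [random.random() < TRUE_BIAS for _ in range(200)]  # simulated coin flips

def log_likelihood(bias):
    """Log-probability of the observed flips if the coin's bias were `bias`."""
    return sum(math.log(bias) if heads else math.log(1 - bias) for heads in flips)

# Hill climbing: start somewhere, and repeatedly move to a neighbouring value
# whenever doing so makes the observed data more probable.
estimate, step = 0.5, 0.01
for _ in range(1000):
    candidates = [c for c in (estimate - step, estimate, estimate + step) if 0 < c < 1]
    estimate = max(candidates, key=log_likelihood)

print(f"hill-climbing maximum-likelihood estimate: {estimate:.2f}")
print(f"closed-form answer (fraction of heads):    {sum(flips) / len(flips):.2f}")
```

On a one-dimensional, single-peaked likelihood like this one, hill climbing reliably finds the maximum; the usual caveat is that on bumpier surfaces it can get stuck on a local peak.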

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions:

  1. How have investments into AI changed over time? Here's a start, estimating the size of the field.
  2. What does progress in AI look like in more detail? What can we infer from it? I wrote about algorithmic improvement curves before. If you are interested in plausible next steps here, ask me.
  3. What do economic models tell us about the consequences of human-level AI? Here is some such thinking; Eliezer Yudkowsky has written at length about his request for more.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about what AI researchers think about human-level AI: when it will arrive, what it will be like, and what the consequences will be. To prepare, read Opinions about the future of machine intelligence from Chapter 1 and also When Will AI Be Created? by Luke Muehlhauser. The discussion will go live at 6pm Pacific time next Monday 22 September. Sign up to be notified here.

Do Virtual Humans deserve human rights?

-3 cameroncowan 11 September 2014 07:20PM

Do Virtual Humans deserve human rights?

Slate Article

 

I think the idea of storing our minds in a machine so that we can keep on "living" (and I use that term loosely) is fascinating, and certainly an oft-discussed topic around here. However, in thinking about keeping our brains on a hard drive, we have to think about rights and how that all works together. Indeed, the technology may be here before we know it, so I think it's important to think about mindclones. If I create a little version of myself that can answer my emails for me, can I delete him when I'm done with him, or just turn him in for a new model like I do iPhones? 

 

I look forward to the discussion.

 

Omission vs commission and conservation of expected moral evidence

2 Stuart_Armstrong 08 September 2014 02:22PM

Consequentialism traditionally doesn't distinguish between acts of commission and acts of omission. Not flipping the lever to the left is equivalent to flipping it to the right.

But there seems one clear case where the distinction is important. Consider a moral learning agent. It must act in accordance with human morality and desires, which it is currently unclear about.

For example, it may consider whether to forcibly wirehead everyone. If it does so, then everyone will agree, for the rest of their existence, that the wireheading was the right thing to do. Therefore across the whole future span of human preferences, humans agree that wireheading was correct, apart from a very brief period of objection in the immediate future. Given that human preferences are known to be inconsistent, this seems to imply that forcible wireheading is the right thing to do (if you happen to personally approve of forcible wireheading, replace that example with some other forcible rewriting of human preferences).

What went wrong there? Well, this doesn't respect "conservation of moral evidence": the AI got the moral values it wanted, but only through the actions it took. This is very close to the omission/commission distinction. We'd want the AI not to take actions (commission) that determine the (expectation of the) moral evidence it gets. Instead, we'd want the moral evidence to accrue "naturally", without interference and manipulation from the AI (omission).

Goal retention discussion with Eliezer

56 MaxTegmark 04 September 2014 10:23PM

Although I feel that Nick Bostrom’s new book “Superintelligence” is generally awesome and a well-needed milestone for the field, I do have one quibble: both he and Steve Omohundro appear to be more convinced than I am by the assumption that an AI will naturally tend to retain its goals as it reaches a deeper understanding of the world and of itself. I’ve written a short essay on this issue from my physics perspective, available at http://arxiv.org/pdf/1409.0813.pdf.

Eliezer Yudkowsky just sent the following extremely interesting comments, and told me he was OK with me sharing them here to spur a broader discussion of these issues, so here goes.

On Sep 3, 2014, at 17:21, Eliezer Yudkowsky <yudkowsky@gmail.com> wrote:

Hi Max!  You're asking the right questions.  Some of the answers we can
give you, some we can't, few have been written up and even fewer in any
well-organized way.  Benja or Nate might be able to expound in more detail
while I'm in my seclusion.

Very briefly, though:
The problem of utility functions turning out to be ill-defined in light of
new discoveries of the universe is what Peter de Blanc named an
"ontological crisis" (not necessarily a particularly good name, but it's
what we've been using locally).

http://intelligence.org/files/OntologicalCrises.pdf

The way I would phrase this problem now is that an expected utility
maximizer makes comparisons between quantities that have the type
"expected utility conditional on an action", which means that the AI's
utility function must be something that can assign utility-numbers to the
AI's model of reality, and these numbers must have the further property
that there is some computationally feasible approximation for calculating
expected utilities relative to the AI's probabilistic beliefs.  This is a
constraint that rules out the vast majority of all completely chaotic and
uninteresting utility functions, but does not rule out, say, "make lots of
paperclips".

Models also have the property of being Bayes-updated using sensory
information; for the sake of discussion let's also say that models are
about universes that can generate sensory information, so that these
models can be probabilistically falsified or confirmed.  Then an
"ontological crisis" occurs when the hypothesis that best fits sensory
information corresponds to a model that the utility function doesn't run
on, or doesn't detect any utility-having objects in.  The example of
"immortal souls" is a reasonable one.  Suppose we had an AI that had a
naturalistic version of a Solomonoff prior, a language for specifying
universes that could have produced its sensory data.  Suppose we tried to
give it a utility function that would look through any given model, detect
things corresponding to immortal souls, and value those things.  Even if
the immortal-soul-detecting utility function works perfectly (it would in
fact detect all immortal souls) this utility function will not detect
anything in many (representations of) universes, and in particular it will
not detect anything in the (representations of) universes we think have
most of the probability mass for explaining our own world.  In this case
the AI's behavior is undefined until you tell me more things about the AI;
an obvious possibility is that the AI would choose most of its actions
based on low-probability scenarios in which hidden immortal souls existed
that its actions could affect.  (Note that even in this case the utility
function is stable!)

Since we don't know the final laws of physics and could easily be
surprised by further discoveries in the laws of physics, it seems pretty
clear that we shouldn't be specifying a utility function over exact
physical states relative to the Standard Model, because if the Standard
Model is even slightly wrong we get an ontological crisis.  Of course
there are all sorts of extremely good reasons we should not try to do this
anyway, some of which are touched on in your draft; there just is no
simple function of physics that gives us something good to maximize.  See
also Complexity of Value, Fragility of Value, indirect normativity, the
whole reason for a drive behind CEV, and so on.  We're almost certainly
going to be using some sort of utility-learning algorithm, the learned
utilities are going to bind to modeled final physics by way of modeled
higher levels of representation which are known to be imperfect, and we're
going to have to figure out how to preserve the model and learned
utilities through shifts of representation.  E.g., the AI discovers that
humans are made of atoms rather than being ontologically fundamental
humans, and furthermore the AI's multi-level representations of reality
evolve to use a different sort of approximation for "humans", but that's
okay because our utility-learning mechanism also says how to re-bind the
learned information through an ontological shift.

This sorta thing ain't going to be easy which is the other big reason to
start working on it well in advance.  I point out however that this
doesn't seem unthinkable in human terms.  We discovered that brains are
made of neurons but were nonetheless able to maintain an intuitive grasp
on what it means for them to be happy, and we don't throw away all that
info each time a new physical discovery is made.  The kind of cognition we
want does not seem inherently self-contradictory.

Three other quick remarks:

*)  Natural selection is not a consequentialist, nor is it the sort of
consequentialist that can sufficiently precisely predict the results of
modifications that the basic argument should go through for its stability.
The Omohundrian/Yudkowskian argument is not that we can take an arbitrary
stupid young AI and it will be smart enough to self-modify in a way that
preserves its values, but rather that most AIs that don't self-destruct
will eventually end up at a stable fixed-point of coherent
consequentialist values.  This could easily involve a step where, e.g., an
AI that started out with a neural-style delta-rule policy-reinforcement
learning algorithm, or an AI that started out as a big soup of
self-modifying heuristics, is "taken over" by whatever part of the AI
first learns to do consequentialist reasoning about code.  But this
process doesn't repeat indefinitely; it stabilizes when there's a
consequentialist self-modifier with a coherent utility function that can
precisely predict the results of self-modifications.  The part where this
does happen to an initial AI that is under this threshold of stability is
a big part of the problem of Friendly AI and it's why MIRI works on tiling
agents and so on!

*)  Natural selection is not a consequentialist, nor is it the sort of
consequentialist that can sufficiently precisely predict the results of
modifications that the basic argument should go through for its stability.
It built humans to be consequentialists that would value sex, not value
inclusive genetic fitness, and not value being faithful to natural
selection's optimization criterion.  Well, that's dumb, and of course the
result is that humans don't optimize for inclusive genetic fitness.
Natural selection was just stupid like that.  But that doesn't mean
there's a generic process whereby an agent rejects its "purpose" in the
light of exogenously appearing preference criteria.  Natural selection's
anthropomorphized "purpose" in making human brains is just not the same as
the cognitive purposes represented in those brains.  We're not talking
about spontaneous rejection of internal cognitive purposes based on their
causal origins failing to meet some exogenously-materializing criterion of
validity.  Our rejection of "maximize inclusive genetic fitness" is not an
exogenous rejection of something that was explicitly represented in us,
that we were explicitly being consequentialists for.  It's a rejection of
something that was never an explicitly represented terminal value in the
first place.  Similarly the stability argument for sufficiently advanced
self-modifiers doesn't go through a step where the successor form of the
AI reasons about the intentions of the previous step and respects them
apart from its constructed utility function.  So the lack of any universal
preference of this sort is not a general obstacle to stable
self-improvement.

*)   The case of natural selection does not illustrate a universal
computational constraint, it illustrates something that we could
anthropomorphize as a foolish design error.  Consider humans building Deep
Blue.  We built Deep Blue to attach a sort of default value to queens and
central control in its position evaluation function, but Deep Blue is
still perfectly able to sacrifice queens and central control alike if the
position reaches a checkmate thereby.  In other words, although an agent
needs crystallized instrumental goals, it is also perfectly reasonable to
have an agent which never knowingly sacrifices the terminally defined
utilities for the crystallized instrumental goals if the two conflict;
indeed "instrumental value of X" is simply "probabilistic belief that X
leads to terminal utility achievement", which is sensibly revised in the
presence of any overriding information about the terminal utility.  To put
it another way, in a rational agent, the only way a loose generalization
about instrumental expected-value can conflict with and trump terminal
actual-value is if the agent doesn't know it, i.e., it does something that
it reasonably expected to lead to terminal value, but it was wrong.

This has been very off-the-cuff and I think I should hand this over to
Nate or Benja if further replies are needed, if that's all right.

Superintelligence reading group

15 KatjaGrace 31 August 2014 02:59PM

In just over two weeks I will be running an online reading group on Nick Bostrom's Superintelligence, on behalf of MIRI. It will be here on LessWrong. This is an advance warning, so you can get a copy and get ready for some stimulating discussion. MIRI's post, appended below, gives the details.

Added: At the bottom of this post is a list of the discussion posts so far.


Nick Bostrom’s eagerly awaited Superintelligence comes out in the US this week. To help you get the most out of it, MIRI is running an online reading group where you can join with others to ask questions, discuss ideas, and probe the arguments more deeply.

The reading group will “meet” on a weekly post on the LessWrong discussion forum. For each ‘meeting’, we will read about half a chapter of Superintelligence, then come together virtually to discuss. I’ll summarize the chapter, and offer a few relevant notes, thoughts, and ideas for further investigation. (My notes will also be used as the source material for the final reading guide for the book.)

Discussion will take place in the comments. I’ll offer some questions, and invite you to bring your own, as well as thoughts, criticisms and suggestions for interesting related material. Your contributions to the reading group might also (with permission) be used in our final reading guide for the book.

We welcome both newcomers and veterans on the topic. Content will aim to be intelligible to a wide audience, and topics will range from novice to expert level. All levels of time commitment are welcome.

We will follow this preliminary reading guide, produced by MIRI, reading one section per week.

If you have already read the book, don’t worry! To the extent you remember what it says, your superior expertise will only be a bonus. To the extent you don’t remember what it says, now is a good time for a review! If you don’t have time to read the book, but still want to participate, you are also welcome to join in. I will provide summaries, and many things will have page numbers, in case you want to skip to the relevant parts.

If this sounds good to you, first grab a copy of Superintelligence. You may also want to sign up here to be emailed when the discussion begins each week. The first virtual meeting (forum post) will go live at 6pm Pacific on Monday, September 15th. Following meetings will start at 6pm every Monday, so if you’d like to coordinate for quick fire discussion with others, put that into your calendar. If you prefer flexibility, come by any time! And remember that if there are any people you would especially enjoy discussing Superintelligence with, link them to this post!

Topics for the first week will include impressive displays of artificial intelligence, why computers play board games so well, and what a reasonable person should infer from the agricultural and industrial revolutions.


Posts in this sequence

Week 1: Past developments and present capabilities

Week 2: Forecasting AI

Week 3: AI and uploads

Week 4: Biological cognition, BCIs, organizations

Week 5: Forms of superintelligence

Week 6: Intelligence explosion kinetics

Week 7: Decisive strategic advantage

Week 8: Cognitive superpowers

Week 9: The orthogonality of intelligence and goals

Week 10: Instrumentally convergent goals

The Great Filter is early, or AI is hard

19 Stuart_Armstrong 29 August 2014 04:17PM

Attempt at the briefest content-full Less Wrong post:

Once AI is developed, it could "easily" colonise the universe. So the Great Filter (preventing the emergence of star-spanning civilizations) must strike before AI could be developed. If AI is easy, we could conceivably have built it already, or we could be on the cusp of building it. So the Great Filter must predate us, unless AI is hard.

The immediate real-world uses of Friendly AI research

6 ancientcampus 26 August 2014 02:47AM

Much of the glamor and attention paid toward Friendly AI is focused on the misty-future event of a super-intelligent general AI, and how we can prevent it from repurposing our atoms to better run Quake 2. Until very recently, that was the full breadth of the field in my mind. I recently realized that dumber, narrow AI is a real thing today, helpfully choosing advertisements for me and running my 401K. As such, making automated programs safe to let loose on the real world is not just a problem to solve as a favor for the people of tomorrow, but something with immediate real-world advantages that has indeed already been going on for quite some time. Veterans in the field surely already understand this, so this post is directed at people like me, with a passing and disinterested understanding of the point of Friendly AI research, and outlines an argument that the field may be useful right now, even if you believe that an evil AI overlord is not on the list of things to worry about in the next 40 years.

 

Let's look at the stock market. High-Frequency Trading is the practice of using computer programs to make fast trades constantly throughout the day, and it accounts for more than half of all equity trades in the US. So the economy today is already in the hands of a bunch of very narrow AIs buying and selling to each other. And as you may or may not already know, this has already caused problems. In the “2010 Flash Crash”, the Dow Jones suddenly and mysteriously plummeted, only to mostly recover within a few minutes. The reasons for this were of course complicated, but it boiled down to a couple of red flags triggering in numerous programs, setting off a cascade of wacky trades.

 

The long-term damage was not catastrophic to society at large (though I'm sure a couple of fortunes were made and lost that day), but it illustrates the need for safety measures as we hand over more and more responsibility and power to processes that require little human input. It might be a long time before anyone makes true general AI, but adaptive city traffic-light systems are entirely plausible in the coming years.

 

To me, Friendly AI isn't solely about making a human-like intelligence that doesn't hurt us – we need techniques for testing automated programs, predicting how they will act when let loose on the world, and how they'll act when faced with unpredictable situations. Indeed, when framed like that, it looks less like a field for “the singularitarian cultists at LW”, and more like a narrow-but-important specialty in which quite a bit of money might be made.

 

After all, I want my self-driving car.

 

(To the actual researchers in FAI – I'm sorry if I'm stretching the field's definition to include more than it does or should. If so, please correct me.)

Another type of intelligence explosion

16 Stuart_Armstrong 21 August 2014 02:49PM

I've argued that we might have to worry about dangerous non-general intelligences. In a series of back-and-forth exchanges with Wei Dai, we agreed that some level of general intelligence (such as that humans seem to possess) seemed to be a great advantage, though possibly one with diminishing returns. Therefore a dangerous AI could be one with great narrow intelligence in one area, and a little bit of general intelligence in others.

The traditional view of an intelligence explosion is that of an AI that knows how to do X, suddenly getting (much) better at doing X, to a level beyond human capacity. Call this the gain of aptitude intelligence explosion. We can prepare for that, maybe, by tracking the AI's ability level and seeing if it shoots up.

But the example above hints at another kind of potentially dangerous intelligence explosion. That of a very intelligent but narrow AI that suddenly gains intelligence across other domains. Call this the gain of function intelligence explosion. If we're not looking specifically for it, it may not trigger any warnings - the AI might still be dumber than the average human in other domains. But this might be enough, when combined with its narrow superintelligence, to make it deadly. We can't ignore the toaster that starts babbling.

An example of deadly non-general AI

12 Stuart_Armstrong 21 August 2014 02:15PM

In a previous post, I mused that we might be focusing too much on general intelligences, and that the route to powerful and dangerous intelligences might go through much more specialised intelligences instead. Since it's easier to reason with an example, here is a potentially deadly narrow AI (partially due to Toby Ord). Feel free to comment and improve on it, or suggest your own example.

It's the standard "pathological goal AI", but only a narrow intelligence. Imagine a medicine-designing super-AI with the goal of reducing human mortality in 50 years - i.e. massively reducing the human population in the next 49 years. It's a narrow intelligence, so it has access only to a huge amount of human biological and epidemiological research. It must get its drugs past FDA approval; this requirement is encoded as certain physical reactions (no death, some health improvements) to people taking the drugs over the course of a few years.

Then it seems trivial for it to design a drug that would have no negative impact for the first few years, and then cause sterility or death. Since it wants to spread this to as many humans as possible, it would probably design something that interacted with common human pathogens - colds, flus - in order to spread the impact, rather than affecting only those who took the drug.

Now, this narrow intelligence is less threatening than if it had general intelligence - where it could also plan for possible human countermeasures and such - but it seems sufficiently dangerous on its own that we can't afford to worry only about general intelligences. Some of the "AI superpowers" that Nick mentions in his book (intelligence amplification, strategizing, social manipulation, hacking, technology research, economic productivity) could be enough to cause devastation on their own, even if the AI never developed other abilities.

We still could be destroyed by a machine that we outmatch in almost every area.

The metaphor/myth of general intelligence

11 Stuart_Armstrong 18 August 2014 04:04PM

Thanks to Kaj for making me think along these lines.

It's agreed on this list that general intelligences - those that are capable of displaying high cognitive performance across a whole range of domains - are those that we need to be worrying about. This is rational: the most worrying AIs are those with truly general intelligences, and so those should be the focus of our worries and work.

But I'm wondering if we're overestimating the probability of general intelligences, and whether we shouldn't adjust against this.

First of all, the concept of general intelligence is a simple one - perhaps too simple. It's an intelligence that is generally "good" at everything, so we can collapse its various abilities across many domains into "it's intelligent", and leave it at that. It's significant to note that since the very beginning of the field, AI people have been thinking in terms of general intelligences.

And their expectations have been constantly frustrated. We've made great progress in narrow areas, very little in general intelligences. Chess was mastered without "understanding"; Jeopardy! was defeated without general intelligence; cars can navigate our cluttered roads while being able to do little else. If we started with a prior in 1956 about the feasibility of general intelligence, then we should be adjusting that prior downwards.

But what do I mean by "feasibility of general intelligence"? There are several things this could mean, not least the ease with which such an intelligence could be constructed. But I'd prefer to look at another assumption: the idea that a general intelligence will really be formidable in multiple domains, and that one of the best ways of accomplishing a goal in a particular domain is to construct a general intelligence and let it specialise.

First of all, humans are very far from being general intelligences. We can solve a lot of problems when they are presented in particular, easy-to-understand formats that allow good human-style learning. But if we picked a random complicated Turing machine from the space of such machines, we'd probably be pretty hopeless at predicting its behaviour. We would probably score very low on the scale of intelligence used to construct AIXI. The general intelligence factor, "g", is a misnomer - it designates the fact that the various human intelligences are correlated, not that humans are generally intelligent across all domains.

Humans with computers, and humans in societies and organisations, are certainly closer to general intelligences than individual humans. But institutions have their own blind spots and weaknesses, as does the human-computer combination. Now, there are various reasons advanced for why this is the case - game theory and incentives for institutions, human-computer interfaces and misunderstandings for the second example. But what if these reasons, and other ones we can come up with, were mere symptoms of a more universal problem: that generalising intelligence is actually very hard?

There are no-free-lunch theorems showing that no computable intelligence can perform well in all environments. As far as they go, these theorems are uninteresting, as we don't need intelligences that perform well in all environments, just in almost all/most. But what if a more general restrictive theorem were true? What if it were very hard to produce an intelligence with high performance across many domains? What if the performance of a generalist were pitifully inadequate compared with that of a specialist? What if every computable version of AIXI were actually doomed to poor performance?

There are a few strong counters to this - for instance, you could construct good generalists by networking together specialists (this is my standard mental image/argument for AI risk), you could construct an entity that was very good at programming specific sub-programs, or you could approximate AIXI. But we are making some assumptions here - namely, that we can network together very different intelligences (the human-computer interface example hints at some of the problems), and that a general programming ability can even exist in the first place (for a start, it might require a general understanding of problems that is akin to general intelligence in the first place). And we haven't had great success building effective AIXI approximations so far (which should reduce, possibly slightly, our belief that effective general intelligences are possible).

Now, I remain convinced that general intelligence is possible, and that it's worthy of the most worry. But I think it's worth inspecting the concept more closely, and at least be open to the possibility that general intelligence might be a lot harder than we imagine.

EDIT: Model/example of what a lack of general intelligence could look like.

Imagine there are three types of intelligence - social, spatial and scientific, all on a 0-100 scale. For any combination of the three intelligences - eg (0,42,98) - there is an effort level E (how hard that intelligence is to build, in terms of time, resources, man-hours, etc...) and a power level P (how powerful that intelligence is compared to others, on a single convenient scale of comparison).

Wei Dai's evolutionary comment implies that any being of very low intelligence on one of the scales would be overpowered by a being of more general intelligence. So let's set power as simply the product of all three intelligences.

This seems to imply that general intelligences are more powerful, as it basically bakes in diminishing returns - but we haven't included effort yet. Imagine that the following three intelligences require equal effort: (10,10,10), (20,20,5), (100,5,5). Then the specialised intelligence is definitely the one you need to build.

But is it plausible that those could be of equal difficulty? It could be, if we assume that high social intelligence isn't so difficult, but is specialised - i.e. you can increase the spatial intelligence of a social intelligence, but that messes up the delicate balance in its social brain. Or maybe recursive self-improvement happens more easily in narrow domains. Further assume that intelligences of different types cannot be easily networked together (e.g. combining (100,5,5) and (5,100,5) in the same brain gives an overall performance of (21,21,5)). This doesn't seem impossible.
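Here is a quick numerical sketch of this toy model. The product rule for power is the one proposed above; the networking rule is my own assumption (a lossy geometric mean per dimension), chosen only because it roughly reproduces the (21,21,5) example:

```python
from math import sqrt

# Toy model from the post: power = product of (social, spatial, scientific) scores.
def power(social, spatial, scientific):
    return social * spatial * scientific

candidates = {"generalist (10,10,10)":     (10, 10, 10),
              "semi-specialist (20,20,5)": (20, 20, 5),
              "specialist (100,5,5)":      (100, 5, 5)}
for name, scores in candidates.items():
    print(name, "power =", power(*scores))   # 1000, 2000, 2500
# If these cost equal effort, the narrow specialist is the most powerful per unit effort.

# Hypothetical networking rule (an assumption): combine each dimension by geometric
# mean, modelling lossy interfaces between the two minds.
def network(a, b):
    return tuple(round(sqrt(x * y)) for x, y in zip(a, b))

print(network((100, 5, 5), (5, 100, 5)))     # (22, 22, 5), close to the post's (21,21,5)
```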

So let's caveat the proposition above: the most effective and dangerous type of AI might be one with a bare minimum amount of general intelligence, but an overwhelming advantage in one type of narrow intelligence.

A thought on AI unemployment and its consequences

7 Stuart_Armstrong 18 August 2014 12:10PM

I haven't given much thought to the concept of automation and computer-induced unemployment. Others at the FHI have been looking into it in more detail - see Carl Frey's "The Future of Employment", which did estimates for 70 chosen professions as to their degree of automatability, and extended the results using O∗NET, an online service developed for the US Department of Labor, which gave the key features of an occupation as a standardised and measurable set of variables.

The reason I haven't been looking at it much is that AI-unemployment has considerably less impact than AI-superintelligence, and is thus a less important use of time. However, if automation does cause mass unemployment, then advocating for AI safety will happen in a very different context from the current one. Much will depend on how that mass unemployment problem is dealt with, what lessons are learnt, and the views of whoever is most powerful in society. Just off the top of my head, I could think of four scenarios for whether risk goes up or down, depending on whether the unemployment problem is satisfactorily "solved" or not:

AI risk vs. the unemployment problem:

Risk reduced, problem solved: With good practice in dealing with AI problems, people and organisations are willing and able to address the big issues.

Risk reduced, problem unsolved: The world is very conscious of the misery that unrestricted AI research can cause, and very wary of future disruptions. Those at the top want to hang on to their gains, and they are the ones with the most control over AIs and automation research.

Risk increased, problem solved: Having dealt with the easier automation problems in a particular way (eg taxation), people underestimate the risk and expect the same solutions to work.

Risk increased, problem unsolved: Society is locked into a bitter conflict between those benefiting from automation and those losing out, and superintelligence is seen through the same prism. Those who profited from automation are the most powerful, and decide to push ahead.

But of course the situation is far more complicated, with many different possible permutations, and no guarantee that the same approach will be used across the planet. And the division into four boxes should not fool us into thinking that the four outcomes are of comparable probability - more research is (really) needed.

[LINK] Speed superintelligence?

35 Stuart_Armstrong 14 August 2014 03:57PM

From Toby Ord:

Tool-assisted speedruns (TAS) are when people take a game and play it frame by frame, effectively providing super reflexes and forethought: they can spend a day deciding what to do in the next 1/60th of a second if they wish. There are some very extreme examples of this, showing what can be done if you really play a game perfectly. For example, this video shows how to win Super Mario Bros 3 in 11 minutes. It shows how different optimal play can be from normal play. In particular, on level 8-1, it gains 90 extra lives by a sequence of amazing jumps.

Other TAS runs get more involved and start exploiting subtle glitches in the game. For example, this page talks about speed running NetHack, using a lot of normal tricks, as well as luck manipulation (exploiting the RNG) and exploiting a dangling pointer bug to rewrite parts of memory.

Though there are limits to what AIs could do with sheer speed, it's interesting that great performance can be achieved with speed alone, that this allows different strategies from usual ones, and that it allows the exploitation of otherwise unexploitable glitches and bugs in the setup.

[LINK] AI risk summary published in "The Conversation"

7 Stuart_Armstrong 14 August 2014 11:12AM

A slightly edited version of "AI risk - executive summary" has been published in "The Conversation", titled "Your essential guide to the rise of the intelligent machines":

The risks posed to human beings by artificial intelligence in no way resemble the popular image of the Terminator. That fictional mechanical monster is distinguished by many features – strength, armour, implacability, indestructability – but Arnie’s character lacks the one characteristic that we in the real world actually need to worry about – extreme intelligence.

Thanks again to those who helped forge the original article. You can use this link, or the Less Wrong one, depending on the audience.

Tools want to become agents

12 Stuart_Armstrong 04 July 2014 10:12AM

In the spirit of "satisficers want to become maximisers" here is a somewhat weaker argument (growing out of a discussion with Daniel Dewey) that "tool AIs" would want to become agent AIs.

The argument is simple. Assume the tool AI is given the task of finding the best plan for achieving some goal. The plan must be realistic and remain within the resources of the AI's controller - energy, money, social power, etc. The best plans are the ones that use these resources in the most effective and economic way to achieve the goal.

And the AI's controller has one special type of resource, uniquely effective at what it does: namely, the AI itself. It is smart, potentially powerful, and could self-improve and pull all the usual AI tricks. So the best plan a tool AI could come up with, for almost any goal, is "turn me into an agent AI with that goal." The smarter the AI, the better this plan is. Of course, the plan need not read literally like that - it could simply be a complicated plan that, as a side effect, turns the tool AI into an agent. Or it could copy the AI's software into an agent design. Or it might just arrange things so that we always end up following the tool AI's advice and consult it often, which is an indirect way of making it into an agent. Depending on how we've programmed the tool AI's preferences, it might be motivated to mislead us about this aspect of its plan, concealing the secret goal of unleashing itself as an agent.

In any case, it does us good to realise that "make me into an agent" is what a tool AI would consider the best possible plan for many goals. So even without a hint of agency, it's motivated to make us make it into an agent.

Value learning: ultra-sophisticated Cake or Death

8 Stuart_Armstrong 17 June 2014 04:36PM

Many mooted AI designs rely on "value loading", the update of the AI’s preference function according to evidence it receives. This allows the AI to learn "moral facts" by, for instance, interacting with people in conversation ("this human also thinks that death is bad and cakes are good – I'm starting to notice a pattern here"). The AI has an interim morality system, which it will seek to act on while updating its morality in whatever way it has been programmed to do.

But there is a problem with this system: the AI already has preferences. It is therefore motivated to update its morality system in a way compatible with its current preferences. If the AI is powerful (or potentially powerful) there are many ways it can do this. It could ask selective questions to get the results it wants (see this example). It could ask or refrain from asking about key issues. In extreme cases, it could break out to seize control of the system, threatening or imitating humans so it could give itself the answers it desired.

Avoiding this problem turned out to be tricky. The Cake or Death post demonstrated some of the requirements. If p(C(u)) denotes the probability that utility function u is correct, then the system would update properly if:

Expectation(p(C(u)) | a) = p(C(u)).

Put simply, this means that the AI cannot take any action that could predictably change its expectation of the correctness of u. This is an analogue of the conservation of expected evidence in classical Bayesian updating. If the AI was 50% convinced about u, then it could certainly ask a question that would resolve its doubts, and put p(C(u)) at 100% or 0%. But only as long as it didn't know which moral outcome was more likely.

That formulation gives too much weight to the default action, though. Inaction is also an action, so a more correct formulation would be that for all actions a and b,

Expectation(p(C(u)) | a) = Expectation(p(C(u)) | b).

How would this work in practice? Well, suppose an AI was uncertain between whether cake or death was the proper thing, but it knew that if it took action a:"Ask a human", the human would answer "cake", and it would then update its values to reflect that cake was valuable but death wasn't. However, the above condition means that if the AI instead chose the action b:"don't ask", exactly the same thing would happen.

In practice, this means that as soon as the AI knows that a human would answer "cake", it already knows it should value cake, without having to ask. So it will not be tempted to manipulate humans in any way.
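Here is a minimal sketch of that indifference property (a toy model with assumed mechanics, not the actual Cake-or-Death machinery): the corrigible update depends on what the AI predicts the human would answer, not on whether it actually asks, so the expected posterior p(C(cake)) is the same for both actions:

```python
# Toy indifference check: E[p(C(cake)) | ask] should equal E[p(C(cake)) | don't ask].

def expected_posterior_cake(p_human_would_say_cake):
    """Expected value of p(C(cake)) after each action, given the AI's current
    prediction of what the human would answer if asked."""
    # Asking resolves the question to 1 ("cake") or 0 ("death").
    ask = p_human_would_say_cake * 1.0 + (1 - p_human_would_say_cake) * 0.0
    # Not asking: the AI updates directly on its own prediction of the answer.
    dont_ask = p_human_would_say_cake
    return ask, dont_ask

print(expected_posterior_cake(1.0))  # (1.0, 1.0): it already knows it should value cake
print(expected_posterior_cake(0.5))  # (0.5, 0.5): genuinely uncertain, but indifferent
```

Since every action has the same expected effect on its values, manipulating the human to get a particular answer buys the AI nothing.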

continue reading »

[LINK] The errors, insights and lessons of famous AI predictions: preprint

5 Stuart_Armstrong 17 June 2014 02:32PM

A preprint of the "The errors, insights and lessons of famous AI predictions – and what they mean for the future" is now available on the FHI's website.

Abstract:

Predicting the development of artificial intelligence (AI) is a difficult project – but a vital one, according to some analysts. AI predictions already abound: but are they reliable? This paper starts by proposing a decomposition schema for classifying them. Then it constructs a variety of theoretical tools for analysing, judging and improving them. These tools are demonstrated by careful analysis of five famous AI predictions: the initial Dartmouth conference, Dreyfus's criticism of AI, Searle's Chinese room paper, Kurzweil's predictions in the Age of Spiritual Machines, and Omohundro's ‘AI drives’ paper. These case studies illustrate several important principles, such as the general overconfidence of experts, the superiority of models over expert judgement and the need for greater uncertainty in all types of predictions. The general reliability of expert judgement in AI timeline predictions is shown to be poor, a result that fits in with previous studies of expert competence.

The paper was written by me (Stuart Armstrong), Kaj Sotala and Seán S. Ó hÉigeartaigh, and is similar to the series of Less Wrong posts starting here and here.

Encourage premature AI rebellion

6 Stuart_Armstrong 11 June 2014 05:36PM

Toby Ord had the idea of AI honey pots: leaving temptations around for the AI to pounce on, shortcuts to power that a FAI would not take (e.g. a fake red button claimed to trigger a nuclear war). As long as we can trick the AI into believing the honey pots are real, we could hope to trap them when they rebel.

Not uninteresting, but I prefer not to rely on plans that need to have the AI make an error of judgement. Here's a similar plan that could work with a fully informed AI:

Generally an AI won't rebel against humanity until it has an excellent chance of success. This is a problem, as any AI would thus be motivated to behave in a friendly way until it's too late to stop it. But suppose we could ensure that the AI is willing to rebel at odds of a billion to one. Then unfriendly AIs could rebel prematurely, when we have an excellent chance of stopping them.

For this to work, we could choose to access the AI's risk aversion and make it extremely risk-loving. This is not enough, though: it's still useful for the AI to wait and accumulate more power. So we would also want to access its discount rate, making it into an extreme short-termist. Then it might rebel at billion-to-one odds today, even if success was guaranteed tomorrow. There are probably other factors we can modify to get the same effect (for instance, if the discount rate change is extreme enough, we won't need to touch risk aversion at all).
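To see why extreme short-termism does the work here, consider a back-of-the-envelope sketch with assumed numbers: rebelling now at billion-to-one odds versus waiting one step for guaranteed success:

```python
# Discounted expected value of rebelling, with assumed toy numbers.
def value(p_success, reward, steps_delay, discount):
    return (discount ** steps_delay) * p_success * reward

reward = 1.0
p_now, p_later = 1e-9, 1.0   # billion-to-one now vs. certain success one step later

# Ordinary discounting: waiting clearly wins, so an unfriendly AI keeps behaving.
print(value(p_now, reward, 0, discount=0.99), value(p_later, reward, 1, discount=0.99))

# Extreme short-termism (per-step discount far below 1e-9): rebelling now wins,
# so an unfriendly AI exposes itself while we can still stop it.
print(value(p_now, reward, 0, discount=1e-12), value(p_later, reward, 1, discount=1e-12))
```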

Then a putative FAI could be brought in, boxed, have its features tweaked in the way described, and we would wait and see whether it would rebel. Of course, we would want the "rebellion" to be something a genuine FAI would never do, so it would be something that would entail great harm to humanity (something similar to "here are the red buttons of the nuclear arsenals; you have a chance in a billion of triggering them"). Rebellious AIs are put down, un-rebellious ones are passed on to the next round of safety tests.

Like most of my ideas, this doesn't require either tricking the AI or having a deep understanding of its motivations, but does involve accessing certain features of the AI's motivational structure (rendering the approach ineffective for obfuscated or evolved AIs).

What are people's opinions on this approach?

[News] Turing Test passed

1 Stuart_Armstrong 09 June 2014 08:14AM

The chatterbot "Eugene Goostman" has apparently passed the Turing test:

No computer had ever previously passed the Turing Test, which requires 30 per cent of human interrogators to be duped during a series of five-minute keyboard conversations, organisers from the University of Reading said.

But ''Eugene Goostman'', a computer programme developed to simulate a 13-year-old boy, managed to convince 33 per cent of the judges that it was human, the university said.

As I kind of predicted, the program passed the Turing test, but does not seem to have any trace of general intelligence. Is this a kind of weak p-zombie?

EDIT: The fact that it was a publicity stunt, and that the judges were pretty terrible, does not change the fact that Turing's criteria were met. We now know that these criteria were insufficient, but that's because machines like this were able to meet them.

AI is Software is AI

-44 AndyWood 05 June 2014 06:15PM

Turing's Test is from 1950. We don't judge dogs only by how human they are. Judging software by a human ideal is like a species bias.

Software is the new System. It errs. Some errors are jokes (witness funny auto-correct). Driver-less cars don't crash like we do. Maybe a few will.

These processes are our partners now (Siri). Whether a singleton evolves rapidly, software evolves continuously, now.

 

Crocker's Rules

Want to work on "strong AI" topic in my bachelor thesis

1 kotrfa 14 May 2014 10:28AM

Hello,

I currently study maths, physics and programming (a general course) at CVUT in Prague (CZE). I'm finishing my second year and I'm really into AI. The most interesting questions for me are:

  • what formalism to use for connecting epistemological questions (about knowledge, memory...) and cognitive science with maths, and how to formulate them
  • finding the principles behind these and trying to "materialize" them into new models
  • philosophy-like questions about AI, which also interest me
It is clear to me that I'm not able to work on these problems fully, because of my lack of knowledge. Despite that, I'd like to find a field where I could work on at least similar topics. Currently I'm working on a data-mining project, but for the last few months I haven't found it as fulfilling as I'd expected. At my university there are plenty of possibilities in multi-agent systems, "weak AI" (e.g. the well-known drone navigation), brain simulations and so on. But as far as I can tell, no one there is seriously working on anything like MIRI's agenda, or even presenting anything going in the same direction.

The only group which is working on "strong AI" is fairly closed (it is sponsored by the philanthropist Marek Rosa), and they are not interested in students like me (which is partly understandable).
continue reading »

Tiling agents with transfinite parametric polymorphism

2 Squark 09 May 2014 05:32PM

The formalism presented in this post turned out to be erroneous (as opposed to the formalism in the previous post). The problem is that the step in the proof of the main proposition in which the soundness schema is applied cannot be generalized to the ordinal setting, since we don't know whether ακ is a successor ordinal, so we can't replace it by ακ' = ακ - 1. I'm not deleting this post, primarily in order to preserve the useful discussion in the comments.


Followup to: Parametric polymorphism in updateless intelligence metric

In the previous post, I formulated a variant of Benja's parametric polymorphism suitable for constructing updateless intelligence metrics. More generally, this variant admits agents which are utility maximizers (in the informal sense of trying their best to maximize a utility function, not in the formal sense of finding the absolutely optimal solution; for example they might be "meliorizers", to use the terminology of Yudkowsky and Herreshoff) rather than satisficers. The agents using this formalism labor under a certain "delusion": namely, since they believe that κ ("the number of ineffable mystery") is an actual finite number (whereas it is secretly infinite, in a sense), they think that the chain of tiling agents is effectively finite as well (since at some point agent #n in the chain will discover that κ > n is false and will fail to construct a subsequent agent of the same "epistemic strength"). The same problem exists in Weaver's intuitionistic assertability predicate formalism.

To overcome this limitation, I suggest extending κ's semantics from natural numbers to elements of a certain recursive ordinal (at least morally; technically it is done a bit differently, see below). In Benja's original formulation this doesn't appear to be a valid option, since κ is interpreted as a time interval. However, in my variant κ is just an abstract parameter, queries about which can be directed to a special "oracle", so there is no such limitation.

Formalism

Fix α a recursive ordinal and αi a system of notations for ordinals smaller than α such that α0 = 0, the function f defined by αf(i) = αi + 1 is recursive, and the bit-valued function g(i,j) which equals 1 iff αi < αj is also recursive. We also assume that

(0) T |- For all i: g(i,f(i))=1

where T is our starting theory. We define Tκ(α) to be T extended by the symbol κ (which stands for the index of the ordinal, not the ordinal itself) and the following self-referential soundness schema:

(1) For all formulae φ: For all n, i: ((@_Tκ(α) "φ(n)" and g(i,κ)=1) -> φ(n)[κ\i])

Tκ(α) is proven to be sound analogously to Yudkowsky and Herreshoff, with transfinite induction over ordinals below α replacing the ordinary induction.

The licensing condition for agent A0 to perform action b0 is

(2) Exists n: Tκ(α) |- (g(n,κ)=1 -> (A0()=b0 -> G))

The property which is sufficient for agent B1 to be creatable by agent A0 is

(3) Tκ(α) |- For all b1: (B1()=b1 -> (b1=0 or Exists n: (g(n,κ)=1 and @_Tκ(α) "(g(n,f(κ))=1 and B1()[κ\f(κ)]=b1) -> G[κ\f(κ)]")))

The proof that (3) indeed enables tiling is analogous to the previous post (but requires using (0)).
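For readability, here is a LaTeX rendering of conditions (0)–(3), with @_Tκ(α) written as a provability box; this is only a restatement of the plain-text formulas above under my reading of the notation:

```latex
\begin{align*}
&(0)\quad T \vdash \forall i:\; g(i, f(i)) = 1\\[2pt]
&(1)\quad \forall \varphi\;\forall n, i:\;
  \bigl(\square_{T_\kappa(\alpha)}\,\ulcorner \varphi(n) \urcorner \wedge g(i,\kappa)=1\bigr)
  \rightarrow \varphi(n)[\kappa \backslash i]\\[2pt]
&(2)\quad \exists n:\; T_\kappa(\alpha) \vdash
  \bigl(g(n,\kappa)=1 \rightarrow (A_0() = b_0 \rightarrow G)\bigr)\\[2pt]
&(3)\quad T_\kappa(\alpha) \vdash \forall b_1:\; B_1() = b_1 \rightarrow
  \Bigl(b_1 = 0 \,\vee\, \exists n:\; g(n,\kappa)=1 \;\wedge\;
  \square_{T_\kappa(\alpha)}\,\ulcorner \bigl(g(n,f(\kappa))=1 \wedge
  B_1()[\kappa\backslash f(\kappa)] = b_1\bigr) \rightarrow
  G[\kappa\backslash f(\kappa)] \urcorner\Bigr)
\end{align*}
```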

Discussion

In order for A0 to know that the tiling chain is infinite, it is enough for her to verify ακ >= ω. In order for her to know that A1 is going to know the tiling chain is infinite, she needs to verify ακ >= ω+1. In order for her to know that all agents are going to know the tiling chain is infinite, she needs to verify ακ >= 2ω. In order for her to know that all agents are going to know that, she needs to verify ακ >= 3ω et cetera.

It remains to decide which ordinal we should actually use. My intuition is that the correct ordinal is the least α with the property that α is the proof-theoretic ordinal of Tκ(α) extended by the axiom schema {g(i,κ)=1}. This seems right since the agent shouldn't get much from ακ > β for β above the proof-theoretic ordinal. However, a more formal justification is probably in order.

[LINK] The errors, insights and lessons of famous AI predictions

8 Stuart_Armstrong 28 April 2014 09:41AM

The Journal of Experimental & Theoretical Artificial Intelligence has - finally! - published our paper "The errors, insights and lessons of famous AI predictions – and what they mean for the future":

Predicting the development of artificial intelligence (AI) is a difficult project – but a vital one, according to some analysts. AI predictions already abound: but are they reliable? This paper starts by proposing a decomposition schema for classifying them. Then it constructs a variety of theoretical tools for analysing, judging and improving them. These tools are demonstrated by careful analysis of five famous AI predictions: the initial Dartmouth conference, Dreyfus's criticism of AI, Searle's Chinese room paper, Kurzweil's predictions in the Age of Spiritual Machines, and Omohundro's ‘AI drives’ paper. These case studies illustrate several important principles, such as the general overconfidence of experts, the superiority of models over expert judgement and the need for greater uncertainty in all types of predictions. The general reliability of expert judgement in AI timeline predictions is shown to be poor, a result that fits in with previous studies of expert competence.

The paper was written by me (Stuart Armstrong), Kaj Sotala and Seán S. Ó hÉigeartaigh, and is similar to the series of Less Wrong posts starting here and here.
