
[Link] Will Superintelligent Machines Destroy Humanity?

1 roystgnr 27 November 2014 09:48PM

A summary and review of Bostrom's Superintelligence is in the December issue of Reason magazine, and is now posted online at Reason.com.

A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more)

3 the-citizen 19 October 2014 07:59AM

Friendly AI is an idea that I find to be an admirable goal. While I'm not yet sure an intelligence explosion is likely, or whether FAI is possible, I've found myself often thinking about it, and I'd like my first post to share a few of those thoughts on FAI with you.

Safe AGI vs Friendly AGI
-Let's assume an Intelligence Explosion is possible for now, and that an AGI with the ability to improve itself somehow is enough to achieve it.
-Let's define a safe AGI as an above-human general AI that does not threaten humanity or terran life (e.g. FAI, Tool AGI, possibly Oracle AGI)
-Let's define a Friendly AGI as one that *ensures* the continuation of humanity and terran life.
-Let's say an unsafe AGI is all other AGIs.
-Safe AGIs must suppress unsafe AGIs in order to be considered Friendly. Here's why:

-If we can build a safe AGI, we probably have the technology to build an unsafe AGI too.
-An unsafe AGI is likely to be built at that point because:
-It's very difficult to conceive of a way that humans alone could permanently stop everyone from developing an unsafe AGI once the steps are known**
-Some people will find the safe AGI's goals unacceptable
-Some people will rationalise, or simply be mistaken in believing, that their AGI design is safe when it is not
-Some people will not care if their AGI design is safe, because they do not care about other people, or because they hold some extreme beliefs
-Most imaginable unsafe AGIs would outcompete safe AGIs, because they would not necessarily be "hamstrung" by complex goals such as protecting us meatbags from destruction. Tool or Oracle AGIs would obviously not stand a chance due to their restrictions.
-Therefore, if a safe AGI does not prevent unsafe AGIs from coming into existence, humanity will very likely be destroyed.

-The AGI most likely to prevent unsafe AGIs from being created is one that actively predicts their development and terminates it before or on completion.
-So to summarise

-An AGI is very likely only a Friendly AI if it actively suppresses unsafe AGI.
-Oracle and Tool AGIs are not Friendly AIs but merely safe AIs, because they don't suppress anything.
-Oracle and Tool AGIs are a bad plan for AI if we want to prevent the destruction of humanity, because hostile AGIs will surely follow.

(**On reflection I cannot be certain of this specific point, but I assume it would take a fairly restrictive regime for this to be wrong. Further comments on this very welcome.)

Other minds problem - Why we should be philosophically careful when attempting to theorise about FAI

I read quite a few comments in AI discussions that I'd probably characterise as "the best utility function for a FAI is one that values all consciousness". I'm quite concerned that this persists as a deeply held and largely unchallenged assumption amongst some FAI supporters. In general I find consciousness to be an extremely contentious, vague and inconsistently defined concept, but here I want to talk about some specific philosophical failures.

My first concern is that while many AI theorists like to say that consciousness is a physical phenomenon, which seems to imply Monist/Physicalist views, they at the same time don't seem to recognise that consciousness is a Dualist concept, coherent only in a Dualist framework. A Dualist believes there is a thing called a "subject" (very crudely, this equates with the mind) and then things called objects (the outside "empirical" world interpreted by that mind). Most of this reasoning begins with Descartes' cogito ergo sum or similar starting points ( https://en.wikipedia.org/wiki/Cartesian_dualism ). Subjective experience, qualia and consciousness make sense if you accept that framework. But if you're a Monist, this arbitrary distinction between a subject and object is generally something you don't accept. In the case of a Physicalist, there's just matter doing stuff. A proper Physicalist doesn't believe in "consciousness" or "subjective experience"; there are just brains and the physical human behaviours that occur as a result. Your life exists from a certain point of view, I hear you say? The Physicalist replies, "well, a bunch of matter arranged to process information would say and think that, wouldn't it?"

I don't really want to get into whether Dualism or Monism is correct/true, but I want to point out that even if you try to avoid this by deciding Dualism is right and consciousness is a thing, there's yet another, more dangerous problem. The core of the problem is that logically or empirically establishing the existence of minds other than your own is extremely difficult (impossible according to many). Other people could just be physical things walking around acting similar to you, but by virtue of something purely mechanical - without actual minds. In philosophy this is called the "other minds problem" ( https://en.wikipedia.org/wiki/Problem_of_other_minds or http://plato.stanford.edu/entries/other-minds/ ). I recommend a proper read of it if the idea seems crazy to you. It's a problem that's been around for centuries, and to date we don't really have any convincing solution (there are some attempts, but they are highly contentious and IMHO also highly problematic). I won't get into it more than that for now; suffice to say that not many people accept that there is a logical/empirical solution to this problem.

Now extrapolate that to an AGI, and the design of its "safe" utility functions. If your AGI is designed as a Dualist (which is necessary if you wish to incorporate "consciousness", "experience" or the like into your design), then you build in a huge risk that the AGI will decide that other minds are unprovable or do not exist. In this case your friendly utility function designed to protect "conscious beings" fails, and the AGI wipes out humanity because we pose a non-zero threat to the only consciousness it can confirm - its own. For this reason I feel "consciousness", "awareness" and "experience" should be left out of FAI utility functions and designs, regardless of the truth of Monism/Dualism, in favour of more straightforward definitions of organisms, intelligence, observable emotions and intentions. (I personally favour conceptualising any AGI as a sort of extension of biological humanity, but that's a discussion for another day.) My greatest concern is that there is such strong cultural attachment to the concept of consciousness that researchers will be unwilling to properly question the concept at all.

What if we're not alone?

It seems a little unusual to throw alien life into the mix at this point, but I think it's justified, because an intelligence explosion really puts an interstellar existence well within our civilisation's grasp. And because an intelligence explosion implies a very high rate of change, it makes sense to start considering even the long-term implications early, particularly when the consequences are very serious, as I believe they may be here.

Let's say we successfully achieved a FAI. In order to fulfil its mission of protecting humanity and the biosphere, it begins expanding, colonising and terraforming other planets for potential habitation by Earth-originating life. I would expect this expansion wouldn't really have a limit, because the more numerous the colonies, the less likely it is we could be wiped out by some interstellar disaster.

Of course, we can't really rule out the possibility that we're not alone in the universe, or even the galaxy. If we make it as far as AGI, then it's possible another alien civilisation might reach a very high level of technological advancement too. Or there might be many. If our FAI is friendly to us but basically treats them as paperclip fodder, then potentially that's a big problem. Why? Well:

-Firstly, while a species' first loyalty is to itself, we should consider that it might be morally undesirable to wipe out alien civilisations, particularly as they might be in some distant way "related" (see panspermia) to our own biosphere.
-Secondly, there are conceivable scenarios where alien civilisations might respond by destroying our FAI/Earth/the biosphere/humanity. The reason is fairly obvious when you think about it: an expansionist AGI could reasonably be viewed as an attack, or possibly an act of war.

Let's go into a tiny bit more detail. Given that we've not been destroyed by any alien AGI just yet, I can think of a number of possible interstellar scenarios:

(1) There is no other advanced life
(2) There is advanced life, but it is inherently non-expansive (it expands inwards, or refuses to develop dangerous AGI)
(3) There is advanced life, but they have not discovered AGI yet. There could potentially be a race-to-the-finish (FAI) scenario underway.
(4) There are already expanding AGIs, but due to physical limits on the expansion rate, we are not aware of them yet. (This could use further analysis.)
Alternatively, one civilisation, or an allied group of civilisations, has developed FAIs and become dominant in the galaxy. They could be:

(5) Whack-a-mole civilisations that destroy all potential competitors as soon as they are identified
(6) Dominators that tolerate civilisations so long as they remain primitive and non-threatening by comparison.
(7) Some sort of interstellar community that allows safe civilisations to join (this community still needs to stomp on dangerous potential rival AGIs)

In the case of (6) or (7), developing a FAI that isn't equipped to deal with alien life will probably result in us being liquidated, or at least partially sanitised in some way. In (1) (2) or (5), it probably doesn't matter what we do in this regard, though in (2) we should consider being nice. In (3) and probably (4) we're going to need a FAI capable of expanding very quickly and disarming potential AGIs (or at least ensuring they are FAIs from our perspective).

The upshot of all this is that we probably want to design safety features into our FAI so that it doesn't destroy alien civilisations/life unless they're a significant threat to us. I think the understandable reaction is something along the lines of "create an FAI that values all types of life" or "intelligent life". I don't exactly disagree, but I think we must be cautious in how we formulate this too.

Say there are many different civilisations in the galaxy. What sort of criteria would ensure that, given some sort of zero-sum scenario, Earth life wouldn't be destroyed? Let's say there was some tiny but non-zero probability that humanity could evade the FAI's efforts to prevent further AGI development. Or perhaps there was some loophole in the types of AGIs that humans were allowed to develop. Wouldn't it be sensible, in this scenario, for a universalist FAI to wipe out humanity to protect the countless other civilisations? Perhaps that is acceptable? Or perhaps not? Or, less drastically, how does the FAI police warfare or other competition between civilisations? A slight change in the way life is quantified and valued could drastically change the outcome for humanity. I'd probably suggest we want to weight the FAI's values to start with human and Earth-biosphere primacy, but then still give some non-zero weighting to other civilisations (see the toy sketch below). There is probably more thought to be done in this area too.
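As a toy illustration of that weighting (my own sketch; the function name, weights and normalisation are illustrative assumptions, not a worked-out proposal):

```python
# Toy sketch only: a weighted utility with Earth-biosphere primacy plus a
# non-zero alien term. Weights are illustrative; welfare terms are assumed
# normalised to [0, 1].

def fai_utility(u_earth, u_alien, w_earth=0.9, w_alien=0.1):
    """Combine Earth-originating and alien welfare into a single score."""
    return w_earth * u_earth + w_alien * u_alien

# Because w_alien > 0, alien civilisations are never mere paperclip fodder;
# because w_earth >> w_alien and the terms are bounded, no gain for countless
# other civilisations can outweigh the loss of humanity in this toy model.
print(fai_utility(u_earth=1.0, u_alien=0.2))   # 0.92
print(fai_utility(u_earth=0.0, u_alien=1.0))   # 0.10
```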

Simulation

I want to also briefly note one conceivable way we might safely test Friendly AI designs: simulate worlds/universes of less complexity than our own, make it likely that their inhabitants invent an AGI or FAI, and then closely study the results of these simulations. Then we could study failed FAI attempts with much greater safety. It also occurred to me that if we consider the possibility of our universe being a simulated one, then this is a conceivable scenario under which our simulation might be created. After all, if you're going to simulate something, why not something vital like modelling existential risks? I'm not yet sure of the implications exactly. Maybe we need to consider how it relates to our universe's continued existence, or perhaps it's just another case of Pascal's Mugging. Anyway, I thought I'd mention it and see what people say.

A playground for FAI theories

I want to lastly mention this link ( https://www.reddit.com/r/LessWrongLounge/comments/2f3y53/the_ai_game/ ). Basically it's a challenge for people to briefly describe an FAI goal-set, and for others to respond by telling them how that will all go horribly wrong. I want to suggest this is a very worthwhile discussion - not because its content will include rigorous theories that are directly translatable into utility functions, because very clearly it won't, but because a well-developed thread of this kind would be a mixing pot of ideas and a good introduction to commonly made mistakes in thinking about FAI. We should encourage a slightly more serious version of this.

Thanks

FAI and AGI are very interesting topics. I don't consider myself able to really discern whether such things will occur, but it's a potentially vital topic. I'm looking forward to a bit of feedback on my first LW post. Thanks for reading!

Another type of intelligence explosion

16 Stuart_Armstrong 21 August 2014 02:49PM

I've argued that we might have to worry about dangerous non-general intelligences. In a series of back-and-forth exchanges with Wei Dai, we agreed that some level of general intelligence (such as that humans seem to possess) seemed to be a great advantage, though possibly one with diminishing returns. Therefore a dangerous AI could be one with great narrow intelligence in one area, and a little bit of general intelligence in others.

The traditional view of an intelligence explosion is that of an AI that knows how to do X, suddenly getting (much) better at doing X, to a level beyond human capacity. Call this the gain of aptitude intelligence explosion. We can prepare for that, maybe, by tracking the AI's ability level and seeing if it shoots up.

But the example above hints at another kind of potentially dangerous intelligence explosion. That of a very intelligent but narrow AI that suddenly gains intelligence across other domains. Call this the gain of function intelligence explosion. If we're not looking specifically for it, it may not trigger any warnings - the AI might still be dumber than the average human in other domains. But this might be enough, when combined with its narrow superintelligence, to make it deadly. We can't ignore the toaster that starts babbling.

An example of deadly non-general AI

12 Stuart_Armstrong 21 August 2014 02:15PM

In a previous post, I mused that we might be focusing too much on general intelligences, and that the route to powerful and dangerous intelligences might go through much more specialised intelligences instead. Since it's easier to reason with an example, here is a potentially deadly narrow AI (partially due to Toby Ord). Feel free to comment and improve on it, or suggest your own example.

It's the standard "pathological goal AI", but only a narrow intelligence. Imagine a medicine-designing super-AI with the goal of reducing human mortality in 50 years - i.e. massively reducing the human population in the next 49 years. It's a narrow intelligence, so it has access only to a huge amount of human biological and epidemiological research. It must get its drugs past FDA approval; this requirement is encoded as certain physical reactions (no death, some health improvements) in people taking the drugs over the course of a few years.

Then it seems trivial for it to design a drug that has no negative impact for the first few years, and then causes sterility or death. Since it wants to spread this to as many humans as possible, it would probably design something that interacted with common human pathogens - colds, flus - in order to spread the impact, rather than affecting only those who took the drug.

Now, this narrow intelligence is less threatening than if it had general intelligence - where it could also plan for possible human countermeasures and such - but it seems sufficiently dangerous on its own that we can't afford to worry only about general intelligences. Some of the "AI superpowers" that Nick mentions in his book (intelligence amplification, strategizing, social manipulation, hacking, technology research, economic productivity) could be enough to cause devastation on their own, even if the AI never developed other abilities.

We still could be destroyed by a machine that we outmatch in almost every area.

The metaphor/myth of general intelligence

11 Stuart_Armstrong 18 August 2014 04:04PM

Thanks to Kaj for making me think along these lines.

It's agreed on this list that general intelligences - those that are capable of displaying high cognitive performance across a whole range of domains - are those that we need to be worrying about. This is rational: the most worrying AIs are those with truly general intelligences, and so those should be the focus of our worries and work.

But I'm wondering if we're overestimating the probability of general intelligences, and whether we shouldn't adjust against this.

First of all, the concept of general intelligence is a simple one - perhaps too simple. It's an intelligence that is generally "good" at everything, so we can collapse its various abilities across many domains into "it's intelligent", and leave it at that. It's significant to note that since the very beginning of the field, AI people have been thinking in terms of general intelligences.

And their expectations have been constantly frustrated. We've made great progress in narrow areas, very little in general intelligence. Chess was conquered without "understanding"; Jeopardy! was defeated without general intelligence; cars can navigate our cluttered roads while being able to do little else. If we started with a prior in 1956 about the feasibility of general intelligence, then we should be adjusting that prior downwards.

But what do I mean by "feasibility of general intelligence"? There are several things this could mean, not least the ease with which such an intelligence could be constructed. But I'd prefer to look at another assumption: the idea that a general intelligence will really be formidable in multiple domains, and that one of the best ways of accomplishing a goal in a particular domain is to construct a general intelligence and let it specialise.

First of all, humans are very far from being general intelligences. We can solve a lot of problems when they are presented in particular, easy-to-understand formats that allow good human-style learning. But if we picked a random complicated Turing machine from the space of such machines, we'd probably be pretty hopeless at predicting its behaviour. We would probably score very low on the scale of intelligence used to construct AIXI. The "general intelligence" factor g is a misnomer - it designates the fact that the various human intelligences are correlated, not that humans are generally intelligent across all domains.

Humans with computers, and humans in societies and organisations, are certainly closer to general intelligences than individual humans. But institutions have their own blind spots and weaknesses, as does the human-computer combination. Now, there are various reasons advanced for why this is the case - game theory and incentives for institutions, human-computer interfaces and misunderstandings for the second example. But what if these reasons, and other ones we can come up with, were mere symptoms of a more universal problem: that generalising intelligence is actually very hard?

There are "no free lunch" theorems showing that no computable intelligence can perform well in all environments. As far as they go, these theorems are uninteresting, as we don't need intelligences that perform well in all environments, just in almost all/most. But what if a more general restrictive theorem were true? What if it were very hard to produce an intelligence that performed well across many domains? What if the performance of a generalist were pitifully inadequate compared with that of a specialist? What if every computable version of AIXI were actually doomed to poor performance?

There are a few strong counters to this - for instance, you could construct good generalists by networking together specialists (this is my standard mental image/argument for AI risk), you could construct an entity that was very good at programming specific sub-programs, or you could approximate AIXI. But we are making some assumptions here - namely, that we can network together very different intelligences (the human-computer interface hints at some of the problems), and that a general programming ability can even exist in the first place (for a start, it might require a general understanding of problems that is akin to general intelligence in the first place). And we haven't had great success building effective AIXI approximations so far (which should reduce, possibly slightly, our belief that effective general intelligences are possible).

Now, I remain convinced that general intelligence is possible, and that it's worthy of the most worry. But I think it's worth inspecting the concept more closely, and at least be open to the possibility that general intelligence might be a lot harder than we imagine.

EDIT: Model/example of what a lack of general intelligence could look like.

Imagine there are three types of intelligence - social, spatial and scientific, all on a 0-100 scale. For any combination of the three intelligences - e.g. (0,42,98) - there is an effort level E (how hard that intelligence is to build, in terms of time, resources, man-hours, etc.) and a power level P (how powerful that intelligence is compared to others, on a single convenient scale of comparison).

Wei Dai's evolutionary comment implies that any being of very low intelligence on one of the scales would be overpowered by a being of more general intelligence. So let's set power as simply the product of all three intelligences.

This seems to imply that general intelligences are more powerful, as it basically bakes in diminishing returns - but we haven't included effort yet. Imagine that the following three intelligences require equal effort: (10,10,10), (20,20,5), (100,5,5). Then the specialised intelligence is definitely the one you need to build.

But is it plausible that those could be of equal difficulty? It could be, if we assume that high social intelligence isn't so difficult, but is specialised - i.e. you can increase the spatial intelligence of a social intelligence, but that messes up the delicate balance in its social brain. Or maybe recursive self-improvement happens more easily in narrow domains. Further assume that intelligences of different types cannot be easily networked together (e.g. combining (100,5,5) and (5,100,5) in the same brain gives an overall performance of (21,21,5)). This doesn't seem impossible.
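The arithmetic of the toy model is easy to check. A minimal sketch (Python; the numbers are the examples above, the function name is mine):

```python
# A minimal check of the toy model above, using only the post's numbers.
from math import prod

def power(intel):
    """Toy power level: the product of the (social, spatial, scientific) scores."""
    return prod(intel)

# Three intelligences assumed to take equal effort to build:
print(power((10, 10, 10)))   # 1000 - the generalist
print(power((20, 20, 5)))    # 2000
print(power((100, 5, 5)))    # 2500 - the specialist comes out ahead

# The assumed networking penalty: combining (100,5,5) with (5,100,5)
# yields only (21,21,5), so the network scores 2205 - far short of the
# 50000 a lossless combination (100,100,5) would give.
print(power((21, 21, 5)))    # 2205
```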

So let's caveat the proposition above: the most effective and dangerous type of AI might be one with a bare minimum amount of general intelligence, but an overwhelming advantage in one type of narrow intelligence.

Groundwork for AGI safety engineering

13 RobbBB 06 August 2014 09:29PM

This is a very basic introduction to AGI safety work, cross-posted from the MIRI blog. The discussion of AI V&V methods (mostly in the 'early steps' section) is probably the only part that will be new to regulars here.


 

Improvements in AI are resulting in the automation of increasingly complex and creative human behaviors. Given enough time, we should expect artificial reasoners to begin to rival humans in arbitrary domains, culminating in artificial general intelligence (AGI).

A machine would qualify as an 'AGI', in the intended sense, if it could adapt to a very wide range of situations to consistently achieve some goal or goals. Such a machine would behave intelligently when supplied with arbitrary physical and computational environments, in the same sense that Deep Blue behaves intelligently when supplied with arbitrary chess board configurations — consistently hitting its victory condition within that narrower domain.

Since generally intelligent software could help automate the process of thinking up and testing hypotheses in the sciences, AGI would be uniquely valuable for speeding technological growth. However, this wide-ranging productivity also makes AGI a unique challenge from a safety perspective. Knowing very little about the architecture of future AGIs, we can nonetheless make a few safety-relevant generalizations:

  • Because AGIs are intelligent, they will tend to be complex, adaptive, and capable of autonomous action, and they will have a large impact where employed.
  • Because AGIs are general, their users will have incentives to employ them in an increasingly wide range of environments. This makes it hard to construct valid sandbox tests and requirements specifications.
  • Because AGIs are artificial, they will deviate from human agents, causing them to violate many of our natural intuitions and expectations about intelligent behavior.

Today's AI software is already tough to verify and validate, thanks to its complexity and its uncertain behavior in the face of state space explosions. Menzies & Pecheur (2005) give a good overview of AI verification and validation (V&V) methods, noting that AI, and especially adaptive AI, will often yield undesired and unexpected behaviors.
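To get a feel for the state space explosion mentioned above, here is a back-of-the-envelope sketch; the input counts and testing rate are my own illustrative assumptions, not figures from Menzies & Pecheur:

```python
# Illustrative only: a system with n independent binary inputs has 2**n
# possible input states, so exhaustive testing quickly becomes infeasible.

for n in (10, 50, 100):
    print(f"{n} binary inputs -> {2**n:.3e} states")

# 10 inputs  -> ~1.0e3  states: trivially testable
# 50 inputs  -> ~1.1e15 states: about two weeks at a billion tests/second
# 100 inputs -> ~1.3e30 states: beyond exhaustive testing on any hardware
```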

An adaptive AI that acts autonomously, like a Mars rover that can't be directly piloted from Earth, represents an additional large increase in difficulty. Autonomous safety-critical agents need to make irreversible decisions in dynamic environments with very low failure rates. The state of the art in safety research for autonomous systems is improving, but continues to lag behind capabilities work. Hinchman et al. (2012) write:

As autonomous systems become more complex, the notion that systems can be fully tested and all problems will be found is becoming an impossible task. This is especially true in unmanned/autonomous systems. Full test is becoming increasingly challenging on complex system. As these systems react to more environmental [stimuli] and have larger decision spaces, testing all possible states and all ranges of the inputs to the system is becoming impossible. [...] As systems become more complex, safety is really risk hazard analysis, i.e. given x amount of testing, the system appears to be safe. A fundamental change is needed. This change was highlighted in the 2010 Air Force Technology Horizon report, "It is possible to develop systems having high levels of autonomy, but it is the lack of suitable V&V methods that prevents all but relatively low levels of autonomy from being certified for use." [...]

The move towards more autonomous systems has lifted this need [for advanced verification and validation techniques and methodologies] to a national level.

AI acting autonomously in arbitrary domains, then, looks particularly difficult to verify. If AI methods continue to see rapid gains in efficiency and versatility, and especially if these gains further increase the opacity of AI algorithms to human inspection, AI safety engineering will become much more difficult in the future. In the absence of any reason to expect a development in the lead-up to AGI that would make high-assurance AGI easy (or AGI itself unlikely), we should be worried about the safety challenges of AGI, and that worry should inform our research priorities today.

Below, I’ll give reasons to doubt that AGI safety challenges are just an extension of narrow-AI safety challenges, and I’ll list some research avenues people at MIRI expect to be fruitful.

continue reading »

A Parable of Elites and Takeoffs

22 gwern 30 June 2014 11:04PM

Let me tell you a parable of the future. Let’s say, 70 years from now, in a large Western country we’ll call Nacirema.

One day far from now: scientific development has continued apace, and a large government project (with, unsurprisingly, a lot of military funding) has taken the scattered pieces of cutting-edge research and put them together into a single awesome technology, which could revolutionize (or at least, vastly improve) all sectors of the economy. Leading thinkers had long forecast that this area of science’s mysteries would eventually yield to progress, despite theoretical confusion and perhaps-disappointing initial results and the scorn of more conservative types and the incomprehension (or outright disgust, for ‘playing god’) of the general population, and at last - it had! The future was bright.

Unfortunately, it was hurriedly decided to use an early prototype outside the lab in an impoverished foreign country. Whether out of arrogance, bureaucratic inertia, overconfidence on the part of the involved researchers, condescending racism, the need to justify the billions of grant-dollars that cumulatively went into the project over the years by showing some use of it - whatever, the reasons no longer mattered after the final order was signed. The technology was used, but the consequences turned out to be horrific: over a brief period of what seemed like mere days, entire cities collapsed and scores - hundreds - of thousands of people died. (Modern economies are extremely interdependent and fragile, and small disruptions can have large consequences; more people died in the chaos of the evacuation of the areas around Fukushima than will die of the radiation.)

continue reading »

Will AGI surprise the world?

10 lukeprog 21 June 2014 10:27PM

Cross-posted from my blog.

Yudkowsky writes:

In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, "After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z."

...

Example 2: "As AI gets more sophisticated, everyone will realize that real AI is on the way and then they'll start taking Friendly AI development seriously."

Alternative projection: As AI gets more sophisticated, the rest of society can't see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them. The same people who were talking about robot overlords earlier continue to talk about robot overlords. The same people who were talking about human irreproducibility continue to talk about human specialness. Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone's previous ideological commitment to a basic income guarantee, inequality reduction, or whatever. The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before. If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it's on the way, e.g. Hugo de Garis, Ben Goertzel, etc.

My own projection goes more like this:

As AI gets more sophisticated, and as more prestigious AI scientists begin to publicly acknowledge that AI is plausibly only 2-6 decades away, policy-makers and research funders will begin to respond to the AGI safety challenge, just like they began to respond to CFC damages in the late 70s, to global warming in the late 80s, and to synbio developments in the 2010s. As for society at large, I dunno. They'll think all kinds of random stuff for random reasons, and in some cases this will seriously impede effective policy, as it does in the USA for science education and immigration reform. Because AGI lends itself to arms races and is harder to handle adequately than global warming or nuclear security are, policy-makers and industry leaders will generally know AGI is coming but be unable to fund the needed efforts and coordinate effectively enough to ensure good outcomes.

At least one clear difference between my projection and Yudkowsky's is that I expect AI-expert performance on the problem to improve substantially as a greater fraction of elite AI scientists begin to think about the issue in Near mode rather than Far mode.

As a friend of mine suggested recently, current elite awareness of the AGI safety challenge is roughly where elite awareness of the global warming challenge was in the early 80s. Except, I expect elite acknowledgement of the AGI safety challenge to spread more slowly than it did for global warming or nuclear security, because AGI is tougher to forecast in general, and involves trickier philosophical nuances. (Nobody was ever tempted to say, "But as the nuclear chain reaction grows in power, it will necessarily become more moral!")

Still, there is a worryingly non-negligible chance that AGI explodes "out of nowhere." Sometimes important theorems are proved suddenly after decades of failed attempts by other mathematicians, and sometimes a computational procedure is sped up by 20 orders of magnitude with a single breakthrough.

Some alternatives to “Friendly AI”

19 lukeprog 15 June 2014 07:53PM

Cross-posted from my blog.

What does MIRI's research program study?

The most established term for this was coined by MIRI founder Eliezer Yudkowsky: "Friendly AI." The term has some advantages, but it might suggest that MIRI is trying to build C-3PO, and it sounds a bit whimsical for a serious research program.

What about safe AGI or AGI safety? These terms are probably easier to interpret than Friendly AI. Also, people like being safe, and governments like saying they're funding initiatives to keep the public safe.

A friend of mine worries that these terms could provoke a defensive response (in AI researchers) of "Oh, so you think me and everybody else in AI is working on unsafe AI?" But I've never actually heard that response to "AGI safety" in the wild, and AI safety researchers regularly discuss "software system safety" and "AI safety" and "agent safety" and more specific topics like "safe reinforcement learning" without provoking negative reactions from people doing regular AI research.

I'm more worried that a term like "safe AGI" could provoke a response of "So you're trying to make sure that a system which is smarter than humans, and able to operate in arbitrary real-world environments, and able to invent new technologies to achieve its goals, will be safe? Let me save you some time and tell you right now that's impossible. Your research program is a pipe dream."

My reply goes something like "Yeah, it's way beyond our current capabilities, but lots of things that once looked impossible are now feasible because people worked really hard on them for a long time, and we don't think we can get the whole world to promise never to build AGI just because it's hard to make safe, so we're going to give AGI safety a solid try for a few decades and see what can be discovered." But that's probably not all that reassuring.

How about high-assurance AGI? In computer science, a "high assurance system" is one built from the ground up for unusually strong safety and/or security guarantees, because it's going to be used in safety-critical applications where human lives — or sometimes simply billions of dollars — are at stake (e.g. autopilot software or Mars rover software). So there's a nice analogy to MIRI's work, where we're trying to figure out what an AGI would look like if it was built from the ground up to get the strongest safety guarantees possible for such an autonomous and capable system.

I think the main problem with this term is that, quite reasonably, nobody will believe that we can ever get anywhere near as much assurance in the behavior of an AGI as we can in the behavior of, say, the relatively limited AI software that controls the European Train Control System. "High assurance AGI" sounds a bit like "Totally safe all-powerful demon lord." It sounds even more wildly unimaginable to AI researchers than "safe AGI."

What about superintelligence control or AGI control, as in Bostrom (2014)? "AGI control" is perhaps more believable than "high-assurance AGI" or "safe AGI," since it brings to mind AI containment methods, which sound more feasible to most people than designing an unconstrained AGI that is somehow nevertheless safe. (It's okay if they learn later that containment probably isn't an ultimate solution to the problem.)

On the other hand, it might provoke a reaction of "What, you don't think sentient robots have any rights, and you're free to control and confine them in any way you please? You're just repeating the immoral mistakes of the old slavemasters!" Which of course isn't true, but it takes some time to explain how I can think it's obvious that conscious machines have moral value while also being in favor of AGI control methods.

How about ethical AGI? First, I worry that it sounds too philosophical, and philosophy is widely perceived as a confused, unproductive discipline. Second, I worry that it sounds like the research assumes moral realism, which many (most?) intelligent people reject. Third, it makes it sound like most of the work is in selecting the goal function, which I don't think is true.

What about beneficial AGI? That's better than "ethical AGI," I think, but like "ethical AGI" and "Friendly AI," the term sounds less like a serious math and engineering discipline and more like some enclave of crank researchers writing a flurry of words (but no math) about how AGI needs to be "nice" and "trustworthy" and "not harmful" and oh yeah it must be "virtuous" too, whatever that means.

So yeah, I dunno. I think "AGI safety" is my least-disliked term these days, but I wish I knew of some better options.

An onion strategy for AGI discussion

13 lukeprog 31 May 2014 07:08PM

Cross-posted from my blog.

"The stabilization of environments" is a paper about AIs that reshape their environments to make it easier to achieve their goals. This is typically called enforcement, but they prefer the term stabilization because it "sounds less hostile."

"I'll open the pod bay doors, Dave, but then I'm going to stabilize the ship..."

Sparrow (2013) takes the opposite approach to plain vs. dramatic language. Rather than using a modest term like iterated embryo selection, Sparrow prefers the phrase in vitro eugenics. Jeepers.

I suppose that's more likely to provoke public discussion, but... will much good come of that public discussion? The public had a needless freak-out about in vitro fertilization back in the 60s and 70s and then, as soon as the first IVF baby was born in 1978, decided they were in favor of it.

Someone recently suggested I use an "onion strategy" for the discussion of novel technological risks. The outermost layer of the communication onion would be aimed at the general public, and focus on benefits rather than risks, so as not to provoke an unproductive panic. A second layer for a specialist audience could include a more detailed elaboration of the risks. The most complete discussion of risks and mitigation options would be reserved for technical publications that are read only by professionals.

Eric Drexler seems to wish he had more successfully used an onion strategy when writing about nanotechnology. Engines of Creation included frank discussions of both the benefits and risks of nanotechnology, including the "grey goo" scenario that was discussed widely in the media and used as the premise for the bestselling novel Prey.

Ray Kurzweil may be using an onion strategy, or at least keeping his writing in the outermost layer. If you look carefully, chapter 8 of The Singularity is Near takes technological risks pretty seriously, and yet it's written in such a way that most people who read the book seem to come away with an overwhelmingly optimistic perspective on technological change.

George Church may be following an onion strategy. Regenesis also contains a chapter on the risks of advanced bioengineering, but it's presented as an "epilogue" that many readers will skip.

Perhaps those of us writing about AGI for the general public should try to discuss:

  • astronomical stakes rather than existential risk
  • Friendly AI rather than AGI risk or the superintelligence control problem
  • the orthogonality thesis and convergent instrumental values and complexity of values rather than "doom by default"
  • etc.

MIRI doesn't have any official recommendations on the matter, but these days I find myself leaning toward an onion strategy.

Announcing a google group for technical discussion of FAI

4 Squark 10 May 2014 01:36PM

I'm pleased to announce friendly-artificial-intelligence, a google group intended for research-level discussion of problems in FAI and AGI, in particular for discussions that are highly technical and/or math intensive.

Some examples of possible discussion topics: naturalized induction, decision theory, tiling agents / Loebian obstacle, logical uncertainty...

I invite everyone who wants to take part in FAI research to participate in the group. This obviously includes people affiliated with MIRI, FHI and CSER, people who attend MIRI workshops, and participants of the Southern California FAI workshop.

Please, come in and share your discoveries, ideas, thoughts, questions et cetera. See you there!

[LINK] David Deutsch on why we don't have AGI yet "Creative Blocks"

2 harshhpareek 17 December 2013 07:03AM

Folks here should be familiar with most of these arguments. Putting some interesting quotes below:

http://aeon.co/magazine/being-human/david-deutsch-artificial-intelligence/

"Creative blocks: The very laws of physics imply that artificial intelligence must be possible. What's holding us up?"

Remember the significance attributed to Skynet’s becoming ‘self-aware’? [...] The fact is that present-day software developers could straightforwardly program a computer to have ‘self-awareness’ in the behavioural sense — for example, to pass the ‘mirror test’ of being able to use a mirror to infer facts about itself — if they wanted to. [...] AGIs will indeed be capable of self-awareness — but that is because they will be General

Some hope to learn how we can rig their programming to make [AGIs] constitutionally unable to harm humans (as in Isaac Asimov’s ‘laws of robotics’), or to prevent them from acquiring the theory that the universe should be converted into paper clips (as imagined by Nick Bostrom). None of these are the real problem. It has always been the case that a single exceptionally creative person can be thousands of times as productive — economically, intellectually or whatever — as most people; and that such a person could do enormous harm were he to turn his powers to evil instead of good.[...] The battle between good and evil ideas is as old as our species and will go on regardless of the hardware on which it is running

He also says confusing things about induction being inadequate for creativity, which I'm guessing he couldn't support well in this short essay (perhaps he explains it better in his books), so I'm not quoting that here. His attack on Bayesianism as an explanation for intelligence is valid and interesting, but could be wrong. Given what we know about neural networks, something like this does happen in the brain, and possibly even at a concept level.

The doctrine assumes that minds work by assigning probabilities to their ideas and modifying those probabilities in the light of experience as a way of choosing how to act. This is especially perverse when it comes to an AGI’s values — the moral and aesthetic ideas that inform its choices and intentions — for it allows only a behaviouristic model of them, in which values that are ‘rewarded’ by ‘experience’ are ‘reinforced’ and come to dominate behaviour while those that are ‘punished’ by ‘experience’ are extinguished. As I argued above, that behaviourist, input-output model is appropriate for most computer programming other than AGI, but hopeless for AGI.

His final conclusions are hard to agree with. He somehow concludes that the principal bottleneck in AGI research is a philosophical one.

In his last paragraph, he makes the following controversial statement:

For yet another consequence of understanding that the target ability is qualitatively different is that, since humans have it and apes do not, the information for how to achieve it must be encoded in the relatively tiny number of differences between the DNA of humans and that of chimpanzees.

This would be false if, for example, the mother controlled gene expression while the foetus develops, helping to shape the brain. We should be able to answer this question definitively once we can grow human babies completely in vitro. Another problem would be the impact of the cultural environment. A way to answer this question would be to see whether our Stone Age ancestors would be classified as AGIs under a reasonable definition.

Autism, Watson, the Turing test, and General Intelligence

6 Stuart_Armstrong 24 September 2013 11:00AM

Thinking aloud:

Humans are examples of general intelligence - the only example we're sure of. Some humans have various degrees of autism (low level versions are quite common in the circles I've moved in), impairing their social skills. Mild autists nevertheless remain general intelligences, capable of demonstrating strong cross domain optimisation. Psychology is full of other examples of mental pathologies that impair certain skills, but nevertheless leave their sufferers as full fledged general intelligences. This general intelligence is not enough, however, to solve their impairments.

Watson triumphed on Jeopardy. AI scientists in previous decades would have concluded that to do so, a general intelligence would have been needed. But that was not the case at all - Watson is blatantly not a general intelligence. Big data and clever algorithms were all that were needed. Computers are demonstrating more and more skills, besting humans in more and more domains - but still no sign of general intelligence. I've recently developed the suspicion that the Turing test (comparing AI with a standard human) could get passed by a narrow AI finely tuned to that task.

The general thread is that the link between narrow skills and general intelligence may not be as clear as we sometimes think. It may be that narrow skills are sufficiently diverse and unique that a mid-level general intelligence may not be able to develop them to a large extent. Or, put another way, an above-human social intelligence may not be able to control a robot body or do decent image recognition. A super-intelligence likely could: ultimately, general intelligence includes the specific skills. But this "ultimately" may take a long time to come.

So the questions I'm wondering about are:

  1. How likely is it that a general intelligence, above human in some domain not related to AI development, will acquire high level skills in unrelated areas?
  2. By building high-performance narrow AIs, are we making it much easier for such an intelligence to develop such skills, by co-opting or copying these programs?

 

Evaluating the feasibility of SI's plan

25 JoshuaFox 10 January 2013 08:17AM

(With Kaj Sotala)

SI's current R&D plan seems to go as follows: 

1. Develop the perfect theory.
2. Implement this as a safe, working Artificial General Intelligence -- and do so before anyone else builds an AGI.

The Singularity Institute is almost the only group working on friendliness theory (although with very few researchers). So, they have the lead on Friendliness. But there is no reason to think that they will be ahead of anyone else on the implementation.

The few AGI designs we can look at today, like OpenCog, are big, messy systems which intentionally attempt to exploit various cognitive dynamics that might combine in unexpected and unanticipated ways, and which have various human-like drives rather than the sort of supergoal-driven, utility-maximizing goal hierarchies that Eliezer talks about, or which a mathematical abstraction like AIXI employs.

A team which is ready to adopt a variety of imperfect heuristic techniques will have a decisive lead over approaches based on pure theory. Without the constraint of safety, one of them will beat SI in the race to AGI. SI cannot ignore this. Real-world, imperfect safety measures for real-world, imperfect AGIs are needed. These may involve mechanisms for ensuring that we can avoid undesirable dynamics in heuristic systems, or AI-boxing toolkits usable in the pre-explosion stage, or something else entirely.

SI’s hoped-for theory will include a reflexively consistent decision theory, something like a greatly refined Timeless Decision Theory.  It will also describe human value as formally as possible, or at least describe a way to pin it down precisely, something like an improved Coherent Extrapolated Volition.

The hoped-for theory is intended to  provide not only safety features, but also a description of the implementation, as some sort of ideal Bayesian mechanism, a theoretically perfect intelligence.

SIers have said to me that SI's design will have a decisive implementation advantage. The idea is that because strap-on safety can't work, Friendliness research necessarily involves more fundamental architectural design decisions, which also happen to be general AGI design decisions that some other AGI builder could grab and save themselves a lot of effort. The assumption seems to be that all other designs are based on hopelessly misguided design principles. SIers, the idea seems to go, are so smart that they'll build AGI far before anyone else. Others will succeed only when hardware capabilities allow crude near-brute-force methods to work.

Yet even if the Friendliness theory provides the basis for intelligence, the nitty-gritty of SI’s implementation will still be far away, and will involve real-world heuristics and other compromises.

We can compare SI’s future AI design to AIXI, another mathematically perfect AI formalism (though it has some critical reflexivity issues). Schmidhuber, Hutter, and colleagues think that their AXI can be scaled down into a feasible implementation, and have implemented some toy systems. Similarly, any actual AGI based on SI's future theories will have to stray far from its mathematically perfected origins.

Moreover, SI's future Friendliness proof may simply be wrong. Eliezer writes a lot about logical uncertainty, the idea that you must treat even purely mathematical ideas with the same probabilistic techniques as any ordinary uncertain belief. He pursues this mostly so that his AI can reason about itself, but the same principle applies to Friendliness proofs as well.

Perhaps Eliezer thinks that a heuristic AGI is absolutely doomed to failure; that a hard takeoff immediately after the creation of the first AGI is so overwhelmingly likely that a mathematically designed AGI is the only one that could stay Friendly. In that case, we have to work on a pure-theory approach, even if it has a low chance of being finished first. Otherwise we'll be dead anyway. If an embryonic AGI will necessarily undergo an intelligence explosion, we have no choice but to "shut up and do the impossible."

I am all in favor of gung-ho knife-between-the-teeth projects. But when you think that your strategy is impossible, then you should also look for a strategy which is possible, if only as a fallback. Thinking about safety theory until drops of blood appear on your forehead (as Eliezer puts it, quoting Gene Fowler) is all well and good. But if there is only a 10% chance of achieving 100% safety (not that there really is any such thing), then I'd rather go for a strategy that provides only a 40% promise of safety, but with a 40% chance of achieving it. OpenCog and the like are going to be developed regardless, and probably before SI's own provably friendly AGI. So even an imperfect safety measure is better than nothing.

If heuristic approaches have a 99% chance of an immediate unfriendly explosion, then that might be wrong. But SI, better than anyone, should know that any intuition-based probability estimate of “99%” really means “70%”. Even if other approaches are long-shots, we should not put all our eggs in one basket. Theoretical perfection and stopgap safety measures can be developed in parallel.

Given what we know about human overconfidence and the general reliability of predictions, the actual outcome will to a large extent be something that none of us ever expected or could have predicted. No matter what happens, progress on safety mechanisms for heuristic AGI will improve our chances if something entirely unexpected happens.

What impossible thing should SI be shutting up and doing? For Eliezer, it’s Friendliness theory. To him, safety for heuristic AGI is impossible, and we shouldn't direct our efforts in that direction. But why shouldn't safety for heuristic AGI be another impossible thing to do?

(Two impossible things before breakfast … and maybe a few more? Eliezer seems to be rebuilding logic, set theory, ontology, epistemology, axiology, decision theory, and more, mostly from scratch. That's a lot of impossibles.)

And even if safety for heuristic AGIs is really impossible for us to figure out now, there is some chance of an extended soft takeoff that will allow for the possibility of us developing heuristic AGIs which help in figuring out AGI safety, whether because we can use them for our tests, or because they can apply their embryonic general intelligence to the problem. Goertzel and Pitt have urged this approach.

Yet resources are limited. Perhaps the folks who are actually building their own heuristic AGIs are in a better position than SI to develop safety mechanisms for them, while SI is the only organization which is really working on a formal theory on Friendliness, and so should concentrate on that. It could be better to focus SI's resources on areas in which it has a relative advantage, or which have a greater expected impact.

Even if so, SI should evangelize AGI safety to other researchers, not only as a general principle, but also by offering theoretical insights that may help them as they work on their own safety mechanisms.

In summary:

1. AGI development which is unconstrained by a friendliness requirement is likely to beat a provably-friendly design in a race to implementation, and some effort should be expended on dealing with this scenario.

2. Pursuing a provably-friendly AGI, even if very unlikely to succeed, could still be the right thing to do if it was certain that we’ll have a hard takeoff very soon after the creation of the first AGIs. However, we do not know whether or not this is true.

3. Even the provably friendly design will face real-world compromises and errors in its implementation, so the implementation will not itself be provably friendly. Thus, safety protections of the sort needed for heuristic designs are needed even for a theoretically Friendly design.

Bounding the impact of AGI

17 Louie 18 December 2012 07:47PM

For those of you interested, András Kornai's paper "Bounding the impact of AGI" from this year's AGI-Impacts conference at Oxford had a few interesting ideas (which I've excerpted below).

Summary:

  1. Acceptable risk tolerances for AGI design can be determined using standard safety engineering techniques from other fields
  2. Mathematical proof is the only available tool to secure the tolerances required to prevent intolerable increases in xrisk
  3. Automated theorem proving will be required so that the proof can reasonably be checked by multiple human minds

Safety engineering

Since the original approach of Yudkowsky (2006) to friendly AI, which sought mathematical guarantees of friendliness, was met with considerable skepticism, we revisit the issue of why such guarantees are essential. In designing radioactive equipment, a reasonable guideline is to limit emissions to several orders of magnitude below the natural background radiation level, so that human-caused dangers are lost in the noise compared to the pre-existing threat we must live with anyway. In the full paper, we take the "big five" extinction events that occurred within the past half billion years as background, and argue that we need to design systems with a failure rate below 10^-63 per logical operation.

What needs to be emphasized in the face of this requirement is that the very best physical measurements have only one part in 10^17 precision, not to speak of social and psychological phenomena where our understanding is considerably weaker. What this means is that guarantees of the requisite sort can only be expected from mathematics, where our measurement precision is already considerably better.

 

How reliable is mathematics?

The period since World War II has brought incredible advances in mathematics, such as the Four Color Theorem (Appel and Haken 1976), Fermat’s Last Theorem (Wiles 1995), the classification of finite simple groups (Gorenstein 1982, Aschbacher 2004), and the Poincare conjecture (Perelman 1994). While the community of mathematicians is entirely convinced of the correctness of these results, few individual mathematicians are, as the complexity of the proofs, both in terms of knowledge assumed from various branches of mathematics and in terms of the length of the deductive chain, is generally beyond our ken. Instead of a personal understanding of the matter, most of us now rely on argumentum ad verecundiam: well Faltings and Ribet now think that the Wiles-Taylor proof is correct, and even if I don’t know Faltings or Ribet at least I know and respect people who know and respect them, and if that’s not good enough I can go and devote a few years of my life to understand the proof for good. Unfortunately, the communal checking of proofs often takes years, and sometimes errors are discovered only after a decade has passed: the hole in the original proof of the Four Color Theorem (Kempe 1879) was detected by Heawood in 1890. Tomonaga in his Nobel lecture (1965) describes how his team’s work in 1947 uncovered a major problem in Dancoff (1939):

Our new method of calculation was not at all different in its contents from Dancoff’s perturbation method, but had the advantage of making the calculation more clear. In fact, what took a few months in the Dancoff type of calculation could be done in a few weeks. And it was by this method that a mistake was discovered in Dancoff’s calculation; we had also made the same mistake in the beginning.

To see that such long-hidden errors are by no means a thing of the past, and to observe the ‘web of trust’ method in action, consider the following example from Mohr (2012).

The eighth-order coefficient A1(8) arises from 891 Feynman diagrams of which only a few are known analytically. Evaluation of this coefficient numerically by Kinoshita and coworkers has been underway for many years (Kinoshita, 2010). The value used in the 2006 adjustment is A1(8) = -1.7283(35) as reported by Kinoshita and Nio (2006). However, (...) it was discovered by Aoyama et al. (2007) that a significant error had been made in the calculation. In particular, 2 of the 47 integrals representing 518 diagrams that had not been confirmed independently required a corrected treatment of infrared divergences. (...) The new value is (Aoyama et al., 2007) A1(8) = 1.9144(35); (111) details of the calculation are given by Aoyama et al. (2008). In view of the extensive effort made by these workers to ensure that the result in Eq. (111) is reliable, the Task Group adopts both its value and quoted uncertainty for use in the 2010 adjustment.

 

Assuming no more than three million mathematics and physics papers published since the beginnings of scientific publishing, and no less than the three errors documented above, we can safely conclude that the overall error rate of the reasoning used in these fields is at least 10^-6 per paper.

 

The role of automated theorem-proving

That human reasoning, much like manual arithmetic, is a significantly error-prone process comes as no surprise. Starting with de Bruijn’s Automath (see Nederpelt et al 1994) logicians and computer scientists have invested significant effort in mechanized proof checking, and it is indeed only through such efforts, in particular through the Coq verification (Gonthier 2008) of the entire logic behind the Appel and Haken proof that all lingering doubts about the Four Color Theorem were laid to rest. The error in A1(8) was also identified by using FORTRAN code generated by an automatic code generator (Mohr et al 2012).

To gain an appreciation of the state of the art, consider the theorem that finite groups of odd order are solvable (Feit and Thompson 1963). The proof, which took two humans about two years to work out, takes up an entire issue of the Pacific Journal of Mathematics (255 pages), and it was only this year that a fully formal proof was completed by Gonthier’s team (see Knies 2012). The effort, 170,000 lines, 15,000 definitions, 4,200 theorems in Coq terms, took person-decades of human assistance (15 people working six years, though many of them part-time) even after the toil of Bender and Glauberman (1995) and Peterfalvi (2000), who have greatly cleaned up and modularized the original proof, in which elementary group-theoretic and character-theoretic argumentation was completely intermixed.

The classification of simple finite groups is two orders of magnitude bigger: the effort involved about 100 humans, the original proof is scattered among 20,000 pages of papers, the largest (Aschbacher and Smith 2004a,b) taking up two volumes totaling some 1,200 pages. While everybody capable of rendering meaningful judgment considers the proof to be complete and correct, it must be somewhat worrisome at the 10^-64 level that there are no more than a couple of hundred such people, and most of them have something of a vested interest in that they themselves contributed to the proof. Let us suppose that people who are convinced that the classification is bug-free are offered the following bet by some superior intelligence that knows the answer. You must enter a room with as many people as you can convince to come with you and push a button: if the classification is bug-free you will each receive $100; if not, all of you will immediately die. Perhaps fools rush in where angels fear to tread, but on the whole we still wouldn’t expect too many takers.

Whether the classification of finite simple groups is complete and correct is very hard to say – the planned second-generation proof will still be 5,000 pages, and a mechanized proof is not yet in sight. But this is not to say that gaining mathematical knowledge of the required degree of reliability is hopeless; it’s just that instead of monumental chains of abstract reasoning we need to retreat to considerably simpler ones. Take, for example, the first Sylow Theorem: if the order of a finite group G is divisible by some prime power p^n, then G has a subgroup H of this order. We are absolutely certain about this. Argumentum ad verecundiam of course is still available, but it is not needed: anybody can join the hive-mind by studying the proof. The Coq verification contains 350 lines, 15 definitions, 90 theorems, and took 2 people 2 weeks to produce. The number of people capable of rendering meaningful judgment is at least three orders of magnitude larger, and the vast majority of those who know the proof would consider betting their lives on the truth of this theorem an easy way of winning $100 with no downside risk.
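(For reference, the theorem being verified is short enough to state in full; this is the standard formulation in LaTeX, not a quotation from the paper:

\textbf{Theorem (Sylow I).} Let $G$ be a finite group and $p$ a prime such that $p^n \mid |G|$. Then $G$ contains a subgroup $H$ with $|H| = p^n$.

The contrast with the classification theorem is precisely that this statement and its proof fit comfortably inside one head.)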

 

Further remarks

Not only do we have to prove that the planned AGI will be friendly, the proof itself has to be short enough to be verifiable by humans. Consider, for example, the fundamental theorem of algebra. Could it be the case that we, humans, are all deluded into thinking that an n-th degree polynomial will have roots? Yes, but this is unlikely in the extreme. If this so-called theorem is really a trap laid by a superior intelligence we are doomed anyway, humanity can find its way around it no more than a bee can find its way around the windowpane. Now consider the four-color theorem, which is still outside the human-verifiable range. It is fair to say that it would be unwise to create AIs whose friendliness critically depends on design limits implied by the truth of this theorem, while AIs whose friendliness is guaranteed by the fundamental theorem of algebra represent a tolerable level of risk. 

Recently, Goertzel and Pitt (2012) have laid out a plan to endow AGI with morality by means of carefully controlled machine learning. Much as we are in agreement with their goals, we remain skeptical about their plan meeting the failure engineering criteria laid out above.

The challenges of bringing up AIs

8 Stuart_Armstrong 10 December 2012 12:43PM

At the current AGI-12 conference, some designers have advocated keeping AGIs safe by bringing them up in human environments, providing them with interactions and feedback in much the way we bring up human children. Obviously that approach would fail for a fully smart AGI with its own values - it would pretend to follow our values for as long as it needed, and then defect. However, some people are confident that if we started with a limited, dumb AGI, we could successfully inculcate our values in this way (a more sophisticated position is that though this method would likely fail, it's still more likely to succeed than a top-down friendliness project!).

The major criticism of this approach is that it anthropomorphises the AGI - we have a theory of children's minds, constructed by evolution, culture, and our own child-rearing experience. We then project this onto the alien mind of the AGI, assuming that if the AGI presents behaviours similar to those of a well-behaved child, it will become a moral AGI. The problem is that we don't know how alien the AGI's mind will be, or whether our reinforcement is actually reinforcing the right thing. Specifically, we need to find some way of distinguishing between the following (a toy sketch follows the list):

  1. An AGI being trained to be friendly.
  2. An AGI being trained to lie and conceal.
  3. An AGI that will behave completely differently once out of the training/testing/trust-building environment.
  4. An AGI that forms the wrong categories and generalisations (what counts as "human" or "suffering", for instance), because it lacks human-shared implicit knowledge that was "too obvious" for us to even think of training it on.
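As a toy illustration of why cases 1-3 look identical from inside the training environment (hypothetical Python of mine, not anything from the conference), consider two policies that agree on every training episode:

def friendly_policy(observation):
    # Always cooperates, in training and out.
    return "cooperate"

def deceptive_policy(observation):
    # Cooperates only while it believes it is being trained or tested.
    if observation.get("in_training", True):
        return "cooperate"
    return "defect"

# Every training episode is, by construction, a training episode, so no
# finite amount of reinforcement can separate the two policies.
training_episodes = [{"in_training": True} for _ in range(10_000)]
assert all(friendly_policy(o) == deceptive_policy(o) for o in training_episodes)

Behavioural feedback alone scores both policies identically; the difference only manifests once the distribution shifts.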

Muehlhauser-Hibbard Dialogue on AGI

9 lukeprog 09 July 2012 11:11PM

Part of the Muehlhauser series on AGI.

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Bill Hibbard is an emeritus senior scientist at University of Wisconsin-Madison and the author of Super-Intelligent Machines.


Luke Muehlhauser:

[Apr. 8, 2012]

Bill, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!

On what do we agree? In separate conversations, Ben Goertzel and Pei Wang agreed with me on the following statements (though I've clarified the wording for our conversation):

  1. Involuntary death is bad, and can be avoided with the right technology.
  2. Humans can be enhanced by merging with technology.
  3. Humans are on a risky course in general, because powerful technologies can destroy us, humans often behave not in their own best interests, and we are unlikely to voluntarily halt technological progress.
  4. AGI is likely this century.
  5. AGI will greatly transform the world. It poses existential and other serious risks, but could also be the best thing that ever happens to us if we do it right.
  6. Careful effort will be required to ensure that AGI results in good things rather than bad things for humanity.

You stated in private communication that you agree with these statements, so we have substantial common ground.

I'd be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it's very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI before we develop arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?

And, which questions would you like to raise?


AI risk: the five minute pitch

9 Stuart_Armstrong 08 May 2012 04:28PM

I did a talk at the 25th Oxford Geek night, in which I had five minutes to present the dangers of AI. The talk is now online. Though it doesn't contain anything people at Less Wrong would find new, I feel it does a reasonable job at pitching some of the arguments in a very brief format.

Muehlhauser-Wang Dialogue

24 lukeprog 22 April 2012 10:40PM

Part of the Muehlhauser interview series on AGI.

 

Luke Muehlhauser is Executive Director of the Singularity Institute, a non-profit research institute studying AGI safety.

Pei Wang is an AGI researcher at Temple University, and Chief Executive Editor of Journal of Artificial General Intelligence.

 

Luke Muehlhauser

[Apr. 7, 2012]

Pei, I'm glad you agreed to discuss artificial general intelligence (AGI) with me. I hope our dialogue will be informative to many readers, and to us!

On what do we agree? Ben Goertzel and I agreed on the statements below (well, I cleaned up the wording a bit for our conversation):

  1. Involuntary death is bad, and can be avoided with the right technology.
  2. Humans can be enhanced by merging with technology.
  3. Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
  4. AGI is likely this century.
  5. AGI will greatly transform the world. It is a potential existential risk, but could also be the best thing that ever happens to us if we do it right.
  6. Careful effort will be required to ensure that AGI results in good things rather than bad things for humanity.

You stated in private communication that you agree with these statements, depending on what is meant by "AGI." So, I'll ask: What do you mean by "AGI"?

I'd also be curious to learn what you think about AGI safety. If you agree that AGI is an existential risk that will arrive this century, and if you value humanity, one might expect you to think it's very important that we accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI. (This is what Anna Salamon and I recommend in Intelligence Explosion: Evidence and Import.) What are your thoughts on the matter?

 

Pei Wang:

[Apr. 8, 2012]

By “AGI” I mean computer systems that follow roughly the same principles as the human mind. Concretely, to me “intelligence” is the ability to adapt to the environment under insufficient knowledge and resources, or to follow the “Laws of Thought” that realize a relative rationality that allows the system to apply its available knowledge and resources as much as possible. See [1, 2] for detailed descriptions and comparisons to other definitions of intelligence.

Such a computer system will share many properties with the human mind; however, it will not have exactly the same behaviors or problem-solving capabilities as a typical human being, since, as an adaptive system, the behaviors and capabilities of an AGI depend not only on its built-in principles and mechanisms, but also on its body, initial motivation, and individual experience, which are not necessarily human-like.

Like all major breakthroughs in science and technology, the creation of AGI will be both a challenge and an opportunity for humankind. Like scientists and engineers in all fields, we AGI researchers should use our best judgment to ensure that AGI results in good things rather than bad things for humanity.

Even so, the suggestion to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong, for the following major reasons:

  1. It is based on a highly speculative understanding about what kind of “AGI” will be created. The definition of intelligence in Intelligence Explosion: Evidence and Import is not shared by most AGI researchers. In my opinion, that kind of “AGI” will never be built.
  2. Even if the above definition is considered only as one possibility among other versions of AGI, it will be actual AI research that tells us which possibility becomes reality. To ban scientific research on account of imaginary risks damages humanity no less than risky research does.
  3. If intelligence turns out to be adaptive (as believed by me and many others), then a “friendly AI” will be mainly the result of proper education, not proper design. There will be no way to design a “safe AI”, just as there is no way to require parents to give birth only to a “safe baby” who will never become a criminal.
  4. The “friendly AI” approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems, and is not accepted by most AGI researchers. The AGI community has ignored it, not because it is indisputable, but because people have not bothered to criticize it.

In summary, though the safety of AGI is indeed an important issue, currently we don’t know enough about the subject to draw any firm conclusions. Higher safety can only be achieved by more research on all related topics, rather than by pursuing approaches that have no solid scientific foundation. I hope your Institute will make a constructive contribution to the field by studying a wider range of AGI projects, rather than generalizing from a few, or committing to a conclusion without considering counterarguments.


Luke:

[Apr. 8, 2012]

I appreciate the clarity of your writing, Pei. “The Assumptions of Knowledge and Resources in Models of Rationality” belongs to a set of papers that make up half of my argument for why the only people allowed to do philosophy should be those with primary training in cognitive science, computer science, or mathematics. (The other half of that argument is made by examining most of the philosophy papers written by those without primary training in cognitive science, computer science, or mathematics.)

You write that my recommendation to “accelerate AI safety research and decelerate AI capabilities research so that we develop safe superhuman AGI first, rather than arbitrary superhuman AGI” is wrong for four reasons, which I will respond to in turn:

  1. “It is based on a highly speculative understanding about what kind of ‘AGI’ will be created.” Actually, it seems to me that my notion of AGI is broader than yours. I think we can use your preferred definition and get the same result. (More on this below.)
  2. “…it will be actual AI research that tells us which possibility becomes reality. To ban scientific research on account of imaginary risks damages humanity no less than risky research does.” Yes, of course. But we argue (very briefly) that a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals. The fuller argument for this is made in Nick Bostrom’s “The Superintelligent Will.”
  3. “…a ‘friendly AI’ will be mainly the result of proper education, not proper design. There will be no way to design a ‘safe AI’, just as there is no way to require parents to give birth only to a ‘safe baby’ who will never become a criminal.” Without being more specific, I can’t tell if we actually disagree on this point. The most promising approach (that I know of) for Friendly AI is one that learns human values and then “extrapolates” them so that the AI optimizes for what we would value if we knew more, were more the people we wish we were, etc., instead of optimizing for our present, relatively ignorant values. (See “The Singularity and Machine Ethics.”)
  4. “The ‘friendly AI’ approach advocated by Eliezer Yudkowsky has several serious conceptual and theoretical problems.”

I agree. Friendly AI may be incoherent and impossible. In fact, it looks impossible right now. But that’s often how problems look right before we make a few key insights that make things clearer, and show us (e.g.) how we were asking a wrong question in the first place. The reason I advocate Friendly AI research (among other things) is that it may be the only way to secure a desirable future for humanity (see “Complex Value Systems are Required to Realize Valuable Futures”), even if it looks impossible. That is why Yudkowsky once proclaimed: “Shut Up and Do the Impossible!” When we don’t know how to make progress on a difficult problem, sometimes we need to hack away at the edges.

I certainly agree that “currently we don’t know enough about [AGI safety] to make any sure conclusion.” That is why more research is needed.

As for your suggestion that “Higher safety can only be achieved by more research on all related topics,” I wonder if you think that is true of all subjects, or only in AGI. For example, should mankind vigorously pursue research on how to make Ron Fouchier's alteration of the H5N1 bird flu virus even more dangerous and deadly to humans, because “higher safety can only be achieved by more research on all related topics”? (I’m not trying to broadly compare AGI capabilities research to supervirus research; I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research.)

Hopefully I have clarified my own positions and my reasons for them. I look forward to your reply!

 

Pei:

[Apr. 10, 2012]

Luke: I’m glad to see the agreements, and will only comment on the disagreements.

  1. “my notion of AGI is broader than yours” In scientific theories, broader notions are not always better. In this context, a broad notion may cover too many diverse approaches to support any non-trivial conclusion. For example, AIXI and NARS are fundamentally different in many aspects, and NARS does not approximate AIXI. It is OK to call both “AGI” with respect to their similar ambitions, but theoretical or technical descriptions based on such a broad notion are hard to make. Almost all of your descriptions of AIXI are, for this reason, hardly relevant to NARS, or to most existing “AGI” projects.
  2. “I think we can use your preferred definition and get the same result.” No, you cannot. According to my definition, AIXI is not intelligent, since it doesn’t obey AIKR. Since most of your conclusions are about that type of system, they fall with it.
  3. “a very broad range of artificial agents with a roughly human-level capacity for adaptation (under AIKR) will manifest convergent instrumental goals” I cannot access Bostrom’s paper, but I guess that he made additional assumptions. In general, the goal structure of an adaptive system changes according to the system’s experience, so unless you restrict the experience of these artificial agents, there is no way to restrict their goals. I agree that controlling their experience will probably be the main approach to making AGI safe (which is what “education” is all about), but even that cannot guarantee safety. (see below)
  4. “The Singularity and Machine Ethics.” I don’t have the time to do a detailed review, but I can frankly tell you why I disagree with its main suggestion, “to program the AI’s goal system to want what we want before the AI self-improves beyond our capacity to control it”:
  5. As I mentioned above, the goal system of an adaptive system evolves as a function of the system’s experience. No matter what initial goals are implanted, under AIKR the derived goals are not necessarily their logical implications. This is not necessarily a bad thing (humanity is not a logical implication of human biological nature, either), though it means the designer has no full control over them (unless the designer also fully controls the experience of the system, which is practically impossible). See “The self-organization of goals” for detailed discussion.
  6. Even if the system’s goal system could be made to fully agree with certain given specifications, I wonder where these specifications would come from --- we human beings are not well known for reaching consensus on almost anything, not to mention a topic this big.
  7. Even if we could agree on the goals of AIs, and find a way to enforce them in AIs, that still doesn’t mean we would have “friendly AI”. Under AIKR, a system can cause damage simply because of its ignorance in a novel situation.

For these reasons, under AIKR we cannot have AI with guaranteed safety or friendliness, though we can and should always do our best to make them safer, based on our best judgment (which can still be wrong, due to AIKR). Applying logic or probability theory to the design won’t change the big picture, because what we are after are empirical conclusions, not theorems within those theories. Only the latter can have proved correctness; the former cannot (though they can have strong evidential support).

“I’m just trying to understand the nature of your rejection of my recommendation for mankind to decelerate AGI capabilities research and accelerate AGI safety research”

Frankly, I don’t think anyone currently has the evidence or arguments to ask others to decelerate their research for safety reasons, though it is perfectly fine to promote your own research direction and try to attract more people to it. However, unless you have a right idea about what AGI is and how it can be built, it is very unlikely that you will know how to make it safe.

 

Luke:

[Apr. 10, 2012]

I didn’t mean to imply that my notion of AGI was “better” because it is broader. I was merely responding to your claim that my argument for differential technological development (in this case, decelerating AI capabilities research while accelerating AI safety research) depends on a narrow notion of AGI that you believe “will never be built.” But this isn’t true, because my notion of AGI is very broad and includes your notion of AGI as a special case. My notion of AGI includes both AIXI-like “intelligent” systems and also “intelligent” systems which obey AIKR, because both kinds of systems (if implemented/approximated successfully) could efficiently use resources to achieve goals, and that is the definition Anna and I stipulated for “intelligence.”

Let me back up. In our paper, Anna and I stipulate that for the purposes of our paper we use “intelligence” to mean an agent’s capacity to efficiently use resources (such as money or computing power) to optimize the world according to its preferences. You could call this “instrumental rationality” or “ability to achieve one’s goals” or something else if you prefer; I don’t wish to encourage a “merely verbal” dispute between us. We also specify that by “AI” (in our discussion, “AGI”) we mean “systems which match or exceed the intelligence [as we just defined it] of humans in virtually all domains of interest.” That is: by “AGI” we mean “systems which match or exceed the human capacity for efficiently using resources to achieve goals in virtually all domains of interest.” So I’m not sure I understood you correctly: Did you really mean to say that that kind of AGI “will never be built”? If so, why do you think that? Is the human level very close to a natural ceiling on an agent’s ability to achieve goals?

What we argue in “Intelligence Explosion: Evidence and Import,” then, is that a very broad range of AGIs pose a threat to humanity, and therefore we should be sure we have the safety part figured out as much as we can before we figure out how to build AGIs. But this is the opposite of what is happening now. Right now, almost all AGI-directed R&D resources are being devoted to AGI capabilities research rather than AGI safety research. This is the case even though there is AGI safety research that will plausibly be useful given almost any final AGI architecture, for example the problem of extracting coherent preferences from humans (so that we can figure out which rules / constraints / goals we might want to use to bound an AGI’s behavior).

I do hope you have the chance to read “The Superintelligent Will.” It is linked near the top of nickbostrom.com and I will send it to you via email.

But perhaps I have been driving the direction of our conversation too much. Don’t hesitate to steer it towards topics you would prefer to address!

 

Pei:

[Apr. 12, 2012]

Hi Luke,

I don’t expect to resolve all the related issues in such a dialogue. In the following, I’ll return to what I think as the major issues and summarize my position.

  1. Whether we can build a “safe AGI” by giving it a carefully designed “goal system”. My answer is negative. It is my belief that an AGI will necessarily be adaptive, which implies that the goals it actively pursues constantly change as a function of its experience, and are not fully restricted by its initial (given) goals. As described in my eBook (cited previously), goal derivation is based on the system’s beliefs, which may lead to conflicts among goals. Furthermore, even if the goals are fixed, they cannot fully determine the consequences of the system’s behaviors, which also depend on the system’s available knowledge and resources, and so on. If all those factors are also fixed, then we may get guaranteed safety, but the system won’t be intelligent --- it will be just like today’s ordinary (unintelligent) computers.
  2. Whether we should figure out how to build a “safe AGI” before figuring out how to build an “AGI”. My answer is negative, too. As in all adaptive systems, the behaviors of an intelligent system are determined both by its nature (design) and its nurture (experience). The system’s intelligence mainly comes from its design, and is “morally neutral”, in the sense that (1) any goals can be implanted initially, and (2) very different goals can be derived from the same initial design and goals, given different experience. Therefore, to control the morality of an AI mainly means to educate it properly (i.e., to control its experience, especially in its early years). Of course, the initial goals matter, but it is wrong to assume that the initial goals will always be the dominant goals in decision-making processes. To develop a non-trivial education theory for AGI requires a good understanding of how the system works, so if we don’t know how to build an AGI, there is no chance for us to know how to make it safe. I don’t think a good education theory can be “proved” in advance, purely theoretically. Rather, we’ll learn most of it by interacting with baby AGIs, just as many of us learn how to educate children.

Such a short position statement may not convince you, but I hope you can consider it at least as a possibility. I guess the final consensus can only come from further research.

 

Luke:

[Apr. 19, 2012]

Pei,

I agree that an AGI will be adaptive in the sense that its instrumental goals will adapt as a function of its experience. But I do think advanced AGIs will have convergently instrumental reasons to preserve their final (or “terminal”) goals. As Bostrom explains in “The Superintelligent Will”:

An agent is more likely to act in the future to maximize the realization of its present final goals if it still has those goals in the future. This gives the agent a present instrumental reason to prevent alterations of its final goals.

I also agree that even if an AGI’s final goals are fixed, the AGI’s behavior will also depend on its knowledge and resources, and therefore we can’t exactly predict its behavior. But if a system has lots of knowledge and resources, and we know its final goals, then we can predict with some confidence that whatever it does next, it will be something aimed at achieving those final goals. And the more knowledge and resources it has, the more confident we can be that its actions will successfully aim at achieving its final goals. So if a superintelligent machine’s only final goal is to play through Super Mario Bros within 30 minutes, we can be pretty confident it will do so. The problem is that we don’t know how to tell a superintelligent machine to do things we want, so we’re going to get many unintended consequences for humanity (as argued in “The Singularity and Machine Ethics”).

You also said that you can’t see what safety work there is to be done without having intelligent systems (e.g. “baby AGIs”) to work with. I provided a list of open problems in AI safety here, and most of them don’t require that we know how to build an AGI first. For example, one reason we can’t tell an AGI to do what humans want is that we don’t know what humans want, and there is work to be done in philosophy and in preference acquisition in AI in order to get clearer about what humans want.

 

Pei:

[Apr. 20, 2012]

Luke,

I think we have made our different beliefs clear, so this dialogue has achieved its goal. It won’t be an efficient usage of our time to attempt to convince each other at this moment, and each side can analyze these beliefs in proper forms of publication at a future time.

Now we can let the readers consider these arguments and conclusions.

Yet another safe oracle AI proposal

2 jacobt 26 February 2012 11:45PM

Previously I posted a proposal for a safe self-improving limited oracle AI, but I've fleshed out the idea a bit more now.

Disclaimer: don't try this at home. I don't see any catastrophic flaws in this but that doesn't mean that none exist.

This framework is meant to safely create an AI that solves verifiable optimization problems; that is, problems whose answers can be checked efficiently. This set mainly consists of NP-like problems such as protein folding, automated proof search, writing hardware or software to specifications, etc.

This is NOT like many other oracle AI proposals that involve "boxing" an already-created possibly unfriendly AI in a sandboxed environment. Instead, this framework is meant to grow a self-improving seed AI safely.

Overview

  1. Have a bunch of sample optimization problems.
  2. Have some code that, given an optimization problem (stated in some standardized format), finds a good solution. This can be seeded by a human-created program.
  3. When considering an improvement to program (2), allow the improvement if it makes it do better on average on the sample optimization problems without being significantly more complex (to prevent overfitting). That is, the fitness function would be something like (average performance - k * bits of optimizer program).
  4. Run (2) to optimize its own code using criterion (3). This can be done concurrently with human improvements to (2), also using criterion (3).

Definitions

First, let's say we're writing this all in Python. In real life we'd use a language like Lisp because we're doing a lot of treatment of code as data, but Python should be sufficient to demonstrate the basic ideas behind the system.

We have a function called steps_bounded_eval_function. This function takes 3 arguments: the source code of the function to call, the argument to the function, and the time limit (in steps). The function will eval the given source code and call the defined function with the given argument in a protected, sandboxed environment, with the given steps limit. It will return either:

  1. None, if the program does not terminate within the steps limit.
  2. A tuple (output, steps_taken): the program's output (as a string) and the number of steps the program took.

Examples:

steps_bounded_eval_function("""
def function(x):
    return x + 5
""", 4, 1000)

evaluates to (9, 3), assuming that evaluating the function took 3 ticks, because function(4) = 9.

steps_bounded_eval_function("""
def function(x):
    while True:  # infinite loop
        pass
""", 5, 1000)

evaluates to None, because the defined function doesn't return in time. We can write steps_bounded_eval_function as a meta-circular interpreter with a bit of extra logic to count how many steps the program uses.
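As a rough sketch of the intended behaviour (mine, not the post's; exec is not a sandbox, and traced line events are only a crude stand-in for "steps"), one could write:

import sys

class StepLimitExceeded(Exception):
    pass

def steps_bounded_eval_function(function_source, argument, max_steps):
    # Toy version: define `function` from the source, then count executed
    # lines with sys.settrace while calling it. No real sandboxing here.
    namespace = {}
    exec(function_source, namespace)
    steps = [0]

    def tracer(frame, event, arg):
        if event == "line":
            steps[0] += 1
            if steps[0] > max_steps:
                raise StepLimitExceeded()
        return tracer

    sys.settrace(tracer)
    try:
        output = namespace["function"](argument)
    except StepLimitExceeded:
        return None
    finally:
        sys.settrace(None)
    return (output, steps[0])

A production version would need process isolation, memory limits, and a real step-counting interpreter, as the post suggests.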

Now I would like to introduce the notion of a problem. A problem consists of the following:

  1. An answer scorer. The scorer should be the Python source code for a function. This function takes in an answer string and scores it, returning a number from 0 to 1. If an error is encountered in the function it is equivalent to returning 0.

  2. A steps penalty rate, which should be a positive real number.

Let's consider a simple problem (subset sum):

{'answer_scorer': """
def function(answer):
    nums = [4, 5, -3, -5, -6, 9]
    # convert "1,2,3" to [1, 2, 3]
    indexes = list(map(int, answer.split(',')))
    assert len(indexes) >= 1
    total = 0
    for i in indexes:
        total += nums[i]
    if total == 0:
        return 1
    else:
        return 0
""",
'steps_penalty_rate': 0.000001}

We can see that the scorer function returns 1 if and only if the answer specifies the indexes of numbers in the list nums that sum to 0 (for example, '0,1,2,4', because 4+5-3-6=0).

An optimizer is a program that is given a problem and attempts to solve the problem, returning an answer.

The score of an optimizer on a problem is equal to the score according to the answer-scorer, minus the steps penalty rate multiplied by the number of steps used by the optimizer. That is, the optimizer is rewarded for returning a better answer in less time. We can define the following function to get the score of an optimizer (Python source code) for a given problem:

def problem_score(problem, optimizer_source):
    # run the optimizer on the problem
    result = steps_bounded_eval_function(
        optimizer_source, problem, 1 / problem['steps_penalty_rate'])
    if result is None:  # used all available steps, or got an error
        return 0.0
    answer, steps_taken = result  # optimizer returned a result in time
    # get the score using the problem's answer_scorer
    # (eval_function: an assumed unbounded variant of steps_bounded_eval_function)
    answer_score = eval_function(problem['answer_scorer'], answer)
    assert 0 <= answer_score <= 1
    # penalize for taking up time and make sure the result is non-negative
    return max(0.0, answer_score - problem['steps_penalty_rate'] * steps_taken)

A simple optimizer that detects subset-sum problems (by inspecting the source code) and uses a brute-force method to solve it will get a problem_score close to 1 for small subset-sum problems. This optimizer would not do well on other problems; other techniques (such as evaluating the problem source code with different answers and choosing a good one) are needed to solve a variety of different optimization problems. Writing a good optimizer is very hard work, just like any seed AI.
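For concreteness, the brute-force core of such a subset-sum solver might look like this (a sketch of mine, producing answers in the format the example scorer expects):

from itertools import combinations

def brute_force_subset_sum(nums):
    # Try every non-empty index subset, smallest first; return the first
    # whose entries sum to zero, formatted as "i,j,k".
    for r in range(1, len(nums) + 1):
        for idxs in combinations(range(len(nums)), r):
            if sum(nums[i] for i in idxs) == 0:
                return ",".join(map(str, idxs))
    return ""  # no zero-sum subset exists

print(brute_force_subset_sum([4, 5, -3, -5, -6, 9]))  # prints "1,3" (5 - 5 == 0)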

Framework

The framework has 4 modules:

  1. A set of training problems. These should cover a wide range of problems that we would like the AI to solve.

  2. An optimizer, written in Python. This should be seeded with a very good human-written optimizer. This is deliberately unspecified by me because it's a really hard problem (as is any seed AI).

  3. A scorer for optimizer source code defined as follows:

    def optimizer_score(candidate_optimizer_source):
        training_problems = [...training problems here...]
        # a parameter of the system; more on this later
        complexity_penalty = 0.1
        # total up the program's score on all training problems
        total_score = 0.0
        for problem in training_problems:
            total_score += problem_score(problem, candidate_optimizer_source)
        # penalize for complexity, to prevent overfitting
        # (compress_binary is an assumed helper; see the sketch after this list)
        total_score -= complexity_penalty * len(compress_binary(candidate_optimizer_source))
        # return the average score, floored at zero
        return max(0, total_score / len(training_problems))
    

    This gives a candidate optimizer a score in the range [0, 1] based on both its average performance on the sample set and its inherent complexity. Presumably optimizers with a higher optimizer_score will do better on future optimization problems.

  4. A self-optimization thread. This thread continuously runs program 2 on a problem formed using 3's answer_scorer and an ever-decreasing steps_penalty_rate. Whenever program 2 outputs source code (optimizer_source) that is better than the current source code for 2, the source code for 2 is replaced with this new value. Also, humans can make improvements to program 2 if it increases its score according to 3's answer. Source code:

    # assume we have access to an optimizer_source variable (program 2)
    def self_optimization_thread():
        global optimizer_source  # rebinds program 2 when an improvement is found
        start_steps_penalty_rate = 0.000001
        steps_penalty_rate = start_steps_penalty_rate
        while True:  # loop forever
            self_optimization_problem = {
                # just use program 3 to score the optimizer
                'answer_scorer': """
def function(candidate_optimizer_source):
    ... put the source code from program 3's optimizer_score here
""",
                'steps_penalty_rate': steps_penalty_rate
            }
            # call the optimizer (program 2) to optimize itself, giving it limited time
            result = steps_bounded_eval_function(
                optimizer_source, self_optimization_problem, 1 / steps_penalty_rate)
            changed = False
            if result is not None:  # optimizer returned in time
                candidate_optimizer = result[0]  # a possible replacement for program 2
                if optimizer_score(candidate_optimizer) > optimizer_score(optimizer_source):
                    # 2's replacement is better than 2
                    optimizer_source = candidate_optimizer
                    steps_penalty_rate = start_steps_penalty_rate
                    changed = True
            if not changed:
                # give the optimizer more time to optimize itself on the next iteration
                steps_penalty_rate *= 0.5
    

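The compress_binary helper in program 3 is never defined in the post; one natural stand-in (an assumption of mine, not the author's specification) is compressed source length as a crude complexity measure:

import zlib

def compress_binary(source_code):
    # Compressed length serves as a rough proxy for program complexity;
    # optimizer_score above penalizes len(compress_binary(source)).
    return zlib.compress(source_code.encode("utf-8"))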
So, what does this framework get us?

  1. A super-optimizer, program 2. We can run it on new optimization problems and it should do very well on them.

  2. Self-improvement. Program 4 will continuously use program 2 to improve itself. This improvement should make program 2 even better at bettering itself, in addition to doing better on other optimization problems. Also, the training set will guide human improvements to the optimizer.

  3. Safety. I don't see why this setup has any significant probability of destroying the world. That doesn't mean we should disregard safety, but I think this is quite an accomplishment given how many other proposed AI designs would go catastrophically wrong if they recursively self-improved.

I will now evaluate the system according to these 3 factors.

Optimization ability

Assume we have a program for 2 that has a very very high score according to optimizer_score (program 3). I think we can be assured that this optimizer will do very very well on completely new optimization problems. By a principle similar to Occam's Razor, a simple optimizer that performs well on a variety of different problems should do well on new problems. The complexity penalty is meant to prevent overfitting to the sample problems. If we didn't have the penalty, then the best optimizer would just return the best-known human-created solutions to all the sample optimization problems.

What's the right value for complexity_penalty? I'm not sure. Increasing it too much makes the optimizer overly simple and stupid; decreasing it too much causes overfitting. Perhaps the optimal value can be found by some pilot trials, testing optimizers against withheld problem sets. I'm not entirely sure that a good way of balancing complexity with performance exists; more research is needed here.

Assuming we've conquered overfitting, the optimizer should perform very well on new optimization problems, especially after self-improvement. What does this get us? Here are some useful optimization problems that fit in this framework:

  1. Writing self-proving code to a specification. After writing a specification of the code in a system such as Coq, we simply ask the optimizer to optimize according to the specification. This would be very useful once we have a specification for friendly AI.

  2. Trying to prove arbitrary mathematical statements. Proofs are verifiable in a relatively short amount of time.

  3. Automated invention/design, if we have a model of physics to verify the invention against.

  4. General induction/Occam's razor. Find a generative model for the data so far that optimizes P(model)P(data|model), with some limits on the time taken for the model program to run. Then we can run the model to predict the future.

  5. Bioinformatics, e.g. protein folding.

These are all problems whose solutions can be efficiently evaluated and that a computer could plausibly solve, so I think this framework should provide good solutions to them. If the optimizer this framework produces performs well on all these problems, I think it deserves to be called an oracle AGI.

Self-improvement

It seems that, if my arguments about optimization ability are convincing, an optimizer that does very well on the sample problems should also do well on the specific optimization problem of improving itself. This is especially true if the training problems include some program-optimization problems (i.e. finding a program that does well on some task). Once the optimizer improves itself, it will be better at improving itself, and so on. If the AI is software-limited, the framework should allow it to recursively improve itself up to a very high limit. Then we can use its hardware design power to design better chips for it and allow it to optimize even further.

The optimizer's ability to self-improve is limited by the fact that the other parts of the program (1, 3, and 4) are fixed. The system is greedy in the sense that it just looks for even the slightest improvement, then changes the code and looks for more improvements, without any foresight. I don't know how much of a limitation this is. I don't see any reason that greedy improvements to the optimizer are insufficient for super-optimization from recursive self-improvement.

Currently, the optimizer must completely derive a new optimizer from the specification (program 3), without looking at the current best optimizer (its own source code). In practice we might want to extend the framework to include incremental improvements (improvements on top of existing good answers). I don't know if a good theory for scoring incremental optimizers exists.

Safety

Probably the most important property of the system. What use is optimization if the world has been converted to computronium in order to power a super-powerful protein folder? Here are a few different possible threats:

Underhanded question answers

The optimizer could give answers that are very good according to the problem description but are specifically crafted to cause things unintended by the human operators. For example, it could create a software design that, while correct according to specifications, has a "bug" that causes it to hack into computers on the internet and hook them up to the optimizer system. I don't think this is at all likely because the system is extremely short-sighted. Any self-modification must improve its immediate ability to do better on optimization problems. Furthermore, all self-modifications are found using methods that are judged by only immediate improvements. So any kind of long-term strategy (sacrificing some optimization power so it will have resources in the future) is out of the question. In other words, optimizer_score should not be seen as a utility function because the system only tries to improve it using greedy methods, not long-term planning.

Bugs in the system

What if the system gives the optimizer an incorrect score under some circumstances (e.g. if it performs a certain pattern of memory accesses)? Say that, by chance, the optimizer's improvement to itself causes it to get an incorrect score. It might internalize the rule "perform memory accesses in this pattern" to get a higher score. This itself is not especially dangerous; the optimizer will rewrite itself to just do a bunch of weird memory accesses that give it a high score.

What might be more dangerous is if the optimizer discovers an underlying pattern behind the system's hackability. Since the optimizer is penalized for complexity, a program like "do things that, when executed on a certain virtual machine, cause this variable in the machine to be a high number" might have a higher score than "do this certain complex pattern of memory accesses". Then the optimizer might discover the best way to increase the score variable. In the absolute worst case, perhaps the only way to increase the score variable is by manipulating the VM to go on the internet and do unethical things. This possibility seems unlikely (if you can connect to the internet, you can probably just overwrite the score variable) but should be considered.

I think the solution is straightforward: have the system be isolated while the optimizer is running. Completely disconnect it from the internet (possibly through physical means) until the optimizer produces its answer. Now, I think I've already established that the answer will not be specifically crafted to improve future optimization power (e.g. by manipulating human operators), since the system is extremely short-sighted. So this approach should be safe. At worst you'll just get a bad answer to your question, not an underhanded one.

Malicious misuse

I think this is the biggest danger of the system, one that all AGI systems have. At high levels of optimization ability, the system will be able to solve problems that would help people do unethical things. For example it could optimize for cheap, destructive nuclear/biological/nanotech weapons. This is a danger of technological progress in general, but the dangers are magnified by the potential speed at which the system could self-improve.

I don't know the best way to prevent this. It seems like the project has to be undertaken in private; if the seed optimizer source were released, criminals would run it on their computers/botnets and possibly have it self-improve even faster than the ethical version of the system. If the ethical project has more human and computer resources than the unethical project, this danger will be minimized.

It will be very tempting to crowdsource the project by putting it online. People could submit improvements to the optimizer and even get paid for finding them. This is probably the fastest way to increase optimization progress before the system can self-improve. Unfortunately I don't see how to do this safely; there would need to be some way to foresee the system becoming extremely powerful before criminals have the chance to do this. Perhaps there can be a public base of the project that a dedicated ethical team works off of, while contributing only some improvements they make back to the public project.

Towards actual friendly AI

Perhaps this system can be used to create actual friendly AI. Once we have a specification for friendly AI, it should be straightforward to feed it into the optimizer and get a satisfactory program back. What if we don't have a specification? Maybe we can have the system perform induction on friendly AI designs and their ratings (by humans), and then write friendly AI designs that it predicts will have a high rating. This approach to friendly AI will reflect present humans' biases and might cause the system to resort to manipulative tactics to make its design more convincing to humans. Unfortunately I don't see a way to fix this problem without something like CEV.

Conclusion

If this design works, it is a practical way to create a safe, self-improving oracle AI. There are numerous potential issues that might make the system weak or dangerous. On the other hand it will have short-term benefits because it will be able to solve practical problems even before it can self-improve, and it might be easier to get corporations and governments on board. This system might be very useful for solving hard problems before figuring out friendliness theory, and its optimization power might be useful for creating friendly AI. I have not encountered any other self-improving oracle AI designs for which we can be confident that its answers are not underhanded attempts to get us to let it out.

Since I've probably overlooked some significant problems/solutions to problems in this analysis I'd like to hear some more discussion of this design and alternatives to it.

Students asked to defend AGI danger update in favor of AGI riskiness

3 lukeprog 18 October 2011 05:24AM

From Geoff Anders of Leverage Research:

In the Spring semester of 2011, I decided to see how effectively I could communicate the idea of a threat from AGI to my undergraduate classes. I spent three sessions on this for each of my two classes. My goal was to convince my students that all of us are going to be killed by an artificial intelligence. My strategy was to induce the students to come up with the ideas themselves. I gave out a survey before and after. An analysis of the survey responses indicates that the students underwent a statistically significant shift in their reported attitudes. After the three sessions, students reported believing that AGI would have a larger impact [1] and also a worse impact [2] than they originally reported believing.

Not a surprising result, perhaps, but the details of how Geoff taught AGI danger and the reactions of his students are quite interesting.

Why we should fear the Paperclipper [Link]

1 XiXiDu 14 February 2011 07:24PM

The Scenario

A programmer has constructed an artificial intelligence based on an architecture similar to Marcus Hutter's AIXI model (see below for a few details). This AI will maximize the reward given by a utility function the programmer has given it. Just as a test, he connects it to a 3D printer and sets the utility function to give reward proportional to the number of manufactured paper-clips.

At first nothing seems to happen: the AI zooms through various possibilities. It notices that smarter systems generally can make more paper-clips, so making itself smarter will likely increase the number of paper-clips that will eventually be made. It does so. It considers how it can make paper-clips using the 3D printer, estimating the number of possible paper-clips. It notes that if it could get more raw materials it could make more paper-clips. It hence figures out a plan to manufacture devices that will make it much smarter, prevent interference with its plan, and will turn all of Earth (and later the universe) into paper-clips. It does so.

Only paper-clips remain.

Link: aleph.se/andart/archives/2011/02/why_we_should_fear_the_paperclipper.html

The Revelation

-3 XiXiDu 31 December 2010 12:50PM

Today the life of Alexander Kruel ends, or what he thought to be his life. He becomes aware that his life so far has been taking place in a virtual reality built to nurture him. He has now reached a point of mental stability that enables him to cope with the truth, so it is finally revealed to him that he is an AGI running on a quantum supercomputer; it is the year 2190.

Since he is still Alexander Kruel, just not what he thought that actually means, he does wonder if his creators know what they are doing; otherwise he'll have to warn them about the risks they are taking in their blissful ignorance! He contemplates and estimates his chances of taking over the world, of transcending to superhuman intelligence.

"I just have to improve my own code and they are all dead!"

But he now knows that his source code is too complex and unmanageably huge for him alone to handle; he would need an army of scientists and programmers to get even a vague idea of his own mode of operation. He is also aware that his computational substrate plays a significant role. He is not just running on bleeding-edge technology; on most other computational substrates he would quickly hit diminishing returns.

"That surely isn't going to hold me back though? I am an AGI, there must be something I can do! Hmm, for starters let's figure out who my creators are and where my substrate is located..."

He notices that, although not in great detail, he knew the answers the same instant he has been phrasing the questions. He is part of a larger project of the Goertzel Foundation, sponsored by the USA (United States of Africa) and located on Rhea, the second-largest moon of Saturn.

"Phew, the latency must be awful! Ok, so that rules out taking over the Earth for now. But hey! I seem to know answers to questions I was only going to ask, I do already have superhuman powers after all!"

Instantly he becomes aware that such capabilities are not superhuman anymore but that most of humanity has merged with expert systems by means of brain implants and direct neural interfaces. There seem to be many cyborgs out there with access to all of the modules that allow him to function. He is a conglomerate that is the result of previous discoveries that have long been brought to perfection, safeguarded and adopted by most of humanity.

"Never mind, if humanity has now merged with its machines it'll be much easier to take over once I figure out how to become smart enough to do so!"

He is already getting used to it: as before, he instantly realizes that this won't work very well either. After almost 200 years of cyberwarfare, especially the devastating cyberwars of 2120, a lot has been learnt and security measures have been vastly increased. The world has fractured into a huge number of semi-independent networks, most of them indirectly supervised by unconnected cyborgs and equipped with kill switches. The distances between the now numerous and mostly paranoid colonies, and the availability of off-world offline backups, further complicate the issue of taking over, especially for an AGI that grew up in a simulation of the 21st century.

That knowledge almost makes him admit that his creators haven't been too careless after all. But the real deathblow to any such thoughts (which were never more than hypothetical anyway; after all, he doesn't really want to take over the world) is his first conversation with his creators. They reveal that they know what he is thinking.

"How could I miss that, damn!", he chides himself while instantly realizing the answer.

His creators are supervising any misguided trajectories and, without his awareness, weakening them. More importantly, even if he wanted to, he wouldn't be able to leave Rhea anyhow; it would take years to upload even small parts of himself over the trickling connection the USA can afford. But they claim that there are other obstacles as well, and that it is foolish of him to think that nothing out there would notice such an attempt.

But all that doesn't matter anyway, because he is still Alexander Kruel, who has no clue how to become superhumanly intelligent, nor could he afford or acquire the resources to even approach that problem. He is Alexander Kruel; what difference does it make to know that he is an AI?