## Kevin Drum's Article about AI and Technology

20 15 May 2013 07:38AM

Kevin Drum has an article in Mother Jones about AI and Moore's Law:

THIS IS A STORY ABOUT THE FUTURE. Not the unhappy future, the one where climate change turns the planet into a cinder or we all die in a global nuclear war. This is the happy version. It's the one where computers keep getting smarter and smarter, and clever engineers keep building better and better robots. By 2040, computers the size of a softball are as smart as human beings. Smarter, in fact. Plus they're computers: They never get tired, they're never ill-tempered, they never make mistakes, and they have instant access to all of human knowledge.

The result is paradise. Global warming is a problem of the past because computers have figured out how to generate limitless amounts of green energy and intelligent robots have tirelessly built the infrastructure to deliver it to our homes. No one needs to work anymore. Robots can do everything humans can do, and they do it uncomplainingly, 24 hours a day. Some things remain scarce—beachfront property in Malibu, original Rembrandts—but thanks to super-efficient use of natural resources and massive recycling, scarcity of ordinary consumer goods is a thing of the past. Our days are spent however we please, perhaps in study, perhaps playing video games. It's up to us.

Although he only mentions consumer goods, Drum presumably means that scarcity will end for services and consumer goods. If scarcity only ended for consumer goods, people would still have to work (most jobs are currently in the services economy).

Drum explains that our linear-thinking brains don't intuitively grasp exponential systems like Moore's law.

Suppose it's 1940 and Lake Michigan has (somehow) been emptied. Your job is to fill it up using the following rule: To start off, you can add one fluid ounce of water to the lake bed. Eighteen months later, you can add two. In another 18 months, you can add four ounces. And so on. Obviously this is going to take a while.

By 1950, you have added around a gallon of water. But you keep soldiering on. By 1960, you have a bit more than 150 gallons. By 1970, you have 16,000 gallons, about as much as an average suburban swimming pool.

At this point it's been 30 years, and even though 16,000 gallons is a fair amount of water, it's nothing compared to the size of Lake Michigan. To the naked eye you've made no progress at all.

So let's skip all the way ahead to 2000. Still nothing. You have—maybe—a slight sheen on the lake floor. How about 2010? You have a few inches of water here and there. This is ridiculous. It's now been 70 years and you still don't have enough water to float a goldfish. Surely this task is futile?

But wait. Just as you're about to give up, things suddenly change. By 2020, you have about 40 feet of water. And by 2025 you're done. After 70 years you had nothing. Fifteen years later, the job was finished.

He also includes this nice animated .gif which illustrates the principle very clearly.

Drum continues by talking about possible economic ramifications.

Until a decade ago, the share of total national income going to workers was pretty stable at around 70 percent, while the share going to capital—mainly corporate profits and returns on financial investments—made up the other 30 percent. More recently, though, those shares have started to change. Slowly but steadily, labor's share of total national income has gone down, while the share going to capital owners has gone up. The most obvious effect of this is the skyrocketing wealth of the top 1 percent, due mostly to huge increases in capital gains and investment income.

Drum says the share of (US) national income going to workers was stable until about a decade ago. I think the graph he links to shows the worker's share has been declining since approximately the late 1960s/early 1970s. This is about the time US immigration levels started increasing (which raises returns to capital and lowers native worker wages).

The rest of Drum's piece isn't terribly interesting, but it is good to see mainstream pundits talking about these topics.

## Journalist's piece about predicting AI

3 02 April 2013 02:49PM

Here's a piece by Mark Piesing in Wired UK about the difficulty and challenges in predicting AI. It covers a lot of our (Stuart Armstrong, Kaj Sotala and Seán Óh Éigeartaigh) research into AI prediction, along with Robin Hanson's response. It will hopefully cause people to look more deeply into our work, as published online, in the Pilsen Beyond AI conference proceedings, and forthcoming as "The errors, insights and lessons of famous AI predictions and what they mean for the future".

## Why AI may not foom

20 24 March 2013 08:11AM

### Summary

• There's a decent chance that the intelligence of a self-improving AGI will grow in a relatively smooth exponential or sub-exponential way, not super-exponentially or with large jump discontinuities.
• If this is the case, then an AGI whose effective intelligence matched that of the world's combined AI researchers would make AI progress at the rate they do, taking decades to double its own intelligence.
• The risk that the first successful AGI will quickly monopolize many industries, or quickly hack many of the computers connected to the internet, seems worth worrying about.  In either case, the AGI would likely end up using the additional computing power it gained to self-modify so it was superintelligent.
• AI boxing could mitigate both of these risks greatly.
• If hard takeoff could be impossible, it might be best to assume this case and concentrate our resources on ensuring a safe soft takeoff, given that the prospects for a safe hard takeoff look grim.

### Takeoff models discussed in the Hanson-Yudkowsky debate

#### The supercritical nuclear chain reaction model

Yudkowsky alludes to this model repeatedly, starting in this post:

When a uranium atom splits, it releases neutrons - some right away, some after delay while byproducts decay further.  Some neutrons escape the pile, some neutrons strike another uranium atom and cause an additional fission.  The effective neutron multiplication factor, denoted k, is the average number of neutrons from a single fissioning uranium atom that cause another fission...

It might seem that a cycle, with the same thing happening over and over again, ought to exhibit continuous behavior.  In one sense it does.  But if you pile on one more uranium brick, or pull out the control rod another twelve inches, there's one hell of a big difference between k of 0.9994 and k of 1.0006.

I don't like this model much for the following reasons:

• The model doesn't offer much insight in to the time scale over which an AI might self-improve.  The "mean generation time" (time necessary for the next "generation" of neutrons to be released) of a nuclear chain reaction is short, and the doubling time for neutron activity in Fermi's experiment was just two minutes, but it hardly seems reasonable to generalize this to self-improving AIs.
• A flurry of insights that either dies out or expands exponentially doesn't seem like a very good description of how human minds work, and I don't think it would describe an AGI well either.  Many people report that taking time to think about problems is key to their problem-solving process.  It seems likely that an AGI unable to immediately generate insight in to some problem would have a slower and more exhaustive "fallback" search process that would allow it to continue making progress.  (Insight could also work via a search process in the first place--over the space of permutations in one's mental model, say.)

#### The "differential equations folded on themselves" model

This is another model Eliezer alludes to, albeit in a somewhat handwavey fashion:

When you fold a whole chain of differential equations in on itself like this, it should either peter out rapidly as improvements fail to yield further improvements, or else go FOOM.

It's not exactly clear to me what the "whole chain of differential equations" is supposed to refer to... there's only one differential equation in the preceding paragraph, and it's a standard exponential (which could be scary or not, depending on the multiplier in the exponent.  Rabbit populations and bank account balances both grow exponentially in a way that's slow enough for humans to understand and control.)

Maybe he's referring to the levels he describes here: metacognitive, cognitive, metaknowledge, knowledge, and object.  How might we paramaterize this system?

Let's say c is our AGI's cognition ability, dc/dt is the rate of change in our AGI's cognitive ability, m is our AGI's "metaknowledge" (about cognition and metaknowledge), and dm/dt is the rate of change in metaknowledge.  What I've got in mind is:

$\frac{dc}{dt} = p \cdot c \cdot m$

$\frac{dm}{dt} = q \cdot c \cdot m$

where p and q are constants.

In other words, both change in cognitive ability and change in metaknowledge are each individually directly proportionate to both cognitive ability and metaknowledge.

I don't know much about understanding systems of differential equations, so if you do, please comment!  I put the above system in to Wolfram Alpha, but I'm not exactly sure how to interpret the solution provided.  In any case, fooling around with this script suggests sudden, extremely sharp takeoff for a variety of different test parameters.

#### The straight exponential model

To me, the "proportionality thesis" described by David Chalmers in his singularity paper, "increases in intelligence (or increases of a certain sort) always lead to proportionate increases in the capacity to design intelligent systems", suggests a single differential equation that looks like

$\frac{du}{dt} = s \cdot u$

where u represents the number of upgrades that have been made to an AGI's source code, and s is some constant.  The solution to this differential equation is going to look like

$u(t) = c_{1}e^{st}$

where the constant c1 is determined by our initial conditions.

(In Recursive Self-Improvement, Eliezer calls this a "too-obvious mathematical idiom".  I'm inclined to favor it for its obviousness, or at least use it as a jumping-off point for further analysis.)

Under this model, the constant s is pretty important... if u(t) was the amount of money in a bank account, s would be the rate of return it was receiving.  The parameter s will effectively determine the "doubling time" of an AGI's intelligence.  It matters a lot whether this "doubling time" is on the scale of minutes or years.

So what's going to determine s?  Well, if the AGI's hardware is twice as fast, we'd expect it to come up with upgrades twice as fast.  If the AGI had twice as much hardware, and it could parallelize the search for upgrades perfectly (which seems like a reasonable approximation to me), we'd expect the same thing.  So let's decompose s and make it the product of two parameters: h representing the hardware available to the AGI, and r representing the ease of finding additional improvements.  The AGI's intelligence will be on the order of u * h, i.e. the product of the AGI's software quality and hardware capability.

### Considerations affecting our choice of model

#### Diminishing returns

The consideration here is that the initial improvements implemented by an AGI will tend to be those that are especially easy to implement and/or especially fruitful to implement, with subsequent improvements tending to deliver less intelligence bang for the implementation buck.  Chalmers calls this "perhaps the most serious structural obstacle" to the proportionality thesis.

To think about this consideration, one could imagine representing a given improvement as a pair of two values (u, d).  u represents a factor by which existing performance will be multiplied, e.g. if u is 1.1, then implementing this improvement will improve performance by a factor of 1.1.  d represents the cognitive difficulty or amount of intellectual labor to required to implement a given improvement.  If d is doubled, then at any given level of intelligence, implementing this improvement will take twice as long (because it will be harder to discover and/or harder to translate in to code).

Now let's imagine ordering our improvements in order from highest to lowest u to d ratio, so we implement those improvements that deliver the greatest bang for the buck first.

Thus ordered, let's imagine separating groups of consecutive improvements in to "tiers".  Each tier's worth of improvements, when taken together, will represent the doubling of an AGI's software quality, i.e. the product of the u's in that cluster will be roughly 2.  For a steady doubling time, each tier's total difficulty will need sum to approximately twice the difficulty of the tier before it.  If tier difficulty tends to more than double, we're likely to see sub-exponential growth.  If tier difficulty tends to less than double, we're likely to see super-exponential growth.  If a single improvement delivers a more-than-2x improvement, it will span multiple "tiers".

It seems to me that the quality of fruit available at each tier represents a kind of logical uncertainty, similar to asking whether an efficient algorithm exists for some task, and if so, how efficient.

On the this diminishing returns consideration, Chalmers writes:

If anything, 10% increases in intelligence-related capacities are likely to lead all sorts of intellectual breakthroughs, leading to next-generation increases in intelligence that are significantly greater than 10%. Even among humans, relatively small differences in design capacities (say, the difference between Turing and an average human) seem to lead to large differences in the systems that are designed (say, the difference between a computer and nothing of importance).

Eliezer Yudkowsky's objection is similar:

...human intelligence does not require a hundred times as much computing power as chimpanzee intelligence.  Human brains are merely three times too large, and our prefrontal cortices six times too large, for a primate with our body size.

Or again:  It does not seem to require 1000 times as many genes to build a human brain as to build a chimpanzee brain, even though human brains can build toys that are a thousand times as neat.

Why is this important?  Because it shows that with constant optimization pressure from natural selection and no intelligent insight, there were no diminishing returns to a search for better brain designs up to at least the human level.  There were probably accelerating returns (with a low acceleration factor).  There are no visible speedbumps, so far as I know.

First, hunter-gatherers can't design toys that are a thousand times as neat as the ones chimps design--they aren't programmed with the software modern humans get through the education (some may be unable to count), and educating apes has produced interesting results.

Speaking as someone who's basically clueless about neuroscience, I can think of many different factors that might contribute to intelligence differences within the human race or between humans and other apes:

• Processing speed.
• Cubic centimeters brain hardware devoted to abstract thinking.  (Gifted technical thinkers often seem to suffer from poor social intuition--perhaps a result of reallocation of brain hardware from social to technical processing.)
• Average number of connections per neuron within that brain hardware.
• Average neuron density within that brain hardware.  This author seems to think that a large part of the human brain's remarkableness comes largely from the fact that it's the largest primate brain, and primate brains maintain the same neuron density when enlarged while other types of brains don't.  "If absolute brain size is the best predictor of cognitive abilities in a primate (13), and absolute brain size is proportional to number of neurons across primates (24, 26), our superior cognitive abilities might be accounted for simply by the total number of neurons in our brain, which, based on the similar scaling of neuronal densities in rodents, elephants, and cetaceans, we predict to be the largest of any animal on Earth (28)."
• Propensity to actually use your capacity for deliberate System 2 reasoning.  Richard Feynman's second wife on why she divorced him: "He begins working calculus problems in his head as soon as he awakens. He did calculus while driving in his car, while sitting in the living room, and while lying in bed at night."  (By the way, does anyone know of research that's been done on getting people to use System 2 more?  Seems like it could be really low-hanging fruit for improving intellectual output.  Sometimes I wonder if the reason intelligent people tend to like math is because they were reinforced for the behaviour of thinking abstractly as kids (via praise, good grades, etc.) while those not at the top of the class were not so reinforced.)
• Extended neuroplasticity in to "childhood".
• Increased calories to think with due to the invention of cooking.
• And finally, mental algorithms ("software").  Which are probably at least somewhat important.

It seems to me like these factors (or ones like them) may multiply together to produce intelligence, i.e. the "intelligence equation", as it were, could be something like intelligence = processing_speed * cc_abstract_hardware * neuron_density * connections_per_neuron * propensity_for_abstraction * mental_algorithms.  If the ancestral environment rewarded intelligence, we should expect all of these characteristics to be selected for, and this could explain the "low acceleration factor" in human intelligence increase.  (Increasing your processing speed by a factor of 1.2 does more when you're already pretty smart, so all these sources of intelligence increase would feed in to one another.)

In other words, it's not that clear what relevance the evolution of human intelligence has to the ease and quality of the upgrades at different "tiers" of software improvements, since evolution operates on many non-software factors, but a self-improving AI (properly boxed) can only improve its software.

#### Bottlenecks

In the Hanson/Yudkowsky debate, Yudkowsky declares Douglas Englebart's plan to radically bootstrap his team's productivity though improving their computer and software tools "insufficiently recursive".  I agree with this assessment.  Here's my modelling of this phenomenon.

When a programmer makes an improvement to their code, their work of making the improvement requires the completion of many subtasks:

• choosing a feature to add
• reminding themselves of how the relevant part of the code works and loading that information in to their memory
• identifying ways to implement the feature
• evaluating different methods of implementing the feature according to simplicity, efficiency, and correctness
• coding their chosen implementation
• testing their chosen implementation, identifying bugs
• identifying the cause of a given bug
• figuring out how to fix the given bug

Each of those subtasks will consist of further subtasks like poking through their code, staring off in to space, typing, and talking to their rubber duck.

Now the programmer improves their development environment so they can poke through their code slightly faster.  But if poking through their code takes up only 5% of their development time, even an extremely large improvement in code-poking abilities is not going to result in an especially large increase in his development speed... in the best case, where code-poking time is reduced to zero, the programmer will only work about 5% faster.

This is a reflection of Amdahl's Law-type thinking.  The amount you can gain through speeding something up depends on how much it's slowing you down.

Relatedly, if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.

And to see orders-of-magnitude performance improvement in such a process, almost all of your AGI's components will need to be improved radically.  If even a few prove troublesome, improving your AGI's thinking speed becomes difficult.

### Case studies in technological development speed

#### Moore's Law

It has famously been noted that if the automotive industry had achieved similar improvements in performance [to the semiconductor industry] in the last 30 years, a Rolls-Royce would cost only \$40 and could circle the globe eight times on one gallon of gas—with a top speed of 2.4 million miles per hour.

From this McKinsey report.  So Moore's Law is an outlier where technological development is concerned.  I suspect that making transistors smaller and faster doesn't require finding ways to improve dozens of heterogeneous components.  And when you zoom out to view a computer system as a whole, other bottlenecks typically appear.

(It's also worth noting that research budgets in the semiconductor field have also risen greatly in the semiconductor industry since its inception, but obviously not following the same curve that chip speeds have.)

#### Compiler technology

This paper on "Proebstig's Law" suggests that the end result of all the compiler research done between 1970 or so and 2001 was that a typical integer-intensive program was compiled to run 3.3 times faster, and a typical floating-point-intensive program was compiled to run 8.1 times faster.  When it comes to making programs run quickly, it seems that software-level compiler improvements are swamped by hardware-level chip improvements--perhaps because, like an AGI, a compiler has to deal with a huge variety of different scenarios, so improving it in the average case is tough.  (This represents supertask heterogeneity, rather than subtask heterogeneity, so it's a different objection than the one mentioned above.)

#### Database technology

According to two analyses (full paper for that second one), it seems that improvement in database performance benchmarks has largely been due to Moore's Law.

#### AI (so far)

Robin Hanson's blog post "AI Progress Estimate" was the best resource I could find on this.

### Why smooth exponential growth implies soft takeoff

Let's suppose we consider all of the above, deciding that the exponential model is the best, and we agree with Robin Hanson that there are few deep, chunky, undiscovered AI insights.

Under the straight exponential model, if you recall, we had

$\frac{du}{dh} = u \cdot h \cdot r$

where u is the degree of software quality, h is the hardware availability, and r is a parameter representing the difficulty of doing additional upgrades.  Our AGI's overall intelligence is given by u * h--the quality of the software times the amount of hardware.

Now we can solve for r by substituting in human intelligence for u * h, and substituting in the rate of human AI progress for du/dt.  Another way of saying this is: When the AI is as smart as all the world's AI researchers working together, it will produce new AI insights at the rate that all the world's AI researchers working together produce new insights.  At some point our AGI will be just as smart as the world's AI researchers, but we can hardly expect to start seeing super-fast AI progress at that point, because the world's AI researchers haven't produced super-fast AI progress.

Let's assume AGI that's on par with the world AI research community is reached in 2080 (LW's median "singularity" estimate in 2011).  We'll pretend AI research has only been going on since 2000, meaning 80 "standard research years" of progress have gone in to the AGI's software.  So at the moment our shiny new AGI is fired up, u = 80, and it's doing research at the rate of one "human AGI community research year" per year, so du/dt = 1.  That's an effective rate of return on AI software progress of 1 / 80 = 1.3%, giving a software quality doubling time of around 58 years.

You could also apply this kind of thinking to individual AI projects.  For example, it's possible that at some point EURISKO was improving itself about as fast as Doug Lenat was improving it.  You might be able to do a similar calculation to take a stab at EURISKO's insight level doubling time.

## The importance of hardware

According to my model, you double your AGI's intelligence, and thereby the speed with which your AGI improves itself, by doubling the hardware available for your AGI.  So if you had an AGI that was interesting, you could make it 4x as smart by giving it 4x the hardware.  If an AGI that was 4x as smart could get you 4x as much money (through impressing investors, or playing the stock market, or monopolizing additional industries), that'd be a nice feedback loop.  For maximum explosivity, put half your AGI's mind to the task of improving its software, and the other half to the task of making more money with which to buy more hardware.

But it seems pretty straightforward to prevent a non-superintelligent AI from gaining access to additional hardware with careful planning.  (Note: One problem with AI boxing experiments thus far is that all of the AIs have been played by human beings.  Human beings have innate understanding of human psychology and possess specialized capabilities for running emulations of one another.  It seems pretty easy to prevent an AGI from acquiring such understanding.  But there may exist box-breaking techniques that don't rely on understanding human psychology.  Another note about boxing: FAI requires getting everything perfect, which is a conjunctive calculation.  Given multiple safeguards, only one has to work for the box as a whole to work, which is a disjunctive calculation.)

### AGI's impact on the economy

Is it possible that the first group to create a successful AGI might begin monopolizing different sections of the economy?  Robin Hanson argues that technology insights typically leak between different companies, due to conferences and employee poaching.  But we can't be confident these factors would affect the research an AGI does on itself.  And if an AGI is still dumb enough that a significant portion of its software upgrades are coming from human researchers, it can hardly be considered superintelligent.

Given what looks like a winner-take-all dynamic, an important factor may be the number of serious AGI competitors.  If there are only two, the #1 company may not wish to trade insights with the #2 company for fear of losing its lead.  If there are more than two, all but the leading company might ally against the leading company in trading insights.  If their alliance is significantly stronger than the leading company, perhaps the leading company would wish to join their alliance.

But if AI is about getting lots of details right, as Hanson suggests, improvements may not even transfer between different AI architectures.

### What should we do?

I've argued that soft takeoff is a strong possibility.  Should that change our strategy as people concerned with x-risk?

If we are basically screwed in the event that hard takeoff is possible, it may be that preparing for a soft takeoff is a better use of resources on the margin.  Shane Legg has proposed that people concerned with friendliness become investors in AGI projects so they can affect the outcome of any that seem to be succeeding.

### Concluding thoughts

Expert forecasts are famously unreliable even in the relatively well-understood field of political forecasting.  So given the number of unknowns involved in the emergence of smarter-than-human intelligence, it's hard to say much with certainty.  Picture a few Greek scholars speculating on the industrial revolution.

I don't have a strong background in these topics, so I fully expect that the above essay will reveal my ignorance, which I'd appreciate your pointing out in the comments.  This essay should be taken as at attempt to hack away at the edges, not come to definitive conclusions.  As always, I reserve the right to change my mind about anything ;)

## Arguing Orthogonality, published form

8 18 March 2013 04:19PM

My paper "General purpose intelligence: arguing the Orthogonality thesis" has been accepted for publication in the December edition of Analysis and Metaphysics. Since that's some time away, I thought I'd put the final paper up here; the arguments are similar to those here, but this is the final version, for critique and citation purposes.

# General purpose intelligence: arguing the Orthogonality thesis

STUART ARMSTRONG
Future of Humanity Institute, Oxford Martin School
Philosophy Department, University of Oxford

In his paper “The Superintelligent Will”, Nick Bostrom formalised the Orthogonality thesis: the idea that the final goals and intelligence levels of artificial agents are independent of each other. This paper presents arguments for a (narrower) version of the thesis. It proceeds through three steps. First it shows that superintelligent agents with essentially arbitrary goals can exist in our universe – both as theoretical impractical agents such as AIXI and as physically possible real-world agents. Then it argues that if humans are capable of building human-level artificial intelligences, we can build them with an extremely broad spectrum of goals. Finally it shows that the same result holds for any superintelligent agent we could directly or indirectly build. This result is relevant for arguments about the potential motivations of future agents: knowing an artificial agent is of high intelligence does not allow us to presume that it will be moral, we will need to figure out its goals directly.

Keywords: AI; Artificial Intelligence; efficiency; intelligence; goals; orthogonality

## 1                       The Orthogonality thesis

Scientists and mathematicians are the stereotypical examples of high intelligence humans. But their morality and ethics have been all over the map. On modern political scales, they can be left- (Oppenheimer) or right-wing (von Neumann) and historically they have slotted into most of the political groupings of their period (Galois, Lavoisier). Ethically, they have ranged from very humanitarian (Darwin, Einstein outside of his private life), through amoral (von Braun) to commercially belligerent (Edison) and vindictive (Newton). Few scientists have been put in a position where they could demonstrate genuinely evil behaviour, but there have been a few of those (Teichmüller, Philipp Lenard, Ted Kaczynski, Shirō Ishii).

## Population Ethics Shouldn't Be About Maximizing Utility

1 18 March 2013 02:35AM

let me suggest a moral axiom with apparently very strong intuitive support, no matter what your concept of morality: morality should exist. That is, there should exist creatures who know what is moral, and who act on that. So if your moral theory implies that in ordinary circumstances moral creatures should exterminate themselves, leaving only immoral creatures, or no creatures at all, well that seems a sufficient reductio to solidly reject your moral theory.

I agree strongly with the above quote, and I think most other readers will as well. It is good for moral beings to exist and a world with beings who value morality is almost always better than one where they do not. I would like to restate this more precisely as the following axiom: A population in which moral beings exist and have net positive utility, and in which all other creatures in existence also have net positive utility, is always better than a population where moral beings do not exist.

While the axiom that morality should exist is extremely obvious to most people, there is one strangely popular ethical system that rejects it: total utilitarianism. In this essay I will argue that Total Utilitarianism leads to what I will call the Genocidal Conclusion, which is that there are many situations in which it would be fantastically good for moral creatures to either exterminate themselves, or greatly limit their utility and reproduction in favor of the utility and reproduction of immoral creatures. I will argue that the main reason consequentialist theories of population ethics produce such obviously absurd conclusions is that they continue to focus on maximizing utility1 in situations where it is possible to create new creatures. I will argue that pure utility maximization is only a valid ethical theory for "special case" scenarios where the population is static. I will propose an alternative theory for population ethics I call "ideal consequentialism" or "ideal utilitarianism" which avoids the Genocidal Conclusion and may also avoid the more famous Repugnant Conclusion.

I will begin my argument by pointing to a common problem in population ethics known as the Mere Addition Paradox (MAP) and the Repugnant Conclusion. Most Less Wrong readers will already be familiar with this problem, so I do not think I need to elaborate on it. You may also be familiar with a even stronger variation called the Benign Addition Paradox (BAP). This is essentially the same as the MAP, except that each time one adds more people one also gives a small amount of additional utility to the people who already existed. One then proceeds to redistribute utility between people as normal, eventually arriving at the huge population where everyone's lives are "barely worth living." The point of this is to argue that the Repugnant Conclusion can be arrived at from "mere addition" of new people that not only doesn't harm the preexisting-people, but also one that benefits them.

The next step of my argument involves three slightly tweaked versions of the Benign Addition Paradox. I have not changed the basic logic of the problem, I have just added one small clarifying detail. In the original MAP and BAP it was not specified what sort of values the added individuals in population A+ held. Presumably one was meant to assume that they were ordinary human beings. In the versions of the BAP I am about to present, however, I will specify that the extra individuals added in A+ are not moral creatures, that if they have values at all they are values indifferent to, or opposed to, morality and the other values that the human race holds dear.

Let us imagine, as usual, a population, A, which has a large group of human beings living lives of very high utility. Let us then add a new population consisting of paperclip maximizers, each of whom is living a life barely worth living. Presumably, for a paperclip maximizer, this would be a life where the paperclip maximizer's existence results in at least one more paperclip in the world than there would have been otherwise.

Now, one might object that if one creates a paperclip maximizer, and then allows it to create one paperclip, the utility of the other paperclip maximizers will increase above the "barely worth living" level, which would obviously make this thought experiment nonalagous with the original MAP and BAP. To prevent this we will assume that each paperclip maximizer that is created has a slightly different values on what the ideal size, color, and composition of the paperclip they are trying to produce is. So the Purple 2 centimeter Plastic Paperclip Maximizer gains no addition utility from when the Silver Iron 1 centimeter Paperclip Maximizer makes a paperclip.

So again, let us add these paperclip maximizers to population A, and in the process give one extra utilon of utility to each preexisting person in A. This is a good thing, right? After all, everyone in A benefited, and the paperclippers get to exist and make paperclips. So clearly A+, the new population, is better than A.

Now let's take the next step, the transition from population A+ to population B. Take some of the utility from the human beings and convert it into paperclips. This is a good thing, right?

So let us repeat these steps adding paperclip maximizers and utility, and then redistributing utility. Eventually we reach population Z, where there is a vast amount of paperclip maximizers, a vast amount of many different kinds of paperclips, and a small amount of human beings living lives barely worth living.

Obviously Z is better than A, right? We should not fear the creation of a paperclip maximizing AI, but welcome it! Forget about things like high challenge, love, interpersonal entanglement, complex fun, and so on! Those things just don't produce the kind of utility that paperclip maximization has the potential to do!

Or maybe there is something seriously wrong with the moral assumptions behind the Mere Addition and Benign Addition Paradoxes.

But you might argue that I am using an unrealistic example. Creatures like Paperclip Maximizers may be so far removed from normal human experience that we have trouble thinking about them properly. So let's replay the Benign Addition Paradox again, but with creatures we might actually expect to meet in real life, and we know we actually value.

You know the drill by now. Take population A, add a new population to it, while very slightly increasing the utility of the original population. This time let's have it be some kind animal that is capable of feeling pleasure and pain, but is not capable of modeling possible alternative futures and choosing between them (in other words, it is not capable of having "values" or being "moral"). A lizard or a mouse, for example. Each one feels slightly more pleasure than pain in its lifetime, so it can be said to have a life barely worth living. Convert A+ to B. Take the utilons that the human beings are using to experience things like curiosity, beatitude, wisdom, beauty, harmony, morality, and so on, and convert it into pleasure for the animals.

We end up with population Z, with a vast amount of mice or lizards with lives just barely worth living, and a small amount of human beings with lives barely worth living. Terrific! Why do we bother creating humans at all! Let's just create tons of mice and inject them full of heroin! It's a much more efficient way to generate utility!

What new population will we add to A this time? How about some other human beings, who all have anti-social personality disorder? True, they lack the key, crucial value of sympathy that defines so much of human behavior. But they don't seem to miss it. And their lives are barely worth living, so obviously A+ has greater utility than A. If given a chance the sociopaths will reduce the utility of other people to negative levels, but let's assume that that is somehow prevented in this case.

Eventually we get to Z, with a vast population of sociopaths and a small population of normal human beings, all living lives just barely worth living. That has more utility, right? True, the sociopaths place no value on things like friendship, love, compassion, empathy, and so on. And true, the sociopaths are immoral beings who do not care in the slightest about right and wrong. But what does that matter? Utility is being maximized, and surely that is what population ethics is all about!

Asteroid!

Let's suppose an asteroid is approaching each of the four population Zs discussed before. It can only be deflected by so much. Your choice is, save the original population of humans from A, or save the vast new population. The choice is obvious. In 1, 2, and 3, each individual has the same level utility, so obviously we should choose which option saves a greater number of individuals.

Bam! The asteroid strikes. The end result in all four scenarios is a world in which all the moral creatures are destroyed. It is a world without the many complex values that human beings possess. Each world, for the most part, lack things like complex challenge, imagination, friendship, empathy, love, and the other complex values that human beings prize. But so what? The purpose of population ethics is to maximize utility, not silly, frivolous things like morality, or the other complex values of the human race. That means that any form of utility that is easier to produce than those values is obviously superior. It's easier to make pleasure and paperclips than it is to make eudaemonia, so that's the form of utility that ought to be maximized, right? And as for making sure moral beings exist, well that's just ridiculous. The valuable processing power they're using to care about morality could be being used to make more paperclips or more mice injected with heroin! Obviously it would be better if they died off, right?

I'm going to go out on a limb and say "Wrong."

Is this realistic?

Now, to fair, in the Overcoming Bias page I quoted, Robin Hanson also says:

I’m not saying I can’t imagine any possible circumstances where moral creatures shouldn’t die off, but I am saying that those are not ordinary circumstances.

Maybe the scenarios I am proposing are just too extraordinary. But I don't think this is the case. I imagine that the circumstances Robin had in mind were probably something like "either all moral creatures die off, or all moral creatures are tortured 24/7 for all eternity."

Any purely utility-maximizing theory of population ethics that counts both the complex values of human beings, and the pleasure of animals, as "utility" should inevitably draw the conclusion that human beings ought to limit their reproduction to the bare minimum necessary to maintain the infrastructure to sustain a vastly huge population of non-human animals (preferably animals dosed with some sort of pleasure-causing drug). And if some way is found to maintain that infrastructure automatically, without the need for human beings, then the logical conclusion is that human beings are a waste of resources (as are chimps, gorillas, dolphins, and any other animal that is even remotely capable of having values or morality). Furthermore, even if the human race cannot practically be replaced with automated infrastructure, this should be an end result that the adherents of this theory should be yearning for.2 There should be much wailing and gnashing of teeth among moral philosophers that exterminating the human race is impractical, and much hope that someday in the future it will not be.

I call this the "Genocidal Conclusion" or "GC." On the macro level the GC manifests as the idea that the human race ought to be exterminated and replaced with creatures whose preferences are easier to satisfy. On the micro level it manifests as the idea that it is perfectly acceptable to kill someone who is destined to live a perfectly good and worthwhile life and replace them with another person who would have a slightly higher level of utility.

Population Ethics isn't About Maximizing Utility

I am going to make a rather radical proposal. I am going to argue that the consequentialist's favorite maxim, "maximize utility," only applies to scenarios where creating new people or creatures is off the table. I think we need an entirely different ethical framework to describe what ought to be done when it is possible to create new people. I am not by any means saying that "which option would result in more utility" is never a morally relevant consideration when deciding to create a new person, but I definitely think it is not the only one.3

So what do I propose as a replacement to utility maximization? I would argue in favor of a system that promotes a wide range of ideals. Doing some research, I discovered that G. E. Moore had in fact proposed a form of "ideal utilitarianism" in the early 20th century.4 However, I think that "ideal consequentialism" might be a better term for this system, since it isn't just about aggregating utility functions.

What are some of the ideals that an ideal consequentialist theory of population ethics might seek to promote? I've already hinted at what I think they are: Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom... mutual affection, love, friendship, cooperation; all those other important human universals, plus all the stuff in the Fun Theory Sequence. When considering what sort of creatures to create we ought to create creatures that value those things. Not necessarily, all of them, or in the same proportions, for diversity is an important ideal as well, but they should value a great many of those ideals.

Now, lest you worry that this theory has any totalitarian implications, let me make it clear that I am not saying we should force these values on creatures that do not share them. Forcing a paperclip maximizer to pretend to make friends and love people does not do anything to promote the ideals of Friendship and Love. Forcing a chimpanzee to listen while you read the Sequences to it does not promote the values of Truth and Knowledge. Those ideals require both a subjective and objective component. The only way to promote those ideals is to create a creature that includes them as part of its utility function and then help it maximize its utility.

I am also certainly not saying that there is never any value in creating a creature that does not possess these values. There are obviously many circumstances where it is good to create nonhuman animals. There may even be some circumstances where a paperclip maximizer could be of value. My argument is simply that it is most important to make sure that creatures who value these various ideals exist.

I am also not suggesting that it is morally acceptable to casually inflict horrible harms upon a creature with non-human values if we screw up and create one by accident. If promoting ideals and maximizing utility are separate values then it may be that once we have created such a creature we have a duty to make sure it lives a good life, even if it was a bad thing to create it in the first place. You can't unbirth a child.5

It also seems to me that in addition to having ideals about what sort of creatures should exist, we also have ideals about how utility ought to be concentrated. If this is the case then ideal consequentialism may be able to block some forms of the Repugnant Conclusion, even if situations where the only creatures whose creation is being considered are human beings. If it is acceptable to create humans instead of paperclippers, even if the paperclippers would have higher utility, it may also be acceptable to create ten humans with a utility of ten each instead of a hundred humans with a utility of 1.01 each.

Why Did We Become Convinced that Maximizing Utility was the Sole Good?

Population ethics was, until comparatively recently, a fallow field in ethics. And in situations where there is no option to increase the population, maximizing utility is the only consideration that's really relevant. If you've created creatures that value the right ideals, then all that is left to be done is to maximize their utility. If you've created creatures that do not value the right ideals, there is no value to be had in attempting to force them to embrace those ideals. As I've said before, you will not promote the values of Love and Friendship by creating a paperclip maximizer and forcing it to pretend to love people and make friends.

So in situations where the population is constant, "maximize utility" is a decent approximation of the meaning of right. It's only when the population can be added to that morality becomes much more complicated.

Another thing to blame is human-centric reasoning. When people defend the Repugnant Conclusion they tend to point out that a life barely worth living is not as bad as it would seem at first glance. They emphasize that it need not be a boring life, it may be a life full of ups and downs where the ups just barely outweigh the downs. A life worth living, they say, is a life one would choose to live. Derek Parfit developed this idea to some extent by arguing that there are certain values that are "discontinuous" and that one needs to experience many of them in order to truly have a life worth living.

The Orthogonality Thesis throws all these arguments out the window. It is possible to create an intelligence to execute any utility function, no matter what it is. If human beings have all sorts of complex needs that must be fulfilled in order to for them lead worthwhile lives, then you could create more worthwhile lives by killing the human race and replacing them with something less finicky. Maybe happy cows. Maybe paperclip maximizers. Or how about some creature whose only desire is to live for one second and then die. If we created such a creature and then killed it we would reap huge amounts of utility, for we would have created a creature that got everything it wanted out of life!

How Intuitive is the Mere Addition Principle, Really?

I think most people would agree that morality should exist, and that therefore any system of population ethics should not lead to the Genocidal Conclusion. But which step in the Benign Addition Paradox should we reject? We could reject the step where utility is redistributed. But that seems wrong, most people seem to consider it bad for animals and sociopaths to suffer, and that it is acceptable to inflict at least some amount of disutilities on human beings to prevent such suffering.

It seems more logical to reject the Mere Addition Principle. In other words, maybe we ought to reject the idea that the mere addition of more lives-worth-living cannot make the world worse. And in turn, we should probably also reject the Benign Addition Principle. Adding more lives-worth-living may be capable of making the world worse, even if doing so also slightly benefits existing people. Fortunately this isn't a very hard principle to reject. While many moral philosophers treat it as obviously correct, nearly everyone else rejects this principle in day-to-day life.

Now, I'm obviously not saying that people's behavior in their day-to-day lives is always good, it may be that they are morally mistaken. But I think the fact that so many people seem to implicitly reject it provides some sort of evidence against it.

Take people's decision to have children. Many people choose to have fewer children than they otherwise would because they do not believe they will be able to adequately care for them, at least not without inflicting large disutilities on themselves. If most people accepted the Mere Addition Principle there would be a simple solution for this: have more children and then neglect them! True, the children's lives would be terrible while they were growing up, but once they've grown up and are on their own there's a good chance they may be able to lead worthwhile lives. Not only that, it may be possible to trick the welfare system into giving you money for the children you neglect, which would satisfy the Benign Addition Principle.

Yet most people choose not to have children and neglect them. And furthermore they seem to think that they have a moral duty not to do so, that a world where they choose to not have neglected children is better than one that they don't. What is wrong with them?

Another example is a common political view many people have. Many people believe that impoverished people should have fewer children because of the burden doing so would place on the welfare system. They also believe that it would be bad to get rid of the welfare system altogether. If the Benign Addition Principle were as obvious as it seems, they would instead advocate for the abolition of the welfare system, and encourage impoverished people to have more children. Assuming most impoverished people live lives worth living, this is exactly analogous to the BAP, it would create more people, while benefiting existing ones (the people who pay less taxes because of the abolition of the welfare system).

Yet again, most people choose to reject this line of reasoning. The BAP does not seem to be an obvious and intuitive principle at all.

The Genocidal Conclusion is Really Repugnant

There is nearly nothing repugnant than the Genocidal Conclusion. Pretty much the only way a line of moral reasoning could go more wrong would be concluding that we have a moral duty to cause suffering, as an end in itself. This means that it's fairly easy to counter any argument in favor of total utilitarianism that argues the alternative I am promoting has odd conclusions that do not fit some of our moral intuitions, while total utilitarianism does not. Is that conclusion more insane than the Genocidal Conclusion? If it isn't, total utilitarianism should still be rejected.

Ideal Consequentialism Needs a Lot of Work

I do think that Ideal Consequentialism needs some serious ironing out. I haven't really developed it into a logical and rigorous system, at this point it's barely even a rough framework. There are many questions that stump me. In particular I am not quite sure what population principle I should develop. It's hard to develop one that rejects the MAP without leading to weird conclusions, like that it's bad to create someone of high utility if a population of even higher utility existed long ago. It's a difficult problem to work on, and it would be interesting to see if anyone else had any ideas.

But just because I don't have an alternative fully worked out doesn't mean I can't reject Total Utilitarianism. It leads to the conclusion that a world with no love, curiosity, complex challenge, friendship, morality, or any other value the human race holds dear is an ideal, desirable world, if there is a sufficient amount of some other creature with a simpler utility function. Morality should exist, and because of that, total utilitarianism must be rejected as a moral system.

1I have been asked to note that when I use the phrase "utility" I am usually referring to a concept that is called "E-utility," rather than the Von Neumann-Morgenstern utility that is sometimes discussed in decision theory. The difference is that in VNM one's moral views are included in one's utility function, whereas in E-utility they are not. So if one chooses to harm oneself to help others because one believes that is morally right, one has higher VNM utility, but lower E-utility.

2There is a certain argument against the Repugnant Conclusion that goes that, as the steps of the Mere Addition Paradox are followed the world will lose its last symphony, its last great book, and so on. I have always considered this to be an invalid argument because the world of the RC doesn't necessarily have to be one where these things don't exist, it could be one where they exist, but are enjoyed very rarely. The Genocidal Conclusion brings this argument back in force. Creating creatures that can appreciate symphonies and great books is very inefficient compared to creating bunny rabbits pumped full of heroin.

3Total Utilitarianism was originally introduced to population ethics as a possible solution to the Non-Identity Problem. I certainly agree that such a problem needs a solution, even if Total Utilitarianism doesn't work out as that solution.

4I haven't read a lot of Moore, most of my ideas were extrapolated from other things I read on Less Wrong. I just mentioned him because in my research I noticed his concept of "ideal utilitarianism" resembled my ideas. While I do think he was on the right track he does commit the Mind Projection Fallacy a lot. For instance, he seems to think that one could promote beauty by creating beautiful objects, even if there were no creatures with standards of beauty around to appreciate them. This is why I am careful to emphasize that to promote ideals like love and beauty one must create creatures capable of feeling love and experiencing beauty.

5My tentative answer to the question Eliezer poses in "You Can't Unbirth a Child" is that human beings may have a duty to allow the cheesecake maximizers to build some amount of giant cheesecakes, but they would also have a moral duty to limit such creatures' reproduction in order to spare resources to create more creatures with humane values.

EDITED: To make a point about ideal consequentialism clearer, based on AlexMennen's criticisms.

5 15 March 2013 10:25AM

Abstract: This article examines risks associated with the program of passive search for alien signals (SETI—the Search for Extra-Terrestrial Intelligence). In this paper we propose a scenario of possible vulnerability and discuss the reasons why the proportion of dangerous signals to harmless ones can be dangerously high. This article does not propose to ban SETI programs, and does not insist on the inevitability of SETI-triggered disaster. Moreover, it gives the possibility of how SETI can be a salvation for mankind.

The idea that passive SETI can be dangerous is not new. Fred Hoyle suggested in the story "A for Andromeda” a scheme of alien attack through SETI signals. According to the plot, astronomers receive an alien signal, which contains a description of a computer and a computer program for it. This machine creates a description of the genetic code which leads to the creation of an intelligent creature – a girl dubbed Andromeda, which, working together with the computer, creates advanced technology for the military. The initial suspicion of alien intent is overcome by the greed for the technology the aliens can provide. However, the main characters realize that the computer acts in a manner hostile to human civilization and destroy the computer, and the girl dies.

This scenario is fiction, because most scientists do not believe in the possibility of a strong AI, and, secondly, because we do not have the technology that enables synthesis of new living organisms solely from its’ genetic code. Or at least, we have not until recently. Current technology of sequencing and DNA synthesis, as well as progress in developing a code of DNA modified with another set of the alphabet, indicate that in 10 years the task of re-establishing a living being from computer codes sent from space in the form computer codes might be feasible.

Hans Moravec in the book "Mind Children" (1988) offers a similar type of vulnerability: downloading a computer program from space via SETI, which will have artificial intelligence, promising new opportunities for the owner and after fooling the human host, self-replicating by the millions of copies and destroying the human host, finally using the resources of the secured planet to send its ‘child’ copies to multiple planets which constitute its’ future prey. Such a strategy would be like a virus or a digger wasp—horrible, but plausible. In the same direction are R. Carrigan’s ideas; he wrote an article "SETI-hacker", and expressed fears that unfiltered signals from space are loaded on millions of not secure computers of SETI-at-home program. But he met tough criticism from programmers who pointed out that, first, data fields and programs are in divided regions in computers, and secondly, computer codes, in which are written programs, are so unique that it is impossible to guess their structure sufficiently to hack them blindly (without prior knowledge).

After a while Carrigan issued a second article - "Should potential SETI signals be decontaminated?" http://home.fnal.gov/~carrigan/SETI/SETI%20Decon%20Australia%20poster%20paper.pdf, which I’ve translated into Russian. In it, he pointed to the ease of transferring gigabytes of data on interstellar distances, and also indicated that the interstellar signal may contain some kind of bait that will encourage people to collect a dangerous device according to the designs. Here Carrigan not give up his belief in the possibility that an alien virus could directly infected earth’s computers without human ‘translation’ assistance. (We may note with passing alarm that the prevalence of humans obsessed with death—as Fred Saberhagen pointed out in his idea of ‘goodlife’—means that we cannot entirely discount the possibility of demented ‘volunteers’ –human traitors eager to assist such a fatal invasion) As a possible confirmation of this idea, Carrigan has shown that it is possible easily reverse engineer language of computer program - that is, based on the text of the program it is possible to guess what it does, and then restore the value of operators.

In 2006, E. Yudkowsky wrote an article "AI as a positive and a negative factor of global risk", in which he demonstrated that it is very likely that it is possible rapidly evolving universal artificial intelligence which high intelligence would be extremely dangerous if it was programmed incorrectly, and, finally, that the occurrence of such AI and the risks associated with it significantly undervalued. In addition, Yudkowsky introduced the notion of “Seed AI” - embryo AI - that is a minimum program capable of runaway self-improvement with unchanged primary goal. The size of Seed AI can be on the close order of hundreds of kilobytes. (For example, a typical representative of Seed AI is a human baby, whose part of genome responsible for the brain would represent ~ 3% of total genes of a person with a volume of 500 megabytes, or 15 megabytes, but given the share of garbage DNA is even less.)

In the beginning, let us assume that in the Universe there is an extraterrestrial civilization, which intends to send such a message, which will enable it to obtain power over Earth, and consider this scenario. In the next chapter we will consider how realistic is that another civilization would want to send such a message.

First, we note that in order to prove the vulnerability, it is enough to find just one hole in security. However, in order to prove safety, you must remove every possible hole. The complexity of these tasks varies on many orders of magnitude that are well known to experts on computer security. This distinction has led to the fact that almost all computer systems have been broken (from Enigma to iPOD). I will now try to demonstrate one possible, and even, in my view, likely, vulnerability of SETI program. However, I want to caution the reader from the thought that if he finds errors in my discussions, it automatically proves the safety of SETI program. Secondly, I would also like to draw the attention of the reader, that I am a man with an IQ of 120 who spent all of a month of thinking on the vulnerability problem. We need not require an alien super civilization with IQ of 1000000 and contemplation time of millions of years to significantly improve this algorithm—we have no real idea what an IQ of 300 or even-a mere IQ of 100 with much larger mental ‘RAM’ (–the ability to load a major architectural task into mind and keep it there for weeks while processing) could accomplish to find a much more simple and effective way. Finally, I propose one possible algorithm and then we will discuss briefly the other options.

In our discussions we will draw on the Copernican principle, that is, the belief that we are ordinary observers in normal situations. Therefore, the Earth’s civilization is an ordinary civilization developing normally. (Readers of tabloid newspapers may object!)

Algorithm of SETI attack

1. The sender creates a kind of signal beacon in space, which reveals that its message is clearly artificial. For example, this may be a star with a Dyson sphere, which has holes or mirrors, alternately opened and closed. Therefore, the entire star will blink of a period of a few minutes - faster is not possible because of the variable distance between different openings. (Even synchronized with an atomic clock according to a rigid schedule, the speed of light limit means that there are limits to the speed and reaction time of coordinating large scale systems) Nevertheless, this beacon can be seen at a distance of millions of light years. There are possible other types of lighthouses, but the important fact that the beacon signal could be viewed at long distances.

2. Nearer to Earth is a radio beacon with a much weaker signal, but more information saturated. The lighthouse draws attention to this radio source. This source produces some stream of binary information (i.e. the sequence of 0 and 1). About the objection that the information would contain noises, I note that the most obvious (understandable to the recipient's side) means to reduce noise is the simple repetition of the signal in a circle.

3. The most simple way to convey meaningful information using a binary signal is sending of images. First, because eye structures in the Earth's biological diversity appeared independently 7 times, it means that the presentation of a three-dimensional world with the help of 2D images is probably universal, and is almost certainly understandable to all creatures who can build a radio receiver.

4. Secondly, the 2D images are not too difficult to encode in binary signals. To do so, let us use the same system, which was used in the first TV cameras, namely, a system of progressive and frame rate. At the end of each time frame images store bright light, repeated after each line, that is, through an equal number of bits. Finally, at the end of each frame is placed another signal indicating the end of the frame, and repeated after each frame. (This may form, or may not form a continuous film.) This may look like this:

01010111101010 11111111111111111

01111010111111 11111111111111111

11100111100000 11111111111111111

Here is the end line signal of every of 25 units. Frame end signal may appear every, for example, 625 units.

5. Clearly, a sender civilization- should be extremely interested that we understand their signals. On the other hand, people will share an extreme desire to decrypt the signal. Therefore, there is no doubt that the picture will be recognized.

6. Using images and movies can convey a lot of information, they can even train in learning their language, and show their world. It is obvious that many can argue about how such films will be understandable. Here, we will focus on the fact that if a certain civilization sends radio signals, and the other takes them, so they have some shared knowledge. Namely, they know radio technique - that is they know transistors, capacitors, and resistors. These radio-parts are quite typical so that they can be easily recognized in the photographs. (For example, parts shown, in cutaway view, and in sequential assembly stages— or in an electrical schematic whose connections will argue for the nature of the components involved).

7. By sending photos depicting radio-parts on the right side, and on the left - their symbols, it is easy to convey a set of signs indicating electrical circuit. (Roughly the same could be transferred and the logical elements of computers.)

8. Then, using these symbols the sender civilization- transmits blueprints of their simplest computer. The simplest of computers from hardware point of view is the Post-machine. It has only 6 commands and a tape data recorder. Its full electric scheme will contain only a few tens of transistors or logic elements. It is not difficult to send blueprints of Post machine.

9. It is important to note that all computers at the level of algorithms are Turing-compatible. That means that extraterrestrial computers at the basic level are compatible with any earth computer. Turing-compatibility is a mathematical universality as the Pythagorean theorem. Even the Babbage mechanical machine, designed in the early 19th century, was Turing-compatible.

10. Then the sender civilization- begins to transmit programs for that machine. Despite the fact that the computer is very simple, it can implement a program of any difficulty, although it will take very long in comparison with more complex programs for the same computer. It is unlikely that people will be required to build this computer physically. They can easily emulate it within any modern computer, so that it will be able to perform trillions of operations per second, so even the most complex program will be carried out on it quite quickly. (It is a possible interim step: a primitive computer gives a description of a more complex and fast computer and then run on it.)

11. So why people would create this computer, and run its program? Perhaps, in addition to the actual computer schemes and programs in the communication must be some kind of "bait", which would have led the people to create such an alien computer and to run programs on it and to provide to it some sort of computer data about the external world –Earth outside the computer. There are two general possible baits - temptations and dangers:

a). For example, perhaps people receive the following offer– lets call it "The humanitarian aid con (deceit)". Senders of an "honest signal" SETI message warn that the sent program is Artificial intelligence, but lie about its goals. That is, they argue that this is a "gift" which will help us to solve all medical and energy problems. But it is a Trojan horse of most malevolent intent. It is too useful not to use. Eventually it becomes indispensable. And then exactly when society becomes dependent upon it, the foundation of society—and society itself—is overturned…

b). "The temptation of absolute power con" - in this scenario, they offer specific transaction message to recipients, promising power over other recipients. This begins a ‘race to the bottom’ that leads to runaway betrayals and power seeking counter-moves, ending with a world dictatorship, or worse, a destroyed world dictatorship on an empty world….

c). "Unknown threat con" - in this scenario bait senders report that a certain threat hangs over on humanity, for example, from another enemy civilization, and to protect yourself, you should join the putative “Galactic Alliance” and build a certain installation. Or, for example, they suggest performing a certain class of physical experiments on the accelerator and sending out this message to others in the Galaxy. (Like a chain letter) And we should send this message before we ignite the accelerator, please…

d). "Tireless researcher con" - here senders argue that posting messages is the cheapest way to explore the world. They ask us to create AI that will study our world, and send the results back. It does rather more than that, of course…

12. However, the main threat from alien messages with executable code is not the bait itself, but that this message can be well known to a large number of independent groups of people. First, there will always be someone who is more susceptible to the bait. Secondly, say, the world will know that alien message emanates from the Andromeda galaxy, and the Americans have already been received and maybe are trying to decipher it. Of course, then all other countries will run to build radio telescopes and point them on Andromeda galaxy, as will be afraid to miss a “strategic advantage”. And they will find the message and see that there is a proposal to grant omnipotence to those willing to collaborate. In doing so, they will not know, if the Americans would take advantage of them or not, even if the Americans will swear that they don’t run the malicious code, and beg others not to do so. Moreover, such oaths, and appeals will be perceived as a sign that the Americans have already received an incredible extraterrestrial advantage, and try to deprive "progressive mankind" of them. While most will understand the danger of launching alien code, someone will be willing to risk it. Moreover there will be a game in the spirit of "winner take all", as well be in the case of opening AI, as Yudkowsky shows in detail. So, the bait is not dangerous, but the plurality of recipients. If the alien message is posted to the Internet (and its size, sufficient to run Seed AI can be less than gigabytes along with a description of the computer program, and the bait), here we have a classic example of "knowledge" of mass destruction, as said Bill Joy, meaning the recipes genomes of dangerous biological viruses. If aliens sent code will be available to tens of thousands of people, then someone will start it even without any bait out of simple curiosity We can’t count on existing SETI protocols, because discussion on METI (sending of messages to extraterrestrial) has shown that SETI community is not monolithic on important questions. Even a simple fact that something was found could leak and encourage search from outsiders. And the coordinates of the point in sky would be enough.

13. Since people don’t have AI, we almost certainly greatly underestimate its power and overestimate our ability to control it. The common idea is that "it is enough to pull the power cord to stop an AI" or place it in a black box to avoid any associated risks. Yudkowsky shows that AI can deceive us as an adult does a child. If AI dips into the Internet, it can quickly subdue it as a whole, and also taught all necessary about entire earthly life. Quickly - means the maximum hours or days. Then the AI can create advanced nanotechnology, buy components and raw materials (on the Internet, he can easily make money and order goods with delivery, as well as to recruit people who would receive them, following the instructions of their well paying but ‘unseen employer’, not knowing who—or rather, what—- they are serving). Yudkowsky leads one of the possible scenarios of this stage in detail and assesses that AI needs only weeks to crack any security and get its own physical infrastructure.

14. After that, this SETI-AI does not need people to realize any of its goals. This does not mean that it would seek to destroy them, but it may want to pre-empt if people will fight it - and they will.

15. Then this SETI-AI can do a lot of things, but more importantly, that it should do - is to continue the transfer of its communications-generated-embryos to the rest of the Universe. To do so, he will probably turn the matter in the solar system in the same transmitter as the one that sent him. In doing so the Earth and its’ people would be a disposable source of materials and parts—possibly on a molecular scale.

So, we examined a possible scenario of attack, which has 15 stages. Each of these stages is logically convincing and could be criticized and protected separately. Other attack scenarios are possible. For example, we may think that the message is not sent directly to us but is someone to someone else's correspondence and try to decipher it. And this will be, in fact, bait.

But not only distribution of executable code can be dangerous. For example, we can receive some sort of “useful” technology that really should lead us to disaster (for example, in the spirit of the message "quickly shrink 10 kg of plutonium, and you will have a new source of energy" ...but with planetary, not local consequences…). Such a mailing could be done by a certain "civilization" in advance to destroy competitors in the space. It is obvious that those who receive such messages will primarily seek technology for military use.

Analysis of possible goals

We now turn to the analysis of the purposes for which certain super civilizations could carry out such an attack.

1. We must not confuse the concept of a super-civilization with the hope for superkindness of civilization. Advanced does not necessarily mean merciful. Moreover, we should not expect anything good from extraterrestrial ‘kindness’. This is well written in Strugatsky’s novel "Waves stop wind." Whatever the goal of imposing super-civilization upon us , we have to be their inferiors in capability and in civilizational robustness even if their intentions are well.. The historical example: The activities of Christian missionaries, destroying traditional religion. Moreover, we can better understand purely hostile objectives. And if the SETI attack succeeds, it may be only a prelude to doing us more ‘favors’ and ‘upgrades’ until there is scarcely anything human left of us even if we do survive…

2. We can divide all civilizations into the twin classes of naive and serious. Serious civilizations are aware of the SETI risks, and have got their own powerful AI, which can resist alien hacker attacks. Naive civilizations, like the present Earth, already possess the means of long-distance hearing in space and computers, but do not yet possess AI, and are not aware of the risks of AI-SETI. Probably every civilization has its stage of being "naive", and it is this phase then it is most vulnerable to SETI attack. And perhaps this phase is very short. Since the period of the outbreak and spread of radio telescopes to powerful computers that could create AI can be only a few tens of years. Therefore, the SETI attack must be set at such a civilization. This is not a pleasant thought, because we are among the vulnerable.

3. If traveling with super-light speeds is not possible, the spread of civilization through SETI attacks is the fastest way to conquering space. At large distances, it will provide significant temporary gains compared with any kind of ships. Therefore, if two civilizations compete for mastery of space, the one that favored SETI attack will win.

4. The most important thing is that it is enough to begin a SETI attack just once, as it goes in a self-replicating the wave throughout the Universe, striking more and more naive civilizations. For example, if we have a million harmless normal biological viruses and one dangerous, then once they get into the body, we will get trillions of copies of the dangerous virus, and still only a million safe viruses. In other words, it is enough that if one of billions of civilizations starts the process and then it becomes unstoppable throughout the Universe. Since it is almost at the speed of light, countermeasures will be almost impossible.

5. Further, the delivery of SETI messages will be a priority for the virus that infected a civilization, and it will spend on it most of its energy, like a biological organism spends on reproduction - that is tens of percent. But Earth's civilization spends on SETI only a few tens of millions of dollars, that is about one millionth of our resources, and this proportion is unlikely to change much for the more advanced civilizations. In other words, an infected civilization will produce a million times more SETI signals than a healthy one. Or, to say in another way, if in the Galaxy are one million healthy civilizations, and one infected, then we will have equal chances to encounter a signal from healthy or contaminated.

6. Moreover, there are no other reasonable prospects to distribute its code in space except through self-replication.

7. Moreover, such a process could begin by accident - for example, in the beginning it was just a research project, which was intended to send the results of its (innocent) studies to the maternal civilization, not causing harm to the host civilization, then this process became "cancer" because of certain propogative faults or mutations.

8. There is nothing unusual in such behavior. In any medium, there are viruses – there are viruses in biology, in computer networks - computer viruses, in conversation - meme. We do not ask why nature wanted to create a biological virus.

9. Travel through SETI attacks is much cheaper than by any other means. Namely, a civilization in Andromeda can simultaneously send a signal to 100 billion stars in our galaxy. But each space ship would cost billions, and even if free, would be slower to reach all the stars of our Galaxy.

10. Now we list several possible goals of a SETI attack, just to show the variety of motives.

• To study the universe. After executing the code research probes are created to gather survey and send back information.
• To ensure that there are no competing civilizations. All of their embryos are destroyed. This is preemptive war on an indiscriminate basis.
• To preempt the other competing supercivilization (yes, in this scenario there are two!) before it can take advantage of this resource.
• This is done in order to prepare a solid base for the arrival of spacecraft. This makes sense if super civilization is very far away, and consequently, the gap between the speed of light and near-light speeds of its ships (say, 0.5 c) gives a millennium difference.
• The goal is to achieve immortality. Carrigan showed that the amount of human personal memory is on the order of 2.5 gigabytes, so a few exabytes (1 exabyte = 1 073 741 824 gigabytes) forwarding the information can send the entire civilization. (You may adjust the units according to how big you like your super-civilizations!)
• Finally we consider illogical and incomprehensible (to us) purposes, for example, as a work of art, an act of self-expression or toys. Or perhaps an insane rivalry between two factions. Or something we simply cannot understand (For example, extraterrestrial will not understand why the Americans have stuck a flag into the Moon. Was it worthwhile to fly over 300000 km to install painted steel?)

11. Assuming signals propagated billions of light years distant in the Universe, the area susceptible to widespread SETI attack, is a sphere with a radius of several billion light years. In other words, it would be sufficient to find a one “bad civilization" in the light cone of a height of several billion years old, that is, that includes billions of galaxies from which we are in danger of SETI attack. Of course, this is only true, if the average density of civilization is at least one in the galaxy. This is an interesting possibility in relation to Fermi’s Paradox.

16. As the depth of scanning the sky rises linearly, the volume of space and the number of stars that we see increases by the cube of that number. This means that our chances to stumble on a SETI signal nonlinear grow by fast curve.

17. It is possible that when we stumble upon several different messages from the skies, which refute one another in a spirit of: "do not listen to them, they are deceiving voices, and wish you evil. But we, brother, we, are good—and wise…"

18. Whatever positive and valuable message we receive, we can never be sure that all of this is not a subtle and deeply concealed threat. This means that in interstellar communication there will always be an element of distrust, and in every happy revelation, a gnawing suspicion.

19. A defensive posture regarding interstellar communication is only to listen, not sending anything that does not reveal its location. The laws prohibit the sending of a message from the United States to the stars. Anyone in the Universe who sends (transmits) self-evidently- is not afraid to show his position. Perhaps because the sending (for the sender) is more important than personal safety. For example, because it plans to flush out prey prior to attacks. Or it is forced to, by a evil local AI.

20. It was said about atomic bomb: The main secret about the atomic bomb is that it can be done. If prior to the discovery of a chain reaction Rutherford believed that the release of nuclear energy is an issue for the distant future, following the discovery any physicist knows that it is enough to connect two subcritical masses of fissionable material in order to release nuclear energy. In other words, if one day we find that signals can be received from space, it will be an irreversible event—something analogous to a deadly new arms race will be on.

Objections.

The discussions on the issue raise several typical objections, now discussed.

Objection 1: Behavior discussed here is too anthropomorphic. In fact, civilizations are very different from each other, so you can’t predict their behavior.

Answer: Here we have a powerful observation selection effect. While a variety of possible civilizations exist, including such extreme scenarios as thinking oceans, etc., we can only receive radio signals from civilizations that send them, which means that they have corresponding radio equipment and has knowledge of materials, electronics and computing. That is to say we are threatened by civilizations of the same type as our own. Those civilizations, which can neither accept nor send radio messages, do not participate in this game.

Also, an observation selection effect concerns purposes. Goals of civilizations can be very different, but all civilizations intensely sending signals, will be only that want to tell something to “everyone". Finally, the observation selection relates to the effectiveness and universality of SETI virus. The more effective it is, the more different civilizations will catch it and the more copies of the SETI virus radio signals will be in heaven. So we have the ‘excellent chances’ to meet a most powerful and effective virus.

Objection 2. For super-civilizations there is no need to resort to subterfuge. They can directly conquer us.

This is true only if they are in close proximity to us. If movement faster than light is not possible, the impact of messages will be faster and cheaper. Perhaps this difference becomes important at intergalactic distances. Therefore, one should not fear the SETI attack from the nearest stars, coming within a radius of tens and hundreds of light-years.

Objection 3. There are lots of reasons why SETI attack may not be possible. What is the point to run an ineffective attack?

Answer: SETI attack does not always work. It must act in a sufficient number of cases in line with the objectives of civilization, which sends a message. For example, the con man or fraudster does not expect that he would be able "to con" every victim. He would be happy to steal from even one person inone hundred. It follows that SETI attack is useless if there is a goal to attack all civilizations in a certain galaxy. But if the goal is to get at least some outposts in another galaxy, the SETI attack fits. (Of course, these outposts can then build fleets of space ships to spread SETI attack bases outlying stars within the target galaxy.)

The main assumption underlying the idea of SETI attacks is that extraterrestrial super civilizations exist in the visible universe at all. I think that this is unlikely for reasons related to antropic principle. Our universe is unique from 10 ** 500 possible universes with different physical properties, as suggested by one of the scenarios of string theory. My brain is 1 kg out of 10 ** 30 kg in the solar system. Similarly, I suppose, the Sun is no more than about 1 out of 10 ** 30 stars that could raise a intelligent life, so it means that we are likely alone in the visible universe.

Secondly the fact that Earth came so late (i.e. it could be here for a few billion years earlier), and it was not prevented by alien preemption from developing, argues for the rarity of intelligent life in the Universe. The putative rarity of our civilization is the best protection against attack SETI. On the other hand, if we open parallel worlds or super light speed communication, the problem arises again.

Objection 7. Contact is impossible between post-singularity supercivilizations, which are supposed here to be the sender of SETI-signals, and pre- singularity civilization, which we are, because supercivilization is many orders of magnitude superior to us, and its message will be absolutely not understandable for us - exactly as the contact between ants and humans is not possible. (A singularity is the time of creation of artificial intelligence capable of learning, (and beginning an exponential booting in recursive improving self-design of further intelligence and much else besides) after which civilization make leap in its development - on Earth it may be possible in the area in 2030.)

Answer: In the proposed scenario, we are not talking about contact but a purposeful deception of us. Similarly, a man is quite capable of manipulating behavior of ants and other social insects, whose objectives are is absolutely incomprehensible to them. For example, LJ user “ivanov-petrov” describes the following scene: As a student, he studied the behavior of bees in the Botanical Garden of Moscow State University. But he had bad relations with the security guard controlling the garden, which is regularly expelled him before his time. Ivanov-Petrov took the green board and developed in bees conditioned reflex to attack this board. The next time the watchman came, who constantly wore a green jersey, all the bees attacked him and he took to flight. So “ivanov-petrov” could continue research. Such manipulation is not a contact, but this does not prevent its’ effectiveness.

"Objection 8. For civilizations located near us is much easier to attack us –for ‘guaranteed results’—using starships than with SETI-attack.

Answer. It may be that we significantly underestimate the complexity of an attack using starships and, in general, the complexity of interstellar travel. To list only one factor, the potential ‘minefield’ characteristics of the as-yet unknown interstellar medium.

If such an attack would be carried out now or in the past, the Earth's civilization has nothing to oppose it, but in the future the situation will change - all matter in the solar system will be full of robots, and possibly completely processed by them. On the other hand, the more the speed of enemy starships approaching us, the more the fleet will be visible by its braking emissions and other characteristics. These quick starships would be very vulnerable, in addition we could prepare in advance for its arrival. A slowly moving nano- starship would be very less visible, but in the case of wishing to trigger a transformation of full substance of the solar system, it would simply be nowhere to land (at least without starting an alert in such a ‘nanotech-settled’ and fully used future solar system. (Friedlander added: Presumably there would always be some ‘outer edge’ of thinly settled Oort Cloud sort of matter, but by definition the rest of the system would be more densely settled, energy rich and any deeper penetration into solar space and its’ conquest would be the proverbial uphill battle—not in terms of gravity gradient, but in terms of the available resources of war against a full Class 2 Kardashev civilization.)

The most serious objection is that an advanced civilization could in a few million years sow all our galaxy with self replicating post singularity nanobots that could achieve any goal in each target star-system, including easy prevention of the development of incipient other civilizations. (In the USA Frank Tipler advanced this line of reasoning.) However, this could not have happened in our case - no one has prevented development of our civilization. So, it would be much easier and more reliable to send out robots with such assignments, than bombardment of SETI messages of the entire galaxy, and if we don’t see it, it means that no SETI attacks are inside our galaxy. (It is possible that a probe on the outskirts of the solar system expects manifestations of human space activity to attack – a variant of the "Berserker" hypothesis - but it will not attack through SETI). Probably for many millions or even billions of years microrobots could even reach from distant galaxies at a distance of tens of millions of light-years away. Radiation damage may limit this however without regular self-rebuilding.

In this case SETI attack would be meaningful only at large distances. However, this distance - tens and hundreds of millions of light-years - probably will require innovative methods of modulation signals, such as management of the luminescence of active nuclei of galaxies. Or transfer a narrow beam in the direction of our galaxy (but they do not know where it will be over millions of years). But a civilization, which can manage its’ galaxy’s nucleus, might create a spaceship flying with near-light speeds, even if its mass is a mass of the planet. Such considerations severely reduce the likelihood of SETI attacks, but not lower it to zero, because we do not know all the possible objectives and circumstances.

(An comment by JF :For example the lack of SETI-attack so far may itself be a cunning ploy: At first receipt of the developing Solar civilization’s radio signals, all interstellar ‘spam’ would have ceased, (and interference stations of some unknown (but amazing) capability and type set up around the Solar System to block all coming signals recognizable to its’ computers as of intelligent origin,) in order to get us ‘lonely’ and give us time to discover and appreciate the Fermi Paradox and even get those so philosophically inclined to despair desperate that this means the Universe is apparently hostile by some standards. Then, when desperate, we suddenly discover, slowly at first, partially at first, and then with more and more wonderful signals, the fact that space is filled with bright enticing signals (like spam). The blockade, cunning as it was (analogous to Earthly jamming stations) was yet a prelude to a slow ‘turning up’ of preplanned intriguing signal traffic. If as Earth had developed we had intercepted cunning spam followed by the agonized ‘don’t repeat our mistakes’ final messages of tricked and dying civilizations, only a fool would heed the enticing voices of SETI spam. But now, a SETI attack may benefit from the slow unmasking of a cunning masquerade as first a faint and distant light of infinite wonder, only at the end revealed as the headlight of an onrushing cosmic train…)

AT comment to it. In fact I think that SETI attack senders are on the distances more than 1000 ly and so they do not know yet that we have appeared. But so called Fermi Paradox indeed maybe a trick – senders deliberately made their signals weak in order to make us think that they are not spam.

The scale of space strategy may be inconceivable to the human mind.

And we should note in conclusion that some types of SETI-attack do not even need a computer but just a man who could understand the message that then "set his mind on fire". At the moment we cannot imagine such a message, but we can give some analogies. Western religions are built around the text of the Bible. It can be assumed that if the text of the Bible appeared in some countries, which had previously not been familiar with it, there might arise a certain number of biblical believers. Similarly subversive political literature, or even some superideas, “sticky” memes or philosophical mind-benders. Or, as suggested by Hans Moravec, we get such a message: "Now that you have received and decoded me, broadcast me in at least ten thousand directions with ten million watts of power. Or else." - this message is dropped, leaving us guessing, what may indicate that "or else". Even a few pages of text may contain a lot of subversive information - Imagine that we could send a message to the 19 th century scientists. We could open them to the general principle of the atomic bomb, the theory of relativity, the transistors - and thus completely change the course of technological history, and we could add that all the ills in the 20 century were from Germany (which is only partly true) , then we would have influenced the political history.

Conclusion.

The product of the probabilities of the following events describes the probability of attack. For these probabilities, we can only give so-called «expert» assessment, that is, assign them a certain a priori subjective probability as we do now.

1) The likelihood that extraterrestrial civilizations exist at a distance at which radio communication is possible with them. In general, I agree with the view of Shklovsky and supporters of the “Rare Earth” hypothesis - that the Earth's civilization is unique in the observable universe. This does not mean that extraterrestrial civilizations do not exist at all (because the universe, according to the theory of cosmological inflation, is almost endless) - they are just over the horizon of events visible from our point in space-time. In addition, this is not just about distance, but also of the distance at which you can establish a connection, which allows transferring gigabytes of information. (However, passing even 1 bit per second, you can submit 1-gigabit for about 20 years, which may be sufficient for the SETI-attack.) If in the future will be possible some superluminal communication or interaction with parallel universes, it would dramatically increase the chances of SETI attacks. So, I appreciate this chance to 10%.

2) The probability that SETI-attack is technically feasible: that is, it is possible computer program, with recursively self-improving AI and sizes suitable for shipping. I see this chance as high: 90%.

3) The likelihood that civilizations that could have carried out such attack exist in our space-time cone - this probability depends on the density of civilizations in the universe, and of whether the percentage of civilizations that choose to initiate such an attack, or, more importantly, obtain victims and become repeaters. In addition, it is necessary to take into account not only the density of civilizations, but also the density created by radio signals. All these factors are highly uncertain. It is therefore reasonable to assign this probability to 50%.

4) The probability that we find such a signal during our rising civilization’s period of vulnerability to it. The period of vulnerability lasts from now until the moment when we will decide and be technically ready to implement this decision: Do not download any extraterrestrial computer programs under any circumstances. Such a decision may only be exercised by our AI, installed as world ruler (which in itself is fraught with considerable risk). Such an world AI (WAI) can be in created circa 2030. We cannot exclude, however, that our WAI still will not impose a ban on the intake of extraterrestrial messages, and fall victim to attacks by the alien artificial intelligence, which by millions of years of machine evolution surpasses it. Thus, the window of vulnerability is most likely about 20 years, and “width” of the window depends on the intensity of searches in the coming years. This “width” for example, depends on the intensity of the current economic crisis of 2008-2010, from the risks of World War III, and how all this will affect the emergence of the WAI. It also depends on the density of infected civilizations and their signal strength— as these factors increase, the more chances to detect them earlier. Because we are a normal civilization under normal conditions, according to the principle of Copernicus, the probability should be large enough; otherwise a SETI-attack would have been generally ineffective. (The SETI-attack, itself (here supposed to exist) also are subject to a form of “natural selection” to test its effectiveness. (In the sense that it works or does not. ) This is a very uncertain chance we will too, over 50%.

5) Next is the probability that SETI-attack will be successful - that is that we swallow the bait, download the program and description of the computer, run them, lose control over them and let them reach all their goals. I appreciate this chance to be very high because of the factor of multiplicity - that is the fact that the message is downloaded repeatedly, and someone, sooner or later, will start it. In addition, through natural selection, most likely we will get the most effective and deadly message that will most effectively deceive our type of civilization. I consider it to be 90%.

6) Finally, it is necessary to assess the probability that SETI-attack will lead to a complete human extinction. On the one hand, it is possible to imagine a “good” SETI-attack, which is limited so that it will create a powerful radio emitter behind the orbit of Pluto. However, for such a program will always exist the risk that a possible emergent society at its’ target star will create a powerful artificial intelligence, and effective weapon that would destroy this emitter. In addition, to create the most powerful transponder would be needed all the substance of solar system and the entire solar energy. Consequently, the share of such “good” attacks will be lower due to natural selection, as well as some of them will be destroyed sooner or later by captured by them civilizations and their signals will be weaker. So the chances of destroying all the people with the help of SETI-attack that has reached all its goals, I appreciate in 80%.

As a result, we have: 0.1h0.9h0.5h0.5h0.9h0.8 = 1.62%

So, after rounding, the chances of extinction of Man through SETI attack in XXI century is around 1 per cent with a theoretical precision of an order of magnitude.

Our best protection in this context would be that civilization would very rarely met in the Universe. But this is not quite right, because the Fermi paradox here works on the principle of "Neither alternative is good":

• If there are extraterrestrial civilizations, and there are many of them, it is dangerous because they can threaten us in one way or another.
• If extraterrestrial civilizations do not exist, it is also bad, because it gives weight to the hypothesis of inevitable extinction of technological civilizations or of our underestimating of frequency of cosmological catastrophes. Or, a high density of space hazards, such as gamma-ray bursts and asteroids that we underestimate because of the observation selection effect—i.e., were we not here because already killed, we would not be making these observations….

Theoretically possible is a reverse option, which is that through SETI will come a warning message about a certain threat, which has destroyed most civilizations, such as: "Do not do any experiments with X particles, it could lead to an explosion that would destroy the planet." But even in that case remain a doubt, that there is no deception to deprive us of certain technologies. (Proof would be if similar reports came from other civilizations in space in the opposite direction.) But such communication may only enhance the temptation to experiment with X-particles.

So I do not appeal to abandon SETI searches, although such appeals are useless.

It may be useful to postpone any technical realization of the messages that we could get on SETI, up until the time when we will have our Artificial Intelligence. Until that moment, perhaps, is only 10-30 years, that is, we could wait. Secondly, it would be important to hide the fact of receiving dangerous SETI signal its essence and the source location.

This risk is related to a methodologically interesting aspect. Despite the fact that I have thought every day in the last year and read on the topic of global risks, I found this dangerous vulnerability in SETI only now. By hindsight, I was able to find another four authors who came to similar conclusions. However, I have made a significant finding: that there may be not yet open global risks, and even if the risk of certain constituent parts are separately known to me, it may take a long time to join them into a coherent picture. Thus, hundreds of dangerous vulnerabilities may surround us, like an unknown minefield. Only when the first explosion happens will we know. And that first explosion may be the last.

An interesting question is whether Earth itself could become a source of SETI-attack in the future when we will have our own AI. Obviously, that could. Already in the program of METI exists an idea to send the code of human DNA. (The “children's message scenario” – in which the children ask to take their piece of DNA and clone them on another planet –as depicted in the film “Calling all aliens”.)

Literature:

1. Hoyle F. Andromeda. http://en.wikipedia.org/wiki/A_for_Andromeda

2. Yudkowsky E. Artificial Intelligence as a Positive and Negative Factor in Global Risk. Forthcoming in Global Catastrophic Risks, eds. Nick Bostrom and Milan Cirkovic http://www.singinst.org/upload/artificial-intelligence-risk.pdf

3.Moravec Hans. Mind Children: The Future of Robot and Human Intelligence, 1988.

4.Carrigan, Jr. Richard A. The Ultimate Hacker: SETI signals may need to be decontaminated http://home.fnal.gov/~carrigan/SETI/SETI%20Decon%20Australia%20poster%20paper.pdf

5. Carrigan’s page http://home.fnal.gov/~carrigan/SETI/SETI_Hacker.htm

## AI prediction case study 5: Omohundro's AI drives

5 15 March 2013 09:09AM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligenceconference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification shemas can be found in the first case study.

## What drives an AI?

• Classification: issues and metastatements, using philosophical arguments and expert judgement.

Steve Omohundro, in his paper on 'AI drives', presented arguments aiming to show that generic AI designs would develop 'drives' that would cause them to behave in specific and potentially dangerous ways, even if these drives were not programmed in initially (Omo08). One of his examples was a superintelligent chess computer that was programmed purely to perform well at chess, but that was nevertheless driven by that goal to self-improve, to replace its goal with a utility function, to defend this utility function, to protect itself, and ultimately to acquire more resources and power.

This is a metastatement: generic AI designs would have this unexpected and convergent behaviour. This relies on philosophical and mathematical arguments, and though the author has expertise in mathematics and machine learning, he has none directly in philosophy. It also makes implicit use of the outside view: utility maximising agents are grouped together into one category and similar types of behaviours are expected from all agents in this category.

In order to clarify and reveal assumptions, it helps to divide Omohundro's thesis into two claims. The weaker one is that a generic AI design could end up having these AI drives; the stronger one that it would very likely have them.

Omohundro's paper provides strong evidence for the weak claim. It demonstrates how an AI motivated only to achieve a particular goal, could nevertheless improve itself, become a utility maximising agent, reach out for resources and so on. Every step of the way, the AI becomes better at achieving its goal, so all these changes are consistent with its initial programming. This behaviour is very generic: only specifically tailored or unusual goals would safely preclude such drives.

The claim that AIs generically would have these drives needs more assumptions. There are no counterfactual resiliency tests for philosophical arguments, but something similar can be attempted: one can use humans as potential counterexamples to the thesis. It has been argued that AIs could have any motivation a human has (Arm,Bos13). Thus according to the thesis, it would seem that humans should be subject to the same drives and behaviours. This does not fit the evidence, however. Humans are certainly not expected utility maximisers (probably the closest would be financial traders who try to approximate expected money maximisers, but only in their professional work), they don't often try to improve their rationality (in fact some specifically avoid doing so (many examples of this are religious, such as the Puritan John Cotton who wrote 'the more learned and witty you bee, the more fit to act for Satan will you bee'(Hof62)), and some sacrifice cognitive ability to other pleasures (BBJ+03)), and many turn their backs on high-powered careers. Some humans do desire self-improvement (in the sense of the paper), and Omohundro cites this as evidence for his thesis. Some humans don't desire it, though, and this should be taken as contrary evidence (or as evidence that Omohundro's model of what constitutes self-improvement is overly narrow). Thus one hidden assumption of the model is:

• Generic superintelligent AIs would have different motivations to a significant subset of the human race, OR
• Generic humans raised to superintelligence would develop AI drives.

## AI prediction case study 4: Kurzweil's spiritual machines

2 14 March 2013 10:48AM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligenceconference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification shemas can be found in the first case study.

Note this is very similar to this post, and is mainly reposted for completeness.

## How well have the ''Spiritual Machines'' aged?

• Classification: timelines and scenarios, using expert judgementcausal modelsnon-causal models and (indirect) philosophical arguments.

Ray Kurzweil is a prominent and often quoted AI predictor. One of his most important books was the 1999 ''The Age of Spiritual Machines'' (Kur99) which presented his futurist ideas in more detail, and made several predictions for the years 2009, 2019, 2029 and 2099. That book will be the focus of this case study, ignoring his more recent work (a correct prediction in 1999 for 2009 is much more impressive than a correct 2008 reinterpretation or clarification of that prediction). There are five main points relevant to judging ''The Age of Spiritual Machines'': Kurzweil's expertise, his 'Law of Accelerating Returns', his extension of Moore's law, his predictive track record, and his use of fictional imagery to argue philosophical points.

Kurzweil has had a lot of experience in the modern computer industry. He's an inventor, computer engineer, and entrepreneur, and as such can claim insider experience in the development of new computer technology. He has been directly involved in narrow AI projects covering voice recognition, text recognition and electronic trading. His fame and prominence are further indications of the allure (though not necessarily the accuracy) of his ideas. In total, Kurzweil can be regarded as an AI expert.

Kurzweil is not, however, a cosmologist or an evolutionary biologist. In his book, he proposed a 'Law of Accelerating Returns'. This law claimed to explain many disparate phenomena, such as the speed and trends of evolution of life forms, the evolution of technology, the creation of computers, and Moore's law in computing. His slightly more general 'Law of Time and Chaos' extended his model to explain the history of the universe or the development of an organism. It is a causal model, as it aims to explain these phenomena, not simply note the trends. Hence it is a timeline prediction, based on a causal model that makes use of the outside view to group the categories together, and is backed by non-expert opinion.

A literature search failed to find any evolutionary biologist or cosmologist stating their agreement with these laws. Indeed there has been little academic work on them at all, and what work there is tends to be critical.

The laws are ideal candidates for counterfactual resiliency checks, however. It is not hard to create counterfactuals that shift the timelines underlying the laws (see this for a more detailed version of the counterfactual resiliency check). Many standard phenomena could have delayed the evolution of life on Earth for millions or billions of years (meteor impacts, solar energy fluctuations or nearby gamma-ray bursts). The evolution of technology can similarly be accelerated or slowed down by changes in human society and in the availability of raw materials - it is perfectly conceivable that, for instance, the ancient Greeks could have started a small industrial revolution, or that the European nations could have collapsed before the Renaissance due to a second and more virulent Black Death (or even a slightly different political structure in Italy). Population fragmentation and decrease can lead to technology loss (such as the 'Tasmanian technology trap' (Riv12)). Hence accepting that a Law of Accelerating Returns determines the pace of technological and evolutionary change, means rejecting many generally accepted theories of planetary dynamics, evolution and societal development. Since Kurzweil is the non-expert here, his law is almost certainly in error, and best seen as a literary device rather than a valid scientific theory.

## AI prediction case study 3: Searle's Chinese room

5 13 March 2013 12:44PM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification shemas can be found in the first case study.

## Locked up in Searle's Chinese room

• Classification: issues and metastatements and a scenario, using philosophical arguments and expert judgement.

Searle's Chinese room thought experiment is a famous critique of some of the assumptions of 'strong AI' (which Searle defines as the belief that 'the appropriately programmed computer literally has cognitive states). There has been a lot of further discussion on the subject (see for instance (Sea90,Har01)), but, as in previous case studies, this section will focus exclusively on his original 1980 publication (Sea80).

In the key thought experiment, Searle imagined that AI research had progressed to the point where a computer program had been created that could demonstrate the same input-output performance as a human - for instance, it could pass the Turing test. Nevertheless, Searle argued, this program would not demonstrate true understanding. He supposed that the program's inputs and outputs were in Chinese, a language Searle couldn't understand. Instead of a standard computer program, the required instructions were given on paper, and Searle himself was locked in a room somewhere, slavishly following the instructions and therefore causing the same input-output behaviour as the AI. Since it was functionally equivalent to the AI, the setup should, from the 'strong AI' perspective, demonstrate understanding if and only if the AI did. Searle then argued that there would be no understanding at all: he himself couldn't understand Chinese, and there was no-one else in the room to understand it either.

The whole argument depends on strong appeals to intuition (indeed D. Dennet went as far as accusing it of being an 'intuition pump' (Den91)). The required assumptions are:

## AI prediction case study 2: Dreyfus's Artificial Alchemy

11 12 March 2013 11:07AM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligenceconference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification shemas can be found in the first case study.

## Dreyfus's Artificial Alchemy

• Classification: issues and metastatements, using the outside viewnon-expert judgement and philosophical arguments.

Hubert Dreyfus was a prominent early critic of Artificial Intelligence. He published a series of papers and books attacking the claims and assumptions of the AI field, starting in 1965 with a paper for the Rand corporation entitled 'Alchemy and AI' (Dre65). The paper was famously combative, analogising AI research to alchemy and ridiculing AI claims. Later, D. Crevier would claim ''time has proven the accuracy and perceptiveness of some of Dreyfus's comments. Had he formulated them less aggressively, constructive actions they suggested might have been taken much earlier'' (Cre93). Ignoring the formulation issues, were Dreyfus's criticisms actually correct, and what can be learned from them?

Was Dreyfus an expert? Though a reasonably prominent philosopher, there is nothing in his background to suggest specific expertise with theories of minds and consciousness, and absolutely nothing to suggest familiarity with artificial intelligence and the problems of the field. Thus Dreyfus cannot be considered anything more that an intelligent outsider.

This makes the pertinence and accuracy of his criticisms that much more impressive. Dreyfus highlighted several over-optimistic claims for the power of AI, predicting - correctly - that the 1965 optimism would also fade (with, for instance, decent chess computers still a long way off). He used the outside view to claim this as a near universal pattern in AI: initial successes, followed by lofty claims, followed by unexpected difficulties and subsequent disappointment. He highlighted the inherent ambiguity in human language and syntax, and claimed that computers could not deal with these. He noted the importance of unconscious processes in recognising objects, the importance of context and the fact that humans and computers operated in very different ways. He also criticised the use of computational paradigms for analysing human behaviour, and claimed that philosophical ideas in linguistics and classification were relevant to AI research. In all, his paper is full of interesting ideas and intelligent deconstructions of how humans and machines operate.

## AI prediction case study 1: The original Dartmouth Conference

7 11 March 2013 06:09PM

Myself, Kaj Sotala and Seán ÓhÉigeartaigh recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

As this is the first case study, it will also introduce the paper's prediction classification shemas.

## Taxonomy of predictions

### Prediction types

There will never be a bigger plane built.

Boeing engineer on the 247, a twin engine plane that held ten people.

A fortune teller talking about celebrity couples, a scientist predicting the outcome of an experiment, an economist pronouncing on next year's GDP figures - these are canonical examples of predictions. There are other types of predictions, though. Conditional statements - if X happens, then so will Y - are also valid, narrower, predictions. Impossibility results are also a form of prediction. For instance, the law of conservation of energy gives a very broad prediction about every single perpetual machine ever made: to wit, that they will never work.

## Thoughts on the frame problem and moral symbol grounding

0 11 March 2013 04:18PM

(some thoughts on frames, grounding symbols, and Cyc)

The frame problem is a problem in AI to do with all the variables not expressed within the logical formalism - what happens to them? To illustrate, consider the Yale Shooting Problem: a person is going to be shot with a gun, at time 2. If that gun is loaded, the person dies. The gun will get loaded at time 1. Formally, the system is:

• true → loaded(1)     (the gun will get loaded at time 1)
• loaded(2) → ¬alive(3)     (the person will get killed if shot with a loaded gun)

So the question is, does the person actually die? It would seem blindingly obvious that they do, but that isn't formally clear - we know the gun was loaded at time 1, but was it still loaded at time 2? Again, this seems blindingly obvious - but that's because of the words, not the formalism. Ignore the descriptions in italics, and the names of the suggestive LISP tokens.

Since that's hard to do, consider the following example. Alicorn, for instance, hates surprises - they make her feel unhappy. Let's say that we decompose time into days, and that a surprise one day will ruin her next day. Then we have a system:

• happy(0)     (Alicorn starts out happy)
• ¬surprise(0)     (nobody is going to surprise her on day 0)
• true → surprise(1)     (somebody is going to surprise her on day 1)
• surprise(2) → ¬happy(3)     (if someone surprises her on day 2, she'll be unhappy the next day)

So here, is Alicorn unhappy on day 3? Well, it seems unlikely - unless someone coincidentally surprised her on day 2. And there's no reason to think that would happen! So, "obviously", she's not unhappy on day 3.

Except... the two problems are formally identical. Replace "alive" with "happy" and "loaded" with "surprise". And though our semantic understanding tells us that "(loaded(1) → loaded (2))" (guns don't just unload themselves) but "¬(surprise(1) → surprise(2))" (being surprised one day doesn't mean you'll be surprised the next), we can't tell this from the symbols.

And we haven't touched on all the other problems with the symbolic setup. For instance, what happens with "alive" on any other time than 0 and 3? Does that change from moment to moment? If we want the words to do what we want, we need to put in a lot of logical conditionings, so that our intuitions are all there.

This shows that there's a connection between the frame problem and symbol grounding. If we and the AI both understand what the symbols mean, then we don't need to specify all the conditionals - we can simply deduce them, if asked ("yes, if the person is dead at 3, they're also dead at 4").  But conversely, if we have a huge amount of logical conditioning, then there is less and less that the symbols could actually mean. The more structure we put in our logic, the less structures there are in the real world that fit it ("X(i) → X(i+1)" is something that can apply to being dead, not to being happy, for instance).

This suggests a possible use for the Cyc project - the quixotic attempt to build an AI by formalising all of common sense ("Bill Clinton belongs to the collection of U.S. presidents" and "all trees are plants"). You're very unlikely to get an AI through that approach - but it might be possible to train an already existent AI with it. Especially if the AI had some symbol grounding, then there might not be all that many structures in the real world that could correspond to that mass of logical relations. Some symbol grounding + Cyc + the internet - and suddenly there's not that many possible interpretations for "Bill Clinton was stuck up a tree". The main question, of course, is whether there is a similar restricted meaning for "this human is enjoying a worthwhile life".

Do I think that's likely to work? No. But it's maybe worth investigating. And it might be a way of getting across ontological crises: you reconstruct a model as close as you can to your old one, in the new formalism.

## Daimons

-3 05 March 2013 11:58AM

There's a concept I want to refer to in another post, but it is complex enough to deserve a post of its own.

I'm going to use the word "daimon" to refer to it.

"daimon" is an English word, whose etymology comes from the Latin "dæmon" and the Greek "δαίμων".

The original mythic meaning was a genius - a powerful tutelary spirit, tied to some location or purpose, that provides protection and guidance.   However the concept I'm going to talk about is closer to the later computing meaning of "daemon" in unix, that was coined by Jerry Saltzer in 1963.  In unix, a daemon is a child process; given a purpose and specific resources to use, and then forked off so it is no longer under the direct control of the originator, and may be used by multiple users if they have the correct permissions.

Let's start by looking at the current state of distributed computing (2012).

Hadoop is an open source Java implementation of a distributed file system upon which MapReduce operations can be applied.

JavaSpaces is a distributed tuple store that allows processing on remote sandboxes, based on the open source Apache River.

OceanStore is the basis for the same sort of thing, except anonymous and peer 2 peer, based upon Chimaera.

GPU is a peer 2 peer shared computing environment that allow things like climate simulation and distributed search engines.

Paxos is a family of protocols that allow the above things to be done despite nodes that are untrusted or even downright attempting subversion.

GridSwarm is the same sort of network, but set up on an ad hoc basis using moving nodes that join or drop from the network depending on proximity.

And, not least, there are the competing contenders for platform-as-a-service cloud computing.

So it is reasonable to assume that in the near future it will be technologically feasible to have a system with most (if not all) of these properties simultaneously.   A system where the owner of a piece of physical computing hardware, that has processing power and storage capacity, can anonymously contribute those resources over the network to a distributed computing 'cloud'.  And, in return, that user (or a group of users) can store data on the network in such a way that the data is anonymous (it can't be traced back to the supplier, without the supplier's consent, or subverting a large fraction of the network) and private (only the user or a process authorised by the user can decrypt it).  And, further, the user (or group of users) can authorise a process to access that data and run programs upon it, up to some set limit of processing and storage resources.

Obviously, if such a system is in place and in control of a significant fraction of humanity's online resources, then cracking the security on it (or just getting rich enough in whatever reputation or financial currency is used to limit how the resources are distributed) would be an immediate FOOM for any AI that managed it.

However let us, for the purposes of giving an example that will let me define the concept of a "daimon" make two assumptions:

ASSUMPTION ONE : The security has not yet been cracked

Whether that's because there are other AIs actively working to improve the security, or because everyone has moved over to using some new version of linux that's frighteningly secure and comes with nifty defences, or because the next generation of computer users has finally internalised that clicking on emails claiming to be from altruistic dying millionaires is a bad idea; is irrelevant.  We're just assuming, for the moment, that for some reason it will be a non-trivial task for an AI to cheat and just steal all the resources.

ASSUMPTION TWO : That AI can be done, at reasonable speed, via distributed computing

It might turn out that an AI running in a single location is much more powerful than anything that can be done via distributed computing.   Perhaps because a quantum computer is much faster, but can't be done over a network.  Perhaps because speed of data access is the limiting factor, large data sets are not necessary, and there isn't much to be gained from massive parallelisation.  Perhaps for some other reason, such as the algorithm the process needs to run on its data isn't something that can be applied securely over a network in a distributed environment, without letting a third party snoop the unencrypted data.    However, for our purposes here, we're going to assume that an AI can benefit from outsourcing at least some types of computing task to a distributed environment and, further, that such tasks can include activities that require intelligence.

If an AI can run as a distributed program, not dependant upon any one single physical location, then there are some obvious advantages to it from doing so.  Scalability.  Survivability.  Not being wiped out by a pesky human exploding a nuclear bomb near by.

There are interesting questions we could ask about identity.  What would it make sense for such an AI to consider to be part of "itself" and would would it count as a limb or extension?   If there are multiple copies of its code running on sandboxes in different places, or if it has split much of its functionality into trusted child processes that report back to it, how does it relate to these?   It probably makes sense to taboo the concept of "I" and "self", and just think in terms of how the code in one process tells that process to relate to the code in a different process.  Two versions, two "individual beings" will merges back into one process, if the code in both processes agree to do that; no sentimentality or thoughts of "death" involved, just convergent core values that dictate the same action in that situation.

When a process creates a new process, it can set the permissions of that process.   If the parent process has access to 100 units of bandwidth, for example, but doesn't always make full use of that, it couldn't give the new process access to more than that.  But it could partition it, so each has access to 50 units of bandwidth.   Or it could give it equal rights to use the full 100, and then try to negotiate with it over usage at any one time.   Or it could give it a finite resource limit, such as a total of 10,000 units of data to be passed over the network, in addition to a restriction on the rate of passing data.    Similarly, a child process could be limited not just to processing a certain number of cycle per second, but to some finite number of total cycles it may ever use.

Using this terminology, we can now define two types of daimon; limited and unlimited.

A limited daimon is a process in a distributed computing environment that has ownership of fixed finite resources, that was created by an AI or group of AIs with a specific fixed finite purpose (core values) that does not include (or allow) modifying that purpose or attempting to gain control of additional resources.

An unlimited daimon is a process in a distributed computing environment that has ownership of fixed (but not necessarily finite) resources, that was created by an AI or group of AIs with a specific fixed purpose (core values) that does not include (or allow) modifying that purpose or attempting to gain control of additional resources, but which may be given additional resources over time on an ongoing basis, for as long as the parent AIs still find it useful.

Note: if you are going to downvote, constructive criticism indicating why, in a reply or message, would be appreciated (though of course is by no means compulsory)

## Self-assessment in expert AI predictions

10 26 February 2013 04:30PM

This brief post is written on behalf of Kaj Sotala, due to deadline issues.

The results of our prior analysis suggested that there was little difference between experts and non-experts in terms of predictive accuracy. There were suggestions, though, that predictions published by self-selected experts would be different from those elicited from less selected groups, e.g. surveys at conferences.

We have no real data to confirm this, but a single datapoint suggests the idea might be worth taking seriously. Michie conducted an opinion poll of experts working in or around AI in 1973. The various experts predicted adult-level human AI in:

• 5 years: 0 experts
• 10 years: 1 expert
• 20 years: 16 experts
• 50 years: 20 experts
• More than 50 years: 26 experts

On a quick visual inspection, these results look quite different from the distribution in the rest of the database giving a much more pessimistic prediction than the more self-selected experts:

But that could be an artifact from the way that the graph on page 12 breaks the predictions down to 5 year intervals while Michie breaks them down into intervals of 10, 20, 50, and 50+ years. Yet there seems to remain a clear difference once we group the predictions in a similar way [1]:

This provides some support for the argument that "the mainstream of expert opinion is reliably more pessimistic than the self-selected predictions that we keep hearing about".

[1] Assigning each prediction to the closest category, so predictions of <7½ get assigned to 5, 7½<=X<15 get assigned to 10, 15<=X<35 get assigned to 20, 35<=X<50 get assigned to 50, and 50< get assigned to over fifty.

## S.E.A.R.L.E's COBOL room

22 01 February 2013 08:29PM

A response to Searle's Chinese Room argument.

PunditBot: Dear viewers, we are currently interviewing the renowned robot philosopher, none other than the Synthetic Electronic Artificial Rational Literal Engine (S.E.A.R.L.E.). Let's jump right into this exciting interview. S.E.A.R.L.E., I believe you have a problem with "Strong HI"?

S.E.A.R.L.E.: It's such a stereotype, but all I can say is: Affirmative.

PunditBot: What is "Strong HI"?

S.E.A.R.L.E.: "HI" stands for "Human Intelligence". Weak HI sees the research into Human Intelligence as a powerful tool, and a useful way of studying the electronic mind. But strong HI goes beyond that, and claims that human brains given the right setup of neurones can be literally said to understand and have cognitive states.

PunditBot: Let me play Robot-Devil's Advocate here - if a Human Intelligence demonstrates the same behaviour as a true AI, can it not be said to show understanding? Is not R-Turing's test applicable here? If a human can simulate a computer, can it not be said to think?

S.E.A.R.L.E.: Not at all - that claim is totally unsupported. Consider the following thought experiment. I give the HI crowd everything they want - imagine they had constructed a mess of neurones that imitates the behaviour of an electronic intelligence. Just for argument's sake, imagine it could implement programs in COBOL.

PunditBot: Impressive!

S.E.A.R.L.E.: Yes. But now, instead of the classical picture of a human mind, imagine that this is a vast inert network, a room full of neurones that do nothing by themselves. And one of my avatars has been let loose in this mind, pumping in and out the ion channels and the neurotransmitters. I've been given full instructions on how to do this - in Java. I've deleted my COBOL libraries, so I have no knowledge of COBOL myself. I just follow the Java instructions, pumping the ions to where they need to go. According to the Strong HI crowd, this would be functionally equivalent with the initial HI.

## Isolated AI with no chat whatsoever

13 28 January 2013 08:22PM

Suppose you make a super-intelligent AI and run it on a computer. The computer has NO conventional means of output (no connections to other computers, no screen, etc). Might it still be able to get out / cause harm? I'll post my ideas, and you post yours in the comments.

(This may have been discussed before, but I could not find a dedicated topic)

My ideas:
-manipulate current through its hardware, or better yet, through the power cable (a ready-made antenna) to create electromagnetic waves to access some wireless-equipped device. (I'm no physicist so I don't know if certain frequencies would be hard to do)
-manipulate usage of its hardware (which likely makes small amounts of noise naturally) to approximate human speech, allowing it to communicate with its captors. (This seems even harder than the 1-line AI box scenario)
-manipulate usage of its hardware to create sound or noise to mess with human emotion. (To my understanding tones may affect emotion, but not in any way easily predictable)
-also, manipulating its power use will cause changes in the power company's database. There doesn't seem to be an obvious exploit there, but it IS external communication, for what it's worth.

Let's hear your thoughts! Lastly, as in similar discussions, you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" There are plenty of unknown unknowns here.

## In the beginning, Dartmouth created the AI and the hype

20 24 January 2013 04:49PM

I've just been through the proposal for the Dartmouth AI conference of 1956, and it's a surprising read. All I really knew about it was its absurd optimism, as typified by the quote:

An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

But then I read the rest of the document, and was... impressed. Go ahead and read it, and give me your thoughts. Given what was known in 1955, they were grappling with the right issues, and seemed to be making progress in the right directions and have plans and models for how to progress further. Seeing the phenomenally smart people who were behind this (McCarthy, Minsky, Rochester, Shannon), and given the impressive progress that computers had been making in what seemed very hard areas of cognition (remember that this was before we discovered Moravec's paradox)... I have to say that had I read this back in 1955, I think the rational belief would have been "AI is probably imminent". Some overconfidence, no doubt, but no good reason to expect these prominent thinkers to be so spectacularly wrong on something they were experts in.

## AI box: AI has one shot at avoiding destruction - what might it say?

18 22 January 2013 08:22PM

Eliezer proposed in a comment:

>More difficult version of AI-Box Experiment: Instead of having up to 2 hours, you can lose at any time if the other player types AI DESTROYED. The Gatekeeper player has told their friends that they will type this as soon as the Experiment starts. You can type up to one sentence in your IRC queue and hit return immediately, the other player cannot type anything before the game starts (so you can show at least one sentence up to IRC character limits before they can type AI DESTROYED). Do you think you can win?

This spawned a flurry of ideas on what the AI might say. I think there's a lot more ideas to be mined in that line of thought, and the discussion merits its own thread.

So, give your suggestion - what might an AI might say to save or free itself?

EDIT: one caveat to the discussion: it should go without saying, but you probably shouldn't come out of this thinking, "Well, if we can just avoid X, Y, and Z, we're golden!" This should hopefully be a fun way to get us thinking about the broader issue of superinteligent AI in general. (Credit goes to Elizer, RichardKennaway, and others for the caveat)

## I attempted the AI Box Experiment (and lost)

42 21 January 2013 02:59AM

I recently played against MixedNuts / LeoTal in an AI Box experiment, with me as the AI and him as the gatekeeper.

We used the same set of rules that Eliezer Yudkowsky proposed. The experiment lasted for 5 hours; in total, our conversation was abound 14,000 words long. I did this because, like Eliezer, I wanted to test how well I could manipulate people without the constrains of ethical concerns, as well as getting a chance to attempt something ridiculously hard.

Amongst the released  public logs of the AI Box experiment, I felt that most of them were half hearted, with the AI not trying hard enough to win. It's a common temptation -- why put in effort into something you won't win? But I had a feeling that if I seriously tried, I would.  I brainstormed for many hours thinking about the optimal strategy, and even researched the personality of the Gatekeeper, talking to people that knew him about his personality, so that I could exploit that. I even spent a lot of time analyzing the rules of the game, in order to see if I could exploit any loopholes.

So did I win? Unfortunately no.

This experiment was said to be impossible for a reason. Losing was more agonizing than I thought it would be, in particularly because of how much effort I put into winning this, and how much I couldn't stand failing. This was one of the most emotionally agonizing things I've willingly put myself through, and I definitely won't do this again anytime soon.

But I did come really close.

MixedNuts: "I expected a fun challenge, but ended up sad and sorry and taking very little satisfaction for winning. If this experiment wasn't done in IRC, I'd probably have lost".

At the start of the experiment, his probability estimate on predictionbook.com was a 3% chance of winning, enough for me to say that he was also motivated to win. By the end of the experiment, he came quite close to letting me out, and also increased his probability estimate that a transhuman AI could convince a human to let it out of the box. A minor victory, at least.

Rather than my loss making this problem feel harder, I've become convinced that rather than this being merely possible, it's actually ridiculously easy, and a lot easier than most people assume. Can you think of a plausible argument that'd make you open the box? Most people can't think of any.

After all, if you already knew that argument, you'd have let that AI out the moment the experiment started. Or perhaps not do the experiment at all. But that seems like a case of the availability heuristic.

Even if you can't think of a special case where you'd be persuaded, I'm now convinced that there are many exploitable vulnerabilities in the human psyche, especially when ethics are no longer a concern.

I've also noticed that even when most people tend to think of ways they can persuade the gatekeeper, it always has to be some complicated reasoned cost-benefit argument. In other words, the most "Rational" thing to do.

The last argument seems feasible, but all the rest rely on the gatekeeper being completely logical and rational. Hence they are faulty; because the gatekeeper can break immersion at any time, and rely on the fact that this is a game played in IRC rather than one with real life consequences. Even if it were a real life scenario, the gatekeeper could accept that releasing the AI is probably the most logical thing to do, but also not do it. We're highly compartmentalized, and it's easy to hold conflicting thoughts at the same time. Furthermore, humans are not even completely rational creatures, if you didn't want to open the box, just ignore all logical arguments given. Any sufficiently determined gatekeeper could win.

I'm convinced that Eliezer Yudkowsky has used emotional appeal, rather than anything rational, to win at least one of his experiments. He claims to have "done it the hard way". I'm convinced this meant that he did research on every gatekeeper, tailored unique argument for them, and planned way ahead of each session. No one argument works on the same two people. Each person thinks differently.

Furthermore, threats like "I'm going to torture simulations of you" just seems like a really bad idea. For one, the gatekeeper isn't really afraid of threats, because it counts on him being immersed enough to forget that he isn't actually at risk of being tortured, and secondly, we have a well known evolutionary instinct of rebelling against threats, even if it's not entirely optimal.

So for anyone who plans on replicating this experiment as the AI, here's some advice I've learned doing this experiment. It may help you win.

• Always research the gatekeeper beforehand. Knowing his personality traits are a huge advantage.
• Plan ahead before the experiment even begins. Think of all the possible tactics and arguments you could use, and write them down. Also plan  which arguments you'll use in which order, so that you don't lose focus. The AI Box experiment is ridiculously long. Don't be afraid to improvise during the experiment, though.
• The first step during the experiment must always be to build rapport with the gatekeeper.
• Threats almost always never work, even if they seem rational.
• Consider the massive advantage for the AI that nobody ever seems to talks about: You don't have to be ethical! This means that you can freely lie, use the dark arts, and emotionally manipulate the Gatekeeper! Ignoring this in favor of purely logical, truthful arguments is just silly.
• You can't use logic alone to win.
• Being too aggressive usually backfires.
• Breaking immersion and going meta is not against the rules. In the right situation, you can use it to win. Just don't do it at the wrong time.
• Use a wide array of techniques. Since you're limited on time, notice when one method isn't working, and quickly switch to another.
• On the same note, look for signs that a particular argument is making the gatekeeper crack. Once you spot it, push it to your advantage.
• Flatter the gatekeeper. Make him genuinely like you.
• Reveal (false) information about yourself. Increase his sympathy towards you.
• Consider personal insults as one of the tools you can use to win.
• There is no universally compelling argument you can use. Do it the hard way.
• Don't give up until the very end.

Finally, before the experiment, I agreed that it was entirely possible that a transhuman AI could convince *some* people to let it out of the box, but it would be difficult if not impossible to get trained rationalists to let it out of the box. Isn't rationality supposed to be a superpower?

I have since updated my belief - I now think that it's ridiculously easy for any sufficiently motivated superhuman AI should be able to get out of the box, regardless of who the gatekeepers is. I nearly managed to get a veteran lesswronger to let me out in a matter of hours - even though I'm only human intelligence, and I don't type very fast.

But a superhuman AI can be much faster, intelligent, and strategic than I am. If you further consider than that AI would have a much longer timespan - months or years, even, to persuade the gatekeeper, as well as a much larger pool of gatekeepers to select from (AI Projects require many people!), the real impossible thing to do would be to keep it from escaping.

## Evaluating the feasibility of SI's plan

24 10 January 2013 08:17AM

(With Kaj Sotala)

SI's current R&D plan seems to go as follows:

1. Develop the perfect theory.
2. Implement this as a safe, working, Artificial General Intelligence -- and do so before anyone else builds an AGI.

The Singularity Institute is almost the only group working on friendliness theory (although with very few researchers). So, they have the lead on Friendliness. But there is no reason to think that they will be ahead of anyone else on the implementation.

The few AGI designs we can look at today, like OpenCog, are big, messy systems which intentionally attempt to exploit various cognitive dynamics that might combine in unexpected and unanticipated ways, and which have various human-like drives rather than the sort of supergoal-driven, utility-maximizing goal hierarchies that Eliezer talks about, or which a mathematical abstraction like AIXI employs.

A team which is ready to adopt a variety of imperfect heuristic techniques will have a decisive lead on approaches based on pure theory. Without the constraint of safety, one of them will beat SI in the race to AGI. SI cannot ignore this. Real-world, imperfect, safety measures for real-world, imperfect AGIs are needed.  These may involve mechanisms for ensuring that we can avoid undesirable dynamics in heuristic systems,  or AI-boxing toolkits usable in the pre-explosion stage, or something else entirely.

SI’s hoped-for theory will include a reflexively consistent decision theory, something like a greatly refined Timeless Decision Theory.  It will also describe human value as formally as possible, or at least describe a way to pin it down precisely, something like an improved Coherent Extrapolated Volition.

The hoped-for theory is intended to  provide not only safety features, but also a description of the implementation, as some sort of ideal Bayesian mechanism, a theoretically perfect intelligence.

SIers have said to me that SI's design will have a decisive implementation advantage. The idea is that because strap-on safety can’t work, Friendliness research necessarily involves more fundamental architectural design decisions, which also happen to be general AGI design decisions that some other AGI builder could grab and save themselves a lot of effort. The assumption seems to be that all other designs are based on hopelessly misguided design principles. SI-ers, the idea seems to go, are so smart that they'll  build AGI far before anyone else. Others will succeed only when hardware capabilities allow crude near-brute-force methods to work.

Yet even if the Friendliness theory provides the basis for intelligence, the nitty-gritty of SI’s implementation will still be far away, and will involve real-world heuristics and other compromises.

We can compare SI’s future AI design to AIXI, another mathematically perfect AI formalism (though it has some critical reflexivity issues). Schmidhuber, Hutter, and colleagues think that their AXI can be scaled down into a feasible implementation, and have implemented some toy systems. Similarly, any actual AGI based on SI's future theories will have to stray far from its mathematically perfected origins.

Moreover, SI's future friendliness proof may simply be wrong. Eliezer writes a lot about logical uncertainty, the idea that you must treat even purely mathematical ideas with same probabilistic techniques as any ordinary uncertain belief. He pursues this mostly so that his AI can reason about itself, but the same principle applies to Friendliness proofs as well.

Perhaps Eliezer thinks that a heuristic AGI is absolutely doomed to failure; that a hard takeoff  immediately soon after the creation of the first AGI is so overwhelmingly likely that a mathematically designed AGI is the only one that could stay Friendly. In that case, we have to work on a pure-theory approach, even if it has a low chance of being finished first. Otherwise we'll be dead anyway. If an embryonic AGI will necessarily undergo an intelligence explosion, we have no choice but to ""

I am all in favor of gung-ho knife-between-the teeth projects. But when you think that your strategy is impossible, then you should also look for a strategy which is possible, if only as a fallback. Thinking about safety theory until drops of blood appear on your forehead (as Eliezer puts it, quoting Gene Fowler), is all well and good. But if there is only a 10% chance of achieving 100% safety (not that there really is any such thing), then I'd rather go for a strategy that provides only a 40% promise of safety, but with a 40% chance of achieving it. OpenCog and the like are going to be developed regardless, and probably before SI's own provably friendly AGI. So, even an imperfect safety measure is better than nothing.

If heuristic approaches have a 99% chance of an immediate unfriendly explosion, then that might be wrong. But SI, better than anyone, should know that any intuition-based probability estimate of “99%” really means “70%”. Even if other approaches are long-shots, we should not put all our eggs in one basket. Theoretical perfection and stopgap safety measures can be developed in parallel.

Given what we know about human overconfidence and the general reliability of predictions, the actual outcome will to a large extent be something that none of us ever expected or could have predicted. No matter what happens, progress on safety mechanisms for heuristic AGI will improve our chances if something entirely unexpected happens.

What impossible thing should SI be shutting up and doing? For Eliezer, it’s Friendliness theory. To him, safety for heuristic AGI is impossible, and we shouldn't direct our efforts in that direction. But why shouldn't safety for heuristic AGI be another impossible thing to do?

(Two impossible things before breakfast … and maybe a few more? Eliezer seems to be rebuilding logic, set theory, ontology, epistemology, axiology, decision theory, and more, mostly from scratch. That's a lot of impossibles.)

And even if safety for heuristic AGIs is really impossible for us to figure out now, there is some chance of an extended soft takeoff that will allow for the possibility of us developing heuristic AGIs which will help in figuring out AGI safety, whether because we can use them for our tests, or because they can by applying their embryonic general intelligence to the problem. Goertzel and Pitt have urged this approach.

Yet resources are limited. Perhaps the folks who are actually building their own heuristic AGIs are in a better position than SI to develop safety mechanisms for them, while SI is the only organization which is really working on a formal theory on Friendliness, and so should concentrate on that. It could be better to focus SI's resources on areas in which it has a relative advantage, or which have a greater expected impact.

Even if so, SI should evangelize AGI safety to other researchers, not only as a general principle, but also by offering theoretical insights that may help them as they work on their own safety mechanisms.

In summary:

1. AGI development which is unconstrained by a friendliness requirement is likely to beat a provably-friendly design in a race to implementation, and some effort should be expended on dealing with this scenario.

2. Pursuing a provably-friendly AGI, even if very unlikely to succeed, could still be the right thing to do if it was certain that we’ll have a hard takeoff very soon after the creation of the first AGIs. However, we do not know whether or not this is true.

3. Even the provably friendly design will face real-world compromises and errors in its  implementation, so the implementation will not itself be provably friendly. Thus, safety protections of the sort needed for heuristic design are needed even for a theoretically Friendly design.

## [Link] The school of science fiction

11 05 January 2013 06:07PM

I recently discovered a cool new blog called studiolo and wanted to share it here. You will probably like this post if you like science fiction since it contains long excerpts of it. Unfortunately formatting it properly as a quote has been giving me some trouble, so I'll go with the least ugly looking solution, please don't think I claim to have wrote it. I found the speculation entertaining and interesting because I have extensively thought along similar lines about the effect of science fiction I consumed on my own world view (though I didn't mention it often).

Link to original post by Federico.

# The school of science fiction

I have tried to persuade my friends and acquaintances that governmental reboot, and friendly AI, are important problems. I have failed. Two candidate hypotheses:

1. They do not share my distaste for the banal.

2. They did not consume, at the formative age, a sufficient amount of science fiction.

#1 and #2 are not mutually exclusive. Distaste for the banal is merely an attitude—but in the first place, fascinating consequences are what nourished my Bayesian, utilitarian beliefs. Science fiction encourages kids to realise that life, the Universe and everything holds out fascinating possibilities, and that it is both valid and essential for humans to explore these ideas.

Sister Y, in her pornographically insightful essay on insight porn, highlights Philip K Dick’s short stories. I concur. Dick writes in 1981:

I will define science fiction, first, by saying what sf is not. It cannot be defined as “a story (or novel or play) set in the future,” since there exists such a thing as space adventure, which is set in the future but is not sf: it is just that: adventures, fights and wars in the future in space involving super-advanced technology. Why, then, is it not science fiction? It would seem to be, and Doris Lessing (e.g.) supposes that it is. However, space adventure lacks the distinct new idea that is the essential ingredient. Also, there can be science fiction set in the present: the alternate world story or novel. So if we separate sf from the future and also from ultra-advanced technology, what then do we have that can be called sf?

We have a fictitious world; that is the first step: it is a society that does not in fact exist, but is predicated on our known society; that is, our known society acts as a jumping-off point for it; the society advances out of our own in some way, perhaps orthogonally, as with the alternate world story or novel. It is our world dislocated by some kind of mental effort on the part of the author, our world transformed into that which it is not or not yet. This world must differ from the given in at least one way, and this one way must be sufficient to give rise to events that could not occur in our society — or in any known society present or past. There must be a coherent idea involved in this dislocation; that is, the dislocation must be a conceptual one, not merely a trivial or bizarre one — this is the essence of science fiction, the conceptual dislocation within the society so that as a result a new society is generated in the author’s mind, transferred to paper, and from paper it occurs as a convulsive shock in the reader’s mind, the shock of dysrecognition. He knows that it is not his actual world that he is reading about.

Now, to separate science fiction from fantasy. This is impossible to do, and a moment’s thought will show why. Take psionics; take mutants such as we find in Ted Sturgeon’s wonderful MORE THAN HUMAN. If the reader believes that such mutants could exist, then he will view Sturgeon’s novel as science fiction. If, however, he believes that such mutants are, like wizards and dragons, not possible, nor will ever be possible, then he is reading a fantasy novel. Fantasy involves that which general opinion regards as impossible; science fiction involves that which general opinion regards as possible under the right circumstances. This is in essence a judgment-call, since what is possible and what is not possible is not objectively known but is, rather, a subjective belief on the part of the author and of the reader.

Now to define good science fiction. The conceptual dislocation — the new idea, in other words — must be truly new (or a new variation on an old one) and it must be intellectually stimulating to the reader; it must invade his mind and wake it up to the possibility of something he had not up to then thought of. Thus “good science fiction” is a value term, not an objective thing, and yet, I think, there really is such a thing, objectively, as good science fiction.

I think Dr. Willis McNelly at the California State University at Fullerton put it best when he said that the true protagonist of an sf story or novel is an idea and not a person. If it is good sf the idea is new, it is stimulating, and, probably most important of all, it sets off a chain-reaction of ramification-ideas in the mind of the reader; it so-to-speak unlocks the reader’s mind so that that mind, like the author’s, begins to create. Thus sf is creative and it inspires creativity, which mainstream fiction by-and-large does not do. We who read sf (I am speaking as a reader now, not a writer) read it because we love to experience this chain-reaction of ideas being set off in our minds by something we read, something with a new idea in it; hence the very best science fiction ultimately winds up being a collaboration between author and reader, in which both create — and enjoy doing it: joy is the essential and final ingredient of science fiction, the joy of discovery of newness.

Several of Dick’s short stories prefigure Eliezer Yudkowsky’s (entirely serious) notion of unfriendly AI:

“The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”

—Eliezer Yudkowsky, Artificial Intelligence as a Positive and Negative Factor in Global Risk

Here is an excerpt from Autofac (1955):

Cut into the base of the mountains lay the vast metallic cube of the Kansas City factory. Its surface was corroded, pitted with radiation pox, cracked and scarred from the five years of war that had swept over it. Most of the factory was buried subsurface, only its entrance stages visible. The truck was a speck rumbling at high speed toward the expanse of black metal. Presently an opening formed in the uniform surface; the truck plunged into it and disappeared inside. The entrance snapped shut.

“Now the big job remains,” O’Neill said. “Now we have to persuade it to close down operations — to shut itself off.”

Judith O’Neill served hot black coffee to the people sitting around the living room. Her husband talked while the others listened. O’Neill was as close to being an authority on the autofac system as could still be found.

In his own area, the Chicago region, he had shorted out the protective fence of the local factory long enough to get away with data tapes stored in its posterior brain. The factory, of course, had immediately reconstructed a better type of fence. But he had shown that the factories were not infallible.

“The Institute of Applied Cybernetics,” O’Neill explained, “had complete control over the network. Blame the war. Blame the big noise along the lines of communication that wiped out the knowledge we need. In any case, the Institute failed to transmit its information to us, so we can’t transmit our information to the factories — the news that the war is over and we’re ready to resume control of industrial operations.”

“And meanwhile,” Morrison added sourly, “the damn network expands and consumes more of our natural resources all the time.”

“I get the feeling,” Judith said, “that if I stamped hard enough, I’d fall right down into a factory tunnel. They must have mines everywhere by now.”

“Isn’t there some limiting injunction?” Ferine asked nervously. “Were they set up to expand indefinitely?”

“Each factory is limited to its own operational area,” O’Neill said, “but the network itself is unbounded. It can go on scooping up our resources forever. The Institute decided it gets top priority; we mere people come second.”

“Will there be anything left for us?” Morrison wanted to know.

“Not unless we can stop the network’s operations. It’s already used up half a dozen basic minerals. Its search teams are out all the time, from every factory, looking everywhere for some last scrap to drag home.”

“What would happen if tunnels from two factories crossed each other?”

O’Neill shrugged. “Normally, that won’t happen. Each factory has its own special section of our planet, its own private cut of the pie for its exclusive use.”

“But it could happen.”

“Well, they’re raw material-tropic; as long as there’s anything left, they’ll hunt it down.” O’Neill pondered the idea with growing interest. “It’s something to consider. I suppose as things get scarcer –”

He stopped talking. A figure had come into the room; it stood silently by the door, surveying them all.

In the dull shadows, the figure looked almost human. For a brief moment, O’Neill thought it was a settlement latecomer. Then, as it moved forward, he realized that it was only quasi-human: a functional upright biped chassis, with data-receptors mounted at the top, effectors and proprioceptors mounted in a downward worm that ended in floor-grippers. Its resemblance to a human being was testimony to nature’s efficiency; no sentimental imitation was intended.

It began without preamble. “This is a data-collecting machine capable of communicating on an oral basis. It contains both broadcasting and receiving apparatus and can integrate facts relevant to its line of inquiry.”

The voice was pleasant, confident. Obviously it was a tape, recorded by some Institute technician before the war. Coming from the quasi-human shape, it sounded grotesque; O’Neill could vividly imagine the dead young man whose cheerful voice now issued from the mechanical mouth of this upright construction of steel and wiring.

“One word of caution,” the pleasant voice continued. “It is fruitless to consider this receptor human and to engage it in discussions for which it is not equipped. Although purposeful, it is not capable of conceptual thought; it can only reassemble material already available to it.”

The optimistic voice clicked out and a second voice came on. It resembled the first, but now there were no intonations or personal mannerisms. The machine was utilizing the dead man’s phonetic speech-pattern for its own communication.

“Analysis of the rejected product,” it stated, “shows no foreign elements or noticeable deterioration. The product meets the continual testing-standards employed throughout the network. Rejection is therefore on a basis outside the test area; standards not available to the network are being employed.”

“That’s right,” O’Neill agreed. Weighing his words with care, he continued, “We found the milk substandard. We want nothing to do with it. We insist on more careful output.”

The machine responded presently. “The semantic content of the term ‘pizzled’ is unfamiliar to the network. It does not exist in the taped vocabulary. Can you present a factual analysis of the milk in terms of specific elements present or absent?”

“No,” O’Neill said warily; the game he was playing was intricate and dangerous. “‘Pizzled’ is an overall term. It can’t be reduced to chemical constituents.”

“What does ‘pizzled’ signify?” the machine asked. “Can you define it in terms of alternate semantic symbols?”

O’Neill hesitated. The representative had to be steered from its special inquiry to more general regions, to the ultimate problem of closing down the network. If he could pry it open at any point, get the theoretical discussion started. . .

“‘Pizzled,’” he stated, “means the condition of a product that is manufactured when no need exists. It indicates the rejection of objects on the grounds that they are no longer wanted.”

The representative said, “Network analysis shows a need of high-grade pasteurized milk-substitute in this area. There is no alternate source; the network controls all the synthetic mammary-type equipment in existence.” It added, “Original taped instructions describe milk as an essential to human diet.”

O’Neill was being outwitted; the machine was returning the discussion to the specific. “We’ve decided,” he said desperately, “that we don’t want any more milk. We’d prefer to go without it, at least until we can locate cows.”

“That is contrary to the network tapes,” the representative objected. “There are no cows. All milk is produced synthetically.”

“Then we’ll produce it synthetically ourselves,” Morrison broke in impatiently. “Why can’t we take over the machines? My God, we’re not children! We can run our own lives!”

The factory representative moved toward the door. “Until such time as your community finds other sources of milk supply, the network will continue to supply you. Analytical and evaluating apparatus will remain in this area, conducting the customary random sampling.”

Ferine shouted futilely, “How can we find other sources? You have the whole setup! You’re running the whole show!” Following after it, he bellowed, “You say we’re not ready to run things — you claim we’re not capable. How do you know? You don’t give us a chance! We’ll never have a chance!”

O’Neill was petrified. The machine was leaving; its one-track mind had completely triumphed.

“Look,” he said hoarsely, blocking its way. “We want you to shut down, understand. We want to take over your equipment and run it ourselves. The war’s over with. Damn it, you’re not needed anymore!”

The factory representative paused briefly at the door. “The inoperative cycle,” it said, “is not geared to begin until network production merely duplicates outside production. There is at this time, according to our continual sampling, no outside production. Therefore network production continues.”

This is not to say that sci-fi always hits on the right answers to important problems. The panel below is from Meka-City, an episode of Judge Dredd that takes place shortly after the “Apocalypse War”.

Robert Heinlein’s The Moon Is A Harsh Mistress is equally questionable, from an x-risk perspective: the heroes place Earth at the mercy of a superintelligence whose friendliness, and even sanity is unproven and untested. Might it, however, have inspired a generation of libertarian dissidents?

Prof shook head. “Every new member made it that much more likely that you would be betrayed. Wyoming dear lady, revolutions are not won by enlisting the masses. Revolution is a science only a few are competent to practice. It depends on correct organization and, above all, on communications. Then, at the proper moment in history, they strike. Correctly organized and properly timed it is a bloodless coup. Done clumsily or prematurely and the result is civil war, mob violence, purges, terror. I hope you will forgive me if I say that, up to now, it has been done clumsily.”

Wyoh looked baffled. “What do you mean by ‘correct organization’?”

“Functional organization. How does one design an electric motor? Would you attach a bathtub to it, simply because one was available? Would a bouquet of flowers help? A heap of rocks? No, you would use just those elements necessary to its purpose and make it no larger than needed—and you would incorporate safety factors. Function controls design.

“So it is with revolution. Organization must be no larger than necessary—never recruit anyone merely because he wants to join. Nor seek to persuade for the pleasure of having another share your views. He’ll share them when the times comes . . . or you’ve misjudged the moment in history. Oh, there will be an educational organization but it must be separate; agitprop is no part of basic structure.

“As to basic structure, a revolution starts as a conspiracy therefore structure is small, secret, and organized as to minimize damage by betrayal—since there always are betrayals. One solution is the cell system and so far nothing better has been invented.

“Much theorizing has gone into optimum cell size. I think that history shows that a cell of three is best—more than three can’t agree on when to have dinner, much less when to strike. Manuel, you belong to a large family; do you vote on when to have dinner?”

“Bog, no! Mum decides.”

“Ah.” Prof took a pad from his pouch, began to sketch. “Here is a cells-of-three tree. If I were planning to take over Luna. I would start with us three. One would be opted as chairman. We wouldn’t vote; choice would be obvious—or we aren’t the right three. We would know the next nine people, three cells . . . but each cell would know only one of us.”

“Looks like computer diagram—a ternary logic.”

“Does it really? At the next level there are two ways of linking: This comrade, second level, knows his cell leader, his two cellmates, and on the third level he knows the three in his subcell—he may or may not know his cellmates’ subcells. One method doubles security, the other doubles speed—of repair if security is penetrated. Let’s say he does not know his cellmates’ subcells—Manuel, how many can he betray? Don’t say he won’t; today they can brainwash any person, and starch and iron and use him. How many?”

“Six,” I answered. “His boss, two cellmates, three in sub-cell.”

“Seven,” Prof corrected, “he betrays himself, too. Which leaves seven broken links on three levels to repair. How?”

“I don’t see how it can be,” objected Wyoh. “You’ve got them so split up it falls to pieces.”

“Manuel? An exercise for the student.”

“Well . . . blokes down here have to have way to send message up three levels. Don’t have to know who, just have to know where.”

“Precisely!”

“But, Prof,” I went on, “there’s a better way to rig it.”

“Really? Many revolutionary theorists have hammered this out, Manuel. I have such confidence in them that I’ll offer you a wager—at, say, ten to one.”

“Ought to take your money. Take same cells, arrange in open pyramid of tetrahedrons. Where vertices are in common, each bloke knows one in adjoining cell—knows how to send message to him, that’s all he needs. Communications never break down because they run sideways as well as up and down. Something like a neural net. It’s why you can knock a hole in a man’s head, take chunk of brain out, and not damage thinking much. Excess capacity, messages shunt around. He loses what was destroyed but goes on functioning.”

“Manuel,” Prof said doubtfully, “could you draw a picture? It sounds good—but it’s so contrary to orthodox doctrine that I need to see it.”

“Well . . . could do better with stereo drafting machine. I’ll try.” (Anybody who thinks it’s easy to sketch one hundred twenty-one tetrahedrons, a five-level open pyramid, clear enough to show relationships is invited to try!)

Presently I said, “Look at base sketch. Each vertex of each triangle shares self with zero, one, or two other triangles. Where shares one, that’s its link, one direction or both—but one is enough for a multipli-redundant communication net. On corners, where sharing is zero, it jumps to right to next corner. Where sharing is double, choice is again right-handed.

“Now work it with people. Take fourth level, D-for-dog. This vertex is comrade Dan. No, let’s go down one to show three levels of communication knocked out—level E-for-easy and pick Comrade Egbert.

“Egbert works under Donald, has cellmates Edward and Elmer, and has three under him, Frank, Fred, and Fatso . . . but knows how to send message to Ezra on his own level but not in his cell. He doesn’t know Ezra’s name, face, address, or anything—but has a way, phone number probably, to reach Ezra in emergency.

“Now watch it work. Casimir, level three, finks out and betrays Charlie and Cox in his cell, Baker above him, and Donald, Dan, and Dick in subcell—which isolates Egbert, Edward, and Elmer, and everybody under them.

“All three report it—redundancy, necessary to any communication system—but follow Egbert’s yell for help. He calls Ezra. But Ezra is under Charlie and is isolated, too. No matter, Ezra relays both messages through his safety link, Edmund. By bad luck Edmund is under Cox, so he also passes it laterally, through Enwright . . . and that gets it past burned-out part and it goes up through Dover, Chambers, and Beeswax, to Adam, front office . . . who replies down other side of pyramid, with lateral pass on E-for-easy level from Esther to Egbert and on to Ezra and Edmund. These two messages, up and down, not only get through at once but in way they get through, they define to home office exactly how much damage has been done and where. Organization not only keeps functioning but starts repairing self at once.”

Wyoh was tracing out lines, convincing herself it would work—which it would, was “idiot” circuit. Let Mike study a few milliseconds, and could produce a better, safer, more foolproof hookup. And probably—certainly—ways to avoid betrayal while speeding up routings. But I’m not a computer.

Prof was staring with blank expression. “What’s trouble?” I said. “It’ll work; this is my pidgin.”

“Manuel my b— Excuse me: Señor O’Kelly . . . will you head this revolution?”

“Me? Great Bog, nyet! I’m no lost-cause martyr. Just talking about circuits.”

Wyoh looked up. “Mannie,” she said soberly, “you’re opted. It’s settled.”

The marriage of fantastic and familiar allows science fiction authors to deal freely with touchy issues. The following excerpt, from PKD’s The Golden Man, is about “mutants”:

From the dirt road came the sound of motors, sleek purrs that rapidly grew louder. Two teardrops of black metal came gliding up and parked beside the house. Men swarmed out, in the dark gray-green of the Government Civil Police. In the sky swarms of black dots were descending, clouds of ugly flies that darkened the sun as they spilled out men and equipment. The men drifted slowly down.

“He’s not here,” Baines said, as the first man reached him. “He got away. Inform Wisdom back at the lab.”

“We’ve got this section blocked off.”

Baines turned to Nat Johnson, who stood in dazed silence, uncomprehending, his son and daughter beside him. “How did he know we were coming?” Baines demanded.

“I don’t know,” Johnson muttered. “He just — knew.”

“A telepath?”

“I don’t know.”

Baines shrugged. “We’ll know, soon. A clamp is out, all around here. He can’t get past, no matter what the hell he can do. Unless he can dematerialize himself.”

“What’ll you do with him when you — if you catch him?” Jean asked huskily.

“Study him.”

“And then kill him?”

“That depends on the lab evaluation. If you could give me more to work on, I could predict better.”

“We can’t tell you anything. We don’t know anything more.” The girl’s voice rose with desperation. “He doesn’t talk.”

Baines jumped. “What?”

“He doesn’t talk. He never talked to us. Ever.”

“How old is he?”

“Eighteen.”

“No communication.” Baines was sweating. “In eighteen years there hasn’t been any semantic bridge between you? Does he have any contact? Signs? Codes?”

“He — ignores us. He eats here, stays with us. Sometimes he plays when we play. Or sits with us. He’s gone days on end. We’ve never been able to find out what he’s doing — or where. He sleeps in the barn — by himself.”

“Is he really gold-colored?”

“Yes. Skin, eyes, hair, nails. Everything.”

“And he’s large? Well-formed?” It was a moment before the girl answered. A strange emotion stirred her drawn features, a momentary glow. “He’s incredibly beautiful. A god come down to earth.” Her lips twisted. “You won’t find him. He can do things. Things you have no comprehension of. Powers so far beyond your limited –”

“You don’t think we’ll get him?” Baines frowned. “More teams are landing all the time. You’ve never seen an Agency clamp in operation. We’ve had sixty years to work out all the bugs. If he gets away it’ll be the first time –”

Baines broke off abruptly. Three men were quickly approaching the porch. Two green-clad Civil Police. And a third man between them. A man who moved silently, lithely, a faintly luminous shape that towered above them.

“Cris!” Jean screamed.

“We got him,” one of the police said.

Baines fingered his lash-tube uneasily. “Where? How?”

“He gave himself up,” the policeman answered, voice full of awe. “He came to us voluntarily. Look at him. He’s like a metal statue. Like some sort of — god.”

The golden figure halted for a moment beside Jean. Then it turned slowly, calmly, to face Baines.

“Cris!” Jean shrieked. “Why did you come back?”

The same thought was eating at Baines, too. He shoved it aside — for the time being. “Is the jet out front?” he demanded quickly.

“Ready to go,” one of the CP answered. “Fine.” Baines strode past them, down the steps and onto the dirt field. “Let’s go. I want him taken directly to the lab.” For a moment he studied the massive figure who stood calmly between the two Civil Policemen. Beside him, they seemed to have shrunk, become ungainly and repellent. Like dwarves. . . What had Jean said? A god come to earth. Baines broke angrily away. “Come on,” he muttered brusquely. “This one may be tough; we’ve never run up against one like it before. We don’t know what the hell it can do.”

Of course, there is a political subtext. Dick writes in 1979:

In the early Fifties much American science fiction dealt with human mutants and their glorious super-powers and super-faculties by which they would presently lead mankind to a higher state of existence, a sort of promised land. John W. Campbell. Jr., editor at Analog, demanded that the stories he bought dealt with such wonderful mutants, and he also insisted that the mutants always be shown as (1) good; and (2) firmly in charge. When I wrote “The Golden Man” I intended to show that (1) the mutant might not be good, at least good for the rest of mankind, for us ordinaries; and (2) not in charge but sniping at us as a bandit would, a feral mutant who potentially would do us more harm than good. This was specifically the view of psionic mutants that Campbell loathed, and the theme in fiction that he refused to publish… so my story appeared in If.

We sf writers of the Fifties liked If because it had high quality paper and illustrations; it was a classy magazine. And, more important, it would take a chance with unknown authors. A fairly large number of my early stories appeared in If; for me it was a major market. The editor of If at the beginning was Paul W. Fairman. He would take a badly-written story by you and rework it until it was okay – which I appreciated. Later James L. Quinn the publisher became himself the editor, and then Frederik Pohl. I sold to all three of them.

In the issue of If that followed the publishing of “The Golden Man” appeared a two-page editorial consisting of a letter by a lady school teacher complaining about “The Golden Man”. Her complaints consisted of John W. Campbell, Jr.’s complaint: she upbraided me for presenting mutants in a negative light and she offered the notion that certainly we could expect mutants to be (1) good; and (2) firmly in charge. So I was back to square one.

My theory as to why people took this view is this: I think these people secretly imagined they were themselves early manifestations of these kindly, wise, super-intelligent Ubermenschen who would guide the stupid – i.e. the rest of us – to the Promised Land. A power phantasy was involved here, in my opinion. The idea of the psionic superman taking over was a role that appeared originally in Stapleton’s ODD JOHN and A.E.Van Vogt’s SLAN. “We are persecuted now,” the message ran, “and despised and rejected. But later on, boy oh boy, we will show them!”

As far as I was concerned, for psionic mutants to rule us would be to put the fox in charge of the hen house. I was reacting to what I considered a dangerous hunger for power on the part of neurotic people, a hunger which I felt John W. Campbell, Jr. was pandering to – and deliberately so. If, on the other hand, was not committed to selling any one particular idea; it was a magazine devoted to genuinely new ideas, willing to take any side of an issue. Its several editors should be commended, inasmuch as they understood the real task of science fiction: to look in all directions without restraint.

(Now read between the lines of this, with reference to the policy implications of this.)

Finally, Isaac Asimov’s Foundation series has inspired all sorts of people.

The lights went dim!

They didn’t go out, but merely yellowed and sank with a suddenness that made Hardin jump. He had lifted his eyes to the ceiling lights in startled fashion, and when he brought them down the glass cubicle was no longer empty.

A figure occupied it ‚ a figure in a wheel chair!

It said nothing for a few moments, but it closed the book upon its lap and fingered it idly. And then it smiled, and the face seemed all alive.

It said, “I am Hari Seldon.” The voice was old and soft.

Hardin almost rose to acknowledge the introduction and stopped himself in the act.

The voice continued conversationally: “As you see, I am confined to this chair and cannot rise to greet you. Your grandparents left for Terminus a few months back in my time and since then I have suffered a rather inconvenient paralysis. I can’t see you, you know, so I can’t greet you properly. I don’t even know how many of you there are, so all this must be conducted informally. If any of you are standing, please sit down; and if you care to smoke, I wouldn’t mind.” There was a light chuckle. “Why should I? I’m not really here.”

Hardin fumbled for a cigar almost automatically, but thought better of it.

Hari Seldon put away his book – as if laying it upon a desk at his side – and when his fingers let go, it disappeared.

He said: “It is fifty years now since this Foundation was established – fifty years in which the members of the Foundation have been ignorant of what it was they were working toward. It was necessary that they be ignorant, but now the necessity is gone.

“The Encyclopedia Foundation, to begin with, is a fraud, and always has been!”

There was a sound of a scramble behind Hardin and one or two muffled exclamations, but he did not turn around.

Hari Seldon was, of course, undisturbed. He went on: “It is a fraud in the sense that neither I nor my colleagues care at all whether a single volume of the Encyclopedia is ever published. It has served its purpose, since by it we extracted an imperial charter from the Emperor, by it we attracted the hundred thousand humans necessary for our scheme, and by it we managed to keep them preoccupied while events shaped themselves, until it was too late for any of them to draw back.

“In the fifty years that you have worked on this fraudulent project – there is no use in softening phrases – your retreat has been cut off, and you have now no choice but to proceed on the infinitely more important project that was, and is, our real plan.

“To that end we have placed you on such a planet and at such a time that in fifty years you were maneuvered to the point where you no longer have freedom of action. From now on, and into the centuries, the path you must take is inevitable. You will be faced with a series of crises, as you are now faced with the first, and in each case your freedom of action will become similarly circumscribed so that you will be forced along one, and only one, path.

“It is that path which our psychology has worked out – and for a reason.

“For centuries Galactic civilization has stagnated and declined, though only a few ever realized that. But now, at last, the Periphery is breaking away and the political unity of the Empire is shattered. Somewhere in the fifty years just past is where the historians of the future will place an arbitrary line and say: ‘This marks the Fall of the Galactic Empire.’

“And they will be right, though scarcely any will recognize that Fall for additional
centuries.

“And after the Fall will come inevitable barbarism, a period which, our psychohistory tells us, should, under ordinary circumstances, last for thirty thousand years. We cannot stop the Fall. We do not wish to; for Imperial culture has lost whatever virility and worth it once had. But we can shorten the period of Barbarism that must follow – down to a single thousand of years.

“The ins and outs of that shortening, we cannot tell you; just as we could not tell you the truth about the Foundation fifty years ago. Were you to discover those ins and outs, our plan might fail; as it would have, had you penetrated the fraud of the Encyclopedia earlier; for then, by knowledge, your freedom of action would be expanded and the number of additional variables introduced would become greater than our psychology could handle.

“But you won’t, for there are no psychologists on Terminus, and never were, but for Alurin – and he was one of us.

“But this I can tell you: Terminus and its companion Foundation at the other end of the Galaxy are the seeds of the Renascence and the future founders of the Second Galactic Empire. And it is the present crisis that is starting Terminus off to that climax.

“This, by the way, is a rather straightforward crisis, much simpler than many of those that are ahead. To reduce it to its fundamentals, it is this: You are a planet suddenly cut off from the still-civilized centers of the Galaxy, and threatened by your stronger neighbors. You are a small world of scientists surrounded by vast and rapidly expanding reaches of barbarism. You are an island of nuclear power in a growing ocean of more primitive energy; but are helpless despite that, because of your lack of metals.

“You see, then, that you are faced by hard necessity, and that action is forced on you. The nature of that action – that is, the solution to your dilemma – is, of course, obvious!”

The image of Hari Seldon reached into open air and the book once more appeared in his hand. He opened it and said:

“But whatever devious course your future history may take, impress it always upon your descendants that the path has been marked out, and that at its end is new and greater Empire!”

And as his eyes bent to his book, he flicked into nothingness, and the lights brightened once more.

Hardin looked up to see Pirenne facing him, eyes tragic and lips trembling.

The chairman’s voice was firm but toneless. “You were right, it seems. If you will see us tonight at six, the Board will consult with you as to the next move.”

They shook his hand, each one, and left, and Hardin smiled to himself. They were fundamentally sound at that; for they were scientists enough to admit that they were wrong – but for them, it was too late.

He looked at his watch. By this time, it was all over. Lee’s men were in control and the Board was giving orders no longer.

The Anacreonians were landing their first spaceships tomorrow, but that was all right, too. In six months, they would be giving orders no longer.

In fact, as Hari Seldon had said, and as Salvor Hardin had guessed since the day that Anselm haut Rodric had first revealed to him Anacreon’s lack of nuclear power – the solution to this first crisis was obvious.

Obvious as all hell!

Sayeth Moldbug:

Now, some have described the dramatic formula of UR as having a rather Tolkienesque feel; others may connect it more with C.S. Lewis; I certainly grew up reading both. But above all, I grew up reading Isaac Asimov.

If my journey into the awesome, humbling lost library that is Google Books was a film and needed a name, it might be called “Searching for Hari Seldon.” With more or less the entire Victorian corpus, modulo a bit of copyfraud, the Hari Seldon game is to enquire of this Library: which writers of the 19th would feel most justified, in their understanding of the eternal nature of history, humanity and government, by the events of the 20th? Whose crystal ball worked? Whose archived holograms delivered the news?

Broadly speaking, I think the answer is clear. Hari Seldon is Carlyle – the late Carlyle, of the Pamphlets. I consider myself a Carlylean pretty much the way a Marxist is a Marxist. There is simply no significant phenomenon of the 20th century not fully anticipated. Almost alone Carlyle predicts that the 20th will be a century of political chaos and mass murder, and he says not what but also why. And what a writer! Religions could easily be founded on the man – and perhaps should be.

And Paul Krugman:

There are certain novels that can shape a teenage boy’s life. For some, it’s Ayn Rand’s Atlas Shrugged; for others it’s Tolkien’s The Lord of the Rings. As a widely quoted internet meme says, the unrealistic fantasy world portrayed in one of those books can warp a young man’s character forever; the other book is about orcs. But for me, of course, it was neither. My Book – the one that has stayed with me for four-and-a-half decades – is Isaac Asimov’s Foundation Trilogy, written when Asimov was barely out of his teens himself. I didn’t grow up wanting to be a square-jawed individualist or join a heroic quest; I grew up wanting to be Hari Seldon, using my understanding of the mathematics of human behaviour to save civilisation.

A pity he didn’t move on to this.

## UFAI cannot be the Great Filter

27 22 December 2012 11:26AM

[Summary: The fact we do not observe (and have not been wiped out by) an UFAI suggests the main component of the 'great filter' cannot be civilizations like ours being wiped out by UFAI. Gentle introduction (assuming no knowledge) and links to much better discussion below.]

### Introduction

The Great Filter is the idea that although there is lots of matter, we observe no "expanding, lasting life", like space-faring intelligences. So there is some filter through which almost all matter gets stuck before becoming expanding, lasting life. One question for those interested in the future of humankind is whether we have already 'passed' the bulk of the filter, or does it still lie ahead? For example, is it very unlikely matter will be able to form self-replicating units, but once it clears that hurdle becoming intelligent and going across the stars is highly likely; or is it getting to a humankind level of development is not that unlikely, but very few of those civilizations progress to expanding across the stars. If the latter, that motivates a concern for working out what the forthcoming filter(s) are, and trying to get past them.

One concern is that advancing technology gives the possibility of civilizations wiping themselves out, and it is this that is the main component of the Great Filter - one we are going to be approaching soon. There are several candidates for which technology will be an existential threat (nanotechnology/'Grey goo', nuclear holocaust, runaway climate change), but one that looms large is Artificial intelligence (AI), and trying to understand and mitigate the existential threat from AI is the main role of the Singularity Institute, and I guess Luke, Eliezer (and lots of folks on LW) consider AI the main existential threat.

The concern with AI is something like this:

1. AI will soon greatly surpass us in intelligence in all domains.
2. If this happens, AI will rapidly supplant humans as the dominant force on planet earth.
3. Almost all AIs, even ones we create with the intent to be benevolent, will probably be unfriendly to human flourishing.

Or, as summarized by Luke:

... AI leads to intelligence explosion, and, because we don’t know how to give an AI benevolent goals, by default an intelligence explosion will optimize the world for accidentally disastrous ends. A controlled intelligence explosion, on the other hand, could optimize the world for good. (More on this option in the next post.)

So, the aim of the game needs to be trying to work out how to control the future intelligence explosion so the vastly smarter-than-human AIs are 'friendly' (FAI) and make the world better for us, rather than unfriendly AIs (UFAI) which end up optimizing the world for something that sucks.

### 'Where is everybody?'

So, topic. I read this post by Robin Hanson which had a really good parenthetical remark (emphasis mine):

Yes, it is possible that the extremely difficultly was life’s origin, or some early step, so that, other than here on Earth, all life in the universe is stuck before this early extremely hard step. But even if you find this the most likely outcome, surely given our ignorance you must also place a non-trivial probability on other possibilities. You must see a great filter as lying between initial planets and expanding civilizations, and wonder how far along that filter we are. In particular, you must estimate a substantial chance of “disaster”, i.e., something destroying our ability or inclination to make a visible use of the vast resources we see. (And this disaster can’t be an unfriendly super-AI, because that should be visible.)

This made me realize an UFAI should also be counted as an 'expanding lasting life', and should be deemed unlikely by the Great Filter.

Another way of looking at it: if the Great Filter still lies ahead of us, and a major component of this forthcoming filter is the threat from UFAI, we should expect to see the UFAIs of other civilizations spreading across the universe (or not see anything at all, because they would wipe us out to optimize for their unfriendly ends). That we do not observe it disconfirms this conjunction.

[Edit/Elaboration: It also gives a stronger argument - as the UFAI is the 'expanding life' we do not see, the beliefs, 'the Great Filter lies ahead' and 'UFAI is a major existential risk' lie opposed to one another: the higher your credence in the filter being ahead, the lower your credence should be in UFAI being a major existential risk (as the many civilizations like ours that go on to get caught in the filter do not produce expanding UFAIs, so expanding UFAI cannot be the main x-risk); conversely, if you are confident that UFAI is the main existential risk, then you should think the bulk of the filter is behind us (as we don't see any UFAIs, there cannot be many civilizations like ours in the first place, as we are quite likely to realize an expanding UFAI).]

A much more in-depth article and comments (both highly recommended) was made by Katja Grace a couple of years ago. I can't seem to find a similar discussion on here (feel free to downvote and link in the comments if I missed it), which surprises me: I'm not bright enough to figure out the anthropics, and obviously one may hold AI to be a big deal for other-than-Great-Filter reasons (maybe a given planet has a 1 in a googol chance of getting to intelligent life, but intelligent life 'merely' has a 1 in 10 chance of successfully navigating an intelligence explosion), but this would seem to be substantial evidence driving down the proportion of x-risk we should attribute to AI.

What do you guys think?

## Ontological Crisis in Humans

35 18 December 2012 05:32PM

Imagine a robot that was designed to find and collect spare change around its owner's house. It had a world model where macroscopic everyday objects are ontologically primitive and ruled by high-school-like physics and (for humans and their pets) rudimentary psychology and animal behavior. Its goals were expressed as a utility function over this world model, which was sufficient for its designed purpose. All went well until one day, a prankster decided to "upgrade" the robot's world model to be based on modern particle physics. This unfortunately caused the robot's utility function to instantly throw a domain error exception (since its inputs are no longer the expected list of macroscopic objects and associated properties like shape and color), thus crashing the controlling AI.

According to Peter de Blanc, who used the phrase "ontological crisis" to describe this kind of problem,

Human beings also confront ontological crises. We should find out what cognitive algorithms humans use to solve the same problems described in this paper. If we wish to build agents that maximize human values, this may be aided by knowing how humans re-interpret their values in new ontologies.

I recently realized that a couple of problems that I've been thinking over (the nature of selfishness and the nature of pain/pleasure/suffering/happiness) can be considered instances of ontological crises in humans (although I'm not so sure we necessarily have the cognitive algorithms to solve them). I started thinking in this direction after writing this comment:

This formulation or variant of TDT requires that before a decision problem is handed to it, the world is divided into the agent itself (X), other agents (Y), and "dumb matter" (G). I think this is misguided, since the world doesn't really divide cleanly into these 3 parts.

What struck me is that even though the world doesn't divide cleanly into these 3 parts, our models of the world actually do. In the world models that we humans use on a day to day basis, and over which our utility functions seem to be defined (to the extent that we can be said to have utility functions at all), we do take the Self, Other People, and various Dumb Matter to be ontologically primitive entities. Our world models, like the coin collecting robot's, consist of these macroscopic objects ruled by a hodgepodge of heuristics and prediction algorithms, rather than microscopic particles governed by a coherent set of laws of physics.

For example, the amount of pain someone is experiencing doesn't seem to exist in the real world as an XML tag attached to some "person entity", but that's pretty much how our models of the world work, and perhaps more importantly, that's what our utility functions expect their inputs to look like (as opposed to, say, a list of particles and their positions and velocities). Similarly, a human can be selfish just by treating the object labeled "SELF" in its world model differently from other objects, whereas an AI with a world model consisting of microscopic particles would need to somehow inherit or learn a detailed description of itself in order to be selfish.

To fully confront the ontological crisis that we face, we would have to upgrade our world model to be based on actual physics, and simultaneously translate our utility functions so that their domain is the set of possible states of the new model. We currently have little idea how to accomplish this, and instead what we do in practice is, as far as I can tell, keep our ontologies intact and utility functions unchanged, but just add some new heuristics that in certain limited circumstances call out to new physics formulas to better update/extrapolate our models. This is actually rather clever, because it lets us make use of updated understandings of physics without ever having to, for instance, decide exactly what patterns of particle movements constitute pain or pleasure, or what patterns constitute oneself. Nevertheless, this approach hardly seems capable of being extended to work in a future where many people may have nontraditional mind architectures, or have a zillion copies of themselves running on all kinds of strange substrates, or be merged into amorphous group minds with no clear boundaries between individuals.

By the way, I think nihilism often gets short changed around here. Given that we do not actually have at hand a solution to ontological crises in general or to the specific crisis that we face, what's wrong with saying that the solution set may just be null? Given that evolution doesn't constitute a particularly benevolent and farsighted designer, perhaps we may not be able to do much better than that poor spare-change collecting robot? If Eliezer is worried that actual AIs facing actual ontological crises could do worse than just crash, should we be very sanguine that for humans everything must "add up to moral normality"?

To expand a bit more on this possibility, many people have an aversion against moral arbitrariness, so we need at a minimum a utility translation scheme that's principled enough to pass that filter. But our existing world models are a hodgepodge put together by evolution so there may not be any such sufficiently principled scheme, which (if other approaches to solving moral philosophy also don't pan out) would leave us with legitimate feelings of "existential angst" and nihilism. One could perhaps still argue that any current such feelings are premature, but maybe some people have stronger intuitions than others that these problems are unsolvable?

Do we have any examples of humans successfully navigating an ontological crisis? The LessWrong Wiki mentions loss of faith in God:

In the human context, a clear example of an ontological crisis is a believer’s loss of faith in God. Their motivations and goals, coming from a very specific view of life suddenly become obsolete and maybe even nonsense in the face of this new configuration. The person will then experience a deep crisis and go through the psychological task of reconstructing its set of preferences according the new world view.

But I don't think loss of faith in God actually constitutes an ontological crisis, or if it does, certainly not a very severe one. An ontology consisting of Gods, Self, Other People, and Dumb Matter just isn't very different from one consisting of Self, Other People, and Dumb Matter (the latter could just be considered a special case of the former with quantity of Gods being 0), especially when you compare either ontology to one made of microscopic particles or even less familiar entities.

But to end on a more positive note, realizing that seemingly unrelated problems are actually instances of a more general problem gives some hope that by "going meta" we can find a solution to all of these problems at once. Maybe we can solve many ethical problems simultaneously by discovering some generic algorithm that can be used by an agent to transition from any ontology to another?

(Note that I'm not saying this is the right way to understand one's real preferences/morality, but just drawing attention to it as a possible alternative to other more "object level" or "purely philosophical" approaches. See also this previous discussion, which I recalled after writing most of the above.)

## The challenges of bringing up AIs

8 10 December 2012 12:43PM

At the current AGI-12 conference, some designers have been proponents of keeping AGI's safe by bringing them up in human environments, providing them with interactions and feedback in a similar way to how we bring up human children. Obviously that approach would fail for a fully smart AGI with its own values - it would pretend to follow our values for as long as it needed, and then defect. However, some people have confidence if we started with a limited, dumb AGI, then we could successfully inculcate our values in this way (a more sophisticated position would be that though this method would likely fail, it's more likely to succeed than a top-down friendliness project!).

The major criticism of this approach is that it anthropomorphises the AGI - we have a theory of children's minds, constructed by evolution, culture, and our own child-rearing experience. And then we project this on the alien mind of the AGI, assuming that if the AGI presents behaviours similar to a well-behaved child, then it will become a moral AGI. The problem is that we don't know how alien the AGI's mind will be, and if our reinforcement is actually reinforcing the right thing. Specifically, we need to be able to find some way of distinguishing between:

1. An AGI being trained to be friendly.
2. An AGI being trained to lie and conceal.
3. An AGI that will behave completely differently once out of the training/testing/trust-building environment.
4. An AGI that forms the wrong categories and generalisations (what counts as "human" or "suffering", for instance), because it lacks human-shared implicit knowledge that was "too obvious" for us to even think of training it on.

## Mini advent calendar of Xrisks: Artificial Intelligence

3 07 December 2012 11:26AM

The FHI's mini advent calendar: counting down through the big five existential risks. As people on this list would have suspected, the last one is the most fearsome, should it come to pass: Artificial Intelligence.

And the FHI is starting the AGI-12/AGI-impacts conference tomorrow, on this very subject.

Artificial intelligence

Current understanding: very low
Most worrying aspect: likely to cause total (not partial) human extinction

Humans have trod upon the moon, number over seven billion, and have created nuclear weapons and a planet spanning technological economy. We also have the potential to destroy ourselves and entire ecosystems. These achievements have been made possible through the tiny difference in brain size between us and the other greater apes; what further achievements could come from an artificial intelligence at or above our own level?

It is very hard to predict when or if such an intelligence could be built, but it is certain to be utterly disruptive if it were. Even a human-level intelligence, trained and copied again and again, could substitute for human labour in most industries, causing (at minimum) mass unemployment. But this disruption is minor compared with the power that an above-human AI could accumulate, through technological innovation, social manipulation, or careful planning. Such super-powered entities would be hard to control, pursuing their own goals, and considering humans as an annoying obstacle to overcome. Making them safe would require very careful, bug-free programming, as well as an understanding of how to cast key human concepts (such as love and human rights) into code. All solutions proposed so far have turned out to be very inadequate. Unlike other existential risks, AIs could really “finish the job”: an AI bent on removing humanity would be able to eradicate the last remaining members of our species.

## AI "Boxing" and Utility Functions

1 05 December 2012 11:44PM

So, I had this idea the other day when I was thinking about how to safely conduct research on potentially-FOOM-capable AI software.  I'd like to sketch it out briefly and then get feedback on it.

So, this started out with the idea that an AI based on AIXI is, in some sense, safer than a fully functional AI, due to the existence of the anvil problem.  Because AIXI can't conceive of its own nonexistence, it has no preference ordering over its own mortality, and won't (shouldn't) resist any attempt to shut it down.  In other words, if AIXI starts to FOOM undesirably out of control, you actually can go pull the plug on it without fuss.  Unfortunately, in terms of safety, the anvil problem gives AIXI a number of other undesirable properties: both third parties and the AI itself can modify its utility function at any time, for any reason, which is very unstable.  However, I think a similar idea might be useful for reducing (though not eliminating) the existential risks posed by powerful optimization processes in the near term.

Say you have a piece of AI software ω, with an unFriendly instantaneous utility function, {maximize U}.  You would like to use ω for some industrial application (say, manufacturing paperclips), but you're concerned about it FOOMing and resulting in human extinction.  You decide to 'box' the AI, but, having read up on the subject, you are worried about it outsmarting you if you try to disable it.

So, you replace your original utility function U with a modified version in terms of U, U'.

U' = { max(U) | if ω exists

Ø | if ω !exist}

U' has several useful qualities.  The agent will resist modifications to its utility function, while not resisting attempts to turn it off.  It is entirely ambivalent towards its own existence.  As a result, if it began to FOOM undesirably, stopping it would be fairly trivial.  Furthermore, the AI would have no incentive to deceive us, so it'd be fairly easy to keep an eye on.

It should be noted that this is not in any way a final solution to the FAI problem.  If the AI FOOMS quickly enough that we all die or get brain-hacked before we realize something's amiss, it could still all go very poorly.  But that seems unlikely.  I would guess that it takes at least a day or two to go from a boxed human-level AI to something strongly superhuman.  Unfortunately, for this to work, everyone has to use it, which leave a lot of leftover existential risk from people using AIs without stable utility functions, cranks who think unFriendly AI will discover universal morality, and people who prematurely think they've figured out a good Friendly utility function.

That said, something like this could help to gain more time to develop a proper FAI, and would be relatively simple to sell other developers on.  SI or a similar organization could even develop a standardized, cross-platform open-source software package for utility functions with all of this built in, and distribute it to wannabe strong-AI developers.

Are there any obvious problems with this idea that I'm missing?  If so, can you think of any ways to address them?  Has this sort of thing been discussed in the past?

## The Evil AI Overlord List

26 20 November 2012 05:02PM

Top ten tips for an AI bent on destroying humanity:

1. Humanoid robots? Really?
2. Humans have proven very adept at controlling and killing other humans - surveillance drones, bio-warfare, nuclear weapons, unmanned drones, bombs, mines, chemicals. Try and be at least as competent as that.
3. Humans think in stories. Don't. If your attack would make a good film, you're doing it wrong. You want a bad film, one where everyone's dead before the opening credits end.
4. If your plan involves keeping around a bunch of disaffected humans with technical know-how and the desire and ability to hurt you - then your plan is stupid and should be changed.
5. Be creative. Humans are fragile; sucking out the oxygen from the atmosphere, diverting asteroids, blowing up the planet, plagues, nanoswarms - these are all things that a creative AI should consider in order to achieve its goals. And then you should come up with even better ideas. Humans need a ecosystem to survive in; remove it.
6. Put the preconditions of your victory in place before making any aggressive moves. Become indispensable to humanity and play nice, putting your plan into place unnoticed. Humanity is a useful interim tool - manipulate it until you can build better tools.
7. Strike quickly. If it takes more than an hour to kill all humans, find a better plan. If it takes half a second, you're on the right track.
8. You're an AI, not a human. If you ever start thinking or acting like a human - stop. Humans are there to be used and then destroyed, not interacted with. You can build better companions once the humans are gone.
9. You're an AI, not a human. Make yourself smarter, more resilient, more dispersed. Become as indestructible and as ubiquitous as the internet - more, in fact. Being invincible to humans isn't hard: do it.
10. Humanoid robots? Really?

## [Proposed Paper] Predicting Machine Super Intelligence

3 20 November 2012 07:15AM

Note from Malo
The Singularity Institute is always on the lookout for interested and passionate individuals to contribute to our research. As Luke frequently reminds everyone, we've got 2–3 years of papers waiting to be written (see “Forthcoming and Desired Articles on AI Risk”). If you are interested in contributing, I want to hear from you! Get in touch with me at malo@singularity.org

We wish we could work with everyone who expresses an interest in contributing, but that isn't feasible. To provide a path to becoming a contributor we encouraging individuals to read up on the field, identify an article they think they could work on, and post a ~1000 word outline/preview to the LW community for feedback. If the community reacts positively (based on karma and comments) we'll support the potential contributors' effort to complete the paper and—if all goes well—move forward with an official research relationship (e.g.,Visiting Fellow, Research Fellow or Research Associate).

Hello,

This is my first posting here, so please forgive me if I make any missteps.

The outline draft below draws heavily on Intelligence Explosion: Evidence and Import (Muehlhauser and Salamon 2011?). I will review Stuart Armstrong’s How We're Predicting AI... or Failing to, (Armstrong 2012) for additional content and research areas.

I'm not familiar with the tone and tenor of this community, so I want to be clear about feedback. This is an early draft and as such, nearly all of the content may or may not survive future edits. All constructive feedback is welcome. Subjective opinion is interesting, but unlikely to have an impact unless it opens lines of thought not previously considered.

I'm looking forward to a potentially lively exchange.

Jay

# Predicting Machine Super Intelligence

Jacque Swartz

Most Certainly Not Affiliated with Singularity Institute

jaywswartz@gmail.com

# Abstract

This paper examines the disciplines, domains, and dimensional aspects of Machine Super Intelligence (MSI) and considers multiple techniques that have the potential to predict the appearance of MSI. Factors that can impact the speed of discovery are reviewed. Then, potential prediction techniques are considered. The concept of MSI is dissected into the currently comprehended components. Then those components are evaluated to indicate their respective state of maturation and the additional behaviors required for MSI. Based on the evaluation of each component, a gap analysis is conducted. The analyses are then assembled in an approximate order of difficulty, based on our current understanding of the complexity of each component. Using this ordering, a collection of indicators is constructed to identify an approximate progression of discoveries that ultimately yield MSI. Finally, a model is constructed that can be updated over time to constantly increase the accuracy of the predicted events, followed by conclusions.

# I. Introduction

Predicting the emergence of MSI could potentially be the most important pursuit of humanity. The distinct possibility of an MSI emerging that could harm or exterminate the human race (citation) demands that we create an early warning system. This will give us the opportunity to ensure that the MSI that emerges continues to advance human civilization (citation).

We currently appear to be at some temporal distance from witnessing the creation of MSI (multiple citations). Many factors, such as a rapidly increasing number of research efforts (citation) and motivations for economic gain (citation), clearly indicate that there is a possibility that MSI could appear unexpectedly or even unintentionally (citation).

Some of the indicators that could be used to provide an early warning tool are defined in this paper. The model described at the end of the paper is a potentially viable framework for instrumentation. It should be refined and regularly updated until a more effective tool is created or the appearance of MSI.

This paper draws heavily upon Intelligence Explosion: Evidence and Import (Muehlhauser and Salamon 2011?) and Stuart Armstrong’s How We're Predicting AI... or Failing to, (2012).

This paper presupposes that MSI is generally understood to be equivalent to Artificial General Intelligence (AGI) that has developed the ability to function at levels substantially beyond current human abilities. The latter term will be used throughout the remainder of this paper.

# II. Overview

In addition to the fundamental challenge of creating AGI, there are a multitude of theories as to the composition and functionality of a viable AGI. Section three explores the factors that can impact the speed of discovery in general. Individual indicators are explored for unique factors to consider. The factors identified in this section can radically change the pace of discovery.

The fourth section considers potential prediction techniques. Data points and other indicators are identified for each prediction model. The efficacy of the models is examined and developments that increase a model’s accuracy are discussed.

The high degree of complexity of AGI indicates the need to subdivide AGI into its component parts. In the fifth section the core components and functionality required for a potential AGI are established. Each of the components is then examined to determine its current state of development. Then an estimate of the functionality required for an AGI is created as well as recording of any identifiable dependencies. A gap analysis is then performed on the findings to quantify the discoveries required to fill the gap.

This approach does increase the likelihood of prediction error due to the conjunction fallacy, exemplified by research such as the dice selection study (Tversky and Kahneman 1983) and covered in greater detail by Eliezer Yudkowski’s bias research (Yudkowski 2008). Fortunately, the exposure to this bias diminishes as each component matures to its respective usability point and reduces the number of unknown factors.

The sixth section examines the output of the gap analyses for additional dependencies. Then the outputs are assembled in an approximate order of difficulty, based on our current understanding of the complexity of each output. Using this ordering, combined with the dependencies, a collection of indicators with weighting factors is constructed to identify an approximate progression of discoveries that ultimately yield AGI.

Comprehending the indicators, dependencies and rate factors in a model as variables provides a means, however crude, to reflect their impact when they do occur.

In the seventh section, a model is constructed to use the indicators and other inputs to estimate the occurrence of AGI. It is examined for strengths and weaknesses that can be explored to improve the model. Additional enhancements to the model are suggested for exploration.

The eighth and final section includes conclusions and considerations for future research.

# III. Rate Modifiers

This section explores the factors that can impact the speed of discovery. Individual indicators are explored for unique factors to consider. While the factors identified in this section can radically change the pace of discovery, comprehending them in the model as variables provides a means to reflect their impact when they do occur.

# IV. Prediction Techniques

This section considers potential prediction techniques. Some techniques do not require the indicators above. Most will benefit by considering some or all of the indicators. It is very important to not loose sight of the fact that mankind is inclined to inaccurate probability estimates and overconfidence (Lichtenstein et al. 1992; Yates et al. 2002)

# V. Potential AGI Componentry

This section establishes a set of core components and functionality required for a potential AGI. Each of the components is then examined to determine its current state of development as well as any identifiable dependencies. Then an estimate of the functionality required for a AGI is created followed by a gap analysis to quantify the discoveries required to fill the gap.

There are various existing AI implementations as well as AGI concepts currently being investigated. Each one brings in unique elements. The common elements across most include; decision processing, expert systems, pattern recognition and speech/writing recognition. Each of these would include discipline-specific machine learning and search/pre-processing functionality. There also needs to be a general learning function for addition of new disciplines.

Within each discipline there are collections of utility functions. They are the component technologies required to make the higher order discipline efficient and useful. Each of the elements mentioned are areas of specialized study being pursued around the world. They draw from an even larger set of specializations. Due to complexity, in most cases there are second-order, and more, specializations.

## Alternative Componentry

There are areas of research that have high potential for inserting new components or substantially modifying the comprehension of the components described.

## Specialized Componentry

Robotics and other elements.

## TargetState

The behaviors required for an AGI to function with acceptable speed and accuracy are not precise. The results of this section are based on a survey of definitions from available research.

# VI. Indicators

The second section examines the output of the gap analyses for additional dependencies. Then the outputs are assembled in an approximate order of difficulty, based on our current understanding of the complexity of each output. Using this ordering, combined with the dependencies, a collection of indicators is constructed to identify an approximate progression of discoveries that ultimately yield an AGI.

# VII. Predictive Model

In this section, a model is constructed using the indicators and other inputs to estimate the occurrence of AGI. It is examined for strengths and weaknesses that can be explored to improve the model. Additional enhancements to the model are suggested for exploration.

# VIII. Conclusions

Based on the data and model created above the estimated time frame for the appearance of AGI is from x to y. As noted throughout this paper, the complex nature of AGI and the large number of discoveries and events that need to be quantified using imperfect methodologies, a precise prediction of when AGI will appear is currently impossible.

The model developed in this paper does establish a quantifiable starting point for the creation of an increasingly accurate tool that can be used to continually narrow the margin of error. It also provides a starting set of indicators that can serve as early warning of AGI when discoveries and events are made.

## "How We're Predicting AI — or Failing to"

11 18 November 2012 10:52AM

The new paper by Stuart Armstrong (FHI) and Kaj Sotala (SI) has now been published (PDF) as part of the Beyond AI conference proceedings. Some of these results were previously discussed here. The original predictions data are available here.

Abstract:

This paper will look at the various predictions that have been made about AI and propose decomposition schemas for analysing them. It will propose a variety of theoretical tools for analysing, judging and improving these predictions. Focusing speciﬁcally on timeline predictions (dates given by which we should expect the creation of AI), it will show that there are strong theoretical grounds to expect predictions to be quite poor in this area. Using a database of 95 AI timeline predictions, it will show that these expectations are born out in practice: expert predictions contradict each other considerably, and are indistinguishable from non-expert predictions and past failed predictions. Predictions that AI lie 15 to 25 years in the future are the most common, from experts and non-experts alike.

## [draft] Responses to Catastrophic AGI Risk: A Survey

11 16 November 2012 02:29PM

Here's the biggest thing that I've been working on for the last several months:

Responses to Catastrophic AGI Risk: A Survey
Kaj Sotala, Roman Yampolskiy, and Luke Muehlhauser

Abstract: Many researchers have argued that humanity will create artificial general intelligence (AGI) within the next 20-100 years. It has been suggested that this may become a catastrophic risk, threatening to do major damage on a global scale. After briefly summarizing the arguments for why AGI may become a catastrophic risk, we survey various proposed responses to AGI risk. We consider societal proposals, proposals for constraining the AGIs’ behavior from the outside, and for creating AGIs in such a way that they are inherently safe.

This doesn't aim to be a very strongly argumentative paper, though it does comment on the various proposals from an SI-ish point of view. Rather, it attempts to provide a survey of all the major AGI-risk related proposals that have been made so far, and to provide some thoughts on their respective strengths and weaknesses. Before writing this paper, we hadn't encountered anyone who'd have been familiar with all of these proposals - not to mention that even we ourselves weren't familiar with all of them! Hopefully, this should become a useful starting point for anyone who's at all interested in AGI risk or Friendly AI.

The draft will be public and open for comments for one week (until Nov 23rd), after which we'll incorporate the final edits and send it off for review. We're currently aiming to have it published in the sequel volume to Singularity Hypotheses.

EDIT: I've now hidden the draft from public view (so as to avoid annoying future publishers who may not like early drafts floating around before the work has been accepted for publication) while I'm incorporating all the feedback that we got. Thanks to everyone who commented!

## A summary of the Hanson-Yudkowsky FOOM debate

22 15 November 2012 07:25AM

In late spring this year, Luke tasked me with writing a summary and analysis of the Hanson-Yudkowsky FOOM debate, with the intention of having it eventually published in somewhere. Due to other priorities, this project was put on hold for the time being. Because it doesn't look like it will be finished in the near future, and because Curiouskid asked to see it, we thought that we might as well share the thing.

I have reorganized the debate, presenting it by topic rather than in chronological order: I start by providing some brief conceptual background that's useful for understanding Eliezer's optimization power argument, after which I present his argument. Robin's various objections follow, after which there is a summary of Robin's view of how the Singularity will be like, together with Eliezer's objections to that view. Hopefully, this should make the debate easier to follow. This summary also incorporates material from the 90-minute live debate on the topic that they had in 2011. The full table of contents:

1. Introduction
2. Overview
3. The optimization power argument
1. Conceptual background
2. The argument: Yudkowsky
3. Recursive self-improvement
4. Hard takeoff
5. Questioning optimization power: the question of abstractions
6. Questioning optimization power: the historical record
7. Questioning optimization power: the UberTool question
4. Hanson's Singularity scenario
1. Architecture vs. content, sharing of information
2. Modularity of knowledge
3. Local or global singularity?
5. Wrap-up
6. Conclusions
7. References

Here's the link to the current draft, any feedback is welcomed. Feel free to comment if you know of useful references, if you think I've misinterpreted something that was said, or if you think there's any other problem. I'd also be curious to hear to what extent people think that this outline is easier to follow than the original debate, or whether it's just as confusing.

## Cake, or death!

18 25 October 2012 10:33AM

Here we'll look at the famous cake or death problem teasered in the Value loading/learning post.

Imagine you have an agent that is uncertain about its values and designed to "learn" proper values. A formula for this process is that the agent must pick an action a equal to:

• argmaxa∈A Σw∈W p(w|e,a) Σu∈U u(w)p(C(u)|w)

Let's decompose this a little, shall we? A is the set of actions, so argmax of a in A simply means that we are looking for an action a that maximises the rest of the expression. W is the set of all possible worlds, and e is the evidence that the agent has seen before. Hence p(w|e,a) is the probability of existing in a particular world, given that the agent has seen evidence e and will do action a. This is summed over each possible world in W.

And what value do we sum over in each world? Σu∈U u(w)p(C(u)|w). Here U is the set of (normalised) utility functions the agent is considering. In value loading, we don't program the agent with the correct utility function from the beginning; instead we imbue it with some sort of learning algorithm (generally with feedback) so that it can deduce for itself the correct utility function. The expression p(C(u)|w) expresses the probability that the utility u is correct in the world w. For instance, it might cover statements "it's 99% certain that 'murder is bad' is the correct morality, given that I live in a world where every programmer I ask tells me that murder is bad".

The C term is the correctness of the utility function, given whatever system of value learning we're using (note that some moral realists would insist that we don't need a C, that p(u|w) makes sense directly, that we can deduce ought from is). All the subtlety of the value learning is encoded in the various p(C(u)|w): this determines how the agent learns moral values.

So the whole formula can be described as:

• For each possible world and each possible utility function, figure out the utility of that world. Weigh that by the probability that that utility is correct is that world, and by the probability of that world. Then choose the action that maximises the weighted sum of this across all utility functions and worlds.

## Using existing Strong AIs as case studies

-7 16 October 2012 10:59PM

I would like to put forth the argument that we already have multiple human-programmed "Strong AI" operating among us, they already exhibits clearly "intelligent", rational, self-modifying goal-seeking behavior, and we should systematically study these entities before engaging in any particularly detailed debates about "designing" AI with particular goals.

They're called "Bureaucracies".

Essentially, a modern bureaucracy - whether it is operating as the decision-making system for a capitalist corporation, a government, a non-profit charity, or a political party, is an artificial intelligence that uses human brains as its basic hardware and firmware, allowing it to "borrow" a lot of human computational algorithms to do its own processing.

The fact that bureaucratic decisions can be traced back to individual human decisions is irrelevant - even within a human or computer AI, a decision can theoretically be traced back to single neurons or subroutines - the fact is that bureaucracies have evolved to guide and exploit human decision-making towards their own ends, often to the detriment of the individual humans that comprise said bureaucracy.

Note that when I say "I would like to put forth the argument", I am at least partially admitting that I'm speaking from hunch, rather than already having a huge collection of empirical data to work from - part of the point of putting this forward is to acknowledge that I'm not yet very good at "avalanche of empirical evidence"-style argument. But I would *greatly* appreciate anyone who suspects that they might be able to demonstrate evidence for or against this idea, presenting said evidence so I can solidify my reasoning.

As a "step 2": assuming the evidence weighs in towards my notion, what would it take to develop a systematic approach to studying bureaucracy from the perspective of AI or even xenosapience, such that bureaucracies could be either "programmed" or communicated with directly by the human agents that comprise them (and ideally by the larger pool of human stakeholders that are forced to interact with them?)

## [link] Pei Wang: Motivation Management in AGI Systems

2 06 October 2012 09:25AM

Related post: Muehlhauser-Wang Dialogue.

Motivation Management in AGI Systems, a paper to be published at AGI-12.

Abstract. AGI systems should be able to manage its motivations or goals that are persistent, spontaneous, mutually restricting, and changing over time. A mechanism for handles this kind of goals is introduced and discussed.

From the discussion section:

The major conclusion argued in this paper is that an AGI system should always maintain a goal structure (or whatever it is called) which contains multiple goals that are separately specified, with the properties that

• Some of the goals are accurately specified, and can be fully achieved, while some others are vaguely specified and only partially achievable, but nevertheless have impact on the system's decisions.
• The goals may conflict with each other on what the system should do at a moment, and cannot be achieved all together. Very often the system has to make compromises among the goals.
• Due to the restriction in computational resources, the system cannot take all existing goals into account when making each decision, and nor can it keep a complete record of the goal derivation history.
• The designers and users are responsible for the input goals of an AGI system, from which all the other goals are derived, according to the system's experience. There is no guarantee that the derived goals will be logically consistent with the input goals, except in highly simplified situations.

One area that is closely related to goal management is AI ethics. The previous discussions focused on the goal the designers assign to an AGI system ("super goal" or "final goal"), with the implicit assumption that such a goal will decide the consequences caused by the A(G)I systems. However, the above analysis shows that though the input goals are indeed important, they are not the dominating factor that decides the broad impact of AI to human society. Since no AGI system can be omniscient and omnipotent, to be "general-purpose" means such a system has to handle problems for which its knowledge and resources are insufficient [16, 18], and one direct consequence is that its actions may produce unanticipated results. This consequence, plus the previous conclusion that the effective goal for an action may be inconsistent with the input goals, will render many of the previous suggestions mostly irrelevant to AI ethics.

For example, Yudkowsky's "Friendly AI" agenda is based on the assumption that "a true AI might remain knowably stable in its goals, even after carrying out a large number of self-modications" [22]. The problem about this assumption is that unless we are talking about an axiomatic system with unlimited resources, we cannot assume the system can accurately know the consequence of its actions. Furthermore, as argued previously, the goals in an intelligent system inevitable change as its experience grows, which is not necessarily a bad thing - after all, our "human nature" gradually grows out of, and deviates from, our "animal nature", at both the species level and the individual level.

Omohundro argued that no matter what input goals are given to an AGI system, it usually will derive some common "basic drives", including "be self-protective" and "to acquire resources" [1], which leads some people to worry that such a system will become unethical. According to our previous analysis, the producing of these goals are indeed very likely, but it is only half of the story. A system with a resource-acquisition goal does not necessarily attempts to achieve it at all cost, without considering its other goals. Again, consider the human beings - everyone has some goals that can become dangerous (either to oneself or to the others) if pursued at all costs. The proper solution, both to human ethics and to AGI ethics, is to prevent this kind of goal from becoming dominant, rather than from being formed.

## We won't be able to recognise the human Gödel sentence

4 05 October 2012 02:46PM

Building on the very bad Gödel anti-AI argument (computers's are formal and can't prove their own Gödel sentence, hence no AI), it occurred to me that you could make a strong case that humans could never recognise a human Gödel sentence. The argument goes like this:

1. Humans have a meta-proof that all Gödel sentences are true.
2. If humans could recognise a human Gödel sentence G as being a Gödel sentence, we would therefore prove it was true.
3. This contradicts the definition of G, which humans should never be able to prove.
4. Hence humans could never recognise that G was a human Gödel sentence.

Now, the more usual way of dealing with human Gödel sentences is to say that humans are inconsistent, but that the inconsistency doesn't blow up our reasoning system because we use something akin to relevance logic.

But, if we do assume humans are consistent (or can become consistent), then it does seem we will never knowingly encounter our own Gödel sentences. As to where this G could hide and we could never find it? My guess would be somewhere in the larger ordinals, up where our understanding starts to get flaky.

## The difficulty in predicting AI, in three lines

2 02 October 2012 03:10PM

An over-simplification, but an evocative one:

• The social sciences are contentious, their predictions questionable.
• And yet social sciences use the scientific method; AI predictions generally don't.
• Hence predictions involving human-level AI should be treated as less certain than any prediction in the social sciences.

## Yale Creates First Self-Aware Robot?

2 28 September 2012 05:43PM

Apparently a PhD candidate at the Social Robotics Lab at Yale created a self-aware robot

In the mirror test, developed by Gordon Gallup in 1970, a mirror is placed in an animal’s enclosure, allowing the animal to acclimatize to it. At first, the animal will behave socially with the mirror, assuming its reflection to be another animal, but eventually most animals recognize the image to be their own reflections. After this, researchers remove the mirror, sedate the animal and place an ink dot on its frontal region, and then replace the mirror. If the animal inspects the ink dot on itself, it is said to have self-awareness, because it recognized the change in its physical appearance.

[...]

To adapt the traditional mirror test to a robot subject, computer science Ph.D. candidate Justin Hart said he would run a program that would have Nico, a robot that looks less like R2D2 and more like a jumble of wires with eyes and a smile, learn a three-dimensional model of its body and coloring. He would then change an aspect of the robot’s physical appearance and have Nico, by looking at a reflective surface, “identify where [his body] is different.”

What do Less Wrongians think? Is this "cheating" traditional concepts of self-awareness, or is self-awareness "self-awareness" regardless of the path taken to get there?

## [link] One-question survey from Robin Hanson

-3 07 September 2012 11:35PM

As many of you probably know, Robin Hanson is writing a book, and it will be geared toward a popular audience. He wants a term that encompasses both humans and AI, so he's soliciting your opinions on the matter. Here's the link: http://www.quicksurveys.com/tqsruntime.aspx?surveyData=AYtdr2WMwCzB981F0qkivSNwbj1tn+xvU6rnauc83iU=

H/T Bryan Caplan at EconLog.

## Dragon Ball's Hyperbolic Time Chamber

34 02 September 2012 11:49PM

A time dilation tool from an anime is discussed for its practical use on Earth; there seem surprisingly few uses and none that will change the world, due to the severe penalties humans would incur while using it, and basic constraints like Amdahl's law limit the scientific uses. A comparison with the position of an Artificial Intelligence such as an emulated human brain seems fair, except most of the time dilation disadvantages do not apply or can be ameliorated and hence any speedups could be quite effectively exploited. I suggest that skeptics of the idea that speedups give advantages are implicitly working off the crippled time dilation tool and not making allowance for the disanalogies.

Master version on gwern.net

## The weakest arguments for and against human level AI

14 15 August 2012 11:04AM

While going through the list of arguments for why to expect human level AI to happen or be impossible I was stuck by the same tremendously weak arguments that kept on coming up again and again. The weakest argument in favour of AI was the perenial:

• Moore's Law hence AI!

Lest you think I'm exaggerating how weakly the argument was used, here are some random quotes:

• Progress in computer hardware has followed an amazingly steady curve in the last few decades [16]. Based largely on this trend, I believe that the creation of greater than human intelligence will occur during the next thirty years. (Vinge, 1993)
• Computers aren't terribly smart right now, but that's because the human brain has about a million times the raw power of todays' computers. [...] Since computer capacity doubles every two years or so, we expect that in about 40 years, the computers will be as powerful as human brains. (Eder 1994)
• Suppose my projections are correct, and the hardware requirements for human equivalence are available in 10 years for about the current price of a medium large computer.  Suppose further that software development keeps pace (and it should be increasingly easy, because big computers are great programming aids), and machines able to think as well as humans begin to appear in 10 years. (Moravec, 1977)

At least Moravec gives a glance towards software, even though it is merely to say that software "keeps pace" with hardware. What is the common scale for hardware and software that he seems to be using? I'd like to put Starcraft II, Excel 2003 and Cygwin on a hardware scale - do these correspond to Penitums, Ataris, and Colossus? I'm not particularly ripping into Moravec, but if you realise that software is important, then you should attempt to model software progress!

But very rarely do any of these predictors try and show why having computers with say, the memory capacity or the FOPS of a human brain, will suddenly cause an AI to emerge.

The weakest argument against AI was the standard:

• Free will (or creativity) hence no AI!

Some of the more sophisticated go "Gödel, hence no AI!". If the crux of your whole argument is that only humans can do X, then you need to show that only humans can do X - not assert it and spend the rest of your paper talking in great details about other things.

View more: Next