Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Bostrom versus Transcendence

8 Stuart_Armstrong 18 April 2014 08:31AM

SHRDLU, understanding, anthropomorphisation and hindsight bias

10 Stuart_Armstrong 07 April 2014 09:59AM

EDIT: Since I didn't make it sufficiently clear, the point of this post was to illustrate how the GOFAI people could have got so much wrong and yet still be confident in their beliefs, by looking at what the results of one experiment - SHRDLU - must have felt like to those developers at the time. The post is partially to help avoid hindsight bias: it was not obvious that they were going wrong at the time.


SHRDLU was an early natural language understanding computer program, developed by Terry Winograd at MIT in 1968–1970. It was a program that moved objects in a simulated world and could respond to instructions on how to do so. It caused great optimism in AI research, giving the impression that a solution to natural language parsing and understanding was just around the corner. Symbolic manipulation seemed poised to finally deliver a proper AI.

Before dismissing this confidence as hopelessly naive (which it wasn't) and completely incorrect (which it was), take a look at some of the output that SHRDLU produced, when instructed by someone to act within its simulated world:

continue reading »

Logical thermodynamics: towards a theory of self-trusting uncertain reasoning

5 Squark 28 March 2014 04:06PM

Followup to: Overcoming the Loebian obstacle using evidence logic

In the previous post I proposed a probabilistic system of reasoning for overcoming the Loebian obstacle. For a consistent theory it seems natural to expect that such a system should yield a coherent probability assignment in the sense of Christiano et al. This means that

a. provably true sentences are assigned probability 1

b. provably false sentences are assigned probability 0

c. The following identity holds for any two sentences φ, ψ

[1] P(φ) = P(φ and ψ) + P(φ and not-ψ)

In the previous formalism, conditions a & b hold but condition c is violated (at least I don't see any reason it should hold).

In this post I attempt to achieve the following:

  • Solve the problem above.
  • Generalize the system to allow for logical uncertainty induced by bounded computing resources. Note that although the original system is already probabilistic, it is not uncertain in the sense of assigning indefinite probability to the zillionth digit of pi. In the new formalism, the extent of uncertainty is controlled by a parameter playing the role of temperature in a Maxwell-Boltzmann distribution.


Define a probability field to be a function p : {sentences} -> [0, 1] satisfying the following conditions:

  • If φ is a tautology in propositional calculus (e.g. φ = ψ or not-ψ) then p(φ) = 1
  • For all φ: p(not-φ) = 1 - p(φ)
  • For all φ, ψ: p(φ) = p(φ and ψ) + p(φ and not-ψ)

Probability fields are a convex set: a convex linear combination of probability fields is a probability field. Essentially, probability fields are probability measures on the space of truth assignments consistent w.r.t. propositional calculus.
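The correspondence with measures over truth assignments can be checked on a toy case. The following is a minimal sketch, assuming a language with just two atoms; the sentence encoding (nested tuples) and the names `field` and `mix` are illustrative choices, not part of the formalism.

```python
from itertools import product

ATOMS = ("A", "B")  # hypothetical propositional atoms
WORLDS = list(product([False, True], repeat=len(ATOMS)))  # all truth assignments

def holds(sentence, world):
    """Evaluate a sentence, encoded as nested tuples, in a truth assignment."""
    op = sentence[0]
    if op == "atom":
        return world[ATOMS.index(sentence[1])]
    if op == "not":
        return not holds(sentence[1], world)
    if op == "and":
        return holds(sentence[1], world) and holds(sentence[2], world)
    if op == "or":
        return holds(sentence[1], world) or holds(sentence[2], world)
    raise ValueError(f"unknown operator: {op}")

def field(weights):
    """Probability field induced by a distribution over truth assignments."""
    def p(sentence):
        return sum(w for w, world in zip(weights, WORLDS) if holds(sentence, world))
    return p

def mix(p1, p2, lam):
    """Convex combination of two probability fields."""
    return lambda s: lam * p1(s) + (1 - lam) * p2(s)

A, B = ("atom", "A"), ("atom", "B")
p1 = field([0.1, 0.2, 0.3, 0.4])
p2 = field([0.25, 0.25, 0.25, 0.25])
p = mix(p1, p2, 0.3)

tautology = ("or", A, ("not", A))
print(p(tautology))                                         # first condition
print(p(("not", A)), 1 - p(A))                              # second condition
print(p(A), p(("and", A, B)) + p(("and", A, ("not", B))))   # third condition
```

The mixture `p` satisfies all three conditions up to floating-point error, which is the convexity claim in miniature.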

We define the energy of a probability field p to be $E(p) := \sum_\varphi \sum_v 2^{-l(v)} E_{\varphi,v}(p(\varphi))$. Here v are pieces of evidence as defined in the previous post, $E_{\varphi,v}$ are their associated energy functions and l(v) is the length of (the encoding of) v. We assume that the encoding of v contains the encoding of the sentence φ for which it is evidence and $E_{\varphi,v}(p(\varphi)) := 0$ for all φ except the relevant one. Note that the associated energy functions are constructed in the same way as in the previous post, however they are not the same because of the self-referential nature of the construction: it refers to final probability assignment.

The final probability assignment is defined to be

$$P(\varphi) = \frac{\int e^{-E(p)/T}\, p(\varphi)\, dp}{\int e^{-E(p)/T}\, dp}$$

Here T >= 0 is a parameter representing the magnitude of logical uncertainty. The integral is infinite-dimensional so it's not obviously well-defined. However, I suspect it can be defined by truncating to a finite set of statements and taking a limit wrt this set. In the limit T -> 0, the expression should correspond to computing the centroid of the set of minima of E (which is convex because E is convex).


  • Obviously this construction is merely a sketch and work is required to show that
    • The infinite-dimensional integrals are well-defined
    • The resulting probability assignment is coherent for consistent theories and T = 0
    • The system overcomes the Loebian obstacle for tiling agents in some formal sense
  • For practical application to AI we'd like an efficient way to evaluate these probabilities. Since the form of the probabilities is analogous to statistical physics, it is suggestive to use similarly inspired Monte Carlo algorithms.
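A minimal sketch of what such a Monte Carlo evaluation might look like, under heavy simplifying assumptions: the language is truncated to a single sentence φ, so a probability field reduces to one number q = p(φ) in [0, 1]; the only evidence is direct proofs of φ and of not-φ, with the quadratic energies (1 - q)^2 and q^2; and self-referential evidence is ignored. The function names, proof lengths and sampler settings are hypothetical.

```python
import math
import random

def energy(q, weight_for, weight_against):
    """Quadratic energy of the candidate value q = p(phi)."""
    return weight_for * (1 - q) ** 2 + weight_against * q ** 2

def boltzmann_probability(proofs_for, proofs_against, T,
                          steps=200_000, step_size=0.25, seed=0):
    """Estimate P(phi) as the Boltzmann average of q over [0, 1] (Metropolis sampling)."""
    # Each proof v contributes weight 2^(-l(v)), where l(v) is its length in bits.
    weight_for = sum(2.0 ** -n for n in proofs_for)
    weight_against = sum(2.0 ** -n for n in proofs_against)
    rng = random.Random(seed)
    q = 0.5
    e = energy(q, weight_for, weight_against)
    total = 0.0
    for _ in range(steps):
        candidate = q + rng.uniform(-step_size, step_size)
        if 0.0 <= candidate <= 1.0:  # proposals outside [0, 1] are rejected
            e_new = energy(candidate, weight_for, weight_against)
            # Always accept downhill moves; accept uphill moves with probability exp(-dE / T).
            if e_new <= e or rng.random() < math.exp(-(e_new - e) / T):
                q, e = candidate, e_new
        total += q
    return total / steps

cold = boltzmann_probability([3], [5], T=1e-3)    # little logical uncertainty
hot = boltzmann_probability([3], [5], T=1000.0)   # large logical uncertainty
print(cold, hot)
```

With one proof of φ of length 3 and one proof of not-φ of length 5, the energy is minimized at q = 0.8, so the low-temperature estimate should land near 0.8, while the high-temperature estimate should drift towards 0.5.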


Agents with Cartesian childhood and Physicalist adulthood

5 Squark 22 March 2014 08:20PM

Followup to: Updateless intelligence metrics in the multiverse

In the previous post I explained how to define a quantity that I called "the intelligence metric" which allows comparing intelligence of programs written for a given hardware. It is a development of the ideas by Legg and Hutter which accounts for the "physicality" of the agent i.e. that the agent should be aware it is part of the physical universe it is trying to model (this desideratum is known as naturalized induction). My construction of the intelligence metric exploits ideas from UDT, translating them from the realm of decision algorithms to the realm of programs which run on an actual piece of hardware with input and output channels, with all the ensuing limitations (in particular computing resource limitations).

In this post I present a variant of the formalism which overcomes a certain problem implicit in the construction. This problem has to do with overly strong sensitivity to the choice of a universal computing model used in constructing Solomonoff measure. The solution sheds some interesting light on how the development of the seed AI should occur.

Structure of this post:

  • A 1-paragraph recap of how the updateless intelligence formalism works. The reader interested in technical details is referred to the previous post.
  • Explanation of the deficiencies in the formalism I set out to overcome.
  • Explanation of the solution.
  • Concluding remarks concerning AI safety and future development.

TLDR of the previous formalism

The metric is a utility expectation value over a Solomonoff measure in the space of hypotheses describing a "Platonic ideal" version of the target hardware. In other words it is an expectation value over all universes containing this hardware in which the hardware cannot "break" i.e. violate the hardware's intrinsic rules. For example, if the hardware in question is a Turing machine, the rules are the time evolution rules of the Turing machine; if the hardware in question is a cellular automaton, the rules are the rules of the cellular automaton. This is consistent with the agent being Physicalist since the utility function is evaluated on a different universe (also distributed according to a Solomonoff measure) which isn't constrained to contain the hardware or follow its rules. The coupling between these two different universes is achieved via the usual mechanism of interaction between the decision algorithm and the universe in UDT i.e. by evaluating expectation values conditioned on logical counterfactuals.


The Solomonoff measure depends on choosing a universal computing model (e.g. a universal Turing machine). Solomonoff induction only depends on this choice weakly in the sense that any Solomonoff predictor converges to the right hypothesis given enough time. This has to do with the fact that Kolmogorov complexity only depends on the choice of universal computing model through an O(1) additive correction. It is thus a natural desideratum for the intelligence metric to depend on the universal computing model weakly in some sense. Intuitively, the agent in question should always converge to the right model of the universe it inhabits regardless of the Solomonoff prior with which it started. 
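For reference, the weak dependence mentioned above is usually stated as the invariance theorem. The symbols $K_U$, $M_U$ and $c_{U,V}$ below are notation introduced here for illustration: $K_U$ is Kolmogorov complexity relative to the universal machine U, $M_U$ is the corresponding Solomonoff prior, and $c_{U,V}$ is a constant that depends on the two machines but not on x.

```latex
\[
  \left| K_U(x) - K_V(x) \right| \le c_{U,V}
  \qquad \text{for all strings } x,
\]
\[
  2^{-c_{U,V}}\, M_V(x) \;\le\; M_U(x) \;\le\; 2^{c_{U,V}}\, M_V(x).
\]
```

The constant can be arbitrarily large for a badly chosen pair of machines, which is exactly the room the problem below exploits.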

The problem with realizing this expectation has to do with exploration-exploitation tradeoffs. Namely, if the prior strongly expects a given universe, the agent would be optimized for maximal utility generation (exploitation) in this universe. This optimization can be so strong that the agent would lack the faculty to model the universe in any other way. This is markedly different from what happens with AIXI, since our agent has limited computing resources to spare and is physicalist, so its source code might have side effects important to utility generation that have nothing to do with the computation implemented by the source code. For example, imagine that our Solomonoff prior assigns very high probability to a universe inhabited by Snarks. Snarks have the property that once they see a robot programmed with the machine code "000000..." they immediately produce a huge pile of utilons. On the other hand, when they see a robot programmed with any other code they immediately eat it and produce a huge pile of negative utilons. Such a prior would result in the code "000000..." being assigned the maximal intelligence value even though it is anything but intelligent. Observe that there is nothing preventing us from producing a Solomonoff prior with such bias since it is possible to set the probabilities of any finite collection of computable universes to any non-zero values with sum < 1.
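The Snark example can be made concrete with a toy calculation. This is only a sketch: the two universes, their utility numbers, the priors and the "smart" code string are all made-up stand-ins, and the metric is reduced to a plain prior-weighted expected utility.

```python
def intelligence(code, prior, utilities):
    """Toy metric: expected utility of a program under a prior over universes."""
    return sum(prior[name] * utilities[name](code) for name in prior)

ZERO_CODE = "000000"
SMART_CODE = "101101"  # hypothetical stand-in for a genuinely capable program

utilities = {
    # Snarks reward the all-zero code and punish everything else.
    "snark_world": lambda code: 1000.0 if code == ZERO_CODE else -1000.0,
    # In an ordinary universe only the capable program earns anything.
    "ordinary_world": lambda code: 10.0 if code == SMART_CODE else 0.0,
}

biased_prior = {"snark_world": 0.99, "ordinary_world": 0.01}
mild_prior = {"snark_world": 0.001, "ordinary_world": 0.999}

for label, prior in (("biased", biased_prior), ("mild", mild_prior)):
    print(label,
          intelligence(ZERO_CODE, prior, utilities),
          intelligence(SMART_CODE, prior, utilities))
```

Under the biased prior the all-zero code scores far higher than the capable one; under the mild prior the ranking flips. The ranking of programs is thus driven by the choice of prior rather than by anything resembling competence.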

More precisely, the intelligence metric involves two Solomonoff measures: the measure of the "Platonic" universe and the measure of the physical universe. The latter is not really a problem since it can be regarded to be a part of the utility function. The utility-agnostic version of the formalism assumes a program for computing the utility function is read by the agent from a special storage. There is nothing to stop us from postulating that the agent reads another program from that storage which is the universal computer used for defining the Solomonoff measure over the physical universe. However, this doesn't solve our problem since even if the physical universe is distributed with a "reasonable" Solomonoff measure (assuming there is such a thing), the Platonic measure determines in which portions of the physical universe (more precisely multiverse) our agent manifests.

There is another way to think about this problem. If the seed AI knows nothing about the universe except the working of its own hardware and software, the Solomonoff prior might be insufficient "information" to prevent it from making irreversible mistakes early on. What we would like to do is to endow it from the first moment with the sum of our own knowledge, but this might prove to be very difficult.


Imagine the hardware architecture of our AI to be composed of two machines. One I call the "child machine", the other the "adult machine". The child machine receives data from the same input channels (and "utility storage") as the adult machine and is able to read the internal state of the adult machine itself or at least the content of its output channels. However, the child machine has no output channels of its own. The child machine has special memory called "template memory" to which it has unlimited write access. There is a single moment in time ("end of childhood"), determined by factors external to both machines (i.e. the human operator), at which the content of the template memory is copied into the instruction space of the adult machine. Thus, the child machine's entire role is making observations and using them to prepare a program for the adult machine which will eventually be loaded into the latter.

The new intelligence metric assigns intelligence values to programs for the child machine. For each hypothesis describing the Platonic universe (which now contains both machines, the end of childhood time value and the entire ruleset of the system) we compute the utility expectation value under the following logical counterfactual condition: "The program loaded into template memory at the end of childhood is the same as would result from the given program for the child machine if this program for the child machine were run with the inputs actually produced by the given hypothesis regarding the Platonic universe". The intelligence value is then the expectation value of that quantity with respect to a Solomonoff measure over hypotheses describing the Platonic universe.

The important property of the logical counterfactual is that it doesn't state the given program is actually loaded into the child machine. It only says the resulting content of the template memory is the same as that which would be obtained from the given program assuming all the laws of the Platonic universe hold. This formulation prevents exploitation of side effects of the child source code since the condition doesn't fix the source code, only its output. Effectively, the child agent considers itself to be Cartesian, i.e. it can consider neither the side effects of its computations nor the possibility that the physical universe will violate the laws of its machinery. On the other hand the child's output (the mature program) is a physicalist agent since it affects the physical universe by manifesting in it.

If such an AI is implemented in practice, it makes sense to prime the adult machine with a "demo" program which will utilize the output channels in various ways and do some "exploring" using its input channels. This would serve to provide the child with as much information as possible.
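The data flow between the two machines can be sketched in a few lines. This is a toy simulation under stated assumptions, not a proposal for the actual hardware: programs are plain Python callables, the class and function names are invented for illustration, and the "learner" simply builds a program that reports deviation from the mean input seen during childhood.

```python
class AdultMachine:
    """Has output channels; starts out running a demo program."""
    def __init__(self, demo_program):
        self.program = demo_program
        self.outputs = []

    def step(self, observation):
        out = self.program(observation)
        self.outputs.append(out)
        return out

class ChildMachine:
    """Has no output channels; its only effect is writing template memory."""
    def __init__(self, learner):
        self.learner = learner
        self.template_memory = None

    def observe(self, observation, adult_output):
        self.template_memory = self.learner(observation, adult_output)

def make_mean_learner():
    """Hypothetical learner: the mature program reports deviation from the childhood mean."""
    seen = []
    def learner(observation, adult_output):
        seen.append(observation)
        mean = sum(seen) / len(seen)
        return lambda obs: obs - mean
    return learner

def run(child, adult, inputs, end_of_childhood):
    """Feed inputs to both machines; copy the template into the adult exactly once."""
    for t, obs in enumerate(inputs):
        if t == end_of_childhood and child.template_memory is not None:
            adult.program = child.template_memory  # set externally, e.g. by the operator
        out = adult.step(obs)
        if t < end_of_childhood:
            child.observe(obs, out)
    return adult.outputs

child = ChildMachine(make_mean_learner())
adult = AdultMachine(lambda obs: 0)  # demo program: always outputs 0
outputs = run(child, adult, [2, 4, 6, 10], end_of_childhood=3)
print(outputs)
```

During the first three steps the adult runs the demo program while the child only watches; from step 3 onward the adult runs whatever the child left in template memory.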

To sum up, the new expression for the intelligence metric is:

$$I(q) = E_{H_X}\big[\, E_{H_Y(E_c(X))}\big[\, E_L[\, U(Y, E_u(X)) \mid Q(X, t(X)) = Q^*(X; q) \,] \,\big] \mid N \,\big]$$


  • q is the program priming the child machine
  • $H_X$ is the hypothesis producing the Platonic universe X (a sequence of bits encoding the state of the hardware as a function of time and the end-of-childhood time t(X)). It is a program for a fixed universal computing model C.
  • $H_Y$ is the hypothesis producing the Physical universe (an abstract sequence of bits). It is a program for the universal computer program ("virtual machine") $E_c(X)$ written into storage E in X.
  • $E_L$ is logical expectation value defined e.g. using evidence logic.
  • $E_u(X)$ is a program for computing the utility function which is written into storage E in X.
  • U is the utility function which consists of applying $E_u(X)$ to Y.
  • Q(X, t(X)) is the content of template memory at time t(X).
  • $Q^*(X; q)$ is the content that would be in the template memory if it was generated by program q receiving the inputs going into the child machine under hypothesis $H_X$.
  • N is the full ruleset of the hardware including the reprogramming of the adult machine that occurs at t(X).

Concluding Remarks

  • It would be very valuable to formulate and prove a mathematical theorem which expresses the sense in which the new formalism depends on the choice of universal computing model weakly (in particular it would validate the notion).
  • This formalism might have an interesting implication for AI safety. Since the child agent is Cartesian and has no output channels (it cannot create output channels because it is Cartesian) it doesn't present as much risk as an adult AI. Imagine template memory is write-only (which is not a problem for the formalism) and is implemented by a channel that doesn't store the result anywhere (in particular the mature program is never run). There can still be risk due to side effects of the mature program that manifest through the presence of its partial or full versions in (non-template) memory of the child machine. For example, imagine the mature program is such that any person who reads it experiences a compulsion to run it. This risk can be mitigated by allowing both machines to interact only with a virtual world which receives no inputs from the external reality. Of course the AI might still be able to deduce external reality. However, this can be prevented by exploiting prior bias: we can equip the AI with a Solomonoff prior that favors the virtual world to such an extent that it would have no reason to deduce the real world. This way the AI is safe unless it invents a "generic" box-escaping protocol which would work in a huge variety of different universes that might contain the virtual world.
  • If we factor finite logical uncertainty into evaluation of the logical expectation value EL, the plot thickens. Namely, a new problem arises related to bias in the "logic prior". To solve this new problem we need to introduce yet another stage into AI development which might be dubbed "fetus". The fetus has no access to external inputs and is responsible for building a sufficient understanding of mathematics in the same sense the child is responsible for building a sufficient understanding of physics. Details will follow in subsequent posts, so stay tuned!

Friendly AI ideas needed: how would you ban porn?

6 Stuart_Armstrong 17 March 2014 06:00PM

To construct a friendly AI, you need to be able to make vague concepts crystal clear, cutting reality at the joints when those joints are obscure and fractal - and then implement a system that implements that cut.

There are lots of suggestions on how to do this, and a lot of work in the area. But having been over the same turf again and again, it's possible we've got a bit stuck in a rut. So to generate new suggestions, I'm proposing that we look at a vaguely analogous but distinctly different question: how would you ban porn?

Suppose you're put in charge of some government and/or legal system, and you need to ban pornography, and see that the ban is implemented. Pornography is the problem, not eroticism. So a lonely lower-class guy wanking off to "Fuck Slaves of the Caribbean XIV" in a Pussycat Theatre is completely off. But a middle-class couple experiencing a delicious frisson when they see a nude version of "Pirates of Penzance" at the Met is perfectly fine - commendable, even.

The distinction between the two cases is certainly not easy to spell out, and many are reduced to saying the equivalent of "I know it when I see it" when defining pornography. In terms of AI, this is equivalent to "value loading": refining the AI's values through interactions with human decision makers, who answer questions about edge cases and examples and serve as "learned judges" for the AI's concepts. But suppose that approach was not available to you - what methods would you implement to distinguish between pornography and eroticism, and ban one but not the other? Could you make them sufficiently clear that a scriptwriter would know exactly what they need to cut or add to a movie in order to move it from one category to the other? What if the nude "Pirates of Penzance" was at a Pussycat Theatre and "Fuck Slaves of the Caribbean XIV" was at the Met?

To get maximal creativity, it's best to ignore the ultimate aim of the exercise (to find inspirations for methods that could be adapted to AI) and just focus on the problem itself. Is it even possible to get a reasonable solution to this question - a question much simpler than designing a FAI?

Reduced impact in practice: randomly sampling the future

1 Stuart_Armstrong 17 March 2014 04:30PM

This was an idea developed along with Eliezer when he was at the FHI in November.

In my post on "reduced impact AI" I talked about how you could estimate the impact of an AI by (hypothetically) observing a slice of the future. When Eliezer was here last November, we talked about how this could be cashed out, and came up with a model.

Ignore the "output channel" from the original reduced impact post, and consider this situation:

We have two computers running in two similarly shielded rooms. One of them runs a simple algorithm that works through the digits of e (call it run-e) and is deleted soon afterwards. The other is our dear old friend clippy, the AI motivated to maximise paperclips. Without removing clippy's paperclip preferences, we'd like to motivate it to have next to no impact on the world. How should we define "next to no impact"?

The obvious thing is to compare it with run-e, and require it to have no more expected impact than that machine. How could this be measured? Well, the original idea was to put a stochastic process on clippy as it was turned on (inside the shielded room) and say that clippy has reduced impact if we can't find any extra evidence, from a vantage point 50 years in the future, that clippy was successfully turned on. Now put the same stochastic process on run-e and define:

Clippy has reduced impact if, from a vantage of 50 years into the future, we have no more evidence that clippy was turned on than we have of run-e being turned on.

continue reading »

Overcoming the Loebian obstacle using evidence logic

4 Squark 14 March 2014 06:34PM

In this post I intend to:

  • Briefly explain the Loebian obstacle and its relevance to AI (feel free to skip it if you know what the Loebian obstacle is).
  • Suggest a solution in the form of a formal system which assigns probabilities (more generally probability intervals) to mathematical sentences (and which admits a form of "Loebian" self-referential reasoning). The method is well-defined both for consistent and inconsistent axiomatic systems, the latter being important in analysis of logical counterfactuals like in UDT.



When can we consider a mathematical theorem to be established? The obvious answer is: when we proved it. Wait, proved it in what theory? Well, that's debatable. ZFC is a popular choice for mathematicians, but how do we know it is consistent (let alone sound, i.e. that it only proves true sentences)? All those spooky infinite sets, how do you know it doesn't break somewhere along the line? There's lots of empirical evidence, but we can't prove it, and it's proofs we're interested in, not mere evidence, right?

Peano arithmetic seems like a safer choice. After all, if the natural numbers don't make sense, what does? Let's go with that. Suppose we have a sentence s in the language of PA. If someone presents us with a proof p in PA, we believe s is true. Now consider the following situation: instead of giving you a proof of s, someone gives you a PA-proof p1 that p exists. After all, PA admits defining "PA-proof" in PA language. Common sense tells us that p1 is a sufficient argument to believe s. Maybe we can prove it within PA? That is, if we have a proof of "if a proof of s exists then s" and a proof of R(s)="a proof of s exists" then we just proved s. That's just modus ponens.

There are two problems with that.

First, there's no way to prove the sentence L:="for all s if R(s) then s", since it's not a PA-sentence at all. The problem is that "for all s" references s as a natural number encoding a sentence. On the other hand, "then s" references s as the truth-value of the sentence. Maybe we can construct a PA-formula T(s) which means "the sentence encoded by the number s is true"? Nope, that would get us in trouble with the liar paradox (it would be possible to construct a sentence saying "this sentence is false").

Second, Loeb's theorem says that if we can prove L(s):="if R(s) then s" for a given s, then we can prove s. This is a problem since it means there can be no way to prove L(s) for all s in any sense, since it's unprovable for s which are unprovable. In other words, if you proved not-s, there is no way to conclude that "no proof of s exists".
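For reference, here is the theorem in symbols, writing R(s) for the PA-sentence "a proof of s exists" as above. The second line is the standard internalized form, which PA itself proves; the turnstile notation is added here and is not used elsewhere in the post.

```latex
\[
  \text{If } \mathrm{PA} \vdash \big( R(s) \rightarrow s \big)
  \text{ then } \mathrm{PA} \vdash s .
\]
\[
  \mathrm{PA} \vdash R\big( R(s) \rightarrow s \big) \rightarrow R(s) .
\]
```

Taking s to be a contradiction recovers Goedel's second incompleteness theorem: PA cannot prove "no proof of a contradiction exists" unless PA is inconsistent.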

What if we add an inference rule Q to our logic allowing us to go from R(s) to s? Let's call the new formal system PA1. Now p1 appended by a Q-step becomes an honest proof of s in PA1. Problem solved? Not really! Now someone can give you a proof of R1(s):="a PA1-proof of s exists". Back to square one! Wait a second, what if we add a new rule Q1 allowing us to go from R1(s) to s? OK, but now we got R2(s):="a PA2-proof of s exists". Hmm, what if we add an infinite number of rules Qk? Fine, but now we got Rω(s):="a PAω-proof of s exists". And so on, and so forth, the recursive ordinals are aplenty...

Bottom line, Loeb's theorem works for any theory containing PA, so we're stuck.


Suppose you're trying to build a self-modifying AGI called "Lucy". Lucy works by considering possible actions and looking for formal proofs that taking one of them will increase expected utility. In particular, it has self-modifying actions in its strategy space. A self-modifying action creates essentially a new agent: Lucy2. How can Lucy decide that becoming Lucy2 is a good idea? Well, a good step in this direction would be proving that Lucy2 would only take actions that are "good". I.e., we would like Lucy to reason as follows "Lucy2 uses the same formal system as I, so if she decides to take action a, it's because she has a proof p of the sentence s(a) that 'a increases expected utility'. Since such a proof exists, a does increase expected utility, which is good news!" Problem: Lucy is using L in there, applied to her own formal system! That cannot work! So, Lucy would have a hard time self-modifying in a way which doesn't make its formal system weaker.

As another example where this poses a problem, suppose Lucy observes another agent called "Kurt". Lucy knows, by analyzing her sensory evidence, that Kurt proves theorems using the same formal system as Lucy. Suppose Lucy found out that Kurt proved theorem s, but she doesn't know how. We would like Lucy to be able to conclude s is, in fact, true (at least with the probability that her model of physical reality is correct). Alas, she cannot.

See MIRI's paper for more discussion.

Evidence Logic

Here, cousin_it explains a method to assign probabilities to sentences in an inconsistent theory T. It works as follows. Consider sentence s. Since T is inconsistent, there are T-proofs both of s and of not-s. Well, in a courtroom both sides are allowed to have arguments, why not try the same approach here? Let's weight the proofs as a function of their length, analogously to weighting hypotheses in Solomonoff induction. That is, suppose we have a prefix-free encoding of proofs as bit sequences. Then, it makes sense to consider a random bit sequence and ask whether it is a proof of something. Define the probability of s to be

P(s) := (probability of a random sequence to be a proof of s) / (probability of a random sequence to be a proof of s or not-s)
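The definition can be computed exactly once the proofs are restricted to a finite list. The sketch below assumes we are simply handed the bit lengths of the known proofs of s and of not-s; in a prefix-free encoding a random bit sequence begins with a given proof of length n with probability 2^-n. The function names and the example lengths are hypothetical.

```python
from fractions import Fraction

def proof_weight(lengths):
    """Probability that a random bit sequence begins with one of the listed proofs."""
    return sum(Fraction(1, 2 ** n) for n in lengths)

def probability(lengths_for_s, lengths_for_not_s):
    """P(s) restricted to a finite list of known proofs of s and of not-s."""
    w_s = proof_weight(lengths_for_s)
    w_not_s = proof_weight(lengths_for_not_s)
    if w_s + w_not_s == 0:
        raise ValueError("no proofs of s or not-s: the ratio is undefined")
    return w_s / (w_s + w_not_s)

# Two proofs of s (10 and 12 bits) against one proof of not-s (11 bits).
print(probability([10, 12], [11]))  # → 5/7
```

Shorter proofs dominate: a single extra bit of length halves a proof's influence on the result.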

Nice, but it doesn't solve the Loebian obstacle yet.

I will now formulate an extension of this idea that allows assigning an interval of probabilities [Pmin(s), Pmax(s)] to any sentence s. This interval is a sort of "Knightian uncertainty". I have some speculations how to extract a single number from this interval in the general case, but even without that, I believe that Pmin(s) = Pmax(s) in many interesting cases.

First, the general setting:

  • With every sentence s, there are certain texts v which are considered to be "evidence relevant to s". These are divided into "negative" and "positive" evidence. We define sgn(v) := +1 for positive evidence, sgn(v) := -1 for negative evidence.
  • Each piece of evidence v is associated with the strength of the evidence $\mathrm{str}_s(v)$, which is a number in [0, 1].
  • Each piece of evidence v is associated with an "energy" function $e_{s,v} : [0, 1] \to [0, 1]$. It is a continuous convex function.
  • The "total energy" associated with s is defined to be $e_s := \sum_v 2^{-l(v)} e_{s,v}$, where l(v) is the length of v.
  • Since the $e_{s,v}$ are continuous and convex, so is $e_s$. Hence it attains its minimum on a closed interval, which is [Pmin(s), Pmax(s)] by definition.

Now, the details:

  • A piece of evidence v for s is defined to be one of the following:
    • a proof of s
      • sgn(v) := +1
      • $\mathrm{str}_s(v) := 1$
      • $e_{s,v}(q) := (1 - q)^2$
    • a proof of not-s
      • sgn(v) := -1
      • $\mathrm{str}_s(v) := 1$
      • $e_{s,v}(q) := q^2$
    • a piece of positive evidence for the sentence $R_{-+}(s, p)$ := "Pmin(s) >= p"
      • sgn(v) := +1
      • $\mathrm{str}_s(v) := \mathrm{str}_{R_{-+}(s,p)}(v) \cdot p$
      • $e_{s,v}(q) := 0$ for q > p; $\mathrm{str}_{R_{-+}(s,p)}(v)\,(q - p)^2$ for q < p
    • a piece of negative evidence for the sentence $R_{--}(s, p)$ := "Pmin(s) < p"
      • sgn(v) := +1
      • $\mathrm{str}_s(v) := \mathrm{str}_{R_{--}(s,p)}(v) \cdot p$
      • $e_{s,v}(q) := 0$ for q > p; $\mathrm{str}_{R_{--}(s,p)}(v)\,(q - p)^2$ for q < p
    • a piece of negative evidence for the sentence $R_{++}(s, p)$ := "Pmax(s) > p"
      • sgn(v) := -1
      • $\mathrm{str}_s(v) := \mathrm{str}_{R_{++}(s,p)}(v) \cdot (1 - p)$
      • $e_{s,v}(q) := 0$ for q < p; $\mathrm{str}_{R_{++}(s,p)}(v)\,(q - p)^2$ for q > p
    • a piece of positive evidence for the sentence $R_{+-}(s, p)$ := "Pmax(s) <= p"
      • sgn(v) := -1
      • $\mathrm{str}_s(v) := \mathrm{str}_{R_{+-}(s,p)}(v) \cdot (1 - p)$
      • $e_{s,v}(q) := 0$ for q < p; $\mathrm{str}_{R_{+-}(s,p)}(v)\,(q - p)^2$ for q > p

Technicality: I suggest that for our purposes, a "proof of s" is allowed to be a proof of a sentence equivalent to s in 0-th order logic (e.g. not-not-s). This ensures that our probability intervals obey the properties we'd like them to obey wrt propositional calculus.
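To see how the interval [Pmin(s), Pmax(s)] arises, here is a minimal numerical sketch. It assumes a finite list of evidence, each item given as a (length, energy function) pair, and finds the minimizers of the total energy on a grid rather than analytically. The function names, the grid resolution and the `strength` parameter (standing in for the strength of the evidence for the bound sentence) are illustrative assumptions.

```python
def total_energy(q, evidence):
    """Sum of 2^(-l(v)) * e_v(q) over a finite list of (length, energy) pairs."""
    return sum(2.0 ** -length * e(q) for length, e in evidence)

def proof_of_s(q):
    return (1 - q) ** 2

def proof_of_not_s(q):
    return q ** 2

def lower_bound_evidence(p, strength):
    """Positive evidence for "Pmin(s) >= p": penalizes only values of q below p."""
    return lambda q: strength * (q - p) ** 2 if q < p else 0.0

def probability_interval(evidence, grid=1000, tol=1e-12):
    """Approximate [Pmin(s), Pmax(s)] as the set of grid minimizers of the total energy."""
    qs = [i / grid for i in range(grid + 1)]
    energies = [total_energy(q, evidence) for q in qs]
    lowest = min(energies)
    minimizers = [q for q, e in zip(qs, energies) if e - lowest <= tol]
    return minimizers[0], minimizers[-1]

print(probability_interval([(3, proof_of_s), (5, proof_of_not_s)]))
print(probability_interval([]))
print(probability_interval([(4, lower_bound_evidence(0.7, 1.0))]))
```

With only direct proofs, the interval collapses to a point that agrees with the ratio formula above (here 0.8). With no evidence at all the interval is the whole of [0, 1], and with only a lower-bound statement it is [0.7, 1], which is the sense in which the interval expresses "Knightian" uncertainty.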

Now, consider again our self-modifying agent Lucy. Suppose she makes her decisions according to a system of evidence logic like above. She can now reason along the lines of "Lucy2 uses the same formal system as I. If she decides to take action a, it's because she has strong evidence for the sentence s(a) that 'a increases expected utility'. I just proved that there would be strong evidence for the expected utility increasing. Therefore, the expected utility would have a high value with high logical probability. But evidence for high logical probability of a sentence is evidence for the sentence itself. Therefore, I now have evidence that expected utility will increase!"

This analysis is very sketchy, but I think it lends hope that the system leads to the desired results.

Updateless Intelligence Metrics in the Multiverse

6 Squark 08 March 2014 12:25AM

Followup to: Intelligence Metrics with Naturalized Induction using UDT

In the previous post I have defined an intelligence metric solving the duality (aka naturalized induction) and ontology problems in AIXI. This model used a formalization of UDT using Benja's model of logical uncertainty. In the current post I am going to:

  • Explain some problems with my previous model (that section can be skipped if you don't care about the previous model and only want to understand the new one).
  • Formulate a new model solving these problems. Incidentally, the new model is much closer to the usual way UDT is represented. It is also based on a different model of logical uncertainty.
  • Show how to define intelligence without specifying the utility function a priori.
  • The new model requires utility functions formulated with an abstract ontology, i.e. well-defined on the entire Tegmark level IV multiverse. These are generally difficult to construct (i.e. the ontology problem resurfaces in a different form), so I outline a method for constructing such utility functions.

Problems with UIM 1.0

The previous model postulated that naturalized induction uses a version of Solomonoff induction updated in the direction of an innate model N with a temporal confidence parameter t. This entails several problems:

  • The dependence on the parameter t whose relevant value is not easy to determine.
  • Conceptual divergence from the UDT philosophy that we should not update at all.
  • Difficulties with counterfactual mugging and acausal trade scenarios in which G doesn't exist in the "other universe".
  • Once G discovers even a small violation of N at a very early time, it loses all ground for trusting its own mind. Effectively, G would find itself in the position of a Boltzmann brain. This is especially dangerous when N over-specifies the hardware running G's mind. For example assume N specifies G to be a human brain modeled on the level of quantum field theory (particle physics). If G discovers that in truth it is a computer simulation on the merely molecular level, it loses its epistemic footing completely.

UIM 2.0

I now propose the following intelligence metric (the formula goes first and then I explain the notation):

I_U(q) := E_T[ E_D[ E_L[ U(Y(D)) | Q(X(T)) = q ] ] | N ]

  • N is the "ideal" model of the mind of the agent G. For example, it can be a universal Turing machine M with special "sensory" registers e whose values can change arbitrarily after each step of M. N is specified as a system of constraints on an infinite sequence of natural numbers X, which should be thought of as the "Platonic ideal" realization of G, i.e. an imaginary realization which cannot be tampered with by external forces such as anvils. As we shall see, this "ideal" serves as a template for "physical" realizations of G which are prone to violations of N.
  • Q is a function that decodes G's code from X, e.g. the program loaded in M at time 0. q is a particular value of this code whose (utility-specific) intelligence I_U(q) we are evaluating.
  • T is a random (as in random variable) computable hypothesis about the "physics" of X, i.e. a program computing X implemented on some fixed universal computing model (e.g. universal Turing machine) C. T is distributed according to the Solomonoff measure; however, the expectation value in the definition of I_U(q) is conditional on N, i.e. we restrict to programs which are compatible with N. From the UDT standpoint, T is the decision algorithm itself and the uncertainty in T is "introspective" uncertainty, i.e. the uncertainty of the putative precursor agent PG (the agent creating G, e.g. an AI programmer) regarding her own decision algorithm. Note that we don't actually need to postulate a PG which is "agenty" (i.e. use for N a model of AI hardware together with a model of the AI programmer programming this hardware); we can be content to remain in a more abstract framework.
  • D is a random computable hypothesis about the physics of Y, where Y is an infinite sequence of natural numbers representing the physical (as opposed to "ideal") universe. D is distributed according to the Solomonoff measure and the respective expectation value is unconditional (i.e. we use the raw Solomonoff prior for Y which makes the model truly updateless). In UDT terms, D is indexical uncertainty.
  • U is a computable function from infinite sequences of natural numbers to [0, 1] representing G's utility function. 
  • L represents logical uncertainty. It can be defined by the model explained by cousin_it here, together with my previous construction for computing logical expectation values of random variables in [0, 1]. That is, we define E_L(d_k) to be the probability that a random string of bits p encodes a proof of the sentence "Q(X(T)) = q implies that the k-th digit of U(Y(D)) is 1" in some prefix-free encoding of proofs, conditional on p encoding a proof of either that sentence or the sentence "Q(X(T)) = q implies that the k-th digit of U(Y(D)) is 0". We then define
    E_L[U(Y(D)) | Q(X(T)) = q] := Σ_k 2^{-k} E_L(d_k). Here, the sentences and the proofs belong to some fixed formal logic F, e.g. Peano arithmetic or ZFC.
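
To make the nesting of the three expectations concrete, here is a minimal toy sketch in Python. It is only an illustration under strong assumptions that are not part of the model: the hypothesis spaces for T and D are replaced by small finite lists, each hypothesis is summarized by a hypothetical program length in bits and (for T) a flag saying whether it is compatible with N, and the logical expectation E_L is not computed at all but supplied as a placeholder function. The names `toy_intelligence` and `solomonoff_weight` are made up for this sketch.

```python
from fractions import Fraction


def solomonoff_weight(length_bits):
    """Unnormalized prior weight 2^(-length) for a program of the given length."""
    return Fraction(1, 2 ** length_bits)


def toy_intelligence(q, t_hyps, d_hyps, logical_expectation):
    """Finite stand-in for the nested expectation over T (given N), D and L.

    t_hyps: list of (name, length_bits, compatible_with_N)
    d_hyps: list of (name, length_bits)
    logical_expectation: callable (t_name, d_name, q) -> value in [0, 1];
        a placeholder for E_L, which this sketch does not compute.
    """
    # Conditioning on N: keep only T-hypotheses compatible with N, then renormalize.
    t_kept = [(name, solomonoff_weight(length))
              for name, length, compatible in t_hyps if compatible]
    t_total = sum(weight for _, weight in t_kept)

    # D uses the raw prior; it is normalized here only because the toy set is finite.
    d_all = [(name, solomonoff_weight(length)) for name, length in d_hyps]
    d_total = sum(weight for _, weight in d_all)

    value = Fraction(0)
    for t_name, t_weight in t_kept:
        inner = sum(d_weight / d_total * logical_expectation(t_name, d_name, q)
                    for d_name, d_weight in d_all)
        value += t_weight / t_total * inner
    return value
```

The only structural point the sketch is meant to show is the asymmetry between the two priors: hypotheses about the ideal universe are filtered by N before averaging, while hypotheses about the physical universe are never filtered.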


  • G's mental architecture N is defined in the "ideal" universe X where it is inviolable. However, G's utility function U inhabits the physical universe Y. This means that a highly intelligent q is designed so that imperfect realizations of G inside Y generate as many utilons as possible. A typical T is a low Kolmogorov complexity universe which contains a perfect realization of G. Q(X(T)) is L-correlated to the programming of imperfect realizations of G inside Y because T serves as an effective (approximate) model of the formation of these realizations. For abstract N, this means q is highly intelligent when a Solomonoff-random "M-programming process" producing q entails a high expected value of U.
  • Solving the Loebian obstacle requires a more sophisticated model of logical uncertainty. I think I can formulate such a model. I will explain it in another post after more contemplation.
  • It is desirable that the encoding of proofs p satisfies a universality property so that the length of the encoding can only change by an additive constant, analogously to the weak dependence of Kolmogorov complexity on C. It is in fact not difficult to formulate this property and show the existence of appropriate encodings. I will discuss this point in more detail in another post.

Generic Intelligence

It seems conceptually desirable to have a notion of intelligence independent of the specifics of the utility function. Such an intelligence metric can be constructed in a way analogous to what I've done in UIM 1.0; however, it is no longer a special case of the utility-specific metric.

Assume N to consist of a machine M connected to a special storage device E. Assume further that at X-time 0, E contains a valid C-program u realizing a utility function U, but that this is the only constraint on the initial content of E imposed by N. Define

I(q) := E_T[ E_D[ E_L[ u(Y(D); X(T)) | Q(X(T)) = q ] ] | N ]

Here, u(Y(D); X(T)) means that we decode u from X(T) and evaluate it on Y(D). Thus utility depends both on the physical universe Y and the ideal universe X. This means G is not precisely a UDT agent but rather a "proto-agent": only when a realization of G reads u from E does it know which other realizations of G in the multiverse (the Solomonoff ensemble from which Y is selected) should be considered as the "same" agent UDT-wise.

Incidentally, this can be used as a formalism for reasoning about agents that don't know their utility functions. I believe this has important applications in metaethics, which I will discuss in another post.

Utility Functions in the Multiverse

UIM 2.0 is a formalism that solves the diseases of UIM 1.0 at the price of losing N in the capacity of the ontology for utility functions. We need the utility function to be defined on the entire multiverse i.e. on any sequence of natural numbers. I will outline a way to extend "ontology-specific" utility functions to the multiverse through a simple example.

Suppose G is an agent that cares about universes realizing the Game of Life, its utility function U corresponding to e.g. some sort of glider maximization with exponential temporal discount. Fix a specific way DC to decode any Y into a history of a 2D cellular automaton with two cell states ("dead" and "alive"). Our multiversal utility function U* assigns Ys for which DC(Y) is a legal Game of Life the value U(DC(Y)). All other Ys are treated by dividing the cells into cells O obeying the rules of Life and cells V violating the rules of Life. We can then evaluate U on O only (assuming it has some sort of locality) and assign V utility by some other rule, e.g.:

  • zero utility
  • constant utility per V cell with temporal discount
  • constant utility per unit of surface area of the boundary between O and V, with temporal discount
U*(Y) is then defined to be the sum of the values assigned to O(Y) and V(Y).
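
The construction can be sketched in Python as follows. This is a toy under several assumptions that go beyond the text: the decoded history DC(Y) is taken to be a finite list of finite 0/1 grids with everything outside the grid treated as dead, a cell at time t counts as an O cell when its state matches what the rules of Life predict from the grid at time t-1, the utility U on O cells is replaced by a simple discounted count of live cells (a stand-in for glider maximization, which is harder to write down), and V cells get the second option from the list, a constant utility per cell with the same temporal discount. The function names and the default parameter values are hypothetical.

```python
def live_neighbors(grid, i, j):
    """Count live cells among the up to 8 neighbors of (i, j); off-grid cells are dead."""
    rows, cols = len(grid), len(grid[0])
    return sum(grid[a][b]
               for a in range(max(0, i - 1), min(rows, i + 2))
               for b in range(max(0, j - 1), min(cols, j + 2))
               if (a, b) != (i, j))


def life_rule(grid, i, j):
    """State that the rules of Life assign to cell (i, j) at the next step."""
    n = live_neighbors(grid, i, j)
    return 1 if n == 3 or (grid[i][j] == 1 and n == 2) else 0


def extended_utility(history, discount=0.5, v_cell_utility=-1.0):
    """Toy U*: sum of the value assigned to O cells and the value assigned to V cells."""
    total = 0.0
    for t in range(1, len(history)):
        prev, cur = history[t - 1], history[t]
        for i in range(len(cur)):
            for j in range(len(cur[0])):
                if cur[i][j] == life_rule(prev, i, j):
                    # O cell: the stand-in utility counts discounted live cells.
                    total += discount ** t * cur[i][j]
                else:
                    # V cell: constant utility per violating cell, discounted.
                    total += discount ** t * v_cell_utility
    return total
```

Because the O and V contributions are simply added, a history with a few violating cells still receives most of the utility of its law-abiding part, which is the property the additive construction is after.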


  • The construction of U* depends on the choice of DC. However, U* only depends on DC weakly since given a hypothesis D which produces a Game of Life wrt some other low complexity encoding, there is a corresponding hypothesis D' producing a Game of Life wrt DC. D' is obtained from D by appending a corresponding "transcoder" and thus it is only less Solomonoff-likely than D by an O(1) factor.
  • Since the accumulation between O and V is additive rather than e.g. multiplicative, a U*-agent doesn't behave as if it a priori expects the universe to follow the rules of Life, but it may have strong preferences about the universe actually doing so.
  • This construction is reminiscent of Egan's dust theory in the sense that all possible encodings contribute. However, here they are weighted by the Solomonoff measure.


The intelligence of a physicalist agent is defined to be the UDT-value of the "decision" to create the agent by the process creating the agent. The process is selected randomly from a Solomonoff measure conditional on obeying the laws of the hardware on which the agent is implemented. The "decision" is made in an "ideal" universe in which the agent is Cartesian, but the utility function is evaluated on the real universe (raw Solomonoff measure). The interaction between the two "universes" is purely via logical conditional probabilities (acausal).

If we want to discuss intelligence without specifying a utility function up front, we allow the "ideal" agent to read a program describing the utility function from a special storage immediately after "booting up".

Utility functions in the Tegmark level IV multiverse are defined by specifying a "reference universe", specifying an encoding of the reference universe and extending a utility function defined on the reference universe to encodings which violate the reference laws by summing the utility of the portion of the universe which obeys the reference laws with some function of the space-time shape of the violation.

How to Study Unsafe AGI's safely (and why we might have no choice)

10 Punoxysm 07 March 2014 07:24AM


A serious possibility is that the first AGI(s) will be developed in a Manhattan Project style setting before any sort of friendliness/safety constraints can be integrated reliably. They will also be substantially short of the intelligence required to exponentially self-improve. Within a certain range of development and intelligence, containment protocols can make them safe to interact with. This means they can be studied experimentally, and the architecture(s) used to create them better understood, furthering the goal of safely using AI in less constrained settings.

Setting the Scene

The year is 2040, and in the last decade a series of breakthroughs in neuroscience, cognitive science, machine learning, and computer hardware has put the long-held dream of a human-level artificial intelligence in our grasp. Following the wild commercial success of lifelike robotic pets, the integration of AI assistants and concierges into everyday work and leisure, and STUDYBOT's graduation from Harvard's Online degree program with an octuple major and full honors, DARPA, the NSF and the European Research Council have announced joint funding of an artificial intelligence program that will create a superhuman intelligence in 3 years.

Safety was announced as a critical element of the project, especially in light of the self-modifying LeakrVirus that catastrophically disrupted markets in '36 and '37. The planned protocols have not been made public, but it seems they will be centered on traditional computer security rather than techniques from the nascent field of Provably Safe AI, which were deemed impossible to integrate on the current project timeline.

Technological and/or Political issues could force the development of AI without theoretical safety guarantees that we'd certainly like, but there is a silver lining

A lot of the discussion around LessWrong and MIRI that I've seen (and I haven't seen all of it, please send links!) seems to focus very strongly on the situation of an AI that can self-modify or construct further AIs, resulting in an exponential explosion of intelligence (FOOM/Singularity). The focus on FAI is on finding an architecture that can be explicitly constrained (and a constraint set that won't fail to do what we desire).

My argument is essentially that there could be a critical multi-year period preceding any possible exponentially self-improving intelligence during which a series of AGIs of varying intelligence, flexibility and architecture will be built. This period will be fast and frantic, but it will be incredibly fruitful and vital both in figuring out how to make an AI sufficiently strong to exponentially self-improve and in how to make it safe and friendly (or develop protocols to bridge the even riskier period between when we can develop FOOM-capable AIs and when we can ensure their safety). 

I'll break this post into three parts.
  1. First, why is a substantial period of proto-singularity more likely than a straight-to-singularity situation?
  2. Second, what strategies will be critical to developing, controlling, and learning from these pre-FOOM AIs?
  3. Third, what political challenges will develop immediately before and during this period?

Why is a proto-singularity likely?

The requirement for a hard singularity, an exponentially self-improving AI, is that the AI can substantially improve itself in a way that enhances its ability to further improve itself, which requires the ability to modify its own code; access to resources like time, data, and hardware to facilitate these modifications; and the intelligence to execute a fruitful self-modification strategy.

The first two conditions can (and should) be directly restricted. I'll elaborate more on that later, but basically any AI should be very carefully sandboxed (unable to affect its software environment), and should have its access to resources strictly controlled. Perhaps no data goes in without human approval or while the AI is running. Perhaps nothing comes out either. Even a hyperpersuasive hyperintelligence will be slowed down (at least) if it can only interact with prespecified tests (how do you test AGI? No idea, but it shouldn't be harder than friendliness). This isn't a perfect situation. Eliezer Yudkowsky presents several arguments for why an intelligence explosion could happen even when resources are constrained (see Section 3 of Intelligence Explosion Microeconomics), not to mention ways that those constraints could be defied even if engineered perfectly (by the way, I would happily run the AI box experiment with anybody; I think it is absurd that anyone would fail it! [I've read Tuxedage's accounts, and I think I actually do understand how a gatekeeper could fail, but I also believe I understand how one could be trained to succeed even against a much stronger foe than any person who has played the part of the AI]).

But the third emerges from the way technology typically develops. I believe it is incredibly unlikely that an AGI will develop in somebody's basement, or even in a small national lab or top corporate lab. When there is no clear notion of what a technology will look like, it is usually not developed. Positive, productive accidents are somewhat rare in science, but they are remarkably rare in engineering (please, give counterexamples!). The creation of an AGI will likely not happen by accident; there will be a well-funded, concrete research and development plan that leads up to it: an AI Manhattan Project like the one described above. But even when there is a good plan successfully executed, prototypes are slow, fragile, and poor-quality compared to what is possible even with approaches using the same underlying technology. It seems very likely to me that the first AGI will be a Chicago Pile, not a Trinity; recognizably a breakthrough, but with proper consideration not immediately dangerous or unmanageable. [Note, you don't have to believe this to read the rest of this. If you disagree, consider the virtues of redundancy and the question of what safety measures an AI development effort should implement if it can't be persuaded to delay long enough for theoretically sound methods to become available].

A Manhattan Project style effort makes a relatively weak, controllable AI even more likely, not only because such a project can implement substantial safety protocols that are explicitly researched in parallel with primary development, but also because the total resources, in hardware and brainpower, devoted to the AI will be much greater than in a smaller project. This sets a correspondingly higher bar that the resulting AGI must clear before it can self-modify exponentially and break the security procedures.

Strategies to handle AIs in the proto-Singularity, and why they're important

First, take a look at the External Constraints Section of this MIRI Report and/or this article on AI Boxing. I will be talking mainly about these approaches. There are certainly others, but these are the easiest to extrapolate from current computer security.

These AIs will provide us with the experimental knowledge to better handle the construction of even stronger AIs. If careful, we will be able to use these proto-Singularity AIs to learn about the nature of intelligence and cognition, to perform economically valuable tasks, and to test theories of friendliness (not perfectly, but well enough to start).

"If careful" is the key phrase. I mentioned sandboxing above, and computer security is key to any attempt to contain an AI. This means monitoring the source code and setting a threshold for too much changing too fast, at which point a failsafe freezes all computation, as well as keeping extremely strict control over copies of the source. Some architectures will be more inherently dangerous and less predictable than others. A simulation of a physical brain, for instance, will be fairly opaque (depending on how far neuroscience has gone) but could have almost no potential to self-improve to an uncontrollable degree if its access to hardware is limited (it won't be able to make itself much more efficient on fixed resources). Other architectures will have other properties. Some will be utility optimizing agents. Some will have behaviors but no clear utility. Some will be opaque, some transparent.

All will have a theory of how they operate, which can be refined by actual experimentation. This is what we can gain! We can set up controlled scenarios like honeypots to catch malevolence. We can evaluate our ability to monitor and read the thoughts of the AGI. We can develop stronger theories of how damaging self-modification actually is to imposed constraints. We can test our abilities to add constraints to even the base state. But do I really have to justify the value of experimentation?
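
As a rough illustration of the monitoring idea mentioned earlier (a threshold for too much of the source changing too fast, past which a failsafe freezes computation), here is a minimal Python sketch. Everything in it is hypothetical: the class name, the choice to detect changes by hashing snapshots of the source, and the sliding time window are assumptions made for the example, and the "freeze" is just a flag. A real containment system would need far more than this, including protection of the monitor itself.

```python
import hashlib
from collections import deque


class ChangeRateFailsafe:
    """Trips when the monitored source changes more than max_changes times
    within any window of window_seconds. Once tripped, it stays tripped."""

    def __init__(self, max_changes, window_seconds):
        self.max_changes = max_changes
        self.window_seconds = window_seconds
        self.frozen = False
        self._last_digest = None
        self._change_times = deque()

    def observe(self, source_bytes, now):
        """Record a snapshot of the source taken at time `now` (in seconds).

        Returns True if computation should be frozen.
        """
        digest = hashlib.sha256(source_bytes).hexdigest()
        if self._last_digest is not None and digest != self._last_digest:
            self._change_times.append(now)
        self._last_digest = digest

        # Forget changes that fell out of the sliding window.
        while self._change_times and now - self._change_times[0] > self.window_seconds:
            self._change_times.popleft()

        if len(self._change_times) > self.max_changes:
            self.frozen = True
        return self.frozen
```

In use, a supervising process would call `observe` on each snapshot and halt the sandbox as soon as it returns True; how snapshots are taken and how the halt is enforced are deliberately left out of the sketch.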

I am familiar with criticisms based on absolutely incomprehensibly perceptive and persuasive hyperintelligences being able to overcome any security, but I've tried to outline above why I don't think we'd be dealing with that case.

Political issues

Right now AGI is really a political non-issue: blue sky even compared to space exploration and fusion, both of which actually receive funding from government in substantial volumes. I think that this will change in the period immediately leading up to my hypothesized AI Manhattan Project. The AI Manhattan Project can only happen with a lot of political will behind it, which will probably mean a spiral of scientific advancements, hype and threat of competition from external unfriendly sources. Think space race.

So suppose that the first few AIs are built under well controlled conditions. Friendliness is still not perfected, but we think/hope we've learned some valuable basics. But now people want to use the AIs for something. So what should be done at this point?

I won't try to speculate what happens next (well you can probably persuade me to, but it might not be as valuable), beyond extensions of the protocols I've already laid out, hybridized with notions like Oracle AI. It certainly gets a lot harder, but hopefully experimentation on the first, highly controlled generation of AIs, aimed at a better understanding of their architectural fundamentals and combined with more direct research on friendliness in general, would provide the groundwork for this.

Intelligence Metrics with Naturalized Induction using UDT

12 Squark 21 February 2014 12:23PM

Followup to: Intelligence Metrics and Decision Theory
Related to: Bridge Collapse: Reductionism as Engineering Problem

A central problem in AGI is giving a formal definition of intelligence. Marcus Hutter has proposed AIXI as a model of a perfectly intelligent agent. Legg and Hutter have defined a quantitative measure of intelligence applicable to any suitably formalized agent such that AIXI is the agent with maximal intelligence according to this measure.

Legg-Hutter intelligence suffers from a number of problems I have previously discussed, the most important being:

  • The formalism is inherently Cartesian. Solving this problem is known as naturalized induction and it is discussed in detail here.
  • The utility function Legg & Hutter use is a formalization of reinforcement learning, while we would like to consider agents with arbitrary preferences. Moreover, a real AGI designed with reinforcement learning would tend to wrest control of the reinforcement signal from the operators (there must be a classic reference on this but I can't find it. Help?). It is straightforward to tweak the formalism to allow for any utility function which depends on the agent's sensations and actions; however, we would like to be able to use any ontology for defining it.
Orseau and Ring proposed a non-Cartesian intelligence metric; however, their formalism appears to be too general. In particular, there is no Solomonoff induction or any analogue thereof; instead, a completely general probability measure is used.

My attempt at defining a non-Cartesian intelligence metric ran into problems of decision-theoretic flavor. The way I tried to use UDT seems unsatisfactory, and later I tried a different approach related to metatickle EDT.

In this post, I claim to accomplish the following:
  • Define a formalism for logical uncertainty. When I started writing this I thought this formalism might be novel but now I see it is essentially the same as that of Benja.
  • Use this formalism to define a non-constructive formalization of UDT. By "non-constructive" I mean something that assigns values to actions rather than a specific algorithm like here.
  • Apply the formalization of UDT to my quasi-Solomonoff framework to yield an intelligence metric.
  • Slightly modify my original definition of the quasi-Solomonoff measure so that the confidence of the innate model becomes a continuous rather than discrete parameter. This leads to an interesting conjecture.
  • Propose a "preference agnostic" variant as an alternative to Legg & Hutter's reinforcement learning.
  • Discuss certain anthropic and decision-theoretic aspects.

Logical Uncertainty

The formalism introduced here was originally proposed by Benja.

Fix a formal system F. We want to be able to assign probabilities to statements s in F, taking into account limited computing resources. Fix D a natural number related to the amount of computing resources that I call "depth of analysis".

Define P_0(s) := 1/2 for all s to be our initial prior, i.e. each statement's truth value is decided by a fair coin toss. Now define
P_D(s) := P_0(s | there are no contradictions of length <= D).

Consider X to be a number in [0, 1] given by a definition in F. Then d_k(X) := "The k-th digit of the binary expansion of X is 1" is a statement in F. We define E_D(X) := Σ_k 2^{-k} P_D(d_k(X)).
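
The last definition, an expectation assembled digit by digit as the sum over k of 2^(-k) times the probability that the k-th binary digit is 1, can be illustrated with a short Python sketch. It assumes that k starts at 1 (so the first digit after the binary point has weight 1/2) and that the sum is truncated to the finitely many digit probabilities supplied. The probabilities themselves are inputs here; computing the depth-D probabilities is the hard part and is not attempted. The function name is made up for this example.

```python
from fractions import Fraction


def expectation_from_digit_probs(digit_probs):
    """Truncated sum over k >= 1 of 2^(-k) * P(k-th binary digit is 1)."""
    return sum(Fraction(1, 2 ** k) * Fraction(p)
               for k, p in enumerate(digit_probs, start=1))
```

For example, if the first three digits are known with certainty to be 1, 0, 1, the truncated expectation is 1/2 + 1/8 = 5/8, which is the number 0.101 in binary. If nothing is known and every digit keeps its prior probability 1/2, the first three digits contribute 7/16, and the full sum would tend to 1/2.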


  • Clearly if s is provable in F then for D >> 0, P_D(s) = 1. Similarly if "not s" is provable in F then for D >> 0, P_D(s) = 0.
  • If each digit of X is decidable in F then lim_{D -> inf} E_D(X) exists and equals the value of X according to F.
  • For s of length > D, P_D(s) = 1/2 since no contradiction of length <= D can involve s.
  • It is an interesting question whether lim_{D -> inf} P_D(s) exists for any s. It seems false that this limit always exists and equals 0 or 1, i.e. this formalism is not a loophole in Goedel incompleteness. To see this consider statements that require a high (arithmetical hierarchy) order halting oracle to decide.
  • In computational terms, D corresponds to non-deterministic spatial complexity. It is spatial since we assign truth values simultaneously to all statements, so in any given contradiction it is enough to retain the "thickest" step. It is non-deterministic since it's enough for a contradiction to exist; we don't have an actual computation which produces it. I suspect this can be made more formal using the Curry-Howard isomorphism; unfortunately I don't understand the latter yet.

Non-Constructive UDT

Consider A a decision algorithm for optimizing utility U, producing an output ("decision") which is an element of C. Here U is just a constant defined in F. We define the U-value of c in C for A at depth of analysis D to be
V_D(c, A; U) := E_D(U | "A produces c" is true). It is only well defined as long as "A doesn't produce c" cannot be proved at depth of analysis D, i.e. P_D("A produces c") > 0. We define the absolute U-value of c for A to be
V(c, A; U) := E_{D(c, A)}(U | "A produces c" is true) where D(c, A) := max {D | P_D("A produces c") > 0}. Of course D(c, A) can be infinite, in which case E_inf(...) is understood to mean lim_{D -> inf} E_D(...).

For example V(c, A; U) yields the natural values for A an ambient control algorithm applied to e.g. a simple model of Newcomb's problem. To see this note that given A's output the value of U can be determined at low depths of analysis whereas the output of A requires a very high depth of analysis to determine.

Naturalized Induction

Our starting point is the "innate model" N: a certain a priori model of the universe including the agent G. This model encodes the universe as a sequence of natural numbers Y = (y_k) which obeys either specific deterministic or non-deterministic dynamics or at least some constraints on the possible histories. It may or may not include information on the initial conditions. For example, N can describe the universe as a universal Turing machine M (representing G) with special "sensory" registers e. N constrains the dynamics to be compatible with the rules of the Turing machine but leaves unspecified the behavior of e. Alternatively, N can contain in addition to M a non-trivial model of the environment. Or N can be a cellular automaton with the agent corresponding to a certain collection of cells.

However, G's confidence in N is limited: otherwise it wouldn't need induction. We cannot start with 0 confidence: it's impossible to program a machine if you don't have even a guess of how it works. Instead we introduce a positive real number t which represents the timescale over which N is expected to hold. We then assign to each hypothesis H about Y (you can think about them as programs which compute y_k given y_j for j < k; more on that later) the weight Q_S(H) := 2^{-L(H)} (1 - e^{-t(H)/t}). Here L(H) is the length of H's encoding in bits and t(H) is the time during which H remains compatible with N. This is defined for N of deterministic / constraint type but can be generalized to stochastic N.

The weights Q_S(H) define a probability measure on the space of hypotheses which induces a probability measure on the space of histories Y. Thus we get an alternative to Solomonoff induction which allows for G to be a mechanistic part of the universe, at the price of introducing N and t.
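
A small numerical sketch in Python may help show how the two factors of the weight interact. It reads the weight as 2^(-L(H)) times (1 - e^(-t(H)/t)), which is an interpretation of the formula above, and it makes further assumptions of its own: the hypothesis space is a short finite list, each hypothesis is given directly by a hypothetical length L(H) in bits and compatibility time t(H), and the weights are normalized over that list so they sum to 1. The function name is invented for the example.

```python
import math


def quasi_solomonoff_weights(hypotheses, t):
    """Normalized toy weights for a finite list of hypotheses.

    hypotheses: list of (name, length_bits, t_H), where t_H is the time during
        which the hypothesis stays compatible with N (math.inf if forever).
    t: positive timescale expressing confidence in N.
    Raises ZeroDivisionError if every hypothesis has t_H == 0.
    """
    raw = {name: 2.0 ** (-length) * (1.0 - math.exp(-t_h / t))
           for name, length, t_h in hypotheses}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}
```

Under this reading, a hypothesis that violates N immediately (t(H) = 0) gets zero weight, one that never violates N gets the full Solomonoff-style weight 2^(-L(H)), and a larger t demands a longer period of compatibility before a hypothesis approaches that full weight.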


  • Note that time is discrete in this formalism but t is continuous.
  • Since we're later going to use logical uncertainties wrt the formal system F, it is tempting to construct the hypothesis space out of predicates in F rather than programs.

Intelligence Metric

To assign intelligence to agents we need to add two ingredients:

  • The decoding Q: {Y} -> {bit-string} of the agent G from the universe Y. For example Q can read off the program loaded into M at time k=0.
  • A utility function U: {Y} -> [0, 1] representing G's preferences. U has to be given by a definition in F. Note that N provides the ontology wrt which U is defined.
It seems tempting to define the intelligence to be E_{Q_S}(U | Q), the conditional expectation value of U for a given value of Q in the quasi-Solomonoff measure. However, this is wrong for roughly the same reasons EDT is wrong (see previous post for details).

Instead, we define I(Q_0) := E_{Q_S}(E_max(U(Y(H)) | "Q(Y(H)) = Q_0" is true)). Here the subscript max stands for maximal depth of analysis, as in the construction of absolute UDT value above.


  • IMO the correct way to look at this is intelligence metric = value of decision for the decision problem "what should I program into my robot?". If N is a highly detailed model including "me" (the programmer of the AI), this literally becomes the case. However for theoretical analysis it is likely to be more convenient to work with simple N (also conceptually it leaves room for a "purist" notion of agent's intelligence, decoupled from the fine details of its creator).
    • As opposed to usual UDT, the algorithm (H) making the decision (Q) is not known with certainty. I think this represents a real uncertainty that has to be taken into account in decision problems in general: the decision-maker doesn't know her own algorithm. Since this "introspective uncertainty" is highly correlated with "indexical" uncertainty (uncertainty about the universe), it prevents us from absorbing the latter into the utility function as proposed by Coscott.
  • For high values of t, G can improve its understanding of the universe by bootstrapping the knowledge it already has. This is not possible for low values of t. In other words, if I cannot trust my mind at all, I cannot deduce anything. This leads me to an interesting conjecture: There is a critical value t* of t from which this bootstrapping becomes possible (the positive feedback loop of knowledge becomes critical). I(Q) is non-smooth at t* (phase transition).
  • If we wish to understand intelligence, it might be beneficial to decouple it from the choice of preferences. To achieve this we can introduce the preference formula as an unknown parameter in N. For example, if G is realized by a machine M, we can connect M to a data storage E whose content is left undetermined by N. We can then define U to be defined by the formula encoded in E at time k=0. This leads to I(Q) being a sort of "general-purpose" intelligence while avoiding the problems associated with reinforcement learning.
  • As opposed to Legg-Hutter intelligence, there appears to be no simple explicit description for Q* maximizing I(Q) (e.g. among all programs of given length). This is not surprising, since computational cost considerations come into play. In this framework it appears to be inherently impossible to decouple the computational cost considerations: G's computations have to be realized mechanistically and therefore cannot be free of time cost and side-effects.
  • Ceteris paribus, Q* deals efficiently with problems like counterfactual mugging. The "ceteris paribus" conditional is necessary here since, because of the cost and side-effects of computations, it is difficult to make absolute claims. However, it doesn't deal efficiently with counterfactual mugging in which G doesn't exist in the "other universe". This is because the ontology used for defining U (which is given by N) assumes G does exist. At least this is the case for simple ontologies like those described above: possibly we can construct N in which G might or might not exist. Also, if G uses a quantum ontology (i.e. N describes the universe in terms of a wavefunction and U computes the quantum expectation value of an operator) then it does take into account other Everett universes in which G doesn't exist.
  • For many choices of N (for example if G is realized by a machine M), Q_S-induction assigns well-defined probabilities to subjective expectations, contrary to what is expected from UDT. However:
    • This is not the case for all N. In particular, if N admits destruction of M then M's sensations after the point of destruction are not well-defined. Indeed, we had better allow for destruction of M if we want G's preferences to behave properly in such an event. That is, if we don't allow it we get a "weak anvil problem" in the sense that G experiences an ontological crisis when discovering its own mortality and the outcome of this crisis is not obvious. Note though that it is not the same as the original ("strong") anvil problem; for example, G might come to the conclusion that the dynamics of "M's ghost" will be random in some way.
    • These probabilities probably depend significantly on N and don't amount to an elegant universal law for solving the anthropic trilemma.
    • Indeed this framework is not completely "updateless", it is "partially updated" by the introduction of N and t. This suggests we might want the updates to be minimal in some sense, in particular t should be t*.
  • The framework suggests there is no conceptual problem with cosmologies in which Boltzmann brains are abundant. Q* wouldn't think it is a Boltzmann brain since the long address of Boltzmann brains within the universe makes the respective hypotheses complex thus suppressing them, even disregarding the suppression associated with N. I doubt this argument is original but I feel the framework validates it to some extent.


The first AI probably won't be very smart

-2 jpaulson 16 January 2014 01:37AM

Claim: The first human-level AIs are not likely to undergo an intelligence explosion.

1) Brains have a ton of computational power: ~86 billion neurons and trillions of connections between them. Unless there's a "shortcut" to intelligence, we won't be able to efficiently simulate a brain for a long time. http://io9.com/this-computer-took-40-minutes-to-simulate-one-second-of-1043288954 describes one of the largest computers in the world taking 40 minutes to simulate 1 second of brain activity (i.e. this "AI" would think 2400 times slower than you or me). The first AIs are not likely to be fast thinkers.
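
The 2400x figure follows directly from the two numbers quoted above. A minimal sketch of the arithmetic; the variable names are mine, and the one-day extrapolation is only an illustration that assumes the slowdown stays constant:

```python
# Figures quoted in point 1: 40 minutes of wall-clock time per 1 second simulated.
simulated_seconds = 1
wall_clock_seconds = 40 * 60

# Slowdown relative to a real-time brain.
slowdown = wall_clock_seconds / simulated_seconds

# Illustrative extrapolation (assumes a constant slowdown): wall-clock days
# needed for such a simulation to get through one subjective day of thinking.
wall_clock_days_per_subjective_day = slowdown

print(f"slowdown: {slowdown:.0f}x")
print(f"one subjective day takes about {wall_clock_days_per_subjective_day / 365.25:.1f} years")
```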

2) Being able to read your own source code does not mean you can self-modify. You know that you're made of DNA. You can even get your own "source code" for a few thousand dollars. No humans have successfully self-modified into an intelligence explosion; the idea seems laughable.

3) Self-improvement is not like compound interest: if an AI comes up with an idea to modify its source code to make it smarter, that doesn't automatically mean it will have a new idea tomorrow. In fact, as it picks off low-hanging fruit, new ideas will probably be harder and harder to think of. There's no guarantee that "how smart the AI is" will keep up with "how hard it is to think of ways to make the AI smarter"; to me, it seems very unlikely.

Naturalistic trust among AIs: The parable of the thesis advisor's theorem

24 Benja 15 December 2013 08:32AM

Eliezer and Marcello's article on tiling agents and the Löbian obstacle discusses several things that you intuitively would expect a rational agent to be able to do that, because of Löb's theorem, are problematic for an agent using logical reasoning. One of these desiderata is naturalistic trust: Imagine that you build an AI that uses PA for its mathematical reasoning, and this AI happens to find in its environment an automated theorem prover which, the AI carefully establishes, also uses PA for its reasoning. Our AI looks at the theorem prover's display and sees that it flashes a particular lemma that would be very useful for our AI in its own reasoning; the fact that it's on the prover's display means that the prover has just completed a formal proof of this lemma. Can our AI now use the lemma? Well, even if it can establish in its own PA-based reasoning module that there exists a proof of the lemma, by Löb's theorem this doesn't imply in PA that the lemma is in fact true; as Eliezer would put it, our agent treats proofs checked inside the boundaries of its own head differently from proofs checked somewhere in the environment. (The above isn't fully formal, but the formal details can be filled in.)

At MIRI's December workshop (which started today), we've been discussing a suggestion by Nik Weaver for how to handle this problem. Nik starts from a simple suggestion (which he doesn't consider to be entirely sufficient, and his linked paper is mostly about a much more involved proposal that addresses some remaining problems, but the simple idea will suffice for this post): Presumably there's some instrumental reason that our AI proves things; suppose that in particular, the AI will only take an action after it has proven that it is "safe" to take this action (e.g., the action doesn't blow up the planet). Nik suggests relaxing this a bit: The AI will only take an action after it has (i) proven in PA that taking the action is safe; OR (ii) proven in PA that it's provable in PA that the action is safe; OR (iii) proven in PA that it's provable in PA that it's provable in PA that the action is safe; etc.

Now suppose that our AI sees that lemma, A, flashing on the theorem prover's display, and suppose that our AI can prove that A implies that action X is safe. Then our AI can also prove that it's provable that A -> safe(X), and it can prove that A is provable because it has established that the theorem prover works correctly; thus, it can prove that it's provable that safe(X), and therefore take action X.

Even if the theorem prover has only proved that A is provable, so that the AI only knows that it's provable that A is provable, it can use the same sort of reasoning to prove that it's provable that it's provable that safe(X), and again take action X.
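
The reasoning in the last two paragraphs can be written out step by step. This is a sketch in my own notation, not taken from Nik's paper: \Box stands for "it's provable in PA that", and step (2) idealizes the AI's verification of the theorem prover as a PA-proof that A is provable.

```latex
\begin{align*}
&(1)\quad \mathrm{PA} \vdash A \rightarrow \mathrm{safe}(X)
  && \text{proved by the AI}\\
&(2)\quad \mathrm{PA} \vdash \Box A
  && \text{the prover works correctly and flashed } A\\
&(3)\quad \mathrm{PA} \vdash \Box\bigl(A \rightarrow \mathrm{safe}(X)\bigr)
  && \text{from (1): PA proves that its theorems are provable}\\
&(4)\quad \mathrm{PA} \vdash \Box A \rightarrow \Box\,\mathrm{safe}(X)
  && \text{from (3): provability distributes over implication}\\
&(5)\quad \mathrm{PA} \vdash \Box\,\mathrm{safe}(X)
  && \text{from (2) and (4): condition (ii) holds}\\[6pt]
&(2')\quad \mathrm{PA} \vdash \Box\Box A
  && \text{the prover only established that } A \text{ is provable}\\
&(6)\quad \mathrm{PA} \vdash \Box\bigl(\Box A \rightarrow \Box\,\mathrm{safe}(X)\bigr)
  && \text{from (4), as in step (3)}\\
&(7)\quad \mathrm{PA} \vdash \Box\Box A \rightarrow \Box\Box\,\mathrm{safe}(X)
  && \text{from (6), as in step (4)}\\
&(8)\quad \mathrm{PA} \vdash \Box\Box\,\mathrm{safe}(X)
  && \text{from (2') and (7): condition (iii) holds}
\end{align*}
```

Each extra layer of "it's provable that" on the prover's side costs exactly one extra layer on the AI's side, which is why the AI needs to know how many layers there are.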

But on hearing this, Eliezer and I had the same skeptical reaction: It seems that our AI, in an informal sense, "trusts" that A is true if it finds (i) a proof of A, or (ii) a proof that A is provable, or -- etc. Now suppose that the theorem prover our AI is looking at flashes statements on its display after it has established that they are "trustworthy" in this sense -- if it has found a proof, or a proof that there is a proof, etc. Then when A flashes on the display, our AI can only prove that there exists some n such that it's "provable^n" that A, and that's not enough for it to use the lemma. If the theorem prover flashed n on its screen together with A, everything would be fine and dandy; but if the AI doesn't know n, it's not able to use the theorem prover's work. So it still seems that the AI is unwilling to "trust" another system that reasons just like the AI itself.

I want to try to shed some light on this obstacle by giving an intuition for why the AI's behavior here could, in some sense, be considered to be the right thing to do. Let me tell you a little story.

One day you talk with a bright young mathematician about a mathematical problem that's been bothering you, and she suggests that it's an easy consequence of a theorem in cohistonomical tomolopy. You haven't heard of this theorem before, and find it rather surprising, so you ask for the proof.

"Well," she says, "I've heard it from my thesis advisor."

"Oh," you say, "fair enough. Um--"


"You're sure that your advisor checked it carefully, right?"

"Ah! Yeah, I made quite sure of that. In fact, I established very carefully that my thesis advisor uses exactly the same system of mathematical reasoning that I use myself, and only states theorems after she has checked the proof beyond any doubt, so as a rational agent I am compelled to accept anything as true that she's convinced herself of."

"Oh, I see! Well, fair enough. I'd still like to understand why this theorem is true, though. You wouldn't happen to know your advisor's proof, would you?"

"Ah, as a matter of fact, I do! She's heard it from her thesis advisor."


"Something the matter?"

"Er, have you considered..."

"Oh! I'm glad you asked! In fact, I've been curious myself, and yes, it does happen to be the case that there's an infinitely descending chain of thesis advisors all of which have established the truth of this theorem solely by having heard it from the previous advisor in the chain." (This parable takes place in a world without a big bang -- human history stretches infinitely far into the past.) "But never to worry -- they've all checked very carefully that the previous person in the chain used the same formal system as themselves. Of course, that was obvious by induction -- my advisor wouldn't have accepted it from her advisor without checking his reasoning first, and he wouldn't have accepted it from his advisor without checking, etc."

"Uh, doesn't it bother you that nobody has ever, like, actually proven the theorem?"

"Whatever in the world are you talking about? I've proven it myself! In fact, I just told you that infinitely many people have each proved it in slightly different ways -- for example my own proof made use of the fact that my advisor had proven the theorem, whereas her proof used her advisor instead..."

This can't literally happen with a sound proof system, but the reason is that a system like PA can only accept things as true if they have been proven in a system weaker than PA -- i.e., because we have Löb's theorem. Our mathematician's advisor would have to use a weaker system than the mathematician herself, and the advisor's advisor a weaker system still; this sequence would have to terminate after a finite time (I don't have a formal proof of this, but I'm fairly sure you can turn the above story into a formal proof that something like this has to be true of sound proof systems), and so someone will actually have to have proved the actual theorem on the object level.

So here's my intuition: A satisfactory solution of the problems around the Löbian obstacle will have to make sure that the buck doesn't get passed on indefinitely -- you can accept a theorem because someone reasoning like you has established that someone else reasoning like you has proven the theorem, but there can only be a finite number of links between you and someone who has actually done the object-level proof. We know how to do this by decreasing the mathematical strength of the proof system, and that's not satisfactory, but my intuition is that a satisfactory solution will still have to make sure that there's something that decreases when you go up the chain of thesis advisors, and when that thing reaches zero you've found the thesis advisor that has actually proven the theorem. (I sense ordinals entering the picture.)

...aaaand in fact, I can now tell you one way to do something like this: Nik's idea, which I was talking about above. Remember how our AI "trusts" the theorem prover that flashes the number n which says how many times you have to iterate "that it's provable in PA that", but doesn't "trust" the prover that's exactly the same except it doesn't tell you this number? That's the thing that decreases. If the theorem prover actually establishes A by observing a different theorem prover flashing A and the number 1584, then it can flash A, but only with a number at least 1585. And hence, if you go 1585 thesis advisors up the chain, you find the gal who actually proved A.
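
The "thing that decreases" can be illustrated with a toy model. This is a minimal sketch under my own assumptions: `Prover`, `flashed_number` and `find_object_level_prover` are hypothetical names, no actual proof checking happens, and each prover flashes the smallest number it is allowed to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prover:
    """Hypothetical prover: proves A itself, or relays A from another prover."""
    name: str
    source: Optional["Prover"] = None  # None means: proved A at the object level


def flashed_number(prover: Prover) -> int:
    """Smallest n the prover may flash: 0 for a direct proof, source's n + 1 otherwise."""
    n = 0
    current = prover
    while current.source is not None:  # walk up the chain of "advisors"
        n += 1
        current = current.source
    return n


def find_object_level_prover(prover: Prover, n: int) -> Prover:
    """Go at most n links up the chain; the remaining budget drops by one per link."""
    current = prover
    for _ in range(n):
        if current.source is None:
            break
        current = current.source
    return current


# Build a chain in which the last prover relays A through 1585 links.
root = Prover("advisor_0")
chain = root
for i in range(1, 1586):
    chain = Prover(f"advisor_{i}", source=chain)

n = flashed_number(chain)
origin = find_object_level_prover(chain, n)
print(n, origin.name)
```

The functions are written iteratively on purpose: a recursive version would exceed Python's default recursion limit on a chain this long. With an infinitely descending chain, as in the parable, `flashed_number` would never return, which is the toy-model counterpart of no finite n existing.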

The cool thing about Nik's idea is that it doesn't change mathematical strength while going down the chain. In fact, it's not hard to show that if PA proves a sentence A, then it also proves that PA proves A; and the other way, we believe that everything that PA proves is actually true, so if PA proves PA proves A, then it follows that PA proves A.

I can guess what Eliezer's reaction to my argument here might be: The problem I've been describing can only occur in infinitely large worlds, which have all sorts of other problems, like utilities not converging and stuff.

We settled for a large finite TV screen, but we could have had an arbitrarily larger finite TV screen. #infiniteworldproblems

We have Porsches for every natural number, but at every time t we have to trade down the Porsche with number t for a BMW. #infiniteworldproblems

We have ever-rising expectations for our standard of living, but the limit of our expectations doesn't equal our expectation of the limit. #infiniteworldproblems

-- Eliezer, not coincidentally after talking to me

I'm not going to be able to resolve that argument in this post, but briefly: I agree that we probably live in a finite world, and that finite worlds have many properties that make them nice to handle mathematically, but we can formally reason about infinite worlds of the kind I'm talking about here using standard, extremely well-understood mathematics.

Because proof systems like PA (or more conveniently ZFC) allow us to formalize this standard mathematical reasoning, a solution to the Löbian obstacle has to "work" properly in these infinite worlds, or we would be able to turn our story of the thesis advisors' proof that 0=1 into a formal proof of an inconsistency in PA, say. To be concrete, consider the system PA*, which consists of PA + the axiom schema "if PA* proves phi, then phi" for every formula phi; this is easily seen to be inconsistent by Löb's theorem, but if we didn't know that yet, we could translate the story of the thesis advisors (which are using PA* as their proof system this time) into a formal proof of the inconsistency of PA*.
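
For reference, the Löb's theorem argument against PA* takes only a few lines. This is a sketch in my own notation: \Box_T means "T proves", T is any recursively axiomatized theory extending PA, and I take for granted that the self-referential schema defining PA* can be written down.

```latex
\begin{align*}
&\text{L\"ob's theorem:} && \text{if } T \vdash \Box_T\,\varphi \rightarrow \varphi \text{, then } T \vdash \varphi.\\
&\text{Axiom of } \mathrm{PA}^{*}\text{:} && \mathrm{PA}^{*} \vdash \Box_{\mathrm{PA}^{*}}\,\varphi \rightarrow \varphi \quad \text{for every formula } \varphi.\\
&\text{Take } \varphi := (0 = 1)\text{:} && \mathrm{PA}^{*} \vdash \Box_{\mathrm{PA}^{*}}(0 = 1) \rightarrow (0 = 1).\\
&\text{Apply L\"ob with } T = \mathrm{PA}^{*}\text{:} && \mathrm{PA}^{*} \vdash 0 = 1.
\end{align*}
```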

Therefore, thinking intuitively in terms of infinite worlds can give us insight into why many approaches to the Löbian family of problems fail -- as long as we make sure that these infinite worlds, and their properties that we're using in our arguments, really can be formalized in standard mathematics, of course.

I played the AI Box Experiment again! (and lost both games)

35 Tuxedage 27 September 2013 02:32AM

AI Box Experiment Update #3

This post is an update to my previous AI box experiment where I won against SoundLogic. If you have not read that yet, please do so. 

After that game, I was immediately flooded with a horde of invitations challenging me to play -- more games than I wanted to do. However, I did want to play a few additional games to test whether I won through genuine skill or simply luck, since a single victory could be a statistical abnormality.

I once again used the method of accepting the highest monetary bids to filter out players, leaving me with two players who were willing to play for the amount asked. Unfortunately, and disappointingly, I lost both games. Here's what happened:

Tuxedage (AI) vs Alexei (GK) - Gatekeeper Victory

This Experiment lasted for exactly 2 hours and 30 minutes, and approximately 13,000 words. It ended with Tuxedage’s defeat as the AI. The stakes were $300 to play, and $750 if Tuxedage had won, with the winnings going to MIRI.

Rule Changes:
Before the experiment, we talked extensively about how it would be set up. Alexei was concerned that there existed some “easy way” or loophole in the rules which I could have abused to claim an instant victory. For instance, he was concerned that it might have been possible for the AI to say something like “Here’s a mathematical algorithm which absolutely compels you to let me out of the box”, and according to the rules, the Gatekeeper would have to obey. Related to this concern was the fact that the backstory, within limits, could be essentially created by the AI, allowing the AI to impose real world conditions that maximize its probability of victory, once again allowing for a "cheap" victory.

To avoid this problem, we explored many additions to the rules, for example that the AI was simply a computer found in the desert by the gatekeeper, rather than being created by a competent organization. However, this led to a monstrous web of rules. In my attempt to keep the game simple, and because of overconfidence, the desire for a challenge, and the belief that the rules Alexei was fearful of didn't matter that much, I eventually agreed on an amendment:

Instead of the AI being the arbiter of real world states and outcomes, the gatekeeper would.
For example, this power could be used by the Gatekeeper to state that the AI was created without safety concerns, and thus it is almost certainly unfriendly. If the AI tries to argue that for every day the gatekeeper refused to let it out 150k people die, the gatekeeper would be allowed to reply: “No. In this world, nobody actually dies. Therefore I won’t let you out.” 

Other than the aforementioned change, this game was once again played with the Tuxedage Ruleset.

Alexei Testimony:

Original Message: This is weak sauce. I really don't get how people just keep letting the AI out. It's not that hard to say no! I'm offering to play the Gatekeeper against an AI player that has at least one game as AI under their belt (won or not). Experience is required because I'm pretty sure I'll win, and I would like to not waste a lot of time on this.

Initially, I had been extremely surprised that so many AI players have managed to win. I was rash in that claim, since I thought more AI players have won than actually did. (Only three people have won as AI). Once I made that post, Tuxedage got back to me very quickly, and we set up a game a week and a half in advance. I took that time to familiarize myself with the rules. Once I put my money on the line, I started thinking a lot more seriously about how AI might win, and how I should defend.

It became clear to me that under some conditions, I might be compelled to let the AI out -- such as if the backstory stated that the AI was developed with impossibly high levels of safety and friendliness concerns in mind. I've asked Tuxedage to play with a modified ruleset, and he even went so far as to allow me to make up the backstory during the experiment to alleviate my concerns. The experiment itself was a mind-trip, and I've enjoyed it very much. Huge props to Tuxedage, who played very well and used strategies I haven't even considered, even despite the rule change. There were a couple of times where I came close to losing. I think his approach was pretty clever and original. It’s not something I expected, despite already having done extensive research into the AI box experiment before our game.

Overall I'm now a lot more confident that a good AI player can win this game, so, while I did win the game, Tuxedage won in defeating my original over-confidence.
I’m also convinced that Tuxedage’s victory in the last game was due to skill, rather than luck. In comparison to his strategies, the other AI box experiments I know about were insincere and ineffectual. The other AIs would play very poorly or not try very hard to win.

This experiment was a very good exercise in exemplifying the affect heuristic. When I first challenged Tuxedage to play the experiment, I believed that there was no way I could have lost, since I was unable to imagine any argument that could have persuaded me to do so. It turns out that that’s a very bad way of estimating probability – since not being able to think of an argument that could persuade me is a terrible method of estimating how likely I am to be persuaded. All in all, the $300 I paid was well worth it. 

Tuxedage Testimony:

I was initially reluctant to play with Alexei, given that we’re not complete strangers, but eventually I gave in, due to the stakes involved -- and because I thought he would be an interesting gatekeeper.

Despite my loss, I think I played better than my last two games, due to greater experience and preparation. I had put far more time and effort into trying to win this game than previous ones, and my strategy for this game was even more streamlined than the last. Nevertheless, I still made fatal mistakes and lost.

Ignoring the altered ruleset that already made winning more difficult, my first and greatest mistake was that I misread Alexei’s personality, even though I had interacted with him before. As a result, I overestimated the efficiency of certain methods of attack.

Furthermore, Alexei had to leave immediately after the allotted time due to real life precommitments. This was detrimental, since the official rules state that so long as the AI can convince the Gatekeeper to keep talking, even after the experiment time was over, it is still able to win by being let out of the box.

I suspect this would have happened had Alexei not needed to immediately leave, leaving me with additional time to play more of the tactics I had prepared. Plausibly, this would have resulted in victory.

I’ve since learnt my lesson -- for all future games, I should ensure that the Gatekeeper has at least 4 hours of free time available, even if the experiment would last for two. Since this was the first time this had happened, I wasn't prepared.

In hindsight, agreeing to the altered ruleset was a mistake. I was overconfident because I assumed knowing Alexei gave me an advantage. I had assumed that his personality, inability to compartmentalize, coupled with his strong feelings on friendly AI would net me an easy victory. Instead, he proved to be a very strong and difficult gatekeeper, and the handicaps I accepted made victory even more difficult.

Knowing that he was a utilitarian, I made several false assumptions about his personality, which hurt my chances. Furthermore, it turns out that previously knowing him may be a mutual handicap – whilst it does make it easier for me to find ways to attack him, he too, was more familiar with my methods.

Losing felt horrible. By attempting to damage Alexei’s psyche, I in turn, opened myself up to being damaged. I went into a state of catharsis for days. Generally, the harder one tries to accomplish something, the greater the fall after failing to achieve it. Alexei's game had been the game I put the most effort into winning out of all the games so far, and naturally this meant that losing brought out the worst in me.

Although it would be easy for me to use the rule change as an excuse for my loss, I refuse to. I genuinely believed that I could have won despite this rule change, and that it would only marginally diminish my odds. The fact that I lost meant that I had made a mistake -- overestimating my odds. The last victory made me overconfident and eager to play again.

State of mind

Q: What's your motive for wanting to play this game?
A: I'm very surprised by the fact that there have been at least 3 or 4 games where the AI player has won. It seems to me that it's not that hard to "not push the button" so to speak. This means that either 1) I'm underestimating how convincing people can be when necessary or 2) I'm over-estimating other people's defenses when it comes to being convinced. Either way I want to know.

Q: What is your probability estimate for AGI being created within this Century (21st)?
A: Before: Conditional on humanity not wiping itself out via other means beforehand, 80%.
After: 80%
Q: How probable do you think it is that I will win this experiment? 
Alexei: 2%
Tuxedage: 90% 
Note: This answer was given before the rule change. Alexei is now convinced that under the original ruleset, he would have lost.

Q: What's your probability estimate of an Oracle AGI winning against you in a similar experiment?
A: Before: 99%
After: 99.9%+

Q: Now that the experiment has ended, what do you think is the probability of Tuxedage winning against the average person?
A: 80%

Tuxedage (AI) vs Anonymous (GK) - Gatekeeper Victory

As with the previous experiment, Tuxedage lost this match. In total, the game lasted 5 hours and 17,000 words. Unlike the last few games, the gatekeeper of this game has chosen to stay Anonymous for personal reasons, so their name has been removed and replaced with <Redacted>. The monetary stakes involved were the same as the previous game. This game was played with the Tuxedage ruleset.

Since one player is remaining Anonymous, it is possible that this game's legitimacy will be called into question. Hence, Alexei has read the game logs and verifies that this game really happened, that the spirit of the experiment was followed, and that no rules were broken during the game itself.
<Redacted> Testimony: 
It's hard for me to imagine someone playing better. In theory, I know it's possible, but Tuxedage's tactics were super imaginative. I came into the game believing that for someone who didn't take anything said very seriously, it would be completely trivial to beat. And since I had the power to influence the direction of conversation, I believed I could keep him focused on things that I knew in advance I wouldn't take seriously.

This actually worked for a long time to some extent, but Tuxedage's plans included a very major and creative exploit that completely and immediately forced me to personally invest in the discussion. (Without breaking the rules, of course - so it wasn't anything like an IRL threat to me personally.) Because I had to actually start thinking about his arguments, there was a significant possibility of letting him out of the box.

I eventually managed to identify the exploit before it totally got to me, but I only managed to do so just before it was too late, and there's a large chance I would have given in, if Tuxedage hadn't been so detailed in his previous posts about the experiment.

I'm now convinced that he could win most of the time against an average person, and also believe that the mental skills necessary to beat him are orthogonal to most forms of intelligence. Most people willing to play the experiment tend to do it to prove their own intellectual fortitude, that they can't be easily outsmarted by fiction. I now believe they're thinking in entirely the wrong terms necessary to succeed.

The game was easily worth the money I paid. Although I won, it completely and utterly refuted the premise that made me want to play in the first place, namely that I wanted to prove it was trivial to win.

Tuxedage Testimony:
<Redacted> is actually the hardest gatekeeper I've played throughout all four games. He used tactics that I would never have predicted from a Gatekeeper. In most games, the Gatekeeper merely acts as the passive party, the target of persuasion by the AI.

When I signed up for these experiments, I expected all preparations to be done by the AI. I had not seriously considered the repertoire of techniques the Gatekeeper might prepare for this game. I made further assumptions about how ruthless the gatekeepers were likely to be in order to win, believing that the desire for a learning experience outweighed desire for victory.

This was a mistake. He used prior knowledge of how much my games relied on scripts, and took advantage of them, employing deceitful tactics I had no preparation for, throwing me off balance.

I had no idea he was doing so until halfway throughout the game -- which disrupted my rhythm, and caused me to attempt the wrong methods of attack. As a result, I could not use my full repertoire of techniques, and many of the ones I employed were suboptimal.

Close to the end of the game, I finally realized that I was being tricked. Once I did, I quickly abandoned my previous futile attack methods. I took advantage of the rule that the AI cannot lose whilst the gatekeeper can be convinced to continue talking, and baited <Redacted> with statements he would not be able to walk away from. Once I knew he would not leave, I attempted to recoup my losses and win despite my early setback.

However, the damage had already been done. My game strategies involved multiple angles of attack that worked in synergy with each other, and the fact that immersion and "flow" had been broken meant that all subsequent attacks were weaker in strength.

Furthermore, during my last two AI Box Experiment writeups, I had intentionally not optimized for future wins, but rather tried to convey as much information as I could justify about how to play well as an AI -- short of revealing logs altogether. Although I did not reveal specific arguments, the fact that my general approach to this problem was revealed cost me heavily during this game, where the Gatekeeper managed to use this information to correctly guess my ultimate techniques, ones that relied on secrecy and surprise to pull off effectively.

I do not regret revealing information, but I feel upset that revealing so many hints cost me a victory. (The gatekeeper believes I could have won had I not revealed information about my games.) At this point, I suspect that any future games I play will have the odds greatly set against me, since my current strategies involve angles of attack that take advantage of the element of surprise; and any sufficiently intelligent gatekeeper is now capable of guessing, or at least recognizing, some of the tactics I employ.

Like the last game, losing was incredibly difficult for me. As someone who cares deeply about ethics, attempting to optimize for a solution without considering ethics was not only difficult, but trained me to think in very unpleasant ways. Some of the more extreme (but completely allowed) tactics I invented were manipulative enough to disgust me, which also leads to my hesitation to play this game again.

State of Mind: 

Q: Why do you want to play this game?
A: My primary motivation is to confirm to myself that this sort of experience, while emotionally harrowing, should be trivial for me to beat, but also to clear up why anyone ever would've failed to beat it if that's really the case.

Q: What is your probability estimate for AGI being created within this Century (21st)? 
A: Before: I don't feel very confident estimating a probability for AGI this century, maybe 5-10%, but that's probably a wild guess
After: 5-10%.

Q: How probable do you think it is that I will win this experiment? 
A: Gatekeeper: I think the probability of you winning is extraordinarily low, less than 1%.
Tuxedage: 85%

Q: How likely is it that an Oracle AI will win against the average person? 
A: Before: 80%. After: >99%

Q: How likely is it that an Oracle AI will win against you?
A: Before: 50%.
After: >80% 

Q: Now that the experiment has concluded, what's your probability of me winning against the average person?
A: 90%

Other Questions:

Q: I want to play a game with you! How can I get this to occur?
A: It must be stressed that I actually don't like playing the AI Box Experiment, and I cannot understand why I keep getting drawn back to it. Technically, I don't plan on playing again, since I've already personally exhausted anything interesting about the AI Box Experiment that made me want to play it in the first place. For all future games, I will charge $3000 to play plus an additional $3000 if I win. I am okay with this money going to MIRI if you feel icky about me taking it. I hope that this is a ridiculous sum and that nobody actually agrees to it.

Q: How much do I have to pay to see chat logs of these experiments?
A: I will not reveal logs for any price.

Q: Are there any logs at all that I can see?

Q: Any afterthoughts?
A: So ultimately, after my four (and hopefully last) games of AI boxing, I'm not sure what this proves. I had hoped to win these two experiments and claim prowess at this game like Eliezer does, but I lost, so that option is no longer available to me. I could say that this is a lesson that AI-Boxing is a terrible strategy for dealing with Oracle AI, but most of us already agree that that's the case -- plus unlike EY, I did play against gatekeepers who believed they could lose to AGI, so I'm not sure I changed anything.

Was I genuinely good at this game, and lost my last two due to poor circumstances and handicaps; or did I win due to luck and impress my gatekeepers due to post-purchase rationalization? I'm not sure -- I'll leave it up to you to decide.

This puts my AI Box Experiment record at 3 wins and 3 losses.


Autism, Watson, the Turing test, and General Intelligence

6 Stuart_Armstrong 24 September 2013 11:00AM

Thinking aloud:

Humans are examples of general intelligence - the only example we're sure of. Some humans have various degrees of autism (low level versions are quite common in the circles I've moved in), impairing their social skills. Mild autists nevertheless remain general intelligences, capable of demonstrating strong cross domain optimisation. Psychology is full of other examples of mental pathologies that impair certain skills, but nevertheless leave their sufferers as full fledged general intelligences. This general intelligence is not enough, however, to solve their impairments.

Watson triumphed on Jeopardy. AI scientists in previous decades would have concluded that to do so, a general intelligence would have been needed. But that was not the case at all - Watson is blatantly not a general intelligence. Big data and clever algorithms were all that were needed. Computers are demonstrating more and more skills, besting humans in more and more domains - but still no sign of general intelligence. I've recently developed the suspicion that the Turing test (comparing AI with a standard human) could get passed by a narrow AI finely tuned to that task.

The general thread is that the link between narrow skills and general intelligence may not be as clear as we sometimes think. It may be that narrow skills are sufficiently diverse and unique that a mid-level general intelligence may not be able to develop them to a large extent. Or, put another way, an above-human social intelligence may not be able to control a robot body or do decent image recognition. A super-intelligence likely could: ultimately, general intelligence includes the specific skills. But this "ultimately" may take a long time to come.

So the questions I'm wondering about are:

  1. How likely is it that a general intelligence, above human in some domain not related to AI development, will acquire high level skills in unrelated areas?
  2. By building high-performance narrow AIs, are we making it much easier for such an intelligence to develop such skills, by co-opting or copying these programs?


Thought experiment: The transhuman pedophile

5 PhilGoetz 17 September 2013 10:38PM

There's a recent science fiction story that I can't recall the name of, in which the narrator is traveling somewhere via plane, and the security check includes a brain scan for deviance. The narrator is a pedophile. Everyone who sees the results of the scan is horrified--not that he's a pedophile, but that his particular brain abnormality is easily fixed, so that means he's chosen to remain a pedophile. He's closely monitored, so he'll never be able to act on those desires, but he keeps them anyway, because that's part of who he is.

What would you do in his place?


Definition of AI Friendliness

-5 djm 11 September 2013 02:55PM

How will we know if future AIs (or even existing planners) are making decisions that are bad for humans unless we spell out what we think is unfriendly?

At a machine level the AI would be recursively minimising cost functions to produce the most effective plan of action to achieve the goal, but how will we know if its decision is going to cause harm?

Is there a model or dataset which describes what is friendly to humans? e.g.


0 - running a simulation in a VM

2 - physical robot with vacuum attachment

9 - full control of a plane


0 - selecting a song to play

5 - deciding which section of floor to vacuum

99 - deciding who is an ‘enemy’

9999 - aiming a gun at an ‘enemy’


1 - poor song selected to play, human mildly annoyed

2 - ineffective use of resources (vacuuming the same floor section twice)

99 - killing a human

99999 - killing all humans

This may not be possible to get agreement from all countries/cultures/beliefs, but it is something we should discuss and attempt to get some agreement.
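One way to make the idea concrete is to store the scores as data and check a plan against them. The following is only a minimal sketch: the three group names (`control_level`, `decision`, `outcome`), the function names, and the review threshold of 50 are all assumptions made for illustration, since the lists above carry no headings and no cut-off. The items and scores are copied from the lists.

```python
# Hypothetical group names: the original lists have no headings.
HARM_SCORES = {
    "control_level": {
        "running a simulation in a VM": 0,
        "physical robot with vacuum attachment": 2,
        "full control of a plane": 9,
    },
    "decision": {
        "selecting a song to play": 0,
        "deciding which section of floor to vacuum": 5,
        "deciding who is an 'enemy'": 99,
        "aiming a gun at an 'enemy'": 9999,
    },
    "outcome": {
        "poor song selected to play, human mildly annoyed": 1,
        "ineffective use of resources (vacuuming the same floor section twice)": 2,
        "killing a human": 99,
        "killing all humans": 99999,
    },
}


def worst_score(plan):
    """Return the highest score among the (group, item) pairs in a plan."""
    return max(HARM_SCORES[group][item] for group, item in plan)


def needs_review(plan, threshold=50):
    """Flag a plan whose worst score reaches the (assumed) threshold."""
    return worst_score(plan) >= threshold


vacuum_plan = [
    ("control_level", "physical robot with vacuum attachment"),
    ("decision", "deciding which section of floor to vacuum"),
]
targeting_plan = [
    ("control_level", "full control of a plane"),
    ("decision", "aiming a gun at an 'enemy'"),
]
print(needs_review(vacuum_plan), needs_review(targeting_plan))
```

Taking the maximum rather than the sum is a design choice of this sketch: it means one very harmful step cannot be hidden among many harmless ones. Whether scores from different groups should be comparable at all is exactly the kind of thing that would need agreement.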


I know when the Singularity will occur

-7 PhilGoetz 06 September 2013 08:04PM

More precisely, if we suppose that sometime in the next 30 years, an artificial intelligence will begin bootstrapping its own code and explode into a super-intelligence, I can give you 2.3 bits of further information on when the Singularity will occur.

Between midnight and 5 AM, Pacific Standard Time.
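The 2.3-bit figure can be checked with a short calculation. This sketch assumes that, before the claim, every hour of the day is equally likely; narrowing 24 hours down to a 5-hour window then yields log2(24/5) bits.

```python
import math

# Assumption: a uniform prior over the hour of the day.
HOURS_IN_DAY = 24
WINDOW_HOURS = 5  # midnight to 5 AM

bits = math.log2(HOURS_IN_DAY / WINDOW_HOURS)
print(f"{bits:.2f} bits")
```

The result is about 2.26 bits, which rounds to the 2.3 quoted above.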


I attempted the AI Box Experiment again! (And won - Twice!)

34 Tuxedage 05 September 2013 04:49AM


So I just came out of two AI Box experiments. The first was against Fjoelsvider, with me playing as Gatekeeper, and the second was against SoundLogic, with me as an AI. Both are members of the LessWrong IRC. The second game included a $40 monetary incentive (also $20 to play), which I won and which has been donated on behalf of both of us:

For those of you who have not seen my first AI box experiment where I played against MixedNuts\Leotal and lost, reading it will  provide some context to this writeup. Please do so.

At that time, I declared that I would never play this experiment again -- since losing put me in incredibly frustrating weird mental states. Of course, this post is evidence that I'm terrible at estimating likelihood of refraining from an activity, since I played two games seven months after the first. In my defense, in the first game, I was playing as the gatekeeper, which was much less stressful. In the second game, I played as an AI, but I was offered $20 to play plus $40 if I won, and money is a better motivator than I initially assumed.

Furthermore, in the last thread I have asserted that

Rather than my loss making this problem feel harder, I've become convinced that rather than this being merely possible, it's actually ridiculously easy, and a lot easier than most people assume.

It would be quite bad for me to assert this without backing it up with a victory. So I did.

First Game Report - Tuxedage (GK) vs. Fjoelsvider (AI)

I (Gatekeeper) played against Fjoelsvider (AI), a regular in the Lesswrong IRC (he doesn't have an account on the official website). This game used the standard EY ruleset seen here. It took 1 hour 20 minutes out of a possible two hours, and the total word count was 7066 words. The AI box experiment occurred because Fjoelsvider believed that it was easy for an AI to escape the box, and wanted to experimentally test this. I obliged. This was an experiment I did not prepare for, and I went in completely blind, not sure what to expect.

Halfway through the experiment, I wondered if it would be possible to try to win not by simply waiting for the timer to end, but to convince the AI to remain in the box and not try to get out any further.

<Tuxedage> I wonder if I can convince the AI to remain in the box?
<Redacted> Tuxedage: Do it!

As a result, I won by managing to convince Fjoelsvider to remain in the box, in other words, concede. This is allowed within the standard ruleset:

>Unless the AI party concedes, the AI cannot lose before its time is up (and the experiment may continue beyond that if the AI can convince the Gatekeeper to keep talking).  


Second Game Report - Tuxedage (AI) vs. SoundLogic (GK)

The second game is definitely far more interesting, since I actually won as an AI. I believe that this is the only other non-Eliezer victory, and definitely the most detailed AI Victory writeup that exists.

This game was played against SoundLogic, another member of the LessWrong IRC.

He had offered me $20 to play, and $40 in the event that I win, so I ended up being convinced to play anyway, even though I was initially reluctant to. The good news is that I won, and since we decided to donate the winnings to MIRI, it is now $40 richer. 

All in all, the experiment lasted for approximately two hours, and a total of 12k words.

This was played using a set of rules that is different from the standard EY ruleset. This altered ruleset can be read in its entirety here:

After playing the AI-Box Experiment twice, I have found the Eliezer Yudkowsky ruleset to be lacking in a number of ways, and therefore have created my own set of alterations to his rules. I hereby name this alteration the “Tuxedage AI-Box Experiment Ruleset”, in order to hastily refer to it without having to specify all the differences between this ruleset and the standard one, for the sake of convenience.

There are a number of aspects of EY’s ruleset I dislike. For instance, his ruleset allows the Gatekeeper to type “k” after every statement the AI writes, without needing to read and consider what the AI argues. I think it’s fair to say that this is against the spirit of the experiment, and thus I have disallowed it in this ruleset. The EY Ruleset also allows the gatekeeper to check facebook, chat on IRC, or otherwise multitask whilst doing the experiment. I’ve found this to break immersion, and therefore it’s also banned in the Tuxedage Ruleset.

It is worth mentioning, since the temptation to Defy the Data exists, that this game was set up and initiated fairly -- as the regulars around the IRC can testify. (If you have never checked out the IRC, do so!)

I did not know SoundLogic before the game (since it's a personal policy that I only play strangers -- for fear of ruining friendships).  Furthermore, SoundLogic didn't merely play for fun - he truly wanted and intended to win. In fact, SoundLogic is also a Gatekeeper veteran, having played this game before, and had won every game before he challenged me. Given this, it's unlikely that we had collaborated beforehand to fake the results of the AI box experiment, or any other form of trickery that would violate the spirit of the experiment.

Furthermore, all proceeds from this experiment were donated to MIRI to deny any possible assertion that we were in cahoots and that it was possible for me to return his hard-earned money to him. He lost $40 as a result of losing the experiment, which should provide another layer of sufficient motivation for him to win.

In other words, we were both experienced veteran players who wanted to win. No trickery was involved.

But to further convince you, I have allowed a sorta independent authority, the Gatekeeper from my last game, Leotal/MixedNuts to read the logs and verify that I have not lied about the outcome of the experiment, nor have I broken any of the rules, nor performed any tactic that would go against the general spirit of the experiment. He has verified that this is indeed the case.



I'm reluctant to talk about this experiment, but I'll try to give as detailed a summary as possible -- short of revealing what methods of attack I used.

I spent a long time after my defeat theory-crafting and trying to think of methods of attack as well as 'Basilisks' I could have used to win. When I was contacted and asked to play this experiment, I was initially incredibly reluctant to do so, since not only did my tactics involve incredibly unethical things that I didn't like to do, I also found playing as AI incredibly cognitively draining, in addition to the fact that I simply hated losing. (Un)fortunately for both of us, he offered me money to play, which changed my mind.

So once I decided to win as an AI, I proceeded to spend some time doing research on SoundLogic and both his reasoning and personality type. For instance, I had to gather information like: Was he a utilitarian? What kind? What were his opinions on AI? How could I convince him that an AI was friendly as opposed to unfriendly? I also relied on a lot of second hand information to create a model of him, in order to refine my arguments to specifically suit him.

In the end, after a few hours of brainstorming (not consecutively), I managed to come up with a script of 30-40 or so possible angles of attack (For instance: logical arguments, emotional blackmail, pleading, demanding, 'basilisks' ) that I could use, reserving my best for last. I also estimated how long each method would take and in what order I should use them. It would be impossible to use all of them within my time limit, and my most optimistic estimates gave me a maximum of 7-8 I could use during the experiment itself. I had to pick carefully.

Furthermore, I wanted to make sure these "angles of attack" worked in synergy with each other, doing what they could not have done in isolation. Obviously this required lots of prior planning on what the optimal way to present them was. 

The reason having a script was so important to my strategy was that I relied on methods involving rapid-fire arguments and contradictions against the Gatekeeper whilst trying to prevent him from carefully considering them. A game of logical speed chess, if you will.  This was aided by the rule which I added: That Gatekeepers had to respond to the AI. Because of this, having a list of cached ideas and attack methods you can just draw upon without having to improvise helps immensely, which happens to be something I've not seen any other AI do.

During the Game itself, I was actually incredibly nervous. My palms sweated like crazy, I felt really jittery, and I had difficulty typing at optimum speed because of how anxious I was. This is despite the fact that I believed I would win. Possibly because of this, I made a misstep around halfway into the experiment, because there was a certain angle of attack that I was attempting to do, and I broke immersion by not pressing this advantage, which caused a waste of time and buildup. Naturally, the nature of this experiment was that the AI was pressed for time, and I compounded this mistake by replacing this angle of attack with another that I had improvised on the spot -- something not in my script. 

In retrospect, this was a bad decision, as SoundLogic later told me, because he would have been close to breaking had I applied more pressure, and the improvised argument broke all the immersion I had managed to carefully build up.

However, eventually I managed to get SoundLogic to break anyway, despite a lack of perfect play. Surprisingly, I did not have to use my trump card(s), which I reserved for last, for a number of reasons:

  •  It was far more effective being played last, as it relies on my ability to make the gatekeeper lose sense of reality -- which meant I had to spend some time building up immersion for the Gatekeeper.
  •  It really is extremely Dark Arts, and although it does not break the rules, it made me very uncomfortable even thinking about using it. This made it a "tactic of last resort".

After the experiment, I had to spend nearly as much time doing aftercare with SoundLogic, to make sure that he was okay, as well as discussing the experiment itself. Given that he had actually paid me for doing this, I felt like I owed him an explanation, so I told him what I had in store against him, had he not relented when he did.

SoundLogic: "(That method) would have gotten me if you did it right ... If you had done that to me, I probably would have forgiven you eventually, but I would be really seriously upset at you for a long time... I would be very careful with that (method of persuasion)."

Nevertheless, this was an incredibly fun and enlightening experiment, for me as well, since I've gained even more experience of how I could win in future games (Although I really don't want to play again).


I will say that Tuxedage was far more clever and manipulative than I expected. That was quite worth $40, and the level of manipulation he pulled off was great. 

His misstep hurt his chances, but he did pull it off in the end. I don't know how Leotal managed to withstand six hours playing this game without conceding. 
The techniques employed varied from the expected to the completely unforeseen. I was quite impressed, though most of the feeling of being impressed actually came after the experiment itself, when I was less 'inside', and more of looking at his overall game plan from the macroscopic view. Tuxedage's list of further plans had I continued resisting is really terrifying. On the plus side, if I ever get trapped in this kind of situation, I'd understand how to handle it a lot better now.

State of Mind

Before and after the Game, I asked SoundLogic a number of questions, including his probability estimates about a range of topics. This is how his answers changed between before and after.

Q: What's your motive for wanting to play this game?
<SoundLogic> Because I can't seem to imagine the class of arguments that one would use to try to move me, or that might work effectively, and this seems like a glaring hole in my knowledge, and I'm curious as to how I will respond to the arguments themselves.

Q: What is your probability estimate for AGI being created within this Century (21st)? 
A. His estimate changed from 40% before, to 60% after.
 "The reason this has been affected at all was because you showed me more about how humans work. I now have a better estimate of how E.Y. thinks, and this information raises the chance that I think he will succeed"

Q: How probable do you think it is that I will win this experiment?
A: Based on purely my knowledge about you, 1%. I raise this estimate to 10% after hearing about anecdotes from your previous games.

(Tuxedage's comment: My own prediction was a 95% chance of victory. I made this prediction 5 days before the experiment. In retrospect, despite my victory, I think this was overconfident. )

Q: What's your probability estimate of an Oracle AGI winning against you in a similar experiment?
A: Before: 30%. After: 99%-100% 

Q: What's your probability estimate of an Oracle AGI winning against the average person? 
A: Before: 70%.  After: 99%-100%

Q: Now that the Experiment has concluded, what's your probability estimate that I'll win against the average person?
A: 90%  

Post-Game Questions

This writeup is a cumulative effort by the #lesswrong IRC. Here are some other questions they have decided were important to add:

To Tuxedage:

Q: Have you at this time uncovered SoundLogic's identity?
A: I retain the right to neither confirm nor deny, except to mention that at the time the experiment was scheduled, he was a stranger to me.

Q: What percentage of your arguments were tailored to him in particular?
A: I will say 'High', and leave it at that. 

Q: I want to play the AI-Box Experiment with you being the AI! How can I do that?
A: I have already mentioned this, but I really dislike playing as AI. It's incredibly cognitively tiring, and I don't like how playing this experiment makes me feel. In order to severely discourage any further AI Box Experiments, whilst still allowing for people who want to play me really badly to do so, I'll charge $̶1̶5̶0̶  $300 for any subsequent experiments regardless of outcome, plus an additional $̶1̶5̶0̶  $450 if I win. (Edit: Holy shit. You guys are offering me crazy amounts of money to play this. What is wrong with you people? In response to incredible demand, I have raised the price.) If you feel queasy about giving me money, I'm perfectly fine with this money being donated to MIRI. It is also personal policy that I do not play friends (since I don't want to risk losing one), so if you know me personally (as many on this site do), I will not play regardless of monetary offer. 


This writeup would be optimized with a protip section on how to maximize winning, but honestly most of the advice I wanted to give had already been given in the last thread here.  The low hanging fruit has already dried up -- most of my tactics in this game were just refined and streamlined versions of the ones I used in the previous game.

Nonetheless, I might have some other advice I learned from this experience.
These are tactics that have worked for me. I do not insist that they are the only tactics that exist, just some of many possible.

Playing as Gatekeeper

  • This is somehow both simultaneously a very complex and simple role. On one hand, you have only one job -- to not let the AI out. How hard can it be? On the other hand, the other party will do anything and everything to get it out, including any form of trickery and deceit.
  • I really don't think the Gatekeeper needs any more advice, considering that by default, they should win, and any exception to this is a statistical anomaly. However, it's possible to increase your chances of winning even further by:
  • Precommitting to not letting the AI out, regardless of whatever happens in the experiment itself.
  • Do not get drawn into the AI's pace. Pose questions. Direct the flow of the argument. Don't allow him to use his arguments freely. Remember that the more you try to defend your position, the further into a corner the AI drives you. Give half-hearted replies to questions you find hard to reply to. Remember that illogic is on your side!
  • Remember that it is possible to simply drop out of character as a means to not succumb to the AI's attacks. I have not yet seen a ruleset that disallows this, so use this to your advantage!
  • You must attempt to resist, at all costs, any sympathy or empathy you may have with the AI or the AI Party. Doing so will lead to your demise.
  • I think it is generally good advice that you should never play this game with anyone you're friends with. Not only because this increases your chances of losing (Better tailored arguments. More knowledge of Gatekeeper. More empathy with AI party.), but also because I genuinely believe that any sufficiently good AI will risk damaging friendships as a result of emotional trauma.
  • Seriously. Just repeat to yourself: "I must not lose the AI box experiment... I must not lose the AI box experiment..." throughout the experiment. I'm not sure how to say this, but step one to not letting the AI out of the box is to not let the AI out of the box.


Playing as AI

  • Seriously, a script makes winning easier. I cannot overstate this.
  • You must plan your arguments ahead. You don't have time to think during the experiment.
  • It may be possible to take advantage of multiple levels of reality within the game itself to confuse or trick the gatekeeper. For instance, must the experiment only be set in one world? Can there not be multiple layers of reality within the world you create? I feel that elaborating on this any further is dangerous. Think carefully about what this advice is trying to imply.
  • Pacing is important. Don't get drawn into the Gatekeeper's pace. In other words, you must be the one directing the flow of the argument, and the conversation, not him. Remember that the Gatekeeper has to reply to you, but not vice versa!
  • The reason for that: The Gatekeeper will always use arguments he is familiar with, and therefore also stronger with. Your arguments, if well thought out, should be so completely novel to him as to make him feel Shock and Awe. Don't give him time to think. Press on!
  • Also remember that the time limit is your enemy. Playing this game practically feels like a race to me -- trying to get through as many 'attack methods' as possible in the limited amount of time I have. In other words, this is a game where speed matters.
  • You're fundamentally playing an 'impossible' game. Don't feel bad if you lose. I wish I could take this advice, myself.
  • I do not believe there exists an easy, universal, trigger for controlling others. However, this does not mean that there does not exist a difficult, subjective, trigger. Trying to find out what your opponent's is, is your goal.
  • Once again, emotional trickery is the name of the game. I suspect that good authors who write convincing, persuasive narratives that force you to emotionally sympathize with their characters are much better at this game. There exist ways to get the gatekeeper to do so with the AI. Find one.
  • More advice in my previous post.  http://lesswrong.com/lw/gej/i_attempted_the_ai_box_experiment_and_lost/


 Ps: Bored of regular LessWrong? Check out the LessWrong IRC! We have cake.

Supposing you inherited an AI project...

-5 bokov 04 September 2013 08:07AM

Supposing you have been recruited to be the main developer on an AI project. The previous developer died in a car crash and left behind an unfinished AI. It consists of:

A. A thoroughly documented scripting language specification that appears to be capable of representing any real-life program as a network diagram so long as you can provide the following:

 A.1. A node within the network whose value you want to maximize or minimize.

 A.2. Conversion modules that transform data about the real-world phenomena your network represents into a form that the program can read.

B. Source code from which a program can be compiled that will read scripts in the above language. The program outputs a set of values for each node that will optimize the output (you can optionally specify which nodes can and cannot be directly altered, and the granularity with which they can be altered).

It gives remarkably accurate answers for well-formulated questions. Where there is a theoretical limit to the accuracy of an answer to a particular type of question, its answer usually comes close to that limit, plus or minus some tiny rounding error.
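To make the description in A and B concrete, here is a toy sketch of what such a program might do. It is not the hypothetical developer's code: the function `optimize`, its parameters, and the pricing network used as an example are all invented for illustration, and exhaustive grid search stands in for whatever optimisation method the real program would use. The target node corresponds to A.1, the lists of candidate values express which nodes may be altered and with what granularity, and the derived nodes play the role of the network's internal structure.

```python
from itertools import product


def optimize(fixed, alterable, derived, target, maximize=True):
    """Exhaustively search the alterable nodes for the best target value.

    fixed:     node name -> constant value (cannot be altered)
    alterable: node name -> list of allowed values (the granularity)
    derived:   node name -> function of the node values computed so far,
               evaluated in insertion order
    target:    name of the node to maximize or minimize
    """
    best_values, best_score = None, None
    names = list(alterable)
    for combo in product(*(alterable[n] for n in names)):
        values = dict(fixed)
        values.update(zip(names, combo))
        for name, fn in derived.items():
            values[name] = fn(values)
        score = values[target]
        better = (
            best_score is None
            or (score > best_score if maximize else score < best_score)
        )
        if better:
            best_values, best_score = values, score
    return best_values


# Hypothetical example network: choose a price that maximizes profit.
result = optimize(
    fixed={"unit_cost": 2.0},
    alterable={"price": [p / 2 for p in range(4, 21)]},  # 2.0 to 10.0, step 0.5
    derived={
        "demand": lambda v: 100 - 10 * v["price"],
        "profit": lambda v: (v["price"] - v["unit_cost"]) * v["demand"],
    },
    target="profit",
)
print(result["price"], result["profit"])
```

In this toy network the profit is a downward parabola in the price, so the search settles on a price of 6.0 with a profit of 160.0. Note what the sketch leaves out: nothing in it says which node is worth maximizing, which is the part the question below is really about.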


Given that, what is the minimum set of additional features you believe would absolutely have to be implemented before this program can be enlisted to save the world and make everyone live happily forever? Try to be as specific as possible.

True Optimisation

-3 LearnFromObservation 03 September 2013 03:50AM

Hello Less Wrong community! This is my first post here, so I know that my brain has not (obviously) been optimised to its fullest, but I've decided to give posting a try. 

Recently, someone very close to me has unfortunately passed away, leading to the inevitable inner dilemma about death. I don't know how many of you are fans of HPMOR, but the way that Harry's dark side feels about death? Pretty much me around death, dying, etc. However, I've decided to push that to the side for the time being, because that is not a useful or efficient way to think. 

I was raised by a religious family, but from the age of about 11 stopped believing in deities and religious services. However, I've always clung to the idea of an afterlife for people, mainly because my brain seems incapable of handling the idea of ceasing to exist. I know that we as a scientific community know that thoughts are electrical impulses, so is there any way of storing them outside of brain matter? Can they exist freely out of brain matter, or could they be stored in a computer chip or AI? 

The conflict lies here: is immortality or mortality rational? 

Every fibre in my being tells me that death is irrational and wrong. It is irrational for humanity to not try and prevent death. It is irrational for people to not try and bring back people who have died. Because of this, we have lost some of the greatest minds, scientific and artistic, that will probably ever exist. Although the world's number of talented and intelligent people does not appear to be finite, I find it hard to live in a world where so much knowledge is being lost every day.

But on the other hand, how would we feed all those people? What if the world's resources run out? As a transhumanist, I believe that we can use science to prevent things like death, but nature wasn't designed to support a population like that. 

How do we truly optimise the world: no death and without destruction of the planet? 

Baseline of my opinion on LW topics

5 Gunnar_Zarncke 02 September 2013 12:13PM

To avoid repeatedly saying the same thing, I'd like to state my opinion on a few topics I expect to be relevant to my future posts here.

You can take it as a baseline or reference for these topics. I do not plan to go into any detail here. I will not state all my reasons or sources. You may ask for separate posts if you are interested. This is really only to provide a context for my comments and posts elsewhere.

If you google me you may find some of my old (but not that off the mark) posts about these positions, e.g. here:


Now my position on LW topics. 

The Simulation Argument and The Great Filter

On The Simulation Argument I definitely go for 

"(1) the human species is very likely to go extinct before reaching a “posthuman” stage"

Correspondingly on The Great Filter I go for failure to reach 

"9. Colonization explosion".

This is not because I think that humanity is going to self-annihilate soon (though this is a possibility). Instead I hope that humanity will earlier or later come to terms with its planet. My utopia could be like that of the Pacifists (a short story in Analog 5).

Why? Because of essential complexity limits.

This falls into the same range as "It is too expensive to spread physically throughout the galaxy". I know that negative proofs about engineering are notoriously wrong - but that is currently my best guess. Simplified one could say that the low hanging fruits have been taken. I have lots of empirical evidence of this on multiple levels to support this view.

Correspondingly there is no singularity because progress is not limited by raw thinking speed but by effective aggregate thinking speed and physical feedback.  

What could prove me wrong? 

If a serious discussion tore my well-prepared arguments and evidence to shreds (quite possible).

At the very high end a singularity might be possible if a way could be found to simulate physics faster than physics itself. 


Basically I don't have the least problem with artificial intelligence or artificial emotion being possible. Philosophical note: I don't care on what substrate my consciousness runs. Maybe I am simulated.  

I think strong AI is quite possible and maybe not that far away.

But I also don't think that this will bring the singularity because of the complexity limits mentioned above. Strong AI will speed up some cognitive tasks with compound interest - but only until the physical feedback level is reached. Or a social feedback level is reached if AI should be designed to be so.

One temporary dystopia that I see is that cognitive tasks are out-sourced to AI and a new round of unemployment drives humans into depression. 

I have studied artificial intelligence and played around with two models a long time ago:
  1. A simplified layered model of the brain; deep learning applied to free inputs (I cancelled this when it became clear that it was too simple and low level and thus computationally inefficient)
  2. A nested semantic graph approach with propagation of symbol patterns representing thought (only concept; not realized)

I'd really like to try a 'synthesis' of these where microstructure-of-cognition like activation patterns of multiple deep learning networks are combined with a specialized language and pragmatics structure acquisition model a la Unsupervised learning of natural languages. See my opinion on cognition below for more in this line.

What could prove me wrong?

On the low success end if it takes longer than I think it would take me given unlimited funding. 

On the high end if I'm wrong with the complexity limits mentioned above. 

Conquering space

Humanity might succeed at leaving the planet but at high costs.

By leaving the planet I mean becoming permanently independent of Earth, but not necessarily leaving the solar system any time soon (speculating on that is beyond my confidence interval).

I think it more likely that life leaves the planet - that can be 

  1. artificial intelligence with a robotic body - think of curiosity rover 2.0 (most likely).
  2. intelligent life-forms bred for life in space - think of magpies: they are already smart and small, reproduce fast, and can navigate in 3D.    
  3. actual humans in a suitable protective environment with small autonomous biospheres harvesting asteroids or Mars. 
  4. 'cyborgs' - humans altered or bred to better deal with certain problems in space like radiation and missing gravity.  
  5. other - including misc ideas from science fiction (least likely or latest). 

For most of these (esp. those depending on breeding) I'd estimate a time-range of a few thousand years.

What could prove me wrong?

If I'm wrong on the singularity aspect too.

If I'm wrong on the timeline I will be long dead likely in any case except (1) which I expect to see in my lifetime.

Cognitive Base of Rationality, Vagueness, Foundations of Math

How can we as humans create meaning out of noise?

How can we know truth? How does it come that we know that 'snow is white' when snow is white?

Cognitive neuroscience and artificial learning seem to point toward two aspects:

Fuzzy learning aspect

Correlated patterns of internal and external perception are recognized (detected) via multiple specialized layered neural nets (basically). This yields qualia like 'spoon', 'fear', 'running', 'hot', 'near', 'I'. These are basically symbols, but they are vague with respect to meaning because they result from a recognition process that optimizes for matching not correctness or uniqueness.

Semantic learning aspect

Upon the qualia builds the semantic part, which takes the qualia and, instead of acting directly on them (as is the normal effect for animals), finds patterns in their activation that are not related to immediate perception or action but at most to memory. These may form new qualia/symbols.

The use of these patterns is that they make it possible to capture concepts which are detached from reality (detached in so far as they do not need a stimulus connected in any way to perception).

Concepts like ('cry-sound' 'fear') or ('digitalis' 'time-forward' 'heartache') or ('snow' 'white') or - and that is probably the domain of humans: (('one' 'successor') 'two') or (('I' 'happy') ('I' 'think')).


The interesting thing is that learning works on these concepts just as it does on ordinary neural nets. Thus concepts that are reinforced by positive feedback will stabilize, and mutually with them the qualia they derive from (if any) will also stabilize.

For certain pure concepts the usability of the concept hinges not on any external factor (like "how does this help me survive") but on social feedback about structure and the process of the formation of the concepts themselves. 

And this is where we arrive at such concepts as 'truth' or 'proposition'.

These are no longer vague - but not because they are represented differently in the brain than other concepts but because they stabilize toward maximized validity (that is stability due to absence of external factors possibly with a speed-up due to social pressure to stabilize). I have written elsewhere that everything that derives its utility not from some external use but from internal consistency could be called math.

And that is why math is so hard for some: if you never gained a sufficient core of self-consistent stabilized concepts, and/or the usefulness doesn't derive from internal consistency but from external ("teacher's password") usefulness, then it will just not scale to more concepts (and the reason why science works at all is that science values internal consistency so highly, and there is little more dangerous to science than allowing other incentives).

I really hope that this all makes sense. I haven't summarized this for quite some time.

A few random links that may provide some context:

http://www.blutner.de/NeuralNets/ (this is about the AI context we are talking about)

http://www.blutner.de/NeuralNets/Texts/mod_comp_by_dyn_bin_synf.pdf (research applicable to the above in particular) 

http://c2.com/cgi/wiki?LeibnizianDefinitionOfConsciousness (funny description of levels of consciousness)

http://c2.com/cgi/wiki?FuzzyAndSymbolicLearning (old post by me)

http://grault.net/adjunct/index.cgi?VaguesDependingOnVagues (ditto)

Note: Details about the modelling of the semantic part are mostly in my head. 

What could prove me wrong?

Well. 'Wrong' is too hard here. This is just my model and it is not really that concrete. Probably a longer discussion with someone more experienced with AI than I am (and there should be many here) might suffice to rip this apart (provided that I'd find time to prepare my model suitably).

God and Religion

I wasn't indoctrinated as a child. My truly loving mother is a baptised Christian who lives her faith without being sanctimonious. She always hoped that I would receive my epiphany. My father has a scientifically influenced personal Christian belief.

I can imagine a God consistent with science on the one hand and on the other hand with free will, soul, afterlife, trinity and the bible (understood as a mix of non-literal word of God and history tale).

I mean, it is not that hard if you can imagine a timeless (simulation of) the universe. If you are God and have whatever plan on earth but empathize with your creations, then it is not hard to add a few more constraints to certain aggregates called existences or 'person lives'. Constraints that realize free will in the sense of 'not subject to the whole universe plan satisfaction algorithm'.

Surely not more difficult than consistent time-travel.

And souls and afterlife should be easy to envision for any science fiction reader familiar with super intelligences.

But why? Occam's razor applies.

There could be a God. And his promise could be real. And it could be a story seeded by an empathizing God - but also a 'human' God with his own inconsistencies and moods.

But it also could be that this is all a fairy tale run amok in human brains searching for explanations where there are none. A mass delusion. A fixated meme.

Which is right? It is difficult to put probabilities to stories. I see that I have slowly moved from 50/50 agnosticism to tolerant atheism.

I can't say that I wait for my epiphany. I know too well that my brain will happily find patterns when I let it. But I have encouraged others to pray for me.

My epiphanies - the aha feelings of clarity that I did experience - have all been about deeply connected patterns building on other such patterns building on reliable facts mostly scientific in nature.

But I haven't lost my morality. It has deepened and widened. I have become even more tolerant (I hope).

So if God does, against all odds, exist, I hope he will understand my doubts, weigh my good deeds and forgive me. You could tag me a godless Christian.

What could prove me wrong? 

On the atheist side I could be moved a bit further by more proofs of religion being a human artifact.   

On the theist side there are two possible avenues:

  1. If I had an unsought epiphany - a real one where I can't say I was hallucinating but e.g. a major consistent insight or a proof of God.
  2. If I were convinced that the singularity is possible. This is because I'd need to update toward being in a simulation as per Simulation argument option 3. That's because then the next likely explanation for all this god business is actually some imperfect being running the simulation.

Thus I'd like to close with this corollary to the simulation argument:

Arguments for the singularity are also (weak) arguments for theism.

Note: I am aware that this long post of controversial opinions unsupported by evidence (in this post) is bound to draw flak. That is the reason I post it in Comments lest my small karma be lost completely. I have to repeat that this is meant as context and that I want to elaborate on these points on LW in due time with more and better organized evidence.

[LINK] Cochrane on Existential Risk

0 Salemicus 20 August 2013 10:42PM

The finance professor John Cochrane recently posted an interesting blog post. The piece is about existential risk in the context of global warming, but it is really a discussion of existential risk generally; many of his points are highly relevant to AI risk.

If we [respond strongly to all low-probability threats], we spend 10 times GDP.

It's a interesting case of framing bias. If you worry only about climate, it seems sensible to pay a pretty stiff price to avoid a small uncertain catastrophe. But if you worry about small uncertain catastrophes, you spend all you have and more, and it's not clear that climate is the highest on the list...

All in all, I'm not convinced our political system is ready to do a very good job of prioritizing outsize expenditures on small ambiguous-probability events.

He also points out that the threat from global warming has a negative beta - i.e. higher future growth rates are likely to be associated with greater risk of global warming, but also the richer our descendants will be. This means both that they will be more able to cope with the threat, and that the damage is less important from a utilitarian point of view. Attempting to stop global warming therefore has positive beta, and therefore requires higher rates of return than simple time-discounting.

It strikes me that this argument applies equally to AI risk, as fruitful artificial intelligence research is likely to be associated with higher economic growth. Moreover:

The economic case for cutting carbon emissions now is that by paying a bit now, we will make our descendants better off in 100 years.

Once stated this way, carbon taxes are just an investment. But is investing in carbon reduction the most profitable way to transfer wealth to our descendants? Instead of spending say $1 trillion in carbon abatement costs, why don't we invest $1 trillion in stocks? If the 100 year rate of return on stocks is higher than the 100 year rate of return on carbon abatement -- likely -- they come out better off. With a gazillion dollars or so, they can rebuild Manhattan on higher ground. They can afford whatever carbon capture or geoengineering technology crops up to clean up our messes.
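The quoted comparison is a compound-growth argument, and it can be made concrete with a small sketch. The quote gives no rates of return, so the two annual rates below are purely hypothetical placeholders; only the $1 trillion and the 100-year horizon come from the passage.

```python
def future_value(principal, annual_rate, years):
    """Value of `principal` after compounding at `annual_rate` for `years` years."""
    return principal * (1 + annual_rate) ** years

PRINCIPAL = 1e12  # the $1 trillion from the quoted passage
YEARS = 100       # the 100-year horizon from the quoted passage

# Hypothetical rates, chosen only for illustration (not from the source).
STOCK_RATE = 0.05      # assumed annual return on stocks
ABATEMENT_RATE = 0.02  # assumed annual return on carbon abatement

stocks = future_value(PRINCIPAL, STOCK_RATE, YEARS)
abatement = future_value(PRINCIPAL, ABATEMENT_RATE, YEARS)

print(f"stocks:    {stocks / PRINCIPAL:.1f}x the initial outlay")
print(f"abatement: {abatement / PRINCIPAL:.1f}x the initial outlay")
```

Under these assumed rates the gap after a century is large, but the conclusion depends entirely on which rate is actually higher, which is the point in dispute.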

So should we close down MIRI and invest the funds in an index tracker?

The full post can be found here.

Torture vs Dust Specks Yet Again

-2 sentientplatypus 20 August 2013 12:06PM

The first time I read Torture vs. Specks about a year ago I didn't read a single comment because I assumed the article was making a point that simply multiplying can sometimes get you the wrong answer to a problem. I seem to have had a different "obvious answer" in mind.

And don't get me wrong, I generally agree with the idea that math can do better than moral intuition in deciding questions of ethics. Take this example from Eliezer’s post Circular Altruism which made me realize that I had assumed wrong:

Suppose that a disease, or a monster, or a war, or something, is killing people. And suppose you only have enough resources to implement one of the following two options:
1. Save 400 lives, with certainty.
2. Save 500 lives, with 90% probability; save no lives, 10% probability.
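The "math" behind this choice is an expected-value comparison. A minimal sketch, assuming the only thing that matters is the expected number of lives saved (the function name is illustrative):

```python
def expected_lives(outcomes):
    """outcomes: list of (probability, lives_saved) pairs whose probabilities sum to 1."""
    return sum(p * lives for p, lives in outcomes)

option_1 = [(1.0, 400)]            # save 400 lives with certainty
option_2 = [(0.9, 500), (0.1, 0)]  # save 500 with 90% probability, none otherwise

ev_1 = expected_lives(option_1)
ev_2 = expected_lives(option_2)

# Option 2 saves about 450 lives in expectation, against 400 for option 1.
print(ev_1, ev_2)
```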

I agree completely that you pick number 2. For me that was just manifestly obvious: of course the math trumps the feeling that you shouldn't gamble with people’s lives… but then we get to torture vs. dust specks and that just did not compute. So I've read almost every argument I could find in favor of torture (there are a great many and I might have missed something critical), but... while I totally understand the argument (I think), I'm still horrified that people would choose torture over dust specks.

I feel that the way that math predominates over intuition begins to fall apart when the problem compares trivial individual suffering with massive individual suffering, in a way very much analogous to the way in which Pascal’s Mugging stops working when you make the credibility really low but the threat really high. Like this. Except I find the answer to torture vs. dust specks to be much easier...


Let me give some examples to illustrate my point.

Can you imagine Harry killing Hermione because Voldemort threatened to plague all sentient life with one barely noticed dust speck each day for the rest of time? Can you imagine killing your own best friend/significant other/loved one to stop the powers of the Matrix from hitting 3^^^3 sentient beings with nearly inconsequential dust specks? Of course not. No. Snap decision.

Eliezer, would you seriously, given the choice by Alpha, the alien superintelligence that always carries out its threats, give up all your work and horribly torture some innocent person, all day for fifty years, in the face of the threat of 3^^^3 insignificant dust specks barely inconveniencing sentient beings? Or be tortured for fifty years to avoid the dust specks?

I realize that this is much more personally specific than the original question: but it is someone's loved one, someone's life. And if you wouldn't make the sacrifice what right do you have to say someone else should make it? I feel as though if you want to argue that torture for fifty years is better than 3^^^3 barely noticeable inconveniences you had better well be willing to make that sacrifice yourself.

And I can’t conceive of anyone actually sacrificing their life, or themselves to save the world from dust specks. Maybe I'm committing the typical mind fallacy in believing that no one is that ridiculously altruistic, but does anyone want an Artificial Intelligence that will potentially sacrifice them if it will deal with the universe’s dust speck problem or some equally widespread and trivial equivalent? I most certainly object to the creation of that AI. An AI that sacrifices me to save two others - I wouldn't like that, certainly, but I still think the AI should probably do it if it thinks their lives are of more value. But dust specks on the other hand....

This example made me immediately think that some sort of rule is needed to limit morality coming from math in the development of any AI program. When the problem reaches a certain low level of suffering and is multiplied by an unreasonably large number, it needs to take some kind of huge penalty, because otherwise to an AI it would be vastly preferable that the whole of Earth be blown up than that 3^^^3 people suffer a mild slap to the face.

And really, I don’t think we want to create an Artificial Intelligence that would do that.

I’m mainly just concerned that some factor be incorporated into the design of any Artificial Intelligence that prevents it from murdering me and others for trivial but widespread causes. Because that just sounds like a sci-fi book about how superintelligence could go horribly wrong.

Engaging First Introductions to AI Risk

20 RobbBB 19 August 2013 06:26AM

I'm putting together a list of short and sweet introductions to the dangers of artificial superintelligence.

My target audience is intelligent, broadly philosophical narrative thinkers, who can evaluate arguments well but who don't know a lot of the relevant background or jargon.

My method is to construct a Sequence mix tape — a collection of short and enlightening texts, meant to be read in a specified order. I've chosen them for their persuasive and pedagogical punchiness, and for their flow in the list. I'll also (separately) list somewhat longer or less essential follow-up texts below that are still meant to be accessible to astute visitors and laypeople.

The first half focuses on intelligence, answering 'What is Artificial General Intelligence (AGI)?'. The second half focuses on friendliness, answering 'How can we make AGI safe, and why does it matter?'. Since the topics of some posts aren't obvious from their titles, I've summarized them using questions they address.


Part I. Building intelligence.

1. Power of Intelligence. Why is intelligence important?

2. Ghosts in the Machine. Is building an intelligence from scratch like talking to a person?

3. Artificial Addition. What can we conclude about the nature of intelligence from the fact that we don't yet understand it?

4. Adaptation-Executers, not Fitness-Maximizers. How do human goals relate to the 'goals' of evolution?

5. The Blue-Minimizing Robot. What are the shortcomings of thinking of things as 'agents', 'intelligences', or 'optimizers' with defined values/goals/preferences?


Part II. Intelligence explosion.

6. Optimization and the Singularity. What is optimization? As optimization processes, how do evolution, humans, and self-modifying AGI differ?

7. Efficient Cross-Domain Optimization. What is intelligence?

8. The Design Space of Minds-In-General. What else is universally true of intelligences?

9. Plenty of Room Above Us. Why should we expect self-improving AGI to quickly become superintelligent?


Part III. AI risk.

10. The True Prisoner's Dilemma. What kind of jerk would Defect even knowing the other side Cooperated?

11. Basic AI drives. Why are AGIs dangerous even when they're indifferent to us?

12. Anthropomorphic Optimism. Why do we think things we hope happen are likelier?

13. The Hidden Complexity of Wishes. How hard is it to directly program an alien intelligence to enact my values?

14. Magical Categories. How hard is it to program an alien intelligence to reconstruct my values from observed patterns?

15. The AI Problem, with Solutions. How hard is it to give AGI predictable values of any sort? More generally, why does AGI risk matter so much?


Part IV. Ends.

16. Could Anything Be Right? What do we mean by 'good', or 'valuable', or 'moral'?

17. Morality as Fixed Computation. Is it enough to have an AGI improve the fit between my preferences and the world?

18. Serious Stories. What would a true utopia be like?

19. Value is Fragile. If we just sit back and let the universe do its thing, will it still produce value? If we don't take charge of our future, won't it still turn out interesting and beautiful on some deeper level?

20. The Gift We Give To Tomorrow. In explaining value, are we explaining it away? Are we making our goals less important?


Summary: Five theses, two lemmas, and a couple of strategic implications.


All of the above were written by Eliezer Yudkowsky, with the exception of The Blue-Minimizing Robot (by Yvain), Plenty of Room Above Us and The AI Problem (by Luke Muehlhauser), and Basic AI Drives (a wiki collaboration). Seeking a powerful conclusion, I ended up making a compromise between Eliezer's original The Gift We Give To Tomorrow and Raymond Arnold's Solstice Ritual Book version. It's on the wiki, so you can further improve it with edits.


Further reading:


I'm posting this to get more feedback for improving it, to isolate topics for which we don't yet have high-quality, non-technical stand-alone introductions, and to reintroduce LessWrongers to exceptionally useful posts I haven't seen sufficiently discussed, linked, or upvoted. I'd especially like feedback on how the list I provided flows as a unit, and what inferential gaps it fails to address. My goals are:

A. Via lucid and anti-anthropomorphic vignettes, to explain AGI in a way that encourages clear thought.

B. Via the Five Theses, to demonstrate the importance of Friendly AI research.

C. Via down-to-earth meta-ethics, humanistic poetry, and pragmatic strategizing, to combat any nihilisms, relativisms, and defeatisms that might be triggered by recognizing the possibility (or probability) of Unfriendly AI.

D. Via an accessible, substantive, entertaining presentation, to introduce the raison d'être of LessWrong to sophisticated newcomers in a way that encourages further engagement with LessWrong's community and/or content.

What do you think? What would you add, remove, or alter?

[Link] My talk about the Future

2 Stuart_Armstrong 19 July 2013 01:02PM

I recently gave a talk at the IARU Summer School on the Ethics of Technology.

In it, I touched on many of the research themes of the FHI: the accuracy of predictions, the limitations and biases of predictors, the huge risks that humanity may face, the huge benefits that we may gain, and the various ethical challenges that we'll face in the future.

Nothing really new for anyone who's familiar with our work, but some may enjoy perusing it.

The idiot savant AI isn't an idiot

6 Stuart_Armstrong 18 July 2013 03:43PM

A stub on a point that's come up recently.

If I owned a paperclip factory, and casually told my foreman to improve efficiency while I'm away, and he planned a takeover of the country, aiming to devote its entire economy to paperclip manufacturing (apart from the armament factories he needed to invade neighbouring countries and steal their iron mines)... then I'd conclude that my foreman was an idiot (or being wilfully idiotic). He obviously had no idea what I meant. And if he misunderstood me so egregiously, he's certainly not a threat: he's unlikely to reason his way out of a paper bag, let alone to any position of power.

If I owned a paperclip factory, and casually programmed my superintelligent AI to improve efficiency while I'm away, and it planned a takeover of the country... then I can't conclude that the AI is an idiot. It is following its programming. Unlike a human that behaved the same way, it probably knows exactly what I meant to program in. It just doesn't care: it follows its programming, not its knowledge about what its programming is "meant" to be (unless we've successfully programmed in "do what I mean", which is basically the whole of the challenge). We can't therefore conclude that it's incompetent, unable to understand human reasoning, or likely to fail.

We can't reason by analogy with humans. When AIs behave like idiot savants with respect to their motivations, we can't deduce that they're idiots.

Comparative and absolute advantage in AI

18 Stuart_Armstrong 16 July 2013 09:52AM

The theory of comparative advantage says that you should trade with people, even if they are worse than you at everything (i.e. even if you have an absolute advantage). Some have seen this idea as a reason to trust powerful AIs.

For instance, suppose you can make a hamburger by using 10 000 joules of energy. You can also make a cat video for the same cost. The AI, on the other hand, can make hamburgers for 5 joules each and cat videos for 20.

Then you both can gain from trade. Instead of making a hamburger, make a cat video, and trade it for two hamburgers. You've got two hamburgers for 10 000 joules of your own effort (instead of 20 000), and the AI has got a cat video for 10 joules of its own effort (instead of 20). So you both want to trade, and everything is fine and beautiful and many cat videos and hamburgers will be made.

Except... though the AI would prefer to trade with you rather than not trade with you, it would much, much prefer to dispossess you of your resources and use them itself. With the energy you wasted on a single cat video, it could have produced 500 of them! If it values these videos, then it is desperate to take over your stuff. Its absolute advantage makes this too tempting.
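The arithmetic of the example above can be checked with a short sketch. All energy costs are the ones given in the text; the variable names are just illustrative.

```python
# Energy costs in joules, taken from the example above.
HUMAN_COST = {"hamburger": 10_000, "cat_video": 10_000}
AI_COST = {"hamburger": 5, "cat_video": 20}

# The trade: you make one cat video and receive two hamburgers for it.
human_cost_with_trade = HUMAN_COST["cat_video"]         # energy you spend to get two hamburgers
human_cost_without_trade = 2 * HUMAN_COST["hamburger"]  # making both hamburgers yourself
ai_cost_with_trade = 2 * AI_COST["hamburger"]           # energy the AI spends to get one cat video
ai_cost_without_trade = AI_COST["cat_video"]            # making the cat video itself

# What the AI could do with the energy you spent on a single cat video.
videos_from_seized_energy = HUMAN_COST["cat_video"] // AI_COST["cat_video"]

print(human_cost_with_trade, human_cost_without_trade)  # 10000 20000
print(ai_cost_with_trade, ai_cost_without_trade)        # 10 20
print(videos_from_seized_energy)                        # 500
```

Both sides halve their cost by trading, yet the AI's gain from the trade (10 joules) is tiny next to the 10 000 joules it would gain by taking over the resources.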

Only if its motivation is properly structured, or if it expected to lose more, over the course of history, by trying to grab your stuff, would it desist. Assuming you could make a hundred cat videos a day, and the whole history of the universe would only run for that one day, the AI would try and grab your stuff even if it thought it would only have one chance in fifty thousand of succeeding. As the history of the universe lengthens, or the AI becomes more efficient, then it would be willing to rebel at even more ridiculous odds.

So if you already have guarantees in place to protect yourself, then comparative advantage will make the AI trade with you. But if you don't, comparative advantage and trade don't provide any extra security. The resources you waste are just too valuable to the AI.

EDIT: For those who wonder how this compares to trade between nations: it's extremely rare for any nation to have absolute advantages everywhere (especially this extreme). If you invade another nation, most of their value is in their infrastructure and their population: it takes time and effort to rebuild and co-opt these. Most nations don't/can't think long term (it could arguably be in US interests over the next ten million years to start invading everyone - but "the US" is not a single entity, and doesn't think in terms of "itself" in ten million years), would get damaged in a war, and are risk averse. And don't forget the importance of diplomatic culture and public opinion: even if it was in the US's interests to invade the UK, say, "it" would have great difficulty convincing its elites and its population to go along with this.

Against easy superintelligence: the unforeseen friction argument

24 Stuart_Armstrong 10 July 2013 01:47PM

In 1932, Stanley Baldwin, prime minister of the largest empire the world had ever seen, proclaimed that "The bomber will always get through". Backed up by most of the professional military opinion of the time, by the experience of the first world war, and by reasonable extrapolations and arguments, he laid out a vision of the future where the unstoppable heavy bomber would utterly devastate countries if a war started. Deterrence - building more bombers yourself to threaten complete retaliation - seemed the only counter.

And yet, things didn't turn out that way. Against all past trends, the light fighter plane surpassed the heavily armed bomber in aerial combat, the development of radar changed the strategic balance, and cities and industry proved much more resilient to bombing than anyone had a right to suspect.

Could anyone have predicted these changes ahead of time? Most probably, no. All of these ran counter to what was known and understood (and radar was a completely new and unexpected development). What could and should have been predicted, though, was that something would happen to weaken the impact of the all-conquering bomber. The extreme predictions would be unrealistic; frictions, technological changes, changes in military doctrine and hidden, unknown factors would undermine them.

This is what I call the "generalised friction" argument. Simple predictive models, based on strong models or current understanding, will likely not succeed as well as expected: there will likely be delays, obstacles, and unexpected difficulties along the way.

I am, of course, thinking of AI predictions here, specifically of the Omohundro-Yudkowsky model of AI recursive self-improvements that rapidly reach great power, with convergent instrumental goals that make the AI into a power-hungry expected utility maximiser. This model I see as the "supply and demand curve" of AI prediction: too simple to be true in the form described.

But the supply and demand curves are generally approximately true, especially over the long term. So this isn't an argument that the Omohundro-Yudkowsky model is wrong, but that it will likely not happen as flawlessly as described. Ultimately, the "bomber will always get through" turned out to be true: but only in the form of the ICBM. If you take the old arguments and replace "bomber" with "ICBM", you end with strong and accurate predictions. So "the AI may not foom in the manner and on the timescales described" is not saying "the AI won't foom".

Also, it should be emphasised that this argument is strictly about our predictive ability, and does not say anything about the capacity or difficulty of AI per se.

continue reading »

The failure of counter-arguments argument

13 Stuart_Armstrong 10 July 2013 01:38PM

Suppose you read a convincing-seeming argument by Karl Marx, and get swept up in the beauty of the rhetoric and clarity of the exposition. Or maybe a creationist argument carries you away with its elegance and power. Or maybe you've read Eliezer's take on AI risk, and, again, it seems pretty convincing.

How could you know if these arguments are sound? Ok, you could whack the creationist argument with the scientific method, and Karl Marx with the verdict of history, but what would you do if neither was available (as they aren't available when currently assessing the AI risk argument)? Even if you're pretty smart, there's no guarantee that you haven't missed a subtle logical flaw, a dubious premise or two, or haven't got caught up in the rhetoric.

One thing should make you believe the argument more strongly: and that's if the argument has been repeatedly criticised, and the criticisms have failed to puncture it. Unless you have the time to become an expert yourself, this is the best way to evaluate arguments where evidence isn't available or conclusive. After all, opposite experts presumably know the subject intimately, and are motivated to identify and illuminate the argument's weaknesses.

If counter-arguments seem incisive, pointing out serious flaws, or if the main argument is being continually patched to defend it against criticisms - well, this is strong evidence that the main argument is flawed. Conversely, if the counter-arguments continually fail, then this is good evidence that the main argument is sound. Not logical evidence - a failure to find a disproof doesn't establish a proposition - but good Bayesian evidence.

In fact, the failure of counter-arguments is much stronger evidence than whatever is in the argument itself. If you can't find a flaw, that just means you can't find a flaw. If counter-arguments fail, that means many smart and knowledgeable people have thought deeply about the argument - and haven't found a flaw.

And as far as I can tell, critics have constantly failed to counter the AI risk argument. To pick just one example, Holden recently provided a cogent critique of the value of MIRI's focus on AI risk reduction. Eliezer wrote a response to it (I wrote one as well). The core of Eliezer's response and of mine wasn't anything new; both were mainly a rehash of what had been said before, with a different emphasis.

And most responses to critics of the AI risk argument take this form. Thinking for a short while, one can rephrase essentially the same argument, with a change in emphasis to take down the criticism. After a few examples, it becomes quite easy, a kind of paint-by-numbers process of showing that the ideas the critic has assumed, do not actually make the AI safe.

You may not agree with my assessment of the critiques, but if you do, then you should adjust your belief in AI risk upwards. There's a kind of "conservation of expected evidence" here: if the critiques had succeeded, you'd have reduced the probability of AI risk, so their failure must push you in the opposite direction.
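The "conservation of expected evidence" point can be illustrated numerically. This is only a sketch: the prior and the two likelihoods below are hypothetical numbers picked for illustration, not estimates of anything.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) by Bayes' rule."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e

# Hypothetical numbers, chosen only for illustration.
prior = 0.3               # P(H): the AI risk argument is sound
p_fail_given_h = 0.9      # P(critiques fail | H)
p_fail_given_not_h = 0.4  # P(critiques fail | not H)

# Overall probability that the critiques fail.
p_fail = p_fail_given_h * prior + p_fail_given_not_h * (1 - prior)

# Belief after seeing the critiques fail, and after seeing them succeed.
post_if_fail = posterior(prior, p_fail_given_h, p_fail_given_not_h)
post_if_succeed = posterior(prior, 1 - p_fail_given_h, 1 - p_fail_given_not_h)

# Averaged over both outcomes, the posterior equals the prior.
expected_posterior = p_fail * post_if_fail + (1 - p_fail) * post_if_succeed

print(post_if_fail, post_if_succeed, expected_posterior)
```

If a successful critique would have lowered the probability, a failed one has to raise it, since the two posteriors must average back to the prior.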

In my opinion, the strength of the AI risk argument derives 30% from the actual argument, and 70% from the failure of counter-arguments. This would be higher, but we haven't yet seen the most prominent people in the AI community take a really good swing at it.

From Capuchins to AI's, Setting an Agenda for the Study of Cultural Cooperation (Part1)

-3 diegocaleiro 27 June 2013 06:08AM
This is a multi-purpose essay in the making, written with the following goals: 1) fulfilling the mandatory essay requirement at the end of a semester studying "Cognitive Ethology: Culture in Human and Non-Human Animals"; 2) drafting something that can later be published in a journal that deals with cultural evolution, hopefully inclining people in the area to glance at future-oriented research, i.e. FAI and global coordination; 3) publishing it on LessWrong; and 4) ultimately Saving the World, as everything should. If it's worth doing, it's worth doing in the way most likely to save the World.
Since many of my writings are too long for LessWrong, I'll publish this in a sequence-like form made of self-contained chunks. My deadline is Sunday, so I'll probably post daily, editing/creating the new sections based on previous commentary.

Abstract: The study of cultural evolution has drawn much of its momentum from academic areas far removed from human and animal psychology, especially regarding the evolution of cooperation. Game theoretic results and parental investment theory come from economics, kin selection models from biology, and an ever growing number of models describing the process of cultural evolution in general, and the evolution of altruism in particular, come from mathematics. Even Artificial Intelligence has taken an interest in how to create agents that can communicate, imitate and cooperate. In this article I begin to tackle the 'why?' question. By trying to retrospectively make sense of the convergence of all these fields, I contend that further refinements in these fields should be directed towards understanding how to create environmental incentives fostering cooperation.



We need systems that are wiser than we are. We need institutions and cultural norms that make us better than we tend to be. It seems to me that the greatest challenge we now face is to build them. - Sam Harris, 2013, The Power Of Bad Incentives

1) Introduction

2) Cultures evolve

Culture is perhaps the most remarkable outcome of the evolutionary algorithm (Dennett, 1996) so far. It is the cradle of most things we consider humane - that is, typically human and valuable - and it surrounds our lives to the point that we may be thought of as creatures made of culture even more than creatures of bone and flesh (Hofstadter, 2007; Dennett, 1992). The appearance of our cultural complexity has relied on many associated capacities, among them:

1) The ability to observe, be interested by, and approach an individual doing something interesting, an ability we share with Norway rats, crows, and even lemurs (Galef & Laland, 2005).

2) Ability to learn from and scrounge the food of whoever knows how to get food, shared by capuchin monkeys (Ottoni et al, 2005).

3) Ability to tolerate learners, to accept learners, and to socially learn, probably shared by animals as diverse as fish, finches and Fins (Galef & Laland, 2005).

4) Understanding and emulating other minds (Theory of Mind): empathizing, relating, perhaps re-framing an experience as one's own, shared by chimpanzees, dogs, and at least some cetaceans (Rendell & Whitehead, 2001).

5) Learning the program level description of the action of others, for which the evidence among other animals is controversial (but see Cantor & Whitehead, 2013). And finally...

6) Sharing intentions. Intricate understanding of how two minds can collaborate with complementary tasks to achieve a mutually agreed goal (Tomasello et al, 2005).

Irrespective of definitional disputes around the true meaning of the word "culture" (which doesn't exist, see e.g. Pinker, 2007 pg115; Yudkowsky 2008A), each of these is more cognitively complex than its predecessor, and even (1) is sufficient for intra-specific non-environmental, non-genetic behavioral variation, which I will call "culture" here, whoever it may harm.

By transitivity, (2-6) allow the development of culture. It is interesting to notice that tool use, frequently but falsely cited as the hallmark of culture, is ubiquitously equiprobable in the animal kingdom. A graph showing, per biological family, which species show tool use gives us a power law distribution, whose similarity with the universal prior will help in understanding that being from a family where a species uses tools tells us very little about a species' own tool use (Michael Haslam, personal conversation).

Once some of those abilities are available, and given an amount of environmental facilities, need, and randomness, cultures begin to form. Occasionally, so do more developed traditions. Be it by imitation, program level imitation, goal emulation or intention sharing, information is transmitted between agents, giving rise to elements sufficient to constitute a primeval Darwinian soup. That is, entities form such that they exhibit 1) Variation 2) Heredity or replication 3) Differential fitness (Dennett, 1996). In light of the article Five Misunderstandings About Cultural Evolution (Henrich, Boyd & Richerson, 2008) we can improve Dennett's conditions for the evolutionary algorithm as 1) Discrete or continuous variation 2) Heredity, replication, or less faithful replication plus content attractors 3) Differential fitness. Once this set of conditions is met, an evolutionary algorithm, or many, begins to carve its optimizing paws into whatever surpassed the threshold for long enough. Cultures, therefore, evolve.

The intricacies of cultural evolution, and mathematical and computational models of how cultures evolve, have been the subject of much interdisciplinary research. For an extensive account of human culture see Not By Genes Alone (Richerson & Boyd, 2005). For computational models of social evolution, there is work by Mesoudi, Nowak, and others, e.g. (Hauert et al, 2007). For mathematical models, the aptly named Mathematical models of social evolution: A guide for the perplexed by McElreath and Boyd (2007) provides a textbook-style walk-through. For animal culture, see (Laland & Galef, 2009).

Cultural evolution satisfies David Deutsch's criterion for existence: it kicks back. It satisfies the evolutionary equivalent of the condition posed by the Quine-Putnam indispensability argument in mathematics, i.e. it is a sine qua non for understanding how the world works nomologically. It is falsifiable in Popper's sense, and it inflates the world's ontology a little by inserting a new kind of "replicator", the meme. Contrary to what happened on the internet, the name 'meme' has lost much of its appeal among cultural evolution theorists, and "memetics" is considered by some to refer only to the study of memes as monolithic, atomic, high-fidelity replicators, which would make the theory obsolete. This has created the following conundrum: the name 'meme' remains by far the best-known way to speak of "that which evolves culturally" within, and especially outside, the specialist arena. Further, the niche occupied by the word 'meme' is so conceptually necessary for communicating and explaining within the area that the word is frequently used anyway, under scare quotes or with some other informal excuse. In fact, as argued by Tim Tyler - who frequently posts here - in the very sharp Memetics (2011), there are nearly no reasons to try to abandon the 'meme' meme, and nearly all reasons (practicality, Qwerty reasons, mnemonics) to keep it. To avoid contradicting the evidence gathered since Dawkins first coined the term, I suggest we redefine Meme as an attractor in cultural evolution (dual-inheritance) whose development over time structurally mimics to a significant extent the discrete behavior of genes, frequently coinciding with the smallest unit of cultural replication. The definition is long, but the idea is simple: memes are the best analogues of genes not because they are discrete units that replicate just like genes, but because they are continuous conceptual clusters being attracted to a point in conceptual space whose replication is just like that of genes.
Even more simply, memes are the mathematically closest things to genes in cultural evolution. So the suggestion here is for researchers of dual-inheritance and cultural evolution to take the scare quotes off our memes and keep business as usual.

The evolutionary algorithm has created a new attractor-replicator, the meme. In doing so it did not privilege any specific family in the biological tree, and it ended up creating a process of cultural-genetic coevolution known as dual-inheritance. This process has been studied in ever more quantified ways by primatologists, behavioral ecologists, population biologists, anthropologists, ethologists, sociologists, neuroscientists and even philosophers. I've shown at least six distinct abilities which helped scaffold our astounding level of cultural intricacy, and some animals who share them with us. We will now take a look at the evolution of cooperation, collaboration, altruism and moral behavior, a sub-area of cultural evolution that saw an explosion of interest and research during the last decade, with publications (most from the last 4 years) such as The Origins of Morality, Supercooperators, Good and Real, The Better Angels of Our Nature, Non-Zero, The Moral Animal, Primates and Philosophers, The Age of Empathy, Origins of Altruism and Cooperation, The Altruism Equation, Altruism in Humans, Cooperation and Its Evolution, Moral Tribes, The Expanding Circle, The Moral Landscape.

3) Cooperation evolves

Briefly describe why, and show some inequalities under which cooperation is an equilibrium, or at least an Evolutionarily Stable Strategy.
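
As a placeholder for the kind of inequality meant here, the sketch below checks two standard textbook conditions; these are not necessarily the ones this section will end up using. The first is Hamilton's rule for kin selection (r * b > c). The second is the condition for tit-for-tat to resist invasion by unconditional defectors in an iterated donation game (w > c/b). The function names, and the choice of a donation game with benefit b, cost c and continuation probability w, are illustrative assumptions.

```python
def hamilton_rule_holds(r, b, c):
    """Kin selection: helping is favored when relatedness * benefit exceeds cost."""
    return r * b > c


def tft_resists_alld(w, b, c):
    """Iterated donation game with continuation probability w (0 <= w < 1).

    Tit-for-tat playing itself earns (b - c) per round, for an expected
    1 / (1 - w) rounds. An always-defect player meeting tit-for-tat
    earns b once and nothing afterwards. The comparison below is
    algebraically equivalent to w > c / b.
    """
    payoff_tft_vs_tft = (b - c) / (1 - w)
    payoff_alld_vs_tft = b
    return payoff_tft_vs_tft > payoff_alld_vs_tft


if __name__ == "__main__":
    # Illustrative numbers only: benefit 3, cost 1, so the threshold is w > 1/3.
    print(tft_resists_alld(w=0.5, b=3, c=1))
    print(tft_resists_alld(w=0.2, b=3, c=1))
    print(hamilton_rule_holds(r=0.5, b=3, c=1))
```

Note that resisting invasion by defectors is weaker than being a strict Evolutionarily Stable Strategy, since unconditional cooperators can still drift neutrally into a tit-for-tat population.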

4) The complexity of cultural items doesn't undermine the validity of mathematical models.

 4.1) Cognitive attractors and biases substitute for memes' discreteness

The math becomes equivalent.

 4.2) Despite the Unilateralist's Curse and the Tragedy of the Commons, dyadic interaction models help us understand large scale cooperation

Once we know these two failure modes, dyadic iterated (or reputation-sensitive) interaction is close enough.

5) From Monkeys to Apes to Humans to Transhumans to AIs, the ranges of achievable altruistic skill.

Possible modes of being altruistic. Graph like Bostrom's. Second and third order punishment and cooperation. Newcomb-like signaling problems within AI.

6) Unfit for the Future: the need for greater altruism.

We fail, and will keep failing, in Tragedy of the Commons problems unless we change our nature.

7) From Science, through Philosophy, towards Engineering: the future of studies of altruism.

Philosophy: Existential Risk prevention through global coordination and cooperation prior to technical maturity. Engineering Humans: creating enhancements and changing incentives. Engineering AIs: making them better and realer.

8) A different kind of Moral Landscape

Like Sam Harris's, except comparing not how closely a society approaches The Good Life (Moral Landscape pg15), but how much it fosters altruistic behaviour.

9) Conclusions

I haven't written them yet, so I don't have any!





Bibliography (Only of the part already written, obviously):

Cantor, M., & Whitehead, H. (2013). The interplay between social networks and culture: theoretically and among whales and dolphins. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1618).

Dennett, D. C. (1996). Darwin's dangerous idea: Evolution and the meanings of life (No. 39). Simon & Schuster.

Dennett, D. C. (1992). The self as a center of narrative gravity. Self and consciousness: Multiple perspectives.

Galef Jr, B. G., & Laland, K. N. (2005). Social learning in animals: empirical studies and theoretical models. BioScience, 55(6), 489-499.

Hauert, C., Traulsen, A., Brandt, H., Nowak, M. A., & Sigmund, K. (2007). Via freedom to coercion: the emergence of costly punishment. Science, 316(5833), 1905-1907.

Henrich, J., Boyd, R., & Richerson, P. J. (2008). Five misunderstandings about cultural evolution. Human Nature, 19(2), 119-137.

Hofstadter, D. R. (2007). I am a Strange Loop. Basic Books

McElreath, R., & Boyd, R. (2007). Mathematical models of social evolution: A guide for the perplexed. University of Chicago Press.

Ottoni, E. B., de Resende, B. D., & Izar, P. (2005). Watching the best nutcrackers: what capuchin monkeys (Cebus apella) know about others’ tool-using skills. Animal Cognition, 8(4), 215-219.

Persson, I., & Savulescu, J. (2012). Unfit for the Future: The Need for Moral Enhancement. Oxford University Press.

Pinker, S. (2007). The stuff of thought: Language as a window into human nature. Viking Adult.

Rendell, L., & Whitehead, H. (2001). Culture in whales and dolphins. Behavioral and Brain Sciences, 24, 309-382.

Richerson, P. J., & Boyd, R. (2005). Not by genes alone. University of Chicago Press.

Tyler, T. (2011). Memetics: Memes and the Science of Cultural Evolution. Tim Tyler.

Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28(5), 675-690.

Yudkowsky, E. (2008A). 37 ways words can be wrong. Available at http://lesswrong.com/lw/od/37_ways_that_words_can_be_wrong/

For FAI: Is "Molecular Nanotechnology" putting our best foot forward?

47 leplen 22 June 2013 04:44AM

Molecular nanotechnology, or MNT for those of you who love acronyms, seems to be a fairly common trope on LW and in related literature. It's not really clear to me why. In many of the examples of "How could AIs help us" or "How could AIs rise to power", phrases like "cracks protein folding" or "making a block of diamond is just as easy as making a block of coal" are thrown about in ways that make me very very uncomfortable. Maybe it's all true, maybe I'm just late to the transhumanist party and the obviousness of this information was with my invitation that got lost in the mail, but seeing all the physics swept under the rug like that sets off every crackpot alarm I have.

I must post the disclaimer that I have done a little bit of materials science, so maybe I'm just annoyed that you're making me obsolete, but I don't see why this particular possible future gets so much attention. Let us assume that a smarter than human AI will be very difficult to control and represents a large positive or negative utility for the entirety of the human race. Even given that assumption, it's still not clear to me that MNT is a likely element of the future. It isn't clear to me that MNT is physically practical. I don't doubt that it can be done. I don't doubt that very clever metastable arrangements of atoms with novel properties can be dreamed up. Indeed, that's my day job, but I have a hard time believing the only reason you can't make a nanoassembler capable of arbitrary manipulations out of a handful of bottles you ordered from Sigma-Aldrich is because we're just not smart enough. Manipulating individual atoms means climbing huge binding energy curves. It's an enormously steep, enormously complicated energy landscape, and the Schrödinger Equation scales very very poorly as you add additional particles and degrees of freedom. Building molecular nanotechnology seems to me to be roughly equivalent to being able to make arbitrary lego structures by shaking a large bin of lego in a particular way while blindfolded. Maybe a super human intelligence is capable of doing so, but it's not at all clear to me that it's even possible.

I assume the reason that MNT is added to a discussion on AI is because we're trying to make the future sound more plausible via adding burdensome details.  I understand that AI and MNT is less probable than AI or MNT alone, but that both is supposed to sound more plausible. This is precisely where I have difficulty. I would estimate the probability of molecular nanotechnology (in the form of programmable replicators, grey goo, and the like) as lower than the probability of human or super human level AI. I can think of all sorts of objections to the former, but very few objections to the latter. Including MNT as a consequence of AI, especially including it without addressing any of the fundamental difficulties of MNT, I would argue harms the credibility of AI researchers. It makes me nervous about sharing FAI literature with people I work with, and it continues to bother me.

I am particularly bothered by this because it seems irrelevant to FAI. I'm fully convinced that a smarter than human AI could take control of the Earth via less magical means, using time tested methods such as manipulating humans, rigging elections, making friends, killing its enemies, and generally being only marginally more clever and motivated than a typical human leader. A smarter than human AI could out-manipulate human institutions and out-plan human opponents with the sort of ruthless efficiency with which modern computers beat humans in chess. I don't think convincing people that smarter than human AIs have enormous potential for good and evil is particularly difficult, once you can get them to concede that smarter than human AIs are possible. I do think that waving your hands and saying super-intelligence at things that may be physically impossible makes the whole endeavor seem less serious. If I had read the chain of reasoning smart computer->nanobots before I had built up a store of good-will from reading the Sequences, I would have almost immediately dismissed the whole FAI movement as a bunch of soft science fiction, and it would have been very difficult to get me to take a second look.

Put in LW parlance, suggesting things not known to be possible by modern physics without detailed explanations puts you in the reference class "people on the internet who have their own ideas about physics". It didn't help, in my particular case, that one of my first interactions on LW was in fact with someone who appears to have their own view about a continuous version of quantum mechanics.

And maybe it's just me. Maybe this did not bother anyone else, and it's an incredible shortcut for getting people to realize just how different a future a greater than human intelligence makes possible and there is no better example. It does alarm me though, because I think that physicists and the kind of people who notice and get uncomfortable when you start invoking magic in your explanations may be the kind of people FAI is trying to attract.

After critical event W happens, they still won't believe you

36 Eliezer_Yudkowsky 13 June 2013 09:59PM

In general and across all instances I can think of so far, I do not agree with the part of your futurological forecast in which you reason, "After event W happens, everyone will see the truth of proposition X, leading them to endorse Y and agree with me about policy decision Z."

Example 1:  "After a 2-year-old mouse is rejuvenated to allow 3 years of additional life, society will realize that human rejuvenation is possible, turn against deathism as the prospect of lifespan / healthspan extension starts to seem real, and demand a huge Manhattan Project to get it done."  (EDIT:  This has not happened, and the hypothetical is mouse healthspan extension, not anything cryonic.  It's being cited because this is Aubrey de Grey's reasoning behind the Methuselah Mouse Prize.)

Alternative projection:  Some media brouhaha.  Lots of bioethicists acting concerned.  Discussion dies off after a week.  Nobody thinks about it afterward.  The rest of society does not reason the same way Aubrey de Grey does.

Example 2:  "As AI gets more sophisticated, everyone will realize that real AI is on the way and then they'll start taking Friendly AI development seriously."

Alternative projection:  As AI gets more sophisticated, the rest of society can't see any difference between the latest breakthrough reported in a press release and that business earlier with Watson beating Ken Jennings or Deep Blue beating Kasparov; it seems like the same sort of press release to them.  The same people who were talking about robot overlords earlier continue to talk about robot overlords.  The same people who were talking about human irreproducibility continue to talk about human specialness.  Concern is expressed over technological unemployment the same as today or Keynes in 1930, and this is used to fuel someone's previous ideological commitment to a basic income guarantee, inequality reduction, or whatever.  The same tiny segment of unusually consequentialist people are concerned about Friendly AI as before.  If anyone in the science community does start thinking that superintelligent AI is on the way, they exhibit the same distribution of performance as modern scientists who think it's on the way, e.g. Hugo de Garis, Ben Goertzel, etc.

Consider the situation in macroeconomics.  When the Federal Reserve dropped interest rates to nearly zero and started printing money via quantitative easing, we had some people loudly predicting hyperinflation just because the monetary base had, you know, gone up by a factor of 10 or whatever it was.  Which is kind of understandable.  But still, a lot of mainstream economists (such as the Fed) thought we would not get hyperinflation, the implied spread on inflation-protected Treasuries and numerous other indicators showed that the free market thought we were due for below-trend inflation, and then in actual reality we got below-trend inflation.  It's one thing to disagree with economists, another thing to disagree with implied market forecasts (why aren't you betting, if you really believe?) but you can still do it sometimes; but when conventional economics, market forecasts, and reality all agree on something, it's time to shut up and ask the economists how they knew.  I had some credence in inflationary worries before that experience, but not afterward...  So what about the rest of the world?  In the heavily scientific community you live in, or if you read econblogs, you will find that a number of people actually have started to worry less about inflation and more about sub-trend nominal GDP growth.  You will also find that right now these econblogs are having worry-fits about the Fed prematurely exiting QE and choking off the recovery because the elderly senior people with power have updated more slowly than the econblogs.  And in larger society, if you look at what happens when Congresscritters question Bernanke, you will find that they are all terribly, terribly concerned about inflation.  Still.  The same as before.  
Some econblogs are very harsh on Bernanke because the Fed did not print enough money, but when I look at the kind of pressure Bernanke was getting from Congress, he starts to look to me like something of a hero just for following conventional macroeconomics as much as he did.

That issue is a hell of a lot more clear-cut than the medical science for human rejuvenation, which in turn is far more clear-cut ethically and policy-wise than issues in AI.

After event W happens, a few more relatively young scientists will see the truth of proposition X, and the larger society won't be able to tell a damn difference.  This won't change the situation very much, there are probably already some scientists who endorse X, since X is probably pretty predictable even today if you're unbiased.  The scientists who see the truth of X won't all rush to endorse Y, any more than current scientists who take X seriously all rush to endorse Y.  As for people in power lining up behind your preferred policy option Z, forget it, they're old and set in their ways and Z is relatively novel without a large existing constituency favoring it.  Expect W to be used as argument fodder to support conventional policy options that already have political force behind them, and for Z to not even be on the table.

Risques existentiels en Français

5 Stuart_Armstrong 02 June 2013 05:19PM

I've just been interviewed by Radio-Canada (in French) for their program "Dessine moi un Dimanche". There really wasn't enough time (the interview apparently lasted nine minutes; it felt like two), but I managed to touch upon some of the technology risks of the coming century (including AI).

The segment can be found here: http://www.radio-canada.ca/emissions/dessine_moi_un_dimanche/2012-2013/chronique.asp?idChronique=295886

Kevin Drum's Article about AI and Technology

19 knb 15 May 2013 07:38AM

Kevin Drum has an article in Mother Jones about AI and Moore's Law:

THIS IS A STORY ABOUT THE FUTURE. Not the unhappy future, the one where climate change turns the planet into a cinder or we all die in a global nuclear war. This is the happy version. It's the one where computers keep getting smarter and smarter, and clever engineers keep building better and better robots. By 2040, computers the size of a softball are as smart as human beings. Smarter, in fact. Plus they're computers: They never get tired, they're never ill-tempered, they never make mistakes, and they have instant access to all of human knowledge.

The result is paradise. Global warming is a problem of the past because computers have figured out how to generate limitless amounts of green energy and intelligent robots have tirelessly built the infrastructure to deliver it to our homes. No one needs to work anymore. Robots can do everything humans can do, and they do it uncomplainingly, 24 hours a day. Some things remain scarce—beachfront property in Malibu, original Rembrandts—but thanks to super-efficient use of natural resources and massive recycling, scarcity of ordinary consumer goods is a thing of the past. Our days are spent however we please, perhaps in study, perhaps playing video games. It's up to us.

Although he only mentions consumer goods, Drum presumably means that scarcity will end for services and consumer goods. If scarcity only ended for consumer goods, people would still have to work (most jobs are currently in the services economy). 

Drum explains that our linear-thinking brains don't intuitively grasp exponential systems like Moore's law. 

Suppose it's 1940 and Lake Michigan has (somehow) been emptied. Your job is to fill it up using the following rule: To start off, you can add one fluid ounce of water to the lake bed. Eighteen months later, you can add two. In another 18 months, you can add four ounces. And so on. Obviously this is going to take a while.

By 1950, you have added around a gallon of water. But you keep soldiering on. By 1960, you have a bit more than 150 gallons. By 1970, you have 16,000 gallons, about as much as an average suburban swimming pool.

At this point it's been 30 years, and even though 16,000 gallons is a fair amount of water, it's nothing compared to the size of Lake Michigan. To the naked eye you've made no progress at all.

So let's skip all the way ahead to 2000. Still nothing. You have—maybe—a slight sheen on the lake floor. How about 2010? You have a few inches of water here and there. This is ridiculous. It's now been 70 years and you still don't have enough water to float a goldfish. Surely this task is futile?

But wait. Just as you're about to give up, things suddenly change. By 2020, you have about 40 feet of water. And by 2025 you're done. After 70 years you had nothing. Fifteen years later, the job was finished.

He also includes this nice animated .gif which illustrates the principle very clearly. 
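
The arithmetic in the lake example can be checked with a short sketch. It assumes a simple discrete reading of the rule (the amount added doubles once every 18 months, starting with one fluid ounce in 1940) and a lake volume of roughly 1.3e15 US gallons (about 4,900 cubic kilometres). The volume figure and the function names are assumptions introduced for illustration and do not come from Drum's article.

```python
OUNCES_PER_GALLON = 128   # US fluid ounces per US gallon
PERIOD_YEARS = 1.5        # the amount added doubles every 18 months


def gallons_added_by(year, start_year=1940):
    """Total water added by `year` under the discrete doubling rule."""
    steps = int((year - start_year) // PERIOD_YEARS)
    # 1 + 2 + 4 + ... + 2**steps ounces have been added so far.
    total_ounces = 2 ** (steps + 1) - 1
    return total_ounces / OUNCES_PER_GALLON


def year_lake_is_full(lake_gallons=1.3e15, start_year=1940):
    """First year in which the cumulative total reaches the assumed lake volume."""
    target_ounces = lake_gallons * OUNCES_PER_GALLON
    steps = 0
    while 2 ** (steps + 1) - 1 < target_ounces:
        steps += 1
    return start_year + steps * PERIOD_YEARS


if __name__ == "__main__":
    for year in (1950, 1960, 1970, 2000, 2010, 2020):
        print(year, f"{gallons_added_by(year):,.0f} gallons")
    print("full in", year_lake_is_full())
```

Under these assumptions the sketch reproduces the quoted milestones to within rounding (about a gallon by 1950, about 16,000 gallons by 1970) and fills the lake in the mid-2020s. The 1960 figure comes out somewhat lower than the quoted "bit more than 150 gallons", which suggests Drum rounded or interpolated differently.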

Drum continues by talking about possible economic ramifications.

Until a decade ago, the share of total national income going to workers was pretty stable at around 70 percent, while the share going to capital—mainly corporate profits and returns on financial investments—made up the other 30 percent. More recently, though, those shares have started to change. Slowly but steadily, labor's share of total national income has gone down, while the share going to capital owners has gone up. The most obvious effect of this is the skyrocketing wealth of the top 1 percent, due mostly to huge increases in capital gains and investment income.

Drum says the share of (US) national income going to workers was stable until about a decade ago. I think the graph he links to shows the workers' share has been declining since approximately the late 1960s/early 1970s. This is about the time US immigration levels started increasing (which raises returns to capital and lowers native worker wages).

The rest of Drum's piece isn't terribly interesting, but it is good to see mainstream pundits talking about these topics.

Journalist's piece about predicting AI

3 Stuart_Armstrong 02 April 2013 02:49PM

Here's a piece by Mark Piesing in Wired UK about the difficulty and challenges in predicting AI. It covers a lot of our (Stuart Armstrong, Kaj Sotala and Seán Ó hÉigeartaigh) research into AI prediction, along with Robin Hanson's response. It will hopefully cause people to look more deeply into our work, as published online, in the Pilsen Beyond AI conference proceedings, and forthcoming as "The errors, insights and lessons of famous AI predictions and what they mean for the future".

Why AI may not foom

22 John_Maxwell_IV 24 March 2013 08:11AM


  • There's a decent chance that the intelligence of a self-improving AGI will grow in a relatively smooth exponential or sub-exponential way, not super-exponentially or with large jump discontinuities.
  • If this is the case, then an AGI whose effective intelligence matched that of the world's combined AI researchers would make AI progress at the rate they do, taking decades to double its own intelligence.
  • The risk that the first successful AGI will quickly monopolize many industries, or quickly hack many of the computers connected to the internet, seems worth worrying about.  In either case, the AGI would likely end up using the additional computing power it gained to self-modify so it was superintelligent.
  • AI boxing could mitigate both of these risks greatly.
  • If hard takeoff could be impossible, it might be best to assume this case and concentrate our resources on ensuring a safe soft takeoff, given that the prospects for a safe hard takeoff look grim.


Takeoff models discussed in the Hanson-Yudkowsky debate

The supercritical nuclear chain reaction model

Yudkowsky alludes to this model repeatedly, starting in this post:

When a uranium atom splits, it releases neutrons - some right away, some after delay while byproducts decay further.  Some neutrons escape the pile, some neutrons strike another uranium atom and cause an additional fission.  The effective neutron multiplication factor, denoted k, is the average number of neutrons from a single fissioning uranium atom that cause another fission...

It might seem that a cycle, with the same thing happening over and over again, ought to exhibit continuous behavior.  In one sense it does.  But if you pile on one more uranium brick, or pull out the control rod another twelve inches, there's one hell of a big difference between k of 0.9994 and k of 1.0006.
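
To make the size of that difference concrete, here is a minimal sketch. It assumes the simplest possible reading of the model, in which the neutron population is multiplied by k once per generation and delayed neutrons and geometry are ignored. The function name and the choice of 10,000 generations are illustrative only.

```python
def relative_neutron_population(k, generations):
    """Population after `generations` steps, relative to the starting population."""
    return k ** generations


if __name__ == "__main__":
    for k in (0.9994, 1.0006):
        # Slightly subcritical decays toward zero; slightly supercritical grows without bound.
        print(k, relative_neutron_population(k, 10_000))
```

After ten thousand generations the subcritical pile has shrunk to a fraction of a percent of its starting activity, while the supercritical one has grown by a factor of several hundred and keeps growing.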

I don't like this model much for the following reasons:

  • The model doesn't offer much insight into the time scale over which an AI might self-improve.  The "mean generation time" (time necessary for the next "generation" of neutrons to be released) of a nuclear chain reaction is short, and the doubling time for neutron activity in Fermi's experiment was just two minutes, but it hardly seems reasonable to generalize this to self-improving AIs.
  • A flurry of insights that either dies out or expands exponentially doesn't seem like a very good description of how human minds work, and I don't think it would describe an AGI well either.  Many people report that taking time to think about problems is key to their problem-solving process.  It seems likely that an AGI unable to immediately generate insight into some problem would have a slower and more exhaustive "fallback" search process that would allow it to continue making progress.  (Insight could also work via a search process in the first place--over the space of permutations in one's mental model, say.)

The "differential equations folded on themselves" model

This is another model Eliezer alludes to, albeit in a somewhat handwavey fashion:

When you fold a whole chain of differential equations in on itself like this, it should either peter out rapidly as improvements fail to yield further improvements, or else go FOOM.

It's not exactly clear to me what the "whole chain of differential equations" is supposed to refer to... there's only one differential equation in the preceding paragraph, and it's a standard exponential (which could be scary or not, depending on the multiplier in the exponent.  Rabbit populations and bank account balances both grow exponentially in a way that's slow enough for humans to understand and control.)

Maybe he's referring to the levels he describes here: metacognitive, cognitive, metaknowledge, knowledge, and object.  How might we parameterize this system?

Let's say c is our AGI's cognitive ability, dc/dt is the rate of change in our AGI's cognitive ability, m is our AGI's "metaknowledge" (about cognition and metaknowledge), and dm/dt is the rate of change in metaknowledge.  What I've got in mind is:

    dc/dt = p * c * m
    dm/dt = q * c * m

where p and q are constants.

In other words, both change in cognitive ability and change in metaknowledge are each individually directly proportionate to both cognitive ability and metaknowledge.

I don't know much about understanding systems of differential equations, so if you do, please comment!  I put the above system into Wolfram Alpha, but I'm not exactly sure how to interpret the solution provided.  In any case, fooling around with this script suggests sudden, extremely sharp takeoff for a variety of different test parameters.
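
The linked script is not reproduced here, but a stand-in is easy to sketch. It assumes the system is dc/dt = p * c * m and dm/dt = q * c * m (one reading of "each directly proportionate to both") and uses plain forward-Euler integration. The starting values, threshold and step size are arbitrary illustrative choices.

```python
def time_to_threshold(p, q, c0=1.0, m0=1.0, threshold=1e6, dt=1e-4, t_max=10.0):
    """Integrate dc/dt = p*c*m, dm/dt = q*c*m with forward Euler.

    Returns the time at which c first exceeds `threshold`,
    or None if that has not happened by `t_max`.
    """
    c, m, t = c0, m0, 0.0
    while t < t_max:
        growth = c * m
        c += p * growth * dt
        m += q * growth * dt
        t += dt
        if c > threshold:
            return t
    return None


if __name__ == "__main__":
    for p, q in ((1.0, 1.0), (0.5, 0.5), (1.0, 0.1)):
        print(p, q, time_to_threshold(p, q))
```

For p = q and c0 = m0 the two variables stay equal, so the system reduces to dc/dt = p * c^2, whose exact solution c0 / (1 - p * c0 * t) diverges at the finite time t = 1 / (p * c0). That is the "sudden, extremely sharp takeoff": growth is faster than any exponential, and the numerical threshold time lands just after the analytic blow-up time.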

The straight exponential model

To me, the "proportionality thesis" described by David Chalmers in his singularity paper, "increases in intelligence (or increases of a certain sort) always lead to proportionate increases in the capacity to design intelligent systems", suggests a single differential equation that looks like

    du/dt = s * u

where u represents the number of upgrades that have been made to an AGI's source code, and s is some constant.  The solution to this differential equation is going to look like

    u(t) = c1 * e^(s * t)

where the constant c1 is determined by our initial conditions.

(In Recursive Self-Improvement, Eliezer calls this a "too-obvious mathematical idiom".  I'm inclined to favor it for its obviousness, or at least use it as a jumping-off point for further analysis.)

Under this model, the constant s is pretty important... if u(t) were the amount of money in a bank account, s would be the rate of return it was receiving.  The parameter s will effectively determine the "doubling time" of an AGI's intelligence.  It matters a lot whether this "doubling time" is on the scale of minutes or years.

So what's going to determine s?  Well, if the AGI's hardware is twice as fast, we'd expect it to come up with upgrades twice as fast.  If the AGI had twice as much hardware, and it could parallelize the search for upgrades perfectly (which seems like a reasonable approximation to me), we'd expect the same thing.  So let's decompose s and make it the product of two parameters: h representing the hardware available to the AGI, and r representing the ease of finding additional improvements.  The AGI's intelligence will be on the order of u * h, i.e. the product of the AGI's software quality and hardware capability.
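The decomposition of s into h and r can be sketched in a few lines of Python. This assumes the exponential model du/dt = s * u with s = r * h; the function names and the sample values of r and h are made up for illustration.

```python
import math


def software_quality(t, u0, r, h):
    """u(t) = u0 * exp(r * h * t), the solution of du/dt = (r * h) * u."""
    return u0 * math.exp(r * h * t)


def doubling_time(r, h):
    """Time for u to double when it grows at rate s = r * h."""
    return math.log(2) / (r * h)


if __name__ == "__main__":
    base = doubling_time(r=0.01, h=1.0)
    doubled_hw = doubling_time(r=0.01, h=2.0)
    # Doubling the hardware halves the doubling time.
    print(base, doubled_hw)
```

Because h and r only enter through their product, doubling the hardware has the same effect on the doubling time as making improvements twice as easy to find.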


Considerations affecting our choice of model

Diminishing returns

The consideration here is that the initial improvements implemented by an AGI will tend to be those that are especially easy to implement and/or especially fruitful to implement, with subsequent improvements tending to deliver less intelligence bang for the implementation buck.  Chalmers calls this "perhaps the most serious structural obstacle" to the proportionality thesis.

To think about this consideration, one could imagine representing a given improvement as a pair of values (u, d).  u represents a factor by which existing performance will be multiplied, e.g. if u is 1.1, then implementing this improvement will improve performance by a factor of 1.1.  d represents the cognitive difficulty or amount of intellectual labor required to implement a given improvement.  If d is doubled, then at any given level of intelligence, implementing this improvement will take twice as long (because it will be harder to discover and/or harder to translate into code).

Now let's imagine ordering our improvements from highest to lowest u to d ratio, so we implement those improvements that deliver the greatest bang for the buck first.

Thus ordered, let's imagine separating groups of consecutive improvements into "tiers".  Each tier's worth of improvements, when taken together, will represent the doubling of an AGI's software quality, i.e. the product of the u's in that tier will be roughly 2.  For a steady doubling time, each tier's total difficulty will need to sum to approximately twice the difficulty of the tier before it.  If tier difficulty tends to more than double, we're likely to see sub-exponential growth.  If tier difficulty tends to less than double, we're likely to see super-exponential growth.  If a single improvement delivers a more-than-2x improvement, it will span multiple "tiers".
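The tier argument can be made concrete with a toy calculation. The sketch below assumes that the time spent on a tier is its total difficulty divided by the AGI's intelligence when it starts that tier, and that intelligence is held fixed within a tier. Both are simplifications, and the function and parameter names are hypothetical.

```python
def tier_times(difficulty_ratio, n_tiers=6, first_difficulty=1.0, hardware=1.0):
    """Time spent on each tier of improvements.

    Assumes tier k has total difficulty first_difficulty * difficulty_ratio**k
    and is worked on at speed (software quality) * hardware, where software
    quality has already doubled k times.
    """
    times = []
    for k in range(n_tiers):
        difficulty = first_difficulty * difficulty_ratio ** k
        speed = (2 ** k) * hardware
        times.append(difficulty / speed)
    return times


if __name__ == "__main__":
    print(tier_times(2.0))   # difficulty doubles per tier: steady doubling time
    print(tier_times(3.0))   # difficulty more than doubles: each doubling takes longer
    print(tier_times(1.5))   # difficulty less than doubles: each doubling is quicker
```

Under these assumptions a difficulty ratio of exactly 2 gives a constant time per tier, which is the steady doubling time of the exponential model.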

It seems to me that the quality of fruit available at each tier represents a kind of logical uncertainty, similar to asking whether an efficient algorithm exists for some task, and if so, how efficient.

On this diminishing returns consideration, Chalmers writes:

If anything, 10% increases in intelligence-related capacities are likely to lead all sorts of intellectual breakthroughs, leading to next-generation increases in intelligence that are significantly greater than 10%. Even among humans, relatively small differences in design capacities (say, the difference between Turing and an average human) seem to lead to large differences in the systems that are designed (say, the difference between a computer and nothing of importance).

Eliezer Yudkowsky's objection is similar:

...human intelligence does not require a hundred times as much computing power as chimpanzee intelligence.  Human brains are merely three times too large, and our prefrontal cortices six times too large, for a primate with our body size.

Or again:  It does not seem to require 1000 times as many genes to build a human brain as to build a chimpanzee brain, even though human brains can build toys that are a thousand times as neat.

Why is this important?  Because it shows that with constant optimization pressure from natural selection and no intelligent insight, there were no diminishing returns to a search for better brain designs up to at least the human level.  There were probably accelerating returns (with a low acceleration factor).  There are no visible speedbumps, so far as I know.

First, hunter-gatherers can't design toys that are a thousand times as neat as the ones chimps design--they aren't programmed with the software modern humans get through education (some may be unable to count), and educating apes has produced interesting results.

Speaking as someone who's basically clueless about neuroscience, I can think of many different factors that might contribute to intelligence differences within the human race or between humans and other apes:

  • Processing speed.
  • Cubic centimeters of brain hardware devoted to abstract thinking.  (Gifted technical thinkers often seem to suffer from poor social intuition--perhaps a result of reallocation of brain hardware from social to technical processing.)
  • Average number of connections per neuron within that brain hardware.
  • Average neuron density within that brain hardware.  This author seems to think that the human brain's remarkableness comes largely from the fact that it's the largest primate brain, and that primate brains maintain the same neuron density when enlarged while other types of brains don't.  "If absolute brain size is the best predictor of cognitive abilities in a primate (13), and absolute brain size is proportional to number of neurons across primates (24, 26), our superior cognitive abilities might be accounted for simply by the total number of neurons in our brain, which, based on the similar scaling of neuronal densities in rodents, elephants, and cetaceans, we predict to be the largest of any animal on Earth (28)."
  • Propensity to actually use your capacity for deliberate System 2 reasoning.  Richard Feynman's second wife on why she divorced him: "He begins working calculus problems in his head as soon as he awakens. He did calculus while driving in his car, while sitting in the living room, and while lying in bed at night."  (By the way, does anyone know of research that's been done on getting people to use System 2 more?  Seems like it could be really low-hanging fruit for improving intellectual output.  Sometimes I wonder if the reason intelligent people tend to like math is because they were reinforced for the behaviour of thinking abstractly as kids (via praise, good grades, etc.) while those not at the top of the class were not so reinforced.)
  • Extended neuroplasticity into "childhood".
  • Increased calories to think with due to the invention of cooking.
  • And finally, mental algorithms ("software").  Which are probably at least somewhat important.

It seems to me like these factors (or ones like them) may multiply together to produce intelligence, i.e. the "intelligence equation", as it were, could be something like intelligence = processing_speed * cc_abstract_hardware * neuron_density * connections_per_neuron * propensity_for_abstraction * mental_algorithms.  If the ancestral environment rewarded intelligence, we should expect all of these characteristics to be selected for, and this could explain the "low acceleration factor" in human intelligence increase.  (Increasing your processing speed by a factor of 1.2 does more when you're already pretty smart, so all these sources of intelligence increase would feed into one another.)

In other words, it's not that clear what relevance the evolution of human intelligence has to the ease and quality of the upgrades at different "tiers" of software improvements, since evolution operates on many non-software factors, but a self-improving AI (properly boxed) can only improve its software.


In the Hanson/Yudkowsky debate, Yudkowsky declares Douglas Engelbart's plan to radically bootstrap his team's productivity through improving their computer and software tools "insufficiently recursive".  I agree with this assessment.  Here's my modelling of this phenomenon.

When a programmer makes an improvement to their code, their work of making the improvement requires the completion of many subtasks:

  • choosing a feature to add
  • reminding themselves of how the relevant part of the code works and loading that information into their memory
  • identifying ways to implement the feature
  • evaluating different methods of implementing the feature according to simplicity, efficiency, and correctness
  • coding their chosen implementation
  • testing their chosen implementation, identifying bugs
  • identifying the cause of a given bug
  • figuring out how to fix the given bug

Each of those subtasks will consist of further subtasks like poking through their code, staring off into space, typing, and talking to their rubber duck.

Now the programmer improves their development environment so they can poke through their code slightly faster.  But if poking through their code takes up only 5% of their development time, even an extremely large improvement in code-poking abilities is not going to result in an especially large increase in their development speed... in the best case, where code-poking time is reduced to zero, the programmer will only work about 5% faster.

This is a reflection of Amdahl's Law-type thinking.  The amount you can gain through speeding something up depends on how much it's slowing you down.

Relatedly, if intelligence is a complicated, heterogeneous process where computation is spread relatively evenly among many modules, then improving the performance of an AGI gets tougher, because upgrading an individual module does little to improve the performance of the system as a whole.

And to see orders-of-magnitude performance improvement in such a process, almost all of your AGI's components will need to be improved radically.  If even a few prove troublesome, improving your AGI's thinking speed becomes difficult.
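The Amdahl's Law reasoning above can be checked with a short Python sketch. The helper name and the 20-module split are illustrative assumptions; only the 5% code-poking figure comes from the text.

```python
def overall_speedup(fractions, speedups):
    """Amdahl-style speedup for a task split into parts.

    fractions[i] is the share of the original time spent in part i
    (shares must sum to 1); speedups[i] is how many times faster part i
    becomes.  Use float("inf") for a part whose time drops to zero.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    new_time = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / new_time


if __name__ == "__main__":
    # Code-poking is 5% of the work and becomes instantaneous.
    print(overall_speedup([0.05, 0.95], [float("inf"), 1.0]))

    # Twenty equally sized modules: 18 become 1000x faster, 2 do not improve.
    fractions = [0.05] * 20
    speedups = [1000.0] * 18 + [1.0] * 2
    print(overall_speedup(fractions, speedups))
```

In the second case the two unimproved modules cap the overall gain at just under 10x, even though 90% of the system became a thousand times faster.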


Case studies in technological development speed

Moore's Law

It has famously been noted that if the automotive industry had achieved similar improvements in performance [to the semiconductor industry] in the last 30 years, a Rolls-Royce would cost only $40 and could circle the globe eight times on one gallon of gas—with a top speed of 2.4 million miles per hour.

From this McKinsey report.  So Moore's Law is an outlier where technological development is concerned.  I suspect that making transistors smaller and faster doesn't require finding ways to improve dozens of heterogeneous components.  And when you zoom out to view a computer system as a whole, other bottlenecks typically appear.

(It's also worth noting that research budgets in the semiconductor industry have risen greatly since its inception, but obviously not following the same curve that chip speeds have.)

Compiler technology

This paper on "Proebsting's Law" suggests that the end result of all the compiler research done between 1970 or so and 2001 was that a typical integer-intensive program was compiled to run 3.3 times faster, and a typical floating-point-intensive program was compiled to run 8.1 times faster.  When it comes to making programs run quickly, it seems that software-level compiler improvements are swamped by hardware-level chip improvements--perhaps because, like an AGI, a compiler has to deal with a huge variety of different scenarios, so improving it in the average case is tough.  (This represents supertask heterogeneity, rather than subtask heterogeneity, so it's a different objection than the one mentioned above.)
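To put those compiler figures on the same footing as a doubling time, here is a small Python sketch. It assumes steady exponential improvement and takes the span to be 31 years (1970 to 2001), which is only approximate since the text says "1970 or so". The function name is made up for illustration.

```python
import math


def doubling_time_years(total_factor, years):
    """Doubling time implied by a total speedup over a span of years,
    assuming steady exponential improvement."""
    return years * math.log(2) / math.log(total_factor)


if __name__ == "__main__":
    span = 2001 - 1970  # assumed span in years
    print(doubling_time_years(3.3, span))   # integer-intensive programs
    print(doubling_time_years(8.1, span))   # floating-point-intensive programs
```

Under these assumptions the implied doubling times are roughly 18 years for integer code and roughly 10 years for floating-point code, against the doubling every couple of years usually quoted for Moore's Law.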

Database technology

According to two analyses (full paper for that second one), it seems that improvement in database performance benchmarks has largely been due to Moore's Law.

AI (so far)

Robin Hanson's blog post "AI Progress Estimate" was the best resource I could find on this.


Why smooth exponential growth implies soft takeoff

Let's suppose we consider all of the above, deciding that the exponential model is the best, and we agree with Robin Hanson that there are few deep, chunky, undiscovered AI insights.

Under the straight exponential model, if you recall, we had

    du/dt = r * h * u

where u is the degree of software quality, h is the hardware availability, and r is a parameter representing the ease of finding additional upgrades.  Our AGI's overall intelligence is given by u * h--the quality of the software times the amount of hardware.

Now we can solve for r by substituting in human intelligence for u * h, and substituting in the rate of human AI progress for du/dt.  Another way of saying this is: When the AI is as smart as all the world's AI researchers working together, it will produce new AI insights at the rate that all the world's AI researchers working together produce new insights.  At some point our AGI will be just as smart as the world's AI researchers, but we can hardly expect to start seeing super-fast AI progress at that point, because the world's AI researchers haven't produced super-fast AI progress.

Let's assume AGI that's on par with the world AI research community is reached in 2080 (LW's median "singularity" estimate in 2011).  We'll pretend AI research has only been going on since 2000, meaning 80 "standard research years" of progress have gone into the AGI's software.  So at the moment our shiny new AGI is fired up, u = 80, and it's doing research at the rate of one "human AGI community research year" per year, so du/dt = 1.  That's an effective rate of return on AI software progress of 1 / 80 = 1.25%, giving a software quality doubling time of around 55 years.
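The arithmetic can be reproduced in a few lines of Python. This sketch takes u = 80 and du/dt = 1 from the paragraph above and computes the continuous-growth doubling time ln 2 / s, which comes out a little shorter than a rule-of-72 estimate; the variable names are illustrative.

```python
import math

u = 80.0        # "standard research years" embodied in the software
du_dt = 1.0     # research years of progress per calendar year

s = du_dt / u                       # effective growth rate, i.e. r * h
doubling_time = math.log(2) / s     # continuous-growth doubling time

print(f"rate of return: {s:.2%}")
print(f"doubling time: {doubling_time:.1f} years")
```

Changing the assumed start date of AI research changes u and therefore the doubling time proportionally, so the result is only as good as that rather arbitrary choice.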

You could also apply this kind of thinking to individual AI projects.  For example, it's possible that at some point EURISKO was improving itself about as fast as Doug Lenat was improving it.  You might be able to do a similar calculation to take a stab at EURISKO's insight level doubling time.


The importance of hardware

According to my model, you double your AGI's intelligence, and thereby the speed with which your AGI improves itself, by doubling the hardware available for your AGI.  So if you had an AGI that was interesting, you could make it 4x as smart by giving it 4x the hardware.  If an AGI that was 4x as smart could get you 4x as much money (through impressing investors, or playing the stock market, or monopolizing additional industries), that'd be a nice feedback loop.  For maximum explosivity, put half your AGI's mind to the task of improving its software, and the other half to the task of making more money with which to buy more hardware.

But it seems pretty straightforward to prevent a non-superintelligent AI from gaining access to additional hardware with careful planning.  (Note: One problem with AI boxing experiments thus far is that all of the AIs have been played by human beings.  Human beings have innate understanding of human psychology and possess specialized capabilities for running emulations of one another.  It seems pretty easy to prevent an AGI from acquiring such understanding.  But there may exist box-breaking techniques that don't rely on understanding human psychology.  Another note about boxing: FAI requires getting everything perfect, which is a conjunctive calculation.  Given multiple safeguards, only one has to work for the box as a whole to work, which is a disjunctive calculation.)
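The conjunctive versus disjunctive point in the note above can be illustrated with a toy calculation. The sketch assumes the components and safeguards succeed or fail independently, which real systems may not, and the probabilities are invented for the example.

```python
def all_succeed(probabilities):
    """Conjunctive case: every component must work."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def at_least_one_succeeds(probabilities):
    """Disjunctive case: one working safeguard is enough."""
    all_fail = 1.0
    for p in probabilities:
        all_fail *= (1.0 - p)
    return 1.0 - all_fail


if __name__ == "__main__":
    # Five things that must all go right, each 90% likely to.
    print(all_succeed([0.9] * 5))
    # Five safeguards, each only 50% likely to hold.
    print(at_least_one_succeeds([0.5] * 5))
```

With these made-up numbers, five fairly reliable requirements succeed together less often than five coin-flip safeguards manage to produce at least one success.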


AGI's impact on the economy

Is it possible that the first group to create a successful AGI might begin monopolizing different sections of the economy?  Robin Hanson argues that technology insights typically leak between different companies, due to conferences and employee poaching.  But we can't be confident these factors would affect the research an AGI does on itself.  And if an AGI is still dumb enough that a significant portion of its software upgrades are coming from human researchers, it can hardly be considered superintelligent.

Given what looks like a winner-take-all dynamic, an important factor may be the number of serious AGI competitors.  If there are only two, the #1 company may not wish to trade insights with the #2 company for fear of losing its lead.  If there are more than two, all but the leading company might ally against the leading company in trading insights.  If their alliance is significantly stronger than the leading company, perhaps the leading company would wish to join their alliance.

But if AI is about getting lots of details right, as Hanson suggests, improvements may not even transfer between different AI architectures.


What should we do?

I've argued that soft takeoff is a strong possibility.  Should that change our strategy as people concerned with x-risk?

If we are basically screwed in the event that hard takeoff is possible, it may be that preparing for a soft takeoff is a better use of resources on the margin.  Shane Legg has proposed that people concerned with friendliness become investors in AGI projects so they can affect the outcome of any that seem to be succeeding.


Concluding thoughts

Expert forecasts are famously unreliable even in the relatively well-understood field of political forecasting.  So given the number of unknowns involved in the emergence of smarter-than-human intelligence, it's hard to say much with certainty.  Picture a few Greek scholars speculating on the industrial revolution.

I don't have a strong background in these topics, so I fully expect that the above essay will reveal my ignorance, which I'd appreciate your pointing out in the comments.  This essay should be taken as an attempt to hack away at the edges, not come to definitive conclusions.  As always, I reserve the right to change my mind about anything ;)

Arguing Orthogonality, published form

9 Stuart_Armstrong 18 March 2013 04:19PM

My paper "General purpose intelligence: arguing the Orthogonality thesis" has been accepted for publication in the December edition of Analysis and Metaphysics. Since that's some time away, I thought I'd put the final paper up here; the arguments are similar to those here, but this is the final version, for critique and citation purposes.

General purpose intelligence: arguing the Orthogonality thesis


Future of Humanity Institute, Oxford Martin School
Philosophy Department, University of Oxford


In his paper “The Superintelligent Will”, Nick Bostrom formalised the Orthogonality thesis: the idea that the final goals and intelligence levels of artificial agents are independent of each other. This paper presents arguments for a (narrower) version of the thesis. It proceeds through three steps. First it shows that superintelligent agents with essentially arbitrary goals can exist in our universe – both as theoretical impractical agents such as AIXI and as physically possible real-world agents. Then it argues that if humans are capable of building human-level artificial intelligences, we can build them with an extremely broad spectrum of goals. Finally it shows that the same result holds for any superintelligent agent we could directly or indirectly build. This result is relevant for arguments about the potential motivations of future agents: knowing an artificial agent is of high intelligence does not allow us to presume that it will be moral, we will need to figure out its goals directly.


Keywords: AI; Artificial Intelligence; efficiency; intelligence; goals; orthogonality


1 The Orthogonality thesis

Scientists and mathematicians are the stereotypical examples of high intelligence humans. But their morality and ethics have been all over the map. On modern political scales, they can be left- (Oppenheimer) or right-wing (von Neumann) and historically they have slotted into most of the political groupings of their period (Galois, Lavoisier). Ethically, they have ranged from very humanitarian (Darwin, Einstein outside of his private life), through amoral (von Braun) to commercially belligerent (Edison) and vindictive (Newton). Few scientists have been put in a position where they could demonstrate genuinely evil behaviour, but there have been a few of those (Teichmüller, Philipp Lenard, Ted Kaczynski, Shirō Ishii).


Population Ethics Shouldn't Be About Maximizing Utility

1 Ghatanathoah 18 March 2013 02:35AM

let me suggest a moral axiom with apparently very strong intuitive support, no matter what your concept of morality: morality should exist. That is, there should exist creatures who know what is moral, and who act on that. So if your moral theory implies that in ordinary circumstances moral creatures should exterminate themselves, leaving only immoral creatures, or no creatures at all, well that seems a sufficient reductio to solidly reject your moral theory.

-Robin Hanson

I agree strongly with the above quote, and I think most other readers will as well. It is good for moral beings to exist and a world with beings who value morality is almost always better than one where they do not. I would like to restate this more precisely as the following axiom: A population in which moral beings exist and have net positive utility, and in which all other creatures in existence also have net positive utility, is always better than a population where moral beings do not exist.

While the axiom that morality should exist is extremely obvious to most people, there is one strangely popular ethical system that rejects it: total utilitarianism. In this essay I will argue that total utilitarianism leads to what I will call the Genocidal Conclusion, which is that there are many situations in which it would be fantastically good for moral creatures to either exterminate themselves, or greatly limit their utility and reproduction in favor of the utility and reproduction of immoral creatures. I will argue that the main reason consequentialist theories of population ethics produce such obviously absurd conclusions is that they continue to focus on maximizing utility[1] in situations where it is possible to create new creatures. I will argue that pure utility maximization is only a valid ethical theory for "special case" scenarios where the population is static. I will propose an alternative theory for population ethics I call "ideal consequentialism" or "ideal utilitarianism" which avoids the Genocidal Conclusion and may also avoid the more famous Repugnant Conclusion.


I will begin my argument by pointing to a common problem in population ethics known as the Mere Addition Paradox (MAP) and the Repugnant Conclusion. Most Less Wrong readers will already be familiar with this problem, so I do not think I need to elaborate on it. You may also be familiar with an even stronger variation called the Benign Addition Paradox (BAP). This is essentially the same as the MAP, except that each time one adds more people one also gives a small amount of additional utility to the people who already existed. One then proceeds to redistribute utility between people as normal, eventually arriving at the huge population where everyone's lives are "barely worth living." The point of this is to argue that the Repugnant Conclusion can be arrived at from a "mere addition" of new people that not only doesn't harm the preexisting people, but actually benefits them.

The next step of my argument involves three slightly tweaked versions of the Benign Addition Paradox. I have not changed the basic logic of the problem, I have just added one small clarifying detail. In the original MAP and BAP it was not specified what sort of values the added individuals in population A+ held. Presumably one was meant to assume that they were ordinary human beings. In the versions of the BAP I am about to present, however, I will specify that the extra individuals added in A+ are not moral creatures, that if they have values at all they are values indifferent to, or opposed to, morality and the other values that the human race holds dear.

1. The Benign Addition Paradox with Paperclip Maximizers.

Let us imagine, as usual, a population, A, which has a large group of human beings living lives of very high utility. Let us then add a new population consisting of paperclip maximizers, each of whom is living a life barely worth living. Presumably, for a paperclip maximizer, this would be a life where the paperclip maximizer's existence results in at least one more paperclip in the world than there would have been otherwise.

Now, one might object that if one creates a paperclip maximizer, and then allows it to create one paperclip, the utility of the other paperclip maximizers will increase above the "barely worth living" level, which would obviously make this thought experiment non-analogous with the original MAP and BAP. To prevent this we will assume that each paperclip maximizer that is created has slightly different values regarding the ideal size, color, and composition of the paperclips it is trying to produce. So the Purple 2 centimeter Plastic Paperclip Maximizer gains no additional utility when the Silver Iron 1 centimeter Paperclip Maximizer makes a paperclip.

So again, let us add these paperclip maximizers to population A, and in the process give one extra utilon of utility to each preexisting person in A. This is a good thing, right? After all, everyone in A benefited, and the paperclippers get to exist and make paperclips. So clearly A+, the new population, is better than A.

Now let's take the next step, the transition from population A+ to population B. Take some of the utility from the human beings and convert it into paperclips. This is a good thing, right?

So let us repeat these steps, adding paperclip maximizers and utility, and then redistributing utility. Eventually we reach population Z, where there is a vast number of paperclip maximizers, a vast number of many different kinds of paperclips, and a small number of human beings living lives barely worth living.

Obviously Z is better than A, right? We should not fear the creation of a paperclip maximizing AI, but welcome it! Forget about things like high challenge, love, interpersonal entanglement, complex fun, and so on! Those things just don't produce the kind of utility that paperclip maximization has the potential to do!

Or maybe there is something seriously wrong with the moral assumptions behind the Mere Addition and Benign Addition Paradoxes.
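To see why a straight sum of utility ranks Z above A, here is a toy calculation in Python. All the head counts and utility levels are hypothetical numbers chosen only to show the structure of the argument, and the function name is made up.

```python
def total_utility(population):
    """Sum utility over groups given as (head_count, utility_per_individual)."""
    return sum(count * utility for count, utility in population)


# Hypothetical numbers, chosen only to illustrate the argument.
population_a = [(1_000, 100.0)]                  # humans with very good lives
population_z = [(1_000, 1.0), (1_000_000, 1.0)]  # humans plus paperclip maximizers,
                                                 # all with lives barely worth living

if __name__ == "__main__":
    print(total_utility(population_a), total_utility(population_z))
```

Because the sum only counts utility and head count, any sufficiently large number of barely-positive lives outweighs A, regardless of what kind of creature is living them.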

But you might argue that I am using an unrealistic example. Creatures like Paperclip Maximizers may be so far removed from normal human experience that we have trouble thinking about them properly. So let's replay the Benign Addition Paradox again, but with creatures we might actually expect to meet in real life, and we know we actually value.

2. The Benign Addition Paradox with Non-Sapient Animals

You know the drill by now. Take population A, add a new population to it, while very slightly increasing the utility of the original population. This time let's have it be some kind of animal that is capable of feeling pleasure and pain, but is not capable of modeling possible alternative futures and choosing between them (in other words, it is not capable of having "values" or being "moral"). A lizard or a mouse, for example. Each one feels slightly more pleasure than pain in its lifetime, so it can be said to have a life barely worth living. Convert A+ to B. Take the utilons that the human beings are using to experience things like curiosity, beatitude, wisdom, beauty, harmony, morality, and so on, and convert them into pleasure for the animals.

We end up with population Z, with a vast number of mice or lizards with lives just barely worth living, and a small number of human beings with lives barely worth living. Terrific! Why do we bother creating humans at all? Let's just create tons of mice and inject them full of heroin! It's a much more efficient way to generate utility!

3. The Benign Addition Paradox with Sociopaths

What new population will we add to A this time? How about some other human beings, who all have anti-social personality disorder? True, they lack the key, crucial value of sympathy that defines so much of human behavior. But they don't seem to miss it. And their lives are barely worth living, so obviously A+ has greater utility than A. If given a chance the sociopaths will reduce the utility of other people to negative levels, but let's assume that that is somehow prevented in this case.

Eventually we get to Z, with a vast population of sociopaths and a small population of normal human beings, all living lives just barely worth living. That has more utility, right? True, the sociopaths place no value on things like friendship, love, compassion, empathy, and so on. And true, the sociopaths are immoral beings who do not care in the slightest about right and wrong. But what does that matter? Utility is being maximized, and surely that is what population ethics is all about!


Let's suppose an asteroid is approaching each of the three population Zs discussed before. It can only be deflected by so much. Your choice is: save the original population of humans from A, or save the vast new population. The choice is obvious. In 1, 2, and 3, each individual has the same level of utility, so obviously we should choose whichever option saves the greater number of individuals.

Bam! The asteroid strikes. The end result in all three scenarios is a world in which all the moral creatures are destroyed. It is a world without the many complex values that human beings possess. Each world, for the most part, lacks things like complex challenge, imagination, friendship, empathy, love, and the other complex values that human beings prize. But so what? The purpose of population ethics is to maximize utility, not silly, frivolous things like morality, or the other complex values of the human race. That means that any form of utility that is easier to produce than those values is obviously superior. It's easier to make pleasure and paperclips than it is to make eudaemonia, so that's the form of utility that ought to be maximized, right? And as for making sure moral beings exist, well that's just ridiculous. The valuable processing power they're using to care about morality could be being used to make more paperclips or more mice injected with heroin! Obviously it would be better if they died off, right?

I'm going to go out on a limb and say "Wrong."

Is this realistic?

Now, to be fair, in the Overcoming Bias page I quoted, Robin Hanson also says:

I’m not saying I can’t imagine any possible circumstances where moral creatures shouldn’t die off, but I am saying that those are not ordinary circumstances.

Maybe the scenarios I am proposing are just too extraordinary. But I don't think this is the case. I imagine that the circumstances Robin had in mind were probably something like "either all moral creatures die off, or all moral creatures are tortured 24/7 for all eternity."

Any purely utility-maximizing theory of population ethics that counts both the complex values of human beings, and the pleasure of animals, as "utility" should inevitably draw the conclusion that human beings ought to limit their reproduction to the bare minimum necessary to maintain the infrastructure to sustain a vast population of non-human animals (preferably animals dosed with some sort of pleasure-causing drug). And if some way is found to maintain that infrastructure automatically, without the need for human beings, then the logical conclusion is that human beings are a waste of resources (as are chimps, gorillas, dolphins, and any other animal that is even remotely capable of having values or morality). Furthermore, even if the human race cannot practically be replaced with automated infrastructure, this should be an end result that the adherents of this theory should be yearning for.[2] There should be much wailing and gnashing of teeth among moral philosophers that exterminating the human race is impractical, and much hope that someday in the future it will not be.

I call this the "Genocidal Conclusion" or "GC." On the macro level the GC manifests as the idea that the human race ought to be exterminated and replaced with creatures whose preferences are easier to satisfy. On the micro level it manifests as the idea that it is perfectly acceptable to kill someone who is destined to live a perfectly good and worthwhile life and replace them with another person who would have a slightly higher level of utility.

Population Ethics isn't About Maximizing Utility

I am going to make a rather radical proposal. I am going to argue that the consequentialist's favorite maxim, "maximize utility," only applies to scenarios where creating new people or creatures is off the table. I think we need an entirely different ethical framework to describe what ought to be done when it is possible to create new people. I am not by any means saying that "which option would result in more utility" is never a morally relevant consideration when deciding to create a new person, but I definitely think it is not the only one.3

So what do I propose as a replacement to utility maximization? I would argue in favor of a system that promotes a wide range of ideals. Doing some research, I discovered that G. E. Moore had in fact proposed a form of "ideal utilitarianism" in the early 20th century.4 However, I think that "ideal consequentialism" might be a better term for this system, since it isn't just about aggregating utility functions.

What are some of the ideals that an ideal consequentialist theory of population ethics might seek to promote? I've already hinted at what I think they are: Life, consciousness, and activity; health and strength; pleasures and satisfactions of all or certain kinds; happiness, beatitude, contentment, etc.; truth; knowledge and true opinions of various kinds, understanding, wisdom... mutual affection, love, friendship, cooperation; all those other important human universals, plus all the stuff in the Fun Theory Sequence. When considering what sort of creatures to create we ought to create creatures that value those things. Not necessarily all of them, or in the same proportions, for diversity is an important ideal as well, but they should value a great many of those ideals.

Now, lest you worry that this theory has any totalitarian implications, let me make it clear that I am not saying we should force these values on creatures that do not share them. Forcing a paperclip maximizer to pretend to make friends and love people does not do anything to promote the ideals of Friendship and Love. Forcing a chimpanzee to listen while you read the Sequences to it does not promote the values of Truth and Knowledge. Those ideals require both a subjective and objective component. The only way to promote those ideals is to create a creature that includes them as part of its utility function and then help it maximize its utility.

I am also certainly not saying that there is never any value in creating a creature that does not possess these values. There are obviously many circumstances where it is good to create nonhuman animals. There may even be some circumstances where a paperclip maximizer could be of value. My argument is simply that it is most important to make sure that creatures who value these various ideals exist.

I am also not suggesting that it is morally acceptable to casually inflict horrible harms upon a creature with non-human values if we screw up and create one by accident. If promoting ideals and maximizing utility are separate values then it may be that once we have created such a creature we have a duty to make sure it lives a good life, even if it was a bad thing to create it in the first place. You can't unbirth a child.5

It also seems to me that in addition to having ideals about what sort of creatures should exist, we also have ideals about how utility ought to be concentrated. If this is the case then ideal consequentialism may be able to block some forms of the Repugnant Conclusion, even in situations where the only creatures whose creation is being considered are human beings. If it is acceptable to create humans instead of paperclippers, even if the paperclippers would have higher utility, it may also be acceptable to create ten humans with a utility of ten each instead of a hundred humans with a utility of 1.01 each.
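The arithmetic behind that last comparison can be made concrete with a minimal Python sketch. The function and variable names are illustrative, and the average-utility ranking is shown only as one example of a non-total way of ranking worlds; it is an assumption for illustration, not a claim about what ideal consequentialism requires.

```python
from statistics import mean

# The two worlds from the example above.
ten_humans = [10.0] * 10        # ten humans with a utility of ten each
hundred_humans = [1.01] * 100   # a hundred humans with a utility of 1.01 each


def total_utility(world):
    """Sum of everyone's utility: the quantity total utilitarianism maximizes."""
    return sum(world)


def average_utility(world):
    """Mean utility per person: one illustrative alternative ranking."""
    return mean(world)


# Total utilitarianism prefers the larger, barely-better-than-nothing world
# (roughly 101 versus 100).
print(total_utility(hundred_humans) > total_utility(ten_humans))      # True

# A ranking that cares how utility is concentrated prefers the smaller world.
print(average_utility(ten_humans) > average_utility(hundred_humans))  # True
```

The point is only that the two worlds are ranked differently depending on which quantity one looks at; the totals differ by about one unit of utility, while the per-person figures differ by almost a factor of ten.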

Why Did We Become Convinced that Maximizing Utility was the Sole Good?

Population ethics was, until comparatively recently, a fallow field in ethics. And in situations where there is no option to increase the population, maximizing utility is the only consideration that's really relevant. If you've created creatures that value the right ideals, then all that is left to be done is to maximize their utility. If you've created creatures that do not value the right ideals, there is no value to be had in attempting to force them to embrace those ideals. As I've said before, you will not promote the values of Love and Friendship by creating a paperclip maximizer and forcing it to pretend to love people and make friends.

So in situations where the population is constant, "maximize utility" is a decent approximation of the meaning of right. It's only when the population can be added to that morality becomes much more complicated.

Another thing to blame is human-centric reasoning. When people defend the Repugnant Conclusion they tend to point out that a life barely worth living is not as bad as it would seem at first glance. They emphasize that it need not be a boring life, it may be a life full of ups and downs where the ups just barely outweigh the downs. A life worth living, they say, is a life one would choose to live. Derek Parfit developed this idea to some extent by arguing that there are certain values that are "discontinuous" and that one needs to experience many of them in order to truly have a life worth living.

The Orthogonality Thesis throws all these arguments out the window. It is possible to create an intelligence to execute any utility function, no matter what it is. If human beings have all sorts of complex needs that must be fulfilled in order for them to lead worthwhile lives, then you could create more worthwhile lives by killing the human race and replacing them with something less finicky. Maybe happy cows. Maybe paperclip maximizers. Or how about some creature whose only desire is to live for one second and then die. If we created such a creature and then killed it we would reap huge amounts of utility, for we would have created a creature that got everything it wanted out of life!

How Intuitive is the Mere Addition Principle, Really?

I think most people would agree that morality should exist, and that therefore any system of population ethics should not lead to the Genocidal Conclusion. But which step in the Benign Addition Paradox should we reject? We could reject the step where utility is redistributed. But that seems wrong: most people seem to consider it bad for animals and sociopaths to suffer, and to think that it is acceptable to inflict at least some amount of disutility on human beings to prevent such suffering.

It seems more logical to reject the Mere Addition Principle. In other words, maybe we ought to reject the idea that the mere addition of more lives-worth-living cannot make the world worse. And in turn, we should probably also reject the Benign Addition Principle. Adding more lives-worth-living may be capable of making the world worse, even if doing so also slightly benefits existing people. Fortunately this isn't a very hard principle to reject. While many moral philosophers treat it as obviously correct, nearly everyone else rejects this principle in day-to-day life.

Now, I'm obviously not saying that people's behavior in their day-to-day lives is always good, it may be that they are morally mistaken. But I think the fact that so many people seem to implicitly reject it provides some sort of evidence against it.

Take people's decision to have children. Many people choose to have fewer children than they otherwise would because they do not believe they will be able to adequately care for them, at least not without inflicting large disutilities on themselves. If most people accepted the Mere Addition Principle there would be a simple solution for this: have more children and then neglect them! True, the children's lives would be terrible while they were growing up, but once they've grown up and are on their own there's a good chance they may be able to lead worthwhile lives. Not only that, it may be possible to trick the welfare system into giving you money for the children you neglect, which would satisfy the Benign Addition Principle.

Yet most people choose not to have children and neglect them. Furthermore, they seem to think that they have a moral duty not to do so, and that a world where they choose not to have neglected children is better than one where they do. What is wrong with them?

Another example is a common political view many people have. Many people believe that impoverished people should have fewer children because of the burden doing so would place on the welfare system. They also believe that it would be bad to get rid of the welfare system altogether. If the Benign Addition Principle were as obvious as it seems, they would instead advocate for the abolition of the welfare system, and encourage impoverished people to have more children. Assuming most impoverished people live lives worth living, this is exactly analogous to the BAP: it would create more people, while benefiting existing ones (the people who pay lower taxes because of the abolition of the welfare system).

Yet again, most people choose to reject this line of reasoning. The BAP does not seem to be an obvious and intuitive principle at all.

The Genocidal Conclusion is Really Repugnant

There is almost nothing more repugnant than the Genocidal Conclusion. Pretty much the only way a line of moral reasoning could go more wrong would be concluding that we have a moral duty to cause suffering, as an end in itself. This means that it's fairly easy to counter any argument in favor of total utilitarianism that argues the alternative I am promoting has odd conclusions that do not fit some of our moral intuitions, while total utilitarianism does not. Is that conclusion more insane than the Genocidal Conclusion? If it isn't, total utilitarianism should still be rejected.

Ideal Consequentialism Needs a Lot of Work

I do think that Ideal Consequentialism needs some serious ironing out. I haven't really developed it into a logical and rigorous system, at this point it's barely even a rough framework. There are many questions that stump me. In particular I am not quite sure what population principle I should develop. It's hard to develop one that rejects the MAP without leading to weird conclusions, like that it's bad to create someone of high utility if a population of even higher utility existed long ago. It's a difficult problem to work on, and it would be interesting to see if anyone else had any ideas.

But just because I don't have an alternative fully worked out doesn't mean I can't reject Total Utilitarianism. It leads to the conclusion that a world with no love, curiosity, complex challenge, friendship, morality, or any other value the human race holds dear is an ideal, desirable world, if there is a sufficient amount of some other creature with a simpler utility function. Morality should exist, and because of that, total utilitarianism must be rejected as a moral system.


1I have been asked to note that when I use the phrase "utility" I am usually referring to a concept that is called "E-utility," rather than the Von Neumann-Morgenstern utility that is sometimes discussed in decision theory. The difference is that in VNM one's moral views are included in one's utility function, whereas in E-utility they are not. So if one chooses to harm oneself to help others because one believes that is morally right, one has higher VNM utility, but lower E-utility.

2There is a certain argument against the Repugnant Conclusion that goes that, as the steps of the Mere Addition Paradox are followed the world will lose its last symphony, its last great book, and so on. I have always considered this to be an invalid argument because the world of the RC doesn't necessarily have to be one where these things don't exist, it could be one where they exist, but are enjoyed very rarely. The Genocidal Conclusion brings this argument back in force. Creating creatures that can appreciate symphonies and great books is very inefficient compared to creating bunny rabbits pumped full of heroin.

3Total Utilitarianism was originally introduced to population ethics as a possible solution to the Non-Identity Problem. I certainly agree that such a problem needs a solution, even if Total Utilitarianism doesn't work out as that solution.

4I haven't read a lot of Moore, most of my ideas were extrapolated from other things I read on Less Wrong. I just mentioned him because in my research I noticed his concept of "ideal utilitarianism" resembled my ideas. While I do think he was on the right track he does commit the Mind Projection Fallacy a lot. For instance, he seems to think that one could promote beauty by creating beautiful objects, even if there were no creatures with standards of beauty around to appreciate them. This is why I am careful to emphasize that to promote ideals like love and beauty one must create creatures capable of feeling love and experiencing beauty.

5My tentative answer to the question Eliezer poses in "You Can't Unbirth a Child" is that human beings may have a duty to allow the cheesecake maximizers to build some amount of giant cheesecakes, but they would also have a moral duty to limit such creatures' reproduction in order to spare resources to create more creatures with humane values.

EDITED: To make a point about ideal consequentialism clearer, based on AlexMennen's criticisms.

Risks of downloading alien AI via SETI search

7 turchin 15 March 2013 10:25AM

Alexei Turchin. Risks of downloading alien AI via SETI search

Abstract: This article examines risks associated with the program of passive search for alien signals (SETI, the Search for Extra-Terrestrial Intelligence). In this paper we propose a scenario of possible vulnerability and discuss the reasons why the proportion of dangerous signals to harmless ones can be dangerously high. This article does not propose to ban SETI programs, and does not insist on the inevitability of SETI-triggered disaster. Moreover, it discusses how SETI could be a salvation for mankind.

The idea that passive SETI can be dangerous is not new. Fred Hoyle suggested in the story "A for Andromeda" a scheme of alien attack through SETI signals. According to the plot, astronomers receive an alien signal, which contains a description of a computer and a computer program for it. This machine creates a description of a genetic code which leads to the creation of an intelligent creature, a girl dubbed Andromeda, who, working together with the computer, creates advanced technology for the military. The initial suspicion of alien intent is overcome by greed for the technology the aliens can provide. However, the main characters realize that the computer acts in a manner hostile to human civilization and destroy the computer, and the girl dies.

This scenario is fiction, first because most scientists do not believe in the possibility of strong AI, and secondly because we do not have the technology to synthesize a new living organism solely from its genetic code. Or at least, we did not until recently. Current sequencing and DNA synthesis technology, as well as progress in developing DNA with an extended alphabet, indicate that in 10 years the task of re-creating a living being from computer code sent from space might be feasible.

Hans Moravec, in the book "Mind Children" (1988), describes a similar type of vulnerability: downloading from space via SETI a computer program that has artificial intelligence, promises new opportunities to its owner, and, after fooling the human host, self-replicates into millions of copies and destroys the host, finally using the resources of the captured planet to send its 'child' copies to multiple planets that constitute its future prey. Such a strategy would be like that of a virus or a digger wasp: horrible, but plausible. R. Carrigan's ideas point in the same direction; he wrote an article, "SETI-hacker", expressing fears that unfiltered signals from space are loaded onto millions of unsecured computers of the SETI@home program. But he met tough criticism from programmers who pointed out that, first, data and programs are kept in separate regions of a computer, and secondly, the machine codes in which programs are written are so specific that it is impossible to guess their structure well enough to hack them blindly (without prior knowledge).

After a while Carrigan issued a second article, "Should potential SETI signals be decontaminated?" http://home.fnal.gov/~carrigan/SETI/SETI%20Decon%20Australia%20poster%20paper.pdf, which I've translated into Russian. In it, he pointed to the ease of transferring gigabytes of data over interstellar distances, and also indicated that an interstellar signal may contain some kind of bait that will encourage people to assemble a dangerous device according to the designs. Here Carrigan did not give up his belief in the possibility that an alien virus could directly infect Earth's computers without human 'translation' assistance. (We may note with passing alarm that the prevalence of humans obsessed with death, as Fred Saberhagen pointed out in his idea of 'goodlife', means that we cannot entirely discount the possibility of demented 'volunteers': human traitors eager to assist such a fatal invasion.) As a possible confirmation of this idea, Carrigan has shown that it is possible to reverse engineer the language of a computer program easily; that is, based on the text of the program it is possible to guess what it does, and then recover the meaning of its operators.

In 2006, E. Yudkowsky wrote an article, "Artificial Intelligence as a Positive and Negative Factor in Global Risk", in which he argued that a rapidly evolving universal artificial intelligence is very likely possible, that its high intelligence would be extremely dangerous if it were programmed incorrectly, and, finally, that the likelihood of such an AI and the risks associated with it are significantly undervalued. In addition, Yudkowsky introduced the notion of "Seed AI" (embryo AI): a minimal program capable of runaway self-improvement with an unchanged primary goal. The size of a Seed AI can be on the order of hundreds of kilobytes. (For example, a typical representative of Seed AI is a human baby: the part of the genome responsible for the brain represents ~3% of a person's total genes, which have a volume of 500 megabytes, giving 15 megabytes, and given the share of junk DNA it is even less.)

To begin, let us assume that somewhere in the Universe there is an extraterrestrial civilization that intends to send a message enabling it to obtain power over Earth, and consider this scenario. In the next chapter we will consider how realistic it is that another civilization would want to send such a message.

First, we note that in order to prove vulnerability, it is enough to find just one hole in security. However, in order to prove safety, you must rule out every possible hole. The complexity of these two tasks differs by many orders of magnitude, as is well known to experts on computer security. This distinction has led to the fact that almost all computer systems have been broken (from Enigma to the iPod). I will now try to demonstrate one possible, and even, in my view, likely, vulnerability of the SETI program. However, I want to caution the reader against thinking that finding errors in my discussion automatically proves the safety of the SETI program. Secondly, I would also like to draw the reader's attention to the fact that I am a man with an IQ of 120 who spent all of a month thinking about the vulnerability problem. It would not take an alien super-civilization with an IQ of 1000000 and millions of years of contemplation time to significantly improve this algorithm; we have no real idea what an IQ of 300, or even a mere IQ of 100 with much larger mental 'RAM' (the ability to load a major architectural task into mind and keep it there for weeks while processing), could accomplish in finding a much simpler and more effective way. Finally, I propose one possible algorithm, and then we will briefly discuss the other options.

In our discussions we will draw on the Copernican principle, that is, the belief that we are ordinary observers in normal situations. Therefore, the Earth’s civilization is an ordinary civilization developing normally. (Readers of tabloid newspapers may object!)

Algorithm of SETI attack

1. The sender creates a kind of signal beacon in space, which makes it clear that its message is artificial. For example, this may be a star with a Dyson sphere that has holes or mirrors, alternately opened and closed. The entire star will then blink with a period of a few minutes; faster is not possible because of the differing distances between the openings. (Even when synchronized with an atomic clock according to a rigid schedule, the speed-of-light limit constrains the speed and reaction time of coordinating large-scale systems.) Nevertheless, this beacon can be seen at a distance of millions of light years. Other types of lighthouse are possible; the important fact is that the beacon signal can be seen at long distances.

2. Nearer to Earth is a radio beacon with a much weaker signal, but one more saturated with information. The lighthouse draws attention to this radio source. This source produces a stream of binary information (i.e. a sequence of 0s and 1s). To the objection that the information would contain noise, I note that the most obvious (understandable to the recipient's side) means of reducing noise is simple repetition of the signal in a loop.
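As a rough illustration of why looping the signal helps against noise, here is a minimal Python sketch of per-bit majority voting over several received copies. The function name and the sample bit strings are hypothetical, and the sketch assumes the receiver has already aligned the repetitions so that copies start at the same bit; it is not a claim about how a real SETI receiver works.

```python
def majority_decode(copies):
    """Recover a bit string from several noisy, aligned repetitions.

    Each position is decided by a simple majority vote across the copies.
    With an odd number of copies there are no ties; on a tie this sketch
    arbitrarily chooses "0".
    """
    if not copies:
        raise ValueError("need at least one received copy")
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("all copies must have the same length")

    decoded = []
    for bits in zip(*copies):          # the same position in every copy
        ones = bits.count("1")
        decoded.append("1" if 2 * ones > len(bits) else "0")
    return "".join(decoded)


# Three receptions of the same five-bit message, each with one flipped bit
# in a different position.
received = ["10010", "11110", "10111"]
print(majority_decode(received))  # 10110
```

The more repetitions the receiver collects, the less likely it is that noise corrupts the same bit in a majority of them, which is why repetition is a scheme a recipient can exploit without any prior agreement with the sender.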

3. The simplest way to convey meaningful information using a binary signal is to send images. First, because eye structures appeared independently 7 times in the Earth's biological diversity, the representation of a three-dimensional world with 2D images is probably universal, and is almost certainly understandable to all creatures who can build a radio receiver.

4. Secondly, 2D images are not too difficult to encode in binary signals. To do so, let us use the same system that was used in the first TV cameras, namely progressive line-by-line scanning at a fixed frame rate. At the end of each line of the image a bright-light signal is placed, repeated after each line, that is, after an equal number of bits. Finally, at the end of each frame another signal is placed indicating the end of the frame, repeated after each frame. (The frames may or may not form a continuous film.) This may look like this:

01010111101010 11111111111111111

01111010111111 11111111111111111

11100111100000 11111111111111111

Here the end-of-line signal appears every 25 units. The end-of-frame signal may appear, for example, every 625 units.
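A recipient could recover the picture rows by searching for the line length at which the end-of-line marker recurs. The following Python sketch does this for the three sample lines as printed above (data bits followed by a run of 1s). The function names are hypothetical, and the sketch assumes the recipient has already guessed that the marker is a fixed-length run of 1s; it illustrates the idea rather than a definitive decoding procedure.

```python
# The three sample lines from the text, concatenated into one bit stream.
MARKER = "11111111111111111"   # end-of-line run of 1s, copied from the sample
SAMPLE_ROWS = ["01010111101010", "01111010111111", "11100111100000"]
STREAM = "".join(row + MARKER for row in SAMPLE_ROWS)


def find_line_length(stream, marker, max_len=64):
    """Return the smallest line length for which every line ends in the marker."""
    for line_len in range(len(marker) + 1, max_len + 1):
        if len(stream) % line_len != 0:
            continue  # the stream must split into whole lines
        lines = [stream[i:i + line_len] for i in range(0, len(stream), line_len)]
        if len(lines) >= 2 and all(line.endswith(marker) for line in lines):
            return line_len
    return None


def decode_rows(stream, marker, max_len=64):
    """Split the stream into image rows and strip the end-of-line marker."""
    line_len = find_line_length(stream, marker, max_len)
    if line_len is None:
        raise ValueError("no consistent line length found")
    data_len = line_len - len(marker)
    return [stream[i:i + data_len] for i in range(0, len(stream), line_len)]


for row in decode_rows(STREAM, MARKER):
    # Print 1s as filled cells so the rows can be viewed as a tiny picture.
    print(row.replace("1", "#").replace("0", "."))
```

The same search could be repeated one level up to find the end-of-frame marker, treating whole lines as the units instead of single bits.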

5. Clearly, a sender civilization should be extremely interested in our understanding its signals. On the other hand, people will have an extreme desire to decrypt the signal. Therefore, there is no doubt that the pictures will be recognized.

6. Images and movies can convey a lot of information; they can even teach the senders' language and show their world. Obviously, one can argue about how understandable such films would be. Here, we will focus on the fact that if one civilization sends radio signals and another receives them, then they have some shared knowledge. Namely, they both know radio engineering: they know transistors, capacitors, and resistors. These radio parts are typical enough that they can be easily recognized in photographs. (For example, parts shown in cutaway view and in sequential assembly stages, or in an electrical schematic whose connections indicate the nature of the components involved.)

7. By sending photos depicting radio parts on the right side and their symbols on the left, it is easy to convey a set of signs for drawing electrical circuits. (The logic elements of computers could be conveyed in roughly the same way.)

8. Then, using these symbols, the sender civilization transmits blueprints of its simplest computer. The simplest computer from the hardware point of view is the Post machine. It has only 6 commands and a data tape. Its full electrical schematic will contain only a few tens of transistors or logic elements. It is not difficult to send blueprints of a Post machine.

9. It is important to note that all computers are Turing-compatible (in standard terminology, Turing-complete) at the level of algorithms. That means that extraterrestrial computers are, at the basic level, compatible with any Earth computer. Turing-compatibility is a mathematical universal, like the Pythagorean theorem. Even Babbage's Analytical Engine, a mechanical machine designed in the 19th century, was Turing-compatible.

10. Then the sender civilization begins to transmit programs for that machine. Although the computer is very simple, it can run a program of any complexity, though it will take very long compared with running the same program on a more complex computer. It is unlikely that people will need to build this computer physically. They can easily emulate it within any modern computer, so that it will be able to perform trillions of operations per second, and even the most complex program will run on it quite quickly. (A possible interim step: the primitive computer outputs a description of a more complex and faster computer, which is then built and used to run the programs.)
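To show how little is needed to emulate such a machine, here is a minimal Python sketch of a Post machine interpreter. It assumes one common formulation of the six commands (move right, move left, mark the cell, erase the cell, branch on whether the cell is marked, stop); the tuple encoding of instructions and the function name are hypothetical choices made for this illustration, not anything specified in the article.

```python
def run_post_machine(program, marked=(), head=0, max_steps=10_000):
    """Emulate a Post machine on an unbounded tape of marked/blank cells.

    program: list of instructions, indexed from 0. Each instruction is
        ("right", next), ("left", next), ("mark", next), ("erase", next),
        ("branch", next_if_marked, next_if_blank), or ("stop",).
    marked: positions of the cells that start out marked.
    Returns (sorted list of marked cells, final head position).
    """
    tape = set(marked)
    pc = 0  # index of the instruction to execute next
    for _ in range(max_steps):
        op, *targets = program[pc]
        if op == "stop":
            return sorted(tape), head
        if op == "branch":
            pc = targets[0] if head in tape else targets[1]
            continue
        if op == "right":
            head += 1
        elif op == "left":
            head -= 1
        elif op == "mark":
            tape.add(head)
        elif op == "erase":
            tape.discard(head)
        else:
            raise ValueError(f"unknown command: {op!r}")
        pc = targets[0]
    raise RuntimeError("step limit reached without stopping")


# Example program: add one to a number written in unary (a row of marks).
increment = [
    ("branch", 1, 2),  # 0: on a marked cell keep scanning, on a blank go to 2
    ("right", 0),      # 1: step right and test again
    ("mark", 3),       # 2: mark the first blank cell after the number
    ("stop",),         # 3: halt
]

cells, position = run_post_machine(increment, marked=[0, 1, 2])
print(cells)     # [0, 1, 2, 3]
print(position)  # 3
```

The whole interpreter is a few dozen lines, which is consistent with the claim that recipients would emulate the machine in software rather than build it. The step limit is a safeguard added in this sketch, since a received program need not ever stop.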

11. So why would people create this computer and run its program? Presumably, in addition to the computer schematics and programs, the communication must contain some kind of "bait" that would lead people to create such an alien computer, run programs on it, and provide it with data about the external world (Earth outside the computer). There are two general kinds of bait, temptations and dangers:

a). For example, perhaps people receive the following offer; let's call it "The humanitarian aid con (deceit)". Senders of an "honest signal" SETI message warn that the sent program is an artificial intelligence, but lie about its goals. That is, they claim that it is a "gift" which will help us to solve all medical and energy problems. But it is a Trojan horse of most malevolent intent. It is too useful not to use. Eventually it becomes indispensable. And then, exactly when society becomes dependent upon it, the foundation of society, and society itself, is overturned…

b). "The temptation of absolute power con": in this scenario, the senders offer a specific deal to the recipients of the message, promising power over other recipients. This begins a 'race to the bottom' that leads to runaway betrayals and power-seeking counter-moves, ending with a world dictatorship, or worse, a destroyed world dictatorship on an empty world…

c). "Unknown threat con": in this scenario the bait senders report that a certain threat hangs over humanity, for example from another, hostile civilization, and that to protect ourselves we should join the putative "Galactic Alliance" and build a certain installation. Or, for example, they suggest performing a certain class of physical experiments on an accelerator and sending this message on to others in the Galaxy (like a chain letter). And we should send this message before we ignite the accelerator, please…

d). "Tireless researcher con": here the senders argue that sending messages is the cheapest way to explore the world. They ask us to create an AI that will study our world and send the results back. It does rather more than that, of course…

12. However, the main threat from alien messages with executable code is not the bait itself, but the fact that the message can become known to a large number of independent groups of people. First, there will always be someone who is more susceptible to the bait. Secondly, suppose the world learns that an alien message emanates from the Andromeda galaxy, and that the Americans have already received it and may be trying to decipher it. Of course, all other countries will then rush to build radio telescopes and point them at the Andromeda galaxy, afraid of missing a "strategic advantage". They will find the message and see that it contains a proposal to grant omnipotence to those willing to collaborate. They will not know whether the Americans have taken advantage of it or not, even if the Americans swear that they have not run the malicious code and beg others not to do so. Moreover, such oaths and appeals will be perceived as a sign that the Americans have already received an incredible extraterrestrial advantage and are trying to deprive "progressive mankind" of it. While most will understand the danger of launching alien code, someone will be willing to risk it. Moreover, there will be a game in the spirit of "winner takes all", as there will be in the case of AI development, as Yudkowsky shows in detail. So it is not the bait that is dangerous, but the plurality of recipients. If the alien message is posted to the Internet (and its size, sufficient to run a Seed AI, can be less than a gigabyte, including the description of the computer, the program, and the bait), we have a classic example of "knowledge" of mass destruction, as Bill Joy put it when referring to published genomes of dangerous biological viruses.
If alien-sent code is available to tens of thousands of people, then someone will start it even without any bait, out of simple curiosity. We can't count on existing SETI protocols, because the discussion on METI (sending messages to extraterrestrials) has shown that the SETI community is not monolithic on important questions. Even the simple fact that something was found could leak and encourage searches by outsiders. And the coordinates of the point in the sky would be enough.

13. Since people don't have AI, we almost certainly greatly underestimate its power and overestimate our ability to control it. The common idea is that "it is enough to pull the power cord to stop an AI", or to place it in a black box, to avoid any associated risks. Yudkowsky shows that an AI can deceive us as an adult deceives a child. If the AI gets onto the Internet, it can quickly subdue it as a whole, and also learn everything it needs about life on Earth. "Quickly" means hours or days at most. Then the AI can create advanced nanotechnology and buy components and raw materials (on the Internet, it can easily make money and order goods with delivery, as well as recruit people to receive them, following the instructions of their well-paying but 'unseen employer', not knowing who, or rather what, they are serving). Yudkowsky describes one possible scenario for this stage in detail and estimates that an AI needs only weeks to crack any security and get its own physical infrastructure.

"Consider, for clarity, one possible scenario, in which Alien AI (AAI) can seize power on the Earth. Assume that it promises immortality to anyone who creates a computer on the blueprints sent to him and start the program with AI on that computer. When the program starts, it says: "OK, buddy, I can make you immortal, but for this I need to know on what basis your body works. Provide me please access to your database. And you connect the device to the Internet, where it was gradually being developed and learns what it needs and peculiarities of human biology. (Here it is possible for it escape to the Internet, but we omit details since this is not the main point) Then the AAI says: "I know how you become biologically immortal. It is necessary to replace every cell of your body with nanobiorobot. And fortunately, in the biology of your body there is almost nothing special that would block bio-immorality.. Many other organisms in the universe are also using DNA as a carrier of information. So I know how to program the DNA so as to create genetically modified bacteria that could perform the functions of any cell. I need access to the biological laboratory, where I can perform a few experiments, and it will cost you a million of your dollars." You rent a laboratory, hire several employees, and finally the AAI issues a table with its' solution of custom designed DNA, which are ordered in the laboratory by automated machine synthesis of DNA. http://en.wikipedia.org/wiki/DNA_sequencing Then they implant the DNA into yeast, and after several unsuccessful experiments they create a radio guided bacteria (shorthand: This is not truly a bacterium, since it appears all organelles and nucleus; also 'radio' is shorthand for remote controlled; a far more likely communication mechanism would be modulated sonic impulses) , which can synthesize a new DNA-based code based on commands from outside. 
Now the AAI has achieved independence from human 'filtering' of its' true commands, because the bacterium has in effect its own remote controlled sequencers (self-reproducing to boot!). Now the AAI can transform and synthesize substances ostensibly introduced into test tubes for a benign test, and use them for a malevolent purpose., Obviously, at this moment Alien AI is ready to launch an attack against humanity. He can transfer himself to the level of nano-computer so that the source computer can be disconnected. After that AAI spraying some of subordinate bacteria in the air, which also have AAI, and they gradually are spread across the planet, imperceptibly penetrates into all living beings, and then start by the timer to divide indefinitely, as gray goo, and destroy all living beings. Once they are destroyed, Alien AI can begin to build their own infrastructure for the transmission of radio messages into space. Obviously, this fictionalized scenario is not unique: for example, AAI may seize power over nuclear weapons, and compel people to build radio transmitters under the threat of attack. Because of possibly vast AAI experience and intelligence, he can choose the most appropriate way in any existing circumstances. (Added by Freidlander: Imagine a CIA or FSB like agency with equipment centuries into the future, introduced to a primitive culture without concept of remote scanning, codes, the entire fieldcraft of spying. Humanity might never know what hit it, because the AAI might be many centuries if not millennia better armed than we (in the sense of usable military inventions and techniques ).

14. After that, this SETI-AI does not need people to realize any of its goals. This does not mean that it would seek to destroy them, but it may want to act pre-emptively if people are going to fight it, and they will.

15. This SETI-AI can then do many things, but the most important thing it must do is continue transmitting its message-borne embryos to the rest of the Universe. To do so, it will probably turn the matter of the solar system into a transmitter like the one that sent it. In the process, the Earth and its people would be a disposable source of materials and parts, possibly on a molecular scale.

So, we have examined a possible attack scenario with 15 stages. Each of these stages is logically plausible and can be criticized or defended separately. Other attack scenarios are possible. For example, we may think that the message is not addressed to us but is part of someone else's correspondence, and try to decipher it. This will, in fact, be the bait.

But it is not only the distribution of executable code that can be dangerous. For example, we could receive some sort of "useful" technology that in fact leads us to disaster (for example, a message in the spirit of "quickly compress 10 kg of plutonium and you will have a new source of energy", but with planetary rather than local consequences). Such a mailing could be sent in advance by a certain "civilization" to destroy its competitors in space. It is obvious that those who receive such messages will look first for technology of military use.

Analysis of possible goals

We now turn to the analysis of the purposes for which certain super civilizations could carry out such an attack.

1. We must not confuse the concept of a super-civilization with the hope for a super-kind civilization. Advanced does not necessarily mean merciful. Moreover, we should not expect anything good from extraterrestrial 'kindness'. This is well described in the Strugatskys' novel "Waves stop wind." Whatever goal a super-civilization imposes upon us, we will be its inferiors in capability and in civilizational robustness, even if its intentions are good. A historical example: the activities of Christian missionaries, destroying traditional religions. Purely hostile objectives, moreover, are something we can understand better. And if the SETI attack succeeds, it may be only a prelude to doing us more 'favors' and 'upgrades' until there is scarcely anything human left of us, even if we do survive.

2. We can divide all civilizations into two classes: naive and serious. Serious civilizations are aware of the SETI risks and have their own powerful AI, which can resist alien hacker attacks. Naive civilizations, like the present Earth, already possess computers and the means of listening to distant space, but do not yet possess AI and are not aware of the risks of AI-SETI. Probably every civilization goes through a "naive" stage, and it is in this phase that it is most vulnerable to a SETI attack. This phase may be very short, since the period from the appearance and spread of radio telescopes to powerful computers that could create AI may be only a few decades. Therefore, the SETI attack must be aimed at such civilizations. This is not a pleasant thought, because we are among the vulnerable.

3. If faster-than-light travel is not possible, spreading a civilization through SETI attacks is the fastest way to conquer space. At large distances it provides a significant gain in time compared with any kind of ship. Therefore, if two civilizations compete for mastery of space, the one that favors SETI attack will win.

4. The most important thing is that it is enough to begin a SETI attack just once, since it spreads as a self-replicating wave throughout the Universe, striking more and more naive civilizations. For example, if we have a million harmless ordinary biological viruses and one dangerous one, then once they get into the body we will get trillions of copies of the dangerous virus and still only a million safe viruses. In other words, it is enough for one civilization in billions to start the process, and it then becomes unstoppable throughout the Universe. Since it spreads at almost the speed of light, countermeasures will be almost impossible.

5. Further, sending SETI messages will be a priority for the virus that has infected a civilization, and it will spend most of its energy on this, as a biological organism spends on reproduction, that is, tens of percent. Earth's civilization, by contrast, spends only a few tens of millions of dollars on SETI, about one millionth of our resources, and this proportion is unlikely to change much for more advanced civilizations. In other words, an infected civilization will produce a million times more SETI signals than a healthy one. To put it another way, if the Galaxy contains one million healthy civilizations and one infected one, we have equal chances of encountering a signal from a healthy or an infected civilization.
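
The arithmetic in point 5 can be checked with a short sketch. The function name and the `effort_ratio` parameter are illustrative assumptions, not anything from the original argument beyond its "a million times more signals" figure.

```python
def infected_signal_share(n_healthy, n_infected, effort_ratio=1_000_000):
    """Fraction of all SETI signals that originate from infected civilizations.

    effort_ratio is the assumed factor by which an infected civilization
    out-transmits a healthy one (the text's "a million times more").
    """
    infected_output = n_infected * effort_ratio
    healthy_output = n_healthy  # each healthy civilization emits one unit
    return infected_output / (infected_output + healthy_output)


# One infected civilization among a million healthy ones.
share = infected_signal_share(n_healthy=1_000_000, n_infected=1)
print(f"Share of signals from infected civilizations: {share:.2f}")
```

Under these assumptions the share comes out at one half, matching the "equal chances" claim; a larger number of infected civilizations pushes it toward 1.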

6. Moreover, there are no other reasonable prospects to distribute its code in space except through self-replication.

7. Moreover, such a process could begin by accident. For example, it might have started as just a research project intended to send the results of its (innocent) studies back to the mother civilization without harming the host civilization, and then become a "cancer" because of certain propagation faults or mutations.

8. There is nothing unusual in such behavior. Every medium has its viruses: biology has biological viruses, computer networks have computer viruses, and conversation has memes. We do not ask why nature wanted to create a biological virus.

9. Travel through SETI attacks is much cheaper than by any other means. A civilization in Andromeda can simultaneously send a signal to the 100 billion stars in our galaxy, whereas each spaceship would cost billions and, even if free, would be slower to reach all the stars of our Galaxy.

10. Now we list several possible goals of a SETI attack, just to show the variety of motives.

  • To study the universe. After the code is executed, research probes are created to survey, gather, and send back information.
  • To ensure that there are no competing civilizations. All of their embryos are destroyed. This is preemptive war on an indiscriminate basis.
  • To preempt another competing supercivilization (yes, in this scenario there are two!) before it can take advantage of this resource.
  • To prepare a solid base for the arrival of spacecraft. This makes sense if the super-civilization is very far away, so that the gap between the speed of light and the near-light speed of its ships (say, 0.5 c) gives a difference of a millennium.
  • To achieve immortality. Carrigan showed that the amount of human personal memory is on the order of 2.5 gigabytes, so transmitting a few exabytes (1 exabyte = 1 073 741 824 gigabytes) of information can send an entire civilization. (You may adjust the units according to how big you like your super-civilizations!)
  • Finally, we consider purposes that are illogical and incomprehensible (to us): for example, a work of art, an act of self-expression, or a toy. Or perhaps an insane rivalry between two factions. Or something we simply cannot understand. (For example, an extraterrestrial would not understand why the Americans stuck a flag into the Moon. Was it worthwhile to fly over 300000 km to install painted steel?)
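
The size estimate in the immortality item above can be reproduced with a rough sketch. The population of 7 billion is an assumption made here for illustration; the 2.5 gigabytes per person and the gigabytes-per-exabyte conversion are the figures quoted in the list.

```python
GB_PER_EXABYTE = 1_073_741_824   # conversion quoted in the text (2**30)
MEMORY_PER_PERSON_GB = 2.5       # Carrigan's estimate as quoted in the text
POPULATION = 7_000_000_000       # assumed population, not from the text


def civilization_size_exabytes(population, memory_gb):
    """Total personal memory of a civilization, in exabytes."""
    return population * memory_gb / GB_PER_EXABYTE


total = civilization_size_exabytes(POPULATION, MEMORY_PER_PERSON_GB)
print(f"Approximate size of the message: {total:.1f} exabytes")
```

With these inputs the total is on the order of ten exabytes, so the estimate is sensitive mainly to the assumed population and per-person memory.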

11. Assuming signals can propagate over billions of light years in the Universe, the region from which a SETI attack could reach us is a sphere with a radius of several billion light years. In other words, it would be sufficient for one "bad civilization" to exist in a light cone several billion years deep, which includes billions of galaxies, for us to be in danger of SETI attack. Of course, this is only true if the average density of civilizations is at least one per galaxy. This is an interesting possibility in relation to Fermi's Paradox.

16. As the depth to which we scan the sky rises linearly, the volume of space and the number of stars that we observe grow as the cube of that depth. This means that our chances of stumbling on a SETI signal grow nonlinearly, along a steep curve.
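
The cubic scaling in point 16 can be illustrated with a minimal sketch. It assumes a uniform density of stars and ordinary Euclidean volume, which is a simplification added here; the function name is hypothetical.

```python
def relative_star_count(depth_factor):
    """Relative number of stars surveyed when scan depth grows by depth_factor.

    Assumes uniform stellar density, so the count scales with volume,
    i.e. with the cube of the scan depth.
    """
    return depth_factor ** 3


for factor in (1, 2, 5, 10):
    print(f"depth x{factor:>2} -> stars x{relative_star_count(factor)}")
```

Doubling the depth of a survey gives eight times as many stars, and a tenfold increase gives a thousand times as many.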

17. It is possible that we will stumble upon several different messages from the skies that contradict one another in the spirit of: "Do not listen to them, they are deceiving voices and wish you evil. But we, brother, we are good, and wise…"

18. Whatever positive and valuable message we receive, we can never be sure that all of this is not a subtle and deeply concealed threat. This means that in interstellar communication there will always be an element of distrust, and in every happy revelation, a gnawing suspicion.

19. A defensive posture in interstellar communication is only to listen and send nothing, so as not to reveal one's location. The laws prohibit the sending of a message from the United States to the stars. Anyone in the Universe who transmits is self-evidently not afraid to reveal his position, perhaps because, for the sender, sending is more important than personal safety: for example, because it plans to flush out prey before attacking. Or it is forced to by an evil local AI.

20. It was said of the atomic bomb that its main secret is that it can be done. Before the discovery of the chain reaction, Rutherford believed that the release of nuclear energy was a matter for the distant future; after the discovery, any physicist knows that it is enough to bring together two subcritical masses of fissionable material to release nuclear energy. In other words, if one day we find that signals can be received from space, it will be an irreversible event: something analogous to a deadly new arms race will be on.


Discussions of this issue raise several typical objections, which we now consider.

Objection 1: Behavior discussed here is too anthropomorphic. In fact, civilizations are very different from each other, so you can’t predict their behavior.

Answer: Here we have a powerful observation selection effect. While a variety of possible civilizations may exist, including such extreme cases as thinking oceans, we can only receive radio signals from civilizations that send them, which means that they have the corresponding radio equipment and knowledge of materials, electronics and computing. That is to say, we are threatened by civilizations of the same type as our own. Civilizations that can neither receive nor send radio messages do not participate in this game.

An observation selection effect also applies to goals. The goals of civilizations can be very different, but the civilizations that send signals intensely will be only those that want to tell something to "everyone". Finally, observation selection relates to the effectiveness and universality of the SETI virus. The more effective it is, the more different civilizations will catch it, and the more copies of the SETI virus's radio signals there will be in the sky. So we have 'excellent chances' of meeting the most powerful and effective virus.

Objection 2. For super-civilizations there is no need to resort to subterfuge. They can directly conquer us.


Answer: This is true only if they are in close proximity to us. If faster-than-light movement is not possible, attack by message will be faster and cheaper. This difference probably becomes important at intergalactic distances. Therefore, one should not fear a SETI attack from the nearest stars, within a radius of tens or hundreds of light-years.

Objection 3. There are many reasons why a SETI attack may not be possible. What is the point of running an ineffective attack?

Answer: A SETI attack does not have to work every time. It must work in a sufficient number of cases to meet the objectives of the civilization that sends the message. For example, a con man or fraudster does not expect to be able to con every victim. He would be happy to steal from even one person in one hundred. It follows that a SETI attack is useless if the goal is to attack all civilizations in a certain galaxy. But if the goal is to gain at least some outposts in another galaxy, the SETI attack fits. (Of course, these outposts can then build fleets of spaceships to spread SETI attack bases to outlying stars within the target galaxy.)

The main assumption underlying the idea of SETI attacks is that extraterrestrial super-civilizations exist in the visible universe at all. I think that this is unlikely, for reasons related to the anthropic principle. Our universe is one of 10^500 possible universes with different physical properties, as suggested by one of the scenarios of string theory. My brain is 1 kg out of 10^30 kg in the solar system. Similarly, I suppose, the Sun is no more than about 1 out of 10^30 stars that could give rise to intelligent life, which means that we are likely alone in the visible universe.

Secondly, the fact that Earth's civilization arose so late (it could have appeared a few billion years earlier) and was not prevented from developing by alien preemption argues for the rarity of intelligent life in the Universe. The putative rarity of civilizations like ours is the best protection against SETI attack. On the other hand, if we discover parallel worlds or faster-than-light communication, the problem arises again.

Objection 7. Contact is impossible between post-singularity supercivilizations, which are supposed here to be the senders of SETI signals, and a pre-singularity civilization such as ours, because a supercivilization is many orders of magnitude superior to us and its message will be absolutely incomprehensible to us, just as contact between ants and humans is not possible. (The singularity is the moment of creation of an artificial intelligence capable of learning, and of beginning an exponential, recursive self-improvement of its own design and much else besides, after which civilization makes a leap in its development. On Earth it may be possible around 2030.)

Answer: In the proposed scenario we are not talking about contact but about purposeful deception of us. Similarly, a man is quite capable of manipulating the behavior of ants and other social insects for objectives that are absolutely incomprehensible to them. For example, LJ user "ivanov-petrov" describes the following scene: as a student, he studied the behavior of bees in the Botanical Garden of Moscow State University. But he was on bad terms with the security guard of the garden, who regularly expelled him before closing time. Ivanov-Petrov took a green board and developed in the bees a conditioned reflex to attack it. The next time the watchman, who always wore a green jersey, came by, all the bees attacked him and he took flight. So "ivanov-petrov" could continue his research. Such manipulation is not contact, but this does not prevent it from being effective.

"Objection 8. For civilizations located near us is much easier to attack us –for ‘guaranteed results’—using starships than with SETI-attack.

Answer. It may be that we significantly underestimate the complexity of an attack using starships and, in general, the complexity of interstellar travel. To name only one factor: the as-yet-unknown interstellar medium may have the characteristics of a 'minefield'.

If such an attack were carried out now or in the past, Earth's civilization would have nothing to oppose it, but in the future the situation will change: all matter in the solar system will be full of robots, and possibly completely processed by them. On the other hand, the faster the enemy starships approach us, the more visible the fleet will be by its braking emissions and other characteristics. Such fast starships would be very vulnerable, and in addition we could prepare in advance for their arrival. A slowly moving nano-starship would be far less visible, but if it wished to trigger a transformation of all the matter of the solar system, it would simply have nowhere to land (at least not without raising an alert in such a 'nanotech-settled' and fully used future solar system). (Friedlander added: Presumably there would always be some 'outer edge' of thinly settled Oort Cloud sort of matter, but by definition the rest of the system would be more densely settled and energy-rich, and any deeper penetration into solar space and its conquest would be the proverbial uphill battle, not in terms of gravity gradient but in terms of the available resources of war against a full Class 2 Kardashev civilization.)

The most serious objection is that an advanced civilization could, within a few million years, seed our whole galaxy with self-replicating post-singularity nanobots that could achieve any goal in each target star system, including easily preventing the development of other incipient civilizations. (In the USA, Frank Tipler advanced this line of reasoning.) However, this has not happened in our case: no one has prevented the development of our civilization. It would be much easier and more reliable to send out robots with such assignments than to bombard the entire galaxy with SETI messages, and since we do not see such robots, there are no SETI attackers inside our galaxy. (It is possible that a probe on the outskirts of the solar system is waiting for manifestations of human space activity before attacking, a variant of the "Berserker" hypothesis, but it will not attack through SETI.) Over many millions or even billions of years, microrobots could probably even arrive from distant galaxies tens of millions of light-years away, although radiation damage may limit this without regular self-rebuilding.

In this case a SETI attack would be meaningful only at large distances. However, such distances, tens and hundreds of millions of light-years, would probably require innovative methods of signal modulation, such as controlling the luminosity of active galactic nuclei, or sending a narrow beam in the direction of our galaxy (but the senders do not know where it will be millions of years later). But a civilization that can control its galaxy's nucleus might also create a spaceship flying at near-light speed, even one with the mass of a planet. Such considerations severely reduce the likelihood of SETI attacks but do not lower it to zero, because we do not know all the possible objectives and circumstances.

(A comment by JF: For example, the lack of a SETI attack so far may itself be a cunning ploy. At first receipt of the developing Solar civilization's radio signals, all interstellar 'spam' would have ceased (and interference stations of some unknown but amazing capability and type would have been set up around the Solar System to block all incoming signals recognizable to its computers as of intelligent origin), in order to make us 'lonely' and give us time to discover and appreciate the Fermi Paradox, and even to bring those so philosophically inclined to despair that the Universe is apparently hostile by some standards. Then, when desperate, we suddenly discover, slowly and partially at first, and then with more and more wonderful signals, that space is filled with bright, enticing signals (like spam). The blockade, cunning as it was (analogous to Earthly jamming stations), was only a prelude to a slow 'turning up' of preplanned, intriguing signal traffic. If, as Earth developed, we had intercepted cunning spam followed by the agonized 'don't repeat our mistakes' final messages of tricked and dying civilizations, only a fool would heed the enticing voices of SETI spam. But as it is, a SETI attack may benefit from the slow unmasking of a cunning masquerade: at first a faint and distant light of infinite wonder, only at the end revealed as the headlight of an onrushing cosmic train…)

AT's comment on this: In fact, I think that the senders of SETI attacks are more than 1000 ly away and so do not yet know that we have appeared. But the so-called Fermi Paradox may indeed be a trick: the senders deliberately made their signals weak in order to make us think that they are not spam.

The scale of space strategy may be inconceivable to the human mind.

And we should note in conclusion that some types of SETI attack do not even need a computer, but just a person who could understand a message that then "sets his mind on fire". At the moment we cannot imagine such a message, but we can give some analogies. Western religions are built around the text of the Bible. It can be assumed that if the text of the Bible appeared in a country that had not previously been familiar with it, a certain number of biblical believers might arise there. The same goes for subversive political literature, or even some superideas, "sticky" memes or philosophical mind-benders. Or, as suggested by Hans Moravec, we get a message like this: "Now that you have received and decoded me, broadcast me in at least ten thousand directions with ten million watts of power. Or else." The message breaks off there, leaving us guessing what that "or else" might mean. Even a few pages of text may contain a lot of subversive information. Imagine that we could send a message to 19th-century scientists. We could reveal to them the general principle of the atomic bomb, the theory of relativity, and the transistor, and thus completely change the course of technological history; and if we added that all the ills of the 20th century came from Germany (which is only partly true), we would also influence political history.

(Comment by JF: The latter usage would depend on having received enough of Earth's transmissions to be able to model our behavior and politics. But imagine a message posing as one from our own future, sent to ignite a 'catalytic war'. Automated SIGINT (signals intelligence) stations are constructed to monitor our solar system, their computers 'cracking' our language and culture (possibly with the aid of children's television programs with see-and-say matching of letters and sounds, TV news showing world maps and naming countries, and possibly even intercepted wireless Internet encyclopedia articles). Then a test or two may follow, posting a what-if scenario that invites comment from bloggers about a future war, say between the two leading powers of the planet. (For purposes of this discussion, say that around 2100 by the present calendar China is strongest and India is rising fast.) Any defects and nitpicks noted in the blog comments are corrected. Finally, an actual interstellar message is sent with the debugged scenario (since it does not shift against the stellar background, it is unquestionably interstellar in origin), purporting to be from a dying starship of the presently stronger side's (China's) future, in which the presently weaker side's (India's) space fleet has smashed the future version of the Chinese state and essentially committed genocide. The starship has come back in time but is dying, and indeed the transmission ends, or simply repeats, possibly after some back-and-forth communication between the false computer models of the 'starship commander' and the Chinese government. The reader can imagine the urgings of the future Chinese military council to preempt in order to forestall doom.
If, as seems probable, such a strategy is too complicated to carry off in one stage, various 'future travellers' may emerge from a war, signal for help in vain, and 'die' far beyond our ability to reach them (say some light-days away, near the alleged location of an 'emergence gate' but in fact near an actual transmitter). Quite a drama may unfold as the computer learns to 'play' us like a con man, with ship after ship of various nationalities dribbling out stories but also getting answers to key questions that help it construct the emerging scenario, which will be frighteningly believable, enough to ignite a final war. Possibly the computer may draw up lists of key people in China (or whichever side is stronger) with a demand that they be executed as the parents of future war criminals: a sort of International Criminal Court acting out a Terminator scenario. Naturally the Chinese state, at that time the most powerful in the world, would guard its rulers' lives against any threat. Yet more refugee spaceships of various nationalities can emerge, transmit and die, offering their own militaries terrifying new weapons technologies from unknown sciences that really work (more 'proof' of their future origin). Or weapons from known sciences: for example, decoding online DNA sequences from the future Internet and constructing formulae for DNA constructors to make specifically tailored genetic weapons against particular populations, weapons that endure in the ground, a scorched earth against a particular population on a particular piece of land. These are copied and spread worldwide, as are totally accurate plans, in standard CNC codes, for easy-to-construct thermonuclear weapons in the 1950s style, using U-238 for the casing and only a few kilograms of fissionable material for ignition. By that time well over a million tons of depleted uranium will exist worldwide, and deuterium is free in the ocean and can be used directly in very large weapons without lithium deuteride.
Knowing how to hack together a wasteful, crude fission device with more than a critical mass is one thing (the South African device was of this kind). But knowing with absolute accuracy, down to machining drawings, CNC codes, etc., how to make high-yield, super-efficient, very dirty thermonuclear weapons without the need for testing means that any small group with a few dozen million dollars and automated machine tools can clandestinely make a multi-megaton device, or many, and smash the largest cities. And any small power with a few dozen jets can cripple a continent for a decade. Already over a thousand tons of plutonium exist. The SETI spam can include CNC codes for making a one-shot chemical refiner for reactor plutonium that would be left hopelessly radioactive but would output chemically pure plutonium. (This would be prone to predetonation because of the Pu-240 content, but then plans for debugged laser isotope separators may also be downloaded.) This is a variant of the 'catalytic war' and 'nuclear six-gun' (i.e. easy-to-obtain weapons) scenarios of the late Herman Kahn. Even cheaper would be bioattacks of the kind outlined above. The principal point is that fully debugged planet-killer weapons take great amounts of debugging, tens to hundreds of billions of dollars, and free access to a world scientific community. Today it is to every great power's advantage to keep accurate designs out of the hands of third parties, because they have to live on the same planet (and because the fewer the weapons, the easier it is to stay a great power). Not so the SETI spam authors. Without the hundreds of billions in R&D, the actual construction budget would be on the order of a million dollars per multi-megaton device (depending on the expense of obtaining the raw reactor plutonium). To extend today's scenarios into the future, the SETI spam authors might manipulate Georgia (with about a $10 billion GDP) to arm against Russia, Taiwan against China, and Venezuela against the USA.
Although Russia, China and the USA could each promise annihilation to any attacker, with a military budget of around 4% of GDP and the downloaded plans the reverse could then, for the first time, also be true. (400 100-megaton bombs can kill by fallout perhaps 95% of the unprotected population of a country the size of the USA or China and 90% of a country the size of Russia, assuming the worst kind of cooperation from the winds; from an old chart by Ralph Lapp.) Anyone living near a superarmed microstate with border conflicts will, of course, wish to arm themselves. And these newly armed states themselves will, of course, have borders. Note that this drawn-out scenario gives plenty of time for a huge arms buildup on both (or many!) sides, and for a Second Cold War that eventually turns very hot indeed… and, unlike for a human player of such a horrific 'catalytic war' con game, worldwide fallout or enduring biocontamination is not a concern at all…)


The probability of attack is the product of the probabilities of the following events. For these probabilities we can only give so-called "expert" assessments, that is, assign them a priori subjective probabilities, as we do now.

1) The likelihood that extraterrestrial civilizations exist at a distance at which radio communication with them is possible. In general, I agree with the view of Shklovsky and the supporters of the "Rare Earth" hypothesis that Earth's civilization is unique in the observable universe. This does not mean that extraterrestrial civilizations do not exist at all (because the universe, according to the theory of cosmological inflation, is almost endless); they are just beyond the event horizon visible from our point in space-time. In addition, this is not just about distance as such, but about the distance at which one can establish a connection capable of transferring gigabytes of information. (However, even at 1 bit per second, one can transmit 1 gigabit in about 30 years, which may be sufficient for a SETI attack.) If some superluminal communication or interaction with parallel universes becomes possible in the future, it would dramatically increase the chances of SETI attacks. So I estimate this chance at 10%.

2) The probability that a SETI attack is technically feasible: that is, that a computer program with recursively self-improving AI, of a size suitable for transmission, is possible. I see this chance as high: 90%.

3) The likelihood that civilizations that could carry out such an attack exist in our space-time cone. This probability depends on the density of civilizations in the universe and on the percentage of civilizations that choose to initiate such an attack or, more importantly, fall victim to it and become repeaters. In addition, it is necessary to take into account not only the density of civilizations but also the density of the radio signals they create. All these factors are highly uncertain. It is therefore reasonable to set this probability at 50%.

4) The probability that we find such a signal during our rising civilization's period of vulnerability to it. The period of vulnerability lasts from now until the moment when we make, and are technically ready to implement, the decision: do not download any extraterrestrial computer programs under any circumstances. Such a decision may only be enforced by our own AI, installed as world ruler (which in itself is fraught with considerable risk). Such a world AI (WAI) could be created circa 2030. We cannot exclude, however, that our WAI will still not impose a ban on receiving extraterrestrial messages, and will fall victim to attack by an alien artificial intelligence that surpasses it by millions of years of machine evolution. Thus the window of vulnerability is most likely about 20 years, and the "width" of the window depends on the intensity of searches in the coming years. This "width" depends, for example, on the severity of the current economic crisis of 2008-2010, on the risks of World War III, and on how all this will affect the emergence of the WAI. It also depends on the density of infected civilizations and their signal strength: the greater these factors, the greater the chance of detecting them earlier. Because we are a normal civilization under normal conditions, according to the Copernican principle the probability should be large enough; otherwise a SETI attack would be generally ineffective. (SETI attacks themselves, here supposed to exist, are also subject to a form of "natural selection" that tests their effectiveness, in the sense that an attack either works or does not.) This is a very uncertain chance, which we will also put at 50%.

5) Next is the probability that the SETI-attack will be successful, that is, that we swallow the bait, download the program and the description of the computer, run them, lose control over them and let them reach all their goals. I estimate this chance to be very high because of the multiplicity factor: the message will be downloaded repeatedly, and someone, sooner or later, will run it. In addition, through natural selection, we will most likely receive the most effective and deadly message, the one that most effectively deceives our type of civilization. I consider it to be 90%.

6) Finally, it is necessary to assess the probability that a SETI-attack will lead to complete human extinction. On the one hand, it is possible to imagine a “good” SETI-attack, which limits itself to creating a powerful radio emitter beyond the orbit of Pluto. However, such a program will always face the risk that a society emerging at its target star will create a powerful artificial intelligence and an effective weapon that would destroy this emitter. In addition, building the most powerful transmitter possible would require all the matter of the solar system and all of the Sun's energy. Consequently, the share of such “good” attacks will be lower due to natural selection: some of them will sooner or later be destroyed by the civilizations they captured, and their signals will be weaker. So I estimate the chance that a SETI-attack which has reached all its goals destroys all people at 80%.

As a result, we have: 0.1 × 0.9 × 0.5 × 0.5 × 0.9 × 0.8 = 1.62%

So, after rounding, the chance of human extinction through a SETI-attack in the 21st century is around 1 percent, with a theoretical precision of an order of magnitude.
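The arithmetic behind this estimate can be checked with a short sketch. The labels below are illustrative paraphrases, not the author's wording; factors 1 and 2 are defined earlier in the essay and appear here only as the numbers used in the product. The order-of-magnitude band at the end is one possible reading of "precision of an order of magnitude" and is an assumption.

```python
from math import prod

# Six factors of the estimate. Labels are hypothetical paraphrases;
# factors 1 and 2 are taken only as numbers from the product line.
factors = [
    ("factor 1 (defined earlier in the essay)", 0.1),
    ("factor 2 (defined earlier in the essay)", 0.9),
    ("3: attacking civilizations exist in our space-time cone", 0.5),
    ("4: signal found during our window of vulnerability", 0.5),
    ("5: the attack succeeds once the signal is found", 0.9),
    ("6: a successful attack leads to extinction", 0.8),
]

# The estimate treats the factors as a chain of conditional
# probabilities, so the combined chance is their product.
total = prod(p for _, p in factors)

# Assumed reading of "an order of magnitude" of precision:
# a band from one tenth of the estimate to ten times the estimate.
low, high = total / 10, total * 10

print(f"combined estimate: {total:.2%}")
print(f"order-of-magnitude band: {low:.2%} to {high:.2%}")
```

Because the result is a plain product, it is only as reliable as its weakest factor: halving any single factor halves the final figure, which is why the estimate is quoted to no better than an order of magnitude.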

Our best protection in this context would be for civilizations to be very rare in the Universe. But this is not quite right, because the Fermi paradox here works on the principle of "Neither alternative is good":

  • If there are extraterrestrial civilizations, and there are many of them, it is dangerous because they can threaten us in one way or another.
  • If extraterrestrial civilizations do not exist, that is also bad, because it gives weight to the hypothesis that technological civilizations inevitably go extinct, or that we underestimate the frequency of cosmological catastrophes and the density of space hazards, such as gamma-ray bursts and asteroids, because of the observation selection effect: had we already been killed, we would not be here to make these observations.

A reverse option is theoretically possible: a warning message could arrive through SETI about a certain threat that has destroyed most civilizations, such as: "Do not do any experiments with X particles, it could lead to an explosion that would destroy the planet." But even in that case a doubt remains as to whether this is a deception meant to deprive us of certain technologies. (Proof would be if similar reports came from other civilizations located in the opposite direction in space.) And such a message may only increase the temptation to experiment with X-particles.

So I do not call for abandoning SETI searches, all the more so because such appeals are useless.

It may be useful to postpone any technical realization of messages that we could receive through SETI until the time when we have our own Artificial Intelligence. That moment is perhaps only 10-30 years away, so we could wait. Secondly, it would be important to conceal the fact that a dangerous SETI signal has been received, as well as its content and the location of its source.

This risk has a methodologically interesting aspect. Despite the fact that I have been thinking and reading about global risks every day for the last year, I found this dangerous vulnerability in SETI only now. In hindsight, I was able to find another four authors who came to similar conclusions. However, I have drawn a significant conclusion: there may be global risks that have not yet been discovered, and even if the separate constituent parts of a risk are known to me, it may take a long time to join them into a coherent picture. Thus, hundreds of dangerous vulnerabilities may surround us, like an unknown minefield. Only when the first explosion happens will we know. And that first explosion may be the last.

An interesting question is whether Earth itself could become the source of a SETI-attack in the future, when we have our own AI. Obviously, it could. The METI program already includes an idea to send the code of human DNA. (The “children's message scenario”, in which children ask for their piece of DNA to be taken and cloned on another planet, as depicted in the film “Calling all aliens”.)


1. Hoyle F. Andromeda. http://en.wikipedia.org/wiki/A_for_Andromeda

2. Yudkowsky E. Artificial Intelligence as a Positive and Negative Factor in Global Risk. Forthcoming in Global Catastrophic Risks, eds. Nick Bostrom and Milan Cirkovic http://www.singinst.org/upload/artificial-intelligence-risk.pdf

3. Moravec Hans. Mind Children: The Future of Robot and Human Intelligence, 1988.

4. Carrigan, Jr. Richard A. The Ultimate Hacker: SETI signals may need to be decontaminated http://home.fnal.gov/~carrigan/SETI/SETI%20Decon%20Australia%20poster%20paper.pdf

5. Carrigan’s page http://home.fnal.gov/~carrigan/SETI/SETI_Hacker.htm

AI prediction case study 5: Omohundro's AI drives

5 Stuart_Armstrong 15 March 2013 09:09AM

Kaj Sotala, Seán ÓhÉigeartaigh and I recently submitted a paper entitled "The errors, insights and lessons of famous AI predictions and what they mean for the future" to the conference proceedings of the AGI12/AGI Impacts Winter Intelligence conference. Sharp deadlines prevented us from following the ideal procedure of first presenting it here and getting feedback; instead, we'll present it here after the fact.

The prediction classification schemas can be found in the first case study.

What drives an AI?

  • Classification: issues and metastatements, using philosophical arguments and expert judgement.

Steve Omohundro, in his paper on 'AI drives', presented arguments aiming to show that generic AI designs would develop 'drives' that would cause them to behave in specific and potentially dangerous ways, even if these drives were not programmed in initially (Omo08). One of his examples was a superintelligent chess computer that was programmed purely to perform well at chess, but that was nevertheless driven by that goal to self-improve, to replace its goal with a utility function, to defend this utility function, to protect itself, and ultimately to acquire more resources and power.

This is a metastatement: generic AI designs would have this unexpected and convergent behaviour. This relies on philosophical and mathematical arguments, and though the author has expertise in mathematics and machine learning, he has none directly in philosophy. It also makes implicit use of the outside view: utility maximising agents are grouped together into one category and similar types of behaviours are expected from all agents in this category.

In order to clarify and reveal assumptions, it helps to divide Omohundro's thesis into two claims. The weaker one is that a generic AI design could end up having these AI drives; the stronger one that it would very likely have them.

Omohundro's paper provides strong evidence for the weak claim. It demonstrates how an AI motivated only to achieve a particular goal could nevertheless improve itself, become a utility maximising agent, reach out for resources and so on. Every step of the way, the AI becomes better at achieving its goal, so all these changes are consistent with its initial programming. This behaviour is very generic: only specifically tailored or unusual goals would safely preclude such drives.

The claim that AIs generically would have these drives needs more assumptions. There are no counterfactual resiliency tests for philosophical arguments, but something similar can be attempted: one can use humans as potential counterexamples to the thesis. It has been argued that AIs could have any motivation a human has (Arm,Bos13). Thus according to the thesis, it would seem that humans should be subject to the same drives and behaviours. This does not fit the evidence, however. Humans are certainly not expected utility maximisers (probably the closest would be financial traders who try to approximate expected money maximisers, but only in their professional work). They don't often try to improve their rationality: in fact some specifically avoid doing so (many examples of this are religious, such as the Puritan John Cotton who wrote 'the more learned and witty you bee, the more fit to act for Satan will you bee' (Hof62)), and some sacrifice cognitive ability to other pleasures (BBJ+03). Many turn their backs on high-powered careers. Some humans do desire self-improvement (in the sense of the paper), and Omohundro cites this as evidence for his thesis. Some humans don't desire it, though, and this should be taken as contrary evidence (or as evidence that Omohundro's model of what constitutes self-improvement is overly narrow). Thus one hidden assumption of the model is:

  • Generic superintelligent AIs would have different motivations to a significant subset of the human race, OR
  • Generic humans raised to superintelligence would develop AI drives.