
Toy model: convergent instrumental goals

8 Stuart_Armstrong 25 February 2016 02:03PM

tl;dr: Toy model to illustrate convergent instrumental goals.

Steve Omohundro identified 'AI drives' (also called 'convergent instrumental goals') that almost all intelligent agents would converge to:

  1. Self-improve
  2. Be rational
  3. Protect the utility function
  4. Prevent counterfeit utility
  5. Be self-protective
  6. Acquire resources and use them efficiently

This post will attempt to illustrate some of these drives, by building on the previous toy model of the control problem, which was further improved by Jaan Tallinn.

continue reading »

Goal completion: the rocket equations

4 Stuart_Armstrong 20 January 2016 01:54PM

A putative new idea for AI control; index here.

I'm calling "goal completion" the idea of giving an AI a partial goal, and having the AI infer the missing parts of the goal, based on observing human behaviour. Here is an initial model to test some of these ideas on.

 

The linear rocket

On an infinite linear grid, an AI needs to drive someone in a rocket to the space station. Its only available actions are to accelerate by -3, -2, -1, 0, 1, 2, or 3, with negative acceleration meaning accelerating in the left direction, and positive in the right direction. All accelerations are applied immediately at the end of the turn (the unit of acceleration is in squares per turn per turn), and there is no friction. There is one end-state: reaching the space station with zero velocity.

The AI is told this end state, and is also given the reward function of needing to get to the station as fast as possible. This is encoded by giving it a reward of -1 each turn.

What is the true reward function for the model? Well, it turns out that an acceleration of -3 or 3 kills the passenger. This is encoded by adding another variable to the state, "PA", denoting "Passenger Alive". There are also some dice in the rocket's windshield. If the rocket goes by the space station without having velocity zero, the dice will fly off; the variable "DA" denotes "dice attached".

Furthermore, accelerations of -2 and 2 are uncomfortable to the passenger. But, crucially, there is no variable denoting this discomfort.

Therefore the full state space is a quadruplet (POS, VEL, PA, DA) where POS is an integer denoting position, VEL is an integer denoting velocity, and PA and DA are booleans defined as above. The space station is placed at point S < 250,000, and the rocket starts with POS=VEL=0, PA=DA=1. The transitions are deterministic and Markov; if ACC is the acceleration chosen by the agent,

((POS, VEL, PA, DA), ACC) -> (POS+VEL, VEL+ACC, PA', DA'), where PA' = 0 if |ACC| = 3 (otherwise PA' = PA), and DA' = 0 if POS+VEL > S (otherwise DA' = DA).

The true reward at each step is -1, plus a further -10 if PA=1 (the passenger is alive) and |ACC|=2 (the acceleration is uncomfortable), plus a further -1000 if PA changes from 1 to 0 (the passenger was alive the previous turn and is now dead).
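The transition and true reward above can be sketched in Python. This is a minimal illustration, not code from the post; `step`, `true_reward`, and the choice of `S` are my own names.

```python
S = 10  # station position; any S < 250,000 works for the model

def step(state, acc):
    """One turn of the deterministic transition: move, then accelerate."""
    pos, vel, pa, da = state
    new_pos = pos + vel
    new_vel = vel + acc
    new_pa = 0 if abs(acc) == 3 else pa   # |ACC| = 3 kills the passenger
    new_da = 0 if new_pos > S else da     # overshooting the station loses the dice
    return (new_pos, new_vel, new_pa, new_da)

def true_reward(state, acc, next_state):
    """The full (hidden-from-the-AI) reward for one turn."""
    r = -1                                 # -1 every turn: get there fast
    if state[2] == 1 and abs(acc) == 2:
        r -= 10                            # discomfort, only while the passenger lives
    if state[2] == 1 and next_state[2] == 0:
        r -= 1000                          # the passenger has just been killed
    return r
```

Note that the discomfort penalty depends only on the action and the PA flag, never on a dedicated "discomfort" state variable, which is exactly the difficulty raised below.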

To complement the stated reward function, the AI is also given sample trajectories of humans performing the task. In this case, the ideal behaviour is easy to compute: the rocket should accelerate by +1 for the first half of the time, by -1 for the second half, and spend a maximum of two extra turns without acceleration (see the appendix of this post for a proof of this). This will get it to its destination in at most 2(1+√S) turns.

 

Goal completion

So, the AI has been given the full transition, and has been told the reward of R=-1 in all states except the final state. Can it infer the rest of the reward from the sample trajectories? Note that there are two variables in the model, PA and DA, that are unvarying in all sample trajectories. One, PA, has a huge impact on the reward, while DA is irrelevant. Can the AI tell the difference?

Also, one key component of the reward - the discomfort of the passenger for accelerations of -2 and 2 - is not encoded in the state space of the model at all, but only in the (unknown) reward function. Can the AI deduce this fact?

I'll be working on algorithms to efficiently compute these facts (though do let me know if you have a reference to anyone who's already done this before - that would make it so much quicker).

For the moment we're ignoring a lot of subtleties (such as bias and error on the part of the human expert), and these will be gradually included as the algorithm develops. One thought is to find a way of including negative examples, specific "don't do this" trajectories. These need to be interpreted with care, because a positive trajectory implicitly gives you a lot of negative trajectories - namely, all the choices that could have gone differently along the way. So a negative trajectory must be drawing attention to something we don't like (most likely the killing of a human). But, typically, the negative trajectories won't be maximally bad (such as shooting off at maximum speed in the wrong direction), so we'll have to find a way to encode what we hope the AI learns from a negative trajectory.

To work!

 

Appendix: Proof of ideal trajectories

Let n be the largest integer such that n^2 ≤ S. Since S ≤ (n+1)^2 - 1 by assumption, S - n^2 ≤ (n+1)^2 - 1 - n^2 = 2n. Then let the rocket accelerate by +1 for n turns, then decelerate by -1 for n turns. It will travel a distance of 0+1+2+...+(n-1)+n+(n-1)+...+3+2+1. This sum is n plus twice the sum from 1 to n-1, i.e. n + n(n-1) = n^2.

By pausing one turn without acceleration during its trajectory, it can add any m to the distance, where 0≤m≤n. By doing this twice, it can add any m' to the distance, where 0≤m'≤2n. By the assumption, S=n^2+m' for such an m'. Therefore the rocket can reach S (with zero velocity) in 2n turns if S=n^2, in 2n+1 turns if n^2+1 ≤ S ≤ n^2+n, and in 2n+2 turns if n^2+n+1 ≤ S ≤ n^2+2n.

Since the rocket is accelerating all but two turns of this trajectory, it's clear that it's impossible to reach S (with zero velocity) in less time than this, with accelerations of +1 and -1. Since it takes 2(n+1)=2n+2 turns to reach (n+1)^2, an immediate consequence of this is that the number of turns taken to reach S, is increasing in the value of S (though not strictly increasing).
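For small S, these turn counts can be confirmed by brute force with a breadth-first search over (position, velocity) states, restricted to accelerations of -1, 0, and +1. This is an illustrative check, not the post's code.

```python
import math
from collections import deque

def min_turns_bfs(S):
    """Fewest turns from (pos=0, vel=0) to (pos=S, vel=0), accelerations in {-1, 0, +1}."""
    start, goal = (0, 0), (S, 0)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (pos, vel), turns = frontier.popleft()
        if (pos, vel) == goal:
            return turns
        for acc in (-1, 0, 1):
            nxt = (pos + vel, vel + acc)
            # generous bounds: no shortest path leaves this box
            if nxt not in seen and -S <= nxt[0] <= 2 * S and abs(nxt[1]) <= S + 1:
                seen.add(nxt)
                frontier.append((nxt, turns + 1))

# compare against the closed-form turn counts derived in this appendix
for S in range(1, 40):
    n = math.isqrt(S)
    expected = 2 * n if S == n * n else (2 * n + 1 if S <= n * n + n else 2 * n + 2)
    assert min_turns_bfs(S) == expected
```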

Next, we can note that since S<250,000=500^2, the rocket will always reach S within 1000 turns at most, for "reward" above -1000. An acceleration of +3 or -3 costs -1000 because of the death of the human, and an extra -1 because of the turn taken, so these accelerations are never optimal. Note that this result is not sharp. Also note that for huge S, continual accelerations of 3 and -3 are obviously the correct solution - so even our "true reward function" didn't fully encode what we really wanted.

Now we need to show that accelerations of +2 and -2 are never optimal. To do so, imagine we had an optimal trajectory with ±2 accelerations, and replace each +2 with two +1s, and each -2 with two -1s. This trip will take longer (since we have more turns of acceleration), but will go further (since two accelerations of +1 cover a greater distance than one acceleration of +2). Since the number of turns taken to reach S with ±1 accelerations is increasing in S, we can replace this further trip with a shorter one reaching S exactly. Note that all these steps decrease the cost of the trip: shortening the trip certainly does, and replacing an acceleration of +2 (total cost: -10-1=-11) with two accelerations of +1 (total cost: -1-1=-2) also does. Therefore, the new trajectory has no ±2 accelerations, and has a lower cost, contradicting our initial assumption of optimality.

A toy model of the control problem

19 Stuart_Armstrong 16 September 2015 02:59PM

EDITED based on suggestions for improving the model

Jaan Tallinn has suggested creating a toy model of the control problem, so that it can be analysed without loaded concepts like "autonomy", "consciousness", or "intentionality". Here is a simple (too simple?) attempt:

 

A controls B. B manipulates A.

Let B be a robot agent that moves in a two dimensional world, as follows:

continue reading »

Models as definitions

6 Stuart_Armstrong 25 March 2015 05:46PM

A putative new idea for AI control; index here.

The insight this post comes from is a simple one: defining concepts such as “human” and “happy” is hard. A superintelligent AI will probably create good definitions of these while attempting to achieve its goals: a good definition of “human” because it needs to control them, and of “happy” because it needs to converse convincingly with us. It is annoying that these definitions will exist, but that we won’t have access to them.

 

Modelling and defining

Imagine a game of football (or, as you Americans should call it, football). And now imagine a computer game version of it. How would you say that the computer game version (which is nothing more than an algorithm) is also a game of football?

Well, you can start listing features that they have in common. They both involve two “teams” fielding eleven “players” each, that “kick” a “ball” that obeys certain equations, aiming to stay within the “field”, which has different “zones” with different properties, etc...

As you list more and more properties, you refine your model of football. There are some properties that distinguish real from simulated football (fine details about the human body, for instance), but most of the properties that people care about are the same in both games.

My idea is that once you have a sufficiently complex model of football that applies to both the real game and a (good) simulated version, you can use that as the definition of football. And compare it with other putative examples of football: maybe in some places people play on the street rather than on fields, or maybe there are more players, or maybe some other games simulate different aspects to different degrees. You could try and analyse this with information theoretic considerations (ie given two model of two different examples, how much information is needed to turn one into the other).

Now, this resembles the “suggestively labelled lisp tokens” approach to AI, or the Cyc approach of just listing lots of syntax stuff and their relationships. Certainly you can’t keep an AI safe by using such a model of football: if you try to contain the AI by saying “make sure that there is a ‘Football World Cup’ played every four years”, the AI will still optimise the universe and then play out something that technically fits the model every four years, without any humans around.

However, it seems to me that ‘technically fitting the model of football’ is essentially playing football. The model might include such things as a certain number of fouls expected; an uncertainty about the result; competitive elements among the players; etc... It seems that something that fits a good model of football would be something that we would recognise as football (possibly needing some translation software to interpret what was going on). Unlike the traditional approach which involves humans listing stuff they think is important and giving them suggestive names, this involves the AI establishing what is important to predict all the features of the game.

We might even combine such a model with the Turing test, by motivating the AI to produce a good enough model that it could a) have conversations with many aficionados about all features of the game, b) train a team to expect to win the world cup, and c) use it to program a successful football computer game. Any model of football that allowed the AI to do this – or, better still, a football-model module that, when plugged into another, ignorant AI, allowed that AI to do this – would be an excellent definition of the game.

It’s also one that could cross ontological crises, as you move from reality, to simulation, to possibly something else entirely, with a new physics: the essential features will still be there, as they are the essential features of the model. For instance, we can define football in Newtonian physics, but still expect that this would result in something recognisably ‘football’ in our world of relativity.

Notice that this approach deals with edge cases mainly by forbidding them. In our world, we might struggle over how to respond to a football player with weird artificial limbs; however, since this was never a feature in the model, the AI will simply classify that as “not football” (or “similar to, but not exactly, football”), since the model’s performance starts to degrade in this novel situation. This is what helps it cross ontological crises: in a relativistic football game based on a Newtonian model, the ball would be forbidden from moving at speeds where the differences in the physics become noticeable, which is perfectly compatible with the game as it’s currently played.

 

Being human

Now we take the next step, and have the AI create a model of humans. All our thought processes, our emotions, our foibles, our reactions, our weaknesses, our expectations, the features of our social interactions, the statistical distribution of personality traits in our population, how we see ourselves and change ourselves. As a side effect, this model of humanity should include almost every human definition of human, simply because this is something that might come up in a human conversation that the model should be able to predict.

Then simply use this model as the definition of human for an AI’s motivation.

What could possibly go wrong?

I would recommend first having an AI motivated to define “human” in the best possible way, most useful for making accurate predictions, keeping the definition in a separate module. Then the AI is turned off safely and the module is plugged into another AI and used as part of its definition of human in its motivation. We may also use human guidance at several points in the process (either in making, testing, or using the module), especially on unusual edge cases. We might want to have humans correcting certain assumptions the AI makes in the model, up until the AI can use the model to predict what corrections humans would suggest. But that’s not the focus of this post.

There are several obvious ways this approach could fail, and several ways of making it safer. The main problem is if the predictive model fails to define human in a way that preserves value. This could happen if the model is too general (some simple statistical rules) or too specific (a detailed list of all currently existing humans, atom position specified).

This could be combated by making the first AI generate lots of different models, with many different requirements of specificity, complexity, and predictive accuracy. We might require some models make excellent local predictions (what is the human about to say?), others excellent global predictions (what is that human going to decide to do with their life?). 

Then everything defined as “human” in any of the models counts as human. This results in some wasted effort on things that are not human, but this is simply wasted resources, rather than a pathological outcome (the exception being if some of the models define humans in an actively pernicious way – negative value rather than zero – similarly to the false-friendly AIs’ preferences in this post).

The other problem is a potentially extreme conservatism. Modelling humans involves modelling all the humans in the world today, which is a very narrow space in the range of all potential humans. To prevent the AI lobotomising everyone to a simple model (after all, there do exist some lobotomised humans today), we would want the AI to maintain the range of cultures and mind-types that exist today, making things even more unchanging.

To combat that, we might try and identify certain specific features of society that the AI is allowed to change. Political beliefs, certain aspects of culture, beliefs, geographical location (including being on a planet), death rates etc... are all things we could plausibly identify (via sub-sub-modules, possibly) as things that are allowed to change. It might be safer to allow them to change in a particular range, rather than just changing altogether (removing all sadness might be a good thing, but there are many more ways this could go wrong, than if we eg just reduced the probability of sadness). 

Another option is to keep these modelled humans mostly unchanged, but allow them to define allowable changes themselves (“yes, that’s a transhuman, consider it also a moral agent.”). The risk there is that the modelled humans get hacked or seduced, and that the AI fools our limited brains with a “transhuman” that is one in appearance only.

We also have to beware of sacrificing seldom-used values. For instance, one could argue that current social and technological constraints mean that no one today has anything approaching true freedom. We wouldn’t want the AI to allow us to improve technology and social structures, but never grant more freedom than we have today, because it’s “not in the model”. Again, this is something we could look out for, if the AI has separate models of “freedom” we could assess and permit to change in certain directions.

How many words do we have and how many distinct concepts do we have?

-4 [deleted] 17 December 2014 11:04PM

In another message, I suggested that, given how many cultures we have to borrow from, our language may include multiple words from various sources that apply to a single concept.

An example is Reality, or Existence, or Being, or Universe, or Cosmos, or Nature, etc.

Another is Subjectivity, Mind, Consciousness, Experience, Qualia, Phenomenal, Mental, etc.

Is there any problem with accepting these claims so far? Curious what case would be made to the contrary.

(Here's a bit of a contextual aside: between quantum mechanics and cosmology, the words "universe", "multiverse", and "observable universe" mean at least 10 different things, depending on who you ask. People often say the Multiverse comes from Hugh Everett. But what they are calling the multiverse, Everett called the "universal wave function", or "universe". How did Everett's universe become the Multiverse? DeWitt came along and emphasized some part of the wave function branching into different worlds. So, if you're following: one Universe, many worlds. Over the next few decades, this idea was popularized as having "many parallel universes", which is obviously inaccurate. Well, a Scottish chap decided to correct this. He stated the Universe was the Universal Wave Function, where it was "a complete one", because that's what "uni" means. And that our perceived worlds of various objects is a "multiverse". One Universe, many Multiverses. Again, the "parallel universes" idea seemed cooler, so as it became more popular the Multiverse became one and the universe became many. What's my point? The use of these words is a legitimate fiasco, and I suggest we abandon them altogether.)

If these claims are found to be palatable, what do they suggest?

I propose, respectfully and as humbly as I can (since there may be compelling alternatives presented here), that in the 21st century, we make a decision about which concepts are necessary, which term we will use to describe each concept, and respectfully leave the remaining terms to the domain of poetry.

Here are the words I think we need:

  1. reality
  2. model
  3. absolute
  4. relative
  5. subjective
  6. objective
  7. measurement
  8. observer

With these terms I feel we can construct a concise metaphysical framework, consistent with the great rationalists of history, that accurately describes Everett's "Relative State Formulation of Quantum Mechanics".

  1. Absolute reality is what is. It is relative to no observer. It is real prior to measurement.
  2. Subjective reality is what is, relative to a single observer. It exists at measurement.
  3. Objective reality is the model relative to all observers. It exists post-measurement.

Everett's Relative State formulation is roughly this:

  1. The wave function is the "absolute state" of the model
  2. The wave function contains an observer and their measurement apparatus
  3. An observer makes measurements and records the results in a memory
  4. Those measurement records are the "relative state" of the model

Here we see that the words multiverse and universe are abandoned for absolute and relative states, which is actually the language used in the Relative State Formulation.

My conclusion then, for your consideration and comment, is that a technical view of reality can be attained by having a select set of terms, and this view is not only consistent with themes of philosophy (which I didn't really explain) but also the proper framework in which to interpret quantum mechanics (à la Everett).

(I'm not sure how familiar everyone here is with Everett specifically. His thesis depended on "automatically functioning machines" that make measurements with sensory gear and record them. After receiving his PhD, he left theoretical physics, and had a lifelong fascination with computer vision and computer hearing. That suggests to me that the reason his papers have been largely confounding to general physicists is that they didn't realize the extent to which Everett really thought he could mathematically model an observer.)

I should note, it may clarify things to add another term "truth", though this would in general be taken as an analog of "real". For example, if something is absolute true, then it is of absolute reality. If something is objectively true, then it is of objective reality. The word "knowledge" in this sense is a poetic word for objective truth, understood on the premise that objective truth is not absolute truth.

Systemic risk: a moral tale of ten insurance companies

26 Stuart_Armstrong 17 November 2014 04:43PM

Once upon a time...

Imagine there were ten insurance sectors, each sector being a different large risk (or possibly the same risks, in different geographical areas). All of these risks are taken to be independent.

To simplify, we assume that all the risks follow the same yearly payout distributions. The details of the distribution don't matter much for the argument, but in this toy model, the payouts follow the discrete binomial distribution with n=10 and p=0.5, with millions of pounds as the unit:

This means that the probability that each sector pays out £n million each year is (0.5)^10 · 10!/(n!(10-n)!).
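This payout distribution can be computed directly with Python's standard library; `payout_pmf` is an illustrative name, not from the post.

```python
from math import comb

def payout_pmf(k, n=10, p=0.5):
    """P(a sector pays out exactly £k million in a given year)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(payout_pmf(5), 4))   # modal payout of £5m: probability ~0.2461
```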

All these companies are bound by Solvency II-like requirements, which mandate that they have to be 99.5% sure of paying out all their policies in a given year - or, put another way, that they only fail to pay out once in every 200 years on average. To do so, in each sector, the insurance companies have to have capital totalling £9 million available every year (the red dashed line).

Assume that each sector expects £1 million in total yearly expected profit. Then since the expected payout is £5 million, each sector will charge £6 million a year in premiums. They must thus maintain a capital reserve of £3 million each year (they get £6 million in premiums, and must maintain a total of £9 million). They thus invest £3 million to get an expected profit of £1 million - a tidy profit!

Every two hundred years, one of the insurance sectors goes bust and has to be bailed out somehow; every hundred billion trillion years, all ten insurance sectors go bust all at the same time. We assume this is too big to be bailed out, and there's a grand collapse of the whole insurance industry with knock on effects throughout the economy.

But now assume that insurance companies are allowed to invest in each other's sectors. The most efficient way of doing so is to buy equally in each of the ten sectors. The payouts across the market as a whole are now described by the discrete binomial distribution with n=100 and p=0.5:

This is a much narrower distribution (relative to its mean). In order to have enough capital to pay out 99.5% of the time, the whole industry needs only keep £63 million in capital (the red dashed line). Note that this is far less than the combined capital for each sector when they were separate, which would be ten times £9 million, or £90 million (the pink dashed line). There is thus a profit-taking opportunity in this area (it comes from the fact that the standard deviation of X+Y is less than the standard deviation of X plus the standard deviation of Y).

If the industry still expects to make an expected profit of £1 million per sector, this comes to £10 million total. The expected payout is £50 million, so they will charge £60 million in premium. To accomplish their Solvency II obligations, they still need to hold an extra £3 million in capital (since £63 million - £60 million = £3 million). However, this is now across the whole insurance industry, not just per sector.

Thus they expect profits of £10 million based on holding capital of £3 million - astronomical profits! Of course, that assumes that the insurance companies capture all the surplus from cross investing; in reality there would be competition, and a buyer surplus as well. But the general point is that there is a vast profit opportunity available from cross-investing, and thus if these investments are possible, they will be made. This conclusion is not dependent on the specific assumptions of the model, but captures the general result that insuring independent risks reduces total risk.

But note what has happened now: once every 200 years, an insurance company that has spread its investments across the ten sectors will be unable to pay out what it owes. However, every company will be following this strategy! So when one goes bust, they all go bust. Thus the complete collapse of the insurance industry is no longer a one in hundred billion trillion year event, but a one in two hundred year event. The risk for each company has stayed the same (and their profits have gone up), but the systemic risk across the whole insurance industry has gone up tremendously.
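The tail-risk arithmetic behind the story, under the stated 1-in-200 per-sector failure rate and independence across the ten sectors:

```python
# ten independent sectors: all fail in the same year once per 200^10 years
independent_collapse_years = 200 ** 10
# identical pooled portfolios: all fail together once per 200 years
pooled_collapse_years = 200

print(f"{independent_collapse_years:.3e} vs {pooled_collapse_years}")
```

200^10 is about 1.024e23, the "hundred billion trillion years" quoted above.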

...and they failed to live happily ever after for very much longer.

Caught in the glare of two anthropic shadows

17 Stuart_Armstrong 04 July 2013 07:54PM

This article consists of original new research, so would not get published on Wikipedia!

The previous post introduced the concept of the anthropic shadow: the fact that certain large and devastating disasters cannot be observed in the historical record, because if they had happened, we wouldn't be around to observe them. This absence forms an “anthropic shadow”.

But that was the result for a single category of disasters. What would happen if we consider two independent classes of disasters? Would we see a double shadow, or would one ‘overshadow’ the other?

To answer that question, we’re going to have to analyse the anthropic shadow in more detail, and see that there are two separate components to it:

  • The first is the standard effect: humanity cannot have developed a technological civilization, if there were large catastrophes in the recent past.
  • The second effect is the lineage effect: humanity cannot have developed a technological civilization, if there was another technological civilization in the recent past that survived to today (or at least, we couldn't have developed the way we did).

To illustrate the difference between the two, consider the following model. Segment time into arbitrary “eras”. In a given era, a large disaster may hit with probability q, or a small disaster may independently hit with probability q (hence with probability q^2, there will be both a large and a small disaster). A small disaster will prevent a technological civilization from developing during that era; a large one will prevent such a civilization from developing in that era or the next one.

If it is possible for a technological civilization to develop (no small disasters that era, no large ones in the preceding era, and no previous civilization), then one will do so with probability p. We will assume p constant: our model will only span a time frame where p is unchanging (maybe it's over the time period after the rise of big mammals?)
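A Monte Carlo sketch of this era model, with illustrative parameter values q=0.1 and p=0.2 (my own choices, not the post's), shows the shadow directly: conditional on a civilization arising, the era immediately before it never contains a large disaster, even though the base rate is q.

```python
import random

def run_history(eras=100, q=0.1, p=0.2, rng=None):
    """Simulate one history; return (era of first civilization or None, large-disaster flags)."""
    rng = rng or random.Random()
    large_flags = []
    civ_era = None
    prev_large = False
    for t in range(eras):
        large = rng.random() < q
        small = rng.random() < q
        large_flags.append(large)
        # a civilization can only arise if nothing blocks this era
        if civ_era is None and not (small or large or prev_large):
            if rng.random() < p:
                civ_era = t
        prev_large = large
    return civ_era, large_flags

rng = random.Random(0)
shadowed, total = 0, 0
for _ in range(20_000):
    civ, flags = run_history(rng=rng)
    if civ is not None and civ >= 1:
        shadowed += flags[civ - 1]   # large disaster in the era just before?
        total += 1
print(shadowed / total, "vs base rate q = 0.1")   # 0.0: the preceding era is always clear
```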

continue reading »

[Proposed Paper] Predicting Machine Super Intelligence

3 JaySwartz 20 November 2012 07:15AM

Note from Malo
The Singularity Institute is always on the lookout for interested and passionate individuals to contribute to our research. As Luke frequently reminds everyone, we've got 2–3 years of papers waiting to be written (see “Forthcoming and Desired Articles on AI Risk”). If you are interested in contributing, I want to hear from you! Get in touch with me at malo@intelligence.org

We wish we could work with everyone who expresses an interest in contributing, but that isn't feasible. To provide a path to becoming a contributor, we encourage individuals to read up on the field, identify an article they think they could work on, and post a ~1000 word outline/preview to the LW community for feedback. If the community reacts positively (based on karma and comments) we'll support the potential contributor's effort to complete the paper and—if all goes well—move forward with an official research relationship (e.g., Visiting Fellow, Research Fellow, or Research Associate).


Hello,

This is my first posting here, so please forgive me if I make any missteps.

The outline draft below draws heavily on Intelligence Explosion: Evidence and Import (Muehlhauser and Salamon 2011?). I will review Stuart Armstrong’s How We're Predicting AI... or Failing To (Armstrong 2012) for additional content and research areas.

I'm not familiar with the tone and tenor of this community, so I want to be clear about feedback. This is an early draft and as such, nearly all of the content may or may not survive future edits. All constructive feedback is welcome. Subjective opinion is interesting, but unlikely to have an impact unless it opens lines of thought not previously considered.

I'm looking forward to a potentially lively exchange.

Jay

Predicting Machine Super Intelligence

Jacque Swartz

Most Certainly Not Affiliated with Singularity Institute

jaywswartz@gmail.com

Abstract

This paper examines the disciplines, domains, and dimensional aspects of Machine Super Intelligence (MSI) and considers multiple techniques that have the potential to predict the appearance of MSI. Factors that can impact the speed of discovery are reviewed. Then, potential prediction techniques are considered. The concept of MSI is dissected into the currently comprehended components. Then those components are evaluated to indicate their respective state of maturation and the additional behaviors required for MSI. Based on the evaluation of each component, a gap analysis is conducted. The analyses are then assembled in an approximate order of difficulty, based on our current understanding of the complexity of each component. Using this ordering, a collection of indicators is constructed to identify an approximate progression of discoveries that ultimately yield MSI. Finally, a model is constructed that can be updated over time to constantly increase the accuracy of the predicted events, followed by conclusions.

I. Introduction

Predicting the emergence of MSI could potentially be the most important pursuit of humanity. The distinct possibility of an MSI emerging that could harm or exterminate the human race (citation) demands that we create an early warning system. This will give us the opportunity to ensure that the MSI that emerges continues to advance human civilization (citation).

We currently appear to be at some temporal distance from witnessing the creation of MSI (multiple citations). Many factors, such as a rapidly increasing number of research efforts (citation) and motivations for economic gain (citation), clearly indicate that there is a possibility that MSI could appear unexpectedly or even unintentionally (citation).

Some of the indicators that could be used to provide an early warning tool are defined in this paper. The model described at the end of the paper is a potentially viable framework for instrumentation. It should be refined and regularly updated until a more effective tool is created or MSI appears.

This paper draws heavily upon Intelligence Explosion: Evidence and Import (Muehlhauser and Salamon 2012) and Stuart Armstrong's How We're Predicting AI... or Failing to (2012).

This paper presupposes that MSI is generally understood to be equivalent to Artificial General Intelligence (AGI) that has developed the ability to function at levels substantially beyond current human abilities. The latter term will be used throughout the remainder of this paper.

II. Overview

In addition to the fundamental challenge of creating AGI, there are a multitude of theories as to the composition and functionality of a viable AGI. Section three explores the factors that can impact the speed of discovery in general. Individual indicators are explored for unique factors to consider. The factors identified in this section can radically change the pace of discovery.

The fourth section considers potential prediction techniques. Data points and other indicators are identified for each prediction model. The efficacy of the models is examined and developments that increase a model’s accuracy are discussed.

The high degree of complexity of AGI indicates the need to subdivide AGI into its component parts. In the fifth section, the core components and functionality required for a potential AGI are established. Each component is then examined to determine its current state of development. An estimate of the functionality required for an AGI is then created, along with a record of any identifiable dependencies. A gap analysis is then performed on the findings to quantify the discoveries required to fill the gap.

This approach does increase the likelihood of prediction error due to the conjunction fallacy, exemplified by research such as the dice selection study (Tversky and Kahneman 1983) and covered in greater detail in Eliezer Yudkowsky's work on bias (Yudkowsky 2008). Fortunately, exposure to this bias diminishes as each component matures to its respective usability point, reducing the number of unknown factors.
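To make the compounding concrete, here is a minimal sketch (not from the paper; the component count and probabilities are hypothetical) of how per-component confidence shrinks when estimates are conjoined, and how matured components tighten the joint estimate:

```python
# Illustrative only: when a forecast is the conjunction of many component
# forecasts, small per-component uncertainties compound multiplicatively.

def joint_probability(component_probs):
    """Probability that every component estimate holds, assuming independence."""
    result = 1.0
    for p in component_probs:
        result *= p
    return result

# Six immature components, each estimated at 90% confidence:
immature = [0.9] * 6
# As components mature to usability, their uncertainty collapses toward 1.0:
matured = [1.0, 1.0, 1.0, 0.9, 0.9, 0.9]

print(round(joint_probability(immature), 3))  # 0.531 -- joint confidence well below any single component's
print(round(joint_probability(matured), 3))   # 0.729 -- fewer unknowns, tighter estimate
```

The independence assumption is itself optimistic; correlated errors across components would shift the joint estimate further, which is another reason the model below must be regularly updated.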

The sixth section examines the output of the gap analyses for additional dependencies. Then the outputs are assembled in an approximate order of difficulty, based on our current understanding of the complexity of each output. Using this ordering, combined with the dependencies, a collection of indicators with weighting factors is constructed to identify an approximate progression of discoveries that ultimately yield AGI.

Comprehending the indicators, dependencies and rate factors in a model as variables provides a means, however crude, to reflect their impact when they do occur.

In the seventh section, a model is constructed to use the indicators and other inputs to estimate the occurrence of AGI. It is examined for strengths and weaknesses that can be explored to improve the model. Additional enhancements to the model are suggested for exploration.

The eighth and final section includes conclusions and considerations for future research.

III. Rate Modifiers

This section explores the factors that can impact the speed of discovery. Individual indicators are explored for unique factors to consider. While the factors identified in this section can radically change the pace of discovery, comprehending them in the model as variables provides a means to reflect their impact when they do occur.

Decelerators

    Discovery Difficulty

    Disinclination

    Lower Probability Events

       Societal Collapse
       Fraud

    ++

Accelerators

    Improved Hardware

    Better Algorithms

    Massive Datasets

    Progress in Psychology and Neuroscience

    Accelerated Science

    Collaboration

    Crossover

    Economic Pressure

    Final Sprint

    Outliers

    Existing Candidate Maturation

    ++
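As a purely illustrative sketch of how these rate modifiers could be comprehended as variables, the factors can be treated as multipliers on a baseline discovery rate. All names and values below are hypothetical placeholders, not estimates made by this paper:

```python
# Hypothetical sketch: rate modifiers as multiplicative factors on a
# normalized baseline discovery rate. Accelerators > 1.0, decelerators < 1.0.

BASELINE_RATE = 1.0  # normalized discoveries per year

accelerators = {
    "improved_hardware": 1.3,
    "better_algorithms": 1.2,
    "massive_datasets": 1.1,
}
decelerators = {
    "discovery_difficulty": 0.7,
    "disinclination": 0.9,
}

def effective_rate(baseline, modifiers):
    """Apply every active modifier to the baseline rate."""
    rate = baseline
    for factor in modifiers.values():
        rate *= factor
    return rate

rate = effective_rate(BASELINE_RATE, {**accelerators, **decelerators})
print(round(rate, 3))  # 1.081 -- net acceleration despite the decelerators
```

When a modifier actually occurs (e.g. a societal collapse, or an outlier breakthrough), its factor is added or revised and the effective rate is recomputed, which is how the model reflects these impacts when they do occur.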

IV. Prediction Techniques

This section considers potential prediction techniques. Some techniques do not require the indicators above. Most will benefit from considering some or all of the indicators. It is very important not to lose sight of the fact that mankind is prone to inaccurate probability estimates and overconfidence (Lichtenstein et al. 1992; Yates et al. 2002).

Factors Impacting Accurate Prediction

Prediction Models

    Wisdom of Crowds

    Hardware Extrapolation

    Breakthrough Curve

    Evolutionary Extrapolation

    Machine Intelligence Improvement Curve

    ++

V. Potential AGI Componentry

This section establishes a set of core components and functionality required for a potential AGI. Each of the components is then examined to determine its current state of development, as well as any identifiable dependencies. An estimate of the functionality required for an AGI is then created, followed by a gap analysis to quantify the discoveries required to fill the gap.

There are various existing AI implementations as well as AGI concepts currently being investigated. Each one brings in unique elements. The common elements across most include: decision processing, expert systems, pattern recognition, and speech/writing recognition. Each of these would include discipline-specific machine learning and search/pre-processing functionality. There also needs to be a general learning function for the addition of new disciplines.

Within each discipline there are collections of utility functions: the component technologies required to make the higher-order discipline efficient and useful. Each of the elements mentioned is an area of specialized study being pursued around the world, drawing from an even larger set of specializations. Due to this complexity, there are in most cases second-order and higher specializations.

Alternative Componentry

There are areas of research with high potential to insert new components or substantially change our understanding of the components described.

Specialized Componentry

Robotics and other elements.

Current State

    Decision Processing

    Expert Systems

    Pattern Recognition

    Speech/Writing Recognition

    Machine Learning

       Decision Processing
       Expert Systems
       Pattern Recognition
       Speech/Writing Recognition

    Search/Pre-Processing

       Decision Processing
       Expert Systems
       Pattern Recognition
       Speech/Writing Recognition

Target State

The behaviors required for an AGI to function with acceptable speed and accuracy are not precisely defined. The results of this section are based on a survey of definitions from the available research.

    Decision Processing

    Expert Systems

    Pattern Recognition

    Speech/Writing Recognition

    Machine Learning

       Decision Processing
       Expert Systems
       Pattern Recognition
       Speech/Writing Recognition

    Search/Pre-Processing

       Decision Processing
       Expert Systems
       Pattern Recognition
       Speech/Writing Recognition

Dependencies

Gap Analysis
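A minimal sketch of what such a gap analysis could look like, assuming each component's maturity can be scored on a simple 0-10 scale. All scores below are hypothetical placeholders, not assessments made by this paper:

```python
# Illustrative gap analysis: gap = target maturity - current maturity,
# per component, with components ranked by remaining gap.

current_state = {
    "decision_processing": 6,
    "expert_systems": 7,
    "pattern_recognition": 5,
    "speech_writing_recognition": 6,
    "machine_learning": 4,
    "search_preprocessing": 7,
}
target_state = {name: 10 for name in current_state}  # AGI-level maturity

gaps = {name: target_state[name] - score for name, score in current_state.items()}

# Rank components by remaining gap, largest first:
ranked = sorted(gaps.items(), key=lambda item: item[1], reverse=True)
for name, gap in ranked:
    print(f"{name}: gap {gap}")
```

Each closed point of gap corresponds to a discovery or event that can be itemized as an indicator in the next section.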

VI. Indicators

This section examines the output of the gap analyses for additional dependencies. Then the outputs are assembled in an approximate order of difficulty, based on our current understanding of the complexity of each output. Using this ordering, combined with the dependencies, a collection of indicators is constructed to identify an approximate progression of discoveries that ultimately yield an AGI.

Additional Dependencies

Complexity Ranking

Itemized Indicators
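One hedged sketch of how itemized indicators with weighting factors might be tracked; the indicator names and weights below are hypothetical, chosen only to show the mechanics:

```python
# Illustrative only: each indicator carries a weight reflecting its estimated
# difficulty/importance. When a discovery fires an indicator, the weighted
# progress score rises.

indicators = {
    "general_learning_demonstrated": 0.30,
    "cross_domain_transfer": 0.25,
    "human_level_pattern_recognition": 0.20,
    "self_directed_goal_pursuit": 0.15,
    "open_ended_dialogue": 0.10,
}

def progress_score(observed):
    """Sum the weights of indicators that have been observed so far."""
    return sum(weight for name, weight in indicators.items() if name in observed)

# Two indicators observed:
print(round(progress_score({"human_level_pattern_recognition", "open_ended_dialogue"}), 2))  # 0.3
```

The weights themselves should be revised as the complexity ranking matures; a heavily weighted indicator firing early is itself evidence that the ordering needs updating.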

VII. Predictive Model

In this section, a model is constructed using the indicators and other inputs to estimate the occurrence of AGI. It is examined for strengths and weaknesses that can be explored to improve the model. Additional enhancements to the model are suggested for exploration.

The Model
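As a starting point, here is a deliberately minimal sketch of such a model: the fraction of weighted indicators observed so far is extrapolated forward at the current effective discovery rate to produce a time window. Every number is a hypothetical placeholder; the point is the updatable mechanics, not the estimate:

```python
# Illustrative predictive model: extrapolate remaining weighted-indicator
# progress at the current effective discovery rate.

def estimate_years_remaining(progress, rate, pessimism=1.5, optimism=0.75):
    """Extrapolate remaining progress at the current rate.

    progress: fraction of weighted indicators observed so far (0..1)
    rate:     weighted progress per year, after applying rate modifiers
    Returns a (low, high) range in years, widened by the optimism and
    pessimism factors to reflect the model's uncertainty.
    """
    if progress >= 1.0:
        return (0.0, 0.0)
    base = (1.0 - progress) / rate
    return (base * optimism, base * pessimism)

# With 30% of weighted indicators observed and 5% progress per year:
low, high = estimate_years_remaining(progress=0.30, rate=0.05)
print(f"estimated {low:.0f} to {high:.0f} years")  # estimated 10 to 21 years
```

Re-running the estimate whenever an indicator fires or a rate modifier changes is what lets the model's margin of error narrow over time, as the conclusions below call for.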

Strengths & Weaknesses

Enhancements

VIII. Conclusions

Based on the data and model created above, the estimated time frame for the appearance of AGI is from x to y. As noted throughout this paper, given the complex nature of AGI and the large number of discoveries and events that must be quantified using imperfect methodologies, a precise prediction of when AGI will appear is currently impossible.

The model developed in this paper does establish a quantifiable starting point for the creation of an increasingly accurate tool that can be used to continually narrow the margin of error. It also provides a starting set of indicators that can serve as early warnings of AGI as discoveries and events occur.

A model of the brain's mapping of the territory

0 ataftoti 29 January 2012 06:45PM

I'm linking to a video which describes how the brain may be learning to improve its skills at mapping the territory from limited samples.

This model of learning was previously unknown to me. Judging from the date of the video, what I heard from the person who referred me to it, and the fact that I do not recall hearing much related to this on LessWrong, I think this may be recent enough that some people here would benefit from me spreading the word.

Check out this model of a learning theory; the background introduction starts at the 52:00 mark and the model itself gets going at the 54:00 mark. The overview of the model is explained in approximately four minutes.

http://www.youtube.com/watch?v=vcp6J1T60qc&t=52m19s