AI indifference through utility manipulation

2 Stuart_Armstrong 02 September 2010 05:06PM

Indifference is a precious and rare commodity for complex systems. The most likely effect of making a change in an intricate apparatus is a whole slew of knock-on effects crowned with unintended consequences. It would be ideal if one could make a change and be sure that the effects would remain isolated - that the rest of the system would be indifferent to the change.

For instance, it might be a sensible early-AI precaution to have an extra observer somewhere, sitting with his hand upon a button, ready to detonate explosives should the AI make a visible power grab. Except, of course, the AI will become aware of this situation, and will factor it into any plans it makes, either by increasing its deception or by grabbing control of the detonation system as a top priority. We would be a lot safer if the AI were somehow completely indifferent to the observer and the explosives. That is a complex wish that we don't really know how to phrase; let's make it simpler, and make it happen.

Rationality quotes: September 2010

4 Morendil 01 September 2010 06:53AM

This is our monthly thread for collecting these little gems and pearls of wisdom, rationality-related quotes you've seen recently, or had stored in your quotesfile for ages, and which might be handy to link to in one of our discussions.

  • Please post all quotes separately, so that they can be voted up/down separately.  (If they are strongly related, reply to your own comments.  If strongly ordered, then go ahead and post them together.)
  • Do not quote yourself.
  • Do not quote comments/posts on LW/OB.
  • No more than 5 quotes per person per monthly thread, please.

Berkeley LW Meet-up Sunday September 5

5 LucasSloan 01 September 2010 02:47AM

So it has come to my attention that there are a lot of LWers in and around Berkeley, so I thought it might be nice if we all got together and shot the breeze for a couple hours.  I think it would be good to meet at the Starbucks at 2224 Shattuck Avenue at 7 o'clock on September 5th.

Less Wrong: Open Thread, September 2010

2 matt 01 September 2010 01:40AM

This thread is for the discussion of Less Wrong topics that have not appeared in recent posts. If a discussion gets unwieldy, celebrate by turning it into a top-level post.

Dreams of AIXI

2 jacob_cannell 30 August 2010 10:15PM

Implications of the Theory of Universal Intelligence

If you hold the AIXI theory of universal intelligence to be correct, that is, that it is a useful model for general intelligence at the quantitative limits, then you should take the Simulation Argument seriously.


AIXI shows us the structure of universal intelligence as computation approaches infinity.  Imagine that we had an infinite or near-infinite Turing Machine.  There then exists a relatively simple 'brute force' optimal algorithm for universal intelligence. 


Armed with such massive computation, we could just take all of our current observational data and then use a particular weighted search through the subspace of all possible programs that correctly predict this sequence (in this case all the data we have accumulated to date about our small observable slice of the universe).  AIXI in raw form is not computable (because of the halting problem), but the slightly modified time-limited version is, and this is still universal and optimal.


The philosophical implication is that actually running such an algorithm on an infinite Turing Machine would have the interesting side effect of actually creating all such universes.

AIXI’s mechanics, based on Solomonoff Induction, bias against complex programs with an exponential falloff (2^-l(p), where l(p) is the length of program p), a mechanism similar to the principle of Occam’s Razor.  The bias against longer (and thus more complex) programs lends strong support to the goal of String Theorists, who are attempting to find a simple, shorter program that can unify all current physical theories into a single compact description of our universe.  We must note that to date, efforts towards this admirable (and well-justified) goal have not borne fruit.  We may actually find that the simplest algorithm that explains our universe is more ad-hoc and complex than we would desire it to be.  But leaving that aside, imagine that there is some relatively simple program that concisely explains our universe.
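
The exponential falloff can be illustrated with a minimal sketch. The candidate program names and bit lengths below are hypothetical, chosen only for illustration; the sketch assumes every candidate already reproduces the observed data and simply normalizes the 2^-l(p) weights among them.

```python
from fractions import Fraction


def occam_weights(program_lengths):
    """Map each program name to its normalized 2^-length weight."""
    raw = {name: Fraction(1, 2 ** length)
           for name, length in program_lengths.items()}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


# Hypothetical candidates (lengths in bits); all are assumed to fit the data.
candidates = {"compact_theory": 10, "patched_theory": 15, "lookup_table": 40}
weights = occam_weights(candidates)

for name, weight in weights.items():
    print(f"{name}: {float(weight):.6f}")
```

Under these assumptions a program that is 5 bits shorter receives 2^5 = 32 times the weight, which is the sense in which the prior penalizes ad-hoc complexity.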

If we look at the history of the universe to date, from the Big Bang to our current moment in time, there appears to be a clear local telic evolutionary arrow towards greater X, where X is sometimes described as or associated with: extropy, complexity, life, intelligence, computation, etc.  It's also fairly clear that X (however quantified) is an exponential function of time.  Moore’s Law is a specific example of this greater pattern.


This leads to a reasonable inductive assumption, let us call it the reasonable assumption of progress: local extropy will continue to increase exponentially for the foreseeable future, and thus so will intelligence and computation (both physical computational resources and algorithmic efficiency). The reasonable assumption of progress appears to be a universal trend, a fundamental emergent property of our physics.


Simulations

If you accept that the reasonable assumption of progress holds, then AIXI implies that we almost certainly live in a simulation now.


As our future descendants expand in computational resources and intelligence, they will approach the limits of universal intelligence.  AIXI says that any such powerful universal intelligence, no matter what its goals or motivations, will create many simulations which effectively are pocket universes.  


The AIXI model proposes that simulation is the core of intelligence (with human-like thoughts being simply one approximate algorithm), and as you approach the universal limits, the simulations which universal intelligences necessarily employ will approach the fidelity of real universes - complete with all the entailed trappings such as conscious simulated entities.


The reasonable assumption of progress modifies our big-picture view of cosmology and the predicted history and future of the universe.  A compact physical theory of our universe (or multiverse), when run forward on a sufficient Universal Turing Machine, will lead not to one single universe/multiverse, but an entire ensemble of such multi-verses embedded within each other in something like a hierarchy of Matryoshka dolls.

The number of possible levels of embedding and the branching factor at each step can be derived from physics itself, and although such derivations are preliminary and necessarily involve some significant unknowns (mainly related to the final physical limits of computation), suffice to say that we have sufficient evidence to believe that the branching factor is absolutely massive, and many levels of simulation embedding are possible.
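
To see why a large branching factor matters, here is a rough counting sketch. The branching factor and depth used are hypothetical placeholders, not values derived from physics, and the sketch assumes every universe in the hierarchy is counted equally and spawns the same number of child simulations.

```python
from fractions import Fraction


def base_reality_fraction(branching_factor, max_depth):
    """Fraction of universes in the hierarchy that are the unsimulated root.

    Level 0 is the base universe; level k holds branching_factor ** k universes.
    """
    total = sum(branching_factor ** level for level in range(max_depth + 1))
    return Fraction(1, total)


# Hypothetical numbers: 10 simulations per universe, 3 levels of embedding.
print(base_reality_fraction(10, 3))  # 1/1111
```

Even with these small illustrative numbers, the base universe is only one of 1111; larger branching factors or deeper nesting shrink that fraction further.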

Some seem to have an intrinsic bias against the idea based solely on its strangeness.

Another common mistake stems from the anthropomorphic bias: people tend to imagine the simulators as future versions of themselves.

The space of potential future minds is vast, and it is a failure of imagination on our part to assume that our descendants will be similar to us in details, especially when we have specific reasons to conclude that they will be vastly more complex.

Asking whether future intelligences will run simulations for entertainment or other purposes is not the right question, not even the right mode of thought.  They may, they may not; it is difficult to predict future goal systems.  But those aren’t important questions anyway, as all universal intelligences will ‘run’ simulations, simply because that is precisely the core nature of intelligence itself.  As intelligence expands exponentially into the future, the simulations expand in quantity and fidelity.


The Ensemble of Multiverses


Some critics of the Simulation Argument (SA) rationalize their way out by advancing a position of ignorance concerning the set of possible external universes our simulation may be embedded within.  The reasoning then concludes that since this set is essentially unknown, infinite and uniformly distributed, the SA as such tells us nothing.  These assumptions do not hold water.

Imagine our physical universe, and its minimal program encoding, as a point in a higher multi-dimensional space.  The entire aim of physics in a sense is related to AIXI itself: through physics we are searching for the simplest program that can consistently explain our observable universe.  As noted earlier, the SA then falls out naturally, because it appears that any universe of our type when run forward necessarily leads to a vast fractal hierarchy of embedded simulated universes.

At the apex is the base level of reality and all the other simulated universes below it correspond to slightly different points in the space of all potential universes - as they are all slight approximations of the original.  But would other points in the space of universe-generating programs also generate observed universes like our own?

We know that the fundamental constants in the current physics are apparently well-tuned for life, thus our physics is a lone point in the topological space supporting complex life: even just tiny displacements in any direction result in lifeless universes.  The topological space around our physics is thus sparse for life/complexity/extropy.  There may be other topological hotspots, and if you go far enough in some direction you will necessarily find other universes in Tegmark’s Ultimate Ensemble that support life.  However, AIXI tells us that intelligences in those universes will simulate universes similar to their own, and thus nothing like our universe.

On the other hand we can expect our universe to be slightly different from its parent due to the constraints of simulation, and we may even eventually be able to discover evidence of the approximation itself.  There are some tentative hints from the long-standing failure to find a GUT of physics, and perhaps in the future we may find our universe is an ad-hoc approximation of a simpler (but more computationally expensive) GUT theory in the parent universe.


Alien Dreams

Our Milky Way galaxy is vast and old, consisting of hundreds of billions of stars, some of which are more than 13 billion years old, nearly three times as old as our sun.  We have direct evidence of technological civilization developing in 4 billion years from simple protozoans, but it is difficult to generalize past this single example.  However, we do now have mounting evidence that planets are common, the biological precursors to life are probably common, and simple life may even have had a historical presence on Mars; all signs increasingly support the principle of mediocrity: that our solar system is not a precious gem, but is in fact a typical random sample.

If the evidence for the mediocrity principle continues to mount, it provides a further strong support for the Simulation Argument.  If we are not the first technological civilization to have arisen, then technological civilization arose and achieved Singularity long ago, and we are thus astronomically more likely to be in an alien rather than posthuman simulation.

What does this change?

The set of simulation possibilities can be subdivided into PHS (posthuman historical), AHS (alien historical), and AFS (alien future) simulations (as posthuman future simulation is inconsistent).  If we discover that we are unlikely to be the first technological Singularity, we should assume AHS and AFS dominate.  For reasons beyond this scope, I imagine that the AFS set will outnumber the AHS set.

Historical simulations would aim for historical fidelity, but future simulations would aim for fidelity to a 'what-if' scenario, considering some hypothetical action the alien simulating civilization could take.  In this scenario, the first civilization to reach technological Singularity in the galaxy would spread out, gather knowledge about the entire galaxy, and create a massive number of simulations.  It would use these in the same way that all universal intelligences do: to consider the future implications of potential actions.

What kinds of actions?  

The first-born civilization would presumably encounter many planets that already harbor life in various stages, along with planets that could potentially harbor life.  It would use forward simulations to predict the final outcome of future civilizations developing on these worlds.  It would then rate them according to some ethical/utilitarian theory (we don't even need to speculate on the criteria), and it would consider and evaluate potential interventions to change the future historical trajectory of that world: removing undesirable future civilizations, pushing other worlds towards desirable future outcomes, and so on.

At the moment it's hard to assign a priori weights to future vs historical simulation possibilities, but the apparent age of the galaxy compared to the relative youth of our sun is a tentative hint that we live in a future simulation, and thus that our history has potentially been altered.

 

Morality as Parfitian-filtered Decision Theory?

19 SilasBarta 30 August 2010 09:37PM

Non-political follow-up to: Ungrateful Hitchhikers (offsite)

Related to: Prices or Bindings?, The True Prisoner's Dilemma

Summary: Situations like the Parfit's Hitchhiker problem select for a certain kind of mind: specifically, one that recognizes that an action can be optimal, in a self-interested sense, even if it can no longer cause any future benefit.  A mind that can identify such actions might put them in a different category which enables it to perform them, in defiance of the (futureward) consequentialist concerns that normally need to motivate it.  Our evolutionary history has put us through such "Parfitian filters", and the corresponding actions, viewed from the inside, feel like "something we should do", even if we don’t do it, and even if we recognize the lack of a future benefit.  Therein lies the origin of our moral intuitions, as well as the basis for creating the category "morality" in the first place.

 

Introduction: What kind of mind survives Parfit's Dilemma?

 

Parfit's Dilemma – my version – goes like this: You are lost in the desert and near death.  A superbeing known as Omega finds you and considers whether to take you back to civilization and stabilize you.  It is a perfect predictor of what you will do, and only plans to rescue you if it predicts that you will, upon recovering, give it $0.01 from your bank account.  If it doesn’t predict you’ll pay, you’re left in the desert to die. [1]

 

So what kind of mind wakes up from this?  One that would give Omega the money.  Most importantly, the mind is not convinced to withhold payment on the basis that the benefit was received only in the past.  Even if it recognizes that no future benefit will result from this decision -- and only future costs will result -- it decides to make the payment anyway.
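
The selection effect described above can be sketched in a few lines. This is a toy model under stated assumptions: Omega's perfect prediction is modeled by simply evaluating the agent's policy, and the value placed on survival (in cents) is a hypothetical number picked for illustration.

```python
def omega_rescues(policy):
    """Omega is assumed to be a perfect predictor: it just evaluates the policy."""
    return policy(rescued=True)


def outcome_cents(policy, value_of_life_cents=100_000_000, payment_cents=1):
    """Net payoff in cents; dying in the desert is scored as 0."""
    if not omega_rescues(policy):
        return 0  # never rescued, so the post-rescue choice never arises
    # With a perfect predictor, any policy that reaches this point pays.
    return value_of_life_cents - payment_cents


def pays_after_rescue(rescued):
    """Pays the $0.01 even though the benefit already lies in the past."""
    return True


def refuses_after_rescue(rescued):
    """Reasons that payment can no longer cause any future benefit."""
    return False


print(outcome_cents(pays_after_rescue))     # 99999999
print(outcome_cents(refuses_after_rescue))  # 0
```

In this toy setup the policy that looks locally wasteful after the rescue is the only one that ever gets rescued, which is the filter the dilemma applies to minds.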

Harry Potter and the Methods of Rationality discussion thread, part 3

5 Unnamed 30 August 2010 05:37AM

The second thread has now also exceeded 500 comments, so after 42 chapters of MoR it's time for a new thread.

From the first thread:

Spoiler Warning:  this thread contains unrot13'd spoilers for Harry Potter and the Methods of Rationality up to the current chapter and for the original Harry Potter series.  Please continue to use rot13 for spoilers to other works of fiction, or if you have insider knowledge of future chapters of Harry Potter and the Methods of Rationality.

A suggestion: mention at the top of your comment which chapter you're commenting on, or what chapter you're up to, so that people can understand the context of your comment even after more chapters have been posted.  This can also help people avoid reading spoilers for a new chapter before they realize that there is a new chapter.

Exploitation and cooperation in ecology, government, business, and AI

14 PhilGoetz 27 August 2010 02:27PM

Ecology

An article in a recent issue of Science (Elisa Thebault & Colin Fontaine, "Stability of ecological communities and the architecture of mutualistic and trophic networks", Science 329, Aug 13 2010, pp. 853-856; free summary here) studies 2 kinds of ecological networks: trophic (predator-prey) and mutualistic (in this case, pollinators and flowers).  They looked at the effects of 2 properties of networks: modularity (meaning the presence of small, highly-connected subsets that have few external connections) and nestedness (meaning the likelihood that species X has the same sort of interaction with multiple other species).  (It's unfortunate that they never define modularity or nestedness formally; but this informal definition is still useful.  I'm going to call nestedness "sharing", since they do not state that their definition implies nesting one network inside another.)  They looked at the impact of different degrees of modularity and nestedness, in trophic vs. mutualistic networks, on persistence (fraction of species still alive at equilibrium) and resilience (1/time to return to equilibrium after a perturbation).  They used both simulated networks and data from real-world ecological networks.

What they found is that, in trophic networks, modularity is good (increases persistence and resilience) and sharing is bad; while in mutualistic networks, modularity is bad and sharing is good.  Also, in trophic networks, species go extinct so as to make the network more modular and less sharing; in mutualistic networks, the opposite occurs.

The commonsense explanation is that, if species X is exploiting species Y (trophic), the interaction decreases the health of species Y; and so having more exploiters of Y is bad for both X and Y.  OTOH, if species X benefits from species Y, X will get a secondhand benefit from any mutually-beneficial relationships that Y has; if Y also benefits from X (mutualistic), then neither X nor Y will adapt to prevent Z from also having a mutualistic relationship with Y.  (The theory does not address a mixture of trophic and mutualistic interactions in a single network.)

Cryonics Questions

9 James_Miller 26 August 2010 11:19PM

Cryonics fills many with disgust, a cognitively dangerous emotion.  To test whether a few of your possible cryonics objections are reason- or disgust-based, I list six non-cryonics questions.  Answering yes to any one question indicates that rationally you shouldn’t have the corresponding cryonics objection.

1.  You have a disease and will soon die unless you get an operation.  With the operation you have a non-trivial but far from certain chance of living a long, healthy life.  By some crazy coincidence the operation costs exactly as much as cryonics does and the only hospitals capable of performing the operation are next to cryonics facilities.  Do you get the operation?

Answering yes to (1) means you shouldn’t object to cryonics because of costs or logistics.

2.  You have the same disease as in (1), but now the operation costs far more than you could ever obtain.  Fortunately, you have exactly the right qualifications NASA is looking for in a space ship commander.  NASA will pay for the operation if in return you captain the ship should you survive the operation.  The ship will travel close to the speed of light.  The trip will subjectively take you a year, but when you return one hundred years will have passed on Earth.  Do you get the operation?

Answering yes to (2) means you shouldn't object to cryonics because of the possibility of waking up in the far future.
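
For a sense of what "close to the speed of light" means in question (2), here is a back-of-the-envelope sketch. It assumes special relativity with a constant cruising speed and ignores acceleration and turnaround, so it is only an approximation of the scenario.

```python
import math


def required_speed_fraction(earth_years, ship_years):
    """Speed as a fraction of c for a given Earth-time to ship-time ratio.

    Uses the Lorentz factor gamma = earth_years / ship_years and
    v/c = sqrt(1 - 1/gamma^2), assuming constant speed throughout.
    """
    gamma = earth_years / ship_years
    return math.sqrt(1.0 - 1.0 / gamma ** 2)


beta = required_speed_fraction(earth_years=100.0, ship_years=1.0)
print(f"v = {beta:.5f} c")  # v = 0.99995 c
```

A Lorentz factor of 100 thus corresponds to roughly 99.995% of the speed of light.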

Self-fulfilling correlations

72 PhilGoetz 26 August 2010 09:07PM

Correlation does not imply causation.  Sometimes corr(X,Y) means X=>Y; sometimes it means Y=>X; sometimes it means W=>X, W=>Y.  And sometimes it's an artifact of people's beliefs about corr(X, Y).  With intelligent agents, perceived causation causes correlation.

Volvos are believed by many people to be safe.  Volvo has an excellent record of being concerned with safety; they introduced 3-point seat belts, crumple zones, laminated windshields, and safety cages, among other things.  But how would you evaluate the claim that Volvos are safer than other cars?

Presumably, you'd look at the accident rate for Volvos compared to the accident rate for similar cars driven by a similar demographic, as reflected, for instance, in insurance rates.  (My google-fu did not find accident rates posted on the internet, but insurance rates don't come out especially pro-Volvo.)  But suppose the results showed that Volvos had only 3/4 as many accidents as similar cars driven by similar people.  Would that prove Volvos are safer?
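
One hypothetical way such a result could arise without Volvos being any safer is sketched below. The model is an assumption made for illustration, not data: each driver has a "caution" level between 0 and 1, more cautious drivers are more likely to choose a Volvo because they believe it is safe, and accident risk depends only on caution, never on the car.

```python
def accident_rates(num_drivers=1001, base_risk=0.2):
    """Expected accident rates for Volvo drivers and for everyone else.

    Drivers are spread evenly over caution levels in [0, 1]. The car itself
    has no effect on risk in this model.
    """
    volvo_weight = volvo_accidents = 0.0
    other_weight = other_accidents = 0.0
    for i in range(num_drivers):
        caution = i / (num_drivers - 1)
        p_volvo = caution                         # belief-driven car choice
        p_accident = base_risk * (1.0 - caution)  # identical for every car
        volvo_weight += p_volvo
        volvo_accidents += p_volvo * p_accident
        other_weight += 1.0 - p_volvo
        other_accidents += (1.0 - p_volvo) * p_accident
    return volvo_accidents / volvo_weight, other_accidents / other_weight


volvo_rate, other_rate = accident_rates()
print(f"Volvo drivers: {volvo_rate:.4f}, other drivers: {other_rate:.4f}")
```

In this toy model Volvo drivers have about half the accident rate of other drivers, purely because the belief in Volvo safety sorts cautious drivers into Volvos.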
