[Stub] Ontological crisis = out of environment behaviour?
One problem with AI is the possibility of ontological crises: AIs discovering that their fundamental model of reality is flawed, and being unable to cope safely with that change. Another problem is out-of-environment behaviour: an AI that has been trained to behave very well in a specific training environment messes up when introduced to a more general environment.
It suddenly occurred to me that these might in fact be the same problem in disguise. In both cases, the AI has developed certain ways of behaving in reaction to certain regular features of its environment. Then it is suddenly placed in a situation where these regular features are absent - either because it has realised that these features are actually very different from what it thought (ontological crisis), or because the environment is different and no longer supports the same regularities (out-of-environment behaviour).
In a sense, both these errors may be seen as imperfect extrapolation from partial training data.
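As a toy illustration of what imperfect extrapolation from partial training data can look like, here is a minimal Python sketch. The quadratic "true" environment, the training range [0, 1] and the straight-line model are all assumptions chosen for illustration, not anything specific to a real AI system.

```python
# Toy illustration: a model that fits its training range well
# but extrapolates badly outside it. All choices here are assumptions.

def true_environment(x):
    # Hypothetical "real" regularity; the learner never sees this formula.
    return x ** 2

def fit_line(xs, ys):
    # Ordinary least-squares fit of y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training data only covers the narrow range [0, 1].
train_xs = [i / 10 for i in range(11)]
train_ys = [true_environment(x) for x in train_xs]
slope, intercept = fit_line(train_xs, train_ys)

def model(x):
    # The learned behaviour: a straight line through the training data.
    return slope * x + intercept

# Worst error inside the training range versus the error far outside it.
in_range_error = max(abs(model(x) - true_environment(x)) for x in train_xs)
out_of_range_error = abs(model(10.0) - true_environment(10.0))

print(f"worst error on training range: {in_range_error:.2f}")
print(f"error at x = 10: {out_of_range_error:.2f}")
```

The line is a reasonable summary of the regularities present in the training range, and it is exactly those regularities that stop holding further out.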
The ongoing transformation of quantum field theory
Quantum field theory (QFT) is the basic framework of particle physics. Particles arise from the quantized energy levels of field oscillations; Feynman diagrams are the simple tool for approximating their interactions. The "standard model", the success of which is capped by the recent observation of a Higgs boson lookalike, is a quantum field theory.
But just like everything mathematical, quantum field theory has hidden depths. For the past decade, new pictures of the quantum scattering process (in which particles come together, interact, and then fly apart) have incrementally been developed, and they presage a transformation in the understanding of what a QFT describes.
At the center of this evolution is "N=4 super-Yang-Mills theory", the maximally supersymmetric QFT in four dimensions. I want to emphasize that from a standard QFT perspective, this theory contains nothing but scalar particles (like the Higgs), spin-1/2 fermions (like electrons or quarks), and spin-1 "gauge fields" (like photons and gluons). The ingredients aren't something alien to real physics. What distinguishes an N=4 theory is that the particle spectrum and the interactions are arranged so as to produce a highly extended form of supersymmetry, in which particles have multiple partners (so many LWers should be comfortable with the notion).
In 1997, Juan Maldacena discovered that the N=4 theory is equivalent to a type of string theory in a particular higher-dimensional space. In 2003, Edward Witten discovered that it is also equivalent to a different type of string theory in a supersymmetric version of Roger Penrose's twistor space. Those insights didn't come from nowhere: they explained algebraic facts that had been known for many years, and they have led to a still-accumulating stockpile of discoveries about the properties of N=4 field theory.
What we can say is that the physical processes appearing in the theory can be understood as taking place in either of two dual space-time descriptions. Each space-time has its own version of a particular large symmetry, "superconformal symmetry", and the superconformal symmetry of one space-time is invisible in the other. And now it is becoming apparent that there is a third description, which does not involve space-time at all, in which both superconformal symmetries are manifest, but in which space-time locality and quantum unitarity are not "visible" - that is, they are not manifest in the equations that define the theory in this third picture.
I cannot provide an authoritative account of how the new picture works. But here is my impression. In the third picture, the scattering processes of the space-time picture become a complex of polytopes - higher-dimensional polyhedra, joined at their faces - and the quantum measure becomes the volume of these polyhedra. Where you previously had particles, you now just have the dimensions of the polytopes; and the fact that in general, an n-dimensional space doesn't have n special directions suggests to me that multi-particle entanglements can be something more fundamental than the separate particles that we resolve them into.
It will be especially interesting to see whether this polytope combinatorics, which can give back the scattering probabilities calculated with Feynman diagrams in the usual picture, can work solely with ordinary probabilities. That was Penrose's objective, almost fifty years ago, when he developed the theory of "spin networks" as a new language for the angular momentum calculations of quantum theory; that theory was a step towards the twistor variables now playing an essential role in these new developments. If the probability calculus of quantum mechanics can be obtained from conventional probability theory applied to these "structures" that may underlie familiar space-time, then that would mean that superposition does not need to be regarded as ontological.
I'm talking about this now because a group of researchers around Nima Arkani-Hamed, who are among the leaders in this area, released their first paper in a year this week. It's very new, and so arcane that, among physics bloggers, only Lubos Motl has talked about it.
This is still just one step in a journey. Not only does the paper focus on the N=4 theory - which is not the theory of the real world - but the results only apply to part of the N=4 theory, the so-called "planar" part, described by Feynman diagrams with a planar topology. (For an impressionistic glimpse of what might lie ahead, you could try this paper, whose author has been shouting from the wilderness for years that categorical knot theory is the missing piece of the puzzle.)
The N=4 theory is not reality, but the new perspective should generalize. Present-day calculations in QCD already employ truncated versions of the N=4 theory; and Arkani-Hamed et al specifically mention another supersymmetric field theory (known as ABJM after the initials of its authors), a deformation of which is holographically dual to a theory-of-everything candidate from 1983.
When it comes to seeing reality in this new way, we still only have, at best, a fruitful chaos of ideas and possibilities. But the solid results - the mathematical equivalences - will continue to pile up, and the end product really ought to be nothing less than a new conception of how physics works.
Beware Selective Nihilism
In a previous post, I argued that nihilism is often shortchanged around here. However, I'm far from certain that it is correct, and in the meantime I think we should be careful not to discard our values one at a time by engaging in "selective nihilism" when faced with an ontological crisis, without even realizing that's what's happening. Karl recently reminded me of the post Timeless Identity by Eliezer Yudkowsky, which I noticed seems to be an instance of this.
As I mentioned in the previous post, our values seem to be defined in terms of a world model where people exist as ontologically primitive entities ruled heuristically by (mostly intuitive understandings of) physics and psychology. In this kind of decision system, both identity-as-physical-continuity and identity-as-psychological-continuity make perfect sense as possible values, and it seems humans do "natively" have both values. A typical human being is both reluctant to step into a teleporter that works by destructive scanning, and unwilling to let their physical structure be continuously modified into a psychologically very different being.
If faced with the knowledge that physical continuity doesn't exist in the real world at the level of fundamental physics, one might conclude that it's crazy to continue to value it, and this is what Eliezer's post argued. But if we apply this reasoning in a non-selective fashion, wouldn't we also conclude that we should stop valuing things like "pain" and "happiness" which also do not seem to exist at the level of fundamental physics?
In our current environment, there is widespread agreement among humans as to which macroscopic objects at time t+1 are physical continuations of which macroscopic objects existing at time t. We may not fully understand what exactly it is we're doing when judging such physical continuity; the agreement tends to break down when we start talking about more exotic situations; and if/when we do fully understand our criteria for judging physical continuity, they are unlikely to have a simple definition in terms of fundamental physics. But all of this is true for "pain" and "happiness" as well.
I suggest we keep all of our (potential/apparent) values intact until we have a better handle on how we're supposed to deal with ontological crises in general. If we convince ourselves that we should discard some value, and that turns out to be wrong, the error may be unrecoverable once we've lived with it long enough.
Ontological Crisis in Humans
Imagine a robot that was designed to find and collect spare change around its owner's house. It had a world model where macroscopic everyday objects are ontologically primitive and ruled by high-school-like physics and (for humans and their pets) rudimentary psychology and animal behavior. Its goals were expressed as a utility function over this world model, which was sufficient for its designed purpose. All went well until one day, a prankster decided to "upgrade" the robot's world model to be based on modern particle physics. This unfortunately caused the robot's utility function to instantly throw a domain error exception (since its inputs are no longer the expected list of macroscopic objects and associated properties like shape and color), thus crashing the controlling AI.
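The failure mode in this story can be sketched in a few lines of Python. Everything here (the `MacroObject` and `Particle` types, the `DomainError` exception, the cent values) is a hypothetical stand-in invented for illustration; the point is only that a utility function written against one ontology has nothing to say about inputs from another.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MacroObject:
    # Old ontology: everyday objects are primitive (hypothetical type).
    kind: str
    value_cents: int = 0
    collected: bool = False

@dataclass(frozen=True)
class Particle:
    # New ontology: only particles and their properties (hypothetical type).
    species: str
    position: tuple
    momentum: tuple

class DomainError(TypeError):
    """Raised when the utility function is given a state outside its domain."""

def utility(world_state):
    # Total value, in cents, of the coins the robot has collected.
    total = 0
    for obj in world_state:
        if not isinstance(obj, MacroObject):
            raise DomainError(f"cannot evaluate {type(obj).__name__}")
        if obj.kind == "coin" and obj.collected:
            total += obj.value_cents
    return total

old_world = [
    MacroObject("coin", 25, collected=True),
    MacroObject("coin", 10, collected=True),
    MacroObject("coin", 5, collected=False),
    MacroObject("sofa"),
]
new_world = [
    Particle("electron", (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
    Particle("proton", (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)),
]

print(utility(old_world))  # works: the inputs are in the expected ontology

crashed = False
try:
    utility(new_world)
except DomainError as err:
    crashed = True
    print("domain error:", err)
```

Nothing about the coins has changed in the second world; the utility function simply no longer recognises its inputs.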
According to Peter de Blanc, who used the phrase "ontological crisis" to describe this kind of problem,
Human beings also confront ontological crises. We should find out what cognitive algorithms humans use to solve the same problems described in this paper. If we wish to build agents that maximize human values, this may be aided by knowing how humans re-interpret their values in new ontologies.
I recently realized that a couple of problems that I've been thinking over (the nature of selfishness and the nature of pain/pleasure/suffering/happiness) can be considered instances of ontological crises in humans (although I'm not so sure we necessarily have the cognitive algorithms to solve them). I started thinking in this direction after writing this comment:
This formulation or variant of TDT requires that before a decision problem is handed to it, the world is divided into the agent itself (X), other agents (Y), and "dumb matter" (G). I think this is misguided, since the world doesn't really divide cleanly into these 3 parts.
What struck me is that even though the world doesn't divide cleanly into these 3 parts, our models of the world actually do. In the world models that we humans use on a day to day basis, and over which our utility functions seem to be defined (to the extent that we can be said to have utility functions at all), we do take the Self, Other People, and various Dumb Matter to be ontologically primitive entities. Our world models, like the coin collecting robot's, consist of these macroscopic objects ruled by a hodgepodge of heuristics and prediction algorithms, rather than microscopic particles governed by a coherent set of laws of physics.
For example, the amount of pain someone is experiencing doesn't seem to exist in the real world as an XML tag attached to some "person entity", but that's pretty much how our models of the world work, and perhaps more importantly, that's what our utility functions expect their inputs to look like (as opposed to, say, a list of particles and their positions and velocities). Similarly, a human can be selfish just by treating the object labeled "SELF" in its world model differently from other objects, whereas an AI with a world model consisting of microscopic particles would need to somehow inherit or learn a detailed description of itself in order to be selfish.
To fully confront the ontological crisis that we face, we would have to upgrade our world model to be based on actual physics, and simultaneously translate our utility functions so that their domain is the set of possible states of the new model. We currently have little idea how to accomplish this, and instead what we do in practice is, as far as I can tell, keep our ontologies intact and utility functions unchanged, but just add some new heuristics that in certain limited circumstances call out to new physics formulas to better update/extrapolate our models. This is actually rather clever, because it lets us make use of updated understandings of physics without ever having to, for instance, decide exactly what patterns of particle movements constitute pain or pleasure, or what patterns constitute oneself. Nevertheless, this approach hardly seems capable of being extended to work in a future where many people may have nontraditional mind architectures, or have a zillion copies of themselves running on all kinds of strange substrates, or be merged into amorphous group minds with no clear boundaries between individuals.
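To make the shape of the translation problem concrete, here is one naive way to picture it in Python: keep the old utility function and compose it with a bridge that maps new-ontology states back to old-ontology states. This is a sketch under heavy assumptions, not Peter de Blanc's method or a proposed solution. The `object_id` label on each particle and the `catalogue` of macroscopic objects are hypothetical cheats, and they are exactly the parts that real physics does not supply.

```python
# Sketch: U_new(particle_state) = U_old(bridge(particle_state)).
# All names and data below are hypothetical illustrations.

def old_utility(macro_state):
    # Utility over the old ontology: total cents of collected coins.
    return sum(
        obj["cents"]
        for obj in macro_state.values()
        if obj["kind"] == "coin" and obj["collected"]
    )

def bridge(particle_state, catalogue):
    # Map a particle-level state to a macro-level state.
    # ASSUMPTION: every particle carries an "object_id" saying which
    # macroscopic object it belongs to. Real physics provides no such tag;
    # deciding which particle patterns count as a "coin" is the hard part.
    seen_ids = {p["object_id"] for p in particle_state}
    return {oid: catalogue[oid] for oid in seen_ids if oid in catalogue}

def new_utility(particle_state, catalogue):
    # The translated utility function, defined on the new ontology.
    return old_utility(bridge(particle_state, catalogue))

catalogue = {
    "coin-1": {"kind": "coin", "cents": 25, "collected": True},
    "coin-2": {"kind": "coin", "cents": 10, "collected": False},
    "sofa": {"kind": "furniture", "cents": 0, "collected": False},
}
particles = [
    {"species": "electron", "object_id": "coin-1"},
    {"species": "proton", "object_id": "coin-1"},
    {"species": "electron", "object_id": "coin-2"},
    {"species": "electron", "object_id": "sofa"},
    {"species": "photon", "object_id": None},  # belongs to no known object
]

print(new_utility(particles, catalogue))
```

All the difficulty has been pushed into `bridge` and the catalogue: they presuppose the old ontology's objects rather than deriving them from physics.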
By the way, I think nihilism often gets shortchanged around here. Given that we do not actually have at hand a solution to ontological crises in general or to the specific crisis that we face, what's wrong with saying that the solution set may just be null? Given that evolution doesn't constitute a particularly benevolent and farsighted designer, perhaps we may not be able to do much better than that poor spare-change collecting robot? If Eliezer is worried that actual AIs facing actual ontological crises could do worse than just crash, should we be very sanguine that for humans everything must "add up to moral normality"?
To expand a bit more on this possibility, many people have an aversion to moral arbitrariness, so we need at a minimum a utility translation scheme that's principled enough to pass that filter. But our existing world models are a hodgepodge put together by evolution, so there may not be any such sufficiently principled scheme, which (if other approaches to solving moral philosophy also don't pan out) would leave us with legitimate feelings of "existential angst" and nihilism. One could perhaps still argue that any current such feelings are premature, but maybe some people have stronger intuitions than others that these problems are unsolvable?
Do we have any examples of humans successfully navigating an ontological crisis? The LessWrong Wiki mentions loss of faith in God:
In the human context, a clear example of an ontological crisis is a believer’s loss of faith in God. Their motivations and goals, coming from a very specific view of life suddenly become obsolete and maybe even nonsense in the face of this new configuration. The person will then experience a deep crisis and go through the psychological task of reconstructing its set of preferences according the new world view.
But I don't think loss of faith in God actually constitutes an ontological crisis, or if it does, certainly not a very severe one. An ontology consisting of Gods, Self, Other People, and Dumb Matter just isn't very different from one consisting of Self, Other People, and Dumb Matter (the latter could just be considered a special case of the former with quantity of Gods being 0), especially when you compare either ontology to one made of microscopic particles or even less familiar entities.
But to end on a more positive note, realizing that seemingly unrelated problems are actually instances of a more general problem gives some hope that by "going meta" we can find a solution to all of these problems at once. Maybe we can solve many ethical problems simultaneously by discovering some generic algorithm that can be used by an agent to transition from any ontology to another?
(Note that I'm not saying this is the right way to understand one's real preferences/morality, but just drawing attention to it as a possible alternative to other more "object level" or "purely philosophical" approaches. See also this previous discussion, which I recalled after writing most of the above.)
AI ontology crises: an informal typology
(with thanks to Owain Evans)
An ontological crisis happens when an agent's underlying model of reality changes, such as a Newtonian agent realising it was living in a relativistic world all along. These crises are dangerous if they scramble the agent's preferences: in the example above, an agent dedicated to maximising pleasure over time could switch to completely different behaviour when it moves to relativistic time; depending on the transition, it may react by accelerating happy humans to near light speed or, conversely, by banning them from moving - or by something considerably weirder.
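A rough calculation shows how both of the reactions mentioned above could arise. The Python sketch below uses the standard time-dilation factor from special relativity; the two candidate reinterpretations of "pleasure over time", the 80-year lifespan and the 100-year horizon are assumptions made up for illustration.

```python
import math

def lorentz_gamma(beta):
    # Time-dilation factor for speed beta = v / c, with 0 <= beta < 1.
    return 1.0 / math.sqrt(1.0 - beta ** 2)

def happy_coordinate_years(lifespan_years, beta):
    # Reading A (assumed): count years on the agent's own clock during
    # which a human with a fixed biological lifespan is happy.
    # A moving human's lifespan is stretched by gamma in the agent's frame.
    return lifespan_years * lorentz_gamma(beta)

def happy_proper_years(horizon_years, beta):
    # Reading B (assumed): count years experienced by the human within a
    # fixed horizon on the agent's clock. Motion shrinks this by 1 / gamma.
    return horizon_years / lorentz_gamma(beta)

for beta in (0.0, 0.6, 0.99):
    a = happy_coordinate_years(80.0, beta)
    b = happy_proper_years(100.0, beta)
    print(f"v = {beta:.2f}c  reading A: {a:7.1f}  reading B: {b:7.1f}")
```

Under reading A the score grows without bound as the speed approaches c, which favours accelerating happy humans; under reading B it is largest at rest, which favours banning movement. Both are defensible translations of the same Newtonian preference.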
Peter de Blanc has a sensible approach to minimising the disruption ontological crises can cause to an AI, but this post is concerned with analysing what happens when such approaches fail. How bad could it be? Well, this is AI, so the default is of course: unbelievably, hideously bad (i.e. situation normal). But in what ways exactly?