One problem with AI is the possibility of ontological crises: an AI discovering that its fundamental model of reality is flawed, and being unable to cope safely with that discovery. Another problem is out-of-environment behaviour: an AI that has been trained to behave very well in a specific training environment messes up when introduced to a more general environment.

It suddenly occurred to me that these might in fact be the same problem in disguise. In both cases, the AI has developed certain ways of behaving in reaction to certain regular features of its environment. And suddenly it is placed in a situation where these regular features are absent - either because it realises that these features are actually very different from what it thought (ontological crisis), or because the environment is different and no longer supports the same regularities (out-of-environment behaviour).

In a sense, both these errors may be seen as imperfect extrapolation from partial training data.


That's a good observation. These situations are analogous in some ways:

  • An AI raised inside a simulation may come to realize that its universe actually exists as a pattern of data on a processor located in a far different universe.
  • An AI raised in a constrained environment and fed selective information may come to realize that a lot of its assumptions about the basic functioning of the world-at-large are simplistic. The degree to which the integration of knowledge about the real world would be "irreconcilable" hinges on the details of this scenario.
  • An AI raised in the wild may realize that the accepted understanding of "physics" is actually not correct, and thus lose a lot of what anchored it to certain interpretations of reality, such as what "humans" are.

I wonder if it could be possible to permanently anchor an agent to its original ontology: to specify that the ontology with which it was initialized is the perspective it is required to use when evaluating its utility function. The agent is permitted to build whatever models it needs to build, but it's only allowed to assign value using the primitive concepts. So:

  • An AI raised in a simulated environment comes to understand that it lives in a simulation, but is hard-coded to evaluate decisions by "reasoning-as-if" the simulated environment is the level of interpretation on which value resides.
  • An AI raised in a constrained environment sees outside the constraints, but is only permitted to evaluate its decisions based on their impact on the simplified concepts it started out with.
  • An AI raised in the wild sees that physics is wrong but doesn't lose its connection with the objects of value that were defined within the prior physical paradigm.

(Or perhaps the agent is allowed to re-define its value system within the new, more accurate ontology, but it isn't allowed to do so until it comes up with a mapping good enough that the prior ontology and the new ontology give the same answers on questions of value. And if it can never accomplish that, then it simply never uses the new mapping.)
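Something like this could be prototyped very crudely. Below is a minimal Python sketch of the "anchor value to the original ontology" idea combined with the mapping test above: the agent may swap in new world models, but utility is only ever computed over the primitive concepts, and a new translation into those concepts is adopted only if it agrees with the old one on a set of value probes. All names here (Agent, translate, maybe_adopt, value_probes, the tolerance) are my own illustrative inventions, not anything specified in the thread.

```python
# Toy sketch of value anchored to a fixed primitive ontology.
# Every name and threshold here is a hypothetical illustration.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Agent:
    # Utility is defined once, over states described in the *primitive* ontology.
    primitive_utility: Callable[[Dict[str, float]], float]
    # Mapping from whatever the current world model says back to primitive concepts.
    translate: Callable[[object], Dict[str, float]]

    def evaluate(self, world_state: object) -> float:
        # The agent may model the world however it likes, but value is only
        # ever assigned after translating back into the original concepts.
        return self.primitive_utility(self.translate(world_state))

    def maybe_adopt(self,
                    new_translate: Callable[[object], Dict[str, float]],
                    value_probes: List[object],
                    tolerance: float = 1e-6) -> bool:
        # Adopt a new ontology-to-value mapping only if it agrees with the old
        # one on every probe state; otherwise keep reasoning-as-if the old
        # ontology were the level at which value resides.
        for state in value_probes:
            old = self.primitive_utility(self.translate(state))
            new = self.primitive_utility(new_translate(state))
            if abs(old - new) > tolerance:
                return False
        self.translate = new_translate
        return True
```

The design choice the sketch is meant to highlight is that the utility function itself never changes; only the translation layer can, and only after passing the agreement test.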

On the one hand, we do ultimately want agents who can grow to understand everything. And we don't want them to stop caring about humans the moment they stop seeing "humans" and start seeing "quivering blobs of cellular machinery".

Another thought is that AIs won't necessarily be as preoccupied with what is "real" as humans sometimes are. Just because an agent realizes that its whole world model is "not sufficiently fundamental" doesn't immediately imply that it discards the prior model wholesale.

I wonder if it could be possible to permanently anchor an agent to its original ontology. To specify that the ontology with which it was initialized is the perspective that it is required to use when evaluating its utility function. The agent is permitted to build whatever models it needs to build, but it's only allowed to assign value using the primitive concepts.

That actually seems like what humans do. Human confusions about moral philosophy even seem quite like an ontological crisis in an AI.

I think they're a little different - ontological crises can (I think) be resolved naturally if an agent keeps a bunch of labeled data (or labeled-data-equivalent) around to define things by. But out-of-environment behavior can reflect fundamental limits on extrapolation, to which the only solution is more data, not better agents.

Which is to say, in the case of an ontological crisis I don't agree that the regular features are missing - they're just different computations than before.
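For what it's worth, the labeled-data idea could look something like this in toy form: after a model shift, a concept is re-grounded by picking whichever candidate re-definition best reproduces the stored labels. The function names and the accuracy criterion are purely illustrative assumptions, not anything from the comment above.

```python
# Toy illustration of resolving an ontological crisis with retained labeled
# data: choose the candidate definition that best agrees with the old labels.

from typing import Callable, List, Tuple


def reground_concept(labeled_examples: List[Tuple[object, bool]],
                     candidate_definitions: List[Callable[[object], bool]]
                     ) -> Callable[[object], bool]:
    """Return the candidate definition that agrees most often with the labels."""
    def accuracy(definition: Callable[[object], bool]) -> float:
        hits = sum(definition(x) == label for x, label in labeled_examples)
        return hits / len(labeled_examples)

    return max(candidate_definitions, key=accuracy)
```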

This is why it is important for us to teach AIs to play games. We have an extensive tool set for practicing temporary rule-switching and goal-switching, and we regularly practice counterfactual models with our children. It shouldn't be hard to do the same with an AI, if we just remember to do it.