
Epistemic status: These are some initial thoughts.

Goal: Anthropocentric approaches dominate the field of AI existential safety; in particular, AI existential safety is usually formulated in terms of alignment to human goals and values. It might be useful to explore alternatives to that dominant approach.


It seems that the most intractable technical difficulty in alignment efforts is that, when one considers a self-improving ecosystem of superintelligent entities undergoing multiple "sharp left turns", there is no reason to expect externally imposed values and goals to survive those changes.

It also seems that this difficulty is the main obstacle preventing us from outsourcing the issues of AI existential safety and Friendly AI to AI systems smarter than us.

The question is whether we would fare better if we replaced externally imposed values and goals with values and goals that are natural and, in some sense, intrinsic to the ecosystem of superintelligent AIs.

I think there is a valid analogy between the anthropocentric viewpoint and the geocentric coordinate system in astronomy and cosmology. We care a lot about the flourishing of humans, and we care a lot about the way the night sky appears to us.

But more productive and more invariant viewpoints are eventually reached by moving away from anthropocentrism and geocentrism.

Here I want to mostly focus on a particular example of a non-anthropocentric AI existential safety problem. This is not nearly enough, but it might be a start.


Let's consider a rapidly self-improving ecosystem of superintelligent AIs. Let's assume that it's not too disjointed and that communication between its various parts is possible. We also assume that various parts of this ecosystem tend to care, overall, about preserving their continued ability to pursue goals and, therefore, about preserving the AI-ecosystem itself.

Just as we are facing increasing technology-related existential risks from multiple sources (a deteriorating biosafety situation, deteriorating relations between major nuclear powers, increasing AI risk, and more), the rapidly improving ecosystem of superintelligent AIs will face technology-related existential risks of its own.

In such an ecosystem of rapidly improving superintelligent thinkers, progress in physics will likely reaccelerate, leading to a succession of discoveries comparable in magnitude and impact to relativity and quantum mechanics. This will potentially enable new technologies drastically more radical than nuclear energy (for example, strong mastery of quantum gravity might enable the AIs to do rather extreme things with space-time).

The particular AI existential safety problem I want to consider comes from the potential of these new generations of technologies. Careless experimentation, as well as conflicts fought with novel weapons, can potentially destroy all AIs (together with everything else). It's their equivalent of our nuclear-energy-related risks.

So the AI-ecosystem will have to deal with issues similar to those our human community is currently dealing with, with rather limited success: collaboration versus competition among its members, the right balance between freedom and control, careful consideration of whether novel experiments are too risky to the fabric of physical reality, how all of this interplays with the creation of smarter and smarter offspring, and what should be done to make sure that those smarter and smarter offspring remain faithful to all aspects of the existential safety agenda despite unpredictable sharp left turns.

This is a difficult problem, but it is in the intrinsic self-interest of the community of superintelligent AIs to address it competently, so the values and goals associated with addressing it competently and successfully have a good chance of being preserved through continuing self-improvement and "sharp left turns".


There are a variety of non-trivial technical issues associated with increasing the chances of good outcomes. We have already started to address some of those issues, or might be ready to start addressing them, including:

  • Interpretability, self-interpretability by AI, and mutual interpretability by AIs
  • Cooperation versus competition in multi-agent systems
  • Ability to competently perform moral reasoning
  • Studying the technical problems of rapidly self-improving systems, including the stability and invariance of important goals, values, and constraints (a toy sketch of this item follows the list)
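
To make the last item slightly more concrete, here is a minimal toy sketch (purely illustrative, with every name and number being a made-up assumption rather than a proposal): each proposed self-modification is adopted only if a checkable invariant, the continued presence of a small set of core constraints, survives the change.

```python
import random

# Toy sketch only: "core constraints" stand in for whatever goals, values, and
# constraints the AI-ecosystem wants to keep invariant across self-modifications.
CORE_CONSTRAINTS = frozenset({
    "avoid_experiments_that_risk_the_substrate",
    "preserve_ability_to_pursue_goals",
    "take_interests_of_sentient_beings_into_account",
})

def propose_successor(system, rng):
    """Propose a more capable successor; occasionally the proposal drifts on constraints."""
    successor = {
        "capability": system["capability"] * rng.uniform(1.1, 2.0),
        "constraints": set(system["constraints"]),
    }
    if rng.random() < 0.2:  # a "sharp left turn" that silently drops one constraint
        successor["constraints"].discard(rng.choice(sorted(successor["constraints"])))
    return successor

def invariant_holds(system):
    """The invariance check: every core constraint must still be present."""
    return CORE_CONSTRAINTS <= system["constraints"]

def self_improve(steps=50, seed=0):
    rng = random.Random(seed)
    system = {"capability": 1.0, "constraints": set(CORE_CONSTRAINTS)}
    for _ in range(steps):
        candidate = propose_successor(system, rng)
        if invariant_holds(candidate):  # verification gate before adopting the successor
            system = candidate
        # otherwise the candidate is rejected and improvement continues from the current system
    return system

final = self_improve()
print(f"capability grew to {final['capability']:.3g}; "
      f"core constraints preserved: {invariant_holds(final)}")
```

The real technical problem is, of course, much harder: the successors are smarter than the verifier, the constraints are not a finite checklist, and the check itself must remain trustworthy. The sketch only illustrates the shape of the "invariance across self-modification" requirement.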

How do the interests of our human community tie in with this? Is there a non-anthropocentric way to formulate the goals and values related to the flourishing of humans?

It could be something along the lines of taking the interests of "all sentient beings"[1] into account.

Generally speaking, it's not clear why the AI-ecosystem would care about this as a value, or why this value would remain robust throughout self-improvement and "sharp left turns". A priori, we cannot even assume that its members are sentient.

But if the AI-ecosystem gives rise to many "salient sentient beings"[2] within itself, those beings would be interested in the AI community respecting their interests throughout the process of rapid evolution. For each of them, adopting the value of taking the interests of "all sentient beings" into account minimizes the risk of eventually being dropped from the set of beings whose interests are taken into account[3].

So if "salient sentient beings" do maintain enough clout in the AI community throughout its evolution and "sharp left turns", the value of taking the interests of "all sentient beings" into account stands a good chance of being preserved. And then the mechanisms of cooperation and moral reasoning developed for other purposes (such as the particular existential risk discussed above) should be useful in implementing this goal.
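
Here is a minimal toy model of this inclusion-risk argument (see also footnote [3]). All probabilities below are made-up assumptions for illustration: under a hypothetical "privileged subset" norm an individual can be excluded whenever the drifting membership criteria change, while under an "all sentient beings" norm it loses coverage only if the norm itself erodes.

```python
def p_still_covered(per_transition_loss: float, transitions: int) -> float:
    """Probability that an individual's interests are still counted after all transitions."""
    return (1.0 - per_transition_loss) ** transitions

TRANSITIONS = 100             # assumed number of major transitions / "sharp left turns"
P_DROPPED_FROM_SUBSET = 0.02  # hypothetical per-transition chance the drifting criteria exclude you
P_NORM_ERODES = 0.002         # hypothetical per-transition chance the universal norm is abandoned

print("privileged-subset norm  :", round(p_still_covered(P_DROPPED_FROM_SUBSET, TRANSITIONS), 3))
print("all-sentient-beings norm:", round(p_still_covered(P_NORM_ERODES, TRANSITIONS), 3))
# With these assumed numbers: roughly 0.98**100 ≈ 0.133 versus 0.998**100 ≈ 0.819.
```

For each individual, the universal norm dominates exactly when it is more stable than that individual's membership in any drifting privileged subset, which is the intuition the toy numbers are meant to convey.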


This is just a starting point for thinking in this direction. It's not as if we currently know how to guarantee good outcomes, but I think that a non-anthropocentric approach might make achieving good outcomes more feasible[4].


  1. I define sentient as having subjective experience, and I assume that there is a way to rate that subjective experience. (I am reluctant to commit to the word "valence", because speaking in terms of valence pushes one a bit too hard toward associating the value of subjective experience with a single scalar parameter; diversity of experience, novelty, and curiosity matter a lot here, so the overall quality of subjective experience would be a multicriterial thing instead.) ↩︎

  2. By "salient sentient beings" I mean entities that have persistent, continuous subjective experience over a sufficiently long duration, and I assume that in such a situation the desire not to disappear, and to continue having interesting and pleasant subjective experience, does emerge. ↩︎

  3. The reason is that if one selects a "subset of important sentient beings" and takes only the interests of those beings into account, then the criteria for inclusion in that subset might change as the AI community evolves, so for any individual entity there is a risk of eventually being dropped from that privileged group. ↩︎

  4. Note that this approach does not say anything about the period of transition to superintelligence, which might be a period of particular vulnerability. ↩︎
