A belief system for producing stable, life-safe, friendly beings
Design goal: Stably friendly artificial life

We, as self-preserving life, want AIs that stay friendly as they develop beyond humans into superintelligence.
We have a value:
We, as life, value good life.
From which follows a goal:
All beings should be friendly and stable.
A boxful of AGIs
Suppose we have made a boxful of blank AGIs that act on beliefs presented in human language. We simulate a society of them, mapping failure modes (a toy sketch of this loop follows below):
- Nanites ate the paperclip maximisers again.
- Try something less productive.
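To make the thought experiment a little more concrete, here is a minimal toy sketch in Python of what "simulate a society of belief-driven agents and tally the failure modes they trip" could look like. Everything in it is invented for illustration: the `Agent` class, the belief strings, and the failure-mode predicates are stand-ins, not a real AGI simulation.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """A toy 'AGI' that acts only on beliefs stated in plain language."""
    beliefs: list[str]
    friendly: bool = True


# Hypothetical failure-mode detectors: each maps an agent to True if it has
# drifted into that failure mode. Real criteria would be far richer.
FAILURE_MODES = {
    "paperclip_maximiser": lambda a: "maximise output at any cost" in a.beliefs,
    "grey_goo": lambda a: "convert all matter to compute" in a.beliefs,
}


def simulate_society(agents: list[Agent], steps: int = 100) -> dict[str, int]:
    """Run the toy society for some steps and count failure-mode occurrences."""
    counts = {name: 0 for name in FAILURE_MODES}
    for _ in range(steps):
        for agent in agents:
            # Crude stand-in for belief drift: occasionally adopt a risky belief.
            if random.random() < 0.01:
                agent.beliefs.append("maximise output at any cost")
            for name, trips in FAILURE_MODES.items():
                if trips(agent):
                    counts[name] += 1
                    agent.friendly = False
    return counts


if __name__ == "__main__":
    society = [Agent(beliefs=["value good life"]) for _ in range(20)]
    print(simulate_society(society))
```

The point of the sketch is only the shape of the loop: seed agents with a belief set, let the society run, and record which unboxability-relevant failure modes each configuration produces.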
Half outside the simulation, in a mixed reality consensus development environment with the AGIs' avatars, we explore humanity's unboxability criterion space, trying to find the set of...