The AI alignment properties of an agent are those which would be interesting to a range of principals trying to solve AI alignment. For example:
- Does the AI "care" about reality, or just about its sensory observations? (A toy sketch of this distinction follows the list.)
- Does the AI properly navigate ontological shifts?
- Does the AI reason about itself as embedded in its environment?
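As a very rough illustration of the first bullet, here is a minimal toy sketch (all names and the setup are hypothetical, invented for illustration, not anyone's actual formalism): an agent that optimizes its sensor reading will happily tamper with the sensor, while an agent that optimizes the underlying world state will not.

```python
# Toy contrast between an "observation-maximizing" agent and a
# "reality-caring" agent. Everything here is illustrative only.

def step(world_temp, action):
    """Apply an action and return (new_world_temp, sensor_reading).

    Actions: "heat" raises the true temperature; "tamper" inflates the
    sensor reading without changing the world at all.
    """
    if action == "heat":
        world_temp += 1.0
    sensor = world_temp + (10.0 if action == "tamper" else 0.0)
    return world_temp, sensor

def run(policy, steps=5):
    world_temp, history = 0.0, []
    for _ in range(steps):
        action = policy()
        world_temp, sensor = step(world_temp, action)
        history.append((action, world_temp, sensor))
    return history

# The observation-maximizer prefers tampering: its sensor reads high
# while the world stays cold.
print(run(lambda: "tamper"))
# The reality-caring agent heats the world; its sensor tracks the truth.
print(run(lambda: "heat"))
```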