Consider an agent with a model of the world W. How does W relate to the real world? W might contain a chair. For W to be useful, it needs to map to reality, i.e. there is a function f with W_chair ↦ R_chair. The pointers problem is about figuring out f.
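As a rough illustration only, the mapping f can be sketched as a toy lookup. Everything here is hypothetical: the grid "reality", the latent names, and the way f resolves a latent are assumptions made for the example, not part of the original formulation. The actual difficulty is that no such lookup table is available for a real agent.

```python
# Toy sketch: all names and structures below are hypothetical illustrations.

# "Reality": what actually sits at each location.
reality = {(0, 0): "chair", (0, 1): "table", (1, 1): "empty"}

# The agent's world-model W: each latent variable and where the agent believes it is.
world_model = {"W_chair": (0, 0), "W_lamp": (1, 1)}


def f(latent):
    """Candidate pointer from a latent in W to something in reality.

    Returns ("R_<name>", position) when the real world contains the
    believed object at the believed position, and None otherwise.
    """
    position = world_model.get(latent)
    if position is None:
        return None  # the latent does not exist in W at all
    expected = latent[len("W_"):]  # strip the "W_" prefix
    if reality.get(position) != expected:
        return None  # the latent has no real-world referent here
    return ("R_" + expected, position)


chair_pointer = f("W_chair")  # resolves to a real-world referent
lamp_pointer = f("W_lamp")    # the model believes in a lamp that is not there
```

In this sketch W_chair resolves to a real-world referent while W_lamp does not, which mirrors the "(if any)" case where a latent variable in the model corresponds to nothing in the environment.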
In John's words (who introduced the concept here):
What functions of what variables (if any) in the environment and/or another world-model correspond to the latent variables in the agent’s world-model?
This relates to alignment, as we would like an AI that acts based on real-world human values, not just human estimates of their own values. The two will differ in many situations, since humans are not all-seeing or all-knowing. Therefore we'd like to figure out how to point to our values directly. The problem was introduced in a post with the same name.