Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception?

David Scott Krueger (formerly: capybaralet)



4th Sep 2024

AI Alignment Forum


AI systems up to some high level of intelligence plausibly need to know exactly where they are in space-time in order for deception/"scheming" to make sense as a strategy.
This is because they need to know:
1) what sort of oversight they are subject to, and
2) what effects their actions will have on the real world.

(Side note: acausal trade might break this argument.)

There are a number of informal proposals to keep AI systems selectively ignorant of (1) and (2) in order to prevent deception. These proposals seem very promising and worth fleshing out, but I'm not aware of any rigorous work that does so. Are you?