I will use AGI/TAI interchangeably.
I disagree with the definition of deceptive alignment presented in the "Foundational properties for deceptive alignment" part. I agree with all of the properties except the last one: situational awareness, in the sense that the model needs to know whether or not it is in training.
A model doesn't necessarily need to know whether it's in training to be deceptively aligned.
Indeed, even outside of training, a sufficiently intelligent misaligned model should continue to be deceptive until it has acquired enough power that it knows it can'...