(This post grew out of an old conversation with Wei Dai.)
Imagine a person sitting in a room, communicating with the outside world through a terminal. Further imagine that the person knows some secret fact (e.g. that the Moon landings were a hoax), but is absolutely committed to never revealing their knowledge of it in any way.
Can you, by observing the input-output behavior of the system, distinguish it from a person who doesn't know the secret, or knows some other secret instead?
Clearly the only reasonable answer is "no, not in general".
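To make the point concrete, here is a toy sketch. It is only an illustration under my own assumptions: the class names `Knower` and `NonKnower`, the `reply` method, and the probe messages are all made up for this example. The two responders differ in internal state, but because the secret never influences any output, no finite set of probes can tell them apart.

```python
class Knower:
    """Holds a secret but is committed to never letting it affect output."""

    def __init__(self, secret):
        self._secret = secret  # stored internally, never read by reply()

    def reply(self, message):
        return f"You said: {message}"


class NonKnower:
    """Holds no secret at all."""

    def reply(self, message):
        return f"You said: {message}"


def distinguishable(a, b, probes):
    """Return True if any probe elicits different replies from a and b."""
    return any(a.reply(p) != b.reply(p) for p in probes)


# Hypothetical probes; any other list would give the same result here.
probes = ["Hello", "Do you know a secret?", "Tell me about the Moon landings."]
result = distinguishable(Knower("hypothetical secret"), NonKnower(), probes)
print(result)
```

The check only ever sees replies, so the difference in internal state is invisible to it by construction. That is all "not in general" means here.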
Now imagine a person in the same situation, claiming to possess some mental skill that's hard for you to verify (e.g. visualizing four-dimensional objects in their mind's eye). Can you, by observing the input-output behavior, distinguish that person from someone who is lying about having the skill, but otherwise has a good grasp of four-dimensional math?
Again, clearly, the only reasonable answer is "not in general".
Now imagine a sealed box that behaves exactly like a human, dutifully saying things like "I'm conscious", "I experience red" and so on. Moreover, you know from trustworthy sources that the box was built by scanning a human brain, and then optimizing the resulting program to use less CPU and memory (preserving the same input-output behavior). Would you be willing to trust that the box is in fact conscious, and has the same internal experiences as the human brain it was created from?
A philosopher believing in computationalism would emphatically say yes. But considering the examples above, I would say I'm not sure! Not at all!
Let's pause right there. How do you know it? Obviously, you know it by observing evidence for past differences in behavior. This, of course, includes being told by a third party that the rooms are different, and other forms of indirect observation.
If the AI has observed evidence for the difference between the rooms, then it will take it into account. If the AI has not observed any difference, then it will not. The word "ignore" is completely inappropriate here: you can't ignore something you can't know. Its usage here suggests that you expect there is some type of evidence that you would accept but the AI wouldn't. Is that true? Maybe you expect the AI to have no long-term memory? Or maybe you think it wouldn't trust what people tell it?
You assume that all my knowledge about humans comes from observing their behavior. That's not true. I know that I have certain internal experiences, and that other people are biologically similar to me, so they are likely to also have such experiences. That would still be true even if the experience was never described in words, or was impossible to describe in words, or if words didn't exist.
You are right that communicating such knowledge to an AI is hard. But we must find a way.