(This post grew out of an old conversation with Wei Dai.)
Imagine a person sitting in a room, communicating with the outside world through a terminal. Further imagine that the person knows some secret fact (e.g. that the Moon landings were a hoax), but is absolutely committed to never revealing their knowledge of it in any way.
Can you, by observing the input-output behavior of the system, distinguish it from a person who doesn't know the secret, or knows some other secret instead?
Clearly the only reasonable answer is "no, not in general".
Now imagine a person in the same situation, claiming to possess some mental skill that's hard for you to verify (e.g. visualizing four-dimensional objects in their mind's eye). Can you, by observing the input-output behavior, distinguish that person from someone who is lying about having the skill, but otherwise has a good grasp of four-dimensional math?
Again, clearly, the only reasonable answer is "not in general".
Now imagine a sealed box that behaves exactly like a human, dutifully saying things like "I'm conscious", "I experience red" and so on. Moreover, you know from trustworthy sources that the box was built by scanning a human brain, and then optimizing the resulting program to use less CPU and memory (preserving the same input-output behavior). Would you be willing to trust that the box is in fact conscious, and has the same internal experiences as the human brain it was created from?
A philosopher believing in computationalism would emphatically say yes. But considering the examples above, I would say I'm not sure! Not at all!
Let's try another situation. Imagine two people in sealed rooms. You press a button and both of them scream in pain. However, you know that only the first person is really suffering, while the second one is pretending and the button actually gives him pleasure. The two rooms react identically to the button, but the moral value of pressing it is different. If you propose an AI that ignores all such differences in principle, and assigns moral value based only on external behavior without figuring out the nature of pain/pleasure/other qualia, then I won't invest in your AI because it will likely lead to horror.
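Here is a toy sketch of the two rooms in Python. It is only an illustration, and it rests on a loud assumption: that what the occupant actually feels can be written down as a hidden field at all, which is exactly the contested point. The names `Room`, `valence`, `behavioral_value` and `internal_value` are hypothetical and made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    """A sealed room. `valence` is hidden internal state (hypothetical label)."""
    valence: str  # "pain" or "pleasure"; it never affects the output

    def press_button(self) -> str:
        # Both kinds of room produce the same observable response.
        return "AAAGH!"


def behavioral_value(room: Room) -> int:
    """Score a button press using only the observable output."""
    return -1 if room.press_button() == "AAAGH!" else 0


def internal_value(room: Room) -> int:
    """Score a button press using the hidden state (assumed readable here)."""
    return -1 if room.valence == "pain" else 1


sufferer = Room(valence="pain")
pretender = Room(valence="pleasure")

# Same outward behavior, so a purely behavioral evaluator cannot tell them apart.
print(behavioral_value(sufferer), behavioral_value(pretender))
# The hidden state differs, and so does the score that depends on it.
print(internal_value(sufferer), internal_value(pretender))
```

Any evaluator written purely in terms of `press_button()` has to give both rooms the same score. The sketch says nothing about whether the hidden field is real or how one would read it; it only shows that the two scoring rules can come apart.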
Hence the title "steelmanning the Chinese room argument". To have any shot at FAI, we need to figure out morality the hard way. Playing rationalist taboo isn't good enough. The hope of reducing all morally relevant properties (not just consciousness) to outward behavior is just that - a hope. You have zero arguments why it's true, and this post gives several arguments why it's false. Don't bet the world on it.
Let's pause right there. How do you know it? Obviously, you know it by observing evidence of past differences in behavior. This, of course, includes being told by a third party that the rooms are different, and other forms of indirect observation.
If the AI has observed evidence for the difference between the rooms then it will take it into account. If the AI has not observed any difference then it will n...