(This post grew out of an old conversation with Wei Dai.)
Imagine a person sitting in a room, communicating with the outside world through a terminal. Further imagine that the person knows some secret fact (e.g. that the Moon landings were a hoax), but is absolutely committed to never revealing their knowledge of it in any way.
Can you, by observing the input-output behavior of the system, distinguish it from a person who doesn't know the secret, or knows some other secret instead?
Clearly the only reasonable answer is "no, not in general".
Now imagine a person in the same situation, claiming to possess some mental skill that's hard for you to verify (e.g. visualizing four-dimensional objects in their mind's eye). Can you, by observing the input-output behavior, distinguish them from someone who is lying about having the skill but otherwise has a good grasp of four-dimensional math?
Again, clearly, the only reasonable answer is "not in general".
Now imagine a sealed box that behaves exactly like a human, dutifully saying things like "I'm conscious", "I experience red" and so on. Moreover, you know from trustworthy sources that the box was built by scanning a human brain, and then optimizing the resulting program to use less CPU and memory (preserving the same input-output behavior). Would you be willing to trust that the box is in fact conscious, and has the same internal experiences as the human brain it was created from?
A philosopher who believes in computationalism would emphatically say yes. But considering the examples above, I would say I'm not sure! Not at all!
I'm trying to show that's not good enough. Seeing red isn't the same as claiming to see red, feeling pain isn't the same as claiming to feel pain, and so on. There are morally relevant facts about agents that aren't reducible to their behavior. Each behavior can arise from multiple internal experiences, some preferable to others. Humans can sometimes infer each other's experiences by similarity, but that doesn't work for all possible agents (including optimized uploads, etc.) that are built differently from humans. FAI needs to make such judgments in general, so it will need to understand how internal experience works in general. Otherwise we might get a Disneyland with no children, or with suffering children claiming to be happy. That's the point of the post.
You could try to patch the problem by making the AI create only agents that aren't too different from biological humans, for which the problem of suffering could be roughly solved by looking at neurons or something. But that leaves the door open to accidental astronomical suffering in other kinds of agents, so I wouldn't accept that solution. We need to figure out internal experience the hard way.
A record player looping the words "I see red" is very different from how humans see, both internally and behaviorally. A robot which takes a picture, finds the most common pixel color, and if that's red, plays the same "I see red" sound, is still in some ways different, but a lot less so. And if someone wanted to call this second robot conscious, as far as color is concerned, there would be no problem with that.
You may feel that pain is special, and that if we recognize a robot which sa...