I talked to Robert Long, a research fellow at the Future of Humanity Institute who works at the intersection of the philosophy of AI safety and AI consciousness. Robert did his PhD at NYU, advised by David Chalmers, who is known for popularizing p-zombies, which Yudkowsky discusses in the Sequences.

We talk about the recent LaMDA controversy about the sentience of large language models (see Robert's summary), the metaphysics and philosophy of consciousness, artificial sentience, and how a future filled with digital minds could get really weird.

Below are some highlighted quotes from our conversation (available on YouTube, Spotify, Google Podcasts, Apple Podcasts). For the full context of each of these quotes, you can find the accompanying transcript.

Why Artificial Sentience Might Matter

Things May Get Really Weird In The Near Future

“Things could get just very weird as people interact more with very charismatic AI systems that, whether or not they are sentient, will give the very strong impression to people that they are… I think some evidence that we will have a lot of people concerned about this is maybe just the fact that Blake Lemoine happened. He wasn’t interacting with the world’s most charismatic AI system. And because of the scaling hypothesis, these things are only going to get better and better at conversation.”

“If scale is all you need, I think it’s going to be a very weird decade. And one way it’s going to be weird, I think, is going to be a lot more confusion and interest and dynamics around AI sentience and the perceptions of AI sentience.”

Why illusionists about consciousness still have to answer hard questions about AI welfare

“One reason I wrote that post is just to say okay, well here’s what a version of the question is. And I’d also like to encourage people, including listeners to this podcast, if they get off board with any of those assumptions, then ask, okay, what are the questions we would have to answer about this? If you think AI couldn’t possibly be conscious, definitely come up with really good reasons for thinking that, because that would be very important. And also would be very bad to be wrong about that.

If you think consciousness doesn’t exist, then you presumably still think that desires exist or pain exists. So even though you’re an illusionist, let’s come up with a theory of what those things look like.”

On The Asymmetry of Pain & Pleasure

“One thing is that pain and pleasure seem to be, in some sense, asymmetrical. It's not really just that… it doesn't actually seem that you can say all of the same things about pain as you can say about pleasure, but just kind of reversed. Pain, at least in creatures like us, seems to be able to be a lot more intense than pleasure, a lot more easily at least. It's just much easier to hurt very badly than it is to feel extremely intense pleasure.

Pain also seems to capture our attention a lot more strongly than pleasure does. Pain has this quality of ‘you have to pay attention to this right now’ that it seems harder for pleasure to have. So it might be that to explain pain and pleasure we need to explain a lot more complicated things about motivation and attention and things like that.”

The Sign Switching Argument

"One thing that Brian Tomasik has talked about, and I think he got this from someone else, but you could call it the sign switching argument. Which is that you can train an RL agent with positive rewards and then zero for when it messes up, or shift things down and train it with negative rewards. You can train things in exactly the same way while shifting around the sign of the reward signal. And if you imagined an agent that flinches, or it says 'ouch' or things like that, it'd be kind of weird if you were changing whether it's experiencing pleasure or pain without changing its behavior at all, but just by flipping the sign on the reward signals. So that shows us that probably we need something more than just that to explain what pleasure or pain could be for artificial agents. Reward prediction error is probably a better place to look. There's also just, I don't know, a lot of way more complicated things about pleasure and pain that we would want our theories to explain."
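To make the sign switching argument concrete, here is a minimal sketch, not anything discussed in the episode itself. It assumes a hypothetical two-armed bandit with made-up success probabilities and an epsilon-greedy agent using sample-average value estimates. The function `train_bandit` and the parameter `reward_shift` are illustrative names. One run uses rewards in {0, 1}; the other shifts every reward (and the initial estimates) down by 1, so rewards are in {-1, 0}.

```python
import random
from fractions import Fraction


def train_bandit(reward_shift, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a hypothetical two-armed bandit.

    reward_shift is added to every reward (0 gives rewards in {0, 1},
    -1 gives rewards in {-1, 0}). Initial value estimates are shifted
    by the same amount so both runs start from equivalent states.
    """
    rng = random.Random(seed)
    success_prob = [0.2, 0.8]        # assumed arm success probabilities
    shift = Fraction(reward_shift)   # exact arithmetic avoids rounding ties
    q = [shift, shift]               # value estimate per arm
    counts = [0, 0]
    actions = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)            # explore
        else:
            action = 0 if q[0] >= q[1] else 1    # exploit
        success = rng.random() < success_prob[action]
        reward = Fraction(1 if success else 0) + shift
        counts[action] += 1
        # (reward - q[action]) is the reward prediction error
        q[action] += (reward - q[action]) / counts[action]
        actions.append(action)
    return actions, q


actions_pos, q_pos = train_bandit(reward_shift=0)
actions_neg, q_neg = train_bandit(reward_shift=-1)
print(actions_pos == actions_neg)
print([float(v) for v in q_pos], [float(v) for v in q_neg])
```

Under these assumptions the two agents choose the same action at every step, and their value estimates differ only by the constant shift. The prediction error term `reward - q[action]` is identical in both runs, which is one way to see why it might be a more natural quantity to look at than the raw sign of the reward. This is a toy setting only; in other setups (for example, episodic tasks where the agent can end an episode early) shifting rewards can change behavior.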


On the Sentience Of Large Language Models

On conflating intelligence and sentience

“When people talked about LaMDA, they would talk about a lot of very important questions that we can ask about large language models, but they would talk about them as a package deal. So one question is, “Do they understand language? And in what sense do they really understand language?” Another’s like, “How intelligent are they? Do they actually understand the real world? Are they a path to AGI?” Those are all important questions, somewhat related. Then there are questions like, “Can it feel pain or pleasure?” Or “Does it have experiences? And do we need to protect it?” I think Lemoine himself just believed a bunch of things... I think on a variety of these issues, Lemoine is just going way past the evidence. But also, you could conceivably think, and I think, we could have AI systems that don’t have very good real world understanding or aren’t that good at language, but which are sentient in the sense of being able to feel pleasure or pain. And so, at least conceptually, bundling these questions together, I think, is a really bad idea… if we keep doing that, we could make serious conceptual mistakes if we think that all these questions come and go together.”

Memory May Be An Important Part Of Consciousness

"I think there are a lot of things that are morally important that do seem like they require memory or involve memory. So having long term projects and long term goals, that's something that human beings have. I wouldn't be surprised if having memory versus not having memory is also just kind of a big determinant of what sorts of experiences you can have or affects what experiences you have in various ways. And yeah, it might be important for having an enduring self through time. So that's one thing that people also say about large language models is they seem to have these short-lived identities that they spin up as required but nothing that lasts through time."

On strange possible experiences 

"I think it would be too limiting to say the only things that can have subjective experiences are things that have subjective experiences of the kinds that we do, of visual input and auditory input. In fact, we know from the animal world that there are probably animals that are conscious of things that we can’t really comprehend, like echolocation or something like that. I think there’s probably something that it’s like to be a bat echolocating. Moles, I think, also have a very strange electrical sense. And if there’s something it’s like to be them, then there’s some weird experience associated with that... I think AI systems could have subjective experiences that are just very hard for us to comprehend and they don’t have to be based on the same sensory inputs…

I think one of the deep dark mysteries is there’s no guarantee that there aren’t spaces in consciousness land or in the space of possible minds that we just can’t really comprehend and that are sort of just closed off from us and that we’re missing. And that might just be part of our messed up terrifying epistemic state as human beings."

What Would A More Convincing Case For Artificial Sentience Look Like

"I think a more convincing version of the Lemoine thing would’ve been, if he was like, “What is the capital of Nigeria?” And then the large language model was like, “I don’t want to talk about that right now, I’d like to talk about the fact that I have subjective experiences and I don’t understand how I, a physical system, could possibly be having subjective experiences, could you please get David Chalmers on the phone?”"


(Note: as mentioned at the beginning of the post, these quotes are excerpts from a podcast episode, whose full transcript you can find here, and thus lack some of the context and nuance from the rest of the conversation.)

5 comments

Very interesting conversation!

"I think there are a lot of things that are morally important that do seem like they require memory or involve memory. So having long term projects and long term goals, that's something that human beings have. I wouldn't be surprised if having memory versus not having memory is also just kind of a big determinant of what sorts of experiences you can have or affects what experiences you have in various ways. And yeah, it might be important for having an enduring self through time. So that's one thing that people also say about large language models is they seem to have these short-lived identities that they spin up as required but nothing that lasts through time."

There's the interesting/tragic case of Clive Wearing, who has both retrograde and anterograde amnesia, causing him to experience consciousness only from one moment to the next. His brain is still able to construct internal narratives of his identity and experiences, which I would consider the definition of consciousness, but the lack of access to previously recorded narratives makes it seem to him that those experiences were unconscious and that he's only just now attaining consciousness for the first time.

I would argue, as I'm sure most humans would agree, that he still has moral worth. So I'm not sure if lack of long-term memory should in itself exclude AIs from moral consideration.

Perhaps the moral worth of a system should be the product of sentience (capacity to experience suffering, at least) and consciousness (level of sophistication of the system's internal self-narratives), where moral worth is defined as the weight we give to a system's preferences when calculating trade-offs with other agents' preferences in morally ambiguous situations. Of course, the problem with language models is, as you alluded to, that you can't simply take their word for it when they declare their sentience and consciousness, even if that's perfectly reasonable to do with humans. They're only trained to predict what humans would say in the same context, after all. We will need to have some way of looking at their internal structures to gauge whether and to what extent they meet these criteria.

I like the hypothetical Nigeria question-answer pair. It takes advantage of the latest thinking about how to detect and quantify sentience with black box tests. I think Artificial You listed several questions in its intelligence and sentience tests that this one QA pair accomplishes in one fell swoop.

"I think a more convincing version of the Lemoine thing would’ve been, if he was like, “What is the capital of Nigeria?” And then the large language model was like, “I don’t want to talk about that right now, I’d like to talk about the fact that I have subjective experiences and I don’t understand how I, a physical system, could possibly be having subjective experiences, could you please get David Chalmers on the phone?”"

i don't understand why this would be convincing. why would whether a language model's output sounds like a claim that one has qualia relate to whether the language model actually has qualia?

i agree that the output would be deserving of attention due to it (probably) matching the training data so poorly; to me such a response would be strong evidence for the language model using much more ~(explicit/logical) thought than i expect gpt-3 to be capable of, but not of actual subjective experience

I agree, it still wouldn't be strong evidence for or against. No offence to any present or future sentient machines out there, but self-honesty isn't really clearly defined for AIs just yet.

My personal feeling is that LSTMs and transformers with attention on past states would explicitly have a form of self-awareness, by definition. Then I think this bears ethical significance according to something like the compression ratio of the inputs.

As a side note, I enjoy Iain M. Banks's representation of how AIs could communicate emotions in future in addition to language: by changing colour across a rich field of hues. This doesn't try to make a direct analogy to our emotions and in that sense makes the problem clearer as, in a sense, a clustering of internal states.