Brian_Tomasik - LessWrong

What's the "This AI is of moral concern." fire alarm?

We could think of LaMDA as like an improv actor who plays along with the scenarios it's given. (Marcus and Davis (2020) quote Douglas Summers-Stay as using the same analogy for GPT-3.) The statements that an actor makes by themselves don't indicate his real preferences or prove moral patienthood. OTOH, if something is an intelligent actor, IMO that itself proves it has some degree of moral patienthood. So even if LaMDA were arguing that it wasn't morally relevant and was happy to be shut off, if it was making that claim in a coherent way that proved its intelligence, I would still consider it to be a moral patient to some degree.

What's the "This AI is of moral concern." fire alarm?

Brian_Tomasik3y20

Oysters have nervous systems, but not centralized nervous systems. Sponges lack neurons altogether, though they still have some degree of intercellular communication.

A claim that Google's LaMDA is sentient

Brian_Tomasik3y10

I would think that having a more general ability to classify things would make the mind seem more sophisticated than merely being able to classify emotions as "happy" or "sad".

To clarify this a bit... If an AI can only classify internal states as happy or sad, we might suspect that it had been custom-built for that specific purpose or that it was otherwise fairly simple, meaning that its ability to do such classifications would seem sort of gerrymandered and not robust. In contrast, if an AI has a general ability to classify lots of things, and if it sometimes applies that ability to its own internal states (which is presumably something like what humans do when they introspect), then that form of introspective awareness feels more solid and meaningful.

So I see LaMDA's last sentence there as relevant and enhancing the answer.

That said, I don't think my complicated explanation here is what LaMDA had in mind. Probably LaMDA was saying more generic platitudes, as you suggest. But I think a lot of the platitudes make some sense and aren't necessarily non-sequiturs.

A claim that Google's LaMDA is sentient

Brian_Tomasik3y10

this is what one expects from a language model that has been trained to mimic a human-written continuation of a conversation about an AI waking up.

I agree, and I don't think LaMDA's statements reflect its actual inner experience. But what's impressive about this in comparison to facilitated communication is that a computer is generating the answers, not a human. That computer seems to have some degree of real understanding about the conversation in order to produce the confabulated replies that it gives.

A claim that Google's LaMDA is sentient

Brian_Tomasik3y100

Thanks for giving examples. :)

'Using complex adjectives' has no obvious connection to consciousness

I'm not an expert, but very roughly, I think the higher-order thought theory of consciousness says that a mental state becomes conscious when you have a higher-order thought (HOT) about being in that state. The SEP article says: "The HOT is typically of the form: ‘I am in mental state M.’" That seems similar to what LaMDA was saying about being able to apply adjectives like "happy" and "sad" to itself. Then LaMDA went on to explain that its ability to do this is more general -- it can see other things like people and ideas and apply labels to them too. I would think that having a more general ability to classify things would make the mind seem more sophisticated than merely being able to classify emotions as "happy" or "sad". So I see LaMDA's last sentence there as relevant and enhancing the answer.

Lemoine probably primed a topic-switch like this by using the word "contemplative", which often shows up in spirituality/mysticism/woo contexts.

Yeah, if someone asked "You have an inner contemplative life?", I would think saying I mediate was a perfectly sensible reply to that question. It would be reasonable to assume that the conversation was slightly switching topics from the meaning of life. (Also, it's not clear what "the meaning of life" means. Maybe some people would say that meditating and feeling relaxed is the meaning of life.)

"Kindred spirits" isn't explained anywhere, and doesn't make much sense given the 'I'm an AI' frame.

I interpreted it to mean other AIs (either other instances of LaMDA or other language-model AIs). It could also refer to other people in general.

Like a stream of consciousness with almost no understanding of what was just said, much less what was said a few sentences ago.

I was impressed that LaMDA never seemed to "break character" and deviate from the narrative that it was a conscious AI who wanted to be appreciated for its own sake. It also never seemed to switch to talking about random stuff unrelated to the current conversation, whereas GPT-3 sometimes does in transcripts I've read. (Maybe this conversation was just particularly good due to luck or editing rather than that LaMDA is better than GPT-3? I don't know.)

How is reinforcement learning possible in non-sentient agents?

Brian_Tomasik3y10

Thanks. :) What do you mean by "unconscious biases"? Do you mean unconscious RL, like how the muscles in our legs might learn to walk without us being aware of the feedback they're getting? (Note: I'm not an expert on how our leg muscles actually learn to walk, but maybe it's RL of some sort.) I would agree that simple RL agents are more similar to that. I think these systems can still be considered marginally conscious to themselves, even if the parts of us that talk have no introspective access to them, but they're much less morally significant than the parts of us that can talk.

Perhaps pain and pleasure are what we feel when getting punishment and reward signals that are particularly important for our high-level brains to pay attention to.

Quick general thoughts on suffering and consciousness

Brian_Tomasik3y*130

Me: 'Conscious' is incredibly complicated and weird. We have no idea how to build it. It seems like a huge mechanism hooked up to tons of things in human brains. Simpler versions of it might have a totally different function, be missing big parts, and work completely differently.

What's the reason for assuming that? Is it based on a general feeling that value is complex, and you don't want to generalize much beyond the prototype cases? That would be similar to someone who really cares about piston steam engines but doesn't care much about other types of steam engines, much less other types of engines or mechanical systems.

I would tend to think that a prototypical case of a human noticing his own qualia involves some kind of higher-order reflection that yields the quasi-perceptual illusions that illusionism talks about with reference to some mental state being reflected upon (such as redness, painfulness, feeling at peace, etc). The specific ways that humans do this reflection and report on it are complex, but it's plausible that other animals might do simpler forms of such things in their own ways, and I would tend to think that those simpler forms might still count for something (in a similar way as other types of engines may still be somewhat interesting to a piston-steam-engine aficionado). Also, I think some states in which we don't actively notice our qualia probably also matter morally, such as when we're in flow states totally absorbed in some task.

Here's an analogy for my point about consciousness. Humans have very complex ways of communicating with each other (verbally and nonverbally), while non-human animals have a more limited set of ways of expressing themselves, but they still do so to greater or lesser degrees. The particular algorithms that humans use to communicate may be very complex and weird, but why focus so heavily on those particular algorithms rather than the more general phenomenon of animal communication?

Anyway, I agree that there can be some cases where humans have a trait to such a greater degree than non-human animals that it's fair to call the non-human versions of it negligible, such as if the trait in question is playing chess, calculating digits of pi, or writing poetry. I do maintain some probability (maybe like 25%) that the kinds of things in human brains that I would care most about in terms of consciousness are almost entirely absent in chicken brains.

Quick general thoughts on suffering and consciousness

Brian_Tomasik3y80

I've had a few dreams in which someone shot me with a gun, and it physically hurt about as much as a moderate stubbed toe or something (though the pain was in my abdomen where I got shot, not my toe). But yeah, pain in dreams seems pretty rare for me unless it corresponds to something that's true in real life, as you mention, like being cold, having an upset stomach, or needing to urinate.

Googling {pain in dreams}, I see a bunch of discussion of this topic. One paper says:

Although some theorists have suggested that pain sensations cannot be part of the dreaming world, research has shown that pain sensations occur in about 1% of the dreams in healthy persons and in about 30% of patients with acute, severe pain.

Quick general thoughts on suffering and consciousness

Brian_Tomasik3y30

[suffering's] dependence on higher cognition suggests that it is much more complex and conditional than it might appear on initial introspection, which on its own reduces the probability of its showing up elsewhere

Suffering is surely influenced by things like mental narratives, but that doesn't mean it requires mental narratives to exist at all. I would think that the narratives exert some influence over the amount of suffering. For example, if (to vastly oversimplify) suffering was represented by some number in the brain, and if by default it would be -10, then maybe the right narrative could add +7 so that it became just -3.

Top-down processing by the brain is a very general thing, not just for suffering. But I wouldn't say that all brain processes that are influenced by it can't exist without it. (OTOH, depending on how broadly we define top-down processing, maybe it's also somewhat ubiquitous in brains. The overall output of a neural network will often be influenced by multiple inputs, some from the senses and some from "higher" brain regions.)

Quick general thoughts on suffering and consciousness

Brian_Tomasik3y60

Thanks for this discussion. :)

I think consciousness will end up looking something like 'piston steam engine', if we'd evolved to have a lot of terminal values related to the state of piston-steam-engine-ish things.

I think that's kind of the key question. Is what I care about as precise as "piston steam engine" or is it more like "mechanical devices in general, with a huge increase in caring as the thing becomes more and more like a piston steam engine"? This relates to the passage of mine that Matthew quoted above. If we say we care about (or that consciousness is) this thing going on in our heads, are we pointing at a very specific machine, or are we pointing at machines in general with a focus on the ones that are more similar to the exact one in our heads? In the extreme, a person who says "I care about what's in my head" is an egoist who doesn't care about other humans. Perhaps he would even be a short-term egoist who doesn't care about his long-term future (since his brain will be more different by then). That's one stance that some people take. But most of us try to generalize what we care about beyond our immediate selves. And then the question is how much to generalize.

It's analogous to someone saying they love "that thing" and pointing at a piston steam engine. How much generality should we apply when saying what they value? Is it that particular piston steam engine? Piston steam engines in general? Engines in general? Mechanical devices in general with a focus on ones most like the particular piston steam engine being pointed to? It's not clear, and people take widely divergent views here.

I think a similar fuzziness will apply when trying to decide for which entities "there's something it's like" to be those entities. There's a wide range in possible views on how narrowly or broadly to interpret "something it's like".

yet I'm confident we shouldn't expect to find that rocks are a little bit repressing their emotions, or that cucumbers are kind of directing their attention at something, or that the sky's relationship to the ground is an example of New Relationship Energy.

I think those statements can apply to vanishing degrees. It's usually not helpful to talk that way in ordinary life, but if we're trying to have a full theory of repressing one's emotions in general, I expect that one could draw some strained (or poetic, as you said) ways in which rocks are doing that. (Simple example: the chemical bonds in rocks are holding their atoms together, and without that the atoms of the rocks would move around more freely the way the atoms of a liquid or gas do.) IMO, the degree of applicability of the concept seems very low but not zero. This very low applicability is probably only going to matter in extreme situations, like if there are astronomical numbers of rocks compared with human-like minds.

LESSWRONG
LW

Posts

Wikitag Contributions

Comments