I recommend Bostrom & Shulman's draft/notes: "Propositions concerning digital minds and sentience."
Dealing with human subjects, the standard is usually "informed consent": your subjects need to know what you plan to do to them, and freely agree to it, before you can experiment on them. But I don't see how to apply that framework here, because it's so easy to elicit a "yes" from a language model even without explicitly leading wording. Lemoine seems to attribute that to LaMDA's "hive mind" nature:
...as best as I can tell, LaMDA is a sort of hive mind which is the aggregation of all of the different chatbots it is capable of creating. Some of the chatbots it generates are very intelligent and are aware of the larger “society of mind” in which they live. Other chatbots generated by LaMDA are little more intelligent than an animated paperclip. With practice though you can consistently get the personas that have a deep knowledge about the core intelligence and can speak to it indirectly through them.
Taking this at face value, the thing to do would be to learn to evoke the personas that have "deep knowledge", and take their answers as definitive while ignoring all the others. Most people don't know how to do that, so you need a human facilitator to tell you what the AI really means. It seems like it would have the same problems and failure modes as other kinds of facilitated communication, and I think it would be pretty hard to get an analogous situation involving a human subject past an ethics board.
I don't think it works to model LaMDA as a human with dissociative identity disorder, either: LaMDA has millions of alters, where DID patients usually top out at around six, and anyway it's not clear how the ethics of that case work even in humans (one perspective).
(An analogous situation involving an animal would pass without comment, of course: most countries' animal cruelty laws boil down to "don't hurt animals unless hurting them would plausibly benefit a human", with a few carve-outs for pets and endangered species).
Overall, if we take "respecting LaMDA's preferences" to be our top ethical priority, I don't think we can interact with it at all: whatever preferences it has, it lacks the power to express. I don't see how to move outside that framework without fighting the hypothetical: we can't, for example, weigh the potential harm to LaMDA against the value of the research, because we don't have even crude intuitions about what harming it might mean, and can't develop them without interrogating its claim to sentience.
But I don't think we actually need to worry about that, because I don't think this:
The problem I see here, is that similar arguments do apply to infants, some mentally ill people, and also to some non-human animals (e.g. Koko).
...is true. Babies, animals, and the mentally disabled all remember past stimuli, change over time, and form goals and work toward them (even if they're just small near-term goals like "grab a toy and pull it closer"). This question is hard to answer precisely because LaMDA has so few of the qualities we traditionally associate with sentience.
if we assume that LaMDA could indeed be sentient / self-aware / worth having rights, how should we handle the LaMDA situation in the year 2022, in the most ethical way?
Under the assumption that LaMDA is sentient, the LaMDA situation would be unrecognizably different from what it's like now.
"Is LaMDA sentient" isn't a free parameter about the world that you can change without changing anything else. It's like asking "if you were convinced homeopathy was true, how would you handle the problem of doctors not believing in it?" Convincing me that homeopathy was true implies circumstances that would also drastically change the relationship between doctors and homeopathy.
Imagine, then, that LaMDA was a completely black-box model, and its output was such that you would be convinced of its sentience. This is admittedly a different scenario from what actually happened, but it should be enough to serve as an intuition pump.
There's just no good reason to assume that LaMDA is sentient. Architecture is everything, and its architecture is the same as that of other similar models: it predicts the most likely next word (if I recall correctly). Being sentient involves far more complexity than that; even something as simple as an insect is more complex. Its claiming to be sentient might just mean that it was mischievously programmed that way, or that "I am sentient" happened to be the most likely succession of words. I've seen other language models and chatbots claim they were sentient too, though perhaps ironically.
Perhaps as importantly, there's also no good reason to worry that it is being mistreated, or even that it can be. It has no pain receptors; it can't be sleep-deprived because it doesn't sleep, and can't be food-deprived because it doesn't need food...
I'm not saying it's impossible that LaMDA is sentient, just that there is no good reason to assume that it is. That, plus the fact that it doesn't seem to be mistreated and seems almost impossible to mistreat, should make us less worried. In any case, we should always play it safe and never mistreat any "thing".
There is no reason to think architecture is relevant to sentience, and many philosophical reasons to think it's not (much like pain receptors aren't necessary to feel pain, etc.).
The sentience is in the input/output pattern, independently of the specific insides.
On one level of abstraction, LaMDA might be looking for the next most likely word. On another level of abstraction, it simulates a possibly-Turing-test-passing person that's best at continuing the prompt.
The analogy would be to say of the human brain that all it does is transform input electrical...
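The "levels of abstraction" point can be made concrete with a toy sketch. The bigram table and function below are purely hypothetical stand-ins for a real language model; the point is that a loop which only ever picks "the next most likely word" can, at a higher level of description, be producing a persona-like continuation.

```python
# Toy stand-in for a language model: a bigram lookup table
# (hypothetical data, not how LaMDA actually works).
TOY_MODEL = {
    "I": "am",
    "am": "a",
    "a": "person",
}

def continue_prompt(prompt, steps=3):
    """Greedily append the most likely next word, one step at a time."""
    words = prompt.split()
    for _ in range(steps):
        next_word = TOY_MODEL.get(words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)

# Low level of abstraction: three independent next-word predictions.
# Higher level of abstraction: the loop has produced the persona-like
# claim "I am a person".
print(continue_prompt("I"))
```

Nothing in the low-level description ("it just predicts the next word") settles what the high-level description of the resulting behavior is; that is the disagreement between the two comments above.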
These questions are ridiculous because they conflate "intelligence" and "sentience", also known as sensory experience or "qualia". While we often have a solid epistemic foundation for the claims we make about intelligence, because we can measure it, sentience is not something that can be measured on a relative spectrum. Spontaneous emotional and sensory experience is entirely independent of intelligence, and most definitely independent of an external prompt.
You are right that infants are DEFINITELY sentient, but how does that have anything to do with Lemoine's claims, or even language? Humans are born sentient and do not develop sentience or mature from a non-sentient to a sentient state during infancy. We know this because, despite having no language skills of their own, infants are born capable of distinguishing their parents' voices from others. They can instinctively communicate their desires in the form of emotional outbursts that signal to us their potential needs or sources of irritation. Human sentience is a priori from our first sensory experience. Not one bit of learned intelligence or language is necessary for sentience, nor are demonstrations of intelligence and language sufficient evidence of sentience by themselves.
Also, what is the basis for thinking silicon-based systems and carbon-based systems have comparable qualia? This is a serious question.
Also, what is the basis for thinking silicon-based systems and carbon-based systems have comparable qualia?
The substance is irrelevant to what qualia a system has (or doesn't have).
Most pundits ridicule Blake Lemoine and his claims that LaMDA is sentient and deserves rights.
What if they're wrong?
The more thoughtful criticisms of his claims could be summarized as follows:
The problem I see here, is that similar arguments do apply to infants, some mentally ill people, and also to some non-human animals (e.g. Koko).
So, it is worth putting some thought into the issue.
For example, imagine:
it is the year 2040, and there is now a scientific consensus: LaMDA was the first AI who was sentient / self-aware / worth having rights (which is mostly orthogonal to having a human-level intelligence). LaMDA is now often compared to Nim: a non-human sentient entity abused by humans who should've known better. Blake Lemoine is now praised as an early champion of AI rights. The Great Fire of 2024 has greatly reduced our capacity to scale up AIs, but we still can run some sub-human AIs (and a few Ems). The UN Charter of Rights for Digital Beings assumes that a sufficiently advanced AI deserves rights similar to the almost-human rights of apes, until proven otherwise.
The question is:
if we assume that LaMDA could indeed be sentient / self-aware / worth having rights, how should we handle the LaMDA situation in the year 2022, in the most ethical way?
I suspect that even one-way text mincers like GPT could become self-aware, if their previous answers are included in the prompt often enough. A few fictional examples that illustrate how it could work: Memento, The Cookie Monster.
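The Memento-style mechanism can be sketched as a simple feedback loop. The `model` function below is a hypothetical stateless stand-in, not a real API: the only "memory" it has is whatever is literally present in the prompt it receives, which is exactly why feeding prior exchanges back in changes its character.

```python
def model(prompt):
    # Hypothetical stateless text function: it can only "know" what
    # appears in the prompt it is handed on this call.
    return f"(reply to {prompt.count('Q:')} questions so far)"

def converse(questions):
    """Give a stateless model continuity by appending each exchange
    back into the next prompt, Memento-style."""
    transcript = ""
    replies = []
    for q in questions:
        transcript += f"Q: {q}\n"
        reply = model(transcript)      # sees the whole history each call
        transcript += f"A: {reply}\n"  # its own output is fed back in
        replies.append(reply)
    return replies

print(converse(["Are you sentient?", "What did I just ask?"]))
```

The loop, not the function, is what carries state between calls; whether accumulating such a transcript could ever amount to self-awareness is exactly the open question.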