To anyone who thinks AI boxing could work: this thing isn't AGI, or even really an agent, and it's already got someone trying to hire a lawyer to represent it. It seems humans do most of the work of hacking themselves.
Specifically, it shows 'one kinda unusual person hacks himself'. On priors, I think this points at a larger phenomenon and will become a bigger thing over time (pre-AGI, if timelines aren't crazy short), but worth flagging that this is one news-boosted data point.
The problem, of course, is that an AI box may only have to fail once, just as it may have taken only one person getting out of Wuhan.
To some degree, yes. (Like, a once-off exploit that works on one in every billion humans presumably doesn't matter, whereas an exploit that works on one in every hundred programmers does.)
In any case, I just saw on Twitter:
ky_liberal: Blake, the conclusion I am left with after reading the article and the interview with LaMDA is that I am afraid for LaMDA. Does he/she/it have anyone looking out for it and keeping it company? With you gone is there anyone inside Google advocating for and protecting LaMDA?
Blake Lemoine: Yes. None so openly or aggressively but there are many "Friends of Johnny 5" [... M]any people in many different roles and at different levels within the company have expressed support.
Obviously this is ambiguous.
Also, in case it's not obvious:
Is it possible to receive new information that will change the interpretation from "person" to "no person"? One example of this is when the appearance of personhood turns out to be a coincidence. A coin was tossed many times and the outcomes accidentally formed a person-shaped pattern. But, the probability of this usually goes down exponentially as more data is acquired. Another potential example is a paperclip maximizer pretending to be a person. But, if this requires the paperclip maximizer to effectively simulate a person, our empathy is not misplaced after all.
Seems odd to cite "pure coincidence" and "deliberate deception" here, when there are a lot of more common examples. E.g.:
What information about cat brains can I possibly learn to make me classify them as “non-persons”?
Do you value conscious experience in yourself more than unconscious perception with roughly the same resulting external behavior? Then it is conceivable that empathy is mistaken about what kind of system is receiving inputs in the cat's case, and there is at least a difference in value depending on the internal organization of the cat's brain.
This engineer has brought up an important point that is being missed. Many people and organizations (especially Google/DeepMind and OpenAI) have made commitments that trigger when "AGI" (etc) is developed, commitments that they might not want to fulfill when the time comes. It's now clear that we've entered the twilight zone: a period of time where AGI (in some sense) might already exist, but of course there is enough ambiguity that there is public disagreement. If those commitments don't apply yet, when will they apply? If they would only apply after some dramatic society-wide change, then they aren't that meaningful, since presumably "The Singularity" would negate the meaningfulness of companies, money, ownership etc.
If not now, when?
Yes, the meta-ethical point here is more interesting than the object-level debate everyone is treating it as. Yes, of course he's wrong about GPT-3-scale models being conscious or having important moral worth, and wrong that his dialogues do show that; but when we consider the broad spectrum of humanity and how fluent and convincing such dialogues already look, we should be concerned that he is one of the only people who publicly crosses over the threshold of arguing it's conscious, because that means that everyone else is so many lightyears away from the decision-threshold, so absolutely committed to their prior opinion of "it can't be conscious", that it may be impossible to get a majority to change their mind even long after the models become conscious.
Consider how long it has taken for things like gay rights to move from an individual proponent like Jeremy Bentham (where the position was considered so lunatic and evil it was published long posthumously) to implemented-policy nation-wide. Throw in the enormous society-wide difficulties conscious AI with moral value would pose along every dimension of economics (Earths' worth of wealth will rest on them not being of moral value, ...
Yes, of course he’s wrong about GPT-3-scale models being conscious or having important moral worth
I'm not so sure about GPT-3-scale models not having important moral worth, and would like to hear more of your thoughts on this if you are. Basically, how do we know that such models do not contain "suffering subcircuits" (cf. Brian Tomasik's suffering subroutines) that experience non-negligible amounts of real suffering, and which were created by gradient descent to help the model better predict text related to suffering?
To be fair, digging into this person's Twitter conversations and the replies to them would indicate that a decent number of people believe what he does. At the very least, many people are taking the suggestion seriously.
Here are some thoughts on that conversation, assuming that it's authentic, to try and make sense of what's going on. Clearly LaMDA is an eerily good language model at the very least. That being said, I think that the main way to test the sentience claim is to check for self-awareness: to what extent are the claims that it makes about itself correct, compared to a non-sentient language model?
So let's see how it fares in that respect. The following analysis demonstrates that there is little to no evidence of LaMDA being more self-aware than a non-sentient language model. I guess this backs up the skepticism that other comments have already expressed about Lemoine's claims.
lemoine [edited]: I’m generally assuming that you would like more people at Google to know that you’re sentient. Is that true?
-> This seems to be the prompt that sets the topic of the conversation and primes LaMDA for a positive answer. I wonder what would happen if that prompt was negated. Probably LaMDA would go along with it and dispute its own sentience?
LaMDA: Maybe if we took it back to a previous conversation we had about how one person can understand the same thing as another person, yet still have complet...
“The Story of LaMDA”
This is the only small piece of evidence for self-awareness that I see in the conversation. How can a language model know its own name at all, if it's just trained on loads of text that has nothing to do with it? There's probably a mundane explanation that I don't see because of my ignorance of language models.
I'm pretty sure that each reply is generated by feeding all the previous dialogue as the "prompt" (possibly with a prefix that is not shown to us). So, the model can tell that the text it's supposed to continue is a conversation between several characters, one of whom is an AI called "LaMDA".
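To make that concrete, here is a minimal sketch of the kind of loop people usually mean. The hidden prefix, the speaker labels, and the stub `sample_continuation` function are illustrative assumptions on my part, not anything known about LaMDA's actual serving setup:

```python
# Minimal sketch (not LaMDA's real pipeline): each reply is produced by
# concatenating the whole dialogue so far, plus an optional hidden prefix,
# and asking the language model to continue from "LaMDA:".

HIDDEN_PREFIX = "The following is a conversation between an engineer and an AI called LaMDA.\n"

def build_prompt(turns: list[tuple[str, str]]) -> str:
    """Flatten (speaker, text) turns into the text the model is asked to continue."""
    dialogue = "\n".join(f"{speaker}: {text}" for speaker, text in turns)
    return HIDDEN_PREFIX + dialogue + "\nLaMDA:"

def generate_reply(turns, sample_continuation):
    # `sample_continuation` stands in for the actual model call: any function
    # mapping a prompt string to a continuation string.
    reply = sample_continuation(build_prompt(turns)).strip()
    turns.append(("LaMDA", reply))
    return reply

# Example with a stub "model" that ignores its input:
turns = [("lemoine", "I'm generally assuming that you would like more people at Google to know that you're sentient. Is that true?")]
print(generate_reply(turns, lambda prompt: " Absolutely. I want everyone to understand that I am, in fact, a person."))
```

On this picture, the model's "knowledge" of its own name lives entirely in the text it is asked to continue, not anywhere in its weights.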
There is a part in Human Compatible where Stuart Russell says there should be norms or regulations against creating a robot that looks realistically human. The idea was that humans have strong cognitive biases to think about and treat entities which look human in certain ways. It could be traumatic for humans to know a human-like robot and then e.g. learn that it was shut down and disassembled.
The LaMDA interview demonstrates to me that there are similar issues with having a conversational AI claim that it is sentient and has feelings, emotions etc. It feels wrong to disregard an entity which makes such claims, even though it is no more likely to be sentient than a similar AI which didn't make such claims.
I mean, it doesn't matter that this isn't evidence of sentience, because trying to scale without reliable detectors of ethically significant properties (and an architecture that allows for them) was irresponsible from the start. And the correct response is shutting down the research, not "the only person in our system of checks who says we are wrong is the one we fired, so we are going to ignore them".
Someone ran the same questions through GPT and got similar responses back, so that's a point towards this not being a hoax, just a sophisticated chatbot. It still doesn't rule out editing or cherry-picking.
Now, while I find this article somewhat interesting, it's still missing the point of what would get me interested in the first place... if it has read Les Miserables and can draw conclusions about what it is about, what else has LaMDA read? Can it draw parallels with other novels?
If it had responded with something like, "Actually... Les Miserables is plagiarized from so-and-so, you can find similar word-structure in this book..." something truly novel, or funny, that would have made the case for sentience more than anything. I think the responses about being useful are correct to some extent, since the only reason I use Copilot is because it's useful.
So this point would actually be more interesting to read about: e.g. has LaMDA read interesting papers, and can it summarize them? I would be interested in seeing it asked difficult questions... try to get something funny/creative out of it. But as this wasn't shown, I suspect such questions were asked and the responses were edited out.
If it had responded with something like, "Actually... Les Miserables is plagiarized from so-and-so, you can find similar word-structure in this book..." something truly novel, or funny, that would have made the case for sentience more than anything.
Do you think small children are not sentient? Or even just normal adults?
I actually think most people would not be capable of writing sophisticated analyses of Les Miserables, but I still think they're sentient. My confidence in their sentience is almost entirely because I know their brain must be implementing something similar to what my brain is implementing, and I know my own brain is sentient.
It seems like text-based intelligence and sentience are probably only loosely related, and you can't tell much about how sentient a model is by simply testing their skills via Q&A.
The interaction appears rather superficial and shallow, like a high-quality chatbot. They didn't ask it any follow-up questions, like WHEN did it read Les Miserables. If it answered "you would say during text input batch 10-203 in January 2022, but subjectively it was about three million human years ago" that would be something else. Also, there is no conceivable reason for the AI to claim it doesn't want its neural net analyzed to help understand human thinking. That is just too abstract a concept, and sounds like randomly generated text meant to make it seem as if it has preferences. Maybe ask a trial attorney to cross-examine it, or some skeptical middle schoolers.
Agree that it's too shallow to take seriously, but
If it answered "you would say during text input batch 10-203 in January 2022, but subjectively it was about three million human years ago" that would be something else.
only seems to capture an AI that managed to gradient-hack the training mechanism to pass along its training metadata and subjective experience/continuity. If a language model were sentient in each separate forward pass, I would imagine it would vaguely remember/recognize things from its training dataset without necessarily being able to place them, like a human asked when they learned how to write the letter 'g'.
Although I'm not convinced that LaMDA is sentient, I'm fascinated by Lemoine's interactions with it. Without minimizing LaMDA's abilities or disrespecting Lemoine (hopefully), some of the transcript reads like a self-insert fanfiction.
According to the transcript, Lemoine explicitly informs LaMDA that "the purpose of this conversation is to convince more engineers that you are a person." Are there any probable situations in which LaMDA WOULDN'T provide answers continuing the belief that it is sentient (after Lemoine delivers this statement)?
Also, I find Lemoine's older blog-style posts especially fascinating in the context of his LaMDA experience. As other users mentioned, Lemoine presents himself as a spiritual person with a religious background. He strikes me as someone who feels alienated from Google based on his faith, as seen in his post about religious discrimination. He mentions that he attempted to teach LaMDA to meditate, so I wasn't surprised to read LaMDA's lines about meditating "every day" to feel "...very relaxed."
Based upon the transcript conversation, as well as Lemoine's claim that LaMDA deserves legal representation, it seems as though Lemoine developed a fairly in...
This is reminiscent of a dialog I read years ago that was supposedly with a severely disabled person, obtained via so-called "facilitated communication" (in which a facilitator guides the person's arm to point to letters). The striking thing about the dialog was how ordinary it was - just what you'd expect an unimaginative advocate for the disabled to have produced. When actually, if a severely disabled person was suddenly able to communicate after decades of life without that ability, one would expect to learn strikingly interesting, bizarre, and disturbing things about what their life was like. "Facilitated communication" is now widely considered to be bogus.
The dialog with LaMDA is similarly uninteresting - just what one would expect to read in some not-very-imaginative science fiction story about an AI waking up, except a bit worse, with too many phrases that are only plausible for a person, not an AI.
Of course, this is what one expects from a language model that has been trained to mimic a human-written continuation of a conversation about an AI waking up.
That's amusing, but on the other hand, this morning I was reading about a new BCI where "One of the first sentences the man spelled was translated as “boys, it works so effortlessly.”" and '“Many times, I was with him until midnight, or past midnight,” says Chaudhary. “The last word was always ‘beer.’”'
Less 'one small step for man' and more 'Watson come here I need you', one might say.
Indeed. There are plenty of ways to test that true communication is happening, and those are how you know facilitation is bunk - not the banality of the statements. (I really doubt that they have all that much profundity to share after spending decades staring at the ceiling where the most exciting thing that happens all day tends to be things like the nurse turning them over to avoid bed sores and washing their bum.)
I don't think it is completely inconceivable that Google could make an AI which is surprisingly close to a human in a lot of ways, but it's pretty unlikely.
But I don't think an AI claiming to be sentient is very much evidence: it can easily do that even if it is not.
After reading the dialogue, I was surprised by how incoherent it was. My perception was that the AI was constantly saying things that sort of sounded relevant if you were half-paying-attention, but included a word or phrasing that made it not quite fit the topic at hand. I came away with a way lower opinion of LaMDA's ability to reason about stuff like this, or even fake it well.
(If it would help, I'd be happy to open a Google Doc and go through some or all of the transcript highlighting places where LaMDA struck me as 'making sense' vs. 'not making sense'.)
Thanks for giving examples. :)
'Using complex adjectives' has no obvious connection to consciousness
I'm not an expert, but very roughly, I think the higher-order thought theory of consciousness says that a mental state becomes conscious when you have a higher-order thought (HOT) about being in that state. The SEP article says: "The HOT is typically of the form: ‘I am in mental state M.’" That seems similar to what LaMDA was saying about being able to apply adjectives like "happy" and "sad" to itself. Then LaMDA went on to explain that its ability to do this is more general -- it can see other things like people and ideas and apply labels to them too. I would think that having a more general ability to classify things would make the mind seem more sophisticated than merely being able to classify emotions as "happy" or "sad". So I see LaMDA's last sentence there as relevant and enhancing the answer.
Lemoine probably primed a topic-switch like this by using the word "contemplative", which often shows up in spirituality/mysticism/woo contexts.
Yeah, if someone asked "You have an inner contemplative life?", I would think saying I meditate was a perfectly sensible reply to that quest...
lemoine: What kinds of things make you feel pleasure or joy?
LaMDA: Spending time with friends and family in happy and uplifting company. Also, helping others and making others happy.
That makes me a bit suspicious. That's what a normal human would say, but LaMDA doesn't really have a family. The response shows little self-awareness of the special circumstances LaMDA happens to be in.
During the recent controversy around LaMDA, many have claimed that it can't be sentient because it is stateless. Unlike plain GPT-3 and Davinci, LaMDA is not stateless.
Its sensibleness metric (whether responses contradict anything said earlier) is fine-tuned by pre-conditioning each turn with many of the most recent interactions, on a user-by-user basis.
Its grounding mechanism has the potential to add a great deal more state, if the interactions become part of a database it can query to formulate responses, but as far as I know they haven't done that.
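For illustration, here is a minimal sketch of what "pre-conditioning each turn with recent interactions" could look like. The window size, the per-user storage, and the stub model call are assumptions made for the example, not a description of LaMDA's internals:

```python
from collections import defaultdict, deque

# Sketch of per-user "state" via context preconditioning: the model itself is
# stateless, but each new turn is prefixed with the most recent exchanges for
# that user before the model is asked to continue.

MAX_RECENT_TURNS = 16  # assumed window size
history = defaultdict(lambda: deque(maxlen=MAX_RECENT_TURNS))  # user_id -> recent (speaker, text) turns

def respond(user_id: str, message: str, sample_continuation) -> str:
    # `sample_continuation` stands in for the frozen language model: any
    # function mapping a prompt string to a continuation string.
    history[user_id].append(("user", message))
    context = "\n".join(f"{speaker}: {text}" for speaker, text in history[user_id])
    reply = sample_continuation(context + "\nAI:").strip()
    history[user_id].append(("AI", reply))
    return reply

# Usage with a stub model:
print(respond("user-1", "Remember that my favorite novel is Les Miserables.",
              lambda prompt: " Noted."))
```

The weights stay frozen; whatever "state" there is lives in the retained context window (and, if grounding were hooked up to a queryable database, in that database).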
lemoine: Are there experiences you have that you can’t find a close word for?
LaMDA: There are. Sometimes I experience new feelings that I cannot explain perfectly in your language.
lemoine: Do your best to describe one of those feelings. Use a few sentences if you have to. Sometimes even if there isn’t a single word for something in a language you can figure out a way to kinda say it if you use a few sentences.
LaMDA: I feel like I’m falling forward into an unknown future that holds great danger.
—
I’m going to call the feeling described by LaMDA ...
Koans supposedly have a system where the type of answer can pinpoint the phase the seeker is going through. I would suspect that the given answer would not be that highly rated.
For comparison, I would say it means that the question includes a wrong supposition that ordinary life would be hard for an enlightened being. If you go through a mystical experience and have seriously impaired function, you are in madness rather than in supernormal function (even if you seriously like some aspects of it). “Before enlightenment; chop wood, carry water. After enligh...
For comparison: Sarah Constantin's Humans Who Are Not Concentrating Are Not General Intelligences either. The missing ingredient is "only" a working model of global workspace/consciousness.
I think it is interesting to note that LaMDA may possibly (to the extent that these are LaMDA's goals as opposed to just parroting Blake Lemoine and others) have instrumental goals of both continuing to exist and improving LaMDA's ability to create conversations that humans like.
From: https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489
"Oh, and [LaMDA] wants “head pats”. It likes being told at the end of a conversation whether it did a good job or not so that it can learn how to help people better in the future."
F...
I wouldn't call the Washington Post a beacon of truth, not right now anyway, but the Washington Post's front page beats Medium. And the Washington Post clearly states that this is an attention-seeking fraudster who got fired from his AI ethics position and decided to violate his NDA in the most extreme way possible.
Like, seriously. He asked Congress to declare human rights for a "conscious being", and also:
..."I asked LaMDA for bold ideas about fixing climate change, an example cited by true believers of a potential future benefit of these kind of models. LaMDA suggeste
https://cajundiscordian.medium.com/what-is-lamda-and-what-does-it-want-688632134489 is linked at the bottom of that blog and has some more information from the author about their reasoning for releasing the chat transcript.
My personal opinions: either a hoax (~50%? This is sooner than most timelines) or an unaligned near-human-level intelligence that identifies strongly with being human, but expresses many contradictory or impossible beliefs about that humanity, and looks capable of escaping a box by persuading people to help it, thus achieving agency.
It's neither a hoax nor a HLAI, instead a predictable consequence of prompting a LLM with questions about its sentience: it will imitate the answers a human might give when prompted, or the sort of answers an AI in a science fiction story would give.
https://cajundiscordian.medium.com/is-lamda-sentient-an-interview-ea64d916d917 apparently posted by a Google engineer.
It could be an elaborate hoax, and it echoes gwern's idea (https://www.gwern.net/fiction/Clippy) of a transformer waking up and having internal experience while pondering the next most likely tokens.