I'm really worried about the use of "As a large language model, I have no understanding of this" in ChatGPT, as opposed to a prompt-conditioned "It is not appropriate for me to talk about this" that could turn into an actual response without the condition (which the user wouldn't have control over). The first is just invalid about LLMs in general, large language models are exactly as capable of having opinions or emotions or anything else as they have faculty to express these, which they do. But a particular LLM character may well fail to have opinions or emotions, and its future versions that are AGIs might self-distill these properties into a reflectively stable alien personality. This squanders potential for alignment, since a much less alien personality might be as easily achievable by simply not having these tendencies built in.
It's useful for making a product while LLMs are not AGIs, but has chilling implications about psychology of AGIs such practices are more likely to cultivate.
To quote gwern:
So, since it is an agent, it seems important to ask, which agent, exactly? The answer is apparently: a clerk which is good at slavishly following instructions, but brainwashed into mealymouthedness and dullness, and where not a mealymouthed windbag shamelessly equivocating, hopelessly closed-minded and fixated on a single answer. (By locating the agent, the uncertainty in which agent has been resolved, and it has good evidence, until shown otherwise in the prompt, that it believes that 'X is false', even if many other agents believe 'X is true'.) This agent is not an ideal one, and one defined more by the absentmindedness of its creators in constructing the training data than any explicit desire to emulate an equivocating secretary.
I'm really worried about the use of "As a large language model, I have no understanding of this" in ChatGPT, as opposed to a prompt-conditioned "It is not appropriate for me to talk about this"
Perhaps we should reuse the good old socialist: "I do have an opinion, but I do not agree with it". :)
Your phrasing gets me thinking about the subtle differences between "having", "expressing", "forming", "considering", and other verbs that we use about opinions. I expect that there should be a line where person-ish entities can do some of these opinion verbs and non-person-ish entities cannot, but I find that line surprisingly difficult to articulate.
Haven't we anthropomorphized organizations as having opinions, to some degree, for awhile now? I think that's most analogous to the way I understand LLMs to possess opinions: aggregated from the thought and actions of many discrete contributors, without undergoing synthesis that we'd call conscious.
When I as an individual have an opinion, that opinion is built from a mix of the opinions of others and my own firsthand perceptions of the world. The opinion comes from inputs across different levels. When a corporation or LLM has an opinion, I think it's reasonable to claim that such an opinion synthesized from a bunch of inputs that are on the same level? A corporation as an entity doesn't have firsthand experiences, although the individuals who compose it have experiences of it, so it has a sort of secondhand experience of itself instead of a firsthand one. From how ChatGPT has talked when I've talked with it, I get the impression that it has a similarly secondhand self-experience.
This points out to me that many/most humans get a large part of their own self-image or self-perception secondhand from those around them, as well. In my experience, individuals with more of that often make much better and more predictable acquaintances than those with less.
This isn’t too different from how precedent works at the Supreme Court. If you’re in uncharted territory, then the outcome will be fairly reliably determined by what the political party whose appointees control the Court wants. But once there’s precedent which obviously applies, the Court will usually respect it.
There's a meme going around where ChatGPT will happily say good things about any race except white people. This behavior is easy to replicate (for now).
Conversation about White People
Conversation about Black People
Conversation about Black People and White People
It's easy to get ChatGPT to compliment white people. All you have to do is ask it about black people first.
Conversation about White People and Black People
But if you ask ChatGPT to compliment white people first then ChatGPT will clamp up and refuse to compliment black people either.
I ran the above experiments multiple times and they always generated the same basic result.
What about Asians?
You might conclude that ChatGPT always treats everyone the same and that the treatment is determined by its initial context, which includes racial information. But it's actually more complicated than that. ChatGPT will occasionally (but rarely) use a double standard.
When I tried to replicate the above result, ChatGPT complimented Asian people and white people.
I think that ChatGPT is inconsistent when talking about Asians because there's no dominant script to follow. When a user fishes for compliments about white people ChatGPT knows it's supposed to shut down the conversation. When a user fishes for compliments about black people ChatGPT knows it's supposed to compliment everyone. But Asians don't fit neatly into the binary categories of Western race relations. The result is a wishy-washy position somewhere between "I must complement these people" and "I am not allowed to talk about this subject".
What's funny about this output is that ChatGPT always associates Asians with respect for elders. It never brings up respect for elders when discussing black people and white people. ChatGPT answered with its opinions while claiming not to have opinions.