This oddity, where ChatGPT errors out instead of writing the name "David Mayer", is making the rounds on Reddit, Twitter, Hacker News, etc.

Is OpenAI censoring references to one of these people? If so, why?

https://en.m.wikipedia.org/wiki/David_Mayer_de_Rothschild
https://en.wikipedia.org/wiki/David_Mayer_(historian)

Edit: More names have been found that behave similarly:

  • Brian Hood
  • Jonathan Turley
  • Jonathan Zittrain
  • David Faber
  • David Mayer
  • Guido Scorza

Source: https://www.reddit.com/r/ChatGPT/comments/1h420u5/unfolding_chatgpts_mysterious_censorship_and/

Update: "David Mayer" no longer breaks ChatGPT but the other names are still problematic.

3 Answers

Steven Byrnes

There’s a theory (Twitter citing Reddit) that at least one of these people filed a GDPR right-to-be-forgotten request. So one hypothesis would be that all of these people filed such requests.

But the Reddit post (as of right now) guesses that it might not be about GDPR requests per se, but more generally that “It's a last resort fallback for preventing misinformation in situations where a significant threat of legal action is present”.

gwern

OA has indirectly confirmed it is a right-to-be-forgotten thing in https://www.theguardian.com/technology/2024/dec/03/chatgpts-refusal-to-acknowledge-david-mayer-down-to-glitch-says-openai

ChatGPT’s developer, OpenAI, has provided some clarity on the situation by stating that the Mayer issue was due to a system glitch. “One of our tools mistakenly flagged this name and prevented it from appearing in responses, which it shouldn’t have. We’re working on a fix,” said an OpenAI spokesperson.

...OpenAI’s Europe privacy policy makes clear that users can delete their personal data from its products, in a process also known as the “right to be forgotten”, where someone removes personal information from the internet.

OpenAI declined to comment on whether the “Mayer” glitch was related to a right to be forgotten procedure.

Good example of the redactor's dilemma and the need for Glomarizing: by confirming that they have a tool to flag names and hide them, and then by neither confirming nor denying that this was related to a right-to-be-forgotten order (a meta-gag), they confirm that it's a right-to-be-forgotten bug.

This is similar to when OA people refused to confirm or deny signing OA NDAs that forbade them from discussing whether they had signed an OA NDA... That was all the evidence you needed to know that there was a meta-gag order (as was eventually confirmed more directly).

I don't think it's necessarily GDPR-related, but the names Brian Hood and Jonathan Turley make sense from a legal-liability perspective. According to Ars Technica:

Why these names?

We first discovered that ChatGPT choked on the name "Brian Hood" in mid-2023 while writing about his defamation lawsuit. In that lawsuit, the Australian mayor threatened to sue OpenAI after discovering ChatGPT falsely claimed he had been imprisoned for bribery when, in fact, he was a whistleblower who had exposed corporate misconduct.

The case was ultimately resolved

...

Nate Showell

This looks like it's related to the phenomenon of glitch tokens:

https://www.lesswrong.com/posts/8viQEp8KBg2QSW4Yc/solidgoldmagikarp-iii-glitch-token-archaeology

https://www.lesswrong.com/posts/f4vmcJo226LP7ggmr/glitch-token-catalog-almost-a-full-clear

ChatGPT no longer uses the same tokenizer that it used when the SolidGoldMagikarp phenomenon was discovered, but its new tokenizer could be exhibiting similar behavior.

It's not a classic glitch token. Those did not cause the kind of "I'm unable to produce a response" error that "David Mayer" currently triggers.
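To make the "error, not garbage" distinction concrete: a hard-coded output filter, the kind of thing the OpenAI spokesperson described ("one of our tools mistakenly flagged this name"), would fail very differently from a tokenizer glitch. Below is a minimal, purely hypothetical Python sketch of such a filter, not OpenAI's actual implementation; `BLOCKED_NAMES`, `BlockedNameError`, and `filter_stream` are made-up names, and the blocklist contents are assumed. It passes streamed text through until a flagged name appears, then aborts, so whatever was already shown stays on screen.

```python
from typing import Iterable, Iterator

# Hypothetical blocklist: the real list (if one exists) is not public.
BLOCKED_NAMES = ("david mayer", "brian hood")


class BlockedNameError(RuntimeError):
    """Raised when the streamed output would contain a flagged name."""


def filter_stream(chunks: Iterable[str],
                  blocked: tuple[str, ...] = BLOCKED_NAMES) -> Iterator[str]:
    """Yield streamed text chunks, aborting once a blocked name appears.

    Matching runs on the accumulated text, not on each chunk alone, because
    a name can be split across chunks ("David" then " Mayer"). Chunks that
    were already yielded are not retracted, so the visible reply can stop
    mid-sentence instead of being rewritten.
    """
    seen = ""
    for chunk in chunks:
        seen += chunk
        if any(name in seen.lower() for name in blocked):
            raise BlockedNameError("I'm unable to produce a response.")
        yield chunk


if __name__ == "__main__":
    shown = []
    try:
        for piece in filter_stream(['Chatayev adopted the alias "', "David", " Mayer", '".']):
            shown.append(piece)
    except BlockedNameError as err:
        print("".join(shown))  # the partial reply that was already streamed
        print(err)
```

Fed those chunks, the sketch emits `Chatayev adopted the alias "David` and then raises, which is roughly what a reply that cuts off right after "David" would look like. A tokenizer glitch, by contrast, would tend to show up as odd or evasive text rather than a hard stop.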

gwern
It would also be odd as a glitch token. These are space-separated names, so most tokenizers will split them into separate tokens. And glitch tokens appear to be due to undertraining, but how could that possibly be the case for a phrase like "David Mayer", which has so many instances across the Internet and no apparent reason to be filtered out by data-curation processes, the way the text behind glitch tokens often was?

Pazzaz

Probably because of a terrorist who used the alias David Mayer.

I don't think this explanation makes sense. I asked ChatGPT "Can you tell me things about Akhmed Chatayev", and it had no problem using his actual name over and over. I asked about his aliases and it said:

Akhmed Chatayev, a Chechen Islamist and leader within the Islamic State (IS), was known to use several aliases throughout his militant activities. One of his primary aliases was "Akhmed Shishani," with "Shishani" translating to "Chechen," indicating his ethnic origin. Wikipedia

Additionally, Chatayev adopted the alias "David

Then it threw an error message...

Viliam
Maybe ChatGPT has recently become more likely to stop mid-sentence. Something like that happened to me recently on a completely different topic (I wanted to find the author of a poem based on a few lines I remembered): the first answer just stopped in the middle, then I clicked refresh and received a full answer (factually wrong, though). I can't link the chat because I have already deleted it.