This oddity is making the rounds on Reddit, Twitter, Hackernews, etc.
Is OpenAI censoring references to one of these people? If so, why?
https://en.m.wikipedia.org/wiki/David_Mayer_de_Rothschild https://en.wikipedia.org/wiki/David_Mayer_(historian)
Edit: More names have been found that behave similarly:
- Brian Hood
- Jonathan Turley
- Jonathan Zittrain
- David Faber
- David Mayer
- Guido Scorza
Source: https://www.reddit.com/r/ChatGPT/comments/1h420u5/unfolding_chatgpts_mysterious_censorship_and/
Update: "David Mayer" no longer breaks ChatGPT but the other names are still problematic.
It would also be odd as a glitch token. These are space-separated names, so most tokenizers will tokenize them separately, and glitch tokens appear to be due to undertraining but how could that possibly be the case for a phrase like "David Mayer" which has so many instances across the Internet which have no apparent reason to be filtered out by data-curation processes the way the glitch tokens often do?