Unpaywalled article, the lawsuit.
(I don't have a law degree, this is not legal advice; my only relevant background is a US copyright law course I took many years ago.) I've read most of the lawsuit and skimmed the rest. Some quick thoughts on the allegations:
- Memorisation: when ChatGPT outputs text that closely copies original NYT content, this is clearly a copyright infringement. I think it's clear that OpenAI & Microsoft should be paying everyone whose work their LLMs reproduce.
- Training: it's not clear to me whether training LLMs on copyrighted content is an infringement under current US copyright law. I think lawmakers should introduce regulations to make it one, but I don't think courts should consider it an infringement under current law (although I might not be familiar with all the relevant case law).
- Summarising news articles found on the internet: copyright protects expression, not facts (if you read about something in a NYT article, the knowledge you received isn't protected by copyright, and you're free to share it). If an LLM summarises text it has lawful access to and merely conveys the same facts, I don't think that violates copyright, or it might be fair use. NYT alleges damage from Bing that Wikipedia also causes by citing facts and linking to the source. To the extent LLMs don't preserve the wording or the creative structure, copyright doesn't provide protection; and some preservation of the structure might be fair use.
- Hallucinations: ChatGPT hallucinating false information and attributing it to NYT falls outside copyright law, but it seems bad and damaging. I'm not sure what existing law covers this, but even if nothing does, it'd be great to see regulations making AI companies liable for the various kinds of damage their products cause, including attributing statements to people who never made them.
I see a case against punishment here, in general. Consider me asking you "What did Mario say?", and you answering - in private -
"Listen to this sentence - even though it might well be totally wrong: Mario said xxxx.",
or, even more in line with the ChatGPT situation,
"From what I read, I have the somewhat vague impression that Mario said xxxx - though I might mix this up, so you may really want to double-check."
Assume Mario has not said xxxx. We still have a strong case for not, in general, punishing you for the above statement. And even if I acted badly in response to your message, so that someone got hurt, I'd say the main blame, a priori, falls on me, not you.
The parallels to the case of ChatGPT[1] suggest extending a similar reservation about punishment to our current LLMs.
Admittedly, pragmatism is in order. If an LLM's hallucinations - despite warnings - end up creating entire groups of people attacking others over false statements, it may be high time to rein in the AI. But that should not be the default response to false attributions, not as long as the warning is clear and obvious: do not trust it at all as of yet.
In addition to knowing that today's LLMs hallucinate, we currently even get a "ChatGPT can make mistakes. Consider checking important information." right next to the prompt.