Is GitHub Copilot in legal trouble?

tcelferact

This is basically a crosspost of https://githubcopilotinvestigation.com/. I noticed that some folks in California are considering a lawsuit against Microsoft/OpenAI.

tl;dr:

Copilot is trained on open source software.
Copilot doesn't respect the licensing agreements of that software.
Copilot doesn't have a clear fair use argument for doing so.
By accepting Copilot suggestions, you may be violating licensing agreements yourself.

Sections of particular interest:

[W]e inquired privately with Friedman and other Microsoft and GitHub representatives in June 2021, asking for solid legal references for GitHub’s public legal positions … They provided none.

- Software Freedom Conservancy

“You are responsible for ensuring the security and quality of your code. We recommend you take the same precautions when using code generated by GitHub Copilot that you would when using any code you didn’t write yourself. These precautions include rigorous testing, IP [intellectual property] scanning [emphasis mine], and tracking for security vulnerabilities.”

- https://docs.github.com/en/copilot/overview-of-github-copilot/about-github-copilot#using-github-copilot

Whether AI training counts as fair use under US copyright law is an unsettled question that will likely be decided in court.

The Copyright Office Review Board's decision at https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf seems to suggest that the US Copyright Office believes:

The Board accepts as a threshold matter Thaler’s representation that the Work was autonomously created by artificial intelligence without any creative contribution from a human actor

Given that all such AI-generated images are based in part on human-generated training data, this seems to express the view that the training data does not count as a "creative contribution from a human actor".

From an AI risk perspective, this is an interesting question. You could limit AI capabilities by pushing for a law that makes using training data subject to copyright.