Whether AI training is fair use under US copyright law is an unsettled question that will likely be fought out in court.
https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf seems to suggest that the US Copyright Office believes:
"The Board accepts as a threshold matter Thaler’s representation that the Work was autonomously created by artificial intelligence without any creative contribution from a human actor"
Given that all those AI-generated images are based in part on human-generated training data, this seems to express the view that the training data does not count as a "creative contribution from a human actor".
From an AI risk perspective, this is an interesting question. You could limit AI capability by pushing for a law that treats the use of copyrighted works as training data as copyright infringement.
This is basically a crosspost for https://githubcopilotinvestigation.com/. I noticed that some folks in California are considering a lawsuit against Microsoft/OpenAI.
tl;dr:
Sections of particular interest:
- Software Freedom Conservancy
- https://docs.github.com/en/copilot/overview-of-github-copilot/about-github-copilot#using-github-copilot