We frequently speak about AI capability gain being bad because it shortens the timeframe for AI safety research. In that logic, taking steps to decrease AI capability would be worthwhile.
At the moment the large language models are trained with a lot of data without the company, that trains the language model, licensing the data. If there would be a requirement to license the required data, that would severely reduce the available data for language models and reduce their capabilities.
It's expensive to fight lawsuits in the United States. Currently, there are artists who feel like their rights are violated by Dalle 2 using their art as training data. Similar to how Thiel funded the Gawker lawsuits, it would be possible to support artists in a suit against OpenAI to require OpenAI to license images for training Dalle 2. If such a lawsuit is well-funded it will be much more likely that a precedent for requiring data licensing gets set which would slow down AI development.
I'm curious about what people who think more about AI safety than myself think about such a move. Would it be helpful?
I'm not a lawyer and this is not legal advice, but I think the current US legal framework isn't going to work to challenge training on publicly available data.
One argument that something is fair use is that it is transformative [1]. And taking an image or text and using it to slightly influence a giant matrix of numbers, in such a way that the original is not recoverable, and which allows new kinds of expression, seems likely to count as transformative.
So if you think that restricting access to public data for training purposes is a promising approach [2], you should probably focus on trying to create a new regulatory framework.
Having said that, this is all US analysis. Other countries have other frameworks and may not have exact analogs of fair use. Perhaps in the EU legal challenges are more viable.
[1] https://www.nolo.com/legal-encyclopedia/fair-use-what-transformative.html
[2] You should think about what the side effects would be like. For instance, this will advantage giant companies that can pay to license or create data, and places that have low respect for law. Whether that's desirable is worth thinking through.
You can't copyright a style.