
Licensing and "generative AI"

Putting up this website involved thinking about copyright and licensing. After some consideration, I have chosen to license the content under a Creative Commons CC-BY-NC-SA license, but recent developments in the field of "generative AI" make this a more difficult choice than it should be...

One of the reasons I added the non-commercial clause to the license is that companies have ingested vast amounts of data to produce "generative AI". The question of whether this constitutes a breach of copyright laws will be tested in courts in the near future.

Somewhat to my surprise, Creative Commons have not come out in support of the wishes expressed by content creators. Instead, there seems to be a push for a "fair use" exemption from copyright for generative AI, which would override the wishes expressed by people using the Creative Commons license. I believe that the people behind this push are making a number of mistakes.

In this blog post, Stephen Wolfson equates what "generative AI" models do with the case of Google Books. This, however, is a false equivalence: Google Books does not produce functional equivalents of the material it ingests but merely indexes it and reproduces very limited snippets.

He writes that "the books as part of Google’s database served a very different purpose from their original purpose". The point of "generative AI", however, is decidedly to produce outputs similar in function to the inputs. The models are not merely an index that points to the original or parts thereof and makes them retrievable.

Wolfson states, without evidence or supporting argument, that the "use by Stability AI and Midjourney exists in an entirely different market from the original works...". This seems disingenuous, not least because the actual use of "generative AI" is not predetermined: nothing prevents its outputs from being used in the very market the original works serve.

Even if he is correct in his interpretation of copyright law, the rise of "generative AI" raises two questions. The first is whether the current copyright regime is still adequate to deal with the mass scraping of web content. Wolfson does not explore this question; he merely examines how copyright law as it stands applies to the cases currently in court. This narrow view of copyright law does not do justice to the complexity of the legal questions about scraping, never mind the moral ones.

The second question, which concerns remixing, should be equally important to Creative Commons. The idea behind a commons is that it is managed to increase its value for all members. Surely, the point of remixing is to create additional value that benefits at least some of those members.

It is difficult to see how the enclosure of content in proprietary systems serves this purpose. The share-alike clause in a CC-SA license should ensure that remixing benefits society and not merely individual or corporate interests.

Furthermore, there are serious concerns about the quality of the outputs of "generative AI". There are plenty of examples, and much has been written about them. While one may argue about what the quality is or whether it will improve over time, as someone who takes care to craft text that is informative and readable (I hope), I resent a company producing functionally equivalent content without having invested similar effort and care.

I would have expected a discussion in the context of Creative Commons to show more appreciation of the wishes of content creators as expressed in their licenses. A copyright exemption makes a mockery of this. Instead of furthering the interests of corporations, I would have expected Creative Commons to focus on fostering the commons by giving content creators a degree of control. If I remember correctly, was this not the original intention? Perhaps I am wrong about that.

So, why have I chosen a CC-BY-NC-SA license in the end? I believe that it adequately expresses the terms under which I am happy to make the fruits of my labor available to my fellow human beings. Whether it will prevent the scraping and ingestion of my words into future proprietary systems remains to be seen. If the "fair use" argument prevails, there will be nothing I can do about it.

I look forward to people improving on what I have done here. If you do, drop me a line; see the contact page.