Anthropic is a rare example of a company that is excited to be sued. Good news for AI, bad news for authors: A U.S. federal judge has ruled that Anthropic’s use of copyrighted books to train its AI models is “fair use” under U.S. copyright law.
The ruling, by Judge William Alsup in the Northern District of California, represents a major development in the unfolding legal contest between human creators and AI developers over the use of creative works.
The ruling stems from a lawsuit filed by a small group of authors who accused Anthropic of copyright infringement for using their works to train its large language models (LLMs), like Claude, without specific permission. Judge Alsup found that such training was “quintessentially transformative,” likening it to a human reader who draws on many loosely related texts to write something new, rather than copying or supplanting the original works.
This “transformative” quality was critical to the finding of fair use. It bears on the first of the four traditional factors in the fair use test: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market for or value of the copyrighted work.
This ruling is a big win for AI companies: It suggests they can use copyrighted materials for AI training as long as they legally obtained the materials first. Importantly, the judge also ordered that Anthropic must still face trial over its alleged use of pirated books to populate its central library of content.
This distinction shows that although the AI training process itself was fair use, the way the copyrighted material was acquired is still subject to legal limits. A trial set for December will decide damages for the use of those illicitly acquired materials, which could prove very costly for Anthropic.
The decision has immediate implications for the tech industry, offering a measure of legal clarity and a possible precedent for a host of additional copyright infringement lawsuits against leading AI developers, including Meta and OpenAI. It suggests that AI companies can still push the limits of innovation and train their models on large bodies of data, so long as that data is sourced legally.
But for content creators, the decision is something of a mixed bag. Though they don't dispute the transformative nature of training AI, authors and publishers are still grappling with the economic value of their contributions to AI development and with what they see as insufficient remuneration.
Other pending litigation, along with ongoing discussions over licensing frameworks, shows that the fight over data rights in the era of generative AI is not over. While this ruling is groundbreaking for fair use in AI training, it also serves as an important reminder that lawful data acquisition is a crucial part of the continued evolution of AI.