Fair Use in the Age of AI: A Student's Guide to a High-Stakes Legal Battle
1. Introduction: AI, Copyright, and the Billion-Dollar Question
Imagine a technology that can read every book ever written to learn how to write a new one. That technology is here, but it's built on a billion-dollar question: Did it have the right to read those books in the first place? For Artificial Intelligence (AI) to learn, it must process staggering amounts of data, including billions of images and vast libraries of text. Much of this material—from news articles and books to digital art and photographs—is protected by copyright, igniting a fierce legal battle between AI developers and the creators who own that content.
The core legal question is this: Is it legal for AI companies to train their models on copyrighted material without getting permission or paying for it?
At the center of this debate in the United States is a critical legal doctrine known as fair use. This document will break down what fair use is, how its four-factor test works, and how courts are applying this concept to the world of AI, using a major court case as our guide.
Let's begin by understanding the legal principle at the heart of this conflict.
2. What Is Fair Use? A Simple Breakdown
In simple terms, fair use is a crucial exception in U.S. copyright law that allows for the limited use of copyrighted works without the owner's permission. It exists to ensure that copyright doesn't stifle creativity, commentary, education, and innovation, which often build upon existing works.
However, fair use is not a simple checklist. It is a flexible but complex "balancing test" that courts use to weigh the specific facts of each case. To do this, judges analyze four factors laid out in the Copyright Act (17 U.S.C. §107).
The Four Factors of Fair Use
1. The purpose and character of the use: Is the new work for commercial profit or for non-profit educational purposes? More importantly, is it "transformative," meaning that it adds a new meaning or purpose to the original?
2. The nature of the copyrighted work: Is the original work more factual or more creative? Courts provide more protection to a highly creative novel than to a largely factual work such as a news report.
3. The amount and substantiality of the portion used: How much of the original work was used in the new one? Was the "heart" of the original work taken?
4. The effect of the use upon the potential market for or value of the copyrighted work: Does the new work harm the original creator's ability to make money from their work? Does it serve as a market substitute?
Now, let's see how a court applied these four factors to a real-world dispute involving an AI company.
3. Case Study: Thomson Reuters v. Ross Intelligence
This February 2025 decision from a federal district court in Delaware is widely regarded as the first major U.S. court ruling on whether using copyrighted works to train an AI tool constitutes fair use.
- The Plaintiff: Thomson Reuters, the owner of Westlaw, a massive legal research platform used by lawyers and law students.
- The Copyrighted Work: Westlaw's "headnotes." These are short, original summaries of key points of law found in court cases, written and copyrighted by Thomson Reuters' legal editors.
- The Defendant: Ross Intelligence, a startup that was building a competing AI-powered legal research tool.
- The Action: Ross initially asked to license Westlaw's content, but Thomson Reuters refused. Ross then hired a third-party company, LegalEase, to produce bulk memos of legal questions and answers. Those memos were built from Westlaw's headnotes, and Ross used them to train its AI model.
The court's decision was clear: Ross's use of the copyrighted headnotes was not fair use and constituted copyright infringement.
To understand how the judge reached this conclusion, we need to examine the court's reasoning for each of the four factors.
4. The Four Factors in Action: How the Judge Decided
In the Thomson Reuters v. Ross case, the judge carefully balanced all four factors. The final decision hinged on the purpose of Ross's AI tool and its direct impact on Westlaw's market.
4.1 Factor 1: The Purpose and Character of the Use
The court found this factor weighed against fair use for two key reasons:
- Commercial Use: Ross's use was entirely commercial. It was building a for-profit product designed to compete in the marketplace.
- Not Transformative: The court ruled that Ross's tool was not transformative. It didn't create something with a new purpose; it simply used Westlaw's copyrighted content to create a direct competitor in the same market (legal research).
The judge relied heavily on a 2023 Supreme Court decision, Andy Warhol Foundation v. Goldsmith, to make this point. As the opinion puts it:
Ross’s use is not transformative because it does not have a “further purpose or different character” from Thomson Reuters’s.
The court also distinguished this case from others involving "intermediate copying" (like Google v. Oracle), where copying was found to be fair use. The judge reasoned that those cases involved computer code, and that the copying there was necessary to achieve an innovative goal. Here, Ross did not need to copy Westlaw's headnotes; it could have created its own training data without infringing on Thomson Reuters' copyrights.
4.2 Factor 2: The Nature of the Copyrighted Work
This factor weighed in favor of Ross and its fair use argument. The court recognized that Westlaw's headnotes, while original, are primarily factual summaries of legal points rather than highly creative or imaginative works of fiction. The law generally provides less protection to factual works than to creative ones under the fair use doctrine.
4.3 Factor 3: The Amount and Substantiality of the Portion Used
This factor also weighed in favor of Ross and its fair use argument. The court reasoned that while Ross copied the headnotes, this was an intermediate step. The final product—the AI tool that produced citations to court cases—did not contain the copyrighted headnotes themselves. Because the end product was different from the copied material, this factor tilted in Ross's favor.
4.4 Factor 4: The Effect on the Potential Market
The court found this factor weighed heavily against fair use and was decisive in its ruling. Quoting Supreme Court precedent, it called this factor "the single most important element of fair use."
The judge identified two types of market harm:
- Direct Market Harm: Ross's product was explicitly created to be a direct market substitute for Westlaw. It was designed to take customers away from Thomson Reuters by offering a similar service.
- Potential Market Harm: Ross's actions undermined a potential new market that Thomson Reuters could enter: the market for licensing "data to train legal AI models." By simply taking the data, Ross harmed Thomson Reuters' ability to profit from licensing its headnotes for this exact purpose in the future.
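One way to see why the four-factor test is a balance rather than a vote is to record the findings described above in a small data structure. The Python sketch below is only an illustration. The class name FactorFinding, its field names, and both helper functions are hypothetical, and the emphasized flag simply mirrors this guide's description of Factors 1 and 4 as decisive. It is not a model of how courts actually decide cases.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorFinding:
    """One fair use factor as summarized in this guide (hypothetical structure)."""
    number: int       # statutory factor number, 1 through 4
    name: str         # short label for the factor
    favors: str       # "plaintiff" (against fair use) or "defendant" (for fair use)
    emphasized: bool  # True if this guide describes the factor as decisive


# Findings for Thomson Reuters v. Ross, taken from the summary in this guide.
ROSS_FINDINGS = [
    FactorFinding(1, "purpose and character", "plaintiff", True),
    FactorFinding(2, "nature of the work", "defendant", False),
    FactorFinding(3, "amount and substantiality", "defendant", False),
    FactorFinding(4, "market effect", "plaintiff", True),
]


def raw_tally(findings):
    """Count how many factors favor each side, ignoring their relative weight."""
    return Counter(f.favors for f in findings)


def emphasized_side(findings):
    """Return the side favored by every emphasized factor, or None if they split."""
    sides = {f.favors for f in findings if f.emphasized}
    return sides.pop() if len(sides) == 1 else None


if __name__ == "__main__":
    print("Raw tally:", dict(raw_tally(ROSS_FINDINGS)))
    print("Emphasized factors favor:", emphasized_side(ROSS_FINDINGS))
```

Run as a script, the raw tally comes out even at two factors per side, while both emphasized factors point toward the plaintiff. That matches the outcome described above: a simple count would have been a tie, but the court did not treat the factors as equal.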
This case involved a non-generative AI tool, but what does it tell us about the broader landscape of generative AI?
5. The Bigger Picture: What Does This Case Mean for Generative AI?
The Ross Intelligence ruling is significant, but it's important to note it involved a non-generative AI tool that was a direct competitor to the copyright holder. Generative AI models, like those from OpenAI and Anthropic, are different: they generate new text, images, and audio in response to prompts.
Recent court rulings in cases involving generative AI, such as Bartz v. Anthropic and Kadrey v. Meta (both decided in June 2025), have taken a different path. In those cases, the courts found that training a generative model can be fair use because the purpose is highly transformative. However, these rulings come with significant warnings. In Kadrey, for example, the judge's decision was "pointedly narrow," and he wrote that on the question of whether training AI on copyrighted works without permission is illegal, "in most cases the answer will likely be yes."
The table below contrasts the key arguments.
| | Thomson Reuters v. Ross (Non-Generative AI) | Generative AI Cases (e.g., Bartz v. Anthropic) |
|---|---|---|
| Use | Created a direct commercial competitor. | Deemed "highly transformative," creating something new. |
| Court's View | Not transformative; a market substitute. | Analogous to "human learning and memory." |
| Outcome | Not fair use. | Training can be fair use (though questions remain). |
This is a rapidly evolving area of law. Beyond just disagreeing on outcomes, courts are also split on the very method they should use. For instance, judges in Bartz and Kadrey diverged on whether the act of acquiring data and the act of training a model on it should be analyzed together or separately. Other major lawsuits, including the high-profile case of The New York Times v. OpenAI, are still pending. The outcomes of these cases will continue to shape the future of AI and copyright law.
Let's conclude by distilling the most important lessons from this complex legal landscape.
6. Conclusion: Three Key Takeaways for Students
As courts continue to grapple with these issues, here are three essential takeaways to help you understand the core of the debate over AI and fair use.
- Fair Use is a Balancing Act: Fair use is not a simple, predictable rule. It is a case-by-case analysis where courts must weigh the four factors. The outcome depends entirely on the specific facts of how a copyrighted work was used.
- Purpose and Market Impact are Crucial: In AI cases so far, courts have focused heavily on Factor 1 (the purpose and character of the use) and Factor 4 (the effect on the market). In the Ross case, these two factors outweighed the other two and were decisive.
- "Transformative" vs. "Substitute" is the Key Divide: The central legal battle is over a single question: Is training an AI a "transformative" act that creates a new product with a new purpose (which may be fair use)? Or does it simply create a "substitute" that harms the original creator's market (which is likely not fair use)?
For future lawyers, technologists, and creators, understanding this evolving battleground is not just an academic exercise; it is essential for shaping a future where innovation and creativity can both thrive.