by Lee Gesmer | Sep 1, 2025 | General
Update: the same day I posted this article Bartz and Anthropic announced that they had settled the case. The terms are as yet unknown, and the settlement will need to be approved by the judge. However, this topic is not moot – it could easily arise in one of the many genAI copyright cases still pending.
The eyes of the artificial intelligence community are laser-focused on the upcoming class action damages trial in Bartz v. Anthropic, scheduled for December 1, 2025. This will be the first GenAI copyright case to go to trial, and commentators have observed that the damages could exceed $1 billion. With dozens of similar cases pending, this trial could foreshadow the outcome of many of them.
In the meantime, as the parties engage in the pre-trial struggle for advantage, a key issue has arisen: did Anthropic impliedly waive attorney-client privilege?
Background
On June 23, 2025 Judge William Alsup ruled that Anthropic’s use of copyrighted works to train its large language models (LLMs) is fair use. However, he also ruled that its downloading of protected works from so-called “shadow libraries” was not – those downloads were copyright infringement. Three weeks later he issued an order certifying a class of plaintiffs and scheduling a jury trial on statutory damages to begin on December 1, 2025. The class includes all owners of copyright-registered books in LibGen or PiLiMi, a number that could be in the millions. How many of the downloaded books were in-copyright and registered before Anthropic’s infringement commenced is an open question.
However, there will not be millions of trials – there will be one trial, and the jury’s decision on statutory damages will apply to all members of the class. If the jury finds that Anthropic’s infringement was “willful,” it could award damages as high as $150,000 per work infringed. If the jury finds the infringement was “innocent,” damages could be as low as $200 per work. This measure of damages would apply to every member of the class – that is, to the owners of every book that was downloaded and that qualifies for class membership. If there is a dispute over a particular work (such as ownership, registration or whether the copyright has expired), it will be resolved by a Special Master appointed by the judge.
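To see how these per-work figures compound across a class, here is a minimal back-of-the-envelope sketch in Python. It is purely illustrative: the class sizes are hypothetical placeholders, not numbers from the case, and it assumes a single uniform per-work award, consistent with the one-trial, class-wide structure described above. (The $750 figure is the statutory floor for ordinary, non-innocent infringement, discussed further below.)

```python
# Illustrative only: aggregate statutory-damages exposure per 17 U.S.C. § 504(c).
# Class sizes are hypothetical; how many works will qualify is an open question.

PER_WORK = {
    "innocent": 200,      # statutory floor if the jury finds innocent infringement
    "ordinary": 750,      # floor for non-innocent ("ordinary") infringement
    "willful": 150_000,   # statutory ceiling if the jury finds willful infringement
}

def aggregate_exposure(num_works: int) -> dict:
    """Class-wide exposure for each possible finding, at a uniform per-work award."""
    return {finding: amount * num_works for finding, amount in PER_WORK.items()}

for works in (100_000, 500_000):  # hypothetical class sizes
    totals = ", ".join(f"{k}: ${v:,}" for k, v in aggregate_exposure(works).items())
    print(f"{works:,} works -> {totals}")
```

Even at the $200 innocent floor, a class of 500,000 works implies $100 million; at the willful ceiling, the same class implies $75 billion. That arithmetic explains the billion-dollar-plus estimates.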
How the Attorney-Client Privilege Issue Arose
Like many copyright defendants, Anthropic pleaded “innocent infringement” as an affirmative defense early in the case, reserving the right to argue that any infringement was in good faith and therefore deserving of minimal damages. On July 24, 2025, Judge Alsup focused on this defense and issued an order requiring Anthropic to “show cause why its affirmative defense of innocent infringement should not be stricken unless it produces all evidence of advice of counsel.”
This order kicked off an as-yet unresolved battle over Anthropic’s right to argue innocent infringement while preserving attorney-client privilege.
Why It Matters
As noted above, the privilege fight goes directly to the amount of money the jury could award. In copyright cases, juries can award as little as $200 per work for “innocent” infringement or as much as $150,000 per work for “willful” infringement. What tips the scale is the infringer’s state of mind. If Anthropic’s lawyers warned that downloading from shadow libraries was unlawful and the company went ahead anyway, that looks like willful infringement and pushes damages toward the high end. If, on the other hand, counsel advised that the practice was likely fair use, Anthropic can argue it acted on legal advice and press for damages at the bottom of the range. The plaintiffs want to pierce privilege because they suspect the hidden legal advice undermines Anthropic’s innocence claim; Anthropic is resisting because disclosure could hand plaintiffs exactly what they need to prove willfulness.
The Arguments on Each Side
Anthropic responded to the judge’s show cause order by stating that it has not invoked an “advice of counsel” defense and has no intention of doing so. It relies on Ninth Circuit precedent for the proposition that implied waiver occurs “only when the client tenders an issue touching directly upon the substance of an attorney-client communication.” In Anthropic’s telling, its witnesses will testify based on their industry experience and objective evidence – not on what the lawyers told them. Privilege, the company argues, doesn’t vanish just because lawyers were consulted along the way. Anthropic points to the testimony of its co-founder, Benjamin Mann, who has testified that, based on his prior experience at OpenAI, he believed that downloading from a shadow library (specifically, LibGen) for LLM training was fair use.
The book-author class plaintiffs responded by posing the obvious question: if Anthropic intends to assert that its infringement was “innocent,” shouldn’t the jury hear what Anthropic’s lawyers told it? The plaintiffs argue that Anthropic’s assertion of “innocent infringement” and its denial of willfulness put its lawyers’ advice squarely in issue.
Plaintiffs note that Anthropic’s witnesses, when asked about the legality of downloading books from shadow libraries, repeatedly invoked privilege. That, plaintiffs say, shows that counsel’s advice played “a significant role in formulating [their] subjective beliefs.” Having chosen to defend its conduct as “innocent,” plaintiffs argue, Anthropic cannot now shield the very communications that shaped its beliefs.
Who Has The Better Argument?
Based on the arguments of both parties and the cases cited, I give the edge to Anthropic. Anthropic says it won’t rely on advice of counsel and will ground “innocent infringement” in industry practice/experience, not lawyer communications. That fits the dominant Ninth Circuit approach that implied waiver requires affirmative reliance on privileged advice, not mere relevance of state of mind. Although Judge Alsup raised the issue, I think it’s likely that he will back off and rule in favor of Anthropic on the implied waiver issue.
However, Anthropic will need to exercise extreme care at trial – the advantage can flip fast if a witness “opens the door” to a privileged communication (for example, testifies that “legal cleared it”) or if plaintiffs develop a clear link between subjective belief and counsel’s advice. If that happens, expect Judge Alsup to either compel production, preclude the “innocent” narrative, or strike the defense altogether on fairness grounds.
What To Watch Next
Judge Alsup now faces the question of whether Anthropic can walk the tightrope – denying willfulness and pressing an innocent infringement defense while keeping its lawyers’ advice behind the curtain of privilege. A hearing on this issue is scheduled for August 28, 2025. If the court rules against Anthropic, it may be forced to choose between disclosing lawyer communications or dropping the innocence defense, in which case the judge is likely to instruct the jury that the floor for damages is $750 per work (the floor for non-innocent or “ordinary” infringement), rather than $200. And, of course, without an innocence defense damages could climb much higher – as high as $150,000 per work infringed.
In the meantime, this case could come to a sudden halt: Anthropic has filed an emergency motion with the Ninth Circuit, asking it to stay the case pending its appeal of Judge Alsup’s class certification order.
Stay tuned.
by Lee Gesmer | Aug 5, 2025 | Copyright
The United States is in a race to achieve global dominance in artificial intelligence (AI). Whoever has the largest AI ecosystem will set global AI standards and reap broad economic and military benefits. Just like we won the space race, it is imperative that the United States and its allies win this race. . . . Build, Baby, Build. – America’s AI Action Plan, July 2025
The explosion of generative AI has triggered a legal crisis that few saw coming and nobody seems equipped to fix. At the center is a deceptively simple question: Can companies train AI models on copyrighted material without permission?
The stakes are enormous. This isn’t just about Big Tech and copyright holders duking it out. It’s about the future of the creative economy, the limits of fair use, and whether the U.S. legal system can adapt fast enough to regulate technologies that don’t wait for case law to catch up.
So far, there’s no clear direction. The White House is all-in on open access. Some in Congress want to prosecute. And the federal courts? Let’s just say consensus is not the word that comes to mind.
The White House: “You Just Can’t Pay for Everything”
On July 23, 2025, President Trump released his administration’s AI Action Plan, a blueprint for building a domestic AI ecosystem strong enough to beat China in the AI race. And in classic Trump fashion, he skipped the legalese and went straight to the point:
“You can’t be expected to have a successful AI program when every single article, book, or anything else that you’ve read or studied, you’re supposed to pay for. … When a person reads a book or an article, you’ve gained great knowledge. That does not mean that you’re violating copyright laws or have to make deals with every content provider. … You cannot expect to every single time say, ‘Oh, let’s pay this one that much. Let’s pay this one.’ Just doesn’t work that way.”
Trump didn’t say “fair use,” but he didn’t need to. The message was clear: licensing everything is a non-starter, and AI should be free to learn from copyrighted works without paying for the privilege. Whether that’s a legal position or just a policy stance is beside the point. Either way, the administration is betting that a permissive approach is the price of staying ahead in the global AI race.
Congress: “The Largest IP Theft in American History”
On Capitol Hill, the mood is very different. Just one week earlier, on July 16, the Senate Judiciary Committee’s Subcommittee on Crime and Counterterrorism held a hearing titled Too Big to Prosecute? Examining the AI Industry’s Mass Ingestion of Copyrighted Works for AI Training.
Senator Josh Hawley opened the hearing with a blunt accusation:
“Today’s hearing is about the largest intellectual property theft in American history … AI companies are training their models on stolen material. Period. We’re talking about piracy. We’re talking about theft. This is not just aggressive business tactics. This is criminal conduct.”
He went on to ridicule national security arguments as thin cover for profiteering:
“Every time they say things like ‘We can’t let China beat us.’ Let me just translate that for you. What they’re really saying is ‘Give us truckloads of cash and let us steal everything from you and make billions of dollars on it.’”
While Congress remains divided on the issue, Hawley’s remarks reflect just how far the rhetoric has escalated – and how close it is coming to framing AI training not as a policy problem, but as a prosecutable offense. Hawley has teamed up with Senator Blumenthal on a bipartisan bill that would prohibit AI companies from training on copyrighted works without permission.
The Courts: No Consensus, No Clarity
Meanwhile, the federal courts have offered no consistent theory of their own. As I’ve noted before, two judges in the Northern District of California – William Alsup and Vince Chhabria – have reached diametrically opposed conclusions on both AI training and the legality of acquiring training data from pirate sites.
In Bartz v. Anthropic, Judge Alsup ruled that using copyrighted works to train LLMs is “highly transformative” and protected by fair use. At the same time, he found that downloading copyrighted books from pirate sites for this purpose is plainly unlawful. Consequently, Anthropic now faces potentially massive class action liability.
In Kadrey v. Meta, Judge Chhabria more or less flipped that script. He expressed serious doubt that AI training is fair use – but still ruled in Meta’s favor. Why? Because the plaintiffs (a group of well-known authors) failed to show that Meta’s use harmed the market for their books. And on the question of downloading books from pirate sites? Chhabria said it might be fair use after all – at least in this context.
So not only do two judges disagree on whether training is lawful – they also can’t agree on whether using pirated copies is legal.
Conclusion: Law Unmade
The legal framework surrounding AI training is collapsing under the weight of its contradictions. The executive branch supports open access. Congress calls it theft. The courts can’t agree on what the law is, let alone how it should evolve. And this isn’t theoretical: almost 50 copyright cases involving generative AI are now pending in U.S. courts, and the legality or illegality of using copyrighted content to train models could have a profound effect on the entire industry.
In this vacuum, AI companies continue to train their models. Copyright owners continue to sue. Policymakers continue to hedge. And the public is left wondering whether copyright law is still capable of responding to technological change.
This uncertainty is more than academic. It chills investment, clouds business models, and puts AI companies in the impossible position of defending their work without clear legal boundaries. Without guidance from Congress or a unifying appellate decision, generative AI will remain stuck in legal limbo – and in that limbo, the line between innovation and infringement will only grow harder to draw.
by Lee Gesmer | Jul 8, 2025 | Copyright, DMCA/CDA
The recent blockbuster decisions in Bartz v. Anthropic and Kadrey v. Meta have raised a number of important and controversial issues. On the facts, both cases held that using copyright-protected works to train large language models was fair use.
Still, AI industry executives shouldn’t be too quick to celebrate. Bartz held that Anthropic is liable for creating a library of millions of works downloaded illegally from “shadow libraries,” and it could be facing hundreds of millions of dollars in class-action damages. And, as I discuss here, Kadrey advanced a new theory of copyright fair use that, if adopted by other courts, could significantly impede generative AI innovation.
Both cases were decided on summary judgment by judges in the Northern District of California. Bartz was decided by Judge William Alsup; Kadrey was decided by Judge Vince Chhabria. However, the two judges took dramatically different views of copyright fair use.
Judge Chhabria set the stage for his position as follows:
Companies are presently racing to develop generative artificial intelligence models—software products that are capable of generating text, images, videos, or sound based on materials they’ve previously been “trained” on. Because the performance of a generative AI model depends on the amount and quality of data it absorbs as part of its training, companies have been unable to resist the temptation to feed copyright-protected materials into their models—without getting permission from the copyright holders or paying them for the right to use their works for this purpose. This case presents the question whether such conduct is illegal.
Although the devil is in the details, in most cases the answer will likely be yes.
Did a federal judge really just say that in most cases using copyrighted works to train AI models without permission is illegal? Indeed he did.
Let’s unpack.
Market Dilution – A New Fair Use Doctrine?
Judge Chhabria’s rationale is that generative-AI systems “have the potential to flood the market with endless amounts of images, songs, articles, books, and more,” produced “using a tiny fraction of the time and creativity” human authors must invest. From that premise he derived a new variant of factor-four fair use analysis – “market dilution,” the idea that training an LLM on copyrighted books can harm authors even when the model never regurgitates their prose. It does so, he says, by empowering third parties to saturate the market with close-enough substitutes.
Copyright law evaluates fair use by weighing the four factors identified in the copyright statute. Factors one and four are often the most important. Factor one asks whether the use is “transformative.” Judge Chhabria had no difficulty concluding (as did Judge Alsup in Bartz) that the purpose of Meta’s copying – to train its LLMs – was “highly transformative.”
Factor four looks at “the effect of the use upon the potential market for or value of the copyrighted work,” and Judge Chhabria’s analysis focused on this factor.
Judge Chhabria reasoned that because an LLM can “generate literally millions of secondary works, with a minuscule fraction of the time and creativity used to create the original works it was trained on,” no earlier technology poses a comparable threat; therefore “the concept of market dilution becomes highly relevant.” Judge Chhabria stressed that the harm he fears is not piracy but indirect substitution: readers who pick an AI-generated thriller or gardening guide instead of a mid-list human title, thereby depressing sales and, with them, the incentive to create.
Judge Chhabria recognized that the impact on works other than text (text being all that was at issue in the case before him) could be even greater: “this effect also seems likely to be more pronounced with respect to certain types of works. For instance, an AI model that can generate high-quality images at will might be expected to greatly affect the market for such images, diminishing the incentive for humans to create them.” Although Judge Chhabria didn’t mention it, the music market is already feeling the effects of AI-generated songs.
A Solitary Theory – So Far
Judge Chhabria acknowledged that “no previous case has involved a use that is both as transformative and as capable of diluting the market for the original works as LLM training is.” Courts have often considered lost sales from non-literal substitutes, but always tethered to copying and similarity. By contrast, “dilution” here is the main event: infringement-adjacent competition, scaled up by algorithms, becomes dispositive even where every output may be dissimilar and lawful. That outlook has no counterpart in the copyright statute or prior case law.
Why the Plaintiffs Still Lost
However, a novel legal theory does not excuse the absence of proof. The thirteen authors in Kadrey “never so much as mentioned [dilution] in their complaint,” offered no expert analysis of Llama-driven sales erosion, and relied chiefly on press reports of AI novels “flooding Amazon.” Meta, meanwhile, produced data showing its model’s launch left the plaintiffs’ sales untouched. Speculation, Judge Chhabria concluded, “is insufficient to raise a genuine issue of fact and defeat summary judgment.” The court elevated market dilution to center stage and then ruled against the plaintiffs for failing to prove it.
The Evidentiary Gauntlet Ahead
Judge Chhabria’s opinion outlines what future litigants will have to supply to prove dilution. They must demonstrate that the defendant’s specific model can and will produce full-length works in the same genre; that those works reach the market at scale; that readers choose them instead of the plaintiff’s title; that the competitive edge flows from exposure to the plaintiff’s expression rather than public-domain material; and that the effect is measurable through sales data, price trends or other empirical evidence. Each link is contestable, and the chain grows longer as AI models add safety rails or licensing pools. The judge’s warning that “market dilution will often cause plaintiffs to decisively win the fourth factor—and thus win the fair use question overall” may prove true, but the proof he demands is nothing short of monumental.
Policy Doubts
Judge Chhabria’s dilution theory invites several critiques. First, it risks administrative chaos: judges will referee dueling experts over how similar, how numerous and over what time period AI outputs must be considered before they count as substitutes. Second, it blurs the line between legitimate innovation and liability; many technologies have lowered creative barriers without triggering copyright damages simply for “making art easier.” Third, it revives the circularity the Supreme Court warned against in Google v. Oracle: the rightsholder defines a market (“licensing my book for AI training”) and then claims harm because no fee was paid, a logic the judge himself rejects elsewhere in the opinion. Such broad-brush dangers may be better handled, if at all, by statutory solutions – collective licensing or compulsory schemes – than by case-by-case fair-use adjudication.
A Split Already Emerging
Two days before Kadrey, Judge William Alsup faced similar facts in Bartz v. Anthropic and dismissed dilution as nothing more than teaching “schoolchildren to write well,” an analogy he said posed “no competitive or creative displacement that concerns the Copyright Act.” Judge Chhabria rebuts that comparison as “inapt,” pointing to an LLM’s capacity to let one user mass-produce commercial text. This internal split in the Northern District is an early signal that the Ninth Circuit, the Supreme Court, and perhaps even Congress, will need to clarify the law.
Practical Takeaways
For authors contemplating suit based on a dilution theory, Kadrey offers both hope and the challenge of proof. To meet that challenge, plaintiffs must plead dilution explicitly. Retain economists early. Collect Amazon ranking histories, royalty statements, and genre-level sales curves. Show, with numbers, how AI thrillers or gardening guides cannibalize their human counterparts. AI defendants, in turn, should preserve training-data logs, document output filters, and press for causation: proof that their model, not the zeitgeist, dented the plaintiff’s revenue. Until one side clears the evidentiary bar, most LLM cases will continue to rise or fall on traditional substitution and lost-license theories.
But whether “market dilution” becomes a real threat to AI companies or stands alone as a curiosity depends on whether other courts embrace it. With over 40 copyright cases against generative AI developers now winding through the courts, we shouldn’t have to wait long to see if Judge Chhabria’s dilution theory was the first step toward a new copyright doctrine or a one-off detour.
The Bottom Line
Despite Judge Chhabria’s warning that most unauthorized genAI training will be illegal, Kadrey v. Meta is not the death knell for AI training; it is a judicial thought experiment that became dicta for want of evidence. “Market dilution” may yet find a court that will apply it and a plaintiff who can prove it. Until then, it remains intriguing, provocative, and very much alone. Should an appellate court embrace the theory, the balance of power between AI developers and authors could tilt markedly. Should it reject the theory, Kadrey will stand as a cautionary tale about stretching fair use rhetoric beyond the record before the court.
by Lee Gesmer | Jun 23, 2025 | Contracts, Copyright, DMCA/CDA
At the heart of large language model (LLM) technology lies a deceptively simple triad: compute, algorithms, and data.
Compute powers training – vast arrays of graphics processing units crunching numbers at a scale measured in billions of parameters and trillions of tokens. Algorithms shape the intelligence – breakthroughs like the Transformer architecture enable models to understand, predict, and generate human-like text. Data is the raw material, the fuel that teaches the models everything they know about the world.
While compute and algorithms continue to advance at breakneck speed, data has emerged as a critical bottleneck. Training a powerful LLM requires unimaginably large volumes of text, and the web’s low-hanging fruit – books, Wikipedia, forums, blogs – has largely been harvested.
The large AI companies have gone to extraordinary lengths to collect new data. For example, Meta allegedly used BitTorrent to download massive amounts of copyrighted content. OpenAI reportedly transcribed over a million hours of YouTube videos to train its models. Despite this, the industry may be approaching a data ceiling.
In this context, Reddit represents an unharvested goldmine – a sprawling archive containing terabytes of linguistically rich user-generated content.
And, according to Reddit, Anthropic, desperate for training data, couldn’t resist the temptation: it illegally scraped Reddit without a license and trained Claude on Reddit data.
The Strategic Calculus: Why Contract, Not Copyright
Reddit filed suit against Anthropic in San Francisco Superior Court on June 4, 2025. However, it chose not to follow the well-trodden path of federal copyright litigation in favor of a novel contract-centric strategy. This tactical pivot from The New York Times Co. v. OpenAI and similar copyright-based challenges signals a fundamental shift in how platforms may assert control over their data assets.
Reddit’s decision to anchor its complaint in state contract law reflects the limitations of copyright doctrine and the structural realities of user-generated content platforms. Unlike traditional media companies that own their content outright, Reddit operates under a licensing model where individual users retain copyright ownership while granting the platform non-exclusive rights under Section 5 of its User Agreement.
This ownership structure creates obstacles for copyright enforcement at scale. Reddit would face complex Article III standing challenges, since it lacks the requisite ownership interest to sue for direct infringement. Moreover, the copyright registration requirements (a precondition to filing suit) would prove prohibitively expensive and logistically impossible for millions of user posts. And, even if Reddit could establish standing, it would face the copyright fair use defenses that have been raised in the more than 40 pending AI copyright cases.
By pivoting to contract law, Reddit sidesteps these constraints. Contract claims require neither content ownership nor copyright registration – only proof of agreement formation and breach.
The Technical Architecture of Alleged Infringement
Reddit’s complaint alleges that Anthropic’s bot made over 100,000 automated requests after July 2024, when Anthropic publicly announced it had ceased crawling Reddit. This timeline is significant because it suggests deliberate circumvention of robots.txt exclusion protocols and platform-specific blocking mechanisms.
According to Reddit, the technical details reveal sophisticated evasion tactics. Unlike legitimate web crawlers that respect robots.txt directives, the alleged “ClaudeBot” activity employed distributed request patterns designed to avoid detection. Reddit’s complaint specifically references CDN bandwidth costs and engineering resources consumed by this traffic – technical details that will be crucial for establishing the “impairment” element required under California’s trespass to chattels doctrine.
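For context, robots.txt compliance is voluntary rather than technically enforced: the file simply announces which paths crawlers may fetch, and a well-behaved bot affirmatively checks it before each request. Here is a minimal sketch using Python’s standard library; the user-agent string is a hypothetical placeholder, not Anthropic’s actual crawler identifier:

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleBot"  # hypothetical placeholder
TARGET = "https://www.reddit.com/r/books/"

parser = RobotFileParser()
parser.set_url("https://www.reddit.com/robots.txt")
parser.read()  # fetch and parse the site's crawl directives

if parser.can_fetch(USER_AGENT, TARGET):
    print("robots.txt permits this fetch; a compliant crawler may proceed.")
else:
    # A well-behaved crawler stops here. Ignoring this result, or disguising
    # requests to evade blocking, is the conduct Reddit's complaint alleges.
    print("robots.txt disallows this fetch; a compliant crawler skips it.")
```

Because nothing enforces that check technically, the legal force of Reddit’s claim rests on the allegation that Anthropic kept crawling after publicly announcing it had stopped.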
The complaint’s emphasis on Anthropic’s “whitelist” of high-quality subreddits demonstrates technical sophistication in data curation. This selective approach undermines any defense that the scraping was merely incidental web browsing, instead revealing a targeted data extraction operation designed to maximize training value while minimizing detection risk.
Here’s a quick look at the three strongest counts in Reddit’s complaint: breach of contract, trespass to chattels and unjust enrichment.
Browse-Wrap Formation: The Achilles’ Heel
The most vulnerable aspect of Reddit’s contract theory lies in contract formation under browse-wrap principles. Reddit argues that each automated request constitutes acceptance of its User Agreement terms, which prohibit commercial exploitation under Section 3 and automated data collection under Section 7.
However, under Ninth Circuit precedent applying California law, browse-wrap contracts require reasonably conspicuous notice, and Reddit’s User Agreement link appears in small text at the page footer without prominent placement or mandatory acknowledgment – what some courts have termed “inquiry notice” rather than “actual notice.”
And, unlike human users who might scroll past terms of service, automated bots often access content endpoints directly, without rendering full page layouts. This raises fundamental questions about whether algorithmic agents can form contractual intent under traditional offer-and-acceptance doctrine.
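To make that concrete, here is a simplified sketch of how a scraper might pull Reddit content as structured JSON, using Reddit’s long-standing convention of appending “.json” to a listing URL. The bot identifier is a hypothetical placeholder, and in practice Reddit rate-limits or blocks unauthenticated traffic of this kind:

```python
import json
import urllib.request

# A bot fetching a structured JSON endpoint receives raw post data only:
# no page footer, no User Agreement link, nothing to "scroll past."
req = urllib.request.Request(
    "https://www.reddit.com/r/books/top.json?limit=5",
    headers={"User-Agent": "ExampleBot/0.1"},  # hypothetical identifier
)

with urllib.request.urlopen(req) as resp:
    listing = json.load(resp)

for post in listing["data"]["children"]:
    print(post["data"]["title"])
```

Nothing in that exchange ever displays the User Agreement link that sits in the footer of the rendered page.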
California courts have been increasingly skeptical of browse-wrap enforcement, and have required more than mere website access to establish assent. Reddit’s theory will need to survive a motion to dismiss where Anthropic will likely argue that no reasonable bot operator would have constructive notice of buried terms.
The Trespass to Chattels Gambit
“Trespass to chattels” is the intentional, unauthorized interference with another’s tangible personal property (contrasted with real property) that impairs its condition, value, or use. Reddit asserts that Anthropic “trespassed” by scraping data from Reddit’s servers without permission.
Reddit’s trespass claim faces a high bar. California courts require proof of actual system impairment rather than mere unauthorized access. Reddit tries to meet this standard by citing CDN overage charges, server strain, and engineering time spent mitigating bot traffic. These bandwidth costs and engineering expenses, while real, may not rise to the level of system impairment that the courts demand.
The technical evidence will be crucial here. Reddit must demonstrate quantifiable performance degradation – slower response times, server crashes, or capacity limitations – rather than merely increased operational costs. This evidentiary burden may prove difficult given modern cloud infrastructure’s elastic scaling capabilities.
Unjust Enrichment and the Licensing Market
Reddit’s unjust enrichment claim rests on its data’s demonstrable market value, evidenced by licensing agreements with OpenAI, Google, and other AI companies. These deals, reportedly worth tens of millions annually, establish a market price for Reddit’s content.
The legal theory here is straightforward: Anthropic received the same valuable data as paying licensees but avoided the associated costs, creating an unfair competitive advantage. Under California law, unjust enrichment requires showing that the defendant received a benefit that would be unjust to retain without compensation.
Reddit’s technically sophisticated Compliance API bolsters this claim. Licensed partners receive real-time deletion signals, content moderation flags, and structured data feeds that ensure training datasets remain current and compliant with user privacy preferences. Anthropic’s alleged automated data extraction bypassed these technical safeguards, potentially training on content that users had subsequently deleted or restricted.
Broader Implications for AI Governance
If Reddit’s contract theory succeeds, it would establish a powerful precedent allowing platforms to impose licensing requirements through terms of service. Every website with clear usage restrictions could potentially demand compensation from AI companies, fundamentally altering the economics of model training.
Conversely, if browse-wrap formation fails or federal preemption invalidates state law claims, AI developers would gain confidence that user-generated web content remains accessible, subject only to copyright limitations.
The Constitutional AI Paradox
Most damaging to Anthropic may be the reputational challenge to its “Constitutional AI” branding. The company has positioned itself as the ethical alternative in AI development, emphasizing safety and responsible practices. Reddit’s allegations create a narrative tension that extends beyond legal liability to market positioning.
This reputational dimension may drive settlement negotiations regardless of the legal merits, as Anthropic seeks to preserve its differentiated market position among enterprise customers increasingly focused on AI governance and compliance.
Conclusion
While Reddit’s legal claims face significant doctrinal challenges, the case underscores the importance of understanding both the technical architecture of web scraping and the evolving legal frameworks governing AI development. The outcome may determine whether platforms can use contractual mechanisms to assert control over their data assets, or whether AI companies can continue treating public web content as freely available training material subject only to copyright challenge.
by Lee Gesmer | May 27, 2025 | General
[Update: Perlmutter’s challenge to her termination was rejected by a district court judge on July 30, 2025. However, on September 10, 2025 the DC Circuit reversed that ruling and issued an injunction ordering that Perlmutter be reinstated as Register of Copyrights:
In sum, all of the preliminary-injunction factors weigh in favor of granting an injunction pending appeal. Perlmutter has shown a likelihood of success on the merits of her claim that the President’s attempt to remove her from her post was unlawful because she may be discharged only by a Senate-confirmed Librarian of Congress. She also has made the requisite showing of irreparable harm based on the President’s alleged violation of the separation of powers, which deprives the Legislative Branch and Perlmutter of the opportunity for Perlmutter to provide valuable advice to Congress during a critical time. And Perlmutter has shown that the balance of equities and the public interest weigh in her favor because she primarily serves Congress and likely does not wield substantial executive power, which greatly diminish the President’s interest in her removal. For the foregoing reasons, we grant Perlmutter’s requested injunction pending appeal.]
On May 9, 2025, the U.S. Copyright Office released what should have been the most significant copyright policy document of the year: Copyright and Artificial Intelligence – Part 3: Generative AI Training. This exhaustively researched report, the culmination of an August 2023 notice of inquiry that drew over 10,000 public comments, represents the Office’s most comprehensive analysis of how large-scale AI model training intersects with copyright law.
Crucially, the report takes a skeptical view of broad fair use claims for AI training, concluding that such use cannot be presumptively fair and must be evaluated case-by-case under traditional four-factor analysis. This position challenges the AI industry’s preferred narrative that training on copyrighted works is categorically protected speech, potentially exposing companies to significant liability for their current practices.
Instead of dominating the week’s intellectual property news, the report was immediately overshadowed by an unprecedented political upheaval that raises fundamental questions about agency independence and the rule of law.
Three Days in May
The sequence of events reads like a political thriller. On May 8—one day before the report’s release—Librarian of Congress Carla Hayden was summarily dismissed. Hayden held the sole statutory authority to hire and fire the Register of Copyrights. The report was issued on May 9, and the next day, May 10, Register of Copyrights Shira Perlmutter was fired by President Trump. Deputy Attorney General Todd Blanche was appointed acting Librarian of Congress on May 12.
On May 22 Perlmutter filed suit against Blanche and Trump, alleging that her removal violated both statutory procedure and the Appointments Clause, and seeking judicial restoration to office.
The timing invites an obvious inference: the report’s conclusions displeased either the administration, powerful AI industry advocates, or both. As has been attributed to FDR, “In politics, nothing happens by accident. If it happens, you can bet it was planned that way.”
An Unprecedented “Pre-Publication” Label
The report itself carries a peculiar distinction. In the Copyright Office’s 128-year history of issuing over a hundred reports and studies, this marks the first time any policy document has been labeled a “pre-publication version.” A footnote explains that the Office released this draft “in response to congressional inquiries and expressions of interest from stakeholders,” promising a final version “without any substantive changes expected in the analysis or conclusions.”
Whether Perlmutter rushed the draft online sensing her impending dismissal, or whether external pressure demanded early release, remains unexplained. What is clear is that this unusual designation complicates the document’s legal authority.
The Copyright Office as Policy Driver
Understanding the significance of these events requires recognizing that the Copyright Office is far more than a registration and record-keeping body. Congress expressly directs it to “conduct studies” and “advise Congress on national and international issues relating to copyright.” The Office serves as both a de facto think tank and policy driver in copyright law, with statutory responsibilities that make reports like this central to its mission.
The generative AI report exemplifies this role. It distills an enormous public record into a meticulous analysis of how AI training intersects with exclusive rights, market harm, and fair-use doctrine. The document is detailed, thoughtful, and comprehensive—representing the kind of scholarship you would expect from the nation’s premier copyright authority. Anyone seeking a balanced primer on generative AI and copyright should start here.
Legal Weight in Limbo
While Copyright Office reports carry no binding legal precedent, federal courts often give them significant persuasive weight under Skidmore deference (Skidmore v. Swift & Co., 323 U.S. 134 (1944)), recognizing the Office’s particular expertise in copyright matters. The “pre-publication” designation, however, creates unprecedented complications.
Litigants will inevitably argue that a self-described draft lacks the settled authority of a finished report. If a final version emerges unchanged, that objection may evaporate. But if political pressure forces revisions or withdrawal, the May 9 report could become little more than a historical curiosity—a footnote documenting what the Copyright Office concluded before political intervention redirected its course.
The Chilling Effect
Even if the dismissals of the Librarian of Congress and the Register of Copyrights, falling on either side of the report’s release, were coincidental – a proposition that strains credulity – the optics alone threaten the Copyright Office’s independence. Future Registers will understand that publishing conclusions objectionable to influential constituencies, whether in the West Wing or Silicon Valley, can cost them their jobs.
This perception could chill precisely the kind of candid, expert analysis that Congress mandated the Office to provide. The sequence of events may discourage future agency forthrightness or embolden those seeking to influence administrative findings in politically fraught areas like AI regulation.
What Comes Next
For now, the “pre-publication” draft remains the Copyright Office’s most authoritative statement on generative AI training. Whether it survives intact, is quietly rewritten, or is abandoned altogether will reveal much about the agency’s future independence and the balance of power between copyright doctrine and the emerging AI economy.
The document currently exists in an unusual limbo—authoritative yet provisional. Its ultimate fate may signal whether copyright policy will be shaped by legal expertise and public input, or by political pressure and industry influence.
Until that question is resolved, the report stands as both a testament to rigorous policy analysis and a cautionary tale about the fragility of agency independence in an era of unprecedented technological and political change. The stakes extend far beyond copyright law—they touch on the fundamental question of whether expert agencies can fulfill their statutory duties free from political interference when their conclusions prove inconvenient to powerful interests.
The rule of law hangs in the balance, awaiting the next chapter in this extraordinary story.