by Lee Gesmer | May 27, 2025 | General
On May 9, 2025, the U.S. Copyright Office released what should have been the most significant copyright policy document of the year: Copyright and Artificial Intelligence – Part 3: Generative AI Training. This exhaustively researched report, the culmination of an August 2023 notice of inquiry that drew over 10,000 public comments, represents the Office’s most comprehensive analysis of how large-scale AI model training intersects with copyright law.
Crucially, the report takes a skeptical view of broad fair use claims for AI training, concluding that such use cannot be presumptively fair and must be evaluated case-by-case under traditional four-factor analysis. This position challenges the AI industry’s preferred narrative that training on copyrighted works is categorically protected speech, potentially exposing companies to significant liability for their current practices.
Instead of dominating the week’s intellectual property news, the report was immediately overshadowed by an unprecedented political upheaval that raises fundamental questions about agency independence and the rule of law.
Three Days in May
The sequence of events reads like a political thriller. On May 8—one day before the report’s release—Librarian of Congress Carla Hayden was summarily dismissed. Hayden held the sole statutory authority to hire and fire the Register of Copyrights. The report was issued on May 9, and the next day, May 10, Register of Copyrights Shira Perlmutter was fired by President Trump. Deputy Attorney General Todd Blanche was appointed acting Librarian of Congress on May 12.
On May 22, Perlmutter filed suit against Blanche and Trump, alleging that her removal violated both statutory procedure and the Appointments Clause, and seeking judicial restoration to office.
The timing invites an obvious inference: the report’s conclusions displeased either the administration, powerful AI industry advocates, or both. As has been attributed to FDR, “In politics, nothing happens by accident. If it happens, you can bet it was planned that way.”
An Unprecedented “Pre-Publication” Label
The report itself carries a peculiar distinction. In the Copyright Office’s 128-year history of issuing over a hundred reports and studies, this marks the first time
any policy document has been labeled a “pre-publication version.” A footnote explains that the Office released this draft “in response to congressional inquiries and expressions of interest from stakeholders,” promising a final version “without any substantive changes expected in the analysis or conclusions.”
Whether Perlmutter rushed the draft online sensing her impending dismissal, or whether external pressure demanded early release, remains unexplained. What is clear is that this unusual designation complicates the document’s legal authority.
The Copyright Office as Policy Driver
Understanding the significance of these events requires recognizing that the Copyright Office is far more than a registration and record-keeping body. Congress expressly directs it to “conduct studies” and “advise Congress on national and international issues relating to copyright.” The Office serves as both a de facto think tank and policy driver in copyright law, with statutory responsibilities that make reports like this central to its mission.
The generative AI report exemplifies this role. It distills an enormous public record into a meticulous analysis of how AI training intersects with exclusive rights,
market harm, and fair-use doctrine. The document is detailed, thoughtful, and comprehensive—representing the kind of scholarship you would expect from the nation’s premier copyright authority. Anyone seeking a balanced primer on generative AI and copyright should start here.
Legal Weight in Limbo
While Copyright Office reports carry no binding legal precedent, federal courts often give them significant persuasive weight under Skidmore deference (Skidmore v. Swift & Co., U.S. 1944), recognizing the Office’s particular expertise in copyright matters. The “pre-publication” designation, however, creates unprecedented complications.
Litigants will inevitably argue that a self-described draft lacks the settled authority of a finished report. If a final version emerges unchanged, that objection may evaporate. But if political pressure forces revisions or withdrawal, the May 9 report could become little more than a historical curiosity—a footnote documenting what the Copyright Office concluded before political intervention redirected its course.
The Chilling Effect
Even if it was mere coincidence that the Librarian of Congress and the Register of Copyrights were dismissed on either side of the report’s release—a proposition that strains credulity—the optics alone threaten the Copyright Office’s independence. Future Registers will understand that publishing conclusions objectionable to influential constituencies, whether in the West Wing or Silicon Valley, can cost them their jobs.
This perception could chill precisely the kind of candid, expert analysis that Congress mandated the Office to provide. The sequence of events may discourage future agency forthrightness or embolden those seeking to influence administrative findings in politically fraught areas like AI regulation.
What Comes Next
For now, the “pre-publication” draft remains the Copyright Office’s most authoritative statement on generative AI training. Whether it survives intact, is quietly rewritten, or is abandoned altogether will reveal much about the agency’s future independence and the balance of power between copyright doctrine and the emerging AI economy.
The document currently exists in an unusual limbo—authoritative yet provisional. Its ultimate fate may signal whether copyright policy will be shaped by legal expertise and public input, or by political pressure and industry influence.
Until that question is resolved, the report stands as both a testament to rigorous policy analysis and a cautionary tale about the fragility of agency independence in an era of unprecedented technological and political change. The stakes extend far beyond copyright law—they touch on the fundamental question of whether expert agencies can fulfill their statutory duties free from political interference when their conclusions prove inconvenient to powerful interests.
The rule of law hangs in the balance, awaiting the next chapter in this extraordinary story.
by Lee Gesmer | May 14, 2025 | Copyright, DMCA/CDA, General
The Copyright Office has been engaged in a multi-year study of how copyright law intersects with artificial intelligence. That process culminated in a series of three separate reports: Part 1 – Unauthorized Digital Replicas, Part 2 – Copyrightability, and now, the much-anticipated Part 3 – Generative AI Training.
Many in the copyright community anticipated that the arrival of Part 3 would be the most important and controversial. It addresses a central legal question in the flood of recent litigation against AI companies: whether using copyrighted works as training data for generative AI qualifies as fair use. While the views of the Copyright Office are not binding on courts, they often carry persuasive weight with federal judges and legislators.
The Report Arrives, Along With Political Controversy
The Copyright Office finally issued its report – Copyright and Artificial Intelligence – Part 3: Generative AI Training (link on Copyright Office website; back-up link here) on Friday, May 9, 2025. But this was no routine publication. The document came with an unusual designation: “Pre-Publication Version.”
Then came the shock. The following day, Shira Perlmutter, the Register of Copyrights and nominal author of the report, was fired. Perlmutter had served in the role since October 2020, appointed by Librarian of Congress Carla Hayden—herself fired two days earlier, on May 8, 2025.
These abrupt, unexplained dismissals have rocked the copyright community. The timing has fueled speculation of political interference linked to concerns from the tech sector. Much of the conjecture has centered on Elon Musk and xAI, his artificial intelligence company, which may face copyright claims over the training of its Grok large language model.
Adding to the mystery is the “pre-publication” label itself—something the Copyright Office has not used before. The appearance of this label, followed by Perlmutter’s termination, has prompted widespread belief that the report’s contents were viewed as too unfriendly to the AI industry’s legal position, and that her removal was a prelude to a potential retraction or revision.
What’s In That Report, Anyway?
Why might this report have rattled so many cages?
In short, it delivers a sharp rebuke to the AI industry’s prevailing fair use narrative. While the Office does not conclude that AI training is categorically infringing, its analytical framework casts deep doubt on the broad legality of using copyrighted works without permission to train generative models.
Here are key takeaways:
Transformative use? At the heart of the report is a skeptical view of whether using copyrighted works to train an AI model is “transformative” under Supreme Court precedent. The Office states that such use typically does not “comment on, criticize, or otherwise engage with” the copyrighted works in a way that transforms their meaning or message. Instead, it describes training as a “non-expressive” use that merely “extracts information about linguistic or aesthetic patterns” from copyrighted works—a use that courts may find insufficiently transformative.
Commercial use? The report flatly rejects the argument that AI training should be considered “non-commercial” simply because the outputs are new or the process is computational. Training large models is a commercial enterprise by for-profit companies seeking to monetize the results, and that, the Office emphasizes, weighs against fair use under the first factor.
Amount and substantiality? Many AI models are trained on entire books, images, or articles. The Office notes that this factor weighs against fair use when the entirety of a work is copied—even if only to extract patterns—particularly when that use is not clearly transformative.
Market harm? Here, the Office sounds the loudest alarm. It directly links unauthorized AI training to lost licensing opportunities, emerging collective licensing schemes, and potential market harm. The Office also notes that AI companies have begun entering into licensing deals with rightsholders—ironically undercutting their own arguments that licensing is impractical. As the Office puts it, the emergence of such markets suggests that fair use should not apply, because a functioning market for licenses is precisely what the fourth factor is meant to protect.
But Google Books? The report goes out of its way to distinguish training on entire works from cases like Authors Guild v. Google, where digitized snippets were used for a non-expressive, publicly beneficial purpose—search. AI training, by contrast, is described as for-profit, opaque, and producing outputs that may compete with the original works themselves.
Collectively, these conclusions paint a picture of AI training as a weak candidate for fair use protection. The report doesn’t resolve the issue, but it offers courts a comprehensive framework for rejecting broad fair use claims. And it sends a strong signal to Congress that licensing—statutory or voluntary—may be the appropriate policy response.
Conclusion
It didn’t take long for litigants to seize on the report. The plaintiffs in Kadrey v. Meta (which I recently wrote about here) filed a Statement of Supplemental Authority on May 12, 2025, the very next business day, citing Ninth Circuit authority that Copyright Office reports may carry persuasive weight in judicial decisions (but failing to note that the report in question here is “pre-publication”). The report was submitted to judges in other active AI copyright cases as well.
The coming weeks may determine whether this report is a high-water mark in the Copyright Office’s independence or the opening move in its politicization. The “pre-publication” status may lead to a walk-back under new leadership. If, on the other hand, the report is published as final without substantive change, it may become a touchstone in the pending cases and influence future legislation.
If it survives, the legal debate over generative AI may have moved into a new phase—one where assertions of fair use must confront a detailed, skeptical, and institutionally backed counterargument.
As for the firing of Shira Perlmutter and Carla Hayden? No official explanation has been offered. But when the nation’s top copyright official is fired within 24 hours of issuing what could prove to be the most consequential copyright report in a generation, the message—intentional or not—is that politics may be catching up to policy.
Copyright and Artificial Intelligence, Part 3: Generative AI Training (pre-publication)
by Lee Gesmer | Apr 28, 2025 | Copyright
“Move fast and break things.” Mark Zuckerberg’s famous motto seems especially apt when examining how Meta developed Llama, its flagship AI model.
Like OpenAI, Google, Anthropic, and others, Meta faces copyright lawsuits for using massive amounts of copyrighted material to train its large language models (LLMs). However, the claims against Meta go further. In Kadrey v. Meta, the plaintiffs allege that Meta didn’t just scrape data — it pirated it, using BitTorrent to pull hundreds of terabytes of copyrighted books from shadow libraries like LibGen and Z-Library.
The court’s eventual ruling on these claims could significantly weaken Meta’s fair use defense and reshape the legal framework for AI training-data acquisition.
Meta’s BitTorrent Activities
In Kadrey v. Meta, plaintiffs allege that discovery has revealed that Meta’s GenAI team pivoted from tentative licensing discussions with publishers to mass BitTorrent downloading after receiving internal approvals that allegedly escalated “all the way to MZ”—Mark Zuckerberg.
BitTorrent is a peer-to-peer file-sharing protocol that efficiently distributes large files by breaking them into small pieces and sharing them across a decentralized “swarm” of users. Once a user downloads a piece, they immediately begin uploading it to others—a process known as “seeding.” While BitTorrent powers many legitimate projects like open source software distribution, it’s also the lifeblood of piracy networks. Courts have long treated unauthorized BitTorrent traffic as textbook copyright infringement (e.g., Glacier Films v. Turchin, 9th Cir. 2018).
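To make the piece-and-seeding mechanics concrete, here is a minimal Python sketch of the hashing scheme BitTorrent’s metainfo format uses: a file is split into fixed-size pieces, each fingerprinted with SHA-1, so that peers can verify every piece received from strangers before re-sharing it with the swarm. This is an illustrative sketch, not a working client; the piece size shown is a common convention, and real torrents vary.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # a common piece length; actual torrents vary


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and return each piece's SHA-1 digest.

    A .torrent metainfo file concatenates these 20-byte digests in its
    "pieces" field, which is how downloaders know what to expect from peers.
    """
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Check a downloaded piece against its digest from the metainfo.

    Only pieces that pass this check are kept and re-uploaded ("seeded")
    to other members of the swarm.
    """
    return hashlib.sha1(piece).digest() == expected_digest
```

The verify-then-seed step is the legally salient detail: by design, a client that has verified a piece immediately becomes a source of that piece for everyone else, which is why plaintiffs frame Meta’s downloading as simultaneous redistribution.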
The plaintiffs allege that Meta engineers, worried that BitTorrent “doesn’t feel right for a Fortune 500 company,” nevertheless torrented 267 terabytes between April and June 2024—roughly twenty Libraries of Congress worth of data. This included the entire LibGen non-fiction archive, Z-Library’s cache, and massive swaths of the Internet Archive. According to the plaintiffs’ forensic analysis, Meta’s servers re-seeded the files back into the swarm, effectively redistributing mountains of pirated works.
The Legal Framework and Why BitTorrent Matters
Meta’s alleged use of BitTorrent complicates its copyright defense in several critical ways:
1. Reproduction vs. Distribution Liability
Most LLM training involves reproducing copyrighted works, which defendants typically argue is protected as fair use. But BitTorrent introduces unauthorized distribution under § 106(3) of the Copyright Act. Even if the court finds Llama’s training to be fair use, unauthorized seeding could constitute a separate violation that is harder to defend as transformative.
2. Willfulness and Statutory Damages
Internal communications allegedly showed engineers warning about the legal risks, describing the pirated sources as “dodgy,” and joking about torrenting from corporate laptops. Plaintiffs allege that Meta ran the jobs on Amazon Web Services rather than Facebook servers, in a deliberate effort to make the traffic harder to trace back to Menlo Park. If proven, these facts could support a finding of willful infringement, exposing Meta to enhanced statutory damages of up to $150,000 per infringed work.
3. “Unclean Hands” and Fair Use Implications
The method of acquisition may significantly impact fair use analysis. Plaintiffs point to Harper & Row v. Nation Enterprises (1985), where the Supreme Court found that bad faith acquisition—stealing Gerald Ford’s manuscript—undermined the defendant’s fair use defense. They argue that torrenting from pirate libraries is today’s equivalent of exploiting a purloined manuscript.
Meta’s Defense and Its Vulnerabilities
Meta argues that its use of the plaintiffs’ books is transformative: it extracts statistical patterns, not expressive content. It relies on Authors Guild v. Google (2d Cir. 2015) and emphasizes that fair use focuses on how a work is used, not how it was obtained. Meta claims that its engineers took steps to minimize seeding—however, the internal data logs that would prove this are missing.
The company also frames Llama’s outputs as new, non-infringing content—asserting that bad faith, even if proven, should not defeat fair use.
However, the plaintiffs counter that Llama differs from Google Books in key respects:
– Substitution risk: Llama is a commercial product capable of producing long passages that may mimic authors’ voices, not merely displaying snippets.
– Scale: The amount of copying—terabytes of entire book databases—dwarfs that upheld in Google Books.
– Market harm: Licensing markets for AI training datasets are emerging, and Meta’s decision to torrent pirated copies directly undermines that market.
Moreover, courts have routinely rejected defenses based on the idea that pirated material is “publicly available.” Downloading infringing content over BitTorrent has never been viewed kindly—even when defendants claimed to have good intentions.
Even if Meta persuades the court that its training of Llama is transformative, the torrenting evidence remains a serious threat because:
– The automatic seeding function of BitTorrent means Meta likely distributed copyrighted material, independent of any transformative use
– The apparent bad faith (jokes about piracy, euphemisms describing pirated archives as “public” datasets) and efforts to conceal traffic present a damaging narrative
– The deletion of torrent logs may support an adverse inference that distribution occurred
– Judge Vince Chhabria might prefer to decide the case on familiar grounds—traditional copyright infringement—rather than attempting to set sweeping precedent on AI fair use
Broader Implications
If the court rules that unlawful acquisition via BitTorrent taints subsequent transformative uses, the AI industry will face a paradigm shift. Companies will need to document clean sourcing for training datasets—or face massive statutory damages.
If Meta prevails, however, it may open the door for more aggressive data acquisition practices: anything “publicly available” online could become fair game for AI training, so long as the final product is sufficiently transformative.
Regardless of the outcome, the record in Kadrey v. Meta is already reshaping AI companies’ risk calculus. “Scrape now, pay later” is beginning to look less like a clever strategy and more like a legal time bomb.
Conclusion
BitTorrent itself isn’t on trial in Kadrey v. Meta, but its DNA lies at the center of the dispute. For decades, most fair use battles have focused on how a copyrighted work is exploited. This case asks a new threshold question: does how you got the work come first?
The answer could define how the next generation of AI is built.
by Lee Gesmer | Mar 22, 2025 | General
In 2019, Stephen Thaler filed an unusual copyright application. Instead of submitting traditional artwork, the piece—titled “A Recent Entrance to Paradise” (image at top)—identified a surprising “creator”: the “Creativity Machine,” an AI system invented by Thaler. In his application for registration, Thaler informed the Copyright Office that the work was “created autonomously by machine,” and he claimed the copyright based on his “ownership of the machine.”
After appealing the Copyright Office’s denial of registration to the District Court and losing, Thaler appealed to the U.S. Court of Appeals for the District of Columbia Circuit.
On March 18, 2025, the D.C. Circuit upheld the Copyright Office as well as the District Court, holding that copyright protection under the 1976 Act cannot be granted to a work generated solely by artificial intelligence.
Notably, this ruling does not exclude AI-assisted works from protection; it merely confirms that a human must exercise genuine creative control. The key question now is how much human input is necessary to qualify as the author—a point the court left open for future clarification.
Here are the key takeaways:
Human Authorship Is Mandatory. The court held that under the Copyright Act of 1976, an “author” must be a human being. Works generated solely by AI—where AI is listed as the sole creator—do not qualify; a machine, including an AI system, is not a legal author.
AI-Assisted Works May Still Be Protected. The court underscored that human creators remain free to use AI tools. Such works can be copyrighted, provided a person (not just AI) exercises creative control. This is consistent with the recently released Copyright Office report, Copyright and Artificial Intelligence, Part 2: Copyrightability, which confirms that the use of AI tools to assist human creativity is not a bar to copyright protection of the output, as long as there is sufficient human control over the expressive elements.

A Single Piece of American Cheese
In fact, on January 30, 2025, the Copyright Office registered A Single Piece of American Cheese, based on the “selection, coordination, and arrangement of material generated by artificial intelligence.” (Image at left.) See How We Received The First Copyright for a Single Image Created Entirely with AI-Generated Material.
Work-Made-for-Hire Doesn’t Save AI-Only Authorship. Dr. Thaler’s argument that AI could be his “employee” under the work-for-hire doctrine failed because the underlying creation must still have a human author in the first place.
Waived Argument. Dr. Thaler tried to claim he was effectively the author by directing the AI. The court found he had not properly raised this argument at the administrative level and therefore declined to consider it. This might have been his best argument, had he made it.
Policy Questions Left to Congress. While noting that new AI technologies could raise important policy issues, the court emphasized that it is for Congress, not the judiciary, to expand copyright beyond human authors.
Thaler v. Perlmutter (D.C. Cir. Mar. 18, 2025)
(For an earlier post on this case see: Court Denies Copyright Protection to AI Generated Artwork.)
by Lee Gesmer | Mar 20, 2025 | General
In October 2024 I created (probably not the right word – delivered?) a podcast using NotebookLM: An Experiment: An AI Generated Podcast on Artificial Intelligence and Copyright Law. The podcast that NotebookLM created was quite good, so I thought I’d try another one.
This is in the nature of experimentation, simply to explore this unusual AI tool.
This time the topic is the Oracle v. Google copyright litigation. I thought this would be a good topic to experiment with, since it is a complex topic and there are decisions by federal district court judge William Alsup (link), two Federal Circuit opinions (1, 2), and finally the Supreme Court decision. So, here goes.
Google v. Oracle: Copyright and Fair Use of Software APIs
. . . (May load a bit slowly – give it time).