by Lee Gesmer | Mar 17, 2025 | Copyright, DMCA/CDA
In my recent post, Postscript to my AI Series – Why Not Use the DMCA?, I discussed early developments in two cases pending against OpenAI in the U.S. District Court for the Southern District of New York (SDNY). Both cases focus on the claim that in the process of training its AI models, OpenAI illegally removed “copyright management information.” And, as I discuss below, they reach different outcomes.
What Is Copyright Management Information?
Many people who are familiar with the Digital Millennium Copyright Act’s (DMCA) “notice and takedown” provisions are unfamiliar with a part of the DMCA that makes it illegal to remove “copyright management information,” or “CMI.”
CMI includes copyright notices, information identifying the author, and details about the terms of use or rights associated with the work. It can be visible directly on the work, or metadata in the underlying code.
The CMI removal statute—Section 1202(b)(1) of the DMCA—is a “double scienter” law, requiring that a plaintiff prove that (1) CMI was intentionally removed from a copyrighted work, and (2) that the alleged infringer knew or had reasonable grounds to know that the removal of CMI would “induce, enable, facilitate, or conceal” copyright infringement.
Here is an example of how this law might work.
Assume that I have copied a work and that I have a legitimate fair use defense. However, assume further that I duplicated the work, removed the copyright notice and published the work without it. I have a fair use defense as to duplication and distribution, but could I still be liable for CMI removal?
The answer is yes. A violation of the DMCA is independent of my fair use defense. And, the penalty is not trivial. Liability for CMI removal can result in statutory damages ranging from $2,500 to $25,000 per violation, as well as attorneys’ fees and injunctive relief. Moreover, unlike infringement actions, a claim for CMI removal does not require prior registration of the copyright.
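To get a feel for how quickly this exposure can add up, here is a small Python sketch of the arithmetic. The per-violation figures come from the range stated above; the count of 100 violations is purely hypothetical, and how a court would actually count "violations" in a given case is a separate question this sketch does not address.

```python
# Statutory damages range for CMI removal, per violation (figures from the text above).
MIN_PER_VIOLATION = 2_500
MAX_PER_VIOLATION = 25_000


def statutory_damages_range(violations):
    """Return the (low, high) statutory damages exposure for a number of violations."""
    if violations < 0:
        raise ValueError("violations must be non-negative")
    return violations * MIN_PER_VIOLATION, violations * MAX_PER_VIOLATION


# Hypothetical example: 100 violations.
low, high = statutory_damages_range(100)
print(f"${low:,} to ${high:,}")  # prints "$250,000 to $2,500,000"
```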
All of this adds up to a powerful tool for copyright plaintiffs, a fact that has not been lost on plaintiffs’ counsel in AI litigation.
CMI – Why Don’t AI Companies Want To Include It?
AI companies’ removal of CMI during training stems from both technical necessities and strategic considerations. From a technical perspective, large language model training requires standardized data preparation processes that typically strip metadata, formatting, and peripheral information to create uniform training examples. This preprocessing is fundamental to how neural networks learn from text—they require clean, consistent inputs to identify linguistic patterns effectively.
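To make the technical point concrete, here is a minimal Python sketch of the kind of preprocessing described above. It is an illustration only, not OpenAI's or any other company's actual pipeline: the class name MainTextExtractor, the function extract_main_text, and the sample web page are all hypothetical. The extractor keeps only paragraph text, so the byline, the author metadata and the copyright notice drop out as a side effect of "cleaning" the page.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Hypothetical extractor that keeps only the text inside <p> tags."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        # Text outside <p> (title, byline, footer) is silently discarded.
        if self._in_paragraph:
            self.paragraphs[-1] += data


def extract_main_text(html):
    """Return the article body as plain text, one paragraph per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


# Hypothetical article page containing three kinds of CMI:
# author metadata, a visible byline, and a copyright notice.
SAMPLE_PAGE = """
<html><head>
<meta name="author" content="Jane Reporter">
<title>Example Headline</title>
</head><body>
<div class="byline">By Jane Reporter</div>
<p>First paragraph of the article.</p>
<p>Second paragraph of the article.</p>
<footer>Copyright 2024 Example News. All rights reserved.</footer>
</body></html>
"""

text = extract_main_text(SAMPLE_PAGE)
print(text)
```

Nothing in this sketch targets CMI specifically. The author, title and copyright notice are lost simply because they sit outside the main text, which is why the intent question matters so much in these cases.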
The computational overhead is also significant. Preserving and processing CMI for billions of training examples would increase storage requirements and computational costs. AI companies argue that this additional information provides minimal benefit to model performance while significantly increasing training complexity.
Content owners, however, contend that these technical justifications mask more strategic motivations. They argue that AI companies deliberately eliminate attribution information to obscure the provenance of training data, making it difficult to detect when copyrighted material has been incorporated into models. This removal, they claim, facilitates a form of “laundering” copyrighted content through AI systems, where original sources become untraceable.
More pointedly, content creators assert that CMI removal directly enables downstream infringement by making it impossible for users to identify when an AI output derives from or reproduces copyrighted works. Without embedded attribution information, neither the AI company nor end users can properly credit or license content that appears in generated outputs.
The technical reality and legal implications of this process sit at the heart of these emerging cases, with courts now being asked to determine whether standard machine learning preprocessing constitutes intentional CMI removal under the DMCA’s “double scienter” standard.
Raw Story Media v. OpenAI
In the first of the two SDNY cases, Raw Story Media v. OpenAI, federal district court judge Colleen McMahon dismissed Raw Story’s claim that when training ChatGPT, OpenAI had illegally removed CMI.
At the heart of Judge McMahon’s decision was her observation that although OpenAI removed CMI from Raw Story articles, Raw Story was unable to allege that the works from which CMI had been removed had ever been disseminated by ChatGPT to anyone. On these facts, Judge McMahon held that Raw Story lacked standing under the Article III standing principles established by the Supreme Court in TransUnion v. Ramirez (2021). It’s worth noting her observation that “the likelihood that ChatGPT would output plagiarized content from one of Plaintiffs’ articles seems remote” based on “the quantity of information contained in the [AI model].”
The Intercept Media v. OpenAI
In the second case, The Intercept Media v. OpenAI, The Intercept made the same allegation. It asserted that OpenAI had intentionally removed CMI—in this case authors, copyright notices, terms of use and title information—from its AI training set.
However, in this case Judge Jed Rakoff came to the opposite conclusion. In November 2024 he issued a bottom-line order declining to dismiss plaintiff’s CMI claim and stated that an opinion explaining his rationale would be forthcoming.
That opinion was issued on February 20, 2025.
At this early stage of the case (before discovery or trial) the judge found that The Intercept met the “double scienter” standard. As to the first part of the test, The Intercept alleged that the algorithm that OpenAI uses to build its AI training data sets can only capture an article’s main text, which excludes CMI. This satisfied the intentional removal element.
As to the second component of the standard, the court was persuaded by The Intercept’s theory of “downstream infringement,” which argues that OpenAI’s model might enable users to generate content based on The Intercept’s copyrighted works without proper attribution. And importantly, unlike in Raw Story, The Intercept was able to provide examples of verbatim regurgitation of its content from ChatGPT based on prompts from The Intercept’s data scientist.
The district court held that a copyright injury “does not require publication to a third party,” finding unpersuasive OpenAI’s argument that The Intercept failed to demonstrate a concrete injury because it had not conclusively established that users had actually accessed The Intercept’s articles via ChatGPT.
Curiously, Judge Rakoff’s decision failed to mention the earlier ruling in Raw Story Media, Inc. v. OpenAI, where Judge McMahon held, on similar facts, that the plaintiffs lacked standing to assert removal of CMI claims. Both cases were decided by SDNY district court judges. However, unlike the ruling in Raw Story Media, Judge Rakoff concluded that The Intercept’s alleged injury was closely related to the property-based harms typically protected under copyright law, satisfying the Article III standing requirement.
Thus, while Raw Story’s CMI claims against OpenAI have been dismissed, The Intercept’s CMI removal case against OpenAI will proceed.
by Lee Gesmer | Feb 17, 2025 | General
The community of copyright AI watchers has been eagerly awaiting the first case to evaluate the legality of using copyright-protected works as training data. We finally have it, and it has a lot of copyright law experts scratching their heads and wondering what it means for the AI industry.
On February 11, 2025, Third Circuit federal appeals court Judge Stephanos Bibas—sitting by designation in the U.S. District Court for the District of Delaware—issued a decision that is likely to shape the future of AI copyright litigation. By granting partial summary judgment to Thomson Reuters Enterprise Centre GmbH (“Thomson Reuters”) against Ross Intelligence Inc. (“Ross”), the court revisited and reversed its earlier 2023 opinion and rejected Ross’s fair use defense. Although this case involves a non-generative AI application, the reasoning has implications for the more than 30 ongoing AI copyright cases currently being litigated.
Case Overview
The Ross litigation centers on allegations that Ross used copyrighted material from Thomson Reuters’ Westlaw, a leading legal research platform, to train its AI-driven legal research tool. Ross wanted to use the Westlaw headnotes to train its AI model, but Thomson Reuters would not grant Ross a license. Instead, Ross commissioned “Bulk Memos” from a third-party provider. These memos, designed to simulate legal questions and answers, closely mirrored Westlaw headnotes, the concise summaries that encapsulate judicial opinions. After determining that the Bulk Memos were substantially similar to 2,243 Westlaw headnotes, the court held that this was direct copyright infringement and rejected Ross’s fair use defense.
Breaking Down the Fair Use Analysis
The court evaluated the four statutory fair use factors, with two—“purpose and character” and “market effect”—proving decisive:
1 – Purpose and Character of the Use: The court found that Ross’s use was commercial and aimed at developing a product that directly competes with Westlaw. Despite Ross’s argument that its copying was merely an “intermediate step” in a broader process, the judge rejected the intermediate copying cases (discussed below), emphasizing that “Ross was using Thomson Reuters’s headnotes as AI data to create a legal research tool to compete with Westlaw.” Importantly, the court’s analysis was informed by the framework established in the recent Supreme Court decision in Warhol v. Goldsmith, which stressed that reproduction fails to constitute a transformative use if the copying serves a similar market function as the original. The Warhol precedent underlines that transformation requires a “further purpose or different character” from the original work, a requirement Ross did not meet.
2 – Market Effect: The market effect factor proved even more influential. By positioning itself as a direct substitute for Westlaw, Ross both disrupted the existing market and undercut potential licensing markets for Thomson Reuters’s content (notwithstanding that Thomson refused to license to Ross). The court noted that any harm to this market—“undoubtedly the single most important element of fair use”—weighed decisively against Ross.
While the factors addressing the nature of the copyrighted work and the amount used modestly favored Ross, they were insufficient to overcome the adverse findings regarding the purpose of the use and market harm.
The Court’s 2023 Ruling vs. The Current Ruling
It’s worth noting the struggle the judge went through in deciding the fair use issue in this case. Judges rarely reverse themselves on major rulings, but that’s what happened here.
As I noted, the judge in this case had issued a 2023 decision on the fair use issue. There, he held that the question of whether Ross’s use of the Westlaw headnotes was fair use was a jury issue.
In the current decision he reversed himself.
Here’s what the judge said in 2023:
If Ross’s characterization of its activities is accurate, it translated human language into something understandable by a computer as a step in the process of trying to develop a “wholly new,” albeit competing, product—a search tool that would produce highly relevant quotations from judicial opinions in response to natural language questions. This also means that Ross’s final product would not contain or output infringing material. Under Sega [v. Accolade] and Sony [v. Connectix], this is transformative intermediate copying.
And here is what he said in his 2025 decision:
My prior opinion wrongly concluded that I had to send this factor to a jury. I based that conclusion on Sony and Sega. Since then, I have realized that the intermediate-copying cases [Sony, Sega] (1) are computer-programming copying cases; and (2) depend in part on the need to copy to reach the underlying ideas. Neither is true here. Because of that, this case fits more neatly into the newer framework advanced by Warhol. I thus look to the broad purpose and character of Ross’s use. Ross took the headnotes to make it easier to develop a competing legal research tool. So Ross’s use is not transformative. Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today.
This was a major change in direction, and it reflects the challenge the judge perceived in applying copyright fair use to artificial intelligence under the facts in this case.
Implications for Generative AI Litigation
The question on the minds of most copyright AI observers is, “What does this mean for the more than 30 copyright cases against frontier AI model developers (OpenAI, Google, Anthropic, Facebook, X/Twitter, and many others)?”
My answer? In most cases, likely not much.
The 2025 Ross decision underscores that even intermediate copying can fall outside fair use when it ultimately facilitates the creation of a product that directly competes with the copyrighted work. For example, unlike Authors Guild v. Google Books, where the copying enabled a unique search function without substituting for the original works, Ross’s use of headnotes was aimed squarely at developing an AI legal research tool that encroaches on Westlaw’s market. This market harm, which is central to fair use analysis, undermines the fair use defense by establishing that the copying, even if temporary or intermediate, has a direct commercial impact. The ruling aligns with recent precedents like Warhol, which require a truly transformative purpose rather than mere replication, thereby narrowing the scope of permissible intermediate copying in AI training contexts.
However, the case may not have much significance for most of the pending AI copyright cases. While the Ross decision tightens the fair use framework in situations where the end product directly competes with the original work, most current generative AI cases do not involve direct competition. Most generative AI systems produce entirely new content rather than serving as a substitute for the copyrighted materials used during training. As a result, the market harm and competitive concerns central to the Ross ruling may not be as relevant in these cases, and its impact on the broader generative AI landscape may be limited.
Conclusion
The ruling in Thomson Reuters v. Ross Intelligence sets an important precedent for how courts may evaluate the use of copyrighted works in AI training. Although fact-specific and limited to a non-generative AI context, the decision’s reliance on principles from the Warhol case—particularly the need for a transformative purpose and the critical weight of market impact—will likely influence future disputes, including those involving frontier generative AI models, particularly where the AI model competes with the owner of the training data.
Developers and content owners alike should take note: as the legal landscape adapts to the realities of AI, robust data sourcing strategies and a clear understanding of copyright limitations will be crucial. For companies working on generative AI, the challenge will be to innovate without replicating the competitive functions of existing copyrighted works—a balancing act that this decision has now brought into focus.
It’s also important to note that this ruling doesn’t end the case. There are remaining issues of fact that the judge reserved for trial. However, it appears that Ross Intelligence is bankrupt, and therefore may not have the financial resources to continue to trial. And, of course, Ross could appeal the trial judge’s rulings at the conclusion of the case, although it is questionable whether it will be able to do so for the same reason. It seems likely that this case will end here.
Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc. (D. Del. Feb. 11, 2025)
by Lee Gesmer | Feb 1, 2025 | DMCA/CDA
There was a period from roughly 2010 to 2016 when it seemed like I was posting on the DMCA take-down system every few months. Many of these posts focused on the Viacom v. YouTube litigation in the Second Circuit. See here, here, here and here. This massive litigation ended with a settlement in 2014. Nevertheless, before the case settled the Second Circuit issued a significant decision, establishing an important precedent on application of the Digital Millennium Copyright Act.
The Second Circuit’s January 13, 2025 decision in Capitol Records v. Vimeo – written by Judge Pierre Leval, the Second Circuit’s widely acknowledged authority on copyright law – feels like déjà vu. Fifteen years after Capitol Records filed suit the court has reaffirmed and expanded upon the DMCA safe harbor principles it established thirteen years ago in YouTube. Yet the Vimeo decision addresses novel issues that highlight how both technology and legal doctrine have evolved since the YouTube era.
Building on YouTube’s Foundation
In its 2012 decision in Viacom v. YouTube, the Second Circuit ruled that overcoming an internet provider’s DMCA safe harbor protection requires copyright owners to show either that a platform had actual knowledge of specific infringements or that infringement would be “obvious to a reasonable person” – so-called “red flag knowledge.” Generalized awareness that infringement has occurred on a platform wasn’t enough. This framework has served as primary guidance during the explosive growth of user-generated content over the past decade.
Vimeo: New Technology, New Challenges
The Vimeo case presented similar issues but in a transformed technological landscape. Capitol Records asserted that Vimeo lost safe harbor protection because its employees interacted with 281 user-posted videos containing copyrighted music. While YouTube dealt with a nascent video-sharing platform, Vimeo involved a sophisticated service with established content moderation practices.
The court’s analysis of “red flag” knowledge builds on YouTube while providing important new guidance. Employee interaction with content through likes, comments, or featuring videos doesn’t create red flag knowledge. Copyright owners must now prove “specialized knowledge,” and basic copyright training or work experience isn’t enough to establish the expertise needed for this level of knowledge. Even obvious use of copyrighted music doesn’t create red flag knowledge given the complexity of fair use determinations, with the court specifically citing the recent Warhol case where copyright experts split on fair use analysis.
While YouTube focused primarily on knowledge standards, Vimeo tackles a critical question for modern platforms: how much content moderation is too much? The court held that basic curation—like featuring videos in “Staff Picks” or maintaining community standards—won’t strip safe harbor protection. It left open whether more aggressive moderation or encouraging specific types of potentially infringing content might cross the line.
See No Evil, Hear No Evil
However, the decision also creates incentives for platforms to minimize their oversight of copyrighted uploads to avoid triggering red flag liability: by limiting active monitoring or interaction with user-generated content, platforms can reduce the risk of being deemed to have actual or red flag knowledge of infringement. This has the effect of reinforcing the DMCA’s notice-and-takedown framework as the primary mechanism for addressing copyright infringement. Platforms like Vimeo are likely to choose to rely more heavily on this reactive system rather than implementing robust preemptive measures.
AI and the Future of Safe Harbor
The Vimeo decision leaves open an increasingly important question: how will courts apply these standards as platforms adopt artificial intelligence for content moderation? While the court focused on human knowledge and interaction, modern platforms increasingly rely on automated systems to identify potential infringement. Future litigation will likely need to address whether AI-powered content recognition creates the kind of “specialized knowledge” that might lead to red flag awareness, and whether algorithmic promotion of certain content categories could constitute “substantial influence.”
While Vimeo expands on YouTube’s framework, both cases highlight a fundamental flaw in the DMCA safe harbor: the time and cost of litigation effectively nullify its protections. YouTube took seven years to resolve; Vimeo took fifteen. Without legislative clarity on key terms like “red flag knowledge” and “substantial influence,” copyright owners can continue using litigation costs as a weapon against small and mid-sized platforms, which is exactly what the DMCA was meant to prevent.
As technology advances, particularly in AI-powered content moderation, platforms must carefully balance robust content management with safe harbor compliance. The Vimeo decision provides valuable guidance while highlighting the need for continued evolution in DMCA safe harbor doctrine.
Capitol Records, LLC v. Vimeo, Inc. (2d Cir. Jan. 13, 2025)
by Lee Gesmer | Dec 30, 2024 | General
I’ve been belatedly reading Chris Miller’s Chip War, so I’m particularly attuned to U.S.-China relations around technology. Of course, the topic of Miller’s excellent book is advanced semiconductor chips, not social media apps. Nevertheless, the topic now occupying the attention of the Supreme Court and the president-elect is the national security threat presented by a social media app used by an estimated 170 million U.S. users.
With Miller’s book as background I was interested when, on December 6, 2024, the D.C. Circuit Court of Appeals denied TikTok’s petitions challenging the constitutionality of the Protecting Americans from Foreign Adversary Controlled Applications Act. This statute, which was signed into law on April 24, 2024, mandated that TikTok’s parent company, ByteDance Ltd., divest its ownership of TikTok within 270 days or face a nationwide ban in the United States. The law reflected Congress’s concern that ByteDance and, by extension, the Chinese government posed a national security threat through data collection and potential content manipulation.
The effect of the D.C. Circuit’s decision is that ByteDance must divest itself of TikTok by January 19, 2025, the day before the Presidential inauguration.
It took TikTok and ByteDance only ten days – until December 16, 2024 – to file with the Supreme Court an emergency motion for injunction, pending full review by the Court. And it then took the Supreme Court only two days to treat this motion as a petition for a writ of certiorari, allow the petition and put the case on the Supreme Court version of a “rocket docket” – briefing must be completed by January 3, 2025, and the Court will hear oral argument on January 10th, giving it plenty of time to decide the issue in the nine days left until January 19th.
Enter Donald Trump. In a surprising twist, the former president – who initially tried to ban TikTok in 2020 – has filed an amicus brief opposing an immediate ban. He contends that the January 19th deadline improperly constrains his incoming administration’s foreign policy powers, and he wants time to negotiate a solution balancing security and speech rights:
President Trump is one of the most powerful, prolific, and influential users of social media in history. Consistent with his commanding presence in this area, President Trump currently has 14.7 million followers on TikTok with whom he actively communicates, allowing him to evaluate TikTok’s importance as a unique medium for freedom of expression, including core political speech. Indeed, President Trump and his rival both used TikTok to connect with voters during the recent Presidential election campaign, with President Trump doing so much more effectively. . . .
[Staying the statutory deadline] would . . . allow President Trump’s Administration the opportunity to pursue a negotiated resolution that, if successful, would obviate the need for this Court to decide these questions.
Trump Amicus Brief, pp. 2, 9
The legal issues are novel and significant. The D.C. Circuit applied strict scrutiny but gave heavy deference to national security concerns while spending little time on users’ speech interests. Trump raises additional separation of powers questions about Congress dictating national security decisions and mandating specific executive branch procedures.
This case isn’t just about one app. The case reflects deeper tensions over Chinese technological influence, data privacy, and government control of social media. The Court’s decision will likely shape how we regulate foreign-owned platforms while protecting constitutional rights in an interconnected world.
The January 10th arguments – if indeed they go forward on that date, given that president-elect Trump prefers they not – should be fascinating. At stake is not just TikTok’s fate, but precedent for how courts balance national security claims against free speech in the digital age.
________
Addendum:
The highly expedited schedule kept a lot of lawyers busy over the holidays. You can access the docket here. I count 22 amicus briefs, most filed on December 27. Reply briefs are due January 3.
Update: On January 17, 2025 the Court upheld the D.C. Circuit in a per curiam decision holding that the challenged provisions of the Protecting Americans from Foreign Adversary Controlled Applications Act do not violate petitioners’ First Amendment rights. (link to opinion) On January 20, 2025 President Trump signed an Executive Order instructing the Attorney General not to take any action on behalf of the United States to enforce the Act for 75 days from the date of the Order (link to Order).
by Lee Gesmer | Dec 11, 2024 | Copyright, DMCA/CDA
After reading my 3-part series on copyright and LLMs (start with Part 1, here) a couple of colleagues have asked me whether content owners could use the Digital Millennium Copyright Act (DMCA) to challenge the use of their copyright-protected content.
I’ll provide a short summary of the law on this issue, but the first thing to note is that the DMCA offers two potential avenues for content owners: Section 512(c)’s widely-used “notice and takedown” system and the lesser-known Section 1202(b)(1), which addresses the removal of copyright management information (CMI), like author names, titles, copyright notices and terms and conditions.
Section 1202(b)(1) – Removal or Alteration of CMI
First, let’s talk about the lesser-known DMCA provision. Several plaintiffs have tried an innovative approach under this provision, arguing that AI companies violated Section 1202(b)(1) by stripping away CMI in the training process.
In November, two federal judges in New York reached opposite conclusions on these claims. In Raw Story Media, Inc. v. OpenAI the plaintiff alleged that OpenAI had removed CMI during the training process, in violation of Section 1202(b)(1). The court applied the standing requirement established in TransUnion v. Ramirez, a recent Supreme Court case that dramatically restricted standing to sue in federal courts to enforce federal statutes. The court held that the publisher lacked standing because it couldn’t prove that it had suffered “concrete harm” from the alleged removal of CMI. The court based this conclusion on the fact that Raw Story “did not allege that a copy of its work from which the CMI had been removed had been disseminated by ChatGPT to anyone in response to any specific query.” Absent dissemination Raw Story had no claim – under TransUnion, “no concrete harm, no standing.”
But weeks later, in The Intercept Media v. OpenAI, a different judge issued a short order allowing similar claims to proceed. We are awaiting the opinion explaining his rationale.
The California federal courts have also been unwelcoming to 1202(b)(1) claims. In two cases – Andersen v. Stability AI and Doe 1 v. GitHub – the courts dismissed 1202(b)(1) claims on the ground that the removal of CMI requires identicality between the original work and the copy, which the plaintiffs had failed to establish. However, the GitHub case has been certified for an interlocutory appeal to the Ninth Circuit, and that appeal is worth watching. I’ll note that the identicality requirement is not in the Copyright Act – it is an example of judge-made copyright doctrine.
Section 512(c) – Notice-and-Takedown
While you are likely familiar with the DMCA’s Section 512(c) notice-and-takedown system (think YouTube removing copyrighted videos or music), this law faces major hurdles in the AI context. A DMCA take-down notice must be specific about the location where the infringing material is hosted – typically a URL. In the case of an AI model the challenge is that data used by AI models is not accessible or identifiable, making it impossible for copyright owners to issue takedown notices.
Unsurprisingly, I can’t find any major AI case in which a plaintiff has alleged violation of Section 512(c).
Conclusion
The collision between AI technology and copyright law highlights a fundamental challenge: our existing legal framework, designed for the digital age of the late 1990s, struggles to address the unique characteristics of AI systems. The DMCA, enacted when peer-to-peer file sharing was the primary concern, now faces unprecedented questions about its applicability to AI training data.
Stay tuned.