One of the enduring mysteries of Internet law is the legality of “web scraping.” Although scraping is invisible to most users, it pervades the Internet and accounts for a substantial share of all Internet traffic.
Here are two examples of scraping: (1) Clearview AI has scraped billions of publicly available images from social media platforms and compiled them into a facial recognition database that it has made available to law enforcement and private industry. (2) hiQ Labs has scraped publicly available information from profiles on LinkedIn and used it to analyze and predict employees’ likelihood of seeking other employment.
Both of these companies engaged in web scraping without permission. Did either of them violate any laws? This post will examine how the courts have treated hiQ’s actions, but the broader context, beyond that case, is that web scraping often is a legal enigma. As one commentator noted, “most often the legal status of scraping is characterized as something just shy of unknowable, or a matter entirely left to the whims of courts ….” Sellars, Twenty Years of Web Scraping and the Computer Fraud and Abuse Act, p. 377 (link).
But first, what is web scraping? Here’s a definition provided by the Electronic Frontier Foundation: “web scraping is machine-automated web browsing that accesses and records the same information which a human visitor to the site might do manually.” Typically this function, also called data scraping, is performed by an Internet bot, or simply “bot,” a software program that runs automated tasks (scripts) over the Internet. A well-known example of this is Google, which uses its web scraper, “Googlebot,” to collect data from the Internet that is then indexed for searching via Google’s Internet search software.
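For readers who have never seen one, here is a minimal sketch of what a scraper does, written in Python with only the standard library. It is an illustration under stated assumptions, not the software hiQ, Clearview AI or Google actually uses: the names PageRecorder, scrape_html and fetch, the "example-bot/0.1" user-agent string and the sample page are all hypothetical. The program requests a page the way a browser would, then records the same things a human visitor could copy by hand (here, the page title and the links).

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class PageRecorder(HTMLParser):
    """Records the page title and every link target found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            # attrs is a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape_html(html):
    """Extract the title and links from HTML text that has already been downloaded."""
    recorder = PageRecorder()
    recorder.feed(html)
    recorder.close()
    return {"title": recorder.title.strip(), "links": recorder.links}


def fetch(url, user_agent="example-bot/0.1"):
    """Download a public page over HTTP. The user-agent string is a made-up example."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# A hypothetical public profile page, used so the example runs without a network.
sample = (
    "<html><head><title> Jane Doe </title></head>"
    '<body><a href="/jobs">Jobs</a></body></html>'
)
print(scrape_html(sample))
```

A real bot would call scrape_html(fetch(url)) in a loop over many addresses. Nothing in the sketch supplies a password or defeats any access control, which is the feature of scraping that matters in the legal analysis below.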
Internet scraping is often done without the permission of the people and companies that post information on websites. Courts and legal commentators have identified many legal claims that an aggrieved party might assert against scrapers. These include trespass to chattels, copyright infringement, misappropriation, unjust enrichment, conversion, breach of contract and breach of privacy. See Sobel, A New Common Law of Web Scraping (link).
Most of these claims remain theoretical and untested in the courts. However, there is one law that has been used successfully to challenge scraping – the Computer Fraud and Abuse Act, 18 U.S. Code §1030, or the “CFAA.” (link)
The Computer Fraud and Abuse Act
The CFAA – the federal anti-hacking law – imposes civil and criminal liability for certain acts of computer trespass. The hiQ/LinkedIn case focused on the CFAA’s “without authorization” provision. This section of the law imposes liability on “[w]hoever … intentionally accesses a computer without authorization … and thereby obtains … information ….”
The CFAA applies to any computer connected to the Internet. Therefore, the CFAA may be violated when someone accesses a website “without authorization.” However, the words “without authorization” are undefined, leaving it to the courts to decide how they should be applied.
What if a company scrapes data from a website that has required it to agree to contractual terms and conditions that bar scraping? In this case it may have acted “without authorization” and therefore violated the CFAA. At the very least, it will be in breach of contract.
But what if the website is public facing – that is, it makes information available to visitors without the use of a password – and the site owner demands that a scraper stop? Are the scraper’s actions now “without authorization”? That was the issue the Ninth Circuit recently decided in hiQ Labs, Inc. v. LinkedIn Corp. (9th Cir. April 18, 2022).
hiQ Labs v. LinkedIn Corp.
hiQ Labs v. LinkedIn has an unusual legal posture and complex history spanning five years. hiQ, a corporate data analytics company, uses automated software to collect information that LinkedIn users share on their public profiles. LinkedIn.com is a public facing website whose users own the information they provide to LinkedIn. LinkedIn tried to stymie hiQ with IP blocking, but this proved unsuccessful. LinkedIn then demanded that hiQ stop scraping its site, asserting that it violated the CFAA. After receiving this demand hiQ filed suit on the theory of tortious interference, seeking a declaratory judgment that LinkedIn could not lawfully invoke the CFAA or use technological measures to stop it from scraping LinkedIn.com.
hiQ scored an initial victory before the district court – the court issued a preliminary injunction ordering LinkedIn to withdraw its cease-and-desist letter and remove any existing technical barriers to hiQ’s access to public profiles.
LinkedIn appealed, leading to a decision by the Ninth Circuit, a Supreme Court appeal and remand (remanded in light of Van Buren, below, without opinion), and a second decision by the Ninth Circuit. At all times during this convoluted procedural history the central question was whether, once LinkedIn demanded that hiQ cease scraping the site, any further scraping of LinkedIn’s data was “without authorization” in violation of the CFAA. Simply put, once LinkedIn demanded that hiQ stop scraping, did hiQ violate the CFAA by failing to comply?
After much discussion, including analysis of the wording of the statute, case precedents and the legislative history, the Ninth Circuit upheld the preliminary injunction in hiQ’s favor, stating –
the CFAA’s prohibition on accessing a computer “without authorization” is violated when a person circumvents a computer’s generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA. The data hiQ seeks to access … has not been demarcated by LinkedIn as private using … an authorization system. hiQ has therefore raised serious questions about whether LinkedIn may invoke the CFAA to preempt hiQ’s possibly meritorious tortious interference claim.
The court referenced the “gates-up-or-down” inquiry that the Supreme Court established in Van Buren v. United States (USSC 2021), which involved the “exceeds authorized access” prong of the CFAA (not at issue in LinkedIn):
In other words, applying the ‘gates’ analogy to a computer hosting publicly available webpages, that computer has erected no gates to lift or lower in the first place. Van Buren therefore reinforces our conclusion that the concept of ‘without authorization’ does not apply to public websites.
And, the court touched on an antitrust-flavored policy rationale that supported hiQ’s access:
. . . the public interest favors hiQ’s position. . . . giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.
Based on this reasoning the Ninth Circuit refused to dissolve the preliminary injunction against LinkedIn, sending the case back to the district court for further proceedings.
What does the decision mean for data aggregators and researchers who use bots to “scrape” information from public facing websites? With some qualifications it is a win for both non-profit researchers and for-profit companies like hiQ and Clearview AI, who seek to scrape and exploit data commercially. Under this decision a public facing, “gates up” website cannot use the CFAA to demand that a scraper stop.
However, there are limits. For example, aggregators need to be careful not to copy expression that may be protected by copyright. That was not an issue in the LinkedIn case, since users retain ownership of their profiles, and therefore LinkedIn has no copyright interest in the data contributed by its users.
The ruling creates an incentive for websites to shield information behind a log-in page and terms and conditions barring data scraping, although whether this constitutes a violation of the CFAA in every instance remains uncertain.
State law causes of action, particularly common law trespass to chattels, are an undeveloped but possibly viable theory for websites seeking to block scraping. The Ninth Circuit called this out specifically: “it may be that web scraping exceeding the scope of the website owner’s consent gives rise to a common law tort claim for trespass to chattel.” Whether this theory will hold water remains to be seen. In the meantime, idiosyncratic state laws may come into play. Clearview AI – the company that created a facial recognition database – has been targeted for violation of the Illinois Biometric Information Privacy Act. (link)
Lastly, the Ninth Circuit is only one of many federal circuits, and other circuits may disagree with the conclusion reached in this decision. The day may come when the Supreme Court decides the legality of web scraping of public facing data under the CFAA. LinkedIn v. hiQ is ongoing, and perhaps this very case will end up back before the Supreme Court.
A Postscript on Technical Barriers
Any discussion of this case would remain incomplete without a mention of the technical barriers issue. Recall that the trial court ordered LinkedIn to remove any existing technical barriers to hiQ’s access to public profiles. Specifically, the trial court entered a preliminary injunction enjoining LinkedIn from “blocking or putting in place any [technical] mechanism with the effect of blocking hiQ’s access to LinkedIn member public profiles.” This included IP address blocking, which LinkedIn had attempted before the lawsuit began. The trial court entered this order based on the doctrines of tortious interference and unfair competition.
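For non-technologists, IP address blocking is a simple gatekeeping rule: before serving a page, the site checks the visitor’s network address against a list and refuses anyone on it. The sketch below, in Python, shows the general idea only. It is not LinkedIn’s actual mechanism, which the opinions do not describe in detail; the names BLOCKED_NETWORKS, is_blocked and status_for are hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical blocklist: one whole network range and one single address.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_blocked(client_ip, blocked=BLOCKED_NETWORKS):
    """Return True if the visitor's address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in blocked)


def status_for(client_ip):
    """HTTP status a site might return: 403 (forbidden) if blocked, 200 (OK) otherwise."""
    return 403 if is_blocked(client_ip) else 200


for visitor in ("203.0.113.9", "198.51.100.17", "192.0.2.1"):
    print(visitor, status_for(visitor))
```

Note that the check looks only at where a request comes from, not at who is making it. A visitor who connects from an address that is not on the list gets the page, which is one reason a block of this kind can be an imperfect barrier.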
This order was upheld by the Ninth Circuit on both appeals.
If this aspect of the case has you puzzled, you are not alone. If you are a technologist you may be wondering why, even if LinkedIn didn’t have a CFAA claim against hiQ, it should be barred from trying to block hiQ from accessing its site. If you’re a lawyer, you know that claims of tortious interference and unfair competition are difficult to maintain, and you may be wondering how LinkedIn’s IP blocking could be a violation of either doctrine.
None of the courts that have issued rulings in this case have addressed these issues, other than in passing.
Now that the case is back in the district court this aspect of the case – which should be of ongoing interest to both the technical and legal Internet communities – will likely receive further attention.
LinkedIn v. hiQ (9th Cir. April 18, 2022)
This week’s internal report by MIT on its handling of the Aaron Swartz case may be an appropriate time to note that the sound and fury over the Computer Fraud and Abuse Act (the “CFAA”) is not limited to its use in criminal cases like the Swartz prosecution. The controversy extends to the use of this law in civil cases as well.*
*The CFAA may be used as either a civil or a criminal law. However, the words of the statute must mean the same thing in each context. As the court noted in the case discussed in this post, “it is not possible to define authorization narrowly for some CFAA violations and broadly for others.”
In my July 2nd post on AMD v. Feldstein I noted that the case had given rise to two noteworthy decisions. The May 15, 2013 decision, discussed in that post, involved the legalities of the former AMD employees’ alleged solicitation of current AMD employees in violation of non-solicitation agreements. However, Massachusetts Federal District Court Judge Timothy Hillman issued a second opinion in the case on June 10, 2013, ruling on the defendant-employees’ motion to dismiss claims of civil liability under the CFAA.
Judge Hillman’s June 10th opinion reflects the struggle within the federal courts nationally over how to apply the CFAA. The controversy focuses on the section of the law that imposes criminal and civil penalties on –
whoever … intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains … information from any protected computer. 18 U.S.C. § 1030(a)(2)(C).*
*As Judge Hillman observed, “The breadth of this provision is difficult to understate.”
I discussed the courts’ varied interpretations of this provision in some detail in an August 2012 post. As I stated there, “the issue, on which the federal courts cannot agree, is whether an employee who has authorized access to a computer, but uses that access for an illegal purpose — typically to take confidential information in anticipation of resigning to start a competing company or join one — violates the CFAA.” In that post I discussed several of the conflicting decisions over this issue, including the Ninth Circuit’s influential en banc holding in U.S. v. Nosal.
In considering how to apply the CFAA in this case Judge Hillman described the two possible interpretations of the CFAA. The first, a “technological model of authorization,” requires that an employee violate a technologically implemented barrier in order for his actions to give rise to a violation. Under this model, which the court described as the “narrow interpretation,” the employee would have to (for example) use someone else’s login credentials to access his employer’s computer, or break into (hack) the computer. Another way to view the “narrow interpretation” of the CFAA is that the phrase “without authorization” in the statute is limited to outsiders who do not have permission to access the employer’s computer in the first place.
Under the second model, described as the “broader interpretation,” the employee would be liable under the CFAA if he used a valid password to access information for an improper purpose, for example, obtaining confidential or trade secret information that will be provided to the employee’s new employer, or to a competitor of the current employer.
As Judge Hillman noted, the only First Circuit case to address the scope of the CFAA – EF Cultural Travel BV v. Explorica, Inc. – only created uncertainty over how to apply the statute. At least one Massachusetts district court judge has viewed EF Cultural as an endorsement of the broader interpretation. Guest-Tek Interactive Entm’t, Inc. v. Pullen (Gorton, J. 2009).
Judge Hillman, however, disagreed, distinguishing EF Cultural and concluding that the facts in that case shift it into the realm of “access that exceeds authorization” rather than permitted access for an unauthorized purpose. In what seems to be a clear nod to the Ninth Circuit in Nosal, he held that “as between a broad definition that pulls trivial contractual violations into the realm of federal criminal penalties, and a narrow one that forces the victims of misappropriation and/or breach of contract to seek justice under state, rather than federal, law, the prudent choice is clearly the narrower definition.”*
*A recent case from the District of New Hampshire appears to have reached the same conclusion. Wentworth-Douglass Hospital v. Young & Novis Prof. Assoc. (June 29, 2012).
This is the conclusion that the former AMD employee-defendants wanted the court to reach on their motion to dismiss the CFAA count. However, it turned out to be something of a Pyrrhic victory for them.
Although the judge found that “the narrower interpretation” of the CFAA “is preferable” and found that AMD’s “allegations are … insufficient to sustain a CFAA claim under a narrow interpretation of the CFAA,” he declined to dismiss the CFAA claims against the defendants, noting that “this is an unsettled area of federal law, and one where the courts have yet to establish a clear pleading standard.” Instead, he deferred action on this claim until the factual record is complete, meaning the conclusion of discovery.
And, if it was the defendants’ hope to dismiss the federal CFAA claim in order to force the case into state court (which is unclear, since AMD also pleaded diversity jurisdiction), not only did they fail to accomplish this by means of this motion, but the court went so far as to note that “even if the CFAA claims are dismissed at a later date, the pendant state law claims need not be remanded to state courts. … It would be extremely inefficient to remand this case to state court given the quantity of evidence already presented to this Court.”
Bottom line: the defendants won their central legal argument, but did not get their reward. And, this section of the CFAA remains in limbo in the First Circuit, pending final word from the First Circuit Court of Appeals on its proper interpretation.
Advanced Micro Devices, Inc. v. Feldstein
Yet another “data scraping” case is percolating in the Northern District of California. Craigslist has sued the online aggregator 3Taps, Inc. (and others), claiming that they illegally copied Craigslist’s classified apartment listings. In effect, 3Taps was attempting to disintermediate Craigslist—to insert itself between Craigslist and its users.
3Taps filed a motion to dismiss the multiple claims asserted in the suit, most of which was denied in the decision linked below.
Of particular interest is the court’s refusal to dismiss Craigslist’s claim that 3Taps violated the Computer Fraud and Abuse Act (CFAA), a controversial federal “anti-hacker” statute that has been interpreted in conflicting ways by the federal courts (see an earlier post on this topic here), and which was the law Aaron Swartz was accused of violating (contributing, many believe, to his suicide earlier this year).
The CFAA permits a civil cause of action against any person who “intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains . . . information from any protected computer.” 18 U.S.C. § 1030(a)(2)(C). Craigslist has alleged that 3Taps’ use of Craigslist’s listings violated a cease and desist letter Craigslist sent to 3Taps prohibiting it from republishing the listings. The court found that 3Taps’ continued use was a potential violation of the CFAA as the Ninth Circuit has interpreted that statute, and denied 3Taps’ motion to dismiss that claim.
This ruling is consistent with decisions in the First Circuit, the federal circuit which includes Massachusetts. (See, e.g., EF Cultural Travel v. Explorica, 1st Cir. 2001).
Although the California district court denied 3Taps’ motion to dismiss the CFAA claim, it expressed its concerns about the policy issues raised by Craigslist’s CFAA claim in this case. Quoting from the opinion:
The parties have not addressed a threshold question of whether the CFAA applies where the owner of an otherwise publicly available website takes steps to restrict access by specific entities, such as the owner’s competitors. “Some commentators have noted that suits under anti-hacking laws have gone beyond the intended scope of such laws and are increasingly being used as a tactical tool to gain business or litigation advantages.” … The CFAA was passed in 1986, well before the development of the modern internet, and originally only covered certain computers operated by the federal government or financial institutions. … Although courts in this district have held that the CFAA may apply to unauthorized access to websites, the parties have not cited a case from this district or the Ninth Circuit addressing its application to information that is generally available to the public. … Applying the CFAA to publicly available website information presents uncomfortable possibilities. Any corporation could subject its competitors to civil and criminal liability for visiting its otherwise publicly available home page; in theory, a major news outlet could seek criminal charges against competing journalists for reading articles on its website.
These comments are a reflection of the Ninth Circuit’s concerns about the application of this statute, as described in the Ninth Circuit’s high-profile 2012 en banc decision in U.S. v. Nosal.
Last year (when this suit was filed) Professor Eric Goldman provided a trenchant analysis of the business issues facing Craigslist which motivated it to bring this lawsuit. He concluded, “even though it might look like Craigslist is making bizarre moves, I think its moves are quite rational. They’re exactly the kind of moves you’d expect from a panicked company realizing its uncomfortably precarious marketplace position.”
Craigslist v. 3Taps (N. D. Cal. April 29, 2013)
Last month I wrote a post titled “Online Agreements – Easy To Get Right, Easy To Get Wrong.” In that post I discussed two cases in which the plaintiff had failed to take the steps necessary to impose terms and conditions on its customers.
A recent case decided by the federal district court for the Eastern District of Pennsylvania provides yet another example of how sloppy online contracting can doom a claim based on an online agreement.
The case, CollegeSource, Inc. v. AcademyOne, Inc. (E.D. Pa. October 25, 2012), involves the practice colloquially referred to as “screen scraping,” that is, copying information from displayed webpages, usually in large quantities for commercial use. See, e.g., EF Cultural Travel BV v. Explorica, 274 F.3d 577 (1st Cir. 2001) (describing screen scraping).
It’s easy — legally and technically — to prevent this by prohibiting it in the site’s online terms and conditions. Doing so allows the site owner to assert not only state-law breach of contract, but the potentially more advantageous federal Computer Fraud and Abuse Act (“CFAA”). However, the site user must agree to the terms and conditions.
Unfortunately for CollegeSource, it didn’t get this quite right. Specifically, CollegeSource offered three services. Two of the services required that the user accept a “browsewrap” subscription agreement that expressly prohibited scraping (“you agree not to . . . scrape or display data from the Content for use on another web site or service”). However, the third service did not require users to agree to this restriction. Think of this as two doors locked, one open. CollegeSource’s contract-based argument that the subscription agreement applied to the third service failed to persuade the district court judge. The result: no breach of contract and no violation of the CFAA.
CollegeSource, Inc. v. AcademyOne, Inc.
Yet another federal appeals court has attempted to parse the Computer Fraud and Abuse Act’s (“CFAA”) ambiguous statutory language. The issue, on which the federal courts cannot agree, is whether an employee who has authorized access to a computer, but uses that access for an illegal purpose — typically to take confidential information in anticipation of resigning to start a competing company or join one — violates the CFAA.
The controversy is focused on the words “without authorization” and “exceeds authorized access” in the law:
[Whoever] knowingly and with intent to defraud, accesses a protected computer without authorization, or exceeds authorized access, and by means of such conduct furthers the intended fraud and obtains anything of value … shall be punished. 18 U.S.C. § 1030(a)(4).
Late last year, in a widely noted decision, the 9th Circuit adopted the “narrow” view of the CFAA, holding the law does not extend to an employee who has authorized access but uses that access to make unauthorized use. U.S. v. Nosal (en banc).
In late July the Fourth Circuit issued a decision in WEC Carolina Energy Solutions v. Miller, agreeing with Nosal and holding that conduct by an employee that violates the employer’s “use policy” (typically contained in an employee manual, handbook or “computer use policy”) does not give rise to a violation of the CFAA. As the Fourth Circuit stated, “we reject an interpretation of the CFAA that imposes liability on employees who violate a use policy, choosing instead to limit such liability to individuals who access computers without authorization or who obtain or alter information beyond the bounds of their authorized access.”
Under the Fourth Circuit’s interpretation of the statute, (1) without authorization refers to a situation where someone is not authorized to access a computer and accesses it, and (2) exceeds authorized access applies when someone has “approval to access a computer, but uses his access to obtain or alter information that falls outside the bounds of his approved access.”
Under the “broad” view of the CFAA, which has been rejected by the Ninth and Fourth Circuits, employees who have authorized access to a computer, but who exceed the scope of that access, are subject to liability under the statute. The First Circuit, where I practice, has adopted this view of the law. EF Cultural Travel v. Explorica (2001).
There is now a clear circuit conflict over the interpretation of this law. The Ninth and Fourth Circuits read it narrowly, and several other circuits (including the First) apply it broadly. Often, a circuit split over the meaning of a federal statute provides a basis for the Supreme Court to grant review and break the tie. The betting is that this will occur here.
Why does any of this matter? Because in a civil case it enables a plaintiff to get a case that typically rests on state claims, such as conversion, misappropriation of trade secrets or breach of fiduciary duty, into federal court, a venue often preferred by plaintiffs.
Update: The government will not appeal the 9th Circuit decision in the Nosal case. Link to Motion for Issuance of Mandate here.
I’ve posted the slides from a CLE talk I gave on Wednesday, April 25th. Hopefully, the slides are informative standing alone. They address the very recent DMCA decisions by the 9th Circuit (Veoh) and 2nd Circuit (YouTube), the copyright “first sale” doctrine as applied to digital files in the ReDigi case pending in SDNY, and recent trademark “keyword advertising” cases decided in the 4th and 9th Circuits (Rosetta Stone in the 4th Circuit, Network Automation and Louis Vuitton in the 9th). There are also some slides devoted to the CFAA, including the 9th Circuit’s en banc decision in the Nosal case.
Copyright and Trademark Issues on the Internet
A decision in Jagex v. Impulse Software, issued by Massachusetts U.S. District Court Judge Gorton in August, has some interesting (albeit not nonobvious) lessons for software developers seeking to protect their websites from copying or reverse engineering. The decision arises in the context of a preliminary injunction – a request by Jagex to provide it with legal relief at the outset of the case, before discovery or trial – so Jagex may yet prevail in this case, particularly since most of the reasons the court denied it relief at this stage can be corrected before the case progresses much further.
The plaintiff, Jagex, operates an online role-playing game called “Runescape.” Runescape is a “massively multiplayer online role-playing game” (MMORPG for short, but we’ll just call it “the game”).
Impulse offers online cheat tools – software that lets users advance through the levels of the game without actually playing the game. Moving to higher and more challenging levels is the goal of the game, and the Impulse software allows users to reach those hallowed grounds without investing the time and effort the game expects users to endure. And, it is possible to invest a great deal of time and effort with this game – Judge Gorton noted that the top three Runescape players averaged about 20,000 hours of playing time.
Pamela Jones lays out the legal issues in this case on Groklaw, here, where she links to many key documents, and embeds the EFF’s amicus brief, in its entirety.
I was trying to figure out how to explain to you all that is involved in the case of the U.S. v. Lori Drew, the cyberbullying case that so many lawyers are expressing concerns about. I felt I needed a lawyer to explain it, but where would I find one who felt like doing some unpaid work, and over the Thanksgiving holiday to boot?
Then I had a brainstorm. I could show you the amicus brief [PDF] submitted in the case by the Electronic Frontier Foundation, the Center for Democracy and Technology, and Public Citizen, which was also signed by “14 individual faculty members listed in Appendix A who research, teach and write scholarly articles and books about internet law, cybercrime, criminal law and related topics at law schools nationwide”. Appendix A is at the very end. If you look at the list, you’ll see that it’s some of the finest and most knowledgeable lawyers and law professors specializing in cyberlaw. The brief was written by Jennifer Granick of EFF and Philip R. Malone of Harvard Law School’s Berkman Center for Internet and Society’s Cyberlaw Clinic.
I think when you read it, it will turn your hair white.