One of the enduring mysteries of Internet law is the legality of “web scraping.” Although scraping is invisible to most users, it pervades the Internet and accounts for a substantial share of all Internet traffic.
Here are two examples of scraping: (1) Clearview AI has scraped billions of publicly available images from social media platforms and compiled them into a facial recognition database that it has made available to law enforcement and private industry. (2) hiQ Labs has scraped publicly available information from profiles on LinkedIn and used it to analyze and predict employees’ likelihood of seeking other employment.
Both of these companies engaged in web scraping without permission. Did either of them violate any laws? This post will examine how the courts have treated hiQ’s actions, but the broader context, beyond that case, is that web scraping is often a legal enigma. As one commentator noted, “most often the legal status of scraping is characterized as something just shy of unknowable, or a matter entirely left to the whims of courts ….” Sellars, Twenty Years of Web Scraping and the Computer Fraud and Abuse Act, p. 377 (link).
But first, what is web scraping? Here’s a definition provided by the Electronic Frontier Foundation: “web scraping is machine-automated web browsing that accesses and records the same information which a human visitor to the site might do manually.” Typically this function, also called data scraping, is performed by an Internet bot, or simply “bot,” a software program that runs automated tasks (scripts) over the Internet. A well-known example of this is Google, which uses its web scraper, “Googlebot,” to collect data from the Internet that is then indexed for searching via Google’s Internet search software.
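For readers who have never seen one, the following is a minimal sketch of what a scraper does, written in Python using only the standard library. Everything in it is an assumption made for illustration: the sample page, the field names and the `ProfileParser` class are hypothetical and do not describe any real website or any party's actual software. To keep the sketch self-contained, the "page" is a string embedded in the program rather than something fetched over the Internet.

```python
from html.parser import HTMLParser

# Hypothetical public profile page, embedded as a string so the sketch
# runs without touching any real website.
SAMPLE_PAGE = """
<html><body>
  <div class="profile">
    <h1 class="name">Jane Example</h1>
    <p class="title">Data Analyst</p>
    <p class="employer">Example Corp</p>
  </div>
</body></html>
"""


class ProfileParser(HTMLParser):
    """Records the text of elements whose class is one of the wanted fields."""

    WANTED = {"name", "title", "employer"}  # hypothetical field names

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.WANTED:
            self._current = css_class

    def handle_data(self, data):
        # Store the first non-blank text that follows a wanted tag.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None


def scrape(html):
    """Return the wanted fields found in one page as a dictionary."""
    parser = ProfileParser()
    parser.feed(html)
    return parser.record


if __name__ == "__main__":
    print(scrape(SAMPLE_PAGE))
```

A real bot would wrap a function like `scrape` in a loop that requests many pages and stores the results, which is what makes scraping "machine-automated": the program reads the same text a human visitor would see, only far faster and at far greater scale.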
Internet scraping is often done without the permission of the people and companies that post information on websites. Courts and legal commentators have identified many legal claims that an aggrieved party might assert against scrapers. These include trespass to chattels, copyright infringement, misappropriation, unjust enrichment, conversion, breach of contract and breach of privacy. See Sobel, A New Common Law of Web Scraping (link).
Most of these claims remain theoretical and untested in the courts. However, there is one law that has been used successfully to challenge scraping – the Computer Fraud and Abuse Act, 18 U.S.C. § 1030, or the “CFAA.” (link)
The Computer Fraud and Abuse Act
The CFAA – the federal anti-hacking law – imposes civil and criminal liability for certain acts of computer trespass. The hiQ/LinkedIn case focused on the CFAA’s “without authorization” provision. This section of the law imposes liability on “[w]hoever … intentionally accesses a computer without authorization … and thereby obtains … information ….”
The CFAA applies to any computer connected to the Internet. Therefore, the CFAA may be violated when someone accesses a website “without authorization.” However, the words “without authorization” are undefined, leaving it to the courts to decide how they should be applied.
What if a company scrapes data from a website that has required it to agree to contractual terms and conditions that bar scraping? In that case the scraper may have acted “without authorization” and therefore violated the CFAA. At the very least, it will be in breach of contract.
But what if the website is public facing – that is, it makes information available to visitors without the use of a password – and the site owner demands that a scraper stop? Are the scraper’s actions now “without authorization”? That was the issue the Ninth Circuit recently decided in hiQ Labs, Inc. v. LinkedIn Corp. (9th Cir. April 18, 2022).
hiQ Labs v. LinkedIn Corp.
hiQ Labs v. LinkedIn has an unusual legal posture and complex history spanning five years. hiQ, a corporate data analytics company, uses automated software to collect information that LinkedIn users share on their public profiles. LinkedIn.com is a public facing website whose users own the information they provide to LinkedIn. LinkedIn tried to stymie hiQ with IP blocking, but this proved unsuccessful. LinkedIn then demanded that hiQ stop scraping its site, asserting that it violated the CFAA. After receiving this demand hiQ filed suit on the theory of tortious interference, seeking a declaratory judgment that LinkedIn could not lawfully invoke the CFAA or use technological measures to stop it from scraping LinkedIn.com.
hiQ scored an initial victory before the district court – the court issued a preliminary injunction ordering LinkedIn to withdraw its cease-and-desist letter and remove any existing technical barriers to hiQ’s access to public profiles.
LinkedIn appealed, leading to a decision by the Ninth Circuit, a Supreme Court appeal and remand (remanded in light of Van Buren, below, without opinion), and a second decision by the Ninth Circuit. At all times during this convoluted procedural history the central question was whether, once LinkedIn demanded that hiQ cease scraping the site, any further scraping of LinkedIn’s data was “without authorization” in violation of the CFAA. Simply put, once LinkedIn demanded that hiQ stop scraping, did hiQ violate the CFAA by failing to comply?
After much discussion, including analysis of the wording of the statute, case precedents and the legislative history, the Ninth Circuit upheld the preliminary injunction in hiQ’s favor, stating –
the CFAA’s prohibition on accessing a computer “without authorization” is violated when a person circumvents a computer’s generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA. The data hiQ seeks to access … has not been demarcated by LinkedIn as private using … an authorization system. hiQ has therefore raised serious questions about whether LinkedIn may invoke the CFAA to preempt hiQ’s possibly meritorious tortious interference claim.
The court referenced the “gates-up-or-down” inquiry that the Supreme Court established in Van Buren v. United States (USSC 2021), which involved the “exceeds authorized access” prong of the CFAA (not at issue in hiQ v. LinkedIn):
In other words, applying the ‘gates’ analogy to a computer hosting publicly available webpages, that computer has erected no gates to lift or lower in the first place. Van Buren therefore reinforces our conclusion that the concept of ‘without authorization’ does not apply to public websites.
The court also touched on an antitrust-flavored policy rationale that supported hiQ’s access:
. . . the public interest favors hiQ’s position. . . . giving companies like LinkedIn free rein to decide, on any basis, who can collect and use data—data that the companies do not own, that they otherwise make publicly available to viewers, and that the companies themselves collect and use—risks the possible creation of information monopolies that would disserve the public interest.
Based on this reasoning the Ninth Circuit refused to dissolve the preliminary injunction against LinkedIn, sending the case back to the district court for further proceedings.
What does the decision mean for data aggregators and researchers who use bots to “scrape” information from public facing websites? With some qualifications, it is a win for both non-profit researchers and for-profit companies like hiQ and Clearview AI, which seek to scrape and exploit data commercially. Under this decision, a public facing, “gates up” website cannot use the CFAA to demand that a scraper stop.
However, there are limits. For example, aggregators need to be careful not to copy expression that may be protected by copyright. That was not an issue in the LinkedIn case, since users retain ownership of their profiles, and therefore LinkedIn has no copyright interest in the data contributed by its users.
The ruling creates an incentive for websites to shield information behind a log-in page and terms and conditions barring data scraping, although whether scraping in defiance of such barriers violates the CFAA in every instance remains uncertain.
State law causes of action, particularly common law trespass to chattels, are an undeveloped but possibly viable theory for websites seeking to block scraping. The Ninth Circuit called this out specifically: “it may be that web scraping exceeding the scope of the website owner’s consent gives rise to a common law tort claim for trespass to chattel.” Whether this theory will hold water remains to be seen. In the meantime, idiosyncratic state laws may come into play. Clearview AI – the company that created a facial recognition database – has been targeted for violation of the Illinois Biometric Information Privacy Act. (link)
Lastly, the Ninth Circuit is only one of many federal circuits, and other circuits may disagree with the conclusion reached in this decision. The day may come when the Supreme Court decides the legality of web scraping of public facing data under the CFAA. hiQ v. LinkedIn is ongoing, and perhaps this very case will end up back before the Supreme Court.
A Postscript on Technical Barriers
Any discussion of this case would be incomplete without a mention of the technical barriers issue. Recall that the trial court ordered LinkedIn to remove any existing technical barriers to hiQ’s access to public profiles. Specifically, the trial court entered a preliminary injunction enjoining LinkedIn from “blocking or putting in place any [technical] mechanism with the effect of blocking hiQ’s access to LinkedIn member public profiles.” This included IP address blocking, which LinkedIn had attempted before the lawsuit began. The trial court entered this order based on the doctrines of tortious interference and unfair competition.
This order was upheld by the Ninth Circuit on both appeals.
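For non-technical readers, it may help to see what IP address blocking looks like in its simplest form. The Python sketch below is an illustration under stated assumptions, not a description of LinkedIn's actual systems: the blocklist, the `is_blocked` and `handle_request` functions and the status codes are all hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical blocklist. These are reserved documentation ranges,
# not addresses associated with any real party.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # an entire range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_blocked(client_ip):
    """Return True if the client's address falls within a blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def handle_request(client_ip):
    """Return an HTTP-style status code for a request to a public page."""
    # 403 = refused, 200 = page served. No password is involved either way.
    return 403 if is_blocked(client_ip) else 200


if __name__ == "__main__":
    for ip in ("203.0.113.5", "198.51.100.17", "198.51.100.18"):
        print(ip, handle_request(ip))
```

Because a check of this kind looks only at where a request comes from, any request arriving from an address outside the list is served normally, and the page itself remains open to the public without a log-in.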
If this aspect of the case has you puzzled, you are not alone. If you are a technologist, you may be wondering why, even if LinkedIn didn’t have a CFAA claim against hiQ, it should be unable to take steps to block hiQ from accessing its site. If you’re a lawyer, you know that claims of tortious interference and unfair competition are difficult to maintain, and you may be wondering how LinkedIn’s IP blocking could be a violation of either doctrine.
None of the courts that have issued rulings in this case have addressed these issues, other than in passing.
Now that the case is back in the district court this aspect of the case – which should be of ongoing interest to both the technical and legal Internet communities – will likely receive further attention.