Web pages are a treasure-trove of useful information for financial firms and software companies that are able to capture it using Web crawling (or scraping) technology. Yet, for over 20 years, courts have struggled to draw the line between the usefulness of such information and the rights of the content owners and website operators from whom that content is derived. Web crawling was once a niche issue, but the increased use of this technology has compounded the disputes related to it.
In particular, website operators have used the Computer Fraud and Abuse Act (CFAA) to prevent crawling of their websites. While recent judicial opinions have harmonized the rules for accessing websites without authorization, the courts diverge as to whether the CFAA prohibits accessing otherwise publicly available information for an unauthorized purpose. Moreover, new Web crawling techniques are testing the limits of existing case law. Cf. Adrianne Jeffries, “How Google Eats a Business Whole,” Outline (April 17, 2017).
Increased Use of Web Crawling
Whether they are finance firms engaged in quantitative analysis or software companies developing new search algorithms, technology-minded businesses routinely and automatically access third-party websites every day, using variations on Web crawling to gather content and information. Generally, they start with a seed list of Web pages from which they will request content, including HTML, text, image, and other files. Then, they copy the files and extract either specific data or the entirety of the files for later analysis.
For example, search engines generally identify hyperlinks and keywords from accessed Web pages, add that information to their database for later analysis to improve their search algorithm, and continue to move across the Internet looking for new sources of content. Technology-savvy businesses, however, continue to develop new uses for search technology. Thus, while early efforts may have involved creating databases of factual information or gathering contact information for marketing solicitations, all manner of uses have been developed, including follow-on and copy-cat services that repeatedly access competitors' platforms as part of their functionality. In addition to potential copyright issues (not discussed here), these new services may raise concerns for website operators if they disrupt the operators' services or damage their servers.
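The crawl loop described above (start from a seed list, request each page, copy it, and follow the hyperlinks it contains) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than any particular company's implementation: the names `crawl`, `LinkExtractor`, and `fetch_over_http`, and the `max_pages` limit, are hypothetical and chosen only for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_over_http(url):
    """Default fetcher (hypothetical helper): request a page, return its text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, fetch=fetch_over_http, max_pages=10):
    """Breadth-first crawl from a seed list; returns {url: copied page text}."""
    queue = deque(seed_urls)
    copied = {}
    while queue and len(copied) < max_pages:
        url = queue.popleft()
        if url in copied:
            continue  # already copied this page
        html = fetch(url)
        copied[url] = html  # keep the file for later analysis
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the page they appeared on.
            queue.append(urljoin(url, href))
    return copied
```

The `fetch` parameter is injectable so that the loop can be exercised against canned pages without touching any third-party server. A production crawler would also need rate limiting and error handling, which this sketch omits.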
Legal Questions
Depending on how such Web crawling is conducted, it could implicate the rights of the content owner, as well as the website operator. Early cases analyzed Web crawling technology through the lens of trespass law and similar rights. See, eBay v. Bidder's Edge, 100 F. Supp. 2d 1058 (N.D. Cal. 2000); Am. Online v. LCGM, 46 F. Supp. 2d 444 (E.D. Va. 1998). It did not take long, however, for content owners to bring claims under copyright law and other intellectual property disciplines against those using Web crawling technology. See, Associated Press v. Meltwater, 931 F. Supp. 2d 537 (S.D.N.Y. 2013); Ticketmaster v. Tickets.Com, No. 99 Civ. 7654, 2003 WL 21406289 (C.D. Cal. March 7, 2003). Today, claims have been raised under a range of legal disciplines, from breach of contract to trade secrets misappropriation to other forms of unfair competition.
One legal issue that merits particular attention is the CFAA, a criminal and civil statute that “prohibits acts of computer trespass by those who are not authorized users or who exceed authorized use.” Facebook v. Power Ventures, 844 F.3d 1058 (9th Cir. 2016). While some have attempted to limit the CFAA's reach by referring to it as merely an “anti-hacking” statute (see, United States v. Nosal, 844 F.3d 1024, 1049 (9th Cir. 2016) (Reinhardt, J., dissenting)), courts have found that Web crawling technology potentially can violate the Act. See, EF Cultural Travel BV v. Zefer, 318 F.3d 58 (1st Cir. 2003); CouponCabin v. Savings.com, No. 2:14 Civ. 39, 2017 WL 83337 (N.D. Ind. Jan. 10, 2017), 2016 WL 3181826 (N.D. Ind. June 8, 2016); Craigslist v. 3Taps, 942 F. Supp. 2d 962 (N.D. Cal. 2013); Snap-on Bus. Solutions v. O'Neil & Assocs., 708 F. Supp. 2d 669 (N.D. Ohio 2010).
All courts to have considered the issue agree that a company using Web crawling technology “can run afoul of the CFAA when he or she has no permission to access a computer or when such permission has been revoked explicitly.” Facebook v. Power Ventures, at 1067. The private right of action under the CFAA also requires that the plaintiff “suffer[] damages or loss,” 18 U.S.C. §1030(g), but loss has been broadly defined and may include the time that the website operator spends “analyzing, investigating, and responding to [the web crawler's] actions.” Id., at 1066. Permission can be revoked in a number of ways, including issuing a cease-and-desist letter, implementing technological measures such as IP address blocking, or revoking login credentials. Id., at 1067; see also, Nosal, at 1036; CouponCabin, at 3.
The courts, however, differ in their approach to those that are given some access but have “exceeded the limits of their authorization” by retrieving material for unauthorized purposes. Facebook, at 1068. As the Ninth Circuit recently reaffirmed, it and the Second and Fourth Circuits have interpreted the CFAA such that these activities are not a violation. Id. (discussing United States v. Nosal, 676 F.3d 854 (9th Cir. 2012)); United States v. Valle, 807 F.3d 508 (2d Cir. 2015); WEC Carolina Energy Solutions v. Miller, 687 F.3d 199 (4th Cir. 2012). The First, Fifth, Eighth, and Eleventh Circuits, by contrast, extend potential liability to access that falls outside the “purposes for which access has been given.” United States v. John, 597 F.3d 263, 272 (5th Cir. 2010); see also, United States v. Teague, 646 F.3d 1119 (8th Cir. 2011); United States v. Rodriguez, 628 F.3d 1258 (11th Cir. 2010); EF Cultural Travel BV v. Explorica, 274 F.3d 577, 581-84 (1st Cir. 2001). While the Supreme Court recently considered application of the CFAA, it did not resolve this well-developed circuit split. Musacchio v. United States, 136 S. Ct. 709 (2016). Two petitions for certiorari are currently pending before the Court: Power Ventures v. Facebook, No. 16-1105 (U.S.); Nosal v. United States, No. 16A840 (U.S.).
As a result, while a Web crawler that accesses websites without authorization or when authorization is revoked violates the CFAA, the courts might reach different results for a company that crawls Web pages that permit public access but prohibit Web crawling or other activities in which the company is engaged.
Practical Considerations
The growing interest in Web crawling among financial firms and software companies suggests that disputes will continue to arise as new technologies are developed. It is therefore important for those engaged in Web crawling to understand that the mere fact that content and information can be found on the Internet does not mean that all means of accessing it are permissible. Moreover, courts have held that accessing a website after authorization has been revoked is not permissible. See, Facebook v. Grunin, 77 F. Supp. 3d 965 (N.D. Cal. 2015).
That being said, while a careful Web crawler might want to review the Terms of Use of each website it intends to capture to confirm that Web crawling is permitted, the Ninth Circuit has expressed concern that requiring such careful analysis is not practical. Nosal, at 861. Thus, it has held that “violation of the terms of use of a website cannot itself constitute access without authorization.” Facebook, at 1068. The First Circuit, by contrast, has held that a “lack of authorization could be established by an explicit statement on the website restricting access,” such as terms of use, but even it has cautioned that “public policy might in turn limit certain restrictions.” EF Cultural Travel BV v. Zefer, at 62. Similarly, courts have considered whether use of technological measures, such as the Robots Exclusion Protocol (or robots.txt), which website operators use to tell Web crawler programs what files or folders should not be visited, might be used as a proxy for such restrictions. See, QVC v. Resultly, 99 F. Supp. 3d 525, 540 (E.D. Pa. 2015); Healthcare Advocates v. Harding, Earley, Follmer & Frailey, 497 F. Supp. 2d 627, 648 (E.D. Pa. 2007).
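For readers unfamiliar with how the Robots Exclusion Protocol operates in practice, the following Python sketch shows a crawler consulting a robots.txt file before requesting a page, using the standard library's `urllib.robotparser` module. The robots.txt content, the `ExampleBot` user agent, the `example.test` URLs, and the `is_allowed` helper are all hypothetical and used only for illustration. The sketch shows the technical mechanism only and says nothing about whether honoring or ignoring such a file bears on authorization under the CFAA.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler would typically run this check before each request.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.test/public/index.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.test/private/report.html"))
```

Under these assumptions the first check passes and the second does not, because only paths beginning with /private/ are disallowed. In a live crawler, the file would instead be retrieved from the target site (for example, with the parser's `set_url` and `read` methods) rather than supplied as a string.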
Conclusion
The ongoing litigations referenced in this article and those filed in the future may provide greater clarity on the bounds of legal Web crawling. For now, businesses using these techniques should tread carefully lest they get caught in the CFAA's Web.
*****
Joshua L. Simmons is an intellectual property partner at Kirkland & Ellis. He can be reached at [email protected]. This article also appeared in the New York Law Journal, an ALM sibling of Internet Law & Strategy.