The proliferation of Internet access and mobile devices has led to an explosion of content on the Web, creating a vast repository of “publicly available” information. This includes not only news, business and financial information, but also personal data, movie and restaurant reviews, concert ticket sales, flight information, and a virtually endless array of other categories. This same technological explosion, however, has made it far easier for third parties to extract this data for commercial sale and use, and to do so for free and without authorization. This data extraction, commonly referred to as “scraping,” “crawling,” or “spidering” (collectively “scraping”), creates legal concerns for both sides: those who want to scrape, and those who want to protect their websites against scraping. See, EF Cultural Travel BV v. Zefer, 318 F.3d 58, 60 (1st Cir. 2003) (“A scraper, also called a 'robot' or 'bot,' is nothing more than a computer program that accesses information contained in a succession of webpages stored on the accessed computer”); eBay v. Bidder's Edge, 100 F. Supp. 2d 1058, 1060 (N.D. Cal. 2000).
While a website can publish instructions that tell scraping software whether scraping is permitted (in a “robots.txt” file), compliance with such instructions is voluntary. See, Bidder's Edge, at 1061.
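To illustrate how voluntary compliance works in practice, the following is a minimal sketch of a scraper checking a robots.txt file before requesting a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the "ExampleBot" user-agent name, and the example.com URLs are hypothetical and are not drawn from any case discussed in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real site publishes its own file
# at the path /robots.txt; these rules are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /listings/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and the URLs below are assumed names, not real services.
allowed_about = may_fetch(ROBOTS_TXT, "ExampleBot", "https://www.example.com/about")
allowed_listing = may_fetch(ROBOTS_TXT, "ExampleBot", "https://www.example.com/listings/123")

print(allowed_about)    # the /about page is not covered by any Disallow rule
print(allowed_listing)  # the /listings/ path is disallowed for all user agents
```

Nothing in this check stops a program from requesting the disallowed page anyway; the file only states the site's preference, which is why its legal significance depends on the theories discussed below.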
This article provides a primer on the legal framework surrounding scraping, addressing both the grounds for potential claims against scrapers, and ways to avoid liability for scraping. The common theories of liability arising from scraping are copyright infringement, trespass to chattels, breach of contract, and violation of the Computer Fraud and Abuse Act (CFAA), 18 U.S.C. §1030. This article also discusses the leading cases applying these legal theories to website scraping, and concludes that the most effective way to create potential claims against scrapers is through carefully drafted prohibitions in a website's Terms of Use. Conversely, the most effective way to defend against a claim of unauthorized scraping is to abide by such Terms of Use, or to establish that scraping constitutes a fair use and does not overburden the servers of the website being scraped.
Copyright Infringement
Scraping inherently involves copying, and therefore one of the most obvious claims against scrapers is copyright infringement. However, such claims are often open to attack on several grounds. First, in order to have standing to bring a claim for copyright infringement, the owner (or exclusive licensee) of the website being scraped must also be the owner of the copyrightable content that is the subject of the claim. See, e.g., Nautical Solutions Mktg. v. Boats.com, No. 8:02-CV-760, 2004 WL 783121, at 2-3 (M.D. Fla. April 1, 2004) (denying post-trial motion for declaration of copyright infringement, because, inter alia, the website that was being scraped did not own the copyright to the data and images that were being copied). This can pose a barrier to bringing a lawsuit if, for example, the content at issue is user-generated (such as videos or reviews), and the rights in the content have not been transferred to the website owner.
Second, copyright law does not protect ideas or facts, but rather only their tangible expression. See, Feist Publ'ns v. Rural Tel. Serv., 499 U.S. 340 (1991). Thus, the scraping of general factual data does not give rise to a viable claim for copyright infringement. For example, in Ticketmaster v. Tickets.com, No. 99-CV-7654, 2003 WL 21406289, at 4-6 (C.D. Cal. March 7, 2003), the court rejected an infringement claim because the material being extracted (factual information regarding concerts and URLs) was not copyrightable. See also, Nautical Solutions, at 2-3 (reaching similar result for scraping of information regarding the sale of yachts).
Third, even if the information copied by the scraper is protectable under copyright law, the defendant may be able to rely upon the “fair use” defense. Under the Copyright Act, courts are to consider the following factors to determine if a use is a fair use: 1) the purpose and character of the use; 2) the nature of the copyrighted work; 3) the amount and substantiality of the portion used in relation to the work as a whole; and 4) the effect of the use upon the potential market for or value of the copyrighted work. See, 17 U.S.C. §107. For example, in Kelly v. Arriba Soft, 336 F.3d 811, 819 (9th Cir. 2003), the court held that the use of scraping software by a search engine to reproduce images in thumbnail form was not a sustainable basis for a claim of copyright infringement, because the thumbnail images created from the full-size scraped images were “transformative” and qualified as a fair use of the images. An in-depth discussion of the nuances of the fair use doctrine is outside the scope of this article. For a discussion of fair use, see, Melville B. Nimmer, 4 Nimmer on Copyright §13.05 (Lexis 2013).
Trespass to Chattels
A trespass to chattels is defined as intentionally dispossessing another of a chattel or using or intermeddling with a chattel in the possession of another. See, Restatement (Second) of Torts §218 (Westlaw 2012); see also, Bidder's Edge, at 1069. This legal theory applies to the Internet inasmuch as a website proprietor has a “fundamental property right to exclude others from its computer system[.]” Id. at 1067. Moreover, even if a website is publicly accessible, its servers are private property, and the proprietor may therefore grant conditional access to users, including prohibitions against scraping. Id. at 1070.
For example, in Bidder's Edge, the court held that excessive scraping can support a claim for trespass to chattels if it taxes the plaintiff's computer system in such a way that would substantially impair it, and, if so, an injunction may be granted. Id. at 1071-72. Specifically, the court held that there was a viable trespass cause of action due to the excessive scraping of eBay's website at the rate of 80,000-100,000 times per day. Id. at 1071.
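For a sense of scale, the daily figures cited in Bidder's Edge can be converted into a per-second request rate. The short sketch below does only that arithmetic, and also shows how a scraper operator might turn a self-imposed daily request budget into a minimum delay between requests. The function names and the 1,000-requests-per-day budget are illustrative assumptions; no court discussed here set a numeric threshold, and staying under any particular rate is not a legal safe harbor.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def requests_per_second(requests_per_day: float) -> float:
    """Convert a daily request count into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


def min_delay_seconds(max_requests_per_day: float) -> float:
    """Average spacing between requests that keeps a scraper within a daily budget."""
    return SECONDS_PER_DAY / max_requests_per_day


# The range cited in Bidder's Edge: 80,000-100,000 requests per day.
low = requests_per_second(80_000)    # roughly 0.93 requests per second
high = requests_per_second(100_000)  # roughly 1.16 requests per second

# Hypothetical self-imposed budget of 1,000 requests per day (an assumption).
delay = min_delay_seconds(1_000)     # 86.4 seconds between requests

print(f"{low:.2f} to {high:.2f} requests per second")
print(f"{delay:.1f} seconds between requests")
```

In other words, the scraping at issue averaged about one request per second, around the clock.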
Similarly, in Register.com v. Verio, 356 F.3d 393, 404-05 (2d Cir. 2004), the Court of Appeals for the Second Circuit held that Verio's use of search robots consumed a significant portion of the capacity of Register's computer system, and that Verio was therefore engaged in a trespass. The court reasoned that if it were to allow these queries, then it was “highly probable” that other companies would begin to do the same, which would likely result in Register's system being “overtaxed and [it] would crash.” Id. at 404. However, in Ticketmaster, the court held that the use of scrapers to extract data was not a trespass to chattels, because there was no evidence that the scraping caused any tangible interference with the operation of Ticketmaster's system. Ticketmaster, at 3.
Breach of Contract
Courts have held that a viable method of preventing scraping is to include prohibitions against scraping in the website's terms of use. See, e.g., Bidder's Edge, at 1067; Zefer, at 62. Such restrictions are generally conveyed to website users through a “clickwrap” or “browsewrap” agreement.
A clickwrap agreement is an online agreement that requires the user to consent to terms and conditions by affirmatively clicking a dialogue box agreeing to the terms before the user can proceed to use a website. See, Specht v. Netscape Commc'ns, 306 F.3d 17, 22 n.4 (2d Cir. 2002); Hines v. Overstock.com, 668 F. Supp. 2d 362, 366-67 (E.D.N.Y. 2009). Clickwrap agreements are generally enforceable, due to the user's clear manifestation of assent, so long as the terms do not violate other basic contract principles (e.g., unconscionability). See, Specht, at 22 n.4. [Editor's Note: For more on clickwrap and browsewrap agreements, see, "Courts Address Clickwrap and Electronic Contracting," in the April 2013 issue of e-Commerce Law & Strategy.]
For example, in Bidder's Edge, the court took note of the fact that the user agreement at the time, to which users were required to click “I Accept,” expressly prohibited “any robot, spider, other automatic device, or manual process to monitor or copy our [W]eb pages or the content contained herein without our prior expressed written permission.” Bidder's Edge, at 1060. The court stated that these terms of use constituted a limited license, and that actions not permitted by this license were restricted. Id. at 1067.
Browsewrap agreements, on the other hand, involve the posting of a link to terms and conditions on a website for users to read, but do not require users to affirmatively manifest assent to the terms and conditions; instead, user consent is implied by continued use of the website. See, Specht, at 25.
The enforceability of such agreements requires a fact-specific inquiry, and turns largely upon the location and accessibility of the terms of use. See, e.g., Specht, at 35; Hines, at 367. According to the Specht court: “Reasonably conspicuous notice of the existence of contract terms and unambiguous manifestation of assent to those terms by consumers are essential if electronic bargaining is to have integrity and credibility.” Specht, at 35 (finding a browsewrap agreement unenforceable).
For example, in Hines the court held that the browsewrap agreement was not enforceable, because in this case the plaintiff had no actual or constructive notice of the terms and conditions of use. See, Hines, at 367. However, in Southwest Airlines v. BoardFirst, No. 3:06-CV-0891, 2007 WL 4823761, at 7 (N.D. Tex. Sept. 12, 2007), where there was evidence that defendant had actual knowledge of Southwest's terms and conditions, but nevertheless continued to use Southwest's website in violation of those terms, the court held that the browsewrap agreement was an enforceable contract.
Terms of Use may also be binding where the terms are reasonably known to the user, even in circumstances in which the terms are not known to the user before the first use of the website. For example, in Register.com, the user was made aware of the Terms of Use only after first accessing the information provided on the website. See, Register.com, at 401-04. The court held that while the Terms of Use were technically neither a clickwrap nor a browsewrap agreement because they were only displayed after the user accessed the information on the website, the restrictions therein were nevertheless enforceable because the user accessed the website repeatedly and therefore was on notice during subsequent visits. Id.
In sum, while statements of assent such as “I agree,” which are often elicited through clickwrap agreements, are preferable and unequivocally reflect a manifestation of assent, the user need not necessarily state the magic words “I agree” (or some similar formulation). See, Id. at 402-03. However, “the website user must have had actual or constructive knowledge of the site's terms and conditions, and have manifested assent to them” in some manner, implicit or explicit. See, Cvent v. Eventbrite, 739 F. Supp. 2d 927, 937 (E.D. Va. 2010); see also, Hines, at 367.
Violation of the CFAA
The CFAA is a federal statute that provides liability for anyone who “intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains ... information from any protected computer.” 18 U.S.C. §1030(a)(4); see also, 18 U.S.C. §1030(g) (providing for civil liability and a private right of action). The CFAA also requires that there be a minimum amount of damages of at least $5,000 over a one-year period. See, 18 U.S.C. §1030(a)(4). Similar to the breach of contract cases discussed above, CFAA cases often hinge upon whether a user had actual or constructive knowledge of the restrictive terms of a website's terms of use (i.e., knowledge that the scraping was “unauthorized”).
For example, in Southwest Airlines v. Farechase, 318 F. Supp. 2d 435, 440 (N.D. Tex. 2004), defendants scraped fare, route and scheduling information from Southwest.com. The court denied a motion to dismiss the CFAA claim because Southwest alleged: i) damages of at least $5,000; and ii) that it had put defendant on actual notice that scraping was prohibited. See, Id. at 439-40; see also, Zefer, at 62-63 (upholding a preliminary injunction issued under the CFAA where defendant had knowledge that scraping was unauthorized).
However, in Cvent, even though the Terms of Use stated that competitors were prohibited from accessing and utilizing the information on the website, the court held that there was no violation of the CFAA. See, Cvent, at 932-34. The court concluded that the terms of use were not sufficiently visible because the link was “buried” at the bottom of the first page, in extremely fine print, and users had to scroll down to see it, thereby rendering them insufficient protection for the site. See, Id.
Conclusion and Proposed Terms of Use
Scraping may be permissible under U.S. law if the content at issue is not subject to copyright protection, if the scraping does not unduly burden the website's servers, and if the website's terms of use do not prohibit scraping or if assent to such terms has not been manifested.
However, if the client's goal is to reduce or protect against scraping, and to establish a potential basis for liability, the website's terms of use should contain language to the following effect, and users should be put on reasonable notice of such terms. This language is, of course, merely provided as an example:
By accessing this website, you accept without limitation or qualification, and agree to be bound and abide by, the following terms and conditions (Terms of Use). [CLIENT] may revise and update these Terms of Use from time to time in its sole discretion. Your continued use of this website following the posting of revised Terms of Use means that you accept and agree to any and all changes to the Terms of Use. You may use this website only for lawful purposes and in accordance with these Terms of Use, and you agree not to: (i) use this website in any manner that could disable, overburden, damage, or impair this website, or interfere with any other use of this website, including, but not limited to, any user's ability to engage in real-time activities through this website; (ii) use any robot, spider or other automatic device, process or means to access this website for any purpose, including to monitor or copy any of the material on this website; (iii) use any manual process to monitor or copy any of the material on this website, or to engage in any other unauthorized purpose without the express prior written consent of [CLIENT]; (iv) otherwise use any device, software or routine that interferes with the proper working of this website; or (v) otherwise attempt to interfere with the proper working of this website.
Anthony J. Dreyer is a partner, and Jamie Stockton is an associate, with Skadden, Arps, Slate, Meagher & Flom. Brittany Bettman, a summer associate, assisted in the preparation of this article.