It's 12 O'Clock: Do You Know Where Your Data Are? IP Protections for Databases

By Gary S. Morris
May 01, 2004

An economist once said that the reason talk is so cheap is because the supply generally exceeds demand. Not so with information. No matter how much is produced, people always seem to want more. And more information means more databases, and the amount of work involved in compiling and organizing information into databases can be staggering. Yet, in many cases, anyone can copy the stored data and essentially replicate all or a portion of the database at a mere fraction of the cost of creating the database in the first place. Some have argued that this freedom to copy acts as a disincentive for anyone to organize information into databases. After all, if the creator can't expect to reap a fair economic reward for the effort expended, why bother?

Patchwork U.S. Protection Scheme

Various aspects of databases can be protected under existing intellectual property regimes. Patents can be used to protect the ideas that underlie new and useful ways of storing, searching for, and retrieving information from a database. For example, Google has several patent applications pending for its successful technology. See, e.g., U.S. Pat. No. 6,678,681, “Information Extraction From a Database.” Copyright law affords protection for creative expression, such as creative material stored in a database and unique ways of compiling the stored information. But that is more or less where the protection that existing law affords to databases stops. Even though they can be costly to compile, databases often are compilations of facts, which cannot be patented or copyrighted, and often employ existing tools and methods that likewise cannot be protected. In other words, the database owner cannot rely on existing patent and copyright law to prevent the reproduction of much of the information and functionality contained in a database.

Does Europe Offer A Solution?

To overcome these limitations of existing intellectual property laws, the European Parliament and the Council of the European Union adopted Directive 96/9/EC, which mandates the establishment of a new right specifically to protect databases. The Directive requires member states to give the maker of a database the right to prevent others from extracting and reusing all or a substantial part of the database, provided the maker can show that creating it involved a substantial investment. This right stands in addition to whatever copyright and patent rights may already belong to the owner of a database. To encourage countries outside Europe to enact similar protections, the Directive includes a reciprocity clause stating that data provided by another country and used in Europe will be protected only if that country's laws include equivalent database protections.

Free Speech and Free Data

To date, the United States has declined to enact such protections. Database protection legislation is routinely introduced in Congress, where it is invariably killed in committee at the urging of free speech advocates and corporations with adverse interests. See, e.g., H.R. 3261, “Database and Collections of Information Misappropriation Act” (2004). Opponents argue that facts must remain freely available to all, in whatever form, and that enacting special protection for databases would impair their availability. See, e.g., Zetter, K: Hands Off! That Fact is Mine. Wired Magazine, March 3, 2004.

If at First Patents Don't Succeed, Try Copyrights

Fortunately, the database owner in the United States is not entirely without recourse to stop copying. Copyright law does protect some aspects of databases. For example, copyright law can protect the way in which the data is organized in the database, provided there is more than one way to do so. The database owner can sue a person for copyright infringement for copying the data from the database and organizing it in the same way as in the owner's database. All the owner needs to prove to show copyright infringement is ownership of a valid copyright, and that the copyrighted material has indeed been copied by the accused infringer. See Feist Publications, Inc. v. Rural Tel. Serv. Co., 499 U.S. 340 (1991).

And yet, proving actual copying can be problematic. The contents of many databases are facts that often are widely available in an uncompiled form from various sources. The accused infringer can argue that he independently compiled the contents of the database, and that he independently created the arrangement used in his database, i.e., that the similarity in arrangement between his database and the owner's database is coincidental. Indeed, independent creation is a defense to copyright infringement.

Courts have widely recognized the difficulties inherent in proving actual copying. See, e.g., Roth Greeting Cards v. United Card Co., 429 F.2d 1106 (9th Cir. 1970). Consequently, copying is usually established by showing that the defendant had access to the allegedly copied work, and that the defendant's work is substantially similar to it. See, e.g., Langman Fabrics v. Graff Californiawear, Inc., 160 F.3d 106, 115 (2d Cir. 1998).

Substantial similarity stands on its own, as the original and the accused databases can be compared side by side. But showing that the defendant had access to the original database can be difficult. Sometimes, accesses to a database are not tracked by the owner. Even if they are, a determined copier can use a third party to access the owner's database and pull data from it, or use electronic anonymizing techniques such as a proxy.

Watermarking: The Invisible Technical Solution?

One way to show access is called “watermarking.” Watermarking involves marking the data in the owner's database so that the markings are invisible, and hence difficult to detect, remove or alter. Publishers of telephone books have long seeded their directories with arbitrarily created false entries that would be copied along with the rest. The problem with electronic databases is that arbitrarily created false entries (e.g., a false name or a false address) could affect the accuracy of the results returned by the database in response to certain search requests from users. Results can be filtered by comparing them to a list of such false entries, but comparing every result to a list of watermarked records can be computationally burdensome.

Ideally, the watermark in a database should have the following three attributes:

  • It should be undetectable so that it cannot be removed or modified;
  • It should be generated using an algorithm that can reproduce the watermark to prove that the watermark is present in copied data; and
  • It should be self-authenticating, meaning that it should provably be the watermark without having to refer to anything beyond the data itself.

For example, consider a database that has millions of records, each of which contains a name, an address and a telephone number. To satisfy the first requirement, each record that relates to a watermark should appear to be legitimate. That is, the name, address and telephone number fields should appear to be plausible entries.

To satisfy the second requirement, a rule or set of rules should be applied to combine the name, address and telephone number in a way that is unlikely to occur in actual data. For example, each of the area code, exchange and last four digits of the telephone number is divided by 26. The remainders (each a number between 0 and 25) plus 1 are concatenated to form the number of the street address in the record. These fictitious records can easily be filtered from results returned in response to a user query by testing the number of the street address in relation to the telephone number, and excluding results having the watermark property.

This example also satisfies the third requirement, because the entire watermark is contained in a particular relationship between one field of the record (the telephone number) and another (the street number in the address).
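The rule in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the record field names ("phone" and "street_number") and the sample telephone number are hypothetical, and the divisor is exposed as a parameter so that values other than 26 can be tried.

```python
def street_number_for(phone: str, modulus: int = 26) -> str:
    """Derive the watermark street number from a 10-digit telephone number.

    Each of the area code, exchange and last four digits is divided by
    `modulus`; each remainder plus 1 is concatenated into one string.
    """
    digits = "".join(ch for ch in phone if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit telephone number")
    parts = (digits[:3], digits[3:6], digits[6:])
    return "".join(str(int(part) % modulus + 1) for part in parts)


def is_watermarked(record: dict, modulus: int = 26) -> bool:
    """Test the record using nothing but its own fields."""
    expected = street_number_for(record["phone"], modulus)
    return str(record["street_number"]) == expected


def filter_results(records: list, modulus: int = 26) -> list:
    """Drop records that carry the watermark before returning query results."""
    return [r for r in records if not is_watermarked(r, modulus)]


# Hypothetical sample data, for illustration only.
planted = {"name": "J. Doe", "street_number": "51013", "phone": "212-555-1234"}
ordinary = {"name": "A. Roe", "street_number": "42", "phone": "212-555-1234"}

print(street_number_for("212-555-1234"))           # 51013
print(len(filter_results([planted, ordinary])))    # 1
```

Because the test looks only at the relationship between two fields of the same record, no separate list of planted records needs to be consulted at query time.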

If the watermarking algorithm is selected properly, it can be shown that the chance of a legitimate record having the properties imputed by the algorithm is astronomically small, and of several records having these same properties, smaller still. This can be compelling evidence that records in another's database have indeed been copied, and were not independently compiled. Assuming the illicit copy retains the structure of the original, both access and substantial similarity can then be shown, and infringement can be proved.
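The "smaller still" point is simple arithmetic, sketched below. The sketch assumes that legitimate records satisfy the rule independently of one another, each with the same small probability; the per-record figure used here is a hypothetical placeholder, not a measured value, and the function name is illustrative.

```python
from fractions import Fraction


def chance_all_match(p_single: Fraction, k: int) -> Fraction:
    """Probability that k independent legitimate records all satisfy
    the watermark rule purely by coincidence."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return p_single ** k


# Hypothetical per-record chance of a coincidental match.
p = Fraction(1, 10_000)

for k in (1, 3, 5):
    print(k, float(chance_all_match(p, k)))
```

Under these assumptions the probability shrinks by a factor of p with every additional matching record, which is why several planted records found together are far more persuasive than one.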

Better still, new watermarking algorithms may be patented. Of course, in many cases, the effectiveness of a watermarking scheme depends upon its secrecy. However, if claimed correctly, this need be no impediment to obtaining patent coverage. The trick is to claim the algorithm parametrically, i.e., in general terms, and then to select and keep secret the parametric values in an implementation of the algorithm. To make this clearer, consider the watermarking scheme described above. The patent claim could be directed toward “dividing each component of the telephone number by a first number M until the remainder is less than M, and then concatenating these remainders to form the number of the street address.” This patent claim could cover all variations of the algorithm stated above. At the same time, the exact number chosen by the database owner for implementing the algorithm (in the above example, 26) can remain secret, thereby protecting the integrity of the watermarks.

So the database owner need not despair of a complete lack of intellectual property protection for his or her work. A carefully considered application of patent- and copyright-related measures can go a long way toward helping to protect the substantial investment involved in compiling and maintaining databases.



Gary S. Morris Kenyon & Kenyon