PII: How New Technology Makes It Easier To Maintain Compliance

By Todd M. Haley
October 28, 2011

Personally identifiable information (PII) is any data about an individual that could potentially identify that person, such as a name, fingerprints or other biometric data, e-mail address, street address, telephone number or Social Security Number (SSN).

A study done at MIT by Latanya Sweeney, now a professor at Carnegie Mellon University, found that 87% of the population in the United States could be uniquely identified by just three pieces of PII: their five-digit zip code, gender and date of birth. This demonstrates that SSNs, while valuable, are not necessary to identify unique individuals.
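To make the idea of identification by combined attributes concrete, here is a minimal Python sketch that measures what share of a record set is unique on zip code, gender and date of birth. The function name, field names and sample records are hypothetical and invented for illustration; this is not the study's method or data.

```python
from collections import Counter

def unique_fraction(records, keys=("zip", "gender", "dob")):
    """Return the share of records whose combination of `keys` appears only once."""
    if not records:
        return 0.0
    # Count how many records share each (zip, gender, dob) combination.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)

# Hypothetical sample data, invented for illustration only.
people = [
    {"zip": "02139", "gender": "F", "dob": "1970-03-14"},
    {"zip": "02139", "gender": "F", "dob": "1970-03-14"},
    {"zip": "02139", "gender": "M", "dob": "1982-11-02"},
    {"zip": "94105", "gender": "F", "dob": "1991-07-30"},
]

print(unique_fraction(people))  # 0.5: two of the four records are unique
```

In this toy set, the two records that share all three attributes cannot be told apart, while the other two can be singled out without any SSN.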

Song-Beverly Act

Until recently, Massachusetts had some of the strictest laws on individual privacy. It was one of the first states in America to pass laws similar to the privacy laws in Europe, which hold that an individual's information is owned and protected by the individual, regardless of where that information is generated. Now California agrees as well. On Feb. 10, 2011, the California Supreme Court released its decision in Pineda v. Williams-Sonoma Stores, Inc., No. S178241, holding that zip code information is PII under the state's Song-Beverly Credit Card Act (Cal. Civil Code 1747, (1971)). The court's decision restricts businesses in California from requesting and recording a person's zip code as part of a credit card transaction.

Under the Song-Beverly Act, a business is prohibited from requesting, or requiring as a condition to accepting a credit card payment, the cardholder's personal information. The California Court of Appeal had previously held that a zip code, without additional information, was not PII in Party City Corp. v. Superior Court, 169 Cal.App.4th 497 (2008). However, in Pineda, the California Supreme Court clarified California's broad interpretation of PII.

In addition to these cases, PII is now regulated by many different agencies, including but not limited to the U.S. Securities and Exchange Commission (SEC), United States Nuclear Regulatory Commission, Federal Trade Commission (FTC), Federal Communications Commission, Social Security Administration, and the Department of Health and Human Services.

For more information on PII and associated regulations, see these two white papers: Protecting Consumer Privacy in an Era of Rapid Change: A Proposed Framework for Businesses and Policymakers, from the FTC (http://bit.ly/otyOPK) and Guide to Protecting the Confidentiality of Personally Identifiable Information (PII), from the National Institute of Standards and Technology (NIST), a division of the U.S. Department of Commerce (http://1.usa.gov/aU94uy).

Identify and Remediate PII

One of the best ways that corporations and law firms can identify and manage PII is through a data topology map, a diagram showing where data resides on their corporate network (see “Locations of PII” in the white paper, Is Personally Identifiable Information (PII) Pervasive on Your Company's Computers?, located at www.eteraconsulting.com/resource-center/articles). In the past, these types of data topology maps could only be created through manual review and labor. However, with the advent of advanced information management systems (e.g., StoredIQ, Kazeon, Clearwell), corporations are now able to create automatic data topology maps.

The beauty of these new automatic data topology maps is that they can also be used dynamically to maintain up-to-date records, remediate the data (identify, classify, isolate, delete), and provide exhaustive reporting about the data. Additionally, these technologies can be used to enforce and report on compliance and risk matters on an ongoing basis.
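As a rough illustration of what sits behind a data topology map, the Python sketch below groups scan findings by storage location and PII type. The function name, location names, type labels and sample findings are all hypothetical; the commercial products named above are not claimed to work this way.

```python
from collections import defaultdict

def build_topology(findings):
    """Group (location, pii_type) findings into {location: {pii_type: file_count}}."""
    topology = defaultdict(lambda: defaultdict(int))
    for location, pii_type in findings:
        topology[location][pii_type] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {loc: dict(types) for loc, types in topology.items()}

# Hypothetical findings, one (location, PII type) pair per flagged file.
findings = [
    ("fileserver01/hr", "ssn"),
    ("fileserver01/hr", "ssn"),
    ("fileserver01/hr", "credit_card"),
    ("exchange/mailbox-archive", "ssn"),
]

topology = build_topology(findings)
for location, counts in topology.items():
    print(location, counts)
```

Because the map is just an aggregation over findings, rerunning it after each scan keeps the picture current, which is the dynamic quality described above.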

Each of these new tools provides advanced searching, filtering, enforcement, maintenance and reporting for PII governance. To provide some specifics around these concepts, let's look at a particular case study and technology solution as a means of review.

The Case Study

One of the largest utility companies in the United States completed an internal audit at one of its sites and found some instances of employees' PII, particularly SSNs, on e-mail and file servers. When the company came to us to perform a more extensive analysis, we provided two options, but made a specific recommendation based on its requirements. We recommended selecting the technology that could: 1) be attached to the network by a single cable and require no software to be installed on the company's network; 2) index large amounts of data quickly without performance hits to the network (important due to a drop-dead deadline); and 3) produce highly customized reports quickly and easily.

The solution first performed a quick crawl (“thindex”) of the network, which provided the client with a solid data sample from which to make decisions. A full-text index of the targeted system areas was then completed, covering approximately 17 TB and 30 million files in approximately five days. Next, a few pre-defined queries were run for items such as SSNs, credit card numbers and vital records (using a library of over 65,000 medical terms). Further analysis was then performed by creating customized patterns to identify specific corporate credit cards, passports, checking and savings account numbers and much more. While the corporation expected to find only a few thousand records, the technology identified over 76,000 files that contained PII. (See the specific results in the aforementioned white paper, Is Personally Identifiable Information (PII) Pervasive on Your Company's Computers?)
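Pattern-based queries of this kind can be sketched in a few lines of Python. The example below is an assumption-laden illustration, not the product's actual query library: the two regular expressions are deliberately simplified, the function names are hypothetical, and the Luhn checksum is a standard technique used here to reduce false positives on card-like digit runs.

```python
import re

# Simplified patterns (assumptions); real SSN and card formats have more rules.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators

def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0

def find_pii(text):
    """Return (type, matched_text) pairs for SSN-like and card-like strings."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    hits += [("credit_card", m.group())
             for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return hits

# Made-up sample text; 4111 1111 1111 1111 is a widely used test card number.
sample = "SSN 123-45-6789 on file; card 4111 1111 1111 1111 expired."
print(find_pii(sample))
```

Customized patterns, such as a specific corporate card prefix, would be additional expressions of the same kind. A production scanner would also need to handle file formats, encodings and context, which this sketch ignores.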

The Technology Solution

The technology solution selected in the case study was an enterprise solution called StoredIQ. Because the technology is nonintrusive, StoredIQ could be configured to the specific needs of the corporation. In the case study above, speed was essential, so the technology was delivered in a full-rack, seven-server, large-memory cluster with extensive space to handle the large indexing logs and rapid movement of data required. In other instances, the technology has been used as a smaller virtual solution, a single-server solution or other configurations to achieve different goals. This flexibility allows corporations to analyze the three pillars of projects (time, cost and quality) to determine which solution best fits their overall challenge. The advantage of StoredIQ, in this case study and the PII arena, is its ability to use its policy manager to automate repeatable, defensible actions to ensure that PII patterns are not present within the unstructured data sources.

StoredIQ achieves this flexibility by creating and maintaining deep indexes into data sources, including network storage systems (NetApp, IBM), workstations, e-mail applications (Microsoft Exchange, Lotus Domino), e-mail archives (Symantec Enterprise Vault, EMC EmailXtender) and content management systems (SharePoint), to enable customers to analyze unstructured data in near real time. Through a browser interface, the technology provides ongoing information on how much data is in place, where it sits, who is involved and how much PII is present within the enterprise. By using advanced linguistics technology, including keyword and semantic searching, the software is able to tag and classify data across multiple data points and consolidate it for easy analysis.

As with all indexing technologies, the capabilities have to be balanced so that the indexing does not affect overall network performance, while still maintaining a reasonable indexing speed. In the case study above, network performance was independently monitored and there was minimal effect on the target network or servers. StoredIQ's parallel and distributed operations allowed the system to make adjustments as it indexed to ensure that it did not pass certain thresholds. Additionally, the scheduling capability provided within the technology allowed indexing sessions to run automatically during non-peak hours. As stated above, its fault-tolerant, clustered design, along with its federation capabilities, did not require any additional software or hardware to be provided on the target network.
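The non-peak scheduling idea can be illustrated with a small Python check that decides whether a given time of day falls inside an indexing window. This is a generic sketch, not StoredIQ's scheduler; the function name and the 22:00 to 06:00 default window are assumptions chosen for illustration.

```python
from datetime import time

def in_off_peak(now, start=time(22, 0), end=time(6, 0)):
    """Return True if `now` falls inside the window, which may wrap past midnight."""
    if start <= end:
        # Same-day window, for example 01:00 to 05:00.
        return start <= now < end
    # Overnight window, for example 22:00 to 06:00.
    return now >= start or now < end

print(in_off_peak(time(23, 30)))  # True
print(in_off_peak(time(12, 0)))   # False
```

An indexing job could call a check like this before starting each batch and pause when it returns False, which limits heavy reads to hours when fewer users are on the network.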

For this particular project, the advanced libraries, the ability to move through terabytes of data at lightning speed without performance degradation, and the dynamic, customizable reports provided this corporation with an easier way to maintain and enforce PII compliance.
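For a sense of scale, the case study's indexing figures (approximately 17 TB and 30 million files in approximately five days) can be turned into average rates. The short Python calculation below assumes decimal terabytes and continuous, around-the-clock operation; neither is stated in the case study, so the results are only rough averages.

```python
TB = 10**12  # assumption: decimal terabytes (10^12 bytes)

def average_rates(total_bytes, total_files, days):
    """Return (bytes per second, files per second) averaged over `days` of continuous work."""
    seconds = days * 24 * 60 * 60
    return total_bytes / seconds, total_files / seconds

bytes_per_s, files_per_s = average_rates(17 * TB, 30_000_000, 5)
print(f"{bytes_per_s / 10**6:.1f} MB/s, {files_per_s:.1f} files/s")
# prints "39.4 MB/s, 69.4 files/s"
```

If indexing was restricted to non-peak hours, the sustained rate during those hours would have been correspondingly higher than this average.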

What's in It for You?

Whenever technology articles are read, the first question to ask is “what's in it for me?” The continuing growth of corporate data, with petabytes becoming the new key phrase in the next 18 months, means that more PII will be prevalent. At the same time, the privacy concerns of citizens are also increasing, and thus more regulations are being created to combat the presence of this PII.

With this immense pressure, there is no way that manual monitoring and enforcement can provide corporations and law firms with the protection that they need. Through advanced PII identification solutions, corporations can begin to put a solid, automated process in place that allows for monitoring, maintenance and reporting to occur more regularly. And with the introduction of managed services on top of these solutions, corporations can bring the right people, the right technology and the right processes to bear, providing innovation in the management of PII.


Todd M. Haley is the Vice President of e-Discovery at eTERA Consulting, LLC (www.eteraconsulting.com). Haley consults on e-discovery and data management matters. In his current position, as well as in his previous experience as the Chief Technology Officer of a litigation law firm, Haley develops strategies, protocols and project management models.

