Less Is More in Database Discovery

By Michael Spencer and Diana Fasching
November 29, 2012

In legal discovery, it is not uncommon to see production requests for a copy of an entire database instead of requests for targeted, relevant information.

For example, in the investigation of an age discrimination claim, a party may request a copy of an entire human resources database instead of asking for the specific data relevant to the claim. Many parties incorrectly assume that such broad requests facilitate a more complete production and eliminate the risk of inadvertently failing to request a key piece of data. On the contrary, a full database production may actually omit important data, because the database (which stores the data) works in concert with the application (which presents the data to users), and the application may derive and display values that the database never stores.

Employee age, for instance, is a constantly changing value because it depends, in part, on the current date. The application therefore calculates and displays employee age from the date of birth (stored in the database) and the current date.

Key Principles

Corporate counsel with good lines of communication with, and support from, internal information technology and/or e-discovery personnel likely will not need an in-depth understanding of specific databases and applications. Counsel should, however, understand several key database principles that can affect the discovery process. The following eight principles explain why “less” (i.e., targeted requests or productions) can be “more” (i.e., readily giving counsel all of the relevant information) in database discovery. (See, “The Sedona Conference Database Principles Addressing the Preservation & Production of Databases & Database Information in Civil Litigation,” (Conrad J. Jacoby et al. eds., Public Comment Version 2011) (http://bit.ly/Sr0OcR), hereinafter Sedona Database Principles.)

1. Calculated on the Fly

The employee age example illustrates a common practice in which the application performs calculations on the fly when displaying data to users. Indeed, most databases do not store values such as employee age, years of service, or leave duration. Instead, the database relies on the application to calculate each time period from the stored date values and the current date. The applications use formulas, coded by programmers and ranging from very simple to extremely complex, that generally reside outside the database, either in the application software code or in report templates. A copy of the database alone may therefore lack both the calculated values and the formulas needed to reproduce them, omitting potentially relevant information.

By contrast, a request tailored to the relevant data allows the requesting party to obtain more accurate and reliable information. Counsel, however, must realize that “[d]ue to differences in the way that information is stored or programmed into a database, not all information in a database may be equally accessible, and a party's request for such information must be analyzed for relevance and proportionality” (see, database principle #2 in the Sedona Database Principles at 26-30).
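A minimal Python sketch illustrates how an application might derive age and years of service at display time. The record layout, field names, and the `whole_years_between` and `derived_fields` helpers are hypothetical illustrations, not code from any particular HR system. Note that the stored record holds only the two dates, so a copy of it contains neither calculated value.

```python
from datetime import date

# Hypothetical stored record: the database keeps only raw dates, not age or tenure.
stored_row = {
    "employee_id": 1001,
    "date_of_birth": date(1960, 3, 15),
    "hire_date": date(1995, 7, 1),
}


def whole_years_between(start: date, as_of: date) -> int:
    """Count completed years from start to as_of (used for age and years of service)."""
    years = as_of.year - start.year
    # Subtract one if the anniversary has not yet arrived in the as_of year.
    # (Under this convention, a Feb. 29 anniversary is treated as reached on Mar. 1.)
    if (as_of.month, as_of.day) < (start.month, start.day):
        years -= 1
    return years


def derived_fields(row: dict, as_of: date) -> dict:
    """Values an application might compute at display time; none is stored in the row."""
    return {
        "age": whole_years_between(row["date_of_birth"], as_of),
        "years_of_service": whole_years_between(row["hire_date"], as_of),
    }


# The same stored record yields different values depending on the "current" date.
print(derived_fields(stored_row, date(2012, 11, 29)))  # {'age': 52, 'years_of_service': 17}
print(derived_fields(stored_row, date(2013, 7, 1)))    # {'age': 53, 'years_of_service': 18}
```

In a real system the formula might live in a report template rather than in application code, and conventions such as the handling of leap-day birthdays may differ; the sketch shows only that these values exist nowhere in the stored row.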

2. Translation

Terminology changes over time. For example, business users who once said “paid leave” may later call it “paid leave of absence.” Fundamentally, though, the two terms denote the same concept, and it would be a programming nightmare to change underlying database structures or application code for every such whim. Instead, programmers rely on “translate values,” in which, for example, the letter P stands for “paid leave” or, later, “paid leave of absence.” Translate values allow the data stored in underlying tables and application code to remain constant (as “P”) even as business users change terminology. The trouble is that unless counsel understand how and when translate values come into play, they may end up with meaningless data. In the above example, if counsel asked for a copy of the database and looked at employee status, counsel would see only “P” and would have to look elsewhere, likely in another table within the database, to find its meaning. With a tailored request for employee status, counsel would instead receive the meaningful description “paid leave of absence” with no extra work.
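A translate table can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and codes below are hypothetical. The sketch shows that the stored code stays “P” while the description users see changes, and that a raw copy of the employee table exposes only the code.

```python
import sqlite3

# Hypothetical schema: the employee table stores a one-letter status code,
# and a separate translate table maps each code to its current description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, status_code TEXT);
    CREATE TABLE status_translate (status_code TEXT PRIMARY KEY, description TEXT);
    INSERT INTO employee VALUES (1001, 'P'), (1002, 'A');
    INSERT INTO status_translate VALUES ('P', 'Paid leave'), ('A', 'Active');
""")

# Targeted query: resolve each stored code through the translate table.
TARGETED_QUERY = """
    SELECT e.employee_id, t.description
    FROM employee AS e
    JOIN status_translate AS t ON t.status_code = e.status_code
    ORDER BY e.employee_id
"""

# A raw copy of the employee table shows only the stored codes.
raw = conn.execute(
    "SELECT employee_id, status_code FROM employee ORDER BY employee_id"
).fetchall()

# Business users rename the status; only the translate table changes.
conn.execute(
    "UPDATE status_translate SET description = 'Paid leave of absence' "
    "WHERE status_code = 'P'"
)

translated = conn.execute(TARGETED_QUERY).fetchall()

print(raw)         # [(1001, 'P'), (1002, 'A')]
print(translated)  # [(1001, 'Paid leave of absence'), (1002, 'Active')]
```

Because the employee rows never change, the meaning of “P” can be recovered only by also obtaining, and correctly joining, the translate table.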

3. Combining Data

Enterprise databases, such as those used by human resources departments, often consist of thousands of tables. To search such a database, one must use queries that combine tables into a useful representation, a process that generally requires a trained analyst familiar with the underlying database model (i.e., field and table structures and relationships), because database systems often lack sufficient published documentation. For example, a typical human resources database contains one table that stores each employee's status, date of birth, date of hire, and a unique identifier that distinguishes that employee from all others. An analyst combines these values with other tables, using the unique identifier as the link, to determine additional information about an employee, such as job history and salary. A clear understanding of the database structure, the relationships between tables, and the appropriate query language is essential to retrieving information from a database.
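The following sketch, using an in-memory SQLite database with hypothetical tables and values, shows why no single table answers even a simple question such as an employee's job and pay history: the query must join the tables on the unique employee identifier. Matching salary rows to job rows on a shared effective date is a simplifying assumption made for illustration; real systems often record job and pay changes on different dates and need more involved logic.

```python
import sqlite3

# Hypothetical three-table HR schema linked by a unique employee_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, status_code TEXT,
                           date_of_birth TEXT, hire_date TEXT);
    CREATE TABLE job_history (employee_id INTEGER, effective_date TEXT, job_title TEXT);
    CREATE TABLE salary (employee_id INTEGER, effective_date TEXT, annual_salary INTEGER);

    INSERT INTO employee VALUES (1001, 'A', '1960-03-15', '1995-07-01'),
                                (1002, 'A', '1985-09-30', '2010-02-15');
    INSERT INTO job_history VALUES (1001, '1995-07-01', 'Discovery Analyst I'),
                                   (1001, '2013-01-01', 'Discovery Analyst II'),
                                   (1002, '2010-02-15', 'Records Clerk');
    INSERT INTO salary VALUES (1001, '1995-07-01', 40000),
                              (1001, '2013-01-01', 55000),
                              (1002, '2010-02-15', 35000);
""")

# Join all three tables on employee_id (and, under our assumption, effective_date).
rows = conn.execute("""
    SELECT e.employee_id, e.date_of_birth, j.effective_date, j.job_title, s.annual_salary
    FROM employee AS e
    JOIN job_history AS j ON j.employee_id = e.employee_id
    JOIN salary AS s ON s.employee_id = j.employee_id
                    AND s.effective_date = j.effective_date
    WHERE e.employee_id = ?
    ORDER BY j.effective_date
""", (1001,)).fetchall()

for row in rows:
    print(row)
```

Choosing the wrong join condition (for example, joining on employee_id alone) would pair every job row with every salary row, which is one reason the article recommends leaving such queries to analysts who know the data model.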

4. Data Changes over Time

A database can maintain both current and historical information. For example, human resources databases typically track distinct events throughout the course of an individual's employment, including hire dates, promotion and pay raise dates, military leave dates, and the like. Databases use these dates to track the chronological order in which the events occurred. Architects also design databases to account for multiple actions that occur on the same day but must be recorded separately for reporting and other business purposes. For example, an annual performance pay rate increase may take effect on the same date as a promotion pay rate increase (e.g., from Discovery Analyst I to Discovery Analyst II). Both increases may be effective on 1/1/2013 yet entered separately and sequenced, resulting in two rows of data for the same employee on the same date. It is far easier and more reliable to ask for the targeted information and let the system's trained analysts handle the nuances of combining related tables and historical changes.
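The same-day scenario can be sketched with a hypothetical `pay_rate` table keyed on employee, effective date, and an effective sequence number. The table layout, column names, and values are assumptions for illustration only. A query that looks only at the latest date returns two rows for a single day, while a query that also orders by the sequence number identifies the last action entered for that day.

```python
import sqlite3

# Hypothetical effective-dated table: one row per pay action, with a sequence
# number distinguishing multiple actions on the same effective date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pay_rate (
        employee_id INTEGER, effective_date TEXT, effective_seq INTEGER,
        reason TEXT, annual_salary INTEGER,
        PRIMARY KEY (employee_id, effective_date, effective_seq)
    );
    INSERT INTO pay_rate VALUES
        (1001, '2012-01-01', 0, 'Annual performance increase', 50000),
        (1001, '2013-01-01', 0, 'Annual performance increase', 52000),
        (1001, '2013-01-01', 1, 'Promotion to Discovery Analyst II', 58000);
""")

# Naive query: filtering on the latest date alone returns two rows for one day.
naive = conn.execute("""
    SELECT reason, annual_salary FROM pay_rate
    WHERE employee_id = 1001
      AND effective_date = (SELECT MAX(effective_date) FROM pay_rate
                            WHERE employee_id = 1001)
    ORDER BY effective_seq
""").fetchall()

# Sequence-aware query: latest date first, then the highest sequence on that date.
current = conn.execute("""
    SELECT reason, annual_salary FROM pay_rate
    WHERE employee_id = 1001
    ORDER BY effective_date DESC, effective_seq DESC
    LIMIT 1
""").fetchone()

print(naive)    # [('Annual performance increase', 52000), ('Promotion to Discovery Analyst II', 58000)]
print(current)  # ('Promotion to Discovery Analyst II', 58000)
```

Dates are stored as ISO-format text here so that they sort chronologically; a reviewer unaware of the sequence column could easily misread the two 1/1/2013 rows as a duplicate or as conflicting salaries.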

5. Protected Information

While privacy issues are beyond the scope of this article, the handling of protected information poses significant risks. Counsel can avoid many privacy-related issues by neither requesting nor producing protected data that is not relevant to the matter. For example, human resources databases often store employee Social Security numbers. Unless a party needs the numbers to address the matter, it should refrain from requesting them and so avoid the additional effort and expense required to protect this type of sensitive information.

6. Efficiency and Security

As one commentator warns, “databases employ techniques to optimize performance and protect confidentiality that can result in responsive data being missed, even by an apparently competent operator: [n]ever assume that a query searches all of the potentially responsive records, and never assume that the operator knows what they are doing” (Craig Ball, “Ubiquitous Databases,” Law Tech. News (Dec. 1, 2010), http://bit.ly/U0WG0L, emphasis added). Built-in security constraints may prevent a query from retrieving all of the data, even if the query itself appears perfectly constructed. This “security trimming” limits search results based on the identity of the user submitting the query. Because users cannot see data they are not permitted to access, they generally will not know what relevant data the query failed to retrieve unless they are familiar with the business and the nature of its data. Counsel is therefore better off asking for targeted, relevant information and letting those most familiar with the data query, validate, and present it in a useful form.
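Security trimming can be illustrated with a toy Python model. The employee records, department names, user roles, and the `search` function are all hypothetical; in practice such rules are enforced by the database or the application's security layer, not by a Python list. The point is that a restricted user running the same search receives fewer results, and nothing in the output signals that rows were excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    employee_id: int
    department: str
    status_code: str


# Hypothetical records and access rules (which departments each user may see).
EMPLOYEES = [
    Employee(1001, "Sales", "P"),
    Employee(1002, "Sales", "A"),
    Employee(1003, "Engineering", "P"),
    Employee(1004, "Legal", "P"),
]
ACCESS = {
    "hr_admin": {"Sales", "Engineering", "Legal"},
    "sales_manager": {"Sales"},
}


def search(user: str, status_code: str) -> list[int]:
    """Return matching employee IDs, silently limited to rows the user may see."""
    allowed = ACCESS.get(user, set())
    # Security trimming: rows outside the user's access are dropped before matching.
    visible = (e for e in EMPLOYEES if e.department in allowed)
    return [e.employee_id for e in visible if e.status_code == status_code]


print(search("hr_admin", "P"))       # [1001, 1003, 1004]
print(search("sales_manager", "P"))  # [1001]
```

Both calls run the identical search for status “P”; only the identity of the user differs, and the restricted result carries no indication that two matching employees were withheld.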

7. Static Reports

Counsel should not assume that the format in which a database is produced will, by itself, ensure that a party can access and search the data (see, database principle #6 in Sedona Database Principles at 36). Indeed, organizations store information in a database precisely because its structure allows trained analysts to use queries to sort and search vast quantities of data. Any production of data in a static format (e.g., PDF reports) will diminish these capabilities. As discussed above, requests for the entire database are often both over-inclusive (scores of tables and historical information) and under-inclusive (calculated values).

Moreover, contrary to popular belief, parties need not request a “native” database production. “[I]n many cases, a truly native format production of database information is less usable to a requesting party than an alternative production format” (see, the “Mismatch of 'Native Format' to Most Database Productions” discussion in Sedona Database Principles at 18). Consider requesting information in a format that allows the data to be viewed and analyzed in multiple ways with a reasonable degree of effort (see, Karl Schieneman et al., “E-Discovery of Databases: Plaintiff's and Defendant's Perspectives,” ESIBytes (Oct. 23, 2009), www.esibytes.com/?p=984). Recognize, however, that “parties should use empirical information, such as that generated from test queries and pilot projects, to ascertain the burden to produce information stored in databases and to reach consensus on the scope of discovery” (see, database principle #3 in the Sedona Database Principles at 26-30).

8. Communication and Cooperation

Many databases are purpose-built, with structures understood by only a small team of individuals, and organizations typically customize even the most common commercially available databases. Given the specialized nature of these systems, counsel need not master the details of specific database structures and relationships. Instead, consider using subject matter experts, who are more likely to extract the appropriate information for specific requests.

Outsiders will likely encounter difficulty in understanding the nature of content stored in these systems and, more important, which data may prove relevant to a matter (see, Schieneman, supra). Accordingly, the success of enterprise database discovery begins with proper communication and cooperation between parties. In other words, “better communication naturally will reduce 'blunderbuss' requests for databases that typically encompass irrelevant or inappropriate information or the production of terabytes of useless, undifferentiated data” (see, Sedona Database Principles at 6).

Conclusion

By targeting only relevant information, counsel will get data that is meaningful, useful, and, ironically, more complete. This less-is-more approach to database discovery should save time and money, and even headaches.


Michael Spencer is the Records and Discovery Manager for DISH Network L.L.C. Diana Fasching is a Senior Advisor with Redgrave LLP, an information law firm.

