Less Is More in Database Discovery

By Michael Spencer and Diana Fasching
November 29, 2012

In legal discovery, it is not uncommon to see production requests for a copy of an entire database instead of requests for targeted, relevant information.

For example, in the investigation of an age discrimination claim, a party may request a copy of an entire human resources database, instead of asking for specific data relevant to the claim. Many parties incorrectly assume that such broad requests facilitate more complete production and eliminate the risk of inadvertently failing to request a key piece of data. On the contrary, a full database production may actually omit important data because the database (which stores the data) works in concert with the application (which presents the data to users). In doing so, the application may derive and present data that the database does not need to bother storing.

For example, employee age is a constantly changing target because age is determined, in part, by the current date. Therefore, the application handles calculating and displaying employee age derived from the date of birth (stored in the database) and the current date.

Key Principles

If corporate counsel has good lines of communication with, and support from, internal information technology and/or e-discovery personnel, counsel likely will not need to maintain an in-depth understanding of specific databases and applications. Counsel should, however, understand several key database principles that can affect the discovery process. The following eight principles will help counsel better understand why “less” (i.e., targeted requests or productions) can be “more” (i.e., readily giving counsel all of the relevant information) in database discovery. (See, “The Sedona Conference Database Principles Addressing the Preservation & Production of Databases & Database Information in Civil Litigation” (Conrad J. Jacoby et al. eds., Public Comment Version 2011) (http://bit.ly/Sr0OcR), hereinafter Sedona Database Principles.)

1. Calculated on the Fly

The employee age example illustrates a common practice whereby the application performs calculations on the fly when displaying data to users. Indeed, most databases do not store values for things such as employee age, years of service, and leave duration. Instead, the database relies on the application to calculate the time period based on the stored date values and the current date. These applications rely on formulas coded by programmers. The formulas range from very simple to extremely complex and generally reside outside the database, either in the application software code or in report templates. Therefore, a database copy might lack both the calculated values and the formulas required to produce them, thereby omitting a potentially relevant piece of information.

By contrast, a request tailored to the relevant data allows the receiving party to obtain more accurate and reliable information. Counsel, however, must realize that “[d]ue to differences in the way that information is stored or programmed into a database, not all information in a database may be equally accessible, and a party's request for such information must be analyzed for relevance and proportionality” (see, database principle #2 in the Sedona Database Principles at 26-30).
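The on-the-fly calculation described in this section can be illustrated with a minimal Python sketch. It assumes a hypothetical stored record that holds only a date of birth; the function name, the record layout, and the sample dates are invented for illustration and are not drawn from any particular human resources system.

```python
from datetime import date


def age_on(date_of_birth: date, as_of: date) -> int:
    """Return whole years elapsed between date_of_birth and as_of."""
    years = as_of.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical stored row: the database keeps the date of birth, not the age.
stored_row = {"employee_id": 1001, "date_of_birth": date(1960, 6, 15)}

# The application derives the age at display time from the current date.
print(age_on(stored_row["date_of_birth"], date(2012, 11, 29)))  # 52
```

A copy of the table alone would deliver the date of birth but neither the age nor the rule used to compute it, which is the gap the section describes.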

2. Translation

Terminology changes over time. For example, business users may refer to “paid leave” later as “paid leave of absence.” Fundamentally, though, these two terms denote the same concept. It would be a programming nightmare to change underlying database structures or application code for every such whim. Instead, programmers rely on “translate values” such as where the letter P equates to “paid leave” or, later, “paid leave of absence.” These translate values allow data stored in underlying data tables and application code to remain constant (as “P”) even as the business users change terminology over time. The trouble is that unless counsel understand how and when the translate values come into play, they may end up with meaningless data. In the above example, if counsel asked for a copy of the database and looked at employee status, counsel would only see “P.” Counsel would have to look elsewhere, likely in another table within the database, to find the meaning of “P.” With a tailored request for employee status, counsel would instead get the meaningful description of “paid leave of absence” with no extra work involved.
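To make the translate-value idea concrete, the following sketch uses Python's built-in sqlite3 module with two hypothetical tables. The table and column names (employee, status_xlat, status_code, description) are assumptions made for illustration; real systems name and organize these lookups differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, status_code TEXT);
    CREATE TABLE status_xlat (status_code TEXT PRIMARY KEY, description TEXT);
    INSERT INTO employee VALUES (1001, 'P');
    INSERT INTO status_xlat VALUES ('P', 'paid leave of absence');
""")

# What a reader of the raw employee table sees: only the code.
raw = conn.execute(
    "SELECT employee_id, status_code FROM employee"
).fetchall()

# What a tailored report can deliver: the code translated to its description.
translated = conn.execute("""
    SELECT e.employee_id, x.description
    FROM employee AS e
    JOIN status_xlat AS x ON x.status_code = e.status_code
""").fetchall()

print(raw)         # [(1001, 'P')]
print(translated)  # [(1001, 'paid leave of absence')]
```

If the business later renames the status, only the single row in the lookup table changes; the stored “P” values stay as they are.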

3. Combining Data

Enterprise databases, such as those used by human resource departments, often consist of thousands of tables. To search a database, however, one must use queries to combine database tables to create a useful representation, a process that generally requires a trained analyst familiar with the underlying database model (i.e., field/table structures and relationships) because database systems often lack sufficient, published documentation. For example, a typical human resources database contains one table that stores employee status, date of birth, date of hire, and a unique identifier that helps distinguish among employees. One combines these values with other tables to determine additional information about an employee, such as job history and salary. A clear understanding of the database structure, relationships between tables, and the appropriate query language is essential to retrieving information from a database.
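As a rough illustration of combining tables, the sketch below joins a hypothetical employee table to a hypothetical job_history table on the unique employee identifier. The schema, job dates, and salary figures are invented for this example; an actual enterprise system would involve many more tables and relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        date_of_birth TEXT,
        date_of_hire TEXT
    );
    CREATE TABLE job_history (
        employee_id INTEGER,
        job_title TEXT,
        start_date TEXT,
        salary INTEGER
    );
    INSERT INTO employee VALUES (1001, '1960-06-15', '2005-03-01');
    INSERT INTO job_history VALUES (1001, 'Discovery Analyst I', '2005-03-01', 50000);
    INSERT INTO job_history VALUES (1001, 'Discovery Analyst II', '2013-01-01', 56000);
""")

# The unique identifier ties each job-history row back to one employee.
rows = conn.execute("""
    SELECT e.employee_id, e.date_of_hire, j.job_title, j.salary
    FROM employee AS e
    JOIN job_history AS j ON j.employee_id = e.employee_id
    ORDER BY j.start_date
""").fetchall()

for row in rows:
    print(row)
```

Neither table answers a question such as “what was this employee's job and salary history?” on its own; the answer exists only once the tables are combined correctly.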

4. Data Changes over Time

A database can maintain current and historical information. For example, human resources databases typically track distinct events throughout the course of an individual's employment, including hire date, promotion and pay raise dates, military-leave dates, and such. Databases leverage these dates to track the chronological order in which each of these events occurred. Architects also design databases to account for multiple actions that may occur on the same day, but must exist separately for reporting and other business purposes. For example, an annual performance pay rate increase may happen on the same date as a promotion pay rate increase (e.g., from Discovery Analyst I to Discovery Analyst II). Each of these increases may be effective on 1/1/2013, but could be entered separately and sequenced, resulting in two rows of data for this promotion. It is far easier and more reliable to ask for the targeted information and let the system's trained analysts handle the nuances of combining related tables and historical changes.
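The same-day sequencing described above can be sketched as follows. This example assumes a hypothetical pay_rate table that carries an effective date and a sequence number; the column names and dollar amounts are invented, and real systems may order same-day rows by other means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pay_rate (
        employee_id INTEGER,
        effective_date TEXT,   -- ISO dates sort correctly as text
        sequence INTEGER,      -- orders multiple actions on the same day
        reason TEXT,
        annual_rate INTEGER
    );
    INSERT INTO pay_rate VALUES (1001, '2012-01-01', 0, 'annual performance increase', 50000);
    INSERT INTO pay_rate VALUES (1001, '2013-01-01', 0, 'annual performance increase', 52000);
    INSERT INTO pay_rate VALUES (1001, '2013-01-01', 1, 'promotion increase', 56000);
""")


def rate_as_of(employee_id, as_of):
    """Return the pay rate in effect on as_of, or None if no row applies."""
    row = conn.execute(
        """
        SELECT annual_rate FROM pay_rate
        WHERE employee_id = ? AND effective_date <= ?
        ORDER BY effective_date DESC, sequence DESC
        LIMIT 1
        """,
        (employee_id, as_of),
    ).fetchone()
    return row[0] if row else None


print(rate_as_of(1001, "2012-12-31"))  # 50000
print(rate_as_of(1001, "2013-01-01"))  # 56000
```

A reviewer who ignores the sequence column could pick the wrong one of the two rows dated 1/1/2013 and report an intermediate rate as the final one.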

5. Protected Information

While privacy issues are beyond the scope of this article, the handling of protected information poses significant risks. Unless the protected information is relevant to the matter, counsel should avoid privacy-related issues by neither requesting nor producing data that is not relevant. For example, human resources databases often store employee Social Security numbers. Unless a party requires the numbers to address a matter, a party should refrain from requesting the data to avoid the additional effort and expenses required to protect this type of sensitive information.

6. Efficiency and Security

Because “databases employ techniques to optimize performance and protect confidentiality that can result in responsive data being missed, even by an apparently competent operator ... [n]ever assume that a query searches all of the potentially responsive records, and never assume that the operator knows what they are doing” (Craig Ball, “Ubiquitous Databases,” Law Tech. News (Dec. 1, 2010), http://bit.ly/U0WG0L, emphasis added). Built-in security constraints may prevent a query from retrieving all of the data, even if the search query seems to be perfectly constructed. This “security trimming” limits search results based on the identity of the user submitting the query. Because a user cannot view data that he or she cannot access, the user generally will not know what relevant data the query failed to retrieve unless he or she is familiar with the business and the nature of its data. Therefore, counsel is better off asking for targeted, relevant information and letting those most familiar with the data query, validate, and present it in a form that is useful.
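A simplified sketch of security trimming appears below. It assumes a hypothetical rule under which each user may see records only for certain departments; the user names, departments, and records are invented, and actual systems enforce access in many different ways.

```python
# Hypothetical records; status "P" marks employees on paid leave.
RECORDS = [
    {"employee_id": 1001, "department": "Sales", "status": "P"},
    {"employee_id": 1002, "department": "Legal", "status": "P"},
    {"employee_id": 1003, "department": "Sales", "status": "A"},
]

# Hypothetical access rules: the departments each user is allowed to see.
ACCESS = {
    "hr_admin": {"Sales", "Legal"},
    "sales_manager": {"Sales"},
}


def search(user, status):
    """Return matching employee IDs, silently trimmed to what the user may see."""
    visible = ACCESS.get(user, set())
    return [
        r["employee_id"]
        for r in RECORDS
        if r["status"] == status and r["department"] in visible
    ]


# The identical query returns different results depending on who runs it.
print(search("hr_admin", "P"))       # [1001, 1002]
print(search("sales_manager", "P"))  # [1001]
```

Nothing in the second result signals that a record was withheld, which is why the operator's access rights matter as much as the wording of the query.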

7. Static Reports

Counsel should not rely on the format in which database production will occur to ensure that a party will have the ability to access and search data (see, database principle #6 in Sedona Database Principles at 36). Indeed, organizations store information in a database precisely because the structure allows trained analysts to use queries to sort and search vast quantities of data. Any production of data in a flat-file format (e.g., PDF) will diminish these capabilities. Requests for the entire database are often over-inclusive (scores of tables and historical information) and under-inclusive (calculated values), as discussed herein. Moreover, contrary to popular belief, parties need not request a “native” database production. “[I]n many cases, a truly native format production of database information is less usable to a requesting party than an alternative production format” (see, the “Mismatch of 'Native Format' to Most Database Productions” discussion in Sedona Database Principles at 18). Consider requesting information in a format that allows for the viewing and analysis of data in multiple ways with a reasonable degree of effort (see, Karl Schieneman et al., “E-Discovery of Databases - Plaintiff's and Defendant's Perspectives,” ESIBytes (Oct. 23, 2009), www.esibytes.com/?p=984). However, recognize that the “parties should use empirical information, such as that generated from test queries and pilot projects, to ascertain the burden to produce information stored in databases and to reach consensus on the scope of discovery” (see, database principle #3 in the Sedona Database Principles at 26-30).

8. Communication and Cooperation

Many databases are purpose-built, with structures understood by only a small team of individuals. Organizations typically customize even the most common, commercially available databases. Given the specialized nature of these systems, counsel need not understand details regarding specific database structures and relationships. Instead, consider using subject matter experts who are more likely to successfully extract appropriate information for specific requests.

Outsiders will likely encounter difficulty in understanding the nature of content stored in these systems and, more important, which data may prove relevant to a matter (see, Schieneman, supra). Accordingly, the success of enterprise database discovery begins with proper communication and cooperation between parties. In other words, “better communication naturally will reduce 'blunderbuss' requests for databases that typically encompass irrelevant or inappropriate information or the production of terabytes of useless, undifferentiated data” (see, Sedona Database Principles at 6).

Conclusion

By targeting only relevant information, counsel will get data that is meaningful, useful and, ironically, more complete. This less-is-more approach to discovery of databases should save time, money, and even headaches.


Michael Spencer is the Records and Discovery Manager for DISH Network L.L.C. Diana Fasching is a Senior Advisor with Redgrave LLP, an information law firm.
