In legal discovery, it is not uncommon to see production requests for a copy of an entire database instead of requests for targeted, relevant information. For example, in the investigation of an age discrimination claim, a party may request a copy of an entire human resources database, instead of asking for specific data relevant to the claim. Many parties incorrectly assume that such broad requests facilitate more complete production and eliminate the risk of inadvertently failing to request a key piece of data. On the contrary, a full database production may actually omit important data because the database (which stores the data) works in concert with the application (which presents the data to users). In doing so, the application may derive and present data that the database does not need to bother storing. For example, employee age is a constantly changing target because age is determined, in part, by the current date. Therefore, the application handles calculating and displaying employee age derived from the date of birth (stored in the database) and the current date.
Key Principles
If corporate counsel has good lines of communication and support from internal information technology and/or eDiscovery personnel, they likely will not need to maintain an in-depth understanding of specific databases and applications. Counsel should, however, understand several key database principles that can affect the discovery process. The following eight principles will help counsel better understand why “less” (i.e., targeted requests or productions) can be “more” (i.e., readily giving counsel all of the relevant information) in database discovery. (Also see The Sedona Conference Database Principles Addressing the Preservation & Production of Databases & Database Information in Civil Litigation (Conrad J. Jacoby et al. eds., Public Comment Version 2011), hereinafter Sedona Database Principles.)
1. Calculated on the Fly
The employee age example illustrates a common practice whereby the application performs calculations on the fly when displaying data to users. Indeed, most databases do not store values for things such as employee age, years of service, and leave duration. Instead, the database relies on the application to calculate the time period based on the stored date values and the current date. These applications rely on formulas coded by programmers. The formulas range from very simple to extremely complex and generally reside separately from the database, either in the application software code or in report templates. Therefore, a database copy might lack both the calculated values and the formulas required to produce them, thereby omitting a potentially relevant piece of information.
Alternatively, a request tailored to the necessary relevant data will give the receiving party more accurate and reliable information. Counsel, however, must realize that “[d]ue to differences in the way that information is stored or programmed into a database, not all information in a database may be equally accessible, and a party's request for such information must be analyzed for relevance and proportionality” (see database principle #2 in the Sedona Database Principles at 26-30).
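To make the point concrete, the following is a minimal sketch, in Python, of how an application might derive age at display time. The function name, the record layout, and the dates are hypothetical and chosen only for illustration; real systems will differ. Note that the stored record contains a date of birth but no age.

```python
from datetime import date

def age_on(date_of_birth: date, as_of: date) -> int:
    """Return whole years elapsed between date_of_birth and as_of."""
    years = as_of.year - date_of_birth.year
    # Subtract a year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

# Hypothetical stored row: the database holds the date of birth, not the age.
stored_row = {"employee_id": 1001, "date_of_birth": date(1960, 6, 15)}

# The "application" derives age for whatever date matters to the claim.
age_at_termination = age_on(stored_row["date_of_birth"], date(2011, 3, 1))
age_at_later_date = age_on(stored_row["date_of_birth"], date(2012, 7, 1))
print(age_at_termination, age_at_later_date)
```

A copy of the table alone would deliver the date of birth but neither the formula nor the date against which the business measured age.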
2. Translation
Terminology changes over time. For example, business users may refer to “paid leave” later as “paid leave of absence.” Fundamentally, though, these two terms denote the same concept. It would be a programming nightmare to change underlying database structures or application code for every such whim. Instead, programmers rely on “translate values” such as where the letter P equates to “paid leave” or, later, “paid leave of absence.” These translate values allow data stored in underlying data tables and application code to remain constant (as “P”) even as the business users change terminology over time. The trouble is that unless counsel understand how and when the translate values come into play, they may end up with meaningless data. In the above example, if counsel asked for a copy of the database and looked at employee status, counsel would only see “P.” Counsel would have to look elsewhere, likely in another table within the database, to find out what “P” means. With a tailored request for employee status, counsel would instead get the meaningful description of “paid leave of absence” with no extra work involved.
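The difference between the raw code and the translated value can be sketched as follows, using Python's built-in sqlite3 module. The table names (employee, status_translate) and column names are assumptions made for illustration and do not describe any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: one stores the code, another stores its meaning.
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, status_code TEXT);
    CREATE TABLE status_translate (status_code TEXT PRIMARY KEY, description TEXT);
    INSERT INTO employee VALUES (1001, 'P');
    INSERT INTO status_translate VALUES ('P', 'paid leave of absence');
""")

# What a reader of the raw employee table sees: only the code.
raw = conn.execute(
    "SELECT employee_id, status_code FROM employee"
).fetchall()

# What a targeted report would deliver: the code resolved to its description.
translated = conn.execute("""
    SELECT e.employee_id, t.description
    FROM employee AS e
    JOIN status_translate AS t ON t.status_code = e.status_code
""").fetchall()

print(raw)         # the bare code
print(translated)  # the business description
```

If the business later renames the status, only the single row in the translate table changes; the employee rows continue to hold “P.”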
3. Combining Data
Enterprise databases, such as those used by human resource departments, often consist of thousands of tables. To provide context, imagine an Excel workbook with 5,000 worksheets. A search for several key terms would be no easy feat. But Excel workbooks, unlike databases, generally exist as single, stand-alone entities that allow a user to quickly search information from various worksheets. To search a database, however, one must use queries to combine database tables to create a useful representation, a process that generally requires a trained analyst familiar with the underlying database model (i.e., field/table structures and relationships) because database systems often lack sufficient, published documentation. For example, a typical human resources database contains one table that stores employee status, date of birth, date of hire, and a unique identifier that helps distinguish among employees. One combines these values with other tables to determine additional information about an employee, such as job history and salary. A clear understanding of the database structure, relationships between tables, and the appropriate query language is essential to retrieving information from a database.
One should not underestimate the complexity of searching through multiple tables within a database. Counsel cannot perform a simple Google-like search. Rather, counsel must create structured, often complex, queries to obtain information. This likely means navigating the dangers of improper queries, including Cartesian joins (the combination of every row of one table with every row of another table rather than combining only related rows), which lead to incorrect and meaningless data. Rather than take on such a daunting task, it is far easier, less time consuming, and more reliable to actually consider, determine, and target relevant data.
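The Cartesian join problem described above can be sketched with Python's built-in sqlite3 module. The two tables and their contents are hypothetical and deliberately tiny; the point is only the difference in row counts between a query that omits the join condition and one that includes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables sharing a unique employee identifier.
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salary (employee_id INTEGER, annual_salary INTEGER);
    INSERT INTO employee VALUES (1, 'Employee A'), (2, 'Employee B'), (3, 'Employee C');
    INSERT INTO salary VALUES (1, 50000), (2, 60000), (3, 70000);
""")

# Improper query: no join condition, so every employee row is paired
# with every salary row (a Cartesian join).
cartesian = conn.execute(
    "SELECT e.name, s.annual_salary FROM employee AS e, salary AS s"
).fetchall()

# Proper query: rows are combined only where the identifiers match.
related = conn.execute("""
    SELECT e.name, s.annual_salary
    FROM employee AS e
    JOIN salary AS s ON s.employee_id = e.employee_id
    ORDER BY e.employee_id
""").fetchall()

print(len(cartesian), len(related))
```

With three employees the improper query returns nine rows, six of which attribute a salary to the wrong person. At enterprise scale the same mistake produces results that are both enormous and meaningless.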
4. Data Changes over Time
A database can maintain both current and historical information. For example, human resources databases typically track distinct events throughout the course of an individual's employment, including hire date, promotion and pay raise dates, military-leave dates, and such. Databases leverage these dates to track the chronological order in which each of these events occurred. Architects also design databases to account for multiple actions that may occur on the same day, but must exist separately for reporting and other business purposes. For example, an annual performance pay rate increase may happen on the same date as a promotion pay rate increase (e.g., from Discovery Analyst I to Discovery Analyst II). Each of these increases may be effective on 1/1/2012, but could be entered separately and sequenced, resulting in two rows of data for this promotion event. It is far easier and more reliable to ask for the targeted information and let the system's trained analysts handle the nuances of combining related tables and historical changes.
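As an illustration of why same-day events complicate retrieval, the sketch below uses Python's built-in sqlite3 module with a hypothetical pay_history table. The column names (effective_date, sequence), the pay rates, and the hire date are assumptions for illustration only; the 1/1/2012 date mirrors the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical history table: one row per pay action.
    CREATE TABLE pay_history (
        employee_id INTEGER,
        effective_date TEXT,   -- ISO dates sort correctly as text
        sequence INTEGER,      -- orders actions entered for the same date
        action TEXT,
        pay_rate REAL
    );
    INSERT INTO pay_history VALUES
        (1001, '2011-01-01', 0, 'hire', 20.00),
        (1001, '2012-01-01', 0, 'annual performance increase', 21.00),
        (1001, '2012-01-01', 1, 'promotion increase', 23.00);
""")

def pay_rate_as_of(employee_id, as_of):
    """Return the pay rate in effect on as_of, or None if no row applies."""
    row = conn.execute("""
        SELECT pay_rate FROM pay_history
        WHERE employee_id = ? AND effective_date <= ?
        ORDER BY effective_date DESC, sequence DESC
        LIMIT 1
    """, (employee_id, as_of)).fetchone()
    return row[0] if row else None

print(pay_rate_as_of(1001, "2011-12-31"))
print(pay_rate_as_of(1001, "2012-06-30"))
```

A query that ignored the sequence column could return either of the two rows dated 1/1/2012, which is exactly the kind of nuance a trained analyst is expected to handle.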
5. Protected Information
While privacy issues are beyond the scope of this article, the handling of protected information poses significant risks. Unless the protected information is relevant to the matter, counsel should avoid privacy-related issues by neither requesting nor producing data that is not relevant. For example, human resources databases often store employee Social Security numbers. Unless a party requires the numbers to address a matter, a party should refrain from requesting the data to avoid the additional effort and expenses required to protect this type of sensitive information.
6. Efficiency and Security
Because “databases employ techniques to optimize performance and protect confidentiality that can result in responsive data being missed, even by an apparently competent operator ... [n]ever assume that a query searches all of the potentially responsive records, and never assume that the operator knows what they are doing” (Craig Ball, Ubiquitous Databases, Law Tech. News (Dec. 1, 2010), www.law.com/jsp/lawtechnologynews/PubArticleLTN.jsp?id=1202475262660, emphasis added). Built-in security constraints may prevent a query from retrieving all of the data, even if the search query seems to be perfectly constructed. This “security trimming” limits search results based on the identity of the user submitting the query. Because the system silently withholds data that a user is not permitted to access, the user generally will not know what relevant data the query failed to retrieve unless he or she is familiar with the business and the nature of its data. Therefore, counsel is better off asking for targeted, relevant information and letting those most familiar with the data handle querying, validating, and presenting it in a form that is useful.
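Security trimming can be sketched in a few lines of Python. The records, user roles, and permission rules below are entirely hypothetical; actual systems enforce access in many different ways. The sketch shows only that the same, correctly written search returns different results depending on who runs it, with no warning that anything was withheld.

```python
# Hypothetical records and permissions, for illustration only.
RECORDS = [
    {"employee_id": 1, "department": "Sales", "status": "terminated"},
    {"employee_id": 2, "department": "Sales", "status": "terminated"},
    {"employee_id": 3, "department": "Executive", "status": "terminated"},
]
PERMISSIONS = {
    "hr_generalist": {"Sales"},
    "hr_administrator": {"Sales", "Executive"},
}

def search(user, status):
    """Return matching employee IDs, silently limited to what the user may see."""
    allowed = PERMISSIONS.get(user, set())
    return [
        r["employee_id"]
        for r in RECORDS
        if r["status"] == status and r["department"] in allowed
    ]

generalist_results = search("hr_generalist", "terminated")
administrator_results = search("hr_administrator", "terminated")
print(generalist_results, administrator_results)
```

The first user receives two records and has no indication that a third responsive record exists.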
7. Static Reports
Counsel should not rely on the format in which database production will occur to ensure that a party will have the ability to access and search data (see database principle #6 in Sedona Database Principles at 36). Indeed, organizations store information in a database precisely because the structure allows trained analysts to use queries to sort and search vast quantities of data. Any production of data in a flat-file format (e.g., PDF) will diminish these capabilities. Requests for the entire database are often both over-inclusive (scores of tables and historical information) and under-inclusive (calculated values), as discussed herein. Moreover, contrary to popular belief, parties need not request a “native” database production. “[I]n many cases, a truly native format production of database information is less usable to a requesting party than an alternative production format” (see the “Mismatch of 'Native Format' to Most Database Productions” discussion in Sedona Database Principles at 18). Consider requesting information in a format that allows for the viewing and analysis of data in multiple ways with a reasonable degree of effort (Karl Schieneman et al., E-Discovery of Databases: Plaintiff's and Defendant's Perspectives, ESIBytes (Oct. 23, 2009), www.esibytes.com/?p=984). However, recognize that the “parties should use empirical information, such as that generated from test queries and pilot projects, to ascertain the burden to produce information stored in databases and to reach consensus on the scope of discovery” (see database principle #3 in the Sedona Database Principles at 26-30).
8. Communication and Cooperation
Many databases are purpose-built, with structures understood by only a small team of individuals. Organizations typically customize even the most common, commercially available databases. Given the specialized nature of these systems, counsel need not understand details regarding specific database structures and relationships. Instead, consider using subject matter experts who are more likely to successfully extract appropriate information for specific requests.
Outsiders will likely encounter difficulty in understanding the nature of content stored in these systems and, more important, which data may prove relevant to a matter (Schieneman, E-Discovery of Databases: Plaintiff's and Defendant's Perspectives). Accordingly, the success of enterprise database discovery begins with proper communication and cooperation between parties. In other words, “better communication naturally will reduce 'blunderbuss' requests for databases that typically encompass irrelevant or inappropriate information or the production of terabytes of useless, undifferentiated data” (see the Sedona Database Principles at 6).
Conclusion
By targeting only relevant information, counsel will get data that is meaningful, useful, and, ironically, more complete. This less-is-more approach to discovery of databases should save time, money, and even headaches.
Michael Spencer is the Records and Discovery Manager for DISH Network L.L.C. Diana Fasching is a Senior Advisor with Redgrave LLP, an information law firm.