Courts are increasingly ordering counsel to identify and produce information beyond traditional e-mail and loose files. Whether it's employee and payroll data related to a wage and hour dispute or trade data related to a market manipulation investigation, understanding the Electronic Discovery Reference Model (EDRM) as it relates to ever-larger volumes of structured data has never been a more critical e-discovery capability. Structured data repositories, whether live or deactivated legacy systems, reside behind virtually every corporate firewall. This structured information is often complex, and no two repositories are the same. A forensic examiner may encounter complex data models, encrypted fields, and highly customized, proprietary or third-party systems, any of which may make data retrieval difficult. As a result, collection, processing and hosting methods are unique to each case and dictated by the structured data relevant to the matter.
Structured data is electronically stored information (ESI) that is organized and stored in a structured format, often in a database, mainframe or other repository. This article outlines some key differences between structured and unstructured e-discovery processes, structured data challenges, and key considerations that lead to accurate productions in a cost-efficient manner.
Identification
The first challenge to structured data e-discovery is identifying the relevant pieces of information required to conduct a thorough and timely investigation. While unstructured data is typically stored as stand-alone discrete files, structured data is often divided into fields across multiple tables. The requesting party is only entitled to fields that contain relevant information. To identify relevant structured data, an examiner must first obtain an understanding of the client's operations and data storage facilities. This includes identifying key IT and operations contacts and understanding the network topology, system locations (and related data privacy laws), server platforms, applications and estimated data size. This allows an examiner to determine the requisite hardware and software tools needed to preserve, collect and process the data, as well as the repository size required to host the data.
Interviewing business owners and IT system owners of the data early on in the e-discovery process to avoid irrelevant collections or multiple extractions is a critical first step. Relevant data maps, schemas, or dictionaries should be obtained prior to the interviews to facilitate the discussion. Does data relevant to the case reside with third-party service providers or in the cloud? Whereas unstructured data typically resides within a corporation, structured data may not. In that instance, the provider may require that the forensic examiner sign a non-disclosure agreement prior to providing data maps and access to the information. The time it takes to get data residing on third party systems can vary. Early identification will help expedite the collection process.
It is also important to understand the nature of the data being collected. Databases often contain large amounts of private data. Personally Identifiable Information (PII), customer records, credit card numbers, Social Security numbers, financial records as well as Personal Healthcare Information (PHI) may be present in these structures and are subject to state, national and international privacy laws.
Following the identification phase, counsel and the forensic examiner can begin assessing the burden to produce. Some considerations include: data availability; additional backup tape and hard drive space requirements; operational impact; and data volumes.
Preservation
With relevant ESI identified, a preservation plan must be established to ensure that the destruction or alteration of ESI is suspended. While unstructured data is fairly static in nature, structured data is often dynamic as normal business operations generate new records, update existing records, and destroy old records using automated truncation scripts. IT and business owners should receive a preservation hold notice stating relevant ESI and timeframe, and require confirmation of receipt. Discussions should occur on whether or not active data destruction and backup tape rotations should be suspended. If a timeline of events must be reconstructed and a separate change log within the repository does not exist, suspending backup tape rotation may be necessary and potentially costly. If a change log exists, a one-time full back-up or data export to a staging server may eliminate the need for backup tape suspension. If the matter requires suspension of data destruction in an active repository, additional costs for hard drive space may be incurred. Consider the option of using cold storage or other report repositories. If appropriate, these may be relied upon and deletion of the reports could simply be suspended.
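The one-time full backup or export described above can be sketched in code. This is a minimal illustration, using Python's built-in sqlite3 module as a stand-in for the client's live repository; the `payroll` table, its fields, and the file-naming convention are hypothetical, and a real snapshot would follow the extraction protocol agreed with counsel.

```python
import csv
import hashlib
import sqlite3
from datetime import datetime, timezone

def snapshot_table(conn, table, out_path):
    """Export a one-time, read-only snapshot of a table to CSV and return a
    SHA-256 digest of the file for the chain-of-custody record."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is assumed vetted
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([d[0] for d in cur.description])  # header row
        writer.writerows(cur)
    with open(out_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# In-memory database standing in for the live system (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payroll (emp_id INTEGER, pay_date TEXT, amount REAL)")
conn.executemany("INSERT INTO payroll VALUES (?, ?, ?)",
                 [(1, "2024-01-15", 5200.0), (2, "2024-01-15", 4800.0)])

stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
digest = snapshot_table(conn, "payroll", f"payroll_snapshot_{stamp}.csv")
```

Recording the digest alongside the export is what later lets the staging copy be shown to be unaltered, which is the point of substituting a snapshot for backup tape suspension.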
Keep in mind that retention periods may vary from one table to the next and may even vary at the field level, so it's important to understand the life cycle of data within a table or field to ensure that all of the data within the established timeframe is preserved and subsequently collected. Relevant views and cross-reference tables may have changed over time, impacting report output. To avoid potential fines, a broad preservation request should be delivered first, then narrowed as scope is refined.
Collection
Whereas unstructured data is collected using common ESI collection methods and tools, structured data collection varies based on underlying structure, extraction options, and production needs. Per the Sedona Conference Database Principles (available for download at http://bit.ly/Sr0OcR), to determine the burden to produce and agree on scope, requesting and responding parties should use empirical information. Prior to a full collection, sample extraction and testing are performed. In large enterprise databases, not all elements are owned or controlled by the same group, and all relevant information may not be equally accessible. Portions of processing and data aggregation may be outsourced while the structures themselves are controlled by corporate. Or a company may own the raw data but the source code and database design in a cloud-based system may be proprietary. Fractured ownership of data and its underlying structure is a common challenge, even between a parent company and its subsidiary.
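A sample extraction of the kind described above might look like the following sketch. The table, fields, and date criteria are hypothetical placeholders for the client's actual schema and the scope agreed between the parties; the capped, parameterized query and the preservation of the exact script alongside the output are the points being illustrated.

```python
import sqlite3

# Hypothetical filter criteria agreed with counsel.
CRITERIA = {"start": "2023-01-01", "end": "2023-12-31"}
SAMPLE_SQL = (
    "SELECT trade_id, trader, trade_date, qty, price "
    "FROM trades WHERE trade_date BETWEEN :start AND :end "
    "ORDER BY trade_date LIMIT 100"  # capped sample, not the full collection
)

# In-memory database standing in for the client's trading system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_id, trader, trade_date, qty, price)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", [
    (1, "A", "2023-03-01", 10, 99.5),
    (2, "B", "2024-02-01", 5, 101.0),  # falls outside the agreed date range
])

sample = conn.execute(SAMPLE_SQL, CRITERIA).fetchall()
# The exact SQL text and CRITERIA values should be retained with the sample
# so the parties can evaluate burden and scope against empirical output.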
Counsel and forensic examiners should be aware of the operational impact of performing a live collection from a production environment. Extraction of live data can place a lock on extraction sources, preventing or delaying access to a system and causing an unintended business interruption. Business owners and IT personnel can help assess the impact of collection.
Prior to collection, a formal document and data request outlining all systems and required fields should be supplied to the data extraction teams. Since extraction is often done by internal IT or third parties, using chain of custody forms and obtaining a copy of the extraction scripts containing all filter criteria is essential. All data extracts must be verified. A forensic examiner should assess the extracted text data for quality and proper formatting and adjust extraction protocols and formatting as needed. Full extracts should be reconciled to the original source through checksums, record counts, and other verification procedures to help ensure completeness and accuracy.
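The reconciliation step above can be sketched as a simple comparison of record counts and an order-independent hash over the two row sets. This is a minimal illustration, not a substitute for the full verification procedures a matter may require; the row data is invented.

```python
import hashlib

def reconcile(source_rows, extract_rows):
    """Compare record counts and an order-independent SHA-256 digest of two
    row sets; any mismatch flags an incomplete or altered extract."""
    def digest(rows):
        h = hashlib.sha256()
        for line in sorted(repr(r) for r in rows):  # sort so order is irrelevant
            h.update(line.encode())
        return h.hexdigest()
    return {
        "source_count": len(source_rows),
        "extract_count": len(extract_rows),
        "counts_match": len(source_rows) == len(extract_rows),
        "hashes_match": digest(source_rows) == digest(extract_rows),
    }

source = [(1, "2024-01-15", 5200.0), (2, "2024-01-15", 4800.0)]
extract = [(2, "2024-01-15", 4800.0), (1, "2024-01-15", 5200.0)]  # reordered copy
result = reconcile(source, extract)
```

Because extraction scripts often change row order, hashing a sorted representation rather than the raw file avoids false mismatches while still detecting dropped or altered records.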
Processing
Once structured data has been collected and validated, it is often loaded into a new structured format and transformed, consolidated, normalized and standardized. Data sources may be diverse, further complicating the processing phase. These can include public records, news articles, social media, business data (e.g., transactional, operational, financial, or sales), and third-party data. Confirming that the processing steps align with the overall e-discovery strategy and production needs is a critical step. Failure to do so can result in unnecessary costs, typically incurred at an hourly consulting rate.
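Normalization across diverse sources often comes down to coercing inconsistent representations into one standard. The sketch below shows date standardization only; the format list is a hypothetical example, and real matters require source-specific mapping rules agreed during processing design.

```python
from datetime import datetime

# Hypothetical set of date formats observed across the collected sources.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%d-%b-%Y")

def normalize_date(value):
    """Coerce the date formats seen across sources into ISO 8601 (YYYY-MM-DD);
    unparseable values are returned as None and routed to manual review."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

raw = ["03/01/2023", "2023-03-01", "01-Mar-2023", "bad value"]
normalized = [normalize_date(v) for v in raw]
```

Returning `None` rather than guessing keeps the exceptions visible, which matters when the standardized field later drives date-range culling and timeline analysis.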
Once the data has been joined, analysis may include keyword matching, red flag and other anomaly identification, transaction scoring, discovery of non-obvious relationships and hidden patterns, and statistical analysis. Using specific software, a forensic examiner might create link, relationship, cluster and geospatial diagrams, among others.
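As one concrete instance of the red-flag testing mentioned above, a simple statistical screen can surface transactions far outside the population norm. This is a minimal z-score sketch using only Python's standard library; the threshold and the amounts are illustrative, and real scoring combines many such tests.

```python
import statistics

def flag_outliers(amounts, z_threshold=3.0):
    """Flag amounts more than z_threshold standard deviations from the mean,
    a basic red-flag test among the many an examiner might apply."""
    mean = statistics.fmean(amounts)
    stdev = statistics.stdev(amounts)
    return [a for a in amounts if abs(a - mean) / stdev > z_threshold]

# Hypothetical transaction amounts with one anomalous payment.
amounts = [100.0] * 50 + [105.0] * 49 + [9000.0]
flags = flag_outliers(amounts)  # the 9000.0 payment is flagged
```

Screens like this are a triage tool: they narrow the population a reviewer must examine, with the threshold tuned to the matter's tolerance for false positives.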
The good news is that today's rapidly developing big data tools can simplify analysis of structured sources. If possible, a single tool should be used that can connect to multiple databases and file formats which can broaden the view of the data and potentially reduce costs related to data manipulation.
Hosting, Review and Production
While there are a multitude of off-the-shelf software solutions for unstructured e-discovery, structured information is typically hosted and reviewed in a custom application to accommodate its unique format. A Web-enabled review platform should be used for hosting. Case management software, which allows for the aggregation of transactions sharing common characteristics into a single group, is often a good review approach. These software applications may also be leveraged to assist with prioritization, assignment and coding. Since structured data often contains PII or PHI, masking or redaction may be required. Counsel should ensure the software can meet those requirements as well as the appropriate production format agreed upon with the requesting party.
As with traditional e-discovery, structured data volume drives costs. It's important to ensure the dataset includes only the rows which are relevant to the matter. Record, date range, and field culling should be revisited at this point to keep hosting and review costs down.
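A culling pass of the kind described above can be sketched as a filter that keeps only the agreed fields and date range before data reaches the hosting platform. The field names, date range, and the masked-out `ssn` column are hypothetical.

```python
# Hypothetical scope agreed with the requesting party.
RELEVANT_FIELDS = ("emp_id", "pay_date", "amount")
DATE_RANGE = ("2023-01-01", "2023-12-31")  # ISO dates compare correctly as strings

def cull(rows):
    """Keep only rows within the agreed date range, and only the agreed
    fields within those rows, before loading to the review platform."""
    start, end = DATE_RANGE
    return [
        {k: r[k] for k in RELEVANT_FIELDS}
        for r in rows
        if start <= r["pay_date"] <= end
    ]

rows = [
    {"emp_id": 1, "pay_date": "2023-06-30", "amount": 5200.0, "ssn": "xxx"},
    {"emp_id": 2, "pay_date": "2022-06-30", "amount": 4800.0, "ssn": "yyy"},
]
hosted = cull(rows)  # one row survives; the ssn field is dropped from it
```

Dropping non-relevant fields at this stage serves double duty: it reduces hosting volume and removes PII that would otherwise need masking or redaction downstream.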
Conclusion
Structured data e-discovery is complex, and as such presents many challenges. Where e-discovery of unstructured data has many off-the-shelf options available that offer almost plug-and-play capabilities, there is no single all-encompassing solution for e-discovery of structured data. Counsel and forensic examiners should have a solid understanding of what structured data e-discovery entails and experience in tackling the complexities involved. The key to a successful and cost-effective project is to adhere to a proven methodology in which the steps of identification, preservation, collection, processing, hosting and review are followed. The demand for structured data e-discovery has never been greater, and this trend will only grow in the years to come.
Wendy Predescu is a principal in KPMG LLP's U.S. Forensic Technology Services. She conducts large and small scale computer forensic examinations, structured data e-discovery investigations, fraud and FCPA investigations, financial investigations, and transaction look backs. Philip Zimmermann is a senior associate with KPMG LLP's U.S. Forensic Technology Services, Data Analytics team. He is an accomplished software engineer with nearly 20 years of experience in the design, implementation, testing, packaging, and support of a broad range of products tailored to meet the needs of clients. This article represents the views of the authors only, and does not necessarily represent the view or professional advice of KPMG LLP.