Combating Big Data with a 'Facts First' Approach to e-Discovery

By Scott Giordano
October 02, 2013

One of the most pressing challenges for legal teams is the ability to quickly identify relevant electronically stored information (ESI) when litigation or regulatory action arises. This challenge has been significantly exacerbated by the arrival of “Big Data,” which refers to data sets that are so large and complex that mining them for useful intelligence is impossible using conventional analytical methods and tools. Concerned with maintaining defensibility, many organizations take a “preserve everything” approach, which results in data sets so large that it becomes extremely difficult to identify the most relevant ESI early enough to potentially change the direction of the matter. This problem cannot be overcome by adding staff, installing more servers or engaging outside service providers. It must be addressed holistically and aggressively with a combination of human intelligence, legal process and advanced information retrieval technology. Taken together, these elements form a “Facts First” intelligence-gathering methodology that allows legal teams to identify, analyze and defensibly reduce ESI volumes.

The e-Discovery Risks of Big Data

Enterprises face a growing challenge of meeting e-discovery requirements in the face of out-of-control ESI growth. Unstructured data within corporations is growing at a rate of nearly 62% per year, according to International Data Corporation (IDC). This data proliferation, coupled with an over-reliance on backup tapes and ESI being stored in the cloud and across borders, makes the narrowing of the ESI “funnel” (the process by which legal teams distill large volumes of ESI down to only what's relevant) much more difficult and risky. Rule 26(g) of the Federal Rules of Civil Procedure (FRCP) exacerbates this challenge by imposing a duty on attorneys to sign every discovery request, response or objection and certify that the signer conducted a “reasonable inquiry” into the facts and the law supporting it.

This “reasonableness” standard is informed by the judiciary's understanding of the capabilities and limits of enterprise technology, as well as case law. Furthermore, both the courts and industry experts acknowledge that e-discovery obligations in the era of Big Data require at least some degree of advanced technology adoption.
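To put the growth figure cited above in perspective, the following minimal sketch compounds a 62% annual rate over several years. The starting volume of 100 TB and the function name are hypothetical, chosen only for illustration, and the sketch assumes the rate stays constant, which the source does not claim.

```python
import math

ANNUAL_GROWTH = 0.62  # "nearly 62% per year", the IDC figure cited in the article


def projected_volume(start_tb, years, rate=ANNUAL_GROWTH):
    """Compound a starting volume (in TB) at a constant annual growth rate."""
    return start_tb * (1 + rate) ** years


# Years needed for the volume to double at a constant 62% annual rate.
doubling_time_years = math.log(2) / math.log(1 + ANNUAL_GROWTH)

# Hypothetical starting point: 100 TB of unstructured data.
for year in range(6):
    print(year, round(projected_volume(100, year), 1))
print("doubling time (years):", round(doubling_time_years, 2))
```

Under these assumptions the volume doubles in under a year and a half, which is why a "preserve everything" posture becomes harder to sustain with each passing year.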

The Principle of Facts First

A Facts First methodology is designed to address the explosion of e-discovery costs that organizations have experienced over the last decade. Until recently, cost control measures were limited to negotiating acceptable keywords, sending documents overseas for first-pass review, early case assessment (ECA) after ESI collection, and fighting “cost-shifting” battles. None of these approaches, however, addresses the problem early in the process. They tend to focus on post-collection e-discovery phases, after significant time and money have already been spent.

A Facts First methodology is specifically tailored to each applicable phase of the Electronic Discovery Reference Model (EDRM), from Identification up to Production. The goal is to identify the roughly 1% of ESI at the outset of a case that will enable legal teams to eliminate 80% or more of the non-relevant ESI. With Facts First, legal teams gain opportunities to dispose of matters favorably as early as possible by:

  • Limiting the scope of discovery prior to Rule 26(f) “meet and confer” or case settlement negotiations;
  • Precisely identifying sought-after documents for production pursuant to Rule 26(g); and
  • Minimizing the number of documents that ultimately require attorney review and the associated costs.

Facts First in Practice

Facts First functions by leveraging information gained during the earliest EDRM phases to precisely locate relevant and responsive ESI and prevent irrelevant or non-responsive documents from reaching attorney review. Following are some examples of Facts First in practice.

Identification

Identification offers an ideal opportunity to narrow the ESI by rapidly identifying the “must have” custodians, reviewing their respective repositories, locating the types of documents that will most likely advance the litigant's case as well as those that will ultimately require attorney review, and eliminating broad groups of custodians and non-custodial data sources (NCDS) that are not relevant. This step should begin with custodian interviews so that the legal team can uncover the principal information about the matter, including potential repositories under the custodians' control and potential NCDS (e.g., SharePoint), and gain recommendations for other potential custodians. This should include historical scoping to identify potential custodians who have changed departments or functions over time. Information gained from the identification practices will enable counsel to develop and implement informed, effective and narrowly tailored legal hold orders.

Early Case Assessment

Industry participants place Early Case Assessment (ECA) at various points on the EDRM, somewhere after Identification and before Production. In practice, ECA represents a “meta” phase and encompasses efforts to determine the scope and merits of the matter, develop litigation strategies, make try-or-settle decisions and prepare for Rule 26(f) meet-and-confer sessions. With Facts First, identifying responsive ESI and the most relevant case facts is accomplished by leveraging information gained during the identification phase and creating an index of the enterprise's ESI ecosystem using a software agent. Once the index is built, legal teams can apply any number of search tools, from traditional keyword filtering to machine-learning technology, to rapidly determine which documents are relevant and responsive, as well as identify those documents upon which the disposition of the matter will be based. All of the foregoing is accomplished before collection takes place.
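The index-then-search idea described above can be illustrated with a minimal sketch of an inverted index and a keyword filter. This is only an assumption about what such an index might look like in its simplest form, not a description of any particular software agent; the document IDs, sample text and function names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, keywords):
    """Return IDs of documents containing every keyword (a simple AND query)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents indexed in place, before any collection occurs.
documents = {
    "mail-001": "Pricing agreement draft for the Acme contract",
    "mail-002": "Lunch menu for Friday",
    "doc-003": "Acme contract pricing revised after the call",
}

index = build_index(documents)
hits = search_all(index, ["acme", "pricing"])
print(sorted(hits))
```

Because the index stores only tokens and document IDs, queries like this can be run and refined without first copying the underlying files, which is the point of performing the assessment before collection.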

Collection

At the point of Collection, the ESI funnel can be narrowed further. Most e-discovery practitioners are accustomed to processing data after Collection via a third-party service provider or with a specialized processing tool. Newer collection technologies can cull duplicates and irrelevant system files automatically during the collection process, creating a more manageable document set without the need to pay for separate processing charges. This approach also provides faster access to the evidence for first-pass and privilege review internally or by outside counsel.
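As a rough illustration of culling during collection, the sketch below skips files with system-file extensions and drops exact duplicates by comparing SHA-256 content hashes. The extension list, file names and function names are hypothetical, and exact-hash matching is only one simple way to detect duplicates; the article does not say which technique any given collection tool uses.

```python
import hashlib
import os

# Hypothetical list of system-file extensions; a real matter would use an agreed list.
SYSTEM_EXTENSIONS = {".dll", ".exe", ".sys", ".tmp"}


def cull(files):
    """Split (name, content) pairs into kept files and skipped files with a reason.

    Exact duplicates are detected by comparing SHA-256 hashes of file content.
    """
    seen_hashes = set()
    kept, skipped = [], []
    for name, data in files:
        extension = os.path.splitext(name)[1].lower()
        if extension in SYSTEM_EXTENSIONS:
            skipped.append((name, "system file"))
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen_hashes:
            skipped.append((name, "duplicate"))
            continue
        seen_hashes.add(digest)
        kept.append(name)
    return kept, skipped


# Hypothetical in-memory stand-ins for files encountered during collection.
files = [
    ("report.docx", b"Q3 forecast"),
    ("copy of report.docx", b"Q3 forecast"),
    ("kernel32.dll", b"\x00\x01"),
    ("notes.txt", b"call with vendor"),
]

kept, skipped = cull(files)
print(kept)
print(skipped)
```

Recording a reason for each skipped file, as the sketch does, is one way to keep the culling step explainable if the collection is later challenged.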

Review

Even after ESI volumes have been significantly reduced, relying on manual, antiquated approaches to analyze and review ESI for relevancy and privilege can still result in exorbitant costs and time delays. Numerous studies have exposed the shortcomings of traditional, linear human review when compared to more advanced technology-assisted methods. The application of machine intelligence and human expertise to ESI document sets in order to minimize the necessity of human review is variously referred to as predictive coding, computer-assisted review (CAR), or technology-assisted review (TAR).

Whatever the nomenclature, the relentlessly growing body of ESI has made the application of machine-learning technology necessary for compliance with the FRCP and judicial demands.

The latest application of machine learning, predictive intelligence, relies on training a computer (building a “model”) to predict which documents presented to it are most likely to be responsive. That model can then be applied to data sets of nearly any size before and after collection, enabling rapid and accurate analysis, retrieval and review of responsive documents. Moreover, once a model is created, it can be applied to additional data sets requiring review in the future; in fact, a library of models can be created and developed over time to address different types of matters.
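To make the idea of "training a model" concrete, here is a minimal sketch of a naive Bayes text classifier built from a handful of attorney-labeled examples. Naive Bayes is used here only because it is compact enough to show in full; it is an assumption for illustration, not necessarily the technique behind any predictive coding product. The training documents and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Build a model from (text, is_responsive) pairs coded by a reviewer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def log_score(model, text, label):
    """Log prior plus log likelihood of the text under one label (add-one smoothing)."""
    counts = model["word_counts"][label]
    total_words = sum(counts.values())
    vocab_size = len(model["vocab"])
    prior = model["doc_counts"][label] / sum(model["doc_counts"].values())
    score = math.log(prior)
    for token in tokenize(text):
        if token in model["vocab"]:  # ignore words never seen in training
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
    return score


def is_responsive(model, text):
    """Predict responsiveness by comparing the two label scores."""
    return log_score(model, text, True) > log_score(model, text, False)


# Hypothetical seed set coded by a reviewer (True = responsive).
seed_set = [
    ("pricing agreement with competitor", True),
    ("discuss competitor pricing before the bid", True),
    ("team lunch on friday", False),
    ("holiday party schedule", False),
]

model = train(seed_set)
print(is_responsive(model, "competitor pricing call"))
print(is_responsive(model, "friday lunch schedule"))
```

The returned `model` is just a set of counts, so it can be saved and applied to later data sets, which is the property that makes a reusable library of models possible.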

Facts First Beyond e-Discovery

Going beyond traditional e-discovery, Facts First can be used to proactively obtain intelligence from an ESI set for a variety of legally related situations, such as:

  • Second requests. Both the U.S. Department of Justice and the Federal Trade Commission can request in excess of one million documents from a corporation pursuant to a merger with or acquisition of another, in order to determine whether such a combination will potentially result in a restraint of trade. Facts First speeds what is essentially an e-discovery process and substantially reduces attorney review costs.
  • Internal investigations. A thorough internal investigation often follows an industrial accident or suspected wrongdoing by an insider to an organization. Such investigations typically focus on electronic documents and communications. Facts First can greatly assist in identifying percipient witnesses, illuminating a chain of events and developing theories as to an event's cause or causes.
  • Protection of intellectual property. Loss of intellectual property and confidential information through employee negligence and malicious outsiders is a constant, growing problem for organizations. It has resulted in the creation of the legal-technical discipline of data loss prevention (DLP). Facts First can be proactively used to promote effective DLP through identification of employees who possess, and NCDS that contain, materials that have not been properly marked as confidential or protected from distribution.
  • Compliance. Ensuring compliance with requirements imposed by regulatory agencies or trade groups, or internally by policy and by quality standards such as Six Sigma, requires e-discovery practices. A leading example of an externally imposed requirement is the protection of personally identifiable information (PII). U.S. regulatory agencies require organizations to protect PII in differing contexts, such as medical records and financial information. The same processes used to identify and label relevant and responsive ESI can also be applied to an organization's ESI ecosystem in order to identify the nature of the ESI under the control of custodians and of that contained in NCDS, as well as the location of duplicates and near-duplicates, enterprise-wide. ESI containing evidence of policy or quality control failures can likewise be identified and located in order to improve operations and prevent larger problems from manifesting.
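As a simple illustration of the compliance use case in the list above, the sketch below flags text that contains strings shaped like U.S. Social Security numbers or e-mail addresses. The patterns, sample text and function name are hypothetical and deliberately crude; pattern matching of this kind will produce both false positives and misses, and it is offered as an assumption about one possible starting point rather than a complete PII detector.

```python
import re

# Hypothetical, deliberately simple patterns; real PII detection needs far more care.
PII_PATTERNS = {
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text):
    """Return a dict mapping each pattern name to the matches found in the text."""
    found = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


# Hypothetical sample text standing in for a document in a custodian's repository.
sample = "Employee SSN 123-45-6789, contact jane.doe@example.com"
print(find_pii(sample))
print(find_pii("Quarterly results attached"))
```

Run across an index of custodian and non-custodial sources, a check like this could help show where sensitive material sits before a regulator or opposing party asks.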

Conclusion

Facts First meets the legal, logistical and economic challenges presented by today's e-discovery process requirements, with an emphasis on locating relevant and key documents that can control costs and lead to more favorable case outcomes. Legal teams must not only address larger data sets but also account for evolving data forms.

Discoverable ESI now resides on mobile devices, cloud-based e-mail systems and social media sites. Data is everywhere, and anything powered by electricity will more often than not produce some form of ESI. These data sets may be amalgamations of “sensory” data (log files and other metadata), social media, structured data (relational databases) and unstructured data (e-mail, application files). A Facts First methodology addresses the need to narrow the ESI funnel. Once the disciplines of Facts First are successfully established, the opportunities to combat the Big Data evolution and protect the organization in legal and operational contexts are nearly limitless.


Scott Giordano is corporate technology counsel at Exterro. Giordano holds both Certified Information Systems Security Professional (CISSP) and Certified Information Privacy Professional (CIPP) certifications and serves as Exterro's subject matter expert on the intersection of law and technology as it applies to e-discovery, information governance, compliance and risk management issues.

