Field-Based Intelligence

By David Deppe
November 02, 2015

Has technology-assisted review (TAR) finally turned a corner and earned broad acceptance in the legal community? Recent comments by the influential and technology-savvy Magistrate Judge Andrew Peck, published in a March 2015 decision, suggest that TAR has moved beyond the controversial stage and entered the mainstream of e-discovery practice. See Rio Tinto PLC v. Vale S.A., No. 14 Civ. 3042 (S.D.N.Y. Mar. 2, 2015).

Culling Before TAR

“In the three years since da Silva Moore,” writes Judge Peck, “case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.” Id., referencing Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350, at *19 (S.D.N.Y. Feb. 24, 2012). Judge Peck points out, however, that courts have not generally approved of requesting parties trying to force producing parties to use TAR, and he also notes there are still “open” issues related to use of the technology, most notably the degree to which parties need to be transparent and cooperative with regard to selection of the seed sets used to “train” the TAR system to identify evidence likely to be responsive.

Litigants are increasingly turning to TAR because they have not found an efficient way to separate what they are looking for from what they are not before document review begins. After all, document review remains the single largest cost driver in discovery, and TAR, when planned and executed correctly, can cost less than linear document review. At the same time, corporate law departments, law firms and their clients are examining every option for defensibly reducing their total project costs and annual budgets.

Challenges to TAR thus far have focused on the methods parties use to shrink the document universes they subject to TAR. In In re Biomet, No. 3:12-MD-2391 (N.D. Ind. April 18, 2013), the defendant used keyword searches and deduplication to reduce its potentially responsive universe from 19.5 million to 2.5 million documents. In Rio Tinto, the defendant used search terms to eliminate almost 75% of the documents in its universe. The plaintiffs in both matters lost challenges asserting that keyword searches do not return an acceptable recall of responsive information, and the respective courts approved these culling methods prior to TAR. One key takeaway: subjecting the entire collected data universe, or even entire user directories, to TAR without first intelligently filtering is cost prohibitive.
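To make the culling step concrete, here is a minimal sketch of the two techniques Biomet combined: hash-based deduplication followed by keyword filtering. It is illustrative only; the directory layout, term list and plain-text assumption are mine, and real-world culling runs inside a processing platform rather than a standalone script.

```python
import hashlib
from pathlib import Path

# Hypothetical negotiated search terms (illustrative, not from any matter).
SEARCH_TERMS = ["recall", "adverse event", "design defect"]

def cull(collection_dir: str) -> list[Path]:
    """Drop exact duplicates by content hash, then keep only documents
    that contain at least one search term."""
    seen: set[str] = set()
    survivors: list[Path] = []
    for path in Path(collection_dir).rglob("*.txt"):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen:                       # exact duplicate: cull it
            continue
        seen.add(digest)
        text = path.read_text(errors="ignore").lower()
        if any(term in text for term in SEARCH_TERMS):
            survivors.append(path)               # candidate for TAR or review
    return survivors
```

The same two-stage shape, global dedup first and term hits second, is what lets a 19.5-million-document universe shrink by an order of magnitude before TAR ever runs.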

Field-Based Intelligence

In most cases, information gathering in preparation for discovery is disjointed: A hold notice is issued to employees selected by the company. Outside counsel is hired by the company. Outside counsel conducts substantive interviews with the custodians (although not always). Data is collected and sent to a vendor, but none of the intelligence gathered in the interviews is passed on to the vendor. Outside counsel drafts search terms. The vendor runs the search terms and reports volumes back to outside counsel. Terms are revised based on the volume reports, but no one looks at a single document or at the relationships between the documents containing a search term. At this point, a large universe of documents has been identified for review and the client must decide: Will TAR be less expensive than linear review? That, however, is the wrong question.

A better question is: “What percentage of the documents we review actually get produced?” I have been asking this question for the last two years, and the average answer is 10%. That means roughly 90% of review spend goes to documents that are never produced, which is the single largest cost-reduction opportunity in the process.
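As a back-of-the-envelope illustration of that opportunity (the volume and per-document rate below are assumptions, not figures from any matter):

```python
docs_reviewed = 1_000_000      # assumed review population
cost_per_doc = 1.50            # assumed blended review rate, USD
production_rate = 0.10         # the 10% average cited above

total_review_cost = docs_reviewed * cost_per_doc
spend_on_unproduced = total_review_cost * (1 - production_rate)
print(f"Total review cost:       ${total_review_cost:,.0f}")    # $1,500,000
print(f"Spent on never-produced: ${spend_on_unproduced:,.0f}")  # $1,350,000
```

Even modest cuts to that 90% slice dwarf most other line items in a discovery budget.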

So where do we begin? “Field-based intelligence” is gathered during data collections, custodian interviews and the process-driven exchange of information between the custodians, outside counsel, the client and the consultant driving the technology. Examples include names and details of the opposing party's employees, the specific custodians communicated with and the nature of the communications related to the claims at issue, the specific types of documents likely to be responsive to requests, and the nature of communications sought by counsel. That intelligence can then be applied to the collected data to quickly find, in the first instance, what you are looking for. With this approach, you can begin to identify the sets of non-responsive documents returned by your search terms that make up the 90% not being produced.

Think of field-based intelligence as a surgeon using a scalpel rather than an axe. It is a concerted, human-led, machine-assisted effort to understand what the custodians know, with whom they have communicated and the types of data used around each claim or issue. This exercise lets the practitioner quickly identify and validate specific examples of what they are looking for. Using those validated examples to test the search, both for false negatives the terms missed and for the false positives they returned, yields significant and defensible data reduction, which has a material impact on cost savings and maximizes the richness of the dataset prior to TAR or linear review.
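One way to put numbers on that validation step: sample the documents the terms hit and the documents they missed, have a reviewer label each sample, and estimate the search's precision and miss rate. A minimal sketch, assuming in-memory document lists and a reviewer callback (both hypothetical):

```python
import random

def estimate_search_quality(hits, non_hits, is_responsive, n=200, seed=42):
    """Estimate precision from a sample of term hits and the false-negative
    rate from a sample of non-hits. `is_responsive(doc)` stands in for a
    human reviewer's judgment call."""
    rng = random.Random(seed)
    hit_sample = rng.sample(hits, min(n, len(hits)))
    miss_sample = rng.sample(non_hits, min(n, len(non_hits)))

    precision = sum(map(is_responsive, hit_sample)) / len(hit_sample)
    miss_rate = sum(map(is_responsive, miss_sample)) / len(miss_sample)
    return precision, miss_rate

# Low precision means a large, defensibly removable false-positive set;
# a non-trivial miss rate means the terms need another pass.
```

A 30% precision figure, for example, would mean roughly two of every three documents headed to review are already known to be noise.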

Unlike TAR, which is primarily machine-driven, field-based intelligence leverages a combination of targeted automation and the data analysis expertise of experienced consultants to reduce data volumes and aggregate intelligence in a systematic way, and at an earlier stage of the e-discovery process.

At my own organization, we use a process and application called Questio. Here's how it works: In-house and outside counsel engage directly with the collected dataset, viewing the application of the aggregated intelligence in the Questio platform during sessions driven by a Questio consultant. We identify “hot” or responsive documents and non-responsive documents in the first 24 hours. Outside counsel then validates those result sets and the documents move on to the next stage: positive results are promoted to a review platform, and negative results are excluded and remain in Questio. To be clear, outside counsel or the client is making the call based on clear, defensible intelligence, not UnitedLex or Questio. The idea is to perform highly targeted, intelligent extractions after collection and before processing, hosting and review, then apply the aggregated intelligence to the dataset in Questio.
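Questio itself is proprietary, so the sketch below is not its API; it is a generic illustration of the promote-or-exclude gate the paragraph describes, with counsel's validation as the only input that moves a document:

```python
from enum import Enum, auto

class Disposition(Enum):
    PROMOTE = auto()   # validated responsive: send to the review platform
    EXCLUDE = auto()   # validated non-responsive: hold in the early-stage tool
    PENDING = auto()   # no call from counsel yet

def apply_counsel_calls(batch: dict[str, bool | None]) -> dict[str, Disposition]:
    """Map counsel's validation calls onto dispositions. The system records
    the decision; it never makes it."""
    dispositions = {}
    for doc_id, call in batch.items():
        if call is None:
            dispositions[doc_id] = Disposition.PENDING
        else:
            dispositions[doc_id] = Disposition.PROMOTE if call else Disposition.EXCLUDE
    return dispositions
```

The design point is the human-in-the-loop gate: automation aggregates and surfaces, but only a counsel or client decision changes a document's status.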

Advantages of Intelligence-Based Approach

Of the many advantages, perhaps the most significant is the enhancement of the downstream e-discovery process. Surfacing the relationships between the litigating parties' employees and key issues, months before they might otherwise be identified, can change a litigator's strategy. The ability to quantify cost savings at the matter level is also critical. Relying solely on TAR at the review phase can significantly delay identification of key documents, relationships and areas of risk, and increase the total project cost. An intelligence-based approach offers a logical blend of technology and services earlier in the e-discovery process.

In fact, we developed this technology and process because nothing on the market could apply a scientific data reduction methodology before the processing stage. The growth of complex data and file types is increasing the number of documents that are resistant to most TAR systems and thus require manual review. The presence of such documents, coupled with the need to perform full review of post-TAR responsive sets, can easily undermine the total-project-cost rationale that often justifies the use of TAR.

In determining the cost implications of TAR, take the time to measure your discovery spend on past projects: it will show you what kinds of data you typically deal with and what it costs to funnel that data through the discovery process. Here are two ways to measure your discovery spend per matter and compare across all matters (a short sketch of both calculations follows the list):

  1. Total project cost (all e-discovery and document review costs) divided by the volume of data (GB) ingested. This yields a cost per ingested GB that can be compared across matters.
  2. Total project cost divided by the number of documents reviewed to give you a total cost per document reviewed.
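As a minimal sketch of those two benchmarks (the field names and figures are assumptions; the real inputs come from your billing and processing reports):

```python
from dataclasses import dataclass

@dataclass
class Matter:
    name: str
    total_cost: float     # all e-discovery + document review spend, USD
    gb_ingested: float    # data volume ingested into processing
    docs_reviewed: int    # documents that reached human review

def spend_metrics(matters: list[Matter]) -> None:
    """Print both per-matter benchmarks for side-by-side comparison."""
    for m in matters:
        print(f"{m.name}: ${m.total_cost / m.gb_ingested:,.0f}/GB ingested, "
              f"${m.total_cost / m.docs_reviewed:.2f}/doc reviewed")

# Illustrative figures only:
spend_metrics([
    Matter("Matter A", total_cost=750_000, gb_ingested=500, docs_reviewed=400_000),
    Matter("Matter B", total_cost=1_200_000, gb_ingested=900, docs_reviewed=650_000),
])
```

Tracked across a portfolio, either number makes it obvious which matters, and which vendors, are outliers.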

These are metrics you can easily obtain historically, as well as apply to existing and future matters, to measure your success in achieving your lowest total project cost. Calculate your average total project cost per GB ingested and you may find you have materially simplified your bidding process: bidding all services across the litigation lifecycle at a fixed cost per GB ingested would end the challenges of rate and line-item comparisons and fully align the interests of all parties.


David Deppe is the president of UnitedLex Corporation and is responsible for the international management of Litigation Services, Investigations and Cyber Risk Solutions. He has worked closely with government agencies, top-50 national law firms, and Global 500 companies.
