Field-Based Intelligence

By David Deppe
November 02, 2015

Has technology-assisted review (TAR) finally turned a corner and earned broad acceptance in the legal community? Some recent comments by the influential and technology-savvy Magistrate Judge Andrew Peck, published in a March 2015 decision, suggest that TAR has moved beyond the controversial stage and into the mainstream of e-discovery practice. See Rio Tinto PLC v. Vale S.A., No. 14 Civ. 3042 (S.D.N.Y. March 2, 2015).

Culling Before TAR

“In the three years since da Silva Moore,” writes Judge Peck, “case law has developed to the point that it is now black letter law that where the producing party wants to utilize TAR for document review, courts will permit it.” Id., referencing Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350, at *19 (S.D.N.Y. Feb. 24, 2012). Judge Peck points out, however, that courts have not generally approved of requesting parties trying to force producing parties to use TAR, and he also notes there are still “open” issues related to use of the technology, most notably the degree to which parties need to be transparent and cooperative with regard to selection of the seed sets used to “train” the TAR system to identify evidence likely to be responsive.

Litigants are increasingly turning to TAR because they have not found an efficient way to separate the documents they are looking for from the ones they are not before review begins. After all, document review is still the single largest cost driver in discovery, and TAR, when planned and executed correctly, can cost less than linear document review. Accordingly, corporate law departments, law firms and their clients are looking at any and all options to defensibly reduce their total project costs and annual budgets.

Challenges to TAR thus far have centered on the methodologies parties use to shrink the document universes they subject to TAR. In In re Biomet, No. 3:12-MD-2391 (N.D. Ind. April 18, 2013), the defendant used keyword searches and deduplication to reduce its potentially responsive universe from 19.5 million to 2.5 million documents. In Rio Tinto, the defendant used search terms to eliminate almost 75% of the documents in its universe. The plaintiffs in both matters unsuccessfully challenged these methods, asserting that keyword searches do not return an acceptable recall of responsive information, and the respective courts approved culling before TAR. One key takeaway: subjecting the entire collected data universe, or even entire user directories, to TAR without first intelligently filtering is cost prohibitive.

Field-Based Intelligence

In most cases, information gathering in preparation for discovery is disjointed: The company issues a hold notice to selected employees and hires outside counsel. Outside counsel conducts substantive interviews with the custodians (although not always). Data is collected and sent to a vendor, but none of the intelligence gathered in the interviews is passed along. Outside counsel develops search terms. The vendor runs them and reports volumes back. Terms are revised based on the volume reports, but no one looks at a single document or at the relationships between the documents containing a search term. At this point, a large universe of documents has been identified for review and the client must decide: Will TAR be less expensive than linear review? That, however, is the wrong question.

A better question would be: “What percentage of the documents we review gets produced?” I have been asking this question for the last two years, and the average answer is 10%. In other words, roughly 90% of reviewed documents are never produced, and that 90% represents the single largest opportunity to reduce review costs.

So where do we begin? “Field-based intelligence” is gathered during data collections, custodian interviews and the process-driven exchange of information among the custodians, outside counsel, the client and the consultant driving the technology. Examples include the names and details of the opposing party's employees; which custodians communicated with them, and about what, in relation to the claims at issue; and the specific types of documents and communications likely to be responsive to requests. That intelligence can then be applied to the collected data to quickly, and in the first instance, find what you are looking for. With this approach, you can begin to identify the sets of non-responsive documents returned by your search terms that make up the 90% not being produced.

Think of field-based intelligence as a surgeon using a scalpel rather than an axe. It is a concerted, human-led, machine-assisted effort to understand what the custodians know, with whom they have communicated and the types of data used around each claim or issue. This exercise enables the practitioner to quickly identify and validate specific examples of what they are looking for. Using those positive validations to flag false negatives in the search for relevant data, and to cull documents that are plainly non-responsive, yields significant data reduction, which has a material impact on cost savings and maximizes the richness of the dataset prior to TAR or linear review.
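To make the concept concrete, here is a minimal sketch, in Python, of how interview-derived intelligence, such as known opposing-party contacts and issue-specific language, might be applied as a targeted filter across a collected dataset. Every name, address and term below is invented for illustration; this is an abstraction of the approach, not UnitedLex's or Questio's actual implementation.

    # Illustrative sketch only: all contacts, terms and documents below are
    # hypothetical. Real field-based intelligence workflows are consultant-
    # driven and far richer than a single filtering predicate.
    from dataclasses import dataclass

    @dataclass
    class Document:
        doc_id: str
        custodian: str
        participants: set   # senders/recipients on the communication
        text: str

    # Intelligence gathered in custodian interviews (hypothetical examples):
    # which opposing-party employees our custodians dealt with, and the
    # language tied to the claims at issue.
    OPPOSING_CONTACTS = {"j.doe@opposing.example", "k.lee@opposing.example"}
    ISSUE_TERMS = {"royalty", "license amendment"}

    def likely_responsive(doc):
        """A document is a strong candidate when a known opposing-party
        contact appears on it AND it uses language tied to a claim."""
        knows_opponent = bool(doc.participants & OPPOSING_CONTACTS)
        mentions_issue = any(t in doc.text.lower() for t in ISSUE_TERMS)
        return knows_opponent and mentions_issue

    docs = [
        Document("D1", "smith", {"j.doe@opposing.example"}, "Re: royalty schedule"),
        Document("D2", "smith", {"hr@company.example"}, "Holiday party logistics"),
    ]

    candidates = [d for d in docs if likely_responsive(d)]    # validate, then promote
    excluded = [d for d in docs if not likely_responsive(d)]  # retained, still auditable
    print([d.doc_id for d in candidates], [d.doc_id for d in excluded])  # ['D1'] ['D2']

The point of the sketch is the ordering: validated human intelligence drives the first cut, before any machine-learning model sees the data.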

Unlike TAR, which is primarily machine-driven, field-based intelligence leverages a combination of targeted automation and the data analysis expertise of experienced consultants to reduce data volumes and aggregate intelligence in a systematic way, and at an earlier stage of the e-discovery process.

At my own organization, we use a process and application called Questio. Here's how it works: In-house and outside counsel engage directly with the collected dataset, viewing the application of the aggregated intelligence in the Questio platform during sessions driven by a Questio consultant. “Hot” or responsive documents and non-responsive documents are identified within the first 24 hours. Outside counsel then validates those result sets and the documents move to the next stage: positive results are promoted to a review platform, while negative results are excluded and remain in Questio. To be clear, outside counsel or the client makes the call based on clear, defensible intelligence, not UnitedLex or Questio. The idea is to perform highly targeted, intelligent extractions after collection and before processing, hosting and review, applying the aggregated intelligence to the dataset in Questio.
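Abstracted into simple Python, the staged flow described above might look like the following. The stages and decisions are invented for illustration and do not reflect Questio's actual design.

    # Hypothetical model of the staged flow: candidate sets are surfaced,
    # counsel validates them, and only validated positives are promoted to
    # the review platform. An invented abstraction, not Questio's design.
    from enum import Enum

    class Stage(Enum):
        CANDIDATE = "surfaced by intelligence-driven search"
        PROMOTED = "validated responsive; sent to review platform"
        EXCLUDED = "validated non-responsive; retained in place"

    def counsel_validates(looks_responsive):
        """Counsel or the client, not the vendor or tool, makes the call."""
        return Stage.PROMOTED if looks_responsive else Stage.EXCLUDED

    pipeline = {"D1": Stage.CANDIDATE, "D2": Stage.CANDIDATE}
    counsel_calls = {"D1": True, "D2": False}   # hypothetical decisions

    for doc_id, call in counsel_calls.items():
        pipeline[doc_id] = counsel_validates(call)
    print({k: v.name for k, v in pipeline.items()})  # {'D1': 'PROMOTED', 'D2': 'EXCLUDED'}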

Advantages of Intelligence-Based Approach

Of the many advantages, perhaps the most significant is the enhancement of the downstream e-discovery process. Understanding the relationships between litigating parties' employees and key issues that may otherwise not have been identified for months can change a litigator's strategy. The ability to quantify cost savings at the matter level is critical. Relying solely on TAR at the review phase can significantly limit timely identification of key documents, relationships and areas of risk, as well as increase the total project cost. An intelligence-based approach offers a logical blend of technology and services earlier in the e-discovery process.

In fact, we developed this technology and process because there was nothing available on the market through which a scientific data reduction process could be applied before processing. The growth of complex data and file types is increasing the number of documents that are resistant to most TAR systems and thus require manual review. The presence of such documents, coupled with the need to perform full review of post-TAR responsive sets, can easily undermine the total project cost reduction rationale that often justifies the use of TAR.

In determining the cost implications of TAR, take the time to measure your discovery spend on past projects so you have a better understanding of the kinds of data you typically deal with and know what it costs to funnel it through the discovery process. Here are two ways to measure your discovery spend per matter and compare across all matters:

  1. Total project cost (all e-discovery and document review costs) divided by the volume of data (GB) ingested. This gives you a cost per GB ingested that you can compare across matters.
  2. Total project cost divided by the number of documents reviewed. This gives you a total cost per document reviewed. (Both metrics are illustrated in the short sketch below.)
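As a worked illustration, the sketch below computes both metrics for two matters. The dollar amounts, data volumes and document counts are invented and carry no benchmark significance.

    # Both spend metrics, computed over hypothetical matters. All figures
    # are invented for illustration only.
    matters = {
        # matter: (total project cost in USD, GB ingested, documents reviewed)
        "Matter A": (450_000, 300, 90_000),
        "Matter B": (1_200_000, 1_000, 200_000),
    }

    for name, (cost, gb, docs) in matters.items():
        cost_per_gb = cost / gb       # metric 1: cost per GB ingested
        cost_per_doc = cost / docs    # metric 2: cost per document reviewed
        print(f"{name}: ${cost_per_gb:,.0f}/GB ingested, ${cost_per_doc:.2f}/document")

Run across a portfolio of closed matters, the first metric is exactly the number on which a fixed cost-per-GB bid, discussed below, would be built.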

These are metrics you can easily compute historically, as well as apply to existing and future matters to measure your success in achieving your lowest total project cost. If you calculate your average total project cost per GB ingested, you may find you have materially simplified your bidding process: bidding all services across the litigation lifecycle at a fixed cost per GB ingested would end the challenges associated with rate and line-item comparisons and fully align the interests of all parties.


David Deppe is the president of UnitedLex Corporation and is responsible for the international management of Litigation Services, Investigations and Cyber Risk Solutions. He has worked closely with government agencies, top-50 national law firms, and Global 500 companies.

