
Technology Assisted Review: Much More Than Predictive Coding

By Greg Buckles
July 30, 2012

Last June, Recommind stole a march in the e-discovery market with a patent for its predictive coding (PC) offering. The patent covers Recommind's systems and methods for iterative computer-assisted document analysis and review, and came just as a wave of different technology assisted review (TAR) offerings hit the market.

The result was a tumultuous year where confusion reigned: What is PC? What does the Recommind patent cover, and can other vendors offer PC? What about all the other predictive-type solutions flooding the market?

With some case law now beginning to emerge almost a year later, the market has recognized that Recommind's PC methodology and use case are only a small part of the bigger TAR picture, and that it is time for legal teams to embrace new, advanced review methodologies.

The bottom line is that, in the context of today's advanced technological world, TAR is about using a combination of technology and people to speed, improve and sometimes automate elements of the legal review process in a way that reduces costs and improves quality.

The eDJ Group has been conducting surveys and interviews to get a clearer picture of market adoption and attitudes. Interestingly, a quick graph of average Google hits per month for the search term “predictive coding” reveals a rapidly increasing use of the term that peaked at LTNY 2012 and has begun to decline despite recent related cases. See Figure 1 below.

[Figure 1: Average monthly Google hits for the search term “predictive coding”]

The broader search term “technology assisted review” first appeared on the Internet in the middle of last year and has gained traction, most likely because it is a more suitable term to describe a market in which PC is but one advanced method. A recent eDiscovery Journal poll showed almost 60% of respondents preferred the broader term TAR to the narrower PC. See Figure 2 below.

[Figure 2: eDiscovery Journal poll on preference for “technology assisted review” vs. “predictive coding”]

TAR is not simply about determining which documents are relevant and/or privileged and marking them as such; rather, TAR is more broadly applicable in other scenarios:

  • Pre-collection. Early case assessment (ECA): identification of custodians, sources and collection criteria.
  • Processing. Culling and collection organization.
  • Review. Clustering, relationship extraction and more.
  • Post-review. Quality assurance and iterative collection refinement.

The majority of TAR users eDJ interviewed were more comfortable using TAR pre- and post-review than actually allowing the system to make relevance decisions on the final “collection.” Recent high-profile cases highlight some of the issues around TAR and how it is applied in practice. The issues in one of these cases, Kleen Products v. Packaging Corporation of America, can be seen as a TAR-generational conflict: search optimization vs. concept training. Both methods utilize iterative sampling processes based on human decisions to include or exclude ESI. The dominant TAR methods appear to fall into three primary groupings:

  1. Rules-driven. “I know what I am looking for and how to profile it.” Human experts extract the common criteria for searches or rules. Examples: search optimization, linguistic analysis, filtering.
  2. Facet-driven. “I let the system show me the profile groups first.” The collection is analyzed and profiled to identify groups. Examples: clustering, concepts, social network analysis.
  3. Propagation-driven. “I start making decisions and the system looks for similar or related items.” Sample and known seed sets are reviewed and the system learns commonalities from the decisions. Examples: near-duplicate expansion, predictive coding.

These TAR mechanisms are not mutually exclusive. In fact, combining them can help overcome the limitations of individual approaches. For example, if a document corpus is not rich (i.e., does not contain a high enough percentage of relevant documents), it can be hard to create a seed set that will serve as a good training set for a propagation-based system. It is, however, possible to use facet-based TAR methods such as concept searching to find relevant documents more quickly and use them to build a relevance model that the propagation-based system can leverage.
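
To make that combination concrete, the sketch below pairs a facet-style clustering pass with a propagation-style classifier using the open-source scikit-learn library. It is an illustration of the concept only, not any vendor's product or the method at issue in the cases discussed here; load_corpus() and review_team_codes() are hypothetical stand-ins for a real collection and a real coding workflow.

```python
# Minimal sketch: use clustering (a facet-driven technique) to pick a diverse
# candidate seed set, then train a simple classifier (a propagation-driven
# technique) on the reviewers' decisions about those seeds.
# load_corpus() and review_team_codes() are hypothetical placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import euclidean_distances

documents = load_corpus()                        # hypothetical: list of document texts
vectorizer = TfidfVectorizer(stop_words="english", max_features=50000)
X = vectorizer.fit_transform(documents)

# Facet step: group the collection and take the document nearest each cluster
# center, so reviewers see one representative from every conceptual neighborhood.
kmeans = KMeans(n_clusters=20, random_state=42).fit(X)
seed_ids = euclidean_distances(X, kmeans.cluster_centers_).argmin(axis=0)

# Propagation step: reviewers code the seeds; the model then propagates those
# decisions across the rest of the collection as relevance scores.
seed_labels = review_team_codes(seed_ids)        # hypothetical: 1 = relevant, 0 = not
model = LogisticRegression(max_iter=1000).fit(X[seed_ids], seed_labels)
relevance = model.predict_proba(X)[:, 1]         # probability of relevance per document
review_queue = np.argsort(relevance)[::-1]       # highest-scoring documents first
```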

The Da Silva Moore v. Publicis Groupe case raised customer interest in trying TAR solutions because of claims that certain products or methods had been approved or endorsed. One should note, however, that no tool or specific process has been generally approved or endorsed; rather, the use of TAR has been allowed in cases where the parties have agreed on it, or has been permitted to proceed subject to objections based on the results.

It is important to understand TAR in the context of priorities: people, process and then technology. Most e-discovery teams have adapted traditional linear review workflows from paper documents to ESI collections. TAR solutions step out of the linear review box and introduce concepts such as confidence levels, distribution factors, precision, recall, F1 (a summary measure combining recall and precision) and stability. Someone on the team must understand your chosen TAR solution and be able to explain and defend it in the context of your unique discovery. TAR solutions promise to increase relevance quality while decreasing the time and cost of review. Hold them to that promise by measuring the results. Most courts seem more interested in the quantified output than in the technology underpinning the process; measurement ultimately trumps method.
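
Those measures are simple to compute once a validation sample has been hand-coded. Below is a minimal sketch, again using scikit-learn, with placeholder label arrays rather than real review data.

```python
# Minimal sketch: score a TAR run against a hand-coded validation sample.
# The label arrays below are placeholders; 1 = relevant, 0 = not relevant.
from sklearn.metrics import precision_score, recall_score, f1_score

human_calls  = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]    # reviewer decisions on the sample
system_calls = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]    # the TAR system's predictions

precision = precision_score(human_calls, system_calls)  # share of system-tagged docs that are truly relevant
recall = recall_score(human_calls, system_calls)        # share of truly relevant docs the system found
f1 = f1_score(human_calls, system_calls)                # harmonic mean of precision and recall

print(f"precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")
```

Tracking these numbers across iterations, not just at the end of the project, supports the kind of ongoing measurement discussed below.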

Getting the right expertise in place is critical to practicing TAR in a way that will not only reduce review costs but also stand up in court. Organizations looking to successfully exploit the mechanisms of TAR will need:

  • Experts in the right tools and information retrieval. Software is an important part of TAR. The team executing TAR will need someone who can program the toolset with the rules necessary for the system to intelligently mark documents. Furthermore, information retrieval is a science unto itself, blending linguistics, statistics and computer science. Anyone practicing TAR will need the right team of experts to ensure a defensible and measurable process;
  • A legal review team. While much of the chatter around TAR centers on its ability to cut lawyers out of the review process, the reality is that the legal review team will become more important than ever. The quality and consistency of the decisions this team makes will determine the effectiveness that any tool can have in applying those decisions to a document set; and
  • An auditor. Much of the defensibility and acceptability of TAR mechanisms will rely on statistics that demonstrate how certain the organization can be that the output of the TAR system matches the input specification. Accurate measures of performance are important not only at the end of the TAR process, but also throughout the process in order to understand where efforts need to be focused in the next cycle or iteration. Anyone involved in setting or performing measurements should be trained in statistics.

That brings us back to the crux of the Da Silva Moore arguments: “How do you know when your TAR process is good enough?” How do you assure yourself that your manual review satisfies the standards of reasonable effort?

The answer? Strict quality control during the process, followed by quality assurance with predefined acceptance criteria, plus thorough documentation at every step.

The Da Silva Moore transcripts and expert affidavits contain some interesting arguments on sample sizing and acceptable rates of false-negative results. No sufficiently large relevance review is perfect, but few counsel are ready to hear that truth. We have no firm rules or case law that define discovery quality standards. Therefore, anyone practicing TAR should document TAR decisions and QA/QC efforts with the knowledge that the other side may challenge them.
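
For teams that want to put numbers behind those sampling decisions, the underlying arithmetic is standard statistics. Here is a minimal sketch of the usual normal-approximation sample-size calculation; the 95% confidence level and 2% margin of error are illustrative choices, not standards drawn from the case or endorsed by any court.

```python
# Minimal sketch: sample size needed to estimate a proportion (e.g., the rate
# of missed relevant documents) at a chosen confidence level and margin of
# error, using the normal approximation n = z^2 * p * (1 - p) / e^2.
import math
from statistics import NormalDist

def sample_size(confidence: float, margin_of_error: float, expected_p: float = 0.5) -> int:
    """Return the required sample size; expected_p = 0.5 is the most conservative choice."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided z-score
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / margin_of_error ** 2)

# Example: 95% confidence with a +/-2% margin of error needs roughly 2,401 documents.
print(sample_size(confidence=0.95, margin_of_error=0.02))
```

The harder questions, such as what false-negative rate is acceptable, remain matters of negotiation and documentation rather than formula.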


Greg Buckles is a co-founder and principal analyst of the consultancy eDJ Group. Previously, Buckles served as the senior product manager of e-discovery for Symantec Corporation's Information Foundation group. Buckles is also a member of the Sedona Conference and the EDRM Committees.

