Technology-Assisted Review (TAR) is clearly a hot topic in eDiscovery circles right now. A quick Google search certainly confirms that premise, and reinforces that organizations are looking for new answers to the most expensive aspect of eDiscovery.
The purpose of this article is not to examine whether or not TAR is sound in concept. Data volumes are increasing year over year at an alarming rate, putting a great strain on already-thin resources, and technology employed intelligently has been proven to streamline the document review process and deliver material time and cost savings.
The purpose of this article is instead to examine the role of artificial intelligence in TAR. To date, artificial intelligence has served as a backbone of virtually every form of TAR solution available. However, new concerns and evolving market demands are forcing organizations to re-think its usage in this capacity.
This article thus explores how artificial intelligence has been used effectively in the past, investigates a second form of TAR that delivers equal savings without relying on machine learning, and then offers best practices for getting the most out of each alternative.
A Brief History of TAR
Accelerating document review in litigation is critical because corporate-generated electronically stored information is growing at a rapid pace. To make matters worse, the time constraints in litigation, discovery, and document review are tightening from flooded, inflexible dockets. Historically, organizations have reviewed every document in a collection in order to minimize the perceived risk of producing privileged information or missing relevant data. But volumes are increasing such that this “linear review” is no longer a financially feasible approach.
Technology-Assisted Review has emerged as a viable alternative. This process of leveraging a combination of human input and technology to more rapidly identify potentially relevant data from a document collection has caught on because of its ability to significantly decrease the time and expense of review, addressing the biggest component (up to 75%) of the total eDiscovery budget.
Until recently, some apprehension about this “new” category of technology remained. Important factors such as the exact savings that can be provided, how the technology works, and (importantly) how competing offerings differ posed real concerns for counsel. Those apprehensions are now waning, thanks to more available education and recent court opinions, and TAR has grown in popularity in recent years.
The Role of Artificial Intelligence
The vast majority of TAR systems currently available in the market have been based on various forms of artificial intelligence (e.g., Latent Semantic Indexing). These technologies leverage brute technical force and pattern matching to “predict” document relevance.
With this approach, thorough training is a critical requirement for the machine to make proper predictions of document relevancy. This training, often called a seed set, is developed by taking a small portion of documents from the total collection and analyzing them for potential relevance. These coded documents are then combined with a layer of artificial intelligence, and then the system codes additional documents across the full collection with similar semantic patterns in a “more like this” kind of search. The output is a smaller set of documents that the system determines may be relevant.
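The “more like this” step described above can be illustrated with a minimal sketch. It is not Latent Semantic Indexing and not any vendor's implementation; it assumes a much simpler stand-in (word counts compared by cosine similarity), and the function names, the threshold value, and the sample documents are all hypothetical, invented only for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def more_like_this(seed_relevant, collection, threshold=0.3):
    """Return (doc_id, score) pairs for documents that resemble a relevant seed.

    seed_relevant: texts a human reviewer coded as relevant (the seed set).
    collection:    mapping of doc_id -> text for the full collection.
    threshold:     assumed cutoff; a real system would tune and validate this.
    """
    seed_vectors = [Counter(tokenize(text)) for text in seed_relevant]
    hits = []
    for doc_id, text in collection.items():
        vector = Counter(tokenize(text))
        # Score each document by its closest match in the seed set.
        score = max((cosine(vector, seed) for seed in seed_vectors), default=0.0)
        if score >= threshold:
            hits.append((doc_id, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Hypothetical seed set and collection, invented for illustration only.
seed_relevant = ["pricing agreement with competitor on containerboard"]
collection = {
    "doc1": "Notes on the containerboard pricing agreement with our competitor",
    "doc2": "Lunch menu for the Friday office party",
}
results = more_like_this(seed_relevant, collection)
print(results)
```

Even in this toy form, the output is only as good as the seed set: a relevant document that expresses the same idea in different words would score low.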
Early adopters of this technology have found it to be especially effective at identifying potentially relevant documents from large collections, particularly when the need exists to reach some very rapid initial conclusions. It remains in use across a wide range of enterprises.
There are some emerging concerns about this approach, though. The first is a lack of standards for how seed sets are developed. A few years ago, the common practice was to select as few as 500 documents to provide this training. As attorneys and courts have learned more, that number has expanded, in some reported cases up to 20,000 documents. A related problem is that there is no obvious end point. Even if you develop a seed set with 20,000 documents, there is no guarantee that you've provided broad enough training for the system. This is because artificial intelligence doesn't understand context like humans do and thus needs to be taught every way a thought can be expressed, with specific examples, in order to be successful.
Evolving Objective(s)
Today's TAR solutions primarily focus on helping organizations identify potentially relevant documents from a large collection of data. By expediting this process, they are able to deliver time and cost savings. However, with ever-increasing data volumes and shrinking timeframes, savings versus linear review alone is no longer enough for many.
In addition to understanding what documents may be relevant, forward-looking organizations also want to know what these documents tell them. With deeper insight into the documents, they are able to better organize the results, understand more about the content of each document, make some strategic decisions from the output, and ultimately achieve even greater savings. This objective is called knowledge extraction, and it is quickly emerging as a key market trend.
While the artificial intelligence approach to TAR delivers many benefits, it has become apparent that it does not perform knowledge extraction, because the technology generally doesn't capture why a document is relevant, but only whether it might be relevant. This has opened the door for alternative approaches to enter the market.
A Language-Based Form of TAR Emerges
A second form of TAR, which does not rely on artificial intelligence, has since emerged. This methodology leverages language to understand content in two simple steps. First, users perform vocabulary analysis across the entire document collection. In doing so, they organize the collection into a logical framework, extract vocabulary for in-depth analysis, and associate documents to targeted issue(s). Second, users highlight the specific language within each document that they feel makes it potentially relevant. By capturing this information, the methodology is able to provide significant document-level insight that can be used for knowledge extraction.
This combination of vocabulary and highlighting analysis yields knowledge extraction across the remaining documents. Specifically, it delivers: 1) Deep insight into each issue; 2) Rapid recognition of sub issues in the matter; 3) Visibility into what additional documents can be set aside; and 4) Insight into issue-relevant language in summary form.
This level of insight is unique to the language-based approach. Other benefits include greater transparency into coding decisions being made and greater control over the review team by allowing senior staff to audit those coding decisions in real time. It also uniquely enables users to re-use work product from one matter to the next and treat eDiscovery as a regular business.
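The two steps described in this section can be sketched in a few lines. The article does not describe how any particular product implements them, so what follows is only an assumed, simplified illustration: the function names, the issue terms, and the sample documents are all hypothetical.

```python
import re
from collections import Counter, defaultdict

def words_in(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def vocabulary(collection):
    """Step 1a: count every word across the whole collection."""
    counts = Counter()
    for text in collection.values():
        counts.update(words_in(text))
    return counts

def associate_issues(collection, issue_terms):
    """Step 1b: tie each document to the issues whose terms it contains."""
    by_issue = defaultdict(list)
    for doc_id, text in collection.items():
        words = set(words_in(text))
        for issue, terms in issue_terms.items():
            if words & terms:
                by_issue[issue].append(doc_id)
    return dict(by_issue)

def summarize_highlights(highlights):
    """Step 2: group reviewer-highlighted passages by issue."""
    summary = defaultdict(list)
    for issue, doc_id, passage in highlights:
        summary[issue].append((doc_id, passage))
    return dict(summary)

# Hypothetical collection, issue terms, and highlights, for illustration only.
collection = {
    "doc1": "We agreed to hold prices steady through June",
    "doc2": "Please approve the shipping invoice",
    "doc3": "Prices will rise after the plant closure",
}
issue_terms = {
    "pricing": {"price", "prices"},
    "logistics": {"shipping", "freight"},
}
vocab = vocabulary(collection)
assigned = associate_issues(collection, issue_terms)
summary = summarize_highlights([("pricing", "doc1", "agreed to hold prices steady")])
print(assigned)
print(summary)
```

Because the highlighted passage is stored alongside the coding decision, a senior reviewer can see why a document was tagged to an issue, not just that it was.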
Case Law for Both Approaches to TAR
Again, two general approaches to TAR are now widely available: an artificial intelligence-based approach and a language-based approach. Both deliver significant savings in time and cost, and both have been the subject of recent court opinions, most notably Judge Andrew J. Peck's Feb. 24 order in Da Silva Moore v. Publicis Groupe & MSL Group, No. 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Feb. 24, 2012) and U.S. Magistrate Judge Nan Nolan's ruling in Kleen Products v. Packaging Corporation of America, Case No. 10 C 5711 (N.D. Ill. April 8, 2011).
In Da Silva, Judge Peck specifically holds that “(Technology)-assisted review is an acceptable way to search for relevant ESI in appropriate cases.”
This statement, equally applicable to both alternatives, clearly gives us comfort in considering such an approach for expediting document review and minimizing its cost.
In Kleen, a case litigating the use of a language-based analytics workflow in document review, Judge Nolan held for the producing party for a number of reasons, but specifically because their approach has been embraced by the court system for years. She specifically relies on Principle 6 of the Sedona Best Practices, Recommendations and Principles for Addressing Electronic Document Production in justifying her decision. Principle 6 directs that:
Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.
With this set of decisions in play, the runway for TAR is clear. Now, the challenge is determining which approach is most appropriate for each case.
Choosing the Right Approach
Generally, the makeup of your case and your data set will influence which approach to take. Specific factors to consider include, but may not be limited to: 1) The estimated budget for the case; 2) The total amount in controversy; 3) The time allowed for producing responsive documents; 4) The volume of potentially relevant data identified for document review; 5) The need for additional insight into the remaining documents (knowledge extraction); and 6) The need for transparency and control in support of your selection.
First and foremost, regardless of the approach selected, particular attention must be given to The Sedona Conference Cooperation Proclamation before the approach is implemented. To emphasize this point, both Da Silva and Kleen reference The Proclamation as a key basis for their decisions. The Da Silva opinion provides:
“Of course, the best approach to the use of computer-assisted coding (Technology-Assisted Review) is to follow the Sedona Cooperation Proclamation model. Advise opposing counsel that you plan to use computer-assisted coding and seek agreement; if you cannot, consider whether to abandon predictive coding for that case or go to the court for advance approval.”
Da Silva Moore, 11 civ 1279 Slip Op., Feb. 24, 2012, at 5.
Without a showing that an agreement is in place, the ability to refute a challenge to your TAR protocol will likely be much more difficult.
Best Practices for Both Alternatives
Taking a look at an artificial intelligence-based approach first, it is important to document the following at the planning stage: 1) The parties' agreement; 2) The relative amount of ESI to be reviewed; 3) The superiority of an (artificial-intelligence based) review to the available alternatives; 4) The need for cost effectiveness and proportionality under Rule 26(b)(2)(C); and 5) The transparency of the process.
Once an agreement has been reached between the parties on this approach, the producing party should be able to address the following questions to support the results: 1) What was done to implement the agreed-upon process? 2) Why has that process produced a defensible result? 3) Were the documents used to train the system shared with opposing counsel in advance? and 4) Can a showing be made that sufficient quality control testing was done to validate the results?
Da Silva Moore, 11 civ 1279 Slip Op., Feb. 24, 2012, at 22.
For the language-based approach, some of the above also apply. In addition, make sure you take an active role on the front end of the process to clearly define the issues of the case. The logic and structure you put in at the outset will pay off over time with better results. Further, ensure that you apply oversight to the review team's coding decisions, and (as above) apply quality control measures to test results.
Conclusion
Regardless of which approach you choose, remember that implementing review acceleration technology and managing a case from beginning to end can be a difficult process and can require resources that you may not have on staff. Consider retaining a technology and legal workflow expert to help you choose the right approach.
Bobbi Basile is Director, Consulting & Analytics for RenewData. She is responsible for leading the implementation of Language-Based Analytics engagements and has 24 years of experience in delivering strategic, operations and technology services to Fortune 500 legal departments and law firms. Basile is an active participant in The Sedona Conference Working Group on Electronic Document Retention and Production.