The litigation industry is awash with technology. According to consulting firm Gartner, law firms, corporations and service providers spent almost $2 billion in 2014 buying or licensing e-discovery software, almost none of which existed just 10 years ago. See, 'Magic Quadrant for E-Discovery Software.' Why? The primary driver has been the explosion in the amount and variety of discoverable data in the world. Without software, litigators today simply cannot get their job done, and corporate spend on discovery would be unfathomable. So far, so good: our industry has made progress.
Yet there remains a significant challenge. In November 2013, Legal Tech Newsletter's ALM sibling, Law Technology News, published an article with the headline 'Humans Are Still Essential in E-Discovery.' The article summarized results from an extensive study conducted by the Electronic Discovery Institute with two leading Stanford statistics professors. The study compared the results of 19 technology assisted review providers with an entirely human review of a 1.7 million document data set. The headline conclusion was that 'software is only as good as its operators.' The article included a chart that ranked the 19 providers' performance as 'optimal,' 'average,' or 'low' for each of three reviews (responsive, privilege and hot documents). The range of performance was dramatic. Only one provider scored 'optimal' in all three categories, while five others scored 'optimal' in two of the three categories. Four scored 'low' in all three categories. And many used the exact same software. The range of spend was equally wide, with one provider's total cost under $50,000 and another's over $1.5 million, a 30x difference.
The conclusion that a product is only as good as its operator is not a revolutionary one. The same could be said for daily life functions, like driving a car. And yet, in litigation, the complexity and stakes are especially high. Demand is unpredictable. Many of the operations are complex. Everything needs to happen fast. The range of potential outcomes is very wide. Deadlines are critical. Atul Gawande, in his outstanding book The Checklist Manifesto, describes how checklists created dramatic improvement in results in the emergency healthcare industry, an industry which has many commonalities to litigation discovery.
Below is a high-level account of a real-life litigation discovery matter we handled. Behind each phase of operation is a carefully crafted set of processes and checklists. Indeed, we apply similar processes and checklists to the recruitment of our team, an equally critical component in ensuring success. In reading the account, our hope is that you take away a lesson or two that you can apply in your day-to-day practice (and we have summarized four lessons at the end of the article).
The Challenge
An energy company was heading into arbitration in a large-scale construction dispute. The law firm representing the company needed to review and produce responsive documents from a data set of 2.1 million documents within a 30-day period. The firm needed a cost-effective and efficient way to cull and review the documents for relevance within that tight timeframe.
The Solution
Working closely with our consulting team, the law firm decided on a workflow involving Discovia's proprietary culling tool, Intelligent Case Assessment (ICA), Equivio Zoom's Relevance predictive coding module, and keyword search.
Step 1: Junk File Analysis and Culling Phase
Using file extension sampling and exclusion, 112,285 documents were excluded as computer-generated files or image files. Additionally, domain parsing and analysis was used to identify 230 e-mail domains and 57,000 additional documents as 'junk' mail. Because Equivio Relevance's predictive coding technology was selected as the second phase of the culling process, the client opted not to perform any further keyword filtering at this point to further reduce the data set.
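Conceptually, this kind of culling is simple to illustrate. Below is a minimal Python sketch of extension- and domain-based junk filtering, assuming each document carries a file path and a sender address; the field names and the junk lists are invented for illustration and are not Discovia's actual ICA logic.

```python
import os

# Illustrative lists only; in practice these are built by sampling the data set.
JUNK_EXTENSIONS = {".dll", ".exe", ".tmp", ".png", ".gif", ".bmp"}
JUNK_DOMAINS = {"newsletter.example.com", "marketing.example.com"}

def is_junk(doc):
    """Flag computer-generated/image files and bulk e-mail for exclusion."""
    ext = os.path.splitext(doc["path"])[1].lower()
    if ext in JUNK_EXTENSIONS:
        return True
    sender_domain = doc.get("sender", "").rpartition("@")[2].lower()
    return sender_domain in JUNK_DOMAINS

documents = [
    {"path": "report_q3.docx", "sender": "[email protected]"},
    {"path": "banner.gif", "sender": "[email protected]"},
]
kept = [d for d in documents if not is_junk(d)]
print(f"Kept {len(kept)} of {len(documents)} documents")
```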
Step 2: Configuring
Once initial culling was complete, our team of technology assisted review consultants worked with the client to configure the project for Relevance. Files not suitable for Relevance (such as image files or other files with no extracted text) were set aside and loaded into Relativity for standard review. The remaining files (approximately 1.6 million documents) were loaded into Relevance. Because Relevance only requires the text of the documents, rather than complete native files, the loading process is very fast. In this case, the files were loaded in a matter of hours, which served the case team well in this time-sensitive matter. The client selected a case expert who was well-versed in the case data and able to definitively distinguish relevant from non-relevant documents to train the system. Additionally, in order to comply with the agreed-upon discovery protocol in the matter, the client disclosed its plans to use technology assisted review to the opposing party.
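As a rough illustration of that routing decision, the sketch below splits documents by whether they contain usable extracted text; the field name and the length heuristic are assumptions for illustration, not how Equivio Relevance actually validates its input.

```python
def route_for_review(docs, min_text_chars=25):
    """Split documents into those suitable for text-based predictive coding
    and those that must go to standard (eyes-on) review in Relativity."""
    to_relevance, to_relativity = [], []
    for doc in docs:
        text = (doc.get("extracted_text") or "").strip()
        if len(text) >= min_text_chars:
            to_relevance.append(doc)
        else:
            to_relativity.append(doc)  # e.g., images or files with no text layer
    return to_relevance, to_relativity
```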
The Assessment Phase
The case expert was provided a random set of documents to begin the 'assessment phase' of the Relevance workflow. The purpose of the assessment phase is to provide a statistically significant sample of documents, subsequently used to monitor training progress and to estimate the recall and precision of the final results. The assessment phase typically requires the case expert to review 500-1,500 documents. In this case, the expert reviewed 520 documents and the system determined the richness of the document population to be 24.5%.
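To put those figures in context: 24.5% richness on a 520-document sample corresponds to roughly 127 relevant documents, and a simple binomial estimate puts the margin of error at about ±3.7 percentage points at 95% confidence. The short sketch below reproduces that generic arithmetic; it is not Equivio's internal statistical model.

```python
import math

def richness_estimate(relevant_in_sample, sample_size, z=1.96):
    """Estimate population richness (prevalence) from a random sample,
    with a normal-approximation confidence interval."""
    p = relevant_in_sample / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, margin

p, moe = richness_estimate(127, 520)
print(f"Richness ~{p:.1%} +/- {moe:.1%} at 95% confidence")
# Richness ~24.4% +/- 3.7% at 95% confidence
```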
The Training Phase
With the assessment phase complete, the expert reviewer was able to move on to the interactive machine learning (i.e., training) phase of the Relevance workflow. During this phase, the case expert reviews documents in batches of 40 to train the system on responsiveness. The expert marks each document in the batch responsive or non-responsive, allowing the system to learn. Using the information gathered from each batch, the system presents the expert with a new batch of documents from which it can gain further insight into the nuances of relevance. When the system has learned all it can, such that reviewing another batch of documents would yield no additional information, the system indicates that 'stabilization' has been reached. In this matter, the expert had reviewed 27 batches (only 1,080 additional documents) when the system reached stability.
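Conceptually, the training phase is an active-learning loop. The sketch below shows the shape of such a loop using scikit-learn's TF-IDF features and logistic regression as stand-ins; the 40-document batch size mirrors the account above, but the uncertainty sampling, the stabilization test and the `expert_label` callback are simplified assumptions, not the Relevance product's actual algorithm.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def train_until_stable(texts, expert_label, batch_size=40, tol=0.01, max_batches=50):
    """Iteratively ask the expert to label small batches until the model's
    scores stop moving. `expert_label(i) -> 0 or 1` stands in for the case
    expert's responsive/non-responsive call on document i."""
    X = TfidfVectorizer(max_features=20000).fit_transform(texts)
    labeled_idx, labels = [], []
    scores = np.full(X.shape[0], 0.5)            # start fully uncertain
    model = None
    for _ in range(max_batches):
        unlabeled = np.setdiff1d(np.arange(X.shape[0]), labeled_idx)
        # Uncertainty sampling: ask about the documents closest to 0.5.
        batch = unlabeled[np.abs(scores[unlabeled] - 0.5).argsort()[:batch_size]]
        labeled_idx.extend(batch)
        labels.extend(expert_label(i) for i in batch)
        if len(set(labels)) < 2:                 # need both classes before fitting
            continue
        model = LogisticRegression(max_iter=1000).fit(X[labeled_idx], labels)
        new_scores = model.predict_proba(X)[:, 1]
        if np.abs(new_scores - scores).mean() < tol:   # crude stabilization test
            return model, new_scores
        scores = new_scores
    return model, scores
```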
The Batch Calculation Phase
With training complete, the system could now apply a relevance score to every document in the entire population. This process, called batch calculation, assigns a relevance score between 0 and 100 to every document. After the batch calculation was run, the case team hit a bump in the road: they identified an additional data source, not previously collected, that was likely to contain documents relevant to the matter. That data would now need to be incorporated into the Relevance workflow, assigned relevance scores and factored into the decision-making process for the next steps. Fortunately, the Relevance workflow is designed to easily incorporate additional data sets not contemplated during the interactive training phase. In this case, because the new data was similar in nature to the data the system had already been trained on, the expert only needed to review one batch of 40 documents from the new data set to allow the system to recalculate the relevance scores for the entire document universe.
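Once a model is trained, batch calculation amounts to scoring every document, including late-arriving data, with that same model and expressing the result on a 0-100 scale. A minimal sketch, assuming a fitted scikit-learn pipeline stands in for the trained system (Equivio's actual scoring and calibration are proprietary):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def batch_calculate(pipeline, texts):
    """Assign every document a 0-100 relevance score using the trained model.
    The same call works for late-collected data, provided it resembles the
    data the model was trained on (otherwise a refresher training batch is needed)."""
    return (pipeline.predict_proba(texts)[:, 1] * 100).round(1)

# Hypothetical usage: `pipeline` is a fitted text-classification pipeline, e.g.
# make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(train_texts, train_labels)
```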
The Decision Phase
After the new data set was incorporated into the batch calculation, the case team set about deciding which documents would go through review. Working with Discovia's experienced consulting team, the case team used Equivio's decision support tools to determine a cutoff point for document production. The team decided to select a relevance cutoff point that provided a high level of recall. Recall measures the proportion of relevant documents retrieved out of the total number of relevant documents in the document set. For example, if recall is 80%, eight out of every 10 relevant documents have been retrieved by the system. For most matters with high document counts, 100% recall is not an option; recall in the 80% range is typically considered very strong. (To provide some perspective, an oft-cited study of recall using iterative keyword search showed recall of only 20%. See, Blair and Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System, 28 Commc'ns ACM 289 (1985). Another study that compared human review to technology assisted review showed that even recall from full manual review was only 59.3%. See, 'Overview of the TREC 2009 Legal Track.')
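Stated generically (these are standard information-retrieval measures, not anything product-specific), recall is the share of all relevant documents that were retrieved, and precision is the share of retrieved documents that are actually relevant. A minimal sketch:

```python
def recall(retrieved_relevant, total_relevant):
    """Share of all relevant documents that the retrieval actually captures."""
    return retrieved_relevant / total_relevant

def precision(retrieved_relevant, total_retrieved):
    """Share of the retrieved documents that are actually relevant."""
    return retrieved_relevant / total_retrieved

print(recall(8, 10))   # 0.8 -> the 'eight out of every 10' example above
```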
The system recommended a review cutoff point at 84.3% recall (32% of the data set). The decision support tools also showed the approximate cost of human review at that cutoff point, and the cost to find the next relevant document. The case team had the ability to adjust the recall and review percentages to arrive at a review cutoff and cost appropriate for their matter. In this case, the team decided to go with the Equivio system's baseline recommendation. The 32% of documents with relevance scores above the cutoff point were loaded into Relativity for privilege review, after which all non-privileged documents would be produced to opposing counsel.
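The cutoff decision itself reduces to finding the lowest score threshold whose estimated recall, measured against the labeled control sample, reaches the target, and then counting how much of the population falls above it. The sketch below is a generic version of that calculation, with the array inputs assumed for illustration; it is not Equivio's decision-support algorithm.

```python
import numpy as np

def choose_cutoff(control_scores, control_labels, population_scores, target_recall=0.80):
    """Find the lowest score threshold whose estimated recall (on the labeled
    control sample) meets the target, and report the share of the population
    that would go to human review at that threshold."""
    control_scores = np.asarray(control_scores)
    control_labels = np.asarray(control_labels)
    total_relevant = control_labels.sum()
    # Walk thresholds from highest to lowest: the first that meets the target
    # recall yields the smallest review set.
    for threshold in np.sort(np.unique(control_scores))[::-1]:
        captured = control_labels[control_scores >= threshold].sum()
        if captured / total_relevant >= target_recall:
            review_share = np.mean(np.asarray(population_scores) >= threshold)
            return threshold, captured / total_relevant, review_share
    return 0.0, 1.0, 1.0   # fall back to reviewing everything
```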
Step 3: Privilege Search
The 32% of documents most likely to be responsive were loaded into a Relativity database hosted by Discovia for privilege searching and review. The case team worked with our search consultants to craft privilege searches that would capture potentially privileged data based on e-mail domain, attorney names and keywords. These searches were used to prioritize the privilege review and focus reviewers' attention on the documents most likely to be privileged. Of the ~608,000 documents loaded for review, the potentially privileged search narrowed the review set to only ~134,000. These documents were then reviewed by a contract attorney team.
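Privilege searches of this kind typically combine domain, attorney-name and keyword conditions. A rough Python illustration follows, with invented domains, names and terms standing in for the matter-specific lists actually used:

```python
PRIV_DOMAINS = {"lawfirm.example.com"}                  # outside counsel domains
ATTORNEY_NAMES = {"jane roe", "john doe"}               # in-house and outside counsel
PRIV_TERMS = {"attorney-client", "privileged", "legal advice"}

def potentially_privileged(doc):
    """Flag a document for prioritized privilege review if any indicator hits."""
    text = doc.get("text", "").lower()
    participants = {p.lower() for p in doc.get("participants", [])}
    domains = {p.rpartition("@")[2] for p in participants}
    return (
        bool(domains & PRIV_DOMAINS)
        or any(name in text for name in ATTORNEY_NAMES)
        or any(term in text for term in PRIV_TERMS)
    )
```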
Step 4: Quality Control
A sample of documents that did not meet the relevance cutoff was reviewed to ensure that it did not include relevant material. A similar sample was drawn from documents outside the potentially privileged search hits to ensure that the searches had not missed potentially privileged material.
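This kind of check is often called an elusion test: draw a random sample from the documents excluded by the cutoff (or falling outside the privilege hits) and measure how much relevant or privileged material 'eluded' the process. A generic sketch, with the sample size chosen arbitrarily for illustration:

```python
import random

def elusion_sample(excluded_docs, sample_size=400, seed=42):
    """Draw a random sample of excluded documents for human QC review.
    If the review finds relevant (or privileged) documents in the sample,
    the elusion rate estimates how many were missed overall."""
    rng = random.Random(seed)
    return rng.sample(excluded_docs, min(sample_size, len(excluded_docs)))
```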
Step 5: Production
The client made two productions of documents to the opposing party, in compliance with the discovery protocol for the matter. Ultimately, ~485,000 documents were produced. A clawback provision that was part of the agreed-upon discovery protocol could be used in the event that any privileged material was inadvertently produced.
The Results
Using a combination of culling, predictive coding, and keyword and domain search, we assisted the client with a cost-effective and defensible approach to managing a data set of 2.1 million documents. After a case expert reviewed 1,600 documents, a relevance score was assigned to the rest of the document population. After a privilege review that was expedited with a robust set of potentially privileged searches, ~485,000 documents were produced to opposing counsel. Not only did the client meet the accelerated deadline set out in the discovery protocol, it did so in a manner that was far more cost effective than the traditional approach of using keyword search for culling and linear review for responsiveness. The client achieved a more accurate result and better recall of responsive data, while saving over $2 million compared with traditional keyword search and linear review.
Lessons and Takeaways
There are four takeaways that we believe this case study helps to illustrate.
First, predictive coding, otherwise known as technology-assisted review, is harder to get right than people think. There are many steps, decision-points and quality control mechanisms along the way. Our team follows a proven set of guidelines and checklists in supporting our clients. You only need to refer back to the LTN article and see how poorly most providers performed in the study to see that success in predictive coding-based review is not easy to achieve.
Second, very positive results, in terms of time and cost savings, can be achieved with a strong process and workflow and minimal review. In this case, a 30-day deadline was met that simply would not have been achievable with a more manual process, and $2 million was saved in discovery costs.
Third, the tool you use for technology assisted review is only the third most important factor in determining your success, after people and process. In vetting a team to support a predictive coding process, we recommend asking to see process documents, checklists, tracking tools and communication protocols. We recommend asking how the team was recruited, and what processes were used in that recruitment. And above all, we recommend checking references on prior performance.
Finally, different reviews require very different sets of technologies and workflows. Note that in our account, a very different process and protocol was used for the relevance review than for the privilege review. Had there been, for example, a 'hot document' review, it would have been different again.
Benjamin Beck is Chief Client Officer at Discovia, a global provider of e-discovery services. Tobin Dietrich is a Discovery Solutions Consultant at Discovia, specializing in advanced technological solutions to e-discovery problems.