Efficient Review in a Time-Sensitive Government Investigation

By Sanjay Manocha
October 02, 2014

Over the past 10 years, government investigations have become increasingly sophisticated in analyzing electronically stored information (ESI). Federal executive departments and agencies have made substantial investments in advanced analytical systems that help investigators and prosecutors filter voluminous amounts of incoming ESI to quickly focus on items of particular interest and relevance to an investigation. These systems, once almost magical in the speed and depth of their analysis, are now commonplace.

Companies and organizations responding to Civil Investigative Demands (CIDs) and other government requests for information must recognize that the information provided will be analyzed using these powerful tools. Significant documents in a voluminous production that previously might have been overlooked will now likely be discovered. Even more importantly, prosecutorial data analysis may reveal documents to the investigators that, in the absence of similar capabilities, a target company may not be aware it was producing.

Recently, RVM Enterprises, Inc. (RVM) worked with an AmLaw 100 firm and one of the world's largest corporations to respond to voluminous and time-sensitive requests made in connection with an investigation conducted by the United States Department of Justice (DOJ). The DOJ sought both the production of documents and fact witnesses for depositions, and the law firm and its client had less than three weeks to analyze over five million potentially responsive e-mail messages and other documents already produced to the DOJ.

The client initially intended to use a traditional keyword search approach to identify “hot” documents for use in witness interviews, but a traditional document review method could never have met this tight deadline. Equally important, the client's legal response team understood that the DOJ investigative team was using some of the DOJ's more robust ESI analytical tools to “data mine” the target company's document production, and the team wanted to do everything it could to ensure that the DOJ's analysis did not turn up any documents that had not already been evaluated by the client's response team.

Approach

As requested by the client, we started the project by indexing the collection and running initial keyword searches developed by the client. It is well documented that keyword searches are both over-inclusive and under-inclusive: they return false positives that are not genuinely relevant to the matter under investigation, and they miss relevant documents that happen not to contain any of the search terms. Some studies have found that keyword searches alone may miss up to 80% of the relevant documents in a collection. See Da Silva Moore v. Publicis Groupe, 2012 U.S. Dist. LEXIS 23350 at 19 (S.D.N.Y. Feb. 24, 2012) (citing, inter alia, David L. Blair & M. E. Maron, “An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System,” 28 Comm. ACM 289 (1985)).
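To make the over- and under-inclusiveness point concrete, here is a minimal sketch that scores a keyword search against a small, attorney-labeled sample using precision and recall. The documents, relevance labels, search term, and function names are all hypothetical.

```python
def keyword_hits(docs, terms):
    """IDs of documents containing at least one search term (case-insensitive substring match)."""
    lowered = [t.lower() for t in terms]
    return {doc_id for doc_id, text in docs.items()
            if any(t in text.lower() for t in lowered)}


def precision_recall(retrieved, relevant):
    """Precision: share of hits that are relevant. Recall: share of relevant documents hit."""
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sample documents and attorney judgments.
sample = {
    "d1": "Pricing discussion with the distributor",
    "d2": "Lunch order for the pricing team",
    "d3": "Let's keep the numbers aligned with theirs before the bid",
    "d4": "Call me on my cell, not email, about the arrangement",
    "d5": "Quarterly pricing review deck",
}
relevant = {"d1", "d3", "d4"}

hits = keyword_hits(sample, ["pricing"])
p, r = precision_recall(hits, relevant)
print(sorted(hits), round(p, 2), round(r, 2))  # prints ['d1', 'd2', 'd5'] 0.33 0.33
```

The search over-includes (d2 and d5 are hits but not relevant) and under-includes (d3 and d4 are relevant but use none of the terms).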

Recognizing that keyword search alone was insufficient for this project and its compressed timeframe, we recommended a multi-modal, multi-platform approach. Established techniques such as standard keyword search, e-mail threading, near-duplicate identification, and concept clustering could each help winnow the overwhelming volume of potentially responsive documents down to a manageable level, but no single tool could do the whole job on its own. In addition, the team identified the need to deploy advanced predictive coding technology.

We also explained to the client that the order in which the tools were deployed would largely determine how efficiently the team could cull the mountain of documents while simultaneously flagging potentially significant documents for closer analysis. Tools can be deployed in a different order within a workflow depending on client objectives; here, RVM advised, and the client agreed to, an approach that first triaged the data for relevance, then eliminated redundant documents, and then applied concept clustering to what remained. As a final step, RVM leveraged conceptual search to enhance the attorneys' ultimate review and investigation of the responsive documents.

To begin triaging the document collection, we took all documents, along with known documents of interest that had already been identified by the legal team, and deployed Equivio's Relevance predictive coding technology to begin to filter and prioritize further analytical work.

Legal team members used Equivio Relevance to directly view sample document sets from the collection and rate the relative importance of those documents. Each sample set categorized by the legal response team was then used to “train” the Equivio Relevance tool. Relevance would then update its document rankings by relative importance and spawn an additional sample set for attorney review. Results of this and subsequent rounds of Equivio Relevance training would continue to improve the system's accuracy until the Relevance rankings closely matched the legal team's understanding of the substance and relevance of the documents.
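The train, rank, sample, and retrain cycle described above can be sketched as follows. This is not Equivio's algorithm, which the article does not describe: a simple multinomial naive Bayes scorer stands in for the classifier, the next sample is assumed to come from the highest-ranked unreviewed documents (only one of several possible sampling strategies), and the collection, labels, and helper names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def train(labeled, collection):
    """Count tokens per class from attorney-coded documents (doc ID -> True/False)."""
    counts = {True: Counter(), False: Counter()}
    n_docs = Counter()
    for doc_id, is_relevant in labeled.items():
        counts[is_relevant].update(tokenize(collection[doc_id]))
        n_docs[is_relevant] += 1
    return counts, n_docs


def score(model, text):
    """Log-odds of relevance under multinomial naive Bayes with add-one smoothing."""
    counts, n_docs = model
    vocab_size = len(set(counts[True]) | set(counts[False]))
    totals = {label: sum(counts[label].values()) for label in (True, False)}
    log_odds = math.log((n_docs[True] + 1) / (n_docs[False] + 1))
    for tok in tokenize(text):
        p_rel = (counts[True][tok] + 1) / (totals[True] + vocab_size + 1)
        p_irr = (counts[False][tok] + 1) / (totals[False] + vocab_size + 1)
        log_odds += math.log(p_rel / p_irr)
    return log_odds


def training_round(collection, labeled, attorney_label, batch_size):
    """Fit on the coded documents, rank the rest, and have attorneys code the top batch."""
    model = train(labeled, collection)
    unreviewed = [doc_id for doc_id in collection if doc_id not in labeled]
    ranked = sorted(unreviewed, key=lambda d: score(model, collection[d]), reverse=True)
    updated = dict(labeled)
    for doc_id in ranked[:batch_size]:
        updated[doc_id] = attorney_label(doc_id)
    return model, updated


collection = {
    "d1": "price fixing agreement with competitor",
    "d2": "holiday party schedule",
    "d3": "agreement on bid price with rival",
    "d4": "cafeteria menu update",
    "d5": "competitor price list shared privately",
    "d6": "parking garage closure",
}
truth = {"d1": True, "d3": True, "d5": True}  # stands in for attorney judgments
labeled = {"d1": True, "d2": False}            # hypothetical seed set

for _ in range(2):
    model, labeled = training_round(collection, labeled, lambda d: truth.get(d, False), batch_size=1)

print(sorted(labeled))  # prints ['d1', 'd2', 'd3', 'd5']
```

In this toy run, each round surfaces a relevant document (d3, then d5) ahead of the irrelevant ones; in practice training continues until the rankings track attorney judgments closely enough.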

Here, Relevance training proceeded quickly, and the legal team needed only three iterations of Relevance system training before Equivio Relevance was able to reliably rank documents by relative importance.

Once the documents were ranked by relative importance, the next step was to find an appropriate dividing point between likely relevant and likely irrelevant documents. The RVM team used sampling and other techniques to help the legal team choose a cut-off point for this review based on both objective and subjective analytical factors. Given the importance of the investigation, the legal response team erred on the side of caution, choosing a somewhat lower cut-off point so that as many responsive documents as possible would pass to the next stage of analysis.
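One common way to use sampling for a cut-off decision is to have attorneys code a random sample, estimate recall at each candidate score, and pick the highest cut-off that still meets a target recall. The sketch below assumes that approach; the article does not say which techniques RVM actually used, and the scores, labels, and targets are invented. It uses point estimates only, whereas a defensible protocol would also weigh the sample's confidence interval.

```python
def estimated_recall(sample, cutoff):
    """Share of relevant documents in a random, attorney-coded sample scoring at or above cutoff."""
    relevant_scores = [s for s, is_relevant in sample if is_relevant]
    if not relevant_scores:
        return 0.0
    return sum(1 for s in relevant_scores if s >= cutoff) / len(relevant_scores)


def choose_cutoff(sample, target_recall):
    """Highest score whose estimated recall meets the target, i.e. the smallest review set."""
    candidates = sorted({s for s, _ in sample}, reverse=True)
    for cutoff in candidates:
        if estimated_recall(sample, cutoff) >= target_recall:
            return cutoff
    return candidates[-1]  # fall back to the lowest observed score


# (relevance score, attorney judgment) pairs from a hypothetical random sample
sample = [(95, True), (88, True), (80, False), (72, True), (65, False),
          (55, True), (40, False), (30, True), (20, False), (10, False)]

print(choose_cutoff(sample, 0.75), choose_cutoff(sample, 0.95))  # prints 55 30
```

Raising the recall target from 75% to 95% lowers the cut-off from 55 to 30, mirroring a cautious choice of a lower cut-off at the cost of passing more documents forward.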

To further filter and prioritize the documents above the Relevance cut-off point, RVM next deployed Equivio's near-duplicate and e-mail threading tools to identify documents that were substantially similar to one another or, in the case of e-mail threading, all messages belonging to a common e-mail “conversation.”

This analytical approach to near-duplicate identification gave RVM and the client's legal response team the ability to group document drafts comprising largely duplicate content, and in so doing reduce the number of documents requiring individual attorney review.
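Near-duplicate detection is often illustrated with word shingles and Jaccard similarity. The sketch below groups documents whose shingle sets overlap above a threshold, treating the first document in each group as its pivot. The shingle size, threshold, greedy grouping, and sample drafts are assumptions for illustration, not Equivio's method.

```python
def shingles(text, k=3):
    """Set of overlapping k-word sequences in a document."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Size of the intersection over size of the union of two shingle sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_near_duplicates(docs, threshold=0.5):
    """Greedy grouping: each document joins the first group whose pivot is similar enough."""
    groups = []  # (pivot ID, pivot shingles, member IDs)
    for doc_id, text in docs.items():
        doc_shingles = shingles(text)
        for pivot_id, pivot_shingles, members in groups:
            if jaccard(doc_shingles, pivot_shingles) >= threshold:
                members.append(doc_id)
                break
        else:  # no existing group matched, so this document becomes a new pivot
            groups.append((doc_id, doc_shingles, [doc_id]))
    return {pivot_id: members for pivot_id, _, members in groups}


docs = {
    "draft1": "the parties agree to hold prices at current levels through the third quarter",
    "draft2": "the parties agree to hold prices at current levels through the fourth quarter",
    "memo": "please book the conference room for the board meeting on friday",
}
groups = group_near_duplicates(docs)
print(groups)  # prints {'draft1': ['draft1', 'draft2'], 'memo': ['memo']}
```

The two drafts share 9 of 13 distinct three-word shingles (Jaccard of about 0.69), so a reviewer can read the pivot in full and only check what changed in the other draft.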

Similarly, Equivio's e-mail thread analysis identified “inclusive” e-mail messages, which contain the text of all prior messages in a thread. Reviewing a single inclusive message therefore eliminated the need for an attorney to review every other message in that thread. Where an e-mail conversation fragmented into multiple conversations, the system identified multiple inclusive messages to account for the forks.

From the outset, our team knew that the client's legal response team would ultimately conduct its final relevance review and document production using kCura's Relativity platform. We loaded the documents into a Relativity workspace and then used Relativity's own clustering analytics, applied to the Equivio-identified near-duplicate pivot documents and inclusive e-mails, to further prioritize the review.
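Under the simplifying assumption that every reply quotes its ancestors in full, identifying inclusive messages reduces to finding the messages that no other message replies to; each fork in a conversation adds one more. The sketch below uses hypothetical message IDs and reply links. Real threading tools must also catch replies that drop quoted text or attachments, which this sketch ignores.

```python
def inclusive_messages(parent_of):
    """parent_of maps message ID -> ID of the message it replies to (None for a thread root).

    Assumes every reply quotes all earlier messages in its branch, so a message is
    inclusive exactly when no other message replies to it.
    """
    replied_to = {parent for parent in parent_of.values() if parent is not None}
    return sorted(msg for msg in parent_of if msg not in replied_to)


def thread_path(parent_of, msg_id):
    """Messages whose text an inclusive message carries, from the root down to itself."""
    path = []
    while msg_id is not None:
        path.append(msg_id)
        msg_id = parent_of[msg_id]
    return path[::-1]


# m2 replies to m1; m3 and m4 both reply to m2 (a fork); m5 replies to m4.
thread = {"m1": None, "m2": "m1", "m3": "m2", "m4": "m2", "m5": "m4"}
print(inclusive_messages(thread))  # prints ['m3', 'm5']
print(thread_path(thread, "m5"))   # prints ['m1', 'm2', 'm4', 'm5']
```

Reading the two inclusive messages, m3 and m5, covers all five messages in the thread.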

Combined with the Equivio Relevance rankings, this allowed the legal team to prioritize the clusters containing known high-value documents and to review those clusters for substance immediately.

Equivio Relevance rankings were also key to identifying clusters of documents that contained both high-ranking and low-ranking documents. Why did these clusters exist? Were there additional concepts or vocabulary that had not previously been recognized? Again, cross-referencing the analysis from two disparate systems provided immediate, reproducible, and highly defensible quality control, which was used continually to refine the eyes-on review and re-prioritize document clusters.
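The mixed-cluster check described above can be expressed as a simple rule: flag any cluster holding at least one high-ranked and one low-ranked document, and look first at those with the widest spread. The 0-100 score scale, the thresholds, and the cluster names in this sketch are hypothetical.

```python
def mixed_clusters(cluster_scores, high=70, low=30):
    """Clusters holding both high- and low-ranked documents, widest score spread first."""
    flagged = [(max(scores) - min(scores), name)
               for name, scores in cluster_scores.items()
               if scores and max(scores) >= high and min(scores) <= low]
    return [name for _, name in sorted(flagged, reverse=True)]


# Hypothetical relevance scores of the documents in each concept cluster.
cluster_scores = {
    "pricing": [92, 88, 15, 81],
    "facilities": [5, 12, 8],
    "sales-ops": [85, 90, 77],
    "logistics": [75, 20, 40],
}
print(mixed_clusters(cluster_scores))  # prints ['pricing', 'logistics']
```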

Systematic, substantive review also required consistent rules and results. To achieve that consistency, the legal response team drafted a detailed written review protocol that defined the categorization methodology, drawing on the client's initial briefings, known high-value documents identified through keyword search, and Equivio Relevance training.

The legal team's protocol also set flexible “rules of engagement” for working through the collection. Rather than confining themselves to defined clusters or review batches, reviewers were encouraged to use the platform's “find similar documents” and “expand terms” functionality whenever they came across an unexpected document of interest. When a reviewer found a particularly significant document, the Equivio near-duplicate and e-mail threading results showed that document's full context, whether or not the related documents fell in the same cluster as the original find.

Behind the scenes, the RVM team regularly collected the results of the legal team's eyes-on review and used those coding decisions to update the concept clusters, adding new clusters and re-prioritizing existing ones based on the legal team's ongoing analysis.
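A minimal sketch of that feedback loop, assuming a hypothetical rule: once reviewers have coded some documents in a cluster, its priority becomes the share coded “hot”; clusters not yet touched keep a priority derived from their average Relevance score. The article does not describe RVM's actual re-prioritization logic, and the cluster contents and scores here are invented.

```python
def reprioritize(clusters, scores, decisions):
    """Order clusters that still have unreviewed documents by review priority.

    clusters:  cluster name -> document IDs
    scores:    document ID -> relevance score on a 0-100 scale
    decisions: document ID -> True if reviewers coded it "hot", False otherwise
    """
    priorities = {}
    for name, docs in clusters.items():
        if all(d in decisions for d in docs):
            continue  # fully reviewed, nothing left to queue
        coded = [decisions[d] for d in docs if d in decisions]
        if coded:
            # Reviewer feedback overrides the model: share of coded documents that were hot.
            priorities[name] = sum(coded) / len(coded)
        else:
            # No feedback yet: fall back to the cluster's average relevance score.
            priorities[name] = sum(scores[d] for d in docs) / (100 * len(docs))
    return sorted(priorities, key=priorities.get, reverse=True)


clusters = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"], "C": ["c1", "c2"], "D": ["d1"]}
scores = {"a1": 80, "a2": 70, "a3": 60, "b1": 90, "b2": 85, "c1": 40, "c2": 50, "d1": 95}
decisions = {"a1": True, "a2": True, "b1": False, "d1": True}
print(reprioritize(clusters, scores, decisions))  # prints ['A', 'C', 'B']
```

Cluster B falls to the bottom despite high model scores because its first reviewed document was not hot, while the fully reviewed cluster D leaves the queue.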

Having this behind-the-scenes review in place was significant because it served as an iterative process to continually improve the organization of the documents with new information as it was discovered. We partnered with the legal review team by speaking with them on a daily basis to keep all members of the response team on the same page.

Results

How long did this entire process take?

At the outset, the legal team of five associates faced the Herculean task of analyzing five million documents in three weeks. Departing from tradition and using RVM's Structured Review (RSR) approach, they completed first-pass review, from start to finish, in just 17 days.

During that 17-day period, the legal response team, using RVM's multi-platform approach, reduced the initial population of five million documents to 400,000 documents potentially requiring eyes-on review by the legal team, a 92% reduction.
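The 92% figure follows directly from the two document counts reported above:

```python
initial_docs = 5_000_000
after_culling = 400_000

# Share of the original population that no longer needed eyes-on review.
reduction = (initial_docs - after_culling) / initial_docs
print(f"{reduction:.0%}")  # prints 92%
```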

Once the legal team began its substantive document review, RVM's heavy use of analytics permitted the remaining documents to be further prioritized for review, giving the team great confidence in the review results without requiring individual review of the full 400,000 documents.

Analysis at the end of the project showed that, without RVM's Structured Review approach, and in particular Equivio's Relevance product for predictive coding, the defense would have missed 61% of the “hot” documents the client's legal response team actually found during the review, because those documents contained none of the keyword search terms originally used to focus the project.

Relying on traditional methods alone would likely have left that large share of significant documents unidentified. Had the government then found the documents a keyword search missed, the client's legal team would have been at a significant and material tactical disadvantage during witness interviews.

Conclusion

Responding to government investigations has traditionally required a costly and time-consuming review process with marked potential for human error. In this case, using a more traditional data search and review methodology, assuming that the resources were available, would have required a staff of 100 review attorneys and more than five months to complete the task.

In contrast, using Equivio analytics and predictive coding as part of RVM's Structured Review program, the legal team completed its first-pass review in 17 days with only five attorneys. Thanks to these efficiencies in the first-pass review phase, the client saved nearly 50% of the total review costs a traditional approach would have incurred. At the same time, compared with the results of the best keyword search run at the start of the project, RVM's approach identified more than twice as many of the critical documents used to prepare for fact witness depositions.
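The “more than twice as many” figure is consistent with the 61% figure reported in the Results, assuming the critical documents here are the same “hot” documents and the keyword baseline is the same initial search: if keyword search alone would have found only the remaining 39%, the full review found about 2.56 times as many.

```python
missed_share = 0.61               # hot documents with no keyword hits (from the Results)
keyword_share = 1 - missed_share  # share a keyword-only review would have found
ratio = 1 / keyword_share         # hot documents found overall per keyword-only find
print(round(keyword_share, 2), round(ratio, 2))  # prints 0.39 2.56
```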


Sanjay Manocha oversees implementation of advanced analytics and predictive coding technologies in discovery practice at RVM Enterprises, Inc. Prior to RVM, Manocha was CEO of N-Tier Discovery, a discovery analytics consulting firm, and practiced law, specializing in complex commercial litigation and regulatory investigations.