Using Language-Based Analytics to Accelerate the Review of an Incoming Production of Documents

By Bobbi Basile
April 02, 2014

Over the past decade a lot of effort and debate has gone into answering the question: How do we reduce the amount of data we need to review related to the production of documents? Solutions have focused on reviewing documents to produce, while incoming productions have been largely ignored.

Incoming productions are increasingly becoming a burden due to accelerated dockets, larger volumes and clients' reduced budgets. Furthermore, agreements between counsel to produce documents responsive to stipulated (usually overly broad) keywords shift the burden to the receiving party to separate the proverbial wheat from the chaff. When a production has been made without a review of the documents for relevancy, the density of relevant material can be quite low and the exercise to uncover it time-consuming and expensive.

This article examines the critical role language-based analytics plays in incoming productions and how attorneys can gain a strategic advantage using this approach. This includes how to:

  • Use language-based analytics to quickly identify documents that are on point and filter out non-relevant documents;
  • Categorize documents according to issue;
  • Assess whether the opposing party has met production obligations;
  • Get armed with valuable data about the incoming production that can be used in discussions with opposing counsel;
  • Define and create the review workflow according to the strategic objectives of the case as it relates to each issue; and
  • Better prepare for depositions and find critical exhibit documents faster.

What Do the Documents Say?

The purpose of reviewing an incoming production of documents is “the same but different” from that of reviewing documents to produce. The purposes are aligned as far as identifying relevant documents and, more important, isolating deposition and/or trial exhibits. The key distinction is that the receiving party also needs answers to a few questions: Did the opposing party meet its production obligations? Did we receive the documents we asked for? Based on what was received, did we ask for the right documents? And what do the documents we received say about the issues of the case?

Answering these questions requires that the document collection first be organized to align topically with the documents requested. The next step in the process is to create an organizational schema to tag the incoming documents by category (issue).

Once the categories are defined, counsel selects the key terms and phrases associated with each category. The language-based analytics process generates a list of words extracted from the documents to inform that selection. This method does not rely upon the computer's interpretation of language or concepts, but rather provides counsel with a multiple-choice list of language pulled out of the collection from which to choose. Boolean queries are then generated from the selected terms and run across the population to create a virtual categorization of the document collection according to issue.
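As a rough illustration of the mechanics described above, the following Python sketch builds a word list from a toy collection and then tags documents with simple Boolean queries. The document IDs, category names, terms and the OR-of-AND-groups query format are all hypothetical assumptions made for the example; this is not the vendor's actual tooling or query syntax.

```python
import re
from collections import Counter

# Hypothetical toy collection: document ID -> extracted text.
docs = {
    "DOC-001": "The pricing agreement was amended after the audit.",
    "DOC-002": "Warranty claims rose after the product recall.",
    "DOC-003": "Lunch menu for the holiday party.",
}

def tokenize(text):
    """Return the set of lowercase words found in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def term_list(docs):
    """Count how many documents each word appears in."""
    counts = Counter()
    for text in docs.values():
        counts.update(tokenize(text))
    return counts

# Assumed query format: each category maps to a list of term groups.
# A document matches if it contains ALL terms of ANY one group
# (that is, an OR of AND-groups).
queries = {
    "Pricing": [{"pricing", "agreement"}, {"audit"}],
    "Warranty": [{"warranty"}, {"recall"}],
}

def categorize(docs, queries):
    """Tag each document with every category whose query it satisfies."""
    tags = {doc_id: [] for doc_id in docs}
    for doc_id, text in docs.items():
        words = tokenize(text)
        for category, groups in queries.items():
            if any(group <= words for group in groups):
                tags[doc_id].append(category)
    return tags

word_counts = term_list(docs)   # the list counsel would choose terms from
tags = categorize(docs, queries)
uncategorized = [doc_id for doc_id, cats in tags.items() if not cats]
```

In this sketch counsel, not the software, decides which words from `word_counts` go into `queries`, which mirrors the point that the method offers a list to choose from rather than interpreting the language itself.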

Once the documents are tagged according to category, it becomes readily apparent whether the opposing party has produced documents in accordance with those sought in the request for production. For instance, in a recent case, our client quickly discovered that the opposing party had failed to produce documents pertaining to 26 out of 54 categories of documents requested. Within five days of receiving the production, our client objected to the content of the production with a level of specificity that surprised the producing party and ultimately led to settlement negotiations.
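A minimal sketch of that gap check, assuming the tagging step has produced a mapping from document ID to category names (the names and data here are hypothetical): any requested category with no tagged documents is flagged for follow-up with opposing counsel.

```python
from collections import Counter

# Hypothetical inputs: the categories named in the request for production,
# and the tags assigned to each produced document.
requested = ["Pricing", "Warranty", "Board minutes"]
tags = {
    "DOC-001": ["Pricing"],
    "DOC-002": ["Warranty", "Pricing"],
    "DOC-003": [],
}

def production_gaps(requested, tags):
    """Return requested categories for which no document was tagged."""
    counts = Counter(cat for cats in tags.values() for cat in cats)
    return [cat for cat in requested if counts[cat] == 0]

gaps = production_gaps(requested, tags)
```

An empty category is only a starting point for inquiry: it may reflect a genuine production deficiency or simply a query that needs better terms.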

Often more revealing is the volume of documents that remain uncategorized: those documents that do not contain any language relevant to any of the topics at issue. A sampling of the documents coupled with an analysis by file type, subject or file name and custodian results in a valuable synopsis of the documents that did not hit on the relevant language searches. As an example, a recent client discovered that two-thirds of an incoming production contained documents completely off point, such as the inclusion of popular literature (e-books) and personal media files. This information led our client to conclude that the production was made without the benefit of a review for relevant documents and that the producing party likely did not understand the content of its client's documents.
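The synopsis by file type and custodian can be as simple as a set of counts. The sketch below assumes each uncategorized document carries basic metadata fields; the field names and records are hypothetical.

```python
from collections import Counter

# Hypothetical metadata for documents that matched no category query.
uncategorized = [
    {"id": "DOC-101", "file_type": "epub", "custodian": "Smith"},
    {"id": "DOC-102", "file_type": "mp3", "custodian": "Smith"},
    {"id": "DOC-103", "file_type": "epub", "custodian": "Jones"},
    {"id": "DOC-104", "file_type": "docx", "custodian": "Smith"},
]

def profile(records, field):
    """Count records by one metadata field, most common value first."""
    return Counter(record[field] for record in records).most_common()

by_type = profile(uncategorized, "file_type")
by_custodian = profile(uncategorized, "custodian")
```

A profile dominated by e-book or media formats, as in the example above, is the kind of signal that suggests the production was not reviewed for relevance.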

Of course, it is possible that the uncategorized documents contain some unexpected keywords that should be added to the analysis or that the collection contains documents about topics you had not thought of as relevant and must now consider.

The analysis of the uncategorized documents continues until a random sample of them turns up no relevant material, supporting with at least 95% confidence the conclusion that few, if any, of the uncategorized documents are relevant to the case.
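One common way to frame such a sampling test, offered here as an assumption rather than as the author's exact procedure, is to ask what a clean sample allows us to say. If a simple random sample of n documents contains no relevant documents, then with the stated confidence the true proportion of relevant documents is below an upper bound that shrinks as n grows. Strictly speaking, the sample bounds the relevant rate rather than proving it is zero. The sketch treats draws as independent and ignores the finite size of the population.

```python
def zero_hit_upper_bound(sample_size, confidence=0.95):
    """Upper confidence bound on the relevant rate when a random
    sample of `sample_size` documents contains zero relevant ones.

    Solves (1 - p) ** sample_size = 1 - confidence for p.
    """
    return 1 - (1 - confidence) ** (1 / sample_size)

# Hypothetical example: 300 uncategorized documents sampled, none relevant.
bound = zero_hit_upper_bound(300)   # just under 1%
rule_of_three = 3 / 300             # familiar shortcut for the same bound
```

With 300 clean sample documents the bound is just under 1%, which is close to the "rule of three" shortcut of 3/n.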

At this point in the process, we have categorized the documents received to align with the topics of the documents requested and analyzed the documents that were not relevant to those topics.

The next step is to align the review of the potentially relevant documents to the strategy of the case respective to each category or case issue. For each category, there are three strategic possibilities:

  1. We are seeking enough good documents to make our point;
  2. We need to find every possible example document; and
  3. We hope to find a smoking gun.

It is important to define a strategy for each category so that you know when you have accomplished your goal and can move on to the next category when reviewing the documents.

To accelerate the review of the documents in the production, the challenge is to look at enough documents to be able to answer the following questions:

  1. Did we receive the documents we asked for?
  2. Based on what we received, did we ask for the right things?
  3. What do these documents actually say about the issues?
  4. Which documents will become exhibits?

How many documents must be looked at in order to satisfy these objectives? This is where mathematics comes in. To explain the process, let's consider an example. Assume that our organizational schema consists of 50 categories and that each category has been populated with 2,000 documents. Query: Do we need to read all 100,000 documents to understand what the collection says about each of the 50 issues? The Poisson distribution says “no, we need only read 15,000.”

The gist of the mathematics is that, to be 95% certain of having seen all of the relevant language that appears in more than 1% of the documents in a category (a “rare event”), we need only read 300 documents in that category. In other words, by reading 300 randomly selected documents from each category, we are 95% certain to encounter any relevant language that appears in more than 20 (1%) of the 2,000 documents in that category. Across 50 categories, that comes to 15,000 documents.
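The arithmetic behind these figures can be sketched in a few lines. The code below assumes random sampling and uses the Poisson approximation: if language appears in a fraction p of the documents, the chance of missing it entirely in n random draws is about e^(-np). The function names are illustrative, and the calculation ignores the finite size of each 2,000-document category, which makes the estimate slightly conservative.

```python
import math

def poisson_sample_size(rate, confidence=0.95):
    """Documents to read so that language appearing at `rate` is seen
    at least once with the given confidence (Poisson approximation)."""
    return math.ceil(-math.log(1 - confidence) / rate)

def detection_probability(sample_size, rate):
    """Approximate chance of seeing the language at least once."""
    return 1 - math.exp(-sample_size * rate)

per_category = poisson_sample_size(0.01)        # 300 documents
total_to_read = 50 * per_category               # 15,000 of 100,000
chance = detection_probability(per_category, 0.01)

# Exact binomial version for comparison (independent draws).
exact = math.ceil(math.log(1 - 0.95) / math.log(1 - 0.01))
```

The exact binomial calculation gives 299 rather than 300, so the two approaches agree for practical purposes. Lowering the rate to cover rarer language, as a smoking-gun search would require, raises the sample size in inverse proportion.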

Unless we are looking for a smoking gun, that is certainly enough language to both understand what the documents say about the issues and find our exhibits. If we are looking for a smoking gun, we may have to review more documents to reach our certainty level.

To summarize the process:

  1. Create an organizational structure that aligns with the categories of documents requested.
  2. Populate the categories by tagging documents that contain language relevant to the respective category.
  3. Examine the categories.
  4. Identify an evidentiary strategy for each category.
  5. Review documents in each category.

By employing a language-based approach to analyzing an incoming document production, within a matter of days of receipt you will be able to:

  • Identify documents that are on point and avoid reviewing non-relevant documents;
  • Categorize documents according to issue;
  • Assess whether the opposing party has met production obligations and, if not, identify the deficiencies with specificity;
  • Be armed with valuable data about the incoming production that can be used in negotiations with opposing counsel;
  • Review enough documents about each case issue to understand what the documents produced say about the issue; and
  • Better prepare for depositions and find critical exhibit documents faster.

Conclusion

Examining large incoming productions does not have to be time-consuming and expensive. Language-based analytics is a method that reduces the number of documents that must be reviewed in order to understand very accurately what the collection says about each topic and whether the opposing party has met its obligations.


Bobbi Basile, CLA/CP, is Director, Consulting and Analytics for RenewData. Basile has delivered a variety of consulting engagements which include: electronic discovery technology selection and implementation, content management technology selection and implementation, creation of outside counsel guidelines, defining corporate enterprise records management programs, law department knowledge management, trial presentation technology training and law firm matter management selection and implementation.

