For years, attorneys have relied on manual coding or indexing to help them search and categorize sometimes-vast collections of paper documents related to litigation.
In litigation, mergers and acquisitions, and compliance scenarios, document coding is one of the best ways of creating a database that will help a legal team determine document responsiveness, privilege level or both relatively rapidly.
Over the last few years, the legal-support services industry has been inundated by increased demand for electronic discovery and electronic processing of litigation documents, many of which need some type of coding that will allow them to be searched for in and retrieved from automated litigation support (ALS) databases.
Fortunately, advances in technology have produced significant cost-efficiency and workflow improvements that make coding a sensible choice for much smaller document collections, whether paper or electronic.
Indeed, much of the innovation in coding workflow has occurred in electronic document coding (EDC), referred to colloquially in the industry as auto coding, which can be applied to paper or electronic documents.
Near- and Long-Term Implications
Manual coding has been done for many years, domestically or offshore, by workers called document coders, who visually scrutinize individual documents in a collection and assess them using a matter-specific coding manual, generally developed by attorneys and litigation-support managers, along with vendor project managers on particular cases.
The coding manual standardizes coding protocols, requirements and parameters. Coders capture data from documents they work with, then enter that information in descriptive index fields, generally bibliographic, so that the documents can be referenced, searched and retrieved by referring to a paper index or by using an automated imaged-document litigation-support database application. Ultimately, cataloging or indexing documents for an ALS system allows the documents to be more easily retrieved, sorted, reviewed, printed, prioritized and disseminated than non-ALS-based tools would permit.
While manual coding remains the prevalent and most costly method of document categorization, advances have made EDC an increasingly viable alternative. Companies like Cataphora, Attenex and Planet Data Solutions, which provide litigation support-related electronic-discovery and EDC services and products, have repeatedly tested and measured the accuracy, consistency, cost and time-efficiency of automated processing. If a paper or electronic document collection is relatively “clean,” the much-sought ideal of better, faster and cheaper becomes reality.
Putting Words Into Action
EDC is the use of rules-based computer programs and algorithms to make objective and subjective decisions about how to categorize, code or index documents.
Checking against vast libraries of document descriptors, such as industry-specific terms like complex drug names or textual patterns, these powerful computer programs can analyze text and extract dates, names, organizations and key words, and put them in database fields much like manual document coders can, only much faster: for instance, in days instead of months. These programs can also categorize and code documents conceptually, meaning that even if a key term were not in the documents, other sufficiently related semantic patterns or equivocal language in the documents would allow the software to assess and categorize the documents in context.
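The rules-based extraction described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not how any vendor's product works: the descriptor library (DRUG_TERMS), the single date format and the code_document function are all hypothetical, and a real system would use far larger libraries and many more patterns.

```python
import re

# Hypothetical descriptor library; a real one would hold thousands of terms.
DRUG_TERMS = {"atorvastatin", "omeprazole"}

# Assumption: dates appear in "Month D, YYYY" form only.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)

def code_document(text):
    """Return a dict of index fields extracted from the document text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "dates": DATE_PATTERN.findall(text),
        "keywords": sorted(DRUG_TERMS & words),
    }

record = code_document(
    "Memo dated March 3, 2004 regarding omeprazole trial supply."
)
```

Each key in the returned dictionary stands in for a database field that a manual coder would otherwise fill in by hand.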
Although still far from being the document-analysis panacea it was initially marketed as years ago, EDC's time has come. Though document-processing protocols differ markedly between paper and electronic documents, effective EDC relies largely on the following:
A discussion of how to facilitate effective electronic document coding for paper and e-documents continues below.
Substantive Textual Information
For effective electronic document coding, regardless of data source (paper or electronic), the documents must possess substantive attributes that allow them to be described. For electronic documents or files, this information is inherent in the metadata or substantive information found in the document.
Paper documents must be scanned and subjected to optical-character recognition (OCR) processing, or be manually transcribed. Manual transcription is generally expensive, and manual coding is usually the best choice for documents that are manually transcribed. The net result: the electronic document-coding engine will have normalized input with which to operate, regardless of input source.
A breakdown of document type and processes follows.
Paper Documents
For effective electronic coding of paper documents, the documents must be processed: unitized, scanned and read by OCR to obtain substantive text for each page of every document in the collection.
Unitization
Paper-document unitization occurs in the document-preparation phase, before documents are scanned to generate electronic images. During unitization, documents are organized according to physical or logical boundaries, or both. Unitization, then, refers to the box, folder, staple, paper clip, rubber band or any other document-binding or document-demarcation method.
Scanning
Scanning documents generates the images from which the substantive text will be captured. Scanned images are generally captured at a resolution of 200 dots per inch (DPI), with higher resolutions available, depending on document condition and legibility.
OCR
Optical character recognition technology generates the text files used in the electronic coding process. This is the most critical step in electronically coding paper documents, because the quality of the text obtained from scanned images will affect the efficacy of electronic document coding. OCR output will nearly always require some type of manual review to ensure 100% accuracy.
At the end of this phase of the paper process, output data consists of:
Electronic Documents
Unlike their paper counterparts, e-documents don't require unitization, scanning or OCR. Electronic files, ie, files collected from a hard disk or e-mail files (PST or NSF), are logically, or hierarchically, organized in a fashion consistent with the file and folder structure from the hardware or software system from which they were collected. Electronic files also have metadata associated with them, as well as the substantive text the files contain.
Document Taxonomy
Throughout history, libraries have used taxonomies to organize book collections so that researchers could navigate oceans of knowledge that otherwise would be uncharted morasses of information in which data seekers would drown. In litigation, document taxonomies are no different. Here, taxonomy can be defined as a structured and hierarchical list of relevant descriptive subject terms for a specific topic or body of knowledge (or documents). Taxonomies are specific to document collections in that the taxonomy of a corpus of pharmaceutical documents in a patent-infringement case, for example, is context-specific to pharmaceutical documents, as opposed to documents in securities litigation. In ALS applications, taxonomies are used to index and later retrieve the important subjective concepts found in the document bodies.
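A taxonomy of this kind, a structured and hierarchical list of subject terms, can be represented as a simple nested structure. The sketch below is illustrative only: the terms in TAXONOMY are invented for a hypothetical pharmaceutical patent matter, and term_paths is a helper name introduced here, not part of any ALS product.

```python
# Hypothetical taxonomy for a pharmaceutical patent matter.
# Each key is a subject term; its value holds the narrower terms beneath it.
TAXONOMY = {
    "Patent": {
        "Prior art": {},
        "Claims": {},
    },
    "Clinical": {
        "Trials": {"Adverse events": {}},
    },
}

def term_paths(tree, prefix=()):
    """Yield every term as a tuple path from the top of the hierarchy."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield path
        yield from term_paths(children, path)

# Flatten the hierarchy into index labels such as "Clinical > Trials".
paths = [" > ".join(p) for p in term_paths(TAXONOMY)]
```

Flattened paths like these could serve as the controlled values of a subject field, so that a search on a broad term also reaches the narrower terms beneath it.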
While it may seem that e-document coding could be somewhat rigid, the opposite is true. Today's electronic document coding algorithms are based largely on artificial-intelligence methodologies and are self-learning, meaning that the software running the algorithms builds on itself from statistical feedback, user validation and from the document collection itself as it grows and changes. The processes EDC comprises also build in methods of allowing human intervention and optimization. It does not require someone with a doctorate in artificial intelligence to fine-tune the system, which is a feature that allows greater flexibility.
Sample Model Documents
Model documents form the initial basis of the self-learning process for electronic document coding applications. They provide the first point of reference from which the applications begin to assess other documents in the collection. Therefore, it is very important in the initial project-development stage to clearly define what documents and terms constitute a particular element of the coding database, eg, document type (memo, letter, financial statement), organization, proper name, keyword, etc.
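To make the role of model documents concrete, here is a deliberately simple Python sketch in which each document type has one model text and a new document is assigned the type whose model it most resembles. This substitutes a plain word-overlap (Jaccard) comparison for the self-learning algorithms commercial tools use; the model texts in MODEL_DOCS and the function names are hypothetical.

```python
import re

# Hypothetical model documents, one per document type.
MODEL_DOCS = {
    "memo": "memorandum to from subject re date",
    "letter": "dear sincerely yours truly regards",
    "financial statement": "balance sheet assets liabilities revenue net income",
}

def tokens(text):
    """Lowercased set of alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def classify(text):
    """Return the document type whose model text overlaps most with text."""
    words = tokens(text)
    return max(MODEL_DOCS, key=lambda t: jaccard(words, tokens(MODEL_DOCS[t])))

doc_type = classify("Dear Ms. Lee, thank you for your call. Sincerely, J. Ortiz")
```

The sketch shows why clearly defined model documents matter: the classification can only be as good as the reference texts it starts from.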
Different Levels and Variations Of Document Coding
There are numerous best-practice coding approaches, each of them specific to a particular situation and each objective-dependent. A coding protocol designed around information needs of a case is well grounded. Some situations will require only minimum data-capture for each document, such as:
Other situations will require significantly more detail from the substantive body of the document or metadata, such as:
The following coding treatment levels reflect common scenarios.
Batch treatment level coding. This type of bibliographic coding treatment is best suited for document types conducive to grouping, ie, sequential invoices or purchase-order numbers. Rather than code each document element, a single document record is used to represent the entire group or batch. This approach will also affect coding pricing.
The downside to this method is that the user will need to scroll through each of the document images to find a specific item (which could have been coded individually) that will allow him or her to find a particular record within the group or batch.
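Batch treatment can be pictured with a short Python sketch that collapses runs of consecutive invoice numbers into one record per batch. The function name, the record layout (first number, last number, count) and the sample numbers are assumptions made for illustration.

```python
def batch_records(invoice_numbers):
    """Collapse runs of consecutive numbers into (first, last, count) batches."""
    batches = []
    for n in sorted(set(invoice_numbers)):
        if batches and n == batches[-1][1] + 1:
            # Extend the current batch by one document.
            first, _, count = batches[-1]
            batches[-1] = (first, n, count + 1)
        else:
            # Gap in the sequence: start a new batch.
            batches.append((n, n, 1))
    return batches

batches = batch_records([1001, 1002, 1003, 1010, 1011, 2000])
```

Six documents yield three database records here, which illustrates both the saving and the trade-off: a reviewer looking for invoice 1002 must open the first batch and page through its images.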
Basic bibliographic treatment level coding. This involves the basic objective descriptive information. Example fields include:
Folder treatment level coding. A streamlined approach to creating an ALS database in which appropriate portions of a document collection are bibliographically coded at the folder level rather than at the document level (such as in batch coding). For this coding approach to be effective, meaningful folder boundaries and folder labels are essential.
Keyword treatment level. A coding treatment approach in which a predetermined list of words significant to the litigation is developed, and documents containing any of the words are coded and indexed accordingly. Minor variations of the words in the list (such as singular or plural usage) are considered part of the list. Keyword coding is literal and does not require reading of documents for interpretation.
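A minimal Python sketch of keyword treatment follows. It assumes a hypothetical keyword list and handles only naive English plurals (adding "s" or "es", or changing a final "y" to "ies"); a production system would need fuller rules for word variants.

```python
import re

# Hypothetical list of words significant to the litigation.
KEYWORDS = ["invoice", "warranty", "recall"]

def variants(word):
    """Regex fragment for a word and its simple plural (naive rules)."""
    if word.endswith("y"):
        return re.escape(word[:-1]) + r"(?:y|ies)"
    return re.escape(word) + r"(?:es|s)?"

def code_keywords(text, keywords):
    """Return the listed keywords that appear literally in the text."""
    hits = []
    for word in keywords:
        pattern = r"\b" + variants(word) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            hits.append(word)
    return hits

hits = code_keywords("Two warranties were attached to the Invoices.", KEYWORDS)
```

Because the match is literal, no interpretation is involved: a document either contains a listed word, or one of its accepted variants, or it does not.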
Names-in-text treatment level. This particular coding treatment approach is employed to capture personal and organizational names in the text, or body, of the document. EDC is especially powerful in grabbing this information from text and populating the appropriate database fields with the relevant information.
Subjective key concept issue coding. This coding is by far the most skill-intensive and valuable because it requires specific knowledge of the issues, strategy and objectives, and because the resulting document data lends itself to whittling down a document collection to a responsive subset more effectively than other types of coding.
Conclusion & Recommendations
EDC should be used wherever possible. Its benefits are that documents can be coded faster, and more consistently, accurately and cheaply, provided the text quality used to do the coding is high. Electronic documents will provide two sources of high-quality text for EDC applications: the e-file substantive text and the corresponding files' metadata.
EDC is not good for use with paper documents that are of poor quality (lots of handwriting and marginalia, illegible data, spreadsheets, or diagram- and graphic-intensive documents). However, where OCR text derived from paper documents is of relatively high accuracy (>85%), results from EDC vis-a-vis price make it well worth the decision to code.
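One way to apply the 85% guideline is to compare OCR output against a manually transcribed sample page and compute a character-level accuracy. The sketch below assumes accuracy is measured as one minus the edit distance divided by the length of the reference text; that definition, the function names and the sample strings are assumptions, since the measurement method is not specified above.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution, free if equal
            ))
        prev = cur
    return prev[-1]

def char_accuracy(reference, ocr_text):
    """Assumed metric: 1 - edit distance / reference length, floored at 0."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    return max(0.0, 1 - edit_distance(reference, ocr_text) / len(reference))

THRESHOLD = 0.85  # the ">85%" guideline

def suitable_for_edc(reference, ocr_text):
    """True when the sampled OCR text clears the accuracy guideline."""
    return char_accuracy(reference, ocr_text) > THRESHOLD

ok = suitable_for_edc("purchase order", "purchase 0rder")
```

Sampling a handful of representative pages this way gives a rough indication of whether a paper collection is clean enough for EDC before committing to code it.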
At the end of the day, lawyers should leverage competencies of experts to help discern appropriate vendors, methods and technologies that will facilitate their objectives, depending on the discovery materials they encounter.