While e-discovery may be Greek to many, it is documents written in Chinese, Japanese, Korean and Russian that cause much of the trouble for companies, bricks-and-mortar and "e" alike, when those documents must be collected, reviewed, redacted and presented.
Chinese, Japanese and Korean are "multi-byte" languages. They have vastly more characters than the 20-some letters and punctuation marks needed by Indo-European languages such as English, Spanish, French and German. In fact, more than 47,000 characters are listed in the Chinese Kangxi dictionary (though "only" 3,000 to 4,000 are reportedly necessary for full literacy).
The impact on e-discovery is significant, given the added sophistication that case evaluation requires. At the most basic level, computers "think" in ones and zeros, and each one or zero is a "bit." Eight bits make a byte. A byte can hold 256 different combinations (two to the eighth power).
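The byte arithmetic can be checked directly. This minimal Python sketch counts the patterns an 8-bit byte can hold and shows one of them, the ASCII letter "A", as its eight bits. The variable names are illustrative only.

```python
# Number of distinct bit patterns a single 8-bit byte can hold.
BITS_PER_BYTE = 8
byte_patterns = 2 ** BITS_PER_BYTE
print(byte_patterns)  # 256

# Every byte value therefore falls in the range 0..255.
# For example, the ASCII letter "A" is stored as the byte value 65:
letter_a = ord("A")
print(letter_a, format(letter_a, "08b"))  # 65 01000001
```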
For languages that are not based solely on letters, i.e., those in which a symbol represents a concept or a syllable, one byte is not enough, so more bytes are needed. Two bytes allow 256 x 256, or 65,536, combinations. That is the essence of multi-byte vs. single-byte languages: a single-byte encoding can represent at most 256 characters, while a two-byte encoding can represent up to 65,536.
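This short Python sketch shows the single-byte vs. multi-byte difference in practice. It uses Shift_JIS, a legacy Japanese encoding that Python supports as the `shift_jis` codec. The article does not name Shift_JIS; it is chosen here only as a familiar example. A Latin letter takes one byte, and a kanji takes two.

```python
# Two bytes give 256 x 256 possible combinations.
two_byte_patterns = 256 * 256
print(two_byte_patterns)  # 65536
assert two_byte_patterns == 2 ** 16

# Shift_JIS stores Latin letters in one byte and kanji such as
# 日 ("sun" / "day") in two bytes.
for ch in ["A", "日"]:
    encoded = ch.encode("shift_jis")
    print(ch, encoded.hex(), len(encoded))
```

Other encodings divide the work differently. UTF-8, for instance, uses three bytes for most kanji, so "multi-byte" describes a family of schemes rather than a single fixed size.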
Confused? Then let's address encodings.
What's the Code?
An encoding is the set of rules a program uses to translate the bytes stored in a file into the characters you see on the screen. Problems arise when more than one encoding is in play. Suppose an Outlook 2000 e-mail file (PST format) is analyzed under a Japanese operating system and then moved to an English-language machine for review. The Japanese text may appear corrupted, because the English system interprets the bytes using a different encoding from the one in which they were written. Unicode was created to solve such problems by offering a single, universal character set. However, only files created on newer systems use it, so legacy data remains a continuing concern.
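The sketch below is a simplified Python illustration of that failure mode. Real PST processing involves much more than this. The sample text is Japanese and is encoded with code page 932, which Japanese Windows uses. The resulting bytes are then read back two ways: with code page 1252, the Western default, and with the original code page. Which code pages any given review platform actually uses is an assumption here.

```python
original = "日本語のメール"  # "Japanese-language e-mail"

# Bytes as a Japanese Windows system would write them (code page 932).
raw = original.encode("cp932")

# Reading the same bytes with a Western code page (1252) garbles the text.
# errors="replace" substitutes U+FFFD for any byte cp1252 leaves undefined.
misread = raw.decode("cp1252", errors="replace")

# Reading them with the encoding they were written in restores the text.
correct = raw.decode("cp932")

print(misread)
print(correct)

# Unicode (here UTF-8) round-trips the text on any machine,
# as long as both sides agree the data is UTF-8.
assert original.encode("utf-8").decode("utf-8") == original
```

The bytes themselves never change; only the interpretation does. That is why the encoding must be identified and preserved along with the data.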
"Each language family has its own unique set of problems and solutions," says Thomas Barnett, Special Counsel for Sullivan & Cromwell LLP.
In fact, "in some parts of the world, you are not allowed to take the data out of the country due to local data-protection laws," notes Brian Kim of PricewaterhouseCoopers LLP. He adds that some countries favor native applications that are uncommon in the United States, which calls for an additional evaluation of your program inventory.
Whether or not your data is in Unicode, proper preservation is key. Microsoft Windows NT, 2000, XP and later versions support Unicode, but many archiving and compression tools do not. As a result, files can go missing, and the loss may or may not be reported in error logs. For that reason, you must test carefully, Kim notes. Also, to ensure correct extraction, make sure the regional settings of the extraction system match those of the source data.
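One way to "test carefully" is shown in the hypothetical Python sketch below. The function names, the sample file names and the assumption that a legacy tool stores names only in code page 1252 are all illustrative; none comes from the article. The first check flags file names that such a tool could not represent. The second reconciles the source inventory against what was actually extracted, so that missing files surface even if no error log reports them.

```python
def unrepresentable_names(names, codepage="cp1252"):
    """Return the file names that cannot be expressed in a legacy code page.

    Assumption: a tool that stores names only in `codepage` could drop or
    mangle these files.
    """
    flagged = []
    for name in names:
        try:
            name.encode(codepage)
        except UnicodeEncodeError:
            flagged.append(name)
    return flagged


def missing_after_extraction(source_names, extracted_names):
    """Names present in the source inventory but absent after extraction."""
    return sorted(set(source_names) - set(extracted_names))


collected = ["contract.docx", "契約書.docx", "договор.pdf", "Überblick.xlsx"]
print(unrepresentable_names(collected))

# Hypothetical result of running an archiving tool over `collected`.
extracted = ["contract.docx", "Überblick.xlsx"]
print(missing_after_extraction(collected, extracted))
```

"Überblick.xlsx" passes the first check because code page 1252 includes "Ü". The Japanese and Russian names do not pass.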
Duplication, in More Ways Than One
Some languages, e.g., Chinese and Japanese, share overlapping characters, and some do not put spaces between words, which makes searching more complicated. Many corporate documents also mix English with another language. To avoid mistakes and enhance defensibility, consider organizing data for review by means other than keyword searching, given how difficult it is to establish reliable search terms in foreign languages. Also remember that translation is expensive. Expert translators, ordinary native speakers and machine translation are all options, but the issue is often one of timing and the reliability of the end product.
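The spacing problem can be illustrated with the Python sketch below. Splitting a Japanese sentence on whitespace yields a single "word," so a token-based keyword search misses the term. The sketch swaps in character-bigram matching, a common technique for unsegmented text that the article does not itself describe. The function names are hypothetical.

```python
def char_bigrams(text):
    """Set of overlapping two-character sequences in `text`."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_match(query, document):
    """True if every bigram of `query` appears somewhere in `document`."""
    if len(query) < 2:
        # A one-character query has no bigrams; fall back to substring search.
        return query in document
    return char_bigrams(query) <= char_bigrams(document)


doc = "契約書を確認してください"  # "Please check the contract."
query = "契約書"                   # "contract"

# Splitting on whitespace yields one giant "word," so token search misses.
print(query in doc.split())        # False
# Bigram matching finds the term without relying on word boundaries.
print(bigram_match(query, doc))    # True
```

Bigram matching trades precision for recall. It can report a hit when all of a query's bigrams occur in a document but not next to each other, so results still need human review.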
Remember, e-discovery is Greek to you only if you don't know the code. E-commerce counsel will do well by their clients to learn the code, or to partner with experts in the code on potential or actual e-discovery matters.