Successful Data Migration

By David Hartmann and Scott Giordano
March 29, 2013

When corporate legal and IT departments deploy new enterprise software, migrating legacy data into the new system is usually one of the larger challenges. With e-discovery software, the challenge is exacerbated because matter information may be contained in legacy systems, in a collection of spreadsheets or in other ad hoc tools. It also presents unique risks: lost or altered electronically stored information (ESI) or audit trails can lead opposing counsel to question the integrity of the entire e-discovery process, with judicial sanctions looming. Put simply, implementation teams have to get it right the first time. It's easy to think of data migrations purely in terms of technical requirements. But like any complex project, they must be approached as a process, involving various stakeholders and a carefully defined sequence of activities.

The essence of any data migration project involves the following stages:

  1. Gathering project requirements and assessing related parameters;
  2. Defining and analyzing the source data;
  3. Identifying and mapping the information;
  4. Extracting and ingesting the data; and
  5. Validating the results, gaining user acceptance and going live.

Project Requirements and Assessing Parameters

Gathering requirements for data migration initiatives involves assigning project sponsors and having them answer potentially difficult threshold questions in advance of deployment, including:

  • What are the overall goals of the migration?
  • What are the risks of going forward?
  • How quickly does the project need to be completed?
  • What information do we want to move?
  • Do we have the right people on the project team?

Having a common understanding of project goals is essential. It is easy to assume that data migrations involve the simple goal of transferring all data from one system to another. In fact, transferring certain ESI previously stored on the legacy system may not be necessary, or even desirable, depending on the nature of the systems involved. What's more, data migrations give organizations a good opportunity to assess their data and determine what ESI, if any, can be safely discarded without threat of legal consequence; some organizations use them to 'clean up' their data as part of defensible data deletion initiatives. It's important that such priorities be woven into the overall project goals and evaluated for success at the end of the project.

Ensuring the project has the right team members is just as important. Successful migrations require subject matter experts who can identify significant risks and requirements: the precise attributes of the data elements, the organization's IT infrastructure, and what ESI is currently under legal hold or must be preserved for pending legal matters. For example, a database administrator (DBA) knows the myriad formats and attributes of the data elements of a given repository; that expertise is necessary for defining the protocols needed for successful integration with and migration to the new e-discovery system. See Figure 1, below.

[Figure 1]

Defining and Analyzing the Source Data

Once the project parameters have been defined and the project team assembled, the next step involves defining and analyzing the source data. Its structure must match the format required by the destination system. Even the smallest discrepancies can result in a flawed migration. Every data element will have a particular format and attributes. For example, when formatting a cell in an Excel spreadsheet, users have formatting options, such as date, time, percentage and scientific. These options are necessary because IT systems expect information in a particular format and will not process it otherwise. Dates are a particularly good example of this consideration because they can be formatted in a variety of ways: dd-mm-yyyy, dd-mm-yy, yyyy-mm-dd and many more.
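The date problem described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the list of source formats and the choice of yyyy-mm-dd as the destination format are hypothetical, and a real project would take both from its own source analysis.

```python
from datetime import datetime

# Hypothetical list of the formats the source system is known to emit.
# Order matters: a value is parsed by the first format that fits it.
SOURCE_DATE_FORMATS = ["%Y-%m-%d", "%d-%m-%Y", "%d-%m-%y"]


def normalize_date(raw: str) -> str:
    """Return the date as yyyy-mm-dd, or raise ValueError if no format fits."""
    for fmt in SOURCE_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date format: {raw!r}")


print(normalize_date("29-03-2013"))
print(normalize_date("2013-03-29"))
```

Raising an error for an unknown format, rather than guessing, keeps a bad value out of the destination system and puts it in front of the project team instead.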

A further wrinkle is that source data may come from a variety of ad hoc or formal e-discovery systems, such as matter management tools, review point tools or Excel spreadsheets, or from enterprise-managed systems like Microsoft Access, SharePoint or Lotus Notes. Migrating data from Lotus Notes poses unique challenges of its own, as most Notes systems are developed in-house and contain data structures that are not consistent with industry standards. Conversely, an existing e-discovery point tool might contain more universal data structures but tends to relate and organize the data in a logically different way than the destination system does.

Identifying and Mapping the Information

The next consideration is how source data will be 'mapped' to the new system. In the e-discovery context, matter, legal hold and custodian records will each have a unique configuration. Some tasks for mapping include:

  • Identifying what ESI from a particular source needs to be moved;
  • Understanding where that information will appear in the new system;
  • Defining what type of character coding, field types and formatting are involved; and
  • Identifying the unique identifiers.
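As a rough illustration of these mapping tasks, a field map can be written down as data before any migration code runs. The Python sketch below is a minimal example; every field name in it ("MatterNo", "matter_id" and so on) is hypothetical and would come from the actual source and destination systems.

```python
# Hypothetical field map: source column -> (destination field, converter).
# Source columns that are not listed here are deliberately left behind.
FIELD_MAP = {
    "MatterNo": ("matter_id", str.strip),
    "Matter Name": ("matter_name", str.strip),
    "Opened": ("opened_on", str.strip),
}


def map_record(source_row: dict) -> dict:
    """Translate one source row into destination field names."""
    mapped = {}
    for source_field, (dest_field, convert) in FIELD_MAP.items():
        mapped[dest_field] = convert(source_row[source_field])
    return mapped


row = {"MatterNo": " M-1 ", "Matter Name": "Acme v. Smith",
       "Opened": "2013-03-29", "Notes": "not migrated"}
print(map_record(row))
```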

The first task is potentially the most difficult because of the record volumes involved. For example, a legal department may have 2,000 active matters with an average of 10 holds and 100 custodians per matter, creating 200,000 matter-custodian combinations (2,000 × 100) that require migration. Add to this the history of a given legal hold, which itself may contain scores of entries, and the potential volume explodes. Deciding whether to migrate legal hold histories or just the currently active elements of a hold, a question that goes back to the project goals, affects how a legal department can respond to a failure in a hold process. Without the hold histories, it may be difficult or impossible to reconstruct precisely what went wrong with a given hold, and potentially exculpatory information will not be available.
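The volume estimate can be worked through in a few lines of Python. The matter, hold and custodian counts come from the example above. The number of history entries per hold is purely an assumption chosen to stand in for "scores of entries".

```python
matters = 2_000
holds_per_matter = 10
custodians_per_matter = 100
history_entries_per_hold = 40  # assumption: stands in for "scores of entries"

hold_records = matters * holds_per_matter
matter_custodian_records = matters * custodians_per_matter
hold_history_records = hold_records * history_entries_per_hold

print(f"holds:                  {hold_records:,}")
print(f"matter-custodian pairs: {matter_custodian_records:,}")
print(f"hold history entries:   {hold_history_records:,}")
```

Under these assumptions the hold histories alone are several times larger than the 200,000 matter-custodian combinations, which is why the decision to migrate them belongs in the project goals.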

For e-discovery systems, the smallest details must be taken into account. For example, with legal holds, a source system Matter Name field may allow for 256 characters with special characters permitted, like the '#' symbol, in the name of the matter. The destination system might not allow for such symbols or as many characters in the corresponding field.

The identification of unique identifiers within the data set is especially critical for mapping. Data, in its rawest form, is decentralized and seemingly random. Unique identifiers are the elements within data sets that link records together in a logical way. Using a legal hold example, unique identifiers are what allow a system to precisely recognize the connection between a particular matter, all the legal holds that fall under it and the implicated custodians. In short, it is impossible to successfully map data to a new environment without first understanding how it's connected within the legacy system.
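Both points, field constraints and unique identifiers, lend themselves to simple checks before migration. The Python sketch below is illustrative only: the destination limits (100 characters, no '#') and the field names "matter_id" and "hold_id" are assumptions, not properties of any particular product.

```python
DEST_MAX_LENGTH = 100      # hypothetical destination limit for Matter Name
DEST_FORBIDDEN = set("#")  # hypothetical characters the destination rejects


def matter_name_problems(name: str) -> list[str]:
    """List the reasons a Matter Name would not fit the destination field."""
    problems = []
    if len(name) > DEST_MAX_LENGTH:
        problems.append(f"too long ({len(name)} > {DEST_MAX_LENGTH})")
    bad = sorted(DEST_FORBIDDEN.intersection(name))
    if bad:
        problems.append("forbidden characters: " + "".join(bad))
    return problems


def orphaned_holds(matters: list[dict], holds: list[dict]) -> list[str]:
    """Return the IDs of holds whose matter_id matches no known matter."""
    known = {m["matter_id"] for m in matters}
    return [h["hold_id"] for h in holds if h["matter_id"] not in known]


print(matter_name_problems("Case #12"))
print(orphaned_holds(
    [{"matter_id": "M-1"}],
    [{"hold_id": "H-1", "matter_id": "M-1"},
     {"hold_id": "H-2", "matter_id": "M-9"}],
))
```

A hold that points at a matter identifier nobody recognizes is exactly the kind of broken link that is far cheaper to find in the source data than after ingestion.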

Extracting and Ingesting the Data

After the mapping strategy and configuration are complete, a test migration with sample data should be conducted. Data can be extracted from the source system and ingested into the destination in a variety of ways. Two popular methods are eXtensible Markup Language (XML) and comma-separated values (CSV) files. In the former, the data values travel with tags that describe them, so that an XML-enabled system receiving the data will 'know' what each value is, including where to place it. In the latter, every data element of a given record is copied into a text file in which each value is separated from the next by a 'delimiter,' a character such as a comma that marks the boundary between values. Each subsequent record is copied the same way, one after another, with the same number of fields in the same order but its own data values, until all records are extracted. A utility in the new system then reads each of these elements, using the delimiter as a guide, and copies (i.e., ingests) them into their proper fields in the new matter, legal hold or custodian record.
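The difference between the two methods is easiest to see by writing the same record both ways. The following Python sketch uses only the standard library; the record and its field names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative only.
record = {"matter_id": "M-1", "matter_name": "Acme v. Smith", "custodian": "J. Doe"}


def to_csv(rows: list[dict]) -> str:
    """CSV: values are identified only by their position after a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows: list[dict]) -> str:
    """XML: every value is wrapped in a tag that names its field."""
    root = ET.Element("records")
    for row in rows:
        element = ET.SubElement(root, "record")
        for field, value in row.items():
            ET.SubElement(element, field).text = value
    return ET.tostring(root, encoding="unicode")


print(to_csv([record]))
print(to_xml([record]))
```

In the CSV output a value means something only because of where it sits in the line, which is what makes the delimiter so important.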

The granular nature of the CSV method underscores the potential pitfalls of the ingestion process. For example, if the delimiter configured for the extraction file is a comma and the name of a matter has a comma in it that is not quoted or escaped, the Matter Name from the source system will be read as two fields instead of one. The part of the Matter Name after the comma will not be ingested properly and will likely be mapped to whatever field comes after Matter Name in the migration process. See Figure 2, below.
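The pitfall is easy to reproduce. In the Python sketch below, the matter name is a made-up example; the first extraction joins values with bare commas, while the second uses the standard library's csv module, which quotes any value that contains the delimiter.

```python
import csv
import io

# Hypothetical record whose Matter Name contains the delimiter.
row = ["M-1", "Smith, Jones & Co. v. Acme", "Active"]

# Naive extraction: join the values with commas and no quoting.
naive_line = ",".join(row)
naive_fields = naive_line.split(",")

# Safer extraction: csv.writer quotes values that contain the delimiter,
# and csv.reader restores them as a single field.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
quoted_line = buffer.getvalue()
quoted_fields = next(csv.reader(io.StringIO(quoted_line)))

print(len(naive_fields), naive_fields)
print(len(quoted_fields), quoted_fields)
```

The naive version yields four fields for a three-field record, so the status value would land one field to the right of where it belongs. Choosing a delimiter that cannot appear in the data is another common approach, but it depends on knowing the source data well.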

[Figure 2]

Validating the Results, Gaining User Acceptance and Going Live

Moving the system into production involves installing the new e-discovery software behind the corporate firewall onto the end users' hardware, backing up the source data and then conducting the migration. During the test migration process, a utility program will check that the source and target data elements match and, if they don't, will note the mismatch (or any other problem) in an error log. The project team will review these logs and attempt to resolve the errors. The process can be very tedious and time-consuming because errors can stem from so many sources. Multiple test migrations may be necessary to resolve just one error.
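What such a comparison utility does can be sketched in Python as follows. This is a simplified illustration rather than any vendor's tool: it assumes both systems can export their records as dictionaries keyed by a unique identifier, and the record contents are invented.

```python
def compare_records(source: dict, target: dict) -> list[str]:
    """Compare records keyed by unique ID and return error log lines."""
    errors = []
    for record_id, source_fields in source.items():
        target_fields = target.get(record_id)
        if target_fields is None:
            errors.append(f"{record_id}: missing from target")
            continue
        for field, expected in source_fields.items():
            actual = target_fields.get(field)
            if actual != expected:
                errors.append(
                    f"{record_id}.{field}: source={expected!r} target={actual!r}"
                )
    # Records that exist only in the target are suspicious as well.
    for record_id in sorted(target.keys() - source.keys()):
        errors.append(f"{record_id}: unexpected record in target")
    return errors


source = {"M-1": {"matter_name": "Acme v. Smith"},
          "M-2": {"matter_name": "Doe v. Roe"}}
target = {"M-1": {"matter_name": "Acme v. Smith"},
          "M-2": {"matter_name": "Doe v."},
          "M-3": {"matter_name": "Unknown"}}

for line in compare_records(source, target):
    print(line)
```

An empty log after a test run is the signal that the migration model is ready to be validated; each remaining line is a lead for the project team to chase.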

Once the test migration is complete and the migration model is validated, the configuration will be deployed to intermediary hardware for User Acceptance Testing (UAT), where end users work with it as they would on a typical work day to determine whether it functions as planned. UAT can last anywhere from weeks to months, depending on a number of factors, including the complexity of the migration, the volume of information being transferred and the urgency with which the new system must be up and running. Once the business users (the legal department team) and the IT team formally accept the proposed new system, it will be moved off the intermediary hardware and into production. The migration should be monitored 'live' in order to catch any problems and, if necessary, stop the process.

Before the system is accepted to 'go live,' the business users and IT team will conduct a final evaluation by revisiting the success criteria set forth at the beginning of the project and determining whether they have been met with the integrity of the data intact. In addition, the project sponsors must accept the final result of the migration and be prepared to cease operational practices that would cause old systems to generate new data.

Conclusion

Data migration is a necessary evil of the e-discovery world. An organization's e-discovery requirements can change quickly and necessitate the acquisition of a new system. While the systems that support e-discovery processes may change rapidly, the underlying information within them cannot. The process of moving that information is inherently complex and fraught with risk. Those risks, however, can be significantly mitigated by following the steps outlined in this article.


David Hartmann is Director of Client Success at Exterro. With more than 15 years of experience in complex software implementations, Hartmann heads up the planning and execution of project charters and criteria processes. Scott Giordano is corporate technology counsel at Exterro. Giordano holds both Certified Information Systems Security Professional (CISSP) and Certified Information Privacy Professional (CIPP) certifications and serves as Exterro's subject matter expert on the intersection of law and technology.

