Projects

An International Multi-party Litigation

Ten defendants were producing documents to our client in both native and image formats, in six different languages. This required a coordinated effort for data extraction and OCR of the documents, and each data set had its own challenges. Several other vendors were considered along with Discovery Specialists. However, processing filenames and directory structures containing Japanese and Korean characters was beyond the capability of many standard processing systems. Additionally, with so many languages in one data set, other vendors intended to make multiple passes through the data, one for each language. This would have increased the time needed to finish the job and raised the cost of production.

Discovery Specialists was able to use its language detection processing to assess which documents contained foreign languages and then accurately machine translate only those documents. We even translated documents containing multiple languages. Using proprietary technology and algorithms, we saved our client both time and cost.
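As a rough illustration of the routing step described above (not Discovery Specialists' proprietary method), the sketch below flags which writing systems appear in a document's text so that only those documents are sent on for translation. The function names, the `min_share` threshold, and the short list of Unicode ranges are assumptions made for this example. Script detection is only a coarse first pass: telling apart languages that share a script, such as English and German, would need a statistical language model.

```python
from collections import Counter

# Unicode code point ranges for a few scripts (illustrative subset, not exhaustive).
SCRIPT_RANGES = {
    "hiragana_katakana": [(0x3040, 0x30FF)],
    "hangul": [(0x1100, 0x11FF), (0xAC00, 0xD7A3)],
    "cjk_ideographs": [(0x4E00, 0x9FFF)],
    "cyrillic": [(0x0400, 0x04FF)],
}


def script_of(ch):
    """Return a script label for one character, or None for digits, spaces, etc."""
    cp = ord(ch)
    for name, ranges in SCRIPT_RANGES.items():
        for lo, hi in ranges:
            if lo <= cp <= hi:
                return name
    # Any other letter is lumped together; this sketch does not split it further.
    return "latin_or_other" if ch.isalpha() else None


def scripts_in(text, min_share=0.05):
    """Return the scripts that make up at least min_share of the letters in text."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    total = sum(counts.values())
    if total == 0:
        return set()
    return {name for name, n in counts.items() if n / total >= min_share}


def needs_translation(text):
    """Hypothetical routing rule: translate if any listed non-Latin script is present."""
    return bool(scripts_in(text) - {"latin_or_other"})
```

A document mixing several scripts returns several labels from `scripts_in`, which is how a single pass could identify multi-language documents without rescanning the data set once per language.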

Another problem with the foreign data sets on this job was the existence of documents created by proprietary foreign software applications. In some cases these applications used the same file extensions as more common applications. By analyzing the metadata of these files, Discovery Specialists was able to accurately identify the applications that created them and then either convert the documents to a standard format, obtain a copy of the foreign application to read them, or precisely identify the document type to the law firm so that its client could output the documents in a readable format. The same technology was applied to locate documents created by common applications such as Word for Windows where the file extension had accidentally been entered incorrectly (e.g., .DOV instead of .DOC).
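One common way to catch a wrong or misleading extension is to compare it against the file's leading bytes. The sketch below is a minimal example of that general technique, assuming a small hand-picked signature table; it is not a description of the analysis Discovery Specialists actually ran, and the names `SIGNATURES`, `sniff`, and `check_file` are hypothetical.

```python
from pathlib import Path

# (leading bytes, label, extensions normally seen with that signature).
# A deliberately small table chosen for this example.
SIGNATURES = [
    (b"%PDF", "pdf", {".pdf"}),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 compound file (legacy Office)",
     {".doc", ".xls", ".ppt", ".msg"}),
    (b"PK\x03\x04", "zip container (includes OOXML)",
     {".zip", ".docx", ".xlsx", ".pptx"}),
    (b"II*\x00", "tiff", {".tif", ".tiff"}),
    (b"MM\x00*", "tiff", {".tif", ".tiff"}),
]


def sniff(header):
    """Return (label, expected_extensions) for the leading bytes, or (None, set())."""
    for magic, label, extensions in SIGNATURES:
        if header.startswith(magic):
            return label, extensions
    return None, set()


def check_file(path):
    """Compare a file's extension with what its first bytes suggest."""
    path = Path(path)
    with path.open("rb") as handle:
        header = handle.read(16)
    label, extensions = sniff(header)
    extension = path.suffix.lower()
    return {
        "path": str(path),
        "extension": extension,
        "detected": label,
        # Only flag a mismatch when the signature was recognized at all.
        "mismatch": label is not None and extension not in extensions,
    }
```

Under these assumptions, a legacy Word file saved as `memo.DOV` would come back with `detected` set to the OLE2 label and `mismatch` set to `True`, while a file with an unrecognized signature is left unflagged for closer inspection.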

Discovery Specialists’ ability to easily handle foreign-language documents is a big advantage for clients that deal with international companies. Because the languages in a data set are accurately identified, Discovery Specialists can use a single search string to search an entire data set containing different languages. Before a document in a different language is searched, the search string is automatically converted into that document’s language to improve the accuracy of the search.

In this same job, we also effectively resolved another challenge. Of the millions of pages received in TIFF and PDF format, many were of such poor image quality that standard OCR systems would not have been able to read them accurately. Discovery Specialists applied a series of image processing techniques to the documents in a multiple-pass OCR system to get the best results possible. Images that were simply too poor to be accurately OCR’d were flagged for manual review. Many of these image documents were emails. Discovery Specialists used its email field extraction algorithms to pull relevant information (i.e., the TO, FROM, CC, SUBJECT, DATE, and ATTACHMENT fields) from the images and populate the review database, allowing these documents to be more easily searched, sorted, and referenced to their email attachments.

OCR text files were provided along with some of the produced documents. However, the text files did not match the images. Although the law firm responsible for producing these mismatched documents was asked for a corrected production, the time needed to correct the mistake was going to cut into our client’s time to adequately prepare for related depositions. Discovery Specialists created a program to automatically detect and correct the mismatched text files and images and quickly had the documents ready for review.
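To give a sense of what pulling email fields out of OCR text can involve, here is a minimal sketch that reads header-style lines from the top of a page's OCR output. It assumes reasonably clean text with one field per line and indented continuation lines; real OCR output is noisier, and the pattern, the `extract_email_fields` name, and the `max_header_lines` limit are assumptions made for this example rather than the algorithms used on the job.

```python
import re

# Header labels of interest; "Sent" is treated as a date field.
FIELD_RE = re.compile(
    r"^\s*(From|To|Cc|Subject|Date|Sent|Attachments?)\s*:\s*(.*)$",
    re.IGNORECASE,
)

# Map label variants onto one canonical key each.
CANONICAL = {"sent": "date", "attachment": "attachments"}


def extract_email_fields(ocr_text, max_header_lines=30):
    """Return a dict of header fields found near the top of OCR'd email text."""
    fields = {}
    current = None
    for line in ocr_text.splitlines()[:max_header_lines]:
        match = FIELD_RE.match(line)
        if match:
            name = match.group(1).lower()
            key = CANONICAL.get(name, name)
            if key in fields:
                # Keep the first occurrence; later ones are often quoted replies.
                current = None
            else:
                fields[key] = match.group(2).strip()
                current = key
        elif not line.strip():
            if fields:
                break  # a blank line after the headers ends the header block
            current = None
        elif current and line[:1].isspace():
            # An indented line continues the previous field (e.g. a long To list).
            fields[current] += " " + line.strip()
        else:
            current = None
    return fields
```

The returned dictionary could then be loaded into the corresponding columns of a review database so the image-only emails can be searched and sorted by sender, recipient, or date.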

Another problem encountered with the Opposition’s production was overlapping Bates ranges. This mistake by the Opposition was automatically detected, and control numbers were applied to the affected documents in the review database so that the attorneys could still get started on their review while allowing the system to be updated later, when corrected Bates ranges were provided.

Over 6 million pages were extracted, searched, and reviewed in this job. Discovery Specialists’ ability to tag these documents by language, contextual searching, and file type allowed the attorney reviewers to quickly review the documents needed for fast-approaching depositions, while the remainder of the documents were reviewed at a later time. In addition, reports were provided that identified gaps in the Opposition’s production, documents not produced, and people at the Opposition’s companies who appeared in many documents but were not custodians for collection from the Opposition. This allowed the attorneys to provide important details when pushing the Opposition for further discovery.
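Overlapping Bates ranges and gaps in a production can both be found by sorting the ranges and walking through them once. The sketch below shows that idea under simplifying assumptions: each Bates number is a letter prefix followed by digits, and each document carries a first and last number with the same prefix. The `BatesRange` type and the function names are hypothetical and are not taken from the system used on this job.

```python
import re
from typing import NamedTuple


class BatesRange(NamedTuple):
    prefix: str
    start: int
    end: int
    doc_id: str


# Assumed label shape: letters, an optional separator, then digits (e.g. ABC000123).
BATES_RE = re.compile(r"^([A-Za-z]+[-_ ]?)(\d+)$")


def parse_bates(label):
    """Split a Bates label into (prefix, number)."""
    match = BATES_RE.match(label.strip())
    if not match:
        raise ValueError(f"unrecognized Bates label: {label!r}")
    return match.group(1), int(match.group(2))


def make_range(doc_id, first, last):
    """Build a BatesRange from a document's first and last Bates labels."""
    first_prefix, start = parse_bates(first)
    last_prefix, end = parse_bates(last)
    if first_prefix != last_prefix or end < start:
        raise ValueError(f"inconsistent range for {doc_id}: {first} to {last}")
    return BatesRange(first_prefix, start, end, doc_id)


def find_overlaps_and_gaps(ranges):
    """Return (overlaps, gaps) found by one sorted pass per prefix."""
    overlaps, gaps = [], []
    by_prefix = {}
    for item in ranges:
        by_prefix.setdefault(item.prefix, []).append(item)
    for prefix, group in by_prefix.items():
        group.sort(key=lambda r: (r.start, r.end))
        widest = group[0]  # earlier range that reaches the furthest so far
        for item in group[1:]:
            if item.start <= widest.end:
                # Reported against the furthest-reaching earlier range only,
                # not against every earlier range it may collide with.
                overlaps.append((widest.doc_id, item.doc_id))
            elif item.start > widest.end + 1:
                gaps.append((prefix, widest.end + 1, item.start - 1))
            if item.end > widest.end:
                widest = item
    return overlaps, gaps
```

Documents named in `overlaps` are the ones that would need temporary control numbers, and the tuples in `gaps` give the missing number ranges that a gap report could list when asking for further production.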