Predictive Coding

Through a combination of proprietary technology and highly educated, specialized program managers assigned to each project, we can rapidly work through a large volume of documents and classify them according to the client's customized categories and criteria.

The traditional document review process is sequential in nature. An individual reviewer cycles through a set of documents one by one and decides whether each document is responsive, and each responsive document is then usually assigned a predetermined category or tag. This process is labor intensive and expensive, yet it is often wrongly perceived as the best approach to document review. Aside from being very costly for clients, it usually suffers from errors and inconsistencies caused by reviewer fatigue, by different reviewers tagging documents differently, and often by the same reviewer tagging inconsistently over the course of a multi-day or multi-week review.

Over the last couple of years, text analytics and linguistic algorithms have matured to the point where they can be used successfully for document review. This collection of techniques, algorithms and workflow processes is often referred to as "Predictive Coding", "Auto Categorization" or "Machine Coding." We'll use the term Predictive Coding to refer to these techniques collectively.

Predictive Coding is a combination of sophisticated mathematical algorithms and a workflow that harnesses the subject matter knowledge of our project managers and the senior attorneys on the case team. Linguistics-based mathematical algorithms are capable of assessing contextual similarity across a large set of documents. The project workflow is designed to maximize the efficiency and efficacy of those algorithms by properly training the system.

Predictive Coding can be used in a variety of ways to significantly decrease the cost of the review process. With our clients, we have found the largest impact when Predictive Coding techniques are used in one of the following two ways:

1. Enhance the traditional review process
2. Replace the traditional review process

These approaches can be used both for reviewing a client's own documents and for reviewing documents produced by the opposition.

Enhance the traditional review process

Several techniques can be used to significantly improve both the efficiency and efficacy of the review team.

Instead of randomly allocating documents to reviewers, Auto Categorization, Near Duplicate and Email Dethreading techniques are used to group contextually similar documents. Reviewers are then assigned batches of documents that are similar in content.

Compared with the traditional approach, where reviewers are assigned random batches of documents, this approach dramatically improves the speed of coding and therefore reduces the overall cost of review.
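To make the batching idea concrete, here is a minimal sketch of grouping contextually similar documents before assigning them to reviewers. It is an illustration only, not our production algorithm: it assumes a simple bag-of-words cosine similarity as a stand-in for the linguistic analysis, and the function names, the 0.4 threshold and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def batch_similar(docs, threshold=0.4):
    """Greedy grouping: a document joins the first batch whose seed document
    is at least `threshold` similar; otherwise it starts a new batch."""
    batches = []  # list of (seed_vector, [doc_ids])
    for doc_id, text in docs.items():
        vec = term_vector(text)
        for seed, members in batches:
            if cosine(seed, vec) >= threshold:
                members.append(doc_id)
                break
        else:
            batches.append((vec, [doc_id]))
    return [members for _, members in batches]


# Hypothetical documents, for illustration only.
docs = {
    "A": "Quarterly pricing agreement for the widget contract",
    "B": "Lunch on Friday? Let me know if noon works",
    "C": "Revised pricing agreement for the widget contract attached",
    "D": "Friday lunch works, noon is fine",
}
batches = batch_similar(docs)
print(batches)  # [['A', 'C'], ['B', 'D']]
```

Each resulting batch can then be handed to a single reviewer, so that the two contract documents are coded together rather than being interleaved with unrelated email.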

Replace the traditional review process

While the above technique is very useful, for very large data sets a complete replacement of the traditional review process with Predictive Coding can deliver tremendous cost savings while at the same time improving the recall and precision of the tagged documents.

This is accomplished through an iterative process that uses various mathematical algorithms in conjunction with the deep subject knowledge of our project managers and the senior attorneys on the case team, who essentially "train" or "tune" the algorithms to the specifics of a case and a given document set. While each case may be slightly different, the core steps of this iterative process are shown below:

  • The case team decides on the categories/tagging structure.
  • The case team and our staff conduct keyword searches to identify sample documents in each of the target categories.
  • The sample documents are fed to the system, which identifies contextually similar documents within a randomly selected sample of the documents to be reviewed and tags them based on the tagging of the sample documents.
  • The case team reviews the output, checks the tags assigned by the system and makes corrections as necessary. This is essentially a "tuning" or "training" step.
  • The above two steps are iterated until the case team is satisfied with the precision, at which point the system performs automatic coding on the entire document set.
  • Based on the volume of documents tagged and the evolving hypotheses in the case, certain document categories, or only Hot documents, can be fed to the case team either for further review or for use in depositions and in building supporting documents for various elements of the case.
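The tag, review and correct loop in the steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of our platform: it uses Jaccard word overlap as a stand-in for contextual similarity, it measures the agreement rate between system tags and reviewer tags as a simple proxy for the precision check, and every name and document in it is hypothetical.

```python
import re


def tokens(text):
    """Set of lowercased words in a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_tag(seeds, text):
    """Return the tag of the most similar seed (sample) document."""
    doc = tokens(text)
    best_tag, best_score = None, -1.0
    for seed_text, tag in seeds:
        score = jaccard(tokens(seed_text), doc)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


def train(seeds, sample, reviewer_tags, target=0.9, max_rounds=5):
    """Repeat tag -> review -> correct until the system agrees with the
    reviewers on at least `target` of the sample, or rounds run out."""
    seeds = list(seeds)
    history = []
    for _ in range(max_rounds):
        predicted = {d: predict_tag(seeds, text) for d, text in sample.items()}
        wrong = [d for d in sample if predicted[d] != reviewer_tags[d]]
        agreement = 1 - len(wrong) / len(sample)
        history.append(agreement)
        if agreement >= target:
            break
        # The case team's corrections become new training examples.
        seeds.extend((sample[d], reviewer_tags[d]) for d in wrong)
    return seeds, history


# Hypothetical seed documents found by keyword search.
seeds = [
    ("pricing agreement for the widget contract", "responsive"),
    ("lunch on friday", "not responsive"),
]
# Hypothetical review sample and the case team's tags for it.
sample = {
    "d1": "revised pricing agreement attached",
    "d2": "are we still on for lunch",
    "d3": "rebate schedule on friday call",
    "d4": "friday happy hour",
}
reviewer_tags = {
    "d1": "responsive",
    "d2": "not responsive",
    "d3": "responsive",
    "d4": "not responsive",
}
trained_seeds, history = train(seeds, sample, reviewer_tags)
print(history)  # [0.75, 1.0]
```

In the first round the system mis-tags `d3`; once the reviewer's correction is added as a sample document, the second round agrees with the reviewers on every document. For brevity the sketch re-scores the same sample each round, whereas the workflow described above would tag a fresh random sample on each iteration.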

Note that for this process to work well, the case team needs to assign senior attorneys with deep knowledge of the case and subject matter to participate in the above iterative process. To properly train the system, Discovery Specialists assigns project managers who have direct subject matter knowledge related to the case, so that they can properly assist the senior attorneys on the case team during the iterative process.

ADVANTAGES

Cost savings - the Predictive Coding approach takes significantly less time and is less costly than the traditional sequential review process.

Consistency - the advantage of software is that it does not tune out, get tired or become lazy when reviewing a page. The algorithm produces consistent results from the first to the very last document it reviews, without skipping parts of a document or the fine print, as a fatigued reviewer under time pressure often does.

Flexibility - when additional information or theories become relevant during a case, traditional review teams are forced to repeat the exercise and re-review the documents. With Predictive Coding, this can be accomplished quickly by simply re-tuning the algorithms or feeding in new relevant sample documents so that the system can re-tag the entire document set.

Transparency - the system keeps track of why a given document was tagged in a specific way and does not need to rely on a reviewer's memory.

Reporting - detailed reporting is produced for documents that have been automatically coded. This helps attorneys understand the types of documents produced, volumes, date ranges, people involved in the documents, connections between custodians and non-custodians, and anomalies in the production. It would be nearly impossible to compile this detailed reporting with human review, especially since individual human reviewers see only their own documents and don’t understand the cross-relationships with documents reviewed by others.

Slang independence - the system recognizes "context" rather than keywords, so it is particularly useful in recognizing terms or paragraphs that have the same meaning but use different words or terminology.

Language agnostic - the system is language agnostic and hence can be used on document sets that contain multiple languages. However, to achieve the best results, we recommend translating foreign-language documents, or at least machine translating them.

DISADVANTAGES

Over-inclusion - the system can be over-inclusive when tagging "hot" documents. This can be tuned with more iterations, but the result depends on the case team's comfort level with the percentage similarity used to decide whether two documents are contextually similar.
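The effect of the similarity level on over-inclusion can be illustrated with a small calculation. The sketch below assumes each document already has a similarity score against the "hot" sample documents and a reviewer's verdict; the scores, verdicts and function name are hypothetical and chosen only to show the trade-off between precision and recall.

```python
def precision_recall(scored, threshold):
    """scored: list of (similarity_to_hot_samples, reviewer_says_hot).
    A document is tagged hot when its similarity is >= threshold."""
    tagged = [is_hot for score, is_hot in scored if score >= threshold]
    true_positives = sum(tagged)
    total_hot = sum(is_hot for _, is_hot in scored)
    # By convention, report 1.0 when there is nothing to be wrong about.
    precision = true_positives / len(tagged) if tagged else 1.0
    recall = true_positives / total_hot if total_hot else 1.0
    return precision, recall


# Hypothetical similarity scores and reviewer verdicts.
scored = [
    (0.95, True),
    (0.85, True),
    (0.70, False),
    (0.65, True),
    (0.50, False),
    (0.40, False),
]
for threshold in (0.4, 0.6, 0.8):
    precision, recall = precision_recall(scored, threshold)
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

With these numbers, a low similarity level of 0.4 tags every document and only half the tags are correct, while a high level of 0.8 produces only correct tags but misses one of the three hot documents. Choosing the level is therefore a judgment call for the case team rather than a purely technical setting.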

Output quality depends on input quality - we require that senior attorneys be the ones who participate in the iterative process and select the sample documents. If they are not, the sample documents may not be representative of their tagging categories, which often leads to poor overall results or an unnecessarily repetitive iterative process.

Dependency on OCR quality - documents with poor-quality OCR text may not be properly tagged by some predictive coding systems. However, our platform and approach identify and properly account for OCR defects.

Algorithms used in Predictive Coding

The following algorithms are used in the overall Predictive Coding process.

  • Similar Document Detection – SDD is a sophisticated semantic algorithm that evaluates the content and key phrases in a document (or a text segment) and then searches for similar documents in the data set.
  • Contextual Search – unlike traditional keyword searching, Contextual Search is a self-training, mathematically based search technique that analyzes content and sentence structure. This technique makes it easier to uncover documents that contain information contextually similar to a reference phrase or paragraph.
  • Flexible, Model-Based Phrase Search – retrieves narratives that contain one or more user-specified phrases and ranks the narratives by their relevance to the phrases.
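As a rough illustration of the phrase search idea, the sketch below retrieves narratives that contain at least one user-specified phrase and ranks them by how often the phrases occur. Counting occurrences is an assumed, deliberately simple relevance measure standing in for the model-based ranking described above; the function names and narratives are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation so 'Price-fixing' matches 'price fixing'."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def phrase_search(narratives, phrases):
    """Return (narrative_id, hit_count) for narratives containing at least
    one phrase, most hits first; ties are broken by narrative id."""
    wanted = [p for p in (normalize(x) for x in phrases) if p]
    patterns = [re.compile(r"\b" + re.escape(p) + r"\b") for p in wanted]
    results = []
    for doc_id, text in narratives.items():
        body = normalize(text)
        hits = sum(len(pattern.findall(body)) for pattern in patterns)
        if hits:
            results.append((doc_id, hits))
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical narratives, for illustration only.
narratives = {
    "n1": "We discussed price fixing with the distributor. Price-fixing was the goal.",
    "n2": "The market allocation plan was shelved.",
    "n3": "Team offsite agenda and travel details.",
}
ranked = phrase_search(narratives, ["price fixing", "market allocation"])
print(ranked)  # [('n1', 2), ('n2', 1)]
```

Unlike this sketch, the contextual techniques listed above are intended to find documents that express the same idea in different words, which plain phrase matching cannot do.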

Our capabilities narrow your productions, save attorney review time, and increase your knowledge of both your client’s documents and those documents produced by the opposition.