Discovery Specialists: Less Production Can Be More in Database Discovery

10/26/12 | 12:31

The Corporate Counselor, By Michael Spencer and Diana Fasching
October 26, 2012

In e-discovery, it is not uncommon to see production requests for a copy of an entire database instead of requests for targeted, relevant information. For example, in the investigation of an age discrimination claim, a party may request a copy of an entire human resources database, instead of asking for specific data relevant to the claim. Many parties incorrectly assume that such broad requests facilitate more complete production and eliminate the risk of inadvertently failing to request a key piece of data.

On the contrary, a full database production may actually omit important data because the database (which stores the data) works in concert with the application (which presents the data to users). In doing so, the application may derive and present data that the database does not store. For example, employee age is a constantly changing target because age is determined, in part, by the current date. Therefore, the application handles calculating and displaying employee age derived from the date of birth (stored in the database) and the current date.

KEY PRINCIPLES

If corporate counsel has good lines of communication and support from internal information technology or e-discovery personnel, they likely will not need to maintain an in-depth understanding of specific databases and applications. Counsel should, however, understand several key database principles that can affect the discovery process. The following eight principles will help counsel better understand why "less" (i.e., targeted requests or productions) can be "more" (i.e., readily giving counsel all of the relevant information) in database discovery. (Also see The Sedona Conference Database Principles Addressing the Preservation & Production of Databases & Database Information in Civil Litigation (Conrad J. Jacoby et al. eds., Public Comment Version 2011), hereinafter Sedona Database Principles.)

1. Calculated on the Fly

The employee age example illustrates a common practice whereby the application performs calculations on the fly when displaying data to users. Indeed, most databases do not store values for things such as employee age, years of service, and leave duration. Instead, the database relies on the application to calculate the time period based on the stored date values and the current date. These calculations rely on formulas coded by programmers. The formulas range from very simple to extremely complex and generally reside outside the database, either in the application software code or in report templates. Therefore, a database copy might lack both the calculated values and the formulas required to produce them, thereby omitting a potentially relevant piece of information.

By contrast, a request tailored to the relevant data gives the receiving party more accurate and reliable information. Counsel, however, must realize that "[d]ue to differences in the way that information is stored or programmed into a database, not all information in a database may be equally accessible, and a party's request for such information must be analyzed for relevance and proportionality" (see database principle #2 in the Sedona Database Principles at 26-30).
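The employee age example can be sketched in a few lines of Python using the standard sqlite3 module. This is an illustration only: the employee table, its columns, the sample date of birth, and the age_on helper are hypothetical and are not drawn from any particular human resources system.

```python
import sqlite3
from datetime import date


def age_on(date_of_birth: date, as_of: date) -> int:
    """Return whole years elapsed between date_of_birth and as_of."""
    years = as_of.year - date_of_birth.year
    # Subtract a year if the birthday has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical table: the database stores the date of birth, never the age.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, date_of_birth TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, '1970-11-15')")

# List the stored columns; there is no age column to produce.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

dob_text = conn.execute(
    "SELECT date_of_birth FROM employee WHERE emp_id = 1"
).fetchone()[0]
dob = date.fromisoformat(dob_text)

# The "application" derives age from the stored value and a reference date.
age_before_birthday = age_on(dob, date(2012, 10, 26))
age_after_birthday = age_on(dob, date(2012, 11, 26))
```

The same stored date of birth yields a different age depending on the reference date, and the age_on logic lives entirely outside the database. A copy of the table alone would deliver neither the value nor the formula.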

2. Translation

Terminology changes over time. For example, business users may refer to "paid leave" later as "paid leave of absence." Fundamentally, though, these two terms denote the same concept. It would be a programming nightmare to change underlying database structures or application code for every such whim. Instead, programmers rely on "translate values" such as where the letter P equates to "paid leave" or, later, "paid leave of absence." These translate values allow data stored in underlying data tables and application code to remain constant (as "P") even as the business users change terminology over time.

The trouble is that unless counsel understand how and when the translate values come into play, they may end up with meaningless data. In the above example, if counsel asked for a copy of the database and looked at employee status, counsel would only see "P." Counsel would have to look elsewhere, likely in another table within the database, to find P's meaning. With a tailored request for employee status, counsel would instead get the meaningful description of "paid leave of absence" with no extra work involved.
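A minimal sketch of how translate values might work, again using Python's sqlite3 module. The table and column names (employee, status_translate, status_code, description) are assumptions made for illustration; real systems organize their lookup tables in many different ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the employee row stores only the code.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, status_code TEXT);
    -- A separate translate table gives the code its business meaning.
    CREATE TABLE status_translate (
        status_code TEXT PRIMARY KEY,
        description TEXT
    );
    INSERT INTO employee VALUES (1, 'P');
    INSERT INTO status_translate VALUES ('P', 'paid leave');
""")

# What a reader of the raw employee table sees.
raw_status = conn.execute(
    "SELECT status_code FROM employee WHERE emp_id = 1"
).fetchone()[0]

# The business renames the concept; only the translate table changes.
conn.execute(
    "UPDATE status_translate SET description = 'paid leave of absence' "
    "WHERE status_code = 'P'"
)

# What a tailored request returns: the code joined to its current meaning.
translated_status = conn.execute("""
    SELECT t.description
    FROM employee AS e
    JOIN status_translate AS t ON t.status_code = e.status_code
    WHERE e.emp_id = 1
""").fetchone()[0]
```

Here raw_status remains "P" while translated_status reads "paid leave of absence", even though the employee row itself was never touched.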

3. Combining Data

Enterprise databases, such as those used by human resources departments, often consist of thousands of tables. To provide context, imagine an Excel workbook with 5,000 worksheets. A search for several key terms would be no easy feat. Even so, Excel workbooks, unlike databases, generally exist as single, stand-alone entities that allow a user to quickly search information from various worksheets. To search a database, however, one must use queries to combine database tables into a useful representation. That process generally requires a trained analyst familiar with the underlying database model (i.e., field and table structures and relationships) because database systems often lack sufficient, published documentation. For example, a typical human resources database might contain one table that stores employee status, date of birth, date of hire, and a unique identifier that helps distinguish among employees. One combines these values with other tables to determine additional information about an employee, such as job history and salary. A clear understanding of the database structure, relationships between tables, and the appropriate query language is essential to retrieving information from a database.

One should not underestimate the complexity of searching through multiple tables within a database. Counsel cannot perform a simple Google-like search. Rather, counsel must create structured, often complex, queries to obtain information. This likely means navigating the dangers of improper queries, including Cartesian joins (the combination of every row of one table with every row of another table rather than combining only related rows), which lead to incorrect and meaningless data. Rather than take on such a daunting task, it is far easier, less time consuming, and more reliable to actually consider, determine, and target relevant data.
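The difference between a Cartesian join and a properly related join can be shown with two small tables. The sketch below assumes hypothetical employee and salary tables linked by an emp_id column; the names and salary figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salary (emp_id INTEGER, annual_salary INTEGER);
    INSERT INTO employee VALUES
        (1, 'Employee A'), (2, 'Employee B'), (3, 'Employee C');
    INSERT INTO salary VALUES (1, 50000), (2, 60000), (3, 70000);
""")

# Improper query: no join condition, so every employee row is paired
# with every salary row (a Cartesian join).
cartesian_rows = conn.execute(
    "SELECT e.name, s.annual_salary FROM employee AS e, salary AS s"
).fetchall()

# Proper query: rows are combined only where the identifiers match.
joined_rows = conn.execute("""
    SELECT e.name, s.annual_salary
    FROM employee AS e
    JOIN salary AS s ON s.emp_id = e.emp_id
    ORDER BY e.emp_id
""").fetchall()
```

With three rows in each table, the Cartesian join returns nine rows, including pairings such as Employee A with Employee C's salary, while the related join returns the three correct rows. On tables with thousands or millions of rows, the same mistake produces results that are both enormous and meaningless.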

4. Data Changes over Time

A database can maintain both current and historical information. For example, human resources databases typically track distinct events throughout the course of an individual's employment, including hire date, promotion and pay raise dates, military-leave dates, and similar events. Databases leverage these dates to track the chronological order in which each of these events occurred. Architects also design databases to account for multiple actions that may occur on the same day but must be recorded separately for reporting and other business purposes. For example, an annual performance pay rate increase may happen on the same date as a promotion pay rate increase (e.g., from Discovery Analyst I to Discovery Analyst II). Each of these increases may be effective on 1/1/2012, but could be entered separately and sequenced, resulting in two rows of data for the same effective date. It is far easier and more reliable to ask for the targeted information and let the system's trained analysts handle the nuances of combining related tables and historical changes.
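The same-day promotion example might be stored along the following lines. This is a sketch under stated assumptions: the job_history table, the sequence column, and the pay rates are hypothetical, and actual systems differ in how they date and order their history rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical history table: one row per action, ordered by
    -- effective date and then by a sequence number within the day.
    CREATE TABLE job_history (
        emp_id INTEGER,
        effective_date TEXT,
        sequence INTEGER,
        action TEXT,
        job_title TEXT,
        pay_rate REAL
    );
    INSERT INTO job_history VALUES
        (1, '2011-01-01', 0, 'hire', 'Discovery Analyst I', 25.0),
        (1, '2012-01-01', 0, 'performance increase', 'Discovery Analyst I', 26.0),
        (1, '2012-01-01', 1, 'promotion', 'Discovery Analyst II', 29.0);
""")

# Two separate rows share the 1/1/2012 effective date.
rows_on_date = conn.execute(
    "SELECT COUNT(*) FROM job_history "
    "WHERE emp_id = 1 AND effective_date = '2012-01-01'"
).fetchone()[0]

# To find the employee's position as of a given date, take the latest
# effective date on or before it, then the highest sequence on that date.
position_as_of = conn.execute("""
    SELECT job_title, pay_rate
    FROM job_history
    WHERE emp_id = 1 AND effective_date <= ?
    ORDER BY effective_date DESC, sequence DESC
    LIMIT 1
""", ("2012-06-30",)).fetchone()
```

A query that ignores the sequence column could return either of the two 1/1/2012 rows, and therefore the wrong title or pay rate. This is the kind of nuance a trained analyst is expected to handle.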

5. Protected Information

While privacy issues are beyond the scope of this article, the handling of protected information poses significant risks. Unless the protected information is relevant to the matter, counsel should avoid privacy-related issues by neither requesting nor producing data that is not relevant. For example, human resources databases often store employee Social Security numbers. Unless a party requires the numbers to address a matter, a party should refrain from requesting the data to avoid the additional effort and expenses required to protect this type of sensitive information.

6. Efficiency and Security

Because "databases employ techniques to optimize performance and protect confidentiality that can result in responsive data being missed, even by an apparently competent operator . . . [n]ever assume that a query searches all of the potentially responsive records, and never assume that the operator knows what they are doing" (Craig Ball, "Ubiquitous Databases," Law Technology News, Dec. 1, 2010). Built-in security constraints may prevent a query from retrieving all of the data, even if the search query seems to be perfectly constructed.

This "security trimming" limits search results based on the identity of the user submitting the query. Because users cannot view data they are not permitted to access, they generally will not know what relevant data a query failed to retrieve unless they are familiar with the business and the nature of its data. Therefore, counsel is better off asking for targeted, relevant information and letting those most familiar with the data handle querying, validating, and presenting it in a form that is useful.
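Security trimming can be approximated with a simple access table. The sketch below is a deliberate simplification: the employee and user_access tables, the user names, and the visible_employee_count helper are hypothetical, and production systems typically enforce such limits through database roles, views, or application-level permissions rather than a query like this one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, department TEXT);
    -- Hypothetical access table: which departments each user may see.
    CREATE TABLE user_access (user_name TEXT, department TEXT);
    INSERT INTO employee VALUES
        (1, 'Sales'), (2, 'Sales'), (3, 'Legal'), (4, 'Finance');
    INSERT INTO user_access VALUES
        ('analyst', 'Sales'),
        ('hr_admin', 'Sales'), ('hr_admin', 'Legal'), ('hr_admin', 'Finance');
""")


def visible_employee_count(user_name: str) -> int:
    """Count the employee rows the named user is permitted to see."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM employee AS e
        JOIN user_access AS a ON a.department = e.department
        WHERE a.user_name = ?
    """, (user_name,)).fetchone()[0]


# The same logical request returns different results for different users.
analyst_count = visible_employee_count("analyst")
admin_count = visible_employee_count("hr_admin")
```

The analyst sees two employees and receives no indication that two more exist, while the administrator sees all four. Nothing in the analyst's result reveals that it is incomplete.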

7. Static Reports

Counsel should not rely on the format in which database production will occur to ensure that a party will have the ability to access and search data (see database principle #6 in Sedona Database Principles at 36). Indeed, organizations store information in a database precisely because the structure allows trained analysts to use queries to sort and search vast quantities of data. Any production of data in a flat-file format (e.g., PDF) will diminish these capabilities. Requests for the entire database are often both over-inclusive (scores of tables and historical information) and under-inclusive (calculated values), as discussed herein. Moreover, contrary to popular belief, parties need not request a native database production. "[I]n many cases, a truly native format production of database information is less usable to a requesting party than an alternative production format" (see the "Mismatch of 'Native Format' to Most Database Productions" discussion in Sedona Database Principles at 18).

Consider requesting information in a format that allows for the viewing and analysis of data in multiple ways with a reasonable degree of effort (Karl Schieneman et al., "E-Discovery of Databases — Plaintiff's and Defendant's Perspectives." ESIBytes, Oct. 23, 2009). However, recognize that the "parties should use empirical information, such as that generated from test queries and pilot projects, to ascertain the burden to produce information stored in databases and to reach consensus on the scope of discovery" (see database principle #3 in the Sedona Database Principles at 26-30).

8. Communication and Cooperation

Many databases are purpose-built, with structures understood by only a small team of individuals. Organizations typically customize even the most common, commercially available databases. Given the specialized nature of these systems, counsel need not understand details regarding specific database structures and relationships. Instead, consider using subject matter experts who are more likely to successfully extract appropriate information for specific requests.

Outsiders will likely encounter difficulty in understanding the nature of content stored in these systems and, more important, which data may prove relevant to a matter (Schieneman, E-Discovery of Databases — Plaintiff's and Defendant's Perspectives). Accordingly, the success of enterprise database discovery begins with proper communication and cooperation between parties. In other words, "better communication naturally will reduce 'blunderbuss' requests for databases that typically encompass irrelevant or inappropriate information or the production of terabytes of useless, undifferentiated data" (see the Sedona Database Principles at 6).

CONCLUSION

By targeting only relevant information, counsel will get data that is meaningful, useful and, ironically, more complete. This less-is-more approach to discovery of databases should save time and money — and headaches.

Michael Spencer is the Records and Discovery Manager for DISH Network. Diana Fasching is a Senior Advisor with Redgrave LLP, an information law firm.
