
Data Validation in eDiscovery: Who Owns the Risk When Something Goes Wrong?

September 4, 2019  |  6 min read


Business deals are made and scandals fueled over text, as opposed to a physical paper trail. Come litigation time, those electronic records need to be verifiably loaded into a system and reviewed for a number of purposes. Emails, messenger logs, and attachments linked to those communications become subject to review. The challenge for law firms is one of both quantitative and qualitative data validation.

Typically, eDiscovery represents between 20 and 50 percent of total litigation costs, and the time it takes grows with the volume of data to upload and review. That figure includes not just the cost of technology and services but also the expense of actually reviewing those files. The cost of paying staff to review files for relevancy or privilege can be staggering: some sources cite the billable cost of document review attorneys at anywhere from $75 to $150 per hour. As data sizes go up, so does the cost of the review itself.


Project X: How Failure to Validate Data Can Lead to Months of Extra Work

When that discovery process goes wrong, correcting the errors can be even more expensive. Anecdotally, firms have gone years into document review only to discover underlying gaps in the data transformation process. In one particular case, a white-shoe firm had uploaded terabytes of emails into a platform for first- and second-level privilege and relevancy reviews, only to discover that embedded images and data within the client bank’s emails hadn’t been properly converted and were unreadable.

At this stage, the bank had already been issued a Cease and Desist order by a regulator regarding potentially willful violations of Office of Foreign Assets Control (OFAC) regulations. About two years into the process, the law firm discovered the data discrepancy.

During review, the fields had been deemed non-relevant. As a result, they were not considered as part of the firm’s defense strategy for the bank, nor were they provided, where necessary, to the regulatory examiners.

The firm was left befuddled. Quantitatively (for ease of description), the law firm had received, via its vendor, 10 terabytes of data from the client, which included the rogue emails. However, there had apparently been no qualitative review of those files to ensure that, as they were being produced, they were substantively intact.

Neither the vendor nor the firm was aware of the issue at its inception. When the files were uploaded into the vendor platform, the screengrabs showed up as “broken” GIF, PNG, or JPEG files: a red “X” centered in a rectangle. At the time, the X’s were thought to be improperly loaded signature lines or company logos and thus not responsive to the regulators’ requests. After the discrepancy was discovered, additional reviewers had to be contracted to re-review those documents. The revived project was code-named “Project X” and resulted in months’ worth of additional work for the firm and billable cost to the client.
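To make the distinction concrete, here is a minimal Python sketch of a load check that pairs a quantitative test (file and byte counts) with a simple qualitative one (do the image files actually begin with the signature bytes of their format?). It is an illustration only, not a description of any vendor's platform: the function name `validate_load`, the `ValidationReport` shape, and the sample batch are all hypothetical.

```python
from dataclasses import dataclass

# Leading "magic" bytes for the three image formats named in the article.
IMAGE_SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
}


@dataclass
class ValidationReport:
    count_matches: bool      # quantitative: did every file arrive?
    bytes_match: bool        # quantitative: did every byte arrive?
    unreadable_images: list  # qualitative: image files without a valid header


def validate_load(files, expected_count, expected_bytes):
    """Check a loaded batch; `files` maps file name -> raw bytes (hypothetical shape)."""
    total_bytes = sum(len(data) for data in files.values())
    unreadable = []
    for name, data in sorted(files.items()):
        lower = name.lower()
        for ext, signatures in IMAGE_SIGNATURES.items():
            # Flag files that claim to be images but lack the format's header.
            if lower.endswith(ext) and not data.startswith(signatures):
                unreadable.append(name)
    return ValidationReport(
        count_matches=len(files) == expected_count,
        bytes_match=total_bytes == expected_bytes,
        unreadable_images=unreadable,
    )


# Made-up batch: the sizes are right, but one screengrab lost its content.
batch = {
    "memo.txt": b"wire approved",
    "logo.png": b"\x89PNG\r\n\x1a\n" + b"\x00" * 8,
    "screengrab.png": b"\x00" * 16,
}
report = validate_load(batch, expected_count=3, expected_bytes=45)
print(report)
```

In this made-up batch both quantitative checks pass while one image is still flagged, which is the shape of the Project X problem. A header check is only a first pass; a fuller qualitative review would try to decode or render each image.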

In the days before operating systems came with a “screen capture” tool or function, users could capture their screens using a shortcut with the Microsoft Paint application. The client financial institution in this case had used this shortcut to email transaction, customer, or other information. The screengrabs were sent to staff in either legal or compliance so they could decide how to proceed operationally.

Years had now passed since those original emails were sent, and the bank’s regulatory position was far more compromised; the emails containing the embedded images had already gone through at least an initial level of review in the firm’s eDiscovery platform. The costs had already been billed to the client. While this wouldn’t meet the definition of spoliation, courts have held that the law firm effectively owns the risk of negligence for failing to properly own or direct the eDiscovery process. Still, this particular type of data issue could easily have been the fault of the vendor, which may not have engaged in effective data quality testing during the eDiscovery process. Based on case law such as Industrial Quick Search, Inc. v. Miller, Rosado, & Algios, LLP, it is likely that unless the error could be attributed to the law firm’s negligence, this issue would be the liability of the vendor.


The Legal Implications of Data Validation Errors

Financial institutions find themselves in a comparable position when it comes to data and integrating systems. Most banks have largely antiquated systems at the core of their operations. In one case, a bank relied on a DOS-based operating system with an HTML-based interface. The challenge is that varying data types, which were never designed or intended to work together, now have to integrate seamlessly.

These data challenges can have significant legal implications. One regulatory consideration is in the anti-financial crime space, where under the provisions of the Bank Secrecy Act, USA PATRIOT Act, and the aforementioned OFAC requirements, banks are required to monitor transactions for potentially suspicious activity. Commonly this requires the extraction, transformation, and loading (ETL) of data from at least one core banking system into a far more sophisticated monitoring (volume, typology, etc.) or screening (names, countries, terminology, etc.) system, for potential escalation and reporting. While monitoring and screening systems are routinely called out, it was only a few years back that the term “data feed” was not only featured prominently in an enforcement action but identified as a root cause. Specifically, the regulator chastised the company for failing to properly ensure that all of the key data had made its way from the operating systems into the overlaying transaction monitoring systems. As a result, potentially high-risk transactions were not reviewed or escalated.
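A data-feed check of this kind can be sketched in a few lines of Python. The example below is a hedged illustration, not a regulator's or vendor's method: it compares a source extract with what the monitoring system received, first by row count and then by a per-record fingerprint of the fields monitoring depends on. The function names, field names (`txn_id`, `amount`, `country`), and sample rows are all hypothetical.

```python
import hashlib


def record_fingerprint(record, key_fields):
    """Hash the fields that monitoring depends on, in a fixed order."""
    joined = "|".join(str(record.get(field, "")) for field in key_fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile_feed(source_rows, loaded_rows, id_field, key_fields):
    """Compare a core-system extract with the rows the monitoring system holds."""
    source = {r[id_field]: record_fingerprint(r, key_fields) for r in source_rows}
    loaded = {r[id_field]: record_fingerprint(r, key_fields) for r in loaded_rows}
    return {
        # Quantitative: same number of rows on both sides.
        "count_matches": len(source_rows) == len(loaded_rows),
        # Qualitative: rows that never arrived, or arrived with changed content.
        "missing": sorted(set(source) - set(loaded)),
        "altered": sorted(i for i in source.keys() & loaded.keys()
                          if source[i] != loaded[i]),
    }


# Made-up data: every row arrives, but one loses its country code in transit.
core = [
    {"txn_id": "T1", "amount": "9500.00", "country": "MT"},
    {"txn_id": "T2", "amount": "120.00", "country": "US"},
    {"txn_id": "T3", "amount": "40000.00", "country": "IR"},
]
monitoring = [
    {"txn_id": "T1", "amount": "9500.00", "country": "MT"},
    {"txn_id": "T2", "amount": "120.00", "country": "US"},
    {"txn_id": "T3", "amount": "40000.00", "country": ""},
]
result = reconcile_feed(core, monitoring, "txn_id", ["amount", "country"])
print(result)
```

Here the row counts agree, so a count-only check would pass, yet the fingerprint comparison shows that one transaction reached the monitoring system without the field a screening rule would need.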

Many banks fail to complete a robust quantitative and qualitative review of their core system’s data. The more complex the banking products and the more variables that go into a transaction, the bigger the data validation concerns. In correspondent banking, the hosting bank must be able to read the data from the respondent institution and translate it into its own monitoring systems. Unfortunately, most systems are not terribly sophisticated by modern standards, and data can easily slip through the cracks.

For example, if a bank established a correspondent relationship with a bank in Malta, screening systems would either need to screen natural Maltese language, or have processes to translate/transliterate that language for screening purposes. Not that a potential sanctions evader or criminal would be so daft as to include “money laundering” on a wire transfer, but if they did the natural Maltese language translation for the phrase is “ħasil ta 'flus”. Unless the user’s screening system can “comprehend” the relevancy of that term, most screening systems would just dismiss it as not responsive.
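As a rough illustration of the point, the Python sketch below screens free-text payment details against a small multilingual term list after normalizing case, Unicode form, quote style, and whitespace. It is a simplified assumption of how such matching might work, not how any particular screening product behaves. The `WATCH_TERMS` list and function names are hypothetical, and the Maltese phrase is copied verbatim from the example above rather than independently verified; a production list would need review by a fluent speaker.

```python
import unicodedata

# Hypothetical watchlist: one concept, with phrases in more than one language.
# The Maltese phrase is taken as written in the article's example.
WATCH_TERMS = {
    "money laundering": ["money laundering", "ħasil ta 'flus"],
}


def normalize(text):
    """Case-fold, unify Unicode forms and quote styles, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).casefold()
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    return " ".join(text.split())


def screen(message):
    """Return the watchlist concepts whose phrases appear in the message."""
    cleaned = normalize(message)
    return sorted(
        concept
        for concept, phrases in WATCH_TERMS.items()
        if any(normalize(phrase) in cleaned for phrase in phrases)
    )


hits = screen("Payment ref: ĦASIL  TA 'FLUS invoice 12")
clean = screen("Payment for olive oil, invoice 12")
print(hits, clean)
```

With an English-only term list, the first message would pass unflagged. Exact phrase matching is also brittle against spelling variants and inflected forms, so this shows only the minimum: the term has to be on the list, in the language in which it arrives.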


Who Is Responsible for Data Validation?

In the world of financial institutions, the consistent approach has been that the hosting financial institution owns the data validation risk. Regardless of the complexity of a client or respondent institution’s geographic concerns, the correspondent needs to optimize its systems to be able to manage those risks.

In the world of eDiscovery, last year’s City of Rockford v. Mallinckrodt ARD Inc., 3:17-cv-50107 (N.D. Ill. August 7, 2018) decision reiterated that there are in fact parameters in place for quality control and data validation of eDiscovery production.

There is a volley of cases discussing ownership and financial liability for those risks, but ultimately an ounce of data quality prevention, in the form of relying on the right systems and platforms at the outset, is far more palatable than the cost of a validation cure.

