If you have been following Logikcull for some time, you are probably aware of our particular crusade against “noise” in eDiscovery. But did you know that noise usually accounts for more than 97% of eDiscovery data?
That’s right. 97% of data collected for eDiscovery is never produced to opposing counsel.
While this stat can sound a bit disheartening, it actually creates a golden opportunity for legal teams looking to bring down eDiscovery costs while increasing efficiency.
Think about it: On average, 70% of discovery spend is tied to the review process. In fact, the ABA estimates that $42.1 billion is spent just on reviewing documents every year, a number that keeps growing as data volumes explode.
But this is no news to any seasoned in-house legal professional. Logikcull’s 2020 Corporate In-Housing Survey showed that, when it comes to evaluating eDiscovery expenses, review costs are the most important factor at play for in-house teams. They are also the main source of pushback on outside counsel spend, according to 41% of respondents.
So, unless you know with a high degree of accuracy where the smoking gun is, the best way to reduce review costs is to reduce the number of documents that need to be reviewed in the first place. Even the most basic culling techniques like deduplication can reduce your data sizes by more than 40%. Only when the noise is gone can you focus on what really matters—the signal.
And the best part? The most effective data reduction strategies can be handled in-house with low effort and almost no risk. Cull through your data at the outset of a matter and the review savings follow.
Here’s a list of the five best data reduction strategies to dramatically reduce eDiscovery spend:
We tend to think about data reduction strategies after all the data relevant to a case has been collected, but the truth is that you can significantly reduce your data sets if you narrow them down earlier in the EDRM, at the collection stage.
You can avoid overcollection by targeting specific content or date ranges, or even by using search terms if you have a very precise idea of what you’re looking for. For example, if your company uses Google Workspace, you can use Google Vault to export only those emails, GChat messages, or Drive files that were created after a specific date.
This process is much easier when you have a strong data preservation strategy and early case assessment process in place. Your IT team is a great ally in determining your capabilities and the best approach, since they typically handle preservation and collection.
Another way to be smart about your collections is to prioritize the data that needs to be collected. Consider staging your collection and starting with what is likely to be the most relevant data first—you can always get the less important data if needed in the future. To determine what your top priority data is, think about what’s most important for your case: Is it the communications from a specific custodian? Specific document types? Any documents within a specific date range maybe?
Finally, you want to make sure your data collection process is scalable and as efficient as possible. You can easily accomplish this by implementing data reuse as a culling strategy. Modern eDiscovery software usually includes tools that make data from previous projects automatically available for you so you can reuse it across matters.
With smart data collection, not only do you reduce the number of documents that need to be processed and reviewed from the outset of a matter, but you also bring discovery costs down by running this process in-house. Teams that have a solid information governance policy and leverage preservation capabilities provided by most data sources rarely need to rely on external collection.
So, you’re starting off your document processing with a reduced data set thanks to an improved collection strategy. Now, let the culling begin!
The next step is to comb through your data to get rid of any documents that don’t require eyes-on review. And the most obvious (and effective) place to start is duplicate documents.
Data culling through automatic deduplication can trim your data by more than 40%. Just think for a second about the savings that this simple step can yield for your discovery process:
If you’re dealing with a case involving 37 GB and 150,000 documents, assuming a review rate of 50 documents/hour and a review cost of $40/hour, your document review would take 3,000 hours and cost $120,000. A 40% reduction in your data size would bring the cost down to merely $72,000. This means that you would be able to save $48,000 just by clicking a button—literally.
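Those numbers fall straight out of simple arithmetic. A quick sketch, using the figures from the example above:

```python
def review_cost(num_docs: int, docs_per_hour: int = 50, rate: int = 40) -> float:
    """Linear review cost: hours of eyes-on review times the hourly rate."""
    return num_docs / docs_per_hour * rate

baseline = review_cost(150_000)             # 3,000 hours -> $120,000
after_culling = review_cost(150_000 * 0.6)  # 40% fewer docs -> $72,000
savings = baseline - after_culling          # $48,000
```

Because review cost scales linearly with document count, every percentage point you cull translates directly into the same percentage of review savings.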
And yet, our 2020 In-Housing Survey showed that corporate legal teams are not using this powerful technique to its full potential. With only 63% of surveyed companies deduplicating data in-house, culling through deduplication—and the astonishing savings that come with it—is largely underutilized.
eDiscovery tools, like Logikcull, that include automatic processing features allow you to perform this simple step in-house and with no effort. Right after ingesting your data into the platform, any duplicate documents will automatically be removed from the review set so you don’t waste any unnecessary time or resources going over them. (Note: You’ll always be able to access duplicate documents if needed. Logikcull doesn't destroy the dupes, just hides them from your review set.)
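Under the hood, exact deduplication typically boils down to hashing each document and keeping one copy per unique hash. A minimal sketch (not Logikcull's actual implementation) that hides duplicates rather than destroying them:

```python
import hashlib

def dedupe(documents: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Split docs into a review set (one doc per unique content hash)
    and a map of suppressed duplicates -> the doc they duplicate.
    Duplicates are set aside, not deleted, so they stay accessible."""
    seen: dict[str, str] = {}           # content hash -> first doc seen
    review_set: dict[str, bytes] = {}
    duplicates: dict[str, str] = {}
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates[name] = seen[digest]
        else:
            seen[digest] = name
            review_set[name] = content
    return review_set, duplicates
```

Production tools layer more on top (hashing extracted text plus metadata, near-duplicate detection), but the core idea is this simple, which is why the step can run automatically at ingestion with no human effort.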
Similar to deduplication, deNISTing is low-hanging fruit when it comes to culling data.
For starters, deNISTing is the process of eliminating any system applications and files (.dll files, for instance) before a document review.
How can you identify those files? Easy.
NIST stands for National Institute of Standards and Technology. Through its National Software Reference Library project, NIST maintains a master list of hash values for these known, traceable applications and files, so to deNIST means removing everything that matches this list. The list is constantly updated and vetted, so you can rest assured that any file matching it can be safely ignored.
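Mechanically, deNISTing is just a hash lookup against that reference list. A toy sketch with a hypothetical one-entry hash set standing in for the real NSRL data (which contains millions of entries):

```python
import hashlib

# Hypothetical stand-in for the NSRL reference set, which is
# distributed by NIST as known-file hash values.
NSRL_HASHES = {
    "d41d8cd98f00b204e9800998ecf8427e",  # MD5 of an empty file
}

def denist(files: dict[str, bytes]) -> dict[str, bytes]:
    """Drop any file whose hash matches a known system file."""
    return {
        name: content
        for name, content in files.items()
        if hashlib.md5(content).hexdigest() not in NSRL_HASHES
    }
```

Because matching is done on content hashes rather than file names or extensions, a system file can't dodge the cull just by being renamed.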
While deNISTing is pretty straightforward and can even be done during the collection process, many vendors charge for this service as part of their data processing. But if you use a platform like Logikcull, both deduplication and deNISTing are automatically taken care of (together with more than 3,000 automatic processing steps) the moment you upload your data.
Another powerful technique to shrink your data volume is to leverage email threads. An email thread is just what it sounds like: a chain that contains all the emails between a sender and a recipient (or recipients).
In order to accurately analyze email threads, you need an eDiscovery tool that includes email threading capabilities. In Logikcull, emails that are part of a thread are labeled as “Has Thread”, and a different tag ("Last Email") is applied to the last email of a thread, which is usually the one that matters most. These automatic tags allow you to see the complete back and forth in one single document, instead of having to review each of the emails separately. This allows you to quickly cull through any threads that don’t require review.
Apart from a great data reduction strategy, email threading is particularly useful during the review stage since it sheds light on the context and tone of communications, the relationships between the people involved, the chronology of events, etc. It also allows you to understand any topic shifts or relevant reactions at a glance.
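To make the idea concrete, here's a rough sketch of threading by normalized subject line. This is an illustration only, not Logikcull's implementation; real email threading also relies on message headers like In-Reply-To and References:

```python
import re

def thread_key(subject: str) -> str:
    """Normalize a subject by stripping stacked Re:/Fwd: prefixes,
    a simple heuristic for grouping emails into one thread."""
    return re.sub(r"^(?:(?:re|fwd?)\s*:\s*)+", "", subject.strip(),
                  flags=re.I).lower()

def find_last_emails(emails: list[dict]) -> list[dict]:
    """Return the latest email per thread: the inclusive message
    that usually contains the whole back and forth."""
    threads: dict[str, dict] = {}
    for email in sorted(emails, key=lambda e: e["date"]):
        threads[thread_key(email["subject"])] = email  # later email wins
    return list(threads.values())
```

Reviewing only the last, inclusive email per thread means one document read instead of five or ten, with the earlier messages still available if context is needed.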
Now that you’ve eliminated about 50% of the junk, it’s time to get a bit more granular. But fear not—with the right technology, you can still apply many straightforward filters and keyword searches in-house before getting outside counsel involved.
Here are some of the most common (and often underutilized) filters to further cull your data:
Date range: Unless you have already applied a very specific date range during collection, filtering by date can help limit the scope of your review by eliminating any time frame that’s irrelevant to the case.
File size: You may wonder how on Earth you can know if a document is responsive merely by looking at its size in bytes. The short answer is “you can’t”... unless the number of bytes is 0. Filtering by size is a great way to spot files with no content in them. Bye, bye, impostor docs!
File type: Depending on the nature of your case, this filter can be a real game-changer. For example, if your collection is full of videos and you are confident that’s not where the truth lies, you can save the popcorn for a better time. The same goes for source code, calendar invites, database files—any file type that is not going to be relevant to your review.
Language: Achtung! Unless the people involved in your case are multilingual, you can probably remove any documents that are not in the matter’s primary language.
Unengaged emails: We talked about email threads before, where actual people exchange messages, but the truth is that a big portion of our inboxes is made up of emails we never engage with, mostly spam and newsletters. Weeding out those never-read emails is a great way to reduce data. Filter by sending domain, unsubscribe links, or keywords in the subject line to quickly identify them.
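Taken together, these filters are just a handful of predicates applied to document metadata. A toy illustration with made-up records (field names are hypothetical; real platforms expose equivalents as clickable filters):

```python
from datetime import date

# Made-up document metadata records for illustration.
docs = [
    {"name": "memo.docx",  "date": date(2021, 3, 1), "size": 2048,    "type": "docx", "lang": "en"},
    {"name": "empty.tmp",  "date": date(2021, 3, 2), "size": 0,       "type": "tmp",  "lang": "en"},
    {"name": "video.mp4",  "date": date(2021, 3, 3), "size": 9000000, "type": "mp4",  "lang": "en"},
    {"name": "old.docx",   "date": date(2015, 1, 1), "size": 1024,    "type": "docx", "lang": "en"},
    {"name": "notiz.docx", "date": date(2021, 3, 4), "size": 512,     "type": "docx", "lang": "de"},
]

def cull(docs, start, end, excluded_types, lang):
    """Keep only docs in the date range, with actual content, of
    reviewable types, and in the matter's primary language."""
    return [
        d for d in docs
        if start <= d["date"] <= end
        and d["size"] > 0
        and d["type"] not in excluded_types
        and d["lang"] == lang
    ]
```

Each predicate alone trims a little; stacked together, they routinely leave only a small fraction of the original set for eyes-on review.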
Finally, let’s cull a bit more, the “Google way.” That’s right: through keyword searches.
Using search terms as a culling strategy is generally more subjective than using most of the filters we just covered, but depending on the case, it can be a really effective technique.
The most important considerations when searching for specific words across your data set are understanding how your software’s search function works and making sure your searches are as accurate as possible. (Logikcull indexes all stop words and special characters, but many platforms do not, so be sure you know the limitations of your tool.)
Just imagine that you’re looking for data related to an intern. You’ll want to include all the keywords in the same family (intern*) and any synonyms (trainee, associate, apprentice). For higher efficiency, combine them all in the same search: (intern* OR trainee OR associate OR apprentice).
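That wildcard-plus-OR pattern can be approximated with a regular expression. A sketch of how such a query might be expanded under the hood (actual search syntax and behavior vary by platform):

```python
import re

def build_query(terms: list[str]) -> re.Pattern:
    """Expand simple search terms (trailing * wildcard, implicit OR)
    into one case-insensitive regex."""
    parts = []
    for term in terms:
        if term.endswith("*"):
            parts.append(re.escape(term[:-1]) + r"\w*")  # prefix match
        else:
            parts.append(re.escape(term) + r"\b")        # whole word
    return re.compile(r"\b(?:%s)" % "|".join(parts), re.I)

query = build_query(["intern*", "trainee", "associate", "apprentice"])
```

Note that intern* also matches words like “internal” and “international,” a useful reminder of why testing the scope of a wildcard search before culling on it matters.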
With a tool like Logikcull, you can also test out multiple keywords (and keyword combinations) at once using bulk searches to determine whether the scope of your search is too broad or too narrow, and iterate in real-time. Dump in all the keywords that you know will surface privileged or irrelevant docs, and cull them all at once.
eDiscovery today presents a signal-to-noise problem, and the noise keeps getting louder. The truth still hides in the same tiny fraction of documents, but discoverable data is increasingly contaminated by useless digital noise.
If you want to keep your eDiscovery costs under control and your mental health in check, you need to learn how to cull the noise.
By implementing these simple data reduction strategies, not only will you dramatically cut your eDiscovery spend, but you’ll also build a healthier relationship with outside counsel by having fewer disagreements about review costs, the most common source of pushback on their bills.
However, truly effective data reduction strategies are only possible when executed through powerfully simple tools that automate most of the steps for you. With a self-service eDiscovery platform like Logikcull, you have all of these strategies at your fingertips so you can silence the noise with ease—and turn up the volume on massive time and cost savings.
Take Logikcull for a test drive and start applying these data reduction strategies to save money, time, and headache on your document review. Cull the noise and focus on the signal.