If you think that digital data is exploding today, wait for what’s coming...
IDC forecasts that data will keep growing at a compound annual rate of 23%, reaching 175 zettabytes of global data by 2025 (that’s 175 followed by 21 zeros, in bytes!).
This is a known challenge for eDiscovery as organizations of all sizes are struggling to deal with ever-growing volumes of evidence collected for disputes and investigations and the astronomical costs associated with reviewing it.
But even worse than a seemingly unmanageable amount of information is the fact that the majority of data today is unstructured and unindexed. The IDC estimates that 90% of data is unstructured or dark data — a number that could grow to 95-97% in the coming years.
In other words, the majority of the data companies generate today isn’t readily available to them, and a significant part of the data that lurks in the shadows could be key evidence for all sorts of legal matters.
So the question is: How can you identify and make sense of that dark data during your eDiscovery process?
Let’s find out!
What is Dark Data?
Before we start discussing how to handle dark data, you might be wondering, “What the heck is dark data in the first place?”
Simply put, dark data is information that organizations collect and store, but rarely use, process, or analyze. As a result, dark data is usually unstructured and unrefined, and most of the time companies don’t even know they have it.
In the context of eDiscovery, dark data also refers to data that is actually stored and theoretically structured in the systems where it was generated, but those systems are unable to index it when you try to locate and visualize those files. For example, users report that Microsoft 365 and Adobe can miss up to 70% of data when searching for it.
So when it comes to making dark data accessible, the challenge is twofold. On the one hand, your organization’s IT and data teams need to realize the importance of dark data and constantly work to leverage the company’s data and make it readily available to the teams that can benefit from it.
On the other hand, as an eDiscovery professional, you need to have an understanding of how different sources of discoverable information store and handle their data to be able to access and index it. This will prevent you from leaving important evidence behind.
Types of Dark Data
When you think about the amount of data that you generate on a daily basis via email, online documents, social media, Slack, Teams, etc., plus the data that your company collects via its website cookies, forms, applications, research projects, and more, it’s easy to see how unstructured dark data can accumulate and be swept under the rug.
In a nutshell, any data created in any application, data source, on-prem system, or cloud service has the potential to be dark data. The most common types of dark data vary by industry, but some usual suspects in the world of unstructured data are:
- Log files
- Old employee data
- Financial statements
- Geolocation data
- Raw survey data
- Surveillance video footage
- Customer call records
- Email correspondences
- Notes or old documents
- Research data
However, just because those files are sometimes out of reach in their source systems doesn’t mean you can’t index, search, and view them with the right eDiscovery technology once you’ve collected them. More on this in a second.
Main Challenges of Dark Data for Discovery
Having dark data in your organization presents several problems for data privacy, security, and compliance.
In terms of privacy, when you have data that is unlinked from a person’s identity, you can’t guarantee you are meeting data privacy regulations.
The same is true for security. Any data you don’t know you have is likely unprotected and easily accessible to hackers. Law firms are especially vulnerable to these attacks, with 25% of them experiencing a data breach in the past few years.
But more crucial for eDiscovery are compliance challenges. When you can’t index your data, it’s not searchable. And if it’s not searchable, you can’t find it. Most eDiscovery tools aren’t built to find dark data — which means you’re missing potentially valuable evidence and compromising the defensibility of your discovery process. And that means an increased risk of sanctions.
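To see why a file that can’t be indexed effectively disappears from search, here is a minimal sketch of a toy keyword index in Python. It is only an illustration under simple assumptions, not how any particular eDiscovery tool works: the document names and contents are hypothetical, and a file with no extractable text is represented as `None`.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document IDs that contain it.

    Documents with no extractable text (None) are skipped. They are still
    stored, but no search term can ever reach them.
    """
    index = defaultdict(set)
    non_indexed = []
    for doc_id, text in documents.items():
        if text is None:
            non_indexed.append(doc_id)
            continue
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index, non_indexed


def search(index, term):
    """Return the sorted IDs of documents containing the term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical collection: the third file is an image-only scan.
documents = {
    "email-001": "Quarterly pipeline inspection report attached.",
    "memo-002": "Inspection schedule moved to Friday.",
    "scan-003": None,  # assumption: text extraction failed for this file
}

index, non_indexed = build_index(documents)
hits = search(index, "inspection")
print("Search hits:", hits)
print("Never searched:", non_indexed)
```

Even if the scanned file mentions the inspection, it will never show up in the results, because the search only looks at what made it into the index.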
An Example of Dark Data: Microsoft 365 Non-Indexed Files
For a more comprehensive overview of the implications of dark data in MS 365 eDiscovery and how you can avoid them, check out Logikcull’s recent guide, “eDiscovery in Microsoft 365: A Complete Guide.”
As mentioned earlier, Microsoft 365 is one of the applications that can generate high volumes of non-indexed dark data for eDiscovery.
The issue comes from the searches you can run using Microsoft’s Purview eDiscovery tools. When these searches hit large attachments or documents in formats that Microsoft’s systems can’t process, the files are labeled as “non-indexed” or “partially indexed,” which prevents users from seeing the content of those files.
This challenge is perfectly illustrated by a situation a Fortune 500 energy company recently faced.
In a run-of-the-mill investigation, a senior litigation paralegal at this organization found that Microsoft Purview failed to index 10,117 documents in their search.
In some cases, they realized that the files couldn’t be indexed because they were incompatible with Microsoft’s systems. In others, they were email attachments that were too large. When this company exported all of those documents and ingested them into Logikcull, the platform indexed all of them and made them searchable.
After reviewing the files, they realized that 148 of them were responsive. In fact, three of them were crucial pieces of evidence for the investigation at hand. But these documents would have stayed in the dark if the company hadn’t been proactive about finding alternative ways to index them...
Our CEO and Co-Founder, Andy Wilson, shared this customer story during a recent launch event.
Handling Dark Data in Discovery with Logikcull
When it comes to deciding how to deal with structured and unstructured data in your organization, there’s not a ton that eDiscovery software can do for you. That would involve a high-level discussion across many teams on what data to collect, analyze and maintain on a regular basis while keeping business goals and priorities in mind.
It’s also helpful to set retention and deletion policies that account for both structured and unstructured data. Just because a portion of your company’s data is dark doesn’t mean you can afford to store it forever.
However, there’s an ocean of data that can be relevant for disputes and investigations that might be impossible to properly access in the sources where it was created — and that’s where discovery technology like Logikcull comes to the rescue.
With the ability to index and read virtually any kind of file — from audiovisual clips to text in embedded images on any document — Logikcull allows you to search and review 100% of the evidence you collect.
When you ingest your data into the platform, it’ll automatically undergo more than 3,000 processing steps that will not only make every single file searchable, but will also turn your metadata into filters that will help you quickly home in on your most relevant documents.
This eliminates important defensibility risks as it will shed light on your dark discovery data, making it highly unlikely that you’ll miss any important piece of evidence during your document review process.
And while Logikcull will give you access to a whole new ocean of data, that doesn’t mean your discovery process will become more complex or costly. Quite the opposite.
Thanks to automatic deduplication, data sorting, and filtering, you’ll significantly reduce the amount of data that requires eyes-on review for every matter, which will bring down costs while allowing for an ultra-streamlined and secure eDiscovery process.
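To give a feel for how deduplication can shrink a review set, here is a minimal Python sketch that drops exact copies by comparing content hashes. This is a common general technique, shown under simple assumptions. It is not a description of Logikcull’s own processing, and the file names and contents are hypothetical.

```python
import hashlib


def deduplicate(files):
    """Keep the first file seen for each distinct content hash.

    Returns the names of the unique files, plus (duplicate, original)
    pairs for every exact copy that was dropped.
    """
    seen = {}  # content hash -> name of the first file with that content
    duplicates = []
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return list(seen.values()), duplicates


# Hypothetical collection: two of the three files are byte-for-byte identical.
files = [
    ("contract_v1.pdf", b"signed contract"),
    ("contract_copy.pdf", b"signed contract"),
    ("notes.txt", b"meeting notes"),
]

unique, duplicates = deduplicate(files)
print("To review:", unique)
print("Dropped as copies:", duplicates)
```

Hashing only catches exact copies. Near-duplicates, such as the same email forwarded with a new signature, would need a different approach.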
If you’d like to see how Logikcull can help you surface dark data, book a quick demo with us today.