Pit some of the brightest legal minds in the industry, armed with advanced degrees, years of experience, and sophisticated technology, against a simple percentage sign. Who would win?
When it comes to eDiscovery search, put your money on the percentage sign. It, like dozens of other often-overlooked words and characters, can easily derail discovery processes when lawyers fail to investigate the abilities of their eDiscovery platform and neglect to test their search queries in advance.
Hours of Work, Foiled by a %
Logikcull recently hosted a webinar on eDiscovery search best practices with Craig Ball, a respected trial lawyer, certified computer forensic examiner, law professor, and expert on electronic evidence and eDiscovery. His accomplishments are too many to list here, but include serving as a special master on eDiscovery in some of the most challenging cases in the U.S. and being able to discuss discovery in a way that is insightful and engaging—a rare skill, indeed.
In that webinar, Craig Ball shared this search he once encountered from a client: "20%" AND ("payment" OR "amount" OR "check" OR "pay")
To the parties, the search looked like a perfect, or at least effective, query. After all, they’d composed it based on their early knowledge of the case, choosing terms and a construction they believed would produce relevant documents. They had met and conferred with the other side, and they may even have engaged in significant back and forth before agreeing on the final query.
“I knew as soon as I saw that,” Ball recounts, “it was going to have some significant issues.” First, there was the question of the “20%.” That keyword alone would miss many of the other ways 20 percent can be expressed, such as “0.2,” “1/5,” or, written out, “twenty percent.”
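To make the first problem concrete, here is a minimal Python sketch of a pattern that catches several ways of writing 20 percent. It is an illustration only: the variant list is an assumption and far from exhaustive, the name `mentions_twenty_percent` is hypothetical, and most eDiscovery platforms use their own query syntax rather than regular expressions.

```python
import re

# Hypothetical pattern covering a few ways "20 percent" can be written.
# The variants are illustrative, not exhaustive.
TWENTY_PERCENT = re.compile(
    r"""
    (?<![\d.])20(?:\.0+)?\s*(?:%|percent\b|per\s+cent\b)  # 20%, 20 %, 20.0%, 20 percent
    | \btwenty\s+(?:percent|per\s+cent)\b                 # twenty percent
    | (?<![\d.])0?\.20?(?!\d)                             # 0.2, .2, 0.20
    | (?<![\d/])1\s*/\s*5(?![\d/])                        # 1/5, but not a date like 1/5/2019
    | \bone[-\s]fifth\b                                   # one fifth, one-fifth
    """,
    re.IGNORECASE | re.VERBOSE,
)


def mentions_twenty_percent(text: str) -> bool:
    """Return True if the text contains one of the variants above."""
    return TWENTY_PERCENT.search(text) is not None


samples = [
    "We agreed to a 20% payment.",
    "Twenty percent is due on signing.",
    "Multiply the amount by 0.2",
    "The markup was 120% last year.",
    "Paid $20 by check.",
]
for text in samples:
    print(mentions_twenty_percent(text), "-", text)
```

Even this short list shows how much a single literal keyword leaves out, and how easily a careless variant (a bare “20”, say) would sweep in unrelated documents.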
Second, there was the percentage sign. “I'd have to guess that the majority of search tools out there, no discredit to them, cannot search for the percentage sign and the number as a discrete search term that is only going to pull up instances of ‘20%,’” Ball explains. “That's going to be true whether you put it in quotes or you put it everywhere.”
The reason? In many search platforms, the percentage sign is treated as a search operator, such as for fuzzy searches. That means the % sign isn’t indexed. And if a word or character isn’t indexed, it simply cannot be found by the platform, no matter how you craft your search.
In many eDiscovery search platforms, dozens of common words, plus special characters, single characters, punctuation, and more, are not indexed, either because they have a separate function as an operator or because they are treated as “stop words” or “noise words,” words the platform thinks just don’t matter. Numbers, too, are frequently included in that list, meaning that a search for “20%” is unlikely to be fruitful—unless, of course, your platform is designed to index every character. As Craig Ball explains:
So every time you search 20 percent, if you can search numbers at all, and not all search tools will index numbers, in fact many will not, it's not going to pull up 20 percent. It's going to pull up every instance of the number 20.
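The effect Ball describes can be reproduced with a toy index. The Python sketch below is not how any particular platform works; the stop-word list, the `index_numbers` switch, and the function names are all assumptions made for illustration. It simply drops punctuation and stop words before indexing, which is enough to turn a search for “20%” into a search for “20”.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real platforms ship their own, often longer, lists.
STOP_WORDS = {"a", "an", "and", "of", "or", "the", "to"}


def tokenize(text, index_numbers=True):
    """Keep only runs of letters and digits, so '%' and other punctuation vanish."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [
        t for t in tokens
        if t not in STOP_WORDS and (index_numbers or not t.isdigit())
    ]


def build_index(docs, index_numbers=True):
    """Map each surviving token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text, index_numbers):
            index[token].add(doc_id)
    return index


def search(index, query, index_numbers=True):
    """Return ids of documents containing every token left in the query."""
    tokens = tokenize(query, index_numbers)
    if not tokens:
        return set()  # nothing searchable survived tokenization
    return set.intersection(*(index.get(t, set()) for t in tokens))


docs = {
    1: "We agreed to a 20% payment.",
    2: "Send 20 copies of the check.",
    3: "The amount is due Friday.",
}

with_numbers = build_index(docs)
print(sorted(search(with_numbers, "20%")))  # documents 1 and 2: every "20" matches

without_numbers = build_index(docs, index_numbers=False)
print(sorted(search(without_numbers, "20%", index_numbers=False)))  # nothing matches
```

In the first case the query is overinclusive, and in the second it finds nothing at all, which is why it pays to learn what your own platform indexes before agreeing to terms.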
The “20%” example is illustrative of a far too common problem in discovery. When it comes to creating effective search queries, many legal professionals fail on two fronts: First, they don’t know the limitations of their platforms and how those can impact the accuracy of their search. Second, they neglect to test their queries before finalizing them, missing opportunities to identify failings and refine searches.
Testing Your eDiscovery Search Queries
Let’s focus on this second front: search testing. (For a quick overview of how your platform can impact your eDiscovery search accuracy, and what sets Logikcull apart from others, click here. Hint: Dealing with “20%” in Logikcull shouldn’t be a problem.)
So, how do you go about testing your searches?
In Logikcull, the process is relatively easy. Logikcull’s Bulk Keyword Search feature lets you test searches in dozens, even hundreds, of variations, instantly. Simply enter your search queries and click “test” to see how many documents hit for those results.
This shouldn’t be the end of your process, however. As Ball notes, relying on hit counts alone to determine whether a search is over- or underinclusive may be the industry standard, but it’s far from ideal. Instead, you need to look at the documents themselves to understand how your search queries are working, or how they’re not.
I recommend getting a desktop tool that mirrors the capabilities of your (or your vendor’s) search platform and processing a sample of the data from one to three representative custodians, a volume you can get your arms around, and begin looking at the search hits in context.
When you get noisy hits, what is bringing them up? Is it the footer in all of somebody’s emails? Is it that you've accidentally run a key custodian’s name against the custodian’s own email?
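For readers who want to try this kind of review on a small sample outside any platform, a rough Python sketch follows. It assumes the sample documents are already available as plain text, uses simple case-insensitive substring matching (which will not mirror your platform’s indexing), and the name `hit_report` and the sample emails are hypothetical.

```python
import re


def hit_report(docs, terms, width=30):
    """For each term, collect (doc_id, snippet) pairs showing each hit in context."""
    report = {}
    for term in terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        hits = []
        for doc_id, text in docs.items():
            for match in pattern.finditer(text):
                start = max(match.start() - width, 0)
                end = min(match.end() + width, len(text))
                hits.append((doc_id, text[start:end]))
        report[term] = hits
    return report


# Hypothetical sample: one custodian's surname happens to be a search term.
sample = {
    "email-001": "Please confirm the 20% payment by Friday.",
    "email-002": "Lunch tomorrow? Regards, Pat Check, Accounts Payable",
    "email-003": "Pat Check asked about the check amount.",
}

for term, hits in hit_report(sample, ["check", "20%"]).items():
    docs_hit = {doc_id for doc_id, _ in hits}
    print(f"{term!r}: {len(hits)} hits in {len(docs_hit)} documents")
    for doc_id, snippet in hits:
        print(f"  {doc_id}: ...{snippet}...")
```

The hit count for “check” looks healthy until the snippets show that most of the hits are a custodian’s name and signature block, exactly the kind of noise Ball describes.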
In Logikcull you can do this directly from the Bulk Keyword Search itself or by running a whole new search, if you wish. After testing your terms in Bulk Keyword Search, simply hit “search” to run the queries and see the resulting documents. The search is run within your current filter set, so if you’re filtering by a tag, metadata field, custodian, etc., the Bulk Keyword Search will apply those limitations as well.
From there, you can go forward and begin refining your search queries, tweaking and testing them as you grow more familiar with your documents. You may also want to meet with the opposing party and discuss any problems with the searches—and whether their platform will be able to handle them as well as yours.
“A little work like that,” Craig Ball says, “with a person who is at least willing to talk about results and cares about not getting a bunch of junk, can go a long way in improving the quality of search and lowering the cost of review, significantly building credibility and trust in the process.” As he explains:
I would like nothing more when challenged about the quality and completeness of my production than to be able to say, “Judge, back when we were coming up with these searches, I brought my opponent in. We sat down for a couple of hours. We tested these searches. This is not something that we hoisted upon my opponent; this is something we came up with together.”
I think you're going to be much more protected against any kind of sanctions or do-overs if you followed a process that is at least somewhat transparent and shows a level of cooperation that demonstrates a genuine desire to make eDiscovery more effective and more efficient.
Ultimately, I think that's going to save the person who does that effort much more money than anyone else in the case.
For more insights into eDiscovery search, check out Logikcull’s recent product-agnostic webinar featuring Craig Ball. To see how Logikcull can improve your discovery process, sign up for a demo here.