Blair and Maron Must Die!

For years now the eDiscovery industry has told attorneys that they don’t know how to do eDiscovery without us. We have told attorneys that using keywords and phrases to find important or even relevant documents doesn’t work, and even if it did work, attorneys aren’t qualified to do that. Attorneys have been told this by eDiscovery vendors (see, e.g., here, here, and here—see slide 10). Attorneys have been told this by eDiscovery experts (see, e.g., here, here and here—at 181). Attorneys have been told this by judges, including those we sometimes call the “eDiscovery Rockstar judges.” (see, e.g., here and here—at 43:00 in the presentation). There is one common thread between all of these, that we see mentioned over and over: “Blair and Maron.” As in, “Blair and Maron says that attorneys don’t know how to find documents.”

I had thought that “Blair and Maron” was dead, because after being the catch-phrase of the eDiscovery industry for years it seemed to have died off. It figured prominently in the 2014 film “The Decade of Discovery,” but I hadn’t heard it much used since then. However, with the surge of interest in—and fear of—AI technology in the law, the phrase seems to have come back to life lately; I heard it used at an event focused on AI and law just a few weeks back. That’s too bad, because like the zombies in “The Walking Dead,” “Blair and Maron” is well past the time where it should have died a natural death.

Just In Case You Are One of the 2 People in eDiscovery Who Haven’t Heard of ‘Blair and Maron’

David C. Blair and M.E. Maron were two scientists who published a study, “An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System,” in the scientific journal Communications of the ACM (the Association for Computing Machinery) that . . . well, actually, the Sedona Conference Commentary on Search and Retrieval does a good and (mostly) concise job of explaining it:

A well-known study testing recall and precision in a legal setting was conducted by David Blair and M.E. Maron. . . Blair and Maron found that attorneys were only about 20 percent effective at identifying all of the different ways that document authors could refer to words, ideas, or issues in their case.

. . . Blair and Maron evaluated a case involving an accident on the San Francisco Bay Area Rapid Transit (BART) in which a computerized BART train failed to stop at the end of the line. There were about 40,000 documents, totaling about 350,000 pages, in the discovery database. The attorneys worked with experienced paralegal search specialists in an effort to find all of the documents that were relevant to the issues. The attorneys estimated that they had found more than 75 percent of the relevant documents, but more detailed analysis found that the number was actually only about 20 percent. Sedona Commentary at 25 (footnotes omitted)
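For anyone who doesn’t live in these numbers, “recall” is the share of all the relevant documents that a search actually finds, and “precision” is the share of what the search returns that turns out to be relevant. Here is a minimal sketch of the arithmetic in Python. The document counts are hypothetical, picked only so that recall lands on the “about 20 percent” figure quoted above; they are not the counts from the study, and the function names are mine.

```python
def recall(relevant_found: int, relevant_total: int) -> float:
    """Share of all relevant documents that the searches actually retrieved."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_found / relevant_total


def precision(relevant_found: int, retrieved_total: int) -> float:
    """Share of the retrieved documents that turned out to be relevant."""
    if retrieved_total <= 0:
        raise ValueError("retrieved_total must be positive")
    return relevant_found / retrieved_total


# Hypothetical counts, chosen only to reproduce the "about 20 percent" figure.
relevant_total = 1_000   # relevant documents in the collection (assumed)
retrieved_total = 250    # documents the searches returned (assumed)
relevant_found = 200     # returned documents that were relevant (assumed)

believed_recall = 0.75   # the attorneys' own estimate, per the Sedona summary

actual_recall = recall(relevant_found, relevant_total)
print(f"believed recall: {believed_recall:.0%}")
print(f"actual recall:   {actual_recall:.0%}")
print(f"precision:       {precision(relevant_found, retrieved_total):.0%}")
```

In other words, the searchers believed they had most of what mattered and in fact had about a fifth of it.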

Or, to quote one of those eDiscovery Rockstar judges, Blair and Maron says that “key words searches suck” or maybe just that you, the lawyer, suck at them.

Or does it?

We All Think We Know What Blair and Maron Says

One of the surprising things that I have come to learn in my long time (over a decade) in eDiscovery is that for all of the times we reference Blair and Maron, exceedingly few of us have actually read it. Instead, we take what we hear about the study for granted and use it because it is, well, useful: if any attorney balks at why they need our new tool or new service or new whatever, we can just cite to the study and our second-hand knowledge thereof. I have to admit that, for more years than I would like, I was complicit in this as well.

Some of this might be excused, if one were looking for an excuse, by the fact that the study was hard to find. The study was published in an obscure (at least to the legal industry) journal that required an expensive subscription. And then, one day, somebody sent me a copy of it, perhaps as a way of telling me that I didn’t really quite know what I was talking about. And then I read it. In fact I read it several times (admittedly it’s not an easy read at all) and I found myself rather surprised by many of the things I learned about the study.

The first of these is just how old the study is. Sure, we all know that we’ve been using it since the “beginning” of the eDiscovery era, so the study must be, maybe a dozen years old? Maybe a little bit more?

Guess again, because the study is much, much older than you probably think.

If You Can Remember 1985, You Are Old—Just Like Me, and the Blair and Maron Study Too

I’m not the first person to wonder about why we still talk about Blair and Maron, actually. The study is now 32 years old, having been published in 1985. When I ask my law students how many of them were even born by 1985, very, very few of them ever say they were (which makes me feel so very, very old).

Considering just how far computer systems have come in the last decade or so, it seems rather surprising that we would use a study written over three decades ago as anything definitive, much less as a way to tell lawyers that they don’t know how to use computers. I used to teach internet searching to lawyers and paralegals years ago, before Google was released 18 years ago (yes, I Googled that). I remember how hard it was to get good search results with many of the pre-Google internet search engines, but I wouldn’t use the fact that Lycos or WebCrawler results often sucked as a reason to tell people they don’t know how to use Google.

Using a system over thirty years old to attack the search skills of present-day users would be problematic enough. However, our problem with unfairly viewing the present through the lens of the past doesn’t end with a journey back of just 32 years. The system used in the study, called STAIRS (for “STorage And Information Retrieval System”), was created by IBM in 1969, back when information retrieval science was still at an early stage. Thus, the STAIRS system used in the Blair and Maron study was already 16 years old around the time of the study. Do some simple math (or just have Google do it for you) and you’ll realize that we are using nearly fifty-year-old technology to tell lawyers that they don’t know how to use today’s technology.
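For anyone who would rather not Google it, the math really is simple. A minimal sketch in Python, assuming this post dates from 2017 (which is only inferred from the study being “now 32 years old”); the names are mine, for illustration:

```python
STAIRS_CREATED = 1969    # IBM creates STAIRS
STUDY_PUBLISHED = 1985   # Blair and Maron publish their study
POST_WRITTEN = STUDY_PUBLISHED + 32  # assumption: inferred from "now 32 years old"


def years_between(earlier: int, later: int) -> int:
    """Whole calendar years from one year to another."""
    return later - earlier


print("STAIRS age when the study was published:",
      years_between(STAIRS_CREATED, STUDY_PUBLISHED))   # 16
print("Age of the study today:",
      years_between(STUDY_PUBLISHED, POST_WRITTEN))      # 32
print("Age of STAIRS today:",
      years_between(STAIRS_CREATED, POST_WRITTEN))       # 48
```

Sixteen plus thirty-two is forty-eight, which is where “nearly fifty” comes from.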

What was STAIRS like? Here’s a Computerworld “advertorial” from 1975:

Check out those piles of books, really big books, that they are using to “formulate a STAIRS search strategy.” This looks pretty painful. It’s certainly not any modern, “just Google it,” search process.

One of the pictures has a shot of the STAIRS user interface, but it is unfortunately too small to really see. The original STAIRS reference manual is impossible to find, and in my initial searches all I could discover was a number of posts by researchers in various online forums asking if anyone still had a copy. Fortunately, I caught a break after a while and found some screenshots and instructions. While I could not come up with the original STAIRS Reference Manual, I was able to find a user’s manual that came from, of all places, the International Atomic Energy Agency (IAEA). And yes, I found it by Googling it, over several iterations of searching.

Let’s start with something simple, a list of the available Boolean operators in STAIRS:

IAEA Manual at 12

Makes sense? It shouldn’t seem too alien to those of us used to dtSearch and Lucene syntax. I sure hope it does make sense, because once you’ve found some documents, you probably have too many to read through and need to use the “RANK” command to bring the relevant ones to the top of the list. Here’s how you go about that:

IAEA Manual at 15-16

You got that too? Actually, it doesn’t matter, since a 1996 article by David Blair revisiting the original study, “STAIRS Redux: Thoughts on the STAIRS Evaluation, Ten Years After,” made it clear that the relevance ranking was nothing short of worthless:

There was no statistically significant correlation between the rank ordering of the retrieved sets by any of the five algorithms and the ranking that the users’ relevance judgments placed on those retrieved sets. STAIRS Redux at 17

The manual goes on for 117 pages . . . and it keeps getting better with every single page you read! (For best effect, please re-read that last sentence with this clip from the movie “Beetlejuice” playing in the background. Thank you.) I could cite to any number of impossible-to-unpack paragraphs of 1960s-era text retrieval science (fiction?) jargon, but instead, let me just copy and paste the 5-step sequence necessary just to turn the system on:

IAEA Manual at 22

Some of those steps, by the way, come with several pages of details on how to go about them. So, yes, the next time you find yourself complaining about the complexity of searching in your eDiscovery system, at least it only took you a couple of clicks to actually turn it on.

And Just Look at What You Got to Work With Here!

Once you get past the multi-step process for turning the system on, STAIRS offered many of the expected Boolean commands and stemming. However, here is the screen you had to work with:

IAEA Manual at 38

Look, I think I’m qualified (go ahead and play this clip from “Beetlejuice” again while reading this part, please) to talk about this kind of stuff. I started doing legal research back in college, put myself through the last two years of law school as a part-time reference law librarian, was a practicing attorney for many years, taught classes for lawyers and paralegals on how to use the internet for legal research and now I teach eDiscovery as a law school adjunct professor. So here is my professional, expert opinion of this system:

"Eeek!"

In an attempt to make things somewhat easier, the STAIRS system had a feature called TLS (for Thesaurus Linguistic System) that let the user do the types of highly useful concept/fuzzy searches that we can now do in most eDiscovery systems. This feature wasn’t cheap, or easy to install. In STAIRS Redux, Blair notes that TLS required an engineer dedicated full-time to the study to spend 18 months, at a cost of $150,000, constructing a thesaurus database. Unfortunately, TLS was, in short, like other STAIRS “enhancements,” utterly useless:

. . . over the course of the STAIRS evaluation, not a single relevant document (retrieved or unretrieved) was found with the TLS that had not been found with the simple full-text searching techniques of STAIRS. The reason for this was abundantly clear. The thesaurus linked semantically related engineering terms, but these semantic relations were not really that useful for searching in the lawsuit. STAIRS Redux at 18

Sadly, I could have told them that a dictionary or taxonomy that you have to build from scratch is never going to be worth the effort. You need a pre-built taxonomy, or better yet a more flexible and comprehensive ontology, that is broad enough to cover the industry and context. The good news is that such things do exist now, and they are built into many of our current eDiscovery systems. And they work.

If you think Blair and Maron's dated technology is bad, check out part two of "Blair and Maron Must Die!", where we look at even more reasons the study's influence needs to go.

This post was authored by Michael Simon, an attorney and consultant with over 15 years of experience in the eDiscovery industry. Principal at Seventh Samurai and Adjunct Professor at Michigan State University College of Law, he regularly writes and presents on pressing eDiscovery issues. He can be reached at michael.simon@seventhsamurai.com.