An AI-driven discovery process went off course recently, spinning out of control in class action litigation against some of the nation’s largest airlines. The result: millions of unresponsive documents produced, with no easy way to separate the documents that matter from those that don’t.
This TAR nosedive comes from In Re: Domestic Airline Travel Antitrust Litigation, in which plaintiffs allege that United, Delta, Southwest, and American Airlines violated the Sherman Act by colluding to reduce seat capacity in order to fix ticket prices. (Southwest and American have both since settled).
As pre-class-certification discovery began, both the plaintiffs and United deployed technology-assisted review in order to help them work through the extremely large body of potentially relevant documents. Also known as TAR, predictive coding, machine learning, or simply AI, technology-assisted review seeks to mimic and automate the document coding of a human reviewer, learning to mark documents responsive or not based on the actions of a flesh-and-blood legal professional.
In this instance, however, that process ran into what D.D.C. Judge Colleen Kollar-Kotelly called a “glitch.”
United’s TAR process ended up producing more than 3.5 million documents, with only an estimated 600,000 docs, or 17 percent, being responsive to the plaintiffs’ request. That AI-powered document dump left the plaintiffs with little option but to demand an extension of six months, just to get through the millions of documents accidentally rerouted their way.
As the discovery process began, United and plaintiffs negotiated a TAR protocol that was intended to balance the dual imperatives of recall, the AI’s ability to capture the responsive documents that exist, and precision, the share of documents marked responsive that actually are.
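As a rough illustration of how those two metrics pull against each other (a sketch with hypothetical sample counts, not United’s actual validation methodology), recall and precision can be computed from a reviewed sample like so:

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Share of the truly responsive documents the TAR process captured."""
    return true_positives / (true_positives + false_negatives)

def precision(true_positives: int, false_positives: int) -> float:
    """Share of the documents marked responsive that actually are."""
    return true_positives / (true_positives + false_positives)

# Hypothetical sample: 98 of 100 responsive documents captured,
# but 478 non-responsive documents swept in alongside them.
print(recall(98, 2))       # 0.98 -- very high recall
print(precision(98, 478))  # ~0.17 -- very low precision
```

A process can hit near-perfect recall simply by marking almost everything responsive, which is exactly why the parties’ protocol paired the 75 percent recall floor with a “reasonable level” of precision.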
Yet when United’s documents were produced, the plaintiffs claimed that those imperatives hadn’t been met. Indeed, United’s 3.5-million-document production was 1 million documents larger than the other airlines’ productions combined.
Under the agreed-upon protocol, United was to have a minimum recall rate of 75 percent and a "reasonable level" of precision. United was to review representative samples to ensure accuracy and completeness. But those metrics were not shared with the plaintiffs until 7:23 pm on the Friday before United’s Monday production deadline. While the control set showed acceptable rates of recall and precision, the validation samples were far different, revealing that United’s TAR process was incredibly over-inclusive (a nearly 98 percent recall) and extraordinarily imprecise.
Perhaps due to the often “black box” nature of such technology, it took weeks for United to explain the discrepancy between the two sets of metrics. In the end, United disclosed that it had misreported the control set and explained that the higher level of recall in the validation sample resulted in the precision drop. (“Plaintiffs did not understand that explanation” was the response.)
Of the millions of documents produced, only a small fraction were estimated to be responsive, with no way to separate the responsive documents from the rest. To find a single responsive document, the plaintiffs’ 70-person review team would have to review an average of five non-responsive documents first.
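The rough arithmetic behind that five-to-one figure, using the production numbers reported above:

```python
produced = 3_500_000          # United's total production
responsive_est = 600_000      # estimated responsive docs (~17 percent)
non_responsive = produced - responsive_est

# Non-responsive documents a reviewer wades through, on average,
# for each responsive document found.
print(round(non_responsive / responsive_est, 1))  # 4.8, i.e. about five
```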
Why not just do it over? That wasn’t possible, the plaintiffs argued, as it would require redoing the entire TAR training from scratch. The plaintiffs’ own attempts to use TAR to sift through the airlines’ massive productions were similarly fruitless. The plaintiffs’ TAR tool “is unlikely to weed out the millions of non-responsive documents from United’s production,” they explained. As with United’s TAR process, retraining the plaintiffs’ AI would require starting over entirely, implicating the same time, cost, and delay concerns that had foreclosed that option for United.
When seeking to modify a scheduling order, the moving party must show that such modification is for good cause. To evaluate whether good cause has been shown, the court considered the six factors enumerated in Childers v. Slater (D.D.C. 2000):
(1) whether trial is imminent; (2) whether the request is opposed; (3) whether the nonmoving party would be prejudiced; (4) whether the moving party was diligent in obtaining discovery within the guidelines established by the court; (5) the foreseeability of the need for additional discovery in light of the time allotted by the district court; and (6) the likelihood that discovery will lead to relevant evidence.
On these points, the plaintiffs easily prevailed. The defendant airlines questioned whether the plaintiffs’ 70-person review team was sufficient, asking to see how many hours each attorney had worked and for the court to review attorney timesheets. The court declined.
The airlines also questioned why it would take so long to review six million documents, suggesting that reviewers could get through the production at a rate of three documents per minute. That was a speed of review the plaintiffs described as “preposterous.”
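Even taking the defendants’ suggested pace at face value, the arithmetic is daunting. A back-of-the-envelope illustration (assuming, hypothetically, a 40-hour review week):

```python
documents = 6_000_000    # roughly the combined productions at issue
rate_per_minute = 3      # the defendants' suggested review pace
reviewers = 70           # the plaintiffs' review team

total_hours = documents / rate_per_minute / 60
hours_each = total_hours / reviewers
weeks_each = hours_each / 40  # assumed 40-hour work week

print(round(total_hours))    # ~33,333 total review hours
print(round(weeks_each, 1))  # ~11.9 weeks of full-time review per attorney
```

Roughly three months of nonstop review per attorney, at a pace the plaintiffs considered unrealistically fast to begin with.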
United even argued that its TAR process was reasonable. But that, Judge Kollar-Kotelly wrote, was irrelevant.
Indeed, the defendants, the court explained, missed the issue at hand—”whether Plaintiffs had to deal with unforeseen or unanticipated matters, which justify Plaintiffs’ request for additional time.” Due to United's TAR glitch, they had, and the delay that would result from an extension was not enough to justify denying plaintiffs time to get through the immense body of documents.
TAR was designed for colossal, break-the-bank cases like this, gargantuan pieces of litigation involving incredibly data-rich defendants and millions of potentially relevant documents. Yet the legal industry has been slow to adopt TAR, and not just because gargantuan MDLs make up only a tiny share of the national docket. The cost, complexity, and potential risk of such processes seem to have prevented their wider adoption. Cases like In Re: Domestic Airline Travel Antitrust Litigation are unlikely to help TAR take flight.