Updated on November 4, 2020
During times of uncertainty, transparency into government action and decision-making could not be more valuable, or more urgent.
The ability to shine a light on the workings of the government, while always important, becomes even more essential in times of crisis. And certainly, we are in a time of crisis, or crises, from election wrangling to social unrest to the pandemic and economic downturn.
Today, public interest organizations and journalists desperately need access to government records. Government bodies themselves need technology to make that data available in a speedy, accurate, and secure way.
And everyone needs to be able to make sense of that data.
Without the right technology, the light revealed by state and federal sunshine laws can be blinding. A FOIA document dump can bury as much as it reveals. That's why organizations like the Sierra Club and the New York Times have used Logikcull to rapidly sift through data revealed by FOIA requests and litigation, allowing them to quickly cull through the noise and find what matters.
Below, in this post originally published in the summer of 2019, we show how parties can use technology like Logikcull to sift through massive amounts of data quickly and easily, whether they are journalists, non-profits, government bodies, or litigants. Watch below or try it out yourself here.
Last Sunday, The New York Times published an in-depth report looking into potential conflicts between Secretary of Transportation Elaine Chao, her family's international shipping business, American shipping interests, and China.
The geopolitical implications are interesting enough, but for those of us in the discovery realm, the Times's report is also another compelling illustration of how journalists are increasingly using data, and the open records laws which provide access to that data, to make headlines.
Public records requests, after all, are not too different from discovery projects, requiring the collection, review, and production of massive document collections. And, for the receiving party, those productions are often similar to the data dumps that pervade litigation. Except they're worse.
So, what does this report tell us about FOIA, data journalism, and government accountability? And how can the right technology help those looking for the next big lede in a pile of government emails, or those opening the records to the public in the first place?
Follow along in the video above to find out, or read the approximate transcription that follows. And if you want to jump into the FOIA files yourself, you can do so here.
Hello everyone, I'm Casey Sullivan of Logikcull.com and today I want to talk about The New York Times. You might have seen the Times's report on Elaine Chao this weekend. The front-page feature, running nearly 7,000 words long, details the potential conflicts between Elaine Chao, Secretary of the Department of Transportation, her family's international shipping business, and the American shipping interests she is, ostensibly, supposed to represent.
It's a compelling read for anyone who likes big headlines or big ships. But what stands out to us here at Logikcull is the role that data and public records requests played in breaking this story.
Many of the new revelations came to light through email correspondence the Times obtained under the Freedom of Information Act, or FOIA. Stories like these highlight the importance of data, and discovery, to our democracy, and they're why we give journalists like these free Logikcull accounts, so they can have the tools they need to find what matters in data.
Now, FOIA requests aren't too different from discovery requests, but they can present even more challenges. For producing parties, FOIA caseloads can be massive, leading to years-long backlogs, and often litigation. The tools FOIA teams have to review and produce documents, too, can make the process even slower. The State Department, for example, recently estimated that it would take 66 years to review 100,000 emails, partly because of its frustratingly manual review process. But that's a separate issue.
For receiving parties (nonprofits, journalists, businesses), FOIA productions are often very similar to litigation data dumps, with the important information hidden under a mountain of junk. Except, unlike in legal discovery, FOIA doesn't allow requesting parties to specify the form of production. That means FOIA documents often come in a single, thousand-page-long PDF, possibly delivered on a CD-ROM, with no file metadata. Often, they can't even be searched without further processing.
The government emails made available by the Times are a good example of this. One giant PDF of emails, some of which look like they were printed, scanned, then redacted and produced.
What's a public advocate or public records officer to do?
Honestly, getting through these productions isn't too tricky if you have the right tools, which we know the Times does.
Let's take a look at how Logikcull would treat these emails.
Once you've waited your months or years to obtain your FOIA production, you can just drag and drop it into Logikcull and get into your data in a matter of minutes. While it's uploading, Logikcull is performing more than 3,000 automated processing steps, like making all those imaged files text searchable.
Alright, so we're into the data. But first, you'll notice that we've got a problem here. Since the data was produced in a single PDF, we've got a single PDF. How do you call out the most critical data when dealing with a giant PDF? How do you tag docs, comment to colleagues, etc.?
Easy, you split it up. We're going to take the quickest route and just do page by page here, but you can also split docs by bookmarks, every page, on specified pages, etc.
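For readers curious about what a split involves, here is a minimal Python sketch of the page-range arithmetic behind "page by page" and "on specified pages." It is an illustration only, not a description of how Logikcull works internally; the function names `split_ranges` and `every_page` are hypothetical, and actually writing out the split PDFs would need a PDF library that is not shown here.

```python
from typing import List, Tuple


def split_ranges(total_pages: int, start_pages: List[int]) -> List[Tuple[int, int]]:
    """Return inclusive 1-based (first, last) page ranges, one per output document.

    start_pages lists the pages on which a new document begins. Page 1
    always starts a document, so it is added if missing.
    """
    if total_pages < 1:
        return []
    # Keep only valid page numbers, dedupe, and make sure page 1 is a start.
    starts = sorted({p for p in start_pages if 1 <= p <= total_pages} | {1})
    ranges = []
    for i, first in enumerate(starts):
        # Each document ends just before the next start, or at the last page.
        last = starts[i + 1] - 1 if i + 1 < len(starts) else total_pages
        ranges.append((first, last))
    return ranges


def every_page(total_pages: int) -> List[Tuple[int, int]]:
    """Page-by-page split: every page becomes its own document."""
    return split_ranges(total_pages, list(range(1, total_pages + 1)))
```

For example, `split_ranges(10, [4, 8])` describes three documents covering pages 1 to 3, 4 to 7, and 8 to 10, while `every_page(3)` gives one document per page.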
Now that we've broken up our production we can get into searching. I can do a traditional linear review, and here we don't have many pages so that's not a problem. I can tag pages and comment to collaborators, who will be instantly notified. If I'm a FOIA officer, performing a review of docs before they're released to the public, I can apply redactions and either tag or comment with the relevant FOIA exemption. I'll tag this with my favorite FOIA exemption, B9, which exempts from disclosure information relating to wells.
To get through a real data dump, though, you'll want to cull out data you don't need and focus in on the stuff you do. Filters will let you narrow your docs by custodian, date range, sender, etc.
Because of the format these documents came in, we're going to be lacking some metadata that would appear in discovery contexts: things like custodians, create date, to: and from: fields. But we can work around that, for example, by keyword searching for something like "From: Browne, Noah" AND "To: Leiby, Thomas." Not ideal, but it works.
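The workaround above amounts to an AND search over phrases in each page's text. A rough Python sketch of that idea follows, assuming the split pages have already been run through OCR into plain strings. The helper `pages_matching_all` and the sample pages are hypothetical and invented for illustration; they are not Logikcull's search engine or the actual contents of the production.

```python
from typing import List


def pages_matching_all(pages: List[str], phrases: List[str]) -> List[int]:
    """Return 1-based numbers of pages whose text contains every phrase.

    Matching is case-insensitive and ignores differences in whitespace,
    since OCR output often breaks lines in odd places.
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    wanted = [normalize(p) for p in phrases]
    hits = []
    for number, text in enumerate(pages, start=1):
        haystack = normalize(text)
        if all(phrase in haystack for phrase in wanted):
            hits.append(number)
    return hits


# Hypothetical OCR text for three split pages (placeholder content only).
sample_pages = [
    "From: Browne, Noah\nTo: Leiby, Thomas\nSubject: placeholder one",
    "From: Leiby, Thomas\nTo: Browne, Noah\nSubject: placeholder two",
    "FROM:  Browne, Noah\nTO: Leiby,\nThomas\nSubject: placeholder three",
]

matches = pages_matching_all(
    sample_pages, ["From: Browne, Noah", "To: Leiby, Thomas"]
)
```

Here `matches` holds pages 1 and 3. Page 2 is skipped because the sender and recipient are reversed, which is the point of including the "From:" and "To:" labels in the search phrases.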
We can also do a straight keyword search for topics that we want to dive right into. Here, we'll do "ethics" and voila, we can see all the internal correspondence over potential ethics concerns.
Suddenly, a FOIA document dump isn't that hard to get through. If you'd like to play around in this very project, you can access it here. And if you're a journalist who would like access to Logikcull, the same tool Pulitzer Prize-winning Times journalists like Eric Lipton use, shoot an email to hi@logikcull.com and we'd be happy to give you an account, for free.