The era of deepfakes is just around the corner. The ability to create deepfakes—AI-generated audio and visual content that is virtually impossible to distinguish from the real thing—could be available to just about anyone within the next six to 12 months, according to Hao Li, one of the pioneers of deepfake technology. “Soon, it’s going to get to the point where there is no way that we can actually detect them anymore,” Li told CNBC last Friday.
And that soon is soon. Indeed, several consumer products are making deepfakes readily available to the general public today.
Fake or misleading data is nothing new. After all, people (and bots created by people) have been creating fake or misleading data for years. Consider, for example, the endless stream of phishing emails you receive every day.
But deepfakes go further than previous frauds, threatening to erode trust in data sources that have long been too difficult to manipulate successfully—primarily, audio and video data.
Deepfakes, whose name is a portmanteau of “deep learning” and “fake,” typically operate through the use of “generative adversarial networks,” or GANs. GANs work by pitting AI algorithms against each other: one algorithm creates an image, and another tries to identify it as fake, inaccurate, or otherwise “off.” The process repeats ad nauseam until the AI teaches itself to create a forgery so good it can’t be detected as such (by the adversarial algorithms, at least).
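For the curious, that back-and-forth can be sketched in a few lines of Python. This is a toy under heavy assumptions, not how any production deepfake tool is built: the "forger" (usually called the generator) produces single numbers rather than images, the "detective" (the discriminator) is a one-line logistic classifier, and every name and setting here (`train_toy_gan`, `sigmoid`, the learning rate, the batch size) is made up for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function; maps any score to (0, 1).
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_mean=0.0, start_offset=-3.0, steps=2000,
                  batch=64, lr=0.05, seed=0):
    """Toy adversarial loop on 1-D numbers (illustrative assumptions only)."""
    rng = random.Random(seed)
    b = start_offset   # forger's only parameter: fake = b + noise
    w, c = 0.0, 0.0    # detective: D(x) = sigmoid(w * x + c), "how real is x?"

    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [b + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Detective's turn: raise D on real samples, lower it on fakes
        # (one gradient-ascent step on the log-likelihood).
        grad_w = grad_c = 0.0
        for x in real:
            miss = 1.0 - sigmoid(w * x + c)   # how far D is from "real"
            grad_w += miss * x
            grad_c += miss
        for x in fake:
            fooled = sigmoid(w * x + c)       # how much D believed the fake
            grad_w -= fooled * x
            grad_c -= fooled
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Forger's turn: shift the fakes toward whatever D currently
        # scores as "real" (gradient ascent on log D(fake)).
        grad_b = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        b += lr * grad_b

    return b, w, c


if __name__ == "__main__":
    b, w, c = train_toy_gan()
    print("forger's learned offset:", round(b, 3))
```

In this sketch the forger starts out producing numbers centered on -3 while the real data is centered on 0. As the two sides take turns, the forger's offset should drift toward the real mean and the detective's confidence should fade toward a coin flip, which is the one-dimensional version of a forgery that can no longer be told apart.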
Currently, most deepfakes still feel a little bit “off,” allowing them to be spotted with the naked eye. But, as the technology becomes more sophisticated, deepfakes could quickly become undetectable, and that’s cause for significant concern.
Take, for example, this deepfake interview with Russian President Vladimir Putin—or rather, an AI approximation of him—created by Li and presented at MIT Technology Review’s EmTech conference last week.
Of course, the "interview" isn't perfect. There are obvious and instantly identifiable errors in Putin's faked visage. But the remarkable fact is that such a transformation could be made so easily: live, with no human intervention, using just basic technology.
The political implications of deepfakes are significant enough that DARPA, the Pentagon’s Defense Advanced Research Projects Agency, is racing to develop technology to keep up with deepfake videos—partnering with research agencies to train AI to detect fakes just as well as it can create them. Facebook, too, is working to counter deepfake technology before it takes over social media.
Deepfakes aren’t just a tool for geopolitical intrigue and disinformation campaigns. They could be used for a variety of purposes, nefarious and not: media (Li, for example, was part of the team that edited Paul Walker into "Furious 7" after his death); social engineering (imagine a phone call from someone impersonating your boss, in your boss’s exact voice, with instructions for an urgent wire transfer); or even falsifying evidence (faked surveillance footage, for example).
Today, making a deepfake is becoming increasingly simple. Consider it the democratization of high-tech deception.
Two products that currently let individuals create highly sophisticated, altered video and audio content show the growing power of the technology, though neither advertises its services for deception.
Zao, an app available in China, allows users to superimpose their images into scenes from film and television. Users of the viral app, which isn’t available outside of China yet, can have their faces added to a pre-selected list of videos, with fairly convincing results. And Zao is able to accomplish this with just a single image of the user's face.
“In case you haven't heard, #ZAO is a Chinese app which completely blew up since Friday. Best application of 'Deepfake'-style AI facial replacement I've ever seen. Here's an example of me as DiCaprio (generated in under 8 secs from that one photo in the thumbnail) 🤯 pic.twitter.com/1RpnJJ3wgT”
Allan Xia (@AllanXia), September 1, 2019
To use Zao, all you need is a smartphone and a Chinese phone number, and you can start pitting yourself against the research behemoths at DARPA and Facebook.
Indeed, it was Zao that caused Hao Li to update his timeline for the emergence of consumer deepfakes. Li had previously predicted that undetectable deepfakes would arrive in two to three years. Zao caused him to cut that estimate down to as little as six months, he told CNBC. “In some ways we already know how to do it,” he said, adding that it is “only a matter of training with more data and implementing it.”
On this side of the Pacific, the editing program Descript is promising to make AI-generated audio widely available.
Descript lets you record, transcribe, and edit audio for podcasts, video, and the like. That, alone, isn’t very interesting. Where Descript stands out, though, is how it edits audio. We’re not talking about cutting tracks and overdubbing recordings. When you “edit” audio in Descript, you can just type changes directly into the transcript, then have those played back in your own voice, saying words you never actually uttered aloud.
Descript’s Overdub feature “allows you to replace recorded words and phrases with synthesized speech that's tonally blended with the surrounding audio,” the company explains.
So, you record yourself saying something along the lines of “I was with Miss Scarlet when the murder was committed.” Descript’s AI learns your voice and allows you to edit the audio to read “I was with Colonel Mustard when the murder was committed,” generating a brand-new audio recording of an edited sentence you never actually spoke.
Overdub is even able to generate audio for uncommon words, like Logikcull, and place natural inflections and emphasis on those words. Indeed, given the available previews, this AI-generated audio is hard to distinguish from the real thing.
Descript’s Overdub feature is currently in closed beta and the company has a webpage dedicated to its ethical implications. Descript is “committed to… unlocking the benefits of generative media while safeguarding against malicious use,” it says. But it acknowledges the risks such products pose.
“Today, our technology is unique, but the foundational research is already widely available,” the company says.
But, “other generative media products will exist soon, and there's no reason to assume they will have the same constraints we've added.”
Will fake docs end up produced in discovery? Even make their way into court? It’s possible, though many top researchers are looking for ways to detect them. But before deepfakes hit the courtrooms, they could wreak havoc in the media, on the internet, and, of course, in your aunt’s Facebook feed.
And the existence of sophisticated, virtually unidentifiable fakes doesn’t just risk flooding the world with falsified data—it undermines faith in genuine audio and visual data as well.
When you can’t trust what you see and hear, everything is devalued.