Mona Lisa And Nancy Pelosi: The Implications Of Deepfakes
Last week, Samsung researchers announced a system that can create realistic deepfake video avatars from just one image. Around the same time, a doctored video surfaced of House Speaker Nancy Pelosi that had been slowed down to make her appear drunk. These two unsettling events, an impressive achievement by the Samsung team and a less sophisticated case of “malinformation” around Pelosi, bring the issue of AI-augmented deepfake videos starkly back into the limelight. Last year, deepfake videos dominated the headlines, culminating in deepfake celebrity pornography and a blanket ban by Reddit in August 2018.
But the technique did not die out, and false or altered videos are now at their most convincing. With the quality of these videos improving rapidly, consumers and companies alike need to know how to spot fake videos, and regulations around falsified material and facial data collection need to catch up.
Mona Lisa smiles
In a five-minute YouTube video, the Moscow-based Samsung team runs through the capabilities of their new facial mapping model, which can map someone’s face onto another video source using as few as one input image. From a single image it is even possible to animate the faces of oil paintings with some success, “despite the large domain gap between paintings and YouTube videos,” as the video’s narrator states. The system first undergoes extensive meta-learning on large banks of talking-head videos, so that it can later adapt to a new face from very few examples. To map a new face or create a realistic avatar, an “embedder” network measures parameters like the size and location of the eyes, nose and mouth in an input image and converts them into a vector of identity data. A “generator” network then combines that identity vector with the “facial landmarks” of the person in the target video, synthesizing frames that superimpose the source face over the target’s range of expression. Finally, a “discriminator” network judges whether each generated frame looks realistic and matches both the identity and the target pose, pushing the generator toward convincing output.
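The three-network pipeline described above can be caricatured in a few lines of numpy. This is only a toy sketch under loose assumptions: the real embedder, generator and discriminator are deep convolutional networks trained by adversarial meta-learning, and every name, dimension and weight here (64×64 images, a 32-value identity vector, 68 landmarks) is illustrative rather than taken from the Samsung paper.

```python
import numpy as np

rng = np.random.default_rng(42)

IMG_PIXELS = 64 * 64   # toy grayscale source image (illustrative size)
EMBED_DIM = 32         # length of the identity vector (assumption)
N_LANDMARKS = 68       # a common facial-landmark count (assumption)

# Stand-ins for the trained networks: fixed random linear maps.
W_embed = rng.standard_normal((EMBED_DIM, IMG_PIXELS)) * 0.01
W_gen = rng.standard_normal((IMG_PIXELS, EMBED_DIM + N_LANDMARKS * 2)) * 0.01
W_disc = rng.standard_normal(IMG_PIXELS) * 0.01

def embedder(source_image):
    """Collapse one source photo into a fixed-length identity vector."""
    return W_embed @ source_image.reshape(-1)

def generator(identity, landmarks):
    """Render a frame from the identity vector plus target-frame landmarks."""
    x = np.concatenate([identity, landmarks.reshape(-1)])
    return (W_gen @ x).reshape(64, 64)

def discriminator(frame):
    """Score how realistic (and identity-consistent) a generated frame looks."""
    return float(np.tanh(W_disc @ frame.reshape(-1)))

source = rng.random((64, 64))                    # the single input photo
target_landmarks = rng.random((N_LANDMARKS, 2))  # pose from the driving video

identity = embedder(source)                  # one image -> identity vector
fake_frame = generator(identity, target_landmarks)  # vector + pose -> frame
score = discriminator(fake_frame)            # realism score in [-1, 1]
```

In the real system the discriminator’s score is fed back during training, so the generator learns to produce frames the discriminator cannot distinguish from genuine footage; here the three maps are frozen simply to show how data flows between them.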
Using this kind of mapping based on facial geometry, the system achieves a much more accurate result with limited input data (as little as one image), although “increasing the number of frames leads to head models of higher realism and better identity preservation,” the researchers said. The limits of a single image are clear when the system is applied to oil paintings like the Mona Lisa: the resulting videos take on the characteristics of the person in the background video, with “facial landmarks taken from three different people resulting in videos with very distinct personalities.”
Despite the limitations of a deepfake created from a single source, the system is certainly very effective and can extrapolate to show a full range of movements using images taken from only one angle.
The Pelosi predicament
Disturbingly accurate deepfakes built by individuals are also appearing online, created either by training AI directly or by combining AI with off-the-shelf video editing software. One YouTube channel, Ctrl Shift Face, has slowly been accruing views on its deepfake videos (3 million in total). In its first and most impressive clip, actor and comedian Bill Hader’s face morphs seamlessly into Al Pacino’s and Arnold Schwarzenegger’s as he does vocal impressions of the actors. While the capabilities of official AI research projects are improving rapidly, there also seem to be a number of self-taught individuals developing their own techniques, whether for entertainment (as with Ctrl Shift Face) or for more malicious purposes. This could be a worrying trend: as news stories become harder to verify, visual media can spread on social media sites with little official regulation or verification, and more sinister practices such as revenge porn can grow as a means of blackmail, slander or shaming.
The altered video of Nancy Pelosi giving a speech at the Center for American Progress on May 22, which had been slowed down and pitch-shifted to make her seem drunk, catalyzed the return of the debate on deepfakes. Thanks to the voracious nature of social media, the altered video (technically not a deepfake, though some claimed it was) was shared far and wide, with many commenting on her apparent inebriation. “She always looks like she’s a non-functioning alcoholic,” one contributor to a Fox and Friends broadcast said on May 24, and the former mayor of New York Rudy Giuliani retweeted the video questioning Pelosi’s “bizarre speech pattern.” This looks like intentional use of doctored media for political purposes, a clear potential use of deepfake material: in the speech, Pelosi said President Trump’s refusal to cooperate with congressional investigations amounted to a “cover-up,” and Donald Trump, Barack Obama and Vladimir Putin have all been prominent targets of this technology before.
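The Pelosi clip required no AI at all: slowing footage is a trivial edit, and its tell-tale side effect, that slower playback lowers the pitch of the voice, is exactly what the accompanying pitch-shift masked. A minimal numpy sketch of that side effect, with a 440 Hz sine wave standing in for a voice (all values illustrative):

```python
import numpy as np

sr = 8000                                # sample rate (Hz)
t = np.arange(sr) / sr                   # one second of time stamps
tone = np.sin(2 * np.pi * 440 * t)       # 440 Hz tone standing in for speech

# Slow playback to 75% speed by resampling: more samples, same sample rate.
speed = 0.75
n_out = int(len(tone) / speed)
slowed = np.interp(np.linspace(0, len(tone) - 1, n_out),
                   np.arange(len(tone)), tone)

def dominant_hz(x, sr):
    """Return the frequency of the strongest spectral component."""
    spec = np.abs(np.fft.rfft(x))
    return np.fft.rfftfreq(len(x), 1 / sr)[spec.argmax()]

original_pitch = dominant_hz(tone, sr)   # ~440 Hz
slowed_pitch = dominant_hz(slowed, sr)   # ~330 Hz: pitch falls by the speed factor
```

Slowing to 75% speed drops every frequency to 75% of its original value, which is why a naively slowed voice sounds unnaturally deep; shifting the pitch back up hides the most obvious sign of tampering.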
However, perpetrators of deepfakes are incredibly difficult to track down. Across the free and largely unregulated expanse of the internet, users share lessons, tips, videos and different versions of software. In doing so, they make the tools better and the network of content harder to navigate.
What is also concerning is that many companies, governments, and police forces have begun implementing facial recognition technology and storing vast amounts of facial data to feed their own AI systems.
Recently, the airline JetBlue was publicly called out by passenger MacKenzie Fegan, who had been allowed to board using only a scan of her face (no boarding pass or passport check) without being consulted first. The incident raised a number of questions about facial recognition technology and its general usage: how was Fegan’s biometric information available to a private company, how was the system implemented without her notice or consent, and how securely can this information really be transferred between legitimate parties?
The amount of video-based IoT security and verification technology in circulation means that identity theft through false video identification could be a real threat if such data were to be leaked. There exists a “legal vacuum” surrounding facial recognition technology, according to research from the UEA School of Law, and the use of it by police forces around the world does not currently have sufficient legal regulation. This means that there are already vast databases of facial data that could be used to feed an AI or AI-assisted deepfake system, and a breach could lead to another wave of political or malicious videos that will be more sophisticated and harder to trace. Given the pace of progress in legitimate research like that of Samsung and entertainment like Ctrl Shift Face, others with less benign intentions will have similar capabilities as well.
Devil in the details
Deepfake technology is clearly becoming incredibly sophisticated, evidenced by the impressive realism the Samsung researchers in Moscow achieve from just one input image, and by the home-made technology that blends actors’ faces using YouTube and film footage. The scale of the risk this presents is not yet clear, but there have already been political, celebrity and civilian targets of false material used to coerce, intimidate or harass.
Governments, regulators and individuals need to become more aware of the tell-tale signs of deepfake material, so that even the most sophisticated fakes can be identified as soon as possible.
Regulators must navigate a difficult legal landscape around free-speech and ownership laws to properly regulate the use of this technology before it can do significant damage. The wider usage of facial recognition technology must also be considered in light of these capabilities, so that the risks of collecting and distributing facial data are properly accounted for. It is impressive to see the pace at which deepfake technology has progressed, but as with any technology involving AI and personal data, a balance must be struck between open innovation and proper regulation.
Originally published at https://www.forbes.com.