
Deepfakes: A Terror in the Era of AI?

The History and Impact of Deepfakes

In 2017, a user posted pornographic videos to a Reddit forum [1] in which the faces of the adult entertainers had been replaced with the faces of celebrities. The videos were intentionally manipulated using a deep-learning-based technique now known as deepfake; the term "deepfake" is a blend of "deep learning" and "fake". The technique can superimpose the face images or facial motions of a target person onto a video of a source person, creating a video of the target person behaving exactly as the source person does. The resulting fakes are often virtually indiscernible from authentic footage. Deep learning has powerful applications across a variety of complex real-world problems, ranging from big-data analytics and computer-vision perception to unmanned control systems. Unfortunately, as deep learning technologies have advanced, so have threats to the privacy, stability and security of machine-learning-based systems.

In general, a deepfake manipulates visual and/or audio content using deep-neural-network-based machine learning methods; it changes how a person, object or environment is represented. Deepfakes can have benign uses, such as updating film footage without having to reshoot scenes, but malicious implementations appear to far outnumber benign ones. Spreading eye-catching fake celebrity news is one of the most notorious malicious applications of deepfakes. The development of advanced deep networks, such as the renowned Generative Adversarial Networks (GANs), makes forged content almost indistinguishable even to sophisticated detection algorithms, while less and less effort is required to produce deceptively convincing video and audio. Recently proposed deepfake methods can create forged videos from a single still image [2], and applications such as DeepNude can turn a personal photo into non-consensual porn [3]. Similarly, an audio deepfake was used to scam a CEO out of $243,000 [4]. The content generated by deepfakes affects not only public figures but also many aspects of ordinary people's lives.

Figure 1 [9]: An example of face swapping
Figure 2: Auto-encoder for image reconstruction

Typical deepfake applications fall into two categories:

Face swapping: As demonstrated in Figure 1, face swapping superimposes the still face image of one person (the source person) onto images of another person (the target person), concealing the target person's identity. The target person can then use the phony face images to gain access to the source person's private information via face-recognition-based biometric authentication tools. The underlying technology, the auto-encoder [5], also forms the basis for more advanced deepfake applications. An auto-encoder (shown in Figure 2) is composed of an encoder and a decoder module: the encoder compresses the input image frames or audio signals into a low-dimensional feature space, and the decoder maps the low-dimensional features back to a reconstruction of the input data. In face swapping (as shown in Figure 3), two auto-encoders are maintained, one for the face images of the source person and one for those of the target person. The two auto-encoders share the same encoder, so that both sets of face images are compressed into the same latent representation space, but each has its own decoder. Once training is complete, the decoders are swapped (as shown in Figure 3), so that a frame of the target person is encoded and then reconstructed with the source person's decoder. The output is an image that delicately stitches the source's face onto the target's while staying true to the target's expressions. The angle of the synthesized face must still be matched to the angle of the target person's head; this is the only step in the process that requires manual effort rather than automatic tuning by machine learning algorithms. A minimal code sketch of this coupled auto-encoder setup is given after Figure 3.

Figure 3 [7]: Face swapping using coupled auto-encoders
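For readers who want to see the mechanics, below is a minimal, illustrative PyTorch sketch of the coupled auto-encoder idea described above. It is not the code of any particular deepfake tool; the image size, layer widths and training details are assumptions chosen for brevity. The key points are that both identities share one encoder (and therefore one latent space), each identity has its own decoder, and the "swap" simply decodes the target person's latent code with the source person's decoder.

```python
# Illustrative sketch of the coupled auto-encoder idea behind face swapping
# (not the implementation of any specific tool). Image size, layer widths
# and the training loop are hypothetical choices made for readability.
import torch
import torch.nn as nn

LATENT = 256  # size of the shared latent face representation (assumed)

class Encoder(nn.Module):
    """Compresses a 3x64x64 face crop into a low-dimensional latent vector."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, LATENT),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Reconstructs a face crop from the shared latent vector."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

# One shared encoder, one decoder per identity.
encoder = Encoder()
decoder_src = Decoder()  # trained only on faces of the source person
decoder_tgt = Decoder()  # trained only on faces of the target person

def training_step(batch_src, batch_tgt, opt):
    """Each decoder learns to reconstruct its own person from the shared latent space."""
    loss = nn.functional.mse_loss(decoder_src(encoder(batch_src)), batch_src) + \
           nn.functional.mse_loss(decoder_tgt(encoder(batch_tgt)), batch_tgt)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# The "swap": after training, encode a frame of the target person with the
# shared encoder but decode it with the *source* person's decoder, producing
# the source's face with the target's pose and expression.
with torch.no_grad():
    target_frame = torch.rand(1, 3, 64, 64)  # stand-in for a real video frame
    swapped = decoder_src(encoder(target_frame))
```

In a real pipeline the reconstructed face still has to be blended back into the original frame and matched to the head pose, which is where the manual effort mentioned above comes in.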

Speech forgery: Speech forgery [6] produces bogus facial motions for the target person. It manipulates features of the target person's face images, including the movement of the lips, eyebrows and eyeballs and the tilt of the head, in order to contort the person's facial expressions. For this purpose, speech forgery is accompanied by speech synthesis, which learns a model of the target person's voice; the forgery then synchronizes the modified facial motions with the text and the synthesized voice. In the resulting fraudulent video, the targeted person appears to say something that he or she never said. Moreover, recent advances in speech synthesis allow even more flexible manipulation; for example, users can choose a voice of any age or gender.
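The hardest parts of this pipeline (facial reenactment and voice cloning) require dedicated models, but the synchronization step can be illustrated with a toy example. The sketch below assumes a hypothetical speech synthesizer that reports phoneme timings and simply expands those timings into per-frame mouth-openness targets at the video frame rate; real systems such as Face2Face drive a full 3D face model rather than a single scalar.

```python
# Toy illustration of the synchronization step only: mapping the timing of a
# synthesized voice track onto per-frame mouth shapes ("visemes"). The phoneme
# symbols, durations and openness values below are made-up examples.
from dataclasses import dataclass

FPS = 25  # assumed video frame rate

@dataclass
class Phoneme:
    symbol: str
    duration_s: float  # duration reported by the (hypothetical) speech synthesizer

# Very rough "mouth openness" per phoneme class (hypothetical values).
MOUTH_OPENNESS = {"AA": 1.0, "OW": 0.8, "M": 0.0, "S": 0.2, "sil": 0.0}

def mouth_keyframes(phonemes):
    """Expand phoneme timings into one mouth-openness value per video frame."""
    frames = []
    for p in phonemes:
        n_frames = max(1, round(p.duration_s * FPS))
        frames.extend([MOUTH_OPENNESS.get(p.symbol, 0.5)] * n_frames)
    return frames

# Example: timings a text-to-speech engine might report for a short word.
track = [Phoneme("M", 0.08), Phoneme("AA", 0.20), Phoneme("S", 0.12), Phoneme("sil", 0.10)]
print(mouth_keyframes(track))  # per-frame targets used to retime the lip motion
```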

As we pointed out, the most visible threat of deepfakes is the violation of personal identity, but deepfakes could also disrupt our daily lives more broadly. For example, a deepfake could allow people to challenge the veracity of genuine film footage: if every video can be forged, a bad actor has a perfect excuse to deny involvement in untoward activity. Beyond the entertainment domain, the availability of many easy-to-apply deepfake methods raises the question of whether video, image or audio evidence should be relied on by law enforcement, especially in criminal investigations. Visual communication expert Paul Lester suggests that images and videos are not practical as physical evidence because of the threat of deepfake-based forgery [1]. Likewise, two US law professors, Danielle Citron and Robert Chesney, envision fraudulent footage being used to spread propaganda with the intention of destabilizing the legal system [1].

Nevertheless, we should not worry too much. From a technological perspective, it remains difficult for deepfakes to produce convincing forgeries without a large amount of manual post-processing, such as tuning the tilt of facial regions and the lip movements. Furthermore, audiences will eventually move away from the intuitive assumption that "what I see is what I can trust". In addition, recent deep learning research has produced detection mechanisms for fraudulent images and videos [7]. In these methods, fake-image detection is treated as a two-step binary classification problem: first, image descriptors are extracted that capture the differences between authentic and forged images; these descriptors are then fed into a deep-learning-based classifier to perform the detection. The main limitation of these methods is the lack of forged image frames for training. Agarwal and Varshney [8] proposed using GANs to generate synthetic forged frames, which mitigates this issue and greatly improves detection accuracy.
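As a rough illustration of this two-step recipe, the sketch below pairs a small convolutional descriptor extractor with a binary classification head. The architecture and sizes are assumptions chosen for readability, not the designs evaluated in [7] or [8].

```python
# Minimal sketch of the two-step detection recipe described above (assumed
# architecture): a small CNN extracts image descriptors and a linear head
# classifies each frame as authentic (0) or forged (1).
import torch
import torch.nn as nn

class FrameDescriptor(nn.Module):
    """Step 1: extract a descriptor that can separate real from forged frames."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.features(x)

class DeepfakeClassifier(nn.Module):
    """Step 2: binary classification on top of the extracted descriptor."""
    def __init__(self):
        super().__init__()
        self.descriptor = FrameDescriptor()
        self.head = nn.Linear(128, 1)  # single logit: probability the frame is forged

    def forward(self, x):
        return self.head(self.descriptor(x))

model = DeepfakeClassifier()
loss_fn = nn.BCEWithLogitsLoss()

# A training batch mixes genuine frames (label 0) with forged frames (label 1);
# when real forgeries are scarce, GAN-generated fakes can be added as extra
# positives, in the spirit of Agarwal and Varshney [8].
frames = torch.rand(8, 3, 64, 64)             # stand-in for face crops
labels = torch.randint(0, 2, (8, 1)).float()  # 0 = authentic, 1 = forged
loss = loss_fn(model(frames), labels)
```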

References:

1. “Deepfakes and Audio-visual Disinformation: Snapshot paper”, The Centre for Data Ethics and Innovation Snapshots, September 12, 2019.

2. “Few-shot adversarial learning of realistic neural talking head models”, Egor Zakharov, Aliaksandra Shysheya, Egor Burkov and Victor Lempitsky, arXiv preprint arXiv:1905.08233, 2019.

3. “A guy made a deepfake app to turn photos of women into nudes. It didn’t go well.”, Sigal Samuel, Vox, June 27, 2019, https://www.vox.com/2019/6/27/18761639/ai-deepfake-deepnude-appnude-women-porn

4. “A Voice Deepfake Was Used To Scam A CEO Out of $243,000”, Jesse Damiani, Forbes, September 3, 2019, https://www.forbes.com/sites/jessedamiani/2019/09/03/a-voice-deepfake-was-used-to-scam-a-ceo-out-of-243000/#2b98f1d82241

5. “Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion”, Pascal Vincent, Hugo Larochelle, et al., Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010.

6. “Face2Face: real-time face capture and reenactment of RGB videos”, Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt and Matthias Nießner, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

7. “Deep Learning for Deepfakes Creation and Detection”, Thanh Thi Nguyen, Cuong M. Nguyen, Dung Tien Nguyen, Duc Thanh Nguyen and Saeid Nahavandi, arXiv preprint arXiv:1909.11573, 2019.

8. “Limits of deepfake detection: A robust estimation viewpoint”, Sakshi Agarwal and Lav R. Varshney, arXiv preprint arXiv:1905.03493, 2019.

9. “Facebook AI Launches Its Deepfake Detection Challenge”, Eliza Strickland, TechTalk at IEEE Spectrum, December 11, 2019, https://spectrum.ieee.org/tech-talk/artificial-intelligence/machine-learning/facebook-ai-launches-its-deepfake-detection-challenge

Editorial note: Our articles provide educational information for you. NortonLifeLock offerings may not cover or protect against every type of crime, fraud, or threat we write about. Our goal is to increase awareness about cyber safety. Please review complete Terms during enrollment or setup. Remember that no one can prevent all identity theft or cybercrime, and that LifeLock does not monitor all transactions at all businesses.

Copyright © 2020 NortonLifeLock Inc. All rights reserved. NortonLifeLock, the NortonLifeLock Logo, the Checkmark Logo, Norton, LifeLock, and the LockMan Logo are trademarks or registered trademarks of NortonLifeLock Inc. or its affiliates in the United States and other countries. Other names may be trademarks of their respective owners.

About the Author

Yufei Han

Senior Principal Researcher

Dr. Yufei Han currently works as a Senior Principal Researcher. His research interests include robust learning with imperfect telemetry data, adversarial learning, and privacy-preserving learning, all aimed at providing trusted machine learning services.
