(Editor’s note: Welcome to Bold’s series on AI-created deepfakes. First up: a deepfake explainer.)
AI technology now dominates the digital world, but how big will it get? Some analysts have predicted that as much as 90% of online content could be AI-generated by 2026. But that’s barely the tip of the iceberg for video and audio AI, which suddenly seem to be everywhere, from newly released Beatles songs (!) to fake video clips that masquerade as breaking news. So what is deepfake technology? Don’t sweat it, Bold Business has you covered.
What is Deepfake Technology?
In its simplest terms, a deepfake is a fake, digitally manipulated video or audio file created using deep learning. Deep learning, a type of advanced machine learning, lets people fabricate situations using a person’s image and voice, depicting things that never actually happened.
Deepfakes originated in late 2017 with a Reddit user called “deepfakes,” who shared altered clips on their subreddit. Many early deepfake videos swapped celebrities’ faces into pornographic material.
But in 2018 and 2019, applications built on deepfake technology began appearing, such as FakeApp, Faceswap, and DeepFaceLab. As their popularity grew, mobile app companies entered the market, among them Zao, which lets users place their faces onto TV and movie clips from a single photo.
In addition, DataGrid, a Japanese AI company, created a full-body deepfake system that generates a person entirely from scratch, intended mainly for fashion and apparel. Soon after, audio deepfake tools emerged that could clone a human voice from as little as five seconds of sample audio.
How Deepfakes Work and How You Can Create Them
Although deepfakes draw on a combination of AI technology and deep learning software, they work by pitting two algorithms against each other: a generator and a discriminator. The generator builds a training dataset around the creator’s desired output and produces the initial fake digital content. The discriminator then analyzes how realistic or fake that version is.
A deepfake goes through this process many times so that the generator keeps improving at creating realistic content. The loop also helps the discriminator become more skilled at detecting flaws, which the generator then corrects.
Together, the two algorithms form a generative adversarial network (GAN). The GAN uses deep learning to learn the patterns in authentic images and apply them when creating fakes. For deepfake photos, a GAN system examines photographs of the target from different angles to capture all their details and perspectives. For deepfake videos, it also analyzes the target’s behavior, movement, and speech patterns. The output is then run through the discriminator repeatedly to fine-tune its realism.
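To make the generator-versus-discriminator loop concrete, here is a minimal, illustrative sketch in Python, assuming only NumPy. A linear “generator” tries to imitate samples from a target distribution while a logistic “discriminator” tries to tell real samples from fakes. The toy setup and all names are hypothetical; real deepfake GANs use deep neural networks and far more careful training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data the generator must learn to imitate: samples around 4.0.
def real_batch(n):
    return rng.normal(4.0, 1.0, n)

w, b = 1.0, 0.0   # generator g(z) = w*z + b (a stand-in for a deep network)
a, c = 0.1, 0.0   # discriminator d(x) = sigmoid(a*x + c)

lr, n = 0.01, 64
for _ in range(1000):
    # Discriminator step: push d(real) toward 1 and d(fake) toward 0.
    x = real_batch(n)
    fake = w * rng.normal(size=n) + b
    d_real, d_fake = sigmoid(a * x + c), sigmoid(a * fake + c)
    a -= lr * np.mean(-(1 - d_real) * x + d_fake * fake)
    c -= lr * np.mean(-(1 - d_real) + d_fake)

    # Generator step: adjust w, b so the discriminator scores fakes as real.
    z = rng.normal(size=n)
    fake = w * z + b
    grad_fake = -(1 - sigmoid(a * fake + c)) * a   # d(-log d(fake))/d(fake)
    w -= lr * np.mean(grad_fake * z)
    b -= lr * np.mean(grad_fake)

fakes = w * rng.normal(size=1000) + b
print("mean of generated samples:", round(float(fakes.mean()), 2))
```

The key point is the alternation: each pass tightens the discriminator’s scrutiny, which in turn forces the generator’s output closer to the real data.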
People can create deepfake videos in one of two ways. The first uses an original video of the target and manipulates it so the person appears to say or do things they never did. The second swaps the target’s face onto a different body in another video, a technique known as face-swapping.
There are also specific approaches to deepfake file creation. Here are some of them.
Source Video Deepfakes
When you plan to work from a source video, you will typically use a neural-network-based deepfake autoencoder. The autoencoder analyzes the content to understand the target’s relevant attributes, like facial expressions and body language.
Once that analysis is complete, it applies these characteristics to the original video. The autoencoder consists of an encoder, which encodes the target’s specific features, and a decoder, which imposes them onto the video.
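The encoder/decoder split can be sketched with a toy linear autoencoder in Python, assuming only NumPy. Face-swap pipelines commonly pair one shared encoder with a separate decoder per identity, so this hypothetical example trains two decoders on random “face” vectors for persons A and B, then decodes A’s features with B’s decoder to perform the “swap.” It is an illustrative sketch, not a real face-swapping implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, latent, n = 16, 4, 200

# Toy "face" datasets for two people (real systems use aligned face images).
faces_a = rng.normal(0.0, 1.0, (n, dim))
faces_b = rng.normal(0.5, 1.0, (n, dim))

# One shared encoder, plus a separate decoder per identity.
enc = rng.normal(0.0, 0.1, (latent, dim))
dec_a = rng.normal(0.0, 0.1, (dim, latent))
dec_b = rng.normal(0.0, 0.1, (dim, latent))

def recon_loss(X, dec):
    Z = X @ enc.T            # encode: extract shared features
    R = Z @ dec.T - X        # decode and compare with the original
    return float(np.mean(R ** 2))

losses, lr = [], 0.02
for _ in range(1500):
    for X, dec in ((faces_a, dec_a), (faces_b, dec_b)):
        Z = X @ enc.T
        R = Z @ dec.T - X                     # reconstruction error
        dec -= lr * (2.0 / n) * R.T @ Z       # gradient step on this decoder
        enc -= lr * (2.0 / n) * (R @ dec).T @ X   # ...and the shared encoder
    losses.append(recon_loss(faces_a, dec_a) + recon_loss(faces_b, dec_b))

# "Face swap": encode one of A's faces, decode it with B's decoder.
swapped = (faces_a[0] @ enc.T) @ dec_b.T
print("loss start -> end:", round(losses[0], 3), "->", round(losses[-1], 3))
```

Because the encoder is shared, it learns features common to both faces, while each decoder learns to render them as one specific identity, which is what makes the swap possible.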
Audio Deepfakes
With audio deepfakes, the creator uses a GAN system to clone the target’s voice, building a model from their vocal patterns. That model can then make the voice say anything the creator wants. Audio deepfakes are among the deepfake approaches most commonly used in video game development.
Lip Syncing
Lip syncing is another well-known deepfake technique. Here, the software maps a voice recording onto a video file, making it look as though the person in the clip is speaking the recorded words. When the audio itself is a deepfake, it adds another layer of deception to the video. Lip-syncing deepfakes rely on recurrent neural networks and autoencoders.
Deepfake Technology Terminologies
Delving deep into deepfake AI and technology can be a headache-inducing experience. But you could lessen the confusion of learning about it by familiarizing yourself with the common deepfake terminologies. Here are some of them.
- GAN, or generative adversarial network, is a neural network technology for developing deepfake content. It pits two algorithms against each other: the generator and the discriminator.
- Convolutional neural networks (CNNs) are network technologies that analyze patterns in visual data. CNNs are the technologies creators most commonly use for facial recognition and movement tracking.
- Autoencoders are neural network technologies that identify the relevant characteristics and features of the target, like facial expressions and body movements, and then impose those attributes on the source video.
- Natural language processing (NLP) refers to algorithms used for creating deepfake audio. NLP algorithms analyze the target’s speech before generating original text in their voice.
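To see concretely what “analyzing patterns in visual data” means, here is a small Python sketch, assuming only NumPy, of the convolution operation at the heart of a CNN. A hand-written vertical-edge kernel (the kind of filter a CNN learns automatically from data) slides over a toy image and responds strongly only where the edge appears. Everything here is an illustrative toy, not a full CNN.

```python
import numpy as np

# Toy 8x8 "image": dark left half (0), bright right half (1) -> a vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0

# A Sobel-style kernel that responds to vertical edges.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

def conv2d(image, k):
    """Valid 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = k.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

response = conv2d(img, kernel)
print(response[0])   # -> [0. 0. 4. 4. 0. 0.]: the peak marks the edge
```

A trained CNN stacks many such filters in layers, so early layers detect edges like this one and deeper layers combine them into facial features and movements.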
Understanding Successful Deepfakes
Deepfake content is now widespread across the digital world, and understanding its evolution and inner workings can help you recognize its uses and influence. But remember the core principle: a deepfake succeeds only when its generator can produce artificial images convincing enough to fool its own discriminator.