In 2016, a group from Niessner Lab in Germany published a groundbreaking achievement in the world of computer facial manipulation. Their new technology, called Face2Face, captures one person’s facial expressions as they talk into a webcam and maps those expressions directly onto a separate individual’s face in real time. In essence, this means that you can take a video of anyone and make their face show any expression you’d like. For example, in a demonstration video, footage of Vladimir Putin giving a serious speech becomes a video of him smiling, then frowning, with eyebrows up and then down.
More recently, in July 2017, researchers from the University of Washington developed an algorithm that can turn audio clips of a person into realistic, lip-synced video of that person speaking those words. This is an advance over the Face2Face technology because here the researchers use audio only and do not have access to the original source video. In a video demonstration, the algorithm turns audio clips of President Obama into videos of President Obama. The algorithm accomplished this by watching 14 hours of President Obama’s weekly address footage and learning which mouth shapes correspond to each audio sound. The algorithm then listens to new audio and creates the corresponding mouth shapes, which are then mapped onto a video of President Obama. The result is a convincing video that does not look manufactured. For now, this technology cannot take anyone’s voice and turn it into a video of President Obama speaking; it can only be used for mapping audio of President Obama onto videos of President Obama.
On the horizon, however, Adobe is developing a tool that can listen to a voice for 20 minutes and then generate that voice saying anything. In 2016, Adobe released a “Sneak Peeks” presentation of this tool, called VoCo. In the presentation, an Adobe technologist uses an audio clip of comedian Keegan-Michael Key talking about his excitement after an award nomination. He says, “And I, uh, kissed my dog and my wife.” The technologist then erases the words and types in new words to make Mr. Key say that he kissed Jordan (Key’s comedy partner), and then that he kissed Jordan three times. The audio results sound almost perfect. Even though this technology hasn’t been released yet, it has already raised many concerns, such as:
- Difficulty using digital files as evidence;
- Manipulation of systems that use voice identification (such as automated bank tellers); and
- Manipulation of the democratic process via the distribution of artificial voice or video recordings.
Some predict that these technologies will be used to improve video conferencing, translation, and film and video game simulations, and to simulate conversations with historical figures in virtual reality. What are other ways this technology could be used to help society? What are some concerns about this technology? What are the implications for public figures, including celebrities and politicians? How should we respond to these technologies?