Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Last update: Jan 03, 2023

Overview

One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

Paper | Demo

Requirements

- Python >= 3.6, PyTorch >= 1.8 and ffmpeg
- Set up OpenFace
  - We use the OpenFace tools to extract the initial pose of the reference image.
  - Make sure the tool is installed, then set OPENFACE_POSE_EXTRACTOR_PATH in config.py to the absolute path of its pose extractor, for example "FeatureExtraction.exe" on Windows.
- Other requirements are listed in 'requirements.txt'.
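To check the OpenFace setup on its own, you can run the pose extractor by hand. The sketch below is not the repository's code. It assumes FeatureExtraction accepts the reference image through `-f`, which is suggested by the README pointing at FeatureExtraction.exe. It uses OpenFace's `-f`, `-out_dir` and `-pose` options and reads the `pose_T*`/`pose_R*` columns from the CSV that OpenFace writes. The helper names `run_openface` and `read_pose` are hypothetical.

```python
import csv
import io
import os
import subprocess
from typing import Dict, List

# Head-pose columns in OpenFace's CSV output (translation in mm, rotation in radians).
POSE_COLUMNS = ["pose_Tx", "pose_Ty", "pose_Tz", "pose_Rx", "pose_Ry", "pose_Rz"]


def run_openface(extractor_path: str, img_path: str, out_dir: str) -> str:
    """Hypothetical helper: run FeatureExtraction on one image and return the CSV path."""
    subprocess.run(
        [extractor_path, "-f", img_path, "-out_dir", out_dir, "-pose"],
        check=True,
    )
    # OpenFace names the CSV after the input file's stem.
    stem = os.path.splitext(os.path.basename(img_path))[0]
    return os.path.join(out_dir, stem + ".csv")


def read_pose(csv_text: str) -> List[Dict[str, float]]:
    """Parse the pose columns from OpenFace CSV text, one dict per frame."""
    # Some OpenFace versions put a space after each comma; skip it.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = []
    for row in reader:
        clean = {k.strip(): v for k, v in row.items() if k is not None}
        rows.append({name: float(clean[name]) for name in POSE_COLUMNS})
    return rows
```

Usage might look like `read_pose(open(run_openface(path, "ref.jpg", "openface_out")).read())`, where `path` is the value you set for OPENFACE_POSE_EXTRACTOR_PATH.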

Pretrained Checkpoint

Please download the pretrained checkpoint from Google Drive and unzip it into the checkpoints directory (/checkpoints). Alternatively, set GENERATOR_CKPT and AUDIO2POSE_CKPT in config.py to wherever you placed the checkpoint files.
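If you keep the checkpoints somewhere else, the relevant entries in config.py might look like the excerpt below. Only the three setting names come from this README. The directory layout, the file names and the `missing_paths` sanity check are hypothetical, so replace them with whatever is actually in the unzipped archive.

```python
# config.py (excerpt): a hypothetical sketch, not the shipped file.
import os

# Directory the downloaded archive was unzipped into (assumed layout).
CHECKPOINT_DIR = os.path.abspath("checkpoints")

# Hypothetical file names; use the names found in your unzipped archive.
GENERATOR_CKPT = os.path.join(CHECKPOINT_DIR, "generator.ckpt")
AUDIO2POSE_CKPT = os.path.join(CHECKPOINT_DIR, "audio2pose.ckpt")

# Absolute path to OpenFace's pose extractor (example Windows location).
OPENFACE_POSE_EXTRACTOR_PATH = r"C:\OpenFace\FeatureExtraction.exe"


def missing_paths():
    """Optional sanity check: list configured paths that do not exist on this machine."""
    paths = [GENERATOR_CKPT, AUDIO2POSE_CKPT, OPENFACE_POSE_EXTRACTOR_PATH]
    return [p for p in paths if not os.path.exists(p)]
```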

Extract phoneme

We use the CMU phoneset to represent phonemes, plus an extra 'SIL' label for silence. The full phone set is listed in 'phindex.json'.
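ASR tools and CMUdict-based aligners often attach a stress digit (0, 1, 2) to vowels and use their own silence tokens. The sketch below shows one way to fold such labels into the 39 CMU phones plus 'SIL'. It makes three assumptions. First, 'phindex.json' is a JSON object that maps each phone to an integer index. Second, the silence tokens listed are typical examples. Third, unknown labels are sent to 'SIL'. All helper names are hypothetical.

```python
import json
import re
from typing import Dict, List

# The 39 CMU (ARPAbet) phones; the repository additionally uses 'SIL' for silence.
CMU_PHONES = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH", "EH", "ER", "EY",
    "F", "G", "HH", "IH", "IY", "JH", "K", "L", "M", "N", "NG", "OW", "OY", "P",
    "R", "S", "SH", "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]
# Assumed silence/noise tokens emitted by common ASR and alignment tools.
SILENCE_LABELS = {"", "SIL", "SP", "SPN", "<SIL>"}


def normalize_phone(label: str) -> str:
    """Map an ASR phone label (e.g. 'ah0', 'IY1', 'sp') onto the CMU set plus 'SIL'."""
    phone = re.sub(r"[012]$", "", label.strip().upper())  # drop the stress digit
    if phone in SILENCE_LABELS or phone not in CMU_PHONES:
        return "SIL"  # assumption: treat silence and unknown labels as silence
    return phone


def load_phindex(path: str) -> Dict[str, int]:
    """Load phindex.json, assumed to be a {phone: index} JSON object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def to_indices(labels: List[str], phindex: Dict[str, int]) -> List[int]:
    """Normalize raw labels and look up their indices."""
    return [phindex[normalize_phone(lab)] for lab in labels]
```

Sending unknown labels to 'SIL' keeps the pipeline running. However, it can hide mapping mistakes, so you may prefer to raise an error while you are setting things up.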

We have already extracted the phonemes for the audio files in the 'sample/audio' directory. For other audio, extract the phonemes with an ASR tool of your choice and map them to the CMU phoneset, or email [email protected] for help.
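Most ASR and alignment tools return phones as timed segments. The sketch below expands such segments into one label per video frame and fills any gaps with 'SIL'. The 25 fps default is an assumption and is not a documented setting of this repository. The JSON layout that test_script.py expects is also not documented here. `save_frames` writes a plain list, so compare its output with the provided phoneme files before you rely on it. All names are hypothetical.

```python
import json
import math
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (phone, start_sec, end_sec) from an aligner/ASR tool


def segments_to_frames(segments: List[Segment], fps: float = 25.0) -> List[str]:
    """Expand timed phone segments into one label per video frame.

    Frames not covered by any segment are labeled 'SIL'.
    """
    if not segments:
        return []
    n_frames = int(math.ceil(max(end for _, _, end in segments) * fps))
    frames = ["SIL"] * n_frames
    for phone, start, end in segments:
        first = int(round(start * fps))
        last = int(round(end * fps))
        for i in range(first, min(last, n_frames)):
            frames[i] = phone
    return frames


def save_frames(frames: List[str], path: str) -> None:
    """Write labels as a plain JSON list (hypothetical layout; check against the sample files)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(frames, f)
```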

Generate Demo Results

python test_script.py --img_path xxx.jpg --audio_path xxx.wav --phoneme_path xxx.json --save_dir "YOUR_DIR"
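To animate one reference image with several audio clips, you can wrap the command above in a small driver script. The sketch below uses only the four documented flags. It assumes that each `xxx.wav` has a phoneme file named `xxx.json` in a separate directory. That naming convention and the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(img: str, audio: str, phoneme: str, save_dir: str) -> List[str]:
    """Assemble the documented test_script.py invocation."""
    return [
        sys.executable, "test_script.py",
        "--img_path", img,
        "--audio_path", audio,
        "--phoneme_path", phoneme,
        "--save_dir", save_dir,
    ]


def run_batch(img: str, audio_dir: str, phoneme_dir: str, save_dir: str) -> None:
    """Run test_script.py once per .wav file in audio_dir (assumed xxx.wav -> xxx.json pairing)."""
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        phoneme = Path(phoneme_dir) / (wav.stem + ".json")
        subprocess.run(build_command(img, str(wav), str(phoneme), save_dir), check=True)
```

Using `check=True` stops the batch at the first failing clip. Remove it if you would rather skip failures and continue.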

Note that the input images must have equal height and width, and the face should be cropped appropriately, as in samples/imgs. You can also preprocess your images with image_preprocess.py.
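If you want to write your own cropping step instead of using image_preprocess.py, the sketch below computes a square crop centred on a face box. The box is enlarged by a margin and shifted so it stays inside the image. It is an independent illustration and does not reproduce what image_preprocess.py does. The face box is assumed to come from any face detector. The `scale` margin and the 256-pixel output size are hypothetical values, so tune them by eye against samples/imgs. Pillow is needed only for the actual crop.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def square_crop_box(face: Box, img_w: int, img_h: int, scale: float = 1.8) -> Box:
    """Square box centred on a face box, enlarged by `scale`, kept inside the image."""
    left, top, right, bottom = face
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    side = int(round(max(right - left, bottom - top) * scale))
    side = min(side, img_w, img_h)  # cannot exceed the smaller image dimension
    x0 = int(round(cx - side / 2.0))
    y0 = int(round(cy - side / 2.0))
    # Shift (not shrink) the box so it stays within the image bounds.
    x0 = max(0, min(x0, img_w - side))
    y0 = max(0, min(y0, img_h - side))
    return (x0, y0, x0 + side, y0 + side)


def crop_and_resize(src: str, dst: str, face: Box, size: int = 256) -> None:
    """Crop with Pillow and resize to size x size (256 is an assumed resolution)."""
    from PIL import Image  # requires Pillow; imported lazily

    with Image.open(src) as img:
        box = square_crop_box(face, img.width, img.height)
        img.crop(box).resize((size, size)).save(dst)
```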

License and Citation

@InProceedings{wang2021one,
author = {Suzhen Wang and Lincheng Li and Yu Ding and Xin Yu},
title = {One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning},
booktitle = {AAAI 2022},
year = {2022},
}

Acknowledgement

This codebase is based on First Order Motion Model and imaginaire; thanks to their authors for their contributions.

Owner

FuxiVirtualHuman