Code for "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose"

Last update: Jan 09, 2023

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

We provide PyTorch implementations for our arXiv paper "Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose" (http://arxiv.org/abs/2002.10137).

Note that this code is protected under patent. It is for research purposes only, at your university or research institution. If you are interested in business or for-profit use, please contact Prof. Liu (the corresponding author, email: [email protected]).

We provide a demo video here (please search for "Talking Face" in this page and click the "demo video" button).

Our Proposed Framework

Prerequisites

Linux or macOS
NVIDIA GPU
Python 3
MATLAB

Getting Started

Installation

You can create a virtual environment and install all the dependencies with

pip install -r requirements.txt

Download pre-trained models

These include the pre-trained general models and the models needed for face reconstruction, identity feature extraction, etc.
Download them from BaiduYun (extraction code: usdm) or GoogleDrive and copy them to the corresponding subfolders (Audio, Deep3DFaceReconstruction, render-to-video).

Download face model for 3d face reconstruction

Download the Basel Face Model from https://faces.dmi.unibas.ch/bfm/main.php?nav=1-0&id=basel_face_model and copy 01_MorphableModel.mat to the Deep3DFaceReconstruction/BFM folder.
Download the Expression Basis from the CoarseData of Guo et al. and copy Exp_Pca.bin to the Deep3DFaceReconstruction/BFM folder.
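
Before running the reconstruction step, it can help to confirm that both files are in place. The following is a minimal sketch and not part of the repository: the function name missing_bfm_files and the repo_root parameter are hypothetical, and only the two file names and the Deep3DFaceReconstruction/BFM folder come from the instructions above.

```python
from pathlib import Path

# File names and folder are taken from the download instructions above.
REQUIRED_BFM_FILES = ("01_MorphableModel.mat", "Exp_Pca.bin")


def missing_bfm_files(repo_root="."):
    """Return the required face-model files that are not yet in the BFM folder.

    repo_root is a hypothetical parameter: the path to your checkout.
    """
    bfm_dir = Path(repo_root) / "Deep3DFaceReconstruction" / "BFM"
    return [name for name in REQUIRED_BFM_FILES if not (bfm_dir / name).is_file()]
```

Calling missing_bfm_files() from the repository root returns an empty list once both files have been copied.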

Fine-tune on a target person's short video

1. Prepare a talking face video that 1) contains a single person, 2) is 25 fps, 3) is longer than 12 seconds, and 4) has no large body translation (e.g. moving from the left to the right of the screen). An example is here. Rename the video to [person_id].mp4 (e.g. 1.mp4) and copy it to the Data subfolder.

Note: You can convert a video to 25 fps with

ffmpeg -i xxx.mp4 -r 25 xxx1.mp4
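
The frame-rate and duration requirements can be checked before fine-tuning. The sketch below is an illustration, not part of the repository: it assumes ffprobe (shipped with ffmpeg) is on your PATH, reads the stream's r_frame_rate field as the frame rate, and uses the hypothetical helper names parse_frame_rate, probe_video and check_requirements. The single-person and body-translation requirements still need a visual check.

```python
import json
import subprocess
from fractions import Fraction


def parse_frame_rate(rate):
    """Convert an ffprobe rate string such as "25/1" to an exact Fraction."""
    return Fraction(rate)


def probe_video(path):
    """Read frame rate and duration (seconds) of the first video stream via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=r_frame_rate:format=duration",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    info = json.loads(out)
    fps = parse_frame_rate(info["streams"][0]["r_frame_rate"])
    duration = float(info["format"]["duration"])
    return fps, duration


def check_requirements(fps, duration):
    """Return a list of problems against the 25 fps and longer-than-12 s rules."""
    problems = []
    if fps != 25:
        problems.append(f"frame rate is {float(fps):g} fps, expected 25")
    if duration <= 12:
        problems.append(f"duration is {duration:g} s, expected more than 12")
    return problems
```

For example, check_requirements(*probe_video("Data/1.mp4")) returns an empty list when the video meets both rules.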

2. Extract frames and landmarks with

cd Data/
python extract_frame1.py [person_id].mp4

3. Conduct 3D face reconstruction. First compile the code in Deep3DFaceReconstruction/tf_mesh_renderer/mesh_renderer/kernels to a .so file, following its readme, and modify line 28 in rasterize_triangles.py to point to your directory. Then run

cd Deep3DFaceReconstruction/
CUDA_VISIBLE_DEVICES=0 python demo_19news.py ../Data/[person_id]

This process takes about 2 minutes on a Titan Xp.

4. Fine-tune the audio network. First modify line 28 in rasterize_triangles.py to point to your directory. Then run

cd Audio/code/
python train_19news_1.py [person_id] [gpu_id]

The saved models are in Audio/model/atcnet_pose0_con3/[person_id]. This process takes about 5 minutes on a Titan Xp.

5. Fine-tune the GAN network. Run

cd render-to-video/
python train_19news_1.py [person_id] [gpu_id]

The saved models are in render-to-video/checkpoints/memory_seq_p2p/[person_id]. This process takes about 40 minutes on a Titan Xp.
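
Once the video is in the Data subfolder, the four commands above can be chained from the repository root. This is a minimal sketch under stated assumptions, not a script shipped with the code: finetune_steps, run_finetune and repo_root are hypothetical names, the commands and working directories are copied from the steps above, and the manual preparation (compiling the kernels and editing rasterize_triangles.py) must already be done.

```python
import os
import subprocess


def finetune_steps(person_id, gpu_id="0"):
    """List the fine-tuning commands as (working dir, argv, extra environment)."""
    pid, gpu = str(person_id), str(gpu_id)
    return [
        # Frame and landmark extraction.
        ("Data", ["python", "extract_frame1.py", f"{pid}.mp4"], {}),
        # 3D face reconstruction; the GPU is fixed to 0 as in the command above.
        ("Deep3DFaceReconstruction",
         ["python", "demo_19news.py", f"../Data/{pid}"],
         {"CUDA_VISIBLE_DEVICES": "0"}),
        # Audio network fine-tuning.
        ("Audio/code", ["python", "train_19news_1.py", pid, gpu], {}),
        # GAN fine-tuning.
        ("render-to-video", ["python", "train_19news_1.py", pid, gpu], {}),
    ]


def run_finetune(person_id, gpu_id="0", repo_root="."):
    """Run each step in order; check=True raises on the first failing step."""
    for workdir, argv, extra_env in finetune_steps(person_id, gpu_id):
        subprocess.run(
            argv,
            cwd=os.path.join(repo_root, workdir),
            env={**os.environ, **extra_env},
            check=True,
        )
```

With Data/1.mp4 in place, run_finetune(1, 0) would execute the steps in sequence and stop at the first one that fails.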

Test on a target person

Place the test audio file (.wav or .mp3) under Audio/audio/. To test with generated poses, run

cd Audio/code/
python test_personalized.py [audio] [person_id] [gpu_id]

or, to use poses from the short video, run

cd Audio/code/
python test_personalized2.py [audio] [person_id] [gpu_id]

The program prints 'saved to xxx.mov' if the videos are generated successfully. It outputs two .mov files: a video with the face only (_full9.mov) and a video with the background (_transbigbg.mov).
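
The choice between the two test scripts and the lookup of the two output files can be sketched as follows. This is an illustration only: build_test_command and find_outputs are hypothetical helpers, the [audio] argument is passed through unchanged, and the output directory is not stated here, so output_dir is an assumption that should be taken from the 'saved to' message.

```python
from pathlib import Path

# Output suffixes are taken from the description above.
OUTPUT_SUFFIXES = ("_full9.mov", "_transbigbg.mov")


def build_test_command(audio, person_id, gpu_id, use_video_poses=False):
    """Return the argv for the chosen test script; run it from Audio/code/."""
    script = "test_personalized2.py" if use_video_poses else "test_personalized.py"
    return ["python", script, str(audio), str(person_id), str(gpu_id)]


def find_outputs(output_dir):
    """Group file names in output_dir by the two documented suffixes."""
    found = {suffix: [] for suffix in OUTPUT_SUFFIXES}
    for path in sorted(Path(output_dir).iterdir()):
        for suffix in OUTPUT_SUFFIXES:
            if path.name.endswith(suffix):
                found[suffix].append(path.name)
    return found
```

The returned list can be passed to subprocess.run with cwd set to Audio/code/.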

Colab

A colab demo is here.

Acknowledgments

The face reconstruction code is from Deep3DFaceReconstruction, the ArcFace code is from insightface, and the GAN code is based on pytorch-CycleGAN-and-pix2pix.

Owner

Ran Yi
