Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss (ATVGnet)

Related tags

Deep LearningATVGnet
Overview

Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss (ATVGnet)

By Lele Chen , Ross K Maddox, Zhiyao Duan, Chenliang Xu.

University of Rochester.

Table of Contents

  1. Introduction
  2. Citation
  3. Running
  4. Model
  5. Results
  6. Disclaimer and known issues

Introduction

This repository contains the original models (AT-net, VG-net) described in the paper Hierarchical Cross-modal Talking Face Generation with Dynamic Pixel-wise Loss. The demo video is avaliable at https://youtu.be/eH7h_bDRX2Q. This code can be applied directly in LRW and GRID. The outputs from the model are visualized here: the first one is the synthesized landmark from ATnet, the rest of them are attention, motion map and final results from VGnet.

model model

Citation

If you use any codes, models or the ideas from this repo in your research, please cite:

@inproceedings{chen2019hierarchical,
  title={Hierarchical cross-modal talking face generation with dynamic pixel-wise loss},
  author={Chen, Lele and Maddox, Ross K and Duan, Zhiyao and Xu, Chenliang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={7832--7841},
  year={2019}
}

Running

  1. This code is tested under Python 2.7. The model we provided is trained on LRW. However, it works fine on GRID,VOXCELB and other datasets. You can directly compare this model on other dataset with your own model. We treat this as fair comparison.

  2. Pytorch environment:Pytorch 0.4.1. (conda install pytorch=0.4.1 torchvision cuda90 -c pytorch)

  3. Install requirements.txt (pip install -r requirement.txt)

  4. Download the pretrained ATnet and VGnet weights at google drive. Put the weights under model folder.

  5. Run the demo code: python demo.py

    • -device_ids: gpu id
    • -cuda: using cuda or not
    • -vg_model: pretrained VGnet weight
    • -at_model: pretrained ATnet weight
    • -lstm: use lstm or not
    • -p: input example image
    • -i: input audio file
    • -lstm: use lstm or not
    • -sample_dir: folder to save the outputs
    • ...
  6. Download and unzip the training data from LRW

  7. Preprocess the data (Extract landmark and crop the image by dlib).

  8. Train the ATnet model: python atnet.py

    • -device_ids: gpu id
    • -batch_size: batch size
    • -model_dir: folder to save weights
    • -lstm: use lstm or not
    • -sample_dir: folder to save visualized images during training
    • ...
  9. Test the model: python atnet_test.py

    • -device_ids: gpu id
    • -batch_size: batch size
    • -model_name: pretrained weights
    • -sample_dir: folder to save the outputs
    • -lstm: use lstm or not
    • ...
  10. Train the VGnet: python vgnet.py

    • -device_ids: gpu id
    • -batch_size: batch size
    • -model_dir: folder to save weights
    • -sample_dir: folder to save visualized images during training
    • ...
  11. Test the VGnet: python vgnet_test.py

    • -device_ids: gpu id
    • -batch_size: batch size
    • -model_name: pretrained weights
    • -sample_dir: folder to save the outputs
    • ...

Model

  1. Overall ATVGnet model

  2. Regresssion based discriminator network

    model

Results

  1. Result visualization on different datasets:

    visualization

  2. Reuslt compared with other SOTA methods:

    visualization

  3. The studies on image robustness respective with landmark accuracy:

    visualization

  4. Quantitative results:

    visualization

Disclaimer and known issues

  1. These codes are implmented in Pytorch.
  2. In this paper, we train LRW and GRID seperately.
  3. The model are sensitive to input images. Please use the correct preprocessing code.
  4. I didn't finish the data processing code yet. I will release it soon. But you can try the model and replace with your own image.
  5. If you want to train these models using this version of pytorch without modifications, please notice that:
    • You need at lest 12 GB GPU memory.
    • There might be some other untested issues.
  6. There is another intresting and useful research on audio to landmark genration. Please check it out at https://github.com/eeskimez/Talking-Face-Landmarks-from-Speech.

Todos

  • Release training data

License

MIT

Owner
Lele Chen
I am a Ph.D candidate in University of Rochester supervised by Prof. Chenling Xu.
Lele Chen
Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation (CVPR 2020)

Super-BPD for Fast Image Segmentation (CVPR 2020) Introduction We propose direction-based super-BPD, an alternative to superpixel, for fast generic im

189 Dec 07, 2022
Auxiliary data to the CHIIR paper Searching to Learn with Instructional Scaffolding

Searching to Learn with Instructional Scaffolding This is the data and analysis code for the paper "Searching to Learn with Instructional Scaffolding"

Arthur Câmara 2 Mar 02, 2022
Attention-based Transformation from Latent Features to Point Clouds (AAAI 2022)

Attention-based Transformation from Latent Features to Point Clouds This repository contains a PyTorch implementation of the paper: Attention-based Tr

12 Nov 11, 2022
The project of phase's key role in complex and real NN

Phase-in-NN This is the code for our project at Princeton (co-authors: Yuqi Nie, Hui Yuan). The paper title is: "Neural Network is heterogeneous: Phas

YuqiNie-lab 1 Nov 04, 2021
Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition - NeurIPS2021

Neural-PIL: Neural Pre-Integrated Lighting for Reflectance Decomposition Project Page | Video | Paper Implementation for Neural-PIL. A novel method wh

Computergraphics (University of Tübingen) 64 Dec 29, 2022
PyTorch Implementation of DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

DiffGAN-TTS - PyTorch Implementation PyTorch implementation of DiffGAN-TTS: High

Keon Lee 157 Jan 01, 2023
A tool to prepare websites grabbed with wget for local viewing.

makelocal A tool to prepare websites grabbed with wget for local viewing. exapmples After fetching xkcd.com with: wget -r -no-remove-listing -r -N --p

5 Apr 23, 2022
T-LOAM: Truncated Least Squares Lidar-only Odometry and Mapping in Real-Time

T-LOAM: Truncated Least Squares Lidar-only Odometry and Mapping in Real-Time The first Lidar-only odometry framework with high performance based on tr

Pengwei Zhou 183 Dec 01, 2022
Open source code for Paper "A Co-Interactive Transformer for Joint Slot Filling and Intent Detection"

A Co-Interactive Transformer for Joint Slot Filling and Intent Detection This repository contains the PyTorch implementation of the paper: A Co-Intera

67 Dec 05, 2022
An efficient PyTorch implementation of the evaluation metrics in recommender systems.

recsys_metrics An efficient PyTorch implementation of the evaluation metrics in recommender systems. Overview • Installation • How to use • Benchmark

Xingdong Zuo 12 Dec 02, 2022
Using LSTM write Tang poetry

本教程将通过一个示例对LSTM进行介绍。通过搭建训练LSTM网络,我们将训练一个模型来生成唐诗。本文将对该实现进行详尽的解释,并阐明此模型的工作方式和原因。并不需要过多专业知识,但是可能需要新手花一些时间来理解的模型训练的实际情况。为了节省时间,请尽量选择GPU进行训练。

56 Dec 15, 2022
This is the source code for our ICLR2021 paper: Adaptive Universal Generalized PageRank Graph Neural Network.

GPRGNN This is the source code for our ICLR2021 paper: Adaptive Universal Generalized PageRank Graph Neural Network. Hidden state feature extraction i

Jianhao 92 Jan 03, 2023
How to Train a GAN? Tips and tricks to make GANs work

(this list is no longer maintained, and I am not sure how relevant it is in 2020) How to Train a GAN? Tips and tricks to make GANs work While research

Soumith Chintala 10.8k Dec 31, 2022
Joint Channel and Weight Pruning for Model Acceleration on Mobile Devices

Joint Channel and Weight Pruning for Model Acceleration on Mobile Devices Abstract For practical deep neural network design on mobile devices, it is e

11 Dec 30, 2022
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.

Detectron is deprecated. Please see detectron2, a ground-up rewrite of Detectron in PyTorch. Detectron Detectron is Facebook AI Research's software sy

Facebook Research 25.5k Jan 07, 2023
Collection of generative models in Pytorch version.

pytorch-generative-model-collections Original : [Tensorflow version] Pytorch implementation of various GANs. This repository was re-implemented with r

Hyeonwoo Kang 2.4k Dec 31, 2022
E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

E2EDNA2 - An automated pipeline for simulation of DNA aptamers complexed with small molecules and short peptides

11 Nov 08, 2022
This repository contains answers of the Shopify Summer 2022 Data Science Intern Challenge.

Data-Science-Intern-Challenge This repository contains answers of the Shopify Summer 2022 Data Science Intern Challenge. Summer 2022 Data Science Inte

1 Jan 11, 2022
Software & Hardware to do multi color printing with Sharpies

3D Print Colorizer is a combination of 3D printed parts and a Cura plugin which allows anyone with an Ender 3 like 3D printer to produce multi colored

343 Jan 06, 2023