Official code for CVPR2022 paper: Depth-Aware Generative Adversarial Network for Talking Head Video Generation

Overview

📖 Depth-Aware Generative Adversarial Network for Talking Head Video Generation (CVPR 2022)

🔥 If DaGAN is helpful in your photos/projects, please help to ⭐ it or recommend it to your friends. Thanks 🔥

[Paper]   [Project Page]   [Demo]   [Poster Video]

Fa-Ting Hong, Longhao Zhang, Li Shen, Dan Xu
The Hong Kong University of Science and Technology

Cartoon Sample

cartoon.mp4

Human Sample

celeb.mp4

Voxceleb1 Dataset

🚩 Updates

  • 🔥 🔥 ✅ May 19, 2022: The depth face model trained on Voxceleb2 is released! (The corresponding DaGAN checkpoint will be released soon.) Click the LINK

  • 🔥 🔥 ✅ April 25, 2022: Integrated into Huggingface Spaces 🤗 using Gradio. Try out the web demo: Hugging Face Spaces (GPU version will come soon!)

  • 🔥 🔥 ✅ Add SPADE model, which produces more natural results.

🔧 Dependencies and Installation

Installation

We now provide a clean version of DaGAN, which does not require customized CUDA extensions.

  1. Clone repo

    git clone https://github.com/harlanhong/CVPR2022-DaGAN.git
    cd CVPR2022-DaGAN
  2. Install dependent packages

    pip install -r requirements.txt
    
    ## Install the Face Alignment lib
    cd face-alignment
    pip install -r requirements.txt
    python setup.py install

⚡ Quick Inference

We use the paper version as an example. More models can be found here.

YAML configs

See config/vox-adv-256.yaml for a description of each parameter.

Pre-trained checkpoint

The pre-trained checkpoint of the face depth network and our DaGAN checkpoints can be found at the following link: OneDrive.

Inference! To run a demo, download a checkpoint and run the following command:

CUDA_VISIBLE_DEVICES=0 python demo.py  --config config/vox-adv-256.yaml --driving_video path/to/driving --source_image path/to/source --checkpoint path/to/checkpoint --relative --adapt_scale --kp_num 15 --generator DepthAwareGenerator 

The result will be stored in result.mp4. The driving videos and source images should be cropped before they can be used with our method. To obtain semi-automatic crop suggestions, you can run python crop-video.py --inp some_youtube_video.mp4. It will generate ffmpeg commands for the crops.
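If you prefer to launch the demo from a Python script, the command above can be assembled as an argument list. This is a minimal sketch: the helper name build_demo_command is hypothetical and not part of the repository, and it only reuses the flags and placeholder paths shown in the command above.

```python
import shlex


def build_demo_command(driving_video, source_image, checkpoint,
                       config="config/vox-adv-256.yaml",
                       kp_num=15, generator="DepthAwareGenerator"):
    """Return the demo.py invocation from this README as an argument list.

    Hypothetical helper: it only mirrors the flags shown in the README.
    """
    return [
        "python", "demo.py",
        "--config", config,
        "--driving_video", driving_video,
        "--source_image", source_image,
        "--checkpoint", checkpoint,
        "--relative", "--adapt_scale",
        "--kp_num", str(kp_num),
        "--generator", generator,
    ]


# Placeholder paths, exactly as in the README command.
cmd = build_demo_command("path/to/driving", "path/to/source", "path/to/checkpoint")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True), with CUDA_VISIBLE_DEVICES set in the environment to select the GPU.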

💻 Training

Datasets

  1. VoxCeleb. Please follow the instruction from https://github.com/AliaksandrSiarohin/video-preprocessing.

Train on VoxCeleb

To train a model on a specific dataset, run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --master_addr="0.0.0.0" --master_port=12348 run.py --config config/vox-adv-256.yaml --name DaGAN --rgbd --batchsize 12 --kp_num 15 --generator DepthAwareGenerator

The code creates a folder in the log directory (each run creates a new name-specific directory), and checkpoints are saved to this folder. To check the loss values during training, see log.txt. By default, the batch size is tuned to run on 8 GeForce RTX 3090 GPUs (the best performance is reached after about 150 epochs). You can change the batch size in train_params in the .yaml file.

🚩 Please use multiple GPUs to train your own model. If you use only one GPU, you will run into the in-place operation problem.

You can also monitor the training loss by running the following command:

tensorboard --logdir log/DaGAN/log

If you kill your process in the middle of training for any reason, a zombie process may remain. You can kill it using our provided tool:

python kill_port.py PORT
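Before restarting training, it can help to check whether the master port (12348 in the training command above) is still held by a leftover process. The following is a minimal sketch using only the Python standard library. It does not describe how kill_port.py works, and the function name is_port_in_use is an assumption made for illustration.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# Self-contained demonstration on a temporary local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

busy = is_port_in_use(demo_port)            # listener is still open
listener.close()
free_again = not is_port_in_use(demo_port)  # listener has been closed
print(busy, free_again)
```

If is_port_in_use(12348) returns True after training was interrupted, a leftover process is probably still bound to the port.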

Training on your own dataset

  1. Resize all the videos to the same size, e.g. 256x256. The videos can be '.gif' or '.mp4' files, or folders of images. We recommend the latter: for each video, make a separate folder with all the frames in '.png' format. This format is lossless and has better I/O performance.

  2. Create a folder data/dataset_name with two subfolders, train and test. Put the training videos in train and the testing videos in test.

  3. Create a config file config/dataset_name.yaml. In dataset_params, specify the root directory as root_dir: data/dataset_name. Also adjust the number of epochs in train_params.
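The folder layout from steps 1 and 2 can be sketched in Python as follows. This is an illustrative sketch, not part of the repository: the helper names and the video folder names (video_0001, video_0002) are hypothetical, and the demonstration runs in a temporary directory instead of the real data/ folder.

```python
import tempfile
from pathlib import Path


def make_dataset_layout(dataset_name, base_dir="data"):
    """Create base_dir/dataset_name with train and test subfolders."""
    root = Path(base_dir) / dataset_name
    for split in ("train", "test"):
        (root / split).mkdir(parents=True, exist_ok=True)
    return root


def list_videos_without_png_frames(split_dir):
    """Return names of video folders that contain no .png frames."""
    return sorted(
        d.name for d in Path(split_dir).iterdir()
        if d.is_dir() and not any(d.glob("*.png"))
    )


# Demonstration in a temporary directory (hypothetical folder names).
with tempfile.TemporaryDirectory() as tmp:
    root = make_dataset_layout("dataset_name", base_dir=tmp)
    (root / "train" / "video_0001").mkdir()
    (root / "train" / "video_0001" / "0000000.png").touch()
    (root / "train" / "video_0002").mkdir()  # no frames extracted yet
    missing = list_videos_without_png_frames(root / "train")
    splits = sorted(p.name for p in root.iterdir())

print(splits)
print(missing)  # ['video_0002']
```

Running list_videos_without_png_frames on both train and test before training is a quick way to catch video folders whose frames were never extracted.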

📜 Acknowledgement

Our DaGAN implementation is inspired by FOMM. We thank the authors of FOMM for making their code publicly available.

📜 BibTeX

@inproceedings{hong2022depth,
  title={Depth-Aware Generative Adversarial Network for Talking Head Video Generation},
  author={Hong, Fa-Ting and Zhang, Longhao and Shen, Li and Xu, Dan},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

📧 Contact

If you have any questions, please email [email protected].
