TalkingHead-1KH is a talking-head dataset consisting of YouTube videos

Overview

TalkingHead-1KH Dataset

Python 3.7 License CC Format MP4 Resolution 512×512 Videos 500k

TalkingHead-1KH is a talking-head dataset consisting of YouTube videos, originally created as a benchmark for face-vid2vid:

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
Ting-Chun Wang (NVIDIA), Arun Mallya (NVIDIA), Ming-Yu Liu (NVIDIA)
https://nvlabs.github.io/face-vid2vid/
https://arxiv.org/abs/2011.15126.pdf

The dataset consists of 500k video clips, of which about 80k are greater than 512x512 resolution. Only videos under permissive licenses are included. Note that the number of videos differ from that in the original paper because a more robust preprocessing script was used to split the videos. For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

Download

Unzip the video metadata

First, unzip the metadata and put it under the root directory:

unzip data_list.zip

Unit test

This step downloads a small subset of the dataset to verify the scripts are working on your computer. You can also skip this step if you want to directly download the entire dataset.

bash videos_download_and_crop.sh small

The processed clips should appear in small/cropped_clips.

Download the entire dataset

Please run

bash videos_download_and_crop.sh train

The script will automatically download the YouTube videos, split them into short clips, and then crop and trim them to include only the face regions. The final processed clips should appear in train/cropped_clips.

Evaluation

To download the evaluation set which consists of only 1080p videos, please run

bash videos_download_and_crop.sh val

The processed clips should appear in val/cropped_clips.

We also provide the reconstruction results synthesized by our model here. For each video, we use only the first frame to reconstruct all the following frames.

Furthermore, for models trained using the VoxCeleb2 dataset, we also provide comparisons using another model trained on the VoxCeleb2 dataset. Please find the reconstruction results here.

Licenses

The individual videos were published in YouTube by their respective authors under Creative Commons BY 3.0 license. The metadata file, the download script file, the processing script file, and the documentation file are made available under MIT license. You can use, redistribute, and adapt it, as long as you (a) give appropriate credit by citing our paper, (b) indicate any changes that you've made, and (c) distribute any derivative works under the same license.

Privacy

When collecting the data, we were careful to only include videos that – to the best of our knowledge – were intended for free use and redistribution by their respective authors. That said, we are committed to protecting the privacy of individuals who do not wish their videos to be included.

If you would like to remove your video from the dataset, you can either

  1. Go to YouTube and change the license of your video, or remove your video entirely.
  2. Contact [email protected]. Please include your YouTube video link in the email.

Acknowledgements

This webpage borrows heavily from the FFHQ-dataset page.

Citation

If you use this dataset for your work, please cite

@inproceedings{wang2021facevid2vid,
  title={One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing},
  author={Ting-Chun Wang and Arun Mallya and Ming-Yu Liu},
  booktitle={CVPR},
  year={2021}
}
A simple interface for editing natural photos with generative neural networks.

Neural Photo Editor A simple interface for editing natural photos with generative neural networks. This repository contains code for the paper "Neural

Andy Brock 2.1k Dec 29, 2022
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

Yunan Zhu 23 Nov 05, 2022
Official Repsoitory for "Mish: A Self Regularized Non-Monotonic Neural Activation Function" [BMVC 2020]

Mish: Self Regularized Non-Monotonic Activation Function BMVC 2020 (Official Paper) Notes: (Click to expand) A considerably faster version based on CU

Xa9aX ツ 1.2k Dec 29, 2022
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 02, 2023
Large-scale language modeling tutorials with PyTorch

Large-scale language modeling tutorials with PyTorch 안녕하세요. 저는 TUNiB에서 머신러닝 엔지니어로 근무 중인 고현웅입니다. 이 자료는 대규모 언어모델 개발에 필요한 여러가지 기술들을 소개드리기 위해 마련하였으며 기본적으로

TUNiB 172 Dec 29, 2022
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req

Rowel Atienza 152 Dec 28, 2022
Deploy pytorch classification model using Flask and Streamlit

Deploy pytorch classification model using Flask and Streamlit

Ben Seo 1 Nov 17, 2021
Implementation of CVAE. Trained CVAE on faces from UTKFace Dataset to produce synthetic faces with a given degree of happiness/smileyness.

Conditional Smiles! (SmileCVAE) About Implementation of AE, VAE and CVAE. Trained CVAE on faces from UTKFace Dataset. Using an encoding of the Smile-s

Raúl Ortega 3 Jan 09, 2022
Official repository of DeMFI (arXiv.)

DeMFI This is the official repository of DeMFI (Deep Joint Deblurring and Multi-Frame Interpolation). [ArXiv_ver.] Coming Soon. Reference Jihyong Oh a

Jihyong Oh 56 Dec 14, 2022
[ ICCV 2021 Oral ] Our method can estimate camera poses and neural radiance fields jointly when the cameras are initialized at random poses in complex scenarios (outside-in scenes, even with less texture or intense noise )

GNeRF This repository contains official code for the ICCV 2021 paper: GNeRF: GAN-based Neural Radiance Field without Posed Camera. This implementation

Quan Meng 191 Dec 26, 2022
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

Arthur Bražinskas 39 Jan 01, 2023
Pywonderland - A tour in the wonderland of math with python.

A Tour in the Wonderland of Math with Python A collection of python scripts for drawing beautiful figures and animating interesting algorithms in math

Zhao Liang 4.1k Jan 03, 2023
A repository built on the Flow software package to explore cyber-security attacks on intelligent transportation systems.

A repository built on the Flow software package to explore cyber-security attacks on intelligent transportation systems.

George Gunter 4 Nov 14, 2022
The most simple and minimalistic navigation dashboard.

Navigation This project follows a goal to have simple and lightweight dashboard with different links. I use it to have my own self-hosted service dash

Yaroslav 23 Dec 23, 2022
Pytorch implementation for "Density-aware Chamfer Distance as a Comprehensive Metric for Point Cloud Completion" (NeurIPS 2021)

Density-aware Chamfer Distance This repository contains the official PyTorch implementation of our paper: Density-aware Chamfer Distance as a Comprehe

Tong WU 93 Dec 15, 2022
H&M Fashion Image similarity search with Weaviate and DocArray

H&M Fashion Image similarity search with Weaviate and DocArray This example shows how to do image similarity search using DocArray and Weaviate as Doc

Laura Ham 18 Aug 11, 2022
This is a tensorflow-based rotation detection benchmark, also called AlphaRotate.

AlphaRotate: A Rotation Detection Benchmark using TensorFlow Abstract AlphaRotate is maintained by Xue Yang with Shanghai Jiao Tong University supervi

yangxue 972 Jan 05, 2023
Repository for training material for the 2022 SDSC HPC/CI User Training Course

hpc-training-2022 Repository for training material for the 2022 SDSC HPC/CI Training Series HPC/CI Training Series home https://www.sdsc.edu/event_ite

sdsc-hpc-training-org 21 Jul 27, 2022
[ACM MM 2019 Oral] Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation

Contents Cycle-In-Cycle GANs Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Acknowledgments Relat

Hao Tang 67 Dec 14, 2022
Denoising Normalizing Flow

Denoising Normalizing Flow Christian Horvat and Jean-Pascal Pfister 2021 We combine Normalizing Flows (NFs) and Denoising Auto Encoder (DAE) by introd

CHrvt 17 Oct 15, 2022