Code for Talk-to-Edit (ICCV2021). Paper: Talk-to-Edit: Fine-Grained Facial Editing via Dialog.

Overview

Talk-to-Edit (ICCV2021)

Python 3.7 pytorch 1.6.0

This repository contains the implementation of the following paper:

Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang, Ziqi Huang, Xingang Pan, Chen Change Loy, Ziwei Liu
IEEE International Conference on Computer Vision (ICCV), 2021

[Paper] [Project Page] [CelebA-Dialog Dataset]

Overview

overall_structure

Dependencies and Installation

  1. Clone Repo

    git clone [email protected]:yumingj/Talk-to-Edit.git
  2. Create Conda Environment and Install Dependencies

    conda env create -f environment.yml
    conda activate talk_edit
    • Python >= 3.7
    • PyTorch >= 1.6
    • CUDA 10.1
    • GCC 5.4.0

Get Started

Editing

We provide scripts for editing using our pretrained models.

  1. First, download the pretrained models from this link and put them under ./download/pretrained_models as follows:

    ./download/pretrained_models
    ├── 1024_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── 128_field
    │   ├── Bangs.pth
    │   ├── Eyeglasses.pth
    │   ├── No_Beard.pth
    │   ├── Smiling.pth
    │   └── Young.pth
    ├── arcface_resnet18_110.pth
    ├── language_encoder.pth.tar
    ├── predictor_1024.pth.tar
    ├── predictor_128.pth.tar
    ├── stylegan2_1024.pth
    ├── stylegan2_128.pt
    ├── StyleGAN2_FFHQ1024_discriminator.pth
    └── eval_predictor.pth.tar
    
  2. You can try pure image editing without dialog instructions:

    python editing_wo_dialog.py \
       --opt ./configs/editing/editing_wo_dialog.yml \
       --attr 'Bangs' \
       --target_val 5

    The editing results will be saved in ./results.

    You can change attr to one of the following attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age). And the target_val can be [0, 1, 2, 3, 4, 5].

  3. You can also try dialog-based editing, where you talk to the system through the command prompt:

    python editing_with_dialog.py --opt ./configs/editing/editing_with_dialog.yml

    The editing results will be saved in ./results.

    How to talk to the system:

    • Our system is able to edit five facial attributes: Bangs, Eyeglasses, Beard, Smiling, and Young(i.e. Age).
    • When prompted with "Enter your request (Press enter when you finish):", you can enter an editing request about one of the five attributes. For example, you can say "Make the bangs longer."
    • To respond to the system's feedback, just talk as if you were talking to a real person. For example, if the system asks "Is the length of the bangs just right?" after one round of editing, You can say things like "Yes." / "No." / "Yes, and I also want her to smile more happily.".
    • To end the conversation, just tell the system things like "That's all" / "Nothing else, thank you."
  4. By default, the above editing would be performed on the teaser image. You may change the image to be edited in two ways: 1) change line 11: latent_code_index to other values ranging from 0 to 99; 2) set line 10: latent_code_path to ~, so that an image would be randomly generated.

  5. If you want to try editing on real images, you may download the real images from this link and put them under ./download/real_images. You could also provide other real images at your choice. You need to change line 12: img_path in editing_with_dialog.yml or editing_wo_dialog.yml according to the path to the real image and set line 11: is_real_image as True.

  6. You can switch the default image size to 128 x 128 by setting line 3: img_res to 128 in config files.

Train the Semantic Field

  1. To train the Semantic Field, a number of sampled latent codes should be prepared and then we use the attribute predictor to predict the facial attributes for their corresponding images. The attribute predictor is trained using fine-grained annotations in CelebA-Dialog dataset. Here, we provide the latent codes we used. You can download the train data from this link and put them under ./download/train_data as follows:

    ./download/train_data
    ├── 1024
    │   ├── Bangs
    │   ├── Eyeglasses
    │   ├── No_Beard
    │   ├── Smiling
    │   └── Young
    └── 128
        ├── Bangs
        ├── Eyeglasses
        ├── No_Beard
        ├── Smiling
        └── Young
    
  2. We will also use some editing latent codes to monitor the training phase. You can download the editing latent code from this link and put them under ./download/editing_data as follows:

    ./download/editing_data
    ├── 1024
    │   ├── Bangs.npz.npy
    │   ├── Eyeglasses.npz.npy
    │   ├── No_Beard.npz.npy
    │   ├── Smiling.npz.npy
    │   └── Young.npz.npy
    └── 128
        ├── Bangs.npz.npy
        ├── Eyeglasses.npz.npy
        ├── No_Beard.npz.npy
        ├── Smiling.npz.npy
        └── Young.npz.npy
    
  3. All logging files in the training process, e.g., log message, checkpoints, and snapshots, will be saved to ./experiments and ./tb_logger directory.

  4. There are 10 configuration files under ./configs/train, named in the format of field_<IMAGE_RESOLUTION>_<ATTRIBUTE_NAME>. Choose the corresponding configuration file for the attribute and resolution you want.

  5. For example, to train the semantic field which edits the attribute Bangs in 128x128 image resolution, simply run:

    python train.py --opt ./configs/train/field_128_Bangs.yml

Quantitative Results

We provide codes for quantitative results shown in Table 1. Here we use Bangs in 128x128 resolution as an example.

  1. Use the trained semantic field to edit images.

    python editing_quantitative.py \
    --opt ./configs/train/field_128_bangs.yml \
    --pretrained_path ./download/pretrained_models/128_field/Bangs.pth
  2. Evaluate the edited images using quantitative metircs. Change image_num for different attribute accordingly: Bangs: 148, Eyeglasses: 82, Beard: 129, Smiling: 140, Young: 61.

    python quantitative_results.py \
    --attribute Bangs \
    --work_dir ./results/field_128_bangs \
    --image_dir ./results/field_128_bangs/visualization \
    --image_num 148

Qualitative Results

result

CelebA-Dialog Dataset

result

Our CelebA-Dialog Dataset is available at link.

CelebA-Dialog is a large-scale visual-language face dataset with the following features:

  • Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning.
  • Accompanied with each image, there are captions describing the attributes and a user request sample.

result

The dataset can be employed as the training and test sets for the following computer vision tasks: fine-grained facial attribute recognition, fine-grained facial manipulation, text-based facial generation and manipulation, face image captioning, and broader natural language based facial recognition and manipulation tasks.

Citation

If you find our repo useful for your research, please consider citing our paper:

@InProceedings{jiang2021talkedit,
  author = {Jiang, Yuming and Huang, Ziqi and Pan, Xingang and Loy, Chen Change and Liu, Ziwei},
  title = {Talk-to-Edit: Fine-Grained Facial Editing via Dialog},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

Contact

If you have any question, please feel free to contact us via [email protected] or [email protected].

Acknowledgement

The codebase is maintained by Yuming Jiang and Ziqi Huang.

Part of the code is borrowed from stylegan2-pytorch, IEP and face-attribute-prediction.

Owner
Yuming Jiang
[email protected], Ph.D. Student
Yuming Jiang
PenguinSpeciesPredictionML - Basic model to predict Penguin species based on beak size and sex.

Penguin Species Prediction (ML) 🐧 👨🏽‍💻 What? 💻 This project is a basic model using sklearn methods to predict Penguin species based on beak size

Tucker Paron 0 Jan 08, 2022
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Tools for the Cleveland State Human Motion and Control Lab

Introduction This is a collection of tools that are helpful for gait analysis. Some are specific to the needs of the Human Motion and Control Lab at C

CSU Human Motion and Control Lab 88 Dec 16, 2022
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

Facebook Research 338 Dec 29, 2022
RL-driven agent playing tic-tac-toe on starknet against challengers.

tictactoe-on-starknet RL-driven agent playing tic-tac-toe on starknet against challengers. GUI reference: https://pythonguides.com/create-a-game-using

21 Jul 30, 2022
Reference implementation for Deep Unsupervised Learning using Nonequilibrium Thermodynamics

Diffusion Probabilistic Models This repository provides a reference implementation of the method described in the paper: Deep Unsupervised Learning us

Jascha Sohl-Dickstein 238 Jan 02, 2023
Code for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators"

Query Variation Generators This repository contains the code and annotation data for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelin

Gustavo Penha 12 Nov 20, 2022
Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets.

Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets. Introduction We propose our dataloader API for loading and

1 Nov 19, 2021
Codes accompanying the paper "Learning Nearly Decomposable Value Functions with Communication Minimization" (ICLR 2020)

NDQ: Learning Nearly Decomposable Value Functions with Communication Minimization Note This codebase accompanies paper Learning Nearly Decomposable Va

Tonghan Wang 69 Nov 26, 2022
Official Pytorch Implementation of: "Semantic Diversity Learning for Zero-Shot Multi-label Classification"(2021) paper

Semantic Diversity Learning for Zero-Shot Multi-label Classification Paper Official PyTorch Implementation Avi Ben-Cohen, Nadav Zamir, Emanuel Ben Bar

28 Aug 29, 2022
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation.

ENet This work has been published in arXiv: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. Packages: train contains too

e-Lab 344 Nov 21, 2022
Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)

Vision Transformer Pytorch reimplementation of Google's repository for the ViT model that was released with the paper An Image is Worth 16x16 Words: T

Eunkwang Jeon 1.4k Dec 28, 2022
Multi-modal co-attention for drug-target interaction annotation and Its Application to SARS-CoV-2

CoaDTI Multi-modal co-attention for drug-target interaction annotation and Its Application to SARS-CoV-2 Abstract Environment The test was conducted i

Layne_Huang 7 Nov 14, 2022
WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking

WebUAV-3M: A Benchmark Unveiling the Power of Million-Scale Deep UAV Tracking [Paper Link] Abstract In this work, we contribute a new million-scale Un

25 Jan 01, 2023
JAXDL: JAX (Flax) Deep Learning Library

JAXDL: JAX (Flax) Deep Learning Library Simple and clean JAX/Flax deep learning algorithm implementations: Soft-Actor-Critic (arXiv:1812.05905) Transf

Patrick Hart 4 Nov 27, 2022
Code associated with the paper "Deep Optics for Single-shot High-dynamic-range Imaging"

Deep Optics for Single-shot High-dynamic-range Imaging Code associated with the paper "Deep Optics for Single-shot High-dynamic-range Imaging" CVPR, 2

Stanford Computational Imaging Lab 40 Dec 12, 2022
A Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation

A Collection of Papers and Codes for ICCV2021 Low Level Vision and Image Generation

196 Jan 05, 2023
Human motion synthesis using Unity3D

Human motion synthesis using Unity3D Prerequisite: Software: amc2bvh.exe, Unity 2017, Blender. Unity: RockVR (Video Capture), scenes, character models

Hao Xu 9 Jun 01, 2022
An educational resource to help anyone learn deep reinforcement learning.

Status: Maintenance (expect bug fixes and minor updates) Welcome to Spinning Up in Deep RL! This is an educational resource produced by OpenAI that ma

OpenAI 7.6k Jan 09, 2023
To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beginners, intermediates as well as experts

JaxTon 💯 JAX exercises Mission 🚀 To provide 100 JAX exercises over different sections structured as a course or tutorials to teach and learn for beg

Rohan Rao 512 Jan 01, 2023