A new play-and-plug method of controlling an existing generative model with conditioning attributes and their compositions.

Related tags

Deep LearningLACE
Overview

Controllable and Compositional Generation with Latent-Space Energy-Based Models

Python 3.8 pytorch 1.7.1 Torchdiffeq 0.2.1

Teaser image Teaser image

Official PyTorch implementation of the NeurIPS 2021 paper:
Controllable and Compositional Generation with Latent-Space Energy-Based Models
Weili Nie, Arash Vahdat, Anima Anandkumar
https://nvlabs.github.io/LACE

Abstract: Controllable generation is one of the key requirements for successful adoption of deep generative models in real-world applications, but it still remains as a great challenge. In particular, the compositional ability to generate novel concept combinations is out of reach for most current models. In this work, we use energy-based models (EBMs) to handle compositional generation over a set of attributes. To make them scalable to high-resolution image generation, we introduce an EBM in the latent space of a pre-trained generative model such as StyleGAN. We propose a novel EBM formulation representing the joint distribution of data and attributes together, and we show how sampling from it is formulated as solving an ordinary differential equation (ODE). Given a pre-trained generator, all we need for controllable generation is to train an attribute classifier. Sampling with ODEs is done efficiently in the latent space and is robust to hyperparameters. Thus, our method is simple, fast to train, and efficient to sample. Experimental results show that our method outperforms the state-of-the-art in both conditional sampling and sequential editing. In compositional generation, our method excels at zero-shot generation of unseen attribute combinations. Also, by composing energy functions with logical operators, this work is the first to achieve such compositionality in generating photo-realistic images of resolution 1024x1024.

Requirements

  • Linux and Windows are supported, but we recommend Linux for performance and compatibility reasons.
  • 1 high-end NVIDIA GPU with at least 24 GB of memory. We have done all testing and development using a single NVIDIA V100 GPU with memory size 32 GB.
  • 64-bit Python 3.8.
  • CUDA=10.0 and docker must be installed first.
  • Installation of the required library dependencies with Docker:
    docker build -f lace-cuda-10p0.Dockerfile --tag=lace-cuda-10-0:0.0.1 .
    docker run -it -d --gpus 0 --name lace --shm-size 8G -v $(pwd):/workspace -p 5001:6006 lace-cuda-10-0:0.0.1
    docker exec -it lace bash

Experiments on CIFAR-10

The CIFAR10 folder contains the codebase to get the main results on the CIFAR-10 dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., the latent code and label pairs) from here and unzip it to the CIFAR10 folder. Or you can go to the folder CIFAR10/prepare_data and follow the instructions to generate the data.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

In the script run_clf.sh, the variable x can be specified to w or z, representing that the latent classifier is trained in the w-space or z-space of StyleGAN, respectively.

Sampling

To get the conditional sampling results with the ODE or Langevin dynamics (LD) sampler, you can run:

# ODE
bash scripts/run_cond_ode_sample.sh

# LD
bash scripts/run_cond_ld_sample.sh

By default, we set x to w, meaning we use the w-space classifier, because we find our method works the best in w-space. You can change the value of x to z or i to use the classifier in z-space or pixel space, for a comparison.

To compute the conditional accuracy (ACC) and FID scores in conditional sampling with the ODE or LD sampler, you can run:

# ODE
bash scripts/run_cond_ode_score.sh

# LD
bash scripts/run_cond_ld_score.sh

Note that:

  1. For the ACC evaluation, you need a pre-trained image classifier, which can be downloaded as instructed here;

  2. For the FID evaluation, you need to have the FID reference statistics computed beforehand. You can go to the folder CIFAR10/prepare_data and follow the instructions to compute the FID reference statistics with real images sampled from CIFAR-10.

Experiments on FFHQ

The FFHQ folder contains the codebase for getting the main results on the FFHQ dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., 10k pairs of latent variables and labels) from here (originally from StyleFlow) and unzip it to the FFHQ folder.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

Note that each att_name (i.e., glasses) in run_clf.sh corresponds to a separate attribute classifier.

Sampling

First, you have to get the pre-trained StyleGAN2 (config-f) by following the instructions in Convert StyleGAN2 weight from official checkpoints.

Conditional sampling

To get the conditional sampling results with the ODE or LD sampler, you can run:

# ODE
bash scripts/run_cond_ode_sample.sh

# LD
bash scripts/run_cond_ld_sample.sh

To compute the conditional accuracy (ACC) and FID scores in conditional sampling with the ODE or LD sampler, you can run:

# ODE
bash scripts/run_cond_ode_score.sh

# LD
bash scripts/run_cond_ld_score.sh

Note that:

  1. For the ACC evaluation, you need to train an FFHQ image classifier, as instructed here;

  2. For the FID evaluation, you need to have the FID reference statistics computed beforehand. You can go to the folder FFHQ/prepare_models_data and follow the instructions to compute the FID reference statistics with the StyleGAN generated FFHQ images.

Sequential editing

To get the qualitative and quantitative results of sequential editing, you can run:

# User-specified sampling
bash scripts/run_seq_edit_sample.sh

# ACC and FID
bash scripts/run_seq_edit_score.sh

Note that:

  • Similarly, you first need to train an FFHQ image classifier and get the FID reference statics to compute ACC and FID score by following the instructions, respectively.

  • To get the face identity preservation (ID) score, you first need to download the pre-trained ArcFace network, which is publicly available here, to the folder FFHQ/pretrained/metrics.

Compositional Generation

To get the results of zero-shot generation on novel attribute combinations, you can run:

bash scripts/run_zero_shot.sh

To get the results of compositions of energy functions with logical operators, we run:

bash scripts/run_combine_energy.sh

Experiments on MetFaces

The MetFaces folder contains the codebase for getting the main results on the MetFaces dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., 10k pairs of latent variables and labels) from here and unzip it to the MetFaces folder. Or you can go to the folder MetFaces/prepare_data and follow the instructions to generate the data.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

Note that each att_name (i.e., yaw) in run_clf.sh corresponds to a separate attribute classifier.

Sampling

To get the conditional sampling and sequential editing results, you can run:

# conditional sampling
bash scripts/run_cond_sample.sh

# sequential editing
bash scripts/run_seq_edit_sample.sh

Experiments on AFHQ-Cats

The AFHQ folder contains the codebase for getting the main results on the AFHQ-Cats dataset, where the scripts folder contains the necessary bash scripts to run the code.

Data preparation

Before running the code, you have to download the data (i.e., 10k pairs of latent variables and labels) from here and unzip it to the AFHQ folder. Or you can go to the folder AFHQ/prepare_data and follow the instructions to generate the data.

Training

To train the latent classifier, you can run:

bash scripts/run_clf.sh

Note that each att_name (i.e., breeds) in run_clf.sh corresponds to a separate attribute classifier.

Sampling

To get the conditional sampling and sequential editing results, you can run:

# conditional sampling
bash scripts/run_cond_sample.sh

# sequential editing
bash scripts/run_seq_edit_sample.sh

License

Please check the LICENSE file. This work may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact [email protected].

Citation

Please cite our paper, if you happen to use this codebase:

@inproceedings{nie2021controllable,
  title={Controllable and compositional generation with latent-space energy-based models},
  author={Nie, Weili and Vahdat, Arash and Anandkumar, Anima},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2021}
}
Owner
NVIDIA Research Projects
NVIDIA Research Projects
Key information extraction from invoice document with Graph Convolution Network

Key Information Extraction from Scanned Invoices Key information extraction from invoice document with Graph Convolution Network Related blog post fro

Phan Hoang 39 Dec 16, 2022
Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]

RTFM This repo contains the Pytorch implementation of our paper: Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Lear

Yu Tian 242 Jan 08, 2023
Code for AutoNL on ImageNet (CVPR2020)

Neural Architecture Search for Lightweight Non-Local Networks This repository contains the code for CVPR 2020 paper Neural Architecture Search for Lig

Yingwei Li 104 Aug 31, 2022
Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs, ICCV 2021

Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs, ICCV 2021 Global Pooling, More than Meets the Eye: Posi

Md Amirul Islam 32 Apr 24, 2022
D2Go is a toolkit for efficient deep learning

D2Go D2Go is a production ready software system from FacebookResearch, which supports end-to-end model training and deployment for mobile platforms. W

Facebook Research 744 Jan 04, 2023
PyTorch code to run synthetic experiments.

Code repository for Invariant Risk Minimization Source code for the paper: @article{InvariantRiskMinimization, title={Invariant Risk Minimization}

Facebook Research 345 Dec 12, 2022
Patch-Diffusion Code (AAAI2022)

Patch-Diffusion This is an official PyTorch implementation of "Patch Diffusion: A General Module for Face Manipulation Detection" in AAAI2022. Require

H 7 Nov 02, 2022
Syed Waqas Zamir 906 Dec 30, 2022
Tgbox-bench - Simple TGBOX upload speed benchmark

TGBOX Benchmark This script will benchmark upload speed to TGBOX storage. Build

Non 1 Jan 09, 2022
Official Implementation for the paper DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification

DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover’s Distance Improves Out-Of-Distribution Face Identification Official Implementation for the pape

Anh M. Nguyen 36 Dec 28, 2022
GLM (General Language Model)

GLM GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language underst

THUDM 421 Jan 04, 2023
A graph adversarial learning toolbox based on PyTorch and DGL.

GraphWar: Arms Race in Graph Adversarial Learning NOTE: GraphWar is still in the early stages and the API will likely continue to change. 🚀 Installat

Jintang Li 54 Jan 05, 2023
ICCV2021 - Mining Contextual Information Beyond Image for Semantic Segmentation

Introduction The official repository for "Mining Contextual Information Beyond Image for Semantic Segmentation". Our full code has been merged into ss

55 Nov 09, 2022
A Fast Knowledge Distillation Framework for Visual Recognition

FKD: A Fast Knowledge Distillation Framework for Visual Recognition Official PyTorch implementation of paper A Fast Knowledge Distillation Framework f

Zhiqiang Shen 129 Dec 24, 2022
Breast Cancer Classification Model is applied on a different dataset

Breast Cancer Classification Model is applied on a different dataset

1 Feb 04, 2022
Social Distancing Detector

Computer vision has opened up a lot of opportunities to explore into AI domain that were earlier highly limited. Here is an application of haarcascade classifier and OpenCV to develop a social distan

Ashish Pandey 2 Jul 18, 2022
Large scale embeddings on a single machine.

Marius Marius is a system under active development for training embeddings for large-scale graphs on a single machine. Training on large scale graphs

Marius 107 Jan 03, 2023
Recognize Handwritten Digits using Deep Learning on the browser itself.

MNIST on the Web An attempt to predict MNIST handwritten digits from my PyTorch model from the browser (client-side) and not from the server, with the

Harjyot Bagga 7 May 28, 2022
MoCoPnet - Deformable 3D Convolution for Video Super-Resolution

MoCoPnet: Exploring Local Motion and Contrast Priors for Infrared Small Target Super-Resolution Pytorch implementation of local motion and contrast pr

Xinyi Ying 28 Dec 15, 2022
Pseudo-Visual Speech Denoising

Pseudo-Visual Speech Denoising This code is for our paper titled: Visual Speech Enhancement Without A Real Visual Stream published at WACV 2021. Autho

Sindhu 94 Oct 22, 2022