PyTorch implementation of D2C: Diffuison-Decoding Models for Few-shot Conditional Generation.

Related tags

Deep Learningd2c
Overview

D2C: Diffuison-Decoding Models for Few-shot Conditional Generation

Project | Paper

Open In Collab

PyTorch implementation of D2C: Diffuison-Decoding Models for Few-shot Conditional Generation.

Abhishek Sinha*, Jiaming Song*, Chenlin Meng, Stefano Ermon

Stanford University

Overview

Conditional generative models of high-dimensional images have many applications, but supervision signals from conditions to images can be expensive to acquire. This paper describes Diffusion-Decoding models with Contrastive representations (D2C), a paradigm for training unconditional variational autoencoders (VAEs) for few-shot conditional image generation. By learning from as few as 100 labeled examples, D2C can be used to generate images with a certain label or manipulate an existing image to contain a certain label. Compared with state-of-the-art StyleGAN2 methods, D2C is able to manipulate certain attributes efficiently while keeping the other details intact.

Here are some example for image manipulation. You can see more results here.

Attribute Original D2C StyleGAN2 NVAE DDIM
Blond
Red Lipstick
Beard

Getting started

The code has been tested on PyTorch 1.9.1 (CUDA 10.2).

To use the checkpoints, download the checkpoints from this link, under the checkpoints/ directory.

# Requires gdown >= 4.2.0, install with pip
gdown https://drive.google.com/drive/u/1/folders/1DvApt-uO3uMRhFM3eIqPJH-HkiEZC1Ru -O ./ --folder

Examples

The main.py file provides some basic scripts to perform inference on the checkpoints.

We will release training code soon on a separate repo, as the GPU memory becomes a bottleneck if we train the model jointly.

Example to perform image manipulation:

  • Red lipstick
python main.py ffhq_256 manipulation --d2c_path checkpoints/ffhq_256/model.ckpt --boundary_path checkpoints/ffhq_256/red_lipstick.ckpt --step 10 --image_dir images/red_lipstick --save_location results/red_lipstick
  • Beard
python main.py ffhq_256 manipulation --d2c_path checkpoints/ffhq_256/model.ckpt --boundary_path checkpoints/ffhq_256/beard.ckpt --step 20 --image_dir images/beard --save_location results/beard
  • Blond
python main.py ffhq_256 manipulation --d2c_path checkpoints/ffhq_256/model.ckpt --boundary_path checkpoints/ffhq_256/blond.ckpt --step -15 --image_dir images/blond --save_location results/blond

Example to perform unconditional image generation:

python main.py ffhq_256 sample_uncond --d2c_path checkpoints/ffhq_256/model.ckpt --skip 100 --save_location results/uncond_samples

Extensions

We implement a D2C class here that contains an autoencoder and a diffusion latent model. See code structure here.

Useful functions include: image_to_latent, latent_to_image, sample_latent, manipulate_latent, postprocess_latent, which are also called in main.py.

Todo

  • Release checkpoints and models for other datasets.
  • Release code for conditional generation.
  • Release training code and procedure to convert into inference model.
  • Train on higher resolution images.

References and Acknowledgements

If you find this repository useful for your research, please cite our work.

@inproceedings{sinha2021d2c,
  title={D2C: Diffusion-Denoising Models for Few-shot Conditional Generation},
  author={Sinha*, Abhishek and Song*, Jiaming and Meng, Chenlin and Ermon, Stefano},
  year={2021},
  month={December},
  abbr={NeurIPS 2021},
  url={https://arxiv.org/abs/2106.06819},
  booktitle={Neural Information Processing Systems},
  html={https://d2c-model.github.io}
}

This implementation is based on:

Owner
Jiaming Song
PhD @ Stanford CS. My Chinese name is Jiaming Song (宋佳铭). I also go by the name Tony.
Jiaming Song
Predict multi paths to a moving person depending on his trajectory history.

Multi-future Trajectory Prediction The project is about using the Multiverse model to make possible multible-future trajectory prediction for a seen p

Said Gamal 1 Jan 18, 2022
Attentional Focus Modulates Automatic Finger‑tapping Movements

"Attentional Focus Modulates Automatic Finger‑tapping Movements", in Scientific Reports

Xingxun Jiang 1 Dec 02, 2021
Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding

🍐 quince Code for Quantifying Ignorance in Individual-Level Causal-Effect Estimates under Hidden Confounding 🍐 Installation $ git clone

Andrew Jesson 19 Jun 23, 2022
Here we present the implementation in TensorFlow of our work about liver lesion segmentation accepted in the Machine Learning 4 Health Workshop

Detection-aided liver lesion segmentation Here we present the implementation in TensorFlow of our work about liver lesion segmentation accepted in the

Image Processing Group - BarcelonaTECH - UPC 96 Oct 26, 2022
Super-Fast-Adversarial-Training - A PyTorch Implementation code for developing super fast adversarial training

Super-Fast-Adversarial-Training This is a PyTorch Implementation code for develo

LBK 26 Dec 02, 2022
RL algorithm PPO and IRL algorithm AIRL written with Tensorflow.

RL algorithm PPO and IRL algorithm AIRL written with Tensorflow. They have a parallel sampling feature in order to increase computation speed (especially in high-performance computing (HPC)).

Fangjian Li 3 Dec 28, 2021
Code repository for the paper "Tracking People with 3D Representations"

Tracking People with 3D Representations Code repository for the paper "Tracking People with 3D Representations" (paper link) (project site). Jathushan

Jathushan Rajasegaran 77 Dec 03, 2022
Code for SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations

The Second Situated Interactive MultiModal Conversations (SIMMC 2.0) Challenge 2021 Welcome to the Second Situated Interactive Multimodal Conversation

Facebook Research 81 Nov 22, 2022
Permeability Prediction Via Multi Scale 3D CNN

Permeability-Prediction-Via-Multi-Scale-3D-CNN Data: The raw CT rock cores are obtained from the Imperial Colloge portal. The CT rock cores are sub-sa

Mohamed Elmorsy 2 Jul 06, 2022
A Unified Framework and Analysis for Structured Knowledge Grounding

UnifiedSKG 📚 : Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Code for paper UnifiedSKG: Unifying and Mu

HKU NLP Group 370 Dec 21, 2022
PyTorch implementation for SDEdit: Image Synthesis and Editing with Stochastic Differential Equations

SDEdit: Image Synthesis and Editing with Stochastic Differential Equations Project | Paper | Colab PyTorch implementation of SDEdit: Image Synthesis a

536 Jan 05, 2023
Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data.

Towhee is a flexible machine learning framework currently focused on computing deep learning embeddings over unstructured data.

1.7k Jan 08, 2023
Code for paper [ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot] (ICCV 2021, oral))

ACE: Ally Complementary Experts for Solving Long-Tailed Recognition in One-Shot This repository is the official PyTorch implementation of ICCV-21 pape

Jiarui 21 May 09, 2022
MIMIC Code Repository: Code shared by the research community for the MIMIC-III database

MIMIC Code Repository The MIMIC Code Repository is intended to be a central hub for sharing, refining, and reusing code used for analysis of the MIMIC

MIT Laboratory for Computational Physiology 1.8k Dec 26, 2022
Explanatory Learning: Beyond Empiricism in Neural Networks

Explanatory Learning This is the official repository for "Explanatory Learning: Beyond Empiricism in Neural Networks". Datasets Download the datasets

GLADIA Research Group 10 Dec 06, 2022
Really awesome semantic segmentation

really-awesome-semantic-segmentation A list of all papers on Semantic Segmentation and the datasets they use. This site is maintained by Holger Caesar

Holger Caesar 400 Nov 28, 2022
This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis

This repository contains the official implementation code of the paper Transformer-based Feature Reconstruction Network for Robust Multimodal Sentiment Analysis, accepted at ACMMM 2021.

Ziqi Yuan 10 Sep 30, 2022
Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

UPDeT Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight) The

hhhusiyi 96 Dec 22, 2022
This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

ResT By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software Technology at Nanjing University] This repo is the official implement

zhql 222 Dec 13, 2022
Anchor-free Oriented Proposal Generator for Object Detection

Anchor-free Oriented Proposal Generator for Object Detection Gong Cheng, Jiabao Wang, Ke Li, Xingxing Xie, Chunbo Lang, Yanqing Yao, Junwei Han, Intro

jbwang1997 56 Nov 15, 2022