Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more"

Overview

The Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more" Arxiv preprint

Louay Hazami   ·   Rayhane Mama   ·   Ragavan Thurairatnam


MIT license

Efficient-VDVAE is a memory- and compute-efficient very deep hierarchical VAE. It converges faster and is more stable than current hierarchical VAE models. It also achieves state-of-the-art likelihood-based performance on several image datasets.

Pre-trained model checkpoints

We provide checkpoints of pre-trained models on MNIST, CIFAR-10, Imagenet 32x32, Imagenet 64x64, CelebA 64x64, CelebAHQ 256x256 (5-bits and 8-bits), FFHQ 256x256 (5-bits and 8-bits), CelebAHQ 1024x1024 and FFHQ 1024x1024 through the links in the table below. All provided models are the ones trained for Table 4 of the paper.

Dataset                   | Pytorch Logs | Pytorch Checkpoints | JAX Logs | JAX Checkpoints | Negative ELBO
MNIST                     | link         | link                | link     | link            | 79.09 nats
CIFAR-10                  | Queued       | Queued              | link     | link            | 2.87 bits/dim
Imagenet 32x32            | link         | link                | link     | link            | 3.58 bits/dim
Imagenet 64x64            | link         | link                | link     | link            | 3.30 bits/dim
CelebA 64x64              | link         | link                | link     | link            | 1.83 bits/dim
CelebAHQ 256x256 (5-bits) | link         | link                | link     | link            | 0.51 bits/dim
CelebAHQ 256x256 (8-bits) | link         | link                | link     | link            | 1.35 bits/dim
FFHQ 256x256 (5-bits)     | link         | link                | link     | link            | 0.53 bits/dim
FFHQ 256x256 (8-bits)     | link         | link                | link     | link            | 2.17 bits/dim
CelebAHQ 1024x1024        | link         | link                | link     | link            | 1.01 bits/dim
FFHQ 1024x1024            | link         | link                | link     | link            | 2.30 bits/dim
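
The table reports MNIST in nats per image and every other dataset in bits per dimension. To put the two on the same scale, a per-image value in nats can be divided by the number of dimensions times ln 2. The sketch below applies that conversion; the function name is ours, and the 28x28x1 MNIST image shape is an assumption that this README does not state.

```python
import math


def nats_to_bits_per_dim(nats_per_image: float, num_dims: int) -> float:
    """Convert a per-image negative ELBO in nats to bits per dimension."""
    return nats_per_image / (num_dims * math.log(2))


# Assumption: MNIST images are 28x28 with a single channel (784 dimensions).
mnist_bpd = nats_to_bits_per_dim(79.09, 28 * 28 * 1)
print(f"MNIST negative ELBO: {mnist_bpd:.4f} bits/dim")
```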

Notes:

  • Downloading from the "Checkpoints" link will download the minimal required files to resume training/do inference. The minimal files are the model checkpoint file and the saved hyper-parameters of the run (explained further below).
  • Downloading from the "Logs" link will download additional pre-training logs such as tensorboard files or saved images from training. "Logs" also holds the saved hyper-parameters of the run.
  • Downloaded "Logs" and/or "Checkpoints" should always be unzipped in their implementation folder (efficient_vdvae_torch for Pytorch checkpoints and efficient_vdvae_jax for JAX checkpoints).
  • Some of the model checkpoints are missing in either Pytorch or JAX for the moment. We will update them soon.
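
As an illustration of the unzipping note above, here is a minimal sketch that extracts a downloaded archive into the matching implementation folder. The helper name and the archive file name in the usage comment are hypothetical; only the two folder names come from this README.

```python
import zipfile
from pathlib import Path

# Folder names as given in the notes above.
IMPLEMENTATION_FOLDERS = {
    "torch": "efficient_vdvae_torch",
    "jax": "efficient_vdvae_jax",
}


def unzip_into_implementation(archive: str, framework: str, repo_root: str = ".") -> Path:
    """Extract a downloaded Logs/Checkpoints archive into its implementation folder."""
    if framework not in IMPLEMENTATION_FOLDERS:
        raise ValueError(f"framework must be one of {sorted(IMPLEMENTATION_FOLDERS)}")
    target = Path(repo_root) / IMPLEMENTATION_FOLDERS[framework]
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


# Usage (hypothetical archive name):
# unzip_into_implementation("downloaded_checkpoints.zip", "jax")
```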

Pre-requisites

To run this codebase, you need:

  • A machine running a Linux-based OS (tested on Ubuntu 20.04 (LTS))
  • GPUs (preferably with more than 16GB of memory)
  • Docker
  • Python 3.7 or higher
  • CUDA 11.1 or higher (can be installed from here)

We recommend running all the code below inside a Linux screen or any other terminal multiplexer, since some commands can take hours/days to finish and you don't want them to die when you close your terminal.

Note:

  • If you're planning on running the JAX implementation, the installed JAX must match exactly the CUDA and cuDNN versions installed on the machine. Our default Dockerfile assumes the code will run with CUDA 11.4 or newer and should be changed otherwise. For more details, refer to JAX installation.

Installation

To create the docker image used in both the Pytorch and JAX implementations:

cd build  
docker build -t efficient_vdvae_image .  

Note:

  • When using the JAX library on Ampere architecture GPUs, training on multiple GPUs can randomly hang (issue). In that case, we provide an alternative docker image with an older version of JAX to bypass the issue until a solution is found.

All code executions should be done within a docker container. To start the docker container, we provide a utility script:

sh docker_run.sh  # Starts the container and attaches terminal
cd /workspace/Efficient-VDVAE  # Inside docker container

Setup datasets

All datasets can be automatically downloaded and pre-processed from the convenience script we provide:

cd data_scripts
sh download_and_preprocess.sh <dataset_name>

Notes:

  • <dataset_name> can be one of (imagenet32, imagenet64, celeba, celebahq, ffhq). The MNIST and CIFAR-10 datasets are downloaded automatically later when training the model, and they do not require any dataset setup.
  • For the celeba dataset, a manual download of img_align_celeba.zip and list_eval_partition.txt files is necessary. Both files should be placed under <project_path>/dataset_dumps/.
  • img_align_celeba.zip download link.
  • list_eval_partition.txt download link.
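
Before running the script for celeba, it can help to confirm that both manually downloaded files are in place. The following is a small sketch of such a check; the helper name is hypothetical, while the file names and the dataset_dumps folder come from the notes above.

```python
from pathlib import Path

# Files that must be downloaded manually for the celeba dataset.
REQUIRED_CELEBA_FILES = ("img_align_celeba.zip", "list_eval_partition.txt")


def missing_celeba_files(project_path: str) -> list:
    """Return the required celeba files not found under <project_path>/dataset_dumps/."""
    dumps = Path(project_path) / "dataset_dumps"
    return [name for name in REQUIRED_CELEBA_FILES if not (dumps / name).is_file()]


# Usage: an empty list means both files are where the script expects them.
# print(missing_celeba_files("/workspace/Efficient-VDVAE"))
```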

Setting the hyper-parameters

In this repository, we use the hparams library (already included in the Dockerfile) for hyper-parameter management:

  • Specify all run parameters (number of GPUs, model parameters, etc) in one .cfg file
  • Hparams evaluates any expression used as "value" in the .cfg file. "value" can be any basic python object (floats, strings, lists, etc) or any basic python expression (1/2, max(3, 7), etc.) as long as the evaluation does not require any library imports and does not rely on other values from the .cfg.
  • Hparams saves the configuration of previous runs for reproducibility, resuming training, etc.
  • All hparams are saved by name, and re-using the same name will recall the old run instead of making a new one.
  • The .cfg file is split into sections for readability, and all parameters in the file are accessible as class attributes in the codebase for convenience.
  • The HParams object keeps a global state throughout all the scripts in the code.
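
To make the "value is an evaluated expression" rule concrete, here is a minimal sketch of how such a .cfg could be read with the standard library. It is not the hparams library's implementation: the loader, the whitelist of builtins, the [optimizer] section and its keys are all assumptions for illustration (only run.name and run.seed are names taken from this README).

```python
import configparser

# Assumption: a small whitelist of builtins is enough for "basic" expressions.
SAFE_BUILTINS = {"max": max, "min": min, "abs": abs, "round": round, "len": len, "sum": sum}


def load_cfg(text: str) -> dict:
    """Parse .cfg text and evaluate every value as a basic Python expression."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names case-sensitive
    parser.read_string(text)
    config = {}
    for section in parser.sections():
        # No imports and no access to other values: each value is evaluated in isolation.
        config[section] = {
            key: eval(raw, {"__builtins__": SAFE_BUILTINS}, {})
            for key, raw in parser[section].items()
        }
    return config


# Hypothetical example file content.
EXAMPLE = """
[run]
name = 'my_run'
seed = 420

[optimizer]
learning_rate = 1/2 * 1e-3
widths = [max(3, 7), 16]
"""

cfg = load_cfg(EXAMPLE)
print(cfg["run"]["name"], cfg["optimizer"]["learning_rate"], cfg["optimizer"]["widths"])
```

In this sketch, a value that tries to import a library fails with a NameError, which mirrors the restriction described in the list above.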

We highly recommend having a deeper look into how this library works by reading the hparams library documentation, the parameters description and figures 4 and 5 in the paper before trying to run Efficient-VDVAE.

We have heavily tested the robustness and stability of our approach, so changing the model/optimization hyper-parameters for memory load reduction should not introduce any drastic instabilities as to make the model untrainable. That is of course as long as the changes don't negate the important stability points we describe in the paper.

Training the Efficient-VDVAE

To run Efficient-VDVAE in Torch:

cd efficient_vdvae_torch  
# Set the hyper-parameters in "hparams.cfg" file  
# Set "NUM_GPUS_PER_NODE" in "train.sh" file  
sh train.sh  

To run Efficient-VDVAE in JAX:

cd efficient_vdvae_jax  
# Set the hyper-parameters in "hparams.cfg" file  
python train.py  

If you want to run the model with fewer GPUs than available on the hardware, for example 2 GPUs out of 8:

CUDA_VISIBLE_DEVICES=0,1 sh train.sh  # For torch  
CUDA_VISIBLE_DEVICES=0,1 python train.py  # For JAX  
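
The same restriction can be applied when launching training from a Python script, for example when scheduling several runs. This is a minimal sketch with hypothetical helper names; it only sets CUDA_VISIBLE_DEVICES for the child process, exactly as the shell commands above do.

```python
import os
import subprocess


def gpu_restricted_env(gpu_ids, base_env=None) -> dict:
    """Copy an environment and expose only the given GPU ids to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch(command, gpu_ids, cwd):
    """Run a training command in `cwd` with only `gpu_ids` visible."""
    return subprocess.run(command, env=gpu_restricted_env(gpu_ids), cwd=cwd, check=True)


# Usage (equivalent to the JAX command above):
# launch(["python", "train.py"], [0, 1], cwd="efficient_vdvae_jax")
```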

Models automatically create checkpoints during training. To resume a model from its last checkpoint, set its <run.name> in the hparams.cfg file and re-run the same training commands.

Since training commands save the hparams of the run defined in the .cfg file, restarting a pre-existing run (by re-using its name in hparams.cfg) requires resetting it first. We provide a convenience script for resetting saved runs:

cd efficient_vdvae_torch  # or cd efficient_vdvae_jax  
sh reset.sh <run.name>  # <run.name> is the first field in hparams.cfg  

Note:

  • To make things easier for new users, we provide example hparams.cfg files under the egs folder. A detailed description of the role of each parameter is also included inside hparams.cfg.
  • Hparams in egs are to be viewed only as guiding examples; they are not meant to exactly match the pre-trained checkpoints or the experiments done in the paper.
  • While the example hparams under the naming convention ..._baseline.cfg are not exactly the hparams of the C2 models in the paper (pre-trained checkpoints), they make it easier to design models that achieve the same performance and can be treated as equivalents to the C2 models.
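
Because re-using a run name recalls the saved configuration, it can be useful to check whether a name is already taken before launching training. The sketch below looks for the saved hparams file, whose naming pattern (logs-<run.name>/hparams-<run.name>.cfg) is described in the inference section; the helper names are hypothetical, and the assumption is that the logs folder sits directly inside the implementation folder.

```python
from pathlib import Path


def saved_hparams_path(impl_dir: str, run_name: str) -> Path:
    """Path where the configuration of a run is assumed to be saved."""
    return Path(impl_dir) / f"logs-{run_name}" / f"hparams-{run_name}.cfg"


def run_already_exists(impl_dir: str, run_name: str) -> bool:
    """True if a saved configuration exists, so training would recall the old run."""
    return saved_hparams_path(impl_dir, run_name).is_file()


# Usage: decide between resuming and resetting (sh reset.sh <run.name>).
# if run_already_exists("efficient_vdvae_torch", "my_run"):
#     print("Saved run found: training will resume unless the run is reset.")
```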

Monitoring the training process

While writing this codebase, we put extra emphasis on verbosity and logging. Aside from the printed logs on terminal (during training), you can monitor the training progress and keep track of useful metrics using Tensorboard:

# While outside efficient_vdvae_torch or efficient_vdvae_jax  
# Run outside the docker container
tensorboard --logdir . --port <port_id> --reload_multifile True  

In the browser, navigate to localhost:<port_id> to visualize all saved metrics.

If Tensorboard is not installed (outside the docker container):

pip install --upgrade tensorboard

Inference with the Efficient-VDVAE

Efficient-VDVAE supports multiple inference modes:

  • "reconstruction": Encodes then decodes the test set images and computes test NLL and SSIM.
  • "generation": Generates random images from the prior distribution. Randomness is controlled by the run.seed parameter.
  • "div_stats": Pre-computes the average KL divergence stats used to determine turned-off variates (refer to section 7 of the paper). Note: This mode needs to be run before "encoding" mode and before trying to do masked "reconstruction" (Refer to hparams.cfg for a detailed description).
  • "encoding": Extracts the latent distribution from the inference model, pruned to the quantile defined by synthesis.variates_masks_quantile parameter. This latent distribution is usable in downstream tasks.
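
To give an intuition for how "div_stats" and the quantile parameter could interact, here is a sketch of one plausible pruning rule: variates whose average KL divergence falls at or below the chosen quantile are treated as turned off. This is an assumption for illustration only; the actual rule is defined in the codebase and in section 7 of the paper, and the function names and KL values below are made up.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def variate_mask(avg_kl, q):
    """Keep (True) the variates whose average KL is above the q-quantile threshold."""
    threshold = quantile(avg_kl, q)
    return [kl > threshold for kl in avg_kl]


# Made-up average KL per variate, as a "div_stats"-like pass might produce.
avg_kl = [0.0, 0.001, 0.5, 0.002, 1.2, 0.0]
mask = variate_mask(avg_kl, 0.5)
print(mask)
```

Under this reading, a higher quantile keeps fewer variates, which yields a more compact latent representation for downstream tasks.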

To run the inference:

cd efficient_vdvae_torch  # or cd efficient_vdvae_jax  
# Set the inference mode in "logs-<run.name>/hparams-<run.name>.cfg"  
# Set the same <run.name> in "hparams.cfg"  
python synthesize.py  

Notes:

  • Training a model with the name <run.name> saves its configuration under logs-<run.name>/hparams-<run.name>.cfg for reproducibility and error reduction. Any changes that one wants to make at inference time therefore need to be applied to the saved hparams file (logs-<run.name>/hparams-<run.name>.cfg) instead of the main hparams.cfg file.
  • The torch implementation currently doesn't support multi-GPU inference. The JAX implementation does.

Potential TODOs

  • Make data loaders Out-Of-Core (OOC) in Pytorch
  • Make data loaders Out-Of-Core (OOC) in JAX
  • Update pre-trained model checkpoints
  • Add Fréchet-Inception Distance (FID) and Inception Score (IS) as measures for sample quality performance.
  • Improve the format of the encoded dataset used in downstream tasks (output of encoding mode, if there is a need)
  • Write a decoding mode API (if needed).

Bibtex

If you happen to use this codebase, please cite our paper:

@article{hazami2022efficient,
  title={Efficient-VDVAE: Less is more},
  author={Hazami, Louay and Mama, Rayhane and Thurairatnam, Ragavan},
  journal={arXiv preprint arXiv:2203.13751},
  year={2022}
}