PyTorch Implementation of Vector Quantized Variational AutoEncoders.

Overview

PyTorch implementation of VQ-VAE.

The VQ-VAE paper combines two tricks:

  1. Vector Quantization (see the linked blog post for a more detailed explanation).
  2. Straight-Through Estimator (it makes back-propagation through the discrete latent variables possible, since the quantization step itself has no useful gradient).

Architecture

This model has a neural-network encoder and decoder, and a prior, just like the vanilla Variational AutoEncoder (VAE). In addition, it has a latent embedding space called the codebook (size: K x D), where K is the number of embeddings (the size of the discrete latent space) and D is the dimension of each embedding e.
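For concreteness, here is a minimal sketch of how such a codebook could be declared in PyTorch. The names `K`, `D`, and `codebook`, and the initialization range, are illustrative assumptions, not the exact identifiers used in this repository:

```python
import torch.nn as nn

K, D = 512, 64  # illustrative: number of embeddings and embedding dimension

# The codebook is a learnable K x D embedding table.
codebook = nn.Embedding(K, D)
# A small uniform init is a common choice for the codebook weights.
codebook.weight.data.uniform_(-1.0 / K, 1.0 / K)
```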

In a vanilla VAE, the encoder output z(x) parameterizes a Normal/Gaussian distribution, which is sampled (via the 'reparameterization trick') to obtain a latent representation z of the input x; that latent is then passed to the decoder. In a VQ-VAE, however, the encoder output ze(x) is used as a "key" for a nearest-neighbour lookup into the embedding codebook, yielding zq(x), the closest embedding in the space. This is the Vector Quantization (VQ) operation. zq(x) is then passed to the decoder, which reconstructs the input x. The decoder can either parameterize p(x|z) as the mean of a Normal distribution using a transposed-convolution layer, as in the vanilla VAE, or autoregressively generate a categorical distribution over the [0, 255] pixel values, as in PixelCNN. This project uses the first approach.
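A rough sketch of the VQ operation described above, assuming flattened encoder outputs `z_e` of shape (N, D) and the `codebook` from the previous snippet (both illustrative, not this repository's exact API). The last line is the straight-through trick, which copies decoder gradients past the non-differentiable argmin:

```python
import torch

def vector_quantize(z_e, codebook):
    """Nearest-neighbour lookup of encoder outputs z_e (N, D) in the codebook (K, D)."""
    # Pairwise Euclidean distances between encoder outputs and codebook entries.
    distances = torch.cdist(z_e, codebook.weight)   # (N, K)
    indices = distances.argmin(dim=1)                # (N,) index of the closest embedding
    z_q = codebook(indices)                          # (N, D) quantized latents zq(x)

    # Straight-through estimator: the forward pass uses z_q, while the backward
    # pass copies gradients from z_q directly to z_e.
    z_q_st = z_e + (z_q - z_e).detach()
    return z_q_st, z_q, indices
```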

The loss function consists of three components:

  1. Regular Reconstruction loss
  2. Vector Quantization loss
  3. Commitment loss

The Vector Quantization loss, ||sg[ze(x)] - e||^2, encourages the codebook embeddings to move closer to the encoder outputs, while the Commitment loss, ||ze(x) - sg[e]||^2, encourages the encoder output to stay close to the embedding it picked, i.e. to commit to its codebook embedding. The commitment loss is multiplied by a constant beta, which is 1.0 in this project. Here, sg means "stop-gradient": gradients are not propagated through that term.
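As an illustration, the three terms might be combined as follows. This is a sketch under the assumptions of the previous snippets, with `x_recon` the decoder output, `beta = 1.0` as in this project, and `detach()` playing the role of the stop-gradient sg; the function name and signature are hypothetical:

```python
import torch.nn.functional as F

beta = 1.0  # commitment loss weight used in this project

def vqvae_loss(x, x_recon, z_e, z_q):
    # 1. Reconstruction loss (decoder parameterizes the mean of a Gaussian).
    recon_loss = F.mse_loss(x_recon, x)
    # 2. Vector Quantization loss: ||sg[ze(x)] - e||^2 pulls codebook entries toward encoder outputs.
    vq_loss = F.mse_loss(z_q, z_e.detach())
    # 3. Commitment loss: ||ze(x) - sg[e]||^2 keeps encoder outputs close to their chosen embedding.
    commit_loss = F.mse_loss(z_e, z_q.detach())
    return recon_loss + vq_loss + beta * commit_loss
```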

Results:

The model is trained on the MNIST and CIFAR10 datasets.

Target 👉 Reconstructed Image


Details:

  1. Trained models for MNIST and CIFAR10 are in the Trained models directory.
  2. The hidden size of the bottleneck (z) is 128 for MNIST and 256 for CIFAR10.