Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Last update: Sep 26, 2022

Woncheol Shin¹, Gyubok Lee¹, Jiyoung Lee¹, Joonseok Lee²,³, Edward Choi¹ | Paper

¹KAIST, ²Google Research, ³Seoul National University

Abstract

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that the current image quantizers do not satisfy translation equivariance in the quantized space due to aliasing, degrading performance in the downstream text-to-image generation and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space' and propose a simple but effective way to achieve translation equivariance by regularizing orthogonality in the codebook embedding vectors. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming the VQGAN.
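The abstract mentions regularizing orthogonality in the codebook embedding vectors. As a rough illustration only, the sketch below computes one common form of such a penalty: the mean squared deviation of the Gram matrix of L2-normalized code vectors from the identity. The function names (`l2_normalize`, `orthogonality_penalty`) and the 1/K² scaling are assumptions made for this example and are not taken from this repository; the exact loss and weighting used for TE-VQGAN may differ. It is written in plain Python for readability, whereas a training loop would likely express the same thing as a tensor operation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def orthogonality_penalty(codebook):
    """Mean squared deviation of the normalized Gram matrix from the identity.

    codebook: list of K embedding vectors, each a list of D floats.
    Returns 0.0 when all code vectors are mutually orthogonal.
    (Hypothetical helper for illustration; not the repository's API.)
    """
    unit = [l2_normalize(v) for v in codebook]
    k = len(unit)
    total = 0.0
    for i in range(k):
        for j in range(k):
            # Cosine similarity between code vectors i and j.
            gram_ij = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total / (k * k)


# Toy codebooks: two orthogonal codes versus two collinear codes.
orthogonal = [[2.0, 0.0], [0.0, 0.5]]
collinear = [[1.0, 0.0], [3.0, 0.0]]

print(orthogonality_penalty(orthogonal))  # 0.0
print(orthogonality_penalty(collinear))   # 0.5
```

In a setup like this, the penalty would typically be added to the quantizer's training loss with a small weight, so that code vectors are pushed toward mutual orthogonality without dominating the reconstruction objective.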

Requirements

TBU

Download Dataset

TBU

Training TE-VQGAN (Stage 1)

TBU

Training Bi-directional Image-Text Generator (Stage 2)

TBU

Thanks to

The implementation of 'TE-VQGAN' and 'Bi-directional Image-Text Generator' is based on VQGAN and DALLE-pytorch. Thanks to all related works!

Citation

@misc{shin2021translationequivariant,
      title={Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation}, 
      author={Woncheol Shin and Gyubok Lee and Jiyoung Lee and Joonseok Lee and Edward Choi},
      year={2021},
      eprint={2112.00384},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


