Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

Last update: Sep 11, 2022

Overview

VQGAN-CLIP-Docker

About

This is a stripped-down, minimal-dependency repository for running VQGAN+CLIP locally or in production.

For a Google Colab notebook, see the original repository.

Samples

Setup

Clone this repository and cd inside.

git clone https://github.com/kcosta42/VQGAN-CLIP-Docker.git
cd VQGAN-CLIP-Docker

Download a VQGAN model and put it in the ./models folder.

Dataset	Link
ImageNet (f=16), 16384	vqgan_imagenet_f16_16384

For GPU support, make sure CUDA is installed on your system (tested with CUDA 11.1+).

6 GB of VRAM is required to generate 256x256 images.
11 GB of VRAM is required to generate 512x512 images.
24 GB of VRAM is required to generate 1024x1024 images. (Untested)
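
The VRAM figures above can be turned into a quick lookup. The following is a minimal sketch, not part of the repository: the function name `largest_square_size` and the `VRAM_REQUIREMENTS` table are hypothetical helpers built only from the numbers listed above, and the 1024x1024 entry remains untested, as noted.

```python
from typing import Optional

# (minimum VRAM in GB, square image side in pixels), taken from the list above.
# The 24 GB / 1024 px entry is marked untested in this README.
VRAM_REQUIREMENTS = [(24, 1024), (11, 512), (6, 256)]


def largest_square_size(vram_gb: float) -> Optional[int]:
    """Return the largest listed image side that fits in vram_gb, or None."""
    for min_gb, side in VRAM_REQUIREMENTS:
        if vram_gb >= min_gb:
            return side
    return None


if __name__ == "__main__":
    for gb in (4, 8, 12, 24):
        print(gb, "GB ->", largest_square_size(gb))
```

A result of `None` means the card is below the smallest listed requirement, in which case the CPU path described under Usage is the fallback.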

Local

Install the Python requirements

python3 -m pip install -r requirements.txt

To check whether you can run this on your GPU, run the following command. It must print True.

python3 -c "import torch; print(torch.cuda.is_available());"

Docker

Make sure you have docker and docker-compose installed. nvidia-docker is needed if you want to run this on your GPU through Docker.

A Makefile is provided for ease of use.

make build  # Build the docker image

Usage

Two configuration files are provided: ./configs/local.json and ./configs/docker.json. They are ready to use, but you may want to edit them to meet your needs. See the Configuration section for a description of each field.

The resulting generations can be found in the ./outputs folder.

GPU

To run locally:

python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate

CPU

To run locally:

DEVICE=cpu python3 -m scripts.generate -c ./configs/local.json

To run on docker:

make generate-cpu
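
The local CPU command above selects the device through a DEVICE environment variable. How scripts.generate reads that variable is not shown in this README, so the sketch below is only one plausible way such a switch could work. The function `resolve_device` and its fallback rules are assumptions for illustration, not the repository's code.

```python
import os
from typing import Mapping, Optional


def resolve_device(cuda_available: bool,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a device name from a DEVICE variable, falling back to CPU.

    Hypothetical helper: an explicit, usable DEVICE value wins; otherwise
    CUDA is used when available and CPU when it is not.
    """
    env = os.environ if env is None else env
    requested = env.get("DEVICE", "").strip().lower()
    if requested == "cpu":
        return "cpu"
    if requested.startswith("cuda") and cuda_available:
        return requested
    return "cuda" if cuda_available else "cpu"


if __name__ == "__main__":
    # Without PyTorch at hand, pretend CUDA is unavailable.
    print(resolve_device(cuda_available=False))
```

With PyTorch installed, the `cuda_available` argument would come from `torch.cuda.is_available()`, the same check used in the Setup section.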

Configuration

Argument	Type	Description
`prompts`	List[str]	Text prompts
`image_prompts`	List[FilePath]	Image prompts / target image path
`max_iterations`	int	Number of iterations
`save_freq`	int	Number of iterations between saved images
`size`	[int, int]	Image size (width, height)
`init_image`	FilePath	Initial image
`init_noise`	str	Initial noise image ['gradient','pixels']
`init_weight`	float	Initial weight
`output_dir`	FilePath	Path to output directory
`models_dir`	FilePath	Path to models cache directory
`clip_model`	FilePath	CLIP model path or name
`vqgan_checkpoint`	FilePath	VQGAN checkpoint path
`vqgan_config`	FilePath	VQGAN config path
`noise_prompt_seeds`	List[int]	Noise prompt seeds
`noise_prompt_weights`	List[float]	Noise prompt weights
`step_size`	float	Learning rate
`cutn`	int	Number of cuts
`cut_pow`	float	Cut power
`seed`	int	Seed (-1 for random seed)
`optimizer`	str	Optimiser ['Adam','AdamW','Adagrad','Adamax','DiffGrad','AdamP','RAdam']
`augments`	List[str]	Enabled augments ['Ji','Sh','Gn','Pe','Ro','Af','Et','Ts','Cr','Er','Re']
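
For orientation, a configuration using some of the fields above might look like the following. This is an illustrative sketch: every value (prompt text, paths, iteration counts, chosen augments) is a placeholder picked for the example, not the content of the shipped ./configs/local.json, and the `check_config` helper is hypothetical. Only the field names and the allowed choices come from the table.

```python
import json

# Placeholder values only; field names come from the table above.
example_config = {
    "prompts": ["a lighthouse at sunset"],
    "image_prompts": [],
    "max_iterations": 250,
    "save_freq": 50,
    "size": [256, 256],          # width, height
    "init_noise": "gradient",    # 'gradient' or 'pixels'
    "output_dir": "./outputs",
    "models_dir": "./models",
    "step_size": 0.1,
    "cutn": 32,
    "seed": -1,                  # -1 requests a random seed
    "optimizer": "Adam",
    "augments": ["Af", "Pe", "Ji", "Er"],
}

# Allowed choices copied from the table.
ALLOWED = {
    "init_noise": {"gradient", "pixels"},
    "optimizer": {"Adam", "AdamW", "Adagrad", "Adamax",
                  "DiffGrad", "AdamP", "RAdam"},
}
ALLOWED_AUGMENTS = {"Ji", "Sh", "Gn", "Pe", "Ro", "Af",
                    "Et", "Ts", "Cr", "Er", "Re"}


def check_config(cfg: dict) -> list:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in cfg and cfg[key] not in allowed:
            problems.append(f"{key}: {cfg[key]!r} not in {sorted(allowed)}")
    unknown = set(cfg.get("augments", [])) - ALLOWED_AUGMENTS
    if unknown:
        problems.append(f"augments: unknown {sorted(unknown)}")
    size = cfg.get("size")
    if size is not None and (
        len(size) != 2
        or not all(isinstance(v, int) and v > 0 for v in size)
    ):
        problems.append("size: expected [width, height] as positive ints")
    return problems


if __name__ == "__main__":
    print(check_config(example_config))
    print(json.dumps(example_config, indent=2))
```

Running a check like this before a long generation catches misspelled optimizer or augment names early. The actual script may validate its configuration differently.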

Acknowledgments

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}

@misc{esser2020taming,
      title={Taming Transformers for High-Resolution Image Synthesis},
      author={Patrick Esser and Robin Rombach and Björn Ommer},
      year={2020},
      eprint={2012.09841},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{ramesh2021zeroshot,
    title   = {Zero-Shot Text-to-Image Generation},
    author  = {Aditya Ramesh and Mikhail Pavlov and Gabriel Goh and Scott Gray and Chelsea Voss and Alec Radford and Mark Chen and Ilya Sutskever},
    year    = {2021},
    eprint  = {2102.12092},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

Owner

Kevin Costa
