Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

Last update: Dec 22, 2022

Overview

CLIP-GLaSS

Repository for the paper Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search

An in-browser demo is available here

Installation

Clone this repository

git clone https://github.com/galatolofederico/clip-glass && cd clip-glass

Create a virtual environment and install the requirements

virtualenv --python=python3.6 env && . ./env/bin/activate
pip install -r requirements.txt

Run CLIP-GLaSS

You can run CLIP-GLaSS with:

python run.py --config <config> --target <target>

Specify <config> and <target> according to the following table:

Config	Meaning	Target Type
GPT2	Use GPT2 to solve the Image-to-Text task	Image
DeepMindBigGAN512	Use DeepMind's BigGAN 512x512 to solve the Text-to-Image task	Text
DeepMindBigGAN256	Use DeepMind's BigGAN 256x256 to solve the Text-to-Image task	Text
StyleGAN2_ffhq_d	Use StyleGAN2-ffhq to solve the Text-to-Image task	Text
StyleGAN2_ffhq_nod	Use StyleGAN2-ffhq without Discriminator to solve the Text-to-Image task	Text
StyleGAN2_church_d	Use StyleGAN2-church to solve the Text-to-Image task	Text
StyleGAN2_church_nod	Use StyleGAN2-church without Discriminator to solve the Text-to-Image task	Text
StyleGAN2_car_d	Use StyleGAN2-car to solve the Text-to-Image task	Text
StyleGAN2_car_nod	Use StyleGAN2-car without Discriminator to solve the Text-to-Image task	Text

If you have not downloaded the model weights, you will be prompted to run ./download-weights.sh. Results are saved in the folder ./tmp; a different output folder can be specified with --tmp-folder.
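To script several runs, the table above can be captured in a small lookup. The following is a minimal sketch, not part of the repository: TARGET_TYPES and build_command are hypothetical names introduced here for illustration, and only the flags --config, --target and --tmp-folder come from this README.

```python
import shlex

# Config names and target types copied from the table above.
TARGET_TYPES = {
    "GPT2": "Image",
    "DeepMindBigGAN512": "Text",
    "DeepMindBigGAN256": "Text",
    "StyleGAN2_ffhq_d": "Text",
    "StyleGAN2_ffhq_nod": "Text",
    "StyleGAN2_church_d": "Text",
    "StyleGAN2_church_nod": "Text",
    "StyleGAN2_car_d": "Text",
    "StyleGAN2_car_nod": "Text",
}


def build_command(config, target, tmp_folder=None):
    """Return the argument list for run.py (hypothetical helper)."""
    if config not in TARGET_TYPES:
        raise ValueError("unknown config: {!r}".format(config))
    cmd = ["python", "run.py", "--config", config, "--target", target]
    if tmp_folder is not None:
        # Optional output folder; run.py writes to ./tmp when it is omitted.
        cmd += ["--tmp-folder", tmp_folder]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "StyleGAN2_ffhq_d",
        "the face of a man with brown eyes and stubble beard",
        tmp_folder="./results",
    )
    # Print a shell-safe command line; pass cmd to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Looking up TARGET_TYPES[config] before a run tells you whether the target should be a caption ("Text") or a path to an image file ("Image").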

Examples

python run.py --config StyleGAN2_ffhq_d --target "the face of a man with brown eyes and stubble beard"
python run.py --config GPT2 --target gpt2_images/dog.jpeg

Acknowledgments and licensing

This work relies heavily on the following amazing repositories and would not have been possible without them:

CLIP from openai (included in the folder clip)
pytorch-pretrained-BigGAN from huggingface
stylegan2-pytorch from Adrian Sahlman (included in the folder stylegan2)
gpt-2-pytorch from Tae-Hwan Jung (included in the folder gpt2)

All their work can be shared under the terms of the respective original licenses.

All my original work (everything except the content of the folders clip, stylegan2 and gpt2) is released under the terms of the GNU/GPLv3 license. Copying, adapting and republishing it is not only permitted but also encouraged.

Citing

If you want to cite us, you can use this BibTeX entry:

@article{galatolo_glass
,	author	= {Galatolo, Federico A and Cimino, Mario GCA and Vaglini, Gigliola}
,	title	= {Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search}
,	year	= {2021}
}

Contacts

For any further questions, feel free to reach me at [email protected] or on Telegram @galatolo

Owner

Federico Galatolo
