CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Overview

This project is a PyTorch implementation of the paper "CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP", built on the Fire-Flyer II (萤火二号) cluster.

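CLIP-GEN is a language-free text-to-image generation method: it needs no paired image-text training samples. By relying on the strong representations of a pretrained CLIP model, it can train a text-to-image model from image data alone. The approach works as follows. CLIP-GEN first trains a VQ-GAN that maps images to a discrete space. It then trains a GPT model that maps CLIP embeddings to the VQ-GAN's discrete space. Because text and images share one feature space in CLIP, at inference time the same procedure maps a text prompt to the VQ-GAN's discrete space, and the resulting tokens are decoded into an RGB image.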
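The idea can be illustrated with a toy sketch in plain Python. This is not the project's code: the 2-D vectors, the four-entry codebook, and the names `quantize`, `train_lookup`, and `generate` are all hypothetical, and a nearest-neighbour lookup stands in for the GPT model. It only shows the data flow: features become discrete tokens, a model learns to go from image embeddings to tokens, and a text embedding from the same space is used as the condition at inference.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(features, codebook):
    """Stage 1 stand-in: map each feature vector to the index of its nearest code."""
    return [min(range(len(codebook)), key=lambda i: sq_dist(f, codebook[i]))
            for f in features]

def dequantize(tokens, codebook):
    """Decoder stand-in: map token indices back to codebook vectors."""
    return [codebook[t] for t in tokens]

def cosine(a, b):
    """Cosine similarity, the usual way to compare embeddings in a shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical toy data: a 4-entry codebook and two "images", each with a
# made-up image embedding and two patch features.
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
TRAIN = [
    {"image_embedding": [0.9, 0.1], "patches": [[0.9, 0.1], [1.1, 0.9]]},
    {"image_embedding": [0.1, 0.9], "patches": [[0.1, 0.8], [-0.1, 0.1]]},
]

def train_lookup(samples, codebook):
    """Stage 2 stand-in: store (image embedding, tokens) pairs.
    The method described above trains a GPT here; no text is involved."""
    return [(s["image_embedding"], quantize(s["patches"], codebook)) for s in samples]

def generate(embedding, memory, codebook):
    """Inference: condition on any embedding from the shared space."""
    _, tokens = max(memory, key=lambda pair: cosine(embedding, pair[0]))
    return tokens, dequantize(tokens, codebook)

memory = train_lookup(TRAIN, CODEBOOK)
text_embedding = [0.8, 0.2]  # hypothetical embedding of a text prompt
tokens, decoded = generate(text_embedding, memory, CODEBOOK)
print(tokens, decoded)
```

Training only ever sees image embeddings, yet `generate` accepts a text embedding unchanged because both live in the same space; that is the property the method relies on.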

Requirements

  • hfai (to be released soon)
  • torch>=1.8

Training

Supported datasets: coco, imagenet, googlecc

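  1. Download the pretrained CLIP model

    Download CLIP and place it at pretrained/clip_vit_b32.pt. This pretrained model comes from OpenAI.

  2. Train VQGAN on COCO

    Submit the job to the Fire-Flyer (萤火) cluster: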

    hfai python train_vqgan.py --ds coco -- -n 1 -p 30

    Run locally:

    python train_vqgan.py --ds coco
  3. Train the conditional GPT on COCO

    Submit the job to the Fire-Flyer (萤火) cluster:

    hfai python train_gpt.py --ds coco --vqgan_ckpt /path/to/vqgan/ckpt -- -n 4 -p 30

    Run locally:

    python train_gpt.py --ds coco --vqgan_ckpt /path/to/vqgan/ckpt
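The two local training stages above can be scripted in order. The following is a minimal sketch, not part of the repository: the helper name `local_training_commands` is hypothetical, and it assumes that `--ds` accepts each of the dataset names listed under "Supported datasets" for both scripts (only `coco` is shown in the commands above). It uses only the flags that appear in this README.

```python
import shlex

# Dataset names as listed in this README.
SUPPORTED_DATASETS = ("coco", "imagenet", "googlecc")

def local_training_commands(dataset, vqgan_ckpt):
    """Return the two local training commands, in the order they must run.

    The GPT stage needs the VQGAN checkpoint produced by the first stage,
    so `vqgan_ckpt` should point at that file once stage one has finished.
    """
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset: {dataset!r}")
    return [
        ["python", "train_vqgan.py", "--ds", dataset],
        ["python", "train_gpt.py", "--ds", dataset, "--vqgan_ckpt", vqgan_ckpt],
    ]

# Print the commands instead of running them; the checkpoint path is the
# placeholder used in this README.
for cmd in local_training_commands("coco", "/path/to/vqgan/ckpt"):
    print(shlex.join(cmd))
```

To actually launch a stage, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root.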

Demo

Download the VQGAN and GPT models trained on COCO and place them at pretrained/vqgan_coco.pt and pretrained/gpt_coco.pt respectively, then run:

python demo.py --text "A city bus driving on the city street" --out "bus.jpg"

NOTE: the demo does not depend on hfai; it can be run directly in any environment with PyTorch installed.
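Before running the demo it can help to confirm that the checkpoints are where the instructions above expect them. This is a small sketch using only the standard library; the helper `missing_checkpoints` is hypothetical, and it checks only the two files named in this section (whether the demo also needs pretrained/clip_vit_b32.pt is not stated here, so that file is left out).

```python
from pathlib import Path

# Checkpoint locations named in the Demo section, relative to the repo root.
DEMO_CHECKPOINTS = ("pretrained/vqgan_coco.pt", "pretrained/gpt_coco.pt")

def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in DEMO_CHECKPOINTS if not (root / rel).is_file()]

missing = missing_checkpoints(".")
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All demo checkpoints found.")
```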

Samples

Below are some text-to-image samples:

[Sample images: tower, bus, living, train, skiing]

References

Citation

@article{wang2022clip,
  title={CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP},
  author={Wang, Zihao and Liu, Wei and He, Qian and Wu, Xinglong and Yi, Zili},
  journal={arXiv preprint arXiv:2203.00386},
  year={2022}
}

TODO

  • Pretrained models
  • FFRecord data
Comments
  • "nn.TransformerEncoderLayer" is adopted to construct the "conditional transformer" in your paper.

    Thanks for your great work.

    I noticed that you use "nn.TransformerEncoderLayer" to construct the "conditional transformer". Since it is used to predict the next token index, I am wondering whether the transformer decoder would be more appropriate here. What is the reason you did not adopt "nn.TransformerDecoderLayer"?

    Is it because the structure of "nn.TransformerEncoderLayer" is simpler or more concise than that of "nn.TransformerDecoderLayer"?

    opened by fido20160817 0
  • Add Web Demo & Docker environment

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model, view it here: https://replicate.com/hfailab/clip-gen. You can find the docker file under the tab ‘run model with docker’.

    We have added some examples to the page, but do claim the page so you can own the page, customise the Example gallery as you like, push any future update to the web demo, and we'll feature it on our website and tweet about it too. You can find the 'Claim this model' button on the top of the page. Any member of the HFAiLab organization on GitHub can claim the model ~ When the page is claimed, it will be automatically linked to the arXiv website as well (under “Demos”).

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    opened by chenxwh 0