Official implementation of Self-supervised Image-to-text and Text-to-image Synthesis

Last update: Jul 31, 2022

Overview

This is the official implementation of Self-supervised Image-to-text and Text-to-image Synthesis.

[Figures: model architecture diagrams]

Dataset

We use the Caltech-UCSD Birds-200-2011 (CUB) and Oxford-102 Flowers datasets in this work.

Download the Flower images.
Rename the jpg folder to images, unzip 102flowers.zip, and put it inside the 102flowers folder.
Put the 102flowers folder inside the data folder.
Download the Birds data and put it inside Data/.
Download the image data and extract it to Data/birds/.
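The folder layout produced by the steps above can be sanity-checked before training. This is a minimal sketch; the expected paths in EXPECTED_DIRS are assumptions inferred from these steps (the instructions use both data and Data/, so adjust them to whatever your checkout actually reads), and missing_dirs is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Hypothetical expected layout, inferred from the dataset steps in this README.
# The README mixes "data" and "Data/"; adjust these paths to match the code.
EXPECTED_DIRS = {
    "flowers": [Path("data/102flowers"), Path("data/102flowers/images")],
    "birds": [Path("Data/birds")],
}


def missing_dirs(root, dataset):
    """Return the expected directories for `dataset` that do not exist under `root`."""
    root = Path(root)
    return [str(p) for p in EXPECTED_DIRS[dataset] if not (root / p).is_dir()]


if __name__ == "__main__":
    for name in EXPECTED_DIRS:
        missing = missing_dirs(".", name)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{name}: {status}")
```

Run it from the repository root; an empty list for a dataset means every directory the sketch expects is present.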

Dependencies

pytorch
torchvision
tensorboardX
pickle (part of the Python standard library)

Training

Training the image autoencoder

The driver program for training the image autoencoder is main.py.

To train the image autoencoder on the flower dataset:

python main.py --cfg cfg/flowers_3stages.yml --gpu 0

To train the image autoencoder on the birds dataset:

python main.py --cfg cfg/birds_3stages.yml --gpu 0
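When launching these runs from a script, the two commands above differ only in the config file. A minimal sketch that builds the same command lines; image_ae_command is a hypothetical helper, and only the cfg paths and the --cfg/--gpu flags come from this README.

```python
import shlex
import sys

# The two configs named in this README; any other dataset would need its own .yml.
CONFIGS = {
    "flowers": "cfg/flowers_3stages.yml",
    "birds": "cfg/birds_3stages.yml",
}


def image_ae_command(dataset, gpu=0):
    """Build the argv list for training the image autoencoder with main.py."""
    return [sys.executable, "main.py", "--cfg", CONFIGS[dataset], "--gpu", str(gpu)]


if __name__ == "__main__":
    # Print the commands; pass one of the lists to subprocess.run(cmd, check=True)
    # to actually launch training.
    for name in CONFIGS:
        print(shlex.join(image_ae_command(name, gpu=0)))
```

Using sys.executable instead of a bare "python" keeps the run inside the same interpreter or virtual environment that launched the script.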

Models are saved automatically after a fixed number of iterations. To resume from a failed step, edit netG_version in the respective .yml file.
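Editing netG_version can be scripted when resuming often. This is a minimal sketch that assumes the config contains exactly one line of the form netG_version: value (possibly indented under a parent key); the real key layout and the expected value format are not documented here, and set_netg_version and update_cfg_file are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumption: the config holds exactly one line "netG_version: <value>",
# possibly indented under a parent key. Check the real cfg/*.yml layout first.
_NETG_LINE = re.compile(r"^([ \t]*netG_version[ \t]*:[ \t]*).*$", re.MULTILINE)


def set_netg_version(yml_text, version):
    """Return yml_text with the netG_version value replaced by `version`."""
    new_text, count = _NETG_LINE.subn(lambda m: m.group(1) + str(version), yml_text)
    if count != 1:
        raise ValueError(f"expected one netG_version line, found {count}")
    return new_text


def update_cfg_file(path, version):
    """Rewrite the netG_version entry of the .yml file at `path` in place."""
    cfg = Path(path)
    cfg.write_text(set_netg_version(cfg.read_text(), version))
```

A line-level substitution is used instead of loading and re-dumping the YAML so that comments, key order, and every other setting in the file stay untouched.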

Training the text autoencoder

python run_text_test.py dataset_type Input_Folder output_file.txt

Use dataset_type=1 for the flower dataset and dataset_type=2 for the birds dataset, e.g.:

python run_text_test.py 2 /home/user/dev/unsup/data_datasets/CUB_200_2011 outbirds_n.txt
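The positional interface above can be written down as an argparse parser, which makes the allowed dataset_type values explicit. This sketch mirrors the documented usage only; it is not the parser inside run_text_test.py, and parse_text_ae_args is a hypothetical name.

```python
import argparse

# dataset_type codes as documented in this README.
DATASET_TYPES = {1: "flowers", 2: "birds"}


def parse_text_ae_args(argv=None):
    """Parse the documented positional arguments of run_text_test.py.

    This mirrors the documented interface only; it is not the script's own parser.
    """
    parser = argparse.ArgumentParser(description="Train the text autoencoder.")
    parser.add_argument("dataset_type", type=int, choices=sorted(DATASET_TYPES),
                        help="1 = flower dataset, 2 = birds dataset")
    parser.add_argument("input_folder", help="dataset root folder")
    parser.add_argument("output_file", help="output text file, e.g. outbirds_n.txt")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_text_ae_args(
        ["2", "/home/user/dev/unsup/data_datasets/CUB_200_2011", "outbirds_n.txt"]
    )
    print(DATASET_TYPES[args.dataset_type], args.input_folder, args.output_file)
```

With choices set, an invalid code such as 3 is rejected with a usage message before any data is read.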

Training the mapping networks

Train the GAN-based mapping network

python MappingImageText.py Dataset_folder

e.g.

python MappingImageText.py /home/user/dev/unsup/data_datasets/CUB_200_2011
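In general terms, a GAN-based mapping network trains a discriminator to tell real codes of the target modality from codes produced by the mapper, while the mapper is trained to fool it. The losses below are the standard non-saturating binary cross-entropy GAN objective written over raw discriminator logits; whether MappingImageText.py uses exactly this objective is an assumption, and the function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logits, fake_logits):
    """BCE loss with real target codes labelled 1 and mapped codes labelled 0.

    -log(sigmoid(r)) == softplus(-r) and -log(1 - sigmoid(f)) == softplus(f).
    """
    real = sum(softplus(-r) for r in real_logits) / len(real_logits)
    fake = sum(softplus(f) for f in fake_logits) / len(fake_logits)
    return real + fake


def mapper_loss(fake_logits):
    """Non-saturating generator loss: the mapper wants D to output 1 on mapped codes."""
    return sum(softplus(-f) for f in fake_logits) / len(fake_logits)
```

Working on logits through softplus avoids taking the log of a sigmoid that has saturated to exactly 0 or 1.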

Train the MMD-based mapping network

python mmd_ganTI.py --dataset /home/das/dev/data_datasets/birds_dataset/CUB_200_2011 --gpu_device 0

python mmd_ganIT.py --dataset /home/das/dev/data_datasets/birds_dataset/CUB_200_2011 --gpu_device 0
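Judging by their names, mmd_ganTI.py and mmd_ganIT.py presumably train the two mapping directions (text-to-image and image-to-text). The quantity at the core of an MMD-based objective is the maximum mean discrepancy between two sample sets: MMD²(X, Y) = mean k(x, x') + mean k(y, y') - 2 mean k(x, y). The sketch below computes the biased estimator with a fixed Gaussian kernel; the kernel, the bandwidth sigma, and how these scripts use MMD (MMD-GAN-style training usually learns the kernel's feature map adversarially) are assumptions.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between sample sets xs and ys (lists of vectors)."""
    def mean_k(us, vs):
        total = sum(rbf_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

For mapped codes that match the target distribution the estimate approaches zero; minimising it with respect to the mapper's parameters pulls the two sample sets together.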
