Zero-shot Synthesis with Group-Supervised Learning (ICLR 2021 paper)

Overview

GSL - Zero-shot Synthesis with Group-Supervised Learning

image Figure: Zero-shot synthesis performance of our method with different dataset (iLab-20M, RaFD, and Fonts). Bottom: training images (attributes are known). Top: Test image (attributes are a query).

Zero-shot Synthesis with Group-Supervised Learning
Yunhao Ge, Sami Abu-El-Haija, Gan Xin, Laurent Itti
International Conference on Learning Representations (ICLR), 2021

[Paper] [Project Page] [Fonts dataset]

To aid neural networks to envision objects with different attributes, we propose a family of objective functions, expressed on groups of examples, as a novel learning framework that we term Group-Supervised Learning (GSL). GSL allows us to decompose inputs into a disentangled representation with swappable components, that can be recombined to synthesize new samples. (i.e., images of red boats & blue cars can be decomposed and recombined to synthesize novel images of red cars.

[We are actively updating the code]

Getting Started

Installation

  • Dependencies
python 3.6.4
pytorch 0.3.1.post2
visdom
tqdm

  • Clone this repo:
git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
cd pytorch-CycleGAN-and-pix2pix

Datasets

  • iLab-20M, is an attributed dataset containing images of toy vehicles placed on a turntable using 11 cameras at different viewing points. There are 3 attribute classes: vehicle identity: 15 categories, each having 25-160 instances; pose; and backgrounds: over 14 for each identity: projecting vehicles in relevant contexts. You can download a subset of iLab-20M that we used in our paper here: iLab-6pose [http://ilab.usc.edu/datasets/iLab_6pose.zip]

  • Fonts, is a computer-generated RGB image datasets. Each image, with 128 * 128 pixels, contains an alphabet letter rendered using 5 independent generating attributes: letter identity, size, font color, background color and font. you can download the fonts dataset at here: Fonts [http://ilab.usc.edu/datasets/fonts].

  • RaFD contains pictures of 67 models displaying 8 emotional expressions taken by 5 different camera angles simultaneously. There are 3 attributes: identity, camera position (pose), and expression. To download the RaFD dataset, you must request access to the dataset from the Radboud Faces Database website.

  • dSprites, is a dataset of 2D shapes procedurally generated from 6 ground truth independent latent factors. These factors are color, shape, scale, rotation, x and y positions of a sprite. you can download the dSprites dataset here dSprites

Datasets Preprocess

To efficiently access the dataset in a manner of Group-Supervised Learning, some dataset need preprocess.

  • For iLab-20M dataset, after downloading iLab-6pose subset, please run python3 ./utils/ilab_data_preprocess.py
  • For RaFD dataset, after downloading, please run python3 ./utils/rafd_data_preprocess.py
  • For desprites dataset, after downloading, please run python3 ./utils/desprites_data_preprocess.py
  • For Fonts dataset, no preprocess needed.

After preprocess, please update the dataset path in '--dataset_path' parameter

Synthesis with pretrained model

You can download the pretrained models of ilab-20M, Fonts, RaFD and dsprites here pretrained models (http://ilab.usc.edu/datasets/GSL_pretrained_models.zip) and put them to ./checkpoints/pretrained_models The sample test images are in the ./checkpoints/test_imgs You can use the following sample commands to synthesize zero-shot images with our pretrained models:

  • For Fonts
python3 main.py --train False --dataset Fonts --pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/fonts' --viz_name fonts
  • For iLab-20M
python3 main.py --train False --dataset ilab-20M --pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/ilab_20M' --viz_name ilab-20m
  • For RaFD
python3 main.py --train False --dataset RaFD --pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/rafd' --viz_name rafd
  • For dsprites
python3 main.py --train False --dataset dsprites--pretrain_model_path YOUR_LOCAL_PATH_OF_PRETRAINED_MODEL --test_img_path './checkpoints/test_imgs/dsprites' --viz_name dsprites

Train GZS-Net on datasets used in paper

Group-Supervised Zero-shot Synthesis Network (GZS-Net) is an implemetation of Group-Supervised Learning with only reconstruction loss. If you want to train GZS-Net with the 4 datasets used in paper (Fonts, iLab-20M, RaFD, dSprites), please use 'train.py' with the dataset name, dataset path and visualize pannel name in Visdom. Note: you can also set the hyperparameter of lr, batchsize, backbone structure in train.py Here are some examples:

  • For Fonts
python3 main.py --train True --dataset Fonts --dataset_path YOUR_LOCAL_PATH_OF_FONTS --viz_name fonts
  • For iLab-20M
python3 main.py --train True --dataset ilab-20M --dataset_path YOUR_LOCAL_PATH_OF_ILAB --viz_name ilab-20m
  • For RaFD
python3 main.py --train True --dataset RaFD --dataset_path YOUR_LOCAL_PATH_OF_RaFD --viz_name rafd
  • For dsprites
python3 main.py --train True --dataset dsprites--dataset_path YOUR_LOCAL_PATH_OF_DSPRITES --viz_name dsprites

Train GZS-Net on your own dataset

To use our GZS-Net on you own dataset, before training, please refer the admissible dataset description in our paper. Note: The high level training strategy of the 4 dataset that paper used (Fonts, iLab-20M, RaFD, dSprites) is shown in Figure.3 in our paper. However, to make our method more general and compatale with more dataset, we propose a easier way to train our GZS-Net, we called 'sample edge strategy' to achieve 'One-Overlap Attribute Swap': In each training step, we sample n different edges (each edge corresponding to a specific attribute), and we release the two requirement of edge sample: (1) the two samples connected by an edge with attribute A should have same attribute A value but do not need to have different attribute values of other attributes (e.g. attribute B and C value can be the same). (2) we do not need center image x to keep showing in all edges, which means the connected images between edges can be totally different.

We train ilab-20M with the new training strategy and you can cgange our example code of ilab_20M_custom to your custom dataset.

  • Take ilab_20M_custom dataset as an example
python3 train.py  --dataset ilab_20M_custom --dataset_path YOUR_LOCAL_PATH_OF_CUSTOM_DATASET --viz_name ilab_20M_custom

Citation

If you use this code for your research, please cite our papers.

@inproceedings{ge2021zeroshot,
  title={Zero-shot Synthesis with Group-Supervised Learning},
  author={Yunhao Ge and Sami Abu-El-Haija and Gan Xin and Laurent Itti},
  booktitle={International Conference on Learning Representations},
  year={2021},
  url={https://openreview.net/forum?id=8wqCDnBmnrT}
}

Acknowledgments

Our code is inspired by Beta-VAE.

Owner
Andy_Ge
Ph.D. Student in Computer Vision, Machine Learning, and Baby Learning
Andy_Ge
TensorFlow implementation of "TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?"

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? Source: Improving Vision Transformer Efficiency and Accuracy by Learning to Tokenize

Aritra Roy Gosthipaty 23 Dec 24, 2022
The FIRST GANs-based omics-to-omics translation framework

OmiTrans Please also have a look at our multi-omics multi-task DL freamwork 👀 : OmiEmbed The FIRST GANs-based omics-to-omics translation framework Xi

Xiaoyu Zhang 6 Dec 14, 2022
Using this you can control your PC/Laptop volume by Hand Gestures (pinch-in, pinch-out) created with Python.

Hand Gesture Volume Controller Using this you can control your PC/Laptop volume by Hand Gestures (pinch-in, pinch-out). Code Firstly I have created a

Tejas Prajapati 16 Sep 11, 2021
Github for the conference paper GLOD-Gaussian Likelihood OOD detector

FOOD - Fast OOD Detector Pytorch implamentation of the confernce peper FOOD arxiv link. Abstract Deep neural networks (DNNs) perform well at classifyi

17 Jun 19, 2022
Python scripts for performing lane detection using the LSTR model in ONNX

ONNX LSTR Lane Detection Python scripts for performing lane detection using the Lane Shape Prediction with Transformers (LSTR) model in ONNX. Requirem

Ibai Gorordo 29 Aug 30, 2022
TakeInfoatNistforICS - Take Information in NIST NVD for ICS

Take Information in NIST NVD for ICS This project developed with Python. When yo

5 Sep 05, 2022
An image classification app boilerplate to serve your deep learning models asap!

Image 🖼 Classification App Boilerplate Have you been puzzled by tons of videos, blogs and other resources on the internet and don't know where and ho

Smaranjit Ghose 27 Oct 06, 2022
An NVDA add-on to split screen reader and audio from other programs to different sound channels

An NVDA add-on to split screen reader and audio from other programs to different sound channels (add-on idea credit: Tony Malykh)

Joseph Lee 7 Dec 25, 2022
Raptor-Multi-Tool - Raptor Multi Tool With Python

Promises 🔥 20 Stars and I'll fix every error that there is 50 Stars and we will

Aran 44 Jan 04, 2023
Code for our NeurIPS 2021 paper Mining the Benefits of Two-stage and One-stage HOI Detection

CDN Code for our NeurIPS 2021 paper "Mining the Benefits of Two-stage and One-stage HOI Detection". Contributed by Aixi Zhang*, Yue Liao*, Si Liu, Mia

71 Dec 14, 2022
A Python framework for conversational search

Chatty Goose Multi-stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting Installation Ma

Castorini 36 Oct 23, 2022
Learning Efficient Online 3D Bin Packing on Packing Configuration Trees

Learning Efficient Online 3D Bin Packing on Packing Configuration Trees This repository is being continuously updated, please stay tuned! Any code con

86 Dec 28, 2022
An unofficial PyTorch implementation of a federated learning algorithm, FedAvg.

Federated Averaging (FedAvg) in PyTorch An unofficial implementation of FederatedAveraging (or FedAvg) algorithm proposed in the paper Communication-E

Seok-Ju Hahn 123 Jan 06, 2023
Hide screen when boss is approaching.

BossSensor Hide your screen when your boss is approaching. Demo The boss stands up. He is approaching. When he is approaching, the program fetches fac

Hiroki Nakayama 6.2k Jan 07, 2023
Repository to run object detection on a model trained on an autonomous driving dataset.

Autonomous Driving Object Detection on the Raspberry Pi 4 Description of Repository This repository contains code and instructions to configure the ne

Ethan 51 Nov 17, 2022
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 1.9.0 ubuntu20/python3.9/pip ubuntu20/python3.8/p

ESPnet 5.9k Jan 04, 2023
Yet another video caption

Yet another video caption

Fan Zhimin 5 May 26, 2022
PyTorch implemention of ICCV'21 paper SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation

SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation This is the PyTorch implemention of ICCV'21 paper SGPA: Structure

Chen Kai 24 Dec 05, 2022
SemiNAS: Semi-Supervised Neural Architecture Search

SemiNAS: Semi-Supervised Neural Architecture Search This repository contains the code used for Semi-Supervised Neural Architecture Search, by Renqian

Renqian Luo 21 Aug 31, 2022
A curated list of the latest breakthroughs in AI (in 2021) by release date with a clear video explanation, link to a more in-depth article, and code.

2021: A Year Full of Amazing AI papers- A Review 📌 A curated list of the latest breakthroughs in AI by release date with a clear video explanation, l

Louis-François Bouchard 2.9k Dec 31, 2022