Linear image-to-image translation

Overview

Linear (Un)supervised Image-to-Image Translation

Teaser image Examples for linear orthogonal transformations in PCA domain, learned without pairing supervision. Training time is about 1 minute.

This repository contains the official pytorch implementation of the following paper:

The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation
Eitan Richardson and Yair Weiss
https://arxiv.org/abs/2007.12568

Abstract: Unsupervised image-to-image translation is an inherently ill-posed problem. Recent methods based on deep encoder-decoder architectures have shown impressive results, but we show that they only succeed due to a strong locality bias, and they fail to learn very simple nonlocal transformations (e.g. mapping upside down faces to upright faces). When the locality bias is removed, the methods are too powerful and may fail to learn simple local transformations. In this paper we introduce linear encoder-decoder architectures for unsupervised image to image translation. We show that learning is much easier and faster with these architectures and yet the results are surprisingly effective. In particular, we show a number of local problems for which the results of the linear methods are comparable to those of state-of-the-art architectures but with a fraction of the training time, and a number of nonlocal problems for which the state-of-the-art fails while linear methods succeed.

TODO:

  • Code for reproducing the linear image-to-image translation results
  • Code for applying the linear transformation as regularization for deep unsupervisd image-to-image (based on ALAE)
  • Support for user-provided dataset (e.g. image folders)
  • Automatic detection of available GPU resources

Requirements

  • Pytorch (tested with pytorch 1.5.0)
  • faiss (tested with faiss 1.6.3 with GPU support)
  • OpenCV (used only for generating some of the synthetic transformations)

System Requirements

Both the PCA and the nearest-neighbors search in ICP are performed on GPU (using pytorch and faiss). A cuda-enabled GPU with at least 11 GB of RAM is recommended. Since the entire data is loaded to RAM (not in mini-batches), a lot of (CPU) RAM is required as well ...

Code structure

  • run_im2im.py: The main python script for training and testing the linear transformation
  • pca-linear-map.py: The main algorithm. Performs PCA for the two domains, resolves polarity ambiguity and learnes an orthogonal or unconstrained linear transformation. In the unpaired case, ICP iterations are used to find the best correspondence.
  • pca.py: Fast PCA using pytorch and the skewness-based polarity synchronization.
  • utils.py: Misc utils
  • data.py: Loading the dataset and applying the synthetic transformations

Preparing the datasets

The repository does not contain code for loading the datasets, however, the tested datasets were loaded in their standard format. Please download (or link) the datasets under datasets/CelebA, datasets/FFHQ and datasets/edges2shoes.

Learning a linear transformation

usage: run_im2im.py [--dataset {celeba,ffhq,shoes}]
                    [--resolution RESOLUTION]
                    [--a_transform {identity,rot90,vflip,edges,Canny-edges,colorize,super-res,inpaint}]
                    [--pairing {paired,matching,nonmatching,few-matches}]
                    [--matching {nn,cyc-nn}]
                    [--transform_type {orthogonal,linear}] [--n_iters N_ITERS]
                    [--n_components N_COMPONENTS] [--n_train N_TRAIN]
                    [--n_test N_TEST]

Results are saved into the results folder.

Command example for generating the colorization result in the above image (figure 9 in tha paper):

python3 run_im2im.py --dataset ffhq --resolution 128 --a_transform colorize --n_components 2000 --n_train 20000 --n_test 25
Loading matching data for ffhq - colorize ...
100%|██████████████████████████████████████████████████████████████████████████| 20000/20000 [00:04<00:00, 4549.19it/s]
100%|█████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 299.33it/s]
Learning orthogonal transformation in 2000 PCA dimensions...
Got 20000 samples in A and 20000 in B.
PCA A...
PCA B...
Synchronizing...
Using skew-based logic for 1399/2000 dimensions.
PCA representations:  (20000, 2000) (20000, 2000) took: 68.09504985809326
Learning orthogonal transformation using matching sets:
Iter 0: 4191 B-NNs / 1210 consistent, mean NN l2 = 1308.520. took 2.88 sec.
Iter 1: 19634 B-NNs / 19634 consistent, mean NN l2 = 607.715. took 3.46 sec.
Iter 2: 19801 B-NNs / 19801 consistent, mean NN l2 = 204.487. took 3.49 sec.
Iter 3: 19801 B-NNs / 19801 consistent, mean NN l2 = 204.079. Converged - terminating ICP iterations.
Applying the learned transformation on test data...

Limitations

As described in the paper:

  • If the true translation is very non-linear, the learned linear transformation will not model it well.
  • If the image domain has a very complex structure, a large number of PCA coefficients will be required to achieve high quality reconstruction.
  • The nonmatching case (i.e. no matching paires exist) requires larger training sets.

Additional results

Paired

In the two examples above (edge images to real images and inpainting with a relative large part of the image missing), the true transformation is quite nonlinear, making the learned linear transformation less suitable. Here we used the unconstrained linear transformation rather than the orthogonal one. In addition, pairing supervision was used.

NonFaces

Here is an example showing the linear transformation method applied to a different domain (not just aligned faces).

Owner
Eitan Richardson
PhD student and TA at the Hebrew University of Jerusalem / Research Intern at Google
Eitan Richardson
v objective diffusion inference code for PyTorch.

v-diffusion-pytorch v objective diffusion inference code for PyTorch, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The

Katherine Crowson 635 Dec 30, 2022
An implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional Neural Network"

Retina Blood Vessels Segmentation This is an implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional

Srijarko Roy 23 Aug 20, 2022
《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

The Most Important Thing. Our code is developed based on: LXMERT: Learning Cross-Modality Encoder Representations from Transformers

53 Dec 16, 2022
An educational AI robot based on NVIDIA Jetson Nano.

JetBot Looking for a quick way to get started with JetBot? Many third party kits are now available! JetBot is an open-source robot based on NVIDIA Jet

NVIDIA AI IOT 2.6k Dec 29, 2022
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

SimMIM By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*. This repo is the official implementation of

Microsoft 674 Dec 26, 2022
This repo contains the code required to train the multivariate time-series Transformer.

Multi-Variate Time-Series Transformer This repo contains the code required to train the multivariate time-series Transformer. Download the data The No

Gregory Duthé 4 Nov 24, 2022
Official code for "Mean Shift for Self-Supervised Learning"

MSF Official code for "Mean Shift for Self-Supervised Learning" Requirements Python = 3.7.6 PyTorch = 1.4 torchvision = 0.5.0 faiss-gpu = 1.6.1 In

UMBC Vision 44 Nov 21, 2022
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

Arthur Bražinskas 39 Jan 01, 2023
Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers

Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers This is an implementation of A Physics-Informed Vector Quantized Autoencoder for Dat

DreamSoul 3 Sep 12, 2022
A U-Net combined with a variational auto-encoder that is able to learn conditional distributions over semantic segmentations.

Probabilistic U-Net + **Update** + An improved Model (the Hierarchical Probabilistic U-Net) + LIDC crops is now available. See below. Re-implementatio

Simon Kohl 498 Dec 26, 2022
Keyword2Text This repository contains the code of the paper: "A Plug-and-Play Method for Controlled Text Generation"

Keyword2Text This repository contains the code of the paper: "A Plug-and-Play Method for Controlled Text Generation", if you find this useful and use

57 Dec 27, 2022
Software associated to AAAI paper "Planning with Biological Neurons and Synapses"

jBrain Software associated with the AAAI 2022 paper Francesco D'Amore, Daniel Mitropolsky, Pierluigi Crescenzi, Emanuele Natale, Christos H. Papadimit

Pierluigi Crescenzi 1 Apr 10, 2022
PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020).

Scaffold-Federated-Learning PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020). Environment numpy=

KI 30 Dec 29, 2022
Agile SVG maker for python

Agile SVG Maker Need to draw hundreds of frames for a GIF? Need to change the style of all pictures in a PPT? Need to draw similar images with differe

SemiWaker 4 Sep 25, 2022
Code for MarioNette: Self-Supervised Sprite Learning, in NeurIPS 2021

MarioNette | Webpage | Paper | Video MarioNette: Self-Supervised Sprite Learning Dmitriy Smirnov, Michaël Gharbi, Matthew Fisher, Vitor Guizilini, Ale

Dima Smirnov 28 Nov 18, 2022
NeuPy is a Tensorflow based python library for prototyping and building neural networks

NeuPy v0.8.2 NeuPy is a python library for prototyping and building neural networks. NeuPy uses Tensorflow as a computational backend for deep learnin

Yurii Shevchuk 729 Jan 03, 2023
High-quality implementations of standard and SOTA methods on a variety of tasks.

Uncertainty Baselines The goal of Uncertainty Baselines is to provide a template for researchers to build on. The baselines can be a starting point fo

Google 1.1k Dec 30, 2022
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis

ImageBART NeurIPS 2021 Patrick Esser*, Robin Rombach*, Andreas Blattmann*, Björn Ommer * equal contribution arXiv | BibTeX | Poster Requirements A sui

CompVis Heidelberg 110 Jan 01, 2023
CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view.

CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view. Center-based 3D Object Detection and Tracking, Tianwei Yin, Xin

Tianwei Yin 134 Dec 23, 2022
Text mining project; Using distilBERT to predict authors in the classification task authorship attribution.

DistilBERT-Text-mining-authorship-attribution Dataset used: https://www.kaggle.com/azimulh/tweets-data-for-authorship-attribution-modelling/version/2

1 Jan 13, 2022