Code for CVPR2021 paper 'Where and What? Examining Interpretable Disentangled Representations'.

Related tags

Deep LearningPS-SC
Overview

PS-SC GAN

trav_animation

This repository contains the main code for training a PS-SC GAN (a GAN implemented with the Perceptual Simplicity and Spatial Constriction constraints) introduced in the paper Where and What? Examining Interpretable Disentangled Representations. The code for computing the TPL for model checkpoints from disentanglemen_lib can be found in this repository.

Abstract

Capturing interpretable variations has long been one of the goals in disentanglement learning. However, unlike the independence assumption, interpretability has rarely been exploited to encourage disentanglement in the unsupervised setting. In this paper, we examine the interpretability of disentangled representations by investigating two questions: where to be interpreted and what to be interpreted? A latent code is easily to be interpreted if it would consistently impact a certain subarea of the resulting generated image. We thus propose to learn a spatial mask to localize the effect of each individual latent dimension. On the other hand, interpretability usually comes from latent dimensions that capture simple and basic variations in data. We thus impose a perturbation on a certain dimension of the latent code, and expect to identify the perturbation along this dimension from the generated images so that the encoding of simple variations can be enforced. Additionally, we develop an unsupervised model selection method, which accumulates perceptual distance scores along axes in the latent space. On various datasets, our models can learn high-quality disentangled representations without supervision, showing the proposed modeling of interpretability is an effective proxy for achieving unsupervised disentanglement.

Requirements

  • Python == 3.7.2
  • Numpy == 1.19.1
  • TensorFlow == 1.15.0
  • This code is based on StyleGAN2 which relies on custom TensorFlow ops that are compiled on the fly using NVCC. To test that your NVCC installation is working correctly, run:
nvcc test_nvcc.cu -o test_nvcc -run
| CPU says hello.
| GPU says hello.

Preparing datasets

CelebA. To prepare the tfrecord version of CelebA dataset, first download the original aligned-and-cropped version from http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html, then use the following code to create tfrecord dataset:

python dataset_tool.py create_celeba /path/to/new_tfr_dir /path/to/downloaded_celeba_dir

For example, the new_tfr_dir can be: datasets/celeba_tfr.

FFHQ. We use the 512x512 version which can be directly downloaded from the Google Drive link using browser. Or the file can be downloaded using the official script from Flickr-Faces-HQ. Put the xxx.tfrecords file into a two-level directory such as: datasets/ffhq_tfr/xxx.tfrecords.

Other Datasets. The tfrecords versions of DSprites and 3DShapes datasets can be produced

python dataset_tool.py create_subset_from_dsprites_npz /path/to/new_tfr_dir /path/to/dsprites_npz

and

python dataset_tool.py create_subset_from_shape3d /path/to/new_tfr_dir /path/to/shape3d_file

See dataset_tool.py for how other datasets can be produced.

Training

architecture

Pretrained models are shared here. To train a model on CelebA with 2 GPUs, run code:

CUDA_VISIBLE_DEVICES=0,1 \
    python run_training_ps_sc.py \
    --result-dir /path/to/results_ps_sc/celeba \
    --data-dir /path/to/datasets \
    --dataset celeba_tfr \
    --metrics fid1k,tpl_small_0.3 \
    --num-gpus 2 \
    --mirror-augment True \
    --model_type ps_sc_gan \
    --C_lambda 0.01 \
    --fmap_decay 1 \
    --epsilon_loss 3 \
    --random_seed 1000 \
    --random_eps True \
    --latent_type normal \
    --batch_size 8 \
    --batch_per_gpu 4 \
    --n_samples_per 7 \
    --return_atts True \
    --I_fmap_base 10 \
    --G_fmap_base 9 \
    --G_nf_scale 6 \
    --D_fmap_base 10 \
    --fmap_min 64 \
    --fmap_max 512 \
    --topk_dims_to_show -1 \
    --module_list '[Const-512, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-up-1, C_spgroup-4-5, ResConv-id-1, Noise-2, ResConv-id-2]'

Note that for the dataset directory we need to separate the path into --data-dir and --dataset tags. The --model_type tag only specifies the PS-loss, and we need to use the C_spgroup-n_squares-n_codes in the --module_list tag to specify where to insert the Spatial Constriction modules in the generator. The latent traversals and metrics will be logged in the resulting directory. The --C_lambda tag is the hyper-parameter for modulating the PS-loss.

Evaluation

To evaluate a trained model, we can use the following code:

CUDA_VISIBLE_DEVICES=0 \
    python run_metrics.py \
    --result-dir /path/to/evaluate_results_dir \
    --network /path/to/xxx.pkl \
    --metrics fid50k,tpl_large_0.3,ppl2_wend \
    --data-dir /path/to/datasets \
    --dataset celeba_tfr \
    --include_I True \
    --mapping_nodup True \
    --num-gpus 1

where the --include_I is to indicate the model should be loaded with an inference network, and --mapping_nodup is to indicate that the loaded model has no W space duplication as in stylegan.

Generation

We can generate random images, traversals or gifs based on a pretrained model pkl using the following code:

CUDA_VISIBLE_DEVICES=0 \
    python run_generator_ps_sc.py generate-images \
    --network /path/to/xxx.pkl \
    --seeds 0-10 \
    --result-dir /path/to/gen_results_dir

and

CUDA_VISIBLE_DEVICES=0 \
    python run_generator_ps_sc.py generate-traversals \
    --network /path/to/xxx.pkl \
    --seeds 0-10 \
    --result-dir /path/to/traversal_results_dir

and

python run_generator_ps_sc.py \
    generate-gifs \
    --network /path/to/xxx.pkl \
    --exist_imgs_dir git_repo/PS-SC/imgs \
    --result-dir /path/to/results/gif \
    --used_imgs_ls '[sample1.png, sample2.png, sample3.png]' \
    --used_semantics_ls '[azimuth, haircolor, smile, gender, main_fringe, left_fringe, age, light_right, light_left, light_vertical, hair_style, clothes_color, saturation, ambient_color, elevation, neck, right_shoulder, left_shoulder, background_1, background_2, background_3, background_4, right_object, left_object]' \
    --attr2idx_dict '{ambient_color:35, none1:34, light_right:33, saturation:32, light_left:31, background_4:30, background_3:29, gender:28, haircolor:27, background_2: 26, light_vertical:25, clothes_color:24, azimuth:23, right_object:22, main_fringe:21, right_shoulder:20, none4:19, background_1:18, neck:17, hair_style:16, smile:15, none6:14, left_fringe:13, none8:12, none9:11, age:10, shoulder:9, glasses:8, none10:7, left_object: 6, elevation:5, none12:4, none13:3, none14:2, left_shoulder:1, none16:0}' \
    --create_new_G True

A gif generation script is provided in the shared pretrained FFHQ folder. The images referred in --used_imgs_ls is provided in the imgs folder in this repository.

Attributes Editing

We can conduct attributes editing with a disentangled model. Currently we only use generated images for this experiment due to the unsatisfactory quality of the real-image projection into disentangled latent codes.

attr_edit

First we need to generate some images and put them into a directory, e.g. /path/to/existing_generated_imgs_dir. Second we need to assign the concepts to meaningful latent dimensions using the --attr2idx_dict tag. For example, if the 23th dimension represents azimuth concept, we add the item {azimuth:23} into the dictionary. Third we need to which images to provide source attributes. We use the --attr_source_dict tag to realize it. Note that there could be multiple dimensions representing a single concept (e.g. in the following example there are 4 dimensions capturing the background information), therefore it is more desirable to ensure the source images provide all these dimensions (attributes) as a whole. A source image can provide multiple attributes. Finally we need to specify the face-source images with --face_source_ls tag. All the face-source and attribute-source images should be located in the --exist_imgs_dir. An example code is as follows:

python run_editing_ps_sc.py \
    images-editing \
    --network /path/to/xxx.pkl \
    --result-dir /path/to/editing_results \
    --exist_imgs_dir git_repo/PS-SC/imgs \
    --face_source_ls '[sample1.png, sample2.png, sample3.png]' \
    --attr_source_dict '{sample1.png: [azimuth, smile]; sample2.png: [age,fringe]; sample3.png: [lighting_right,lighting_left,lighting_vertical]}' \
    --attr2idx_dict '{ambient_color:35, none1:34, light_right:33, saturation:32, light_left:31, background_4:30, background_3:29, gender:28, haircolor:27, background_2: 26, light_vertical:25, clothes_color:24, azimuth:23, right_object:22, main_fringe:21, right_shoulder:20, none4:19, background_1:18, neck:17, hair_style:16, smile:15, none6:14, left_fringe:13, none8:12, none9:11, age:10, shoulder:9, glasses:8, none10:7, left_object: 6, elevation:5, none12:4, none13:3, none14:2, left_shoulder:1, none16:0}' \

Accumulated Perceptual Distance with 2D Rotation

fringe_vs_background

If a disentangled model has been trained, the accumulated perceptual distance figures shown in Section 3.3 (and Section 8 in the Appendix) can be plotted using the model checkpoint with the following code:

# Celeba
# The dimension for concepts: azimuth: 9; haircolor: 19; smile: 5; hair: 4; fringe: 11; elevation: 10; back: 18;
CUDA_VISIBLE_DEVICES=0 \
    python plot_latent_space.py \
    plot-rot-fn \
    --network /path/to/xxx.pkl \
    --seeds 1-10 \
    --latent_pair 19_5 \
    --load_gan True \
    --result-dir /path/to/acc_results/rot_19_5

The 2D latent traversal grid can be presented with code:

# Celeba
# The dimension for concepts: azimuth: 9; haircolor: 19; smile: 5; hair: 4; fringe: 11; elevation: 10; back: 18;
CUDA_VISIBLE_DEVICES=0 \
    python plot_latent_space.py \
    generate-grids \
    --network /path/to/xxx.pkl \
    --seeds 1-10 \
    --latent_pair 19_5 \
    --load_gan True \
    --result-dir /path/to/acc_results/grid_19_5

Citation

@inproceedings{Xinqi_cvpr21,
author={Xinqi Zhu and Chang Xu and Dacheng Tao},
title={Where and What? Examining Interpretable Disentangled Representations},
booktitle={CVPR},
year={2021}
}
Owner
Xinqi/Steven Zhu
Xinqi/Steven Zhu
PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1) This repository contains the PyTorch implementation of "A Two

10 Aug 19, 2022
C3DPO - Canonical 3D Pose Networks for Non-rigid Structure From Motion.

C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion By: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedal

Meta Research 309 Dec 16, 2022
Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL BASALT Challenge.

KAIROS MineRL BASALT Codebase for the solution that won first place and was awarded the most human-like agent in the 2021 NeurIPS Competition MineRL B

Vinicius G. Goecks 37 Oct 30, 2022
Mmdet benchmark with python

mmdet_benchmark 本项目是为了研究 mmdet 推断性能瓶颈,并且对其进行优化。 配置与环境 机器配置 CPU:Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz GPU:NVIDIA GeForce RTX 3080 10GB 内存:64G 硬盘:1T

杨培文 (Yang Peiwen) 24 May 21, 2022
Fashion Entity Classification

Fashion-Entity-Classification - Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grays

ADITYA SHAH 1 Jan 04, 2022
Official PyTorch implementation of BlobGAN: Spatially Disentangled Scene Representations

BlobGAN: Spatially Disentangled Scene Representations Official PyTorch Implementation Paper | Project Page | Video | Interactive Demo BlobGAN.mp4 This

148 Dec 29, 2022
An efficient framework for reinforcement learning.

rl: An efficient framework for reinforcement learning Requirements Introduction PPO Test Requirements name version Python =3.7 numpy =1.19 torch =1

16 Nov 30, 2022
A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

SVHNClassifier-PyTorch A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks If

Potter Hsu 182 Jan 03, 2023
Does Pretraining for Summarization Reuqire Knowledge Transfer?

Pretraining summarization models using a corpus of nonsense

Approximately Correct Machine Intelligence (ACMI) Lab 12 Dec 19, 2022
Multi-Task Learning as a Bargaining Game

Nash-MTL Official implementation of "Multi-Task Learning as a Bargaining Game". Setup environment conda create -n nashmtl python=3.9.7 conda activate

Aviv Navon 87 Dec 26, 2022
QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

152 Jan 02, 2023
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 896 Jan 01, 2023
Repository for self-supervised landmark discovery

self-supervised-landmarks Repository for self-supervised landmark discovery Requirements pytorch pynrrd (for 3d images) Usage The use of this models i

Riddhish Bhalodia 2 Apr 18, 2022
[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

[ICCV'2021] Image Inpainting via Conditional Texture and Structure Dual Generation

Xiefan Guo 122 Dec 11, 2022
Code for the paper "Reinforcement Learning as One Big Sequence Modeling Problem"

Trajectory Transformer Code release for Reinforcement Learning as One Big Sequence Modeling Problem. Installation All python dependencies are in envir

Michael Janner 269 Jan 05, 2023
Multitask Learning Strengthens Adversarial Robustness

Multitask Learning Strengthens Adversarial Robustness

Columbia University 15 Jun 10, 2022
Official PyTorch Implementation of Embedding Transfer with Label Relaxation for Improved Metric Learning, CVPR 2021

Embedding Transfer with Label Relaxation for Improved Metric Learning Official PyTorch implementation of CVPR 2021 paper Embedding Transfer with Label

Sungyeon Kim 37 Dec 06, 2022
A complete end-to-end demonstration in which we collect training data in Unity and use that data to train a deep neural network to predict the pose of a cube. This model is then deployed in a simulated robotic pick-and-place task.

Object Pose Estimation Demo This tutorial will go through the steps necessary to perform pose estimation with a UR3 robotic arm in Unity. You’ll gain

Unity Technologies 187 Dec 24, 2022
Pytorch implementation of AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

AngularGrad Optimizer This repository contains the oficial implementation for AngularGrad: A New Optimization Technique for Angular Convergence of Con

mario 124 Sep 16, 2022
Fully Automatic Page Turning on Real Scores

Fully Automatic Page Turning on Real Scores This repository contains the corresponding code for our extended abstract Henkel F., Schwaiger S. and Widm

Florian Henkel 7 Jan 02, 2022