Toward a Visual Concept Vocabulary for GAN Latent Space, ICCV 2021

Last update: Dec 23, 2022

Toward a Visual Concept Vocabulary for GAN Latent Space
Code and data from the ICCV 2021 paper

Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba
Paper | Website | arXiv

This repository contains code for finding layer-selective directions, distilling them, and loading the vocabulary of visual concepts in BigGAN used in the original paper.
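As a rough illustration of what a layer-selective direction does, the sketch below shifts a latent vector along a direction at only a chosen subset of layers. It assumes a generator that accepts a separate latent input at each layer; the function name, arguments, and use of plain Python lists are hypothetical and are not taken from this repository's code.

```python
# Illustrative sketch only; not the repository's or BigGAN's actual API.
from typing import List, Sequence, Set


def apply_direction(
    z: Sequence[float],
    direction: Sequence[float],
    num_layers: int,
    selected_layers: Set[int],
    alpha: float = 1.0,
) -> List[List[float]]:
    """Return one latent per layer, shifted along `direction` only at selected layers."""
    if len(z) != len(direction):
        raise ValueError("z and direction must have the same dimensionality")
    # Latent moved by a step of size alpha along the direction.
    shifted = [zi + alpha * di for zi, di in zip(z, direction)]
    # Selected layers get the shifted latent; all other layers get the original.
    return [
        list(shifted) if layer in selected_layers else list(z)
        for layer in range(num_layers)
    ]


per_layer = apply_direction(
    z=[0.0, 1.0],
    direction=[1.0, -1.0],
    num_layers=3,
    selected_layers={1},
    alpha=0.5,
)
```

Here only layer 1 receives the shifted latent, while layers 0 and 2 keep the original one.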

Notice: This repository is under active development! Expect instability until at least October 25th, 2021.

Installation

The provided code has been tested with Python 3.8 on macOS and Ubuntu 20.04. It may work in other environments, but we make no guarantees.

To run the code yourself, start by cloning the repository:

git clone https://github.com/schwettmann/visual-vocab
cd visual-vocab

(Optional) You will probably want to create a conda environment or virtual environment instead of installing the dependencies globally. E.g., to create a new virtual environment you can run:

python3 -m venv env
source env/bin/activate

Finally, install the Python dependencies using pip:

pip3 install -r requirements.txt

Usage

Notice: This section is under construction and will be updated as functionality gets added.

To download any of the annotated directions from the paper, use the load function in the datasets submodule. It downloads and parses the annotated directions. Example usage:

from visualvocab import datasets

# Download layer-selective directions and annotations used for distilling single-word directions:
dataset = datasets.load('lsd_all')

# Download distilled directions for all BigGAN-Places365 categories:
dataset = datasets.load('distilled_all')

# Download distilled directions for a specific BigGAN-Places365 category:
dataset = datasets.load('distilled_cottage')

See the module for a full list of available annotated directions.
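The per-category example above suggests a naming pattern of the form distilled_<category>. A minimal sketch of a helper built on that pattern follows; the helper names are hypothetical, and the assumption that every category follows the same lowercase pattern as 'distilled_cottage' should be checked against the module's list of available datasets.

```python
def distilled_dataset_name(category: str) -> str:
    """Build a dataset key such as 'distilled_cottage'.

    Assumption: other categories use the same lowercase naming as the
    'distilled_cottage' example; this is not confirmed for every category.
    """
    return f"distilled_{category.strip().lower()}"


def load_distilled(category: str):
    """Load distilled directions for one category (hypothetical wrapper)."""
    # Imported lazily so that this sketch can be read and tested
    # without the visualvocab package installed.
    from visualvocab import datasets

    return datasets.load(distilled_dataset_name(category))
```

With the repository installed, load_distilled('cottage') would then be equivalent to the datasets.load('distilled_cottage') call shown above.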

Citation

Sarah Schwettmann, Evan Hernandez, David Bau, Samuel Klein, Jacob Andreas, Antonio Torralba. Toward a Visual Concept Vocabulary for GAN Latent Space, Proceedings of the International Conference on Computer Vision (ICCV), 2021.

Bibtex

@InProceedings{Schwettmann_2021_ICCV,
    author    = {Schwettmann, Sarah and Hernandez, Evan and Bau, David and Klein, Samuel and Andreas, Jacob and Torralba, Antonio},
    title     = {Toward a Visual Concept Vocabulary for GAN Latent Space},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {6804-6812}
}

Owner

Sarah Schwettmann
