PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Last update: Dec 28, 2022

Overview

Make-A-Scene - PyTorch

Unofficial PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/pdf/2203.13131.pdf)

Figure 1. from paper

Note: this is a work in progress.

Everyone is welcome to contribute. Discord channel: https://discord.gg/hCRMGRZkC6

We would love to open-source a trained model. The model is a billion-parameter model, so training it requires a lot of compute. If you can provide computational resources, let us know.

Paper Description:

Make-A-Scene modifies the VQGAN framework. It relies heavily on semantic segmentation maps as extra conditioning, which gives more control over the generation process. Moreover, it also conditions on text. The main improvements are the following:

Segmentation conditioning: a separate VQ-VAE (VQ-SEG) is trained on the segmentation maps, with the loss changed to a weighted binary cross-entropy (paper Sec. 3.4).
VQGAN training (VQ-IMG) is extended with a face loss and an object loss (paper Sec. 3.3 and 3.5).
Classifier-free guidance for the autoregressive transformer (paper Sec. 3.7).
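
As a rough illustration of the first and third items, the sketch below shows a weighted binary cross-entropy over per-class segmentation channels, and the logit mixing used for guidance at sampling time (in the classifier-free form the paper describes). It is plain Python to stay dependency-free and is not taken from this repository. The function names `weighted_bce` and `guided_logits`, the nested-list layout, the weight values, and the normalisation by the weight sum are all assumptions made for illustration.

```python
import math


def weighted_bce(logits, targets, class_weights):
    """Weighted binary cross-entropy over per-class segmentation channels.

    logits, targets: nested lists laid out as [class][pixel]; targets are 0 or 1.
    class_weights: one non-negative weight per class channel (hypothetical values).
    Returns the weighted mean loss, normalised by the sum of applied weights.
    """
    total, norm = 0.0, 0.0
    for chan_logits, chan_targets, w in zip(logits, targets, class_weights):
        for z, t in zip(chan_logits, chan_targets):
            # Numerically stable BCE on logits:
            # max(z, 0) - z * t + log(1 + exp(-|z|))
            loss = max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            total += w * loss
            norm += w
    return total / norm


def guided_logits(cond_logits, uncond_logits, scale):
    """Mix conditional and unconditional next-token logits.

    scale = 0 gives the unconditional logits, scale = 1 the conditional ones,
    and scale > 1 pushes further in the direction of the conditioning.
    """
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# Two class channels with two pixels each; the second class is upweighted.
seg_loss = weighted_bce(
    logits=[[2.0, -1.0], [0.5, -3.0]],
    targets=[[1, 0], [1, 0]],
    class_weights=[1.0, 4.0],
)

# One sampling step over a toy vocabulary of three tokens.
mixed = guided_logits([1.0, 0.2, -0.5], [0.4, 0.1, 0.0], scale=3.0)
```

In a tensor framework the same loss is normally a single vectorised call. The explicit loops here only make the per-class weighting visible.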

Training Pipeline

Figure 6. from paper

What needs to be done?

See the individual folders for details.

Citation

@misc{https://doi.org/10.48550/arxiv.2203.13131,
  doi = {10.48550/ARXIV.2203.13131},
  url = {https://arxiv.org/abs/2203.13131},
  author = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}


Owner

Casual GAN Papers
