DAGAN - Dual Attention GANs for Semantic Image Synthesis

Last update: Oct 08, 2022

Semantic Image Synthesis with DAGAN
Installation
Dataset Preparation
Generating Images Using Pretrained Model
Train and Test New Models
Evaluation
Acknowledgments
Related Projects
Citation
Contributions
Collaborations

Semantic Image Synthesis with DAGAN

Dual Attention GANs for Semantic Image Synthesis
Hao Tang¹, Song Bai², Nicu Sebe¹³.
¹University of Trento, Italy, ²University of Oxford, UK, ³Huawei Research Ireland, Ireland.
In ACM MM 2020.
The repository offers the official implementation of our paper in PyTorch.

Also see our related CVPR 2020 paper, Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation, and our arXiv paper, Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis.

Framework

Results of Generated Images

Cityscapes (512×256)

Facades (1024×1024)

ADE20K (256×256)

CelebAMask-HQ (512×512)

Results of Generated Segmentation Maps

License

The code is released for academic research use only. For commercial use, please contact [email protected].

Installation

Clone this repo.

git clone https://github.com/Ha0Tang/DAGAN
cd DAGAN/

This code requires PyTorch 1.0 and Python 3+. Install the dependencies with:

pip install -r requirements.txt
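
Before installing, it can help to confirm that the interpreter and any existing PyTorch build meet the stated requirements. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and it assumes that "PyTorch 1.0" is a minimum version rather than an exact pin.

```python
import importlib
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.7.1+cu110' into a tuple of ints."""
    parts = []
    # Drop a local build suffix like '+cu110', then read leading digits per field.
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(min_python=(3, 0), min_torch=(1, 0)):
    """Return a list of problems; an empty list means the checks passed.

    The minimums are assumptions taken from the requirement sentence above.
    """
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch.__version__) < min_torch:
            problems.append("PyTorch %d.%d or newer is required" % min_torch)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the file prints one warning per unmet requirement and nothing when both checks pass.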

This code also requires the Synchronized-BatchNorm-PyTorch repo.

cd DAGAN_v1/
cd models/networks/
git clone https://github.com/vacancy/Synchronized-BatchNorm-PyTorch
cp -rf Synchronized-BatchNorm-PyTorch/sync_batchnorm .
cd ../../

To reproduce the results reported in the paper, you would need an NVIDIA DGX1 machine with 8 V100 GPUs.

Dataset Preparation

Please download the datasets from their respective webpages.

Facades: 55.8M, here.
DeepFashion: 592.3M, here.
CelebAMask-HQ: 2.7G, here.
Cityscapes: 8.4G, here.
ADE20K: 953.7M, here.
COCO-Stuff: 21.5G, here.

We also provide the prepared datasets for your convenience.

sh datasets/download_dagan_dataset.sh [dataset]

where [dataset] can be one of facades, deepfashion, celeba, cityscapes, ade20k, or coco_stuff.

Generating Images Using Pretrained Model

Download the pretrained models using the following script:

sh scripts/download_dagan_model.sh GauGAN_DAGAN_[dataset]

where [dataset] can be one of cityscapes, ade, facades, or celeba.
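
The two download scripts accept different name sets (for example, `ade20k` for the dataset script but `ade` for the model script), so a small wrapper can catch a mismatched name before anything is downloaded. This is only a sketch under that reading of the lists above: the function names are hypothetical, and the commands are assumed to be run from the repository root.

```python
# Names accepted by each script, copied from the lists in this README.
DATASET_NAMES = ("facades", "deepfashion", "celeba", "cityscapes", "ade20k", "coco_stuff")
MODEL_DATASETS = ("cityscapes", "ade", "facades", "celeba")


def dataset_download_command(dataset):
    """Build the argument list for the prepared-dataset download script."""
    if dataset not in DATASET_NAMES:
        raise ValueError("unknown dataset %r; expected one of %s" % (dataset, DATASET_NAMES))
    return ["sh", "datasets/download_dagan_dataset.sh", dataset]


def model_download_command(dataset):
    """Build the argument list for the pretrained-model download script."""
    if dataset not in MODEL_DATASETS:
        raise ValueError("no pretrained model for %r; expected one of %s" % (dataset, MODEL_DATASETS))
    return ["sh", "scripts/download_dagan_model.sh", "GauGAN_DAGAN_" + dataset]


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(cmd, check=True) to execute it.
    print(" ".join(dataset_download_command("cityscapes")))
    print(" ".join(model_download_command("cityscapes")))
```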

Change several parameters and then generate images using test_[dataset].sh. If you are running in CPU mode, append --gpu_ids -1.
The output images are stored in ./results/[type]_pretrained/ by default. You can view them using the autogenerated HTML file in that directory.

Train and Test New Models

Prepare dataset.
Change several parameters and then run train_[dataset].sh for training. There are many options you can specify. To select which GPUs to use, pass --gpu_ids; for example, --gpu_ids 1,2 uses the second and third GPUs.
Testing is similar to testing pretrained models. Use --results_dir to specify the output directory and --how_many to set the maximum number of images to generate. By default, the latest checkpoint is loaded; use --which_epoch to change this.
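
To make the --gpu_ids convention concrete, the sketch below shows one common way such a flag is interpreted: a comma-separated list of zero-based indices, with negative values meaning CPU mode. It is an illustration and has not been checked against this repository's option parser; `parse_gpu_ids` and `device_string` are hypothetical names.

```python
def parse_gpu_ids(value):
    """Parse a --gpu_ids string into a list of non-negative GPU indices.

    '1,2' selects the second and third GPUs; '-1' yields an empty list,
    which is treated here as CPU mode.
    """
    ids = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        gpu_id = int(token)
        if gpu_id >= 0:
            ids.append(gpu_id)
    return ids


def device_string(gpu_ids):
    """Name the primary device: the first listed GPU, or the CPU if none."""
    return "cuda:%d" % gpu_ids[0] if gpu_ids else "cpu"


if __name__ == "__main__":
    for flag in ("0", "1,2", "-1"):
        ids = parse_gpu_ids(flag)
        print(flag, ids, device_string(ids))
```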

Evaluation

FID: mseitzer/pytorch-fid
FRD: Ha0Tang/GestureGAN
LPIPS: richzhang/PerceptualSimilarity
DRN: fyu/drn [model: drn-d-105_ms_cityscapes.pth]
UperNet: CSAILVision/semantic-segmentation-pytorch [model: baseline-resnet101-upernet]
DeepLab: kazuto1011/deeplab-pytorch [model: deeplabv2_resnet101_msc-cocostuff164k-100000.pth]

For more details, please refer to this issue.
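
For readers unfamiliar with FID, the metric is the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The sketch below computes only that distance from precomputed feature arrays using NumPy. It is not the code of mseitzer/pytorch-fid: the Inception feature extraction is omitted, and the function names are hypothetical.

```python
import numpy as np


def gaussian_stats(features):
    """Mean vector and covariance matrix of an (N, D) feature array."""
    features = np.asarray(features, dtype=np.float64)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 S2)) for two Gaussians."""
    mu1 = np.asarray(mu1, dtype=np.float64)
    mu2 = np.asarray(mu2, dtype=np.float64)
    sigma1 = np.asarray(sigma1, dtype=np.float64)
    sigma2 = np.asarray(sigma2, dtype=np.float64)
    diff = mu1 - mu2
    # Tr(sqrt(S1 S2)) is the sum of the square roots of the eigenvalues of
    # S1 @ S2, which are real and non-negative for positive semi-definite
    # inputs; clip small negative values caused by rounding.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 8))             # stand-in for real-image features
    fake = rng.normal(loc=0.5, size=(500, 8))    # stand-in for generated-image features
    print(frechet_distance(*gaussian_stats(real), *gaussian_stats(fake)))
```

Lower values indicate that the two feature distributions are closer; identical statistics give a distance of zero.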

Acknowledgments

This source code is inspired by both GauGAN/SPADE and LGGAN.

Related Projects

EdgeGAN | LGGAN | SelectionGAN | PanoGAN | Guided-I2I-Translation-Papers

Citation

If you use this code for your research, please consider giving stars ⭐ and citing our papers 🦖 :

DAGAN

@inproceedings{tang2020dual,
  title={Dual Attention GANs for Semantic Image Synthesis},
  author={Tang, Hao and Bai, Song and Sebe, Nicu},
  booktitle={ACM MM},
  year={2020}
}

EdgeGAN

@article{tang2020edge,
  title={Edge Guided GANs with Semantic Preserving for Semantic Image Synthesis},
  author={Tang, Hao and Qi, Xiaojuan and Xu, Dan and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2003.13898},
  year={2020}
}

LGGAN

@inproceedings{tang2019local,
  title={Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Torr, Philip HS and Sebe, Nicu},
  booktitle={CVPR},
  year={2020}
}

SelectionGAN

@inproceedings{tang2019multi,
  title={Multi-channel attention selection gan with cascaded semantic guidance for cross-view image translation},
  author={Tang, Hao and Xu, Dan and Sebe, Nicu and Wang, Yanzhi and Corso, Jason J and Yan, Yan},
  booktitle={CVPR},
  year={2019}
}

@article{tang2020multi,
  title={Multi-channel attention selection gans for guided image-to-image translation},
  author={Tang, Hao and Xu, Dan and Yan, Yan and Corso, Jason J and Torr, Philip HS and Sebe, Nicu},
  journal={arXiv preprint arXiv:2002.01048},
  year={2020}
}

Contributions

If you have any questions, comments, or bug reports, feel free to open a GitHub issue or pull request, or email the author, Hao Tang ([email protected]).

Collaborations

I'm always interested in meeting new people and hearing about potential collaborations. If you'd like to work together or get in contact with me, please email [email protected]. Some of our projects are listed here.
