A-ESRGAN aims to provide better super-resolution images by using multi-scale attention U-net discriminators.

Last update: Dec 16, 2022

Overview

A-ESRGAN: Training Real-World Blind Super-Resolution with Attention-based U-net Discriminators

The authors' names are withheld for double-blind review.

Main idea

We introduce the attention U-net into blind real-world image super-resolution. We aim to provide a super-resolution method with sharper results and less distortion.

Sharper:

Less distortion:

Network Architecture

The overall architecture of the A-ESRGAN, where the generator is adopted from ESRGAN:

The architecture of a single attention U-net discriminator:

The attention block is modified from 3D attention U-net's attention gate:
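To make the idea of an attention gate concrete, here is a minimal pure-Python sketch of a generic additive attention gate in the style of Attention U-Net. It is not the authors' modified block: the function names, weight shapes, and the assumption that the gating signal has already been resampled to the resolution of the skip features are all illustrative choices.

```python
import math


def conv1x1(weights, vec):
    """Apply a 1x1 convolution at one pixel, i.e. a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def attention_gate(x, g, w_x, w_g, psi):
    """Generic additive attention gate (illustrative, not the paper's block).

    x   : skip features, nested lists of shape H x W x C_x
    g   : gating features, H x W x C_g (assumed already at x's resolution)
    w_x : C_int x C_x weights of a hypothetical 1x1 convolution
    w_g : C_int x C_g weights of a hypothetical 1x1 convolution
    psi : C_int weights that collapse the hidden vector to one score
    Returns (gated features, attention map).
    """
    gated, alpha_map = [], []
    for row_x, row_g in zip(x, g):
        gated_row, alpha_row = [], []
        for px, pg in zip(row_x, row_g):
            # Project both inputs to a shared space, add them, apply ReLU.
            hidden = [max(0.0, a + b)
                      for a, b in zip(conv1x1(w_x, px), conv1x1(w_g, pg))]
            # Collapse to one score per pixel and squash it into (0, 1).
            score = sum(p * h for p, h in zip(psi, hidden))
            alpha = 1.0 / (1.0 + math.exp(-score))
            alpha_row.append(alpha)
            # Scale the skip features by the attention coefficient.
            gated_row.append([alpha * v for v in px])
        gated.append(gated_row)
        alpha_map.append(alpha_row)
    return gated, alpha_map


# A 1 x 2 image with 2 skip channels and 1 gating channel (made-up values).
features, attention = attention_gate(
    x=[[[1.0, 2.0], [3.0, 4.0]]],
    g=[[[0.0], [1.0]]],
    w_x=[[0.5, -0.5]],
    w_g=[[1.0]],
    psi=[2.0],
)
```

The attention map returned here is the kind of per-pixel coefficient that the visualization scripts render as heat maps.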

Attention Map

We argue that the attention map plays the main role in improving the quality of the super-resolved images. To support this, we visualize how the attention coefficients change over time and across space.

We argue that during training the attention gradually focuses on regions where color changes abruptly, i.e. edges, and that attention layers at different depths capture edges of different granularity.

Attention coefficients change over time.

Attention coefficients change across space.

Multi Scale

A multi-scale discriminator has to learn whether parts of the image are clear enough from different receptive fields. From this perspective, the different discriminators can learn complementary knowledge. As the figure below shows, the normal discriminator learns to focus on edges, while the down-sampled discriminator learns patch-like patterns such as textures.

Thus, compared with a single attention U-net discriminator, training with multi-scale U-net discriminators yields more realistic and detailed images.
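The multi-scale setup can be sketched as follows: one discriminator sees the image at its original resolution and a second one sees a down-sampled copy. This is a simplified illustration, not the repository's code. The down-sampling factor of 2, the average pooling, and the two discriminator callables are assumptions made for the example.

```python
def downsample2x(img):
    """Average-pool a grayscale image (list of rows) by a factor of 2.

    An odd trailing row or column is cropped.
    """
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]


def multi_scale_outputs(img, d_normal, d_down):
    """Run two discriminators on the same image at two scales.

    d_normal and d_down are hypothetical callables that map an image to a
    per-pixel realness map, as a U-net discriminator would.
    """
    return d_normal(img), d_down(downsample2x(img))
```

Because the second discriminator receives a coarser image, each of its output pixels covers a larger region of the original, which is one way to read the "different receptive fields" argument above.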

Better Texture:

Test Sets

The test datasets for our A-ESRGAN model are the standard benchmark datasets Set5, Set14, BSD100, Sun-Hays80, and Urban100. Note that we directly apply 4X super resolution to the original real-world images and use NIQE to measure the perceptual quality of the results. As shown in the figure below, these 5 datasets cover a large variety of images.

A combined dataset can be found in DatasetsForSR.zip.

We compare with ESRGAN, RealSR, BSRGAN, and RealESRGAN on the above 5 datasets, using NIQE as the metric. The results are shown in the table below:

Note that a lower NIQE score indicates better perceptual quality.

Quick Use

Inference Script

! Only 4X super resolution is provided for now.

Download the pre-trained model A_ESRGAN_Single.pth to experiments/pretrained_models:

wget https://github.com/aergan/A-ESRGAN/releases/download/v1.0.0/A_ESRGAN_Single.pth -P experiments/pretrained_models

Inference:

python inference_aesrgan.py --model_path=experiments/pretrained_models/A_ESRGAN_Single.pth --input=inputs

Results are in the results folder.

NIQE Script

The NIQE script reports the mean NIQE score over a directory of images.

Calculate the NIQE score:

cd NIQE_Script
python niqe.py --path=../results
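The averaging step can be sketched like this. It is not the contents of niqe.py: the function name, the set of file extensions, and the score_fn callable (a stand-in for whatever NIQE implementation is used) are assumptions for illustration.

```python
from pathlib import Path

# Extensions treated as images; an assumption, adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def mean_score(directory, score_fn):
    """Average a per-image score over all image files in a directory.

    score_fn is a hypothetical callable taking a Path and returning a float,
    e.g. a wrapper around an NIQE implementation.
    """
    paths = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not paths:
        raise ValueError(f"no image files found in {directory}")
    return sum(score_fn(p) for p in paths) / len(paths)
```

Since a lower NIQE score is better, the method with the smallest mean on a dataset is the one ranked first.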

Visualization Script

The visualization scripts show the attention coefficients of each attention layer in the attention-based U-net discriminator. There are two scripts: discriminator_attention_visual(Single).py visualizes how the attention of each layer evolves during training on a given image, and combine.py overlays the heat maps on the original image.

Generate heat maps:

First download single.zip and unzip to experiments/pretrained_models/single

cd Visualization_Script
python "discriminator_attention_visual(Single).py" --img_path=../inputs/img_015_SRF_4_HR.png

The heat maps are written to Visualization_Script/Visual.

To see how a heat map looks when combined with the original image, run:

python combine.py --img_path=../inputs/img_015_SRF_4_HR.png

The combined images are written to Visualization_Script/Combined.
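Combining a heat map with an image usually means alpha blending. The sketch below shows that idea in pure Python; it is an assumption about the general technique, not a description of what combine.py does. The blue-to-red color ramp, the opacity parameter, and both function names are illustrative.

```python
def heat_color(a):
    """Map an attention value in [0, 1] to RGB, blue (low) to red (high)."""
    a = min(1.0, max(0.0, a))
    return (round(255 * a), 0, round(255 * (1 - a)))


def overlay(image, attention, opacity=0.5):
    """Alpha-blend a heat map over an RGB image of the same size.

    image     : H x W nested lists of (r, g, b) tuples with values 0..255
    attention : H x W nested lists of floats in [0, 1]
    opacity   : weight of the heat map; 0 keeps the image, 1 shows only heat
    """
    out = []
    for img_row, att_row in zip(image, attention):
        out.append([
            tuple(round((1 - opacity) * c + opacity * h)
                  for c, h in zip(pixel, heat_color(a)))
            for pixel, a in zip(img_row, att_row)
        ])
    return out
```

With a moderate opacity, image edges stay visible underneath the colors, which makes it easier to check whether high attention lines up with edges.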

! Multi-scale discriminator attention map visualization:

Download multi.zip and unzip to experiments/pretrained_models/multi

Run discriminator_attention_visual(Mulit).py in the same way as discriminator_attention_visual(Single).py.

! See what the multi-scale discriminators output:

Run Multi_discriminator_Output.py to see a visualization of the pixel-wise loss from the discriminators.

! Note that we haven't provided a combine script for the multi-scale attention maps yet.

Model_Zoo

The following models are the generators used in A-ESRGAN:

A_ESRGAN_Multi.pth: X4 model trained with multi-scale U-net based discriminators.
A_ESRGAN_Single.pth: X4 model trained with a single U-net based discriminator.
RealESRNet_x4plus.pth: official Real-ESRNet model (X4), from which A-ESRGAN is fine-tuned.

The following models are discriminators, which are usually used for fine-tuning.

The following models are checkpoints of the discriminators saved during A-ESRGAN training, provided for attention visualization.

Training and Finetuning on your own dataset

We follow the same settings as RealESRGAN; a detailed guide can be found in Training.md.

Acknowledgement

Our implementation of A-ESRGAN is based on BasicSR and Real-ESRGAN.


Comments

About the pre-trained model

Hi, is the A-ESRGAN-multi pre-trained model available?

The link below seems broken.

https://github.com/aergan/A-ESRGAN/releases/download/v1.0.0/A_ESRGAN_Multi.pth

opened by ShiinaMitsuki 1
some error

/media/xyt/software/anaconda3/envs/basicSR/bin/python /media/xyt/data/github/SR/code/A-ESRGAN/train.py -opt options/train_aesrgan_x4plus.yml --debug
2022-02-09 18:17:12,962 INFO: Dataset [RealESRGANDataset] - DF2K is built.
2022-02-09 18:17:12,962 INFO: Training statistics:
    Number of train images: 500
    Dataset enlarge ratio: 1
    Batch size per gpu: 6
    World size (gpu number): 1
    Require iter number per epoch: 84
    Total epochs: 4762; iters: 400000.
Traceback (most recent call last):
  File "/media/xyt/data/github/SR/code/A-ESRGAN/train.py", line 11, in <module>
    train_pipeline(root_path)
  File "/media/xyt/software/anaconda3/envs/basicSR/lib/python3.7/site-packages/basicsr/train.py", line 128, in train_pipeline
    model = build_model(opt)
  File "/media/xyt/software/anaconda3/envs/basicSR/lib/python3.7/site-packages/basicsr/models/__init__.py", line 27, in build_model
    model = MODEL_REGISTRY.get(opt['model_type'])(opt)
  File "/media/xyt/software/anaconda3/envs/basicSR/lib/python3.7/site-packages/basicsr/utils/registry.py", line 65, in get
    raise KeyError(f"No object named '{name}' found in '{self._name}' registry!")
KeyError: "No object named 'RealESRGANModel' found in 'model' registry!"

opened by xiayutong 1
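For readers hitting the same KeyError, the sketch below shows the general registry pattern behind that message. It is a simplified stand-in, not BasicSR's actual code, and it does not diagnose this particular report. It only illustrates that a lookup fails until the code that registers the class has run, which typically happens when the defining module is imported.

```python
class Registry:
    """Simplified name-to-class registry (illustrative only)."""

    def __init__(self, name):
        self._name = name
        self._obj_map = {}

    def register(self, obj):
        # Used as a decorator: store the class under its own name.
        self._obj_map[obj.__name__] = obj
        return obj

    def get(self, name):
        obj = self._obj_map.get(name)
        if obj is None:
            raise KeyError(
                f"No object named '{name}' found in '{self._name}' registry!")
        return obj


MODEL_REGISTRY = Registry("model")


def is_registered(name):
    """Return True if the registry can resolve the given name."""
    try:
        MODEL_REGISTRY.get(name)
        return True
    except KeyError:
        return False


# Before the class definition has run, the lookup fails.
before = is_registered("RealESRGANModel")


@MODEL_REGISTRY.register
class RealESRGANModel:
    """Stand-in class; the real model lives in another package."""


# After the decorator has executed, the same lookup succeeds.
after = is_registered("RealESRGANModel")
```

Under this pattern, a possible cause of the error is that the package defining the model was never imported, or that an installed version does not define it. Checking the installed package versions against the repository's requirements is a reasonable first step.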


Releases(v1.0.0)

v1.0.0(Dec 11, 2021)
