[CVPR22] Official codebase of Semantic Segmentation by Early Region Proxy.

Last update: Nov 29, 2022

Overview

RegionProxy

Figure 2. Performance vs. GFLOPs on ADE20K val split.

Semantic Segmentation by Early Region Proxy

Yifan Zhang, Bo Pang, Cewu Lu

CVPR 2022 (Poster) [arXiv]

Installation

Note: we recommend installing the exact package versions listed below to avoid runtime issues.

Install PyTorch 1.7.1 and torchvision 0.8.2 following the official guide.
Install timm 0.4.12 and einops:
```
pip install timm==0.4.12 einops
```
This project depends on mmsegmentation 0.17 and mmcv 1.3.13; you may follow the mmsegmentation instructions to set up the environment and prepare the datasets.
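Since version mismatches are the most likely source of trouble, a quick check of the installed packages can save time. The snippet below is a minimal sketch, not part of this codebase: `check_versions` is a hypothetical helper, the distribution name `mmcv-full` is an assumption (your environment may use `mmcv` instead), and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in the installation notes above.
EXPECTED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "timm": "0.4.12",
    "mmcv-full": "1.3.13",  # assumption: mmcv was installed as "mmcv-full"
    "mmsegmentation": "0.17",
}


def check_versions(expected, get_version=version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, want in expected.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None
        # Prefix match, so "0.17" accepts "0.17.0" and "1.7.1" accepts "1.7.1+cu110".
        if found is None or not found.startswith(want):
            problems[name] = (want, found)
    return problems


if __name__ == "__main__":
    for name, (want, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

The script prints nothing when every package matches. The comparison is a plain prefix match, so treat it as a sanity check rather than a strict version resolver.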

Models

ADE20K

backbone	Resolution	FLOPs	#params.	mIoU	mIoU (ms+flip)	FPS	download
ViT-Ti/16	512x512	3.9G	5.8M	42.1	43.1	38.9	[model]
ViT-S/16	512x512	15G	22M	47.6	48.5	32.1	[model]
R26+ViT-S/32	512x512	16G	36M	47.8	49.1	28.5	[model]
ViT-B/16	512x512	59G	87M	49.8	50.5	20.1	[model]
R50+ViT-L/32	640x640	82G	323M	51.0	51.7	12.7	[model]
ViT-L/16	640x640	326G	306M	52.9	53.4	6.6	[model]

Cityscapes

backbone	Resolution	FLOPs	#params.	mIoU	mIoU (ms+flip)	download
ViT-Ti/16	768x768	69G	6M	76.5	77.7	[model]
ViT-S/16	768x768	270G	23M	79.8	81.5	[model]
ViT-B/16	768x768	1064G	88M	81.0	82.2	[model]
ViT-L/16	768x768	-	307M	81.4	82.7	[model]

Evaluation

You may evaluate the model on a single GPU by running:

python test.py \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU

To evaluate on multiple GPUs, run:

python -m torch.distributed.launch --nproc_per_node 8 test.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU

You may add --aug-test to enable multi-scale + flip evaluation. The test.py script is largely copied from mmsegmentation; please refer to this link for further usage (e.g., visualization).

Training

The first step is to prepare the pre-trained weights. Following Segmenter, we use AugReg pre-trained weights for our tiny, small and large models, and DeiT pre-trained weights for our base models. Follow these steps to prepare the pre-trained weights for model initialization:

For the DeiT weights, simply download them from this link. For the AugReg weights, first acquire the timm-style models:
```
import timm
m = timm.create_model('vit_tiny_patch16_384', pretrained=True)
```
The full list of entries can be found here (vanilla ViTs) and here (hybrid models).
Convert the timm models to mmsegmentation style using this script.

We train all models on 8 V100 GPUs. For example, to train RegProxy-Ti/16, run:

python -m torch.distributed.launch --nproc_per_node 8 train.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--work-dir /path/to/workdir \
	--options model.pretrained=/path/to/pretrained/model

You may need to adjust data.samples_per_gpu if you plan to train on fewer GPUs. Please refer to this link for more training options.
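One common way to adjust data.samples_per_gpu is to keep the total batch size (number of GPUs times samples per GPU) the same as in the 8-GPU setup. The sketch below shows that arithmetic only. `samples_per_gpu_for` is a hypothetical helper, and the reference value of 2 in the usage lines is a placeholder: read the actual data.samples_per_gpu from the config you are training with.

```python
def samples_per_gpu_for(num_gpus, *, ref_samples_per_gpu, ref_gpus=8):
    """Per-GPU batch size that keeps ref_gpus * ref_samples_per_gpu unchanged.

    ref_gpus defaults to 8, the GPU count used for the released models.
    ref_samples_per_gpu must be taken from the config file (not assumed here).
    """
    total = ref_gpus * ref_samples_per_gpu
    if num_gpus <= 0 or total % num_gpus != 0:
        raise ValueError(
            f"a total batch size of {total} cannot be split evenly over {num_gpus} GPUs"
        )
    return total // num_gpus


# Placeholder reference value of 2 samples per GPU, for illustration only.
print(samples_per_gpu_for(4, ref_samples_per_gpu=2))  # 4
print(samples_per_gpu_for(2, ref_samples_per_gpu=2))  # 8
```

The resulting value can then be set in the config, or presumably passed on the command line in the same way as model.pretrained above (e.g. --options data.samples_per_gpu=4). Larger per-GPU batches need more memory, so this only works if the batch still fits on each device.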

Citation

@article{zhang2022semantic,
  title={Semantic Segmentation by Early Region Proxy},
  author={Zhang, Yifan and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2203.14043},
  year={2022}
}
