CCNet: Criss-Cross Attention for Semantic Segmentation (TPAMI 2020 & ICCV 2019).

Overview


Paper links: our most recent TPAMI version with improvements and extensions, and the earlier ICCV version.

By Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Chang Huang, Humphrey Shi, Wenyu Liu and Thomas S. Huang.

Updates

2021/02: A pure-Python implementation of CCNet is released on the branch pure-python. Thanks to Serge-weihao.

2019/08: A new version of CCNet is released on the branch Pytorch-1.1, which supports PyTorch 1.0 or later as well as distributed multiprocessing training and testing. The current code implements the Cityscapes experiments from the CCNet ICCV version. Our method is built on the open-source pytorch segmentation toolbox.

2018/12: Updated the code and released trained models with R=1,2. The trained model with R=2 achieves 79.74% mIoU on the val set and 79.01% on the test set with single-scale testing.

2018/11: Code released.

Introduction

Figure: motivation of CCNet.

Long-range dependencies can capture useful contextual information to benefit visual understanding problems. In this work, we propose a Criss-Cross Network (CCNet) to obtain such information in a more effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. With a further recurrent operation, each pixel can finally capture long-range dependencies from all pixels. Overall, our CCNet has the following merits (a minimal sketch of the attention step follows the list):

  • GPU memory friendly
  • High computational efficiency
  • State-of-the-art performance
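
To make the mechanism concrete, here is a minimal PyTorch sketch of a single criss-cross attention step. It follows the paper's formulation (query/key channels reduced to C/8, a joint softmax over the H+W-1 positions of the path, a learnable residual weight gamma); the released code implements this with a dedicated CUDA kernel, so treat this as an illustration rather than the repo's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Sketch of one criss-cross attention step (illustrative, not the
    repo's CUDA implementation). A -inf diagonal mask in the column branch
    keeps the self position from being counted twice, since it appears in
    both the row and the column of the path."""

    def __init__(self, in_channels):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // 8, 1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, 1)
        self.value = nn.Conv2d(in_channels, in_channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        B, C, H, W = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Column branch: every pixel attends to the pixels in its column.
        q_h = q.permute(0, 3, 2, 1).reshape(B * W, H, -1)   # (B*W, H, C/8)
        k_h = k.permute(0, 3, 1, 2).reshape(B * W, -1, H)   # (B*W, C/8, H)
        mask = torch.diag(x.new_full((H,), float('-inf')))  # -inf on the diagonal
        e_h = torch.bmm(q_h, k_h) + mask                    # (B*W, H, H)

        # Row branch: every pixel attends to the pixels in its row.
        q_w = q.permute(0, 2, 3, 1).reshape(B * H, W, -1)   # (B*H, W, C/8)
        k_w = k.permute(0, 2, 1, 3).reshape(B * H, -1, W)   # (B*H, C/8, W)
        e_w = torch.bmm(q_w, k_w)                           # (B*H, W, W)

        # Joint softmax over the whole criss-cross path (H + W - 1 positions).
        e_h = e_h.reshape(B, W, H, H).permute(0, 2, 1, 3)   # (B, H, W, H)
        e_w = e_w.reshape(B, H, W, W)                       # (B, H, W, W)
        attn = F.softmax(torch.cat([e_h, e_w], dim=3), dim=3)
        a_h = attn[..., :H].permute(0, 2, 1, 3).reshape(B * W, H, H)
        a_w = attn[..., H:].reshape(B * H, W, W)

        # Aggregate values along the column and the row, then add the residual.
        v_h = v.permute(0, 3, 1, 2).reshape(B * W, C, H)
        v_w = v.permute(0, 2, 1, 3).reshape(B * H, C, W)
        out_h = torch.bmm(v_h, a_h.transpose(1, 2)).reshape(B, W, C, H).permute(0, 2, 3, 1)
        out_w = torch.bmm(v_w, a_w.transpose(1, 2)).reshape(B, H, C, W).permute(0, 2, 1, 3)
        return self.gamma * (out_h + out_w) + x
```

Each step compares every pixel with only H+W-1 others instead of all HxW, which is where the GPU-memory and computation savings over full non-local attention come from.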

Architecture

Figure: overview of CCNet.

Overview of the proposed CCNet for semantic segmentation. The proposed recurrent criss-cross attention (RCCA) takes feature maps H as input and outputs feature maps H'' that gather rich and dense contextual information from all pixels. The recurrent criss-cross attention module can be unrolled into R=2 loops, in which all criss-cross attention modules share parameters.
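
The recurrence itself is simple; a rough sketch follows, reusing the CrissCrossAttention sketch from the Introduction (the released code additionally wraps the attention with convolutions before and after, which this sketch omits):

```python
import torch
import torch.nn as nn

class RCCA(nn.Module):
    """Sketch of recurrent criss-cross attention: one shared module applied R times."""

    def __init__(self, in_channels, recurrence=2):
        super().__init__()
        self.recurrence = recurrence
        # A single module instance, so all loops share parameters (as in the figure).
        self.cca = CrissCrossAttention(in_channels)

    def forward(self, x):
        for _ in range(self.recurrence):
            x = self.cca(x)
        return x

# Example: feats = RCCA(512, recurrence=2)(torch.randn(1, 512, 97, 97))
```

With R=2, information first flows along each pixel's row and column, then one more step relays it to every remaining position, so each pixel ends up connected to all pixels.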

Visualization of the attention map

Figure: attention maps of RCCA.

To get a deeper understanding of our RCCA, we visualize the learned attention masks as shown in the figure. For each input image, we select one point (green cross) and show its corresponding attention maps for R=1 and R=2 in columns 2 and 3, respectively. With R=1, only contextual information from the criss-cross path of the target point is captured. By adopting one more criss-cross module, i.e. R=2, RCCA aggregates denser and richer contextual information than with R=1. We also observe that the attention module captures semantic similarity and long-range dependencies.
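
If you want to reproduce this kind of visualization for R=1, the attention weights of one pixel cover only its row and column, so they can be scattered back onto an H x W canvas. A rough sketch, assuming the CrissCrossAttention sketch above is modified to also return its attn tensor:

```python
import matplotlib.pyplot as plt
import torch

def show_attention(attn, h, w):
    """attn: (B, H, W, H+W) softmax weights from the sketch above;
    (h, w): the selected target point (the green cross in the figure)."""
    B, H, W, _ = attn.shape
    weights = attn[0, h, w].detach()  # criss-cross path weights of pixel (h, w)
    canvas = torch.zeros(H, W)
    canvas[:, w] = weights[:H]        # column part of the path
    canvas[h, :] += weights[H:]       # row part (includes the target pixel itself)
    plt.imshow(canvas.numpy(), cmap='viridis')
    plt.scatter([w], [h], c='g', marker='x')
    plt.show()
```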

License

CCNet is released under the MIT License (refer to the LICENSE file for details).

Citing CCNet

If you find CCNet useful in your research, please consider citing:

@article{huang2020ccnet,
  author={Huang, Zilong and Wang, Xinggang and Wei, Yunchao and Huang, Lichao and Shi, Humphrey and Liu, Wenyu and Huang, Thomas S.},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  title={CCNet: Criss-Cross Attention for Semantic Segmentation},
  year={2020},
  pages={1-1},
  keywords={Semantic Segmentation;Graph Attention;Criss-Cross Network;Context Modeling},
  doi={10.1109/TPAMI.2020.3007032},
  ISSN={1939-3539}}

@inproceedings{huang2018ccnet,
    title={CCNet: Criss-Cross Attention for Semantic Segmentation},
    author={Huang, Zilong and Wang, Xinggang and Huang, Lichao and Huang, Chang and Wei, Yunchao and Liu, Wenyu},
    booktitle={ICCV},
    year={2019}}

Instructions for Code (2019/08 version):

Requirements

  • PyTorch 0.4.0 or 0.4.1 (for installation, refer to https://github.com/pytorch/pytorch#installation)
  • 4 x 12G GPUs (e.g. TITAN Xp)
  • Python 3.6
  • gcc (GCC) 4.8.5
  • CUDA 8.0

Compiling

# Install PyTorch
$ conda install pytorch torchvision -c pytorch

# Install Apex
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

# Install Inplace-ABN
$ git clone https://github.com/mapillary/inplace_abn.git
$ cd inplace_abn
$ python setup.py install

Dataset and pretrained model

Please download the Cityscapes dataset and unzip it into YOUR_CS_PATH.

Please download the MIT ImageNet-pretrained model resnet101-imagenet.pth and put it into the dataset folder.

Training and Evaluation

Training script.

python train.py --data-dir ${YOUR_CS_PATH} --random-mirror --random-scale --restore-from ./dataset/resnet101-imagenet.pth --gpu 0,1,2,3 --learning-rate 1e-2 --input-size 769,769 --weight-decay 1e-4 --batch-size 8 --num-steps 60000 --recurrence 2

[Recommended] You can also enable the OHEM flag to reduce the performance gap between the val and test sets.

python train.py --data-dir ${YOUR_CS_PATH} --random-mirror --random-scale --restore-from ./dataset/resnet101-imagenet.pth --gpu 0,1,2,3 --learning-rate 1e-2 --input-size 769,769 --weight-decay 1e-4 --batch-size 8 --num-steps 60000 --recurrence 2 --ohem 1 --ohem-thres 0.7 --ohem-keep 100000
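
For reference, the flags presumably map onto a thresholded pixel-wise hard-example-mining loss: pixels whose predicted ground-truth probability exceeds --ohem-thres are dropped, while at least --ohem-keep pixels are kept. A rough sketch of such a loss (an assumption about the repo's exact implementation, written against current PyTorch):

```python
import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, target, thresh=0.7, min_kept=100000, ignore_index=255):
    """Sketch of pixel-wise OHEM: train only on 'hard' pixels, i.e. those whose
    predicted probability for the ground-truth class is at most `thresh`."""
    pixel_loss = F.cross_entropy(logits, target, ignore_index=ignore_index,
                                 reduction='none').view(-1)
    with torch.no_grad():
        valid = target != ignore_index
        safe_target = target.clone()
        safe_target[~valid] = 0                            # avoid out-of-range gather
        prob = F.softmax(logits, dim=1)
        gt_prob = prob.gather(1, safe_target.unsqueeze(1)).squeeze(1)
        gt_prob[~valid] = 1.0                              # ignored pixels are never hard
        flat_prob, flat_valid = gt_prob.view(-1), valid.view(-1)
        # Raise the threshold if fewer than min_kept pixels fall below it.
        sorted_prob, _ = flat_prob[flat_valid].sort()
        if sorted_prob.numel() > min_kept:
            thresh = max(thresh, sorted_prob[min_kept - 1].item())
        keep = flat_valid & (flat_prob <= thresh)
    return pixel_loss[keep].mean()
```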

Evaluation script.

python evaluate.py --data-dir ${YOUR_CS_PATH} --restore-from snapshots/CS_scenes_60000.pth --gpu 0 --recurrence 2

All in one.

./run_local.sh YOUR_CS_PATH

Models

We run CCNet with R=1,2 on the Cityscapes dataset three times each and report the results in the following table. Please note that there is a 1-2% accuracy gap between the validation and test sets; you need to run multiple times to achieve a small gap, or turn on the OHEM flag. Turning on the OHEM flag also improves performance on the val set. In general, we recommend using OHEM during training.

We train all models on the fine training set and use single-scale testing. The trained model with R=2 (79.74 on val) also achieves about 79.01 mIoU on the Cityscapes test set with single-scale testing (to save time, we use the whole image as input).

R        mIoU on Cityscapes val set (single scale, three runs)   Link
1        77.31 / 77.91 / 76.89                                   77.91
2        79.74 / 79.22 / 78.40                                   79.74
2+OHEM   78.67 / 80.00 / 79.83                                   80.00

Acknowledgment

We thank NSFC, ARC DECRA DE190101315, ARC DP200100938, the HUST-Horizon Computer Vision Research Center, and the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR).

Thanks to the Third Party Libs

Self-attention related methods:

  • Object Context Network
  • Dual Attention Network

Semantic segmentation toolboxes:

  • pytorch-segmentation-toolbox
  • semantic-segmentation-pytorch
  • PyTorch-Encoding
