Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers (arXiv2021)

Last update: Jan 05, 2023

Related tags

Deep Learning Polyp-PVT

Overview

Polyp-PVT

by Bo Dong, Wenhai Wang, Deng-Ping Fan, Jinpeng Li, Huazhu Fu, & Ling Shao.

This repo is the official implementation of "Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers".

1. Introduction

Polyp-PVT is initially described in arxiv.

Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features; and 2) designing effective mechanism for fusing these features. Different from existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three novel modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), a and similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features. With the help of the SAM, we extend the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT , effectively suppresses noises in the features and significantly improves their expressive capabilities.

Polyp-PVT achieves strong performance on image-level polyp segmentation (0.808 mean Dice and 0.727 mean IoU on ColonDB) and video polyp segmentation (0.880 mean dice and 0.802 mean IoU on CVC-300-TV), surpassing previous models by a large margin.

2. Framework Overview

3. Results

3.1 Image-level Polyp Segmentation

3.2 Image-level Polyp Segmentation Compared Results:

We also provide some result of baseline methods, You could download from Google Drive/Baidu Drive [code:nhhv], including our results and that of compared models.

3.3 Video Polyp Segmentation

3.4 Video Polyp Segmentation Compared Results:

We also provide some result of baseline methods, You could download from Google Drive/Baidu Drive [code:33ie], including our results and that of compared models.

4. Usage:

4.1 Recommended environment:

Python 3.8
Pytorch 1.7.1
torchvision 0.8.2

4.2 Data preparation:

Downloading training and testing datasets and move them into ./dataset/, which can be found in this Google Drive/Baidu Drive [code:dr1h].

4.3 Pretrained model:

You should download the pretrained model from Google Drive/Baidu Drive [code:w4vk], and then put it in the './pretrained_pth' folder for initialization.

4.4 Training:

Clone the repository:

git clone https://github.com/DengPingFan/Polyp-PVT.git
cd Polyp-PVT 
bash train.sh

4.5 Testing:

cd Polyp-PVT 
bash test.sh

4.6 Evaluating your trained model:

Matlab: Please refer to the work of MICCAI2020 (link).

Python: Please refer to the work of ACMMM2021 (link).

Please note that we use the Matlab version to evaluate in our paper.

4.7 Well trained model:

You could download the trained model from Google Drive/Baidu Drive [code:9rpy] and put the model in directory './model_pth'.

4.8 Pre-computed maps:

Google Drive/Baidu Drive [code:x3jc]

5. Citation:

@aticle{dong2021PolypPVT,
  title={Polyp-PVT: Polyp Segmentation with PyramidVision Transformers},
  author={Bo, Dong and Wenhai, Wang and Deng-Ping, Fan and Jinpeng, Li and Huazhu, Fu and Ling, Shao},
  journal={arXiv preprint arXiv:2108.06932},
  year={2021}
}

6. Acknowledgement

We are very grateful for these excellent works PraNet, EAGRNet and MSEG, which have provided the basis for our framework.

7. FAQ:

If you want to improve the usability or any piece of advice, please feel free to contact me directly ([email protected]).

8. License

The source code is free for research and education use only. Any comercial use should get formal permission first.

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers (arXiv2021)

Related tags

Overview

Polyp-PVT

1. Introduction

2. Framework Overview

3. Results

3.1 Image-level Polyp Segmentation

3.2 Image-level Polyp Segmentation Compared Results:

3.3 Video Polyp Segmentation

3.4 Video Polyp Segmentation Compared Results:

4. Usage:

4.1 Recommended environment:

4.2 Data preparation:

4.3 Pretrained model:

4.4 Training:

4.5 Testing:

4.6 Evaluating your trained model:

4.7 Well trained model:

4.8 Pre-computed maps:

5. Citation:

6. Acknowledgement

7. FAQ:

8. License

Owner

Deng-Ping Fan

This is an example of object detection on Micro bacterium tuberculosis using Mask-RCNN

Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation, available for both PyTorch and Tensorflow.

Contextual Attention Network: Transformer Meets U-Net

Python Interview Questions

Use of Attention Gates in a Convolutional Neural Network / Medical Image Classification and Segmentation

Virtual hand gesture mouse using a webcam

Static Features Classifier - A static features classifier for Point-Could clusters using an Attention-RNN model

Code for the published paper : Learning to recognize rare traffic sign

PyTorch implementation of image classification models for CIFAR-10/CIFAR-100/MNIST/FashionMNIST/Kuzushiji-MNIST/ImageNet

MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera

Attention-based Transformation from Latent Features to Point Clouds (AAAI 2022)

Repository for reproducing `Model-Based Robust Deep Learning`

PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

Cereal box identification in store shelves using computer vision and a single train image per model.

Automatic packaging of the open-composite libs for OvGME

The repository forked from NVlabs uses our data. (Differentiable rasterization applied to 3D model simplification tasks)

DeepProbLog is an extension of ProbLog that integrates Probabilistic Logic Programming with deep learning by introducing the neural predicate.

PyTorch implementation for Stochastic Fine-grained Labeling of Multi-state Sign Glosses for Continuous Sign Language Recognition.

Pytorch implementation of our method for high-resolution (e.g. 2048x1024) photorealistic video-to-video translation.