ISTR: End-to-End Instance Segmentation with Transformers (https://arxiv.org/abs/2105.00637)

Last update: Dec 19, 2022


Overview

This is the project page for the paper:

ISTR: End-to-End Instance Segmentation via Transformers,
Jie Hu, Liujuan Cao, Yao Lu, ShengChuan Zhang, Yan Wang, Ke Li, Feiyue Huang, Ling Shao, Rongrong Ji,
arXiv 2105.00637

⭐ Highlights:

GPU Friendly: Four 1080Ti/2080Ti GPUs are enough to train ISTR with R50 or R101 backbones.
High Performance: On COCO test-dev, ISTR-R50-3x gets 46.8/38.6 box/mask AP, and ISTR-R101-3x gets 48.1/39.9 box/mask AP.

Updates

(2021.05.03) The project page for ISTR is available.

Models

Method	Inference speed	Box AP (COCO test-dev)	Mask AP (COCO test-dev)	Download
ISTR-R50-3x	17.8 FPS	46.8	38.6	model | log
ISTR-R101-3x	13.9 FPS	48.1	39.9	model | log

Inference speed is measured on a single 2080Ti GPU.
Backbones are initialized from ImageNet-pretrained models provided by torchvision, except ResNet-101, whose ImageNet-pretrained weights are obtained from SparseR-CNN.
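For reference, the FPS figures in the table can be converted to an average per-image latency (1000 / FPS milliseconds). A minimal sketch of that arithmetic, using only the numbers reported above:

```python
# Inference speed reported in the Models table (frames per second, single 2080Ti).
REPORTED_FPS = {
    "ISTR-R50-3x": 17.8,
    "ISTR-R101-3x": 13.9,
}


def fps_to_ms(fps: float) -> float:
    """Convert throughput in frames per second to average latency in ms per image."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    for name, fps in REPORTED_FPS.items():
        print(f"{name}: {fps} FPS ~ {fps_to_ms(fps):.1f} ms/image")
```

This works out to roughly 56 ms per image for ISTR-R50-3x and 72 ms per image for ISTR-R101-3x. It is the average implied by the FPS figure, not a separately measured latency.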

Installation

The code is built on top of Detectron2, SparseR-CNN, and AdelaiDet.

Requirements

Python=3.8
PyTorch=1.6.0, torchvision=0.7.0, cudatoolkit=10.1
OpenCV for visualization

Steps

Install the repository (we recommend using Anaconda):

conda create -n ISTR python=3.8 -y
conda activate ISTR
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
pip install opencv-python
pip install scipy
pip install shapely
git clone https://github.com/hujiecpp/ISTR.git
cd ISTR
python setup.py build develop
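After installation, it can help to confirm that the environment matches the versions pinned under Requirements. The following is a minimal sketch (not part of the ISTR repository) that compares `torch.__version__`, `torchvision.__version__` and `torch.version.cuda` against those versions. It assumes exact version equality is the desired check and ignores local build suffixes such as `+cu101`.

```python
import importlib

# Versions listed under "Requirements" for this repository.
REQUIRED = {"torch": "1.6.0", "torchvision": "0.7.0", "cuda": "10.1"}


def strip_local(version: str) -> str:
    """Drop a local build suffix such as '+cu101' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(found: dict, required: dict = REQUIRED) -> list:
    """Return messages for components that are missing or differ from the requirement."""
    problems = []
    for name, want in required.items():
        have = found.get(name)
        if have is None:
            problems.append(f"{name}: not found (need {want})")
        elif strip_local(have) != want:
            problems.append(f"{name}: found {have}, need {want}")
    return problems


def installed_versions() -> dict:
    """Collect versions from the current environment; missing packages are omitted."""
    found = {}
    try:
        torch = importlib.import_module("torch")
        found["torch"] = str(torch.__version__)
        if torch.version.cuda is not None:  # None on CPU-only builds
            found["cuda"] = torch.version.cuda
    except ImportError:
        pass
    try:
        found["torchvision"] = str(importlib.import_module("torchvision").__version__)
    except ImportError:
        pass
    return found


if __name__ == "__main__":
    issues = find_mismatches(installed_versions())
    print("\n".join(issues) if issues else "Environment matches the listed requirements.")
```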

Link the COCO dataset path

ln -s /coco_dataset_path/coco ./datasets
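Detectron2's built-in COCO 2017 splits are read relative to a `datasets/` root, which can be overridden with the `DETECTRON2_DATASETS` environment variable. After linking, `./datasets/coco` should therefore contain the image folders and annotation files. The sketch below checks for that standard layout. It assumes the ISTR configs use the `coco_2017_train` / `coco_2017_val` splits, which this page does not state.

```python
import os
from pathlib import Path

# Relative paths read by Detectron2's built-in coco_2017_train / coco_2017_val splits
# (assumption: the ISTR configs use these splits).
EXPECTED = [
    "coco/train2017",
    "coco/val2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
]


def missing_coco_paths(root=None) -> list:
    """List expected COCO paths that do not exist under the dataset root."""
    if root is None:
        # Same default Detectron2 uses when DETECTRON2_DATASETS is unset.
        root = os.getenv("DETECTRON2_DATASETS", "datasets")
    root = Path(root).expanduser()
    # Path.exists() follows symlinks, so a linked ./datasets/coco is checked through the link.
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_coco_paths()
    if missing:
        print("Missing under dataset root:", *missing, sep="\n  ")
    else:
        print("COCO 2017 layout looks complete.")
```

If `./datasets` did not already exist as a directory when `ln -s` was run, the link itself becomes `./datasets` (pointing at the `coco` folder), and this check will report every path as missing.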

Train ISTR (e.g., with a ResNet-50 backbone)

python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml

Evaluate ISTR (e.g., with a ResNet-50 backbone)

python projects/ISTR/train_net.py --num-gpus 4 --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --eval-only MODEL.WEIGHTS ./output/model_final.pth
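Training and evaluation use the same `train_net.py` entry point. The evaluation command adds `--eval-only` plus a trailing `MODEL.WEIGHTS` config override. When scripting several runs, a small helper can assemble either command. The sketch below is hypothetical (`build_command` is not part of ISTR), uses only the flags shown in the two commands above, and assumes it is run from the repository root.

```python
import subprocess
import sys

SCRIPT = "projects/ISTR/train_net.py"


def build_command(config_file, num_gpus=4, eval_only=False, weights=None, opts=None):
    """Assemble the train_net.py command line used in this README.

    Trailing KEY VALUE pairs (e.g. MODEL.WEIGHTS path) are passed as config overrides.
    """
    cmd = [sys.executable, SCRIPT, "--num-gpus", str(num_gpus), "--config-file", config_file]
    if eval_only:
        if weights is None:
            raise ValueError("evaluation needs a checkpoint path for MODEL.WEIGHTS")
        cmd.append("--eval-only")
    overrides = list(opts or [])
    if weights is not None:
        overrides += ["MODEL.WEIGHTS", weights]
    return cmd + overrides


if __name__ == "__main__":
    cfg = "projects/ISTR/configs/ISTR-R50-3x.yaml"
    # Evaluate the final checkpoint written by training.
    subprocess.run(build_command(cfg, eval_only=True, weights="./output/model_final.pth"), check=True)
```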

Visualize the detection and segmentation results (e.g., with a ResNet-50 backbone)

python demo/demo.py --config-file projects/ISTR/configs/ISTR-R50-3x.yaml --input input1.jpg --output ./output --confidence-threshold 0.4 --opts MODEL.WEIGHTS ./output/model_final.pth
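In Detectron2-style demos, `--confidence-threshold` sets the minimum score a predicted instance needs in order to be kept and drawn, so conceptually it acts as a score filter. The sketch below illustrates that idea on a hypothetical list-of-dicts prediction format with toy scores. It is not how ISTR's demo stores outputs (Detectron2 uses `Instances` objects with a `scores` tensor), and whether the real comparison is strict or inclusive may differ.

```python
def filter_by_confidence(predictions, threshold=0.4):
    """Keep predictions whose score is at least `threshold`.

    `predictions` is a hypothetical list of dicts with a "score" key.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold should be a probability in [0, 1]")
    return [p for p in predictions if p["score"] >= threshold]


if __name__ == "__main__":
    # Toy predictions for illustration only.
    preds = [
        {"label": "person", "score": 0.92},
        {"label": "dog", "score": 0.41},
        {"label": "cat", "score": 0.12},
    ]
    kept = filter_by_confidence(preds, threshold=0.4)
    print([p["label"] for p in kept])  # ['person', 'dog']
```

Raising the threshold produces fewer, higher-confidence masks in the output images. Lowering it also shows low-scoring detections.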

Citation

If our paper helps your research, please cite it in your publications:

@article{hu2021ISTR,
  title={ISTR: End-to-End Instance Segmentation via Transformers},
  author={Hu, Jie and Cao, Liujuan and Lu, Yao and Zhang, ShengChuan and Li, Ke and Huang, Feiyue and Shao, Ling and Ji, Rongrong},
  journal={arXiv preprint arXiv:2105.00637},
  year={2021}
}


Owner

Jie Hu
