YolactEdge: Real-time Instance Segmentation on the Edge

Overview

YolactEdge: Real-time Instance Segmentation on the Edge

██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗    ███████╗██████╗  ██████╗ ███████╗
╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝    ██╔════╝██╔══██╗██╔════╝ ██╔════╝
 ╚████╔╝ ██║   ██║██║     ███████║██║        ██║       █████╗  ██║  ██║██║  ███╗█████╗  
  ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║       ██╔══╝  ██║  ██║██║   ██║██╔══╝  
   ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║       ███████╗██████╔╝╚██████╔╝███████╗
   ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝       ╚══════╝╚═════╝  ╚═════╝ ╚══════╝

YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti) with a ResNet-101 backbone on 550x550 resolution images. This is the code for our paper.

For a real-time demo and more samples, check out our demo video.

example-gif-1

example-gif-2

example-gif-3

Installation

See INSTALL.md.

Model Zoo

We provide baseline YOLACT and YolactEdge models trained on COCO and YouTube VIS (our sub-training split, with COCO joint training).

To evalute the model, put the corresponding weights file in the ./weights directory and run one of the following commands.

YouTube VIS models:

Method Backbone  mAP AGX-Xavier FPS RTX 2080 Ti FPS weights
YOLACT R-50-FPN 44.7 8.5 59.8 download | mirror
YolactEdge
(w/o TRT)
R-50-FPN 44.2 10.5 67.0 download | mirror
YolactEdge R-50-FPN 44.0 32.4 177.6 download | mirror
YOLACT R-101-FPN 47.3 5.9 42.6 download | mirror
YolactEdge
(w/o TRT)
R-101-FPN 46.9 9.5 61.2 download | mirror
YolactEdge R-101-FPN 46.2 30.8 172.7 download | mirror

COCO models:

Method    Backbone     mAP Titan Xp FPS AGX-Xavier FPS RTX 2080 Ti FPS weights
YOLACT MobileNet-V2 22.1 - 15.0 35.7 download | mirror
YolactEdge MobileNet-V2 20.8 - 35.7 161.4 download | mirror
YOLACT R-50-FPN 28.2 42.5 9.1 45.0 download | mirror
YolactEdge R-50-FPN 27.0 - 30.7 140.3 download | mirror
YOLACT R-101-FPN 29.8 33.5 6.6 36.5 download | mirror
YolactEdge R-101-FPN 29.5 - 27.3 124.8 download | mirror

Getting Started

Follow the installation instructions to set up required environment for running YolactEdge.

See instructions to evaluate and train with YolactEdge.

Colab Notebook

Try out our Colab Notebook with a live demo to learn about basic usage.

If you are interested in evaluating YolactEdge with TensorRT, we provide another Colab Notebook with TensorRT environment configuration on Colab.

Evaluation

Quantitative Results

# Convert each component of the trained model to TensorRT using the optimal settings and evaluate on the YouTube VIS validation set (our split).
python3 eval.py --trained_model=./weights/yolact_edge_vid_847_50000.pth

# Evaluate on the entire COCO validation set.
python3 eval.py --trained_model=./weights/yolact_edge_54_800000.pth

# Output a COCO JSON file for the COCO test-dev. The command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively. These files can then be submitted to the website for evaluation.
python3 eval.py --trained_model=./weights/yolact_edge_54_800000.pth --dataset=coco2017_testdev_dataset --output_coco_json

Qualitative Results

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.3.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --display

Benchmarking

# Benchmark the trained model on the COCO validation set.
# Run just the raw model on the first 1k images of the validation set
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --benchmark --max_images=1000

Notes

Inference using models trained with YOLACT

If you have a pre-trained model with YOLACT, and you want to take advantage of either TensorRT feature of YolactEdge, simply specify the --config=yolact_edge_config in command line options, and the code will automatically detect and convert the model weights to be compatible.

python3 eval.py --config=yolact_edge_config --trained_model=./weights/yolact_base_54_800000.pth

Inference without Calibration

If you want to run inference command without calibration, you can either run with FP16-only TensorRT optimization, or without TensorRT optimization with corresponding configs. Refer to data/config.py for examples of such configs.

# Evaluate YolactEdge with FP16-only TensorRT optimization with '--use_fp16_tensorrt' option (replace all INT8 optimization with FP16).
python3 eval.py --use_fp16_tensorrt --trained_model=./weights/yolact_edge_54_800000.pth

# Evaluate YolactEdge without TensorRT optimization with '--disable_tensorrt' option.
python3 eval.py --disable_tensorrt --trained_model=./weights/yolact_edge_54_800000.pth

Images

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder

Video

# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
# If video_multiframe > 1, then the trt_batch_size should be increased to match it or surpass it. 
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --trt_batch_size 2 --video=0

# Process a video and save it to another file. This is unoptimized.
python eval.py --trained_model=weights/yolact_edge_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4

Use the help option to see a description of all available command line arguments:

python eval.py --help

Training

Make sure to download the entire dataset using the commands above.

  • To train, grab an imagenet-pretrained model and put it in ./weights.
    • For Resnet101, download resnet101_reducedfc.pth from here.
    • For Resnet50, download resnet50-19c8e357.pth from here.
    • For MobileNetV2, download mobilenet_v2-b0353104.pth from here.
  • Run one of the training commands below.
    • Note that you can press ctrl+c while training and it will save an *_interrupt.pth file at the current iteration.
    • All weights are saved in the ./weights directory by default with the file name __.pth.
# Trains using the base edge config with a batch size of 8 (the default).
python train.py --config=yolact_edge_config

# Resume training yolact_edge with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_edge_config --resume=weights/yolact_edge_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help

Training on video dataset

# Pre-train the image based model
python train.py --config=yolact_edge_youtubevis_config

# Train the flow (warping) module
python train.py --config=yolact_edge_vid_trainflow_config --resume=./weights/yolact_edge_youtubevis_847_50000.pth

# Fine tune the network jointly
python train.py --config=yolact_edge_vid_config --resume=./weights/yolact_edge_vid_trainflow_144_100000.pth

Custom Datasets

You can also train on your own dataset by following these steps:

  • Depending on the type of your dataset, create a COCO-style (image) or YTVIS-style (video) Object Detection JSON annotation file for your dataset. The specification for this can be found here for COCO and YTVIS respectively. Note that we don't use some fields, so the following may be omitted:
    • info
    • liscense
    • Under image: license, flickr_url, coco_url, date_captured
    • categories (we use our own format for categories, see below)
  • Create a definition for your dataset under dataset_base in data/config.py (see the comments in dataset_base for an explanation of each field):
my_custom_dataset = dataset_base.copy({
    'name': 'My Dataset',

    'train_images': 'path_to_training_images',
    'train_info':   'path_to_training_annotation',

    'valid_images': 'path_to_validation_images',
    'valid_info':   'path_to_validation_annotation',

    'has_gt': True,
    'class_names': ('my_class_id_1', 'my_class_id_2', 'my_class_id_3', ...),

    # below is only needed for YTVIS-style video dataset.

    # whether samples all frames or key frames only.
    'use_all_frames': False,

    # the following four lines define the frame sampling strategy for the given dataset.
    'frame_offset_lb': 1,
    'frame_offset_ub': 4,
    'frame_offset_multiplier': 1,
    'all_frame_direction': 'allway',

    # 1 of K frames is annotated
    'images_per_video': 5,

    # declares a video dataset
    'is_video': True
})
  • Note that: class IDs in the annotation file should start at 1 and increase sequentially on the order of class_names. If this isn't the case for your annotation file (like in COCO), see the field label_map in dataset_base.
  • Finally, in yolact_edge_config in the same file, change the value for 'dataset' to 'my_custom_dataset' or whatever you named the config object above. Then you can use any of the training commands in the previous section.

Citation

If you use this code base in your work, please consider citing:

@article{yolactedge,
  author    = {Haotian Liu and Rafael A. Rivera Soto and Fanyi Xiao and Yong Jae Lee},
  title     = {YolactEdge: Real-time Instance Segmentation on the Edge (Jetson AGX Xavier: 30 FPS, RTX 2080 Ti: 170 FPS)},
  journal   = {arXiv preprint arXiv:2012.12259},
  year      = {2020},
}
@inproceedings{yolact-iccv2019,
  author    = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
  title     = {YOLACT: {Real-time} Instance Segmentation},
  booktitle = {ICCV},
  year      = {2019},
}

Contact

For questions about our paper or code, please contact Haotian Liu or Rafael A. Rivera-Soto.

Owner
Haotian Liu
Haotian Liu
Implement slightly different caffe-segnet in tensorflow

Tensorflow-SegNet Implement slightly different (see below for detail) SegNet in tensorflow, successfully trained segnet-basic in CamVid dataset. Due t

Tseng Kuan Lun 364 Oct 27, 2022
Western-3DSlicer-Modules - Point-Set Registrations for Ultrasound Probe Calibrations

Point-Set Registrations for Ultrasound Probe Calibrations -Undergraduate Thesis-

Matteo Tanzi 0 May 04, 2022
3D cascade RCNN for object detection on point cloud

3D Cascade RCNN This is the implementation of 3D Cascade RCNN: High Quality Object Detection in Point Clouds. We designed a 3D object detection model

Qi Cai 22 Dec 02, 2022
Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation

Orange Chicken: Data-driven Model Generalizability in Crosslinguistic Low-resource Morphological Segmentation This repository contains code and data f

Zoey Liu 0 Jan 07, 2022
A curated list of resources for Image and Video Deblurring

A curated list of resources for Image and Video Deblurring

Subeesh Vasu 1.7k Jan 01, 2023
for a paper about leveraging discourse markers for training new models

TSLM-DISCOURSE-MARKERS Scope This repository contains: (1) Code to extract discourse markers from wikipedia (TSA). (1) Code to extract significant dis

International Business Machines 6 Nov 02, 2022
Official PyTorch implementation of Spatial Dependency Networks.

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling Đorđe Miladinović   Aleksandar Stanić   Stefan Bauer   Jürgen Schmid

Djordje Miladinovic 34 Jan 19, 2022
FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware.

FIRM-AFL FIRM-AFL is the first high-throughput greybox fuzzer for IoT firmware. FIRM-AFL addresses two fundamental problems in IoT fuzzing. First, it

356 Dec 23, 2022
PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch.

snn-localization repo PyTorch implementation of Spiking Neural Networks trained on surrogate gradient & BPTT using snntorch. Install Dependencies Orig

Sami BARCHID 1 Jan 06, 2022
Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021

Spatial-Temporal Transformer for Dynamic Scene Graph Generation Pytorch Implementation of our paper Spatial-Temporal Transformer for Dynamic Scene Gra

Yuren Cong 119 Jan 01, 2023
Python Environment for Bayesian Learning

Pebl is a python library and command line application for learning the structure of a Bayesian network given prior knowledge and observations. Pebl in

Abhik Shah 103 Jul 14, 2022
Management Dashboard for Torchserve

Torchserve Dashboard Torchserve Dashboard using Streamlit Related blog post Usage Additional Requirement: torchserve (recommended:v0.5.2) Simply run:

Ceyda Cinarel 103 Dec 10, 2022
Gems & Holiday Package Prediction

Predictive_Modelling Gems & Holiday Package Prediction This project is based on 2 cases studies : Gems Price Prediction and Holiday Package prediction

Avnika Mehta 1 Jan 27, 2022
Coded illumination for improved lensless imaging

CodedCam Coded Illumination for Improved Lensless Imaging Paper | Supplementary results | Data and Code are available. Coded illumination for improved

Computational Sensing and Information Processing Lab 1 Nov 29, 2021
PyTorch implementation of Higher Order Recurrent Space-Time Transformer

Higher Order Recurrent Space-Time Transformer (HORST) This is the official PyTorch implementation of Higher Order Recurrent Space-Time Transformer. Th

13 Oct 18, 2022
CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation

CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation (CVPR 2021, oral presentation) CoCosNet v2: Full-Resolution Correspondence

Microsoft 308 Dec 07, 2022
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)

Poplar Implementation of Bundle Adjustment using Gaussian Belief Propagation on Graphcore's IPU Implementation of CVPR 2020 paper: Bundle Adjustment o

Joe Ortiz 34 Dec 05, 2022
code for "Feature Importance-aware Transferable Adversarial Attacks"

Feature Importance-aware Attack(FIA) This repository contains the code for the paper: Feature Importance-aware Transferable Adversarial Attacks (ICCV

Hengchang Guo 44 Nov 24, 2022
The official implementation of NeurIPS 2021 paper: Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

The official implementation of NeurIPS 2021 paper: Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

machen 11 Nov 27, 2022
Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021)

Towards Flexible Blind JPEG Artifacts Removal (FBCNN, ICCV 2021) Jiaxi Jiang, Kai Zhang, Radu Timofte Computer Vision Lab, ETH Zurich, Switzerland 🔥

Jiaxi Jiang 282 Jan 02, 2023