This repository contains the source code of our work on designing efficient CNNs for computer vision

Overview

Efficient networks for Computer Vision

This repo contains source code of our work on designing efficient networks for different computer vision tasks: (1) Image classification, (2) Object detection, and (3) Semantic segmentation.

Real-time semantic segmentation using ESPNetv2 on iPhone7. See here for iOS application source code using COREML.
Seg demo on iPhone7 Seg demo on iPhone7
Real-time object detection using ESPNetv2
Demo 1
Demo 2 Demo 3

Table of contents

  1. Key highlihgts
  2. Supported networks
  3. Relevant papers
  4. Blogs
  5. Performance comparison
  6. Training receipe
  7. Instructions for segmentation and detection demos
  8. Citation
  9. License
  10. Acknowledgements
  11. Contributions
  12. Notes

Key highlights

  • Object classification on the ImageNet and MS-COCO (multi-label)
  • Semantic Segmentation on the PASCAL VOC and the CityScapes
  • Object Detection on the PASCAL VOC and the MS-COCO
  • Supports PyTorch 1.0
  • Integrated with Tensorboard for easy visualization of training logs.
  • Scripts for downloading different datasets.
  • Semantic segmentation application using ESPNetv2 on iPhone can be found here.

Supported networks

This repo supports following networks:

  • ESPNetv2 (Classification, Segmentation, Detection)
  • DiCENet (Classification, Segmentation, Detection)
  • ShuffleNetv2 (Classification)

Relevant papers

Blogs

Performance comparison

ImageNet

Below figure compares the performance of DiCENet with other efficient networks on the ImageNet dataset. DiCENet outperforms all existing efficient networks, including MobileNetv2 and ShuffleNetv2. More details here

DiCENet performance on the ImageNet

Object detection

Below table compares the performance of our architecture with other detection networks on the MS-COCO dataset. Our network is fast and accurate. More details here

MSCOCO
Image Size FLOPs mIOU FPS
SSD-VGG 512x512 100 B 26.8 19
YOLOv2 544x544 17.5 B 21.6 40
ESPNetv2-SSD (Ours) 512x512 3.2 B 24.54 35

Semantic Segmentation

Below figure compares the performance of ESPNet and ESPNetv2 on two different datasets. Note that ESPNets are one of the first efficient networks that delivers competitive performance to existing networks on the PASCAL VOC dataset, even with low resolution images say 256x256. See here for more details.

Cityscapes PASCAL VOC 2012
Image Size FLOPs mIOU Image Size FLOPs mIOU
ESPNet 1024x512 4.5 B 60.3 512x512 2.2 B 63
ESPNetv2 1024x512 2.7 B 66.2 384x384 0.76 B 68

Training Receipe

Image Classification

Details about training and testing are provided here.

Details about performance of different models are provided here.

Semantic segmentation

Details about training and testing are provided here.

Details about performance of different models are provided here.

Object Detection

Details about training and testing are provided here.

Details about performance of different models are provided here.

Instructions for segmentation and detection demos

To run the segmentation demo, just type:

python segmentation_demo.py

To run the detection demo, run the following command:

python detection_demo.py

OR 

python detection_demo.py --live

For other supported arguments, please see the corresponding files.

Citation

If you find this repository helpful, please feel free to cite our work:

@article{mehta2019dicenet,
Author = {Sachin Mehta and Hannaneh Hajishirzi and Mohammad Rastegari},
Title = {DiCENet: Dimension-wise Convolutions for Efficient Networks},
Year = {2020},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
}

@inproceedings{mehta2018espnetv2,
  title={ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network},
  author={Mehta, Sachin and Rastegari, Mohammad and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2019}
}

@inproceedings{mehta2018espnet,
  title={Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation},
  author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={552--568},
  year={2018}
}

License

By downloading this software, you acknowledge that you agree to the terms and conditions given here.

Acknowledgements

Most of our object detection code is adapted from SSD in pytorch. We thank authors for such an amazing work.

Want to help out?

Thanks for your interest in our work :).

Open tasks that are interesting:

  • Tensorflow implementation. I kind of wanna do this but not getting enough time. If you are interested, drop a message and we can talk about it.
  • Optimizing the EESP and the DiceNet block at CUDA-level.
  • Optimize and port pretrained models across multiple mobile platforms, including Android.
  • Other thoughts are also welcome :).

Notes

Notes about DiCENet paper

This repository contains DiCENet's source code in PyTorch only and you should be able to reproduce the results of v1/v2 of our arxiv paper. To reproduce the results of our T-PAMI paper, you need to incorporate MobileNet tricks in Section 5.3, which are currently not a part of this repository.

Owner
Sachin Mehta
Research Scientist at Apple and Affiliate Assistant Professor at UW
Sachin Mehta
A highly efficient and modular implementation of Gaussian Processes in PyTorch

GPyTorch GPyTorch is a Gaussian process library implemented using PyTorch. GPyTorch is designed for creating scalable, flexible, and modular Gaussian

3k Jan 02, 2023
Model Serving Made Easy

The easiest way to build Machine Learning APIs BentoML makes moving trained ML models to production easy: Package models trained with any ML framework

BentoML 4.4k Jan 08, 2023
Command-line tool for downloading and extending the RedCaps dataset.

RedCaps Downloader This repository provides the official command-line tool for downloading and extending the RedCaps dataset. Users can seamlessly dow

RedCaps dataset 33 Dec 14, 2022
This is an official implementation for "PlaneRecNet".

PlaneRecNet This is an official implementation for PlaneRecNet: A multi-task convolutional neural network provides instance segmentation for piece-wis

yaxu 50 Nov 17, 2022
A Number Recognition algorithm

Paddle-VisualAttention Results_Compared SVHN Dataset Methods Steps GPU Batch Size Learning Rate Patience Decay Step Decay Rate Training Speed (FPS) Ac

1 Nov 12, 2021
Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.

MosaicOS Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation. Introduction M

Cheng Zhang 27 Oct 12, 2022
Multiwavelets-based operator model

Multiwavelet model for Operator maps Gaurav Gupta, Xiongye Xiao, and Paul Bogdan Multiwavelet-based Operator Learning for Differential Equations In Ne

Gaurav 33 Dec 04, 2022
FFTNet vocoder implementation

Unofficial Implementation of FFTNet vocode paper. implement the model. implement tests. overfit on a single batch (sanity check). linearize weights fo

Eren Gölge 81 Dec 08, 2022
A Python library for Deep Graph Networks

PyDGN Wiki Description This is a Python library to easily experiment with Deep Graph Networks (DGNs). It provides automatic management of data splitti

Federico Errica 194 Dec 22, 2022
This code implements constituency parse tree aggregation

README This code implements constituency parse tree aggregation. Folder details code: This folder contains the code that implements constituency parse

Adithya Kulkarni 0 Oct 11, 2021
Time-series-deep-learning - Developing Deep learning LSTM, BiLSTM models, and NeuralProphet for multi-step time-series forecasting of stock price.

Stock Price Prediction Using Deep Learning Univariate Time Series Predicting stock price using historical data of a company using Neural networks for

Abdultawwab Safarji 7 Nov 27, 2022
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [CVPR 2021]

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [BCNet, CVPR 2021] This is the official pytorch implementation of BCNet built on

Lei Ke 434 Dec 01, 2022
Automate issue discovery for your projects against Lightning nightly and releases.

Automated Testing for Lightning EcoSystem Projects Automate issue discovery for your projects against Lightning nightly and releases. You get CPUs, Mu

Pytorch Lightning 41 Dec 24, 2022
This is the repo of the manuscript "Dual-branch Attention-In-Attention Transformer for speech enhancement"

DB-AIAT: A Dual-branch attention-in-attention transformer for single-channel SE

Guochen Yu 68 Dec 16, 2022
Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.

Big Vision This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and

Google Research 701 Jan 03, 2023
Tensorflow implementation of Swin Transformer model.

Swin Transformer (Tensorflow) Tensorflow reimplementation of Swin Transformer model. Based on Official Pytorch implementation. Requirements tensorflow

167 Jan 08, 2023
Train neural network for semantic segmentation (deep lab V3) with pytorch in less then 50 lines of code

Train neural network for semantic segmentation (deep lab V3) with pytorch in 50 lines of code Train net semantic segmentation net using Trans10K datas

17 Dec 19, 2022
[2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval

CONQUER: Contexutal Query-aware Ranking for Video Corpus Moment Retreival PyTorch implementation of CONQUER: Contexutal Query-aware Ranking for Video

Hou zhijian 23 Dec 26, 2022
Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"

Ancient Greek BERT The first and only available Ancient Greek sub-word BERT model! State-of-the-art post fine-tuning on Part-of-Speech Tagging and Mor

Pranaydeep Singh 22 Dec 08, 2022
Utilities and information for the signals.numer.ai tournament

dsignals Utilities and information for the signals.numer.ai tournament using eodhistoricaldata.com eodhistoricaldata.com provides excellent historical

Degerhan Usluel 23 Dec 18, 2022