CVNets: A library for training computer vision networks

This repository contains the source code for training computer vision models. Specifically, it contains the source code of the MobileViT paper for the following tasks:

Image classification on the ImageNet dataset
Object detection using SSD
Semantic segmentation using DeepLabv3

Note: Any image classification backbone can be used with the object detection and semantic segmentation models.

Training can be done with two samplers:

Standard distributed sampler
Multi-scale distributed sampler

We recommend using the multi-scale sampler, as it improves generalization and leads to better performance. See the MobileViT paper for details.
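The idea behind a multi-scale sampler can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the CVNets implementation: the function names, the list of resolutions, and the reference batch size are all assumptions made for the example, and it ignores the sharding of indices across GPUs that a distributed sampler would perform. Each batch gets one randomly chosen resolution, and the batch size is scaled so that the number of pixels per batch stays roughly constant.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Resolution = Tuple[int, int]  # (height, width)


def batch_size_for(res: Resolution, ref_res: Resolution, ref_batch_size: int) -> int:
    """Scale the batch size so that pixels per batch stay roughly constant."""
    ref_pixels = ref_res[0] * ref_res[1]
    return max(1, (ref_pixels * ref_batch_size) // (res[0] * res[1]))


def multi_scale_batches(
    num_samples: int,
    # Hypothetical resolution set, chosen only for illustration.
    resolutions: Sequence[Resolution] = (
        (160, 160), (192, 192), (256, 256), (288, 288), (320, 320),
    ),
    ref_batch_size: int = 32,  # assumed batch size at the largest resolution
    seed: int = 0,
) -> Iterator[Tuple[Resolution, List[int]]]:
    """Yield (resolution, sample_indices) pairs covering every sample once."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)

    # Use the largest resolution as the reference for batch-size scaling.
    ref_res = max(resolutions, key=lambda r: r[0] * r[1])

    start = 0
    while start < num_samples:
        res = rng.choice(list(resolutions))
        size = batch_size_for(res, ref_res, ref_batch_size)
        # The final batch may be shorter than `size`.
        yield res, indices[start:start + size]
        start += size


if __name__ == "__main__":
    for res, batch in multi_scale_batches(num_samples=500):
        print(res, len(batch))
```

In a real training loop, every image in a batch would be resized to that batch's resolution before being stacked into a tensor. Smaller resolutions get larger batches, which keeps memory use and per-step compute roughly balanced.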

Installation

CVNets can be installed in a local Python environment using the commands below:

    git clone git@github.com:apple/ml-cvnets.git
    cd ml-cvnets
    pip install -r requirements.txt
    pip install --editable .

We recommend using Python 3.6+ and PyTorch (version >= v1.8.0) in a conda environment. For setting up a Python environment with conda, see here.
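After installation, a quick check can confirm that the environment meets these recommendations. The helper below is a hypothetical convenience script, not part of CVNets. It compares versions numerically rather than as strings, so that a version such as 1.10.0 is correctly treated as newer than 1.8.0, and it ignores local build suffixes such as `+cu111`.

```python
import re
import sys
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract leading numeric components, e.g. '1.8.0+cu111' -> (1, 8, 0)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if `version` is at least `minimum`."""
    parts = parse_version(version)
    # Pad short versions with zeros so that '2.1' compares equal to (2, 1, 0).
    parts = parts + (0,) * (len(minimum) - len(parts))
    return parts >= minimum


if __name__ == "__main__":
    print("Python OK:", sys.version_info >= (3, 6))
    try:
        import torch  # only needed for the check itself
        print("PyTorch OK:", meets_minimum(torch.__version__, (1, 8, 0)))
    except ImportError:
        print("PyTorch is not installed")
```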

Getting Started

General instructions for training and evaluating different models are given here.
Examples for training and evaluating a specific model are provided in the examples folder. Right now, we support the following models.
For converting PyTorch models to CoreML, see README-pytorch-to-coreml.md.

Citation

If you find our work useful, please cite the following paper:

@article{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  journal={arXiv preprint arXiv:2110.02178},
  year={2021}
}
