A PyTorch-Based Framework for Deep Learning in Computer Vision

Last update: Jan 09, 2023

Overview

TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision

@misc{you2019torchcv,
    author = {Ansheng You and Xiangtai Li and Zhen Zhu and Yunhai Tong},
    title = {TorchCV: A PyTorch-Based Framework for Deep Learning in Computer Vision},
    howpublished = {\url{https://github.com/donnyyou/torchcv}},
    year = {2019}
}

This repository provides source code for most deep learning based computer vision problems. We'll do our best to keep it up to date. If you find a problem with this repository, please raise an issue or submit a pull request.

- Semantic Flow for Fast and Accurate Scene Parsing
- Code and models: https://github.com/lxtGH/SFSegNets

Implemented Papers

Image Classification
- VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition
- ResNet: Deep Residual Learning for Image Recognition
- DenseNet: Densely Connected Convolutional Networks
- ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
- ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search
Semantic Segmentation
- DeepLabV3: Rethinking Atrous Convolution for Semantic Image Segmentation
- PSPNet: Pyramid Scene Parsing Network
- DenseASPP: DenseASPP for Semantic Segmentation in Street Scenes
- Asymmetric Non-local Neural Networks for Semantic Segmentation
- Semantic Flow for Fast and Accurate Scene Parsing
Object Detection
- SSD: Single Shot MultiBox Detector
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- YOLOv3: An Incremental Improvement
- FPN: Feature Pyramid Networks for Object Detection
Pose Estimation
- CPM: Convolutional Pose Machines
- OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
Instance Segmentation
- Mask R-CNN
Generative Adversarial Networks
- Pix2pix: Image-to-Image Translation with Conditional Adversarial Nets
- CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

QuickStart with TorchCV

Currently only Python 3.x and PyTorch 1.3 are supported.

pip3 install -r requirements.txt
cd lib/exts
sh make.sh

Performances with TorchCV

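All results shown below fully reproduce the results reported in the papers. In the Iters columns, "W" denotes units of 10,000 iterations (for example, 30W is 300,000 iterations).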

Image Classification

ImageNet (Center Crop Test): 224x224

Model	Train	Test	Top-1	Top-5	BS	Iters	Scripts
ResNet50	train	val	77.54	93.59	512	30W	ResNet50
ResNet101	train	val	78.94	94.56	512	30W	ResNet101
ShuffleNetV2x0.5	train	val	60.90	82.54	1024	40W	ShuffleNetV2x0.5
ShuffleNetV2x1.0	train	val	69.71	88.91	1024	40W	ShuffleNetV2x1.0
DFNetV1	train	val	70.99	89.68	1024	40W	DFNetV1
DFNetV2	train	val	74.22	91.61	1024	40W	DFNetV2
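The Top-1 and Top-5 columns above are the usual classification accuracies: a sample counts as correct when its true label is among the k highest-scoring classes. The following is a minimal pure-Python sketch of that metric, not TorchCV's own evaluation code. The function name `topk_accuracy` and the toy scores are made up for illustration.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the top k.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in topk
    return 100.0 * hits / len(labels)


# Hypothetical scores for 4 samples over 6 classes (not real model outputs).
scores = [
    [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],  # true label 1 is ranked first
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],  # true label 1 is ranked second
    [0.30, 0.25, 0.20, 0.15, 0.08, 0.02],  # true label 5 is ranked last
    [0.05, 0.05, 0.10, 0.10, 0.20, 0.50],  # true label 5 is ranked first
]
labels = [1, 1, 5, 5]

top1 = topk_accuracy(scores, labels, k=1)
top5 = topk_accuracy(scores, labels, k=5)
print(f"Top-1: {top1:.2f}  Top-5: {top5:.2f}")
```

In practice the scores would come from a model evaluated on the center-cropped 224x224 validation images, and the same counting would run over the whole validation set.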

Semantic Segmentation

Cityscapes (Single Scale Whole Image Test): Base LR 0.01, Crop Size 769

Model	Backbone	Train	Test	mIOU	BS	Iters	Scripts
PSPNet	3x3-Res101	train	val	78.20	8	4W	PSPNet
DeepLabV3	3x3-Res101	train	val	79.13	8	4W	DeepLabV3

ADE20K (Single Scale Whole Image Test): Base LR 0.02, Crop Size 520

Model	Backbone	Train	Test	mIOU	PixelACC	BS	Iters	Scripts
PSPNet	3x3-Res50	train	val	41.52	80.09	16	15W	PSPNet
DeepLabV3	3x3-Res50	train	val	42.16	80.36	16	15W	DeepLabV3
PSPNet	3x3-Res101	train	val	43.60	81.30	16	15W	PSPNet
DeepLabV3	3x3-Res101	train	val	44.13	81.42	16	15W	DeepLabV3
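For readers unfamiliar with the mIOU and PixelACC columns, here is a minimal pure-Python sketch of how these metrics are commonly derived from a confusion matrix. It is an illustration under stated assumptions, not the evaluation code used by TorchCV: the function names are hypothetical, the ignore label 255 is an assumed convention, and classes that never occur are skipped when averaging.

```python
def confusion_matrix(pred, gt, num_classes, ignore_label=255):
    """Count pixels as mat[ground_truth][prediction], skipping ignored pixels."""
    mat = [[0] * num_classes for _ in range(num_classes)]
    for p, g in zip(pred, gt):
        if g == ignore_label:  # assumed convention for unlabeled pixels
            continue
        mat[g][p] += 1
    return mat


def pixel_accuracy(mat):
    """Fraction of counted pixels whose prediction equals the ground truth."""
    total = sum(sum(row) for row in mat)
    correct = sum(mat[c][c] for c in range(len(mat)))
    return correct / total if total else 0.0


def mean_iou(mat):
    """Mean over classes of TP / (TP + FP + FN), skipping absent classes."""
    n = len(mat)
    ious = []
    for c in range(n):
        tp = mat[c][c]
        fn = sum(mat[c]) - tp                        # class c pixels predicted as another class
        fp = sum(mat[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical flattened label maps for a tiny 8-pixel image with 3 classes.
gt = [0, 0, 1, 1, 2, 2, 255, 1]
pred = [0, 1, 1, 1, 2, 0, 2, 1]

mat = confusion_matrix(pred, gt, num_classes=3)
print(f"mIoU: {100 * mean_iou(mat):.2f}  PixelAcc: {100 * pixel_accuracy(mat):.2f}")
```

For a dataset-level score, the confusion matrix is typically accumulated over all validation images before the two metrics are computed.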

Object Detection

Pascal VOC2007/2012 (Single Scale Test): 20 Classes

Model	Backbone	Train	Test	mAP	BS	Epochs	Scripts
SSD300	VGG16	07+12_trainval	07_test	0.786	32	235	SSD300
SSD512	VGG16	07+12_trainval	07_test	0.808	32	235	SSD512
Faster R-CNN	VGG16	07_trainval	07_test	0.706	1	15	Faster R-CNN

Pose Estimation

OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Instance Segmentation

Mask R-CNN

Generative Adversarial Networks

Pix2pix
CycleGAN

DataSets with TorchCV

TorchCV defines a dataset format for each task; see the subdirectories of data for details. The following is an example dataset directory tree for training semantic segmentation. You can preprocess the public datasets with the scripts in the data/seg/preprocess folder.

Dataset
    train
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
    val
        image
            00001.jpg/png
            00002.jpg/png
            ...
        label
            00001.png
            00002.png
            ...
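The layout above pairs each file in image with a label file of the same stem in label. The sketch below shows one way to enumerate and sanity-check those pairs before training. It is an assumption-laden illustration rather than TorchCV's data loader: the helper `list_pairs` is hypothetical, and it assumes labels are always .png files named after the image stem, as the tree suggests.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".png"}  # image extensions shown in the tree above


def list_pairs(root, split):
    """Return sorted (image_path, label_path) pairs for one split."""
    image_dir = Path(root) / split / "image"
    label_dir = Path(root) / split / "label"
    pairs = []
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip stray files such as hidden or temporary ones
        # Assumed naming rule: label shares the image stem and is a .png file.
        label_path = label_dir / (image_path.stem + ".png")
        if not label_path.is_file():
            raise FileNotFoundError(f"missing label for {image_path.name}")
        pairs.append((image_path, label_path))
    return pairs


# Build a throwaway dataset with the same layout to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    for split in ("train", "val"):
        (Path(tmp) / split / "image").mkdir(parents=True)
        (Path(tmp) / split / "label").mkdir(parents=True)
    for name in ("00001.jpg", "00002.png"):
        (Path(tmp) / "train" / "image" / name).touch()
        (Path(tmp) / "train" / "label" / (Path(name).stem + ".png")).touch()
    pairs = list_pairs(tmp, "train")
    val_pairs = list_pairs(tmp, "val")

print([(image.name, label.name) for image, label in pairs])
```

Running a check like this on a real dataset root catches missing or misnamed label files before a long training job starts.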

Commands with TorchCV

Take PSPNet as an example. ("tag" can be any string, including an empty one.)

Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Resume Training

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh train tag

Validate

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh val tag

Testing

cd scripts/seg/cityscapes/
bash run_fs_pspnet_cityscapes_seg.sh test tag
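The commands above all follow the same pattern: change into the script directory, then call the script with a phase and an optional tag. If you prefer to launch them from Python, a small wrapper might look like the sketch below. The names `build_command` and `run_phase` are hypothetical, and the phase list is limited to the values shown in this section; the script may accept others.

```python
import subprocess

SCRIPT_DIR = "scripts/seg/cityscapes"       # directory used in the commands above
SCRIPT = "run_fs_pspnet_cityscapes_seg.sh"  # PSPNet script used in the commands above
PHASES = ("train", "val", "test")           # only the phases shown in this section


def build_command(phase, tag=""):
    """Return the argument list for one phase; an empty tag is simply omitted."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}, expected one of {PHASES}")
    command = ["bash", SCRIPT, phase]
    if tag:
        command.append(tag)
    return command


def run_phase(phase, tag="", repo_root="."):
    """Run the script from its own directory, mirroring the 'cd' step."""
    return subprocess.run(
        build_command(phase, tag),
        cwd=f"{repo_root}/{SCRIPT_DIR}",
        check=True,  # raise CalledProcessError if the script fails
    )


# Print the command lines without executing them.
for phase in PHASES:
    print(" ".join(build_command(phase, "tag")))
```

Calling `run_phase("val", "tag")` from the repository root would then correspond to the Validate command above.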

Demos with TorchCV

Example output of VGG19-OpenPose

Owner

Donny You
