DeepLab resnet v2 model in pytorch

Overview

pytorch-deeplab-resnet

DeepLab resnet v2 model implementation in pytorch.

The architecture of DeepLab-ResNet has been replicated exactly from the caffe implementation. This architecture calculates losses on input images over multiple scales (1x, 0.75x, 0.5x). A loss is calculated individually at each of these 3 scales. In addition to these 3 losses, one more loss is calculated after merging the output score maps of the 3 scales. These 4 losses are added to give the total loss.
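Written out, the total loss described above is a plain sum of four terms. The symbols are notation introduced here for illustration only and do not appear in the code: \mathcal{L}_{s} is the loss on the output at input scale s, and \mathcal{L}_{\text{merged}} is the loss on the merged score map.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{1.0} + \mathcal{L}_{0.75} + \mathcal{L}_{0.5} + \mathcal{L}_{\text{merged}}
```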

Updates

18 July 2017

  • One more evaluation script, evalpyt2.py, has been added. The old evaluation script, evalpyt.py, uses a different methodology to take the mean of IOUs than the one used by the authors. The results section has been updated to incorporate this change.

24 June 2017

  • Weights are now shared over the 3 scales (1x, 0.75x, 0.5x), as in the caffe implementation. Previously, each of the 3 scales had separate weights. Results are almost the same after this change (more in the results section). However, the size of the trained .pth model has reduced significantly. Memory occupied on the GPU (11.9 GB) and time taken (~3.5 hours) during training are the same as before. Links to the corresponding .pth files have been updated.
  • Custom data can be used to train pytorch-deeplab-resnet using train.py. The flag --NoLabels (total number of labels in the training data) has been added to train.py and evalpyt.py for this purpose. Please note that labels should be denoted by contiguous values (starting from 0) in the ground truth images. For example, if there are 7 (no_labels) different labels, then each ground truth image must have these labels as 0,1,2,3,...6 (no_labels-1).
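If a custom dataset stores its labels as non-contiguous values, they need to be remapped before training. The following is a minimal sketch of such a remapping in plain Python; the helper names build_label_map and remap_mask and the toy mask are hypothetical and are not part of this repository.

```python
def build_label_map(label_values):
    """Map each distinct raw label value to a contiguous index 0..N-1."""
    return {raw: idx for idx, raw in enumerate(sorted(set(label_values)))}


def remap_mask(mask, label_map):
    """Return a copy of a 2D mask (list of rows) with raw values replaced."""
    return [[label_map[v] for v in row] for row in mask]


# Hypothetical ground truth that uses the non-contiguous values 0, 10 and 200.
raw_mask = [
    [0, 0, 10],
    [200, 10, 0],
]
label_map = build_label_map(v for row in raw_mask for v in row)
contiguous_mask = remap_mask(raw_mask, label_map)
no_labels = len(label_map)  # the value that would be passed as --NoLabels
```

In practice the map should be built once from the label values of the whole dataset, not per image, so that every ground truth image uses the same indices.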

The older version (prior to 24 June 2017) is available here.

Usage

Note that this repository has been tested with python 2.7 only.

Converting released caffemodel to pytorch model

To convert the caffemodel released by authors, download the deeplab-resnet caffemodel (train_iter_20000.caffemodel) pretrained on VOC into the data folder. After that, run

python convert_deeplab_resnet.py

to generate the corresponding pytorch model file (.pth). The generated .pth snapshot file gives exactly the same test performance as the caffemodel gives in caffe (as shown by the numbers in the results section). If you do not want to generate the .pth file yourself, you can download it here.

To run convert_deeplab_resnet.py, deeplab v2 caffe and pytorch (python 2.7) are required.

If you want to train your model in pytorch, move to the next section.

Training

Step 1: Convert init.caffemodel to a .pth file. init.caffemodel contains MS COCO trained weights. We use these weights as initialization for all but the final layer of our model. For the last layer, we use a random Gaussian initialization with a standard deviation of 0.01. To convert init.caffemodel to a .pth file, run (or download the converted .pth here)

python init_net_surgery.py

To run init_net_surgery.py, deeplab v2 caffe and pytorch (python 2.7) are required.

Step 2: Now that we have our initialization, we can train deeplab-resnet by running

python train.py

To get a description of each command-line argument, run

python train.py -h

To run train.py, pytorch (python 2.7) is required.

By default, snapshots are saved every 1000 iterations in data/snapshots. The following features have been implemented in this repository:

  • The training regime is the same as that of the caffe implementation: SGD with momentum is used, along with the poly lr decay policy. Weight decay is used. The last layer has 10 times the learning rate of the other layers.
  • The iter_size parameter of caffe has been implemented, effectively increasing the batch size to batch_size times iter_size.
  • Random flipping and random scaling of the input are used as data augmentation. The caffe implementation uses 5 fixed scales (0.5,0.75,1,1.25,1.5), while in the pytorch implementation the scale is randomly picked from the range [0.5,1.3] at each iteration.
  • The boundary label (255 in ground truth labels) is not ignored in the loss function in the current version; instead, it is merged with the background. The ignore_label caffe parameter is to be implemented in a future version. Post-processing using CRF has not been implemented.
  • Batchnorm parameters are kept fixed during training. Also, caffe setting use_global_stats = True is reproduced during training. Running mean and variance are not calculated during training.
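The learning-rate schedule from the first item above can be sketched as follows. This is an illustration of the caffe "poly" policy, not the code in train.py; the function names are hypothetical, and the exponent power=0.9 is an assumption (this document does not state the value used).

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Caffe-style "poly" policy: base_lr * (1 - iteration / max_iter) ** power.

    power=0.9 is an assumed value, not one taken from this repository.
    """
    return base_lr * (1.0 - float(iteration) / max_iter) ** power


def learning_rates(base_lr, iteration, max_iter, last_layer_mult=10.0):
    """Return (rate for all other layers, rate for the last layer)."""
    lr = poly_lr(base_lr, iteration, max_iter)
    return lr, last_layer_mult * lr


# base_lr and max_iter are the values from the command in the Results section.
for it in (0, 10000, 19999):
    other_lr, last_lr = learning_rates(0.00025, it, 20000)
    print("iter %5d  other layers %.3e  last layer %.3e" % (it, other_lr, last_lr))
```

The rate starts at the base value and decays smoothly to zero at the final iteration, with the last layer always at 10 times the rate of the other layers.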

When run on an Nvidia Titan X GPU, train.py occupies about 11.9 GB of memory.

Evaluation

Evaluation of the saved models can be done by running

python evalpyt.py

To get a description of each command-line argument, run

python evalpyt.py -h

Results

When trained on the VOC augmented training set (10582 images) using the MS COCO pretrained initialization in pytorch, we get a validation performance of 72.40% (evalpyt2.py, on VOC). The corresponding .pth file can be downloaded here. This compares with the 75.54% achieved by train_iter_20000.caffemodel released by the authors, which can be replicated by running this file. The .pth model converted from the .caffemodel as described in the first section also gives 75.54% mean IOU. A previous version of this file reported a mean IOU of 78.48% for the pytorch trained model, which was calculated in a different way (evalpyt.py): mean IOU is calculated for each image and these values are averaged together. This way of calculating mean IOU differs from the one used by the authors.
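The gap between the two numbers comes from the order of averaging. The sketch below contrasts per-image averaging with accumulating per-class counts over the whole dataset first, which is assumed here to correspond to the authors' method. It is a toy illustration and not the code in evalpyt.py or evalpyt2.py; all function names are hypothetical, and skipping classes that are absent from an image is an assumption.

```python
def class_counts(pred, gt, num_classes):
    """Per-class (intersection, union) pixel counts for one flattened image."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[g] += 1
    return inter, union


def mean_iou(inter, union):
    """Average IoU over the classes that occur (union > 0)."""
    ious = [float(i) / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)


def dataset_mean_iou(images, num_classes):
    """Accumulate counts over all images, then average over classes."""
    total_i = [0] * num_classes
    total_u = [0] * num_classes
    for pred, gt in images:
        inter, union = class_counts(pred, gt, num_classes)
        total_i = [a + b for a, b in zip(total_i, inter)]
        total_u = [a + b for a, b in zip(total_u, union)]
    return mean_iou(total_i, total_u)


def per_image_mean_iou(images, num_classes):
    """Compute mean IoU for each image separately, then average over images."""
    scores = [mean_iou(*class_counts(p, g, num_classes)) for p, g in images]
    return sum(scores) / len(scores)


# Two tiny hypothetical images as (prediction, ground truth), 2 classes.
images = [
    ([0, 0, 0, 0], [0, 0, 0, 0]),  # all background, predicted perfectly
    ([0, 1, 1, 1], [0, 0, 1, 1]),  # one background pixel mislabelled
]
print(dataset_mean_iou(images, 2))    # dataset-level averaging
print(per_image_mean_iou(images, 2))  # per-image averaging
```

On this toy input the per-image average is higher than the dataset-level value, because an easy image counts as much as a hard one regardless of how many pixels of each class it contains.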

To replicate this performance, run

python train.py --lr 0.00025 --wtDecay 0.0005 --maxIter 20000 --GTpath <train gt images path here> --IMpath <train images path here> --LISTpath data/list/train_aug.txt

Dataset

The model presented in the results section was trained using the augmented VOC train set which was released by this paper. You may download this augmented data directly from here.

Note that this code can be used to train pytorch-deeplab-resnet model for other datasets also.

Acknowledgement

A part of the code has been borrowed from https://github.com/ry/tensorflow-resnet.

Owner
Isht Dwivedi