A PyTorch implementation of DenseNet.

Overview

A PyTorch Implementation of DenseNet

This is a PyTorch implementation of the DenseNet-BC architecture as described in the paper Densely Connected Convolutional Networks by G. Huang, Z. Liu, K. Weinberger, and L. van der Maaten. This implementation gets a CIFAR-10+ error rate of 4.77 with a 100-layer DenseNet-BC with a growth rate of 12. Their official implementation and links to many other third-party implementations are available in the liuzhuang13/DenseNet repo on GitHub.

Why DenseNet?

As this table from the DenseNet paper shows, it provides competitive state of the art results on CIFAR-10, CIFAR-100, and SVHN.

Why yet another DenseNet implementation?

PyTorch is a great new framework and it's nice to have these kinds of re-implementations around so that they can be integrated with other PyTorch projects.

How do you know this implementation is correct?

Interestingly while implementing this, I had a lot of trouble getting it to converge and looked at every part of the code closer than I usually would. I compared all of the model's hidden states and gradients with the official implementation to make sure my code was correct and even trained a VGG-style network on CIFAR-10 with the training code here. It turns out that I uncovered a new critical PyTorch bug (now fixed) that was causing this.

I have left around my original message about how this isn't working and the things that I have checked in this document. I think this should be interesting for other people to see my development and debugging strategies when having issues implementing a model that's known to converge. I also started this PyTorch forum thread, which has a few other discussion points. You may also be interested in my script that compares PyTorch gradients to Torch gradients and my script that numerically checks PyTorch gradients.

My convergence issues were due to a critical PyTorch bug related to using torch.cat with convolutions with cuDNN enabled (which it is by default when CUDA is used). This bug caused incorrect gradients and the fix to this bug is to disable cuDNN (which doesn't have to be done anymore because it's fixed). The oversight in my debugging strategies that caused me to not find this error is that I did not think to disable cuDNN. Until now, I have assumed that the cuDNN option in frameworks are bug-free, but have learned that this is not always the case. I may have also found something if I would have numerically debugged torch.cat layers with convolutions instead of fully connected layers.

Adam fixed the PyTorch bug that caused this in this PR and has been merged into Torch's master branch. If you are interested in using the DenseNet code in this repository, make sure your PyTorch version contains this PR and was downloaded after 2017-02-10.

What does the PyTorch compute graph of the model look like?

You can see the compute graph here, which I created with make_graph.py, which I copied from Adam Paszke's gist. Adam says PyTorch will soon have a better way to create compute graphs.

How does this implementation perform?

By default, this repo trains a 100-layer DenseNet-BC with an growth rate of 12 on the CIFAR-10 dataset with data augmentations. Due to GPU memory sizes, this is the largest model I am able to run. The paper reports a final test error of 4.51 with this architecture and we obtain a final test error of 4.77.

Why don't people use ADAM instead of SGD for training ResNet-style models?

I also tried training a net with ADAM and found that it didn't converge as well with the default hyper-parameters compared to SGD with a reasonable learning rate schedule.

What about the non-BC version?

I haven't tested this as thoroughly, you should make sure it's working as expected if you plan to use and modify it. Let me know if you find anything wrong with it.

A paradigm for ML code

I like to include a few features in my projects that I don't see in some other re-implementations that are present in this repo. The training code in train.py uses argparse so the batch size and some other hyper-params can easily be changed and as the model is training, progress is written out to csv files in a work directory also defined by the arguments. Then a separate script plot.py plots the progress written out by the training script. The training script calls plot.py after every epoch, but it can importantly be run on its own so figures can be tweaked without re-running the entire experiment.

Help wanted: Improving memory utilization and multi-GPU support

I think there are ways to improve the memory utilization in this code as in the the official space-efficient Torch implementation. I also would be interested in multi-GPU support.

Running the code and viewing convergence

First install PyTorch (ideally in an anaconda3 distribution). ./train.py will create a model, start training it, and save progress to args.save, which is work/cifar10.base by default. The training script will call plot.py after every epoch to create plots from the saved progress.

Citations

The following is a BibTeX entry for the DenseNet paper that you should cite if you use this model.

@article{Huang2016Densely,
  author = {Huang, Gao and Liu, Zhuang and Weinberger, Kilian Q.},
  title = {Densely Connected Convolutional Networks},
  journal = {arXiv preprint arXiv:1608.06993},
  year = {2016}
}

If you use this implementation, please also consider citing this implementation and code repository with the following BibTeX or plaintext entry. The BibTeX entry requires the url LaTeX package.

@misc{amos2017densenet,
  title = {{A PyTorch Implementation of DenseNet}},
  author = {Amos, Brandon and Kolter, J. Zico},
  howpublished = {\url{https://github.com/bamos/densenet.pytorch}},
  note = {Accessed: [Insert date here]}
}

Brandon Amos, J. Zico Kolter
A PyTorch Implementation of DenseNet
https://github.com/bamos/densenet.pytorch.
Accessed: [Insert date here]

Licensing

This repository is Apache-licensed.

Owner
Brandon Amos
Brandon Amos
RLDS stands for Reinforcement Learning Datasets

RLDS RLDS stands for Reinforcement Learning Datasets and it is an ecosystem of tools to store, retrieve and manipulate episodic data in the context of

Google Research 135 Jan 01, 2023
Model Quantization Benchmark

Introduction MQBench is an open-source model quantization toolkit based on PyTorch fx. The envision of MQBench is to provide: SOTA Algorithms. With MQ

500 Jan 06, 2023
Voice assistant - Voice assistant with python

🌐 Python Voice Assistant 🌵 - User's greeting 🌵 - Writing tasks to todo-list ?

PythonToday 10 Dec 26, 2022
An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020

UnpairedSR An unofficial implementation of "Unpaired Image Super-Resolution using Pseudo-Supervision." CVPR2020 turn RCAN(modified) -- xmodel(xilinx

JiaKui Hu 10 Oct 28, 2022
Definition of a business problem according to Wilson Lower Bound Score and Time Based Average Rating

Wilson Lower Bound Score, Time Based Rating Average In this study I tried to calculate the product rating and sorting reviews more accurately. I have

3 Sep 30, 2021
MediaPipe Kullanarak İleri Seviye Bilgisayarla Görü

MediaPipe Kullanarak İleri Seviye Bilgisayarla Görü

Burak Bagatarhan 12 Mar 29, 2022
Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

ImageProcessingTransformer Third party Pytorch implement of Image Processing Transformer (Pre-Trained Image Processing Transformer arXiv:2012.00364v2)

61 Jan 01, 2023
Keras documentation, hosted live at keras.io

Keras.io documentation generator This repository hosts the code used to generate the keras.io website. Generating a local copy of the website pip inst

Keras 2k Jan 08, 2023
2 Jul 19, 2022
IEGAN — Official PyTorch Implementation Independent Encoder for Deep Hierarchical Unsupervised Image-to-Image Translation

IEGAN — Official PyTorch Implementation Independent Encoder for Deep Hierarchical Unsupervised Image-to-Image Translation Independent Encoder for Deep

30 Nov 05, 2022
An imperfect information game is a type of game with asymmetric information

DecisionHoldem An imperfect information game is a type of game with asymmetric information. Compared with perfect information game, imperfect informat

Decision AI 25 Dec 23, 2022
DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment

DEEPAGÉ: Answering Questions in Portuguese about the Brazilian Environment This repository is related to the paper DEEPAGÉ: Answering Questions in Por

0 Dec 10, 2021
Code for Environment Dynamics Decomposition (ED2).

ED2 Code for Environment Dynamics Decomposition (ED2). Installation Follow the installation in MBPO and Dreamer. Usage First follow the SD2 method for

0 Aug 10, 2021
Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

Johannes G. 24 Dec 07, 2022
CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view.

CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view. Center-based 3D Object Detection and Tracking, Tianwei Yin, Xin

Tianwei Yin 134 Dec 23, 2022
Improving Transferability of Representations via Augmentation-Aware Self-Supervision

Improving Transferability of Representations via Augmentation-Aware Self-Supervision Accepted to NeurIPS 2021 TL;DR: Learning augmentation-aware infor

hankook 38 Sep 16, 2022
Multiple custom object count and detection using YOLOv3-Tiny method

Electronic-Component-YOLOv3 Introduce This project created to detect, count, and recognize multiple custom object using YOLOv3-Tiny method. The target

Derwin Mahardika 2 Nov 14, 2022
PED: DETR for Crowd Pedestrian Detection

PED: DETR for Crowd Pedestrian Detection Code for PED: DETR For (Crowd) Pedestrian Detection Paper PED: DETR for Crowd Pedestrian Detection Installati

36 Sep 13, 2022
A repository for generating stylized talking 3D and 3D face

style_avatar A repository for generating stylized talking 3D faces and 2D videos. This is the repository for paper Imitating Arbitrary Talking Style f

Haozhe Wu 191 Dec 22, 2022
DAN: Unfolding the Alternating Optimization for Blind Super Resolution

DAN-Basd-on-Openmmlab DAN: Unfolding the Alternating Optimization for Blind Super Resolution We reproduce DAN via mmediting based on open-sourced code

AlexZou 72 Dec 13, 2022