All public open-source implementations of convnets benchmarks

Overview

convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Machine: 6-core Intel Core i7-5930K CPU @ 3.50GHz + NVIDIA Titan X + Ubuntu 14.04 x86_64

Imagenet Winners Benchmarking

I pick some popular imagenet models, and I clock the time for a full forward + backward pass. I average my times over 10 runs. I ignored dropout and softmax layers.

Notation

Input is described as {batch_size}x{num_filters}x{filter_width}x{filter_height}. Where batch_size is the number of images used in a minibatch, num_filters is the number of channels in an image, filter_width is the width of the image, and filter_height is the height of the image.

One small note:

The CuDNN benchmarks are done using Torch bindings. One can also do the same via Caffe bindings or bindings of any other library. This note is here to clarify that Caffe (native) and Torch (native) are the convolution kernels which are present as a default fallback. Some of the frameworks like TensorFlow and Chainer are benchmarked with CuDNN, but it is not explicitly mentioned, and hence one might think that these frameworks as a whole are faster, than for example Caffe, which might not be the case.

AlexNet (One Weird Trick paper) - Input 128x3x224x224

Library Class Time (ms) forward (ms) backward (ms)
CuDNN[R4]-fp16 (Torch) cudnn.SpatialConvolution 71 25 46
Nervana-neon-fp16 ConvLayer 78 25 52
CuDNN[R4]-fp32 (Torch) cudnn.SpatialConvolution 81 27 53
TensorFlow conv2d 81 26 55
Nervana-neon-fp32 ConvLayer 87 28 58
fbfft (Torch) fbnn.SpatialConvolution 104 31 72
Chainer Convolution2D 177 40 136
cudaconvnet2* ConvLayer 177 42 135
CuDNN[R2] * cudnn.SpatialConvolution 231 70 161
Caffe (native) ConvolutionLayer 324 121 203
Torch-7 (native) SpatialConvolutionMM 342 132 210
CL-nn (Torch) SpatialConvolutionMM 963 388 574
Caffe-CLGreenTea ConvolutionLayer 1442 210 1232

Overfeat [fast] - Input 128x3x231x231

Library Class Time (ms) forward (ms) backward (ms)
Nervana-neon-fp16 ConvLayer 176 58 118
Nervana-neon-fp32 ConvLayer 211 69 141
CuDNN[R4]-fp16 (Torch) cudnn.SpatialConvolution 242 86 156
CuDNN[R4]-fp32 (Torch) cudnn.SpatialConvolution 268 94 174
TensorFlow conv2d 279 90 189
fbfft (Torch) SpatialConvolutionCuFFT 342 114 227
Chainer Convolution2D 620 135 484
cudaconvnet2* ConvLayer 723 176 547
CuDNN[R2] * cudnn.SpatialConvolution 810 234 576
Caffe ConvolutionLayer 823 355 468
Torch-7 (native) SpatialConvolutionMM 878 379 499
CL-nn (Torch) SpatialConvolutionMM 963 388 574
Caffe-CLGreenTea ConvolutionLayer 2857 616 2240

OxfordNet [Model-A] - Input 64x3x224x224

Library Class Time (ms) forward (ms) backward (ms)
Nervana-neon-fp16 ConvLayer 254 82 171
Nervana-neon-fp32 ConvLayer 320 103 217
CuDNN[R4]-fp16 (Torch) cudnn.SpatialConvolution 471 140 331
CuDNN[R4]-fp32 (Torch) cudnn.SpatialConvolution 529 162 366
TensorFlow conv2d 540 158 382
Chainer Convolution2D 885 251 632
fbfft (Torch) SpatialConvolutionCuFFT 1092 355 737
cudaconvnet2* ConvLayer 1229 408 821
CuDNN[R2] * cudnn.SpatialConvolution 1099 342 757
Caffe ConvolutionLayer 1068 323 745
Torch-7 (native) SpatialConvolutionMM 1105 350 755
CL-nn (Torch) SpatialConvolutionMM 3437 875 2562
Caffe-CLGreenTea ConvolutionLayer 5620 988 4632

GoogleNet V1 - Input 128x3x224x224

Library Class Time (ms) forward (ms) backward (ms)
Nervana-neon-fp16 ConvLayer 230 72 157
Nervana-neon-fp32 ConvLayer 270 84 186
TensorFlow conv2d 445 135 310
CuDNN[R4]-fp16 (Torch) cudnn.SpatialConvolution 462 112 349
CuDNN[R4]-fp32 (Torch) cudnn.SpatialConvolution 470 130 340
Chainer Convolution2D 687 189 497
Caffe ConvolutionLayer 1935 786 1148
CL-nn (Torch) SpatialConvolutionMM 7016 3027 3988
Caffe-CLGreenTea ConvolutionLayer 9462 746 8716

Layer-wise Benchmarking (Last Updated April 2015)

Spatial Convolution layer (3D input 3D output, densely connected)

forward + backprop (wrt input and weights)
Original Library Class/Function Benchmarked Time (ms) forward (ms) backward (ms)
fbfft SpatialConvolutionCuFFT 256 101 155
cuda-convnet2 * ConvLayer 977 201 776
cuda-convnet** pylearn2.cuda_convnet 1077 312 765
CuDNN R2 * cudnn.SpatialConvolution 1019 269 750
Theano CorrMM 1225 407 818
Caffe ConvolutionLayer 1231 396 835
Torch-7 SpatialConvolutionMM 1265 418 877
DeepCL ConvolutionLayer 6280 2648 3632
cherry-picking**** best per layer 235 79 155

This table is NOT UPDATED For TITAN-X. These numbers below were on Titan Black and are here only for informational and legacy purposes.

Original Library Class/Function Benchmarked Time (ms) forward (ms) backward (ms)
Theano (experimental)*** conv2d_fft 1178 304 874
Torch-7 nn.SpatialConvolutionBHWD 1892 581 1311
ccv ccv_convnet_layer 809+bw 809
Theano (legacy) conv2d 70774 3833 66941
  • * indicates that the library was tested with Torch bindings of the specific kernels.
  • ** indicates that the library was tested with Pylearn2 bindings.
  • *** This is an experimental module which used FFT to calculate convolutions. It uses a lot of memory according to @benanne
  • **** The last row shows results obtainable when choosing the best-performing library for each layer.
  • L1 - Input: 128x128 Batch-size 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
  • L2 - Input: 64x64 Batch-size 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
  • L3 - Input: 32x32 Batch-size 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
  • L4 - Input: 16x16 Batch-size 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
  • L5 - Input: 13x13 Batch-size 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1
  • The table is ranked according to the total time forward+backward calls for layers (L1 + L2 + L3 + L4 + L5)
Breakdown
forward

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

Original Library Class/Function Benchmarked L1 L2 L3 L4 L5 Total
fbfft SpatialConvolutionCuFFT 57 27 6 2 9 101
cuda-convnet2 * ConvLayer 36 113 40 4 8 201
cuda-convnet** pylearn2.cuda_convnet 38 183 68 7 16 312
CuDNN R2 cudnn.SpatialConvolution 56 143 53 6 11 269
Theano CorrMM 91 143 121 24 28 407
Caffe ConvolutionLayer 93 136 116 24 27 396
Torch-7 nn.SpatialConvolutionMM 94 149 123 24 28 418
DeepCL ConvolutionLayer 738 1241 518 47 104 2648
cherry-picking**** best per layer 36 27 6 2 8 79
backward (gradInput + gradWeight)

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

Original Library Class/Function Benchmarked L1 L2 L3 L4 L5 Total
fbfft SpatialConvolutionCuFFT 76 45 12 4 18 155
cuda-convnet2 * ConvLayer 103 467 162 15 29 776
cuda-convnet** pylearn2.cuda_convnet 136 433 147 15 34 765
CuDNN R2 cudnn.SpatialConvolution 139 401 159 19 32 750
Theano CorrMM 179 405 174 29 31 818
Caffe ConvolutionLayer 200 405 172 28 30 835
Torch-7 nn.SpatialConvolutionMM 206 432 178 29 32 877
DeepCL ConvolutionLayer 484 2144 747 59 198 3632
cherry-picking**** best per layer 76 45 12 4 18 155
Owner
Soumith Chintala
/\︿╱\ _________________________________ \0_ 0 /╱\╱____________________________ \▁︹_/
Soumith Chintala
This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (ICLR 2022)

Equivariant Subgraph Aggregation Networks (ESAN) This repository contains the official code of the paper Equivariant Subgraph Aggregation Networks (IC

Beatrice Bevilacqua 59 Dec 13, 2022
PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, wav2lip, picture repair, image editing, photo2cartoon, image style transfer, and so on.

English | 简体中文 PaddleGAN PaddleGAN provides developers with high-performance implementation of classic and SOTA Generative Adversarial Networks, and s

6.4k Jan 09, 2023
Automatically align face images 🙃→🙂. Can also do windowing and warping.

Automatic Face Alignment (AFA) Carl M. Gaspar & Oliver G.B. Garrod You have lots of photos of faces like this: But you want to line up all of the face

Carl Michael Gaspar 15 Dec 12, 2022
Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation

Taking A Closer Look at Domain Shift: Category-level Adversaries for Semantics Consistent Domain Adaptation (CVPR2019) This is a pytorch implementatio

Yawei Luo 280 Jan 01, 2023
Riemannian Convex Potential Maps

Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited b

Facebook Research 61 Nov 28, 2022
Google AI Open Images - Object Detection Track: Open Solution

Google AI Open Images - Object Detection Track: Open Solution This is an open solution to the Google AI Open Images - Object Detection Track 😃 More c

minerva.ml 46 Jun 22, 2022
This is the repo for our work "Towards Persona-Based Empathetic Conversational Models" (EMNLP 2020)

Towards Persona-Based Empathetic Conversational Models (PEC) This is the repo for our work "Towards Persona-Based Empathetic Conversational Models" (E

Zhong Peixiang 35 Nov 17, 2022
PyTorch implementation of paper "StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement" (ICCV 2021 Oral)

StarEnhancer StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement (ICCV 2021 Oral) Abstract: Image enhancement is a subjective process w

IDKiro 133 Dec 28, 2022
Intelligent Video Analytics toolkit based on different inference backends.

English | 中文 OpenIVA OpenIVA is an end-to-end intelligent video analytics development toolkit based on different inference backends, designed to help

Quantum Liu 15 Oct 27, 2022
Shōgun

The SHOGUN machine learning toolbox Unified and efficient Machine Learning since 1999. Latest release: Cite Shogun: Develop branch build status: Donat

Shōgun ML 2.9k Jan 04, 2023
This is the research repository for Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition.

Vid2Doppler: Synthesizing Doppler Radar Data from Videos for Training Privacy-Preserving Activity Recognition This is the research repository for Vid2

Future Interfaces Group (CMU) 26 Dec 24, 2022
Finding all things on-prem Microsoft for password spraying and enumeration.

msprobe About Installing Usage Examples Coming Soon Acknowledgements About Finding all things on-prem Microsoft for password spraying and enumeration.

205 Jan 09, 2023
Trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI

Introduction This script trains an agent with stochastic policy gradient ascent to solve the Lunar Lander challenge from OpenAI. In order to run this

Momin Haider 0 Jan 02, 2022
Code for DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning Pytorch Implementation for DisCo: Remedy Self-supervi

79 Jan 06, 2023
Semantic similarity computation with different state-of-the-art metrics

Semantic similarity computation with different state-of-the-art metrics Description • Installation • Usage • License Description TaxoSS is a semantic

6 Jun 22, 2022
ICNet and PSPNet-50 in Tensorflow for real-time semantic segmentation

Real-Time Semantic Segmentation in TensorFlow Perform pixel-wise semantic segmentation on high-resolution images in real-time with Image Cascade Netwo

Oles Andrienko 219 Nov 21, 2022
Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)

Self-Supervised Models are Continual Learners This is the official repository for the paper: Self-Supervised Models are Continual Learners Enrico Fini

Enrico Fini 73 Dec 18, 2022
Implementation of character based convolutional neural network

Character Based CNN This repo contains a PyTorch implementation of a character-level convolutional neural network for text classification. The model a

Ahmed BESBES 248 Nov 21, 2022
Vehicles Counting using YOLOv4 + DeepSORT + Flask + Ngrok

A project for counting vehicles using YOLOv4 + DeepSORT + Flask + Ngrok

Duong Tran Thanh 37 Dec 16, 2022
Pytorch implementation of OCNet series and SegFix.

openseg.pytorch News 2021/09/14 MMSegmentation has supported our ISANet and refer to ISANet for more details. 2021/08/13 We have released the implemen

openseg-group 1.1k Dec 23, 2022