A CUDA version of AdderNet

Last update: Jun 20, 2022

Overview

Training of AdderNet, accelerated with custom CUDA kernels.
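For readers unfamiliar with the operation being accelerated: in the AdderNet formulation, an adder layer replaces the multiply-accumulate of a convolution with the negative L1 distance between each input patch and the filter. A minimal pure-Python reference for the single-channel, stride-1, unpadded case might look like the sketch below. The function name `adder2d_reference` is hypothetical and is not part of this repository; the CUDA kernels here compute the batched, multi-channel equivalent.

```python
def adder2d_reference(image, kernel):
    """Single-channel adder 'convolution': negative L1 distance per patch.

    Assumes stride 1 and no padding; `image` and `kernel` are lists of lists.
    This is an illustrative reference, not the repository's implementation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Sum of absolute differences between the patch and the filter.
            distance = sum(
                abs(image[row + i][col + j] - kernel[i][j])
                for i in range(kh)
                for j in range(kw)
            )
            out_row.append(-distance)
        output.append(out_row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
print(adder2d_reference(image, kernel))  # [[-10, -14], [-22, -26]]
```

A slow reference like this can be handy for checking a custom kernel's forward output on tiny inputs.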

Usage

cd adder_cuda
python setup.py install
cd ..
python main.py

Environment

PyTorch 1.10.0, CUDA 11.3

Benchmark

Version	Training time per batch (s)
raw	1.61
torch.cdist	1.49
cuda_unoptimized	0.4508
this work	0.3158

The CUDA version of AdderNet (this work) trains about 5× faster than the original (raw) version. The cuda_unoptimized version appears to contain bugs that prevent the model from converging; its speed is listed only for comparison. The experiments were run on an RTX 2080 Ti, training ResNet-20 on CIFAR-10.
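The speedup figures follow directly from the table. As a small worked example, the sketch below recomputes them from the listed times; the dictionary and the function name `speedup_over` are illustrative only and do not come from this repository.

```python
# Seconds per training batch, copied from the benchmark table above.
seconds_per_batch = {
    "raw": 1.61,
    "torch.cdist": 1.49,
    "cuda_unoptimized": 0.4508,
    "this work": 0.3158,
}


def speedup_over(baseline, target="this work", timings=seconds_per_batch):
    """Return how many times faster `target` is than `baseline`."""
    return timings[baseline] / timings[target]


for name in seconds_per_batch:
    print(f"{name}: {speedup_over(name):.2f}x")
```

With the numbers above, this gives roughly 5.10× over raw, 4.72× over torch.cdist, and 1.43× over cuda_unoptimized.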

Time(%)	Time	Calls	Avg	Min	Max	Name
48.57	30.4752s	3920	7.7743ms	162.70us	12.271ms	CONV_BACKWARD
34.85	21.8686s	19680	1.1112ms	5.3770us	11.827ms	_ZN2at6native27unrolled_elementwise_kernel...
7.46	4.67901s	5920	790.37us	26.529us	1.5841ms	CONV
2.24	1.40372s	3920	358.09us	31.298us	845.80us	col2im_kernel
2.10	1.31882s	36862	35.777us	1.4720us	276.24us	vectorized_elementwise_kernel
1.43	900.03ms	5920	152.03us	7.9040us	372.40us	im2col_kernel

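The profile above shows how time is distributed over one training epoch. CONV_BACKWARD and an unrolled elementwise kernel together account for more than 80% of the time, so they are the natural targets if you would like to optimize the CUDA kernels further.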
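One way to judge where further kernel work would pay off is an Amdahl's-law estimate based on the profile. The sketch below assumes that only one kernel is sped up at a time and that every other share of the profiled time stays unchanged; the function name `overall_speedup` and the 2× factor are hypothetical.

```python
# Share of profiled time per kernel, copied from the profile above
# (the elementwise kernel name is abbreviated here).
time_share = {
    "CONV_BACKWARD": 0.4857,
    "unrolled_elementwise_kernel": 0.3485,
    "CONV": 0.0746,
}


def overall_speedup(share, kernel_speedup):
    """Amdahl's law: overall speedup when a `share` of the time gets faster."""
    return 1.0 / ((1.0 - share) + share / kernel_speedup)


# Hypothetical scenario: each kernel, taken alone, is made 2x faster.
for name, share in time_share.items():
    print(f"{name}: {overall_speedup(share, 2.0):.3f}x overall")
```

Under these assumptions, doubling the speed of the forward CONV kernel alone would change total time by only a few percent, while the same gain on CONV_BACKWARD would matter far more.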

Owner

LingXY
