NVIDIA Deep Learning Examples for Tensor Cores

Overview

NVIDIA Deep Learning Examples for Tensor Cores

Introduction

This repository provides State-of-the-Art Deep Learning examples that are easy to train and deploy, achieving the best reproducible accuracy and performance with NVIDIA CUDA-X software stack running on NVIDIA Volta, Turing and Ampere GPUs.

NVIDIA GPU Cloud (NGC) Container Registry

These examples, along with our NVIDIA deep learning software stack, are provided in a monthly updated Docker container on the NGC container registry (https://ngc.nvidia.com). These containers include:

  • The latest NVIDIA examples from this repository
  • The latest NVIDIA contributions shared upstream to the respective framework
  • The latest NVIDIA Deep Learning software libraries, such as cuDNN, NCCL, cuBLAS, etc. which have all been through a rigorous monthly quality assurance process to ensure that they provide the best possible performance
  • Monthly release notes for each of the NVIDIA optimized containers

Computer Vision

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
ResNet-50 PyTorch Yes Yes Yes - Yes - Yes Yes -
ResNeXt-101 PyTorch Yes Yes Yes - Yes - Yes Yes -
SE-ResNeXt-101 PyTorch Yes Yes Yes - Yes - Yes Yes -
EfficientNet-B0 PyTorch Yes Yes Yes - - - - Yes -
EfficientNet-B4 PyTorch Yes Yes Yes - - - - Yes -
EfficientNet-WideSE-B0 PyTorch Yes Yes Yes - - - - Yes -
EfficientNet-WideSE-B4 PyTorch Yes Yes Yes - - - - Yes -
Mask R-CNN PyTorch Yes Yes Yes - - - - - Yes
nnUNet PyTorch Yes Yes Yes - - - - Yes -
SSD PyTorch Yes Yes Yes - - - - - Yes
ResNet-50 TensorFlow Yes Yes Yes - - - - Yes -
ResNeXt101 TensorFlow Yes Yes Yes - - - - Yes -
SE-ResNeXt-101 TensorFlow Yes Yes Yes - - - - Yes -
Mask R-CNN TensorFlow Yes Yes Yes - - - - Yes -
SSD TensorFlow Yes Yes Yes - - - - Yes Yes
U-Net Ind TensorFlow Yes Yes Yes - - - - Yes Yes
U-Net Med TensorFlow Yes Yes Yes - - - - Yes -
U-Net 3D TensorFlow Yes Yes Yes - - - - Yes -
V-Net Med TensorFlow Yes Yes Yes - - - - Yes -
U-Net Med TensorFlow2 Yes Yes Yes - - - - Yes -
Mask R-CNN TensorFlow2 Yes Yes Yes - - - - Yes -
EfficientNet TensorFlow2 Yes Yes Yes Yes - - - Yes -
ResNet-50 MXNet - Yes Yes - - - - - -

Natural Language Processing

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
BERT PyTorch Yes Yes Yes Yes - - Yes Yes -
TransformerXL PyTorch Yes Yes Yes Yes - - - Yes -
GNMT PyTorch Yes Yes Yes - - - - - -
Transformer PyTorch Yes Yes Yes - - - - - -
ELECTRA TensorFlow2 Yes Yes Yes Yes - - - Yes -
BERT TensorFlow Yes Yes Yes Yes Yes - Yes Yes Yes
BERT TensorFlow2 Yes Yes Yes Yes - - - Yes -
BioBert TensorFlow Yes Yes Yes - - - - Yes Yes
TransformerXL TensorFlow Yes Yes Yes - - - - - -
GNMT TensorFlow Yes Yes Yes - - - - - -
Faster Transformer Tensorflow - - - - Yes - - - -

Recommender Systems

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
DLRM PyTorch Yes Yes Yes - - Yes Yes Yes Yes
DLRM TensorFlow2 Yes Yes Yes Yes - - - Yes -
NCF PyTorch Yes Yes Yes - - - - - -
Wide&Deep TensorFlow Yes Yes Yes - - - - Yes -
Wide&Deep TensorFlow2 Yes Yes Yes - - - - Yes -
NCF TensorFlow Yes Yes Yes - - - - Yes -
VAE-CF TensorFlow Yes Yes Yes - - - - - -

Speech to Text

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
Jasper PyTorch Yes Yes Yes - Yes Yes Yes Yes Yes
Hidden Markov Model Kaldi - - Yes - - - Yes - -

Text to Speech

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
FastPitch PyTorch Yes Yes Yes - - - - Yes -
FastSpeech PyTorch - Yes Yes - Yes - - - -
Tacotron 2 and WaveGlow PyTorch Yes Yes Yes - Yes Yes Yes Yes -

Graph Neural Networks

Models Framework A100 AMP Multi-GPU Multi-Node TRT ONNX Triton DLC NB
SE(3)-Transformer PyTorch Yes Yes Yes - - - - - -

NVIDIA support

In each of the network READMEs, we indicate the level of support that will be provided. The range is from ongoing updates and improvements to a point-in-time release for thought leadership.

Glossary

Multinode Training
Supported on a pyxis/enroot Slurm cluster.

Deep Learning Compiler (DLC)
TensorFlow XLA and PyTorch JIT and/or TorchScript

Accelerated Linear Algebra (XLA)
XLA is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes. The results are improvements in speed and memory usage.

PyTorch JIT and/or TorchScript
TorchScript is a way to create serializable and optimizable models from PyTorch code. TorchScript, an intermediate representation of a PyTorch model (subclass of nn.Module) that can then be run in a high-performance environment such as C++.

Automatic Mixed Precision (AMP)
Automatic Mixed Precision (AMP) enables mixed precision training on Volta, Turing, and NVIDIA Ampere GPU architectures automatically.

TensorFloat-32 (TF32)
TensorFloat-32 (TF32) is the new math mode in NVIDIA A100 GPUs for handling the matrix math also called tensor operations. TF32 running on Tensor Cores in A100 GPUs can provide up to 10x speedups compared to single-precision floating-point math (FP32) on Volta GPUs. TF32 is supported in the NVIDIA Ampere GPU architecture and is enabled by default.

Jupyter Notebooks (NB)
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.

Feedback / Contributions

We're posting these examples on GitHub to better support the community, facilitate feedback, as well as collect and implement contributions using GitHub Issues and pull requests. We welcome all contributions!

Known issues

In each of the network READMEs, we indicate any known issues and encourage the community to provide feedback.

Owner
NVIDIA Corporation
NVIDIA Corporation
novel deep learning research works with PaddlePaddle

Research 发布基于飞桨的前沿研究工作,包括CV、NLP、KG、STDM等领域的顶会论文和比赛冠军模型。 目录 计算机视觉(Computer Vision) 自然语言处理(Natrual Language Processing) 知识图谱(Knowledge Graph) 时空数据挖掘(Spa

1.5k Dec 29, 2022
Code for pre-training CharacterBERT models (as well as BERT models).

Pre-training CharacterBERT (and BERT) This is a repository for pre-training BERT and CharacterBERT. DISCLAIMER: The code was largely adapted from an o

Hicham EL BOUKKOURI 31 Dec 05, 2022
Official repo for BMVC2021 paper ASFormer: Transformer for Action Segmentation

ASFormer: Transformer for Action Segmentation This repo provides training & inference code for BMVC 2021 paper: ASFormer: Transformer for Action Segme

42 Dec 23, 2022
Pywonderland - A tour in the wonderland of math with python.

A Tour in the Wonderland of Math with Python A collection of python scripts for drawing beautiful figures and animating interesting algorithms in math

Zhao Liang 4.1k Jan 03, 2023
[ECCV 2020] XingGAN for Person Image Generation

Contents XingGAN or CrossingGAN Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Evaluation Acknowl

Hao Tang 218 Oct 29, 2022
Learning Facial Representations from the Cycle-consistency of Face (ICCV 2021)

Learning Facial Representations from the Cycle-consistency of Face (ICCV 2021) This repository contains the code for our ICCV2021 paper by Jia-Ren Cha

Jia-Ren Chang 40 Dec 27, 2022
PyTorch for Semantic Segmentation

PyTorch for Semantic Segmentation This repository contains some models for semantic segmentation and the pipeline of training and testing models, impl

Zijun Deng 1.7k Jan 06, 2023
PyTorch Implementation of NCSOFT's FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis

FastPitchFormant - PyTorch Implementation PyTorch Implementation of FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis. Qu

Keon Lee 63 Jan 02, 2023
Official implementation of our CVPR2021 paper "OTA: Optimal Transport Assignment for Object Detection" in Pytorch.

OTA: Optimal Transport Assignment for Object Detection This project provides an implementation for our CVPR2021 paper "OTA: Optimal Transport Assignme

217 Jan 03, 2023
MOpt-AFL provided by the paper "MOPT: Optimized Mutation Scheduling for Fuzzers"

MOpt-AFL 1. Description MOpt-AFL is a AFL-based fuzzer that utilizes a customized Particle Swarm Optimization (PSO) algorithm to find the optimal sele

172 Dec 18, 2022
An open source implementation of CLIP.

OpenCLIP Welcome to an open source implementation of OpenAI's CLIP (Contrastive Language-Image Pre-training). The goal of this repository is to enable

2.7k Dec 31, 2022
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".

Codebase for learning control flow in transformers The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformer

Csordás Róbert 24 Oct 15, 2022
ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS.

ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS. It currently supports four examples for you to quickly experience the power of ONNX Runti

Microsoft 58 Dec 18, 2022
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19

2s-AGCN Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition in CVPR19 Note PyTorch version should be 0.3! For PyTor

LShi 547 Dec 26, 2022
Generative Flow Networks

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation Implementation for our paper, submitted to NeurIPS 2021 (also chec

Emmanuel Bengio 381 Jan 04, 2023
A tool to estimate time varying instantaneous reproduction number during epidemics

EpiEstim A tool to estimate time varying instantaneous reproduction number during epidemics. It is described in the following paper: @article{Cori2013

MRC Centre for Global Infectious Disease Analysis 78 Dec 19, 2022
The codes I made while I practiced various TensorFlow examples

TensorFlow_Exercises The codes I made while I practiced various TensorFlow examples About the codes I didn't create these codes by myself, but re-crea

Terry Taewoong Um 614 Dec 08, 2022
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

Denis Emelin 42 Nov 24, 2022
MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Main repo for ECCV 2020 paper MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images. visual.cs.brown.edu/matryodshka

Brown University Visual Computing Group 75 Dec 13, 2022
Implementation of " SESS: Self-Ensembling Semi-Supervised 3D Object Detection" (CVPR2020 Oral)

SESS: Self-Ensembling Semi-Supervised 3D Object Detection Created by Na Zhao from National University of Singapore Introduction This repository contai

125 Dec 23, 2022