GPU Programming with Julia - course at the Swiss National Supercomputing Centre (CSCS), ETH Zurich

Overview

Course title page

Course Description

The programming language Julia is being more and more adopted in High Performance Computing (HPC) due to its unique way to combine performance with simplicity and interactivity, enabling unprecedented productivity in HPC development. This course will discuss both basic and advanced topics relevant for single and Multi-GPU computing with Julia. It will focus on the CUDA.jl package, which enables writing native Julia code for GPUs. Topics covered include the following:

  • GPU array programming;
  • GPU kernel programming;
  • kernel launch parameters;
  • usage of on-chip memory;
  • Multi-GPU computing;
  • code reflection and introspection; and
  • diverse advanced optimization techniques.

This course combines lectures and hands-on sessions.

Target audience

This course addresses scientists interested in doing HPC using Julia. Previous Julia or GPU computing knowledge is not needed, but a good general understanding of programming is advantageous.

Instructors

  • Dr. Tim Besard (Lead developer of CUDA.jl, Julia Computing Inc.)
  • Dr. Samuel Omlin (Computational Scientist | Responsible for Julia computing, CSCS)

Course material

This git repository contains the material of day 1 and 2 (speaker: Dr. Samuel Omlin, CSCS). The material of day 3 and 4 is found in this git repository (speaker: Dr. Tim Besard, Julia Computing Inc.).

Course recording

The edited course recording is found here. The following list provides key entry points into the video.

Day 1:

00:00: Introduction to the course

05:02: General introduction to supercomputing

14:06: High-speed introduction to GPU computing

32:57: Walk through introduction notebook on memory copy and performance evaluation

Day 2:

1:24:53: Introduction to day 2

1:39:12: Walk through solutions of exercise 1 and 2 (data "transfer" optimisations)

2:34:12: Walk through solutions of exercise 3 and 4 (data "transfer" optimisations and distributed parallelization)

Day 3:

03:31:57: Introduction to day 3

03:32:59: Presentation of notebook 1: cuda libraries

04:24:31: Presentation of notebook 2: programming models

05:30:46: Presentation of notebook 3: memory management

06:03:48: Presentation of notebook 4: concurrent computing

Day 4:

06:27:15: Introduction to day 4

06:28:13: Presentation of notebook 5: application analysis and optimisation

07:35:08: Presentation of notebook 6: kernel analysis and optimisation

Owner
Samuel Omlin
Computational Scientist | Responsible for Julia computing, CSCS - Swiss National Supercomputing Centre
Samuel Omlin
Official Implement of CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

RGBT Crowd Counting Lingbo Liu, Jiaqi Chen, Hefeng Wu, Guanbin Li, Chenglong Li, Liang Lin. "Cross-Modal Collaborative Representation Learning and a L

37 Dec 08, 2022
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Peter Lin 6.5k Jan 04, 2023
Lightweight plotting to the terminal. 4x resolution via Unicode.

Uniplot Lightweight plotting to the terminal. 4x resolution via Unicode. When working with production data science code it can be handy to have plotti

Olav Stetter 203 Dec 29, 2022
An Implementation of SiameseRPN with Feature Pyramid Networks

SiameseRPN with FPN This project is mainly based on HelloRicky123/Siamese-RPN. What I've done is just add a Feature Pyramid Network method to the orig

3 Apr 16, 2022
Official PyTorch implementation of MAAD: A Model and Dataset for Attended Awareness

MAAD: A Model for Attended Awareness in Driving Install // Datasets // Training // Experiments // Analysis // License Official PyTorch implementation

7 Oct 16, 2022
Reinforcement learning models in ViZDoom environment

DoomNet DoomNet is a ViZDoom agent trained by reinforcement learning. The agent is a neural network that outputs a probability of actions given only p

Andrey Kolishchak 126 Dec 09, 2022
Self-Supervised Contrastive Learning of Music Spectrograms

Self-Supervised Music Analysis Self-Supervised Contrastive Learning of Music Spectrograms Dataset Songs on the Billboard Year End Hot 100 were collect

27 Dec 10, 2022
A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

A graph neural network (GNN) model to predict protein-protein interactions (PPI) with no sample features

2 Jul 25, 2022
Forecasting with Gradient Boosted Time Series Decomposition

ThymeBoost ThymeBoost combines time series decomposition with gradient boosting to provide a flexible mix-and-match time series framework for spicy fo

131 Jan 08, 2023
Install alphafold on the local machine, get out of docker.

AlphaFold This package provides an implementation of the inference pipeline of AlphaFold v2.0. This is a completely new model that was entered in CASP

Kui Xu 73 Dec 13, 2022
A Simple Long-Tailed Rocognition Baseline via Vision-Language Model

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

Teli Ma 4 Jan 20, 2022
This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection', CVPR 2019.

Code-and-Dataset-for-CapSal This project provides the code and datasets for 'CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detec

lu zhang 48 Aug 19, 2022
GPU-accelerated Image Processing library using OpenCL

pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op

17 Dec 25, 2022
Official implementation for “Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior”

HEP Unsupervised Low-Light Image Enhancement via Histogram Equalization Prior Implementation Python3 PyTorch=1.0 NVIDIA GPU+CUDA Training process The

FengZhang 34 Dec 04, 2022
Flower - A Friendly Federated Learning Framework

Flower - A Friendly Federated Learning Framework Flower (flwr) is a framework for building federated learning systems. The design of Flower is based o

Adap 1.8k Jan 01, 2023
Official implementation of "Learning Not to Reconstruct" (BMVC 2021)

Official PyTorch implementation of "Learning Not to Reconstruct Anomalies" This is the implementation of the paper "Learning Not to Reconstruct Anomal

Marcella Astrid 13 Dec 04, 2022
Code for "Unsupervised Source Separation via Bayesian inference in the latent domain"

LQVAE-separation Code for "Unsupervised Source Separation via Bayesian inference in the latent domain" Paper Samples GT Compressed Separated Drums GT

Michele Mancusi 30 Oct 25, 2022
This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing described in paper Discontinuous Grammar as a Foreign Language.

Discontinuous Grammar as a Foreign Language This repository includes the code of the sequence-to-sequence model for discontinuous constituent parsing

Daniel Fernández-González 2 Apr 07, 2022
CDGAN: Cyclic Discriminative Generative Adversarial Networks for Image-to-Image Transformation

CDGAN CDGAN: Cyclic Discriminative Generative Adversarial Networks for Image-to-Image Transformation CDGAN Implementation in PyTorch This is the imple

Kancharagunta Kishan Babu 6 Apr 19, 2022
Code for Robust Contrastive Learning against Noisy Views

Robust Contrastive Learning against Noisy Views This repository provides a PyTorch implementation of the Robust InfoNCE loss proposed in paper Robust

Ching-Yao Chuang 53 Jan 08, 2023