Implementation of a Transformer, but completely in Triton

Last update: Dec 22, 2022

Overview

Transformer in Triton (wip)

Implementation of a Transformer, but completely in Triton. I'm completely new to lower-level neural net code, so this repository will mostly be a learning experience, with the end goal being a vanilla transformer that is faster and more efficient to train.

Install

$ pip install triton-transformer

Usage

import torch
from triton_transformer import Transformer

model = Transformer(
    num_tokens = 256,
    max_seq_len = 1024,
    dim = 512,
    depth = 6,
    heads = 8,
    dim_head = 64
)

x = torch.randint(0, 256, (1, 1024))
mask = torch.ones(1, 1024).bool()

logits = model(x, mask = mask) # (1, 1024, 256)

Citations

@article{Tillet2019TritonAI,
    title   = {Triton: an intermediate language and compiler for tiled neural network computations},
    author  = {Philippe Tillet and H. Kung and D. Cox},
    journal = {Proceedings of the 3rd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages},
    year    = {2019}
}

@misc{vaswani2017attention,
    title   = {Attention Is All You Need}, 
    author  = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year    = {2017},
    eprint  = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass = {cs.CL}
}


Comments

Question concerning PyTorch build

Hello. I find your project very interesting and I have seen your comparison between PyTorch and Triton implementations.

However, I am curious whether your PyTorch environment is a source build optimized for your machine or a pip/conda install.

Source builds typically run faster, so if a conda install is being used for the comparison, the difference in speed may simply be due to Triton optimizing CUDA for the run environment.

Thank you again for your interesting project.
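When comparing against PyTorch as discussed in this comment, it can help to report which build is installed. The snippet below is a minimal sketch, not part of this repository: the helper name `describe_torch_build` is hypothetical, and the output does not prove how a binary was built. It only gathers the version string, CUDA version, and compile-time configuration that PyTorch itself exposes.

```python
def describe_torch_build():
    """Collect build details of the installed PyTorch, or None if it is missing.

    Hypothetical helper for illustration; it only reads attributes that
    PyTorch exposes about itself.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        # Version string of the installed package
        "version": torch.__version__,
        # CUDA version PyTorch was compiled against (None for CPU-only builds)
        "cuda": torch.version.cuda,
        # Commit hash recorded at build time, if present
        "git_version": getattr(torch.version, "git_version", None),
        # Human-readable summary of compiler flags and linked libraries
        "build_config": torch.__config__.show(),
    }


info = describe_torch_build()
if info is None:
    print("PyTorch is not installed")
else:
    for key, value in info.items():
        print(f"{key}: {value}")
```

Pasting this output next to benchmark numbers makes it easier for others to judge whether two environments are comparable.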

opened by veritas9872 13

_layernorm implementation forward result not equal F.layer_norm

I tried out your triton-transformer and tested the layernorm module on its own. Strangely, the forward results differ while the backward results are equal.

code:

from triton_transformer.layernorm import layernorm
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(2,5).cuda()
x.requires_grad_(True)
dy = .1*torch.randn_like(x).cuda()
dim = 5
norm = nn.LayerNorm(dim).cuda()

y1 = layernorm(x, norm.weight, norm.bias, use_triton = True)
y2 = layernorm(x, norm.weight, norm.bias, use_triton = False)
print(y1, y2)
print(torch.allclose(y1, y2))

y1.backward(dy, retain_graph=True)
dx_y1 = x.grad.clone()

x.grad = None

y2.backward(dy, retain_graph=True)
dx_y2 = x.grad.clone()
print(dx_y1, dx_y2)
print(torch.allclose(dx_y1, dx_y2))

result:

tensor([[ 0.9492, -0.0021, -0.9797, 0.4449, -0.4123],
        [-0.7624, 0.4399, 0.7299, -0.3091, -0.0983]], device='cuda:0', grad_fn=<_layernormBackward>)
tensor([[ 1.4217, -0.0031, -1.4674, 0.6663, -0.6175],
        [-1.4342, 0.8276, 1.3732, -0.5815, -0.1850]], device='cuda:0', grad_fn=)
False

tensor([[-0.0706, 0.0288, -0.0813, 0.0446, 0.0785],
        [ 0.0218, -0.0152, 0.0141, -0.0522, 0.0315]], device='cuda:0')
tensor([[-0.0706, 0.0288, -0.0813, 0.0446, 0.0785],
        [ 0.0218, -0.0152, 0.0141, -0.0522, 0.0315]], device='cuda:0')
True
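One way to tell which of the two forward outputs above is off, without a GPU: a freshly constructed `nn.LayerNorm` has weight 1 and bias 0, so each output row should have a mean close to 0 and a (biased) variance close to 1. The sketch below applies that check to the printed values from this report. The helper names `row_stats` and `looks_normalized` are illustrative and not part of the library, and the tolerance is an arbitrary choice that allows for the four-decimal rounding of the printout.

```python
def row_stats(row):
    """Return (mean, biased variance) of a list of floats."""
    n = len(row)
    mean = sum(row) / n
    var = sum((v - mean) ** 2 for v in row) / n
    return mean, var


def looks_normalized(rows, tol=1e-2):
    """True if every row has mean ~0 and biased variance ~1 within tol."""
    return all(
        abs(mean) < tol and abs(var - 1.0) < tol
        for mean, var in map(row_stats, rows)
    )


# Forward outputs copied from the report (use_triton = True)
triton_out = [
    [0.9492, -0.0021, -0.9797, 0.4449, -0.4123],
    [-0.7624, 0.4399, 0.7299, -0.3091, -0.0983],
]

# Forward outputs copied from the report (use_triton = False)
torch_out = [
    [1.4217, -0.0031, -1.4674, 0.6663, -0.6175],
    [-1.4342, 0.8276, 1.3732, -0.5815, -0.1850],
]

for name, rows in (("triton", triton_out), ("torch", torch_out)):
    for mean, var in map(row_stats, rows):
        print(f"{name}: mean={mean:.4f} var={var:.4f}")
    print(f"{name} looks normalized: {looks_normalized(rows)}")
```

On these numbers the `use_triton = False` rows pass the check, while the `use_triton = True` rows are centered but have variances of roughly 0.45 and 0.28. That points at the scaling step of the forward pass in the tested version rather than the mean subtraction, though it does not by itself identify the cause.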

opened by Tengxu-Sun 1

Current state of benchmarking & contributing?

Hey @lucidrains - hope you're doing well! I have some time to hack over the next couple of weeks and just wanted to get a sense of:

Current state of benchmarking (which Triton kernels provide how much lift, and the aggregate lift over a "vanilla Transformer implementation")

If there's anything I could help with, especially as I learn Triton!

opened by siddk 0

Official layer norm added

Hi @lucidrains, a layer norm was just added to the Triton examples (https://github.com/openai/triton/commit/d4baad426db72b83c5222e1c83c929c1860cae54). I tested it: it's twice as fast as Torch and often faster than Apex.

I'm looking forward to your implementation of attention. So far the Torch implementation is the fastest at 12.3 / 14.5 (forward / backward), versus 17.3 / 23.0 for the other Triton implementation in DeepSpeed, on my data.

opened by olegklimov 2
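Forward and backward timings like the ones quoted in the comments above could be collected with a small harness along these lines. This is a generic sketch, not the benchmark used by the repository or by the commenters: `benchmark` and `speedup` are hypothetical helpers, and the warmup and iteration counts are arbitrary. For GPU code, the assumption is that a synchronization function such as `torch.cuda.synchronize` is passed as `sync`, since CUDA kernels launch asynchronously and wall-clock timing is otherwise misleading.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock time of fn() in milliseconds.

    sync, if given, is called after each run so that asynchronous work
    (for example queued GPU kernels) is finished before the clock stops.
    """
    # Warmup runs absorb one-time costs such as compilation and caching
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(times_ms)


def speedup(baseline, candidate):
    """Ratio of baseline time to candidate time (above 1 means candidate is faster)."""
    return baseline / candidate


# CPU-only stand-in workload so the sketch runs anywhere
median_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median: {median_ms:.3f} ms")

# Ratios for the figures quoted in the comment (units were not stated there)
print(f"forward:  {speedup(17.3, 12.3):.2f}x")
print(f"backward: {speedup(23.0, 14.5):.2f}x")
```

Using the median rather than the mean keeps a single slow iteration from skewing the result.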

Releases(0.1.1)

0.1.1 (Apr 5, 2022)
0.1.0 (Apr 4, 2022)
0.0.28 (Mar 23, 2022)
0.0.27 (Nov 6, 2021)
0.0.26 (Nov 6, 2021)
0.0.25 (Oct 6, 2021)
0.0.24 (Oct 4, 2021)
0.0.23 (Oct 4, 2021)
0.0.22 (Oct 4, 2021)
0.0.21 (Oct 4, 2021)
0.0.20 (Sep 29, 2021)
0.0.19 (Sep 29, 2021)
0.0.18 (Sep 29, 2021)
0.0.17 (Sep 28, 2021)
0.0.16 (Sep 28, 2021)
0.0.15 (Sep 27, 2021)
0.0.14 (Sep 23, 2021)
0.0.12 (Sep 23, 2021)
0.0.10 (Sep 23, 2021)
0.0.9 (Sep 22, 2021)
0.0.8 (Sep 22, 2021)
0.0.7 (Sep 22, 2021)
0.0.6 (Sep 22, 2021)
0.0.5 (Sep 22, 2021)
0.0.4 (Sep 22, 2021)
0.0.3 (Sep 15, 2021)
0.0.2 (Sep 15, 2021)

Owner

Phil Wang

Working with Attention. It's all we need

