Attention for PyTorch with Linear Memory Footprint

An unofficial PyTorch implementation of https://arxiv.org/abs/2112.05682 that gives attention a linear memory cost (plus some speedup on the GPU as a side benefit, compared to the reference implementation in JAX).
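To make the idea concrete, here is a minimal sketch of the chunking scheme described in the paper. It is not this package's code. It uses NumPy rather than PyTorch to keep it dependency-light, and it leaves out the gradient checkpointing the paper uses for the backward pass. Queries are processed in chunks of `query_chunk_size`. For each query chunk, keys are processed in chunks of `key_chunk_size`, and each key chunk yields a partial result: the exp-weighted values, their weight sum, and the chunk's row maximum. The partials are then rescaled to a shared maximum and summed, so only one `query_chunk_size x key_chunk_size` block of scores exists at a time. The function names and the mask convention (`True` means the key may be attended to) are assumptions for illustration.

```python
import numpy as np


def naive_attention(q, k, v, mask=None):
    """Reference softmax attention that materialises the full (Lq, Lk) score matrix."""
    s = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        s = np.where(mask[None, :], s, -np.inf)  # assumed convention: True = key is kept
    s = s - s.max(axis=-1, keepdims=True)        # subtract row max for numerical stability
    p = np.exp(s)
    return (p / p.sum(axis=-1, keepdims=True)) @ v


def _summarize_chunk(q, k, v, mask):
    """Partial softmax statistics for one (query chunk, key chunk) pair."""
    s = q @ k.T                                  # the only score block alive at this point
    s = np.where(mask[None, :], s, -np.inf)
    m = s.max(axis=-1, keepdims=True)            # chunk-local row max (-inf if fully masked)
    m_safe = np.where(np.isfinite(m), m, 0.0)    # avoid (-inf) - (-inf) = nan
    e = np.exp(s - m_safe)                       # masked entries become exactly 0
    return e @ v, e.sum(axis=-1, keepdims=True), m


def chunked_attention(q, k, v, mask=None, query_chunk_size=64, key_chunk_size=128):
    """Hypothetical helper: same result as naive_attention, computed chunk by chunk.

    Every query row must have at least one unmasked key, as in the dense version.
    """
    lq, d = q.shape
    lk = k.shape[0]
    if mask is None:
        mask = np.ones(lk, dtype=bool)
    out = np.empty((lq, v.shape[-1]))
    for qs in range(0, lq, query_chunk_size):
        qc = q[qs:qs + query_chunk_size] / np.sqrt(d)  # scale queries once per chunk
        parts = [
            _summarize_chunk(qc, k[ks:ks + key_chunk_size], v[ks:ks + key_chunk_size],
                             mask[ks:ks + key_chunk_size])
            for ks in range(0, lk, key_chunk_size)
        ]
        # Leading axis of each stacked array indexes the key chunk.
        vals, weights, maxes = (np.stack(t) for t in zip(*parts))
        global_max = maxes.max(axis=0)        # true row max over all keys
        rescale = np.exp(maxes - global_max)  # bring every chunk to the same reference max
        out[qs:qs + qc.shape[0]] = (vals * rescale).sum(axis=0) / (weights * rescale).sum(axis=0)
    return out
```

Because each partial is computed relative to its own chunk maximum, multiplying by `exp(chunk_max - global_max)` is what makes the summed numerators and denominators match the single dense softmax.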

Installation:

git clone https://github.com/CHARM-Tx/linear_mem_attention_pytorch
cd linear_mem_attention_pytorch
python setup.py install

Usage:

High level

import torch
from linear_mem_attention_torch.fast_attn import Attention

batch, length, features = 2, 2**8, 64
x, ctx = torch.randn(2, batch, length, features)  # two (batch, length, features) tensors
mask = torch.randn(batch, length) < 1.            # boolean mask, shape (batch, length)

attn = Attention(dim=features, heads=8, dim_head=64, bias=False)

# self-attn
v_self = attn(x, x, mask, query_chunk_size=1024, key_chunk_size=4096)

# cross-attn
v_cross = attn(x, ctx, mask, query_chunk_size=1024, key_chunk_size=4096)
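The README does not spell out what `Attention` does around the low-level function. Assuming it follows the usual multi-head pattern (project `x` to queries and `ctx` to keys and values, split into `heads` slices of size `dim_head`, attend, merge the heads and project back to `dim`), a dense NumPy sketch of that pipeline could look like this. The weight matrices, the function names and the key-side mask convention are hypothetical, and the dense softmax stands in for the chunked kernel.

```python
import numpy as np


def split_heads(t, heads):
    """(batch, length, heads * dim_head) -> (batch, length, heads, dim_head)."""
    b, n, inner = t.shape
    return t.reshape(b, n, heads, inner // heads)


def multi_head_attention(x, ctx, mask, w_q, w_k, w_v, w_o, heads):
    """Hypothetical dense stand-in for Attention(dim, heads, dim_head, bias=False)."""
    q = split_heads(x @ w_q, heads)      # queries come from x
    k = split_heads(ctx @ w_k, heads)    # keys and values come from ctx (ctx = x for self-attention)
    v = split_heads(ctx @ w_v, heads)
    s = np.einsum("bihd,bjhd->bhij", q, k) / np.sqrt(q.shape[-1])
    s = np.where(mask[:, None, None, :], s, -np.inf)  # assumed: mask is (batch, ctx_len), True = keep
    s = s - s.max(axis=-1, keepdims=True)
    p = np.exp(s)
    p = p / p.sum(axis=-1, keepdims=True)
    o = np.einsum("bhij,bjhd->bihd", p, v)             # (batch, length, heads, dim_head)
    b, n = o.shape[:2]
    return o.reshape(b, n, -1) @ w_o                   # merge heads, project back to dim


rng = np.random.default_rng(0)
batch, length, dim, heads, dim_head = 2, 16, 64, 8, 64
x = rng.standard_normal((batch, length, dim))
mask = np.ones((batch, length), dtype=bool)
w_q, w_k, w_v = (rng.standard_normal((dim, heads * dim_head)) * dim ** -0.5 for _ in range(3))
w_o = rng.standard_normal((heads * dim_head, dim)) * (heads * dim_head) ** -0.5
out = multi_head_attention(x, x, mask, w_q, w_k, w_v, w_o, heads)
print(out.shape)  # (2, 16, 64)
```

The intermediate `(batch, length, heads, dim_head)` layout matches the shape the low-level `attention` function takes for `q`, `k` and `v`.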

Low level

import torch
from linear_mem_attention_torch import attention

batch, length, heads, features = 2, 2**8, 8, 64
mask = torch.randn(batch, length) < 1.                    # boolean mask, shape (batch, length)
q, k, v = torch.randn(3, batch, length, heads, features)  # each (batch, length, heads, features)

v_ = attention(q, k, v, mask, query_chunk_size=1024, key_chunk_size=4096)
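`query_chunk_size` and `key_chunk_size` bound the largest block of attention scores held in memory at once. A rough back-of-the-envelope estimate follows, assuming batch and heads are processed together within each chunk and ignoring the small per-chunk summaries and any activations saved for the backward pass. The helper names are hypothetical.

```python
def score_block_elements(batch, heads, q_len, k_len, query_chunk_size=None, key_chunk_size=None):
    """Elements in the largest score block alive at once (None = no chunking on that axis)."""
    qc = q_len if query_chunk_size is None else min(query_chunk_size, q_len)
    kc = k_len if key_chunk_size is None else min(key_chunk_size, k_len)
    return batch * heads * qc * kc


def num_chunks(length, chunk_size):
    """Number of chunks along one axis (integer ceiling division)."""
    return (length + chunk_size - 1) // chunk_size


# The README's settings: batch=2, heads=8, chunk sizes 1024 / 4096.
print(score_block_elements(2, 8, 2**8, 2**8, 1024, 4096))   # 1048576: one chunk per axis, same as dense
print(score_block_elements(2, 8, 2**14, 2**14))              # 4294967296 (16 GiB in float32)
print(score_block_elements(2, 8, 2**14, 2**14, 1024, 4096))  # 67108864 (256 MiB in float32)
print(num_chunks(2**14, 1024), num_chunks(2**14, 4096))      # 16 4
```

With the example's `length = 2**8`, both chunk sizes exceed the sequence length, so there is a single chunk per axis and no saving over dense attention. Chunking only pays off once sequences are longer than the chunk sizes.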

Benchmarks

See examples/example_benchamrk.ipynb for more information.

Citations:

@misc{rabe2021selfattention,
      title={Self-attention Does Not Need $O(n^2)$ Memory}, 
      author={Markus N. Rabe and Charles Staats},
      year={2021},
      eprint={2112.05682},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
