Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch

Overview

CAPE 🌴

PyTorch implementation of Continuous Augmented Positional Embeddings (CAPE), by Likhomanenko et al. Enhance your Transformer positional embeddings with easy-to-use augmentations!

Setup 🔧

Minimum requirements:

torch >= 1.10.0

Install from source:

git clone https://github.com/gcambara/cape.git
cd cape
pip install --editable ./

Usage 📖

CAPE works out of the box with PyTorch's official Transformer implementation. By default, CAPE1d behaves exactly like sinusoidal positional embeddings, which it adds to your content embeddings:

import torch
from torch import nn
from cape import CAPE1d

pos_emb = CAPE1d(d_model=512)
transformer = nn.Transformer(d_model=512)

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x = pos_emb(x) # by default, forward adds the positional embeddings to x
x = transformer(x, x) # nn.Transformer takes (src, tgt); x is reused for both here
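For reference, the standard sinusoidal embedding that the default configuration reproduces can be sketched in plain Python. This follows the textbook formula from Vaswani et al. (2017) and is not taken from the cape source: the helper sinusoidal_embedding and the base of 10000 are assumptions, and CAPE1d may arrange the sine and cosine channels differently.

```python
import math


def sinusoidal_embedding(positions, d_model, base=10000.0):
    """Standard sinusoidal embedding (Vaswani et al., 2017) at arbitrary positions.

    Returns one row of d_model values per position: even dimensions hold
    sin(pos * w_k) and odd dimensions cos(pos * w_k), where the pair (2k, 2k + 1)
    shares the frequency w_k = base ** (-2k / d_model).
    """
    table = []
    for pos in positions:
        row = []
        for i in range(d_model):
            w = base ** (-2 * (i // 2) / d_model)  # frequency shared by each sin/cos pair
            row.append(math.sin(pos * w) if i % 2 == 0 else math.cos(pos * w))
        table.append(row)
    return table


pe = sinusoidal_embedding([0.0, 1.0, 2.5], d_model=512)  # positions need not be integers
print(len(pe), len(pe[0]))  # 3 512
```

Positions enter only through pos * w_k, so nothing forces them to be integers. That is what allows CAPE to shift and scale them continuously.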

Alternatively, you can compute the positional embeddings separately and combine them yourself:

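x = torch.randn(10, 32, 512)
x_pos = pos_emb.compute_pos_emb(x) # separate name, so the CAPE1d module is not overwritten

scale = 512**0.5 # sqrt(d_model) scaling of the content embeddings
x = (scale * x) + x_pos
x = transformer(x, x)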

Let's see a few examples of CAPE initialization for different modalities, inspired by the experiments in the original paper.

CAPE for text 🔤

CAPE1d can be applied to text directly. Keep max_local_shift between 0 and 0.5 to shift local positions without changing their order.

import torch
from cape import CAPE1d

pos_emb = CAPE1d(d_model=512, max_global_shift=5.0, 
                 max_local_shift=0.5, max_global_scaling=1.03, 
                 normalize=False)

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x = pos_emb(x)
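The three augmentations configured above (global shift, local shift and global scaling) can be sketched on a single sequence of positions. This follows the formulation in the CAPE paper, x'_i = λ(x_i + Δ + ε_i); the function augment_positions is hypothetical, and the exact sampling ranges used inside the cape package are an assumption here.

```python
import math
import random


def augment_positions(positions, max_global_shift=0.0, max_local_shift=0.0,
                      max_global_scaling=1.0, rng=random):
    """CAPE-style augmentation of one sequence of positions (sketch).

    x'_i = lam * (x_i + delta + eps_i), with
      delta     ~ U(-max_global_shift, max_global_shift), shared by the sequence
      eps_i     ~ U(-max_local_shift, max_local_shift), drawn per position
      log(lam)  ~ U(-log(max_global_scaling), log(max_global_scaling))
    The exact ranges sampled inside the cape package may differ.
    """
    delta = rng.uniform(-max_global_shift, max_global_shift)
    log_max_scaling = math.log(max_global_scaling)
    lam = math.exp(rng.uniform(-log_max_scaling, log_max_scaling))
    return [lam * (x + delta + rng.uniform(-max_local_shift, max_local_shift))
            for x in positions]


rng = random.Random(0)
positions = [float(i) for i in range(10)]  # indices of a 10-token sentence
augmented = augment_positions(positions, max_global_shift=5.0, max_local_shift=0.5,
                              max_global_scaling=1.03, rng=rng)
# Neighbouring positions are 1 apart, so local shifts of at most 0.5 can make
# them meet but never swap them; lam > 0 preserves the order as well.
print(all(a <= b for a, b in zip(augmented, augmented[1:])))  # True
```

As described in the paper, these augmentations are applied only during training; at evaluation time the original positions are used.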

Padding is supported by passing the length of each sample to the forward method through the x_lengths argument. In the example below, every sample originally has length 7 but has been padded to a sequence length of 10.

x = torch.randn(10, 32, 512) # seq_len, batch_size, n_feats
x_lengths = torch.ones(32)*7
x = pos_emb(x, x_lengths=x_lengths)
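To make explicit what x_lengths encodes in this example, a hypothetical helper padding_mask can turn per-sample lengths into a boolean mask. This sketch only shows which time steps are padding; it does not reproduce how CAPE1d uses x_lengths internally.

```python
def padding_mask(x_lengths, seq_len):
    """Boolean mask of shape (batch_size, seq_len); True marks a padded time step."""
    return [[t >= length for t in range(seq_len)] for length in x_lengths]


mask = padding_mask([7] * 32, seq_len=10)  # same lengths as the x_lengths example
print(len(mask), len(mask[0]), sum(mask[0]))  # 32 10 3
```

With True marking padding and a (batch_size, seq_len) layout, this matches the convention of the src_key_padding_mask argument of PyTorch's nn.Transformer.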

CAPE for audio 🎙️

CAPE1d is applied to audio in much the same way as to text. Use the positions_delta argument to set the separation, in seconds, between time steps, and x_lengths to indicate sample durations when there is padding.

For instance, let's consider no padding and the same hop size (30 ms) for every sample in the batch:

# Max global shift is 60 s.
# Max local shift is set to 0.5 to maintain positional order.
# Max global scaling is 1.1, according to WSJ recipe.
# Freq scale is 30 to ensure that 30 ms queries are possible with long audio.
import torch
from cape import CAPE1d
pos_emb = CAPE1d(d_model=512, max_global_shift=60.0, 
                 max_local_shift=0.5, max_global_scaling=1.1, 
                 normalize=True, freq_scale=30.0)

x = torch.randn(100, 32, 512) # seq_len, batch_size, n_feats
positions_delta = 0.03 # 30 ms of stride
x = pos_emb(x, positions_delta=positions_delta)
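A sketch of what positions_delta means, assuming frame i sits at i * positions_delta seconds: 100 frames with a 30 ms hop cover 3.0 s of audio, with the last frame starting at 2.97 s. This is why the shift settings above are expressed in seconds rather than in frame indices. time_positions is a hypothetical helper, not part of cape.

```python
def time_positions(n_steps, positions_delta):
    """Start time, in seconds, of each of n_steps frames spaced positions_delta apart."""
    return [i * positions_delta for i in range(n_steps)]


positions = time_positions(100, 0.03)  # 100 frames with a 30 ms hop
print(len(positions), round(positions[-1], 2))  # 100 2.97
```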

Now, suppose the original duration of every sample is 2.5 s, but all samples have been padded to 3.0 s (100 steps × 30 ms). The hop size is 30 ms for every sample in the batch.

x = torch.randn(100, 32, 512) # seq_len, batch_size, n_feats

duration = 2.5
positions_delta = 0.03
x_lengths = torch.ones(32)*duration
x = pos_emb(x, x_lengths=x_lengths, positions_delta=positions_delta)
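Assuming x_lengths is compared against each frame's start time (an assumption; the cape source defines the exact rule), the 2.5 s samples above occupy 84 of the 100 frames. valid_frame_mask is a hypothetical helper for illustration.

```python
def valid_frame_mask(n_steps, positions_delta, duration):
    """True for frames whose start time i * positions_delta lies before duration (seconds)."""
    return [i * positions_delta < duration for i in range(n_steps)]


mask = valid_frame_mask(100, 0.03, duration=2.5)
print(mask.count(True), mask.count(False))  # 84 16
```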

What if the hop size differs across samples in the batch? For example, the first half of the samples have a stride of 30 ms, and the second half a stride of 50 ms:

positions_delta = 0.03
positions_delta = torch.ones(32)*positions_delta
positions_delta[16:] = 0.05
x = pos_emb(x, positions_delta=positions_delta)
print(positions_delta)
# tensor([0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300,
#         0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0300, 0.0500, 0.0500,
#         0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500, 0.0500,
#         0.0500, 0.0500, 0.0500, 0.0500, 0.0500])
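Conceptually, a 1-D positions_delta gives each sample its own time axis. The plain-Python sketch below assumes that position i of sample b is i * positions_delta[b], which is what broadcasting a (batch_size, 1) tensor against torch.arange(seq_len) would give; batch_time_positions is a hypothetical helper, not cape's code.

```python
def batch_time_positions(n_steps, deltas):
    """positions[b][i] = i * deltas[b]: one time axis per sample in the batch."""
    return [[i * d for i in range(n_steps)] for d in deltas]


deltas = [0.03] * 16 + [0.05] * 16  # first half 30 ms, second half 50 ms
positions = batch_time_positions(100, deltas)
print(round(positions[0][-1], 2), round(positions[-1][-1], 2))  # 2.97 4.95
```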

Lastly, let's consider a very rare case, where the hop size differs across samples and is not constant within some of them. For example, the stride is 30 ms for the first half of the samples and 50 ms for the second half, but the hop size of the very first sample increases linearly with each time step.

from einops import repeat
positions_delta = 0.03
positions_delta = torch.ones(32)*positions_delta
positions_delta[16:] = 0.05
positions_delta = repeat(positions_delta, 'b -> b new_axis', new_axis=100).clone() # repeat may return an expanded view; clone before modifying it in place
positions_delta[0, :] *= torch.arange(1, 101)
x = pos_emb(x, positions_delta=positions_delta)
print(positions_delta)
# tensor([[0.0300, 0.0600, 0.0900,  ..., 2.9400, 2.9700, 3.0000],
#         [0.0300, 0.0300, 0.0300,  ..., 0.0300, 0.0300, 0.0300],
#         [0.0300, 0.0300, 0.0300,  ..., 0.0300, 0.0300, 0.0300],
#         ...,
#         [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#         [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500],
#         [0.0500, 0.0500, 0.0500,  ..., 0.0500, 0.0500, 0.0500]])

CAPE for ViT 🖼️

CAPE2d embeds the positions of image patches. Positions are scaled to [-1, 1] inside the module, whether the grid of patches is square or non-square. Thus, set max_local_shift between 0 and 0.5; the scale of the local shifts is adjusted according to the height and width of the patch grid. Values above 0.5 may alter the order of positions, so use them at your own risk!

import torch
from cape import CAPE2d

pos_emb = CAPE2d(d_model=512, max_global_shift=0.5, 
                 max_local_shift=0.5, max_global_scaling=1.4)

# Case 1: square patches
x = torch.randn(16, 16, 32, 512) # height, width, batch_size, n_feats
x = pos_emb(x)

# Case 2: non-square patches
x = torch.randn(24, 16, 32, 512) # height, width, batch_size, n_feats
x = pos_emb(x)
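The idea of rescaling positions to [-1, 1] per axis and adapting local shifts to the patch grid can be sketched in plain Python. This is an illustration under stated assumptions (evenly spaced patch centres, and a local shift expressed as a fraction of the spacing between neighbouring centres), not CAPE2d's actual code; linspace and patch_axis_positions are hypothetical helpers.

```python
import random


def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive (n == 1 gives [start])."""
    step = (stop - start) / max(n - 1, 1)
    return [start + k * step for k in range(n)]


def patch_axis_positions(n_patches, max_local_shift=0.0, rng=random):
    """Positions of n_patches along one image axis, normalized to [-1, 1] before shifting.

    Each local shift is a fraction of the spacing between neighbouring patch centres,
    so max_local_shift <= 0.5 keeps neighbours in order for any number of patches.
    """
    coords = linspace(-1.0, 1.0, n_patches)
    spacing = 2.0 / max(n_patches - 1, 1)
    return [c + rng.uniform(-max_local_shift, max_local_shift) * spacing for c in coords]


rng = random.Random(0)
rows = patch_axis_positions(24, max_local_shift=0.5, rng=rng)  # height: 24 patches
cols = patch_axis_positions(16, max_local_shift=0.5, rng=rng)  # width: 16 patches
# Patch (h, w) would then be embedded from the coordinate pair (rows[h], cols[w]).
print(len(rows), len(cols))  # 24 16
```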

Citation ✍️

I wrote this PyTorch implementation following the paper's Python code and the Flashlight recipe in C++. All credit goes to the original authors, so please cite them if you use this in your research:

@inproceedings{likhomanenko2021cape,
title={{CAPE}: Encoding Relative Positions with Continuous Augmented Positional Embeddings},
author={Tatiana Likhomanenko and Qiantong Xu and Gabriel Synnaeve and Ronan Collobert and Alex Rogozhnikov},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021},
url={https://openreview.net/forum?id=n-FqqWXnWW}
}

Acknowledgments 🙏

Many thanks to the paper's authors for reviewing the code and answering my questions about the paper and the implementation. :)

Release: v1.0.0

Owner: Guillermo Cámbara (🎙️ PhD Candidate in Self-Supervised Learning + Speech Recognition @ Universitat Pompeu Fabra & Telefónica Research)