A python library for highly configurable transformers - easing model architecture search and experimentation.

Overview

configaformers (re-factor in progress)

A python library for highly configurable transformers - easing model architecture search and experimentation. It is premised on building small and independent modules that enables users to configure custom transformer architectures.

Special thanks to lucidrains (https://github.com/lucidrains) and Kharr.

Usage

Quick demo that will configure a 768-wide, 12-layer transformer, with a language modeling head.

Import, and create token embedding block:

import torch
from model_builder import ConfigaFormer

emb = []
model_dim = 768

emb.append({'type': 'embedding',
            'output_dim': model_dim,
            'num_classes': 50257})

Create self-attention module:

attn = []

# Make residual and norm
attn.append({'type': 'make_stream', 'output_name': 'residual'})
attn.append({'type': 'norm', 'norm_type': 'layer_norm'})

# Make QKVs
attn.append({'type': 'linear', 'output_name': 'queries'})
attn.append({'type': 'linear', 'output_name': 'keys'})
attn.append({'type': 'linear', 'output_name': 'values'})

attn.append({'type': 'make_heads', 'input_name': 'queries', 'output_name': 'queries', 'num_heads': 12})
attn.append({'type': 'make_heads', 'input_name': 'keys', 'output_name': 'keys', 'num_heads': 12})

attn.append({'type': 'rope', 'input_name': 'queries', 'output_name': 'queries', 'rotate_dim': 16})
attn.append({'type': 'rope', 'input_name': 'keys', 'output_name': 'keys', 'rotate_dim': 16})

# Perform attention
attn.append({'type': 'mha_dots',
             'input_name_queries': 'queries',
             'input_name_keys': 'keys'})
attn.append({'type': 'attention_offset'})
attn.append({'type': 'mha_sum',
             'input_name_values': 'values'})

# Mix
attn.append({'type': 'linear'})

# Add residual
attn.append({'type': 'merge_streams',
             'input_name_1': 'residual',
             'merge_type': 'add'})

Create FFN module:

ffn = []

# Make residual and norm
ffn.append({'type': 'make_stream', 'output_name': 'residual'})
ffn.append({'type': 'norm', 'norm_type': 'layer_norm'})

# Proj Up
ffn.append({'type': 'linear', 'output_dim': 768*4})

# Activation
ffn.append({'type': 'activation'})

# Proj Down
ffn.append({'type': 'linear', 'output_dim': 768})

# Add residual
ffn.append({'type': 'merge_streams',
             'input_name_1': 'residual',
             'merge_type': 'add'})

Create language modeling head:

to_logits = []
to_logits.append({'type': 'linear', 'output_dim': 50257})

Create blocks, initialize input shapes, and init the model:

transformer_block = attn + ffn
classifier = ffn + to_logits

blocks = [{"config": emb,
           "repeat": 1},
          {"config": transformer_block,
           "repeat": 12},
          {"config": classifier,
           "repeat": 1},
          ]
          
my_config = {'blocks' = blocks}
input_streams = {'emb_ids': ['B', 'L_in'],
                 'attn_offset': ['B', 12, 'L_in', 'L_in'],}

model = ConfigaFormer(model_config=my_config,
                     input_streams=input_streams).cuda()

This will print out the transformer config:

Block #1, 1x
embedding -> Input(s): emb_ids (BSZ, L_in) - Output(s): x (BSZ, L_in, 768)


Block #2, 12x
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): queries (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): keys (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): values (BSZ, L_in, 768)
make_heads -> Input(s): queries (BSZ, L_in, 768) - Output(s): queries (BSZ, 12, L_in, 64)
make_heads -> Input(s): keys (BSZ, L_in, 768) - Output(s): keys (BSZ, 12, L_in, 64)
rope -> Input(s): queries (BSZ, 12, L_in, 64), rope_16 (2048, 16) - Output(s): queries (BSZ, 12, L_in, 64)
rope -> Input(s): keys (BSZ, 12, L_in, 64), rope_16 (2048, 16) - Output(s): keys (BSZ, 12, L_in, 64)
mha_dots -> Input(s): queries (BSZ, 12, L_in, 64), keys (BSZ, 12, L_in, 64) - Output(s): attn_dots (BSZ, 12, L_in, L_in)
attention_offset -> Input(s): attn_dots (BSZ, 12, L_in, L_in), attn_offset (BSZ, 12, L_in, L_in) - Output(s): attn_dots (BSZ, 12, L_in, L_in)
mha_sum -> Input(s): values (BSZ, L_in, 768), attn_dots (BSZ, 12, L_in, L_in) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): residual (BSZ, L_in, 768), x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 3072)
activation -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 3072)
linear -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): residual (BSZ, L_in, 768), x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)


Block #3, 1x
make_stream -> Input(s): x (BSZ, L_in, 768) - Output(s): residual (BSZ, L_in, 768)
norm -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 3072)
activation -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 3072)
linear -> Input(s): x (BSZ, L_in, 3072) - Output(s): x (BSZ, L_in, 768)
merge_streams -> Input(s): residual (BSZ, L_in, 768), x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 768)
linear -> Input(s): x (BSZ, L_in, 768) - Output(s): x (BSZ, L_in, 50257)

Before running, we need to get the attention offset (in this case, AliBi with a causal mask):

from attention_offset_module import get_alibi

attn_offset = get_alibi(num_heads=12)

Now we can use the model:

input_data = {'emb_ids': batch_ids.view(bsz, 1024).cuda(),
              'attn_offset': attn_offset.cuda()}

logits = model(input_data)['x'].view(bsz, 1024, 50257)

TODO

  1. Token shifting, down/up sampling
  2. Create higher abstractions for FFN and self-attention
  3. everything else
Owner
Anthony Fuller
Anthony Fuller
Facial Expression Detection In The Realtime

The human's facial expressions is very important to detect thier emotions and sentiment. It can be very efficient to use to make our computers make interviews. Furthermore, we have robots now can det

Adel El-Nabarawy 4 Mar 01, 2022
CVAT is free, online, interactive video and image annotation tool for computer vision

Computer Vision Annotation Tool (CVAT) CVAT is free, online, interactive video and image annotation tool for computer vision. It is being used by our

OpenVINO Toolkit 8.6k Jan 04, 2023
Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021)

EMI-FGSM This repository contains code to reproduce results from the paper: Boosting Adversarial Attacks with Enhanced Momentum (BMVC 2021) Xiaosen Wa

John Hopcroft Lab at HUST 10 Sep 26, 2022
This repo contains the code and data used in the paper "Wizard of Search Engine: Access to Information Through Conversations with Search Engines"

Wizard of Search Engine: Access to Information Through Conversations with Search Engines by Pengjie Ren, Zhongkun Liu, Xiaomeng Song, Hongtao Tian, Zh

19 Oct 27, 2022
PenguinSpeciesPredictionML - Basic model to predict Penguin species based on beak size and sex.

Penguin Species Prediction (ML) 🐧 👨🏽‍💻 What? 💻 This project is a basic model using sklearn methods to predict Penguin species based on beak size

Tucker Paron 0 Jan 08, 2022
Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

Official PyTorch Implementation of paper EAN: Event Adaptive Network for Efficient Action Recognition

TianYuan 27 Nov 07, 2022
Official implementation for (Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation, CVPR-2021)

FRSKD Official implementation for Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation (CVPR-2021) Requirements Pytho

75 Dec 28, 2022
Code for the Convolutional Vision Transformer (ConViT)

ConViT : Vision Transformers with Convolutional Inductive Biases This repository contains PyTorch code for ConViT. It builds on code from the Data-Eff

Facebook Research 418 Jan 06, 2023
Make your own game in a font!

Project structure. Included is a suite of tools to create font games. Tutorial: For a quick tutorial about how to make your own game go here For devel

Michael Mulet 125 Dec 04, 2022
Elevation Mapping on GPU.

Elevation Mapping cupy Overview This is a ros package of elevation mapping on GPU. Code are written in python and uses cupy for GPU calculation. * pla

Robotic Systems Lab - Legged Robotics at ETH Zürich 183 Dec 19, 2022
B2EA: An Evolutionary Algorithm Assisted by Two Bayesian Optimization Modules for Neural Architecture Search

B2EA: An Evolutionary Algorithm Assisted by Two Bayesian Optimization Modules for Neural Architecture Search This is the offical implementation of the

SNU ADSL 0 Feb 07, 2022
This is the repo for Uncertainty Quantification 360 Toolkit.

UQ360 The Uncertainty Quantification 360 (UQ360) toolkit is an open-source Python package that provides a diverse set of algorithms to quantify uncert

International Business Machines 207 Dec 30, 2022
An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates neural fields, predictive coding, top-down-bottom-up, and attention (consensus between columns)

GLOM - Pytorch (wip) An attempt at the implementation of Glom, Geoffrey Hinton's new idea that integrates neural fields, predictive coding,

Phil Wang 173 Dec 14, 2022
AlgoVision - A Framework for Differentiable Algorithms and Algorithmic Supervision

NeurIPS 2021 Paper "Learning with Algorithmic Supervision via Continuous Relaxations"

Felix Petersen 76 Jan 01, 2023
This repo is official PyTorch implementation of MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021).

Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices" Introduction This repo is official PyTorch implementatio

Choi Sang Bum 203 Jan 05, 2023
THIS IS THE **OLD** PYMC PROJECT. PLEASE USE PYMC3 INSTEAD:

Introduction Version: 2.3.8 Authors: Chris Fonnesbeck Anand Patil David Huard John Salvatier Web site: https://github.com/pymc-devs/pymc Documentation

PyMC 7.2k Jan 07, 2023
Code for the TIP 2021 Paper "Salient Object Detection with Purificatory Mechanism and Structural Similarity Loss"

PurNet Project for the TIP 2021 Paper "Salient Object Detection with Purificatory Mechanism and Structural Similarity Loss" Abstract Image-based salie

Jinming Su 4 Aug 25, 2022
Human Pose Detection on EdgeTPU

Coral PoseNet Pose estimation refers to computer vision techniques that detect human figures in images and video, so that one could determine, for exa

google-coral 476 Dec 31, 2022
Video-based open-world segmentation

UVO_Challenge Team Alpes_runner Solutions This is an official repo for our UVO Challenge solutions for Image/Video-based open-world segmentation. Our

Yuming Du 84 Dec 22, 2022
Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)

StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery (ICCV 2021 Oral) Run this model on Replicate Optimization: Global directions: Mapper: Check ou

3.3k Jan 05, 2023