Mesh Transformer Jax

A Haiku library that uses the new(ly documented) xmap operator in JAX for model parallelism of transformers.

See enwik8_example.py for an example of using this to implement an autoregressive language model.

Benchmarks

On a TPU v3-8 (see tpuv38_example.py):

~2.7B model

Initialized in 121.842s
Total parameters: 2722382080
Compiled in 49.0534s
it: 0, loss: 20.311113357543945
<snip>
it: 90, loss: 3.987450361251831
100 steps in 109.385s
effective flops (not including attn): 2.4466e+14

~4.8B model

Initialized in 101.016s
Total parameters: 4836720896
Compiled in 52.7404s
it: 0, loss: 4.632925987243652
<snip>
it: 40, loss: 3.2406811714172363
50 steps in 102.559s
effective flops (not including attn): 2.31803e+14

10B model

Initialized in 152.762s
Total parameters: 10073579776
Compiled in 92.6539s
it: 0, loss: 5.3125
<snip>
it: 40, loss: 3.65625
50 steps in 100.235s
effective flops (not including attn): 2.46988e+14
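The "effective flops" figures above can be sanity-checked with a short script. This is a minimal sketch, assuming the common approximation that a training step costs about 6 FLOPs per parameter per token (forward plus backward, attention excluded); the logs do not state this formula or the batch size, so the function name `tokens_per_step` and the formula itself are assumptions, not taken from tpuv38_example.py. The script back-solves the number of tokens processed per step that each logged figure would imply.

```python
# Back-solve tokens per training step from the logged benchmark numbers.
# ASSUMPTION (not stated in the logs):
#   effective FLOP/s = 6 * params * tokens_per_step * steps / seconds

def tokens_per_step(params, steps, seconds, effective_flops):
    """Tokens per step implied by a logged effective-FLOP/s figure."""
    return effective_flops * seconds / (6 * params * steps)

# Values copied from the benchmark logs above.
RUNS = {
    "~2.7B": dict(params=2722382080, steps=100, seconds=109.385,
                  effective_flops=2.4466e14),
    "~4.8B": dict(params=4836720896, steps=50, seconds=102.559,
                  effective_flops=2.31803e14),
    "10B": dict(params=10073579776, steps=50, seconds=100.235,
                effective_flops=2.46988e14),
}

def summarize(runs=RUNS):
    """Return {model name: implied tokens per step} for every logged run."""
    return {name: tokens_per_step(**run) for name, run in runs.items()}

if __name__ == "__main__":
    for name, tokens in summarize().items():
        print(f"{name}: ~{tokens:.0f} tokens/step")
```

Under this assumption the three runs imply roughly 16k, 16k, and 8k tokens per step respectively. The values land very close to powers of two, which suggests the 6-FLOPs-per-parameter-per-token estimate is at least consistent with how the logged figure was produced.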

Owner

Ben Wang
