PyTorch implementation of the Transformer in Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Last update: Feb 27, 2022

Overview

Transformer-PyTorch

A PyTorch implementation of the Transformer from the paper Attention is All You Need in both Post-LN (Post-LayerNorm) and Pre-LN (Pre-LayerNorm).

Pre-LN applies LayerNorm to the input of each sublayer, whereas Post-LN applies it after the residual connection. The architecture proposed in the paper is Post-LN, but the official implementation was later changed to Pre-LN. Experimental results show that the Pre-LN Transformer converges faster, does not need learning-rate warm-up, and is less sensitive to hyperparameters. For more detail on the difference between the two, see the paper On Layer Normalization in the Transformer Architecture.
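To make the difference in wiring concrete, here is a minimal, framework-free sketch of one residual sublayer in both variants. It is not the code from this repository: `layer_norm` and `residual_block` are hypothetical helpers, the normalization has no learnable gain or bias, dropout is omitted, and `sublayer` stands in for an attention or feed-forward module. Only the flag name `pre_lnorm` is taken from this project.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(x: Vector,
                   sublayer: Callable[[Vector], Vector],
                   pre_lnorm: bool) -> Vector:
    """Apply one sublayer with a residual connection.

    Pre-LN:  x + sublayer(layer_norm(x))
    Post-LN: layer_norm(x + sublayer(x))
    """
    if pre_lnorm:
        # Normalize only the sublayer input; the residual path stays untouched.
        out = sublayer(layer_norm(x))
        return [a + b for a, b in zip(x, out)]
    # Normalize after the residual sum, so the residual path is normalized too.
    out = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, out)])


# Toy sublayer used only for illustration: doubles every element.
double = lambda v: [2.0 * t for t in v]
x = [1.0, 2.0, 3.0, 4.0]
pre = residual_block(x, double, pre_lnorm=True)
post = residual_block(x, double, pre_lnorm=False)
```

In the Pre-LN variant the input `x` reaches the output unchanged through the residual path, while in Post-LN every block output is renormalized.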

A STAR would be so nice if you like it!

Dataset

The small English-German dataset from the WMT 2016 multimodal task, loaded through torchtext.

Prerequisites

Python3
PyTorch >= 1.2.0
torchtext
spacy
nltk
tqdm

Implementation Notes

Beam search is not supported.
Label smoothing is not implemented.
BPE is not used.

Usage

Run transformer.ipynb to download the dataset and train the model.
Set the pre_lnorm flag to choose between Pre-LN and Post-LN.

Evaluation

Parameter settings
- hidden size: 512
- feed forward size: 2048
- num head: 8
- layer: 6
- warm-up: 2000
- batch size: 128
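
For reference, the warm-up value above can be read against the learning-rate schedule from Attention is All You Need. The sketch below assumes that schedule is the one in use, which this README does not confirm; `noam_lr` is a hypothetical helper name, and the defaults mirror the hidden size and warm-up listed above.

```python
def noam_lr(step: int, d_model: int = 512, warmup: int = 2000) -> float:
    """Learning rate from the original Transformer paper.

    lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    Rises linearly for `warmup` steps, then decays as 1/sqrt(step).
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)


# The rate peaks exactly at the end of warm-up.
peak = noam_lr(2000)
```

With these settings the rate increases up to step 2000 and decreases afterwards, so the peak depends only on the hidden size and the warm-up length.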

Generated Examples

Here's an example from test data:

source
- eine frau verwendet eine bohrmaschine während ein mann sie fotografiert .
gold
- a woman uses a drill while another man takes her picture .
inference
- a woman uses an electric drill as a man takes a picture .

TODO

Label smoothing
Attention visualization

Owner

Jared Wang
