The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Overview

tldr-transformers

Models: GPT-*, *BERT*, Adapter-*, *T5, etc.

BERT and T5 (art from the original papers)

     

Each set of notes includes links to the paper, the original code implementation (if available) and the Huggingface 🤗 implementation.

Here is an example: t5.

The transformer papers are presented roughly chronologically below. See the "👉 Notes 👈" column to find the notes for each paper.

This repo also includes a single table quantifying the differences across transformer papers.

Quick_Note

This is not an intro to deep learning in NLP. If you are looking for that, I recommend one of the following: Fast AI's course, one of the Coursera courses, or maybe this old thing. Come here after that.

Motivation

With the explosion in papers on all things Transformers over the past few years, it seems useful to catalog the salient features, results, and insights of each paper in a digestible format. Hence this repo.

Models

| Model | Year | Institute | Paper | 👉 Notes 👈 | Original Code / Huggingface 🤗 / Other Repo |
|---|---|---|---|---|---|
| Transformer | 2017 | Google | Attention is All You Need | Skipped, too many good write-ups | ? |
| GPT-3 | 2020 | OpenAI | Language Models are Few-Shot Learners | To-Do | X X |
| GPT-J-6B | 2021 | EleutherAI | GPT-J-6B: 6B Jax-Based Transformer (public GPT-3) | X | here x x |
| BERT | 2018 | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | BERT notes | here here |
| DistilBERT | 2019 | Huggingface | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | DistilBERT notes | here |
| ALBERT | 2019 | Google/Toyota | ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | ALBERT notes | here here |
| RoBERTa | 2019 | Facebook | RoBERTa: A Robustly Optimized BERT Pretraining Approach | RoBERTa notes | here here |
| BART | 2019 | Facebook | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | BART notes | here here |
| T5 | 2019 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | T5 notes | here here |
| Adapter-BERT | 2019 | Google | Parameter-Efficient Transfer Learning for NLP | Adapter-BERT notes | here - here |
| Megatron-LM | 2019 | NVIDIA | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Megatron notes | here - here |
| Reformer | 2020 | Google | Reformer: The Efficient Transformer | Reformer notes | here |
| ByT5 | 2021 | Google | ByT5: Towards a token-free future with pre-trained byte-to-byte models | ByT5 notes | here here |
| CLIP | 2021 | OpenAI | Learning Transferable Visual Models From Natural Language Supervision | CLIP notes | here here |
| DALL-E | 2021 | OpenAI | Zero-Shot Text-to-Image Generation | DALL-E notes | here - |
| Codex | 2021 | OpenAI | Evaluating Large Language Models Trained on Code | Codex notes | X - |

BigTable

All of the table summaries above are collapsed into one really big table here.

Alignment

| Paper | Year | Institute | 👉 Notes 👈 | Codes |
|---|---|---|---|---|
| Fine-Tuning Language Models from Human Preferences | 2019 | OpenAI | To-Do | None |

Scaling

| Paper | Year | Institute | 👉 Notes 👈 | Codes |
|---|---|---|---|---|
| Scaling Laws for Neural Language Models | 2020 | OpenAI | To-Do | None |

Memorization

| Paper | Year | Institute | 👉 Notes 👈 | Codes |
|---|---|---|---|---|
| Extracting Training Data from Large Language Models | 2021 | Google et al. | To-Do | None |
| Deduplicating Training Data Makes Language Models Better | 2021 | Google et al. | To-Do | None |

FewLabels

| Paper | Year | Institute | 👉 Notes 👈 | Codes |
|---|---|---|---|---|
| An Empirical Survey of Data Augmentation for Limited Data Learning in NLP | 2021 | GIT/UNC | To-Do | None |
| Learning with fewer labeled examples | 2021 | Kevin Murphy & Colin Raffel (Preprint: "Probabilistic Machine Learning", Chapter 19) | Worth a read, won't summarize here. | None |

Contribute

If you are interested in contributing to this repo, feel free to do the following:

  1. Fork the repo.
  2. Create a Draft PR with the paper of interest (to prevent "in-flight" issues).
  3. Use the suggested template to write your "tl;dr". If it's an architecture paper, you may also want to add to the larger table here.
  4. Submit your PR.

Errata

Undoubtedly there is information that is incorrect here. Please open an Issue and point it out.

Citation

@misc{cliff-notes-transformers,
  author = {Thompson, Will},
  url = {https://github.com/will-thompson-k/cliff-notes-transformers},
  year = {2021}
}

For the notes above, I've linked the original papers.

License

MIT

Owner
Will Thompson