Train GPT-3 model on V100(16GB Mem) Using improved Transformer.

Last update: Sep 11, 2022


Overview

Pytorch GPT-X

My Own Pytorch GPT-X

1. Abstract

Train a GPT-3 model on a V100 GPU (16 GB memory) using an improved Transformer.

2. Model

Transformer

Additional Module

① ReZero

Paper: ReZero is All You Need: Fast Convergence at Large Depth
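As a rough illustration of the idea in the ReZero paper, each sublayer output is multiplied by a learnable scalar that starts at zero before it is added to the residual stream. The sketch below uses plain Python lists so it runs without PyTorch; `rezero_residual` and `toy_sublayer` are hypothetical names, not code from this repository.

```python
from typing import Callable, List

Vector = List[float]


def rezero_residual(x: Vector, sublayer: Callable[[Vector], Vector], alpha: float) -> Vector:
    """Return x + alpha * sublayer(x), the ReZero residual connection."""
    fx = sublayer(x)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]


def toy_sublayer(x: Vector) -> Vector:
    # Hypothetical stand-in for an attention or feed-forward sublayer.
    return [2.0 * xi + 1.0 for xi in x]


x = [1.0, -2.0, 0.5]

# alpha is initialized to zero, so the block is the identity at the start of training.
at_init = rezero_residual(x, toy_sublayer, alpha=0.0)

# Once training moves alpha away from zero, the sublayer starts to contribute.
after_training = rezero_residual(x, toy_sublayer, alpha=0.1)

print(at_init)
print(after_training)
```

In a real model `alpha` would be one trainable parameter per sublayer, and the usual layer normalization would be dropped.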

② Explicit Sparse Transformer

Paper: Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
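The core of explicit sparse attention is to keep only the k largest attention scores in each row and mask the rest before the softmax. A minimal sketch for a single row of scores, in plain Python; `sparse_softmax` is a hypothetical helper and the value of `k` is an arbitrary choice for the example.

```python
import math
from typing import List


def sparse_softmax(scores: List[float], k: int) -> List[float]:
    """Softmax over the k largest scores; every other position gets weight 0."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # The k-th largest score is the selection threshold.
    # Scores tied with the threshold are all kept, so ties can keep more than k.
    threshold = sorted(scores, reverse=True)[k - 1]
    kept = [s if s >= threshold else -math.inf for s in scores]
    peak = max(kept)
    exps = [math.exp(s - peak) for s in kept]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


row = [2.0, 0.5, 1.0, -1.0]
weights = sparse_softmax(row, k=2)
print(weights)
```

Only the positions holding 2.0 and 1.0 receive non-zero weight here, which is the "concentrated attention" effect the paper title refers to.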

③ Macaron Architecture

Paper: Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View
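The Macaron layout sandwiches attention between two feed-forward sublayers, each added with a residual scale of 0.5. The sketch below shows only that ordering and scaling, assuming toy stand-in sublayers and omitting layer normalization; all function names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add_scaled(x: Vector, update: Vector, scale: float) -> Vector:
    """Residual connection: x + scale * update."""
    return [xi + scale * ui for xi, ui in zip(x, update)]


def macaron_block(x: Vector, ffn_in: Sublayer, attention: Sublayer, ffn_out: Sublayer) -> Vector:
    x = add_scaled(x, ffn_in(x), 0.5)      # first half-step feed-forward
    x = add_scaled(x, attention(x), 1.0)   # full-step attention
    x = add_scaled(x, ffn_out(x), 0.5)     # second half-step feed-forward
    return x


def toy_ffn(x: Vector) -> Vector:
    # Hypothetical feed-forward stand-in: a bare ReLU.
    return [max(0.0, xi) for xi in x]


def toy_attention(x: Vector) -> Vector:
    # Hypothetical attention stand-in: uniform averaging over positions.
    mean = sum(x) / len(x)
    return [mean for _ in x]


out = macaron_block([1.0, -1.0], toy_ffn, toy_attention, toy_ffn)
print(out)
```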

④ RealFormer, Residual Attention

Paper: RealFormer: Transformer Likes Residual Attention
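Residual attention, as described in the RealFormer paper, adds the previous layer's pre-softmax attention scores to the current layer's scores. A small sketch for one query row, in plain Python; the function names and the toy vectors are assumptions made for illustration.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def softmax(row: Vector) -> Vector:
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def residual_attention_row(
    query: Vector, keys: List[Vector], prev_scores: Optional[Vector] = None
) -> Tuple[Vector, Vector]:
    """Return (attention weights, raw scores) for one query.

    The raw scores are returned so the next layer can add them to its own.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    if prev_scores is not None:
        # Skip connection on the attention scores themselves.
        scores = [s + p for s, p in zip(scores, prev_scores)]
    return softmax(scores), scores


query = [1.0, 0.0, 0.0, 0.0]
keys = [[2.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]

# Layer 1 has no previous scores; layer 2 reuses the scores from layer 1.
weights_1, scores_1 = residual_attention_row(query, keys)
weights_2, scores_2 = residual_attention_row(query, keys, prev_scores=scores_1)
print(scores_1, scores_2)
```

Reusing the same query and keys in both layers is only for brevity; in a real model each layer computes its own projections.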

Train

DeepSpeed

TODO

~~ReZero~~
RealFormer, Residual Attention
~~Macaron architectures~~
~~Macaron architectures - layer Scale 0.5~~
~~Explicit Sparse Transformer~~
PyTorch Lightning
DeepSpeed training on a single GPU
DeepSpeed parallel training on 2 V100 GPUs with 16 GB memory each

Parameter For Few-shot

The 175B-parameter model is very large, but few-shot learning needs a large model. This repository therefore tries to use DeepSpeed to train extremely large models.
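A minimal sketch of what a DeepSpeed configuration for this kind of memory-constrained setup might look like, assuming ZeRO stage 2 with optimizer state offloaded to CPU memory and fp16 training. The batch sizes, GPU count and choice of stage are illustrative assumptions, not settings taken from this repository.

```python
import json

# Assumed values for illustration only.
NUM_GPUS = 2
MICRO_BATCH_PER_GPU = 1
GRAD_ACCUM_STEPS = 16

ds_config = {
    # DeepSpeed expects: train_batch_size ==
    #   micro batch per GPU * gradient accumulation steps * number of GPUs
    "train_batch_size": MICRO_BATCH_PER_GPU * GRAD_ACCUM_STEPS * NUM_GPUS,
    "train_micro_batch_size_per_gpu": MICRO_BATCH_PER_GPU,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Keep optimizer state in host memory to free GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

config_json = json.dumps(ds_config, indent=2)
print(config_json)
```

The resulting JSON could be saved to a file, or the dictionary could be passed as the `config` argument of `deepspeed.initialize`; which route fits depends on how the training script is launched.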

GPT-3 Config

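| model_name | n_params | n_layer | d_model | n_heads | d_head | batch_size (tokens) | learning_rate |
|---|---|---|---|---|---|---|---|
| GPT-3 175B | 175B | 96 | 12288 | 96 | 128 | 3.2M | 0.6 x 10^-4 |
| GPT-3 13B | 13B | 40 | 5140 | 40 | 128 | 2M | 1.0 x 10^-4 |
| GPT-3 6.7B | 6.7B | 32 | 4096 | 32 | 128 | 2M | 1.2 x 10^-4 |
| GPT-3 2.7B | 2.7B | 32 | 2560 | 32 | 80 | 1M | 1.6 x 10^-4 |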
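To get a feel for why these configurations strain a 16 GB card, the parameter count can be approximated from `n_layer` and `d_model` with the common rule of thumb 12 * n_layer * d_model^2, which ignores embeddings and biases. The sketch below applies it to three of the configurations and compares the fp16 weight size with 16 GB; the helper names are hypothetical.

```python
def estimate_params(n_layer: int, d_model: int) -> int:
    """Approximate non-embedding parameter count of a GPT-style decoder."""
    # Per layer: 4 * d_model^2 for the attention projections (Q, K, V, output)
    # plus 8 * d_model^2 for a feed-forward block with 4x expansion.
    return 12 * n_layer * d_model ** 2


def fp16_weight_gb(n_params: int) -> float:
    """Size of the weights alone at 2 bytes per parameter, in GB (10^9 bytes)."""
    return n_params * 2 / 1e9


V100_MEMORY_GB = 16

configs = {
    "GPT-3 175B": (96, 12288),
    "GPT-3 6.7B": (32, 4096),
    "GPT-3 2.7B": (32, 2560),  # 32 heads x 80 dimensions per head
}

for name, (n_layer, d_model) in configs.items():
    n_params = estimate_params(n_layer, d_model)
    weights_gb = fp16_weight_gb(n_params)
    verdict = "fit" if weights_gb < V100_MEMORY_GB else "do not fit"
    print(f"{name}: ~{n_params / 1e9:.1f}B params, "
          f"{weights_gb:.1f} GB fp16 weights ({verdict} in {V100_MEMORY_GB} GB)")
```

This counts weights only. Gradients, optimizer state and activations need additional memory during training, so a model whose weights fit may still not be trainable on one card without partitioning or offloading.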

References

Transformer

lucidrains/x-transformers

DeepSpeed

ReZero

majumderb/rezero

Explicit Sparse Transformer

x-transformers: explicit_sparse_transformer

Macaron Architecture

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View

Owner

Seonghwan Kim
