Source code for "Efficient Training of BERT by Progressively Stacking"

Overview

Introduction

This repository contains the code to reproduce the results of "Efficient Training of BERT by Progressively Stacking". The code is based on Fairseq.
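For orientation, the core idea of progressive stacking can be sketched in a few lines: a shallow model is trained first, and a model with twice as many layers is then initialized by copying the trained layers on top of themselves. The sketch below is a simplified illustration and not this repository's implementation; the function name `stack_layers`, the `layers.<index>.` key layout, and the toy parameter values are all assumptions made for the example.

```python
import copy


def stack_layers(state, num_layers, prefix="layers."):
    """Return a new parameter dict with 2 * num_layers layers.

    Layer i + num_layers is initialized as a copy of layer i. Parameters
    outside the layer stack (e.g. embeddings) are carried over unchanged.
    The "layers.<index>." naming is a hypothetical layout for this sketch.
    """
    stacked = {}
    for name, value in state.items():
        stacked[name] = copy.deepcopy(value)
        if not name.startswith(prefix):
            continue
        index, _, rest = name[len(prefix):].partition(".")
        if index.isdigit() and int(index) < num_layers:
            suffix = "." + rest if rest else ""
            new_name = f"{prefix}{int(index) + num_layers}{suffix}"
            stacked[new_name] = copy.deepcopy(value)
    return stacked


# Toy 2-layer "model" with made-up values, doubled to 4 layers.
shallow = {
    "embed.weight": [0.1, 0.2],
    "layers.0.attn.weight": [1.0],
    "layers.1.attn.weight": [2.0],
}
deep = stack_layers(shallow, num_layers=2)
print(sorted(deep))
```

Training then continues on the deeper model, and the doubling can be repeated until the target depth is reached.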

Requirements and Installation

  • PyTorch >= 1.0.0
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • Python version 3.7

After PyTorch is installed, you can install requirements with:

pip install -r requirements.txt

Getting Started

Step 1:

bash install.sh

This script downloads:

  1. Moses Decoder
  2. Subword NMT
  3. Fast BPE (not used in the following steps, which rely on Subword NMT instead; recommended if you want to generate your own dictionary on a large-scale dataset)

These libraries handle cleaning, tokenization, and BPE encoding of the GLUE data in step 3. They are also helpful if you want to build your own corpus for BERT training or test our model on your own tasks.

Step 2:

bash reproduce_bert.sh

This script runs progressive stacking and trains a BERT model. The code was tested on 4 Tesla P40 GPUs (24 GB of GPU memory each). On different hardware, you will probably need to change the maximum number of tokens per batch (by adjusting max-tokens and update-freq).
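When adapting these two settings, it can help to keep the number of tokens per parameter update roughly constant. In Fairseq, max-tokens caps the tokens in one batch on each GPU and update-freq accumulates gradients over that many batches, so the tokens per update are at most max-tokens × number of GPUs × update-freq. The sketch below illustrates the arithmetic; the helper names and all numeric values are hypothetical and are not taken from reproduce_bert.sh.

```python
import math


def tokens_per_update(max_tokens, num_gpus, update_freq):
    """Upper bound on tokens contributing to one parameter update."""
    return max_tokens * num_gpus * update_freq


def update_freq_for(target_tokens, max_tokens, num_gpus):
    """Smallest update-freq that reaches target_tokens per update."""
    return math.ceil(target_tokens / (max_tokens * num_gpus))


# Hypothetical reference setup: 4 GPUs, values chosen only for illustration.
reference = tokens_per_update(max_tokens=8192, num_gpus=4, update_freq=4)

# Hypothetical smaller machine: 2 GPUs that fit half as many tokens each.
new_freq = update_freq_for(reference, max_tokens=4096, num_gpus=2)
print(reference, new_freq)
```

Actual batches are often smaller than max-tokens because of padding and sentence boundaries, so this is an approximation rather than an exact match.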

Step 3:

bash reproduce_glue.sh

This script fine-tunes the BERT trained in step 2. The script chooses the checkpoint trained for 400K steps, which is the same as the stacking model in our paper.

Cite

@InProceedings{pmlr-v97-gong19a,
  title = 	 {Efficient Training of {BERT} by Progressively Stacking},
  author = 	 {Gong, Linyuan and He, Di and Li, Zhuohan and Qin, Tao and Wang, Liwei and Liu, Tieyan},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {2337--2346},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Long Beach, California, USA},
  month = 	 {09--15 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/gong19a/gong19a.pdf},
  url = 	 {http://proceedings.mlr.press/v97/gong19a.html},
}