GLM (General Language Model)

Related tags

Deep LearningGLM
Overview

GLM

GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.

Please refer to our paper for a detailed description of GLM:

All NLP Tasks Are Generation Tasks: A General Pretraining Framework

Zhengxiao Du*, Yujie Qian*, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution)

Part of the code is based on Megatron-LM and PET.

Pretrained Models

You can download the pretrained models used in the paper here.

Name Params File Config
GLM-Base 110M glm-base-blank.tar.bz2 model_blocklm_base.sh
GLM-Large 335M glm-large-blank.tar.bz2 model_blocklm_large.sh
GLM-Large (multi-task) 335M glm-large-generation.tar.bz2 model_blocklm_large_generation.sh
GLM-410M (multi-task) 410M glm-1.25-generation.tar.bz2 model_blocklm_1.25_generation.sh
GLM-515M (multi-task) 515M glm-1.5-generation.tar.bz2 model_blocklm_1.5_generation.sh
GLM-RoBERTa 335M glm-roberta-large-blank.tar.bz2 model_blocklm_roberta_large.sh

Installation

Clone this repo

git clone https://github.com/THUDM/GLM
cd GLM

Please first install PyTorch (we use 1.7.0) and apex, and then install other dependencies by

pip install -r requirements.txt

Usage

We provide scripts for finetuning GLM on some downstream tasks.

SuperGLUE

  • Download the SuperGlue data and check the experiment setup in scripts/finetune_superglue.sh. Note that DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH need to be changed to your local path. You may also change the batch-size and nproc_per_node according to your available hardware. We suggest to use aggregated batch size 64 for MultiRC and ReCORD and 16 for other tasks.

  • Run the following script (use the COPA dataset as an example)

bash scripts/finetune_superglue.sh \
     config_tasks/model_blocklm_roberta_large.sh \
     config_tasks/task_copa.sh
  • To apply GLM to a new NLU dataset with cloze-filling finetuning, implement a DataProcessor in tasks/superglue/dataset.py for data loading and add a PVP in tasks/superglue/pvp.py for the cloze question. More details can be found here.

  • The cloze questions (prompts) used in this work are written by human. We are also studying a P-tuning (prompt tuning) approach to search for the optimal continuous prompt. Please refer to our paper and code.

Text Summarization

  • Download the Gigaword dataset and check the experiment setup in scripts/finetune_seq2seq.sh. Change DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH to your local path.

  • Run the following script

bash scripts/finetune_seq2seq.sh \ 
     config_tasks/model_blocklm_large_generation.sh \ 
     config_tasks/seq_gigaword.sh
  • For calculating rouge, install file2rouge from here and run bash scripts/evaluate_seq2seq.sh

Language Modeling

LAMBADA Cloze Accuracy

bash scripts/evaluate_lm.sh \ 
     config_tasks/model_blocklm_large_generation.sh \
     config_tasks/zero_lambada.sh 

LM Perplexity

  • Download our test set of wikibook (or any dataset following the same format) and change DATA_ROOT, CHECKPOINT_PATH in scripts/evaluate_lm.sh
  • Run the following script
    bash scripts/evaluate_lm.sh \ 
       config_tasks/model_blocklm_large_generation.sh \
       config_tasks/zero_lm.sh 

Blank Language Model

  • Download the Yahoo dataset and check the experiment setup in scripts/finetune_blank.sh. Change DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH to your local path.

  • Run the following script

bash scripts/finetune_blank.sh \ 
     config_tasks/model_blocklm_large.sh \ 
     config_tasks/seq_blank.sh

Blank Filling (Interactive)

  • Change CHECKPOINT_PATH to your local path. Run the following script
bash scripts/generate_block.sh \
     config_tasks/model_blocklm_large.sh

Example:

Context: Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a pioneer in online education, Ng co-founded Coursera and deeplearning.ai.

GLM: [CLS] ng is an adjunct professor at [MASK] ( formerly associate professor and director of its stanford ai lab or sail ) . also a pioneer in online education , ng co - founded coursera and deeplearning . ai . [PAD] <|startofpiece|> the stanford university

Citation

Please cite our paper if you find this code useful for your research:

@article{DBLP:journals/corr/abs-2103-10360,
  author    = {Zhengxiao Du and
               Yujie Qian and
               Xiao Liu and
               Ming Ding and
               Jiezhong Qiu and
               Zhilin Yang and
               Jie Tang},
  title     = {All {NLP} Tasks Are Generation Tasks: {A} General Pretraining Framework},
  journal   = {CoRR},
  volume    = {abs/2103.10360},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.10360}
}
Owner
THUDM
Data Mining Research Group at Tsinghua University
THUDM
Garbage Detection system which will detect objects based on whether it is plastic waste or plastics or just garbage.

Garbage Detection using Yolov5 on Jetson Nano 2gb Developer Kit. Garbage detection system which will detect objects based on whether it is plastic was

Rishikesh A. Bondade 2 May 13, 2022
Trying to understand alias-free-gan.

alias-free-gan-explanation Trying to understand alias-free-gan in my own way. [Chinese Version 中文版本] CC-BY-4.0 License. Tzu-Heng Lin motivation of thi

Tzu-Heng Lin 12 Mar 17, 2022
Implementation for Stankevičiūtė et al. "Conformal time-series forecasting", NeurIPS 2021.

Conformal time-series forecasting Implementation for Stankevičiūtė et al. "Conformal time-series forecasting", NeurIPS 2021. If you use our code in yo

Kamilė Stankevičiūtė 36 Nov 21, 2022
Machine learning Bot detection technique, based on United States election dataset

Machine learning Bot detection technique, based on United States election dataset (2020). Current github repo provides implementation described in pap

Alexander Shevtsov 4 Nov 20, 2022
Use evolutionary algorithms instead of gridsearch in scikit-learn

sklearn-deap Use evolutionary algorithms instead of gridsearch in scikit-learn. This allows you to reduce the time required to find the best parameter

rsteca 709 Jan 03, 2023
YOLOX-CondInst - Implement CondInst which is a instances segmentation method on YOLOX

YOLOX CondInst -- YOLOX 实例分割 前言 本项目是自己学习实例分割时,复现的代码. 通过自己编程,让自己对实例分割有更进一步的了解。 若想

DDGRCF 16 Nov 18, 2022
Bottom-up Human Pose Estimation

Introduction This is the official code of Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation. This paper has been accepted to CVPR2

108 Dec 01, 2022
TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline.

TorchX is a library containing standard DSLs for authoring and running PyTorch related components for an E2E production ML pipeline

193 Dec 22, 2022
Pytorch implementation of BRECQ, ICLR 2021

BRECQ Pytorch implementation of BRECQ, ICLR 2021 @inproceedings{ li&gong2021brecq, title={BRECQ: Pushing the Limit of Post-Training Quantization by Bl

Yuhang Li 148 Dec 28, 2022
TreeSubstitutionCipher - Encryption system based on trees and substitution

Tree Substitution Cipher Generation Algorithm: Generate random tree. Tree nodes

stepa 1 Jan 08, 2022
U-Time: A Fully Convolutional Network for Time Series Segmentation

U-Time & U-Sleep Official implementation of The U-Time [1] model for general-purpose time-series segmentation. The U-Sleep [2] model for resilient hig

Mathias Perslev 176 Dec 19, 2022
An official implementation of MobileStyleGAN in PyTorch

MobileStyleGAN: A Lightweight Convolutional Neural Network for High-Fidelity Image Synthesis Official PyTorch Implementation The accompanying videos c

Sergei Belousov 602 Jan 07, 2023
Code for paper "Document-Level Argument Extraction by Conditional Generation". NAACL 21'

Argument Extraction by Generation Code for paper "Document-Level Argument Extraction by Conditional Generation". NAACL 21' Dependencies pytorch=1.6 tr

Zoey Li 87 Dec 26, 2022
CLIP (Contrastive Language–Image Pre-training) for Italian

Italian CLIP CLIP (Radford et al., 2021) is a multimodal model that can learn to represent images and text jointly in the same space. In this project,

Italian CLIP 114 Dec 29, 2022
Deep Compression for Dense Point Cloud Maps.

DEPOCO This repository implements the algorithms described in our paper Deep Compression for Dense Point Cloud Maps. How to get started (using Docker)

Photogrammetry & Robotics Bonn 67 Dec 06, 2022
Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection

Frequency Spectrum Augmentation Consistency for Domain Adaptive Object Detection Main requirements torch = 1.0 torchvision = 0.2.0 Python 3 Environm

15 Apr 04, 2022
Official PyTorch code for CVPR 2020 paper "Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision"

Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision https://arxiv.org/abs/2003.00393 Abstract Active learning (AL) aims to min

Denis 29 Nov 21, 2022
Official Repository for the paper "Improving Baselines in the Wild".

iWildCam and FMoW baselines (WILDS) This repository was originally forked from the official repository of WILDS datasets (commit 7e103ed) For general

Kazuki Irie 3 Nov 24, 2022
Benchmarking the robustness of Spatial-Temporal Models

Benchmarking the robustness of Spatial-Temporal Models This repositery contains the code for the paper Benchmarking the Robustness of Spatial-Temporal

Yi Chenyu Ian 15 Dec 16, 2022
Facial expression detector

A tensorflow convolutional neural network model to detect facial expressions.

Carlos Tardón Rubio 5 Apr 20, 2022