The official GitHub repository for Towards Continual Knowledge Learning of Language Models

Towards Continual Knowledge Learning of Language Models

To reproduce our results, follow these steps:

1. Create a conda environment and install the requirements

conda create -n ckl python=3.8 && conda activate ckl
pip install -r requirements.txt

Also make sure to install the PyTorch build that matches your CUDA version and environment. Refer to https://pytorch.org/ for details.

# For CUDA 10.x
pip3 install torch torchvision torchaudio
# For CUDA 11.x
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
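
After installation, a quick check can confirm which PyTorch build is active and whether it can see a GPU. The following is a minimal sketch and is not part of this repository; the helper name `describe_torch` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch():
    """Summarize the installed PyTorch build (hypothetical helper, not from this repo)."""
    # Avoid an ImportError if the install step above has not been run yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU visible: {torch.cuda.is_available()}"
    )


print(describe_torch())
```

If the reported CUDA version does not match the one on your machine, reinstall PyTorch with the matching command above.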

2. Download the data used for the experiments.

To download only the CKL benchmark dataset:

python download_ckl_data.py

To download ALL of the data used for the experiments (required to reproduce results):

python download_all_data.py

To download the (continually pretrained) model checkpoints of the main experiment (required to reproduce results):

python download_model_checkpoints.py

For the other experimental settings, such as multiple CKL phases and GPT-2, we do not provide the continually pretrained model checkpoints.

3. Reproducing Experimental Results

We provide all the configs needed to reproduce the zero-shot results of our paper. Model checkpoints are provided only for the main experimental setting (full_setting) and can be downloaded with the command above.

configs
├── full_setting
│   ├── evaluation
│   │   ├── invariantLAMA
│   │   │   ├── t5_baseline.json
│   │   │   ├── t5_kadapters.json
│   │   │   ├── ...
│   │   ├── newLAMA
│   │   ├── newLAMA_easy
│   │   ├── updatedLAMA
│   ├── training
│   │   ├── t5_baseline.json
│   │   ├── t5_kadapters.json
│   │   ├── ...
├── GPT2
│   ├── ...
├── kilt
│   ├── ...
├── small_setting
│   ├── ...
├── split
│   ├── ...                    

Components of each configuration file

  • input_length (int) : the input sequence length
  • output_length (int) : the output sequence length
  • num_train_epochs (int) : number of training epochs
  • output_dir (string) : the directory to save the model checkpoints
  • dataset (string) : the dataset used for zero-shot evaluation or continual pretraining
  • dataset_version (string) : the version of the dataset ['full', 'small', 'debug']
  • train_batch_size (int) : batch size used for training
  • learning_rate (float) : learning rate used for training
  • model (string) : model name from Hugging Face models (https://huggingface.co/models)
  • method (string) : method being used ['baseline', 'kadapter', 'lora', 'mixreview', 'modular_small', 'recadam']
  • freeze_level (int) : how much of the model to freeze during training (0 for none, 1 for freezing only the encoder, 2 for freezing all of the parameters)
  • gradient_accumulation_steps (int) : gradient accumulation used to match the global training batch of each method
  • ngpu (int) : number of GPUs used for the run
  • num_workers (int) : number of workers for the DataLoader
  • resume_from_checkpoint (string) : null by default. Path to the model checkpoint directory when resuming from a checkpoint
  • accelerator (string) : 'ddp' by default. The PyTorch Lightning accelerator to be used
  • use_deepspeed (bool) : false by default. Currently not extensively tested.
  • CUDA_VISIBLE_DEVICES (string) : GPU devices that are made available for this run (e.g. "0,1,2,3", "0")
  • wandb_log (bool) : whether to log experiment through wandb
  • wandb_project (string) : project name of wandb
  • wandb_run_name (string) : the name of this training run
  • mode (string) : 'pretrain' for all configs
  • use_lr_scheduling (bool) : true if using learning rate scheduling
  • check_validation (bool) : true for evaluation (no training)
  • checkpoint_path (string) : path to the model checkpoint that is used for evaluation
  • output_log (string) : directory to log evaluation results to
  • split_num (int) : default is 1. More than 1 if there are multiple CKL phases
  • split (int) : which CKL phase it is
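
The field list above can be sanity-checked before launching a run. The following is a minimal sketch, not code from this repository: the helpers `check_config` and `effective_batch_size` are hypothetical, the example values are illustrative rather than taken from the provided configs, and the batch-size formula assumes that train_batch_size is a per-GPU value.

```python
import json

# Allowed values copied from the field list above.
DATASET_VERSIONS = {"full", "small", "debug"}
METHODS = {"baseline", "kadapter", "lora", "mixreview", "modular_small", "recadam"}
FREEZE_LEVELS = {0, 1, 2}


def check_config(cfg):
    """Return a list of problems found in cfg; an empty list means none were found."""
    problems = []
    if "dataset_version" in cfg and cfg["dataset_version"] not in DATASET_VERSIONS:
        problems.append(f"unknown dataset_version: {cfg['dataset_version']!r}")
    if "method" in cfg and cfg["method"] not in METHODS:
        problems.append(f"unknown method: {cfg['method']!r}")
    if "freeze_level" in cfg and cfg["freeze_level"] not in FREEZE_LEVELS:
        problems.append(f"unknown freeze_level: {cfg['freeze_level']!r}")
    return problems


def effective_batch_size(cfg):
    """Global batch size, ASSUMING train_batch_size is counted per GPU."""
    return cfg["train_batch_size"] * cfg["ngpu"] * cfg["gradient_accumulation_steps"]


# Illustrative values only; they are not the settings used in the paper.
EXAMPLE_JSON = """
{
    "method": "kadapter",
    "dataset_version": "full",
    "freeze_level": 1,
    "train_batch_size": 4,
    "ngpu": 2,
    "gradient_accumulation_steps": 8
}
"""

example = json.loads(EXAMPLE_JSON)
print(check_config(example))
print(effective_batch_size(example))
```

Under that assumption, changing ngpu or train_batch_size for your hardware means adjusting gradient_accumulation_steps so the product stays the same.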

The following command runs the invariantLAMA zero-shot evaluation of the continually pretrained t5_kadapters model:

python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json

The following command performs continual pretraining on CC-RecentNews (the main experiment) with t5_kadapters:

python run.py --config configs/full_setting/training/t5_kadapters.json
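
To evaluate one method on every benchmark, the commands can be generated from the config tree instead of typed by hand. The following is a minimal sketch, not part of this repository: the helper `evaluation_commands` is hypothetical, it only lists the config files that actually exist under the evaluation directory, and it prints the commands without running them.

```python
import sys
from pathlib import Path


def evaluation_commands(config_name, root="configs/full_setting/evaluation"):
    """Build one run.py command per benchmark folder that contains config_name."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to do when run outside the repository root.
        return []
    commands = []
    # Each benchmark (e.g. invariantLAMA) is a direct subfolder of root.
    for config_path in sorted(root_path.glob(f"*/{config_name}")):
        commands.append([sys.executable, "run.py", "--config", config_path.as_posix()])
    return commands


# Dry run: print the commands so they can be reviewed before launching.
for command in evaluation_commands("t5_kadapters.json"):
    print(" ".join(command))
```

Each printed command can then be run in a shell, or passed to `subprocess.run(command, check=True)` to run the evaluations one after another.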

Reference

@article{jang2021towards,
  title={Towards Continual Knowledge Learning of Language Models},
  author={Jang, Joel and Ye, Seonghyeon and Yang, Sohee and Shin, Joongbo and Han, Janghoon and Kim, Gyeonghun and Choi, Stanley Jungkyu and Seo, Minjoon},
  journal={arXiv preprint arXiv:2110.03215},
  year={2021}
}