The Stem Cell Hypothesis

Code for our paper "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders", published at EMNLP 2021.

Installation

Run the following setup script. Feel free to install a GPU-enabled PyTorch (torch>=1.6.0).

python3 -m venv env
source env/bin/activate
ln -sf "$(which python3)" env/bin/python
pip install -e .
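If you want CUDA support, install a GPU build of PyTorch alongside the step above. A sketch for CUDA 11.1 (example only; pick the wheel matching your CUDA version from pytorch.org):

# Example: CUDA 11.1 wheel; adjust the torch and CUDA versions to your setup
pip install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html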

Data Pre-processing

Download OntoNotes 5 (LDC2013T19.tgz) and put it into the following directory:

mkdir -p ~/.elit/thirdparty/catalog.ldc.upenn.edu/LDC2013T19/
cp LDC2013T19.tgz ~/.elit/thirdparty/catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz

That's all. ELIT will automatically do the rest for you the first time you run the training script.
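As a quick sanity check before training, you can verify the archive sits at the exact path ELIT resolves on first run:

# Should list the OntoNotes 5 archive copied above
ls -lh ~/.elit/thirdparty/catalog.ldc.upenn.edu/LDC2013T19/LDC2013T19.tgz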

Experiments

Here we demonstrate how to experiment with BERT-base, but feel free to replace the transformer and task names in the script paths to run other experiments, as shown in the example below. Our scripts are grouped by transformer and task, so each path indicates exactly which experiment it runs.
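For example, assuming the directory layout for other encoders mirrors en_bert_base (the en_bert_large path below is an assumption; verify it against the repository):

# BERT-base STL-POS
python3 stem_cell_hypothesis/en_bert_base/single/pos.py
# Hypothetical BERT-large counterpart, following the same layout
python3 stem_cell_hypothesis/en_bert_large/single/pos.py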

Single Task Learning

The following script will train STL-POS with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/single/pos.py
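The paper studies five tasks: POS, NER, DEP, CON, and SRL. Assuming the remaining single-task scripts follow the same naming pattern as pos.py (an assumption; check the single/ directory), you could launch all five sequentially with:

# Script names other than pos.py are assumed to follow the same pattern
for task in pos ner dep con srl; do
    python3 stem_cell_hypothesis/en_bert_base/single/$task.py
done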

Multi-Task Learning

The following script will train MTL-5 with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/joint/all.py

Pruning Experiments

The following script will train STL-POS-DP with BERT-base and evaluate its performance on the test set:

python3 stem_cell_hypothesis/en_bert_base/gate/pos.py

You can monitor the pruning process in real time via TensorBoard:

tensorboard --logdir=data/model/mtl/ontonotes_bert_base_en/gated/pos/0/runs --samples_per_plugin images=1000

which will show how the heads gradually get claimed at http://localhost:6006/#images:

[Figure: gates]

Once 3 runs are finished, you can visualize the overlap of head utilization across runs via:

python3 stem_cell_hypothesis/en_bert_base/gate/vis_gate_overlap_rgb.py

which will generate Figure 1a.

Similarly, Figure 1g is generated with stem_cell_hypothesis/en_bert_base/gate/vis_gate_overlap_tasks_gray.py.

[Figure: 15-models-average]

Probing Experiments

Once a model is trained, you can probe its representations via the scripts in stem_cell_hypothesis/en_bert_base/head. For example, to probe STL-POS performance, run:

python3 stem_cell_hypothesis/en_bert_base/head/pos.py
python3 stem_cell_hypothesis/en_bert_base/head/vis/pos.py

which generates Figure 4:

[Figure 4: pos-probe]

You may need to manually adjust the model paths in these scripts and update them with your new results.
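The other tasks should be probeable the same way, assuming their head scripts mirror pos.py (the script names below are assumptions; check the head/ directory):

# Assumed to follow the pos.py naming pattern
python3 stem_cell_hypothesis/en_bert_base/head/ner.py
python3 stem_cell_hypothesis/en_bert_base/head/vis/ner.py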

To probe the performance of unsupervised (pretrained, not fine-tuned) BERT on a single task, e.g., SRL, run:

python3 stem_cell_hypothesis/en_bert_base/head/srl_dot.py

which generates Figure 3:

[Figure 3: srl-probe-static]

Although not included in the paper due to the page limit, experiments on Chinese, BERT-large, ALBERT, etc. are also available under stem_cell_hypothesis. Feel free to run them.

Citation

If you use this repository in your research, please cite our EMNLP 2021 paper:

@inproceedings{he-choi-2021-stem,
    title = "The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders",
    author = "He, Han and Choi, Jinho D.",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.451",
    pages = "5555--5577",
    abstract = "Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.",
}