Norm-based Analysis of Transformer

Implementations for 2 papers introducing to analyze Transformers using vector norms:

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)
Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

This paper proposed to analyze attention, a core component of Transformer, using vector norms rather than attention weights.
Transformer analyses have been focused on mixing in attention and have typically observed attention weights.
However, in addition to attention weights, there are more factors to determine attention's outputs: the input vector itself and vector transformations.
Then, this paper proposed to analyze attention using vector norms considering them.
→ Check this paper's code: Code for emnlp2020.

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

This paper proposed to analyze attention block (i.e., attention, residual connection, and layer normalization) using vector norms.
Transformer analyses have been focused on mixing in attention.
However, there are components other than attention in Transformer, and they can play a role other than mixing.
Then, this paper proposed to expand the scope of Transformer analysis from attention into attention block.
→ Check this paper's code: Code for emnlp2021.

Citation

If you use our code for academic work, please cite:

@inproceedings{kobayashi-etal-2020-attention,  
   title = {Attention is Not Only a Weight: Analyzing Transformers with Vector Norms},  
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},  
   booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},  
   year = "2020",  
   url = "https://www.aclweb.org/anthology/2020.emnlp-main.574",  
   pages = "7057--7075",  
}
@inproceedings{kobayashi-etal-2021-incorporating,
   title = {Incorporating Residual and Normalization Layers into Analysis of Masked Language Models},
   author = {Kobayashi, Goro and Kuribayashi, Tatsuki and Yokoi, Sho and Inui, Kentaro},
   booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Proceeding (EMNLP)},
   year = "2021",
   url = "https://arxiv.org/abs/2109.07152",
   pages = "to appear",
}

Norm-based Analysis of Transformer

Related tags

Overview

Norm-based Analysis of Transformer

Kobayashi+'20 Attention is Not Only a Weight: Analyzing Transformers with Vector Norms (EMNLP 2020)

Kobayashi+'21 Incorporating Residual and Normalization Layers into Analysis of Masked Language Models (EMNLP 2021)

Citation

Owner

Goro Kobayashi

Generate images from texts. In Russian

Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

MEDS: Enhancing Memory Error Detection for Large-Scale Applications

Siamese TabNet

Python periodic table module

Caffe models in TensorFlow

A computational optimization project towards the goal of gerrymandering the results of a hypothetical election in the UK.

Embodied Intelligence via Learning and Evolution

Python implementation of NARS (Non-Axiomatic-Reasoning-System)

A PyTorch library for Vision Transformers

Compute execution plan: A DAG representation of work that you want to get done. Individual nodes of the DAG could be simple python or shell tasks or complex deeply nested parallel branches or embedded DAGs themselves.

Text Generation by Learning from Demonstrations

Image processing in Python

Manim is an engine for precise programmatic animations, designed for creating explanatory math videos

Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Tensorflow implementation of soft-attention mechanism for video caption generation.

A Player for Kanye West's Stem Player. Sort of an emulator.

Evaluating deep transfer learning for whole-brain cognitive decoding

This repository contains the code for the paper in EMNLP 2021: "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

Multi-objective constrained optimization for energy applications via tree ensembles