The official code for the paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

Overview

R2D2

This is the official code for the paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling". The current repo has been refactored from the original version used in the paper. If you run into any issues, please feel free to give feedback.

Data

Train

Multi-GPU

To train from scratch on a single machine with multiple GPUs, use the script below:

CORPUS_PATH=
OUTPUT_PATH=
NODE_NUM=

python -m torch.distributed.launch \
    --nproc_per_node $NODE_NUM R2D2_trainer.py --batch_size 16 \
    --min_len 2 \
    --max_batch_len 512 \
    --max_line -1 \
    --corpus_path $CORPUS_PATH \
    --vocab_path data/en_bert/bert-base-uncased-vocab.txt \
    --config_path data/en_bert/config.json \
    --epoch 60 \
    --output_dir $OUTPUT_PATH \
    --window_size 4 \
    --input_type txt

Single-GPU

CORPUS_PATH=
OUTPUT_PATH=

python -m trainer.R2D2_trainer \
    --batch_size 16 \
    --min_len 2 \
    --max_batch_len 512 \
    --max_line -1 \
    --corpus_path $CORPUS_PATH \
    --vocab_path data/en_bert/bert-base-uncased-vocab.txt \
    --config_path data/en_bert/config.json \
    --epoch 10 \
    --output_dir $OUTPUT_PATH \
    --input_type txt

Evaluation

To evaluate on the bidirectional language model task:

CORPUS_PATH=   # path to training corpus
VOCAB_DIR=     # directory containing vocab.txt
MODEL_PATH=    # path to model.bin
CONFIG_PATH=   # path to config.json

python lm_eval_buckets.py \
    --model_name R2D2 \
    --dataset test \
    --config_path $CONFIG_PATH \
    --model_path $MODEL_PATH \
    --vocab_dir $VOCAB_DIR \
    --corpus_path $CORPUS_PATH

To evaluate F1 on the induced constituency trees, please refer to https://github.com/harvardnlp/compound-pcfg/blob/master/compare_trees.py.
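
For intuition, here is a minimal, self-contained sketch of unlabeled bracketing F1 over constituent spans. It is illustrative only: the nested-list tree format and function names are assumptions, and the linked script should be used for reported numbers (it also handles details such as discarding trivial spans).

def tree_spans(tree, start=0):
    """Collect the (start, end) spans of a nested-list tree whose leaves are tokens."""
    if not isinstance(tree, list):  # leaf token
        return start + 1, set()
    end, spans = start, set()
    for child in tree:
        end, child_spans = tree_spans(child, end)
        spans |= child_spans
    spans.add((start, end))
    return end, spans

def bracket_f1(pred_tree, gold_tree):
    """Unlabeled bracketing F1 between a predicted and a gold tree."""
    _, pred = tree_spans(pred_tree)
    _, gold = tree_spans(gold_tree)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Two bracketings of the same 4-token sentence; F1 here is 2/3.
print(bracket_f1([['a', 'b'], ['c', 'd']], [[['a', 'b'], 'c'], 'd']))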

Evaluating compatibility with dependency trees: download the WSJ dataset and convert it to dependency trees with Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP/). As WSJ is not a free dataset, it is not included in this project. Please refer to the files in data/predict_trees for the exact format of the induced trees.

python eval_tree.py \
    --pred_tree_path path_to_tree_induced \
    --ground_truth_path path_to_dependency_trees \
    --vocab_dir $VOCAB_DIR

Ongoing work

  1. Re-implement the whole model to increase the GPU utilization ratio.
  2. Pre-train on a large corpus.

Contact

[email protected] and [email protected]


Comments
  • question about perplexity measures with R2D2 original model

    I have a few minor questions about the R2D2 PPPL measurements and their implementation.

    Q1: In the paper, PPPL is defined as exp(-(1/N) · sum_S L(S)).

    This makes sense. But in the evaluation code here,

                    log_p_sums, b_c, pppl = self.predictor(ids, self.bucket_size, self.get_bucket_id)
                    PPPL += (pppl - PPPL) / counter
                    print(PPPL, file=f_out)
    

    We output PPPL here without taking the exponential. I assume the numbers in the paper are actually 2^{PPPL}, assuming base 2? If I simply load a random BERT model, the PPPL output here is around 10.4, and 2^{10.4} ≈ 1351, which is about right.
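
    To make the arithmetic concrete, here is the conversion I have in mind (a sketch; the function name is mine, not from the repo):

    import math

    def pppl_from_avg_nll(avg_nll, base=math.e):
        """PPPL = base^{-(1/N) sum_S L(S)}: exponentiate the running average of the
        negative pseudo-log-likelihood, in whatever base the logs were taken."""
        return base ** avg_nll

    # If the loop prints ~10.4 and the logs are base-2: 2**10.4 ~= 1351.
    print(pppl_from_avg_nll(10.4, base=2))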

    Q2: For pretraining the BERT baseline, are you loading the same dataset as in the link below, or some default Hugging Face dataset? https://github.com/alipay/StructuredLM_RTDT/tree/r2d2/data/en_wiki

    Sorry to throw random questions at you, but this would be very helpful for building something on top of your work.

    Thanks.

    opened by frankaging 4
  • A potential issue found with the nn.MultiheadAttention module setup

    Hi authors!

    Thanks for sharing this repo; I enjoyed reading your paper, and I am working on a related project. While going through the code, I found one potential issue with the current setup. I will (1) explain the issue and (2) provide a simple test case that I ran on my end. Please help verify.

    Issue:

    • The nn.MultiheadAttention module inside the BinaryEncoder module is set up with batch_first=True; however, it seems we are passing in Q, K, V matrices whose first dimension is not the batch dimension.

    Code analysis: in r2d2.py, the encoder is called as follows:

            tasks_embedding = self.embedding(task_ids)  # (?, 2, dim)
            input_embedding = torch.cat([tasks_embedding, tensor_batch], dim=1)  # (?, 4, dim)
            outputs = self.tree_encoder(input_embedding.transpose(0, 1)).transpose(0, 1)  # (? * batch_size, 4, dim)
    

    We can see that the first dimension of input_embedding is definitely the batch size, since it is the concatenation with the embeddings from the nn.Embedding module. But before self.tree_encoder is called, .transpose(0, 1) moves the batch size to the second dimension instead. Specifically, the first dimension in this case is always 4.

    Testing done: I simply added some logs inside TreeEncoderLayer:

        def forward(self, src, src_mask=None, pos_ids=None):
            """
            :param src: concatenation of task embeddings and representation for left and right.
                        src shape: (task_embeddings + left + right, batch_size, dim)
            :param src_mask:
            :param pos_ids:
            :return:
            """
            if len(pos_ids.shape) == 1:
                sz = src.shape[0]  # sz: batch_size
                pos_ids = pos_ids.unsqueeze(0).expand(sz, -1)  # (3, batch_size)
            position_embedding = self.position_embedding(pos_ids)
            print("pre: ", src.shape)
            print("pos_emb: ", position_embedding.shape)
            output = self.self_attn(src + position_embedding, src + position_embedding, src, attn_mask=src_mask)
            src2 = output[0]
            attn_weights = output[1]
            print("attn_w: ", attn_weights.shape)
            src = src + self.dropout1(src2)
            src = self.norm1(src)
            src2 = self.linear2(self.dropout(self.activation(self.linear1(src))))
            src = src + self.dropout2(src2)
            src = self.norm2(src)
            print("post: ", src.shape)
            return src
    

    And this is what I get:

    pre:  torch.Size([4, 8, 768])
    pos_emb:  torch.Size([4, 8, 768])
    attn_w:  torch.Size([4, 8, 8])
    post:  torch.Size([4, 8, 768])
    

    Summary: it seems that in r2d2.py, self-attention is not over those 4 positions (2 special prefix tokens plus the left and right child embeddings) but over the full collection of candidates with their children.
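
    As a standalone sanity check (independent of the repo; shapes chosen to match the logs above):

    import torch
    from torch import nn

    # With batch_first=True, nn.MultiheadAttention treats dim 0 as the batch.
    attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
    x = torch.randn(8, 4, 768)  # (batch=8, seq=4, dim=768)

    _, w = attn(x, x, x)
    print(w.shape)  # torch.Size([8, 4, 4]): attention over the 4 tokens

    # After transpose(0, 1), dim 0 is 4, so attention runs over the batch axis.
    xt = x.transpose(0, 1)
    _, w_t = attn(xt, xt, xt)
    print(w_t.shape)  # torch.Size([4, 8, 8]): matches the attn_w log above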

    This issue does not seem to be present in r2d2_cuda.py, which calls:

                outputs = self.tree_encoder(
                    input_embedding)  # (? * batch_size, 4, dim)
    

    This is great. I have not checked the rest of the code for r2d2_cuda.py, though. With this, I am wondering whether the numbers in either of your papers need to be updated in light of this potential issue. Either way, I am not blocked by it, and I was inspired quite a lot by your codebase. Thanks!

    opened by frankaging 3
  • A question about the backbone

    Hi authors, thank you very much for your contribution. I find your work very meaningful, and it feels like a new direction. I have two questions:

    1. The encoder uses a Transformer. Much of the power of attention-based models comes from encoding useful contextual information through the attention mechanism, but here each input consists of only four items: [SUM], [CLS], [token1], [token2]. With such a short context, I personally feel a Transformer may not be the best fit. Have you tried other encoders, such as a GRU or a textCNN?
    2. Is there a way to encode in parallel? Although the Transformer has high time complexity, GPU-parallel encoding largely solves the problem of long training times. Judging from Figure E in the paper, with CKY-style tree encoding a single token has to be encoded several times. Could this actually make training take longer in practice, e.g., a 3-layer R2D2 taking longer to train on the same data than a 12-layer Transformer? Thank you.
    opened by wulaoshi 1
Releases

  • fast-R2D2

Owner

Alipay (Ant Group Open Source)