A toolkit for document-level event extraction, containing some SOTA model implementations

Overview

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker

Source code for ACL-IJCNLP 2021 Long paper: Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker.

Our code is based on Doc2EDAG.

0. Introduction

Document-level event extraction aims to extract events within a document. Different from sentence-level event extraction, the arguments of an event record may scatter across sentences, which requires a comprehensive understanding of the cross-sentence context. Besides, a document may express several correlated events simultaneously, and recognizing the interdependency among them is fundamental to successful extraction. To tackle the aforementioned two challenges, We propose a novel heterogeneous Graph-based Interaction Model with a Tracker (GIT). A graph-based interaction network is introduced to capture the global context for the scattered event arguments across sentences with different heterogeneous edges. We also decode event records with a Tracker module, which tracks the extracted event records, so that the interdependency among events is taken into consideration. Our approach delivers better results over the state-of-the-art methods, especially in cross-sentence events and multiple events scenarios.

  • Architecture model overview

  • Overall Results

1. Package Description

GIT/
├─ dee/
    ├── __init__.py
    ├── base_task.py
    ├── dee_task.py
    ├── ner_task.py
    ├── dee_helper.py: data features constrcution and evaluation utils
    ├── dee_metric.py: data evaluation utils
    ├── config.py: process command arguments
    ├── dee_model.py: GIT model
    ├── ner_model.py
    ├── transformer.py: transformer module
    ├── utils.py: utils
├─ run_dee_task.py: the main entry
├─ train_multi.sh
├─ run_train.sh: script for training (including evaluation)
├─ run_eval.sh: script for evaluation
├─ Exps/: experiment outputs
├─ Data.zip
├─ Data: unzip Data.zip
├─ LICENSE
├─ README.md

2. Environments

  • python (3.6.9)
  • cuda (11.1)
  • Ubuntu-18.0.4 (5.4.0-73-generic)

3. Dependencies

  • numpy (1.19.5)
  • torch (1.8.1+cu111)
  • pytorch-pretrained-bert (0.4.0)
  • dgl-cu111 (0.6.1)
  • tensorboardX (2.2)

PS: The environments and dependencies listed here is different from what we use in our paper, so the results may be a bit different.

4. Preparation

  • Unzip Data.zip and you can get an Data folder, where the training/dev/test data locate.

5. Training

>> bash run_train.sh

6. Evaluation

>> bash run_eval.sh

(The evaluation is also conducted after the training)

7. License

This project is licensed under the MIT License - see the LICENSE file for details.

8. Citation

If you use this work or code, please kindly cite the following paper:

@inproceedings{xu-etal-2021-git,
    title = "Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker",
    author = "Runxin Xu  and
      Tianyu Liu  and
      Lei Li and
      Baobao Chang",
    booktitle = "The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
Owner
人生苦短 及时行乐
StarGAN - Official PyTorch Implementation

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

Yunjey Choi 5.1k Dec 30, 2022
Long text token classification using LongFormer

Long text token classification using LongFormer

abhishek thakur 161 Aug 07, 2022
构建一个多源(公众号、RSS)、干净、个性化的阅读环境

2C 构建一个多源(公众号、RSS)、干净、个性化的阅读环境 作为一名微信公众号的重度用户,公众号一直被我设为汲取知识的地方。随着使用程度的增加,相信大家或多或少会有一个比较头疼的问题——广告问题。 假设你关注的公众号有十来个,若一个公众号两周接一次广告,理论上你会面临二十多次广告,实际上会更多,运

howie.hu 678 Dec 28, 2022
Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets

Easy Language Model Pretraining leveraging Huggingface's Transformers and Datasets What is LASSL • How to Use What is LASSL LASSL은 LAnguage Semi-Super

LASSL: LAnguage Self-Supervised Learning 116 Dec 27, 2022
Higher quality textures for the Metal Gear Solid series.

Metal Gear Solid: HD Textures Higher quality textures for the Metal Gear Solid series. The goal is to maximize the quality of assets that the engine w

Samantha 6 Dec 06, 2022
Host your own GPT-3 Discord bot

GPT3 Discord Bot Host your own GPT-3 Discord bot i'd host and make the bot invitable myself, however GPT3 terms of service prohibit public use of GPT3

[something hillarious here] 8 Jan 07, 2023
The first online catalogue for Arabic NLP datasets.

Masader The first online catalogue for Arabic NLP datasets. This catalogue contains 200 datasets with more than 25 metadata annotations for each datas

ARBML 94 Dec 26, 2022
Binary LSTM model for text classification

Text Classification The purpose of this repository is to create a neural network model of NLP with deep learning for binary classification of texts re

Nikita Elenberger 1 Mar 11, 2022
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Moment-DETR QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei, Tamara L. Berg, Mohit Bansal For dataset de

Jie Lei 雷杰 133 Dec 22, 2022
Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP

Pretrain and Fine-tune a T5 model with Flax on GCP This tutorial details how pretrain and fine-tune a FlaxT5 model from HuggingFace using a TPU VM ava

Gabriele Sarti 41 Nov 18, 2022
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Poincaré Embeddings for Learning Hierarchical Representations PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

Facebook Research 1.6k Dec 29, 2022
Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization (ACL 2021)

Structured Super Lottery Tickets in BERT This repo contains our codes for the paper "Super Tickets in Pre-Trained Language Models: From Model Compress

Chen Liang 16 Dec 11, 2022
Code for the paper "Language Models are Unsupervised Multitask Learners"

Status: Archive (code is provided as-is, no updates expected) gpt-2 Code and models from the paper "Language Models are Unsupervised Multitask Learner

OpenAI 16.1k Jan 08, 2023
An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations

FantasyBert English | 中文 Introduction An easy-to-use framework for BERT models, with trainers, various NLP tasks and detailed annonations. You can imp

Fan 137 Oct 26, 2022
This github repo is for Neurips 2021 paper, NORESQA A Framework for Speech Quality Assessment using Non-Matching References.

NORESQA: Speech Quality Assessment using Non-Matching References This is a Pytorch implementation for using NORESQA. It contains minimal code to predi

Meta Research 36 Dec 08, 2022
Text to speech converter with GUI made in Python.

Text-to-speech-with-GUI Text to speech converter with GUI made in Python. To run this download the zip file and run the main file or clone this repo.

SidTheMiner 1 Nov 15, 2021
An evaluation toolkit for voice conversion models.

Voice-conversion-evaluation An evaluation toolkit for voice conversion models. Sample test pair Generate the metadata for evaluating models. The direc

30 Aug 29, 2022
A NLP program: tokenize method, PoS Tagging with deep learning

IRIS NLP SYSTEM A NLP program: tokenize method, PoS Tagging with deep learning Report Bug · Request Feature Table of Contents About The Project Built

Zakaria 7 Dec 13, 2022
Officile code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning"

CvarAdversarialRL Official code repository for "A Game-Theoretic Perspective on Risk-Sensitive Reinforcement Learning". Initial setup Create a virtual

Mathieu Godbout 1 Nov 19, 2021
LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating

LSTM based Sentiment Classification using Tensorflow - Amazon Reviews Rating (Dataset) The dataset is from Amazon Review Data (2018)

Immanuvel Prathap S 1 Jan 16, 2022