A toolkit for document-level event extraction, containing some SOTA model implementations

Last update: Dec 15, 2022

Overview

Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker

Source code for ACL-IJCNLP 2021 Long paper: Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker.

Our code is based on Doc2EDAG.

0. Introduction

Document-level event extraction aims to extract events from a whole document. Unlike sentence-level event extraction, the arguments of an event record may be scattered across sentences, so extraction requires a comprehensive understanding of the cross-sentence context. In addition, a document may express several correlated events at once, and recognizing the interdependency among them is fundamental to successful extraction. To tackle these two challenges, we propose a novel heterogeneous Graph-based Interaction Model with a Tracker (GIT). A graph-based interaction network with heterogeneous edges captures the global context for event arguments scattered across sentences. We also decode event records with a Tracker module, which keeps track of the records extracted so far, so that the interdependency among events is taken into account. Our approach delivers better results than the state-of-the-art methods, especially in cross-sentence and multi-event scenarios.
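To make the idea of a graph with heterogeneous edges more concrete, here is a minimal sketch in plain Python. It is not the code in dee_model.py: the function name `build_hetero_edges`, the node naming, and the four edge-type labels are assumptions chosen for illustration, and the real model builds its graph with DGL on top of learned representations.

```python
from collections import defaultdict
from itertools import combinations


def build_hetero_edges(num_sentences, mentions):
    """Build typed edges over sentence nodes and entity-mention nodes.

    mentions: list of (sentence_index, entity_id) tuples, one per mention.
    Returns a dict mapping an edge-type label to a list of (src, dst) pairs,
    where nodes are named ("sent", i) or ("mention", j).
    All labels below are illustrative assumptions, not the repository's names.
    """
    edges = defaultdict(list)

    # Sentence-sentence: link every pair of sentences for document-wide context.
    for i, j in combinations(range(num_sentences), 2):
        edges["sent-sent"].append((("sent", i), ("sent", j)))

    # Sentence-mention: tie each mention to the sentence that contains it.
    for m, (sent_idx, _entity) in enumerate(mentions):
        edges["sent-mention"].append((("sent", sent_idx), ("mention", m)))

    # Mention-mention: two flavors, depending on how the mentions are related.
    for a, b in combinations(range(len(mentions)), 2):
        same_sentence = mentions[a][0] == mentions[b][0]
        same_entity = mentions[a][1] == mentions[b][1]
        pair = (("mention", a), ("mention", b))
        if same_sentence:
            # Mentions that co-occur in one sentence.
            edges["mention-mention-intra"].append(pair)
        elif same_entity:
            # Mentions of the same entity in different sentences.
            edges["mention-mention-inter"].append(pair)

    return dict(edges)


if __name__ == "__main__":
    # Toy document: 3 sentences; entity "A" appears in sentences 0 and 2.
    toy_mentions = [(0, "A"), (0, "B"), (2, "A")]
    for edge_type, pairs in build_hetero_edges(3, toy_mentions).items():
        print(edge_type, len(pairs))
```

In this toy document the two mentions of "A" sit in different sentences, so the only path between them that does not pass through a sentence node is the cross-sentence mention edge. This is the kind of link that lets scattered arguments share context.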

[Figure: Architecture]
[Figure: Overall Results]

1. Package Description

GIT/
├─ dee/
    ├── __init__.py
    ├── base_task.py
    ├── dee_task.py
    ├── ner_task.py
    ├── dee_helper.py: data features construction and evaluation utils
    ├── dee_metric.py: data evaluation utils
    ├── config.py: process command arguments
    ├── dee_model.py: GIT model
    ├── ner_model.py
    ├── transformer.py: transformer module
    ├── utils.py: utils
├─ run_dee_task.py: the main entry
├─ train_multi.sh
├─ run_train.sh: script for training (including evaluation)
├─ run_eval.sh: script for evaluation
├─ Exps/: experiment outputs
├─ Data.zip
├─ Data/: unzipped from Data.zip
├─ LICENSE
├─ README.md

2. Environments

python (3.6.9)
cuda (11.1)
Ubuntu 18.04 (5.4.0-73-generic)

3. Dependencies

numpy (1.19.5)
torch (1.8.1+cu111)
pytorch-pretrained-bert (0.4.0)
dgl-cu111 (0.6.1)
tensorboardX (2.2)

PS: The environments and dependencies listed here differ from those used in our paper, so the results may be slightly different.
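Since results can shift with package versions, it may help to compare your environment against the versions listed above before training. The following is a small sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are made up for this example, and the `PINNED` table simply copies the dependency list from this README.

```python
try:
    # Python 3.8+ ships importlib.metadata in the standard library.
    from importlib.metadata import PackageNotFoundError, version

    def installed_version(name):
        """Return the installed version of `name`, or None if it is absent."""
        try:
            return version(name)
        except PackageNotFoundError:
            return None

except ImportError:
    # Fallback for Python 3.6/3.7, where setuptools provides pkg_resources.
    import pkg_resources

    def installed_version(name):
        """Return the installed version of `name`, or None if it is absent."""
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None


# Versions copied from the dependency list in this README.
PINNED = {
    "numpy": "1.19.5",
    "torch": "1.8.1+cu111",
    "pytorch-pretrained-bert": "0.4.0",
    "dgl-cu111": "0.6.1",
    "tensorboardX": "2.2",
}


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print("{}: listed {}, installed {}".format(name, wanted, found))
```

A mismatch does not necessarily mean the code will fail. It only flags a difference from the listed setup that could explain small deviations in the scores.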

4. Preparation

Unzip Data.zip to get a Data folder, which contains the training/dev/test data.
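If you prefer to do this step from Python rather than with a command-line unzip tool, a sketch using only the standard library could look like the following. The function name `unpack_data` is an assumption for this example, and nothing is assumed about the file names inside the archive.

```python
import zipfile
from pathlib import Path


def unpack_data(archive="Data.zip", dest="."):
    """Extract `archive` into `dest` and return the extracted paths, sorted."""
    dest = Path(dest)
    with zipfile.ZipFile(str(archive)) as zf:
        zf.extractall(str(dest))
        names = zf.namelist()
    return sorted(str(dest / name) for name in names)


if __name__ == "__main__":
    # Run from the repository root, next to Data.zip.
    for path in unpack_data():
        print(path)
```

Printing the extracted paths is a quick way to confirm that the Data folder ended up where the training and evaluation scripts expect it.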

5. Training

>> bash run_train.sh

6. Evaluation

>> bash run_eval.sh

(Evaluation is also conducted after training.)

7. License

This project is licensed under the MIT License - see the LICENSE file for details.

8. Citation

If you use this work or code, please kindly cite the following paper:

@inproceedings{xu-etal-2021-git,
    title = "Document-level Event Extraction via Heterogeneous Graph-based Interaction Model with a Tracker",
    author = "Runxin Xu  and
      Tianyu Liu  and
      Lei Li and
      Baobao Chang",
    booktitle = "The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)",
    year = "2021",
    publisher = "Association for Computational Linguistics",
}
