REBEL: Relation Extraction By End-to-end Language generation

Related tags

Deep Learningrebel
Overview

PWC PWC PWC PWC PWC

REBEL: Relation Extraction By End-to-end Language generation

This is the repository for the Findings of EMNLP 2021 paper REBEL: Relation Extraction By End-to-end Language generation. We present a new linearization aproach and a reframing of Relation Extraction as a seq2seq task. The paper can be found here. If you use the code, please reference this work in your paper:

@inproceedings{huguet-cabot-navigli-2021-rebel,
title = "REBEL: Relation Extraction By End-to-end Language generation",
author = "Huguet Cabot, Pere-Llu{\'\i}s  and
  Navigli, Roberto",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Online and in the Barceló Bávaro Convention Centre, Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://github.com/Babelscape/rebel/blob/main/docs/EMNLP_2021_REBEL__Camera_Ready_.pdf",
}
Repo structure
| conf  # contains Hydra config files
  | data
  | model
  | train
  root.yaml  # hydra root config file
| data  # data
| datasets  # datasets scripts
| model # model files should be stored here
| src
  | pl_data_modules.py  # LightinigDataModule
  | pl_modules.py  # LightningModule
  | train.py  # main script for training the network
  | test.py  # main script for training the network
| README.md
| requirements.txt
| demo.py # Streamlit demo to try out the model
| setup.sh # environment setup script 

Initialize environment

In order to set up the python interpreter we utilize conda , the script setup.sh creates a conda environment and install pytorch and the dependencies in "requirements.txt".

REBEL Model and Dataset

Model and Dataset files can be downloaded here:

https://osf.io/4x3r9/?view_only=87e7af84c0564bd1b3eadff23e4b7e54

Or you can directly use the model from Huggingface repo:

https://huggingface.co/Babelscape/rebel-large

", "").replace(" ", "").replace("", "").split(): if token == " ": current = 't' if relation != '': triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()}) relation = '' subject = '' elif token == " ": current = 's' if relation != '': triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()}) object_ = '' elif token == " ": current = 'o' relation = '' else: if current == 't': subject += ' ' + token elif current == 's': object_ += ' ' + token elif current == 'o': relation += ' ' + token if subject != '' and relation != '' and object_ != '': triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()}) return triplets extracted_triplets = extract_triplets(extracted_text[0]) print(extracted_triplets) ">
from transformers import pipeline

triplet_extractor = pipeline('text2text-generation', model='Babelscape/rebel-large', tokenizer='Babelscape/rebel-large')

# We need to use the tokenizer manually since we need special tokens.
extracted_text = triplet_extractor.tokenizer.batch_decode(triplet_extractor("Punta Cana is a resort town in the municipality of Higuey, in La Altagracia Province, the eastern most province of the Dominican Republic", return_tensors=True, return_text=False)[0]["generated_token_ids"]["output_ids"])

print(extracted_text[0])

# Function to parse the generated text and extract the triplets
def extract_triplets(text):
    triplets = []
    relation, subject, relation, object_ = '', '', '', ''
    text = text.strip()
    current = 'x'
    for token in text.replace("", "").replace("
        
         "
        , "").replace("", "").split():
        if token == "
       
        "
       :
            current = 't'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
                relation = ''
            subject = ''
        elif token == "
       
        "
       :
            current = 's'
            if relation != '':
                triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
            object_ = ''
        elif token == "
       
        "
       :
            current = 'o'
            relation = ''
        else:
            if current == 't':
                subject += ' ' + token
            elif current == 's':
                object_ += ' ' + token
            elif current == 'o':
                relation += ' ' + token
    if subject != '' and relation != '' and object_ != '':
        triplets.append({'head': subject.strip(), 'type': relation.strip(),'tail': object_.strip()})
    return triplets
extracted_triplets = extract_triplets(extracted_text[0])
print(extracted_triplets)

CROCODILE: automatiC RelatiOn extraCtiOn Dataset wIth nLi filtEring.

REBEL dataset can be recreated using our RE dataset creator CROCODILE

Training and testing

There are conf files to train and test each model. Within the src folder to train for CONLL04 for instance:

train.py model=rebel_model data=conll04_data train=conll04_train

Once the model is trained, the checkpoint can be evaluated by running:

test.py model=rebel_model data=conll04_data train=conll04_train do_predict=True checkpoint_path="path_to_checkpoint"

src/model_saving.py can be used to convert a pytorch lightning checkpoint into the hf transformers format for model and tokenizer.

DEMO

We suggest running the demo to test REBEL. Once the model files are unzipped in the model folder run:

streamlit run demo.py

And a demo will be available in the browser. It accepts free input as well as data from the sample file in data/rebel/

Datasets

TACRED is not freely avialable but instructions on how to create Re-TACRED from it can be found here.

For CONLL04 and ADE one can use the script from the SpERT github.

For NYT the dataset can be downloaded from Copy_RE github.

Finally the DocRED for RE can be downloaded at the JEREX github

Owner
Babelscape
Babelscape is a deep tech company founded in 2016 focused on multilingual Natural Language Processing.
Babelscape
PyTorch Implementation of CvT: Introducing Convolutions to Vision Transformers

CvT: Introducing Convolutions to Vision Transformers Pytorch implementation of CvT: Introducing Convolutions to Vision Transformers Usage: img = torch

Rishikesh (ऋषिकेश) 193 Jan 03, 2023
DM-ACME compatible implementation of the Arm26 environment from Mujoco

ACME-compatible implementation of Arm26 from Mujoco This repository contains a customized implementation of Mujoco's Arm26 model, that can be used wit

1 Dec 24, 2021
This repository contains a PyTorch implementation of the paper Learning to Assimilate in Chaotic Dynamical Systems.

Amortized Assimilation This repository contains a PyTorch implementation of the paper Learning to Assimilate in Chaotic Dynamical Systems. Abstract: T

4 Aug 16, 2022
Streaming over lightweight data transformations

Description Data augmentation libarary for Deep Learning, which supports images, segmentation masks, labels and keypoints. Furthermore, SOLT is fast a

Research Unit of Medical Imaging, Physics and Technology 256 Jan 08, 2023
JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces

JAXMAPP: JAX-based Library for Multi-Agent Path Planning in Continuous Spaces JAXMAPP is a JAX-based library for multi-agent path planning (MAPP) in c

OMRON SINIC X 24 Dec 28, 2022
Perform Linear Classification with Multi-way Data

MultiwayClassification This is an R package to perform linear classification for data with multi-way structure. The distance-weighted discrimination (

Eric F. Lock 2 Dec 15, 2020
Yet Another Reinforcement Learning Tutorial

This repo contains self-contained RL implementations

Sungjoon 65 Dec 10, 2022
DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing

DyStyle: Dynamic Neural Network for Multi-Attribute-Conditioned Style Editing Figure: Joint multi-attribute edits using DyStyle model. Great diversity

74 Dec 03, 2022
Streamlit App For Product Analysis - Streamlit App For Product Analysis

Streamlit_App_For_Product_Analysis Здравствуйте! Перед вами дашборд, позволяющий

Grigory Sirotkin 1 Jan 10, 2022
[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

Seokeon Choi 35 Oct 26, 2022
Simple renderer for use with MuJoCo (>=2.1.2) Python Bindings.

Viewer for MuJoCo in Python Interactive renderer to use with the official Python bindings for MuJoCo. Starting with version 2.1.2, MuJoCo comes with n

Rohan P. Singh 62 Dec 30, 2022
Static-test - A playground to play with ideas related to testing the comparability of the code

Static test playground ⚠️ The code is just an experiment. Compiles and runs on U

Igor Bogoslavskyi 4 Feb 18, 2022
FOSS Digital Asset Distribution Platform built on Frappe.

Digistore FOSS Digital Assets Marketplace. Distribute digital assets, like a pro. Video Demo Here Features Create, attach and list digital assets (PDF

Mohammad Hussain Nagaria 30 Dec 08, 2022
Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21)

AdvRush Official Code for AdvRush: Searching for Adversarially Robust Neural Architectures (ICCV '21) Environmental Set-up Python == 3.6.12, PyTorch =

11 Dec 10, 2022
Code for the paper "A Study of Face Obfuscation in ImageNet"

A Study of Face Obfuscation in ImageNet Code for the paper: A Study of Face Obfuscation in ImageNet Kaiyu Yang, Jacqueline Yau, Li Fei-Fei, Jia Deng,

35 Oct 04, 2022
Simple implementation of OpenAI CLIP model in PyTorch.

It was in January of 2021 that OpenAI announced two new models: DALL-E and CLIP, both multi-modality models connecting texts and images in some way. In this article we are going to implement CLIP mod

Moein Shariatnia 226 Jan 05, 2023
git《Commonsense Knowledge Base Completion with Structural and Semantic Context》(AAAI 2020) GitHub: [fig1]

Commonsense Knowledge Base Completion with Structural and Semantic Context Code for the paper Commonsense Knowledge Base Completion with Structural an

AI2 96 Nov 05, 2022
Official implementation of CVPR2020 paper "Deep Generative Model for Robust Imbalance Classification"

Deep Generative Model for Robust Imbalance Classification Deep Generative Model for Robust Imbalance Classification Xinyue Wang, Yilin Lyu, Liping Jin

9 Nov 01, 2022
3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

Akash James 39 Nov 21, 2022