Resources for our AAAI 2022 paper: "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

Last update: Dec 27, 2022

Overview

LOREN

Resources for our AAAI 2022 paper (pre-print): "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

DEMO System

Check out our demo system! Note that the results will be slightly different from the paper, since we use an up-to-date Wikipedia as the evidence source whereas FEVER uses Wikipedia dated 2017.

Dependencies

CUDA > 11
Prepare requirements: pip3 install -r requirements.txt.
- Also works for allennlp==2.3.0, transformers==4.5.1, torch==1.8.1.
Set environment variable $PJ_HOME: export PJ_HOME=/YOUR_PATH/LOREN/.

Download Pre-processed Data and Checkpoints

Pre-processed data is available at Google Drive. Unzip it and put the contents under LOREN/data/.
- Data for training a Seq2Seq MRC is at data/mrc_seq2seq_v5/.
- Data for training veracity prediction is at data/fact_checking/v5/*.json.
  - Note: dev.json uses ground-truth evidence for validation, whereas eval.json uses predicted evidence. This is consistent with the settings in KGAT.
- Evidence retrieval models are not required for training LOREN, since we directly adopt the retrieved evidence from KGAT, which is at data/fever/baked_data/ (used only during pre-processing).
- Original data is at data/fever/ (used only during pre-processing).
Pre-trained checkpoints are available at Huggingface Models. Unzip them and put them under LOREN/models/.
- Checkpoints for veracity prediction are at models/fact_checking/.
- Checkpoint for generative MRC is at models/mrc_seq2seq/.
- Checkpoints for KGAT evidence retrieval models are at models/evidence_retrieval/ (not used in training, displayed only for the sake of completeness).
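
After unzipping, a quick sanity check of the layout can save a failed run later. The following is a minimal sketch, not part of the repository: the helper name missing_dirs is hypothetical, and the directory list is simply copied from the paths mentioned above. It assumes PJ_HOME points at the LOREN root, as set in Dependencies.

```python
import os
from pathlib import Path

# Sub-directories named in this README, relative to the LOREN root.
EXPECTED_DIRS = [
    "data/mrc_seq2seq_v5",
    "data/fact_checking/v5",
    "data/fever/baked_data",
    "models/fact_checking",
    "models/mrc_seq2seq",
    "models/evidence_retrieval",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Fall back to the current directory if PJ_HOME is not set.
    root = os.environ.get("PJ_HOME", ".")
    for d in missing_dirs(root):
        print(f"missing: {Path(root) / d}")
```

An empty result means every directory listed above was found. It does not verify the files inside them.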

Training LOREN from Scratch

For quick training and inference with pre-processed data & pre-trained models, please go to Veracity Prediction.

First, go to LOREN/src/.

1 Building Local Premises from Scratch

1) Extract claim phrases and generate questions

This step uses three external models: two models from AllenNLP in parsing_client/sentence_parser.py and a T5-based question generation model in qg_client/question_generator.py. Don't worry, they'll be downloaded automatically.

Run python3 pproc_client/pproc_questions.py --roles eval train val test
This generates cached json files:
- AG_PREFIX/answer.{role}.cache: extracted phrases are stored in the field answers.
- QG_PREFIX/question.{role}.cache: generated questions are stored in the fields cloze_qs, generate_qs and questions (two types of questions concatenated).

2) Train Seq2Seq MRC

Prepare self-supervised MRC data (only for SUPPORTED claims)

Run python3 pproc_client/pproc_mrc.py -o LOREN/data/mrc_seq2seq_v5.
This generates files for Seq2Seq training in a HuggingFace style:
- data/mrc_seq2seq_v5/{role}.source: concatenated question and evidence text.
- data/mrc_seq2seq_v5/{role}.target: answer (claim phrase).
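
To inspect the generated pairs, the two files can be read side by side. This is a minimal sketch under one assumption that this README does not state explicitly: that the files are line-aligned, i.e. line i of {role}.source belongs with line i of {role}.target, as in HuggingFace's legacy seq2seq examples. The helper name read_pairs is hypothetical.

```python
from pathlib import Path


def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def read_pairs(data_dir, role):
    """Return (source, target) pairs for one role, e.g. 'train' or 'val'.

    Assumes line i of {role}.source pairs with line i of {role}.target.
    """
    data_dir = Path(data_dir)
    sources = read_lines(data_dir / f"{role}.source")
    targets = read_lines(data_dir / f"{role}.target")
    if len(sources) != len(targets):
        raise ValueError(
            f"{role}: {len(sources)} source lines vs {len(targets)} target lines"
        )
    return list(zip(sources, targets))
```

For example, read_pairs("data/mrc_seq2seq_v5", "train")[:3] would show the first three question-plus-evidence inputs next to their answer phrases.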

Training Seq2Seq

Go to mrc_client/seq2seq/, which is modified based on HuggingFace's examples.
Follow script/train.sh.
The best checkpoint will be saved in $output_dir (e.g., models/mrc_seq2seq/).
- The best checkpoint is selected by ROUGE score on the dev set.

3) Run MRC for all questions and assemble local premises

Run python3 pproc_client/pproc_evidential.py --roles val train eval test -m PATH_TO_MRC_MODEL/.
This generates files:
- {role}.json: files for veracity prediction. Assembled local premises are stored in the field evidential_assembled.

4) Building NLI prior

Before training veracity prediction, we'll need an NLI prior from pre-trained NLI models, such as DeBERTa.

Run python3 pproc_client/pproc_nli_labels.py -i PATH_TO/{role}.json -m microsoft/deberta-large-mnli.
Mind the order! The predicted classes [Contradict, Neutral, Entailment] correspond to [REF, NEI, SUP], respectively.
This adds a new field nli_labels to {role}.json.
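
The class order above is easy to get wrong, so here is a minimal sketch of the mapping. It is illustrative only: the function name nli_to_veracity is hypothetical, and it assumes a 3-way score vector in the order [Contradict, Neutral, Entailment]. How pproc_nli_labels.py actually stores nli_labels may differ.

```python
# Order of the NLI classes as stated above, and the veracity label for each.
NLI_CLASSES = ["Contradict", "Neutral", "Entailment"]
VERACITY_LABELS = ["REF", "NEI", "SUP"]


def nli_to_veracity(scores):
    """Map a 3-way NLI score vector to the veracity label with the top score."""
    if len(scores) != len(VERACITY_LABELS):
        raise ValueError(f"expected {len(VERACITY_LABELS)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return VERACITY_LABELS[best]
```

With this order, a vector that puts most of its mass on the last position (Entailment) maps to SUP, and one that peaks on the first position (Contradict) maps to REF.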

2 Veracity Prediction

This part is rather easy (less pipelined :P). A good place to start if you want to skip the above pre-processing.

1) Training

Go to folder check_client/.
See what scripts/train_*.sh does.

2) Testing

Stay in folder check_client/
Run python3 fact_checker.py --params PARAMS_IN_THE_CODE
This generates files:
- results/*.predictions.jsonl
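
The prediction files can be loaded for inspection with a few lines of Python. This sketch assumes only that each file is in JSON Lines format (one JSON object per line), as the .jsonl extension suggests. It makes no assumption about the fields inside each record, and the helper name load_predictions is hypothetical.

```python
import json
from pathlib import Path


def load_predictions(results_dir="results"):
    """Load every *.predictions.jsonl file under `results_dir`.

    Returns a dict mapping file name to a list of parsed records.
    """
    predictions = {}
    for path in sorted(Path(results_dir).glob("*.predictions.jsonl")):
        with open(path, encoding="utf-8") as f:
            # Skip blank lines so a trailing newline does not break parsing.
            predictions[path.name] = [json.loads(line) for line in f if line.strip()]
    return predictions
```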

3) Evaluation

Go to folder eval_client/
For Label Accuracy and FEVER score: fever_scorer.py
For CulpA (turn on --verbose in testing): culpa.py
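
For readers unfamiliar with the two FEVER metrics, the sketch below shows how they differ. It is a simplified illustration, not the code in fever_scorer.py: the record keys (pred_label, pred_evidence, gold_label, gold_evidence_sets) are hypothetical, evidence items are assumed to be (page, sentence index) pairs, and only the first five predicted evidence items are considered, following the FEVER shared task convention. Label Accuracy checks the label alone. FEVER score additionally requires, for non-NEI claims, that the predicted evidence fully covers at least one gold evidence set.

```python
NEI = "NOT ENOUGH INFO"


def is_strictly_correct(inst, max_evidence=5):
    """True if the label is right and, for non-NEI claims, evidence is sufficient."""
    if inst["pred_label"] != inst["gold_label"]:
        return False
    if inst["gold_label"] == NEI:
        # NEI claims have no gold evidence to match.
        return True
    predicted = {tuple(e) for e in inst["pred_evidence"][:max_evidence]}
    # One fully covered gold evidence set is enough.
    return any(
        {tuple(e) for e in group} <= predicted
        for group in inst["gold_evidence_sets"]
    )


def fever_metrics(instances, max_evidence=5):
    """Return (label_accuracy, fever_score) over a list of instances."""
    if not instances:
        raise ValueError("no instances to score")
    n = len(instances)
    correct = sum(i["pred_label"] == i["gold_label"] for i in instances)
    strict = sum(is_strictly_correct(i, max_evidence) for i in instances)
    return correct / n, strict / n
```

A prediction with the right label but incomplete evidence counts toward Label Accuracy only, which is why FEVER score can never exceed Label Accuracy.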

Citation

If you find our paper or resources useful to your research, please kindly cite our paper (pre-print, official published paper coming soon).

@misc{chen2021loren,
      title={LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification}, 
      author={Jiangjie Chen and Qiaoben Bao and Changzhi Sun and Xinbo Zhang and Jiaze Chen and Hao Zhou and Yanghua Xiao and Lei Li},
      year={2021},
      eprint={2012.13577},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Owner

Jiangjie Chen
