Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Overview

MTM

This is the official repository of the paper:

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Qiang Sheng, Juan Cao, Xueyao Zhang, Xirong Li, and Lei Zhong.

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

PDF / Poster / Code / Chinese Dataset / Chinese Blog 1 / Chinese Blog 2

Datasets

There are two experimental datasets, including the Twitter Dataset, and the firstly proposed Weibo Dataset. Note that you can download the Weibo Dataset only after an "Application to Use the Chinese Dataset for Detecting Previously Fact-Checked Claim" has been submitted.

Code

Key Requirements

python==3.6.10
torch==1.6.0
torchvision==0.7.0
transformers==3.2.0

Usage for Weibo Dataset

After you download the dataset (the way to access is described here), move the FN_11934_filtered.json and DN_27505_filtered.json into the path MTM/dataset/Weibo/raw:

mkdir MTM/dataset/Weibo/raw
mv FN_11934_filtered.json MTM/dataset/Weibo/raw
mv DN_27505_filtered.json MTM/dataset/Weibo/raw

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_weibo.sh

ROT

cd MTM/preprocess/ROT

You can refer to the run_weibo.sh, which includes three steps:

  1. Prepare RougeBert's Training data:

    python prepare_for_rouge.py --dataset Weibo --pretrained_model bert-base-chinese
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Weibo --pretrained_model bert-base-chinese --save './ckpts/Weibo' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.01 \
    --fp16 True
    

    then you can get ckpts/Weibo/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Weibo --pretrained_model bert-base-chinese \
    --rouge_bert_model_file './ckpts/Weibo/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
    

PMB

cd MTM/preprocess/PMB
  1. Prepare the clustering data:

    mkdir data
    mkdir data/Weibo
    

    and you can get data/Weibo/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. Kmeans clustering. You can refer to the run_weibo.sh:

    python kmeans_clustering.py --dataset Weibo --pretrained_model bert-base-chinese --clustering_data_file 'data/Weibo/clustering_training_data_[TS_SMALL]
         
          <[TS_LARGE].pkl'
    
         

    then you can get data/Weibo/kmeans_cluster_centers.npy.

Besides, it is available to see some cases of key sentences selection in key_sentences_selection_cases_Weibo.ipynb.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Weibo

You can refer to the run_weibo.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Weibo' \
--dataset 'Weibo' --pretrained_model 'bert-base-chinese' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Weibo/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Weibo/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Weibo/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Weibo/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 3 \
--lr 5e-6 --epochs 10 --batch_size 32 \

then the results and ranking reports will be saved in ckpts/Weibo.

Usage for Twitter Dataset

The description of the dataset can be seen at here.

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_twitter.sh

ROT

cd MTM/preprocess/ROT

You can refer to the run_twitter.sh, which includes three steps:

  1. Prepare RougeBert's Training data:

    python prepare_for_rouge.py --dataset Twitter --pretrained_model bert-base-uncased
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Twitter --pretrained_model bert-base-uncased --save './ckpts/Twitter' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.05 \
    --fp16 True
    

    then you can get ckpts/Twitter/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Twitter --pretrained_model bert-base-uncased \
    --rouge_bert_model_file './ckpts/Twitter/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
    

PMB

cd MTM/preprocess/PMB
  1. Prepare the clustering data:

    mkdir data
    mkdir data/Twitter
    

    and you can get data/Twitter/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. Kmeans clustering. You can refer to the run_twitter.sh:

    python kmeans_clustering.py --dataset Twitter --pretrained_model bert-base-uncased --clustering_data_file 'data/Twitter/clustering_training_data_[TS_SMALL]
         
          <[TS_LARGE].pkl'
    
         

    then you can get data/Twitter/kmeans_cluster_centers.npy.

Besides, it is available to see some cases of key sentences selection in key_sentences_selection_cases_Twitter.ipynb.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Twitter

You can refer to the run_twitter.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Twitter' \
--dataset 'Twitter' --pretrained_model 'bert-base-uncased' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Twitter/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Twitter/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Twitter/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Twitter/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 5 \
--lr 1e-4 --epochs 10 --batch_size 16 \

then the results and ranking reports will be saved in ckpts/Twitter.

Citation

@inproceedings{MTM,
  author    = {Qiang Sheng and
               Juan Cao and
               Xueyao Zhang and
               Xirong Li and
               Lei Zhong},
  title     = {Article Reranking by Memory-Enhanced Key Sentence Matching for Detecting
               Previously Fact-Checked Claims},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational
               Linguistics and the 11th International Joint Conference on Natural
               Language Processing, {ACL/IJCNLP} 2021},
  pages     = {5468--5481},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
  url       = {https://doi.org/10.18653/v1/2021.acl-long.425},
  doi       = {10.18653/v1/2021.acl-long.425},
}
Owner
ICTMCG
Multimedia Computing Group, Institute of Computing Technology, Chinese Academy of Sciences. Our official account on WeChat: ICTMCG.
ICTMCG
ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

ENet in Caffe Execution times and hardware requirements Network 1024x512 1280x720 Parameters Model size (fp32) ENet 20.4 ms 32.9 ms 0.36 M 1.5 MB SegN

Timo Sämann 561 Jan 04, 2023
Code for "Human Pose Regression with Residual Log-likelihood Estimation", ICCV 2021 Oral

Human Pose Regression with Residual Log-likelihood Estimation [Paper] [arXiv] [Project Page] Human Pose Regression with Residual Log-likelihood Estima

JeffLi 347 Dec 24, 2022
[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting

[CVPR 2020] 3D Photography using Context-aware Layered Depth Inpainting [Paper] [Project Website] [Google Colab] We propose a method for converting a

Virginia Tech Vision and Learning Lab 6.2k Jan 01, 2023
Pytorch Implementation of paper "Noisy Natural Gradient as Variational Inference"

Noisy Natural Gradient as Variational Inference PyTorch implementation of Noisy Natural Gradient as Variational Inference. Requirements Python 3 Pytor

Tony JiHyun Kim 119 Dec 02, 2022
The official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness.

This repository is the official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness. Requirements pip install -r requi

Jie Ren 17 Dec 12, 2022
This repository collects 100 papers related to negative sampling methods.

Negative-Sampling-Paper This repository collects 100 papers related to negative sampling methods, covering multiple research fields such as Recommenda

RUCAIBox 119 Dec 29, 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

LightHuBERT LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Github | Huggingface | SUPER

WangRui 46 Dec 29, 2022
The official implementation of Variable-Length Piano Infilling (VLI).

Variable-Length-Piano-Infilling The official implementation of Variable-Length Piano Infilling (VLI). (paper: Variable-Length Music Score Infilling vi

29 Sep 01, 2022
PyTorch implementation for Partially View-aligned Representation Learning with Noise-robust Contrastive Loss (CVPR 2021)

2021-CVPR-MvCLN This repo contains the code and data of the following paper accepted by CVPR 2021 Partially View-aligned Representation Learning with

XLearning Group 33 Nov 01, 2022
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
Real-time object detection on Android using the YOLO network with TensorFlow

TensorFlow YOLO object detection on Android Source project android-yolo is the first implementation of YOLO for TensorFlow on an Android device. It is

Nataniel Ruiz 624 Jan 03, 2023
Reinforcement Learning with Q-Learning Algorithm on gym's frozen lake environment implemented in python

Reinforcement Learning with Q Learning Algorithm Q learning algorithm is trained on the gym's frozen lake environment. Libraries Used gym Numpy tqdm P

1 Nov 10, 2021
Realtime Face Anti Spoofing with Face Detector based on Deep Learning using Tensorflow/Keras and OpenCV

Realtime Face Anti-Spoofing Detection 🤖 Realtime Face Anti Spoofing Detection with Face Detector to detect real and fake faces Please star this repo

Prem Kumar 86 Aug 03, 2022
FairMOT for Multi-Class MOT using YOLOX as Detector

FairMOT-X Project Overview FairMOT-X is a multi-class multi object tracker, which has been tailored for training on the BDD100K MOT Dataset. It makes

Jonathan Tan 33 Dec 28, 2022
Simple sinc interpolation in PyTorch.

Kazane: simple sinc interpolation for 1D signal in PyTorch Kazane utilize FFT based convolution to provide fast sinc interpolation for 1D signal when

Chin-Yun Yu 10 May 03, 2022
Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection".

A2S-USOD Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection". Code will be released upon

15 Dec 16, 2022
Extracts essential Mediapipe face landmarks and arranges them in a sequenced order.

simplified_mediapipe_face_landmarks Extracts essential Mediapipe face landmarks and arranges them in a sequenced order. The default 478 Mediapipe face

Irfan 13 Oct 04, 2022
Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting

Disturbing Target Values for Neural Network regularization: attacking the loss layer to prevent overfitting 1. Classification Task PyTorch implementat

Yongho Kim 0 Apr 24, 2022
Implementation of SegNet: A Deep Convolutional Encoder-Decoder Architecture for Semantic Pixel-Wise Labelling

Caffe SegNet This is a modified version of Caffe which supports the SegNet architecture As described in SegNet: A Deep Convolutional Encoder-Decoder A

Alex Kendall 1.1k Jan 02, 2023
It's A ML based Web Site build with python and Django to find the breed of the dog

ML-Based-Dog-Breed-Identifier This is a Django Based Web Site To Identify the Breed of which your DOG belogs All You Need To Do is to Follow These Ste

Sanskar Dwivedi 2 Oct 12, 2022