Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Overview

MTM

This is the official repository of the paper:

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Qiang Sheng, Juan Cao, Xueyao Zhang, Xirong Li, and Lei Zhong.

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

PDF / Poster / Code / Chinese Dataset / Chinese Blog 1 / Chinese Blog 2

Datasets

There are two experimental datasets, including the Twitter Dataset, and the firstly proposed Weibo Dataset. Note that you can download the Weibo Dataset only after an "Application to Use the Chinese Dataset for Detecting Previously Fact-Checked Claim" has been submitted.

Code

Key Requirements

python==3.6.10
torch==1.6.0
torchvision==0.7.0
transformers==3.2.0

Usage for Weibo Dataset

After you download the dataset (the way to access is described here), move the FN_11934_filtered.json and DN_27505_filtered.json into the path MTM/dataset/Weibo/raw:

mkdir MTM/dataset/Weibo/raw
mv FN_11934_filtered.json MTM/dataset/Weibo/raw
mv DN_27505_filtered.json MTM/dataset/Weibo/raw

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_weibo.sh

ROT

cd MTM/preprocess/ROT

You can refer to the run_weibo.sh, which includes three steps:

  1. Prepare RougeBert's Training data:

    python prepare_for_rouge.py --dataset Weibo --pretrained_model bert-base-chinese
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Weibo --pretrained_model bert-base-chinese --save './ckpts/Weibo' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.01 \
    --fp16 True
    

    then you can get ckpts/Weibo/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Weibo --pretrained_model bert-base-chinese \
    --rouge_bert_model_file './ckpts/Weibo/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
    

PMB

cd MTM/preprocess/PMB
  1. Prepare the clustering data:

    mkdir data
    mkdir data/Weibo
    

    and you can get data/Weibo/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. Kmeans clustering. You can refer to the run_weibo.sh:

    python kmeans_clustering.py --dataset Weibo --pretrained_model bert-base-chinese --clustering_data_file 'data/Weibo/clustering_training_data_[TS_SMALL]
         
          <[TS_LARGE].pkl'
    
         

    then you can get data/Weibo/kmeans_cluster_centers.npy.

Besides, it is available to see some cases of key sentences selection in key_sentences_selection_cases_Weibo.ipynb.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Weibo

You can refer to the run_weibo.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Weibo' \
--dataset 'Weibo' --pretrained_model 'bert-base-chinese' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Weibo/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Weibo/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Weibo/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Weibo/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 3 \
--lr 5e-6 --epochs 10 --batch_size 32 \

then the results and ranking reports will be saved in ckpts/Weibo.

Usage for Twitter Dataset

The description of the dataset can be seen at here.

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_twitter.sh

ROT

cd MTM/preprocess/ROT

You can refer to the run_twitter.sh, which includes three steps:

  1. Prepare RougeBert's Training data:

    python prepare_for_rouge.py --dataset Twitter --pretrained_model bert-base-uncased
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Twitter --pretrained_model bert-base-uncased --save './ckpts/Twitter' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.05 \
    --fp16 True
    

    then you can get ckpts/Twitter/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Twitter --pretrained_model bert-base-uncased \
    --rouge_bert_model_file './ckpts/Twitter/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
    

PMB

cd MTM/preprocess/PMB
  1. Prepare the clustering data:

    mkdir data
    mkdir data/Twitter
    

    and you can get data/Twitter/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. Kmeans clustering. You can refer to the run_twitter.sh:

    python kmeans_clustering.py --dataset Twitter --pretrained_model bert-base-uncased --clustering_data_file 'data/Twitter/clustering_training_data_[TS_SMALL]
         
          <[TS_LARGE].pkl'
    
         

    then you can get data/Twitter/kmeans_cluster_centers.npy.

Besides, it is available to see some cases of key sentences selection in key_sentences_selection_cases_Twitter.ipynb.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Twitter

You can refer to the run_twitter.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Twitter' \
--dataset 'Twitter' --pretrained_model 'bert-base-uncased' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Twitter/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Twitter/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Twitter/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Twitter/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 5 \
--lr 1e-4 --epochs 10 --batch_size 16 \

then the results and ranking reports will be saved in ckpts/Twitter.

Citation

@inproceedings{MTM,
  author    = {Qiang Sheng and
               Juan Cao and
               Xueyao Zhang and
               Xirong Li and
               Lei Zhong},
  title     = {Article Reranking by Memory-Enhanced Key Sentence Matching for Detecting
               Previously Fact-Checked Claims},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational
               Linguistics and the 11th International Joint Conference on Natural
               Language Processing, {ACL/IJCNLP} 2021},
  pages     = {5468--5481},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
  url       = {https://doi.org/10.18653/v1/2021.acl-long.425},
  doi       = {10.18653/v1/2021.acl-long.425},
}
Owner
ICTMCG
Multimedia Computing Group, Institute of Computing Technology, Chinese Academy of Sciences. Our official account on WeChat: ICTMCG.
ICTMCG
A list of awesome PyTorch scholarship articles, guides, blogs, courses and other resources.

Awesome PyTorch Scholarship Resources A collection of awesome PyTorch and Python learning resources. Contributions are always welcome! Course Informat

Arnas Gečas 302 Dec 03, 2022
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

TubeDETR: Spatio-Temporal Video Grounding with Transformers Website • STVG Demo • Paper This repository provides the code for our paper. This includes

Antoine Yang 108 Dec 27, 2022
Code for "Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo"

Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo This repository includes the source code for our CVPR 2021 paper on multi-view mult

Jiahao Lin 66 Jan 04, 2023
Use stochastic processes to generate samples and use them to train a fully-connected neural network based on Keras

Use stochastic processes to generate samples and use them to train a fully-connected neural network based on Keras which will then be used to generate residuals

Federico Lopez 2 Jan 14, 2022
Mail classification with tensorflow and MS Exchange Server (ham or spam).

Mail classification with tensorflow and MS Exchange Server (ham or spam).

Metin Karatas 1 Sep 11, 2021
MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets)

MixRNet(Using mixup as regularization and tuning hyper-parameters for ResNets) Using mixup data augmentation as reguliraztion and tuning the hyper par

Bhanu 2 Jan 16, 2022
Demo for Real-time RGBD-based Extended Body Pose Estimation paper

Real-time RGBD-based Extended Body Pose Estimation This repository is a real-time demo for our paper that was published at WACV 2021 conference The ou

Renat Bashirov 118 Dec 26, 2022
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting Created by Yongming Rao*, Wenliang Zhao*, Guangyi Chen, Yansong Tang, Zheng Z

Yongming Rao 322 Dec 31, 2022
High-Resolution 3D Human Digitization from A Single Image.

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization (CVPR 2020) News: [2020/06/15] Demo with Google Colab (i

Meta Research 8.4k Dec 29, 2022
EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation

EdiBERT, a generative model for image editing EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation. The

16 Dec 07, 2022
A way to store images in YAML.

YAMLImg A way to store images in YAML. I made this after seeing Roadcrosser's JSON-G because it was too inspiring to ignore this opportunity. Installa

5 Mar 14, 2022
NeuroFind - A solution to the to the Task given by the Oberseminar of Messtechnik Institute of TU Dresden in 2021

NeuroFind A solution to the to the Task given by the Oberseminar of Messtechnik

1 Jan 20, 2022
CoMoGAN: continuous model-guided image-to-image translation. CVPR 2021 oral.

CoMoGAN: Continuous Model-guided Image-to-Image Translation Official repository. Paper CoMoGAN: continuous model-guided image-to-image translation [ar

166 Dec 31, 2022
Whisper is a file-based time-series database format for Graphite.

Whisper Overview Whisper is one of three components within the Graphite project: Graphite-Web, a Django-based web application that renders graphs and

Graphite Project 1.2k Dec 25, 2022
A pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction"

ssnt-loss ℹ️ This is a WIP project. the implementation is still being tested. A pure PyTorch implementation of the loss described in "Online Segment t

張致強 1 Feb 09, 2022
[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

Seokeon Choi 35 Oct 26, 2022
SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning

SPCL SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning Update on 2021/11/25: ArXiv Ver

Binhui Xie (谢斌辉) 11 Oct 29, 2022
The Most Efficient Temporal Difference Learning Framework for 2048

moporgic/TDL2048+ TDL2048+ is a highly optimized temporal difference (TD) learning framework for 2048. Features Many common methods related to 2048 ar

Hung Guei 5 Nov 23, 2022