A Model for Natural Language Attack on Text Classification and Inference

Last update: Dec 16, 2022

Overview

TextFooler

A Model for Natural Language Attack on Text Classification and Inference

This is the source code for the paper: Jin, Di, et al. "Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment." arXiv preprint arXiv:1907.11932 (2019). If you use the code, please cite the paper:

@article{jin2019bert,
  title={Is BERT Really Robust? Natural Language Attack on Text Classification and Entailment},
  author={Jin, Di and Jin, Zhijing and Zhou, Joey Tianyi and Szolovits, Peter},
  journal={arXiv preprint arXiv:1907.11932},
  year={2019}
}

Data

Our 7 datasets are here.

Prerequisites:

Required packages are listed in the requirements.txt file:

pip install -r requirements.txt

How to use

Run the following code to install the esim package:

cd ESIM
python setup.py install
cd ..

(Optional) Run the following code to pre-compute the cosine similarity scores between word pairs based on the counter-fitting word embeddings.

python comp_cos_sim_mat.py [PATH_TO_COUNTER_FITTING_WORD_EMBEDDINGS]

Run the following code to generate the adversaries for text classification:

python attack_classification.py

For Natural langauge inference:

python attack_nli.py

Examples of run code for these two files are in run_attack_classification.py and run_attack_nli.py. Here we explain each required argument in details:

--dataset_path: The path to the dataset. We put the 1000 examples for each dataset we used in the paper in the folder data.
--target_model: Name of the target model such as ''bert''.
--target_model_path: The path to the trained parameters of the target model. For ease of replication, we shared the trained BERT model parameters, the trained LSTM model parameters, and the trained CNN model parameters on each dataset we used.
--counter_fitting_embeddings_path: The path to the counter-fitting word embeddings.
--counter_fitting_cos_sim_path: This is optional. If given, then the pre-computed cosine similarity scores based on the counter-fitting word embeddings will be loaded to save time. If not, it will be calculated.
--USE_cache_path: The path to save the USE model file (Downloading is automatic if this path is empty).

Two more things to share with you:

In case someone wants to replicate our experiments for training the target models, we shared the used seven datasets we have processed for you!
In case someone may want to use our generated adversary results towards the benchmark data directly, here it is.

A Model for Natural Language Attack on Text Classification and Inference

Related tags

Overview

TextFooler

Data

Prerequisites:

How to use

Owner

Di Jin

A flexible framework of neural networks for deep learning

Weakly Supervised Learning of Rigid 3D Scene Flow

CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped

TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers.

Large scale embeddings on a single machine.

Official repository for Automated Learning Rate Scheduler for Large-Batch Training (8th ICML Workshop on AutoML)

Hcaptcha-challenger - Gracefully face hCaptcha challenge with Yolov5(ONNX) embedded solution

A high performance implementation of HDBSCAN clustering.

masscan + nmap + Finger

TimeSHAP explains Recurrent Neural Network predictions.

This is the implementation of "SELF SUPERVISED REPRESENTATION LEARNING WITH DEEP CLUSTERING FOR ACOUSTIC UNIT DISCOVERY FROM RAW SPEECH" submitted to ICASSP 2022

Scripts for training an AI to play the endless runner Subway Surfers using a supervised machine learning approach by imitation and a convolutional neural network (CNN) for image classification

Source code for CAST - Crisis Domain Adaptation Using Sequence-to-sequence Transformers (Accepted to ISCRAM 2021, CorePaper).

Individual Tree Crown classification on WorldView-2 Images using Autoencoder -- Group 9 Weak learners - Final Project (Machine Learning 2020 Course)

🍷 Gracefully claim weekly free games and monthly content from Epic Store.

The official implementation of Theme Transformer

This is the pytorch implementation for the paper: Generalizable Mixed-Precision Quantization via Attribution Rank Preservation, which is accepted to ICCV2021.

(CVPR 2022 - oral) Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry

The project is an official implementation of our CVPR2019 paper "Deep High-Resolution Representation Learning for Human Pose Estimation"

A PyTorch implementation of the paper "Semantic Image Synthesis via Adversarial Learning" in ICCV 2017