Code for EMNLP2020 long paper: BERT-Attack: Adversarial Attack Against BERT Using BERT

Last update: Jan 04, 2023

BERT-ATTACK

Code for our EMNLP2020 long paper:

BERT-ATTACK: Adversarial Attack Against BERT Using BERT

Dependencies

Python 3.7
PyTorch 1.4.0
transformers 2.9.0
TextFooler

Usage

To train a classification model, use the run_glue.py script from Hugging Face transformers 2.9.0.

To generate adversarial samples based on the masked-LM, run

python bertattack.py --data_path data_defense/imdb_1k.tsv --mlm_path bert-base-uncased --tgt_path models/imdbclassifier --use_sim_mat 1 --output_dir data_defense/imdb_logs.tsv --num_label 2 --use_bpe 1 --k 48 --start 0 --end 1000 --threshold_pred_score 0

--data_path: We take the IMDB dataset as an example. Datasets can be obtained from TextFooler.
--mlm_path: We use the BERT-base-uncased model as our target masked-LM.
--tgt_path: We follow the official fine-tuning process in transformers to fine-tune BERT as the target model.
--k: The number of candidate substitutes considered for each position (48 in the example).
--output_dir: The output file.
--start, --end: The range of samples in the dataset to process. For large datasets, we provide a script for multi-thread processing (see Note).
--threshold_pred_score: A score threshold for cutting off predictions that may not be suitable (details in Section 5.1 of the paper).
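
The roles of --k and --threshold_pred_score can be illustrated with a small sketch. This is not the code in bertattack.py: it assumes the masked-LM yields (token, score) pairs for one position, and select_candidates is a hypothetical helper name. Under those assumptions, it keeps at most k top-scoring candidates and then drops any whose score falls below the threshold.

```python
from typing import List, Tuple


def select_candidates(
    predictions: List[Tuple[str, float]],
    k: int,
    threshold: float = 0.0,
) -> List[str]:
    """Keep at most k highest-scoring tokens, then drop those below threshold.

    Hypothetical helper for illustration; the real script may differ.
    """
    # Rank candidates from highest to lowest score.
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    # Limit the search space to the k best candidates.
    top_k = ranked[:k]
    # Cut off low-scoring predictions that are unlikely to be suitable.
    return [token for token, score in top_k if score >= threshold]


# Toy scores, invented for this example.
preds = [("good", 9.1), ("great", 8.7), ("fine", 3.2), ("bad", 1.5)]
print(select_candidates(preds, k=3, threshold=3.0))  # ['good', 'great', 'fine']
```

A larger k widens the search and usually costs more time; a higher threshold trades some attack success for more conservative substitutions.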

Note

The datasets are re-formatted to the GLUE style.

Some configs are hard-coded; you can change them manually.

If you need to use the similar-words filter, you need to download and process the cosine similarity matrix following TextFooler. We only use the filter in sentiment classification tasks like IMDB and YELP.
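
As a rough illustration of what a similar-words filter does, the sketch below drops candidates whose embedding has low cosine similarity to the original word. It is an assumption-laden toy: the vectors, the 0.5 cutoff, and the filter_similar helper are all hypothetical, and it computes similarities on the fly rather than loading the precomputed matrix described above.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_similar(
    word: str,
    candidates: List[str],
    vectors: Dict[str, List[float]],
    min_sim: float,
) -> List[str]:
    """Drop candidates that are known to be dissimilar to `word`.

    Words missing from `vectors` cannot be judged, so they are kept
    (an assumption made for this sketch).
    """
    if word not in vectors:
        return list(candidates)
    kept = []
    for cand in candidates:
        if cand in vectors and cosine(vectors[word], vectors[cand]) < min_sim:
            continue  # e.g. an antonym that the masked-LM proposed
        kept.append(cand)
    return kept


# Toy 2-d embeddings, invented for this example.
vectors = {"good": [1.0, 0.0], "great": [0.9, 0.1], "bad": [-1.0, 0.0]}
print(filter_similar("good", ["great", "bad", "unknown"], vectors, min_sim=0.5))
# ['great', 'unknown']
```

This kind of filter matters most for sentiment tasks, where a masked-LM can propose an antonym that fits the context but flips the label.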

If you need to evaluate the USE (Universal Sentence Encoder) results, you need to create the corresponding TensorFlow environment for USE.

For faster generation, you could turn off the BPE substitution.

As illustrated in the paper, we set thresholds to balance between the attack success rate and USE similarity score.

The multi-thread process uses the batchrun.py script.

You can run

cat cmd.txt | python batchrun.py --gpus 0,1,2,3

to simultaneously generate adversarial samples of the given dataset for faster generation. We use the IMDB dataset as an example.
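
The format of cmd.txt is not described here. A minimal sketch, assuming it holds one shell command per line and that each command covers a disjoint --start/--end slice of the dataset, could generate those lines as follows. The chunk size, the per-slice output file names, and the build_commands helper are hypothetical; the flags themselves are taken from the example command above.

```python
from typing import List


def build_commands(total: int, chunk: int, base_cmd: str, out_pattern: str) -> List[str]:
    """Split the range [0, total) into slices and build one command per slice."""
    commands = []
    for start in range(0, total, chunk):
        end = min(start + chunk, total)  # the last slice may be shorter
        output = out_pattern.format(start=start, end=end)
        commands.append(
            f"{base_cmd} --output_dir {output} --start {start} --end {end}"
        )
    return commands


# Flags copied from the example command; only the slicing is new.
base = (
    "python bertattack.py --data_path data_defense/imdb_1k.tsv "
    "--mlm_path bert-base-uncased --tgt_path models/imdbclassifier "
    "--use_sim_mat 1 --num_label 2 --use_bpe 1 --k 48 --threshold_pred_score 0"
)
commands = build_commands(
    total=1000,
    chunk=250,
    base_cmd=base,
    out_pattern="data_defense/imdb_logs_{start}_{end}.tsv",  # hypothetical naming
)
print("\n".join(commands))
```

Redirecting the printed lines into cmd.txt would give four commands, one per slice of 250 samples, whose outputs can be concatenated afterwards.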

Owner

Linyang Li
