Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.

Last update: Jan 07, 2023

Overview

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers

1 Using Colab

Note that the notebook assumes you are using a GPU. To switch runtimes, go to Runtime -> Change runtime type and select GPU.
Installing all the requirements may take some time. After installation, please restart the runtime.
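
Before running the remaining cells, it can help to confirm that the runtime actually exposes a GPU. The check below is a minimal sketch and is not part of the notebooks: `describe_device` is a hypothetical helper name, and it relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_name()` from PyTorch. It falls back to a message if PyTorch has not been installed yet.

```python
def describe_device():
    """Return a short description of the device PyTorch would use."""
    try:
        import torch  # installed by the notebook's requirements step
    except ImportError:
        return "PyTorch is not installed yet; run the installation cell first"
    if torch.cuda.is_available():
        return "GPU available: " + torch.cuda.get_device_name(0)
    return "no GPU detected; select GPU under Runtime -> Change runtime type"


print(describe_device())
```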

2 Running Examples

We provide two Jupyter notebooks for running the examples presented in the paper.

The notebook for LXMERT contains both the examples from the paper and examples with images from the internet and free-form questions. To use your own input, simply change the URL variable to your image and the question variable to your free-form question.
The notebook for DETR contains the examples from the paper. To use your own input, simply change the URL variable to your image.
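
When pointing the URL variable at your own image, a common pitfall is a link that resolves to a web page rather than to the image file itself. The following is a rough sketch of a sanity check you could run first. It is an assumption about a convenient workflow, not code from the notebooks: `fetch_bytes` and `detect_image_format` are hypothetical helpers, and only JPEG and PNG signatures are recognized.

```python
import urllib.request

# Leading bytes ("magic numbers") of two common web image formats.
IMAGE_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def fetch_bytes(url, timeout=10.0):
    """Download the raw bytes behind `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def detect_image_format(data):
    """Return 'jpeg' or 'png' if `data` starts with a known signature, else None."""
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None
```

If `detect_image_format(fetch_bytes(URL))` returns `None`, the link most likely does not point directly at a JPEG or PNG file.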

3 Reproduction of results

3.1 VisualBERT

Run the run.py script as follows:

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python VisualBERT/run.py --method=<method_name> --is-text-pert=<true/false> --is-positive-pert=<true/false> --num-samples=10000 config=projects/visual_bert/configs/vqa2/defaults.yaml model=visual_bert dataset=vqa2 run_type=val checkpoint.resume_zoo=visual_bert.finetuned.vqa2.from_coco_train env.data_dir=/path/to/data_dir training.num_workers=0 training.batch_size=1 training.trainer=mmf_pert training.seed=1234

Note

If the datasets aren't already in env.data_dir, then the script will download the data automatically to the path in env.data_dir.
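
Since the command takes two boolean perturbation flags, reproducing the results typically means running it once per flag combination. The sketch below builds the argument list for each of the four combinations. The flags and config overrides are copied from the command above; the helper names (`build_visualbert_command`, `build_env`, `all_perturbation_settings`) are hypothetical, and the method name is left as the `<method_name>` placeholder for you to fill in.

```python
import itertools
import os
import sys


def build_visualbert_command(method, is_text_pert, is_positive_pert,
                             data_dir, num_samples=10000):
    """Return the argument list for VisualBERT/run.py, mirroring the command above."""
    def flag(value):
        return "true" if value else "false"

    return [
        sys.executable, "VisualBERT/run.py",
        f"--method={method}",
        f"--is-text-pert={flag(is_text_pert)}",
        f"--is-positive-pert={flag(is_positive_pert)}",
        f"--num-samples={num_samples}",
        "config=projects/visual_bert/configs/vqa2/defaults.yaml",
        "model=visual_bert",
        "dataset=vqa2",
        "run_type=val",
        "checkpoint.resume_zoo=visual_bert.finetuned.vqa2.from_coco_train",
        f"env.data_dir={data_dir}",
        "training.num_workers=0",
        "training.batch_size=1",
        "training.trainer=mmf_pert",
        "training.seed=1234",
    ]


def build_env(gpu="0"):
    """Copy the current environment and set the two variables used in the command."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu
    env["PYTHONPATH"] = os.getcwd()  # same effect as PYTHONPATH=`pwd`
    return env


def all_perturbation_settings():
    """All (is_text_pert, is_positive_pert) combinations."""
    return list(itertools.product([True, False], repeat=2))


for text_pert, positive_pert in all_perturbation_settings():
    cmd = build_visualbert_command("<method_name>", text_pert, positive_pert,
                                   "/path/to/data_dir")
    print(" ".join(cmd))
```

To launch a run instead of printing it, pass the list to `subprocess.run(cmd, env=build_env(), check=True)` from the repository root.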

3.2 LXMERT

Download valid.json:

pushd data/vqa
wget https://nlp.cs.unc.edu/data/lxmert_data/vqa/valid.json
popd

Download the COCO_val2014 set to your local machine.

Note

If you already downloaded COCO_val2014 for the VisualBERT tests, you can simply use the same path you used for VisualBERT.

Run the perturbation.py script as follows:

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python lxmert/lxmert/perturbation.py --COCO_path /path/to/COCO_val2014 --method <method_name> --is-text-pert <true/false> --is-positive-pert <true/false>
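
Because the LXMERT run depends on two manual downloads, a quick check before launching can save a failed start. The sketch below is an illustration only: `check_lxmert_inputs` is a hypothetical helper, `vqa_dir` stands for whichever `data/vqa` directory you downloaded `valid.json` into, and `coco_path` is the value you intend to pass as `--COCO_path`.

```python
from pathlib import Path


def check_lxmert_inputs(vqa_dir, coco_path):
    """Return a list of problems; an empty list means both inputs were found."""
    problems = []

    valid_json = Path(vqa_dir) / "valid.json"
    if not valid_json.is_file():
        problems.append(f"valid.json not found at {valid_json}")

    coco_dir = Path(coco_path)
    if not coco_dir.is_dir():
        problems.append(f"COCO_val2014 directory not found at {coco_dir}")
    elif not any(coco_dir.iterdir()):
        problems.append(f"COCO_val2014 directory at {coco_dir} is empty")

    return problems


for problem in check_lxmert_inputs("data/vqa", "/path/to/COCO_val2014"):
    print("warning:", problem)
```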

3.3 DETR

Download the COCO dataset as described in the DETR repository. Note that you only need the validation set.
Lower the IoU minimum threshold from 0.5 to 0.2 using the following steps:
- Locate the cocoeval.py script in your python library path:
  
  Find the library path (in Python):
```
import sys
print(sys.path)
```
  Find cocoeval.py (in a shell):
```
cd /path/to/lib
find . -name cocoeval.py
```
- Change the self.iouThrs value in the setDetParams function (which sets the parameters for the COCO detection evaluation) in the Params class as follows:
  
  instead of:
```
self.iouThrs = np.linspace(.5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
```
  use:
```
self.iouThrs = np.linspace(.2, 0.95, int(np.round((0.95 - .2) / .05)) + 1, endpoint=True)
```
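
As a complement to the steps above, the sketch below does two things. It asks Python directly where `cocoeval.py` lives, and it shows how the edit changes the threshold grid. Both parts rest on assumptions: the file is assumed to come from the `pycocotools` package (the text above does not name the package), and `iou_thresholds` is a plain-Python stand-in for the `np.linspace` expressions, written here only to count the resulting thresholds.

```python
import importlib.util


def locate_module_file(module_name):
    """Return the source file of an importable module, or None if it is not installed."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        return None  # the parent package itself is missing
    return spec.origin if spec is not None else None


def iou_thresholds(start, stop=0.95, step=0.05):
    """Plain-Python equivalent of the np.linspace(...) expressions above."""
    count = int(round((stop - start) / step)) + 1
    values = [start + i * (stop - start) / (count - 1) for i in range(count)]
    values[-1] = stop  # like endpoint=True, end exactly on `stop`
    return [round(v, 2) for v in values]


# Assumption: cocoeval.py is provided by pycocotools; prints None if it is absent.
print(locate_module_file("pycocotools.cocoeval"))

default_grid = iou_thresholds(0.5)
lowered_grid = iou_thresholds(0.2)
print(len(default_grid), default_grid[0], default_grid[-1])
print(len(lowered_grid), lowered_grid[0], lowered_grid[-1])
```

The original expression yields 10 thresholds between 0.5 and 0.95, while the modified one yields 16 thresholds between 0.2 and 0.95, so the edit adds the six lower thresholds from 0.2 to 0.45.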

To run the segmentation experiment, use the following command:

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=`pwd` python DETR/main.py --coco_path /path/to/coco/dataset --eval --masks --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth --batch_size 1 --method <method_name>

4 Credits

VisualBERT implementation is based on the MMF framework.
LXMERT implementation is based on the official LXMERT implementation and on Hugging Face Transformers.
DETR implementation is based on the official DETR implementation.

Owner

Hila Chefer
