Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".

Overview

VL-BERT

By Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai.

This repository is an official implementation of the paper VL-BERT: Pre-training of Generic Visual-Linguistic Representations.

Update on 2020/01/16 Add code of visualization.

Update on 2019/12/20 Our VL-BERT got accepted by ICLR 2020.

Introduction

VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on the massive-scale caption dataset and text-only corpus, and can be fine-tuned for various down-stream visual-linguistic tasks, such as Visual Commonsense Reasoning, Visual Question Answering and Referring Expression Comprehension.

Thanks to PyTorch and its 3rd-party libraries, this codebase also contains following features:

  • Distributed Training
  • FP16 Mixed-Precision Training
  • Various Optimizers and Learning Rate Schedulers
  • Gradient Accumulation
  • Monitoring the Training Using TensorboardX

Citing VL-BERT

@inproceedings{
  Su2020VL-BERT:,
  title={VL-BERT: Pre-training of Generic Visual-Linguistic Representations},
  author={Weijie Su and Xizhou Zhu and Yue Cao and Bin Li and Lewei Lu and Furu Wei and Jifeng Dai},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=SygXPaEYvH}
}

Prepare

Environment

  • Ubuntu 16.04, CUDA 9.0, GCC 4.9.4
  • Python 3.6.x
    # We recommend you to use Anaconda/Miniconda to create a conda environment
    conda create -n vl-bert python=3.6 pip
    conda activate vl-bert
  • PyTorch 1.0.0 or 1.1.0
    conda install pytorch=1.1.0 cudatoolkit=9.0 -c pytorch
  • Apex (optional, for speed-up and fp16 training)
    git clone https://github.com/jackroos/apex
    cd ./apex
    pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./  
  • Other requirements:
    pip install Cython
    pip install -r requirements.txt
  • Compile
    ./scripts/init.sh

Data

See PREPARE_DATA.md.

Pre-trained Models

See PREPARE_PRETRAINED_MODELS.md.

Training

Distributed Training on Single-Machine

./scripts/dist_run_single.sh <num_gpus> <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>
  • <num_gpus>: number of gpus to use.
  • <task>: pretrain/vcr/vqa/refcoco.
  • <path_to_cfg>: config yaml file under ./cfgs/<task>.
  • <dir_to_store_checkpoint>: root directory to store checkpoints.

Following is a more concrete example:

./scripts/dist_run_single.sh 4 vcr/train_end2end.py ./cfgs/vcr/base_q2a_4x16G_fp32.yaml ./

Distributed Training on Multi-Machine

For example, on 2 machines (A and B), each with 4 GPUs,

run following command on machine A:

./scripts/dist_run_multi.sh 2 0 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>

run following command on machine B:

./scripts/dist_run_multi.sh 2 1 <ip_addr_of_A> 4 <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>

Non-Distributed Training

./scripts/nondist_run.sh <task>/train_end2end.py <path_to_cfg> <dir_to_store_checkpoint>

Note:

  1. In yaml files under ./cfgs, we set batch size for GPUs with at least 16G memory, you may need to adapt the batch size and gradient accumulation steps according to your actual case, e.g., if you decrease the batch size, you should also increase the gradient accumulation steps accordingly to keep 'actual' batch size for SGD unchanged.

  2. For efficiency, we recommend you to use distributed training even on single-machine. But for RefCOCO+, you may meet deadlock using distributed training due to unknown reason (it may be related to PyTorch dataloader deadloack), you can simply use non-distributed training to solve this problem.

Evaluation

VCR

  • Local evaluation on val set:

    python vcr/val.py \
      --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> \
      --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

    Note: <indexes_of_gpus_to_use> is gpu indexes, e.g., 0 1 2 3.

  • Generate prediction results on test set for leaderboard submission:

    python vcr/test.py \
      --a-cfg <cfg_of_q2a> --r-cfg <cfg_of_qa2r> \
      --a-ckpt <checkpoint_of_q2a> --r-ckpt <checkpoint_of_qa2r> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

VQA

  • Generate prediction results on test set for EvalAI submission:
    python vqa/test.py \
      --cfg <cfg_file> \
      --ckpt <checkpoint> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

RefCOCO+

  • Local evaluation on val/testA/testB set:
    python refcoco/test.py \
      --split <val|testA|testB> \
      --cfg <cfg_file> \
      --ckpt <checkpoint> \
      --gpus <indexes_of_gpus_to_use> \
      --result-path <dir_to_save_result> --result-name <result_file_name>
    

Visualization

See VISUALIZATION.md.

Acknowledgements

Many thanks to following codes that help us a lot in building this codebase:

Owner
Weijie Su
Graduate student at USTC.
Weijie Su
A collection of differentiable SVD methods and also the official implementation of the ICCV21 paper "Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?"

Differentiable SVD Introduction This repository contains: The official Pytorch implementation of ICCV21 paper Why Approximate Matrix Square Root Outpe

YueSong 32 Dec 25, 2022
Implementations of LSTM: A Search Space Odyssey variants and their training results on the PTB dataset.

An LSTM Odyssey Code for training variants of "LSTM: A Search Space Odyssey" on Fomoro. Check out the blog post. Training Install TensorFlow. Clone th

Fomoro AI 95 Apr 13, 2022
Open standard for machine learning interoperability

Open Neural Network Exchange (ONNX) is an open ecosystem that empowers AI developers to choose the right tools as their project evolves. ONNX provides

Open Neural Network Exchange 13.9k Dec 30, 2022
This repository is an unoffical PyTorch implementation of Medical segmentation in 3D and 2D.

Pytorch Medical Segmentation Read Chinese Introduction:Here! Recent Updates 2021.1.8 The train and test codes are released. 2021.2.6 A bug in dice was

EasyCV-Ellis 618 Dec 27, 2022
CS506-Spring2022 - Code and Slides for Boston University CS 506

CS 506 - Computational Tools for Data Science Code, slides, and notes for Boston

Lance Galletti 17 May 06, 2022
Bridging the Gap between Label- and Reference based Synthesis(ICCV 2021)

Bridging the Gap between Label- and Reference based Synthesis(ICCV 2021) Tensorflow implementation of Bridging the Gap between Label- and Reference-ba

huangqiusheng 8 Jul 13, 2022
HEAM: High-Efficiency Approximate Multiplier Optimization for Deep Neural Networks

Approximate Multiplier by HEAM What's HEAM? HEAM is a general optimization method to generate high-efficiency approximate multipliers for specific app

4 Sep 11, 2022
Attention-based Transformation from Latent Features to Point Clouds (AAAI 2022)

Attention-based Transformation from Latent Features to Point Clouds This repository contains a PyTorch implementation of the paper: Attention-based Tr

12 Nov 11, 2022
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

This is the codebase for the paper: Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs Directory Structur

Peter Hase 19 Aug 21, 2022
[CVPRW 2022] Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Attention Helps CNN See Better: Hybrid Image Quality Assessment Network [CVPRW 2022] Code for Hybrid Image Quality Assessment Network [paper] [code] T

IIGROUP 49 Dec 11, 2022
Solutions of Reinforcement Learning 2nd Edition

Solutions of Reinforcement Learning, An Introduction

YIFAN WANG 1.4k Dec 30, 2022
Code for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling Using BERT Adapter"

Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter Code and checkpoints for the ACL2021 paper "Lexicon Enhanced Chinese Sequence Labelling

274 Dec 06, 2022
Assginment for UofT CSC420: Intro to Image Understanding

Run the code Open edge_detection.ipynb in google colab. Upload image1.jpg,image2.jpg and my_image.jpg to '/content/drive/My Drive'. chooose 'Run all'

Ziyi-Zhou 1 Feb 24, 2022
DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

DatasetGAN This is the official code and data release for: DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort Yuxuan Zhang*, Huan Li

302 Jan 05, 2023
Yoga - Yoga asana classifier for python

Yoga Asana Classifier Description Hi welcome to my new deep learning project "Yo

Programminghut 35 Dec 12, 2022
This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

This repository contains numerical implementation for the paper Intertemporal Pricing under Reference Effects: Integrating Reference Effects and Consumer Heterogeneity.

Hansheng Jiang 6 Nov 18, 2022
Pytorch code for our paper Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains)

Beyond ImageNet Attack: Towards Crafting Adversarial Examples for Black-box Domains (ICLR'2022) This is the Pytorch code for our paper Beyond ImageNet

Alibaba-AAIG 37 Nov 23, 2022
[ICCV' 21] "Unsupervised Point Cloud Pre-training via Occlusion Completion"

OcCo: Unsupervised Point Cloud Pre-training via Occlusion Completion This repository is the official implementation of paper: "Unsupervised Point Clou

Hanchen 204 Dec 24, 2022
Download & Install mods for your favorit game with a few simple clicks

Husko's SteamWorkshop Downloader 🔴 IMPORTANT ❗ 🔴 The Tool is currently being rewritten so updates will be slow and only on the dev branch until it i

Husko 67 Nov 25, 2022
Scripts used to make and evaluate OpenAlex's concept tagging model

openalex-concept-tagging This repository contains all of the code for getting the concept tagger up and running. To learn more about where this model

OurResearch 18 Dec 09, 2022