Multi Task Vision and Language

Last update: Dec 19, 2022

Related tags

Overview

12-in-1: Multi-Task Vision and Language Representation Learning

Please cite the following if you use this code. Code and pre-trained models for 12-in-1: Multi-Task Vision and Language Representation Learning:

@InProceedings{Lu_2020_CVPR,
author = {Lu, Jiasen and Goswami, Vedanuj and Rohrbach, Marcus and Parikh, Devi and Lee, Stefan},
title = {12-in-1: Multi-Task Vision and Language Representation Learning},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

and ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks:

@inproceedings{lu2019vilbert,
  title={Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks},
  author={Lu, Jiasen and Batra, Dhruv and Parikh, Devi and Lee, Stefan},
  booktitle={Advances in Neural Information Processing Systems},
  pages={13--23},
  year={2019}
}

Repository Setup

Create a fresh conda environment, and install all dependencies.

conda create -n vilbert-mt python=3.6
conda activate vilbert-mt
git clone --recursive https://github.com/facebookresearch/vilbert-multi-task.git
cd vilbert-multi-task
pip install -r requirements.txt

Install pytorch

conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Install apex, follows https://github.com/NVIDIA/apex
Install this codebase as a package in this environment.

python setup.py develop

Data Setup

Check README.md under data for more details.

Visiolinguistic Pre-training and Multi Task Training

Pretraining on Conceptual Captions

python train_concap.py --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --train_batch_size 512 --objective 1 --file_path <path_to_extracted_cc_features>

Download link

Multi-task Training

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <pretrained_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1-2-4-7-8-9-10-11-12-13-15-17 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name multi_task_model

Download link

Fine-tune from Multi-task trained model

python train_tasks.py --bert_model bert-base-uncased --from_pretrained <multi_task_model_path> --config_file config/bert_base_6layer_6conect.json --tasks 1 --lr_scheduler 'warmup_linear' --train_iter_gap 4 --task_specific_tokens --save_name finetune_from_multi_task_model

License

vilbert-multi-task is licensed under MIT license available in LICENSE file.

Multi Task Vision and Language

Related tags

Overview

12-in-1: Multi-Task Vision and Language Representation Learning

Repository Setup

Data Setup

Visiolinguistic Pre-training and Multi Task Training

Pretraining on Conceptual Captions

Multi-task Training

Fine-tune from Multi-task trained model

License

Owner

Facebook Research

Age Progression/Regression by Conditional Adversarial Autoencoder

PyTorch Implementations for DeeplabV3 and PSPNet

Unbiased Learning To Rank Algorithms (ULTRA)

This is the source code of the solver used to compete in the International Timetabling Competition 2019.

An off-line judger supporting distributed problem repositories

Neural network-based build time estimation for additive manufacturing

The pytorch implementation of DG-Font: Deformable Generative Networks for Unsupervised Font Generation

Pytorch implementation of the paper "Topic Modeling Revisited: A Document Graph-based Neural Network Perspective"

An end-to-end implementation of intent prediction with Metaflow and other cool tools

Implementation of a Transformer, but completely in Triton

DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

An implementation of the efficient attention module.

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Tracking Progress in Question Answering over Knowledge Graphs

The code for our paper Semi-Supervised Learning with Multi-Head Co-Training

A Partition Filter Network for Joint Entity and Relation Extraction EMNLP 2021

Implementation of ECCV20 paper: the devil is in classification: a simple framework for long-tail object detection and instance segmentation

Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"

PyTorch implementation of the Quasi-Recurrent Neural Network - up to 16 times faster than NVIDIA's cuDNN LSTM

InvTorch: memory-efficient models with invertible functions