Vision-Language Transformer and Query Generation for Referring Segmentation (ICCV 2021)

Last update: Dec 23, 2022

Please consider citing our paper in your publications if the project helps your research.

@inproceedings{vision-language-transformer,
  title={Vision-Language Transformer and Query Generation for Referring Segmentation},
  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2021}
}

Installation

Environment:
- Python 3.6
- tensorflow 1.15
- Other dependencies in requirements.txt
- SpaCy model for embedding:
  
  python -m spacy download en_vectors_web_lg

Dataset preparation

- Put the folder of COCO training set ("train2014") under data/images/.
- Download the RefCOCO dataset from here and extract it to data/. Then run the data preparation script under data/:
```
cd data
python data_process_v2.py --data_root . --output_dir data_v2 --dataset [refcoco/refcoco+/refcocog] --split [unc/umd/google] --generate_mask
```
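
The command above is run once per dataset/split combination. As a convenience, the minimal sketch below prints one command line for each combination. The pairings in `DATASET_SPLITS` are an assumption based on the commonly distributed RefCOCO annotation files, not something this README specifies, and `build_command` is a hypothetical helper rather than part of the repository. Adjust the list to match the files you actually downloaded.

```python
import shlex

# Assumed dataset/split pairings (not stated in this README); edit as needed.
DATASET_SPLITS = [
    ("refcoco", "unc"),
    ("refcoco+", "unc"),
    ("refcocog", "umd"),
    ("refcocog", "google"),
]


def build_command(dataset, split, data_root=".", output_dir="data_v2"):
    """Return the argument list for one data_process_v2.py run (hypothetical helper)."""
    return [
        "python", "data_process_v2.py",
        "--data_root", data_root,
        "--output_dir", output_dir,
        "--dataset", dataset,
        "--split", split,
        "--generate_mask",
    ]


def main():
    # Print shell-quoted commands so they can be reviewed before being run from data/.
    for dataset, split in DATASET_SPLITS:
        print(" ".join(shlex.quote(arg) for arg in build_command(dataset, split)))


if __name__ == "__main__":
    main()
```

Printing the commands instead of executing them keeps the sketch free of side effects; paste the lines you need into a shell opened in data/.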

Evaluating

Download pretrained models & config files from here.
In the config file, set:
- evaluate_model: path to the pretrained weights
- evaluate_set: path to the dataset for evaluation.
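
Before launching an evaluation, it can help to confirm that both entries are present and point at existing paths. The sketch below is a hypothetical pre-flight check, not part of the repository. It assumes the config has already been parsed into a plain dictionary by whatever loader the project uses, and it relies only on the two key names listed above.

```python
import os

# The two keys named in the evaluation instructions above.
REQUIRED_KEYS = ("evaluate_model", "evaluate_set")


def check_eval_config(config):
    """Return a list of problems found in an already-parsed config mapping.

    Hypothetical helper: `config` is assumed to be a dict-like object.
    An empty list means both keys are set and both paths exist.
    """
    problems = []
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not value:
            problems.append("missing key: " + key)
        elif not os.path.exists(value):
            problems.append("path does not exist: {} = {}".format(key, value))
    return problems


if __name__ == "__main__":
    # Example with placeholder paths; replace them with your own values.
    example = {"evaluate_model": "path/to/weights", "evaluate_set": "path/to/dataset"}
    for problem in check_eval_config(example):
        print(problem)
```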

Run

python vlt.py test [PATH_TO_CONFIG_FILE]

Training

Pretrained Backbones: We use the backbone weights provided by MCN.

Note: we use the backbone trained without any images that appear in the val/test splits of RefCOCO, RefCOCO+ and RefCOCOg.

Specify the hyperparameters, dataset path, and pretrained weight path in the configuration file. Please refer to the examples under /config, or to the config files of our pretrained models.

Run

python vlt.py train [PATH_TO_CONFIG_FILE]

Acknowledgement

We borrowed a lot of code from MCN, keras-transformer, RefCOCO API and keras-yolo3. Thanks for their excellent work!
