Official Implementation of VAT

Last update: Dec 27, 2022

Overview

Semantic correspondence

Few-shot segmentation

Cost Aggregation Is All You Need for Few-Shot Segmentation

For more information, check out the [Project Page] and the paper on [arXiv].

Network

Our model VAT is illustrated below:

Environment Settings

git clone https://github.com/Seokju-Cho/Volumetric-Aggregation-Transformer.git

cd Volumetric-Aggregation-Transformer

conda env create -f environment.yaml

Preparing Few-Shot Segmentation Datasets

Download the following datasets:

1. PASCAL-5ⁱ

Download PASCAL VOC2012 devkit (train/val data):
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
Download PASCAL VOC2012 SDS extended mask annotations from our [Google Drive].

2. COCO-20ⁱ

Download COCO2014 train/val images and annotations:
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
Download the COCO2014 train/val mask annotations from our Google Drive: [train2014.zip], [val2014.zip], and place both train2014/ and val2014/ under the annotations/ directory.

3. FSS-1000

Download FSS-1000 images and annotations from our [Google Drive].

Create a directory '../Datasets_VAT' for the three few-shot segmentation datasets above and place each dataset so that it matches the following directory structure:

../                         # parent directory
└── Datasets_VAT/
    ├── VOC2012/            # PASCAL VOC2012 devkit
    │   ├── Annotations/
    │   ├── ImageSets/
    │   ├── ...
    │   └── SegmentationClassAug/
    ├── COCO2014/           
    │   ├── annotations/
    │   │   ├── train2014/  # (dir.) training masks (from Google Drive) 
    │   │   ├── val2014/    # (dir.) validation masks (from Google Drive)
    │   │   └── ..some json files..
    │   ├── train2014/
    │   └── val2014/
    └── FSS-1000/           # (dir.) contains 1000 object classes
        ├── abacus/   
        ├── ...
        └── zucchini/
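
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the list `EXPECTED_DIRS` are hypothetical, and the list only covers the directories shown explicitly in the tree (the elided `...` entries are not checked).

```python
from pathlib import Path

# Sub-directories named explicitly in the tree above, relative to the
# dataset root. Entries hidden behind "..." in the tree are not checked.
EXPECTED_DIRS = [
    "VOC2012/Annotations",
    "VOC2012/ImageSets",
    "VOC2012/SegmentationClassAug",
    "COCO2014/annotations/train2014",
    "COCO2014/annotations/val2014",
    "COCO2014/train2014",
    "COCO2014/val2014",
    "FSS-1000",
]


def missing_dirs(root):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # '../Datasets_VAT' is the dataset root named in the instructions above.
    missing = missing_dirs("../Datasets_VAT")
    if missing:
        print("Missing directories:")
        for d in missing:
            print("  " + d)
    else:
        print("Dataset layout matches the expected structure.")
```

Run it from the repository root so that '../Datasets_VAT' resolves to the parent directory.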

Training

Training on PASCAL-5ⁱ:

  python train.py --config "config/pascal_resnet{50, 101}/pascal_resnet{50, 101}_fold{0, 1, 2, 3}/config.yaml"

Training on COCO-20ⁱ:

  python train.py --config "config/coco_resnet50/coco_resnet50_fold{0, 1, 2, 3}/config.yaml"

Training on FSS-1000:

  python train.py --config "config/fss_resnet{50, 101}/config.yaml"
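
The braces in the commands above appear to list alternatives (pick one backbone and one fold) rather than shell brace expansion. Under that assumption, and assuming the backbone number is the same in both path components, the concrete commands can be enumerated with a short sketch like the one below. The names `SETTINGS` and `config_paths` are hypothetical and not part of the repository.

```python
from itertools import product

# Backbones and folds as listed in the training commands above.
# FSS-1000 has no fold component in its config path, marked here by None.
SETTINGS = {
    "pascal": {"backbones": (50, 101), "folds": (0, 1, 2, 3)},
    "coco": {"backbones": (50,), "folds": (0, 1, 2, 3)},
    "fss": {"backbones": (50, 101), "folds": (None,)},
}


def config_paths(benchmark):
    """List the concrete config.yaml paths for one benchmark."""
    spec = SETTINGS[benchmark]
    paths = []
    for backbone, fold in product(spec["backbones"], spec["folds"]):
        name = f"{benchmark}_resnet{backbone}"
        if fold is None:
            paths.append(f"config/{name}/config.yaml")
        else:
            paths.append(f"config/{name}/{name}_fold{fold}/config.yaml")
    return paths


if __name__ == "__main__":
    # Print one training command per backbone/fold combination.
    for benchmark in SETTINGS:
        for path in config_paths(benchmark):
            print(f'python train.py --config "{path}"')
```

The same pattern applies to the `--load` paths in the Evaluation section, which use the same backbone and fold placeholders.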

Evaluation

Download the pre-trained weights from [Link].

Results on PASCAL-5ⁱ:

  python test.py --load "/path_to_pretrained_model/pascal_resnet{50, 101}/pascal_resnet{50, 101}_fold{0, 1, 2, 3}/"

Results on COCO-20ⁱ:

  python test.py --load "/path_to_pretrained_model/coco_resnet50/coco_resnet50_fold{0, 1, 2, 3}/"

Results on FSS-1000:

  python test.py --load "/path_to_pretrained_model/fss_resnet{50, 101}/"

Acknowledgement

We borrow code from public projects (huge thanks to all the projects). We mainly borrow code from HSNet.

Owner

Hamacojr
