image scene graph generation benchmark

Overview

Scene Graph Benchmark in PyTorch 1.7

This project is based on maskrcnn-benchmark

alt text

Highlights

  • Upgrad to pytorch 1.7
  • Multi-GPU training and inference
  • Batched inference: can perform inference using multiple images per batch per GPU.
  • Fast and flexible tsv dataset format
  • Remove FasterRCNN detector dependency: during relation head training, can plugin bounding boxes from any detector.
  • Provides pre-trained models for different scene graph detection algorithms (IMP, MSDN, GRCNN, Neural Motif, RelDN).
  • Provides bounding box level and relation level feature extraction functionalities
  • Provides large detector backbones (ResNxt152)

Installation

Check INSTALL.md for installation instructions.

Model Zoo and Baselines

Pre-trained models can be found in SCENE_GRAPH_MODEL_ZOO.md

Visualization and Demo

We provide a helper class to simplify writing inference pipelines using pre-trained models (Currently only support objects and attributes). Here is how we would do it. Run the following commands:

# visualize VinVL object detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.obj.jpg MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize VinVL object-attribute detection
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/demo/demo_image.py --config_file sgg_configs/vgattr/vinvl_x152c4.yaml --img_file demo/woman_fish.jpg --save_file output/woman_fish_x152c4.attr.jpg --visualize_attr MODEL.WEIGHT pretrained_model/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 TEST.IGNORE_BOX_REGRESSION False

# visualize OpenImage scene graph generation by RelDN
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/sgg_model_zoo/sgg_oi_vrd_model_zoo/RX152FPN_reldn_oi_best.pth
python tools/demo/demo_image.py --config_file sgg_configs/vrd/R152FPN_vrd_reldn.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file output/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.reldn_relation.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False

# visualize Visual Genome scene graph generation by neural motif
python tools/demo/demo_image.py --config_file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml --img_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa.jpg --save_file demo/1024px-Gen_Robert_E_Lee_on_Traveler_at_Gettysburg_Pa_vgnm.jpg --visualize_relation MODEL.ROI_RELATION_HEAD.DETECTOR_PRE_CALCULATED False DATASETS.LABELMAP_FILE "visualgenome/VG-SGG-dicts-danfeiX-clipped.json" DATA_DIR /home/penzhan/GitHub/maskrcnn-benchmark-1/datasets1 MODEL.ROI_RELATION_HEAD.USE_BIAS True MODEL.ROI_RELATION_HEAD.FILTER_NON_OVERLAP True MODEL.ROI_HEADS.DETECTIONS_PER_IMG 64 MODEL.ROI_RELATION_HEAD.SHARE_BOX_FEATURE_EXTRACTOR False MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.OBJ_LSTM_NUM_LAYERS 0 MODEL.ROI_RELATION_HEAD.NEURAL_MOTIF.EDGE_LSTM_NUM_LAYERS 2 TEST.IMS_PER_BATCH 2

Perform training

For the following examples to work, you need to first install this repo.

You will also need to download the dataset. Datasets can be downloaded by azcopy with following command:

path/to/azcopy copy 'https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/datasets/TASK_NAME' <target folder> --recursive

TASK_NAME could be visualgenome, openimages_v5c.

We recommend to symlink the path to the dataset to datasets/ as follows

# symlink the dataset
cd ~/github/maskrcnn-benchmark
mkdir -p datasets/openimages_v5c/
ln -s /vrd datasets/openimages_v5c/vrd

You can also prepare your own datasets.

Follow tsv dataset creation instructions tools/mini_tsv/README.md

Single GPU training

python tools/train_sg_net.py --config-file "/path/to/config/file.yaml"

This should work out of the box and is very similar to what we should do for multi-GPU training. But the drawback is that it will use much more GPU memory. The reason is that we set in the configuration files a global batch size that is divided over the number of GPUs. So if we only have a single GPU, this means that the batch size for that GPU will be 4x larger, which might lead to out-of-memory errors.

Multi-GPU training

We use internally torch.distributed.launch in order to launch multi-gpu training. This utility function from PyTorch spawns as many Python processes as the number of GPUs we want to use, and each Python process will only use a single GPU.

export NGPUS=4
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/train_sg_net.py --config-file "path/to/config/file.yaml" 

Evaluation

You can test your model directly on single or multiple gpus. To evaluate relations, one needs to output "relation_scores_all" in the TSV_SAVE_SUBSET. Here are a few example command line for evaluating on 4 GPUS:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH 

# vg IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_imp.yaml

# vg MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_msdn.yaml

# vg neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_nm.yaml

# vg GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_grcnn.yaml

# vg RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vg_vrd/rel_danfeiX_FPN50_reldn.yaml

# oi IMP evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_imp_bias_oi.yaml

# oi MSDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_msdn_bias_oi.yaml

# oi neural motif evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_motif_oi.yaml

# oi GRCNN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/oi_vrd/R152FPN_grcnn_oi.yaml

# oi RelDN evaluation
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file sgg_configs/vrd/R152FPN_vrd_reldn.yaml

To evaluate in sgcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_BOX_HEAD.FORCE_BOXES True MODEL.ROI_RELATION_HEAD.MODE "sgcls"

To evaluate in predcls mode:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.MODE "predcls"

To evaluate with ground truth bbox and ground truth pairs:

export NGPUS=4

python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_sg_net.py --config-file CONFIG_FILE_PATH MODEL.ROI_RELATION_HEAD.FORCE_RELATIONS True

Abstractions

For more information on some of the main abstractions in our implementation, see ABSTRACTIONS.md.

Adding your own dataset

This implementation adds support for TSV style datasets. But adding support for training on a new dataset can be done as follows:

from maskrcnn_benchmark.data.datasets.relation_tsv import RelationTSVDataset

class MyDataset(RelationTSVDataset):
    def __init__(self, yaml_file, extra_fields=(), transforms=None,
            is_load_label=True, **kwargs):

        super(MyDataset, self).__init__(yaml_file, extra_fields, transforms, is_load_label, **kwargs)
    
    def your_own_function(self, idx, call=False):
        # you can overwrite function or add your own functions this way
        pass

That's it. You can also add extra fields to the boxlist, such as segmentation masks (using structures.segmentation_mask.SegmentationMask), or even your own instance type.

For a full example of how the VGTSVDataset is implemented, check maskrcnn_benchmark/data/datasets/vg_tsv.py.

Once you have created your dataset, it needs to be added in a couple of places:

Adding your own evaluation

To enable your dataset for testing, add a corresponding if statement in maskrcnn_benchmark/data/datasets/evaluation/__init__.py:

if isinstance(dataset, datasets.MyDataset):
        return your_evaluation(**args)

VinVL Feature extraction

The output feature will be encoded as base64

# extract vision features with VinVL object-attribute detection model
# pretrained models at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/vinvl_vg_x152c4.pth
# the associated labelmap at https://penzhanwu2.blob.core.windows.net/sgg/sgg_benchmark/vinvl_model_zoo/VG-SGG-dicts-vgoi6-clipped.json
python tools/test_sg_net.py --config-file sgg_configs/vgattr/vinvl_x152c4.yaml TEST.IMS_PER_BATCH 2 MODEL.WEIGHT models/vinvl/vinvl_vg_x152c4.pth MODEL.ROI_HEADS.NMS_FILTER 1 MODEL.ROI_HEADS.SCORE_THRESH 0.2 DATA_DIR "../maskrcnn-benchmark-1/datasets1" TEST.IGNORE_BOX_REGRESSION True MODEL.ATTRIBUTE_ON True

To extract relation features (union bounding box's feature), in yaml file, set TEST.OUTPUT_RELATION_FEATURE to True, add relation_feature in TEST.TSV_SAVE_SUBSET.

To extract bounding box features, in yaml file, set TEST.OUTPUT_FEATURE to True, add feature in TEST.TSV_SAVE_SUBSET.

Troubleshooting

If you have issues running or compiling this code, we have compiled a list of common issues in TROUBLESHOOTING.md. If your issue is not present there, please feel free to open a new issue.

Citations

Please consider citing this project in your publications if it helps your research. The following is a BibTeX reference. The BibTeX entry requires the url LaTeX package.

@misc{han2021image,
      title={Image Scene Graph Generation (SGG) Benchmark}, 
      author={Xiaotian Han and Jianwei Yang and Houdong Hu and Lei Zhang and Jianfeng Gao and Pengchuan Zhang},
      year={2021},
      eprint={2107.12604},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

maskrcnn-benchmark is released under the MIT license. See LICENSE for additional details.

Acknowledgement

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss

CAMoE + Dual SoftMax Loss (DSL): Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss This is official implement of "

程星 87 Dec 24, 2022
Official code of paper: MovingFashion: a Benchmark for the Video-to-Shop Challenge

SEAM Match-RCNN Official code of MovingFashion: a Benchmark for the Video-to-Shop Challenge paper Installation Requirements: Pytorch 1.5.1 or more rec

HumaticsLAB 31 Oct 10, 2022
Keras-1D-NN-Classifier

Keras-1D-NN-Classifier This code is based on the reference codes linked below. reference 1, reference 2 This code is for 1-D array data classification

Jae-Hoon Shim 6 May 18, 2021
Fast Soft Color Segmentation

Fast Soft Color Segmentation

3 Oct 29, 2022
Implementation of ICCV2021(Oral) paper - VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation Created by Zeyu HU Introduction This work is based on our paper VMNet: Voxel-Mes

HU Zeyu 82 Dec 27, 2022
discovering subdomains, hidden paths, extracting unique links

python-website-crawler discovering subdomains, hidden paths, extracting unique links pip install -r requirements.txt discover subdomain: You can give

merve 4 Sep 05, 2022
A simple library that implements CLIP guided loss in PyTorch.

pytorch_clip_guided_loss: Pytorch implementation of the CLIP guided loss for Text-To-Image, Image-To-Image, or Image-To-Text generation. A simple libr

Sergei Belousov 74 Dec 26, 2022
SpecAugmentPyTorch - A Pytorch (support batch and channel) implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment An implementation of SpecAugment for Pytorch How to use Install pytorch, version=1.9.0 (new feature (torch.Tensor.take_along_dim) is used

IMLHF 3 Oct 11, 2022
Pytorch Performace Tuning, WandB, AMP, Multi-GPU, TensorRT, Triton

Plant Pathology 2020 FGVC7 Introduction A deep learning model pipeline for training, experimentaiton and deployment for the Kaggle Competition, Plant

Bharat Giddwani 0 Feb 25, 2022
The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for LiDAR-Based Place Recognition.

OverlapTransformer The code for our paper submitted to RAL/IROS 2022: OverlapTransformer: An Efficient and Rotation-Invariant Transformer Network for

HAOMO.AI 136 Jan 03, 2023
Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs.

Lunar Lunar is a neural network aimbot that uses real-time object detection accelerated with CUDA on Nvidia GPUs. About Lunar can be modified to work

Zeyad Mansour 276 Jan 07, 2023
Repo for EMNLP 2021 paper "Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression"

beyond-preserved-accuracy Repo for EMNLP 2021 paper "Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression" How to implemen

Kevin Canwen Xu 10 Dec 23, 2022
quantize aware training package for NCNN on pytorch

ncnnqat ncnnqat is a quantize aware training package for NCNN on pytorch. Table of Contents ncnnqat Table of Contents Installation Usage Code Examples

62 Nov 23, 2022
Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Dominik Klein 189 Dec 21, 2022
SplineConv implementation for Paddle.

SplineConv implementation for Paddle This module implements the SplineConv operators from Matthias Fey, Jan Eric Lenssen, Frank Weichert, Heinrich Mül

北海若 3 Dec 29, 2021
The Official Repository for "Generalized OOD Detection: A Survey"

Generalized Out-of-Distribution Detection: A Survey 1. Overview This repository is with our survey paper: Title: Generalized Out-of-Distribution Detec

Jingkang Yang 338 Jan 03, 2023
This is the offical website for paper ''Category-consistent deep network learning for accurate vehicle logo recognition''

The Pytorch Implementation of Category-consistent deep network learning for accurate vehicle logo recognition This is the offical website for paper ''

Wanglong Lu 28 Oct 29, 2022
Source code for the BMVC-2021 paper "SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation".

SimReg: A Simple Regression Based Framework for Self-supervised Knowledge Distillation Source code for the paper "SimReg: Regression as a Simple Yet E

9 Oct 15, 2022
Election Exit Poll Prediction and U.S.A Presidential Speech Analysis using Machine Learning

Machine_Learning Election Exit Poll Prediction and U.S.A Presidential Speech Analysis using Machine Learning This project is based on 2 case-studies:

Avnika Mehta 1 Jan 27, 2022
Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Google 89 Dec 22, 2022