Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"

Overview

ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers

Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu, Shashank Shekhar†, Graham W. Taylor†, Maksims Volkovs

  • * Denotes equal contribution
  • † University of Guelph / Vector Institute

Prerequisites and Environment

  • pytorch-gpu 1.13.1
  • numpy 1.16.0
  • tqdm

All experiments were conducted on a 20-core Intel(R) Xeon(R) CPU E5-2630 v4 @2.20GHz and 4 NVIDIA V100 GPUs with 32GB GPU memory.

Dataset

Visual Genome

Download it here. Unzip it under the data folder. You should see a vg folder unzipped there. It contains .json annotations that suit the dataloader used in this repo.

Visual Relation Detection

See Images:VRD

Images

Visual Genome

Create a folder for all images:

# ROOT=path/to/cloned/repository
cd $ROOT/data/vg
mkdir VG_100K

Download Visual Genome images from the official page. Unzip all images (part 1 and part 2) into VG_100K/. There should be a total of 108249 files.

Visual Relation Detection

Create the vrd folder under data:

# ROOT=path/to/cloned/repository
cd $ROOT/data/vrd

Download the original annotation json files from here and unzip json_dataset.zip here. The images can be downloaded from here. Unzip sg_dataset.zip to create an sg_dataset folder in data/vrd. Next run the preprocessing scripts:

cd $ROOT
python tools/rename_vrd_with_numbers.py
python tools/convert_vrd_anno_to_coco_format.py

rename_vrd_with_numbers.py converts all non-jpg images (some images are in png or gif) to jpg, and renames them in the {:012d}.jpg format (e.g., "000000000001.jpg"). It also creates new relationship annotations other than the original ones. This is mostly to make things easier for the dataloader. The filename mapping from the original is stored in data/vrd/*_fname_mapping.json where "*" is either "train" or "val".

convert_vrd_anno_to_coco_format.py creates object detection annotations from the new annotations generated above, which are required by the dataloader during training.

Pre-trained Object Detection Models

Download pre-trained object detection models here. Unzip it under the root directory. Note: We do not include code for training object detectors. Please refer to the "(Optional) Training Object Detection Models" section in Large-Scale-VRD.pytorch for this.

Directory Structure

The final directories should look like:

|-- data
|   |-- detections_train.json
|   |-- detections_val.json
|   |-- new_annotations_train.json
|   |-- new_annotations_val.json
|   |-- objects.json
|   |-- predicates.json
|-- evaluation
|-- output
|   |-- pair_predicate_dict.dat
|   |-- train_data.dat
|   |-- valid_data.dat
|-- config.py
|-- core.py
|-- data_utils.py
|-- evaluation_utils.py
|-- feature_utils.py
|-- file_utils.py
|-- preprocess.py
|-- trainer.py
|-- transformer.py

Evaluating Pre-trained Relationship Detection models

DO NOT CHANGE anything in the provided config files(configs/xx/xxxx.yaml) even if you want to test with less or more than 8 GPUs. Use the environment variable CUDA_VISIBLE_DEVICES to control how many and which GPUs to use. Remove the --multi-gpu-test for single-gpu inference.

Visual Genome

NOTE: May require at least 64GB RAM to evaluate on the Visual Genome test set

We use three evaluation metrics for Visual Genome:

  1. SGDET: predict all the three labels and two boxes
  2. SGCLS: predict subject, object and predicate labels given ground truth subject and object boxes
  3. PRDCLS: predict predicate labels given ground truth subject and object boxes and labels

Training Scene Graph Generation Models

With the following command lines, the training results (models and logs) should be in $ROOT/Outputs/xxx/ where xxx is the .yaml file name used in the command without the ".yaml" extension. If you want to test with your trained models, simply run the test commands described above by setting --load_ckpt as the path of your trained models.

Visual Relation Detection

To train our scene graph generation model on the VRD dataset, run

python preprocess.py

python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 500 --learning-rate 1e-3

python preprocess_evaluation.py

python write_prediction.py

mv prediction.txt evaluation/vrd/

cd evaluation/vrd

python run_all_for_vrd.py prediction.txt

Visual Genome

To train our scene graph generation model on the VG dataset, download the json files from https://visualgenome.org/api/v0/api_home.html, put the extracted files under data and then run

python preprocess.py

python trainer.py --num-encoder-layers 4 --num-decoder-layers 2 --nhead 4 --num-epochs 2000 --learning-rate 1e-3

python preprocess_evaluation.py

python write_prediction.py

mv prediction.txt evaluation/vg/

cd evaluation/vg

python run_all.py prediction.txt

Acknowledgements

This repository uses code based on the ContrastiveLosses4VRD Ji Zhang, Neural-Motifs source code from Rowan Zellers.

Citation

If you find this code useful in your research, please cite the following paper:

@inproceedings{lu2021seq2seq,
  title={Context-aware Scene Graph Generation with Seq2Seq Transformers},
  author={Yichao Lu, Himanshu Rai, Jason Chang, Boris Knyazev, Guangwei Yu, Shashank Shekhar, Graham W. Taylor, Maksims Volkovs},
  booktitle={ICCV},
  year={2021}
}
Owner
Layer6 Labs
Research repositories from Layer 6 AI.
Layer6 Labs
Repository accompanying the "Sign Pose-based Transformer for Word-level Sign Language Recognition" paper

by Matyáš Boháček and Marek Hrúz, University of West Bohemia Should you have any questions or inquiries, feel free to contact us here. Repository acco

Matyáš Boháček 30 Dec 30, 2022
Perform Linear Classification with Multi-way Data

MultiwayClassification This is an R package to perform linear classification for data with multi-way structure. The distance-weighted discrimination (

Eric F. Lock 2 Dec 15, 2020
1st place solution in CCF BDCI 2021 ULSEG challenge

1st place solution in CCF BDCI 2021 ULSEG challenge This is the source code of the 1st place solution for ultrasound image angioma segmentation task (

Chenxu Peng 30 Nov 22, 2022
[ICCV21] Self-Calibrating Neural Radiance Fields

Self-Calibrating Neural Radiance Fields, ICCV, 2021 Project Page | Paper | Video Author Information Yoonwoo Jeong [Google Scholar] Seokjun Ahn [Google

381 Dec 30, 2022
Fast SHAP value computation for interpreting tree-based models

FastTreeSHAP FastTreeSHAP package is built based on the paper Fast TreeSHAP: Accelerating SHAP Value Computation for Trees published in NeurIPS 2021 X

LinkedIn 369 Jan 04, 2023
bio_inspired_min_nets_improve_the_performance_and_robustness_of_deep_networks

Code Submission for: Bio-inspired Min-Nets Improve the Performance and Robustness of Deep Networks Run with docker To build a docker environment, chan

0 Dec 09, 2021
Generating Radiology Reports via Memory-driven Transformer

R2Gen This is the implementation of Generating Radiology Reports via Memory-driven Transformer at EMNLP-2020. Citations If you use or extend our work,

CUHK-SZ NLP Group 101 Dec 13, 2022
COLMAP - Structure-from-Motion and Multi-View Stereo

COLMAP About COLMAP is a general-purpose Structure-from-Motion (SfM) and Multi-View Stereo (MVS) pipeline with a graphical and command-line interface.

4.7k Jan 07, 2023
A python code to convert Keras pre-trained weights to Pytorch version

Weights_Keras_2_Pytorch 最近想在Pytorch项目里使用一下谷歌的NIMA,但是发现没有预训练好的pytorch权重,于是整理了一下将Keras预训练权重转为Pytorch的代码,目前是支持Keras的Conv2D, Dense, DepthwiseConv2D, Batch

Liu Hengyu 2 Dec 16, 2021
The official repository for "Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds"

Revealing unforeseen diagnostic image features with deep learning by detecting cardiovascular diseases from apical four-chamber ultrasounds The why Im

3 Mar 29, 2022
Human Pose estimation with TensorFlow framework

Human Pose Estimation with TensorFlow Here you can find the implementation of the Human Body Pose Estimation algorithm, presented in the DeeperCut and

Eldar Insafutdinov 1.1k Dec 29, 2022
Convert human motion from video to .bvh

video_to_bvh Convert human motion from video to .bvh with Google Colab Usage 1. Open video_to_bvh.ipynb in Google Colab Go to https://colab.research.g

Dene 306 Dec 10, 2022
An unsupervised learning framework for depth and ego-motion estimation from monocular videos

SfMLearner This codebase implements the system described in the paper: Unsupervised Learning of Depth and Ego-Motion from Video Tinghui Zhou, Matthew

Tinghui Zhou 1.8k Dec 30, 2022
Deep learning PyTorch library for time series forecasting, classification, and anomaly detection

Deep learning for time series forecasting Flow forecast is an open-source deep learning for time series forecasting framework. It provides all the lat

AIStream 1.2k Jan 04, 2023
Final report with code for KAIST Course KSE 801.

Orthogonal collocation is a method for the numerical solution of partial differential equations

Chuanbo HUA 4 Apr 06, 2022
SVG Icon processing tool for C++

BAWR This is a tool to automate the icons generation from sets of svg files into fonts and atlases. The main purpose of this tool is to add it to the

Frank David Martínez M 66 Dec 14, 2022
Materials for my scikit-learn tutorial

Scikit-learn Tutorial Jake VanderPlas email: [email protected] twitter: @jakevdp gith

Jake Vanderplas 1.6k Dec 30, 2022
CVPRW 2021: How to calibrate your event camera

E2Calib: How to Calibrate Your Event Camera This repository contains code that implements video reconstruction from event data for calibration as desc

Robotics and Perception Group 104 Nov 16, 2022
Automatic labeling, conversion of different data set formats, sample size statistics, model cascade

Simple Gadget Collection for Object Detection Tasks Automatic image annotation Conversion between different annotation formats Obtain statistical info

llt 4 Aug 24, 2022
Interpretable-contrastive-word-mover-s-embedding

Interpretable-contrastive-word-mover-s-embedding Paper Datasets Here is a Dropbox link to the datasets used in the paper: https://www.dropbox.com/sh/n

0 Nov 02, 2021