Code repository for our paper "Learning to Generate Scene Graph from Natural Language Supervision" in ICCV 2021

Overview

Scene Graph Generation from Natural Language Supervision

This repository includes the Pytorch code for our paper "Learning to Generate Scene Graph from Natural Language Supervision" accepted in ICCV 2021.

overview figure

Top (our setting): Our goal is learning to generate localized scene graphs from image-text pairs. Once trained, our model takes an image and its detected objects as inputs and outputs the image scene graph. Bottom (our results): A comparison of results from our method and state-of-the-art (SOTA) with varying levels of supervision.

Contents

  1. Overview
  2. Qualitative Results
  3. Installation
  4. Data
  5. Metrics
  6. Pretrained Object Detector
  7. Pretrained Scene Graph Generation Models
  8. Model Training
  9. Model Evaluation
  10. Acknowledgement
  11. Reference

Overview

Learning from image-text data has demonstrated recent success for many recognition tasks, yet is currently limited to visual features or individual visual concepts such as objects. In this paper, we propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as scene graph. To bridge the gap between images and texts, we leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graph. Further, we design a Transformer-based model to predict these "pseudo" labels via a masked token prediction task. Learning from only image-sentence pairs, our model achieves 30% relative gain over a latest method trained with human-annotated unlocalized scene graphs. Our model also shows strong results for weakly and fully supervised scene graph generation. In addition, we explore an open-vocabulary setting for detecting scene graphs, and present the first result for open-set scene graph generation.

Qualitative Results

Our generated scene graphs learned from image descriptions

overview figure

Partial visualization of Figure 3 in our paper: Our model trained by image-sentence pairs produces scene graphs with a high quality (e.g. "man-on-motorcycle" and "man-wearing-helmet" in first example). More comparison with other models trained by stronger supervision (e.g. unlocalized scene graph labels, localized scene graph labels) can be viewed in the Figure 3 of paper.

Our generated scene graphs in open-set and closed-set settings

overview figure

Figure 4 in our paper: We explored open-set setting where the categories of target concepts (objects and predicates) are unknown during training. Compared to our closed-set model, our open-set model detects more concepts outside the evaluation dataset, Visual Genome (e.g. "swinge", "mouse", "keyboard"). Our results suggest an exciting avenue of large-scale training of open-set scene graph generation using image captioning dataset such as Conceptual Caption.

Installation

Check INSTALL.md for installation instructions.

Data

Check DATASET.md for instructions of data downloading.

Metrics

Explanation of metrics in this toolkit are given in METRICS.md

Pretrained Object Detector

In this project, we primarily use the detector Faster RCNN pretrained on Open Images dataset. To use this repo, you don't need to run this detector. You can directly download the extracted detection features, as the instruction in DATASET.md. If you're interested in this detector, the pretrained model can be found in TensorFlow 1 Detection Model Zoo: faster_rcnn_inception_resnet_v2_atrous_oidv4.

Additionally, to compare with previous fully supervised models, we also use the detector pretrained by Scene-Graph-Benchmark. You can download this Faster R-CNN model and extract all the files to the directory checkpoints/pretrained_faster_rcnn.

Pretrained Scene Graph Generation Models

Our pretrained SGG models can be downloaded on Google Drive. The details of these models can be found in Model Training section below. After downloading, please put all the folders to the directory checkpoints/. More pretrained models will be released. Stay tuned!

Model Training

To train our scene graph generation models, run the script

bash train.sh MODEL_TYPE

where MODEL_TYPE specifies the training supervision, the training dataset and the scene graph generation model. See details below.

  1. Language supervised models: trained by image-text pairs

    • Language_CC-COCO_Uniter: train our Transformer-based model on Conceptual Caption (CC) and COCO Caption (COCO) datasets
    • Language_*_Uniter: train our Transformer-based model on single dataset. * represents the dataset name and can be CC, COCO, and VG
    • Language_OpensetCOCO_Uniter: train our Transformer-based model on COCO dataset in open-set setting
    • Language_CC-COCO_MotifNet: train Motif-Net model with language supervision from CC and COCO datasets
  2. Weakly supervised models: trained by unlocalized scene graph labels

    • Weakly_Uniter: train our Transformer-based model
  3. Fully supervised models: trained by localized scene graph labels

    • Sup_Uniter: train our Transformer-based model
    • Sup_OnlineDetector_Uniter: train our Transformer-based model by using the object detector from Scene-Graph-Benchmark.

You can set CUDA_VISIBLE_DEVICES in train.sh to specify which GPUs are used for model training (e.g., the default script uses 2 GPUs).

Model Evaluation

To evaluate the trained scene graph generation model, you can reuse the commands in train.sh by simply changing WSVL.SKIP_TRAIN to True and setting OUTPUT_DIR as the path to your trained model. One example can be found in test.sh and just run bash test.sh.

Acknowledgement

This repository was built based on Scene-Graph-Benchmark for scene graph generation and UNITER for image-text representation learning.

We specially would like to thank Pengchuan Zhang for providing the object detector pretrained on Objects365 dataset which was used in our ablation study.

Reference

If you are using our code, please consider citing our paper.

@inproceedings{zhong2021SGGfromNLS,
  title={Learning to Generate Scene Graph from Natural Language Supervision},
  author={Zhong, Yiwu and Shi, Jing and Yang, Jianwei and Xu, Chenliang and Li, Yin},
  booktitle={ICCV},
  year={2021}
}
Owner
Yiwu Zhong
Ph.D. Student of Computer Science in University of Wisconsin-Madison
Yiwu Zhong
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
Code related to the manuscript "Averting A Crisis In Simulation-Based Inference"

Abstract We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms are inadequate for the falsificat

Montefiore Artificial Intelligence Research 3 Nov 14, 2022
Adversarial Autoencoders

Adversarial Autoencoders (with Pytorch) Dependencies argparse time torch torchvision numpy itertools matplotlib Create Datasets python create_datasets

Felipe Ducau 188 Jan 01, 2023
Implementation for NeurIPS 2021 Submission: SparseFed

READ THIS FIRST This repo is an anonymized version of an existing repository of GitHub, for the AIStats 2021 submission: SparseFed: Mitigating Model P

2 Jun 15, 2022
A toolset of Python programs for signal modeling and indentification via sparse semilinear autoregressors.

SPAAR Description A toolset of Python programs for signal modeling via sparse semilinear autoregressors. References Vides, F. (2021). Computing Semili

Fredy Vides 0 Oct 30, 2021
dualPC.R contains the R code for the main functions.

dualPC.R contains the R code for the main functions. dualPC_sim.R contains an example run with the different PC versions; it calls dualPC_algs.R whic

3 May 30, 2022
Paddle Graph Learning (PGL) is an efficient and flexible graph learning framework based on PaddlePaddle

DOC | Quick Start | 中文 Breaking News !! 🔥 🔥 🔥 OGB-LSC KDD CUP 2021 winners announced!! (2021.06.17) Super excited to announce our PGL team won TWO

1.5k Jan 06, 2023
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)

Nautilus-OCR The National Library of Luxembourg (BnL) started its first initiative in digitizing newspapers, with layout recognition and OCR on articl

National Library of Luxembourg 36 Dec 05, 2022
Pytorch implementation of paper Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Pytorch implementation of paper Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Hrishikesh Kamath 31 Nov 20, 2022
Improving XGBoost survival analysis with embeddings and debiased estimators

xgbse: XGBoost Survival Embeddings "There are two cultures in the use of statistical modeling to reach conclusions from data

Loft 242 Dec 30, 2022
HistoSeg : Quick attention with multi-loss function for multi-structure segmentation in digital histology images

HistoSeg : Quick attention with multi-loss function for multi-structure segmentation in digital histology images Histological Image Segmentation This

Saad Wazir 11 Dec 16, 2022
Code for paper "Vocabulary Learning via Optimal Transport for Neural Machine Translation"

**Codebase and data are uploaded in progress. ** VOLT(-py) is a vocabulary learning codebase that allows researchers and developers to automaticaly ge

416 Jan 09, 2023
A package related to building quasi-fibration symmetries

qf A package related to building quasi-fibration symmetries. If you'd like to learn more about how it works, see the brief explanation and References

Paolo Boldi 1 Dec 01, 2021
Authors implementation of LieTransformer: Equivariant Self-Attention for Lie Groups

LieTransformer This repository contains the implementation of the LieTransformer used for experiments in the paper LieTransformer: Equivariant self-at

35 Oct 18, 2022
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion: A Machine Learning Library for Time Series Table of Contents Introduction Installation Documentation Getting Started Anomaly Detection Foreca

Salesforce 2.8k Dec 30, 2022
A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

A PyTorch Implementation of GGNN This is a PyTorch implementation of the Gated Graph Sequence Neural Networks (GGNN) as described in the paper Gated G

Ching-Yao Chuang 427 Dec 13, 2022
Meta Learning for Semi-Supervised Few-Shot Classification

few-shot-ssl-public Code for paper Meta-Learning for Semi-Supervised Few-Shot Classification. [arxiv] Dependencies cv2 numpy pandas python 2.7 / 3.5+

Mengye Ren 501 Jan 08, 2023
CoRe: Contrastive Recurrent State-Space Models

CoRe: Contrastive Recurrent State-Space Models This code implements the CoRe model and reproduces experimental results found in Robust Robotic Control

Apple 21 Aug 11, 2022
Hierarchical Few-Shot Generative Models

Hierarchical Few-Shot Generative Models Giorgio Giannone, Ole Winther This repo contains code and experiments for the paper Hierarchical Few-Shot Gene

Giorgio Giannone 6 Dec 12, 2022
Stochastic Normalizing Flows

Stochastic Normalizing Flows We introduce stochasticity in Boltzmann-generating flows. Normalizing flows are exact-probability generative models that

AI4Science group, FU Berlin (Frank Noé and co-workers) 50 Dec 16, 2022