A PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral)

Last update: Dec 19, 2022

Overview

❇️ ❇️ Please visit our Project Page to learn more about Panoptic Narrative Grounding. ❇️ ❇️

Panoptic Narrative Grounding

This repository provides a PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral). Panoptic Narrative Grounding is a spatially fine and general formulation of the natural language visual grounding problem. We establish an experimental framework for the study of this new task, including new ground truth and metrics, and we propose a strong baseline method to serve as a stepping stone for future work. We exploit the intrinsic semantic richness in an image by including panoptic categories, and we approach visual grounding at a fine-grained level by using segmentations. In terms of ground truth, we propose an algorithm to automatically transfer Localized Narratives annotations to specific regions in the panoptic segmentations of the MS COCO dataset. The proposed baseline achieves a performance of 55.4 absolute Average Recall points. This result is a suitable foundation to push the envelope further in the development of methods for Panoptic Narrative Grounding.

Paper

Panoptic Narrative Grounding,
Cristina González¹, Nicolás Ayobi¹, Isabela Hernández¹, José Hernández¹, Jordi Pont-Tuset², Pablo Arbeláez¹
ICCV 2021 Oral.

¹ Center for Research and Formation in Artificial Intelligence (CINFONIA), Universidad de Los Andes.
² Google Research, Switzerland.

Installation

Requirements

Python
NumPy
PyTorch 1.7.1
tqdm 4.56.0
SciPy 1.5.3

Cloning the repository

$ git clone git@github.com:BCV-Uniandes/PNG.git
$ cd PNG

Dataset Preparation

Panoptic Narrative Grounding Benchmark

Download the 2017 MSCOCO Dataset from its official webpage. You will need the images and the panoptic segmentation annotations of the train and validation splits.
Download the Panoptic Narrative Grounding Benchmark and pre-computed features from our project webpage, and arrange them in the following folder structure:

panoptic_narrative_grounding
|_ images
|  |_ train2017
|  |_ val2017
|_ features
|  |_ train2017
|  |  |_ mask_features
|  |  |_ sem_seg_features
|  |  |_ panoptic_seg_predictions
|  |_ val2017
|     |_ mask_features
|     |_ sem_seg_features
|     |_ panoptic_seg_predictions
|_ annotations
   |_ png_coco_train2017.json
   |_ png_coco_val2017.json
   |_ panoptic_segmentation
      |_ train2017
      |_ val2017
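
Before training, it can help to confirm that the downloaded data matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the constant `EXPECTED_PATHS` are hypothetical, and the root folder name is simply the one shown in the tree. It only checks that each listed folder or file exists, not that its contents are complete.

```python
from pathlib import Path

# Relative paths copied from the folder structure listed above.
EXPECTED_PATHS = [
    "images/train2017",
    "images/val2017",
    "features/train2017/mask_features",
    "features/train2017/sem_seg_features",
    "features/train2017/panoptic_seg_predictions",
    "features/val2017/mask_features",
    "features/val2017/sem_seg_features",
    "features/val2017/panoptic_seg_predictions",
    "annotations/png_coco_train2017.json",
    "annotations/png_coco_val2017.json",
    "annotations/panoptic_segmentation/train2017",
    "annotations/panoptic_segmentation/val2017",
]


def missing_paths(root):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    import sys

    # Default root is the top-level folder name from the tree above.
    root = sys.argv[1] if len(sys.argv) > 1 else "panoptic_narrative_grounding"
    missing = missing_paths(root)
    if missing:
        print("Missing entries under", root)
        for rel in missing:
            print("  ", rel)
    else:
        print("Layout looks complete.")
```

Saved under any file name, the script takes the data root as its only argument and prints the entries that are still missing.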

Train setup:

Modify the paths in train_net.sh according to your local setup.

python main --init_method "tcp://localhost:8080" NUM_GPUS 1 DATA.PATH_TO_DATA_DIR path_to_your_data_dir DATA.PATH_TO_FEATURES_DIR path_to_your_features_dir OUTPUT_DIR output_dir

Test setup:

Modify the paths in test_net.sh according to your local setup.

python main --init_method "tcp://localhost:8080" NUM_GPUS 1 DATA.PATH_TO_DATA_DIR path_to_your_data_dir DATA.PATH_TO_FEATURES_DIR path_to_your_features_dir OUTPUT_DIR output_dir TRAIN.ENABLE "False"
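
The train and test commands above differ only in the trailing TRAIN.ENABLE "False" override. As an illustration, here is a small sketch of a Python helper that assembles either command as an argument list. The function `build_command` is hypothetical and not part of the repository; the option names and the `python main` entry point are copied verbatim from the commands above, and the paths in the usage line are placeholders.

```python
import shlex


def build_command(data_dir, features_dir, output_dir, train=True,
                  num_gpus=1, init_method="tcp://localhost:8080"):
    """Assemble the command shown above as a list of arguments.

    Hypothetical helper: it only builds the argument list and does not
    launch anything.
    """
    cmd = [
        "python", "main",
        "--init_method", init_method,
        "NUM_GPUS", str(num_gpus),
        "DATA.PATH_TO_DATA_DIR", str(data_dir),
        "DATA.PATH_TO_FEATURES_DIR", str(features_dir),
        "OUTPUT_DIR", str(output_dir),
    ]
    if not train:
        # The test command adds this single override to disable training.
        cmd += ["TRAIN.ENABLE", "False"]
    return cmd


# Placeholder paths; replace them with your local directories.
print(shlex.join(build_command("path_to_your_data_dir",
                               "path_to_your_features_dir",
                               "output_dir", train=False)))
```

The resulting list can be passed to `subprocess.run` from the repository root, which avoids shell quoting issues when paths contain spaces.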

Pretrained model

To reproduce all our results as reported below, you can use our pretrained model and our source code. The tables report Average Recall.

Method	things + stuff	things	stuff
Oracle	64.4	67.3	60.4
Ours	55.4	56.2	54.3
MCN	-	48.2	-

Method	singulars + plurals	singulars	plurals
Oracle	64.4	64.8	60.7
Ours	55.4	56.2	48.8
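
The numbers above are Average Recall values. This README does not restate how the metric is computed, so the following is only a sketch under a stated assumption: that Average Recall is the area under the curve of recall versus IoU threshold, with one IoU value per grounded noun phrase. The function name `average_recall` and the choice of 101 evenly spaced thresholds are illustrative, and the repository's own evaluation code remains the reference.

```python
def average_recall(ious, num_thresholds=101):
    """Area under the recall-vs-IoU-threshold curve, in [0, 1].

    Assumption (not taken from this README): recall at threshold t is the
    fraction of predictions whose IoU with the ground truth is >= t.
    """
    ious = list(ious)
    if not ious:
        raise ValueError("at least one IoU value is required")
    if num_thresholds < 2:
        raise ValueError("num_thresholds must be at least 2")

    # Evenly spaced thresholds from 0 to 1 inclusive.
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    recall = [sum(iou >= t for iou in ious) / len(ious) for t in thresholds]

    # Trapezoidal integration of the recall curve over the thresholds.
    area = 0.0
    for k in range(1, num_thresholds):
        width = thresholds[k] - thresholds[k - 1]
        area += 0.5 * (recall[k] + recall[k - 1]) * width
    return area


# Toy IoU values, not results from the paper.
print(round(100 * average_recall([0.9, 0.6, 0.3, 0.0]), 1))
```

Multiplying by 100 gives a value on the same scale as the tables, under the assumption above.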

Citation

If you find Panoptic Narrative Grounding useful in your research, please use the following BibTeX entry for citation:

@inproceedings{gonzalez2021png,
  title={Panoptic Narrative Grounding},
  author={Gonz{\'a}lez, Cristina and Ayobi, Nicol{\'a}s and Hern{\'a}ndez, Isabela and Hern{\'a}ndez, Jos{\'e} and Pont-Tuset, Jordi and Arbel{\'a}ez, Pablo},
  booktitle={ICCV},
  year={2021}
}

Owner

Biomedical Computer Vision @ Uniandes
