PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Last update: Nov 17, 2022

Overview

Learning to Generate Grounded Visual Captions without Localization Supervision

This is the PyTorch implementation of our paper:

Learning to Generate Grounded Visual Captions without Localization Supervision
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
European Conference on Computer Vision (ECCV), 2020

[arXiv] [GitHub] [Project]

10-min YouTube Video

How to start

Clone the repo recursively:

git clone --recursive [email protected]:chihyaoma/cyclical-visual-captioning.git

If you didn't clone with the --recursive flag, then you'll need to manually clone the pybind submodule from the top-level directory:

git submodule update --init --recursive

Installation

The proposed cyclical method can be applied directly to image and video captioning tasks.

Currently, installation guide and our code for video captioning on the ActivityNet-Entities dataset are provided in anet-video-captioning.

Acknowledgments

Chih-Yao Ma and Zsolt Kira were partly supported by DARPA’s Lifelong Learning Machines (L2M) program, under Cooperative Agreement HR0011-18-2-0019, as part of their affiliation with Georgia Tech. We thank Chia-Jung Hsu for her valuable and artistic helps on the figures.

Citation

If you find this repository useful, please cite our paper:

@inproceedings{ma2020learning,
    title={Learning to Generate Grounded Image Captions without Localization Supervision},
    author={Ma, Chih-Yao and Kalantidis, Yannis and AlRegib, Ghassan and Vajda, Peter and Rohrbach, Marcus and Kira, Zsolt},
    booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
    year={2020},
    url={https://arxiv.org/abs/1906.00283},
}

PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Related tags

Overview

Learning to Generate Grounded Visual Captions without Localization Supervision

10-min YouTube Video

How to start

Installation

Acknowledgments

Citation

Owner

Chih-Yao Ma

Segmentation models with pretrained backbones. Keras and TensorFlow Keras.

[ECCV 2020] Reimplementation of 3DDFAv2, including face mesh, head pose, landmarks, and more.

HW3 ― GAN, ACGAN and UDA

Code for the paper "How Attentive are Graph Attention Networks?"

Malware Env for OpenAI Gym

natural image generation using ConvNets

Code for "LoRA: Low-Rank Adaptation of Large Language Models"

A list of all papers and resoureces on Semantic Segmentation

Code for the paper 'A High Performance CRF Model for Clothes Parsing'.

General purpose GPU compute framework for cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends)

Lazy, a tool for running things in idle time

(NeurIPS 2021) Pytorch implementation of paper "Re-ranking for image retrieval and transductive few-shot classification"

Tensorforce: a TensorFlow library for applied reinforcement learning

Решения, подсказки, тесты и утилиты для тренировки по алгоритмам от Яндекса.

Code for GNMR in ICDE 2021

Alphabetical Letter Recognition

The official code of "SCROLLS: Standardized CompaRison Over Long Language Sequences".

Controlling Hill Climb Racing with Hand Tacking

Utilities to bridge Canvas-generated course rosters with GitLab's API.

Reference implementation for Deep Unsupervised Learning using Nonequilibrium Thermodynamics