Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Overview

Moer Grounded Image Captioning by Distilling Image-Text Matching Model

Requirements

  • Python 3.7
  • Pytorch 1.2

Prepare data

  1. Please use git clone --recurse-submodules to clone this repository and remember to follow initialization steps in coco-caption/README.md. Then download and place the Flickr30k reference file under coco-caption/annotations. Also, download Stanford CoreNLP 3.9.1 for grounding evaluation and place the uncompressed folder under the tools/ directory.
  2. Download the preprocessd dataset from this link and extract it to data/.
  3. For Flickr30k-Entities, please download bottom-up visual feature extracted by Anderson's extractor (Zhou's extractor) from this link ( link) and place the uncompressed folders under data/flickrbu/. For MSCOCO, please follow this instruction to prepare the bottom-up features and place them under data/mscoco/.
  4. Download the pretrained models from here and extract them to log/.
  5. Download the pretrained SCAN models from this link and extract them to misc/SCAN/runs.

Evaluation

To reproduce the results reported in the paper, just simply run

bash eval_flickr.sh

fro Flickr30k-Entities and

bash eval_coco.sh

for MSCOCO.

Training

  1. In the first training stage, run like
python train.py --id CE-scan-sup-0.1kl --caption_model topdown --input_json data/flickrtalk.json --input_fc_dir data/flickrbu/flickrbu_fc --input_att_dir data/flickrbu/flickrbu_att  --input_box_dir data/flickrbu/flickrbu_box  --input_label_h5 data/flickrtalk_label.h5 --batch_size 29 --learning_rate 5e-4 --learning_rate_decay_start 0 --scheduled_sampling_start 0 --checkpoint_path log/CE-scan-sup-0.1kl --save_checkpoint_every 1000 --val_images_use -1 --max_epochs 30  --att_supervise  True   --att_supervise_weight 0.1
  1. In the second training stage, run like
python train.py --id sc-ground-CE-scan-sup-0.1kl --caption_model topdown --input_json data/flickrtalk.json --input_fc_dir data/flickrbu/flickrbu_fc --input_att_dir data/flickrbu/flickrbu_att  --input_box_dir data/flickrbu/flickrbu_box  --input_label_h5 data/flickrtalk_label.h5 --batch_size 29 --learning_rate 5e-5 --start_from log/CE-scan-sup-0.1kl --checkpoint_path log/sc-ground-CE-scan-sup-0.1kl --save_checkpoint_every 1000 --language_eval 1 --val_images_use -1 --self_critical_after 30  --max_epochs  110      --cider_reward_weight  1
--ground_reward_weight   1 

Citation

@inproceedings{zhou2020grounded,
  title={More Grounded Image Captioning by Distilling Image-Text Matching Model},
  author={Zhou, Yuanen and Wang, Meng and Liu, Daqing and  Hu, Zhenzhen and Zhang, Hanwang},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2020}
}

Acknowledgements

This repository is built upon self-critical.pytorch, SCAN and grounded-video-description. Thanks for their released code.

Owner
YE Zhou
YE Zhou
AITUS - An atomatic notr maker for CYTUS

AITUS an automatic note maker for CYTUS. 利用AI根据指定乐曲生成CYTUS游戏谱面。 效果展示:https://www

GradiusTwinbee 6 Feb 24, 2022
Inkscape extensions for figure resizing and editing

Academic-Inkscape: Extensions for figure resizing and editing This repository contains several Inkscape extensions designed for editing plots. Scale P

192 Dec 26, 2022
Code for "Continuous-Time Meta-Learning with Forward Mode Differentiation" (ICLR 2022)

Continuous-Time Meta-Learning with Forward Mode Differentiation ICLR 2022 (Spotlight) - Installation - Example - Citation This repository contains the

Tristan Deleu 25 Oct 20, 2022
On Out-of-distribution Detection with Energy-based Models

On Out-of-distribution Detection with Energy-based Models This repository contains the code for the experiments conducted in the paper On Out-of-distr

Sven 19 Aug 07, 2022
Minimal diffusion models - Minimal code and simple experiments to play with Denoising Diffusion Probabilistic Models (DDPMs)

Minimal code and simple experiments to play with Denoising Diffusion Probabilist

Rithesh Kumar 16 Oct 06, 2022
MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios

MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios This is the official TensorFlow implementation of MetaTTE in the

morningstarwang 4 Dec 14, 2022
PyTorch version implementation of DORN

DORN_PyTorch This is a PyTorch version implementation of DORN Reference H. Fu, M. Gong, C. Wang, K. Batmanghelich and D. Tao: Deep Ordinal Regression

Zilin.Zhang 3 Apr 27, 2022
MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research

MOOSE (Multi-organ objective segmentation) a data-centric AI solution that generates multilabel organ segmentations to facilitate systemic TB whole-person research.The pipeline is based on nn-UNet an

QIMP team 30 Jan 01, 2023
Artstation-Artistic-face-HQ Dataset (AAHQ)

Artstation-Artistic-face-HQ Dataset (AAHQ) Artstation-Artistic-face-HQ (AAHQ) is a high-quality image dataset of artistic-face images. It is proposed

onion 105 Dec 16, 2022
Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Transformer in Transformer Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image c

Phil Wang 272 Dec 23, 2022
Group Fisher Pruning for Practical Network Compression(ICML2021)

Group Fisher Pruning for Practical Network Compression (ICML2021) By Liyang Liu*, Shilong Zhang*, Zhanghui Kuang, Jing-Hao Xue, Aojun Zhou, Xinjiang W

Shilong Zhang 129 Dec 13, 2022
Project ArXiv Citation Network

Project ArXiv Citation Network Overview This project involved the analysis of the ArXiv citation network. Usage The complete code of this project is i

Dennis Núñez-Fernández 5 Oct 20, 2022
A scikit-learn-compatible module for estimating prediction intervals.

MAPIE - Model Agnostic Prediction Interval Estimator MAPIE allows you to easily estimate prediction intervals (or prediction sets) using your favourit

588 Jan 04, 2023
Build Low Code Automated Tensorflow, What-IF explainable models in just 3 lines of code.

Build Low Code Automated Tensorflow explainable models in just 3 lines of code.

Hasan Rafiq 170 Dec 26, 2022
Implementation of the paper "Shapley Explanation Networks"

Shapley Explanation Networks Implementation of the paper "Shapley Explanation Networks" at ICLR 2021. Note that this repo heavily uses the experimenta

68 Dec 27, 2022
[ICCV 2021] Relaxed Transformer Decoders for Direct Action Proposal Generation

RTD-Net (ICCV 2021) This repo holds the codes of paper: "Relaxed Transformer Decoders for Direct Action Proposal Generation", accepted in ICCV 2021. N

Multimedia Computing Group, Nanjing University 80 Nov 30, 2022
《K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters》(2020)

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters This repository is the implementation of the paper "K-Adapter: Infusing Knowledge

Microsoft 118 Dec 13, 2022
Predict multi paths to a moving person depending on his trajectory history.

Multi-future Trajectory Prediction The project is about using the Multiverse model to make possible multible-future trajectory prediction for a seen p

Said Gamal 1 Jan 18, 2022
Learning to Estimate Hidden Motions with Global Motion Aggregation

Learning to Estimate Hidden Motions with Global Motion Aggregation (GMA) This repository contains the source code for our paper: Learning to Estimate

Shihao Jiang (Zac) 221 Dec 18, 2022
Code repository for the paper Computer Vision User Entity Behavior Analytics

Computer Vision User Entity Behavior Analytics Code repository for "Computer Vision User Entity Behavior Analytics" Code Description dataset.csv As di

Sameer Khanna 2 Aug 20, 2022