WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

Last update: Oct 28, 2022

Overview

Paper "Improving image captioning with better use of captions"

@inproceedings{shi2020improving,
  title={Improving Image Captioning with Better Use of Caption},
  author={Shi, Zhan and Zhou, Xu and Qiu, Xipeng and Zhu, Xiaodan},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={7454--7464},
  year={2020}
}

Requirements

python 2.7.15

torch 1.0.1

The exact conda environment is specified in ezs.yml.

For evaluation, you also need to download the coco-captions and cider folders into this directory.
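Before running evaluation, it can help to confirm that both folders are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_eval_dirs` is hypothetical, and the folder names are taken from the sentence above, so adjust them if your checkout names them differently.

```python
from __future__ import print_function  # keeps print() working on Python 2.7
import os

# Folder names as written in this README (assumed to sit in the repo root).
REQUIRED_DIRS = ("coco-captions", "cider")


def missing_eval_dirs(root="."):
    """Return the required evaluation folders that are absent under `root`."""
    return [d for d in REQUIRED_DIRS
            if not os.path.isdir(os.path.join(root, d))]


if __name__ == "__main__":
    missing = missing_eval_dirs()
    if missing:
        print("Missing evaluation folders:", ", ".join(missing))
    else:
        print("All evaluation folders found.")
```

Run it from the repository root; an empty result means both folders were found.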

Data Files and Models

Files: Copy the files from the data directory on Google Drive or [Baidu Netdisk](https://pan.baidu.com/s/1ddtfdlwD65cm4JmVu6GF3w) (extraction code: 39pa) into the data directory here. See data/README for more details.

Models: Copy the log directory from Google Drive or [Baidu Netdisk](https://pan.baidu.com/s/1ddtfdlwD65cm4JmVu6GF3w) (extraction code: 39pa) into this directory.

Scripts

MLE training:

python train.py --gpus 0 --id experiment-mle

RL training (resumes from the best MLE checkpoint):

python train.py --gpus 0 --id experiment-rl --learning_rate 2e-5 --resume_from experiment-mle --resume_from_best True --self_critical_after 0 --max_epochs 60 --learning_rate_decay_start -1 --scheduled_sampling_start -1 --reduce_on_plateau
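The two training stages above can be chained from a small driver script. This is only a sketch under stated assumptions: the helper names (`mle_command`, `rl_command`, `run_two_stage`) are hypothetical and not part of the repository, and the flags are copied verbatim from the two commands above without any claim about their behaviour beyond what those commands show.

```python
from __future__ import print_function  # keeps print() working on Python 2.7
import subprocess


def mle_command(run_id="experiment-mle", gpus="0"):
    """Argument list for the MLE stage, mirroring the README command."""
    return ["python", "train.py", "--gpus", gpus, "--id", run_id]


def rl_command(run_id="experiment-rl", mle_id="experiment-mle", gpus="0"):
    """Argument list for the RL stage, resuming from the MLE run `mle_id`."""
    return [
        "python", "train.py", "--gpus", gpus, "--id", run_id,
        "--learning_rate", "2e-5",
        "--resume_from", mle_id,
        "--resume_from_best", "True",
        "--self_critical_after", "0",
        "--max_epochs", "60",
        "--learning_rate_decay_start", "-1",
        "--scheduled_sampling_start", "-1",
        "--reduce_on_plateau",
    ]


def run_two_stage(dry_run=True):
    """Print both commands in order; execute them only when dry_run is False."""
    for cmd in (mle_command(), rl_command()):
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises CalledProcessError if a stage fails,
            # so the RL stage never starts after a failed MLE stage.
            subprocess.check_call(cmd)
```

Calling `run_two_stage()` only prints the commands; pass `dry_run=False` to actually launch training. The RL run must use the same MLE id that the first stage was trained with, which is why `rl_command` takes `mle_id` as a parameter.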

Evaluate your own model or load a trained model:

python eval.py --gpus 0 --resume_from experiment-mle

and

python eval.py --gpus 0 --resume_from experiment-rl

Acknowledgement

This code is based on Ruotian Luo's brilliant image captioning repo, ruotianluo/self-critical.pytorch. We use the detected bounding boxes, categories, and features provided by Bottom-Up (peteanderson80/bottom-up-attention) and yangxuntu/SGAE. Many thanks for their work!
