LaBERT - A length-controllable and non-autoregressive image captioning model.

Last update: Nov 13, 2022

Overview

Length-Controllable Image Captioning (ECCV2020)

This repo provides the implementation of the paper Length-Controllable Image Captioning.

Install

conda create --name labert python=3.7
conda activate labert

conda install pytorch=1.3.1 torchvision cudatoolkit=10.1 -c pytorch
pip install h5py tqdm transformers==2.1.1
pip install git+https://github.com/salaniz/pycocoevalcap

Data & Pre-trained Models

Prepare the MSCOCO data by following the link.
Download the pretrained BERT and Faster-RCNN from Baidu Cloud Disk [code: 0j9f] or Google Drive.
- It is a unified checkpoint file containing a pretrained BERT-base and the fc6 layer of the Faster-RCNN.
Download our pretrained LaBERT model from Baidu Cloud Disk [code: fpke] or Google Drive.
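
To check what a downloaded checkpoint contains before training, one option is to load it on the CPU and group its entries by name prefix. The sketch below assumes the file is a plain PyTorch state dict, possibly wrapped under a "model" key; that layout and the helper names `summarize_keys` and `load_checkpoint` are assumptions for illustration, not something this repo documents.

```python
from collections import Counter


def summarize_keys(state_dict, depth=1):
    """Count entries per dotted-name prefix, e.g. 'bert.encoder.x' -> 'bert'."""
    counts = Counter()
    for key in state_dict:
        prefix = ".".join(key.split(".")[:depth])
        counts[prefix] += 1
    return dict(counts)


def load_checkpoint(path):
    """Load a checkpoint on the CPU.

    The 'model' wrapper key is an assumption; adjust it to the actual file.
    """
    import torch  # imported lazily so summarize_keys works without PyTorch

    ckpt = torch.load(path, map_location="cpu")
    if isinstance(ckpt, dict) and "model" in ckpt:
        return ckpt["model"]
    return ckpt
```

For example, `summarize_keys(load_checkpoint(path))` would list how many tensors sit under each top-level prefix, which is a quick way to confirm that both the BERT weights and the fc6 layer are present.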

Scripts

Train

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU
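
The trailing arguments in the command above (`save_dir ...`, `samples_per_gpu ...`) are given as bare KEY VALUE pairs rather than `--flags`. A minimal sketch of how such pairs are commonly collected with `argparse` follows; whether `train.py` parses them exactly this way is an assumption, and `parse_overrides` is a hypothetical helper name.

```python
import argparse


def parse_overrides(argv):
    """Collect trailing KEY VALUE pairs into a dict of string overrides."""
    parser = argparse.ArgumentParser()
    # torch.distributed.launch passes --local_rank to each worker process.
    parser.add_argument("--local_rank", type=int, default=0)
    # Everything after the known options is kept verbatim.
    parser.add_argument("opts", nargs=argparse.REMAINDER)
    args = parser.parse_args(argv)
    if len(args.opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    # Pair up even-indexed keys with odd-indexed values.
    return dict(zip(args.opts[0::2], args.opts[1::2]))
```

Values stay as strings in this sketch; a real config system would typically cast them to the type of the default it overrides.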

Continue training

python -m torch.distributed.launch \
  --nproc_per_node=$NUM_GPUS \
  --master_port=4396 train.py \
  save_dir $PATH_TO_TRAIN_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU \
  model_path $PATH_TO_MODEL

Inference

python inference.py \
  model_path $PATH_TO_MODEL \
  save_dir $PATH_TO_TEST_OUTPUT \
  samples_per_gpu $NUM_SAMPLES_PER_GPU

Evaluate

python evaluate.py \
  --gt_caption data/id2captions_test.json \
  --pd_caption $PATH_TO_TEST_OUTPUT/caption_results.json \
  --save_dir $PATH_TO_TEST_OUTPUT
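
Since the model is meant to control caption length, it can also be useful to look at how long the generated captions actually are. The sketch below counts captions per word-length bin. It assumes `caption_results.json` is a JSON object mapping an image id to a caption string (or a list of strings); that file layout, the bin boundaries, and the helper names are all assumptions for illustration.

```python
import json
from collections import Counter

# Illustrative inclusive word-count bins; hypothetical, not taken from this repo.
BINS = [(1, 9), (10, 14), (15, 19), (20, 25)]


def length_histogram(captions, bins=BINS):
    """Count captions per length bin; lengths outside every bin go to 'other'."""
    hist = Counter()
    for caption in captions:
        n = len(caption.split())
        label = next((f"{lo}-{hi}" for lo, hi in bins if lo <= n <= hi), "other")
        hist[label] += 1
    return dict(hist)


def load_captions(path):
    """Read captions from a JSON object of image id -> caption(s) (assumed layout)."""
    with open(path) as f:
        data = json.load(f)
    captions = []
    for value in data.values():
        captions.extend(value if isinstance(value, list) else [value])
    return captions
```

Calling `length_histogram(load_captions(path))` on the inference output gives a quick check that captions requested at a given length land in the expected range.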

Cite

Please consider citing our paper in your publications if the project helps your research.

@article{deng2020length,
  title={Length-Controllable Image Captioning},
  author={Deng, Chaorui and Ding, Ning and Tan, Mingkui and Wu, Qi},
  journal={arXiv preprint arXiv:2007.09580},
  year={2020}
}

Owner

bearcatt
