A transformer-based method for Healthcare Image Captioning in Vietnamese

Last update: May 05, 2022

Overview

vieCap4H Challenge 2021: A transformer-based method for Healthcare Image Captioning in Vietnamese

This GitHub repository contains our solution for the vieCap4H Challenge 2021. We use grid features as the visual representation and pre-train a BERT-based language model, starting from a PhoBERT-based pre-trained model, to obtain the language representation. We also identify a suitable training schedule that uses self-critical sequence training (SCST) to achieve the best results. In our experiments, the method achieves an average BLEU of 30.3% on the public-test round and 28.9% on the private-test round, ranking 3rd and 4th, respectively.

Figure 1. An overview of our solution based on RSTNet
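
For readers unfamiliar with SCST, the idea can be sketched in a few lines. This is a minimal, framework-free illustration and not the code used in this repository: the function name scst_loss and its arguments are hypothetical, and the baseline is assumed to be the reward of a greedy-decoded caption, as in the standard SCST formulation. The repository may compute the baseline differently.

```python
from typing import Sequence


def scst_loss(
    sample_log_probs: Sequence[float],
    sample_rewards: Sequence[float],
    baseline_rewards: Sequence[float],
) -> float:
    """Mean self-critical loss over a batch (illustrative sketch only).

    sample_log_probs[i]: summed log-probability of the i-th sampled caption.
    sample_rewards[i]:   reward (e.g. a caption metric) of that sampled caption.
    baseline_rewards[i]: reward of the baseline caption for the same image
                         (assumed here to come from greedy decoding).
    """
    n = len(sample_log_probs)
    if n == 0 or n != len(sample_rewards) or n != len(baseline_rewards):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for log_prob, reward, baseline in zip(
        sample_log_probs, sample_rewards, baseline_rewards
    ):
        # Captions that beat the baseline get a positive advantage, so
        # minimizing the loss raises their log-probability; captions that
        # do worse than the baseline are pushed down.
        advantage = reward - baseline
        total += -advantage * log_prob
    return total / n
```

In an autodiff framework, the rewards would be treated as constants and gradients would flow only through the log-probabilities.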

1. Data preparation

The grid features of vieCap4H can be downloaded via the links below:

The dataset can be downloaded at https://aihub.vn/competitions/40. The annotations must be converted to COCO format. We have already converted them, and the converted file is available at:

viecap4h-public-train.json.
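
If you need to convert annotations yourself, the sketch below shows one way to build a COCO-style captions file. It assumes, hypothetically, that the raw annotations are a list of records with an "id" field holding the image file name and a "captions" field holding the caption text; check the downloaded data and adjust the keys if they differ. The function name to_coco_captions is also illustrative and not part of this repository.

```python
from typing import Dict, List


def to_coco_captions(records: List[Dict[str, str]]) -> Dict[str, list]:
    """Convert raw caption records to a COCO-style captions dictionary.

    Assumed (hypothetical) input record: {"id": <file name>, "captions": <text>}.
    Output follows the COCO captions layout: an "images" list and an
    "annotations" list linked by integer image ids.
    """
    images: List[dict] = []
    annotations: List[dict] = []
    image_ids: Dict[str, int] = {}

    for record in records:
        file_name = record["id"]
        # Register each image once, even if it has several captions.
        if file_name not in image_ids:
            image_ids[file_name] = len(image_ids) + 1
            images.append({"id": image_ids[file_name], "file_name": file_name})
        annotations.append(
            {
                "id": len(annotations) + 1,
                "image_id": image_ids[file_name],
                "caption": record["captions"].strip(),
            }
        )

    return {"images": images, "annotations": annotations}
```

When writing the result with json.dump, passing ensure_ascii=False keeps Vietnamese diacritics readable in the output file.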

2. Training

Pre-train the BERT-based language model from the PhoBERT-based model:

python train_language.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the BERT-based model should appear in the folder saved_language_models.

Then continue by training the Transformer model with the command below:

python train_transformer.py \
--img_path <images path> \
--features_path <features path> \
--annotation_folder <annotations folder> \
--batch_size 40

The weights of the Transformer-based model should appear in the folder saved_transformer_rstnet_models.

Here, <images path> is the data folder, <features path> is the path to the grid features folder, and <annotations folder> is the path to the folder that contains the file viecap4h-public-train.json.
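
Before launching a long training run, it can help to check that these three paths point where you expect. The sketch below is an illustration only, not the argument handling of train_language.py or train_transformer.py: it reuses the flag names from the commands above, while build_parser, missing_inputs, and the default batch size of 40 (copied from the example commands) are assumptions.

```python
import argparse
from pathlib import Path
from typing import List

ANNOTATION_FILE = "viecap4h-public-train.json"


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags used in the training commands above."""
    parser = argparse.ArgumentParser(description="Check vieCap4H training inputs")
    parser.add_argument("--img_path", required=True, help="data folder")
    parser.add_argument("--features_path", required=True, help="grid features folder")
    parser.add_argument("--annotation_folder", required=True,
                        help=f"folder containing {ANNOTATION_FILE}")
    # Assumption: 40 is simply the value shown in the example commands.
    parser.add_argument("--batch_size", type=int, default=40)
    return parser


def missing_inputs(args: argparse.Namespace) -> List[str]:
    """Return a human-readable list of inputs that could not be found."""
    problems: List[str] = []
    if not Path(args.img_path).is_dir():
        problems.append(f"image folder not found: {args.img_path}")
    if not Path(args.features_path).is_dir():
        problems.append(f"features folder not found: {args.features_path}")
    annotation_file = Path(args.annotation_folder) / ANNOTATION_FILE
    if not annotation_file.is_file():
        problems.append(f"annotation file not found: {annotation_file}")
    return problems


if __name__ == "__main__":
    for problem in missing_inputs(build_parser().parse_args()):
        print(problem)
```

Saved under a name of your choice, the script accepts the same three path arguments as the training commands and prints one line per missing input.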

3. Inference

The results can be obtained with the command below:

python test_viecap.py

4. Pre-trained model

To reproduce our leaderboard results, two pre-trained models, one for the BERT-based model and one for the Transformer model, can be downloaded via the links below:

Updating...

Owner

Doanh B C
