Automatic caption evaluation metric based on typicality analysis.

Last update: Jan 09, 2022

SeMantic and linguistic UndeRstanding Fusion (SMURF)

Automatic caption evaluation metric described in the paper "SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis" (ACL 2021).

arXiv: https://arxiv.org/abs/2106.01444

ACL Anthology: https://aclanthology.org/2021.acl-long.175/

Overview

SMURF is an automatic caption evaluation metric that combines a novel semantic evaluation algorithm (SPARCS) with novel fluency evaluation algorithms (SPURTS and MIMA) and supports both caption-level and system-level analysis. These evaluations were designed to generalize, and as a result they correlate highly with human judgment across many relevant datasets. See the paper for more details.

Requirements

You can run requirements/install.sh to quickly install all the requirements in an Anaconda environment. The requirements are:

Python 3
torch>=1.0.0
numpy
nltk>=3.5.0
pandas>=1.0.1
matplotlib
transformers>=3.0.0
shapely
scikit-learn (imported as sklearn)
sentencepiece
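As an optional sanity check, the sketch below compares installed package versions against the minimums listed above. It is not part of the repository, assumes Python 3.8+ (for importlib.metadata), and looks up sklearn under its pip distribution name, scikit-learn. The version comparison is deliberately simple and ignores pre-release ordering.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Minimum versions taken from the requirements list above. "sklearn" is
# looked up under its pip distribution name, "scikit-learn".
REQUIREMENTS = [
    "torch>=1.0.0", "numpy", "nltk>=3.5.0", "pandas>=1.0.1", "matplotlib",
    "transformers>=3.0.0", "shapely", "scikit-learn", "sentencepiece",
]


def version_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric components, e.g. '1.10.0+cu113' -> (1, 10, 0).

    Simplistic: '2.0.0rc1' is treated the same as '2.0.0'.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def parse_requirement(spec: str) -> Tuple[str, Optional[Tuple[int, ...]]]:
    """Split 'name>=1.2.3' into ('name', (1, 2, 3)); a bare name gets None."""
    name, sep, version = spec.partition(">=")
    return name.strip(), (version_tuple(version) if sep else None)


def meets_minimum(installed: str, minimum: Tuple[int, ...]) -> bool:
    """Compare versions after zero-padding, so '3.5' counts as '3.5.0'."""
    have = version_tuple(installed)
    width = max(len(have), len(minimum))
    return have + (0,) * (width - len(have)) >= minimum + (0,) * (width - len(minimum))


def check_all() -> None:
    """Print one status line per requirement."""
    for spec in REQUIREMENTS:
        name, minimum = parse_requirement(spec)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"MISSING  {name}")
            continue
        ok = minimum is None or meets_minimum(installed, minimum)
        print(f"{'OK' if ok else 'TOO OLD':8} {name} {installed}")


if __name__ == "__main__":
    check_all()
```

Running it inside the Anaconda environment created by requirements/install.sh prints one OK, TOO OLD, or MISSING line per package.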

Usage

./smurf_example.py provides working examples of the following functions:

Caption-Level Scoring

Returns a dictionary of scores for the semantic similarity between reference and candidate captions (SPARCS), the style/diction quality of the candidate text (SPURTS), the grammar outlier penalty of the candidate text (MIMA), and the fusion of these scores (SMURF). Input sentences should be preprocessed before being passed to the smurf_eval_captions object, as shown in the example. SPARCS requires a list of reference sentences, whereas SPURTS and MIMA do not use references.
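The README does not document the exact signature of smurf_eval_captions or the layout of its returned dictionary, so the sketch below stays on the caller's side and never calls the repository's code. It assumes, hypothetically, one list of reference sentences per candidate caption and a returned dictionary that maps each metric name ('SPARCS', 'SPURTS', 'MIMA', 'SMURF') to a list of per-caption scores. It also assumes the fused SMURF score, like SPARCS, needs references. available_metrics and summarize are illustrative helpers, not part of SMURF.

```python
from statistics import mean
from typing import Dict, List, Optional, Set

# Hypothetical caller-side helpers; they do not call the repository's code.


def available_metrics(candidates: List[str],
                      references: Optional[List[List[str]]] = None) -> Set[str]:
    """Which SMURF components can be computed for these inputs.

    SPURTS and MIMA only look at the candidate text. SPARCS needs a list of
    reference sentences per candidate; the fused SMURF score is assumed to
    need SPARCS, and therefore references, as well.
    """
    metrics = {"SPURTS", "MIMA"}
    if references is not None:
        if len(references) != len(candidates):
            raise ValueError("expected one list of reference sentences per candidate")
        if any(len(refs) == 0 for refs in references):
            raise ValueError("every candidate needs at least one reference sentence")
        metrics |= {"SPARCS", "SMURF"}
    return metrics


def summarize(scores: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean of each metric, assuming a {metric name: per-caption scores} layout.

    Metrics with no scores are skipped rather than averaged.
    """
    return {name: mean(values) for name, values in scores.items() if values}


if __name__ == "__main__":
    candidates = ["a dog runs across the grass"]
    references = [["a brown dog running on a lawn", "a dog plays outside"]]
    print(sorted(available_metrics(candidates, references)))
```

Keeping the reference lists parallel to the candidates makes it easy to fall back to the reference-free metrics (SPURTS and MIMA) when no references exist.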

System-Level Analysis

After reading in and standardizing the caption-level scores, this function generates a plot that gives an overall evaluation of captioner performance, along with the relevant system-level scores for each captioner (intersection with the reference captioner and total grammar outlier penalty). An example of such a plot is shown below:

The number of captioners being compared should be specified when instantiating a smurf_system_analysis object. To generate the plot correctly, the captions passed to caption-level scoring must be interleaved by image across the captioners (C1, C2, ...), with captioner C1 serving as the ground truth, in the following format:

[C1 image 1 output, C2 image 1 output,..., C1 image 2 output, C2 image 2 output,...].
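The ordering above can be produced and undone with a few lines of Python. This is a minimal, hypothetical sketch of the data layout only: interleave builds the required list from per-captioner outputs, and deinterleave maps a flat list of caption-level scores back to captioners. The README does not say how smurf_system_analysis standardizes scores internally, so standardize assumes a plain z-score over all captions, which may differ from the repository's implementation.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def interleave(outputs_by_captioner: Sequence[Sequence[str]]) -> List[str]:
    """[[C1 img1, C1 img2], [C2 img1, C2 img2]] -> [C1 img1, C2 img1, C1 img2, C2 img2]."""
    n_images = len(outputs_by_captioner[0])
    if any(len(outputs) != n_images for outputs in outputs_by_captioner):
        raise ValueError("every captioner needs exactly one output per image")
    return [outputs[i] for i in range(n_images) for outputs in outputs_by_captioner]


def deinterleave(flat: Sequence[float], n_captioners: int) -> List[List[float]]:
    """Invert interleave(): position k belongs to captioner k % n_captioners."""
    if len(flat) % n_captioners:
        raise ValueError("length must be a multiple of the number of captioners")
    return [list(flat[c::n_captioners]) for c in range(n_captioners)]


def standardize(scores: Sequence[float]) -> List[float]:
    """Z-score over all captions (an assumed form of the standardization step)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def per_captioner_means(flat_scores: Sequence[float], n_captioners: int) -> List[float]:
    """Standardize interleaved scores, then average them per captioner (index 0 = C1)."""
    return [mean(group) for group in deinterleave(standardize(flat_scores), n_captioners)]
```

Because C1 is the ground-truth captioner, index 0 of the per-captioner results is the reference point against which the other captioners can be compared.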

Author/Maintainer:

Joshua Feinglass (https://scholar.google.com/citations?user=V2h3z7oAAAAJ&hl=en)

If you find this repo useful, please cite:

@inproceedings{feinglass2021smurf,
  title={SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis},
  author={Joshua Feinglass and Yezhou Yang},
  booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
  year={2021},
  url={https://aclanthology.org/2021.acl-long.175/}
}
