💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena

Last update: Nov 07, 2022

VALSE 💃

💃 VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena. https://arxiv.org/abs/2112.07566

Data Instructions

Please find the data in the data folder. The dataset is in json format and contains the following relevant fields:

- A reference to the image in the original dataset: dataset and image_file.
- The valid sentence, which is the caption for VALSE: caption.
- The altered caption: foil.
- The annotators' votes (3 annotators per sample): mturk.
  - The subentry caption counts the number of annotators who chose the caption, and not the foil, as the one describing the image.
  - The subentry foil counts how many of the three annotators chose the foil as (also) describing the image.
  - For more information, see subsec. 4.4 and App. E of the paper.

‼️ Please be aware that the JSON files contain both valid (meaning: validated by annotators) and non-validated samples. To work only with the valid set, filter the samples using the criterion below.

We consider a foil valid when at least two out of three annotators identified the caption, but not the foil, as the text that accurately describes the image.

This means that the valid samples of the dataset are the ones where sample["mturk"]["caption"] >= 2.
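The filtering rule above can be sketched in a few lines of Python. This is a minimal sketch, not part of the repository: the function names `filter_valid` and `load_valid`, the `min_votes` parameter, and the second demo entry are assumptions made for illustration. Only the `sample["mturk"]["caption"] >= 2` criterion comes from the instructions above.

```python
import json


def filter_valid(samples, min_votes=2):
    """Keep the samples where at least `min_votes` annotators chose the caption."""
    return {
        key: sample
        for key, sample in samples.items()
        if sample["mturk"]["caption"] >= min_votes
    }


def load_valid(path):
    """Load one json file from the data folder and return its valid samples."""
    with open(path, encoding="utf-8") as f:
        return filter_valid(json.load(f))


# Small in-memory stand-in for a data file; the second entry is made up.
demo = {
    "actions_test_0": {"mturk": {"foil": 0, "caption": 3, "other": 0}},
    "hypothetical_1": {"mturk": {"foil": 2, "caption": 1, "other": 0}},
}
valid = filter_valid(demo)
print(sorted(valid))
```

With a real file, `load_valid` would be called with the path of one of the json files in the data folder.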

Example instance:

{
    "actions_test_0": {
        "dataset": "SWiG",
        "original_split": "test",                 # the split of the original dataset that the sample belonged to
        "dataset_idx": "exercising_255.jpg",      # the sample id in the original dataset
        "linguistic_phenomena": "actions",        # the linguistic phenomenon targeted
        "image_file": "exercising_255.jpg",
        "caption": "A man exercises his torso.",
        "classes": "man",                         # the word of the caption that was replaced
        "classes_foil": "torso",                  # the foil word / phrase
        "mturk": {
            "foil": 0,
            "caption": 3,
            "other": 0
        },
        "foil": "A torso exercises for a man."
    }, ...
}

Images

For the images, please follow the download instructions of the respective original dataset. The source dataset of each image is given in the dataset field of the json files.
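Once the original images are downloaded, the dataset and image_file fields can be combined to locate each image. The following is a sketch under assumptions: the `IMAGE_ROOTS` mapping, the `images/swig` folder, and the `image_path` helper are hypothetical and need to be adapted to wherever each original dataset is stored locally.

```python
from pathlib import Path

# Hypothetical local layout: one folder per original dataset.
# Add an entry for every value of the `dataset` field you work with.
IMAGE_ROOTS = {
    "SWiG": Path("images/swig"),
}


def image_path(sample, roots=IMAGE_ROOTS):
    """Return the local path of a sample's image based on its `dataset` field."""
    dataset = sample["dataset"]
    if dataset not in roots:
        raise KeyError(f"No image folder configured for dataset {dataset!r}")
    return roots[dataset] / sample["image_file"]


sample = {"dataset": "SWiG", "image_file": "exercising_255.jpg"}
print(image_path(sample))
```

Raising an error for an unconfigured dataset makes a missing download visible early instead of producing a path that does not exist.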

Reference

Please cite our 💃 VALSE paper if you are using this dataset.

@misc{parcalabescu2021valse,
      title={VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena}, 
      author={Letitia Parcalabescu and Michele Cafagna and Lilitta Muradjan and Anette Frank and Iacer Calixto and Albert Gatt},
      year={2021},
      eprint={2112.07566},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Owner

Heidelberg-NLP
