Few-shot NLP benchmark for unified, rigorous evaluation

FLEX

FLEX is a benchmark and framework for unified, rigorous few-shot NLP evaluation. FLEX enables:

  • First-class NLP support
  • Support for meta-training
  • Reproducible few-shot evaluations
  • Extensible benchmark creation (benchmarks defined using HuggingFace Datasets)
  • Advanced sampling functions for creating episodes with class imbalance, etc.

For more context, see our arXiv preprint.

Together with FLEX, we also released a simple yet strong few-shot model called UniFew. For more details, see our preprint.

Leaderboards

These instructions are geared towards users of the first benchmark created with this framework. The benchmark has two leaderboards, for the Pretraining-Only and Meta-Trained protocols described in Section 4.2 of our paper:

  • FLEX (Pretraining-Only): for models that do not use meta-training data related to the test tasks (do not follow the Model Training section below).
  • FLEX-META (Meta-Trained): for models that use only the provided meta-training and meta-validation data (please do see the Model Training section below).

Installation

  • Clone the repository: git clone git@github.com:allenai/flex.git
  • Create a Python 3 environment (3.7 or greater), e.g. using conda create --name flex python=3.9
  • Activate the environment: conda activate flex
  • Install the package locally with pip install -e .

Data Preparation

Creating the data for the flex challenge for the first time takes about 10 minutes (using a recent MacBook Pro on a broadband connection) and requires 3GB of disk space. You can initiate this process by running

python -c "import fewshot; fewshot.make_challenge('flex');"

You can control the location of the cached data by setting the environment variable HF_DATASETS_CACHE. If you have not set this variable, the location should default to ~/.cache/huggingface/datasets/. See the HuggingFace docs for more details.
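For example, to cache the data under a custom directory, the variable can be set before fewshot (and hence HuggingFace datasets) is imported. A minimal sketch, with an illustrative path:

import os

# Set before importing `fewshot`, which imports the `datasets` library.
os.environ['HF_DATASETS_CACHE'] = '/data/hf_datasets_cache'  # illustrative path

import fewshot
fewshot.make_challenge('flex')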

Model Evaluation

"Challenges" are datasets of sampled tasks for evaluation. They are defined in fewshot/challenges/__init__.py.

To evaluate a model on challenge flex (our first challenge), you should write a program that produces a predictions.json, for example:

#!/usr/bin/env python3
import random
from typing import Iterable, Dict, Any, Sequence
import fewshot


class YourModel(fewshot.Model):
    def fit_and_predict(
        self,
        support_x: Iterable[Dict[str, Any]],
        support_y: Iterable[str],
        target_x: Iterable[Dict[str, Any]],
        metadata: Dict[str, Any]
    ) -> Sequence[str]:
        """Return random label predictions for a fewshot task."""
        train_x = [d['txt'] for d in support_x]
        train_y = support_y
        test_x = [d['txt'] for d in target_x]
        test_y = [random.choice(metadata['labels']) for _ in test_x]
        # >>> print(test_y)
        # ['some', 'list', 'of', 'label', 'predictions']
        return test_y


if __name__ == '__main__':
    evaluator = fewshot.make_challenge("flex")
    model = YourModel()
    evaluator.save_model_predictions(model=model, save_path='/path/to/predictions.json')

Warning: Calling fewshot.make_challenge("flex") above requires some time to prepare all the necessary data (see the "Data Preparation" section above).

Running the above script produces /path/to/predictions.json with contents formatted as:

{
    "[QUESTION_ID]": {
        "label": "[CLASS_LABEL]",  # Currently an integer converted to a string
        "score": float  # Only used for ranking tasks
    },
    ...
}

Each [QUESTION_ID] is an ID for a test example in a few-shot problem.
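As a quick sanity check of the output, the file can be loaded with the standard library (a sketch; the path is illustrative):

import json

with open('/path/to/predictions.json') as f:
    predictions = json.load(f)

# Each key is a question ID; each value holds the predicted label
# (and, for ranking tasks, a score).
question_id, prediction = next(iter(predictions.items()))
print(question_id, prediction['label'])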

[Optional] Parallelizing Evaluation

Two options are available for parallelizing evaluation.

First, one can restrict evaluation to a subset of tasks with indices from [START] to [STOP] (exclusive) via

evaluator.save_model_predictions(model=model, start_task_index=[START], stop_task_index=[STOP])

Notes:

  • You may use stop_task_index=None (or omit it) to avoid specifying an end.
  • You can find the total number of tasks in the challenge with fewshot.get_challenge_spec([CHALLENGE]).num_tasks.
  • To merge partial evaluation outputs into a complete predictions.json file, use fewshot merge partial1.json partial2.json ... predictions.json.
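Putting these pieces together, a two-worker split might look like the following sketch (it reuses the YourModel class from the example above; file names are illustrative):

import fewshot

num_tasks = fewshot.get_challenge_spec('flex').num_tasks
half = num_tasks // 2

evaluator = fewshot.make_challenge('flex')
model = YourModel()

# Worker 1: tasks [0, half)
evaluator.save_model_predictions(model=model, save_path='partial1.json',
                                 start_task_index=0, stop_task_index=half)
# Worker 2: tasks [half, num_tasks)
evaluator.save_model_predictions(model=model, save_path='partial2.json',
                                 start_task_index=half, stop_task_index=None)

The two partial outputs can then be combined with fewshot merge partial1.json partial2.json predictions.json.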

The second option will call your model's .fit_and_predict() method with batches of [BATCH_SIZE] tasks, via

evaluator.save_model_predictions(model=model, batched=True, batch_size=[BATCH_SIZE])

Result Validation and Scoring

To validate the contents of your predictions, run:

fewshot validate --challenge_name flex --predictions /path/to/predictions.json

This validates all the inputs and takes some time. To validate a different challenge, substitute its name in place of flex.

(There is also a score CLI command which should not be used on the final challenge except when reporting final results.)

Model Training

For the meta-training protocol (e.g., the FLEX-META leaderboard), challenges come with a set of related training and validation data. This data is most easily accessible in one of two formats:

  1. Iterable of sampled episodes. fewshot.get_challenge_spec('flex').get_sampler(split='[SPLIT]') returns an iterable that samples datasets and episodes from the meta-training or meta-validation datasets, via [SPLIT]='train' or [SPLIT]='val', respectively. The sampler defaults to the fewshot.samplers.Sample2WayMax8ShotCfg sampler configuration (for the fewshot.samplers.sample.Sampler class), but can be reconfigured.

  2. Raw dataset stores. This option is for directly accessing the raw data. fewshot.get_challenge_spec('flex').get_stores(split='[SPLIT]') returns a mapping from dataset names to fewshot.datasets.store.Store instances. Each Store instance has a Store.store attribute containing a raw HuggingFace Dataset instance, and a Store.label attribute holding the Dataset key for the target label (accessible via Store.store[Store.label]); the FLEX-formatted text is available at the flex.txt key (via Store.store['flex.txt']).
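A minimal sketch of both access patterns (the handling of individual episodes is an assumption; consult the sampler configuration for the exact episode structure):

import fewshot

spec = fewshot.get_challenge_spec('flex')

# Option 1: iterate over sampled meta-training episodes.
sampler = spec.get_sampler(split='train')
episode = next(iter(sampler))  # assumption: the iterable yields episodes directly

# Option 2: read the raw meta-validation dataset stores.
stores = spec.get_stores(split='val')
for name, store in stores.items():
    dataset = store.store           # raw HuggingFace Dataset
    labels = dataset[store.label]   # target label column
    texts = dataset['flex.txt']     # FLEX-formatted text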

Two examples of these respective approaches are available at:

  1. The UniFew model repository. For more details on UniFew, see also the FLEX arXiv paper.
  2. The baselines/bao/ directory, for training and evaluating the approach described in the following paper:

Yujia Bao*, Menghua Wu*, Shiyu Chang, and Regina Barzilay. Few-shot Text Classification with Distributional Signatures. In International Conference on Learning Representations (ICLR), 2020.

Benchmark Construction and Optimization

To add a new benchmark (challenge) named [NEW_CHALLENGE], you must edit fewshot/challenges/__init__.py or otherwise add it to the registry. The above usage instructions would change to substitute [NEW_CHALLENGE] in place of flex when calling fewshot.get_challenge_spec('[NEW_CHALLENGE]') and fewshot.make_challenge('[NEW_CHALLENGE]').
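Once registered, the new challenge is addressed by name (a sketch with a hypothetical challenge name):

import fewshot

spec = fewshot.get_challenge_spec('my_new_challenge')   # hypothetical name
evaluator = fewshot.make_challenge('my_new_challenge')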

For an example of how to optimize the sample size of the challenge, see scripts/README-sample-size.md.

Attribution

If you make use of our framework, benchmark, or model, please cite our preprint:

@misc{bragg2021flex,
      title={FLEX: Unifying Evaluation for Few-Shot NLP},
      author={Jonathan Bragg and Arman Cohan and Kyle Lo and Iz Beltagy},
      year={2021},
      eprint={2107.07170},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}