QuALITY: Question Answering with Long Input Texts, Yes!

Authors: Richard Yuanzhe Pang,* Alicia Parrish,* Nitish Joshi,* Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, and Samuel R. Bowman (* = equal contribution)

Data link

Download QuALITY v0.9 (zip).

Paper preprint

You can read the paper here.

Data README

The fields in the jsonl file are explained below. Each json line corresponds to the set of validated questions that one writer wrote for one article.

  • article_id: String. A five-digit number uniquely identifying the article. In each split, there are exactly two lines containing the same article_id, because two writers wrote questions for the same article.
  • set_unique_id: String. The unique ID corresponding to the set of questions, which corresponds to the line of json. Each set of questions is written by the same writer.
  • batch_num: String. The batch number. Our data collection is split in two groups, and there are three batches in each group. [i][j] means the j-th batch in the i-th group. For example, 23 corresponds to the third batch in the second group.
  • writer_id: String. The anonymized ID of the writer who wrote this set of questions.
  • source: String. The source of the article.
  • title: String. The title of the article.
  • author: String. The author of the article.
  • topic: String. The topic of the article.
  • url: String. The URL of the original unprocessed source article.
  • license: String. The license information for the article.
  • article: String. The HTML of the article. A script that converts the HTML to plain text is provided.
  • questions: A list of dictionaries explained below. Each line of json has a different number of questions because some questions were removed following validation.
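
As a minimal sketch of how the top-level fields above might be consumed, the snippet below parses jsonl lines, groups question sets by article_id, and splits batch_num into its group and batch digits. The helper names and the sample records are made up for illustration and are not part of the release; real records carry all of the fields listed above.

```python
import json
from collections import defaultdict


def load_question_sets(lines):
    """Parse an iterable of jsonl lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def group_by_article(question_sets):
    """Map each article_id to the question sets written for that article."""
    grouped = defaultdict(list)
    for question_set in question_sets:
        grouped[question_set["article_id"]].append(question_set)
    return dict(grouped)


def parse_batch_num(batch_num):
    """Split a two-character batch_num such as "23" into (group, batch)."""
    return int(batch_num[0]), int(batch_num[1])


# Hypothetical records containing only a few of the documented fields.
sample_lines = [
    json.dumps({"article_id": "00001", "set_unique_id": "set_a",
                "batch_num": "23", "questions": []}),
    json.dumps({"article_id": "00001", "set_unique_id": "set_b",
                "batch_num": "23", "questions": []}),
    json.dumps({"article_id": "00002", "set_unique_id": "set_c",
                "batch_num": "11", "questions": []}),
]

question_sets = load_question_sets(sample_lines)
by_article = group_by_article(question_sets)
for article_id, sets_for_article in by_article.items():
    group, batch = parse_batch_num(sets_for_article[0]["batch_num"])
    print(article_id, len(sets_for_article), "sets, group", group, "batch", batch)
```

With a real file, the same helpers could be fed an open file handle in place of sample_lines, since a file object iterates line by line.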

As discussed, the value of questions is a list of dictionaries. Each dictionary has the following fields.

  • question: The question.
  • options: A list of four answer options.
  • gold_label: The correct answer, defined by a majority vote of 3 or 5 annotators + the original writer's label. The number corresponds to the option number (1-indexed) in options.
  • writer_label: The label the writer provided. The number corresponds to the option number (1-indexed) in options.
  • validation: A list of dictionaries containing the untimed validation results. Each dictionary contains the following fields.
    • untimed_annotator_id: The anonymized annotator IDs corresponding to the untimed validation results shown in untimed_answer.
    • untimed_answer: The responses in the untimed validation. Each question in the training set is annotated by three workers in most cases, and each question in the dev/test sets is annotated by five workers in most cases (see paper for exceptions).
    • untimed_eval1_answerability: The responses (represented numerically) to the first eval question in untimed validation. We asked the raters: “Is the question answerable and unambiguous?” The values correspond to the following choices:
      • 1: Yes, there is a single answer choice that is the most correct.
      • 2: No, two or more answer choices are equally correct.
      • 3: No, it is unclear what the question is asking, or the question or answer choices are unrelated to the passage.
    • untimed_eval2_context: The responses (represented numerically) to the second eval question in untimed validation. We asked the raters: “How much of the passage/text is needed as context to answer this question correctly?” The values correspond to the following choices:
      • 1: Only a sentence or two of context.
      • 2: At least a long paragraph or two of context.
      • 3: At least a third of the passage for context.
      • 4: Most or all of the passage for context.
    • untimed_eval3_distractor: The responses to the third eval question in untimed validation. We asked the raters: “Which of the options that you did not select was the best "distractor" item (i.e., an answer choice that you might be tempted to select if you hadn't read the text very closely)?” The numbers correspond to the option numbers (1-indexed).
  • speed_validation: A list of dictionaries containing the speed validation results. Each dictionary contains the following fields.
    • speed_annotator_id: The anonymized annotator IDs corresponding to the speed annotation results shown in speed_answer.
    • speed_answer: The responses in the speed validation. Each question is annotated by five workers.
  • difficult: A binary value. 1 means that less than 50% of the speed annotations answer the question correctly, so we include this question in the hard subset. Otherwise, the value is 0. In our evaluations, we report one accuracy figure for the entire dataset, and a second for the difficult=1 subset.
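
The two accuracy figures mentioned under difficult can be sketched as follows. This is an illustrative helper, not the authors' evaluation code: the function name and the toy questions are assumptions, and predictions are assumed to be 1-indexed option numbers aligned with the question list.

```python
def accuracy_report(questions, predictions):
    """Return (overall_accuracy, hard_accuracy).

    Each value is None when the corresponding subset is empty.
    """
    correct = [pred == q["gold_label"] for q, pred in zip(questions, predictions)]
    hard_correct = [c for q, c in zip(questions, correct) if q["difficult"] == 1]
    overall = sum(correct) / len(correct) if correct else None
    hard = sum(hard_correct) / len(hard_correct) if hard_correct else None
    return overall, hard


# Toy questions carrying only the two fields this helper reads.
toy_questions = [
    {"gold_label": 1, "difficult": 0},
    {"gold_label": 2, "difficult": 1},
    {"gold_label": 3, "difficult": 1},
    {"gold_label": 4, "difficult": 0},
]
toy_predictions = [1, 2, 4, 4]

overall_acc, hard_acc = accuracy_report(toy_questions, toy_predictions)
print("overall:", overall_acc, "hard subset:", hard_acc)
```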

Validation criteria for the questions

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label (defined as the majority vote of validators' annotations together with the writer's provided label).
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
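
The two criteria above can be checked per question with a short sketch like the one below. It assumes that each dictionary in validation holds one annotator's response (a single untimed_answer and a single untimed_eval1_answerability value); that layout and the function name are assumptions for illustration, so adjust the field access if the released file differs.

```python
def passes_validation(question):
    """Check both validation criteria for one question dictionary."""
    votes = question["validation"]
    n_votes = len(votes)
    if n_votes == 0:
        return False
    # Criterion 1: more than 50% of untimed answers agree with gold_label.
    n_correct = sum(v["untimed_answer"] == question["gold_label"] for v in votes)
    # Criterion 2: more than 50% of answerability ratings are 1.
    n_answerable = sum(v["untimed_eval1_answerability"] == 1 for v in votes)
    return 2 * n_correct > n_votes and 2 * n_answerable > n_votes


# Hypothetical question with three untimed annotations.
example_question = {
    "gold_label": 2,
    "validation": [
        {"untimed_answer": 2, "untimed_eval1_answerability": 1},
        {"untimed_answer": 2, "untimed_eval1_answerability": 1},
        {"untimed_answer": 1, "untimed_eval1_answerability": 2},
    ],
}
print(passes_validation(example_question))
```

The comparisons use integer arithmetic (2 * count > total) so that an exact 50% split does not pass, matching the "more than 50%" wording.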

What are the hard questions?

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label.
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
  • More than 50% of annotators answer the question incorrectly in the speed validation setting. That is, more than 50% of the speed_answer annotations are incorrect.
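
As a rough sketch, the third criterion can be recomputed from speed_validation and compared against the difficult field. The first two criteria are the same ones every released question has already passed, so only the speed check is shown. The function name is hypothetical, and the code assumes that each dictionary in speed_validation holds one annotator's speed_answer.

```python
def is_hard(question):
    """Return True if more than 50% of speed annotations miss gold_label."""
    speed_votes = question["speed_validation"]
    if not speed_votes:
        return False
    n_wrong = sum(v["speed_answer"] != question["gold_label"] for v in speed_votes)
    return 2 * n_wrong > len(speed_votes)


# Hypothetical question with five speed annotations, three of them wrong.
hard_example = {
    "gold_label": 3,
    "difficult": 1,
    "speed_validation": [
        {"speed_answer": 1},
        {"speed_answer": 3},
        {"speed_answer": 2},
        {"speed_answer": 4},
        {"speed_answer": 3},
    ],
}
# The recomputed flag should agree with the stored difficult value.
print(is_hard(hard_example), hard_example["difficult"] == int(is_hard(hard_example)))
```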

Test set

The annotations for questions in the test set will not be released. We are currently working on a leaderboard. Stay tuned for an update by early January!

Code

The code for our baseline models will be released soon. Stay tuned for an update by early January!

Citation

@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

Contact

{yzpang, alicia.v.parrish}@nyu.edu

Owner
ML² AT CILVR
The Machine Learning for Language Group at NYU CILVR