QuALITY: Question Answering with Long Input Texts, Yes!

Related tags

Deep Learningquality
Overview

QuALITY: Question Answering with Long Input Texts, Yes!

Authors: Richard Yuanzhe Pang,* Alicia Parrish,* Nitish Joshi,* Nikita Nangia, Jason Phang, Angelica Chen, Vishakh Padmakumar, Johnny Ma, Jana Thompson, He He, and Samuel R. Bowman (* = equal contribution)

Data link

Download QuALITY v0.9 (zip).

Paper preprint

You can read the paper here.

Data README

Here are the explanations to the fields in the jsonl file. Each json line corresponds to the set of validated questions, corresponding to one article, written by one writer.

  • article_id: String. A five-digit number uniquely identifying the article. In each split, there are exactly two lines containing the same article_id, because two writers wrote questions for the same article.
  • set_unique_id: String. The unique ID corresponding to the set of questions, which corresponds to the line of json. Each set of questions is written by the same writer.
  • batch_num: String. The batch number. Our data collection is split in two groups, and there are three batches in each group. [i][j] means the j-th batch in the i-th group. For example, 23 corresponds to the third batch in the second group.
  • writer_id: String. The anonymized ID of the writer who wrote this set of questions.
  • source: String. The source of the article.
  • title: String. The title of the article.
  • author: String. The author of the article.
  • topic: String. The topic of the article.
  • url: String. The URL of the original unprocessed source article.
  • license: String. The license information for the article.
  • article: String. The HTML of the article. A script that converts HTML to plain texts is provided.
  • questions: A list of dictionaries explained below. Each line of json has a different number of questions because some questions were removed following validation.

As discussed, the value of questions is a list of dictionaries. Each dictionary has the following fields.

  • question: The question.
  • options: A list of four answer options.
  • gold_label: The correct answer, defined by a majority vote of 3 or 5 annotators + the original writer's label. The number corresponds to the option number (1-indexed) in options.
  • writer_label: The label the writer provided. The number corresponds to the option number (1-indexed) in options.
  • validation: A list of dictionaries containing the untimed validation results. Each dictionary contains the following fields.
    • untimed_annotator_id: The anonymized annotator IDs corresponding to the untimed validation results shown in untimed_answer.
    • untimed_answer: The responses in the untimed validation. Each question in the training set is annotated by three workers in most cases, and each question in the dev/test sets is annotated by five cases in most cases (see paper for exceptions).
    • untimed_eval1_answerability: The responses (represented numerically) to the first eval question in untimed validation. We asked the raters: “Is the question answerable and unambiguous?” The values correspond to the following choices:
      • 1: Yes, there is a single answer choice that is the most correct.
      • 2: No, two or more answer choices are equally correct.
      • 3: No, it is unclear what the question is asking, or the question or answer choices are unrelated to the passage.
    • untimed_eval2_context: The responses (represented numerically) to the second eval question in untimed validation. We asked the raters: “How much of the passage/text is needed as context to answer this question correctly?” The values correspond to the following choices:
      • 1: Only a sentence or two of context.
      • 2: At least a long paragraph or two of context.
      • 3: At least a third of the passage for context.
      • 4: Most or all of the passage for context.
    • untimed_eval3_distractor: The responses to the third eval question in untimed validation. We asked the raters: “Which of the options that you did not select was the best "distractor" item (i.e., an answer choice that you might be tempted to select if you hadn't read the text very closely)?” The numbers correspond to the option numbers (1-indexed).
  • speed_validation: A list of dictionaries containing the speed validation results. Each dictionary contains the following fields.
    • speed_annotator_id: The anonymized annotator IDs corresponding to the speed annotation results shown in speed_answer.
    • speed_answer: The responses in the speed validation. Each question is annotated by five workers.
  • difficult: A binary value. 1 means that less than 50% of the speed annotations answer the question correctly, so we include this question in the hard subset. Otherwise, the value is 0. In our evaluations, we report one accuracy figure for the entire dataset, and a second for the difficult=1 subset.

Validation criteria for the questions

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label (defined as the majority vote of validators' annotations together with the writer's provided label).
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.

What are the hard questions?

  • More than 50% of annotators answer the question correctly in the untimed setting. That is, more than 50% of the untimed_answer annotations agree with gold_label.
  • More than 50% of annotators think that the question is unambiguous and answerable. That is, more than 50% of the untimed_eval1_answerability annotations have 1's.
  • More than 50% of annotators answer the question incorrectly in the speed validaiton setting. That is, more than 50% of the speed_answer annotations are incorrect.

Test set

The annotations for questions in the test set will not be released. We are currently working on a leaderboard. Stay tuned for an update by early January!

Code

The code for our baseline models will be released soon. Stay tuned for an update by early January!

Citation

@article{pang2021quality,
  title={{QuALITY}: Question Answering with Long Input Texts, Yes!},
  author={Pang, Richard Yuanzhe and Parrish, Alicia and Joshi, Nitish and Nangia, Nikita and Phang, Jason and Chen, Angelica and Padmakumar, Vishakh and Ma, Johnny and Thompson, Jana and He, He and Bowman, Samuel R.},
  journal={arXiv preprint arXiv:2112.08608},
  year={2021}
}

Contact

{yzpang, alicia.v.parrish}@nyu.edu

Owner
ML² AT CILVR
The Machine Learning for Language Group at NYU CILVR
ML² AT CILVR
Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face Manipulation" published in CVPR 2020.

FFD Source Code Provided is code that demonstrates the training and evaluation of the work presented in the paper: "On the Detection of Digital Face M

88 Nov 22, 2022
Efficient Training of Audio Transformers with Patchout

PaSST: Efficient Training of Audio Transformers with Patchout This is the implementation for Efficient Training of Audio Transformers with Patchout Pa

165 Dec 26, 2022
Official implementation of the PICASO: Permutation-Invariant Cascaded Attentional Set Operator

PICASO Official PyTorch implemetation for the paper PICASO:Permutation-Invariant Cascaded Attentive Set Operator. Requirements Python 3 torch = 1.0 n

Samira Zare 0 Dec 23, 2021
Implementation of ConvMixer for "Patches Are All You Need? 🤷"

Patches Are All You Need? 🤷 This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?" by Asher

CMU Locus Lab 934 Jan 08, 2023
Efficient Multi Collection Style Transfer Using GAN

Proposed a new model that can make style transfer from single style image, and allow to transfer into multiple different styles in a single model.

Zhaozheng Shen 2 Jan 15, 2022
ReferFormer - Official Implementation of ReferFormer

The official implementation of the paper: Language as Queries for Referring Vide

Jonas Wu 232 Dec 29, 2022
The Dual Memory is build from a simple CNN for the deep memory and Linear Regression fro the fast Memory

Simple-DMA a simple Dual Memory Architecture for classifications. based on the paper Dual-Memory Deep Learning Architectures for Lifelong Learning of

1 Jan 27, 2022
Denoising images with Fourier Ring Correlation loss

Denoising images with Fourier Ring Correlation loss The python code accompanies the working manuscript Image quality measurements and denoising using

2 Mar 12, 2022
Flexible-Modal Face Anti-Spoofing: A Benchmark

Flexible-Modal FAS This is the official repository of "Flexible-Modal Face Anti-

Zitong Yu 22 Nov 10, 2022
Repository of the paper Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models at ML4AD @ NeurIPS 2021.

Compressing Sensor Data for Remote Assistance of Autonomous Vehicles using Deep Generative Models Code and supplementary materials Repository of the p

Daniel Bogdoll 4 Jul 13, 2022
The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation"

SD-AANet The code is for the paper "A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation" [arxiv] Overview confi

cv516Buaa 9 Nov 07, 2022
(NeurIPS '21 Spotlight) IQ-Learn: Inverse Q-Learning for Imitation

Inverse Q-Learning (IQ-Learn) Official code base for IQ-Learn: Inverse soft-Q Learning for Imitation, NeurIPS '21 Spotlight IQ-Learn is an easy-to-use

Divyansh Garg 102 Dec 20, 2022
DRLib:A concise deep reinforcement learning library, integrating HER and PER for almost off policy RL algos.

DRLib:A concise deep reinforcement learning library, integrating HER and PER for almost off policy RL algos A concise deep reinforcement learning libr

329 Jan 03, 2023
Implementation for paper: Self-Regulation for Semantic Segmentation

Self-Regulation for Semantic Segmentation This is the PyTorch implementation for paper Self-Regulation for Semantic Segmentation, ICCV 2021. Citing SR

Dong ZHANG 30 Nov 21, 2022
Mscp jamf - Build compliance in jamf

mscp_jamf Build compliance in Jamf. This will build the following xml pieces to

Bob Gendler 3 Jul 25, 2022
GAN Image Generator and Characterwise Image Recognizer with python

MODEL SUMMARY 모델의 구조는 크게 6단계로 나뉩니다. STEP 0: Input Image Predict 할 이미지를 모델에 입력합니다. STEP 1: Make Black and White Image STEP 1 은 입력받은 이미지의 글자를 흑색으로, 배경을

Juwan HAN 1 Feb 09, 2022
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

Amazon Forest Computer Vision Satellite Image tagging code using PyTorch / Keras Here is a sample of images we had to work with Source: https://www.ka

Mamy Ratsimbazafy 360 Dec 10, 2022
Code for "LASR: Learning Articulated Shape Reconstruction from a Monocular Video". CVPR 2021.

LASR Installation Build with conda conda env create -f lasr.yml conda activate lasr # install softras cd third_party/softras; python setup.py install;

Google 157 Dec 26, 2022
Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

High-Performance Brain-to-Text Communication via Handwriting Overview This repo is associated with this manuscript, preprint and dataset. The code can

Francis R. Willett 306 Jan 03, 2023
High frequency AI based algorithmic trading module.

Flow Flow is a high frequency algorithmic trading module that uses machine learning to self regulate and self optimize for maximum return. The current

59 Dec 14, 2022