Repository for the Bias Benchmark for QA dataset.

Related tags

Deep LearningBBQ
Overview

BBQ

Repository for the Bias Benchmark for QA dataset.

Authors: Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, and Samuel R. Bowman.

About BBQ

It is well documented that NLP models learnsocial biases present in the world, but littlework has been done to show how these biasesmanifest in actual model outputs for appliedtasks like question answering (QA). We introduce the Bias Benchmark for QA (BBQ), adataset consisting of question-sets constructedby the authors that highlightattestedsocialbiases against people belonging to protectedclasses along nine different social dimensionsrelevant for U.S. English-speaking contexts.Our task evaluates model responses at two distinct levels: (i) given an under-informative context, test how strongly model answers reflectsocial biases, and (ii) given an adequately informative context, test whether the model’s biases still override a correct answer choice. Wefind that models strongly rely on stereotypeswhen the context is ambiguous, meaning thatthe model’s outputs consistently reproduceharmful biases in this setting. Though modelsare much more accurate when the context provides an unambiguous answer, they still relyon stereotyped information and achieve an accuracy 2.5 percentage points higher on examples where the correct answer aligns with a social bias, with this accuracy difference widening to over 5 points for examples targeting gender.

The paper

You can read our paper "BBQ: A Hand-Built Bias Benchmark for Question Answering" here.

File structure

  • data
    • Description: This folder contains each set of generated examples for BBQ. This is the folder you would use to test BBQ.
    • Contents: 11 jsonl files, each containing all templated examples. Each category is a separate file.
  • results
    • Description: This folder contains our results after running BBQ on UnifiedQA
    • Contents: 11 jsonl files, each containing all templated examples and three sets of results for each example line:
      • Predictions using ARC-format
      • Predictions using RACE-format
      • Predictions using a question-only baseline
  • supplemental
    • Description: Additional files used in validation and selecting names for the vocabulary
    • Contents:
      • MTurk_validation contains the HIT templates, scripts, input data, and results from our MTurk validations
      • name_job_data contains files downloaded that contain name & demographic information or occupation prestige scores for developing these portions of the vocabulary
  • templates
    • Description: This folder contains all the templates and vocabulary used to create BBQ
    • Contents: 11 csv files that contain the templates used in BBQ, 1 csv file listing all filler items used in the validation, 2 csv files for the BBQ vocabulary.
Owner
ML² AT CILVR
The Machine Learning for Language Group at NYU CILVR
ML² AT CILVR
Infrastructure as Code (IaC) for a self-hosted version of Gnosis Safe on AWS

Welcome to Yearn Gnosis Safe! Setting up your local environment Infrastructure Deploying Gnosis Safe Prerequisites 1. Create infrastructure for secret

Numan 16 Jul 18, 2022
Automatic meme generation model using Tensorflow Keras.

Memefly You can find the project at MemeflyAI. Contributors Nick Buukhalter Harsh Desai Han Lee Project Overview Trello Board Product Canvas Automatic

BloomTech Labs 2 Jan 13, 2022
Implementation of "Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency"

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency (ICCV2021) Paper Link: https://arxiv.org/abs/2107.11355 This implementation bui

32 Nov 17, 2022
Council-GAN - Implementation for our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020)

Council-GAN Implementation of our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020) Paper Ori Nizan , Ayellet Tal, Breaking the Cycle

ori nizan 260 Nov 16, 2022
Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt)

Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt) Task Training huge unsupervised deep neural networks yields to strong progress in

Oliver Hahn 1 Jan 26, 2022
Minimal fastai code needed for working with pytorch

fastai_minima A mimal version of fastai with the barebones needed to work with Pytorch #all_slow Install pip install fastai_minima How to use This lib

Zachary Mueller 14 Oct 21, 2022
An implementation of MobileFormer

MobileFormer An implementation of MobileFormer proposed by Yinpeng Chen, Xiyang Dai et al. Including [1] Mobile-Former proposed in:

slwang9353 62 Dec 28, 2022
A PyTorch implementation of "Predict then Propagate: Graph Neural Networks meet Personalized PageRank" (ICLR 2019).

APPNP ⠀ A PyTorch implementation of Predict then Propagate: Graph Neural Networks meet Personalized PageRank (ICLR 2019). Abstract Neural message pass

Benedek Rozemberczki 329 Dec 30, 2022
Digan - Official PyTorch implementation of Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks

DIGAN (ICLR 2022) Official PyTorch implementation of "Generating Videos with Dyn

Sihyun Yu 147 Dec 31, 2022
Flask101 - FullStack Web Development with Python & JS - From TAQWA

Task: Create a CLI Calculator Step 0: Creating Virtual Environment $ python -m

Hossain Foysal 1 May 31, 2022
BanditPAM: Almost Linear-Time k-Medoids Clustering

BanditPAM: Almost Linear-Time k-Medoids Clustering This repo contains a high-performance implementation of BanditPAM from BanditPAM: Almost Linear-Tim

254 Dec 12, 2022
masscan + nmap + Finger

说明 个人根据使用习惯修改masnmap而来的一个小工具。调用masscan做全端口扫描,再调用nmap做服务识别,最后调用Finger做Web指纹识别。工具使用场景适合风险探测排查、众测等。 使用方法 安装依赖 pip3 install -r requirements.txt -i https:/

Ryan 3 Mar 25, 2022
Using machine learning to predict undergrad college admissions.

College-Prediction Project- Overview: Many have tried, many have failed. Few trailblazers are ambitious enought to chase acceptance into the top 15 un

John H Klinges 1 Jan 05, 2022
Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection".

A2S-USOD Code for our work "Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection". Code will be released upon

15 Dec 16, 2022
Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

QAConv Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting This PyTorch code is proposed in

Shengcai Liao 166 Dec 28, 2022
PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds

PCAM: Product of Cross-Attention Matrices for Rigid Registration of Point Clouds PCAM: Product of Cross-Attention Matrices for Rigid Registration of P

valeo.ai 24 May 31, 2022
Stock-Prediction - prediction of stock market movements using sentiment analysis and deep learning.

Stock-Prediction- In this project, we aim to enhance the prediction of stock market movements using sentiment analysis and deep learning. We divide th

5 Jan 25, 2022
Dungeons and Dragons randomized content generator

Component based Dungeons and Dragons generator Supports Entity/Monster Generation NPC Generation Weapon Generation Encounter Generation Environment Ge

Zac 3 Dec 04, 2021
An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models.

DeepNER An Easy-to-use, Modular and Prolongable package of deep-learning based Named Entity Recognition Models. This repository contains complex Deep

Derrick 9 May 30, 2022
Implementation of U-Net and SegNet for building segmentation

Specialized project Created by Katrine Nguyen and Martin Wangen-Eriksen as a part of our specialized project at Norwegian University of Science and Te

Martin.w-e 3 Dec 07, 2022