Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"

Overview

This is the codebase for the paper: Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

Directory Structure

data/ --> data folder including splits we use for FEVER, zsRE, Wikidata5m, and LeapOfThought
training_reports/ --> folder to be populated with individual training run reports produced by main.py
result_sheets/ --> folder to be populated with .csv's of results from experiments produced by main.py
aggregated_results/ --> contains combined experiment results produced by run_jobs.py
outputs/ --> folder to be populated with analysis results, including belief graphs and bootstrap outputs
models/ --> contains model wrappers for Huggingface models and the learned optimizer code
data_utils/ --> contains scripts for making all datasets used in paper
main.py --> main script for all individual experiments in the paper
metrics.py --> functions for calculing metrics reported in the paper
utils.py --> data loading and miscellaneous utilities
run_jobs.py --> script for running groups of experiments
statistical_analysis.py --> script for running bootstraps with the experimental results
data_analysis.Rmd --> R markdown file that makes plots using .csv's in result_sheets
requirements.txt --> contains required packages

Requirements

The code is compatible with Python 3.6+. data_analysis.Rmd is an R markdown file that makes all the plots in the paper.

The required packages can be installed by running:

pip install -r requirements.txt

If you wish to visualize belief graphs, you should also install a few packages as so:

sudo apt install python-pydot python-pydot-ng graphviz

Making Data

We include the data splits from the paper in data/ (though the train split for Wikidata5m is divided into two files that need to be locally combined.) To construct the datasets from scratch, you can follow a few steps:

  1. Set the DATA_DIR environment variable to where you'd like the data to be stored. Set the CODE_DIR to point to the directory where this code is.
  2. Run the following blocks of code

Make FEVER and ZSRE

cd $DATA_DIR
git clone https://github.com/facebookresearch/KILT.git
cd KILT
mkdir data
python scripts/download_all_kilt_data.py
mv data/* ./
cd $CODE_DIR
python data_utils/shuffle_fever_splits.py
python data_utils/shuffle_zsre_splits.py

Make Leap-Of-Thought

cd $DATA_DIR
git clone https://github.com/alontalmor/LeapOfThought.git
cd LeapOfThought
python -m LeapOfThought.run -c Hypernyms --artiset_module soft_reasoning -o build_artificial_dataset -v training_mix -out taxonomic_reasonings.jsonl.gz
gunzip taxonomic_reasonings_training_mix_train.jsonl.gz taxonomic_reasonings_training_mix_dev.jsonl.gz taxonomic_reasonings_training_mix_test.jsonl.gz taxonomic_reasonings_training_mix_meta.jsonl.gz
cd $CODE_DIR
python data_utils/shuffle_leapofthought_splits.py

Make Wikidata5m

cd $DATA_DIR
mkdir Wikidata5m
cd Wikidata5m
wget https://www.dropbox.com/s/6sbhm0rwo4l73jq/wikidata5m_transductive.tar.gz
wget https://www.dropbox.com/s/lnbhc8yuhit4wm5/wikidata5m_alias.tar.gz
tar -xvzf wikidata5m_transductive.tar.gz
tar -xvzf wikidata5m_alias.tar.gz
cd $CODE_DIR
python data_utils/filter_wikidata.py

Experiment Replication

Experiment commands require a few arguments: --data_dir points to where the data is. --save_dir points to where models should be saved. --cache_dir points to where pretrained models will be stored. --gpu indicates the GPU device number. --seeds indicates how many seeds per condition to run. We give commands below for the experiments in the paper, saving everything in $DATA_DIR.

To train the task and prepare the necessary data for training learned optimizers, run:

python run_jobs.py -e task_model --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e write_LeapOfThought_preds --seeds 5 --dataset LeapOfThought --do_train false --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

To get the main experiments in a single-update setting, run:

python run_jobs.py -e learned_opt_main --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

For results in a sequential-update setting (with r=10) run:

python run_jobs.py -e learned_opt_r_main --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

To get the corresponding off-the-shelf optimizer baselines for these experiments, run

python run_jobs.py -e base_optimizers --seeds 5 --do_train false  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e base_optimizers_r_main --seeds 5 --do_train false  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

To get ablations across values of r for the learned optimizer and baselines, run

python run_jobs.py -e base_optimizers_r_ablation --seeds 1 --do_train false  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

Next we give commands for for ablations across k, the choice of training labels, the choice of evaluation labels, training objective terms, and a comparison to the objective from de Cao (in order):

python run_jobs.py -e learned_opt_k_ablation --seeds 1 --dataset ZSRE  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_label_ablation --seeds 1 --dataset ZSRE --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_eval_ablation --seeds 1 --dataset ZSRE  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_objective_ablation --seeds 1 --dataset all  --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR
python run_jobs.py -e learned_opt_de_cao --seeds 5 --dataset all --data_dir $DATA_DIR --save_dir $DATA_DIR --cache_dir $DATA_DIR

Analysis

Statistical Tests

After running an experiment from above, you can compute confidence intervals and hypothesis tests using statistical_analysis.py.

To get confidence intervals for the main single-update learned optimizer experiments, run

python statistical_analysis -e learned_opt_main -n 10000

To run hypothesis tests between statistics for the learned opt experiment and its baselines, run

python statistical_analysis -e learned_opt_main -n 10000 --hypothesis_tests true

You can substitute the experiment name for results for other conditions.

Belief Graphs

Add --save_dir, --cache_dir, and --data_dir arguments to the commands below per the instructions above.

Write preds from FEVER model:
python main.py --dataset FEVER --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --write_preds_to_file true

Write graph to file:
python main.py --dataset FEVER --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --optimizer adamw --lr 1e-6 --update_steps 100 --update_all_points true --write_graph_to_file true --use_dev_not_test false --num_random_other 10444

Analyze graph:
python main.py --dataset FEVER --probing_style model --probe linear --model roberta-base --seed 0 --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --use_dev_not_test false --optimizer adamw --lr 1e-6 --update_steps 100 --do_train false --do_eval false --pre_eval false --do_graph_analysis true

Combine LeapOfThought Main Inputs and Entailed Data:
python data_utils/combine_leapofthought_data.py

Write LeapOfThought preds to file:
python main.py --dataset LeapOfThought --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --write_preds_to_file true --leapofthought_main main

Write graph for LeapOfThought:
python main.py --dataset LeapOfThought --leapofthought_main main --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --optimizer sgd --update_steps 100 --lr 1e-2 --update_all_points true --write_graph_to_file true --use_dev_not_test false --num_random_other 8642

Analyze graph (add --num_eval_points 2000 to compute update-transitivity):
python main.py --dataset LeapOfThought --leapofthought_main main --probing_style model --probe linear --model roberta-base --seed 0 --do_train false --do_eval true --test_batch_size 64 --update_eval_truthfully false --fit_to_alt_labels true --update_beliefs true --optimizer sgd --update_steps 100 --lr 1e-2 --do_train false --do_eval false --pre_eval false --do_graph_analysis true

Plots

The data_analysis.Rmd R markdown file contains code for plots in the paper. It reads data from aggregated_results and saves plots in a ./figures directory.

Owner
Peter Hase
I am a PhD student in the UNC-NLP group at UNC Chapel Hill.
Peter Hase
Official PyTorch implementation of paper: Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation (ICCV 2021 Oral Presentation)

SML (ICCV 2021, Oral) : Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Standardi

SangHun 61 Dec 27, 2022
Free course that takes you from zero to Reinforcement Learning PRO 🦸🏻‍🦸🏽

The Hands-on Reinforcement Learning course 🚀 From zero to HERO 🦸🏻‍🦸🏽 Out of intense complexities, intense simplicities emerge. -- Winston Churchi

Pau Labarta Bajo 260 Dec 28, 2022
Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

CSE-Autoloss Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models

Peidong Liu(刘沛东) 54 Dec 17, 2022
MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data

MarcoPolo is a method to discover differentially expressed genes in single-cell RNA-seq data without depending on prior clustering Overview MarcoPolo

Chanwoo Kim 13 Dec 18, 2022
An implementation for Neural Architecture Search with Random Labels (CVPR 2021 poster) on Pytorch.

Neural Architecture Search with Random Labels(RLNAS) Introduction This project provides an implementation for Neural Architecture Search with Random L

18 Nov 08, 2022
A Bayesian cognition approach for belief updating of correlation judgement through uncertainty visualizations

Overview Code and supplemental materials for Karduni et al., 2020 IEEE Vis. "A Bayesian cognition approach for belief updating of correlation judgemen

Ryan Wesslen 1 Feb 08, 2022
Detecting drunk people through thermal images using Deep Learning (CNN)

Drunk Detection CNN Detecting drunk people through thermal images using Deep Learning (CNN) Dataset We used thermal images provided by Electronics Lab

Giacomo Ferretti 3 Oct 27, 2022
Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper

Ponder(ing) Transformer Implementation of a Transformer that learns to adapt the number of computational steps it takes depending on the difficulty of

Phil Wang 65 Oct 04, 2022
Datasets, tools, and benchmarks for representation learning of code.

The CodeSearchNet challenge has been concluded We would like to thank all participants for their submissions and we hope that this challenge provided

GitHub 1.8k Dec 25, 2022
List of awesome things around semantic segmentation 🎉

Awesome Semantic Segmentation List of awesome things around semantic segmentation 🎉 Semantic segmentation is a computer vision task in which we label

Dam Minh Tien 18 Nov 26, 2022
An Industrial Grade Federated Learning Framework

DOC | Quick Start | 中文 FATE (Federated AI Technology Enabler) is an open-source project initiated by Webank's AI Department to provide a secure comput

Federated AI Ecosystem 4.8k Jan 09, 2023
A framework for GPU based high-performance medical image processing and visualization

FAST is an open-source cross-platform framework with the main goal of making it easier to do high-performance processing and visualization of medical images on heterogeneous systems utilizing both mu

Erik Smistad 315 Dec 30, 2022
Get a Grip! - A robotic system for remote clinical environments.

Get a Grip! Within clinical environments, sterilization is an essential procedure for disinfecting surgical and medical instruments. For our engineeri

Jay Sharma 1 Jan 05, 2022
[NeurIPS 2021] “Improving Contrastive Learning on Imbalanced Data via Open-World Sampling”,

Improving Contrastive Learning on Imbalanced Data via Open-World Sampling Introduction Contrastive learning approaches have achieved great success in

VITA 24 Dec 17, 2022
2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

TableMASTER-mmocr Contents About The Project Method Description Dependency Getting Started Prerequisites Installation Usage Data preprocess Train Infe

Jianquan Ye 298 Dec 21, 2022
A python implementation of Physics-informed Spline Learning for nonlinear dynamics discovery

PiSL A python implementation of Physics-informed Spline Learning for nonlinear dynamics discovery. Sun, F., Liu, Y. and Sun, H., 2021. Physics-informe

Fangzheng (Andy) Sun 8 Jul 13, 2022
Yolact-keras实例分割模型在keras当中的实现

Yolact-keras实例分割模型在keras当中的实现 目录 性能情况 Performance 所需环境 Environment 文件下载 Download 训练步骤 How2train 预测步骤 How2predict 评估步骤 How2eval 参考资料 Reference 性能情况 训练数

Bubbliiiing 11 Dec 26, 2022
Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

t5-japanese Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts. The following is a list of models that

Kimio Kuramitsu 1 Dec 13, 2021
Code for Graph-to-Tree Learning for Solving Math Word Problems (ACL 2020)

Graph-to-Tree Learning for Solving Math Word Problems PyTorch implementation of Graph based Math Word Problem solver described in our ACL 2020 paper G

Jipeng Zhang 66 Nov 23, 2022
Soomvaar is the repo which 🏩 contains different collection of 👨‍💻🚀code in Python and 💫✨Machine 👬🏼 learning algorithms📗📕 that is made during 📃 my practice and learning of ML and Python✨💥

Soomvaar 📌 Introduction Soomvaar is the collection of various codes implement in machine learning and machine learning algorithms with python on coll

Felix-Ayush 42 Dec 30, 2022