Toward Model Interpretability in Medical NLP

LING380: Topics in Computational Linguistics, Final Project

James Cross ([email protected]) and Daniel Kim ([email protected]), December 2021

Code Organization

data: contains the medical report data [LINK TO THAT REPO] used in model fine-tuning and analysis, the clinical stop words, and the accuracy and entropy metrics saved during evaluation

models: checkpoints of the best-performing BERT and BioBERT models after hyperparameter optimization

notebooks:

model_training.ipynb: code to train and fine-tune BERT and BioBERT

model_evaluation.ipynb: code to run various model evaluations, visualize word importances, perform post-training clinical stopword masking, and other analyses

scripts: the same functionality as the notebooks, provided as executable Python scripts and functions
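The evaluation notebook's clinical stopword masking and the entropy metrics saved under data can be illustrated roughly as follows. This is only a sketch under assumptions, not the project's implementation: the stop-word set, the function names mask_stopwords and prediction_entropy, and the choice of "[MASK]" as the replacement token are all hypothetical.

```python
import math
import re

# Hypothetical stop-word set for illustration only; the project's actual
# clinical stop words are stored in the data directory.
CLINICAL_STOPWORDS = {"patient", "history", "noted"}


def mask_stopwords(text, stopwords, mask_token="[MASK]"):
    """Replace every whole-word stop word (case-insensitive) with mask_token."""
    def replace(match):
        word = match.group(0)
        return mask_token if word.lower() in stopwords else word

    # Match runs of letters so punctuation and spacing are left untouched.
    return re.sub(r"[A-Za-z]+", replace, text)


def prediction_entropy(probs):
    """Shannon entropy (in nats) of a class-probability distribution."""
    # Zero-probability classes contribute nothing to the sum.
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    print(mask_stopwords("Patient has a history of asthma.", CLINICAL_STOPWORDS))
    print(prediction_entropy([0.5, 0.5]))
```

A masked report could then be passed to a fine-tuned model, and the entropy of its output probabilities compared with the entropy on the unmasked text.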

Dependencies

All packages needed to run the code are available in the default Google Colab environment (see the Colab documentation for the full list), with two exceptions: Hugging Face Transformers (transformers), used for loading transformer models, and Captum (captum), which provides access to a variety of model interpretation tools.
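A quick way to confirm that the two extra packages are present before opening the notebooks is sketched below. The helper name missing_packages is an assumption made for illustration; the package names transformers and captum are the ones listed above.

```python
import importlib.util

# Import names of the two packages not included in the default Colab environment.
EXTRA_PACKAGES = ["transformers", "captum"]


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXTRA_PACKAGES)
    if missing:
        # Suggest the install command rather than installing automatically.
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All extra packages are available.")
```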

How to run code

There are two ways to run the code: on Google Colab or locally on your machine.

Option 1) Google Colab

Model training notebook: [https://colab.research.google.com/drive/1uPIi-OVchs_8A-SNcQtLfwelr0ccsz19?usp=sharing]

Model evaluation/analysis notebook: [https://colab.research.google.com/drive/1Hfy58JvyPbx55lKKhQAzzrhJIbN_Io0j?usp=sharing]

Option 2) Local Machine

Notebooks: You can run the model_training.ipynb or model_evaluation.ipynb notebooks as is, changing directory paths when needed.
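One way to keep the directory paths in a single place when running locally is sketched here. The function project_paths and its base_dir argument are hypothetical; only the data and models directory names come from the layout described under Code Organization.

```python
from pathlib import Path


def project_paths(base_dir):
    """Build absolute paths to the data and models directories under base_dir."""
    base = Path(base_dir).expanduser().resolve()
    return {"data": base / "data", "models": base / "models"}


if __name__ == "__main__":
    # Assumes the script is started from the repository root.
    paths = project_paths(".")
    for name, path in paths.items():
        print(name, path, "(exists)" if path.exists() else "(missing)")
```

Setting base_dir once at the top of a notebook avoids editing each hard-coded path separately.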
