Natural Language Processing for Adverse Drug Reaction (ADR) Detection

This repo contains code from a project to identify ADRs in discharge summaries at Austin Health. The model uses the HuggingFace Transformers library, beginning with the pretrained DeBERTa model. Further MLM pre-training is performed on a large corpus of unannotated discharge summaries. Finally, fine-tuning is peformed on a corpus of annotated discharge summaries (annotated using Prodigy). The model performs NER, but final performance is measured at the document level using the maximum token-level score.

We used Weights and Biases for experiment tracking.

The pretrain script takes a folder containing discharge summaries stored in CSV folders, tokenizes and continues MLM training on deberta-base.

Fine-tuning can then be performed with the finetune script using CLI commands. This script assumes the data is either a JSONL file of annotated text exported from Prodigy (--datafile example.jsonl), or a saved HuggingFace Datasets. If you run this script once on a JSONL file of annotations, you can choose to save the Dataset into a folder (--save_data_dir "save_to_here") and use this for subsequent training runs (--datafile "save_to_here").

Example usage:

python .\finetune.py --folds 5 --epochs 15 --lr 5e-5 --wandb_on --hub_off --project 'CLI Tests' --run_name cross-validation --datafile 'data'

Note: you might find that your exported annotations (JSONL file) is not encoded using UTF-8, which will prevent this code from working. There are various methods to change the encoding and these can all be found with a quick Google search. On a windows machine, for example, modify the following in powershell:

Get-Content .\name_of_file.jsonl -Encoding Unicode | Set-Content -Encoding UTF8 .\name_of_new_file.jsonl

Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Related tags

Overview

Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Owner

Medicines Optimisation Service - Austin Health

An end to end ASR Transformer model training repo

PyTorch Implementation of "Non-Autoregressive Neural Machine Translation"

Code for the paper "Language Models are Unsupervised Multitask Learners"

iBOT: Image BERT Pre-Training with Online Tokenizer

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

This converter will create the exact measure for your cappuccino recipe from the grandiose Rafaella Ballerini!

One Stop Anomaly Shop: Anomaly detection using two-phase approach: (a) pre-labeling using statistics, Natural Language Processing and static rules; (b) anomaly scoring using supervised and unsupervised machine learning.

Negative sampling for solving the unlabeled entity problem in NER. ICLR-2021 paper: Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.

apple's universal binaries BUT MUCH WORSE (PRACTICAL SHITPOST) (NOT PRODUCTION READY)

Score-Based Point Cloud Denoising (ICCV'21)

【原神】自动演奏风物之诗琴的程序

Estimation of the CEFR complexity score of a given word, sentence or text.

Code for ACL 2020 paper "Rigid Formats Controlled Text Generation"

Yodatranslator is a simple translator English to Yoda-language

A Streamlit web app that generates Rick and Morty stories using GPT2.

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Python library for Serbian Natural language processing (NLP)

Programme de chiffrement et de déchiffrement inverse d'un message en python3.

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

Text to speech for Vietnamese, ez to use, ez to update