Weakly supervised medical named entity classification

Overview

Trove

Documentation Status license

Trove is a research framework for building weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers without hand-labeled training data.

The COVID-19 pandemic has underlined the need for faster, more flexible ways of building and sharing state-of-the-art NLP/NLU tools to analyze electronic health records, scientific literature, and social media. Likewise, recent research into language modeling and the dangers of uncurated, "unfathomably" large-scale training data underlines the broader need to approach training set creation itself with more transparency and rigour.

Trove provides tools for combining freely available supervision sources such as medical ontologies from the Unified Medical Language System (UMLS), common text heuristics, and other noisy labeling sources for use as entity labelers in weak supervision frameworks such as Snorkel, FlyingSquid and others. Technical details are available in our manuscript.

Trove has been used as part of several COVID-19 reseach efforts at Stanford.

Getting Started

Tutorials

See tutorials/ for Jupyter notebooks walking through an example NER application.

Installation

Requirements: Python 3.6 or later. We recomend using pip to install

pip install -r requirements.txt

Contributions

We welcome all contributions to the code base! Please submit a pull request and/or start a discussion on GitHub Issues.

Weakly supervised methods for programatically building and maintaining training sets provides new opportunities for the larger community to participate in the creation of important datasets. This is especially exciting in domains such as medicine, where sharing labeled data is often challening due to patient privacy concerns.

Inspired by recent efforts such as HuggingFace's Datasets library, we would love to start a conversation around how to support sharing labelers in service of mantaining an open task library, so that it is easier to create, deploy, and version control weakly supervised models.

Citation

If use Trove in your research, please cite us!

Fries, J.A., Steinberg, E., Khattar, S. et al. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat Commun 12, 2017 (2021). https://doi-org.stanford.idm.oclc.org/10.1038/s41467-021-22328-4

@article{fries2021trove,
  title={Ontology-driven weak supervision for clinical entity classification in electronic health records},
  author={Fries, Jason A and Steinberg, Ethan and Khattar, Saelig and Fleming, Scott L and Posada, Jose and Callahan, Alison and Shah, Nigam H},
  journal={Nature Communications},
  volume={12},
  number={1},
  year={2021},
  publisher={Nature Publishing Group}
}
You might also like...
Source Code For Template-Based Named Entity Recognition Using BART

Template-Based NER Source Code For Template-Based Named Entity Recognition Using BART Training Training train.py Inference inference.py Corpus ATIS (h

“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)
“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)

Data Augmentation for Cross-Domain Named Entity Recognition Authors: Shuguang Chen, Gustavo Aguilar, Leonardo Neves and Thamar Solorio This repository

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker
Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker This repository contai

Chinese named entity recognization with BiLSTM using Keras

Chinese named entity recognization (Bilstm with Keras) Project Structure ./ ├── README.md ├── data │   ├── README.md │   ├── data 数据集 │   │   ├─

An elaborate and exhaustive paper list for Named Entity Recognition (NER)

Named-Entity-Recognition-NER-Papers by Pengfei Liu, Jinlan Fu and other contributors. An elaborate and exhaustive paper list for Named Entity Recognit

[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

MuVER This repo contains the code and pre-trained model for our EMNLP 2021 paper: MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity

Pytorch Code for
Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"

Medical-Transformer Pytorch Code for the paper "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation" About this repo: This repo

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.
The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

Build a medical knowledge graph based on Unified Language Medical System (UMLS)

UMLS-Graph Build a medical knowledge graph based on Unified Language Medical System (UMLS) Requisite Install MySQL Server 5.6 and import UMLS data int

Comments
  • Code in

    Code in "applications" doesn't work; experiments not reproducible

    A lot of the imports in the scripts under the "applications" subfolder fail - for example: from trove.utils import score_umls_ontologies from trove.labelers.norm import lowercase, strip_affixes

    This makes it impossible to reproduce the experiments in the paper

    Could you please share these functions (or the previous version of the repo) to make the code runnable? Thanks a lot in advance!

    opened by tmadl 7
  • Bump joblib from 1.0.1 to 1.2.0

    Bump joblib from 1.0.1 to 1.2.0

    Bumps joblib from 1.0.1 to 1.2.0.

    Changelog

    Sourced from joblib's changelog.

    Release 1.2.0

    • Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

    • Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

    • Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

    • Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

    • Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

    • Vendor loky 3.3.0 which fixes several bugs including:

      • robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

      • avoiding leaking worker processes in case of nested loky parallel calls;

      • reliability spawn the correct number of reusable workers.

    Release 1.1.0

    • Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

    • Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

    ... (truncated)

    Commits
    • 5991350 Release 1.2.0
    • 3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)
    • cea26ff CI test the future loky-3.3.0 branch (#1338)
    • 8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)
    • 067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)
    • ac4ebd5 MAINT add back pytest warnings plugin (#1337)
    • a23427d Test child raises parent exits cleanly more reliable on macos (#1335)
    • ac09691 [MAINT] various test updates (#1334)
    • 4a314b1 Vendor loky 3.2.0 (#1333)
    • bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • UMLS. init_from_nlm_zip can't decode charmap

    UMLS. init_from_nlm_zip can't decode charmap

    Describe the bug

    I can't install the UMLS as directed by the tutorial notebooks. The UMLS object can't be initialized.

    Steps to reproduce the bug

    I downloaded the relevant zip file from the provided link (https://download.nlm.nih.gov/umls/kss/2020AB/umls-2020AB-metathesaurus.zip) and placed the file in the same directory as the 1_Installing_the_UMLS.ipynb notebook in the tutorials folder. Then I ran the notebook as given in the github.

    Sample code to reproduce the bug

    Expected results

    A clear and concise description of the expected results.

    Actual results

    Specify the actual results or traceback.

    The libraries and python version are all on the pdf attached

    Environment info

    troveDecodeError

    • Python version:

    The libraries and python version are all on the pdf attached

    • PyArrow version:

    The libraries and python version are all on the pdf attached

    opened by elsirdavid 3
  • from spacy.pipeline import SentenceSegmenter

    from spacy.pipeline import SentenceSegmenter

    Hi, thank you for the interesting piece of work!

    When I tried to preprocess my data, I encountered this error: ImportError: cannot import name 'SentenceSegmenter' from 'spacy.pipeline' when running parse.py. I checked for spacy documentation and it doesn't have the SentenceSegmenter feature. May I get your advice on how to solve this issue?

    Thank you so much!

    opened by Yocodeyo 2
Releases(v0.1-alpha)
  • v0.1-alpha(Feb 3, 2021)

Source code for From Stars to Subgraphs

GNNAsKernel Official code for From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness Visualizations GNN-AK(+) GNN-AK(+) with Subgra

44 Dec 19, 2022
An MQA (Studio, originalSampleRate) identifier for lossless flac files written in Python.

An MQA (Studio, originalSampleRate) identifier for "lossless" flac files written in Python.

Daniel 10 Oct 03, 2022
Container : Context Aggregation Network

Container : Context Aggregation Network If you use this code for a paper please cite: @article{gao2021container, title={Container: Context Aggregati

AI2 47 Dec 16, 2022
DTCN SMP Challenge - Sequential prediction learning framework and algorithm

DTCN This is the implementation of our paper "Sequential Prediction of Social Me

Bobby 2 Jan 24, 2022
This is a collection of simple PyTorch implementations of neural networks and related algorithms. These implementations are documented with explanations,

labml.ai Deep Learning Paper Implementations This is a collection of simple PyTorch implementations of neural networks and related algorithms. These i

labml.ai 16.4k Jan 09, 2023
Facilitates implementing deep neural-network backbones, data augmentations

Introduction Nowadays, the training of Deep Learning models is fragmented and unified. When AI engineers face up with one specific task, the common wa

40 Dec 29, 2022
Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

tonne 1.4k Dec 29, 2022
Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm.

REDQ source code Author's PyTorch implementation of Randomized Ensembled Double Q-Learning (REDQ) algorithm. Paper link: https://arxiv.org/abs/2101.05

109 Dec 16, 2022
Very deep VAEs in JAX/Flax

Very Deep VAEs in JAX/Flax Implementation of the experiments in the paper Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on I

Jamie Townsend 42 Dec 12, 2022
StyleMapGAN - Official PyTorch Implementation

StyleMapGAN - Official PyTorch Implementation StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing Hyunsu Kim, Yunj

NAVER AI 425 Dec 23, 2022
Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation Woncheol Shin1, Gyubok Lee1, Jiyoung Lee1, Joonseok Lee2,3, Edward Ch

Woncheol Shin 7 Sep 26, 2022
i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery This is a public code repository for the publication: i-SpaSP: Structured Neural Pruning

Cameron Ronald Wolfe 5 Nov 04, 2022
Practical Single-Image Super-Resolution Using Look-Up Table

Practical Single-Image Super-Resolution Using Look-Up Table [Paper] Dependency Python 3.6 PyTorch glob numpy pillow tqdm tensorboardx 1. Training deep

Younghyun Jo 116 Dec 23, 2022
Hypercomplex Neural Networks with PyTorch

HyperNets Hypercomplex Neural Networks with PyTorch: this repository would be a container for hypercomplex neural network modules to facilitate resear

Eleonora Grassucci 21 Dec 27, 2022
A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

SVHNClassifier-PyTorch A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks If

Potter Hsu 182 Jan 03, 2023
Pgn2tex - Scripts to convert pgn files to latex document. Useful to build books or pdf from pgn studies

Pgn2Latex (WIP) A simple script to make pdf from pgn files and studies. It's sti

12 Jul 23, 2022
This GitHub repository contains code used for plots in NeurIPS 2021 paper 'Stochastic Multi-Armed Bandits with Control Variates.'

About Repository This repository contains code used for plots in NeurIPS 2021 paper 'Stochastic Multi-Armed Bandits with Control Variates.' About Code

Arun Verma 1 Nov 09, 2021
Neural Turing Machine (NTM) & Differentiable Neural Computer (DNC) with pytorch & visdom

Neural Turing Machine (NTM) & Differentiable Neural Computer (DNC) with pytorch & visdom Sample on-line plotting while training(avg loss)/testing(writ

Jingwei Zhang 269 Nov 15, 2022
Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in Pytorch

Bootstrap Your Own Latent (BYOL), in Pytorch Practical implementation of an astoundingly simple method for self-supervised learning that achieves a ne

Phil Wang 1.4k Dec 29, 2022
Deploy optimized transformer based models on Nvidia Triton server

🤗 Hugging Face Transformer submillisecond inference 🤯 and deployment on Nvidia Triton server Yes, you can perfom inference with transformer based mo

Lefebvre Sarrut Services 1.2k Jan 05, 2023