Weakly supervised medical named entity classification

Last update: Nov 18, 2022

Overview

Trove

Trove is a research framework for building weakly supervised (bio)medical named entity recognition (NER) and other entity attribute classifiers without hand-labeled training data.

The COVID-19 pandemic has underlined the need for faster, more flexible ways of building and sharing state-of-the-art NLP/NLU tools to analyze electronic health records, scientific literature, and social media. Likewise, recent research into language modeling and the dangers of uncurated, "unfathomably" large-scale training data underlines the broader need to approach training set creation itself with more transparency and rigour.

Trove provides tools for combining freely available supervision sources such as medical ontologies from the Unified Medical Language System (UMLS), common text heuristics, and other noisy labeling sources for use as entity labelers in weak supervision frameworks such as Snorkel, FlyingSquid and others. Technical details are available in our manuscript.

Trove has been used as part of several COVID-19 reseach efforts at Stanford.

Continuous symptom profiling of patients screened for SARS-CoV-2. We used a daily feed of patient notes from Stanford Health Care emergency departments to generate up-to-date COVID-19 symptom frequency data. Funded by the Bill & Melinda Gates Foundation.
Estimating the efficacy of symptom-based screening for COVID-19 published in npj Digitial Medicine.
Our COVID-19 symptom data was used by CMU's DELPHI group to prioritize selection of informative features from Google's Symptom Search Trends dataset.

Getting Started

Tutorials

See tutorials/ for Jupyter notebooks walking through an example NER application.

Installation

Requirements: Python 3.6 or later. We recomend using pip to install

pip install -r requirements.txt

Contributions

We welcome all contributions to the code base! Please submit a pull request and/or start a discussion on GitHub Issues.

Weakly supervised methods for programatically building and maintaining training sets provides new opportunities for the larger community to participate in the creation of important datasets. This is especially exciting in domains such as medicine, where sharing labeled data is often challening due to patient privacy concerns.

Inspired by recent efforts such as HuggingFace's Datasets library, we would love to start a conversation around how to support sharing labelers in service of mantaining an open task library, so that it is easier to create, deploy, and version control weakly supervised models.

Citation

If use Trove in your research, please cite us!

Fries, J.A., Steinberg, E., Khattar, S. et al. Ontology-driven weak supervision for clinical entity classification in electronic health records. Nat Commun 12, 2017 (2021). https://doi-org.stanford.idm.oclc.org/10.1038/s41467-021-22328-4

@article{fries2021trove,
  title={Ontology-driven weak supervision for clinical entity classification in electronic health records},
  author={Fries, Jason A and Steinberg, Ethan and Khattar, Saelig and Fleming, Scott L and Posada, Jose and Callahan, Alison and Shah, Nigam H},
  journal={Nature Communications},
  volume={12},
  number={1},
  year={2021},
  publisher={Nature Publishing Group}
}

Source Code For Template-Based Named Entity Recognition Using BART

Template-Based NER Source Code For Template-Based Named Entity Recognition Using BART Training Training train.py Inference inference.py Corpus ATIS (h

174 Dec 19, 2022

“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)

Data Augmentation for Cross-Domain Named Entity Recognition Authors: Shuguang Chen, Gustavo Aguilar, Leonardo Neves and Thamar Solorio This repository

18 Sep 10, 2022

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker This repository contai

12 Dec 14, 2022

Chinese named entity recognization with BiLSTM using Keras

Chinese named entity recognization (Bilstm with Keras) Project Structure ./ ├── README.md ├── data │ ├── README.md │ ├── data 数据集 │ │ ├─

1 Dec 17, 2021

An elaborate and exhaustive paper list for Named Entity Recognition (NER)

Named-Entity-Recognition-NER-Papers by Pengfei Liu, Jinlan Fu and other contributors. An elaborate and exhaustive paper list for Named Entity Recognit

388 Dec 18, 2022

[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

MuVER This repo contains the code and pre-trained model for our EMNLP 2021 paper: MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity

24 May 30, 2022

Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"

Medical-Transformer Pytorch Code for the paper "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation" About this repo: This repo

615 Dec 25, 2022

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

1.2k Jan 4, 2023

Build a medical knowledge graph based on Unified Language Medical System (UMLS)

UMLS-Graph Build a medical knowledge graph based on Unified Language Medical System (UMLS) Requisite Install MySQL Server 5.6 and import UMLS data int

6 Dec 25, 2022

Comments

Code in "applications" doesn't work; experiments not reproducible

A lot of the imports in the scripts under the "applications" subfolder fail - for example: from trove.utils import score_umls_ontologies from trove.labelers.norm import lowercase, strip_affixes

This makes it impossible to reproduce the experiments in the paper

Could you please share these functions (or the previous version of the repo) to make the code runnable? Thanks a lot in advance!

opened by tmadl 7
Bump joblib from 1.0.1 to 1.2.0
Bumps joblib from 1.0.1 to 1.2.0.

Changelog

Sourced from joblib's changelog.

Release 1.2.0

Fix a security issue where eval(pre_dispatch) could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327

Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256

Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263

Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with mmap_mode != None as the resulting numpy.memmap object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254

Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.

Vendor loky 3.3.0 which fixes several bugs including:

robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);

avoiding leaking worker processes in case of nested loky parallel calls;

reliability spawn the correct number of reusable workers.

Release 1.1.0

Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181

Fix joblib.Memory bug with the ignore parameter when the cached function is a decorated function.

... (truncated)

Commits

5991350 Release 1.2.0

3fa2188 MAINT cleanup numpy warnings related to np.matrix in tests (#1340)

cea26ff CI test the future loky-3.3.0 branch (#1338)

8aca6f4 MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)

067ed4f XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)

ac4ebd5 MAINT add back pytest warnings plugin (#1337)

a23427d Test child raises parent exits cleanly more reliable on macos (#1335)

ac09691 [MAINT] various test updates (#1334)

4a314b1 Vendor loky 3.2.0 (#1333)

bdf47e9 Make test_parallel_with_interactively_defined_functions_default_backend timeo...

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
UMLS. init_from_nlm_zip can't decode charmap
Describe the bug

I can't install the UMLS as directed by the tutorial notebooks. The UMLS object can't be initialized.

Steps to reproduce the bug

I downloaded the relevant zip file from the provided link (https://download.nlm.nih.gov/umls/kss/2020AB/umls-2020AB-metathesaurus.zip) and placed the file in the same directory as the 1_Installing_the_UMLS.ipynb notebook in the tutorials folder. Then I ran the notebook as given in the github.

Sample code to reproduce the bug

Expected results

A clear and concise description of the expected results.

Actual results

Specify the actual results or traceback.

The libraries and python version are all on the pdf attached

Environment info

datasets version:

Platform: 1_Installing_the_UMLS_with_decode_error.pdf

Python version:

The libraries and python version are all on the pdf attached

PyArrow version:

The libraries and python version are all on the pdf attached
opened by elsirdavid 3
from spacy.pipeline import SentenceSegmenter

Hi, thank you for the interesting piece of work!

When I tried to preprocess my data, I encountered this error: ImportError: cannot import name 'SentenceSegmenter' from 'spacy.pipeline' when running parse.py. I checked for spacy documentation and it doesn't have the SentenceSegmenter feature. May I get your advice on how to solve this issue?

Thank you so much!

opened by Yocodeyo 2

Releases(v0.1-alpha)

v0.1-alpha(Feb 3, 2021)

This is the codebase used to generate results for the manuscript "Ontology-driven weak supervision for medical entity classification."
Source code(tar.gz)
Source code(zip)

Owner

GitHub Repository

True per-item rarity for Loot

True-Rarity True per-item rarity for Loot (For Adventurers) and More Loot A.K.A mLoot each out/true_rarity_{item_type}.json file contains probabilitie

3 Jul 26, 2022

Instance-wise Occlusion and Depth Orders in Natural Scenes (CVPR 2022)

Instance-wise Occlusion and Depth Orders in Natural Scenes Official source code. Appears at CVPR 2022 This repository provides a new dataset, named In

27 Dec 27, 2022

Emotion classification of online comments based on RNN

emotion_classification Emotion classification of online comments based on RNN, the accuracy of the model in the test set reaches 99% data: Large Movie

1 Nov 23, 2021

Tandem Mass Spectrum Prediction with Graph Transformers

MassFormer This is the original implementation of MassFormer, a graph transformer for small molecule MS/MS prediction. Check out the preprint on arxiv

13 Oct 27, 2022

LibMTL: A PyTorch Library for Multi-Task Learning

LibMTL LibMTL is an open-source library built on PyTorch for Multi-Task Learning (MTL). See the latest documentation for detailed introductions and AP

765 Jan 06, 2023

Image Segmentation and Object Detection in Pytorch

Image Segmentation and Object Detection in Pytorch Pytorch-Segmentation-Detection is a library for image segmentation and object detection with report

732 Dec 10, 2022

Code for CPM-2 Pre-Train

CPM-2 Pre-Train Pre-train CPM-2 此分支为110亿非 MoE 模型的预训练代码，MoE 模型的预训练代码请切换到 moe 分支 CPM-2技术报告请参考link。 0 模型下载请在智源资源下载页面进行申请，文件介绍如下：文件名描述参数大小 100000.tar

136 Dec 28, 2022

a basic code repository for basic task in CV(classification,detection,segmentation)

basic_cv a basic code repository for basic task in CV(classification,detection,segmentation,tracking) classification generate dataset train predict de

1 Oct 15, 2021

Clockwork Convnets for Video Semantic Segmentation

Clockwork Convnets for Video Semantic Segmentation This is the reference implementation of arxiv:1608.03609: Clockwork Convnets for Video Semantic Seg

141 Nov 21, 2022

Github project for Attention-guided Temporal Coherent Video Object Matting.

Attention-guided Temporal Coherent Video Object Matting This is the Github project for our paper Attention-guided Temporal Coherent Video Object Matti

71 Dec 19, 2022

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

Universal Adversarial Triggers for Attacking and Analyzing NLP This is the official code for the EMNLP 2019 paper, Universal Adversarial Triggers for

248 Dec 17, 2022

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus General info This is

71 Oct 25, 2022

Weakly supervised medical named entity classification

Related tags

Overview

Trove

Getting Started

Tutorials

Installation

Contributions

Citation

You might also like...

Source Code For Template-Based Named Entity Recognition Using BART

“Data Augmentation for Cross-Domain Named Entity Recognition” (EMNLP 2021)

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Chinese named entity recognization with BiLSTM using Keras

An elaborate and exhaustive paper list for Named Entity Recognition (NER)

[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

Pytorch Code for "Medical Transformer: Gated Axial-Attention for Medical Image Segmentation"

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

Build a medical knowledge graph based on Unified Language Medical System (UMLS)

Comments

Code in "applications" doesn't work; experiments not reproducible

Bump joblib from 1.0.1 to 1.2.0

Release 1.2.0

Release 1.1.0

UMLS. init_from_nlm_zip can't decode charmap

Describe the bug

Steps to reproduce the bug

Sample code to reproduce the bug

Expected results

Actual results

Environment info

from spacy.pipeline import SentenceSegmenter

Releases(v0.1-alpha)

v0.1-alpha(Feb 3, 2021)

Owner

True per-item rarity for Loot

Instance-wise Occlusion and Depth Orders in Natural Scenes (CVPR 2022)

Emotion classification of online comments based on RNN

Tandem Mass Spectrum Prediction with Graph Transformers

LibMTL: A PyTorch Library for Multi-Task Learning

Image Segmentation and Object Detection in Pytorch

Code for CPM-2 Pre-Train

a basic code repository for basic task in CV(classification,detection,segmentation)

Clockwork Convnets for Video Semantic Segmentation

Github project for Attention-guided Temporal Coherent Video Object Matting.

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus

InvTorch: memory-efficient models with invertible functions

Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

Neural Turing Machine (NTM) & Differentiable Neural Computer (DNC) with pytorch & visdom

Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021

A graph adversarial learning toolbox based on PyTorch and DGL.

This is the source code of the 1st place solution for segmentation task (with Dice 90.32%) in 2021 CCF BDCI challenge.

Transformers based fully on MLPs

Easy way to add GoogleMaps to Flask applications. maintainer: @getcake