Asterisk*

Generating Training Data Made Easy

Asterisk is a framework for generating high-quality training datasets at scale. Instead of relying on end users to write heuristics by hand, it uses a small set of labeled data to automatically produce a set of heuristics that assign initial labels. To improve the quality of the generated labels, the framework refines the accuracies of the heuristics through a data-driven active learning (AL) process. During this process, the system examines the generated weak labels together with the modeled accuracies of the heuristics to help the learner decide which points the user should label.
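The workflow described above can be illustrated with a small, self-contained sketch. This is not Asterisk's API: every function name here (generate_heuristics, estimate_accuracies, weak_label_score, select_queries) is hypothetical, and the techniques are simple stand-ins chosen for illustration, namely one threshold rule per feature as the "heuristic", smoothed accuracy on the labeled set as the "modeled accuracy", and uncertainty sampling as the query strategy. The data is synthetic.

```python
import math
import random

ABSTAIN = 0  # labels are -1 / +1; 0 means the heuristic does not vote


def make_stump(feature, threshold, polarity, margin):
    """Heuristic that thresholds one feature and abstains near the threshold."""
    def heuristic(x):
        diff = x[feature] - threshold
        if abs(diff) < margin:
            return ABSTAIN
        return polarity if diff > 0 else -polarity
    return heuristic


def generate_heuristics(X, y, margin=0.25):
    """Build one threshold heuristic per feature from a small labeled set."""
    heuristics = []
    for f in range(len(X[0])):
        pos = [x[f] for x, label in zip(X, y) if label == 1]
        neg = [x[f] for x, label in zip(X, y) if label == -1]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        threshold = (mean_pos + mean_neg) / 2
        polarity = 1 if mean_pos > mean_neg else -1
        heuristics.append(make_stump(f, threshold, polarity, margin))
    return heuristics


def estimate_accuracies(heuristics, X, y):
    """Estimate each heuristic's accuracy on the labeled points where it votes."""
    accuracies = []
    for h in heuristics:
        votes = [(h(x), label) for x, label in zip(X, y) if h(x) != ABSTAIN]
        correct = sum(1 for v, label in votes if v == label)
        # Laplace smoothing keeps the estimate strictly between 0 and 1.
        accuracies.append((correct + 1) / (len(votes) + 2))
    return accuracies


def weak_label_score(heuristics, accuracies, x):
    """Accuracy-weighted vote: the sign is the weak label, the magnitude the confidence."""
    return sum(math.log(a / (1 - a)) * h(x) for h, a in zip(heuristics, accuracies))


def select_queries(heuristics, accuracies, X_unlabeled, budget):
    """Return indices of the points whose weighted vote is closest to zero."""
    scored = [(abs(weak_label_score(heuristics, accuracies, x)), i)
              for i, x in enumerate(X_unlabeled)]
    scored.sort()
    return [i for _, i in scored[:budget]]


# Synthetic data: two noisy features whose means depend on the label.
rng = random.Random(0)


def sample(label):
    return [rng.gauss(label, 1.0), rng.gauss(0.5 * label, 1.0)]


y_small = [1, -1] * 10                      # small labeled set
X_small = [sample(label) for label in y_small]
X_pool = [sample(rng.choice([-1, 1])) for _ in range(200)]  # unlabeled pool

heuristics = generate_heuristics(X_small, y_small)
accuracies = estimate_accuracies(heuristics, X_small, y_small)
weak_labels = [1 if weak_label_score(heuristics, accuracies, x) >= 0 else -1
               for x in X_pool]
queries = select_queries(heuristics, accuracies, X_pool, budget=5)
print("points to send to the user for true labels:", queries)
```

In a loop, the user's answers for the queried points would be added to the labeled set and the accuracies re-estimated, so the weak labels for the rest of the pool improve over successive rounds.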

Installation

To install Asterisk, you can use pip:

pip install asterisk

Alternatively, clone the Git repository and run the following from its root directory:

pip install -e .

Publications

M. Nashaat, A. Ghosh, J. Miller, and S. Quader, "Asterisk: Generating Large Training Datasets with Automatic Active Supervision," ACM Transactions on Data Science (TDS), May 2020.
M. Nashaat, A. Ghosh, J. Miller, and S. Quader, "WeSAL: Applying Active Supervision to Find High-quality Labels at Industrial Scale," Proceedings of the 53rd Hawaii International Conference on System Sciences, HI, USA, 2020, pp. 219-228.
M. Nashaat, A. Ghosh, J. Miller, S. Quader, C. Marston, and J. Puget, "Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets," 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 46-55. doi: 10.1109/BigData.2018.8622459.

Owner

Mona Nashaat
