Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets

Last update: Dec 30, 2022

Overview

Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses
metrics of uncertainty, consistency, and agreement with aggregate
loaders for popular crowdsourced datasets
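To make the "uncertainty" metric above concrete, here is a minimal sketch that scores each task by the Shannon entropy of its label distribution. It is not Crowd-Kit's own implementation; the function name `label_entropy` and the tuple-based record layout are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def label_entropy(records):
    """Return {task: entropy in bits} for (task, worker, label) records."""
    labels_by_task = defaultdict(list)
    for task, _worker, label in records:
        labels_by_task[task].append(label)

    entropies = {}
    for task, labels in labels_by_task.items():
        total = len(labels)
        # Sum p * log2(1/p) over the observed label frequencies.
        entropies[task] = sum(
            (n / total) * math.log2(total / n) for n in Counter(labels).values()
        )
    return entropies

records = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"),  # full agreement
    ("t2", "w1", "cat"), ("t2", "w2", "dog"),  # even split
]
entropies = label_entropy(records)
print(entropies)  # {'t1': 0.0, 't2': 1.0}
```

A task with zero entropy has unanimous answers, while higher values flag tasks that may need more annotations.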

The library is currently in a heavy development state, and interfaces are subject to change.

Installing

Installing Crowd-Kit is as easy as pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Releases before v1.0.0 used the column name performer instead of worker. Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the worker responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
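For readers curious about what a Dawid-Skene fit does conceptually, the following is a toy, pure-Python sketch of the EM procedure: posteriors start from vote fractions, the M-step estimates class priors and per-worker confusion matrices, and the E-step recomputes the posterior of each task's true label. It is an illustration under simplifying assumptions (tuple records, a hypothetical `dawid_skene` function, a small `eps` smoothing constant, no log-space arithmetic), not the library's implementation.

```python
from collections import defaultdict

def dawid_skene(records, n_iter=100, eps=1e-9):
    """Toy EM for the Dawid-Skene model on (task, worker, label) records."""
    labels = sorted({label for _, _, label in records})
    by_task = defaultdict(list)
    for task, worker, label in records:
        by_task[task].append((worker, label))

    # Initialise the posteriors with per-task vote fractions.
    post = {
        task: {z: sum(1 for _, given in answers if given == z) / len(answers)
               for z in labels}
        for task, answers in by_task.items()
    }

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices.
        prior = {z: sum(p[z] for p in post.values()) / len(post) for z in labels}
        conf = defaultdict(lambda: {z: {g: eps for g in labels} for z in labels})
        for task, answers in by_task.items():
            for worker, given in answers:
                for z in labels:
                    conf[worker][z][given] += post[task][z]
        for matrix in conf.values():
            for z in labels:
                row_sum = sum(matrix[z].values())
                for g in labels:
                    matrix[z][g] /= row_sum

        # E-step: posterior over the true label of every task.
        for task, answers in by_task.items():
            scores = {}
            for z in labels:
                score = prior[z]
                for worker, given in answers:
                    score *= conf[worker][z][given]
                scores[z] = score
            norm = sum(scores.values())
            post[task] = {z: s / norm for z, s in scores.items()}

    return {task: max(p, key=p.get) for task, p in post.items()}

records = [
    ("t1", "w1", "yes"), ("t1", "w2", "yes"), ("t1", "w3", "no"),
    ("t2", "w1", "no"), ("t2", "w2", "no"), ("t2", "w3", "yes"),
    ("t3", "w1", "yes"), ("t3", "w2", "yes"), ("t3", "w3", "yes"),
    ("t4", "w1", "no"), ("t4", "w2", "no"), ("t4", "w3", "no"),
]
toy_labels = dawid_skene(records)
print(toy_labels)
```

Multiplying raw probabilities is fine for a handful of answers per task; a production implementation would typically work in log space to avoid underflow.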

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available ( ✅ ) and in progress ( 🟡 ).

Categorical Responses

Method	Status
Majority Vote	✅
Dawid-Skene	✅
Gold Majority Vote	✅
M-MSR	✅
Wawa	✅
Zero-Based Skill	✅
GLAD	✅
BCC	🟡
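As an illustration of two of the simpler entries in this table, the sketch below implements a plain majority vote and a Wawa-style aggregation (Worker Agreement with Aggregate) as it is usually described: take the majority vote, score each worker by the share of their answers that match it, then repeat the vote with those scores as weights. The function names and the tuple record layout are assumptions for this example, and ties are broken in favour of the label seen first.

```python
from collections import Counter, defaultdict

def majority_vote(records, skills=None):
    """Pick the label with the largest (optionally skill-weighted) vote per task."""
    votes = defaultdict(Counter)
    for task, worker, label in records:
        votes[task][label] += 1.0 if skills is None else skills[worker]
    return {task: counter.most_common(1)[0][0] for task, counter in votes.items()}

def wawa(records):
    """Majority vote, then worker skills, then a skill-weighted majority vote."""
    first_pass = majority_vote(records)
    hits, totals = Counter(), Counter()
    for task, worker, label in records:
        totals[worker] += 1
        hits[worker] += label == first_pass[task]
    skills = {worker: hits[worker] / totals[worker] for worker in totals}
    return majority_vote(records, skills), skills

records = [
    ("t1", "w1", "A"), ("t1", "w2", "A"), ("t1", "w3", "B"),
    ("t2", "w1", "B"), ("t2", "w2", "B"), ("t2", "w3", "A"),
    ("t3", "w3", "B"), ("t3", "w1", "A"),  # a 1-1 tie under plain voting
]
plain = majority_vote(records)
weighted, skills = wawa(records)
print(plain["t3"], weighted["t3"])  # B A
```

In this toy data the tie on t3 is resolved differently once the more consistent worker's vote carries more weight.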

Textual Responses

Method	Status
RASA	✅
HRRASA	✅
ROVER	✅

Image Segmentation

Method	Status
Segmentation MV	✅
Segmentation RASA	✅
Segmentation EM	✅

Pairwise Comparisons

Method	Status
Bradley-Terry	✅
Noisy Bradley-Terry	✅
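For intuition about the Bradley-Terry row, here is a small sketch of the classical minorization-maximization update for Bradley-Terry strengths, fitted from (winner, loser) pairs. It is a textbook-style illustration rather than Crowd-Kit's code; the function name, the `n_iter` and `tol` arguments, and the input layout are assumptions, and the sketch assumes every item wins at least once.

```python
from collections import defaultdict

def bradley_terry(comparisons, n_iter=100, tol=1e-8):
    """Estimate item strengths from (winner, loser) pairs with the MM update."""
    wins = defaultdict(float)
    pair_counts = defaultdict(float)  # comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    scores = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        new_scores = {}
        for i in items:
            # Sum n_ij / (p_i + p_j) over every opponent j of item i.
            denom = sum(
                count / (scores[i] + scores[j])
                for pair, count in pair_counts.items() if i in pair
                for j in pair - {i}
            )
            new_scores[i] = wins[i] / denom
        total = sum(new_scores.values())
        new_scores = {i: s / total for i, s in new_scores.items()}
        converged = max(abs(new_scores[i] - scores[i]) for i in items) < tol
        scores = new_scores
        if converged:
            break
    return scores

comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 3 + [("c", "a")]
)
scores = bradley_terry(comparisons)
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c']
```

The scores are normalised to sum to one, so only their ratios carry meaning.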

Citation

Ustalov D., Pavlichenko N., Losev V., Giliazev I., and Tulin E. A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python. The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track. HCOMP 2021. 2021. arXiv: 2109.08584 [cs.HC].

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

To report bugs, please use the Toloka/bugreport page.
Join our English-speaking Slack community for both tech and abstract questions.

License

Comments

Crowd-Kit Learning

This is just an example of what this subpackage will contain.

We need to configure setup.cfg and add new tests. Here I suggest discussing the concept.

opened by pilot7747 10
Fix the documentation generation issues
Stick to YAML files hosted in https://github.com/Toloka/docs and use the proper includes.

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[x] I have updated the documentation accordingly.

[ ] I have added tests to cover my changes.

[ ] All new and existing tests passed.

documentation enhancement
opened by dustalov 9
Add MACE

Would it be possible to add MACE? It is often used in my field, but there is only a Java implementation that is hard to integrate into Python projects.
enhancement good first issue

opened by jcklie 4
Add MACE aggregation model
I have added the MACE aggregation model. https://www.cs.cmu.edu/~hovy/papers/13HLT-MACE.pdf

Description

Based on the original VB inference implementation, I wrote it in Python.

Connected issues (if any)

https://github.com/Toloka/crowd-kit/issues/5

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[x] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.
opened by pilot7747 3
Documentation updates
Updated index.md and the Classification section:

added extra information to the model descriptions;

added descriptions for parameters;

fixed errors and typos in descriptions.
opened by Natalyl3 2
Binary Relevance aggregation
Description

I have added code for Binary Relevance aggregation, a simple method for multi-label classification. This approach treats each label as a class in a binary classification task and aggregates it separately.
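The idea described in this pull request can be sketched as follows. Each label is treated as its own yes/no question and kept when a strict majority of the task's workers selected it. Using a majority vote as the per-label aggregator is an assumption of this sketch, as are the function name and the record layout; the actual pull request may differ.

```python
from collections import Counter, defaultdict

def binary_relevance(records):
    """Aggregate multi-label answers; records are (task, worker, labels)."""
    n_workers = Counter()
    label_votes = defaultdict(Counter)
    for task, _worker, labels in records:
        n_workers[task] += 1
        label_votes[task].update(set(labels))  # one vote per label per worker
    # Keep a label when more than half of the task's workers chose it.
    return {
        task: {label for label, n in label_votes[task].items()
               if n > n_workers[task] / 2}
        for task in n_workers
    }

records = [
    ("t1", "w1", {"cat", "dog"}), ("t1", "w2", {"cat"}), ("t1", "w3", {"cat", "bird"}),
    ("t2", "w1", {"dog"}), ("t2", "w2", {"dog", "bird"}), ("t2", "w3", {"bird"}),
]
multi_labels = binary_relevance(records)
print(multi_labels)
```

With a strict threshold, a label chosen by exactly half of the workers is dropped; a different tie rule is an equally reasonable choice.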

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[x] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.
opened by denaxen 2
Use mypy --strict
Description

This pull request enforces a stricter set of mypy type checks by enabling the strict mode. It also fixes several type inconsistencies. As the NumPy type annotations were introduced in version 1.20 (January 2021), some Crowd-Kit installations might break, but I believe it is a worthy contribution.

Connected issues (if any)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[x] Breaking change (fix or feature that would cause existing functionality to change)

[ ] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

enhancement
opened by dustalov 2
Run Jupyter notebooks with tests
Description

This pull request runs the Jupyter notebooks with examples on the current version of Crowd-Kit with the rest of the test suite on GitHub Actions.

Connected issues (if any)

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

enhancement good first issue
opened by dustalov 2
Dramatically improve the code maintainability
This pull request is probably the best thing that could happen to Crowd-Kit code maintainability.

Description

In this pull request, we switch from unnecessarily verbose Python stub files to more convenient inline type annotations. During this, many type annotations were fixed. We also removed the manage_docstring decorator and the corresponding utility functions.

I think this change might break the documentation generation process. We will release a new version of Crowd-Kit only after this is fixed.

Connected issues (if any)

Types of changes

[x] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[x] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[x] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[x] I have added tests to cover my changes.

[x] All new and existing tests passed.

bug documentation enhancement
opened by dustalov 2
Add header and LM-based aggregation item
Description

This pull request makes README.md nicer. It adds the missing language model-based textual aggregation method.

Connected issues (if any)

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)

[ ] New feature (non-breaking change which adds functionality)

[ ] Breaking change (fix or feature that would cause existing functionality to change)

[x] Documentation and examples improvement (changes affected documentation and/or examples)

Checklist:

[x] I have read the CONTRIBUTING document.

[x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

[ ] My change requires a change to the documentation.

[ ] I have updated the documentation accordingly.

[ ] I have added tests to cover my changes.

[x] All new and existing tests passed.

documentation
opened by dustalov 2
Renamed columns?

Hi, the guide says

df = pd.read_csv('results.csv') # should contain columns: task, performer, label

but when I load this file, then the second column is worker and not performer. I had used crowdkit with dataframes that had columns: task, performer, label, but after an update, it broke.

opened by jcklie 2
Ordinal Labels
Is it possible to support aggregation of ordinal labels as a part of this toolkit via this reduction algorithm?

Labels are categorical but have an ordering defined 1 < ... < K.

The K class ordinal labels are transformed into K−1 binary class label data.

Each of the binary tasks is then aggregated via crowdkit to estimate Pr[yi > c] for c = 1,...,K−1.

The probability of the actual class values can then be obtained as Pr[yi = c] = Pr[yi > c−1 and yi ≤ c] = Pr[yi > c−1] − Pr[yi > c].

The class with the maximum probability is assigned to the instance.
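The reduction proposed in this issue can be sketched in a few lines. Here the binary aggregator is just the fraction of votes above each threshold, which is a stand-in chosen for this example; with that stand-in the result coincides with the most frequent label, and the point is only to show the mechanics. A model-based binary aggregator could be substituted at the marked line. The function name and record layout are hypothetical.

```python
from collections import defaultdict

def aggregate_ordinal(records, n_classes):
    """Records are (task, worker, label) with integer labels 1..n_classes."""
    by_task = defaultdict(list)
    for task, _worker, label in records:
        by_task[task].append(label)

    result = {}
    for task, labels in by_task.items():
        # greater[c] estimates Pr[y > c]; Pr[y > 0] = 1 and Pr[y > K] = 0.
        greater = [1.0]
        for c in range(1, n_classes):
            # Stand-in binary aggregator: share of votes above the threshold.
            greater.append(sum(label > c for label in labels) / len(labels))
        greater.append(0.0)
        # Pr[y = c] = Pr[y > c-1] - Pr[y > c]
        probs = {c: greater[c - 1] - greater[c] for c in range(1, n_classes + 1)}
        result[task] = max(probs, key=probs.get)
    return result

records = [
    ("t1", "w1", 2), ("t1", "w2", 2), ("t1", "w3", 3),
    ("t2", "w1", 1), ("t2", "w2", 1), ("t2", "w3", 3),
]
ordinal_labels = aggregate_ordinal(records, n_classes=3)
print(ordinal_labels)  # {'t1': 2, 't2': 1}
```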

enhancement
opened by vikasraykar 2

Releases(v1.2.0)

v1.2.0(Dec 14, 2022)
Crowd-Kit Learning subpackage introducing implementations of deep learning from crowds methods: CoNAL and CrowdLayer

Added Multi-Binary aggregation

v1.2.0.rc1(Dec 13, 2022)

v1.1.0(Sep 27, 2022)
New aggregation methods: One-Coin Dawid Skene, MACE, and KOS

Fixed bugs in Dawid-Skene implementation

Improved maintainability by removing stub files

Switched to setup.cfg from setup.py

v1.1.0.rc4(Sep 26, 2022)

v1.1.0.rc3(Sep 23, 2022)

v1.1.0.rc2(Jul 28, 2022)

v1.1.0.rc1(Jul 28, 2022)

v1.0.0(Mar 22, 2022)
Not a backward-compatible change:

Replaced all mentions of "performer" with "worker". This change is not backward compatible because parameter names and DataFrame/Series columns are also affected.
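For data prepared with the older column name, one possible migration is a plain pandas rename before passing the frame to an aggregator. The sample frame below is hypothetical and only illustrates the column change described in this release note.

```python
import pandas as pd

# Hypothetical annotations that still use the pre-1.0.0 column name.
old_df = pd.DataFrame({
    "task": ["t1", "t1", "t2"],
    "performer": ["w1", "w2", "w1"],
    "label": ["cat", "cat", "dog"],
})

# Rename the column so the frame matches the post-1.0.0 naming.
new_df = old_df.rename(columns={"performer": "worker"})
print(list(new_df.columns))  # ['task', 'worker', 'label']
```

`rename` returns a new DataFrame, so the original frame is left untouched.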

Improvements:

GoldMajorityVote true_labels argument now supports multiple ground truth values for a single task.

Added the tol optimization parameter as a tolerance stopping criterion for iterative methods with a variable number of steps.

Python 3.10 support added.

Enhanced aggregation methods descriptions.

v0.0.9(Nov 30, 2021)
Added TextSummarization aggregation

Added new datasets

Added entropy_threshold method

Added names for pd.Series which are available after fit

Added on_missing_skill and default_skill params for models that use skills

v0.0.8(Oct 14, 2021)
Added GLAD aggregation

Fixed https://github.com/Toloka/crowd-kit/issues/6

Fixed https://github.com/Toloka/crowd-kit/issues/3

v0.0.7(Sep 2, 2021)
Added segmentation EM

Added ROVER

Fixed HRRASA and refactored TextRASA and TextHRRASA

v0.0.6(Aug 18, 2021)

crowd-kit==0.0.6 release
v0.0.5(Jul 18, 2021)

v0.0.4(May 19, 2021)

v0.0.3(Apr 12, 2021)

v0.0.2(Apr 7, 2021)

v0.0.1(Mar 2, 2021)


Owner

Toloka

Data labeling platform for ML
