A principled and practically effective framework for unsupervised accuracy estimation and error detection, with theoretical analysis and state-of-the-art performance.

Last update: Nov 21, 2022

Overview

Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles

This project is for the paper: Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles.

Experimental Results

Preliminaries

The code was tested under Ubuntu Linux 16.04.1 with Python 3.6 and requires some packages to be installed.

Downloading Datasets

MNIST-M: download it from Google Drive. Extract the files and place them in ./dataset/mnist_m/.
SVHN: download the Format 2 data (*.mat). Place the files in ./dataset/svhn/.
USPS: download the usps.h5 file. Place the file in ./dataset/usps/.
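
Before training, it can help to confirm that the dataset directories listed above are populated. The following is a minimal sketch, not part of the repository: the helper name `missing_datasets` is hypothetical, and it only checks that each directory exists and is non-empty, since the exact file names for MNIST-M and SVHN are not stated here.

```python
from pathlib import Path
from typing import List

# Directories taken from the download instructions above.
# The helper below is a hypothetical convenience, not a repository script.
DATASET_DIRS = {
    "mnist-m": Path("./dataset/mnist_m"),
    "svhn": Path("./dataset/svhn"),
    "usps": Path("./dataset/usps"),
}


def missing_datasets(root: Path = Path(".")) -> List[str]:
    """Return the names of datasets whose directory is absent or empty under root."""
    missing = []
    for name, rel_dir in DATASET_DIRS.items():
        data_dir = root / rel_dir
        # A directory that exists but holds no files counts as missing.
        if not data_dir.is_dir() or not any(data_dir.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_datasets():
        print(f"{name}: nothing found in {DATASET_DIRS[name]}")
```

Run it from the repository root so that the relative paths resolve as in the instructions above.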

Overview of the Code

train_model.py: train standard models via supervised learning.
train_dann.py: train domain adaptive (DANN) models.
eval_pipeline.py: evaluate various methods on all tasks.

Running Experiments

Examples

To train a standard model via supervised learning, you can use the following command:

python train_model.py --source-dataset {source dataset} --model-type {model type} --base-dir {directory to save the model}

{source dataset} can be mnist, mnist-m, svhn or usps.

{model type} can be typical_dnn or dann_arch.
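
When launching training from another Python script, the placeholders above can be validated before the command is built. This is a sketch under assumptions: the function name `train_model_command` and the example output directory `./checkpoints/mnist_typical_dnn` are hypothetical; only the script name, flags, and allowed values come from this README.

```python
import subprocess
import sys

# Allowed values as listed in this README.
SOURCE_DATASETS = ("mnist", "mnist-m", "svhn", "usps")
MODEL_TYPES = ("typical_dnn", "dann_arch")


def train_model_command(source_dataset, model_type, base_dir):
    """Build the argument list for train_model.py, rejecting unknown values."""
    if source_dataset not in SOURCE_DATASETS:
        raise ValueError(f"unknown source dataset: {source_dataset!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return [
        sys.executable, "train_model.py",
        "--source-dataset", source_dataset,
        "--model-type", model_type,
        "--base-dir", base_dir,
    ]


if __name__ == "__main__":
    # The output directory is a made-up example path.
    cmd = train_model_command("mnist", "typical_dnn", "./checkpoints/mnist_typical_dnn")
    print(" ".join(cmd))
    run = False  # set to True to actually start training
    if run:
        subprocess.run(cmd, check=True)
```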

To train a domain adaptive (DANN) model, you can use the following command:

python train_dann.py --source-dataset {source dataset} --target-dataset {target dataset} --base-dir {directory to save the model} [--test-time]

{source dataset} (or {target dataset}) can be mnist, mnist-m, svhn or usps.

The optional --test-time flag indicates that the target training dataset should be replaced with the target test dataset.
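
To see which DANN runs a full sweep would involve, the source/target pairs can be enumerated. This sketch assumes that only pairs with different source and target datasets are of interest, and the per-pair directory naming (`{source}_to_{target}`) is hypothetical; the repository's own run_all_dann_training.sh may be organized differently.

```python
import sys
from itertools import permutations

DATASETS = ("mnist", "mnist-m", "svhn", "usps")


def dann_commands(base_dir, test_time=False):
    """Build one train_dann.py argument list per ordered (source, target) pair."""
    commands = []
    # permutations(..., 2) yields ordered pairs with source != target (an assumption).
    for source, target in permutations(DATASETS, 2):
        cmd = [
            sys.executable, "train_dann.py",
            "--source-dataset", source,
            "--target-dataset", target,
            "--base-dir", f"{base_dir}/{source}_to_{target}",  # hypothetical layout
        ]
        if test_time:
            cmd.append("--test-time")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in dann_commands("./checkpoints/dann"):
        print(" ".join(cmd))
```

With four datasets this produces 4 × 3 = 12 commands.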

To evaluate a method on all training-test dataset pairs, you can use the following command:

python eval_pipeline.py --model-type {model type} --method {method}

{model type} can be typical_dnn or dann_arch.

{method} can be conf_avg, ensemble_conf_avg, conf, trust_score, proxy_risk, our_ri or our_rm.
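
The evaluation command can be swept over every model type and method listed above. The sketch below is an illustration rather than the repository's run_all_evaluation.sh: the function names are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from itertools import product

MODEL_TYPES = ("typical_dnn", "dann_arch")
METHODS = (
    "conf_avg", "ensemble_conf_avg", "conf", "trust_score",
    "proxy_risk", "our_ri", "our_rm",
)


def evaluation_commands():
    """Build one eval_pipeline.py argument list per (model type, method) combination."""
    return [
        [sys.executable, "eval_pipeline.py", "--model-type", model_type, "--method", method]
        for model_type, method in product(MODEL_TYPES, METHODS)
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in evaluation_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing evaluation.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Two model types and seven methods give 14 evaluation runs in total.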

Train All Models

You can run the following scripts to pre-train all models needed for the experiments.

run_all_model_training.sh: train all supervised learning models.
run_all_dann_training.sh: train all DANN models.
run_all_ensemble_training.sh: train all ensemble models.

Evaluate All Methods

You can run the following script to get the results reported in the paper.

run_all_evaluation.sh: evaluate all methods on all tasks.

Acknowledgements

Part of this code is inspired by estimating-generalization and TrustScore.

Citation

Please cite our work if you use the codebase:

@article{chen2021detecting,
  title={Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles},
  author={Chen, Jiefeng and Liu, Frederick and Avci, Besim and Wu, Xi and Liang, Yingyu and Jha, Somesh},
  journal={arXiv preprint arXiv:2106.15728},
  year={2021}
}

License

Please refer to the LICENSE.

Owner

Jiefeng Chen
