Fuzzy Overclustering (FOC)

Overview

In real-world datasets, we need consistent annotations between annotators to establish a certain ground-truth label. In many applications, however, such consistent annotations cannot be obtained due to issues like intra- and interobserver variability. We call these inconsistent labels fuzzy. Our method, Fuzzy Overclustering, overclusters the data and can therefore handle these fuzzy labels better than out-of-the-box semi-supervised methods.

More details are given in the accepted full paper at https://doi.org/10.3390/s21196661 or in the preprint at https://arxiv.org/abs/2012.01768.

The main idea is illustrated below. The graphic and caption are taken from the original work.

[Figure: main idea of the paper]

Illustration of fuzzy data and overclustering -- The grey dots represent unlabeled data and the colored dots labeled data from different classes. The dashed lines represent decision boundaries. For certain data, a clear separation of the different classes with one decision boundary is possible and both classes contain the same amount of data (top). For fuzzy data, determining a decision boundary is difficult because of intermediate datapoints between the classes (middle). These fuzzy datapoints can often not be easily sorted into one consistent class between annotators. If you overcluster the data, you get smaller but more consistent substructures in the fuzzy data (bottom). The images illustrate possible examples for certain data (cat & dog) and fuzzy plankton data (trichodesmium puff and tuft). The center plankton image was considered to be trichodesmium puff or tuft by around half of the annotators each. The left and right plankton images were consistently annotated as their respective class.

Installation

We advise using Docker for the experiments. We recommend a Python 3 container with TensorFlow 1.14 preinstalled. Additionally, the following commands need to be executed:

```bash
apt-get update
apt-get install -y libsm6 libxext6 libxrender-dev libgl1-mesa-glx
```
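For reference, one possible base image with Python 3 and TensorFlow 1.14 preinstalled is the official TensorFlow image; the exact tag below is an assumption, not something pinned by this repository:

```bash
# pull a Python 3 + TensorFlow 1.14 GPU image (tag is one possible choice)
docker pull tensorflow/tensorflow:1.14.0-gpu-py3
```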

After this, ensure that the requirements from requirements.txt are installed. The most important packages are keras, scipy and opencv.
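Inside the container this can be done with pip, assuming requirements.txt sits at the repository root:

```bash
# install the Python dependencies of the project
pip install -r requirements.txt
```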

Usage

The parameters are given in arguments.yaml together with their descriptions. Most of the parameters can be left at their default values. Especially the dataset, batch size and epoch-related parameters are important.
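Parameters such as the dataset, batch size and overcluster_k can also be set on the command line when calling main.py, as the example commands further down show; the values here are purely illustrative:

```bash
# illustrative parameter overrides (example values, not a recommended setting)
python main.py --dataset stl10 --overcluster_k 60 --batch_size 130
```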

As a rule of thumb the following should be applied:

  • overcluster_k = 5-6 * the number of classes
  • batch_size = repetition * overcluster_k * 2-3
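For example, STL-10 has 10 classes, so overcluster_k would be around 5-6 * 10 = 50-60. With a sample repetition of 1, this suggests a batch size of roughly 60 * 2 ≈ 120-130, which matches the example commands below (--overcluster_k 60 --batch_size 130); with --sample_repetition 3 the batch size grows accordingly (390 in the no-warmup example).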

You need to define three directories for the execution with docker:

  • DATASET_ROOT, this folder contains a folder with the dataset name. That folder contains a train and a val folder, and additionally needs a folder unlabeled if the parameter unlabeled_data is used. Each of these folders contains subfolders with the given classes (see the layout sketch after this list).
  • LOG_ROOT, inside a subdirectory logs all experimental results will be stored with regard to the given IDs and a time stamp
  • SRC_ROOT, the root of this project's source code
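Based on the description above, the expected layout under DATASET_ROOT looks roughly like this (the dataset and class names are placeholders):

```
DATASET_ROOT/
└── <dataset_name>/
    ├── train/
    │   ├── <class_1>/
    │   └── <class_2>/
    ├── val/
    │   ├── <class_1>/
    │   └── <class_2>/
    └── unlabeled/        (only needed if unlabeled_data is used)
```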

DOCKER_IMAGE is the image defined above.

You can visualize the results by running `tensorboard --logdir .` from inside the log directory.
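For example, assuming LOG_ROOT as defined above:

```bash
# results are written to the logs subdirectory inside LOG_ROOT;
# starting TensorBoard there shows all runs
cd <LOG_ROOT>/logs
tensorboard --logdir .
```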

Example Usages

In the commands below, <DATASET_ROOT>, <LOG_ROOT>, <SRC_ROOT> and <DOCKER_IMAGE> stand for the directories and Docker image defined in the Usage section above.

```bash
# test container
docker run -it --rm -v <DATASET_ROOT>:/data-ssd -v <LOG_ROOT>:/data1 -v <SRC_ROOT>:/src -w="/src" <DOCKER_IMAGE> bash
```

```bash
# test pipeline running
docker run -it --rm -v <DATASET_ROOT>:/data-ssd -v <LOG_ROOT>:/data1 -v <SRC_ROOT>:/src -w="/src" <DOCKER_IMAGE> python main.py --IDs foc experiment_name not_use_mi --dataset [email protected] --unlabeled_data [email protected] --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1 --normal_epoch 2 --frozen_epoch 1
```

```bash
# training FOC-Light
docker run -it --rm -v <DATASET_ROOT>:/data-ssd -v <LOG_ROOT>:/data1 -v <SRC_ROOT>:/home -w="/home" <DOCKER_IMAGE> python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 130 --batch_size 130 --overcluster_k 60 --num_gpus 1
```

```bash
# training FOC (no warmup)
# needs multiple GPUs or very large ones (change num_gpus to 1 in this case)
docker run -it --rm -v <DATASET_ROOT>:/data-ssd -v <LOG_ROOT>:/data1 -v <SRC_ROOT>:/home -w="/home" <DOCKER_IMAGE> python main.py --experiment_identifiers foc experiment_name not_use_mi --dataset stl10 --frozen_batch_size 390 --batch_size 390 --overcluster_k 60 --num_gpus 3 --lambda_m 1 --sample_repetition 3
```