Mixing up the Invariant Information clustering architecture, with self supervised concepts from SimCLR and MoCo approaches

Overview

Self Supervised clusterer

Combined IIC, and Moco architectures, with some SimCLR notions, to get state of the art unsupervised clustering while retaining interesting image latent representations in the feature space using contrastive learning.

Installation

Currently successfully tested on Ubuntu 18.04 and Ubuntu 20.04, with python 3.6 and 3.8

Works for Pytorch versions >= 1.4. Launch following command to install all pd

pip3 install -r requirements.txt

Logs

All information is logged to tensorboard. If you activate the neptune flag, you can also make logs to Neptune.ai.

Tensorboard

To check logs of your trainings using tensorboard, use the command :

tensorboard --logdir=./logs/NAME_OF_TEST/events

The NAME_OF_TEST is generated automatically for each automatic training you launch, composed of the inputed name of the training you chose (explained further below in commands), and the exact date and time when you launched the training. For example test_on_nocadozole_20210518-153531

Neptune

Before using neptune as a log and output control tool, you need to create a neptune account and get your developer token. Create a neptune_token.txt file and store the token in it.

Create in neptune a folder for your outputs, with a name of your choice, then go to main.py and modify from line 129 :

if args.offline :
    CONNECTION_MODE = "offline"
    run = neptune.init(project='USERNAME/PROJECT_NAME',# You should add your project name and username here
                   api_token=token,
                   mode=CONNECTION_MODE,
                   )
else :
    run = neptune.init(project='USERNAME/PROJECT_NAME',# You should add your project name and username here
               api_token=token,
               )

Preparing your own data

All datasets will be put in the ./data folder. As you might have to create various different datasets inside, create a folder inside for each dataset you use, while giving it a linux-friendly name.

To be completed

Commands

  • Adding the --labels command means you have ground truth for classes, and you wish to use it in evaluation

  • Adding the --neptune command means you wish to log your data in neptune (Check logging section)

  • output_k is the number of clusters

  • model_name is the name you'll use to keep track of this specific model. Date of training launch will be added to its name.

  • augmentation is the contrastive loss augmentation types you'll be using. They can be consulted and modified in the datasets/datasetgetter.py file.

  • epochs is the maximal number of epochs you wish to have. It is 1000 by default

  • batch_size is the training batch size. Default is 32

  • val_batch is the validation batch size. Default is 10

  • sty_dim is the size of the style vector. default is 128

  • img_size size of input images

  • --debug is a flag for activating debug mode, where the training is very fast, just to check if everything is working fine

training from scratch
python main.py --gpu 2  --output_k 9  --model_name=validating_best_image_transfer --augmentation BBC --data_type BBBC021_196  --data_folder N1 --neptune --img_size 196
training using pretrained model
python main.py --gpu 2  --output_k 9  --model_name=validating_best_image_transfer --augmentation improved_v2 --data_type BBBC021_196  --data_folder ND8D --labels --neptune --load_model testing_high_cluster_number_20210604-024131_
valiadtion using pretrained model
python main.py --gpu 2  --output_k 9  --model_name=validating_best_image_transfer --augmentation improved_v2 --data_type BBBC021_196  --data_folder ND8D --labels --validation --neptune --load_model testing_high_cluster_number_20210604-024131_
Owner
Bendidi Ihab
Computational Biologist & DL Eng
Bendidi Ihab
Tutorial for Decision Threshold In Machine Learning.

Decision-Threshold-ML Tutorial for improve skills: 'Decision Threshold In Machine Learning' (from GeeksforGeeks) by Marcus Mariano For more informatio

0 Jan 20, 2022
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.

TensorFlowOnSpark TensorFlowOnSpark brings scalable deep learning to Apache Hadoop and Apache Spark clusters. By combining salient features from the T

Yahoo 3.8k Jan 04, 2023
A collection of machine learning examples and tutorials.

machine_learning_examples A collection of machine learning examples and tutorials.

LazyProgrammer.me 7.1k Jan 01, 2023
A Lucid Framework for Transparent and Interpretable Machine Learning Models.

Currently a Beta-Version lucidmode is an open-source, low-code and lightweight Python framework for transparent and interpretable machine learning mod

lucidmode 15 Aug 12, 2022
A Python package to preprocess time series

Disclaimer: This package is WIP. Do not take any APIs for granted. tspreprocess Time series can contain noise, may be sampled under a non fitting rate

Maximilian Christ 57 Dec 17, 2022
A classification model capable of accurately predicting the price of secondhand cars

The purpose of this project is create a classification model capable of accurately predicting the price of secondhand cars. The data used for model building is open source and has been added to this

Akarsh Singh 2 Sep 13, 2022
A simple machine learning python sign language detection project.

SST Coursework 2022 About the app A python application that utilises the tensorflow object detection algorithm to achieve automatic detection of ameri

Xavier Koh 2 Jun 30, 2022
This repository has datasets containing information of Uber pickups in NYC from April 2014 to September 2014 and January to June 2015. data Analysis , virtualization and some insights are gathered here

uber-pickups-analysis Data Source: https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city Information about data set The dataset contain

B DEVA DEEKSHITH 1 Nov 03, 2021
TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models

538 Jan 01, 2023
MLBox is a powerful Automated Machine Learning python library.

MLBox is a powerful Automated Machine Learning python library. It provides the following features: Fast reading and distributed data preprocessing/cle

Axel 1.4k Jan 06, 2023
ArviZ is a Python package for exploratory analysis of Bayesian models

ArviZ (pronounced "AR-vees") is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, model checking, comparison and diagnostics

ArviZ 1.3k Jan 05, 2023
Machine Learning approach for quantifying detector distortion fields

DistortionML Machine Learning approach for quantifying detector distortion fields. This project is a feasibility study for training a surrogate model

Joel Bernier 1 Nov 05, 2021
Python package for concise, transparent, and accurate predictive modeling

Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use. 📚 docs • 📖 demo notebooks Modern

Chandan Singh 983 Jan 01, 2023
Bottleneck a collection of fast, NaN-aware NumPy array functions written in C.

Bottleneck Bottleneck is a collection of fast, NaN-aware NumPy array functions written in C. As one example, to check if a np.array has any NaNs using

Python for Data 835 Dec 27, 2022
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

eXtreme Gradient Boosting Community | Documentation | Resources | Contributors | Release Notes XGBoost is an optimized distributed gradient boosting l

Distributed (Deep) Machine Learning Community 23.6k Jan 03, 2023
Python Research Framework

Python Research Framework

EleutherAI 106 Dec 13, 2022
Machine learning that just works, for effortless production applications

Machine learning that just works, for effortless production applications

Elisha Yadgaran 16 Sep 02, 2022
A benchmark of data-centric tasks from across the machine learning lifecycle.

A benchmark of data-centric tasks from across the machine learning lifecycle.

61 Dec 28, 2022
Forecast dynamically at scale with this unique package. pip install scalecast

🌄 Scalecast: Dynamic Forecasting at Scale About This package uses a scaleable forecasting approach in Python with common scikit-learn and statsmodels

Michael Keith 158 Jan 03, 2023
Decision Weights in Prospect Theory

Decision Weights in Prospect Theory It's clear that humans are irrational, but how irrational are they? After some research into behavourial economics

Cameron Davidson-Pilon 32 Nov 08, 2021