imbalanced-DL: Deep Imbalanced Learning in Python

Overview

imbalanced-DL: Deep Imbalanced Learning in Python

Overview

imbalanced-DL (imported as imbalanceddl) is a Python package designed to make deep imbalanced learning easier for researchers and real-world users. From our experiences, we observe that to tackcle deep imbalanced learning, there is a need for a strategy. That is, we may not just address this problem with one single model or approach. Thus in this package, we seek to provide several strategies for deep imbalanced learning. The package not only implements several popular deep imbalanced learning strategies, but also provides benchmark results on several image classification tasks. Futhermore, this package provides an interface for implementing more datasets and strategies.

Strategy

We provide some baseline strategies as well as some state-of-the-are strategies in this package as the following:

Environments

  • This package is tested on Linux OS.
  • You are suggested to use a different virtual environment so as to avoid package dependency issue.
  • For Pyenv & Virtualenv users, you can follow the below steps to create a new virtual environment or you can also skip this step.
Pyenv & Virtualenv (Optinal)
  • For dependency isolation, it's better to create another virtual environment for usage.
  • The following will be the demo for creating and managing virtual environment.
  • Install pyenv & virtualenv first.
  • pyenv virtualenv [version] [virtualenv_name]
    • For example, if you'd like to use python 3.6.8, you can do: pyenv virtualenv 3.6.8 TestEnv
  • mkdir [dir_name]
  • cd [dir_name]
  • pyenv local [virtualenv_name]
  • Then, you will have a new (clean) python virtual environment for the package installation.

Installation

Basic Requirement

  • Python >= 3.6
git clone https://github.com/ntucllab/imbalanced-DL.git
cd imbalanceddl
python -m pip install -r requirements.txt
python setup.py install

Usage

We highlight three key features of imbalanced-DL as the following:

(0) Imbalanced Dataset:

  • We support 5 benchmark image datasets for deep imbalanced learing.
  • To create and ImbalancedDataset object, you will need to provide a config_file as well as the dataset name you would like to use.
  • Specifically, inside the config_file, you will need to specify three key parameters for creating imbalanced dataset.
    • imb_type: you can choose from exp (long-tailed imbalance) or step imbalanced type.
    • imb_ratio: you can specify the imbalanceness of your data, typically researchers choose 0.1 or 0.01.
    • dataset_name: you can specify 5 benchmark image datasets we provide, or you can implement your own dataset.
    • For an example of the config_file, you can see example/config.
  • To contruct your own dataset, you should inherit from BaseDataset, and you can follow torchvision.datasets.ImageFolder to construct your dataset in PyTorch format.
from imbalanceddl.dataset.imbalance_dataset import ImbalancedDataset

# specify the dataset name
imbalance_dataset = ImbalancedDataset(config, dataset_name=config.dataset)

(1) Strategy Trainer:

  • We support 6 different strategies for deep imbalance learning, and you can either choose to train from scratch, or evaluate with the best model after training. To evaluate with the best model, you can get more in-depth metrics such as per class accuracy for further evaluation on the performance of the selected strategy. We provide one trained model in example/checkpoint_cifar10.
  • For each strategy trainer, it is associated with a config_file, ImbalancedDataset object, model, and strategy_name.
  • Specifically, the config_file will provide some training parameters, where the default settings for reproducing benchmark result can be found in example/config. You can also set these training parameters based on your own need.
  • For model, we currently provide resnet32 and resnet18 for reproducing the benchmark results.
  • We provide a build_trainer() function to return the specified trainer as the following.
from imbalanceddl.strategy.build_trainer import build_trainer

# specify the strategy
trainer = build_trainer(config,
                        imbalance_dataset,
                        model=model,
                        strategy=config.strategy)
# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()
  • Or you can also just select the specific strategy you would like to use as:
from imbalanceddl.strategy import LDAMDRWTrainer

# pick the trainer
trainer = LDAMDRWTrainer(config,
                         imbalance_dataset,
                         model=model,
                         strategy=config.strategy)

# train from scratch
trainer.do_train_val()

# Evaluate with best model
trainer.eval_best_model()
  • To construct your own strategy trainer, you need to inherit from Trainer class, where in your own strategy you will have to implement get_criterion() and train_one_epoch() method. After this you can choose whether to add your strategy to build_trainer() function or you can just use it as the above demonstration.

(2) Benchmark research environment:

  • To conduct deep imbalanced learning research, we provide example codes for training with different strategies, and provide benchmark results on five image datasets. To quickly start training CIFAR-10 with ERM strategy, you can do:
cd example
python main.py --gpu 0 --seed 1126 --c config/config_cifar10.yaml --strategy ERM

  • Following the example code, you can not only get results from baseline training as well as state-of-the-art performance such as LDAM or Remix, but also use this environment to develop your own algorithm / strategy. Feel free to add your own strategy into this package.
  • For more information about example and usage, please see the Example README

Benchmark Results

We provide benchmark results on 5 image datasets, including CIFAR-10, CIFAR-100, CINIC-10, SVHN, and Tiny-ImageNet. We follow standard procedure to generate imbalanced training dataset for these 5 datasets, and provide their top 1 validation accuracy results for research benchmark. For example, below you can see the result table of Long-tailed Imbalanced CIFAR-10 trained on different strategies. For more detailed benchmark results, please see example/README.md.

  • Long-tailed Imbalanced CIFAR-10
imb_type imb_factor Model Strategy Validation Top 1
long-tailed 100 ResNet32 ERM 71.23
long-tailed 100 ResNet32 DRW 75.08
long-tailed 100 ResNet32 LDAM-DRW 77.75
long-tailed 100 ResNet32 Mixup-DRW 82.11
long-tailed 100 ResNet32 Remix-DRW 81.82

Test

  • python -m unittest -v

Contact

If you have any question, please don't hesitate to email [email protected]. Thanks !

Acknowledgement

The authors thank members of the Computational Learning Lab at National Taiwan University for valuable discussions and various contributions to making this package better.

Owner
NTUCSIE CLLab
Computational Learning Lab, Dept. of Computer Science and Information Engineering, National Taiwan University
NTUCSIE CLLab
Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image (ICCV 2021)

Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color Image Interacting Two-Hand 3D Pose and Shape Reconstruction from Single Color

75 Dec 02, 2022
Starter code for the ICCV 2021 paper, 'Detecting Invisible People'

Detecting Invisible People [ICCV 2021 Paper] [Website] Tarasha Khurana, Achal Dave, Deva Ramanan Introduction This repository contains code for Detect

Tarasha Khurana 28 Sep 16, 2022
Towards Debiasing NLU Models from Unknown Biases

Towards Debiasing NLU Models from Unknown Biases Abstract: NLU models often exploit biased features to achieve high dataset-specific performance witho

Ubiquitous Knowledge Processing Lab 22 Jun 14, 2022
ICCV2021 Expert-Goal Trajectory Prediction

ICCV 2021: Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples This repository contains the code for the paper Where are yo

hz 21 Dec 12, 2022
Code for "AutoMTL: A Programming Framework for Automated Multi-Task Learning"

AutoMTL: A Programming Framework for Automated Multi-Task Learning This is the website for our paper "AutoMTL: A Programming Framework for Automated M

Ivy Zhang 40 Dec 04, 2022
This is the dataset for testing the robustness of various VO/VIO methods

KAIST VIO dataset This is the dataset for testing the robustness of various VO/VIO methods You can download the whole dataset on KAIST VIO dataset Ind

1 Sep 01, 2022
Source code of D-HAN: Dynamic News Recommendation with Hierarchical Attention Network

D-HAN The source code of D-HAN This is the source code of D-HAN: Dynamic News Recommendation with Hierarchical Attention Network. However, only the co

30 Sep 22, 2022
Reviving Iterative Training with Mask Guidance for Interactive Segmentation

This repository provides the source code for training and testing state-of-the-art click-based interactive segmentation models with the official PyTorch implementation

Visual Understanding Lab @ Samsung AI Center Moscow 406 Jan 01, 2023
Implementation of ToeplitzLDA for spatiotemporal stationary time series data.

Code for the ToeplitzLDA classifier proposed in here. The classifier conforms sklearn and can be used as a drop-in replacement for other LDA classifiers. For in-depth usage refer to the learning from

Jan Sosulski 5 Nov 07, 2022
Repository of best practices for deep learning in Julia, inspired by fastai

FastAI Docs: Stable | Dev FastAI.jl is inspired by fastai, and is a repository of best practices for deep learning in Julia. Its goal is to easily ena

FluxML 532 Jan 02, 2023
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.

Xcessiv Xcessiv is a tool to help you create the biggest, craziest, and most excessive stacked ensembles you can think of. Stacked ensembles are simpl

Reiichiro Nakano 1.3k Nov 17, 2022
PyTorch implementation of our paper How robust are discriminatively trained zero-shot learning models?

How robust are discriminatively trained zero-shot learning models? This repository contains the PyTorch implementation of our paper How robust are dis

Mehmet Kerim Yucel 5 Feb 04, 2022
Understanding the Properties of Minimum Bayes Risk Decoding in Neural Machine Translation.

Understanding Minimum Bayes Risk Decoding This repo provides code and documentation for the following paper: Müller and Sennrich (2021): Understanding

ZurichNLP 13 May 01, 2022
PASSL包含 SimCLR,MoCo,BYOL,CLIP等基于对比学习的图像自监督算法以及 Vision-Transformer,Swin-Transformer,BEiT,CVT,T2T,MLP_Mixer等视觉Transformer算法

PASSL Introduction PASSL is a Paddle based vision library for state-of-the-art Self-Supervised Learning research with PaddlePaddle. PASSL aims to acce

186 Dec 29, 2022
On Out-of-distribution Detection with Energy-based Models

On Out-of-distribution Detection with Energy-based Models This repository contains the code for the experiments conducted in the paper On Out-of-distr

Sven 19 Aug 07, 2022
This repo is for segmentation of T2 hyp regions in gliomas.

T2-Hyp-Segmentor This repo is for segmentation of T2 hyp regions in gliomas. By downloading the model from here you can use it to segment your T2w ima

1 Jan 18, 2022
Demonstration of the Model Training as a CI/CD System in Vertex AI

Model Training as a CI/CD System This project demonstrates the machine model training as a CI/CD system in GCP platform. You will see more detailed wo

Chansung Park 19 Dec 28, 2022
A PyTorch implementation of the paper "Semantic Image Synthesis via Adversarial Learning" in ICCV 2017

Semantic Image Synthesis via Adversarial Learning This is a PyTorch implementation of the paper Semantic Image Synthesis via Adversarial Learning. Req

Seonghyeon Nam 146 Nov 25, 2022
Investigating automatic navigation towards standard US views integrating MARL with the virtual US environment developed in CT2US simulation

AutomaticUSnavigation Investigating automatic navigation towards standard US views integrating MARL with the virtual US environment developed in CT2US

Cesare Magnetti 6 Dec 05, 2022
PromptDet: Expand Your Detector Vocabulary with Uncurated Images

PromptDet: Expand Your Detector Vocabulary with Uncurated Images Paper Website Introduction The goal of this work is to establish a scalable pipeline

103 Dec 20, 2022