[CVPR2022] This repository contains code for the paper "Nested Collaborative Learning for Long-Tailed Visual Recognition", published at CVPR 2022

Related tags

Data AnalysisNCL
Overview

Nested Collaborative Learning for Long-Tailed Visual Recognition

This repository is the official PyTorch implementation of the paper in CVPR 2022:

Nested Collaborative Learning for Long-Tailed Visual Recognition
Jun Li, Zichang Tan, Jun Wan, Zhen Lei, Guodong Guo
[PDF]  

 

Main requirements

torch >= 1.7.1 #This is the version I am using, other versions may be accteptable, if there is any problem, go to https://pytorch.org/get-started/previous-versions/ to get right version(espicially CUDA) for your machine.
tensorboardX >= 2.1 #Visualization of the training process.
tensorflow >= 1.14.0 #convert long-tailed cifar datasets from tfrecords to jpgs.
Python 3.6 #This is the version I am using, other versions(python 3.x) may be accteptable.

Detailed requirement

pip install -r requirements.txt

Prepare datasets

This part is mainly based on https://github.com/zhangyongshun/BagofTricks-LT

We provide three datasets in this repo: long-tailed CIFAR (CIFAR-LT), long-tailed ImageNet (ImageNet-LT), iNaturalist 2018 (iNat18) and Places_LT.

The detailed information of these datasets are shown as follows:

Datasets CIFAR-10-LT CIFAR-100-LT ImageNet-LT iNat18 Places_LT
Imbalance factor
100 50 100 50
Training images 12,406 13,996 10,847 12,608 11,5846 437,513 62,500
Classes 50 50 100 100 1,000 8,142 365
Max images 5,000 5,000 500 500 1,280 1,000 4,980
Min images 50 100 5 10 5 2 5
Imbalance factor 100 50 100 50 256 500 996
-"Max images" and "Min images" represents the number of training images in the largest and smallest classes, respectively.

-"CIFAR-10-LT-100" means the long-tailed CIFAR-10 dataset with the imbalance factor beta = 100.

-"Imbalance factor" is defined as: beta = Max images / Min images.

  • Data format

The annotation of a dataset is a dict consisting of two field: annotations and num_classes. The field annotations is a list of dict with image_id, fpath, im_height, im_width and category_id.

Here is an example.

{
    'annotations': [
                    {
                        'image_id': 1,
                        'fpath': '/data/iNat18/images/train_val2018/Plantae/7477/3b60c9486db1d2ee875f11a669fbde4a.jpg',
                        'im_height': 600,
                        'im_width': 800,
                        'category_id': 7477
                    },
                    ...
                   ]
    'num_classes': 8142
}
  • CIFAR-LT

    Cui et al., CVPR 2019 firstly proposed the CIFAR-LT. They provided the download link of CIFAR-LT, and also the codes to generate the data, which are in TensorFlow.

    You can follow the steps below to get this version of CIFAR-LT:

    1. Download the Cui's CIFAR-LT in GoogleDrive or Baidu Netdisk (password: 5rsq). Suppose you download the data and unzip them at path /downloaded/data/.
    2. Run tools/convert_from_tfrecords, and the converted CIFAR-LT and corresponding jsons will be generated at /downloaded/converted/.
    # Convert from the original format of CIFAR-LT
    python tools/convert_from_tfrecords.py  --input_path /downloaded/data/ --output_path /downloaded/converted/
  • ImageNet-LT

    You can use the following steps to convert from the original images of ImageNet-LT.

    1. Download the original ILSVRC-2012. Suppose you have downloaded and reorgnized them at path /downloaded/ImageNet/, which should contain two sub-directories: /downloaded/ImageNet/train and /downloaded/ImageNet/val.
    2. Directly replace the data root directory in the file dataset_json/ImageNet_LT_train.json, dataset_json/ImageNet_LT_val.json,You can handle this with any editor, or use the following command.
    # replace data root
    python tools/replace_path.py --json_file dataset_json/ImageNet_LT_train.json --find_root /media/ssd1/lijun/ImageNet_LT --replaces_to /downloaded/ImageNet
    
    python tools/replace_path.py --json_file dataset_json/ImageNet_LT_val.json --find_root /media/ssd1/lijun/ImageNet_LT --replaces_to /downloaded/ImageNet
    
  • iNat18

    You can use the following steps to convert from the original format of iNaturalist 2018.

    1. The images and annotations should be downloaded at iNaturalist 2018 firstly. Suppose you have downloaded them at path /downloaded/iNat18/.
    2. Directly replace the data root directory in the file dataset_json/iNat18_train.json, dataset_json/iNat18_val.json,You can handle this with any editor, or use the following command.
    # replace data root
    python tools/replace_path.py --json_file dataset_json/iNat18_train.json --find_root /media/ssd1/lijun/inaturalist2018/train_val2018 --replaces_to /downloaded/iNat18
    
    python tools/replace_path.py --json_file dataset_json/iNat18_val.json --find_root /media/ssd1/lijun/inaturalist2018/train_val2018 --replaces_to /downloaded/iNat18
    
  • Places_LT

    You can use the following steps to convert from the original format of Places365-Standard.

    1. The images and annotations should be downloaded at Places365-Standard firstly. Suppose you have downloaded them at path /downloaded/Places365/.
    2. Directly replace the data root directory in the file dataset_json/Places_LT_train.json, dataset_json/Places_LT_val.json,You can handle this with any editor, or use the following command.
    # replace data root
    python tools/replace_path.py --json_file dataset_json/Places_LT_train.json --find_root /media/ssd1/lijun/data/places365_standard --replaces_to /downloaded/Places365
    
    python tools/replace_path.py --json_file dataset_json/Places_LT_val.json --find_root /media/ssd1/lijun/data/places365_standard --replaces_to /downloaded/Places365
    

Usage

First, prepare the dataset and modify the relevant paths in config/CIFAR100/cifar100_im100_NCL.yaml

Parallel training with DataParallel

1, Train
# Train long-tailed CIFAR-100 with imbalanced ratio of 100. 
# `GPUs` are the GPUs you want to use, such as '0' or`0,1,2,3`.
bash data_parallel_train.sh /home/lijun/papers/NCL/config/CIFAR/CIFAR100/cifar100_im100_NCL.yaml 0

Distributed training with DistributedDataParallel

Note that if you choose to train with DistributedDataParallel, the BATCH_SIZE in .yaml indicates the number on each GPU!

Default training batch-size: CIFAR: 64; ImageNet_LT: 256; Places_LT: 256; iNat18: 512.

e.g. if you want to train NCL with batch-size=512 on 8 GPUS, you should set the BATCH_SIZE in .yaml to 64.

1, Change the NCCL_SOCKET_IFNAME in run_with_distributed_parallel.sh to [your own socket name]. 
export NCCL_SOCKET_IFNAME = [your own socket name]

2, Train
# Train inaturalist2018. 
# `GPUs` are the GPUs you want to use, such as `0,1,2,3,4,5,6,7`.
# `NUM_GPUs` are the number of GPUs you want to use. If you set `GPUs` to `0,1,2,3,4,5,6,7`, then `NUM_GPUs` should be `8`.
bash distributed_data_parallel_train.sh config/iNat18/inat18_NCL.yaml 8 0,1,2,3,4,5,6,7

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star and a citation.

@inproceedings{li2022nested,
  title={Nested Collaborative Learning for Long-Tailed Visual Recognition},
  author={Li, Jun and Tan, Zichang and Wan, Jun and Lei, Zhen and Guo, Guodong},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Acknowledgements

This is a project based on Bag of tricks.

The data augmentations in dataset are based on PaCo

The MOCO in constrstive learning is based on MOCO

Owner
Jun Li
Jun Li
PyTorch implementation for NCL (Neighborhood-enrighed Contrastive Learning)

NCL (Neighborhood-enrighed Contrastive Learning) This is the official PyTorch implementation for the paper: Zihan Lin*, Changxin Tian*, Yupeng Hou* Wa

RUCAIBox 73 Jan 03, 2023
An Integrated Experimental Platform for time series data anomaly detection.

Curve Sorry to tell contributors and users. We decided to archive the project temporarily due to the employee work plan of collaborators. There are no

Baidu 486 Dec 21, 2022
A Numba-based two-point correlation function calculator using a grid decomposition

A Numba-based two-point correlation function (2PCF) calculator using a grid decomposition. Like Corrfunc, but written in Numba, with simplicity and hackability in mind.

Lehman Garrison 3 Aug 24, 2022
MS in Data Science capstone project. Studying attacks on autonomous vehicles.

Surveying Attack Models for CAVs Guide to Installing CARLA and Collecting Data Our project focuses on surveying attack models for Connveced Autonomous

Isabela Caetano 1 Dec 09, 2021
INF42 - Topological Data Analysis

TDA INF421(Conception et analyse d'algorithmes) Projet : Topological Data Analysis SphereMin Etant donné un nuage des points, ce programme contient de

2 Jan 07, 2022
Find exposed data in Azure with this public blob scanner

BlobHunter A tool for scanning Azure blob storage accounts for publicly opened blobs. BlobHunter is a part of "Hunting Azure Blobs Exposes Millions of

CyberArk 250 Jan 03, 2023
Data imputations library to preprocess datasets with missing data

Impyute is a library of missing data imputation algorithms. This library was designed to be super lightweight, here's a sneak peak at what impyute can do.

Elton Law 329 Dec 05, 2022
Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Gabriele 3 Jul 05, 2022
Pyspark project that able to do joins on the spark data frames.

SPARK JOINS This project is to perform inner, all outer joins and semi joins. create_df.py: load_data.py : helps to put data into Spark data frames. d

Joshua 1 Dec 14, 2021
Minimal working example of data acquisition with nidaqmx python API

Data Aquisition using NI-DAQmx python API Based on this project It is a minimal working example for data acquisition using the NI-DAQmx python API. It

Pablo 1 Nov 05, 2021
PyPSA: Python for Power System Analysis

1 Python for Power System Analysis Contents 1 Python for Power System Analysis 1.1 About 1.2 Documentation 1.3 Functionality 1.4 Example scripts as Ju

758 Dec 30, 2022
Python package for analyzing behavioral data for Brain Observatory: Visual Behavior

Allen Institute Visual Behavior Analysis package This repository contains code for analyzing behavioral data from the Allen Brain Observatory: Visual

Allen Institute 16 Nov 04, 2022
Very useful and necessary functions that simplify working with data

Additional-function-for-pandas Very useful and necessary functions that simplify working with data random_fill_nan(module_name, nan) - Replaces all sp

Alexander Goldian 2 Dec 02, 2021
Pipetools enables function composition similar to using Unix pipes.

Pipetools Complete documentation pipetools enables function composition similar to using Unix pipes. It allows forward-composition and piping of arbit

186 Dec 29, 2022
PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster (by @firmai)

PandaPy "I came across PandaPy last week and have already used it in my current project. It is a fascinating Python library with a lot of potential to

Derek Snow 527 Jan 02, 2023
OpenARB is an open source program aiming to emulate a free market while encouraging players to participate in arbitrage in order to increase working capital.

Overview OpenARB is an open source program aiming to emulate a free market while encouraging players to participate in arbitrage in order to increase

Tom 3 Feb 12, 2022
This repo contains a simple but effective tool made using python which can be used for quality control in statistical approach.

This repo contains a powerful tool made using python which is used to visualize, analyse and finally assess the quality of the product depending upon the given observations

SasiVatsal 8 Oct 18, 2022
A CLI tool to reduce the friction between data scientists by reducing git conflicts removing notebook metadata and gracefully resolving git conflicts.

databooks is a package for reducing the friction data scientists while using Jupyter notebooks, by reducing the number of git conflicts between different notebooks and assisting in the resolution of

dataroots 86 Dec 25, 2022
Wafer Fault Detection - Wafer circleci with python

Wafer Fault Detection Problem Statement: Wafer (In electronics), also called a slice or substrate, is a thin slice of semiconductor, such as a crystal

Avnish Yadav 14 Nov 21, 2022
Performance analysis of predictive (alpha) stock factors

Alphalens Alphalens is a Python Library for performance analysis of predictive (alpha) stock factors. Alphalens works great with the Zipline open sour

Quantopian, Inc. 2.5k Jan 09, 2023