Datasets and source code for our paper Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach

Overview

Introduction

Datasets and source code for our paper Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach


Datasets: WebFG-496 & WebiNat-5089

WebFG-496

WebFG-496 contains 200 subcategories of the "Bird" (Web-bird), 100 subcategories of the Aircraft" (Web-aircraft), and 196 subcategories of the "Car" (Web-car). It has a total number of 53339 web training images.

Download the dataset:

wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-aircraft.tar.gz
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-bird.tar.gz
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-car.tar.gz

WebiNat-5089

WebiNat-5089 is a large-scale webly supervised fine-grained dataset, which consists of 5089 subcategories and 1184520 web training images.

Download the dataset:

wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-00
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-01
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-02
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-03
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-04
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-05
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-06
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-07
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-08
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-09
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-10
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-11
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-12
wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/web-iNat.tar.gz.part-13

Dataset Briefing

  1. The statistics of popular fine-grained datasets and our datasets. “Supervision" means the training data is manually labeled (“Manual”) or collected from the web (“Web”).

dataset-stats

  1. Detailed construction process of training data in WebFG-496 and WebiNat-5089. “Testing Source” indicates where testing images come from. “Imbalance” is the number of images in the largest class divided by the number of images in the smallest.

dataset-construction_detail

  1. Rough label accuracy of training data estimated by random sampling for WebFG-496 and WebiNat-5089.

dataset-estimated_label_accuracy


Peer-learning model

Network Architecture

The architecture of our proposed peer-learning model is as follows network

Installation

After creating a virtual environment of python 3.5, run pip install -r requirements.txt to install all dependencies

How to use

The code is currently tested only on GPU

  • Data Preparation

    • WebFG-496

      Download data into PLM root directory and decompress them using

      tar -xvf web-aircraft.tar.gz
      tar -xvf web-bird.tar.gz
      tar -xvf web-car.tar.gz
      
    • WebiNat-5089

      Download data into PLM root directory and decompress them using

      cat web-iNat.tar.gz.part-* | tar -zxv
      
  • Source Code

    • If you want to train the whole network from beginning using source code on the WebFG-496 dataset, please follow subsequent steps

      • In Web496_train.sh
        • Modify CUDA_VISIBLE_DEVICES to proper cuda device id.
        • Modify DATA to web-aircraft/web-bird/web-car as needed and then modify N_CLASSES accordingly.
      • Activate virtual environment(e.g. conda) and then run the script
        bash Web496_train.sh
        
    • If you want to train the whole network from beginning using source code on the WebiNat-5089 dataset, please follow subsequent steps

      • Modify CUDA_VISIBLE_DEVICES to proper cuda device id in Web5089_train.sh.
      • Activate virtual environment(e.g. conda) and then run the script
        bash Web5089_train.sh
        
  • Demo

    • If you just want to do a quick test on the model and check the final fine-grained recognition performance on the WebFG-496 dataset, please follow subsequent steps

      • Download one of the following trained models into model/ using
        wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/Models/plm_web-aircraft_bcnn_best-epoch_74.38.pth
        wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/Models/plm_web-bird_bcnn_best-epoch_76.48.pth
        wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/Models/plm_web-car_bcnn_best-epoch_78.52.pth
        
      • Activate virtual environment (e.g. conda)
      • In Web496_demo.sh
        • Modify CUDA_VISIBLE_DEVICES to proper cuda device id.
        • Modify the model name according to the model downloaded.
        • Modify DATA to web-aircraft/web-bird/web-car according to the model downloaded and then modify N_CLASSES accordingly.
      • Run demo using bash Web496_demo.sh
    • If you just want to do a quick test on the model and check the final fine-grained recognition performance on the WebiNat-5089 dataset, please follow subsequent steps

      • Download one of the following trained models into model/ using
        wget https://web-fgvc-496-5089-sh.oss-cn-shanghai.aliyuncs.com/Models/plm_web-inat_resnet50_best-epoch_54.56.pth
        
      • Activate virtual environment (e.g. conda)
      • In Web5089_demo.sh
        • Modify CUDA_VISIBLE_DEVICES to proper cuda device id.
        • Modify the model name according to the model downloaded.
      • Run demo using bash Web5089_demo.sh

Results

  1. The comparison of classification accuracy (%) for benchmark methods and webly supervised baselines (Decoupling, Co-teaching, and our Peer-learning) on the WebFG-496 dataset.

network

  1. The comparison of classification accuracy (%) of benchmarks and our proposed webly supervised baseline Peer-learning on the WebiNat-5089 dataset.

network

  1. The comparisons among our Peer-learning model (PLM), VGG-19, B-CNN, Decoupling (DP), and Co-teaching (CT) on sub-datasets Web-aircraft, Web-bird, and Web-car in WebFG-496 dataset. The value on each sub-dataset is plotted in the dotted line and the average value is plotted in solid line. It should be noted that the classification accuracy is the result of the second stage in the two-step training strategy. Since we have trained 60 epochs in the second stage on the basic network VGG-19, we only compare the first 60 epochs in the second stage of our approach with VGG-19

network


Citation

If you find this useful in your research, please consider citing:

@inproceedings{
title={Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach},
author={Zeren Sun, Yazhou Yao, Xiu-Shen Wei, Yongshun Zhang, Fumin Shen, Jianxin Wu, Jian Zhang, Heng Tao Shen},
booktitle={IEEE International Conference on Computer Vision (ICCV)},
year={2021}
}
Poplar implementation of "Bundle Adjustment on a Graph Processor" (CVPR 2020)

Poplar Implementation of Bundle Adjustment using Gaussian Belief Propagation on Graphcore's IPU Implementation of CVPR 2020 paper: Bundle Adjustment o

Joe Ortiz 34 Dec 05, 2022
Generative Adversarial Networks for High Energy Physics extended to a multi-layer calorimeter simulation

CaloGAN Simulating 3D High Energy Particle Showers in Multi-Layer Electromagnetic Calorimeters with Generative Adversarial Networks. This repository c

Deep Learning for HEP 101 Nov 13, 2022
Continuous Query Decomposition for Complex Query Answering in Incomplete Knowledge Graphs

Continuous Query Decomposition This repository contains the official implementation for our ICLR 2021 (Oral) paper, Complex Query Answering with Neura

UCL Natural Language Processing 71 Dec 29, 2022
Unofficial Implementation of Oboe (SIGCOMM'18').

Oboe-Reproduce This is the unofficial implementation of the paper "Oboe: Auto-tuning video ABR algorithms to network conditions, Zahaib Akhtar, Yun Se

Tianchi Huang 13 Nov 04, 2022
Voice of Pajlada with model and weights.

Pajlada TTS Stripped down version of ForwardTacotron (https://github.com/as-ideas/ForwardTacotron) with pretrained weights for Pajlada's (https://gith

6 Sep 03, 2021
Pytorch implementation of "Forward Thinking: Building and Training Neural Networks One Layer at a Time"

forward-thinking-pytorch Pytorch implementation of Forward Thinking: Building and Training Neural Networks One Layer at a Time Requirements Python 2.7

Kim Heecheol 65 Oct 06, 2022
This is implementation of AlexNet(2012) with 3D Convolution on TensorFlow (AlexNet 3D).

AlexNet_3dConv TensorFlow implementation of AlexNet(2012) by Alex Krizhevsky, with 3D convolutiional layers. 3D AlexNet Network with a standart AlexNe

Denis Timonin 41 Jan 16, 2022
Constraint-based geometry sketcher for blender

Constraint-based sketcher addon for Blender that allows to create precise 2d shapes by defining a set of geometric constraints like tangent, distance,

1.7k Dec 31, 2022
Deep Multimodal Neural Architecture Search

MMNas: Deep Multimodal Neural Architecture Search This repository corresponds to the PyTorch implementation of the MMnas for visual question answering

Vision and Language Group@ MIL 23 Dec 21, 2022
Implementation for the paper SMPLicit: Topology-aware Generative Model for Clothed People (CVPR 2021)

SMPLicit: Topology-aware Generative Model for Clothed People [Project] [arXiv] License Software Copyright License for non-commercial scientific resear

Enric Corona 225 Dec 13, 2022
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

RAVE: Realtime Audio Variational autoEncoder Official implementation of RAVE: A variational autoencoder for fast and high-quality neural audio synthes

ACIDS 587 Jan 01, 2023
Gesture Volume Control Using OpenCV and MediaPipe

This Project Uses OpenCV and MediaPipe Hand solutions to identify hands and Change system volume by taking thumb and index finger positions

Pratham Bhatnagar 6 Sep 12, 2022
An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

IceVision is the first agnostic computer vision framework to offer a curated collection with hundreds of high-quality pre-trained models from torchvision, MMLabs, and soon Pytorch Image Models. It or

airctic 789 Dec 29, 2022
GPU-accelerated Image Processing library using OpenCL

pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op

17 Dec 25, 2022
Blender Python - Node-based multi-line text and image flowchart

MindMapper v0.8 Node-based text and image flowchart for Blender Mindmap with shortcuts visible: Mindmap with shortcuts hidden: Notes This was requeste

SpectralVectors 58 Oct 08, 2022
working repo for my xumx-sliCQ submissions to the ISMIR 2021 MDX

Music Demixing Challenge - xumx-sliCQ This repository is the GitHub mirror of my working submission repository for the AICrowd ISMIR 2021 Music Demixi

4 Aug 25, 2021
Official Repsoitory for "Activate or Not: Learning Customized Activation." [CVPR 2021]

CVPR 2021 | Activate or Not: Learning Customized Activation. This repository contains the official Pytorch implementation of the paper Activate or Not

184 Dec 27, 2022
Zero-shot Synthesis with Group-Supervised Learning (ICLR 2021 paper)

GSL - Zero-shot Synthesis with Group-Supervised Learning Figure: Zero-shot synthesis performance of our method with different dataset (iLab-20M, RaFD,

Andy_Ge 62 Dec 21, 2022
DeepHyper: Scalable Asynchronous Neural Architecture and Hyperparameter Search for Deep Neural Networks

What is DeepHyper? DeepHyper is a software package that uses learning, optimization, and parallel computing to automate the design and development of

DeepHyper Team 214 Jan 08, 2023
MTCNN face detection implementation for TensorFlow, as a PIP package.

MTCNN Implementation of the MTCNN face detector for Keras in Python3.4+. It is written from scratch, using as a reference the implementation of MTCNN

Iván de Paz Centeno 1.9k Dec 30, 2022