A PyTorch-based Semi-Supervised Learning (SSL) Codebase for Pixel-wise (Pixel) Vision Tasks

Overview

PixelSSL is a PyTorch-based semi-supervised learning (SSL) codebase for pixel-wise (Pixel) vision tasks.

The purpose of this project is to promote the research and application of semi-supervised learning on pixel-wise vision tasks. PixelSSL provides two major features:

  • Interface for implementing new semi-supervised algorithms
  • Template for encapsulating diverse computer vision tasks

As a result, the SSL algorithms integrated into PixelSSL are compatible with all task code inherited from the provided template.
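
For illustration, a new SSL algorithm typically only implements its algorithm-specific training step against the task-agnostic template, which supplies the model, optimizer, and task-specific losses. The sketch below is a minimal, hypothetical example: the class and method names (SSLMyAlgorithm, _build, _train) and the constructor arguments are assumptions for illustration and may differ from the actual PixelSSL interface (see the API document).

    # Minimal sketch of a new SSL algorithm written against a PixelSSL-style
    # template. NOTE: all names below are illustrative assumptions, not the
    # actual PixelSSL API.
    import torch
    import torch.nn.functional as F

    class SSLMyAlgorithm:
        def __init__(self, task_func, args):
            self.task_func = task_func   # task-specific losses/metrics from the task template
            self.args = args

        def _build(self, model, optimizer, lrer):
            # The task template hands over the model, optimizer, and LR scheduler.
            self.model, self.optimizer, self.lrer = model, optimizer, lrer

        def _train(self, labeled_batch, unlabeled_batch):
            # Supervised loss on labeled data, computed by the task code.
            pred_l = self.model(labeled_batch['image'])
            sup_loss = self.task_func.loss(pred_l, labeled_batch['label'])

            # Algorithm-specific unsupervised loss on unlabeled data
            # (here: a simple consistency term under input noise).
            pred_u = self.model(unlabeled_batch['image'])
            with torch.no_grad():
                noisy = unlabeled_batch['image'] + 0.01 * torch.randn_like(unlabeled_batch['image'])
                target_u = self.model(noisy)
            unsup_loss = F.mse_loss(pred_u, target_u)

            loss = sup_loss + self.args.unsup_weight * unsup_loss
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            return loss.item()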

In addition, PixelSSL provides benchmarks for validating semi-supervised learning algorithms on several pixel-level tasks, which currently include semantic segmentation.

News

  • [Dec 25 2020] PixelSSL v0.1.4 is Released!
    🎄 Merry Christmas! 🎄
    v0.1.4 supports the CutMix semi-supervised learning algorithm for pixel-wise classification.

  • [Nov 06 2020] PixelSSL v0.1.3 is Released!
    v0.1.3 supports the CCT semi-supervised learning algorithm for pixel-wise classification.

  • [Oct 28 2020] PixelSSL v0.1.2 is Released!
    v0.1.2 supports PSPNet and its SSL results for the semantic segmentation task (check here).

    [More]

Supported Algorithms and Tasks

We are actively updating this project.
The SSL algorithms and demo tasks supported by PixelSSL are summarized in the following table:

Algorithms / Tasks    Segmentation    Other Tasks
SupOnly               v0.1.0          Coming Soon
MT [1]                v0.1.0          Coming Soon
AdvSSL [2]            v0.1.0          Coming Soon
S4L [3]               v0.1.1          Coming Soon
CCT [4]               v0.1.3          Coming Soon
GCT [5]               v0.1.0          Coming Soon
CutMix [6]            v0.1.4          Coming Soon

[1] Mean Teachers are Better Role Models: Weight-Averaged Consistency Targets Improve Semi-Supervised Deep Learning Results
      Antti Tarvainen and Harri Valpola. NeurIPS 2017.

[2] Adversarial Learning for Semi-Supervised Semantic Segmentation
      Wei-Chih Hung, Yi-Hsuan Tsai, Yan-Ting Liou, Yen-Yu Lin, and Ming-Hsuan Yang. BMVC 2018.

[3] S4L: Self-Supervised Semi-Supervised Learning
      Xiaohua Zhai, Avital Oliver, Alexander Kolesnikov, and Lucas Beyer. ICCV 2019.

[4] Semi-Supervised Semantic Segmentation with Cross-Consistency Training
      Yassine Ouali, CĂ©line Hudelot, and Myriam Tami. CVPR 2020.

[5] Guided Collaborative Training for Pixel-wise Semi-Supervised Learning
      Zhanghan Ke, Di Qiu, Kaican Li, Qiong Yan, and Rynson W.H. Lau. ECCV 2020.

[6] Semi-Supervised Semantic Segmentation Needs Strong, Varied Perturbations
      Geoff French, Samuli Laine, Timo Aila, Michal Mackiewicz, and Graham Finlayson. BMVC 2020.

Installation

Please refer to the Installation document.

Getting Started

Please follow the Getting Started document to run the provided demo tasks.

Tutorials

We provide the API document and some tutorials for using PixelSSL.

License

This project is released under the Apache 2.0 license.

Acknowledgement

We thank City University of Hong Kong and SenseTime for their support to this project.

Citation

This project is extended from our ECCV 2020 paper Guided Collaborative Training for Pixel-wise Semi-Supervised Learning (GCT). If this codebase or our method helps your research, please cite:

@InProceedings{ke2020gct,
  author = {Ke, Zhanghan and Qiu, Di and Li, Kaican and Yan, Qiong and Lau, Rynson W.H.},
  title = {Guided Collaborative Training for Pixel-wise Semi-Supervised Learning},
  booktitle = {European Conference on Computer Vision (ECCV)},
  month = {August},
  year = {2020},
}

Contact

This project is currently maintained by Zhanghan Ke (@ZHKKKe).
If you have any questions, please feel free to contact [email protected].

Comments
  • Question about the input size of images during inference time.

    Dear author: I have a question about the inference setting. In this section: https://github.com/ZHKKKe/PixelSSL/blob/2e85e12c1db5b24206bfbbf2d7f6348ae82b2105/task/sseg/data.py#L102

        def _val_prehandle(self, image, label):
            sample = {self.IMAGE: image, self.LABEL: label}
            composed_transforms = transforms.Compose([
                FixScaleCrop(crop_size=self.args.im_size),
                Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
                ToTensor()])
    
            transformed_sample = composed_transforms(sample)
    
            return transformed_sample[self.IMAGE], transformed_sample[self.LABEL]
    

    I find that you crop the image as the input and compute the metrics on the cropped image. However, I think the metrics should be computed on the whole image. Under this setting, the fully supervised baseline is 2~3% mIoU lower than the raw performance. Could you explain this?

    opened by charlesCXK 16
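
    A hedged sketch of the whole-image evaluation the commenter describes: the validation prehandle below simply drops FixScaleCrop, so metrics are computed at the original resolution. It assumes the remaining transforms (Normalize, ToTensor) and the downstream validation loop accept variable-sized inputs (e.g., batch size 1); this is an assumption, not the repository's implementation.

        # Sketch only: same as the repository's _val_prehandle, but without
        # FixScaleCrop, so the whole image is evaluated. Assumes the validation
        # loader/model can handle variable input sizes (e.g., batch size 1).
        def _val_prehandle(self, image, label):
            sample = {self.IMAGE: image, self.LABEL: label}
            composed_transforms = transforms.Compose([
                Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
                ToTensor()])

            transformed_sample = composed_transforms(sample)
            return transformed_sample[self.IMAGE], transformed_sample[self.LABEL]
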
  • Some questions about the paper "Guided Collaborative Training"

    Great work, and thanks for your amazing codebase. I have some questions about the paper "Guided Collaborative Training for Pixel-wise Semi-Supervised Learning":

    1. I'm wondering whether I can just use the max score of a pixel as the evaluation criterion, without the Flaw Detector, in the semantic segmentation task. If so, how well would it work if I used the score directly? Have you ever run such an experiment?

    2. Is the Flaw Correction Constraint forcing the error to 0 to correct the semantic segmentation result? I do not quite understand what this loss means.
    opened by czy341181 8
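
    Regarding the second question above: in GCT, the flaw correction constraint penalizes the flaw detector's output on unlabeled predictions, i.e., the task model is updated so that the estimated flaw map is pushed toward zero. The sketch below only illustrates that idea; the function name and the way the detector consumes the image/prediction pair are assumptions, not the repository's code.

        # Illustrative sketch of the flaw correction constraint (not PixelSSL's code).
        import torch
        import torch.nn.functional as F

        def flaw_correction_loss(flaw_detector, image, task_pred):
            # Freeze the flaw detector so only the task model receives gradients.
            for p in flaw_detector.parameters():
                p.requires_grad_(False)
            # Assumed interface: the detector scores the (image, prediction) pair.
            flaw_map = flaw_detector(torch.cat([image, task_pred], dim=1))
            # Drive the estimated flaw map toward zero ("no flaws" in the prediction).
            return F.mse_loss(flaw_map, torch.zeros_like(flaw_map))
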
  • Add implementation for Semi-supervised Semantic Segmentation via Strong-weak Dual-branch Network

    Thanks for your sharing; the repo is quite helpful for understanding work on SSL segmentation. If possible, could you add an implementation of Semi-supervised Semantic Segmentation via Strong-weak Dual-branch Network (ECCV 2020), which is a simple dual-branch network? It's a quite easy and intuitive idea, but I could not reproduce the results with DeepLab v2. It would be great if you could add it to the repo.

    opened by syorami 5
  • CUDA out of memory

    Hi ZHKKKE,

    First of all, thank you for your work. Currently, I am retraining GCT with PSPNet (ResNet-101 backbone) on Pascal VOC, using im_size=513 and batch_size=4 with 4 GPUs. However, I am getting an out-of-memory error. I retrained the other methods you offer with the same settings (im_size=513, batch_size=4, 4 GPUs) and can reproduce the accuracy reported in README.md.

    I want to know how you trained GCT with 4 GPUs. Should I save memory by changing im_size=513 to im_size=321, or is there another way?

    Thank you and regards

    opened by Rainfor1 4
  • A question about ASPP

    Thanks for your great work on tackling pixel-wise semi-supervised tasks. I am currently following it and have the following question.

    Should the return of 'out' at https://github.com/ZHKKKe/PixelSSL/blob/master/task/sseg/module/deeplab_v2.py#L85 be outside the for loop? Otherwise, the ASPP only adds the outputs of dilation rates 6 and 12.

    Thanks in advance : )

    opened by tianzhuotao 3
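
    For reference on the ASPP question above, a standard DeepLab v2-style ASPP head sums the outputs of all dilated branches and returns only after the loop finishes. The sketch below illustrates that pattern; it is not the repository's exact module.

        import torch.nn as nn

        class ASPP(nn.Module):
            # Sketch of a DeepLab v2-style ASPP head: one dilated conv per rate,
            # outputs summed over all branches, returned after the loop.
            def __init__(self, in_channels, num_classes, rates=(6, 12, 18, 24)):
                super().__init__()
                self.branches = nn.ModuleList([
                    nn.Conv2d(in_channels, num_classes, kernel_size=3,
                              padding=r, dilation=r) for r in rates])

            def forward(self, x):
                out = self.branches[0](x)
                for branch in self.branches[1:]:
                    out = out + branch(x)
                return out  # returned outside the loop so every rate contributes
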
  • More data splits of VOC

    Dear author: Thank you for sharing! Could you share more data splits from your ECCV paper, such as the 1/16, 1/4, and 1/2 splits of VOC? We want to run experiments on more splits and compare with the numbers reported in the paper. Thank you!

    opened by charlesCXK 2
  • FlawDetector In 3D version

    Hi there, thanks for your work, it's very inspiring!

    Now I want to use this work in my project, but in 3D. I found that the 2D FlawDetector is a stack of conv layers with kernel size 4 and stride 1 or 2.

    But with my input size of 256x256, self.conv3_1 causes errors, so I have to change the kernel size from 4 to 3. Now, before interpolating the feature map, x has shape (1, 1, 8, 8, 8), but interpolating it to shape (1, 1, 16, 256, 256) means the gap between x and task_pred seems too large.

    In 2D mode, with input (3, 256, 256) and num_classes=14, x is interpolated from (1, 1, 8, 8) to (1, 1, 256, 256). Is this reasonable?

    Thanks a lot!

    opened by DISAPPEARED13 0
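
    For reference, upsampling a coarse 3D map in PyTorch can be done with trilinear interpolation, as in the sketch below. Whether a single jump from 8x8x8 to 16x256x256 is appropriate for a 3D FlawDetector is a design question (extra upsampling or transposed-conv stages are a common alternative); this snippet is only an illustration.

        import torch
        import torch.nn.functional as F

        x = torch.randn(1, 1, 8, 8, 8)                 # coarse 3D flaw map
        up = F.interpolate(x, size=(16, 256, 256),
                           mode='trilinear', align_corners=True)
        print(up.shape)                                # torch.Size([1, 1, 16, 256, 256])
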
  • About the performance of PSPNet.

    Hello, thanks for your excellent work. I have a question about the performance of PSPNet. When I use PSPNet alone with my own dataset and code, training on 1/2 of the samples, the mIoU can reach about 68%. But when I switch to your code and train with SupOnly, the mIoU is only 60%. Could you please tell me what the reason for this might be?

    opened by liyanping0317 1
  • Is there a bug in task/sseg/func.py  metrics?

    Hi, ZHKKKe, Thank you for your excellent code.

    I found a suspected bug in task/sseg/func.py.

    In the function metrics, you reset all the meters named acc_str / acc_class_str / mIoU_str / fwIoU_str:

        if meters.has_key(acc_str): meters.reset(acc_str)
        if meters.has_key(acc_class_str): meters.reset(acc_class_str)
        if meters.has_key(mIoU_str): meters.reset(mIoU_str)
        if meters.has_key(fwIoU_str): meters.reset(fwIoU_str)

    When I test your pre-trained model deeplabv2_pascalvoc_1-8_suponly.ckpt, I find that the validation metrics are logged from the whole confusion matrix. Shouldn't we compute each image's acc/mIoU independently?

    I'm not sure whether my speculation is right, could you help me?

    opened by HHuiwen 1
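
    For context on the question above: a common convention in semantic segmentation benchmarks is to accumulate a single confusion matrix over the entire validation set and compute mIoU from it once, rather than averaging per-image scores. A minimal sketch of that computation (illustrative, not the repository's metrics code):

        import numpy as np

        def miou_from_confusion(conf):
            # conf[i, j] = number of pixels with ground-truth class i predicted
            # as class j, accumulated over the whole validation set.
            tp = np.diag(conf).astype(np.float64)
            fp = conf.sum(axis=0) - tp
            fn = conf.sum(axis=1) - tp
            denom = tp + fp + fn
            iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
            return float(np.nanmean(iou))              # ignore classes absent from the set
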
  • Splits of Cityscapes ...

    Hi, thanks for your nice work!

    I have noticed that you only give us the data split of VOC2012; will you offer the splits for the Cityscapes dataset?

    Also, from your scripts, the labeled data used in your experiments is simply taken in file-name order from the txt file (https://github.com/ZHKKKe/PixelSSL/blob/ce192034355ae6a77e47d2983d9c9242df60802a/task/sseg/dataset/PascalVOC/tool/random_sublabeled_samples.py#L21):

        labeled_num = int(len(samples) * labeled_ratio + 1)
        labeled_list = samples[:labeled_num]

    opened by ghost 3
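
    For comparison with the ordered sampling above, a hedged sketch of drawing the labeled subset at random with a fixed seed (illustrative only; not the repository's split script):

        import random

        def random_labeled_split(samples, labeled_ratio, seed=0):
            # Shuffle a copy of the sample list with a fixed seed, then take the
            # first labeled_ratio fraction as labeled and the rest as unlabeled.
            rng = random.Random(seed)
            shuffled = samples[:]
            rng.shuffle(shuffled)
            labeled_num = int(len(shuffled) * labeled_ratio + 1)
            return shuffled[:labeled_num], shuffled[labeled_num:]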