The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

Related tags

Deep LearningRegSeg
Overview

RegSeg

The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

Paper: arxiv

params

D block

DBlock

Decoder

Decoder

Setup

Install the dependencies in requirements.txt by using pip and virtualenv.

Download Cityscapes

go to https://www.cityscapes-dataset.com, create an account, and download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip. You can delete the test images to save some space if you don't want to submit to the competition. Name the directory cityscapes_dataset. Make sure that you have downloaded the required python packages and run

CITYSCAPES_DATASET=cityscapes_dataset csCreateTrainIdLabelImgs

There are 19 classes.

Results from paper

To see the ablation studies results from the paper, go here.

Usage

To visualize your model, go to show.py. To train, validate, benchmark, and save the results of your model, go to train.py.

Results on Cityscapes server

RegSeg (exp48_decoder26, 30FPS): 78.3

Larger RegSeg (exp53_decoder29, 20 FPS): 79.5

Citation

If you find our work helpful, please consider citing our paper.

@article{gao2021rethink,
  title={Rethink Dilated Convolution for Real-time Semantic Segmentation},
  author={Gao, Roland},
  journal={arXiv preprint arXiv:2111.09957},
  year={2021}
}
Comments
  • question about STDC2-Seg75

    question about STDC2-Seg75

    Hi, I note that you benchmark the computation of STDC2-Seg75 which is not reported in the CVPR2021 paper. Did you test the speed of STDC-Seg on your own platform? How about the results?

    opened by ydhongHIT 2
  • Can not show.py

    Can not show.py

    I try show.py. But I can not.

    $ python3 show.py
    name= cityscapes
    train size: 2975
    val size: 500
    Traceback (most recent call last):
      File "show.py", line 358, in <module>
        show_cityscapes_model()
      File "show.py", line 337, in show_cityscapes_model
        show(model,val_loader,device,show_cityscapes_mask,num_images=num_images,skip=skip,images_per_line=images_per_line)
      File "show.py", line 134, in show
        outputs = model(images)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/RegSeg/model.py", line 76, in forward
        x=self.stem(x)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/RegSeg/blocks.py", line 22, in forward
        x = self.conv(x)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
    
    opened by sounansu 2
  • The pretrained model link

    The pretrained model link

    Hi, thank you for sharing the code. Can you provide download link about the pretrained model(exp48_decoder26 and exp53_decoder29) in Cityscapes dataset, Thank you very much!

    opened by gaowq2017 1
  • About train bug

    About train bug

    When using seg_transforms.py through your scripts 'camvid_efficientnet_b1_hyperseg-s', there always exsist 'TypeError: resize() got an unexpected keyword argument 'interpolation'' in 174 line. Does this bug only appear in this scripts and should I modify the code when using this scripts?

    opened by 870572761 0
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • About train code

    About train code

    When training, how did the miou and accuracy calculate? On train dataset or validate dataset? I think it's calculated on val dataset due to https://github.com/RolandGao/RegSeg/blob/main/train.py#L238. I trained the base regseg model with config cityscapes_trainval_1000epochs.yam on Cityscapes and got the unbelievable results. 840794c66f23deb33666dcffc4af5b5

    opened by Asthestarsfalll 6
  • confusion on field of view  and model inference time

    confusion on field of view and model inference time

    Hi, RolandGao, nice to see a good job! I see you've done a lot of experiments on the backbone setting, but I still have some confusion after reading your published paper.

    • First, You calculate the fov of 4095 to see the bottom-right pixel when training cityscape (1024x2048), so you have verify the backbone should be exp48 [ (1,1) + (1,2) + 4 * (1, 4) + 7 *(1, 14) ] with fov (3807). But I also find the same backbone when training the CamVid (720x960). Why not use a shallow backbone? I am training my own dataset with image resolution (512 x 512), do I need to modify the backbone architecture? Can you give some advice?
    • Second, I test inference time of regseg. I notice that the speed is not better than other real-time archs due to split and dilated conv even if model costs low GFLOPs. In the application, what we are concerned about is the speed, so is there any strategy to improve the speed?
    opened by LinaShanghaitech 5
  • Why not pretrain on ImageNet?

    Why not pretrain on ImageNet?

    Hi, Thanks for your excellent work ! I notice that RegSeg can achieve a high accuracy on Cityscapes without pretraining. I also did a lot of ablation studies and I think DDRNet will drop around 3% miou if they do not use ImageNet pretraining. How about trying to train your encoder on ImageNet and see what will happen? I really look forward to your result ! Thanks !

    opened by RobinhoodKi 1
Owner
Roland
University of Toronto CS 2023
Roland
Code release for BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images

BlockGAN Code release for BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images BlockGAN: Learning 3D Object-aware Scene Rep

41 May 18, 2022
SingleVC performs any-to-one VC, which is an important component of MediumVC project.

SingleVC performs any-to-one VC, which is an important component of MediumVC project. Here is the official implementation of the paper, MediumVC.

谷下雨 26 Dec 28, 2022
TCPNet - Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition

Temporal-attentive-Covariance-Pooling-Networks-for-Video-Recognition This is an implementation of TCPNet. Introduction For video recognition task, a g

Zilin Gao 21 Dec 08, 2022
A texturizer that I just made. Nothing special here.

texturizer This is a little project that I did with an hour's time. It texturizes an image given a image and a texture to texturize it with. There is

1 Nov 11, 2021
PIXIE: Collaborative Regression of Expressive Bodies

PIXIE: Collaborative Regression of Expressive Bodies [Project Page] This is the official Pytorch implementation of PIXIE. PIXIE reconstructs an expres

Yao Feng 331 Jan 04, 2023
Keras-1D-ACGAN-Data-Augmentation

Keras-1D-ACGAN-Data-Augmentation What is the ACGAN(Auxiliary Classifier GANs) ? Related Paper : [Abstract : Synthesizing high resolution photorealisti

Jae-Hoon Shim 7 Dec 23, 2022
ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

Update: if you're looking for ColBERTv2 code, you can find it alongside a new simpler API, in the branch new_api. ColBERT ColBERT is a fast and accura

Stanford Future Data Systems 637 Jan 08, 2023
Event sourced bank - A wide-and-shallow example using the Python event sourcing library

Event Sourced Bank A "wide but shallow" example of using the Python event sourci

3 Mar 09, 2022
Official code for 'Robust Siamese Object Tracking for Unmanned Aerial Manipulator' and offical introduction to UAMT100 benchmark

SiamSA: Robust Siamese Object Tracking for Unmanned Aerial Manipulator Demo video πŸ“Ή Our video on Youtube and bilibili demonstrates the evaluation of

Intelligent Vision for Robotics in Complex Environment 12 Dec 18, 2022
Source code release of the paper: Knowledge-Guided Deep Fractal Neural Networks for Human Pose Estimation.

GNet-pose Project Page: http://guanghan.info/projects/guided-fractal/ UPDATE 9/27/2018: Prototxts and model that achieved 93.9Pck on LSP dataset. http

Guanghan Ning 83 Nov 21, 2022
A PyTorch implementation of the Transformer model in "Attention is All You Need".

Attention is all you need: A Pytorch Implementation This is a PyTorch implementation of the Transformer model in "Attention is All You Need" (Ashish V

Yu-Hsiang Huang 7.1k Jan 04, 2023
A toy compiler that can convert Python scripts to pickle bytecode πŸ₯’

Pickora 🐰 A small compiler that can convert Python scripts to pickle bytecode. Requirements Python 3.8+ No third-party modules are required. Usage us

κŒ—α–˜κ’’κ€€κ“„κ’’κ€€κˆ€κŸ 68 Jan 04, 2023
CRISCE: Automatically Generating Critical Driving Scenarios From Car Accident Sketches

CRISCE: Automatically Generating Critical Driving Scenarios From Car Accident Sketches This document describes how to install and use CRISCE (CRItical

Chair of Software Engineering II, Uni Passau 2 Feb 09, 2022
Estimating Example Difficulty using Variance of Gradients

Estimating Example Difficulty using Variance of Gradients This repository contains source code necessary to reproduce some of the main results in the

Chirag Agarwal 48 Dec 26, 2022
A python implementation of Deep-Image-Analogy based on pytorch.

Deep-Image-Analogy This project is a python implementation of Deep Image Analogy.https://arxiv.org/abs/1705.01088. Some results Requirements python 3

Peng Lu 171 Dec 14, 2022
Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques"

THESIS_CAIRONE_FIORENTINO Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques" GENERATE TOKE

cairone_fiorentino97 1 Dec 10, 2021
PyTorch implementation of Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network

hierarchical-multi-label-text-classification-pytorch Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach This

Mingu Kang 17 Dec 13, 2022
General Vision Benchmark, a project from OpenGVLab

Introduction We build GV-B(General Vision Benchmark) on Classification, Detection, Segmentation and Depth Estimation including 26 datasets for model e

174 Dec 27, 2022
Public scripts, services, and configuration for running a smart home K3S network cluster

makerhouse_network Public scripts, services, and configuration for running MakerHouse's home network. This network supports: TODO features here For mo

Scott Martin 1 Jan 15, 2022
Perform Linear Classification with Multi-way Data

MultiwayClassification This is an R package to perform linear classification for data with multi-way structure. The distance-weighted discrimination (

Eric F. Lock 2 Dec 15, 2020