TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

Overview

ICNet_tensorflow

HitCount

This repo provides a TensorFlow-based implementation of paper "ICNet for Real-Time Semantic Segmentation on High-Resolution Images," by Hengshuang Zhao, and et. al. (ECCV'18).

The model generates segmentation mask for every pixel in the image. It's based on the ResNet50 with totally three branches as auxiliary paths, see architecture below for illustration.

We provide both training and inference code in this repo. The pre-trained models we provided are converted from caffe weights in Official Implementation.

News (2018.10.22 updated):

Now you can try ICNet on your own image online using ModelDepot live demo!

Table Of Contents

Environment Setup

pip install tensorflow-gpu opencv-python jupyter matplotlib tqdm

Download Weights

We provide pre-trained weights for cityscapes and ADE20k dataset. You can download the weights easily use following command,

python script/download_weights.py --dataset cityscapes (or ade20k)

Download Dataset (Optional)

If you want to evaluate the provided weights or keep fine-tuning on cityscapes and ade20k dataset, you need to download them using different methods.

ADE20k dataset

Simply run following command:

bash script/download_ADE20k.sh

Cityscapes dataset

You need to download Cityscape dataset from Official website first (you'll need to request access which may take couple of days).

Then convert downloaded dataset ground truth to training format by following instructions to install cityscapesScripts then running these commands:

export CITYSCAPES_DATASET=<cityscapes dataset path>
csCreateTrainIdLabelImgs

Get started!

This repo provide three phases with full documented, which means you can try train/evaluate/inference on your own.

Inference on your own image

demo.ipynb show the easiest example to run semantic segmnetation on your own image.

In the end of demo.ipynb, you can test the speed of ICNet.

Here are some results run on Titan Xp with high resolution images (1024x2048):
~0.037(s) per images, which means we can get ~27 fps (nearly same as described in paper).

Evaluate on cityscapes/ade20k dataset

To get the results, you need to follow the steps metioned above to download dataset first.
Then you need to change the data_dir path in config.py.

CITYSCAPES_DATA_DIR = '/data/cityscapes_dataset/cityscape/'
ADE20K_DATA_DIR = './data/ADEChallengeData2016/'

Cityscapes

Perform in single-scaled model on the cityscapes validation dataset. (We have sucessfully re-produced the performance same to caffe framework).

Model Accuracy Model Accuracy
train_30k   67.26%/67.7% train_30k_bn 67.31%/67.7%
trainval_90k 80.90% trainval_90k_bn 0.8081%

Run following command to get evaluation results,

python evaluate.py --dataset=cityscapes --filter-scale=1 --model=trainval

List of Args:

--model=train       - To select train_30k model
--model=trainval    - To select trainval_90k model
--model=train_bn    - To select train_30k_bn model
--model=trainval_bn - To select trainval_90k_bn model

ADE20k

Reach 32.25%mIoU on ADE20k validation set.

python evaluate.py --dataset=ade20k --filter-scale=2 --model=others

Note: to use model provided by us, set filter-scale to 2.

Training on your own dataset

This implementation is different from the details descibed in ICNet paper, since I did not re-produce model compression part. Instead, we train on the half kernels directly.

In orignal paper, the authod trained the model in full kernels and then performed model-pruning techique to kill half kernels. Here we use --filter-scale to denote whether pruning or not.

For example, --filter-scale=1 <-> [h, w, 32] and --filter-scale=2 <-> [h, w, 64].

Step by Step

1. Change the configurations in utils/config.py.

cityscapes_param = {'name': 'cityscapes',
                    'num_classes': 19,
                    'ignore_label': 255,
                    'eval_size': [1025, 2049],
                    'eval_steps': 500,
                    'eval_list': CITYSCAPES_eval_list,
                    'train_list': CITYSCAPES_train_list,
                    'data_dir': CITYSCAPES_DATA_DIR}

2. Set Hyperparameters in train.py,

class TrainConfig(Config):
    def __init__(self, dataset, is_training,  filter_scale=1, random_scale=None, random_mirror=None):
        Config.__init__(self, dataset, is_training, filter_scale, random_scale, random_mirror)

    # Set pre-trained weights here (You can download weight using `python script/download_weights.py`) 
    # Note that you need to use "bnnomerge" version.
    model_weight = './model/cityscapes/icnet_cityscapes_train_30k_bnnomerge.npy'
    
    # Set hyperparameters here, you can get much more setting in Config Class, see 'utils/config.py' for details.
    LAMBDA1 = 0.16
    LAMBDA2 = 0.4
    LAMBDA3 = 1.0
    BATCH_SIZE = 4
    LEARNING_RATE = 5e-4

3. Run following command and decide whether to update mean/var or train beta/gamma variable.

python train.py --update-mean-var --train-beta-gamma \
      --random-scale --random-mirror --dataset cityscapes --filter-scale 2

Note: Be careful to use --update-mean-var! Use this flag means you will update the moving mean and moving variance in batch normalization layer. This need large batch size, otherwise it will lead bad results.

Result (inference with my own data)

Citation

@article{zhao2017icnet,
  author = {Hengshuang Zhao and
            Xiaojuan Qi and
            Xiaoyong Shen and
            Jianping Shi and
            Jiaya Jia},
  title = {ICNet for Real-Time Semantic Segmentation on High-Resolution Images},
  journal={arXiv preprint arXiv:1704.08545},
  year = {2017}
}

@inproceedings{zhou2017scene,
    title={Scene Parsing through ADE20K Dataset},
    author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2017}
}

@article{zhou2016semantic,
  title={Semantic understanding of scenes through the ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  journal={arXiv preprint arXiv:1608.05442},
  year={2016}
}

If you find this implementation or the pre-trained models helpful, please consider to cite:

@misc{Yang2018,
  author = {Hsuan-Kung, Yang},
  title = {ICNet-tensorflow},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/hellochick/ICNet-tensorflow}}
}
Owner
HsuanKung Yang
HsuanKung Yang
ICCV2021 Papers with Code

ICCV2021 Papers with Code

Amusi 1.4k Jan 02, 2023
[NeurIPS 2021 Spotlight] Code for Learning to Compose Visual Relations

Learning to Compose Visual Relations This is the pytorch codebase for the NeurIPS 2021 Spotlight paper Learning to Compose Visual Relations. Demo Imag

Nan Liu 88 Jan 04, 2023
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue.

The repo contains the code to train and evaluate a system which extracts relations and explanations from dialogue. How do I cite D-REX? For now, cite

Alon Albalak 6 Mar 31, 2022
A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥

Lightning-Hydra-Template A clean and scalable template to kickstart your deep learning project 🚀 ⚡ 🔥 Click on Use this template to initialize new re

Hyunsoo Cho 1 Dec 20, 2021
Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021)

Neural-Pull: Learning Signed Distance Functions from Point Clouds by Learning to Pull Space onto Surfaces(ICML 2021) This repository contains the code

149 Dec 15, 2022
RGB-D Local Implicit Function for Depth Completion of Transparent Objects

RGB-D Local Implicit Function for Depth Completion of Transparent Objects [Project Page] [Paper] Overview This repository maintains the official imple

NVIDIA Research Projects 43 Dec 12, 2022
[CVPR'21] Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

TransFuser This repository contains the code for the CVPR 2021 paper Multi-Modal Fusion Transformer for End-to-End Autonomous Driving. If you find our

695 Jan 05, 2023
Histology images query (unsupervised)

110-1-NTU-DBME5028-Histology-images-query Final Project: Histology images query (unsupervised) Kaggle: https://www.kaggle.com/c/histology-images-query

1 Jan 05, 2022
An efficient and easy-to-use deep learning model compression framework

TinyNeuralNetwork 简体中文 TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework, which contains features like neura

Alibaba 441 Dec 25, 2022
Facial detection, landmark tracking and expression transfer library for Windows, Linux and Mac

Welcome to the CSIRO Face Analysis SDK. Documentation for the SDK can be found in doc/documentation.html. All code in this SDK is provided according t

Luiz Carlos Vieira 7 Jul 16, 2020
Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks

Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks

Jina AI 794 Dec 31, 2022
Check out the StyleGAN repo and place it in the same directory hierarchy as the present repo

Variational Model Inversion Attacks Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani Most commands are in run_scripts. W

Jackson Wang 15 Dec 26, 2022
(CVPR 2022) Energy-based Latent Aligner for Incremental Learning

Energy-based Latent Aligner for Incremental Learning Accepted to CVPR 2022 We illustrate an Incremental Learning model trained on a continuum of tasks

Joseph K J 37 Jan 03, 2023
Code for "Adversarial attack by dropping information." (ICCV 2021)

AdvDrop Code for "AdvDrop: Adversarial Attack to DNNs by Dropping Information(ICCV 2021)." Human can easily recognize visual objects with lost informa

Ranjie Duan 52 Nov 10, 2022
Text to image synthesis using thought vectors

Text To Image Synthesis Using Thought Vectors This is an experimental tensorflow implementation of synthesizing images from captions using Skip Though

Paarth Neekhara 2.1k Jan 05, 2023
A Kaggle competition: discriminate gender based on handwriting

Gender discrimination based on handwriting See http://fastml.com/gender-discrimination/ for description. prep_data.py - a first step chunk_by_authors.

Zygmunt ZajÄ…c 22 Jul 20, 2022
PyTorch implementation for COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction (CVPR 2021)

Completer: Incomplete Multi-view Clustering via Contrastive Prediction This repo contains the code and data of the following paper accepted by CVPR 20

XLearning Group 72 Dec 07, 2022
The aim of the game, as in the original one, is to find a specific image from a group of different images of a person's face

GUESS WHO Main Links: [Github] [App] Related Links: [CLIP] [Celeba] The aim of the game, as in the original one, is to find a specific image from a gr

Arnau - DIMAI 3 Jan 04, 2022
Code of Puregaze: Purifying gaze feature for generalizable gaze estimation, AAAI 2022.

PureGaze: Purifying Gaze Feature for Generalizable Gaze Estimation Description Our work is accpeted by AAAI 2022. Picture: We propose a domain-general

39 Dec 05, 2022