Fully convolutional networks for semantic segmentation

Last update: Dec 25, 2022

Overview

FCN-semantic-segmentation

Simple end-to-end semantic segmentation using fully convolutional networks [1]. Takes a pretrained 34-layer ResNet [2], removes the fully connected layers, and adds transposed convolution layers with skip connections from lower layers. Initialises upsampling convolutions with bilinear interpolation filters and zeros the final (classification) layer.
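The bilinear initialisation mentioned above can be sketched in plain Python. This is a minimal illustration, not the repository's own code: the function name `bilinear_kernel` and the kernel size of 4 (a common choice for 2x upsampling) are assumptions.

```python
def bilinear_kernel(size):
    """Return a size x size bilinear interpolation filter as nested lists.

    Hypothetical helper for illustration; the kernel sizes used in the
    repository are not stated in this README.
    """
    factor = (size + 1) // 2
    # The centre sits on a tap for odd sizes and between two taps for even sizes.
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    # 1D triangular profile; the 2D filter is its outer product with itself.
    profile = [1 - abs(i - center) / factor for i in range(size)]
    return [[r * c for c in profile] for r in profile]


kernel = bilinear_kernel(4)
for row in kernel:
    print(row)
```

In a framework model, a filter like this would typically be copied into the weight of each transposed convolution so that every channel initially upsamples itself, while the zeroed classification layer starts from all-zero scores.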

Uses an independent cross-entropy loss per class. Trained with SGD with momentum, plus weight decay only on convolutional weights. Calculates and plots class-wise and mean intersection-over-union. Checkpoints the network every epoch.
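To make the loss and metric concrete, here is a small framework-free sketch. It assumes that "independent cross-entropy loss per class" means a sigmoid (binary) cross-entropy applied to each class score separately, and it computes IoU from flat lists of predicted and ground-truth labels. The names `per_class_bce`, `class_iou` and the `ignore_label` parameter are hypothetical and do not come from the repository.

```python
import math


def per_class_bce(logits, one_hot):
    """Mean binary cross-entropy over classes for one pixel.

    Each class is treated as its own sigmoid classifier (an assumption
    about what the README means by an independent loss per class).
    """
    total = 0.0
    for z, t in zip(logits, one_hot):
        # Numerically stable form of -[t*log(s(z)) + (1-t)*log(1-s(z))].
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def class_iou(pred, target, num_classes, ignore_label=None):
    """Return (per-class IoU list, mean IoU) from flat label sequences.

    Classes that never appear in pred or target get None and are
    excluded from the mean.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip pixels without a valid ground-truth class
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[p] += 1
            union[t] += 1
    ious = [i / u if u else None for i, u in zip(inter, union)]
    valid = [x for x in ious if x is not None]
    mean = sum(valid) / len(valid) if valid else None
    return ious, mean


ious, mean_iou = class_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(ious, mean_iou)
```

In practice the intersection and union counts would be accumulated over the whole validation set before dividing, rather than averaged per image.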

Note: This code does not achieve great results: it reaches ~40 IoU fairly quickly but then plateaus. Contributions to fix this are welcome! The goal of this repo is to provide strong, simple and efficient baselines for semantic segmentation using the FCN method, so it need not be restricted to ResNet-34.

Requirements

Instructions

1. Install all of the required software. To feasibly run the training, CUDA is needed. The crop size and batch size can be tailored to your GPU memory (the default crop and batch sizes use ~10GB of GPU RAM).
2. Register on the Cityscapes website to access the dataset.
3. Download and extract the training/validation RGB data (leftImg8bit_trainvaltest) and ground truth data (gtFine_trainvaltest).
4. Run python main.py <options>.

First a Dataset object is set up, returning the RGB inputs, one-hot targets (for independent classification) and label targets. During training, the images are randomly cropped and horizontally flipped. Testing calculates IoU scores and produces a subset of coloured predictions that match the coloured ground truth.
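The key detail in the augmentation described above is that the image and its label map must receive the same crop and the same flip. The sketch below shows this on nested lists, together with a one-hot conversion of the label map. It is only an illustration under stated assumptions: `random_crop_flip`, `one_hot`, the 0.5 flip probability and the list-based image layout are all hypothetical, not taken from the repository.

```python
import random


def random_crop_flip(image, labels, crop_h, crop_w, rng=random):
    """Apply one shared random crop and horizontal flip to image and labels.

    image and labels are H x W nested lists (hypothetical layout).
    """
    h, w = len(labels), len(labels[0])
    top = rng.randint(0, h - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, w - crop_w)
    flip = rng.random() < 0.5           # assumed flip probability

    def cut(grid):
        rows = [row[left:left + crop_w] for row in grid[top:top + crop_h]]
        return [row[::-1] for row in rows] if flip else rows

    return cut(image), cut(labels)


def one_hot(labels, num_classes):
    """Turn an H x W label map into num_classes binary H x W maps."""
    return [[[1 if c == k else 0 for c in row] for row in labels]
            for k in range(num_classes)]


# Toy 4 x 6 example where image and labels hold the same values,
# so any misalignment between the two would be visible.
grid = [[(r + c) % 3 for c in range(6)] for r in range(4)]
img_crop, lab_crop = random_crop_flip(grid, grid, 2, 3, rng=random.Random(0))
print(lab_crop)
print(one_hot(lab_crop, 3))
```

Sampling the crop offsets and the flip decision once, then applying them to both inputs, keeps every pixel paired with its label.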

References

[1] Fully convolutional networks for semantic segmentation
[2] Deep Residual Learning for Image Recognition

Owner

Kai Arulkumaran
