LinkNet - This repository contains our Torch7 implementation of the network developed by us at e-Lab.

Related tags

Deep LearningLinkNet
Overview

LinkNet

This repository contains our Torch7 implementation of the network developed by us at e-Lab. You can go to our blogpost or read the article LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation for further details.

Dependencies:

  • Torch7 : you can follow our installation step specified here
  • VideoDecoder : video decoder for torch that utilizes avcodec library.
  • Profiler : use it to calculate # of paramaters, operations and forward pass time of any network trained using torch.

Currently the network can be trained on two datasets:

Datasets Input Resolution # of classes
CamVid (cv) 768x576 11
Cityscapes (cs) 1024x512 19

To download both datasets, follow the link provided above. Both the datasets are first of all resized by the training script and if you want then you can cache this resized data using --cachepath option. In case of CamVid dataset, the available video data is first split into train/validate/test set. This is done using prepCamVid.lua file. dataDistributionCV.txt contains the detail about splitting of CamVid dataset. These things are automatically run before training of the network.

LinkNet performance on both of the above dataset:

Datasets Best IoU Best iIoU
Cityscapes 76.44 60.78
CamVid 69.10 55.83

Pretrained models and confusion matrices for both datasets can be found in the latest release.

Files/folders and their usage:

  • run.lua : main file
  • opts.lua : contains all the input options used by the tranining script
  • data : data loaders for loading datasets
  • [models] : all the model architectures are defined here
  • train.lua : loading of models and error calculation
  • test.lua : calculate testing error and save confusion matrices

There are three model files present in models folder:

  • model.lua : our LinkNet architecture
  • model-res-dec.lua : LinkNet with residual connection in each of the decoder blocks. This slightly improves the result but we had to use bilinear interpolation in residual connection because of which we were not able to run our trained model on TX1.
  • nobypass.lua : this architecture does not use any link between encoder and decoder. You can use this model to verify if connecting encoder and decoder modules actually improve performance.

A sample command to train network is given below:

th main.lua --datapath /Datasets/Cityscapes/ --cachepath /dataCache/cityscapes/ --dataset cs --model models/model.lua --save /Models/cityscapes/ --saveTrainConf --saveAll --plot

License

This software is released under a creative commons license which allows for personal and research use only. For a commercial license please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/

Comments
  • memory consuming

    memory consuming

    The model read all the dataset into the momory, this method is too memory consuming. Maybe it is better to read the dataset list and iterate the list when training .

    opened by mingminzhen 7
  • Training on camvid dataset

    Training on camvid dataset

    Hi. I can't reproduce your result on camvid dataset. What is the learning rate and number of training epoch you used in your training, is your published result on validate or test set?.

    opened by vietdoan 4
  • Torch: not enough memory (17GB)

    Torch: not enough memory (17GB)

    Hi, all

    When I run : th main.lua --datapath /data2/cityscapes_dataset/leftImg8bit/all_train_images/ --cachepath /data2/cityscapes_dataset/leftImg8bit/dataCache/ --dataset cs --model models/model.lua --save save_models/cityscapes/ --saveTrainConf --saveAll --plot

    I got "Torch: not enough memory: you tried to allocate 17GB" error (details)

    It's strange because the paper mentioned it is trained using Titan X which has 12GB memory. Why the network consumes 17GB in running?

    Any suggestion to fix this issue?

    Thanks!

    opened by amiltonwong 3
  • Fine Tuning

    Fine Tuning

    Hi,

    is there any possibility to fine-tune this model on a custom datase with different number of classes? The pre-trained weights must be exist also, as I know.

    opened by MyVanitar 3
  • Model input/output details?

    Model input/output details?

    Hi,

    I'm having a hell of a time trying to understand what the model is expecting in terms of input and output. I'm trying to use this model in an iOS project, so I need to convert the model to Apple's CoreML format.

    Image input questions:

    • For image pixel values: 0-255, 0-1, -1-1?
    • RGB or BGR?
    • Color bias?

    Prediction output:

    • Looks like the shape is # of classes, width, height?
    • Predictions are positive floats from 0-100?

    So far I'm having the best luck with these specifications:

    import torch
    from torch2coreml import convert
    from torch.utils.serialization import load_lua
    
    model = load_lua("model-cs-IoU-cpu.net")
    
    input_shape = (3, 512, 1024)
    coreml_model = convert(
            model,
            [input_shape],
            input_names=['inputImage'],
            output_names=['outputImage'],
            image_input_names=['inputImage'],
            preprocessing_args={
                'image_scale': 2/255.0
            }
        )
    coreml_model.save("/home/sean/Downloads/Final/model-cs-IoU.mlmodel")
    
    opened by seantempesta 2
  • About IoU

    About IoU

    Hi, @codeAC29
    I cannot obtain the high IoU in my training. I looked into your code and found that, the IoU is computed via averageValid. But this is actually computing the mean of class accuracy. The IoU should be the value of averageUnionValid. Do you notice the difference and obtain 76% IoU by averageUnionValid ?

    Sorry for the trouble. For convenience, I refer the definition of averageValid and averageUnionValid here.

    opened by qqning 2
  • Error while running linknet main file

    Error while running linknet main file

    Hii, I am getting this error while running main.py RuntimeError: Expected object of type torch.cuda.LongTensor but found type torch.cuda.FloatTensor for argument 2 'target'. Please help me out. Also when i try to run the trained models i am running into error. I am using pytorch to run .net files. I am not able to load them as it is showing error: name cs is not defined. It is a model. Why does it have a variable named cs(here cs represents cityscapes) in it?

    opened by Tharun98 0
  • Model fails for input size other than multiples of 32(for depth of 4)

    Model fails for input size other than multiples of 32(for depth of 4)

    Hi, If we give the input image size other than 32 multiples there is a size mismatch error when adding the output from encoder3 and decoder4. For example input image size is 1000x2000 output of encoder3 is 63x125 and decoder4 output size is 64x126. We need adjust parameters for spatialfullconvolution layer only if input image size is multiple of 2^(n+1) where n is encoder depth. For other image sizes adjust parameter depends on the image size. In this example network works if adjust parameter is zero in decoders 3 and 4. Please clarify if this network works only for 2^(n+1) sizes. Thanks.

    opened by Tharun98 1
  • How about the image resolution?

    How about the image resolution?

    Hi, I am reproducing the LinkNet. I have a doubt about the input image resolution and the output image resolution when you compute the FLOPS. I find my FLOPS and running speed are different your results reported on your paper.

    opened by ycszen 5
  • linknet  architecture

    linknet architecture

    iam trying to build linknet in caffe. Could you please help me in below qns: 1)Found that there are 5 downsampling and 6 updsampling by 2. if we have different no of up sampling and down sampling(6,5) how can we get the same output shape as input. Referred:https://arxiv.org/pdf/1707.03718.pdf 2)how many iterations you ran to get the proper results. 3)To match the encoder and decoder output shape i used crop layer before Eltwise instead of adding extra row or column. Will it make any difference?

    opened by vishnureghu007 7
  • Error while training

    Error while training

    I got the camVid dataset as specified in the in the read me file and installed video-decoder

    Ientered the following command to start training: th main.lua --datapath ./data/CamVid/ --cachepath ./dataCache/CamV/ --dataset cv --model ./models/model.lua --save ./Models/CamV/ --saveTrainConf --saveAll --plot

    And I got the following error,

    Preparing CamVid dataset for data loader Filenames and their role found in: ./misc/dataDistributionCV.txt

    Getting input images and labels for: 01TP_extract.avi /home/jayp/torch/install/bin/luajit: /home/jayp/torch/install/share/lua/5.1/trepl/init.lua:389: /home/jayp/torch/install/share/lua/5.1/trepl/init.lua:389: error loading module 'libvideo_decoder' from file '/home/jayp/torch/install/lib/lua/5.1/libvideo_decoder.so': /home/jayp/torch/install/lib/lua/5.1/libvideo_decoder.so: undefined symbol: avcodec_get_frame_defaults stack traceback: [C]: in function 'error' /home/jayp/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require' main.lua:34: in main chunk [C]: in function 'dofile' ...jayp/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk

    I would really appreciate if anyone would help me with this.

    Thank You!

    opened by jay98 4
Releases(v1.0)
Owner
e-Lab
e-Lab
This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection, built on SECOND.

3D-CVF This is the official implementation of 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object

YecheolKim 97 Dec 20, 2022
Single-Stage 6D Object Pose Estimation, CVPR 2020

Overview This repository contains the code for the paper Single-Stage 6D Object Pose Estimation. Yinlin Hu, Pascal Fua, Wei Wang and Mathieu Salzmann.

CVLAB @ EPFL 89 Dec 26, 2022
Vrcwatch - Supply the local time to VRChat as Avatar Parameters through OSC

English: README-EN.md VRCWatch VRCWatch は、VRChat 内のアバター向けに現在時刻を送信するためのプログラムです。 使

Kosaki Mezumona 17 Nov 30, 2022
Implementation of SSMF: Shifting Seasonal Matrix Factorization

SSMF Implementation of SSMF: Shifting Seasonal Matrix Factorization, Koki Kawabata, Siddharth Bhatia, Rui Liu, Mohit Wadhwa, Bryan Hooi. NeurIPS, 2021

Koki Kawabata 9 Jun 10, 2022
Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, Daniel Silva, Andrew McCallum, Amr Ahmed. KDD 2019.

gHHC Code for: Gradient-based Hierarchical Clustering using Continuous Representations of Trees in Hyperbolic Space. Nicholas Monath, Manzil Zaheer, D

Nicholas Monath 35 Nov 16, 2022
Customizable RecSys Simulator for OpenAI Gym

gym-recsys: Customizable RecSys Simulator for OpenAI Gym Installation | How to use | Examples | Citation This package describes an OpenAI Gym interfac

Xingdong Zuo 14 Dec 08, 2022
Repository containing detailed experiments related to the paper "Memotion Analysis through the Lens of Joint Embedding".

Memotion Analysis Through The Lens Of Joint Embedding This repository contains the experiments conducted as described in the paper 'Memotion Analysis

Nethra Gunti 1 Mar 16, 2022
FairyTailor: Multimodal Generative Framework for Storytelling

FairyTailor: Multimodal Generative Framework for Storytelling

Eden Bens 172 Dec 30, 2022
duralava is a neural network which can simulate a lava lamp in an infinite loop.

duralava duralava is a neural network which can simulate a lava lamp in an infinite loop. Example This is not a real lava lamp but a "fake" one genera

Maximilian Bachl 87 Dec 20, 2022
Synthetic LiDAR sequential point cloud dataset with point-wise annotations

SynLiDAR dataset: Learning From Synthetic LiDAR Sequential Point Cloud This is official repository of the SynLiDAR dataset. For technical details, ple

78 Dec 27, 2022
Open source Python module for computer vision

About PCV PCV is a pure Python library for computer vision based on the book "Programming Computer Vision with Python" by Jan Erik Solem. More details

Jan Erik Solem 1.9k Jan 06, 2023
Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations

Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations Trevor Ablett, Daniel (Yifan) Zhai, Jonatha

STARS Laboratory 3 Feb 01, 2022
Boostcamp CV Serving For Python

Boostcamp-CV-Serving Prerequisites MySQL GCP Cloud Storage GCP key file Sentry Streamlit Cloud Secrets: .streamlit/secrets.toml #DO NOT SHARE THIS I

Jungwon Seo 19 Feb 22, 2022
Repositorio oficial del curso IIC2233 Programación Avanzada 🚀✨

IIC2233 - Programación Avanzada Evaluación Las evaluaciones serán efectuadas por medio de actividades prácticas en clases y tareas. Se calculará la no

IIC2233 @ UC 47 Sep 06, 2022
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision Download links and PyTorch implementation of "Towers of Ba

Blakey Wu 40 Dec 14, 2022
Husein pet projects in here!

project-suka-suka Husein pet projects in here! List of projects mysejahtera-density. Generate resolution points using meshgrid and request each points

HUSEIN ZOLKEPLI 47 Dec 09, 2022
Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz).

Blender-Cave-Generation Cave Generation using metaballs in Blender. Originally created by sdfgeoff, Edited by Myself (Archie Jaskowicz). Installation

2 Dec 28, 2022
Source code of D-HAN: Dynamic News Recommendation with Hierarchical Attention Network

D-HAN The source code of D-HAN This is the source code of D-HAN: Dynamic News Recommendation with Hierarchical Attention Network. However, only the co

30 Sep 22, 2022
Changing the Mind of Transformers for Topically-Controllable Language Generation

We will first introduce the how to run the IPython notebook demo by downloading our pretrained models. Then, we will introduce how to run our training and evaluation code.

IESL 20 Dec 06, 2022