Vision Transformer Segmentation Network

This PyTorch implementation of ViT produces an output of the same size as the input in a simple, straightforward way: it applies the inverse of the patch rearrange operation to all the predicted outputs. This enables convolution-free multi-class segmentation.
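The round trip between an image and its patch tokens can be sketched as follows. This is a minimal illustration, not the repository's code: it is written in NumPy so it runs without the model, the helper names `to_patches` and `from_patches` are hypothetical, and the patch layout (`c (h p1) (w p2) -> (h w) (p1 p2 c)` in einops notation) is assumed to match the vit-pytorch code this repository builds on.

```python
import numpy as np

def to_patches(img, p):
    """Split a (C, H, W) image into (num_patches, p*p*C) flattened patches."""
    c, H, W = img.shape
    h, w = H // p, W // p                  # patch grid size
    x = img.reshape(c, h, p, w, p)         # axes: c, h, p1, w, p2
    x = x.transpose(1, 3, 2, 4, 0)         # axes: h, w, p1, p2, c
    return x.reshape(h * w, p * p * c)

def from_patches(tokens, p, h, w, c):
    """Inverse of to_patches: (h*w, p*p*c) back to (c, h*p, w*p)."""
    x = tokens.reshape(h, w, p, p, c)      # axes: h, w, p1, p2, c
    x = x.transpose(4, 0, 2, 1, 3)         # axes: c, h, p1, w, p2
    return x.reshape(c, h * p, w * p)

# Default sizes from this README: 112x112 input, 1 channel, 7x7 patches.
image = np.arange(1 * 112 * 112, dtype=np.float32).reshape(1, 112, 112)
tokens = to_patches(image, 7)              # one row per patch
restored = from_patches(tokens, 7, 16, 16, 1)
```

In a segmentation setting, the tokens passed to the inverse step would be the per-patch predictions, presumably with `num_classes` in place of the input channel count, so that the restored tensor has the same height and width as the input.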

Most of the code is taken from https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py

Default Architecture Parameters:

model = ViTSeg( image_size=112, 
                channels=1,
                patch_size=7, 
                num_classes=1, 
                dim=768, 
                depth=6, 
                heads=12, 
                mlp_dim=2048, 
                learned_pos=False, 
                use_token=False)

image_size: An integer or a tuple defining the size of the input image (with some code changes, any image size could be passed)
channels: An integer defining the number of channels in the input image
patch_size: An integer or a tuple defining the size of the patches
num_classes: An integer defining the number of channels in the output
dim: An integer defining the size of the embedding dimension
depth: An integer defining the number of transformer layers
heads: An integer defining the number of attention heads in each transformer layer
mlp_dim: An integer defining the hidden size of the MLP in each transformer layer
learned_pos: A boolean which, if true, switches from fixed positional encodings to learned positional encodings
use_token: A boolean which, if true, adds a CLS token to the input and output
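To see how image_size, patch_size and channels interact, the short sketch below computes the sequence length and per-patch size implied by the defaults. The helper `patch_grid` is hypothetical and not part of the repository; it assumes, as in the vit-pytorch code this repository builds on, that the image dimensions must be divisible by the patch dimensions.

```python
def patch_grid(image_size, patch_size, channels):
    """Return (number of patches, values per flattened patch)."""
    def pair(v):
        # Accept either an int or a (height, width) tuple.
        return v if isinstance(v, tuple) else (v, v)

    img_h, img_w = pair(image_size)
    patch_h, patch_w = pair(patch_size)
    # Assumption: the image must tile exactly into patches.
    if img_h % patch_h or img_w % patch_w:
        raise ValueError("image size must be divisible by patch size")
    num_patches = (img_h // patch_h) * (img_w // patch_w)
    patch_dim = patch_h * patch_w * channels
    return num_patches, patch_dim

# Defaults from this README: image_size=112, patch_size=7, channels=1.
num_patches, patch_dim = patch_grid(112, 7, 1)
```

With the defaults this gives a 16 x 16 grid, i.e. 256 patches of 49 values each, before each patch is projected to the `dim`-sized embedding.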

Citation

If you find this repository useful, please consider citing it:

@article{reynaud2021vitseg,
  title={ViTSeg-https://github.com/HReynaud/ViTSeg}, 
  url={https://github.com/HReynaud/ViTSeg},  
  Author={Reynaud, Hadrien}, 
  Year={2021}
}

A simple approach to enable dense segmentation with ViT.
