EmbedSeg

Introduction

This repository hosts the version of the code used for the preprint Embedding-based Instance Segmentation of Microscopy Images. For a short summary of the main attributes of the publication, please check out the project webpage.

We refer to the techniques elaborated in the publication as EmbedSeg. EmbedSeg is a method to perform instance segmentation of objects in microscopy images, based on the ideas of Neven et al., 2019.

With EmbedSeg, we obtain state-of-the-art results on multiple real-world microscopy datasets. EmbedSeg has a small enough memory footprint (between 0.7 and about 3 GB) to allow network training on virtually all CUDA-enabled hardware, including laptops.

Citation

If you find our work useful in your research, please consider citing:

@misc{lalit2021embeddingbased,
      title={Embedding-based Instance Segmentation of Microscopy Images}, 
      author={Manan Lalit and Pavel Tomancak and Florian Jug},
      year={2021},
      eprint={2101.10033},
      archivePrefix={arXiv},
      primaryClass={eess.IV}
}

Dependencies

We have tested this implementation using pytorch version 1.1.0 and cudatoolkit version 10.0 on a Linux machine.

In order to replicate the results mentioned in the publication, one could use the same virtual environment (EmbedSeg_environment.yml) as used by us. Create a new environment, for example, by entering the following conda command in the terminal: conda env create -f path/to/EmbedSeg_environment.yml.
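
To confirm that an activated environment matches the tested setup, a quick version check can help. The following is a minimal sketch, not part of EmbedSeg: the helper names installed_version and report are hypothetical, and it assumes Python 3.8 or newer (for importlib.metadata). Only the pip-visible torch package is checked, since cudatoolkit is a conda package and is not reported by this mechanism.

```python
from importlib import metadata

# Version the README reports testing with (pip distribution name: torch).
TESTED = {"torch": "1.1.0"}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(tested=TESTED):
    """Build one line per package comparing the installed and tested versions."""
    lines = []
    for package, wanted in tested.items():
        found = installed_version(package)
        if found is None:
            status = "not installed"
        elif found.startswith(wanted):
            status = "matches"
        else:
            status = "differs"
        lines.append(f"{package}: tested with {wanted}, found {found} ({status})")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A "differs" line does not necessarily mean that EmbedSeg will fail; it only flags a departure from the configuration stated above.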

Getting Started

Please open a new terminal window and run the following commands one after the other.

git clone https://github.com/juglab/EmbedSeg.git
cd EmbedSeg
conda env create -f EmbedSeg_environment.yml
conda activate EmbedSegEnv
python3 -m pip install -e .
python3 -m ipykernel install --user --name EmbedSegEnv --display-name "EmbedSegEnv"
cd examples
jupyter notebook

If conda activate EmbedSegEnv generates an error, please try source activate EmbedSegEnv instead. Next, look in the examples directory and try out the dsb-2018 set of notebooks to begin with. In each notebook, please make sure to select Kernel > Change kernel > EmbedSegEnv.

Training & Inference on your data

Images of type *.tif and their corresponding masks should be placed in the images and masks directories, respectively, under each of the directories train, val and test. (To prepare such instance masks, one could use the Fiji plugin Labkit as detailed here.) These images are cropped into smaller patches in the notebook 01-data.ipynb. The following structure shows how the data should be prepared.

$data_dir
└───$project-name
    |───train
        └───images
            └───X0.tif
            └───...
            └───Xn.tif
        └───masks
            └───Y0.tif
            └───...
            └───Yn.tif
    |───val
        └───images
            └───...
        └───masks
            └───...
    |───test
        └───images
            └───...
        └───masks
            └───...
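
Before running 01-data.ipynb, it can be useful to verify that a project folder follows the structure above. The sketch below is an illustration only: check_layout is a hypothetical helper, not an EmbedSeg function, and it assumes that every image has exactly one mask, so the number of *.tif files under images and masks should agree within each split.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")
KINDS = ("images", "masks")


def check_layout(data_dir, project_name):
    """Return a list of problems found in the expected directory layout."""
    root = Path(data_dir) / project_name
    problems = []
    for split in SPLITS:
        counts = {}
        for kind in KINDS:
            folder = root / split / kind
            if not folder.is_dir():
                problems.append(f"missing directory: {folder}")
                continue
            # Count only the *.tif files, as required by the README.
            counts[kind] = len(list(folder.glob("*.tif")))
        # Compare counts only when both sub-directories exist.
        if len(counts) == 2 and counts["images"] != counts["masks"]:
            problems.append(
                f"{split}: {counts['images']} images but {counts['masks']} masks"
            )
    return problems
```

An empty list means that all six directories exist and that the image and mask counts agree; it does not check that file names or image shapes match.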
Comments
  • How can I reduce memory for inference

    Hi.

    I separately ran the inference notebook (bbbc010-2012) provided by this repo, but I had a memory allocation issue. I used a batch size of 1.

    Are there any other parameters that reduce the memory requirement?

    Also I set normalization_factor = 32767 if data_type=='8-bit' else 255 instead of normalization_factor = 65535 if data_type=='16-bit' else 255.

    But nothing changed.

    bug 
    opened by r-matsuzaka 9
  • [BUG]RuntimeError: result type Byte can't be cast to the desired output type Bool

    Hi, again..

    When I run begin_evaluating(test_configs, verbose = False, avg_bg= avg_bg/normalization_factor) in the predict notebook, I get the following error:

    2-D `test` dataloader created! Accessing data from ../../../data/bbbc010-2012/test/
    Number of images in `test` directory is 50
    Number of instances in `test` directory is 50
    Number of center images in `test` directory is 0
    *************************
    Creating branched erfnet with [4, 1] classes
    

    0%| | 0/50 [00:01<?, ?it/s]


    RuntimeError                              Traceback (most recent call last)
    /tmp/ipykernel_33/4185926816.py in <module>
    ----> 1 begin_evaluating(test_configs, verbose = False, avg_bg= avg_bg/normalization_factor)

    /kaggle/input/embedsegv1/EmbedSeg/test.py in begin_evaluating(test_configs, verbose, mask_region, mask_intensity, avg_bg)
         62 test(verbose = verbose, grid_x = test_configs['grid_x'], grid_y = test_configs['grid_y'],
         63 pixel_x = test_configs['pixel_x'], pixel_y = test_configs['pixel_y'],
    ---> 64 one_hot = test_configs['dataset']['kwargs']['one_hot'], avg_bg = avg_bg, n_sigma=n_sigma)
         65 elif(test_configs['name']=='3d'):
         66 test_3d(verbose=verbose,

    /kaggle/input/embedsegv1/EmbedSeg/test.py in test(verbose, grid_y, grid_x, pixel_y, pixel_x, one_hot, avg_bg, n_sigma)
        126
        127 center_x, center_y, samples_x, samples_y, sample_spatial_embedding_x, sample_spatial_embedding_y, sigma_x, sigma_y,
    --> 128 color_sample_dic, color_embedding_dic = prepare_embedding_for_test_image(instance_map = instance_map, output = output, grid_x = grid_x, grid_y = grid_y, pixel_x = pixel_x, pixel_y =pixel_y, predictions =predictions, n_sigma = n_sigma)
        129
        130 base, _ = os.path.splitext(os.path.basename(sample['im_name'][0]))

    /kaggle/input/embedsegv1/EmbedSeg/utils/utils.py in prepare_embedding_for_test_image(instance_map, output, grid_x, grid_y, pixel_x, pixel_y, predictions, n_sigma)
        483 sample_spatial_embedding_y[id.item()] = add_samples(samples_spatial_embeddings, 1, grid_y - 1, pixel_y)
        484 center_image = predictions[id.item() - 1]['center-image'] # predictions is a list!
    --> 485 center_mask = in_mask & center_image.byte()
        486
        487

    RuntimeError: result type Byte can't be cast to the desired output type Bool

    bug 
    opened by r-matsuzaka 5
  • dsb-2018/01-data.ipynb ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

    Hi,

    I am trying to run the first example notebook, and I am failing at the very first cell...

    miniconda installation, creating the environment from your directions.

    conda env create -f EmbedSeg_environment.yml
    conda activate EmbedSegEnv
    python3 -m pip install -e .
    python3 -m ipykernel install --sys-prefix  --name EmbedSegEnv --display-name "EmbedSegEnv"
    

    (instead of --user to install it into the virtualenv instead of $HOME/.local)

    (EmbedSegEnv) [[email protected] EmbedSeg]$ pip3 list |grep numpy
    numpy                             1.19.4
    (EmbedSegEnv) [[email protected] EmbedSeg]$ pip3 list |grep hdm
    hdmedians                         0.14.1
    
    from tqdm import tqdm
    from glob import glob
    import tifffile
    import numpy as np
    import os
    from EmbedSeg.utils.preprocess_data import extract_data, split_train_val
    from EmbedSeg.utils.generate_crops import *
    
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-1-54e5f42b447e> in <module>
          5 import os
          6 from EmbedSeg.utils.preprocess_data import extract_data, split_train_val
    ----> 7 from EmbedSeg.utils.generate_crops import *
    
    ~/git/github/juglab/EmbedSeg/EmbedSeg/utils/generate_crops.py in <module>
          5 from scipy.ndimage.morphology import binary_fill_holes
          6 from scipy.spatial import distance_matrix
    ----> 7 import hdmedians as hd
          8 from numba import jit
          9 
    
    /c7/home/tru/miniconda3/envs/EmbedSegEnv/lib/python3.7/site-packages/hdmedians/__init__.py in <module>
          4 
          5 from .medoid import medoid, nanmedoid
    ----> 6 from .geomedian import geomedian, nangeomedian
    
    hdmedians/geomedian.pyx in init hdmedians.geomedian()
    
    ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
    
    
    opened by truatpasteurdotfr 5
  • Where is cmap_60.npy?

    Hello again.

    I have a question about your elaborate notebook. I got stuck at one section when loading cmap_60.npy.

    When I tried to load it, I got FileNotFoundError: [Errno 2] No such file or directory: '../../../cmaps/cmap_60.npy'.

    How can I prepare it?

    bug 
    opened by r-matsuzaka 4
  • Where is medoid used?

    Hi.

    I have a question about the implementation of the medoid, which is mentioned in the paper. I found that it is calculated at https://github.com/juglab/EmbedSeg/blob/50f23233cf9564ff443c67c45a611ce665571c12/EmbedSeg/utils/generate_crops.py#L84

    But I could not find any sign that this function is called from any other Python script.

    Could you tell me how the medoid is used in the code?

    bug 
    opened by r-matsuzaka 3
  • creating prediction without having val files

    Hi, I am trying to generate predictions (part 3), but my dataset lacks validation files, which prevents me from going further. Is there a specific function or piece of code that can be used to work around this, or are validation files required by default to generate predictions?

    opened by aminrezaei-img 2
  • License and general questions

    embedseg seems promising,

    • why not use a BSD or Apache license?
    • how does embedseg compare to DenoiSeg in segmenting connected components, performance, efficiency, etc.?
    opened by seekingdeep 2
  • [BUG] `workers` Parameter not Respected by DataLoaders

    Describe the bug
    Only 1 thread (core) is used for the dataloaders.

    To Reproduce
    Steps to reproduce the behavior:

    1. Spin up any of the training examples
    2. Set batch_size to something respectable, like 512
    3. Adjust workers dataloader parameter
    4. Examine CPU utilization

    Expected behavior
    Multiple cores get engaged and are used to feed the GPU(s).

    Screenshots
    Only 1 CPU Core Engaged

    Desktop (please complete the following information):

    • OS: Ubuntu 20.04.2 LTS
    • Graphics: 2x GeForce RTX 3090

    Additional context

    train_dataset_dict = create_dataset_dict(
    	data_dir = data_dir, 
    	project_name = project_name,  
    	center = center, 
    	size = train_size, 
    	batch_size = train_batch_size, 
    	virtual_batch_multiplier = virtual_train_batch_multiplier, 
    	normalization_factor= normalization_factor,
    	one_hot = one_hot,
    	workers=16,
    	type = 'train'
    )
    

    To help debug, from the same virtual environment I put together this dummy script:

    import random
    import numpy as np
    from torch.utils.data import Dataset
    import torch
    from tqdm.auto import tqdm
    
    class TestDS(Dataset):
        def __len__(self):
            return 5000
    
        def __getitem__(self, index):
            z = np.zeros((256*256))
            for i in range(256*256): z[i] = i
            return z
            
    
    val_dataset = TestDS()
    val_dataset_it = torch.utils.data.DataLoader(
        val_dataset,
        batch_size=32,
        shuffle=True,
        drop_last=True,
        num_workers=12,
        pin_memory=True
    )
    
    while True:
        for i, sample in enumerate(tqdm(val_dataset_it)):
            sample = sample.to('cuda:1')
    

    Running the above results in proper core utilization: Cores Properly Engaged

    Even adding the following code at the head of the EmbedSeg training script does not help:

    import os
    os.environ["MKL_NUM_THREADS"] = "20"
    os.environ["OMP_NUM_THREADS"] = "20"
    
    bug 
    opened by authman 1
  • V0.2.5 - tag (d)

    • Add Arabidopsis-Cells-CAM notebooks
    • Add stitch_2d and stitch_3d functions
    • Introduce num_workers while creating test_configs_dict
    • Correct path to labkit wiki
    opened by lmanan 0
  • V0.2.5 - tag (b)

    • Fix resume path
    • Add updated docstrings
    • Hide display tags, save_images and virtual_batch_multiplier from 2d notebooks
    • Set drop_last=False while creating val_dataset_it (this helps if number of val crops is less than val_batch_size)
    opened by lmanan 0
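
The drop_last change listed above can be illustrated with plain arithmetic. The sketch below is not EmbedSeg code: num_batches is a hypothetical helper that mirrors how the length of a PyTorch DataLoader depends on drop_last, which shows why a validation set smaller than val_batch_size yields no batches at all when drop_last=True.

```python
import math


def num_batches(num_samples, batch_size, drop_last):
    """Number of batches a loader yields for the given settings."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # The trailing incomplete batch is discarded.
        return num_samples // batch_size
    # The trailing incomplete batch is kept.
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    # 5 validation crops with a batch size of 16.
    print(num_batches(5, 16, drop_last=True))
    print(num_batches(5, 16, drop_last=False))
```

With drop_last=True the validation loop would have nothing to iterate over in this case, so no validation metric could be computed.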
  • v0.2.5 - tag (a)

    • Make min-max-percentile normalization default
    • Update README
    • Better Visualization of crops and model predictions
    • Reduce text in train notebooks
    • Take away virtual_batch_multiplier as a user-defined attribute
    opened by lmanan 0
  • Pretrained models not found

    Hello,

    I found that the links to the pretrained models on this project page return 404. Are they still available? I want to try your models on our private dataset for 3D nuclei instance segmentation.

    Thank you! Best wishes.

    bug 
    opened by Chrisa142857 0
  • cublas Run time error

    Describe the bug
    I am trying the example notebooks and successfully ran 01-data. However, when I try the training notebook and begin training the model, it takes a long time to initialise and then I get the following error: cublas runtime error : the GPU program failed to execute at C:/w/1/s/tmp_conda_3.7_044431/conda/conda-bld/pytorch_1556686009173/work/aten/src/THC/THCBlas.cu:259

    Desktop (please complete the following information):

    • OS: Tried this on Windows 10 and Windows 11
    • Graphics: NVIDIA RTX 3080

    Additional context
    Not sure if it's a compatibility issue with RTX 30 series cards. I found a similar error for RTX 2080 cards on older pytorch: https://github.com/pytorch/pytorch/issues/17334

    bug 
    opened by pr4deepr 1
  • TypeError: forward() missing 4 required positional arguments: 'prediction', 'instances', 'labels', and 'center_images'[BUG]

    Hello. I tried the bbbc010-2012 tutorial Jupyter notebooks, but this error happened and I don't know the solution. Could you tell me what I should do?

    I ran 01-data.ipynb and 02-train.ipynb. When I ran begin_training(train_dataset_dict, val_dataset_dict, model_dict, loss_dict, configs, color_map=new_cmap), the error quoted in the title happened.

    Environment

    • OS: Ubuntu 18.04
    • GPU: Tesla
    • python 3.7
    • torch 1.1.0
    • torchvision 0.3.0
    • cuda 10.0
    bug 
    opened by kenta-takizawa 2
  • RuntimeError: CUDA out of memory.

    I have 4 images and the batch size is only 1, but when I start begin_training(train_dataset_dict, val_dataset_dict, model_dict, loss_dict, configs), I get RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 31.75 GiB total capacity; 30.71 GiB already allocated; 62.50 MiB free; 12.93 MiB cached). Please let me know how I can solve it. Thanks

    opened by Saharkakavand 8
Releases (v0.2.4-tag)
  • v0.2.4-tag (Apr 18, 2022)

    This release was used to compute numbers for the MIDL publication and is stable.

    • The image intensities were normalized by dividing pixel intensities by 255 (for 8-bit images) or 65535 (for unsigned 16-bit images). While this normalization strategy led to faster training, it sometimes led to poorer out-of-distribution (OOD) performance. In future releases, the default will be set to min-max-percentile (the model takes longer to reach the same val IoU, but inference performance is better).
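
The two normalization strategies mentioned in this release note can be contrasted in a few lines. This is a minimal pure-Python sketch under stated assumptions, not the EmbedSeg implementation: the function names are hypothetical, and the percentile bounds p_low=1.0 and p_high=99.8 are illustrative defaults that the release note does not specify.

```python
def normalize_by_max(pixels, data_type):
    """Divide by the largest representable value: 65535 (16-bit) or 255 (8-bit)."""
    factor = 65535 if data_type == "16-bit" else 255
    return [p / factor for p in pixels]


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    position = (len(sorted_values) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def normalize_min_max_percentile(pixels, p_low=1.0, p_high=99.8, eps=1e-20):
    """Rescale so the p_low percentile maps to 0 and the p_high percentile to 1.

    The percentile bounds are assumed values chosen for illustration.
    """
    ordered = sorted(pixels)
    low = percentile(ordered, p_low)
    high = percentile(ordered, p_high)
    return [(p - low) / (high - low + eps) for p in pixels]
```

Dividing by a fixed factor depends only on the bit depth, whereas the percentile variant adapts to the intensity range of each image, which is one plausible reason for the better out-of-distribution behaviour described above.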
  • v0.2.3-tag (Jun 15, 2021)

    A minor update since release v0.2.2. This includes:

    • Add display_zslice parameter and save_checkpoint_frequency parameter to the configs dictionary
    1. Support for visualization in setups where virtual_batch_multiplier > 1 is still missing.
    2. The install version of tifffile is also hardcoded in setup.py, because the currently latest version (2021.6.14) generates a warning message with the imsave command while generating crops for the bbbc010-2012 dataset. This version specification will be relaxed in release v0.2.4.

    TODOs include:

    1. Plan to update pytorch version to 1.9.0 in release v0.2.4 (currently pytorch version used is 1.1.0)
    2. Plan to add tile and stitch capability in release v0.2.4 for handling large 2d and 3d images during inference
    3. Plan to add a parameter max_crops_per_image in release v0.2.4 to set an optional upper bound on number of crops extracted from each image
    4. Plan to save all instance crops and center crops as RLE files in release v0.2.4
    5. Plan to add an optional mask parameter during training which ignores loss computation from certain regions of the image in release v0.2.4
    6. Plan to deal with bug while evaluating var_loss and to have crops of desired size by additional padding.
    7. Plan to include support for more classes.
    8. Normalization for 3d ==> (0,1, 2)
    9. Make normalization the default option for better extensibility
    10. Parallelize operations like cropping
    11. Eliminate the specification of grid size in notebooks (set it to some default value)
    12. Simplify notebooks further
    13. Make colab versions of the notebooks
    14. Test center=learn capability for learning the center freely
    15. Add the ILP formulation for stitching 2d instance predictions
    16. Add the code for converting predictions from 2d model on xy, yz and xz slices to generate a 3D instance segmentation
    17. Add more examples from medical image datasets
    18. Add threejs visualizations of the instance segmentations. Explain how to generate these meshes, smoothen them and import them with threejs script.
    19. Padding with reflection instead of constant mode
    20. Include cluster_with_seeds in case nuclei or cell detections are additionally available
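
The tiling half of the planned tile and stitch capability (item 2 above) can be sketched as follows. This is only an illustration of the general idea and not the EmbedSeg implementation: tile_starts and tiles_2d are hypothetical helpers, and the harder step of merging instance labels across tile borders is not covered.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that cover [0, length).

    Neighbouring tiles share at least `overlap` pixels, and the last
    tile is placed flush with the image border.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)
    return starts


def tiles_2d(height, width, tile, overlap):
    """Top-left (y, x) corners of square tiles covering a 2D image."""
    return [
        (y, x)
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]
```

Running inference tile by tile bounds the memory needed per forward pass by the tile size rather than by the full image size.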
  • v0.2.2-tag (May 5, 2021)

  • v0.2.0 (Apr 17, 2021)

    Major changes:

    • Add 3d example notebooks for two datasets
    • Correct min_object_size (evaluated now from looking at the train and validation masks)
    • Save tif images with datatype np.uint16 (in the prediction notebooks)
    • Provide support in case evaluation GT images are not available (during prediction)

    Some things which are still incorrect in v0.2.0:

    • n_y should be set to n_x for equal pixel/voxel sizes in y and x dimension. This is fixed in v0.2.1
    • anisotropy_factor is wrongly calculated for the 3d notebooks (it was calculated as the reciprocal). This is fixed in v0.2.1
    • train_size was set to 600 for the bbbc010-2012 dataset. This is raised to 1200 in v0.2.1
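
To make the reciprocal mix-up in the anisotropy_factor note concrete, here is a small sketch. It assumes that the intended definition is the voxel size along z divided by the voxel size along x; the release note itself does not state the direction, so this is an assumption, and the function below is a hypothetical stand-in rather than EmbedSeg code.

```python
def anisotropy_factor(pixel_size_z, pixel_size_x):
    """Ratio of z to x voxel size (assumed definition, see lead-in)."""
    if pixel_size_z <= 0 or pixel_size_x <= 0:
        raise ValueError("pixel sizes must be positive")
    return pixel_size_z / pixel_size_x


if __name__ == "__main__":
    # Example: z-slices 2.0 microns apart, 0.5 micron pixels in x.
    print(anisotropy_factor(2.0, 0.5))
```

Under this assumption, a stack sampled more coarsely in z than in x gives a factor above 1, and the reciprocal would incorrectly give a value below 1.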
Owner
JugLab
GitHub for the JugLab