LIVECell - A large-scale dataset for label-free live cell segmentation

LIVECell dataset

This document contains instructions on how to access the data associated with the submitted manuscript "LIVECell - A large-scale dataset for label-free live cell segmentation" by Edlund et al. 2021.

Background

Light microscopy is a cheap, accessible, non-invasive modality that, when combined with well-established protocols of two-dimensional cell culture, facilitates high-throughput quantitative imaging to study biological phenomena. Accurate segmentation of individual cells enables exploration of complex biological questions, but this requires sophisticated image processing pipelines due to the low contrast and high object density. Deep learning-based methods are considered state-of-the-art for most computer vision problems but require vast amounts of annotated data, for which there is no suitable resource available in the field of label-free cellular imaging. To address this gap we present LIVECell, a high-quality, manually annotated and expert-validated dataset that is the largest of its kind to date, consisting of over 1.6 million cells from a diverse set of cell morphologies and culture densities. To further demonstrate its utility, we provide convolutional neural network-based models trained and evaluated on LIVECell.

How to access LIVECell

All images in LIVECell are available following this link (requires 1.3 GB). Annotations for the different experiments are linked below. For more details on the benchmarks and how to use our models, see this link.

LIVECell-wide train and evaluate

Annotation set URL
Training set link
Validation set link
Test set link

Single cell-type experiments

Cell Type Training set Validation set Test set
A172 link link link
BT474 link link link
BV-2 link link link
Huh7 link link link
MCF7 link link link
SH-SY5Y link link link
SkBr3 link link link
SK-OV-3 link link link

Dataset size experiments

Split URL
2 % link
4 % link
5 % link
25 % link
50 % link

Comparison to fluorescence-based object counts

The images and the corresponding json file with the object count per image are available together with the raw fluorescent images the counts are based on.

Cell Type Images Counts Fluorescent images
A549 link link link
A172 link link link

Download all of LIVECell

The LIVECell dataset and trained models are stored in an Amazon Web Services (AWS) S3 bucket. If you have an AWS IAM user, the easiest way to download the dataset is to run the AWS CLI in the folder you would like to download it to:

aws s3 sync s3://livecell-dataset .

If you do not have an AWS IAM user, the procedure is a little more involved. We can use curl to make an HTTP request to get the S3 XML response and save it to files.xml:

curl -H "GET /?list-type=2 HTTP/1.1" \
     -H "Host: livecell-dataset.s3.eu-central-1.amazonaws.com" \
     -H "Date: 20161025T124500Z" \
     -H "Content-Type: text/plain" http://livecell-dataset.s3.eu-central-1.amazonaws.com/ > files.xml

We then extract the URLs from files.xml using grep:

grep -oPm1 "(?<=<Key>)[^<]+" files.xml | sed -e 's/^/http:\/\/livecell-dataset.s3.eu-central-1.amazonaws.com\//' > urls.txt

Then download the files you like using wget.
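
As an alternative to grep and sed, the listing can also be turned into URLs with a few lines of Python. This is only a minimal sketch: the function name `extract_urls` and the hand-written sample listing are illustrative assumptions, and it presumes that files.xml is a standard S3 listing in which every object name sits in a `<Key>` element.

```python
import xml.etree.ElementTree as ET

# Bucket endpoint taken from the curl command above.
BASE_URL = "http://livecell-dataset.s3.eu-central-1.amazonaws.com/"


def extract_urls(xml_text, base_url=BASE_URL):
    """Return one download URL per <Key> element in an S3 listing."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # S3 responses are namespaced, so tags look like "{namespace}Key".
        if element.tag.rsplit("}", 1)[-1] == "Key" and element.text:
            urls.append(base_url + element.text)
    return urls


# Tiny hand-written listing that stands in for the real files.xml.
sample_listing = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Contents><Key>README.md</Key></Contents>"
    "<Contents><Key>LIVECell_dataset_2021/images.zip</Key></Contents>"
    "</ListBucketResult>"
)
urls = extract_urls(sample_listing)
print("\n".join(urls))
```

To use it on the real listing, read files.xml from disk and pass its contents to `extract_urls`. Note that S3 returns at most 1000 keys per listing request, so a single response may not cover the whole bucket.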

File structure

The top-level structure of the files is arranged like:

/livecell-dataset/
    ├── LIVECell_dataset_2021  
    |       ├── annotations/
    |       ├── models/
    |       ├── nuclear_count_benchmark/	
    |       └── images.zip  
    ├── README.md  
    └── LICENSE

LIVECell_dataset_2021/images

The images of the LIVECell-dataset are stored in /livecell-dataset/LIVECell_dataset_2021/images.zip along with their annotations in /livecell-dataset/LIVECell_dataset_2021/annotations/.

Within images.zip, the training/validation-set and test-set images are kept completely separate to facilitate fair comparison between studies. The images require 1.3 GB of disk space unzipped and are arranged like:

images/
    ├── livecell_test_images
    |       └── <Cell Type>
    |               └── <Cell Type>_Phase_<Well>_<Location>_<Timestamp>_<Crop>.tif
    └── livecell_train_val_images
            └── <Cell Type>

Where <Cell Type> is one of the eight cell types in LIVECell (A172, BT474, BV2, Huh7, MCF7, SHSY5Y, SkBr3, SKOV3), <Well> is the location in the 96-well plate used to culture the cells, <Location> indicates the position in the well where the image was acquired, <Timestamp> is the time passed from the beginning of the experiment to image acquisition, and <Crop> is the index of the crop of the original larger image. An example image name, following the pattern <Cell Type>_Phase_<Well>_<Location>_<Timestamp>_<Crop>.tif, is A172_Phase_C7_1_02d16h00m_2.tif, which is an image of A172 cells grown in well C7, acquired at position 1, two days and 16 hours after the start of the experiment (crop position 2).
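
The naming scheme can be decoded programmatically. The sketch below is an illustration rather than part of the LIVECell tooling: the helper `parse_image_name` and the regular expression are assumptions inferred from the example file name above, including the assumption that wells use rows A to H of a 96-well plate and that the timestamp always has the form days, hours, minutes.

```python
import re

# Pattern inferred from the example A172_Phase_C7_1_02d16h00m_2.tif.
IMAGE_NAME = re.compile(
    r"^(?P<cell_type>[^_]+)_Phase_(?P<well>[A-H]\d{1,2})_(?P<location>\d+)"
    r"_(?P<days>\d+)d(?P<hours>\d+)h(?P<minutes>\d+)m_(?P<crop>\d+)\.tif$"
)


def parse_image_name(name):
    """Split a LIVECell image file name into its labelled components."""
    match = IMAGE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected image name: {name!r}")
    fields = match.groupdict()
    # Convert the timestamp to minutes since the start of the experiment.
    elapsed = (
        int(fields["days"]) * 24 * 60
        + int(fields["hours"]) * 60
        + int(fields["minutes"])
    )
    return {
        "cell_type": fields["cell_type"],
        "well": fields["well"],
        "location": int(fields["location"]),
        "elapsed_minutes": elapsed,
        "crop": int(fields["crop"]),
    }


info = parse_image_name("A172_Phase_C7_1_02d16h00m_2.tif")
print(info)
```

Names that do not match the assumed pattern raise a `ValueError`, which makes deviations from the scheme visible instead of silently producing wrong metadata.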

LIVECell_dataset_2021/annotations/

The annotations of LIVECell are prepared for all tasks along with the training/validation/test splits used for all experiments in the paper. The annotations require 2.1 GB of disk space and are arranged like:

annotations/
    ├── LIVECell
    |       └── livecell_coco_<train/val/test>.json
    ├── LIVECell_single_cells
    |       └── <Cell Type>
    |               └── <train/val/test>.json
    └── LIVECell_dataset_size_split
            └── <Split>_train<Percentage>percent.json

  • annotations/LIVECell contains the annotations used for the LIVECell-wide train and evaluate task.
  • annotations/LIVECell_single_cells contains the annotations used for Single cell type train and evaluate as well as the Single cell type transferability tasks.
  • annotations/LIVECell_dataset_size_split contains the annotations used to investigate the impact of training set scale.

All annotations are in Microsoft COCO Object Detection-format, and can for instance be parsed by the Python package pycocotools.
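
Because COCO annotation files are plain JSON, a first look at them needs nothing beyond the standard library; pycocotools offers a fuller API on top of the same structure. The following sketch counts annotated cells per image. The helper name `cells_per_image` and the tiny in-memory sample (including its file names) are hypothetical; only the `images`, `annotations`, `id`, `file_name` and `image_id` fields are taken from the COCO format.

```python
import json
from collections import Counter


def cells_per_image(coco):
    """Map each image file name to its number of annotated objects."""
    names = {image["id"]: image["file_name"] for image in coco["images"]}
    counts = Counter(ann["image_id"] for ann in coco["annotations"])
    # Images without any annotation are reported with a count of 0.
    return {names[image_id]: counts.get(image_id, 0) for image_id in names}


# Hypothetical miniature annotation file, used here instead of a real download.
sample = json.loads("""
{
  "images": [
    {"id": 1, "file_name": "example_a.tif"},
    {"id": 2, "file_name": "example_b.tif"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1},
    {"id": 11, "image_id": 1, "category_id": 1}
  ]
}
""")
per_image = cells_per_image(sample)
print(per_image)
```

For a downloaded annotation file, replace the sample with `json.load(open(path))`, where `path` points to one of the json files listed above.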

models/

All models trained and evaluated for tasks associated with LIVECell are made available for wider use. The models are trained using detectron2, Facebook's framework for object detection and instance segmentation. The models require 15 GB of disk space and are arranged like:

models/
   └── Anchor_<free/based>
            ├── ALL/
            |    └── <Model>.pth
            └── <Cell Type>/
                 └── <Model>.pth

Where each .pth is a binary file containing the model weights.

configs/

The config files for each model can be found in the LIVECell GitHub repo:

LIVECell
    └── Anchor_<free/based>
            ├── livecell_config.yaml
            ├── a172_config.yaml
            ├── bt474_config.yaml
            ├── bv2_config.yaml
            ├── huh7_config.yaml
            ├── mcf7_config.yaml
            ├── shsy5y_config.yaml
            ├── skbr3_config.yaml
            └── skov3_config.yaml

Each config file can be used to reproduce the training, or in combination with our model weights to use the trained models. For more information, see the usage section.

nuclear_count_benchmark/

The label-free images are stored in a zip archive, and the corresponding fluorescence-based object counts are stored in a json file, as shown below:

nuclear_count_benchmark/
    ├── A172.zip
    ├── A172_counts.json
    ├── A172_fluorescent_images.zip
    ├── A549.zip
    ├── A549_counts.json 
    └── A549_fluorescent_images.zip

The json files have the following format:

{
    "<filename>": "<count>"
}

Where <filename> points to one of the images in the zip archive, and <count> refers to the object count according to the fluorescent nuclear labels.
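
A counts file of this shape can be read with the standard json module. The sketch below is a hedged illustration: the function `load_counts` and the two file names in the sample are made up, and it assumes each count is stored as a number or as a string holding a number.

```python
import json


def load_counts(json_text):
    """Parse a counts file into a mapping of file name to integer count."""
    raw = json.loads(json_text)
    # Counts may arrive as strings, so convert them explicitly.
    return {filename: int(float(count)) for filename, count in raw.items()}


# Hypothetical content standing in for A172_counts.json or A549_counts.json.
sample_json = '{"image_a.tif": "12", "image_b.tif": "7"}'
counts = load_counts(sample_json)
total = sum(counts.values())
print(counts, total)
```

The resulting dictionary can then be compared image by image with the number of objects predicted by a segmentation model.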

LICENSE

All images, annotations and models associated with LIVECell are published under the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

All software source code associated with LIVECell is published under the MIT License.

Owner
Sartorius Corporate Research