A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data

Last update: May 08, 2022

Overview

Clustering analysis is widely used on single-cell RNA-sequencing (scRNA-seq) data to discover cell heterogeneity and cell states. Although several clustering methods have been developed for scRNA-seq analysis, their results rely heavily on the number of clusters being supplied as prior information. However, the exact number of cell types is rarely known in advance, and estimates based on experience are not always accurate. Here, we have developed ADClust, an automatic deep embedding clustering method for scRNA-seq data that simultaneously estimates the number of clusters and clusters the cells. Specifically, ADClust first obtains low-dimensional representations from a pre-trained autoencoder and uses them to group cells into micro-clusters. The micro-clusters are then compared pairwise with the Dip-test, a statistical test for unimodality, and similar micro-clusters are merged through a purpose-designed clustering loss function. This process continues until convergence. Tested on eleven real scRNA-seq datasets, ADClust outperformed existing methods in both clustering performance and the ability to estimate the number of clusters. In addition, the model is fast and scales to large datasets.
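The compare-and-merge idea can be illustrated with a small sketch. This is not the ADClust implementation: it assumes that two micro-clusters are compared by projecting their points onto the line through their centroids, it replaces the Dip-test with a crude largest-gap heuristic (gap_check), and it merges by greedy relabeling rather than through a clustering loss. All function names and the ratio parameter are hypothetical.

```python
import numpy as np


def project_pair(points_a, points_b):
    """Project two micro-clusters onto the line through their centroids."""
    axis = points_b.mean(axis=0) - points_a.mean(axis=0)
    norm = np.linalg.norm(axis)
    if norm > 0:
        axis = axis / norm
    # If the centroids coincide, axis stays zero and every projection is 0.
    return np.vstack([points_a, points_b]) @ axis


def gap_check(values, ratio=0.3):
    """Placeholder unimodality check (NOT the Dip-test).

    Treats the sample as unimodal when its largest gap between
    neighbouring sorted values is small relative to its total spread.
    """
    values = np.sort(values)
    spread = values[-1] - values[0]
    if spread == 0:
        return True
    return bool(np.max(np.diff(values)) < ratio * spread)


def merge_micro_clusters(embeddings, labels, is_unimodal=gap_check):
    """Greedily merge pairs of clusters until no pair looks unimodal."""
    labels = np.asarray(labels).copy()
    merged = True
    while merged:
        merged = False
        ids = np.unique(labels)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                projected = project_pair(embeddings[labels == a],
                                         embeddings[labels == b])
                if is_unimodal(projected):
                    labels[labels == b] = a  # absorb cluster b into a
                    merged = True
                    break
            if merged:
                break  # cluster ids changed, so restart the pairwise scan
    return labels


if __name__ == "__main__":
    # Three toy micro-clusters on a line: two adjacent, one far away.
    xs = np.concatenate([np.arange(10) * 0.1,
                         1.0 + np.arange(10) * 0.1,
                         10.0 + np.arange(10) * 0.1])
    embeddings = np.column_stack([xs, np.zeros_like(xs)])
    micro_labels = np.repeat([0, 1, 2], 10)
    print(np.unique(merge_micro_clusters(embeddings, micro_labels)))
```

The number of distinct labels that remain after merging plays the role of the estimated number of clusters. A real unimodality test can be passed in through the is_unimodal argument.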

Requirements

Please ensure that all the libraries below are successfully installed:

torch 1.7.1
numpy 1.19.2
scipy 1.7.3
scanpy 1.8.1
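One way to confirm that the environment matches the versions listed above is a short check like the following. It is a sketch that assumes Python 3.8 or later (for importlib.metadata); the check_versions helper and the REQUIRED dictionary are hypothetical names, not part of ADClust.

```python
from importlib import metadata

# Versions listed in the Requirements section.
REQUIRED = {
    "torch": "1.7.1",
    "numpy": "1.19.2",
    "scipy": "1.7.3",
    "scanpy": "1.8.1",
}


def check_versions(required=REQUIRED):
    """Map each package to (installed version or None, required version, exact match)."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        # Exact string comparison; a build suffix such as "+cu110" counts as a mismatch.
        report[name] = (installed, wanted, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={installed} required={wanted} {status}")
```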

Installation

Compile dip.c with a C compiler and add the path of the generated library, dip.so, to LD_LIBRARY_PATH by running the following commands:


gcc -fPIC -shared -o dip.so dip.c

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./dip.so
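To check that the compiled library can be loaded at all, a quick test with Python's ctypes may help. This sketch only verifies loading; it assumes dip.so sits in the current directory and makes no assumption about which functions dip.c exports or how ADClust calls them. The try_load_library helper is a hypothetical name.

```python
import ctypes
import os


def try_load_library(path="./dip.so"):
    """Return a ctypes handle for the shared library, or None if loading fails."""
    try:
        return ctypes.CDLL(os.path.abspath(path))
    except OSError:
        # Raised when the file is missing or is not a loadable shared object.
        return None


if __name__ == "__main__":
    handle = try_load_library()
    if handle is not None:
        print("dip.so loaded")
    else:
        print("dip.so could not be loaded")
```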

Run ADClust

Run on the normalized example data.


python ADClust.py --name Baron_human_normalized

Output

The predicted cluster labels are stored in the output directory as output/dataname_pred.csv, where dataname is the name of the dataset.
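The prediction file can be summarized with a few lines of Python. The layout of the CSV is not documented here, so this sketch assumes one row per cell with the cluster label in the last column and an optional header row; adjust has_header to match the actual file. The cluster_sizes helper is a hypothetical name.

```python
import csv
from collections import Counter


def cluster_sizes(csv_path, has_header=True):
    """Count how many rows (cells) carry each label in the last column."""
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]  # skip empty lines
    if has_header:
        rows = rows[1:]
    return Counter(row[-1] for row in rows)
```

For example, cluster_sizes("output/Baron_human_normalized_pred.csv") would return a mapping from label to cell count if the file follows the assumed layout, and the number of keys gives the number of predicted clusters.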

scRNA-seq Datasets

All datasets can be downloaded here.

Place the downloaded datasets in ADClust/data/.

Citation

Please cite our paper:


@article{zengys,
  title={A Parameter-free Deep Embedded Clustering Method for Single-cell RNA-seq Data},
  author={Yuansong Zeng and Zhuoyi Wei and Fengqi Zhong and Zixiang Pan and Yutong Lu and Yuedong Yang},
  journal={bioRxiv},
  year={2021},
  publisher={Cold Spring Harbor Laboratory}
}

Owner

AI-Biomed @NSCC-gz
