Pipeline to convert a haploid assembly into diploid

Overview

HapDup

HapDup (haplotype duplicator) is a pipeline to convert a haploid long read assembly into a dual diploid assembly. The reconstructed haplotypes preserve heterozygous structural variants (in addition to small variants) and are locally phased.

Version 0.4

Input requirements

HapDup takes as input a haploid long-read assembly, such as one produced with Flye or Shasta. Currently, ONT reads (Guppy 5+ recommended) and PacBio HiFi reads are supported.

HapDup is currently designed for low-heterozygosity genomes (such as human). The expectation is that the assembly has most of the diploid genome collapsed into a single haplotype. For assemblies with partially resolved haplotypes, alternative alleles can be removed prior to running the pipeline using purge_dups. We expect to add better support for highly heterozygous genomes in the future.

The first stage is to realign the original long reads to the assembly using minimap2. We recommend using the latest minimap2 release.

minimap2 -ax map-ont -t 30 assembly.fasta reads.fastq | samtools sort -@ 4 -m 4G > lr_mapping.bam
samtools index -@ 4 lr_mapping.bam

Quick start using Docker

HapDup is available on the Docker Hub.

If Docker is not installed in your system, you need to set it up first following this guide.

The next steps assume that your assembly.fasta and lr_mapping.bam are in the same directory, which will also be used for HapDup output. If that is not the case, you may need to bind additional directories using Docker's -v/--volume argument. The number of threads (-t argument) should be adjusted according to the available resources. For PacBio HiFi input, use --rtype hifi instead of --rtype ont.

cd directory_with_assembly_and_alignment
HD_DIR=`pwd`
docker run -v $HD_DIR:$HD_DIR -u `id -u`:`id -g` mkolmogo/hapdup:0.4 \
  hapdup --assembly $HD_DIR/assembly.fasta --bam $HD_DIR/lr_mapping.bam --out-dir $HD_DIR/hapdup -t 64 --rtype ont

Quick start using Singularity

Alternatively, you can use Singularity. First, you will need to install the client as described in the manual. One way to do it is through conda:

conda install singularity

The next steps assume that your assembly.fasta and lr_mapping.bam are in the same directory, which will also be used for HapDup output. If that is not the case, you may need to bind additional directories using the --bind argument. The number of threads (-t argument) should be adjusted according to the available resources. For PacBio HiFi input, use --rtype hifi instead of --rtype ont.

singularity pull docker://mkolmogo/hapdup:0.4
HD_DIR=`pwd`
singularity exec --bind $HD_DIR hapdup_0.4.sif \
  hapdup --assembly $HD_DIR/assembly.fasta --bam $HD_DIR/lr_mapping.bam --out-dir $HD_DIR/hapdup -t 64 --rtype ont

Output files

The output directory will contain:

  • haplotype_{1,2}.fasta - final assembled haplotypes
  • phased_blocks_hp{1,2}.bed - phased blocks coordinates

Haplotypes generated by the pipeline contain homozygous and heterozygous variants (small and structural). Because the pipeline only uses long-read data, it does not achieve chromosome-level phasing. Fully phased blocks are given in the phased_blocks* files.
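As a quick way to summarize phasing continuity, one can compute block N50 from the phased_blocks* files. A minimal sketch in Python (the three-column chrom/start/end BED layout is an assumption about the file format):

```python
def block_n50(bed_lines):
    """Compute N50 of phased-block lengths from 3-column BED lines (chrom, start, end)."""
    lengths = sorted(
        (int(end) - int(start)
         for chrom, start, end in (line.split()[:3] for line in bed_lines if line.strip())),
        reverse=True,
    )
    half = sum(lengths) / 2
    acc = 0
    for n in lengths:
        acc += n
        if acc >= half:
            return n

# toy example: three phased blocks
blocks = ["chr1\t0\t4000000", "chr1\t4000000\t5000000", "chr2\t0\t3000000"]
print(block_n50(blocks))  # 4000000
```

In practice you would read the lines from phased_blocks_hp1.bed and phased_blocks_hp2.bed and compare the two haplotypes.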

Pipeline overview

  1. HapDup starts by filtering out alignments that likely originate from unassembled parts of the genome. If not removed, such alignments may later create false haplotypes (e.g., reads from a segmental duplication with two copies could otherwise produce four haplotypes).

  2. Afterwards, PEPPER is used to call SNPs from the filtered alignment file.

  3. Then we use Margin to phase SNPs and haplotag the reads.

  4. We then use Flye to polish the initial assembly with the reads from each of the two haplotypes independently.

  5. Finally, we find (heterozygous) breakpoints in the long-read alignments and apply the corresponding structural changes to the polished haplotypes. Currently, this step recovers large heterozygous inversions.
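The stages above can be sketched as a sequence of commands. The following is an illustrative dry run only: the tool names, flags, and file names here are simplified assumptions, not the exact invocations used inside HapDup's main.py.

```python
def pipeline_commands(assembly, bam, out_dir, threads=64):
    """Build a (simplified, hypothetical) command line per HapDup stage without running anything."""
    return [
        # 1. filter alignments likely coming from unassembled regions
        f"hapdup-filter --bam {bam} --out {out_dir}/filtered.bam",
        # 2. call SNPs with PEPPER
        f"pepper --bam {out_dir}/filtered.bam --fasta {assembly} -t {threads}",
        # 3. phase SNPs and haplotag reads with Margin
        f"margin phase {out_dir}/filtered.bam {assembly} pepper.vcf -t {threads}",
        # 4. polish each haplotype with Flye
        f"flye --polish-target {assembly} -t {threads}",
        # 5. apply heterozygous structural changes (e.g. inversions)
        f"hapdup-structural --out {out_dir}",
    ]

cmds = pipeline_commands("assembly.fasta", "lr_mapping.bam", "hapdup")
for c in cmds:
    print(c)
```

The real pipeline drives these stages from Python directly; the sketch only shows the data flow between them.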

Benchmarks

We evaluated HapDup haplotypes in terms of reconstructed structural variant signatures (heterozygous and homozygous) using HG002, for which a curated set of SVs is available. We used recent ONT data basecalled with Guppy 5.

Given the HapDup haplotypes, we called SVs using dipdiff. We also compared the SV set against hifiasm assemblies, even though those were produced from HiFi rather than ONT reads. Evaluation was performed using truvari with the -r 2000 option. GT refers to genotype-aware benchmarks.

Method         Precision  Recall  F1-score  GT Precision  GT Recall  GT F1-score
Shasta+HapDup  0.9500     0.9551  0.9525    0.934         0.9543     0.9405
Sniffles       0.9294     0.9143  0.9219    0.8284        0.9051     0.8605
CuteSV         0.9324     0.9428  0.9376    0.9119        0.9416     0.9265
hifiasm        0.9512     0.9734  0.9622    0.9129        0.9723     0.9417

Yak k-mer based evaluations:

Hap  QV  Switch err  Hamming err
1    35  0.0389      0.1862
2    35  0.0385      0.1845

Given a minimap2 alignment, HapDup runs in roughly 400 CPU hours and uses about 80 GB of RAM.

Source installation

If you prefer, you can install from source as follows:

#create a new conda environment and activate it
conda create -n hapdup python=3.8
conda activate hapdup

#get HapDup source
git clone https://github.com/fenderglass/hapdup
cd hapdup
git submodule update --init --recursive

#build and install Flye
pushd submodules/Flye/ && python setup.py install && popd

#build and install Margin
pushd submodules/margin/ && mkdir build && cd build && cmake .. && make && cp ./margin $CONDA_PREFIX/bin/ && popd

#build and install PEPPER and its dependencies
pushd submodules/pepper/ && python -m pip install . && popd

To run, ensure that the conda environment is activated and then execute:

conda activate hapdup
./hapdup.py --assembly assembly.fasta --bam lr_mapping.bam --out-dir hapdup -t 64 --rtype ont

Acknowledgements

The major parts of the HapDup pipeline are:

  • PEPPER - small variant calling
  • Margin - phasing and read haplotagging
  • Flye - assembly polishing

Authors

The pipeline was developed at the UC Santa Cruz Genomics Institute, in Benedict Paten's lab.

Pipeline code contributors:

  • Mikhail Kolmogorov

PEPPER/Margin/Shasta support:

  • Kishwar Shafin
  • Trevor Pesout
  • Paolo Carnevali

Citation

If you use HapDup in your research, the most relevant papers to cite are:

Kishwar Shafin, Trevor Pesout, Pi-Chuan Chang, Maria Nattestad, Alexey Kolesnikov, Sidharth Goel, Gunjan Baid et al. "Haplotype-aware variant calling enables high accuracy in nanopore long-reads using deep neural networks." bioRxiv (2021). doi:10.1101/2021.03.04.433952

Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin and Pavel Pevzner, "Assembly of Long Error-Prone Reads Using Repeat Graphs", Nature Biotechnology, 2019 doi:10.1038/s41587-019-0072-8

License

HapDup is distributed under a BSD license. See the LICENSE file for details. Other software included in this distribution is released under either MIT or BSD licenses.

How to get help

The preferred way to report problems or ask questions is via the issue tracker.

In case you prefer personal communication, please contact Mikhail at [email protected].

Comments
  • pthread_setaffinity_np failed Error while running pepper


    Hi,

    I'm trying to run HapDup on my assembly from Flye. However, an error occurred while running PEPPER:

        RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:173 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg:

    There is also a warning before this runtime error:

        /usr/local/lib/python3.8/dist-packages/torch/onnx/symbolic_opset9.py:2095: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
          warnings.warn("Exporting a model to ONNX with a batch_size other than 1, " +

    Do you have any idea why this happens?

    The commands that I use are like:

    reads=NA24385_ONT_Promethion.fastq
    outdir=`pwd`
    assembly=${outdir}/assembly.fasta
    hapdup_sif=../HapDup/hapdup_0.4.sif
    
    time minimap2 -ax map-ont -t 30 ${assembly} ${reads} | samtools sort -@ 4 -m 4G > assembly_lr_mapping.bam
    samtools index -@ 4 assembly_lr_mapping.bam
    
    time singularity exec --bind ${outdir} ${hapdup_sif} \
    	hapdup --assembly ${assembly} --bam ${outdir}/assembly_lr_mapping.bam --out-dir ${outdir}/hapdup -t 64 --rtype ont
    

    Thank you

    bug 
    opened by LYC-vio 11
  • Docker fails to mount HD_DIR


    Hi! The following error occurs when I try to run:

    sudo docker run -v $HD_DIR:$HD_DIR -u `id -u`:`id -g` mkolmogo/hapdup:0.2   hapdup --assembly $HD_DIR/barcode11CBS1878_v2.fasta --bam $HD_DIR/lr_mapping.bam --out-dir $HD_DIR/hapdup
    
    docker: Error response from daemon: error while creating mount source path '/home/user/data/volume_2/hapdup_results': mkdir /home/user/data: file exists.
    ERRO[0000] error waiting for container: context canceled
    
    opened by alkminion1 8
  • Incorrect genotype for large deletion


    Hi,

    I have used Hapdup to make a haplotype-resolved assembly from Illumina-corrected ONT reads (haploid assembly made with Flye 2.9) and I am particularly interested in a large 32kb deletion. Here is a screenshot of IGV (from top to bottom: HAP1, HAP2 and haploid assembly):

    (IGV screenshot omitted)

    I believe the position and size of the deletion are nearly correct. However, the deletion is homozygous while it should be heterozygous. I have assembled this proband and its parents with Hifiasm using 30x PacBio HiFi: the 3 assemblies support a heterozygous call in the proband. I can also see from the corrected ONT that there is support for a heterozygous call. Finally, we can see this additional contig in the haploid assembly, which I guess also supports a heterozygous call.

    Hence, my question is: even if MARGIN manages to correctly separate reads with the deletion from reads without the deletion, can the polishing of Flye actually "fix" such a large event in one of the haplotype assemblies?

    Thanks, Guillaume

    opened by GuillaumeHolley 7
  • invalid contig


    Hi, I got an error when I ran the third step; here is the error output:

        Skipped filtering phase
        Skipped pepper phase
        Skipped margin phase
        Skipped Flye phase
        Finding breakpoints
        Parsed 304552 reads
        14590 split reads
        Running: flye-minimap2 -ax asm5 -t 64 -K 5G /usr_storage/zyl/SY_haplotype/ZSP192L/ZSP192L.fasta /usr_storage/zyl/SY_haplotype/ZSP192L/hapdup/flye_hap_1/polished_1.fasta 2>/dev/null | flye-samtools sort -m 4G -@4 > /usr_storage/zyl/SY_haplotype/ZSP192L/hapdup/structural/liftover_hp1.bam
        [bam_sort_core] merging from 0 files and 4 in-memory blocks...
        Traceback (most recent call last):
          File "/usr/local/bin/hapdup", line 8, in <module>
            sys.exit(main())
          File "/usr/local/lib/python3.8/dist-packages/hapdup/main.py", line 173, in main
            bed_liftover(inversions_bed, minimap_out, open(inversions_hp, "w"))
          File "/usr/local/lib/python3.8/dist-packages/hapdup/bed_liftover.py", line 76, in bed_liftover
            proj_start_chr, proj_start_pos, proj_start_sign = project(bam_file, chr_id, chr_start)
          File "/usr/local/lib/python3.8/dist-packages/hapdup/bed_liftover.py", line 9, in project
            name, pos, sign = project_flank(bam_path, ref_seq, ref_pos, 1)
          File "/usr/local/lib/python3.8/dist-packages/hapdup/bed_liftover.py", line 23, in project_flank
            for pileup_col in samfile.pileup(ref_seq, max(0, ref_pos - flank), ref_pos + flank, truncate=True,
          File "pysam/libcalignmentfile.pyx", line 1335, in pysam.libcalignmentfile.AlignmentFile.pileup
          File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region
        ValueError: invalid contig Contig125

    bug 
    opened by tongyin121 7
  • hapdup fails with multiple primary alignments


    I've got one sample in a series which is failing in hapdup with the following error.

    Any suggestions?

    Thanks

        Starting merge
        Expected three tokens in header line, got 2
        This usually means you have multiple primary alignments with the same read ID.
        You can identify whether this is the case with this command:

            samtools view -F 0x904 YOUR.bam | cut -f 1 | sort | uniq -c | awk '$1 > 1'

        [2022-12-13 17:15:57] ERROR: Missing output: hapdup/margin/MARGIN_PHASED.haplotagged.bam
        Traceback (most recent call last):
          File "/data/test_data/GIT/hapdup/hapdup.py", line 24, in <module>
            sys.exit(main())
          File "/data/test_data/GIT/hapdup/hapdup/main.py", line 206, in main
            file_check(haplotagged_bam)
          File "/data/test_data/GIT/hapdup/hapdup/main.py", line 114, in file_check
            raise Exception("Missing output")
        Exception: Missing output

    opened by mattloose 4
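The samtools one-liner quoted in the error message can also be expressed in Python. A small sketch over SAM text lines (the toy records below are invented for illustration; flag mask 0x904 excludes secondary, supplementary, and unmapped records, matching the samtools -F 0x904 filter):

```python
from collections import Counter

def duplicated_primary_ids(sam_lines):
    """Return read IDs that appear more than once as primary alignments."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x904 == 0:  # not secondary (0x100), supplementary (0x800), or unmapped (0x4)
            counts[fields[0]] += 1
    return sorted(rid for rid, n in counts.items() if n > 1)

sam = [
    "@HD\tVN:1.6",
    "read1\t0\tctg1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "read1\t0\tctg2\t200\t60\t4M\t*\t0\t0\tACGT\tIIII",   # second primary record: the problem case
    "read2\t256\tctg1\t300\t0\t4M\t*\t0\t0\tACGT\tIIII",  # secondary alignment: ignored
]
print(duplicated_primary_ids(sam))  # ['read1']
```

Any read ID returned here would trigger the "multiple primary alignments" failure described above.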
  • ZeroDivisionError


    Hi,

    I'm running hapdup version 0.8 on a number of human genomes.

    It appears to fail fairly regularly with an error in Flye:

        [2022-10-20 15:33:48] INFO: Running Flye polisher
        [2022-10-20 15:33:48] INFO: Polishing genome (1/1)
        [2022-10-20 15:33:48] INFO: Polishing with provided bam
        [2022-10-20 15:33:48] INFO: Separating alignment into bubbles
        [2022-10-20 15:37:12] ERROR: Thread exception
        [2022-10-20 15:37:12] ERROR: Traceback (most recent call last):
          File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/flye/polishing/bubbles.py", line 79, in _thread_worker
            indels_profile = _get_indel_clusters(ctg_aln, profile, ctg_region.start)
          File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/flye/polishing/bubbles.py", line 419, in _get_indel_clusters
            get_clusters(deletions, add_last=True)
          File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/flye/polishing/bubbles.py", line 410, in get_clusters
            support = len(reads) / region_coverage
        ZeroDivisionError: division by zero

    The flye version is 2.9-b1778

    Has anyone else seen this and have any suggestions on how to fix?

    Thanks.

    bug 
    opened by mattloose 4
  • Super cool tool! Maybe in the README mention that FASTQ files are required if using the container


    As far as I know, inputting FASTA reads aligned to the reference as the bam file will result in Pepper failing to find any variants using the default settings of the Docker/Singularity container, as the config for Pepper requires a minimum base quality setting.

    opened by jelber2 3
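One way to catch this up front is to check the QUAL field (column 11) of the alignment records; reads aligned from FASTA carry "*" there instead of quality strings. A sketch over SAM text lines (the toy records are invented for illustration):

```python
def has_base_qualities(sam_lines):
    """True only if every alignment record carries base qualities (QUAL column is not '*')."""
    records = [l for l in sam_lines if l and not l.startswith("@")]
    return bool(records) and all(l.split("\t")[10] != "*" for l in records)

# QUAL column is '*' when reads were aligned from FASTA (no qualities)
fasta_aligned = ["read1\t0\tctg1\t100\t60\t4M\t*\t0\t0\tACGT\t*"]
# QUAL column holds quality characters when reads came from FASTQ
fastq_aligned = ["read1\t0\tctg1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"]

print(has_base_qualities(fasta_aligned))  # False
print(has_base_qualities(fastq_aligned))  # True
```

In practice the lines would come from `samtools view` on the input BAM; a False result suggests PEPPER's base-quality threshold will discard everything.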
  • Option to set minimap2 -I flag


    Hi,

    Ran into this error while trying to run hapdup:

    [2022-03-08 10:23:36] INFO: Running: flye-minimap2 -ax asm5 -t 10 -K 5G <PATH>/assembly.fasta <PATH>/hapdup/flye_hap_1/polished_1.fasta 2>/dev/null | flye-samtools sort -m 4G -@4 > <PATH>/hapdup/structural/liftover_hp1.bam
    [E::sam_parse1] missing SAM header
    [W::sam_read1] Parse error at line 2
    samtools sort: truncated file. Aborting
    Traceback (most recent call last):
      File "/usr/local/bin/hapdup", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/hapdup/main.py", line 245, in main
        subprocess.check_call(" ".join(minimap_cmd), shell=True)
      File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command 'flye-minimap2 -ax asm5 -t 10 -K 5G <PATH>/assembly.fasta <PATH>/hapdup/flye_hap_1/polished_1.fasta 2>/dev/null | flye-samtools sort -m 4G -@4 > <PATH>/hapdup/structural/liftover_hp1.bam' returned non-zero exit status 1.
    

    I suspect it could be because of the default minimap2 -I flag being too small (4G)? If this is the case, maybe an option to specify this could be added, or adjust it automatically depending on genome size?

    Thanks!

    bug 
    opened by fellen31 3
  • Using Pepper-MARGIN r0.7?


    Hi,

    Would it be possible to include the latest version of Pepper-MARGIN (r0.7) in hapdup? I haven't been able to run hapdup on my data so far because of some issues in Pepper-MARGIN r0.6 (now solved in r0.7).

    Thank you!

    Guillaume

    enhancement 
    opened by GuillaumeHolley 3
  • Please add CLI option to specify location of BAM index


    Could you add an additional command line parameter, allowing the specification of the BAM index location? Eg,

    hapdup --bam /some/where/abam.bam --bam-index /other/location/a_bam_index.bam.csi

    And then pass that optional value to pysam.AlignmentFile() argument filepath_index?

    Motivation: I'm wrapping hapdup and some other steps in a WDL script, and need to pass each file separately (i.e., they are localized as individual files, and there is no guarantee they end up in the same directory when hapdup is invoked). The current hapdup assumes the index and BAM are in the same directory, and fails.

    Thanks!

    CC: @0seastar0

    enhancement 
    opened by bkmartinjr 3
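For reference, pysam already supports a separate index location via the filepath_index argument mentioned above. A hedged sketch of how such an option could be wired through (the --bam-index flag and the helper function are part of the proposal, not an existing HapDup interface):

```python
def alignment_open_args(bam_path, bam_index=None):
    """Build keyword arguments for pysam.AlignmentFile, honoring an optional explicit index path."""
    kwargs = {"mode": "rb"}
    if bam_index is not None:
        kwargs["filepath_index"] = bam_index  # supported by pysam.AlignmentFile
    return bam_path, kwargs

# usage (the pysam call itself is commented out so the sketch stays self-contained):
path, kw = alignment_open_args("/some/where/abam.bam", "/other/location/a_bam_index.bam.csi")
# bam = pysam.AlignmentFile(path, **kw)
print(path, kw)
```

When bam_index is None, the kwargs contain no filepath_index and pysam falls back to its default of looking for the index next to the BAM.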
  • Singularity


    Hi,

    Thank you for this tool, I am really excited to try it! Would it be possible to have hapdup available as a Singularity image or to have the Docker image available online in a container repo (such that it can be converted to a Singularity image with a singularity pull)?

    Thanks, Guillaume

    opened by GuillaumeHolley 3
  • --overwrite fails


        [2022-11-03 20:16:22] INFO: Filtering alignments
        Traceback (most recent call last):
          File "/data/test_data/GIT/hapdup/hapdup.py", line 24, in <module>
            sys.exit(main())
          File "/data/test_data/GIT/hapdup/hapdup/main.py", line 153, in main
            filter_alignments_parallel(args.bam, filtered_bam, min(args.threads, 30),
          File "/data/test_data/GIT/hapdup/hapdup/filter_misplaced_alignments.py", line 188, in filter_alignments_parallel
            pysam.merge("-@", str(num_threads), bam_out, *bams_to_merge)
          File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/pysam/utils.py", line 69, in call
            raise SamtoolsError(
        pysam.utils.SamtoolsError: "samtools returned with error 1: stdout=, stderr=[bam_merge] File 'hapdup/filtered.bam' exists. Please apply '-f' to overwrite. Abort.\n"

    Looks as though when you run with --overwrite the command is not being correctly passed through to sub processes.

    bug 
    opened by mattloose 2
  • phase block number compared to Whatshap


    Hello, thank you for the great tool!

    I was just testing HapDup v0.7 on our fish genome. Comparing the output with phasing done with WhatsHap (WH), I wondered why there is such a big difference in phased block size and block number between HapDup and the WH pipeline?

    For the fish chromosomes, WH was generating 679 blocks using 2'689'114 phased SNPs. Margin (HapDup pipeline) was generating 5352 blocks using 3'862'108 phased SNPs.

    The main difference seems to be the prior read filtering and usage of MarginPhase for the phasing in HapDup, but does this explain such a big difference?

    I was wondering if phase blocks of HapDup could be concatenated using WhatsHap SNP and block information to increase continuity? I imagine it would be a straightforward approach: overlap SNP positions between Margin and WH with phase block ids and lift over phase ids from WH. I will do some visual inspections and scripting to test if there is overlap of called SNPs and agreement on block borders.

    Cheers, Michel

    opened by MichelMoser 3
Releases
  • 0.10 (Oct 29, 2022)

  • 0.6 (Mar 4, 2022)

    • Fixed issue with PEPPER model quantization causing the pipeline to hang on some systems
    • Speed-up of the last breakpoint analysis part of the pipeline, which caused bottlenecks on some datasets

  • 0.5 (Feb 6, 2022)

    • Update to PEPPER 0.7
    • Added new option --pepper-model for custom PEPPER models
    • Added new option --bam-index to provide a non-standard path to the alignment index file

  • 0.4 (Nov 19, 2021)

Owner
Mikhail Kolmogorov
Postdoc @ UCSC CGL, Paten lab. I work on building algorithms for computational genomics.