Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python

Overview

Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python

Cashier is a tool developed by Russell Durrett for the analysis and extraction of expressed barcode tags.

This python implementation offers the same flexibility and simple command line operation.

Like it's predecessor it is a wrapper for the tools cutadapt, fastx-toolkit, and starcode.

Dependencies

  • cutadapt (sequence extraction)
  • starcode (sequence clustering)
  • fastx-toolkit (PHred score filtering)
  • pear (paired end read merging)
  • pysam (sam file convertion to fastq)

Recommended Installation Procedure

It's recommended to use conda to install and manage the dependencies for this package

conda env create -f https://raw.githubusercontent.com/brocklab/pycashier/main/environment.yml # or mamba env create -f ....
conda activate cashierenv
pycashier --help

Additionally you may install with pip. Though it will be up to you to ensure all the non-python dependencies are on the path and installed correctly.

pip install pycashier

Usage

Pycashier has one required argument which is the directory containing the fastq or sam files you wish to process.

conda activate cashierenv
pycashier ./fastqs

For additional parameters see pycashier -h.

As the files are processed two additional directories will be created pipeline and outs.

Currently all intermediary files generated as a result of the program will be found in pipeline.

While the final processed files will be found within the outs directory.

Merging Files

Pycashier can now take paired end reads and perform a merging of the reads to produce a fastq which can then be used with cashier's default feature.

pycashier ./fastqs -m

Processing Barcodes from 10X bam files

Pycashier can also extract gRNA barcodes along with 10X cell and umi barcodes.

Firstly we are only interested in the unmapped reads. From the cellranger bam output you would obtain these reads using samtools.

samtools view -f 4 possorted_genome_bam.bam > unmapped.sam

Then similar to normal barcode extraction you can pass a directory of these unmapped sam files to pycashier and extract barcodes. You can also still specify extraction parameters that will be passed to cutadapt as usual.

Note: The default parameters passed to cutadapt are unlinked adapters and minimum barcode length of 10 bp.

pycashier ./unmapped_sams -sc

When finished the outs directory will have a .tsv containing the following columns: Illumina Read Info, UMI Barcode, Cell Barcode, gRNA Barcode

Usage notes

Pycashier will NOT overwrite intermediary files. If there is an issue in the process, please delete either the pipeline directory or the requisite intermediary files for the sample you wish to reprocess. This will allow the user to place new fastqs within the source directory or a project folder without reprocessing all samples each time.

  • Currently, pycashier expects to find .fastq.gz files when merging and .fastq files when extracting barcodes. This behavior may change in the future.
  • If there are reads from multiple lanes they should first be concatenated with cat sample*R1*.fastq.gz > sample.R1.fastq.gz
  • Naming conventions:
    • Sample names are extracted from files using the first string delimited with a period. Please take this into account when naming sam or fastq files.
    • Each processing step will append information to the input file name to indicate changes, again delimited with periods.
For Tok-k passages that have passed through the Bi-Encoder Retrival, ReRank is performed using CrossEncoder.

Cross-Encoder-with-Bi-Encoder For Tok-k passages that have passed through the Bi-Encoder Retrival, ReRank is performed using CrossEncoder. Data Data u

7 Feb 09, 2022
ColabFold / AlphaFold2_advanced on your local PC (or macOS)

LocalColabFold ColabFold / AlphaFold2_advanced on your local PC (or macOS) Installation For Linux Make sure curl and wget commands are already install

Yoshitaka Moriwaki 207 Dec 22, 2022
Dev-meme - A repository that contains memes just for people like us

A repository that contains memes just for people like us. Coders are constantly

Padmashree Jha 4 Oct 31, 2022
This is a small compiler to demonstrate how compilers work.

This is a small compiler to demonstrate how compilers work. It compiles our own dialect to C, while being written in Python.

Md. Tonoy Akando 2 Jul 19, 2022
A submodule of rmcrkd/ODE-Uniqueness

Heston-ODE This repo contains the Heston-related code that accompanies the article One-sided maximal uniqueness for a class of spatially irregular ord

0 Jan 05, 2022
Provides guideline on how to configure pre-commit hooks in your own python project

Pre-commit Configuration Guide The main aim of this repository is to act as a guide on how to configure the pre-commit hooks in your existing python p

Faraz Ahmed Khan 2 Mar 31, 2022
My HA controller for veg and flower rooms

HAGrowRoom My HA controller for veg and flower rooms I will do my best to keep this updated as I change, add and improve. System heavily uses custom t

4 May 25, 2022
A Blender addon to enable reloading linked libraries from UI.

library_reload_linked_libraries A Blender addon to enable reloading linked libraries from UI.

3 Nov 27, 2022
A Python program for calculating the 95%CI for GNSS-derived site velocities

GNSS_Vel_95%CI A Python program for calculating the 95%CI for GNSS-derived site velocities Function_GNSS_95CI.py is a Python function for calculating

<a href=[email protected]"> 4 Dec 16, 2022
HOWTO: Downgrade from nYNAB to YNAB4

HOWTO: Downgrade from nYNAB to YNAB4 This page explains how to move from nYNAB to YNAB4 while retaining as much information as possible. See Appendix

Tobias Kunze 10 Dec 29, 2022
To lazy to read your homework ? Get it done with LOL

LOL To lazy to read your homework ? Get it done with LOL Needs python 3.x L:::::::::L OO:::::::::OO L:::::::::L L:::::::

KorryKatti 4 Dec 08, 2022
北大选课网2021年春季验证码识别

北大选课网验证码识别 2021 年春季学期 Powered by Elector Quartet (@Rabbit, @xmcp, @SpiritedAwayCN, @gzz) 数据集描述 最初的数据集为 5130 张人工标记的验证码,之后利用早期训练好的模型在选课网上进行自动验证 (自举),又收集

Rabbit 27 Sep 17, 2022
A jokes python module

Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.com/FayasNoushad/Jokes/blob/main/LICENSE Deploy

Fayas Noushad 3 Nov 28, 2021
A server shell for you to play with Powered by Django + Nginx + Postgres + Bootstrap + Celery.

A server shell for you to play with Powered by Django + Nginx + Postgres + Bootstrap + Celery.

Mengting Song 1 Jan 10, 2022
🥦 Send and receive nano with 2 simple functions

easy_nano Send and receive nano (without having to understand the nano protocol).

1 Feb 14, 2022
A tool to allow New World players to calculate the best place to put their Attribute Points for their build and level

New World Damage Simulator A tool designed to take a characters base stats including armor and weapons, level, and base damage of their items (slash d

Joseph P Langford 31 Nov 01, 2022
Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.

Viewflow Viewflow is a framework built on the top of Airflow that enables data scientists to create materialized views. It allows data scientists to f

DataCamp 114 Oct 12, 2022
Always fill your package requirements without the user having to do anything! Simple and easy!

WSL Should now work always-fill-reqs-python3 Always fill your package requirements without the user having to do anything! Simple and easy! Supported

Hashm 7 Jan 19, 2022
Slimbook Battery 4 is the new version with new features that improves battery control and increases battery duration in laptops.

Slimbookbattery Slimbook Battery 4 is the new version with new features that improves battery control and increases battery duration in laptops. This

SLIMBOOK TEAM 128 Dec 28, 2022
Exploiting Linksys WRT54G using a vulnerability I found.

Exploiting Linksys WRT54G Exploit # Install the requirements. pip install -r requirements.txt ROUTER_HOST=192.169.1.1 ROUTER_USERNAME=admin ROUTER_P

Elon Gliksberg 31 May 29, 2022