InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family.

Overview

CRISPRanalysis

InDels analysis of CRISPR lines by NGS amplicon sequencing technology for a multicopy gene family.

In this work, we present a workflow to analyze InDels from the multicopy α-gliadin gene family from wheat based on NGS data without the need to pre-viously establish a reference sequence for each genetic background. The pipeline was tested it in a multiple sample set, including three generations of edited wheat lines (T0, T1, and T2), from three different backgrounds and ploidy levels (hexaploid and tetraploid). Implementation of Bayesian optimization of Usearch parameters, inhouse Python, and bash scripts are reported.

Workflow:

Step1:

Bayesian optimization was implemented to optimize Usearch v9.2.64 parameters from merge to search steps for the α-gliadin amplicons on the wild type lines.

python Step1_Bayesian_usearch.py --database 
   
     --file_intervals 
    
      --trim_primers 
     
       --path_usearch_control 
      

      
     
    
   


Help:

Argument Help
--database File fasta with database sequences. Example: /path/to/database/database.fasta.
--file_intervals File with intervals for parameters. Example in /Examples/Example_intervals.txt.
--trim_primers Trim primers in reads if you use database without primers. Optios: YES | NO.
--path_usearch_control Path of usearch and control raw data separated by "," without white spaces. Example: /paht/to/usearch,/path/to/reads_control.


Outputs:

  • Bayesian_usearch.txt File with optimal values, optimal function value, samples or observations, obatained values and search space.
  • Bayesian.png Convergence plot.
  • Bayesian_data_res.txt File with the minf(x) after n calls in each iteration.

Step 2:

Usearch pipeline optimazed on wild type lines for studying results of optimization.

Step2_usearch_WT_to_DB.sh dif pct maxee amp id path_control name_dir_usearch path_database trim_primers


Help:

Arguments must be disposed in the order indicated before.

  • dif Optimal value for dif Usearch parameter.
  • pct Optimal value for pct Usearch parameter.
  • maxee Optimal value for maxee Usearch parameter.
  • amp Optimal value for amp Usearch parameter.
  • id Optimal value for id Usearch parameter.
  • path_control Path of the wild type lines fastq files.
  • name_dir_usearch Path of Usearch.
  • path_database Path of alpha-gliadin amplicon database.
  • trim_primers Trim primers in reads if you use database without primers. Optios: YES | NO.


Outputs:

Usearch merge files, filter files, unique amplicons file, unique denoised amplicon (Amp/otu) file, otu table file.

Step 3:

Usearch pipeline optimazed on all lines (wild types and CRISPR lines) for studying denoised unique amplicon relative abundances.

Step3_usearch_ALL_LINES.sh dif pct maxee amp id path_ALL name_dir_usearch trim_primers


Help:

Arguments must be disposed in the order indicated before.

  • dif Optimal value for dif Usearch parameter.
  • pct Optimal value for pct Usearch parameter.
  • maxee Optimal value for maxee Usearch parameter.
  • amp Optimal value for amp Usearch parameter.
  • id Optimal value for id Usearch parameter.
  • path_ALL Path of all lines (wild type and CRISPR lines) fastq files.
  • name_dir_usearch Path of Usearch.
  • trim_primers Trim primers in reads if you use database without primers. Optios: YES | NO.


Outputs:

Usearch merge files, filter files, unique amplicon file, unique denoised amplicon (Amp/otu) file, otu table file.

Before Step 4, otu table file must be normalized by TMM normalization method (edgeR package in R). Results of TMM normalized unique denoised amplicons table can be represented as heatmaps. Unique denoised amplicons can be compared between them to detect Insertions and Deletions (InDels) in CRISPR lines.

Step 4:

Create tables with the presence or absence of unique denoised amplicons in each CRISPR line compared to the wild type lines.

python Step4_usearch_to_table.py --file_otu 
   
     --file_group 
    
      --prefix_output 
     
       --genotype 
      

      
     
    
   


Help:

Argument Help
--file_otu File of TMM normalized otu_table from usearch. Remove "#OTU" from the first line.
--file_group Path to file of genotypes in wild type and CRISPR lines. Example in /Examples/Example_groups.txt.
--prefix_output Prefix to output name. Example: if you are working with BW208 groups: BW.
--genotype Genotype name. Example: if you are working with BW208 groups: BW208.

Default threshold 0.3 % of frequency of each unique denoised amplicon (Amp) in each line.


Outputs:

Substitute "name" in output names for the prefix_output string.

  • Amptable_frequency.txt Table of Amps (otus) transformed to frequencies for apply the threshold.
  • Amptable_brutes_name.txt Table with number of reads contained in the unique denoised amplicons (Amps) present in each line.
  • Amps_name.txt Table with number of unique denoised amplicons (Amps) in each line.

Python 3.6 or later is required.

Predictive Modeling & Analytics on Home Equity Line of Credit

Predictive Modeling & Analytics on Home Equity Line of Credit Data (Python) HMEQ Data Set In this assignment we will use Python to examine a data set

Dhaval Patel 1 Jan 09, 2022
Average time per match by division

HW_02 Unzip matches.rar to access .json files for matches. Get an API key to access their data at: https://developer.riotgames.com/ Average time per m

11 Jan 07, 2022
A real data analysis and modeling project - restaurant inspections

A real data analysis and modeling project - restaurant inspections Jafar Pourbemany 9/27/2021 This project represents data analysis and modeling of re

Jafar Pourbemany 2 Aug 21, 2022
Zipline, a Pythonic Algorithmic Trading Library

Zipline is a Pythonic algorithmic trading library. It is an event-driven system for backtesting. Zipline is currently used in production as the backte

Quantopian, Inc. 15.7k Jan 07, 2023
MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

Living with Machines 25 Dec 26, 2022
Convert monolithic Jupyter notebooks into Ploomber pipelines.

Soorgeon Join our community | Newsletter | Contact us | Blog | Website | YouTube Convert monolithic Jupyter notebooks into Ploomber pipelines. soorgeo

Ploomber 65 Dec 16, 2022
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Jan 09, 2023
My first Python project is a simple Mad Libs program.

Python CLI Mad Libs Game My first Python project is a simple Mad Libs program. Mad Libs is a phrasal template word game created by Leonard Stern and R

Carson Johnson 1 Dec 10, 2021
Produces a summary CSV report of an Amber Electric customer's energy consumption and cost data.

Amber Electric Usage Summary This is a command line tool that produces a summary CSV report of an Amber Electric customer's energy consumption and cos

Graham Lea 12 May 26, 2022
The Dash Enterprise App Gallery "Oil & Gas Wells" example

This app is based on the Dash Enterprise App Gallery "Oil & Gas Wells" example. For more information and more apps see: Dash App Gallery See the Dash

Austin Caudill 1 Nov 08, 2021
Package for decomposing EMG signals into motor unit firings, as used in Formento et al 2021.

EMGDecomp Package for decomposing EMG signals into motor unit firings, created for Formento et al 2021. Based heavily on Negro et al, 2016. Supports G

13 Nov 01, 2022
bigdata_analyse 大数据分析项目

bigdata_analyse 大数据分析项目 wish 采用不同的技术栈,通过对不同行业的数据集进行分析,期望达到以下目标: 了解不同领域的业务分析指标 深化数据处理、数据分析、数据可视化能力 增加大数据批处理、流处理的实践经验 增加数据挖掘的实践经验

Way 2.4k Dec 30, 2022
A tax calculator for stocks and dividends activities.

Revolut Stocks calculator for Bulgarian National Revenue Agency Information Processing and calculating the required information about stock possession

Doino Gretchenliev 200 Oct 25, 2022
OpenDrift is a software for modeling the trajectories and fate of objects or substances drifting in the ocean, or even in the atmosphere.

opendrift OpenDrift is a software for modeling the trajectories and fate of objects or substances drifting in the ocean, or even in the atmosphere. Do

OpenDrift 167 Dec 13, 2022
Udacity - Data Analyst Nanodegree - Project 4 - Wrangle and Analyze Data

WeRateDogs Twitter Data from 2015 to 2017 Udacity - Data Analyst Nanodegree - Project 4 - Wrangle and Analyze Data Table of Contents Introduction Proj

Keenan Cooper 1 Jan 12, 2022
simple way to build the declarative and destributed data pipelines with python

unipipeline simple way to build the declarative and distributed data pipelines. Why you should use it Declarative strict config Scaffolding Fully type

aliaksandr-master 0 Jan 26, 2022
CaterApp is a cross platform, remotely data sharing tool created for sharing files in a quick and secured manner.

CaterApp is a cross platform, remotely data sharing tool created for sharing files in a quick and secured manner. It is aimed to integrate this tool with several more features including providing a U

Ravi Prakash 3 Jun 27, 2021
LynxKite: a complete graph data science platform for very large graphs and other datasets.

LynxKite is a complete graph data science platform for very large graphs and other datasets. It seamlessly combines the benefits of a friendly graphical interface and a powerful Python API.

124 Dec 14, 2022
Reading streams of Twitter data, save them to Kafka, then process with Kafka Stream API and Spark Streaming

Using Streaming Twitter Data with Kafka and Spark Reading streams of Twitter data, publishing them to Kafka topic, process message using Kafka Stream

Rustam Zokirov 1 Dec 06, 2021
An implementation of the largeVis algorithm for visualizing large, high-dimensional datasets, for R

largeVis This is an implementation of the largeVis algorithm described in (https://arxiv.org/abs/1602.00370). It also incorporates: A very fast algori

336 May 25, 2022