Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

Overview

find_te_ins

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2)

Install

$ git clone https://github.com/bakerwm/find_te_ins.git
$ cd find_te_ins

Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11;

$ bash run_pipe.sh
run_pipe.sh 
    
    

    
   

Prerequisite

  • minimap2 - 2.17-r974-dirty, align long reads to reference genome
  • featureCounts - v2.0.0, quantification
  • samtools - v1.12, working with BAM files
  • python 3.8+
  • pysam 0.16.0.1, python module, working with BAM files

Getting Started

1 Prepare input files

  • genome_fa - reference genome in fasta format, in script run_pipe.sh, line-10
  • te_fa - TE consensus sequence in fasta format, in script run_pipe.sh, line-11
  • long reads - Long reads from NanoPore or Pacbio, in fasta or fastq format

2 Run pipe

$ cd ~/work/te_ins
# specify the path of long reads data: 
   
    /
   
$ git clone https://github.com/bakerwm/find_te_ins.git 
$ bash find_te_ins/run_pipe.sh <path-to-long-reads>/ results

[1/9] align to reference genome
[2/9] extract raw insertions from BAM, by CIGAR
[3/9] convert raw insertions to fasta format
[4/9] align raw_insertion to transposon
[5/9] extract transposon name for insertions
[6/9] merge raw_insertions by window=100
[7/9] count reads for each insertion
[8/9] save final insertions to file
[9/9] Done!

3 Output

The following files listed below are the output of the pipeline, the TE insertions saved in file *.te_ins.final.bed

$ tree -L 2 results/ONT_sample-1
.
├── ONT_sample-1
│   ├── ONT_sample-1.bam
│   ├── ONT_sample-1.bam.bai
│   ├── ONT_sample-1.raw_ins.bed
│   ├── ONT_sample-1.raw_ins.fa
│   ├── ONT_sample-1.raw_ins.fa.bam
│   ├── ONT_sample-1.raw_ins.fa.bam.bai
│   ├── ONT_sample-1.te_ins.bed
│   ├── ONT_sample-1.te_ins.final.bed
│   ├── ONT_sample-1.te_ins.final.bed6
│   ├── ONT_sample-1.te_ins.gtf
│   ├── ONT_sample-1.te_ins.quant.stderr
│   ├── ONT_sample-1.te_ins.quant.stdout
│   ├── ONT_sample-1.te_ins.quant.txt
│   ├── ONT_sample-1.te_ins.quant.txt.summary
│   ├── ONT_sample-1.te_ins.raw.txt
│   ├── run_minimap2.dm6.stderr
│   └── run_minimap2.dm6_transposon.stderr
...

{sample_name}.te_ins.final.bed

column 1. chr name of reference 
column 2. start pos of Insertion 
column 3. end pos of Insertion 
column 4. insertion name 
column 5. a fixed integer [255]  
column 6. strand # in current version, not consider the dirction of TE insertions !!!
column 7. name of TE consensus 
column 8. length of TE consensus  
column 9. proportion of the TE consensus identified  
column 10. number of supported reads for the insertion 
column 11. number of all reads cover the insertion 
column 12. proportion TE supported reads 
column 13. type of the TE insertions [full, p3, p5]

{sample_name}.te_ins.raw.txt

column 16 (last column), is the type of TE insertions: [full, p3, p5]

  • full, more then cutoff [60%] of the TE consensus were detected
  • p3, only the 3' end of the TE consensus were detected
  • p5, only the 5' end of the TE consensus were detected

In the .final.bed file, ONLY full TE insertions were saved for further analysis

Change criteria

TE types were defined in run_pipe.sh by anno_te.py, the criteria -c 0.6 could be changed to [0-1] float number based on your condition. see line-100 in file run_pipe.sh

# line-100 of run_pipe.sh
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# change criteria to 0.7
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} -c 0.7 ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# remove te_ins files, and run the command again
$ rm results/ONT_sample-1.te_ins*
$ bash find_te_ins/run_pipe.sh 
   
    / results

   

How it works?

  1. extract INSERTIONS
Owner
Ming Wang
Ming Wang
This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols.

Regex-Reverser This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols. Screenshots Usage/Examples py

x19 0 Jun 02, 2022
Utils to quickly evaluate many 🤗 models on the GLUE tasks

Utils to quickly evaluate many 🤗 models on the GLUE tasks

Przemyslaw K. Joniak 1 Dec 22, 2021
Various hdas (Houdini Digital Assets)

aaTools My various assets for Houdini "ms_asset_loader" - Custom importer assets from Quixel Bridge "asset_placer" - Tool for placment sop geometry on

9 Dec 19, 2022
Final project in KAIST AI class

mmodal_mixer MLP-Mixer based Multi-modal image-text retrieval Image: Original image is cropped with 16 x 16 patch size without overlap. Then, it is re

SuperSuperMoon 5 May 30, 2022
A script where you execute a script that generates a base project for your gdextension

GDExtension Project Creator this is a script (currently only for linux) where you execute a script that generates a base project for your gdextension,

Unknown 11 Nov 17, 2022
The Open edX platform, the software that powers edX!

This is the core repository of the Open edX software. It includes the LMS (student-facing, delivering courseware), and Studio (course authoring) compo

edX 6.2k Jan 01, 2023
because rico hates uuid's

terrible-uuid-lambda because rico hates uuid's sub 200ms response times! Try it out here: https://api.mathisvaneetvelde.com/uuid https://api.mathisvan

Mathis Van Eetvelde 2 Feb 15, 2022
The ldapconsole script allows you to perform custom LDAP requests to a Windows domain

ldapconsole The ldapconsole script allows you to perform custom LDAP requests to a Windows domain. Features Authenticate with password Authenticate wi

Podalirius 38 Dec 09, 2022
Lightweight and Modern kernel for VK Bots

This is the kernel for creating VK Bots written in Python 3.9

Yrvijo 4 Nov 21, 2021
Modern API wrapper for Genshin Impact built on asyncio and pydantic.

genshin.py Modern API wrapper for Genshin Impact built on asyncio and pydantic.

sadru 212 Jan 06, 2023
The Begin button and menu for the Meadows operating system. The start button for UNIX/Linux.

By: Seanpm2001, Meadows Et; Al. Top README.md Read this article in a different language Sorted by: A-Z Sorting options unavailable ( af Afrikaans Afri

Sean P. Myrick V19.1.7.2 4 Aug 28, 2022
flake8 plugin which forbids match statements (PEP 634)

flake8-match flake8 plugin which forbids match statements (PEP 634)

Anthony Sottile 25 Nov 01, 2022
Python Programmma DarkMap.py

DarkMap Python Programmma DarkMap.py O'rganish va rasmlarni ko'riosh https://drive.google.com/drive/folders/1l1zybs_0Zy9z_trZYz5R72WrwsE6mFOh?usp=shar

Og'abek 0 May 06, 2022
An educational platform for students

Watch N Learn About Watch N Learn is an educational platform for students. Watch N Learn incentivizes students to learn with fun activities and reward

Brian Law 3 May 04, 2022
MoBioTools A simple yet versatile toolkit to automatically setup quantum mechanics/molecular mechanics

A simple yet versatile toolkit to setup quantum mechanical/molecular mechanical (QM/MM) calculations from molecular dynamics trajectories.

MoBioChem 17 Nov 27, 2022
A collection of daily usage utility scripts in python. Helps in automation of day to day repetitive tasks.

Kush's Utils Tool is my personal collection of scripts which is used to automated daily tasks. It is a evergrowing collection of scripts and will continue to evolve till the day I program. This is al

Kushagra 10 Jan 16, 2022
通过简单的卷积神经网络直接预测出验证码图片中滑块的位置

使用说明 1. 在本地测试 运行python3 prdict_one.py即可,默认需要预测的图片路径位于testImg文件夹下的test1.png 运行python3 predict_folder.py预测testImg下的所有图片 2. 部署到服务器 运行python3 run_a_server

12 Mar 08, 2022
This is sample project needed for security course to connect web service to database

secufaku This is sample project needed for security course to "connect web service to database". Why it suits alignment purpose It connects to postgre

Mark Nicholson 6 May 15, 2022
Python bilgilerimi eğlenceli bir şekilde hatırlamak ve daha da geliştirmek için The Big Book of Small Python Projects isimli bir kitap almıştım.

Python bilgilerimi eğlenceli bir şekilde hatırlamak ve daha da geliştirmek için The Big Book of Small Python Projects isimli bir kitap almıştım. Bu repo kitaptaki örnek programları çalıştığım oyun al

Burak Selim Senyurt 22 Oct 26, 2022
An OrpheusDL Tidal module

OrpheusDL - Tidal A Tidal module for the OrpheusDL modular archival music program Report Bug · Request Feature Table of content About OrpheusDL - Tida

Daniel 54 Dec 29, 2022