Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisites

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instructions in the article. Once you have uploaded the files to your S3 bucket, replace every [your_S3_bucket] placeholder (brackets included) with your bucket name and run:

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 
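
The --steps value is the easiest part of this command to mistype, since the bucket name appears five times inside it. As a minimal sketch, the string could be assembled in Python instead of edited by hand. The function name build_spark_step and its parameters are hypothetical and not part of this repo; the file names and argument order are copied from the command above.

```python
def build_spark_step(bucket, sam_file="test.sam", output_name="sankey.json"):
    """Assemble the --steps value used in the create-cluster command.

    Hypothetical helper, not part of this repo. `bucket` is the bare
    bucket name, without the "s3://" prefix and without brackets.
    """
    if not bucket or any(ch in bucket for ch in "[]/ "):
        raise ValueError("pass the bare bucket name, e.g. 'my-bucket'")

    base = f"s3://{bucket}"
    args = [
        "--deploy-mode", "cluster",
        "--master", "yarn",
        "--py-files", f"{base}/helper_function.py",
        f"{base}/spark_3mer.py",   # main PySpark script
        f"{base}/{sam_file}",      # script argument 1: input SAM file
        bucket,                    # script argument 2: bucket name
        output_name,               # script argument 3: output file name
    ]
    return (
        "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,"
        f"Args=[{','.join(args)}]"
    )


if __name__ == "__main__":
    # Print the value to paste after --steps (keep the surrounding quotes).
    print(build_spark_step("my-bucket"))
```

The printed string can be pasted between the double quotes after --steps. The validation only guards against leaving the placeholder brackets in; it does not check that the bucket exists.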

When the job finishes, download sankey.json and run this command to visualize it:

python sankey.py sankey.json
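
If the visualization fails, it may help to first confirm that the download is complete and parses as JSON. The sketch below only checks that; it makes no assumption about the structure sankey.py expects, and the function name load_json_or_none is hypothetical.

```python
import json
from pathlib import Path


def load_json_or_none(path):
    """Return the parsed JSON content of `path`, or None if the file
    is missing or is not valid JSON. Hypothetical helper, not part of
    this repo."""
    p = Path(path)
    if not p.is_file():
        return None
    try:
        return json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    data = load_json_or_none("sankey.json")
    if data is None:
        print("sankey.json is missing or malformed")
    else:
        print("sankey.json parsed successfully")
```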

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License. See the LICENSE file for details.
