Overview

Toolchest Python Client

Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of running tools on your own resources by running the same jobs on secure, powerful remote servers.

This package contains the Python client for using Toolchest. For the R client, see here.

Installation

The Toolchest client is available on PyPI:

pip install toolchest-client

Usage

Using a tool in Toolchest is as simple as:

import toolchest_client as toolchest

toolchest.set_key("YOUR_TOOLCHEST_KEY")  # authenticate with your Toolchest key

# Runs Kraken 2 on remote servers; results are downloaded to output_path
toolchest.kraken2(
  tool_args="",
  inputs="path/to/input.fastq",
  output_path="path/to/output.fastq",
)

For a list of available tools, see the documentation.

Configuration

To use Toolchest, you must have an authentication key. The client reads it from the TOOLCHEST_KEY environment variable, or you can set it in code with toolchest.set_key():

import toolchest_client as toolchest
toolchest.set_key("YOUR_TOOLCHEST_KEY") # or a file path containing the key
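
A minimal sketch of supplying the key through the TOOLCHEST_KEY environment variable instead (shown here by setting it from Python; exporting it in your shell works the same way):

import os

# Set TOOLCHEST_KEY before the client needs it, instead of calling set_key()
os.environ["TOOLCHEST_KEY"] = "YOUR_TOOLCHEST_KEY"

import toolchest_client as toolchest
toolchest.kraken2(
  tool_args="",
  inputs="path/to/input.fastq",
  output_path="path/to/output.fastq",
)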

Contact Toolchest if:

  • you need a key
  • you’ve forgotten your key
  • the key is producing authentication errors

Documentation & User Guide are available at Read the Docs

Comments
  • Enable paired reads for `kraken2`

    Adds the option to use paired-read inputs for kraken2, via the read_one and read_two arguments (or a list of two paths via inputs).

    Adds or removes --paired in tool_args as necessary.
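
    A minimal sketch of the paired-read interface described above, using the argument names from this PR (file paths are placeholders):

    import toolchest_client as toolchest

    toolchest.set_key("YOUR_TOOLCHEST_KEY")

    # Paired-end inputs via read_one/read_two; the client adds --paired to
    # tool_args as needed
    toolchest.kraken2(
      read_one="path/to/reads_R1.fastq",
      read_two="path/to/reads_R2.fastq",
      output_path="path/to/output.fastq",
    )

    # Equivalent: pass a list of two paths via inputs
    toolchest.kraken2(
      inputs=["path/to/reads_R1.fastq", "path/to/reads_R2.fastq"],
      output_path="path/to/output.fastq",
    )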

    opened by bcai2 3
  • v0.4.0

    • Add Poetry, remove Twine

    • Add CircleCI automatic deploy to PyPI (untested for prod PyPI)

    Note: CircleCI will be failing because v0.4.0 already exists on test PyPI. That is to be expected, because I already bumped it to v0.4.0 when testing.

    opened by lebovic 3
  • S3 chaining

    Adds:

    • Output class returned by all toolchest.tool() calls, which contains s3_uri, presigned_s3_url, and (local) output_path variables
    • S3 chaining, via supplying output.s3_uri from a previous tool as the inputs parameter for a following tool (see the sketch after this list)
    • the ability to skip download of any tool's output, by setting output_path=None (set to None by default)
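
    A minimal sketch of the chaining described above, using the Output fields named in this PR (chaining kraken2 into itself is only for illustration):

    import toolchest_client as toolchest

    toolchest.set_key("YOUR_TOOLCHEST_KEY")

    # First run: leave output_path unset (None by default) to skip the download
    first = toolchest.kraken2(
      inputs="path/to/input.fastq",
    )

    # The returned Output exposes s3_uri, presigned_s3_url, and output_path.
    # Second run: feed the previous run's S3 output directly as input
    second = toolchest.kraken2(
      inputs=first.s3_uri,
      output_path="path/to/final_output.fastq",
    )
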
    opened by lebovic 2
  • Polish tool_arg handling, add more STAR args

    Adds:

    • More STAR args
    • Multiple levels of tool_arg handling (whitelist, dangerlist, blacklist), sketched at the end of this comment
    • Errors on unknown or blacklisted args
    • Reduced complexity (validation and parallelization, for now) if a dangerous argument is passed

    Requires:

    • https://github.com/trytoolchest/toolchest-worker-node/pull/24
    • https://github.com/trytoolchest/toolchest-api/pull/22

    This does not fix:

    • Bigger disk/memory/etc. requirements for larger files where args trigger reduced complexity or disable parallelization
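
    A generic illustration of the three argument-handling levels described above (not the package's actual code; the specific args shown are hypothetical except --quantMode):

    # Hypothetical whitelist/dangerlist/blacklist for a tool's args
    WHITELIST = {"--quantMode"}      # passed through as-is
    DANGERLIST = {"--runThreadN"}    # allowed, but reduces validation/parallelization
    BLACKLIST = {"--genomeDir"}      # always rejected

    def check_tool_args(args):
        """Return True if complexity should be reduced; error on bad args."""
        reduce_complexity = False
        for arg in args:
            if arg in BLACKLIST or arg not in WHITELIST | DANGERLIST:
                raise ValueError(f"Unknown or blacklisted argument: {arg}")
            if arg in DANGERLIST:
                reduce_complexity = True
        return reduce_complexity
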
    opened by lebovic 2
  • STAR whitelist options

    • Adds basic whitelist options for STAR.

    • Adds support for tags with a variable number of arguments, and adds the --quantMode tag for STAR (see the sketch below).

    (This should be merged in after the kraken2 paired read commit.)
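
    A minimal sketch of a variable-argument tag in practice, assuming a STAR wrapper that follows the same calling convention as the kraken2 example in the README (the function name and parameters here are assumptions):

    import toolchest_client as toolchest

    toolchest.set_key("YOUR_TOOLCHEST_KEY")

    # --quantMode accepts a variable number of values, all passed via tool_args
    toolchest.STAR(
      tool_args="--quantMode TranscriptomeSAM GeneCounts",
      inputs="path/to/reads.fastq",
      output_path="path/to/output_dir",
    )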

    opened by bcai2 2
  • feat: centrifuge base

    • Adds the centrifuge tool.
    • Adds docs.
    • Refactors how prefix_mapping is generated for megahit with a new module (input_util.py) and function (convert_input_params_to_prefix_mapping). Adds a unit test for the function.
    opened by bcai2 1
  • fix: upload/download tracker bugfixes

    • Refactors the progress-tracking print statements into a Pythonic print call with string formatting.
    • Fixes status update logic in uploading. (This was causing the terminal output to stall at the "uploading" stage.)
    • Adds integration test dirs to .gitignore.
    opened by bcai2 1
  • fix: remove pysam due to multiple issues

    Pysam has caused multiple issues as a package, and STAR parallelization is not currently used, so this PR fully removes pysam as a dependency. Either a different library or custom SAM-file merging code is planned for later, so the parallelization framework remains in the code for now.

    opened by jherr-dev 1
  • feat: add preliminary alphafold support

    Adds basic support for running AlphaFold via Toolchest. The code needs to be cleaned up and better documented. Currently limited to one input FASTA.

    use_reduced_dbs and is_prokaryote_list are currently disabled until further implementation and testing are done. Integration will come with the reduced DBs, since the full DBs take 45 minutes to an hour to run even on simple input.

    opened by jherr-dev 1
  • feat: support async execution

    Adds:

    • Support for async execution

    See https://gist.github.com/lebovic/72fbb857119f1667c7959a4d7e28cd50 (or the integration test) for a hacky example of how to run Toolchest with async execution.
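
    As a generic sketch (not necessarily the mechanism this PR implements; see the linked gist for that), one way to keep an event loop free while a blocking Toolchest call runs is to push it onto a worker thread:

    import asyncio

    import toolchest_client as toolchest

    toolchest.set_key("YOUR_TOOLCHEST_KEY")

    async def run_kraken2():
        # Run the blocking call in a thread so other coroutines keep running
        return await asyncio.to_thread(
            toolchest.kraken2,
            inputs="path/to/input.fastq",
            output_path="path/to/output.fastq",
        )

    asyncio.run(run_kraken2())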

    opened by lebovic 1
  • fix: set default version number

    Sets the version number to a default instead of erroring if the client is run from source (i.e., without the toolchest-client package being installed via pip).

    Open question: the version number defaults to 0.0.0, which can be confusing -- are there any other labels that might be better (e.g., dev or just the empty string)?
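
    A generic illustration of the fallback described above (not the package's actual code): report a placeholder version when distribution metadata is missing, e.g. when the client is run from source rather than installed via pip.

    from importlib.metadata import PackageNotFoundError, version

    try:
        __version__ = version("toolchest-client")
    except PackageNotFoundError:
        __version__ = "0.0.0"  # default used instead of raising an error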

    opened by bcai2 1
Releases: v0.11.3