Full Spectrum Bioinformatics - a free online text designed to introduce key topics in Bioinformatics using the Python programming language

Overview

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, which allow you to try out and modify example code and analyses.

In addition to explanations of concepts, Full Spectrum Bioinformatics also includes Bioinformatics Vignettes written by readers of the text. Each vignette focuses on a particular core concept and shows how readers have applied that concept to their research projects.

If you are already familiar with GitHub and Jupyter Notebooks, you can download the entire project and run it interactively, or click the 'Open in Colab' links to open interactive versions of each section in Google Colab (you will need to 'Save as' your own copy in order to change code). You can also view a static version of each section using the nbviewer links. The direct GitHub links sometimes produce a GitHub error message; reloading the page or using the nbviewer link usually avoids this issue.

License: Creative Commons BY-NC-SA
Lead Author: Jesse Zaneveld [1]
Vignette Authors: Nia Prabhu* [1], Aziz Bajouri* [1,2], Ayomikun Akinrinade* [1,3]

* Vignette authors contributed equally and are listed in chronological order of first contribution.
[1] Division of Biological Sciences, School of STEM, University of Washington, Bothell, Washington, USA
[2] Division of Computer and Software Systems, School of STEM, University of Washington, Bothell, Washington, USA
[3] Division of Health Studies, School of Nursing and Health Studies, University of Washington, Bothell, Washington, USA

The text is currently in prototype status. Chapters with content you can preview are linked below:

This project is being developed with support from NSF Integrative and Organismal Systems award NSF-1942647.

Feedback

You can submit feedback about completed chapters at the following link

Comments
  • Bump nokogiri from 1.10.9 to 1.11.1

    Bumps nokogiri from 1.10.9 to 1.11.1.

    dependencies 
    opened by dependabot[bot] 2
  • Missing reading response link on "Error Messages in Python"

    opened by LucaOnline 1
  • Add discussion of HISAT2 & transcriptomics

    HISAT2: https://anaconda.org/bioconda/hisat2

    Salmon intro (another alternative that interoperates well with DESeq2) https://combine-lab.github.io/salmon/getting_started/

    opened by zaneveld 0
  • Literature Synthesis section -- discuss cutting extra phrases that don't add meaning in literature

    In addition we found a more recent study that showed that [research finding] (cite1;cite2). --> [research finding]

    In a 2016 study it was shown that [finding] (cite1). --> [finding]

    opened by zaneveld 0
  • More database links

    Open resources on CUREs shared in the 2022 AACU talks "CUREing Cancer: How a Virtual Cancer Genomics CURE Made Research Accessible to Students During COVID" and "Expanding Access to Undergraduate Research Through BCEENET Cures Using Digitized Collections Data" (shared by Robin Angotti):

    https://www.cbioportal.org/ (Cancer research database)
    https://www.idigbio.org/ (Integrated digitized biocollections)
    https://www.gbif.org/ (biodiversity data)
    https://bceenetwork.org/cure-summaries/
    https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

    opened by zaneveld 0
Releases(release-2022.3.1)
  • release-2022.3.1(Mar 2, 2022)

    What's Changed

    The 2022.3.1 Release of Full Spectrum Bioinformatics greatly expands the scope and maturity of the text, including contributions from 3 undergraduate co-authors. This text has now been used to support multiple classes, and has 35 sections that are linked from the table of contents and ready for classroom use.

    Here are some of the major changes:

    The text has several new sections:
    -- An overview of Python syntax now shows how to recognize Python syntax before we dive into studying the details
    -- A first chapter on sequence alignment now covers Needleman-Wunsch alignment, both as worked by hand using a simple example and as an implementation in numpy
    -- The text now discusses linear models, with accompanying illustrations as well as figures
    -- An Error Bingo exercise now encourages students to intentionally trigger and learn from errors
    -- An extensive section has been added discussing common errors in Python, why they most commonly occur, and how to fix them
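
    For readers who have not met Needleman-Wunsch alignment before, the idea behind the new alignment chapter can be sketched as follows. This is a minimal illustration and not the text's own implementation: it uses plain Python lists instead of numpy to stay dependency-free, and the function name and the scoring values (match=1, mismatch=-1, gap=-1) are assumptions chosen for the example.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences; return (aligned1, aligned2, score).

    Scoring values are illustrative assumptions, not taken from the text.
    """
    n, m = len(seq1), len(seq2)

    # Score matrix with an extra first row and column for leading gaps.
    scores = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        scores[i][0] = i * gap
    for j in range(1, m + 1):
        scores[0][j] = j * gap

    def pair_score(i, j):
        # Score for aligning seq1[i-1] with seq2[j-1].
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # Fill each cell with the best of the diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            scores[i][j] = max(
                scores[i - 1][j - 1] + pair_score(i, j),
                scores[i - 1][j] + gap,
                scores[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned1, aligned2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and scores[i][j] == scores[i - 1][j - 1] + pair_score(i, j):
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and scores[i][j] == scores[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1

    return "".join(reversed(aligned1)), "".join(reversed(aligned2)), scores[n][m]


top, bottom, score = needleman_wunsch("ACGT", "AGT")
print(top)     # ACGT
print(bottom)  # A-GT
print(score)   # 2
```

    When several alignments tie for the best score, this sketch prefers the diagonal move during traceback, so it reports only one of them.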

    -- 3 undergraduate contributors have added Bioinformatics Vignettes showing how to apply the principles in the text to biological problems:
       - Nia Prabhu (nucleotide composition)
       - Aziz Bajouri (set analysis)
       - Ayomikun Akinrinade (machine learning)

    -- A section has been added on revising writing about statistical results
    -- An initial draft section on visualizing correlation has been added, showing how a scatterplot can be revised to add linear regression results and 95% confidence intervals, and to better meet recommendations for data visualization
    -- The Data Sources page has been greatly updated, and now includes logos for linked resources

    New Draft Sections:
    -- A draft section on student activism and fighting for an inclusive workplace has been added
    -- A draft section on network analysis has several in-progress code commits (not yet linked from main table of contents)

    Other changes:
    -- Full Spectrum Bioinformatics has now adopted a code of conduct
    -- Many minor fixes
    -- Exercises have been added to many sections that previously lacked them
    -- The exercise on calculating CG content in the human genome has been updated
    -- Several chapters have been updated to include Feedback links that were previously missing
    -- Unused Jupyter Book files have been removed
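
    For context on the CG content exercise mentioned in the list above, the underlying calculation can be sketched as follows. This is only an illustration under stated assumptions: the function name and the short example sequence are hypothetical and are not taken from the exercise itself, which works with the human genome.

```python
def gc_content(sequence):
    """Return the fraction of bases in `sequence` that are G or C.

    Hypothetical helper for illustration; not the text's own code.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


print(gc_content("ATGC"))  # 0.5
```

    Dividing by the full sequence length means ambiguous bases such as N count toward the denominator; a real genome-scale analysis may want to exclude them.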

    Full Changelog: https://github.com/zaneveld/full_spectrum_bioinformatics/compare/release-2020.12.1...release-2022.3.1

    Source code(tar.gz)
    Source code(zip)
    full_spectrum_bioinformatics_2022.3.0.zip(182.17 MB)
  • release-2020.12.1(Dec 8, 2020)

    This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

    Some current features include:
    -- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics
    -- Links to Google Colab to allow students to run notebooks in a browser without installing software
    -- An outline table of contents shows planned sections, with sections that are in beta status available as live links
    -- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

    Foreword
    The Command Line
       - Using the Command Line
       - Exercise: Little Brother is Missing
    Exploring Python
       - Exploring Python
       - A Tour of Python Data Types
    Project Design
       - Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses
    Biological Sequences
       - An introduction to Biological Sequences
       - Representing and Manipulating Biological Sequences as Python Strings
       - Analyzing Biological Sequences with For Loops and If Statements
       - Reading and writing FASTA files using Python
    'Omics
       - An Introduction to 'Omics
       - Working with Tabular 'Omic data in Python using Pandas
    Phylogenetic Trees
       - Representing Phylogenetic Trees with Python Classes
       - Generating Trees Using Birth-Death Models
    Simulation
       - Simulating the Population Genetics of Natural Selection and Genetic Drift
    Statistics
       - Rank Transformations
       - Monte Carlo simulation of Effect Size, Sample Size, and Significance
       - Dealing with Multiple Comparisons
       - Exercise: Revising your writing about statistical results
    Polishing and Publishing
       - Presenting Research
       - Careers that draw on Bioinformatics
       - Applying for Grants

    NOTE: This release is very similar to release-2020.12.0, apart from minor edits to the README; I needed to re-release to trigger Zenodo to generate a DOI.

    Source code(tar.gz)
    Source code(zip)
  • release-2020.12.0(Dec 7, 2020)

    This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

    Some current features include:
    -- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics
    -- Links to Google Colab to allow students to run notebooks in a browser without installing software
    -- An outline table of contents shows planned sections, with sections that are in beta status available as live links
    -- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

    Foreword
    The Command Line
       - Using the Command Line
       - Exercise: Little Brother is Missing
    Exploring Python
       - Exploring Python
       - A Tour of Python Data Types
    Project Design
       - Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses
    Biological Sequences
       - An introduction to Biological Sequences
       - Representing and Manipulating Biological Sequences as Python Strings
       - Analyzing Biological Sequences with For Loops and If Statements
       - Reading and writing FASTA files using Python
    'Omics
       - An Introduction to 'Omics
       - Working with Tabular 'Omic data in Python using Pandas
    Phylogenetic Trees
       - Representing Phylogenetic Trees with Python Classes
       - Generating Trees Using Birth-Death Models
    Simulation
       - Simulating the Population Genetics of Natural Selection and Genetic Drift
    Statistics
       - Rank Transformations
       - Monte Carlo simulation of Effect Size, Sample Size, and Significance
       - Dealing with Multiple Comparisons
       - Exercise: Revising your writing about statistical results
    Polishing and Publishing
       - Presenting Research
       - Careers that draw on Bioinformatics
       - Applying for Grants

    Source code(tar.gz)
    Source code(zip)
    full_spectrum_bioinformatics.zip(84.89 MB)
Owner
Jesse Zaneveld