Collapse a set of redundant kmers to use IUPAC degenerate bases

Overview

kmer-collapse

Collapse a set of redundant kmers to use IUPAC degenerate bases

Overview

Given an input set of kmers, find the smallest set of kmers that encapsulates all diversity in the input set using IUPAC degenerate bases. This aims to solve the problem described here: https://www.biostars.org/p/9498272/

Usage

Install the marisa-trie library, if necessary.

Modify the script's input variable to specify desired sequences, and then run the script:

$ python kmer-collapse.py
{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "ACAAAAAAAA",
        "AGAAAAAAAA"
    ],
    "encoded_output": [
        "WAAAAAAAAA",
        "ASAAAAAAAA"
    ]
}

Notes

This has not been tested with any kmer sets but those examples provided. However, it aims to be scalable by pruning combinations of sub-kmers along the way, which would otherwise yield incorrect encodings. This also uses a trie for faster prefix testing. If futher performance is needed, some easy wins would be to cache sub-kmer tests, since most of these test outcomes would be redundant.

Additionally, no error checking is done on the input kmer alphabet or on the consistency of kmer lengths. It may be useful to validate input before using this script.

Examples

These examples are available from the script by uncommenting the relevant input.

A

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA"
    ],
    "encoded_output": [
        "WAAAAAAAAA"
    ]
}

B

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "GCGAAAAAAA"
    ],
    "encoded_output": [
        "GCGAAAAAAA",
        "WAAAAAAAAA"
    ]
}

C

{
    "input": [
        "AAAAAAAAAA"
    ],
    "encoded_output": [
        "AAAAAAAAAA"
    ]
}

D

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "CAAAAAAAAA",
        "GAAAAAAAAA"
    ],
    "encoded_output": [
        "NAAAAAAAAA"
    ]
}

E

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "TTAAAAAAAA",
        "ATAAAAAAAA"
    ],
    "encoded_output": [
        "WWAAAAAAAA"
    ]
}

F

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "CAAAAAAAAA",
        "GAAAAAAAAA",
        "TACAGATACA",
        "AACAGAAAAA"
    ],
    "encoded_output": [
        "NAAAAAAAAA",
        "TACAGATACA",
        "AACAGAAAAA"
    ]
}

G

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "ACAAAAAAAA",
        "AGAAAAAAAA"
    ],
    "encoded_output": [
        "ASAAAAAAAA",
        "WAAAAAAAAA"
    ]
}
Owner
Alex Reynolds
Pug caregiver, curler, cyclist, gardener, beginning French scholar
Alex Reynolds
A small project of two newbies, who wanted to learn something about Python language programming, via fun way.

HaveFun A small project of two newbies, who wanted to learn something about Python language programming, via fun way. What's this project about? Well.

Patryk Sobczak 2 Nov 24, 2021
Python Cheat Sheet

Introduction Pysheeet was created with intention of collecting python code snippets for reducing coding hours and making life easier and faster. Any c

CHANG-NING TSAI 7.5k Dec 30, 2022
CountBoard 是一个基于Tkinter简单的,开源的桌面日程倒计时应用。

CountBoard 是一个基于Tkinter简单的,开源的桌面日程倒计时应用。 基本功能 置顶功能 是否使窗体一直保持在最上面。 简洁模式 简洁模式使窗体更加简洁。 此模式下不可调整大小,请提前在普通模式下调整大小。 设置功能 修改主窗体背景颜色,修改计时模式。 透明设置 调整窗体的透明度。 修改

gaoyongxian 130 Dec 01, 2022
Dockernized ZeroTierOne controller with zero-ui web interface.

docker-zerotier-controller Dockernized ZeroTierOne controller with zero-ui web interface. 中文讨论 Customize ZeroTierOne's controller planets Modify patch

sbilly 209 Jan 04, 2023
An addin for Autodesk Fusion 360 that lets you view your design in a Looking Glass Portrait 3D display

An addin for Autodesk Fusion 360 that lets you view your design in a Looking Glass Portrait 3D display

Brian Peiris 12 Nov 02, 2022
About A python based Apple Quicktime protocol,you can record audio and video from real iOS devices

介绍 本应用程序使用 python 实现,可以通过 USB 连接 iOS 设备进行屏幕共享 高帧率(30〜60fps) 高画质 低延迟(200ms) 非侵入性 支持多设备并行 Mac OSX 安装 python =3.7 brew install libusb pkg-config 如需使用 g

YueC 124 Nov 30, 2022
Pyrmanent - Make all your classes permanent in a flash 💾

Pyrmanent A base class to make your Python classes permanent in a flash. Features Easy to use. Great compatibility. No database needed. Ask for new fe

Sergio Abad 4 Jan 07, 2022
Open source tools to allow working with ESP devices in the browser

ESP Web Tools Allow flashing ESPHome or other ESP-based firmwares via the browser. Will automatically detect the board type and select a supported fir

ESPHome 195 Dec 31, 2022
It's a repo for Cramer's rule, which is some math crap or something idk

It's a repo for Cramer's rule, which is some math crap or something idk (just a joke, it's not crap; don't take that seriously, math teachers)

Module64 0 Aug 31, 2022
An application for automation of the mining function in the game Alienworlds.IO

alienautomation A Python script made to automate the tidious job of mining on AlienWorlds This script: Automatically opens the browser Automatically l

anonieXdev 42 Dec 03, 2022
aaencode for python,把python代码转换为颜文字

py-aaencode aaencode for python,把python代码转换为颜文字 compile.py: 将python编译成颜文字,编译结果有随机性,可以选择BPE词表压缩代码 compile_min.py: 最小化的编译器 compiled_min.txt: 编译得到的最小的com

11 Dec 30, 2021
Distribute PySPI jobs across a PBS cluster

Distribute PySPI jobs across a PBS cluster This repository contains scripts for distributing PySPI jobs across a PBS-type cluster. Each job will conta

Oliver Cliff 1 Feb 10, 2022
Free and open source qualitative research tool

Taguette A spin on the phrase "tag it!", Taguette is a free and open source qualitative research tool that allows users to: Import PDFs, Word Docs (.d

Remi Rampin 48 Jan 02, 2023
A web project to control the daily life budget planing

Budget Planning - API In this repo there's only the API and Back-End of the this project. Install and run the project # install virtualenv --python=py

Leonardo Da Vinci 1 Oct 24, 2021
A tool to replace all osu beatmap backgrounds at once.

OsuBgTool A tool to replace all osu beatmap backgrounds at once. Requirements You need to have python 3.6 or newer installed. That's it. How to Use Ju

Aditya Gupta 1 Oct 24, 2021
Cool little Python scripts & projects I've made.

Little Python Projects A repository for neat little Python scripts I've made! How to run a script: *NOTE: You'll need to install Python v3 or higher.

dood 1 Jan 19, 2022
使用京东cookie一键生成所有退会链接

JDMemberCloseLinks 本项目旨在使用京东cookie一键生成所有退会链接

hyzaw 68 Jun 10, 2022
A python tool for synchronizing the messages from different threads, processes, or hosts.

Sync-stream This project is designed for providing the synchoronization of the stdout / stderr among different threads, processes, devices or hosts.

Yuchen Jin 0 Aug 11, 2021
Neogex is a human readable parser standard, being implemented in Python

Neogex (New Expressions) Parsing Standard Much like Regex, Neogex allows for string parsing and validation based on a set of requirements. Unlike Rege

Seamus Donnellan 1 Dec 17, 2021
Store Simulation

Almacenes Para clonar el Repositorio: Vaya a la terminal de Linux o Mac, o a la cmd en Windows y ejecute:

Johan Posada 1 Nov 12, 2021