Write reproducible code for getting and processing ChEMBL

Overview

chembl_downloader

Don't worry about downloading/extracting ChEMBL or versioning - just use chembl_downloader to write code that knows how to download it and use it automatically.

Installation

$ pip install chembl-downloader

Usage

Download A Specific Version

import chembl_downloader

path = chembl_downloader.download_extract_sqlite(version='28')

After it's been downloaded and extracted once, later calls reuse the local copy instead of downloading it again. It is stored automatically with pystow in the ~/.data/chembl directory.
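The "download once, then reuse" behaviour can be illustrated with the standard library. This is a minimal sketch, not chembl_downloader's implementation (which delegates storage to pystow): `ensure_cached` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real download-and-extract step.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def ensure_cached(cache_dir: Path, name: str, fetch: Callable[[Path], None]) -> Path:
    """Return ``cache_dir / name``, calling ``fetch`` only if the file is missing.

    ``fetch`` is a hypothetical stand-in for the real download-and-extract step.
    """
    path = cache_dir / name
    if path.exists():
        # Already downloaded/extracted on an earlier run: reuse the local copy.
        return path
    cache_dir.mkdir(parents=True, exist_ok=True)
    fetch(path)
    return path


# Usage: a fake fetcher that records how often it is called.
fetch_calls: List[Path] = []


def fake_fetch(path: Path) -> None:
    fetch_calls.append(path)
    path.write_text("placeholder for chembl_28.db")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "chembl" / "28"
    first = ensure_cached(cache, "chembl_28.db", fake_fetch)
    second = ensure_cached(cache, "chembl_28.db", fake_fetch)

print(first == second, len(fetch_calls))  # True 1
```

The second call finds the file and skips the fetch, which is why the fetcher runs only once.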

We'd like to load the database into SQLite directly from the archive, without extracting it first, but it appears this is a paid feature.

Download the Latest Version

Since v0.4.0, chembl_downloader looks up the latest version of ChEMBL itself (earlier versions required installing bioversions with pip install bioversions). To get the latest version, modify the previous code slightly by omitting the version keyword argument:

import chembl_downloader

path = chembl_downloader.download_extract_sqlite()
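The v0.4.0 release notes mention that the latest-version lookup is implemented locally rather than through bioversions. One possible approach, sketched here as an assumption and not necessarily what the package does, is to scan a listing of ChEMBL release files (such as the contents of https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/) for names like chembl_30_sqlite.tar.gz and take the highest number. The listing is hard-coded so the example runs offline, and `latest_version_from_listing` is a hypothetical helper.

```python
import re

# Assumed naming scheme for release files, e.g. "chembl_30_sqlite.tar.gz".
_RELEASE_FILE = re.compile(r"\bchembl_(\d+)_")


def latest_version_from_listing(listing: str) -> str:
    """Return the highest ChEMBL release number mentioned in a directory listing."""
    versions = {int(number) for number in _RELEASE_FILE.findall(listing)}
    if not versions:
        raise ValueError("no ChEMBL release files found in listing")
    return str(max(versions))


# Hypothetical listing text; a real one would be fetched over HTTP.
sample_listing = """
chembl_30_chemreps.txt.gz
chembl_30_monomer_library.xml
chembl_30_sqlite.tar.gz
chembl_uniprot_mapping.txt
"""

print(latest_version_from_listing(sample_listing))  # 30
```

Returning the version as a string matches how the examples above pass version='28'.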

The version keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()), but will be omitted below for brevity.

Automate Connection

Inside the archive is a single SQLite database file. Normally, people manually untar the archive and then do something with the resulting file. Don't do this: it's not reproducible! Instead, the file can be downloaded and a connection can be opened automatically with:

import chembl_downloader

with chembl_downloader.connect() as conn:
    # sqlite3 cursors don't support the "with" statement, so create one directly
    cursor = conn.cursor()
    cursor.execute(...)  # run your query string
    rows = cursor.fetchall()  # get your results

The cursor() function provides a convenient wrapper around this operation:

import chembl_downloader

with chembl_downloader.cursor() as cursor:
    cursor.execute(...)  # run your query string
    rows = cursor.fetchall()  # get your results
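Plain sqlite3 cursors cannot be used in a with statement, and a sqlite3 connection's own with block only manages transactions; it does not close the connection. A minimal sketch of how connect()- and cursor()-style context managers could be layered with contextlib.closing, assuming the SQLite file path is already known. `connect_path` and `cursor_path` are hypothetical names, not chembl_downloader's API, and the one-row table is toy data.

```python
import sqlite3
import tempfile
from contextlib import closing, contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def connect_path(path: Path) -> Iterator[sqlite3.Connection]:
    """Open a SQLite connection and guarantee it is closed afterwards."""
    with closing(sqlite3.connect(str(path))) as conn:
        yield conn


@contextmanager
def cursor_path(path: Path) -> Iterator[sqlite3.Cursor]:
    """Yield a cursor on a fresh connection, closing both when done."""
    with connect_path(path) as conn:
        with closing(conn.cursor()) as cursor:
            yield cursor


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "toy.db"
    # Create a tiny toy table so the example runs without ChEMBL.
    with connect_path(db) as conn:
        conn.execute("CREATE TABLE molecule_dictionary (chembl_id TEXT, pref_name TEXT)")
        conn.execute("INSERT INTO molecule_dictionary VALUES ('CHEMBL25', 'ASPIRIN')")
        conn.commit()
    with cursor_path(db) as cursor:
        cursor.execute("SELECT chembl_id, pref_name FROM molecule_dictionary")
        rows = cursor.fetchall()

print(rows)  # [('CHEMBL25', 'ASPIRIN')]
```

Layering the cursor manager on the connection manager means callers never have to remember to close either object.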

Run a query and get a pandas DataFrame

The most powerful function is query(), which builds on the previous connect() function in combination with pandas.read_sql to run a query and load the results into a pandas DataFrame for any downstream use.

import chembl_downloader

sql = """
SELECT
    MOLECULE_DICTIONARY.chembl_id,
    MOLECULE_DICTIONARY.pref_name
FROM MOLECULE_DICTIONARY
JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno = COMPOUND_STRUCTURES.molregno
WHERE MOLECULE_DICTIONARY.pref_name IS NOT NULL
LIMIT 5
"""

df = chembl_downloader.query(sql)
df.to_csv(..., sep='\t', index=False)
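Roughly speaking, query() pairs a connection with pandas.read_sql. The sketch below reproduces that pattern offline against a tiny in-memory SQLite database whose tables only mimic the columns the query above needs. The rows are toy data (CHEMBL_TOY is made up), `toy_query` is a hypothetical helper rather than the package's function, and the TSV goes to an in-memory buffer in place of a real file path.

```python
import io
import sqlite3

import pandas as pd

SQL = """
SELECT
    MOLECULE_DICTIONARY.chembl_id,
    MOLECULE_DICTIONARY.pref_name
FROM MOLECULE_DICTIONARY
JOIN COMPOUND_STRUCTURES ON MOLECULE_DICTIONARY.molregno = COMPOUND_STRUCTURES.molregno
WHERE MOLECULE_DICTIONARY.pref_name IS NOT NULL
LIMIT 5
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with a couple of toy ChEMBL-like rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE MOLECULE_DICTIONARY (molregno INTEGER, chembl_id TEXT, pref_name TEXT);
        CREATE TABLE COMPOUND_STRUCTURES (molregno INTEGER, canonical_smiles TEXT);
        INSERT INTO MOLECULE_DICTIONARY VALUES (1, 'CHEMBL25', 'ASPIRIN'), (2, 'CHEMBL_TOY', NULL);
        INSERT INTO COMPOUND_STRUCTURES VALUES (1, 'CC(=O)Oc1ccccc1C(=O)O'), (2, 'C');
        """
    )
    return conn


def toy_query(conn: sqlite3.Connection, sql: str) -> pd.DataFrame:
    """Run ``sql`` on ``conn`` and return the result as a DataFrame."""
    return pd.read_sql(sql, conn)


conn = build_toy_db()
df = toy_query(conn, SQL)
conn.close()

# Write tab-separated output to a buffer instead of a real file.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
print(buffer.getvalue())
```

The row without a preferred name is dropped by the WHERE clause, so only the CHEMBL25 row is written.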

Suggestion 1: use pystow to make a reproducible file path that's portable to other people's machines (e.g., it doesn't have your username in the path).
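As a rough stdlib stand-in for pystow, output files can be anchored under a per-user data directory instead of a hard-coded absolute path. This sketch assumes pystow's default base of ~/.data and its PYSTOW_HOME override; `portable_output_path` is a hypothetical helper, and with pystow installed, pystow.join("chembl", "results", name="example.tsv") gives a comparable path.

```python
import os
from pathlib import Path


def portable_output_path(*parts: str, name: str) -> Path:
    """Build an output path under a per-user data directory (hypothetical helper).

    Mirrors pystow's default ``~/.data/<key>/...`` layout and honours the
    PYSTOW_HOME environment variable if it is set.
    """
    base = Path(os.environ.get("PYSTOW_HOME", Path.home() / ".data"))
    directory = base.joinpath(*parts)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / name
```

For example, df.to_csv(portable_output_path("chembl", "results", name="example.tsv"), sep='\t', index=False) writes to a location that contains no username-specific absolute path in the code.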

Suggestion 2: RDKit is now pip-installable with pip install rdkit (it was previously published as rdkit-pypi), which means most users don't have to muck around with complicated conda environments and configurations. One of the powerful but understated tools in RDKit is the rdkit.Chem.PandasTools module.

Store in a Different Place

If you want to store the data elsewhere using pystow (e.g., in pyobo I also keep a copy of this file), you can use the prefix argument.

import chembl_downloader

# It gets downloaded/extracted to 
# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
path = chembl_downloader.download_extract_sqlite(prefix=['pyobo', 'raw', 'chembl'])

See the pystow documentation for further details on configuring the storage location.

The prefix keyword argument is available for all functions in this package (e.g., including connect(), cursor(), and query()).
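The example path for the pyobo prefix suggests how the prefix and version combine into a storage location. A hypothetical helper that reproduces that layout, inferred from the example path rather than taken from chembl_downloader's source:

```python
from pathlib import Path
from typing import Sequence


def expected_sqlite_path(
    version: str,
    prefix: Sequence[str] = ("chembl",),
    base: Path = Path("~/.data"),
) -> Path:
    """Build <base>/<prefix...>/<v>/chembl_<v>/chembl_<v>_sqlite/chembl_<v>.db."""
    return base.joinpath(
        *prefix,
        version,
        f"chembl_{version}",
        f"chembl_{version}_sqlite",
        f"chembl_{version}.db",
    )


print(expected_sqlite_path("29", prefix=["pyobo", "raw", "chembl"]).as_posix())
# ~/.data/pyobo/raw/chembl/29/chembl_29/chembl_29_sqlite/chembl_29.db
```

Only the leading prefix segments change; everything from the version onwards stays the same.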

Download via CLI

After installing, run the following CLI command to ensure the database is downloaded and print its path to stdout:

$ chembl_downloader

Use --test to show two example queries:

$ chembl_downloader --test

Contributing

If you'd like to contribute, there's a submodule called chembl_downloader.queries where you can add an SQL query along with a description of what it does for easy importing.
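A contributed query could be as small as a documented SQL constant, optionally with a function that parametrizes it. The sketch below is a hypothetical entry (`NAMED_MOLECULES_SQL` and `get_named_molecules_sql` are illustrative names, not existing members of chembl_downloader.queries), checked against a throwaway in-memory table with toy rows.

```python
import sqlite3
import textwrap
from typing import Optional

#: Get ChEMBL identifiers and preferred names for molecules that have a name.
#: (Hypothetical example entry, not part of chembl_downloader.queries.)
NAMED_MOLECULES_SQL = textwrap.dedent(
    """\
    SELECT chembl_id, pref_name
    FROM MOLECULE_DICTIONARY
    WHERE pref_name IS NOT NULL
    """
)


def get_named_molecules_sql(limit: Optional[int] = None) -> str:
    """Return NAMED_MOLECULES_SQL, optionally with a LIMIT clause appended."""
    if limit is None:
        return NAMED_MOLECULES_SQL
    return f"{NAMED_MOLECULES_SQL}LIMIT {int(limit)}\n"


# Check the query against a toy table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MOLECULE_DICTIONARY (chembl_id TEXT, pref_name TEXT)")
conn.executemany(
    "INSERT INTO MOLECULE_DICTIONARY VALUES (?, ?)",
    [("CHEMBL25", "ASPIRIN"), ("CHEMBL_TOY", None)],
)
rows = conn.execute(get_named_molecules_sql(limit=5)).fetchall()
conn.close()
print(rows)  # [('CHEMBL25', 'ASPIRIN')]
```

Keeping the description next to the SQL string makes the query easy to discover and import.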

Comments
  • Repo status

    Dear @cthoyt,

    I know that you have multiple responsibilities, but I was wondering whether the current repo is in working condition or whether it is a legacy repo that worked with a specific version of ChEMBL. It would be great if you could add a badge on the repo indicating this.

    Thank You.

    opened by YojanaGadiya 4
  • Add SQL for getting activities by target

    This PR adds some functionality for generating target-based datasets, motivated by https://github.com/PatWalters/yamc/issues/14.

    See the notebook here (note that this is pinned with a permalink to the state after merging this PR).

    opened by cthoyt 1
  • Improve ChEBI mapping notebook

    This filters out about 10% of the possible ChEMBL-ChEBI curations, since ChEBI already took care of those externally.

    -> move this into biomappings repo

    opened by cthoyt 0
  • Call for additional functionality

    • What other operations do people commonly want to do with the entire ChEMBL database/SDF file that would be good to wrap (including loading other files released by ChEMBL)?
    • What other operations like the RDKit supplier exist in other libraries that might be worth wrapping?

    @iwatobipen do you have any suggestions?

    opened by cthoyt 0
  • Add functionality for bacting

    @egonw are there any bulk SMILES, InChI, or SDF loading operations in bacting that are exposed by pybacting that would be nice to wrap inside this library for full loading of ChEMBL? On the readme, you can see I made a specific function for RDKit's "supplier" that reads an SDF file

    opened by cthoyt 3
Releases(v0.4.1)
  • v0.4.1(Nov 19, 2022)

    What's Changed

    • Add SQL for getting activities by target by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/8
    • Improve ChEBI mapping notebook by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/10
    • Add UniProt target mapping functions by @cthoyt in https://github.com/cthoyt/chembl-downloader/pull/11

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.4.0...v0.4.1

  • v0.4.0(Oct 28, 2022)

    This PR does several things:

    1. Removes dependency on bioversions and just implements the code locally
    2. Adds a CLI for generating a statistics table for all versions of ChEMBL
    3. Adds a proper project skeleton (documentation, unit tests, code quality assurance, CI)
    4. Improves SQLite loading in case you delete the compressed data

    Notebooks

    1. Adds notebook about drug indications
    2. Adds notebook about mapping to ChEBI
  • v0.3.0(Mar 19, 2022)

    This release adds two new functions:

    1. chembl_downloader.download_monomer_library which gets this file https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_30_monomer_library.xml for whatever version you specify
    2. chembl_downloader.get_monomer_library_root which does the same as the downloader but also parses the XML for you

    Thanks to @iwatobipen and his recent blog post for inspiring this.

  • v0.2.0(Jan 14, 2022)

    New Functions

    • chembl_downloader.download_fps downloads the pre-computed Morgan fingerprint file
    • chembl_downloader.download_chemreps downloads the chembl-smiles-inchi-inchikey map
    • chembl_downloader.get_chemreps_df builds on chembl_downloader.download_chemreps and loads them into a pandas DataFrame

    Misc

    • Add isort to code quality checking
    • Add a return_version option to many functions that makes them return a tuple including the version, which is useful if you're having it infer the latest version.
  • v0.1.3(Dec 20, 2021)

    This release adds get_substructure_library(), which automates the generation of an RDKit substructure library as described in Greg Landrum's RDKit blog post, "Some new features in the SubstructLibrary". The following example shows how it can be used to accomplish some of the first tasks presented in the post:

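    from rdkit import Chem
    
    import chembl_downloader
    
    # Get an RDKit substructure library over the ChEMBL molecules
    library = chembl_downloader.get_substructure_library()
    # SMARTS for a C=O or C=N carbon attached to the 4-position of a pyridine ring
    query = Chem.MolFromSmarts('[O,N]=C-c:1:c:c:n:c:c:1')
    # Indices of the library molecules that contain the query substructure
    matches = library.GetMatches(query)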
    

    Full Changelog: https://github.com/cthoyt/chembl-downloader/compare/v0.1.2...v0.1.3

  • v0.1.2(Dec 20, 2021)

  • v0.1.1(Aug 5, 2021)

  • v0.1.0(Aug 4, 2021)

    • rename download() to download_extract_sqlite() to make room for other download functions
    • added supplier() function for loading the SDF dump through RDKit
  • v0.0.4(Jul 28, 2021)

  • v0.0.3(Jul 27, 2021)

  • v0.0.2(Jul 27, 2021)

  • v0.0.1(Jul 27, 2021)

Owner
Charles Tapley Hoyt
Bio/cheminformatician, open scientist, maintainer of @pybel and @pykeen, part of @indralab (he/him)