LinkML based SPARQL template library and execution engine

Overview

sparqlfun

  • modularized core library of SPARQL templates
    • generic templates using common vocabs (rdf, owl, skos, ...)
    • OBO and biology specific, e.g. Ubergraph
    • coming soon: uniprot, wikidata, etc
  • Fully FAIR description of templates
    • Each template has a URI
    • Each template parameter has a URI
    • Full metadata including descriptions of each
    • Templates described in YAML, RDF, SHACL, ShEx, ...
  • optional python bindings using LinkML
  • supports both SELECT and CONSTRUCT
  • optional export to TSV, JSON, YAML

Browse the default templates

Note: not all metadata from the YAML is currently shown in the generated docs.

Command Line

sparqlfun -e ubergraph -T PairwiseCommonSubClassAncestor node1=GO:0046220 node2=GO:0008295

Results:

results:
- node1: GO:0046220
  node2: GO:0008295
  predicate1: rdfs:subClassOf
  predicate2: rdfs:subClassOf
  ancestor: GO:0009987
- node1: GO:0046220
  node2: GO:0008295
  predicate1: rdfs:subClassOf
  predicate2: rdfs:subClassOf
  ancestor: GO:0044237
- node1: GO:0046220
  node2: GO:0008295
  predicate1: rdfs:subClassOf
  predicate2: rdfs:subClassOf
  ancestor: GO:0044271
...

Python

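# SparqlEngine and the template class PairwiseCommonSubClassAncestor come from
# the sparqlfun package; the import lines are omitted in this snippet.
se = SparqlEngine(endpoint='ubergraph')
# Bind the GO prefix to its OBO PURL base
se.bind_prefixes(GO='http://purl.obolibrary.org/obo/GO_')
# Run the template with both nodes grounded and print each result row
for row in se.query(PairwiseCommonSubClassAncestor, node1='GO:0046220', node2='GO:0008295'):
    print(f'ROW={row}')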

For more examples, see tests/
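
The prefix binding above exists so that compact identifiers (CURIEs) such as GO:0046220 can be related to full IRIs. The following is a minimal standalone sketch of that expansion, not sparqlfun's own code; the function name expand_curie is an assumption made for illustration.

```python
def expand_curie(curie, prefixes):
    """Expand a CURIE such as 'GO:0046220' into a full IRI using a prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        # No known prefix: return the input unchanged.
        return curie
    return prefixes[prefix] + local_id


# Same prefix binding as in the Python example above.
prefixes = {"GO": "http://purl.obolibrary.org/obo/GO_"}
print(expand_curie("GO:0046220", prefixes))
# http://purl.obolibrary.org/obo/GO_0046220
```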

Service (via FastAPI)

Coming soon!

Browsing the templates

  • source is in sparqlfun/schema
    • add new templates here
  • Browse the generated markdown on the site

How it works

Basics

Templates are defined as YAML files following the LinkML schema.

A YAML file with a single template might look like this:

classes:
  my template:
    slots:
      - my_var1
      - my_var2
    annotations:
      sparql.select: |-
        SELECT  * WHERE { ... ?my_var1 ... ?my_var2}
      
slots:
  my_var1:
    description: about my var 1
  my_var2:
    description: about my var 2

This defines a template MyTemplate with two slots/parameters, and an arbitrarily complex SPARQL select query.

Note that the definitions of the slots go in a different section from the classes/templates. You are encouraged to "reuse" slots across templates.
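
To make the layout concrete, here is a small sketch that treats the YAML above as a plain Python dict and lists each template with its parameters. It only illustrates the split between the classes and slots sections; the helper names camel_case and describe_templates are hypothetical and not part of sparqlfun or LinkML.

```python
def camel_case(name):
    """Turn a space- or underscore-separated name into CamelCase."""
    words = name.replace("_", " ").split()
    return "".join(word[:1].upper() + word[1:] for word in words)


# The example schema from above, written as a Python dict.
schema = {
    "classes": {
        "my template": {"slots": ["my_var1", "my_var2"]},
    },
    "slots": {
        "my_var1": {"description": "about my var 1"},
        "my_var2": {"description": "about my var 2"},
    },
}


def describe_templates(schema):
    """Map each template name to its parameters and their descriptions."""
    described = {}
    for name, cls in schema["classes"].items():
        # Slot definitions live in the top-level 'slots' section and are
        # looked up by name, which is what makes them reusable.
        described[camel_case(name)] = {
            slot: schema["slots"][slot]["description"] for slot in cls["slots"]
        }
    return described


print(describe_templates(schema))
```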

The above can be used in queries:

sparqlfun -e ubergraph -T MyTemplate my_var2=MY_VAL

You can ground any or all of your variables on the command line. If you ground all of them, your SELECT is effectively an ASK query.
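
One way to picture grounding is as substitution of a value for a variable in the query text. The sketch below does this with a regular expression; it is a deliberately naive illustration (it would also rewrite variables inside string literals), and sparqlfun may implement grounding differently. The function name ground is an assumption.

```python
import re


def ground(query, bindings):
    """Replace ?var with its bound value; unbound variables are left alone."""
    def substitute(match):
        name = match.group(1)
        return bindings.get(name, match.group(0))

    return re.sub(r"\?(\w+)", substitute, query)


template = "SELECT * WHERE { ?subject ?predicate ?object }"

# Ground one variable: the other two stay free and are returned as results.
print(ground(template, {"predicate": "rdfs:subClassOf"}))
# SELECT * WHERE { ?subject rdfs:subClassOf ?object }

# Ground all variables: the query can only confirm or deny the triple.
fully_grounded = ground(
    template,
    {"subject": "GO:0046220", "predicate": "rdfs:subClassOf", "object": "GO:0009987"},
)
print(fully_grounded)
```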

However, the features go beyond other templating systems, and leverage the fact that LinkML is a fully-fledged rich modeling language with bindings to JSON-Schema, SHACL, ShEx, etc.

For example, you will get markdown documentation describing your templates. This documentation will be even richer if you annotate your schemas with metadata such as:

  • descriptions
  • ranges for slots
  • mappings and URIs for your templates and slots

Template Inheritance

Templates can be inherited, facilitating reuse and composition patterns.

To illustrate, consider a simple "base" template to query a triple:

triple:
    aliases:
      - statement
    description: >-
      Represents an RDF triple
    slots:
      - subject
      - predicate
      - object
    class_uri: rdf:Statement
    in_subset:
      - base table
    annotations:
      sparql.select: SELECT  * WHERE { ?subject ?predicate ?object}

This is not a particularly useful template in isolation; you may as well query directly with SPARQL. Nevertheless, it can be useful to have templates for even this simple pattern, to facilitate generation of APIs etc.

This template can be inherited, which means its slots are inherited too, eliminating some boilerplate and the need to redefine them.
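
Slot inheritance can be pictured as walking the is_a chain and collecting slots from each ancestor. The following sketch works on plain dicts and is only an illustration of the idea; in practice LinkML resolves this for you. The function name induced_slots is an assumption, and the child class shown is the "rdf type triple" template used later in this section.

```python
# Class definitions reduced to the parts that matter for inheritance.
classes = {
    "triple": {"slots": ["subject", "predicate", "object"]},
    "rdf type triple": {"is_a": "triple"},
}


def induced_slots(name, classes):
    """Return the slots of a class, including those inherited via is_a."""
    cls = classes[name]
    parent = cls.get("is_a")
    inherited = induced_slots(parent, classes) if parent else []
    # Slots declared directly on the class are appended after inherited ones.
    own = [slot for slot in cls.get("slots", []) if slot not in inherited]
    return inherited + own


print(induced_slots("rdf type triple", classes))
# ['subject', 'predicate', 'object']
```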

Inheritance allows even more powerful features using the LinkML classification_rules construct. Let's say we want to represent type triples as children of generic triples:

rdf type triple:
    is_a: triple
    description: >-
      A triple that indicates the asserted type of the subject entity
    slot_usage:
      object:
        description: >-
          The entity type
        range: class node
    classification_rules:
      - is_a: triple
        slot_conditions:
          predicate:
            equals_string: rdf:type

Note we don't need to specify a SPARQL template here - the template is autogenerated from the classification rule.
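
To see how a query could be derived from such a rule, the sketch below takes the parent's SELECT and fixes each constrained slot to its equals_string value. This is one plausible derivation written for illustration; the query that sparqlfun actually generates may be shaped differently. The function name select_from_rule is hypothetical.

```python
import re

# SELECT query of the parent "triple" template.
parent_select = "SELECT * WHERE { ?subject ?predicate ?object }"

# The classification rule from the "rdf type triple" template.
rule = {
    "is_a": "triple",
    "slot_conditions": {"predicate": {"equals_string": "rdf:type"}},
}


def select_from_rule(parent_query, rule):
    """Specialise the parent query by fixing each constrained slot."""
    query = parent_query
    for slot, condition in rule["slot_conditions"].items():
        value = condition["equals_string"]
        # Replace the whole variable ?slot (and nothing longer) by the value.
        pattern = r"\?" + re.escape(slot) + r"\b"
        query = re.sub(pattern, lambda match: value, query)
    return query


print(select_from_rule(parent_select, rule))
# SELECT * WHERE { ?subject rdf:type ?object }
```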

SPARQL CONSTRUCT and nested/inlined objects

Example CONSTRUCT query:

obo class:
    is_a: class node
    class_uri: owl:Class
    slots:
      - definition
      - exact_synonyms
    annotations:
      sparql.construct: |-
        CONSTRUCT {
          ?id a owl:Class ;
              IAO:0000115 ?definition ;
              oboInOwl:hasExactSynonym ?exact_synonyms
        }
        WHERE {
          ?id a owl:Class .
          OPTIONAL { ?id IAO:0000115 ?definition } .
          OPTIONAL { ?id oboInOwl:hasExactSynonym ?exact_synonyms } .
        }

...

slots:
  definition:
    slot_uri: IAO:0000115
  exact_synonyms:
    slot_uri: oboInOwl:hasExactSynonym
    multivalued: true

We can then query this as follows:

sparqlfun -e ubergraph -T OboClass id=GO:0000023

The results will be nested following the LinkML specification for the model:

{
  "results": [
    {
      "id": "GO:0000023",
      "definition": "The chemical reactions and pathways involving the disaccharide maltose (4-O-alpha-D-glucopyranosyl-D-glucopyranose), an intermediate in the catabolism of glycogen and starch.",
      "exact_synonyms": [
        "malt sugar metabolic process",
        "malt sugar metabolism",
        "maltose metabolism"
      ]
    }
  ],
  "@type": "ResultSet"
}
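
The nesting can be thought of as grouping the constructed triples by subject and mapping each predicate to a slot through its slot_uri. Here is a minimal sketch of that grouping on plain tuples, with the definition triple left out for brevity; it is not how sparqlfun or LinkML implement loading, and the name nest_triples is an assumption.

```python
def nest_triples(triples, slot_for_predicate, multivalued_slots):
    """Group (subject, predicate, value) triples into one dict per subject."""
    objects = {}
    for subject, predicate, value in triples:
        # Every subject becomes one result object keyed by its identifier.
        obj = objects.setdefault(subject, {"id": subject})
        slot = slot_for_predicate.get(predicate)
        if slot is None:
            continue  # e.g. rdf:type, which is not mapped to a slot here
        if slot in multivalued_slots:
            obj.setdefault(slot, []).append(value)
        else:
            obj[slot] = value
    return list(objects.values())


triples = [
    ("GO:0000023", "rdf:type", "owl:Class"),
    ("GO:0000023", "oboInOwl:hasExactSynonym", "malt sugar metabolic process"),
    ("GO:0000023", "oboInOwl:hasExactSynonym", "malt sugar metabolism"),
    ("GO:0000023", "oboInOwl:hasExactSynonym", "maltose metabolism"),
]
# Predicate-to-slot mapping, taken from the slot_uri values above.
slot_for_predicate = {
    "IAO:0000115": "definition",
    "oboInOwl:hasExactSynonym": "exact_synonyms",
}

nested = nest_triples(triples, slot_for_predicate, {"exact_synonyms"})
print(nested)
```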

You can also get the turtle as returned by the triplestore:

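@prefix ns1: <http://www.geneontology.org/formats/oboInOwl#> .
@prefix ns2: <http://purl.obolibrary.org/obo/> .
@prefix ns3: <...> .

ns2:GO_0000023 a <http://www.w3.org/2002/07/owl#Class> ;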
    ns2:IAO_0000115 "The chemical reactions and pathways involving the disaccharide maltose (4-O-alpha-D-glucopyranosyl-D-glucopyranose), an intermediate in the catabolism of glycogen and starch." ;
    ns1:hasExactSynonym "malt sugar metabolic process",
        "malt sugar metabolism",
        "maltose metabolism" .

[] a ns3:ResultSet ;
    ns3:results ns2:GO_0000023 .

With -t tsv, the LinkML CSV dumper will attempt to flatten the nested structure to TSV as closely as possible, e.g. using pipes as internal separators for multivalued slots.
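
As a rough picture of the flattening, the sketch below writes nested rows to TSV with the standard csv module and joins list values with a pipe. It is an approximation for illustration only: the exact cell syntax produced by the LinkML dumper may differ, and the function name to_tsv is an assumption.

```python
import csv
import io


def to_tsv(rows, columns):
    """Serialise dict rows as TSV, joining list values with '|'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        cells = []
        for column in columns:
            value = row.get(column, "")
            if isinstance(value, list):
                # Multivalued slot: collapse the list into a single cell.
                value = "|".join(str(item) for item in value)
            cells.append(value)
        writer.writerow(cells)
    return buffer.getvalue()


rows = [
    {
        "id": "GO:0000023",
        "exact_synonyms": [
            "malt sugar metabolic process",
            "malt sugar metabolism",
            "maltose metabolism",
        ],
    }
]
print(to_tsv(rows, ["id", "exact_synonyms"]))
```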

Modularity

LinkML allows importing, so templates can be modularized.

In the future this repo may be split up, with the bio/obo-specific features migrating to a new repo.

Use of Jinja commands

You can incorporate additional logic via Jinja2 templating instructions:

obo class filtered:
    is_a: class node
    class_uri: owl:Class
    slots:
      - definition
      - exact_synonyms
    annotations:
      sparql.construct: |-
        CONSTRUCT {
          ?id a owl:Class ;
              IAO:0000115 ?definition ;
              oboInOwl:hasExactSynonym ?exact_synonyms
        }
        WHERE {
          ?id a owl:Class .
          OPTIONAL { ?id IAO:0000115 ?definition } .
          OPTIONAL { ?id oboInOwl:hasExactSynonym ?exact_synonyms } .
          {% if query_has_subclass_ancestor %}
          ?id rdfs:subClassOf ?query_has_subclass_ancestor
          {% endif %}
        }
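
The effect of the {% if %} block is that the subClassOf pattern is only part of the query when the parameter query_has_subclass_ancestor is supplied. The sketch below mimics that decision in plain Python, without using Jinja2 itself, and shortens the WHERE clause to the parts that matter; the function name render_where is hypothetical.

```python
def render_where(params):
    """Mimic the conditional block: add the filter only if the parameter is set."""
    patterns = ["?id a owl:Class ."]
    if params.get("query_has_subclass_ancestor"):
        # Same pattern as inside {% if %} ... {% endif %} in the template.
        patterns.append("?id rdfs:subClassOf ?query_has_subclass_ancestor")
    return "WHERE { " + " ".join(patterns) + " }"


# Parameter absent: the extra pattern is dropped.
print(render_where({}))
# WHERE { ?id a owl:Class . }

# Parameter present: the extra pattern is included.
print(render_where({"query_has_subclass_ancestor": "GO:0008150"}))
# WHERE { ?id a owl:Class . ?id rdfs:subClassOf ?query_has_subclass_ancestor }
```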

Supported Endpoints

This framework can be used with any SPARQL endpoint. However, the current pre-defined templates are geared towards the combination of OBO-style ontologies together with storage patterns employed in triplestores such as ubergraph and ontobee.

In particular, ubergraph uses the relation-graph inference tool to pre-compute inferred direct triples from TBox existential axioms, allowing for simple and powerful queries over inferred ontologies.

See also

This was inspired in part by the powerful but arcane sparqlprog system.

TODOs

  • Better Document
    • framework
    • templates
    • How-tos for use with Python, SHACL, ...
    • exemplar notebooks
  • Unify with SQL/rdftab functionality in semantic-sql
  • Split into bio-specific
  • Expose more ubergraph awesomeness
  • FastAPI/serverless endpoint
  • Expose more validation
  • Integrate visualization / obographviz
  • Chaining
    • inject output from one into another and merge results, e.g. to get labels
    • similar to wikidata services
  • Templates for
    • uniprot
    • gocams
    • wikidata

Releases (v0.2.1)

  • v0.2.1 (Apr 30, 2022)

    What's Changed

    • support for local rdf graphs by @cmungall in https://github.com/linkml/sparqlfun/pull/6

    Full Changelog: https://github.com/linkml/sparqlfun/compare/v0.2.0...v0.2.1

  • v0.2.0 (Apr 28, 2022)

    What's Changed

    • missing line that is not updating pypi version by @sujaypatil96 in https://github.com/linkml/sparqlfun/pull/3
    • refactor-docs by @cmungall in https://github.com/linkml/sparqlfun/pull/4
    • endpoint docs by @cmungall in https://github.com/linkml/sparqlfun/pull/5

    New Contributors

    • @cmungall made their first contribution in https://github.com/linkml/sparqlfun/pull/4

    Full Changelog: https://github.com/linkml/sparqlfun/compare/v0.1.3...v0.2.0

  • v0.1.3 (Jan 15, 2022)

    What's Changed

    • Github Action responsible for automatically publishing PyPI releases by @sujaypatil96 in https://github.com/linkml/sparqlfun/pull/2

    Full Changelog: https://github.com/linkml/sparqlfun/compare/v0.1.2...v0.1.3

  • v0.1.2 (Jan 15, 2022)

    What's Changed

    • First official release of the package on PyPI
    • Github Action that runs the test suite by @sujaypatil96 in https://github.com/linkml/sparqlfun/pull/1

    New Contributors

    • @sujaypatil96 made their first contribution in https://github.com/linkml/sparqlfun/pull/1

    Full Changelog: https://github.com/linkml/sparqlfun/compare/v0.1.1...v0.1.2

  • v0.1.1 (Jan 11, 2022)

  • v0.1.0 (Jan 11, 2022)

Owner
Linked data Modeling Language
LinkML is a general purpose modeling language that can be used with linked data, JSON, and other formalisms