I³ Tracker for Essential Open Innovation Datasets

Overview

I³ Tracker for Essential Open Innovation Datasets

This repository is set up to track, version, and contribute updates to the I³ Essential Open Innovation Dataset Index, which consists of lists of datasets and tools relevant to Innovation Data. This index may be collaboratively edited, either by making edits to markdown files contained in this repository, or editing metadata in the Google Sheet.

The repository checks the Google Sheet for changes every 5min (and will update the site if there are any), and will also re-build the site automatically when somebody makes an edit via git. The site is generated from markdown files in this repository using the static site generator Jekyll.

Add/edit a Dataset using Git

Each record in the index has a corresponding markdown file (auto-generated) in the folder datasets/. These files contain the basic metadata associated with the record in the frontmatter, and also allow more long-form information, such as details of queries, images, and other written information, to be added. Both of these things are editable.

When a markdown file is added to the datasets/ folder, a GitHub action publishes the metadata in the frontmatter to the Google Sheet, and to the archive csv, to keep the records up to date. This script calls various metadata scrapers to automatically pull information like permalinks, citation information, and versioning. Once the file has been successfully committed, a second action will run to refresh the state of the website to reflect the edits.

Contribution Steps

  1. fork the repository, create a markdown in the folder 'datasets' based on the template file
  2. add as much metadata as you like, and create a pull request in this repository
  3. all being well, this should automatically merge. if not, you can check the GitHub actions log, or open an issue. (make sure it's in the correct folder, and has a .md file extension before doing so)

If the dataset is hosted on a platform with parseable citation metadata (Dataverse, Zenodo, ICPSR, and major university repositories are examples of these), then the tool will automatically pull most of the data associated with the dataset -- fields that will auto-fill are indicated by a comment. If the dataset is hosted on e.g. a personal site, then you might want to include some more information -- but ultimately, only a title and URL is really necessary. However you fill out your dataset, a uuid and timestamp will be generated for it automatically; these aren't fields you need to include (hence not included in the template).

The reason we've done this is to save you from copy-pasting a lot of information from existing repositories, and to make it easier for you to curate more useful and harder-to-scrape metadata -- such as the timeframes of datasets, links to code and documentation, and datasets that might be built on top of it but don't use an easy-to-parse citation. So definitely prioritise these fields!

If you're unsure of how to make a pull request, github has some good guides to doing this. You can also just make an edit to the Google Sheet, which will have an equivalent effect.

If there's a piece of metadata you think we should collect but don't, please add it to the frontmatter of the markdown files you contribute. (Nothing will break!) Then open an issue mentioning the new field, so that we can discuss adding it to the repository officially too.

To contribute a new dataset via pull request, please use the template file datasets/_template.md as a reference:

---
title: #required
url: #required
doi: #scrapeable
citation: #scrapeable
description: #scrapeable
timeframe:
documentation:
error_metrics:
code:
versioning: #scrapeable
terms_of_use: #scrapeable
tags:
references:
---


body text. info about `queries`, links and images goes here :)

Collections

The site also indexes collections, which are pages containing thematic information about datasets, tools and resources. These are housed in the folder collections/. The collection intro.md is an example -- this particular collection is also rendered on the front page of the site.

In the same manner as datasets, collection files can be added or edited using pull requests, where the repository is forked, and additions or edits to the collections can be made. The collections are not currently tracked via Google Sheets, and so may only be edited via git.

To create a new collection, the collection template may be copied to use as a reference:

---
title:
author:
tags:
---

Collections are a way to list resources around a theme, relevant to a research agenda or set of papers, or as an introduction to various aspects of the field. They are formatted in markdown:

To list a dataset that's in the index, use a relative link, e.g.

```markdown
[local dataset name](/datasets/dataset_shortname)

Dataset shortnames can be found either by looking at the urls directly, or through the 'shortnames' column of the Google Sheet.

Index

A versioned .csv file containing the index may be accessed in the folder index_archive. If you'd like to browse and query either sheet, you can do so using Github's Flat Data tool here. The Github Action that pulls the sheet is based on Dolthub's Gsheets-to-csv action.

Screenshot 2021-07-13 at 13 35 49

Modify the value and status of the records KoboToolbox

Modify the value and status of the records KoboToolbox (Modica el valor y status de los registros de KoboToolbox)

1 Oct 30, 2021
Implementation of the Folders📂 esoteric programming language, a language with no code and just folders.

Folders.py Folders is an esoteric programming language, created by Daniel Temkin in 2015, which encodes the program entirely into the directory struct

Sina Khalili 425 Dec 17, 2022
Python samples for Google Cloud Platform products.

Google Cloud Platform Python Samples Python samples for Google Cloud Platform products. Setup Install pip and virtualenv if you do not already have th

Google Cloud Platform 6k Jan 03, 2023
A practice program to find the LCM i.e Lowest Common Multiplication of two numbers using python without library.

Finding-LCM-using-python-from-scratch Here, I write a practice program to find the LCM i.e Lowest Common Multiplication of two numbers using python wi

Sachin Vinayak Dabhade 4 Sep 24, 2021
Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

1 Jan 11, 2022
Estimating the potential photovoltaic production of buildings (in Berlin)

The following people contributed equally to this repository (in alphabetical order): Daniel Bumke JJX Corstiaen Versteegh This repository is forked on

Daniel Bumke 6 Feb 18, 2022
Create a simple program by applying the use of class

TUGAS PRAKTIKUM 8 💻 Nama : Achmad Mahfud NIM : 312110520 Kelas : TI.21.C5 Perintah : Buat program sederhana dengan mengaplikasikan pengguna

Achmad Mahfud 1 Dec 23, 2021
Animations made using manim-ce

ManimCE Animations Animations made using manim-ce The code turned out to be a bit complicated than expected.. It can be greatly simplified, will work

sparshg 17 Jan 06, 2023
A rough GSL work DynSAGE of my graduation project

DynSAGE Codes w.r.t DynSAGE-Diffuse can be found in function apply_dyn_model_v2 of src/utils.py. The training entrance is Line 144 - 155 of src/main.p

Yuhan Wang 3 Mar 22, 2022
Convert ldapdomaindump to Bloodhound

ldd2bh Usage usage: ldd2bh.py [-h] [-i INPUT_FOLDER] [-o OUTPUT_FOLDER] [-a] [-u] [-c] [-g] [-d] Convert ldapdomaindump to Bloodhoun

64 Oct 30, 2022
A good Tool to comment on xmw

A good Tool to comment on xmw

1 Feb 10, 2022
A shim for the typeshed changes in mypy 0.900

types-all A shim for the typeshed changes in mypy 0.900 installation pip install types-all why --install-types is annoying, this installs all the thin

Anthony Sottile 28 Oct 20, 2022
Virtual Assistant Using Python

-Virtual-Assistant-Using-Python Virtual desktop assistant is an awesome thing. If you want your machine to run on your command like Jarvis did for Ton

Bade om 1 Nov 13, 2021
PyDy, short for Python Dynamics, is a tool kit written in the Python

PyDy, short for Python Dynamics, is a tool kit written in the Python programming language that utilizes an array of scientific programs to enable the study of multibody dynamics. The goal is to have

PyDy 307 Jan 01, 2023
A simple Python script for generating a variety of hashes from safe urandom entropy.

Hashgen A simple Python script for generating a variety of hashes from safe urandom entropy. For whenever you need a random hash (e.g. generating an a

Xanspie 1 Feb 17, 2022
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Apache Airflow Apache Airflow (or simply Airflow) is a platform to programmatically author, schedule, and monitor workflows. When workflows are define

The Apache Software Foundation 28.6k Dec 28, 2022
Example platform plugin that fixes fentry calls in Binja

Example Binja Platform Plugin This is an example Binja platform plugin which fixes up linux kernel module calls to __fentry__. __fentry__ is the linux

_yrp 2 Oct 07, 2021
Telegram bot for Urban Dictionary.

Urban Dictionary Bot @TheUrbanDictBot A star ⭐ from you means a lot to us! Telegram bot for Urban Dictionary. Usage Deploy to Heroku Tap on above butt

Stark Bots 17 Nov 24, 2022
Bitflip Fault Simulation Platform by Daniele Rizzieri (2021)

SEE Injection Framework 2021 This repository contains two Single Event Effect (SEE) injection platforms. The first one is called BFSP - "Bitflip Fault

Daniele Rizzieri 2 Nov 05, 2022
Algorand Python API examples

Algorand-Py Algorand Python API examples This repo will hold example scripts to monitor activities on Algorand main net. You can: Monitor your assets

Karthik Dutt 2 Jan 23, 2022