A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

Bot that embeds a random hysterical meme from Reddit into your text channel as an embedded message, using an API call.

Discord_Meme_Bot 🤣 Bot that embeds a random hysterical meme from Reddit into your text channel as an embedded message, using an API call. Add the bot

2 Jan 16, 2022
This Discord bot is to give timely notifications to Students in the Lakehead CS 2021 Guild

Discord-Bot Goal of Project The purpose of this Discord bot is to give timely notifications to Students in the Lakehead CS 2021 Guild. How can I contr

8 Jan 30, 2022
doi, pubmed, arxiv.org的查询服务API接口,部署于vercel云函数

article-search-service doi, pubmed, arxiv.org的查询服务API接口,部署于vercel云函数 云函数 vercel,国内可能被qiang了。 DOI接口 POST https://article-search-service.vercel.app/api/

HyokaChen 2 Oct 10, 2021
A simple Telegram bot that can add caption to any media on your channel

Channel Auto Caption This bot can add a caption for any media/document sent to a channel. Just deploy bot and add bot as admin to a channel. Deploy to

22 Nov 14, 2022
Schedule Twitter updates with easy

coo: schedule Twitter updates with easy Coo is an easy to use Python library for scheduling Twitter updates. To use it, you need to first apply for a

wilfredinni 46 Nov 03, 2022
♻️ API to run evaluations of the FAIR principles (Findable, Accessible, Interoperable, Reusable) on online resources

♻️ FAIR enough 🎯 An OpenAPI where anyone can run evaluations to assess how compliant to the FAIR principles is a resource, given the resource identif

Maastricht University IDS 4 Oct 20, 2022
This package accesses nitrotype's official api along with its unofficial user api

NitrotypePy This package accesses nitrotype's official api along with its unofficial user api. Currently still in development. Install To install, run

The Moon That Rises 2 Sep 04, 2022
An unofficial client library for Google Music.

gmusicapi: an unofficial API for Google Play Music gmusicapi allows control of Google Music with Python. from gmusicapi import Mobileclient api = Mob

Simon Weber 2.5k Dec 15, 2022
Simple Telegram bot to confess to your crush this Valentine's Day

Simple Telegram bot to confess to your crush this Valentine's Day! Steps pip install python-telegram-bot Register a Telegram bot & get the token by fo

3 Mar 18, 2022
Notion API Database Python Implementation

Python Notion Database Notion API Database Python Implementation created only by database from the official Notion API. Installing / Getting started p

minwook 78 Dec 19, 2022
LimitatiBot - A simple telegram bot to establish a conversation with a user without having to use private chats

🤖 LimitatiBot [0.2] LimitatiBot is a simple telegram bot to establish a convers

xMrPente 9 Dec 27, 2022
Pixiv 爬虫,使用 Python 实现。支持批量下载、上传到图床。

用 Python 实现的 Pixiv 爬虫,支持批量下载和上传。 随机图片 API: https://loliapi.ml/ Deploy Github Action 集成部署 建议使用本方法部署,相较于本地部署,无需搭建环境,全程在线上完成。并且使用国外服务器下载、上传,网络更加通畅。 Fork

18 Feb 26, 2022
Tools used by Ada Health's internal IT team to deploy and manage a serverless Munki setup.

Serverless Munki This repository contains cross platform code to deploy a production ready Munki service, complete with AutoPkg, that runs entirely fr

Ada Health 17 Dec 05, 2022
Exporta archivos masivamente del TEC Digital.

TEC Digital Files Exporter Script que permite exportar los archivos de cursos del TEC Digital del Instituto Tecnológico de Costa Rica, debido al borra

Joseph Vargas 22 Apr 08, 2021
A Discord Self bot written in python

WitheredBot A Discord Self bot written in python Requirement Python = 3.9 How to Configure git clone https://github.com/a-a-a-aa/WitheredBot.git cd W

......... 0 Jan 05, 2023
This is the Best Calculator Bot!

CalculatorBot This is the Best Calculator Bot! Deploy on Heroku Variables API_HASH Your API Hash from my.telegram.org API_ID Your API ID from my.teleg

2 Dec 04, 2021
Plays air warning sound when detects a certain phrase or a word in a specified Telegram chat.

Tryvoha Bot Disclaimer: this is more a convenient naming, rather than a real bot. It is designed to play air warning sound when detects a certain phra

Dmytro Novikov 2 Mar 02, 2022
Custom bot I've made to host events on my personal Discord server.

discord_events Custom bot I've made to host events on my personal Discord server. You can try the bot out in my personal server here: https://discord.

AlexFlipnote 5 Mar 16, 2022
A program that automates the boring parts of completing the Daily accounting spreadsheet at Taos Ski Valley

TSV_Daily_App A program that automates the boring parts of completing the Daily accounting spreadsheet at my old job. To see how it works you will nee

Devin Beck 2 Jan 01, 2022
Telegram Voice Chat Music Player UserBot Written with Pyrogram Smart Plugin and tgcalls

Telegram Voice Chat UserBot A Telegram UserBot to Play Audio in Voice Chats. This is also the source code of the userbot which is being used for playi

Dash Eclipse 7 May 21, 2022