A Python client for the Softcite software mention recognizer server

Overview

Softcite software mention recognizer client

Python client for using the Softcite software mention recognition service. It can be applied to

  • individual PDF files

  • recursively to a local directory, processing all the encountered PDF

  • to a collection of documents harvested by biblio-glutton-harvester and article-dataset-builder, with the benefit of re-using the collection manifest for injectng metadata and keeping track of progress. The collection can be stored locally or on a S3 storage.

Requirements

The client has been tested with Python 3.5-3.7.

The client requires a working Softcite software mention recognition service. Service host and port can be changed in the config.json file of the client.

Install

cd software_mention_client/

It is advised to setup first a virtual environment to avoid falling into one of these gloomy python dependency marshlands:

virtualenv --system-site-packages -p python3 env

source env/bin/activate

Install the dependencies, use:

pip3 install -r requirements.txt

Usage and options

usage: software_mention_client.py [-h] [--repo-in REPO_IN] [--file-in FILE_IN]
                                  [--file-out FILE_OUT]
                                  [--data-path DATA_PATH] [--config CONFIG]
                                  [--reprocess] [--reset] [--load]
                                  [--diagnostic] [--scorched-earth]

Softcite software mention recognizer client

optional arguments:
  -h, --help            show this help message and exit
  --repo-in REPO_IN     path to a directory of PDF files to be processed by
                        the Softcite software mention recognizer
  --file-in FILE_IN     a single PDF input file to be processed by the
                        Softcite software mention recognizer
  --file-out FILE_OUT   path to a single output the software mentions in JSON
                        format, extracted from the PDF file-in
  --data-path DATA_PATH
                        path to the resource files created/harvested by
                        biblio-glutton-harvester
  --config CONFIG       path to the config file, default is ./config.json
  --reprocess           reprocessed failed PDF
  --reset               ignore previous processing states and re-init the
                        annotation process from the beginning
  --load                load json files into the MongoDB instance, the --repo-
                        in parameter must indicate the path to the directory
                        of resulting json files to be loaded
  --diagnostic          perform a full count of annotations and diagnostic
                        using MongoDB regarding the harvesting and
                        transformation process
  --scorched-earth      remove a PDF file after its successful processing in
                        order to save storage space, careful with this!

The logs are written by default in a file ./client.log, but the location of the logs can be changed in the configuration file (default ./config.json).

Processing local PDF files

For processing a single file., the resulting json being written as file at the indicated output path:

python3 software_mention_client.py --file-in toto.pdf --file-out toto.json

For processing recursively a directory of PDF files, the results will be:

  • written to a mongodb server and database indicated in the config file

  • and in the directory of PDF files, as json files, together with each processed PDF

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/

The default config file is ./config.json, but could also be specified via the parameter --config:

python3 software_mention_client.py --repo-in /mnt/data/biblio/pmc_oa_dir/ --config ./my_config.json

Processing a collection of PDF harvested by biblio-glutton-harvester

biblio-glutton-harvester and article-dataset-builder creates a collection manifest as a LMDB database to keep track of the harvesting of large collection of files. Storage of the resource can be located on a local file system or on a AWS S3 storage. The software-mention client will use the collection manifest to process these harvested documents.

  • locally:

python3 software_mention_client.py --data-path /mnt/data/biblio-glutton-harvester/data/

--data-path indicates the path to the repository of data harvested by biblio-glutton-harvester.

The resulting JSON files will be enriched by the metadata records of the processed PDF and will be stored together with each processed PDF in the data repository.

If the harvested collection is located on a S3 storage, the access information must be indicated in the configuration file of the client config.json. The extracted software mention will be written in a file with extension .software.json, for example:

-rw-rw-r-- 1 lopez lopez 1.1M Aug  8 03:26 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.pdf
-rw-rw-r-- 1 lopez lopez  485 Aug  8 03:41 0100a44b-6f3f-4cf7-86f9-8ef5e8401567.software.json

If a MongoDB server access information is indicated in the configuration file config.json, the extracted information will additionally be written in MongoDB.

License and contact

Distributed under Apache 2.0 license. The dependencies used in the project are either themselves also distributed under Apache 2.0 license or distributed under a compatible license.

Main author and contact: Patrice Lopez ([email protected])

Many discord bots serving different purposes

Discord_Botlari Farklı amaçlara hizmet eden bir çok discord botu En kapsamlı Bot Game Bottur. bir oyun botudur discord sunucularında kullanılır. (tüm

1 Dec 21, 2021
IdeasBot - Funny telegram bot to generate ideas for a project

Repository of PIdeas_bot About Funny telegram bot for generating projects ideas.

Just Koala 5 Oct 16, 2022
A Bot To Find Telegram User ID Easily

Telegram ID Bot 🤖 A Bot To Find Telegram User ID Easily Made with Python3 (C) @BXBotz Copyright permission under MIT License License - https://githu

MuFaz-TG 6 Nov 21, 2022
BingBot - A bot that will automate searches on bing

bingBot A bot that will automate searches on bing. To install this just download

Lukas 2 Jul 28, 2022
A Telegram Most Powerful Media Info Bot.

Media Info Bot Support 🚑 Demo For The Bot -Test Our Bot By Clicking The Button Below Deploy To Heroku 🗳 Press the Deploy Button to Get Your Own Bot.

Anonymous 5 May 16, 2022
Telegram client written in GTK & Python

Meowgram GTK Telegram Client 🐱 Why Meogram? Meowgram = Meow + Gram :D Meow - Talking cats sound. It's a symbol of unique and user friendly UI of clie

Artem Prokop 71 May 04, 2022
Ethereum Gas Fee for the MacBook Pro touchbar (using BetterTouchTool)

Gasbar Ethereum Gas Fee for the MacBook Pro touchbar (using BetterTouchTool) Worried about Ethereum gas fees? Me too. I'd like to keep an eye on them

TSS 51 Nov 14, 2022
A simple telegram bot to help you to remove forward tag from post from any messages . Maded in python3 using @Pyrogram . Developed by @Kunal-Diwan

Frwd-Tag-Remover Telegram Bot to Remove forward tag from any Post . If you need any more modes in repo or If you find out any bugs, mention in @Develo

Kunal Diwan 2 Oct 14, 2022
Hostapd-mac-monitor - Setup a hostapd AP to conntrol the connections of specific MACs

A brief explanation This script provides way to setup a monitoring service of sp

2 Feb 03, 2022
Python script to Funge NFTs.

Python script to Funge NFTs. It scrapes OpenSea for a given list of NFT collections and downloads a certain number of NFTs from each collection or the entire collections.

3 Apr 28, 2022
Local community telegram bot

Бот на районе Телеграм-бот для поиска адресов и заведений в вашем районе города или в небольшом городке. Требует недели прогулок по району д

Ilya Zverev 32 Jan 19, 2022
Pls give vaccine.

Pls Give Vaccine A script to spam yourself with vaccine notifications. Explore the docs » View Demo · Report Bug · Request Feature Table of Contents A

Rohan Mukherjee 3 Oct 27, 2021
Telegram Voice Chat UserBot made with Pyrogram and MarshalX/tgcalls with playlist and Heroku support

Telegram Voice Chat UserBot A Telegram UserBot to Play Audio in Voice Chats. This is also the source code of the userbot which is being used for playi

Calls Music 164 Nov 12, 2022
Auto-commiter - Auto commiter Github

auto committer Github Follow the steps below to use this repository: 1-install c

Arman Ebtekari 8 Nov 14, 2022
A simple url uploader bot with permenent thumbnail support

URL-Uploader A simple url uploader bot with permenent thumbnail support Scrapped some code from @SpEcHIDe's AnyDLBot Repository Please fork this repos

Fayas Noushad 40 Nov 29, 2021
Automatically gets clips from twitch streams and uploads them to a YouTube channel.

Twitch Stream Highlights to YT Automatic Uploader (AutoBot Clipper) This script can be used to automatically extract highlights (or clips) from a twit

Teja Swaroop 57 Dec 12, 2022
HTTP Calls to Amazon Web Services Rest API for IoT Core Shadow Actions 💻🌐💡

aws-iot-shadow-rest-api HTTP Calls to Amazon Web Services Rest API for IoT Core Shadow Actions 💻 🌐 💡 This simple script implements the following aw

AIIIXIII 3 Jun 06, 2022
Python wrappers for INHECO ODTC and SCILA libraries by INHECO GmbH.

Python wrappers for INHECO ODTC and SCILA libraries by INHECO GmbH.

1 Feb 09, 2022
Spacecrypto-bot - SpaceCrypto Bot Auto Clicker

SpaceCrypto Auto Clicker Bot Também fiz um para Luna Rush ( https://github.com/w

Walter Discher Cechinel 5 Feb 22, 2022
A file-based quote bot written in Python

Let's Write a Python Quote Bot! This repository will get you started with building a quote bot in Python. It's meant to be used along with the Learnin

1 Nov 01, 2021