Python plugin/extra to load data files from an external source (such as AWS S3) to a local directory

Overview

Data Loader Plugin - Python

Table of Content (ToC)

Table of contents generated with markdown-toc

Overview

The data loader plugin, aims at supporting running programs (e.g., API service backends) when downloading data from cloud services such as AWS S3. It provides a base Python library, namely data-loader-plugin, offering a few methods to download data files from AWS S3.

References

Python module

Python virtual environments

Installation

Clone this Git repository

$ mkdir -p ~/dev/infra && \
  git clone [email protected]:cloud-helpers/python-plugin-data-loader.git ~/dev/infra/python-plugin-data-loader
$ cd ~/dev/infra/python-plugin-data-loader

Python environment

  • If not already done so, install pyenv, Python 3.9 and, pip and pipenv
    • PyEnv:
$ git clone https://github.com/pyenv/pyenv.git ${HOME}/.pyenv
$ cat >> ~/.profile2 << _EOF

# Python
eval "\$(pyenv init --path)"

_EOF
$ cat >> ~/.bashrc << _EOF

# Python
export PYENV_ROOT="\${HOME}/.pyenv"
export PATH="\${PYENV_ROOT}/bin:\${PATH}"
. ~/.profile2
if command -v pyenv 1>/dev/null 2>&1
then
        eval "\$(pyenv init -)"
fi
if command -v pipenv 1>/dev/null 2>&1
then
        eval "\$(pipenv --completion)"
fi

_EOF
$ . ~/.bashrc
  • Python 3.9:
$ pyenv install 3.9.8 && pyenv local 3.9.8
  • pip:
$ python -mpip install -U pip
  • pipenv:
$ python -mpip install -U pipenv

Usage

Install the data-loader-plugin module

  • There are at least two ways to install the data-loader-plugin module, in the Python user space with pip and in a dedicated virtual environment with pipenv.

    • Both options may be installed in parallel
    • The Python user space (typically, /usr/local/opt/[email protected] on MacOS or ~/.pyenv/versions/3.9.8 on Linux) may already have many other modules installed, parasiting a fine-grained control over the versions of every Python dependency. If all the versions are compatible, then that option is convenient as it is available from the whole user space, not just from this sub-directory
  • In the remainder of that Usage section, it will be assumed that the data-loader-plugin module has been installed and readily available from the environment, whether that environment is virtual or not. In other words, to adapt the documentation for the case where pipenv is used, just add pipenv run in front of every Python-related command.

Install in the Python user space

  • Install and use the data-loader-plugin module in the user space (with pip):
$ python -mpip uninstall data-loader-plugin
$ python -mpip install -U data-loader-plugin

Installation in a dedicated Python virtual environment

  • Install and use the data-loader-plugin module in a virtual environment:
$ pipenv shell
(python-...-JwpAHotb) ✔ python -mpip install -U data-loader-plugin
(python-...-JwpAHotb) ✔ python -mpip install -U data-loader-plugin
(python-...-JwpAHotb) ✔ exit

Use data-loader-plugin as a module from another Python program

  • Check the data file with the AWS command-line (CLI):
$ aws s3 ls --human s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv --no-sign-request
2021-10-29 20:44:34  249.3 MiB yellow_tripdata_2021-07.csv
  • Module import statements:
>>> import importlib
>>> from types import ModuleType
>>> from data_loader_plugin.base import DataLoaderBase
  • Create an instance of the DataLoaderBase Python class:
>>> plugin: ModuleType = importlib.import_module("data_loader_plugin.copyfile")
>>> data_loader: DataLoaderBase = plugin.DataLoader(
        local_path='/tmp/yellow_tripdata_2021-07.csv',
        external_url='s3://nyc-tlc/trip\ data/yellow_tripdata_2021-07.csv',
    )
>>> data_load_success, message = data_loader.load()

Development / Contribution

  • Build the source distribution and Python artifacts (wheels):
$ rm -rf _skbuild/ build/ dist/ .tox/ __pycache__/ .pytest_cache/ MANIFEST *.egg-info/
$ pipenv run python setup.py sdist bdist_wheel
  • Upload to Test PyPi (no Linux binary wheel can be uploaded on PyPi):
$ PYPIURL="https://test.pypi.org"
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://test.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://test.pypi.org/project/data-loader-plugin/0.0.1/
  • Upload/release the Python packages onto the PyPi repository:
    • Register the authentication token for access to PyPi:
$ PYPIURL="https://upload.pypi.org"
$ pipenv run keyring set ${PYPIURL}/ __token__
Password for '__token__' in '${PYPIURL}/':
  • Register the authentication token for access to PyPi:
$ pipenv run twine upload -u __token__ --repository-url ${PYPIURL}/legacy/ dist/*
Uploading distributions to https://upload.pypi.org/legacy/
Uploading data_loader_plugin-0.0.1-py3-none-any.whl
100%|███████████████████████████████████████| 23.1k/23.1k [00:02<00:00, 5.84kB/s]
Uploading data-loader-plugin-0.0.1.tar.gz
100%|███████████████████████████████████████| 23.0k/23.0k [00:01<00:00, 15.8kB/s]

View at:
https://pypi.org/project/data-loader-plugin/0.0.1/
$ pipenv run python setup.py build_sphinx
running build_sphinx
Running Sphinx v4.3.0
[autosummary] generating autosummary for: README.md
myst v0.15.2: ..., words_per_minute=200)
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] README
...
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] README
...
build succeeded.

The HTML pages are in build/sphinx/html.
  • Re-generate the Python dependency files (requirements.txt) for the CI/CD pipeline (currently Travis CI):
$ pipenv --rm; rm -f Pipfile.lock; pipenv install; pipenv install --dev
$ git add Pipfile.lock
$ pipenv lock -r > ci/requirements.txt
$ pipenv lock --dev -r > ci/requirements-dev.txt
$ git add ci/requirements.txt ci/requirements-dev.txt
$ git commit -m "[CI] Upgraded the Python dependencies for the Travis CI pipeline"

Test the data loader plugin Python module

  • Enter into the pipenv Shell:
$ pipenv shell
(python-...-iVzKEypY) ✔ python -V
Python 3.9.8
  • Uninstall any previously installed data-loader-plugin module/library:
(python-...-iVzKEypY) ✔ python -mpip uninstall data-loader-plugin
  • Launch a simple test with pytest
(python-iVzKEypY) ✔ python -mpytest tests
=================== test session starts ==================
platform darwin -- Python 3.9.8, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: ~/dev/infra/python-plugin-data-loader
plugins: cov-3.0.0
collected 3 items

tests/test_copyfile.py .                             [ 33%]
tests/test_s3.py ..                                  [100%]
====================== 3 passed in 1.22s ==================
  • Exit the pipenv Shell:
(python-...-iVzKEypY) ✔ exit
Owner
Cloud Helpers
Cloud helper tools and documentation
Cloud Helpers
Kunai Shitty Raider Leaked LMFAO

Kunai-Raider-Leaked Kunai Shitty Raider Leaked LMFA

5 Nov 24, 2021
A simple spyware in python.

Spyware-Python- Dependencies: Python 3.x OpenCV PyAutoGUI PyMongo (for mongodb connection) Flask (Web Server) Ngrok (helps us push our fla

Abubakar 3 Sep 07, 2022
This module is for finding the execution time of a whole python program

exetime 3.8 This module is for finding the execution time of a whole program How to install $ pip install exetime Contents: General Information Instru

Saikat Das 4 Oct 18, 2021
Extrator de dados do jupiterweb

Extrator de dados do jupiterweb O programa é composto de dois arquivos: Um constando apenas de classes complementares que representam as unidades e as

Bruno Aricó 2 Nov 28, 2022
A tool to guide you for team selection based on mana and ruleset using your owned cards.

Splinterlands_Teams_Guide A tool to guide you for team selection based on mana and ruleset using your owned cards. Built With This project is built wi

Ruzaini Subri 3 Jul 30, 2022
A python script that changes your desktop background based on current weather and time of the day.

Desktop background wallpaper, based on current weather and time A python script that changes your computer's desktop background based on current weath

Maj Gaberšček 1 Nov 16, 2021
The parser of a timetable of tennis matches for Flashscore website

FlashscoreParser The parser of a timetable of tennis matches for Flashscore website. The program collects the schedule of tennis matches for two days

Valendovsky 1 Jul 15, 2022
Helps compare between New and Old Tax Regime.

Income-Tax-Calculator Helps compare between New and Old Tax Regime. Sample Console Input/Output

2 Jan 10, 2022
Your one and only Discord Bot that helps you concentrate!

Your one and only Discord Bot thats helps you concentrate! Consider leaving a ⭐ if you found the project helpful. concy-bot A bot which constructively

IEEE VIT Student Chapter 22 Sep 27, 2022
Deis v1, the CoreOS and Docker PaaS: Your PaaS. Your Rules.

This repository (deis/deis) is no longer developed or maintained. The Deis v1 PaaS based on CoreOS Container Linux and Fleet has been replaced by Deis

Deis 6.1k Jan 04, 2023
Python library for ODE integration via Taylor's method and LLVM

heyoka.py Modern Taylor's method via just-in-time compilation Explore the docs » Report bug · Request feature · Discuss The heyókȟa [...] is a kind of

Francesco Biscani 45 Dec 21, 2022
Would upload anything I do with/related to brainfuck

My Brainfu*k Repo Basically wanted to create something with Brainfu*k but realized that with the smol brain I have, I need to see the cell values real

Rafeed 1 Mar 22, 2022
A tool that automatically creates fuzzing harnesses based on a library

AutoHarness is a tool that automatically generates fuzzing harnesses for you. This idea stems from a concurrent problem in fuzzing codebases today: large codebases have thousands of functions and pie

261 Jan 04, 2023
Change ACLs for QNAP LXD unprivileged container.

qnaplxdunpriv If Advanced Folder Permissions is enabled in QNAP NAS, unprivileged LXD containers won't start. qnaplxdunpriv changes ACLs of some Conta

1 Jan 10, 2022
A python script for practicing Toki Pona.

toki.py A python script for practicing Toki Pona. Modified from a hirigana script by ~vilmibm. Example of the script running: $ ./toki.py This script

Dustin 2 Dec 09, 2021
Stack BOF Protection Bypass Techniques

Stack Buffer Overflow - Protection Bypass Techniques

ommadawn46 18 Dec 28, 2022
Python script which allows for automatic registration in Golfbox

Python script which allows for automatic registration in Golfbox

Guðni Þór Björnsson 8 Dec 04, 2021
Tenda D151 & D301 - Unauthenticated configuration download

Exploit Title: Tenda D151 & D301 - Unauthenticated configuration download (login included)

Ayoub 3 Jul 14, 2022
Osu statistics right on your desktop, made with pyqt

Osu!Stat Osu statistics right on your desktop, made with Qt5 Credits Would like to thank these creators for their projects and contributions. ppy, osu

Aditya Gupta 21 Jul 13, 2022
This application is made solely for entertainment purposes

Timepass This application is made solely for entertainment purposes helps you find things to do when you're bored ! tells jokes guaranteed to bring on

Omkar Pramod Hankare 2 Nov 24, 2021