Extract an archive file (zip file or tar file) stored on AWS S3

Overview

S3 Extract

Extract an archive file (zip file or tar file) stored on AWS S3.

Details

Downloads archive from S3 into memory, then extract and re-upload to given destination.

The following S3 information is expected to be given as Environment Variables:

  • AWS_ENDPOINT_URL
  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • CERT_PATH (optional)
  • CERT_DL_URL (optional)
  • SIGNATURE_VERSION (optional, defaults to "s3v4")
  • REGION_NAME (optional, defaults to "us-east-1")

Additionally, these information needed for ClearML remote execution can also be given as Env Variable (optional, args will override env var):

  • DEFAULT_DOCKER_IMG
  • DEFAULT_QUEUE

For those not familiar, environment variables can be set through various ways, some being:

  • export = in terminal
  • can be set in ~/.bashrc as well for more permanence

Iteratively extracting a folder of zips/tars is supported as well, through --src-is-dir flag.

Usage

usage: run.py [-h] [--src-is-dir] [--dst-bucket DST_BUCKET] [--verbose] [--remote] [--clml-proj CLML_PROJ] [--clml-task-name CLML_TASK_NAME] [--clml-task-type CLML_TASK_TYPE]
              [--docker-img DOCKER_IMG] [--queue QUEUE]
              src_bucket src_path dst_path

positional arguments:
  src_bucket            Source bucket
  src_path              Source path
  dst_path              Destination path

optional arguments:
  -h, --help            show this help message and exit
  --src-is-dir          Flag to indicate that given src path is a directory. Will iteratively extract any files in it ending with .zip or .tar.
  --dst-bucket DST_BUCKET
                        Destination bucket (optional), will default to Source bucket.
  --verbose             print out current upload filename as it progresses
  --remote              use clearml to remotely run job
  --clml-proj CLML_PROJ
                        ClearML Project Name
  --clml-task-name CLML_TASK_NAME
                        ClearML Task Name
  --clml-task-type CLML_TASK_TYPE
                        ClearML Task Type, e.g. training, testing, inference, etc
  --docker-img DOCKER_IMG
                        Base docker image to pull for ClearML remote execution
  --queue QUEUE         ClearML remote execution queue

Example usage:

python run.py my-bucket dataset/coco/images.tar dataset/coco/ --verbose --remote --clml-proj coco --clml-task-name coco_extraction --docker-img ubuntu/20.04 --queue 1xGPU
Owner
Evan
Evan
A tool written in python to generate basic repo files from github

A tool written in python to generate basic repo files from github

Riley 7 Dec 02, 2021
LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

Jose Rodriguez 6 Mar 05, 2022
Test app for importing contact information in CSV files.

Contact Import TestApp Test app for importing contact information in CSV files. Explore the docs » · Report Bug · Request Feature Table of Contents Ab

1 Feb 06, 2022
Convert All TXT Files To One File.

AllToOne Convert All TXT Files To One File. Hi 👋 , I'm Alireza A Python Developer Boy 🔭 I’m currently working on my C# projects 🌱 I’m currently Lea

4 Jun 07, 2022
Import Python modules from any file system path

pathimp Import Python modules from any file system path. Installation pip3 install pathimp Usage import pathimp

Danijar Hafner 2 Nov 29, 2021
A wrapper for DVD file structure and ISO files.

vs-parsedvd DVDs were an error. A wrapper for DVD file structure and ISO files. You can find me in the IEW Discord server

7 Nov 17, 2022
CredSweeper is a tool to detect credentials in any directories or files.

CredSweeper is a tool to detect credentials in any directories or files. CredSweeper could help users to detect unwanted exposure of credentials (such as personal information, token, passwords, api k

Samsung 54 Dec 13, 2022
Nintendo Game Boy music assembly files parser into musicxml format

GBMusicParser Nintendo Game Boy music assembly files parser into musicxml format This python code will get an file.asm from the disassembly of a Game

1 Dec 11, 2021
This is just a GUI that detects your file's real extension using the filetype module.

Real-file.extnsn This is just a GUI that detects your file's real extension using the filetype module. Requirements Python 3.4 and above filetype modu

1 Aug 08, 2021
Python's Filesystem abstraction layer

PyFilesystem2 Python's Filesystem abstraction layer. Documentation Wiki API Documentation GitHub Repository Blog Introduction Think of PyFilesystem's

pyFilesystem 1.8k Jan 02, 2023
Simple addon to create folder structures in blender.

BlenderCreateFolderStructure Simple Add-on to create a folder structure in Blender. Installation Download BlenderCreateFolderStructure.py Open Blender

Dominik Strasser 2 Feb 21, 2022
Python function to construct a ZIP archive with on the fly - without having to store the entire ZIP in memory or disk

Python function to construct a ZIP archive with on the fly - without having to store the entire ZIP in memory or disk

Department for International Trade 34 Jan 05, 2023
PyDeleter - delete a specifically formatted file in a directory or delete all other files

PyDeleter If you want to delete a specifically formatted file in a directory or delete all other files, PyDeleter does it for you. How to use? 1- Down

Amirabbas Motamedi 1 Jan 30, 2022
organize - The file management automation tool

organize - The file management automation tool

Thomas Feldmann 1.5k Jan 01, 2023
This project is a set of programs that I use to create a README.md file.

🤖 codex-readme 📜 codex-readme What is it? This project is a set of programs that I use to create a README.md file. How does it work? It reads progra

Tom Dörr 224 Jan 07, 2023
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 07, 2022
A Python script to backup your favorite Discord gifs

About the project Discord recently felt like it would be a good idea to limit the favorites to 250, which made me lose most of my gifs... Luckily for

4 Aug 03, 2022
Two scripts help you to convert csv file to md file by template

Two scripts help you to convert csv file to md file by template. One help you generate multiple md files with different filenames from the first colume of csv file. Another can generate one md file w

2 Oct 15, 2022
Swiss army knife for Apple's .tbd file manipulation

Description Inspired by tbdswizzler, this simple python tool for manipulating Apple's .tbd format. Installation python3 -m pip install --user -U pytbd

10 Aug 31, 2022
Powerful Python library for atomic file writes.

Powerful Python library for atomic file writes.

Markus Unterwaditzer 313 Oct 19, 2022