Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

Overview

open(LARGE)

Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

Motivation

Oftentimes, especially when working with data-heavy applications, large files can proliferate in a repository. Version controlling them is an obvious next step, however, GitHub's git LFS implementation doesn't support the deletion of large files, making it easy for them to eat-up the LFS quota and explode the size of your repos.

Solution

pip install open-large

Simple example

from open_large import LargeFile

LargeFile.configure_credentials({
    "aws_region_name": "your_region_like_eu-west-2",
    "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
    "aws_secret_access_key": "YOUR_VERY_SECRET_ACCESS_KEY",
    "large_files_bucket_name": "create_a_bucket_and_put_its_name_here",
})

# Creates a new version and deletes the older version leaving the 3 most recently used intact
with LargeFile("test.txt", "w", keep_last_n=3) as f:
    for i in range(100000):
        f.write('test\n')

# By default the latest version is returned
# but an optional `version` keyword argument can be provided as well
with LargeFile("test.txt", "r") as f:
    print(f.readlines()[0])

Automatically creates a file, writes to it, uploads it to S3, and then queries the most recent version of it. In this case, the latest version is already in the local cache, no download is required.

More details

LargeFile behaves like an opened file (in the background it is a temp file after all). Binary reading and writing is supported along with the different keywords open() accepts.

The local cache can be configured with these properties:

LargeFile.cache_path = Path('.cache')
LargeFile.max_cache_size = "30 GB"

I only need a path

In case you only need a path to the "remote" file, this pattern can be applied:

path_to_model = LargeFile("folder-of-my-bert-model", version=31).get()

This will first download the file/folder into your local cache folder. Then, it returns a Path object to the local version. Which can be turned into a string with str(path_to_model).

The same approach works for uploads:

LargeFile("folder-of-my-bert-model").push('path_to_local/folder_or_file')

This way, both regular files and folders can be handled. The uploaded file is called folder-of-my-bert-model, the local name is ignored.

Lastly, all version of the remote object can be deleted by calling LargeFile("my-file").delete(). It will still reside in your local cache afterwards, its deletion will happen next time your local cache has to be pruned.

Command-line example

The package can be used as a module from the command-line to give you more flexibility.

Setup

Create an .ini file (or use ~/.aws/credentials). It may look like this:

[DEFAULT]
aws_region_name = your_region_like_eu-west-2
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_VERY_SECRET_ACCESS_KEY
large_files_bucket_name = my_large_files

Just like in example secrets.

Print the expected options

python3 -m open_large --help

Upload some files

python3 -m open_large --secrets secrets.ini --push my_first_file.json folder/my_second_file my_folder

Only the filename is used as the S3 name, the rest of the path is ignored.

Download some files to the local cache

This can be useful when building a Docker image for example. This way, the files can already reside inside the container and need not be downloaded later.

python3 -m open_large --secrets ~/.aws/credentials --cache my_first_file.json:3 my_second_file my_folder:0

Versions may be specified by using :-s.

Delete remote files

python3 -m open_large --secrets ~/.aws/credentials --delete my_first_file.json
File Downloader

File Downloader Watches a file containing download links and runs a command to download them. The link file is in form of: # comment DOWNLOAD_LINK

Pouriya 1 Jan 08, 2022
YoutubeDownloader - Repo for downloading YT audio and videos

YoutubeDownloader Downloads video/playlist/audio from youtube url. install all t

Anuj SP 2 Feb 17, 2022
Simple python script to download .mp3 formatted files from YouTube video URLs

Introduction: Simple python script to download .mp3 formatted files from YouTube video URLs Requirements: Requires: youtube_dl module Requires: ffmpeg

Pat 2 Aug 18, 2022
This project is helps to download contents from Streamtape by utilizing the API

It scrapes Streamtape api and download contents from the site.

Debiprasad Das 5 Dec 28, 2022
Animoo - Python scraper made with BeautifulSoup4 that scrapes images from /c/.

Animoo - Python scraper made with BeautifulSoup4 that scrapes images from /c/. Features Scrapes 10 pages Scrapes each thread Downloads all the images

aether 1 Dec 29, 2021
Youtube Downloader is a simple but highly efficient Youtube Video Downloader, made completly using Python

Youtube Downloader is a simple but highly efficient Youtube Video Downloader, made completly using Python

Arsh 2 Nov 26, 2022
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Rohn Chatterjee 35 Jul 20, 2022
Python script for downloading audio from YouTube songs/videos.

Python script for downloading audio from YouTube songs/videos. All you have to do is specify the path to your folder and then type song's/video's name and the sound will be downloaded into your folde

Mateusz Polis 0 Oct 05, 2022
A YouTube downloader app built with Django.

YouTube Downloader ⭐️ Star this project ⭐️ Requirements Python3+ Git Installation Install the dependencies and start the server. git clone https://git

Gabriel Tavares 26 Aug 19, 2022
The PornHub Downloader is a powerfull script used to download and manage both videos and pictures

The PornHub Downloader is a powerfull script used to download and manage both videos and pictures

16 Aug 31, 2022
Search the gallerys by tag and download pictures to the local

booruDownloader Search the gallerys by tag and download pictures to the local

6 Jun 30, 2022
Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included

WV-AMZN-4K-RIPPER Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included For CDM You can Mail :- 11 Dec 23, 2021

Tool To download - Amazon - Netflix- Disney+ - VideoLand - Boomerang - RTE.ie

vinetrimmer Widevine Decryption Script for Python Modules Amazon Netflix (with [email protected]

9 Dec 31, 2021
Python based YouTube video Downloader GUI Application.

Youtube video Downloader Python based Youtube video Downloader GUI Application. Installation Python Dependencies Import pytube pip install pytube Im

Naem Azam 1 Jan 03, 2022
Ripurei is a free-to-use osu! replay downloader, that can be configured to download from any osu! server.

Ripurei Ripurei is a fully functional osu! replay downloader, fully capable of downloading from almost any osu! server. Functionality Timeline ✔️ Able

Thomas 0 Feb 11, 2022
A Fast as F*** Downloader

FAFD A Fast as F*** Downloader Github Usages You'll want to use a URL like this: https://github.com/RPowell-C/FAFD/raw/main/FAFD.py It's easier DONT F

1 Jan 19, 2022
Persepolis Download Manager is a GUI for aria2.

Persepolis Download Manager Content About FAQ Screenshots Credits About Persepolis is a download manager & a GUI for Aria2. It's written in Python. Pe

Persepolis 5.6k Dec 31, 2022
A cli tool to download purchased products from the DLsite.

dlsite-downloader A cli tool to download purchased products from the DLsite. How can I use? This program runs with configurations defined at settings.

AcrylicShrimp 9 Dec 23, 2022
Twayback: Downloading deleted Tweets from the Wayback Machine, made easy

Finding and downloading deleted Tweets takes a lot of time. Thankfully, with this tool, it becomes a piece of cake! 🎂

126 Dec 27, 2022
Tool To download 4KHDR DV SDR from AppleTV

# APPLE-TV 4K Downloader Tool To download 4K HDR DV SDR from AppleTV Hello Fellow Developers/ ! Hi! My name is WVDUMP. I am Leaking the scripts to

5 Dec 25, 2021