Unified file system operation experience for different backends

Overview

megfile - Megvii FILE library


megfile provides a smooth, consistent way to operate on files across different backends (currently the local file system and OSS), which lets you focus on the logic of your own project instead of the question "Which backend is used for this file?"

megfile provides:

  • A nearly unified file system operation experience. A target path can easily be moved from the local file system to OSS.
  • Thorough boundary case handling. megfile handles even difficult boundary conditions, including ones you may not have thought of.
  • Complete type hints and built-in documentation, so you can rely on your IDE's auto-completion and static checking.
  • Semantic versioning and an upgrade guide, which let you adopt the latest features easily.
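
The idea of a unified interface over several backends can be illustrated with a small dispatch-by-protocol sketch. This is not megfile's implementation: the names sketch_open, split_protocol and roundtrip, and the in-memory "mem" protocol standing in for a remote store, are assumptions made up for this illustration.

```python
import io

# In-memory stand-in for a remote object store (hypothetical, illustration only).
_MEMORY = {}


class _MemoryWriter(io.StringIO):
    """Text buffer that saves its content into _MEMORY when closed."""

    def __init__(self, key):
        super().__init__()
        self._key = key

    def close(self):
        # close() may run more than once, so only store the value the first time.
        if not self.closed:
            _MEMORY[self._key] = self.getvalue()
        super().close()


def _open_local(path, mode):
    return open(path, mode)


def _open_memory(key, mode):
    if "w" in mode:
        return _MemoryWriter(key)
    return io.StringIO(_MEMORY[key])


# One opener per protocol; the caller never picks a backend explicitly.
_OPENERS = {"file": _open_local, "mem": _open_memory}


def split_protocol(path):
    """Return (protocol, rest); paths without '://' are treated as local files."""
    if "://" in path:
        protocol, rest = path.split("://", 1)
        return protocol, rest
    return "file", path


def sketch_open(path, mode="r"):
    """Open a path with whichever backend its protocol selects."""
    protocol, rest = split_protocol(path)
    try:
        opener = _OPENERS[protocol]
    except KeyError:
        raise ValueError("unsupported protocol: %r" % protocol)
    return opener(rest, mode)


def roundtrip(path, text):
    """Write text to a path and read it back through the same interface."""
    with sketch_open(path, "w") as fp:
        fp.write(text)
    with sketch_open(path) as fp:
        return fp.read()
```

In this sketch the calling code (roundtrip) is identical for a local path and for a "mem://" path; only the string passed in changes, which is the property the list above describes.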

megfile's advantages are:

  • smart_open can open resources over various protocols, including fs, s3, http(s) and stdio. In particular, the s3 reader and writer in megfile are implemented with multiple threads, which makes them faster than known competitors.
  • smart_glob is available on s3, and it supports the zsh extended pattern syntax of {}, e.g. s3://bucket/video.{mp4,avi}.
  • All-inclusive functions such as smart_exists / smart_stat / smart_sync. If you can't find the function you need, submit an issue.
  • Compatibility with the pathlib.Path interface; see S3Path and SmartPath.
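
To make the brace pattern in the smart_glob item concrete, here is a minimal standard-library sketch of how a pattern such as video-?.{mp4,avi} can be read: each {a,b} group is expanded into separate patterns, which are then matched one by one. The helpers expand_braces and brace_match are hypothetical names for this illustration and do not describe how megfile implements smart_glob.

```python
import re
from fnmatch import fnmatchcase

# Matches one brace group that contains no nested braces, e.g. "{mp4,avi}".
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern):
    """Expand every {a,b,...} group in the pattern into a list of plain patterns."""
    match = _BRACE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    results = []
    for option in match.group(1).split(","):
        # Recurse so that patterns with several groups are fully expanded.
        results.extend(expand_braces(head + option + tail))
    return results


def brace_match(name, pattern):
    """Return True if name matches any expansion of the brace pattern."""
    return any(fnmatchcase(name, p) for p in expand_braces(pattern))
```

For example, expand_braces('s3://bucket/video.{mp4,avi}') yields the two patterns s3://bucket/video.mp4 and s3://bucket/video.avi, and brace_match('video-1.mp4', 'video-?.{mp4,avi}') is true because ? matches exactly one character.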

Quick Start

Here's an example of writing a file to OSS, syncing to local, reading and finally deleting it.

from megfile import smart_open, smart_exists, smart_sync, smart_remove, smart_glob
from megfile.smart_path import SmartPath

# open a file in an s3 bucket for writing
with smart_open('s3://playground/refile-test', 'w') as fp:
    fp.write('refile is not silver bullet')

# test whether the file exists in the s3 bucket
smart_exists('s3://playground/refile-test')

# copy files or directories
smart_sync('s3://playground/refile-test', '/tmp/playground')

# remove files or directories
smart_remove('s3://playground/refile-test')

# glob files or directories in s3 bucket
smart_glob('s3://playground/video-?.{mp4,avi}')

# or in local file system
smart_exists('/tmp/playground/refile-test')

# smart_open also supports protocols like http / https
smart_open('https://www.google.com')

# SmartPath interface
path = SmartPath('s3://playground/megfile-test')
if path.exists():
    with path.open('rb') as f:  # binary mode, so read() returns bytes
        result = f.read(7)
        assert result == b'megfile'

Installation

PyPI

pip3 install megfile

You can also specify the megfile version:

pip3 install "megfile~=0.0"

Build from Source

megfile can also be installed from source:

git clone git@github.com:megvii-research/megfile.git
cd megfile
pip3 install -U .

Development Environment

git clone git@github.com:megvii-research/megfile.git
cd megfile
sudo apt install libgl1-mesa-glx libfuse-dev fuse
pip3 install -r requirements.txt -r requirements-dev.txt

How to Contribute

  • We welcome everyone to contribute code to the megfile project. Contributed code should meet the following conditions as far as possible:

    You can submit code even if it doesn't meet these conditions. The project members will evaluate it and help you make the necessary changes.

    • Code format: Your code needs to pass the code format check. megfile uses yapf as its formatting tool, with the version locked at 0.27.0. The version lock may be removed in the future.

    • Static check: Your code needs complete type hints. megfile uses pytype as its static check tool. If pytype fails the static check, use # pytype: disable=XXX to suppress the error, and please tell us why you disabled it.

      Note: Because pytype doesn't support variable type annotations, the variable annotation syntax introduced in Python 3.6 cannot be used.

      For example, variable: int is invalid; replace it with variable # type: int

    • Test: Your code needs complete unit test coverage. megfile uses pyfakefs and moto to simulate the local file system and OSS in unit tests. Newly added code should come with complete unit tests to ensure its correctness.

  • You can help to improve megfile in many ways:

    • Write code.
    • Improve documentation.
    • Report or investigate bugs and issues.
    • If you find a problem or have a suggestion for improvement, submit a new issue. We will reply as soon as possible and evaluate whether to adopt it.
    • Review pull requests.
    • Star megfile repo.
    • Recommend megfile to your friends.
    • Any other form of contribution is welcome.
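
The variable type hint convention from the static check item above can be sketched as follows. The function count_words is a made-up example, not code from megfile; it keeps ordinary annotations on the function signature and uses a type comment for the local variable.

```python
from typing import Dict, List


def count_words(lines: List[str]) -> Dict[str, int]:
    """Count how often each whitespace-separated word occurs."""
    # Type comment instead of the Python 3.6 form `counts: Dict[str, int] = {}`.
    counts = {}  # type: Dict[str, int]
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

A type comment is attached to the assignment it describes, so the variable still needs a value on the same line.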
Owner
MEGVII Research