A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Overview

WebDataset

WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives, using only sequential/streaming data access. This brings substantial performance advantages in many compute environments, and it is essential for very large scale training.

While WebDataset scales to very large problems, it also works well with smaller datasets and simplifies creation, management, and distribution of training data for deep learning.

WebDataset implements the standard PyTorch IterableDataset interface and works with the PyTorch DataLoader. Access to datasets is as simple as:

import torch
import webdataset as wds

dataset = wds.WebDataset(url).shuffle(1000).decode("torchrgb").to_tuple("jpg;png", "json")
dataloader = torch.utils.data.DataLoader(dataset, num_workers=4, batch_size=16)

for inputs, outputs in dataloader:
    ...

In that code snippet, url can refer to a local file, a local HTTP server, a cloud storage object, an object on an object store, or even the output of arbitrary command pipelines.
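
For illustration, here are a few possible forms of url (a sketch only; the paths and bucket names below are placeholders, relying on webdataset's documented "pipe:" and brace-expansion conventions):

import webdataset as wds

url = "./shards/dataset-000000.tar"                         # local file
url = "http://localhost:8080/dataset-{000000..000099}.tar"  # HTTP server, with brace expansion over shards
url = "pipe:gsutil cat gs://my-bucket/dataset-000123.tar"   # output of an arbitrary command pipeline
dataset = wds.WebDataset(url)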

WebDataset fulfills a similar function to TensorFlow's TFRecord/tf.Example classes, but it is much easier to adopt because it does not actually require any data conversion: data is stored in exactly the same format inside tar files as it is on disk, and all preprocessing and data augmentation code remains unchanged.
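
Because no conversion is involved, a shard can be assembled from existing files with nothing but Python's standard tarfile module. A minimal sketch (the file names are illustrative and assumed to exist on disk, grouped by a shared basename per sample):

import tarfile

# a WebDataset shard is just a tar file in which all files belonging to one
# sample share the same basename (sample0000.jpg pairs with sample0000.json)
with tarfile.open("dataset-000000.tar", "w") as tar:
    for key in ["sample0000", "sample0001"]:
        tar.add(f"{key}.jpg")   # the image exactly as it exists on disk
        tar.add(f"{key}.json")  # the matching metadata, same basename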

Documentation

Installation

$ pip install webdataset

For the GitHub version:

$ pip install git+https://github.com/tmbdev/webdataset.git

Introductory Videos

Here are some videos talking about WebDataset and large scale deep learning:

Related Libraries and Software

The AIStore server provides an efficient backend for WebDataset; it functions like a combination of web server, content distribution network, P2P network, and distributed file system. Together, AIStore and WebDataset can serve input data from rotational drives distributed across many servers at the speed of local SSDs to many GPUs, at a fraction of the cost. We can easily achieve hundreds of MBytes/s of I/O per GPU even in large, distributed training jobs.

The tarproc utilities provide command line manipulation and processing of webdatasets and other tar files, including splitting, concatenation, and xargs-like functionality.

The tensorcom library provides fast three-tiered I/O; it can be inserted between AIStore and WebDataset to permit distributed data augmentation and I/O. It is particularly useful when data augmentation requires more CPU than the GPU server has available.

You can find the full PyTorch ImageNet sample code converted to WebDataset at tmbdev/pytorch-imagenet-wds.

Comments
  • Unable to reach maximum download speed

    I have an image dataset of about 250 1GiB tars on GCS, which I'm loading by using urls = f'pipe:gsutil cat {urls}'.

    I tested the maximum download speed with gsutil -m cp ... and got about 750MiB/s with 96 processes (on a 96 vCPU GCP VM), which would amount to about 15000 images/s. I'm testing the data I/O only, no accelerators.

    However, using wds.MultiDataset (96 workers) with basic decode, to_tuple etc steps (and minimal cropping with albumentations) and batching to 256 results in much slower download & processing, about 2500 images/s (this is actually already really fast but I'd like to try to get it even faster :D ).

    So I'm not bottlenecked by download speed, and CPU utilization for the 96 processes shows <30%, so I'm not bottlenecked by the CPU either...

    I profiled the code with pyinstrument and the bottleneck seems to be somewhere in the batching (profiler screenshot omitted).

    I'm working on non-public data, but basically the same thing happens with the openimages example from the docs.

    So given that gsutil -m results in fast download speeds but webdataset doesn't, and both use Python (?) multiprocessing, maybe gsutil is doing something more efficiently here? I don't really know the multiprocessing libraries well at all, so I'm at a bit of a loss here...

    opened by harpone 11
  • syntax error in 0.2.4 release

    Getting a syntax error at .../webdataset/cache.py, line 43: while data := stream.read(chunk_size). It looks like := is only valid syntax from Python 3.8 on. Is webdataset limited to Python >= 3.8?
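
    For reference, a pre-3.8 spelling of that line would look roughly like this (a sketch with a stand-in stream object):

    import io

    stream = io.BytesIO(b"example data")  # stands in for the real input stream
    chunk_size = 4
    while True:                 # equivalent of: while data := stream.read(chunk_size):
        data = stream.read(chunk_size)
        if not data:
            break
        print(data)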

    enhancement 
    opened by czmrand 9
  • npy support

    In many cases, reading tensor data using numpy.load is many times faster than loading a pickle file. Would it be possible to add it to the basic_handlers?

    enhancement 
    opened by shayanfazeli 7
  • DDP training problem

    Hi, how can I make sure the batches generated by WebLoader contain different data on each node when using multi-node training? Is there any up-to-date example code using webdataset for DDP training?

    opened by marson666 6
  • Adding option to shuffle shards before splitting to workers, based on current epoch

    What

    This PR adds an option to shuffle the shards before splitting them between nodes and workers. We coordinate this shuffling via a shared random seed to ensure shards are properly distributed between workers. With epoch-based training loops, the idea is to set this random seed to a different number every epoch, ensuring shards are distributed differently at each iteration. This is completely optional and should not affect existing workflows in any way. If users choose to use it, the only change they need to make is to call dataloader.set_epoch(epoch) at each epoch before iterating.

    Why

    Without this, each worker always sees the same subset of shards at every epoch when using the standard nodesplitter and split_by_worker. We found this to have a drastic negative impact on performance when training contrastive models, since it limits the possibilities of which datapoints are contrasted. With this fix, the issue was substantially mitigated.

    How

    The fix consists of two small modifications. One is to introduce a set_epoch method, which informs the dataloader (and everything that composes it, up to the ShardList) of the current epoch. The second is to use that epoch as a random seed for shuffling shards before they are split between workers and nodes.
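
    For illustration only (not the PR's actual code; the function and parameter names here are made up), the coordination idea looks roughly like this:

    import random

    def epoch_shuffled_split(shard_urls, epoch, num_splits, split_index):
        # every node/worker derives the SAME permutation from the shared
        # epoch-based seed, so the deterministic split below still assigns
        # each shard to exactly one worker -- but the assignment changes
        # from epoch to epoch
        rng = random.Random(epoch)
        shards = list(shard_urls)
        rng.shuffle(shards)
        return shards[split_index::num_splits]

    Calling dataloader.set_epoch(epoch) before each epoch would then simply update the epoch value used as the seed.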

    We have found this to significantly improve performance in our contrastive learning experiments, and would greatly appreciate if you could take a look at this!

    cc @mitchnw

    documentation 
    opened by gabrielilharco 6
  • aws storage

    The google cloud storage buckets work well (for the most part) with this framework. Is there any suggested/preferred methodology to utilize the amazon s3 storage with this? (Putting tar files in that bucket and use the URL to load them in a local machine)

    opened by shayanfazeli 6
  • Add ability to re-create bit-exact Webdatasets

    Currently it is impossible to re-create bit-exact Webdatasets, as each file in the Tar archive has a different mtime. This has slightly annoying implications for file caching and versioning, as you have to deal with changed-but-not-really-changed files all the time.

    My first question is: Is this on purpose? Does mtime encode any valuable information for the user?

    If no, why not set it to a fixed value, e.g. 1970-01-01?

    If yes, why not make it overridable by the user, e.g. by a mtime kwarg?

    Currently, this PR does the latter: It introduces a parameter that allows me to override this value.

    To reproduce

    import webdataset as wds
    
    for f in ["test1.tar", "test2.tar"]:
        with wds.TarWriter(f) as sink:
            sink.write(
                {
                    "__key__": "sample0000",
                    "input.txt": "1",
                    "output.txt": "2",
                },
            )
    
    $ md5sum *.tar
    0f90a5b6ca6f0e423685e23cad872d36  test1.tar                                                                                                                                                                                                                                     
    49e91eaa9c54125897b2286af2757c0e  test2.tar
    $ tar tvf test1.tar 
    -r--r--r-- bigdata/bigdata   1 2021-11-28 15:16 sample0000.input.txt
    -r--r--r-- bigdata/bigdata   1 2021-11-28 15:16 sample0000.output.txt
    $ tar tvf test2.tar 
    -r--r--r-- bigdata/bigdata   1 2021-11-28 15:16 sample0000.input.txt     # the dates are off by microseconds
    -r--r--r-- bigdata/bigdata   1 2021-11-28 15:16 sample0000.output.txt    # the dates are off by microseconds
    

    Behaviour after this PR

    With this PR it is possible to override the mtime parameter of TarWriter.write(...), and to set it to arbitrary fixed values:

    import webdataset as wds
    
    for f in ["test1.tar", "test2.tar"]:
        with wds.TarWriter(f) as sink:
            sink.write(
                {
                    "__key__": "sample0000",
                    "input.txt": "1",
                    "output.txt": "2",
                },
                mtime=0.0
            )
    

    which creates bit-exact copies of the output file:

    $ md5sum *.tar
    b1de8150b28126256f05943741b2b5ab  test1.tar
    b1de8150b28126256f05943741b2b5ab  test2.tar
    $ tar tvf test1.tar 
    -r--r--r-- bigdata/bigdata   1 1970-01-01 01:00 sample0000.input.txt
    -r--r--r-- bigdata/bigdata   1 1970-01-01 01:00 sample0000.output.txt
    $ tar tvf test2.tar 
    -r--r--r-- bigdata/bigdata   1 1970-01-01 01:00 sample0000.input.txt
    -r--r--r-- bigdata/bigdata   1 1970-01-01 01:00 sample0000.output.txt
    
    opened by nils-werner 5
  • Remove torch from dependencies

    WebDataset 0.1.76 which I was using before doesn't have torch as a dependency. However, newer versions do.

    I don't think this is a good idea, since TarWriter (and maybe other webdataset parts) can be used for creating datasets in a purely data-processing environment. Bringing in a huge dependency (torch 1.10.0 is 1.8 GB!) is a very big overhead.

    What do you guys think?

    opened by danielgafni 5
  • Large size of tar/shard writer output

    I have a text dataset with 20 GB of data and I tried using webdataset's ShardWriter/TarWriter to convert it. Unfortunately, the initial 20 GB becomes astonishingly large after the conversion. Here are the methods I tried and the resulting sizes on disk:

    | Method           | Output Size (GB) | Data stored                        |
    |------------------|------------------|------------------------------------|
    | ShardWriter pth  | 800              | int32 tensors of variable size     |
    | ShardWriter pdy  | 300              | Raw text data                      |
    | ShardWriter ten  | 260              | int32 numpy array of variable size |
    | ShardWriter npy  | 300              | int32 numpy array of variable size |
    | ShardWriter json | 300              | Raw text data                      |
    | tarp cli         | 210              | Raw text data                      |

    Is this the expected behavior? Why is raw data stored in tar files so much larger than the original data?
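
    Purely as an illustration of raw tar overhead (not a diagnosis of this particular dataset): tar writes a 512-byte header per member and pads each member to a 512-byte block, which is very noticeable with many small samples.

    import io
    import tarfile

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(1000):
            data = b"x" * 10                      # a tiny 10-byte "sample"
            info = tarfile.TarInfo(name=f"sample{i:04d}.txt")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    print(len(buf.getvalue()), "bytes of tar for", 1000 * 10, "bytes of payload")
    # roughly 1 MB of archive for 10 KB of data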

    opened by huberemanuel 5
  • True random access (i.e. Dataset vs. IterableDataset)

    Hi, this is a really neat library! However, it would be nice to have a simpler interface to tar files that allows random access, instead of enforcing sequential access. Let me explain why.

    The most common use case for academic labs such as mine is not terabyte-scale remote datasets where sharding is common, but several-gigabyte datasets with a few million small images.

    So a dataset fits on a single machine, but due to automatic scheduling in a cluster we cannot have all datasets on all machines. In this context, the easiest solution is to copy the dataset to the local machine at the start of training. However, millions of small files really strain the filesystem (both for copying and for access during training). So copying and reading from a single tar file would be ideal, and WebDataset seems (on the surface) to do this well.

    But the constraints of the IterableDataset are maybe a step too far. We now have to make decisions about how many shards to use, the sizes of rolling buffers for shuffling, etc. This adds a ton of friction, and the uneasy feeling that the samples are not sufficiently IID for training if you make a wrong decision. Compare this to the ease of training with tons of small images in a filesystem.

    I was trying to use WebDataset and get colleagues to adopt it, but this is a big wall for adoption. Uncompressed tar files allow random access. Could this not be a simpler entry point for WebDataset? A user who wants to scale things up could then adapt to the IterableDataset more easily, but I think many users would be perfectly happy with the random-access version, which is much less restrictive.
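
    For what it's worth, random access into an uncompressed tar can be sketched with the standard tarfile module alone (the archive and member names below are placeholders):

    import tarfile

    with tarfile.open("dataset.tar", "r:") as tar:
        index = {m.name: m for m in tar.getmembers()}   # one linear scan to build an index
        member = index["sample0421.jpg"]
        data = tar.extractfile(member).read()           # then seek directly to any sample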

    enhancement 
    opened by jotaf98 5
  • Serious drop in data loading speed observed

    Has anyone else noticed download issues with webdataset, with about a 10x drop in data loading speed? I'm observing slowdowns with everything (on GCS at least), for example also with @tmbdev's openimages dataset...

    Please see the gist here. Should be ready to run.

    In my earlier benchmarks I was able to get about 2000 img/s with 8 processes with the above script (50 minibatches of size 256 in about 6.5 s), but now I'm getting about 400 img/s at most.

    I'm on a GCP VM. I tried downgrading various libraries, spun up a VM from an image from last July, etc. gsutil multiprocessing downloads from GCS are still very fast (~570 MiB/s => 11400 openimages img/s).

    I'm totally confused what's going on!

    opened by harpone 5
  • Periods in base filename interpreted as extensions

    It appears that periods in the base part of the filename cause issues. For example the filename ./235342 Track 2.0 (Clean Version).mp3 leads to an unexpected key of 0 (clean version).mp3. This caused sporadic downstream issues like:

    ValueError: didn't find ['mp3'] in ['__key__', '__url__', '0 (clean version).json', '0 (clean version).mp3']

    Looking into the code, it appears that this is by design in order to support multiple extensions like ".seg.jpg", but it would be nice to mention this in the documentation to avoid surprises.
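
    For illustration, the grouping rule described above behaves roughly like this (a re-implementation for demonstration, not the library's actual code):

    import re

    def split_key_and_ext(path):
        # the key is everything up to the FIRST dot in the basename; the
        # "extension" is everything after it, so multi-part extensions like
        # ".seg.jpg" stay together -- and so do stray dots in the filename
        m = re.match(r"^((?:.*/)?[^.]+)\.(.*)$", path)
        return (m.group(1), m.group(2)) if m else (path, None)

    print(split_key_and_ext("images/sample0001.seg.jpg"))
    # ('images/sample0001', 'seg.jpg')
    print(split_key_and_ext("./235342 Track 2.0 (Clean Version).mp3"))
    # ('./235342 Track 2', '0 (Clean Version).mp3')  <- the surprise described above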

    opened by iceboundflame 0
  • gopen_curl fails on windows

    Loading a tarfile with a url as a webdataset fails because of a wrongly formatted curl command for Windows.

    The command is constructed here: https://github.com/webdataset/webdataset/blob/main/webdataset/gopen.py#L195. Single quotes around the URL lead to improper interpretation of URLs that contain the & symbol on a Windows machine.

    enhancement 
    opened by pratikgujjar 1
  • Checkpoint support

    Hello,

    I am wondering if there is any support for checkpointing in WebDataset. Our use case is like this: we have a series of samples [s_1, s_2, s_3, ..., s_n]. If training crashes, can WebDataset support the following:

    • When the training is checkpointed, the index from which WebDataset is currently loading samples is also checkpointed.
    • When resuming from a checkpoint, WebDataset starts loading samples from the checkpointed index.
    opened by yncxcw 1
  • Updated syntax for multinode usage

    Hi, I'm working on Google Colab and trying to set up a minimal example of multi-core PyTorch training with webdataset, using data in GCP buckets. Specifically, I've got one bucket with my training shards and another with test shards.

    For the Colab setup, I'm using the suggested wheels from the PyTorch XLA notebook for multi-core and adding webdataset to the pip installs. pip show webdataset reports version 0.2.31 in the console.

    I tried to start from (https://webdataset.github.io/webdataset/multinode/) with this code in the map function:

    urls = list(braceexpand.braceexpand(FLAGS['train_urls']))
    dataset = (wds.WebDataset(urls)
                 .to_tuple('x_data.npy', 'y_data.npy')
                 .map(preprocess)
                 .batched(50))
    

    but I'm seeing some strange behavior that I don't understand. Specifically, the docs indicate that a default nodesplitter and split_by_worker should be in effect:

    urls = list(braceexpand.braceexpand("dataset-{000000..000999}.tar"))
    dataset = wds.ShardList(urls, splitter=wds.split_by_worker, nodesplitter=wds.split_by_node, shuffle=False)
    dataset = wds.Processor(dataset, wds.url_opener)
    dataset = wds.Processor(dataset, wds.tar_file_expander)
    dataset = wds.Processor(dataset, wds.group_by_keys)
    

    What I see, however, is that every core processes every file when I run the following in my map function. (The next step would be to investigate worker splitting once shards are distributed across cores.)

    loader = DataLoader(dataset, num_workers=FLAGS['num_workers']) 
    for batch, (x,y) in enumerate(loader):
      print(f'core:{rank} batch:{batch}: x_min: {torch.min(x)} x_max: {torch.max(x)} len:{x.size()}')
    

    I saw a related post with a suggestion to use the v2 branch (https://github.com/webdataset/webdataset/issues/138), but (if possible) I'd like my dependency to be the main release that comes in under webdataset with pip.

    I also attempted to expand on the wds.Processor approach, but that doesn't seem to be available in this version. At a high level, I'm just looking for the best advice on implementing a fairly vanilla version of this with shard shuffling and sample shuffling. I'm not sure how shard shuffling across cores is coordinated (do you use a parallel loader?), so thoughts on that would be great too. If a minimum working example notebook would improve or clarify this question, I'm happy to provide one.

    opened by matt-gss 0
  • when converting dataset to wds, data is getting larger.

    this is my code:

    with wds.ShardWriter(pattern.format(proc_id), maxsize=int(1e10), maxcount=int(1000000)) as sink:
    
        for i in tqdm(idx_list):
            text, image = old_ds[i]
            key = uuid1().__str__()
            sink.write({
                "__key__": key,  # uuid
                "input.jpg": image,  # PIL image
                "input.txt": text,  # a string
            })
    

    My dataset is composed of 880k image-text pairs and it was 202 GB in LMDB format (about the same size as the original small single image files on disk; images stored as base64). When converting to wds, it became 474 GB.

    I saw the earlier issues about this, but the dataset still gets larger even when trying tar.gz.

    opened by yli1994 4
  • AttributeError: module 'webdataset' has no attribute 'ShardList'

    From the documentation: https://webdataset.github.io/webdataset/multinode/#splitting-shards-across-nodes-and-workers

    urls = list(braceexpand.braceexpand("dataset-{000000..000999}.tar"))
    dataset = wds.ShardList(urls, splitter=wds.split_by_worker, nodesplitter=wds.split_by_node, shuffle=False)
    dataset = wds.Processor(dataset, wds.url_opener)
    dataset = wds.Processor(dataset, wds.tar_file_expander)
    dataset = wds.Processor(dataset, wds.group_by_keys)
    

    Running this results in

    AttributeError: module 'webdataset' has no attribute 'ShardList'
    
    documentation 
    opened by drscotthawley 2