PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.

Last update: Aug 01, 2022

Objectives

The main objective of this library is to build a PyTorch Dataset from training data stored in Kafka. This is useful when the data is distributed in Kafka and we want to train a PyTorch model on it. Each Kafka message should be a key:value pair, where the key is the label and the value is the input.

Usage

To use this library, create a TrainingKafkaDataset with a ControlMessage, bootstrap_servers, and a group_id. Once the object has been created and the data has been fetched from Kafka, it can be used like any other PyTorch Dataset, for example by iterating over it with a DataLoader.

ControlMessage is a dictionary whose main keys are topic and input_config.

In topic, provide a comma-separated string of entries, each giving the topic, partition, start offset, and end offset, separated by colons (as in Kafka). In input_config, specify the data types and reshapes of the data fetched from Kafka. This is needed because Kafka stores messages as bytes, which have to be decoded back into the inputs of the model.
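The format described above can be illustrated with a minimal sketch. The helper names parse_topic_string and decode_message are hypothetical and are not part of this library; the sketch only assumes that each topic entry has the form topic:partition:start:end and that the reshape strings are space-separated dimensions. Treating an empty reshape string as "leave the array flat" is also an assumption.

```python
from typing import List, NamedTuple

import numpy as np


class TopicRange(NamedTuple):
    """One entry of the 'topic' string (hypothetical helper type)."""
    topic: str
    partition: int
    start_offset: int
    end_offset: int


def parse_topic_string(spec: str) -> List[TopicRange]:
    """Split 'topic:partition:start:end,...' into structured entries."""
    ranges = []
    for entry in spec.split(","):
        topic, partition, start, end = entry.strip().split(":")
        ranges.append(TopicRange(topic, int(partition), int(start), int(end)))
    return ranges


def decode_message(raw: bytes, dtype: str, reshape: str) -> np.ndarray:
    """Decode raw Kafka bytes into an array of the given dtype and shape."""
    array = np.frombuffer(raw, dtype=dtype)
    shape = tuple(int(dim) for dim in reshape.split())
    # Assumption: an empty reshape string means the array is left flat.
    return array.reshape(shape) if shape else array
```

With the configuration used in this README, an MNIST image would be decoded with decode_message(value, 'uint8', '28 28') and its label with decode_message(key, 'uint8', '').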

bootstrap_servers and group_id are the usual KafkaConsumer parameters. They are passed directly to the KafkaConsumers created inside the object.

Here is an example of creating a TrainingKafkaDataset:

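from torchvision.transforms import ToTensor
# TrainingKafkaDataset is provided by this library and must be imported as well.

kafkaControlMessage = {
    # Comma-separated entries of the form topic:partition:start_offset:end_offset
    'topic': 'pytorch_mnist_test:0:0:20000,pytorch:0:20000:50000,pytorch_mnist_test:0:120000:140000',
    # How to decode the raw bytes of each message back into inputs and labels
    'input_config': {'data_type': 'uint8',
                     'label_type': 'uint8',
                     'data_reshape': '28 28',
                     'label_reshape': ''},
}
bootstrap_server = ["localhost:9094"]
group_id = 'sink'
df = TrainingKafkaDataset(kafkaControlMessage, bootstrap_server, group_id, ToTensor())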
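As a rough illustration of using the dataset with a DataLoader, the sketch below iterates over batches. Because a running Kafka broker is needed to build a real TrainingKafkaDataset, it uses a hypothetical in-memory stand-in class (InMemoryStandIn, not part of this library) that yields MNIST-shaped tensors; the batch size is likewise an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class InMemoryStandIn(Dataset):
    """Hypothetical stand-in for TrainingKafkaDataset with MNIST-shaped data."""

    def __init__(self, num_samples: int = 64):
        # Placeholder tensors; a real dataset would hold data fetched from Kafka.
        self.data = torch.zeros(num_samples, 1, 28, 28)
        self.labels = torch.zeros(num_samples, dtype=torch.long)

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int):
        return self.data[index], self.labels[index]


dataset = InMemoryStandIn()
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Iterate over batches exactly as with any other PyTorch Dataset.
batch_shapes = [(inputs.shape, labels.shape) for inputs, labels in loader]
```

If the dataset behaves as described in this section, passing the TrainingKafkaDataset object to DataLoader in place of the stand-in should work the same way.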

Examples

The repository includes a folder with a full example of fetching data and training a model on the MNIST dataset.

Owner

ERTIS Research Group
