batch-bandits

Implementation of popular bandit algorithms in batch environments.

Source code for our paper "The Impact of Batch Learning in Stochastic Bandits", accepted at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021.

Overview

This repository lets you run simulations or replay logged datasets in a sequential batch manner: the agent interacts with the environment sequentially, but responses are grouped into batches and observed only at the end of each batch. Broadly speaking, sequential batch learning is a more general form of learning that covers both the offline and online settings as special cases, bringing together their advantages.

Framework

Two particularly useful versions of the multi-armed bandit problem are implemented: the Stochastic Multi-Armed Bandit (MAB) and the Contextual Multi-Armed Bandit (CMAB). The key feature of the project is that both versions support the batch_size parameter: the length of the period during which the agent interacts with the environment "blindly", without observing responses. Although the batch setting is a property of the environment, this limitation is treated from the policy's perspective. That is, rather than an online agent working with a batch environment, a batch policy interacts with an online environment.
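To make the role of batch_size concrete, here is a minimal sketch of a batch ε-greedy policy on a Bernoulli bandit. It is an illustration only, not the repository's code: the function name run_batch_epsilon_greedy, its arguments other than batch_size, and the Bernoulli reward model are all assumptions made for this example. The policy keeps acting on stale estimates and only folds the buffered rewards into them at the end of each batch.

```python
import numpy as np


def run_batch_epsilon_greedy(true_means, horizon, batch_size, epsilon=0.1, seed=0):
    """Hypothetical helper: epsilon-greedy with delayed (batched) updates.

    Returns the total reward, the per-arm pull counts seen by the policy,
    and the number of times the estimates were updated.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_means)
    counts = np.zeros(n_arms)   # pulls per arm that the policy has observed
    values = np.zeros(n_arms)   # running mean reward per arm
    buffer = []                 # (arm, reward) pairs not yet shown to the policy
    total_reward = 0.0
    n_updates = 0

    for t in range(1, horizon + 1):
        # Act using estimates that were frozen at the last batch boundary.
        if rng.random() < epsilon:
            arm = int(rng.integers(n_arms))
        else:
            arm = int(np.argmax(values))

        # Assumed environment: Bernoulli reward with the arm's true mean.
        reward = float(rng.random() < true_means[arm])
        total_reward += reward
        buffer.append((arm, reward))

        # Reveal the buffered responses only at the end of a batch
        # (or at the horizon, so a trailing partial batch is not lost).
        if t % batch_size == 0 or t == horizon:
            for a, r in buffer:
                counts[a] += 1
                values[a] += (r - values[a]) / counts[a]
            buffer.clear()
            n_updates += 1

    return total_reward, counts, n_updates


if __name__ == "__main__":
    for b in (1, 100, 1000):
        reward, _, updates = run_batch_epsilon_greedy([0.2, 0.8], 1000, b)
        print(f"batch_size={b}: total reward={reward}, updates={updates}")
```

In this sketch, batch_size=1 reduces to the usual online setting (one update per step), while batch_size equal to the horizon means the policy never learns during the run, which mirrors the offline extreme.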

The project is built upon the RL-Glue framework, which provides an interface for connecting agents, environments, and experiment programs. Note that MAB/rl_glue.py and CMAB/rl_glue.py were adapted to make batch interaction possible.

Implemented algorithms

Version	Algorithm	Comment
MAB	ε-greedy	-
MAB	Thompson Sampling	-
MAB	UCB	-
CMAB	LinTS	see link (and references therein) for more details
CMAB	LinUCB	see article for theoretical description
CMAB	Offline evaluator	policy evaluation technique; see article for theoretical guarantees
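The table does not say which estimator the offline evaluator uses. As an illustration of how a logged dataset can be replayed in batches, the sketch below uses the common replay (rejection-sampling) idea: a logged event counts only when the evaluated policy picks the same arm that was logged, and the policy is updated once per batch of matched events. Everything here is an assumption for the example, including the name replay_evaluate, the (context, arm, reward) log format, and the choose_arm and update callbacks. The estimate is only meaningful if the logged arms were chosen uniformly at random.

```python
def replay_evaluate(logged, choose_arm, update, batch_size):
    """Hypothetical replay-style evaluator with batched policy updates.

    logged     : iterable of (context, logged_arm, reward) tuples
    choose_arm : callable(context) -> arm chosen by the evaluated policy
    update     : callable(list of matched events), called once per batch
    Returns (average reward over matched events, number of matched events).
    """
    buffer = []          # matched events not yet passed to the policy
    matched_rewards = []

    for context, logged_arm, reward in logged:
        # Skip events where the policy disagrees with the logged action:
        # the reward for the policy's own choice was never observed.
        if choose_arm(context) != logged_arm:
            continue

        matched_rewards.append(reward)
        buffer.append((context, logged_arm, reward))

        # The policy only learns from matched events at batch boundaries.
        if len(buffer) == batch_size:
            update(buffer)
            buffer = []

    # Flush a trailing partial batch so no matched event is dropped.
    if buffer:
        update(buffer)

    if not matched_rewards:
        return float("nan"), 0
    return sum(matched_rewards) / len(matched_rewards), len(matched_rewards)
```

A quick usage example with a fixed policy that always picks arm 0: calling replay_evaluate(log, lambda ctx: 0, lambda batch: None, batch_size=2) on a log of (context, arm, reward) tuples returns the mean reward of the events whose logged arm is 0, together with how many such events there were.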

Owner

Danil Provodin
