Exploration-Exploitation Dilemma Solving Methods

Medium article for this repo - HERE

In this repo I implemented two techniques for tackling the exploration-exploitation tradeoff. The methods are:

Epsilon Greedy (with different epsilons)
Thompson Sampling (also known as posterior sampling)

I chose these two to show the lower and upper ends of the spectrum: epsilon-greedy is a common starting point for dealing with this tradeoff, while Thompson Sampling is considered state of the art in this field.

ENV SPECIFICATIONS - A 10-armed testbed is simulated, the same as the one demonstrated in the Sutton-Barto book.
True reward distribution (here Action-2 is best)
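
For readers who want to reproduce the environment, here is a minimal sketch of a 10-armed testbed in the style of the Sutton-Barto book: each arm's true value is drawn from N(0, 1), and each pull returns that value plus unit-variance Gaussian noise. The class name `GaussianTestbed` and its methods are hypothetical and not taken from this repo, and the best arm in the sketch depends on the seed rather than being fixed to Action-2.

```python
import random


class GaussianTestbed:
    """k-armed bandit: true values q*(a) ~ N(0, 1), rewards ~ N(q*(a), 1).

    Hypothetical helper for illustration; the repo's own environment may differ.
    """

    def __init__(self, k=10, seed=None):
        self.rng = random.Random(seed)
        # One fixed true value per arm, drawn once per testbed.
        self.true_values = [self.rng.gauss(0.0, 1.0) for _ in range(k)]
        # Index of the arm with the highest true value.
        self.best_action = max(range(k), key=lambda a: self.true_values[a])

    def pull(self, action):
        # Reward is the arm's true value plus unit-variance Gaussian noise.
        return self.rng.gauss(self.true_values[action], 1.0)
```

Creating a fresh `GaussianTestbed` for every run gives each run its own reward distribution, which is what makes averaging over many runs meaningful.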

Comparison: Greedy (or Epsilon-Greedy) and TS

We tested three different values of epsilon:

epsilon = 0 => Greedy Agent
epsilon = 0.01 => exploration with 1% probability
epsilon = 0.1 => exploration with 10% probability

and Thompson Sampling (TS).
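
The epsilon settings above can be sketched as a single action-selection rule plus a sample-average value update. This is a minimal illustration, not the repo's code: the function names `epsilon_greedy_action` and `update_estimate` are hypothetical, and random tie-breaking is an assumption.

```python
import random


def epsilon_greedy_action(estimates, epsilon, rng=random):
    """Pick a random arm with probability epsilon, otherwise a greedy arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    best = max(estimates)
    # Break ties at random so epsilon = 0 does not always favor arm 0.
    return rng.choice([a for a, q in enumerate(estimates) if q == best])


def update_estimate(estimates, counts, action, reward):
    """Incremental sample-average update: Q <- Q + (R - Q) / N."""
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
```

With `epsilon = 0` the first branch is never taken, so the agent is purely greedy; with `epsilon = 0.1` roughly one step in ten is a uniformly random pull.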

Averaged over 2500 independent runs of 1500 timesteps each
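
As a rough sketch of how such an average could be computed, the function below runs an epsilon-greedy agent on many independently generated 10-armed problems and reports, per timestep, the percentage of runs that picked the best arm. The name `percent_optimal` and its defaults are hypothetical; the defaults are kept small so the sketch finishes quickly, and `runs=2500, steps=1500` would match the setting described here.

```python
import random


def percent_optimal(epsilon, runs=200, steps=500, k=10, seed=0):
    """Return, for each timestep, the % of runs that chose the best arm."""
    rng = random.Random(seed)
    hits = [0] * steps
    for _ in range(runs):
        # Fresh problem per run: true values ~ N(0, 1), as in the testbed.
        true_values = [rng.gauss(0.0, 1.0) for _ in range(k)]
        best = max(range(k), key=lambda a: true_values[a])
        estimates, counts = [0.0] * k, [0] * k
        for t in range(steps):
            if rng.random() < epsilon:
                action = rng.randrange(k)  # explore
            else:
                top = max(estimates)  # exploit, ties broken at random
                action = rng.choice([a for a in range(k) if estimates[a] == top])
            reward = rng.gauss(true_values[action], 1.0)
            counts[action] += 1
            estimates[action] += (reward - estimates[action]) / counts[action]
            hits[t] += action == best
    return [100.0 * h / runs for h in hits]
```

Calling it once per epsilon value yields one curve per agent, which can then be plotted against the timestep.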

Comparison

Percentage of actions selected for epsilon = 0.01 and TS

Conclusion -> Among the epsilon-greedy agents, epsilon = 0.01 can be considered the best: its percentage of optimal actions keeps increasing, though slowly, and reaches around 80% in the later stages. Thompson Sampling, on the other hand, shows a significant improvement over these results: it explores quickly and then exploits the optimal action, with the percentage of optimal actions reaching almost 100% very early.
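
The "explore quickly, then exploit" behavior of Thompson Sampling comes from sampling each arm's value from its posterior and acting greedily on the samples. Below is a minimal sketch under stated assumptions: a N(0, 1) prior on each arm's mean and rewards with known unit variance, which gives a Gaussian posterior with mean sum / (n + 1) and variance 1 / (n + 1). The class `GaussianThompsonAgent` is hypothetical, and the repo may use a different prior or reward model.

```python
import random


class GaussianThompsonAgent:
    """Thompson Sampling with a N(0, 1) prior per arm and unit-variance rewards.

    Illustrative assumption only; not taken from this repo.
    """

    def __init__(self, k=10, seed=None):
        self.rng = random.Random(seed)
        self.counts = [0] * k
        self.reward_sums = [0.0] * k

    def select_action(self):
        # Posterior for each arm is N(sum / (n + 1), 1 / (n + 1)).
        samples = [
            self.rng.gauss(s / (n + 1), (1.0 / (n + 1)) ** 0.5)
            for s, n in zip(self.reward_sums, self.counts)
        ]
        # Act greedily with respect to the sampled values.
        return max(range(len(samples)), key=lambda a: samples[a])

    def update(self, action, reward):
        self.counts[action] += 1
        self.reward_sums[action] += reward
```

Rarely pulled arms keep a wide posterior, so they still win the draw now and then; as counts grow, the posteriors narrow and the agent settles on the arm with the highest estimated mean.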

If you want to know more about TS, visit this Reference.

Owner

Aman Mishra
