Official codebase for "B-Pref: Benchmarking Preference-BasedReinforcement Learning" contains scripts to reproduce experiments.

Last update: Dec 20, 2022

Related tags

Deep Learning BPref

Overview

B-Pref

Official codebase for B-Pref: Benchmarking Preference-BasedReinforcement Learning contains scripts to reproduce experiments.

Install

conda env create -f conda_env.yml
pip install -e .[docs,tests,extra]
cd custom_dmcontrol
pip install -e .
cd custom_dmc2gym
pip install -e .
pip install git+https://github.com/rlworkgroup/[email protected]#egg=metaworld
pip install pybullet

Run experiments using GT rewards

SAC & SAC + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_sac.sh 
./scripts/[env_name]/run_sac_unsuper.sh

PPO & PPO + unsupervised pre-training

Experiments can be reproduced with the following:

./scripts/[env_name]/run_ppo.sh 
./scripts/[env_name]/run_ppo_unsuper.sh

Run experiments on irrational teacher

To design more realistic models of human teachers, we consider a common stochastic model and systematically manipulate its terms and operators:

teacher_beta: rationality constant of stochastic preference model (default: -1 for perfectly rational model)
teacher_gamma: discount factor to model myopic behavior (default: 1)
teacher_eps_mistake: probability of making a mistake (default: 0)
teacher_eps_skip: hyperparameters to control skip threshold (\in [0,1])
teacher_eps_equal: hyperparameters to control equal threshold (\in [0,1])

In B-Pref, we tried the following teachers:

Oracle teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Mistake teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0.1, teacher_eps_skip=0, teacher_eps_equal=0)

Noisy teacher: (teacher_beta=1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Skip teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0.1, teacher_eps_equal=0)

Myopic teacher: (teacher_beta=-1, teacher_gamma=0.9, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0)

Equal teacher: (teacher_beta=-1, teacher_gamma=1, teacher_eps_mistake=0, teacher_eps_skip=0, teacher_eps_equal=0.1)

PEBBLE

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PEBBLE.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]

PrefPPO

Experiments can be reproduced with the following:

./scripts/[env_name]/[teacher_type]/[max_budget]/run_PrefPPO.sh [sampling_scheme: 0=uniform, 1=disagreement, 2=entropy]

note: full hyper-paramters for meta-world will be updated soon!

Official codebase for "B-Pref: Benchmarking Preference-BasedReinforcement Learning" contains scripts to reproduce experiments.

Related tags

Overview

B-Pref

Install

Run experiments using GT rewards

SAC & SAC + unsupervised pre-training

PPO & PPO + unsupervised pre-training

Run experiments on irrational teacher

PEBBLE

PrefPPO

Owner

The code of paper 'Learning to Aggregate and Personalize 3D Face from In-the-Wild Photo Collection'

Official implementation of "MetaSDF: Meta-learning Signed Distance Functions"

An self sufficient AI that crawls the web to learn how to generate art from keywords

Geometric Sensitivity Decomposition

On the model-based stochastic value gradient for continuous reinforcement learning

Spatial Contrastive Learning for Few-Shot Classification (SCL)

An All-MLP solution for Vision, from Google AI

On Out-of-distribution Detection with Energy-based Models

[CVPR 2021] "Multimodal Motion Prediction with Stacked Transformers": official code implementation and project page.

GestureSSD CBAM - A gesture recognition web system based on SSD and CBAM, using pytorch, flask and node.js

Differentiable Simulation of Soft Multi-body Systems

Training Cifar-10 Classifier Using VGG16

Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

SIEM Logstash parsing for more than hundred technologies

Rethinking of Pedestrian Attribute Recognition: A Reliable Evaluation under Zero-Shot Pedestrian Identity Setting

Weakly Supervised Text-to-SQL Parsing through Question Decomposition

The code for replicating the experiments from the LFI in SSMs with Unknown Dynamics paper.

Style transfer between images was performed using the VGG19 model

Reimplementation of Learning Mesh-based Simulation With Graph Networks