Official Repository for "Robust On-Policy Data Collection for Data Efficient Policy Evaluation" (NeurIPS 2021 Workshop on OfflineRL).

Overview

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation

Source code of Robust On-Policy Data Collection for Data-Efficient Policy Evaluation (NeurIPS 2021 Workshop on OfflineRL).

The code is written in python 3, using Pytorch for the implementation of the deep networks and OpenAI gym for the experiment domains.

Requirements

To install the required codebase, it is recommended to create a conda or a virtual environment. Then, run the following command

pip install -r requirements.txt

Preparation

To conduct policy evaluation, we need to prepare a set of pretrained policies. You can skip this part if you already have the pretrained models in policy_models/ and the corresponding policy values in experiments/policy_info.py

Pretrained Policy

Train the policy models using REINFORCE in different domains by running:

python policy/reinfoce.py --exp_name {exp_name}

where {exp_name} can be MultiBandit, GridWorld, CartPole or CartPoleContinuous. The parameterized epsilon-greedy policies for MultiBandit and GridWorld can be obtained by running:

python policy/handmade_policy.py

Policy Value

Option 1: Run in sequence

For each policy model, the true policy value is estimated with $10^6$ Monte Carlo roll-outs by running:

python experiments/policy_value.py --policy_name {policy_name} --seed {seed} --n 10e6

This will print the average steps, true policy value and variance of returns. Make sure you copy these results into the file experiment/policy_info.py.

Option 2: Run in parallel

If you can use qsub or sbatch, you can also run jobs/jobs_value.py with different seeds in parallel and merge them by running experiments/merge_values.py to get $10^6$ Monte Carlo roll-outs. The policy values reported in this paper were obtained in this way.

Evaluation

Option 1: Run in sequence

The main running script for policy evaluation is experiments/evaluate.py. The following running command is an example of Monte Carlo estimation for Robust On-policy Acting with $\rho=1.0$ for the policy model_GridWorld_5000.pt with seeds from 0 to 199.

python experiments/evaluate.py --policy_name GridWorld_5000 --ros_epsilon 1.0 --collectors RobustOnPolicyActing --estimators MonteCarlo --eval_steps "7,14,29,59,118,237,475,951,1902,3805,7610,15221,30443,60886" --seeds "0,199"

To conduct policy evaluation with off-policy data, you need to add the following arguments to the above running command:

--combined_trajectories 100 --combined_ops_epsilon 0.10 

Option 2: Run in parallel

If you can use qsub or sbatch, you may only need to run the script jobs/jobs.py where all experiments in the paper are arranged. The log will be saved in log/ and the seed results will be saved in results/seeds. Note that we save the data collection cache in results/data and re-use it for different value estimations. To merge results of different seeds, run experiments/merge_results.py, and the merged results will be saved in results/.

Ploting

When the experiments are finished, all the figures in the paper are produced by running

python drawing/draw.py

Citing

If you use this repository in your work, please consider citing the paper

@inproceedings{zhong2021robust,
    title = {Robust On-Policy Data Collection for Data-Efficient Policy Evaluation},
    author = {Rujie Zhong, Josiah P. Hanna, Lukas Schäfer and Stefano V. Albrecht},
    booktitle = {NeurIPS Workshop on Offline Reinforcement Learning (OfflineRL)},
    year = {2021}
}
Owner
Autonomous Agents Research Group (University of Edinburgh)
Official code repositories for projects by the Autonomous Agents Research Group
Autonomous Agents Research Group (University of Edinburgh)
Neural Architecture Search Powered by Swarm Intelligence 🐜

Neural Architecture Search Powered by Swarm Intelligence 🐜 DeepSwarm DeepSwarm is an open-source library which uses Ant Colony Optimization to tackle

288 Oct 28, 2022
Pyeventbus: a publish/subscribe event bus

pyeventbus pyeventbus is a publish/subscribe event bus for Python 2.7. simplifies the communication between python classes decouples event senders and

15 Apr 21, 2022
PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time

PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time The implementation is based on SIGGRAPH Aisa'20. Dependencies Python 3.7 Ubuntu

soratobtai 124 Dec 08, 2022
Machine learning algorithms for many-body quantum systems

NetKet NetKet is an open-source project delivering cutting-edge methods for the study of many-body quantum systems with artificial neural networks and

NetKet 413 Dec 31, 2022
Dynamic Realtime Animation Control

Our project is targeted at making an application that dynamically detects the user’s expressions and gestures and projects it onto an animation software which then renders a 2D/3D animation realtime

Harsh Avinash 10 Aug 01, 2022
Ensemble Learning Priors Driven Deep Unfolding for Scalable Snapshot Compressive Imaging [PyTorch]

Ensemble Learning Priors Driven Deep Unfolding for Scalable Snapshot Compressive Imaging [PyTorch] Abstract Snapshot compressive imaging (SCI) can rec

integirty 6 Nov 01, 2022
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Multimodal Deep Learning 🎆 🎆 🎆 Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model

Deep Cognition and Language Research (DeCLaRe) Lab 398 Dec 30, 2022
The `rtdl` library + The official implementation of the paper

The `rtdl` library + The official implementation of the paper "Revisiting Deep Learning Models for Tabular Data"

Yandex Research 510 Dec 30, 2022
Segmentation for medical image.

EfficientSegmentation Introduction EfficientSegmentation is an open source, PyTorch-based segmentation framework for 3D medical image. Features A whol

68 Nov 28, 2022
Neural Surface Maps

Neural Surface Maps Official implementation of Neural Surface Maps - Luca Morreale, Noam Aigerman, Vladimir Kim, Niloy J. Mitra [Paper] [Project Page]

Luca Morreale 49 Dec 13, 2022
Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

Detecting Twenty-thousand Classes using Image-level Supervision Detic: A Detector with image classes that can use image-level labels to easily train d

Meta Research 1.3k Jan 04, 2023
Diffusion Probabilistic Models for 3D Point Cloud Generation (CVPR 2021)

Diffusion Probabilistic Models for 3D Point Cloud Generation [Paper] [Code] The official code repository for our CVPR 2021 paper "Diffusion Probabilis

Shitong Luo 323 Jan 05, 2023
Code for the paper "There is no Double-Descent in Random Forests"

Code for the paper "There is no Double-Descent in Random Forests" This repository contains the code to run the experiments for our paper called "There

2 Jan 14, 2022
Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision

MLP-Mixer: An all-MLP Architecture for Vision This repo contains PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision. Usage : impo

Rishikesh (ऋषिकेश) 175 Dec 23, 2022
PAMI stands for PAttern MIning. It constitutes several pattern mining algorithms to discover interesting patterns in transactional/temporal/spatiotemporal databases

Introduction PAMI stands for PAttern MIning. It constitutes several pattern mining algorithms to discover interesting patterns in transactional/tempor

RAGE UDAY KIRAN 43 Jan 08, 2023
clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

README clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation CVPR 2021 Authors: Suprosanna Shit and Johannes C. Paetzo

110 Dec 29, 2022
Implementation of Invariant Point Attention, used for coordinate refinement in the structure module of Alphafold2, as a standalone Pytorch module

Invariant Point Attention - Pytorch Implementation of Invariant Point Attention as a standalone module, which was used in the structure module of Alph

Phil Wang 113 Jan 05, 2023
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [CVPR 2021]

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [BCNet, CVPR 2021] This is the official pytorch implementation of BCNet built on

Lei Ke 434 Dec 01, 2022
Web-interface + rest API for classification and regression (https://jeff1evesque.github.io/machine-learning.docs)

Machine Learning This project provides a web-interface, as well as a programmatic-api for various machine learning algorithms. Supported algorithms: S

Jeff Levesque 252 Dec 11, 2022