JAX code for the paper "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation"

Last update: Sep 28, 2022

Overview

Optimal Model Design for Reinforcement Learning

This repository contains JAX code for the paper

Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation

by Evgenii Nikishin, Romina Abachi, Rishabh Agarwal, and Pierre-Luc Bacon.

Summary

Model-based reinforcement learning typically trains the dynamics and reward functions by minimizing prediction error. That error is only a proxy for the agent's ultimate goal, maximizing the sum of rewards, which leads to an objective mismatch. We propose an end-to-end algorithm called Optimal Model Design (OMD) that optimizes the returns directly for model learning. OMD leverages the implicit function theorem to optimize the model parameters and forms the following computational graph:

[Figure: computational graph of OMD]
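As a rough illustration of the implicit-differentiation idea, the sketch below differentiates an outer objective through the solution of a fixed-point equation without backpropagating through the solver. It is a toy example and not the code from this repository: the map `T`, the matrix `A`, the target in `outer_objective`, and all function names are assumptions made up for the illustration, with `theta` playing the role of model parameters and `w` the role of the agent's parameters.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy problem; none of these names come from the repository.
A = jnp.array([[0.2, -0.1], [0.05, 0.3]])

def T(theta, w):
    """A contraction in w, standing in for the agent's optimality condition w = T(theta, w)."""
    return 0.5 * jnp.tanh(A @ w + theta)

def solve_inner(theta, num_iters=50):
    """Find the fixed point w*(theta) by plain iteration."""
    w = jnp.zeros_like(theta)
    for _ in range(num_iters):
        w = T(theta, w)
    return w

def outer_objective(w):
    """Stand-in for the return; depends on theta only through w*(theta)."""
    return -jnp.sum((w - jnp.array([0.3, -0.2])) ** 2)

def implicit_grad(theta):
    """dJ/dtheta via the implicit function theorem applied at the fixed point."""
    w_star = jax.lax.stop_gradient(solve_inner(theta))
    dT_dw = jax.jacobian(T, argnums=1)(theta, w_star)
    dT_dtheta = jax.jacobian(T, argnums=0)(theta, w_star)
    dJ_dw = jax.grad(outer_objective)(w_star)
    # dw*/dtheta = (I - dT/dw)^{-1} dT/dtheta, so
    # dJ/dtheta = dT/dtheta^T (I - dT/dw)^{-T} dJ/dw.
    identity = jnp.eye(w_star.shape[0])
    v = jnp.linalg.solve((identity - dT_dw).T, dJ_dw)
    return dT_dtheta.T @ v

def unrolled_grad(theta):
    """Reference gradient obtained by backpropagating through every solver step."""
    return jax.grad(lambda th: outer_objective(solve_inner(th)))(theta)

theta = jnp.array([0.4, -0.7])
g_implicit = implicit_grad(theta)
g_unrolled = unrolled_grad(theta)
print(g_implicit, g_unrolled)
```

The two gradients should agree up to numerical precision. The implicit version needs only the converged `w_star` and one linear solve, so its memory cost does not grow with the number of solver iterations.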

Installation

We assume that you use Python 3. To install the necessary dependencies, run the following commands:

1. virtualenv ~/env_omd
2. source ~/env_omd/bin/activate
3. pip install -r requirements.txt

To use JAX with a GPU, follow the official JAX installation instructions. To install MuJoCo, follow the MuJoCo installation instructions.
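After installation, a quick check along the following lines can confirm which backend JAX actually picked up. This is only a sanity-check sketch and is not part of the repository; it relies on the public `jax.default_backend()` and `jax.devices()` calls.

```python
import jax
import jax.numpy as jnp

# Name of the platform JAX will use by default (a GPU install should not report "cpu").
backend = jax.default_backend()
devices = jax.devices()
print("default backend:", backend)
print("visible devices:", devices)

# Run a tiny computation to confirm the backend works end to end.
total = float(jnp.arange(4.0).sum())
print("sum of [0, 1, 2, 3]:", total)
```

If the reported backend is `cpu` on a machine with a GPU, the GPU-enabled JAX build is most likely not the one installed in the active environment.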

Run

For historical reasons, the code is divided into three parts: Tabular, CartPole, and MuJoCo.

Tabular

All results for the tabular experiments can be reproduced by running the tabular.ipynb notebook.

To open the notebook in Google Colab, use this link.

CartPole

To train the OMD agent on CartPole, use the following commands:

cd cartpole
python train.py --agent_type omd

We also provide implementations of the corresponding MLE and VEP baselines. To train them, change the --agent_type flag to mle or vep.
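To compare OMD against both baselines, the three runs can be launched from a small driver script. The sketch below is a convenience wrapper and not part of the repository: the helper names `build_command` and `run_all` are hypothetical, and only the `train.py` script, the `cartpole` directory, and the `--agent_type` values come from the instructions above. By default it only prints the commands.

```python
import subprocess

# Values of --agent_type described above.
AGENT_TYPES = ("omd", "mle", "vep")

def build_command(agent_type):
    """Return the training command for one agent type as an argument list."""
    if agent_type not in AGENT_TYPES:
        raise ValueError(f"unknown agent type: {agent_type!r}")
    return ["python", "train.py", "--agent_type", agent_type]

def run_all(dry_run=True, workdir="cartpole"):
    """Print (dry_run=True) or execute (dry_run=False) one run per agent type."""
    commands = [build_command(agent) for agent in AGENT_TYPES]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Runs sequentially inside the cartpole directory; stops on the first failure.
            subprocess.run(cmd, cwd=workdir, check=True)
    return commands

if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the repository root would start the three training runs one after another.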

MuJoCo

To train the OMD agent on MuJoCo HalfCheetah-v2, use the following commands:

cd mujoco
python train.py --config.algo=omd

To train the MLE baseline, change the --config.algo flag to mle.

Acknowledgements

Tabular experiments are based on the code from the library for fixed points in JAX
Code for MuJoCo is based on the implementation of SAC in JAX
Code for CartPole reuses parts of the SAC implementation in PyTorch
For experimentation, we used a modification of the slurm runner
