Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions

Last update: Nov 24, 2022

Repository containing the code for the paper "Safe Model-Based Reinforcement Learning using Robust Control Barrier Functions". Specifically, it implements Soft Actor-Critic (SAC) combined with Robust Control Barrier Functions (RCBFs) for safe reinforcement learning in two custom environments.

While exploring, an RL agent can take actions that drive the system into unsafe states. Here, we use a differentiable RCBF safety layer that minimally alters (in the least-squares sense) the actions taken by the RL agent to ensure the safety of the agent.

Robust control barrier functions

As explained in the paper, RCBFs are formulated with respect to differential inclusions that represent disturbed dynamical systems, $\dot{x} \in f(x) + g(x)u + D(x)$, where $D(x)$ is the set of disturbances that can act on the system at state $x$. The QP used to ensure the system's safety is given by:

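$$
u^\star(x) = \arg\min_{u,\,\epsilon} \; \|u\|^2 + l\,\|\epsilon\|^2
\quad \text{subject to} \quad \min_{d \in D(x)} \dot{h}(x, d, u, u_{RL}) > -\gamma\, h(x) + \epsilon
$$

where $u_{RL}$ is the action proposed by the RL policy, $u$ is the correction applied to it by the safety layer, $\epsilon$ is a slack variable penalized by the weight $l > 0$, and $\gamma > 0$ controls how quickly the system may approach the boundary of the safe set, $h(x) = 0$.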
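For a single barrier function, the QP above has a closed-form solution, which makes the role of each term easier to see. The sketch below is an illustration under stated assumptions, not the repository's implementation: it assumes a control-affine nominal model with $\dot{h} = \nabla h(x)^\top (f(x) + g(x)(u_{RL} + u) + d)$, a disturbance set given as an axis-aligned box $D(x) = [d_{lo}, d_{hi}]$ (so the inner minimum over $d$ can be taken coordinate-wise), and a non-strict inequality, as QP solvers use. The function names `rcbf_qp` and `worst_case_disturbance_term` and the parameter `slack_weight` (playing the role of $l$) are hypothetical.

```python
import numpy as np


def worst_case_disturbance_term(grad_h, d_lo, d_hi):
    """min over d in the box [d_lo, d_hi] of grad_h . d, chosen coordinate-wise."""
    return float(np.sum(np.where(grad_h >= 0.0, grad_h * d_lo, grad_h * d_hi)))


def rcbf_qp(h, grad_h, f, g, u_rl, d_lo, d_hi, gamma=1.0, slack_weight=1e3):
    """Closed-form solution (single barrier, box disturbance set; an assumption) of
        min_{u, eps} ||u||^2 + slack_weight * eps^2
        s.t. min_{d in D} grad_h . (f + g (u_rl + u) + d) >= -gamma * h + eps
    Returns the correction u (added to u_rl) and the slack eps."""
    a = g.T @ grad_h  # sensitivity of h_dot to the correction u
    # Worst-case h_dot when no correction is applied (u = 0)
    h_dot_min_at_zero = (grad_h @ f + a @ u_rl
                         + worst_case_disturbance_term(grad_h, d_lo, d_hi))
    # Rearranged constraint: a . u - eps >= b
    b = -gamma * h - h_dot_min_at_zero
    if b <= 0.0:
        # The RL action is already robustly safe: no correction, no slack
        return np.zeros_like(u_rl), 0.0
    # Active constraint: the KKT conditions give u parallel to a and eps of opposite sign
    denom = slack_weight * (a @ a) + 1.0
    u = slack_weight * b * a / denom
    eps = -b / denom
    return u, eps
```

When the RL action already satisfies the robust condition, the correction and slack are both zero; otherwise the correction points along $g(x)^\top \nabla h(x)$, which is the smallest change in the least-squares sense, and a large `slack_weight` keeps the relaxation $\epsilon$ small.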

In this work, the disturbance set $D(x)$ in the differential inclusion is learned with Gaussian Processes (GPs), implemented using GPyTorch.
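The README relies on GPyTorch for this step. As a library-free illustration of the idea (a swapped-in technique, not GPyTorch's API or the repository's code), the sketch below fits an exact GP with a squared-exponential kernel in NumPy to one component of the disturbance, i.e. the residual between the observed $\dot{x}$ and the nominal $f(x) + g(x)u$, and turns its posterior mean $\mu(x)$ and standard deviation $\sigma(x)$ into a box $D(x) = [\mu(x) - k\sigma(x), \mu(x) + k\sigma(x)]$. The class name, the kernel hyperparameters, and the confidence multiplier `k` are hypothetical choices.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A (n x d) and B (m x d)."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


class ScalarDisturbanceGP:
    """Exact GP regression for one component of d(x) = x_dot - (f(x) + g(x) u)."""

    def __init__(self, lengthscale=1.0, variance=1.0, noise=1e-2):
        self.lengthscale, self.variance, self.noise = lengthscale, variance, noise

    def fit(self, X, y):
        self.X = X
        K = rbf_kernel(X, X, self.lengthscale, self.variance) + self.noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} y, computed with two solves against the Cholesky factor
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, Xq):
        """Posterior mean and standard deviation of the latent disturbance at Xq."""
        Ks = rbf_kernel(Xq, self.X, self.lengthscale, self.variance)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.variance - np.sum(v**2, axis=0)
        return mean, np.sqrt(np.maximum(var, 0.0))

    def disturbance_box(self, Xq, k=2.0):
        """Hypothetical confidence box D(x) = [mean - k*std, mean + k*std]."""
        mean, std = self.predict(Xq)
        return mean - k * std, mean + k * std
```

Far from the training data the standard deviation grows back toward the prior level, so the box widens and the safety layer becomes more conservative exactly where the model is uncertain.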

Coupling RL & RCBFs to improve training performance

The above is sufficient to ensure the safety of the system. However, we would also like to improve learning performance by letting the RCBF layer guide training. This is achieved by:

Using a differentiable version of the safety layer that allows us to backpropagate through the RCBF-based Quadratic Program (QP).
Using the GPs and the dynamics prior to generate synthetic data (model-based RL).
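The repository backpropagates through the RCBF QP with a differentiable solver. To illustrate why this changes the gradient the policy receives, the hand-derived sketch below (not the repository's implementation) treats a single constraint written as $a^\top (u_{RL} + u) - \epsilon \geq c$, where $a = g(x)^\top \nabla h(x)$ and $c$ collects every term that does not depend on the action, and differentiates the closed-form QP solution with respect to $u_{RL}$. The function names and `slack_weight` (the weight $l$) are hypothetical; the analytic Jacobian is checked against finite differences.

```python
import numpy as np


def safe_action(u_rl, a, c, slack_weight=1e3):
    """u_safe = u_rl + u*, where u* solves
    min ||u||^2 + slack_weight * eps^2  s.t.  a . (u_rl + u) - eps >= c."""
    b = c - a @ u_rl  # constraint violation the correction has to cover
    if b <= 0.0:
        return u_rl.copy()
    return u_rl + slack_weight * b * a / (slack_weight * (a @ a) + 1.0)


def safe_action_jacobian(u_rl, a, c, slack_weight=1e3):
    """Analytic d u_safe / d u_rl (valid away from the switching surface b = 0)."""
    n = len(u_rl)
    if c - a @ u_rl <= 0.0:
        return np.eye(n)  # inactive constraint: the layer acts as the identity
    # Active constraint: the component of u_rl along a is (almost) cancelled
    return np.eye(n) - slack_weight * np.outer(a, a) / (slack_weight * (a @ a) + 1.0)


def finite_difference_jacobian(fn, u, step=1e-6):
    """Central-difference Jacobian, used only to check the analytic one."""
    cols = []
    for i in range(len(u)):
        e = np.zeros_like(u)
        e[i] = step
        cols.append((fn(u + e) - fn(u - e)) / (2 * step))
    return np.stack(cols, axis=1)
```

When the constraint is inactive the layer is the identity; when it is active, the Jacobian approaches the projection $I - aa^\top/\|a\|^2$ as `slack_weight` grows, so the gradient component that would push the action further into the constraint is suppressed. Computing the RL loss with respect to the raw policy output, as in the vanilla baseline below, roughly corresponds to using the identity in both cases.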

Other approaches

In addition, the experiments compare the approach against two other frameworks (also implemented here):

A vanilla baseline that uses SAC with RCBFs without generating synthetic data or backpropagating through the QP (the RL loss is computed with respect to the output of the RL policy).
A modified version of the approach from "End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks" that replaces its discrete-time CBF formulation with RCBFs but keeps its supervised learning component to speed up learning.

Running the experiments

The two environments are Unicycle and SimulatedCars. In Unicycle, a unicycle robot must reach a desired location while avoiding obstacles. In SimulatedCars, a chain of cars drives in a lane; the RL agent controls the 4th car and must minimize its control effort while avoiding collisions with the other cars.
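To make the Unicycle safety constraint concrete, the sketch below builds a distance-based barrier for a single circular obstacle. It assumes unicycle kinematics with state $(x, y, \theta)$ and control $(v, \omega)$, and places the barrier on a point a small distance ahead of the robot so that both $v$ and $\omega$ appear in $\dot{h}$. These modelling choices, the function names, and the `lookahead`, `radius` and `gamma` values are assumptions and may differ from the repository's environment.

```python
import numpy as np


def lookahead_barrier(state, obstacle, radius, lookahead=0.1):
    """Hypothetical obstacle barrier for a unicycle (x, y, theta) with u = (v, omega).

    p = (x + l cos(theta), y + l sin(theta)),  h = ||p - obstacle||^2 - radius^2.
    Returns h and the vector a such that h_dot = a . u (kinematics have no drift).
    """
    x, y, theta = state
    p = np.array([x + lookahead * np.cos(theta), y + lookahead * np.sin(theta)])
    # Kinematics of the look-ahead point: p_dot = G(theta) @ [v, omega]
    G = np.array([[np.cos(theta), -lookahead * np.sin(theta)],
                  [np.sin(theta), lookahead * np.cos(theta)]])
    diff = p - np.asarray(obstacle, dtype=float)
    h = diff @ diff - radius**2
    a = 2.0 * diff @ G  # gradient of h w.r.t. p, mapped through G
    return h, a


def cbf_condition_holds(state, u, obstacle, radius, gamma=1.0, lookahead=0.1):
    """Nominal (disturbance-free) CBF check: h_dot >= -gamma * h."""
    h, a = lookahead_barrier(state, obstacle, radius, lookahead)
    return a @ np.asarray(u, dtype=float) >= -gamma * h
```

For a robot at the origin facing an obstacle of radius 0.5 centred at (1, 0), driving forward at $v = 1$ violates the condition while $v = 0.2$ satisfies it: the permitted approach speed shrinks as $h \to 0$.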

Running the proposed approach: python main.py --env SimulatedCars --cuda --updates_per_step 2 --batch_size 512 --seed 12345 --model_based
Running the baseline: python main.py --env SimulatedCars --cuda --updates_per_step 1 --batch_size 256 --seed 12345 --no_diff_qp
Running the modified approach from "End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks": python main.py --env SimulatedCars --cuda --updates_per_step 1 --batch_size 256 --seed 12345 --no_diff_qp --use_comp True
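To repeat these runs over several seeds, a small driver script can assemble the command lines shown above and vary only `--seed`. This is a convenience sketch, not part of the repository; the configuration names and the `build_command` and `run_sweep` helpers are hypothetical, and it only uses the flags that appear in the commands above.

```python
import shlex
import subprocess

# (updates_per_step, batch_size, extra flags), copied from the three commands above
CONFIGS = {
    "proposed": (2, 512, ["--model_based"]),
    "baseline": (1, 256, ["--no_diff_qp"]),
    "end_to_end_rcbf": (1, 256, ["--no_diff_qp", "--use_comp", "True"]),
}


def build_command(config, seed, env="SimulatedCars", cuda=True):
    """Return the argv list for one run of main.py."""
    updates, batch, extra = CONFIGS[config]
    cmd = ["python", "main.py", "--env", env]
    if cuda:
        cmd.append("--cuda")
    cmd += ["--updates_per_step", str(updates), "--batch_size", str(batch),
            "--seed", str(seed)]
    return cmd + extra


def run_sweep(seeds, dry_run=True):
    """Print (and, unless dry_run, run sequentially) every configuration per seed."""
    for config in CONFIGS:
        for seed in seeds:
            cmd = build_command(config, seed)
            print(shlex.join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)
```

`run_sweep([12345])` prints the three commands above; with `dry_run=False` it runs them one after another and stops at the first run that exits with an error (`check=True`).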

Owner

Yousef Emam