Off-policy continuous control in PyTorch, with RDPG, RTD3 & RSAC

Overview

offpcc_logo

arXiv technical report soon available.

we are updating the readme to be as comprehensive as possible

Please ask any questions in Issues, thanks.

Introduction

This PyTorch repo implements off-policy RL algorithms for continuous control, including:

  • Standard algorithms: DDPG, TD3, SAC
  • Image-based algorithm: ConvolutionalSAC
  • Recurrent algorithms: RecurrentDPG, RecurrentTD3, RecurrentSAC, RecurrentSACSharing (see report)

where recurrent algorithms are generally not available in other repos.

Structure of codebase

Here, we talk about the organization of this code. In particular, we will talk about

  • Folder: where are certain files located?
  • Classes: how are classes designed to interact with each other?
  • Training/evaluation loop: how environment interaction, learning and evaluation alternate?

A basic understanding of these will make other details easy to understand from code itself.

Folders

  • file
    • containing plots reproducing stable-baselines3; you don’t need to touch this
  • offpcc (the good stuff; you will be using this)
    • algorithms (where DDPG, TD3 and SAC are implemented)
    • algorithms_recurrent (where RDPG, RTD3 and RSAC are implemented)
    • basics (abstract classes, stuff shared by algorithms or algorithms_recurrent, code for training)
    • basics_sb3 (you don’t need to touch this)
    • configs (gin configs)
    • domains (all custom domains are stored within and registered properly)
  • pics_for_readme
    • random pics; you don’t need to touch this
  • temp
    • potentially outdated stuff; you don’t need to touch this

Relationships between classes

There are three core classes in this repo:

  • Any environment written using OpenAI’s API would have:
    • reset method outputs the current state
    • step method takes in an action, outputs (reward, next state, done, info)
  • OffPolicyRLAlgorithm and RecurrentOffPolicyRLAlgorithm are the base class for all algorithms listed in introduction. You should think about them as neural network (e.g., actors, critics, CNNs, RNNs) wrappers that are augmented with methods to help these networks interact with other stuff:
    • act method takes in state from env, outputs action back to env
    • update_networks method takes in batch from buffer
  • The replay buffers ReplayBuffer and RecurrentReplayBuffer are built to interact with the environment and the algorithm classes
    • push method takes in a transition from env
    • sample method outputs a batch for algorithm’s update_networks method

Their relationships are best illustrated by a diagram:

offpcc_steps

Structure of training/evaluation loop

In this repo, we follow the training/evaluation loop style in spinning-up (this is essentially the script: basics/run_fns and the function train). It follows this basic structure, with details added for tracking stats and etc:

state = env.reset()
for t range(total_steps):  # e.g., 1 million
    # environment interaction
    if t >= update_after:
        action = algorithm.act(state)
    else:
        action = env.action_space.sample()
    next_state, reward, done, info = env.step(action)
   	# learning
    if t >= update_after and (t + 1) % update_every == 0:
        for j in range(update_every):
            batch = buffer.sample()
            algorithm.update_networks(batch)
    # evaluation
    if (t + 1) % num_steps_per_epoch == 0:
        ep_len, ep_ret = test_for_one_episode(test_env, algorithm)

Dependencies

Dependencies are available in requirements.txt; although there might be one or two missing dependencies that you need to install yourself.

Train an agent

Setup (wandb & GPU)

Add this to your bashrc or bash_profile and source it.

You should replace “account_name” with whatever wandb account that you want to use.

export OFFPCC_WANDB_ENTITY="account_name"

From the command line:

cd offpcc
CUDA_VISIBLE_DEVICES=3 OFFPCC_WANDB_PROJECT=project123 python launch.py --env <env-name> --algo <algo-name> --config <config-path> --run_id <id>

For DDPG, TD3, SAC

On pendulum-v0:

python launch.py --env pendulum-v0 --algo sac --config configs/test/template_short.gin --run_id 1

Commands and plots for benchmarking on Pybullet domains are in a Issue called “Performance check against SB3”.

For RecurrentDDPG, RecurrentTD3, RecurrentSAC

On pendulum-p-v0:

python launch.py --env pendulum-p-v0 --algo rsac --config configs/test/template_recurrent_100k.gin --run_id 1

Reproducing paper results

To reproduce paper results, simply run the commands in the previous section with the appropriate env name (listed below) and config files (their file names are highly readable). Mapping between env names used in code and env names used in paper:

  • pendulum-v0: pendulum
  • pendulum-p-v0: pendulum-p
  • pendulum-va-v0: pendulum-v
  • dmc-cartpole-balance-v0: cartpole-balance
  • dmc-cartpole-balance-p-v0: cartpole-balance-p
  • dmc-cartpole-balance-va-v0: cartpole-balance-v
  • dmc-cartpole-swingup-v0: cartpole-swingup
  • dmc-cartpole-swingup-p-v0: cartpole-swingup-p
  • dmc-cartpole-swingup-va-v0: cartpole-swingup-v
  • reacher-pomdp-v0: reacher-pomdp
  • water-maze-simple-pomdp-v0: watermaze
  • bumps-normal-test-v0: push-r-bump

Render learned policy

Create a folder in the same directory as offpcc, called results. In there, create a folder with the name of the environment, e.g., pendulum-p-v0. Within that env folder, create a folder with the name of the algorithm, e.g., rsac. You can get an idea of the algorithms available from the algo_name2class diectionary defined in offpcc/launch.py. Within that algorithm folder, create a folder with the run_id, e.g., 1. Simply put the saved actor (also actor summarizer for recurrent algorithms) into that inner most foler - they can be downloaded from the wandb website after your run finishes. Finally, go back into offpcc, and call

python launch.py --env pendulum-v0 --algo sac --config configs/test/template_short.gin --run_id 1 --render

For bumps-normal-test-v0, you need to modify the test_for_one_episode function within offpcc/basics/run_fns.py because, for Pybullet environments, the env.step must only appear once before the env.reset() call.

Owner
Zhihan
Zhihan
CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

CLOCs is a novel Camera-LiDAR Object Candidates fusion network. It provides a low-complexity multi-modal fusion framework that improves the performance of single-modality detectors. CLOCs operates on

Su Pang 254 Dec 16, 2022
The official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness.

This repository is the official implementation of A Unified Game-Theoretic Interpretation of Adversarial Robustness. Requirements pip install -r requi

Jie Ren 17 Dec 12, 2022
The code for two papers: Feedback Transformer and Expire-Span.

transformer-sequential This repo contains the code for two papers: Feedback Transformer Expire-Span The training code is structured for long sequentia

Facebook Research 125 Dec 25, 2022
Real-Time and Accurate Full-Body Multi-Person Pose Estimation&Tracking System

News! Aug 2020: v0.4.0 version of AlphaPose is released! Stronger tracking! Include whole body(face,hand,foot) keypoints! Colab now available. Dec 201

Machine Vision and Intelligence Group @ SJTU 6.7k Dec 28, 2022
Implementation of the Swin Transformer in PyTorch.

Swin Transformer - PyTorch Implementation of the Swin Transformer architecture. This paper presents a new vision Transformer, called Swin Transformer,

597 Jan 03, 2023
FaceAPI: AI-powered Face Detection & Rotation Tracking, Face Description & Recognition, Age & Gender & Emotion Prediction for Browser and NodeJS using TensorFlow/JS

FaceAPI AI-powered Face Detection & Rotation Tracking, Face Description & Recognition, Age & Gender & Emotion Prediction for Browser and NodeJS using

Vladimir Mandic 395 Dec 29, 2022
Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

A method to solve the Higgs boson challenge using Least Squares - Novae This project is the Project 1 of EPFL CS-433 Machine Learning. The project is

Giacomo Orsi 1 Nov 09, 2021
Galaxy images labelled by morphology (shape). Aimed at ML development and teaching

Galaxy images labelled by morphology (shape). Aimed at ML debugging and teaching.

Mike Walmsley 14 Nov 28, 2022
Leveraging OpenAI's Codex to solve cornerstone problems in Music

Music-Codex Leveraging OpenAI's Codex to solve cornerstone problems in Music Please NOTE: Presented generated samples were created by OpenAI's Codex P

Alex 2 Mar 11, 2022
Final report with code for KAIST Course KSE 801.

Orthogonal collocation is a method for the numerical solution of partial differential equations

Chuanbo HUA 4 Apr 06, 2022
Neural Cellular Automata + CLIP

🧠 Text-2-Cellular Automata Using Neural Cellular Automata + OpenAI CLIP (Work in progress) Examples Text Prompt: Cthulu is watching cthulu_is_watchin

Mainak Deb 21 Dec 19, 2022
Apply AnimeGAN-v2 across frames of a video clip

title emoji colorFrom colorTo sdk app_file pinned AnimeGAN-v2 For Videos 🔥 blue red gradio app.py false AnimeGAN-v2 For Videos Apply AnimeGAN-v2 acro

Nathan Raw 36 Oct 18, 2022
A Python library for working with arbitrary-dimension hypercomplex numbers following the Cayley-Dickson construction of algebras.

Hypercomplex A Python library for working with quaternions, octonions, sedenions, and beyond following the Cayley-Dickson construction of hypercomplex

7 Nov 04, 2022
Repository for the semantic WMI loss

Installation: pip install -e . Installing DL2: First clone DL2 in a separate directory and install it using the following commands: git clone https:/

Nick Hoernle 4 Sep 15, 2022
A hybrid SOTA solution of LiDAR panoptic segmentation with C++ implementations of point cloud clustering algorithms. ICCV21, Workshop on Traditional Computer Vision in the Age of Deep Learning

ICCVW21-TradiCV-Survey-of-LiDAR-Cluster Motivation In contrast to popular end-to-end deep learning LiDAR panoptic segmentation solutions, we propose a

YimingZhao 103 Nov 22, 2022
Mesh Graphormer is a new transformer-based method for human pose and mesh reconsruction from an input image

MeshGraphormer ✨ ✨ This is our research code of Mesh Graphormer. Mesh Graphormer is a new transformer-based method for human pose and mesh reconsructi

Microsoft 251 Jan 08, 2023
A synthetic texture-invariant dataset for object detection of UAVs

A synthetic dataset for object detection of UAVs This repository contains a synthetic datasets accompanying the paper Sim2Air - Synthetic aerial datas

LARICS Lab 10 Aug 13, 2022
(ICCV 2021) Official code of "Dressing in Order: Recurrent Person Image Generation for Pose Transfer, Virtual Try-on and Outfit Editing."

Dressing in Order (DiOr) 👚 [Paper] 👖 [Webpage] 👗 [Running this code] The official implementation of "Dressing in Order: Recurrent Person Image Gene

Aiyu Cui 277 Dec 28, 2022
Augmentation for Single-Image-Super-Resolution

SRAugmentation Augmentation for Single-Image-Super-Resolution Implimentation CutBlur Cutout CutMix Cutup CutMixup Blend RGBPermutation Identity OneOf

Yubo 6 Jun 27, 2022
High level network definitions with pre-trained weights in TensorFlow

TensorNets High level network definitions with pre-trained weights in TensorFlow (tested with 2.1.0 = TF = 1.4.0). Guiding principles Applicability.

Taehoon Lee 1k Dec 13, 2022