Conservative Q Learning for Offline Reinforcement Learning in JAX

Last update: Nov 07, 2022

CQL-JAX

This repository implements Conservative Q Learning for Offline Reinforcement Learning in JAX (FLAX). The implementation is built on top of the SAC base of JAX-RL.
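For readers new to CQL, the sketch below illustrates the kind of conservative penalty that a weight such as min_q_weight scales. It is a simplified, single-state illustration in plain Python of the log-sum-exp form of the regularizer, not code taken from this repository; the function names and inputs are hypothetical.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_sampled: Sequence[float], q_data: float, min_q_weight: float) -> float:
    """Conservative penalty for one state (illustrative, not the repo's code).

    q_sampled: Q-values of actions sampled for this state (hypothetical inputs).
    q_data:    Q-value of the action stored in the offline dataset.
    The penalty grows when sampled actions score higher than the dataset action,
    which pushes Q-values of out-of-distribution actions down.
    """
    return min_q_weight * (logsumexp(q_sampled) - q_data)


# A larger min_q_weight makes the critic more conservative.
penalty = cql_penalty(q_sampled=[1.0, 2.0, 0.5], q_data=1.5, min_q_weight=5.0)
```

In a full implementation this term would be averaged over a batch and added to the usual SAC critic loss.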

Usage

Install Dependencies-

pip install -r requirements.txt
pip install "jax[cuda111]<=0.21.1" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Run CQL-

python train_offline.py --env_name=hopper-expert-v0 --min_q_weight=5

Please use the following values of min_q_weight on MuJoCo tasks to reproduce the CQL results from the IQL paper-

Domain	medium	medium-replay	medium-expert
walker	10	1	10
hopper	5	5	1
cheetah	90	80	100

For antmaze tasks, min_q_weight=10 was found to work best.
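The recommended values can be kept in a small lookup so that each run picks up the right weight automatically. The helper below is a sketch, not part of the repository: the function names are hypothetical, and it assumes D4RL-style environment names such as hopper-medium-v2 or halfcheetah-medium-expert-v2. Only the --env_name and --min_q_weight flags come from the command shown earlier.

```python
from typing import List

# Values copied from the table above, keyed by domain and then dataset type.
MIN_Q_WEIGHT = {
    "walker": {"medium": 10, "medium-replay": 1, "medium-expert": 10},
    "hopper": {"medium": 5, "medium-replay": 5, "medium-expert": 1},
    "cheetah": {"medium": 90, "medium-replay": 80, "medium-expert": 100},
}
ANTMAZE_MIN_Q_WEIGHT = 10


def min_q_weight_for(env_name: str) -> int:
    """Look up the recommended weight for a name like 'hopper-medium-v2'."""
    base, _, _version = env_name.rpartition("-")  # drop the trailing "-vN"
    domain, _, dataset = base.partition("-")
    if domain == "antmaze":
        return ANTMAZE_MIN_Q_WEIGHT
    for table_domain, weights in MIN_Q_WEIGHT.items():
        # Substring match so "halfcheetah" and "walker2d" hit the table rows.
        if table_domain in domain:
            return weights[dataset]  # KeyError if the table has no such dataset
    raise KeyError(f"no recommended min_q_weight for {env_name!r}")


def build_command(env_name: str) -> List[str]:
    """Assemble the training command with the recommended weight."""
    weight = min_q_weight_for(env_name)
    return ["python", "train_offline.py",
            f"--env_name={env_name}", f"--min_q_weight={weight}"]
```

Datasets that the table does not cover (for example hopper-expert-v0) raise a KeyError rather than silently falling back to a guess.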

In case of out-of-memory errors in JAX, try running with the following environment variables-

XLA_PYTHON_CLIENT_MEM_FRACTION=0.80 python ...
XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 python ...
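When runs are launched from a Python script instead of the shell, the same two variables can be placed in the child process environment. The helper below is a minimal sketch with a hypothetical name (jax_oom_env); it only builds the environment dictionary and keeps any XLA_FLAGS that are already set.

```python
import os
from typing import Dict, Optional

PARALLELISM_FLAG = "--xla_gpu_force_compilation_parallelism=1"


def jax_oom_env(base: Optional[Dict[str, str]] = None,
                mem_fraction: str = "0.80") -> Dict[str, str]:
    """Return a copy of the environment with the two settings from above."""
    env = dict(os.environ if base is None else base)
    # Cap the fraction of GPU memory that JAX preallocates.
    env["XLA_PYTHON_CLIENT_MEM_FRACTION"] = mem_fraction
    # Append the compilation flag without discarding existing XLA flags.
    existing = env.get("XLA_FLAGS", "")
    if PARALLELISM_FLAG not in existing:
        env["XLA_FLAGS"] = f"{existing} {PARALLELISM_FLAG}".strip()
    return env
```

The result can be passed as the env argument of subprocess.run when starting train_offline.py. Setting the variables inside an already running process only helps if it happens before jax is first imported.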

Performance & Runtime

Returns are in the same range as the PyTorch implementation and IQL on the hopper and antmaze-umaze tasks, but fall well below both on the antmaze-medium and antmaze-large tasks-

Task	CQL(PyTorch)	CQL(JAX)	IQL
hopper-medium-v2	58.5	74.6	66.3
hopper-medium-replay-v2	95.0	92.1	94.7
hopper-medium-expert-v2	105.4	83.2	91.5
antmaze-umaze-v0	74.0	69.5	87.5
antmaze-umaze-diverse-v0	84.0	78.7	62.2
antmaze-medium-play-v0	61.2	14.2	71.2
antmaze-medium-diverse-v0	53.7	10.7	70.2
antmaze-large-play-v0	15.8	0.0	39.6
antmaze-large-diverse-v0	14.9	0.0	47.5

Wall-clock time averages ~50 min, improving on the 80 min reported for CQL in the IQL paper and narrowing the gap to IQL's 20 min.

Task	CQL(JAX) (min)	IQL (min)
hopper-medium-v2	52	27
hopper-medium-replay-v2	54	30
hopper-medium-expert-v2	57	29
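As a quick check on the hopper rows above, the per-task times can be averaged and compared directly. This snippet only does arithmetic on the numbers in the table and assumes they are in minutes; the variable names are illustrative.

```python
# (CQL(JAX), IQL) wall-clock times copied from the table above.
runtime_min = {
    "hopper-medium-v2": (52, 27),
    "hopper-medium-replay-v2": (54, 30),
    "hopper-medium-expert-v2": (57, 29),
}

cql_mean = sum(cql for cql, _ in runtime_min.values()) / len(runtime_min)
iql_mean = sum(iql for _, iql in runtime_min.values()) / len(runtime_min)

print(f"CQL(JAX): {cql_mean:.1f} min, IQL: {iql_mean:.1f} min, "
      f"ratio: {cql_mean / iql_mean:.1f}x")
# prints: CQL(JAX): 54.3 min, IQL: 28.7 min, ratio: 1.9x
```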

This implementation runs more than 4 times faster than the original PyTorch implementation.

For more offline RL algorithm implementations, check out the JAX-RL, IQL and rlkit repositories.

Citation

In case you use CQL-JAX for your research, please cite the following-

@misc{cqljax,
  author = {Suri, Karush},
  title = {{Conservative Q Learning in JAX.}},
  url = {https://github.com/karush17/cql-jax},
  year = {2021}
}

Owner

Karush Suri
