Addressing Function Approximation Error in Actor-Critic Methods

PyTorch implementation of Twin Delayed Deep Deterministic Policy Gradients (TD3). If you use our code or data please cite the paper.

The method is tested on MuJoCo continuous control tasks in OpenAI gym. Networks are trained using PyTorch 1.2 and Python 3.7.

Usage

The paper results can be reproduced by running:

./run_experiments.sh

Experiments on single environments can be run by calling:

python main.py --env HalfCheetah-v2

Hyper-parameters can be modified with different arguments to main.py. We include an implementation of DDPG (DDPG.py), which is not used in the paper, for easy comparison of hyper-parameters with TD3. This is not the implementation of "Our DDPG" as used in the paper (see OurDDPG.py).

Algorithms that TD3 is compared against (PPO, TRPO, ACKTR, DDPG) can be found in the OpenAI baselines repository.

Results

The code is no longer exactly representative of the code used in the paper: minor adjustments to hyper-parameters and other details have been made to improve performance. The learning curves are still the original results found in the paper.

Learning curves from the paper are found under /learning_curves. Each learning curve is formatted as a NumPy array of 201 evaluations (201,), where each evaluation corresponds to the average total reward from running the policy for 10 episodes with no exploration. The first evaluation is the randomly initialized policy network (unused in the paper). Evaluations are performed every 5000 time steps, over a total of 1 million time steps.
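As a minimal sketch of how that layout maps onto training time, the snippet below pairs each of the 201 evaluations with the time step at which it was taken. It uses a synthetic stand-in array rather than a real file, and the helper name evaluation_timesteps is hypothetical, not part of this repository. A real curve would presumably be read with np.load from a file under /learning_curves.

```python
import numpy as np

EVAL_FREQ = 5000  # evaluations are performed every 5000 time steps


def evaluation_timesteps(curve, eval_freq=EVAL_FREQ):
    """Return the time step at which each evaluation in `curve` was taken.

    Hypothetical helper: index 0 is the randomly initialized policy (time step 0).
    """
    curve = np.asarray(curve)
    if curve.ndim != 1:
        raise ValueError("expected a 1-D learning curve, got shape %s" % (curve.shape,))
    return np.arange(curve.shape[0]) * eval_freq


# Synthetic stand-in for one learning curve of shape (201,).
curve = np.linspace(0.0, 1000.0, 201)
steps = evaluation_timesteps(curve)

# Drop the first evaluation (random policy), which the paper does not use.
trained_steps, trained_curve = steps[1:], curve[1:]
```

With 201 evaluations spaced 5000 time steps apart, the time axis runs from 0 to 1 million, which matches the training length stated above.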

Numerical results can be found in the paper, or from the learning curves. Video of the learned agent can be found here.
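One way to pull a summary number out of the learning curves is sketched below. It assumes several curves for the same environment (for example, one per random seed) have been stacked into a 2-D array. The data here is synthetic, and both the function name summarize_curves and the choice of statistic (the maximum over evaluations of the run-averaged return) are illustrative assumptions rather than this repository's own code.

```python
import numpy as np


def summarize_curves(curves):
    """Summarize stacked learning curves of shape (num_runs, num_evaluations).

    Hypothetical helper. Returns the run-averaged curve, its maximum, and
    the evaluation index at which that maximum occurs.
    """
    curves = np.asarray(curves, dtype=float)
    if curves.ndim != 2:
        raise ValueError("expected shape (num_runs, num_evaluations)")
    mean_curve = curves.mean(axis=0)        # average over runs at each evaluation
    best_index = int(mean_curve.argmax())   # evaluation with the highest average
    return mean_curve, float(mean_curve[best_index]), best_index


# Synthetic stand-in: 3 runs, 201 evaluations each.
rng = np.random.default_rng(0)
base = np.linspace(0.0, 1000.0, 201)
curves = np.stack([base + rng.normal(scale=10.0, size=201) for _ in range(3)])

mean_curve, max_avg_return, best_index = summarize_curves(curves)
print("max average return: %.1f at evaluation %d" % (max_avg_return, best_index))
```

Multiplying best_index by 5000 gives the time step of that evaluation, since evaluations are spaced 5000 time steps apart.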

Bibtex

@inproceedings{fujimoto2018addressing,
  title={Addressing Function Approximation Error in Actor-Critic Methods},
  author={Fujimoto, Scott and Hoof, Herke and Meger, David},
  booktitle={International Conference on Machine Learning},
  pages={1582--1591},
  year={2018}
}

Author's PyTorch implementation of TD3 for OpenAI gym tasks

Owner

Scott Fujimoto
