Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Last update: Dec 31, 2022

Overview

PyTorch RL Minimal Implementations

This repository contains implementations of several reinforcement learning algorithms, with the following characteristics:

Few dependencies: Only PyTorch (for building neural networks) and Gym (for testing the algorithms' performance) need to be installed.
Independent implementations: Each RL algorithm is implemented in its own file, which makes it easy to follow its workflow and to modify it for other tasks.
Configurable extensions: Parameters and tools such as reward normalization, advantage normalization, TensorBoard, and tqdm are easy to configure.
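Reward normalization, one of the configurable options above, can be sketched as standardizing each reward with running statistics. The plain-Python example below assumes per-reward standardization using Welford's online mean/variance algorithm. The repository's own scheme may differ; some PPO implementations instead divide by the running standard deviation of the discounted return. `RunningMeanStd` and `normalize_reward` are hypothetical names.

```python
import math


class RunningMeanStd:
    """Hypothetical helper: running mean/variance of a scalar stream (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 until at least one sample is seen.
        return math.sqrt(self.m2 / self.count) if self.count > 0 else 0.0


def normalize_reward(reward: float, stats: RunningMeanStd, eps: float = 1e-8) -> float:
    """Assumed scheme: update the statistics with `reward`, then return the standardized reward."""
    stats.update(reward)
    return (reward - stats.mean) / (stats.std + eps)
```

The `eps` term keeps the first reward (when the standard deviation is still zero) from causing a division by zero.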

RL Algorithms List

Name	Type	Estimator	Paper	File
Q-Learning	Value-based / Off policy	TD	Watkins and Dayan. Q-Learning. Machine Learning, 1992.	q_learning.py
REINFORCE	Policy-based / On policy	MC	Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In NeurIPS, 2000.	reinforce.py
DQN	Value-based / Off policy	TD	Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015.	in progress
A2C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a2c.py
A3C	Actor-Critic / On policy	n-step TD	Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.	a3c.py
ACER	Actor-Critic / Off policy	Retrace	Wang et al. Sample Efficient Actor-Critic with Experience Replay. In ICLR, 2017.	in progress
ACKTR	Actor-Critic / On policy	GAE	Wu et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In NeurIPS, 2017.	in progress
PPO	Actor-Critic / On policy	GAE	Schulman et al. Proximal Policy Optimization Algorithms. arXiv, 2017.	ppo.py
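The Estimator column distinguishes how each method builds its learning target from a rollout. The plain-Python sketch below illustrates MC (REINFORCE), n-step TD (A2C/A3C), and GAE (PPO) for a single rollout segment. It assumes that no episode terminates inside the segment; real implementations also mask by `done` flags. The function names are hypothetical, not the repository's API.

```python
from typing import List, Sequence


def mc_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo return G_t = r_t + gamma * G_{t+1}, accumulated backwards (REINFORCE)."""
    returns: List[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def n_step_targets(rewards: Sequence[float], values: Sequence[float],
                   last_value: float, gamma: float, n: int) -> List[float]:
    """n-step TD target: sum_{k<n} gamma^k r_{t+k} + gamma^n V(s_{t+n}) (A2C / A3C).

    values[t] is V(s_t); last_value is V(s_T) for the state after the rollout
    (pass 0.0 if that state is terminal). Near the end of the rollout fewer
    than n rewards remain, so the target bootstraps from last_value instead.
    """
    all_values = list(values) + [last_value]
    T = len(rewards)
    targets: List[float] = []
    for t in range(T):
        horizon = min(n, T - t)  # number of real rewards available from step t
        g, discount = 0.0, 1.0
        for k in range(horizon):
            g += discount * rewards[t + k]
            discount *= gamma
        targets.append(g + discount * all_values[t + horizon])
    return targets


def gae(rewards: Sequence[float], values: Sequence[float],
        last_value: float, gamma: float, lam: float) -> List[float]:
    """GAE: A_t = sum_l (gamma*lam)^l delta_{t+l}, delta_t = r_t + gamma V(s_{t+1}) - V(s_t) (PPO)."""
    all_values = list(values) + [last_value]
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * all_values[t + 1] - all_values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0.0`, GAE reduces to the one-step TD error. With `lam=1.0`, it becomes the bootstrapped Monte Carlo return minus V(s_t).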

Quick Start

Requirements

pytorch
gym

tensorboard  # for summary writer
tqdm         # for progress bar
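TensorBoard and tqdm are only needed for the optional summary writer and progress bar, so an agent can check for them at runtime. The sketch below assumes the agent simply falls back to a plain loop when tqdm is missing. `is_installed` and `maybe_progress` are hypothetical helpers, not part of the repository.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be found in the current environment."""
    return importlib.util.find_spec(package) is not None


def maybe_progress(iterable, open_tqdm: bool = True):
    """Hypothetical helper: wrap `iterable` in a tqdm bar when requested and available."""
    if open_tqdm and is_installed("tqdm"):
        from tqdm import tqdm  # optional dependency, imported only when present
        return tqdm(iterable)
    return iterable  # fall back to the unwrapped iterable
```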

Abstract Agent

Components / Parameters

Component	Description
policy	neural network model
gamma	discount factor of cumulative reward
lr	learning rate(s), e.g. `lr_actor` and `lr_critic`
lr_decay	decay factor used to schedule the learning rate
lr_scheduler	scheduler for the learning rate
coef_critic_loss	coefficient of critic loss
coef_entropy_loss	coefficient of entropy loss
writer	summary writer to record information
buffer	replay buffer to store historical trajectories
use_cuda	whether to use the GPU (CUDA)
clip_grad	whether to clip gradients
max_grad_norm	maximum gradient norm used for clipping
norm_advantage	whether to normalize advantages
open_tb	whether to enable the TensorBoard summary writer
open_tqdm	whether to show a tqdm progress bar
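Several of the parameters above (`coef_critic_loss`, `coef_entropy_loss`, `norm_advantage`, `clip_grad`, `max_grad_norm`) typically act together inside `update()`. The plain-Python sketch below assumes a PPO-style clipped surrogate objective. Its default coefficients (0.5 and 0.01) are illustrative values, not taken from the repository. `normalize`, `actor_critic_loss`, and `clip_by_global_norm` are hypothetical helpers. In PyTorch, gradient clipping would normally be done with `torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)`.

```python
import math
from typing import List, Sequence


def normalize(xs: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Advantage normalization (norm_advantage): zero mean, unit std within a batch."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / (std + eps) for x in xs]


def actor_critic_loss(log_probs: Sequence[float], old_log_probs: Sequence[float],
                      advantages: Sequence[float], values: Sequence[float],
                      returns: Sequence[float], entropies: Sequence[float],
                      coef_critic_loss: float = 0.5, coef_entropy_loss: float = 0.01,
                      clip_eps: float = 0.2) -> float:
    """Assumed PPO-style total loss: policy loss + c_v * value loss - c_e * entropy (batch means)."""
    n = len(advantages)
    policy_loss = 0.0
    for lp, old_lp, adv in zip(log_probs, old_log_probs, advantages):
        ratio = math.exp(lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        policy_loss -= min(ratio * adv, clipped * adv) / n  # maximize the clipped surrogate
    value_loss = sum((v - g) ** 2 for v, g in zip(values, returns)) / n
    entropy = sum(entropies) / n  # subtracted below, so higher entropy lowers the loss
    return policy_loss + coef_critic_loss * value_loss - coef_entropy_loss * entropy


def clip_by_global_norm(grads: Sequence[float], max_grad_norm: float) -> List[float]:
    """Gradient clipping (clip_grad): rescale so the joint L2 norm is at most max_grad_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_grad_norm / (total + 1e-6))
    return [g * scale for g in grads]
```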

Methods

Method	Description
preprocess_obs()	preprocess an observation before feeding it into the neural network
select_action()	use the actor network to select an action from the policy distribution
estimate_obs()	use the critic network to estimate the value of an observation
update()	update the parameters by computing losses and gradients
train()	set the neural network to training mode
eval()	set the neural network to evaluation mode
save()	save the model parameters
load()	load the model parameters
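To make the method table concrete, the sketch below defines a hypothetical abstract `Agent` with those methods, plus a tabular Q-learning subclass. It is plain Python rather than PyTorch. A dictionary stands in for the neural network, so `train()` and `eval()` only toggle exploration, and `save()` and `load()` pickle the table. The names, signatures, and epsilon-greedy exploration are assumptions, not the repository's interface.

```python
import abc
import pickle
import random


class Agent(abc.ABC):
    """Hypothetical agent interface mirroring the method table above."""

    def __init__(self, gamma: float) -> None:
        self.gamma = gamma      # discount factor of cumulative reward
        self.training = True

    def train(self) -> None:
        self.training = True    # here: enable exploration

    def eval(self) -> None:
        self.training = False   # here: act greedily

    def preprocess_obs(self, obs):
        return obs              # identity by default; override for real observations

    @abc.abstractmethod
    def select_action(self, obs): ...

    @abc.abstractmethod
    def estimate_obs(self, obs) -> float: ...

    @abc.abstractmethod
    def update(self, *args, **kwargs): ...

    @abc.abstractmethod
    def save(self, path: str) -> None: ...

    @abc.abstractmethod
    def load(self, path: str) -> None: ...


class TabularQAgent(Agent):
    """Q-learning with a lookup table and epsilon-greedy exploration (illustrative)."""

    def __init__(self, n_actions: int, gamma: float = 0.99, lr: float = 0.1,
                 epsilon: float = 0.1, seed=None) -> None:
        super().__init__(gamma)
        self.n_actions = n_actions
        self.lr = lr
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = {}             # state -> list of action values

    def _row(self, state):
        return self.q.setdefault(state, [0.0] * self.n_actions)

    def select_action(self, obs):
        state = self.preprocess_obs(obs)
        if self.training and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)   # explore
        row = self._row(state)
        return max(range(self.n_actions), key=row.__getitem__)  # greedy action

    def estimate_obs(self, obs) -> float:
        return max(self._row(self.preprocess_obs(obs)))  # V(s) = max_a Q(s, a)

    def update(self, obs, action, reward, next_obs, done):
        """Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a)); returns the TD error."""
        row = self._row(self.preprocess_obs(obs))
        target = reward if done else reward + self.gamma * self.estimate_obs(next_obs)
        td_error = target - row[action]
        row[action] += self.lr * td_error
        return td_error

    def save(self, path: str) -> None:
        with open(path, "wb") as f:
            pickle.dump(self.q, f)

    def load(self, path: str) -> None:
        with open(path, "rb") as f:
            self.q = pickle.load(f)
```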

Update & To-do & Limitations

Update History

2021-12-09 ADD TRICK: norm_critic_loss in PPO
2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
2021-12-07 ADD ALGO: A3C
2021-12-05 ADD ALGO: PPO
2021-11-28 ADD ALGO: A2C
2021-11-20 ADD ALGO: Q-Learning, REINFORCE


Owner

Gemini Light
