A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer from NNAISENSE.

Overview

PGPElib

A mini library for Policy Gradients with Parameter-based Exploration [1] and friends.

This library serves as a clean re-implementation of the algorithms used in our paper on ClipUp [2].

Introduction

PGPE is an algorithm for computing approximate policy gradients for Reinforcement Learning (RL) problems. pgpelib provides a clean, scalable and easily extensible implementation of PGPE, and also serves as a reference (re)implementation of ClipUp [2], an optimizer designed to work especially well with PGPE-style gradient estimation. Although they were developed in the context of RL, both PGPE and ClipUp are general-purpose tools for solving optimization problems.

Here are some interesting RL agents trained in simulation with the PGPE+ClipUp implementation in pgpelib.

HumanoidBulletEnv-v0 (score: 4853)
Humanoid-v2 (score: 10184)
Walker2d-v2 (score: 5232)


What is PGPE?

PGPE is a derivative-free policy gradient estimation algorithm. More generally, it can be seen as a distribution-based evolutionary algorithm suitable for optimization in the domain of real numbers. With simple modifications to PGPE, one can also obtain similar algorithms like OpenAI-ES [3] and Augmented Random Search [7].

Please see the following animation for a visual explanation of how PGPE works.

The working principles of PGPE
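As a compact reference, the PGPE gradient estimates with symmetric (mirrored) sampling from [1] can be written as below. Here μ is the center of the search distribution, σ its per-parameter standard deviation, f the fitness, n the number of mirrored pairs (so the population size is 2n), and b a baseline such as the mean fitness; all products, squares and divisions are element-wise. This is a sketch of the textbook form, and pgpelib may apply further processing (for example fitness ranking) before using these estimates.

```latex
\begin{aligned}
\epsilon_i &\sim \mathcal{N}\!\left(0, \operatorname{diag}(\sigma^{2})\right), \qquad
x_i^{+} = \mu + \epsilon_i, \quad x_i^{-} = \mu - \epsilon_i, \qquad i = 1, \dots, n \\
\nabla_{\mu} &\approx \frac{1}{n} \sum_{i=1}^{n} \epsilon_i \,
  \frac{f(x_i^{+}) - f(x_i^{-})}{2} \\
\nabla_{\sigma} &\approx \frac{1}{n} \sum_{i=1}^{n}
  \left( \frac{f(x_i^{+}) + f(x_i^{-})}{2} - b \right)
  \frac{\epsilon_i^{2} - \sigma^{2}}{\sigma}
\end{aligned}
```

μ and σ are then moved along these estimates; the center update is where an optimizer such as ClipUp or Adam can come in.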



What is ClipUp?

ClipUp is a new optimizer (a gradient-following algorithm) that we propose in [2] for use within distribution-based evolutionary algorithms such as PGPE. In [3, 4], it was shown that distribution-based evolutionary algorithms work well with adaptive optimizers; those studies used the well-known Adam optimizer [5]. We argue that ClipUp is simpler and more intuitive, yet competitive with Adam. Please see our blog post and paper [2] for more details.
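The ClipUp update described in [2] can be sketched in a few lines: the gradient is normalized to unit length, scaled by a step size, accumulated into a momentum (velocity) term, and the velocity is clipped so that its norm never exceeds a maximum speed. The class below is a hypothetical illustration, not pgpelib's optimizer class: `ClipUpSketch` and `step_size` are our own names, while `max_speed` and `momentum` mirror the keys in the `optimizer_config` example further down. It assumes fitness is maximized, so the velocity is added to the center.

```python
import numpy as np


class ClipUpSketch:
    """Hypothetical sketch of the ClipUp update rule from [2].

    Assumes fitness is maximized, so the returned velocity is added to
    the current center. Argument names are illustrative only.
    """

    def __init__(self, step_size, max_speed, momentum=0.9):
        self.step_size = step_size
        self.max_speed = max_speed
        self.momentum = momentum
        self.velocity = None

    def step(self, grad):
        grad = np.asarray(grad, dtype=float)
        if self.velocity is None:
            self.velocity = np.zeros_like(grad)
        # 1) Normalize the gradient: only its direction is used.
        norm = np.linalg.norm(grad)
        direction = grad / norm if norm > 0 else grad
        # 2) Heavy-ball momentum with a fixed step size.
        self.velocity = self.momentum * self.velocity + self.step_size * direction
        # 3) Clip the velocity so its norm never exceeds max_speed.
        speed = np.linalg.norm(self.velocity)
        if speed > self.max_speed:
            self.velocity = self.velocity * (self.max_speed / speed)
        return self.velocity


# Usage: follow a constant gradient for 10 steps.
opt = ClipUpSketch(step_size=0.15, max_speed=0.3, momentum=0.9)
center = np.zeros(2)
speeds = []
for _ in range(10):
    center = center + opt.step([3.0, 4.0])
    speeds.append(float(np.linalg.norm(opt.velocity)))
```

In this run the speed goes 0.15, then 0.285, then stays at exactly 0.3: normalization makes the step length independent of the gradient's raw magnitude, and clipping bounds how far momentum can accelerate the center.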


Installation

Prerequisites: swig is required by Box2D, a simple physics engine used in some of the RL examples. It can be installed either system-wide (using a package manager such as apt) or with conda. You can then install pgpelib using the following commands:

# Install directly from GitHub
pip install git+https://github.com/nnaisense/pgpelib.git#egg=pgpelib

# Or install from source in editable mode (to run examples or to modify code)
git clone https://github.com/nnaisense/pgpelib.git
cd pgpelib
pip install -e .

If you wish to run experiments based on MuJoCo, you will need some additional setup. See this link for setup instructions.


Usage

To dive into executable code examples, please see the examples directory. Below we give a very quick tutorial on how to use pgpelib for optimization.

Basic usage

pgpelib provides an ask-and-tell interface for optimization, similar to [4, 6]. The general principle is to repeatedly ask the optimizer for candidate solutions to evaluate, and then tell it the corresponding fitness values so that it can update the current solution or population. Using this interface, a typical interaction with the solver looks as follows:

from pgpelib import PGPE
import numpy as np

pgpe = PGPE(
    solution_length=5,   # A solution vector has the length of 5
    popsize=20,          # Our population size is 20

    #optimizer='clipup',          # Uncomment these lines if you
    #optimizer_config = dict(     # would like to use the ClipUp
    #    max_speed=...,           # optimizer.
    #    momentum=0.9
    #),

    #optimizer='adam',            # Uncomment these lines if you
    #optimizer_config = dict(     # would like to use the Adam
    #    beta1=0.9,               # optimizer.
    #    beta2=0.999,
    #    epsilon=1e-8
    #),

    # ... further hyperparameters as needed
)

# Let us run the evolutionary computation for 1000 generations
for generation in range(1000):

    # Ask for solutions, which are to be given as a list of numpy arrays.
    # In the case of this example, solutions is a list which contains
    # 20 numpy arrays, the length of each numpy array being 5.
    solutions = pgpe.ask()

    # This is the phase where we evaluate the solutions
    # and prepare a list of fitnesses.
    # Make sure that fitnesses[i] stores the fitness of solutions[i].
    fitnesses = [-float(np.sum(x ** 2)) for x in solutions]  # e.g. a toy objective (higher is better)

    # Now we tell the result of our evaluations, fitnesses,
    # to our solver, so that it updates the center solution
    # and the spread of the search distribution.
    pgpe.tell(fitnesses)

# After 1000 generations, we print the center solution.
print(pgpe.center)
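To make the ask-and-tell contract concrete without depending on pgpelib, here is a hypothetical, heavily simplified class (`ToySymmetricPGPE`, not part of pgpelib) that uses mirrored sampling and the PGPE gradient estimates from [1] with plain gradient ascent. It assumes an even population size and that higher fitness is better; unlike pgpelib it has no fitness ranking and no ClipUp/Adam optimizer.

```python
import numpy as np


class ToySymmetricPGPE:
    """Hypothetical, simplified PGPE with an ask-and-tell interface.

    Not pgpelib's implementation: no fitness ranking, no ClipUp/Adam,
    plain gradient ascent. Assumes an even popsize and that higher
    fitness is better.
    """

    def __init__(self, solution_length, popsize, center_lr=0.2,
                 stdev_init=0.5, stdev_lr=0.1, seed=0):
        assert popsize % 2 == 0, "mirrored sampling needs an even popsize"
        self.rng = np.random.default_rng(seed)
        self.center = np.zeros(solution_length)
        self.stdev = np.full(solution_length, stdev_init)
        self.popsize = popsize
        self.center_lr = center_lr
        self.stdev_lr = stdev_lr
        self._eps = None

    def ask(self):
        # Sample popsize/2 perturbations and return them as mirrored pairs:
        # [center + eps_0, center - eps_0, center + eps_1, ...].
        n = self.popsize // 2
        self._eps = self.rng.normal(0.0, self.stdev, size=(n, self.center.size))
        solutions = []
        for eps in self._eps:
            solutions.append(self.center + eps)
            solutions.append(self.center - eps)
        return solutions

    def tell(self, fitnesses):
        f = np.asarray(fitnesses, dtype=float)
        f_plus, f_minus = f[0::2], f[1::2]
        eps, n = self._eps, len(self._eps)
        baseline = f.mean()
        # Element-wise PGPE gradient estimates for the center and the stdev.
        grad_center = ((f_plus - f_minus) / 2.0) @ eps / n
        grad_stdev = (((f_plus + f_minus) / 2.0 - baseline)
                      @ ((eps ** 2 - self.stdev ** 2) / self.stdev)) / n
        self.center = self.center + self.center_lr * grad_center
        # Keep the standard deviation strictly positive.
        self.stdev = np.maximum(self.stdev + self.stdev_lr * grad_stdev, 1e-6)


def fitness(x):
    # Toy objective, maximized at x = [3, 3, 3, 3, 3].
    return -float(np.sum((x - 3.0) ** 2))


solver = ToySymmetricPGPE(solution_length=5, popsize=20)
initial_score = fitness(solver.center)
for generation in range(300):
    solutions = solver.ask()
    solver.tell([fitness(x) for x in solutions])
final_score = fitness(solver.center)
```

The mirrored pairs let the center estimate use fitness differences, which cancels much of the noise a single sample would carry; this is the same reason the estimates in [1] are written in terms of f(x⁺) and f(x⁻).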

pgpelib also supports adaptive population sizes, where additional solutions are sampled from the current search distribution and evaluated until a certain number of total simulator interactions (i.e. timesteps) is reached. Use of this technique can be enabled by specifying the num_interactions parameter, as demonstrated by the following snippet:

pgpe = PGPE(
    solution_length=5,      # Our RL policy has 5 trainable parameters.
    popsize=20,             # Our base population size is 20.
                            # After evaluating a batch of 20 policies,
                            # if we do not reach our threshold of
                            # simulator interactions, we will keep sampling
                            # and evaluating more solutions, 20 at a time,
                            # until the threshold is finally satisfied.

    num_interactions=17500, # Threshold for simulator interactions.
    # ... further hyperparameters as needed
)

# Let us run the evolutionary computation for 1000 generations
for generation in range(1000):

    # We begin the inner loop of asking for new solutions,
    # until the threshold of interactions count is reached.
    while True:

        # ask for new policies to evaluate in the simulator
        solutions = pgpe.ask()

        # This is the phase where we evaluate the policies,
        # and prepare a list of fitnesses and a list of
        # interaction counts.
        # Make sure that:
        #   fitnesses[i] stores the fitness of solutions[i];
        #   interactions[i] stores the number of interactions
        #       made with the simulator while evaluating the
        #       i-th solution.
        fitnesses = [...]
        interactions = [...]

        # Now we tell the result of our evaluations
        # to our solver, so that it updates the center solution
        # and the spread of the search distribution.
        interaction_limit_reached = pgpe.tell(fitnesses, interactions)

        # If the limit on number of interactions per generation is reached,
        # pgpelib has already updated the search distribution internally.
        # So we can stop creating new solutions and end this generation.
        if interaction_limit_reached:
            break

# After 1000 generations, we print the center solution (policy).
print(pgpe.center)
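The batching logic behind num_interactions can be sketched independently of pgpelib. The helper below (`collect_generation`, together with the toy `sample_batch` and `evaluate`) is hypothetical and only assumes what the comments above describe: solutions are evaluated in whole batches of popsize, and sampling stops once the summed interaction count reaches the threshold.

```python
import numpy as np


def collect_generation(sample_batch, evaluate, num_interactions):
    """Sample and evaluate whole batches until the interaction budget is met.

    Hypothetical helper (not part of pgpelib):
      sample_batch() -> list of solutions (e.g. popsize draws)
      evaluate(x)    -> (fitness, interactions) for one solution
    """
    solutions, fitnesses = [], []
    total_interactions = 0
    while total_interactions < num_interactions:
        # The threshold is only re-checked after a whole batch is evaluated.
        for x in sample_batch():
            fitness, interactions = evaluate(x)
            solutions.append(x)
            fitnesses.append(fitness)
            total_interactions += interactions
    # All collected (solution, fitness) pairs would then feed a single
    # update of the search distribution.
    return solutions, fitnesses, total_interactions


rng = np.random.default_rng(0)
popsize, length = 20, 5


def sample_batch():
    return [rng.normal(size=length) for _ in range(popsize)]


def evaluate(x):
    # Toy evaluation: pretend every episode lasts exactly 400 timesteps.
    return -float(np.sum(x ** 2)), 400


solutions, fitnesses, total = collect_generation(sample_batch, evaluate, 17500)
print(len(solutions), total)  # 60 24000
```

With 400-timestep episodes, one batch of 20 yields 8000 interactions, so reaching the threshold of 17500 takes three batches (60 solutions, 24000 interactions). With real episodes of varying length, the effective population size changes from generation to generation.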

Parallelization

Ease of parallelization is a major benefit of evolutionary search techniques. pgpelib is deliberately agnostic about parallelization: the choice of tool is left to the user. We provide thoroughly documented examples that use either multiprocessing or ray to parallelize evaluations across multiple cores on a single machine or across multiple machines. The ray example additionally demonstrates the use of observation normalization when training RL agents.

Training RL agents

This repository also contains a Python script for training RL agents. The training script is configurable and executable from the command line. See the train_agents directory. Some pre-trained RL agents are also available for visualization in the agents directory.


License

Please see: LICENSE.

The files optimizers.py, runningstat.py, and ranking.py contain code adapted from OpenAI's evolution-strategies-starter repository. The license terms of the adapted code can be found in those files.


References

[1] Sehnke, F., Osendorfer, C., Rückstieß, T., Graves, A., Peters, J., & Schmidhuber, J. (2010). Parameter-exploring policy gradients. Neural Networks, 23(4), 551-559.

[2] Toklu, N.E., Liskowski, P., & Srivastava, R.K. (2020). ClipUp: A Simple and Powerful Optimizer for Distribution-based Policy Evolution. 16th International Conference on Parallel Problem Solving from Nature (PPSN 2020).

[3] Salimans, T., Ho, J., Chen, X., Sidor, S., & Sutskever, I. (2017). Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864.

[4] Ha, D. (2017). A Visual Guide to Evolution Strategies.

[5] Kingma, D.P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015).

[6] Hansen, N., Akimoto, Y., & Baudis, P. (2019). CMA-ES/pycma on GitHub. Zenodo, DOI:10.5281/zenodo.2559634, February 2019.

[7] Mania, H., Guy, A., & Recht, B. (2018). Simple random search provides a competitive approach to reinforcement learning. arXiv preprint arXiv:1803.07055.


Citation

If you use this code, please cite us in your repository/paper as:

Toklu, N. E., Liskowski, P., & Srivastava, R. K. (2020, September). ClipUp: A Simple and Powerful Optimizer for Distribution-Based Policy Evolution. In International Conference on Parallel Problem Solving from Nature (pp. 515-527). Springer, Cham.

Bibtex:

@inproceedings{toklu2020clipup,
  title={ClipUp: A Simple and Powerful Optimizer for Distribution-Based Policy Evolution},
  author={Toklu, Nihat Engin and Liskowski, Pawe{\l} and Srivastava, Rupesh Kumar},
  booktitle={International Conference on Parallel Problem Solving from Nature},
  pages={515--527},
  year={2020},
  organization={Springer}
}


Acknowledgements

We are thankful to the developers of these tools for inspiring this implementation.
